Data Preparation

The dataset used to estimate (7) is obtained from simulation and contains a) five factors corresponding to the simulation parameters, b) prediction method, c) number of components, d) replication and e) prediction error for each of the four responses. For each of the 50 replicates, the prediction error is computed with 0 to 10 predictor components as

\[\begin{equation*} \left(\widehat{\mathcal{PE_\circ}}\right)_{ijklr} = \frac{1}{\sigma_{y_{ij}\mid x}^2}\left[ \left(\boldsymbol{\beta}_{ij} - \hat{\boldsymbol{\beta}}_{ijklr}\right)^t \left(\boldsymbol{\Sigma}_{xx}\right)_{i} \left(\boldsymbol{\beta}_{ij} - \hat{\boldsymbol{\beta}}_{ijklr}\right) \right] + 1 \end{equation*}\]

Thus there are 32 (designs) \(\times\) 5 (methods) \(\times\) 11 (numbers of components) \(\times\) 50 (replications), i.e., 88000 observations, each holding the prediction errors for the four response variables Y1 to Y4.
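The prediction error formula and the observation count above can be checked with a short sketch. The function name `prediction_error`, its argument names and the toy values are hypothetical and chosen only for illustration; the all-zero coefficient vector is an assumption standing in for a fit that uses no predictor information.

```python
import numpy as np


def prediction_error(beta, beta_hat, sigma_xx, sigma2_y_given_x):
    """Scaled prediction error for a single response:
    (beta - beta_hat)' Sigma_xx (beta - beta_hat) / sigma^2_{y|x} + 1.
    """
    diff = np.asarray(beta, dtype=float) - np.asarray(beta_hat, dtype=float)
    quad = float(diff @ np.asarray(sigma_xx, dtype=float) @ diff)
    return quad / sigma2_y_given_x + 1.0


# Hypothetical toy values (not taken from the simulation design).
beta = np.array([1.0, -0.5, 0.0])      # true coefficients for one response
sigma_xx = np.diag([2.0, 1.0, 0.5])    # predictor covariance for one design
sigma2 = 0.8                           # conditional variance of y given x

# A perfect estimate attains the lower bound of 1.
pe_perfect = prediction_error(beta, beta, sigma_xx, sigma2)

# An all-zero estimate (assumed stand-in for a fit with no predictor information).
pe_null = prediction_error(beta, np.zeros(3), sigma_xx, sigma2)

# Size of the full simulated dataset: designs x methods x components x replicates.
n_obs = 32 * 5 * len(range(0, 11)) * 50
```

Because the quadratic form is non-negative whenever the predictor covariance is positive semi-definite, the scaled prediction error never falls below 1, which is the value reached when the estimated coefficients equal the true ones.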

Since our discussion focuses on the average minimum prediction error that a method can attain and on the average number of components it uses to reach the minimum prediction error in each replicate, the dataset described above is summarized into the following two smaller datasets. Let us call them the Error Dataset and the Component Dataset.

Error Dataset:: For each prediction method, design and response, an average prediction error is computed over all replicates for each number of components. Next, the number of components that gives the minimum of this average prediction error is selected, i.e., \[\begin{equation} l_\circ = \operatorname*{argmin}_{l}\left[\frac{1}{50}\sum_{r=1}^{50}{\left(\mathcal{PE}_\circ\right)_{ijklr}}\right] \tag{9} \end{equation}\] The prediction errors \(\left(\mathcal{PE}_\circ\right)_{ijkl_\circ r}\) obtained with \(l_\circ\) components, over all replicates \(r\), form the Error Dataset. Let \(\mathbf{u}_{(8000 \times 4)} = (u_j)\) for \(j = 1, \ldots, 4\) be the outcome variables measuring the prediction error corresponding to response \(j\) in the context of this dataset.
Component Dataset:: The number of components that gives the minimum prediction error in each replicate forms the Component Dataset, i.e., \[\begin{equation} l_{\circ} = \operatorname*{argmin}_{l}\left[\left(\mathcal{PE}_\circ\right)_{ijklr}\right] \tag{10} \end{equation}\] Here \(l_\circ\) is the number of components that gives the minimum prediction error \(\left(\mathcal{PE}_\circ\right)_{ijklr}\) for design \(i\), response \(j\), method \(k\) and replicate \(r\). Let \(\mathbf{v}_{(8000 \times 4)} = (v_j)\) for \(j = 1, \ldots, 4\) be the outcome variables measuring the number of components used for the minimum prediction error corresponding to response \(j\) in the context of this dataset.
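The two summaries defined in (9) and (10) can be sketched with pandas on a small synthetic table. Everything here is an assumption made for illustration: the column names (`design`, `method`, `ncomp`, `rep`), the method labels `M1` and `M2`, the reduced grid sizes and the random prediction errors. It is not the code used to produce the datasets discussed above.

```python
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Reduced, hypothetical grid (the full study has 32 designs, 5 methods,
# components 0..10, 50 replicates and responses Y1..Y4).
designs = range(1, 3)
methods = ["M1", "M2"]
components = range(0, 4)
replicates = range(1, 6)
responses = ["Y1", "Y2"]

ids = ["design", "method", "ncomp", "rep"]
wide = pd.DataFrame(
    list(itertools.product(designs, methods, components, replicates)),
    columns=ids,
)
for y in responses:
    # Synthetic prediction errors; real values would come from the simulation.
    wide[y] = 1.0 + rng.random(len(wide))

# Long format: one row per (design, method, ncomp, rep, response).
pe_long = wide.melt(id_vars=ids, var_name="response", value_name="pe")
cell = ["design", "method", "response"]

# Error Dataset, eq. (9): average over replicates for each number of
# components, pick the minimizing number of components per cell, then keep
# the replicate-level errors at that number of components.
mean_pe = pe_long.groupby(cell + ["ncomp"], as_index=False)["pe"].mean()
best = mean_pe.loc[mean_pe.groupby(cell)["pe"].idxmin(), cell + ["ncomp"]]
error_long = pe_long.merge(best, on=cell + ["ncomp"])
error_ds = error_long.pivot(
    index=["design", "method", "rep"], columns="response", values="pe"
)

# Component Dataset, eq. (10): within each replicate, keep the number of
# components that minimizes the prediction error.
row_of_min = pe_long.groupby(cell + ["rep"])["pe"].idxmin()
component_ds = pe_long.loc[row_of_min].pivot(
    index=["design", "method", "rep"], columns="response", values="ncomp"
)
```

With the full grid, both tables would have 32 \(\times\) 5 \(\times\) 50 = 8000 rows and four response columns, matching the dimensions of \(\mathbf{u}\) and \(\mathbf{v}\). Note that the selected number of components is allowed to differ between responses, since the minimization is carried out separately for each response.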