Exploration

This section uses Principal Component Analysis (PCA) to explore the variation in the error dataset and the component dataset. Let \(\mathbf{t}_u\) and \(\mathbf{t}_v\) be the principal component score sets obtained from PCA on the \(\mathbf{u}\) and \(\mathbf{v}\) matrices respectively. The score density in Figure 5 corresponds to the first principal component of \(\mathbf{u}\), i.e. the first column of \(\mathbf{t}_u\).
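As a rough illustration of how such scores can be obtained, the sketch below computes principal component scores from a column-centered matrix via the singular value decomposition. The function name `pca_scores` and the random matrix `u_demo` are hypothetical stand-ins and not the study's error data; the sketch only shows the mechanics of extracting the first column of a score matrix.

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Return PCA scores and explained-variance ratios of a data matrix."""
    x = np.asarray(x, dtype=float)
    # Center each column so the components describe variation around the mean
    centered = x - x.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt; the scores are U scaled by s
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Hypothetical stand-in for the error matrix u (rows: observations, columns: responses)
rng = np.random.default_rng(0)
u_demo = rng.normal(size=(200, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])

t_u, explained = pca_scores(u_demo, n_components=2)
first_pc = t_u[:, 0]  # first column of the score matrix, the quantity whose density is plotted
```

A density estimate of `first_pc`, split by the design factors, would then give a plot of the kind described for Figure 5.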

Since higher prediction errors correspond to high scores, the plot shows that the PCR, PLS1 and PLS2 methods are influenced by the two levels of the position of the relevant predictor components (relpos). When the relevant predictors are at positions 5, 6, 7, 8, the eigenvalues corresponding to them are relatively small. This suggests that PCR, PLS1 and PLS2 depend greatly on the position of the relevant components, and that the variation of these components affects their prediction performance. The envelope methods, however, appear to be less influenced by relpos in this regard.

Figure 5: Score density corresponding to the first principal component of the error dataset (\(\mathbf{u}\)), subdivided by methods, gamma and eta and grouped by relpos.

In addition, the plot shows that gamma, i.e., the level of multicollinearity, has a smaller effect when the relevant predictors are at positions 1, 2, 3, 4. This indicates that the methods are somewhat robust in handling collinear predictors. Nevertheless, when the relevant predictors are at positions 5, 6, 7, 8, high multicollinearity results in a small variance of these relevant components and consequently yields poor prediction. This is in accordance with the findings of Helland and Almøy (1994).
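The link between multicollinearity, component position and variance can be sketched numerically. The example below assumes that the predictor eigenvalues decay exponentially as \(\lambda_i = e^{-\gamma(i - 1)}\); this form is an assumption made here for illustration and is not stated in this section. The number of predictors (`p=10`), the larger gamma value (0.9) and the helper names are likewise hypothetical.

```python
import numpy as np

def eigenvalues(gamma, p):
    """Assumed exponential decay: lambda_i = exp(-gamma * (i - 1)), i = 1..p."""
    positions = np.arange(1, p + 1)
    return np.exp(-gamma * (positions - 1))

def relevant_variance_share(gamma, relpos, p=10):
    """Share of total predictor variance carried by the components at relpos."""
    lam = eigenvalues(gamma, p)
    idx = np.asarray(relpos) - 1  # convert 1-based positions to 0-based indices
    return lam[idx].sum() / lam.sum()

# gamma = 0.2 is the low level named in the text; 0.9 is an illustrative high level
shares = {
    (gamma, relpos): relevant_variance_share(gamma, relpos)
    for gamma in (0.2, 0.9)
    for relpos in ((1, 2, 3, 4), (5, 6, 7, 8))
}

for (gamma, relpos), share in shares.items():
    print(f"gamma={gamma}, relpos={relpos}: variance share = {share:.3f}")
```

Under this assumed decay, the components at positions 5, 6, 7, 8 always carry less variance than those at positions 1, 2, 3, 4, and their share shrinks further as gamma grows, which is consistent with the pattern described above.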

Furthermore, the density curves for PCR, PLS1 and PLS2 are similar for different levels of eta, i.e., the factor controlling the correlation between responses. The envelope models, however, show a distinct interaction between the position of relevant components (relpos) and eta: higher levels of eta yield higher scores and a clear separation between the two levels of relpos. In the case of high multicollinearity, the envelope methods produce some large outliers, indicating that in some cases these methods can give unexpected predictions.

Figure 6: Score density corresponding to the first principal component of the component dataset (\(\mathbf{v}\)), subdivided by methods, gamma and eta and grouped by relpos.

In Figure 6, higher scores suggest that a method has used a larger number of components to reach its minimum prediction error. The plot also shows that relevant predictor components at positions 5, 6, 7, 8 give larger prediction errors than those at positions 1, 2, 3, 4. The pattern is more distinct in the high multicollinearity cases and for the PCR and PLS methods. Both envelope methods show equally enhanced performance at both levels of relpos and gamma. However, for data with low multicollinearity (\(\gamma = 0.2\)), the envelope methods use fewer components on average than in the high multicollinearity cases to achieve minimum prediction error.