Simulation Model

Consider a model in which the response vector \(\mathbf{y}\), with \(m\) elements, and the predictor vector \(\mathbf{x}\), with \(p\) elements, jointly follow a multivariate normal distribution,

\[\begin{equation} \begin{bmatrix} \mathbf{y} \\ \mathbf{x} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \boldsymbol{\mu}_y \\ \boldsymbol{\mu}_x \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{yy} & \boldsymbol{\Sigma}_{yx} \\ \boldsymbol{\Sigma}_{xy} & \boldsymbol{\Sigma}_{xx} \end{bmatrix} \right) \tag{1} \end{equation}\]

where \(\boldsymbol{\Sigma}_{xx}\) and \(\boldsymbol{\Sigma}_{yy}\) are the variance-covariance matrices of \(\mathbf{x}\) and \(\mathbf{y}\), respectively, \(\boldsymbol{\Sigma}_{xy}\) is the covariance between \(\mathbf{x}\) and \(\mathbf{y}\), and \(\boldsymbol{\mu}_x\) and \(\boldsymbol{\mu}_y\) are the mean vectors of \(\mathbf{x}\) and \(\mathbf{y}\), respectively. A linear model based on (1) is,

\[\begin{equation} \mathbf{y} = \boldsymbol{\mu}_y + \boldsymbol{\beta}^t(\mathbf{x} - \boldsymbol{\mu}_x) + \boldsymbol{\epsilon} \tag{2} \end{equation}\] where \(\underset{m\times p}{\boldsymbol{\beta}^t}\) is a matrix of regression coefficients and \(\boldsymbol{\epsilon}\) is an error term such that \(\boldsymbol{\epsilon} \sim \mathcal{N}(0, \boldsymbol{\Sigma}_{y|x})\). Here, \(\boldsymbol{\beta}^t = \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\) and \(\boldsymbol{\Sigma}_{y|x} = \boldsymbol{\Sigma}_{yy} - \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\boldsymbol{\Sigma}_{xy}\).
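These population quantities can be checked numerically. The Python sketch below (dimensions and covariance values are hypothetical, not taken from the paper) simulates from the joint distribution in (1) with zero means and compares the least-squares estimate of the coefficient matrix with the population value \(\boldsymbol{\beta}^t = \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical): m = 2 responses, p = 3 predictors
m, p = 2, 3

# Construct a valid (positive definite) joint covariance for [y; x]
A = rng.standard_normal((m + p, m + p))
Sigma = A @ A.T + (m + p) * np.eye(m + p)
Sigma_yy = Sigma[:m, :m]
Sigma_yx = Sigma[:m, m:]
Sigma_xx = Sigma[m:, m:]

# Population regression coefficients and conditional covariance, Eq. (2)
beta_t = Sigma_yx @ np.linalg.inv(Sigma_xx)           # m x p
Sigma_y_given_x = Sigma_yy - beta_t @ Sigma_yx.T      # m x m

# Monte Carlo check: simulate from the joint distribution (zero means)
# and estimate beta^t by ordinary least squares
n = 200_000
yx = rng.multivariate_normal(np.zeros(m + p), Sigma, size=n)
y, x = yx[:, :m], yx[:, m:]
beta_hat = np.linalg.lstsq(x, y, rcond=None)[0].T     # OLS estimate of beta^t
assert np.allclose(beta_hat, beta_t, atol=0.05)
```

With \(n\) large, the sampling error in `beta_hat` is small, so the agreement with the closed-form `beta_t` illustrates that (2) is the conditional-mean model implied by (1).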

In a model like (2), we assume that the variation in the response \(\mathbf{y}\) is partly explained by the predictor \(\mathbf{x}\). However, in many situations, only a subspace of the predictor space is relevant for the variation in the response \(\mathbf{y}\). This space can be referred to as the relevant space of \(\mathbf{x}\), and the rest as the irrelevant space. In a similar way, for a given model, we can assume that a subspace of the response space exists that contains the information the relevant predictor space can explain (Figure 1). Cook, Li, and Chiaromonte (2010) and Cook and Zhang (2015) have referred to the relevant space as the material space and the irrelevant space as the immaterial space.

Figure 1: Relevant space in a regression model

With an orthogonal transformation of \(\mathbf{y}\) and \(\mathbf{x}\) to latent variables \(\mathbf{w}\) and \(\mathbf{z}\), respectively, by \(\mathbf{w=Qy}\) and \(\mathbf{z = Rx}\), where \(\mathbf{Q}\) and \(\mathbf{R}\) are orthogonal rotation matrices, an equivalent model to (1) in terms of the latent variables can be written as,

\[\begin{equation} \begin{bmatrix} \mathbf{w} \\ \mathbf{z} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \boldsymbol{\mu}_w \\ \boldsymbol{\mu}_z \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{ww} & \boldsymbol{\Sigma}_{wz} \\ \boldsymbol{\Sigma}_{zw} & \boldsymbol{\Sigma}_{zz} \end{bmatrix} \right) \tag{3} \end{equation}\]

where \(\boldsymbol{\Sigma}_{ww}\) and \(\boldsymbol{\Sigma}_{zz}\) are the variance-covariance matrices of \(\mathbf{w}\) and \(\mathbf{z}\), respectively, \(\boldsymbol{\Sigma}_{zw}\) is the covariance between \(\mathbf{z}\) and \(\mathbf{w}\), and \(\boldsymbol{\mu}_w\) and \(\boldsymbol{\mu}_z\) are the mean vectors of \(\mathbf{w}\) and \(\mathbf{z}\), respectively.

Here, the elements of \(\mathbf{w}\) and \(\mathbf{z}\) are the principal components of the responses and the predictors, which will be referred to as “response components” and “predictor components”, respectively. The column vectors of the respective rotation matrices \(\mathbf{Q}\) and \(\mathbf{R}\) are the eigenvectors corresponding to these principal components. We can write a linear model based on (3) as,

\[\begin{equation} \mathbf{w} = \boldsymbol{\mu}_w + \boldsymbol{\alpha}^t(\mathbf{z} - \boldsymbol{\mu}_z) + \boldsymbol{\tau} \tag{4} \end{equation}\] where \(\underset{m\times p}{\boldsymbol{\alpha}^t}\) is a matrix of regression coefficients and \(\boldsymbol{\tau}\) is an error term such that \(\boldsymbol{\tau} \sim \mathcal{N}(0, \boldsymbol{\Sigma}_{w|z})\).
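The rotations behind (3) and (4) can be illustrated numerically. In the Python sketch below (all data are hypothetical; `numpy` returns eigenvectors as columns, so each observation is rotated by \(\mathbf{Q}^t\) and \(\mathbf{R}^t\)), the sample covariance matrices of the rotated variables come out diagonal, i.e. the response and predictor components are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample: n observations of m responses and p predictors,
# mixed to induce correlation among the variables
n, m, p = 5000, 2, 3
y = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
x = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Eigendecomposition of the sample covariance matrices; the columns of
# Q and R are the eigenvectors (the rotation matrices of the text)
_, Q = np.linalg.eigh(np.cov(y, rowvar=False))
_, R = np.linalg.eigh(np.cov(x, rowvar=False))

# Rotate each observation: w_i = Q^t y_i, z_i = R^t x_i
w = y @ Q
z = x @ R

# The rotated variables are uncorrelated: their sample covariance
# matrices are diagonal (the eigenvalues), up to floating-point error
Sigma_ww = np.cov(w, rowvar=False)
Sigma_zz = np.cov(z, rowvar=False)
assert np.allclose(Sigma_ww, np.diag(np.diag(Sigma_ww)))
assert np.allclose(Sigma_zz, np.diag(np.diag(Sigma_zz)))
```

Because \(\mathbf{Q}\) and \(\mathbf{R}\) are orthogonal, the transformation loses no information; it only re-expresses \(\mathbf{y}\) and \(\mathbf{x}\) in the component bases used by model (4).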

Following the concept of relevant space, a subset of the predictor components can be imagined to span the relevant space of the predictors. These components can be regarded as relevant predictor components. Naes and Martens (1985) introduced the concept of relevant components, which was explored further by Helland (1990), Næs and Helland (1993), Helland and Almøy (1994) and Helland (2000); the corresponding eigenvectors were referred to as relevant eigenvectors. A similar notion was introduced by Cook, Li, and Chiaromonte (2010) and later by Cook, Helland, and Su (2013) as an envelope, which is the space spanned by the relevant eigenvectors (Cook 2018, 101).

In addition, various simulation studies have been performed with models based on the concept of a relevant subspace. A simulation study by Almøy (1996) used a single-response simulation model based on reduced regression to compare some contemporary multivariate estimators. In recent years, Helland, Saebø, and Tjelmeland (2012), Sæbø, Almøy, and Helland (2015), Helland et al. (2018) and Rimal, Almøy, and Sæbø (2018) implemented simulation examples similar to those we discuss in this study. This paper, however, presents an elaborate comparison of prediction using multi-response simulated linear model data. The properties of the simulated data are varied through different levels of the simulation parameters based on an experimental design. Rimal, Almøy, and Sæbø (2018) provide a detailed discussion of the simulation model we have adopted here. The following section presents the estimators being compared in more detail.