Regression/classification using LPLS and cross-validation, with possible jackknife variable selection and optional refitting of the model to the selected variables.

lplsReg.cv(X1, X2, X3, npc.sel = 1:5, alphavek = seq(0, 1, by = 0.2),
  npc.ref = NULL, testlevel = 0.05, dreduce = F, colcent = c(T, T),
  rowcent = c(F, F), grandcent = c(F, F), folds, err.eval.type = "rate",
  cvreport = TRUE)

Arguments

X1

A response vector or matrix for regression. For classification this should be either a factor or a dummy-coded 0/1 matrix with one column per group.
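The two accepted forms of a classification response carry the same information. Below is a minimal base-R sketch of the conversion from a factor to a dummy-coded matrix. The labels are made up, and whether lplsReg.cv performs exactly this conversion internally is an assumption:

```r
# Hypothetical class labels for six samples in three groups
y <- factor(c("A", "B", "C", "A", "B", "C"))

# One 0/1 column per level of y; each row has a single 1 in the column of its class
X1.dummy <- sapply(levels(y), function(lev) as.integer(y == lev))
X1.dummy
```

model.matrix(~ y - 1) produces the same 0/1 columns, with the column names prefixed by y.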

X2

Predictor matrix of size (n x p).

X3

Background information matrix of size (m x p), whose p columns correspond to the p variables (columns) of X2.

npc.sel

A vector of numbers of components to be tested in the initial LPLS model (based on all variables) in the inner CV loop. Default is 1:5.

alphavek

A vector of alpha values to be tested in the initial LPLS model (based on all variables) in the inner CV loop. Default is seq(0, 1, by = 0.2), i.e. the six values 0, 0.2, ..., 1. See lplsReg for details on alpha.
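The outputs below are reported "for all levels of components and alpha", which suggests that every combination of npc.sel and alphavek is evaluated. Under that assumption, this sketch counts the candidate initial models implied by the defaults in the usage line:

```r
# Default tuning values from the usage line
npc.sel  <- 1:5
alphavek <- seq(0, 1, by = 0.2)

# Each (components, alpha) pair is assumed to be one candidate initial model
grid <- expand.grid(npc = npc.sel, alpha = alphavek)
nrow(grid)  # 30
```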

npc.ref

A vector of numbers of components to be tested in the refitted LPLS model (based on the selected variables) in the inner CV loop. Default is NULL, which means no refitting.

testlevel

Significance level for the jackknife testing of the variables. Default is 0.05.

dreduce

Logical. Should the variable selection applied to the columns of X3 (parallel to X2) also be applied to the rows of X3? This only makes sense if X3 is a (p x p) matrix expressing some dependency or similarity between the variables in X2, that is, when both the rows and the columns of X3 relate to the variables of X2.
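The difference between the two settings comes down to matrix subsetting. Below is a sketch with a made-up (p x p) similarity matrix and a made-up selection vector sel. Both are hypothetical, and the package's internal handling is assumed rather than shown:

```r
# Hypothetical p x p similarity matrix between the p = 4 variables of X2
p  <- 4
X3 <- diag(p) + 0.5 * (1 - diag(p))   # 1 on the diagonal, 0.5 elsewhere

# Hypothetical selection result: variables 1 and 3 kept
sel <- c(TRUE, FALSE, TRUE, FALSE)

X3.cols <- X3[, sel, drop = FALSE]    # dreduce = FALSE: only columns reduced (4 x 2)
X3.both <- X3[sel, sel, drop = FALSE] # dreduce = TRUE: rows and columns reduced (2 x 2)
dim(X3.cols)
dim(X3.both)
```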

colcent

Logical vector of length 2, referring to X2 and X3, respectively. Should column centering be performed?

rowcent

Logical vector of length 2, referring to X2 and X3, respectively. Should row centering be performed?

grandcent

Logical vector of length 2, referring to X2 and X3, respectively. Should overall (grand mean) centering be performed?
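The three centering options can be sketched for a single matrix with base R's sweep. center_matrix is a hypothetical helper, not a function of this package. This help page does not state the order in which lplsReg.cv applies the steps, so the order grand, column, row used here is an assumption:

```r
# Hypothetical helper; the order of the three steps is an assumption
center_matrix <- function(M, colcent = TRUE, rowcent = FALSE, grandcent = FALSE) {
  if (grandcent) M <- M - mean(M)               # subtract the overall mean
  if (colcent)   M <- sweep(M, 2, colMeans(M))  # subtract each column mean
  if (rowcent)   M <- sweep(M, 1, rowMeans(M))  # subtract each row mean
  M
}

set.seed(1)
X2 <- matrix(rnorm(12, mean = 5), nrow = 4)

Xc <- center_matrix(X2)                 # column centering only
Xd <- center_matrix(X2, rowcent = TRUE) # column centering, then row centering
```

With both column and row centering the result is double-centered: all row means and all column means are zero.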

folds

A list of length k, where each element holds the sample numbers in one fold of k-fold cross-validation. balanced.folds may be used to create this list before the analysis.
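The expected structure is a plain list of integer index vectors. The sketch below builds such a list by hand, stratified by class. make_folds is a hypothetical helper that only illustrates the format; it is not the implementation of balanced.folds:

```r
# Hypothetical helper: stratified k-fold index list
make_folds <- function(y, k) {
  y <- as.factor(y)
  fold.id <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    idx <- idx[sample.int(length(idx))]               # shuffle within the class
    fold.id[idx] <- rep_len(seq_len(k), length(idx))  # deal the class out over the folds
  }
  split(seq_along(y), factor(fold.id, levels = seq_len(k)))  # list of k index vectors
}

set.seed(42)
y    <- factor(rep(c("A", "B"), times = c(12, 8)))
segs <- make_folds(y, 5)
```

Every sample appears in exactly one fold, and each class is spread over all folds.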

err.eval.type

The evaluation criterion for prediction/classification performance: either "rate" (total error rate), "rmsep" (root mean squared error of prediction), or "rmsep2", a modified rmsep where only predictions between 0 and 1 contribute to the error; predictions outside this range are treated as perfect predictions.
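The three criteria are sketched below for a dummy-coded response and its predictions. The function names are hypothetical, and several details are assumptions. The predicted class is taken as the column with the largest prediction. Errors are averaged over all n x g entries. rmsep2 follows the description literally, so out-of-range predictions contribute zero error:

```r
# Y: dummy-coded 0/1 response (n x g); Yhat: predictions of the same size

err_rate <- function(Y, Yhat) {
  # predicted class = column with the largest prediction
  mean(max.col(Y, ties.method = "first") != max.col(Yhat, ties.method = "first"))
}

err_rmsep <- function(Y, Yhat) sqrt(mean((Y - Yhat)^2))

err_rmsep2 <- function(Y, Yhat) {
  res <- Y - Yhat
  res[Yhat < 0 | Yhat > 1] <- 0   # literal reading: out-of-range predictions count as perfect
  sqrt(mean(res^2))
}

Y    <- rbind(c(1, 0), c(0, 1), c(1, 0), c(0, 1))
Yhat <- rbind(c(0.8, 0.2), c(0.6, 0.4), c(1.2, -0.2), c(0.1, 0.9))

err_rate(Y, Yhat)    # 0.25: only sample 2 is misclassified
err_rmsep(Y, Yhat)
err_rmsep2(Y, Yhat)  # smaller, since sample 3 no longer contributes
```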

cvreport

Logical. Should an iteration report be printed on screen during the computations?

Value

X1hatmat

An array holding the predicted X1 values for each number of components (initial and refitted model) and each alpha value.

folds

The CV segments used.

coefs.all

An array holding all estimated regression coefficients for all numbers of components (initial model) and alpha values.

sdcoef

The standard deviations of the regression coefficients.

trueclass

For classification: the true class of each sample.

pval

The p-values from the jackknife testing of each regression coefficient, for all numbers of components and alpha values.
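This help page does not spell out the test. One common construction is sketched below as an assumption. It uses the standard jackknife variance across the k segment fits, forms a t-statistic for each coefficient, and takes two-sided p-values from a t distribution with k - 1 degrees of freedom. Comparing these with testlevel gives the significant variables. jackknife_test is hypothetical, and all numbers are made up. Some jackknife implementations for PLS center the deviations on the full-model coefficients instead of on the mean of the segment estimates:

```r
# Hypothetical helper: jackknife t-test of regression coefficients
jackknife_test <- function(b.full, B.seg, testlevel = 0.05) {
  k      <- nrow(B.seg)                          # number of CV segments
  dev    <- sweep(B.seg, 2, colMeans(B.seg))     # segment estimates minus their mean
  sdcoef <- sqrt((k - 1) / k * colSums(dev^2))   # jackknife standard deviation
  tval   <- b.full / sdcoef
  pval   <- 2 * pt(-abs(tval), df = k - 1)       # two-sided p-value
  list(sdcoef = sdcoef, pval = pval, sigvars = pval < testlevel)
}

set.seed(7)
k      <- 5
b.full <- c(2, 0, -1.5, 0)                       # made-up full-model coefficients
B.seg  <- t(replicate(k, b.full + rnorm(length(b.full), sd = 0.1)))  # made-up segment fits (k x p)

jk <- jackknife_test(b.full, B.seg)
jk$sigvars  # TRUE FALSE TRUE FALSE
```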

apost

For classification: the posterior probability of each sample belonging to each class.

class

For classification: the predicted class of each sample for all numbers of components and alpha values.
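This page does not state the rule linking apost and class. A natural choice, assumed here, is the maximum a posteriori rule: each sample gets the class with the largest posterior probability. Below is a sketch with a made-up posterior matrix:

```r
# Made-up posterior probabilities for three samples and three classes
apost <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6),
               c(0.4, 0.5, 0.1))
colnames(apost) <- c("A", "B", "C")

# Class with the largest posterior probability in each row
pred.class <- factor(colnames(apost)[max.col(apost, ties.method = "first")],
                     levels = colnames(apost))
as.character(pred.class)  # "A" "C" "B"
```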

err

The total error (as defined by the argument err.eval.type) for all numbers of components and alpha values.

sigvars

An array of logicals indicating whether each variable is found to be significant. Significance is given for all numbers of components and alpha values.

Examples

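data(BCdata)
# 5-fold CV segments, balanced with respect to the classes in BCdata$Y
segs <- balanced.folds(BCdata$Y, 5)
# Classification with a factor response: X2 = BCdata$X, X3 = BCdata$Z
fit.cv <- lplsReg.cv(factor(BCdata$Y), BCdata$X, BCdata$Z, folds = segs)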
#> Segment 1 of 5 completed
#> Segment 2 of 5 completed
#> Segment 3 of 5 completed
#> Segment 4 of 5 completed
#> Segment 5 of 5 completed