============ Description: ============ This is the driver function to perform Functional Linear Regression (FLM), which includes functional regression diagnostics and a boostrap test for the overall significance of the functional regression, where the predictor is a function X(t_x) and the response can be either a function Y(t_y) or a scalar. Reference: [9] This function includes the following steps: 1) FPCA using the PACE method for X(t_x) and/or Y(t_y). 2) Computation of the regression parameter function \beta. 3) Prediction of response functions, given observations of predictor trajectories. 4) Computation of functional R-square and Quasi R-square (a simple predictive measure based on the concept of variance explained by the regression). 5) Functional regression diagnostics (described in Chiou, J.M., M\"uller, H.G. (2007). Diagnostics for functional regression via residual processes. Computational Statistics and Data Analysis 51, 4849-4863). 6) Boostrap test for regression effect, for the null hypothesis that the functional R-square equals zero (testing for overall functional regression effect, as described in M\"uller, H.G., Chiou, J.M., Leng, X. (2008). Inferring gene expression dynamics via functional regression analysis. BMC Bioinformatics 9:60, See Reference [13]). ======== Usage: ======== [res xx yy h d rej pv cb grid_x grid_y] = FPCdiag(x, t_x, y, t_y,... param_X, param_Y, FIT, K_x, K_y, isNewSub, bootstrap, nsim, alpha) ======= Input: ======= x : 1*n cell array for predictor function x, where x{i} is the row vector of measurements for the ith subject, i=1,...,n. It contains those subjects that are used for prediction, when "isNewSub" is defined as either a vector of 0's and 1's or as a positive integer. Then, nn is the number of subjects to be used for estimation and n-nn is the number of subjects to be used for prediction. When "isNewSub" is set to [], all n subjects are used for estimation and no prediction will be calculated. See "isNewSub". t_x : 1*n cell array, t_x{i} is the row vector of time points for the ith subject at which the corresponding measurements x{i} are taken, i=1,...,n. It contains subjects that are used for prediction. See above for two different cases of "isNewSub" and the definition of "isNewSub" for more details. y : i) When no prediction is requested, that is, isNewSub = [] or isNewSub = 0: 1*n cell array for the case of a response function y, where y{i} is the response row vector for the ith subject, i = 1,...,n or a 1*n vector in the case of a scalar response y, where y(i) is the response value for the ith subject, i = 1,...,n. ii) When prediction is requested, that is isNewSub is either a vector of 0's and 1's or a positive integer, then it is a cell array of dim 1*nn or a vector of dim 1*nn, where nn is the number of subjects used for estimation. iii) When prediction is requested and y is of dim 1*n, it will be truncated to dim 1*nn, according to the isNewSub definition; see "isNewSub" for more details. t_y : 1*n cell array, t_y{i} is the row vector of time points for the ith subject on which corresponding measurements y{i} are taken, i = 1,...,n, or [] if y is a scalar response. See above for the two different options for "isNewSub", and the definition of "isNewSub" for more details. param_X: an object that is returned by setOptions() and defines the input arguments for the FPCA() of the X (predictor) functions (for default, set param_X = []). The default method of choosing the number of principal components is 'FVE'. For the other defaults, see setOptions(). param_Y: an object that is returned by setOptions(), and defines the input arguments for the FPCA() of the Y (response) functions (for default, set param_Y = []). The default method of choosing the number of principal components for the response trajectories is 'BIC1'. See setOptions() for more details. When y is a scalar, this object will be ignored. K_x: positive integer, normally not needed. Number of principal components of predictor x used for functional regression, must be smaller than or equal to the maximum number of principal components available from FPCA. Default K_x = [], then K_x is the number of components selected by FPCA.m for predictor functions. K_y: positive integer, normally not needed. This has no effect in the the case of a scalar response. For functional response, it is the number of principal components for the response y used for functional regression, and it must be smaller than or equal to the maximum number of principal components available from FPCA. Default K_y = []. FIT: an indicator with values 0 or -1: FIT = 0: Fitting a functional linear regression (default) through decomposition into simple linear regressions through the origin between the FPC scores of Y (responses) and those of X (predictors) (denoted as 'YPC' in the following). FIT = -1: Prediction with a functional linear regression with a functional response Y, specifically adapted to the case of sparsely sampled response trajectories and using a linear regression of observed responses Y on FPC scores of X, applying weighted least squares with the covariance surface of Y as the weight matrix (denoted as 'YCO' in the following). Note: The option FIT = -1 aims at optimal prediction for sparsely observed functional responses (i.e. when Y is sparse and functional), and does not target the regression parameter surface \beta. isNewSub: i) 1*n vector of 0s or 1s, with elements: 1 if the data for the corresponding subject, i.e., x(isNewSub == 1), t_x(isNewSub==1), t_y(isNewSub == 1) are used for prediction; There will be n-nn subjects with isNewSub = 1. 0 if the data for the corresponding subject, i.e., x(isNewSub == 0), t_x(isNewSub == 0), y and t_y(isNewSub == 0) are used for estimation. There will be nn subjects with isNewSub = 0 Note: This option is convenient to compute leave-one-out prediction if desired. ii) If this is a positive integer, say 5, then the last 5 subjects from the arrays x, t_x, t_y are used for prediction. In other words, when choosing this option, append the new subjects to the end of x, t_x and t_y. Then the subjects with data in the first nn rows of arrays x, t_x,t_y and all y (of length nn) are used for estimation and the subjects with data in the remaining rows of x,t_x and t_y (of length n-nn) are used for prediction. Example: isNewSub = 5; n = length(x) x(1:(n-5)), t_x(1:(n-5)), y, t_y(1:(n-5)) are used for estimation, while x((n-4):n), t_x((n-4):n) and t_y((n-4):n) are used for prediction This option is convenient to perform prediction for a large number of new subjects. iii) set isNewSub = [] for the default value, i.e., no prediction. Note: o nn > n - nn, and should not be too small. In other words, the number of subjects used for estimation should be larger than the number of subjects used for prediction. o When no prediction is requested, x, t_x, y, t_y all will be of length of n = nn. o When prediction is requested, x, t_x, t_y will be of length of n, which includes nn subjects for estimation and n-nn subjects for prediction. Here y is always of length nn (< n). o When prediction is requested and y is of length n (> nn), a warning message will be issued and y will be truncated to length nn. This assumes that only nn among the n available data for y are used for estimation and the remaining n-nn data will be ignored. bootstrap: Bootstrap method. If bootstrap = 'pre', confidence bands for beta and R2test are based on bootstrapping by resampling from the original subject data, which is more accurate but computationally intensive and thus very slow. The default [] bootstrap is based on resampling from the estimated functional principal scores. The variability coming from both measurement errors and errors in estimating mean and eigenfunctions is neglected in this variant, which is therefore less accurate, but much faster. nsim: size of the bootstrap sample bootstrap = 'pre' ==> default is nsim = 200 bootstrap = 'post' ==> default is nsim = 500 alpha: level of hypothesis tests and confidence bands, optional; pointwise 100(1-alpha)% confidence bands will be created for the regression parameter function/surface. The null hypothesis of no regression relation is tested at level alpha, based on bootstrapping the functonal R-square under the null hypothesis. The default value for alpha is 0.05. Details: i) There are no default values for the first 3 arguments, that is, these data must be provided as part of the input arguments. Note t_y can be set to [] when y is a scalar response. ii) Any unused or optional arguments can be set to "[]" to request default values; iii) FPCA() calls PCA(); setOptions() sets the input arguments for PCA() and the returned object contains all values returned by PCA(); iv) Names of objects can be retrieved through the names() function, i.e., names(xx), names(yy), names(res) and their actual values can be retrieved through the getVal() function, example: getVal(res, 'BETA'), getVal(res, 'newy') etc. v) When isNewSub is set to be [], no prediction will be performed, and the number of subjects in x is the same as the number of subjects in y; when isNewSub is either a vector of 0's and 1's or a positive integer, then the number of subjects in x is larger than the number of subjects of y, since the former contains new subjects for which no response data are used and prediction of responses is requested. Any such data will not be used in the model estimation process. By default, no prediction will be performed. ======= Output: ======= res: an aggregated object that contains BETA, b, newx, newtx, newy, new_ty, r2, cov_xy, sigma_xy. 1) BETA: an object that contains beta, grid_x and grid_y. (See Reference [3]) i) beta: a ngrid_x*ngrid_y matrix; the estimated regression parameter surface or function, evaluated at given grid points. ii) grid_x : 1*ngrid_x vector, contains ngrid distinct time points for the x function (predictor function) where ngrid is defined in setOptions(). iii) grid_y : 1*ngrid_y vector, contains ngrid distinct time points for the y function (response function), where ngrid is defined in setOptions() when y is a scalar, it is []. 2) b: a K_x*K_y matrix of estimated coefficients for the regression parameter function in the eigenbasis representation, where K_x is the number of FPC for x and K_y is the number of FPC for y; alternatively interpreted as regression slope coefficients of the response FPC scores on predictor FPC scores. 3) newx: 1*numNewSub cell array contains measurements for new x (predictor) functions. 4)new_tx: 1*numNewSub cell array containing time points corresponding to newx. 5) newy: 1*numNewSub cell array containing predicted measurements for corresponding newx. 6) new_ty: 1*numNewSub cell array containing time points corresponding to newy. 7) r2: functional R-square. (See Reference [3]) 8) Q: Quasi R-square; a measure of the fraction of variance of the responses that is explained by the regression: (See Reference [20] Eq. (20)). If Y is a scalar, Q = 1-sum((Y_i-Yhat_i)^2)/sum((Y_i-mean(Y))^2). If Y is a function, Q=1-sum((Y_i-Yhat_i)'*(Y_i-Yhat_i)/n_i)/ sum((Y_i-mean(Y))'*(Y_i-mean(Y))/n_i). xx: an aggregated object that contains the returned values from FPCA(x,t_x,param_X). See PCA() or pcaHELP.txt for more details. Use getVal() to retrieve the individual values. yy: an aggregated object that contains the returned values from FPCA(y,t_y,param_Y) or returns [] if y is a scalar response (see PCA() or pcaHELP.txt for more details). Use getVal() to retrieve the individual values. h: a n*n matrix; estimated functional hat matrix. (See Reference [9]) d: a 1*n matrix; estimated functional Cook's distances. rej: 1=reject the null hypothesis of no regression relation; 0 = do not reject (depending on alpha). pv: p-value for the null hypothesis of no regression relation. cb: a ngrid_x*ngrid_y*2 matrix; cb(:,:,1) is the estimated lower confidence band at given grid points; cb(:,:,2) is the estimated upper confidence band at given grid points; grid_x: 1*ngrid_x vector, contains ngrid distinct time points for the x (predictor) function, where ngrid is defined in setOptions(). grid_y: 1*ngrid_y vector, contains ngrid distinct time points for the y (response) function, where ngrid is defined in setOptions(), or returns [] if y is a scalar. See o example0.m for an example of sparse irregular data, where both predictors and responses are functional o example2.m for an example of regular data, where both predictors and responses are functional o example_scal for an example of regular data, where the predictor is functional and the response is scalar. See also FPCA, FPCreg, getDiag, getCB and Rtest Diagnostics of the fitted functional linear regression model is available only for the case fit = 0.