function anfactpcwod(X)
%ANFACTPCWOD Factor Analysis by the Principal Components Method Without Data.
% This m-file computes the principal component solution of the factor model from the
% R-correlation matrix (without the matrix of data), applies the latent root criterion, and
% optionally performs a varimax factor rotation. Optionally, it also gives the residual matrix,
% which is the difference between the original correlation matrix and the correlation structure
% implied by the factor model.
% The purpose of Factor Analysis is to describe, as far as possible, the covariance relationships
% among many variables in terms of a few underlying and unobservable random quantities
% called factors. It can be considered an extension of Principal Components Analysis,
% but its approximation is more elaborate. The factor model postulates that the observable
% random vector X with p components is linearly dependent upon a few unobservable random
% variables F_1,F_2,...,F_m, called common factors, and p additional sources of variation
% e_1,e_2,...,e_p, called errors or specific factors (each e_i is unique to its particular
% X_i and not shared by the other X's). So, the model is:
%
%        X_1 = b_11*F_1 + b_12*F_2 + ... + b_1m*F_m + e_1
%        X_2 = b_21*F_1 + b_22*F_2 + ... + b_2m*F_m + e_2
%         .                                             .
%         :                                             :
%        X_p = b_p1*F_1 + b_p2*F_2 + ... + b_pm*F_m + e_p
%
% In matrix notation it can be written as,
%        X = B*F + e
% where B is the p-by-m matrix whose (i,j) entry is the loading of the i-th variable on the
% j-th factor, and F and e are, respectively, the random vectors of the common factors and
% errors. The value m is the number of factors, specified ahead of time to complete the model
% and selected by a specific criterion such as the latent root, a priori, percentage of
% variance or scree test. The latent root criterion can be used as a guideline for a first
% attempt or as a definitive selection of the number of factors: it must be less than the
% number p of components (variables).
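%
% As a brief sketch of what the model implies for the correlation matrix, assume (as is
% usual, although it is not stated above) that the variables are standardized, that the
% common factors are uncorrelated with unit variance, and that they are uncorrelated with
% the errors. Writing Psi = diag(psi_1,...,psi_p) for the specific variances (a symbol
% introduced here only for illustration), the model then gives
%
%        R  ~  B*B' + Psi
%
% so that, for the i-th variable,
%
%        h_i^2 = b_i1^2 + b_i2^2 + ... + b_im^2      (communality)
%        psi_i = 1 - h_i^2                           (specific variance)
%
% In the principal component solution computed by this file, the loadings are taken from
% the first m eigenvalue/eigenvector pairs (l_j, a_j) of R, b_ij = sqrt(l_j)*a_ij, and the
% residual matrix is R - (B*B' + Psi), whose diagonal is therefore zero.
%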
% According to Rencher (2002), there are four approaches to estimation of the loadings and
% communalities: (1) Principal Component Method; (2) Principal Factor Method; (3) Iterated Principal
% Factor Method, and (4) Maximum Likelihood Method. The two most popular methods of parameter
% estimation are the principal component and the maximum likelihood method. The solution from
% either method can be rotated in order to simplify the interpretation of factors. It is always
% prudent to try more than one method of solution.
% Factor Analysis can be used (1) to minimize the number of variables for further research
% while maximizing the amount of information in the analysis (the smaller set can be used as
% operational representatives of the constructs underlying the complete set of variables),
% (2) to search data for possible qualitative and quantitative distinctions, which is
% particularly useful when the sheer amount of available data exceeds comprehensibility, and
% (3) to test a hypothesis that the domain of data has certain qualitative and quantitative
% distinctions.
%
% Syntax: anfactpcwod(X)
%
% Input:
%    X - Correlation matrix. A covariance matrix may also be given: the input is
%        standardized internally, so the procedure always works on the corresponding
%        R-correlation matrix.
% Outputs:
%    Complete Factor Analysis Results such as:
%    - Table of the Extraction of Components.
%    - Table of Unrotated Principal Components of the Factor Analysis.
%    - Proportion of Total (standardized) Sample Variance.
%    Optionally:
%    - Table of Varimax Rotated Principal Components of the Factor Analysis.
%    - Table of Cumulative Proportion of Total (standardized) Sample Variance.
%    - Residual Matrix.
%
% Example: From the example 13.5.3 of Rencher (2002, p. 437).
% We take the correlation matrix
% from head measurements on first and second adult sons in a sample of 25 families,
% where we are interested in extracting, if possible, groups of highly correlated variables
% that represent a single underlying construct or factor.
%
%          [1.000  0.735  0.711  0.704]
%          [0.735  1.000  0.693  0.709]
%      R = [0.711  0.693  1.000  0.839]
%          [0.704  0.709  0.839  1.000]
%
% Input matrix must be:
%  X=[1.000 0.735 0.711 0.704;0.735 1.000 0.693 0.709;
%  0.711 0.693 1.000 0.839;0.704 0.709 0.839 1.000];
%
% Calling the function in MATLAB:
%     anfactpcwod(X)
%
% Answer is:
%
% Table of the Extraction of Components.
% -------------------------------------------------------------
%                           Percent of       Cumulative
% Factors   Eigenvalue        Variance  Percent of Variance
% -------------------------------------------------------------
%     1         3.1965         79.9118          79.9118
%     2         0.3778          9.4446          89.3564
%     3         0.2660          6.6510          96.0074
%     4         0.1597          3.9926         100.0000
% -------------------------------------------------------------
%
% By the latent root criterion the number of factors suggested is 1
%
% Do you need to work with this number of factors? (y/n): n
% Give me the number of factors you need: 2
%
% Table of Unrotated Principal Components of the Factor Analysis.
% -----------------------------------------------------------------------------
% Factors =
%    -0.8793    0.2945    0.8599
%    -0.8753    0.3308    0.8755
%    -0.9090   -0.3087    0.9216
%    -0.9116   -0.2938    0.9173
% -----------------------------------------------------------------------------
% On factors, Factor 1 = column 1 and so forth to 2
% The last column is the communality
% On variates, Variate 1 = first row and so forth to 4
%
% Proportion of Total (standardized) Sample Variance.
% -----------------------------------------------------------------------------
% pp =
%     0.7991    0.0944    0.8936
% -----------------------------------------------------------------------------
%
% Do you want to do a varimax factor rotation? (y/n): y
%
% Table of Varimax Rotated Principal Components of the Factor Analysis.
% -----------------------------------------------------------------------------
% Factors =
%    -0.4234    0.8250    0.8599
%    -0.3951    0.8482    0.8755
%    -0.8660    0.4142    0.9216
%    -0.8575    0.4266    0.9173
% -----------------------------------------------------------------------------
% On factors, Factor 1 = column 1 and so forth to 2
% The last column is the communality
% On variates, Variate 1 = first row and so forth to 4
%
% Table of Cumulative Proportion of Total (standardized) Sample Variance.
% -----------------------------------------------------------------------------
% pp =
%     0.4552    0.4384    0.8936
% -----------------------------------------------------------------------------
%
% Do you need to output the residual matrix? (y/n): y
%
% Residual matrix:
%
% Rm =
%          0   -0.1320    0.0027   -0.0110
%    -0.1320         0   -0.0005    0.0083
%     0.0027   -0.0005         0   -0.0803
%    -0.0110    0.0083   -0.0803         0
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez and M. Rodriguez-Ceja
%            Facultad de Ciencias Marinas
%            Universidad Autonoma de Baja California
%            Apdo. Postal 453
%            Ensenada, Baja California
%            Mexico.
%            atrujo@uabc.mx
% And the special collaboration of the post-graduate students of the 2006:1
% Multivariate Statistics Course: A.L. Melendez-Sanchez, E. del-Angel-Bustos,
% M. Melo-Rosales, B. Vega-Rodriguez, C. Moreno-Medina, A. Ramirez-Valdez,
% J.P. D'Olivo-Cordero, L.D. Espinosa-Chaurand, and G.L. Beltran-Flores.
%
% Copyright. March 29, 2006.
%
% ---Special thanks are given to Lukáš Malec, Institute of Chemical Technology,
%    Faculty of Environmental Technology, Technická 5, 166 28 Prague 6, Czech Republic
%    (Lukas.Malec@vscht.cz) for encouraging us to create this m-file---
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez, M. Rodriguez-Ceja, A.L. Melendez-Sanchez,
% E. del-Angel-Bustos, M. Melo-Rosales, B. Vega-Rodriguez, C. Moreno-Medina, A.
% Ramirez-Valdez, J.P. D'Olivo-Cordero, L.D. Espinosa-Chaurand, and G.L. Beltran-Flores. (2006).
% ANFACTPCWOD: Factor Analysis by the Principal Components Method Without Data. A MATLAB file.
% [WWW document]. URL http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=10602
%
% References:
%
% Rencher, A. C. (2002), Methods of Multivariate Analysis. 2nd. ed.
%           New Jersey: John Wiley & Sons. Chapter 13 (pp. 408-450).
%

narginchk(1,1); %exactly one input argument is required

[r, c] = size(X);
if r ~= c
    error('Input matrix is not square.');
elseif ~all(all(X == X'))
    error('Input matrix is not symmetric.');
elseif any(diag(X) <= 0)
    %only the diagonal is checked here; this is necessary, not sufficient, for positive definiteness
    error('The covariance/correlation matrix must have a positive diagonal.');
end

%Standardize the X-input matrix so that the procedure always works on the R-correlation matrix.
dX = diag(X);
D = 1./sqrt(dX);
R = X.*(D*D');

p = r;
[A, L] = svd(R,0); %for a symmetric positive semidefinite R the singular values are its eigenvalues
l = diag(L); %eigenvalues vector
P = (l/sum(l))*100; %percent of variance
CP = cumsum(P); %cumulative percent of variance
ft = (1:p)';

disp(' ')
disp('Table of the Extraction of Components.')
fprintf('-------------------------------------------------------------\n');
disp('                          Percent of       Cumulative');
disp('Factors   Eigenvalue        Variance  Percent of Variance');
fprintf('-------------------------------------------------------------\n');
fprintf('  %3d     %10.4f      %10.4f       %10.4f\n',[ft,l,P,CP].');
fprintf('-------------------------------------------------------------\n');

B = A*sqrt(L); %loadings of all p components
m = sum(l >= 1.0); %extraction of components (latent root criterion)
disp(' ');
fprintf('By the latent root criterion the number of factors suggested is %d\n', m);
disp(' ');
ask = input('Do you need to work with this number of factors? (y/n): ','s');
if ~strcmp(ask,'y')
    m = input('Give me the number of factors you need: ');
    %the request must be a whole number that can index the columns of B
    while ~isscalar(m) || m < 1 || m > p || m ~= fix(m)
        disp(' ');
        fprintf('Error: The number of factors must be a whole number from 1 to %d (the number of observed variables).\n', p);
        disp(' ');
        m = input('Give me the number of factors you need: ');
    end
end

F = B(:,1:m); %matrix loadings of the selected unrotated factors
pt = sum(F.^2,1)/p; %proportion of total variance explained by each factor
pp = [pt sum(pt)];

%Communality estimation
C = sum(F.^2,2);
Factors = [F C];

disp(' ')
disp('Table of Unrotated Principal Components of the Factor Analysis.')
disp('-----------------------------------------------------------------------------');
Factors %no semicolon on purpose: prints the loadings and communalities
fprintf('-----------------------------------------------------------------------------\n');
fprintf('On factors, Factor 1 = column 1 and so forth to %d\n', m);
disp('The last column is the communality');
fprintf('On variates, Variate 1 = first row and so forth to %d\n', p);
disp(' ')
disp('Proportion of Total (standardized) Sample Variance.');
disp('-----------------------------------------------------------------------------');
pp %no semicolon on purpose
fprintf('-----------------------------------------------------------------------------\n');

sv = 1 - C; %specific variance (specificity=error)
Re = F*F' + diag(sv); %correlation structure for the factor model
Rm = R - Re; %residual matrix

disp(' ');
rt = input('Do you want to do a varimax factor rotation? (y/n): ','s');
if strcmp(rt,'y')
    loadings = F;
    b = loadings;
    [n,nf] = size(loadings);
    hjsq = diag(loadings*loadings'); %communalities
    hj = sqrt(hjsq); %used to normalize each row of loadings before rotating
    for iter = 1:10 %fixed number of sweeps over all pairs of factors
        for i = 1:nf-1 %the rotation works on two factors at a time
            for j = i+1:nf
                xj = loadings(:,i)./hj;
                yj = loadings(:,j)./hj;
                uj = xj.*xj - yj.*yj;
                vj = 2*xj.*yj;
                %sums renamed from A, B, C, D so they do not overwrite the variables above
                sA = sum(uj);
                sB = sum(vj);
                sC = uj'*uj - vj'*vj;
                sD = 2*uj'*vj;
                num = sD - 2*sA*sB/n;
                den = sC - (sA^2 - sB^2)/n;
                phi = atan2(num,den)/4; %rotation angle in radians
                if abs(phi) > 0.00001
                    Xj = cos(phi)*xj + sin(phi)*yj;
                    Yj = -sin(phi)*xj + cos(phi)*yj;
                    b(:,i) = Xj.*hj;
                    b(:,j) = Yj.*hj;
                    loadings(:,i) = b(:,i);
                    loadings(:,j) = b(:,j);
                end
            end
        end
        loadings = b;
    end

    F = loadings;
    pt = sum(F.^2,1)/p;
    pp = [pt sum(pt)];

    %Communality estimation
    C = sum(F.^2,2);
    Factors = [F C];

    disp(' ')
    disp('Table of Varimax Rotated Principal Components of the Factor Analysis.')
    disp('-----------------------------------------------------------------------------');
    Factors %no semicolon on purpose
    fprintf('-----------------------------------------------------------------------------\n');
    fprintf('On factors, Factor 1 = column 1 and so forth to %d\n', m);
    disp('The last column is the communality');
    fprintf('On variates, Variate 1 = first row and so forth to %d\n', p);
    disp(' ')
    disp('Table of Cumulative Proportion of Total (standardized) Sample Variance.');
    disp('-----------------------------------------------------------------------------');
    pp %no semicolon on purpose
    fprintf('-----------------------------------------------------------------------------\n');

    sv = 1 - C; %specific variance (specificity=error)
    Re = F*F' + diag(sv); %correlation structure for the factor model
    Rm = R - Re; %residual matrix
end

disp(' ');
rm = input('Do you need to output the residual matrix? (y/n): ','s');
disp(' ');
if strcmp(rm,'y')
    disp('Residual matrix:');
    Rm %no semicolon on purpose
end

return