Factor Analysis (FA) and Principal Component Analysis (PCA)


1. Introduction

Factor Analysis (FA) and Principal Component Analysis (PCA) are multivariate statistical techniques used for data reduction, structure detection, and latent variable modeling. Both methods aim to summarize information contained in a large number of correlated variables into a smaller set of uncorrelated or less correlated variables, but their objectives, assumptions, and interpretations differ fundamentally.

PCA is primarily a descriptive technique that transforms observed variables into linear combinations called principal components. FA, on the other hand, is a model-based inferential technique that assumes observed variables are driven by a smaller number of unobserved latent factors.


2. Mathematical Preliminaries

Let

  • \(\mathbf{X} = (X_1, X_2, \dots, X_p)'\) be a vector of \(p\) observed variables
  • Mean vector: \(E(\mathbf{X}) = \boldsymbol{\mu}\)
  • Covariance matrix: \(\boldsymbol{\Sigma} = \text{Cov}(\mathbf{X})\)

For standardized variables, we work with the correlation matrix \(\mathbf{R}\).


3. Principal Component Analysis (PCA)

3.1 Objective of PCA

The objective of PCA is to find linear combinations of the observed variables that:

  1. Are mutually uncorrelated
  2. Capture maximum variance in descending order

The \(k\)-th principal component is given by

\[ Z_k = a_{k1}X_1 + a_{k2}X_2 + \cdots + a_{kp}X_p \]

subject to the normalization constraint

\[ \sum_{j=1}^p a_{kj}^2 = 1 \]

and the requirement that \(Z_k\) is uncorrelated with the previously extracted components \(Z_1, \dots, Z_{k-1}\).


3.2 Eigenvalue–Eigenvector Formulation

PCA is obtained by solving the eigenvalue problem:

\[ | \mathbf{R} - \lambda \mathbf{I} | = 0 \]

where

  • \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\) are the eigenvalues of \(\mathbf{R}\)
  • The coefficient vector \(\mathbf{a}_k = (a_{k1}, \dots, a_{kp})'\) of the \(k\)-th component is the eigenvector associated with \(\lambda_k\)
  • Each eigenvalue equals the variance of the corresponding component

The total variance is

\[ \sum_{j=1}^p \lambda_j = p \quad (\text{for standardized variables}) \]
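
As a concrete illustration, the following NumPy sketch (the data matrix X is simulated purely as a placeholder) standardizes the variables, eigendecomposes the correlation matrix, and checks that the resulting component scores are uncorrelated with variances equal to the eigenvalues, which sum to \(p\).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # placeholder correlated data

    # Standardize the variables and form the correlation matrix R
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)

    # Eigen-decomposition; eigh returns ascending eigenvalues, so reverse the order
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    scores = Z @ eigvecs                                       # principal component scores
    print(np.isclose(eigvals.sum(), R.shape[0]))               # total variance equals p
    print(np.round(np.cov(scores, rowvar=False), 3))           # ~diagonal matrix of eigenvalues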


3.3 Proportion of Variance Explained

The proportion of variance explained by the \(k\)-th component is

\[ \text{PV}_k = \frac{\lambda_k}{\sum_{j=1}^p \lambda_j} \]

Cumulative proportion:

\[ \text{CPV}_m = \sum_{k=1}^m \text{PV}_k \]
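
In code these are one-liners; a minimal sketch, using the approximate eigenvalues from the example in Section 3.4 below:

    import numpy as np

    eigvals = np.array([2.40, 0.41, 0.18])    # illustrative eigenvalues (see Section 3.4)
    pv = eigvals / eigvals.sum()              # proportion of variance per component
    cpv = np.cumsum(pv)                       # cumulative proportion
    print(np.round(pv, 3), np.round(cpv, 3))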


3.4 Example: PCA with Three Variables

Consider the correlation matrix

\[ \mathbf{R} = \begin{pmatrix} 1 & 0.8 & 0.6 \\ 0.8 & 1 & 0.7 \\ 0.6 & 0.7 & 1 \end{pmatrix} \]

Eigenvalues (to two decimal places):

\[ \lambda_1 \approx 2.40, \quad \lambda_2 \approx 0.41, \quad \lambda_3 \approx 0.18 \]

  • PC1 explains \(2.40/3 \approx 80.1\%\) of the variance
  • PC2 explains about \(13.8\%\)
  • PC3 explains about \(6.1\%\)

Thus, the first two components explain about 94% of the total variance.
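
These values can be verified directly with a short NumPy check of the example above:

    import numpy as np

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])

    eigvals = np.linalg.eigvalsh(R)[::-1]          # eigenvalues in descending order
    print(np.round(eigvals, 2))                    # approx. [2.40, 0.41, 0.18]
    print(np.round(eigvals / eigvals.sum(), 3))    # approx. [0.801, 0.138, 0.061]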


3.5 Interpretation of Component Loadings

Component loadings are given by

\[ \ell_{jk} = \sqrt{\lambda_k} \, a_{jk} \]

For standardized variables, \(\ell_{jk}\) is the correlation between variable \(X_j\) and component \(Z_k\).
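
As a quick check (continuing the NumPy sketches above), scaling each eigenvector by the square root of its eigenvalue gives the loading matrix, and each column of squared loadings sums to the corresponding eigenvalue:

    import numpy as np

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    loadings = eigvecs * np.sqrt(eigvals)            # column k scaled by sqrt(lambda_k)
    print(np.round(loadings, 2))
    print(np.round((loadings ** 2).sum(axis=0), 2))  # equals the eigenvalues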


4. Factor Analysis (FA)

4.1 Objective of Factor Analysis

FA seeks to explain correlations among observed variables in terms of a small number of latent factors.

The basic factor model is

\[ X_i - \mu_i = \ell_{i1}F_1 + \ell_{i2}F_2 + \cdots + \ell_{im}F_m + \varepsilon_i \]

where

  • \(F_1, \dots, F_m\) are common factors
  • \(\ell_{ij}\) are factor loadings
  • \(\varepsilon_i\) are unique factors

4.2 Matrix Form of the Factor Model

\[ \mathbf{X} = \boldsymbol{\mu} + \mathbf{L}\mathbf{F} + \boldsymbol{\varepsilon} \]

Assumptions:

  • \(E(\mathbf{F}) = 0, \; \text{Cov}(\mathbf{F}) = \mathbf{I}\)
  • \(E(\boldsymbol{\varepsilon}) = 0\)
  • \(\text{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}\) (diagonal)
  • \(\text{Cov}(\mathbf{F}, \boldsymbol{\varepsilon}) = 0\)

Then

\[ \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}' + \boldsymbol{\Psi} \]
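
The decomposition can be checked numerically. A minimal NumPy sketch, using the loading matrix from the example in Section 4.5 below, with uniquenesses chosen so that the variables have unit variance:

    import numpy as np

    L = np.array([[0.80, 0.10],
                  [0.75, 0.20],
                  [0.10, 0.85],
                  [0.20, 0.80]])
    psi = 1 - (L ** 2).sum(axis=1)            # uniquenesses for standardized variables

    Sigma = L @ L.T + np.diag(psi)            # implied covariance (here correlation) matrix
    print(np.round(Sigma, 3))
    print(np.allclose(np.diag(Sigma), 1.0))   # unit variances by construction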


4.3 Communality and Uniqueness

For variable \(X_i\):

  • Communality: \[ h_i^2 = \sum_{j=1}^m \ell_{ij}^2 \]

  • Uniqueness (for a standardized variable with unit variance): \[ \psi_i = 1 - h_i^2 \]

Communality represents the proportion of the variance of \(X_i\) that is explained by the common factors.


4.4 Methods of Factor Extraction

  1. Principal Axis Factoring (PAF)
  2. Maximum Likelihood Factor Analysis (MLFA)
  3. Image Factoring

In PAF, initial communality estimates (e.g., squared multiple correlations) replace the ones on the diagonal of \(\mathbf{R}\); the reduced matrix is factored, the communalities are re-estimated from the resulting loadings, and the process is repeated until convergence, as sketched below.
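
A minimal sketch of this iteration in NumPy (squared multiple correlations as starting communalities; illustrative, not a production routine):

    import numpy as np

    def principal_axis_factoring(R, m, n_iter=100, tol=1e-6):
        """Iterative principal axis factoring of a correlation matrix R with m factors."""
        h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))     # initial communalities (SMCs)
        for _ in range(n_iter):
            R_reduced = R.copy()
            np.fill_diagonal(R_reduced, h2)            # communalities on the diagonal
            eigvals, eigvecs = np.linalg.eigh(R_reduced)
            eigvals, eigvecs = eigvals[::-1][:m], eigvecs[:, ::-1][:, :m]
            loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
            h2_new = (loadings ** 2).sum(axis=1)       # re-estimated communalities
            if np.max(np.abs(h2_new - h2)) < tol:
                h2 = h2_new
                break
            h2 = h2_new
        return loadings, h2

In practice, tested implementations (for example, the psych package in R or the factor_analyzer package in Python) should be preferred.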


4.5 Example: Factor Analysis with Four Variables

Suppose the estimated loading matrix is

\[ \mathbf{L} = \begin{pmatrix} 0.80 & 0.10 \\ 0.75 & 0.20 \\ 0.10 & 0.85 \\ 0.20 & 0.80 \end{pmatrix} \]

Communalities:

\[ h_1^2 = 0.80^2 + 0.10^2 = 0.65 \]

\[ h_3^2 = 0.10^2 + 0.85^2 \approx 0.73 \]

Thus, variables 1 and 2 load strongly on Factor 1, while 3 and 4 load on Factor 2.
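
The full set of communalities and uniquenesses can be checked in a line or two of NumPy:

    import numpy as np

    L = np.array([[0.80, 0.10],
                  [0.75, 0.20],
                  [0.10, 0.85],
                  [0.20, 0.80]])

    h2 = (L ** 2).sum(axis=1)          # communalities, approx. [0.65, 0.60, 0.73, 0.68]
    psi = 1 - h2                       # uniquenesses
    print(np.round(h2, 2), np.round(psi, 2))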


5. Factor Rotation

5.1 Need for Rotation

Initial factor solutions are often difficult to interpret because many variables have moderate loadings on several factors. Rotation seeks a simpler structure, in which each variable loads highly on as few factors as possible, without changing the communalities or the fit of the model.


5.2 Orthogonal Rotation (Varimax)

Varimax maximizes the variance of squared loadings:

\[ V = \sum_{j=1}^m \left[ \frac{1}{p} \sum_{i=1}^p \ell_{ij}^4 - \left( \frac{1}{p} \sum_{i=1}^p \ell_{ij}^2 \right)^2 \right] \]
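
The criterion translates directly into code; the sketch below only evaluates \(V\) for a given loading matrix (actual rotation algorithms search over orthogonal rotation matrices to maximize it, and the Kaiser-normalized version divides each row of loadings by its communality first). The two loading matrices in the usage lines are hypothetical, chosen to contrast simple and diffuse structure.

    import numpy as np

    def varimax_criterion(L):
        """Raw varimax criterion V: the (population) variance of the squared
        loadings within each factor's column, summed over the factors."""
        sq = L ** 2
        return float(np.sum(sq.var(axis=0)))   # np.var divides by p (ddof=0), as in the formula

    L_simple  = np.array([[0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
    L_diffuse = np.array([[0.64, 0.64], [0.64, 0.64], [0.64, 0.64], [0.64, 0.64]])
    print(varimax_criterion(L_simple) > varimax_criterion(L_diffuse))   # True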


5.3 Oblique Rotation

Allows correlated factors:

  • Promax
  • Oblimin

An oblique rotation yields two loading matrices:

  • Pattern matrix (partial regression weights of variables on factors)
  • Structure matrix (correlations between variables and factors)

6. Determining the Number of Components/Factors

  1. Kaiser Criterion: retain components with eigenvalue \(\lambda > 1\)
  2. Scree Plot: look for the "elbow" where the eigenvalues level off
  3. Cumulative Variance Criterion: retain enough components to reach a target proportion (often 70–80%)
  4. Parallel Analysis: compare observed eigenvalues with those from random data (see the sketch below)
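
A minimal NumPy sketch of parallel analysis (the data matrix passed in is a placeholder; the 95th-percentile threshold and 100 simulations are common but not universal choices):

    import numpy as np

    def parallel_analysis(X, n_sims=100, quantile=0.95, seed=0):
        """Compare observed eigenvalues with the chosen quantile of eigenvalues
        obtained from uncorrelated random normal data of the same shape."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
        simulated = np.empty((n_sims, p))
        for s in range(n_sims):
            Z = rng.normal(size=(n, p))
            simulated[s] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
        threshold = np.quantile(simulated, quantile, axis=0)
        n_retain = int(np.sum(observed > threshold))   # components above the random threshold
        return n_retain, observed, threshold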

7. PCA vs Factor Analysis: A Comparison

  Aspect          PCA                FA
  Nature          Descriptive        Model-based
  Variance used   Total variance     Common variance
  Error term      Not explicit       Explicit
  Objective       Data reduction     Latent structure

8. Applications

  • Psychology: Intelligence, personality traits
  • Social sciences: Attitude scales
  • Health sciences: Symptom clustering
  • Economics: Composite indices

9. Assumptions and Diagnostics

  • Adequate correlations among the variables (the methods are of little use if \(\mathbf{R}\) is close to an identity matrix)
  • Bartlett’s Test of Sphericity (tests the null hypothesis that \(\mathbf{R}\) is an identity matrix)
  • Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy (values above roughly 0.6 are commonly regarded as acceptable):

\[ \text{KMO} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} q_{ij}^2} \]

where \(q_{ij}\) is the partial correlation between \(X_i\) and \(X_j\) controlling for the remaining variables, and the sums run over all pairs \(i \ne j\).
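
The overall KMO can be computed from \(\mathbf{R}\) alone, since the partial correlations are obtainable from its inverse. A minimal NumPy sketch (the correlation matrix from Section 3.4 is reused purely as an illustration):

    import numpy as np

    def kmo_overall(R):
        """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
        P = np.linalg.inv(R)
        d = np.sqrt(np.diag(P))
        Q = -P / np.outer(d, d)              # partial correlations given all other variables
        np.fill_diagonal(Q, 0.0)
        R_off = R - np.eye(R.shape[0])       # off-diagonal correlations only
        r2 = np.sum(R_off ** 2)
        q2 = np.sum(Q ** 2)
        return r2 / (r2 + q2)

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])
    print(round(kmo_overall(R), 3))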


10. Concluding Remarks

PCA and FA are powerful multivariate tools with distinct purposes. PCA is suitable for summarization and index construction, while FA is appropriate when the goal is to uncover latent constructs. Proper understanding of their mathematical foundations ensures correct application and interpretation in empirical research.


End of Notes