Factor Analysis (FA) and Principal Component Analysis (PCA)


1. Introduction

Factor Analysis (FA) and Principal Component Analysis (PCA) are multivariate statistical techniques used for data reduction, structure detection, and latent variable modeling. Both methods aim to summarize information contained in a large number of correlated variables into a smaller set of uncorrelated or less correlated variables, but their objectives, assumptions, and interpretations differ fundamentally.

PCA is primarily a descriptive technique that transforms observed variables into linear combinations called principal components. FA, on the other hand, is a model-based inferential technique that assumes observed variables are driven by a smaller number of unobserved latent factors.


2. Mathematical Preliminaries

Let

  • \(\mathbf{X} = (X_1, X_2, \dots, X_p)'\) be a vector of \(p\) observed variables
  • Mean vector: \(E(\mathbf{X}) = \boldsymbol{\mu}\)
  • Covariance matrix: \(\boldsymbol{\Sigma} = \text{Cov}(\mathbf{X})\)

For standardized variables, we work with the correlation matrix \(\mathbf{R}\).


3. Principal Component Analysis (PCA)

3.1 Objective of PCA

The objective of PCA is to find linear combinations of the observed variables that:

  1. Are mutually uncorrelated
  2. Capture maximum variance in descending order

The \(k\)-th principal component is given by

\[ Z_k = a_{k1}X_1 + a_{k2}X_2 + \cdots + a_{kp}X_p \]

subject to the normalization constraint

\[ \sum_{j=1}^p a_{kj}^2 = 1 \]

and the requirement that \(Z_k\) is uncorrelated with the previously extracted components \(Z_1, \dots, Z_{k-1}\).


3.2 Eigenvalue–Eigenvector Formulation

PCA is obtained by solving the eigenvalue problem:

\[ | \mathbf{R} - \lambda \mathbf{I} | = 0 \]

where

  • \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\) are the eigenvalues of \(\mathbf{R}\)
  • The coefficient vector \(\mathbf{a}_k = (a_{k1}, \dots, a_{kp})'\) of the \(k\)-th component is the eigenvector associated with \(\lambda_k\)
  • Each eigenvalue equals the variance of the corresponding component

The total variance is

\[ \sum_{j=1}^p \lambda_j = p \quad (\text{for standardized variables}) \]
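
As a concrete illustration, the following NumPy sketch (the data matrix X is simulated purely as a placeholder) standardizes the variables, eigendecomposes the correlation matrix, and checks that the resulting component scores are uncorrelated with variances equal to the eigenvalues, which sum to \(p\).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # placeholder correlated data

    # Standardize the variables and form the correlation matrix R
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)

    # Eigen-decomposition; eigh returns ascending eigenvalues, so reverse the order
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    scores = Z @ eigvecs                                       # principal component scores
    print(np.isclose(eigvals.sum(), R.shape[0]))               # total variance equals p
    print(np.round(np.cov(scores, rowvar=False), 3))           # ~diagonal matrix of eigenvalues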


3.3 Proportion of Variance Explained

The proportion of variance explained by the \(k\)-th component is

\[ \text{PV}_k = \frac{\lambda_k}{\sum_{j=1}^p \lambda_j} \]

Cumulative proportion:

\[ \text{CPV}_m = \sum_{k=1}^m \text{PV}_k \]
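
In code these are one-liners; a minimal sketch, using the approximate eigenvalues from the example in Section 3.4 below:

    import numpy as np

    eigvals = np.array([2.40, 0.41, 0.18])    # illustrative eigenvalues (see Section 3.4)
    pv = eigvals / eigvals.sum()              # proportion of variance per component
    cpv = np.cumsum(pv)                       # cumulative proportion
    print(np.round(pv, 3), np.round(cpv, 3))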


3.4 Example: PCA with Three Variables

Consider the correlation matrix

\[ \mathbf{R} = \begin{pmatrix} 1 & 0.8 & 0.6 \\ 0.8 & 1 & 0.7 \\ 0.6 & 0.7 & 1 \end{pmatrix} \]

Eigenvalues (to two decimal places):

\[ \lambda_1 \approx 2.40, \quad \lambda_2 \approx 0.41, \quad \lambda_3 \approx 0.18 \]

  • PC1 explains \(2.40/3 \approx 80.1\%\) of the variance
  • PC2 explains about \(13.8\%\)
  • PC3 explains about \(6.1\%\)

Thus, the first two components explain about 94% of the total variance.
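
These values can be verified directly with a short NumPy check of the example above:

    import numpy as np

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])

    eigvals = np.linalg.eigvalsh(R)[::-1]          # eigenvalues in descending order
    print(np.round(eigvals, 2))                    # approx. [2.40, 0.41, 0.18]
    print(np.round(eigvals / eigvals.sum(), 3))    # approx. [0.801, 0.138, 0.061]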


3.5 Interpretation of Component Loadings

Component loadings are given by

\[ \ell_{jk} = \sqrt{\lambda_k} \, a_{jk} \]

For standardized variables, \(\ell_{jk}\) is the correlation between variable \(X_j\) and component \(Z_k\).
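
As a quick check (continuing the NumPy sketches above), scaling each eigenvector by the square root of its eigenvalue gives the loading matrix, and each column of squared loadings sums to the corresponding eigenvalue:

    import numpy as np

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    loadings = eigvecs * np.sqrt(eigvals)            # column k scaled by sqrt(lambda_k)
    print(np.round(loadings, 2))
    print(np.round((loadings ** 2).sum(axis=0), 2))  # equals the eigenvalues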


4. Factor Analysis (FA)

4.1 Objective of Factor Analysis

FA seeks to explain correlations among observed variables in terms of a small number of latent factors.

The basic factor model is

\[ X_i - \mu_i = \ell_{i1}F_1 + \ell_{i2}F_2 + \cdots + \ell_{im}F_m + \varepsilon_i \]

where

  • \(F_1, \dots, F_m\) are common factors
  • \(\ell_{ij}\) are factor loadings
  • \(\varepsilon_i\) are unique factors

4.2 Matrix Form of the Factor Model

\[ \mathbf{X} = \boldsymbol{\mu} + \mathbf{L}\mathbf{F} + \boldsymbol{\varepsilon} \]

Assumptions:

  • \(E(\mathbf{F}) = 0, \; \text{Cov}(\mathbf{F}) = \mathbf{I}\)
  • \(E(\boldsymbol{\varepsilon}) = 0\)
  • \(\text{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}\) (diagonal)
  • \(\text{Cov}(\mathbf{F}, \boldsymbol{\varepsilon}) = 0\)

Then

\[ \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}' + \boldsymbol{\Psi} \]
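
The decomposition can be checked numerically. A minimal NumPy sketch, using the loading matrix from the example in Section 4.5 below, with uniquenesses chosen so that the variables have unit variance:

    import numpy as np

    L = np.array([[0.80, 0.10],
                  [0.75, 0.20],
                  [0.10, 0.85],
                  [0.20, 0.80]])
    psi = 1 - (L ** 2).sum(axis=1)            # uniquenesses for standardized variables

    Sigma = L @ L.T + np.diag(psi)            # implied covariance (here correlation) matrix
    print(np.round(Sigma, 3))
    print(np.allclose(np.diag(Sigma), 1.0))   # unit variances by construction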


4.3 Communality and Uniqueness

For variable \(X_i\):

  • Communality: \[ h_i^2 = \sum_{j=1}^m \ell_{ij}^2 \]

  • Uniqueness (for a standardized variable with unit variance): \[ \psi_i = 1 - h_i^2 \]

Communality represents the proportion of the variance of \(X_i\) that is explained by the common factors.


4.4 Methods of Factor Extraction

  1. Principal Axis Factoring (PAF)
  2. Maximum Likelihood Factor Analysis (MLFA)
  3. Image Factoring

In PAF, initial communality estimates (e.g., squared multiple correlations) replace the ones on the diagonal of \(\mathbf{R}\); the reduced matrix is factored, the communalities are re-estimated from the resulting loadings, and the process is repeated until convergence, as sketched below.
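
A minimal sketch of this iteration in NumPy (squared multiple correlations as starting communalities; illustrative, not a production routine):

    import numpy as np

    def principal_axis_factoring(R, m, n_iter=100, tol=1e-6):
        """Iterative principal axis factoring of a correlation matrix R with m factors."""
        h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))     # initial communalities (SMCs)
        for _ in range(n_iter):
            R_reduced = R.copy()
            np.fill_diagonal(R_reduced, h2)            # communalities on the diagonal
            eigvals, eigvecs = np.linalg.eigh(R_reduced)
            eigvals, eigvecs = eigvals[::-1][:m], eigvecs[:, ::-1][:, :m]
            loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
            h2_new = (loadings ** 2).sum(axis=1)       # re-estimated communalities
            if np.max(np.abs(h2_new - h2)) < tol:
                h2 = h2_new
                break
            h2 = h2_new
        return loadings, h2

In practice, tested implementations (for example, the psych package in R or the factor_analyzer package in Python) should be preferred.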


4.5 Example: Factor Analysis with Four Variables

Suppose the estimated loading matrix is

\[ \mathbf{L} = \begin{pmatrix} 0.80 & 0.10 \\ 0.75 & 0.20 \\ 0.10 & 0.85 \\ 0.20 & 0.80 \end{pmatrix} \]

Communalities:

\[ h_1^2 = 0.80^2 + 0.10^2 = 0.65 \]

\[ h_3^2 = 0.10^2 + 0.85^2 \approx 0.73 \]

Thus, variables 1 and 2 load strongly on Factor 1, while 3 and 4 load on Factor 2.
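
The full set of communalities and uniquenesses can be checked in a line or two of NumPy:

    import numpy as np

    L = np.array([[0.80, 0.10],
                  [0.75, 0.20],
                  [0.10, 0.85],
                  [0.20, 0.80]])

    h2 = (L ** 2).sum(axis=1)          # communalities, approx. [0.65, 0.60, 0.73, 0.68]
    psi = 1 - h2                       # uniquenesses
    print(np.round(h2, 2), np.round(psi, 2))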


5. Factor Rotation

5.1 Need for Rotation

Initial factor solutions are often difficult to interpret because many variables have moderate loadings on several factors. Rotation seeks a simpler structure, in which each variable loads highly on as few factors as possible, without changing the communalities or the fit of the model.


5.2 Orthogonal Rotation (Varimax)

Varimax maximizes the variance of squared loadings:

\[ V = \sum_{j=1}^m \left[ \frac{1}{p} \sum_{i=1}^p \ell_{ij}^4 - \left( \frac{1}{p} \sum_{i=1}^p \ell_{ij}^2 \right)^2 \right] \]
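
The criterion translates directly into code; the sketch below only evaluates \(V\) for a given loading matrix (actual rotation algorithms search over orthogonal rotation matrices to maximize it, and the Kaiser-normalized version divides each row of loadings by its communality first). The two loading matrices in the usage lines are hypothetical, chosen to contrast simple and diffuse structure.

    import numpy as np

    def varimax_criterion(L):
        """Raw varimax criterion V: the (population) variance of the squared
        loadings within each factor's column, summed over the factors."""
        sq = L ** 2
        return float(np.sum(sq.var(axis=0)))   # np.var divides by p (ddof=0), as in the formula

    L_simple  = np.array([[0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
    L_diffuse = np.array([[0.64, 0.64], [0.64, 0.64], [0.64, 0.64], [0.64, 0.64]])
    print(varimax_criterion(L_simple) > varimax_criterion(L_diffuse))   # True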


5.3 Oblique Rotation

Allows correlated factors:

  • Promax
  • Oblimin

An oblique rotation yields two loading matrices:

  • Pattern matrix (partial regression weights of variables on factors)
  • Structure matrix (correlations between variables and factors)

6. Determining the Number of Components/Factors

  1. Kaiser Criterion: retain components with eigenvalue \(\lambda > 1\)
  2. Scree Plot: look for the "elbow" where the eigenvalues level off
  3. Cumulative Variance Criterion: retain enough components to reach a target proportion (often 70–80%)
  4. Parallel Analysis: compare observed eigenvalues with those from random data (see the sketch below)
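
A minimal NumPy sketch of parallel analysis (the data matrix passed in is a placeholder; the 95th-percentile threshold and 100 simulations are common but not universal choices):

    import numpy as np

    def parallel_analysis(X, n_sims=100, quantile=0.95, seed=0):
        """Compare observed eigenvalues with the chosen quantile of eigenvalues
        obtained from uncorrelated random normal data of the same shape."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
        simulated = np.empty((n_sims, p))
        for s in range(n_sims):
            Z = rng.normal(size=(n, p))
            simulated[s] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
        threshold = np.quantile(simulated, quantile, axis=0)
        n_retain = int(np.sum(observed > threshold))   # components above the random threshold
        return n_retain, observed, threshold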

7. PCA vs Factor Analysis: A Comparison

  Aspect          PCA                FA
  Nature          Descriptive        Model-based
  Variance used   Total variance     Common variance
  Error term      Not explicit       Explicit
  Objective       Data reduction     Latent structure

8. Applications

  • Psychology: Intelligence, personality traits
  • Social sciences: Attitude scales
  • Health sciences: Symptom clustering
  • Economics: Composite indices

9. Assumptions and Diagnostics

  • Adequate correlations among the variables (the methods are of little use if \(\mathbf{R}\) is close to an identity matrix)
  • Bartlett’s Test of Sphericity (tests the null hypothesis that \(\mathbf{R}\) is an identity matrix)
  • Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy (values above roughly 0.6 are commonly regarded as acceptable):

\[ \text{KMO} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} q_{ij}^2} \]

where \(q_{ij}\) is the partial correlation between \(X_i\) and \(X_j\) controlling for the remaining variables, and the sums run over all pairs \(i \ne j\).
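
The overall KMO can be computed from \(\mathbf{R}\) alone, since the partial correlations are obtainable from its inverse. A minimal NumPy sketch (the correlation matrix from Section 3.4 is reused purely as an illustration):

    import numpy as np

    def kmo_overall(R):
        """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
        P = np.linalg.inv(R)
        d = np.sqrt(np.diag(P))
        Q = -P / np.outer(d, d)              # partial correlations given all other variables
        np.fill_diagonal(Q, 0.0)
        R_off = R - np.eye(R.shape[0])       # off-diagonal correlations only
        r2 = np.sum(R_off ** 2)
        q2 = np.sum(Q ** 2)
        return r2 / (r2 + q2)

    R = np.array([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.7],
                  [0.6, 0.7, 1.0]])
    print(round(kmo_overall(R), 3))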


10. Concluding Remarks

PCA and FA are powerful multivariate tools with distinct purposes. PCA is suitable for summarization and index construction, while FA is appropriate when the goal is to uncover latent constructs. Proper understanding of their mathematical foundations ensures correct application and interpretation in empirical research.


End of Notes