- Overview
- Similarities
- Differences
- Principal Component Analysis
- Example
- Theory
- Math
- Exploratory Factor Analysis
- Example
- Theory
- Math
- Sample size
November 20, 2017
In PCA, the composite variables are called components
In EFA, the composite variables are called factors
The math used to derive components and factors is similar (but not the same)
Researcher determines the number of components/factors to retain and determines if they are interpretable
Components in PCA are assumed to be orthogonal while factors are (generally) assumed to be correlated
These theoretical differences result in mathematical differences
Say you have a study where you are trying to predict breast cancer diagnosis in a sample of 1000 breast cancer survivors using gene expression from a microarray.
\(C_m = a_{1}X_1 + a_{2}X_2 + \dots + a_{j}X_j\)
where:
\(C_m\) is the principal component that the measured gene expressions are reduced into
\(X_j\) are the measured gene expressions
\(a_j\) are the weights relating each gene expression to the principal component
Each of the k extracted components exists in k-dimensional Euclidean space
Components are orthogonal to each other
Variables need to be on the same scale
PCA isn't always useful, for example when the variables are only weakly correlated and no small set of components summarizes them
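The points above (same scale, weighted sums, orthogonal components) can be illustrated with a minimal sketch in Python with NumPy. The data here are simulated and purely hypothetical (5 correlated variables rather than real gene expressions), and an eigendecomposition of the correlation matrix is only one common way to obtain the weights \(a_j\); it is not necessarily the procedure used for the example in this talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples of 5 correlated variables on very different scales.
n_samples, n_vars = 200, 5
shared = rng.normal(size=(n_samples, 1))
scales = np.array([1.0, 10.0, 100.0, 0.1, 5.0])
X = (shared + 0.5 * rng.normal(size=(n_samples, n_vars))) * scales

# Put every variable on the same scale (mean 0, standard deviation 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigendecomposition of the correlation matrix.
# Each column of `weights` holds the a_j for one component.
R = np.corrcoef(Z, rowvar=False)
eigenvalues, weights = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; sort components by variance explained.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, weights = eigenvalues[order], weights[:, order]

# Component scores: C_m = a_1 X_1 + ... + a_j X_j, one column per component m.
scores = Z @ weights

# Proportion of total variance carried by each component.
explained = eigenvalues / eigenvalues.sum()

# Correlations between component scores (off-diagonals should be ~0).
score_corr = np.corrcoef(scores, rowvar=False)

print("Proportion of variance explained:", np.round(explained, 3))
```

Because the simulated variables share one underlying signal, the first component should carry most of the variance, and the off-diagonal entries of `score_corr` should be zero up to rounding error, which is what orthogonal components means in practice.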
In EFA we are less interested in reducing the number of variables than in measuring a latent (unobserved) construct
This is the first step in validating a self-report measure
PCA has roots in the classical statistical tradition (general linear model), which operates on observed variables and assumes no measurement error
EFA comes from the psychometric tradition and assumes observed scores arise from unobserved common factors plus variance unique to each item
You want to develop a measure of fatigue in cancer patients
You've created an 8-item measure and want to see whether all of these items measure a single latent construct
\(X_j = a_{j1}F_1 + a_{j2}F_2 + \dots + a_{jm}F_m + d_jU_j\)
where:
\(F_1, \dots, F_m\) are the latent factors; in this example there is a single factor, Fatigue
\(a_{jm}\) is the relationship (loading) between item j and latent factor m
\(U_j\) is the uniqueness in item j that is not due to the latent factors
\(d_j\) is the weight on the uniqueness of item j
Similar to PCA
Need at least 3 items per factor
Unlike components in PCA, factors are not required to be orthogonal
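To make the common factor equation concrete, here is a minimal simulation sketch in Python with NumPy. It does not fit an EFA; it only generates 8 items from a single latent factor and checks that the item correlations behave as the model implies. The loadings, sample size, and variable names are all hypothetical values chosen for illustration, and \(d_j\) is set so that each item has unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)

n_people, n_items = 20000, 8

# Hypothetical loadings a_j of each item on the single latent factor (Fatigue).
loadings = np.array([0.80, 0.70, 0.75, 0.60, 0.65, 0.70, 0.80, 0.55])

# Assumption: d_j is chosen so each item has unit variance, i.e. a_j^2 + d_j^2 = 1.
unique_weights = np.sqrt(1.0 - loadings**2)

F = rng.normal(size=(n_people, 1))        # latent factor score for each person
U = rng.normal(size=(n_people, n_items))  # item-specific uniqueness

# X_j = a_j * F + d_j * U_j for every item j.
X = F * loadings + U * unique_weights

# Under a one-factor model, the correlation between items j and k is a_j * a_k.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)

observed = np.corrcoef(X, rowvar=False)
max_gap = np.abs(observed - implied).max()

print("Largest gap between observed and implied correlations:", round(max_gap, 3))
```

With a large simulated sample, the observed correlations should sit close to the products of the loadings. In a real analysis the direction is reversed: the loadings are unknown and are estimated from the observed correlation matrix.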
While these techniques are fairly easy to implement, the theory and math behind them are complex
Particularly with EFA, make sure you work closely with your analyst to ensure stats and theory are aligned
With categorical variables, different techniques are used (and are more complicated)