Introduction

Principal Components Regression (PCR) is a statistical technique used to address the problem of multicollinearity in multiple linear regression models. Multicollinearity occurs when predictor variables are highly correlated, which leads to unstable coefficient estimates and inflated standard errors in ordinary least squares (OLS) regression. Although OLS estimates remain unbiased under multicollinearity, their variance may be large, making individual coefficients hard to interpret and predictions potentially unreliable.
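
The variance inflation described above can be seen in a small simulation. The sketch below is illustrative only: the function name ols_coef_std, the sample size, the true coefficients, and the correlation values are all assumptions chosen for the example, not part of any particular dataset.

```python
import numpy as np

def ols_coef_std(rho, n=100, n_sims=500, seed=0):
    """Empirical mean and std. dev. of OLS slopes when two predictors have correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    true_beta = np.array([1.0, 1.0])          # assumed true slopes
    estimates = np.empty((n_sims, 2))
    for i in range(n_sims):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ true_beta + rng.normal(size=n)
        # Add an intercept column and solve the least-squares problem.
        A = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        estimates[i] = coef[1:]
    return estimates.mean(axis=0), estimates.std(axis=0)

mean_low, std_low = ols_coef_std(rho=0.0)
mean_high, std_high = ols_coef_std(rho=0.99)
print("rho=0.00  mean:", mean_low.round(2), " std:", std_low.round(2))
print("rho=0.99  mean:", mean_high.round(2), " std:", std_high.round(2))
```

In both settings the average estimate should sit close to the true slopes (OLS stays unbiased), while the spread of the estimates should be several times larger when the predictors are nearly collinear.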

PCR resolves this by transforming the original correlated predictors into a smaller set of uncorrelated variables called principal components, which are linear combinations of the original variables. These components are then used as predictors in a linear regression model.
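
As a minimal sketch of this transformation, the example below computes principal components with a singular value decomposition and checks that the resulting scores are uncorrelated. The data are simulated and hypothetical: three predictors, two of which are built from a shared latent variable z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: three predictors, the first two strongly correlated.
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.1 * rng.normal(size=n),
    z + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])

# Standardize each column (mean 0, sample std. dev. 1).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardized matrix: rows of Vt are the component loadings.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T                 # principal component scores
explained_var = s**2 / (n - 1)     # sample variance of each component

print("Predictor correlations:\n", np.corrcoef(Xs, rowvar=False).round(3))
print("Component correlations:\n", np.corrcoef(scores, rowvar=False).round(3))
print("Explained variance ratio:", (explained_var / explained_var.sum()).round(3))
```

Each column of scores is a linear combination of the standardized predictors, and the off-diagonal entries of the component correlation matrix are zero up to floating-point error.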

By discarding low-variance components (the directions in which near-collinear predictors carry little independent information, often linked to noise), PCR introduces some bias but can substantially reduce variance, often resulting in better generalization to new data.

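Note: PCR differs from ridge regression, which addresses multicollinearity by shrinking coefficients through a penalty term. Ridge regression keeps the original predictors and shrinks their coefficients continuously, while PCR replaces the predictors with principal components and drops some of those components entirely.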

Key Steps in PCR

  1. Standardize the predictors
  2. Perform Principal Component Analysis (PCA)
  3. Select the number of principal components (e.g., via cross-validation)
  4. Fit a regression model using the selected components
  5. Predict and evaluate model performance
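
The five steps above can be sketched end to end with NumPy. This is one possible implementation under stated assumptions, not a reference one: the helper names fit_pcr, predict_pcr, and cv_mse are hypothetical, the data are simulated from two latent factors, and 5-fold cross-validation on mean squared error is just one reasonable selection rule.

```python
import numpy as np

def fit_pcr(X, y, k):
    """Fit PCR with k components; returns the quantities needed for prediction."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd                                   # step 1: standardize
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)    # step 2: PCA
    V = Vt[:k].T                                         # loadings of the first k components
    Z = Xs @ V                                           # component scores (already centered)
    y_mean = y.mean()
    # Step 4: regress centered y on the scores; no intercept column is needed.
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    return {"mu": mu, "sd": sd, "V": V, "gamma": gamma, "y_mean": y_mean}

def predict_pcr(model, X):
    """Apply the training standardization and loadings to new data."""
    Z = ((X - model["mu"]) / model["sd"]) @ model["V"]
    return model["y_mean"] + Z @ model["gamma"]

def cv_mse(X, y, k, n_folds=5, seed=0):
    """Step 3: k-fold cross-validated mean squared error for k components."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit_pcr(X[train], y[train], k)           # refit everything on the training fold
        errors.append(np.mean((y[test_idx] - predict_pcr(model, X[test_idx])) ** 2))
    return float(np.mean(errors))

# Hypothetical data: six predictors driven by two latent factors.
rng = np.random.default_rng(1)
n, p = 150, 6
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.5 * rng.normal(size=n)

scores = {k: cv_mse(X, y, k) for k in range(1, p + 1)}
best_k = min(scores, key=scores.get)
model = fit_pcr(X, y, best_k)
y_hat = predict_pcr(model, X)                            # step 5: predict
print("CV MSE by number of components:", {k: round(v, 3) for k, v in scores.items()})
print("Selected k:", best_k)
```

Standardization and PCA are recomputed inside each training fold so that the held-out fold does not leak into the component directions. When k equals the number of predictors, this sketch reproduces the OLS fit, which is a useful sanity check.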

Assumptions

PCR assumes the same conditions as standard linear regression, including linearity, constant variance of the errors, and independence of the errors.

Diagnostic Checks

Linearity

  • Check: Residuals vs. each original predictor plots
  • Look for: Random scatter. Non-linear patterns suggest misspecification.

Constant Variance

  • Check: Residuals vs. Fitted values (Yhat) plot
  • Look for: A horizontal band. A funnel shape indicates heteroscedasticity.

Independence

  • Check: Residuals vs. Time (if applicable)
  • Look for: No trends or cycles. Either one suggests autocorrelation.
  • Alternative check: Residuals vs. each original predictor plots. A non-random pattern suggests a lack of independence.
  • Alternative check: Residuals vs. Yhat plot. The residuals should be randomly scattered around zero, with no discernible pattern.
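
The plots above all start from the same two arrays, the residuals and the fitted values. The sketch below computes them and adds two rough numeric companions to the plots. The helper name residual_summary and both summary statistics are assumptions made for illustration; they are not a substitute for looking at the plots. For brevity the example fits a plain OLS line to simulated data, but the same function applies unchanged to PCR predictions.

```python
import numpy as np

def residual_summary(y, y_hat):
    """Residuals, fitted values, and two rough numeric indicators."""
    resid = y - y_hat
    # Lag-1 autocorrelation of residuals (meaningful only if rows are in time order).
    lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Correlation of |residual| with fitted values: far from 0 hints at a funnel shape.
    spread = np.corrcoef(np.abs(resid), y_hat)[0, 1]
    return {"residuals": resid, "fitted": y_hat,
            "lag1_autocorr": lag1, "spread_vs_fitted": spread}

# Hypothetical example: one well-behaved fit, one with error variance growing with x.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 300)
A = np.column_stack([np.ones_like(x), x])

datasets = {
    "constant variance": 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size),
    "growing variance": 1.0 + 2.0 * x + rng.normal(scale=0.5 * x, size=x.size),
}

results = {}
for name, y in datasets.items():
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    results[name] = residual_summary(y, A @ coef)
    print(name,
          "| lag-1 autocorr:", round(results[name]["lag1_autocorr"], 3),
          "| |resid| vs fitted corr:", round(results[name]["spread_vs_fitted"], 3))
```

The "residuals" and "fitted" arrays can be passed to any plotting library to draw the Residuals vs. Yhat plot, and the residuals can be plotted against each original predictor or against time in the same way.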

Advantages of PCR

Limitations of PCR