7/17/2020

Measures of Influence

Material extracted from: https://cran.r-project.org/web/packages/olsrr/vignettes/influence_measures.html

Introduction

A single observation can have a great influence on the results of a regression analysis. It is therefore important to detect influential observations and to take them into consideration when interpreting the results. The olsrr package offers the following tools to detect influential observations:

  • Cook’s D Bar Plot
  • Cook’s D Chart
  • DFBETAs Panel
  • DFFITs Plot
  • Studentized Residual Plot
  • Standardized Residual Chart
  • Studentized Residuals vs Leverage Plot
  • Deleted Studentized Residual vs Fitted Values Plot
  • Hadi Plot
  • Potential Residual Plot

Cook’s D Bar Plot

Bar plot of Cook’s distance, used to detect observations that strongly influence the fitted values of the model. Cook’s distance was introduced by the American statistician R. Dennis Cook in 1977 and is used to identify influential data points. It depends on both the residual and the leverage, i.e., it takes into account both the x value and the y value of the observation.

[Figure: Cook’s D Bar Plot]
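To make the dependence on both residual and leverage concrete, here is a minimal Python sketch that computes Cook’s distance for a simple linear regression (one predictor plus intercept). It is not olsrr code: the helper names fit_line and cooks_distance and the data are hypothetical, and the sketch assumes the usual closed form D_i = e_i² / (p · MSE) · h_i / (1 − h_i)², where e_i is the residual, h_i the leverage, and p the number of coefficients.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def cooks_distance(x, y):
    """Cook's distance for each observation of a simple linear regression."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point: large when x is far from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical data: eight points near y = 2x, plus one point far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 20.0]
for i, d in enumerate(cooks_distance(x, y), start=1):
    print(f"observation {i}: Cook's D = {d:.3f}")
```

In this made-up data set the last observation has both an extreme x value (high leverage) and a y value far from the trend of the other points, so its Cook’s distance is the largest.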

DFBETAs Panel

DFBETA measures the difference in each parameter estimate with and without a given observation. There is a DFBETA for each data point and each parameter, i.e., if there are n observations and k variables, there will be n∗k DFBETAs. In general, large values of DFBETAS indicate observations that are influential in estimating a given parameter. Belsley, Kuh, and Welsch recommend 2 as a general cutoff value to indicate influential observations and 2/√n as a size-adjusted cutoff.

[Figure: DFBETAs Panel]
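The idea of comparing each parameter estimate with and without an observation can be sketched in plain Python for a simple linear regression. This is an illustration rather than the olsrr implementation: fit_line and dfbetas are hypothetical helper names, the data are made up, and the scaling assumes the common definition in which each difference is divided by the residual standard error of the refitted model times the square root of the matching diagonal entry of (X'X)⁻¹.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def dfbetas(x, y):
    """Return one (intercept, slope) pair of DFBETAS per observation.

    x and y must be lists, because observations are dropped by slicing.
    """
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Diagonal of (X'X)^-1 for the design matrix with columns [1, x].
    c00 = 1 / n + x_bar ** 2 / sxx
    c11 = 1 / sxx
    rows = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]  # drop observation i
        a0, a1 = fit_line(xs, ys)                       # refit without it
        sse = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        s_del = math.sqrt(sse / (n - 1 - p))            # residual std. error
        rows.append(((b0 - a0) / (s_del * math.sqrt(c00)),
                     (b1 - a1) / (s_del * math.sqrt(c11))))
    return rows


# Hypothetical data: eight points near y = 2x, plus one point far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 20.0]
cutoff = 2 / math.sqrt(len(x))  # size-adjusted cutoff 2/sqrt(n)
for i, (d_int, d_slope) in enumerate(dfbetas(x, y), start=1):
    flag = "influential" if max(abs(d_int), abs(d_slope)) > cutoff else ""
    print(f"observation {i}: intercept {d_int:.3f}, slope {d_slope:.3f} {flag}")
```

The result has one row per observation and one entry per parameter, which matches the n∗k count described above (here k counts the intercept and the slope).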

DFFITS Plot

DFFITS was proposed by Welsch and Kuh (1977). It is the scaled difference between the ith fitted value obtained from the full data and the ith fitted value obtained by deleting the ith observation. DFFIT, the difference in fits, is used to identify influential data points. It quantifies the number of standard deviations by which the fitted value changes when the ith data point is omitted. Steps to compute DFFITs:

  • Delete observations one at a time.
  • Refit the regression model on the remaining observations.
  • Examine how much the ith fitted value changes when the ith observation is deleted.

An observation is deemed influential if the absolute value of its DFFITS value is greater than a size-adjusted cutoff. The conventional cutoff from Belsley, Kuh, and Welsch is:

2√(p/n)

where n is the number of observations and p is the number of predictors including intercept.
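The delete, refit, and compare steps for DFFITS can be sketched in plain Python for a simple linear regression. This is a hedged illustration, not olsrr code: fit_line and dffits are hypothetical helper names, the data are made up, the scaling assumes the usual definition (change in the ith fitted value divided by the deleted residual standard error times √h_i), and the cutoff used is the conventional 2√(p/n), which is an assumption here.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def dffits(x, y):
    """DFFITS for each observation; x and y must be lists."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    values = []
    for i in range(n):
        # Step 1: delete observation i.
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        # Step 2: refit the model on the remaining observations.
        a0, a1 = fit_line(xs, ys)
        sse = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        s_del = math.sqrt(sse / (n - 1 - p))
        # Step 3: scaled change in the ith fitted value.
        h = 1 / n + (x[i] - x_bar) ** 2 / sxx  # leverage of observation i
        change = (b0 + b1 * x[i]) - (a0 + a1 * x[i])
        values.append(change / (s_del * math.sqrt(h)))
    return values


# Hypothetical data: eight points near y = 2x, plus one point far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 20.0]
cutoff = 2 * math.sqrt(2 / len(x))  # assumed cutoff 2*sqrt(p/n) with p = 2
for i, v in enumerate(dffits(x, y), start=1):
    flag = "influential" if abs(v) > cutoff else ""
    print(f"observation {i}: DFFITS = {v:.3f} {flag}")
```

Refitting the model n times is the most direct reading of the steps; statistical software typically obtains the same quantities from a single fit using the leverages.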