Sameer Mathur
Examples, Visualization, Theory
(Part of Regression Diagnostics)
---
As income increases, the variability of food consumption will increase.
A poorer person will spend a rather constant amount by always eating inexpensive food;
A wealthier person may occasionally buy inexpensive food and at other times eat expensive food.
Those with higher incomes display a greater variability of food consumption.
Imagine your team is tracking a rocket launch and measuring the distance it has traveled each second.
Your team writes a model to predict the distance travelled by the rocket as a function of time.
For the first few seconds, your model's predictions may be accurate to the nearest centimeter, say.
However, 5 minutes later, as the rocket recedes into space, the accuracy of your model may only be good to within 100 m, because of the increased distance, atmospheric distortion and a variety of other factors.
The model you estimate from such data could exhibit heteroscedasticity.
Suppose 100 students enroll in a typing class.
Some students have typing experience while others do not.
After the first class there would be a great deal of dispersion in the number of typing mistakes.
After the final class the dispersion would be smaller.
The error variance is non-constant: it decreases with the number of days of coaching.
One of the important assumptions of linear regression is that there should be no heteroscedasticity of residuals.
The errors have the same but unknown variance, i.e. \( E(\epsilon_i^2) = \sigma^2 \), where \( i = 1, 2, \ldots, n \).
This is known as constant variance or homoscedasticity.
When this assumption is violated, the problem is known as heteroscedasticity.
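To make the contrast concrete, here is a minimal simulation sketch (not from the original slides; the variable names and noise parameters are illustrative assumptions) that draws homoscedastic errors and errors whose spread grows with X, then plots them side by side.

```python
# Illustrative sketch: constant-variance errors vs. errors whose spread grows with x.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 200
x = np.linspace(1, 10, n)

e_homo = rng.normal(0, 1.0, n)        # homoscedastic: same sd for every observation
e_hetero = rng.normal(0, 0.5 * x, n)  # heteroscedastic: sd proportional to x

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(x, e_homo, s=10)
axes[0].set_title("Homoscedastic errors")
axes[1].scatter(x, e_hetero, s=10)
axes[1].set_title("Heteroscedastic errors")
for ax in axes:
    ax.axhline(0, color="gray", lw=1)
    ax.set_xlabel("x")
plt.tight_layout()
plt.show()
```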
One of the assumptions of the classical linear regression model is that there is no heteroscedasticity.
Regression analysis using heteroscedastic data will still provide an unbiased estimate for the relationship between the predictor variable and the outcome, but standard errors and therefore inferences obtained from data analysis are suspect.
The OLS estimators, and the regression predictions based on them, remain unbiased and consistent.
However, the OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
Because the usual estimator of the covariance matrix of the regression coefficients is biased and inconsistent, hypothesis tests (t-test, F-test) are no longer valid.
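As a rough illustration of these consequences, the sketch below (the simulated data and the use of statsmodels are my assumptions, not the slides' own example) fits OLS to heteroscedastic data and compares the default standard error of the slope with a heteroscedasticity-consistent (HC1) one.

```python
# Sketch: the OLS slope is still close to the truth, but the default standard
# error and the heteroscedasticity-robust (HC1) standard error can differ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x, n)   # error sd grows with x

classical = sm.OLS(y, X).fit()                  # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC1")       # White/HC1 robust covariance

print("slope estimate :", classical.params[1])
print("default SE     :", classical.bse[1])
print("HC1 robust SE  :", robust.bse[1])
```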
There are several methods to test for the presence of heteroscedasticity.
Some popular tests include the Breusch-Pagan test, the White test, and the Goldfeld-Quandt test.
These tests consist of a test statistic (a mathematical expression yielding a numerical value as a function of the data), a hypothesis that is going to be tested (the null hypothesis), an alternative hypothesis, and a statement about the distribution of the statistic under the null hypothesis.
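As one illustration of how such tests are run in practice, the sketch below applies the Breusch-Pagan and White tests from statsmodels to simulated heteroscedastic data; the simulated data and the choice of library are assumptions made for the example.

```python
# Sketch: Breusch-Pagan and White tests; the null hypothesis is homoscedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
n = 300
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x, n)   # heteroscedastic errors

resid = sm.OLS(y, X).fit().resid

# Each test returns (LM statistic, LM p-value, F statistic, F p-value).
bp_lm, bp_pval, _, _ = het_breuschpagan(resid, X)
w_lm, w_pval, _, _ = het_white(resid, X)

print(f"Breusch-Pagan p-value: {bp_pval:.4f}")   # small p-value rejects homoscedasticity
print(f"White test p-value:    {w_pval:.4f}")
```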
There are four common corrections for heteroscedasticity:
View logarithmized data. Non-logarithmized series that are growing exponentially often appear to have increasing variability as the series rises over time. The variability in percentage terms may, however, be rather stable.
Use a different specification for the model (different X variables, or perhaps non-linear transformations of the X variables).
Apply a weighted least squares estimation method, in which OLS is applied to transformed or weighted values of X and Y; a sketch of this correction follows the list.
Use heteroscedasticity-consistent standard errors (HCSE), such as White's robust standard errors, which allow valid inference from the OLS fit even when the error variance is not constant.
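The sketch below illustrates the weighted least squares correction, assuming for the example that the error standard deviation is proportional to X, so that weights of 1/X^2 (the inverse of each observation's error variance) are appropriate.

```python
# Sketch: WLS with weights 1/x**2, assuming the error sd is proportional to x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x, n)   # error sd proportional to x

wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()
ols_fit = sm.OLS(y, X).fit()

print("OLS slope, SE:", ols_fit.params[1], ols_fit.bse[1])
print("WLS slope, SE:", wls_fit.params[1], wls_fit.bse[1])
```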