Sameer Mathur
Examples, Visualization, Theory
(Part of Regression Diagnostics)
---
As income increases, the variability of food consumption will increase.
A poorer person will spend a rather constant amount by always eating inexpensive food;
A wealthier person may occasionally buy inexpensive food and at other times eat expensive food.
Those with higher incomes display a greater variability of food consumption.
Imagine your team is tracking a rocket launch and measuring the distance it has traveled each second.
Your team writes a model to predict the distance travelled by the rocket as a function of time.
For the first few seconds, your model's predictions may be accurate to the nearest centimeter, say.
However, 5 minutes later, as the rocket recedes into space, the accuracy of your model may only be good to within 100 m, because of the increased distance, atmospheric distortion and a variety of other factors.
The model you estimate from such data could exhibit heteroscedasticity.
Suppose 100 students enroll in a typing class.
Some students have typing experience while others do not.
After the first class there would be a great deal of dispersion in the number of typing mistakes.
After the final class the dispersion would be smaller.
The error variance is non-constant: it decreases with the number of days of coaching.
One of the important assumptions of linear regression is that there should be no heteroscedasticity of residuals.
The errors have the same but unknown variance, i.e. \( E(\epsilon_i^2) = \sigma^2 \), where \( i = 1, 2, \ldots, n \).
This is known as constant variance or homoscedasticity.
When this assumption is violated, the problem is known as heteroscedasticity.
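To make the contrast concrete, here is a minimal simulation sketch (not from the original slides; the variable names and noise parameters are illustrative assumptions) that draws homoscedastic errors and errors whose spread grows with X, then plots them side by side.

```python
# Illustrative sketch: constant-variance errors vs. errors whose spread grows with x.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 200
x = np.linspace(1, 10, n)

e_homo = rng.normal(0, 1.0, n)        # homoscedastic: same sd for every observation
e_hetero = rng.normal(0, 0.5 * x, n)  # heteroscedastic: sd proportional to x

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(x, e_homo, s=10)
axes[0].set_title("Homoscedastic errors")
axes[1].scatter(x, e_hetero, s=10)
axes[1].set_title("Heteroscedastic errors")
for ax in axes:
    ax.axhline(0, color="gray", lw=1)
    ax.set_xlabel("x")
plt.tight_layout()
plt.show()
```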
One of the assumptions of the classical linear regression model is that there is no heteroscedasticity.
Regression analysis using heteroscedastic data will still provide an unbiased estimate for the relationship between the predictor variable and the outcome, but standard errors and therefore inferences obtained from data analysis are suspect.
The OLS estimators, and the regression predictions based on them, remain unbiased and consistent.
However, the OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
Because the usual estimator of the covariance matrix of the regression coefficients is biased and inconsistent, hypothesis tests (t-test, F-test) are no longer valid.
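As a rough illustration of these consequences, the sketch below (the simulated data and the use of statsmodels are my assumptions, not the slides' own example) fits OLS to heteroscedastic data and compares the default standard error of the slope with a heteroscedasticity-consistent (HC1) one.

```python
# Sketch: the OLS slope is still close to the truth, but the default standard
# error and the heteroscedasticity-robust (HC1) standard error can differ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x, n)   # error sd grows with x

classical = sm.OLS(y, X).fit()                  # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC1")       # White/HC1 robust covariance

print("slope estimate :", classical.params[1])
print("default SE     :", classical.bse[1])
print("HC1 robust SE  :", robust.bse[1])
```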
There are several methods to test for the presence of heteroscedasticity.
Some popular tests include the Breusch-Pagan test, the White test, and the Goldfeld-Quandt test.
These tests consist of a test statistic (a mathematical expression yielding a numerical value as a function of the data), a hypothesis that is going to be tested (the null hypothesis), an alternative hypothesis, and a statement about the distribution of the statistic under the null hypothesis.
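As one illustration of how such tests are run in practice, the sketch below applies the Breusch-Pagan and White tests from statsmodels to simulated heteroscedastic data; the simulated data and the choice of library are assumptions made for the example.

```python
# Sketch: Breusch-Pagan and White tests; the null hypothesis is homoscedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
n = 300
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x, n)   # heteroscedastic errors

resid = sm.OLS(y, X).fit().resid

# Each test returns (LM statistic, LM p-value, F statistic, F p-value).
bp_lm, bp_pval, _, _ = het_breuschpagan(resid, X)
w_lm, w_pval, _, _ = het_white(resid, X)

print(f"Breusch-Pagan p-value: {bp_pval:.4f}")   # small p-value rejects homoscedasticity
print(f"White test p-value:    {w_pval:.4f}")
```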
There are four common corrections for heteroscedasticity:
View logarithmized data. Non-logarithmized series that are growing exponentially often appear to have increasing variability as the series rises over time. The variability in percentage terms may, however, be rather stable.
Use a different specification for the model (different X variables, or perhaps non-linear transformations of the X variables).
Apply a weighted least squares estimation method, in which OLS is applied to transformed or weighted values of X and Y; a sketch of this correction follows the list.
Use heteroscedasticity-consistent standard errors (HCSE), such as White's robust standard errors, which allow valid inference from the OLS fit even when the error variance is not constant.
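The sketch below illustrates the weighted least squares correction, assuming for the example that the error standard deviation is proportional to X, so that weights of 1/X^2 (the inverse of each observation's error variance) are appropriate.

```python
# Sketch: WLS with weights 1/x**2, assuming the error sd is proportional to x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = np.linspace(1, 10, n)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x, n)   # error sd proportional to x

wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()
ols_fit = sm.OLS(y, X).fit()

print("OLS slope, SE:", ols_fit.params[1], ols_fit.bse[1])
print("WLS slope, SE:", wls_fit.params[1], wls_fit.bse[1])
```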