Heteroskedasticity is an issue where the variance of the errors (residuals) in a regression model is not constant across observations, typically varying with one or more of the predictors. It can be caused by outliers in the data or by omitted-variable bias. Heteroskedasticity leaves the OLS point estimates unbiased but less precise, while the usual standard-error estimates become biased, invalidating conventional inference.
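To make this concrete, here is a minimal simulated sketch (the names x, y, and sim_fit are illustrative, not part of the analysis below): the error standard deviation grows with the regressor, which produces the classic fan shape in a residual plot.
set.seed(42)
x <- runif(500, 1, 10)
y <- 2 + 3 * x + rnorm(500, sd = x)  # error sd grows with x: heteroskedastic by construction
sim_fit <- lm(y ~ x)
plot(sim_fit, which = 1)  # residuals fan out as fitted values increase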
White's test:
Null Hypothesis: Homoskedasticity (fail to reject when p > threshold, e.g., 0.05)
Alternative Hypothesis: Heteroskedasticity (reject the null when p < 0.05)
The test logic is sound: by regressing the squared residuals on the regressors, we can see whether the variation in the errors is attributable to the regressors used in the model. Under homoskedasticity, we should observe coefficient estimates near zero (other than the constant) and a small R^2 in this auxiliary regression.
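Formally, White's test statistic is the number of observations times the R^2 of that auxiliary regression, which under the null of homoskedasticity is asymptotically chi-squared with degrees of freedom equal to the number of auxiliary regressors excluding the constant (here k = 9: three regressors, three squares, three interactions):
\[ nR^2_{aux} \sim \chi^2_{k} \]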
library(AER)
library(dplyr)
library(stargazer)
library(ggplot2)
library(rmarkdown)
library(skedastic)
library(MASS)
library(tidyverse)
library(psych)
data("Guns")
subset_Guns <- Guns[, c(2, 4, 5, 8)]  # keep violent, robbery, prisoners, male
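Since MASS masks dplyr::select(), an equivalent name-based selection (a sketch, selecting the same four columns) would be:
subset_Guns <- dplyr::select(Guns, violent, robbery, prisoners, male)  # same columns, selected by name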
\[ violent_i = \beta_0 + \beta_1\,robbery_i + \beta_2\,male_i + \beta_3\,prisoners_i + \epsilon_i \]
violent: violent crime rate (incidents per 100,000 residents)
robbery: robbery rate (incidents per 100,000 residents)
male: percent of the state population that is male, ages 10 to 29
prisoners: incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents)
OLS_1 <- lm(violent ~ ., data = subset_Guns)
stargazer(OLS_1, type = "text")
##
## ===============================================
## Dependent variable:
## ---------------------------
## violent
## -----------------------------------------------
## robbery 1.458***
## (0.025)
##
## prisoners 0.546***
## (0.027)
##
## male 4.776**
## (2.284)
##
## Constant 66.654*
## (39.150)
##
## -----------------------------------------------
## Observations 1,173
## R2 0.876
## Adjusted R2 0.875
## Residual Std. Error 118.052 (df = 1169)
## F Statistic 2,742.709*** (df = 3; 1169)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(subset_Guns, type = "text")
##
## =================================================
## Statistic N Mean St. Dev. Min Max
## -------------------------------------------------
## violent 1,173 503.075 334.277 47.000 2,921.800
## robbery 1,173 161.820 170.510 6.400 1,635.100
## prisoners 1,173 226.580 178.888 19 1,913
## male 1,173 16.081 1.732 12.214 22.353
## -------------------------------------------------
white(mainlm = OLS_1, interactions = TRUE)
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 284. 5.05e-56 9 White's Test greater
The p-value of 5.05e-56 is far below the 0.05 threshold, so we reject the null hypothesis of homoskedasticity: the model's errors are heteroskedastic.
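As a quick cross-check, the studentized Breusch-Pagan test from lmtest (loaded along with AER) tests the closely related null that the error variance does not depend on the regressors; a minimal sketch:
bptest(OLS_1)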
plot(OLS_1, which = 1)
In the residuals vs. fitted plot above, there is a strong pattern in the data: the smoothed trend line is far from flat, and the residuals are tightly clustered at low fitted values and widely dispersed at high fitted values, with one outlier flagged by the plot (point 189). This fan shape points toward heteroskedasticity.
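A scale-location plot offers a complementary diagnostic: if the spread of the standardized residuals grows with the fitted values, its trend line slopes upward rather than staying flat.
plot(OLS_1, which = 3)  # sqrt(|standardized residuals|) vs. fitted values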
subset_Guns$residuals <- resid(object = OLS_1)  # store OLS residuals for the auxiliary regression
summary(subset_Guns$residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -492.20 -84.08 -15.52 0.00 64.36 488.30
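The auxiliary regression below follows White's specification, regressing the squared residuals on the regressors, their squares, and their pairwise interactions:
\[ \hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 robbery_i + \alpha_2 male_i + \alpha_3 prisoners_i + \alpha_4 robbery_i^2 + \alpha_5 male_i^2 + \alpha_6 prisoners_i^2 + \alpha_7 robbery_i\,male_i + \alpha_8 robbery_i\,prisoners_i + \alpha_9 male_i\,prisoners_i + u_i \]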
subset_Guns$squared_residuals <- subset_Guns$residuals^2  # squared residuals are the dependent variable
Auxiliary_test <- lm(squared_residuals ~ robbery + male + prisoners + I(robbery^2) + I(male^2) + I(prisoners^2) + robbery:male + robbery:prisoners + male:prisoners, data = subset_Guns )
stargazer(Auxiliary_test, type = "text")
##
## ===============================================
## Dependent variable:
## ---------------------------
## squared_residuals
## -----------------------------------------------
## robbery 246.017***
## (78.102)
##
## male -33,530.860***
## (8,559.157)
##
## prisoners -183.813*
## (99.396)
##
## I(robbery2) 0.149***
## (0.014)
##
## I(male2) 980.787***
## (243.591)
##
## I(prisoners2) 0.127***
## (0.020)
##
## robbery:male -14.708***
## (4.753)
##
## robbery:prisoners -0.274***
## (0.029)
##
## male:prisoners 11.682**
## (5.903)
##
## Constant 291,238.600***
## (75,138.460)
##
## -----------------------------------------------
## Observations 1,173
## R2 0.243
## Adjusted R2 0.237
## Residual Std. Error 23,159.340 (df = 1163)
## F Statistic 41.376*** (df = 9; 1163)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Based on the auxiliary regression, the R^2 is 0.243 (adjusted R^2 = 0.237), which tells us that roughly 24% of the variation in the squared residuals is explained by the regressors, their squares, and their interactions. Because the squared residuals are this predictable from the regressors, the model exhibits heteroskedasticity.
Auxiliary_test_summary <- summary(Auxiliary_test)
# White's test statistic is n * R^2 from the auxiliary regression, asymptotically
# chi-squared with df = 9 (the number of auxiliary regressors)
chisq_p_value <- pchisq(q = Auxiliary_test_summary$r.squared * nobs(Auxiliary_test), df = 9, lower.tail = FALSE)
chisq_p_value
## [1] 5.052167e-56
The p-value computed above matches the p-value produced by the white() test from the 'skedastic' package.
# Test statistic (n * R^2)
Auxiliary_test_summary$r.squared * nobs(Auxiliary_test)
## [1] 284.4949
This replicates the test statistic reported by skedastic::white() as well.
Based on the tests run above, it is safe to say that this model's errors are heteroskedastic: the p-value of 5.052167e-56 is extremely small, and the R^2 of 0.243 from White's auxiliary regression shows that 24.3% of the variation in the squared residuals is explained by the regressors, their squares, and their interactions. The test statistic of 284.4949, judged against a chi-squared distribution with only 9 degrees of freedom, leaves no real doubt that the error variance depends on the regressors in the model.
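Because the point estimates remain unbiased, a common remedy is to keep the OLS coefficients but replace the conventional standard errors with heteroskedasticity-robust (White) ones. A minimal sketch using the sandwich and lmtest functions that load with AER (HC1 is one common variant, not the only choice):
coeftest(OLS_1, vcov = vcovHC(OLS_1, type = "HC1"))  # robust t-tests for the same coefficients
# robust SEs can also be passed to stargazer:
# stargazer(OLS_1, se = list(sqrt(diag(vcovHC(OLS_1, type = "HC1")))), type = "text")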