Detecting Heteroscedasticity

Issue Summary

What is “heteroskedasticity”, and the econometric issue it causes (affects point estimates or standard errors)? Do not confuse heteroskedasticity with other terms like multicollinearity, serial correlation, et cetra (2-3 sentences in your own words - EG do not copy/paste directly from web.)
1. Heteroscedasticity is when the variance of the residuals is not equal over a range of values. In a graph of residuals when heteroscedasticity is detected, the residuals may look like they are fanning out. The econometric issue it causes is that the standard errors might be incorrect, due to the model not being the best fit. This can cause for analyses to be skewed.
What is the null and alternative hypothesis in BPLinks to an external site. or WhiteLinks to an external site. test? The hypothesis are the same, but the auxiliary regression specification is slightly different. Do you agree with the test logic? (2-3 sentences)
1. The null hypothesis is that homoscedasticity is present. The alternative hypothesis is that heteroscedasticity is present.

1

data("USArrests")
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

glimpse(USArrests)

## Rows: 50
## Columns: 4
## $ Murder   <dbl> 13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4, 5.3, 2.…
## $ Assault  <int> 236, 263, 294, 190, 276, 204, 110, 238, 335, 211, 46, 120, 24…
## $ UrbanPop <int> 58, 48, 80, 50, 91, 78, 77, 72, 80, 60, 83, 54, 83, 65, 57, 6…
## $ Rape     <dbl> 21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 25.8, 2…

Main Regression

arrests_mod<- lm(formula = Murder ~ UrbanPop + Assault + Rape, 
             data    = USArrests
             )
summary(arrests_mod)

## 
## Call:
## lm(formula = Murder ~ UrbanPop + Assault + Rape, data = USArrests)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3990 -1.9127 -0.3444  1.2557  7.4279 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.276639   1.737997   1.885   0.0657 .  
## UrbanPop    -0.054694   0.027880  -1.962   0.0559 .  
## Assault      0.039777   0.005912   6.729 2.33e-08 ***
## Rape         0.061399   0.055740   1.102   0.2764    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.574 on 46 degrees of freedom
## Multiple R-squared:  0.6721, Adjusted R-squared:  0.6507 
## F-statistic: 31.42 on 3 and 46 DF,  p-value: 3.322e-11

plot(arrests_mod)

2

Whites Test

library("skedastic")
sked_white_arrests <- white(mainlm = arrests_mod, 
                                  interactions = TRUE
                                  ) 
sked_white_arrests

## # A tibble: 1 × 5
##   statistic p.value parameter method       alternative
##       <dbl>   <dbl>     <dbl> <chr>        <chr>      
## 1      5.96   0.744         9 White's Test greater

The test statistic has a value of 5.96, follows a chi-squared distribution with parameter (the number of regressors without the constant in the model) degrees of freedom, and it has a p-value of approximately \(74.37 \%\). We fail to reject the null hypothesis because the p value is greater than .05. We do not have enough evidence to conclude that heteroscedasticity exists in the regression model.

3 Auxiliary Regression

Y variable

USArrests$residuals <- resid(object = arrests_mod)

  summary(USArrests$residuals) # mean should be zero

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -4.3990 -1.9127 -0.3444  0.0000  1.2557  7.4279

USArrests$squared_residuals <- (USArrests$residuals)^2 

  summary(USArrests$squared_residuals) # should not have any negative values

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.02619  0.60569  2.71720  6.09665  7.52751 55.17357

  # subset_Boston$residuals <- NULL  # remove residuals

X variable

arrests_auxillary <- lm(formula = squared_residuals ~ Assault     +   UrbanPop      +  Rape  + 
                                                     I(Assault^2)  + I(UrbanPop^2)   + I(Rape^2) +
                                                   Assault:UrbanPop + UrbanPop:Rape + Rape:Assault ,
                             data = USArrests
                   ) 

auxillary_r2 <- summary(arrests_auxillary)

auxillary_r2

## 
## Call:
## lm(formula = squared_residuals ~ Assault + UrbanPop + Rape + 
##     I(Assault^2) + I(UrbanPop^2) + I(Rape^2) + Assault:UrbanPop + 
##     UrbanPop:Rape + Rape:Assault, data = USArrests)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.912 -4.985 -1.990  3.403 44.633 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -5.0656406 25.4178271  -0.199    0.843
## Assault           0.0421292  0.1644486   0.256    0.799
## UrbanPop         -0.1534841  0.8893057  -0.173    0.864
## Rape              1.4438482  1.4803244   0.975    0.335
## I(Assault^2)     -0.0001310  0.0003706  -0.353    0.726
## I(UrbanPop^2)     0.0012443  0.0074934   0.166    0.869
## I(Rape^2)        -0.0096548  0.0239051  -0.404    0.688
## Assault:UrbanPop  0.0004574  0.0022185   0.206    0.838
## UrbanPop:Rape    -0.0107482  0.0147820  -0.727    0.471
## Assault:Rape     -0.0004960  0.0060879  -0.081    0.935
## 
## Residual standard error: 9.57 on 40 degrees of freedom
## Multiple R-squared:  0.1192, Adjusted R-squared:  -0.07895 
## F-statistic: 0.6016 on 9 and 40 DF,  p-value: 0.7879

R-squared is low here, suggesting no heteroscedasticty.

4 Chi Square Test

sked_white_arrests

## # A tibble: 1 × 5
##   statistic p.value parameter method       alternative
##       <dbl>   <dbl>     <dbl> <chr>        <chr>      
## 1      5.96   0.744         9 White's Test greater

chisq_p_value <-  pchisq(q = auxillary_r2$r.squared * nobs(arrests_auxillary), 
                         df = 9,       # degrees of freedom is the number of parameters estimated in the model minus 1 (for constant term) i.e.  equal to the number of variables in the auxillary regression
                         lower.tail = FALSE  )
chisq_p_value

## [1] 0.7437783

We get the same p value from the chi square test as we did in the White Test

auxillary_r2$r.squared * nobs(arrests_auxillary)

## [1] 5.961369

We get the same test statistic as we did in the White Test.

# critical value
qchisq(p = .05, df = 9)

## [1] 3.325113

# test statistic 
qchisq(p  = auxillary_r2$r.squared * nobs(arrests_auxillary), 
       df = 9, 
       lower.tail = FALSE)

## Warning in qchisq(p = auxillary_r2$r.squared * nobs(arrests_auxillary), : NaNs
## produced

## [1] NaN

Fail to reject the null of homoscedasticity! The critical value is less than the test statistic.