Issue Summary
What is “heteroskedasticity”, and what econometric issue does it cause
(does it affect point estimates or standard errors)? Do not confuse
heteroskedasticity with other terms like multicollinearity, serial
correlation, et cetera (2-3 sentences in your own words, e.g. do not
copy/paste directly from the web.)
- Heteroscedasticity is when the variance of the error term is not
constant across the range of the explanatory variables. In a plot of the
residuals, heteroscedasticity often shows up as residuals that fan out.
It affects the standard errors rather than the point estimates: the OLS
coefficient estimates remain unbiased, but the usual standard errors are
incorrect, so t-tests, p-values, and confidence intervals can be
misleading.
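A minimal simulated sketch of the fanning-out pattern described above. The data, the seed, and the names x, u, y, sim_mod, and sim_res are illustrative assumptions and are not part of the assignment data; the error standard deviation is set proportional to x so that the spread of the residuals grows with x.

```r
# Simulated illustration (assumed data, not from USArrests)
set.seed(123)                           # arbitrary seed for reproducibility
n <- 500
x <- runif(n, min = 1, max = 10)
u <- rnorm(n, mean = 0, sd = 0.5 * x)   # error sd grows with x
y <- 2 + 3 * x + u

sim_mod <- lm(y ~ x)
sim_res <- resid(sim_mod)

# Compare the residual spread for small and large values of x
sd_low  <- sd(sim_res[x <  median(x)])
sd_high <- sd(sim_res[x >= median(x)])
c(sd_low = sd_low, sd_high = sd_high)

# Residuals vs. fitted values: the cloud widens from left to right
plot(fitted(sim_mod), sim_res,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The slope estimate from this kind of simulation should still land close to the true value of 3; it is the reported standard errors that can no longer be trusted.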
What are the null and alternative hypotheses in the BP or White test?
The hypotheses are the same, but the auxiliary
regression specification is slightly different. Do you agree with the
test logic? (2-3 sentences)
- The null hypothesis is that homoscedasticity is present. The
alternative hypothesis is that heteroscedasticity is present.
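Written out under the usual textbook setup, the two tests share the same hypotheses and differ only in the auxiliary regression. The symbols \(u\), \(x_j\), \(\delta_j\), \(\gamma_j\), \(\theta_{jl}\), and \(v\) are notation assumed here for illustration, not taken from the assignment.

```latex
\[
H_0:\ \operatorname{Var}(u \mid x_1,\dots,x_k) = \sigma^2
\qquad\text{vs.}\qquad
H_1:\ \operatorname{Var}(u \mid x_1,\dots,x_k)\ \text{is not constant}
\]
\[
\text{BP:}\quad
\hat{u}^2 = \delta_0 + \sum_{j=1}^{k} \delta_j x_j + v
\]
\[
\text{White:}\quad
\hat{u}^2 = \delta_0 + \sum_{j=1}^{k} \delta_j x_j
          + \sum_{j=1}^{k} \gamma_j x_j^2
          + \sum_{j<l} \theta_{jl}\, x_j x_l + v
\]
```

In both cases the null corresponds to all slope coefficients of the auxiliary regression being zero, and \(n R^2\) from that regression is compared with a chi-squared distribution whose degrees of freedom equal the number of auxiliary regressors.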
1
data("USArrests")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(USArrests)
## Rows: 50
## Columns: 4
## $ Murder <dbl> 13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4, 5.3, 2.…
## $ Assault <int> 236, 263, 294, 190, 276, 204, 110, 238, 335, 211, 46, 120, 24…
## $ UrbanPop <int> 58, 48, 80, 50, 91, 78, 77, 72, 80, 60, 83, 54, 83, 65, 57, 6…
## $ Rape <dbl> 21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 25.8, 2…
2 White's Test
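library("skedastic")
# Assumed definition: arrests_mod is not shown in the original; this
# specification is inferred from the regressors of the auxiliary regression
arrests_mod <- lm(formula = Murder ~ Assault + UrbanPop + Rape,
                  data = USArrests)
sked_white_arrests <- white(mainlm = arrests_mod,
                            interactions = TRUE)
sked_white_arrests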
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 5.96 0.744 9 White's Test greater
The test statistic has a value of 5.96 and, under the null hypothesis,
follows a chi-squared distribution with 9 degrees of freedom (the
parameter column: the number of regressors, excluding the constant, in
the auxiliary regression). Its p-value is approximately 0.744 (74.4%).
We fail to reject the null hypothesis because the p-value is greater
than 0.05. We do not have enough evidence to conclude that
heteroscedasticity exists in the regression model.
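For comparison, a BP-style version of the same test can be sketched in base R by dropping the squares and interactions from the auxiliary regression. This sketch assumes the main model is Murder ~ Assault + UrbanPop + Rape (the model definition is not shown in this write-up) and uses the n * R-squared (studentized) form of the statistic; the names bp_main, bp_dat, bp_aux, bp_stat, and bp_p are hypothetical.

```r
data("USArrests")

# Assumed main model; the original definition of arrests_mod is not shown
bp_main <- lm(Murder ~ Assault + UrbanPop + Rape, data = USArrests)

# Auxiliary regression: squared residuals on the original regressors only
bp_dat <- USArrests
bp_dat$u2 <- resid(bp_main)^2
bp_aux <- lm(u2 ~ Assault + UrbanPop + Rape, data = bp_dat)

# LM statistic n * R^2; chi-squared with 3 degrees of freedom under the null
bp_stat <- nobs(bp_aux) * summary(bp_aux)$r.squared
bp_p    <- pchisq(q = bp_stat, df = 3, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

Because the BP auxiliary regression is nested inside the White one, its R-squared, and therefore its statistic, cannot exceed the White statistic; what changes the conclusion, if anything, is the smaller number of degrees of freedom.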
3 Auxiliary Regression
Y variable
USArrests$residuals <- resid(object = arrests_mod)
summary(USArrests$residuals) # mean should be zero
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.3990 -1.9127 -0.3444 0.0000 1.2557 7.4279
USArrests$squared_residuals <- (USArrests$residuals)^2
summary(USArrests$squared_residuals) # should not have any negative values
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02619 0.60569 2.71720 6.09665 7.52751 55.17357
# USArrests$residuals <- NULL # uncomment to remove the residuals column
X variable
arrests_auxillary <- lm(formula = squared_residuals ~ Assault + UrbanPop + Rape +
                          I(Assault^2) + I(UrbanPop^2) + I(Rape^2) +
                          Assault:UrbanPop + UrbanPop:Rape + Rape:Assault,
                        data = USArrests)
auxillary_r2 <- summary(arrests_auxillary)
auxillary_r2
##
## Call:
## lm(formula = squared_residuals ~ Assault + UrbanPop + Rape +
## I(Assault^2) + I(UrbanPop^2) + I(Rape^2) + Assault:UrbanPop +
## UrbanPop:Rape + Rape:Assault, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.912 -4.985 -1.990 3.403 44.633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.0656406 25.4178271 -0.199 0.843
## Assault 0.0421292 0.1644486 0.256 0.799
## UrbanPop -0.1534841 0.8893057 -0.173 0.864
## Rape 1.4438482 1.4803244 0.975 0.335
## I(Assault^2) -0.0001310 0.0003706 -0.353 0.726
## I(UrbanPop^2) 0.0012443 0.0074934 0.166 0.869
## I(Rape^2) -0.0096548 0.0239051 -0.404 0.688
## Assault:UrbanPop 0.0004574 0.0022185 0.206 0.838
## UrbanPop:Rape -0.0107482 0.0147820 -0.727 0.471
## Assault:Rape -0.0004960 0.0060879 -0.081 0.935
##
## Residual standard error: 9.57 on 40 degrees of freedom
## Multiple R-squared: 0.1192, Adjusted R-squared: -0.07895
## F-statistic: 0.6016 on 9 and 40 DF, p-value: 0.7879
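The R-squared of the auxiliary regression is low (0.1192): the
regressors, their squares, and their interactions explain little of the
variation in the squared residuals, which suggests no heteroscedasticity.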
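As a rough check on how low is "low", the R-squared can be converted into the test statistic by hand. The 5% significance level and the rounded critical value \(\chi^2_{0.95}(9) \approx 16.92\) are assumptions taken from a standard chi-squared table, with \(n = 50\) observations and 9 auxiliary regressors as in the output above.

```latex
\[
n R^2 = 50 \times 0.1192 \approx 5.96 \;<\; \chi^2_{0.95}(9) \approx 16.92
\]
```

Put differently, with \(n = 50\) the auxiliary R-squared would have to exceed roughly \(16.92 / 50 \approx 0.34\) before the null of homoscedasticity would be rejected at the 5% level.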
4 Chi Square Test
sked_white_arrests
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 5.96 0.744 9 White's Test greater
# Degrees of freedom: the number of parameters estimated in the auxiliary
# regression minus 1 (for the constant term), i.e. the number of regressors
# in the auxiliary regression
chisq_p_value <- pchisq(q = auxillary_r2$r.squared * nobs(arrests_auxillary),
                        df = 9,
                        lower.tail = FALSE)
chisq_p_value
## [1] 0.7437783
We get the same p-value from the chi-square test as we did in the
White test.
auxillary_r2$r.squared * nobs(arrests_auxillary)
## [1] 5.961369
We get the same test statistic as we did in the White Test.
# critical value (upper tail, 5% significance level)
qchisq(p = .05, df = 9, lower.tail = FALSE)
## [1] 16.91898
# test statistic
auxillary_r2$r.squared * nobs(arrests_auxillary)
## [1] 5.961369
Fail to reject the null of homoscedasticity! The test statistic (5.96)
is less than the 5% critical value (16.92).
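The p-value rule and the critical-value rule always lead to the same decision. A small sketch using the statistic reported above (5.961369 with 9 degrees of freedom); the variable names and the 5% level are assumptions for illustration.

```r
white_stat <- 5.961369   # n * R^2 from the auxiliary regression above
df_aux     <- 9          # number of auxiliary regressors
alpha      <- 0.05       # assumed significance level

# Upper-tail critical value and p-value of the chi-squared distribution
crit_value <- qchisq(p = alpha, df = df_aux, lower.tail = FALSE)
p_value    <- pchisq(q = white_stat, df = df_aux, lower.tail = FALSE)

# Reject only if the statistic exceeds the critical value,
# which happens exactly when the p-value falls below alpha
reject_by_crit <- white_stat > crit_value
reject_by_p    <- p_value < alpha
c(crit_value = crit_value, p_value = p_value)
reject_by_crit == reject_by_p
```

Note that the critical value needs lower.tail = FALSE (or, equivalently, p = 0.95), because the rejection region of this test is the upper tail.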