What is “heteroskedasticity”, and what econometric issue does it cause (does it affect point estimates or standard errors)? Do not confuse heteroskedasticity with other terms like multicollinearity, serial correlation, et cetera. (2-3 sentences in your own words, e.g. do not copy/paste directly from the web.)
Heteroskedasticity means the variance of the residuals (how far the actual data fall from the predicted values) is not constant: the spread changes depending on the values of the input variables. It does not bias the point estimates; the OLS coefficients remain unbiased. The problem is that the usual standard errors are no longer valid, so the t-tests, p-values, and confidence intervals built on them cannot be trusted.
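As a quick illustration (a minimal sketch, not part of the assignment; it assumes the sandwich and lmtest packages are installed), simulating errors whose spread grows with x shows that the OLS slope stays near the truth while the usual and robust standard errors differ:
# Simulate heteroskedastic errors: the noise sd grows with x
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)
summary(fit)$coefficients # point estimates near (1, 2); usual SEs
lmtest::coeftest(fit, vcov = sandwich::vcovHC(fit, type = "HC1")) # robust SEs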
What is the null and alternative hypothesis in the BP or White test? The hypothesis is the same, but the auxiliary regression specification is slightly different. Do you agree with the test logic? (2-3 sentences)
H0: There is homoskedasticity, i.e. the residuals have constant variance.
Ha: There is heteroskedasticity, i.e. the residual variance depends on the regressors.
The test logic is sensible: if the squared residuals can be explained by the regressors (and, in White's version, their squares and interactions), then the error variance evidently varies with them, which is exactly what heteroskedasticity means.
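A minimal sketch of that logic using this assignment's mtcars variables (the Breusch-Pagan version regresses the squared residuals on the regressors in levels only; lmtest::bptest() serves as a cross-check):
# Breusch-Pagan logic by hand: regress squared residuals on the regressors
fit <- lm(mpg ~ hp + wt + qsec, data = mtcars)
u2 <- resid(fit)^2
aux <- lm(u2 ~ hp + wt + qsec, data = mtcars)
lm_stat <- nobs(aux) * summary(aux)$r.squared # LM statistic = n * R^2
pchisq(lm_stat, df = 3, lower.tail = FALSE) # p-value, df = 3 regressors
lmtest::bptest(fit) # studentized BP test reports the same n * R^2 statistic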
CODING
Choose a dataset, specify your linear regression, and estimate the regression in R. Please keep at least 3 independent variables in your regression. This is your main regression.
Install the “skedastic” package and compute White’s test for your fitted model/main regression. How do you interpret the output, i.e. what is the test statistic and p-value? What is the interpretation, i.e. do you reject/fail to reject the null, and what is the conclusion, i.e. is there heteroskedasticity or not?
subset_mtcars <- mtcars[, c(1, 4, 6, 7)] # indexing to choose my subset (mpg, hp, wt, qsec)
# Linear regression model
lm.mod <- lm(formula = mpg ~ ., data = subset_mtcars)
summary(lm.mod) # confirm you have the independent variables you want
Call:
lm(formula = mpg ~ ., data = subset_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.8591 -1.6418 -0.4636 1.1940 5.6092
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.61053 8.41993 3.279 0.00278 **
hp -0.01782 0.01498 -1.190 0.24418
wt -4.35880 0.75270 -5.791 3.22e-06 ***
qsec 0.51083 0.43922 1.163 0.25463
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.578 on 28 degrees of freedom
Multiple R-squared: 0.8348, Adjusted R-squared: 0.8171
F-statistic: 47.15 on 3 and 28 DF, p-value: 4.506e-11
# Create a residual plot of residuals vs. fitted values
plot(lm.mod, which = 1)
# Set up the plotting region / graphical parameters to have four subplots
par(mfrow = c(2, 2))
# Create all four diagnostic plots
plot(lm.mod)
# Reset the plotting region
par(mfrow = c(1, 1))
library("skedastic") # install.packages("lmtest") # FOR BREUSH PAGAN TESTrequire("lmtest")
Loading required package: lmtest
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
?white # implements the method of White (1980) for testing for heteroskedasticity in a linear regression model; read the paper in your Dropbox folder for Week 4
# Conduct White's test for heteroskedasticity
skedastic_package_white <- white(mainlm = lm.mod, interactions = TRUE)
# View the test results
skedastic_package_white
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 12.5 0.185 9 White's Test greater
The test statistic is 12.5 (n times the R-squared of the auxiliary regression) with 9 degrees of freedom: three regressors, three squared terms, and three interactions. The p-value of 0.185 is greater than 0.05, so we fail to reject the null and conclude there is no evidence of heteroskedasticity.
Now, run the auxiliary regression and interpret the R-squared value - what does it tell you about heteroscedasticity? To compute the auxiliary regression, you will first have to find the residuals from the main regression and square them; this vector will be your dependent variable. Your independent variables will be the original regressors, their squared terms, and their pairwise interactions.
subset_mtcars$residuals <- resid(object = lm.mod)
summary(subset_mtcars$residuals) # mean should be (numerically) zero
subset_mtcars$squared_residuals <- (subset_mtcars$residuals)^2
summary(subset_mtcars$squared_residuals) # should not have any negative values
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02014 0.61070 2.15717 5.81435 7.41810 31.46273
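Fit the White auxiliary regression of the squared residuals on the regressors, their squares, and their pairwise interactions; the object names here are the ones assumed by the chi-squared calculation further down:
# White auxiliary regression: levels, squares, and interactions of the regressors
white_auxillary_reg <- lm(formula = squared_residuals ~ hp + wt + qsec + I(hp^2) + I(wt^2) + I(qsec^2) + wt:hp + wt:qsec + hp:qsec, data = subset_mtcars)
white_auxillary_reg_summary <- summary(white_auxillary_reg)
white_auxillary_reg_summary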
Call:
lm(formula = squared_residuals ~ hp + wt + qsec + I(hp^2) + I(wt^2) +
I(qsec^2) + wt:hp + wt:qsec + hp:qsec, data = subset_mtcars)
Residuals:
Min 1Q Median 3Q Max
-12.7832 -3.5743 -0.6157 2.7745 19.6617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.763e+02 2.913e+02 -1.292 0.210
hp 1.586e-01 1.342e+00 0.118 0.907
wt 1.168e+02 9.184e+01 1.272 0.217
qsec 2.093e+01 3.006e+01 0.696 0.494
I(hp^2) 1.472e-04 1.388e-03 0.106 0.916
I(wt^2) 5.810e+00 5.763e+00 1.008 0.324
I(qsec^2) 4.856e-02 8.631e-01 0.056 0.956
hp:wt -1.274e-01 2.013e-01 -0.633 0.533
wt:qsec -7.641e+00 5.352e+00 -1.428 0.167
hp:qsec 1.046e-02 7.797e-02 0.134 0.894
Residual standard error: 8.169 on 22 degrees of freedom
Multiple R-squared: 0.3916, Adjusted R-squared: 0.1427
F-statistic: 1.573 on 9 and 22 DF, p-value: 0.1847
The R-squared of the auxiliary regression is 0.39, which is moderate. A low R-squared means the regressors explain little of the variation in the squared residuals, i.e. no heteroskedasticity; a high R-squared would indicate the error variance depends on the regressors, i.e. heteroskedasticity is present. The White statistic is n times this R-squared: 32 * 0.3916 ≈ 12.5.
chisq_p_value <- pchisq(q = white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg),
                        df = 9, # degrees of freedom = number of parameters estimated in the auxiliary model minus 1 (for the constant term), i.e. the number of regressors in the auxiliary regression
                        lower.tail = FALSE)
chisq_p_value
[1] 0.1849867
This is the same p-value reported by the skedastic package's White test above.
skedastic_package_white
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 12.5 0.185 9 White's Test greater
# White test statistic: n times the auxiliary R-squared (reproduces the 12.5 reported above)
white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg)