Heteroskedasticity Discussion

Author

Gina Occhipinti

ISSUE SUMMARY

  1. What is “heteroskedasticity”, and what econometric issue does it cause (does it affect point estimates or standard errors)? Do not confuse heteroskedasticity with other terms like multicollinearity, serial correlation, et cetera. (2-3 sentences in your own words, i.e. do not copy/paste directly from the web.)

    1. Heteroskedasticity occurs when the variance of the residuals (how the actual data deviate from the predicted values) is not constant: the spread of the residuals around zero changes with the values of the input variables. This does not bias the point estimates themselves, but it makes the standard errors unreliable, so the t-statistics, p-values, and confidence intervals built on them cannot be trusted.
  2. What is the null and alternative hypothesis in the BP or White test? The hypotheses are the same, but the auxiliary regression specification is slightly different. Do you agree with the test logic? (2-3 sentences)

    1. H0: There is homoskedasticity, i.e., the variance of the residuals is constant.

    2. Ha: There is heteroskedasticity, i.e., the residual variance depends on the regressors. The test logic makes sense: if the regressors (plus their squares and interactions, in White's version) can explain the squared residuals, then the error variance is not constant, which is exactly what heteroskedasticity means. A small simulated sketch of this logic follows the list below.
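A minimal simulated illustration of this logic (hypothetical data, not part of the assignment): the error standard deviation is made to grow with x, so the squared residuals become predictable from x, which is exactly what the BP/White auxiliary regression detects.

set.seed(42)                               # reproducible simulated data
x <- runif(n = 200, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n = 200, sd = x)    # error spread grows with x
sim.mod <- lm(y ~ x)
plot(fitted(sim.mod), resid(sim.mod))      # fan shape = heteroskedasticity
e2 <- resid(sim.mod)^2
summary(lm(e2 ~ x))                        # x should predict the squared residuals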

CODING

    1. Choose a dataset, specify your linear regression, and estimate the regression in R.  Please keep at least 3 independent variables in your regression. This is your main regression.
mtcars # print the built-in dataset
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
?mtcars # open the help page describing the dataset's variables

\(MPG_i = \beta_0 + \beta_1 HP_i + \beta_2 WT_i + \beta_3 QSEC_i + \epsilon_i\)

  2. Install the “skedastic” package and compute White's test for your fitted model/main regression. How do you interpret the output, i.e. what is the test statistic and p-value? What is the interpretation, i.e. do you reject/fail to reject the null, and what is the conclusion, i.e. is there heteroskedasticity or not?
subset_mtcars <- mtcars[ , c(1, 4, 6, 7)] # indexing to keep mpg, hp, wt, qsec

# Linear regression model
lm.mod <- lm(formula = mpg ~ . , 
             data    = subset_mtcars
             )

summary(lm.mod) # confirm you have the independent variables you want

Call:
lm(formula = mpg ~ ., data = subset_mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8591 -1.6418 -0.4636  1.1940  5.6092 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.61053    8.41993   3.279  0.00278 ** 
hp          -0.01782    0.01498  -1.190  0.24418    
wt          -4.35880    0.75270  -5.791 3.22e-06 ***
qsec         0.51083    0.43922   1.163  0.25463    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.578 on 28 degrees of freedom
Multiple R-squared:  0.8348,    Adjusted R-squared:  0.8171 
F-statistic: 47.15 on 3 and 28 DF,  p-value: 4.506e-11
# Create a residual plot of residuals vs. fitted values
plot(lm.mod, which = 1)

# Set up the plotting region / graphical parameters to have four subplots
par(mfrow = c(2, 2))

# Create all four diagnostic plots (residuals vs. fitted, Q-Q, scale-location, residuals vs. leverage)
plot(lm.mod)

par(mfrow = c(1, 1))
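The skedastic package is loaded next; if it has not been installed yet, the one-time step below can be uncommented first (assuming a standard CRAN install):

# install.packages("skedastic")   # one-time installation from CRAN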
library("skedastic") 

# install.packages("lmtest")    # FOR BREUSH PAGAN TEST
require("lmtest")
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
?bptest # help page for the Breusch-Pagan test in the lmtest package
?white # This function implements the popular method of White (1980) for testing for heteroskedasticity in a linear regression model. Read the paper in your Dropbox folder for Week 4.


# Conduct White test for heteroscedasticity
skedastic_package_white  <- white(mainlm =  lm.mod, 
                                  interactions = TRUE
                                  )         

# View the test results
skedastic_package_white
# A tibble: 1 × 5
  statistic p.value parameter method       alternative
      <dbl>   <dbl>     <dbl> <chr>        <chr>      
1      12.5   0.185         9 White's Test greater    

The test statistic is 12.53 with a p-value of 0.185. Since the p-value is greater than 0.05, we fail to reject the null of homoskedasticity: there is no evidence of heteroskedasticity in this model.
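As a cross-check (not required by the prompt), the Breusch-Pagan test is available from the lmtest package loaded above; it tests the same hypotheses but its auxiliary regression uses only the original regressors, without squares or interactions. A minimal sketch (output omitted):

# Breusch-Pagan test on the same fitted model
bptest(lm.mod)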

Now, run the auxiliary regression and interpret the R-squared value - what does it tell you about heteroscedasticity? To compute the auxiliary regression, you will first have to find the residuals from the main regression and square them; this vector will be your dependent variable. Your independent variables will be the original regressors, their squares, and their pairwise interactions.

subset_mtcars$residuals <- resid(object = lm.mod)

  summary(subset_mtcars$residuals) # mean should be approximately zero
subset_mtcars$squared_residuals <- (subset_mtcars$residuals)^2 

  summary(subset_mtcars$squared_residuals) # should not have any negative values
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.02014  0.61070  2.15717  5.81435  7.41810 31.46273 
  subset_mtcars$residuals <- NULL  # remove the raw residuals column so only the squared residuals remain
white_auxiliary_reg <- lm(formula = squared_residuals ~ hp + wt + qsec +
                            I(hp^2) + I(wt^2) + I(qsec^2) +
                            wt:hp + wt:qsec + hp:qsec,
                          data = subset_mtcars
                          )

white_auxiliary_reg_summary <- summary(white_auxiliary_reg)
white_auxiliary_reg_summary

Call:
lm(formula = squared_residuals ~ hp + wt + qsec + I(hp^2) + I(wt^2) + 
    I(qsec^2) + wt:hp + wt:qsec + hp:qsec, data = subset_mtcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.7832  -3.5743  -0.6157   2.7745  19.6617 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.763e+02  2.913e+02  -1.292    0.210
hp           1.586e-01  1.342e+00   0.118    0.907
wt           1.168e+02  9.184e+01   1.272    0.217
qsec         2.093e+01  3.006e+01   0.696    0.494
I(hp^2)      1.472e-04  1.388e-03   0.106    0.916
I(wt^2)      5.810e+00  5.763e+00   1.008    0.324
I(qsec^2)    4.856e-02  8.631e-01   0.056    0.956
hp:wt       -1.274e-01  2.013e-01  -0.633    0.533
wt:qsec     -7.641e+00  5.352e+00  -1.428    0.167
hp:qsec      1.046e-02  7.797e-02   0.134    0.894

Residual standard error: 8.169 on 22 degrees of freedom
Multiple R-squared:  0.3916,    Adjusted R-squared:  0.1427 
F-statistic: 1.573 on 9 and 22 DF,  p-value: 0.1847

The auxiliary R-squared is 0.39, which is fairly low. A low R-squared means the regressors, their squares, and their interactions explain little of the variation in the squared residuals, consistent with homoskedasticity; a high R-squared would indicate that heteroskedasticity is present.
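White's test converts this R-squared into a formal statistic: \(W = nR^2 = 32 \times 0.3916 \approx 12.53\), which under the null follows a \(\chi^2\) distribution with 9 degrees of freedom (one per auxiliary regressor). The p-value can then be computed directly: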

chisq_p_value <- pchisq(q = white_auxiliary_reg_summary$r.squared * nobs(white_auxiliary_reg),
                        df = 9,  # parameters in the auxiliary regression minus 1 (the constant), i.e. the number of auxiliary regressors
                        lower.tail = FALSE
                        )
chisq_p_value
[1] 0.1849867

This is the same p-value the skedastic package reported above (0.185).

skedastic_package_white
# A tibble: 1 × 5
  statistic p.value parameter method       alternative
      <dbl>   <dbl>     <dbl> <chr>        <chr>      
1      12.5   0.185         9 White's Test greater    
# TEST STATISTIC: n times the auxiliary R-squared
white_auxiliary_reg_summary$r.squared * nobs(white_auxiliary_reg)
[1] 12.53129

This gives the test statistic to compare against the chi-squared critical value.

# critical value: upper 5% point of the chi-squared distribution with 9 df
qchisq(p = 0.05, df = 9, lower.tail = FALSE)
[1] 16.91898
# decision: is the test statistic larger than the critical value?
(white_auxiliary_reg_summary$r.squared * nobs(white_auxiliary_reg)) > qchisq(p = 0.05, df = 9, lower.tail = FALSE)
[1] FALSE

Since the test statistic (12.53) is below the critical value (16.92), we again fail to reject the null: there is no evidence of heteroskedasticity in the main regression.