Estimiate a multivariate regression

Choosing data

This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973

remove(list=ls())
library("skedastic")
library("stargazer")

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library("psych")
library("lmtest")

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library("AER")

## Loading required package: car

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## Loading required package: sandwich

## Loading required package: survival

data("USArrests")


model <- lm(log(Murder) ~ ., data = USArrests)

stargazer(model, type = "text")

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             log(Murder)        
## -----------------------------------------------
## Assault                      0.006***          
##                               (0.001)          
##                                                
## UrbanPop                      -0.003           
##                               (0.005)          
##                                                
## Rape                           0.015           
##                               (0.009)          
##                                                
## Constant                     0.799***          
##                               (0.281)          
##                                                
## -----------------------------------------------
## Observations                    50             
## R2                             0.650           
## Adjusted R2                    0.627           
## Residual Std. Error       0.416 (df = 46)      
## F Statistic           28.509*** (df = 3; 46)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Next, we will conduct the residual plot analysis

par(mfrow = c(2, 2))
plot(model)

Since this is a relatively small dataset, the first residual vs fitted graph displays a approximately equal relationship between the posiitve and the negative residuals, which indicates a relatively significant linear relationship
The Q-Q plot almost follows a straight line ascending, which indicates the normality of the values are satisfied.
When assessing equal variables, the third graph of scale location indicates that the variance in residuals appears equal (constant) for all values of X/predicted Y, which suggests that the data is homoscedastic.
The last graph indicates the data has outliers.

Testing with Packges

White’s Test

skedastic_package_white  <- white(mainlm =  model, 
                                  interactions = TRUE
                                  )
skedastic_package_white

## # A tibble: 1 × 5
##   statistic p.value parameter method       alternative
##       <dbl>   <dbl>     <dbl> <chr>        <chr>      
## 1      7.62   0.573         9 White's Test greater

The test statistic has a value of 7.62, follows a chi-squared distribution with parameter (the number of regressors without the constant in the model) degrees of freedom, and it has a p-value of approximately 0.573 - thus we fail to reject the null of homoskedasticity.

Manually Testing ” Auxillary Regression”

We will find the residuals from our model and put it in our data frame.

In heteroskedastcity regression tests, the dependent variable is residual squared - so we must construct it manually. Note, the residual is coming from the model for which you want to test for heteroskedasticity, and you can use the resid command to create them easily. The y variable is not income anymore.

USArrests$residuals <- resid(object = model)
summary(USArrests$residuals)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1.248238 -0.320083 -0.002855  0.000000  0.330991  0.785010

Mean is Zero, confirmed.

USArrests$squared_residuals <- (USArrests$residuals)^2 
squared_residuals <- (USArrests$residuals)^2
  summary(USArrests$squared_residuals)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000322 0.0322093 0.1085053 0.1590068 0.1986654 1.5580987

No negative values detected

Creating X values

white_auxillary_reg <- lm(formula = squared_residuals ~  Assault+ UrbanPop  + Rape + 
                                                     I(Assault^2)  + I(UrbanPop^2) + I(Rape^2) +
                                                   Assault:UrbanPop + Assault:Rape + UrbanPop:Rape,
                          data = USArrests)
white_auxillary_reg_summary <- summary(white_auxillary_reg)
white_auxillary_reg_summary

## 
## Call:
## lm(formula = squared_residuals ~ Assault + UrbanPop + Rape + 
##     I(Assault^2) + I(UrbanPop^2) + I(Rape^2) + Assault:UrbanPop + 
##     Assault:Rape + UrbanPop:Rape, data = USArrests)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32363 -0.12121 -0.04277  0.06287  1.14093 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       9.502e-01  6.436e-01   1.476    0.148
## Assault          -2.062e-03  4.164e-03  -0.495    0.623
## UrbanPop         -9.713e-03  2.252e-02  -0.431    0.669
## Rape             -1.808e-02  3.749e-02  -0.482    0.632
## I(Assault^2)     -2.015e-08  9.384e-06  -0.002    0.998
## I(UrbanPop^2)     2.776e-05  1.898e-04   0.146    0.884
## I(Rape^2)         1.947e-06  6.053e-04   0.003    0.997
## Assault:UrbanPop  7.803e-06  5.618e-05   0.139    0.890
## Assault:Rape      5.349e-05  1.542e-04   0.347    0.730
## UrbanPop:Rape     1.004e-04  3.743e-04   0.268    0.790
## 
## Residual standard error: 0.2423 on 40 degrees of freedom
## Multiple R-squared:  0.1524, Adjusted R-squared:  -0.03831 
## F-statistic: 0.7991 on 9 and 40 DF,  p-value: 0.6192

Low R-squared suggests no heteroskedasticity.

chisq_p_value <-  pchisq(q = white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg), 
                         df = 9,       # degrees of freedom is the number of parameters estimated in the model minus 1 (for constant term) i.e.  equal to the number of variables in the auxillary regression
                         lower.tail = FALSE  )
chisq_p_value

## [1] 0.5728579

The Chisquare P-Value matches between Hand and Package

white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg)

## [1] 7.619895

The Critical Value also matches

Breusch-Pagan Test

The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (by default the same explanatory variables are taken as in the main regression model) and rejects if too much of the variance is explained by the additional explanatory variables.

lmtest_package_bp <- bptest(formula = model,  # fitted "lm" object, or run the regression
                         studentize = TRUE   # by default, so this can be commented out
)
skedastic_package_bp <- skedastic::breusch_pagan(model)
# View the test results
lmtest_package_bp

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 5.2704, df = 3, p-value = 0.153

skedastic_package_bp

## # A tibble: 1 × 5
##   statistic p.value parameter method                alternative
##       <dbl>   <dbl>     <dbl> <chr>                 <chr>      
## 1      5.27   0.153         3 Koenker (studentised) greater

lmtest::bptest gives the exact same outpout as skedastic::breusch_pagan, as expected.

The test statistic value is 5.27, follows a chi-squared distribution with parameter (the number of regressors without the constant in the model) degrees of freedom, and it has a p-value of approximately 0.153 thus we fail to reject the null of homoskedasticity.

White Test

Songqi Cheng

2023-04-18

Issue Summary

What is Heteroskedasticity