This assignment will ask you to fit several linear regression models and generalized linear regression models, to investigate the assumptions of those regression models, and to compare the inferences based on conventional standard errors to those based on heteroskedasticity-robust standard errors. I have saved the relevant data files for this assignment in an .Rdata file, which is loaded as follows:

load("Assignment 7 data.RData")
ls()

## [1] "ACS"  "STAR"

Tennessee STAR Experiment

1. Examining model fit (4 pts.)

Recall the Tennessee STAR study, which was a large, block-randomized experiment that evaluated the effect of class size (the number of students per classroom) on elementary-grade students' academic performance. (For further details, see Assignment 4.) The data from this study is stored in the STAR data frame.

The following linear regression model predict a student's 3rd grade math performance (g3tmathss) based on their school (g1schid) and whether the student was assigned to a small- or regular-sized classroom (cmpstype). The model also includes the student's gender, race, and birthdate as covariates.

STAR_fit <- lm(g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid), data = STAR)

Using the diagnostic tools we discussed in class, check the assumptions of this regression model, including

the functional form of the regression specification
homoskedasticity of the errors
normality of the errors
presence of outliers

Present graphical or numerical evidence regarding each assumption, and explain your interpretation of that evidence.

plot(STAR_fit, which = 1)

plot of chunk unnamed-chunk-3

plot(STAR_fit, which = 3)

plot of chunk unnamed-chunk-3

plot(STAR_fit, which = 2)

plot of chunk unnamed-chunk-3

hist(residuals(STAR_fit)) # outlier in Y

plot of chunk unnamed-chunk-3

plot(STAR_fit, which = 5) # outlier in X

plot of chunk unnamed-chunk-3

2. Testing differences by race (1 pt)

Using the STAR_fit regression from Question 1, test whether children's average math scores differ by race (controlling for the other predictors in the model). Report an appropriate test statistic and p-value, and explain your intepretation of the results.

STAR_fit_norace <- update(STAR_fit, .~. - race)
anova(STAR_fit, STAR_fit_norace)

## Analysis of Variance Table
## 
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype + birthdate + gender + factor(g1schid)
##   Res.Df     RSS Df Sum of Sq    F  Pr(>F)    
## 1   1481 1913905                              
## 2   1485 1950477 -4    -36572 7.08 1.2e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the results of the above ANOVA, children's average math scores differ by race when controlling for the other predictors in the model, as the F-value is 7.075 and the its p-value is smaller than 0.05.

3. Testing for effect heterogeneity (1 pt)

Conduct a test for school-level treatment effect heterogeneity, i.e., whether the effect of being in a small class differs by school. Note that you will need to run a new regression model to do this. Report an appropriate test statistic and p-value, and explain your intepretation of the results.

# solution1: test interaction of treat by school
STAR_fit_treatbyschool <- lm(g3tmathss ~ cmpstype * factor(g1schid) + birthdate + gender + race, data = STAR)
summary(STAR_fit_treatbyschool)

## 
## Call:
## lm(formula = g3tmathss ~ cmpstype * factor(g1schid) + birthdate + 
##     gender + race, data = STAR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -101.67  -23.65   -1.39   22.86  133.84 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                          5.35e+02   3.24e+01   16.49  < 2e-16
## cmpstypeSMALL                       -5.05e+01   2.41e+01   -2.09  0.03666
## factor(g1schid)123056               -2.47e+01   2.16e+01   -1.14  0.25266
## factor(g1schid)128076               -3.51e+01   2.12e+01   -1.66  0.09791
## factor(g1schid)128079               -6.89e+01   2.23e+01   -3.08  0.00210
## factor(g1schid)130085                3.72e+00   2.24e+01    0.17  0.86807
## factor(g1schid)159171               -3.15e+01   2.04e+01   -1.54  0.12261
## factor(g1schid)161176               -3.94e+01   2.12e+01   -1.86  0.06329
## factor(g1schid)161183               -2.41e+01   2.16e+01   -1.12  0.26341
## factor(g1schid)162184                7.03e+00   2.16e+01    0.33  0.74467
## factor(g1schid)164198               -2.85e+01   2.19e+01   -1.30  0.19290
## factor(g1schid)168214                6.19e+00   2.39e+01    0.26  0.79594
## factor(g1schid)169229               -3.89e+01   2.00e+01   -1.95  0.05151
## factor(g1schid)169231               -2.89e+01   2.13e+01   -1.36  0.17492
## factor(g1schid)169280               -3.15e+01   2.39e+01   -1.32  0.18776
## factor(g1schid)173312               -8.78e+00   2.05e+01   -0.43  0.66904
## factor(g1schid)176329               -3.59e+01   2.15e+01   -1.67  0.09557
## factor(g1schid)189378               -4.18e+01   2.05e+01   -2.04  0.04193
## factor(g1schid)189382               -1.78e+01   2.00e+01   -0.89  0.37341
## factor(g1schid)189396               -2.75e+01   2.20e+01   -1.25  0.21127
## factor(g1schid)191411               -4.50e+01   2.30e+01   -1.96  0.05070
## factor(g1schid)193423               -2.15e+01   2.15e+01   -1.00  0.31776
## factor(g1schid)201449               -2.06e+01   1.98e+01   -1.04  0.29923
## factor(g1schid)203452               -3.55e+01   1.97e+01   -1.80  0.07194
## factor(g1schid)203457                4.84e+00   2.24e+01    0.22  0.82853
## factor(g1schid)205488               -1.39e+01   2.72e+01   -0.51  0.60863
## factor(g1schid)205490                7.28e+00   2.07e+01    0.35  0.72544
## factor(g1schid)205491               -1.07e+01   2.24e+01   -0.48  0.63422
## factor(g1schid)205492               -5.58e+01   2.29e+01   -2.44  0.01482
## factor(g1schid)208501                4.55e+01   2.14e+01    2.12  0.03392
## factor(g1schid)208503               -2.08e+01   2.40e+01   -0.87  0.38496
## factor(g1schid)209510                2.33e+01   2.20e+01    1.06  0.28898
## factor(g1schid)212522               -1.41e+01   2.40e+01   -0.59  0.55645
## factor(g1schid)215533               -1.08e+01   2.00e+01   -0.54  0.58964
## factor(g1schid)216537               -1.25e+01   1.95e+01   -0.64  0.52074
## factor(g1schid)218562               -9.10e+00   2.11e+01   -0.43  0.66616
## factor(g1schid)221571               -1.44e+01   2.11e+01   -0.68  0.49430
## factor(g1schid)221574               -3.51e+01   2.39e+01   -1.47  0.14246
## factor(g1schid)225585               -3.71e+01   2.07e+01   -1.79  0.07360
## factor(g1schid)228606               -2.33e+00   2.38e+01   -0.10  0.92185
## factor(g1schid)230612               -3.88e+01   2.24e+01   -1.73  0.08361
## factor(g1schid)231616               -1.23e+01   2.52e+01   -0.49  0.62606
## factor(g1schid)234628               -2.98e+01   2.05e+01   -1.45  0.14682
## factor(g1schid)244708               -5.34e+01   2.14e+01   -2.49  0.01275
## factor(g1schid)244723               -2.67e+00   2.17e+01   -0.12  0.90197
## factor(g1schid)244727               -4.02e+01   2.14e+01   -1.87  0.06113
## factor(g1schid)244728               -3.54e+01   2.18e+01   -1.62  0.10516
## factor(g1schid)244736               -2.94e+01   2.51e+01   -1.17  0.24132
## factor(g1schid)244746               -1.65e+01   2.14e+01   -0.77  0.44114
## factor(g1schid)244774               -3.11e+01   2.51e+01   -1.24  0.21414
## factor(g1schid)244776               -4.20e+01   2.22e+01   -1.89  0.05885
## factor(g1schid)244780               -3.15e+01   2.29e+01   -1.38  0.16880
## factor(g1schid)244799               -1.27e+01   2.18e+01   -0.58  0.56164
## factor(g1schid)244801                1.01e+01   2.71e+01    0.37  0.70982
## factor(g1schid)244806               -1.89e+01   2.14e+01   -0.88  0.37935
## factor(g1schid)244831               -2.02e+01   2.38e+01   -0.85  0.39661
## factor(g1schid)244839                1.51e+00   2.23e+01    0.07  0.94598
## factor(g1schid)252885               -3.03e+01   2.06e+01   -1.47  0.14206
## factor(g1schid)253888               -7.74e+00   2.15e+01   -0.36  0.71922
## factor(g1schid)257899               -1.81e+01   1.97e+01   -0.92  0.35666
## factor(g1schid)257905               -9.98e+00   2.04e+01   -0.49  0.62548
## factor(g1schid)259915               -4.65e+01   2.31e+01   -2.02  0.04400
## factor(g1schid)261927               -1.58e+01   2.10e+01   -0.75  0.45097
## factor(g1schid)262937               -5.80e+00   2.07e+01   -0.28  0.77897
## factor(g1schid)264945               -3.70e+01   2.15e+01   -1.72  0.08583
## birthdate                            4.24e-07   6.80e-08    6.23  6.1e-10
## genderMALE                           1.21e+00   1.87e+00    0.65  0.51701
## raceBLACK                           -4.49e+01   1.87e+01   -2.39  0.01679
## raceHISPANIC                         2.94e+00   3.21e+01    0.09  0.92702
## raceOTHER                           -5.83e+01   3.19e+01   -1.83  0.06801
## raceWHITE                           -2.54e+01   1.86e+01   -1.37  0.17212
## cmpstypeSMALL:factor(g1schid)123056  3.68e+01   2.90e+01    1.27  0.20510
## cmpstypeSMALL:factor(g1schid)128076  7.30e+01   2.80e+01    2.61  0.00911
## cmpstypeSMALL:factor(g1schid)128079  1.03e+02   2.92e+01    3.54  0.00042
## cmpstypeSMALL:factor(g1schid)130085  7.79e+01   2.95e+01    2.64  0.00845
## cmpstypeSMALL:factor(g1schid)159171  6.26e+01   2.68e+01    2.33  0.01971
## cmpstypeSMALL:factor(g1schid)161176  6.37e+01   2.81e+01    2.27  0.02349
## cmpstypeSMALL:factor(g1schid)161183  5.58e+01   2.80e+01    1.99  0.04645
## cmpstypeSMALL:factor(g1schid)162184  3.26e+01   2.80e+01    1.16  0.24427
## cmpstypeSMALL:factor(g1schid)164198  6.89e+01   2.92e+01    2.36  0.01843
## cmpstypeSMALL:factor(g1schid)168214  4.27e+01   3.09e+01    1.38  0.16672
## cmpstypeSMALL:factor(g1schid)169229  8.51e+01   2.66e+01    3.20  0.00140
## cmpstypeSMALL:factor(g1schid)169231  4.41e+01   2.79e+01    1.58  0.11453
## cmpstypeSMALL:factor(g1schid)169280  6.74e+01   2.99e+01    2.25  0.02438
## cmpstypeSMALL:factor(g1schid)173312  5.04e+01   2.86e+01    1.76  0.07783
## cmpstypeSMALL:factor(g1schid)176329  4.86e+01   2.82e+01    1.73  0.08464
## cmpstypeSMALL:factor(g1schid)189378  7.98e+01   2.83e+01    2.82  0.00491
## cmpstypeSMALL:factor(g1schid)189382  5.92e+01   2.77e+01    2.14  0.03262
## cmpstypeSMALL:factor(g1schid)189396  8.56e+01   2.89e+01    2.97  0.00307
## cmpstypeSMALL:factor(g1schid)191411  6.45e+01   2.98e+01    2.16  0.03079
## cmpstypeSMALL:factor(g1schid)193423  3.99e+01   2.77e+01    1.44  0.14952
## cmpstypeSMALL:factor(g1schid)201449  6.14e+01   2.65e+01    2.31  0.02080
## cmpstypeSMALL:factor(g1schid)203452  7.98e+01   2.65e+01    3.01  0.00268
## cmpstypeSMALL:factor(g1schid)203457  3.78e+01   2.92e+01    1.30  0.19525
## cmpstypeSMALL:factor(g1schid)205488  6.46e+01   3.29e+01    1.96  0.05001
## cmpstypeSMALL:factor(g1schid)205490 -1.14e+01   3.00e+01   -0.38  0.70254
## cmpstypeSMALL:factor(g1schid)205491  7.00e+01   2.93e+01    2.39  0.01705
## cmpstypeSMALL:factor(g1schid)205492  6.42e+01   3.02e+01    2.13  0.03361
## cmpstypeSMALL:factor(g1schid)208501  2.13e+01   2.80e+01    0.76  0.44689
## cmpstypeSMALL:factor(g1schid)208503  9.27e+01   3.09e+01    3.00  0.00273
## cmpstypeSMALL:factor(g1schid)209510  3.72e+01   2.83e+01    1.31  0.18967
## cmpstypeSMALL:factor(g1schid)212522  3.03e+01   3.02e+01    1.00  0.31693
## cmpstypeSMALL:factor(g1schid)215533  5.24e+01   2.71e+01    1.93  0.05327
## cmpstypeSMALL:factor(g1schid)216537  3.57e+01   2.63e+01    1.36  0.17501
## cmpstypeSMALL:factor(g1schid)218562  4.46e+01   2.78e+01    1.61  0.10831
## cmpstypeSMALL:factor(g1schid)221571  2.99e+01   2.80e+01    1.07  0.28490
## cmpstypeSMALL:factor(g1schid)221574  6.15e+01   3.01e+01    2.04  0.04118
## cmpstypeSMALL:factor(g1schid)225585  5.79e+01   2.74e+01    2.11  0.03467
## cmpstypeSMALL:factor(g1schid)228606  8.38e+01   3.03e+01    2.76  0.00581
## cmpstypeSMALL:factor(g1schid)230612  8.51e+01   2.95e+01    2.89  0.00395
## cmpstypeSMALL:factor(g1schid)231616  6.28e+01   3.15e+01    1.99  0.04663
## cmpstypeSMALL:factor(g1schid)234628  5.97e+01   2.74e+01    2.18  0.02921
## cmpstypeSMALL:factor(g1schid)244708  6.99e+01   2.89e+01    2.42  0.01559
## cmpstypeSMALL:factor(g1schid)244723  6.18e+01   2.84e+01    2.18  0.02950
## cmpstypeSMALL:factor(g1schid)244727  6.69e+01   2.86e+01    2.34  0.01922
## cmpstypeSMALL:factor(g1schid)244728  8.16e+01   3.08e+01    2.65  0.00807
## cmpstypeSMALL:factor(g1schid)244736  5.40e+01   3.21e+01    1.68  0.09256
## cmpstypeSMALL:factor(g1schid)244746  9.33e+01   3.00e+01    3.11  0.00191
## cmpstypeSMALL:factor(g1schid)244774  7.21e+01   3.10e+01    2.33  0.02003
## cmpstypeSMALL:factor(g1schid)244776  8.93e+01   2.86e+01    3.12  0.00183
## cmpstypeSMALL:factor(g1schid)244780  8.00e+01   3.04e+01    2.63  0.00866
## cmpstypeSMALL:factor(g1schid)244799  1.04e+02   2.90e+01    3.59  0.00034
## cmpstypeSMALL:factor(g1schid)244801  2.80e+01   3.31e+01    0.85  0.39679
## cmpstypeSMALL:factor(g1schid)244806  3.71e+01   3.13e+01    1.18  0.23640
## cmpstypeSMALL:factor(g1schid)244831  5.44e+01   3.07e+01    1.77  0.07639
## cmpstypeSMALL:factor(g1schid)244839  5.09e+01   2.89e+01    1.76  0.07849
## cmpstypeSMALL:factor(g1schid)252885  6.08e+01   2.72e+01    2.23  0.02574
## cmpstypeSMALL:factor(g1schid)253888  1.71e+01   2.98e+01    0.57  0.56674
## cmpstypeSMALL:factor(g1schid)257899  4.24e+01   2.75e+01    1.54  0.12291
## cmpstypeSMALL:factor(g1schid)257905  4.16e+01   2.70e+01    1.54  0.12380
## cmpstypeSMALL:factor(g1schid)259915  6.34e+01   2.94e+01    2.15  0.03141
## cmpstypeSMALL:factor(g1schid)261927  5.58e+01   2.74e+01    2.04  0.04193
## cmpstypeSMALL:factor(g1schid)262937  6.65e+01   2.71e+01    2.46  0.01418
## cmpstypeSMALL:factor(g1schid)264945  7.08e+01   2.79e+01    2.53  0.01137
##                                        
## (Intercept)                         ***
## cmpstypeSMALL                       *  
## factor(g1schid)123056                  
## factor(g1schid)128076               .  
## factor(g1schid)128079               ** 
## factor(g1schid)130085                  
## factor(g1schid)159171                  
## factor(g1schid)161176               .  
## factor(g1schid)161183                  
## factor(g1schid)162184                  
## factor(g1schid)164198                  
## factor(g1schid)168214                  
## factor(g1schid)169229               .  
## factor(g1schid)169231                  
## factor(g1schid)169280                  
## factor(g1schid)173312                  
## factor(g1schid)176329               .  
## factor(g1schid)189378               *  
## factor(g1schid)189382                  
## factor(g1schid)189396                  
## factor(g1schid)191411               .  
## factor(g1schid)193423                  
## factor(g1schid)201449                  
## factor(g1schid)203452               .  
## factor(g1schid)203457                  
## factor(g1schid)205488                  
## factor(g1schid)205490                  
## factor(g1schid)205491                  
## factor(g1schid)205492               *  
## factor(g1schid)208501               *  
## factor(g1schid)208503                  
## factor(g1schid)209510                  
## factor(g1schid)212522                  
## factor(g1schid)215533                  
## factor(g1schid)216537                  
## factor(g1schid)218562                  
## factor(g1schid)221571                  
## factor(g1schid)221574                  
## factor(g1schid)225585               .  
## factor(g1schid)228606                  
## factor(g1schid)230612               .  
## factor(g1schid)231616                  
## factor(g1schid)234628                  
## factor(g1schid)244708               *  
## factor(g1schid)244723                  
## factor(g1schid)244727               .  
## factor(g1schid)244728                  
## factor(g1schid)244736                  
## factor(g1schid)244746                  
## factor(g1schid)244774                  
## factor(g1schid)244776               .  
## factor(g1schid)244780                  
## factor(g1schid)244799                  
## factor(g1schid)244801                  
## factor(g1schid)244806                  
## factor(g1schid)244831                  
## factor(g1schid)244839                  
## factor(g1schid)252885                  
## factor(g1schid)253888                  
## factor(g1schid)257899                  
## factor(g1schid)257905                  
## factor(g1schid)259915               *  
## factor(g1schid)261927                  
## factor(g1schid)262937                  
## factor(g1schid)264945               .  
## birthdate                           ***
## genderMALE                             
## raceBLACK                           *  
## raceHISPANIC                           
## raceOTHER                           .  
## raceWHITE                              
## cmpstypeSMALL:factor(g1schid)123056    
## cmpstypeSMALL:factor(g1schid)128076 ** 
## cmpstypeSMALL:factor(g1schid)128079 ***
## cmpstypeSMALL:factor(g1schid)130085 ** 
## cmpstypeSMALL:factor(g1schid)159171 *  
## cmpstypeSMALL:factor(g1schid)161176 *  
## cmpstypeSMALL:factor(g1schid)161183 *  
## cmpstypeSMALL:factor(g1schid)162184    
## cmpstypeSMALL:factor(g1schid)164198 *  
## cmpstypeSMALL:factor(g1schid)168214    
## cmpstypeSMALL:factor(g1schid)169229 ** 
## cmpstypeSMALL:factor(g1schid)169231    
## cmpstypeSMALL:factor(g1schid)169280 *  
## cmpstypeSMALL:factor(g1schid)173312 .  
## cmpstypeSMALL:factor(g1schid)176329 .  
## cmpstypeSMALL:factor(g1schid)189378 ** 
## cmpstypeSMALL:factor(g1schid)189382 *  
## cmpstypeSMALL:factor(g1schid)189396 ** 
## cmpstypeSMALL:factor(g1schid)191411 *  
## cmpstypeSMALL:factor(g1schid)193423    
## cmpstypeSMALL:factor(g1schid)201449 *  
## cmpstypeSMALL:factor(g1schid)203452 ** 
## cmpstypeSMALL:factor(g1schid)203457    
## cmpstypeSMALL:factor(g1schid)205488 .  
## cmpstypeSMALL:factor(g1schid)205490    
## cmpstypeSMALL:factor(g1schid)205491 *  
## cmpstypeSMALL:factor(g1schid)205492 *  
## cmpstypeSMALL:factor(g1schid)208501    
## cmpstypeSMALL:factor(g1schid)208503 ** 
## cmpstypeSMALL:factor(g1schid)209510    
## cmpstypeSMALL:factor(g1schid)212522    
## cmpstypeSMALL:factor(g1schid)215533 .  
## cmpstypeSMALL:factor(g1schid)216537    
## cmpstypeSMALL:factor(g1schid)218562    
## cmpstypeSMALL:factor(g1schid)221571    
## cmpstypeSMALL:factor(g1schid)221574 *  
## cmpstypeSMALL:factor(g1schid)225585 *  
## cmpstypeSMALL:factor(g1schid)228606 ** 
## cmpstypeSMALL:factor(g1schid)230612 ** 
## cmpstypeSMALL:factor(g1schid)231616 *  
## cmpstypeSMALL:factor(g1schid)234628 *  
## cmpstypeSMALL:factor(g1schid)244708 *  
## cmpstypeSMALL:factor(g1schid)244723 *  
## cmpstypeSMALL:factor(g1schid)244727 *  
## cmpstypeSMALL:factor(g1schid)244728 ** 
## cmpstypeSMALL:factor(g1schid)244736 .  
## cmpstypeSMALL:factor(g1schid)244746 ** 
## cmpstypeSMALL:factor(g1schid)244774 *  
## cmpstypeSMALL:factor(g1schid)244776 ** 
## cmpstypeSMALL:factor(g1schid)244780 ** 
## cmpstypeSMALL:factor(g1schid)244799 ***
## cmpstypeSMALL:factor(g1schid)244801    
## cmpstypeSMALL:factor(g1schid)244806    
## cmpstypeSMALL:factor(g1schid)244831 .  
## cmpstypeSMALL:factor(g1schid)244839 .  
## cmpstypeSMALL:factor(g1schid)252885 *  
## cmpstypeSMALL:factor(g1schid)253888    
## cmpstypeSMALL:factor(g1schid)257899    
## cmpstypeSMALL:factor(g1schid)257905    
## cmpstypeSMALL:factor(g1schid)259915 *  
## cmpstypeSMALL:factor(g1schid)261927 *  
## cmpstypeSMALL:factor(g1schid)262937 *  
## cmpstypeSMALL:factor(g1schid)264945 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.3 on 1418 degrees of freedom
##   (1351 observations deleted due to missingness)
## Multiple R-squared:  0.291,  Adjusted R-squared:  0.225 
## F-statistic: 4.38 on 133 and 1418 DF,  p-value: <2e-16

anova(STAR_fit, STAR_fit_treatbyschool)

## Analysis of Variance Table
## 
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype * factor(g1schid) + birthdate + gender + 
##     race
##   Res.Df     RSS Df Sum of Sq   F Pr(>F)    
## 1   1481 1913905                            
## 2   1418 1765069 63    148836 1.9  4e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# solution2: test models with and without school as the predictor
STAR_fit_noschool <- update(STAR_fit, .~. - factor(g1schid))
anova(STAR_fit, STAR_fit_noschool)

## Analysis of Variance Table
## 
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype + birthdate + gender + race
##   Res.Df     RSS  Df Sum of Sq    F Pr(>F)    
## 1   1481 1913905                              
## 2   1544 2269589 -63   -355684 4.37 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the results of the above ANOVA, the effect of being in a small class differs by school, as the F-value is 4.3688 and its p-value is smaller than 0.05.

4. Robust test (+1 Extra Credit)

Repeat the analysis from Question 3, but use heteroskedasticity-robust variance estimation (Use the HC3 version). Report an appropriate test statistic and p-value. Explain your intepretation of the results and note whether your conclusion is consistent or inconsistent with your conclusion from Question 3.

library(lmtest)

## Error: there is no package called 'lmtest'

library(sandwich)

## Error: there is no package called 'sandwich'

# solution1: test interaction of treat by school
waldtest(STAR_fit, STAR_fit_treatbyschool, vcov = vcovHC(STAR_fit_treatbyschool, type = "HC3"), test = "F")

## Error: could not find function "waldtest"

# solution2: test models with and without school as the predictor
waldtest(STAR_fit, STAR_fit_noschool, vcov = vcovHC(STAR_fit, type = "HC3"), test = "F")

## Error: could not find function "waldtest"

Using heteroskedasticity-robust variance estimation, the conclusion is consistent with what I get from Question 3, which is the effect of being in a small class differs by school, as the F-value is 4.3307 and its p-value is smaller than 0.05.

American Communities Survey

In Section 17.1 of Lander (2014), the author provides an example of a Poisson regression model for family size (NumChildren). The data come from the American Communities, which is stored in the ACS data frame. Here is the model discussed in the example:

n_kids_pois <- glm(NumChildren ~ FamilyIncome + FamilyType + OwnRent, family = poisson, data = ACS)
summary(n_kids_pois)

## 
## Call:
## glm(formula = NumChildren ~ FamilyIncome + FamilyType + OwnRent, 
##     family = poisson, data = ACS)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.995  -1.324  -1.205   0.946   6.378  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -3.26e-01   2.10e-02  -15.49   <2e-16 ***
## FamilyIncome         5.42e-07   6.57e-08    8.25   <2e-16 ***
## FamilyTypeMale Head -6.30e-02   3.85e-02   -1.64      0.1    
## FamilyTypeMarried    1.44e-01   2.15e-02    6.71    2e-11 ***
## OwnRentOutright     -1.97e+00   2.29e-01   -8.61   <2e-16 ***
## OwnRentRented        4.09e-01   2.07e-02   19.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 35240  on 22744  degrees of freedom
## Residual deviance: 34643  on 22739  degrees of freedom
## AIC: 61370
## 
## Number of Fisher Scoring iterations: 5

5. GLM for family size (2 pt)

The example from Lander (2014) uses the Poisson variance function \( V(\mu_i) = \mu_i \), which assumes that the variance of the outcome is equal to the mean of the outcome. This is quite a strong assumption. A weaker assumption would be that the variance is of the outcome is proportional to the mean, but not necessarily exactly equal; the variance function would then be \[ V(\mu_i) = \theta \mu_i, \] where the unknown parameter \( \theta > 0 \) measures the extent of under-dispersion (when \( \theta < 1 \)) or over-dispersion (when \( \theta > 1 \)) of the variance. This variance function corresponds to the quasi-Poisson family.

Re-fit the generalized linear model for family size, but use the quasi-Poisson family that allows for under- or over-dispersion. (For details about the syntax, see the examples under ?family.) Compare the point estimates, standard errors, and p-values of the two models. Which model you would recommend using?

n_kids_quasipois <- glm(NumChildren ~ FamilyIncome + FamilyType + OwnRent, family = quasipoisson, data = ACS)
summary(n_kids_quasipois)

## 
## Call:
## glm(formula = NumChildren ~ FamilyIncome + FamilyType + OwnRent, 
##     family = quasipoisson, data = ACS)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.995  -1.324  -1.205   0.946   6.378  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -3.26e-01   2.55e-02  -12.78  < 2e-16 ***
## FamilyIncome         5.42e-07   7.97e-08    6.80  1.1e-11 ***
## FamilyTypeMale Head -6.30e-02   4.66e-02   -1.35     0.18    
## FamilyTypeMarried    1.44e-01   2.60e-02    5.53  3.2e-08 ***
## OwnRentOutright     -1.97e+00   2.78e-01   -7.10  1.3e-12 ***
## OwnRentRented        4.09e-01   2.51e-02   16.31  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.47)
## 
##     Null deviance: 35240  on 22744  degrees of freedom
## Residual deviance: 34643  on 22739  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

All the point estimates are the same in these two models, while each of the standard errors in the Poisson model is smaller than its counterpart in the quasi-Poisson model, and each of the p-values in the Poisson model is smaller than its counterpart in the quasi-Poisson model. I would recommend using the Poisson regression model as the standard errors are smaller in this model.

6. GLM with robust variance estimation (1 pt)

Heteroskedasticity-robust variance estimation works in generalized linear models too. Calculate heteroskedasticity-robust standard errors for the n_kids_pois model from Question 5. Compare the results to the model-based standard errors from the Poisson model and the quasi-Poisson model.

coeftest(n_kids_pois, vcov = vcovHC)

## Error: could not find function "coeftest"

The heteroskedasticity-robust standard errors of the Poison model are larger than those in the Poisson model, but smaller than those in the quasi-Poisson model.

7. Bootstrap dispersion test (+3 Extra credit)

The quasi-Poisson model from Question 5 allows for under- or over-dispersion of the variance, rather than assuming that it is fixed. It would be useful to test whether the weaker model is needed—that is, to test the null hypothesis \( H_0: \theta = 1 \). Use parametric bootstrapping to test this null hypothesis against the one-sided alternative hypothesis \( H_A: \theta > 1 \), by following these steps:

Get the value of the estimated dispersion parameter (call this \( \hat\theta \)) from the model that you fit for Question 5. (Note that this value is not stored in the fitted glm object, but can be obtained from the object produced by the summary function.)
Use the simulate function to generate many (say 1000) vectors of simulated outcome data based on the fitted n_kids_pois model.
For each simulated vector of outcome data, re-fit the quasi-Poisson model and get the values of the estimated dispersion parameter. These are the bootstrapped values of the dispersion estimates, or “booties.”
Compare the distribution of the “booties” to the estimated dispersion parameter \( \hat\theta \) from step 1. The p-value for the null hypothesis is the proportion of “booties” that are greater than \( \hat\theta \).
Explain your interpretation of the results (i.e., do you reject the null hypothesis of unit dispersion?).

#R code here

Explain your interpretation here.

Data analysis project

8. Regression model checking (3 pts.)

Fit a linear regression model using the data from your course project. Using the diagnostic tools we discussed in class, check the model's assumptions, including

the functional form of the regression specification
homoskedasticity of the errors
normality of the errors
presence of outliers

Present graphical or numerical evidence regarding each assumption, and explain your interpretation of that evidence.

load("STORM_DMG.Rdata")

## Warning: cannot open compressed file 'STORM_DMG.Rdata', probable reason
## 'No such file or directory'

## Error: cannot open the connection

STORM_Merged <- merge(STORM, STORM_DMG)

## Error: object 'STORM' not found

TORNADO <- subset(STORM_Merged, EVENT_TYPE == "Tornado")

## Error: object 'STORM_Merged' not found

lm_tornado <- lm(CASUALITIES ~ TOR_F_SCALE + TOR_LENGTH + TOR_WIDTH, data = TORNADO)

## Error: object 'TORNADO' not found

summary(lm_tornado)

## Error: object 'lm_tornado' not found

plot(lm_tornado)

## Error: object 'lm_tornado' not found