This assignment will ask you to fit several linear regression models and generalized linear regression models, to investigate the assumptions of those regression models, and to compare the inferences based on conventional standard errors to those based on heteroskedasticity-robust standard errors. I have saved the relevant data files for this assignment in an .Rdata file, which is loaded as follows:
load("Assignment 7 data.RData")
ls()
## [1] "ACS" "STAR"
Recall the Tennessee STAR study, which was a large, block-randomized experiment that evaluated the effect of class size (the number of students per classroom) on elementary-grade students' academic performance. (For further details, see Assignment 4.) The data from this study is stored in the STAR data frame.
The following linear regression model predict a student's 3rd grade math performance (g3tmathss) based on their school (g1schid) and whether the student was assigned to a small- or regular-sized classroom (cmpstype). The model also includes the student's gender, race, and birthdate as covariates.
STAR_fit <- lm(g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid), data = STAR)
Using the diagnostic tools we discussed in class, check the assumptions of this regression model, including
Present graphical or numerical evidence regarding each assumption, and explain your interpretation of that evidence.
plot(STAR_fit, which = 1)
plot(STAR_fit, which = 3)
plot(STAR_fit, which = 2)
hist(residuals(STAR_fit)) # outlier in Y
plot(STAR_fit, which = 5) # outlier in X
Using the STAR_fit regression from Question 1, test whether children's average math scores differ by race (controlling for the other predictors in the model). Report an appropriate test statistic and p-value, and explain your intepretation of the results.
STAR_fit_norace <- update(STAR_fit, .~. - race)
anova(STAR_fit, STAR_fit_norace)
## Analysis of Variance Table
##
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype + birthdate + gender + factor(g1schid)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1481 1913905
## 2 1485 1950477 -4 -36572 7.08 1.2e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the results of the above ANOVA, children's average math scores differ by race when controlling for the other predictors in the model, as the F-value is 7.075 and the its p-value is smaller than 0.05.
Conduct a test for school-level treatment effect heterogeneity, i.e., whether the effect of being in a small class differs by school. Note that you will need to run a new regression model to do this. Report an appropriate test statistic and p-value, and explain your intepretation of the results.
# solution1: test interaction of treat by school
STAR_fit_treatbyschool <- lm(g3tmathss ~ cmpstype * factor(g1schid) + birthdate + gender + race, data = STAR)
summary(STAR_fit_treatbyschool)
##
## Call:
## lm(formula = g3tmathss ~ cmpstype * factor(g1schid) + birthdate +
## gender + race, data = STAR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.67 -23.65 -1.39 22.86 133.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.35e+02 3.24e+01 16.49 < 2e-16
## cmpstypeSMALL -5.05e+01 2.41e+01 -2.09 0.03666
## factor(g1schid)123056 -2.47e+01 2.16e+01 -1.14 0.25266
## factor(g1schid)128076 -3.51e+01 2.12e+01 -1.66 0.09791
## factor(g1schid)128079 -6.89e+01 2.23e+01 -3.08 0.00210
## factor(g1schid)130085 3.72e+00 2.24e+01 0.17 0.86807
## factor(g1schid)159171 -3.15e+01 2.04e+01 -1.54 0.12261
## factor(g1schid)161176 -3.94e+01 2.12e+01 -1.86 0.06329
## factor(g1schid)161183 -2.41e+01 2.16e+01 -1.12 0.26341
## factor(g1schid)162184 7.03e+00 2.16e+01 0.33 0.74467
## factor(g1schid)164198 -2.85e+01 2.19e+01 -1.30 0.19290
## factor(g1schid)168214 6.19e+00 2.39e+01 0.26 0.79594
## factor(g1schid)169229 -3.89e+01 2.00e+01 -1.95 0.05151
## factor(g1schid)169231 -2.89e+01 2.13e+01 -1.36 0.17492
## factor(g1schid)169280 -3.15e+01 2.39e+01 -1.32 0.18776
## factor(g1schid)173312 -8.78e+00 2.05e+01 -0.43 0.66904
## factor(g1schid)176329 -3.59e+01 2.15e+01 -1.67 0.09557
## factor(g1schid)189378 -4.18e+01 2.05e+01 -2.04 0.04193
## factor(g1schid)189382 -1.78e+01 2.00e+01 -0.89 0.37341
## factor(g1schid)189396 -2.75e+01 2.20e+01 -1.25 0.21127
## factor(g1schid)191411 -4.50e+01 2.30e+01 -1.96 0.05070
## factor(g1schid)193423 -2.15e+01 2.15e+01 -1.00 0.31776
## factor(g1schid)201449 -2.06e+01 1.98e+01 -1.04 0.29923
## factor(g1schid)203452 -3.55e+01 1.97e+01 -1.80 0.07194
## factor(g1schid)203457 4.84e+00 2.24e+01 0.22 0.82853
## factor(g1schid)205488 -1.39e+01 2.72e+01 -0.51 0.60863
## factor(g1schid)205490 7.28e+00 2.07e+01 0.35 0.72544
## factor(g1schid)205491 -1.07e+01 2.24e+01 -0.48 0.63422
## factor(g1schid)205492 -5.58e+01 2.29e+01 -2.44 0.01482
## factor(g1schid)208501 4.55e+01 2.14e+01 2.12 0.03392
## factor(g1schid)208503 -2.08e+01 2.40e+01 -0.87 0.38496
## factor(g1schid)209510 2.33e+01 2.20e+01 1.06 0.28898
## factor(g1schid)212522 -1.41e+01 2.40e+01 -0.59 0.55645
## factor(g1schid)215533 -1.08e+01 2.00e+01 -0.54 0.58964
## factor(g1schid)216537 -1.25e+01 1.95e+01 -0.64 0.52074
## factor(g1schid)218562 -9.10e+00 2.11e+01 -0.43 0.66616
## factor(g1schid)221571 -1.44e+01 2.11e+01 -0.68 0.49430
## factor(g1schid)221574 -3.51e+01 2.39e+01 -1.47 0.14246
## factor(g1schid)225585 -3.71e+01 2.07e+01 -1.79 0.07360
## factor(g1schid)228606 -2.33e+00 2.38e+01 -0.10 0.92185
## factor(g1schid)230612 -3.88e+01 2.24e+01 -1.73 0.08361
## factor(g1schid)231616 -1.23e+01 2.52e+01 -0.49 0.62606
## factor(g1schid)234628 -2.98e+01 2.05e+01 -1.45 0.14682
## factor(g1schid)244708 -5.34e+01 2.14e+01 -2.49 0.01275
## factor(g1schid)244723 -2.67e+00 2.17e+01 -0.12 0.90197
## factor(g1schid)244727 -4.02e+01 2.14e+01 -1.87 0.06113
## factor(g1schid)244728 -3.54e+01 2.18e+01 -1.62 0.10516
## factor(g1schid)244736 -2.94e+01 2.51e+01 -1.17 0.24132
## factor(g1schid)244746 -1.65e+01 2.14e+01 -0.77 0.44114
## factor(g1schid)244774 -3.11e+01 2.51e+01 -1.24 0.21414
## factor(g1schid)244776 -4.20e+01 2.22e+01 -1.89 0.05885
## factor(g1schid)244780 -3.15e+01 2.29e+01 -1.38 0.16880
## factor(g1schid)244799 -1.27e+01 2.18e+01 -0.58 0.56164
## factor(g1schid)244801 1.01e+01 2.71e+01 0.37 0.70982
## factor(g1schid)244806 -1.89e+01 2.14e+01 -0.88 0.37935
## factor(g1schid)244831 -2.02e+01 2.38e+01 -0.85 0.39661
## factor(g1schid)244839 1.51e+00 2.23e+01 0.07 0.94598
## factor(g1schid)252885 -3.03e+01 2.06e+01 -1.47 0.14206
## factor(g1schid)253888 -7.74e+00 2.15e+01 -0.36 0.71922
## factor(g1schid)257899 -1.81e+01 1.97e+01 -0.92 0.35666
## factor(g1schid)257905 -9.98e+00 2.04e+01 -0.49 0.62548
## factor(g1schid)259915 -4.65e+01 2.31e+01 -2.02 0.04400
## factor(g1schid)261927 -1.58e+01 2.10e+01 -0.75 0.45097
## factor(g1schid)262937 -5.80e+00 2.07e+01 -0.28 0.77897
## factor(g1schid)264945 -3.70e+01 2.15e+01 -1.72 0.08583
## birthdate 4.24e-07 6.80e-08 6.23 6.1e-10
## genderMALE 1.21e+00 1.87e+00 0.65 0.51701
## raceBLACK -4.49e+01 1.87e+01 -2.39 0.01679
## raceHISPANIC 2.94e+00 3.21e+01 0.09 0.92702
## raceOTHER -5.83e+01 3.19e+01 -1.83 0.06801
## raceWHITE -2.54e+01 1.86e+01 -1.37 0.17212
## cmpstypeSMALL:factor(g1schid)123056 3.68e+01 2.90e+01 1.27 0.20510
## cmpstypeSMALL:factor(g1schid)128076 7.30e+01 2.80e+01 2.61 0.00911
## cmpstypeSMALL:factor(g1schid)128079 1.03e+02 2.92e+01 3.54 0.00042
## cmpstypeSMALL:factor(g1schid)130085 7.79e+01 2.95e+01 2.64 0.00845
## cmpstypeSMALL:factor(g1schid)159171 6.26e+01 2.68e+01 2.33 0.01971
## cmpstypeSMALL:factor(g1schid)161176 6.37e+01 2.81e+01 2.27 0.02349
## cmpstypeSMALL:factor(g1schid)161183 5.58e+01 2.80e+01 1.99 0.04645
## cmpstypeSMALL:factor(g1schid)162184 3.26e+01 2.80e+01 1.16 0.24427
## cmpstypeSMALL:factor(g1schid)164198 6.89e+01 2.92e+01 2.36 0.01843
## cmpstypeSMALL:factor(g1schid)168214 4.27e+01 3.09e+01 1.38 0.16672
## cmpstypeSMALL:factor(g1schid)169229 8.51e+01 2.66e+01 3.20 0.00140
## cmpstypeSMALL:factor(g1schid)169231 4.41e+01 2.79e+01 1.58 0.11453
## cmpstypeSMALL:factor(g1schid)169280 6.74e+01 2.99e+01 2.25 0.02438
## cmpstypeSMALL:factor(g1schid)173312 5.04e+01 2.86e+01 1.76 0.07783
## cmpstypeSMALL:factor(g1schid)176329 4.86e+01 2.82e+01 1.73 0.08464
## cmpstypeSMALL:factor(g1schid)189378 7.98e+01 2.83e+01 2.82 0.00491
## cmpstypeSMALL:factor(g1schid)189382 5.92e+01 2.77e+01 2.14 0.03262
## cmpstypeSMALL:factor(g1schid)189396 8.56e+01 2.89e+01 2.97 0.00307
## cmpstypeSMALL:factor(g1schid)191411 6.45e+01 2.98e+01 2.16 0.03079
## cmpstypeSMALL:factor(g1schid)193423 3.99e+01 2.77e+01 1.44 0.14952
## cmpstypeSMALL:factor(g1schid)201449 6.14e+01 2.65e+01 2.31 0.02080
## cmpstypeSMALL:factor(g1schid)203452 7.98e+01 2.65e+01 3.01 0.00268
## cmpstypeSMALL:factor(g1schid)203457 3.78e+01 2.92e+01 1.30 0.19525
## cmpstypeSMALL:factor(g1schid)205488 6.46e+01 3.29e+01 1.96 0.05001
## cmpstypeSMALL:factor(g1schid)205490 -1.14e+01 3.00e+01 -0.38 0.70254
## cmpstypeSMALL:factor(g1schid)205491 7.00e+01 2.93e+01 2.39 0.01705
## cmpstypeSMALL:factor(g1schid)205492 6.42e+01 3.02e+01 2.13 0.03361
## cmpstypeSMALL:factor(g1schid)208501 2.13e+01 2.80e+01 0.76 0.44689
## cmpstypeSMALL:factor(g1schid)208503 9.27e+01 3.09e+01 3.00 0.00273
## cmpstypeSMALL:factor(g1schid)209510 3.72e+01 2.83e+01 1.31 0.18967
## cmpstypeSMALL:factor(g1schid)212522 3.03e+01 3.02e+01 1.00 0.31693
## cmpstypeSMALL:factor(g1schid)215533 5.24e+01 2.71e+01 1.93 0.05327
## cmpstypeSMALL:factor(g1schid)216537 3.57e+01 2.63e+01 1.36 0.17501
## cmpstypeSMALL:factor(g1schid)218562 4.46e+01 2.78e+01 1.61 0.10831
## cmpstypeSMALL:factor(g1schid)221571 2.99e+01 2.80e+01 1.07 0.28490
## cmpstypeSMALL:factor(g1schid)221574 6.15e+01 3.01e+01 2.04 0.04118
## cmpstypeSMALL:factor(g1schid)225585 5.79e+01 2.74e+01 2.11 0.03467
## cmpstypeSMALL:factor(g1schid)228606 8.38e+01 3.03e+01 2.76 0.00581
## cmpstypeSMALL:factor(g1schid)230612 8.51e+01 2.95e+01 2.89 0.00395
## cmpstypeSMALL:factor(g1schid)231616 6.28e+01 3.15e+01 1.99 0.04663
## cmpstypeSMALL:factor(g1schid)234628 5.97e+01 2.74e+01 2.18 0.02921
## cmpstypeSMALL:factor(g1schid)244708 6.99e+01 2.89e+01 2.42 0.01559
## cmpstypeSMALL:factor(g1schid)244723 6.18e+01 2.84e+01 2.18 0.02950
## cmpstypeSMALL:factor(g1schid)244727 6.69e+01 2.86e+01 2.34 0.01922
## cmpstypeSMALL:factor(g1schid)244728 8.16e+01 3.08e+01 2.65 0.00807
## cmpstypeSMALL:factor(g1schid)244736 5.40e+01 3.21e+01 1.68 0.09256
## cmpstypeSMALL:factor(g1schid)244746 9.33e+01 3.00e+01 3.11 0.00191
## cmpstypeSMALL:factor(g1schid)244774 7.21e+01 3.10e+01 2.33 0.02003
## cmpstypeSMALL:factor(g1schid)244776 8.93e+01 2.86e+01 3.12 0.00183
## cmpstypeSMALL:factor(g1schid)244780 8.00e+01 3.04e+01 2.63 0.00866
## cmpstypeSMALL:factor(g1schid)244799 1.04e+02 2.90e+01 3.59 0.00034
## cmpstypeSMALL:factor(g1schid)244801 2.80e+01 3.31e+01 0.85 0.39679
## cmpstypeSMALL:factor(g1schid)244806 3.71e+01 3.13e+01 1.18 0.23640
## cmpstypeSMALL:factor(g1schid)244831 5.44e+01 3.07e+01 1.77 0.07639
## cmpstypeSMALL:factor(g1schid)244839 5.09e+01 2.89e+01 1.76 0.07849
## cmpstypeSMALL:factor(g1schid)252885 6.08e+01 2.72e+01 2.23 0.02574
## cmpstypeSMALL:factor(g1schid)253888 1.71e+01 2.98e+01 0.57 0.56674
## cmpstypeSMALL:factor(g1schid)257899 4.24e+01 2.75e+01 1.54 0.12291
## cmpstypeSMALL:factor(g1schid)257905 4.16e+01 2.70e+01 1.54 0.12380
## cmpstypeSMALL:factor(g1schid)259915 6.34e+01 2.94e+01 2.15 0.03141
## cmpstypeSMALL:factor(g1schid)261927 5.58e+01 2.74e+01 2.04 0.04193
## cmpstypeSMALL:factor(g1schid)262937 6.65e+01 2.71e+01 2.46 0.01418
## cmpstypeSMALL:factor(g1schid)264945 7.08e+01 2.79e+01 2.53 0.01137
##
## (Intercept) ***
## cmpstypeSMALL *
## factor(g1schid)123056
## factor(g1schid)128076 .
## factor(g1schid)128079 **
## factor(g1schid)130085
## factor(g1schid)159171
## factor(g1schid)161176 .
## factor(g1schid)161183
## factor(g1schid)162184
## factor(g1schid)164198
## factor(g1schid)168214
## factor(g1schid)169229 .
## factor(g1schid)169231
## factor(g1schid)169280
## factor(g1schid)173312
## factor(g1schid)176329 .
## factor(g1schid)189378 *
## factor(g1schid)189382
## factor(g1schid)189396
## factor(g1schid)191411 .
## factor(g1schid)193423
## factor(g1schid)201449
## factor(g1schid)203452 .
## factor(g1schid)203457
## factor(g1schid)205488
## factor(g1schid)205490
## factor(g1schid)205491
## factor(g1schid)205492 *
## factor(g1schid)208501 *
## factor(g1schid)208503
## factor(g1schid)209510
## factor(g1schid)212522
## factor(g1schid)215533
## factor(g1schid)216537
## factor(g1schid)218562
## factor(g1schid)221571
## factor(g1schid)221574
## factor(g1schid)225585 .
## factor(g1schid)228606
## factor(g1schid)230612 .
## factor(g1schid)231616
## factor(g1schid)234628
## factor(g1schid)244708 *
## factor(g1schid)244723
## factor(g1schid)244727 .
## factor(g1schid)244728
## factor(g1schid)244736
## factor(g1schid)244746
## factor(g1schid)244774
## factor(g1schid)244776 .
## factor(g1schid)244780
## factor(g1schid)244799
## factor(g1schid)244801
## factor(g1schid)244806
## factor(g1schid)244831
## factor(g1schid)244839
## factor(g1schid)252885
## factor(g1schid)253888
## factor(g1schid)257899
## factor(g1schid)257905
## factor(g1schid)259915 *
## factor(g1schid)261927
## factor(g1schid)262937
## factor(g1schid)264945 .
## birthdate ***
## genderMALE
## raceBLACK *
## raceHISPANIC
## raceOTHER .
## raceWHITE
## cmpstypeSMALL:factor(g1schid)123056
## cmpstypeSMALL:factor(g1schid)128076 **
## cmpstypeSMALL:factor(g1schid)128079 ***
## cmpstypeSMALL:factor(g1schid)130085 **
## cmpstypeSMALL:factor(g1schid)159171 *
## cmpstypeSMALL:factor(g1schid)161176 *
## cmpstypeSMALL:factor(g1schid)161183 *
## cmpstypeSMALL:factor(g1schid)162184
## cmpstypeSMALL:factor(g1schid)164198 *
## cmpstypeSMALL:factor(g1schid)168214
## cmpstypeSMALL:factor(g1schid)169229 **
## cmpstypeSMALL:factor(g1schid)169231
## cmpstypeSMALL:factor(g1schid)169280 *
## cmpstypeSMALL:factor(g1schid)173312 .
## cmpstypeSMALL:factor(g1schid)176329 .
## cmpstypeSMALL:factor(g1schid)189378 **
## cmpstypeSMALL:factor(g1schid)189382 *
## cmpstypeSMALL:factor(g1schid)189396 **
## cmpstypeSMALL:factor(g1schid)191411 *
## cmpstypeSMALL:factor(g1schid)193423
## cmpstypeSMALL:factor(g1schid)201449 *
## cmpstypeSMALL:factor(g1schid)203452 **
## cmpstypeSMALL:factor(g1schid)203457
## cmpstypeSMALL:factor(g1schid)205488 .
## cmpstypeSMALL:factor(g1schid)205490
## cmpstypeSMALL:factor(g1schid)205491 *
## cmpstypeSMALL:factor(g1schid)205492 *
## cmpstypeSMALL:factor(g1schid)208501
## cmpstypeSMALL:factor(g1schid)208503 **
## cmpstypeSMALL:factor(g1schid)209510
## cmpstypeSMALL:factor(g1schid)212522
## cmpstypeSMALL:factor(g1schid)215533 .
## cmpstypeSMALL:factor(g1schid)216537
## cmpstypeSMALL:factor(g1schid)218562
## cmpstypeSMALL:factor(g1schid)221571
## cmpstypeSMALL:factor(g1schid)221574 *
## cmpstypeSMALL:factor(g1schid)225585 *
## cmpstypeSMALL:factor(g1schid)228606 **
## cmpstypeSMALL:factor(g1schid)230612 **
## cmpstypeSMALL:factor(g1schid)231616 *
## cmpstypeSMALL:factor(g1schid)234628 *
## cmpstypeSMALL:factor(g1schid)244708 *
## cmpstypeSMALL:factor(g1schid)244723 *
## cmpstypeSMALL:factor(g1schid)244727 *
## cmpstypeSMALL:factor(g1schid)244728 **
## cmpstypeSMALL:factor(g1schid)244736 .
## cmpstypeSMALL:factor(g1schid)244746 **
## cmpstypeSMALL:factor(g1schid)244774 *
## cmpstypeSMALL:factor(g1schid)244776 **
## cmpstypeSMALL:factor(g1schid)244780 **
## cmpstypeSMALL:factor(g1schid)244799 ***
## cmpstypeSMALL:factor(g1schid)244801
## cmpstypeSMALL:factor(g1schid)244806
## cmpstypeSMALL:factor(g1schid)244831 .
## cmpstypeSMALL:factor(g1schid)244839 .
## cmpstypeSMALL:factor(g1schid)252885 *
## cmpstypeSMALL:factor(g1schid)253888
## cmpstypeSMALL:factor(g1schid)257899
## cmpstypeSMALL:factor(g1schid)257905
## cmpstypeSMALL:factor(g1schid)259915 *
## cmpstypeSMALL:factor(g1schid)261927 *
## cmpstypeSMALL:factor(g1schid)262937 *
## cmpstypeSMALL:factor(g1schid)264945 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.3 on 1418 degrees of freedom
## (1351 observations deleted due to missingness)
## Multiple R-squared: 0.291, Adjusted R-squared: 0.225
## F-statistic: 4.38 on 133 and 1418 DF, p-value: <2e-16
anova(STAR_fit, STAR_fit_treatbyschool)
## Analysis of Variance Table
##
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype * factor(g1schid) + birthdate + gender +
## race
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1481 1913905
## 2 1418 1765069 63 148836 1.9 4e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# solution2: test models with and without school as the predictor
STAR_fit_noschool <- update(STAR_fit, .~. - factor(g1schid))
anova(STAR_fit, STAR_fit_noschool)
## Analysis of Variance Table
##
## Model 1: g3tmathss ~ cmpstype + birthdate + gender + race + factor(g1schid)
## Model 2: g3tmathss ~ cmpstype + birthdate + gender + race
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1481 1913905
## 2 1544 2269589 -63 -355684 4.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the results of the above ANOVA, the effect of being in a small class differs by school, as the F-value is 4.3688 and its p-value is smaller than 0.05.
Repeat the analysis from Question 3, but use heteroskedasticity-robust variance estimation (Use the HC3 version). Report an appropriate test statistic and p-value. Explain your intepretation of the results and note whether your conclusion is consistent or inconsistent with your conclusion from Question 3.
library(lmtest)
## Error: there is no package called 'lmtest'
library(sandwich)
## Error: there is no package called 'sandwich'
# solution1: test interaction of treat by school
waldtest(STAR_fit, STAR_fit_treatbyschool, vcov = vcovHC(STAR_fit_treatbyschool, type = "HC3"), test = "F")
## Error: could not find function "waldtest"
# solution2: test models with and without school as the predictor
waldtest(STAR_fit, STAR_fit_noschool, vcov = vcovHC(STAR_fit, type = "HC3"), test = "F")
## Error: could not find function "waldtest"
Using heteroskedasticity-robust variance estimation, the conclusion is consistent with what I get from Question 3, which is the effect of being in a small class differs by school, as the F-value is 4.3307 and its p-value is smaller than 0.05.
In Section 17.1 of Lander (2014), the author provides an example of a Poisson regression model for family size (NumChildren). The data come from the American Communities, which is stored in the ACS data frame. Here is the model discussed in the example:
n_kids_pois <- glm(NumChildren ~ FamilyIncome + FamilyType + OwnRent, family = poisson, data = ACS)
summary(n_kids_pois)
##
## Call:
## glm(formula = NumChildren ~ FamilyIncome + FamilyType + OwnRent,
## family = poisson, data = ACS)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.995 -1.324 -1.205 0.946 6.378
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.26e-01 2.10e-02 -15.49 <2e-16 ***
## FamilyIncome 5.42e-07 6.57e-08 8.25 <2e-16 ***
## FamilyTypeMale Head -6.30e-02 3.85e-02 -1.64 0.1
## FamilyTypeMarried 1.44e-01 2.15e-02 6.71 2e-11 ***
## OwnRentOutright -1.97e+00 2.29e-01 -8.61 <2e-16 ***
## OwnRentRented 4.09e-01 2.07e-02 19.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 35240 on 22744 degrees of freedom
## Residual deviance: 34643 on 22739 degrees of freedom
## AIC: 61370
##
## Number of Fisher Scoring iterations: 5
The example from Lander (2014) uses the Poisson variance function \( V(\mu_i) = \mu_i \), which assumes that the variance of the outcome is equal to the mean of the outcome. This is quite a strong assumption. A weaker assumption would be that the variance is of the outcome is proportional to the mean, but not necessarily exactly equal; the variance function would then be \[ V(\mu_i) = \theta \mu_i, \] where the unknown parameter \( \theta > 0 \) measures the extent of under-dispersion (when \( \theta < 1 \)) or over-dispersion (when \( \theta > 1 \)) of the variance. This variance function corresponds to the quasi-Poisson family.
Re-fit the generalized linear model for family size, but use the quasi-Poisson family that allows for under- or over-dispersion. (For details about the syntax, see the examples under ?family.) Compare the point estimates, standard errors, and p-values of the two models. Which model you would recommend using?
n_kids_quasipois <- glm(NumChildren ~ FamilyIncome + FamilyType + OwnRent, family = quasipoisson, data = ACS)
summary(n_kids_quasipois)
##
## Call:
## glm(formula = NumChildren ~ FamilyIncome + FamilyType + OwnRent,
## family = quasipoisson, data = ACS)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.995 -1.324 -1.205 0.946 6.378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.26e-01 2.55e-02 -12.78 < 2e-16 ***
## FamilyIncome 5.42e-07 7.97e-08 6.80 1.1e-11 ***
## FamilyTypeMale Head -6.30e-02 4.66e-02 -1.35 0.18
## FamilyTypeMarried 1.44e-01 2.60e-02 5.53 3.2e-08 ***
## OwnRentOutright -1.97e+00 2.78e-01 -7.10 1.3e-12 ***
## OwnRentRented 4.09e-01 2.51e-02 16.31 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.47)
##
## Null deviance: 35240 on 22744 degrees of freedom
## Residual deviance: 34643 on 22739 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
All the point estimates are the same in these two models, while each of the standard errors in the Poisson model is smaller than its counterpart in the quasi-Poisson model, and each of the p-values in the Poisson model is smaller than its counterpart in the quasi-Poisson model. I would recommend using the Poisson regression model as the standard errors are smaller in this model.
Heteroskedasticity-robust variance estimation works in generalized linear models too. Calculate heteroskedasticity-robust standard errors for the n_kids_pois model from Question 5. Compare the results to the model-based standard errors from the Poisson model and the quasi-Poisson model.
coeftest(n_kids_pois, vcov = vcovHC)
## Error: could not find function "coeftest"
The heteroskedasticity-robust standard errors of the Poison model are larger than those in the Poisson model, but smaller than those in the quasi-Poisson model.
The quasi-Poisson model from Question 5 allows for under- or over-dispersion of the variance, rather than assuming that it is fixed. It would be useful to test whether the weaker model is needed—that is, to test the null hypothesis \( H_0: \theta = 1 \). Use parametric bootstrapping to test this null hypothesis against the one-sided alternative hypothesis \( H_A: \theta > 1 \), by following these steps:
glm object, but can be obtained from the object produced by the summary function.)simulate function to generate many (say 1000) vectors of simulated outcome data based on the fitted n_kids_pois model. #R code here
Explain your interpretation here.
Fit a linear regression model using the data from your course project. Using the diagnostic tools we discussed in class, check the model's assumptions, including
Present graphical or numerical evidence regarding each assumption, and explain your interpretation of that evidence.
load("STORM_DMG.Rdata")
## Warning: cannot open compressed file 'STORM_DMG.Rdata', probable reason
## 'No such file or directory'
## Error: cannot open the connection
STORM_Merged <- merge(STORM, STORM_DMG)
## Error: object 'STORM' not found
TORNADO <- subset(STORM_Merged, EVENT_TYPE == "Tornado")
## Error: object 'STORM_Merged' not found
lm_tornado <- lm(CASUALITIES ~ TOR_F_SCALE + TOR_LENGTH + TOR_WIDTH, data = TORNADO)
## Error: object 'TORNADO' not found
summary(lm_tornado)
## Error: object 'lm_tornado' not found
plot(lm_tornado)
## Error: object 'lm_tornado' not found