Author: Elhakim Ibrahim
Instructor: Corey Sparks, PhD1
April 06, 2020
In this exercise, we shall:
offset term is necessary? Why or why not?Poisson regression model for the outcome
Negative binomial modelmodel fits of the alternative models based on AIC valuesThe exercise is based on data for the State of Texas extracted from nationally representative 2016 Behavioral Risk Factor Surveillance System (BRFSS) SMART metro area survey data set were employed2.
setwd("C:/A/01/02S20/D283/H/05")
suppressPackageStartupMessages({
library(car)
library(dplyr)
library(MASS)
library(gtools)
library(lmtest)
library(survey)
library(sjPlot)
library(sjmisc)
library(forcats)
library(stargazer)
library(questionr)
library(htmlTable)
library(sjlabelled)
})load(url("https://github.com/coreysparks/data/blob/master/brfss_2017.Rdata?raw=true"))
mydata0 <- brfss_17
rawnames <- names(mydata0)
newnames <- tolower(gsub(pattern = "_", replacement = "", x = rawnames))
# above identifies and removes '_' from the variable names and changes all
# to lowercase
names(mydata0) <- newnames
head(newnames, n = 35) [1] "dispcode" "statere1" "safetime" "hhadult" "genhlth" "physhlth"
[7] "menthlth" "poorhlth" "hlthpln1" "persdoc2" "medcost" "checkup1"
[13] "bphigh4" "bpmeds" "cholchk1" "toldhi2" "cholmed1" "cvdinfr4"
[19] "cvdcrhd4" "cvdstrk3" "asthma3" "asthnow" "chcscncr" "chcocncr"
[25] "chccopd1" "havarth3" "addepev2" "chckidny" "diabete3" "diabage2"
[31] "lmtjoin3" "arthdis2" "arthsocl" "joinpai1" "sex"
Alcohol consumption
Our outcome of interest is number of days respondent consumed alcohol. We will measure this as a count variable
# explore variable number of alcday5
summarytools::freq(mydata0$alcday5, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$alcday5
Label: DAYS IN PAST 30 HAD ALCOHOLIC BEVERAGE
Type: Numeric
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- -------- --------- -------------- --------- --------------
101 9402 4.3 4.3 4.1 4.1
102 7786 3.5 7.8 3.4 7.4
103 5242 2.4 10.2 2.3 9.7
104 2477 1.1 11.3 1.1 10.8
105 2317 1.1 12.4 1.0 11.8
106 739 0.3 12.7 0.3 12.1
107 4392 2.0 14.7 1.9 14.0
201 18747 8.5 23.2 8.1 22.1
202 14233 6.5 29.7 6.2 28.3
203 8112 3.7 33.4 3.5 31.8
204 6823 3.1 36.5 3.0 34.8
205 6849 3.1 39.6 3.0 37.7
206 2494 1.1 40.8 1.1 38.8
207 1769 0.8 41.6 0.8 39.6
208 2021 0.9 42.5 0.9 40.5
209 179 0.1 42.6 0.1 40.5
210 5266 2.4 45.0 2.3 42.8
211 37 0.0 45.0 0.0 42.8
212 1029 0.5 45.5 0.4 43.3
213 42 0.0 45.5 0.0 43.3
214 350 0.2 45.6 0.2 43.4
215 4238 1.9 47.6 1.8 45.3
216 117 0.1 47.6 0.1 45.3
217 54 0.0 47.6 0.0 45.4
218 114 0.1 47.7 0.0 45.4
219 13 0.0 47.7 0.0 45.4
220 4190 1.9 49.6 1.8 47.2
221 105 0.0 49.6 0.0 47.3
222 86 0.0 49.7 0.0 47.3
223 45 0.0 49.7 0.0 47.3
224 102 0.0 49.8 0.0 47.4
225 1812 0.8 50.6 0.8 48.2
226 94 0.0 50.6 0.0 48.2
227 109 0.0 50.7 0.0 48.2
228 469 0.2 50.9 0.2 48.4
229 194 0.1 51.0 0.1 48.5
230 7606 3.5 54.4 3.3 51.8
777 1712 0.8 55.2 0.7 52.6
888 97541 44.4 99.6 42.2 94.8
999 913 0.4 100.0 0.4 95.2
<NA> 11055 4.8 100.0
Total 230875 100.0 100.0 100.0 100.0
# subset data and limit to realistic number of alcday5
mydata0 = subset(mydata0, alcday5 < 777)
# check the variable structure and counts
str(mydata0$alcday5) num [1:119654] 107 215 210 204 201 102 101 205 107 107 ...
Min. 1st Qu. Median Mean 3rd Qu. Max.
101 107 202 180 207 230
Frequencies
mydata0$alcday5
Type: Numeric
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- -------- --------- -------------- --------- --------------
101 9402 7.9 7.9 7.9 7.9
102 7786 6.5 14.4 6.5 14.4
103 5242 4.4 18.7 4.4 18.7
104 2477 2.1 20.8 2.1 20.8
105 2317 1.9 22.8 1.9 22.8
106 739 0.6 23.4 0.6 23.4
107 4392 3.7 27.0 3.7 27.0
201 18747 15.7 42.7 15.7 42.7
202 14233 11.9 54.6 11.9 54.6
203 8112 6.8 61.4 6.8 61.4
204 6823 5.7 67.1 5.7 67.1
205 6849 5.7 72.8 5.7 72.8
206 2494 2.1 74.9 2.1 74.9
207 1769 1.5 76.4 1.5 76.4
208 2021 1.7 78.1 1.7 78.1
209 179 0.1 78.2 0.1 78.2
210 5266 4.4 82.6 4.4 82.6
211 37 0.0 82.6 0.0 82.6
212 1029 0.9 83.5 0.9 83.5
213 42 0.0 83.5 0.0 83.5
214 350 0.3 83.8 0.3 83.8
215 4238 3.5 87.4 3.5 87.4
216 117 0.1 87.5 0.1 87.5
217 54 0.0 87.5 0.0 87.5
218 114 0.1 87.6 0.1 87.6
219 13 0.0 87.6 0.0 87.6
220 4190 3.5 91.1 3.5 91.1
221 105 0.1 91.2 0.1 91.2
222 86 0.1 91.3 0.1 91.3
223 45 0.0 91.3 0.0 91.3
224 102 0.1 91.4 0.1 91.4
225 1812 1.5 92.9 1.5 92.9
226 94 0.1 93.0 0.1 93.0
227 109 0.1 93.1 0.1 93.1
228 469 0.4 93.5 0.4 93.5
229 194 0.2 93.6 0.2 93.6
230 7606 6.4 100.0 6.4 100.0
<NA> 0 0.0 100.0
Total 119654 100.0 100.0 100.0 100.0
It is hypothesized that number of alcohol consumption days will not differ signifcantly by racial/ethinic identity net of other covariates. To test the hypothesis, we employ race/ethnicity as key indpendent variable and would control for age, sex, educational attainment, employment and smoking statuses as important covariates. Model specification for the outcome will include an offset term. This is due to unequal population shares and expectedly unequal rate of distribution of parity across racial/ethnic groups.
Race/ethnic identity
mydata0$racegr3 <- as.factor(mydata0$racegr3)
mydata0 <- mydata0 %>% mutate(racegrp = factor(racegr3, levels = c(1, 2, 3,
4, 5), labels = c("NH White", "NH Black", "Other", "Other", "Hispanic")))
mydata0$racegrp <- fct_relevel(mydata0$racegrp, "NH White", "NH Black", "Hispanic")
summarytools::freq(mydata0$racegrp, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$racegrp
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
-------------- -------- --------- -------------- --------- --------------
NH White 92815 78.9 78.9 77.6 77.6
NH Black 9634 8.2 87.1 8.1 85.6
Hispanic 9237 7.8 94.9 7.7 93.3
Other 6003 5.1 100.0 5.0 98.4
<NA> 1965 1.6 100.0
Total 119654 100.0 100.0 100.0 100.0
Age group
mydata0$agegrp <- cut(mydata0$age80, breaks = c(0, 24, 39, 59, 79, 99))
summarytools::freq(mydata0$agegrp, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$agegrp
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
------------- -------- --------- -------------- --------- --------------
(0,24] 7587 6.3 6.3 6.3 6.3
(24,39] 24167 20.2 26.5 20.2 26.5
(39,59] 40422 33.8 60.3 33.8 60.3
(59,79] 40801 34.1 94.4 34.1 94.4
(79,99] 6677 5.6 100.0 5.6 100.0
<NA> 0 0.0 100.0
Total 119654 100.0 100.0 100.0 100.0
Sex
mydata0$sex <- Recode(mydata0$sex, recodes = "1='Male';2='Female';else=NA",
as.factor = T)
summarytools::freq(mydata0$sex, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$sex
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
------------ -------- --------- -------------- --------- --------------
Female 60593 50.7 50.7 50.6 50.6
Male 58996 49.3 100.0 49.3 99.9
<NA> 65 0.1 100.0
Total 119654 100.0 100.0 100.0 100.0
Education attainment
mydata0$edu <- Recode(mydata0$educa, recodes = "1:2='0Primary';3='1Some HS';4='2HS Grad';
5='3Some Coll';6='4Coll Grad';9=NA",
as.factor = T)
mydata0$edu <- relevel(mydata0$edu, "4Coll Grad", )
summarytools::freq(mydata0$edu, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$edu
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
---------------- -------- --------- -------------- --------- --------------
4Coll Grad 60998 51.1 51.1 51.0 51.0
0Primary 1168 1.0 52.1 1.0 52.0
1Some HS 2971 2.5 54.6 2.5 54.4
2HS Grad 22669 19.0 73.5 18.9 73.4
3Some Coll 31586 26.5 100.0 26.4 99.8
<NA> 262 0.2 100.0
Total 119654 100.0 100.0 100.0 100.0
Employment status
mydata0$employ <- Recode(mydata0$employ1, recodes = "1:2='0Employed';2:6='3NILF';7='2Retired';
8='1Unemployed';else=NA",
as.factor = T)
summarytools::freq(mydata0$employ, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$employ
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
----------------- -------- --------- -------------- --------- --------------
0Employed 72056 60.6 60.6 60.2 60.2
1Unemployed 3884 3.3 63.9 3.2 63.5
2Retired 30347 25.5 89.4 25.4 88.8
3NILF 12642 10.6 100.0 10.6 99.4
<NA> 725 0.6 100.0
Total 119654 100.0 100.0 100.0 100.0
Smoking status
mydata0$smoke <- Recode(mydata0$smoker3, recodes = "1:2='Current'; 3='Former';
4='Never smoked'; else=NA",
as.factor = T)
mydata0$smoke <- relevel(mydata0$smoke, ref = "Never smoked")
summarytools::freq(mydata0$smoke, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata0$smoke
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
------------------ -------- --------- -------------- --------- --------------
Never smoked 66616 56.0 56.0 55.7 55.7
Current 16487 13.9 69.8 13.8 69.5
Former 35910 30.2 100.0 30.0 99.5
<NA> 641 0.5 100.0
Total 119654 100.0 100.0 100.0 100.0
alcday5 racegrp agegrp sex
Min. :101 NH White:91781 (0,24] : 7382 Female:59073
1st Qu.:107 NH Black: 9485 (24,39]:23435 Male :57125
Median :202 Hispanic: 9036 (39,59]:39226
Mean :180 Other : 5896 (59,79]:39722
3rd Qu.:207 (79,99]: 6433
Max. :230
edu employ smoke
4Coll Grad:59460 0Employed :70480 Never smoked:65077
0Primary : 1110 1Unemployed: 3759 Current :16025
1Some HS : 2879 2Retired :29603 Former :35096
2HS Grad :22044 3NILF :12356
3Some Coll:30705
mmsawt ststr
Min. : 0 Min. : 1011
1st Qu.: 122 1st Qu.:13051
Median : 306 Median :28042
Mean : 686 Mean :26200
3rd Qu.: 748 3rd Qu.:39099
Max. :35484 Max. :52122
Absolute distribution
NH White NH Black Hispanic Other Sum
101 6730 873 934 557 9094
102 5915 682 592 393 7582
103 4218 366 307 219 5110
104 2033 144 141 98 2416
105 2012 93 90 74 2269
106 639 26 25 24 714
107 3751 181 159 154 4245
201 12920 2056 2075 1102 18153
202 10086 1497 1398 818 13799
203 5999 761 705 416 7881
204 5089 588 591 352 6620
205 5365 517 471 314 6667
206 1961 160 176 121 2418
207 1410 117 113 90 1730
208 1620 117 147 94 1978
209 141 14 11 9 175
210 4279 319 299 222 5119
211 27 2 4 3 36
212 844 50 67 46 1007
213 34 1 4 1 40
214 276 26 26 14 342
215 3530 213 215 168 4126
216 97 2 11 4 114
217 40 6 1 5 52
218 90 8 8 4 110
219 7 3 0 1 11
220 3625 166 134 152 4077
221 83 2 7 6 98
222 82 1 1 1 85
223 39 4 0 1 44
224 87 6 4 4 101
225 1632 49 40 48 1769
226 84 3 2 3 92
227 96 6 1 3 106
228 422 12 9 12 455
229 172 4 2 9 187
230 6346 410 266 354 7376
Sum 91781 9485 9036 5896 116198
poisson.fit <- svyglm(alcday5 ~ factor(racegrp) + factor(agegrp) + factor(sex) +
factor(edu) + factor(employ) + factor(smoke), design = survdes, family = poisson)
Call:
svyglm(formula = alcday5 ~ factor(racegrp) + factor(agegrp) +
factor(sex) + factor(edu) + factor(employ) + factor(smoke),
design = survdes, family = poisson)
Survey design:
svydesign(ids = ~1, strata = ~ststr, weights = ~mmsawt, data = mydata)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.202104 0.006094 853.59 < 2e-16 ***
factor(racegrp)NH Black 0.001926 0.005380 0.36 0.72032
factor(racegrp)Hispanic -0.002394 0.005872 -0.41 0.68346
factor(racegrp)Other 0.000495 0.007311 0.07 0.94599
factor(agegrp)(24,39] -0.011627 0.006199 -1.88 0.06072 .
factor(agegrp)(39,59] -0.017805 0.006135 -2.90 0.00371 **
factor(agegrp)(59,79] -0.019309 0.007061 -2.73 0.00625 **
factor(agegrp)(79,99] -0.045230 0.012126 -3.73 0.00019 ***
factor(sex)Male -0.014553 0.003209 -4.53 0.0000057708 ***
factor(edu)0Primary -0.007652 0.015779 -0.48 0.62772
factor(edu)1Some HS -0.006887 0.010102 -0.68 0.49540
factor(edu)2HS Grad 0.013108 0.004362 3.01 0.00266 **
factor(edu)3Some Coll 0.008221 0.003832 2.15 0.03191 *
factor(employ)1Unemployed 0.044737 0.007631 5.86 0.0000000046 ***
factor(employ)2Retired 0.011776 0.005490 2.15 0.03195 *
factor(employ)3NILF -0.003676 0.005119 -0.72 0.47263
factor(smoke)Current 0.002360 0.005013 0.47 0.63778
factor(smoke)Former -0.007652 0.004040 -1.89 0.05821 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 12.5)
Number of Fisher Scoring iterations: 4
factor(racegrp)NH Black factor(racegrp)Hispanic
1.00 1.00
factor(racegrp)Other factor(agegrp)(24,39]
1.00 0.99
factor(agegrp)(39,59] factor(agegrp)(59,79]
0.98 0.98
factor(agegrp)(79,99] factor(sex)Male
0.96 0.99
factor(edu)0Primary factor(edu)1Some HS
0.99 0.99
factor(edu)2HS Grad factor(edu)3Some Coll
1.01 1.01
factor(employ)1Unemployed factor(employ)2Retired
1.05 1.01
factor(employ)3NILF factor(smoke)Current
1.00 1.00
factor(smoke)Former
0.99
Interpretation:
The results suggest that no significant difference in mean count of days of alcohol comsumption by race/ethnicity. Meanwhile, the average count is inversely associated with age with significant reduction in the mean as age increased. The mean count is also lower for males than females. Compared with college graduates, those who are high school graduates and those with some college education had higher mean count of alcohol consumption days. While the retirees had higher mean count of alcohol consumption days than those in other employment groups, no signifcant difference was observed by smoking status.
Check for overdispersion
Overdispersion measure tells about variablity in our data and the degree to which our selected model fit the data. If there is more variability than expected in the data, it is overdispersed. In other words, there is overdispersion which basically implies that the model do not fit the data. The main problem with overdispersion is that it leads to standard errors for model parameters that are too small. Thus, if present and not accounted for, overdispersion could lead us to wrongly infer a relationship as being significant when indeed is not!
We will evaluate overdispersion using two easy approaches:
ratio should equal 1 if the model fits the data; implies that the mean and variance are equaltest should be insignificant with p-value>0 if the model fits the dataNote that the svyglm() function includes a scaling term for overdispersion, so this is already taken into account. But assuming we have a non-complex survey data without survey design parameters, we can measure overdispersionby using the residual deviance.
To do that, we would refit our model using glm () function and without survey design specification as implemented below.
Refit model
poisson.fit2 = glm(alcday5 ~ factor(racegrp) + factor(agegrp) + factor(sex) +
factor(edu) + factor(employ) + factor(smoke), data = mydata0, family = poisson)
summary(poisson.fit2)
Call:
glm(formula = alcday5 ~ factor(racegrp) + factor(agegrp) + factor(sex) +
factor(edu) + factor(employ) + factor(smoke), family = poisson,
data = mydata0)
Deviance Residuals:
Min 1Q Median 3Q Max
-7.04 -5.83 1.66 1.99 4.05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.204404 0.001005 5177.06 < 2e-16 ***
factor(racegrp)NH Black -0.002689 0.000817 -3.29 0.00100 **
factor(racegrp)Hispanic -0.002716 0.000866 -3.14 0.00170 **
factor(racegrp)Other 0.000540 0.001012 0.53 0.59347
factor(agegrp)(24,39] -0.010547 0.001017 -10.37 < 2e-16 ***
factor(agegrp)(39,59] -0.014890 0.000980 -15.19 < 2e-16 ***
factor(agegrp)(59,79] -0.013283 0.001061 -12.53 < 2e-16 ***
factor(agegrp)(79,99] -0.030900 0.001431 -21.59 < 2e-16 ***
factor(sex)Male -0.007996 0.000445 -17.97 < 2e-16 ***
factor(edu)0Primary -0.005564 0.002332 -2.39 0.01705 *
factor(edu)1Some HS 0.003872 0.001467 2.64 0.00829 **
factor(edu)2HS Grad 0.003547 0.000614 5.78 7.4e-09 ***
factor(edu)3Some Coll 0.007298 0.000538 13.55 < 2e-16 ***
factor(employ)1Unemployed 0.036848 0.001266 29.12 < 2e-16 ***
factor(employ)2Retired 0.005086 0.000699 7.28 3.4e-13 ***
factor(employ)3NILF 0.002659 0.000753 3.53 0.00042 ***
factor(smoke)Current 0.001589 0.000688 2.31 0.02095 *
factor(smoke)Former -0.003952 0.000516 -7.66 1.9e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1614201 on 116197 degrees of freedom
Residual deviance: 1612008 on 116180 degrees of freedom
(3456 observations deleted due to missingness)
AIC: 2423802
Number of Fisher Scoring iterations: 4
Residuals ratio test
# Compare ratios of residuals
resid.ratios = sqrt(poisson.fit2$deviance/poisson.fit2$df.residual)
resid.ratios[1] 3.72
The ratio exceeds the expected value of 1. This indicates overdispersion as the variance much larger than the mean.
Goodness of fit test
# Use Chi-square of residuals
resid.chisq = 1 - pchisq(poisson.fit2$deviance, df = poisson.fit2$df.residual)
resid.chisq[1] 0
The value is significant; p-values=0. This further confirms overdispersion and that the model does not fit the data.
In fitting negative binomial models R will not use survey design. Hence, we will fit the model using sample weights. We achieve this by standardizing the weights to equal the sample size rather than the population size by dividing each person’s weight by mean weight. We will also use clustered standard errors to calculate the standard errors for the model which account for homogeneity within strata.
Compare coefficients generated with clustered standard errors with coefficient in model with non-clustered specification.
library(sandwich)
coeftest(poisson.fit2, vcov = vcovHC(poisson.fit2, type = "HC1", cluster = "ststr"))
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.20440 0.00338 1541.62 < 2e-16 ***
factor(racegrp)NH Black -0.00269 0.00276 -0.97 0.32980
factor(racegrp)Hispanic -0.00272 0.00292 -0.93 0.35142
factor(racegrp)Other 0.00054 0.00349 0.15 0.87715
factor(agegrp)(24,39] -0.01055 0.00340 -3.11 0.00190 **
factor(agegrp)(39,59] -0.01489 0.00328 -4.55 0.0000054445 ***
factor(agegrp)(59,79] -0.01328 0.00360 -3.69 0.00022 ***
factor(agegrp)(79,99] -0.03090 0.00511 -6.04 0.0000000015 ***
factor(sex)Male -0.00800 0.00157 -5.08 0.0000003681 ***
factor(edu)0Primary -0.00556 0.00816 -0.68 0.49518
factor(edu)1Some HS 0.00387 0.00502 0.77 0.44072
factor(edu)2HS Grad 0.00355 0.00215 1.65 0.09890 .
factor(edu)3Some Coll 0.00730 0.00189 3.87 0.00011 ***
factor(employ)1Unemployed 0.03685 0.00393 9.38 < 2e-16 ***
factor(employ)2Retired 0.00509 0.00252 2.02 0.04349 *
factor(employ)3NILF 0.00266 0.00258 1.03 0.30296
factor(smoke)Current 0.00159 0.00244 0.65 0.51408
factor(smoke)Former -0.00395 0.00185 -2.14 0.03267 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
negbin.fit1 = glm.nb(alcday5 ~ factor(racegrp), data = mydata, weights = mmsawt/mean(mmsawt,
na.rm = T))
negbin.fit2 = glm.nb(alcday5 ~ factor(racegrp) + factor(agegrp) + factor(sex) +
factor(edu) + factor(employ) + factor(smoke), data = mydata, weights = mmsawt/mean(mmsawt,
na.rm = T))
# clx2(fit.nb2,cluster =sub$ststr)
test1 <- coeftest(negbin.fit1, vcov = vcovHC(negbin.fit2, type = "HC1", cluster = "ststr"))
test2 <- coeftest(negbin.fit2, vcov = vcovHC(negbin.fit2, type = "HC1", cluster = "ststr"))
stargazer(negbin.fit1, negbin.fit2, style = "demography", type = "text", t.auto = F,
p.auto = F, coef = list(test1[, 1], test2[, 1]), se = list(test1[, 2], test2[,
2]), p = list(test1[, 4], test2[, 4]), font.size = "normalsize", dep.var.caption = c("Outcome: Parity"),
dep.var.labels.include = F, align = T, single.row = T, no.space = T, column.sep.width = "4pt",
covariate.labels = c("NH Black", "Hispanic", "Other", "25-39", "40-59",
"60-79", "80+", "Male", "Primary", "Some HS", "HS Grad", "Some Coll",
"Unemployed", "NILF", "Retired", "Current", "Former"), column.labels = c("Model 1, ß(SE)",
"Model 2, ß(SE)"), column.separate = c(1, 1), star.cutoffs = c(0.05,
0.01, 0.001), object.names = F, model.numbers = F, report = "vcs*",
notes.align = "l", notes.label = "Notes:", multicolumn = T, out = "tabnegativebinomial.htm")
--------------------------------------------------
Model 1, ß(SE) Model 2, ß(SE)
--------------------------------------------------
NH Black 0.007 (0.005) 0.002 (0.005)
Hispanic -0.002 (0.006) -0.003 (0.006)
Other 0.002 (0.007) 0.0004 (0.007)
25-39 -0.012 (0.006)
40-59 -0.018 (0.006)**
60-79 -0.019 (0.007)**
80+ -0.045 (0.012)***
Male -0.015 (0.003)***
Primary -0.008 (0.016)
Some HS -0.007 (0.010)
HS Grad 0.013 (0.004)**
Some Coll 0.008 (0.004)*
Unemployed 0.045 (0.008)***
NILF 0.012 (0.006)*
Retired -0.004 (0.005)
Current 0.002 (0.005)
Former -0.008 (0.004)
Constant 5.180 (0.006)*** 5.200 (0.006)***
N 116,198 116,198
Log Likelihood -621,903.000 -621,758.000
theta 12.500*** (0.055) 12.500*** (0.055)
AIC 1,243,814.000 1,243,552.000
--------------------------------------------------
Notes: *p < .05; **p < .01; ***p < .001
Model evaluation:
Both the log likelihood and AIC values of model 2 are lower than observed for model 1. This result indicates that model 2 with additional covariates fit the data better than model 1 with race/ethnicity as the only predictor. In other words, model 2 explain more variations in the outcome based on the data than model 1.
This content was inspired by Dr. Corey Sparks’ lab session and his easy-to-follow instructional materials accessible at https://rpubs.com/corey_sparks.
The 2016 BRFSS SMART metro area survey data are open source resources made available by the US Centers for Disease Control and Prevention at https://www.cdc.gov/brfss/smart/smart_2016.html.
Stargazer is authored by Marek Hlavac and full document can be downloaded from https://cran.r-project.org/web/packages/stargazer/stargazer.pdf