Assignment 2

Val Pocus

October 11, 2015

Introduction (1)

Zhou et al. examined the effect of homeownership on the number of mentally unhealthy days using the 2009 BRFSS data. They compared results from logistic, linear, Poisson, negative binomial, and zero-inflated negative binomial models.

Zhou, Hong, et al. “Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer?” Preventing Chronic Disease 11 (2014).

Introduction (2)

This assignment attempted to:

Methods

Treatment of data and variables

Data cleaning

##All states have mentally unhealthy days. For comparison with the Zhou et al. article, this analysis is limited to the states that had mentally unhealthy days available in the 2009 BRFSS:
##Alabama, Arkansas, California, Hawaii, Illinois, Kansas, Louisiana, Nebraska, New Mexico, Oklahoma, South Carolina, Wisconsin
BRFSS13 <-  BRFSS13[BRFSS13$X_STATE %in% c(1, 5, 6, 15, 17, 20, 22, 31, 35, 40, 45, 55), ]

BRFSS13$X_STATE<-factor(BRFSS13$X_STATE, labels = c("Alabama", "Arkansas", "California", "Hawaii", "Illinois", "Kansas", "Louisiana", "Nebraska", "New Mexico", "Oklahoma", "South Carolina", "Wisconsin"))

##Looking at states where number of mentally unhealthy days is available
BRFSS13$X_STATE3 <-  factor(BRFSS13$X_STATE)

###Create new age variable to exclude under age 35
BRFSS13$X_AGE_G2 <- factor(BRFSS13$X_AGE_G, labels=c("Age 18 to 24", "Age 25 to 34", "Age 35 to 44", "Age  45 to 54", "Age 55 to 64", "Age 65+"), levels=1:6)
levels(BRFSS13$X_AGE_G2)<-c(NA ,NA, "Age 35 to 44", "Age  45 to 54", "Age 55 to 64", "Age 65+")

###Creating new race/ethnicity variable
BRFSS13$X_RACEGR3C <- factor(BRFSS13$X_RACEGR3)
levels(BRFSS13$X_RACEGR3C) <- c("White Non-Hispanic", "Black Non-Hispanic", "Others", "Others", "Hispanic", NA, NA)

###Creating new education variable
BRFSS13$X_EDUCAG2 <- factor(BRFSS13$X_EDUCAG)
levels(BRFSS13$X_EDUCAG2) <- c("Less than high school", "High school graduate", "<4 yr of college", "=> 4 y of college", NA, NA)

###Creating new income variable
BRFSS13$INCOME3 <- factor(BRFSS13$X_INCOMG)
levels(BRFSS13$INCOME3) <- c("<25,000", "<25,000","25,000 to <50,000", "25,000 to <50,000","50,000 or more", "Unknown")

###Creating new marital status variable
BRFSS13$MARITAL <- factor(BRFSS13$X_IMPMRTL)
levels(BRFSS13$MARITAL) <- c("Married", "Divorced/Widowed/Separated","Divorced/Widowed/Separated", "Divorced/Widowed/Separated","Never married", NA)

###Creating new employment variable
BRFSS13$EMPLOY2 <- factor(BRFSS13$EMPLOY1)
levels(BRFSS13$EMPLOY2) <- c("Employed", "Employed","Unemployed", "Unemployed","Homemaker", NA, "Retired", "Unable to work", NA, NA)

###Creating household size variable
BRFSS13$X_CHLDCNT2<-BRFSS13$X_CHLDCNT
BRFSS13$X_CHLDCNT2[BRFSS13$X_CHLDCNT2 > 6] = NA
BRFSS13$HOUSEHOLD<- BRFSS13$NUMADULT + BRFSS13$X_CHLDCNT2

###Creating household size variable for descriptive statistics
BRFSS13$HOUSEHOLD2 <- factor(BRFSS13$HOUSEHOLD)
BRFSS13$HOUSEHOLDDESC <- cut(BRFSS13$HOUSEHOLD,
                     breaks=c(-Inf, 2, 4, 6, Inf),
                     labels=c("1 or 2","3 or 4","5 or 6", "7 or more"))

###Homeownership
BRFSS13$X_IMPHOME2 <- factor(BRFSS13$X_IMPHOME)
levels(BRFSS13$X_IMPHOME2) <- c("Own", "Does not own","Does not own")
BRFSS13$X_IMPHOME2 <- factor(BRFSS13$X_IMPHOME2, levels = c("Does not own", "Own"))

###Number of unhealthy days
BRFSS13$MENTHLTH2 <- BRFSS13$MENTHLTH
# BRFSS codes: 99 = refused, 77 = don't know/not sure, 88 = none (zero days)
BRFSS13$MENTHLTH2[BRFSS13$MENTHLTH2 == 99] <- NA
BRFSS13$MENTHLTH2[BRFSS13$MENTHLTH2 == 77] <- NA
BRFSS13$MENTHLTH2[BRFSS13$MENTHLTH2 == 88] <- 0

###Creating mental health days for descriptive statistics
BRFSS13$MENTHLTHDESC <- factor(BRFSS13$MENTHLTH2)
BRFSS13$MENTHLTHDESC <- cut(BRFSS13$MENTHLTH2,
                     breaks=c(-Inf, 0, 10, 20, Inf),
                     labels=c("Zero","1-10","11-20", "21-30"))

###Sex
BRFSS13$SEX2 <- factor(BRFSS13$SEX)
levels(BRFSS13$SEX2) <- c("Male", "Female")

BRFSS13_2 <- subset(BRFSS13, select=c(MENTHLTH2, X_AGE_G2, SEX2, X_RACEGR3C, X_EDUCAG2, INCOME3, MARITAL, EMPLOY2, HOUSEHOLDDESC, X_IMPHOME2, MENTHLTHDESC))

# Keep complete cases only (drops the under-35 respondents recoded to NA above)
BRFSS13_3 <- na.omit(BRFSS13_2)
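
The recoding pattern used above can be checked on a small toy data frame. This is a minimal sketch, not the assignment's data: the `toy` data frame and its values are made up for illustration, and only base R is used. It shows that assigning `NA` to a level drops those respondents, that repeating a label merges levels, and that `na.omit()` then keeps complete cases only.

```r
# Toy data (hypothetical values) using the BRFSS coding for MENTHLTH:
# 1-30 = days, 88 = none, 77 = don't know/not sure, 99 = refused.
toy <- data.frame(
  MENTHLTH  = c(5, 88, 77, 99, 30, 88),
  X_AGE_G   = c(1, 2, 3, 4, 5, 6),
  X_IMPHOME = c(1, 2, 3, 1, 2, 1)
)

# Numeric recode: missing codes become NA, "none" becomes 0 days.
toy$MENTHLTH2 <- toy$MENTHLTH
toy$MENTHLTH2[toy$MENTHLTH2 %in% c(77, 99)] <- NA
toy$MENTHLTH2[toy$MENTHLTH2 %in% 88] <- 0

# Assigning NA to a level removes it; those rows become NA.
toy$X_AGE_G2 <- factor(toy$X_AGE_G, levels = 1:6)
levels(toy$X_AGE_G2) <- c(NA, NA, "Age 35 to 44", "Age 45 to 54",
                          "Age 55 to 64", "Age 65+")

# Repeating a label merges the corresponding levels.
toy$X_IMPHOME2 <- factor(toy$X_IMPHOME, levels = 1:3)
levels(toy$X_IMPHOME2) <- c("Own", "Does not own", "Does not own")

# Complete cases only, as in the final analysis data set.
clean <- na.omit(toy[, c("MENTHLTH2", "X_AGE_G2", "X_IMPHOME2")])
print(clean)
```

Only the last two toy rows survive, because the first four are either under 35 or have a missing outcome.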

Descriptive Statistics of Adults Aged 35 and Older

Age Group
  Age 35 to 44 7,013 (10%)
  Age 45 to 54 11,983 (17%)
  Age 55 to 64 19,329 (27%)
  Age 65+ 33,942 (47%)
Sex
  Male 26,555 (37%)
  Female 45,712 (63%)
Race/Ethnicity
  White Non-Hispanic 56,936 (79%)
  Black Non-Hispanic 6,463 (9%)
  Others 5,186 (7%)
  Hispanic 3,682 (5%)
Education
  Less than high school 5,943 (8%)
  High school graduate 21,964 (30%)
  <4 yr of college 19,728 (27%)
  => 4 y of college 24,632 (34%)
Income
  <25,000 18,867 (26%)
  25,000 to <50,000 17,588 (24%)
  50,000 or more 26,493 (37%)
  Unknown 9,319 (13%)
Marital status
  Married 41,198 (57%)
  Divorced/Widowed/Separated 25,506 (35%)
  Never married 5,563 (8%)
Employment status
  Employed 30,035 (42%)
  Unemployed 2,556 (4%)
  Homemaker 4,915 (7%)
  Retired 28,736 (40%)
  Unable to work 6,025 (8%)
Household size
  1 or 2 24,800 (34%)
  3 or 4 38,192 (53%)
  5 or 6 7,727 (11%)
  7 or more 1,548 (2%)
Homeownership
  Does not own 12,031 (17%)
  Own 60,236 (83%)
Number of mentally unhealthy days
  Zero 52,870 (73%)
  1-10 12,572 (17%)
  11-20 2,820 (4%)
  21-30 4,005 (6%)

Frequency distribution for mentally unhealthy days

Mean 2.97
Variance 53.98
Standard dev 7.35
Median 0.00
Quantiles.25% 0.00
Quantiles.75% 1.00
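
The mean and variance above already suggest overdispersion relative to a Poisson distribution, where the variance would equal the mean. A rough check using only the reported summary statistics follows. The method-of-moments value for the negative binomial size parameter is an illustrative assumption that ignores covariates; it is not an estimate from the fitted models.

```r
# Reported summary statistics for mentally unhealthy days
mean_days <- 2.97
var_days  <- 53.98

# Under a Poisson model the variance equals the mean, so this ratio would be near 1.
dispersion_ratio <- var_days / mean_days

# Negative binomial: var = mu + mu^2 / theta, so theta = mu^2 / (var - mu).
# Crude moment estimate, ignoring all covariates (illustration only).
theta_mom <- mean_days^2 / (var_days - mean_days)

round(c(dispersion_ratio = dispersion_ratio, theta_mom = theta_mom), 3)
```

A ratio this far above 1 is the usual motivation for moving from Poisson to negative binomial models.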

Model 1: Logistic regression (Less than 15 days vs 15 or more days)

Reference groups: aged 35 to 44, male, White Non-Hispanic, less than high school, income less than $25,000, married, employed, 1-2 in household, Nonhomeowner
Model 1: Logistic
     Intercept -2.01 (0.09)***
Covariates
     Aged 45 to 54 -0.08 (0.05)
     Aged 55 to 64 -0.33 (0.05)***
     Aged 65+ -0.83 (0.06)***
     Female 0.29 (0.03)***
     Black Non-Hispanic -0.25 (0.05)***
     Other race 0.07 (0.05)
     Hispanic 0.02 (0.06)
     High-school graduate -0.21 (0.05)***
     Less than 4 years of college -0.16 (0.05)***
     4 or more years of college -0.36 (0.05)***
     $25,000 to <$50,000 -0.30 (0.04)***
     $50,000 or more -0.61 (0.05)***
     Unknown income -0.41 (0.05)***
     Divorced/Widowed/Separated 0.28 (0.04)***
     Never married 0.06 (0.06)
     Unemployed 0.95 (0.06)***
     Homemaker 0.20 (0.06)**
     Retired 0.24 (0.04)***
     Unable to work 1.74 (0.04)***
     3-4 in Household 0.08 (0.04)*
     5-6 in Household 0.01 (0.06)
     7+ in Household 0.02 (0.10)
Homeownership
     Homeowner -0.13 (0.03)***
AIC 38479.44
BIC 38699.95
Log Likelihood -19215.72
Deviance 38431.44
Num. obs. 72267
***p < 0.001, **p < 0.01, *p < 0.05
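
The logistic coefficients are on the log-odds scale. As a small sketch, the homeowner estimate can be converted to an odds ratio with a Wald interval from the rounded values reported in the table. The `days` vector is a made-up example that only illustrates the 15-day cut point named in the model heading.

```r
# Dichotomized outcome: 1 = 15 or more mentally unhealthy days (toy values)
days    <- c(0, 3, 14, 15, 30)
poor_mh <- as.integer(days >= 15)

# Homeowner coefficient and standard error as reported (rounded) in the table
beta <- -0.13
se   <- 0.03

odds_ratio <- exp(beta)
wald_ci    <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

round(c(OR = odds_ratio, lower = wald_ci[1], upper = wald_ci[2]), 3)
```

Because the inputs are rounded to two decimals, the interval is approximate.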

Table 2: Comparison of Regression Models in Examining the Association Between Homeownership and Number of Mentally Unhealthy Days in the Previous Month, 2013 Behavioral Risk Factor Surveillance System

Reference group: Nonhomeowner
                Model 1:     Model 2:     Model 3:     Model 4:
                Logistic     Linear       Poisson      Negative binomial
Homeowner       -0.13***     -0.53***     -0.11***     -0.13***
                (0.03)       (0.08)       (0.01)       (0.03)
AIC             38479.44                  720413.83    213462.21
BIC             38699.95                  720634.34    213691.91
Log Likelihood  -19215.72                 -360182.91   -106706.11
Deviance        38431.44                  647901.06    41345.58
Num. obs.       72267        72267        72267        72267
R2                           0.11
Adj. R2                      0.11
RMSE                         6.93
***p < 0.001, **p < 0.01, *p < 0.05
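
The AIC and BIC rows can be reproduced from the log likelihoods using AIC = 2k - 2logLik and BIC = k log(n) - 2logLik. The sketch below assumes the parameter counts k are the Df values listed in the model fit comparison table (24, 24, and 25); small differences in the second decimal come from rounding of the reported log likelihoods.

```r
n <- 72267  # number of observations

fits <- data.frame(
  model  = c("Logistic", "Poisson", "Negative binomial"),
  logLik = c(-19215.72, -360182.91, -106706.11),
  df     = c(24, 24, 25)   # assumed from the Df row of the fit comparison
)

fits$AIC <- 2 * fits$df - 2 * fits$logLik
fits$BIC <- log(n) * fits$df - 2 * fits$logLik

print(fits)
```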

Table 2 (Cont): Comparison of Regression Models in Examining the Association Between Homeownership and Number of Mentally Unhealthy Days in the Previous Month, 2013 Behavioral Risk Factor Surveillance System

Reference group: Nonhomeowner
Model 5: Zero-Inflated Negative binomial
Negative binomial component -0.04 (0.02)*
Zero-inflated component 0.14 (0.03)***
AIC 206523.56
Log Likelihood -103212.78
Num. obs. 72267
***p < 0.001, **p < 0.01, *p < 0.05
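
The two components of the zero-inflated model correspond to a mixture: with probability pi0 a respondent is an "always zero", otherwise the count follows a negative binomial distribution. A minimal base R sketch of that probability mass function is below. The function name `dzinb` and the parameter values are hypothetical and are not taken from the fitted model.

```r
# Zero-inflated negative binomial pmf (illustrative helper, not from pscl):
#   P(Y = 0) = pi0 + (1 - pi0) * NB(0)
#   P(Y = y) = (1 - pi0) * NB(y)   for y > 0
dzinb <- function(y, mu, theta, pi0) {
  nb <- dnbinom(y, size = theta, mu = mu)
  ifelse(y == 0, pi0 + (1 - pi0) * nb, (1 - pi0) * nb)
}

# Made-up parameter values for illustration
probs <- dzinb(0:2000, mu = 5, theta = 0.5, pi0 = 0.4)

p_zero <- probs[1]   # share of zeros implied by the mixture
total  <- sum(probs) # should be very close to 1

round(c(p_zero = p_zero, total = total), 4)
```

In the `zeroinfl` parameterization, the zero-inflated component models the probability of an excess zero, so a positive coefficient there points toward more zeros rather than more unhealthy days.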

Comparison of covariates across models

##                                      Logistic       Linear      Poisson
## (Intercept)                       -2.01376045  3.910324922  1.238590581
## X_AGE_G2Age  45 to 54             -0.08141413 -0.007948733 -0.016999279
## X_AGE_G2Age 55 to 64              -0.32604219 -0.712660265 -0.193056768
## X_AGE_G2Age 65+                   -0.83143133 -1.861768428 -0.591555646
## SEX2Female                         0.28542464  0.722097185  0.282339807
## X_RACEGR3CBlack Non-Hispanic      -0.24508951 -0.608271245 -0.152693736
## X_RACEGR3COthers                   0.07300584  0.179696586  0.058759651
## X_RACEGR3CHispanic                 0.02024939 -0.051286079  0.002704846
## X_EDUCAG2High school graduate     -0.21110462 -0.492146847 -0.134320493
## X_EDUCAG2<4 yr of college         -0.16369572 -0.297392856 -0.084314505
## X_EDUCAG2=> 4 y of college        -0.35740385 -0.563900420 -0.200554133
## INCOME325,000 to <50,000          -0.29710646 -0.864939236 -0.222650261
## INCOME350,000 or more             -0.60959528 -1.323758916 -0.439133937
## INCOME3Unknown                    -0.40587488 -1.193544860 -0.327554481
## MARITALDivorced/Widowed/Separated  0.28010256  0.725441632  0.192004290
## MARITALNever married               0.06447790  0.224594486  0.055598925
## EMPLOY2Unemployed                  0.94754342  2.739804237  0.695162510
## EMPLOY2Homemaker                   0.19712990  0.329212394  0.114245390
## EMPLOY2Retired                     0.23878862  0.534208651  0.135471834
## EMPLOY2Unable to work              1.74439060  6.999061283  1.232615817
## HOUSEHOLDDESC3 or 4                0.08302426  0.398162935  0.056171743
## HOUSEHOLDDESC5 or 6                0.01174552  0.238707319  0.057857771
## HOUSEHOLDDESC7 or more             0.01823692  0.234643044  0.054679828
## X_IMPHOME2Own                     -0.13305486 -0.527137027 -0.108673417
##                                   Negative Binomial
## (Intercept)                              1.23425423
## X_AGE_G2Age  45 to 54                   -0.02866192
## X_AGE_G2Age 55 to 64                    -0.17737711
## X_AGE_G2Age 65+                         -0.52877957
## SEX2Female                               0.32107459
## X_RACEGR3CBlack Non-Hispanic            -0.05050304
## X_RACEGR3COthers                         0.09443010
## X_RACEGR3CHispanic                       0.10358262
## X_EDUCAG2High school graduate           -0.20708074
## X_EDUCAG2<4 yr of college               -0.13408249
## X_EDUCAG2=> 4 y of college              -0.27408608
## INCOME325,000 to <50,000                -0.25029728
## INCOME350,000 or more                   -0.44015257
## INCOME3Unknown                          -0.36975206
## MARITALDivorced/Widowed/Separated        0.23602659
## MARITALNever married                     0.11496953
## EMPLOY2Unemployed                        0.69383447
## EMPLOY2Homemaker                         0.08988737
## EMPLOY2Retired                           0.11361531
## EMPLOY2Unable to work                    1.24583577
## HOUSEHOLDDESC3 or 4                      0.05363712
## HOUSEHOLDDESC5 or 6                      0.10030186
## HOUSEHOLDDESC7 or more                   0.07575185
## X_IMPHOME2Own                           -0.13215462
##                                   Zero-inflated Negative Binomial
## (Intercept)                                           2.237279896
## X_AGE_G2Age  45 to 54                                -0.018661587
## X_AGE_G2Age 55 to 64                                 -0.063620824
## X_AGE_G2Age 65+                                      -0.198447400
## SEX2Female                                            0.081303153
## X_RACEGR3CBlack Non-Hispanic                         -0.024928091
## X_RACEGR3COthers                                      0.108776238
## X_RACEGR3CHispanic                                    0.074270854
## X_EDUCAG2High school graduate                        -0.130558804
## X_EDUCAG2<4 yr of college                            -0.125434133
## X_EDUCAG2=> 4 y of college                           -0.293121776
## INCOME325,000 to <50,000                             -0.150386205
## INCOME350,000 or more                                -0.293894263
## INCOME3Unknown                                       -0.112472668
## MARITALDivorced/Widowed/Separated                     0.147685723
## MARITALNever married                                  0.039647325
## EMPLOY2Unemployed                                     0.401006696
## EMPLOY2Homemaker                                      0.097687545
## EMPLOY2Retired                                        0.121252368
## EMPLOY2Unable to work                                 0.690663655
## HOUSEHOLDDESC3 or 4                                  -0.002389577
## HOUSEHOLDDESC5 or 6                                  -0.046352810
## HOUSEHOLDDESC7 or more                               -0.031280354
## X_IMPHOME2Own                                         0.022146064

Model fit comparisons

Model                                logLik      Df
Logistic                          -19216.00   24.00
Linear                           -242449.00   25.00
Poisson                          -360183.00   24.00
Negative Binomial                -106706.00   25.00
Zero-inflated Negative Binomial  -105452.00   27.00
Observed vs. Model zero counts
Obs 80650.00
Logistic 17998.00
Poisson 8860.00
NB 52175.00
Zero-inflated Negative Binomial 52792.00
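
Model-implied zero counts of this kind are usually obtained by summing each observation's predicted probability of a zero. The sketch below shows the idea on simulated data, not the BRFSS data. The simulation settings are made up, and the negative binomial line plugs in the known simulation value of the size parameter instead of estimating it, which is a simplifying assumption.

```r
set.seed(1)

# Simulated overdispersed counts with one binary covariate (hypothetical data)
n     <- 5000
x     <- rbinom(n, 1, 0.5)
theta <- 0.2
y     <- rnbinom(n, size = theta, mu = exp(1 - 0.1 * x))

pois_fit <- glm(y ~ x, family = poisson)
mu_hat   <- fitted(pois_fit)

observed_zeros      <- sum(y == 0)
# Expected zeros: sum of P(Y_i = 0) under each model
expected_zeros_pois <- sum(dpois(0, lambda = mu_hat))
expected_zeros_nb   <- sum(dnbinom(0, size = theta, mu = mu_hat))  # theta assumed known

round(c(observed = observed_zeros,
        poisson  = expected_zeros_pois,
        negbin   = expected_zeros_nb))
```

The Poisson model badly underpredicts zeros when the data are overdispersed, which mirrors the pattern in the table above.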

Plots

Vuong test - comparison of negative binomial and zero-inflated negative binomial models

## Vuong Non-Nested Hypothesis Test-Statistic: 
## (test-statistic is asymptotically distributed N(0,1) under the
##  null that the models are indistinguishible)
## -------------------------------------------------------------
##               Vuong z-statistic             H_A    p-value
## Raw                   -24.78722 model2 > model1 < 2.22e-16
## AIC-corrected         -24.74768 model2 > model1 < 2.22e-16
## BIC-corrected         -24.56603 model2 > model1 < 2.22e-16
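
The raw Vuong statistic is built from pointwise log-likelihood differences between two non-nested models: z = sqrt(n) * mean(m) / sd(m), where m_i is the log-likelihood of observation i under model 1 minus that under model 2. The sketch below computes it by hand on simulated data with two simple intercept-only models (Poisson and geometric). These stand in for the assignment's models and are an assumption made for illustration.

```r
set.seed(42)

# Simulated counts from a geometric distribution (negative binomial with size = 1)
y      <- rnbinom(2000, size = 1, mu = 3)
mu_hat <- mean(y)  # maximum likelihood estimate of the mean for both models

# Pointwise log-likelihoods
ll1 <- dpois(y, lambda = mu_hat, log = TRUE)            # model 1: Poisson
ll2 <- dnbinom(y, size = 1, mu = mu_hat, log = TRUE)    # model 2: geometric

m <- ll1 - ll2
z <- sqrt(length(m)) * mean(m) / sd(m)

# A large negative z favors model 2 over model 1
p_value <- pnorm(z)
c(z = z, p_value = p_value)
```

A strongly negative statistic, as in the output above, is read as evidence for the second model.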

End

Packages

install.packages("xtable")
install.packages("Gmisc")
install.packages("texreg")
install.packages("pscl")

getwd()
# Two working directories, one per machine; use whichever exists
setwd("/Users/valeriepocus/Documents/BIOS751")
setwd("G:/My Documents/Misc/School/BIOS751")
BRFSS13 <- read.csv("BRFSS2013_Data.csv")