Introduction

Background

In their article, Zhou et al.1 examined the number of mentally unhealthy days in the past month as reported in the 2008 Behavioral Risk Factor Surveillance System (BRFSS). The authors assessed the association of several variables, including homeownership, with the number of mentally unhealthy days using logistic, linear, Poisson, negative binomial, and zero-inflated negative binomial models. This assignment attempted to reproduce the methods of that article using the 2013 BRFSS data and to assess model fit.

1 Zhou, Hong, et al. “Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer?” Preventing Chronic Disease 11 (2014).

Methods

Data source and treatment of variables

Data from the 2013 BRFSS questionnaire were used in this assignment. The sample was limited to the same 12 states used in Zhou et al.’s analysis so that the results could be compared more directly. The dependent variable was the number of mentally unhealthy days in the past month, a count variable that ranged from 0 to 30 days. The independent variable of interest was homeownership, which classified respondents as homeowners or non-homeowners (the latter included those who reported renting a home). Other covariates were constructed to follow the same categories used in the Zhou et al. analysis. The covariates included age group (35 to 44, 45 to 54, 55 to 64, and 65 or over), sex, race/ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic, and all others), education (less than high school, high school graduate, attended college, and graduated from college), income (<$25,000, $25,000 to <$50,000, $50,000 or more, and unknown income), marital status (married, divorced/widowed/separated, and never married), employment status (employed, unemployed, homemaker, retired, and unable to work), and household size (1-2, 3-4, 5-6, and 7 or greater). Respondents were excluded if they were under age 35, reported being a student as their employment status, or had a missing or unknown response on any variable other than income. The analyzed sample included 72,267 adults.
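The recoding and exclusion rules described above could be sketched as follows. This is an illustration in Python, not the R code used for the assignment. The field names (age, employment, household_size, and so on) and the function names are hypothetical and are not BRFSS variable names, and the sketch assumes that a missing response on any variable other than income leads to exclusion.

```python
from typing import Optional

# Hypothetical field names; income is omitted because "unknown" is kept as a category.
REQUIRED_FIELDS = ("age", "sex", "race", "education", "marital",
                   "employment", "household_size", "homeowner", "unhealthy_days")

def age_group(age: int) -> Optional[str]:
    """Map age in years to the report's age groups; None means under 35."""
    if age < 35:
        return None
    if age <= 44:
        return "35 to 44"
    if age <= 54:
        return "45 to 54"
    if age <= 64:
        return "55 to 64"
    return "65 or over"

def household_group(size: int) -> str:
    """Map household size to the report's four categories."""
    if size <= 2:
        return "1-2"
    if size <= 4:
        return "3-4"
    if size <= 6:
        return "5-6"
    return "7 or greater"

def is_included(respondent: dict) -> bool:
    """Apply the exclusions: missing data, age under 35, and students."""
    if any(respondent.get(field) is None for field in REQUIRED_FIELDS):
        return False
    if respondent["age"] < 35:
        return False
    return respondent["employment"] != "student"
```

A record passes the filter only if every required field is present, so the order of the checks guards the age comparison against missing values.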

Data analysis

Five models were used to assess the association between the number of mentally unhealthy days and homeownership. These were logistic regression (using a dichotomous outcome variable of less than 15 mentally unhealthy days versus 15 or more mentally unhealthy days), linear regression, Poisson regression, negative binomial regression, and zero-inflated negative binomial regression. Each model was adjusted for the covariates (age group, sex, race/ethnicity, education, income level, marital status, employment status, and household size). R and RStudio were used to perform all data cleaning and statistical analysis.
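The three count models differ in how they treat variance and zeros, which can be seen from their probability mass functions. The sketch below is in Python rather than R and assumes the common mean-dispersion form of the negative binomial, with variance mu + mu^2/theta, and a zero-inflated model that mixes a point mass at zero (with probability pi) with a negative binomial count. The parameterization used by the assignment's R code is not shown in this report, so these forms are an assumption.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson count with mean mu > 0 (variance equals mu)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y: int, mu: float, theta: float) -> float:
    """P(Y = y) for a negative binomial with mean mu and dispersion theta.

    The variance is mu + mu**2 / theta, so smaller theta means more overdispersion.
    """
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """P(Y = y) when an excess zero occurs with probability pi,
    and otherwise the count follows the negative binomial above."""
    base = (1.0 - pi) * negbin_pmf(y, mu, theta)
    return pi + base if y == 0 else base
```

With pi set to 0 the zero-inflated function reduces to the negative binomial, and as theta grows large the negative binomial approaches the Poisson, which is why the three models can be viewed as successively more flexible.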

Results

Descriptive statistics

Among those included in the analysis, 83% reported owning a home and 17% reported not owning a home, a group that included renters (Table 1). These proportions were comparable to those in Zhou et al.’s analysis (79% vs 21%, respectively). Zero mentally unhealthy days were reported by 73% of respondents, and 17%, 4%, and 6% reported 1-10, 11-20, and 21-30 mentally unhealthy days, respectively. As indicated by the histogram in Figure 1, the distribution of mentally unhealthy days was positively skewed, with a mean of 3 days and a variance of 54. Because the variance far exceeds the mean, the data are overdispersed relative to a Poisson distribution. Positive skew was also evident in the 2008 BRFSS data.
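A back-of-the-envelope check illustrates why these summary statistics point away from a Poisson model. The sketch below uses only the rounded mean, variance, and zero proportion reported above and ignores covariates, so it is an intercept-only illustration rather than a reproduction of the fitted models. The method-of-moments value of theta is likewise only a rough device, not the estimate from the regression.

```python
import math

mean_days = 3.0             # reported mean (rounded)
var_days = 54.0             # reported variance (rounded)
observed_zero_share = 0.73  # reported proportion with zero days

# Under a Poisson distribution the variance equals the mean, so this ratio would be 1.
dispersion_ratio = var_days / mean_days

# Share of zeros implied by a Poisson distribution with the same mean.
poisson_zero_share = math.exp(-mean_days)

# Method-of-moments dispersion for a negative binomial: var = mu + mu^2 / theta.
theta_mom = mean_days ** 2 / (var_days - mean_days)
negbin_zero_share = (theta_mom / (theta_mom + mean_days)) ** theta_mom

print(f"variance / mean: {dispersion_ratio:.1f}")
print(f"Poisson zero share: {poisson_zero_share:.3f}")
print(f"negative binomial zero share: {negbin_zero_share:.3f}")
print(f"observed zero share: {observed_zero_share:.2f}")
```

The variance is 18 times the mean, and a Poisson distribution with a mean of 3 would place only about 5% of respondents at zero, far below the observed 73%.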

Comparison of regression models

Across all models, there was a statistically significant association at the 0.05 level between homeownership and the number of mentally unhealthy days (Table 2). In the logistic model, the parameter estimate for homeownership on having 14 or more mentally unhealthy days was -0.13 (adjusted odds ratio = 0.88, 95% CI 0.82-0.94, p-value < 0.001). The parameter estimates in the linear, Poisson, and negative binomial models were -0.52, -0.11, and -0.13, respectively. In the zero-inflated negative binomial model, the parameter estimate of the zero component was 0.14 (adjusted odds ratio of 1.15, 95% CI 1.09-1.22). Assuming the zero component models membership in the excess-zero group, as is conventional for zero-inflated models, this suggests that homeowners had 15% higher odds of being in the excess-zero group, that is, of reporting zero mentally unhealthy days. The parameter estimate in the negative binomial component of the model was -0.04 (adjusted rate ratio of 0.96, 95% CI 0.91-0.99), suggesting that, after accounting for excess zeros, homeowners had a slightly lower expected number of mentally unhealthy days.

Judging by the log likelihood, the Poisson model fit appears to be inferior to the other count models. The negative binomial model appears to improve the fit, and the zero-inflation component improves it further. This is also reflected in the comparison of observed and expected zero counts (Table 3), with the zero-inflated negative binomial model coming closest to the observed number of nonoccurrences (zero mentally unhealthy days). However, all models underpredicted the number of zero counts relative to the observed count. The Vuong test comparing the negative binomial and zero-inflated negative binomial models had a significant z-value, which suggests that the latter provided a better fit than the standard negative binomial model.
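The odds ratios and rate ratios quoted above are the exponentiated parameter estimates. The sketch below, written in Python rather than R, recomputes them from the rounded homeowner estimates and standard errors in Table 2 using a Wald interval (estimate plus or minus 1.96 standard errors). The use of a Wald interval is an assumption, since the report does not state how its intervals were obtained, and because the inputs are rounded to two decimals the interval bounds may differ slightly from those reported.

```python
import math

def ratio_with_ci(estimate: float, std_error: float, z: float = 1.96):
    """Exponentiate a log-scale estimate and its Wald confidence limits."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_error),
            math.exp(estimate + z * std_error))

# Homeowner estimates and standard errors as printed in Table 2.
homeowner = {
    "Logistic (odds ratio)": (-0.13, 0.03),
    "ZINB zero component (odds ratio)": (0.14, 0.03),
    "ZINB count component (rate ratio)": (-0.04, 0.02),
}

for label, (estimate, std_error) in homeowner.items():
    ratio, lower, upper = ratio_with_ci(estimate, std_error)
    print(f"{label}: {ratio:.2f} ({lower:.2f}-{upper:.2f})")
```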

Discussion

In this analysis, homeownership appears to be associated with the reported number of mentally unhealthy days. The logistic, linear, Poisson, negative binomial, and zero-inflated negative binomial models all found statistically significant associations between homeownership and reported mentally unhealthy days, which differed from Zhou et al.’s findings with the 2008 BRFSS data. Besides the possibility that the 2013 BRFSS data reflect a true association, this analysis included more respondents (72,267 vs 59,563 adults), which may have made statistically significant findings more likely. It is also possible that covariates were categorized differently (e.g., removing students in this analysis), given that Zhou et al. did not provide clear methods for the treatment of these variables.

The log-likelihood values and the Vuong test suggest that over-dispersion in the data may have been better captured by the negative binomial model and the zero-inflated adjustment. However, none of the models fully addressed the problem of excess observed zeros. As shown in Figures 2-6, there appear to be clear trends in the residuals when they are plotted against the predicted values of the logistic, linear, Poisson, negative binomial, and zero-inflated negative binomial models. The QQ plots in Figures 7-11 also seem to show that the residuals are not normally distributed. Therefore, the model assumptions may not be met in these data, and the findings of statistical significance in this analysis may be misleading.
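For count models, residual plots are often easier to read when the residuals are scaled by the model-implied standard deviation, because the variance of a count grows with its mean. The sketch below shows Pearson residuals for the Poisson and negative binomial cases in Python. It is not the code behind Figures 2-11, the report does not state which residual type was plotted, and the observed and fitted values are invented purely for illustration.

```python
import math
from typing import Optional

def pearson_residual(y: float, mu: float, theta: Optional[float] = None) -> float:
    """Pearson residual (y - mu) / sd under a Poisson model (theta is None)
    or a negative binomial model with variance mu + mu**2 / theta."""
    variance = mu if theta is None else mu + mu ** 2 / theta
    return (y - mu) / math.sqrt(variance)

# Invented observed counts and fitted means, for illustration only.
observed = [0, 0, 0, 2, 30]
fitted = [2.0, 3.0, 4.0, 3.0, 5.0]

poisson_resid = [pearson_residual(y, mu) for y, mu in zip(observed, fitted)]
negbin_resid = [pearson_residual(y, mu, theta=0.75) for y, mu in zip(observed, fitted)]

# A sum of squared Pearson residuals far above the number of observations
# is one informal sign that the assumed variance is too small.
print(sum(r ** 2 for r in poisson_resid), sum(r ** 2 for r in negbin_resid))
```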


Table 1: Descriptive Statistics of Adults Aged 35 and Older, n (%)

Age Group
  Age 35 to 44 7,013 (10%)
  Age 45 to 54 11,983 (17%)
  Age 55 to 64 19,329 (27%)
  Age 65+ 33,942 (47%)
Sex
  Male 26,555 (37%)
  Female 45,712 (63%)
Race/Ethnicity
  White Non-Hispanic 56,936 (79%)
  Black Non-Hispanic 6,463 (9%)
  Others 5,186 (7%)
  Hispanic 3,682 (5%)
Education
  Less than high school 5,943 (8%)
  High school graduate 21,964 (30%)
  <4 yr of college 19,728 (27%)
  >=4 yr of college 24,632 (34%)
Income
  <$25,000 18,867 (26%)
  $25,000 to <$50,000 17,588 (24%)
  $50,000 or more 26,493 (37%)
  Unknown 9,319 (13%)
Marital status
  Married 41,198 (57%)
  Divorced/Widowed/Separated 25,506 (35%)
  Never married 5,563 (8%)
Employment status
  Employed 30,035 (42%)
  Unemployed 2,556 (4%)
  Homemaker 4,915 (7%)
  Retired 28,736 (40%)
  Unable to work 6,025 (8%)
Household size
  1 or 2 24,800 (34%)
  3 or 4 38,192 (53%)
  5 or 6 7,727 (11%)
  7 or more 1,548 (2%)
Homeownership
  Does not own 12,031 (17%)
  Own 60,236 (83%)
Number of mentally unhealthy days
  Zero 52,870 (73%)
  1-10 12,572 (17%)
  11-20 2,820 (4%)
  21-30 4,005 (6%)

Figure 1: Distribution of mentally unhealthy days in past month


Table 2: Comparison of Regression Models in Examining the Association Between Homeownership and Number of Mentally Unhealthy Days in the Previous Month, 2013 BRFSS

Reference group: Aged 35-44, Male, White non-Hispanic, Less than high school, Income <$25,000, Married, Employed, 1-2 in Household, Nonhomeowner
Model 1: Logistic; Model 2: Linear; Model 3: Poisson; Model 4: Negative binomial; Model 5: Zero-inflated negative binomial (count component unless labeled "Zero model"). Standard errors are shown in parentheses beneath each estimate. A period marks a cell with no reported value.

Variable                          Model 1     Model 2     Model 3     Model 4     Model 5
Intercept                         -2.01***    3.91***     1.24***     1.23***     2.34***
                                  (0.09)      (0.19)      (0.01)      (0.09)      (0.06)
Aged 45 to 54                     -0.08       -0.01       -0.02*      -0.03       -0.01
                                  (0.05)      (0.11)      (0.01)      (0.05)      (0.03)
Aged 55 to 64                     -0.33***    -0.71***    -0.19***    -0.18***    -0.03
                                  (0.05)      (0.11)      (0.01)      (0.05)      (0.03)
Aged 65+                          -0.83***    -1.86***    -0.59***    -0.53***    -0.07
                                  (0.06)      (0.12)      (0.01)      (0.06)      (0.04)
Female                            0.29***     0.72***     0.28***     0.32***     -0.02
                                  (0.03)      (0.06)      (0.00)      (0.03)      (0.02)
Black Non-Hispanic                -0.25***    -0.61***    -0.15***    -0.05       -0.02
                                  (0.05)      (0.09)      (0.01)      (0.04)      (0.03)
Other race                        0.07        0.18        0.06***     0.09*       0.11***
                                  (0.05)      (0.10)      (0.01)      (0.05)      (0.03)
Hispanic                          0.02        -0.05       0.00        0.10        0.06
                                  (0.06)      (0.12)      (0.01)      (0.06)      (0.04)
High-school graduate              -0.21***    -0.49***    -0.13***    -0.21***    -0.10**
                                  (0.05)      (0.10)      (0.01)      (0.05)      (0.03)
Less than 4 years of college      -0.16***    -0.30**     -0.08***    -0.13**     -0.12***
                                  (0.05)      (0.11)      (0.01)      (0.05)      (0.03)
4 or more years of college        -0.36***    -0.56***    -0.20***    -0.27***    -0.29***
                                  (0.05)      (0.11)      (0.01)      (0.05)      (0.03)
$25,000 to <$50,000               -0.30***    -0.86***    -0.22***    -0.25***    -0.12***
                                  (0.04)      (0.08)      (0.01)      (0.04)      (0.02)
$50,000 or more                   -0.61***    -1.32***    -0.44***    -0.44***    -0.24***
                                  (0.05)      (0.08)      (0.01)      (0.04)      (0.03)
Unknown income                    -0.41***    -1.19***    -0.33***    -0.37***    -0.02
                                  (0.05)      (0.09)      (0.01)      (0.04)      (0.03)
Divorced/Widowed/Separated        0.28***     0.73***     0.19***     0.24***     0.12***
                                  (0.04)      (0.09)      (0.01)      (0.04)      (0.03)
Never married                     0.06        0.22        0.06***     0.11*       0.01
                                  (0.06)      (0.12)      (0.01)      (0.05)      (0.04)
Unemployed                        0.95***     2.74***     0.70***     0.69***     0.33***
                                  (0.06)      (0.15)      (0.01)      (0.07)      (0.04)
Homemaker                         0.20**      0.33**      0.11***     0.09        0.09*
                                  (0.06)      (0.11)      (0.01)      (0.05)      (0.04)
Retired                           0.24***     0.53***     0.14***     0.11**      0.13***
                                  (0.04)      (0.08)      (0.01)      (0.04)      (0.03)
Unable to work                    1.74***     7.00***     1.23***     1.25***     0.59***
                                  (0.04)      (0.11)      (0.01)      (0.05)      (0.03)
3-4 in Household                  0.08*       0.40***     0.06***     0.05        -0.01
                                  (0.04)      (0.08)      (0.01)      (0.04)      (0.02)
5-6 in Household                  0.01        0.24        0.06***     0.10        -0.09**
                                  (0.06)      (0.12)      (0.01)      (0.06)      (0.04)
7+ in Household                   0.02        0.23        0.05***     0.08        -0.07
                                  (0.10)      (0.20)      (0.02)      (0.09)      (0.06)
Homeowner                         -0.13***    -0.53***    -0.11***    -0.13***    -0.04*
                                  (0.03)      (0.08)      (0.01)      (0.03)      (0.02)
Log theta                         .           .           .           .           -0.28***
                                                                                  (0.02)
Zero model: Homeowner             .           .           .           .           0.14***
                                                                                  (0.03)
AIC                               38479.44    .           720413.83   213462.21   206523.56
BIC                               38699.95    .           720634.34   213691.91   .
Log Likelihood                    -19215.72   .           -360182.91  -106706.11  -103212.78
Deviance                          38431.44    .           647901.06   41345.58    .
Num. obs.                         72267       72267       72267       72267       72267
R2                                .           0.11        .           .           .
Adj. R2                           .           0.11        .           .           .
RMSE                              .           6.93        .           .           .
*** p < 0.001, ** p < 0.01, * p < 0.05
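The information criteria in Table 2 can be cross-checked against the log likelihoods using AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of estimated parameters. The Python sketch below assumes that the four AIC and log-likelihood values belong, in order, to the logistic, Poisson, negative binomial, and zero-inflated negative binomial models, and that the three BIC values belong to the first three of these. The parameter counts (24, 24, 25, and 49) are not stated in the report; they are inferred here from the number of coefficient rows and from the AIC values themselves.

```python
import math

N_OBS = 72267  # number of observations reported in Table 2

# model: (log likelihood, assumed parameter count k, reported AIC, reported BIC or None)
MODELS = {
    "Logistic": (-19215.72, 24, 38479.44, 38699.95),
    "Poisson": (-360182.91, 24, 720413.83, 720634.34),
    "Negative binomial": (-106706.11, 25, 213462.21, 213691.91),
    "Zero-inflated negative binomial": (-103212.78, 49, 206523.56, None),
}

def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik

for name, (log_lik, k, reported_aic, reported_bic) in MODELS.items():
    line = f"{name}: AIC {aic(log_lik, k):.2f} (reported {reported_aic:.2f})"
    if reported_bic is not None:
        line += f", BIC {bic(log_lik, k, N_OBS):.2f} (reported {reported_bic:.2f})"
    print(line)
```

Among the three count models, the AIC falls from the Poisson to the negative binomial to the zero-inflated negative binomial model, which matches the ordering suggested by the log likelihoods.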

Table 3: Observed Zero Counts vs. Expected Zero Counts

Model                             Zero count
Observed                          80,650
Logistic                          17,998
Poisson                           8,860
Negative Binomial                 52,175
Zero-inflated Negative Binomial   52,877
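Expected zero counts of the kind shown in Table 3 are typically obtained by summing, over all respondents, each model's predicted probability of a zero. The Python sketch below shows that calculation for the three count models under the same assumed mean-dispersion parameterization of the negative binomial. The fitted means, zero-inflation probabilities, and theta used here are invented for illustration and are not the values behind Table 3.

```python
import math
from typing import Sequence

def expected_zeros_poisson(mus: Sequence[float]) -> float:
    """Sum of P(Y = 0) = exp(-mu) over the fitted Poisson means."""
    return sum(math.exp(-mu) for mu in mus)

def expected_zeros_negbin(mus: Sequence[float], theta: float) -> float:
    """Sum of P(Y = 0) = (theta / (theta + mu))**theta over the fitted means."""
    return sum((theta / (theta + mu)) ** theta for mu in mus)

def expected_zeros_zinb(mus: Sequence[float], pis: Sequence[float], theta: float) -> float:
    """Sum of pi + (1 - pi) * negative binomial P(Y = 0) over respondents."""
    return sum(pi + (1.0 - pi) * (theta / (theta + mu)) ** theta
               for mu, pi in zip(mus, pis))

# Invented fitted values for four hypothetical respondents.
fitted_means = [1.5, 2.0, 4.0, 8.0]
zero_probs = [0.5, 0.4, 0.3, 0.2]
theta = 0.75

print(expected_zeros_poisson(fitted_means))
print(expected_zeros_negbin(fitted_means, theta))
print(expected_zeros_zinb(fitted_means, zero_probs, theta))
```

For the same fitted means, the zero-inflated model always predicts at least as many zeros as the negative binomial, because each excess-zero probability adds to the negative binomial zero probability.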

Figures 2-11: Diagnostic Plots