In this assignment I am going to use the National Health Interview Survey (NHIS) to discover the causations of the frequency of drinking alcohol. The NHIS database that I am using is an integrated data collected by the National Center for Health Statistics (NCHS), and Centers for Disease Control and Prevention (CDC). The National Health Interview Survey has monitored the health of the nation since 1957. NHIS data on a broad range of health topics are collected through personal household interviews. Survey results have been instrumental in providing data to track health status, health care access, and progress toward achieving national health objectives.
The purpose of this analysis is to discover the influence of the amount of days at work, depression, age and gender on frequency of drinking. The aim is to find out which of these variables decreases and increases alcohol consumption. The variables I am going to use for the purpose of this study are:
library(DAAG)
library(Zelig)
library(ZeligChoice)
library(dplyr)
library(tidyr)
library(ggplot2)
library(sjmisc)
library(haven)
healthdata <- read_dta ("/Users/lasha/Desktop/Queens College/755/NHIS_v2.dta")
head(healthdata)
It contains variables needed for the analysis. The variables were recoded so the missing values are not included. The variable SEX was changed to a binary 0/1 coding. The variable DEPFREQ “How often feel depressed” has values from 1 to 5 where 1 indicates never depressed, 2 = a few times a year, 3 = monthly, 4 = weekly, 5 = daily.
health <- healthdata %>%
filter (alc5upyr<=365, depfreq >0, depfreq <=5, hourswrk<=95) %>%
mutate (depfreq1 = sjmisc::rec(depfreq, rec = "5=1; 4=2; 3=3; 2=4; 1=5")) %>%
mutate (sex1 = sjmisc::rec(sex, rec = "0=NA; 1=1; 2=0; 7/8/9=NA")) %>%
select (alc5upyr, hourswrk, depfreq1, age, sex1)
head(health)
## # A tibble: 6 x 5
## alc5upyr hourswrk depfreq1 age sex1
## <dbl+lbl> <dbl+lbl> <dbl> <dbl> <dbl>
## 1 0 0 2 81 0
## 2 0 0 1 58 1
## 3 0 95 1 57 1
## 4 12 40 1 27 1
## 5 0 44 4 37 1
## 6 10 4 2 27 1
Plot showing the number of total hours worked last week or weekly usually. The high number of 0 hours worked indicates that the question does not apply to a person (for example a person is not employed).
ggplot(data=health) +
geom_histogram(aes(x=hourswrk))
Plot showing the number of days when having 5 or more drinks in the past year (Time period)
ggplot(data=health) +
geom_histogram(aes(x=alc5upyr))
The analysis of influence of independent variables (hours work, depression, age and sex) on a dependent variable which is alc5upyr (number of days drinking 5+ drinks a year).
The results of the regression show that the the amount of hours worked in a week has a negative influence on days drinking. The same negative influence has the age of the person. With the increase of years the amounts of days drinking 5 or more drinks declines.
We can see that depression has a positive influence on the number of days drinking alcohol. It means that the more depressed people report to be the more days a year they drink alcohol. The regression also show that males are more likely to drink 5 or more drinks more often during the year.
options(scipen=9999)
d <-zelig(alc5upyr ~ hourswrk + depfreq1 + age + sex1, model = "poisson", data = health, cite = F)
summary(d)
## Model:
##
## Call:
## z5$zelig(formula = alc5upyr ~ hourswrk + depfreq1 + age + sex1,
## data = health)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.429 -4.876 -3.376 -2.532 51.998
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.07916088 0.00514647 404.00 <0.0000000000000002
## hourswrk -0.00363771 0.00005869 -61.98 <0.0000000000000002
## depfreq1 0.20854862 0.00101379 205.71 <0.0000000000000002
## age -0.01386634 0.00007598 -182.49 <0.0000000000000002
## sex1 1.13766929 0.00277830 409.48 <0.0000000000000002
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2890084 on 59796 degrees of freedom
## Residual deviance: 2639747 on 59792 degrees of freedom
## AIC: 2727221
##
## Number of Fisher Scoring iterations: 8
##
## Next step: Use 'setx' method
The graph indicates that people who are never depressed (1) consume alcohol less often than people who have a high depression level (5). People who are more depressed tend to drink alcohol much more frequently than people who feel depressed rarely or not at all. This graph show that the depression increase the amount of days in which the alcohol was consumed.
a.range = min(health$depfreq1): max(health$depfreq1)
x <- setx(d, depfreq1 = a.range)
s <- sim (d, x =x )
ci.plot (s)
The graph indicates that people who work more hours a week tend to drink less days a year than people who work less often or not at all.
a.range = min(health$hourswrk): max(health$hourswrk)
x <- setx(d, hourswrk = a.range)
w <- sim (d, x =x )
ci.plot (w)
The graph indicates that people who are older tend to drink less days a year than people who are younger.
a.range = min(health$age): max(health$age)
x <- setx(d, age = a.range)
y <- sim (d, x =x )
ci.plot (y)
ev1 = when the person feels never depressed the mean number of days in which 5 or more alcohol drinks were consumed equals 8.179 ev2 = when the person feels depressed everyday the mean number of days of drinking alcohol increases to 18.832
fd = It is a difference between the second expected value and the first expected value: 18.83249 - 8.179053 = 10.65344
The first difference means that the person who is never depressed drinks 10.653 days a year less than a person who is depressed everyday.
x.low <- setx(d, depfreq1 = 1)
x.high <- setx(d, depfreq1 = 5)
sx <- sim(d, x = x.low, x1 = x.high)
summary(sx)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 8.178777 0.01423644 8.178813 8.151901 8.208176
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 8.204 2.819634 8 3 14
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 18.83655 0.06545293 18.83734 18.70591 18.96949
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 18.959 4.347899 19 11 28
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 10.65778 0.06981764 10.65607 10.52136 10.80337
#Visualization of a difference between a depression level and a number of days of drinking alcohol.
plot(sx)
ev1 = men drinks more then 5 alcohol drinks around 16.815 days a year. ev2 = women drinks more then 5 alcohol drinks around 5.390 days a year.
fd = It is a difference between the number of drinking days between males and females: 5.390514 - 16.81578 = -11.42527
The first difference means that females tend to drink alcohol 11.42527 less days a year then males.
x.m <- setx(d, sex1 = 1)
x.f <- setx(d, sex1 = 0)
mf <- sim(d, x = x.m, x1 = x.f)
summary(mf)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 16.81414 0.02411747 16.81411 16.7669 16.86077
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 16.945 4.08248 17 9.975 25
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 5.389991 0.01327621 5.389421 5.36467 5.417001
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 5.402 2.304714 5 1 10
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -11.42414 0.0275508 -11.42386 -11.4764 -11.36955
#Visualization of a difference between males and females and a number of days of drinking alcohol.
plot(mf)
#gender difference between people who drinks 5 or more drinks zero days a year
d1x <- setx (d, alc5upyr = 0, sex1 = 0)
d1x1 <- setx(d, alc5upyr = 0, sex1 = 1)
d1s <- sim(d, x = d1x, x1 = d1x1)
#gender difference between people who drinks 5 or more drinks 10 days a year
d2x <- setx (d, alc5upyr = 10, sex1 = 0)
d2x1 <- setx(d, alc5upyr = 10, sex1 = 1)
d2s <- sim(d, x = d2x, x1 = d2x1)
#gender difference between people who drinks 5 or more drinks 20 days a year
d3x <- setx (d, alc5upyr = 20, sex1 = 0)
d3x1 <- setx(d, alc5upyr = 20, sex1 = 1)
d3s <- sim(d, x = d3x, x1 = d3x1)
d1 <- d1s$get_qi(xvalue="x1", qi="fd")
d2 <- d2s$get_qi(xvalue="x1", qi="fd")
d3 <- d3s$get_qi(xvalue="x1", qi="fd")
dfd <- as.data.frame(cbind(d1, d2, d3))
tidd <- dfd %>%
gather(alc5upyr, simv, 1:3)
tidd %>%
group_by(alc5upyr) %>%
summarise(mean = mean(simv), sd = sd(simv))
## # A tibble: 3 x 3
## alc5upyr mean sd
## <chr> <dbl> <dbl>
## 1 V1 11.42288 0.02714639
## 2 V2 11.42524 0.02706163
## 3 V3 11.42454 0.02671804
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~alc5upyr)
The hourswork was grouped into ordinal leveled variable called “hoursrank” to make an ordinal logit regression model. The hours of work were grouped into categories “0-20” hours a week, “21-40” hours a week, “41-60” hours a week, “61-80” hours a week, “81+”hours a week.
health2 <- health %>%
mutate (hoursrank = rec(hourswrk, rec="0:20=0-20; 21:40=21-40; 41:60=41-60; 61:80=61-80; else= 81+")) %>%
select(alc5upyr, hourswrk, hoursrank, depfreq1, age, sex1)
head(health2)
## # A tibble: 6 x 6
## alc5upyr hourswrk hoursrank depfreq1 age sex1
## <dbl+lbl> <dbl+lbl> <chr> <dbl> <dbl> <dbl>
## 1 0 0 0-20 2 81 0
## 2 0 0 0-20 1 58 1
## 3 0 95 81+ 1 57 1
## 4 12 40 21-40 1 27 1
## 5 0 44 41-60 4 37 1
## 6 10 4 0-20 2 27 1
The regression is showing the influence of number of days drinking alcohol a year on the amount of hours at work weekly. The results in this regression show that with the increase of hours worked people tend to drink alcohol more days a year.
health2$hoursrank <- factor(health2$hoursrank, ordered = TRUE,
levels = c("0-20", "21-40", "41-60", "61-80", "81+"))
d.ord <- zelig(hoursrank ~ alc5upyr, model = "ologit",
data = health2, cite = F)
summary(d.ord)
## Model:
## Call:
## z5$zelig(formula = hoursrank ~ alc5upyr, data = health2)
##
## Coefficients:
## Value Std. Error t value
## alc5upyr 0.0004473 0.000188 2.379
##
## Intercepts:
## Value Std. Error t value
## 0-20|21-40 -0.3835 0.0086 -44.6369
## 21-40|41-60 1.3692 0.0104 131.8724
## 41-60|61-80 3.6390 0.0259 140.3899
## 61-80|81+ 5.6112 0.0677 82.8420
##
## Residual Deviance: 136936.96
## AIC: 136946.96
## Next step: Use 'setx' method
The analysis showed that the frequency of drinking changes under the influence of hours at work and depression level but it also vary for people in different age and for males and females. The key finding is that the people with high level of depression tend to drink more days a year. The research also show that males tend to drink much more days a year than females and that older people drinks less frequently.
NHIS data citation: Citation “Lynn A. Blewett, Julia A. Rivera Drew, Risa Griffin, Miram L. King, and Kari C.W. Williams. IPUMS Health Surveys: National Health Interview Survey, Version 6.2 [dataset]. Minneapolis, MN: University of Minnesota, 2016. http://doi.org/10.18128/D070.V6.2”