The Influence of Alcohol Use on Obesity

Introduction

Alcohol intake is a widespread activity and can be a contributing factor to weight gain and obesity in our nation. Obesity is often the result of an overall energy imbalance due to poor diet and limited physical activity. Obesity increases the risk for health conditions such as coronary heart disease, type 2 diabetes, cancer, hypertension, dyslipidemia, stroke, liver and gallbladder disease, sleep apnea and respiratory problems, osteoarthritis, and poor health status.

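In this statistical analysis I investigate the influence alcohol use may have on obesity. The data were gathered from Social Explorer Health Data 2016 and cover all 50 states. The State column contains 51 distinct values, as the count further down shows, presumably because the District of Columbia is included.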

Data Used and Variable Explanation:

A data frame consisting of:

  • state : All 50 States

  • Drinking Adults : Excessive Drinking is the percentage of adults that report either binge drinking, defined as consuming more than 4 (women) or 5 (men) alcoholic beverages on a single occasion in the past 30 days, or heavy drinking, defined as drinking more than one (women) or 2 (men) drinks per day on average.

  • Obese Persons : This will be the continuous dependent variable. Adult Obesity is the percentage of the adult population (age 20 and older) that reports a body mass index (BMI) greater than or equal to 30 kg/m².

  • Persons with Limited Access to Healthy Foods : This will be an independent variable used to see if there is an interaction between limited access to healthy foods and alcohol usage and if it has any influence on obesity.

Some variables, such as Smokers and Percent of Fair to Poor Health, are not used in this analysis. They appear in the data only because the extraction placed them in the same table as Drinking Adults. For our purposes we use only State, Drinking Adults, Limited Access to Healthy Foods and Obese Persons.

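library(dplyr)    # group_by(), summarise(), do()
library(ggplot2)  # histograms and coefficient plots
library(nlme)     # lme() for the multilevel models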
AlOb <- AlcoObesity
head(AlOb)

We will first look at how many states we are focusing on:

length(unique(AlOb$State))
[1] 51

Now we look at how many observations (rows) each state contributes:

AlOb %>%
  group_by(State) %>%
  summarise(n_obs = n())  # n() counts rows per state, not people

Ecological Analysis

The ecological regression shown below uses one row per state: the state mean of Obese Persons (mean_p) is regressed on the state mean of Drinking Adults (mean_s).

AlcObEco <- AlOb %>%
  group_by(State) %>%
  summarise(mean_p = mean(Obese_Persons, na.rm = TRUE),
            mean_s = mean(Drinking_Adults, na.rm = TRUE))
head(AlcObEco)
ecoalcobe <- lm(mean_p ~ mean_s, data = AlcObEco)
summary(ecoalcobe)

Call:
lm(formula = mean_p ~ mean_s, data = AlcObEco)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4287 -2.1042  0.2516  2.4969  7.0014 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.3962     2.6574  15.201  < 2e-16 ***
mean_s       -0.6261     0.1511  -4.145 0.000134 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.376 on 49 degrees of freedom
Multiple R-squared:  0.2596,    Adjusted R-squared:  0.2445 
F-statistic: 17.18 on 1 and 49 DF,  p-value: 0.0001344

The regression is statistically significant: the state mean percentage of obese persons decreases by about 0.63 points for every 1-point increase in the state mean percentage of drinking adults. The problem with evaluating Drinking Adults at the ecological level is that we cannot draw conclusions about individuals. This raises the concern of ecological fallacy: a relationship between state averages may or may not hold within states, and it does not by itself establish a causal connection.
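The ecological fallacy can be illustrated with a small simulation. This is only a sketch on invented data (the groups, slopes and noise levels are assumptions chosen for illustration, not values from the study); it shows that a regression on group means can have the opposite sign from the relationship inside each group.

```r
# Simulated illustration only: none of these numbers come from the study data.
set.seed(1)
n_groups <- 20
n_per_group <- 50
group <- rep(seq_len(n_groups), each = n_per_group)

# Group means of x rise across groups.
group_mean_x <- seq(10, 30, length.out = n_groups)
x <- rnorm(n_groups * n_per_group, mean = group_mean_x[group], sd = 3)

# Within each group y falls as x rises (slope -1), but groups with a
# higher mean x have a much higher baseline y (slope +2 between groups).
y <- 2 * group_mean_x[group] - 1 * (x - group_mean_x[group]) +
  rnorm(length(x), sd = 1)

sim <- data.frame(group = factor(group), x = x, y = y)

# Ecological regression: one point per group (group means only).
means <- aggregate(cbind(x, y) ~ group, data = sim, FUN = mean)
eco_slope <- coef(lm(y ~ x, data = means))[["x"]]

# Within-group regression: group dummies absorb the group baselines.
within_slope <- coef(lm(y ~ x + group, data = sim))[["x"]]

c(ecological = eco_slope, within = within_slope)
```

In this simulation the ecological slope is positive while the within-group slope is negative, which is why the state-level result above should not be read as a statement about individuals.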

Complete Pooling

Now I will examine Drinking Adults with complete pooling, that is, ignoring which state each observation belongs to. The model is statistically significant: the percentage of obese persons decreases by about 0.53 points for every 1-point increase in the percentage of drinking adults. What else can go wrong? The drinking measure covers only the past 30 days of a person's drinking history, so the relationship could look different if drinking were recorded over a longer or shorter period.

opooling<-lm(Obese_Persons ~ Drinking_Adults, data = AlOb)
summary(opooling)

Call:
lm(formula = Obese_Persons ~ Drinking_Adults, data = AlOb)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.9237  -2.2356   0.2737   2.5763  13.4080 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     39.67170    0.37017  107.17   <2e-16 ***
Drinking_Adults -0.52636    0.02189  -24.04   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.105 on 3138 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1556,    Adjusted R-squared:  0.1553 
F-statistic: 578.1 on 1 and 3138 DF,  p-value: < 2.2e-16
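To make the pooled coefficients concrete, the sketch below plugs a few drinking levels into the fitted line. The coefficients are copied from the summary above; the drinking values of 10, 15 and 20 percent are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the complete-pooling summary above.
b0 <- 39.67170   # intercept: predicted % obese when Drinking_Adults is 0
b1 <- -0.52636   # change in % obese per 1-point increase in Drinking_Adults

# Hypothetical drinking levels, for illustration only.
drinking <- c(10, 15, 20)
predicted_obesity <- b0 + b1 * drinking
data.frame(drinking, predicted_obesity)
```

With the fitted model object available, predict(opooling, newdata = data.frame(Drinking_Adults = drinking)) should return the same values up to rounding of the printed coefficients.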

No pooling

Now I will use the no-pooling models to see the connection between states, obese persons and Drinking Adults.

The Intercept

We fit a separate regression model for each state. First we look at the intercepts. Each intercept is the predicted percentage of obese persons in a state when the percentage of drinking adults is zero, not the state's average obesity. The histogram below shows that the majority of the state intercepts fall between 33 and 45.

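# Fit one regression per state (no pooling)
dcoef <- AlOb %>%
  group_by(State) %>%
  do(mod = lm(Obese_Persons ~ Drinking_Adults, data = .))
# Extract the intercept of each state model; the result is named
# intercepts so that it does not mask the coef() function
intercepts <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(intercepts, aes(x = intc)) + geom_histogram()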

The Slope

The histogram below suggests that most of the state-specific slopes of Drinking Adults on Obese Persons fall between -1.0 and -0.2.

dcoef <- AlOb %>%
  group_by(State) %>%
  do(mod = lm(Obese_Persons ~ Drinking_Adults, data = .))
# Extract the Drinking_Adults slope of each state model
slopes <- dcoef %>% do(data.frame(Drinking_Adults = coef(.$mod)[2]))
ggplot(slopes, aes(x = Drinking_Adults)) + geom_histogram()
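The no-pooling idea can also be sketched in base R, which collects the intercept and slope of every group in one pass. The data frame toy below is a simulated stand-in for AlOb with three invented states; its values and the object names toy, fits and no_pool are assumptions for illustration only.

```r
# Simulated stand-in for AlOb; the three states and all values are invented.
set.seed(42)
toy <- data.frame(
  State = rep(c("A", "B", "C"), each = 30),
  Drinking_Adults = runif(90, min = 8, max = 25)
)
true_slope <- c(A = -0.2, B = -0.6, C = -1.0)
toy$Obese_Persons <- 40 +
  unname(true_slope[as.character(toy$State)]) * toy$Drinking_Adults +
  rnorm(90, sd = 1)

# Fit one regression per state, with no information shared across states.
fits <- lapply(split(toy, toy$State), function(d) {
  lm(Obese_Persons ~ Drinking_Adults, data = d)
})

# One row per state, with the intercept and slope side by side.
no_pool <- t(sapply(fits, coef))
colnames(no_pool) <- c("intercept", "slope")
no_pool
```

Keeping both coefficients in one table makes it easy to see whether states with high intercepts also tend to have steep slopes, which is the pattern a random slope model later summarizes with a correlation.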

Random Intercept

In a random intercept model, only the intercept is allowed to vary across groups, and the group intercepts are assumed to follow a normal distribution. Each observation is therefore predicted from its own state's intercept plus a common slope: the slope of Drinking Adults is fixed and is the same in every state.

m1_lme <- lme(Obese_Persons ~ Drinking_Adults, data = AlOb, random = ~ 1|State, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: AlOb 
       AIC      BIC    logLik
  16026.16 16050.36 -8009.078

Random effects:
 Formula: ~1 | State
        (Intercept) Residual
StdDev:    3.280227 3.003315

Fixed effects: Obese_Persons ~ Drinking_Adults 
                   Value Std.Error   DF   t-value p-value
(Intercept)     41.71426 0.7362793 3088  56.65549       0
Drinking_Adults -0.69873 0.0329907 3088 -21.17963       0
 Correlation: 
                (Intr)
Drinking_Adults -0.771

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-4.25192210 -0.63415298 -0.02447912  0.64220482  3.92768387 

Number of Observations: 3140
Number of Groups: 51 
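The two standard deviations in the random effects block can be combined into an intraclass correlation, the share of the unexplained variance that lies between states. The sketch below uses only the values printed above; the variable names are chosen for illustration.

```r
# Standard deviations copied from the "Random effects" block above.
sd_state    <- 3.280227  # between-state SD of the intercept
sd_residual <- 3.003315  # within-state (residual) SD

# Intraclass correlation: between-state variance over total variance.
icc <- sd_state^2 / (sd_state^2 + sd_residual^2)
round(icc, 3)
```

The result is a little over one half, meaning that after accounting for Drinking Adults, roughly half of the remaining variation in obesity is between states rather than within them. That is a strong argument for a multilevel model over complete pooling.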

Multilevel Model #2

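The random effect used is ~ Limited_Access_to_Healthy_Foods | State, so both the intercept and the slope of Limited Access to Healthy Foods are allowed to vary by state. Note that Limited Access to Healthy Foods enters only as a random slope; the fixed part of the model still contains only Drinking Adults. From the results below, the predicted percentage of obese persons is 41.72 when the percentage of drinking adults is zero, and it decreases by about 0.70 points for every 1-point increase in drinking adults. The state-level standard deviation of the Limited Access slope is small (0.07).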

m2_lme <- lme(Obese_Persons ~ Drinking_Adults, data = AlOb, random = ~ Limited_Access_to_Healthy_Foods|State, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: AlOb 
       AIC      BIC    logLik
  16008.74 16045.05 -7998.368

Random effects:
 Formula: ~Limited_Access_to_Healthy_Foods | State
 Structure: General positive-definite, Log-Cholesky parametrization
                                StdDev     Corr  
(Intercept)                     3.36204482 (Intr)
Limited_Access_to_Healthy_Foods 0.07430976 -0.269
Residual                        2.97274226       

Fixed effects: Obese_Persons ~ Drinking_Adults 
                   Value Std.Error   DF   t-value p-value
(Intercept)     41.71752 0.7404692 3088  56.33930       0
Drinking_Adults -0.69531 0.0335364 3088 -20.73312       0
 Correlation: 
                (Intr)
Drinking_Adults -0.778

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-4.14920407 -0.63853669 -0.02787297  0.64569654  3.76435858 

Number of Observations: 3140
Number of Groups: 51 

Model Selection

Adding Access to Healthy Foods Covariate

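From this we can see that the intercept, the predicted percentage of obese persons when both predictors are zero, is 41.43. For every 1-point increase in Drinking Adults, obesity decreases by about 0.71 points. For every 1-unit increase in Limited Access to Healthy Foods, obesity increases by about 0.012 points, but this effect is not statistically significant (p = 0.11). In this model the slope of Drinking Adults, rather than Limited Access to Healthy Foods, varies by state.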

m3_lme <- lme(Obese_Persons ~ Drinking_Adults + Limited_Access_to_Healthy_Foods, data = AlOb, random = ~ Drinking_Adults|State, method = "ML")
summary(m3_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: AlOb 
       AIC      BIC    logLik
  15912.21 15954.57 -7949.104

Random effects:
 Formula: ~Drinking_Adults | State
 Structure: General positive-definite, Log-Cholesky parametrization
                StdDev    Corr  
(Intercept)     8.0888797 (Intr)
Drinking_Adults 0.4373833 -0.933
Residual        2.9183714       

Fixed effects: Obese_Persons ~ Drinking_Adults + Limited_Access_to_Healthy_Foods 
                                   Value Std.Error   DF   t-value p-value
(Intercept)                     41.42980 1.3616584 3087 30.425985  0.0000
Drinking_Adults                 -0.71224 0.0739485 3087 -9.631616  0.0000
Limited_Access_to_Healthy_Foods  0.01194 0.0074943 3087  1.592787  0.1113
 Correlation: 
                                (Intr) Drnk_A
Drinking_Adults                 -0.948       
Limited_Access_to_Healthy_Foods -0.134  0.092

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-4.09004865 -0.62410187 -0.01732312  0.64583502  3.93314629 

Number of Observations: 3140
Number of Groups: 51 
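The random slope standard deviation tells us how much the Drinking Adults slope differs from state to state. As a rough sketch, assuming the state slopes are normally distributed around the fixed slope, the values printed above imply the following range for about 95% of states.

```r
# Values copied from the m3_lme summary above.
fixed_slope <- -0.71224    # average Drinking_Adults slope across states
sd_slope    <- 0.4373833   # SD of the state-specific slopes

# Assuming normally distributed random slopes, about 95% of state slopes
# fall within 1.96 SDs of the average slope.
slope_range <- fixed_slope + c(-1.96, 1.96) * sd_slope
round(slope_range, 2)
```

The range runs from roughly -1.57 to 0.15, so the model allows the association to be close to zero, or even slightly positive, in a few states while it is strongly negative in others.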

Adding Interaction

The interaction model tests whether the association between Drinking Adults and obesity depends on Limited Access to Healthy Foods. The intercept is 41.90. For every 1-point increase in Drinking Adults, the percentage of obese persons decreases by about 0.74 points when Limited Access to Healthy Foods is zero. For every 1-unit increase in Limited Access to Healthy Foods, the percentage of obese persons decreases by about 0.05 points when Drinking Adults is zero, although this effect is not statistically significant (p = 0.18). The interaction coefficient is 0.004: each 1-unit increase in Limited Access to Healthy Foods makes the Drinking Adults slope 0.004 less negative. This term is not significant at the 0.05 level (p = 0.09).

m4_lme <- lme(Obese_Persons ~ Drinking_Adults * Limited_Access_to_Healthy_Foods, data = AlOb, random = ~ Drinking_Adults|State, method = "ML")
summary(m4_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: AlOb 
       AIC      BIC    logLik
  15911.37 15959.78 -7947.683

Random effects:
 Formula: ~Drinking_Adults | State
 Structure: General positive-definite, Log-Cholesky parametrization
                StdDev    Corr  
(Intercept)     8.0775882 (Intr)
Drinking_Adults 0.4382402 -0.933
Residual        2.9171675       

Fixed effects: Obese_Persons ~ Drinking_Adults * Limited_Access_to_Healthy_Foods 
                                                   Value Std.Error   DF   t-value p-value
(Intercept)                                     41.89559 1.3872627 3086 30.200185  0.0000
Drinking_Adults                                 -0.74245 0.0761343 3086 -9.751815  0.0000
Limited_Access_to_Healthy_Foods                 -0.05287 0.0391625 3086 -1.350111  0.1771
Drinking_Adults:Limited_Access_to_Healthy_Foods  0.00406 0.0024063 3086  1.686177  0.0919
 Correlation: 
                                                (Intr) Drnk_A L_A__H
Drinking_Adults                                 -0.950              
Limited_Access_to_Healthy_Foods                 -0.218  0.246       
Drinking_Adults:Limited_Access_to_Healthy_Foods  0.197 -0.233 -0.982

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-4.0986701 -0.6226325 -0.0165673  0.6457412  3.9282446 

Number of Observations: 3140
Number of Groups: 51 
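An interaction coefficient is easiest to read as a change in slope. The sketch below computes the Drinking Adults slope implied by the fixed effects above at a few levels of Limited Access to Healthy Foods; the levels 0, 5, 10 and 20 are hypothetical and chosen only for illustration.

```r
# Fixed effects copied from the m4_lme summary above.
b_drink       <- -0.74245  # Drinking_Adults slope when limited access is 0
b_interaction <-  0.00406  # change in that slope per unit of limited access

# Hypothetical Limited_Access_to_Healthy_Foods values, for illustration only.
limited_access <- c(0, 5, 10, 20)
drinking_slope <- b_drink + b_interaction * limited_access
data.frame(limited_access, drinking_slope)
```

Even at the largest value used here the slope stays clearly negative, which is consistent with the interaction being small relative to the main effect of Drinking Adults.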

Best Model

library(texreg)
htmlreg(list(m2_lme, m3_lme, m4_lme))
Statistical models
                                                 Model 1     Model 2     Model 3
(Intercept)                                      41.72***    41.43***    41.90***
                                                 (0.74)      (1.36)      (1.39)
Drinking_Adults                                  -0.70***    -0.71***    -0.74***
                                                 (0.03)      (0.07)      (0.08)
Limited_Access_to_Healthy_Foods                               0.01       -0.05
                                                             (0.01)      (0.04)
Drinking_Adults:Limited_Access_to_Healthy_Foods                           0.00
                                                                         (0.00)
AIC                                              16008.74    15912.21    15911.37
BIC                                              16045.05    15954.57    15959.78
Log Likelihood                                   -7998.37    -7949.10    -7947.68
Num. obs.                                        3140        3140        3140
Num. groups                                      51          51          51
***p < 0.001, **p < 0.01, *p < 0.05

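In this table Model 1 is m2_lme, Model 2 is m3_lme and Model 3 is m4_lme. Both models with a random slope for Drinking Adults (Models 2 and 3) fit much better than Model 1. Between those two the criteria disagree: AIC is marginally lower for Model 3 (15911.37 versus 15912.21), while BIC, which penalizes the extra parameter more heavily, is lower for Model 2 (15954.57 versus 15959.78). Because the interaction term is not significant at the 0.05 level, the evidence for preferring Model 3 over Model 2 is weak.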
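The information criteria can be recomputed from the log-likelihoods reported above, which also allows a likelihood ratio test of the interaction term. This is a sketch using the printed values; the parameter counts are my own tally (fixed effects, three random-effect variance and covariance terms, and one residual variance).

```r
# Log-likelihoods copied from the ML fits above.
loglik <- c(m3_lme = -7949.104, m4_lme = -7947.683)

# Parameter counts: fixed effects + 3 random-effect (co)variance terms
# + 1 residual variance.
k <- c(m3_lme = 3 + 3 + 1, m4_lme = 4 + 3 + 1)
n <- 3140  # number of observations

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k
round(rbind(aic, bic), 2)

# Likelihood ratio test for the single extra (interaction) parameter.
lr_stat <- 2 * (loglik[["m4_lme"]] - loglik[["m3_lme"]])
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
c(lr_stat = lr_stat, p_value = p_value)
```

The recomputed AIC and BIC match the table, and the likelihood ratio test gives a p-value between 0.05 and 0.10, in line with the t-test on the interaction coefficient. With the fitted objects available, anova(m3_lme, m4_lme) from nlme reports the same comparison directly.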

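Using merTools

With merTools we can examine the effects that Drinking Adults and Limited Access to Healthy Foods have on obesity. The output below summarizes a random intercept model fit with lme4::lmer, referred to as m1 in the code that follows. The between-state standard deviation of the intercept is 3.31 percentage points, compared with a residual standard deviation of 3.00.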

lme4::lmer(formula = Obese_Persons ~ Drinking_Adults + Limited_Access_to_Healthy_Foods + 
    (1 | State), data = AlOb)
                                coef.est coef.se
(Intercept)                     41.58     0.76  
Drinking_Adults                 -0.69     0.03  
Limited_Access_to_Healthy_Foods  0.01     0.01  

Error terms:
 Groups   Name        Std.Dev.
 State    (Intercept) 3.31    
 Residual             3.00    
---
number of obs: 3140, groups: State, 51
AIC = 16040.2
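Approximate 95% intervals can be read off the estimates and standard errors in this output. The sketch below uses the rounded values as printed and a normal approximation, so the intervals are rough.

```r
# Estimates and standard errors copied from the lmer output above.
est <- c(Drinking_Adults = -0.69, Limited_Access_to_Healthy_Foods = 0.01)
se  <- c(Drinking_Adults =  0.03, Limited_Access_to_Healthy_Foods = 0.01)

# Approximate 95% intervals using the normal approximation.
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
ci

# A term is distinguishable from zero (at roughly the 5% level) when its
# interval excludes 0.
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes_zero
```

The interval for Drinking Adults excludes zero while the interval for Limited Access to Healthy Foods does not, which matches the significance pattern in the nlme models above.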

Next we simulate the fixed effects 1000 times and look at the mean, median and standard deviation of each term.

feEx <- FEsim(m1, 1000)
cbind(feEx[,1] , round(feEx[, 2:4], 3))

Now we plot the simulated estimates to compare the effects of Drinking Adults and Limited Access to Healthy Foods visually. There are two ways to do this:

library(ggplot2)
ggplot(feEx[feEx$term!= "(Intercept)", ]) + 
  aes(x = term, ymin = median - 1.96 * sd, 
      ymax = median + 1.96 * sd, y = median) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, size = I(1.1), color = I("red")) + 
  coord_flip() + 
  theme_bw() + labs(title = "Coefficient Plot of Obese Persons Model",
                    x = "Model Term", y = "Median Effect Estimate")  # x is drawn vertically after coord_flip()

The easier option, according to the merTools documentation, is plotFEsim():

plotFEsim(feEx) +
  theme_bw() + labs(title = "Coefficient Plot of Obese Persons Model",
                    x = "Model Term", y = "Median Effect Estimate")

Using merTools' Random Effects

Now we will look at the random effects, which take into consideration the differences between states. The table below shows the simulated mean and median of the random intercept for each state.

reEx<-REsim(m1)
head(reEx)
table(reEx$term)

(Intercept) 
         51 
table(reEx$groupFctr)

State 
   51 

After plotting this we can see that most states are above the 0 mark, but there are also some states with extreme high or low random intercepts that need to be accounted for.

p1 <- plotREsim(reEx)
p1

Conclusion

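There are more factors to consider when looking into obesity. In every model a higher percentage of drinking adults was associated with a lower percentage of obese persons, but this ecological association does not show how alcohol affects an individual's weight, and we should also look at physical activity and age groups. I originally thought all the terms in my models would show statistical significance, but Limited Access to Healthy Foods and its interaction with Drinking Adults did not. By AIC and BIC, the best-fitting models in this analysis were the multilevel models with a random slope for Drinking Adults.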

