Background

To understand the American obesity epidemic, we must analyze the variables that may influence obesity. Research has shown that Americans consume 20% more calories per year than they did in 1983. With this percentage growing, we must analyze which factors are driving the increase.

The data were collected from Social Explorer's health data, which is intended to build awareness of the factors that affect particular health issues. Because the data cover every state and its counties, we will evaluate the relationship between the factors using several methods to see which regression analysis best fits the data.

The variables we will analyze are as follows:

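library(dplyr)
# Give the Social Explorer codes readable names (new_name = old_name).
Obesity <- rename(Obesity,
          "County" = Geo_QNAME,
          "State" = Geo_STATE,
          "Ad_Diabet" = SE_T009_001,
          "Ad_limitFds" = SE_T012_001,
          "Access_Exer_Opport" = SE_T012_002,
          "Obese_Adults" = SE_T012_003,
          "PhysInactive" = SE_T012_004,
          "FEI" = SE_T013_001)
# Keep only the columns used in the analysis. The result must be assigned;
# otherwise select() only prints the subset and Obesity1 keeps every column.
Obesity1 <- select(Obesity, State, County, Obese_Adults, PhysInactive, FEI)
head(Obesity1)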

The unique function lets us count how many distinct states appear in the data. The data cover all 50 states, including some outlier locations in Alaska, and the count below returns 51 state-level units. To prepare for the ecological regression, I have also counted how many county records, each carrying one food environment index (FEI) value, were collected in each state.

length(unique(Obesity1$State))
## [1] 51
# n() counts the county rows per state; the column is named for what it holds.
Obesity1 %>% 
  group_by(State) %>% 
  summarise(n_counties = n())

Ecological Analysis

The ecological regression shown below relates the mean percentage of obese adults in each state to the state's mean food environment index (FEI). The regression is significant (p = 0.007): the mean percentage of obese adults in a state decreases by 1.896 points for every one-unit increase in the state's mean FEI. The problem with evaluating the food environment at the ecological level is that state averages cannot tell us how the relationship behaves for the individual counties within a state. This raises the concern of ecological fallacy.

Obese<- Obesity1 %>% 
  group_by(State) %>% 
  summarise(mean_p = mean(Obese_Adults, na.rm = TRUE), mean_s = mean(FEI, na.rm = TRUE))
head(Obese)
ecoobs <- lm(mean_p ~ mean_s, data = Obese)
summary(ecoobs)
## 
## Call:
## lm(formula = mean_p ~ mean_s, data = Obese)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8146 -2.5721  0.6634  2.8373  5.8935 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   43.012      4.824   8.916 7.86e-12 ***
## mean_s        -1.896      0.676  -2.805   0.0072 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.642 on 49 degrees of freedom
## Multiple R-squared:  0.1383, Adjusted R-squared:  0.1207 
## F-statistic: 7.866 on 1 and 49 DF,  p-value: 0.007202
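
The ecological fallacy can be illustrated with a small simulation. The sketch below does not use the Social Explorer data: it generates hypothetical counties (every name and parameter value is an assumption chosen for illustration) in which the relationship between states is negative while the relationship within states is positive, and shows that a regression on state means recovers only the former.

```r
# Simulated illustration only; none of these values come from the obesity data.
set.seed(42)
n_states   <- 20
n_counties <- 50
state      <- rep(seq_len(n_states), each = n_counties)

# Each hypothetical state has its own typical FEI; counties scatter around it.
state_fei <- runif(n_states, min = 5, max = 9)
fei_dev   <- rnorm(n_states * n_counties, sd = 0.5)
fei       <- state_fei[state] + fei_dev

# Between states the slope is -2; within a state it is +0.5 (deliberately opposite).
obese <- 45 - 2 * state_fei[state] + 0.5 * fei_dev +
  rnorm(n_states * n_counties, sd = 1)
sim <- data.frame(state = factor(state), fei = fei, obese = obese)

# Ecological regression: one row per state, using state means.
state_means <- aggregate(cbind(obese, fei) ~ state, data = sim, FUN = mean)
eco_slope   <- unname(coef(lm(obese ~ fei, data = state_means))["fei"])

# Within-state slope: state indicators absorb all between-state differences.
within_slope <- unname(coef(lm(obese ~ fei + state, data = sim))["fei"])

round(c(ecological = eco_slope, within = within_slope), 2)
```

Under these assumptions the state-level slope comes out negative even though the county-level relationship inside the simulated states was generated with a positive slope, which is why a state-mean regression cannot be read as a statement about counties.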

Complete-pooling model

After evaluating the ecological regression, we turn to the county level to understand the influence of the food environment index on individual counties. The model below treats all county FEI values equally and ignores which state each county belongs to. This regression model is statistically significant and indicates that the percentage of obese adults decreases by about 1.08 points for every one-unit increase in FEI.

Obesepool <- lm(Obese_Adults ~ FEI, data = Obesity1)
summary(Obesepool)
## 
## Call:
## lm(formula = Obese_Adults ~ FEI, data = Obesity1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.9099  -2.2327   0.3445   2.7838  11.9230 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.43252    0.43427    88.5   <2e-16 ***
## FEI         -1.07593    0.06146   -17.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.263 on 3139 degrees of freedom
## Multiple R-squared:  0.08894,    Adjusted R-squared:  0.08865 
## F-statistic: 306.4 on 1 and 3139 DF,  p-value: < 2.2e-16

From the ecological and complete-pooling regressions we can see that FEI is significantly associated with the percentage of obese adults in both. The problem is not the significance of the factors but what each regression leaves out: the ecological model discards the variation among counties within a state, while the complete-pooling model ignores the differences between states.

No-pooling model

To see how much the relationship between obesity and FEI differs between states, the no-pooling model fits a separate regression for each state. Comparing the resulting intercepts and slopes helps us grasp the importance of each source of variation.

The intercept

The no-pooling results are broken down into two histograms. The histogram below shows the distribution of the state-specific intercepts, that is, the predicted percentage of obese adults in each state when FEI equals zero. The graph shows that the majority of these intercepts fall between 25% and 50%.

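library(ggplot2)
# Fit a separate regression of obesity on FEI within each state.
dcoef <- Obesity1 %>% 
    group_by(State) %>% 
    do(mod = lm(Obese_Adults ~ FEI, data = .))
# Extract each state's intercept (first coefficient) and plot the distribution.
intercepts <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(intercepts, aes(x = intc)) + geom_histogram()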

Slope, FEI difference

To evaluate how the effect of the food environment index differs between states, the histogram below compares the state-specific slopes across the 51 states. It suggests that, in most states, the slope of FEI on the percentage of obese adults falls between -2.5 and 0.

# Refit the per-state regressions and keep each state's FEI slope (second coefficient).
dcoef <- Obesity1 %>% 
    group_by(State) %>% 
    do(mod = lm(Obese_Adults ~ FEI, data = .))
slopes <- dcoef %>% do(data.frame(FEI = coef(.$mod)[2]))
ggplot(slopes, aes(x = FEI)) + geom_histogram()
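
The same per-state fits can be sketched without do(), which dplyr has marked as superseded since version 1.0.0. The example below is a minimal base R version run on simulated data; the data frame toy and its values are hypothetical stand-ins for Obesity1, not the real records.

```r
# Hypothetical stand-in for Obesity1: 3 states with 30 counties each.
set.seed(1)
toy <- data.frame(
  State = rep(c("A", "B", "C"), each = 30),
  FEI   = runif(90, min = 4, max = 10)
)
toy$Obese_Adults <- 40 - 1.2 * toy$FEI + rnorm(90, sd = 2)

# No pooling: fit one regression per state and collect the coefficients.
fits    <- lapply(split(toy, toy$State),
                  function(d) lm(Obese_Adults ~ FEI, data = d))
no_pool <- t(sapply(fits, coef))
colnames(no_pool) <- c("intercept", "slope")

# One row per state: the intercept and the FEI slope of that state's own fit.
no_pool
```

Each row of no_pool corresponds to one bar's worth of input for the two histograms: the intercept column feeds the first and the slope column feeds the second.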

Partial Pooling

For my analysis, partial-pooling models are the best fit because I would like to evaluate how location and the food environment together influence obesity. The models below demonstrate the following:

Multilevel model #1

The multilevel model allows us to analyze the influence of FEI on obese adults while accounting for differences between states. According to the fixed effects, across the 51 states the average percentage of obese adults is 37.14 when FEI equals zero (the intercept). For every one-unit increase in FEI, the percentage of obese adults decreases by 1.05 points.

This model also estimates random effects for each state. The state intercepts vary around the fixed intercept with a standard deviation of 7.93, the state FEI slopes vary with a standard deviation of 0.98, and the two are strongly negatively correlated (-0.908). The residual standard deviation, which captures the county-level variation left within states, is 2.96.

library(nlme)
# Random intercept and random FEI slope for each state, fit by maximum likelihood.
m2_lme <- lme(Obese_Adults ~ FEI, data = Obesity1, random = ~ FEI|State, method = "ML")
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Obesity1 
##        AIC      BIC   logLik
##   16038.96 16075.27 -8013.48
## 
## Random effects:
##  Formula: ~FEI | State
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 7.9327942 (Intr)
## FEI         0.9770286 -0.908
## Residual    2.9634339       
## 
## Fixed effects: Obese_Adults ~ FEI 
##                Value Std.Error   DF   t-value p-value
## (Intercept) 37.13727 1.2979036 3089 28.613272       0
## FEI         -1.04918 0.1629647 3089 -6.438104       0
##  Correlation: 
##     (Intr)
## FEI -0.93 
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -4.535807311 -0.609509813 -0.003371669  0.657196625  3.302217056 
## 
## Number of Observations: 3141
## Number of Groups: 51
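
One way to read the random-effects block is to combine the fixed FEI slope with the standard deviation of the state slopes. The sketch below assumes the state slopes are approximately normally distributed around the fixed slope (the usual mixed-model assumption) and uses the values printed in the summary above.

```r
# Values copied from the summary(m2_lme) output.
fixed_slope <- -1.04918    # fixed effect of FEI
sd_slope    <-  0.9770286  # StdDev of the random FEI slope across states

# Approximate range covering 95% of the state-specific slopes.
slope_range <- fixed_slope + c(-1, 1) * qnorm(0.975) * sd_slope
round(slope_range, 2)
# -2.96  0.87
```

This range is broadly in line with the no-pooling histogram, where most state slopes fall between -2.5 and 0, and it shows that a few states may have a slope near zero or slightly positive.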

Multilevel model #2

The second model analyzes the influence of FEI on obesity while adding physical inactivity. The model below states the following: across the 51 states, the average percentage of obese adults is 20.96 when both predictors equal zero (the intercept). For every one-unit increase in FEI, the percentage of obese adults decreases by 0.50 points. For every one-unit increase in physical inactivity, the percentage of obese adults increases by 0.48 points.

m3_lme <- lme(Obese_Adults ~ FEI + PhysInactive, data = Obesity1, random = ~ FEI|State, method = "ML")
summary(m3_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Obesity1 
##        AIC      BIC    logLik
##   15040.26 15082.63 -7513.132
## 
## Random effects:
##  Formula: ~FEI | State
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 5.6893846 (Intr)
## FEI         0.7358319 -0.943
## Residual    2.5465217       
## 
## Fixed effects: Obese_Adults ~ FEI + PhysInactive 
##                  Value Std.Error   DF  t-value p-value
## (Intercept)  20.962327 1.0688756 3088 19.61157   0e+00
## FEI          -0.502515 0.1263520 3088 -3.97710   1e-04
## PhysInactive  0.477427 0.0137569 3088 34.70453   0e+00
##  Correlation: 
##              (Intr) FEI   
## FEI          -0.908       
## PhysInactive -0.438  0.125
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.78419730 -0.67517639 -0.03039451  0.63762994  3.79474034 
## 
## Number of Observations: 3141
## Number of Groups: 51
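
To make the fixed effects concrete, the sketch below computes the predicted percentage of obese adults for a county in a typical state (random effects set to zero). The inputs FEI = 7 and PhysInactive = 25 are hypothetical values chosen for illustration, not figures taken from the data.

```r
# Fixed effects copied from the summary(m3_lme) output.
b <- c(intercept = 20.962327, FEI = -0.502515, PhysInactive = 0.477427)

# Hypothetical county (illustration only).
county <- c(FEI = 7, PhysInactive = 25)

pred <- b[["intercept"]] +
  b[["FEI"]] * county[["FEI"]] +
  b[["PhysInactive"]] * county[["PhysInactive"]]
round(pred, 2)
# 29.38
```

Raising FEI by one unit while holding inactivity fixed lowers this prediction by about 0.50 points, which is the fixed FEI effect reported above.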

Interaction model

The interaction model is used to test whether the effect of FEI on obesity depends on physical inactivity across the 51 states. The fixed effects state that the average percentage of obese adults is 22.87 when both predictors equal zero (the intercept). For every one-unit increase in FEI, the percentage of obese adults decreases by 0.77 points when physical inactivity is zero. For every one-unit increase in physical inactivity, obesity increases by 0.41 points when FEI is zero. The interaction between FEI and physical inactivity adds only 0.0099 per unit and is not statistically significant (p = 0.38).

As a result, this model suggests that obesity decreases as FEI increases and increases with physical inactivity, while the interaction between the two is not significant.

m4_lme <- lme(Obese_Adults ~ FEI*PhysInactive, data = Obesity1, random = ~ FEI|State, method = "ML")
summary(m4_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Obesity1 
##        AIC      BIC    logLik
##   15041.53 15089.95 -7512.764
## 
## Random effects:
##  Formula: ~FEI | State
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 5.8325664 (Intr)
## FEI         0.7521724 -0.946
## Residual    2.5457027       
## 
## Fixed effects: Obese_Adults ~ FEI * PhysInactive 
##                      Value Std.Error   DF   t-value p-value
## (Intercept)      22.872922 2.4273174 3087  9.423128  0.0000
## FEI              -0.768414 0.3289389 3087 -2.336038  0.0196
## PhysInactive      0.406696 0.0815667 3087  4.986052  0.0000
## FEI:PhysInactive  0.009904 0.0112797 3087  0.877998  0.3800
##  Correlation: 
##                  (Intr) FEI    PhysIn
## FEI              -0.983              
## PhysInactive     -0.914  0.915       
## FEI:PhysInactive  0.894 -0.920 -0.986
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.76940020 -0.67419075 -0.03025177  0.64037652  3.70728637 
## 
## Number of Observations: 3141
## Number of Groups: 51
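
With an interaction term, the FEI coefficient alone is the slope only when PhysInactive is zero. The sketch below uses the fixed effects printed above and three hypothetical inactivity levels (15, 25 and 35, chosen for illustration) to show how the implied FEI slope changes.

```r
# Fixed effects copied from the summary(m4_lme) output.
b_fei <- -0.768414
b_int <-  0.009904   # FEI:PhysInactive

# Hypothetical levels of physical inactivity (illustration only).
inactivity <- c(15, 25, 35)

# Implied slope of FEI at each level: b_fei + b_int * PhysInactive.
fei_slope <- b_fei + b_int * inactivity
round(fei_slope, 3)
# -0.620 -0.521 -0.422
```

Because the interaction is small and not significant (p = 0.38), these slopes stay close to the -0.50 estimated by the second model.
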
library(texreg)
htmlreg(list(m2_lme,m3_lme, m4_lme))

Statistical models
                   Model 1     Model 2     Model 3
(Intercept)        37.14***    20.96***    22.87***
                   (1.30)      (1.07)      (2.43)
FEI                -1.05***    -0.50***    -0.77*
                   (0.16)      (0.13)      (0.33)
PhysInactive                   0.48***     0.41***
                               (0.01)      (0.08)
FEI:PhysInactive                           0.01
                                           (0.01)
AIC                16038.96    15040.26    15041.53
BIC                16075.27    15082.63    15089.95
Log Likelihood     -8013.48    -7513.13    -7512.76
Num. obs.          3141        3141        3141
Num. groups        51          51          51
*** p < 0.001, ** p < 0.01, * p < 0.05

Now that the multilevel regressions have been run, evaluating them beside each other gives a better indication of which regression best fits our analysis. Lower AIC and BIC values indicate a better fit. From the table above, the second model, which adds physical inactivity to FEI, has the lowest AIC (15040.26) and BIC (15082.63) and therefore fits best. The first model, which uses FEI alone, has the highest values, and adding the interaction in the third model does not improve on the second.
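
The AIC and BIC values in the table can be reproduced from the log-likelihoods, which makes the comparison rule explicit: both criteria add a penalty for the number of estimated parameters, and lower values are preferred. The parameter counts below are a tally made for this sketch (fixed effects, plus three random-effect variance and correlation parameters, plus the residual variance); they are not printed in the model output.

```r
# Log-likelihoods copied from the three model summaries.
loglik <- c(model1 = -8013.48, model2 = -7513.132, model3 = -7512.764)

# Assumed parameter counts: fixed effects + 3 random-effect terms + residual.
k <- c(model1 = 2 + 3 + 1, model2 = 3 + 3 + 1, model3 = 4 + 3 + 1)
n <- 3141  # number of observations

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k
round(rbind(AIC = aic, BIC = bic), 2)

# Likelihood-ratio test of the interaction (model3 against model2, 1 extra parameter).
lr_stat <- 2 * (loglik[["model3"]] - loglik[["model2"]])
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
round(c(statistic = lr_stat, p_value = lr_p), 3)
```

The recomputed criteria match the table to rounding, and the likelihood-ratio statistic of 0.736 on one degree of freedom agrees with the non-significant interaction coefficient.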

merTools Fixed Effect

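library(merTools)
# Obesity2 is not defined above; it is assumed to hold the same county records as Obesity1.
m1 <- lme4::lmer(Obese_Adults ~  FEI + PhysInactive + (1|State), data=Obesity2)
fastdisp(m1)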
## lme4::lmer(formula = Obese_Adults ~ FEI + PhysInactive + (1 | 
##     State), data = Obesity2)
##              coef.est coef.se
## (Intercept)  20.50     0.62  
## FEI          -0.47     0.05  
## PhysInactive  0.49     0.01  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  State    (Intercept) 2.02    
##  Residual             2.64    
## ---
## number of obs: 3141, groups: State, 51
## AIC = 15200.6

Through the merTools model we can see that FEI and physical inactivity both have clear effects on obesity, in opposite directions. The standard deviation of the state intercepts is approximately 2.02 percentage points, compared with a residual standard deviation of 2.64.
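
The two standard deviations in the fastdisp output can be combined into an intraclass correlation, the share of the unexplained variance that lies between states. This is a minimal sketch using the rounded values printed above, so the result is approximate.

```r
# Standard deviations copied from the fastdisp(m1) output.
sd_state    <- 2.02  # State (Intercept)
sd_residual <- 2.64  # Residual

# Intraclass correlation: between-state variance over total variance.
icc <- sd_state^2 / (sd_state^2 + sd_residual^2)
round(icc, 2)
# 0.37
```

Roughly 37% of the variation that remains after accounting for FEI and physical inactivity is attributable to differences between states.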

# Simulate the fixed effects 1000 times and summarize each term (mean, median, sd).
feEx <- FEsim(m1, 1000)
cbind(feEx[,1] , round(feEx[, 2:4], 3))

A plot makes the difference between the FEI and physical inactivity coefficients visible. The plot below shows the simulated fixed effect of each coefficient on the percentage of obese adults, with an approximate 95% interval. We can see that the two variables are related to the percentage of obesity in opposite directions: FEI negatively and physical inactivity positively.

library(ggplot2)
ggplot(feEx[feEx$term!= "(Intercept)", ]) + 
  aes(x = term, ymin = median - 1.96 * sd, 
      ymax = median + 1.96 * sd, y = median) + 
  geom_pointrange() + 
  geom_hline(yintercept = 0, size = I(1.1), color = I("red")) + 
  coord_flip() + 
  theme_bw() + labs(title = "Coefficient Plot of Obesity Model", 
                    x = "Term", y = "Median Effect Estimate")
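
The interval drawn for each term is the simulated median plus or minus 1.96 simulated standard deviations. As a rough check, the sketch below applies the same formula to the rounded estimates and standard errors from the fastdisp output. These are stand-ins for the simulated median and sd, which are not printed in this document, so the numbers are only approximate.

```r
# Rounded estimates and standard errors from fastdisp(m1), used as stand-ins.
est <- c(FEI = -0.47, PhysInactive = 0.49)
se  <- c(FEI =  0.05, PhysInactive = 0.01)

# Same formula as the plot's ymin and ymax.
intervals <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
round(intervals, 3)

# TRUE for a term whose interval lies entirely on one side of zero.
excludes_zero <- intervals[, "lower"] > 0 | intervals[, "upper"] < 0
excludes_zero
```

Both intervals lie entirely on one side of the red zero line, FEI below it and physical inactivity above it.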

merTools Random Effect

After evaluating the fixed effects that FEI and physical inactivity have on the percentage of obese adults, we would like to observe the random effects, which take into consideration the differences between states. First, we have created a table that reports the simulated mean, median, and standard deviation of the random intercept for each individual state.

reEx <- REsim(m1)
head(reEx)
table(reEx$term)
## 
## (Intercept) 
##          51
table(reEx$groupFctr)
## 
## State 
##    51

Since there are 51 states, the simulation produces numerous estimates, so plotREsim was used to display them together. From this plot we can establish that most state effects are above 0, but there are outliers with both high and low means that need to be noted.

p1 <- plotREsim(reEx)
p1
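
The states that stand out in such a plot are those whose simulated interval does not overlap zero. The sketch below mimics that check on a small hypothetical data frame with the same columns REsim returns (groupFctr, groupID, term, mean, median, sd); the three states and their values are invented purely for illustration, and the normal-approximation interval is an assumption of this sketch.

```r
# Hypothetical rows shaped like REsim() output; all values are invented.
re_toy <- data.frame(
  groupFctr = "State",
  groupID   = c("State A", "State B", "State C"),
  term      = "(Intercept)",
  mean      = c(2.10, -0.30, -1.80),
  median    = c(2.08, -0.28, -1.82),
  sd        = c(0.40,  0.50,  0.45)
)

# 95% interval around the simulated median (normal approximation).
z <- qnorm(0.975)
re_toy$lower <- re_toy$median - z * re_toy$sd
re_toy$upper <- re_toy$median + z * re_toy$sd

# A state stands out when its interval excludes zero.
re_toy$excludes_zero <- re_toy$lower > 0 | re_toy$upper < 0
re_toy[, c("groupID", "lower", "upper", "excludes_zero")]
```

Applied to the real reEx table, the same three lines would list the states whose obesity level sits reliably above or below the overall average after accounting for FEI and physical inactivity.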

Conclusion

Random-effect analyses are a great help in determining the relationship between multiple variables that have individual and ecological effects on the dependent variable. In my analysis, I have found that partial pooling is the best fit for analyzing how FEI influences obesity.