INTRODUCTION

Obesity has been a known health concern for years, but its prevalence among Americans continues to increase. Between 2015 and 2016 obesity rates increased 5.9 percent, and this is a trend that is continuing. The true danger of obesity lies in its close association with several leading causes of death among Americans, including diabetes, heart disease, stroke, and some types of cancer. It is a complicated issue with many contributing factors, including genetics and behaviors such as inactivity, nutrition, and medication use.

The Centers for Disease Control and Prevention (CDC) identifies two ways that adults can combat obesity and maintain a healthy weight: eating healthily and getting regular physical activity. I was interested in seeing whether states where fewer adults practice these two habits have higher obesity rates. To do this, I considered three variables for each state:

  1. Rate of Obesity in Adults
    • Obese adults are classified as those over 18 years of age with a body mass index of 30 or greater
  2. Rate of Physical Inactivity in Adults
    • Inactive adults are those over 18 who did not engage in physical activity or exercise in the 30 days prior to being surveyed (other than for their regular job)
  3. Percent of Adults Who Eat Fruit Less Than Once Daily
    • The percent of adults in each state who eat fruit less than one time a day is the lens through which this report looks at the idea of “eating healthy”. It is important to note that many other variables play a part in healthy eating, such as carbohydrate intake, portion size, and the amount of fat consumed, but this report focuses on only one indicator of a healthy diet.

HYPOTHESIS:

States with high rates of adult obesity will also have higher rates of physical inactivity in adults and a higher percentage of adults who eat fruit less than once daily.

PROCEDURE:

This report was created using RStudio and published using R Markdown. The following packages were needed: sf, tigris, maps, tidyverse, here, ggplot2 and ggthemes.

To begin my analysis, I downloaded a shapefile of the United States from GitHub and loaded it in RStudio. This file contains the geographic information about the location of the states that is needed to map the data used in this analysis. For aesthetic purposes, I limited the data to the 48 contiguous U.S. states.

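library(sf)         # st_read() and geom_sf() support
library(tidyverse)  # dplyr, ggplot2, and the %>% pipe
library(albersusa)
usa_sf()            # result is not stored; the shapefile below is what gets mapped

# Read the state boundaries from the shapefile
usa <- st_read("usa/cb_2017_us_state_20m.shp")

# Drop everything outside the 48 contiguous states
usa_48 <- usa %>%
  filter(!(NAME %in% c("Alaska", "District of Columbia", "Hawaii", "Puerto Rico")))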

Then, I transferred the data on adult obesity rates, physical inactivity rates, and fruit consumption from the webpages linked in the introduction into three separate Excel sheets, entering the state name in the first column and the corresponding data point in the second column. I then saved these sheets as CSV files. Because manually entering data is prone to error, I checked these sheets extensively to ensure they were accurate. Once I had done this, I loaded the CSV files in RStudio.
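Part of the accuracy check described above can be automated. The following is a minimal sketch, not the checks actually used for this report. It assumes each sheet has a NAME column and one numeric rate column; the helper check_state_data() and the example values are hypothetical and made up purely for illustration.

```r
# Hypothetical helper: returns a character vector describing problems found
# in a hand-entered state table (empty if none were found).
check_state_data <- function(df, value_col, expected_states) {
  problems <- character(0)

  # Every expected state should be present, and none should repeat.
  missing_states <- setdiff(expected_states, df$NAME)
  if (length(missing_states) > 0) {
    problems <- c(problems, paste("Missing:", paste(missing_states, collapse = ", ")))
  }
  duplicated_states <- unique(df$NAME[duplicated(df$NAME)])
  if (length(duplicated_states) > 0) {
    problems <- c(problems, paste("Duplicated:", paste(duplicated_states, collapse = ", ")))
  }

  # Rates are percentages, so they should be numeric and between 0 and 100.
  values <- df[[value_col]]
  if (!is.numeric(values)) {
    problems <- c(problems, paste(value_col, "is not numeric"))
  } else if (any(is.na(values) | values < 0 | values > 100)) {
    problems <- c(problems, paste(value_col, "has missing or out-of-range values"))
  }

  problems
}

# Made-up example data (not real rates): one duplicated state and one
# value with a misplaced decimal point.
example <- data.frame(
  NAME = c("State A", "State B", "State B"),
  Obesity.Rate = c(30.1, 28.4, 284),
  stringsAsFactors = FALSE
)
issues <- check_state_data(example, "Obesity.Rate", c("State A", "State B", "State C"))
print(issues)
```

In practice, expected_states could be taken from the NAME column of the shapefile, so that the same check also catches spelling differences between the sheets and the map data.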

ObesityRate <- read.csv("~/R/Obesity/Obesity Rate.csv", stringsAsFactors=FALSE)

PhysicalInactivityRate <- read.csv("~/R/Obesity/Physical Inactivity Rate.csv", stringsAsFactors=FALSE)

FruitConsumption <- read.csv("~/R/Obesity/AdultFruitConsumption.csv", stringsAsFactors=FALSE)

I then merged the data that contained each state and its obesity rate with the data that contained geographical information about each state (from GitHub). I repeated this for the data that contained the physical inactivity rate per state, and the percent of adults that eat fruit less than one time a day.

USAobesity <- merge(ObesityRate, usa_48, by="NAME")

USAInactivity <- merge(PhysicalInactivityRate, usa_48, by="NAME")

USAFruitConsumption <- merge(FruitConsumption, usa_48, by="NAME")
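Because merge() performs an inner join by default, a state whose name is spelled differently in the two tables is dropped without any warning. The sketch below shows one way to check for this using setdiff(). The tables and values are made up for illustration and are not the report's data.

```r
# Made-up example tables (hypothetical names and values, for illustration only).
rates <- data.frame(
  NAME = c("State A", "State B", "state c"),  # note the inconsistent spelling
  Obesity.Rate = c(30.1, 28.4, 25.0),
  stringsAsFactors = FALSE
)
shapes <- data.frame(
  NAME = c("State A", "State B", "State C"),
  stringsAsFactors = FALSE
)

# merge() keeps only names found in both tables, so "state c" is dropped silently.
merged <- merge(rates, shapes, by = "NAME")

# Names present in one table but not the other.
unmatched_in_rates <- setdiff(rates$NAME, shapes$NAME)
unmatched_in_shapes <- setdiff(shapes$NAME, rates$NAME)

print(nrow(merged))
print(unmatched_in_rates)
print(unmatched_in_shapes)
```

If both setdiff() results are empty, every state in the data sheet was matched to a shape and nothing was lost in the merge.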

In addition, I created a theme used on the maps in this analysis:

theme1 <- theme(
  panel.grid.major = element_line(colour = 'transparent'),
  axis.title.x = element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank(),
  axis.title.y = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks.y = element_blank(),
  panel.background = element_blank(),
  panel.border = element_blank(),
  panel.grid.minor = element_blank(),
  plot.background = element_blank(),
  plot.title = element_text(hjust = 0.5)
)

ANALYSIS OF THE RELATIONSHIP BETWEEN PHYSICAL INACTIVITY AND OBESITY

I created visualizations to show which states had the highest rates of adult obesity and which had the highest rates of inactivity. On both maps, lighter shading indicates a higher rate. If my hypothesis is correct, states that are lightly shaded in the first map will also be lightly shaded in the second.

USAobesity %>%
  ggplot() +
  geom_sf(aes(fill = Obesity.Rate)) +
  scale_fill_viridis_c("Obesity Rate", option = "magma") -> plot1

plot1 + theme1

USAInactivity %>%
  ggplot() +
  geom_sf(aes(fill = Physical.Inactivity.Rate)) +
  scale_fill_viridis_c("Physical Inactivity Rate", option = "magma") -> plot2

plot2 + theme1

Based on this initial visualization, it does appear that states with high rates of obesity also have high rates of inactivity. The heaviest concentration of states with high rates of both variables seems to fall in the South. To further investigate, I created bar charts to compare the ten states with the highest rates of obesity, and the ten states with the highest rates of physical inactivity:

USAobesity[order(USAobesity$Obesity.Rate),] -> USAobesity2

USAobesityTop10 <- USAobesity2 %>% tail(10)

ggplot(USAobesityTop10, aes(x=reorder(NAME,-Obesity.Rate), y=Obesity.Rate)) +
  geom_bar(stat="identity", fill="orange") + coord_flip()  -> ObeseStatesTop10
 
ObeseStatesTop10 + theme_minimal() +xlab("State") + ylab("Obesity Rate (%)") +
  ggtitle("The 10 States with the Highest Rates of Adult Obesity")

USAInactivity[order(USAInactivity$Physical.Inactivity.Rate),] -> USAInactivity2

USAInactivityTop10 <- USAInactivity2 %>% tail(10)

ggplot(USAInactivityTop10, aes(x=reorder(NAME,-Physical.Inactivity.Rate), y=Physical.Inactivity.Rate)) +
  geom_bar(stat="identity", fill="#FFFF66") + coord_flip()  -> InactiveStatesTop10

InactiveStatesTop10 + theme_minimal() +xlab("State") + ylab("% of Inactive Adults") +
  ggtitle("10 States with the Highest Physical Inactivity Rates")
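The two top-ten lists can also be compared programmatically rather than by reading the charts. The following is a minimal sketch using intersect(). The helper top_n_names() is hypothetical, and the twelve example states and their values are made up for illustration; they are not the report's data.

```r
# Made-up example data: 12 hypothetical states with illustrative values only.
states <- data.frame(
  NAME = paste("State", LETTERS[1:12]),
  Obesity.Rate = c(38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27),
  Physical.Inactivity.Rate = c(20, 33, 32, 31, 30, 29, 28, 27, 26, 25, 35, 34),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n rows with the largest values in value_col.
top_n_names <- function(df, value_col, n = 10) {
  sorted <- df[order(df[[value_col]], decreasing = TRUE), ]
  head(sorted$NAME, n)
}

top_obesity <- top_n_names(states, "Obesity.Rate")
top_inactivity <- top_n_names(states, "Physical.Inactivity.Rate")

# States that appear in both top-ten lists.
both <- intersect(top_obesity, top_inactivity)
print(both)
print(length(both))
```

Applied to the report's own tables, the same intersect() call on the NAME columns of USAobesityTop10 and USAInactivityTop10 would list the shared states directly.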

There were 7 states that appeared on both lists, meaning they ranked among the ten highest in the country for both obesity and physical inactivity. These were:

Given the high crossover between the two lists, it is plausible that the rate of physical inactivity is related to the rate of obesity. To visualize the relationship between the two in a more concrete way, I created a scatter plot and ran a correlation test to quantify the relationship.

USAInfo <-  merge(USAInactivity, USAobesity, by="NAME")

ggplot(USAInfo, aes(Obesity.Rate, Physical.Inactivity.Rate)) +
  geom_point(color="#CC0099") -> scatterplot

scatterplot +
  xlab("Obesity Rate (%)") +
  ylab("Physical Inactivity Rate (%)") +
  ggtitle("Is Physical Inactivity Related to Obesity?") +
  theme_minimal()

cor.test(USAInactivity$Physical.Inactivity.Rate, USAobesity$Obesity.Rate)

The result of the correlation test was: 0.7104287. This indicates a moderately strong positive correlation between the rate of physical inactivity and the rate of obesity in states. I was expecting the correlation to be stronger, but I still feel that this supports the first part of my hypothesis.
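A correlation test pairs values by row position, so the two columns must list the states in the same order. merge() sorts its result by the key column by default, which keeps separately merged tables aligned, but running the test on a single merged table makes the pairing explicit. The sketch below illustrates this and shows how to pull the coefficient, confidence interval, and p-value out of the result. The values are made up for illustration and are not the report's data.

```r
# Made-up example data (hypothetical values), deliberately stored in
# different row orders.
obesity <- data.frame(
  NAME = c("State A", "State B", "State C", "State D", "State E"),
  Obesity.Rate = c(25, 28, 31, 34, 37),
  stringsAsFactors = FALSE
)
inactivity <- data.frame(
  NAME = c("State E", "State C", "State A", "State D", "State B"),
  Physical.Inactivity.Rate = c(32, 27, 20, 29, 24),
  stringsAsFactors = FALSE
)

# Merging by NAME puts each state's two values on the same row.
paired <- merge(obesity, inactivity, by = "NAME")

# Pearson correlation test on the paired columns.
result <- cor.test(paired$Physical.Inactivity.Rate, paired$Obesity.Rate)

r <- unname(result$estimate)   # correlation coefficient
ci <- result$conf.int          # 95% confidence interval for the coefficient
p <- result$p.value            # p-value for the null hypothesis of no correlation

print(r)
print(ci)
print(p)
```

Reporting the confidence interval alongside the coefficient gives a sense of how precisely a sample of about 48 states can pin the correlation down.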

ANALYSIS OF THE RELATIONSHIP BETWEEN HEALTHY EATING AND OBESITY

Moving to the next component of my analysis, I created an additional visualization to show which states had the highest percent of adults eating fruit less than once daily. If the second part of my hypothesis is correct, the lightly shaded states on the first map should also be lightly shaded on this map.

USAFruitConsumption %>%
  ggplot() +
  geom_sf(aes(fill = X..of.Adults)) +
  scale_fill_viridis_c("% of Adults", option = "magma") -> plot3

plot3 + theme1

Again, this initial visualization does suggest that states with high rates of obesity also have a high percentage of adults who eat fruit less than one time a day. As before, I created bar charts to compare the ten states with the highest rates of obesity and the ten states with the highest percent of adults eating fruit less than once daily.

USAFruitConsumption[order(USAFruitConsumption$X..of.Adults),] -> USAFruitConsumption2

USAFruitTop10 <- USAFruitConsumption2 %>% tail(10)

ggplot(USAFruitTop10, aes(x=reorder(NAME,-X..of.Adults), y=X..of.Adults)) +
  geom_bar(stat="identity", fill="#FFFF66") + coord_flip()  -> FruitStatesTop10

FruitStatesTop10 + theme_minimal() +xlab("State") + ylab("% of Adults") +
  ggtitle("10 States with the Lowest Fruit Consumption")

There were 8 states that appeared on both lists, meaning they ranked among the ten highest in the country for both the rate of obesity and the percent of adults who consume fruit less than once daily. These were:

There was more overlap in states that appeared in the top 10 for both of these variables than for obesity and physical inactivity, which was surprising to me. This led me to predict that the correlation between the rate of obesity and the percent of adults who consume fruit less than one time a day would be stronger than the correlation between obesity and physical inactivity. This is not a prediction I would have made before this analysis. To visualize the relationship between the two in a more concrete way, I again created a scatter plot and ran a correlation test to quantify the relationship.

cor.test(USAFruitConsumption$X..of.Adults, USAobesity$Obesity.Rate)

USAInfo2 <- merge(USAobesity, USAFruitConsumption, by="NAME")

ggplot(USAInfo2, aes(Obesity.Rate, X..of.Adults)) +
  geom_point(color="#CC0099") -> scatterplot2

scatterplot2 +
  xlab("Obesity Rate (%)") +
  ylab("% of Adults Consuming Fruit <1 Time Daily") +
  ggtitle("Is Fruit Consumption Related to Obesity?") +
  theme_minimal()

The result of the correlation test was: 0.7798399. This suggests a moderately strong positive correlation between the percent of adults eating fruit less than one time a day and the rate of obesity. The correlation was indeed stronger than that between the rate of inactivity and the rate of obesity, although not by much (around 0.07).

CONCLUSION

My hypothesis was supported: states with high rates of adult obesity also have higher rates of physical inactivity in adults and a higher percentage of adults who eat fruit less than once daily. This speaks to the importance of addressing ways to increase the physical activity of adults and the nutrition of their diets in order to combat the obesity epidemic. Low fruit consumption had a stronger correlation with rates of obesity than physical inactivity did, which leads me to wonder if diet has more of an impact on obesity than physical activity.

Some other interesting insights emerged from this analysis. One trend, evident in the map visualizations, was that the states that appeared in the top 10 for all three variables are all classified as being in the South. These states were:

It would be interesting to investigate this further and look into factors that influence obesity that may be particularly prevalent in the South. Does the South have a higher concentration of fast food restaurants? Does the South have fewer walkable cities, which could contribute to the lack of physical activity? These would make interesting topics for further analysis. It would also be interesting to see if the prevalence of diseases linked to obesity, such as diabetes and heart disease, is higher in these southern states as well.