Obesity Rates Compared to Fast Food Locations Across the USA

Introduction

Obesity is a growing problem in America. Over the past three decades, there has been what doctors call a ‘startling’ increase in obesity rates in the US. Obesity contributes to numerous health conditions, such as high blood pressure, stroke, type 2 diabetes, and heart disease. Over the past 33 years, worldwide obesity rates among adults have increased by 27.5%, and the US had the largest increase in the prevalence of adult obesity. Even worse, a third of the US population is considered obese today. Researchers, doctors, and scientists are trying hard to find a solution to this problem. Variables such as age, ethnicity, and genetics affect BMI. In this project, I investigated one other variable in particular: the number of fast food locations in each state. I am curious to see whether the states with the most fast food restaurants also have the highest statewide obesity rates.

Hypothesis

My hypothesis is that there is a positive correlation between a state's obesity rate and its number of fast food restaurant locations: the most obese states in the US should have the most fast food restaurants, and the least obese states should have the fewest.

Methodology

A list of required packages for this project includes: sf, mapview, rvest, tidyverse, ggplot2, dplyr, and readxl.

library(sf)

## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3

library(mapview)
library(rvest)

## Loading required package: xml2

library(tidyverse)

## ── Attaching packages ────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0

## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()         masks stats::filter()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
## ✖ purrr::pluck()          masks rvest::pluck()

library(ggplot2)
library(dplyr)
library(readxl)

To look at obesity rates, I downloaded a geojson file called “National Obesity By State” from this website. The data are current as of 2018. A geojson file works well with mapview, so I used the mapview() function to create an interactive map: users can hover over each state to see its obesity rate. The more yellow-green a state is, the higher its obesity percentage; the more blue-purple a state is, the lower its obesity percentage.

mapview(obesity, zcol="Obesity")

From this geojson file, I gathered the top 5 most obese and least obese states.

Top 5 Most Obese States in the US

State	Obesity Percentage
Louisiana	36.2%
Georgia	35.6%
Alabama	35.6%
West Virginia	35.6%
Kentucky	34.6%

Strangely enough, the middle three states were tied for second for the highest obesity level in the country.

Top 5 Least Obese States in the US

State	Obesity Percentage
Colorado	20.2%
Montana	23.3%
Massachusetts	24.7%
Utah	24.8%
Vermont	25.1%

Here the data has more variability.
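The two rankings above could be produced directly in R rather than read off the map. The following is only a sketch: obesity_tbl is a hypothetical stand-in for the attribute table of the geojson file, the column names NAME and Obesity are assumptions, and the rates are a handful of values quoted in this report.

```r
library(dplyr)

# Hypothetical stand-in for the obesity data (rates are quoted in this report).
obesity_tbl <- data.frame(
  NAME = c("Louisiana", "Georgia", "Kentucky", "Ohio",
           "New Jersey", "Vermont", "Utah", "Colorado"),
  Obesity = c(36.2, 35.6, 34.6, 29.8, 25.6, 25.1, 24.8, 20.2),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest rate and keep the first three rows.
most_obese <- obesity_tbl %>%
  arrange(desc(Obesity)) %>%
  head(3)

# Sort from lowest to highest rate and keep the first three rows.
least_obese <- obesity_tbl %>%
  arrange(Obesity) %>%
  head(3)

print(most_obese)
print(least_obese)
```

With the real sf object, the geometry column would probably need to be dropped first (for example with sf::st_drop_geometry()) so that only the attribute table is sorted.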

The next file was a CSV from Kaggle with the locations of over 10,000 fast food restaurants across the US. The dataset was last updated in 2015. I named the dataset fastFood. Because only US locations were of interest, I first filtered fastFood to keep only the rows whose country is “US”. I then piped the result into group_by() to group the rows by ‘province’ (the dataset's name for state) and used add_tally() to count the locations per state. add_tally() adds a column titled ‘n’ that gives, on every row, the number of locations in that row's state. The resulting fastFood2 variable is the one I used for the rest of my analysis.

fastFood %>% 
  filter(country %in% "US") -> fastFood

fastFood %>% 
  group_by(province) %>% 
  add_tally() -> fastFood2
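To make the behavior of add_tally() concrete, here is a small sketch on made-up data. The toy data frame and its values are assumptions; only the province column name follows the code above. It also shows count(), which may be handier when a one-row-per-state table is needed.

```r
library(dplyr)

# Toy stand-in for fastFood: five restaurants in two states (made-up rows).
toy <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  province = c("KY", "KY", "OH", "KY", "OH"),
  stringsAsFactors = FALSE
)

# add_tally() keeps every restaurant row and appends the group size as n.
with_n <- toy %>%
  group_by(province) %>%
  add_tally() %>%
  ungroup()

# count() collapses the data to one row per state.
per_state <- toy %>%
  count(province)

print(with_n)
print(per_state)
```

Keeping one row per restaurant is what the point map needs, since each point is still a location; the per-state table is the natural input for a population comparison.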

Next, I thought it would be interesting to plot the fast food restaurants on a map. I read in the usa dataset and created the usa_48 variable that is familiar to the class. This variable removes Alaska, the District of Columbia, Hawaii, and Puerto Rico to create a cleaner map. With this outline map of the USA in place, I used ggplot to overlay the restaurant locations on it. The points come from the fastFood2 data, and coord_sf() sets the longitude and latitude range shown in the final plot. I also included two themes: the first styles the title of my graph, and the second is a map background theme that keeps the background of my plots clean and consistent.

## Reading layer `cb_2017_us_state_20m' from data source `/Users/Mschultz/COM 329*R/geospatial/USA/cb_2017_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

usa_48 <- usa %>% 
  filter(!(NAME %in% c("Alaska", "District of Columbia", "Hawaii", "Puerto Rico")))

myplottheme <- theme(plot.title = element_text(family = "Helvetica", face = "bold", size = 15))

background_theme <- theme(panel.background = element_blank(),
                     axis.text = element_blank(),
                     axis.title = element_blank(),
                     axis.ticks = element_blank(),
                     axis.line = element_blank())
ggplot() +
  geom_sf(data=usa_48) +
  # color is a fixed value, so it is set outside aes(); inside aes() it would be treated as data
  geom_point(data=fastFood2, aes(longitude, latitude, alpha = n), color = "red", show.legend = FALSE) +
  coord_sf(xlim = c(-130,-60), ylim = c(20,50)) +
  ggtitle("Fast Food Restaurant Locations Across the USA") + 
  myplottheme +
  theme(plot.title = element_text(hjust = 0.5)) +
  background_theme

As expected, there is a pattern: the higher a state's population, the more fast food restaurants it has. However, this does not tell us much about obesity rates. To compare the obesity level in each state with its number of restaurant locations, I needed to standardize the restaurant counts by population. To do this, I imported a Wikipedia table listing each state and its population into R using the read_html() function. The html_nodes() function then allowed me to extract the tables from the Wikipedia page, and I kept only the population table.

# all <table> nodes on the page
tbls <- html_nodes(wikipedia, "table")

# keep only the first table, which holds the population figures
pop_tbls <- wikipedia %>% 
  html_nodes("table") %>% 
  .[1] %>% 
  html_table(fill=TRUE)

popByState <- as.data.frame(pop_tbls)

#this is a copy of the popByState dataset
popByState2 <- popByState
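Population figures scraped from a web table may come through as text with thousands separators, in which case they need to be converted before any division. The sketch below assumes that situation; pop_raw and its column names State and Population are hypothetical, and the three values are the populations quoted in this report.

```r
# Hypothetical rows shaped like a scraped population table.
pop_raw <- data.frame(
  State = c("Alabama", "Arizona", "Arkansas"),
  Population = c("4,887,871", "7,171,646", "3,013,825"),
  stringsAsFactors = FALSE
)

# Strip the thousands separators, then convert the text to numbers.
pop_clean <- pop_raw
pop_clean$Population <- as.numeric(gsub(",", "", pop_clean$Population, fixed = TRUE))

str(pop_clean)
```

If the column is already numeric after html_table(), this step is unnecessary; checking with str() first is a cheap way to find out.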

Standardizing by population allowed me to compare states of different sizes. To make things easier, I created my own Excel table called propTableGeospatial from my fastFood2 dataset and the population dataset from Wikipedia. Most importantly, I calculated a proportion: the state population divided by the number of fast food locations in that state, that is, the number of people per fast food location. A preview of the table is shown below.

Preview of Excel Table

State	Population	Locations	Proportion
Alabama	4,887,871	236	20,711.32
Arizona	7,171,646	208	34,479.97
Arkansas	3,013,825	151	19,959.12

Once I had read in propTableGeospatial, I was able to create my final map. I used an inner join to combine my new table with the previously created usa_48 table by the column they have in common, the state name (NAME). The resulting variable, merged2, allowed me to create a map showing the calculated proportion for each state. The same title and background themes were added to this ggplot as well.

## New names:
## * `` -> ...5
## * `` -> ...6

propTableGeospatial %>% 
  inner_join(usa_48) -> merged2

## Joining, by = "NAME"

## Warning: Column `NAME` joining character vector and factor, coercing into
## character vector

merged2 %>% 
  ggplot() +
  geom_sf(aes(fill=Proportion)) +
  coord_sf(xlim = c(-130,-60), ylim = c(20,50)) +
  ggtitle("Proportion of Population Compared to \nFast Food Restaurant Locations") +
  myplottheme +
  theme(plot.title = element_text(hjust = 0.5)) +
  background_theme
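An inner join silently drops any row whose state name has no exact match in the other table, so a misspelled name in a hand-built Excel sheet would simply vanish from the map. One possible check, sketched here on toy tables, is to run anti_join() first. The tables ratios and shapes are hypothetical; only the NAME key follows the code above, and the lowercase “ohio” is a deliberate error for the demonstration.

```r
library(dplyr)

# Toy version of the hand-built table, with one deliberately mismatched name.
ratios <- data.frame(
  NAME = c("Alabama", "Kentucky", "ohio"),
  Proportion = c(20711.32, 13459.04, 21527.52),
  stringsAsFactors = FALSE
)

# Toy version of the state table (geometry omitted to keep the sketch small).
shapes <- data.frame(
  NAME = c("Alabama", "Kentucky", "Ohio"),
  stringsAsFactors = FALSE
)

# Rows of ratios with no partner in shapes; inner_join() would drop these.
unmatched <- anti_join(ratios, shapes, by = "NAME")

# The join itself keeps only the rows whose names match exactly.
matched <- inner_join(ratios, shapes, by = "NAME")

print(unmatched)
print(matched)
```

An empty unmatched table would indicate that every state in the spreadsheet found its shape.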

Results

So, what does the proportion actually represent? Proportions are easy to calculate, but their interpretation is not always straightforward. The proportion here is the state population divided by the number of fast food restaurants in that state, in other words the number of people per restaurant. If the proportion were 1, each individual would have their own fast food restaurant, which would be an incredibly large number of fast food locations (and was not the case anywhere). So, the lower the proportion, the more restaurants there are per person. To make the map easier to interpret, below is a preview of the states with the lowest proportions (i.e. the states in dark blue on the map).

Lowest Proportions of Population by State and Number of Fast Food Restaurant Locations

State	Proportion
Kentucky	13,459.04
North Dakota	15,201.54
Ohio	21,527.52
Alabama	20,711.32
Montana	42,492.20
Texas	45,271.05

These results are interesting, yet a bit confusing. Kentucky has the lowest proportion, meaning the most restaurants relative to its population, and it ranks fifth for obesity overall. This result supports my hypothesis that there is a positive correlation between the most obese states in the US and the number of fast food restaurant locations in the state: it makes sense that the state with the most restaurants relative to its population also has a high level of obesity.

Further down the list, North Dakota, Ohio, and Montana are listed as having low proportions; however, they are not in the top 5 or even the top 10 most obese states in the country. I can think of a couple of reasons for this. First, I think the fast food industry has been expanding nationwide, with chains building in more locations to increase their profits. Second, the proportion depends on the population of each state. Compared with other states, Montana has a fairly small population. I divided each state's population by its number of fast food locations precisely to keep population size from driving the result. Even so, because Montana's population is so much smaller than that of most other states, its small population could be why it appears to have a low proportion.

It was surprising to see Ohio on this list as well. Ohio has a fairly large population and a lower obesity rate of 29.8%. This result runs counter to my hypothesis. Ohio combines robust city life with rural areas. Research shows that the Ohio counties with the most fast food restaurants per person are the more rural ones. Fayette County, for example, has 1 fast food restaurant per 1,000 people, a median household income of $41,000, and a poverty rate of 16%, and it is rural in comparison to other counties in the state. For this reason, it is possible that the statewide proportion is skewed.

I also investigated some of the highest proportions.

Highest Proportions of Population by State and Number of Fast Food Restaurant Locations

State	Proportion
New Jersey	58,996.83
California	58,516.34
Massachusetts	52,688.16
Pennsylvania	45,254.63
Florida	45,221.50
Rhode Island	44,054.79

New Jersey has the highest proportion, meaning the fewest fast food restaurants per person. New Jersey has a low obesity rate of 25.6%. California, Massachusetts, Pennsylvania, Florida, and Rhode Island all have obesity rates in the same range. This seems to support the second part of my hypothesis, that the least obese states also have few fast food restaurants. However, there is room for more exploration.
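Comparing two ranked lists by eye only goes so far; a correlation coefficient would put a single number on the relationship. The sketch below shows how that could be computed with base R's cor(). It uses only the six states for which both an obesity rate and a proportion are quoted in this report, and those states were picked from the extremes of each list, so the output is an illustration of the method and not a finding. The data frame name and column names are assumptions.

```r
# Six states with both figures quoted in this report (not a representative sample).
states <- data.frame(
  State = c("Kentucky", "Alabama", "Ohio", "New Jersey", "Massachusetts", "Montana"),
  Obesity = c(34.6, 35.6, 29.8, 25.6, 24.7, 23.3),
  PeoplePerRestaurant = c(13459.04, 20711.32, 21527.52, 58996.83, 52688.16, 42492.20),
  stringsAsFactors = FALSE
)

# Spearman's rho compares rank orders, so it is less sensitive to
# outliers than Pearson's r.
rho <- cor(states$Obesity, states$PeoplePerRestaurant, method = "spearman")

print(rho)
```

Because the proportion counts people per restaurant, the hypothesis (more restaurants per person goes with more obesity) corresponds to a negative rho here. A full analysis would run the same calculation on all states rather than on this hand-picked subset.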

Conclusion/Further Research

The Kentucky example is how I would explain a positive correlation. To reiterate, Kentucky had a very large number of fast food restaurants relative to its population as well as a high obesity percentage. This shows how the two variables could be related. However, correlation should not be confused with causation. The data do not show that Kentucky’s obesity rate is caused by the number of restaurants per person in the state; there are too many other variables that could also affect obesity levels within the state.

This leads me to believe there is room for further research. Many variables affect obesity levels. Age, genetics, income level, poverty level, and ethnicity are all examples of other variables that might affect an individual’s BMI. So, it is hard to draw a conclusion about obesity levels based only on the number of fast food restaurant locations in each state. A deeper analysis needs to take into account the different variables that could potentially affect BMI.

Overall, the obesity map and the proportion map do not line up completely, which contradicts my hypothesis. I thought that a large number of fast food restaurants per person would go along with a higher obesity rate in a state, which was not always the case.