The following data analysis explores the nutritional content of menu items offered at several fast food restaurants. There are eight fast food restaurants in this dataset: McDonald’s, Chick-Fil-A, Sonic, Arby’s, Burger King, Dairy Queen, Subway and Taco Bell. Each restaurant sells different food items, including burgers, sandwiches, burritos, chicken nuggets and salads. For each food item, the dataset records a wide array of nutritional values: calories, fats, cholesterol, sodium, carbs, fiber, sugar, protein, vitamins and calcium. The source for this dataset is Data.World; however, there is no ReadMe file with any information on methodology.
One topic of background research I feel is relevant is income within the fast food industry. McDonald’s alone reported approximately $8.47 billion in 2023. For perspective, Chick-Fil-A generated about $6.4 billion in 2022, and Dairy Queen generated roughly $3.6 billion in 2023.
It is also imperative to understand what each of these nutritional values means. A calorie is a unit of energy. Fats also supply energy, and they help regulate heart circulation and assist in healthy hair and skin growth. Carbohydrates are another energy source and include sugars and starches. Cholesterol contributes to cell growth and hormone production. Sodium supports muscle and nerve function. Fiber aids the digestive system and weight management. Sugars are simple carbohydrates that the body breaks down into glucose for energy. Protein gives structure to the body’s internal tissues and organs. A vitamin is a carbon compound that helps many parts of the body function correctly. Lastly, calcium strengthens bones and teeth. For many of these nutrients, consuming too much or too little can lead to heart, blood and other internal health problems. I also looked up the unit of measurement for each nutritional value (grams, milligrams, etc.), as I was unaware of them. One correlation I will analyze is the relationship between sodium and calories in fast food menu items.
It is essential to understand exactly what we are taking into our bodies, so that we know which nutrients we may lack or need more of for healthy living. I enjoy learning about how much of every nutrient a restaurant’s menu contains. I eat at several of the restaurants in this dataset, especially McDonald’s. After all, McDonald’s is the highest-grossing fast food restaurant in the world, with Chick-Fil-A not far behind. I would like to know which nutrients we are predominantly consuming to evaluate which levels in our bodies are particularly high.
Load the libraries
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.4.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 515 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): restaurant, item, salad
dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
chick_vitamin_total <- chick_fil_a |>
  mutate(chick_vitamin_total = vit_a + vit_c)
chick_vitamin_total # This adds the same column to just our new Chick Fil-A dataset
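The chick_fil_a data frame used above is not created in the code shown. A minimal sketch of how such a subset and the summed vitamin column could be built, assuming a full data frame with restaurant, vit_a and vit_c columns. The name fastfood_toy, the restaurant spelling and all values below are hypothetical placeholders, not taken from the dataset:

```r
library(dplyr)

# Hypothetical toy data standing in for the full dataset (values are made up)
fastfood_toy <- data.frame(
  restaurant = c("Chick Fil-A", "Chick Fil-A", "Mcdonalds"),
  item       = c("Item A", "Item B", "Item C"),
  vit_a      = c(10, NA, 4),
  vit_c      = c(5, 2, 6)
)

# Keep only one restaurant, then add a column that sums the two vitamin columns
chick_toy <- fastfood_toy |>
  filter(restaurant == "Chick Fil-A") |>
  mutate(vitamin_total = vit_a + vit_c)

chick_toy
```

In this sketch the sum is NA whenever either vitamin column is missing, which is one possible reason that rows get dropped when such a column is plotted.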
Create a scatterplot with a linear regression line
I am first going to plot sodium against calorie content for Chick-Fil-A food items only. The size of each dot is determined by the total fat in the item. I use the geom_smooth function to add a dashed regression line, which displays a clear positive trend between the two variables.
ggplot(chick_vitamin_total, aes(x = sodium, y = calories, size = total_fat)) +
  geom_point(alpha = 0.6, color = "red") +
  xlim(200, 3700) +
  ylim(50, 1000) +
  labs(title = "Chick Fil-A Menu Items: Nutritional Content",
       caption = "Source: Data.World",
       x = "Sodium (in milligrams)",
       y = "Calories",
       size = "Total Fat (in grams)") +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE, linetype = "dashed", size = 0.6) +
  theme_economist(base_size = 12) # Change the theme
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_smooth()`).
Let’s add some interactivity with plotly, which adds mouse-over tooltips but also causes us to lose our Total Fat legend.
ggplot(chick_vitamin_total, aes(x = sodium, y = calories, size = total_fat)) +
  geom_point(alpha = 0.6, color = "red") +
  xlim(200, 3700) +
  ylim(50, 1000) +
  labs(title = "Chick Fil-A Menu Items: Nutritional Content",
       caption = "Source: Data.World",
       x = "Sodium (in milligrams)",
       y = "Calories",
       size = "Total Fat (in grams)") +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE, linetype = "dashed", size = 0.6) +
  theme_economist(base_size = 12)
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_smooth()`).
ggplotly()
In this first visualization I measured the sodium and calorie content from every food item that Chick-Fil-A serves. This scatterplot shows a very clear positive correlation between the two variables; when sodium increases in a food item, so does the number of calories. Interestingly enough, because the size of each dot on the plot corresponds to how many grams of fat that item has, we are able to see a second positive correlation - that when sodium and calories increase, so does the amount of fat within that menu item.
fit1 <- lm(calories ~ sodium, data = chick_vitamin_total)
summary(fit1)
Call:
lm(formula = calories ~ sodium, data = chick_vitamin_total)
Residuals:
    Min      1Q  Median      3Q     Max
-136.18  -43.09  -10.61   40.51  154.72
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.14759   26.02441   2.042   0.0518 .
sodium       0.28771    0.01921  14.975 5.45e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 71.21 on 25 degrees of freedom
Multiple R-squared: 0.8997, Adjusted R-squared: 0.8957
F-statistic: 224.3 on 1 and 25 DF, p-value: 5.446e-14
Analysis
The model has the equation: calories = 0.29(sodium) + 53.15. This derives from the formula y = mx + b. The p-value for the model is 0.00000000000005446. This is an extremely small p-value, suggesting that there is a significant linear relationship between sodium and calories in Chick-Fil-A food items. The sodium coefficient is also marked with three asterisks, which indicates that it is significant at the 0.001 level and is a meaningful variable for explaining the linear increase in calories. The slope may be interpreted as follows: for each additional milligram of sodium in a Chick-Fil-A food item, there is a predicted increase of 0.29 calories. The adjusted R-squared value is 0.8957. This means that nearly 90% of the variation in calories is explained by this model, while about 10% is not.
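To make the fitted equation concrete, here is a small worked sketch that plugs a few sodium values into it. The coefficients are copied from the summary output above; the sodium values are hypothetical and chosen only for illustration, and sodium is assumed to be recorded in milligrams.

```r
# Coefficients copied from the summary(fit1) output above
intercept <- 53.14759
slope     <- 0.28771

# Hypothetical sodium values (assumed to be in milligrams), for illustration only
sodium_mg <- c(500, 1000, 2000)

# Predicted calories from y = mx + b
predicted_calories <- intercept + slope * sodium_mg
predicted_calories

# Correlation implied by the multiple R-squared in a one-predictor model
# (taken as positive because the slope is positive)
r <- sqrt(0.8997)
r
```

For example, an item with 1000 milligrams of sodium is predicted to have about 341 calories, and the implied correlation between sodium and calories is about 0.95. If the fit1 object is available, predict(fit1, newdata = data.frame(sodium = sodium_mg)) should give the same predictions up to rounding of the coefficients.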
Now let’s explore data from all eight fast food restaurants within the initial dataset
I am now measuring the total carbs and cholesterol in all food items for all restaurants.
ggplot(vitamin_total, aes(x = total_carb, y = cholesterol, color = restaurant)) +
  geom_point(alpha = 0.1) +
  scale_colour_viridis_d() +
  geom_jitter() +
  labs(title = "Fast Food Restaurant Carb and Cholesterol Content by Menu Item",
       caption = "Source: Data.World",
       x = "Carbs (in grams)",
       y = "Cholesterol (in milligrams)",
       color = "Restaurant") +
  theme_bw()
I am now cleaning up the plot. This includes manually changing the color scheme to colors that appear brighter and are more distinct from one another, as well as changing the theme of the scatterplot to theme_stata, which seems more interesting and appropriate. I also increased the alpha slightly in an attempt to make each colored point stand out more without obscuring neighboring points.
ggplot(vitamin_total3, aes(x = total_carb, y = cholesterol, color = restaurant)) +
  geom_point(alpha = 0.3) +
  scale_color_manual(values = c("red", "blue", "brown", "green", "plum", "#E60", "cyan", "yellow")) +
  geom_jitter() +
  labs(title = "Fast Food Restaurant Carb and Cholesterol Content by Menu Item",
       caption = "Source: Data.World",
       x = "Carbs (in grams)",
       y = "Cholesterol (in milligrams)",
       color = "Restaurant") +
  theme_stata() # Change the theme
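One detail of the two plots above may be worth noting: geom_point() and geom_jitter() are separate layers, so each item is drawn twice, once semi-transparent at its exact position and once fully opaque at a jittered position. A minimal sketch of a single-layer alternative follows. The data frame toy_carbs and its values are hypothetical placeholders, not rows from the dataset, and the jitter amounts are arbitrary.

```r
library(ggplot2)

# Hypothetical toy values standing in for the real menu items
toy_carbs <- data.frame(
  total_carb  = c(20, 45, 50, 110),
  cholesterol = c(30, 85, 120, 60),
  restaurant  = c("Subway", "Sonic", "Burger King", "Taco Bell")
)

# A single jittered layer, so the alpha setting applies to every point drawn
p_jitter <- ggplot(toy_carbs, aes(x = total_carb, y = cholesterol, color = restaurant)) +
  geom_jitter(alpha = 0.3, width = 1, height = 1) +
  labs(x = "Carbs (in grams)",
       y = "Cholesterol (in milligrams)",
       color = "Restaurant")

p_jitter
```

With only one layer, changing alpha visibly changes how much overlapping points obscure one another.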
This second visualization plots every menu item in the entire dataset by its carb and cholesterol content on a scatterplot, color-coded by restaurant. The correlation between these two variables is, overall, much weaker than in the first scatterplot. There is much variety in the amount of carbs in each food item, while the amount of cholesterol is generally concentrated under 200 milligrams. Subway and Taco Bell, in particular, have menu items with a wide range of carbs: some items under 25 grams and other items above 100 grams. Both of those restaurants sell a wider variety of food than other, more specialized fast food places. Other restaurants such as Burger King and Sonic have food items that are much closer to 50 grams of carbs, on average. I wonder if these two restaurants have similar carb content in their food because they are both primarily burger joints. The dots for McDonald’s food items are very interesting, because they are all over the place. Three of the top five highest cholesterol items come from McDonald’s, including the American Brewhouse King which ranks as the number one highest food item.
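A ranking like the one described above can be checked directly from the data. The following is a minimal sketch, assuming columns named restaurant, item and cholesterol as in the column specification shown earlier. The data frame toy_items and every value in it are made-up placeholders, not rows from the dataset.

```r
library(dplyr)

# Hypothetical placeholder rows (not real menu items or values)
toy_items <- data.frame(
  restaurant  = c("A", "A", "B", "B", "C", "C"),
  item        = paste("Item", 1:6),
  cholesterol = c(805, 120, 475, 60, 300, 285)
)

# The five items with the highest cholesterol, highest first
top5 <- toy_items |>
  slice_max(cholesterol, n = 5) |>
  select(restaurant, item, cholesterol)
top5

# How many of those top items belong to each restaurant
top5_by_restaurant <- top5 |>
  count(restaurant, sort = TRUE)
top5_by_restaurant
```

Running the same two steps on the full data frame would show which restaurant each of the top five items belongs to.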
Measuring fat from calories and vitamin totals for each restaurant
Warning: Removed 213 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 144 rows containing missing values or values outside the scale range
(`geom_point()`).
Conclusion
The last visualization looks at calories from fat and the total vitamins column I created from the vitamin A and C columns already given. Each restaurant has its own unique-looking scatterplot. Generally speaking, each restaurant has very few food items that contain more than 500 calories from fat. Once again, McDonald’s has the most variety, this time in both calories from fat and in total vitamins. I am a little surprised that McDonald’s has a handful of food items with such a high vitamin total given that they have the highest cholesterol items, and I do not typically associate those two together. I usually associate cholesterol with unhealthy food and vitamins with healthy food, so I suppose these results speak to the surprisingly wide variety in McDonald’s menu. Subway also has several food items with low to intermediate levels of calories from fat and total vitamins. Despite Chick-Fil-A being known for chicken and Arby’s being known for roast beef, the two restaurants have nearly identical plots: few calories from fat overall and total vitamins under 100 for each item. The Taco Bell plot is the opposite of McDonald’s: just as with their low cholesterol content, their items show few calories from fat and low total vitamins.
I wish I had been able to add commas to the x- and y-axis labels to make the values in the thousands clearer at first glance. In the first scatterplot, I also wanted to move the x- and y-axis titles farther away from the numbers on the scale, since they almost run into each other.
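Both adjustments can likely be made with standard ggplot2 tools. The following is a minimal sketch under stated assumptions rather than a drop-in fix for the plots above. The data frame toy_axes and its values are hypothetical, scales::label_comma() supplies the thousands separators, and the margin sizes are arbitrary.

```r
library(ggplot2)

# Hypothetical toy values standing in for the Chick-Fil-A data
toy_axes <- data.frame(
  sodium   = c(400, 1200, 2500, 3600),
  calories = c(150, 420, 760, 980)
)

p_axes <- ggplot(toy_axes, aes(x = sodium, y = calories)) +
  geom_point(color = "red") +
  # Replace xlim()/ylim() with continuous scales so labels can be formatted
  scale_x_continuous(limits = c(200, 3700), labels = scales::label_comma()) +
  scale_y_continuous(limits = c(50, 1000), labels = scales::label_comma()) +
  labs(x = "Sodium", y = "Calories") +
  # Extra space between each axis title and its tick labels
  theme(
    axis.title.x = element_text(margin = margin(t = 12)),
    axis.title.y = element_text(margin = margin(r = 12))
  )

p_axes
```

When a complete theme such as theme_economist() is used, the theme() call with the margins needs to come after it, because a complete theme added later replaces earlier theme settings.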
Bibliography
1. Lingo D. What Are Calories and How Many Do You Need? EatingWell. Published March 8, 2023. https://www.eatingwell.com/article/8033186/what-are-calories/
2. MedlinePlus. Carbohydrates. medlineplus.gov. Published 2022. https://medlineplus.gov/carbohydrates.html
3. MedlinePlus. Dietary fats explained: MedlinePlus Medical Encyclopedia. medlineplus.gov. Accessed July 8, 2024. https://medlineplus.gov/ency/patientinstructions/000104.htm
4. American Heart Association. What Is Cholesterol? www.heart.org. Published November 6, 2020. https://www.heart.org/en/health-topics/cholesterol/about-cholesterol
5. Gordon B. Is Sodium the Same Thing as Salt. www.eatright.org. Published August 8, 2019. https://www.eatright.org/health/essential-nutrients/minerals/is-sodium-the-same-thing-as-salt
6. Better Health Channel. Sugar. Better Health Channel. Published 2011. https://www.betterhealth.vic.gov.au/health/healthyliving/sugar
7. MedlinePlus. What are proteins and what do they do? medlineplus.gov. Published 2021. https://medlineplus.gov/genetics/understanding/howgeneswork/protein/
8. Brazier Y. Vitamins: What are they and what do they do? Medical News Today. Published December 15, 2020. https://www.medicalnewstoday.com/articles/195878
9. National Institutes of Health. Office of Dietary Supplements - Calcium. nih.gov. Published December 6, 2019. https://ods.od.nih.gov/factsheets/Calcium-Consumer/
10. International Dairy Queen Revenue - Zippia. www.zippia.com. Published December 14, 2021. https://www.zippia.com/international-dairy-queen-careers-27508/revenue/
11. Carter SM. Chick-fil-A lands behind McDonald’s as second-highest-grossing fast-food chain. FOXBusiness. Published May 15, 2020. https://www.foxbusiness.com/lifestyle/chick-fil-a-lands-behind-mcdonalds-as-second-highest-grossing-fast-food-chain