Project 2

The dataset I will be discussing covers the nutritional value of fast foods from a variety of fast food restaurants. The categorical variables in the dataset are the restaurant names and the names of 505 different food items from these restaurants. The quantitative variables are calories, calories from fat, total fat, saturated fat, trans fat, cholesterol, sodium, total carbohydrates, fiber, sugar, protein, vitamin A, vitamin C and calcium. For background, saturated fat, trans fat and cholesterol are the nutrients I treat as unhealthy. I chose this topic because I consume fast food regularly. However, I am also trying to gain weight in a healthy way without consuming too much unhealthy fat or cholesterol. I believe that organizing and visualizing the data will make it easier for me to see which fast foods would be more beneficial for my health and which I can eat in moderation without compromising my health.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(rio)

## Warning: package 'rio' was built under R version 4.2.3

library(plotly)

## Warning: package 'plotly' was built under R version 4.2.3

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:rio':
## 
##     export
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

#Load data from CSV file

fastfooddatafilecsv <- "fastfood.csv"
setwd("C:/Users/andre/OneDrive/Documents/Data")
fastfood <- import(fastfooddatafilecsv)

head(fastfood)

##   restaurant                                      item calories cal_fat
## 1  Mcdonalds          Artisan Grilled Chicken Sandwich      380      60
## 2  Mcdonalds            Single Bacon Smokehouse Burger      840     410
## 3  Mcdonalds            Double Bacon Smokehouse Burger     1130     600
## 4  Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich      750     280
## 5  Mcdonalds  Crispy Bacon Smokehouse Chicken Sandwich      920     410
## 6  Mcdonalds                                   Big Mac      540     250
##   total_fat sat_fat trans_fat cholesterol sodium total_carb fiber sugar protein
## 1         7       2       0.0          95   1110         44     3    11      37
## 2        45      17       1.5         130   1580         62     2    18      46
## 3        67      27       3.0         220   1920         63     3    18      70
## 4        31      10       0.5         155   1940         62     2    18      55
## 5        45      12       0.5         120   1980         81     4    18      46
## 6        28      10       1.0          80    950         46     3     9      25
##   vit_a vit_c calcium salad
## 1     4    20      20 Other
## 2     6    20      20 Other
## 3    10    20      50 Other
## 4     6    25      20 Other
## 5     6    20      20 Other
## 6    10     2      15 Other

I removed Arbys, Sonic, and Dairy Queen because there aren’t many in my area and I don’t eat at them as often as the other fast food restaurants. This also keeps the data visualizations cleaner and simpler.

# Drop the rows for "Arbys", "Dairy Queen", and "Sonic".
# These are values in the restaurant column, not column names, so the rows are filtered.
excluded_restaurants <- c("Arbys", "Dairy Queen", "Sonic")
fastfood_filtered_excluded <- fastfood[!(fastfood$restaurant %in% excluded_restaurants), ]
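
A quick way to confirm that the exclusion drops rows rather than columns is to compare the data before and after. The sketch below uses a small made-up data frame (`toy`, with placeholder item names, not the real `fastfood.csv`) so that it runs on its own.

```r
# Toy stand-in for the fastfood data (placeholder items, not from fastfood.csv)
toy <- data.frame(
  restaurant = c("Mcdonalds", "Arbys", "Sonic", "Subway", "Dairy Queen", "Taco Bell"),
  item       = c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6")
)

excluded_restaurants <- c("Arbys", "Dairy Queen", "Sonic")

# Keep only the rows whose restaurant is not in the excluded set
toy_kept <- toy[!(toy$restaurant %in% excluded_restaurants), ]

# The columns are unchanged; only the rows differ
names(toy_kept)
table(toy_kept$restaurant)
```

Running the same `table()` call on `fastfood_filtered_excluded$restaurant` should list only the restaurants that remain.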

Next, I made a scatter plot to look for any relationship between protein and calorie count across items and restaurants.

# Create a scatter plot to visualize the correlation between protein and calories for each restaurant
ggplot(fastfood_filtered_excluded, aes(x = protein, y = calories, color = restaurant)) +
  geom_point() +
  labs(title = "Protein and Calories for Each Item by Restaurant",
       x = "Protein", y = "Calories")

## Warning: Removed 1 rows containing missing values (`geom_point()`).

Because I plan to gain weight by eating foods that contain more calories and protein, I wanted to see which restaurants have the widest variety of foods that are high in both. Taco Bell has the least variety: all of its items appear to have less than 50 g of protein and fewer than 1000 calories. This suggests that Taco Bell doesn’t have much to offer when it comes to food that packs in a decent amount of calories and protein. I would stay away from Taco Bell to save money rather than spend it on food that doesn’t offer many health benefits.

This visualization also shows that five of the seven foods with the highest calorie and protein counts are from McDonald’s.
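
A count like this can also be checked without reading points off the plot, by sorting the items and tabulating the restaurants in the top rows. The sketch below is only an illustration: `toy`, its placeholder restaurant names and its numbers are all made up, `top_n_items()` is a hypothetical helper, and ranking by protein alone is an assumption about what "highest" means here.

```r
# Made-up data with placeholder names (not values from fastfood.csv)
toy <- data.frame(
  restaurant = c("Restaurant A", "Restaurant A", "Restaurant B",
                 "Restaurant C", "Restaurant A", "Restaurant D"),
  item       = c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6"),
  calories   = c(1100, 900, 1000, 400, 800, 350),
  protein    = c(70, 60, 65, 20, 55, 15)
)

# Hypothetical helper: return the n rows with the largest values in `column`
top_n_items <- function(data, column, n) {
  ranked <- data[order(data[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

# Top three items by protein, then how many come from each restaurant
top3 <- top_n_items(toy, "protein", 3)
table(top3$restaurant)
```

On the real data, the equivalent call would be `top_n_items(fastfood_filtered_excluded, "protein", 7)`.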

p <- ggplot(fastfood_filtered_excluded, aes(x = protein, y = calories, text = item, color = restaurant)) +
  geom_point() +
  labs(title = "Protein and Calories for Each Item by Restaurant",
       x = "Protein", y = "Calories") +
  theme(legend.position = "bottom") +
  scale_color_discrete(name = "Restaurant")

ggplotly(p)

Now that I can see some foods with higher protein and calorie counts, I want to know their names, so I adjusted the plot to show the item name when hovering over a point. Based on the item names, I think I will load up on calories and protein with the 20 piece Buttermilk Crispy Chicken Tenders.

Another nutrient I want more of if I really want to gain weight is carbohydrates, so I would also like to find the item with the highest carbohydrate count.

p <- ggplot(fastfood_filtered_excluded, aes(x = protein, y = total_carb, text = item, color = restaurant)) +
  geom_point() +
  labs(title = "Protein and Total Carbs for Each Item by Restaurant",
       x = "Protein", y = "Total Carbs") +
  theme(legend.position = "bottom") +
  scale_color_discrete(name = "Restaurant")

ggplotly(p)

In this scatter plot, it is evident that McDonald’s has the most variety of foods with nutritional value that is beneficial for me.

However, just because a certain food is high in protein, carbohydrates or calories does not mean it is completely beneficial to me. It may be high in sodium, cholesterol or unhealthy fats. So I need to look for correlations between protein and the unhealthy nutrients.

First, I select the seven foods with the highest protein.

top_protein_foods <- fastfood_filtered_excluded %>% 
  arrange(desc(protein)) %>% 
  head(7) %>% 
  select(item, protein)

top_protein_foods

##                                               item protein
## 1       20 piece Buttermilk Crispy Chicken Tenders     186
## 2                          American Brewhouse King     134
## 3       12 piece Buttermilk Crispy Chicken Tenders     115
## 4                         30 piece Chicken Nuggets     103
## 5                       40 piece Chicken McNuggets      98
## 6 10 piece Sweet N' Spicy Honey BBQ Glazed Tenders      97
## 7       10 piece Buttermilk Crispy Chicken Tenders      94

It is tedious to examine the correlation between protein and each negative nutrient one at a time with a separate graph for each pair. So I use a correlation plot.

top_protein_foods <- fastfood_filtered_excluded %>% 
  filter(item %in% top_protein_foods$item)

correlations <- cor(top_protein_foods[, c("protein", "trans_fat", "sat_fat", "cholesterol", "sodium")])

library(ggcorrplot)

## Warning: package 'ggcorrplot' was built under R version 4.2.3

ggcorrplot(correlations, type = "upper", hc.order = TRUE, 
           colors = c("#6D9EC1", "#FFFFFF", "#E46726"), lab = TRUE, 
           lab_size = 4, title = "Correlation Between Protein and Negative Nutrition Variables for Top 7 Foods with Highest Protein")

This plot shows the pairwise correlations among protein and the negative nutritional values. There is an obvious correlation among cholesterol, saturated fat and trans fat. Fortunately, protein correlates only a little with the negative nutrients, although the correlation is still positive, which was expected. We are talking about fast food, after all. Because the plot is based on only seven items, these correlations are rough indicators at best.
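
Two practical points apply to a correlation matrix on so few rows: a missing value makes `cor()` return `NA` unless the `use` argument is set, and a confidence interval gives a sense of how uncertain each coefficient is. The sketch below illustrates both on made-up numbers (the `toy` data frame is hypothetical and not taken from `fastfood.csv`).

```r
# Seven made-up rows, with one missing sodium value on purpose
toy <- data.frame(
  protein     = c(20, 35, 50, 65, 80, 95, 110),
  cholesterol = c(60, 90, 85, 150, 140, 210, 230),
  sodium      = c(900, 1500, 1100, 1900, 1300, 2100, NA)
)

# Pairwise-complete observations avoid NA entries when a column has gaps
cor_matrix <- cor(toy, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# cor.test() adds a confidence interval for a single pair of variables
result <- cor.test(toy$protein, toy$cholesterol)
result$estimate
result$conf.int
```

A wide interval from `cor.test()` is a reminder that a coefficient computed from seven foods may change noticeably if one item is swapped out.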

This plot, however, does not show much, so I decided to look at the average of each nutrient across all foods for each restaurant.

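# na.rm = TRUE skips missing values; without it, one NA makes the whole average NA
avg_nutrition_by_restaurant <- fastfood_filtered_excluded %>%
  group_by(restaurant) %>%
  summarise(
    avg_calories = mean(calories, na.rm = TRUE),
    avg_cal_fat = mean(cal_fat, na.rm = TRUE),
    avg_total_fat = mean(total_fat, na.rm = TRUE),
    avg_sat_fat = mean(sat_fat, na.rm = TRUE),
    avg_trans_fat = mean(trans_fat, na.rm = TRUE),
    avg_cholesterol = mean(cholesterol, na.rm = TRUE),
    avg_sodium = mean(sodium, na.rm = TRUE),
    avg_total_carb = mean(total_carb, na.rm = TRUE),
    avg_fiber = mean(fiber, na.rm = TRUE),
    avg_sugar = mean(sugar, na.rm = TRUE),
    avg_protein = mean(protein, na.rm = TRUE)
  )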

After finding the average of each nutritional value for each restaurant, I tried to make box plots comparing all the nutritional values. However, I realized that averaging leaves only one value per restaurant and nutrient, so each box collapses to a single line instead of an actual box plot. I realized I should have made a histogram.
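
The collapse is easy to reproduce with the five-number summary that a box plot draws (minimum, lower hinge, median, upper hinge, maximum). In the sketch below, `single_average` is a made-up number standing in for one restaurant's average, and `raw_values` are the six calorie values shown in the `head(fastfood)` output above.

```r
# One average per restaurant and nutrient (hypothetical number)
single_average <- c(532.5)

# With a single value, all five summary statistics are identical,
# so the box has no height and is drawn as a line
collapsed <- boxplot.stats(single_average)$stats
collapsed

# With several raw values, the same summary has a real spread
raw_values <- c(380, 840, 1130, 750, 920, 540)
spread <- boxplot.stats(raw_values)$stats
spread
```

This suggests that a box plot is more informative when it is built from the item-level rows than from the per-restaurant averages.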

One useful feature is that clicking a restaurant in the legend toggles it on or off, so you can compare the averages.

library(reshape2)

## Warning: package 'reshape2' was built under R version 4.2.3

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

# Calculate the average nutritional values for each restaurant
# (an anonymous function passes na.rm to mean(), as dplyr 1.1.0 recommends)
avg_nutrition_by_restaurant <- fastfood_filtered_excluded %>%
  group_by(restaurant) %>%
  summarize(across(calories:protein, \(x) mean(x, na.rm = TRUE)))

# Melt the data into long format for plotting
melted_avg_nutrition <- melt(avg_nutrition_by_restaurant, id.vars = "restaurant",
                             variable.name = "nutrient", value.name = "value")

# Create the boxplot
p <- ggplot(melted_avg_nutrition, aes(x = nutrient, y = value, fill = restaurant,
                                      text = paste(nutrient, ": ", round(value, 2)))) +
  geom_boxplot() +
  labs(x = "Nutrient", y = "Average Value", title = "Average Nutritional Values by Restaurant") +
  theme_minimal() +
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1))

# Convert to plotly object and display
ggplotly(p, tooltip = c("text"))
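
The summarize-then-reshape steps above can be sketched on a tiny example to show what the long format looks like. This is only an illustration under stated assumptions: the `toy` data frame, its placeholder restaurant names and its numbers are made up, and `tidyr::pivot_longer()` is swapped in for `reshape2::melt()` because it is already loaded with the tidyverse.

```r
library(dplyr)
library(tidyr)

# Made-up data with placeholder names and one missing calorie value
toy <- data.frame(
  restaurant = c("Restaurant A", "Restaurant A", "Restaurant B", "Restaurant B"),
  calories   = c(540, 840, 320, NA),
  protein    = c(25, 46, 18, 24)
)

# One row per restaurant, one column per nutrient average
avg_toy <- toy %>%
  group_by(restaurant) %>%
  summarize(across(calories:protein, \(x) mean(x, na.rm = TRUE)))

# One row per restaurant and nutrient, which is the shape ggplot expects
long_toy <- avg_toy %>%
  pivot_longer(cols = -restaurant, names_to = "nutrient", values_to = "value")

long_toy
```

Each restaurant and nutrient pair ends up with exactly one `value`, which is why a plot built on this table shows one mark per pair.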

Now I attempt to make histograms with all nutrient averages included.

# Calculate the average nutritional values for each restaurant
avg_nutrition_by_restaurant <- fastfood_filtered_excluded %>%
  group_by(restaurant) %>%
  summarize(across(calories:protein, \(x) mean(x, na.rm = TRUE)))

# Melt the data into long format for plotting
melted_avg_nutrition <- melt(avg_nutrition_by_restaurant, id.vars = "restaurant",
                             variable.name = "nutrient", value.name = "value")

# Create the histogram
histogram <- ggplot(melted_avg_nutrition, aes(x = value, fill = restaurant)) +
  geom_histogram(position = "dodge", bins = 10, color = "black") +
  facet_wrap(~ nutrient, scales = "free_x", ncol = 2) +
  labs(x = "Average Value", y = "Count", title = "Histogram of Average Nutritional Values by Restaurant") +
  theme_minimal() +
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1))

# Add hover information
hover <- melted_avg_nutrition %>%
  mutate(tooltip = paste("Restaurant: ", restaurant, "<br>",
                         "Nutrient: ", nutrient, "<br>",
                         "Value: ", value)) %>%
  select(-restaurant, -nutrient, -value) %>%
  unique()

histogram <- ggplotly(histogram, tooltip = c("x", "fill"), text = hover$tooltip)
histogram

You can click the restaurants to compare the nutrient averages and see which has more sodium or protein on average than the others. As I clicked through and compared the restaurant averages, Chick-fil-A appeared to have the lowest averages of negative nutrition overall. The only issue is that Chick-fil-A has the third-highest cholesterol average of all the restaurants, and its protein average is also third. So I would mostly be getting protein and cholesterol from Chick-fil-A while giving up the rest of the nutrients.

In addition, McDonald’s has the highest average in cholesterol.

Subway also seems like a decent choice for healthier foods in general because its average fiber, protein and carbohydrates are high, although I would have to take into account that its sugar average is the second highest.
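
Statements such as "third highest" can be read off a ranking table instead of being judged by eye from the plot. The sketch below shows one way to build such a table. The `avg_toy` data frame, its placeholder restaurant names and its numbers are all made up for illustration and are not the averages computed above.

```r
# Made-up averages with placeholder names (not results from fastfood.csv)
avg_toy <- data.frame(
  restaurant      = c("Restaurant A", "Restaurant B", "Restaurant C",
                      "Restaurant D", "Restaurant E"),
  avg_cholesterol = c(100, 80, 110, 60, 40),
  avg_protein     = c(30, 32, 40, 31, 17)
)

# Rank 1 is the highest average; ties share the smaller rank
avg_toy$cholesterol_rank <- rank(-avg_toy$avg_cholesterol, ties.method = "min")
avg_toy$protein_rank     <- rank(-avg_toy$avg_protein, ties.method = "min")

# Show the restaurants from highest to lowest cholesterol average
avg_toy[order(avg_toy$cholesterol_rank), ]
```

Applying the same two `rank()` lines to the columns of `avg_nutrition_by_restaurant` would give the ranks for the real data.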

Although it is interesting to compare these data visualizations, I believe that restaurant-wide averages don’t mean much for someone like me who is focusing on a few select nutritional values. However, for people who want to see how healthy fast food restaurants are in general, or to make them healthier, these average comparisons are quite helpful.

This project started as a quest to find which restaurants offer the healthiest food for me personally, but it later turned into an interesting discovery of which restaurants offer healthier foods on average, which might benefit my diet overall.

The biggest takeaway from this data is that, to my surprise, Taco Bell is actually not bad at all on average. Subway and Chick-fil-A are probably the healthiest options overall, depending on what you’re willing to trade off. Lastly, McDonald’s and Burger King come out as the least healthy choices on average, so eating there often could be a real risk to my health.

For some background, fast food is popular among most age groups because of its low cost, consistency, and convenience. Most research links fast food consumption to poor diet quality, weight gain, and even mortality. Research also shows that improvements in nutrition labeling result in healthier choices being available to consumers.

Citations: Min J, Jahns L, Xue H, Kandiah J, Wang Y. Americans’ perceptions about fast food and how they associate with its consumption and obesity risk. Adv Nutr. 2018;9(5):590–601. doi: 10.1093/advances/nmy032.

Todd JE. Changes in consumption of food away from home and intakes of energy and other nutrients among US working-age adults, 2005–2014. Public Health Nutr. 2017;20(18):3238–3246. doi: 10.1017/S1368980017002403.

Andrew Kwak

2023-04-17