Research Question: Does the average calorie content of menu items differ among major fast food restaurant chains?
The data for my project come from the Nutrition in Fast Food dataset provided by OpenIntro (Link: https://www.openintro.org/data/index.php?data=fastfood). This dataset contains 515 observations of menu items from eight U.S. fast-food restaurants: Arby’s, Burger King, Chick-fil-A, Dairy Queen, McDonald’s, Sonic, Subway, and Taco Bell. Each row represents one menu item and includes the restaurant name, item name, calories, total fat, sodium, sugar, protein, and other nutritional values.
The purpose of this project is to compare the calorie levels of menu items across different restaurants and to determine whether the average calorie content differs among them.
Variables Selected:
restaurant: Name of fast food restaurant (categorical)
item: Name of menu item (categorical)
calories: Number of calories (quantitative)
Data Analysis
To answer my research question, I first loaded the data set and began exploring it with basic EDA functions such as str() and head(). I then selected the three variables I needed for this project (restaurant, item, and calories) to create a smaller data set to work with. With the new data set, I explored the calories variable using summary() to look at the minimum, quartiles, median, mean, and maximum. I then used table() to see how many menu items each restaurant contributed to the data set. After that, I calculated summary statistics for calories overall and by restaurant (mean, median, standard deviation, minimum, and maximum) to describe the distribution of calorie levels.
For my visuals, I created a histogram and a boxplot. The histogram shows the overall distribution of calories: most menu items fall between about 300 and 700 calories, with fewer items in the very high calorie range, so the distribution is skewed to the right. The boxplot compares calorie levels across the restaurants in the data set. It displays the median, spread, and variability of calories for each restaurant, which allowed me to visually compare which restaurants tend to have higher- or lower-calorie items.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
fast_food <- read_csv("fastfood.csv")
## Rows: 515 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): restaurant, item, salad
## dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(fast_food)
## spc_tbl_ [515 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ restaurant : chr [1:515] "Mcdonalds" "Mcdonalds" "Mcdonalds" "Mcdonalds" ...
## $ item : chr [1:515] "Artisan Grilled Chicken Sandwich" "Single Bacon Smokehouse Burger" "Double Bacon Smokehouse Burger" "Grilled Bacon Smokehouse Chicken Sandwich" ...
## $ calories : num [1:515] 380 840 1130 750 920 540 300 510 430 770 ...
## $ cal_fat : num [1:515] 60 410 600 280 410 250 100 210 190 400 ...
## $ total_fat : num [1:515] 7 45 67 31 45 28 12 24 21 45 ...
## $ sat_fat : num [1:515] 2 17 27 10 12 10 5 4 11 21 ...
## $ trans_fat : num [1:515] 0 1.5 3 0.5 0.5 1 0.5 0 1 2.5 ...
## $ cholesterol: num [1:515] 95 130 220 155 120 80 40 65 85 175 ...
## $ sodium : num [1:515] 1110 1580 1920 1940 1980 950 680 1040 1040 1290 ...
## $ total_carb : num [1:515] 44 62 63 62 81 46 33 49 35 42 ...
## $ fiber : num [1:515] 3 2 3 2 4 3 2 3 2 3 ...
## $ sugar : num [1:515] 11 18 18 18 18 9 7 6 7 10 ...
## $ protein : num [1:515] 37 46 70 55 46 25 15 25 25 51 ...
## $ vit_a : num [1:515] 4 6 10 6 6 10 10 0 20 20 ...
## $ vit_c : num [1:515] 20 20 20 25 20 2 2 4 4 6 ...
## $ calcium : num [1:515] 20 20 50 20 20 15 10 2 15 20 ...
## $ salad : chr [1:515] "Other" "Other" "Other" "Other" ...
## - attr(*, "spec")=
## .. cols(
## .. restaurant = col_character(),
## .. item = col_character(),
## .. calories = col_double(),
## .. cal_fat = col_double(),
## .. total_fat = col_double(),
## .. sat_fat = col_double(),
## .. trans_fat = col_double(),
## .. cholesterol = col_double(),
## .. sodium = col_double(),
## .. total_carb = col_double(),
## .. fiber = col_double(),
## .. sugar = col_double(),
## .. protein = col_double(),
## .. vit_a = col_double(),
## .. vit_c = col_double(),
## .. calcium = col_double(),
## .. salad = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
head(fast_food)
## # A tibble: 6 × 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Artisan G… 380 60 7 2 0 95
## 2 Mcdonalds Single Ba… 840 410 45 17 1.5 130
## 3 Mcdonalds Double Ba… 1130 600 67 27 3 220
## 4 Mcdonalds Grilled B… 750 280 31 10 0.5 155
## 5 Mcdonalds Crispy Ba… 920 410 45 12 0.5 120
## 6 Mcdonalds Big Mac 540 250 28 10 1 80
## # ℹ 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>, sugar <dbl>,
## # protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>
Selecting the variables I am using
# Selecting only the variables I am using
fast_food_clean <- fast_food |>
  select(restaurant, item, calories)
head(fast_food_clean)
## # A tibble: 6 × 3
## restaurant item calories
## <chr> <chr> <dbl>
## 1 Mcdonalds Artisan Grilled Chicken Sandwich 380
## 2 Mcdonalds Single Bacon Smokehouse Burger 840
## 3 Mcdonalds Double Bacon Smokehouse Burger 1130
## 4 Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich 750
## 5 Mcdonalds Crispy Bacon Smokehouse Chicken Sandwich 920
## 6 Mcdonalds Big Mac 540
Summary of calories overall and how many items each restaurant has
summary(fast_food_clean$calories)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.0 330.0 490.0 530.9 690.0 2430.0
table(fast_food_clean$restaurant)
##
## Arbys Burger King Chick Fil-A Dairy Queen Mcdonalds Sonic
## 55 70 27 42 57 53
## Subway Taco Bell
## 96 115
Find mean, median, sd, min and max of calories overall and by restaurant
# Overall summary statistics for calories
calorie_count <- fast_food_clean |>
  summarise(
    mean_calories = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories = sd(calories, na.rm = TRUE),
    min_calories = min(calories, na.rm = TRUE),
    max_calories = max(calories, na.rm = TRUE)
  )
# Calories by restaurant (same statistics, one row per restaurant)
calorie_stats_by_restaurant <- fast_food_clean |>
  group_by(restaurant) |>
  summarise(
    mean_calories = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories = sd(calories, na.rm = TRUE),
    min_calories = min(calories, na.rm = TRUE),
    max_calories = max(calories, na.rm = TRUE)
  )
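The restaurant means quoted in the Conclusion are easier to read off once the by-restaurant summary is sorted. The sketch below shows one way to do that with `arrange()`. The `toy` tibble and its restaurant names "A", "B", and "C" are hypothetical stand-ins, not values from the fast food data.

```r
library(dplyr)

# Hypothetical toy data for illustration only; swap in fast_food_clean
# to apply the same steps to the real data set.
toy <- tibble(
  restaurant = c("A", "A", "B", "B", "C", "C"),
  calories   = c(300, 500, 600, 800, 200, 400)
)

# Mean calories and item count per restaurant, highest mean first.
toy_means <- toy |>
  group_by(restaurant) |>
  summarise(
    mean_calories = mean(calories, na.rm = TRUE),
    n_items = n()
  ) |>
  arrange(desc(mean_calories))

toy_means
```

Sorting in descending order puts the highest-calorie restaurant in the first row and the lowest in the last row, which makes the range of group means visible at a glance.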
Visuals
# Histogram of calories across all menu items
ggplot(fast_food_clean, aes(x = calories)) +
  geom_histogram(binwidth = 100, fill = "#FF1493", color = "black") +
  labs(
    title = "Histogram of Calories in Fast Food Items",
    x = "Calories",
    y = "Frequency"
  ) +
  theme_minimal()
# Boxplot of calories by restaurant, with rotated x-axis labels
ggplot(fast_food_clean, aes(x = restaurant, y = calories)) +
  geom_boxplot(fill = "#FF83FA", color = "black") +
  labs(
    title = "Boxplot of Calories by Restaurant",
    x = "Restaurant",
    y = "Calories"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
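To put a number on the histogram's impression that most items fall between 300 and 700 calories, one option is to compute the share of items in that range directly. This is a minimal sketch: the `calories` vector below is hypothetical, and the real check would use `fast_food_clean$calories` instead.

```r
# Hypothetical calorie values for illustration only; replace with
# fast_food_clean$calories to run this on the real data.
calories <- c(150, 320, 380, 450, 490, 540, 610, 690, 840, 1130)

# Logical vector: TRUE where an item falls inside the 300-700 range.
in_range <- calories >= 300 & calories <= 700

# The mean of a logical vector is the proportion of TRUE values.
share_in_range <- mean(in_range)
share_in_range

# The first and third quartiles bound the middle 50% of items.
quantile(calories, probs = c(0.25, 0.75))
```

On the real data, the quartiles from summary() (330 and 690) already show that about half of the items lie inside this range.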
ANOVA test
anova_result <- aov(calories ~ restaurant, data = fast_food_clean)
anova_result
## Call:
## aov(formula = calories ~ restaurant, data = fast_food_clean)
##
## Terms:
## restaurant Residuals
## Sum of Squares 3177729 37824143
## Deg. of Freedom 7 507
##
## Residual standard error: 273.137
## Estimated effects may be unbalanced
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## restaurant 7 3177729 453961 6.085 7.75e-07 ***
## Residuals 507 37824143 74604
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
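The F value and p-value in the table above can be reproduced by hand from the sums of squares and degrees of freedom. The sketch below copies those four numbers from the output. The variable names are my own and are not part of the original analysis.

```r
# Values copied from the ANOVA table in this report.
ss_between <- 3177729   # Sum of Squares for restaurant
ss_within  <- 37824143  # Sum of Squares for residuals
df_between <- 7         # 8 restaurants - 1
df_within  <- 507       # 515 items - 8 restaurants

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# The F statistic is the ratio of between-group to within-group variance.
f_value <- ms_between / ms_within

# The p-value is the upper-tail area of the F distribution at f_value.
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 3)   # agrees with the F value of 6.085 in the table
p_value
sqrt(ms_within)     # the residual standard error reported by aov()
```

An F value well above 1 means the variation between restaurant means is large compared with the variation among items within the same restaurant.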
Statistical Analysis
To determine whether the mean calorie content differs among restaurants, I performed a one-way ANOVA test.
Hypotheses
\(H_0\): \(\mu_1 = \mu_2 = \cdots = \mu_8\). There is no difference in the mean calorie content of menu items among the eight fast-food restaurants.
\(H_a\): At least one \(\mu_i\) is different. The mean calorie content of menu items differs for at least one of the fast-food restaurants.
Significance Level α = 0.05
From the ANOVA output, the test statistic is F = 6.085 (with 7 and 507 degrees of freedom) and the p-value is 7.75e-07. Because the p-value is far smaller than α = 0.05, we reject the null hypothesis.
At the 5% significance level, there is enough evidence to conclude that the mean calorie content of menu items differs among the fast-food restaurants. This means that not all restaurants have the same average calorie level; at least one restaurant’s mean is higher or lower than the others.
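The ANOVA result says that at least one mean differs, but not which restaurants differ from which. A common follow-up, which was not part of my original analysis, is Tukey's HSD test. The sketch below runs it on hypothetical toy data with three made-up restaurants. On the real data the call might be `TukeyHSD(anova_result)`, possibly after converting `restaurant` to a factor.

```r
# Hypothetical data: three made-up restaurants, five items each.
toy <- data.frame(
  restaurant = factor(rep(c("A", "B", "C"), each = 5)),
  calories = c(300, 320, 340, 360, 380,
               310, 330, 350, 370, 390,
               600, 620, 640, 660, 680)
)

# One-way ANOVA on the toy data, same formula as in the report.
toy_fit <- aov(calories ~ restaurant, data = toy)

# Tukey's HSD compares every pair of group means and adjusts the
# p-values for the number of comparisons.
toy_tukey <- TukeyHSD(toy_fit, conf.level = 0.95)

# One row per pair, with columns: diff, lwr, upr, p adj.
toy_tukey$restaurant
```

In this toy example, restaurants A and B have nearly equal means while C is much higher, so only the pairs involving C come out with a small adjusted p-value.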
Conclusion
My analysis of the fast food dataset showed a clear difference in calorie levels across fast-food restaurants. The average calories of menu items varied by restaurant, with mean values ranging from about 384 calories at Chick-fil-A to about 640 calories at McDonald’s. Sonic (about 632) and Burger King (about 609) also offered higher-calorie items on average, while Taco Bell (about 444) and Subway (about 503) tended to offer lower-calorie menu options. The ANOVA test supported these differences with a very small p-value of 7.75e-07, which indicates that the restaurant means are not all equal, although it does not by itself identify which specific restaurants differ. These findings answer my research question: average calorie content does differ among fast-food restaurants, with some chains offering higher-calorie menu items on average than others. I believe understanding the calorie differences across fast-food restaurants can help consumers make more informed choices about where and what they choose to eat. In the future, I would like to compare specific item categories separately, such as burgers, chicken sandwiches, chicken nuggets, and french fries, to gain more insight into each category.
REFERENCES:
https://www.openintro.org/data/index.php?data=fastfood (source of the fast food dataset)
https://www.statology.org/rotate-axis-labels-ggplot2/ (for the boxplot, I looked up how to rotate the x-axis labels diagonally so they would not overlap)