Research Question: Does the average calorie content of menu items differ among major fast food restaurant chains?

The data for my project comes from the Nutrition in Fast Food Dataset provided by OpenIntro (Link: https://www.openintro.org/data/index.php?data=fastfood). This dataset contains 515 observations of menu items from eight U.S. fast food restaurants: Arby’s, Burger King, Chick-fil-A, Dairy Queen, McDonald’s, Sonic, Subway, and Taco Bell. Each row in this dataset represents one menu item and includes the restaurant name, item name, calories, total fat, sodium, sugar, protein, and other nutritional values.

The purpose of this project is to compare the calorie levels of menu items across different restaurants and to determine whether the average calorie content differs among them.

Variables Selected:

restaurant: Name of fast food restaurant (categorical)

item: Name of menu item (categorical)

calories: Number of calories (quantitative)

Data Analysis

In order to answer my research question, I first loaded my data set and inspected it with basic EDA functions such as str() and head(). I then selected the variables I needed for this project (restaurant, item, and calories) to create a smaller data set to work with. After creating the new data set, I explored the calories variable with summary() to look at the minimum, quartiles, median, mean, and maximum. I then used table() to see how many menu items each restaurant contributed to the data set. After that, I calculated summary statistics for calories overall (mean, median, standard deviation, minimum, and maximum) and by restaurant (mean and standard deviation) to see the distribution of calorie levels.

For my visuals, I created a histogram and a boxplot. The histogram shows the overall distribution of calories: most menu items fall between 300 and 700 calories, with fewer items in the very high calorie range. The boxplot compares calorie levels across the restaurants in the data set. It displays the median, spread, and variability of calories for each restaurant, which allowed me to visually compare which restaurants tend to have higher- or lower-calorie items.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)

fast_food <- read_csv("fastfood.csv")
## Rows: 515 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): restaurant, item, salad
## dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(fast_food)
## spc_tbl_ [515 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ restaurant : chr [1:515] "Mcdonalds" "Mcdonalds" "Mcdonalds" "Mcdonalds" ...
##  $ item       : chr [1:515] "Artisan Grilled Chicken Sandwich" "Single Bacon Smokehouse Burger" "Double Bacon Smokehouse Burger" "Grilled Bacon Smokehouse Chicken Sandwich" ...
##  $ calories   : num [1:515] 380 840 1130 750 920 540 300 510 430 770 ...
##  $ cal_fat    : num [1:515] 60 410 600 280 410 250 100 210 190 400 ...
##  $ total_fat  : num [1:515] 7 45 67 31 45 28 12 24 21 45 ...
##  $ sat_fat    : num [1:515] 2 17 27 10 12 10 5 4 11 21 ...
##  $ trans_fat  : num [1:515] 0 1.5 3 0.5 0.5 1 0.5 0 1 2.5 ...
##  $ cholesterol: num [1:515] 95 130 220 155 120 80 40 65 85 175 ...
##  $ sodium     : num [1:515] 1110 1580 1920 1940 1980 950 680 1040 1040 1290 ...
##  $ total_carb : num [1:515] 44 62 63 62 81 46 33 49 35 42 ...
##  $ fiber      : num [1:515] 3 2 3 2 4 3 2 3 2 3 ...
##  $ sugar      : num [1:515] 11 18 18 18 18 9 7 6 7 10 ...
##  $ protein    : num [1:515] 37 46 70 55 46 25 15 25 25 51 ...
##  $ vit_a      : num [1:515] 4 6 10 6 6 10 10 0 20 20 ...
##  $ vit_c      : num [1:515] 20 20 20 25 20 2 2 4 4 6 ...
##  $ calcium    : num [1:515] 20 20 50 20 20 15 10 2 15 20 ...
##  $ salad      : chr [1:515] "Other" "Other" "Other" "Other" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   restaurant = col_character(),
##   ..   item = col_character(),
##   ..   calories = col_double(),
##   ..   cal_fat = col_double(),
##   ..   total_fat = col_double(),
##   ..   sat_fat = col_double(),
##   ..   trans_fat = col_double(),
##   ..   cholesterol = col_double(),
##   ..   sodium = col_double(),
##   ..   total_carb = col_double(),
##   ..   fiber = col_double(),
##   ..   sugar = col_double(),
##   ..   protein = col_double(),
##   ..   vit_a = col_double(),
##   ..   vit_c = col_double(),
##   ..   calcium = col_double(),
##   ..   salad = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(fast_food)
## # A tibble: 6 × 17
##   restaurant item       calories cal_fat total_fat sat_fat trans_fat cholesterol
##   <chr>      <chr>         <dbl>   <dbl>     <dbl>   <dbl>     <dbl>       <dbl>
## 1 Mcdonalds  Artisan G…      380      60         7       2       0            95
## 2 Mcdonalds  Single Ba…      840     410        45      17       1.5         130
## 3 Mcdonalds  Double Ba…     1130     600        67      27       3           220
## 4 Mcdonalds  Grilled B…      750     280        31      10       0.5         155
## 5 Mcdonalds  Crispy Ba…      920     410        45      12       0.5         120
## 6 Mcdonalds  Big Mac         540     250        28      10       1            80
## # ℹ 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>, sugar <dbl>,
## #   protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>

Selecting variables I am using

# Selecting only the variables I am using

fast_food_clean <- fast_food |>
  select(restaurant, item, calories)

head(fast_food_clean)
## # A tibble: 6 × 3
##   restaurant item                                      calories
##   <chr>      <chr>                                        <dbl>
## 1 Mcdonalds  Artisan Grilled Chicken Sandwich               380
## 2 Mcdonalds  Single Bacon Smokehouse Burger                 840
## 3 Mcdonalds  Double Bacon Smokehouse Burger                1130
## 4 Mcdonalds  Grilled Bacon Smokehouse Chicken Sandwich      750
## 5 Mcdonalds  Crispy Bacon Smokehouse Chicken Sandwich       920
## 6 Mcdonalds  Big Mac                                        540

Summary of calories overall and how many items each restaurant has

summary(fast_food_clean$calories)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    20.0   330.0   490.0   530.9   690.0  2430.0
table(fast_food_clean$restaurant)
## 
##       Arbys Burger King Chick Fil-A Dairy Queen   Mcdonalds       Sonic 
##          55          70          27          42          57          53 
##      Subway   Taco Bell 
##          96         115

Find mean, median, sd, min and max of calories overall, and mean and sd of calories by restaurant

calorie_count <- fast_food_clean |>
  summarise(
    mean_calories = mean(calories, na.rm = TRUE), 
    median_calories = median(calories, na.rm = TRUE),
    sd_calories     = sd(calories, na.rm = TRUE),
    min_calories    = min(calories, na.rm = TRUE),
    max_calories    = max(calories, na.rm = TRUE)
  )

# Calories by restaurant
calorie_stats_by_restaurant <- fast_food_clean |>
  group_by(restaurant) |>
  summarise(
    mean_calories = mean(calories, na.rm = TRUE),
    sd_calories   = sd(calories, na.rm = TRUE)
  ) 
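The two summary tables above are stored but never printed, so the per-restaurant means are not visible in the output. A minimal sketch of how such a table could be displayed, sorted from highest to lowest mean, is shown here on a small made-up data frame. The name toy_menu, its values, and the extra n_items column are assumptions for illustration only and are not part of the project data.

```r
library(dplyr)

# Toy stand-in for fast_food_clean (restaurants and calories are made up)
toy_menu <- data.frame(
  restaurant = c("A", "A", "B", "B", "B"),
  calories   = c(300, 500, 700, 900, 800)
)

toy_stats <- toy_menu |>
  group_by(restaurant) |>
  summarise(
    n_items       = n(),                          # items per restaurant
    mean_calories = mean(calories, na.rm = TRUE),
    sd_calories   = sd(calories, na.rm = TRUE)
  ) |>
  arrange(desc(mean_calories))                    # highest mean first

# Printing the object shows the table in the knitted report
print(toy_stats)
```

On the project data, the same idea would be to print calorie_count and to pipe calorie_stats_by_restaurant into arrange(desc(mean_calories)).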

Visuals

# Histogram
ggplot(fast_food_clean, aes(x = calories)) +
  geom_histogram(binwidth = 100, fill = "#FF1493", color = "black") +
  labs(title = "Histogram of Calories in Fast Food Items",
       x = "Calories",
       y = "Frequency") +
  theme_minimal()

ggplot(fast_food_clean, aes(x = restaurant, y = calories)) +
  geom_boxplot(fill = "#FF83FA", color = "black") +
  labs(
    title = "Boxplot of Calories by Restaurant",
    x = "Restaurant",
    y = "Calories"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
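One optional variation on the boxplot is to order the restaurants by their median calories, so that higher- and lower-calorie restaurants are easier to spot than in alphabetical order. The sketch below uses base R's reorder() on a small made-up data frame; toy_menu and its values are assumptions for illustration, not the project data.

```r
library(ggplot2)

# Toy stand-in for fast_food_clean (restaurants and calories are made up)
toy_menu <- data.frame(
  restaurant = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  calories   = c(600, 700, 800, 200, 300, 400, 400, 500, 600)
)

# reorder() turns restaurant into a factor whose levels are sorted
# by each group's median calories (lowest median first)
toy_menu$restaurant <- reorder(toy_menu$restaurant, toy_menu$calories, FUN = median)

toy_plot <- ggplot(toy_menu, aes(x = restaurant, y = calories)) +
  geom_boxplot() +
  labs(x = "Restaurant (ordered by median calories)", y = "Calories") +
  theme_minimal()

print(toy_plot)
```

With the real data, the same reorder() call could be applied to fast_food_clean$restaurant before plotting.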

ANOVA test

anova_result <- aov(calories ~ restaurant, data = fast_food_clean)
anova_result
## Call:
##    aov(formula = calories ~ restaurant, data = fast_food_clean)
## 
## Terms:
##                 restaurant Residuals
## Sum of Squares     3177729  37824143
## Deg. of Freedom          7       507
## 
## Residual standard error: 273.137
## Estimated effects may be unbalanced
summary(anova_result)
##              Df   Sum Sq Mean Sq F value   Pr(>F)    
## restaurant    7  3177729  453961   6.085 7.75e-07 ***
## Residuals   507 37824143   74604                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Statistical Analysis

To determine whether the mean calorie content differs among restaurants, I performed a one-way ANOVA test.

Hypotheses

\(H_0\): \(\mu_1 = \mu_2 = \cdots = \mu_8\). There is no difference in the mean calorie content of menu items among the eight fast-food restaurants.

\(H_a\): at least one \(\mu_i\) is different. The mean calorie content of menu items differs for at least one of the fast-food restaurants.

Significance Level α = 0.05

From the ANOVA output, the test statistic is F = 6.085 on 7 and 507 degrees of freedom, and the p-value is 7.75e-07. Because the p-value is far smaller than α = 0.05, we reject the null hypothesis.
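As a check on the ANOVA table, the F value can be recomputed by hand from the sums of squares and degrees of freedom in the output. Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch, with variable names of my own choosing:

```r
# Values copied from the summary(anova_result) table
ss_between <- 3177729    # Sum Sq for restaurant
df_between <- 7          # Df for restaurant (8 restaurants - 1)
ss_within  <- 37824143   # Sum Sq for Residuals
df_within  <- 507        # Df for Residuals (515 items - 8 restaurants)

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value from the F distribution
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(f_value, 3))
print(p_value)
```

The results should agree with the F value and Pr(>F) in the ANOVA table up to rounding.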

At the 5% significance level, there is enough evidence to conclude that the mean calorie content of menu items differs among the fast-food restaurants. This means that not all restaurants have the same average calorie level. The ANOVA test by itself does not identify which restaurants differ from which.
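A significant ANOVA result only says that at least one mean differs. One common follow-up, which I did not run in this project, is Tukey's HSD test, which compares every pair of groups. The sketch below shows the idea on simulated data; sim_menu, its three restaurants, and their calorie means are made up for illustration and are not results from the fast food data.

```r
# Simulated stand-in data: three hypothetical restaurants with made-up means
set.seed(42)
sim_menu <- data.frame(
  restaurant = factor(rep(c("A", "B", "C"), each = 30)),
  calories   = c(rnorm(30, mean = 400, sd = 100),
                 rnorm(30, mean = 600, sd = 100),
                 rnorm(30, mean = 620, sd = 100))
)

# Same kind of one-way ANOVA model as in the project
sim_fit <- aov(calories ~ restaurant, data = sim_menu)

# Tukey's HSD: pairwise differences in means with adjusted p-values
# and 95% family-wise confidence intervals
sim_tukey <- TukeyHSD(sim_fit, which = "restaurant", conf.level = 0.95)
print(sim_tukey)
```

On the project data, the equivalent call would be TukeyHSD(anova_result). It is probably safest to convert restaurant to a factor with factor() before fitting the model. With eight restaurants, this gives 28 pairwise comparisons.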

Conclusion

My analysis of the fast food dataset showed a clear difference in calorie levels across fast food restaurants. The mean calories per menu item ranged from about 384 at Chick-fil-A to about 640 at McDonald’s. Sonic (about 632) and Burger King (about 609) also offered higher-calorie items on average, while Taco Bell (about 444) and Subway (about 503) tended to offer lower-calorie options. The ANOVA test, with a very small p-value of 7.75e-07, confirmed that these means are not all equal. These findings answer my research question: the average calorie content of menu items does differ among the restaurants, although the ANOVA alone does not show which specific restaurants differ from each other. I believe understanding the calorie differences across fast food restaurants can help consumers make more informed choices about where and what they eat. In the future, I would like to compare specific categories of items separately, such as burgers, chicken sandwiches, chicken nuggets, and french fries, to gain more insight into each category.

REFERENCES:

https://www.openintro.org/data/index.php?data=fastfood (source of the dataset)

https://www.statology.org/rotate-axis-labels-ggplot2/ (for the boxplot, I looked up how to make my axis labels diagonal so they wouldn’t overlap)