Source: Time Magazine

Introduction

This project explores the nutritional content of items on the McDonald’s USA menu. The data was collected and published by McDonald’s Corporation, which makes its nutritional information publicly available to consumers. The dataset contains 266 menu items across 9 categories, including Breakfast, Chicken & Fish, Desserts, and Beverages. The variables used in this project are both categorical and quantitative. The categorical variable is Category, which groups menu items by food type. The quantitative variables include Calories, Total Fat (g), Saturated Fat (g), Trans Fat (g), Cholesterol (mg), Sodium (mg), Carbohydrates (g), Dietary Fiber (g), Sugars (g), and Protein (g). The central questions I explore are: which nutritional components best predict calorie content, and how do calorie levels differ across menu categories? I chose this topic because I eat at McDonald’s frequently, and I wanted to better understand the nutritional value of the food I consume. Beyond my personal curiosity, this analysis has broader relevance: McDonald’s is one of the most visited fast food chains in the United States, and insights from this data could help other Americans make more informed decisions about their eating habits.

According to the Centers for Disease Control and Prevention, about 37% of adults in the United States consumed fast food on a given day, with consumption highest among younger age groups (CDC, 2018). This highlights the public health significance of understanding the nutritional content of fast food menus like McDonald’s.

Cleaning

# load the libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
library(readr)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.5.2
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.5.2
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.5.2
library(RColorBrewer)
# read the menu data (menu2_.csv is assumed to be in the working directory)
menu <- read_csv("menu2_.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 266 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Category, Item, Serving Size, Calories
## dbl (20): Calories from Fat, Total Fat, Total Fat (% Daily Value), Saturated...
## lgl  (1): Observ
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(menu)
## # A tibble: 6 × 25
##   Category  Item         `Serving Size` Calories `Calories from Fat` `Total Fat`
##   <chr>     <chr>        <chr>          <chr>                  <dbl>       <dbl>
## 1 Breakfast Egg McMuffin 4.8 oz (136 g) 300cal.                  120          13
## 2 Breakfast Egg White D… 4.8 oz (135 g) 250                       70           8
## 3 Breakfast Sausage McM… 3.9 oz (111 g) 370                      200          23
## 4 Breakfast Sausage McM… 5.7 oz (161 g) 450                      250          28
## 5 Breakfast Sausage McM… 5.7 oz (161 g) 400                      210          23
## 6 Breakfast Steak & Egg… 6.5 oz (185 g) 430                      210          23
## # ℹ 19 more variables: `Total Fat (% Daily Value)` <dbl>,
## #   `Saturated Fat` <dbl>, `Saturated Fat (% Daily Value)` <dbl>,
## #   `Trans Fat` <dbl>, Cholesterol <dbl>, `Cholesterol (% Daily Value)` <dbl>,
## #   Sodium <dbl>, `Sodium (% Daily Value)` <dbl>, Carbohydrates <dbl>,
## #   `Carbohydrates (% Daily Value)` <dbl>, `Dietary Fiber` <dbl>,
## #   `Dietary Fiber (% Daily Value)` <dbl>, Sugars <dbl>, Protein <dbl>,
## #   `Vitamin A (% Daily Value)` <dbl>, `Vitamin C (% Daily Value)` <dbl>, …
# cleaning
names(menu) <- tolower(names(menu))
names(menu) <- gsub(" ","_",names(menu))
names(menu) <- gsub("[(). //-]", "_", names(menu))
mcdonalds <- menu|>
  select(-observ)
head(mcdonalds)
## # A tibble: 6 × 24
##   category  item               serving_size calories calories_from_fat total_fat
##   <chr>     <chr>              <chr>        <chr>                <dbl>     <dbl>
## 1 Breakfast Egg McMuffin       4.8 oz (136… 300cal.                120        13
## 2 Breakfast Egg White Delight  4.8 oz (135… 250                     70         8
## 3 Breakfast Sausage McMuffin   3.9 oz (111… 370                    200        23
## 4 Breakfast Sausage McMuffin … 5.7 oz (161… 450                    250        28
## 5 Breakfast Sausage McMuffin … 5.7 oz (161… 400                    210        23
## 6 Breakfast Steak & Egg McMuf… 6.5 oz (185… 430                    210        23
## # ℹ 18 more variables: `total_fat__%_daily_value_` <dbl>, saturated_fat <dbl>,
## #   `saturated_fat__%_daily_value_` <dbl>, trans_fat <dbl>, cholesterol <dbl>,
## #   `cholesterol__%_daily_value_` <dbl>, sodium <dbl>,
## #   `sodium__%_daily_value_` <dbl>, carbohydrates <dbl>,
## #   `carbohydrates__%_daily_value_` <dbl>, dietary_fiber <dbl>,
## #   `dietary_fiber__%_daily_value_` <dbl>, sugars <dbl>, protein <dbl>,
## #   `vitamin_a__%_daily_value_` <dbl>, `vitamin_c__%_daily_value_` <dbl>, …
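# strip the "cal.", "cal", and "CAL" suffixes; fixed = TRUE treats "." as a literal dot
mcdonalds$calories <- gsub("cal.", "", mcdonalds$calories, fixed = TRUE)
mcdonalds$calories <- gsub("cal", "", mcdonalds$calories, fixed = TRUE)
mcdonalds$calories <- gsub("CAL", "", mcdonalds$calories, fixed = TRUE)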
head(mcdonalds)
## # A tibble: 6 × 24
##   category  item               serving_size calories calories_from_fat total_fat
##   <chr>     <chr>              <chr>        <chr>                <dbl>     <dbl>
## 1 Breakfast Egg McMuffin       4.8 oz (136… 300                    120        13
## 2 Breakfast Egg White Delight  4.8 oz (135… 250                     70         8
## 3 Breakfast Sausage McMuffin   3.9 oz (111… 370                    200        23
## 4 Breakfast Sausage McMuffin … 5.7 oz (161… 450                    250        28
## 5 Breakfast Sausage McMuffin … 5.7 oz (161… 400                    210        23
## 6 Breakfast Steak & Egg McMuf… 6.5 oz (185… 430                    210        23
## # ℹ 18 more variables: `total_fat__%_daily_value_` <dbl>, saturated_fat <dbl>,
## #   `saturated_fat__%_daily_value_` <dbl>, trans_fat <dbl>, cholesterol <dbl>,
## #   `cholesterol__%_daily_value_` <dbl>, sodium <dbl>,
## #   `sodium__%_daily_value_` <dbl>, carbohydrates <dbl>,
## #   `carbohydrates__%_daily_value_` <dbl>, dietary_fiber <dbl>,
## #   `dietary_fiber__%_daily_value_` <dbl>, sugars <dbl>, protein <dbl>,
## #   `vitamin_a__%_daily_value_` <dbl>, `vitamin_c__%_daily_value_` <dbl>, …
mcdonalds$calories <- as.numeric(mcdonalds$calories)
head(mcdonalds)
## # A tibble: 6 × 24
##   category  item               serving_size calories calories_from_fat total_fat
##   <chr>     <chr>              <chr>           <dbl>             <dbl>     <dbl>
## 1 Breakfast Egg McMuffin       4.8 oz (136…      300               120        13
## 2 Breakfast Egg White Delight  4.8 oz (135…      250                70         8
## 3 Breakfast Sausage McMuffin   3.9 oz (111…      370               200        23
## 4 Breakfast Sausage McMuffin … 5.7 oz (161…      450               250        28
## 5 Breakfast Sausage McMuffin … 5.7 oz (161…      400               210        23
## 6 Breakfast Steak & Egg McMuf… 6.5 oz (185…      430               210        23
## # ℹ 18 more variables: `total_fat__%_daily_value_` <dbl>, saturated_fat <dbl>,
## #   `saturated_fat__%_daily_value_` <dbl>, trans_fat <dbl>, cholesterol <dbl>,
## #   `cholesterol__%_daily_value_` <dbl>, sodium <dbl>,
## #   `sodium__%_daily_value_` <dbl>, carbohydrates <dbl>,
## #   `carbohydrates__%_daily_value_` <dbl>, dietary_fiber <dbl>,
## #   `dietary_fiber__%_daily_value_` <dbl>, sugars <dbl>, protein <dbl>,
## #   `vitamin_a__%_daily_value_` <dbl>, `vitamin_c__%_daily_value_` <dbl>, …
colSums(is.na(mcdonalds))
##                      category                          item 
##                             0                             0 
##                  serving_size                      calories 
##                             0                             3 
##             calories_from_fat                     total_fat 
##                             1                             2 
##     total_fat__%_daily_value_                 saturated_fat 
##                             2                             2 
## saturated_fat__%_daily_value_                     trans_fat 
##                             1                             2 
##                   cholesterol   cholesterol__%_daily_value_ 
##                             2                             1 
##                        sodium        sodium__%_daily_value_ 
##                             2                             1 
##                 carbohydrates carbohydrates__%_daily_value_ 
##                             3                             1 
##                 dietary_fiber dietary_fiber__%_daily_value_ 
##                             2                             0 
##                        sugars                       protein 
##                             0                             1 
##     vitamin_a__%_daily_value_     vitamin_c__%_daily_value_ 
##                             1                             2 
##       calcium__%_daily_value_          iron__%_daily_value_ 
##                             1                             1
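
The calories column mixes plain numbers with values such as 300cal., and three values are still NA after conversion. Below is a minimal sketch of one way to strip such suffixes in a single pass and then list the strings that fail to parse. The raw_calories vector is made up for illustration and is not taken from menu2_.csv.

```r
# Hypothetical raw values that mimic the mixed formats in the calories column
raw_calories <- c("300cal.", "250", "370 cal", "450CAL", "n/a")

# Remove an optional "cal" or "cal." suffix in any letter case, then trim spaces
cleaned <- trimws(gsub("cal\\.?", "", raw_calories, ignore.case = TRUE))

# as.numeric() returns NA (with a warning) for anything that is still not a number
parsed <- suppressWarnings(as.numeric(cleaned))

# Keep the original strings that failed to parse so they can be inspected
failed <- raw_calories[is.na(parsed)]

print(parsed)
print(failed)
```

Listing the failed strings before dropping them makes it possible to tell a genuinely missing value from a formatting problem that could still be repaired.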

Visualizations

Tableau Visualization:

https://public.tableau.com/shared/NZ79Z3YDG

This Tableau chart breaks down the nutrients in each McDonald’s menu item by category. What stood out to me most is how much sodium dominates almost every item. It towers over the other nutrients largely because it is measured in milligrams, while most of the other nutrients are measured in grams. Breakfast items like the Big Breakfast have the tallest bars overall, meaning they have the largest combined nutrient totals. You can use the category filter on the right to zoom in on specific sections of the menu.
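
Because sodium is recorded in milligrams, its raw numbers are roughly a thousand times larger than they would be in grams. Below is a minimal sketch of putting sodium on the same scale as the gram-based nutrients before comparing them. The column names and values are illustrative assumptions, not the dataset's own.

```r
# Illustrative values for a single menu item (hypothetical, not read from the data)
item <- data.frame(
  total_fat_g     = 13,
  carbohydrates_g = 31,
  protein_g       = 17,
  sodium_mg       = 750
)

# Convert milligrams to grams so every nutrient shares one unit
item$sodium_g <- item$sodium_mg / 1000

# Compare the gram-based columns side by side
print(item[, c("total_fat_g", "carbohydrates_g", "protein_g", "sodium_g")])
```

On a common scale, sodium becomes the smallest bar for this item rather than the largest, which suggests the dominance in the chart is mostly a unit effect.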

avg_calories <- mcdonalds |>
  filter(!is.na(calories)) |>
  group_by(category) |>
  summarize(avg_cal = round(mean(calories, na.rm = TRUE), 1)) |>
  arrange(desc(avg_cal))

hchart(avg_calories, "bar", hcaes(x = category, y = avg_cal)) |>
  hc_title(text = "Average Calories by McDonald's Menu Category") |>
  hc_xAxis(title = list(text = "Menu Category")) |>
  hc_yAxis(title = list(text = "Average Calories")) |>
  hc_tooltip(pointFormat = "Avg Calories: <b>{point.y}</b>") |>
  hc_colors("#c8102e") |>
  hc_caption(text = "Source: McDonald's USA Nutritional Facts") |>
  hc_add_theme(hc_theme_flat())

This interactive bar chart displays the average calorie count for each McDonald’s menu category. The Chicken & Fish and Beef & Pork categories carry the highest average calorie counts, which makes sense given that these items tend to be larger, protein-heavy entrees. On the lower end, Beverages and Salads have the lowest average calories, reflecting their lighter composition. This visualization is useful for consumers who want to quickly identify which sections of the menu to approach with caution when managing calorie intake.
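
A category average can hide how many items it is based on and how widely they vary. Below is a minimal base R sketch that reports the item count and calorie range next to each mean. The toy data frame is made up for illustration and does not reproduce the real menu.

```r
# Hypothetical toy data: a few made-up items in two categories
toy <- data.frame(
  category = c("Breakfast", "Breakfast", "Breakfast", "Beverages", "Beverages"),
  calories = c(300, 450, 1090, 0, 140)
)

# Per-category mean, range (max minus min), and item count
means  <- tapply(toy$calories, toy$category, mean)
ranges <- tapply(toy$calories, toy$category, function(x) max(x) - min(x))
counts <- table(toy$category)

# Combine into one summary table; all three results are ordered by category name
summary_by_cat <- data.frame(
  category = names(means),
  avg_cal  = round(as.vector(means), 1),
  range    = as.vector(ranges),
  n        = as.vector(counts)
)
print(summary_by_cat)
```

A wide range or a small count is a signal that the category average alone may not describe a typical item.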

mcdonalds |>
  filter(!is.na(calories), !is.na(total_fat)) |>
  ggplot(aes(x = total_fat, y = calories, color = category)) +
  geom_point(size = 2.5, alpha = 0.75) +
  scale_color_brewer(palette = "Set1") +
  theme_foundation() +
  labs(
    title = "Calories vs. Total Fat in McDonald's Menu Items",
    subtitle = "Each point represents one menu item, colored by menu category",
    x = "Total Fat (g)",
    y = "Calories",
    color = "Menu Category",
    caption = "Source: McDonald's USA Nutritional Facts"
  )

This scatter plot visualizes the relationship between total fat content and calorie count for every item on the McDonald’s menu, with each color representing a different menu category. A clear positive relationship is visible: as total fat increases, calories increase as well. Beef & Pork and Chicken & Fish items cluster toward the higher end of both axes, confirming they are among the highest-calorie, highest-fat items. Beverages and Coffee & Tea items cluster near the lower left, indicating lower fat and calorie content overall. The color coding by category makes it easy to spot patterns across menu sections at a glance.
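
The strength of this positive relationship can be summarized with a Pearson correlation. The sketch below uses only the six rows printed by head(mcdonalds) earlier in this report, so the value it produces is illustrative and is not the correlation for the full menu.

```r
# Calories and total fat for the six items shown in the head(mcdonalds) output
calories  <- c(300, 250, 370, 450, 400, 430)
total_fat <- c(13, 8, 23, 28, 23, 23)

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear trend
r <- cor(total_fat, calories)
print(round(r, 2))
```

Running cor() on the total_fat and calories columns of the cleaned data would give the corresponding figure for all items.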

Linear Model

# keep the numeric variables of interest and drop rows with any missing value
# (sugars has no missing values, so this keeps the same rows as filtering the other nine columns)
mcdonalds1 <- mcdonalds |>
  select(calories, total_fat, saturated_fat, trans_fat, sugars, sodium,
         cholesterol, carbohydrates, dietary_fiber, protein) |>
  drop_na()
head(mcdonalds1)
## # A tibble: 6 × 10
##   calories total_fat saturated_fat trans_fat sugars sodium cholesterol
##      <dbl>     <dbl>         <dbl>     <dbl>  <dbl>  <dbl>       <dbl>
## 1      300        13             5         0      3    750         260
## 2      370        23             8         0      2    780          45
## 3      450        28            10         0      2    860         285
## 4      400        23             8         0      2    880          50
## 5      430        23             9         1      3    960         300
## 6      460        26            13         0      3   1300         250
## # ℹ 3 more variables: carbohydrates <dbl>, dietary_fiber <dbl>, protein <dbl>
# correlation plot to see which variables are most strongly related to calories
library(DataExplorer)
plot_correlation(mcdonalds1)

# multiple linear regression
multiple_model <- lm(
  calories ~ total_fat + carbohydrates + protein + dietary_fiber + sodium + sugars + cholesterol,
  data = mcdonalds1
)

summary(multiple_model)
## 
## Call:
## lm(formula = calories ~ total_fat + carbohydrates + protein + 
##     dietary_fiber + sodium + sugars + cholesterol, data = mcdonalds1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.230  -4.097   0.218   3.150 192.292 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.6033480  1.8168448  -0.882    0.378    
## total_fat      8.5798042  0.1483025  57.853   <2e-16 ***
## carbohydrates  4.1830134  0.1228323  34.055   <2e-16 ***
## protein        4.2604630  0.1825505  23.339   <2e-16 ***
## dietary_fiber -0.4348191  0.9015951  -0.482    0.630    
## sodium        -0.0008306  0.0057083  -0.146    0.884    
## sugars        -0.1719959  0.1273483  -1.351    0.178    
## cholesterol    0.0087130  0.0135902   0.641    0.522    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.72 on 245 degrees of freedom
## Multiple R-squared:  0.9969, Adjusted R-squared:  0.9968 
## F-statistic: 1.125e+04 on 7 and 245 DF,  p-value: < 2.2e-16

Model equation: Calories = -1.60 + 8.58(Total Fat) + 4.18(Carbohydrates) + 4.26(Protein) - 0.43(Dietary Fiber) - 0.0008(Sodium) - 0.17(Sugars) + 0.009(Cholesterol)

This multiple linear regression model predicts calorie content across McDonald’s menu items using seven nutritional predictors. The adjusted R² of 0.9968 indicates that the model explains about 99.68% of the variation in calories, an exceptionally strong fit. Of the seven predictors, only three are statistically significant: total fat, carbohydrates, and protein. The other four (dietary fiber, sodium, sugars, and cholesterol) have p-values between 0.178 and 0.884, all above 0.05.
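
One likely reason for the near-perfect fit is that the three significant coefficients sit close to the standard Atwater factors used on nutrition labels: roughly 9 calories per gram of fat and 4 per gram of carbohydrate or protein. Below is a minimal sketch comparing the two. The coefficients are copied from the summary output above, and the example item is hypothetical.

```r
# Coefficients from the regression summary, next to the standard Atwater factors
fitted_coef <- c(total_fat = 8.5798, carbohydrates = 4.1830, protein = 4.2605)
atwater     <- c(total_fat = 9,      carbohydrates = 4,      protein = 4)

# Percentage difference of each fitted coefficient from its Atwater factor
pct_diff <- round(100 * (fitted_coef - atwater) / atwater, 1)
print(pct_diff)

# Hypothetical item (grams of each macronutrient), not taken from the dataset
item <- c(total_fat = 20, carbohydrates = 40, protein = 15)

# Calories implied by each set of per-gram values (intercept and other terms ignored)
cal_fitted  <- sum(fitted_coef * item)
cal_atwater <- sum(atwater * item)
print(c(fitted = cal_fitted, atwater = cal_atwater))
```

If calories are largely computed from these three macronutrients in the first place, the regression is mostly recovering that arithmetic, which would also explain why the remaining predictors add little.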

# check model assumptions with the standard diagnostic plots
plot(multiple_model)

The diagnostic plots largely support the model’s validity. The Residuals vs Fitted plot shows mostly random scatter, suggesting the linearity assumption holds, though one outlier is visible: the largest residual is about 192 calories, far above the residual standard error of 13.72. The Q-Q plot indicates the residuals are approximately normally distributed, with a deviation at the upper tail due to that same outlier.
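
An outlier seen in the diagnostic plots can be located programmatically by its standardized residual. Below is a minimal sketch on simulated data, with one outlier injected on purpose at position 12. The data and the position are assumptions for illustration and say nothing about which menu item is the outlier in the real model.

```r
set.seed(1)  # make the simulated noise reproducible

# Simulated data (hypothetical): a clean linear trend plus one injected outlier
x <- 1:30
y <- 2 * x + rnorm(30)
y[12] <- y[12] + 50

fit <- lm(y ~ x)

# Standardized residuals put every observation on a comparable scale
std_res <- rstandard(fit)

# Position of the observation with the largest absolute standardized residual
worst <- unname(which.max(abs(std_res)))
print(worst)
print(round(std_res[worst], 1))
```

Applying rstandard() and which.max() to multiple_model in the same way would return the row of mcdonalds1 behind the large residual, which could then be checked for a data entry problem.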

Conclusion

This project analyzed the nutritional composition of McDonald’s menu items using multiple linear regression. The regression model showed that total fat, carbohydrates, and protein are the only statistically significant predictors of calorie content, in a model that explains approximately 99.68% of the variation in calories. The bar chart and scatter plot add visual context, showing that the Beef & Pork and Chicken & Fish categories rank among the highest in both calories and fat content, while Beverages and Salads sit at the lower end.

One surprising pattern was how dominant sodium appeared in the Tableau visualization relative to other nutrients. If given more time, I would have liked to explore changes in McDonald’s nutritional content over time, or compare McDonald’s data to other major fast food chains to provide broader context for the findings.

References:

Centers for Disease Control and Prevention. (2018). FastStats: Obesity and overweight. U.S. Department of Health & Human Services. https://www.cdc.gov/nchs/fastats/obesity-overweight.htm

Image: https://time.com/4084668/mcdonalds-rebranding-sales-growth/