Overall, the dataset contains nutritional information for menu items at fast food restaurants. Its variables include sodium, cholesterol, protein, calories, and more. For my analysis I only need the variables sodium and restaurant. The dataset consists of 515 observations and 17 variables and was accessed through OpenIntro. Link: https://www.openintro.org/data/index.php?data=fastfood The data covers 8 fast food restaurants: Arby’s, Burger King, Chick-fil-A, Dairy Queen, McDonald’s, Sonic, Subway, and Taco Bell.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
setwd("C:/Users/tonge/Downloads")
fastfood <- read_csv("fastfood.csv")
## Rows: 515 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): restaurant, item, salad
## dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I will analyze the sodium levels of each restaurant to see which one is highest. To address my research question, I will create a chart that is grouped by restaurant, with each restaurant color coded. Sodium will be the y value, so one can see the differences across restaurants.
head(fastfood)
## # A tibble: 6 × 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Artisan G… 380 60 7 2 0 95
## 2 Mcdonalds Single Ba… 840 410 45 17 1.5 130
## 3 Mcdonalds Double Ba… 1130 600 67 27 3 220
## 4 Mcdonalds Grilled B… 750 280 31 10 0.5 155
## 5 Mcdonalds Crispy Ba… 920 410 45 12 0.5 120
## 6 Mcdonalds Big Mac 540 250 28 10 1 80
## # ℹ 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>, sugar <dbl>,
## # protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>
str(fastfood)
## spc_tbl_ [515 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ restaurant : chr [1:515] "Mcdonalds" "Mcdonalds" "Mcdonalds" "Mcdonalds" ...
## $ item : chr [1:515] "Artisan Grilled Chicken Sandwich" "Single Bacon Smokehouse Burger" "Double Bacon Smokehouse Burger" "Grilled Bacon Smokehouse Chicken Sandwich" ...
## $ calories : num [1:515] 380 840 1130 750 920 540 300 510 430 770 ...
## $ cal_fat : num [1:515] 60 410 600 280 410 250 100 210 190 400 ...
## $ total_fat : num [1:515] 7 45 67 31 45 28 12 24 21 45 ...
## $ sat_fat : num [1:515] 2 17 27 10 12 10 5 4 11 21 ...
## $ trans_fat : num [1:515] 0 1.5 3 0.5 0.5 1 0.5 0 1 2.5 ...
## $ cholesterol: num [1:515] 95 130 220 155 120 80 40 65 85 175 ...
## $ sodium : num [1:515] 1110 1580 1920 1940 1980 950 680 1040 1040 1290 ...
## $ total_carb : num [1:515] 44 62 63 62 81 46 33 49 35 42 ...
## $ fiber : num [1:515] 3 2 3 2 4 3 2 3 2 3 ...
## $ sugar : num [1:515] 11 18 18 18 18 9 7 6 7 10 ...
## $ protein : num [1:515] 37 46 70 55 46 25 15 25 25 51 ...
## $ vit_a : num [1:515] 4 6 10 6 6 10 10 0 20 20 ...
## $ vit_c : num [1:515] 20 20 20 25 20 2 2 4 4 6 ...
## $ calcium : num [1:515] 20 20 50 20 20 15 10 2 15 20 ...
## $ salad : chr [1:515] "Other" "Other" "Other" "Other" ...
## - attr(*, "spec")=
## .. cols(
## .. restaurant = col_character(),
## .. item = col_character(),
## .. calories = col_double(),
## .. cal_fat = col_double(),
## .. total_fat = col_double(),
## .. sat_fat = col_double(),
## .. trans_fat = col_double(),
## .. cholesterol = col_double(),
## .. sodium = col_double(),
## .. total_carb = col_double(),
## .. fiber = col_double(),
## .. sugar = col_double(),
## .. protein = col_double(),
## .. vit_a = col_double(),
## .. vit_c = col_double(),
## .. calcium = col_double(),
## .. salad = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Checking for NAs: there are none in the two variables I need (restaurant and sodium).
colSums(is.na(fastfood))
## restaurant item calories cal_fat total_fat sat_fat
## 0 0 0 0 0 0
## trans_fat cholesterol sodium total_carb fiber sugar
## 0 0 0 0 12 0
## protein vit_a vit_c calcium salad
## 1 214 210 210 0
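Since only restaurant and sodium are used later, the check can be narrowed to those two columns. The following is a minimal sketch on a small made-up data frame; the object `toy` and its values are hypothetical stand-ins, not rows from fastfood.csv.

```r
# Hypothetical stand-in for the fastfood data; values are illustrative only
toy <- data.frame(
  restaurant = c("A", "A", "B", "B"),
  sodium     = c(900, 1200, 1500, 700),
  fiber      = c(3, NA, 2, NA),  # a column with NAs that the analysis does not use
  stringsAsFactors = FALSE
)

# Count NAs only in the columns the analysis needs
needed <- c("restaurant", "sodium")
na_needed <- colSums(is.na(toy[, needed]))
na_needed

# TRUE when every needed column is complete
all(na_needed == 0)
```

With the real data, the same idea would be `colSums(is.na(fastfood[, c("restaurant", "sodium")]))`.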
This interactive visualization draws one line per restaurant and shows that sodium fluctuates from item to item at every restaurant. However, McDonald’s has the highest sodium spikes. The code used is from my Data 110 notes (Maliha, 2026).
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
library(RColorBrewer)
library(dplyr)
library(tidyverse)
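# Set1 supplies up to 9 colors; request 8, one per restaurant
cols <- brewer.pal(8, "Set1")
highchart() |>
  hc_add_series(data = fastfood,
                type = "line",
                hcaes(y = sodium,
                      group = restaurant)) |>
  hc_colors(cols) |>
  # No x variable is mapped, so the x-axis is each item's position within its restaurant
  hc_xAxis(title = list(text = "Menu item (index within restaurant)")) |>
  hc_yAxis(title = list(text = "Sodium (mg)"))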
Summary of sodium (mg) by fast food restaurant. Here we can see the mean for each restaurant. The highest average is seen at Arby’s (1515.273), followed by McDonald’s (1437.895).
fastfood_means <- fastfood |>
group_by(restaurant) |>
summarise(
mean_sodium = mean(sodium, na.rm = TRUE),
median_sodium = median(sodium, na.rm = TRUE),
sd_sodium = sd(sodium, na.rm = TRUE),
min_sodium = min(sodium, na.rm = TRUE),
max_sodium = max(sodium, na.rm = TRUE))
fastfood_means
## # A tibble: 8 × 6
## restaurant mean_sodium median_sodium sd_sodium min_sodium max_sodium
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Arbys 1515. 1480 664. 100 3350
## 2 Burger King 1224. 1150 500. 310 2310
## 3 Chick Fil-A 1151. 1000 727. 220 3660
## 4 Dairy Queen 1182. 1030 610. 15 3500
## 5 Mcdonalds 1438. 1120 1036. 20 6080
## 6 Sonic 1351. 1250 665. 470 4520
## 7 Subway 1273. 1130 744. 65 3540
## 8 Taco Bell 1014. 960 474. 290 2260
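To make the ranking easier to read, the summary can be sorted. The sketch below re-enters the means and maxima from the printed table above (rounded as displayed) in a small data frame; the object names `sodium_summary`, `by_mean`, `top_mean`, and `top_max` are my own and not part of the original analysis.

```r
# Means and maxima copied from the printed summary table (rounded as displayed)
sodium_summary <- data.frame(
  restaurant  = c("Arbys", "Burger King", "Chick Fil-A", "Dairy Queen",
                  "Mcdonalds", "Sonic", "Subway", "Taco Bell"),
  mean_sodium = c(1515, 1224, 1151, 1182, 1438, 1351, 1273, 1014),
  max_sodium  = c(3350, 2310, 3660, 3500, 6080, 4520, 3540, 2260),
  stringsAsFactors = FALSE
)

# Restaurants ordered from highest to lowest mean sodium
by_mean <- sodium_summary[order(-sodium_summary$mean_sodium), ]
by_mean

# Restaurant with the highest mean, and restaurant with the single highest item
top_mean <- by_mean$restaurant[1]
top_max  <- sodium_summary$restaurant[which.max(sodium_summary$max_sodium)]
c(top_mean = top_mean, top_max = top_max)
```

On the real tibble, `arrange(fastfood_means, desc(mean_sodium))` should give the same ordering. Note that the restaurant with the highest mean and the restaurant with the single highest item are not the same.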
Hypothesis:
\(H_0\): \(\mu_A = \mu_B = \mu_C = \mu_D = \mu_E = \mu_F = \mu_G = \mu_H\)
\(H_a\): not all \(\mu_i\) are equal
Here each \(\mu_i\) is the mean sodium (mg) of one restaurant’s menu items. A one-way ANOVA tests whether mean sodium levels are equal across the 8 fast food restaurants.
anova_result <- aov(sodium ~ restaurant, data = fastfood)
anova_result
## Call:
## aov(formula = sodium ~ restaurant, data = fastfood)
##
## Terms:
## restaurant Residuals
## Sum of Squares 13382025 231300945
## Deg. of Freedom 7 507
##
## Residual standard error: 675.4368
## Estimated effects may be unbalanced
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## restaurant 7 13382025 1911718 4.19 0.000167 ***
## Residuals 507 231300945 456215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
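The numbers in the ANOVA table can be checked by hand: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below uses only the sums of squares and degrees of freedom printed above; the variable names are my own.

```r
# Values copied from the ANOVA table
ss_between <- 13382025   # Sum Sq for restaurant
df_between <- 7          # 8 restaurants - 1
ss_within  <- 231300945  # Sum Sq for residuals
df_within  <- 507        # 515 observations - 8 restaurants

# Mean squares and F statistic
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(ms_between = ms_between, ms_within = ms_within, F = f_stat), 2)
p_value
```

The mean squares and F should match the table up to rounding.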
Interpretation: The p-value (0.000167) is far below alpha = 0.05, which is strong evidence against the null hypothesis. The test suggests that mean sodium is not the same across all 8 fast food restaurants. It does not by itself identify which restaurants differ.
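One possible follow-up for finding which restaurants differ is Tukey's HSD, which is available in base R as `TukeyHSD()`. This was not part of the original analysis. The sketch below runs it on simulated stand-in data; the data frame `sim`, the restaurant labels, the means, and the seed are all assumptions chosen for illustration.

```r
set.seed(110)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in data: three hypothetical restaurants, 30 items each
sim <- data.frame(
  restaurant = factor(rep(c("R1", "R2", "R3"), each = 30)),
  sodium     = c(rnorm(30, mean = 1000, sd = 300),
                 rnorm(30, mean = 1200, sd = 300),
                 rnorm(30, mean = 1500, sd = 300))
)

# One-way ANOVA on the simulated data
sim_aov <- aov(sodium ~ restaurant, data = sim)

# Tukey's HSD: every pairwise difference in means with an adjusted p-value
tukey <- TukeyHSD(sim_aov)
tukey$restaurant
```

On the real data, something like `TukeyHSD(anova_result)` would give one row for each of the 28 restaurant pairs (restaurant may need converting to a factor first).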
The ANOVA test resulted in a statistically significant p-value at alpha = 0.05, so there is evidence that mean sodium levels differ among the fast food restaurants. The summary table shows that Arby’s has the highest mean sodium (1515.273 mg), while McDonald’s has the second-highest mean (1437.895 mg) and the single highest value (6080 mg), which matches the spikes seen in the visualization. For consumers trying to stay healthy, this can help them decide which restaurants or menu items to limit, especially at Arby’s and McDonald’s. Health experts could also use these results when advising people to cut back on fast food, since avoiding high-sodium foods can help prevent health conditions associated with high sodium intake, such as cardiovascular disease and hypertension.
In the future, I could conduct ANOVA to compare sodium levels across menu item categories such as burgers, sandwiches, tacos, and chicken tenders, and test which categories have the highest sodium levels. Additionally, I could fit a linear regression model to test whether sodium increases as calories increase.
Maliha, M. (2026). Working with Continuous Variables with DS Labs and HighCharter [Class notes]. Montgomery College. DATA 110.
OpenIntro. (n.d.). fastfood [Data set]. https://www.openintro.org/data/index.php?data=fastfood