library(treemap)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(readr)
library(dplyr)
library(RColorBrewer)
How do calorie counts in chicken salads differ across fast-food restaurants? To address this question, I will analyze a dataset that compiles nutritional information from multiple fast-food chains. The key columns I’ll focus on are “restaurant,” “item,” and “calories.” By specifically examining chicken salads, I aim to uncover patterns in their calorie content, which could help diners make more informed choices about their meals.
This dataset of fast-food menus is sourced from Awesome Public Datasets, an extensive online compilation (https://github.com/awesomedata/awesome-public-datasets). By investigating the mean calorie content of chicken salads across different fast-food restaurants, I hope to provide insights into the variations that exist, highlighting which options might be healthier or more indulgent.
To address the question of how calorie counts in chicken salads differ across fast-food restaurants, I will conduct a descriptive statistical analysis of the dataset. This analysis will include calculating key metrics for each restaurant: the mean, minimum, and maximum calorie counts. These statistics will provide insight into the variability of the calorie content, helping to identify which restaurants offer more or fewer calories in their chicken salad options.
In terms of visualization, I plan to create a bar chart. The chart will display the mean calorie count for chicken salads at each restaurant, making it easier to compare the average calorie content visually. Together, these analyses and visualizations will provide an overview of the calorie content in chicken salads, helping diners make more informed choices about their meals.
setwd("C:/Users/eyong/OneDrive - montgomerycollege.edu/Desktop/Data 101/00_all_csv_files/csv/fastfood")
getwd()
## [1] "C:/Users/eyong/OneDrive - montgomerycollege.edu/Desktop/Data 101/00_all_csv_files/csv/fastfood"
fast <- read_csv("fastfood.csv")
## Rows: 515 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): restaurant, item, salad
## dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
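# Keep only the three columns needed for the analysis
df_filtered <- fast[, c("restaurant", "item", "calories")]
# Keep rows where any column contains "salad" (case-insensitive)
salad_items <- df_filtered[apply(df_filtered, 1, function(row) any(grepl("salad", row, ignore.case = TRUE))), ]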
salad_items
## # A tibble: 64 × 3
## restaurant item calories
## <chr> <chr> <dbl>
## 1 Mcdonalds Premium Asian Salad w/o Chicken 140
## 2 Mcdonalds Premium Asian Salad w/ Grilled Chicken 270
## 3 Mcdonalds Premium Asian Salad w/ Crispy Chicken 490
## 4 Mcdonalds Premium Bacon Ranch Salad w/o Chicken 190
## 5 Mcdonalds Premium Bacon Ranch Salad w/ Grilled Chicken 320
## 6 Mcdonalds Premium Bacon Ranch Salad w/ Crispy Chicken 490
## 7 Mcdonalds Premium Southwest Salad w/o Chicken 220
## 8 Mcdonalds Premium Southwest Salad w/ Grilled Chicken 350
## 9 Mcdonalds Premium Southwest Salad w/ Crispy Chicken 520
## 10 Chick Fil-A Chicken Salad Sandwich 490
## # ℹ 54 more rows
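# Keep rows that mention both "salad" and "chicken" (case-insensitive).
# Note: this also matches items such as "Premium Asian Salad w/o Chicken".
chicken_salads <- df_filtered[apply(df_filtered, 1, function(row) {
  any(grepl("salad", row, ignore.case = TRUE)) && any(grepl("chicken", row, ignore.case = TRUE))
}), ]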
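Because the filter above matches the word "chicken" anywhere in a row, items such as "Premium Asian Salad w/o Chicken" and "Chicken Salad Sandwich" also pass it. The following is a minimal sketch of a stricter filter, assuming that excluding "w/o Chicken" and sandwich items is the desired behavior. The toy tibble `toy_items` and the name `strict_chicken_salads` are hypothetical, and the summary tables in this report were computed with the original filter, not this one.

```r
library(dplyr)
library(stringr)

# Toy rows copied from the salad listing above (not the full dataset)
toy_items <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Mcdonalds", "Chick Fil-A"),
  item = c(
    "Premium Asian Salad w/o Chicken",
    "Premium Asian Salad w/ Grilled Chicken",
    "Premium Asian Salad w/ Crispy Chicken",
    "Chicken Salad Sandwich"
  ),
  calories = c(140, 270, 490, 490)
)

# Match on the item name only, then drop "w/o Chicken" and sandwich items
strict_chicken_salads <- toy_items %>%
  filter(
    str_detect(item, regex("salad", ignore_case = TRUE)),
    str_detect(item, regex("chicken", ignore_case = TRUE)),
    !str_detect(item, regex("w/o chicken|sandwich", ignore_case = TRUE))
  )

strict_chicken_salads
```

Applying a filter like this to the full dataset would likely change the per-restaurant minimums and means, so the results would need to be recomputed before drawing conclusions.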
mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarise(mean_calories = mean(calories, na.rm = TRUE), min_cal = min(calories))
mean_calories_by_restaurant
## # A tibble: 7 × 3
## restaurant mean_calories min_cal
## <chr> <dbl> <dbl>
## 1 Arbys 660 430
## 2 Burger King 578 320
## 3 Chick Fil-A 490 490
## 4 Dairy Queen 332. 150
## 5 Mcdonalds 332. 140
## 6 Subway 286 140
## 7 Taco Bell 720 720
mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarize(mean_calories = mean(calories, na.rm = TRUE)) %>%
  arrange(desc(mean_calories))
mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarize(
    mean_calories = mean(calories, na.rm = TRUE),
    min_cal = min(calories, na.rm = TRUE),
    max_cal = max(calories, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_calories))
mean_calories_by_restaurant
## # A tibble: 7 × 4
## restaurant mean_calories min_cal max_cal
## <chr> <dbl> <dbl> <dbl>
## 1 Taco Bell 720 720 720
## 2 Arbys 660 430 840
## 3 Burger King 578 320 720
## 4 Chick Fil-A 490 490 490
## 5 Dairy Queen 332. 150 520
## 6 Mcdonalds 332. 140 520
## 7 Subway 286 140 510
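The summary above could also report the median and the number of items behind each mean, which would show at a glance when a restaurant's average rests on a single salad. The following is a sketch on toy data only: the column names `median_calories` and `n_items` are hypothetical additions, and the calorie values are illustrative rather than the full dataset.

```r
library(dplyr)

# Illustrative values: three items for one restaurant, a single item for another
toy_salads <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Mcdonalds", "Taco Bell"),
  calories = c(270, 490, 320, 720)
)

toy_summary <- toy_salads %>%
  group_by(restaurant) %>%
  summarize(
    mean_calories = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    n_items = n()  # number of items contributing to each mean
  ) %>%
  arrange(desc(mean_calories))

toy_summary
```

A row with `n_items` equal to 1 signals that the mean, median, minimum, and maximum are all the same single value.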
# Bars show the mean; points and labels mark each restaurant's minimum and maximum
ggplot(mean_calories_by_restaurant, aes(x = reorder(restaurant, -mean_calories), y = mean_calories, fill = mean_calories)) +
  geom_col() +
  # shape 21 is a filled circle, so the fill color applies to both sets of points
  geom_point(aes(y = min_cal), color = "black", size = 3, shape = 21, fill = "gray") +
  geom_point(aes(y = max_cal), color = "black", size = 3, shape = 21, fill = "lightgray") +
  geom_text(aes(y = min_cal, label = min_cal), vjust = -1, color = "black", size = 3) +
  geom_text(aes(y = max_cal, label = max_cal), vjust = -1, color = "black", size = 3) +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(
    title = "Mean Calories in Chicken Salads by Restaurant",
    x = "Restaurant",
    y = "Mean Calories",
    fill = "Mean Calories"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, family = "serif"))
The analysis shows that Taco Bell’s mean calorie count is driven by its limited salad options: it appears to have only one salad with chicken, since its minimum, mean, and maximum are all 720 calories. The same holds for Chick Fil-A at 490 calories. A mean based on a single item says little about a restaurant’s typical chicken salad, and placing Taco Bell at the top of the chart obscures the broader calorie distribution among the restaurants in the dataset. The bar graph illustrates the substantial disparities in mean calories, which range from 286 at Subway to 720 at Taco Bell, and it underscores the importance of context when interpreting these averages.
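One way to keep single-item restaurants from being misread is to plot every item alongside the mean, so the number of salads behind each average is visible. The following is a rough sketch on illustrative toy data, not the full dataset; the object names `toy_salads` and `item_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Illustrative values: three items for one restaurant, a single item for another
toy_salads <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Mcdonalds", "Taco Bell"),
  calories = c(270, 490, 320, 720)
)

item_plot <- ggplot(toy_salads, aes(x = restaurant, y = calories)) +
  # One gray point per menu item
  geom_point(size = 3, alpha = 0.7, color = "gray30") +
  # Red diamond marks the mean of each restaurant's items
  stat_summary(fun = mean, geom = "point", shape = 18, size = 5, color = "red") +
  labs(
    title = "Calories per Chicken Salad Item (toy data)",
    x = "Restaurant",
    y = "Calories"
  ) +
  theme_minimal()

item_plot
```

In a plot like this, a restaurant with one item shows a single point sitting directly under its mean marker, which makes the thin evidence behind that average obvious.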
These findings have important implications for both consumers and researchers. Relying solely on average calorie counts can lead to misconceptions about the nutritional value of menu items across different restaurants. For consumers aiming for healthier choices, understanding that a few outliers can distort average values is crucial. This awareness can guide better decision-making and foster a more nuanced view of restaurant offerings.
Looking ahead, there are several avenues for future research that could enhance our understanding of restaurant nutrition. Expanding the dataset to include a wider variety of salad options across various chains would provide a clearer picture of average calorie counts. Additionally, analyzing other nutritional aspects such as fat, sugar, and fiber content could create a more comprehensive framework for evaluating meal healthiness. Finally, studies examining consumer perceptions of calorie information and its influence on food choices would be valuable for informing public health initiatives aimed at promoting healthier eating habits.
Awesome Public Datasets https://github.com/awesomedata/awesome-public-datasets