library(treemap)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(readr)
library(dplyr)
library(RColorBrewer)

Introduction

How do calorie counts in chicken salads differ across various restaurants? To address this question, I will analyze a dataset that compiles nutritional information from multiple fast-food restaurants. The key columns I’ll focus on are restaurant, item, and calories. By specifically examining chicken salads, I aim to uncover patterns in their calorie content, which could help diners make more informed choices about their meals.

This dataset, an extensive compilation of fast-food menus, is sourced from Awesome Public Datasets (https://github.com/awesomedata/awesome-public-datasets). By comparing the mean calorie content of chicken salads across different fast-food restaurants, I hope to show how much the options vary and which ones are lighter or more indulgent.

Data Analysis and Visualization Plan

To address the question of how calorie counts in chicken salads differ across fast-food restaurants, I will conduct a descriptive statistical analysis of the dataset. This analysis will include calculating key metrics such as the mean and median of the calorie counts, along with the minimum and maximum. These statistics will provide insight into the variability of the calorie content, helping to identify which restaurants offer more or fewer calories in their chicken salad options.

In terms of visualization, I plan to create a bar chart. It will display the mean calorie count for chicken salads at each restaurant, making it easier to compare the average calorie content visually. Together, these analyses and visualizations will provide an overview of the calorie content in chicken salads, helping diners make more informed choices about their meals.

setwd("C:/Users/eyong/OneDrive - montgomerycollege.edu/Desktop/Data 101/00_all_csv_files/csv/fastfood")
getwd()
## [1] "C:/Users/eyong/OneDrive - montgomerycollege.edu/Desktop/Data 101/00_all_csv_files/csv/fastfood"
fast <- read_csv("fastfood.csv")
## Rows: 515 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): restaurant, item, salad
## dbl (14): calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sod...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
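
The absolute path passed to setwd() exists only on one machine, and the message above suggests declaring column types. The following is a minimal sketch of a more portable read, under assumptions: `data_dir` and the file name `fastfood_demo.csv` are hypothetical, and the three rows are a stand-in copied from this report's output rather than the full 515-row file.

```r
library(readr)

# Hypothetical data directory; in the real project this would be the
# folder that holds fastfood.csv.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "fastfood_demo.csv")

# Write a tiny stand-in file so the sketch runs anywhere.
write_lines(c(
  "restaurant,item,calories",
  "Mcdonalds,Premium Asian Salad w/o Chicken,140",
  "Mcdonalds,Premium Asian Salad w/ Grilled Chicken,270",
  "Chick Fil-A,Chicken Salad Sandwich,490"
), csv_path)

# Declaring the column types up front also silences the column
# specification message.
demo <- read_csv(csv_path, col_types = cols(
  restaurant = col_character(),
  item = col_character(),
  calories = col_double()
))
```

Building the path with file.path() instead of calling setwd() keeps the script independent of any one user's folder layout.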

Keeping only the restaurant, item, and calories columns

df_filtered <- fast[, c("restaurant", "item", "calories")]

Finding only the items that are salads

salad_items <- df_filtered[apply(df_filtered, 1, function(row) any(grepl("salad", row, ignore.case = TRUE))), ]
salad_items
## # A tibble: 64 × 3
##    restaurant  item                                         calories
##    <chr>       <chr>                                           <dbl>
##  1 Mcdonalds   Premium Asian Salad w/o Chicken                   140
##  2 Mcdonalds   Premium Asian Salad w/ Grilled Chicken            270
##  3 Mcdonalds   Premium Asian Salad w/ Crispy Chicken             490
##  4 Mcdonalds   Premium Bacon Ranch Salad w/o Chicken             190
##  5 Mcdonalds   Premium Bacon Ranch Salad w/ Grilled Chicken      320
##  6 Mcdonalds   Premium Bacon Ranch Salad w/ Crispy Chicken       490
##  7 Mcdonalds   Premium Southwest Salad w/o Chicken               220
##  8 Mcdonalds   Premium Southwest Salad w/ Grilled Chicken        350
##  9 Mcdonalds   Premium Southwest Salad w/ Crispy Chicken         520
## 10 Chick Fil-A Chicken Salad Sandwich                            490
## # ℹ 54 more rows
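
The row-wise apply() above searches every column of each row, including the restaurant name and the calorie count. As a hedged alternative, the sketch below searches only the item column with stringr. The first three rows come from the output above; the "Demo Diner" row is hypothetical and only there to show a non-match.

```r
library(dplyr)
library(stringr)
library(tibble)

demo <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Chick Fil-A", "Demo Diner"),
  item = c(
    "Premium Asian Salad w/o Chicken",
    "Premium Asian Salad w/ Grilled Chicken",
    "Chicken Salad Sandwich",
    "Cheeseburger"  # hypothetical non-salad item
  ),
  calories = c(140, 270, 490, 500)
)

# Keep rows whose item name contains "salad", ignoring case.
salad_demo <- demo %>%
  filter(str_detect(item, regex("salad", ignore_case = TRUE)))
```

Restricting the search to one column makes the intent explicit and avoids accidental matches in other columns.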

Salads with chicken in them

chicken_salads <- df_filtered[apply(df_filtered, 1, function(row) {
  any(grepl("salad", row, ignore.case = TRUE)) && any(grepl("chicken", row, ignore.case = TRUE))
}), ]
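
One caveat about this filter: the pattern "chicken" also matches items labelled "w/o Chicken", such as "Premium Asian Salad w/o Chicken" (140 calories), and it keeps "Chicken Salad Sandwich", which is a sandwich rather than a salad. The sketch below shows a stricter filter on a few rows taken from the salad listing. It assumes that "w/o" is how the menu marks items without chicken and that sandwiches should be excluded; both are analytic choices, and the results in this report were computed with the original filter.

```r
library(dplyr)
library(stringr)
library(tibble)

demo <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Mcdonalds", "Chick Fil-A"),
  item = c(
    "Premium Asian Salad w/o Chicken",
    "Premium Asian Salad w/ Grilled Chicken",
    "Premium Asian Salad w/ Crispy Chicken",
    "Chicken Salad Sandwich"
  ),
  calories = c(140, 270, 490, 490)
)

# Assumption: "w/o chicken" marks a salad served without chicken,
# and sandwiches are not salads.
chicken_salads_strict <- demo %>%
  filter(
    str_detect(item, regex("salad", ignore_case = TRUE)),
    str_detect(item, regex("chicken", ignore_case = TRUE)),
    !str_detect(item, regex("w/o\\s+chicken", ignore_case = TRUE)),
    !str_detect(item, regex("sandwich", ignore_case = TRUE))
  )
```

With this stricter filter, only the grilled and crispy chicken salads remain in the demo data.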

Finding the mean and minimum calories of chicken salads per restaurant

mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarise(mean_calories = mean(calories, na.rm = TRUE), min_cal = min(calories))
mean_calories_by_restaurant
## # A tibble: 7 × 3
##   restaurant  mean_calories min_cal
##   <chr>               <dbl>   <dbl>
## 1 Arbys                660      430
## 2 Burger King          578      320
## 3 Chick Fil-A          490      490
## 4 Dairy Queen          332.     150
## 5 Mcdonalds            332.     140
## 6 Subway               286      140
## 7 Taco Bell            720      720

Mean calories for chicken salads, sorted from highest to lowest

mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarize(mean_calories = mean(calories, na.rm = TRUE)) %>%
  arrange(desc(mean_calories))

Adding the minimum and maximum calories of chicken salads per restaurant

mean_calories_by_restaurant <- chicken_salads %>%
  group_by(restaurant) %>%
  summarize(
    mean_calories = mean(calories, na.rm = TRUE),
    min_cal = min(calories, na.rm = TRUE),
    max_cal = max(calories, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_calories))
mean_calories_by_restaurant
## # A tibble: 7 × 4
##   restaurant  mean_calories min_cal max_cal
##   <chr>               <dbl>   <dbl>   <dbl>
## 1 Taco Bell            720      720     720
## 2 Arbys                660      430     840
## 3 Burger King          578      320     720
## 4 Chick Fil-A          490      490     490
## 5 Dairy Queen          332.     150     520
## 6 Mcdonalds            332.     140     520
## 7 Subway               286      140     510
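
A mean is easier to interpret when the number of items behind it is visible. The sketch below adds an item count and a median to the same kind of summary. It uses only the ten calorie values shown in the salad listing earlier (nine Mcdonalds items and one Chick Fil-A item), so it illustrates the technique and does not recompute the full table.

```r
library(dplyr)
library(tibble)

demo <- tibble(
  restaurant = c(rep("Mcdonalds", 9), "Chick Fil-A"),
  calories = c(140, 270, 490, 190, 320, 490, 220, 350, 520, 490)
)

summary_demo <- demo %>%
  group_by(restaurant) %>%
  summarise(
    n_items = n(),  # how many items each mean is based on
    mean_calories = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    min_cal = min(calories, na.rm = TRUE),
    max_cal = max(calories, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_calories))
```

A restaurant with `n_items` equal to 1 has identical mean, minimum, and maximum values, which is the pattern Taco Bell and Chick Fil-A show in the table above.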

A bar graph with points representing the maximum and minimum calories for chicken salads

ggplot(mean_calories_by_restaurant, aes(x = reorder(restaurant, -mean_calories), y = mean_calories, fill = mean_calories)) +
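  geom_col() +  # equivalent to geom_bar(stat = "identity")
  # shape 21 is a fillable circle, so the constant fill colors apply to both point layers
  geom_point(aes(y = min_cal), color = "black", size = 3, shape = 21, fill = "gray") +
  geom_point(aes(y = max_cal), color = "black", size = 3, shape = 21, fill = "lightgray") +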
  geom_text(aes(y = min_cal, label = min_cal), vjust = -1, color = "black", size = 3) +  
  geom_text(aes(y = max_cal, label = max_cal), vjust = -1, color = "black", size = 3) +  
  scale_fill_gradient(low = "blue", high = "red") + 
  labs(
    title = "Mean Calories in Chicken Salads by Restaurant",
    x = "Restaurant",
    y = "Mean Calories",
    fill = "Mean Calories"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, family = "serif"))
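
As one possible alternative to separate minimum and maximum points, a vertical range line per restaurant shows the spread in a single layer. This is a sketch only: `summary_tbl` is a hypothetical name, and its values are retyped from the printed summary table, with means rounded as displayed.

```r
library(ggplot2)
library(tibble)

# Values retyped from the printed summary table (means as displayed).
summary_tbl <- tribble(
  ~restaurant,   ~mean_calories, ~min_cal, ~max_cal,
  "Taco Bell",   720,            720,      720,
  "Arbys",       660,            430,      840,
  "Burger King", 578,            320,      720,
  "Chick Fil-A", 490,            490,      490,
  "Dairy Queen", 332,            150,      520,
  "Mcdonalds",   332,            140,      520,
  "Subway",      286,            140,      510
)

p <- ggplot(summary_tbl, aes(x = reorder(restaurant, -mean_calories), y = mean_calories)) +
  geom_col(fill = "steelblue") +
  # One line per restaurant running from its minimum to its maximum.
  geom_linerange(aes(ymin = min_cal, ymax = max_cal)) +
  labs(
    title = "Mean Calories in Chicken Salads by Restaurant",
    x = "Restaurant",
    y = "Calories (bar = mean, line = min to max)"
  ) +
  theme_minimal()
```

For restaurants with a single item, such as Taco Bell and Chick Fil-A, the range collapses to a point at the top of the bar.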

Conclusion

The analysis shows that Taco Bell has the highest mean calorie count (720), but this figure rests on a single chicken salad: its mean, minimum, and maximum are identical. With so few options, Taco Bell’s mean is not directly comparable to the means of restaurants offering several chicken salads, and it exaggerates the apparent gap between chains. The bar graph illustrates the spread in mean calorie counts, from 286 at Subway to 720 at Taco Bell, and the minimum and maximum points show why context matters when interpreting these averages.

These findings have important implications for both consumers and researchers. Relying solely on average calorie counts can lead to misconceptions about the nutritional value of menu items across different restaurants. For consumers aiming for healthier choices, understanding that a few outliers can distort average values is crucial. This awareness can guide better decision-making and foster a more nuanced view of restaurant offerings.

Looking ahead, there are several avenues for future research that could enhance our understanding of restaurant nutrition. Expanding the dataset to include a wider variety of salad options across various chains would provide a clearer picture of average calorie counts. Additionally, analyzing other nutritional aspects such as fat, sugar, and fiber content could create a more comprehensive framework for evaluating meal healthiness. Finally, studies examining consumer perceptions of calorie information and its influence on food choices would be valuable for informing public health initiatives aimed at promoting healthier eating habits.

References

Awesome Public Datasets https://github.com/awesomedata/awesome-public-datasets