library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load dataset
fastfood <- readr::read_csv("C:/Users/priya/Downloads/fastfood_calories.csv")
## New names:
## Rows: 515 Columns: 18
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (3): restaurant, item, salad dbl (15): ...1, calories, cal_fat, total_fat,
## sat_fat, trans_fat, cholestero...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Display first few rows
head(fastfood)
## # A tibble: 6 × 18
## ...1 restaurant item calories cal_fat total_fat sat_fat trans_fat
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Mcdonalds Artisan Grilled… 380 60 7 2 0
## 2 2 Mcdonalds Single Bacon Sm… 840 410 45 17 1.5
## 3 3 Mcdonalds Double Bacon Sm… 1130 600 67 27 3
## 4 4 Mcdonalds Grilled Bacon S… 750 280 31 10 0.5
## 5 5 Mcdonalds Crispy Bacon Sm… 920 410 45 12 0.5
## 6 6 Mcdonalds Big Mac 540 250 28 10 1
## # ℹ 10 more variables: cholesterol <dbl>, sodium <dbl>, total_carb <dbl>,
## # fiber <dbl>, sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>,
## # calcium <dbl>, salad <chr>
# Summary statistics for calories and total fat
numeric_summary <- fastfood %>%
  summarise(
    min_calories    = min(calories, na.rm = TRUE),
    max_calories    = max(calories, na.rm = TRUE),
    mean_calories   = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories     = sd(calories, na.rm = TRUE),
    min_fat         = min(total_fat, na.rm = TRUE),
    max_fat         = max(total_fat, na.rm = TRUE),
    mean_fat        = mean(total_fat, na.rm = TRUE),
    median_fat      = median(total_fat, na.rm = TRUE),
    sd_fat          = sd(total_fat, na.rm = TRUE)
  )
numeric_summary
## # A tibble: 1 × 10
## min_calories max_calories mean_calories median_calories sd_calories min_fat
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20 2430 531. 490 282. 0
## # ℹ 4 more variables: max_fat <dbl>, mean_fat <dbl>, median_fat <dbl>,
## # sd_fat <dbl>
Calories:
Range: 20 to 2430 calories.
Mean: 530.9, Median: 490.
High Variability: Large standard deviation (282.4).
Total Fat:
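Range: 0 to 141 grams.
Mean: 26.6 grams. The median is among the columns truncated from the printed tibble; `numeric_summary$median_fat` shows it.
High Variability: Standard deviation of 18.4 grams, roughly 70% of the mean.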
Nutritional Diversity: Wide range of values indicates diverse menu options, important for dietary choices.
Health Concerns: Large variability and skewness suggest some items are very high in calories and fat, which could impact health.
What items contribute to extreme calorie and fat values?
How do these values compare with dietary guidelines?
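One way to approach the first question is to list the items at the top of the calorie distribution. The sketch below is only illustrative: `top_calorie_items()` is a hypothetical helper, and `toy` is a made-up stand-in for `fastfood` so the block runs on its own.

```r
library(dplyr)

# Toy stand-in for `fastfood`; these values are made up for illustration
toy <- tibble(
  restaurant = c("A", "A", "B", "B", "C"),
  item       = c("a1", "a2", "b1", "b2", "c1"),
  calories   = c(300, 1200, 450, 2000, 150),
  total_fat  = c(10, 70, 20, 120, 5)
)

# Hypothetical helper: the n highest-calorie items, highest first
top_calorie_items <- function(data, n = 3) {
  data %>%
    slice_max(calories, n = n, with_ties = FALSE) %>%
    select(restaurant, item, calories, total_fat)
}

top_items <- top_calorie_items(toy, n = 2)
top_items
```

On the real data the equivalent call would be `top_calorie_items(fastfood, n = 10)`; swapping `calories` for `total_fat` inside the helper gives the corresponding list for fat.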
# First and third quartiles for calories and total fat
quantiles <- fastfood %>%
  summarise(
    q1_calories = quantile(calories, 0.25, na.rm = TRUE),
    q3_calories = quantile(calories, 0.75, na.rm = TRUE),
    q1_fat      = quantile(total_fat, 0.25, na.rm = TRUE),
    q3_fat      = quantile(total_fat, 0.75, na.rm = TRUE)
  )
quantiles
## # A tibble: 1 × 4
## q1_calories q3_calories q1_fat q3_fat
## <dbl> <dbl> <dbl> <dbl>
## 1 330 690 14 35
Calories: The middle 50% of menu items have between 330 and 690 calories.
Total Fat: The middle 50% of items have between 14 and 35 grams of fat.
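The quartiles also give a quick way to judge how extreme the maximum values are. The sketch below applies Tukey's 1.5 × IQR rule, which is a common convention and an assumption here rather than something the analysis itself uses; the quartile values are copied from the `quantiles` output.

```r
# Quartiles copied from the printed `quantiles` tibble
q1_calories <- 330
q3_calories <- 690
q1_fat <- 14
q3_fat <- 35

# Interquartile ranges
iqr_calories <- q3_calories - q1_calories  # 360
iqr_fat <- q3_fat - q1_fat                 # 21

# Upper fences under the 1.5 * IQR convention
upper_fence_calories <- q3_calories + 1.5 * iqr_calories  # 690 + 540 = 1230
upper_fence_fat <- q3_fat + 1.5 * iqr_fat                 # 35 + 31.5 = 66.5
```

Under this convention, any item above 1230 calories would be flagged as unusually high, and the observed maximum of 2430 calories lies far beyond that fence.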
How do the calorie and fat percentiles compare to recommended dietary guidelines?
Are there specific types of menu items (e.g., burgers, fries) that frequently fall into the higher or lower percentiles for calories and fat?
library(tidyverse)
ggplot(fastfood, aes(x = calories)) +
geom_histogram(binwidth = 50, fill = "blue", color = "white") +
labs(title = "Distribution of Calories", x = "Calories", y = "Count")
What percentage of items fall below or above specific calorie thresholds (e.g., 500 calories)?
Are there any noticeable trends in calorie content based on different types of menu items or restaurants?
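The threshold question can be answered with a one-line summary. In this sketch, `share_above()` is a hypothetical helper and `toy` holds made-up values so the block runs without the data file.

```r
library(dplyr)

# Hypothetical helper: percentage of items strictly above a calorie threshold
share_above <- function(data, threshold = 500) {
  data %>%
    summarise(
      n_items   = sum(!is.na(calories)),
      pct_above = 100 * mean(calories > threshold, na.rm = TRUE)
    )
}

# Made-up values for illustration; NA rows are ignored
toy <- tibble(calories = c(200, 450, 500, 650, 900, NA))
result <- share_above(toy, threshold = 500)
result
```

On the real data this would be `share_above(fastfood, 500)`. Because the median is 490, at most half of the items can exceed 500 calories.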
ggplot(fastfood, aes(x = total_fat)) +
geom_histogram(binwidth = 5, fill = "red", color = "white") +
labs(title = "Distribution of Total Fat", x = "Total Fat (g)", y = "Count")
What proportion of items exceed recommended daily fat intake levels?
Are there specific types of menu items or restaurants with notably higher or lower fat content?
categorical_summary <- fastfood %>% count(restaurant)
categorical_summary
## # A tibble: 8 × 2
## restaurant n
## <chr> <int>
## 1 Arbys 55
## 2 Burger King 70
## 3 Chick Fil-A 27
## 4 Dairy Queen 42
## 5 Mcdonalds 57
## 6 Sonic 53
## 7 Subway 96
## 8 Taco Bell 115
How does the nutritional content (calories, fat) compare across restaurants with different menu sizes?
Does the number of menu items correlate with the average calorie or fat content per restaurant?
# Bundle the numeric and categorical summaries into one list
combined_summary <- list(
  numeric_summary     = numeric_summary,
  categorical_summary = categorical_summary
)
combined_summary
## $numeric_summary
## # A tibble: 1 × 10
## min_calories max_calories mean_calories median_calories sd_calories min_fat
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20 2430 531. 490 282. 0
## # ℹ 4 more variables: max_fat <dbl>, mean_fat <dbl>, median_fat <dbl>,
## # sd_fat <dbl>
##
## $categorical_summary
## # A tibble: 8 × 2
## restaurant n
## <chr> <int>
## 1 Arbys 55
## 2 Burger King 70
## 3 Chick Fil-A 27
## 4 Dairy Queen 42
## 5 Mcdonalds 57
## 6 Sonic 53
## 7 Subway 96
## 8 Taco Bell 115
Questions to Investigate:
Do some restaurants generally offer higher- or lower-calorie options than others? Answering this can help identify restaurants whose menu items are more calorie-dense.
How is fat content related to calorie content? Understanding this relationship gives insight into how different nutritional measures move together in fast food items.
Which restaurant has the highest percentage of menu items above the median calorie value? This can reveal whether certain restaurants tend to offer more calorie-rich options.
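The third question is not computed elsewhere in this analysis, so here is a minimal sketch of one way to do it. `pct_above_median()` is a hypothetical helper, the comparison uses the overall median across all restaurants (an assumption about what "the median calorie value" means), and `toy` is made-up data.

```r
library(dplyr)

# Hypothetical helper: share of each restaurant's items above the overall median
pct_above_median <- function(data) {
  overall_median <- median(data$calories, na.rm = TRUE)
  data %>%
    group_by(restaurant) %>%
    summarise(
      n_items   = n(),
      pct_above = 100 * mean(calories > overall_median, na.rm = TRUE),
      .groups   = "drop"
    ) %>%
    arrange(desc(pct_above))
}

# Made-up values for illustration; the overall median here is 500
toy <- tibble(
  restaurant = c("A", "A", "A", "B", "B", "B"),
  calories   = c(300, 700, 800, 200, 400, 600)
)
result <- pct_above_median(toy)
result
```

On the real data the call would be `pct_above_median(fastfood)`.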
average_calories_by_restaurant <- fastfood %>%
group_by(restaurant) %>%
summarise(
avg_calories = mean(calories, na.rm = TRUE)
) %>%
arrange(desc(avg_calories))
average_calories_by_restaurant
## # A tibble: 8 × 2
## restaurant avg_calories
## <chr> <dbl>
## 1 Mcdonalds 640.
## 2 Sonic 632.
## 3 Burger King 609.
## 4 Arbys 533.
## 5 Dairy Queen 520.
## 6 Subway 503.
## 7 Taco Bell 444.
## 8 Chick Fil-A 384.
How do these average calorie values compare with nutritional guidelines or recommended daily intake?
Is there a correlation between average calorie content and menu size (number of items) across restaurants?
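The menu-size question can be checked directly from the two printed tables. The sketch below re-enters those values by hand (the averages are rounded as printed) and computes two correlation coefficients; with the full data one would join `categorical_summary` and `average_calories_by_restaurant` instead.

```r
library(dplyr)

# Item counts copied from the printed `categorical_summary`
menu_size <- tibble(
  restaurant = c("Arbys", "Burger King", "Chick Fil-A", "Dairy Queen",
                 "Mcdonalds", "Sonic", "Subway", "Taco Bell"),
  n = c(55, 70, 27, 42, 57, 53, 96, 115)
)

# Averages copied from the printed `average_calories_by_restaurant` (rounded)
avg_cal <- tibble(
  restaurant = c("Mcdonalds", "Sonic", "Burger King", "Arbys",
                 "Dairy Queen", "Subway", "Taco Bell", "Chick Fil-A"),
  avg_calories = c(640, 632, 609, 533, 520, 503, 444, 384)
)

by_restaurant <- inner_join(menu_size, avg_cal, by = "restaurant")

# Pearson (linear) and Spearman (rank) correlation across the 8 restaurants
cor_pearson  <- cor(by_restaurant$n, by_restaurant$avg_calories)
cor_spearman <- cor(by_restaurant$n, by_restaurant$avg_calories,
                    method = "spearman")
```

With only eight restaurants, either coefficient is weak evidence at best and should be read as descriptive rather than conclusive.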
ggplot(fastfood, aes(x = calories)) +
geom_histogram(binwidth = 50, fill = "blue", color = "white") +
labs(title = "Distribution of Calories", x = "Calories", y = "Count")
ggplot(fastfood, aes(x = protein)) +
geom_histogram(binwidth = 5, fill = "red", color = "white") +
labs(title = "Distribution of Protein", x = "Protein (g)", y = "Count")
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
How does the protein content correlate with calorie content? Are higher-calorie items also higher in protein?
Are there specific types of menu items (e.g., burgers, sandwiches) that typically have higher or lower protein content?
ggplot(fastfood, aes(x = protein, y = total_fat, color = restaurant)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "black") +
labs(title = "Total Fat vs. Protein", x = "Protein (g)", y = "Total Fat (g)") +
theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
The scatter plot shows that items with more protein tend to have more total fat. The black regression line, fit to all restaurants pooled, indicates a positive correlation between protein and fat content.
Are there specific types of menu items or restaurants where this trend is more pronounced?
How does the protein-to-fat ratio vary between different restaurants?
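A possible starting point for the ratio question is sketched below. `protein_fat_ratio()` is a hypothetical helper, and `toy` is made-up data. The ratio is computed from per-restaurant totals rather than per item, because some items have 0 grams of fat and a per-item ratio would divide by zero.

```r
library(dplyr)

# Hypothetical helper: grams of protein per gram of fat, by restaurant
protein_fat_ratio <- function(data) {
  data %>%
    filter(!is.na(protein), !is.na(total_fat)) %>%  # drop incomplete rows
    group_by(restaurant) %>%
    summarise(
      ratio   = sum(protein) / sum(total_fat),
      .groups = "drop"
    ) %>%
    arrange(desc(ratio))
}

# Made-up values for illustration, including a missing protein value
toy <- tibble(
  restaurant = c("A", "A", "B", "B", "B"),
  protein    = c(20, 40, 10, NA, 30),
  total_fat  = c(10, 20, 0, 15, 40)
)
result <- protein_fat_ratio(toy)
result
```

On the real data the call would be `protein_fat_ratio(fastfood)`.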
ggplot(fastfood, aes(x = reorder(restaurant, calories, FUN = median), y = calories, fill = restaurant)) +
geom_boxplot() +
labs(title = "Calorie Content by Restaurant", x = "Restaurant", y = "Calories") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The boxplot reveals that there is considerable variation in calorie content across different restaurants. Some restaurants, like McDonald’s and Burger King, have higher medians and wider interquartile ranges, indicating higher calorie content and more variability in their menu items. Others, like Chick-fil-A, have lower medians and narrower interquartile ranges.
Which specific menu items contribute to the high-calorie ranges at restaurants with wider boxes?
How does calorie variability impact overall menu healthiness across different restaurants?
ggplot(fastfood, aes(x = total_fat, y = calories, color = restaurant)) +
geom_point(alpha = 0.5) +
facet_wrap(~ restaurant, scales = "free") +
labs(title = "Scatter Plot of Calories vs. Total Fat by Restaurant", x = "Total Fat (g)", y = "Calories") +
theme(legend.position = "bottom")
The faceted scatter plot shows how the relationship between total fat and calories varies by restaurant. Within each panel, items with more fat tend to have more calories, though the strength of the relationship differs: some restaurants show a clear positive trend, while others are more dispersed. Because `scales = "free"` gives every panel its own axes, slopes cannot be compared directly by eye across panels.
Are there significant differences in the fat-to-calorie relationship between restaurants with high and low average calorie content?
Which types of menu items (e.g., burgers, sides) drive the observed patterns in fat and calorie relationships?
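To put numbers on the per-restaurant patterns, one option is to compute a correlation and a least-squares slope for each restaurant. This is a sketch under assumptions: `fat_calorie_fit()` is a hypothetical helper, a straight-line fit is assumed to be adequate, and `toy` is made-up data.

```r
library(dplyr)

# Hypothetical helper: correlation and slope of calories on total fat
fat_calorie_fit <- function(data) {
  data %>%
    group_by(restaurant) %>%
    summarise(
      r       = cor(total_fat, calories, use = "complete.obs"),
      slope   = coef(lm(calories ~ total_fat))[["total_fat"]],
      .groups = "drop"
    )
}

# Made-up values: restaurant A lies exactly on a line, B is noisier
toy <- tibble(
  restaurant = rep(c("A", "B"), each = 4),
  total_fat  = c(0, 10, 20, 30, 0, 10, 20, 30),
  calories   = c(100, 190, 280, 370, 50, 200, 250, 500)
)
fit <- fat_calorie_fit(toy)
fit
```

On the real data the call would be `fat_calorie_fit(fastfood)`. The slope reads as additional calories per additional gram of fat, which makes restaurants comparable on a common scale even though the faceted plot uses free axes.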