Load necessary libraries

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load dataset (adjust the path to wherever fastfood_calories.csv is saved)
fastfood <- readr::read_csv("C:/Users/priya/Downloads/fastfood_calories.csv")
## New names:
## • `` -> `...1`
## Rows: 515 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): restaurant, item, salad
## dbl (15): ...1, calories, cal_fat, total_fat, sat_fat, trans_fat, cholestero...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display first few rows 
head(fastfood)
## # A tibble: 6 × 18
##    ...1 restaurant item             calories cal_fat total_fat sat_fat trans_fat
##   <dbl> <chr>      <chr>               <dbl>   <dbl>     <dbl>   <dbl>     <dbl>
## 1     1 Mcdonalds  Artisan Grilled…      380      60         7       2       0  
## 2     2 Mcdonalds  Single Bacon Sm…      840     410        45      17       1.5
## 3     3 Mcdonalds  Double Bacon Sm…     1130     600        67      27       3  
## 4     4 Mcdonalds  Grilled Bacon S…      750     280        31      10       0.5
## 5     5 Mcdonalds  Crispy Bacon Sm…      920     410        45      12       0.5
## 6     6 Mcdonalds  Big Mac               540     250        28      10       1  
## # ℹ 10 more variables: cholesterol <dbl>, sodium <dbl>, total_carb <dbl>,
## #   fiber <dbl>, sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>,
## #   calcium <dbl>, salad <chr>

Summary statistics for ‘calories’ and ‘total_fat’

numeric_summary <- fastfood %>%
  summarise(
    min_calories    = min(calories, na.rm = TRUE),
    max_calories    = max(calories, na.rm = TRUE),
    mean_calories   = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories     = sd(calories, na.rm = TRUE),
    min_fat         = min(total_fat, na.rm = TRUE),
    max_fat         = max(total_fat, na.rm = TRUE),
    mean_fat        = mean(total_fat, na.rm = TRUE),
    median_fat      = median(total_fat, na.rm = TRUE),
    sd_fat          = sd(total_fat, na.rm = TRUE)
  )
numeric_summary
## # A tibble: 1 × 10
##   min_calories max_calories mean_calories median_calories sd_calories min_fat
##          <dbl>        <dbl>         <dbl>           <dbl>       <dbl>   <dbl>
## 1           20         2430          531.             490        282.       0
## # ℹ 4 more variables: max_fat <dbl>, mean_fat <dbl>, median_fat <dbl>,
## #   sd_fat <dbl>

Insights

  1. Calories:

    • Range: 20 to 2430 calories.

    • Mean: 530.9, Median: 490.

    • High Variability: Large standard deviation (282.4).

  2. Total Fat:

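    • Range: 0 to 141 grams.

    • Mean: 26.6 grams. The median (`median_fat`) is one of the columns hidden in the printout above; `print(numeric_summary, width = Inf)` shows all ten.

    • High Variability: Standard deviation of 18.4 grams, large relative to the mean.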

Significance

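  • Nutritional Diversity: The wide ranges show that menus span everything from light sides to very heavy meals, which matters for anyone choosing items against a dietary target.

  • Health Concerns: For calories the mean sits above the median (530.9 vs. 490), a sign of right skew: a minority of very high-calorie, high-fat items pulls the average up, and those items could have a real impact on health.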
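The skewness claim can be checked with a simple rule of thumb. A minimal sketch, assuming Pearson's median skewness coefficient, 3 × (mean − median) / sd, is an acceptable summary (this statistic is not used elsewhere in the analysis); the inputs are the calorie mean, median and standard deviation quoted above, and a positive result indicates a right tail.

```r
# Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
# Positive values indicate a longer right tail.
pearson_median_skew <- function(mean_value, median_value, sd_value) {
  3 * (mean_value - median_value) / sd_value
}

# Calorie statistics quoted in the Insights above
calorie_skew <- pearson_median_skew(mean_value = 530.9, median_value = 490, sd_value = 282.4)
round(calorie_skew, 2)
```

A value of about 0.43 is a moderate right skew, consistent with the long upper tail implied by the 2430-calorie maximum.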

Further Questions

  1. What items contribute to extreme calorie and fat values?

  2. How do these values compare with dietary guidelines?
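The first question can be answered by sorting on a column. A minimal base-R sketch, using only the calories and total_fat values of the six rows printed by head(fastfood) above; item names are left out because the printed names are truncated, so rows are identified by the `...1` id. The helper name `top_items` is hypothetical.

```r
# The six rows printed by head(fastfood) above, identified by their `...1` id
head_rows <- data.frame(
  id        = 1:6,
  calories  = c(380, 840, 1130, 750, 920, 540),
  total_fat = c(7, 45, 67, 31, 45, 28)
)

# Hypothetical helper: the n rows with the largest values in `column`, largest first
top_items <- function(df, column, n = 3) {
  keep <- order(df[[column]], decreasing = TRUE)[seq_len(min(n, nrow(df)))]
  df[keep, , drop = FALSE]
}

top_items(head_rows, "calories", n = 2)
top_items(head_rows, "total_fat", n = 2)
```

On the full data, `top_items(fastfood, "calories", n = 10)` would list the ten most calorie-dense items, including their `restaurant` and `item` columns.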

Quantiles for ‘calories’ and ‘total_fat’

quantiles <- fastfood %>%
  summarise(
    q1_calories = quantile(calories, 0.25, na.rm = TRUE),
    q3_calories = quantile(calories, 0.75, na.rm = TRUE),
    q1_fat      = quantile(total_fat, 0.25, na.rm = TRUE),
    q3_fat      = quantile(total_fat, 0.75, na.rm = TRUE)
  )
quantiles
## # A tibble: 1 × 4
##   q1_calories q3_calories q1_fat q3_fat
##         <dbl>       <dbl>  <dbl>  <dbl>
## 1         330         690     14     35

Insights

  • Calories: The middle 50% of menu items have between 330 and 690 calories.

  • Total Fat: The middle 50% of items have between 14 and 35 grams of fat.

Significance

  • Calories:

    • The interquartile range (330 to 690 calories) describes the typical item: a quarter of items fall below 330 calories and another quarter lie above 690.

  • Total Fat:

    • The central 50% of items contain 14 to 35 grams of fat, so any item above 35 grams is in the fattiest quarter of the menu.
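The quartiles also give a quick way to flag unusually high items. A minimal sketch, assuming Tukey's 1.5 × IQR fences (a convention not used in the original analysis), computed from the Q1 and Q3 values in the `quantiles` output above.

```r
# Tukey's fences: values below Q1 - k * IQR or above Q3 + k * IQR are
# flagged as potential outliers (k = 1.5 is the usual convention).
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles taken from the `quantiles` tibble above
calorie_fences <- tukey_fences(q1 = 330, q3 = 690)
fat_fences     <- tukey_fences(q1 = 14,  q3 = 35)

calorie_fences
fat_fences
```

Both lower fences come out negative, so only the upper fences matter: under this convention, items above 1230 calories or 66.5 grams of fat would be flagged. This is the same 1.5 × IQR rule that `geom_boxplot()` uses by default when drawing its whiskers.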

Further Questions

  1. How do the calorie and fat percentiles compare to recommended dietary guidelines?

  2. Are there specific types of menu items (e.g., burgers, fries) that frequently fall into the higher or lower percentiles for calories and fat?

histograms

library(tidyverse) 
ggplot(fastfood, aes(x = calories)) +
  geom_histogram(binwidth = 50, fill = "blue", color = "white") +
  labs(title = "Distribution of Calories", x = "Calories", y = "Count")

Insight

  • Distribution of Calories: The histogram shows that most menu items have calorie counts clustered between 200 and 800 calories. There are fewer items with extremely low or high calorie counts.

Significance

  • Typical Caloric Range: This indicates the most common calorie range for fast food items. Understanding this distribution helps gauge the average energy content customers are likely to encounter.

Further Questions

  1. What percentage of items fall below or above specific calorie thresholds (e.g., 500 calories)?

  2. Are there any noticeable trends in calorie content based on different types of menu items or restaurants?
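The first question reduces to the mean of a logical vector. A minimal base-R sketch; the helper name `share_above` is hypothetical, and the example uses only the calories of the six rows printed by head(fastfood) above.

```r
# Hypothetical helper: proportion of values above a threshold.
# TRUE counts as 1 and FALSE as 0, so the mean of the comparison is a proportion;
# na.rm = TRUE ignores missing values.
share_above <- function(x, threshold) {
  mean(x > threshold, na.rm = TRUE)
}

# Calories of the six rows printed by head(fastfood) above
head_calories <- c(380, 840, 1130, 750, 920, 540)
share_above(head_calories, 500)
```

On the full data, `share_above(fastfood$calories, 500)` gives the proportion of all 515 items above 500 calories.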

ggplot(fastfood, aes(x = total_fat)) +
  geom_histogram(binwidth = 5, fill = "red", color = "white") +
  labs(title = "Distribution of Total Fat", x = "Total Fat (g)", y = "Count")

Insight

  • Distribution of Total Fat: The histogram reveals that most fast food items have total fat content ranging from 0 to 40 grams, with a concentration between 10 and 30 grams. Fewer items have either very low or very high fat content.

Significance

  • Typical Fat Content: This shows the usual range of fat per item, and therefore how much fat a customer typically takes in from a single item.

Further Questions

  1. What proportion of items exceed recommended daily fat intake levels?

  2. Are there specific types of menu items or restaurants with notably higher or lower fat content?
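The first question needs a reference value. A minimal sketch, assuming the 78 g total-fat Daily Value used on US Nutrition Facts labels for a 2,000-calorie diet (an outside reference, not part of the dataset); the example uses the total_fat values of the six rows printed by head(fastfood) above, and `percent_daily_value` is a hypothetical helper.

```r
# Assumed reference: 78 g of total fat, the Daily Value used on US
# Nutrition Facts labels for a 2,000-calorie diet.
fat_daily_value_g <- 78

# Hypothetical helper: percentage of the daily value supplied by each item
percent_daily_value <- function(total_fat, dv = fat_daily_value_g) {
  100 * total_fat / dv
}

head_fat <- c(7, 45, 67, 31, 45, 28)   # total_fat of the six head() rows above
round(percent_daily_value(head_fat), 1)
mean(head_fat > fat_daily_value_g)      # share of items exceeding the full daily value
```

Swapping `head_fat` for `fastfood$total_fat` answers the question for the whole dataset; a different reference intake can be passed through `dv`.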

Unique values and counts for ‘restaurant’

categorical_summary <- fastfood %>% count(restaurant)

categorical_summary
## # A tibble: 8 × 2
##   restaurant      n
##   <chr>       <int>
## 1 Arbys          55
## 2 Burger King    70
## 3 Chick Fil-A    27
## 4 Dairy Queen    42
## 5 Mcdonalds      57
## 6 Sonic          53
## 7 Subway         96
## 8 Taco Bell     115

Insight

  • Number of Menu Items by Restaurant: Taco Bell has the most menu items (115), followed by Subway (96). Chick-fil-A has the fewest (27).
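The ranking can be double-checked by sorting the counts. A minimal base-R sketch using the counts copied from the `count(restaurant)` output above; it also expresses each restaurant's menu as a share of all 515 items.

```r
# Item counts per restaurant, copied from the count(restaurant) output above
menu_counts <- c(
  "Arbys" = 55, "Burger King" = 70, "Chick Fil-A" = 27, "Dairy Queen" = 42,
  "Mcdonalds" = 57, "Sonic" = 53, "Subway" = 96, "Taco Bell" = 115
)

# Rank restaurants by menu size, largest first
ranked <- sort(menu_counts, decreasing = TRUE)

# Each restaurant's share of all items, in percent
round(100 * ranked / sum(ranked), 1)
```

The counts sum to 515, matching the number of rows reported by `read_csv()`.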

Significance

  • Menu Size Comparison: The number of menu items varies widely among restaurants. A higher count of items might indicate a more extensive menu or greater variety, which could influence the range of nutritional content available.

Further Questions

  1. How does the nutritional content (calories, fat) compare across restaurants with different menu sizes?

  2. Does the number of menu items correlate with the average calorie or fat content per restaurant?

Combine summaries

combined_summary <- list( numeric_summary = numeric_summary, categorical_summary = categorical_summary )

Display combined summary

combined_summary
## $numeric_summary
## # A tibble: 1 × 10
##   min_calories max_calories mean_calories median_calories sd_calories min_fat
##          <dbl>        <dbl>         <dbl>           <dbl>       <dbl>   <dbl>
## 1           20         2430          531.             490        282.       0
## # ℹ 4 more variables: max_fat <dbl>, mean_fat <dbl>, median_fat <dbl>,
## #   sd_fat <dbl>
## 
## $categorical_summary
## # A tibble: 8 × 2
##   restaurant      n
##   <chr>       <int>
## 1 Arbys          55
## 2 Burger King    70
## 3 Chick Fil-A    27
## 4 Dairy Queen    42
## 5 Mcdonalds      57
## 6 Sonic          53
## 7 Subway         96
## 8 Taco Bell     115

Questions to Investigate:

  1. How does the average calorie content of menu items vary by restaurant?

This question aims to understand whether some restaurants generally offer higher or lower calorie options compared to others. It can help identify if certain restaurants have more calorie-dense menu items.

  2. What is the relationship between total fat and calorie content in fast food menu items?

This question explores the correlation between fat content and calorie content. Understanding this relationship shows how different nutritional measures move together in fast food items.

  3. Which restaurant offers the highest proportion of menu items with calories above the median value?

This question asks which restaurant has the highest percentage of menu items exceeding the overall median calorie value. It can reveal whether certain restaurants tend to offer more calorie-rich options.
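The third question can be computed by comparing every item with the overall median and averaging within each restaurant. A minimal base-R sketch on hypothetical toy data: restaurants "A" and "B" and their calorie values are made up for illustration and are not taken from fastfood_calories.csv.

```r
# Hypothetical toy data for illustration only (not from fastfood_calories.csv)
toy <- data.frame(
  restaurant = c("A", "A", "A", "B", "B", "B"),
  calories   = c(900, 700, 300, 200, 650, 400)
)

# Overall median across all items
overall_median <- median(toy$calories, na.rm = TRUE)

# Share of each restaurant's items above the overall median
share_above_median <- tapply(toy$calories > overall_median, toy$restaurant, mean, na.rm = TRUE)
sort(share_above_median, decreasing = TRUE)
```

On the real data, replacing `toy` with `fastfood` answers the question directly; the restaurant at the top of the sorted result has the largest share of above-median items.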

Addressing One of the Questions

Aggregate average calorie content by restaurant

average_calories_by_restaurant <- fastfood %>%
  group_by(restaurant) %>%
  summarise(
    avg_calories = mean(calories, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_calories))
average_calories_by_restaurant

1. Distribution of Calories and Protein

Histograms for calories and protein

ggplot(fastfood, aes(x = calories)) +
  geom_histogram(binwidth = 50, fill = "blue", color = "white") +
  labs(title = "Distribution of Calories", x = "Calories", y = "Count")

ggplot(fastfood, aes(x = protein)) +
  geom_histogram(binwidth = 5, fill = "red", color = "white") +
  labs(title = "Distribution of Protein", x = "Protein (g)", y = "Count")
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

Insights

  1. Distribution of Calories:

    • Observation: Most menu items have calorie counts concentrated between 200 and 800 calories, with fewer items at the extremes.
  2. Distribution of Protein:

    • Observation: Most menu items have protein content clustered between 5 and 20 grams, with fewer items having very low or high protein content.

Significance

  1. Calories:

    • Typical Caloric Intake: The histogram helps identify the range of calorie content that is common among fast food items, aiding consumers in understanding typical energy intake from these foods.
  2. Protein:

    • Protein Content Insight: The distribution shows the common range of protein in fast food, which is useful for understanding how these items contribute to protein intake.

Further Questions

  1. How does the protein content correlate with calorie content? Are higher-calorie items also higher in protein?

  2. Are there specific types of menu items (e.g., burgers, sandwiches) that typically have higher or lower protein content?
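The first question can be quantified with a correlation coefficient, but the protein column contains a non-finite value (the protein histogram above dropped one row), so missing values must be handled explicitly. A minimal base-R sketch on hypothetical toy data: the calorie and protein values are made up, not taken from fastfood_calories.csv.

```r
# Hypothetical toy data for illustration only (not from fastfood_calories.csv);
# one protein value is missing, as in the real protein column.
toy <- data.frame(
  calories = c(300, 450, 600, 800, 1000),
  protein  = c(10, 18, NA, 35, 48)
)

sum(is.na(toy$protein))   # number of missing protein values

# use = "complete.obs" drops rows with a missing value before computing
# the Pearson correlation; without it, cor() would return NA.
cor(toy$calories, toy$protein, use = "complete.obs")
```

On the real data, `cor(fastfood$calories, fastfood$protein, use = "complete.obs")` gives the pooled correlation.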

2. Relationship Between Total Fat and Protein

Scatter plot of Total Fat vs. Protein

ggplot(fastfood, aes(x = protein, y = total_fat, color = restaurant)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Total Fat vs. Protein", x = "Protein (g)", y = "Total Fat (g)") +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Insight

The scatter plot shows a general trend where items with more protein tend to have more total fat. The black line is a single linear fit pooled across all restaurants (setting `color = "black"` in `geom_smooth()` overrides the per-restaurant color mapping for that layer), and its upward slope indicates a positive association between protein and fat.

Significance

  • Nutritional Correlation: Items with more protein tend to carry more fat, so choosing a high-protein fast food item does not necessarily mean choosing a leaner one.

Further Questions

  1. Are there specific types of menu items or restaurants where this trend is more pronounced?

  2. How does the protein-to-fat ratio vary between different restaurants?
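The second question can be sketched by computing a ratio per item and summarizing it per restaurant. A minimal base-R sketch on hypothetical toy data (restaurants "A" and "B" and their values are made up, not taken from fastfood_calories.csv). Because `min_fat` is 0 in the summary above, zero-fat items would produce an infinite ratio, so the sketch excludes them; the median is used because ratios can be very large for low-fat items.

```r
# Hypothetical toy data for illustration only (not from fastfood_calories.csv)
toy <- data.frame(
  restaurant = c("A", "A", "A", "B", "B", "B"),
  protein    = c(30, 20, 5, 40, 25, 3),
  total_fat  = c(15, 20, 0, 10, 25, 6)
)

# Keep items with a known, positive fat value to avoid dividing by zero
has_fat <- !is.na(toy$total_fat) & toy$total_fat > 0
ratio   <- toy$protein[has_fat] / toy$total_fat[has_fat]

# Median protein-to-fat ratio per restaurant; na.rm drops missing protein values
median_ratio <- tapply(ratio, toy$restaurant[has_fat], median, na.rm = TRUE)
median_ratio
```

Replacing `toy` with `fastfood` gives the per-restaurant comparison on the real data.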

3. Calories by Restaurant

Boxplot of Calories by Restaurant

ggplot(fastfood, aes(x = reorder(restaurant, calories, FUN = median), y = calories, fill = restaurant)) +
  geom_boxplot() +
  labs(title = "Calorie Content by Restaurant", x = "Restaurant", y = "Calories") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Insight

The boxplot reveals that there is considerable variation in calorie content across different restaurants. Some restaurants, like McDonald’s and Burger King, have higher medians and wider interquartile ranges, indicating higher calorie content and more variability in their menu items. Others, like Chick-fil-A, have lower medians and narrower interquartile ranges.
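The medians and interquartile ranges that the boxplot draws can also be read off as numbers, which makes the comparison between restaurants easier to verify. A minimal base-R sketch on hypothetical toy data (restaurants "A" and "B" and their calorie values are made up, not taken from fastfood_calories.csv).

```r
# Hypothetical toy data for illustration only (not from fastfood_calories.csv)
toy <- data.frame(
  restaurant = rep(c("A", "B"), each = 4),
  calories   = c(300, 500, 700, 1100, 350, 400, 450, 500)
)

# Per-restaurant median (the line inside each box) and IQR (the box height)
med <- tapply(toy$calories, toy$restaurant, median, na.rm = TRUE)
iqr <- tapply(toy$calories, toy$restaurant, IQR, na.rm = TRUE)

box_stats <- data.frame(
  restaurant = names(med),
  median     = as.vector(med),
  iqr        = as.vector(iqr[names(med)])
)

# Same ordering idea as reorder(restaurant, calories, FUN = median), highest first
box_stats[order(box_stats$median, decreasing = TRUE), ]
```

With `fastfood` in place of `toy`, the table lists the numbers behind each box in the plot.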

Significance

  • Restaurant Comparison: This visualization helps identify which restaurants tend to offer higher or lower calorie options on average. It highlights variability in menu item calorie content and can be useful for consumers seeking to make healthier choices based on calorie content.

Further Questions

  1. Which specific menu items contribute to the high-calorie ranges at restaurants with wider boxes?

  2. How does calorie variability impact overall menu healthiness across different restaurants?

4. Calories and Total Fat Faceted by Restaurant

Faceted scatter plots

ggplot(fastfood, aes(x = total_fat, y = calories, color = restaurant)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ restaurant, scales = "free") +
  labs(title = "Scatter Plot of Calories vs. Total Fat by Restaurant", x = "Total Fat (g)", y = "Calories") +
  theme(legend.position = "bottom")

Insight

The faceted scatter plots show how the relationship between total fat and calories varies across restaurants. In each panel, items with more fat tend to have more calories, though the strength of this relationship varies: some restaurants show a clear positive trend, while others have more dispersed data. Because `scales = "free"` gives every panel its own axis ranges, compare the shape of the trend across panels rather than the absolute positions of points.
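The claim that the strength of the relationship varies by restaurant can be put into numbers with one correlation per restaurant. A minimal base-R sketch on hypothetical toy data (restaurants "A" and "B" and their values are made up, not taken from fastfood_calories.csv): "A" has a tight linear pattern and "B" a scattered one.

```r
# Hypothetical toy data for illustration only (not from fastfood_calories.csv)
toy <- data.frame(
  restaurant = rep(c("A", "B"), each = 4),
  total_fat  = c(10, 20, 30, 40, 10, 20, 30, 40),
  calories   = c(250, 420, 610, 790, 500, 300, 650, 420)
)

# Split the rows by restaurant and compute a Pearson correlation within each group
per_restaurant_r <- sapply(
  split(toy, toy$restaurant),
  function(d) cor(d$total_fat, d$calories, use = "complete.obs")
)
round(per_restaurant_r, 2)
```

Unlike the panels, these coefficients do not depend on axis scaling, so they can be compared directly across restaurants.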

Significance

  • Variation in Nutritional Relationships: The panels show how the link between total fat and calories differs between restaurants, highlighting which menus have a more consistent or more variable relationship and offering insight into their menu composition.
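The pooled fat-calorie relationship can also be quantified. A minimal base-R sketch using the calories, cal_fat and total_fat values of the six rows printed by head(fastfood) near the top; six rows are far too few to characterize the full dataset, so the numbers only demonstrate the computation. It also checks the standard nutrition rule of thumb that fat supplies about 9 kcal per gram, which helps explain why fat and calories move together.

```r
# calories, cal_fat and total_fat of the six rows printed by head(fastfood)
head_rows <- data.frame(
  calories  = c(380, 840, 1130, 750, 920, 540),
  cal_fat   = c(60, 410, 600, 280, 410, 250),
  total_fat = c(7, 45, 67, 31, 45, 28)
)

# Pearson correlation between total fat (g) and calories
fat_calorie_r <- cor(head_rows$total_fat, head_rows$calories)

# Calories from fat per gram of fat; should sit near 9 kcal/g
kcal_per_gram_fat <- head_rows$cal_fat / head_rows$total_fat

round(fat_calorie_r, 2)
round(kcal_per_gram_fat, 1)
```

On the full data, items with `total_fat == 0` must be excluded before dividing, and `cor(fastfood$total_fat, fastfood$calories, use = "complete.obs")` gives the pooled correlation.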

Further Questions

  1. Are there significant differences in the fat-to-calorie relationship between restaurants with high and low average calorie content?

  2. Which types of menu items (e.g., burgers, sides) drive the observed patterns in fat and calorie relationships?