Load the Nutrition in Fast Food Data and Required Packages:

The nutritional content (calories, calories from fat, grams of protein, etc.) of 515 fast food items from various chain restaurants was measured and recorded. Below, we load this fast food data set and the packages required for analysis.

#Data
my_url <- "https://raw.githubusercontent.com/geedoubledee/R/main/fastfood.csv"
fastfood_df <- read.csv(file=my_url, header=TRUE, stringsAsFactors=FALSE)
head(fastfood_df)
##   X restaurant                                      item calories cal_fat
## 1 1  Mcdonalds          Artisan Grilled Chicken Sandwich      380      60
## 2 2  Mcdonalds            Single Bacon Smokehouse Burger      840     410
## 3 3  Mcdonalds            Double Bacon Smokehouse Burger     1130     600
## 4 4  Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich      750     280
## 5 5  Mcdonalds  Crispy Bacon Smokehouse Chicken Sandwich      920     410
## 6 6  Mcdonalds                                   Big Mac      540     250
##   total_fat sat_fat trans_fat cholesterol sodium total_carb fiber sugar protein
## 1         7       2       0.0          95   1110         44     3    11      37
## 2        45      17       1.5         130   1580         62     2    18      46
## 3        67      27       3.0         220   1920         63     3    18      70
## 4        31      10       0.5         155   1940         62     2    18      55
## 5        45      12       0.5         120   1980         81     4    18      46
## 6        28      10       1.0          80    950         46     3     9      25
##   vit_a vit_c calcium salad
## 1     4    20      20 Other
## 2     6    20      20 Other
## 3    10    20      50 Other
## 4     6    25      20 Other
## 5     6    20      20 Other
## 6    10     2      15 Other
#Packages
library(knitr)
library(rmarkdown)
library(magrittr)
library(plyr)
library(dplyr)
library(ggplot2)
library(DescTools)

Research Questions:

Grilled food is generally assumed to be healthier than food prepared in ways that require more oil, such as pan-frying or deep-frying. Many fast food items indicate in their names that they are grilled.

Do these grilled fast food items actually contain fewer total calories or fewer calories from fat than other fast food items? If we look at calories from fat as a percentage of total calories, is the percentage lower for grilled fast food items?

Please note that these questions apply only to fast food items in the data set that use a variation of “grilled” in their names, not all the fast food items in the data set that may actually be prepared that way. The data set does not include data on the actual preparation of these menu items.

Hypotheses:

Null Hypothesis: Grilled fast food items do not have fewer total calories, fewer calories from fat, or a lower percentage of calories from fat than other fast food items.

Alternative Hypothesis: Grilled fast food items have fewer total calories, fewer calories from fat, or a lower percentage of calories from fat than other fast food items.

Visual Analysis for All Fast Food Items:

There appears to be one error in the recorded data that must be addressed before going any further. The Ultimate Chicken Club from Sonic is listed as having more calories from fat than it has total calories, so I have decided to exclude it from all analysis, reducing the total observations to 514.

fastfood_df_new <- fastfood_df
fastfood_df_new %<>%
    filter(X != 128) #remove the Ultimate Chicken Club from Sonic
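
The exclusion above relies on knowing the row's index in advance. As a minimal sketch of how such records could be found programmatically, the check below flags any item whose calories from fat exceed its total calories. The data frame `toy_df` is illustrative only: its first three rows copy values printed by head() earlier, and its last row is a hypothetical bad record that does not come from the data set.

```r
# Illustrative data only: three rows copied from the head() output above,
# plus one made-up record with an impossible cal_fat value.
toy_df <- data.frame(
    item = c("Artisan Grilled Chicken Sandwich", "Big Mac",
             "Double Bacon Smokehouse Burger", "Hypothetical Bad Record"),
    calories = c(380, 540, 1130, 400),
    cal_fat = c(60, 250, 600, 450),
    stringsAsFactors = FALSE
)

# Calories from fat cannot exceed total calories, so flag rows that violate this
bad_rows <- subset(toy_df, cal_fat > calories)
print(bad_rows)

# Keep only the plausible rows
toy_df_clean <- subset(toy_df, cal_fat <= calories)
```

Applied to the full data frame, the same condition would be expected to surface the Sonic item described above, though that has not been verified here.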

Before filtering the data into grilled and not grilled groups, a simple scatterplot of the total calories and calories from fat data for all fast food items reveals a strong, positive, linear relationship between these variables. As an item’s calories increase, its calories from fat tend to increase as well.

plot(fastfood_df_new$calories, fastfood_df_new$cal_fat, main="Scatterplot of Total Calories and Calories from Fat for All Fast Food Items", xlab="Calories", ylab="Calories from Fat")

There are some noticeable outliers. The most extreme of these contains far more total calories and calories from fat than any other item. That item is the 20 piece Buttermilk Crispy Chicken Tenders from McDonald's, which contains 2,430 total calories, 1,270 of which are calories from fat.
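
The strength of the relationship and the location of the most extreme item can also be checked numerically. The sketch below is only an illustration: the data frame `six_items` is an assumption that copies the six rows printed by head() earlier, so its results describe those six rows and not the full data set.

```r
# First six rows of the data set, as printed by head() above
six_items <- data.frame(
    item = c("Artisan Grilled Chicken Sandwich",
             "Single Bacon Smokehouse Burger",
             "Double Bacon Smokehouse Burger",
             "Grilled Bacon Smokehouse Chicken Sandwich",
             "Crispy Bacon Smokehouse Chicken Sandwich",
             "Big Mac"),
    calories = c(380, 840, 1130, 750, 920, 540),
    cal_fat = c(60, 410, 600, 280, 410, 250),
    stringsAsFactors = FALSE
)

# Pearson correlation: values near 1 indicate a strong, positive, linear relationship
r <- cor(six_items$calories, six_items$cal_fat)
print(round(r, 2))

# Row with the highest total calories, i.e. the point farthest right on the scatterplot
top_item <- six_items[which.max(six_items$calories), ]
print(top_item)
```

Swapping `six_items` for `fastfood_df_new` would run the same two checks on all 514 items.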

Visual Analysis for Grilled Fast Food Items:

fastfood_df_grilled <- fastfood_df_new
fastfood_df_grilled %<>%
    filter(item %like any% c("%grilled%", "Grilled%", "%Grilled%")) %>%
    mutate(grilled=1) %>%
    mutate(cal_fat_to_cal_ratio=cal_fat/calories)
grilled_mean_cal <- round(mean(fastfood_df_grilled$calories), 2)
grilled_mean_cal_fat <- round(mean(fastfood_df_grilled$cal_fat), 2)
grilled_mean_cal_fat_to_cal_ratio <- round(mean(fastfood_df_grilled$cal_fat_to_cal_ratio), 2)
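
The `%like any%` operator used above comes from the DescTools package. As a rough base R sketch of the same idea (not the approach used in this analysis), a case-insensitive grepl() can flag item names containing "grilled", and the ratio column is a simple division. The data frame `six_items` is an assumption that copies rows printed by head() earlier; the column names `grilled` and `cal_fat_to_cal_ratio` mirror those used above.

```r
# First six rows of the data set, as printed by head() above
six_items <- data.frame(
    item = c("Artisan Grilled Chicken Sandwich",
             "Single Bacon Smokehouse Burger",
             "Double Bacon Smokehouse Burger",
             "Grilled Bacon Smokehouse Chicken Sandwich",
             "Crispy Bacon Smokehouse Chicken Sandwich",
             "Big Mac"),
    calories = c(380, 840, 1130, 750, 920, 540),
    cal_fat = c(60, 410, 600, 280, 410, 250),
    stringsAsFactors = FALSE
)

# 1 if the item name contains "grilled" in any letter case, 0 otherwise
six_items$grilled <- as.integer(grepl("grilled", six_items$item, ignore.case = TRUE))

# Proportion of total calories that come from fat
six_items$cal_fat_to_cal_ratio <- six_items$cal_fat / six_items$calories

print(six_items[, c("item", "grilled", "cal_fat_to_cal_ratio")])
```

For the Artisan Grilled Chicken Sandwich, for example, the ratio is 60 / 380, or about 0.16. Note that ignoring case would also match spellings such as "GRILLED", which the patterns used above do not list.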

After filtering the data, 39 of the 514 fast food items are indicated to be grilled. Looking at the histograms, both the total calories and the calories from fat for these grilled items are roughly unimodal and approximately normally distributed. The mean calories for grilled items is 426.67, and the mean calories from fat for grilled items is 163.85. The mean proportion of calories from fat for grilled items is 0.36.

hist(fastfood_df_grilled$calories, main="Grilled: Total Calories Histogram", xlab="Calories")

hist(fastfood_df_grilled$cal_fat, main="Grilled: Calories from Fat Histogram", xlab="Calories from Fat")

hist(fastfood_df_grilled$cal_fat_to_cal_ratio, main="Grilled: Fat Calories to Total Calories Ratio Histogram", xlab="Proportion of Calories from Fat")

Visual Analysis for Not Grilled Fast Food Items:

fastfood_df_not <- fastfood_df_new
fastfood_df_not %<>%
    filter(!item %like any% c("%grilled%", "Grilled%", "%Grilled%")) %>%
    mutate(grilled=0) %>%
    mutate(cal_fat_to_cal_ratio=cal_fat/calories)
not_mean_cal <- round(mean(fastfood_df_not$calories), 2)
not_mean_cal_fat <- round(mean(fastfood_df_not$cal_fat), 2)
not_mean_cal_fat_to_cal_ratio <- round(mean(fastfood_df_not$cal_fat_to_cal_ratio), 2)

The remaining 475 fast food items are not indicated to be grilled. Looking at the histograms for these not grilled items, both the total calories and calories from fat distributions are unimodal and right-skewed. The mean calories for not grilled items is 540.38, and the mean calories from fat for not grilled items is 244.25. The mean proportion of calories from fat for not grilled items is 0.43.

hist(fastfood_df_not$calories, main="Not Grilled: Total Calories Histogram", xlab="Calories")

hist(fastfood_df_not$cal_fat, main="Not Grilled: Calories from Fat Histogram", xlab="Calories from Fat")

hist(fastfood_df_not$cal_fat_to_cal_ratio, main="Not Grilled: Fat Calories to Total Calories Ratio Histogram", xlab="Proportion of Calories from Fat")

Comparing Grilled and Not Grilled Fast Food Items:

For the grilled group, the mean calories, calories from fat, and proportion of calories from fat are all lower than the same measurements for the not grilled group. Before determining whether these differences are statistically significant, let’s examine the variance within both groups.

Side-by-side boxplots of the calories, calories from fat, and proportion of calories from fat for the two groups reveal much more variance within the not grilled group than within the grilled group. Many outliers lie beyond the upper whiskers of each of the boxplots for the not grilled group. For the proportion of calories from fat boxplots, there are also several outliers below the lower whisker for the not grilled group.

fastfood_df_new <- rbind(fastfood_df_grilled, fastfood_df_not)
fastfood_df_new %<>%
    arrange(X)

boxplot(fastfood_df_new$calories ~ fastfood_df_new$grilled, main="Calories by Group", xlab = "Group", ylab="Calories")

boxplot(fastfood_df_new$cal_fat ~ fastfood_df_new$grilled, main="Fat Calories by Group", xlab = "Group", ylab="Calories from Fat")

boxplot(fastfood_df_new$cal_fat_to_cal_ratio ~ fastfood_df_new$grilled, main="Fat Calories to Total Calories Ratio by Group", xlab = "Group", ylab="Proportion of Calories from Fat")
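
The visual impression of unequal spread can also be checked numerically. The sketch below computes the standard deviation and interquartile range per group with tapply(). It runs on simulated values, not the fast food data: the data frame `sim_df`, the seed, and the distribution parameters are all assumptions chosen only to give one right-skewed group and one roughly symmetric group.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for the real data: group sizes match the analysis,
# but the calorie values are randomly generated.
sim_df <- data.frame(
    grilled = rep(c(0, 1), times = c(475, 39)),
    calories = c(rgamma(475, shape = 4, scale = 135),  # right-skewed group
                 rnorm(39, mean = 430, sd = 150))      # roughly symmetric group
)

# Spread within each group: standard deviation and interquartile range
spread_sd <- tapply(sim_df$calories, sim_df$grilled, sd)
spread_iqr <- tapply(sim_df$calories, sim_df$grilled, IQR)

print(round(rbind(sd = spread_sd, IQR = spread_iqr), 1))
```

Replacing `sim_df` with `fastfood_df_new` would give the corresponding figures for the real groups.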

Statistical Significance Testing:

Because the data for the not grilled group are right-skewed rather than normally distributed, t-tests might not be the most appropriate tests of statistical significance here. The variance for the grilled group is also not equal to the variance for the not grilled group, although the Welch two-sample t-test that t.test() performs by default does not assume equal variances. Nonetheless, t-tests are the tests I have performed.

t.test(fastfood_df_new$calories ~ fastfood_df_new$grilled, alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  fastfood_df_new$calories by fastfood_df_new$grilled
## t = 3.3007, df = 51.906, p-value = 0.0008743
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
##  56.01532      Inf
## sample estimates:
## mean in group 0 mean in group 1 
##        540.3789        426.6667
t.test(fastfood_df_new$cal_fat ~ fastfood_df_new$grilled, alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  fastfood_df_new$cal_fat by fastfood_df_new$grilled
## t = 4.541, df = 57.855, p-value = 1.445e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
##  50.80645      Inf
## sample estimates:
## mean in group 0 mean in group 1 
##        244.2505        163.8462
t.test(fastfood_df_new$cal_fat_to_cal_ratio ~ fastfood_df_new$grilled, alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  fastfood_df_new$cal_fat_to_cal_ratio by fastfood_df_new$grilled
## t = 3.5812, df = 46.172, p-value = 0.0004094
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
##  0.0374042       Inf
## sample estimates:
## mean in group 0 mean in group 1 
##       0.4330726       0.3626711

Conclusions:

The two-sample, one-tailed t-tests indicate that the observed differences in mean total calories, mean calories from fat, and mean proportion of calories from fat between grilled and not grilled fast food items are statistically significant, with all three p-values below 0.001. A fast food item being indicated as “grilled” on the menu is associated with fewer total calories, fewer calories from fat, and a lower percentage of calories from fat. So I reject the null hypothesis, with the caveat that better, alternative significance testing could probably be performed.
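
One candidate for such alternative testing is the Wilcoxon rank-sum test, which does not assume normality. It is a different technique from the t-tests performed above: it compares the locations of the two distributions rather than their means, so it answers a slightly different question. The sketch below only shows the call. It runs on simulated values, not the fast food data; the data frame `sim_df`, the seed, and the distribution parameters are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for the real data: group sizes match the analysis,
# but the calorie values are randomly generated.
sim_df <- data.frame(
    grilled = rep(c(0, 1), times = c(475, 39)),
    calories = c(rgamma(475, shape = 4, scale = 135),  # right-skewed group
                 rnorm(39, mean = 430, sd = 150))      # roughly symmetric group
)

# One-sided rank-sum test: is group 0 (not grilled) shifted toward
# larger values than group 1 (grilled)?
res <- wilcox.test(calories ~ grilled, data = sim_df, alternative = "greater")
print(res)
```

As with the t.test() calls above, `alternative="greater"` refers to the first group level (0, not grilled) relative to the second (1, grilled).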