The nutritional content (calories, calories from fat, grams of protein, etc.) of 515 fast food items from various chain restaurants was measured and recorded. Below, we load this fast food data set and the packages required for analysis.
#Data
my_url <- "https://raw.githubusercontent.com/geedoubledee/R/main/fastfood.csv"
fastfood_df <- read.csv(file=my_url, header=TRUE, stringsAsFactors=FALSE)
head(fastfood_df)
##   X restaurant                                      item calories cal_fat
## 1 1  Mcdonalds          Artisan Grilled Chicken Sandwich      380      60
## 2 2  Mcdonalds            Single Bacon Smokehouse Burger      840     410
## 3 3  Mcdonalds            Double Bacon Smokehouse Burger     1130     600
## 4 4  Mcdonalds Grilled Bacon Smokehouse Chicken Sandwich      750     280
## 5 5  Mcdonalds  Crispy Bacon Smokehouse Chicken Sandwich      920     410
## 6 6  Mcdonalds                                   Big Mac      540     250
##   total_fat sat_fat trans_fat cholesterol sodium total_carb fiber sugar protein
## 1         7       2       0.0          95   1110         44     3    11      37
## 2        45      17       1.5         130   1580         62     2    18      46
## 3        67      27       3.0         220   1920         63     3    18      70
## 4        31      10       0.5         155   1940         62     2    18      55
## 5        45      12       0.5         120   1980         81     4    18      46
## 6        28      10       1.0          80    950         46     3     9      25
##   vit_a vit_c calcium salad
## 1     4    20      20 Other
## 2     6    20      20 Other
## 3    10    20      50 Other
## 4     6    25      20 Other
## 5     6    20      20 Other
## 6    10     2      15 Other
#Packages
library(knitr)
library(rmarkdown)
library(magrittr)
library(plyr)
library(dplyr)
library(ggplot2)
library(DescTools)
Grilled food is generally assumed to be more healthful for people to consume than food prepared in ways that require more oil, such as pan-frying or deep-frying. Many fast food items indicate that they are grilled right in their name.
Do these grilled fast food items actually contain fewer total calories or fewer calories from fat than other fast food items? If we look at calories from fat as a percentage of total calories, is the percentage lower for grilled fast food items?
Please note that these questions apply only to fast food items in the data set that use a variation of “grilled” in their names, not all the fast food items in the data set that may actually be prepared that way. The data set does not include data on the actual preparation of these menu items.
Null Hypothesis: Grilled fast food items do not have fewer total calories, fewer calories from fat, or a lower percentage of calories from fat than other fast food items.
Alternative Hypothesis: Grilled fast food items have fewer total calories, fewer calories from fat, or a lower percentage of calories from fat than other fast food items.
There appears to be one error in the recorded data that must be addressed before going any further. The Ultimate Chicken Club from Sonic is listed as having more calories from fat than it has total calories, so I have decided to exclude it from all analysis, reducing the total observations to 514.
fastfood_df_new <- fastfood_df
fastfood_df_new %<>%
  filter(X != 128) # remove the Ultimate Chicken Club from Sonic
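A check along the following lines could surface records of this kind. This is only a minimal sketch on a small made-up data frame: the `toy` values are hypothetical and not taken from the data set, and only the column names `X`, `item`, `calories`, and `cal_fat` mirror the data above.

```r
# Hypothetical toy data; only the column names mirror fastfood_df
toy <- data.frame(
  X = 1:4,
  item = c("A", "B", "C", "D"),
  calories = c(380, 840, 100, 540),
  cal_fat = c(60, 410, 150, 250),
  stringsAsFactors = FALSE
)

# Calories from fat cannot exceed total calories, so flag such rows
impossible <- toy[toy$cal_fat > toy$calories, ]
print(impossible)

# Drop the flagged rows by their identifier
toy_clean <- toy[!(toy$X %in% impossible$X), ]
print(nrow(toy_clean))
```

Filtering on the logical condition rather than on a hard-coded row number would keep the exclusion correct even if the row identifiers in the source file changed.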
Before filtering the data into grilled and not grilled groups, a simple scatterplot of the total calories and calories from fat data for all fast food items reveals a strong, positive, linear relationship between these variables. As an item’s calories increase, its calories from fat tend to increase as well.
plot(fastfood_df_new$calories, fastfood_df_new$cal_fat, main="Scatterplot of Total Calories and Calories from Fat for All Fast Food Items", xlab="Calories", ylab="Calories from Fat")
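The strength of a linear relationship like this one can be quantified with Pearson's correlation coefficient. The sketch below uses simulated values, so the numbers it produces say nothing about the fast food data. On the real data, the equivalent call would presumably be `cor(fastfood_df_new$calories, fastfood_df_new$cal_fat)`.

```r
# Simulated (hypothetical) values standing in for the two columns
set.seed(1)
calories <- runif(50, min = 100, max = 1500)
cal_fat <- 0.45 * calories + rnorm(50, sd = 40)

# Pearson's r: values near 1 indicate a strong positive linear relationship
r <- cor(calories, cal_fat)
print(round(r, 3))
```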
There are some noticeable outliers. The most extreme of these contains far more total calories and calories from fat than any other item. That item is the 20 piece Buttermilk Crispy Chicken Tenders from McDonald's, which contains 2,430 total calories, 1,270 of which are from fat.
# Keep only items whose names contain a variation of "grilled"
fastfood_df_grilled <- fastfood_df_new
fastfood_df_grilled %<>%
  filter(item %like any% c("%grilled%", "Grilled%", "%Grilled%")) %>%
  mutate(grilled=1) %>%
  mutate(cal_fat_to_cal_ratio=cal_fat/calories)
# Group means, rounded to two decimal places
grilled_mean_cal <- round(mean(fastfood_df_grilled$calories), 2)
grilled_mean_cal_fat <- round(mean(fastfood_df_grilled$cal_fat), 2)
grilled_mean_cal_fat_to_cal_ratio <- round(mean(fastfood_df_grilled$cal_fat_to_cal_ratio), 2)
After filtering the data, 39 out of 514 fast food items are indicated to be grilled. Looking at the histograms, both the total calories and calories from fat for these grilled items are for practical purposes unimodal and nearly normally distributed. The mean calories for grilled items is 426.67, and the mean calories from fat for grilled items is 163.85. The mean proportion of calories from fat for grilled items is 0.36.
hist(fastfood_df_grilled$calories, main="Grilled: Total Calories Histogram", xlab="Calories")
hist(fastfood_df_grilled$cal_fat, main="Grilled: Calories from Fat Histogram", xlab="Calories from Fat")
hist(fastfood_df_grilled$cal_fat_to_cal_ratio, main="Grilled: Fat Calories to Total Calories Ratio Histogram", xlab="Proportion of Calories from Fat")
# Keep only items whose names do not contain a variation of "grilled"
fastfood_df_not <- fastfood_df_new
fastfood_df_not %<>%
  filter(!item %like any% c("%grilled%", "Grilled%", "%Grilled%")) %>%
  mutate(grilled=0) %>%
  mutate(cal_fat_to_cal_ratio=cal_fat/calories)
# Group means, rounded to two decimal places
not_mean_cal <- round(mean(fastfood_df_not$calories), 2)
not_mean_cal_fat <- round(mean(fastfood_df_not$cal_fat), 2)
not_mean_cal_fat_to_cal_ratio <- round(mean(fastfood_df_not$cal_fat_to_cal_ratio), 2)
The remaining 475 fast food items are not indicated to be grilled. Looking at the histograms for these not grilled items, both the total calories and calories from fat distributions are unimodal and right-skewed. The mean calories for not grilled items is 540.38, and the mean calories from fat for not grilled items is 244.25. The mean proportion of calories from fat for not grilled items is 0.43.
hist(fastfood_df_not$calories, main="Not Grilled: Calories Histogram", xlab="Calories")
hist(fastfood_df_not$cal_fat, main="Not Grilled: Calories from Fat Histogram", xlab="Calories from Fat")
hist(fastfood_df_not$cal_fat_to_cal_ratio, main="Not Grilled: Fat Calories to Total Calories Ratio Histogram", xlab="Proportion of Calories from Fat")
For the grilled group, the mean calories, calories from fat, and proportion of calories from fat are all lower than the same measurements for the not grilled group. Before determining whether these differences are statistically significant, let’s examine the variance within both groups.
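The same group means could also be computed in a single pass instead of from two filtered copies of the data. The sketch below is one possible approach, not the method used above: it swaps in base R's `grepl()` with `ignore.case = TRUE` for the `%like any%` filter, and it runs on a made-up `toy` data frame whose values are hypothetical.

```r
# Hypothetical toy data; only the column names mirror fastfood_df
toy <- data.frame(
  item = c("Grilled Chicken Sandwich", "Big Burger",
           "Chargrilled Wrap", "Crispy Tenders"),
  calories = c(400, 800, 300, 1000),
  cal_fat = c(100, 400, 90, 500),
  stringsAsFactors = FALSE
)

# Flag items with "grilled" anywhere in the name, ignoring case
toy$grilled <- as.integer(grepl("grilled", toy$item, ignore.case = TRUE))
toy$cal_fat_to_cal_ratio <- toy$cal_fat / toy$calories

# Mean of each measurement within each group
group_means <- aggregate(
  cbind(calories, cal_fat, cal_fat_to_cal_ratio) ~ grilled,
  data = toy,
  FUN = mean
)
print(group_means)
```

A substring match like this also catches names such as "Chargrilled", which may or may not be desirable depending on how the grilled group is meant to be defined.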
Side-by-side boxplots comparing the distributions of calories, calories from fat, and proportion of calories from fat between the two groups reveal much more variance within the not grilled group than within the grilled group. Many outliers lie beyond the upper whiskers of each of the boxplots for the not grilled group. For the proportion of calories from fat boxplots, there are also several outliers below the lower whisker for the not grilled group.
fastfood_df_new <- rbind(fastfood_df_grilled, fastfood_df_not)
fastfood_df_new %<>%
arrange(X)
boxplot(fastfood_df_new$calories ~ fastfood_df_new$grilled, main="Calories by Group", xlab = "Group", ylab="Calories")
boxplot(fastfood_df_new$cal_fat ~ fastfood_df_new$grilled, main="Fat Calories by Group", xlab = "Group", ylab="Calories from Fat")
boxplot(fastfood_df_new$cal_fat_to_cal_ratio ~ fastfood_df_new$grilled, main="Fat Calories to Total Calories Ratio by Group", xlab = "Group", ylab="Proportion of Calories from Fat")
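The difference in spread that the boxplots show could also be put into numbers, for example with the standard deviation and interquartile range of each group. The following is a rough sketch on simulated data: the `toy` data frame and its distributions are assumptions chosen for illustration, not values from the fast food data.

```r
# Simulated (hypothetical) groups: a right-skewed group 0 and a roughly normal group 1
set.seed(42)
toy <- data.frame(
  grilled = rep(c(0, 1), times = c(60, 20)),
  calories = c(rgamma(60, shape = 3, scale = 180),
               rnorm(20, mean = 430, sd = 90))
)

# Spread of calories within each group
sd_by_group <- tapply(toy$calories, toy$grilled, sd)
iqr_by_group <- tapply(toy$calories, toy$grilled, IQR)
print(round(sd_by_group, 1))
print(round(iqr_by_group, 1))
```

The interquartile range is less sensitive to outliers than the standard deviation, so reporting both gives a fuller picture when a group has a long upper tail.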
The data for the not grilled group are right-skewed rather than normally distributed, so t-tests might not be the most appropriate tests of statistical significance here. The unequal variances between the groups are less of a problem: R's t.test() defaults to Welch's test, which does not assume equal variances. With those caveats, these are the tests I have performed.
t.test(fastfood_df_new$calories ~ fastfood_df_new$grilled, alternative="greater")
##
## Welch Two Sample t-test
##
## data: fastfood_df_new$calories by fastfood_df_new$grilled
## t = 3.3007, df = 51.906, p-value = 0.0008743
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
## 56.01532 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 540.3789 426.6667
t.test(fastfood_df_new$cal_fat ~ fastfood_df_new$grilled, alternative="greater")
##
## Welch Two Sample t-test
##
## data: fastfood_df_new$cal_fat by fastfood_df_new$grilled
## t = 4.541, df = 57.855, p-value = 1.445e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
## 50.80645 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 244.2505 163.8462
t.test(fastfood_df_new$cal_fat_to_cal_ratio ~ fastfood_df_new$grilled, alternative="greater")
##
## Welch Two Sample t-test
##
## data: fastfood_df_new$cal_fat_to_cal_ratio by fastfood_df_new$grilled
## t = 3.5812, df = 46.172, p-value = 0.0004094
## alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
## 95 percent confidence interval:
## 0.0374042 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 0.4330726 0.3626711
The two-sample, one-tailed t-tests indicate that the observed differences in mean total calories, mean calories from fat, and mean proportion of calories from fat between grilled and not grilled fast food items are statistically significant (all three p-values are below 0.001). There is an association between a fast food item being indicated as “grilled” on the menu and that item containing fewer total calories, fewer calories from fat, and a lower percentage of calories from fat. So I reject the null hypothesis, with the caveat that better, alternative significance testing could probably be performed.
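One candidate for the alternative testing mentioned above is the Wilcoxon rank-sum (Mann-Whitney) test, which does not assume normally distributed data. The choice of this test is an assumption made for this sketch, not something the analysis above performed, and the `toy` data are simulated, so the resulting p-value says nothing about the fast food items.

```r
# Simulated (hypothetical) groups: a right-skewed group 0 and a roughly normal group 1
set.seed(123)
toy <- data.frame(
  grilled = rep(c(0, 1), times = c(100, 30)),
  calories = c(rgamma(100, shape = 3, scale = 180),
               rnorm(30, mean = 350, sd = 80))
)

# One-sided test: are values in group 0 shifted above those in group 1?
result <- wilcox.test(calories ~ grilled, data = toy, alternative = "greater")
print(result$p.value)
```

As with the t-tests, the formula interface compares the first group level (0) against the second (1), so `alternative = "greater"` matches the direction of the alternative hypothesis.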