library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ dplyr 1.0.5
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
load("/Users/towanimwansa/Downloads/fastfood.rda")
tidy.fastfood<-fastfood
###Question 1: Using tidyverse commands discussed in class, subset just the rows corresponding to food items sold at Arby’s, Subway, and Taco Bell. Share your code here.
target<-c("Arbys", "Subway", "Taco Bell")
fastfood.slim<-filter(tidy.fastfood, restaurant %in% target)
###Question 2: Using the mutate() function on the full dataset (not the one created in Question 1), create a column that subtracts the calories from fat from the total calorie content of each food item.
mutate(tidy.fastfood, Calories.nofat = calories - cal_fat)
## # A tibble: 515 x 18
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Artisan … 380 60 7 2 0 95
## 2 Mcdonalds Single B… 840 410 45 17 1.5 130
## 3 Mcdonalds Double B… 1130 600 67 27 3 220
## 4 Mcdonalds Grilled … 750 280 31 10 0.5 155
## 5 Mcdonalds Crispy B… 920 410 45 12 0.5 120
## 6 Mcdonalds Big Mac 540 250 28 10 1 80
## 7 Mcdonalds Cheesebu… 300 100 12 5 0.5 40
## 8 Mcdonalds Classic … 510 210 24 4 0 65
## 9 Mcdonalds Double C… 430 190 21 11 1 85
## 10 Mcdonalds Double Q… 770 400 45 21 2.5 175
## # … with 505 more rows, and 10 more variables: sodium <dbl>, total_carb <dbl>,
## # fiber <dbl>, sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>,
## # calcium <dbl>, salad <chr>, Calories.nofat <dbl>
###Question 3: Share what commands you would use to select just the restaurant, item, and calories columns AND only include the food items with calorie counts > 1000. Do this all using piping.
tidy.fastfood%>%
select(restaurant, item, calories)%>%
filter(calories>1000)
## # A tibble: 28 x 3
## restaurant item calories
## <chr> <chr> <dbl>
## 1 Mcdonalds Double Bacon Smokehouse Burger 1130
## 2 Mcdonalds 10 piece Buttermilk Crispy Chicken Tenders 1210
## 3 Mcdonalds 12 piece Buttermilk Crispy Chicken Tenders 1510
## 4 Mcdonalds 20 piece Buttermilk Crispy Chicken Tenders 2430
## 5 Mcdonalds 40 piece Chicken McNuggets 1770
## 6 Mcdonalds 10 piece Sweet N' Spicy Honey BBQ Glazed Tenders 1600
## 7 Sonic Super Sonic Bacon Double Cheeseburger (w/mayo) 1280
## 8 Sonic Super Sonic Double Cheeseburger W/ Mustard 1120
## 9 Sonic Super Sonic Double Cheeseburger W/ Ketchup 1130
## 10 Sonic Super Sonic Double Cheeseburger W/ Mayo 1220
## # … with 18 more rows
###Question 4: Using piping, the group_by function, and the summarise function, compute the average calorie content, the standard deviation of the calorie content, and the sample size for Arby’s, Subway, and Taco Bell separately. Use the dataset you created in Question 1. Share your code here and fill in the following table with your results:
fastfood.slim%>%
group_by(restaurant) %>%
summarise(Ave.Calories = mean(calories), Sd.Calories = sd(calories), Sample.size = n())
## # A tibble: 3 x 4
## restaurant Ave.Calories Sd.Calories Sample.size
## <chr> <dbl> <dbl> <int>
## 1 Arbys 533. 210. 55
## 2 Subway 503. 282. 96
## 3 Taco Bell 444. 184. 115
###Question 5: What are the hypotheses for this ANOVA test?
H0: The average calorie content of food items is the same at Arby’s, Subway, and Taco Bell.
Ha: At least one restaurant’s average calorie content differs from the others.
###Question 6: Create histograms of the calorie contents of each restaurant. Evaluate how normal the distribution of calories looks for each restaurant.
arbys.calories<-filter(fastfood.slim, restaurant == "Arbys")
hist(arbys.calories$calories, main = "Histogram of Arbys Calorie Content",
xlab = "Calories",
breaks = 8)
subway.calories<-filter(fastfood.slim, restaurant == "Subway")
hist(subway.calories$calories, main = "Histogram of Subway Calorie Content",
xlab = "Calories",
breaks = 10)
tb.calories<-filter(fastfood.slim, restaurant == "Taco Bell")
hist(tb.calories$calories, main = "Histogram of Taco Bell Calorie Content",
xlab = "Calories",
breaks = 10)
Arby’s - The histogram could be considered approximately normal. Subway - The histogram is right skewed. Taco Bell - The histogram is slightly right skewed.
When assessing the normality of our data, given that two of the groups were non-normal, we also have to look at the sample size. For this set of data, all the groups had a sample size of more than 30 (55, 96, and 115), so by the central limit theorem the sampling distribution of each group mean can still be treated as approximately normal.
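The sample-size argument above can be illustrated with a small simulation. This is only a sketch: the right-skewed gamma population, its parameters, and the helper `skewness()` are assumptions made for illustration and are not taken from the fastfood data. It compares the skew of individual values with the skew of means of samples of size 55, the smallest group size in Question 4.

```r
# Illustrative simulation only: the gamma population below is an assumption,
# not the fastfood calorie data.
set.seed(42)

# Simple moment-based skewness (about 0 for a symmetric distribution).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

n_items <- 55    # smallest group size reported in Question 4
n_reps  <- 5000  # number of simulated samples

# One large right-skewed sample, to show the skew of individual values.
raw_values <- rgamma(n_reps, shape = 2, scale = 250)

# Means of many samples of size n_items drawn from the same population.
sample_means <- replicate(n_reps, mean(rgamma(n_items, shape = 2, scale = 250)))

skew_raw   <- skewness(raw_values)
skew_means <- skewness(sample_means)

# The means should be far less skewed than the individual values.
print(c(raw = skew_raw, means = skew_means))
```

If the skewness of the sample means is much closer to 0 than that of the raw values, that is consistent with relying on the sample size rather than on the shape of each histogram.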
###Question 7: Create side-by-side boxplots of the calorie contents of each restaurant. Evaluate the constant variance assumption based on these boxplots. Feel free to reference the standard deviations you computed in Question 4.
ggplot(fastfood.slim, aes(x = restaurant, y = calories)) + geom_boxplot()
From the side-by-side boxplots, there does not appear to be a great deal of difference in typical caloric content between the restaurants. Furthermore, looking at the standard deviations calculated in Q4 (210, 282, and 184), there is considerable overlap between the 3 restaurants when we take the SD into account. However, the spread itself is not the same across the groups, with Subway's standard deviation noticeably larger than Taco Bell's, so the constant variance condition is not met.
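As a rough numeric companion to the boxplots, the ratio of the largest to the smallest group standard deviation can be computed from the Question 4 table. This is a sketch using the rounded values printed there; how large a ratio is tolerable is a judgment call, and the variable names are chosen here for illustration.

```r
# Standard deviations as printed (rounded) in the Question 4 summary table.
sd_calories <- c(Arbys = 210, Subway = 282, `Taco Bell` = 184)

# Ratio of the largest to the smallest group standard deviation.
sd_ratio <- max(sd_calories) / min(sd_calories)

# Which groups sit at the two extremes.
widest    <- names(which.max(sd_calories))
narrowest <- names(which.min(sd_calories))

cat(sprintf("%s / %s SD ratio: %.2f\n", widest, narrowest, sd_ratio))
```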
###Question 8: Using the above code (but filling in the appropriate variable names), carry out an ANOVA test using a significance level of 0.05. Fill in the following table with your results AND state your conclusions.
results <- aov(calories ~ restaurant, data = fastfood.slim)
summary(results)
## Df Sum Sq Mean Sq F value Pr(>F)
## restaurant 2 352468 176234 3.351 0.0365 *
## Residuals 263 13829781 52585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results of the ANOVA test, we see that the p-value is 0.0365, which is less than our significance level of 0.05, so we reject the null hypothesis. This means that at least one restaurant’s average caloric content differs from the others. However, given that the constant variance assumption isn’t met, we should take this result with a grain of salt.
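The F value and p-value in the table can be rebuilt by hand from the sums of squares and degrees of freedom, which makes it clearer where 0.0365 comes from. The sketch below copies the numbers from the `summary(results)` output; the variable names are illustrative. The residual degrees of freedom, 263, are the total sample size (55 + 96 + 115 = 266) minus the 3 groups.

```r
# Values copied from the summary(results) table above.
ss_between <- 352468;   df_between <- 2
ss_within  <- 13829781; df_within  <- 263

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# The F statistic is the ratio of the two mean squares.
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```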
###Question 9: Carry out a pairwise t-test using a Bonferroni correction. Share your results here and if you rejected any of the pairwise t-tests.
pairwise.t.test(fastfood.slim$calories, fastfood.slim$restaurant, p.adjust.method = "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: fastfood.slim$calories and fastfood.slim$restaurant
##
## Arbys Subway
## Subway 1.000 -
## Taco Bell 0.056 0.187
##
## P value adjustment method: bonferroni
From the pairwise test, the resulting p-values are all above our significance level of 0.05. Therefore, we do not have enough evidence to reject the null hypothesis in any of the 3 comparisons. In context, this means that we do not have enough evidence to conclude that mean caloric content differs between menu items at Arbys and Subway, between menu items at Taco Bell and Subway, or between menu items at Arbys and Taco Bell.
Given that the ANOVA and the pairwise test give different results, I would be more comfortable trusting the pairwise test because the conditions for the ANOVA test were not completely met, so it might not be an accurate measure of comparison.
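To see how much the Bonferroni correction contributes to these p-values, the adjustment can be undone approximately. With 3 comparisons, the Bonferroni-adjusted p-value is the raw p-value multiplied by 3 and capped at 1. The sketch below backs out approximate raw p-values from the two adjusted values that are below 1 (the Arbys vs Subway value is capped at 1.000, so its raw value cannot be recovered). The results are approximate because the printed values are rounded.

```r
# Adjusted p-values copied from the pairwise.t.test output above.
adjusted <- c(`Arbys vs Taco Bell` = 0.056, `Subway vs Taco Bell` = 0.187)
n_tests  <- 3  # number of pairwise comparisons among 3 restaurants

# Bonferroni multiplies each raw p-value by the number of tests (capped at 1),
# so dividing gives an approximate raw p-value.
raw_approx <- adjusted / n_tests

# Re-applying the correction with p.adjust() should return the printed values.
readjusted <- p.adjust(raw_approx, method = "bonferroni", n = n_tests)

print(round(raw_approx, 4))
print(raw_approx < 0.05)  # which comparisons would pass without the correction
```

Comparing `raw_approx` with 0.05 shows which comparisons are sensitive to the correction, which helps explain how the overall ANOVA can reject while no adjusted pairwise test does.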
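One caveat worth checking: the pairwise output above is labeled "t tests with pooled SD", so it also relies on a common variance across groups. A possible sensitivity check, sketched here under assumptions, is to rerun both analyses without that assumption using Welch's one-way test (`oneway.test()` with `var.equal = FALSE`) and pairwise t-tests with `pool.sd = FALSE`. The data frame `toy` below is simulated purely so the example runs on its own; with the real data, `fastfood.slim` would take its place.

```r
# Simulated stand-in data (an assumption for illustration, not the real dataset).
set.seed(1)
toy <- data.frame(
  restaurant = rep(c("A", "B", "C"), times = c(20, 30, 40)),
  calories   = c(rnorm(20, mean = 530, sd = 210),
                 rnorm(30, mean = 500, sd = 280),
                 rnorm(40, mean = 445, sd = 185))
)

# Welch's one-way test: compares group means without assuming equal variances.
welch <- oneway.test(calories ~ restaurant, data = toy, var.equal = FALSE)

# Pairwise Welch t-tests (separate SDs per pair) with a Bonferroni correction.
pw <- pairwise.t.test(toy$calories, toy$restaurant,
                      p.adjust.method = "bonferroni", pool.sd = FALSE)

print(welch)
print(pw)
```

If the conclusions from these versions match the pooled-variance results, the unequal spreads are less of a concern; if they differ, the Welch versions are the safer ones to report.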