Assignment 12

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

load("/Users/towanimwansa/Downloads/fastfood.rda")
tidy.fastfood<-fastfood

###Question 1: Using tidyverse commands discussed in class, subset just the rows corresponding to food items sold at Arby’s, Subway, and Taco Bell. Share your code here.

target<-c("Arbys", "Subway", "Taco Bell")
fastfood.slim<-filter(tidy.fastfood, restaurant %in% target)
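
To see how the `%in%` filter behaves, here is a minimal sketch on a small made-up tibble. The `toy` data, its values, and the name `toy.slim` are hypothetical and only stand in for the real fastfood dataset.

```r
library(dplyr)

# Hypothetical toy data (not taken from fastfood.rda)
toy <- tibble(
  restaurant = c("Arbys", "Mcdonalds", "Subway", "Taco Bell", "Sonic"),
  calories   = c(500, 540, 450, 400, 1280)
)

target <- c("Arbys", "Subway", "Taco Bell")

# Keep only the rows whose restaurant matches one of the target names
toy.slim <- filter(toy, restaurant %in% target)
toy.slim
```

Rows for restaurants outside `target` are dropped, which is the same behavior relied on to build `fastfood.slim`.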

###Question 2: Using the mutate() function on the full dataset (not the one created in Question 1), create a column that subtracts the calories from fat from the total calorie content of each food item.

mutate(tidy.fastfood, Calories.nofat = calories - cal_fat)

## # A tibble: 515 x 18
##    restaurant item      calories cal_fat total_fat sat_fat trans_fat cholesterol
##    <chr>      <chr>        <dbl>   <dbl>     <dbl>   <dbl>     <dbl>       <dbl>
##  1 Mcdonalds  Artisan …      380      60         7       2       0            95
##  2 Mcdonalds  Single B…      840     410        45      17       1.5         130
##  3 Mcdonalds  Double B…     1130     600        67      27       3           220
##  4 Mcdonalds  Grilled …      750     280        31      10       0.5         155
##  5 Mcdonalds  Crispy B…      920     410        45      12       0.5         120
##  6 Mcdonalds  Big Mac        540     250        28      10       1            80
##  7 Mcdonalds  Cheesebu…      300     100        12       5       0.5          40
##  8 Mcdonalds  Classic …      510     210        24       4       0            65
##  9 Mcdonalds  Double C…      430     190        21      11       1            85
## 10 Mcdonalds  Double Q…      770     400        45      21       2.5         175
## # … with 505 more rows, and 10 more variables: sodium <dbl>, total_carb <dbl>,
## #   fiber <dbl>, sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>,
## #   calcium <dbl>, salad <chr>, Calories.nofat <dbl>

###Question 3: Share what commands you would use to select just the restaurant, item, and calories columns AND only include the food items with calorie counts > 1000. Do this all using piping.

tidy.fastfood%>%
  select(restaurant, item, calories)%>%
  filter(calories>1000)

## # A tibble: 28 x 3
##    restaurant item                                             calories
##    <chr>      <chr>                                               <dbl>
##  1 Mcdonalds  Double Bacon Smokehouse Burger                       1130
##  2 Mcdonalds  10 piece Buttermilk Crispy Chicken Tenders           1210
##  3 Mcdonalds  12 piece Buttermilk Crispy Chicken Tenders           1510
##  4 Mcdonalds  20 piece Buttermilk Crispy Chicken Tenders           2430
##  5 Mcdonalds  40 piece Chicken McNuggets                           1770
##  6 Mcdonalds  10 piece Sweet N' Spicy Honey BBQ Glazed Tenders     1600
##  7 Sonic      Super Sonic Bacon Double Cheeseburger (w/mayo)       1280
##  8 Sonic      Super Sonic Double Cheeseburger W/ Mustard           1120
##  9 Sonic      Super Sonic Double Cheeseburger W/ Ketchup           1130
## 10 Sonic      Super Sonic Double Cheeseburger W/ Mayo              1220
## # … with 18 more rows

###Question 4: Using piping, the group_by function, and the summarise function, compute the average calorie content, the standard deviation of the calorie content, and the sample size for Arby’s, Subway, and Taco Bell separately. Use the dataset you created in Question 1. Share your code here and fill in the following table with your results:

fastfood.slim%>%
  group_by(restaurant) %>%
  summarise(Ave.Calories = mean(calories), Sd.Calories = sd(calories), Sample.size = length(restaurant))

## # A tibble: 3 x 4
##   restaurant Ave.Calories Sd.Calories Sample.size
##   <chr>             <dbl>       <dbl>       <int>
## 1 Arbys              533.        210.          55
## 2 Subway             503.        282.          96
## 3 Taco Bell          444.        184.         115
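
As a side note, dplyr's `n()` is an alternative to `length(restaurant)` for counting the rows in each group. The sketch below uses a hypothetical toy tibble (made-up values) to show the same group_by/summarise pattern with `n()`.

```r
library(dplyr)

# Hypothetical toy data (values are made up for illustration)
toy <- tibble(
  restaurant = c("Arbys", "Arbys", "Subway", "Subway", "Subway"),
  calories   = c(400, 600, 300, 500, 700)
)

toy.summary <- toy %>%
  group_by(restaurant) %>%
  summarise(
    Ave.Calories = mean(calories),  # group mean
    Sd.Calories  = sd(calories),    # group standard deviation
    Sample.size  = n()              # number of rows in the group
  )
toy.summary
```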

###Question 5: What are the hypotheses for this ANOVA test?

H0: The average calorie content of food items is the same at Arby’s, Subway, and Taco Bell.

Ha: The average calorie content of at least one of these restaurants differs from the others.

###Question 6: Create histograms of the calorie contents of each restaurant. Evaluate how normal the distribution of calories looks for each restaurant.

arbys.calories<-filter(fastfood.slim, restaurant == "Arbys")
hist(arbys.calories$calories, main = "Histogram of Arbys Calorie Content",
     xlab = "Calories",
     breaks = 8)

subway.calories<-filter(fastfood.slim, restaurant == "Subway")
hist(subway.calories$calories, main = "Histogram of Subway Calorie Content",
     xlab = "Calories",
     breaks = 10)

tb.calories<-filter(fastfood.slim, restaurant == "Taco Bell")
hist(tb.calories$calories, main = "Histogram of Taco Bell Calorie Content",
     xlab = "Calories",
     breaks = 10)

Arby’s - The histogram could be considered approximately normal. Subway - The histogram is right skewed. Taco Bell - The histogram is slightly right skewed.

Two of the three groups look non-normal, so we also need to consider the sample sizes. Every group has more than 30 items (55, 96, and 115), so by the Central Limit Theorem the sampling distribution of each group mean should be approximately normal, and the normality condition is reasonably met.
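
The sample-size argument can be illustrated with a small simulation. This is only a sketch under assumptions: the population here is a hypothetical right-skewed exponential distribution with mean 500, not the fastfood data, and `n.items` is set to 55 to match the smallest group.

```r
set.seed(1)

# Hypothetical right-skewed population (exponential), not the fastfood data
n.items <- 55     # size of each simulated sample (smallest group size above)
n.reps  <- 2000   # number of simulated samples

# Draw many samples and record the mean of each one
sample.means <- replicate(n.reps, mean(rexp(n.items, rate = 1 / 500)))

# The sample means should look far more symmetric than the raw values
hist(sample.means,
     main = "Simulated sample means (n = 55)",
     xlab = "Sample mean")
```

Even though individual draws are strongly skewed, the histogram of sample means should look roughly bell-shaped.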

###Question 7: Create side-by-side boxplots of the calorie contents of each restaurant. Evaluate the constant variance assumption based on these boxplots. Feel free to reference the standard deviations you computed in Question 4.

ggplot(fastfood.slim, aes(x = restaurant, y = calories)) + geom_boxplot()

From the side-by-side boxplots, the calorie distributions of the three restaurants overlap substantially, so the typical calorie content does not look very different between them. The spread is a different matter. The standard deviations from Question 4 are about 210 for Arbys, 282 for Subway, and 184 for Taco Bell, and the boxplots likewise show that the groups do not all have the same spread. I therefore consider the constant variance condition not met.

###Question 8: Using the above code (but filling in the appropriate variable names), carry out an ANOVA test using a significance level of 0.05. Fill in the following table with your results AND state your conclusions.

results <- aov(calories ~ restaurant, data = fastfood.slim)
summary(results)

##              Df   Sum Sq Mean Sq F value Pr(>F)  
## restaurant    2   352468  176234   3.351 0.0365 *
## Residuals   263 13829781   52585                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the results of the ANOVA test, the p-value is 0.0365, which is less than our significance level of 0.05, so we reject the null hypothesis. This suggests that at least one of the three restaurants has a different average calorie content. However, because the constant variance assumption is not met, this result should be interpreted with caution.
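
As a check on where the F value and p-value come from, the arithmetic can be redone by hand. The sketch below uses only the sums of squares and degrees of freedom copied from the ANOVA table above; the variable names are my own.

```r
# Values copied from the ANOVA table above
ss.between <- 352468     # Sum Sq for restaurant
ss.within  <- 13829781   # Sum Sq for Residuals
df.between <- 2
df.within  <- 263

# Mean squares are sums of squares divided by their degrees of freedom
ms.between <- ss.between / df.between
ms.within  <- ss.within / df.within

# F statistic and its upper-tail p-value
f.stat  <- ms.between / ms.within
p.value <- pf(f.stat, df.between, df.within, lower.tail = FALSE)

round(f.stat, 3)
round(p.value, 4)
```

These should reproduce the F value and Pr(>F) reported by `summary(results)`, up to rounding.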

###Question 9: Carry out a pairwise t-test using a Bonferroni correction. Share your results here and if you rejected any of the pairwise t-tests.

pairwise.t.test(fastfood.slim$calories, fastfood.slim$restaurant, p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  fastfood.slim$calories and fastfood.slim$restaurant 
## 
##           Arbys Subway
## Subway    1.000 -     
## Taco Bell 0.056 0.187 
## 
## P value adjustment method: bonferroni

From the pairwise tests, all three Bonferroni-adjusted p-values are above our significance level of 0.05 (Arbys vs. Subway: 1.000; Arbys vs. Taco Bell: 0.056; Subway vs. Taco Bell: 0.187). Therefore we fail to reject the null hypothesis in each of the three comparisons. In context, we do not have enough evidence of a difference in average calorie content between menu items at Arbys and Subway, at Taco Bell and Subway, or at Arbys and Taco Bell.
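
For reference, the Bonferroni correction multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. The sketch below uses hypothetical unadjusted p-values (not the ones from this analysis) to show that `p.adjust()` matches the manual calculation.

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
raw.p <- c(0.02, 0.06, 0.50)

# Built-in Bonferroni adjustment
adj.p <- p.adjust(raw.p, method = "bonferroni")

# Manual version: multiply by the number of tests, cap at 1
manual.p <- pmin(1, raw.p * length(raw.p))

adj.p
all.equal(adj.p, manual.p)
```

With three comparisons, an unadjusted p-value has to be below about 0.0167 for the adjusted value to fall under 0.05.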

Because the ANOVA and the pairwise tests disagree, I would be more comfortable trusting the pairwise tests: the conditions for the ANOVA were not completely met, so its result may not be reliable.
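
When the constant variance condition is in doubt, one option that was not part of this assignment is Welch's one-way test, available in base R as `oneway.test()` with `var.equal = FALSE`. The sketch below runs it on simulated data. The `toy` data frame, group labels, and all of its values are hypothetical, and nothing here is a result for the fastfood data.

```r
set.seed(2)

# Simulated data with unequal spreads across three groups (hypothetical)
toy <- data.frame(
  restaurant = factor(rep(c("A", "B", "C"), times = c(55, 96, 115))),
  calories   = c(rnorm(55,  mean = 530, sd = 210),
                 rnorm(96,  mean = 500, sd = 280),
                 rnorm(115, mean = 445, sd = 185))
)

# Welch's one-way test does not assume equal variances across groups
welch <- oneway.test(calories ~ restaurant, data = toy, var.equal = FALSE)
welch
```

The same call with `data = fastfood.slim` would be the analogous check on the real data.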

Assignment 12 - Anova

Towani Mwansa

5/1/2021