Week 12 ANOVA HW

Question 1: Using tidyverse commands discussed in class, subset just the rows corresponding to food items sold at Arby’s, Subway, and Taco Bell. Share your code here.

groups_3 <- fastfood %>%
  filter(restaurant %in%
           c("Subway", "Arbys", "Taco Bell"))

Question 2: Using the mutate() function on the full dataset (not the one created in Question 1), create a column that subtracts the calories from fat from the total calorie content of each food item.

# cal_fat holds calories from fat; total_fat is grams of fat, not calories
fastfood <- fastfood %>%
  mutate(cal_noFat = calories - cal_fat)

Question 3: Share what commands you would use to select just the restaurant, item, and calories columns AND only include the food items with calorie counts > 1000. Do this all using piping.

fastfood %>%
  select(restaurant, item, calories) %>%
  filter(restaurant %in%
           c("Subway", "Arbys", "Taco Bell")) %>%
  filter(calories > 1000)

# A tibble: 5 x 3
  restaurant item                                     calories
  <chr>      <chr>                                       <dbl>
1 Arbys      Triple Decker Sandwich                       1030
2 Subway     Footlong Big Hot Pastrami                    1160
3 Subway     Footlong Carved Turkey & Bacon w/ Cheese     1140
4 Subway     Footlong Chicken & Bacon Ranch Melt          1140
5 Subway     Footlong Italian Hero                        1100

Question 4: Using piping, the group_by function, and the summarise function, compute the average calorie content, the standard deviation of the calorie content, and the sample size for Arby’s, Subway, and Taco Bell separately. Use the dataset you created in Question 1. Share your code here and fill in the following table with your results:

# groups_3 is the three-restaurant subset created in Question 1
group <- groups_3 %>%
  group_by(restaurant) %>%
  summarise(mean_cal = mean(calories),
            sd_cal = sd(calories),
            n = n())
group

# A tibble: 3 x 4
  restaurant mean_cal sd_cal     n
  <chr>         <dbl>  <dbl> <int>
1 Arbys          533.   210.    55
2 Subway         503.   282.    96
3 Taco Bell      444.   184.   115

Question 5: What are the hypotheses for this ANOVA test?

H\(_{0}\): The average calorie content is the same across all three restaurants; any observed difference is due to chance. \(\mu_{\text{Arbys}} = \mu_{\text{Subway}} = \mu_{\text{Taco Bell}}\)
H\(_{A}\): The average calorie content differs for at least one of the restaurants.

Question 6: Create histograms of the calorie contents of each restaurant. Evaluate how normal the distribution of calories looks for each restaurant.

The distribution of calorie content at Subway looks normal, with no extreme outliers.

The distribution of calorie content at Arby’s looks normal, with no extreme outliers.

The distribution of calorie content at Taco Bell looks normal, with no extreme outliers.
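The plotting code is not shown in this write-up. A minimal sketch of how the three histograms could be drawn, assuming that `fastfood` is the dataset shipped with the openintro package and that ggplot2 is available (the bin width of 100 calories is an arbitrary choice for illustration):

```r
library(dplyr)
library(ggplot2)

# Assumption: fastfood is the openintro dataset used in class
fastfood <- openintro::fastfood

# Same subset as in Question 1
groups_3 <- fastfood %>%
  filter(restaurant %in% c("Subway", "Arbys", "Taco Bell"))

# One histogram panel per restaurant, stacked so the x-axes line up
hist_plot <- ggplot(groups_3, aes(x = calories)) +
  geom_histogram(binwidth = 100) +
  facet_wrap(~ restaurant, ncol = 1) +
  labs(x = "Calories", y = "Number of items")

print(hist_plot)
```

Stacking the panels in a single column makes it easier to compare shape and spread on a common calorie scale.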

Question 7: Create side-by-side boxplots of the calorie contents of each restaurant. Evaluate the constant variance assumption based on these boxplots. Feel free to reference the standard deviations you computed in Question 4.

The side-by-side boxplots of each restaurant’s calorie content show that the variability is approximately constant across the restaurants. The standard deviations from Question 4 (about 210, 282, and 184) support this: the largest is roughly 1.5 times the smallest.
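The boxplot code is likewise not shown. One possible way to produce side-by-side boxplots, again assuming `fastfood` comes from the openintro package and ggplot2 is installed:

```r
library(dplyr)
library(ggplot2)

# Assumption: fastfood is the openintro dataset used in class
fastfood <- openintro::fastfood

groups_3 <- fastfood %>%
  filter(restaurant %in% c("Subway", "Arbys", "Taco Bell"))

# One box per restaurant; similar box heights suggest similar variability
box_plot <- ggplot(groups_3, aes(x = restaurant, y = calories)) +
  geom_boxplot() +
  labs(x = "Restaurant", y = "Calories")

print(box_plot)
```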

Question 8: Using the above code (but filling in the appropriate variable names), carry out an ANOVA test using a significance level of 0.05. Fill in the following table with your results AND state your conclusions.
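The code the question refers to is not reproduced here. A sketch of a call that produces a table in the format shown below, assuming the course code used base R's `aov()` and that `fastfood` comes from the openintro package:

```r
library(dplyr)

# Assumption: fastfood is the openintro dataset used in class
fastfood <- openintro::fastfood

groups_3 <- fastfood %>%
  filter(restaurant %in% c("Subway", "Arbys", "Taco Bell"))

# One-way ANOVA: calories explained by restaurant
fit <- aov(calories ~ restaurant, data = groups_3)
anova_table <- summary(fit)

print(anova_table)
```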

             Df   Sum Sq Mean Sq F value Pr(>F)  
restaurant    2   352468  176234   3.351 0.0365 *
Residuals   263 13829781   52585                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of 0.0365 is smaller than the significance level of 0.05, so the evidence is strong enough to reject the null hypothesis. That is, the data provide strong evidence that the average calorie content varies across some (or all) restaurants.
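As a rough cross-check, the F statistic can be approximately reconstructed from the summary statistics in Question 4 alone. The sketch below uses the rounded means, standard deviations, and sample sizes from that table, so it will only land close to the reported values (F = 3.351, p = 0.0365), not match them exactly. The variable names are chosen for illustration.

```r
# Rounded summary statistics from Question 4 (order: Arbys, Subway, Taco Bell)
n        <- c(55, 96, 115)
mean_cal <- c(533, 503, 444)
sd_cal   <- c(210, 282, 184)

k <- length(n)   # number of groups
N <- sum(n)      # total sample size

# Overall mean is the sample-size-weighted average of the group means
grand_mean <- sum(n * mean_cal) / N

# Between-group and within-group sums of squares
ss_between <- sum(n * (mean_cal - grand_mean)^2)
ss_within  <- sum((n - 1) * sd_cal^2)

# F = mean square between / mean square within
f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

The degrees of freedom, k - 1 = 2 and N - k = 263, match the Df column of the ANOVA table.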

Question 9: Carry out a pairwise t-test using a Bonferroni correction. Share your results here and if you rejected any of the pairwise t-tests.
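The call that generated the output below is not shown. Based on the `data:` line in that output, it was most likely of the following form. This is a sketch that assumes `fastfood` comes from the openintro package.

```r
library(dplyr)

# Assumption: fastfood is the openintro dataset used in class
fastfood <- openintro::fastfood

groups_3 <- fastfood %>%
  filter(restaurant %in% c("Subway", "Arbys", "Taco Bell"))

# Pairwise t tests with pooled SD; reported p-values are Bonferroni-adjusted
pw <- pairwise.t.test(groups_3$calories, groups_3$restaurant,
                      p.adjust.method = "bonferroni")

print(pw)
```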


    Pairwise comparisons using t tests with pooled SD 

data:  groups_3$calories and groups_3$restaurant 

          Arbys Subway
Subway    1.000 -     
Taco Bell 0.056 0.187 

P value adjustment method: bonferroni

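The p-values reported above are already Bonferroni-adjusted: each raw p-value has been multiplied by the number of comparisons (3) and capped at 1, so they are compared against the original significance level of 0.05. (Equivalently, the unadjusted p-values would be compared against the modified significance level of 0.05/3 = 0.0167.) All three adjusted p-values (1.000, 0.056, and 0.187) are larger than 0.05, so the pairwise t-tests with a Bonferroni correction do not show strong evidence of a difference in average calorie content between any two of these restaurants.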
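The decision rule can be checked with a few lines of arithmetic. The sketch below copies the adjusted p-values from the output above and compares them with 0.05. It also shows the equivalent comparison on the unadjusted scale, where dividing an adjusted p-value by 3 recovers the raw p-value only approximately (and only a lower bound when the adjusted value was capped at 1). The vector names are hypothetical.

```r
# Bonferroni-adjusted p-values copied from the pairwise output
adjusted_p <- c(subway_vs_arbys    = 1.000,
                tacobell_vs_arbys  = 0.056,
                tacobell_vs_subway = 0.187)

alpha <- 0.05
m     <- length(adjusted_p)   # number of pairwise comparisons

# Rule 1: adjusted p-value versus the original significance level
reject <- adjusted_p < alpha

# Rule 2: (approximate) raw p-value versus alpha / m
approx_raw_p <- adjusted_p / m
reject_alt   <- approx_raw_p < alpha / m

print(reject)
print(alpha / m)
```

Both rules lead to the same decisions, which is why the adjusted p-values should not also be compared against 0.0167: that would apply the correction twice.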

Subway \(\stackrel{?}{=}\) Arby’s, Subway \(\stackrel{?}{=}\) Taco Bell, Arby’s \(\stackrel{?}{=}\) Taco Bell

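In conclusion, we did not have sufficient evidence to reject the null hypothesis in any of the three pairwise tests, even though the overall ANOVA in Question 8 rejected the hypothesis that all three means are equal.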

Tully O’Leary

5/1/2021