ANOVA Assignment

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggpubr)
library(rstatix)

## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:stats':
## 
##     filter

carbs_data <- read.csv("/Users/yahavmanor/Desktop/ENP164/carbs_data.csv")
carbs_data

##       Food Serving_Size Carbs
## 1    Pizza          150    40
## 2    Pizza          150    38
## 3    Pasta          200    70
## 4    Pasta          200    74
## 5 Potatoes          180    38
## 6 Potatoes          180    31
## 7   Quinoa          150    39
## 8   Quinoa          150    40

#NUMBER 1: Independence of variables
# The observations are independent of each other: the amount of carbs in one food item does not depend on the amount in any other item, whether it belongs to the same food category (pizza, pasta, potatoes, or quinoa) or to a different one. For example, the carbs in one pizza do not influence the carbs in another pizza or in any pasta, potato, or quinoa item. Therefore, the independence assumption is met.

food <- carbs_data$Food
carbs <- carbs_data$Carbs

#NUMBER 2: Testing for normality of data
shapiro.test(carbs)

## 
##  Shapiro-Wilk normality test
## 
## data:  carbs
## W = 0.72845, p-value = 0.004705

# The p-value (0.004705) is less than 0.05, so we reject the null hypothesis (that the data are normally distributed) in favor of the alternative (that the data are not normally distributed). This makes sense because the test pools all four food categories, whose carb amounts differ considerably (pasta in particular is far higher than the rest), so the combined sample is not expected to follow a single normal distribution. Therefore, the pooled data are not normally distributed.
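
# The ANOVA normality assumption is usually stated for the model residuals rather than for the pooled response, so a complementary check is sketched here. The data frame carbs_df and the names fit and model_residuals are illustrative; the values are re-entered by hand from the printed table above. With only two observations per group this check has very little power, so it should be read as a rough indication only.

```r
# Data re-entered by hand from the printed carbs_data table (assumption:
# this matches the contents of carbs_data.csv).
carbs_df <- data.frame(
  Food  = factor(rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)),
  Carbs = c(40, 38, 70, 74, 38, 31, 39, 40)
)

# Fit the one-way ANOVA and take each observation's deviation from its group mean.
fit <- aov(Carbs ~ Food, data = carbs_df)
model_residuals <- residuals(fit)

# Shapiro-Wilk test on the residuals instead of the raw carbs values.
shapiro.test(model_residuals)
```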

#NUMBER 3: Summary statistics
results <- tapply(carbs, food, mean)
results

##    Pasta    Pizza Potatoes   Quinoa 
##     72.0     39.0     34.5     39.5

# Based on the table of means for each food category, the greatest within-group difference is between the two kinds of potatoes, as the group mean (34.5) is the furthest from its two data points (38 and 31). Between groups, pasta is clearly the outlier in terms of carbs, and this difference will likely show up in the ANOVA and the post-hoc analysis (Tukey test).
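
# The within-group spread can also be computed directly instead of being judged by eye. This is a minimal sketch; carbs_df, group_sd, and group_range are illustrative names, and the values are re-entered by hand from the printed table above.

```r
# Data re-entered by hand from the printed carbs_data table.
carbs_df <- data.frame(
  Food  = rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2),
  Carbs = c(40, 38, 70, 74, 38, 31, 39, 40)
)

# Standard deviation and range (max minus min) of carbs within each food group.
group_sd    <- tapply(carbs_df$Carbs, carbs_df$Food, sd)
group_range <- tapply(carbs_df$Carbs, carbs_df$Food, function(x) diff(range(x)))

group_sd
group_range

# Name of the group with the largest within-group spread.
names(which.max(group_sd))
```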

#NUMBER 4: ANOVA
anova_result <- aov(carbs ~ food)
get_anova_table(anova_result)

## Call:
##    aov(formula = carbs ~ food)
## 
## Terms:
##                   food Residuals
## Sum of Squares  1798.5      35.0
## Deg. of Freedom      3         4
## 
## Residual standard error: 2.95804
## Estimated effects may be unbalanced

summary(anova_result)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## food         3   1798   599.5   68.51 0.000679 ***
## Residuals    4     35     8.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# The test gives a p-value of 0.000679, which is less than 0.05. We can therefore reject the null hypothesis (that all four group means are equal) in favor of the alternative: at least one of the four food groups has a mean carb amount that differs significantly from the others. Although the ANOVA detects a difference, further analysis is needed to determine which groups differ.
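
# To show where the F value comes from, the sums of squares can be rebuilt by hand. This is a sketch rather than part of the original analysis; the vectors are re-entered from the printed table above, and names such as ss_between and f_stat are illustrative.

```r
# Data re-entered by hand from the printed carbs_data table.
food  <- rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)
carbs <- c(40, 38, 70, 74, 38, 31, 39, 40)

grand_mean  <- mean(carbs)
group_means <- tapply(carbs, food, mean)
group_n     <- tapply(carbs, food, length)

# Between-group sum of squares: group size times squared distance of each
# group mean from the grand mean.
ss_between <- sum(group_n * (group_means - grand_mean)^2)

# Within-group (residual) sum of squares: squared distance of each
# observation from its own group mean (looked up by food name).
ss_within <- sum((carbs - group_means[food])^2)

df_between <- length(group_means) - 1              # 4 groups - 1 = 3
df_within  <- length(carbs) - length(group_means)  # 8 observations - 4 groups = 4

# F is the ratio of the two mean squares; the p-value is the upper tail of F.
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within, F = f_stat, p = p_value)
```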

#NUMBER 5: Post-hoc analysis
TukeyHSD(anova_result)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = carbs ~ food)
## 
## $food
##                  diff        lwr        upr     p adj
## Pizza-Pasta     -33.0 -45.041752 -20.958248 0.0012846
## Potatoes-Pasta  -37.5 -49.541752 -25.458248 0.0007784
## Quinoa-Pasta    -32.5 -44.541752 -20.458248 0.0013632
## Potatoes-Pizza   -4.5 -16.541752   7.541752 0.5035352
## Quinoa-Pizza      0.5 -11.541752  12.541752 0.9979995
## Quinoa-Potatoes   5.0  -7.041752  17.041752 0.4317840

# Comparisons with adjusted p-values below 0.05 indicate a statistically significant difference between the means of the two groups. In this case, pizza and pasta, potatoes and pasta, and quinoa and pasta all differ significantly, and these three comparisons drive the significant ANOVA result in the previous question. The remaining comparisons (potatoes and pizza, quinoa and pizza, quinoa and potatoes) are not significant. These results agree with a visual comparison of the means (in number 3), where the mean carbs in pasta is much greater than the means of the 3 other food groups.
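
# The significant pairs can be pulled out of the Tukey table programmatically rather than read off by eye. This is a minimal sketch; carbs_df, tukey_table, and significant are illustrative names, and the data are re-entered by hand from the printed table above.

```r
# Data re-entered by hand from the printed carbs_data table.
carbs_df <- data.frame(
  Food  = factor(rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)),
  Carbs = c(40, 38, 70, 74, 38, 31, 39, 40)
)

fit <- aov(Carbs ~ Food, data = carbs_df)

# TukeyHSD() returns a list with one matrix per model term; here the term is "Food".
tukey_table <- TukeyHSD(fit)$Food

# Keep only the comparisons whose adjusted p-value is below 0.05.
significant <- tukey_table[tukey_table[, "p adj"] < 0.05, , drop = FALSE]
rownames(significant)
```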

#NUMBER 6: Plotting Outcomes

plot(results, xlab = "Food Types", ylab = "Carbs" )

# Note that on the chart, Pasta = 1.0, Pizza = 2.0, Potatoes = 3.0, Quinoa = 4.0
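
# As an alternative that avoids the numeric index on the x-axis, a bar plot of the named means labels each bar with its food type. This is a sketch using base R graphics; group_means and bar_midpoints are illustrative names, and the data are re-entered by hand from the printed table above.

```r
# Data re-entered by hand from the printed carbs_data table.
food  <- rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)
carbs <- c(40, 38, 70, 74, 38, 31, 39, 40)

# Named vector of group means; barplot() uses the names as bar labels.
group_means <- tapply(carbs, food, mean)

# barplot() draws the chart and returns the x positions of the bar midpoints.
bar_midpoints <- barplot(
  group_means,
  xlab = "Food Types",
  ylab = "Mean Carbs",
  ylim = c(0, 80)
)
```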

Yahav Manor

2025-04-06