library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
carbs_data <- read.csv("/Users/yahavmanor/Desktop/ENP164/carbs_data.csv")
carbs_data
## Food Serving_Size Carbs
## 1 Pizza 150 40
## 2 Pizza 150 38
## 3 Pasta 200 70
## 4 Pasta 200 74
## 5 Potatoes 180 38
## 6 Potatoes 180 31
## 7 Quinoa 150 39
## 8 Quinoa 150 40
#NUMBER 1: Independence of observations
# The observations are independent of one another: the carb content of each food item (pizza, pasta, potatoes, or quinoa) does not depend on any other item. For example, the carbs in one pizza do not influence the carbs in the other pizza or in any item from another food category. Therefore, the independence assumption is met.
food <- carbs_data$Food
carbs <- carbs_data$Carbs
#NUMBER 2: Testing for normality of data
shapiro.test(carbs)
##
## Shapiro-Wilk normality test
##
## data: carbs
## W = 0.72845, p-value = 0.004705
# The p-value (0.004705) is less than 0.05, so we reject the null hypothesis that the data are normally distributed in favor of the alternative that they are not. This result is expected, because the test pools all eight values from four food groups with very different means (pasta is far higher than the others), so the pooled values do not form a single normal sample. Note that the ANOVA normality assumption concerns the values within each group (equivalently, the model residuals), not the pooled data, and with only two observations per group it cannot be tested group by group (shapiro.test() needs at least 3 values).
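# A minimal sketch of checking normality the way ANOVA assumes it: on the model residuals (each value minus its own group mean) rather than on the pooled data. This residual-based Shapiro-Wilk check is a swapped-in technique, not part of the original analysis, and with only eight residuals the test has very little power. The names fit and res are illustrative.
food <- rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)
carbs <- c(40, 38, 70, 74, 38, 31, 39, 40)
fit <- aov(carbs ~ food)
res <- residuals(fit)  # pizza +/-1, pasta +/-2, potatoes +/-3.5, quinoa +/-0.5
shapiro.test(res)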
#NUMBER 3: Summary statistics
results <- tapply(carbs, food, mean)
results
## Pasta Pizza Potatoes Quinoa
## 72.0 39.0 34.5 39.5
# Based on the data table and the group means, the greatest within-group difference is between the two potato items: each value (38 and 31) lies 3.5 carbs from the group mean (34.5), farther than in any other group. Between groups, pasta (mean 72.0) clearly stands out from the other three foods (means of 34.5 to 39.5), and this difference will likely show up in the ANOVA and the post-hoc analysis (Tukey test).
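# The within-group comparison above can be made concrete by computing each group's range (max minus min) and standard deviation with tapply(), the same way the means were computed. A minimal sketch; the names group_range and group_sd are hypothetical.
food <- rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2)
carbs <- c(40, 38, 70, 74, 38, 31, 39, 40)
group_range <- tapply(carbs, food, function(x) max(x) - min(x))
group_sd <- tapply(carbs, food, sd)
group_range     # Pasta 4, Pizza 2, Potatoes 7, Quinoa 1
round(group_sd, 2)  # Pasta 2.83, Pizza 1.41, Potatoes 4.95, Quinoa 0.71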
#NUMBER 4: ANOVA
anova_result <- aov(carbs ~ food)
get_anova_table(anova_result)
## Call:
## aov(formula = carbs ~ food)
##
## Terms:
## food Residuals
## Sum of Squares 1798.5 35.0
## Deg. of Freedom 3 4
##
## Residual standard error: 2.95804
## Estimated effects may be unbalanced
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## food 3 1798 599.5 68.51 0.000679 ***
## Residuals 4 35 8.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The ANOVA gives F(3, 4) = 68.51 with a p-value of 0.000679, which is less than 0.05. We therefore reject the null hypothesis that all four food groups have the same mean carbs, in favor of the alternative that at least one group mean differs from the others. The ANOVA does not say which groups differ, so a post-hoc analysis is needed to locate the differences.
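# As a worked check, the F statistic in the summary() table can be reproduced from the sums of squares: each mean square is its sum of squares divided by its degrees of freedom, and F = MS_food / MS_residuals. The p-value is the upper tail of the F distribution with (3, 4) degrees of freedom. The variable names below are illustrative.
ss_food <- 1798.5  # between-group sum of squares
df_food <- 3
ss_res <- 35.0     # within-group (residual) sum of squares
df_res <- 4
ms_food <- ss_food / df_food  # 599.5
ms_res <- ss_res / df_res     # 8.75; sqrt(ms_res) is the residual standard error, 2.958
f_value <- ms_food / ms_res   # about 68.51
p_value <- pf(f_value, df_food, df_res, lower.tail = FALSE)  # about 0.000679
c(F = f_value, p = p_value)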
#NUMBER 5: Post-hoc analysis
TukeyHSD(anova_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = carbs ~ food)
##
## $food
## diff lwr upr p adj
## Pizza-Pasta -33.0 -45.041752 -20.958248 0.0012846
## Potatoes-Pasta -37.5 -49.541752 -25.458248 0.0007784
## Quinoa-Pasta -32.5 -44.541752 -20.458248 0.0013632
## Potatoes-Pizza -4.5 -16.541752 7.541752 0.5035352
## Quinoa-Pizza 0.5 -11.541752 12.541752 0.9979995
## Quinoa-Potatoes 5.0 -7.041752 17.041752 0.4317840
# Comparisons with an adjusted p-value below 0.05 indicate a statistically significant difference between the two group means. Here, pizza vs. pasta, potatoes vs. pasta, and quinoa vs. pasta are all significant (all p adj < 0.002), while the comparisons among pizza, potatoes, and quinoa are not (all p adj > 0.4). The significant ANOVA result in Number 4 is therefore driven by pasta. This matches the means in Number 3, where pasta (72.0) is far above the other three foods (34.5 to 39.5).
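# The Tukey results can be reproduced by hand as a sketch. With n = 2 items per group and the residual mean square from the ANOVA (MSE = 8.75 on 4 df), each confidence-interval half-width is q * sqrt(MSE / n), where q = qtukey(0.95, 4, 4) is the studentized range quantile for 4 groups. This assumes the equal group sizes of this data set; the names below are illustrative.
mse <- 35 / 4
n <- 2
q_crit <- qtukey(0.95, nmeans = 4, df = 4)
hsd <- q_crit * sqrt(mse / n)  # about 12.04, the +/- half-width in the TukeyHSD table
means <- c(Pasta = 72.0, Pizza = 39.0, Potatoes = 34.5, Quinoa = 39.5)
pairs <- combn(names(means), 2)  # all 6 pairs of food groups
diffs <- means[pairs[2, ]] - means[pairs[1, ]]
names(diffs) <- paste(pairs[2, ], pairs[1, ], sep = "-")  # e.g. "Pizza-Pasta"
# Adjusted p-values from the studentized range distribution
p_adj <- ptukey(abs(diffs) / sqrt(mse / n), nmeans = 4, df = 4, lower.tail = FALSE)
data.frame(diff = diffs, significant = abs(diffs) > hsd, p_adj = round(p_adj, 7))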
#NUMBER 6: Plotting Outcomes
# Plot the mean carbs of each food type, with the food names on the x-axis
plot(results, xaxt = "n", xlab = "Food Types", ylab = "Mean Carbs")
axis(1, at = seq_along(results), labels = names(results))
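# Because ggplot2 is already loaded with the tidyverse, the outcomes can also be drawn from the raw data, showing both items in each group next to the group mean. A minimal sketch, assuming the column names Food and Carbs from carbs_data.csv; the objects carbs_df and p are illustrative.
library(ggplot2)
carbs_df <- data.frame(
  Food = rep(c("Pizza", "Pasta", "Potatoes", "Quinoa"), each = 2),
  Carbs = c(40, 38, 70, 74, 38, 31, 39, 40)
)
p <- ggplot(carbs_df, aes(x = Food, y = Carbs)) +
  geom_point(size = 2) +  # individual food items
  stat_summary(fun = mean, geom = "point", shape = 4, size = 4, colour = "red") +  # group means
  labs(x = "Food Types", y = "Carbs")
p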