Today, we will be using an analysis of variance (ANOVA) to analyze the results of an experiment involving pigs. A study was conducted to examine the effects of various feeding diets on the growth of pigs. Twenty male pigs of uniform age and weight were randomly assigned to four different experimental groups, with each group receiving a distinct diet. After two months of being raised on these respective diets, the body weights of the pigs were recorded.

Part 1

Load the data into R. (1 Point)

pigs<-read.csv("C:/Users/Joey/Downloads/pig_weight.csv")

#run the following lines to reformat the data to correctly run the anova
library(tidyverse)
pigs<- pigs %>%
  pivot_longer(c(1:4),names_to = "Diet", values_to = "Weight")
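
To see what the reshaping step does, here is a minimal sketch on a made-up two-diet data frame. The values and the names `wide_toy` and `long_toy` are hypothetical, not the study data. `pivot_longer()` stacks the diet columns into one `Diet` column and one `Weight` column, which is the layout that `lm()` and `anova()` expect.

```r
library(tidyr)

# Hypothetical wide data: one column per diet, one row per replicate
wide_toy <- data.frame(Diet.1 = c(60, 62), Diet.2 = c(70, 71))

# Stack every column into Diet/Weight pairs
long_toy <- pivot_longer(wide_toy, cols = everything(),
                         names_to = "Diet", values_to = "Weight")
print(long_toy)
```

Each original column name becomes a value in `Diet`, so two columns of two rows turn into four rows.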

a) What is the treatment in this study? (1 Point)

The diet each group of pigs received

b) What is the response variable? (1 Point)

The weight of the pigs

c) What is the experimental unit (EU) in this study? (1 Point)

One pig

d) Formulate a null and alternative hypothesis about pig weights. (2 Points)

Ho: There is no difference in mean pig weight among the four diets.

Ha: At least one diet's mean weight differs from the others.

Part 2

Calculate the individual parts of the ANOVA table, then fill in the markdown table below the chunk. (5 Points)
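
For reference, the quantities computed in the chunk can be written out as follows. The notation is an assumption introduced here, not taken from the assignment: $k$ is the number of diets, $n_i$ the number of pigs on diet $i$, $N$ the total number of pigs, $y_{ij}$ the weight of pig $j$ on diet $i$, $\bar{y}_i$ the mean for diet $i$, and $\bar{y}$ the overall mean.

```latex
\begin{aligned}
SSB &= \sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2 \\
SST &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \\
SSE &= SST - SSB \\
MSB &= \frac{SSB}{k - 1}, \qquad MSE = \frac{SSE}{N - k}, \qquad F = \frac{MSB}{MSE}
\end{aligned}
```

The degrees of freedom add up the same way the sums of squares do: $(k - 1) + (N - k) = N - 1$.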

#Calculate the overall mean of weight, then add a column to the dataframe containing the overall mean, repeating. 
overall_mean <- mean(pigs$Weight)
overall_mean
## [1] 68.075
#Next calculate the means of the individual treatments, then add a table of those values, repeating.
slop1<- pigs %>% filter(Diet== "Diet.1")  
slop2<- pigs %>% filter(Diet== "Diet.2") 
slop3<- pigs %>% filter(Diet== "Diet.3") 
slop4<- pigs %>% filter(Diet== "Diet.4") 

slop1_mean <- mean(pigs[pigs$Diet == "Diet.1", ]$Weight)
slop1_count <- nrow(pigs[pigs$Diet == "Diet.1", ])
slop2_mean <- mean(pigs[pigs$Diet == "Diet.2", ]$Weight)
slop2_count <- nrow(pigs[pigs$Diet == "Diet.2", ])
slop3_mean <- mean(pigs[pigs$Diet == "Diet.3", ]$Weight)
slop3_count <- nrow(pigs[pigs$Diet == "Diet.3", ])
slop4_mean <- mean(pigs[pigs$Diet == "Diet.4", ]$Weight)
slop4_count <- nrow(pigs[pigs$Diet == "Diet.4", ])

#Add the overall mean (calculated above) to the dataframe as a repeating column
pigs$overall_mean <- overall_mean


mean_table <- data.frame(
  Diet = c("slop1", "slop2", "slop3", "slop4"),
  Diet_means = c(slop1_mean, slop2_mean, slop3_mean, slop4_mean)
)
print(mean_table)
##    Diet Diet_means
## 1 slop1      64.62
## 2 slop2      71.30
## 3 slop3      73.14
## 4 slop4      63.24
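
The per-diet means and counts can also be computed in one call each rather than one line per diet. A minimal sketch using base R's `tapply()` on made-up long-format data follows. The data frame `toy` and its values are hypothetical and only illustrate the pattern.

```r
# Hypothetical long-format data (values invented for illustration)
toy <- data.frame(
  Diet   = rep(c("Diet.1", "Diet.2"), each = 3),
  Weight = c(60, 62, 64, 70, 71, 72)
)

# Mean and count per diet, each returned as a named vector
diet_means  <- tapply(toy$Weight, toy$Diet, mean)
diet_counts <- tapply(toy$Weight, toy$Diet, length)
print(diet_means)
print(diet_counts)
```

Because the results are named by diet, this approach keeps working unchanged if a diet is added or removed.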
#Calculate the Sum of Squares between Groups (SSB)
SSB <- nrow(slop1) * (slop1_mean - overall_mean)^2 +
       nrow(slop2) * (slop2_mean - overall_mean)^2 +
       nrow(slop3) * (slop3_mean - overall_mean)^2 +
       nrow(slop4) * (slop4_mean - overall_mean)^2    
SSB
## [1] 356.8455
#Calculate the Sum of Squares of the total
SST <- sum((pigs$Weight - overall_mean)^2)
SST
## [1] 498.4775
#calculate the Sum of Squares of the error
SSE <- SST-SSB
SSE
## [1] 141.632
#Calculate the Degrees of Freedom for the treatment, the error and the total
k<- 4
N <- nrow(pigs)
df1 <- k-1
df2 <- N-k
df3 <- N-1
#calculate the Mean squares between the treatments
MSB <- SSB/df1
MSB
## [1] 118.9485
#Calculate the Mean squares of the error
MSE <- SSE/df2
MSE
## [1] 8.852
#Calculate the F statistic
F_value <- MSB/MSE
F_value
## [1] 13.43747
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Between Groups | 356.8455 | 3 | 118.9485 | 13.438 |
| Error | 141.632 | 16 | 8.852 | |
| Total | 498.4775 | 19 | | |
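
The hand calculation stops at the F statistic. One way to turn it into a p-value is the upper tail of the F distribution, available in base R as `pf()`. The sketch below copies the F value and degrees of freedom from the calculation above, and the name `p_value` is introduced here for illustration.

```r
# Values copied from the hand calculation above
F_value <- 13.43747
df1 <- 3   # between-groups degrees of freedom (k - 1)
df2 <- 16  # error degrees of freedom (N - k)

# Upper-tail probability of the F distribution
p_value <- pf(F_value, df1, df2, lower.tail = FALSE)
p_value
```

This should agree with the `Pr(>F)` value that `anova()` reports for the same data.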

Let’s now confirm our result by building a linear regression model and using the anova() function, then report the p-value (2 Points)

#Building a linear model between weight and diet. 
model <- lm(Weight ~ Diet, pigs)

anova(model)
## Analysis of Variance Table
## 
## Response: Weight
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Diet       3 356.85 118.949  13.438 0.0001226 ***
## Residuals 16 141.63   8.852                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#use anova to build an anova table in R. What is the p-value? Does it match your above table?
#p-value: 0.0001226
#Yes: the Df, Sum Sq, Mean Sq and F value match the hand-calculated table (which has no p-value column)
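
Instead of comparing the two tables by eye, the hand-calculated F statistic and p-value can be checked against `anova()` programmatically. The following is a sketch on made-up data. The data frame `toy`, its values, and the variable names are hypothetical, and the same pattern should apply to `pigs`.

```r
# Hypothetical long-format data (values invented for illustration)
toy <- data.frame(
  Diet   = rep(c("A", "B", "C"), each = 4),
  Weight = c(60, 62, 61, 63, 70, 69, 71, 72, 65, 66, 64, 67)
)

# Hand calculation of the F statistic and its p-value
grand_mean  <- mean(toy$Weight)
group_means <- tapply(toy$Weight, toy$Diet, mean)
group_n     <- tapply(toy$Weight, toy$Diet, length)
ssb <- sum(group_n * (group_means - grand_mean)^2)
sst <- sum((toy$Weight - grand_mean)^2)
sse <- sst - ssb
k <- length(group_means)
N <- nrow(toy)
f_manual <- (ssb / (k - 1)) / (sse / (N - k))
p_manual <- pf(f_manual, k - 1, N - k, lower.tail = FALSE)

# Same quantities pulled from the anova() table
tab <- anova(lm(Weight ~ Diet, data = toy))
all.equal(f_manual, tab[["F value"]][1])
all.equal(p_manual, tab[["Pr(>F)"]][1])
```

`all.equal()` compares within a small numerical tolerance, which avoids false mismatches from floating-point rounding.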

Part 3

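Based on the results of the ANOVA, what conclusions can you draw from the experiment? What is the effect of diet on weight? Use the summary statistics from your calculations to justify your response.

Because the p-value (0.0001226) is well below the conventional significance level of 0.05, we reject the null hypothesis that mean weight is the same across all four diets (F = 13.44 on 3 and 16 degrees of freedom). Diet therefore has a statistically significant effect on weight: at least one diet's mean differs from the others. The treatment means suggest where the difference lies, with Diets 2 and 3 (71.30 and 73.14) producing heavier pigs on average than Diets 1 and 4 (64.62 and 63.24), although the ANOVA alone does not identify which specific pairs of diets differ.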