Today, we will use a one-way analysis of variance (ANOVA) to analyze the results of an experiment on pigs. The study examined the effects of different diets on pig growth. Twenty male pigs of uniform age and weight were randomly assigned to four experimental groups (five pigs per group), each receiving a distinct diet. After two months on their respective diets, the pigs' body weights were recorded.
Load the data into R. (1 Point)
pigs<-read.csv("C:/Users/Joey/Downloads/pig_weight.csv")
# Reshape the data from wide format (one column per diet) to long format (one row per pig) so the ANOVA can be run
library(tidyverse)
pigs <- pigs %>%
  pivot_longer(cols = 1:4, names_to = "Diet", values_to = "Weight")
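To see what the reshape does, here is a minimal sketch on a made-up two-row data frame. The name `toy_wide` and its values are hypothetical, not the study's data; it only assumes, as `read.csv()` produces here, one column per diet named `Diet.1` to `Diet.4`.

```r
library(tidyr)

# Hypothetical wide data (NOT the study's): one column per diet
toy_wide <- data.frame(
  Diet.1 = c(60, 62),
  Diet.2 = c(70, 72),
  Diet.3 = c(73, 74),
  Diet.4 = c(63, 64)
)

# Stack the four diet columns into a Diet label column and a Weight column
toy_long <- pivot_longer(toy_wide, cols = 1:4,
                         names_to = "Diet", values_to = "Weight")
print(toy_long)
```

Each weight ends up in its own row next to its diet label, which is the layout `lm()` and `anova()` expect.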
What is the treatment in this study? (1 Point)
The diet each group of pigs received.
What is the response variable? (1 Point)
The weight of the pigs after two months on their assigned diet.
What is the experimental unit?
One pig.
d) Formulate a null and alternative hypothesis about pig weights. (2 Points)
H0: the mean weight of pigs is the same under all four diets.
Ha: at least one diet's mean weight differs from the others.
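Written in symbols, with $\mu_i$ denoting the population mean weight of pigs raised on diet $i$, the same hypotheses read:

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad
H_a:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)
```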
Calculate the individual parts of the ANOVA table, then fill in the markdown table below the chunk. (5 Points)
# Calculate the overall mean weight, then add it to the data frame as a column (the same value repeated in every row)
overall_mean <- mean(pigs$Weight)
overall_mean
## [1] 68.075
# Next, calculate the mean of each treatment, then collect those values in a table
slop1<- pigs %>% filter(Diet== "Diet.1")
slop2<- pigs %>% filter(Diet== "Diet.2")
slop3<- pigs %>% filter(Diet== "Diet.3")
slop4<- pigs %>% filter(Diet== "Diet.4")
# Mean weight and number of pigs for each diet
slop1_mean <- mean(slop1$Weight)
slop1_count <- nrow(slop1)
slop2_mean <- mean(slop2$Weight)
slop2_count <- nrow(slop2)
slop3_mean <- mean(slop3$Weight)
slop3_count <- nrow(slop3)
slop4_mean <- mean(slop4$Weight)
slop4_count <- nrow(slop4)
pigs$overall_mean <- overall_mean
mean_table <- data.frame(
Diet = c("slop1", "slop2", "slop3", "slop4"),
Diet_means = c(slop1_mean, slop2_mean, slop3_mean, slop4_mean)
)
print(mean_table)
## Diet Diet_means
## 1 slop1 64.62
## 2 slop2 71.30
## 3 slop3 73.14
## 4 slop4 63.24
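The per-diet `filter()`, `mean()`, and `nrow()` calls above can also be collapsed with base R's `tapply()`, which applies a function to each group. Below is a minimal sketch on hypothetical data (the data frame `toy` and its values are made up); with the real data you would pass `pigs$Weight` and `pigs$Diet` instead.

```r
# Hypothetical long-format data (NOT the study's): 2 pigs per diet
toy <- data.frame(
  Diet   = rep(c("Diet.1", "Diet.2", "Diet.3", "Diet.4"), each = 2),
  Weight = c(60, 62, 70, 72, 73, 75, 63, 65)
)

# Mean weight and number of pigs for every diet, one call each
diet_means  <- tapply(toy$Weight, toy$Diet, mean)
diet_counts <- tapply(toy$Weight, toy$Diet, length)
print(diet_means)
print(diet_counts)
```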
#Calculate the Sum of Squares between Groups (SSB)
SSB <- nrow(slop1) * (slop1_mean - overall_mean)^2 +
nrow(slop2) * (slop2_mean - overall_mean)^2 +
nrow(slop3) * (slop3_mean - overall_mean)^2 +
nrow(slop4) * (slop4_mean - overall_mean)^2
SSB
## [1] 356.8455
# Calculate the total Sum of Squares (SST)
SST <- sum((pigs$Weight - overall_mean)^2)
SST
## [1] 498.4775
# Calculate the error (within-group) Sum of Squares (SSE) by subtraction
SSE <- SST-SSB
SSE
## [1] 141.632
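Computing SSE by subtraction relies on the one-way ANOVA identity below. Here $y_{ij}$ is the weight of pig $j$ on diet $i$, $\bar{y}_i$ is the mean of diet $i$ (`slop1_mean` to `slop4_mean`), $\bar{y}$ is `overall_mean`, $k = 4$, and $n_i = 5$. The last line plugs in the values calculated above.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2 \\
SSB &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
SSE &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
SST &= SSB + SSE \quad\Rightarrow\quad 498.4775 = 356.8455 + 141.632
\end{aligned}
```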
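# Calculate the degrees of freedom for the treatment, the error, and the total
k <- 4            # number of diets (treatments)
N <- nrow(pigs)   # number of pigs
df1 <- k - 1      # treatment (between-group) df
df2 <- N - k      # error (within-group) df
df3 <- N - 1      # total df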
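With $k = 4$ diets and $N = 20$ pigs, the degrees of freedom work out as follows; the treatment and error degrees of freedom add up to the total, mirroring $SST = SSB + SSE$.

```latex
\begin{aligned}
df_{\text{between}} &= k - 1 = 4 - 1 = 3 \\
df_{\text{error}}   &= N - k = 20 - 4 = 16 \\
df_{\text{total}}   &= N - 1 = 20 - 1 = 19 = 3 + 16
\end{aligned}
```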
# Calculate the mean square between treatments (MSB)
MSB <- SSB/df1
MSB
## [1] 118.9485
#Calculate the Mean squares of the error
MSE <- SSE/df2
MSE
## [1] 8.852
# Calculate the F statistic
F_value <- MSB/MSE
F_value
## [1] 13.43747
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Between Groups | 356.8455 | 3 | 118.9485 | 13.438 |
| Error | 141.632 | 16 | 8.852 | |
| Total | 498.4775 | 19 | | |
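The hand-built table stops at the F statistic. As a sketch, the matching p-value is the upper-tail probability of an F distribution with 3 and 16 degrees of freedom, which base R's `pf()` returns; `qf()` gives the critical value for comparison. The 5% significance level here is a conventional choice, assumed rather than stated in the assignment.

```r
# Degrees of freedom and F statistic taken from the hand-calculated table
df_between <- 3
df_error   <- 16
F_value    <- 13.43747

# p-value: probability of an F at least this large if H0 were true
p_value <- pf(F_value, df_between, df_error, lower.tail = FALSE)

# Critical F at an assumed 5% significance level, for comparison
F_crit <- qf(0.95, df_between, df_error)

print(c(p_value = p_value, F_crit = F_crit))
```

The p-value should agree with the `Pr(>F)` column that `anova()` reports for this model.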
Let’s now confirm our result by building a linear regression model and using the anova() function, then report the p-value (2 Points)
# Build a linear model of weight as a function of diet
model <- lm(Weight ~ Diet, pigs)
anova(model)
## Analysis of Variance Table
##
## Response: Weight
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 3 356.85 118.949 13.438 0.0001226 ***
## Residuals 16 141.63 8.852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
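# Use anova() to build an ANOVA table in R. What is the p-value? Does it match your table above?
# p-value: 0.0001226
# Yes: the sums of squares, degrees of freedom, mean squares, and F value (13.438) all match the hand-calculated table.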
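The hand calculations above repeat the same steps once per diet. A minimal sketch that wraps them in one function (`anova_by_hand` is a hypothetical helper, not part of any package) and checks it against `anova(lm())` is shown below, using made-up data rather than the study's. Applied to the real data as `anova_by_hand(pigs$Weight, pigs$Diet)`, it uses the same formulas as the chunks above and should reproduce the same table.

```r
# Hypothetical helper: one-way ANOVA quantities computed by hand
anova_by_hand <- function(y, g) {
  g <- factor(g)
  grand_mean  <- mean(y)
  group_means <- tapply(y, g, mean)
  group_n     <- tapply(y, g, length)

  k <- nlevels(g)   # number of treatments
  N <- length(y)    # number of observations

  SSB <- sum(group_n * (group_means - grand_mean)^2)  # between groups
  SST <- sum((y - grand_mean)^2)                      # total
  SSE <- SST - SSB                                    # within groups (error)

  MSB <- SSB / (k - 1)
  MSE <- SSE / (N - k)
  F_value <- MSB / MSE
  p_value <- pf(F_value, k - 1, N - k, lower.tail = FALSE)

  list(SSB = SSB, SSE = SSE, SST = SST,
       df_between = k - 1, df_error = N - k,
       MSB = MSB, MSE = MSE, F = F_value, p = p_value)
}

# Hypothetical data (NOT the study's): 3 pigs on each of 4 diets
toy <- data.frame(
  Diet   = rep(c("Diet.1", "Diet.2", "Diet.3", "Diet.4"), each = 3),
  Weight = c(60, 62, 61, 70, 72, 69, 73, 75, 74, 63, 65, 62)
)

by_hand <- anova_by_hand(toy$Weight, toy$Diet)
from_lm <- anova(lm(Weight ~ Diet, data = toy))

# The hand-computed F should agree with the one from anova()
print(c(by_hand = by_hand$F, anova = from_lm$`F value`[1]))
```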
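Based on the results of the ANOVA, what conclusions can you draw from the experiment? What is the effect of diet on weight? Use the summary statistics from your calculations to justify your response.
Because the p-value (0.0001226) is far smaller than the conventional significance level of α = 0.05 (F = 13.44 with 3 and 16 degrees of freedom), we reject the null hypothesis that all four diets produce the same mean weight. Diet therefore has a significant effect on pig weight: differences between diets account for about 72% of the total variation (SSB = 356.85 out of SST = 498.48). The group means suggest that Diets 2 and 3 (71.30 and 73.14) produce heavier pigs than Diets 1 and 4 (64.62 and 63.24). However, the ANOVA alone only shows that at least one mean differs; a post-hoc test such as Tukey's HSD would be needed to say which specific diets differ from each other.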
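A post-hoc comparison could be sketched with base R's `TukeyHSD()`, which needs a model fitted with `aov()`. The example below uses hypothetical data (the data frame `toy` and its values are made up); with the real data the call would be `TukeyHSD(aov(Weight ~ Diet, data = pigs))`.

```r
# Hypothetical data (NOT the study's): 3 pigs on each of 4 diets
toy <- data.frame(
  Diet   = factor(rep(c("Diet.1", "Diet.2", "Diet.3", "Diet.4"), each = 3)),
  Weight = c(60, 62, 61, 70, 72, 69, 73, 75, 74, 63, 65, 62)
)

# Fit the one-way ANOVA with aov(), which TukeyHSD() requires
fit <- aov(Weight ~ Diet, data = toy)

# All pairwise diet comparisons, with p-values adjusted for multiple testing
tukey <- TukeyHSD(fit, "Diet", conf.level = 0.95)
print(tukey)
```

Each row of the output compares one pair of diets: `diff` is the difference in means, `lwr` and `upr` bound its 95% family-wise confidence interval, and `p adj` is the adjusted p-value.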