Prepare the neccasary package of this project and the dataset contianing the columns we are interested in.
# install.packages('dplyr')
# install.packages("moments")
# install.packages("reshape2")
# setwd("C:/Users/MacBook/Desktop/xie")
# install.packages('broom')
infusions<-read.csv("infusions_data_2002.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(moments)
library(reshape2)
library(ggplot2)
mydata <- select(infusions,Tender,EQSRef,Trial.Notes,CUT,D.AGED,COOK,Carcase.No.,KEY.1,GROUP)
The purpose of this project is to study the factors that affect tenderness. Among the variables we are interested in, we will use ANOVA to test whether the variables are related to tension. In the process of inspection, I will elaborate null hupothesis, alternative hupothesis and the conclusion in turn.
At the beginning, to study whether kiweifruit enhancement treatment will affect tension. We measure the distribution of the variables.
p <- ggplot(mydata, aes(Trial.Notes,Tender ,fill=(Trial.Notes)))
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour = "black") ,panel.background = element_rect(fill = NA))
n_count = mydata %>% group_by(Trial.Notes) %>% count()
n_count
## # A tibble: 2 x 2
## # Groups: Trial.Notes [2]
## Trial.Notes n
## <fct> <int>
## 1 Control 581
## 2 Enhanced 468
n_i = n_count %>% pull(n)
ci = n_i * rep(c(1/581, -1/468), c(1, 1))
The distribution of the dependent variable tension is checked to determine whether the dependent variable satisfies the assumption of normal distribution.
hist(mydata$Tender, probability = TRUE)
The density function does not seem to satisfy the normal distribution
plot(density(mydata$Tender))
Hence,we use shapiro test to check the p- value is lower than 0.05. The dependent variable is not satisfied the requirement of normal distribution.
shapiro.test(mydata$Tender)
##
## Shapiro-Wilk normality test
##
## data: mydata$Tender
## W = 0.97147, p-value = 1.688e-13
So, we should use kruskal test compared with ANOVA. The null hypothesis: Under different conditions, the mean value of the two groups of data is the same, that is, kiwifruit enhancement treatment will not affect the change of tendrness. The alternative lypothesis: kiwifruit enhancement treatment will affect the change of tendrness.
mydata_anova <- aov(Tender ~ factor(Trial.Notes), data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(Trial.Notes) 1 5782 5782 16.2 6.12e-05 ***
## Residuals 1047 373696 357
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((ci^2)/n_i))
ybar_i = mydata %>% group_by(Trial.Notes) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum(ci * ybar_i)/se
t_stat
## [1] -4.024715
The p-value of the ANOVA is nearly 0 and test statistics is -4.02. So we reject the null hypothesis.
qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)
The residual distribution of ANOVA model does not satisfy the normal distribution. This also proves that ANOVA is not applicable. So we use the alternative: Kruskal Test.
kruskal.test(Tender ~ factor(Trial.Notes), data = mydata)
##
## Kruskal-Wallis rank sum test
##
## data: Tender by factor(Trial.Notes)
## Kruskal-Wallis chi-squared = 14.231, df = 1, p-value = 0.0001617
The test statistics chi-squared is 14.23 and the p-value is 0.0002 is lower than 0.05. So we can draw a conclusion that at the confidence level of 0.05, we have enough evidence to reject the original hypothesis, that is, kiwifruit enhancement will affect the value of tenderness.
When we have got kiwifruit treatment, it will affect the tension. We also hope to find out some possible interaction effects with other variables.
mydata_anova <- aov(Tender ~ factor(Trial.Notes)*factor(COOK), data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(Trial.Notes) 1 5782 5782 16.496 5.24e-05 ***
## factor(COOK) 3 4932 1644 4.691 0.00293 **
## factor(Trial.Notes):factor(COOK) 3 3913 1304 3.722 0.01114 *
## Residuals 1041 364850 350
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These three different p-values correspond to three different null hypothese: kiwifruit, cook methd and the interaction effect of the two variables will not affect the tenderness. The alternative hypothesis is that these variables affect the tenderness. According to the result of the model, uder the confident level of 0.1, there is sufficient evidence to show the interaction affect the result variable.
interaction.plot(mydata$COOK,mydata$Trial.Notes,mydata$Tender)
This is the interaction plot with variables. We can see that the interaction effect does exist but is not very obvious.
Meanwhile, let’s continue to check whether the interaction between the two variables control method and kiwifruit will affect the tension.
The null hypothesis: the interaction effect will not affect the tenderness. The alternative hypothesis: the interaction effect will affect the tenderness.
mydata_anova <- aov(Tender ~ factor(Trial.Notes)*factor(GROUP), data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(Trial.Notes) 1 5782 5782 16.233 6.01e-05 ***
## factor(GROUP) 3 2361 787 2.209 0.0854 .
## factor(Trial.Notes):factor(GROUP) 3 564 188 0.528 0.6632
## Residuals 1041 370772 356
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(mydata$GROUP,mydata$Trial.Notes,mydata$Tender)
There is no intersection in the interaction diagram and p-value = 0.66 indicates that the two variables control method and kiwifruit have no mutual influence.
The null hypothesis: 1. The prediction cook method will not affect the tenderness 2. The prediction aged period will not affect the tenderness 3. The prediction muscles will not affect the tenderness
THe alternative hypothesis: 1. The prediction cook method will affect the tenderness 2. The prediction aged period will affect the tenderness 3. The prediction muscles will affect the tenderness
The visulization of predictor cool method:
p <- ggplot(mydata, aes(COOK,Tender ,fill=(COOK)))
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour = "black") ,panel.background = element_rect(fill = NA))
n_count = mydata %>% group_by(GROUP) %>% count()
n_count
## # A tibble: 4 x 2
## # Groups: GROUP [4]
## GROUP n
## <dbl> <int>
## 1 462. 262
## 2 462. 260
## 3 462. 262
## 4 462. 265
n_i = n_count %>% pull(n)
mydata_anova <- aov(Tender ~ factor(COOK), data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(COOK) 3 6263 2087.8 5.846 0.000585 ***
## Residuals 1045 373214 357.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((c(1,1,-1,-1)^2)/n_i))
ybar_i = mydata %>% group_by(COOK) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum((ci * ybar_i)/se)
t_stat
## [1] -4.873342
The test statisics is -4.87 and the p-value is 0.00059.
qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)
Because of the model residuals are not satisfied normal distribution. We also use the krustal test. And the chi-squared is 16.22, the p-value is 0.001. Hence, we reject the null hypothesis and we have sufficient evidences that cook method woll affect the tenderness.
kruskal.test(Tender ~ factor(COOK), data = mydata)
##
## Kruskal-Wallis rank sum test
##
## data: Tender by factor(COOK)
## Kruskal-Wallis chi-squared = 16.216, df = 3, p-value = 0.001024
The analysis of aged period. Ithe the same reason, we also use krustal test to modify the ANOVA model.
mydata_anova <- aov(Tender ~ D.AGED, data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## D.AGED 1 4556 4556 12.72 0.000377 ***
## Residuals 1047 374921 358
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal.test(Tender ~ factor(D.AGED), data = mydata)
##
## Kruskal-Wallis rank sum test
##
## data: Tender by factor(D.AGED)
## Kruskal-Wallis chi-squared = 11.365, df = 1, p-value = 0.0007484
The results from two different models are same. The p-value is 0.0004 from ANOVA model and 0.0007 from krustal model. Hence, we reject the null hypothesis and the aged period will affect the tenderness.
The distribution of predictor muscle group.
p <- ggplot(mydata, aes(factor(GROUP),Tender ,fill=(GROUP)))
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour = "black") ,panel.background = element_rect(fill = NA))
n_count = mydata %>% group_by(GROUP) %>% count()
n_count
## # A tibble: 4 x 2
## # Groups: GROUP [4]
## GROUP n
## <dbl> <int>
## 1 462. 262
## 2 462. 260
## 3 462. 262
## 4 462. 265
n_i = n_count %>% pull(n)
mydata_anova <- aov(Tender ~ factor(GROUP), data = mydata)
summary(mydata_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(GROUP) 3 2331 777.1 2.153 0.0919 .
## Residuals 1045 377146 360.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((c(1,1,-1,-1)^2)/n_i))
ybar_i = mydata %>% group_by(GROUP) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum((ci * ybar_i)/se)
t_stat
## [1] 0.9202566
The result from ANOVA: the test staistics is 0.92 and the p-value is 0.09 so we do not reject the null hypothesis.
qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)
With the same reason: the residuals are not from normal distribution. We use krustal test to improve.
kruskal.test(Tender ~ factor(GROUP), data = mydata)
##
## Kruskal-Wallis rank sum test
##
## data: Tender by factor(GROUP)
## Kruskal-Wallis chi-squared = 6.3133, df = 3, p-value = 0.09732
The test statistics chi-squared is 6.31 and the p-value is 0.10 so we do not reject the null hypothesis.
With the analysis above, the differences between kiwifruit enhancement treatment, thecook method,the interaction of kiwifruit and cook method and the aged period should be mentioned and the muscle group, the interaction of kiwifruit and muscle group should not be mentioned. The slight differences from later predictors are not significant.