R Markdown

Prepare the neccasary package of this project and the dataset contianing the columns we are interested in.

# install.packages('dplyr')
# install.packages("moments")
# install.packages("reshape2")
# setwd("C:/Users/MacBook/Desktop/xie")
# install.packages('broom')
infusions<-read.csv("infusions_data_2002.csv")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(moments)
library(reshape2)
library(ggplot2)
mydata <- select(infusions,Tender,EQSRef,Trial.Notes,CUT,D.AGED,COOK,Carcase.No.,KEY.1,GROUP)

Introduction

The purpose of this project is to study the factors that affect tenderness. Among the variables we are interested in, we will use ANOVA to test whether the variables are related to tension. In the process of inspection, I will elaborate null hupothesis, alternative hupothesis and the conclusion in turn.

Analysis

Part 1: The primary question

At the beginning, to study whether kiweifruit enhancement treatment will affect tension. We measure the distribution of the variables.

p <- ggplot(mydata, aes(Trial.Notes,Tender ,fill=(Trial.Notes))) 
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour  = "black") ,panel.background = element_rect(fill = NA))

n_count = mydata %>% group_by(Trial.Notes) %>% count()
n_count
## # A tibble: 2 x 2
## # Groups:   Trial.Notes [2]
##   Trial.Notes     n
##   <fct>       <int>
## 1 Control       581
## 2 Enhanced      468
n_i = n_count %>% pull(n)
ci = n_i * rep(c(1/581, -1/468), c(1, 1))

The distribution of the dependent variable tension is checked to determine whether the dependent variable satisfies the assumption of normal distribution.

hist(mydata$Tender, probability = TRUE)

The density function does not seem to satisfy the normal distribution

plot(density(mydata$Tender))

Hence,we use shapiro test to check the p- value is lower than 0.05. The dependent variable is not satisfied the requirement of normal distribution.

shapiro.test(mydata$Tender)
## 
##  Shapiro-Wilk normality test
## 
## data:  mydata$Tender
## W = 0.97147, p-value = 1.688e-13

So, we should use kruskal test compared with ANOVA. The null hypothesis: Under different conditions, the mean value of the two groups of data is the same, that is, kiwifruit enhancement treatment will not affect the change of tendrness. The alternative lypothesis: kiwifruit enhancement treatment will affect the change of tendrness.

mydata_anova <- aov(Tender ~ factor(Trial.Notes), data = mydata)
summary(mydata_anova)
##                       Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(Trial.Notes)    1   5782    5782    16.2 6.12e-05 ***
## Residuals           1047 373696     357                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((ci^2)/n_i))
ybar_i = mydata %>% group_by(Trial.Notes) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum(ci * ybar_i)/se
t_stat
## [1] -4.024715

The p-value of the ANOVA is nearly 0 and test statistics is -4.02. So we reject the null hypothesis.

qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)

The residual distribution of ANOVA model does not satisfy the normal distribution. This also proves that ANOVA is not applicable. So we use the alternative: Kruskal Test.

kruskal.test(Tender ~ factor(Trial.Notes), data = mydata)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Tender by factor(Trial.Notes)
## Kruskal-Wallis chi-squared = 14.231, df = 1, p-value = 0.0001617

The test statistics chi-squared is 14.23 and the p-value is 0.0002 is lower than 0.05. So we can draw a conclusion that at the confidence level of 0.05, we have enough evidence to reject the original hypothesis, that is, kiwifruit enhancement will affect the value of tenderness.

Part 2: The primary question – interaction effctness

When we have got kiwifruit treatment, it will affect the tension. We also hope to find out some possible interaction effects with other variables.

mydata_anova <- aov(Tender ~ factor(Trial.Notes)*factor(COOK), data = mydata)
summary(mydata_anova)
##                                    Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(Trial.Notes)                 1   5782    5782  16.496 5.24e-05 ***
## factor(COOK)                        3   4932    1644   4.691  0.00293 ** 
## factor(Trial.Notes):factor(COOK)    3   3913    1304   3.722  0.01114 *  
## Residuals                        1041 364850     350                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These three different p-values correspond to three different null hypothese: kiwifruit, cook methd and the interaction effect of the two variables will not affect the tenderness. The alternative hypothesis is that these variables affect the tenderness. According to the result of the model, uder the confident level of 0.1, there is sufficient evidence to show the interaction affect the result variable.

interaction.plot(mydata$COOK,mydata$Trial.Notes,mydata$Tender)

This is the interaction plot with variables. We can see that the interaction effect does exist but is not very obvious.

Meanwhile, let’s continue to check whether the interaction between the two variables control method and kiwifruit will affect the tension.
The null hypothesis: the interaction effect will not affect the tenderness. The alternative hypothesis: the interaction effect will affect the tenderness.

mydata_anova <- aov(Tender ~ factor(Trial.Notes)*factor(GROUP), data = mydata)
summary(mydata_anova)
##                                     Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(Trial.Notes)                  1   5782    5782  16.233 6.01e-05 ***
## factor(GROUP)                        3   2361     787   2.209   0.0854 .  
## factor(Trial.Notes):factor(GROUP)    3    564     188   0.528   0.6632    
## Residuals                         1041 370772     356                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(mydata$GROUP,mydata$Trial.Notes,mydata$Tender)

There is no intersection in the interaction diagram and p-value = 0.66 indicates that the two variables control method and kiwifruit have no mutual influence.

Part 3: The general question

The null hypothesis: 1. The prediction cook method will not affect the tenderness 2. The prediction aged period will not affect the tenderness 3. The prediction muscles will not affect the tenderness

THe alternative hypothesis: 1. The prediction cook method will affect the tenderness 2. The prediction aged period will affect the tenderness 3. The prediction muscles will affect the tenderness

The visulization of predictor cool method:

p <- ggplot(mydata, aes(COOK,Tender ,fill=(COOK))) 
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour  = "black") ,panel.background = element_rect(fill = NA))

n_count = mydata %>% group_by(GROUP) %>% count()
n_count
## # A tibble: 4 x 2
## # Groups:   GROUP [4]
##   GROUP     n
##   <dbl> <int>
## 1  462.   262
## 2  462.   260
## 3  462.   262
## 4  462.   265
n_i = n_count %>% pull(n)
mydata_anova <- aov(Tender ~ factor(COOK), data = mydata)
summary(mydata_anova)
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(COOK)    3   6263  2087.8   5.846 0.000585 ***
## Residuals    1045 373214   357.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((c(1,1,-1,-1)^2)/n_i))
ybar_i = mydata %>% group_by(COOK) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum((ci * ybar_i)/se)
t_stat
## [1] -4.873342

The test statisics is -4.87 and the p-value is 0.00059.

qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)

Because of the model residuals are not satisfied normal distribution. We also use the krustal test. And the chi-squared is 16.22, the p-value is 0.001. Hence, we reject the null hypothesis and we have sufficient evidences that cook method woll affect the tenderness.

kruskal.test(Tender ~ factor(COOK), data = mydata)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Tender by factor(COOK)
## Kruskal-Wallis chi-squared = 16.216, df = 3, p-value = 0.001024

The analysis of aged period. Ithe the same reason, we also use krustal test to modify the ANOVA model.

mydata_anova <- aov(Tender ~ D.AGED, data = mydata)
summary(mydata_anova)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## D.AGED         1   4556    4556   12.72 0.000377 ***
## Residuals   1047 374921     358                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal.test(Tender ~ factor(D.AGED), data = mydata)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Tender by factor(D.AGED)
## Kruskal-Wallis chi-squared = 11.365, df = 1, p-value = 0.0007484

The results from two different models are same. The p-value is 0.0004 from ANOVA model and 0.0007 from krustal model. Hence, we reject the null hypothesis and the aged period will affect the tenderness.

The distribution of predictor muscle group.

p <- ggplot(mydata, aes(factor(GROUP),Tender ,fill=(GROUP))) 
p + geom_boxplot() + coord_flip() +theme(axis.line = element_line(colour  = "black") ,panel.background = element_rect(fill = NA))

n_count = mydata %>% group_by(GROUP) %>% count()
n_count
## # A tibble: 4 x 2
## # Groups:   GROUP [4]
##   GROUP     n
##   <dbl> <int>
## 1  462.   262
## 2  462.   260
## 3  462.   262
## 4  462.   265
n_i = n_count %>% pull(n)
mydata_anova <- aov(Tender ~ factor(GROUP), data = mydata)
summary(mydata_anova)
##                 Df Sum Sq Mean Sq F value Pr(>F)  
## factor(GROUP)    3   2331   777.1   2.153 0.0919 .
## Residuals     1045 377146   360.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_ms = sum(mydata_anova$residuals^2)/mydata_anova$df.residual
se = sqrt(resid_ms*sum((c(1,1,-1,-1)^2)/n_i))
ybar_i = mydata %>% group_by(GROUP) %>% summarise(mean(Tender)) %>% pull()
t_stat = sum((ci * ybar_i)/se)
t_stat
## [1] 0.9202566

The result from ANOVA: the test staistics is 0.92 and the p-value is 0.09 so we do not reject the null hypothesis.

qqnorm(mydata_anova$residuals)
qqline(mydata_anova$residuals)

With the same reason: the residuals are not from normal distribution. We use krustal test to improve.

kruskal.test(Tender ~ factor(GROUP), data = mydata)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Tender by factor(GROUP)
## Kruskal-Wallis chi-squared = 6.3133, df = 3, p-value = 0.09732

The test statistics chi-squared is 6.31 and the p-value is 0.10 so we do not reject the null hypothesis.

Conclusion

With the analysis above, the differences between kiwifruit enhancement treatment, thecook method,the interaction of kiwifruit and cook method and the aged period should be mentioned and the muscle group, the interaction of kiwifruit and muscle group should not be mentioned. The slight differences from later predictors are not significant.