1. Intro of ANOVA

Use one-way ANOVA to analysis the “EspressoData”. Three brew methods with a measure of the crème on the top. Find the method producing the most crème.

The statement of the research/ study purpose H0: BrewMethod1 = BrewMethod2 = BrewMethod3 H1: at least one pair of levels differ from one another
The type of analysis conducted, i.e. D’Agostino test, Scatterplot of residuals, Bartlett test. etc.
Descriptive statistics: basic information of the data, i.e. age and gender of the participants.
The ANOVA test
Post-hoc analysis
Effect size
Conclusions

setwd("E:\\mikhilesh\\HU Sem VI ANLY 510 and 506\\ANLY 510 Kao Principals and Applications\\assignment and project")
data1 <- read.csv("Lecture 2 EspressoData.csv")
names(data1)

## [1] "cereme"     "brewmethod"

str(data1)

## 'data.frame':    27 obs. of  2 variables:
##  $ cereme    : num  36.6 39.6 37.7 36 38.5 ...
##  $ brewmethod: int  1 1 1 1 1 1 1 1 1 2 ...

summary(data1)

##      cereme        brewmethod
##  Min.   :21.02   Min.   :1   
##  1st Qu.:35.66   1st Qu.:1   
##  Median :38.52   Median :2   
##  Mean   :44.47   Mean   :2   
##  3rd Qu.:55.23   3rd Qu.:3   
##  Max.   :73.19   Max.   :3

Assumptions

library(moments)
plot(density(data1$cereme))

qqnorm(data1$cereme)

agostino.test(data1$cereme) #skewness test

## 
##  D'Agostino skewness test
## 
## data:  data1$cereme
## skew = 0.54679, z = 1.32787, p-value = 0.1842
## alternative hypothesis: data have a skewness

shapiro.test(data1$cereme) #normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  data1$cereme
## W = 0.92201, p-value = 0.04414

#Residual plot (lm - linear model)
cereme.lm <- lm(cereme ~ brewmethod, data = data1)
cereme.lm

## 
## Call:
## lm(formula = cereme ~ brewmethod, data = data1)
## 
## Coefficients:
## (Intercept)   brewmethod  
##       37.17         3.65

cereme.res <- resid(cereme.lm)
cereme.res

##           1           2           3           4           5           6 
##  -4.1766667  -1.1666667  -3.0766667  -4.8566667  -2.2966667 -19.7966667 
##           7           8           9          10          11          12 
## -16.0066667  -6.6366667 -17.7366667  26.3733333   2.2133333  28.7233333 
##          13          14          15          16          17          18 
##  13.3133333   4.1433333  28.3033333  20.5733333  18.0633333   9.7933333 
##          19          20          21          22          23          24 
##   8.0733333 -11.4466667 -12.7666667  -8.0066667 -14.5966667 -10.9966667 
##          25          26          27 
## -10.7866667 -15.4366667   0.2133333

plot(data1$cereme, cereme.res, ylab = "Residual", xlab = "Brew Method", main = "Qauntity of the crème")
abline(0, 0)

bartlett.test(data1$cereme, data1$brewmethod)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  data1$cereme and data1$brewmethod
## Bartlett's K-squared = 0.96331, df = 2, p-value = 0.6178

# Variance are equal. p-value = 0.6178 failed to reject the null hypothesis

#Checking variance among groups
tapply(data1$cereme, data1$brewmethod, var)

##         1         2         3 
##  53.29088 102.02220  59.30182

#ratio of largest variance to smallest variance 102/53 = 1.9 - is way less than 3 (signifies that there is no issue with failing to reject null hypothesis)

#After checking all the assumptions fulfill our criteria, now we will perform ANOVA

#With only one continuous “outcome” and more than two groups of categorical "input" variables  
#IV - 3 brew methods (predictor)
#DV - quantity of crème on the top (outcome)

#Perform the linear model: 
#OPTION 1 - directly summary the model
summary(aov(cereme ~ factor(brewmethod), data = data1)) #(Don't forget to factor()ize your predictor - categorical data)

##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## factor(brewmethod)  2   4065  2032.6   28.41 4.7e-07 ***
## Residuals          24   1717    71.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#OR OPTION 2 - We can also create a modeland then summary that model seperately
model <- aov(aov(cereme ~ factor(brewmethod), data = data1))
model

## Call:
##    aov(formula = aov(cereme ~ factor(brewmethod), data = data1))
## 
## Terms:
##                 factor(brewmethod) Residuals
## Sum of Squares            4065.180  1716.919
## Deg. of Freedom                  2        24
## 
## Residual standard error: 8.458032
## Estimated effects may be unbalanced

summary(model)

##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## factor(brewmethod)  2   4065  2032.6   28.41 4.7e-07 ***
## Residuals          24   1717    71.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#To find out what group differs from the others
library(pgirmess)

## Warning: package 'pgirmess' was built under R version 3.6.3

#Type 1 - Pairwise comparisons using t tests - Bonferroni Method
pairwise.t.test(data1$cereme, data1$brewmethod, paired = FALSE, p.adjust.method = "bonferroni") #method can be "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY“

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  data1$cereme and data1$brewmethod 
## 
##   1       2      
## 2 5.2e-07 -      
## 3 0.24    4.4e-05
## 
## P value adjustment method: bonferroni

#Bonferroni Method show -> group 1 significantly differs from group 2, and group 2 and group 3 significantly differs from each others, but group 1 and 3 are not significantly different.

#Type 2 - Kurskal-Wallis:
kruskalmc(cereme ~ factor(brewmethod), data = data1)

## Multiple comparison test after Kruskal-Wallis 
## p.value: 0.05 
## Comparisons
##       obs.dif critical.dif difference
## 1-2 14.666667     8.957452       TRUE
## 1-3  3.666667     8.957452      FALSE
## 2-3 11.000000     8.957452       TRUE

#Kurskal-Wallis Method show -> group 1 significantly differs from group 2, and group 2 and group 3 significantly differs from each others, but group 1 and 3 are not significantly different.


#Type 3 - Tukey’s Test:
TukeyHSD(model)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = aov(cereme ~ factor(brewmethod), data = data1))
## 
## $`factor(brewmethod)`
##      diff        lwr       upr     p adj
## 2-1  28.9  18.942931  38.85707 0.0000005
## 3-1   7.3  -2.657069  17.25707 0.1811000
## 3-2 -21.6 -31.557069 -11.64293 0.0000419

# Tuey's test show -> group 2-1 and 3-2 are significant, 3-1 are nt significantaly different

library(pastecs)

## Warning: package 'pastecs' was built under R version 3.6.3

library(compute.es)

## Warning: package 'compute.es' was built under R version 3.6.3

#First get the relevant stats for each factor level (only relevant output pasted below):
by(data1$cereme, data1$brewmethod, stat.desc)

## data1$brewmethod: 1
##      nbr.val     nbr.null       nbr.na          min          max        range 
##    9.0000000    0.0000000    0.0000000   21.0200000   39.6500000   18.6300000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  291.6000000   35.9600000   32.4000000    2.4333533    5.6113228   53.2908750 
##      std.dev     coef.var 
##    7.3000599    0.2253105 
## ------------------------------------------------------------ 
## data1$brewmethod: 2
##      nbr.val     nbr.null       nbr.na          min          max        range 
##    9.0000000    0.0000000    0.0000000   46.6800000   73.1900000   26.5100000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  551.7000000   62.5300000   61.3000000    3.3668680    7.7640115  102.0222000 
##      std.dev     coef.var 
##   10.1006039    0.1647733 
## ------------------------------------------------------------ 
## data1$brewmethod: 3
##      nbr.val     nbr.null       nbr.na          min          max        range 
##     9.000000     0.000000     0.000000    32.680000    56.190000    23.510000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##   357.300000    37.120000    39.700000     2.566923     5.919334    59.301825 
##      std.dev     coef.var 
##     7.700768     0.193974

#n - sample size, M - mean, SD - standard deviation 
#brewmethod 1 (n = 9, M = 32.4 , SD = 7.3) 
#brewmethod 2 (n = 9, M = 61.3, SD = 10.1)
#brewmethod 3 (n = 9, M = 39.7, SD = 7.7)

#Use these values to compute a few standard Measures of Effect Size (MES or mes) for any pair of interest 

#OPtion 1 will compare brewmethod 1 and 2 ~ cereme vs brewmwthod
mes(61.3, 32.4, 10.1, 7.3, 9, 9)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = 3.28 [ 1.86 , 4.69 ] 
##   var(d) = 0.52 
##   p-value(d) = 0 
##   U3(d) = 99.95 % 
##   CLES(d) = 98.98 % 
##   Cliff's Delta = 0.98 
##  
##  g [ 95 %CI] = 3.12 [ 1.78 , 4.47 ] 
##   var(g) = 0.47 
##   p-value(g) = 0 
##   U3(g) = 99.91 % 
##   CLES(g) = 98.64 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = 0.87 [ 0.67 , 0.95 ] 
##   var(r) = 0 
##   p-value(r) = 0 
##  
##  z [ 95 %CI] = 1.32 [ 0.81 , 1.83 ] 
##   var(z) = 0.07 
##   p-value(z) = 0 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 383.22 [ 29.45 , 4987.18 ] 
##   p-value(OR) = 0 
##  
##  Log OR [ 95 %CI] = 5.95 [ 3.38 , 8.51 ] 
##   var(lOR) = 1.71 
##   p-value(Log OR) = 0 
##  
##  Other: 
##  
##  NNT = 1.26 
##  Total N = 18

#OPtion 2 will compare brewmethod 2 and 3 ~ cereme vs brewmwthod
mes(39.7, 61.3, 7.7, 10.1, 9, 9)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = -2.41 [ -3.62 , -1.19 ] 
##   var(d) = 0.38 
##   p-value(d) = 0 
##   U3(d) = 0.81 % 
##   CLES(d) = 4.45 % 
##   Cliff's Delta = -0.91 
##  
##  g [ 95 %CI] = -2.29 [ -3.45 , -1.14 ] 
##   var(g) = 0.35 
##   p-value(g) = 0 
##   U3(g) = 1.1 % 
##   CLES(g) = 5.26 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = -0.79 [ -0.92 , -0.51 ] 
##   var(r) = 0.01 
##   p-value(r) = 0 
##  
##  z [ 95 %CI] = -1.06 [ -1.57 , -0.56 ] 
##   var(z) = 0.07 
##   p-value(z) = 0 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 0.01 [ 0 , 0.12 ] 
##   p-value(OR) = 0 
##  
##  Log OR [ 95 %CI] = -4.36 [ -6.56 , -2.16 ] 
##   var(lOR) = 1.26 
##   p-value(Log OR) = 0 
##  
##  Other: 
##  
##  NNT = -5.01 
##  Total N = 18

#OPtion 3 will compare brewmethod 1 and 3 ~ cereme vs brewmwthod
mes(39.7, 32.4, 7.7, 7.3, 9, 9)

## Mean Differences ES: 
##  
##  d [ 95 %CI] = 0.97 [ 0 , 1.95 ] 
##   var(d) = 0.25 
##   p-value(d) = 0.07 
##   U3(d) = 83.47 % 
##   CLES(d) = 75.43 % 
##   Cliff's Delta = 0.51 
##  
##  g [ 95 %CI] = 0.93 [ 0 , 1.86 ] 
##   var(g) = 0.23 
##   p-value(g) = 0.07 
##   U3(g) = 82.29 % 
##   CLES(g) = 74.38 % 
##  
##  Correlation ES: 
##  
##  r [ 95 %CI] = 0.46 [ -0.01 , 0.76 ] 
##   var(r) = 0.03 
##   p-value(r) = 0.07 
##  
##  z [ 95 %CI] = 0.5 [ -0.01 , 1 ] 
##   var(z) = 0.07 
##   p-value(z) = 0.07 
##  
##  Odds Ratio ES: 
##  
##  OR [ 95 %CI] = 5.84 [ 0.99 , 34.36 ] 
##   p-value(OR) = 0.07 
##  
##  Log OR [ 95 %CI] = 1.76 [ -0.01 , 3.54 ] 
##   var(lOR) = 0.82 
##   p-value(Log OR) = 0.07 
##  
##  Other: 
##  
##  NNT = 2.84 
##  Total N = 18

#Effect size  (more than or equal to) < = 0.1 is small, 0.25 is medium, 0.4 is large (Cohen, 1988)

#Summary of ANOVA:
Observations from the study were analyzed by conducting a one-way analysis of variance using R version 3.6.1. First, all assumptions are met and there is no adjustment made. Results suggest that the brew method 2 has an influence on measure of the crème on the top of the expresso (F(2, 24) = 28.41, p < .001) Continuing the discussion with specifically which brew method produced the signiificantaly differed measures of the crème on the top of the expresso, a Tukey’s hoc test was established. The result suggested that there is a significant difference between brew method 1 and 2 (p-value < 0.001), [and brew method 2 and 3 (p-value < 0.001)] in terms of the measures of the crème on the top of the expresso. The effect was large, Cohen’ D = 3.28 for 1 ad 2, and Cohen’s D = 2.41 for method 2 and 3].

1. Intro of ANOVA

Mikhilesh Dehane

5/6/2020