Alternative Hypothesis

Brewing technique has an effect on the amount of crème produced.

3. Code

library(readr)
espresso <- read_csv("EspressoData.csv")

## Rows: 27 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): cereme, brewmethod
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(espresso)

## spc_tbl_ [27 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ cereme    : num [1:27] 36.6 39.6 37.7 36 38.5 ...
##  $ brewmethod: num [1:27] 1 1 1 1 1 1 1 1 1 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   cereme = col_double(),
##   ..   brewmethod = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

summary(espresso)

##      cereme        brewmethod
##  Min.   :21.02   Min.   :1   
##  1st Qu.:35.66   1st Qu.:1   
##  Median :38.52   Median :2   
##  Mean   :44.47   Mean   :2   
##  3rd Qu.:55.23   3rd Qu.:3   
##  Max.   :73.19   Max.   :3

Checking whether the assumptions of the ANOVA test hold for the data.

Is the data normally distributed?
Is there any skewness in the data?

## 
##  D'Agostino skewness test
## 
## data:  espresso$cereme
## skew = 0.54679, z = 1.32787, p-value = 0.1842
## alternative hypothesis: data have a skewness

The D'Agostino test gives a p-value of 0.1842, which is greater than 0.05, so we fail to reject the null hypothesis that the data are not skewed. There is no significant evidence of skewness.
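The call that produced the skewness output above is not shown. The output format suggests agostino.test() from the moments package, but that is an assumption. As a minimal base-R sketch of the statistic itself, the moment-based sample skewness can be computed as below. The function name sample_skewness and the example vectors are made up for illustration and are not the espresso data.

```r
# Moment-based sample skewness: m3 / m2^(3/2), where m2 and m3 are the
# second and third central moments (dividing by n, not n - 1).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]   # drop missing values
  d <- x - mean(x)    # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  m3 <- mean(d^3)     # third central moment
  m3 / m2^(3 / 2)
}

# Hypothetical vectors, NOT the espresso data
symmetric_x  <- c(1, 2, 3, 4, 5)    # symmetric around 3
right_tail_x <- c(1, 2, 3, 4, 10)   # one large value stretches the right tail

sample_skewness(symmetric_x)    # zero for a symmetric sample
sample_skewness(right_tail_x)   # positive for a right-skewed sample
```

A positive value, like the 0.547 reported above, points to a longer right tail. The test then asks whether a value of that size is larger than chance alone would explain.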

tapply(espresso$cereme, espresso$brewmethod, var)

##         1         2         3 
##  53.29088 102.02220  59.30182
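One informal way to read these group variances is the ratio of the largest to the smallest. The sketch below uses the three values printed above. The cutoff mentioned afterwards is a common rule of thumb and is not stated in the source.

```r
# Group variances copied from the tapply() output above
group_var <- c("1" = 53.29088, "2" = 102.02220, "3" = 59.30182)

# Ratio of the largest to the smallest group variance
var_ratio <- max(group_var) / min(group_var)
round(var_ratio, 2)
```

The ratio comes out a little under 2. A ratio below roughly 4 is often treated as acceptable for ANOVA when the groups are the same size. A formal alternative would be bartlett.test(cereme ~ factor(brewmethod), data = espresso), which is part of base R.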

Now that we have checked the assumptions of ANOVA, let us run the test. A one-way ANOVA fits here because there is one continuous variable (cereme) and one categorical variable (brewmethod) with more than two levels.

summary(aov(cereme ~ factor(brewmethod), data = espresso))

##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## factor(brewmethod)  2   4065  2032.6   28.41 4.7e-07 ***
## Residuals          24   1717    71.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-test is highly significant (F = 28.41, p = 4.7e-07), so we reject the null hypothesis: the mean amount of crème differs between at least two of the brewing methods.
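The entries in the ANOVA table can be reproduced by hand from the sums of squares and degrees of freedom, which helps show where the F value and p-value come from. The sketch below uses the rounded numbers from the table. Eta squared is not reported in the source and is added here only as an illustration of effect size.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_between <- 4065
df_between <- 2
ss_within  <- 1717
df_within  <- 24

ms_between <- ss_between / df_between   # mean square for brewmethod
ms_within  <- ss_within / df_within     # residual mean square
f_value    <- ms_between / ms_within    # F statistic

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variation associated with brewing method
eta_sq <- ss_between / (ss_between + ss_within)

c(F = f_value, p = p_value, eta_sq = eta_sq)
```

With these rounded inputs the F value lands close to the 28.41 in the table, and eta squared is about 0.70, meaning roughly 70% of the variation in cereme is associated with brewing method.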

Now that we know the brewing method affects the amount of crème, let us see how the brewing techniques compare with one another. This can be done using pairwise t-tests.

library(pgirmess)

## Warning: package 'pgirmess' was built under R version 4.3.2

pairwise.t.test(espresso$cereme, espresso$brewmethod, p.adjust.method = "bonferroni", paired = FALSE)

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  espresso$cereme and espresso$brewmethod 
## 
##   1       2      
## 2 5.2e-07 -      
## 3 0.24    4.4e-05
## 
## P value adjustment method: bonferroni
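The Bonferroni adjustment named in the output above can be sketched in a few lines: each unadjusted p-value is multiplied by the number of comparisons and capped at 1. The raw p-values below are hypothetical and are not taken from the espresso data.

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
raw_p <- c("1 vs 2" = 0.0001, "1 vs 3" = 0.08, "2 vs 3" = 0.5)

# Bonferroni: multiply by the number of comparisons, then cap at 1
manual_adj <- pmin(raw_p * length(raw_p), 1)

# Same adjustment via the built-in helper from the stats package
builtin_adj <- p.adjust(raw_p, method = "bonferroni")

# Both approaches should agree
all.equal(unname(manual_adj), unname(builtin_adj))
```

For instance, a raw p-value of 0.08 becomes 0.24 across three comparisons, so a difference that looks borderline on its own is clearly non-significant once the multiple testing is accounted for.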

aggregate(espresso[, 1], list(espresso$brewmethod), mean)

##   Group.1 cereme
## 1       1   32.4
## 2       2   61.3
## 3       3   39.7

The pairwise tests show that brewing techniques 1 and 3 do not differ significantly from each other (adjusted p = 0.24), but both differ significantly from technique 2. We still don’t know which technique yields more crème.

If we take the mean after grouping by brewing technique, the second technique has a mean crème level of 61.3, while techniques 1 and 3 have only 32.4 and 39.7 respectively. Each group has 9 data points, and the group variances are similar.

Summary

The Espresso data satisfy the assumptions of ANOVA. Based on the analysis, the second brewing technique produces the most crème. Techniques 1 and 2 and techniques 2 and 3 differ significantly, while techniques 1 and 3 do not differ significantly from one another.

Intro of ANOVA

Jagruti Garg

2024-07-28
