Exercise 8.4

# understand the data
eggs = data.frame(size=c("A","B","A","B"),piercer=c("N","N","Y","Y"), total=c(54,200,60,70), broken=c(4,15,4,1), cracked=c(8,28,9,7))
eggs
##   size piercer total broken cracked
## 1    A       N    54      4       8
## 2    B       N   200     15      28
## 3    A       Y    60      4       9
## 4    B       Y    70      1       7
# make a dataframe in a grid.
eggs_grid = expand.grid(size=c("A","B"),piercer=c("N","Y"), result=c("broken","cracked","ok"))
# expand.grid: create a data frame from all combinations of the supplied vectors or factors (the first one varies fastest)
eggs_grid$n = c(4,15,4,1,  8,28,9,7,  42, 157, 47, 62)   # counts in grid order; ok = total - broken - cracked
eggs_grid
##    size piercer  result   n
## 1     A       N  broken   4
## 2     B       N  broken  15
## 3     A       Y  broken   4
## 4     B       Y  broken   1
## 5     A       N cracked   8
## 6     B       N cracked  28
## 7     A       Y cracked   9
## 8     B       Y cracked   7
## 9     A       N      ok  42
## 10    B       N      ok 157
## 11    A       Y      ok  47
## 12    B       Y      ok  62
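The `ok` counts typed into `n` above are `total - broken - cracked` for each size/piercer combination. A minimal sketch that derives them from `eggs` instead of entering them by hand; it relies on `expand.grid` varying `size` fastest, so within each `result` block the grid rows follow the same order as the rows of `eggs`:

```r
# Rebuild the egg data (same values as above)
eggs <- data.frame(size = c("A", "B", "A", "B"), piercer = c("N", "N", "Y", "Y"),
                   total = c(54, 200, 60, 70), broken = c(4, 15, 4, 1),
                   cracked = c(8, 28, 9, 7))

# Eggs that were neither broken nor cracked
eggs$ok <- eggs$total - eggs$broken - eggs$cracked

# expand.grid varies size fastest, then piercer, then result,
# so each block of four rows matches the row order of eggs
eggs_grid <- expand.grid(size = c("A", "B"), piercer = c("N", "Y"),
                         result = c("broken", "cracked", "ok"))
eggs_grid$n <- c(eggs$broken, eggs$cracked, eggs$ok)
eggs_grid
```

Deriving the counts this way keeps the grid consistent with `eggs` if any total is later corrected.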
# Does the piercer affect "broken or cracked", regardless of egg size?
# Subset the data frame by multiple conditions
notPiercer_broken = eggs_grid[which(eggs_grid$piercer=="N" & eggs_grid$result=="broken"),]
notPiercer_cracked = eggs_grid[which(eggs_grid$piercer=="N" & eggs_grid$result=="cracked"),]
notPiercer_ok = eggs_grid[which(eggs_grid$piercer=="N" & eggs_grid$result=="ok"),]

yesPiercer_broken = eggs_grid[which(eggs_grid$piercer=="Y" & eggs_grid$result=="broken"),]
yesPiercer_cracked = eggs_grid[which(eggs_grid$piercer=="Y" & eggs_grid$result=="cracked"),]
yesPiercer_ok = eggs_grid[which(eggs_grid$piercer=="Y" & eggs_grid$result=="ok"),]

# make a 2 x 3 matrix (piercer x result); matrix() fills column by column
piercerOkBrokenCracked = matrix(c(sum(notPiercer_ok$n),      sum(yesPiercer_ok$n),
                                  sum(notPiercer_broken$n),  sum(yesPiercer_broken$n),
                                  sum(notPiercer_cracked$n), sum(yesPiercer_cracked$n)), nrow=2)

dimnames(piercerOkBrokenCracked) <- list(c("notPiercer", "yesPiercer"), c("ok", "broken", "cracked"))
piercerOkBrokenCracked
##             ok broken cracked
## notPiercer 199     19      36
## yesPiercer 109      5      16
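The six `which()` subsets and the hand-built matrix can also be replaced by a single cross-tabulation. A minimal sketch using base R's `xtabs()`, which sums `n` over egg size for every piercer/result combination (columns follow the factor-level order, so `ok` comes last here rather than first):

```r
# Rebuild the long-format grid of counts
eggs_grid <- expand.grid(size = c("A", "B"), piercer = c("N", "Y"),
                         result = c("broken", "cracked", "ok"))
eggs_grid$n <- c(4, 15, 4, 1, 8, 28, 9, 7, 42, 157, 47, 62)

# Sum the counts over egg size for each piercer x result cell
tab <- xtabs(n ~ piercer + result, data = eggs_grid)
tab
```

The resulting table can be passed straight to `fisher.test()` or `chisq.test()`.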
# marginal analysis for pierced vs. ok or broken or cracked 
fisher.test(piercerOkBrokenCracked)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  piercerOkBrokenCracked
## p-value = 0.3211
## alternative hypothesis: two.sided
chisq.test(piercerOkBrokenCracked)
## 
##  Pearson's Chi-squared test
## 
## data:  piercerOkBrokenCracked
## X-squared = 2.3623, df = 2, p-value = 0.3069

Discussion

Fisher's exact test and the chi-squared test give p-values of 0.3211 and 0.3069, respectively. Both are larger than 0.05, so we fail to reject the null hypothesis: the data give no evidence that the piercer affects how many eggs are broken or cracked during boiling.

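One observed cell count is small (5 broken eggs with the piercer). The usual rule of thumb for the chi-squared approximation, however, concerns the expected counts, and here the smallest expected count is about 8.1, so the approximation should be adequate. Fisher's exact test does not rely on that approximation at all, which makes it the safer choice for small tables; in this case the two p-values (0.3211 and 0.3069) are close.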
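Whether a table is too sparse for the chi-squared approximation is usually judged from the expected counts under independence, not the observed ones; `chisq.test()` returns these in its `expected` component. A short sketch for the piercer x result table above:

```r
# piercer x result counts, entered column by column as in the matrix above
tab <- matrix(c(199, 109, 19, 5, 36, 16), nrow = 2,
              dimnames = list(c("notPiercer", "yesPiercer"),
                              c("ok", "broken", "cracked")))

# Expected counts = row total * column total / grand total
expected <- chisq.test(tab)$expected
round(expected, 2)

# Smallest expected count; values below about 5 would make the
# chi-squared approximation questionable
min(expected)
```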

# marginal analysis for pierced vs. ok or not_ok
notPiercer_brokencracked=sum(notPiercer_broken$n)+sum(notPiercer_cracked$n)
yesPiercer_brokencracked=sum(yesPiercer_broken$n)+sum(yesPiercer_cracked$n)

piercerOk_notOk = matrix(c(notPiercer_brokencracked, sum(notPiercer_ok$n), 
                          yesPiercer_brokencracked, sum(yesPiercer_ok$n)), nrow=2)
dimnames(piercerOk_notOk) <- list(c("Broken&Cracked", "Ok"), c("notPiercer", "yesPiercer"))
piercerOk_notOk
##                notPiercer yesPiercer
## Broken&Cracked         55         21
## Ok                    199        109
fisher.test(piercerOk_notOk)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  piercerOk_notOk
## p-value = 0.2247
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.8029372 2.6337180
## sample estimates:
## odds ratio 
##   1.433252
chisq.test(piercerOk_notOk)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  piercerOk_notOk
## X-squared = 1.3103, df = 1, p-value = 0.2523

Discussion

This time, I combined broken and cracked into one category of damaged eggs and examined whether the piercer affects damage during boiling. The estimated odds ratio is 1.433: the odds of damage without the piercer are about 1.4 times the odds with the piercer, which points toward the piercer reducing damage. However, the 95% confidence interval (0.80 to 2.63) includes 1. Compared with the previous test, where 'broken' and 'cracked' were kept separate, both p-values decreased (0.2247 for fisher.test and 0.2523 for chisq.test), but they are still larger than 0.05. So I still fail to reject the null hypothesis; the data do not show that the piercer prevents egg damage during boiling.
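The odds-ratio reading can be checked by hand. A minimal sketch: the sample (cross-product) odds ratio is close to, but not identical to, the 1.433 reported by `fisher.test()`, which gives a conditional maximum-likelihood estimate instead.

```r
# Damage x piercer counts, entered column by column as in piercerOk_notOk
tab <- matrix(c(55, 199, 21, 109), nrow = 2,
              dimnames = list(c("Broken&Cracked", "Ok"),
                              c("notPiercer", "yesPiercer")))

# Odds of damage in each group
odds_not <- tab["Broken&Cracked", "notPiercer"] / tab["Ok", "notPiercer"]  # 55 / 199
odds_yes <- tab["Broken&Cracked", "yesPiercer"] / tab["Ok", "yesPiercer"]  # 21 / 109

# Sample odds ratio: > 1 means unpierced eggs have higher odds of damage
sample_or <- odds_not / odds_yes
sample_or

# fisher.test() reports the conditional MLE and an exact confidence interval
ft <- fisher.test(tab)
ft$estimate
ft$conf.int
```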

Slice analysis when the egg size==A or B.

This time, I analyze egg sizes A and B separately (one 2 x 2 table per size) when examining the piercer's effect on breaking and cracking.
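The per-size tables built below with `which()` can also be produced in one step. A sketch, assuming the same `eggs_grid` counts: collapse `result` into a hypothetical `damaged` column, cross-tabulate with `size` as the third dimension, and run `fisher.test()` on each slice.

```r
eggs_grid <- expand.grid(size = c("A", "B"), piercer = c("N", "Y"),
                         result = c("broken", "cracked", "ok"))
eggs_grid$n <- c(4, 15, 4, 1, 8, 28, 9, 7, 42, 157, 47, 62)

# Hypothetical helper column: broken and cracked both count as damaged
eggs_grid$damaged <- factor(ifelse(eggs_grid$result == "ok", "ok", "damaged"),
                            levels = c("damaged", "ok"))

# 2 x 2 x 2 table: damage x piercer, one slice per egg size
tab3 <- xtabs(n ~ damaged + piercer + size, data = eggs_grid)
tab3[, , "A"]
tab3[, , "B"]

# Fisher's exact test within each size slice
p_values <- sapply(c("A", "B"), function(s) fisher.test(tab3[, , s])$p.value)
p_values
```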

# Slice analysis when size==A

A_notPiercer_broken = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="N" & eggs_grid$result=="broken"),]
A_notPiercer_cracked = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="N" & eggs_grid$result=="cracked"),]
A_notPiercer_ok = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="N" & eggs_grid$result=="ok"),]

A_yesPiercer_broken = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="Y" & eggs_grid$result=="broken"),]
A_yesPiercer_cracked = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="Y" & eggs_grid$result=="cracked"),]
A_yesPiercer_ok = eggs_grid[which(eggs_grid$size=="A" & eggs_grid$piercer=="Y" & eggs_grid$result=="ok"),]


# marginal analysis in slice A for pierced vs. ok or not_ok 
A_notPiercer_brokencracked=sum(A_notPiercer_broken$n)+sum(A_notPiercer_cracked$n)
A_yesPiercer_brokencracked=sum(A_yesPiercer_broken$n)+sum(A_yesPiercer_cracked$n)

A_piercerOk_notOk = matrix(c(A_notPiercer_brokencracked, sum(A_notPiercer_ok$n), 
                          A_yesPiercer_brokencracked, sum(A_yesPiercer_ok$n)), nrow=2)
dimnames(A_piercerOk_notOk) <- list(c("A_Broken&Cracked", "A_Ok"), c("A_notPiercer", "A_yesPiercer"))
A_piercerOk_notOk
##                  A_notPiercer A_yesPiercer
## A_Broken&Cracked           12           13
## A_Ok                       42           47
fisher.test(A_piercerOk_notOk)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  A_piercerOk_notOk
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3837906 2.7608613
## sample estimates:
## odds ratio 
##   1.032686
chisq.test(A_piercerOk_notOk)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  A_piercerOk_notOk
## X-squared = 5.0627e-31, df = 1, p-value = 1

Discussion

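For size A eggs, the odds ratio is close to 1 (1.03) and both fisher.test and chisq.test give p = 1. There is no evidence that the piercer has any effect on size A eggs. A large p-value does not prove the null hypothesis, though: the 95% confidence interval (0.38 to 2.76) is wide, so moderate effects in either direction cannot be ruled out.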

# Slice analysis when size==B
B_notPiercer_broken = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="N" & eggs_grid$result=="broken"),]
B_notPiercer_cracked = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="N" & eggs_grid$result=="cracked"),]
B_notPiercer_ok = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="N" & eggs_grid$result=="ok"),]

B_yesPiercer_broken = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="Y" & eggs_grid$result=="broken"),]
B_yesPiercer_cracked = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="Y" & eggs_grid$result=="cracked"),]
B_yesPiercer_ok = eggs_grid[which(eggs_grid$size=="B" & eggs_grid$piercer=="Y" & eggs_grid$result=="ok"),]

# marginal analysis in slice B for pierced vs. ok or not_ok 
B_notPiercer_brokencracked=sum(B_notPiercer_broken$n)+sum(B_notPiercer_cracked$n)
B_yesPiercer_brokencracked=sum(B_yesPiercer_broken$n)+sum(B_yesPiercer_cracked$n)

B_piercerOk_notOk = matrix(c(B_notPiercer_brokencracked, sum(B_notPiercer_ok$n), 
                          B_yesPiercer_brokencracked, sum(B_yesPiercer_ok$n)), nrow=2)
dimnames(B_piercerOk_notOk) <- list(c("B_Broken&Cracked", "B_Ok"), c("B_notPiercer", "B_yesPiercer"))
B_piercerOk_notOk
##                  B_notPiercer B_yesPiercer
## B_Broken&Cracked           43            8
## B_Ok                      157           62
fisher.test(B_piercerOk_notOk)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  B_piercerOk_notOk
## p-value = 0.07609
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.9154887 5.5159171
## sample estimates:
## odds ratio 
##    2.11738
chisq.test(B_piercerOk_notOk)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  B_piercerOk_notOk
## X-squared = 2.807, df = 1, p-value = 0.09385

Discussion

The results for size B eggs are quite different from those for size A. The odds ratio is 2.117, the largest of all the tests I ran on the egg data: for size B, the odds of damage without the piercer are about twice the odds with it. However, the p-values (0.076 for fisher.test and 0.094 for chisq.test) are still larger than 0.05, and the 95% confidence interval (0.92 to 5.52) includes 1. So we again fail to reject the null hypothesis; these data do not show that the piercer prevents damage during boiling. Only 70 size B eggs were boiled with the piercer, though, so the test has limited power. If the effect is real, an experiment with more size B eggs might produce a significant p-value (< 0.05) and support the conclusion that the piercer helps prevent damage during boiling.
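To put a rough number on "more eggs", base R's `power.prop.test()` estimates the sample size per group needed to detect a difference between two proportions. A sketch, assuming the observed size B damage rates (43/200 without the piercer, 8/70 with it) are the true rates; it uses a normal approximation rather than Fisher's test, so treat the result as a ballpark figure:

```r
# Observed damage proportions for size B eggs
p_not <- 43 / 200  # without the piercer
p_yes <- 8 / 70    # with the piercer

# Eggs needed per group for 80% power at a two-sided 5% level
res <- power.prop.test(p1 = p_not, p2 = p_yes, sig.level = 0.05, power = 0.80)
res$n
ceiling(res$n)
```

Under these assumptions the estimate comes out on the order of 200 eggs per group, well above the 70 pierced size B eggs in the current data.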