This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
9.S.10: As part of the study of the inheritance pattern of cowpea plants, geneticists classified the plants in one experiment according to whether the plants had one leaf or three. The data follow:
Leaves = matrix(c(1, 74, 3, 61), nrow=2)
colnames(Leaves)=c("", "")
rownames(Leaves)=c("Number of leaves","Number of plants")
Leaves
##
## Number of leaves 1 3
## Number of plants 74 61
Test the null hypothesis that the two types of plants occur with equal probabilities. Use a nondirectional alternative and let α=0.05.
#H0: the proportion, p, of cowpea plants with 3 leaves equals the proportion with 1 leaf; p = 0.5
#For the nondirectional alternative, a deviation either upward or downward from equal probabilities counts as evidence for HA.
#HA: the proportion of plants with 3 leaves differs from that of plants with 1 leaf; p ≠ 0.5 (that is, p < 0.5 or p > 0.5)
#One-tailed probability: Pr{Y/n ≤ pˆ | p = p0} = Pr{Y ≤ 61 | p = 0.5} = 0.1508, where Y ~ bin(135, 0.5)
#Doubling the tail for the nondirectional alternative gives p-value = 0.3017 > 0.05 --> we're unable to reject H0.
binom.test(61, 135, 0.5)
##
## Exact binomial test
##
## data: 61 and 135
## number of successes = 61, number of trials = 135, p-value = 0.3017
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.3660991 0.5397632
## sample estimates:
## probability of success
## 0.4518519
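As a rough check on the hand calculation in the comments above, the nondirectional p-value can be rebuilt from the binomial tail with `pbinom`. This is only a sketch: the names `one_tail` and `two_sided` are made up here for illustration, and simply doubling the tail relies on the null distribution being symmetric when p = 0.5.

```r
# Y ~ bin(135, 0.5): probability of observing 61 or fewer three-leaf plants
one_tail <- pbinom(61, size = 135, prob = 0.5)

# The null distribution is symmetric at p = 0.5, so the nondirectional
# p-value is twice the lower tail
two_sided <- 2 * one_tail

round(c(one_tail = one_tail, two_sided = two_sided), 4)
```

The doubled tail should agree with the p-value that `binom.test(61, 135, 0.5)` reports.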
10.8.1: As part of a study of risk factors for stroke, 155 women who had experienced a hemorrhagic stroke (cases) were interviewed. For each case, a control was chosen who had not experienced a stroke; the control was matched to the case by neighborhood of residence, age, and race. Each woman was asked whether she used oral contraceptives. The data for the 155 pairs are displayed in the table. “Yes” and “No” refer to use of oral contraceptives.
Stroke_Cases_Females = matrix(c(107, 13, 30, 5), ncol=2, dimnames = list("Case" = c("No", "Yes"), "Control" = c("No", "Yes")))
Stroke_Cases_Females
## Control
## Case No Yes
## No 107 30
## Yes 13 5
To test for association between oral contraceptive use and stroke, consider only the 43 discordant pairs (pairs who answered differently) and test the hypothesis that a discordant pair is equally likely to be “yes/no” or “no/yes”. Use McNemar’s test to test the hypothesis that having a stroke is independent of use of oral contraceptives against a nondirectional alternative at α=0.05.
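#McNemar's test is designed for paired data: the unit of analysis is the matched pair rather than the individual woman, so the 2×2 table counts pairs, not subjects. The 112 concordant pairs (107 no/no and 5 yes/yes) carry no information about the association, so the test uses only the 43 discordant pairs (30 and 13). With the continuity correction, the statistic is (|30 - 13| - 1)^2 / (30 + 13) = 256/43 = 5.9535 on 1 df. The table below adds the marginal totals for reference.
#Since the p-value = 0.01469 < 0.05, we reject H0: there is evidence of an association between stroke and oral contraceptive use.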
Stroke_Cases_Females_Complete = matrix(c(107, 13, 120, 30, 5, 35, 137, 18, 155), ncol=3, dimnames = list("Case" = c("No", "Yes", "Total"), "Control" = c("No", "Yes", "Total")))
Stroke_Cases_Females_Complete
## Control
## Case No Yes Total
## No 107 30 137
## Yes 13 5 18
## Total 120 35 155
mcnemar.test(Stroke_Cases_Females)
##
## McNemar's Chi-squared test with continuity correction
##
## data: Stroke_Cases_Females
## McNemar's chi-squared = 5.9535, df = 1, p-value = 0.01469
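The statistic reported above can be reproduced from the two discordant counts alone. The sketch below assumes the counts 30 and 13 read off the table, and its variable names are illustrative only. It also runs an exact binomial test on the 43 discordant pairs, which is an alternative to the chi-square approximation and not the method used above.

```r
# Discordant pairs read off the table
n_case_no_control_yes <- 30
n_case_yes_control_no <- 13
n_discordant <- n_case_no_control_yes + n_case_yes_control_no

# McNemar statistic with continuity correction, referred to chi-square on 1 df
mcnemar_stat <- (abs(n_case_no_control_yes - n_case_yes_control_no) - 1)^2 / n_discordant
mcnemar_p <- pchisq(mcnemar_stat, df = 1, lower.tail = FALSE)
c(statistic = mcnemar_stat, p_value = mcnemar_p)

# Exact alternative: under H0 each discordant pair is equally likely to fall either way
exact_p <- binom.test(n_case_no_control_yes, n_discordant, p = 0.5)$p.value
exact_p
```

Because the statistic depends only on the off-diagonal cells, changing the concordant counts (107 and 5) would leave the result unchanged.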
10.S.6: Desert lizards regulate their body temperature by basking in the sun or moving into the shade, as required. Normally the lizards will maintain a daytime temperature of about 38ºC. When they are sick, however, they maintain a temperature about 2 to 4 degrees higher – that is, a “fever”. In an experiment to see whether this fever might be beneficial, lizards were given a bacterial infection; then 36 of the animals were prevented from developing a fever by keeping them in a 38ºC enclosure, while 12 animals were kept at a temperature of 40ºC. The following table describes the mortality after 24 hours. How strongly do these results support the hypothesis that fever has survival value? Use Fisher’s exact test against a directional alternative. Let α=0.05.
#Nice table from HW #5 pset
Lizard_Temperature_Survival = matrix(c(18, 18, 36, 2, 10, 12), nrow = 3, ncol = 2)
colnames(Lizard_Temperature_Survival) = c("38ºC", "40ºC")
rownames(Lizard_Temperature_Survival) = c("Died", "Survived", "Total")
Lizard_Temperature_Survival
## 38ºC 40ºC
## Died 18 2
## Survived 18 10
## Total 36 12
#Table used for Fisher's exact test
#We want to test H0: p1 = p2 vs. HA: p1 > p2, where p1 is the probability of dying among lizards that were prevented from developing a fever (38ºC) and p2 is the probability of dying among lizards that were allowed a fever (40ºC).
###The alternative is "greater" in fisher.test b/c R states the alternative in terms of the table's odds ratio: with "Died" in the first row and 38ºC in the first column, an odds ratio greater than 1 means the odds of dying are higher at 38ºC, which matches HA: p1 > p2.
###p-value = 0.04239 --> we reject the null hypothesis b/c p-value < 0.05
Lizard_Temperature_Simple_Matrix = matrix(c(18, 18, 2, 10), nrow = 2, ncol = 2)
colnames(Lizard_Temperature_Simple_Matrix) = c("38ºC", "40ºC")
rownames(Lizard_Temperature_Simple_Matrix) = c("Died", "Survived")
Lizard_Temperature_Simple_Matrix
## 38ºC 40ºC
## Died 18 2
## Survived 18 10
fisher.test(Lizard_Temperature_Simple_Matrix, alternative="greater")
##
## Fisher's Exact Test for Count Data
##
## data: Lizard_Temperature_Simple_Matrix
## p-value = 0.04239
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 1.054505 Inf
## sample estimates:
## odds ratio
## 4.847104
##By hand, we would hold the table margins fixed and add up the probabilities of the observed table and every table more extreme in the direction of HA: (I) 18 dying at 38ºC and 2 dying at 40ºC, (II) 19 dying at 38ºC and 1 dying at 40ºC, (III) 20 dying at 38ºC and 0 dying at 40ºC.
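The three-table sum described above can be sketched with the hypergeometric density `dhyper`. The assumption here is that the margins are held fixed (36 lizards at 38 degrees, 12 at 40 degrees, 20 deaths in total); the object names are hypothetical and chosen only for this illustration.

```r
# With fixed margins, the number of deaths at 38 C is hypergeometric under H0:
# 20 deaths are "drawn" from 36 lizards at 38 C and 12 lizards at 40 C
deaths_at_38 <- 18:20
table_probs <- dhyper(deaths_at_38, m = 36, n = 12, k = 20)
names(table_probs) <- paste0("deaths_38C_", deaths_at_38)

# Probability of each of the three tables, then their sum (the one-sided p-value)
table_probs
fisher_p_by_hand <- sum(table_probs)
fisher_p_by_hand
```

The sum should match the p-value from `fisher.test(..., alternative="greater")`, since no table with more than 20 deaths at 38ºC is possible given the margins.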
10.S.7 Consider the data from Exercise 10.S.6. Analyze these data with a chi-square test. Let α=0.05.
#The chi-square test of independence for a contingency table measures how far the observed counts are from the counts we would expect if mortality did not depend on temperature. Unlike Fisher's exact test, it relies on a large-sample approximation, but it also extends to tables with more than two categories.
#Among the lizards kept at 38ºC, and so prevented from developing a fever, the estimated chance of survival is 50%: of the 36 lizards in this group, 18 died and 18 survived.
#Among the lizards kept at 40ºC, and so allowed a fever, the estimated chance of survival is about 83%: 10 of the 12 lizards in this group survived.
#With the Yates continuity correction turned off, the chi-square test gives a p-value of 0.04252 < 0.05, so we reject the null hypothesis (H0).
Lizard_Temperature_Simple_Matrix = matrix(c(18, 18, 2, 10), nrow = 2, ncol = 2)
colnames(Lizard_Temperature_Simple_Matrix) = c("38ºC", "40ºC")
rownames(Lizard_Temperature_Simple_Matrix) = c("Died", "Survived")
Lizard_Temperature_Simple_Matrix
## 38ºC 40ºC
## Died 18 2
## Survived 18 10
chisq.test(Lizard_Temperature_Simple_Matrix, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: Lizard_Temperature_Simple_Matrix
## X-squared = 4.1143, df = 1, p-value = 0.04252
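To see where the statistic comes from, the expected counts and the Pearson sum can be computed directly. This is a minimal sketch under the same setup as the call above (no Yates correction); `observed`, `expected`, `chisq_stat`, and `chisq_p` are names introduced here for illustration.

```r
observed <- matrix(c(18, 18, 2, 10), nrow = 2,
                   dimnames = list(c("Died", "Survived"), c("38C", "40C")))

# Expected count under independence = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Pearson statistic without the Yates correction, on (2 - 1) * (2 - 1) = 1 df
chisq_stat <- sum((observed - expected)^2 / expected)
chisq_p <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)
c(statistic = chisq_stat, p_value = chisq_p)
```

The expected counts work out to 15 and 5 deaths and 21 and 7 survivors. The smallest is 5, which sits right at the commonly quoted threshold for trusting the chi-square approximation, so comparing the result with Fisher's exact test is a sensible precaution.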