Task 1

Question 1

I would calculate the mean number of seeds that germinated for the wild-type plants, assuming the counts are normally distributed. I would then calculate the mean number of seeds that germinated for the GMO plants and compute a p-value for the difference between the two means, to see whether the difference is statistically significant or likely to have arisen by chance.
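The comparison described above could be sketched with a two-sample t-test. This is only an illustration: the vectors `wild` and `gmo` hold made-up replicate counts, not the lab data, and Welch's t-test is one reasonable choice under the normality assumption rather than the required method.

```r
# Hypothetical germination counts per replicate (NOT the lab data)
wild <- c(24, 27, 25, 28, 26)
gmo  <- c(79, 83, 80, 82, 81)

# Group means
mean(wild)
mean(gmo)

# Welch two-sample t-test for a difference in means
result <- t.test(gmo, wild)
result$p.value
```

A small p-value here would indicate that a difference in means this large is unlikely under the null hypothesis of equal means.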

Question 2

d1 <- read.csv("http://faraway.neu.edu/biostats/lab3_dataset1.csv")

Question 3

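# Column-wise means and standard errors (sd / sqrt(n))
mean_table = apply(d1, 2, mean)
se_table = apply(d1, 2, sd)/sqrt(apply(d1,2,length))
# Half-width of the 95% confidence interval (normal approximation)
confidence_interval = se_table * 1.96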
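The 1.96 multiplier comes from the standard normal distribution: it is the 97.5th percentile, which leaves 2.5% in each tail. A minimal sketch of the same half-width calculation follows; the helper name `ci_half_width` and the vector `x` are hypothetical and only serve the illustration.

```r
# The 97.5th percentile of the standard normal is about 1.96
z <- qnorm(0.975)
round(z, 2)

# Hypothetical helper: half-width of a normal-approximation confidence interval
ci_half_width <- function(x, level = 0.95) {
  z_level <- qnorm(1 - (1 - level) / 2)
  z_level * sd(x) / sqrt(length(x))
}

# Made-up counts, NOT the lab data
x <- c(78, 82, 85, 79, 81)
half <- ci_half_width(x)
c(lower = mean(x) - half, upper = mean(x) + half)
```

With few replicates, a t-based multiplier such as `qt(0.975, df = length(x) - 1)` gives a wider and more cautious interval than 1.96.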

Question 4

plot.heights <- as.matrix(cbind(as.numeric(mean_table[2:3]),
                                as.numeric(mean_table[4:5])))

plot.heights
##      [,1] [,2]
## [1,] 80.9 19.1
## [2,] 25.7 74.3
bp <- barplot(plot.heights,
              beside = TRUE,
              names.arg = c("GMO germ","Wild germ", "GMO fail", "Wild fail"),
              ylim = c(0,max(as.numeric(plot.heights) + 2)),
              ylab = "Number of seeds")

# Error bars: mean +/- 95% CI half-width for each bar
arrows(x0 = bp, x1 = bp,
       y0 = plot.heights - confidence_interval[2:5],
       y1 = plot.heights + confidence_interval[2:5],
       angle = 90, code = 3, length = 0.1, col = "darkred")

Question 5

My extremely hard work in the lab was worth it. GMO plants germinated both at a higher percentage and in larger numbers (a mean of 80.9 seeds versus 25.7 for the wild type).

Question 6

H0: The mean number of seeds that germinated is the same for GMO plants and wild plants.
HA: The mean number of seeds that germinated differs between GMO plants and wild plants.

Question 7

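# Total counts over all rows for the four outcome columns
sum_table = colSums(d1)[2:5]
# 2x2 table: rows = GMO, wild; columns = germinated, failed
# (column order as in the bar plot labels)
fisher_table <- as.matrix(cbind(as.numeric(sum_table[1:2]),
                                as.numeric(sum_table[3:4])))
# One-sided test: are the odds of germination greater for GMO seeds?
fisher.test(fisher_table, alternative="greater")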
## 
##  Fisher's Exact Test for Count Data
## 
## data:  fisher_table
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  10.18571      Inf
## sample estimates:
## odds ratio 
##   12.22468

Question 8

There is overwhelming evidence that GMO seeds germinate at a higher rate than wild-type seeds. With a p-value below 2.2e-16 and an estimated odds ratio of about 12.2, a difference this large would be extremely unlikely if the null hypothesis were true.
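As a rough cross-check on the odds ratio, the simple cross-product ratio can be computed from the mean counts printed in plot.heights. Because every column is averaged over the same number of rows, the ratio of means equals the ratio of sums. This sketch assumes the printed means are not heavily rounded. The result will not match the fisher.test estimate exactly, since fisher.test reports a conditional maximum-likelihood estimate rather than the sample odds ratio.

```r
# Mean counts as printed in plot.heights
# rows: GMO, wild; columns: germinated, failed
means <- matrix(c(80.9, 25.7, 19.1, 74.3), nrow = 2,
                dimnames = list(c("GMO", "Wild"),
                                c("germinated", "failed")))

# Cross-product (sample) odds ratio
sample_or <- (means["GMO", "germinated"] * means["Wild", "failed"]) /
             (means["GMO", "failed"] * means["Wild", "germinated"])
sample_or
```

The value should land close to the 12.22 reported above, which supports reading the table orientation as GMO versus wild in the rows.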

Task 2

Question 1

d2 <- read.csv("http://faraway.neu.edu/biostats/lab3_dataset2.csv")
head(d2)
##      countries gmo.disease gmo.nodisease nogmo.disease nogmo.nodisease
## 1        India          45            40            15              31
## 2      Vietnam          59            42            27              23
## 3       Brazil          58            44            30              31
## 4 South Africa          52            44            21              29
## 5     Cambodia          39            51            22              25
## 6  Ivory Coast          53            50            23              24

H0: There is no association between GMO use and disease incidence.
HA: There is an association between GMO use and disease incidence.

Question 2

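# Total counts over all countries (drop the country name column)
d2_sum_table = colSums(d2[, -1])
# 2x2 table: rows = disease, no disease; columns = GMO, no GMO
d2_fisher_table <- as.matrix(cbind(as.numeric(d2_sum_table[1:2]),
                                   as.numeric(d2_sum_table[3:4])))
# One-sided test: are the odds of disease greater with GMO use?
fisher.test(d2_fisher_table, alternative="greater")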
## 
##  Fisher's Exact Test for Count Data
## 
## data:  d2_fisher_table
## p-value = 0.02954
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  1.027284      Inf
## sample estimates:
## odds ratio 
##   1.240706

Question 3

There is evidence that GMO use is associated with disease incidence. With a one-sided p-value of 0.03, below an alpha of 0.05, we can reject H0 and conclude that disease incidence is greater with GMO use. The estimated odds ratio of about 1.24 suggests the effect is modest.

Question 4

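# One two-sided Fisher's exact test per country (one row of d2 each)
pvals <- numeric(NROW(d2))
for (i in seq_len(NROW(d2))) {
  # 2x2 table: rows = disease, no disease; columns = GMO, no GMO
  fisher_matrix = cbind(t(d2[i, 2:3]),
                        t(d2[i, 4:5]))
  pvals[i] <- fisher.test(fisher_matrix)$p.value
}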
pvals
##  [1] 0.0288584 0.7271239 0.4170731 0.2219837 0.7204465 0.8606783 0.3042649
##  [8] 0.7393854 0.7224751 1.0000000
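To make one iteration of the loop concrete, the table for the first row (India) can be written out by hand from the head(d2) output. The object names `india` and `p_india` are introduced here only for illustration.

```r
# India's counts from head(d2):
# gmo.disease = 45, gmo.nodisease = 40, nogmo.disease = 15, nogmo.nodisease = 31
india <- matrix(c(45, 40, 15, 31), nrow = 2,
                dimnames = list(c("disease", "no disease"),
                                c("GMO", "no GMO")))
india

# Two-sided Fisher's exact test, as in the loop
p_india <- fisher.test(india)$p.value
p_india
```

This should reproduce the first entry of pvals (0.0288584), since the matrix has the same layout as the one the loop builds for row 1.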

Question 5

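Only one of the ten countries (India, p = 0.029) shows a significant association between GMO use and disease incidence at an alpha of 0.05. In the other nine countries the p-values range from 0.22 to 1, so there is no evidence of an association.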
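One way to check the per-country conclusion is to compare each p-value with the significance threshold. The sketch below re-enters the printed pvals by hand so that it runs on its own; `alpha` and `significant` are names chosen for the illustration.

```r
# p-values copied from the output above
pvals <- c(0.0288584, 0.7271239, 0.4170731, 0.2219837, 0.7204465,
           0.8606783, 0.3042649, 0.7393854, 0.7224751, 1.0000000)
alpha <- 0.05

# Row indices of countries with p < alpha
significant <- which(pvals < alpha)
significant          # 1 (the first row, India)
length(significant)  # 1 of 10 countries
```

Running ten separate tests also raises the chance of at least one false positive, so even the single significant result deserves a cautious reading.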