Overview

Instructions

For each problem, state the hypotheses and your conclusion, and attach your code. The significance level is \(0.05\).


Preliminaries

For this assignment, we use Fisher's exact test and the chi-square test. We denote the odds ratio by \(\theta\). There are six problems. See the document in the repository for more details.
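
As a reminder of the notation (a standard definition, not taken from the assignment document), for a \(2 \times 2\) table with cell probabilities \(\pi_{ij}\) (row \(i\), column \(j\)), the odds ratio can be written as:

\[
\theta = \frac{\pi_{11} / \pi_{21}}{\pi_{12} / \pi_{22}} = \frac{\pi_{11}\,\pi_{22}}{\pi_{12}\,\pi_{21}}
\]

Under this convention, \(\theta = 1\) corresponds to independence of the row and column variables, and \(\theta > 1\) means the odds of the first-row outcome are higher in the first column than in the second.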


Problems

Problem 1

Problem Restatement

Is the proportion of chipmunks that trill significantly higher when they are closer to their burrow?


Hypothesis

Let \(\theta\) be the odds ratio of trilling at 10 M versus 100 M. Our hypotheses are \(H_0: \theta = 1\) and \(H_1: \theta > 1\). We input the data and use fisher.test with the argument alternative = 'greater'.


Testing

# Rows: whether the chipmunk trilled; columns: distance from the burrow.
trills <- array(c(16, 8, 3, 18),
                dim = c(2, 2),
                dimnames = list(
                    trilled = c("did", "did not"),
                    distance = c("10 M", "100 M")))

fisher.test(trills, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  trills
## p-value = 0.0004321
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  2.829883      Inf
## sample estimates:
## odds ratio 
##   11.23249


Analysis

Since the \(p\)-value (0.0004321) is below 0.05, the result is statistically significant and we reject \(H_0\): the odds of trilling are higher at 10 M than at 100 M.
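
The odds ratio reported by fisher.test is a conditional maximum likelihood estimate, which generally differs slightly from the simple cross-product ratio of the table. A minimal sketch comparing the two, re-entering the same counts (the names sample_or and conditional_or are illustrative, not part of the assignment):

```r
# Same counts as the trills table (rows: trilled; columns: distance).
trills <- array(c(16, 8, 3, 18), dim = c(2, 2))

# Sample (cross-product) odds ratio: (16 * 18) / (8 * 3).
sample_or <- (trills[1, 1] * trills[2, 2]) / (trills[2, 1] * trills[1, 2])

# Conditional maximum likelihood estimate, as reported by fisher.test.
conditional_or <- unname(fisher.test(trills, alternative = "greater")$estimate)

print(c(sample = sample_or, conditional = conditional_or))
```

Both estimates are well above 1, which is consistent with the one-sided alternative.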


Problem 2

Problem Restatement

Do the two species of birds use the substrates in different proportions?


Hypothesis

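Because this is a \(4 \times 2\) table, a single odds ratio \(\theta\) is not defined, so we state the hypotheses in terms of independence. Our hypotheses are \(H_0:\) substrate use is independent of species and \(H_1:\) it is not. We input the data and use fisher.test with the argument alternative = 'two.sided'; note that fisher.test only uses this argument for \(2 \times 2\) tables.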


Testing

# Rows: substrate type; columns: bird species.
substrate <- array(c(15, 20, 14, 6, 8, 5, 7, 1),
                   dim = c(4, 2),
                   dimnames = list(
                       type = c("Vegetation", "Shoreline", "Water", "Structures"),
                       bird = c("Heron", "Egret")))

fisher.test(substrate, alternative = "two.sided")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  substrate
## p-value = 0.5491
## alternative hypothesis: two.sided


Analysis

Since the \(p\)-value (0.5491) is above 0.05, the result is not statistically significant and we fail to reject \(H_0\): there is no evidence that the two species use the substrates in different proportions.
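
One common motivation for using an exact test on a table like this (an assumption on our part; the assignment does not state its reason) is that some expected counts under independence are small. A short sketch that computes those expected counts from the margins, re-entering the same data:

```r
# Same counts as the substrate table (rows: substrate; columns: species).
substrate <- array(c(15, 20, 14, 6, 8, 5, 7, 1),
                   dim = c(4, 2),
                   dimnames = list(
                       type = c("Vegetation", "Shoreline", "Water", "Structures"),
                       bird = c("Heron", "Egret")))

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(substrate), colSums(substrate)) / sum(substrate)
print(round(expected, 2))

# Number of cells whose expected count falls below 5.
small_cells <- sum(expected < 5)
print(small_cells)
```

The usual rule of thumb for the chi-square approximation asks for expected counts of at least 5, so any cell flagged here is a reason to prefer the exact test.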


Problem 3

Problem Restatement

Is there a significant difference in the synonymous/replacement ratio between polymorphisms and fixed differences?


Hypothesis

Our hypotheses are \(H_0: \theta = 1\) and \(H_1: \theta \neq 1\). We input the data and use fisher.test with the argument alternative = 'two.sided'.


Testing

# Rows: polymorphic vs. fixed; columns: synonymous vs. replacement.
gene <- array(c(43, 17, 2, 7),
              dim = c(2, 2),
              dimnames = list(
                  fixity = c("polymorphic", "fixed"),
                  synonymicity = c("synonymous", "replacement")))

fisher.test(gene, alternative = "two.sided")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  gene
## p-value = 0.006653
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.437432 92.388001
## sample estimates:
## odds ratio 
##   8.540913


Analysis

Since the \(p\)-value (0.006653) is below 0.05, the result is statistically significant and we reject \(H_0\): the synonymous/replacement ratio differs between polymorphisms and fixed differences.


Problem 4

Problem Restatement

We want to check whether being a man or a woman (columns) is independent of having voted in the last election (rows). In other words, are sex and voting independent?


Hypothesis

Our hypotheses are \(H_0:\) sex and voting are independent and \(H_1:\) they are not independent. We input the data and use chisq.test, which applies Yates' continuity correction by default for \(2 \times 2\) tables.


Testing

# Rows: voting status; columns: sex.
vote <- array(c(2792, 1486, 3591, 2131),
              dim = c(2, 2),
              dimnames = list(
                  position = c("voted", "didn't Vote"),
                  sex = c("Men", "Women")))

chisq.test(vote)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  vote
## X-squared = 6.5523, df = 1, p-value = 0.01047


Analysis

Since the \(p\)-value (0.01047) is below 0.05, the result is statistically significant and we reject \(H_0\): sex and voting are not independent.
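
The test tells us that the variables are associated but not in which direction. A small sketch of the within-sex voting proportions, re-entering the same counts (the name turnout is illustrative):

```r
# Same counts as the vote table (rows: voting status; columns: sex).
vote <- array(c(2792, 1486, 3591, 2131),
              dim = c(2, 2),
              dimnames = list(
                  position = c("voted", "didn't Vote"),
                  sex = c("Men", "Women")))

# Column proportions: the share of each sex that voted or did not vote.
turnout <- prop.table(vote, margin = 2)
print(round(turnout, 4))
```

In this sample, the proportion who voted is somewhat higher among men than among women, which is the difference the chi-square test flags as significant.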


Problem 5

Problem Restatement

What do we conclude about this die: is it fair?


Hypothesis

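This is a goodness-of-fit problem rather than a test of independence. Our hypotheses are \(H_0:\) all six faces are equally likely (each with probability \(1/6\)) and \(H_1:\) at least one face has a different probability. We input the data and use chisq.test, which performs a goodness-of-fit test against equal probabilities when given a single column of counts.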


Testing

# Observed counts for each face of the die (single column).
die <- array(c(42, 55, 38, 57, 64, 44),
             dim = c(6, 1),
             dimnames = list(
                 face = c("1", "2", "3", "4", "5", "6"),
                 count = c("observed")))

chisq.test(die)
## 
##  Chi-squared test for given probabilities
## 
## data:  die
## X-squared = 10.28, df = 5, p-value = 0.06768


Analysis

Since the \(p\)-value (0.06768) is above 0.05, the result is not statistically significant and we fail to reject \(H_0\): there is not enough evidence to conclude that the die is unfair.
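
To make the goodness-of-fit statistic concrete, it can be reproduced by hand. A minimal sketch, assuming equal expected counts for all six faces (the variable names are illustrative):

```r
# Observed counts for faces 1 to 6, as in the die table.
observed <- c(42, 55, 38, 57, 64, 44)

# Under a fair die every face has the same expected count: total / 6.
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-square statistic: sum of (O - E)^2 / E.
statistic <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with (number of faces - 1) degrees of freedom.
p_value <- pchisq(statistic, df = length(observed) - 1, lower.tail = FALSE)

print(c(statistic = statistic, p_value = p_value))
```

This should agree with the X-squared value of 10.28 and the \(p\)-value reported by chisq.test.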


Problem 6

Problem Restatement

Based on the data, what is your conclusion (pertaining to Calendar Effect for hockey players and date of birth)?


Hypothesis

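This is also a goodness-of-fit problem. Our hypotheses are \(H_0:\) players' birthdays are equally likely to fall in each of the four quarters and \(H_1:\) they are not. We input the data and use chisq.test, which tests against equal probabilities when given a single column of counts.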


Testing

# Number of players born in each quarter of the year (single column).
hockey <- array(c(84, 77, 35, 34),
                dim = c(4, 1),
                dimnames = list(
                    quarter = c("Jan. to March", "April to June",
                                "July to Sept.", "Oct. to Dec."),
                    players = c("count")))

chisq.test(hockey)
## 
##  Chi-squared test for given probabilities
## 
## data:  hockey
## X-squared = 37.2348, df = 3, p-value = 4.104e-08


Analysis

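Since the \(p\)-value (4.104e-08) is far below 0.05, the result is statistically significant and we reject \(H_0\): the players' birthdays are not evenly distributed across the quarters, which is consistent with a calendar effect.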
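
To see which quarters drive the result, one option is to inspect the Pearson residuals that chisq.test returns. A brief sketch, re-entering the counts as a named vector for readability (an illustrative choice; the test is the same goodness-of-fit test):

```r
# Same counts as the hockey table, as a named vector.
hockey <- c("Jan. to March" = 84, "April to June" = 77,
            "July to Sept." = 35, "Oct. to Dec." = 34)

result <- chisq.test(hockey)

# Expected count per quarter under equal probabilities.
print(round(result$expected, 1))

# Pearson residuals: (observed - expected) / sqrt(expected).
print(round(result$residuals, 2))
```

Positive residuals mark quarters with more births than expected and negative residuals mark quarters with fewer; here the first two quarters are above expectation and the last two are below.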