M. Drew LaMar
February 24, 2017
Use binom.test to do a binomial test! It computes exact binomial probabilities, so it's more accurate than an approximation. If our observed test statistic is \( X = 16 \) successes out of \( n = 17 \) trials, and our null hypothesized proportion is \( p_{0} = 0.5 \), then we have:
binom.test(16,
n = 17,
p = 0.5)
Exact binomial test
data: 16 and 17
number of successes = 16, number of trials = 17, p-value = 0.0002747
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.7131106 0.9985118
sample estimates:
probability of success
0.9411765
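To see where these numbers come from, here is a minimal sketch that recomputes them by hand. It assumes the two-sided rule R's binom.test() uses internally: the p-value is the total null probability of every outcome no more likely than the observed one (with a tiny tolerance for rounding ties). The interval is the Clopper-Pearson interval, which the binom.test() documentation names as its method. The variable names are illustrative only.

```r
# Sketch: reproduce the binom.test() output by hand for X = 16, n = 17, p0 = 0.5
x  <- 16
n  <- 17
p0 <- 0.5

# Null distribution: probability of every possible number of successes 0..n
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided exact p-value: sum the probabilities of all outcomes that are
# no more likely than the observed one (tolerance guards against rounding ties)
p_obs   <- dbinom(x, size = n, prob = p0)
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Point estimate: observed proportion of successes
p_hat <- x / n

# 95% Clopper-Pearson interval from beta quantiles (this form assumes 0 < x < n)
ci <- c(qbeta(0.025, x, n - x + 1), qbeta(0.975, x + 1, n - x))

p_value
p_hat
ci
```

Here the outcomes no more likely than 16 are 0, 1, 16 and 17 successes, so the p-value is \( (1 + 17 + 17 + 1)/2^{17} \).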
From proportions and binomial distributions…
…to working directly with frequency distributions.
| | Right | Left |
|---|---|---|
| Observed | 14 | 4 |
| Expected | 9 | 9 |
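The expected row comes from the null model. As a sketch, assuming the null proportion is 0.5 for each side (which is what gives 9 and 9 out of 18), each expected frequency is the total count times its null probability:

```r
# Observed counts from the Right/Left table
observed <- c(Right = 14, Left = 4)

# Null model: both outcomes equally likely (assumed p0 = 0.5 each)
p0 <- c(Right = 0.5, Left = 0.5)

# Expected frequency = total number of observations x null probability
expected <- sum(observed) * p0
expected
```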
Note: The binomial test is an example of a goodness-of-fit test.
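Seen this way, the binomial test compares the observed Right/Left frequencies with those expected under \( p_{0} = 0.5 \). A minimal sketch: binom.test() documents that its first argument may be a length-2 vector of successes and failures, so the frequency table can be passed in directly (treating "Right" as a success is our choice here):

```r
# Right/Left frequencies: 14 "Right" (successes) and 4 "Left" (failures)
observed <- c(Right = 14, Left = 4)

# Exact binomial test of H0: P(Right) = 0.5, using the successes/failures form
result <- binom.test(observed, p = 0.5)
result$p.value
```

This gives the same answer as binom.test(14, n = 18, p = 0.5).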
Definition: A goodness-of-fit test is a method for comparing an observed frequency distribution with the frequency distribution that would be expected under a simple probability model governing the occurrence of different outcomes.
Definition: A model in this case is a simplified mathematical representation that mimics how we think a natural process works.
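Because a model mimics the process, we can run it to generate data. As a sketch, assuming the model behind the Right/Left table is "each of 18 observations is independently Right or Left with probability 0.5", one simulated data set and its frequency distribution look like this (the seed is arbitrary, and the counts will vary with it):

```r
# Hypothetical null model: 18 independent outcomes, Right or Left with prob 0.5
set.seed(1)  # arbitrary seed so the simulation is reproducible
n <- 18
simulated <- sample(c("Right", "Left"), size = n, replace = TRUE,
                    prob = c(0.5, 0.5))

# Tabulate the simulated outcomes into a frequency distribution
sim_counts <- table(factor(simulated, levels = c("Right", "Left")))
sim_counts
```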
Assignment Problem #21
A more recent study of Feline High-Rise Syndrome (FHRS) included data on the month in which each of 119 cats fell (Vnuk et al. 2004). The data are in the accompanying table. Can we infer that the rate of cat falling varies between months of the year?
Month | Number fallen | Month | Number fallen |
---|---|---|---|
January | 4 | July | 19 |
February | 6 | August | 13 |
March | 8 | September | 12 |
April | 10 | October | 12 |
May | 9 | November | 7 |
June | 14 | December | 5 |
Question: What are the null and alternative hypotheses?
Answer:
\( H_{0} \): The frequency of cats falling is the same in each month.
\( H_{A} \): The frequency of cats falling is not the same in each month.
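Under \( H_{0} \), each of the 12 months receives the same share of the \( n = 119 \) falls, so the expected frequency for every month \( i \) is:

```latex
E_{i} = n \times \frac{1}{12} = \frac{119}{12} \approx 9.92, \qquad i = 1, \dots, 12
```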
Observed and Expected Frequencies
We want to use dplyr for practice, so…
if (!require(dplyr)) {
install.packages("dplyr")
library(dplyr)
}
Now load data and store as a tibble.
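# stringsAsFactors = TRUE keeps month as a factor (the default changed to FALSE in R 4.0.0)
mydata <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter08/chap08q21FallingCatsByMonth.csv",
                   stringsAsFactors = TRUE) %>%
  as_tibble()  # tbl_df() is deprecated in current dplyr; as_tibble() replaces it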
Let's peek at the data using glimpse.
glimpse(mydata)
Observations: 119
Variables: 1
$ month <fctr> January, January, January, January, February, February,...
We need frequencies for months of the year.
The data in this case are in tidy form, i.e., each row is an observation (a falling cat) and each column is a variable (month).
Question: How do we get frequencies?
You can use the table command…
table(mydata)
mydata
April August December February January July June
10 13 5 6 4 19 14
March May November October September
8 9 7 12 12
…but the output is not a data frame!
Question: How can we get frequencies in data frame format?
(mytable <- mydata %>%
group_by(month) %>%
summarize(obs = n()))
# A tibble: 12 × 2
month obs
<fctr> <int>
1 April 10
2 August 13
3 December 5
4 February 6
5 January 4
6 July 19
7 June 14
8 March 8
9 May 9
10 November 7
11 October 12
12 September 12
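For comparison, base R can also return the frequencies as a data frame. This sketch avoids downloading the file by rebuilding the tidy data (one row per cat) from the counts in the problem's table; the names counts, tidy and freq are illustrative:

```r
# Monthly counts from the FHRS table (Vnuk et al. 2004), in calendar order
counts <- c(4, 6, 8, 10, 9, 14, 19, 13, 12, 12, 7, 5)

# Rebuild tidy data: one row per fallen cat, one column (month)
tidy <- data.frame(month = factor(rep(month.name, times = counts)))

# table() counts each month; as.data.frame() converts the table to a data frame
freq <- as.data.frame(table(month = tidy$month), responseName = "obs")
freq
```

Like the dplyr version, the months come out in alphabetical order, because factor() sorts its levels by default.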
How do we get the months in the correct order?!?!
mytable$month <- factor(mytable$month,
levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
mytable %>% arrange(month)
Set the factor levels in calendar order, then arrange on month.
# A tibble: 12 × 2
month obs
<fctr> <int>
1 January 4
2 February 6
3 March 8
4 April 10
5 May 9
6 June 14
7 July 19
8 August 13
9 September 12
10 October 12
11 November 7
12 December 5
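With the months in order, the expected frequencies under \( H_{0} \) can sit next to the observed ones. A base-R sketch, assuming the same counts as the table above: R's built-in constant month.name saves typing out the levels, and each month's expected count is the total divided by the number of months. (With dplyr, something like mutate(exp = sum(obs) / n()) on the summarized tibble should do the same.)

```r
# Observed counts in calendar order (from the FHRS table)
mytable <- data.frame(
  month = factor(month.name, levels = month.name),  # month.name is built into R
  obs   = c(4, 6, 8, 10, 9, 14, 19, 13, 12, 12, 7, 5)
)

# Under H0 every month has the same expected count: total / number of months
mytable$exp <- sum(mytable$obs) / nrow(mytable)
mytable
```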