Hypothesis testing is one of the core components of statistical analysis. The p-value is the probability of obtaining results at least as extreme as the ones you observed, given that the null hypothesis is true. In this recitation, we will: (1) generate a random binomial distribution, (2) test the null hypothesis, and (3) calculate critical values.
In this recitation, we will use a distribution of Zener card trials. These cards were invented by Karl Zener in the early 1930s to test for telepathic abilities. In a typical trial, an experimenter would select one of five potential cards (representing a circle, a cross, waves, a square, or a star) and ask an individual to guess which one he/she selected.
rbinom()
Given that there are 5 different cards, the probability of correctly guessing a card in a given Zener card test is 1/5 (or 0.2). Here, we generate a random binomial distribution of correct guesses by a single individual for 1000 trials of 25 Zener cards each.
set.seed(200) # Setting the seed for replication purposes
myData <- rbinom(n=1000,size=25,prob=0.2) # Creating a random binomial distribution
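To make concrete what each value in myData stands for, one round of 25 guesses can be simulated card by card. This is only an illustrative sketch: the names cards, shown, guessed and oneRound are hypothetical, and it assumes both the experimenter and the respondent pick uniformly at random.

```r
# Hypothetical illustration: one round of 25 Zener card guesses, simulated card by card
cards   <- c("circle", "cross", "waves", "square", "star")
shown   <- sample(cards, size = 25, replace = TRUE)  # cards selected by the experimenter
guessed <- sample(cards, size = 25, replace = TRUE)  # guesses made purely at random
oneRound <- sum(shown == guessed)                    # number of correct guesses in this round
oneRound                                             # varies from run to run
```

Each of the 1000 values returned by rbinom() plays the role of one such count of correct guesses out of 25.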
mean(), Mode() and sd()
mean(myData) # What is the mean?
## [1] 4.978
Mode <- function(x){ # Creating a "Mode()" function
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
Mode(myData) # What is the mode?
## [1] 5
sd(myData) # What is the standard deviation?
## [1] 1.926455
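For comparison, the theoretical mean and standard deviation of a binomial distribution follow from the standard formulas size * prob and sqrt(size * prob * (1 - prob)). The sketch below uses hypothetical variable names (nCards, pGuess, theoMean, theoSd) and is meant only as a reference point for the sample values above.

```r
nCards <- 25    # number of cards per trial
pGuess <- 0.2   # probability of a correct guess under the null hypothesis
theoMean <- nCards * pGuess                      # 25 * 0.2 = 5
theoSd   <- sqrt(nCards * pGuess * (1 - pGuess)) # sqrt(25 * 0.2 * 0.8) = sqrt(4) = 2
theoMean
theoSd
```

The sample values (4.978 and 1.926455) are close to, but not exactly equal to, these theoretical values because myData is a random sample.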
Creating a table of the Probability Mass Function.
x <- table(myData)/length(myData) # Creating a table with the probability distribution (or PMF) of myData
x # Printing out that table
## myData
## 0 1 2 3 4 5 6 7 8 9 10 11
## 0.005 0.016 0.069 0.140 0.178 0.216 0.176 0.108 0.054 0.023 0.008 0.004
## 12 13
## 0.002 0.001
x[1]+x[2]+x[3] # Example: Calculating the probability of correctly guessing 2 cards or less
## 0
## 0.09
# For reference, pbinom() gives the THEORETICAL probability of correctly guessing 2 cards or fewer out of 25 trials.
# But remember, in this recitation, we want to conduct tests on the data that we actually sampled (myData)!
# The randomness of the sampling process explains why the results for myData are slightly different from the theoretically expected ones.
pbinom(2,size=25,prob=0.2,lower.tail=TRUE)
## [1] 0.09822522
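The theoretical counterpart of the whole table can be obtained with dbinom(), which returns the probability of exactly k successes. The following is a minimal sketch (theoPMF is a hypothetical name) that lists the theoretical probabilities for 0 to 13 correct guesses, the range observed in myData.

```r
theoPMF <- dbinom(0:13, size = 25, prob = 0.2)  # theoretical P(X = k) for k = 0, ..., 13
names(theoPMF) <- 0:13
round(theoPMF, 3)   # compare with the sampled table above
sum(theoPMF[1:3])   # P(X <= 2), the same quantity that pbinom(2, size = 25, prob = 0.2) returns
```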
Creating a histogram of the Probability Mass Function.
hist(myData,probability=TRUE, # Creating a histogram for the PMF
col="#4286f4",
border="black",
ylim=c(0,0.25),
xlim=c(0,25),
main="Probability Mass Function of myData",
xlab="Number of Correct Guesses",
ylab="Probability")
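Because the data are discrete counts, a bar chart of the relative frequencies is one alternative way to display the sampled PMF, with exactly one bar per observed value (hist() groups values into intervals, so neighbouring counts may share a bar depending on the breaks it picks). This is a sketch only: it regenerates the sample with the same seed under the hypothetical name sketchData so that it runs on its own.

```r
set.seed(200)                                          # same seed as for myData
sketchData <- rbinom(n = 1000, size = 25, prob = 0.2)  # same call as for myData
pmf <- table(sketchData) / length(sketchData)          # relative frequency of each count
mids <- barplot(pmf,                                   # one bar per observed value
                col = "#4286f4",
                border = "black",
                ylim = c(0, 0.25),
                main = "Probability Mass Function of myData",
                xlab = "Number of Correct Guesses",
                ylab = "Probability")
```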
binom.test()
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| No telepathic ability | Telepathic abilities |
| Probability of success = 0.2 | Probability of success > 0.2 |
Since we are testing the hypothesis that our respondent is able to guess Zener cards at a higher rate than the hypothesized probability of success, this is a one-sided test.
sum(myData) # How many times has our respondent correctly guessed a Zener card?
## [1] 4978
binom.test(4978,25000,p=0.2, # 4978 successes, 25000 trials, with 0.2 hypothesized probability of success
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 4978 and 25000
## number of successes = 4978, number of trials = 25000, p-value =
## 0.6385
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1949717 1.0000000
## sample estimates:
## probability of success
## 0.19912
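For a one-sided "greater" test, the p-value is the probability of observing at least as many successes as we did if the null hypothesis is true. As a sketch (pManual and pTest are hypothetical names), this can be computed directly with pbinom() and compared with the value reported by binom.test().

```r
# P(X >= 4978) when X ~ Binomial(25000, 0.2).
# With lower.tail = FALSE, pbinom(q, ...) returns P(X > q), so we pass 4978 - 1.
pManual <- pbinom(4978 - 1, size = 25000, prob = 0.2, lower.tail = FALSE)
pTest   <- binom.test(4978, 25000, p = 0.2, alternative = "greater")$p.value
pManual
pTest
```

Both values should agree, which shows that the exact binomial test is a tail probability of the null distribution.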
Our p-value is greater than 0.05, so we fail to reject the null hypothesis: these data provide no evidence that our respondent has telepathic abilities. Using qbinom(), let’s see how many cards he/she would have needed to guess correctly for us to reject the null hypothesis.
qbinom()
qbinom(p=.95, size=25000, prob=0.2) # Finding the critical value at the 95% confidence level
## [1] 5104
# Sanity check: 5104 successful trials should not be significant
binom.test(5104,25000,p=0.2,
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 5104 and 25000
## number of successes = 5104, number of trials = 25000, p-value =
## 0.05114
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1999723 1.0000000
## sample estimates:
## probability of success
## 0.20416
# Sanity check: 5105 successful trials should be significant
binom.test(5105,25000,p=0.2,
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 5105 and 25000
## number of successes = 5105, number of trials = 25000, p-value =
## 0.04951
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.2000119 1.0000000
## sample estimates:
## probability of success
## 0.2042
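The two sanity checks reflect how qbinom() is defined: it returns the smallest count whose cumulative probability reaches 0.95, so the null hypothesis is rejected only for counts strictly greater than that value. A minimal sketch of this check using pbinom() (crit, pAtCrit and pAboveCrit are hypothetical names):

```r
crit <- qbinom(p = 0.95, size = 25000, prob = 0.2)  # largest count that is NOT significant
# With lower.tail = FALSE, pbinom(q, ...) returns P(X > q)
pAtCrit    <- pbinom(crit - 1, size = 25000, prob = 0.2, lower.tail = FALSE)  # P(X >= crit)
pAboveCrit <- pbinom(crit,     size = 25000, prob = 0.2, lower.tail = FALSE)  # P(X >= crit + 1)
pAtCrit     # above 0.05: do not reject
pAboveCrit  # at or below 0.05: reject
```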
In the density plot below, the area in red represents the rejection region, so called because values in this region lead to the rejection of the null hypothesis. Another way to see this: if the null hypothesis is true (i.e. if the respondent is not a psychic), the probability of getting a result in this region is at most 5%. Thus, even a respondent who is not a psychic could sometimes manage to correctly guess 5105 cards or more by pure chance, but this is unlikely.
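A plot of this kind can be sketched as follows. This is not necessarily the original figure: the plotting window (4800 to 5200 correct guesses), the colors and the names k, dens, crit, reject and pReject are assumptions made for illustration.

```r
k    <- 4800:5200                                 # plotting window (an assumption)
dens <- dbinom(k, size = 25000, prob = 0.2)       # null distribution of the number of correct guesses
crit <- qbinom(0.95, size = 25000, prob = 0.2)    # largest count that is not significant
reject <- k > crit                                # TRUE for counts inside the rejection region
plot(k, dens, type = "h",
     col = ifelse(reject, "red", "grey60"),       # rejection region drawn in red
     main = "Null Distribution and Rejection Region",
     xlab = "Number of Correct Guesses",
     ylab = "Probability")
pReject <- pbinom(crit, size = 25000, prob = 0.2, lower.tail = FALSE)  # total probability of the red region
pReject
```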
One of your friends claims that they have telepathic powers because they correctly guessed 8 Zener cards out of 25 trials, a success rate of 0.32, which is much higher than the hypothesized success rate of 0.2. Test the null hypothesis that they do not have telepathic abilities, and find the critical value at the 95% confidence level.
Relevant functions: binom.test(), qbinom().
In our myData sample, our respondent sometimes guessed more cards correctly than the critical value found in Exercise 1 (see table below). Does that necessarily mean that he/she has telepathic abilities? Why or why not?
## myData
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 5 16 69 140 178 216 176 108 54 23 8 4 2 1
Test the null hypothesis that a coin is fair if it lands on heads 550 times out of 1000 flips. Hint: you need to use the alternative = "two.sided" argument in binom.test().
Relevant function: binom.test().
Imagine a random binomial distribution of 1000 trials of 25 coin flips (using a fair coin). What are the rejection regions of that distribution? In other words, how many times would the coin have to land on heads before you conclude that it is weighted (i.e. not fair)? Hint: when using qbinom(), think about how the 5% chance of getting a statistically significant result is distributed in a two-tailed test versus a one-sided test.
Relevant functions: qbinom().