Introduction to p-values
Hypothesis testing is one of the core components of statistical analysis. The p-value represents the probability of getting results at least as extreme as yours if the null hypothesis is true. In this lab, we will: (1) generate random draws from a binomial distribution, (2) test the null hypothesis, (3) calculate critical values and (4) perform a t-test.
Relevant functions: rbinom(), binom.test(), qbinom(), t.test().
1. Generating and Understanding our Data
In this lab, we will use a distribution of Zener card trials. These cards were invented by Karl Zener in the early 1930s to test for telepathic abilities. In a typical trial, an experimenter would select one of five possible cards (representing a circle, a cross, waves, a square, or a star) 25 times and ask an individual to correctly guess which card they selected, without showing the card to the participant. That is to say, participants are asked to use telepathic abilities to correctly guess the card.
1.1 Using rbinom()
Given that there are 5 different cards, the probability of correctly guessing a card in a given Zener card test is 1/5 (or 0.2). Here, we sample 1000 random observations from an underlying binomial distribution representing the number of correct guesses out of 25 Zener cards. In other words, we are simulating the outcome (in number of correct guesses) of a classic 25-card trial for 1000 different people.
set.seed(200) # Setting the seed for replication purposes
myData <- rbinom(n=1000,size=25,prob=0.2) # Sampling 1000 random observations from an underlying binomial distribution
1.2 Measures of Central Tendency and Spread
Next, we want to check some measures of central tendency and of spread of myData.
mean(myData) # What is the mean?
## [1] 4.978
Mode <- function(x){ # Creating a "Mode()" function
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
Mode(myData) # What is the mode?
## [1] 5
sd(myData) # What is the standard deviation?
## [1] 1.926455
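As a point of comparison, the sample statistics can be set against the theoretical mean and standard deviation of a binomial distribution, which are n·p and sqrt(n·p·(1−p)). The sketch below is only an illustration; the names n_cards, p_guess, theo_mean and theo_sd are not part of the lab.

```r
# Theoretical moments of a Binomial(size = 25, prob = 0.2) distribution.
# n_cards and p_guess are illustrative names, not objects defined in the lab.
n_cards <- 25
p_guess <- 0.2

theo_mean <- n_cards * p_guess                        # n * p
theo_sd   <- sqrt(n_cards * p_guess * (1 - p_guess))  # sqrt(n * p * (1 - p))

theo_mean # theoretical mean: 5
theo_sd   # theoretical standard deviation: 2
```

The sample mean (4.978) and standard deviation (1.926455) reported above are close to these theoretical values, which is what we would expect from 1000 random draws.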
1.3 Graphing our Data
We are graphing the results of 1000 Zener card trials of 25 cards each. In other words, we want to visualize how many correct guesses were obtained on each of these 1000 trials (i.e. how many times people correctly guessed 0 cards, 1 card, etc. out of 25).
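One possible way to draw such a graph with base R is sketched here. It is not necessarily the code used to produce the lab's original figure, and the object name counts is an assumption made for this example.

```r
# Re-create the simulated data so that this block runs on its own
set.seed(200)
myData <- rbinom(n = 1000, size = 25, prob = 0.2)

# Count how many of the 1000 trials produced each possible score (0 to 25).
# factor(..., levels = 0:25) keeps scores that never occurred, with a count of 0.
counts <- table(factor(myData, levels = 0:25))

# Bar plot: one bar per possible number of correct guesses
barplot(counts,
        xlab = "Number of correct guesses (out of 25)",
        ylab = "Number of trials",
        main = "1000 simulated Zener card trials")
```

A bar plot is used rather than a smooth density curve because the number of correct guesses is a discrete count.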
1.4 Calculating the PMF
Let’s first look at the distribution of our data, and then at the Probability Mass Function (PMF) of our binomial distribution, which, as we already know, is given by the following formula:
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
In other words:
\[ P(X = k) = \frac{n!}{k!\,(n-k)!} \, p^k (1-p)^{n-k} \]
Here, n is the number of trials (in our case, the 25 cards of one Zener test), k is the number of successes and p is the probability of a success on any given trial.
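To see the formula at work, it can be written out directly in R and compared with the built-in dbinom() function. This is a minimal sketch; binom_pmf is a hypothetical helper introduced only for this illustration.

```r
# Probability of exactly k successes, written out from the binomial formula.
# binom_pmf is an illustrative helper, not a function from the lab.
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Probabilities of 0, 1, ..., 25 correct guesses in a 25-card trial
by_hand  <- binom_pmf(k = 0:25, n = 25, p = 0.2)
built_in <- dbinom(0:25, size = 25, prob = 0.2)

all.equal(by_hand, built_in) # TRUE: both approaches give the same probabilities
sum(by_hand)                 # the probabilities of all possible outcomes sum to 1
```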
x <- table(myData)/length(myData) # Creating a table with the distribution of myData as proportions
x # Printing out that table
## myData
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 0.005 0.016 0.069 0.140 0.178 0.216 0.176 0.108 0.054 0.023 0.008 0.004 0.002
## 13
## 0.001
x[1]+x[2]+x[3] # Example: Calculating the proportion of trials where we correctly guessed 2 cards or fewer
## 0
## 0.09
# Calculating the theoretical cumulative probability P(X <= 2) from the CDF
pbinom(2,size=25,prob=0.2,lower.tail=TRUE)
## [1] 0.09822522
For reference, you can calculate the THEORETICAL probability of correctly guessing 2 cards or fewer out of 25 using pbinom(). The randomness of the sampling process explains why the results for myData are slightly different from the theoretically expected ones.
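The comparison between the simulated and the theoretical probability can also be written compactly. The sketch below regenerates the data so that it runs on its own; the names empirical and theoretical are illustrative.

```r
# Re-create the simulated data so that this block runs on its own
set.seed(200)
myData <- rbinom(n = 1000, size = 25, prob = 0.2)

# Share of simulated trials with 2 or fewer correct guesses
empirical <- mean(myData <= 2)

# Theoretical probability: P(X = 0) + P(X = 1) + P(X = 2)
theoretical <- sum(dbinom(0:2, size = 25, prob = 0.2))

empirical
theoretical

# Summing the PMF up to 2 is exactly what pbinom() computes
all.equal(theoretical, pbinom(2, size = 25, prob = 0.2))
```

Here mean(myData <= 2) is an alternative to adding up the first cells of the table by hand: it averages a logical vector, which gives the proportion of TRUE values.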
Why did we do all this? Because before mobilizing p-values, we needed to understand that every outcome (even the most extreme) has a given probability of occurring due to chance alone (even if that probability is extremely small).
2. Hypothesis Testing
A p-value represents the probability of observing results at least as extreme as yours due to chance alone, that is, if the null hypothesis is true. For example, under pure guessing there is only about a 1.7% chance of correctly guessing 10 or more cards out of 25. If you do guess 10 cards out of 25, does that allow us to conclude that you have a superpower?
You have to decide which threshold would convince you, but in science, we usually consider that \(p<0.05\) is enough to be convinced and to reject the null hypothesis.
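To make the idea of "results like yours or more extreme" concrete, the upper-tail probability under pure guessing can be computed for any score. This is a sketch under the lab's assumptions (25 cards, success probability 0.2); upper_tail and scores are hypothetical names used only here.

```r
# P(X >= k) for a 25-card trial when each guess has a 0.2 chance of being correct.
# upper_tail is an illustrative helper, not a function from the lab.
upper_tail <- function(k, size = 25, prob = 0.2) {
  # pbinom(k - 1, ..., lower.tail = FALSE) gives P(X > k - 1), i.e. P(X >= k)
  pbinom(k - 1, size = size, prob = prob, lower.tail = FALSE)
}

# One-sided p-values for a range of possible scores
scores <- 5:12
data.frame(correct_guesses = scores,
           p_value = round(upper_tail(scores), 4))
```

Reading down the p_value column shows at which score the probability first drops below the conventional 0.05 threshold.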
2.1 Testing the null hypothesis
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| No telepathic ability | Telepathic abilities |
| Probability of success = 0.2 | Probability of success > 0.2 |
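Since we are testing the hypothesis that our respondent is able to guess Zener cards at a higher rate than the hypothesized probability of success, let’s do a one-sided test. For this example, we pool the 1000 simulated trials in myData and treat them as 25,000 guesses (1000 × 25) made by a single respondent.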
sum(myData) # How many times has our respondent correctly guessed a Zener card?
## [1] 4978
binom.test(4978,25000,p=0.2, # 4978 successes, 25000 trials, with 0.2 hypothesized probability of success
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 4978 and 25000
## number of successes = 4978, number of trials = 25000, p-value = 0.6385
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1949717 1.0000000
## sample estimates:
## probability of success
## 0.19912
Our p-value is greater than 0.05, thus we fail to reject the null hypothesis: the data provide no evidence that our respondent has telepathic abilities.
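For a one-sided "greater" test, the p-value reported by binom.test() is the probability of observing at least as many successes as we did under the null hypothesis. A minimal sketch of that calculation with pbinom() follows; successes, n_guesses, p_by_hand and p_from_test are illustrative names.

```r
successes <- 4978   # total number of correct guesses (from sum(myData) above)
n_guesses <- 25000  # 1000 trials of 25 cards each

# P(X >= 4978) when the true probability of success is 0.2
p_by_hand <- pbinom(successes - 1, size = n_guesses, prob = 0.2,
                    lower.tail = FALSE)

# The same quantity as reported by the exact binomial test
p_from_test <- binom.test(successes, n_guesses, p = 0.2,
                          alternative = "greater")$p.value

p_by_hand
all.equal(p_by_hand, p_from_test)
```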
Exercise 1
Test the null hypothesis that a coin is fair if you obtain heads 550 times out of 1000 flips. Hint: you need to use the alternative = "two.sided" argument in binom.test().
binom.test(550,1000,p=0.5,
alternative ="two.sided",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 550 and 1000
## number of successes = 550, number of trials = 1000, p-value = 0.001731
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5185565 0.5811483
## sample estimates:
## probability of success
## 0.55
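Because a fair coin gives a symmetric distribution, the two-sided p-value in this exercise can be understood as the probability of a result at least as far from 500 as 550 is, in either direction. The sketch below checks this by hand; n_flips, heads, upper, lower and p_two_sided are illustrative names.

```r
n_flips <- 1000
heads   <- 550

# Upper tail: 550 heads or more
upper <- pbinom(heads - 1, size = n_flips, prob = 0.5, lower.tail = FALSE)

# Lower tail: the mirror image, 450 heads or fewer
lower <- pbinom(n_flips - heads, size = n_flips, prob = 0.5)

p_two_sided <- upper + lower
p_two_sided

# Compare with the p-value from the exact two-sided test
all.equal(p_two_sided,
          binom.test(heads, n_flips, p = 0.5,
                     alternative = "two.sided")$p.value)
```

This mirror-image shortcut relies on the hypothesized probability being 0.5; for other values the two tails are not symmetric and the shortcut should not be assumed to hold.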
Next step: using qbinom(), let’s now see how many times our respondent would have needed to guess the cards correctly to reject the null hypothesis.
2.2 Finding critical values
What is the critical value associated with the 95th percentile of what I should observe under the null hypothesis? Remember that, as this is a one-sided test, we put the entire 5% of most extreme possibilities on the right side of the distribution, rather than splitting it between the two tails. In other words, we want to know: above what score does a result fall among the 5% most extreme (i.e. most unlikely) results? Since such a score is so unlikely to happen by chance alone, I consider that my result is probably NOT due to chance, but to a different underlying data-generating process (DGP), and we therefore reject the null hypothesis at this threshold.
qbinom(p=.95, size=25000, prob=0.2) # Finding the critical value at the 95th percentile
## [1] 5104
# Sanity check: 5104 successful guesses should not be significant
binom.test(5104,25000,p=0.2,
alternative ="greater", # one-sided alternative hypothesis: probability of success > 0.2
conf.level = 0.95) # 1 - confidence level = significance level (0.05), the threshold the p-value is compared to
##
## Exact binomial test
##
## data: 5104 and 25000
## number of successes = 5104, number of trials = 25000, p-value = 0.05114
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1999723 1.0000000
## sample estimates:
## probability of success
## 0.20416
# Sanity check: 5105 successful guesses should be significant
binom.test(5105,25000,p=0.2,
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 5105 and 25000
## number of successes = 5105, number of trials = 25000, p-value = 0.04951
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.2000119 1.0000000
## sample estimates:
## probability of success
## 0.2042
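The two sanity checks can be summarized in a few lines. qbinom(0.95, ...) returns the smallest score k such that P(X <= k) is at least 0.95, so the first significant score is one above it. This is a sketch only; crit, p_at_crit and p_above_crit are illustrative names.

```r
n_guesses <- 25000

# Smallest k such that P(X <= k) >= 0.95 under the null hypothesis
crit <- qbinom(p = 0.95, size = n_guesses, prob = 0.2)

# One-sided p-values at the critical value and one success above it
p_at_crit    <- pbinom(crit - 1, size = n_guesses, prob = 0.2,
                       lower.tail = FALSE)  # P(X >= crit)
p_above_crit <- pbinom(crit, size = n_guesses, prob = 0.2,
                       lower.tail = FALSE)  # P(X >= crit + 1)

c(critical_value = crit, p_at_crit = p_at_crit, p_above_crit = p_above_crit)
```

The p-value at the critical value is still above 0.05, while one additional correct guess pushes it below 0.05, matching the two binom.test() outputs above.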
In the density plot of this distribution, the area in red represents the rejection region. It is so called because these values lead to the rejection of the null hypothesis. Another way to see this is the following: the probability of getting a result in this region if the null hypothesis is true (i.e. if the respondent is not a psychic) is at most 5%. Thus, even a respondent who is not a psychic could sometimes correctly guess 5105 cards or more by pure chance, but it is unlikely.
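One way to draw a plot of this kind in base R is sketched here. It is not the lab's original plotting code: the plotting range (4800 to 5200), the colors and the object names k, prob_k and in_rejection are all assumptions made for this example.

```r
n_guesses <- 25000
crit <- qbinom(0.95, size = n_guesses, prob = 0.2)  # critical value (5104)

# Probability of each score in a window around the expected value of 5000
k      <- 4800:5200
prob_k <- dbinom(k, size = n_guesses, prob = 0.2)

# Scores strictly above the critical value form the rejection region
in_rejection <- k > crit

# Vertical lines, one per score; the rejection region is drawn in red
plot(k, prob_k, type = "h",
     col  = ifelse(in_rejection, "red", "grey60"),
     xlab = "Number of correct guesses (out of 25000)",
     ylab = "Probability under the null hypothesis",
     main = "Rejection region of the one-sided test")
abline(v = crit + 0.5, lty = 2)  # dashed boundary between the two regions
```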
2.3 Performing a t-test
Imagine that we have two individuals in this class who agree to do the Zener card trial 10 times (each trial consists of trying to correctly guess 25 cards). Below are the results for Student A and for Student B in terms of number of correct guesses per trial.
StudentA
## [1] 3 5 6 4 3 5 6 5 3 5
StudentB
## [1] 6 7 5 8 6 5 4 7 6 6
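The code that creates these two vectors is not shown above. Assuming they are plain numeric vectors holding the printed values, they could be entered as follows, together with a quick look at their means and standard deviations.

```r
# Number of correct guesses per trial, copied from the printed output above
StudentA <- c(3, 5, 6, 4, 3, 5, 6, 5, 3, 5)
StudentB <- c(6, 7, 5, 8, 6, 5, 4, 7, 6, 6)

# Average number of correct guesses for each student
mean(StudentA) # 4.5
mean(StudentB) # 6

# Spread of each student's scores
sd(StudentA)
sd(StudentB)
```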
How do we know if the difference between the results of the two students is statistically significant? We can run a t-test, which assigns a p-value to the difference between the means of the two sets of results. That is to say, it tells us the probability of observing a difference at least this large between the scores of the two students by chance alone, i.e. if the null hypothesis of equal means is true.
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| No difference between students | Difference between students |
| Similar means | Different means |
# Performing a two-sided t-test
t.test(StudentA,StudentB,alternative ="two.sided")
##
## Welch Two Sample t-test
##
## data: StudentA and StudentB
## t = -2.8749, df = 17.993, p-value = 0.01008
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.5961871 -0.4038129
## sample estimates:
## mean of x mean of y
## 4.5 6.0
How do you interpret these results? In other words, given the p-value, can you confidently reject the null hypothesis?
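To see where the numbers in the t.test() output come from, the Welch statistic can be reproduced by hand: the difference in means divided by its standard error, with degrees of freedom given by the Welch-Satterthwaite approximation. This is a sketch; the vectors are re-entered from the printed values, and v_a, v_b, t_stat, df_welch and p_value are illustrative names.

```r
StudentA <- c(3, 5, 6, 4, 3, 5, 6, 5, 3, 5)
StudentB <- c(6, 7, 5, 8, 6, 5, 4, 7, 6, 6)

# Variance of each sample mean (sample variance divided by sample size)
v_a <- var(StudentA) / length(StudentA)
v_b <- var(StudentB) / length(StudentB)

# t statistic: difference in means divided by its standard error
t_stat <- (mean(StudentA) - mean(StudentB)) / sqrt(v_a + v_b)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v_a + v_b)^2 /
  (v_a^2 / (length(StudentA) - 1) + v_b^2 / (length(StudentB) - 1))

# Two-sided p-value: probability of a t value at least this far from 0
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p_value = p_value)
```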
Exercise 2
One of your friends claims that they have telepathic powers because they correctly guessed 8 Zener cards out of 25, a success rate of 0.32, which is much higher than the hypothesized success rate of 0.2. Test the null hypothesis that they do not have telepathic abilities, and find the critical value (95th percentile).
binom.test(8,25,p=0.2,
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 8 and 25
## number of successes = 8, number of trials = 25, p-value = 0.1091
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1703037 1.0000000
## sample estimates:
## probability of success
## 0.32
qbinom(p=.95, size=25, prob=0.2)
## [1] 8
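As in section 2.2, the value returned by qbinom() is the last score that is not significant, so your friend would have needed one more correct guess to reject the null hypothesis. A short sketch of that check follows; crit, p_at_crit and p_above_crit are illustrative names.

```r
# Smallest k such that P(X <= k) >= 0.95 for a single 25-card trial
crit <- qbinom(p = 0.95, size = 25, prob = 0.2)

# One-sided p-values for 8 and for 9 correct guesses
p_at_crit    <- pbinom(crit - 1, size = 25, prob = 0.2,
                       lower.tail = FALSE)  # P(X >= 8)
p_above_crit <- pbinom(crit, size = 25, prob = 0.2,
                       lower.tail = FALSE)  # P(X >= 9)

c(critical_value = crit, p_at_crit = p_at_crit, p_above_crit = p_above_crit)
```

With 8 correct guesses the p-value stays above 0.05, whereas 9 correct guesses would have brought it below 0.05.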