Introduction to p-values
Hypothesis testing is one of the core components of statistical analysis. The p-value is the probability of obtaining results at least as extreme as yours if the null hypothesis is true. In this lab, we will: (1) generate random draws from a binomial distribution, (2) test the null hypothesis, (3) calculate critical values, and (4) perform a t-test.
Relevant functions: rbinom(), binom.test(), qbinom(), t.test().
1. Generating and Understanding our Data
In this lab, we will use a distribution of Zener card trials. These cards were invented by Karl Zener in the early 1930s to test for telepathic abilities. In a typical trial, an experimenter selects one of five possible cards (showing a circle, a cross, waves, a square, or a star) 25 times and, without showing the card, asks a participant to guess which card was selected each time. That is to say, participants are asked to use telepathic abilities to correctly guess the card.
[Figure: the five Zener cards (circle, cross, waves, square, star)]
1.1 Using rbinom()
Given that there are 5 different cards, the probability of correctly guessing a card in a given Zener card test is 1/5 (or 0.2). Here, we simulate the number of correct guesses made by a single individual in each of 1000 trials of 25 Zener cards, drawing from a binomial distribution.
set.seed(200) # Setting the seed for replication purposes
myData <- rbinom(n=1000,size=25,prob=0.2) # Creating a random binomial distribution
1.2 Measures of Central Tendency (MCT) and Spread (MS)
Next, we want to check some measures of central tendency and of spread of myData.
mean(myData) # What is the mean?
## [1] 4.978
Mode <- function(x){ # Creating a "Mode()" function
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
Mode(myData) # What is the mode?
## [1] 5
sd(myData) # What is the standard deviation?
## [1] 1.926455
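As a rough cross-check, the sample values above can be compared with what the binomial distribution predicts in theory: a mean of n × p and a standard deviation of sqrt(n × p × (1 − p)). The sketch below assumes the same seed and parameters as the lab; the names `theo_mean` and `theo_sd` are introduced here for illustration only.

```r
set.seed(200)                                    # same seed as in the lab
myData <- rbinom(n = 1000, size = 25, prob = 0.2)

size <- 25                                       # cards per trial
prob <- 0.2                                      # chance of a correct guess

theo_mean <- size * prob                         # n * p
theo_sd   <- sqrt(size * prob * (1 - prob))      # sqrt(n * p * (1 - p))

# Sample values should be close to, but not exactly, the theoretical ones
print(c(sample_mean = mean(myData), theoretical_mean = theo_mean))
print(c(sample_sd = sd(myData), theoretical_sd = theo_sd))
```

The small gaps between the sample and theoretical values come from the randomness of the simulation.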
1.3 Graphing our Data
Here we graph the results of 1000 Zener card trials of 25 cards each. In other words, we want to visualize how many correct guesses we obtained in each of these 1000 trials (i.e. how many times we correctly guessed 0 cards, 1 card, etc.).
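One possible way to draw such a graph is a bar plot of the counts. This is only a sketch using base R graphics, not necessarily the plot used in the original lab; the variable name `counts` and the axis labels are assumptions made for illustration.

```r
set.seed(200)                                    # same seed as in the lab
myData <- rbinom(n = 1000, size = 25, prob = 0.2)

# Count how many of the 1000 trials produced 0, 1, 2, ..., 25 correct guesses.
# Using a factor with fixed levels keeps outcomes that never occurred (count 0).
counts <- table(factor(myData, levels = 0:25))

barplot(counts,
        xlab = "Number of correct guesses (out of 25)",
        ylab = "Number of trials",
        main = "1000 simulated Zener card trials")
```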
1.4 Calculating the PMF
Let’s first look at the observed distribution of our data (its empirical probability mass function), and then at the theoretical probabilities given by our binomial distribution.
x <- table(myData)/length(myData) # Creating a table with the distribution of myData as proportions
x # Printing out that table
## myData
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 0.005 0.016 0.069 0.140 0.178 0.216 0.176 0.108 0.054 0.023 0.008 0.004 0.002
## 13
## 0.001
x[1]+x[2]+x[3] # Example: Calculating the proportion of trials where we correctly guessed 2 cards or fewer
## 0
## 0.09
# Calculating the theoretical cumulative probability P(X <= 2) (not on our data)
pbinom(2,size=25,prob=0.2,lower.tail=TRUE)
## [1] 0.09822522
For reference, you can calculate the THEORETICAL probability of correctly guessing 2 cards or fewer out of 25 using pbinom(). The randomness of the sampling process explains why the results for myData are slightly different from the theoretically expected ones.
Why did we do this? Because before using p-values, we needed to understand that every outcome (even the most extreme) has some probability of occurring due to chance alone (even if that probability is extremely small).
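To make this concrete, the sketch below uses dbinom() to list the theoretical probability of every possible score from 0 to 25 under pure guessing. The names `k` and `pmf` are introduced here for illustration.

```r
k <- 0:25                                    # every possible number of correct guesses
pmf <- dbinom(k, size = 25, prob = 0.2)      # P(X = k) under pure guessing

print(all(pmf > 0))                          # TRUE: no outcome is strictly impossible
print(sum(pmf))                              # the probabilities add up to 1
print(pmf[k == 25])                          # all 25 correct: 0.2^25, tiny but not zero
```

Even a perfect score has a nonzero probability of happening by luck, which is why a single surprising result cannot prove anything with certainty.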
2. Hypothesis Testing
A p-value represents the probability of observing results as extreme as yours, or more extreme, due to chance alone (that is, if the null hypothesis is true). Suppose I tell you that you had only about a 1.7% chance of correctly guessing 10 or more cards out of 25, and you do guess 10 cards out of 25. Does that allow us to conclude that you have a superpower?
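Tail probabilities of this kind can be computed with pbinom() and lower.tail = FALSE. As a sketch under the lab's assumptions (25 cards, success probability 0.2): the chance of 9 or more correct guesses comes out at roughly 4.7%, and the chance of 10 or more at roughly 1.7%. The variable names are illustrative.

```r
# With lower.tail = FALSE, pbinom(q, ...) returns P(X > q), so we pass q = k - 1
# to get P(X >= k).
p_9_or_more  <- pbinom(8, size = 25, prob = 0.2, lower.tail = FALSE)  # P(X >= 9)
p_10_or_more <- pbinom(9, size = 25, prob = 0.2, lower.tail = FALSE)  # P(X >= 10)

print(round(c(nine_or_more = p_9_or_more, ten_or_more = p_10_or_more), 4))
```

Note the off-by-one: because the binomial distribution is discrete, "9 or more" and "more than 9" are different events with different probabilities.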
You have to decide which threshold would convince you, but in science the conventional threshold for rejecting the null hypothesis is p < 0.05.
2.1 Testing the null hypothesis
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| No telepathic ability | Telepathic abilities |
| Probability of success = 0.2 | Probability of success > 0.2 |
Since we are testing the hypothesis that our respondent is able to guess Zener cards at a higher rate than the hypothesized probability of success, let’s do a one-sided test.
sum(myData) # How many times has our respondent correctly guessed a Zener card?
## [1] 4978
binom.test(4978,25000,p=0.2, # 4978 successes, 25000 trials, with 0.2 hypothesized probability of success
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 4978 and 25000
## number of successes = 4978, number of trials = 25000, p-value = 0.6385
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1949717 1.0000000
## sample estimates:
## probability of success
## 0.19912
Our p-value is greater than 0.05, thus we fail to reject the null hypothesis: we have no evidence that our respondent has telepathic abilities.
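To see where this p-value comes from, the sketch below recomputes it by hand: for a one-sided "greater" test, it is the probability of observing 4978 or more successes out of 25000 when the true success probability is 0.2. The names `p_manual` and `p_test` are introduced for illustration.

```r
successes <- 4978
trials    <- 25000

# P(X >= 4978) = P(X > 4977) under the null hypothesis (prob = 0.2)
p_manual <- pbinom(successes - 1, size = trials, prob = 0.2, lower.tail = FALSE)

# The same quantity as reported by binom.test()
p_test <- binom.test(successes, trials, p = 0.2, alternative = "greater")$p.value

print(c(manual = p_manual, binom.test = p_test))
```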
Next step: using qbinom(), let’s now see how many times they would have needed to guess the cards correctly to reject the null hypothesis.
2.2 Finding critical values
What is the critical value associated with the 95th percentile of what we should observe under the null hypothesis? Remember that, as this is a one-sided test, we put the entire 5% of most extreme possibilities on the right side of the distribution rather than splitting it between the two tails. In other words, we want to know: what score places a result among the 5% most extreme (i.e. most unlikely) outcomes? Because such a score is so unlikely to happen by chance alone, we consider that the result is probably NOT due to chance but to a different underlying data-generating process (DGP), and we therefore reject the null hypothesis at this threshold.
qbinom(p=.95, size=25000, prob=0.2) # Finding the critical value at the 95th percentile
## [1] 5104
# Sanity check: 5104 successful trials should not be significant
binom.test(5104,25000,p=0.2,
alternative ="greater", # the alternative hypothesis: success rate greater than 0.2
conf.level = 0.95) # 1 - confidence level = significance level (alpha), not the p-value
##
## Exact binomial test
##
## data: 5104 and 25000
## number of successes = 5104, number of trials = 25000, p-value = 0.05114
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.1999723 1.0000000
## sample estimates:
## probability of success
## 0.20416
# Sanity check: 5105 successful trials should be significant
binom.test(5105,25000,p=0.2,
alternative ="greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 5105 and 25000
## number of successes = 5105, number of trials = 25000, p-value = 0.04951
## alternative hypothesis: true probability of success is greater than 0.2
## 95 percent confidence interval:
## 0.2000119 1.0000000
## sample estimates:
## probability of success
## 0.2042
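The two sanity checks suggest a general pattern: qbinom() returns the largest count that is still NOT significant, so the smallest significant count is one more. A minimal sketch follows, using a hypothetical helper `critical_count()` that is not part of the lab and assumes a one-sided "greater" test.

```r
# Hypothetical helper (not part of the lab): smallest number of successes whose
# one-sided p-value is at or below alpha, for a "greater" alternative.
critical_count <- function(size, prob, alpha = 0.05) {
  qbinom(1 - alpha, size = size, prob = prob) + 1
}

print(critical_count(25000, 0.2))   # all 25,000 guesses pooled
print(critical_count(25, 0.2))      # a single trial of 25 cards
```

For the pooled data this gives 5105, matching the checks above; for a single trial of 25 cards, a respondent would need at least 9 correct guesses.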
In the density plot below, the area in red represents the rejection region. It is so called because values in this region lead to the rejection of the null hypothesis. Another way to see this: the probability of getting one of these results if the null hypothesis is true (i.e. if the respondent is not a psychic) is at most 5%. Thus, even a respondent who is not a psychic could sometimes correctly guess 5105 cards or more by pure chance, but it is unlikely.
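A plot of this kind can be sketched with base R graphics as follows. This is an illustrative reconstruction, not necessarily the lab's original figure; the plotting range 4700 to 5300 and the colors are assumptions.

```r
k <- 4700:5300                                    # range around the expected 5000
probs <- dbinom(k, size = 25000, prob = 0.2)      # P(X = k) under the null hypothesis

# Smallest significant count: one above the 95th percentile
critical <- qbinom(0.95, size = 25000, prob = 0.2) + 1

plot(k, probs, type = "h",
     col = ifelse(k >= critical, "red", "grey50"),   # red = rejection region
     xlab = "Correct guesses out of 25,000",
     ylab = "Probability",
     main = "Rejection region for a one-sided test at the 5% level")

# Total probability of landing in the rejection region if the null is true
rejection_mass <- pbinom(critical - 1, size = 25000, prob = 0.2, lower.tail = FALSE)
print(rejection_mass)
```

Because the distribution is discrete, the red area adds up to slightly less than 5% rather than exactly 5%.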
2.3 Extra Material: Performing a t-test
Imagine that we have two individuals in this class who agree to do the Zener card trial 10 times (each trial consists of trying to correctly guess 25 cards). Below are the results for Student A and Student B, in terms of number of correct guesses per trial.
StudentA
## [1] 3 5 6 4 3 5 6 5 3 5
StudentB
## [1] 6 7 5 8 6 5 4 7 6 6
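The code that creates these two vectors is not shown here. To follow along, they can be recreated from the printed values, for example:

```r
# Scores copied from the printed output above (correct guesses per 25-card trial)
StudentA <- c(3, 5, 6, 4, 3, 5, 6, 5, 3, 5)
StudentB <- c(6, 7, 5, 8, 6, 5, 4, 7, 6, 6)

print(c(mean_A = mean(StudentA), mean_B = mean(StudentB)))
```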
How do we know if the difference between the two students' results is statistically significant? We can run a t-test, which assigns a p-value to the difference between the means of the two sets of results. That is to say, it tells us the probability of observing such a difference (or an even more extreme one) between the two students' scores by chance alone, if there were no real difference between them.
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| No difference between students | Difference between students |
| Similar means | Different means |
# Performing a two-sided t-test
t.test(StudentA,StudentB,alternative ="two.sided")
##
## Welch Two Sample t-test
##
## data: StudentA and StudentB
## t = -2.8749, df = 17.993, p-value = 0.01008
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.5961871 -0.4038129
## sample estimates:
## mean of x mean of y
## 4.5 6.0
How do you interpret these results? In other words, given the p-value, can you confidently reject the null hypothesis?
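To see where the numbers in the output come from, the sketch below recomputes the Welch t statistic, its approximate degrees of freedom (Welch-Satterthwaite formula), and the two-sided p-value by hand. The vectors are re-entered from the printed scores, and the intermediate names (`se2_A`, `se2_B`, `t_stat`, `df`, `p_value`) are introduced for illustration.

```r
StudentA <- c(3, 5, 6, 4, 3, 5, 6, 5, 3, 5)
StudentB <- c(6, 7, 5, 8, 6, 5, 4, 7, 6, 6)

# Squared standard error of each mean: variance / sample size
se2_A <- var(StudentA) / length(StudentA)
se2_B <- var(StudentB) / length(StudentB)

# t statistic: difference in means divided by its standard error
t_stat <- (mean(StudentA) - mean(StudentB)) / sqrt(se2_A + se2_B)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (se2_A + se2_B)^2 /
  (se2_A^2 / (length(StudentA) - 1) + se2_B^2 / (length(StudentB) - 1))

# Two-sided p-value: probability of a t value at least this far from 0
p_value <- 2 * pt(-abs(t_stat), df = df)

print(round(c(t = t_stat, df = df, p = p_value), 4))
```

These values should agree with the t, df, and p-value lines of the t.test() output.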
Exercise
Test the null hypothesis that a coin is fair if you flip heads 550 times out of 1000 trials. Hint: you need to use the alternative = "two.sided" argument in binom.test().