2025-09-19

What is a p-value?

The p-value is the probability of getting results at least as extreme as our data, computed under the assumption that the null hypothesis is true (that is, that our own hypothesis is false). The smaller the p-value, the less likely it is that results this extreme would occur by chance alone if the null hypothesis held, and so the stronger the evidence against the null hypothesis. The p-value is not the probability that either hypothesis is true.

For a right-tailed test with test statistic \(T\) and observed value \(t_{\text{obs}}\), the p-value is \[ \small \text{p-value} = P_{H_0}\!\big(T \ge t_{\text{obs}}\big), \] where \(P_{H_0}\) denotes probability computed under the null hypothesis \(H_0\). For a two-sided test, \[ \small \text{p-value} = 2\,\min\!\Big\{P_{H_0}\big(T \le t_{\text{obs}}\big),\; P_{H_0}\big(T \ge t_{\text{obs}}\big)\Big\}. \]
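To make the two formulas concrete, here is a minimal sketch in R. It assumes, purely for illustration, that the test statistic follows a Binomial(20, 1/3) distribution under the null hypothesis and that the observed value is 11; those numbers are hypothetical. The cap at 1 on the two-sided value is not part of the formula above. It is a common convention, added here because doubling a tail of a discrete distribution can exceed 1.

```r
# Right-tailed and two-sided p-values for a discrete test statistic T.
# Assumption for illustration only: under the null, T ~ Binomial(n = 20, prob = 1/3),
# and the observed value is t_obs = 11 (hypothetical numbers).
n     <- 20
p0    <- 1/3
t_obs <- 11

# P(T >= t_obs): with lower.tail = FALSE, pbinom(q, ...) returns P(T > q),
# so we pass q = t_obs - 1 to include t_obs itself.
p_right <- pbinom(t_obs - 1, size = n, prob = p0, lower.tail = FALSE)

# P(T <= t_obs)
p_left <- pbinom(t_obs, size = n, prob = p0)

# Two-sided: double the smaller tail, capped at 1 (the cap is an added convention)
p_two_sided <- min(1, 2 * min(p_left, p_right))

print(c(right = p_right, two_sided = p_two_sided))
```

Note the `t_obs - 1`: forgetting it drops the observed value from the tail and makes the p-value too small.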

How do we use the p-value?

Before looking at the data, we pick a significance level: the threshold the p-value must fall below for us to reject the null hypothesis. The standard choice, and the one we will be using in our examples, is .05. A p-value of .1 is above the threshold, so we fail to reject the null hypothesis. A p-value of .01 is below it, so we reject the null hypothesis, which means the data support our hypothesis.
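The decision rule can be sketched as a small R function. The name `decide` is a hypothetical helper, not something from a package. Treating a p-value exactly equal to the threshold as a rejection is a convention assumed here. The wording "fail to reject" is used for the non-significant case because a large p-value does not prove the null hypothesis.

```r
# Decision rule: compare a p-value with a threshold chosen in advance.
# `decide` is a hypothetical helper name used only for this illustration.
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

# The two p-values mentioned above, judged against the .05 benchmark
decide(0.10)
decide(0.01)
```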

Example: Rock Paper Scissors

The example we will be looking at is John’s ability to win at rock paper scissors. John thinks he is good at the game, so our hypothesis is that he wins more than one third (about 33%) of his games. The null hypothesis is that he has no intrinsic skill and thus wins one third of the time.

We test \(H_0: p = \tfrac{1}{3}\) (the null hypothesis) versus \(H_1: p > \tfrac{1}{3}\) (the alternative hypothesis), where the number of wins in \(n\) games is \(X \sim \mathrm{Bin}(n,p)\). For an observed \(w\) wins, the right-tailed p-value is \[ \small \text{p-value} = P_{H_0}(X \ge w) = \sum_{x=w}^{n} \binom{n}{x} p_0^{\,x}(1-p_0)^{\,n-x}, \qquad p_0 = \tfrac{1}{3}. \]
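The sum in this formula can be written out directly in R and compared with the built-in binomial tail function. This is a sketch: `binom_right_p` is a hypothetical helper name, and the inputs (9 wins in 15 games) are made-up values chosen only to exercise the function.

```r
# Right-tailed binomial p-value, written as the explicit sum from the formula.
# `binom_right_p` is a hypothetical helper name, not part of any package.
binom_right_p <- function(w, n, p0 = 1/3) {
  stopifnot(w >= 0, w <= n)   # guard: the observed wins must lie between 0 and n
  x <- w:n                    # outcomes at least as extreme as w wins
  sum(choose(n, x) * p0^x * (1 - p0)^(n - x))
}

# Illustrative inputs (assumed, not from the examples): 9 wins out of 15 games
p_sum   <- binom_right_p(w = 9, n = 15)
p_built <- pbinom(9 - 1, size = 15, prob = 1/3, lower.tail = FALSE)

# The explicit sum and the built-in upper tail should agree
all.equal(p_sum, p_built)
```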

Explore the Odds

Here is the probability of each possible number of wins, from 0 to 20, in a sample of 20 games when the win probability is 1/3.

Code for the graph

library(plotly)  # provides plot_ly(), add_bars(), and layout()

n  <- 20   # Number of games
p0 <- 1/3  # Win probability under the null hypothesis

xValues <- 0:n                                   # Possible numbers of wins
pmf     <- dbinom(xValues, size = n, prob = p0)  # Binomial probability of each count

plot_ly() |>
  add_bars(x = xValues, y = pmf,
           name = "Probability",
           hovertemplate = "Wins = %{x}<br>Probability = %{y:.4f}<extra></extra>") |>
  layout(title = "Binomial Distribution of Wins",
         xaxis = list(title = "Wins"),
         yaxis = list(title = "Probability"))

Example 1

In this version he played 10 games and won 8 times.

Example 1 Continued

As you could see in the example, the bar for 8 wins and every bar greater than it are very low, so this is a rare number of games to win under the null hypothesis. Our p-value was .0034. As you remember, our benchmark was .05, and our p-value is below it: a result this extreme would be unlikely if John had no special skill. We can therefore reject the null hypothesis that John is no better than the average player at rock paper scissors. The data support John’s hypothesis that he is better than the average rock paper scissors player and that his results were not just luck.
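The .0034 figure can be checked by hand. As a sketch, assume a win probability of \(\tfrac{1}{3}\) under the null hypothesis and write \(X\) for the number of wins in 10 games. The terms for 8, 9, and 10 wins then share the denominator \(3^{10} = 59049\): \[ \small P(X \ge 8) = \frac{\binom{10}{8}2^{2} + \binom{10}{9}2^{1} + \binom{10}{10}2^{0}}{3^{10}} = \frac{180 + 20 + 1}{59049} = \frac{201}{59049} \approx 0.0034. \]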

Example 2

In this version he played 12 games and won 5 times.

Example 2 Continued

As you could see in the example, the bar for 5 wins and the bars past it are much higher than in our previous example, so this is a much less rare result. Our p-value was .3685. As you remember, our benchmark was .05, and our p-value is well above it. As such, we fail to reject the null hypothesis that John has no special talent for rock paper scissors. The data do not give us enough evidence for John’s hypothesis that he has a greater than standard chance of winning.
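This p-value can be cross-checked with base R's exact binomial test, `binom.test()`. The sketch below assumes a null win probability of 1/3 and a right-tailed alternative, matching the setup of the example. The variable names are illustrative.

```r
# Cross-check of Example 2 with base R's exact binomial test.
wins  <- 5
games <- 12
p0    <- 1/3   # win probability under the null hypothesis (assumed, as in the example)

# alternative = "greater" asks for the right tail: P(X >= wins) under p0
result <- binom.test(wins, games, p = p0, alternative = "greater")

round(result$p.value, 4)   # 0.3685
result$p.value < 0.05      # FALSE: we fail to reject the null hypothesis
```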