P-values

A P-value is the probability, calculated under the null hypothesis, of obtaining data as extreme as or more extreme than the data actually observed, in the direction of the alternative. Reporting a P-value lets readers perform the test at whatever Type I error rate they like: compare the P-value to the desired Type I error rate and, if the P-value is smaller, reject the null hypothesis.

https://xkcd.com/1132/

Formally, the P-value is the probability of getting data as or more extreme than the observed data in favor of the alternative, where the probability is calculated assuming that the null is true. In other words, if we get a very large T statistic, the P-value answers the question “How likely would it be to get a statistic this large or larger if the null were actually true?” If the answer is “very unlikely”, that is, the P-value is very small, then it casts doubt on the null being true, since you actually observed a statistic that extreme.
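
For the one-sided “this large or larger” case described above, the definition can be written compactly. The symbols \(T\) (the test statistic viewed as a random variable) and \(t_{\text{obs}}\) (the value actually observed) are notation introduced here for illustration and do not come from the original notes.

\[
\text{P-value} = P\left(T \ge t_{\text{obs}} \mid H_0 \text{ is true}\right)
\]

For a “less than” alternative the inequality would point the other way.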

P-values are the most common measure of “statistical significance”.
Their ubiquity, along with concern over their interpretation and use, makes them controversial among statisticians:
- http://warnercnr.colostate.edu/~anderson/thompson1.html
- Also see Statistical Evidence: A Likelihood Paradigm by Richard Royall
- Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy by Steve Goodman
- The hilariously titled: The Earth is Round (p < .05) by Cohen.
Some positive comments

What is a P-value?

Idea: Suppose nothing is going on - how unusual is it to see the estimate we got?

Approach:

Define the hypothetical distribution of a data summary (statistic) when “nothing is going on” (null hypothesis)
Calculate the summary/statistic with the data we have (test statistic)
Compare what we calculated to our hypothetical distribution and see if the value is “extreme” (p-value)
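
The three steps above can be sketched with a small simulation. This is only an illustration under stated assumptions: the coin-flip data (15 heads in 20 flips), the variable names, and the number of simulated draws are all hypothetical and are not part of the original notes.

```r
# Hypothetical data: 15 heads observed in 20 flips of a coin.
set.seed(1)
n_flips <- 20
observed_heads <- 15

# Step 1: hypothetical distribution of the statistic when "nothing is
# going on" (a fair coin), approximated by simulation.
null_heads <- rbinom(100000, size = n_flips, prob = 0.5)

# Step 2: the statistic calculated from the data we have.
test_statistic <- observed_heads

# Step 3: how extreme is it? Proportion of null draws as large or larger.
p_sim <- mean(null_heads >= test_statistic)

# Exact upper-tail probability for comparison: P(X >= 15) = P(X > 14).
p_exact <- pbinom(test_statistic - 1, size = n_flips, prob = 0.5,
                  lower.tail = FALSE)

print(c(simulated = p_sim, exact = p_exact))
```

The simulated proportion should land close to the exact binomial tail probability; the simulation is shown only because it mirrors the three steps one by one.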

The attained significance level

Our test statistic was \(2\) for \(H_0 : \mu = 30\) versus \(H_a : \mu > 30\).
Notice that we rejected the one-sided test when \(\alpha = 0.05\). Would we reject if \(\alpha = 0.01\)? How about \(0.001\)?
The smallest value of \(\alpha\) at which you still reject the null hypothesis is called the attained significance level.
This is equivalent to, but philosophically a little different from, the P-value.
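
A minimal sketch of how the attained significance level could be computed for a test statistic of 2. The notes in this section do not state the degrees of freedom, so the value of 15 used below is an assumption made purely for illustration, as are the variable names.

```r
t_stat <- 2
df_assumed <- 15  # ASSUMPTION: degrees of freedom are not given in this section

# Upper-tail probability of a t statistic this large or larger under H_0.
p_value <- pt(t_stat, df = df_assumed, lower.tail = FALSE)

# Reject at a given alpha exactly when the P-value is below it.
alphas <- c(0.05, 0.01, 0.001)
print(data.frame(alpha = alphas, reject = p_value < alphas))
```

Under this assumption the test rejects at \(\alpha = 0.05\) but not at \(0.01\) or \(0.001\), and the P-value itself is the smallest \(\alpha\) at which the null would still be rejected.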

Revisiting an earlier example

Suppose a friend has \(8\) children, \(7\) of whom are girls, and none are twins.
If each gender has an independent \(50\)% probability for each birth, what’s the probability of getting \(7\) or more girls out of \(8\) births?

choose(8, 7) * .5 ^ 8 + choose(8, 8) * .5 ^ 8

[1] 0.03515625

pbinom(6, size = 8, prob = .5, lower.tail = FALSE)

[1] 0.03515625

Here, if we were testing that hypothesis, we would reject at a 5% level. We would also reject at a 4% level, but we would not reject at a Type I error rate of 3%.
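
The comparison against each Type I error rate can be checked directly. This is a small sketch; the variable names are illustrative, and the call to binom.test is included only as a cross-check on the hand calculation.

```r
# P(7 or more girls in 8 births) under a 50% probability per birth.
p_value <- pbinom(6, size = 8, prob = 0.5, lower.tail = FALSE)

# Reject at a given level exactly when the P-value is smaller than it.
alphas <- c(0.05, 0.04, 0.03)
reject <- p_value < alphas
names(reject) <- paste0(alphas * 100, "%")
print(reject)

# Cross-check with the built-in exact binomial test (one-sided).
p_check <- binom.test(7, n = 8, p = 0.5, alternative = "greater")$p.value
print(p_check)
```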

Poisson example

Suppose that a hospital has an infection rate of 10 infections per 100 person-days at risk (a rate of 0.1) during the last monitoring period.
Assume that an infection rate of 0.05 is an important benchmark.
Given the model, could the observed rate being larger than 0.05 be attributed to chance?
Under \(H_0: \lambda = 0.05\), the expected number of infections is \(\lambda_0 \times 100 = 5\).
Consider \(H_a: \lambda > 0.05\).

ppois(9, 5, lower.tail = FALSE)

[1] 0.03182806

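The P-value is the probability of obtaining 10 or more infections when the rate is 0.05 and monitoring covered 100 person-days at risk. In R we pass 9 (the observed count minus 1) as the first argument because, with lower.tail = FALSE, ppois returns the probability of a count strictly greater than 9, which is the same as 10 or more. The second argument, 5, is the expected number of infections under the null (0.05 times 100 person-days).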

So the results show that there is only about a 3% chance of seeing 10 or more infections if the true rate were 0.05. As the observed count was 10 per 100 person-days at risk, a rate of 0.1 and twice the benchmark, this hospital should perhaps execute its planned quality control procedures.
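
As a hedged cross-check of the "observed count minus 1" convention, the same upper-tail probability can be computed in three ways. The variable names are illustrative, and poisson.test is used here only to confirm the hand calculation, not because the original notes use it.

```r
observed <- 10           # infections seen in the monitoring period
person_days <- 100       # time at risk
benchmark_rate <- 0.05   # rate under the null hypothesis
expected <- benchmark_rate * person_days  # 5 infections expected under H_0

# P(X >= 10) = P(X > 9), hence observed - 1 with lower.tail = FALSE.
p_tail <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)

# The same quantity as 1 minus the probability of 0 through 9 infections.
p_complement <- 1 - sum(dpois(0:(observed - 1), lambda = expected))

# Built-in exact Poisson test with a one-sided alternative.
p_test <- poisson.test(observed, T = person_days, r = benchmark_rate,
                       alternative = "greater")$p.value

print(c(tail = p_tail, complement = p_complement, poisson.test = p_test))
```

Passing 10 instead of 9 to ppois would drop the probability of exactly 10 infections and understate the P-value.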

P-values

Statistical inference

Linda Angulo Lopez
