Necessary Background: Statistical methods for dealing with large & small datasets.

The question motivating p-values is: given a null hypothesis concerning our data, how unusual or extreme is the sample value we compute from the data, for example, its mean?

That is, is our test statistic consistent with our hypothesis? There are, implicitly, three steps we have to take to answer these types of questions.

P Values by https://xkcd.com/1132/

If the p-value (the probability of seeing a test statistic as extreme as yours) is small, then one of two things is true: either H_0 is true and we have observed a rare event, or H_0 is false.

For example, suppose we get a t statistic of 2.5 with 15 degrees of freedom, testing H_0, that mu = mu_0, versus an alternative H_a, that mu > mu_0. We would need to find the probability of getting a t statistic as large as 2.5.

We can use the R function pt, the distribution function of the t distribution. This function returns one of two probabilities: EITHER the probability of X > q (if lower.tail is FALSE) OR of X <= q (if lower.tail is TRUE), where q is a quantile argument. Here we’ll set q=2.5, df=15, and lower.tail=FALSE, since H_a says that mu > mu_0 and we have to gauge the extremity in the direction of H_a, as follows:

pt(2.5, 15, lower.tail = FALSE )
## [1] 0.0122529

This result tells us that, if H_0 were true, we would see a test statistic this large with a probability of about 1%, which is quite unlikely. So we reject H_0.
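As a cross-check, we can compare the test statistic to the 95th percentile of the same t distribution, using qt, the quantile-function counterpart of pt:

qt(.95, 15)
## [1] 1.75305

Since 2.5 exceeds this quantile, we reach the same decision at alpha = .05.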

Review, if necessary, how to create a null hypothesis: we have to begin with a null hypothesis, which is a reasoned guess at the distribution of some data summary (a statistic). The null hypothesis H_0 is a baseline against which we’ll measure an alternative hypothesis using the actual observed data.

Review, if necessary, how to calculate a test statistic from the given data: the Central Limit Theorem (CLT), the Z statistic, Student’s (or Gosset’s) t distribution, and t confidence intervals are a few statistical methods for dealing with large and small datasets.
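As a minimal sketch of this step, assuming made-up data x and a hypothetical null mean mu_0 (neither comes from this lesson), the t statistic can be computed by hand or with t.test:

# hypothetical sample and null-hypothesis mean, for illustration only
x <- c(52, 55, 49, 58, 61, 50, 54, 57)
mu_0 <- 50
# t statistic: (sample mean - null mean) / standard error of the mean
t_stat <- (mean(x) - mu_0) / (sd(x) / sqrt(length(x)))
# one-sided p-value, as with pt above; df = n - 1
pt(t_stat, df = length(x) - 1, lower.tail = FALSE)
# t.test(x, mu = mu_0, alternative = "greater") reports the same statistic and p-value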

Finally, compare the test statistic to the distribution hypothesized under H_0.

The comparison tells you how “extreme” the test value is toward the alternative hypothesis. The p-value is the probability, under the null hypothesis, of obtaining evidence as or more extreme than the test statistic from your observed data, in the direction of the alternative hypothesis.

an attained significance level:

Another way to think about a p-value is as an attained significance level: the p-value is the smallest value of alpha at which you would still reject the null hypothesis.

Suppose we computed a test statistic Z = 2, with H_0 saying that mu = 30 and the alternative H_a that mu > 30. We would reject this one-sided test when alpha is set to 0.05, as the quantile associated with that level, 1.644854, is less than 2:

qnorm(.95)
## [1] 1.644854

So we say WE REJECTED H_0 BECAUSE OUR DATA (the test statistic, actually) FAVOURED H_a.

The test statistic 2 falls in the rejection region because it exceeds the quantile 1.644854; the rejection region represents 5% of the area under the standard normal curve.

at the 99th percentile:

However, if we instead use the quantile at the 99th percentile, the test statistic 2 is not in the rejection region, so we say WE FAILED TO REJECT H_0:

qnorm(.99)
## [1] 2.326348

attained significance level, with pnorm():

Our data (the test statistic) tells us what the attained significance level is. We use the R function pnorm() to give us this number. We should get a result between .95, where we rejected, and .99, where we failed to reject.

pnorm(2)
## [1] 0.9772499
#With the default values, specifically lower.tail=TRUE, this gives us the probability 
#that a random draw from the distribution is less than or equal to the argument. 

The attained significance level was about .98, as expected between the levels .95 (where we rejected) and .99 (where we failed to reject H_0).

To find the p-value associated with this example, we again use pnorm, but this time we set the lower.tail argument to FALSE. This gives us the probability of X exceeding the test statistic, that is, the area under the curve to the right of the test statistic.

pnorm(2, lower.tail = FALSE)
## [1] 0.02275013

This tells us that the attained level of significance is about 2%.
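Equivalently, since the total area under the curve is 1, this is just the complement of the pnorm(2) result above:

1 - pnorm(2)
## [1] 0.02275013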

By reporting a p-value, instead of an alpha level and whether or not you reject H_0, reviewers of your work can test the hypothesis at any alpha level they choose. The general rule is that if the p-value is less than the specified alpha you reject the null hypothesis, and if it’s greater you fail to reject.
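As a toy sketch of this rule (decide is a made-up helper for illustration, not a standard R function), using the attained significance level from above:

# reject H_0 exactly when the p-value falls below the chosen alpha
decide <- function(p_value, alpha) {
  if (p_value < alpha) "reject H_0" else "fail to reject H_0"
}
decide(0.02275013, 0.05)
## [1] "reject H_0"
decide(0.02275013, 0.01)
## [1] "fail to reject H_0"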

Most software assumes a two-sided test and doubles the p-value automatically.

So for a two-sided hypothesis test, you have to double the smaller of the two one-sided p-values.
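For instance, for the Z = 2 example above, where the smaller one-sided p-value came from pnorm(2, lower.tail = FALSE):

2 * pnorm(2, lower.tail = FALSE)
## [1] 0.04550026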

Consider the example of a family with 7 girls and 1 boy. If we treat each birth as a fair coin flip, we want to test H_0, that p = .5, where p is the probability of a girl, against H_a, that p is not equal to .5; it’s either greater or less than .5.

Let’s set alpha, the level of our test, to .05 and find the probabilities associated with different rejection regions, where rejection region i contains outcomes with at least i-1 girls out of a possible 8. We put these probabilities in a vector mybin:

# P(at least i-1 girls) for i = 1..9, using the upper tail of the binomial
mybin <- pbinom(-1:7, size = 8, prob = .5, lower.tail = FALSE)
mybin[8]   # probability of at least 7 girls
## [1] 0.03515625
mybin[7]   # probability of at least 6 girls
## [1] 0.1445313
mybin
## [1] 1.00000000 0.99609375 0.96484375 0.85546875 0.63671875 0.36328125 0.14453125
## [8] 0.03515625 0.00390625

So mybin[1]=1.0, meaning that with probability 1 there are at least 0 girls, and mybin[2]=.996 is the probability that there’s at least 1 girl out of the 8, and so forth. The probabilities decrease as i increases.

The second-to-last value shows us that the probability of having at least 7 girls out of 8 children is .035. You can verify this with the R function pbinom, with the arguments 6, size=8, prob=.5, and lower.tail=FALSE (the argument 6 is used because lower.tail=FALSE gives the probability of a value strictly greater than the argument):

pbinom(6,size=8,prob=.5,lower.tail=FALSE) 
## [1] 0.03515625

As .035 is less than .05, our sample falls in the rejection region, so:

- we reject H_0 at alpha = .04 or alpha = .05
- we fail to reject H_0 at alpha = .03, as the p-value of about .035 is greater than alpha = .03

For the other side of the test we want the probability of at most 7 girls, again out of a sample of size 8 with probability .5. Again we use pbinom, this time with an argument of 7 and lower.tail=TRUE.

pbinom(7,size=8,prob=.5,lower.tail=TRUE) 
## [1] 0.9960938

So we see that it’s highly likely, at a probability of 99.6%, that out of 8 children you’ll have at most 7 girls.

The p-value of this two-sided test is 2 times the smaller of the two one-sided values. In this case the smaller value is .035, so 2 * .035 = .07 is the p-value for this two-sided test.
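In R, the doubling looks like this (and binom.test(7, 8, p = .5), R’s built-in exact test, happens to report the same two-sided p-value for this data):

2 * pbinom(6, size = 8, prob = .5, lower.tail = FALSE)
## [1] 0.0703125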

Poisson distribution:

A Poisson distribution is discrete and involves counts or rates of counts, usually over time.

Consider the case of infections in a hospital, where the hospital has an infection rate of 10 infections per 100 person-days at risk, that is, a rate of 0.1. Assume that an infection rate of 0.05 is the benchmark rate.

With this model, could the observed rate (.1) be larger than the benchmark 0.05 by chance or does it indicate a problem at the hospital?

Translating to statistics: H_0 says that lambda = 0.05, so the expected count over 100 person-days is lambda_0 * 100 = 5, and H_a says that lambda > 0.05. The question is: is H_0 true and our observed rate (.1) just a rare event, or should we reject H_0?

ppois() returns probabilities for Poisson:

The R function ppois() returns probabilities for Poisson distributions. We want the probability of seeing at least 10 infections (our observed count) when the expected count is 5, so we set lambda=5 and lower.tail=FALSE. As when we used pbinom, we have to use 9 as the argument, since we’re looking for the probability of a value strictly greater than the argument:

ppois(9,5,lower.tail=FALSE)
## [1] 0.03182806

A probability of about .03 is returned, less than the conventional cutoff of .05, so we reject the infection rate hypothesized by H_0; the data favors H_a, indicating that the infection rate is higher than the benchmark of 0.05.
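For comparison, R’s built-in exact Poisson test reports the same one-sided p-value for this setup, with x = 10 observed infections, T = 100 person-days at risk, and benchmark rate r = 0.05:

poisson.test(10, T = 100, r = 0.05, alternative = "greater")$p.value
## [1] 0.03182806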

P-values are a convenient way to communicate the results of a hypothesis test.

You calculate a p-value as the probability, computed under the null hypothesis, of obtaining data as or more extreme than what you actually obtained, in favor of the alternative. When you communicate a p-value, the reader can perform the test at whatever Type I error rate they would like: just compare the p-value to the desired Type I error rate, and if the p-value is smaller, reject the null hypothesis.