p-values

The concept of the p-value is fundamental in statistics because it provides a formal way to test hypotheses. According to Wikipedia:

In statistical hypothesis testing, the p-value or probability value or asymptotic significance is the probability for a given statistical model that, when the null hypothesis is true, the statistical summary (such as the sample mean difference between two compared groups) would be more extreme than the actual observed results

Some definitions:

Null hypothesis

The null hypothesis is a general statement that there is no relationship between two measured quantities. For example:

  • Gender & Height. The null hypothesis states that boys and girls have the same height.
  • Income & Health. The null hypothesis states that health does not depend on income.

The null hypothesis is a statement about the populations from which the samples are drawn, not about the samples themselves. Thus, when the null hypothesis states that boys and girls have the same height, it refers to the entire populations of boys and girls from which the samples were obtained, not to the particular boys and girls who were sampled.

The null hypothesis is usually the opposite of what we are interested in. For example, a researcher may want to show that boys are taller than girls. Then, she forms the null hypothesis that boys and girls (again, their populations) have the same height.

Wikipedia explains the concept of the null hypothesis nicely: In the significance testing approach of Ronald Fisher, a null hypothesis is rejected if the observed data are significantly unlikely to have occurred if the null hypothesis were true. In this case the null hypothesis is rejected and an alternative hypothesis is accepted in its place. If the data are consistent with the null hypothesis, then the null hypothesis is not rejected. In neither case is the null hypothesis or its alternative proven; the null hypothesis is tested with data and a decision is made based on how likely or unlikely the data are. This is analogous to the legal principle of presumption of innocence, in which a suspect or defendant is assumed to be innocent (null is not rejected) until proven guilty (null is rejected) beyond a reasonable doubt (to a statistically significant degree).

Statistical summary (summary statistics)

Summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. For example:

  • mean value (average) of heights
  • variance
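  • the t-statistic, e.g. Welch's version \(t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}\), where \(\bar{x}_i\), \(s_i^2\) and \(n_i\) are the sample mean, sample variance and size of group \(i\)
  • any other simple quantity computed from the (possibly complicated) data you have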
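As a small illustration, the R sketch below computes a few of these summary statistics for two hypothetical samples of heights. The data are simulated with `rnorm()` purely for illustration (they are not real measurements), and the names `boys` and `girls` are hypothetical. It also checks the hand-computed Welch t-statistic against the one reported by R's built-in `t.test()`, whose default is the unequal-variance (Welch) test.

```r
# Hypothetical heights in cm, simulated purely for illustration
set.seed(1)
boys  <- rnorm(10, mean = 150, sd = 8)
girls <- rnorm(8,  mean = 147, sd = 8)

# Simple summary statistics for one group
summary_boys <- c(mean = mean(boys), variance = var(boys))  # var() uses n - 1
summary_boys

# Welch t-statistic: difference of means divided by its standard error
se_diff  <- sqrt(var(boys) / length(boys) + var(girls) / length(girls))
t_manual <- (mean(boys) - mean(girls)) / se_diff

# t.test() computes the Welch statistic by default (var.equal = FALSE)
t_builtin <- unname(t.test(boys, girls)$statistic)
all.equal(t_manual, t_builtin)  # should be TRUE
```

Dividing by the standard error rather than by the variance is what makes the statistic scale-free, which is why its null distribution can be tabulated once and reused.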

The important things are:

  • The summary statistic should capture as much of the information relevant to the problem we study as possible.
  • We should be able to derive or estimate its distribution when the null hypothesis is true.

More extreme than the actual observed results

This means that we need the distribution of the summary statistic when the null hypothesis is true, so that we can calculate the probability that the statistic takes a value equal to or more extreme than the value we have actually observed.
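Written as a formula, and using \(T\) for the summary statistic and \(t_{\text{obs}}\) for its observed value (symbols introduced here only for illustration), the p-value for an alternative hypothesis that favours large values of the statistic is:

```latex
p = \Pr\left(T \ge t_{\text{obs}} \mid H_0\right)
```

For an alternative in the opposite direction the inequality flips to \(T \le t_{\text{obs}}\), and for a two-sided alternative with a null distribution symmetric about zero one typically uses \(p = \Pr\left(|T| \ge |t_{\text{obs}}| \mid H_0\right)\).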

For example:

Assume that we are interested in studying the height of boys and girls and that we use the statistic k = mean_boys - mean_girls.

Also assume that we know the distribution of k when the null hypothesis is true, i.e. when the populations of boys and girls have the same mean (and, in this example, the same variance). Let’s assume that, under the null hypothesis, k follows the distribution plotted below (a standard normal density):

```r
# Density of k under the null hypothesis, drawn here as a standard normal
x  <- seq(-4, 4, length.out = 100)
hx <- dnorm(x)

plot(x, hx, type = "l", lty = 2, xlab = "k",
     ylab = "Density", main = "Distribution of k given the null")
```

Let’s assume that the observed value is k = 1.4 and

  1. the alternative hypothesis is that boys are taller than girls. Then, the p-value is the area under the distribution for \(k \geq 1.4\).
  2. the alternative hypothesis is that boys do not have the same height as girls. Then, the p-value is the area under the distribution for \(k \le -1.4\) or \(k \geq 1.4\). Why? Because "different" means that boys are either taller or shorter, so both cases must be included in the test. In other words, a height difference at least as extreme as the observed one means \(|k| \ge 1.4\).
  3. the alternative hypothesis is that boys are shorter than girls. Then, the p-value is the area under the distribution for \(k \le 1.4\), which is large here because the observed k points away from this alternative.
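The three areas can be computed directly in R. This sketch assumes, as in the plot above, that k follows a standard normal distribution under the null hypothesis; `pnorm()` returns the area to the left of a value, and `lower.tail = FALSE` returns the area to the right.

```r
k_obs <- 1.4

# 1. Alternative: boys are taller -> area for k >= 1.4 (right tail)
p_greater <- pnorm(k_obs, lower.tail = FALSE)

# 2. Alternative: heights differ -> area for |k| >= 1.4 (both tails)
p_two_sided <- 2 * pnorm(abs(k_obs), lower.tail = FALSE)

# 3. Alternative: boys are shorter -> area for k <= 1.4 (left tail)
p_less <- pnorm(k_obs)

round(c(greater = p_greater, two_sided = p_two_sided, less = p_less), 3)
```

This gives roughly 0.081, 0.162 and 0.919. The same observed k yields very different p-values depending on the alternative; the two-sided value is exactly twice the right-tail value only because the standard normal is symmetric about zero.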

Thus, the p-value also depends on the alternative hypothesis, whereas the distribution of k depends only on the null hypothesis.

How do we obtain the distribution of the statistic under the null hypothesis?

Mathematical methodology

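This is one of the most crucial steps in hypothesis testing. For the t-statistic, the distribution under the null hypothesis is known: if the data are normally distributed, it follows Student's t-distribution (exactly for the equal-variance version, approximately for Welch's version). You can have a look at the relevant Wikipedia articles (e.g. Student's t-test) for details.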
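A minimal sketch of the mathematical route in R, assuming normally distributed heights and using Welch's t-statistic: the null distribution is approximated by a t-distribution with Welch–Satterthwaite degrees of freedom, and `pt()` gives the tail area. The data are simulated (hypothetical), and the result is compared with `t.test()` for the one-sided alternative "boys are taller".

```r
# Hypothetical heights in cm, simulated purely for illustration
set.seed(2)
boys  <- rnorm(10, mean = 150, sd = 8)
girls <- rnorm(8,  mean = 147, sd = 8)

n1 <- length(boys)
n2 <- length(girls)
v1 <- var(boys) / n1   # squared standard error of mean_boys
v2 <- var(girls) / n2  # squared standard error of mean_girls

# Welch t-statistic
t_obs <- (mean(boys) - mean(girls)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# One-sided p-value ("boys are taller"): area for T >= t_obs
p_manual <- pt(t_obs, df = df_welch, lower.tail = FALSE)

# Same test using the built-in function
p_builtin <- t.test(boys, girls, alternative = "greater")$p.value
c(manual = p_manual, builtin = p_builtin)
```

Both numbers should agree, since `t.test()` uses the same statistic and degrees-of-freedom formula by default.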

Computational methodology

Shuffling

If we cannot derive the distribution of the statistic under the null hypothesis mathematically, we can try to construct it from the data. Let’s again consider the example of boys and girls, where we want to test whether boys are taller and use the statistic k = mean_boys - mean_girls. Assume that we have sampled 10 boys and 8 girls.

The null hypothesis states that the samples of boys and girls come from populations that do not differ. Can we repeatedly generate samples that satisfy this?

Yes, it is possible. One idea is to put all 18 observations in a common pool, then randomly pick 10 of them and call them “boys” and call the remaining 8 “girls”. This shuffling approach draws both samples from one mixed pool. In other words, the two samples come from a single population and therefore have the same mean. Repeating the shuffle many times and computing k each time gives the distribution of k under the null hypothesis.
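The shuffling procedure can be sketched in a few lines of R. This is a minimal permutation test under stated assumptions: the heights are simulated (hypothetical), the number of shuffles `n_perm` is an arbitrary choice, and the p-value is for the one-sided alternative "boys are taller".

```r
# Hypothetical heights in cm, simulated purely for illustration
set.seed(3)
boys  <- rnorm(10, mean = 150, sd = 8)
girls <- rnorm(8,  mean = 147, sd = 8)

# Observed statistic k = mean_boys - mean_girls
k_obs <- mean(boys) - mean(girls)

pool   <- c(boys, girls)  # all 18 heights in one common pool
n_boys <- length(boys)
n_perm <- 10000           # number of shuffles (arbitrary choice)

# Each shuffle labels 10 random values "boys" and the remaining 8 "girls"
k_null <- replicate(n_perm, {
  idx <- sample(length(pool), n_boys)
  mean(pool[idx]) - mean(pool[-idx])
})

# p-value: share of shuffled k values at least as large as the observed k
p_perm <- mean(k_null >= k_obs)
p_perm
```

A common variant reports `(sum(k_null >= k_obs) + 1) / (n_perm + 1)` so that the estimated p-value is never exactly zero; the plain proportion is used here to match the description above.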

Simulations

We will see this methodology later on, when we study hypotheses in population genetics.