Understanding the steps involved in statistical hypothesis testing: We will use the toad example seen in class, from Whitlock and Schluter (2014): Humans are predominantly right-handed. Do other animals exhibit handedness as well? Bisazza et al. (1996) tested this possibility on the common toad. They randomly sampled 18 toads from the wild, wrapped a balloon around each individual’s head, and recorded which forelimb each toad used to remove the balloon. The scientific question boils down to: “Do right-handed and left-handed toads occur with equal frequency in the toad population, or is one type more frequent than the other?”
Result of the study: 14 toads were right-handed and 4 were left-handed. Are these results evidence of handedness in toads? Let’s recapitulate the steps involved in hypothesis testing:
Transform the scientific question into a statistical question.
State the null (theoretical population) and alternative hypotheses based on population values.
Compute the appropriate test statistic.
Determine the P-value by contrasting the sample value with a sampling distribution that assumes the null hypothesis to be true (sampling from a theoretical population where the null hypothesis is true); the P-value is the probability of finding the observed value, or a more extreme one, in the sampling distribution of the theoretical population.
Draw a conclusion by comparing the P-value against the significance level (𝛼). If the P-value is greater than 𝛼, do not reject H0; if the P-value is smaller than 𝛼, reject H0.
1) Scientific question into a statistical question: Do right-handed and left-handed toads occur with equal frequency in the toad population, or is one type more frequent than the other?
2) State the statistical hypotheses:
H0: the numbers of right- and left-handed toads are equal. HA: the numbers of right- and left-handed toads are different.
3) Decide on an appropriate statistic: the number of right-handed toads:
14
Note that here we could have used the proportion of right-handed toads over the total (14/18) or of left-handed toads over the total (4/18).
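For instance, computed directly in R:
14/18  # proportion of right-handed toads, about 0.778
4/18   # proportion of left-handed toads, about 0.222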
4) Determine the P-value by contrasting the sample value with a sampling distribution that assumes the null hypothesis to be true (i.e., based on a theoretical population where H0 is true):
4.1 - Building the sampling distribution assuming the null hypothesis as true. This will repeat the “paper sampling from a bag” approach seen in class:
Let’s start by learning how to take a random sample of a categorical variable with two possible outcomes from a theoretical population of known probabilities (here, a 50%/50% chance of each hand) with the desired sample size:
temporarySample <- sample(c("L", "R"), size = 18, prob = c(0.5, 0.5), replace = TRUE)
temporarySample
We can then sum the values for a particular category (here we decided right-handed):
number.R <- sum(temporarySample == "R")
number.R
Repeat the two sets of commands above a couple of times and notice how the number of right-handed toads (i.e., number.R) changes.
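If you prefer, the base R function replicate can run those two commands several times in one go (a quick sketch; your numbers will differ from run to run because sampling is random):
replicate(5, sum(sample(c("L", "R"), size = 18, prob = c(0.5, 0.5), replace = TRUE) == "R"))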
Now, let’s generate a huge number of samples, i.e., the sampling distribution assuming the null hypothesis as true:
sum.vector <- vector()     # will store the number of right-handed toads in each sample
number.samples <- 1000000  # how many random samples to generate
for(count.samples in 1:number.samples) {
  temporarySample <- sample(c("L", "R"), size = 18, prob = c(0.5, 0.5), replace = TRUE)
  sum.vector[count.samples] <- sum(temporarySample == "R")
}
Take a look at what the vector sum.vector recorded - remember that the function head lists the first 6 values of a vector or matrix (here, the first 6 values out of the 1000000 generated above). Remember that the values recorded are the numbers of right-handed toads in samples from a theoretical population where H0 is true.
head(sum.vector)
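As an aside, because each sample is just 18 independent 50/50 trials, the same sampling distribution could have been generated directly from the binomial distribution with rbinom (a sketch, not the approach used in class; sum.vector.alt is an illustrative name):
sum.vector.alt <- rbinom(n = number.samples, size = 18, prob = 0.5)  # number of right-handed toads in each sample
head(sum.vector.alt)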
Let’s build a frequency distribution table for the samples (i.e., the sampling distribution of the number of right-handed toads assuming the null hypothesis as true). But first, we need to save sum.vector as a factor with 19 levels, representing each possible integer outcome between 0 and 18 (19 possible outcomes). This step makes sure that all 19 possible outcomes are included, even categories with no occurrences (e.g., all 18 toads right-handed, or none right-handed; either could happen by chance alone but may not appear among our samples).
factor18 <- factor(sum.vector, levels = 0:18)
factor18
Now the table:
Table.TheoreticalPop <- table(factor18, dnn = "numberRightToads")/number.samples
data.frame(Table.TheoreticalPop)
The table above lists the proportion of each possible random sample outcome, from 0 up to 18 right-handed individuals. As one could have predicted, the most common category is 9, in which half of the toads are right-handed and the other half left-handed. But notice that many samples differed from this value, showing sampling variation around the theoretical value (50%/50% right- and left-handed).
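You can confirm the most frequent outcome directly from the table; with a million samples this should virtually always return "9":
names(which.max(Table.TheoreticalPop))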
Because we are dealing with a discrete variable (the number of right-handed toads), a histogram would lump the discrete values into classes. In this case, we instead use a barplot of the table we generated for the theoretical population, so that each discrete value appears separately:
barplot(height = Table.TheoreticalPop, space = 0, las = 1, cex.names = 0.8, col = "white", xlab = "Number of right-handed toads", ylab = "Relative frequency")
Let’s see where our observed sample (14 right-handed) falls on the graph by plotting red vertical lines using the command abline (which adds a straight line to a plot):
abline(v = 14, col = "red")  # observed number of right-handed toads
abline(v = 4, col = "red")   # the mirror-image outcome (4 right-handed, i.e., 14 left-handed)
Note that there are not many values that are as extreme as or more extreme than the observed one (i.e., 14 or more right-handed toads, or 4 or fewer right-handed toads).
The distribution is pretty symmetric, but note that the relative frequency of 8 right-handed toads may not be exactly the same as that of 10 right-handed toads; the same goes for the relative frequencies of 2 and 16 right-handed toads. This is because we did not generate the distribution from an infinite number of samples; note, however, that they are pretty similar!
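You can check this near-symmetry yourself by comparing mirror-image outcomes in the table:
Table.TheoreticalPop["8"]   # relative frequency of 8 right-handed toads
Table.TheoreticalPop["10"]  # relative frequency of 10 right-handed toads
Table.TheoreticalPop["2"]   # relative frequency of 2 right-handed toads
Table.TheoreticalPop["16"]  # relative frequency of 16 right-handed toads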
4.2 - Calculate the P-value, i.e., the proportion of samples from the sampling distribution (built assuming the null hypothesis as true) that are as extreme as or more extreme than the observed value: the number of samples with 14 or more right-handed toads, or 4 or fewer right-handed toads, divided by the total number of samples:
frac14orMore <- sum(sum.vector >= 14)/number.samples
frac14orMore
Given that the sampling distribution is symmetric, the value above is pretty similar to:
frac4orLess <- sum(sum.vector <= 4)/number.samples
frac4orLess
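Anticipating the analytical solutions shown further below, this lower-tail probability can also be computed exactly from the binomial distribution; the value should be very close to the computational estimate above:
pbinom(4, size = 18, prob = 0.5)  # exact probability of 4 or fewer right-handed toads, about 0.01544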
Given that the interest here is not to generate evidence that the dominant limb of toads is the right one, but rather to generate evidence as to whether they have a dominant limb at all (either right or left), we adopt, as before, a probability that reflects both sides of the sampling distribution from the theoretical population:
P.value <- 2*frac14orMore
P.value
This probability can be equally estimated by:
P.value <- frac4orLess+frac14orMore
P.value
Note that the difference between the two values above is due to the fact that our distribution was computer generated and, as such, is not perfectly symmetric. The distribution generated by infinite sampling would have given exactly the same value because it would have been perfectly symmetric.
Regardless, the probability of finding a sample of 18 toads (from a theoretical population in which the numbers of right- and left-handed individuals are equal) at least as extreme as 14 right-handed and 4 left-handed (hence calculating the probabilities from both sides of the curve) is around 0.03 (it may change a bit as it is computer generated). This probability is smaller than an alpha of 0.05 and provides evidence that toads exhibit handedness. In other words, a sample with 14 right-handed toads (4 left-handed) is quite implausible (inconsistent) given what is expected from a theoretical population generated assuming the null hypothesis as true, i.e., with half of the individuals being right-handed and the other half being left-handed!
Now, this computational experiment allowed us to understand that a statistical hypothesis test is performed by contrasting the observed sample value against a sampling distribution assuming the null hypothesis as true. However, in real applications we don’t use a computational approach but rather an analytical solution that considers all infinite possible samples to build the sampling distribution assuming that the null hypothesis is true (in stats jargon we would have said “sampling distribution under the null hypothesis”). This can be done in at least two ways:
The Z-score test (seen in our lecture 3):
prop.test(14,18,p=0.5,correct=FALSE)
The important output is the P-value, i.e., 0.01842. This test relies on a normal approximation that may not work well for very small sample sizes (as discussed in class).
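To see what prop.test is doing under the hood, here is a minimal sketch of the Z-score calculation (variable names are illustrative):
p.hat <- 14/18                  # observed proportion of right-handed toads
se.null <- sqrt(0.5 * 0.5 / 18) # standard error of the proportion assuming H0 (p = 0.5)
z <- (p.hat - 0.5) / se.null    # Z-score, about 2.357
2 * pnorm(-abs(z))              # two-sided P-value, about 0.01842
z^2                             # about 5.556, the X-squared value reported by prop.test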
The binomial test, which is based on all infinite sample values of a binomial variable (i.e., a variable with two possible outcomes, here left and right):
binom.test(14, 18, p = 0.5)
The important output is the P-value, i.e., 0.03088. Note that this probability is very similar to the one obtained using the computational approach, i.e., calculated on the basis of frac4orLess+frac14orMore; both are built on the binomial nature of the data. The normal approximation, in contrast, tends to generate smaller P-values than it should for small sample sizes (here, n=18).
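Because the binomial distribution is symmetric when p = 0.5, the two-sided P-value of binom.test can be reproduced by simply doubling the lower tail (a quick check):
2 * pbinom(4, size = 18, prob = 0.5)  # about 0.03088, matching binom.test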
Understanding the properties of estimators:
Remember that an unbiased estimator (i.e., one calculated from a single sample) is one for which the average of all infinite sample values equals the true parameter (population value).
The case of the sample mean: Let’s assume a uniformly distributed population and generate a large number of samples from it (say 1000000). This can be done with the function runif, which is used to generate random samples from a given uniform distribution. The uniform distribution is described simply by its minimum and maximum. Let’s see a quick example in which we generate 10 values (n=10) between 15 (minimum) and 50 (maximum). I picked these values for no particular reason; the principles shown here will work for any set of values.
runif(n=10,min=15,max=50)
Repeat the command above a couple of times to see what happens.
Let’s plot the population distribution. We can do that by plotting a huge number of values from the population:
hist(runif(n=1000000,min=15,max=50),breaks=35)
Now, let’s take multiple samples from the population and for each of them calculate its mean:
sample.means <- vector()   # will store the mean of each sample
number.samples <- 1000000  # how many samples to take
for (count.samples in 1:number.samples){
  sample.means[count.samples] <- mean(runif(n=10, min=15, max=50))
}
Now, let’s plot the sampling distribution of means:
hist(sample.means)
What is the difference in shape between the population and the sampling distribution?
What is the mean of the population? For a uniform distribution, it is simply (min + max)/2, which in our case is (15 + 50)/2 = 32.5. This could have been approximated by calculating the mean of a large number of values from the population:
mean(runif(n=1000000,min=15,max=50))
Let’s calculate the mean of the sampling distribution:
mean(sample.means)
Here there will be a tiny difference between the true population mean (32.5) and the mean of the sampling distribution (e.g., 32.50077). This is because we took 1000000 samples, and not an infinite number, from the population! BUT, because the mean of the sampling distribution equals the population value, the sample mean is an unbiased estimator of the population mean.
We won’t repeat the exercise for a normally distributed population, but the mean of the sampling distribution would equal the true population mean value as well.
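If you want to verify this yourself, here is a quick sketch using a normal population with mean 32.5 and standard deviation 10 (arbitrary values chosen only for illustration):
normal.means <- replicate(100000, mean(rnorm(n = 10, mean = 32.5, sd = 10)))
mean(normal.means)  # very close to the true population mean of 32.5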
So, we can say that the sample mean is a robust estimator of the true population value. It is robust in this case because it is not affected by the shape of the original population distribution (i.e., robust against assumptions).
After the lectures and this tutorial you should be able to: