Here we introduce a general framework to test hypotheses by using statistics computed from random samples. Since these statistics have a sampling distribution, our decision is made in the face of random variation.

The process we develop here is directly analogous to a criminal jury trial. In a jury trial we assume that the accused is innocent, and the jury finds the accused guilty only if there is very strong evidence against the presumption of innocence.

We start by defining the parameter of interest in the context of the problem. For example, we may be interested in the mean sales last quarter \((\mu)\) or the proportion of new customers \((p).\)

Our next step is to determine the two competing hypotheses we want to test about our chosen parameter. One hypothesis is called the null hypothesis and is denoted by \(H_0.\) The other hypothesis is called the alternative hypothesis and is denoted by \(H_a.\)

The null hypothesis is the “status quo” hypothesis of no change. It generally takes the form that the parameter equals a particular value, e.g. \(H_0: \mu = 50\) or \(H_0: p = 0.3.\) We assume the null hypothesis is true, and this assumption will be maintained unless we have strong evidence otherwise.

The alternative hypothesis is the research hypothesis. It generally takes one of three forms: \(<\) or \(>\) or \(\ne.\) For example, \(H_a: \mu \ne 50\) or \(H_a: p > 0.3.\)

To connect this back to our jury trial analogy, the null hypothesis is that the defendant is innocent and the alternative hypothesis is that the defendant is guilty. We assume the defendant is innocent unless we have strong evidence (beyond a “reasonable doubt”) otherwise. The jury makes one of two decisions, guilty or not guilty, depending on the strength of the evidence.

It is important to note that we only find the defendant guilty or not guilty. If we find the defendant guilty, it is because we had sufficient evidence of guilt. If we find the defendant not guilty, it is because we had insufficient evidence of guilt. We do NOT find the defendant innocent. The same logic applies to hypothesis testing. We either reject the null hypothesis or fail to reject the null hypothesis: we reject when we have sufficient evidence and fail to reject when we have insufficient evidence. We never accept a hypothesis, and we never say that we have proven one of the hypotheses true or false.

How do we decide whether we have enough evidence to reject the null hypothesis? We base our decision on sample data: specifically, on how likely it would be to see such data under the assumption that the null hypothesis is true. This probability is computed using our knowledge of probability and sampling distributions. The details are contained in the following sections.
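To make this concrete, here is a minimal simulation sketch in Python. All of the numbers are hypothetical: suppose we test \(H_0: \mu = 50\) against \(H_a: \mu > 50\) with a sample of \(n = 40\) observations whose mean is 52.1, and assume for illustration that the population standard deviation is 6. We simulate many samples under the assumption that \(H_0\) is true and ask how often they produce a mean at least as extreme as ours.

```python
import numpy as np

# Hypothetical setup (illustrative values, not from a real study):
# H0: mu = 50 vs Ha: mu > 50, sample size n = 40, assumed sd = 6,
# and an observed sample mean of 52.1.
rng = np.random.default_rng(1)
n, sigma, observed_mean = 40, 6.0, 52.1

# Draw 100,000 samples of size n from a population where H0 is true,
# and record each sample's mean.
sim_means = rng.normal(loc=50, scale=sigma, size=(100_000, n)).mean(axis=1)

# How often is a simulated sample mean at least as extreme as ours?
p_value = np.mean(sim_means >= observed_mean)
print(f"Approximate probability of data this extreme under H0: {p_value:.4f}")
```

If that proportion is very small, data like ours would be rare if \(H_0\) were true, which counts as evidence against \(H_0.\)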

Unfortunately, this decision-making process is not perfect, and two types of errors can be made. The table below summarizes the possibilities.

|                                           | \(H_0\) is true (innocent) | \(H_0\) is false (guilty) |
|-------------------------------------------|----------------------------|---------------------------|
| Reject \(H_0\) (find guilty)              | Type I error               | Correct decision          |
| Fail to reject \(H_0\) (find not guilty)  | Correct decision           | Type II error             |

The probability of a Type I error is called the significance level of the test and is denoted by \(\alpha.\) \[\alpha = P(\text{reject }H_0 \,|\, H_0 \text{ is true})\] The significance level of the test is set by the researcher. The most common value for \(\alpha\) is \(0.05\). It is set lower when the consequences of making such an error are severe. For example, consider the consequences of making a Type I error in a jury trial for shoplifting versus murder. Finding a defendant guilty of murder when they are really innocent is much worse than finding a defendant guilty of shoplifting when they are really innocent.
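As a sanity check on this definition, one can estimate the Type I error rate by simulation: repeatedly draw samples from a population where \(H_0\) really is true, test at \(\alpha = 0.05\), and count how often the test rejects. The sketch below uses made-up values and assumes SciPy's one-sample \(t\)-test is available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 30, 10_000

# H0: mu = 50 is TRUE in this simulation, so every rejection
# is a Type I error.
rejections = 0
for _ in range(reps):
    sample = rng.normal(loc=50, scale=6.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=50)
    if p < alpha:
        rejections += 1

print(f"Empirical Type I error rate: {rejections / reps:.3f}")  # close to alpha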

The probability of making a Type II error is denoted by \(\beta.\)

\[\beta = P(\text{fail to reject }H_0|H_0 \text{ is false})\]
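Unlike \(\alpha,\) the value of \(\beta\) depends on how far the truth is from \(H_0,\) so to estimate it we must pick a specific alternative. Continuing the hypothetical example, suppose the true mean is actually 52. The sketch below counts how often a test at \(\alpha = 0.05\) fails to reject \(H_0: \mu = 50.\)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 30, 10_000

# H0: mu = 50 is FALSE here (the true mean is 52), so every
# failure to reject is a Type II error.
failures = 0
for _ in range(reps):
    sample = rng.normal(loc=52, scale=6.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=50)
    if p >= alpha:
        failures += 1

print(f"Empirical Type II error rate (beta): {failures / reps:.3f}")
```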

We would like to avoid making either type of error. However, for a fixed sample size, decreasing \(\alpha\) increases \(\beta\) and vice versa. The only way to make both types of error smaller at the same time is to increase the sample size.
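The role of sample size can be seen directly by rerunning the previous simulation for several values of \(n\) while holding \(\alpha = 0.05\) fixed; \(\beta\) shrinks as \(n\) grows. Again, a sketch with hypothetical values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, reps = 0.05, 5_000

# Same hypothetical scenario: H0: mu = 50 is false (true mean is 52).
# Holding alpha fixed, beta should fall as the sample size grows.
for n in (10, 30, 100):
    failures = 0
    for _ in range(reps):
        sample = rng.normal(loc=52, scale=6.0, size=n)
        _, p = stats.ttest_1samp(sample, popmean=50)
        if p >= alpha:
            failures += 1
    print(f"n = {n:3d}: beta ≈ {failures / reps:.3f}")
```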