Chapter 18: Hypothesis Testing

Components of a Hypothesis Test
Testing Means
- One mean
- Two means
Testing Proportions
- One proportion
- Two proportions
Examples

In hypothesis testing, a probability from a relevant sampling distribution is used as evidence against some claim about the true value of a parameter. A probability used in this way is referred to as a p-value.

Components of a Hypothesis Test

Hypotheses

The hypotheses are formal statements about the parameters.

The null hypothesis, denoted by $H_0$, is interpreted as the baseline or no-change hypothesis and is the claim that is assumed to be true. It is usually stated as an equality (we use an $=$ sign).
The alternative hypothesis, denoted by $H_A$, is the conjecture that you’re testing for, against the null hypothesis. It can be a less-than or greater-than statement (we use $\lt$ or $\gt$ signs), in which case it is called a one-sided hypothesis. Most of the time, however, we will work with two-sided hypotheses (we use the $\ne$ sign).

Test Statistics

The test statistic is the statistic that is compared to the appropriate standardized sampling distribution to yield the p-value.

Rejection Region

A rejection region (RR) specifies the range of values the test statistic might assume that would lead to rejection of $H_0$.

p-value

The p-value is the probability value that is used to quantify the amount of evidence, if any, against the null hypothesis.

Put simply, the more extreme the test statistic, the smaller the p-value. The smaller the p-value, the greater the amount of statistical evidence against the assumed truth of $H_0$.

Significance Level

The significance level, denoted by $\alpha$, is used to qualify the result of the test. It defines a cutoff point at which you decide whether there is sufficient evidence to view $H_0$ as incorrect.

If the p-value is less than $\alpha$, then the result of the test is statistically significant. This implies there is sufficient evidence against the null hypothesis, and therefore you reject $H_0$.

People typically use small values for $\alpha$ like 5% or 1%.

The level of significance reflects how willing we are to reach a wrong conclusion about an experiment because of sampling variability: it is the probability of rejecting $H_0$ when $H_0$ is actually true.
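To make the link between the test statistic, the p-value, and $\alpha$ concrete, here is a minimal Python sketch. It assumes the test statistic follows a standard normal distribution under $H_0$; the function names `p_value_from_z` and `decide` and the observed value `z = 2.1` are illustrative choices, not part of the text.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """p-value for a test statistic z that is standard normal under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":       # H_A: parameter < null value
        return std_normal.cdf(z)
    if alternative == "greater":    # H_A: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":  # H_A: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

z = 2.1  # hypothetical observed test statistic
p = p_value_from_z(z)
print(round(p, 4), "|", decide(p, alpha=0.05), "|", decide(p, alpha=0.01))
```

The same p-value can lead to different decisions depending on the $\alpha$ chosen, which is why the significance level should be fixed before looking at the data.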

Relationship with Confidence Intervals

Remember that the level of significance $\alpha$ corresponds to a $100(1-\alpha)$% confidence level. For example, $\alpha = 0.05$ corresponds to a 95% confidence level.

That means that if a $100(1-\alpha)$% confidence interval does not include the hypothesized value, the result of the corresponding two-sided test is statistically significant at level $\alpha$.
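The following sketch checks this correspondence numerically for one simple case. It assumes a mean with known population standard deviation, so both the interval and the two-sided test are based on the standard normal distribution; the numbers and the helper names `z_interval` and `two_sided_p` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for a mean with known sigma."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical summary values: sample mean, known sigma, sample size, alpha.
xbar, sigma, n, alpha = 52.0, 6.0, 36, 0.05

results = []
for mu0 in (50.5, 49.0):
    low, high = z_interval(xbar, sigma, n, alpha)
    outside = not (low <= mu0 <= high)          # is mu0 outside the interval?
    significant = two_sided_p(xbar, mu0, sigma, n) < alpha
    results.append((mu0, outside, significant))
    print(mu0, outside, significant)
```

For each hypothesized value, "outside the interval" and "significant at level $\alpha$" agree.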

Statistical Errors

When making a decision to either “Reject” or “Fail to Reject” a null hypothesis, there is always the possibility that we are making a mistake, since we are basing the decision on a sample and not the population.

Type I Error: Reject $H_0$ when $H_0$ is really true.
Type II Error: Fail to reject $H_0$ when $H_0$ is not true.

In reality, we never know if we make a Type I or Type II error because we never know everything about the population that we are studying. All we can do is control the probability of making one of these errors.
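A small simulation can illustrate what "controlling the probability" means for a Type I error. In this sketch the null hypothesis is true by construction, and we count how often a two-sided test at level $\alpha$ rejects it anyway. The setup (normal data, known standard deviation, the function name `type_i_error_rate`, the seed, and the number of trials) is an assumption chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                      trials=2000, seed=1):
    """Fraction of simulated samples that wrongly reject a true H0."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean mu0, so H0: mu = mu0 is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:                   # two-sided rejection region
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land close to alpha = 0.05
```

The observed rejection rate varies from run to run with the seed, but it stays near $\alpha$, because $\alpha$ is the long-run probability of a Type I error.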

Testing Means

One mean

The test statistic $Z$ in a hypothesis test for a single mean with known population standard deviation $\sigma$, with respect to a null value of $\mu_0$, is given as:

\[Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\]

The test statistic $T$ in a hypothesis test for a single mean with unknown population standard deviation, estimated by the sample standard deviation $s$, with respect to a null value of $\mu_0$, is given as:

\[T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\]
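As a rough illustration, both statistics can be computed directly from a sample. The data, the null value, and the "known" $\sigma$ below are invented, and the function names are placeholders. The sketch stops at the test statistic: the p-value for $T$ would come from a t distribution with $n-1$ degrees of freedom, which is not in Python's standard library.

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(sample, mu0, sigma):
    """Z for a single mean when the population sigma is known."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / sqrt(n))

def t_statistic(sample, mu0):
    """T for a single mean when sigma is unknown; returns (T, df)."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean(sample) - mu0) / (s / sqrt(n)), n - 1

sample = [12, 15, 11, 14, 13]  # hypothetical measurements, mean 13
mu0 = 12                       # hypothetical null value

z = z_statistic(sample, mu0, sigma=2.0)  # sigma = 2.0 is assumed known
t, df = t_statistic(sample, mu0)
print(round(z, 3), round(t, 3), df)
```

The only difference between the two functions is whether the denominator uses the known $\sigma$ or the estimate $s$.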

Two means

When we want to directly compare the means of two distinct groups of measurements, we want to test:

\[ H_0: \mu_1 = \mu_2\]

Notice that this is equivalent to the following statement:

\[ H_0: \mu_1 - \mu_2 = 0\]

So when we test for a difference in means, we are really testing whether the means of the two groups are equal to each other.

There are several versions of this test, but we will concentrate on two cases: independent samples and paired samples.

Independent Samples

The most general case is where the two sets of measurements are based on two independent, separate groups (also referred to as unpaired samples).

The test statistic is given by:

\[T = \frac{(\bar{x}_1-\bar{x}_2) -\mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

where $\mu_0$ is the null value of the difference $\mu_1 - \mu_2$ (zero when testing for equal means). The degrees of freedom are taken as the smaller of $n_1-1$ and $n_2-1$, or as $n_1+n_2-2$ if the standard deviations and sample sizes are similar.
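A minimal sketch of this statistic, using invented data for the two groups, might look as follows. The function name `two_sample_t` is illustrative, and the sketch reports only the conservative degrees of freedom (the smaller of $n_1-1$ and $n_2-1$).

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2, mu0=0.0):
    """T for two independent samples; returns (T, conservative df)."""
    n1, n2 = len(x1), len(x2)
    # Standard error built from the two sample variances (n - 1 denominators).
    se = sqrt(variance(x1) / n1 + variance(x2) / n2)
    t = (mean(x1) - mean(x2) - mu0) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

group1 = [12, 15, 11, 14, 13]     # hypothetical group 1, mean 13
group2 = [10, 9, 12, 11, 8, 10]   # hypothetical group 2, mean 10

t, df = two_sample_t(group1, group2)  # mu0 = 0 tests for equal means
print(round(t, 3), df)
```

The groups may have different sizes, since the samples are independent and each variance is divided by its own sample size.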

Paired Samples

Paired data occur if the measurements forming the two sets of observations are recorded on the same individual or if they are related in some other important or obvious way. A classic example of this is “before” and “after” observations, such as two measurements made on each person before and after some kind of intervention treatment.

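In this case we are concerned with the mean difference $\bar{d}$, and the test statistic is given by:

\[T = \frac{\bar{d} -\mu_0}{\sqrt{\frac{s_d^2}{n} }}\]

where $s_d$ is the standard deviation of the individual differences and $n$ is the number of pairs.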
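The paired case reduces to a one-sample test on the differences, which the sketch below makes explicit. The "before" and "after" values are invented, and `paired_t` is a placeholder name. The degrees of freedom returned are the number of pairs minus one.

```python
from math import sqrt
from statistics import mean, variance

def paired_t(first, second, mu0=0.0):
    """T for paired samples, computed on the pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                       # number of pairs
    se = sqrt(variance(diffs) / n)       # sqrt(s_d^2 / n)
    return (mean(diffs) - mu0) / se, n - 1

before = [72, 75, 68, 80, 77]  # hypothetical measurements before treatment
after = [70, 71, 69, 75, 74]   # same individuals after treatment

t, df = paired_t(before, after)
print(round(t, 3), df)
```

Unlike the independent-samples case, the two lists must line up element by element, because each difference belongs to one individual.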

Testing Proportions

One proportion

In testing for the true value of some proportion of success, $p$, let $\hat{p}$ be the sample proportion over $n$ trials, and let the null value be denoted by $p_0$. Because the sampling distribution is evaluated under the assumption that $H_0$ is true, the standard error uses $p_0$. You find the test statistic with the following:

\[Z =\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]
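Here is a short sketch of a one-proportion test. It computes the standard error under the null value $p_0$, which is the usual convention for this test, and it assumes $n$ is large enough for the normal approximation to be reasonable. The counts and the function name `one_proportion_z` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Z and two-sided p-value for H0: p = p0."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, null value 0.5.
z, p = one_proportion_z(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p, 4))
```

With these numbers the sample proportion sits two standard errors above the null value, so the two-sided p-value falls just under 0.05.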

Two proportions

As with the difference between two means, you’re often testing whether the two proportions are the same and thus have a difference of zero.