2025-09-16

What is Hypothesis Testing?

Hypothesis testing is a statistical method that uses a sample drawn from a population to assess whether a hypothesis about the whole population is supported by the data. This lets us make inferences about a large population without the potentially high cost of collecting data on every member of it.

Let's look at a hypothetical example of a clinical drug trial in the following slides.

Hypothetical Example

In this example, let us assume that we are conducting a clinical drug trial for a new medication with four groups:

  • Group 1: taking the new medication being tested
  • Group 2: taking the current standard treatment for the condition being treated
  • Group 3: taking a placebo
  • Group 4: not being treated at all

Step 1: The Null Hypothesis

The first step in Hypothesis Testing is to form a Null Hypothesis. This is a hypothesis that assumes there is no difference in the mean outcome between the current standard treatment and the new medication. For this example we will formulate our Null Hypothesis as:

\(H_0: \mu_{new} = \mu_{standard}\)

And our alternative hypothesis as:

\(H_1: \mu_{new} > \mu_{standard}\)

Step 2: Data Collection

After forming our Null Hypothesis, we can begin to collect data from a portion of our population. For our example we can generate some sample data as follows:

# Seed for example data
set.seed(123456789)

# Setup statistics
n_size <- 200
std_dev <- 10
mu <- c(new_med = 65,
        standard = 50,
        placebo = 35,
        control = 30)

# Generate Example Data
trial_data <- data.frame(
  group = rep(c("New Medication","Standard", "Placebo", "Control"), each = n_size),
  result = c(rnorm(n_size, mu["new_med"], std_dev),
             rnorm(n_size, mu["standard"], std_dev),
             rnorm(n_size, mu["placebo"], std_dev),
             rnorm(n_size, mu["control"], std_dev)
  )
)
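
Before testing anything, it can help to confirm that the generated data look the way we expect. The sketch below rebuilds the same example data so that it runs on its own, then tabulates the sample size, mean, and standard deviation of each group. The name group_summary is ours and is only for illustration.

```r
# Rebuild the example data from this slide so the block is self-contained
set.seed(123456789)
n_size <- 200
std_dev <- 10
mu <- c(new_med = 65, standard = 50, placebo = 35, control = 30)

trial_data <- data.frame(
  group = rep(c("New Medication", "Standard", "Placebo", "Control"), each = n_size),
  result = c(rnorm(n_size, mu["new_med"], std_dev),
             rnorm(n_size, mu["standard"], std_dev),
             rnorm(n_size, mu["placebo"], std_dev),
             rnorm(n_size, mu["control"], std_dev))
)

# Per-group sample size, mean, and standard deviation
group_n     <- tapply(trial_data$result, trial_data$group, length)
group_means <- tapply(trial_data$result, trial_data$group, mean)
group_sds   <- tapply(trial_data$result, trial_data$group, sd)

# group_summary is an illustrative name, not part of the original slides
group_summary <- data.frame(
  group = names(group_means),
  n     = as.vector(group_n),
  mean  = round(as.vector(group_means), 2),
  sd    = round(as.vector(group_sds), 2)
)
print(group_summary)
```

Each sample mean should land close to the value set in mu, and each standard deviation close to std_dev.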

Step 3: Data Analysis Part 1

Step 3: Data Analysis Part 2

Step 3: Data Analysis Part 3

It is also important to take into consideration any covariates that may impact our results. Here we have included example age and viral load data for each group.
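
One simple way to check covariates is to compare their averages across groups. The sketch below does that with made-up values: the age and viral load distributions, the seed, and the names covariates and balance are all assumptions for illustration, not the data used in these slides.

```r
# Hypothetical covariates; the distributions below are assumed, not taken from the trial
set.seed(1)
n_size <- 200
groups <- c("New Medication", "Standard", "Placebo", "Control")

covariates <- data.frame(
  group      = rep(groups, each = n_size),
  age        = round(rnorm(4 * n_size, mean = 45, sd = 12)),    # assumed age distribution
  viral_load = rlnorm(4 * n_size, meanlog = 10, sdlog = 1)      # assumed viral load distribution
)

# Mean of each covariate within each group
balance <- aggregate(cbind(age, viral_load) ~ group, data = covariates, FUN = mean)
print(balance)
```

If the group averages differ noticeably, a difference in results between groups may partly reflect the covariate rather than the treatment.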

Step 4: Perform a Test on the Data

Next we compare the means of the new medication and standard treatment groups using a t-test to determine whether the difference between them is statistically significant.

\(t=\frac{\bar{X}_{new}-\bar{X}_{standard}}{\sqrt{\frac{s^2_{new}}{n_{new}} + \frac{s^2_{standard}}{n_{standard}}}}\)
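
The statistic above can be computed by hand and checked against R's built-in t.test(), which uses this unequal-variance (Welch) form by default. The sketch below is self-contained and draws only the two groups being compared. We assume a one-sided test (alternative = "greater") to match the alternative hypothesis; the slides do not say which alternative was used, so the p-value here may differ from the one reported.

```r
# Draw the two groups being compared, using the same settings as the example data
set.seed(123456789)
new_med  <- rnorm(200, mean = 65, sd = 10)
standard <- rnorm(200, mean = 50, sd = 10)

# t statistic from the formula: difference in sample means over its standard error
t_manual <- (mean(new_med) - mean(standard)) /
  sqrt(var(new_med) / length(new_med) + var(standard) / length(standard))

# Welch two-sample t-test; the one-sided alternative is an assumption
welch <- t.test(new_med, standard, alternative = "greater", var.equal = FALSE)

print(t_manual)
print(unname(welch$statistic))  # should agree with t_manual
print(welch$p.value)
```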

Step 5: Interpret the Results

The t-test gives us a p-value, which we compare with our chosen significance level (commonly 0.05).

From the previous slide, our p-value is:

## [1] 1.605579e-46

If our p-value is less than our chosen significance level, we reject the null hypothesis. That is the case here: the p-value is far below 0.05, so our data suggest that the new medication results in a faster mean recovery time than the standard treatment.

If instead our p-value were greater than or equal to our chosen significance level, we would fail to reject the null hypothesis. This does NOT mean that the null hypothesis is true. A larger sample might reveal a difference that a smaller sample could not detect. It would only mean that the data at hand do not provide enough evidence that the new medication offers an improved mean recovery time when compared to the standard treatment.
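
The decision rule from this slide can be written as a small helper. The function name decide and its return strings are ours and only illustrate the rule; the p-value is the one reported on this slide.

```r
# Illustrative helper: compare a p-value with the significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the null hypothesis"
  } else {
    # p-values greater than or equal to alpha land here
    "fail to reject the null hypothesis"
  }
}

p_value <- 1.605579e-46  # p-value reported on this slide
print(decide(p_value))
print(decide(0.20))      # an example p-value above the significance level
```

Note that a p-value exactly equal to the significance level falls in the "fail to reject" branch, matching the rule stated above.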