2023-03-13

Hypothesis Testing

  • Hypothesis testing is a procedure that uses a sample from a population to evaluate a claim about that population.
  • Even if the sample you choose seems to support your hypothesis, you can’t conclude with certainty that your hypothesis is correct.
    • You have no way of knowing whether your sample is representative of the overall population; you might have sampled extreme observations.
  • Hypothesis tests use the sample size and various statistics from the sample (e.g. mean, variance) to determine whether we have enough evidence to reject the null hypothesis or not (fail to reject the null hypothesis).
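
To make the point about unrepresentative samples concrete, here is a minimal simulation sketch. It repeatedly draws two samples from the same normal population (so the null hypothesis is true by construction) and counts how often a t-test still rejects at the 5% level. The sample size, mean, standard deviation, seed, and number of repetitions are arbitrary choices for illustration and do not come from the Seatbelts data.

```r
# Illustrative simulation: both samples come from the SAME population,
# so every rejection below is a false positive caused by sampling variability.
set.seed(42)          # arbitrary seed, only for reproducibility
nSimulations <- 2000  # arbitrary number of repetitions

pValues <- replicate(nSimulations, {
  a <- rnorm(20, mean = 100, sd = 15)  # hypothetical population parameters
  b <- rnorm(20, mean = 100, sd = 15)
  t.test(a, b)$p.value                 # Welch two-sample t-test (R default)
})

# Share of simulations where we would (wrongly) reject at alpha = 0.05
falsePositiveRate <- mean(pValues < 0.05)
falsePositiveRate
```

The rejection rate should land near the chosen significance level, which is the error rate a hypothesis test is designed to control.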

Let’s explore the use of Hypothesis Testing with an example!

Example

Is there enough evidence to suggest that the passage of the law requiring seatbelts decreased the mean number of monthly vehicular deaths?

  • Data set used: Seatbelts
##      DriversKilled drivers front rear  kms PetrolPrice VanKilled law
## [1,]           107    1687   867  269 9059   0.1029718        12   0
## [2,]            97    1508   825  265 7685   0.1023630         6   0
## [3,]           102    1507   806  319 9963   0.1020625        12   0
  • Test used: Two-Sample Unpaired Non-pooled T-test (Welch's t-test)
    • Samples from 2 independent populations:
      • Population 1: Before Seatbelt Law
      • Population 2: After Seatbelt Law
    • We assume the population variances are unequal (non-pooled)
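
The two samples could be built roughly as follows. This is a sketch, not necessarily the code behind these slides: the names `withoutLaw`, `withLaw`, and `Killed` are taken from the `t.test` call shown later, but how `Killed` was derived is not shown, so the definition below (drivers killed plus van drivers killed) is an assumption.

```r
# Seatbelts ships with base R (datasets package) as a multivariate time series
seatbelts <- as.data.frame(Seatbelts)

# ASSUMPTION: "Killed" is the sum of the two death counts in the data set.
# The slides do not show how this column was actually computed.
seatbelts$Killed <- seatbelts$DriversKilled + seatbelts$VanKilled

# Population 1: months before the law (law == 0)
withoutLaw <- subset(seatbelts, law == 0)
# Population 2: months after the law (law == 1)
withLaw <- subset(seatbelts, law == 1)

# Number of monthly observations in each sample
c(before = nrow(withoutLaw), after = nrow(withLaw))
```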

Raw Data

  • From eyeballing the raw data, it appears that the number of deaths decreased after the law was enacted, but how can we be sure?
    • Answer: Hypothesis testing!

Check Assumptions

  • Both samples appear to be normally distributed
  • The number of observations after the law was enacted is relatively small (23)
    • The T-Test is more robust to small sample sizes than the Z-Test, so we can continue
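
Normality is judged visually here. A numerical check such as the Shapiro-Wilk test is one possible complement; the sketch below is not part of the original analysis and again assumes `Killed` is drivers killed plus van drivers killed.

```r
seatbelts <- as.data.frame(Seatbelts)
# ASSUMPTION: definition of "Killed" is not shown in the slides
seatbelts$Killed <- seatbelts$DriversKilled + seatbelts$VanKilled

before <- seatbelts$Killed[seatbelts$law == 0]
after  <- seatbelts$Killed[seatbelts$law == 1]

# Shapiro-Wilk test: the null hypothesis is that the data are normal,
# so a small p-value suggests a departure from normality.
shapiroBefore <- shapiro.test(before)
shapiroAfter  <- shapiro.test(after)

c(before = shapiroBefore$p.value, after = shapiroAfter$p.value)
```

With only 23 observations in the second sample, the test has limited power, so it is best read alongside a histogram or Q-Q plot rather than on its own.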

Hypotheses

First, we formulate our null and alternative hypotheses: \[\begin{align*} H_0:& \: \mu_1 - \mu_2 \leq 0 \\ H_a:& \: \mu_1 - \mu_2 > 0 \end{align*}\] (\(\mu_i\) is the unknown mean of population \(i\))

These are equivalent to: \[\begin{align*} H_0:& \: \mu_1 \leq \mu_2 \\ H_a:& \: \mu_1 > \mu_2 \end{align*}\]

Test statistic

The test statistic for the Two Sample Unpaired Non-pooled T-Test is defined as: \[t = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{s_{\overline{x}_1}^2 + s_{\overline{x}_2}^2}}\] where

  • \(s_{\overline{x}_i} = \frac{s_i}{\sqrt{N_i}}\)
  • \(N_i\) is the sample size of sample \(i\)
  • \(s_i\) is the standard deviation of sample \(i\) (so \(s_i^2\) is its variance),
  • \(\overline{x}_i\) is the mean of sample \(i\).

We find that \(t = 5.9139984\).
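
The formula above can be evaluated by hand in R. This is a sketch under the assumption that `Killed` is drivers killed plus van drivers killed; if that assumption matches the original data preparation, the result should agree with the value reported above.

```r
seatbelts <- as.data.frame(Seatbelts)
# ASSUMPTION: definition of "Killed" is not shown in the slides
seatbelts$Killed <- seatbelts$DriversKilled + seatbelts$VanKilled

x1 <- seatbelts$Killed[seatbelts$law == 0]  # sample 1: before the law
x2 <- seatbelts$Killed[seatbelts$law == 1]  # sample 2: after the law

# Standard error of each sample mean: s_i / sqrt(N_i)
se1 <- sd(x1) / sqrt(length(x1))
se2 <- sd(x2) / sqrt(length(x2))

# Non-pooled (Welch) test statistic
testStatistic <- (mean(x1) - mean(x2)) / sqrt(se1^2 + se2^2)
testStatistic
```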

Degrees of Freedom

“Degrees of Freedom”, in short, is a parameter of the T-Distribution. The degrees of freedom for the Two Sample Unpaired Non-pooled T-Test is defined as: \[v = \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^2 v_1} + \frac{s_2^4}{N_2^2 v_2}}\] where

  • \(N_i\) is the sample size of sample \(i\)
  • \(s_i\) is the standard deviation of sample \(i\)
  • \(v_i\), the original degrees of freedom, is \(N_i - 1\).

We find that \(v = 30.4073931\).
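
The degrees-of-freedom formula translates to R almost term by term. As before, this is a sketch that assumes `Killed` is drivers killed plus van drivers killed.

```r
seatbelts <- as.data.frame(Seatbelts)
# ASSUMPTION: definition of "Killed" is not shown in the slides
seatbelts$Killed <- seatbelts$DriversKilled + seatbelts$VanKilled

x1 <- seatbelts$Killed[seatbelts$law == 0]
x2 <- seatbelts$Killed[seatbelts$law == 1]
n1 <- length(x1)
n2 <- length(x2)

# s_i^2 / N_i for each sample (var() returns s_i^2)
a <- var(x1) / n1
b <- var(x2) / n2

# Numerator: (s_1^2/N_1 + s_2^2/N_2)^2
# Denominator: s_1^4/(N_1^2 v_1) + s_2^4/(N_2^2 v_2), with v_i = N_i - 1
degreesOfFreedom <- (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))
degreesOfFreedom
```

Note that the result is generally not an integer; `qt` and `pt` accept fractional degrees of freedom.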

Results

We choose a confidence level of 95%, which implies our level of significance is \(\alpha = 0.05\).

We can use the inverse T-Distribution to find our critical value:

confidenceLevel <- 0.95
significanceLevel <- 1 - confidenceLevel
# Upper-tail quantile: the value with area significanceLevel to its right
criticalValue <- qt(significanceLevel, degreesOfFreedom, 
                    lower.tail = FALSE)

Thus, our critical value is 1.6965369.

Since we are performing a right-tailed T-test, we reject the null hypothesis if the test statistic is greater than the critical value. We see that our test statistic (5.9139984) is greater than the critical value (1.6965369), so we reject the null hypothesis.

P-Value

  • Another way to look at it is to compare the area under the curve to the right of the critical value (the significance level, \(\alpha\)) with the area under the curve to the right of the test statistic (the p-value).
    • P-value: The probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one we computed

We can calculate our P-Value with the following code:

p.value <- pt(testStatistic, degreesOfFreedom, lower.tail = FALSE)
  • Since our p-value (\(8.4278357\times 10^{-7}\)) is less than our significance level (\(0.05\)), we can reject the null hypothesis.

    • There is enough evidence to suggest that the enactment of the seat belt law decreased the mean number of monthly vehicular deaths.
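
The critical-value rule and the p-value rule always lead to the same decision. The short sketch below illustrates this by plugging in the test statistic and degrees of freedom reported in this document; the variable names `rejectByCriticalValue` and `rejectByPValue` are introduced here for illustration only.

```r
# Values reported earlier in this document
testStatistic    <- 5.9139984
degreesOfFreedom <- 30.4073931
significanceLevel <- 0.05

# Right-tailed test: both quantities use the upper tail of the T-Distribution
criticalValue <- qt(significanceLevel, degreesOfFreedom, lower.tail = FALSE)
p.value       <- pt(testStatistic, degreesOfFreedom, lower.tail = FALSE)

# Two equivalent decision rules
rejectByCriticalValue <- testStatistic > criticalValue
rejectByPValue        <- p.value < significanceLevel

c(criticalValue = criticalValue, p.value = p.value)
c(rejectByCriticalValue, rejectByPValue)
```

The equivalence holds because the p-value drops below \(\alpha\) exactly when the test statistic moves past the critical value.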

Confirming Results with R

We can confirm our results using the R function t.test:

res <- t.test(withoutLaw$Killed, withLaw$Killed, 
              var.equal = FALSE, alternative = "greater",
              conf.level = confidenceLevel)
## 
##  Welch Two Sample t-test
## 
## data:  withoutLaw$Killed and withLaw$Killed
## t = 5.914, df = 30.407, p-value = 8.428e-07
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  21.40882      Inf
## sample estimates:
## mean of x mean of y 
##  135.4556  105.4348

Hypothesis Test Graph

This graph shows that our test statistic lies well inside the rejection region: the area to the right of the critical value (\(\alpha = 0.05\)) is far greater than the area to the right of the test statistic (the p-value).

Code for Hypothesis Test Graph

The following code generates the graph on the previous slide:

library(ggplot2)  # provides ggplot() and the geom_*/scale_* functions

# Density of the T-Distribution with the Welch degrees of freedom from t.test
x <- seq(-2, 7, length.out = 100)
plotDf <- data.frame(x = x, dt = dt(x, res$parameter))
criticalValue <- qt(confidenceLevel, res$parameter)
graph <- ggplot(plotDf, aes(x, dt)) +
  # Shade the rejection region (everything right of the critical value)
  geom_area(data = subset(plotDf, x > criticalValue),
            aes(fill = "Rejection Region")) +
  geom_line(aes(color = "T-Distribution")) +
  geom_vline(aes(xintercept = res$statistic, color = "Test Statistic"),
             linetype = "dashed", linewidth = 2) +
  geom_vline(aes(xintercept = criticalValue, color = "Critical Value"),
             linetype = "dashed", linewidth = 2) +
  scale_fill_manual(name = "Regions", values = c("red")) +
  # Colors are matched to the line labels in alphabetical order
  scale_color_manual(name = "Lines", values = c("orange", "black", "blue")) +
  ggtitle("T-Distribution for Hypothesis Test") +
  xlab("T Score") +
  ylab("Density") +
  theme(plot.title = element_text(hjust = 0.5))