2024-10-08

Introduction to Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing is a statistical method for making decisions about population parameters based on sample data. It weighs two competing hypotheses:

  • Null hypothesis (H₀): Assumes no effect or no difference.
  • Alternative hypothesis (H₁): Suggests there is an effect or a difference.

Steps in Hypothesis Testing

  1. Formulate H₀ and H₁.
  2. Choose a significance level (α).
  3. Collect data and calculate the test statistic.
  4. Compute the p-value.
  5. Compare the p-value with α and make a decision.
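The five steps above can be traced in R on the mtcars comparison used later in this document. This is a minimal sketch: the variable names (`alpha`, `result`, `decision`) are illustrative, and it assumes a two-sided Welch test, which is what `t.test()` runs by default.

```r
# Step 1: H0: mu_auto = mu_manual vs. H1: mu_auto != mu_manual (two-sided)
# Step 2: fix the significance level before looking at the results
alpha <- 0.05

# Step 3: compute the test statistic (Welch two-sample t-test, mpg by am)
result <- t.test(mpg ~ am, data = mtcars, alternative = "two.sided")
t_stat <- unname(result$statistic)

# Step 4: the p-value is stored in the returned htest object
p_value <- result$p.value

# Step 5: compare the p-value with alpha and decide
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

cat(sprintf("t = %.4f, p = %.6f -> %s\n", t_stat, p_value, decision))
```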

What is a P-value?

The p-value is the probability of obtaining a result equal to or more extreme than the one observed, under the assumption that H₀ is true.

\(p = P(\text{data at least as extreme as those observed} \mid H_0 \text{ true})\)

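For a two-tailed test based on the two-sample t statistic, the p-value counts the probability in both tails:

\[ p = 2\,P(T \geq |t|) = P(|T| \geq |t|), \quad \text{where} \quad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Here \(\bar{x}_i\), \(s_i^2\), and \(n_i\) are the sample mean, sample variance, and sample size of group \(i\), and under H₀ the statistic \(T\) approximately follows a t distribution with Welch-Satterthwaite degrees of freedom.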
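As a check on the formula, the statistic and the two-tailed p-value can be computed by hand in base R. The degrees of freedom come from the Welch-Satterthwaite approximation, which is not written out above but is the approximation behind the `df = 18.332` that `t.test()` reports later; the helper names (`x1`, `v1`, `df_welch`, and so on) are our own.

```r
# Split mpg by transmission (am: 0 = automatic, 1 = manual)
x1 <- mtcars$mpg[mtcars$am == 0]
x2 <- mtcars$mpg[mtcars$am == 1]
n1 <- length(x1)
n2 <- length(x2)
v1 <- var(x1) / n1   # s1^2 / n1
v2 <- var(x2) / n2   # s2^2 / n2

# Welch t statistic, exactly as in the formula above
t_obs <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-tailed p-value: probability in both tails beyond |t_obs|
p_two <- 2 * pt(abs(t_obs), df = df_welch, lower.tail = FALSE)

c(t = t_obs, df = df_welch, p = p_two)
```

The three numbers should agree with the `t.test()` output shown in the t-test section.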

A small p-value means the observed data would be unlikely if H₀ were true; when the p-value falls below the chosen α, we reject H₀.
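The definition of the p-value can also be illustrated by simulation. The sketch below swaps in a different technique, a Monte Carlo permutation test: if H₀ holds in the strong sense that transmission type has no bearing on mpg, the `am` labels are exchangeable, so shuffling them many times shows how often a difference in means at least as large as the observed one arises by chance. The seed and the number of shuffles (10,000) are arbitrary choices, and the result approximates, rather than reproduces, the t-test p-value.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility

# Observed difference in mean mpg (automatic minus manual)
obs_diff <- with(mtcars, mean(mpg[am == 0]) - mean(mpg[am == 1]))

# Under H0 the transmission labels are exchangeable, so shuffle them
n_perm <- 10000
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(mtcars$am)
  mean(mtcars$mpg[shuffled == 0]) - mean(mtcars$mpg[shuffled == 1])
})

# Two-sided p-value: share of shuffles at least as extreme as observed.
# Adding 1 to numerator and denominator counts the observed labelling itself.
p_perm <- (sum(abs(perm_diffs) >= abs(obs_diff)) + 1) / (n_perm + 1)
p_perm
```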

Hypotheses for Transmission Types

Using R's built-in mtcars dataset, we test whether mean fuel economy (mpg, miles per gallon) differs between cars with automatic (am = 0) and manual (am = 1) transmissions.

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

We compare mean mpg for automatic and manual cars:

  • Null hypothesis (H₀): There is no difference in mean mpg between automatic and manual cars. \(H₀: \mu_{auto} = \mu_{manual}\)
  • Alternative hypothesis (H₁): There is a difference in mean mpg between the two transmission types. \(H₁: \mu_{auto} \neq \mu_{manual}\)
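Before testing, it helps to look at the quantities the t statistic is built from. A short base-R sketch computing the sample size, mean, and standard deviation of mpg in each group (column `0` is automatic, `1` is manual); the object name `group_stats` is illustrative.

```r
# Per-group summaries of mpg, split by transmission (am: 0 = auto, 1 = manual)
group_stats <- sapply(split(mtcars$mpg, mtcars$am),
                      function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

# Rows are n, mean, sd; columns are the two transmission groups
round(group_stats, 3)
```

The group means here are the same values that appear as the sample estimates in the t-test output.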

Data Visualization with Plotly

mpg vs wt by Transmission

library(plotly)

# Plotly scatter plot of mpg vs weight, colored by transmission (0 = auto, 1 = manual).
# Two explicit colors replace plotly's default "Set2" palette, which warns for fewer than 3 groups.
plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~as.factor(am),
        colors = c("#FF9999", "#99CCFF"),
        type = 'scatter', mode = 'markers',
        marker = list(size = 10)) %>%
  layout(title = "mpg vs Weight by Transmission Type",
         xaxis = list(title = "Weight (1000 lbs)"),
         yaxis = list(title = "Miles per Gallon (mpg)"),
         legend = list(title = list(text = "Transmission Type (0=Auto, 1=Manual)")))

Boxplot of mpg by Transmission

library(ggplot2)

# ggplot boxplot of mpg by transmission type; the x-axis already names the groups,
# so the redundant 0/1 fill legend is hidden
ggplot(mtcars, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mpg)) +
  geom_boxplot(aes(fill = factor(am)), show.legend = FALSE) +
  labs(x = "Transmission Type", y = "Miles per Gallon (mpg)",
       title = "Boxplot of mpg by Transmission") +
  theme_minimal() +
  scale_fill_manual(values = c("#FF9999", "#99CCFF"))
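Each box is drawn from its group's quartiles. Assuming ggplot2's default behavior (quartiles from `quantile()` with its default type 7), the numbers behind the boxes can be computed directly; the names `qs` and `iqr` are illustrative.

```r
# 25th, 50th and 75th percentiles of mpg for each transmission group
qs <- sapply(split(mtcars$mpg, mtcars$am), quantile, probs = c(0.25, 0.5, 0.75))
colnames(qs) <- c("Automatic", "Manual")
qs

# Box height (interquartile range); whiskers reach at most 1.5 * IQR beyond the box
iqr <- qs["75%", ] - qs["25%", ]
iqr
```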

Hypothesis Test - Two-Sample t-test

Performing the t-test

# Welch two-sample t-test of mpg by am (t.test's default, var.equal = FALSE)
t_test <- t.test(mpg ~ am, data = mtcars)
t_test
    Welch Two Sample t-test

data:  mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

We ran a Welch two-sample t-test, which does not assume equal variances in the two groups, to compare mean mpg between automatic and manual cars. Its p-value tells us whether to reject or fail to reject the null hypothesis.
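Because the test is two-sided, the same decision can be read off the confidence interval: H₀ is rejected at level α exactly when the (1 - α) interval for μ_auto - μ_manual excludes 0. A small sketch checking that both rules agree for this data; the object names are illustrative.

```r
alpha <- 0.05

# Re-run the Welch test with a confidence level matching alpha
t_test <- t.test(mpg ~ am, data = mtcars, conf.level = 1 - alpha)

# Rule 1: does the interval for mu_auto - mu_manual exclude 0?
ci <- t_test$conf.int
zero_outside_ci <- ci[1] > 0 || ci[2] < 0

# Rule 2: is the p-value below alpha?
reject_by_p <- t_test$p.value < alpha

c(zero_outside_ci = zero_outside_ci, reject_by_p = reject_by_p)
```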

Results of Hypothesis Test

If the p-value is less than our significance level (α = 0.05), we reject the null hypothesis and conclude that mean mpg differs between automatic and manual cars.

  • Test Statistic: t = -3.77
  • Degrees of Freedom: 18.33 (Welch approximation)
  • P-value: 0.0014
  • Difference in means (automatic - manual): -7.24 mpg, 95% CI [-11.28, -3.21]

Because the p-value (0.0014) is below 0.05, we reject the null hypothesis. In this sample, manual cars average about 7.2 mpg more than automatic cars (24.39 vs 17.15 mpg).

Plot of the Distribution

library(ggplot2)

# ggplot density plot of mpg by transmission, with readable legend labels
ggplot(mtcars, aes(x = mpg, fill = factor(am))) +
  geom_density(alpha = 0.5) +
  labs(x = "Miles per Gallon (mpg)", y = "Density",
       title = "Density Plot of mpg by Transmission") +
  scale_fill_manual(values = c("#FF9999", "#99CCFF"), name = "Transmission",
                    labels = c("Automatic", "Manual")) +
  theme_minimal()

Conclusion

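Hypothesis testing gives a principled way to judge whether an observed difference is larger than chance variation alone would plausibly produce. By computing a p-value and comparing it with a significance level chosen in advance, we decide whether to reject the null hypothesis; for mtcars, the Welch t-test indicates that mean mpg differs between automatic and manual cars.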