March 17, 2024

Introduction to p-values and hypothesis testing

\[ H_0: \mu = 0 \quad \text{(Null Hypothesis)} \]
\[ H_1: \mu < 0 \quad \text{(Alternative Hypothesis)} \]

Null Hypothesis: Represents the default assumption that there is no effect or no difference.

  • Example: There is no difference in sepal width between different species of flower

Alternative Hypothesis: States what the study hopes to demonstrate, usually that a treatment has an effect or that two groups are different.

  • Example: Flowers of different species have different sepal widths

Statistical Significance

The significance level \(\alpha\) is the probability of rejecting the null hypothesis when it is actually true (a false positive, or Type I error).

\(\alpha = 0.05\) means there is a 5% risk of concluding that a difference in sepal width exists across different flower species when there is no actual difference.
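
The meaning of \(\alpha\) as a false-positive rate can be checked with a small simulation. The sketch below is illustrative only: the group size, mean, standard deviation, seed, and number of simulations are all assumed values, not taken from the iris data. Both groups are drawn from the same distribution, so every rejection is a false positive, and the share of rejections should land close to 0.05.

```r
# Hypothetical simulation: both groups share the same true mean, so H0 holds
set.seed(42)          # arbitrary seed, for reproducibility
alpha <- 0.05
n_sims <- 2000        # number of simulated experiments (assumed)

p_values <- replicate(n_sims, {
    group_a <- rnorm(50, mean = 3, sd = 0.4)
    group_b <- rnorm(50, mean = 3, sd = 0.4)
    t.test(group_a, group_b)$p.value
})

# Share of experiments that wrongly reject H0
false_positive_rate <- mean(p_values < alpha)
print(false_positive_rate)
```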

A p-value below the significance level allows the null hypothesis to be rejected in favor of the alternative hypothesis.

When \(p < \alpha\), we reject the null hypothesis: a difference this large would be unlikely if there were no real effect.
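
As a minimal sketch of this decision rule, the one-sided hypotheses from the start of the document (\(H_0: \mu = 0\) against \(H_1: \mu < 0\)) can be tested with a one-sample t-test. The sample here is simulated, and its true mean of -0.5, its size, and the seed are assumptions made purely for illustration.

```r
# Hypothetical sample; the true mean of -0.5 is an assumption for illustration
set.seed(1)
alpha <- 0.05
x <- rnorm(30, mean = -0.5, sd = 1)

# One-sided, one-sample t-test of H0: mu = 0 against H1: mu < 0
test <- t.test(x, mu = 0, alternative = "less")
p_value <- test$p.value

# Decision rule: reject H0 only when p < alpha
reject_h0 <- p_value < alpha
print(p_value)
print(reject_h0)
```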

Iris data set

data("iris")
species_table <- table(iris$Species)
print(species_table)
    setosa versicolor  virginica 
        50         50         50 

A null hypothesis and an alternative hypothesis have been declared, and the significance level has been set to 0.05.

Next, a statistical test will be used to obtain a p-value.

A t-test is typically used to compare two groups; however, the data set has more than two species. Instead of performing multiple t-tests, which would inflate the overall chance of a false positive, an ANOVA will be used.
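
To see why running several t-tests is a problem, the chance of at least one false positive across all pairwise comparisons can be approximated. This sketch treats the three tests as independent, which is a simplifying assumption (the pairwise tests share data), so the result is a rough guide rather than an exact rate.

```r
alpha <- 0.05
n_groups <- 3                      # setosa, versicolor, virginica
n_pairs <- choose(n_groups, 2)     # number of pairwise t-tests: 3

# Chance of at least one false positive across all pairwise tests,
# treating the tests as independent (a simplifying assumption)
familywise_error <- 1 - (1 - alpha)^n_pairs
print(familywise_error)            # 0.142625
```

Under that assumption the overall risk is roughly 14%, well above the intended 5%.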

\(H_0\): there is no difference in sepal width between species of Iris

\(H_1\): there is a difference in sepal width between species of Iris

\(\alpha = 0.05\)

Visualizing sepal width across species

library(ggplot2)

# Boxplot of sepal width, one box per species
ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
    geom_boxplot() +
    labs(title = "Sepal Width Across Species", x = "Species", y = "Sepal Width") +
    scale_fill_manual(values = c("cornflowerblue", "goldenrod2", "blueviolet"))

Scatterplot

3D plot of Iris characteristics

library(plotly)

# 3D scatter of three measurements, one colour per species
plot_ly(data = iris, x = ~Sepal.Width, y = ~Sepal.Length, z = ~Petal.Length,
    color = ~Species, type = "scatter3d", mode = "markers", marker = list(size = 5)) %>%
    layout(scene = list(xaxis = list(title = "Sepal Width"),
        yaxis = list(title = "Sepal Length"), zaxis = list(title = "Petal Length"),
        aspectmode = "cube"),
        # Species is a factor, so plotly draws a legend rather than a colorbar
        legend = list(title = list(text = "Species")))

ANOVA

# Fit the one-way ANOVA once and reuse the fitted model
anova_result <- aov(Sepal.Width ~ Species, data = iris)
summary(anova_result)
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  11.35   5.672   49.16 <2e-16 ***
Residuals   147  16.96   0.115                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is less than \(2 \times 10^{-16}\), which is far below \(\alpha = 0.05\).

This tells us that at least one species has a mean sepal width that differs significantly from the others, but not which one.
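
The printed table only reports the p-value as a bound. As a sketch, the numeric value can be pulled out of the summary object and compared with \(\alpha\) directly. The variable names `anova_table`, `f_value`, `p_value`, and `p_from_f` are introduced here for illustration, and the degrees of freedom are copied from the table above.

```r
alpha <- 0.05
anova_result <- aov(Sepal.Width ~ Species, data = iris)

# summary() returns a list whose first element is the ANOVA table
anova_table <- summary(anova_result)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# The same p-value, recomputed from the upper tail of the F distribution
p_from_f <- pf(f_value, df1 = 2, df2 = 147, lower.tail = FALSE)

print(p_value < alpha)   # TRUE
```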

To find out which specific pairs of groups differ from each other, Tukey’s honest significant difference (HSD) test will be used.

Tukey’s Honest Significant Difference

tukey_result <- TukeyHSD(anova_result)

print(tukey_result)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sepal.Width ~ Species, data = iris)

$Species
                       diff         lwr        upr     p adj
versicolor-setosa    -0.658 -0.81885528 -0.4971447 0.0000000
virginica-setosa     -0.454 -0.61485528 -0.2931447 0.0000000
virginica-versicolor  0.204  0.04314472  0.3648553 0.0087802

From this we can see that each of the adjusted p-values for the pairwise comparisons is smaller than \(\alpha = 0.05\), so every pair of iris species differs significantly in mean sepal width.

Therefore, we can reject the null hypothesis that there is no difference in sepal width across different iris species.
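
The comparison against \(\alpha\) can also be done programmatically rather than by reading the table. This is a minimal sketch: `adjusted_p` and `group_means` are names introduced here, and the last lines illustrate that the `diff` column is the difference between the two group means.

```r
alpha <- 0.05
anova_result <- aov(Sepal.Width ~ Species, data = iris)
tukey_result <- TukeyHSD(anova_result)

# The Species element is a matrix with columns diff, lwr, upr and p adj
adjusted_p <- tukey_result$Species[, "p adj"]
print(adjusted_p < alpha)

# The diff column is the difference between the group means
group_means <- tapply(iris$Sepal.Width, iris$Species, mean)
print(group_means["versicolor"] - group_means["setosa"])
```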