2025-02-06

Hypothesis Testing Basics

Hypothesis Testing is the process by which a statistician uses sample data to judge whether there is enough evidence to support a particular hypothesis about a population.

The starting point for hypothesis testing is deciding which assumption you would like to test. You then state both a Null Hypothesis and an Alternative Hypothesis.

Your Null Hypothesis is your baseline: it states that there is no relationship between the variables you are testing. The Alternative Hypothesis is the opposite, stating that there is a significant relationship between the same variables.

Sepal Length and Width by Petal Length

R Code to Create Plotly Plot

To create that 3D scatter plot in R, we would use the following code.

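library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# 3D scatter of the built-in iris data, colored by species.
# Named `fig` rather than `plot` to avoid masking base R's plot().
fig <- plot_ly(iris, 
               x = ~Sepal.Length, 
               y = ~Sepal.Width, 
               z = ~Petal.Length, 
               color = ~Species, 
               type = "scatter3d", 
               mode = "markers")

# Add a title and readable axis labels.
fig <- fig %>% layout(title = "Sepal Length, Width vs Petal Length",
                      scene = list(
                        xaxis = list(title = "Sepal Length"),
                        yaxis = list(title = "Sepal Width"),
                        zaxis = list(title = "Petal Length")
                      ))

fig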

Hypothesis Testing on the Iris Dataset

With the information in this plot, we can test some hypotheses about the iris dataset.

For example, our null hypothesis could be that the mean sepal lengths of the Setosa and Versicolor species are the same. Our alternative hypothesis could be that the mean sepal length of the Setosa species is higher than that of the Versicolor species.

We can use the following formula for the T-Test statistic, which compares the means of the two groups, to test our hypothesis. Here \(\bar{X_1}\) and \(\bar{X_2}\) are the sample means, \(s_1^2\) and \(s_2^2\) the sample variances, and \(n_1\) and \(n_2\) the sample sizes of the two groups.

\[ t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
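As a minimal sketch of the formula above, the statistic can be computed by hand on the built-in iris data and compared with R's `t.test()`. The variable names (`setosa`, `versicolor`, `se_diff`, `t_manual`, `t_builtin`) are chosen here for illustration and are not from the slides. The comparison assumes `t.test()` is left at its default, which does not pool the variances and so matches this formula.

```r
# Sepal lengths for the two species being compared (built-in iris data).
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# Denominator of the formula: standard error of the difference in means.
se_diff <- sqrt(var(setosa) / length(setosa) +
                var(versicolor) / length(versicolor))

# Numerator over denominator: the t statistic.
t_manual <- (mean(setosa) - mean(versicolor)) / se_diff

# Cross-check against R's built-in (unpooled-variance) t-test.
t_builtin <- unname(t.test(setosa, versicolor)$statistic)

t_manual
all.equal(t_manual, t_builtin)
```

The sign of the statistic follows the order of the groups: a negative value means the first group's sample mean is below the second's.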

P-Value – Significance

Once we have calculated our p-value using the T-Test, we compare it with the level of significance we've chosen, e.g., \(\alpha = 0.05\). If our p-value \((p)\) is less than that level: \[ p < \alpha \] then we can reject our null hypothesis, showing that there is statistical significance.

However, if our p-value \((p)\) is greater than or equal to the level of significance we've chosen: \[ p \geq \alpha \] then we fail to reject our null hypothesis, showing there is NOT statistical significance. Note that this is not the same as proving the null hypothesis true; it only means the data do not give enough evidence against it.
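The decision rule can be sketched in R for the sepal length example. This is one possible setup, not necessarily the one used to prepare the slides: it assumes a one-sided test via `alternative = "greater"` to match the stated alternative (Setosa mean higher than Versicolor), and the names `alpha`, `result`, and `decision` are illustrative.

```r
alpha <- 0.05  # chosen significance level

setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# One-sided t-test: the alternative is mean(setosa) > mean(versicolor).
result <- t.test(setosa, versicolor, alternative = "greater")

# Apply the rule: reject only when p < alpha.
decision <- if (result$p.value < alpha) {
  "reject the null hypothesis"
} else {
  "fail to reject the null hypothesis"
}

result$p.value
decision
```

Because the sample mean for Setosa is lower than for Versicolor in iris, this particular one-sided test should not reject the null hypothesis. The direction of the alternative matters as much as the size of the difference.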

Sepal Length vs Sepal Width by Species

This plot gives us another view of the data that we were looking at before when examining the process of hypothesis testing.

Petal Length vs Petal Width by Species

Petal Length vs Width Hypothesis

Splitting the plot into a separate area for each species suggests different ideas for hypothesis testing. For example, we can see that for every species the slope of the regression line appears to be positive.

With this in mind, for each species our null hypothesis can be that the slope of the regression line is 0, and our alternative hypothesis can be that the slope of the regression line is not 0. We will stick with the significance level of 0.05.
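Written out under the usual simple linear regression model, the hypotheses for each species look as follows. The symbols \(\beta_0\), \(\beta_1\), and \(\varepsilon\) are notation assumed here for illustration, not taken from the slides. \[ \text{Petal.Width} = \beta_0 + \beta_1 \, \text{Petal.Length} + \varepsilon, \qquad H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0 \] The test statistic is the estimated slope divided by its standard error: \[ t = \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)} \] Under the null hypothesis and the standard model assumptions, this is compared against a t distribution with \(n - 2\) degrees of freedom, where \(n\) is the number of flowers in that species.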

Regression Analysis

We can use R to perform the linear regression for each species.

The resulting P values for the slope are as follows. \[ P \text{ for Setosa is } 0.0186 \\ P \text{ for Versicolor is } 1.27 \times 10^{-11} \\ P \text{ for Virginica is } 0.02254 \] Since all of these P values are below our significance level of 0.05, we can reject our Null Hypothesis for each species.

Linear Regression Code

This is the R code used to perform the linear regression on the previous slide.

# Fit Petal.Width as a linear function of Petal.Length, one model per species.
setosa_lm <- lm(Petal.Width ~ Petal.Length,
                data = subset(iris, Species == "setosa"))
versicolor_lm <- lm(Petal.Width ~ Petal.Length,
                    data = subset(iris, Species == "versicolor"))
virginica_lm <- lm(Petal.Width ~ Petal.Length,
                   data = subset(iris, Species == "virginica"))

# The slope's P value is in the Petal.Length row, Pr(>|t|) column.
summary(setosa_lm)
summary(versicolor_lm)
summary(virginica_lm)
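Rather than reading each P value off the printed summaries, the slope P values can be pulled out programmatically. The following is a small sketch of one way to do that; the helper name `slope_p_value` is hypothetical and introduced only for this example.

```r
# Fit Petal.Width ~ Petal.Length for one species and return the
# P value of the slope coefficient. (Hypothetical helper.)
slope_p_value <- function(species_name) {
  fit <- lm(Petal.Width ~ Petal.Length,
            data = subset(iris, Species == species_name))
  # Row "Petal.Length" is the slope; column "Pr(>|t|)" is its P value.
  summary(fit)$coefficients["Petal.Length", "Pr(>|t|)"]
}

# One P value per species, named by species.
p_values <- sapply(levels(iris$Species), slope_p_value)

p_values
p_values < 0.05  # TRUE where the slope is significant at the 0.05 level
```

Collecting the values in a named vector makes it easy to compare all three species against the same significance level at once.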

Thank you for your time.

Thank you for taking the time to read through my slides. I hope you learned something cool!