2024-06-09

Hypothesis Testing

  • Type of statistical analysis (analyzing data to find patterns and trends)
  • Test your assumptions about certain parameters
  • Used to show or estimate a relationship between two statistical parameters
  • Determine if there is enough evidence to conclude a certain condition is true

Example

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

A 1970s car enthusiast believes that the number of cylinders affects a vehicle's speed, measured by its quarter-mile time (qsec). To support his hypothesis, he plots the data to look for a trend.

Cylinder vs Mile Time data

The data do show a trend: as the number of cylinders increases, the average time it takes to travel 1/4 mile decreases, which supports his argument. No firm conclusion can be drawn yet, as more testing is required.
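The trend can also be checked numerically. The following is a minimal sketch, assuming the built-in mtcars data and base R's aggregate(); the names mean_qsec and decreasing_trend are chosen here for illustration.

```r
# Average quarter-mile time (qsec) for each cylinder count in mtcars
mean_qsec <- aggregate(qsec ~ cyl, data = mtcars, FUN = mean)
print(mean_qsec)

# TRUE if the average time falls as the cylinder count rises
# (aggregate() returns the groups sorted by cyl)
decreasing_trend <- all(diff(mean_qsec$qsec) < 0)
print(decreasing_trend)
```

A group average only describes the sample; it does not say whether the difference is larger than chance would produce.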

Example 2

The car enthusiast makes another assumption: the more a car weighs, the fewer miles per gallon it gets.

## `geom_smooth()` using formula = 'y ~ x'
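One way to put a number on the weight and mileage relationship is a correlation test. This is a sketch using base R's cor.test() on mtcars, not necessarily the method behind the original plot; the name wt_mpg_test is hypothetical.

```r
# Pearson correlation test between weight (wt) and fuel economy (mpg)
# H0: the true correlation is 0; H1: it is not 0
wt_mpg_test <- cor.test(mtcars$wt, mtcars$mpg)

# Sample correlation; a negative value means heavier cars get lower mpg
print(wt_mpg_test$estimate)

# p-value of the test
print(wt_mpg_test$p.value)
```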

However, focusing on only two variables can hide the bigger picture.

Hypothesis Testing Calculation

model: \(z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})\)

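This calculation gives the z-score, which is used to decide whether to reject the null hypothesis, the assumption that there is no effect (the event won’t occur). Here \(\bar{x}\) is the sample mean, \(\mu_0\) is the population mean under the null hypothesis, \(\sigma\) is the population standard deviation, and \(n\) is the sample size.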
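As a worked instance of the formula, here is a small sketch in base R. The function name z_score and the numbers passed to it are hypothetical, made up for illustration; the two-sided p-value is one common choice of alternative, assumed here.

```r
# z-score for a one-sample z-test
# x_bar: sample mean, mu0: mean under the null hypothesis,
# sigma: known population standard deviation, n: sample size
z_score <- function(x_bar, mu0, sigma, n) {
  (x_bar - mu0) / (sigma / sqrt(n))
}

# Hypothetical numbers, for illustration only
z <- z_score(x_bar = 52, mu0 = 50, sigma = 10, n = 25)
print(z)  # (52 - 50) / (10 / 5) = 1

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))
print(p_value)
```

The formula assumes the population standard deviation is known; when it has to be estimated from the sample, a t-test is the usual substitute.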

Understanding Hypothesis Calculation

  • Hypothesis calculation (HC) uses a sample of data to draw conclusions about a trend in an entire population.

  • HC is the entire process of shaping a hypothesis and preparing the appropriate conditions for the test.

  • It runs from formulating the hypothesis, to collecting data, to choosing the significance level and weighing it against the computed p-value.
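The steps above can be sketched end to end on the cylinder example. This is an illustration only: it swaps in a two-sample t-test (base R's t.test()) instead of the z-score, because the population standard deviation is not known for mtcars, and the 5% significance level is an assumed value.

```r
# 1. Formulate: H0 = 4- and 8-cylinder cars have the same mean quarter-mile time
# 2. Collect: take the two groups from mtcars
cars_4_8 <- subset(mtcars, cyl %in% c(4, 8))

# 3. Choose the significance level before looking at the result
alpha <- 0.05  # assumed value

# 4. Run the test (t.test() uses the Welch two-sample test by default)
result <- t.test(qsec ~ cyl, data = cars_4_8)

# 5. Weigh the p-value against alpha
reject_null <- result$p.value <= alpha
print(result$p.value)
print(reject_null)
```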

Significance Level and P-Value

The significance level \((\alpha)\) must be chosen before the test is run. It is the probability of rejecting the null hypothesis even though it is true, and it usually ranges between 1% and 10%.

The p-value measures the strength of the evidence gathered against the null hypothesis: the smaller it is, the stronger the evidence. If \(\text{p-value} \le \alpha\), then the null hypothesis is rejected, showing support for the alternative hypothesis. If \(\text{p-value} > \alpha\), then the null hypothesis is not rejected, showing that there is not enough evidence in favor of the alternative hypothesis.
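To see what the significance level means in practice, the sketch below simulates many samples for which the null hypothesis is true and counts how often the decision rule rejects it anyway. The seed, sample size, number of simulations, and \(\alpha = 0.05\) are all arbitrary choices made for this illustration.

```r
set.seed(42)    # arbitrary seed so the run is repeatable
alpha <- 0.05   # assumed significance level
n_sims <- 2000  # number of simulated experiments

# Each simulated sample is drawn with the null hypothesis (mean = 0) true
p_values <- replicate(
  n_sims,
  t.test(rnorm(30, mean = 0, sd = 1), mu = 0)$p.value
)

# Decision rule from the text: reject H0 when p-value <= alpha
rejected <- p_values <= alpha

# Share of true null hypotheses rejected by mistake; should be close to alpha
false_rejection_rate <- mean(rejected)
print(false_rejection_rate)
```

With a larger \(\alpha\) the rule rejects more often, which raises the chance of rejecting a null hypothesis that is true.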

Multiple variables

The car enthusiast wants to see the bigger picture and brings multiple variables together, comparing weight, miles per gallon, gross horsepower, and whether the engine is V-shaped or straight.

R Code For Last Example

library(ggplot2)
# Weight vs. mpg; color shows engine shape (vs), point size shows horsepower
g <- ggplot(data = mtcars, aes(wt, mpg)) +
  geom_point(aes(color = vs, size = hp))
g + xlab("Weight") + ylab("Miles per Gallon") + scale_size_continuous("HorsePower")

ggplot2 is a handy tool for creating static plots, which plotly can then turn into interactive ones.
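Beyond the plot, the multi-variable comparison can also be tested numerically. A multiple linear regression with base R's lm() is one option, shown here as a sketch rather than as the method used in these notes; the names fit, coef_table, and wt_p_value are chosen for illustration.

```r
# Linear model: mpg explained by weight, horsepower, and engine shape
fit <- lm(mpg ~ wt + hp + vs, data = mtcars)

# Each row tests H0: "this coefficient is 0, given the other variables"
coef_table <- summary(fit)$coefficients
print(coef_table)

# p-value for weight, taken from the "Pr(>|t|)" column
wt_p_value <- coef_table["wt", "Pr(>|t|)"]
print(wt_p_value)
```

Each p-value in the table can be weighed against the chosen significance level in the same way as for a single test.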