DAT301: Hypothesis Testing

2024-09-21

Hypothesis Testing

Hypothesis testing is a statistical method for making decisions about a population parameter based on sample data. It helps assess the strength of the evidence for or against a claim.

Null Hypothesis (\(H_0\)): The default assumption, typically that there is no effect or no difference.
Alternative Hypothesis (\(H_A\)): The claim that challenges the null.

The Steps of Hypothesis Testing

Step 1: Formulate \(H_0\) and \(H_A\).
Step 2: Choose a significance level (\(\alpha\)).
Step 3: Perform the test and calculate the p-value.
Step 4: Compare the p-value with \(\alpha\).
Step 5: Reject \(H_0\) if the p-value is less than \(\alpha\); otherwise, fail to reject \(H_0\).
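
The five steps map onto a few lines of R. The sketch below is illustrative only: the sample is simulated with rnorm(), and the hypothesized mean of 50 and the helper name decide() are assumptions made for this example rather than part of the lecture.

```r
# Step 1: H0: mu = 50, HA: mu != 50 (hypothetical values for illustration)
mu0 <- 50

# Step 2: choose the significance level
alpha <- 0.05

# Simulated sample (an assumption; any numeric vector would do)
set.seed(301)
x <- rnorm(30, mean = 52, sd = 5)

# Step 3: perform the test and extract the p-value
result <- t.test(x, mu = mu0)
p_value <- result$p.value

# Steps 4 and 5: compare the p-value with alpha and state the decision
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Fail to reject H0"
}

decide(p_value, alpha)
```

Keeping the decision rule in a small function makes it explicit that \(\alpha\) is fixed before the data are examined.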

Example: T-Test

We will perform a one-sample t-test to see whether heavier cars (those above the median weight in the mtcars data set) have a mean MPG below 20.

Null Hypothesis:

\(H_0\): Heavier cars have the same average MPG as the population (\(\mu = 20\)).

Alternative Hypothesis:

\(H_A\): Heavier cars have a lower average MPG than the population (\(\mu < 20\)).

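## Subsetting the built-in mtcars data set to cars above the median weight
heavy_cars <- subset(mtcars, wt > median(mtcars$wt))

## Performing a one-sample t-test against mu = 20 (two-sided by default)
t.test(heavy_cars$mpg, mu = 20)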

## 
##  One Sample t-test
## 
## data:  heavy_cars$mpg
## t = -6.3224, df = 15, p-value = 1.369e-05
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  14.20857 17.12893
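## sample estimates:
## mean of x 
##  15.66875

At a significance level of \(\alpha = 0.05\), the p-value of 1.369e-05 is far below \(\alpha\), so we reject \(H_0\). Note that t.test() ran a two-sided test here (the output reads "not equal to 20"). Because the t statistic is negative, the one-sided p-value for \(H_A\) is half the reported value, so the conclusion is unchanged.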
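
To match the directional \(H_A\) directly, the same test can be run one-sided. This is a minimal sketch, assuming a significance level of 0.05; the names one_sided, two_sided, and alpha are introduced here for illustration.

```r
# Same subset as in the example: cars heavier than the median weight
heavy_cars <- subset(mtcars, wt > median(mtcars$wt))

# One-sided test: HA is that the true mean MPG is less than 20
one_sided <- t.test(heavy_cars$mpg, mu = 20, alternative = "less")

# Two-sided test, as run in the example, for comparison
two_sided <- t.test(heavy_cars$mpg, mu = 20)

alpha <- 0.05  # assumed significance level

# The one-sided p-value is half the two-sided one because t is negative
one_sided$p.value
two_sided$p.value / 2

# TRUE means we reject H0 at the chosen alpha
one_sided$p.value < alpha
```

The alternative argument of t.test() accepts "two.sided" (the default), "less", or "greater", so the direction of \(H_A\) should be decided before running the test.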

Car Weight vs MPG

Having performed the t-test, we will next visualize the relationship between car weight and fuel efficiency using a scatter plot with a linear trend line.

An additional pair of hypotheses we could test:

\(H_0\): Car weight has no effect on MPG (fuel efficiency).

\(H_A\): Heavier cars have lower MPG than lighter cars.

[Figure: scatter plot of car weight against MPG with a linear trend line]
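
One way to test these hypotheses formally, rather than only reading the plot, is a simple linear regression of mpg on wt. The sketch below is an assumption about how the analysis could be done, not the lecture's own code. The ggplot2 call is a guess based on the description of a scatter plot with a linear trend line, and it is skipped if ggplot2 is not installed.

```r
# Fit mpg as a linear function of weight (wt is in units of 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)

# Slope estimate and its two-sided p-value for H0: slope = 0
coefs <- coef(summary(fit))
slope <- coefs["wt", "Estimate"]
p_two_sided <- coefs["wt", "Pr(>|t|)"]

# HA is directional (slope < 0), so halve the p-value when the slope is negative
p_one_sided <- if (slope < 0) p_two_sided / 2 else 1 - p_two_sided / 2

# Scatter plot with a linear trend line (requires the ggplot2 package)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p)
}
```

A negative slope with a small p_one_sided would count as evidence against \(H_0\). Because mtcars is observational data, it shows an association rather than a causal effect of weight.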

3D Scatter Plot

In the following 3D scatter plot, we will visualize the relationship between car weight, MPG, and quarter-mile time (qsec).

Hypotheses:

\(H_0\): There is no relationship between car weight, MPG, and quarter-mile time.

\(H_A\): Heavier cars have lower MPG and longer quarter-mile times.
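
These hypotheses involve three variables, so one simple way to examine them is to split \(H_A\) into two pairwise claims and test each with a one-sided correlation test. This is a sketch under that simplifying assumption. The plotly call is likewise an assumption, since the lecture's plotting code is not shown, and it is skipped if plotly is not installed.

```r
# Pairwise Pearson correlation tests against H0: correlation = 0

# HA for weight vs MPG: negative correlation
wt_mpg <- cor.test(mtcars$wt, mtcars$mpg, alternative = "less")

# HA for weight vs quarter-mile time: positive correlation
wt_qsec <- cor.test(mtcars$wt, mtcars$qsec, alternative = "greater")

# p-values for the two one-sided tests
c(wt_mpg = wt_mpg$p.value, wt_qsec = wt_qsec$p.value)

# Optional interactive 3D scatter plot (requires the plotly package);
# type `fig` at the console to display it
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(mtcars, x = ~wt, y = ~mpg, z = ~qsec,
                         type = "scatter3d", mode = "markers")
}
```

Since two tests are run, each p-value should be read with some caution. A stricter threshold such as \(\alpha / 2\) per test is one common adjustment.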

Box Plot

Lastly, in the following plot, we compare the fuel efficiency (mpg) of cars with different cylinder counts.

Hypotheses:

\(H_0\): There is no difference in mean MPG among cars with 4, 6, or 8 cylinders.

\(H_A\): Cars with 6 or 8 cylinders have lower mean MPG than those with 4 cylinders.
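
A box plot shows the group differences visually, and a one-way ANOVA is one common way to test \(H_0\). The sketch below is an assumption about how this could be done, and the names mt, anova_fit, and anova_p are illustrative. ANOVA only tests whether any group mean differs, so a follow-up such as Tukey's HSD is needed to check the direction claimed in \(H_A\).

```r
# Treat cylinder count as a categorical grouping variable
mt <- transform(mtcars, cyl = factor(cyl))

# Box plot of MPG by cylinder count
boxplot(mpg ~ cyl, data = mt,
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# One-way ANOVA: H0 is that all three group means are equal
anova_fit <- aov(mpg ~ cyl, data = mt)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]
anova_p

# Follow-up pairwise comparisons to see which groups differ, and in which direction
TukeyHSD(anova_fit)
```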

Recap: Steps for Hypothesis Testing

Define the hypotheses (\(H_0\) and \(H_A\)).
Choose \(\alpha\) (e.g., 0.05).
Collect and analyze sample data.
Calculate the p-value and compare it to \(\alpha\).
Reject or fail to reject \(H_0\) based on that comparison.

Conclusion

Hypothesis testing is an essential tool in statistics that allows us to make informed decisions based on sample data.

Provides a structured approach for decision-making.
Different tests are used for different data types (e.g., t-test, chi-square test).
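
As a brief illustration of matching the test to the data type, the sketch below applies a t-test to a numeric variable and a chi-square test of independence to two categorical variables. The choice of the mtcars columns am (transmission) and vs (engine shape) is an assumption made for this example only.

```r
# Numeric outcome compared with a reference value: one-sample t-test
t_res <- t.test(mtcars$mpg, mu = 20)

# Two categorical variables: chi-square test of independence
# (am and vs are both coded 0/1 in mtcars)
tab <- table(am = mtcars$am, vs = mtcars$vs)
chi_res <- chisq.test(tab)

# p-values from the two tests
c(t_test = t_res$p.value, chi_square = chi_res$p.value)
```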

Through this method, we can test assumptions about populations rigorously, with the false-positive rate controlled by the chosen \(\alpha\).