Understanding Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing is a statistical method used to determine whether there is enough evidence in a sample to support a claim about a population.

It is commonly used in science, engineering, business, and medicine to make data-driven decisions.

Why is Hypothesis Testing Important?

Hypothesis testing helps answer questions such as:

Is a new teaching method more effective?
Is the average product lifetime different from what is claimed?
Did a treatment actually improve results?

It allows researchers to draw conclusions that are backed by statistical evidence.

Null and Alternative Hypotheses

In hypothesis testing, we begin with two competing statements.

Null hypothesis:

\[ H_0 : \mu = 75 \]

Alternative hypothesis:

\[ H_a : \mu \neq 75 \]

Where:

\(H_0\) = null hypothesis
\(H_a\) = alternative hypothesis
\(\mu\) = population mean

Test Statistic Formula

A commonly used test statistic for a population mean is the z-statistic.

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Where:

\(\bar{x}\) = sample mean
\(\mu_0\) = population mean under the null hypothesis (75 in this example)
\(\sigma\) = population standard deviation
\(n\) = sample size

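This statistic measures how many standard errors the sample mean lies from the hypothesized population mean. When \(\sigma\) is unknown, it is replaced by the sample standard deviation \(s\), and the statistic follows a t distribution with \(n - 1\) degrees of freedom.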
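As a minimal sketch, the formula can be evaluated directly in R. The helper name `z_statistic` and the input values are illustrative assumptions, not taken from the example data; in particular, a known population standard deviation of 10 is assumed.

```r
# Hypothetical helper: z-statistic for a one-sample test of a mean.
z_statistic <- function(x_bar, mu_0, sigma, n) {
  (x_bar - mu_0) / (sigma / sqrt(n))
}

# Illustrative inputs (assumed values, not from the example data).
z <- z_statistic(x_bar = 78, mu_0 = 75, sigma = 10, n = 100)

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

z        # 3: the sample mean is three standard errors above 75
p_value
```

A two-sided p-value is used here because the alternative hypothesis is "not equal to 75" rather than "greater than" or "less than".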

Example Data

Suppose we want to test whether the average exam score is different from 75.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   54.91   73.06   78.62   78.90   84.92   99.87

This dataset consists of 100 simulated exam scores. The summary above lists their minimum, quartiles, median, mean, and maximum.
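The code that generated the scores is not shown. One way to simulate data of this kind is sketched below. The seed, mean, and standard deviation are assumptions chosen to give values in a similar range, so the output need not match the summary above exactly.

```r
# Assumed simulation settings; the original seed and parameters are not given.
set.seed(123)
scores <- rnorm(n = 100, mean = 78, sd = 10)

# Minimum, quartiles, median, mean, and maximum of the simulated scores.
summary(scores)
```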

Histogram of Scores (ggplot)

This histogram shows the distribution of simulated exam scores.
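A histogram like this could be drawn with ggplot2 roughly as follows. The simulation settings, bin width, colors, and the dashed reference line at 75 are assumptions made for illustration, not the settings of the original plot.

```r
library(ggplot2)

# Assumed simulation settings; the original data-generating code is not shown.
set.seed(123)
scores_df <- data.frame(score = rnorm(100, mean = 78, sd = 10))

# Histogram with a dashed vertical line at the hypothesized mean of 75.
p_hist <- ggplot(scores_df, aes(x = score)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
  geom_vline(xintercept = 75, linetype = "dashed") +
  labs(title = "Histogram of Scores", x = "Exam score", y = "Count")

p_hist
```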

Boxplot of Scores (ggplot)

The boxplot shows the median, quartiles, and possible outliers in the data.
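A comparable boxplot can be sketched with ggplot2 as below. The simulated data and the styling are assumptions. `boxplot.stats()` from base R is used only to list the points that fall beyond the whiskers.

```r
library(ggplot2)

# Assumed simulation settings; the original data-generating code is not shown.
set.seed(123)
scores_df <- data.frame(score = rnorm(100, mean = 78, sd = 10))

# Single boxplot of all scores on the y-axis.
p_box <- ggplot(scores_df, aes(y = score)) +
  geom_boxplot(fill = "lightgray") +
  labs(title = "Boxplot of Scores", y = "Exam score")

p_box

# Values more than 1.5 * IQR beyond the quartiles (may be empty).
outliers <- boxplot.stats(scores_df$score)$out
outliers
```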

Interactive Density Plot (Plotly)
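The interactive plot itself is not reproduced in this text. A rough sketch of how such a density plot might be built with plotly follows. The simulated data, the use of base R `density()` for the kernel estimate, and the styling are all assumptions.

```r
library(plotly)

# Assumed simulation settings; the original data-generating code is not shown.
set.seed(123)
scores <- rnorm(100, mean = 78, sd = 10)

# Kernel density estimate computed in base R.
dens <- density(scores)

# Filled line chart of the estimated density; hovering shows x and y values.
p_density <- plot_ly(x = dens$x, y = dens$y,
                     type = "scatter", mode = "lines", fill = "tozeroy")
p_density <- plotly::layout(p_density,
                            title = "Density of Scores",
                            xaxis = list(title = "Exam score"),
                            yaxis = list(title = "Density"))

p_density
```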

Example R Code

mean(scores)

## [1] 78.90406

sd(scores)

## [1] 9.128159

t.test(scores, mu = 75)

## 
##  One Sample t-test
## 
## data:  scores
## t = 4.2769, df = 99, p-value = 4.374e-05
## alternative hypothesis: true mean is not equal to 75
## 95 percent confidence interval:
##  77.09283 80.71528
## sample estimates:
## mean of x 
##  78.90406

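The code above computes the sample mean and standard deviation and runs a one-sample t-test against \(\mu = 75\). With t = 4.2769 on 99 degrees of freedom and a p-value of about 4.4e-05, well below 0.05, we reject \(H_0\) and conclude that the mean score differs from 75. Consistent with this, the 95 percent confidence interval (77.09, 80.72) does not contain 75.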
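To see where the numbers in the t-test output come from, the statistic, p-value, and confidence interval can be recomputed by hand from the reported mean and standard deviation. This is a sketch of the standard one-sample t-test arithmetic; the variable names are illustrative.

```r
# Summary values reported in the output above.
x_bar <- 78.90406   # sample mean
s     <- 9.128159   # sample standard deviation
n     <- 100        # sample size
mu_0  <- 75         # hypothesized mean under H0

# Standard error, t statistic, and degrees of freedom.
se     <- s / sqrt(n)
t_stat <- (x_bar - mu_0) / se
df     <- n - 1

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df)

# 95 percent confidence interval for the mean.
ci <- x_bar + c(-1, 1) * qt(0.975, df = df) * se

t_stat
p_value
ci
```

Up to rounding of the inputs, these values should agree with the `t.test()` output.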

Conclusion

Hypothesis testing is a powerful statistical tool used to evaluate claims about populations.

In this presentation we:

defined hypothesis testing
used mathematical formulas
visualized data with ggplot
created an interactive plotly visualization
included statistical code in R

These techniques help analysts make informed, data-driven decisions.