Hypothesis Testing

Introduction

What is hypothesis testing?
A statistical method used to make inferences about a population parameter based on sample data.

What is the objective?
To evaluate whether the sample provides enough evidence to reject the null hypothesis.

Parts of a Hypothesis Test

Hypothesis: A proposed claim that we want to investigate.

Null Hypothesis: \(H_0\)
Represents a default assumption, typically stating no effect or no difference.

Alternative Hypothesis: \(H_1\)
Contradicts the null hypothesis, suggesting an effect or difference exists.

Null and Alternative Hypothesis

We assume that \(H_0\) is true unless the evidence leads us to reject it in favor of \(H_1\).

Thus, the possible outcomes include:

- Reject the Null Hypothesis \(H_0\) and accept the Alternative Hypothesis \(H_1\).

- Fail to reject the Null Hypothesis \(H_0\).

Level of Confidence and Level of Significance

Level of Confidence: The probability of not rejecting \(H_0\) when it is true. We denote the level of confidence by c, and it typically takes values such as 95% or 99%.
Level of Significance: We denote it by \(\alpha\), and it is the complement of the level of confidence: \(\alpha\) = 1 - c. For example, c = 95% gives \(\alpha\) = 0.05.

Types of errors:
Type I Error (α): Rejecting the null hypothesis when it is true.
Type II Error (β): Failing to reject the null hypothesis when it is false.

If p-value < α, reject \(H_0\); otherwise, do not reject \(H_0\).
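The decision rule above can be sketched in R as follows. This is a minimal illustration, not part of the original slides: the helper name `decide` and the two p-values passed to it are hypothetical.

```r
# Significance level derived from the confidence level: alpha = 1 - c
conf_level <- 0.95
alpha <- 1 - conf_level

# Hypothetical helper: applies the rule "reject H0 if p-value < alpha"
decide <- function(p_value, alpha) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

# Made-up p-values, used only to show both branches of the rule
decide(0.03, alpha)
decide(0.20, alpha)
```

Note that the second outcome is "fail to reject", not "accept": a large p-value means the data are compatible with \(H_0\), not that \(H_0\) has been proven.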

Test Statistics

It is calculated from the sample data and used to decide whether we reject or fail to reject the null hypothesis \(H_0\).

When we test a hypothesis about the population mean, we have two options for the test statistic:

Z = \(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)

The Z-test is typically used when the population standard deviation \(\sigma\) is known and either the population is normally distributed or the sample size is greater than 30.
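As a rough sketch, the Z statistic can be computed directly in base R. All numbers below are hypothetical and chosen only for illustration, and the population standard deviation is assumed to be known.

```r
# Hypothetical inputs (not from the slides)
x_bar <- 52    # sample mean
mu0   <- 50    # population mean under H0
sigma <- 8     # population standard deviation, assumed known
n     <- 64    # sample size
alpha <- 0.05  # significance level

# Z = (x_bar - mu) / (sigma / sqrt(n))
z <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-value and critical value from the standard normal
p_value <- 2 * pnorm(-abs(z))
z_crit  <- qnorm(1 - alpha / 2)

reject_h0 <- abs(z) > z_crit
```

Comparing `abs(z)` with `z_crit` and comparing `p_value` with `alpha` are two views of the same decision, so they always agree.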

\(t_{n-1}\) = \(\frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}\)

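The t-test is used when the population standard deviation is unknown and is estimated by the sample standard deviation s, particularly when the sample size is less than 30. The statistic follows a Student's t distribution with n-1 degrees of freedom.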
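The t statistic can be computed by hand and checked against R's built-in `t.test`. The sample values and the hypothesized mean `mu0` below are made up for illustration.

```r
# Hypothetical sample (values invented for this sketch)
x   <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.9, 5.3, 5.7, 5.4)
mu0 <- 5            # population mean under H0
n   <- length(x)

# t = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom
t_stat  <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# Same test using the built-in function from the stats package
check <- t.test(x, mu = mu0, alternative = "two.sided")
```

Here `check$statistic` and `check$p.value` should match the hand-computed `t_stat` and `p_value`, which is a useful sanity check on the formula.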

Example

Null Hypothesis: Means are equal
Alternative Hypothesis: Means are unequal
We test the null hypothesis that two population means are equal (\(H_0: \mu_1 = \mu_2\)) against the alternative hypothesis that they differ (\(H_1: \mu_1 \neq \mu_2\)). We can use the two-sample t-test:
t = \(\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\)

Because the alternative is two-sided, we reject the null hypothesis if the absolute value of t is greater than the critical value for the chosen significance level \(\alpha\).
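A minimal sketch of this example in R follows. The two groups are hypothetical data, and the degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption here since the slides do not say how the critical value is obtained. It is also what `t.test` uses by default.

```r
# Hypothetical samples from two populations (values invented for this sketch)
group1 <- c(12.1, 11.8, 12.6, 13.0, 12.4, 12.9, 11.9, 12.7)
group2 <- c(11.2, 11.9, 11.5, 11.1, 11.8, 11.4, 11.6, 11.0)
alpha  <- 0.05

n1 <- length(group1)
n2 <- length(group2)
v1 <- var(group1) / n1   # s1^2 / n1
v2 <- var(group2) / n2   # s2^2 / n2

# t = (x_bar1 - x_bar2) / sqrt(s1^2/n1 + s2^2/n2)
t_stat <- (mean(group1) - mean(group2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (assumed, see lead-in)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided critical value and decision
t_crit    <- qt(1 - alpha / 2, df = df_welch)
reject_h0 <- abs(t_stat) > t_crit

# Built-in equivalent (var.equal = FALSE is the default)
check <- t.test(group1, group2)
```

If the two populations can be assumed to share a common variance, `t.test(group1, group2, var.equal = TRUE)` gives the pooled-variance version instead, which uses a different standard error and n1 + n2 - 2 degrees of freedom.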

ggplot #1

ggplot #2

Plotly Plot

R Code

ggplot #1 R code
library(ggplot2)

# Standard normal density evaluated on a grid of x values
xvals <- seq(-10, 10, by = 0.01)
df <- data.frame(x = xvals, y = dnorm(xvals, mean = 0, sd = 1))

ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "lightpink") +
  labs(x = "x", y = "y")

ggplot #2 R code
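library(ggplot2)

# Student's t density with 5 degrees of freedom
xvals <- seq(-4, 4, by = 0.01)
deg_free <- 5
# Named t_df so the data frame does not mask the dt() density function
t_df <- data.frame(x = xvals, y = dt(xvals, df = deg_free))

ggplot(t_df, aes(x = x, y = y)) +
  geom_line(color = "skyblue") +
  labs(x = "x", y = "y")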