Lecture 10: Hypothesis Testing — Null, Alternative, P-Values

tlhanna@central.uh.edu

2026-03-29

Agenda and Announcements

Today

  - **Topic 9: Hypothesis Testing — Null, Alternative, P-Values**
  - t-tests and Chi-Squared Tests: Simple Examples
  - Top Hat Quiz

Next Class

  - Topic 12: Experimental Design, Internal Validity & Controls
  - We will discuss two articles next week; I will announce them Wednesday

The Story of the t-Test

William Sealy Gosset

Watch First (~5 min)

Before we dive into the mechanics, let’s meet the statistician who invented the t-test.

Key Takeaways from Gosset

Worked at Guinness Brewery — needed to test small samples of barley
Published under the pseudonym “Student” (company secrecy rules)
The test is formally called Student’s t-test
Problem he solved: with small samples, a mean standardized by the sample standard deviation does not follow the normal distribution exactly
The t-distribution is wider than the normal, with heavier tails, which accounts for the extra uncertainty with small n
As sample size grows, the t-distribution approaches the normal distribution
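
The last two takeaways can be checked numerically. The sketch below uses base R's qt() and qnorm() to compare two-tailed 95% critical values; the degrees-of-freedom values are arbitrary choices for illustration, not numbers from Gosset's work.

```r
# Two-tailed 95% critical values: t-distribution vs. normal
df_values <- c(5, 10, 30, 100, 1000)   # illustrative degrees of freedom
t_crit <- qt(0.975, df = df_values)    # t critical value for each df
z_crit <- qnorm(0.975)                 # normal critical value (about 1.96)

# The t critical value shrinks toward the normal one as df grows
print(data.frame(df = df_values, t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```

With few degrees of freedom the cutoff sits well above 1.96, so a small sample needs a larger test statistic to reach the same level of evidence.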

Review: Connecting the Dots

Where We’ve Been

Lecture 5: Descriptive statistics — mean, variance, standard deviation
Lecture 8: Sampling distributions, Central Limit Theorem, Law of Large Numbers
Lecture 9: Confidence intervals — the range of plausible values for a parameter

The Bridge to Hypothesis Testing

A confidence interval tells us: “The true parameter is probably in this range”
A hypothesis test answers: “Is this relationship real, or just random noise?”
Both rely on the same machinery:
  - Sample statistics as estimators of population parameters
  - Probability distributions to quantify uncertainty
  - The 68-95-99.7 rule and critical values

Null and Alternative Hypotheses

What Is a Hypothesis?

A falsifiable statement about what we believe based on our theory
In statistics, we always test two competing claims simultaneously

The Two Hypotheses

Hypothesis	Symbol	Meaning
Null hypothesis	H₀	No relationship; any pattern is random chance
Alternative hypothesis	H₁ or H_a	A real relationship exists

We start from the assumption that H₀ is true
Our goal: find evidence strong enough to reject H₀

Examples in Political Science

H₀: Democracy has no effect on economic growth
H₁: Democracies have higher economic growth than autocracies

H₀: Gender and party identification are independent (unrelated)
H₁: Gender and party identification are not independent

Important Reminder from Last Lecture

If we reject H₀, does that mean H₁ is definitely true?

NO!

Rejecting H₀ means: the data would be unlikely if H₀ were true, so the evidence favors H₁
We are working within a probability framework — we are never 100% certain
We may be wrong — that’s what Type I and Type II errors are about

The P-Value and Alpha Level

What Is a P-Value?

The p-value is the probability of observing a result at least this extreme if H₀ were true
A small p-value means: “It would be very unlikely to see this pattern by random chance alone”

Common Misconception

The p-value is NOT the probability that H₀ is true.
It is the probability of data at least as extreme as ours, computed assuming H₀ is true.
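
One way to make the definition concrete is to simulate a world where H₀ is true and count how often chance alone produces a result at least as extreme as the one observed. The coin-flip scenario below (60 heads in 100 flips) is a hypothetical example, not data from the lecture.

```r
# Hypothetical example: 60 heads in 100 flips. H0: the coin is fair.
set.seed(1)
n_flips <- 100
observed_heads <- 60

# Simulate 10,000 studies in a world where H0 is true (p = 0.5)
sim_heads <- rbinom(10000, size = n_flips, prob = 0.5)

# Two-sided p-value: share of simulated results at least as far from 50
# as the observed result
p_sim <- mean(abs(sim_heads - 50) >= abs(observed_heads - 50))

# Exact two-sided p-value from the binomial distribution, for comparison
p_exact <- binom.test(observed_heads, n_flips, p = 0.5)$p.value

print(c(simulated = p_sim, exact = p_exact))
```

Note what the simulation conditions on: every simulated study assumes H₀. The p-value describes the data under H₀, not the probability that H₀ itself is true.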

The Alpha Level (α)

α (alpha) is our pre-chosen threshold: we reject H₀ only when the p-value falls below it
In social sciences: α = .05 is the standard
This means that when H₀ is true, we accept a 5% (1 in 20) chance of rejecting it by mistake (a Type I error)
Sometimes researchers use α = .01 (1%) or α = .10 (10%)

The Decision Rule

\[\text{If } p < \alpha \text{, reject } H_0\]

\[\text{If } p \geq \alpha \text{, fail to reject } H_0\]
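
The rule is simple enough to write as a function. The helper below, decide(), is a hypothetical name introduced only for illustration; it is not part of base R.

```r
# Hypothetical helper: apply the decision rule to a p-value
decide <- function(p, alpha = 0.05) {
  stopifnot(p >= 0, p <= 1, alpha > 0, alpha < 1)
  if (p < alpha) "Reject H0" else "Fail to reject H0"
}

decide(0.03)                 # p < alpha
decide(0.06)                 # p >= alpha
decide(0.03, alpha = 0.01)   # same p, stricter threshold
```

The third call shows why α must be chosen before looking at the data: the same p-value leads to a different decision under a different threshold.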

Why “Fail to Reject” and not “Accept”?

We never prove H₀ is true. We simply lack sufficient evidence to reject it.
Absence of evidence is not evidence of absence.

Why Does p < .05 Make Sense?

If H₀ is true and we ran this study 100 times, we’d expect results this extreme 5 times or fewer just by chance
When p < .05, a result this extreme would happen fewer than 1 in 20 times if only random chance were at work
That’s unlikely enough that we treat the pattern as evidence against H₀ rather than as noise
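
The "1 in 20" logic can be checked with a small simulation. In the sketch below, every simulated study compares two groups drawn from the same population, so H₀ is true by construction; the sample size and number of repetitions are arbitrary choices.

```r
set.seed(123)
alpha <- 0.05

# Each "study" draws two groups from the SAME population, so H0 is true
one_study <- function(n = 30) {
  group_a <- rnorm(n)
  group_b <- rnorm(n)
  t.test(group_a, group_b)$p.value
}

p_values <- replicate(2000, one_study())

# Share of studies that reject H0 even though it is true (Type I errors)
false_positive_rate <- mean(p_values < alpha)
print(false_positive_rate)
```

The rejection rate should land close to α. That is the sense in which α = .05 means accepting a false alarm about 1 time in 20 when there is truly nothing to find.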

The Critical Value Connection

Recall from Lecture 8: |Z| ≥ 1.96 corresponds to p ≤ .05 (two-tailed)
For a t-test: the critical t-value depends on degrees of freedom (sample size)
Larger test statistic (in absolute value) = smaller p-value = stronger evidence against H₀
When the absolute value of the test statistic exceeds the critical value, the p-value is below α
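
Comparing the test statistic to a critical value and comparing the p-value to α are two views of the same decision. A minimal sketch, assuming a hypothetical t-statistic of 2.30 with 20 degrees of freedom:

```r
alpha <- 0.05
t_stat <- 2.30   # hypothetical test statistic
df <- 20         # hypothetical degrees of freedom

# Critical value: the cutoff that leaves alpha/2 in each tail
t_crit <- qt(1 - alpha / 2, df = df)

# Two-tailed p-value for the observed statistic
p_value <- 2 * pt(-abs(t_stat), df = df)

# The two comparisons always give the same decision
exceeds_critical <- abs(t_stat) > t_crit
below_alpha <- p_value < alpha
print(c(t_crit = t_crit, p_value = p_value))
print(c(exceeds_critical = exceeds_critical, below_alpha = below_alpha))
```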

The t-Test

What Is a t-Test?

Tests whether a mean (or difference in means) is statistically significant
Uses the t-distribution (Gosset’s contribution!) which accounts for small sample uncertainty
When n is large (≥ 30), the t-distribution ≈ normal distribution

When Do You Use a t-Test?

Situation	Test Type
One group vs. a known value	One-sample t-test
Two independent groups	Independent samples t-test
Same group measured twice	Paired samples t-test

Tip

Key requirement: The dependent variable (DV) must be continuous (interval or ratio level)

t-Test Formula

The t-statistic for comparing two independent groups (pooled-variance form, which assumes the two groups have equal variances):

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

\(\bar{x}_1, \bar{x}_2\) = sample means of the two groups
\(s_p\) = pooled standard deviation
\(n_1, n_2\) = sample sizes

As consumers: You will read the t-statistic and p-value from software output — you will not calculate this by hand in most research contexts.
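
For anyone curious how the formula maps to software, the sketch below computes the pooled t-statistic by hand and checks it against t.test(). The two groups are hypothetical numbers, and the pooled standard deviation uses the standard textbook definition, which the slide does not spell out.

```r
# Hypothetical scores for two small independent groups
group1 <- c(12, 15, 11, 14, 13, 16)
group2 <- c(9, 10, 12, 8, 11, 10)
n1 <- length(group1)
n2 <- length(group2)

# Pooled standard deviation: weighted average of the two sample variances
s_p <- sqrt(((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2))

# t-statistic from the formula above
t_manual <- (mean(group1) - mean(group2)) / (s_p * sqrt(1 / n1 + 1 / n2))

# Same test via t.test(); var.equal = TRUE requests the pooled version
t_builtin <- unname(t.test(group1, group2, var.equal = TRUE)$statistic)

print(c(manual = t_manual, builtin = t_builtin))
```

Without var.equal = TRUE, t.test() runs Welch's version, which does not pool the variances and can give a slightly different t and df.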

Political Science Example: t-Test

Research Question: Do democracies have higher GDP per capita than autocracies?

H₀: Mean GDP per capita is the same in democracies and autocracies
H₁: Mean GDP per capita differs between democracies and autocracies (two-sided); the one-sided version would be that it is higher in democracies

Note

This is a real question in comparative politics — the relationship between regime type and economic development has been studied extensively.

t-Test in R

# Simulated data: GDP per capita (thousands USD) by regime type
set.seed(42)  # has no effect here: the values below are hard-coded
democracies <- c(28, 35, 41, 22, 55, 47, 38, 30, 52, 44, 29, 61, 37, 49,
    33, 58, 26, 43, 51, 36)
autocracies <- c(12, 18, 9, 25, 15, 7, 21, 11, 19, 14, 8, 23, 16, 10, 27,
    13, 20, 6, 17, 22)

# Run independent samples t-test (var.equal = FALSE requests Welch's
# version, which does not assume equal variances)
t.test(democracies, autocracies, alternative = "two.sided", var.equal = FALSE)


    Welch Two Sample t-test

data:  democracies and autocracies
t = 8.7705, df = 29.559, p-value = 1.006e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 19.25163 30.94837
sample estimates:
mean of x mean of y 
    40.75     15.65

Reading the t-Test Output

When you see t-test output, look for:

t = the test statistic (how many standard errors the difference is from zero)
df = degrees of freedom (related to sample size)
p-value = probability of a difference at least this large by chance if H₀ is true
95% confidence interval = plausible range for the true difference in means

Decision

If p < .05: Reject H₀ → Evidence supports a real difference between groups
If p ≥ .05: Fail to reject H₀ → Insufficient evidence of a difference
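
Each number in the printed output can also be pulled out of the object that t.test() returns. The sketch below reuses the lecture's GDP data; the variable names (result, decision, and so on) are illustrative choices.

```r
# Same data as the lecture example
democracies <- c(28, 35, 41, 22, 55, 47, 38, 30, 52, 44, 29, 61, 37, 49,
    33, 58, 26, 43, 51, 36)
autocracies <- c(12, 18, 9, 25, 15, 7, 21, 11, 19, 14, 8, 23, 16, 10, 27,
    13, 20, 6, 17, 22)

result <- t.test(democracies, autocracies)   # Welch test is the default

t_stat <- unname(result$statistic)   # t
df <- unname(result$parameter)       # degrees of freedom
p_value <- result$p.value            # p-value
ci <- result$conf.int                # 95% confidence interval

alpha <- 0.05
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
print(decision)
```

Here the 95% confidence interval excludes zero, which agrees with the two-sided test rejecting H₀ at α = .05.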

The Chi-Squared Test

What Is a Chi-Squared (χ²) Test?

Tests whether two categorical variables are independent of each other
Uses the χ² distribution (we saw this in Lecture 9 for variance CIs)
Compares observed frequencies to expected frequencies (what we’d expect if the variables were unrelated)

When Do You Use χ²?

Both variables are categorical (nominal or ordinal)
You have a contingency table (cross-tabulation)
Common in survey-based political science research

Tip

Examples: Party ID × Gender, Vote choice × Education level, Region × Policy opinion

The χ² Formula

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

\(O\) = observed frequency in each cell
\(E\) = expected frequency if the variables were independent
Larger χ² = greater deviation from independence = stronger evidence against H₀

As consumers: Again, software calculates this. Your job is to interpret the test statistic and p-value.
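
To see where the expected frequencies come from, the sketch below works through the formula on a hypothetical 2 × 2 table (the counts and labels are invented for illustration) and compares the result with chisq.test().

```r
# Hypothetical 2 x 2 table of counts (not from the lecture)
observed <- matrix(c(30, 20,
                     10, 40), nrow = 2, byrow = TRUE,
    dimnames = list(Vote = c("Yes", "No"), Group = c("A", "B")))

# Expected count in each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over cells of (O - E)^2 / E
chi2_manual <- sum((observed - expected)^2 / expected)

# Built-in test; correct = FALSE turns off the continuity correction that
# chisq.test() applies to 2 x 2 tables by default
chi2_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)

print(expected)
print(c(manual = chi2_manual, builtin = chi2_builtin))
```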

Political Science Example: χ² Test

Research Question: Is party identification independent of gender?

H₀: Party ID and gender are independent (no relationship)
H₁: Party ID and gender are not independent (a relationship exists)

Note

The “gender gap” in political party affiliation is one of the most studied phenomena in American political behavior.

χ² Test in R

# Contingency table: Party ID by Gender (simulated survey data, n =
# 300)
party_gender <- matrix(c(65, 55, 40, 60, 30, 50), nrow = 3, byrow = TRUE,
    dimnames = list(Party = c("Democrat", "Republican", "Independent"),
        Gender = c("Women", "Men")))

# Display the contingency table
print(party_gender)

             Gender
Party         Women Men
  Democrat       65  55
  Republican     40  60
  Independent    30  50

# Run chi-squared test
chisq.test(party_gender)


    Pearson's Chi-squared test

data:  party_gender
X-squared = 6.9024, df = 2, p-value = 0.03171

Reading the χ² Output

When you see chi-squared output, look for:

χ² (X-squared) = the test statistic (sum over cells of the squared deviation from expected, divided by expected)
df = degrees of freedom = (rows - 1) × (columns - 1)
p-value = probability of a pattern at least this strong if the variables were truly independent

Decision

If p < .05: Reject H₀ → Evidence of a relationship between the two categorical variables
If p ≥ .05: Fail to reject H₀ → Insufficient evidence of a relationship
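
A significant χ² says the variables are related but not where the relationship sits in the table. The object returned by chisq.test() also stores the expected counts and Pearson residuals, which help locate it. The sketch below reuses the lecture's party-by-gender table; the variable names are illustrative.

```r
party_gender <- matrix(c(65, 55, 40, 60, 30, 50), nrow = 3, byrow = TRUE,
    dimnames = list(Party = c("Democrat", "Republican", "Independent"),
        Gender = c("Women", "Men")))

result <- chisq.test(party_gender)

chi2 <- unname(result$statistic)   # X-squared
df <- unname(result$parameter)     # (rows - 1) * (columns - 1)
p_value <- result$p.value

# Counts we would expect if party and gender were independent
print(result$expected)

# Pearson residuals, (O - E) / sqrt(E): positive means more cases than
# independence predicts, negative means fewer
print(round(result$residuals, 2))
```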

Visualizing the χ² Test Result

Comparing the Two Tests

t-Test vs. Chi-Squared: At a Glance

Feature	t-Test	χ² Test
What it tests	Difference in means	Independence of categorical variables
Variable type (DV)	Continuous (interval/ratio)	Categorical (nominal/ordinal)
Variable type (IV)	Categorical (2 groups)	Categorical
Distribution used	t-distribution	χ² distribution
Key statistic	t	χ²
Decision rule	p < α → reject H₀	p < α → reject H₀

Choosing the Right Test

Quick Test Selection Guide
Research Question	DV Type	IV Type	Test
Is avg. GDP different between regime types?	Continuous	Categorical (2 groups)	t-Test
Is vote choice related to education level?	Categorical	Categorical	Chi-Squared
Does support for policy differ by region?	Continuous	Categorical	t-Test / ANOVA
Is income related to media consumption?	Continuous	Categorical	t-Test / Regression

Putting It All Together

The Full Hypothesis Testing Workflow

State H₀ and H₁
Choose α (usually .05 in social science)
Select the appropriate test (t-test, χ², etc.)
Calculate the test statistic (software does this)
Find the p-value
Decision: Is p < α?
  - Yes → Reject H₀: evidence supports the alternative
  - No → Fail to reject H₀: insufficient evidence
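
The workflow can be followed step by step in a few lines of R. The sketch below uses a one-sample t-test on hypothetical approval scores (0 to 100 scale), asking whether mean approval differs from 50; the data and the benchmark value are assumptions made for illustration.

```r
# Step 1: H0: mean approval = 50; H1: mean approval != 50
# (hypothetical approval scores on a 0-100 scale)
approval <- c(55, 61, 58, 49, 63, 57, 60, 52, 59, 62)

# Step 2: choose alpha
alpha <- 0.05

# Step 3: select the test (one group vs. a known value: one-sample t-test)
# Steps 4-5: software computes the test statistic and p-value
result <- t.test(approval, mu = 50)

# Step 6: decision
decision <- if (result$p.value < alpha) "Reject H0" else "Fail to reject H0"
print(result)
print(decision)
```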

What p < .05 Really Means

The Intuition

If p = .03, it means: “If there were truly no relationship (H₀ true), we would observe a result at least this extreme only 3 times out of 100 by random chance.”
That’s unlikely enough that we reject H₀ and treat the relationship as real, while accepting that we may be wrong.

Common Mistakes to Avoid

❌ “p = .03 means there is a 97% chance H₁ is true” — Wrong
❌ “p = .06 means there is no relationship” — Wrong (just insufficient evidence)
❌ “Statistical significance = practical importance” — Not necessarily!
✅ p < .05 means: a result this extreme would be unlikely if only random chance were at work
✅ Always report effect size alongside p-values in real research
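
The gap between statistical significance and practical importance is easiest to see with a very large sample. The simulation below is a sketch with assumed numbers: two groups of 100,000 whose true means differ by only 0.03 standard deviations. Cohen's d is used as the effect size, which is one common choice among several.

```r
set.seed(2026)
n <- 100000   # very large hypothetical sample per group

# True difference between groups is tiny: 0.03 standard deviations
group_a <- rnorm(n, mean = 0.00, sd = 1)
group_b <- rnorm(n, mean = 0.03, sd = 1)

p_value <- t.test(group_a, group_b)$p.value

# Cohen's d: difference in means divided by the pooled standard deviation
# (with equal group sizes, the pooled variance is the average variance)
pooled_sd <- sqrt((var(group_a) + var(group_b)) / 2)
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd

print(c(p_value = p_value, cohens_d = cohens_d))
```

The p-value falls far below .05 even though the effect is negligible, which is why the effect size belongs next to the p-value in any report.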

Summary

Key Concepts from Today

H₀ (null): No relationship; pattern is due to random chance
H₁ (alternative): A real relationship exists
p-value: Probability of a result at least as extreme as ours if H₀ is true
α = .05: Our threshold; we accept a 1-in-20 risk of rejecting a true H₀
p < α → Reject H₀ (a result this extreme would be unlikely under chance alone)
t-test: Tests differences in means (continuous DV)
χ² test: Tests independence of categorical variables

As Consumers of Statistics

When you read political science research, you will see:

“t(38) = 3.24, p = .003” → Reject H₀, statistically significant
“χ²(2) = 9.47, p = .009” → Reject H₀, variables are not independent
“p = .42” → Fail to reject H₀, no significant relationship found

Your job: Understand what these mean, evaluate whether the test was appropriate, and assess whether the conclusions follow from the results.
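
One practical check when evaluating a paper: a reported p-value can be recomputed from the test statistic and degrees of freedom. The sketch below does this for the first two examples above, assuming the t-test is two-tailed.

```r
# "t(38) = 3.24": two-tailed p-value from the t-distribution with 38 df
p_t <- 2 * pt(-abs(3.24), df = 38)

# "chi-squared(2) = 9.47": upper-tail p-value from the chi-squared
# distribution with 2 df
p_chi2 <- pchisq(9.47, df = 2, lower.tail = FALSE)

print(round(c(t_test = p_t, chi_squared = p_chi2), 4))
```

If a recomputed value is far from the reported one, that is a signal to look more closely at how the test was run.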

Authorship, License, Credits

Author: Tom Hanna
Website: tomhanna.me
Gossett video: William Sealy Gossett and the t-test
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
