Correlation and Linear Regression

Harriet Goers

What is Correlation?

  • Correlation measures the strength and direction of the linear relationship between two interval-level variables.
  • Pearson’s r ranges from −1 to +1:
    • +1 = perfect positive linear relationship
    • −1 = perfect negative linear relationship
    • 0 = no linear relationship
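
A minimal sketch of how Pearson's r can be computed by hand. The data values are hypothetical and chosen only to hit the three reference points listed above; `pearson_r` is an illustrative name, not a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values, for illustration only
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises in lockstep with x
y_down = [10, 8, 6, 4, 2]  # falls in lockstep with x
y_u = [2, 1, 0, 1, 2]      # U-shaped: related to x, but not linearly

for label, y in [("up", y_up), ("down", y_down), ("U-shape", y_u)]:
    print(label, round(pearson_r(x, y), 3))
```

The U-shaped case is a reminder that r near 0 rules out a linear relationship only, not every relationship.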

Example: Education and Turnout

What is Regression?

  • Regression quantifies how much Y changes, on average, for a one-unit increase in X.
  • Simple regression equation:

\[ \hat{y} = \hat{a} + \hat{b}x \]

Regression Example: Studying and Exam Scores

  • Data from 8 students shows more hours studied → higher scores.
  • Regression line:

\[ \hat{y} = 55 + 6x \]

  • A student who studies 3 hours is predicted to score:

\[ 55 + (6 \times 3) = 73 \]
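
A short sketch of the two steps on this slide: estimating the line by least squares, then using it to predict. The intercept 55 and slope 6 come from the slide; the four-student data set is hypothetical (the slide's 8 observations are not shown), and `fit_line` and `predict` are illustrative names.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted y for a given x."""
    return intercept + slope * x

# Prediction with the coefficients from the slide
print(predict(55, 6, 3))  # 73

# Fitting a line to hypothetical data (not the slide's 8 students)
hours = [1, 2, 3, 4]
scores = [60, 68, 72, 80]
a, b = fit_line(hours, scores)
print(round(a, 2), round(b, 2))
```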

Testing Significance

  • Use a t-test to assess whether the slope differs from zero, i.e., whether the relationship could plausibly be due to chance alone.
  • Formula:

\[ t = \frac{\hat{b} - 0}{SE_{\hat{b}}} \]

  • Example: t = 11.5 → highly significant (p < .001)
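
A sketch of where the t-statistic comes from. It assumes the usual simple-regression standard error of the slope (residual variance with n − 2 degrees of freedom, divided by the variation in X), which the slide does not spell out. The data are hypothetical and `slope_t_stat` is an illustrative name.

```python
from math import sqrt

def slope_t_stat(x, y):
    """Return (slope, standard error, t) for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, using n - 2 degrees of freedom
    se = sqrt(sse / (n - 2) / sxx)
    # Null hypothesis: the true slope is 0
    return slope, se, (slope - 0) / se

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4]
scores = [60, 68, 72, 80]
slope, se, t = slope_t_stat(hours, scores)
print(round(slope, 2), round(se, 3), round(t, 2))
```

Turning t into a p-value requires the t distribution with n − 2 degrees of freedom, which is left to a statistics package here.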

R-Squared: Model Fit

  • R-squared tells us what share of the variation in Y is explained by X.

\[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]

  • Example: R² = 0.30 → education explains 30% of turnout variation.
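
A sketch of the ratio above for a simple regression, computing explained variation as total variation minus residual variation. The data are hypothetical and `r_squared` is an illustrative name.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation: squared deviations of y from its mean
    total = sum((yi - mean_y) ** 2 for yi in y)
    # Unexplained variation: squared residuals around the fitted line
    residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    explained = total - residual
    return explained / total

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4]
scores = [60, 68, 72, 80]
print(round(r_squared(hours, scores), 3))
```

With a single predictor, this value equals the square of Pearson's r.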

Multiple Regression: Adding Variables

  • Multiple regression estimates the effect of each variable while holding the others constant.
  • Equation:

\[ \hat{y} = \hat{a} + \hat{b}_1x_1 + \hat{b}_2x_2 \]

  • Example: turnout modeled as a function of education and battleground status
  • Controlling for battleground status, the education coefficient = 0.67; controlling for education, battleground states = +5.74 pts
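
A sketch of how coefficients for two predictors can be estimated, here by solving the normal equations with plain Python. The data are hypothetical and built without noise from y = 10 + 0.5·education + 4·battleground, so the fit should recover those made-up coefficients; they are not the turnout estimates on the slide. `solve` and `fit_ols` are illustrative names.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    # X'X and X'y, built element by element
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical (education, battleground) pairs; battleground is a 0/1 dummy
rows = [(20, 0), (30, 0), (40, 0), (25, 1), (35, 1), (45, 1)]
y = [10 + 0.5 * edu + 4 * bg for edu, bg in rows]

intercept, b_edu, b_bg = fit_ols(rows, y)
print(round(intercept, 3), round(b_edu, 3), round(b_bg, 3))
```

In practice a statistics package does this step; the point is only that each coefficient is estimated jointly with the others.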

Dummy Variables

  • Use 0/1 indicators for categories.
  • Example: Voter ID laws → 4 dummies for 5 categories.
  • Intercept = predicted value for the base category (when all dummies equal 0).
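
A sketch of dummy coding for a five-category variable. The category labels A to E are placeholders, not the actual voter ID categories, and `make_dummies` is an illustrative name.

```python
def make_dummies(values, base):
    """Return {category: list of 0/1} for every category except the base."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical observations on a five-category variable
values = ["A", "B", "C", "D", "E", "A", "C"]
dummies = make_dummies(values, base="A")

print(len(dummies))  # 4 dummies for 5 categories
for name, column in dummies.items():
    print(name, column)
```

Observations in the base category score 0 on every dummy, which is why the intercept describes that group.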

Interaction Effects

  • Interaction = effect of X1 depends on X2.
  • Include a term like:

\[ \text{Education} \times \text{Battleground} \]

  • If the interaction coefficient is significant, the effect of education differs between battleground and non-battleground states.
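
Written out, the model with an interaction looks as follows. The symbol \(\hat{b}_3\) for the interaction coefficient is introduced here for illustration and does not appear elsewhere in the slides.

\[ \hat{y} = \hat{a} + \hat{b}_1\,\text{Education} + \hat{b}_2\,\text{Battleground} + \hat{b}_3\,(\text{Education} \times \text{Battleground}) \]

Because Battleground is a 0/1 dummy, the slope of Education takes two values:

\[ \text{Battleground} = 0: \quad \hat{y} = \hat{a} + \hat{b}_1\,\text{Education} \]

\[ \text{Battleground} = 1: \quad \hat{y} = (\hat{a} + \hat{b}_2) + (\hat{b}_1 + \hat{b}_3)\,\text{Education} \]

So \(\hat{b}_3\) is the difference in the education slope between battleground and non-battleground states.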

Why It Matters

  • These tools help us answer big questions:
    • Who votes and why?
    • Do policies work?
    • What explains political attitudes?
  • Regression lets us test our theories against data.

Wrap-Up Questions

  • What does Pearson’s r tell us?
  • What does the slope in regression mean?
  • What is R²?
  • Why use multiple regression?

Thanks!

See you next time!