Linear Regression

RSM2074 Lecture Week 5

Dr. Jun Ho Chai


The Plan for Today

  • Overview of Correlations
  • How is Regression different from Correlation?
  • Regression Coefficients: Slope and Intercept
  • The R² Statistic
  • Comparing Regression with ANOVA
  • Conducting a Simple Linear Regression

Correlations Review

We should already know how correlations work

Explore Correlation Strength

Correlation

Key Points:

• Correlations require two continuous variables

• No IV/DV

Correlation ≠ causation

Pearson’s r Statistic

r is NOT based on how steep the line is (the slope)

r IS based on how spread out the points are from an imaginary line
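The two points above can be checked numerically. The sketch below is a minimal illustration on made-up numbers (the data and the helper name `pearson_r` are assumptions for this example, not part of the lecture): two datasets lie perfectly on lines with very different slopes, and a third follows the steep line with added scatter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5, 6]
noise = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2]

shallow = [0.5 * xi for xi in x]                            # exactly on a shallow line
steep = [5.0 * xi for xi in x]                              # exactly on a steep line
scattered = [5.0 * xi + 10 * e for xi, e in zip(x, noise)]  # steep trend, points spread out

r_shallow = pearson_r(x, shallow)
r_steep = pearson_r(x, steep)
r_scattered = pearson_r(x, scattered)
print(r_shallow, r_steep, r_scattered)
```

The shallow and steep datasets give the same r because both sit exactly on their line; only the scattered dataset gives a smaller r.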

Regression vs Correlation

At first glance, Regression appears the same as Correlation

Note

Correlation says: Mental wellbeing is associated with Healthy lifestyle

Regression says: Healthy lifestyle predicts Mental Wellbeing

Correlation vs Regression Thinking

Key Difference: Correlation is symmetric; Regression is directional!
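One way to see the symmetric versus directional distinction is to swap the two variables. The sketch below uses made-up lifestyle and wellbeing scores (hypothetical values, as are the helper names): r stays the same after the swap, while the regression slope does not.

```python
from math import sqrt

def sums(xs, ys):
    """Return (sxy, sxx, syy): sums of products and squares of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def slope(predictor, outcome):
    """Least-squares slope for predicting `outcome` from `predictor`."""
    sxy, sxx, _ = sums(predictor, outcome)
    return sxy / sxx

# Made-up scores, for illustration only
lifestyle = [1, 2, 3, 4, 5]
wellbeing = [2, 4, 5, 4, 5]

r_xy = pearson_r(lifestyle, wellbeing)
r_yx = pearson_r(wellbeing, lifestyle)          # same value: correlation is symmetric
slope_y_on_x = slope(lifestyle, wellbeing)      # wellbeing predicted from lifestyle
slope_x_on_y = slope(wellbeing, lifestyle)      # lifestyle predicted from wellbeing
print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```

Choosing which variable is the outcome changes the fitted line, which is why regression needs the predictor and outcome roles to be stated.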

Regression Terminology

One variable is always the outcome variable:

• It’s your DV (depends on other variables)

• Often called y

Other variables are called predictors:

• Like IVs (affect your outcome)

• Often called x

Larger change in y per unit change in x = steeper slope

The Regression Line

How the Regression Line Works

Correlation vs Regression Lines

Correlation Trendline

  • The line is really a cluster of points shaped like a line

  • Says: “x and y tend to move together this closely”

Regression Line

  • This line is directional

  • Allows you to predict new data points

  • Says: “If x is here, y is predicted to be there”
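The "if x is here, y is predicted to be there" idea can be sketched in a few lines. This is a minimal illustration using the line y = βx + c with an ordinary least-squares fit. The study-hours and exam-score numbers are made up, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope (beta) and intercept (c) for y = beta * x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    c = my - beta * mx  # the fitted line passes through (mean x, mean y)
    return beta, c

# Made-up data, for illustration only
hours = [1, 2, 3, 4, 5]        # predictor (x)
score = [52, 55, 61, 64, 68]   # outcome (y)

beta, c = fit_line(hours, score)
predicted = beta * 6 + c       # prediction for a new x value (6 hours)
print(beta, c, predicted)
```

The prediction uses a value of x that was not in the data, which is what a correlation coefficient alone does not provide.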

Regression Formula

The Regression Equation

Play with Your Own Regression Line

Challenge: Can you beat the best-fit line? Green dashed = optimal
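The challenge has a known answer if "best" means the smallest sum of squared errors, which is the usual least-squares criterion and is assumed here. The sketch below compares the least-squares line with a few hand-picked guesses on made-up data. The guesses and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = beta * x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    return beta, my - beta * mx

def sse(xs, ys, beta, c):
    """Sum of squared errors between observed y and the line's predictions."""
    return sum((y - (beta * x + c)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

best_beta, best_c = fit_line(x, y)
best_sse = sse(x, y, best_beta, best_c)

# A few manual guesses as (slope, intercept) pairs
guesses = [(4.0, 48.0), (5.0, 45.0), (3.0, 51.0)]
guess_sses = [sse(x, y, b, c) for b, c in guesses]
print(best_sse, guess_sses)
```

A guess can come close, but no straight line gives a smaller squared error on the same data than the least-squares line.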

Understanding Prediction Error

Error = Natural Human Variability

Perfect Prediction vs Reality

The R² Statistic

What’s a pirate’s favourite test statistic?

R²!

R²: A Measure of Precision
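R² can be computed as one minus the ratio of unexplained variation to total variation. The sketch below follows that standard definition on made-up data. The variable names are assumptions for this example.

```python
# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares line y = beta * x + c
beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        / sum((xi - mx) ** 2 for xi in x))
c = my - beta * mx

predictions = [beta * xi + c for xi in x]

# Variation left over after using the line (prediction error)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
# Variation around the mean of y (what there is to explain)
ss_tot = sum((yi - my) ** 2 for yi in y)

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

An R² near 1 means the points sit close to the line. An R² near 0 means the line explains little more than the mean of y would.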

The β Statistic

β: A Measure of Magnitude

β vs R² Comparison

Patterns:

• Low β + Low R² = Flat & scattered :(

• High β + Low R² = Steep but scattered

• Low β + High R² = Flat but precise

• High β + High R² = Strong & precise :)
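Two of the patterns above can be reproduced with small made-up datasets. In this sketch "high" and "low" β are only relative to each other, since raw slopes depend on the units of x and y. The data and the helper name `slope_and_r2` are assumptions.

```python
def slope_and_r2(xs, ys):
    """Least-squares slope and R² for a simple (one-predictor) regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    beta = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # equals 1 - SS_res / SS_tot for this model
    return beta, r2

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5, 6]
steep_scattered = [9, -4, 13, 16, 5, 24]             # big trend, points far from the line
flat_precise = [0.23, 0.35, 0.62, 0.82, 0.95, 1.23]  # small trend, points close to the line

beta_steep, r2_steep = slope_and_r2(x, steep_scattered)
beta_flat, r2_flat = slope_and_r2(x, flat_precise)
print(beta_steep, r2_steep)
print(beta_flat, r2_flat)
```

The first dataset has the larger slope but the lower R², and the second has the smaller slope but the higher R², so the two statistics answer different questions.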

ANOVA vs Regression


Both Use Same Logic: Observed Known Value → Signal ± Error

Example

Research Question: Does temperature predict ice cream sales?

Predictor (IV): Temperature (°C)

Outcome (DV): Ice cream sales (RM)

Hypothesis: Higher temperatures lead to higher sales.

Example: Ice Cream Data

Interpretation: Temperature significantly predicts sales ( = 0.947).

Example: Exam Score Prediction

Example: APA Reporting

Standardised Beta

The Problem: How much is “a lot”?

Standardised Beta Comparison
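A common way to obtain a standardised beta in simple regression is to rescale the raw slope by the ratio of the standard deviations of x and y. The sketch below assumes that definition. The temperature and sales figures are made up, and the same sales are expressed in RM and in sen to show the effect of units.

```python
from math import sqrt

def raw_and_standardised_beta(xs, ys):
    """Return (raw slope, standardised slope) for a simple regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    raw = sxy / sxx
    # Rescale by sd(x) / sd(y); the (n - 1) terms cancel in the ratio
    standardised = raw * sqrt(sxx / syy)
    return raw, standardised

# Made-up data, for illustration only
temperature = [24, 26, 28, 30, 32]             # °C
sales_rm = [1500, 1700, 1800, 2100, 2200]      # sales in RM
sales_sen = [s * 100 for s in sales_rm]        # the same sales in sen

raw_rm, std_rm = raw_and_standardised_beta(temperature, sales_rm)
raw_sen, std_sen = raw_and_standardised_beta(temperature, sales_sen)
print(raw_rm, raw_sen)   # raw slopes differ by a factor of 100
print(std_rm, std_sen)   # standardised slopes agree
```

The raw slope changes with the unit of measurement while the standardised value does not, which is what makes it usable for comparisons across studies.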

Summary

What We Learned Today

Simple Linear Regression determines whether a continuous variable predicts a continuous outcome

Slope (β) tells how much y changes with each one-unit change in x

R² tells how much variance in the outcome is explained by the predictor

Regression Line: y = βx + c

ANOVA & Regression: Same logic (Known → Predicted ± error)

Standardised Beta allows cross-study comparison

Error = Natural Variability we cannot explain

Real-World: The Coffee Shop Mystery

Scenario: A coffee shop owner notices more sales on sunny days. Is it the weather or something else?

Using regression we can:

  • Quantify relationship (β = extra RM per degree?)

  • Assess precision (R² = variance explained?)

  • Make predictions (30°C tomorrow → predict RM 2,450)

  • Control confounds (sunshine OR temperature OR day of week?)

Next Week: Multiple Regression

Multiple Linear Regression = Simple + more predictors

• Some IVs can be categorical!