Key Difference: Correlation is symmetric; Regression is directional!
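A minimal sketch of that difference in plain JavaScript. The `hours` and `scores` arrays and the helper names are made up for illustration and are not part of the slides. Swapping the two variables leaves the correlation unchanged but gives a different regression slope.

```javascript
// Hypothetical data for illustration only.
const hours = [1, 2, 3, 4, 5];
const scores = [52, 55, 61, 60, 68];

const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;

// Sum of cross-products of deviations from the two means.
function crossDev(a, b) {
  const ma = mean(a), mb = mean(b);
  return a.reduce((s, v, i) => s + (v - ma) * (b[i] - mb), 0);
}

// Pearson correlation: the same whichever variable comes first.
const correlation = (a, b) =>
  crossDev(a, b) / Math.sqrt(crossDev(a, a) * crossDev(b, b));

// Least-squares slope for predicting `outcome` from `predictor`.
const slope = (predictor, outcome) =>
  crossDev(predictor, outcome) / crossDev(predictor, predictor);

console.log(correlation(hours, scores), correlation(scores, hours)); // same value twice
console.log(slope(hours, scores), slope(scores, hours));             // two different slopes
```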
Regression Terminology
One variable is always the outcome variable:
• It’s your DV (depends on other variables)
• Often called y
Other variables are called predictors:
• Like IVs (affect your outcome)
• Often called x
• Larger changes in y per unit of x = steeper slope
The Regression Line
How the Regression Line Works
{
  // Several observations at each homework count, so the plot shows real scatter
  const scatter_pairs = [
    {x: 2, y: 95}, {x: 2, y: 93}, {x: 2, y: 97},
    {x: 4, y: 90}, {x: 4, y: 88}, {x: 4, y: 92}, {x: 4, y: 89},
    {x: 6, y: 87}, {x: 6, y: 85}, {x: 6, y: 89}, {x: 6, y: 84},
    {x: 8, y: 84}, {x: 8, y: 82}, {x: 8, y: 86}, {x: 8, y: 81},
    {x: 10, y: 81}, {x: 10, y: 79}, {x: 10, y: 83}, {x: 10, y: 78}
  ];
  return Plot.plot({
    marks: [
      Plot.dot(scatter_pairs, {x: "x", y: "y", fill: "steelblue", r: 5}),
      Plot.text([{x: 6, y: 75}], {x: "x", y: "y", text: "Multiple observations at each x value create proper scatter", fontSize: 11})
    ],
    x: {label: "Number of Homework Assignments (x)", domain: [0, 12]},
    y: {label: "Mental Wellbeing (y)", domain: [75, 100]},
    grid: true,
    caption: "Real data has natural variability: multiple y values for similar x values"
  });
}
Correlation vs Regression Lines
Correlation Trendline
The line is really a cluster of points shaped like a line
Says: “x and y tend to move together this closely”
Regression Line
This line is directional
Allows you to predict new data points
Says: “If x is here, y is predicted to be there”
Regression Formula
The Regression Equation
html`<div style="text-align: center; font-size: 60px; margin: 40px 0; font-weight: bold;">y = βx + c</div><div style="display: flex; justify-content: space-around; font-size: 22px; margin-top: 40px;"><div style="text-align: center;"><strong style="color: #1f77b4; font-size: 36px;">y</strong><br/>The value of the<br/><strong>outcome variable</strong><br/>Also called a DV</div><div style="text-align: center;"><strong style="color: #ff7f0e; font-size: 36px;">β</strong><br/><strong>The slope</strong><br/>Amount that y changes<br/>for each one-unit increase in x</div><div style="text-align: center;"><strong style="color: #2ca02c; font-size: 36px;">x</strong><br/>The value of the<br/><strong>predictor</strong><br/>Also called an IV</div><div style="text-align: center;"><strong style="color: #d62728; font-size: 36px;">c</strong><br/><strong>The intercept</strong><br/>The value of y<br/>when x is 0</div></div>`
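One common way to obtain β and c is ordinary least squares. The slides do not say which method is used, so treat that as an assumption. The sketch below applies it to the homework and wellbeing points from the scatterplot above; `fitLine` and `predict` are illustrative names.

```javascript
// Points copied from the homework / wellbeing scatterplot.
const scatter_pairs = [
  {x: 2, y: 95}, {x: 2, y: 93}, {x: 2, y: 97},
  {x: 4, y: 90}, {x: 4, y: 88}, {x: 4, y: 92}, {x: 4, y: 89},
  {x: 6, y: 87}, {x: 6, y: 85}, {x: 6, y: 89}, {x: 6, y: 84},
  {x: 8, y: 84}, {x: 8, y: 82}, {x: 8, y: 86}, {x: 8, y: 81},
  {x: 10, y: 81}, {x: 10, y: 79}, {x: 10, y: 83}, {x: 10, y: 78}
];

// Ordinary least squares for y = βx + c (an assumed fitting method).
function fitLine(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let sxy = 0, sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) ** 2;
  }
  const beta = sxy / sxx;          // slope: change in y per one-unit increase in x
  const c = meanY - beta * meanX;  // intercept: predicted y when x is 0
  return {beta, c};
}

const {beta, c} = fitLine(scatter_pairs);
const predict = (x) => beta * x + c;

console.log(beta.toFixed(2), c.toFixed(2)); // the slope is negative for these points
console.log(predict(5).toFixed(1));         // predicted wellbeing at x = 5
```

Here the slope comes out negative: in this example data, more homework goes with lower predicted wellbeing.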
Challenge: Can you beat the best-fit line? Green dashed = optimal
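A sketch of why the challenge cannot be won, assuming "best fit" means least squares (the slides do not state the criterion): the fitted line has the smallest possible sum of squared errors, so any guessed line scores the same or worse. The points and guesses below are hypothetical.

```javascript
// Hypothetical points for illustration only.
const points = [{x: 1, y: 2}, {x: 2, y: 3}, {x: 3, y: 5}, {x: 4, y: 4}, {x: 5, y: 6}];

// Sum of squared vertical distances between the points and y = slope * x + intercept.
function sumSquaredError(pts, slope, intercept) {
  return pts.reduce((s, p) => s + (p.y - (slope * p.x + intercept)) ** 2, 0);
}

// Least-squares slope and intercept.
function leastSquares(pts) {
  const meanX = pts.reduce((s, p) => s + p.x, 0) / pts.length;
  const meanY = pts.reduce((s, p) => s + p.y, 0) / pts.length;
  const sxy = pts.reduce((s, p) => s + (p.x - meanX) * (p.y - meanY), 0);
  const sxx = pts.reduce((s, p) => s + (p.x - meanX) ** 2, 0);
  const slope = sxy / sxx;
  return {slope, intercept: meanY - slope * meanX};
}

const best = leastSquares(points);
const bestSSE = sumSquaredError(points, best.slope, best.intercept);

// A few hand-picked guesses: none should score lower than the fitted line.
const guesses = [
  {slope: 1, intercept: 1},
  {slope: 0.8, intercept: 1.5},
  {slope: 1.2, intercept: 0}
];
for (const g of guesses) {
  const sse = sumSquaredError(points, g.slope, g.intercept);
  console.log(g, sse.toFixed(2), sse >= bestSSE ? "not better" : "better");
}
```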
Understanding Prediction Error
Error = Natural Human Variability
html`<div style="font-size: 28px; text-align: center; margin: 40px 0; line-height: 1.6;"><p><strong>Why isn't prediction perfect?</strong></p><br/><p>Humans are naturally variable! Even with the same study hours,<br/>different students get different exam scores because of:</p><br/><div style="font-size: 24px; text-align: center; max-width: 600px; margin: 0 auto;">• <strong>Sleep quality</strong> the night before<br/>• <strong>Test anxiety</strong> levels<br/>• <strong>Prior knowledge</strong> differences<br/>• <strong>Motivation</strong> on that day<br/>• <strong>Luck</strong> with question topics<br/>• <strong>Coffee consumption</strong> 😊</div><p style="color: #d32f2f;"><strong>Error term = All the natural variability<br/>we cannot explain with our predictor</strong></p></div>`
{
  const x_vals = d3.range(1, 11);
  let data_points = [];
  if (error_scenario === "Perfect prediction") {
    // Every point sits exactly on the line y = 2x + 3
    x_vals.forEach(x => {
      data_points.push({x: x, y: 2 * x + 3});
    });
  } else if (error_scenario === "Some variability") {
    // Three observations per x value with modest random noise
    x_vals.forEach(x => {
      const base_y = 2 * x + 3;
      for (let i = 0; i < 3; i++) {
        data_points.push({x: x + (Math.random() - 0.5) * 0.3, y: base_y + (Math.random() - 0.5) * 4});
      }
    });
  } else {
    // Four observations per x value with much larger random noise
    x_vals.forEach(x => {
      const base_y = 2 * x + 3;
      for (let i = 0; i < 4; i++) {
        data_points.push({x: x + (Math.random() - 0.5) * 0.4, y: base_y + (Math.random() - 0.5) * 10});
      }
    });
  }
  return Plot.plot({
    marks: [
      // The regression line y = 2x + 3, drawn from x = 0 to x = 11
      Plot.line([{x: 0, y: 3}, {x: 11, y: 25}], {x: "x", y: "y", stroke: "red", strokeWidth: 3}),
      Plot.dot(data_points, {x: "x", y: "y", fill: "steelblue", r: 4, opacity: 0.7})
    ],
    x: {domain: [0, 11], label: "Study Hours (x)"},
    y: {domain: [0, 30], label: "Exam Score (y)"},
    grid: true,
    caption: error_scenario === "Perfect prediction" ? "Impossible: No human variability"
      : error_scenario === "Some variability" ? "Realistic: Some natural variability"
      : "Common: Many unmeasured factors matter"
  });
}
The R² Statistic
What’s a pirate’s favourite test statistic?
R²!
html`<div style="text-align: center; font-size: 25px; margin: 30px 0;"><strong>R² = Proportion of variance in Y explained by X</strong><br/><br/><div style="font-size: 26px;">If R² = 0.60, then:<br/>• 60% of the variation in Y is explained by X<br/>• 40% of the variation is left unexplained (other factors, natural variability)</div></div>`
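As a rough sketch, R² can be computed as 1 minus the leftover (residual) variation divided by the total variation in y, assuming a least-squares line. The data points and the `rSquared` helper are hypothetical.

```javascript
// Hypothetical points for illustration only.
const points = [{x: 1, y: 2}, {x: 2, y: 3}, {x: 3, y: 5}, {x: 4, y: 4}, {x: 5, y: 6}];

function rSquared(pts) {
  const meanX = pts.reduce((s, p) => s + p.x, 0) / pts.length;
  const meanY = pts.reduce((s, p) => s + p.y, 0) / pts.length;

  // Least-squares line (assumed fitting method).
  const sxy = pts.reduce((s, p) => s + (p.x - meanX) * (p.y - meanY), 0);
  const sxx = pts.reduce((s, p) => s + (p.x - meanX) ** 2, 0);
  const beta = sxy / sxx;
  const c = meanY - beta * meanX;

  // Variation left over around the line vs. total variation around the mean of y.
  const ssResidual = pts.reduce((s, p) => s + (p.y - (beta * p.x + c)) ** 2, 0);
  const ssTotal = pts.reduce((s, p) => s + (p.y - meanY) ** 2, 0);
  return 1 - ssResidual / ssTotal;
}

console.log(rSquared(points).toFixed(2)); // "0.81": 81% explained, 19% unexplained
```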
html`<div style="text-align: center; font-size: 28px; margin: 30px 0;"><strong>β = How much y changes for each one-unit increase in x</strong><br/><br/><div style="font-size: 28px;">• This is the <strong>slope</strong> of the regression line<br/>• Larger β = steeper line<br/>• β ≠ how close the points are to the line</div></div>`
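To illustrate that the slope and the closeness of the points are separate things, here is a small sketch with two hypothetical data sets built around the same line, y = 2x + 3. The offsets are chosen by hand so that they change the scatter without changing the fitted slope.

```javascript
// Hypothetical data: both sets are built around the line y = 2x + 3.
const x = [1, 2, 3, 4, 5, 6];
// Offsets sum to zero and are uncorrelated with x, so they widen the
// scatter without tilting the fitted line.
const offsets = [1, -1, 0, 0, -1, 1];
const tightY = x.map((v, i) => 2 * v + 3 + 0.5 * offsets[i]);
const looseY = x.map((v, i) => 2 * v + 3 + 4 * offsets[i]);

const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;

function summarise(xs, ys) {
  const mx = mean(xs), my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  xs.forEach((v, i) => {
    sxy += (v - mx) * (ys[i] - my);
    sxx += (v - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  });
  // With one predictor, R² equals the squared correlation.
  return {beta: sxy / sxx, r2: (sxy * sxy) / (sxx * syy)};
}

const tight = summarise(x, tightY);
const loose = summarise(x, looseY);
console.log(tight.beta.toFixed(2), tight.r2.toFixed(2)); // 2.00 0.99
console.log(loose.beta.toFixed(2), loose.r2.toFixed(2)); // 2.00 0.52
```

Both fits have the same β, but the points sit much closer to the line in the first set.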
html`<div style="background: white; padding: 25px; border-radius: 10px; font-family: 'Times New Roman', serif; font-size: 20px; line-height: 1.7; box-shadow: 0 4px 8px rgba(0,0,0,0.1);"><p><strong>A simple linear regression analysis</strong> was used to investigate whether the temperature on each day (in degrees C) predicted the amount of money earned in ice cream sales (in RM).</p><p>Temperature <strong>significantly and positively predicted</strong> ice cream sales, <em>R²</em> = .36, <em>F</em>(1, 98) = 56.07, <em>p</em> &lt; .001, indicating that temperature <strong>explained 36% of the variance</strong> in ice cream sales.</p><p>The standardised coefficient indicated that as temperature increased by 1 SD, ice cream sales increased by <strong>.60 SD</strong>.</p><hr style="margin: 20px 0;"><div style="background: #f5f5f5; padding: 20px; border-left: 5px solid #2196F3; font-size: 20px; line-height: 1.8;"><strong>Key APA Reporting Elements:</strong><br/><br/>✓ Name the test: "Simple linear regression"<br/>✓ Describe relationship: "significantly predicted"<br/>✓ Report R²: "explained 36% of variance"<br/>✓ Include statistics: F, df, p-value<br/>✓ Interpret standardised coefficient (SD units)</div></div>`
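The numbers in a simple regression write-up constrain one another, which gives a quick consistency check. The sketch below uses the F and df values from the example report. It relies on two standard identities for a regression with an intercept: R² = F·df1 / (F·df1 + df2), and, with a single predictor, the standardised coefficient has magnitude √R². The variable names are illustrative.

```javascript
// Values as reported in the example write-up.
const F = 56.07;
const dfModel = 1;      // one predictor (temperature)
const dfResidual = 98;

// R² recovered from F and its degrees of freedom.
const r2 = (F * dfModel) / (F * dfModel + dfResidual);

// With one predictor, |standardised coefficient| = sqrt(R²).
// The sign comes from the direction of the relationship (positive here).
const stdBeta = Math.sqrt(r2);

console.log(r2.toFixed(2), stdBeta.toFixed(2)); // 0.36 0.60
```

Both values round to the figures in the report, so the reported statistics agree with each other.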
Standardised Beta
The Problem: How much is “a lot”?
html`<div style="font-size: 28px; text-align: center; margin: 30px 0;"><p>Every one-degree increase in temperature gets you another <strong>RM 430</strong></p><p>• RM 430 <br/>• £75 <br/>• $96 <br/>• 0.0000059 BTC <br/></p><p style="color: red; margin-top: 20px;">When we use β, it is <strong>not standardised</strong><br/>It cannot be compared across different measures!</p></div>`
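A minimal sketch of the fix, using a common definition of the standardised coefficient: the raw slope multiplied by SD(x) / SD(y). The temperatures, sales figures and exchange rate below are all hypothetical. Changing the currency changes the raw slope but leaves the standardised coefficient untouched.

```javascript
// Hypothetical daily data for illustration only.
const temperature = [28, 30, 31, 33, 35];             // degrees C
const salesRM = [11000, 12500, 12000, 14000, 14500];  // sales in RM
const RM_TO_GBP = 0.17;                               // assumed exchange rate
const salesGBP = salesRM.map((v) => v * RM_TO_GBP);

const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;

// Sample standard deviation.
const sd = (a) => {
  const m = mean(a);
  return Math.sqrt(a.reduce((s, v) => s + (v - m) ** 2, 0) / (a.length - 1));
};

// Raw (unstandardised) least-squares slope of y on x.
function slope(x, y) {
  const mx = mean(x), my = mean(y);
  const sxy = x.reduce((s, v, i) => s + (v - mx) * (y[i] - my), 0);
  const sxx = x.reduce((s, v) => s + (v - mx) ** 2, 0);
  return sxy / sxx;
}

// Standardised coefficient: the slope expressed in SD units.
const standardised = (x, y) => slope(x, y) * sd(x) / sd(y);

console.log(slope(temperature, salesRM), slope(temperature, salesGBP));
// raw slopes differ by the exchange rate
console.log(standardised(temperature, salesRM), standardised(temperature, salesGBP));
// standardised values match (up to floating-point rounding)
```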