2025-11-10

Introduction

  • Simple Linear Regression models a quantitative response as a straight-line function of one quantitative predictor.
  • Example: predicting house price (dependent variable) from square footage (independent variable).
  • We assume the data follow a line plus random error \(\varepsilon\), and estimate the best-fit line from that model:
    \[ y = \beta_0 + \beta_1 x + \varepsilon \]

Data Overview

We’ll use a small simulated dataset of 50 houses, recording price (in $1000s) and square footage.

head(data)
##       sqft    price
## 1 1000.000 111.5929
## 2 1051.020 120.1188
## 3 1102.041 150.5235
## 4 1153.061 131.7719
## 5 1204.082 136.2250
## 6 1255.102 163.5831
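
A dataset like this could be simulated as in the sketch below. The spacing of the sqft values (about 51.02) and the 48 residual degrees of freedom reported later are consistent with 50 evenly spaced values from 1000 to 3500. The seed, the true coefficients (50 and 0.07) and the noise standard deviation (15) are assumptions, chosen because they appear to reproduce the rows printed above.

```r
# Sketch of a data-generating process for the example.
# ASSUMPTIONS: seed, true intercept/slope, and noise SD are not stated in the slides.
set.seed(123)
n     <- 50
sqft  <- seq(1000, 3500, length.out = n)              # evenly spaced square footage
price <- 50 + 0.07 * sqft + rnorm(n, mean = 0, sd = 15)  # price in $1000s, linear trend plus noise
data  <- data.frame(sqft = sqft, price = price)
head(data)
```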

Scatter Plot (ggplot2)
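
The plot itself is not reproduced here. A minimal ggplot2 sketch of such a scatter plot might look like the following; the simulation settings, axis labels and title are assumptions rather than the original slide's code.

```r
library(ggplot2)

# ASSUMPTION: illustrative simulation settings standing in for the slide's dataset
set.seed(123)
sqft <- seq(1000, 3500, length.out = 50)
data <- data.frame(sqft = sqft, price = 50 + 0.07 * sqft + rnorm(50, sd = 15))

# One point per house: square footage on x, price (in $1000s) on y
p_scatter <- ggplot(data, aes(x = sqft, y = price)) +
  geom_point() +
  labs(title = "House Price vs. Square Footage",
       x = "Square footage", y = "Price ($1000s)")
print(p_scatter)
```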

Fitting the Regression Model

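We fit a linear model by ordinary least squares to obtain estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) of \(\beta_0\) and \(\beta_1\), which give the fitted line:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]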

model <- lm(price ~ sqft, data = data)
summary(model)
## 
## Call:
## lm(formula = price ~ sqft, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.168  -9.333  -1.229   9.988  32.395 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51.413952   6.379256    8.06 1.79e-10 ***
## sqft         0.069601   0.002695   25.83  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.03 on 48 degrees of freedom
## Multiple R-squared:  0.9329, Adjusted R-squared:  0.9315 
## F-statistic: 667.2 on 1 and 48 DF,  p-value: < 2.2e-16
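
To show what lm() computes, the sketch below applies the closed-form least-squares formulas, \(\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\), to the six rows printed by head(data). Because it uses only those six rows, its coefficients will differ from the 50-row fit above; the point is only that the hand calculation and lm() agree with each other.

```r
# First six rows as printed by head(data) (prices in $1000s)
d <- data.frame(
  sqft  = c(1000.000, 1051.020, 1102.041, 1153.061, 1204.082, 1255.102),
  price = c(111.5929, 120.1188, 150.5235, 131.7719, 136.2250, 163.5831)
)

x_bar <- mean(d$sqft)
y_bar <- mean(d$price)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((d$sqft - x_bar) * (d$price - y_bar)) / sum((d$sqft - x_bar)^2)
# Intercept: forces the fitted line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Compare with lm() on the same six rows
fit <- lm(price ~ sqft, data = d)
all.equal(unname(coef(fit)), c(b0, b1))
```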

Regression Line (ggplot2)
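
The figure is not reproduced here. One common way to overlay the fitted line is geom_smooth(method = "lm"), as sketched below; the simulation settings, labels and the choice to show the confidence band are assumptions, not the original slide's code.

```r
library(ggplot2)

# ASSUMPTION: illustrative simulation settings standing in for the slide's dataset
set.seed(123)
sqft <- seq(1000, 3500, length.out = 50)
data <- data.frame(sqft = sqft, price = 50 + 0.07 * sqft + rnorm(50, sd = 15))

# Scatter plot plus the least-squares line; se = TRUE shades a 95% confidence band
p_fit <- ggplot(data, aes(x = sqft, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Fitted Regression Line",
       x = "Square footage", y = "Price ($1000s)")
print(p_fit)
```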

Interactive 3D Visualization (Plotly)

Mathematical Interpretation

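  • \(\beta_0\) (intercept): the predicted price when sqft = 0. Here \(\hat{\beta}_0 \approx 51.41\), about \$51,400, but this is an extrapolation far below the square footages shown in the data.
  • \(\beta_1\) (slope): the estimated change in mean price for each additional square foot.
  • Example (price is measured in \$1000s):
    \[ \hat{\beta}_1 \approx 0.07 \Rightarrow \text{each extra 1 sqft adds about } 0.07 \times \$1000 = \$70 \text{ to the predicted price.} \]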
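
The unit conversion can be checked directly from the coefficients in the summary(model) output. In the sketch below the two estimates are copied from that output; the 2000 sqft house is a hypothetical input chosen for illustration.

```r
# Coefficients copied from the summary(model) output (price is in $1000s)
b0 <- 51.413952
b1 <- 0.069601

# Convert the slope from $1000s per sqft to dollars per sqft
per_sqft_dollars     <- b1 * 1000
per_100_sqft_dollars <- per_sqft_dollars * 100

# Fitted price, in dollars, for a hypothetical 2000 sqft house
pred_2000_dollars <- (b0 + b1 * 2000) * 1000

c(per_sqft = per_sqft_dollars,
  per_100_sqft = per_100_sqft_dollars,
  pred_2000 = pred_2000_dollars)
```

The unrounded slope gives about \$69.60 per square foot, which rounds to the \$70 quoted above, and a fitted price of roughly \$190,616 for the 2000 sqft house.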

Example R Code
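
The code for this slide is not reproduced here. As a stand-in, the following end-to-end sketch simulates data, fits the model and predicts prices for new houses. The simulation settings and the three square footages in new_houses are assumptions made for illustration.

```r
# ASSUMPTION: illustrative simulation settings, not confirmed by the slides
set.seed(123)
sqft <- seq(1000, 3500, length.out = 50)
data <- data.frame(sqft = sqft, price = 50 + 0.07 * sqft + rnorm(50, sd = 15))

# Fit price (in $1000s) as a linear function of square footage
model <- lm(price ~ sqft, data = data)

coef(model)                    # point estimates of intercept and slope
confint(model, level = 0.95)   # 95% confidence intervals for both coefficients

# Predictions for hypothetical new houses, with 95% prediction intervals
new_houses <- data.frame(sqft = c(1500, 2000, 2500))
pred <- predict(model, newdata = new_houses, interval = "prediction", level = 0.95)
cbind(new_houses, pred)
```

The prediction interval (interval = "prediction") covers a single new house and is wider than the confidence interval for the mean price (interval = "confidence").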

Conclusion

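  • Simple Linear Regression is foundational in data analysis and prediction.
  • It quantifies how the mean of a response changes with one predictor; in this example, about \$70 per additional square foot, with \(R^2 \approx 0.93\).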
  • Applications include:
    • Economics (price vs. demand)
    • Engineering (stress vs. strain)
    • Computer science (runtime vs. input size)

References

  • James et al., An Introduction to Statistical Learning
  • R Documentation: ?lm, ?ggplot2, ?plotly