09-17-2025

What is Simple Linear Regression?

Simple linear regression is a method for modeling the relationship between a single predictor and a response.

We will use it to model the relationship between median house price and the average number of rooms in the Boston housing dataset (available in R's MASS package).

The Math

Mathematically, simple linear regression is:

\(y = \beta_0 + \beta_1 x + \varepsilon\), where \(\varepsilon \sim N(0, \sigma^2)\)

  • \(y\): median house price
  • \(x\): average number of rooms
  • \(\beta_0\): intercept (expected price when \(x = 0\))
  • \(\beta_1\): slope (expected change in price per additional room)
  • \(\varepsilon\): error term
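To make the model concrete, here is a minimal sketch that simulates data from \(y = \beta_0 + \beta_1 x + \varepsilon\) and then recovers the coefficients with `lm()`. The "true" values (`beta0`, `beta1`, `sigma`), the sample size, and the range of `x` are made up for illustration and are not taken from the Boston data.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical "true" parameters, chosen only for illustration
beta0 <- -30
beta1 <- 9
sigma <- 2

n   <- 200
x   <- runif(n, min = 4, max = 8)      # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # error term, N(0, sigma^2)
y   <- beta0 + beta1 * x + eps         # response generated by the model

# Least-squares fit; the estimates should land near beta0 and beta1
fit <- lm(y ~ x)
coef(fit)
```

With a different seed the estimates change slightly, which is the sampling variability that the error term introduces.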

House Price vs. Rooms
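A plot like this one can be sketched with ggplot2 as follows. This assumes the `Boston` data frame from the MASS package, with `rm` as the average number of rooms and `medv` as the median house price; the point transparency and axis labels are illustrative choices, not necessarily the ones used for the slide.

```r
library(MASS)     # provides the Boston data frame
library(ggplot2)

# Scatter plot of median price against rooms, with a least-squares line
p <- ggplot(Boston, aes(x = rm, y = medv)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +  # explicit formula avoids the console message
  labs(
    title = "House Price vs. Rooms",
    x = "Average number of rooms (rm)",
    y = "Median house price (medv)"
  )

print(p)
```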


Residuals vs. Fitted
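A residuals-versus-fitted plot can be sketched in base R as below. The model call `lm(medv ~ rm, data = Boston)` is an assumption about how `model_boston` was fitted. If the linear model is adequate, the points should scatter around zero with no clear pattern.

```r
library(MASS)  # provides the Boston data frame

# Assumed fit: median price on average number of rooms
model_boston <- lm(medv ~ rm, data = Boston)

# Residuals against fitted values, with a reference line at zero
plot(
  fitted(model_boston), resid(model_boston),
  xlab = "Fitted values", ylab = "Residuals",
  main = "Residuals vs. Fitted"
)
abline(h = 0, lty = 2)
```

Calling `plot(model_boston, which = 1)` gives R's built-in version of the same diagnostic.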

Multiple Regression: Median House Price, Average Number of Rooms, and Lower Status Population
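A minimal sketch of the multiple regression, assuming the lower status population is the `lstat` column of `Boston` (percentage of lower status population). The names `model_simple` and `model_multi` are hypothetical, introduced only for this comparison.

```r
library(MASS)  # provides the Boston data frame

# One predictor versus two predictors
model_simple <- lm(medv ~ rm, data = Boston)
model_multi  <- lm(medv ~ rm + lstat, data = Boston)

# Coefficients of the two-predictor model
coef(model_multi)

# Adjusted R-squared for both models, to compare how much variation each explains
c(
  simple = summary(model_simple)$adj.r.squared,
  multi  = summary(model_multi)$adj.r.squared
)
```

Each coefficient in the larger model is read as the expected change in median price for a one-unit change in that predictor, holding the other predictor fixed.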

Predictions (R Code Example)

We can predict with the fitted line: \[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \] where \(\hat{\beta}_1\) estimates the expected change in median price for each additional room.

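# Assumed setup: the model fit is not shown here; this call is consistent with the coefficients below
library(MASS)                                 # provides the Boston data frame
model_boston <- lm(medv ~ rm, data = Boston)  # medv: median price, rm: average rooms

# Estimated intercept and slope
coef(model_boston)
## (Intercept)          rm 
##  -34.670621    9.102109

# Predict median price for 5, 6, and 7 rooms, with 95% prediction intervals
new_data <- data.frame(rm = c(5, 6, 7))
predict(model_boston, newdata = new_data, interval = "prediction")
##        fit       lwr      upr
## 1 10.83992 -2.214474 23.89432
## 2 19.94203  6.928435 32.95563
## 3 29.04414 16.019333 42.06895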
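As a quick check, the `fit` column can be reproduced by hand by plugging the printed coefficients into \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\). The variable names `b0`, `b1`, `rooms`, and `fit_by_hand` are introduced here only for illustration.

```r
# Coefficients copied from the coef() output
b0 <- -34.670621  # intercept
b1 <- 9.102109    # slope for rm

rooms <- c(5, 6, 7)

# Fitted values from the regression line; these agree with the fit column
fit_by_hand <- b0 + b1 * rooms
fit_by_hand
```

The `lwr` and `upr` columns cannot be recovered this way, because prediction intervals also depend on the residual variance and on how far each `rm` value is from the sample mean.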

Takeaways

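  • Positive Relationship: On average, the more rooms a house has, the higher the median house price. The fitted slope is about 9.1 per additional room, in the dataset's price units of $1,000.
  • Limitations: Other factors also affect housing prices, such as age, lower socioeconomic status, location, and more, and a single-predictor model leaves them out.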