2026-02-07

Linear Regression

Modeling Relationships Between Variables

What Linear Regression Does

Linear regression models how a predictor variable \(x\) helps explain variation in a response variable \(y\).

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

\(\beta_0\) = theoretical y‑intercept
\(\beta_1\) = theoretical slope
\(\varepsilon\) = error term (also called the disturbance or noise). It captures everything the model cannot explain: measurement error, randomness, natural variability, and variables omitted from the regression.
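
To make the roles of \(\beta_0\), \(\beta_1\), and \(\varepsilon\) concrete, here is a minimal sketch that simulates data from the model and then fits it. The parameter values (intercept 2, slope 3, error standard deviation 1), the sample size, and the name `sim_fit` are illustrative assumptions, not values from this presentation.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon.
# All numeric values below are assumptions chosen for illustration.
set.seed(42)
n      <- 200
beta_0 <- 2   # assumed true intercept
beta_1 <- 3   # assumed true slope
sigma  <- 1   # assumed standard deviation of the error term

x       <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = sigma)  # the part the line cannot explain
y       <- beta_0 + beta_1 * x + epsilon

# Fit the line; the estimates should land close to beta_0 and beta_1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match the true values exactly, because each simulated \(y\) includes its own draw of \(\varepsilon\).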

Initial Assumptions of the Model

Linear regression relies on some key assumptions about the relationship between the predictor, the response, and the error term. The first is linearity: the expected value of the response is a linear function of the predictor, as shown below:

\(\mathbb{E}[Y \mid X = x] = \beta_0 + \beta_1 x\)

\(\mathbb{E}[Y \mid X = x]\) = the expected value of \(Y\) when the predictor \(X\) takes a specific value \(x\).

\(\varepsilon \sim i.i.d. N(0, \sigma^2)\)

\(i.i.d.\) = The errors are independent (one error does not influence another), and identically distributed (all errors come from the same distribution).

\(N(0, \sigma^2)\) = Each error follows a normal distribution with mean \(0\) and variance \(\sigma^2\). Because \(\sigma^2\) is the same for every value of \(x\), this also includes the Constant Variance Assumption (homoscedasticity).
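
These assumptions are usually checked informally by looking at the residuals of a fitted model. The sketch below does this on simulated data so that it runs on its own. The data-generating values and the names `fit` and `res` are assumptions for illustration, and the same two plots could be drawn for any fitted `lm` object.

```r
# Simulated example (all values are illustrative assumptions)
set.seed(1)
x   <- runif(150, min = 0, max = 10)
y   <- 1 + 2 * x + rnorm(150, mean = 0, sd = 1.5)
fit <- lm(y ~ x)
res <- resid(fit)

# Residuals vs. fitted values: under linearity and constant variance,
# the points should scatter around 0 with no trend and roughly even spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: under normality, the points should fall near the line
qqnorm(res)
qqline(res)
```

A funnel shape in the first plot would suggest non-constant variance, and a curve would suggest the relationship is not linear.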

Summary of Model

library(MASS)  # assumption: Boston is loaded from MASS (ISLR2 also provides a Boston data set)
model <- lm(medv ~ rm, data = Boston)
summary(model)
## 
## Call:
## lm(formula = medv ~ rm, data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.346  -2.547   0.090   2.986  39.433 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -34.671      2.650  -13.08   <2e-16 ***
## rm             9.102      0.419   21.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.616 on 504 degrees of freedom
## Multiple R-squared:  0.4835, Adjusted R-squared:  0.4825 
## F-statistic: 471.8 on 1 and 504 DF,  p-value: < 2.2e-16
# Sort by rm so the fitted values trace the regression line from left to right
sorted <- Boston[order(Boston$rm), ]
predicted_values <- predict(model, newdata = sorted)
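
As a worked reading of the coefficient table above, the fitted line is \(\widehat{medv} = -34.671 + 9.102 \cdot rm\). The sketch below plugs in a hypothetical dwelling with 6 rooms. The coefficients are copied from the printed output, while `rooms` and `pred` are illustrative names, and the units assume the usual Boston coding of `medv` in thousands of dollars.

```r
# Coefficients copied from the summary output above
b0 <- -34.671  # (Intercept)
b1 <- 9.102    # slope for rm

rooms <- 6     # hypothetical number of rooms, chosen for illustration
pred  <- b0 + b1 * rooms
pred           # 19.941, i.e. about $19,900 if medv is in $1000s
```

The slope says that each additional room is associated with an increase of about 9.1 in predicted `medv`. The negative intercept has no practical meaning, since no dwelling in the data has zero rooms.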

Visualizing Linear Regression

Average Number of Rooms vs. Median Housing Value

This plotly plot shows how the average number of rooms (rm) relates to median home value (medv).
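
The figure itself is not reproduced here. A plot of this kind could be built roughly as follows, assuming the plotly and MASS packages are installed. The trace names, axis titles, and the `predicted` column are illustrative choices, not necessarily those used for the original figure.

```r
library(MASS)    # assumption: source of the Boston data
library(plotly)  # assumption: plotly is installed

model  <- lm(medv ~ rm, data = Boston)
sorted <- Boston[order(Boston$rm), ]
sorted$predicted <- predict(model, newdata = sorted)

# Scatter of the observed data, with the fitted regression line on top
fig <- plot_ly(sorted, x = ~rm, y = ~medv,
               type = "scatter", mode = "markers", name = "Observed")
fig <- add_lines(fig, y = ~predicted, name = "Fitted line")
fig <- layout(fig,
              xaxis = list(title = "Average number of rooms (rm)"),
              yaxis = list(title = "Median home value (medv)"))
fig
```

Sorting by `rm` first keeps the line trace from zigzagging, since plotly connects points in row order.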


Lower Socioeconomic Status (LSTAT) vs. Median Home Value

Crime Rate (CRIM) vs. Median Home Value