2025-09-17

What is Simple Linear Regression?

  • Linear regression models the relationship between two variables by fitting a straight line to data.
  • It can be used to quantify the strength of the relationship between the two variables.
  • The best-fit line can be written as y = mx + b, where m is the slope and b is the intercept. This equation can be used to predict the dependent variable for new values of the independent variable.
  • Simple Linear Regression is used in many fields such as finance, technology, and research.
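
The points above can be sketched in a few lines of R. This is a minimal illustration, assuming R's built-in mtcars data set with weight (wt) as the independent variable and mpg as the dependent variable. The names fit, new_cars, and predicted_mpg, and the two example weights, are illustrative choices.

```r
# Fit a straight line mpg = b + m * wt to the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept (b) and slope (m) of the fitted line
coef(fit)

# Predict mpg for new values of the independent variable
# (wt is in units of 1000 lbs; the two weights below are illustrative)
new_cars <- data.frame(wt = c(2.5, 3.5))
predicted_mpg <- predict(fit, newdata = new_cars)
predicted_mpg
```

Because the fitted slope is negative, the heavier of the two hypothetical cars receives the lower predicted mpg.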

Interactive 3D: mpg vs weight & horsepower

The 3D view allows you to see how fuel efficiency (mpg) varies with car weight (wt) and horsepower (hp).
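
One way such a 3D view could be produced is sketched here. It assumes the plotly package is installed and uses the built-in mtcars data. The axis labels and the object name fig are illustrative, and this is not necessarily the code behind the original interactive figure.

```r
# Assumes the plotly package is installed: install.packages("plotly")
library(plotly)

# 3D scatter: weight and horsepower on the horizontal axes, mpg on the vertical axis
fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers"
)

# Label the three axes of the 3D scene
fig <- plotly::layout(
  fig,
  scene = list(
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "Miles Per Gallon")
  )
)

fig
```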

Relationship Between Car Weight and MPG

Heavier cars generally have lower fuel efficiency.
This scatter plot visualizes the negative relationship between weight (x) and miles per gallon (y).
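
The strength and direction of this relationship can be summarized with the Pearson correlation. The following is a small sketch using the built-in mtcars data. The variable name r is an illustrative choice.

```r
# Pearson correlation between car weight and fuel efficiency
r <- cor(mtcars$wt, mtcars$mpg)

# A value close to -1 indicates a strong negative linear relationship
r
```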

Residuals vs Fitted (Model Check)

If the linear model is accurate, residuals should:

  • be centered around 0,
  • have roughly constant spread across all fitted values.
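
A residuals-vs-fitted plot of this kind could be drawn with base R graphics as sketched below, assuming the mpg ~ wt model on mtcars. The names fit, fitted_vals, and resids are illustrative.

```r
# Fit the simple linear model and extract fitted values and residuals
fit <- lm(mpg ~ wt, data = mtcars)
fitted_vals <- fitted(fit)
resids <- resid(fit)

# Plot residuals against fitted values with a reference line at 0
plot(fitted_vals, resids,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# With an intercept in the model, OLS residuals average to zero by construction
mean(resids)
```

Since the overall mean of the residuals is zero by construction, the informative checks are visual: curvature in the point cloud suggests a non-linear relationship, and a funnel shape suggests non-constant spread.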

Model & Assumptions (LaTeX)

We model MPG from a single predictor \(X\) (e.g., weight):

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\quad i=1,\dots,n \]

Assumptions on the errors: \[ \mathbb{E}[\varepsilon_i]=0,\qquad \operatorname{Var}(\varepsilon_i)=\sigma^2,\qquad \operatorname{Cov}(\varepsilon_i,\varepsilon_j)=0\quad (i\ne j) \]

Often for inference we also assume \(\varepsilon_i \sim \mathcal{N}(0,\sigma^2)\).
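
To make the model concrete, the sketch below simulates data that satisfy these assumptions, including normal errors, and then fits a line to them. All numeric values (n, beta0, beta1, sigma, the range of x, and the seed) are hypothetical choices made only for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n     <- 200   # hypothetical sample size
beta0 <- 37    # hypothetical true intercept
beta1 <- -5    # hypothetical true slope
sigma <- 1     # hypothetical error standard deviation

x   <- runif(n, min = 1.5, max = 5.5)   # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)   # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps          # the model equation

# Fit the line; the estimates should land near beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```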

Definitions: \[ \bar X=\frac{1}{n}\sum_{i=1}^n X_i,\qquad \bar Y=\frac{1}{n}\sum_{i=1}^n Y_i \]

OLS Estimates & Fit (LaTeX)

Closed-form estimators: \[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y)} {\sum_{i=1}^n (X_i-\bar X)^2} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)},\qquad \hat{\beta}_0=\bar Y-\hat{\beta}_1\bar X \]
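
The closed-form estimators can be checked numerically. The sketch below computes them by hand for the built-in mtcars data (X = wt, Y = mpg) and compares the result with lm(). The variable names are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Sample means
x_bar <- mean(x)
y_bar <- mean(y)

# Slope and intercept from the closed-form expressions
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# Equivalent slope as sample covariance over sample variance
# (the 1/(n - 1) factors in cov() and var() cancel)
beta1_cov <- cov(x, y) / var(x)

# Hand-computed estimates next to the ones from lm()
c(intercept = beta0_hat, slope = beta1_hat)
coef(lm(mpg ~ wt, data = mtcars))
```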

Fitted values and residuals: \[ \hat Y_i=\hat\beta_0+\hat\beta_1 X_i,\qquad e_i=Y_i-\hat Y_i \]

Goodness of fit: \[ R^2 =1-\frac{\sum_{i=1}^n e_i^2}{\sum_{i=1}^n (Y_i-\bar Y)^2} \]
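
As a worked check of the fitted values, residuals, and R^2 formulas, the sketch below evaluates them for the mpg ~ wt model on mtcars and compares the result with the value reported by summary(). The variable names are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

y     <- mtcars$mpg
y_hat <- fitted(fit)   # fitted values
e     <- y - y_hat     # residuals

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r_squared <- 1 - sum(e^2) / sum((y - mean(y))^2)
r_squared

# Same quantity as reported by R
summary(fit)$r.squared

# In simple linear regression, R^2 equals the squared correlation of X and Y
cor(mtcars$wt, mtcars$mpg)^2
```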

R Code: Scatterplot with Regression Line (mpg ~ wt)

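# ggplot2 provides ggplot(), geom_point(), geom_smooth(), labs(), and theme_minimal()
library(ggplot2)

# Scatter plot of mpg against weight with a fitted regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 3) +
  # method = "lm" fits a straight line; se = TRUE draws its confidence band
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    x = "Car Weight (1000 lbs)",
    y = "Miles Per Gallon",
    title = "Scatterplot with Regression Line"
  ) +
  theme_minimal()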