2026-02-08

Linear Regression

Linear regression is a statistical method used to model and predict a numeric outcome using one or more input variables.

We use linear regression to understand relationships between variables, make predictions, and test whether a relationship is statistically significant.

Examples:

  • Fuel efficiency vs car weight
  • House prices vs square footage
  • Sales vs advertising spend

This presentation will:

  • explain the model
  • fit models in R
  • visualize results
  • interpret the data

The Linear Regression Model

We model the relationship between an input variable \(X\) and an outcome \(Y\) as:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where:

  • \(Y_i\) is the outcome for observation \(i\)
  • \(X_i\) is the predictor (input)
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon_i\) is random error, assumed to have mean zero
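For a single predictor, the least-squares estimates of \(\beta_0\) and \(\beta_1\) have a closed form: \(\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The R sketch below assumes ordinary least squares (the criterion lm() minimizes) and borrows the mtcars variables used later in this presentation; it computes both estimates by hand and compares them with lm(). The names x, y, b0 and b1 are illustrative.

```r
# Predictor (car weight, 1000 lbs) and outcome (miles per gallon)
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Compare the hand-computed values with lm()
fit <- lm(mpg ~ wt, data = mtcars)
c(b0 = b0, b1 = b1)   # b0 about 37.29, b1 about -5.34
coef(fit)
```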

Interpretation of \(\beta_1\)

A one-unit increase in \(X\) is associated with a change of \(\beta_1\) in the expected (average) value of \(Y\).
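To make this interpretation concrete, the sketch below fits the mtcars model described on the following slides and predicts mpg at two weights one unit apart. The weights 3 and 4 (that is, 3000 and 4000 lbs, since wt is recorded in 1000 lbs) are arbitrary illustrative values, and new_cars is a hypothetical data frame.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars whose weights differ by one unit (1000 lbs)
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(fit, newdata = new_cars)

# The difference between the two predictions equals the slope estimate
diff(pred)
coef(fit)["wt"]
```

Because the model is linear, the same difference appears for any pair of weights one unit apart.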

Example Dataset: mtcars

R's built-in mtcars dataset contains measurements of 11 variables for 32 cars.

For our linear regression example, we will model:

Response variable (\(Y\)): Miles per gallon (mpg)

Predictor variable (\(X\)): Car weight (wt)

This helps us answer the question: Do heavier cars get worse gas mileage?
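Before fitting a model, it helps to look at the two variables directly. A minimal base R sketch; car_data is a hypothetical name (avoiding `cars`, which is already a built-in R dataset).

```r
# Keep only the two variables used in the model
car_data <- mtcars[, c("mpg", "wt")]

nrow(car_data)   # 32 cars
head(car_data)   # first rows: mpg and weight (1000 lbs)

# A negative correlation suggests heavier cars tend to get fewer miles per gallon
cor(car_data$wt, car_data$mpg)   # about -0.87
```

For a single predictor, the square of this correlation equals the R-squared reported by the regression.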

Fitting the Model in R

We can fit a regression model using the lm() function in R:

# Fit mpg as a linear function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
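The printed summary is easy to read, but the same numbers can also be extracted as R objects for reporting. A short sketch using the standard stats functions coef(), summary() and confint(); the object names fit and s are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

# Coefficient table: estimates, standard errors, t values, p-values
coef(s)

# Proportion of the variance in mpg explained by weight
s$r.squared   # about 0.753

# 95% confidence intervals for the intercept and slope
confint(fit, level = 0.95)
```

The confidence interval for the wt slope lies entirely below zero, consistent with its very small p-value.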

MPG vs Weight
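A scatter plot of mpg against weight with the fitted line overlaid can be drawn in base R as sketched below. The original slide may have used a different plotting package, so the labels, point symbol and colour here are assumptions.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Scatter plot of mpg against weight
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "MPG vs Weight", pch = 19)

# Overlay the fitted regression line
abline(fit, col = "red", lwd = 2)
```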

Residuals vs Fitted Values

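The residual for each observation is the difference between its observed and fitted values: \[e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i\] If the linear model is appropriate, the residuals should scatter randomly around zero with no clear pattern against the fitted values.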
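The residuals can be computed directly from this definition and checked against the values lm() stores. A minimal sketch; e is an illustrative name, and the plot reproduces a residuals-vs-fitted view in base R rather than any specific styling from the original slide.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals by hand: observed mpg minus fitted mpg
e <- mtcars$mpg - fitted(fit)

# Matches the residuals stored in the model object
all.equal(e, resid(fit))   # TRUE

# With an intercept in the model, OLS residuals sum to zero (up to rounding)
sum(e)

# Residuals vs fitted values: look for curvature or changing spread
plot(fitted(fit), e, xlab = "Fitted mpg", ylab = "Residual", pch = 19)
abline(h = 0, lty = 2)
```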

Interactive Plot (Plotly)

Below is an interactive scatter plot created with plotly.

Takeaways

  • Linear regression models an outcome as a linear function of one or more predictors
  • The slope gives the expected change in \(Y\) for a one-unit increase in \(X\)
  • Linear regression is a widely used tool for understanding data and making predictions; in the mtcars example, each additional 1000 lbs of weight is associated with about 5.34 fewer miles per gallon