Linear regression is a statistical method to model the relationship between a dependent variable (y) and one or more independent variables (x).
We will explore both simple and multiple linear regression using the mtcars dataset.
2025-09-17
We use the mtcars dataset and select variables:
mpg: Miles per gallon (response)
wt: Car weight in 1000 lbs (predictor)
hp: Horsepower (predictor)

##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

##                    mpg    wt  hp
## Mazda RX4         21.0 2.620 110
## Mazda RX4 Wag     21.0 2.875 110
## Datsun 710        22.8 2.320  93
## Hornet 4 Drive    21.4 3.215 110
## Hornet Sportabout 18.7 3.440 175
## Valiant           18.1 3.460 105
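The code that produced this preview is not shown in the output. A minimal sketch of the selection step follows, assuming base R subsetting rather than the dplyr call the original presumably used; the name `dat` is a placeholder chosen for illustration.

```r
# mtcars ships with base R (package 'datasets').
data(mtcars)

# Keep the response (mpg) and the two predictors (wt, hp).
# 'dat' is a hypothetical name, not taken from the source.
dat <- mtcars[, c("mpg", "wt", "hp")]

# Show the first six rows.
print(head(dat))
```

With dplyr loaded, `select(mtcars, mpg, wt, hp)` would be an equivalent way to pick the same columns.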
Model a response \(y\) using a single predictor \(x\):
\[ y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \]
The OLS estimates minimize the sum of squared residuals:
\[ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
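For one predictor, this minimization has a well-known closed-form solution: the slope is the ratio of the centered cross-products of \(x\) and \(y\) to the centered sum of squares of \(x\), and the intercept follows from the means. The sketch below checks that against `lm()`, assuming `wt` as the single predictor; the names `b0`, `b1`, and `fit_simple` are illustrative.

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form OLS estimates for simple linear regression.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same model fitted with lm().
fit_simple <- lm(mpg ~ wt, data = mtcars)

print(c(intercept = b0, slope = b1))
print(coef(fit_simple))
```

Both print statements should report the same intercept and slope, up to floating-point rounding.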
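With several predictors collected in a design matrix \(X\) (one row per observation, including a column of ones for the intercept):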
\[ y = X \beta + \varepsilon \]
Estimator:
\[ \hat{\beta} = (X^\top X)^{-1} X^\top y \]
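As a rough check of this estimator, the sketch below builds the design matrix for `wt` and `hp` and solves the normal equations directly. It solves the linear system instead of forming the inverse explicitly, which is a numerical choice made here and not something the source specifies.

```r
data(mtcars)

# Design matrix with an intercept column plus wt and hp.
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Solve (X'X) beta = X'y; algebraically the same as (X'X)^{-1} X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Reference fit from lm() for comparison.
fit <- lm(mpg ~ wt + hp, data = mtcars)

print(drop(beta_hat))
print(coef(fit))
```

`lm()` itself uses a QR decomposition internally, so the two results agree only up to floating-point error.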
Standard error of coefficient \(j\):
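\[ SE(\hat{\beta}_j) = \sqrt{ \hat{\sigma}^2 \Big[ (X^\top X)^{-1} \Big]_{jj} }, \quad \hat{\sigma}^2 = \frac{\lVert y - X \hat{\beta} \rVert^2}{n - p} \]
Here \(n\) is the number of observations and \(p\) is the number of columns of \(X\) (predictors plus intercept).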
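The standard errors can be computed by hand as the square root of the residual variance estimate times the diagonal of \((X^\top X)^{-1}\), where the variance estimate is the residual sum of squares divided by the residual degrees of freedom. A minimal sketch, assuming the `wt` and `hp` model; variable names are illustrative.

```r
data(mtcars)
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg
n <- nrow(X)  # number of observations
p <- ncol(X)  # number of coefficients, intercept included

# OLS coefficients and residuals.
XtX_inv <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)
res <- y - X %*% beta_hat

# Residual variance estimate with n - p degrees of freedom.
sigma2_hat <- sum(res^2) / (n - p)

# Standard errors: square root of sigma2_hat times diag((X'X)^{-1}).
se <- sqrt(sigma2_hat * diag(XtX_inv))

fit <- lm(mpg ~ wt + hp, data = mtcars)
print(se)
print(summary(fit)$coefficients[, "Std. Error"])
```

The hand-computed values should match the `Std. Error` column reported by `summary()`.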
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  37.2      1.60        23.3  2.57e-20
## 2 wt           -3.88     0.633       -6.13 1.12e- 6
## 3 hp           -0.0318   0.00903     -3.52 1.45e- 3
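mpg decreases as weight increases: holding hp fixed, each additional 1000 lbs is associated with about 3.88 fewer miles per gallon.
Both wt and hp are statistically significant predictors of mpg at the 1% level; holding weight fixed, each additional horsepower is associated with about 0.032 fewer miles per gallon.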
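To make the reading of the coefficients concrete, the sketch below refits the model with base R and compares predictions for two hypothetical cars that differ only in weight. The tibble layout of the table suggests it came from `broom::tidy()`, but that is an assumption; the base R coefficient matrix is used here to avoid extra packages.

```r
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: estimate, standard error, t statistic, p-value.
print(summary(fit)$coefficients)

# Two hypothetical cars with the same horsepower, 1000 lbs apart.
new_cars <- data.frame(wt = c(3, 4), hp = c(110, 110))
pred <- predict(fit, newdata = new_cars)

# The gap in predicted mpg equals the wt coefficient.
gap <- unname(diff(pred))
print(gap)
print(coef(fit)[["wt"]])
```

Because the model is linear, the predicted gap is the same whatever horsepower value the two cars share.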