Linear regression is a statistical method to model the relationship between a dependent variable (y) and one or more independent variables (x).
We will explore both simple and multiple linear regression using the mtcars dataset.
2025-09-17
We use the mtcars dataset and select three variables:
mpg: Miles per gallon (response)
wt: Car weight in 1000 lbs (predictor)
hp: Horsepower (predictor)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
##                    mpg    wt  hp
## Mazda RX4         21.0 2.620 110
## Mazda RX4 Wag     21.0 2.875 110
## Datsun 710        22.8 2.320  93
## Hornet 4 Drive    21.4 3.215 110
## Hornet Sportabout 18.7 3.440 175
## Valiant           18.1 3.460 105
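The code that produced this preview is not shown in the text. A minimal sketch that yields the same six rows, assuming base R subsetting and a hypothetical object name `cars`, could look like this:

```r
# mtcars ships with base R (package 'datasets'), so no download is needed.
# Keep only the response (mpg) and the two predictors (wt, hp).
# With dplyr loaded, select(mtcars, mpg, wt, hp) would be an equivalent call.
cars <- mtcars[, c("mpg", "wt", "hp")]

# Show the first six rows, as in the preview above.
head(cars)
```

Subsetting by column name keeps the car names as row names, which is why they appear in the leftmost column of the printout.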
Model a response \(y\) using a single predictor \(x\):
\[ y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \]
OLS estimates minimize the sum of squared residuals:
\[ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
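For a single predictor, the least-squares problem has a well-known closed-form solution: the slope is the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept makes the line pass through the point of means. The sketch below regresses mpg on wt and checks that solution against `lm()`. The choice of wt as the single predictor and the variable names `b0`, `b1`, and `fit_simple` are assumptions for illustration.

```r
x <- mtcars$wt   # predictor: weight in 1000 lbs
y <- mtcars$mpg  # response: miles per gallon

# Closed-form least-squares estimates for one predictor.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same model fitted with R's built-in lm().
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Both rows should agree up to floating-point error.
rbind(closed_form = c(b0, b1), lm = unname(coef(fit_simple)))
```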
For several predictors collected in a design matrix \(X\) (with a leading column of ones for the intercept):
\[ y = X \beta + \varepsilon \]
The OLS estimator is:
\[ \hat{\beta} = (X^\top X)^{-1} X^\top y \]
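The matrix formula can be evaluated directly and compared with `lm()`. This is a sketch for illustration only: the names `X`, `beta_hat`, and `fit` are assumptions, and solving the normal equations this way is less numerically stable than the QR decomposition that `lm()` uses internally.

```r
# Design matrix: a column of ones for the intercept, then wt and hp.
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Solve (X'X) beta = X'y; crossprod(X) is t(X) %*% X.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Reference fit with lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The two columns should match up to floating-point error.
cbind(normal_eq = drop(beta_hat), lm = coef(fit))
```

Passing the right-hand side to `solve()` avoids forming the explicit inverse, which is the usual recommendation when only the solution is needed.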
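Standard error of coefficient \(j\):
\[ SE(\hat{\beta}_j) = \sqrt{ \hat{\sigma}^2 \Big[ (X^\top X)^{-1} \Big]_{jj} } \]
where \(\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^n (y_i - x_i^\top \hat{\beta})^2\) is the residual variance estimate, \(x_i^\top\) is row \(i\) of \(X\), and \(p\) is the number of columns of \(X\).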
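A standard error is the square root of the estimated variance of a coefficient, that is, of \(\hat{\sigma}^2\) times the \(j\)-th diagonal entry of \((X^\top X)^{-1}\). The sketch below computes it by hand and compares it with the values reported by `summary()`. The names `sigma2_hat`, `se_manual`, and `se_lm` are assumptions for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)  # design matrix, including the intercept column
n <- nrow(X)            # number of observations
p <- ncol(X)            # number of estimated coefficients

# Residual variance estimate with n - p degrees of freedom.
sigma2_hat <- sum(residuals(fit)^2) / (n - p)

# Square root of sigma^2 times the diagonal of (X'X)^{-1}.
se_manual <- sqrt(sigma2_hat * diag(solve(crossprod(X))))

# Standard errors as reported by summary.lm().
se_lm <- summary(fit)$coefficients[, "Std. Error"]

cbind(manual = se_manual, lm = se_lm)
```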
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  37.2      1.60        23.3  2.57e-20
## 2 wt           -3.88     0.633       -6.13 1.12e- 6
## 3 hp           -0.0318   0.00903     -3.52 1.45e- 3
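The table has the shape returned by `broom::tidy()` on a fitted model, though the original call is not shown. As a base-R sketch of where the statistic and p.value columns come from: the statistic is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with \(n - p\) degrees of freedom. The variable names below are assumptions for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

estimate  <- coefs[, "Estimate"]
std_error <- coefs[, "Std. Error"]

# t statistic: estimate divided by its standard error.
statistic <- estimate / std_error

# Two-sided p-value from the t distribution with residual degrees of freedom.
p_value <- 2 * pt(abs(statistic), df = df.residual(fit), lower.tail = FALSE)

data.frame(term = names(estimate), estimate, std_error, statistic, p_value,
           row.names = NULL)
```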
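mpg decreases as weight increases: holding hp fixed, each additional 1000 lbs is associated with about 3.88 fewer miles per gallon.
Both wt and hp have a statistically significant negative association with mpg (p ≈ 1.1e-6 and p ≈ 1.5e-3, respectively).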