What is Linear Regression?
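Linear regression models the relationship between a dependent variable (y) and one or more independent variables (x) as a linear function. With a single predictor this is a straight line, and the simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is a random error term.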
We’ll use R’s built-in mtcars dataset to explore the relationship between car weight (wt, measured in 1000 lbs) and fuel efficiency in miles per gallon (mpg).
head(mtcars[, c("wt", "mpg")])
##                      wt  mpg
## Mazda RX4         2.620 21.0
## Mazda RX4 Wag     2.875 21.0
## Datsun 710        2.320 22.8
## Hornet 4 Drive    3.215 21.4
## Hornet Sportabout 3.440 18.7
## Valiant           3.460 18.1
We can visualize the relationship between wt and mpg.
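The code for this scatter plot isn't shown. A minimal sketch, assuming ggplot2 (one of the packages listed at the end) was used; the axis labels are illustrative:

```r
library(ggplot2)

# Scatter plot of fuel efficiency against weight (wt is in 1000 lbs)
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p_scatter)
```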
We’ll fit a simple linear regression model predicting mpg from weight.
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
Let’s add the regression line to the scatter plot.
## `geom_smooth()` using formula = 'y ~ x'
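The plot with the regression line can be sketched by adding a `geom_smooth()` layer to the scatter plot. This assumes `method = "lm"` was used; giving a method without an explicit `formula` is consistent with the message above. The axis labels are illustrative:

```r
library(ggplot2)

# Scatter plot plus the least-squares line; method = "lm" fits y ~ x.
# se = TRUE (the default) also shades a confidence band around the line.
p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p_fit)
```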
From the coefficient estimates in the model output, the fitted regression line is:
\[ \hat{y} = 37.285 - 5.344x \]
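The rounded coefficients in this equation can be pulled directly from the model object with `coef()` rather than copied from the printed summary. A short sketch:

```r
# Fit the simple linear regression of mpg on wt
model <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope, rounded to three decimals as in the fitted equation
b <- round(coef(model), 3)
print(b)
```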
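This means each additional 1000 lbs of car weight is associated with a decrease of about 5.34 mpg in fuel efficiency, on average. With an R-squared of 0.7528, weight alone accounts for roughly 75% of the variation in mpg across the 32 cars.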
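To make the slope concrete, the sketch below predicts mpg for two hypothetical cars weighing 3000 and 4000 lbs (`wt = 3` and `wt = 4`, values chosen only for illustration). The difference between the two predictions equals the slope.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Predicted mpg for hypothetical cars weighing 3000 and 4000 lbs
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
print(pred)

# One extra unit of wt (1000 lbs) changes the prediction by the slope
print(diff(pred))
```

From the rounded equation, the first prediction is roughly 37.285 - 5.344 × 3 ≈ 21.25 mpg.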
Let’s visualize mpg, wt, and hp (horsepower) together in 3D.
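The code for the 3D plot isn't shown either. A minimal sketch, assuming plotly's `plot_ly()` with a `scatter3d` trace was used; the axis titles are illustrative:

```r
library(plotly)

# Interactive 3D scatter of weight, horsepower, and fuel efficiency
p3d <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
               type = "scatter3d", mode = "markers") %>%
  layout(scene = list(
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "Miles per gallon")
  ))
```

Printing `p3d` in an interactive session or an R Markdown document renders the rotatable plot.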
Least squares chooses \(\beta_0\) and \(\beta_1\) to minimize the sum of squared errors (SSE):
\[ SSE = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \]
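To see that least squares picks the line with the smallest SSE, the sketch below defines a hypothetical helper `sse()` that evaluates the formula above for any intercept and slope, then compares the SSE at the `lm()` estimates with the SSE of two slightly shifted lines.

```r
model <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# Hypothetical helper: SSE for a candidate intercept b0 and slope b1
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

b_hat <- coef(model)
sse_ols <- sse(b_hat[[1]], b_hat[[2]])

# Nudging either coefficient away from the least-squares fit raises SSE
sse_alt <- c(
  sse(b_hat[[1]] + 0.5, b_hat[[2]]),
  sse(b_hat[[1]], b_hat[[2]] - 0.2)
)
print(sse_ols)
print(sse_alt)
```

`sse_ols` matches the model's residual sum of squares, and both shifted lines give a larger SSE.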
To find the estimates, set both partial derivatives equal to zero:
\[ \frac{\partial SSE}{\partial \beta_0} = 0, \quad \frac{\partial SSE}{\partial \beta_1} = 0 \]
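Carrying out the differentiation (chain rule on each squared term) gives:
\[ \frac{\partial SSE}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \]
\[ \frac{\partial SSE}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \]
Dividing by \(-2\) and rearranging yields two linear equations in \(\beta_0\) and \(\beta_1\), the normal equations:
\[ n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \quad \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \]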
Solving these two equations simultaneously gives the least squares estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
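Solving the equations gives the standard closed form \(\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A quick sketch that computes these by hand for mtcars and compares them with the coefficients from `lm()`:

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least squares estimates for simple linear regression
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# Compare with the coefficients reported by lm()
fit <- lm(mpg ~ wt, data = mtcars)
print(c(b0_hat, b1_hat))
print(coef(fit))
```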
Thank you for viewing!
Data: mtcars (base R). Packages: ggplot2, plotly, dplyr.