This presentation covers simple linear regression, from its definition to fitting and visualizing it on simulated sample data and on the mtcars dataset.
2024-10-17
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
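Where \(y\) is the response variable, \(x\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the random error term. The coefficients are estimated by ordinary least squares, which minimizes the sum of squared residuals: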
\[ \min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
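For reference, this minimization has a well-known closed-form solution. It is written here with \(\bar{x}\) and \(\bar{y}\) for the sample means and hats for the estimates, notation that is assumed for illustration and does not appear elsewhere in these slides:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]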
library(ggplot2)
library(plotly)
library(tidyr)
library(dplyr)
# Sample data
set.seed(123)
x <- 1:100
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 30)
data <- data.frame(x, y)
##
## Call:
## lm(formula = y ~ x, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -73.607 -16.571  -1.039  19.455  62.846
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.90788    5.52862   0.164     0.87
## x            3.07533    0.09505  32.356   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.44 on 98 degrees of freedom
## Multiple R-squared:  0.9144, Adjusted R-squared:  0.9135
## F-statistic:  1047 on 1 and 98 DF,  p-value: < 2.2e-16
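The intercept is the predicted value of y when x = 0. The slope shows how much y changes, on average, for a one-unit increase in x. R-squared is the proportion of the variance in y that the fitted line explains. Here the estimated slope (3.08) is close to the value of 3 used to simulate the data, and an R-squared of 0.914 means the line explains about 91% of the variance in y.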
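As a rough check on this interpretation, the sketch below refits the simulated data and recomputes the slope, intercept, and R-squared by hand. The object names `fit`, `b0`, `b1`, and `r2` are illustrative and do not come from the original slides.

```r
# Recreate the simulated data from the slides
set.seed(123)
x <- 1:100
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 30)
data <- data.frame(x, y)

# Fit the model with the call shown in the summary output
fit <- lm(y ~ x, data = data)

# Closed-form least squares estimates (illustrative names)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# R-squared: 1 - residual sum of squares / total sum of squares
r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

# Compare the hand-computed values with those reported by lm()
all.equal(unname(coef(fit)), c(b0, b1))
all.equal(summary(fit)$r.squared, r2)
```

Both comparisons should agree up to floating-point tolerance, since `lm()` solves the same least squares problem.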
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
  labs(
    title = "MPG vs. Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon (MPG)"
  ) +
  theme_minimal()
## Fitting a linear regression to the mtcars data
mtcars_model <- lm(mpg ~ wt, data = mtcars)
summary(mtcars_model)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
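One way to put these coefficients to use is to predict MPG for a given weight. The sketch below does this for a hypothetical car weighing 3,000 lbs. The `new_car` data frame and the `pred` and `by_hand` names are assumptions made for illustration and are not part of the original slides.

```r
# Refit the model from the slides (mtcars ships with base R)
mtcars_model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new car weighing 3,000 lbs (wt is in units of 1000 lbs)
new_car <- data.frame(wt = 3)

# Point prediction with a 95% prediction interval
pred <- predict(mtcars_model, newdata = new_car, interval = "prediction")
pred

# Same point estimate by hand: intercept + slope * wt
by_hand <- sum(coef(mtcars_model) * c(1, 3))
by_hand
```

Using the rounded coefficients from the summary, the point estimate works out to roughly 37.29 - 5.34 * 3, or about 21.25 MPG. The prediction interval is wider than a confidence interval for the mean because it also accounts for the scatter of individual cars around the line.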
library(ggplot2)
small_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 2) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgreen") +
  labs(
    title = "MPG vs. Weight: Summary Plot",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon (MPG)"
  ) +
  theme_minimal(base_size = 5)
small_plot
## `geom_smooth()` using formula = 'y ~ x'
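In the mtcars dataset, weight has a statistically significant negative association with MPG: each additional 1,000 lbs of weight is associated with a decrease of about 5.3 MPG (p < 0.001), and weight alone explains about 75% of the variance in MPG.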
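To back up this conclusion with numbers, the sketch below extracts a 95% confidence interval and the p-value for the slope, and checks that R-squared equals the squared correlation between the two variables, which holds in simple linear regression. The names `slope_ci`, `slope_p`, and `r` are illustrative only.

```r
# Refit the model from the slides
mtcars_model <- lm(mpg ~ wt, data = mtcars)

# 95% confidence interval for the slope; an interval that lies
# entirely below zero is consistent with a negative relationship
slope_ci <- confint(mtcars_model, "wt", level = 0.95)
slope_ci

# p-value for the slope, taken from the coefficient table
slope_p <- summary(mtcars_model)$coefficients["wt", "Pr(>|t|)"]
slope_p

# In simple linear regression, R-squared equals the squared correlation
r <- cor(mtcars$wt, mtcars$mpg)
all.equal(r^2, summary(mtcars_model)$r.squared)
```

Note that a significant slope describes an association in this sample. It does not by itself show that reducing a car's weight would cause its MPG to rise by that amount.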