Topic: Simple Linear Regression
In this presentation, I will explain what simple linear regression is and use it to model the relationship between two variables in our example dataset.
- Dataset: mtcars
Simple linear regression models a straight-line relationship between a response and a predictor.
Given a response variable \(Y\) and a predictor variable \(X\), let's assume a straight-line relationship between them.
For example:
- Response \(Y\): fuel efficiency (mpg)
- Predictor \(X\): weight of the car (wt)
Here, simple linear regression will help us describe how fuel efficiency changes as the weight of the car changes.
The model for \(Y\) is:
\[ Y = \beta_0 + \beta_1 X + \varepsilon, \]
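where
- \(Y\) is the response,
- \(X\) is the predictor,
- \(\beta_0\) is the intercept,
- \(\beta_1\) is the slope,
- \(\varepsilon\) is a random error term.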
The slope \(\beta_1\) shows how much the mean of \(Y\) changes when \(X\) increases by 1 unit.
From data, we estimate \(\beta_0\) and \(\beta_1\).
The fitted line is:
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X. \]
The slope can be written as:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2}, \]
and the intercept is:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]
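The two formulas above can be checked directly in R. The following is a minimal sketch, assuming the built-in mtcars data with mpg as the response and wt as the predictor. The variable names (x, y, b1_hat, b0_hat) are illustrative and not part of the original slides.

```r
# Illustrative check of the closed-form least-squares estimates.
x <- mtcars$wt   # predictor (weight, in 1000 lbs)
y <- mtcars$mpg  # response (miles per gallon)

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

c(intercept = b0_hat, slope = b1_hat)

# Compare with the estimates returned by lm()
coef(lm(mpg ~ wt, data = mtcars))
```

Both calls should print the same pair of numbers, since lm() computes the same least-squares solution.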
Here, I am using the built-in mtcars dataset.
- mpg = miles per gallon
- wt = weight of the car (1000 lbs)
head(mtcars[, c("mpg", "wt")])
##                    mpg    wt
## Mazda RX4         21.0 2.620
## Mazda RX4 Wag     21.0 2.875
## Datsun 710        22.8 2.320
## Hornet 4 Drive    21.4 3.215
## Hornet Sportabout 18.7 3.440
## Valiant           18.1 3.460
In this example, I will model mpg as a function of wt.
library(ggplot2)

# scatter plot of mpg against wt with a fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Miles per Gallon vs. Weight",
    x = "Weight",
    y = "Miles per Gallon"
  )
This generates a scatter plot of mpg against wt and adds a least-squares regression line using geom_smooth(method = "lm"). Setting se = FALSE hides the confidence band around the line.
## `geom_smooth()` using formula = 'y ~ x'
Conclusion: Heavier cars tend to have lower miles per gallon.
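One way to put a number on this downward trend is the sample correlation between the two variables. This is a small sketch added for illustration, not part of the original slides. The name r_wt_mpg is hypothetical.

```r
# Sample (Pearson) correlation between weight and fuel efficiency
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
r_wt_mpg

# A negative value means mpg tends to fall as wt rises
r_wt_mpg < 0
```

In simple linear regression, the square of this correlation equals the R-squared of the fitted line.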
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
  labs(
    title = "Distribution of Miles per Gallon",
    x = "Miles per Gallon",
    y = "Count"
  )
This generates a simple histogram that shows how mpg values are distributed across the cars.
Most cars have miles per gallon between about 15 and 25.
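The statement about the 15 to 25 range can be checked by counting directly. This is a minimal sketch, assuming the range is taken as inclusive at both ends. The name in_range is illustrative.

```r
# Flag cars whose mpg lies in the 15 to 25 range (inclusive)
in_range <- mtcars$mpg >= 15 & mtcars$mpg <= 25

sum(in_range)   # number of cars in the range
mean(in_range)  # proportion of all cars in the range
```

A proportion above 0.5 supports the reading that most cars fall in this range.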
Now let's make a 3D scatter plot with plotly to see how weight, horsepower, and mpg relate to each other in the mtcars data.
library(plotly)

# 3D scatter plot; points are colored by number of cylinders
plot_ly(
  mtcars,
  x = ~wt,
  y = ~hp,
  z = ~mpg,
  color = ~factor(cyl),
  type = "scatter3d",
  mode = "markers"
)
This shows weight, horsepower and mpg. Points are colored by number of cylinders.
Rotate the plot to see how these three variables relate. Cars with more cylinders (8) tend to be heavier and have lower mpg.
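The pattern across cylinder counts can also be summarised numerically. The following sketch, added for illustration, computes group means with base R. The name by_cyl is hypothetical.

```r
# Mean weight and mean mpg for each cylinder count
by_cyl <- aggregate(cbind(wt, mpg) ~ cyl, data = mtcars, FUN = mean)
by_cyl
```

If the visual impression is right, the row for 8 cylinders should show the largest mean wt and the smallest mean mpg.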
# fit the regression model
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
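Reading the output: the estimated slope of about -5.34 means that each additional 1000 lbs of weight is associated with roughly 5.34 fewer miles per gallon on average, and the R-squared of about 0.75 means the line accounts for roughly three quarters of the variation in mpg. The sketch below shows one way to pull these quantities out of the fitted object and use it for prediction. The weights 3 and 4 (that is, 3000 and 4000 lbs) and the names new_cars and pred are hypothetical choices for illustration.

```r
# Refit the model so this block is self-contained
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(model)

# Predicted mean mpg for two hypothetical cars (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
pred

# The predictions differ by exactly the slope, since wt differs by 1 unit
diff(pred)

# Proportion of variance in mpg explained by wt
summary(model)$r.squared
```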