summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
2024-03-20
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
First, let’s visualize the relationship between car speeds and stopping distances.
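The plotting code is not shown on this slide. A minimal sketch that would produce this kind of plot, assuming ggplot2 and a straight-line smoother (the `geom_smooth()` message below is consistent with `method = "lm"`), might look like this. The object name `speed_dist_plot` and the axis labels are illustrative; the units come from the `cars` dataset documentation.

```r
library(ggplot2)

# Scatter plot of stopping distance against speed, with a fitted straight line.
# method = "lm" is an assumption; the original chunk is not shown.
speed_dist_plot <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")

print(speed_dist_plot)
```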
## `geom_smooth()` using formula = 'y ~ x'
Let’s fit a simple linear regression model using the lm() function in R. This model will help us understand how the stopping distance of cars can be predicted by their speed.
# Fit the linear model
cars_lm <- lm(dist ~ speed, data = cars)

# Display the summary of the model to interpret coefficients
summary(cars_lm)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
Understanding residuals is key to diagnosing the model. Let’s visualize them.
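The residual plot itself is not included here. One common way to draw it, sketched with base R graphics rather than whatever the original slide used, is to plot residuals against fitted values and look for curvature or a funnel shape.

```r
# Refit the model so this block runs on its own
cars_lm <- lm(dist ~ speed, data = cars)

# Residuals vs. fitted values; points should scatter evenly around zero
plot(fitted(cars_lm), resid(cars_lm),
     xlab = "Fitted stopping distance",
     ylab = "Residual",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)  # dashed reference line at zero
```

A visible pattern in this plot would suggest that a straight line does not fully capture the relationship.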
Let’s create an interactive 3D plot using plotly to visualize speed, distance, and car index in the dataset.
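The interactive plot does not survive in this text version. A rough sketch of how such a plot could be built with plotly follows. It assumes that "car index" means the row number of each observation, and the names `cars_indexed` and `cars_3d` are made up for illustration.

```r
library(plotly)

# Add the row number as an explicit column (assumed meaning of "car index")
cars_indexed <- transform(cars, index = seq_len(nrow(cars)))

# 3D scatter plot: speed, stopping distance, and car index
cars_3d <- plot_ly(
  data = cars_indexed,
  x = ~speed, y = ~dist, z = ~index,
  type = "scatter3d", mode = "markers"
)

# Label the three axes
cars_3d <- layout(cars_3d, scene = list(
  xaxis = list(title = "Speed"),
  yaxis = list(title = "Stopping distance"),
  zaxis = list(title = "Car index")
))

cars_3d
```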
Simple linear regression models the relationship between two variables by fitting a linear equation to observed data. The equation is given by:
\[ y = \beta_0 + \beta_1x + \epsilon \]
where:
- \(y\) is the dependent variable,
- \(x\) is the independent variable,
- \(\beta_0\) is the intercept of the regression line,
- \(\beta_1\) is the slope of the regression line, and
- \(\epsilon\) is the error term.
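For reference, the ordinary least squares estimates of the two coefficients have a standard closed form, where \(\bar{x}\) and \(\bar{y}\) denote the sample means (notation introduced here, not in the original slides):

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

With the values reported by R for the cars data (mean speed 15.4, mean distance 42.98, slope 3.9324), the intercept works out as:

\[ \hat{\beta}_0 = 42.98 - 3.9324 \times 15.4 \approx -17.58 \]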
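Simple linear regression on the cars dataset suggests a strong positive linear relationship between car speed and stopping distance: speed accounts for about 65% of the variance in stopping distance (multiple R-squared 0.6511), and the slope is highly significant (p-value 1.49e-12).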
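The coefficients \(\beta_0\) and \(\beta_1\) provide us with important information:
- \(\beta_0\) (estimated at -17.58) is the predicted stopping distance when speed is zero,
- \(\beta_1\) (estimated at 3.93) is the expected change in stopping distance for each one-unit increase in speed.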
The error term \(\epsilon\) represents the difference between the observed values and the values predicted by the linear equation.
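To make this concrete, the sketch below recomputes one residual by hand from the fitted coefficients and compares it with what R stores in the model. The variable names (`beta`, `first_car`, `predicted`, `residual_by_hand`) are illustrative.

```r
# Refit the model so this block runs on its own
cars_lm <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope
beta <- coef(cars_lm)

# Prediction for the first car from the fitted line
first_car <- cars[1, ]
predicted <- beta[["(Intercept)"]] + beta[["speed"]] * first_car$speed

# Residual = observed value minus predicted value
residual_by_hand <- first_car$dist - predicted

# Should agree with the residual stored in the fitted model
all.equal(residual_by_hand, resid(cars_lm)[[1]])
```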
Thank you for attending the presentation.