2024-03-20

Slide with car dataset overview

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Slide with car dataset

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Visualizing the Cars Dataset

First, let’s visualize the relationship between car speeds and stopping distances.

## `geom_smooth()` using formula = 'y ~ x'
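The plot itself is not reproduced in this text version. The message above comes from ggplot2's `geom_smooth()`, so a plot along these lines could be rebuilt as follows. This is a minimal sketch that assumes ggplot2 is installed; `method = "lm"`, the axis labels, and the object name `p` are assumptions, not the original slide code.

```r
library(ggplot2)

# Scatter plot of stopping distance against speed, with a fitted straight line.
# method = "lm" is an assumption: the original smoothing call is not shown.
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")

print(p)
```

Leaving `formula` unspecified, as here, is what makes ggplot2 print the "using formula = 'y ~ x'" message.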

Linear Regression Model on Cars Dataset

Now, let’s fit a simple linear regression model to our data.

Summary of the results

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Fitting a Linear Model

Let’s fit a simple linear regression model using the lm() function in R. This model will help us understand how the stopping distance of cars can be predicted by their speed.

Output for linear model

# Fit the linear model
cars_lm <- lm(dist ~ speed, data = cars)

# Display the summary of the model to interpret coefficients
summary(cars_lm)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Residuals Plot

Understanding residuals is key to diagnosing the model. Let’s visualize them.
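The residual plot is not reproduced in this text version. One common way to draw it is a residuals-versus-fitted plot in base R graphics, sketched below. The plotting code on the original slide is not shown, so the plot type, labels, and title here are assumptions.

```r
# Refit the model so this example is self-contained
cars_lm <- lm(dist ~ speed, data = cars)

# Residuals against fitted values: a roughly even scatter around zero
# supports the linear model; a curve or funnel shape argues against it.
plot(fitted(cars_lm), resid(cars_lm),
     xlab = "Fitted stopping distance (ft)",
     ylab = "Residual (ft)",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)  # dashed reference line at zero
```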

Interactive 3D Plot

Let’s create an interactive 3D plot using plotly to visualize speed, distance, and car index in the dataset.
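The interactive figure does not survive in a text version. A minimal sketch of such a plot is shown below, assuming the plotly package is installed and that "car index" means the row number of each observation. The helper column `index` and the object names are hypothetical.

```r
library(plotly)

# Add the row number as a "car index" column (an assumption about its meaning)
cars_indexed <- transform(cars, index = seq_len(nrow(cars)))

# 3D scatter plot: speed on x, stopping distance on y, car index on z
fig <- plot_ly(
  data = cars_indexed,
  x = ~speed, y = ~dist, z = ~index,
  type = "scatter3d", mode = "markers"
)

# Label the three axes
fig <- layout(fig, scene = list(
  xaxis = list(title = "Speed (mph)"),
  yaxis = list(title = "Stopping distance (ft)"),
  zaxis = list(title = "Car index")
))

fig
```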

Simple Linear Regression Formula

Simple linear regression models the relationship between two variables by fitting a linear equation to observed data. The equation is given by:

\[ y = \beta_0 + \beta_1x + \epsilon \]

where:

  • \(y\) is the dependent variable,
  • \(x\) is the independent variable,
  • \(\beta_0\) is the intercept of the regression line,
  • \(\beta_1\) is the slope of the regression line, and
  • \(\epsilon\) is the error term.
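Substituting the estimates from the model summary into this equation gives the fitted line for the cars data. The hat marks a predicted value; this notation is introduced here and does not appear in the original slides.

\[ \widehat{\text{dist}} = -17.5791 + 3.9324 \times \text{speed} \]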

Fitted to the cars dataset, simple linear regression suggests a strong positive linear relationship between car speed and stopping distance: speed accounts for about 65% of the variation in distance (multiple R-squared 0.6511).

Interpretation of Regression Coefficients

The coefficients \(\beta_0\) and \(\beta_1\) provide us with important information:

  • \(\beta_0\) (Intercept): The expected value of \(y\) when \(x\) is 0.
  • \(\beta_1\) (Slope): The change in the expected value of \(y\) for a one-unit change in \(x\).

The error term \(\epsilon\) represents the difference between an observed value of \(y\) and the value given by the regression line; the residuals of a fitted model are its estimates.
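These interpretations can be checked directly on the fitted model. The sketch below extracts the two coefficients, predicts the stopping distance at a speed of 10 mph (a value chosen only for illustration), and confirms that the residuals equal observed minus fitted values. The object names `b`, `new_speed`, `pred`, and `res_manual` are hypothetical.

```r
# Refit the model so this example is self-contained
cars_lm <- lm(dist ~ speed, data = cars)

# Estimated intercept (beta_0) and slope (beta_1)
b <- coef(cars_lm)
b

# Predicted stopping distance at an illustrative speed of 10 mph
new_speed <- data.frame(speed = 10)
pred <- predict(cars_lm, newdata = new_speed)
pred

# Residuals are the observed distances minus the fitted distances
res_manual <- cars$dist - fitted(cars_lm)
all.equal(unname(res_manual), unname(resid(cars_lm)))
```

With the estimates reported in the summary, the prediction at 10 mph is about 21.7 ft. The negative intercept is the line's value at a speed of 0, which lies outside the observed speeds (4 to 25 mph) and should not be read as a physical stopping distance.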

Thank You!

Thank you for attending the presentation.