Overview

  • Topic: Simple Linear Regression

  • In this presentation, I will explain what simple linear regression is and use it to model the relationship between two variables in an example dataset.

    • Dataset: mtcars

Simple Linear Regression:

  • Simple linear regression models a straight-line relationship between a response and a predictor.

  • Given:

    • A response variable = Y
    • A predictor variable = X
  • Let's assume a straight-line relationship between them.

  • For example:

    • Y = miles per gallon (mpg)
    • X = weight of the car (wt)

Here, simple linear regression describes how fuel efficiency changes as the weight of the car changes.

Regression Model:

The model for Y:

\[ Y = \beta_0 + \beta_1 X + \varepsilon, \]

where

  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the slope.
  • \(\varepsilon\) is a random error term.

The slope \(\beta_1\) shows how much the mean of \(Y\) changes when \(X\) increases by 1 unit.
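To make the model and the slope interpretation concrete, here is a small simulation. It is a sketch only: the parameter values (beta0 = 37, beta1 = -5, error sd = 3) and the range of X are illustrative assumptions, not estimates from any data. It generates data from Y = beta0 + beta1 X + epsilon, refits the line, and shows that raising X by 1 unit moves the mean of Y by exactly beta1.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon with known parameters.
# These parameter values are illustrative assumptions, not data estimates.
set.seed(42)
n     <- 1000
beta0 <- 37
beta1 <- -5

x   <- runif(n, min = 1.5, max = 5.5)   # predictor values
eps <- rnorm(n, mean = 0, sd = 3)       # random error term
y   <- beta0 + beta1 * x + eps          # response generated by the model

# Refit the model: the estimated slope should land close to beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)

# Slope interpretation: increasing X by 1 unit shifts the mean of Y by beta1
mean_y_at <- function(x0) beta0 + beta1 * x0
mean_y_at(3) - mean_y_at(2)   # -5, i.e. beta1
```

Because the true parameters are known here, the gap between `coef(sim_fit)` and (37, -5) gives a feel for how much the estimates vary from sample noise alone.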

Let's estimate the line:

From the data, we estimate \(\beta_0\) and \(\beta_1\) by least squares, choosing the values that minimize the sum of squared residuals.

The fitted line is:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X. \]

The slope can be written as:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2}, \]

and the intercept is:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]
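The two formulas above can be checked directly in R. The sketch below computes the slope and intercept by hand for the mtcars example (Y = mpg, X = wt) and compares them with `lm()`. The variable names `b0` and `b1` are my own choices.

```r
# Least-squares estimates computed directly from the summation formulas
x <- mtcars$wt
y <- mtcars$mpg

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
# Intercept: the fitted line passes through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

c(intercept = b0, slope = b1)   # about 37.29 and -5.34

# Compare with R's built-in least-squares fit
coef(lm(mpg ~ wt, data = mtcars))
```

Both routes should give the same intercept and slope that appear in the model summary later on.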

Dataset: mtcars

Here, I am using the built-in mtcars dataset that ships with R.

  • mpg = miles per gallon
  • wt = weight of the car (1000 lbs)
head(mtcars[, c("mpg", "wt")])
##                    mpg    wt
## Mazda RX4         21.0 2.620
## Mazda RX4 Wag     21.0 2.875
## Datsun 710        22.8 2.320
## Hornet 4 Drive    21.4 3.215
## Hornet Sportabout 18.7 3.440
## Valiant           18.1 3.460

I will model mpg as a function of wt.
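In R's formula notation, `mpg ~ wt` requests the model mpg = beta0 + beta1 * wt + error, and the intercept is included by default. A quick way to see this is the design matrix that `lm()` builds internally. As a sketch, solving the normal equations with that matrix reproduces the same two estimates. This matrix route is shown only as an equivalent check and is not something the presentation itself uses.

```r
# Design matrix for mpg ~ wt: a column of 1s (intercept) and the wt column
X <- model.matrix(mpg ~ wt, data = mtcars)
colnames(X)   # "(Intercept)" "wt"
head(X, 3)

# Solving the normal equations (X'X) b = X'y gives the least-squares estimates
y <- mtcars$mpg
b <- solve(crossprod(X), crossprod(X, y))
b
```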

Scatter Plot

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # add the least-squares line without a confidence band
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Miles per Gallon vs. Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  )

This generates a scatter plot of mpg against wt and adds the fitted regression line with geom_smooth(method = "lm").

Scatter Plot with the Regression Line

## `geom_smooth()` using formula = 'y ~ x'

Conclusion: Heavier cars tend to have lower miles per gallon.
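The visual impression can be quantified with the sample correlation between weight and mpg. A negative value means that mpg tends to fall as weight rises. In simple linear regression, the squared correlation equals the model's Multiple R-squared, which gives a direct link to the model summary.

```r
# Sample correlation between weight and mpg
r <- cor(mtcars$wt, mtcars$mpg)
r     # about -0.87: a strong negative linear association

# In simple linear regression, r^2 equals the Multiple R-squared
r^2   # about 0.7528
```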

Histogram of MPG

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
  labs(
    title = "Distribution of Miles per Gallon",
    x = "Miles per Gallon",
    y = "Count"
  )

This generates a simple histogram that shows how mpg values are distributed across the cars.

Histogram:

Most cars have miles per gallon between about 15 and 25.
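The histogram claim can be checked numerically. The sketch below counts the cars whose mpg falls in the closed interval from 15 to 25. Treating both endpoints as included is my own choice of boundary.

```r
# Which cars have mpg between 15 and 25 (inclusive)?
in_range <- mtcars$mpg >= 15 & mtcars$mpg <= 25

sum(in_range)       # number of cars in [15, 25]: 21
mean(in_range)      # share of all 32 cars: 0.65625
range(mtcars$mpg)   # overall minimum and maximum: 10.4 33.9
```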

3D Plot with Plotly

Now let's make a 3D scatter plot with plotly to see how weight, horsepower, and mpg relate to each other in the mtcars data.

library(plotly)

plot_ly(
  mtcars,
  x = ~wt,
  y = ~hp,
  z = ~mpg,
  color = ~factor(cyl),   # one color per cylinder count
  type = "scatter3d",
  mode = "markers"
)

This shows weight, horsepower, and mpg together, with points colored by number of cylinders.

3D Visualization

Rotate the plot to see how these three variables relate. Cars with more cylinders (8) tend to be heavier and have lower mpg.
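Group averages back up the pattern seen in the rotating plot without relying on the visual alone. The sketch below uses base R's `aggregate()` to compute the mean weight, horsepower, and mpg for each cylinder count.

```r
# Average weight, horsepower and mpg for each cylinder count (4, 6, 8)
cyl_means <- aggregate(cbind(wt, hp, mpg) ~ cyl, data = mtcars, FUN = mean)
cyl_means
```

Going from 4 to 8 cylinders, mean weight and horsepower rise while mean mpg falls.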

Fitting the Model

# fit the regression model
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
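Several numbers in this summary can be reproduced from their definitions, and the fitted model can then be used for prediction. The sketch below recomputes the slope's t value and the residual standard error, prints 95% confidence intervals, and predicts mean mpg at wt = 3. The 3000 lb car is a hypothetical input chosen for illustration and is not a row of mtcars.

```r
model <- lm(mpg ~ wt, data = mtcars)

# t value = estimate / standard error
est <- coef(summary(model))["wt", "Estimate"]
se  <- coef(summary(model))["wt", "Std. Error"]
est / se   # about -9.559, matching the summary

# Residual standard error = sqrt(RSS / (n - 2)); 2 parameters are estimated
n   <- nrow(mtcars)
rse <- sqrt(sum(residuals(model)^2) / (n - 2))
rse        # about 3.046 on 30 degrees of freedom

# 95% confidence intervals for the intercept and slope
confint(model)

# Predicted mean mpg for a hypothetical car weighing 3000 lbs (wt = 3)
predict(model, newdata = data.frame(wt = 3), interval = "confidence")
```

The point prediction is 37.2851 - 5.3445 * 3, or about 21.25 mpg. Predictions far outside the observed weight range (roughly 1.5 to 5.4) would be extrapolation.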

Conclusion

  • Simple linear regression models a straight-line relationship between a response and a predictor.
  • Our model shows that heavier cars have significantly lower mpg (p < 0.001, R-squared ≈ 0.75).
    • The slope of -5.34 means that each additional 1000 lbs of weight is associated with an average decrease of about 5.34 mpg.
  • We also created a 3D visualization showing how weight, horsepower, and mpg relate to each other in the dataset.