16/03/2025

What is Simple Linear Regression?

  • A method to model the relationship between two numerical variables.
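  • The model is \[ Y = \beta_0 + \beta_1 X + \varepsilon \] where \(\varepsilon\) is a random error term; the goal is to estimate the intercept \(\beta_0\) and slope \(\beta_1\) of the best-fitting line.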
  • Used in various fields such as finance, healthcare, and engineering.
  • Once fitted, the line can be used to predict Y for new values of X.
  • It assumes a linear relationship between the dependent variable (Y) and the independent variable (X).
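"Best-fitting" is usually meant in the ordinary least squares sense, which is what R's `lm()` uses: pick the intercept and slope that minimize the sum of squared vertical distances between the n observed points \((X_i, Y_i)\) and the line. For reference, the criterion and its standard closed-form solution are sketched below, where \(\bar X\) and \(\bar Y\) denote the sample means.

```latex
\[
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2
\]
\[
\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}
\]
```

The second formula shows why the fitted line always passes through the point of means \((\bar X, \bar Y)\).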

Example Dataset: mtcars

  • We will use the built-in mtcars dataset in R, which contains real-world car data.
  • It has 32 observations of different cars with 11 features, such as:
    • mpg (Miles per Gallon): Fuel efficiency (how far a car can go per gallon of fuel)
    • wt (Weight of Car): Vehicle weight (how heavy the car is, in 1000s of lbs)
  • Our goal is to understand whether heavier cars consume more fuel (that is, have lower mpg).
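Before modelling, a quick sanity check can confirm the size of the dataset and that the two variables of interest have no missing values. This is a minimal sketch; the variable name `car_data` is our own choice, not part of the dataset.

```r
# mtcars ships with base R (datasets package), so no install is needed
data(mtcars)

# Rows = cars, columns = variables
dim(mtcars)   # [1] 32 11

# Keep only the two variables used in this analysis
car_data <- mtcars[, c("wt", "mpg")]

# TRUE would mean some values are missing and need handling first
anyNA(car_data)

# Minimum, quartiles, mean and maximum of each variable
summary(car_data)
```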

Preview of the Dataset

  • Let us look at the first five rows of the mtcars dataset.
  • This helps us understand how the data are structured before we analyze them.
data(mtcars)       # load the built-in dataset
head(mtcars, 5)    # show the first five rows
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Visualizing the Relationship

  • A scatter plot shows how fuel efficiency varies with car weight.
  • We will plot mpg (y-axis) against wt (x-axis).
  • Expectation: heavier cars use more fuel, so they should have lower mpg.
  • The next slide shows the scatter plot.
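To put a number on the expected negative relationship before looking at the plot, one option is the Pearson correlation between `wt` and `mpg`. This is a sketch; `r` and `ct` are illustrative variable names.

```r
data(mtcars)

# Pearson correlation: -1 = all points on a downward-sloping line, 0 = no linear association
r <- cor(mtcars$wt, mtcars$mpg)
r   # roughly -0.87, a strong negative association

# Test H0: true correlation = 0 (also reports a 95% confidence interval)
ct <- cor.test(mtcars$wt, mtcars$mpg)
ct
```

A correlation only measures the strength of a linear association; the regression line on the following slides additionally gives its slope in mpg per 1000 lbs.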

Scatter Plot of MPG vs Weight

library(ggplot2)

# One point per car: weight on the x-axis, fuel efficiency on the y-axis
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs Weight",
       x = "Car Weight (1000s of lbs)",
       y = "Miles per Gallon (mpg)")

Scatter Plot with Regression Line

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +  # scatter plot points
  geom_smooth(method = "lm", color = "blue", se = TRUE) +  # least-squares line with 95% confidence band
  labs(title = "MPG vs Weight with Regression Line",
       x = "Car Weight (1000s of lbs)",
       y = "Miles per Gallon (mpg)")
## `geom_smooth()` using formula = 'y ~ x'
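As the message above reports, `geom_smooth(method = "lm")` fits its line with `lm()` and the formula `y ~ x`. Fitting the same model directly exposes the numbers behind the blue line. A minimal sketch; the name `fit` is our own.

```r
data(mtcars)

# The same model geom_smooth(method = "lm") draws: y ~ x, i.e. mpg ~ wt
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept (about 37.29) and slope (about -5.34)
coef(fit)

# Share of the variation in mpg explained by weight (about 0.75)
summary(fit)$r.squared
```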

Observation from Regression

  • The scatter plot shows a negative relationship between car weight and miles per gallon (MPG).
  • The regression line (blue) shows that as weight increases, fuel efficiency decreases.
  • The shaded band is a 95% confidence interval for the mean mpg at each weight; it shows the uncertainty in the estimated line and is wider toward the ends of the weight range.
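The shaded band should correspond to `predict()` with `interval = "confidence"` at `geom_smooth()`'s default level of 0.95. The sketch below evaluates it at a few hypothetical weights (the names `new_cars`, `ci` and `width` are our own) to show that the band is narrowest near the mean weight.

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical weights (1000s of lbs) at which to read off the band
new_cars <- data.frame(wt = c(2, 3, 4, 5))

# 95% confidence interval for the mean mpg at each weight; columns fit, lwr, upr
ci <- predict(fit, newdata = new_cars, interval = "confidence", level = 0.95)
cbind(new_cars, ci)

# Band width: narrowest near the mean weight (about 3.22), wider further away
width <- ci[, "upr"] - ci[, "lwr"]
width
```

Using `interval = "prediction"` instead gives a wider interval for the mpg of a single new car, because it also includes the scatter of individual cars around the line.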

What Regression Does

  • Regression quantifies the relationship between two variables.
  • Here, we used a linear model (lm), which assumes a straight-line relationship.
  • The fitted regression line is \[ \hat{Y} = \hat\beta_0 + \hat\beta_1 X \] with Y = mpg and X = wt, where:
    • β0 (Intercept): Predicted MPG for a car of weight zero; this is an extrapolation, since no car weighs zero.
    • β1 (Slope): Change in predicted MPG for each additional 1000 lbs of weight; a negative value means MPG decreases as weight increases.
  • This model lets us predict MPG from car weight.
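To connect the equation to the numbers, the sketch below computes the least-squares intercept and slope by hand, checks them against `lm()`, and predicts mpg for a hypothetical 3000 lb car. The names `b0`, `b1` and `pred_3000` are illustrative.

```r
data(mtcars)
x <- mtcars$wt   # predictor: weight in 1000s of lbs
y <- mtcars$mpg  # response: miles per gallon

# Closed-form ordinary least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)

# lm() should give the same numbers (TRUE up to floating-point tolerance)
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0, b1))

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3): about 21.25
pred_3000 <- predict(fit, newdata = data.frame(wt = 3))
pred_3000
```

With a slope of about -5.34, each extra 1000 lbs lowers the predicted mpg by roughly 5.3 miles per gallon.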

Interactive Plotly Plot