2025-06-03

What is Simple Linear Regression?

  • The term may sound very complex, but it is a rather simple concept.
  • It models the relationship between two variables (predictor and response) using a straight line.
  • One variable is the independent variable (predictor), the other is the dependent variable (response).
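  • The goal is to find the “best-fit” line through the data: the line that minimizes the sum of squared vertical distances between the points and the line.
  • Below is the general equation of a simple linear regression model: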

    \(y = \beta_0 + \beta_1 x + \varepsilon\)

What does this equation actually mean?

\(y = \beta_0 + \beta_1 x + \varepsilon\)

Let’s break it down:

  • \(y\) - The dependent variable, the one that is being predicted.
  • \(\beta_0\) - The intercept.
  • \(\beta_1\) - The slope.
  • \(x\) - The independent variable, the one that is thought to affect the dependent variable.
  • \(\varepsilon\) - The error term: the random deviation of each observation from the line. It varies from point to point and is not a constant.
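
To make the roles of these symbols concrete, here is a minimal R sketch on simulated data. The true intercept (2), the true slope (0.5), the noise level, and all variable names are assumptions chosen for illustration; they have nothing to do with the car data used later. The sketch computes the least-squares intercept and slope by hand and checks them against lm().

```r
# Simulate data from y = beta0 + beta1 * x + epsilon
# (every value below is an illustrative assumption)
set.seed(42)
n     <- 100
beta0 <- 2      # assumed true intercept
beta1 <- 0.5    # assumed true slope
x     <- runif(n, min = 0, max = 10)
eps   <- rnorm(n, mean = 0, sd = 1)   # random error, different for every observation
y     <- beta0 + beta1 * x + eps

# Least-squares estimates computed by hand
b1_hat <- cov(x, y) / var(x)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same estimates from lm()
fit <- lm(y ~ x)
print(c(by_hand_intercept = b0_hat, by_hand_slope = b1_hat))
print(coef(fit))
```

The two sets of numbers should agree, and both should land close to (but not exactly on) the assumed true values, because the random error keeps the points from sitting perfectly on the line.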

Example of linear regression (Plotly)

In this example, I am using R’s built-in mtcars dataset to look at the relationship between car weight and quarter-mile time (qsec), which serves as a stand-in for speed. I want to answer the question of whether heavier cars are actually slower.

As you can see on the graph, as the weight increases from left to right, the quarter-mile time gradually decreases as well, which creates a slowly descending fit line. Because qsec is a time, a lower value means a faster car, so this plot does not show heavier cars being slower. The trend is weak, and a regression line on its own cannot establish that weight is the cause.

Code used to make the plot part 1

library(plotly)

# Fit quarter-mile time (qsec) as a linear function of weight (wt)
mod <- lm(qsec ~ wt, data = mtcars)
x <- mtcars$wt
y <- mtcars$qsec

# Axis formatting from the videos
xax <- list(
  title = "Weight (1000 lbs)",
  titlefont = list(family = "Modern Computer Roman")
)

yax <- list(
  title = "Quarter-Mile Time (seconds)",
  titlefont = list(family = "Modern Computer Roman")
)

Code used to make the plot part 2

# Plot setup, also from the video:
# scatter of the raw data, then the fitted regression line on top
fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers",
  name = "Data", height = 300, width = 700) %>%
  add_lines(x = x, y = fitted(mod), name = "Fit line") %>%
  layout(
    title = "Car Weight vs Quarter-Mile Time",
    xaxis = xax,
    yaxis = yax,
    margin = list(l = 100, r = 50, b = 60, t = 50)
  )

fig

Breakdown of the plot

In this case:

\(y\) = \(\text{qsec}\) - The dependent variable, the quarter-mile time of the car in seconds (a lower time means a faster car).

\(\beta_0\) - The intercept, the expected quarter-mile time when weight is 0 (an extrapolation, since no car weighs nothing).

\(\beta_1\) - The slope, how much the quarter-mile time changes for each additional 1000 lbs of weight.

\(x\) - The independent variable, the weight of the car (in 1000 lbs).

\(\varepsilon\) - The error, the difference between a car’s actual quarter-mile time and the time the line predicts.
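
These pieces can be read straight off the fitted model. The sketch below refits the same model as the plotting code (qsec ~ wt on mtcars) and pulls out the intercept, the slope, and the residuals. The object names b0, b1, and res are only illustrative.

```r
# Refit the model from the plot: quarter-mile time as a function of weight
mod <- lm(qsec ~ wt, data = mtcars)

b0  <- coef(mod)[["(Intercept)"]]  # estimated intercept
b1  <- coef(mod)[["wt"]]           # estimated change in qsec per 1000 lbs
res <- residuals(mod)              # actual qsec minus predicted qsec, one per car

print(c(intercept = b0, slope = b1))
print(cor(mtcars$wt, mtcars$qsec))  # has the same sign as the slope

# Each observation is exactly its fitted value plus its residual
print(all.equal(unname(fitted(mod) + res), mtcars$qsec))
```

A negative slope here means the quarter-mile time goes down as weight goes up. Calling summary(mod) additionally reports how strong the evidence for that slope is.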

Another Example using ggplot2

Let us now look at another example from the mtcars dataset, this time plotted with ggplot2. The question now is whether automatic or manual cars are more fuel efficient. The graph below shows the linear regression model for automatic cars first.

Another Example using ggplot2 cont.

The graph below shows the linear regression model for manual cars.

Plot with BOTH Automatic and Manual car data
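
The code behind these ggplot2 figures is not shown, so the following is only a sketch of one way a combined plot could be drawn. It assumes the variables are mpg and wt from mtcars, with am coded as 0 for automatic and 1 for manual (as in the dataset's documentation). The plot labels and the object names cars and p are illustrative.

```r
library(ggplot2)

# Label the transmission type: in mtcars, am is 0 for automatic and 1 for manual
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Scatter plot with a separate least-squares line for each transmission type
p <- ggplot(cars, aes(x = wt, y = mpg, colour = transmission)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title  = "Fuel Efficiency vs Weight by Transmission",
    x      = "Weight (1000 lbs)",
    y      = "Miles per gallon",
    colour = "Transmission"
  )

p
```

Passing only the rows for one transmission type to ggplot() would give a single-group version of the same plot.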

Breakdown of the plot

Reminder of what the general equation looks like:

\(y = \beta_0 + \beta_1 x + \varepsilon\)

In the case of the miles per gallon and automatic vs manual plot:

\(y\) = \(\text{miles per gallon}\) - The dependent variable, the fuel efficiency of the car in miles per gallon.

\(\beta_0\) - The intercept, the expected fuel efficiency when the car’s weight is 0.

\(\beta_1\) - The slope, how much the fuel efficiency changes per 1000 lbs of car weight.

\(x\) - The independent variable, the weight of the car (in 1000 lbs).

\(\varepsilon\) - The difference between the predicted miles per gallon and the actual miles per gallon for each car.

Conclusion: According to the plot, manual cars have better fuel efficiency.
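
One way to back this reading with numbers is to fit the line separately for each transmission type and compare. This is a sketch, assuming the plots use mpg ~ wt from mtcars with am coded as 0 for automatic and 1 for manual. The object names are illustrative.

```r
# Fit mpg ~ wt separately for each transmission type
fit_auto   <- lm(mpg ~ wt, data = subset(mtcars, am == 0))  # automatic
fit_manual <- lm(mpg ~ wt, data = subset(mtcars, am == 1))  # manual

# Intercept and slope for each group, side by side
coefs <- rbind(automatic = coef(fit_auto), manual = coef(fit_manual))
print(coefs)

# Average fuel efficiency per group (tapply orders the groups as 0, then 1)
avg_mpg <- tapply(mtcars$mpg, mtcars$am, mean)
names(avg_mpg) <- c("automatic", "manual")
print(avg_mpg)
```

The manual cars in this dataset are also lighter on average, so a gap in average miles per gallon partly reflects weight rather than transmission alone.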

Conclusion

– Simple linear regression is a useful tool for modeling the relationship between two variables.


– In the first example, the fit line sloped gently downward: quarter-mile time fell slightly as weight rose, a weak negative correlation, so this data does not show that heavier cars are slower.


– In the second example, we compared automatic vs manual cars and found that manual cars tend to be more fuel efficient than automatic ones.


– Also, fuel efficiency decreases as weight increases, for both transmission types.


– Across both examples, the regression equation helped us interpret the impact of weight on performance or efficiency.


– In the end, this shows how real-world questions like “Are heavier cars slower?” or “Which type of car saves more fuel?” can be explored using basic statistical modeling.

The End