2025-10-18

Introduction

Linear regression is a widely used statistical method for modeling the relationship between a dependent variable and one or more independent variables.

In its simplest form, it fits a straight line through the data points.

This type of model is best suited to data in which the relationship between the variables is approximately linear.

How does it work?

Behind the model is a simple equation.

\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon\)

  • \(y\) is the dependent variable
  • \(x_1, \dots, x_n\) are the independent variables
  • \(\beta_0\) is the intercept, and \(\beta_1, \dots, \beta_n\) are the coefficients (weights) assigned to the corresponding independent variables
  • \(\epsilon\) is the error term
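
To make the terms of the equation concrete, here is a minimal sketch in R that simulates data from a known linear model and then recovers the coefficients with `lm()`. The true coefficients (2, 3, and -1.5), the sample size, and the noise level are illustrative assumptions, not values from these slides.

```r
# Simulate data from y = 2 + 3*x1 - 1.5*x2 + eps
# (all numeric values here are assumptions chosen for illustration)
set.seed(42)
n   <- 200
x1  <- runif(n, 0, 10)
x2  <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = 1)   # the error term
y   <- 2 + 3 * x1 - 1.5 * x2 + eps

# Fit the model by ordinary least squares
fit <- lm(y ~ x1 + x2)

# Estimated beta_0, beta_1, beta_2 (named "(Intercept)", "x1", "x2")
beta_hat <- coef(fit)
print(round(beta_hat, 2))

# Residuals are the fitted model's estimate of the error term
res <- residuals(fit)
```

The estimates should land close to the coefficients used in the simulation, and the residuals average to zero because the model includes an intercept.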

In Action

The following slides walk through examples of linear regression applied to real-world data.

  • Earthquake Magnitude ~ Stations Reporting Earthquake
  • Speed of car ~ Stopping Distance
  • Department complaints ~ Department rating

Earthquake Magnitude ~ Stations Reporting Earthquake Code

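library(plotly)  # provides plot_ly(), add_lines(), layout() and %>%

# Regress the number of reporting stations on earthquake magnitude
mod = lm(stations ~ mag, data=quakes)
x = quakes$mag
y = quakes$stations
# Axis titles, each with its own font family
xax = list(
  title = list(text = 'Magnitude',
               font = list(family = 'Modern Computer Roman'))
)
yax = list(
  title = list(text = 'Stations',
               font = list(family = 'Modern Computer Roman'))
)
# Scatter plot of the raw data with the fitted regression line on top
fig = plot_ly(x=x, y=y, type = "scatter", mode="markers", name="data",
              width=800, height = 430) %>%
      add_lines(x=x, y = fitted(mod), name = "fitted") %>%
      layout(xaxis = xax, yaxis = yax,
             margin=list(l=150,r=50,b=20,t=40))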
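
Beyond plotting, the fitted model can be inspected directly. The sketch below refits the same `stations ~ mag` model on the built-in `quakes` data and pulls out the coefficients, the R-squared, and a prediction. The magnitude of 5.0 used for the prediction is an illustrative value, not one taken from the slides.

```r
# Refit the model from the slide: stations reporting as a function of magnitude
mod <- lm(stations ~ mag, data = quakes)

# Estimated intercept and slope
betas <- coef(mod)
print(betas)

# Share of the variance in stations explained by magnitude
r2 <- summary(mod)$r.squared
print(r2)

# Predicted number of reporting stations for a hypothetical magnitude-5.0 quake
# (5.0 is an illustrative value)
pred <- predict(mod, newdata = data.frame(mag = 5.0))
print(pred)
```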

Earthquake Magnitude ~ Stations Reporting Earthquake Plot

Speed of car ~ Stopping Distance

Department complaints ~ Department rating Math

The linear regression model for this example can be written as follows:

\(y = \beta_0 + \beta_1x_1 + \epsilon\)

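Here \(y\) is the dependent variable, the overall rating of a department. \(x_1\) is the independent variable, the percentage of favorable responses on the handling of employee complaints. \(\beta_0\) is the y-intercept, the expected rating when \(x_1 = 0\); judging from the plot, it seems to be around 40. \(\beta_1\) is the slope, the change in rating associated with a one-point increase in \(x_1\). Lastly, \(\epsilon\) is the error term.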
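
For a model with a single independent variable, the least-squares estimates have a simple closed form: the slope is the covariance of \(x_1\) and \(y\) divided by the variance of \(x_1\), and the intercept follows from the two means. The sketch below computes both by hand and compares them with `lm()`. It assumes this example is based on R's built-in `attitude` dataset (columns `rating` and `complaints`); the slides do not name the dataset, so treat that as a guess.

```r
# Assumption: the slide's data is R's built-in `attitude` dataset
x <- attitude$complaints   # favorable responses on handling of complaints
y <- attitude$rating       # overall department rating

# Closed-form least-squares estimates for simple linear regression
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept
print(c(intercept = b0, slope = b1))

# The same model fitted with lm() for comparison
fit <- lm(rating ~ complaints, data = attitude)
print(coef(fit))
```

Comparing the printed intercept with the value read off the plot is a useful check, since the plotted range may not extend to \(x_1 = 0\).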

Department complaints ~ Department rating Plot

The numbers here refer to the percent of favorable responses to a survey.

References