2025-11-13

What is Simple Linear Regression?

  • A statistical method that models the relationship between one dependent variable and one independent variable.
  • Assumes the relationship between the two variables is linear, and finds the straight line that best fits the data.
  • The goal of simple linear regression is to minimize the error between the observed and predicted values (in ordinary least squares, the sum of the squared errors \(\sum_i e_i^2\)):
    • \(e_i = y_i - \hat{y}_i\)
      • \(e_i\) is the error
      • \(y_i\) is the observed value
      • \(\hat{y}_i\) is the predicted value
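
To make the idea of minimizing the error concrete, here is a minimal sketch that assumes the ordinary least-squares criterion (the sum of squared errors, which is what `lm()` minimizes). The helper `sse()` and the guessed line are hypothetical and exist only for illustration; the data come from R's built-in `cars` dataset.

```r
# Sum of squared errors for a candidate line y_hat = b0 + b1 * x
# (sse is a hypothetical helper, not part of any package)
sse <- function(b0, b1, x, y) {
  y_hat <- b0 + b1 * x   # predicted values
  e <- y - y_hat         # errors e_i = y_i - y_hat_i
  sum(e^2)
}

x <- cars$speed
y <- cars$dist

# An arbitrary guess at a line (values chosen only for illustration)
sse_guess <- sse(0, 3, x, y)

# The least-squares line found by lm()
fit <- lm(dist ~ speed, data = cars)
b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope
sse_fit <- sse(b[1], b[2], x, y)

# The fitted line should never have a larger SSE than any other line
sse_fit <= sse_guess
```

Any other choice of intercept and slope can be passed to `sse()` in the same way to compare it against the fitted line.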

Form of Simple Linear Regression

  • \(\hat{Y} = \beta_0 + \beta_1X\)
    • \(\hat{Y}\) represents the predicted Y value.
    • \(\beta_0\) represents the y-intercept of the equation.
    • \(\beta_1\) represents the slope of the equation.
    • \(X\) represents the inputs.
  • \(Y = \beta_0 + \beta_1X + e\)
    • \(Y\) represents the true Y value.
    • \(e\) represents the error (residual): the difference between the true and predicted Y values.
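
As a sketch of where \(\beta_0\) and \(\beta_1\) come from, the standard least-squares estimates are \(\hat{\beta}_1 = \operatorname{cov}(X, Y) / \operatorname{var}(X)\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The snippet below computes them by hand on R's built-in `pressure` dataset and compares them with `lm()`; the variable names `b0`, `b1`, and `e` are chosen here for illustration.

```r
x <- pressure$temperature
y <- pressure$pressure

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # y-intercept

# The same model fitted with lm()
fit <- lm(pressure ~ temperature, data = pressure)

# Both approaches should agree up to floating-point error
all.equal(unname(coef(fit)), c(b0, b1))

# Residuals from the hand-computed line: e = Y - (b0 + b1 * X)
e <- y - (b0 + b1 * x)
```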

Example 1

summary(pressure)
##   temperature     pressure       
##  Min.   :  0   Min.   :  0.0002  
##  1st Qu.: 90   1st Qu.:  0.1800  
##  Median :180   Median :  8.8000  
##  Mean   :180   Mean   :124.3367  
##  3rd Qu.:270   3rd Qu.:126.5000  
##  Max.   :360   Max.   :806.0000

Pressure Data

Pressure Data Code

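library(plotly)  # provides plot_ly(), layout(), config(), and %>%

# Predictor and response from the built-in pressure dataset
x <- pressure$temperature
y <- pressure$pressure

# Axis titles
xax <- list(
  title = "Temperature"
)

yax <- list(
  title = "Pressure"
)

# Interactive scatter plot of the raw data
fig <- plot_ly(x=x, y=y, type="scatter", mode="markers", name="data",
               width=800, height=430) %>%
  layout(xaxis=xax, yaxis=yax)
config(fig, displaylogo=FALSE)  # hide the Plotly logo in the mode bar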

Linear Regression

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Code

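library(ggplot2)
# Scatter plot with the fitted least-squares line and its confidence band
ggplot(pressure, aes(x=temperature, y=pressure)) + 
  geom_point() + geom_smooth(method="lm")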
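
The plot only draws the line. As a rough sketch, the same straight-line model can be fitted directly with `lm()` to read off the estimated intercept and slope and to inspect the residuals. The object names `fit`, `e`, and `check` are chosen here for illustration.

```r
# Fit the straight-line model that geom_smooth(method = "lm") draws
fit <- lm(pressure ~ temperature, data = pressure)

coef(fit)   # estimated intercept (beta_0) and slope (beta_1)

# Residuals e_i = y_i - y_hat_i for each observation
e <- resid(fit)

# Observed values, predicted values, and residuals side by side
check <- data.frame(temperature = pressure$temperature,
                    observed    = pressure$pressure,
                    predicted   = fitted(fit),
                    residual    = e)
head(check)
```

Residuals that show a clear pattern across temperature suggest that a straight line may not describe the data well.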

Example 2

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Cars Data

Cars Data Code

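library(plotly)  # provides plot_ly(), layout(), config(), and %>%

# Predictor and response from the built-in cars dataset
x <- cars$speed
y <- cars$dist

# Axis titles
xax <- list(
  title = "Speed"
)

yax <- list(
  title = "Distance"
)

# Interactive scatter plot of the raw data
fig <- plot_ly(x=x, y=y, type="scatter", mode="markers", name="data",
               width=800, height=430) %>%
  layout(xaxis=xax, yaxis=yax)
config(fig, displaylogo=FALSE)  # hide the Plotly logo in the mode bar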

Linear Regression

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Code

library(ggplot2)
# Scatter plot with the fitted least-squares line and its confidence band
ggplot(cars, aes(x=speed, y=dist)) + 
  geom_point() + geom_smooth(method="lm")
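
Once the line is fitted, it can be used for prediction through \(\hat{Y} = \beta_0 + \beta_1X\). The sketch below fits the model on `cars` with `lm()` and predicts the distance for two speeds. The speeds 10 and 20 and the names `new_speeds`, `pred`, and `manual` are hypothetical, chosen only for illustration.

```r
# Fit distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
coef(fit)   # estimated intercept (beta_0) and slope (beta_1)

# Hypothetical new inputs to predict for
new_speeds <- data.frame(speed = c(10, 20))

# Predictions from the fitted model
pred <- predict(fit, newdata = new_speeds)

# The same predictions computed by hand from the regression equation
manual <- coef(fit)[["(Intercept)"]] + coef(fit)[["speed"]] * new_speeds$speed

all.equal(unname(pred), manual)
```

Predictions are most trustworthy for speeds inside the observed range of the data (4 to 25 in the summary above).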