03/23/2025

What is Simple Linear Regression?

Simple linear regression is a statistical method that models the relationship between two quantitative variables, one independent and one dependent, by fitting a straight line to the data.

There are usually two instances in which one uses simple linear regression:

  1. When you want to see how strong the relationship is between two variables
  2. When you want to estimate the value of a dependent variable at a specific value of the independent variable.
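
As a rough illustration of both uses, the sketch below fits a model to R's built-in `mtcars` data (the same data used in the plot later in this deck). The choice of `wt = 3` as the prediction point is an arbitrary assumption made for the example.

```r
# Fit mpg as a linear function of weight (wt is in 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)

# Use 1: how strong is the relationship?
r <- cor(mtcars$wt, mtcars$mpg)        # Pearson correlation between wt and mpg
r_squared <- summary(fit)$r.squared    # share of the variance in mpg explained by wt

# Use 2: estimate mpg at a specific weight (here 3, i.e. 3000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))

print(c(r = r, r_squared = r_squared))
print(pred)
```

In simple linear regression, `r_squared` equals the square of the correlation `r`, so either number can be used to summarize strength.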

Regression Models

  • Regression models are used to describe the relationship between two quantitative variables by creating a line of best fit.

  • Linear regression, as you may assume, uses a straight line.

  • Regression models help you identify how dependent variables change according to their independent counterparts.

Assumptions

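Since linear regression is a parametric method, it relies on certain assumptions. These include:

  • The relationship between the two variables is linear

  • The variance of the errors is roughly constant across all values of the independent variable (homoscedasticity)

  • Each observation collected is independent

  • The residuals (errors) are approximately normally distributed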
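
One common way to check assumptions like these is to look at the residuals of a fitted model. The sketch below uses the built-in `mtcars` data as an assumed example; independence generally cannot be read off a plot and depends on how the data were collected.

```r
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# Constant variance: residuals vs. fitted values should show
# no funnel shape and no obvious curve
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: points should fall close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a formal normality check
# (a small p-value suggests the residuals are not normal)
sw <- shapiro.test(res)
print(sw$p.value)
```

With small samples the plots are usually more informative than the test, so treat the p-value as one piece of evidence rather than a verdict.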

Formula for Simple Linear Regression

The formula for simple linear regression is as follows:



\[ \Large y = \beta_0 + \beta_1 X + \varepsilon \]

What does the formula mean?

\[ y = \beta_0 + \beta_1 X + \varepsilon \]

  • \(y\) is the observed value of the dependent variable; the line \(\beta_0 + \beta_1 X\) gives its estimated (or predicted) value.

  • \(\beta_0\) (intercept) is the expected value of \(y\) when \(X = 0\).

  • \(\beta_1\) (slope) shows how much \(y\) is expected to change for a one-unit change in \(X\).

  • \(X\) is the independent (or explanatory) variable.

  • \(\varepsilon\) is the error term, capturing the random variation or noise not explained by the linear model.
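
To make the pieces of the formula concrete, the sketch below estimates the intercept and slope by least squares directly from their textbook formulas and compares them with `lm()`. The variable names (`b0`, `b1`, `y_hat`, `e`) are illustrative choices, and `mtcars` is an assumed example dataset.

```r
x <- mtcars$wt    # independent variable X
y <- mtcars$mpg   # dependent variable y

# Least-squares estimates of the coefficients
b1 <- cov(x, y) / var(x)        # slope: estimate of beta_1
b0 <- mean(y) - b1 * mean(x)    # intercept: estimate of beta_0

y_hat <- b0 + b1 * x            # predicted values on the fitted line
e <- y - y_hat                  # residuals: estimates of the error term

# The same coefficients from R's built-in fitting function
fit <- lm(mpg ~ wt, data = mtcars)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The two printed coefficient pairs should agree, since `lm()` solves the same least-squares problem.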

Interpreting Slope and Intercept

  • The intercept is only meaningful if X = 0 lies within (or close to) the range of the observed data; otherwise it is an extrapolation

  • Positive slope: Y increases as X increases

  • Negative slope: Y decreases as X increases

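  • Magnitude of slope: how much Y changes per one-unit change in X. It depends on the units of X and Y, so it does not by itself measure the strength of the relationship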
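
A small sketch of why the size of the slope should be read together with the units: rescaling X changes the slope but not the correlation. The `wt_lbs` column is a hypothetical rescaled version of `wt` created only for this example.

```r
# wt is recorded in 1000 lbs; add a copy of it expressed in lbs
cars <- transform(mtcars, wt_lbs = wt * 1000)

# Slope per 1000 lbs vs. slope per 1 lb
slope_klbs <- coef(lm(mpg ~ wt, data = cars))[["wt"]]
slope_lbs  <- coef(lm(mpg ~ wt_lbs, data = cars))[["wt_lbs"]]

# Correlation is unit-free, so it is the same for both versions
r_klbs <- cor(cars$wt, cars$mpg)
r_lbs  <- cor(cars$wt_lbs, cars$mpg)

print(c(slope_klbs = slope_klbs, slope_lbs = slope_lbs))
print(c(r_klbs = r_klbs, r_lbs = r_lbs))
```

The slope per pound is 1000 times smaller than the slope per 1000 lbs, while the sign (negative here) and the correlation stay the same.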

Plotly

ggPlot 1

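library(ggplot2)

# Scatter plot of fuel economy against weight (built-in mtcars data)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Overlay the least-squares line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter Plot of MPG vs Weight",
       x = "Weight (in 1000 lbs)",
       y = "Miles Per Gallon")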

ggPlot 1 (cont.)

## `geom_smooth()` using formula = 'y ~ x'

ggPlot 2 (Boxplot)