2026-03-09

Simple Linear Regression

Simple linear regression is a statistical method for modeling the relationship between two variables.

It describes how a response variable (y) changes, on average, as a predictor variable (x) changes. The model captures association; on its own it does not show that x causes y.

Some simple examples could be:

  • predicting a car’s fuel efficiency from its horsepower
  • predicting housing prices from floor area
  • predicting exam scores from hours studied

Regression Equation

The basic linear regression equation looks like this:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

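  • \(y\) = the response (the variable we want to predict)
  • \(x\) = the predictor (the variable we use to predict \(y\))
  • \(\beta_0\) = the intercept, the expected value of \(y\) when \(x = 0\)
  • \(\beta_1\) = the slope
  • \(\epsilon\) = the error term, the part of \(y\) that the line does not explain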
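
To make the roles of these terms concrete, the sketch below simulates data from the equation and then fits a line to it. The "true" values of the intercept and slope, the sample size, the noise level, and the seed are all arbitrary choices made for this illustration, not values taken from any dataset.

```r
# Simulate y = beta0 + beta1 * x + epsilon with illustrative parameter values
set.seed(42)                        # arbitrary seed, only for reproducibility
n     <- 500                        # illustrative sample size
beta0 <- 2                          # illustrative "true" intercept
beta1 <- 0.5                        # illustrative "true" slope

x   <- runif(n, min = 0, max = 10)  # predictor values
eps <- rnorm(n, mean = 0, sd = 1)   # random error term
y   <- beta0 + beta1 * x + eps      # response built from the model equation

# Fit a straight line to the simulated data
sim_fit <- lm(y ~ x)
coef(sim_fit)                       # estimates should land near 2 and 0.5
```

Because of the error term, the estimated coefficients will be close to, but not exactly equal to, the values used in the simulation.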

How the Slope is Calculated

The slope shows how much \(y\) is expected to change when \(x\) increases by one unit.

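The least-squares estimate of the slope \(\beta_1\) is:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2} \]

Together with the intercept estimate \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\), this gives the line that minimizes the sum of squared vertical distances between the data points and the line.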
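
As a quick check of the slope formula, it can be applied by hand and compared with what `lm()` returns. This sketch uses the hp and mpg columns of R's built-in mtcars data, the same dataset used in the example later on. The intercept is obtained from the fact that the least-squares line passes through the point of means \((\bar{x}, \bar{y})\). The variable names are chosen for this illustration.

```r
# Apply the slope formula directly to two columns of mtcars
x <- mtcars$hp    # predictor: horsepower
y <- mtcars$mpg   # response: miles per gallon

x_bar <- mean(x)
y_bar <- mean(y)

# Numerator: sum of cross-deviations; denominator: sum of squared x deviations
slope <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# The fitted line passes through (x_bar, y_bar), which fixes the intercept
intercept <- y_bar - slope * x_bar

c(intercept = intercept, slope = slope)

# lm() should report the same two numbers
coef(lm(mpg ~ hp, data = mtcars))
```

Both approaches should give an intercept of about 30.10 and a slope of about -0.068.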

Example Dataset

For this example we will use mtcars, a dataset that is built into R.

It contains fuel consumption and 10 other measurements for 32 cars, taken from the 1974 Motor Trend magazine.

Two variables we will look at are:

  • mpg = miles per gallon
  • hp = horsepower

We will look at how horsepower relates to fuel efficiency.
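
Before fitting anything, it can help to look at the two columns directly. The sketch below pulls them out of mtcars, prints a few rows and summary statistics, and computes their correlation. The object names `cars` and `r` are chosen for this illustration.

```r
# Keep only the two variables of interest
cars <- mtcars[, c("mpg", "hp")]

dim(cars)      # number of rows (cars) and columns
head(cars)     # first six rows
summary(cars)  # min, quartiles, mean, and max of each column

# Pearson correlation between horsepower and fuel efficiency
r <- cor(cars$hp, cars$mpg)
r
```

The correlation is negative (about -0.78), so cars with more horsepower tend to have lower fuel efficiency.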

Scatter Plot (ggplot #1)
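
A minimal sketch of how a scatter plot of this kind could be drawn, assuming the ggplot2 package is installed. The title, axis labels, and the object name `p_scatter` are illustrative choices.

```r
library(ggplot2)

# Scatter plot of fuel efficiency against horsepower, one point per car
p_scatter <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel efficiency vs. horsepower (mtcars)",  # illustrative title
    x = "Horsepower (hp)",
    y = "Miles per gallon (mpg)"
  )

p_scatter  # printing the object draws the plot
```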

Regression Line (ggplot #2)
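
A minimal sketch of how a fitted line could be overlaid on the scatter plot, assuming the ggplot2 package is installed. `geom_smooth(method = "lm")` fits the same kind of straight line as `lm()`. When no formula is supplied, ggplot2 prints a message saying it is using 'y ~ x'. The labels and the object name `p_line` are illustrative choices.

```r
library(ggplot2)

# Scatter plot with a least-squares line and its confidence band
p_line <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +  # straight-line fit of mpg on hp
  labs(
    title = "mpg vs. hp with fitted regression line",  # illustrative title
    x = "Horsepower (hp)",
    y = "Miles per gallon (mpg)"
  )

p_line  # printing the object draws the plot
```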

## `geom_smooth()` using formula = 'y ~ x'

Interactive Plot (Plotly)
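
A minimal sketch of one way to build an interactive version of the scatter plot, assuming the plotly package is installed. The hover text, title, axis labels, and the object name `fig` are illustrative choices.

```r
library(plotly)

# Interactive scatter plot; hovering over a point shows the car's name
fig <- plot_ly(
  data = mtcars,
  x = ~hp,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  text = rownames(mtcars)  # car names, shown on hover
)

# Add a title and axis labels
fig <- layout(
  fig,
  title = "Fuel efficiency vs. horsepower (mtcars)",  # illustrative title
  xaxis = list(title = "Horsepower (hp)"),
  yaxis = list(title = "Miles per gallon (mpg)")
)

fig  # printing the object renders the interactive plot
```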

R Code Example

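The following R code fits a simple linear regression of mpg on hp and prints a summary of the fit.

# Fit mpg as a linear function of hp
model <- lm(mpg ~ hp, data = mtcars)
# Print coefficients, standard errors, and fit statistics
summary(model)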
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
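
Reading the output above, the fitted line is roughly mpg = 30.10 - 0.068 × hp, so each additional unit of horsepower is associated with about 0.068 fewer miles per gallon. The R-squared of 0.6024 means horsepower accounts for about 60% of the variance in mpg. The sketch below shows how these numbers could be extracted and used for a prediction. The 150 hp car is a hypothetical input chosen for illustration.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept and slope
coefs <- coef(model)
coefs

# 95% confidence intervals for both coefficients
ci <- confint(model, level = 0.95)
ci

# Predicted mpg for a hypothetical car with 150 horsepower,
# with a 95% prediction interval for a single new car
new_car <- data.frame(hp = 150)
pred <- predict(model, newdata = new_car, interval = "prediction")
pred
```

The point prediction for the 150 hp car is about 19.9 mpg. The prediction interval is wider than a confidence interval for the mean would be, because it also accounts for the scatter of individual cars around the line.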

Conclusion

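Simple linear regression is one of the most approachable tools in statistics.

It summarizes how two variables are related with a single straight line, and that line can be used to make predictions.

In the mtcars example, horsepower alone accounted for about 60% of the variance in fuel efficiency. Even though the method is simple, it remains widely used in data analysis.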