2026-06-06

Introduction

Simple Linear Regression models the linear relationship between:

  • A predictor variable \(X\), also called the independent variable
  • A response variable \(Y\), also called the dependent variable

It is one of the most widely used statistical methods for:

  • Predicting future values
  • Understanding how one variable changes with another

Linear Regression Model Explained

The simple linear regression model is written as:

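\[Y = \beta_0 + \beta_1 X + \varepsilon\]

Where:

  • \(Y\) = response variable
  • \(\beta_0\) = intercept (the mean of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (the change in the mean of \(Y\) for a one-unit increase in \(X\))
  • \(X\) = predictor variable
  • \(\varepsilon\) = random error term, assumed to have mean 0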

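To see what the model describes, the sketch below simulates data by adding a random error \(\varepsilon\) to \(\beta_0 + \beta_1 X\) and then fits the model with `lm()`. The true values \(\beta_0 = 2\), \(\beta_1 = 3\), the error standard deviation, and the sample size are arbitrary assumptions chosen only for illustration; the point is that the fitted coefficients land close to the values that generated the data.

```r
# Minimal sketch: simulate data from Y = beta0 + beta1 * X + error.
# All parameter values below are hypothetical, chosen for illustration only.
set.seed(42)

n     <- 200
beta0 <- 2    # hypothetical true intercept
beta1 <- 3    # hypothetical true slope
sigma <- 1    # hypothetical error standard deviation

x   <- runif(n, min = 0, max = 10)      # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)   # random errors with mean 0
y   <- beta0 + beta1 * x + eps          # responses generated by the model

fit <- lm(y ~ x)   # least squares fit of the simple linear regression
coef(fit)          # estimates should be close to beta0 = 2 and beta1 = 3
```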
Least Squares Regression Line

The least squares method chooses the line that minimizes the sum of squared vertical distances (residuals) between the observed points and the line. The resulting best-fitting line is written as:

\[y = mx + b\]

Where:

  • \(y\) is the predicted value of the dependent variable
  • \(m\) is the slope of the line (the estimate of \(\beta_1\))
  • \(x\) is the independent variable
  • \(b\) is the y-intercept (the estimate of \(\beta_0\))
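For reference, the least squares criterion and its standard closed-form solution, written in the same \(m\), \(b\) notation for \(n\) observed pairs \((x_i, y_i)\), are:

\[\text{SSE}(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2\]

Setting the partial derivatives of SSE with respect to \(m\) and \(b\) to zero gives

\[m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x}\]

where \(\bar{x}\) and \(\bar{y}\) are the sample means. The second equation shows that the fitted line always passes through the point \((\bar{x}, \bar{y})\).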

Example Data: diamond from UsingR

Question: Can a diamond’s weight be used to predict its price?

Sample of the diamond dataset

Carat   Price (Singapore dollars)
0.17    355
0.16    328
0.17    350
0.18    325
0.25    642
0.16    342
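As a check on the least squares formulas (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of price minus slope times mean of carat), the sketch below applies them by hand to the six sample rows in the table and compares the result with `lm()`. Because it uses only these six rows, the estimates will differ from the full-data fit shown later.

```r
# The six sample rows from the table
carat <- c(0.17, 0.16, 0.17, 0.18, 0.25, 0.16)
price <- c(355, 328, 350, 325, 642, 342)

# Closed-form least squares estimates
x_bar     <- mean(carat)
y_bar     <- mean(price)
slope     <- sum((carat - x_bar) * (price - y_bar)) / sum((carat - x_bar)^2)
intercept <- y_bar - slope * x_bar

# lm() should reproduce the same two numbers
fit6 <- lm(price ~ carat)
c(intercept = intercept, slope = slope)
coef(fit6)
```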

Interactive Scatterplot (plotly): Price vs. Carat

Scatterplot with Fitted Line: Price vs. Carat
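A scatterplot with a fitted line like this one can be drawn with ggplot2. The sketch below assumes ggplot2 and UsingR are installed, and assumes (from the `geom_smooth()` console message that accompanied the original figure) that the line was added with `geom_smooth(method = "lm")`; it is an illustration, not necessarily the exact code behind the figure.

```r
library(UsingR)    # provides the diamond data
library(ggplot2)
data(diamond)

# Points plus the least squares line with its confidence band.
# Passing formula = y ~ x explicitly suppresses the
# "`geom_smooth()` using formula = 'y ~ x'" message.
p <- ggplot(diamond, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Carat", y = "Price", title = "Price vs. Carat")

print(p)
```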


Residual Plot: Price vs. Carat
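A residual plot shows each observed price minus its fitted price, plotted against carat. The base R sketch below is one way to draw it, assuming the UsingR package is installed; residuals scattered in a patternless band around zero, with roughly constant spread, are consistent with the linear model.

```r
library(UsingR)   # provides the diamond data
data(diamond)

mod <- lm(price ~ carat, data = diamond)

# Residual = observed price - fitted price
res <- resid(mod)

# Residuals against the predictor, with a dashed reference line at zero
plot(diamond$carat, res,
     xlab = "Carat", ylab = "Residual",
     main = "Residual Plot: Price vs. Carat")
abline(h = 0, lty = 2)
```

With an intercept in the model, least squares residuals always sum to zero, so the reference line at zero is the natural baseline.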

R Code: Price vs. Carat

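# Load the UsingR package, which provides the diamond data
library(UsingR)
data(diamond)

# Fit the simple linear regression of price on carat
mod <- lm(price ~ carat, data = diamond)

# Coefficient estimates, standard errors, and R-squared
summary(mod)

# Abbreviated output:
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -259.63      17.32  -14.99   <2e-16 ***
# carat        3721.02      81.79   45.50   <2e-16 ***
#
# Multiple R-squared:  0.9783,  Adjusted R-squared:  0.9778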
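Reading the coefficients off this output, the fitted line is \(\widehat{\text{price}} = -259.63 + 3721.02 \times \text{carat}\). Since the diamonds here weigh well under one carat, a more practical reading of the slope is that each additional 0.1 carat is associated with an estimated price increase of about 372.10. For a 0.20-carat diamond, the rounded coefficients give \(-259.63 + 3721.02 \times 0.20 = -259.63 + 744.20 = 484.57\). An R-squared of 0.9783 means carat accounts for about 97.8% of the variation in price in this sample. The sketch below uses `predict()` for the same calculation; the carat values are hypothetical inputs chosen for illustration.

```r
library(UsingR)   # provides the diamond data
data(diamond)

mod <- lm(price ~ carat, data = diamond)

# Hypothetical new diamond weights (in carats)
new_diamonds <- data.frame(carat = c(0.16, 0.20, 0.27))

# Point predictions from the fitted line
pred <- predict(mod, newdata = new_diamonds)
pred

# The same predictions by hand: intercept + slope * carat
by_hand <- coef(mod)[["(Intercept)"]] + coef(mod)[["carat"]] * new_diamonds$carat
by_hand

# 95% prediction intervals for an individual diamond's price
pred_int <- predict(mod, newdata = new_diamonds,
                    interval = "prediction", level = 0.95)
pred_int
```

Predictions are most trustworthy for carat values inside the range observed in the data; the fitted line says nothing reliable about much larger stones.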