2024-09-20

What is Simple Linear Regression?

Simple linear regression is a statistical method that models the relationship between two variables by fitting a linear equation to observed data.

Formula

The general form of the simple linear regression equation is: \[ y = \beta_0 + \beta_1x \]

  • \(y\) = Dependent variable (what we want to predict, e.g., weight)
  • \(x\) = Independent variable (the predictor, e.g., height)
  • \(\beta_0\) = Intercept (the value of \(y\) when \(x\) is 0)
  • \(\beta_1\) = Slope (the amount \(y\) changes when \(x\) increases by 1 unit)
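
The text does not say how \(\beta_0\) and \(\beta_1\) are obtained from data. Assuming ordinary least squares (the method R's lm() uses), the estimates from \(n\) observed pairs \((x_i, y_i)\) minimize the sum of squared vertical distances between the points and the line, and have a closed form: \[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\), and the hats mark values estimated from the data rather than the unknown true coefficients.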

Purpose

  • Simple linear regression helps predict the value of a dependent variable based on the value of an independent variable.

Dataset Overview

The data set contains two variables:

  • Height (in cm): The independent variable, used to predict weight.
  • Weight (in kg): The dependent variable, which we want to predict.

Sample data:

Height (cm)   Weight (kg)
164.4         52.5
167.7         59.6
185.6         73.0
170.7         60.9
171.3         57.2
187.2         68.0
174.6         61.0
157.3         53.3
163.1         53.8
165.5         65.9
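
As a rough sketch, the ten rows shown above can be typed into a data frame and used to compute the least-squares slope and intercept by hand. This assumes the ten printed rows are all the data, which may not hold if the original sample_data is larger; the names slope, intercept, and fit are illustrative only.

```r
# Assumption: only the ten rows printed above; the original data may hold more.
sample_data <- data.frame(
  Height = c(164.4, 167.7, 185.6, 170.7, 171.3, 187.2, 174.6, 157.3, 163.1, 165.5),
  Weight = c(52.5, 59.6, 73.0, 60.9, 57.2, 68.0, 61.0, 53.3, 53.8, 65.9)
)

# Least-squares slope: sample covariance of x and y divided by the variance of x
slope <- cov(sample_data$Height, sample_data$Weight) / var(sample_data$Height)

# Least-squares intercept: the fitted line passes through the point of means
intercept <- mean(sample_data$Weight) - slope * mean(sample_data$Height)

# Cross-check the hand computation against lm()
fit <- lm(Weight ~ Height, data = sample_data)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

The two printed vectors should agree up to floating-point rounding, since lm() solves the same least-squares problem.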

Height vs Weight

ggplot Regression Line
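
The plotting code is not shown here, so the following is only one plausible way to draw the scatter plot with a fitted line using ggplot2. It assumes the ggplot2 package is installed and uses just the ten rows listed under "Sample data"; the axis labels and the object name p are illustrative.

```r
library(ggplot2)

# Assumption: only the ten rows printed in the sample data
sample_data <- data.frame(
  Height = c(164.4, 167.7, 185.6, 170.7, 171.3, 187.2, 174.6, 157.3, 163.1, 165.5),
  Weight = c(52.5, 59.6, 73.0, 60.9, 57.2, 68.0, 61.0, 53.3, 53.8, 65.9)
)

p <- ggplot(sample_data, aes(x = Height, y = Weight)) +
  geom_point() +                                             # one point per person
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # least-squares line, no confidence band
  labs(title = "Height vs Weight", x = "Height (cm)", y = "Weight (kg)")

print(p)
```

Setting se = TRUE instead would add a shaded confidence band around the line.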

Interpreting the Regression Results

The fitted linear regression equation is: \[ y = \beta_0 + \beta_1x \]

Where:

  • \(y\) = the predicted weight
  • \(x\) = the height

Based on the regression output:

  • Intercept (\(\beta_0\)) is the estimated weight when the height is zero. No height in the data is anywhere near 0 cm, so this value anchors the line rather than describing a real person.
  • Slope (\(\beta_1\)) is the amount by which the predicted weight changes for each additional cm of height. A positive slope indicates that as height increases, weight tends to increase.

Code used for fitting the model:

model <- lm(Weight ~ Height, data = sample_data)
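
Once the model is fitted, the coefficients can be read off and used for prediction. The sketch below rebuilds sample_data from the ten rows shown earlier (an assumption; the original data may be larger), and the heights in new_heights are hypothetical values chosen only for illustration.

```r
# Assumption: only the ten rows printed in the sample data
sample_data <- data.frame(
  Height = c(164.4, 167.7, 185.6, 170.7, 171.3, 187.2, 174.6, 157.3, 163.1, 165.5),
  Weight = c(52.5, 59.6, 73.0, 60.9, 57.2, 68.0, 61.0, 53.3, 53.8, 65.9)
)

model <- lm(Weight ~ Height, data = sample_data)

# Estimated intercept and slope
print(coef(model))

# Share of the variance in Weight explained by Height
print(summary(model)$r.squared)

# Predicted weight (kg) for two hypothetical heights (cm)
new_heights <- data.frame(Height = c(160, 175))
predicted <- predict(model, newdata = new_heights)
print(predicted)
```

The column name in new_heights has to match the predictor name in the formula (Height), otherwise predict() cannot find the variable.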

Residuals plot
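
The code behind the residuals plot is not shown, so here is a minimal base R sketch of a residuals-versus-fitted plot, again assuming the ten rows listed in the sample data. A point cloud scattered evenly around the zero line, with no curve or funnel shape, is what a reasonable linear fit tends to look like.

```r
# Assumption: only the ten rows printed in the sample data
sample_data <- data.frame(
  Height = c(164.4, 167.7, 185.6, 170.7, 171.3, 187.2, 174.6, 157.3, 163.1, 165.5),
  Weight = c(52.5, 59.6, 73.0, 60.9, 57.2, 68.0, 61.0, 53.3, 53.8, 65.9)
)

model <- lm(Weight ~ Height, data = sample_data)

res <- resid(model)         # observed weight minus predicted weight
fitted_vals <- fitted(model) # predicted weight for each row

plot(fitted_vals, res,
     xlab = "Fitted weight (kg)", ylab = "Residual (kg)",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)       # reference line at zero residual
```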

Multiple Linear Regression

The multiple linear regression model is: \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \] where:

  • \(y\) is the weight (the dependent variable)
  • \(x_1\) is the height
  • \(x_2\) is the age
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the coefficient for height
  • \(\beta_2\) is the coefficient for age
  • \(\epsilon\) is the error term

Create a multiple linear regression model (add age)

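# Age is simulated (uniform between 18 and 60), so it is unrelated to weight by construction
set.seed(123)  # fix the seed so the simulated ages are reproducible
sample_data$Age <- round(runif(n = nrow(sample_data),
                               min = 18, max = 60), 0)
# Fit weight on both height and age
model_2 <- lm(Weight ~ Height + Age, data = sample_data)
# Show the first ten rows, now including Age
knitr::kable(head(sample_data, 10), format = "html", caption = "")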

Height (cm)   Weight (kg)   Age (years)
164.4         52.5          30
167.7         59.6          51
185.6         73.0          35
170.7         60.9          55
171.3         57.2          57
187.2         68.0          20
174.6         61.0          40
157.3         53.3          55
163.1         53.8          41
165.5         65.9          37
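
To see what adding age changes, the two models can be compared side by side. The sketch below copies the ten rows and their ages from the table above instead of regenerating them (an assumption that these ten rows are the full data). Because the ages were drawn at random, the age coefficient would be expected to reflect noise and not a real effect.

```r
# Assumption: the ten rows and ages printed in the table above are the full data
sample_data <- data.frame(
  Height = c(164.4, 167.7, 185.6, 170.7, 171.3, 187.2, 174.6, 157.3, 163.1, 165.5),
  Weight = c(52.5, 59.6, 73.0, 60.9, 57.2, 68.0, 61.0, 53.3, 53.8, 65.9),
  Age    = c(30, 51, 35, 55, 57, 20, 40, 55, 41, 37)
)

model   <- lm(Weight ~ Height, data = sample_data)        # simple model
model_2 <- lm(Weight ~ Height + Age, data = sample_data)  # adds age

# Intercept, height coefficient, and age coefficient
print(coef(model_2))

# R-squared never decreases when a predictor is added;
# adjusted R-squared penalizes the extra term
r2     <- c(simple = summary(model)$r.squared,
            multiple = summary(model_2)$r.squared)
adj_r2 <- c(simple = summary(model)$adj.r.squared,
            multiple = summary(model_2)$adj.r.squared)
print(r2)
print(adj_r2)

# F-test for whether Age improves the fit beyond Height alone
print(anova(model, model_2))
```

In model_2 the height coefficient is read as the change in predicted weight per additional cm with age held fixed.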

Create a 3D scatter plot