2023-10-13

What is Simple Linear Regression

Simple Linear Regression

  • Simple linear regression is a statistical method that uses a straight line to represent the relationship between two variables: one dependent variable (Y) and one independent variable (X). It seeks the best-fitting line (the regression line), the one that minimises the sum of squared differences between the observed data points and the values predicted by the line, so that Y can be predicted from X.

Mathematical Formulas Used in Simple Linear Regression

Simple Linear Regression Model

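The simple linear regression model represents the relationship between a dependent variable (Y) and an independent variable (X) with a straight line, where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the random error term.

\[ Y = \beta_0 + \beta_1X + \varepsilon \]

  • Equation in terms of errors, where \(\hat{Y} = \beta_0 + \beta_1X\) is the value on the line: \[Y - \hat{Y} = \varepsilon\]

  • Equation in terms of the slope and error term (for \(X \neq 0\)): \[\beta_1 = \frac{Y - \beta_0 - \varepsilon}{X}\]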
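To make the model concrete, the following R sketch simulates data from \(Y = \beta_0 + \beta_1X + \varepsilon\) and then fits a line to it with `lm()`. The parameter values (an intercept of 2, a slope of 0.5, standard normal errors) and the sample size are assumptions chosen purely for illustration; they do not come from any dataset discussed here.

```r
set.seed(42)                             # make the simulation reproducible

# Illustrative (assumed) parameters, not taken from any dataset
beta0 <- 2
beta1 <- 0.5
n     <- 200

x       <- runif(n, min = 0, max = 10)   # independent variable X
epsilon <- rnorm(n, mean = 0, sd = 1)    # random error term
y       <- beta0 + beta1 * x + epsilon   # Y = beta0 + beta1 * X + epsilon

# Fit the straight line; the estimates should land near beta0 and beta1
fit <- lm(y ~ x)
coef(fit)

# Residuals are the observed values minus the fitted values
e <- y - fitted(fit)
all.equal(unname(e), unname(resid(fit)))
```

Because the errors are random, the estimated coefficients will differ slightly from the assumed values, and the gap should shrink as `n` grows.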

Estimate of the Slope

The coefficient \(\beta_1\) represents the slope of the regression line, which indicates how much the dependent variable changes for a one-unit change in the independent variable. Its least-squares estimate is

\[\hat{\beta_1} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^{2}}\]

Equation in terms of the means of X and Y

\[\hat{\beta_1} = \frac{\bar{X}\bar{Y} - \overline{XY}}{(\bar{X})^{2} - \overline{X^{2}}}\]
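As a quick check, the slope can be computed by hand and compared with the value that `lm()` reports. The sketch below uses the built-in `mtcars` data with `hp` as X and `mpg` as Y, and evaluates both the deviation form and the equivalent form written with means, \((\bar{X}\bar{Y} - \overline{XY}) / ((\bar{X})^{2} - \overline{X^{2}})\). The variable names are illustrative.

```r
x <- mtcars$hp    # independent variable X (horsepower)
y <- mtcars$mpg   # dependent variable Y (miles per gallon)

# Deviation form: sum of cross-deviations over sum of squared X deviations
slope_dev <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Mean form: (mean(X) * mean(Y) - mean(XY)) / (mean(X)^2 - mean(X^2))
slope_mean <- (mean(x) * mean(y) - mean(x * y)) / (mean(x)^2 - mean(x^2))

# Reference value from R's built-in least-squares fit
fit <- lm(mpg ~ hp, data = mtcars)
slope_lm <- unname(coef(fit)["hp"])

# All three values should agree up to floating-point rounding
c(deviation = slope_dev, means = slope_mean, lm = slope_lm)
```

The slope is negative for this pair of variables: in this dataset, cars with more horsepower tend to have lower fuel efficiency.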

3D Plotly Surface Plot for Residuals

Residuals vary with changes in the horsepower of the car

This example helps visualize how the residuals vary with changes in horsepower, providing insights into the goodness of fit of the linear regression model.

By visualizing the patterns, we can assess whether the linear regression model adequately captures the relationship between horsepower and miles per gallon. In the context of this specific example, it helps us evaluate how well the model predicts car fuel efficiency (mpg) based on engine power (horsepower).
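The quantities behind such a plot can be inspected directly. The sketch below is not the 3D Plotly surface itself; as a simpler stand-in, it uses base R to tabulate the fitted values and residuals of the `mpg ~ hp` model and draws a plain two-dimensional residuals-versus-horsepower scatter plot. The data frame name `diagnostics` is an illustrative choice.

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Collect horsepower, observed mpg, fitted mpg and residuals in one table
diagnostics <- data.frame(
  hp       = mtcars$hp,
  mpg      = mtcars$mpg,
  fitted   = fitted(fit),
  residual = resid(fit)
)

# With an intercept in the model, least-squares residuals sum to (numerically) zero
sum(diagnostics$residual)

# A curved or funnel-shaped pattern would suggest a straight line is inadequate
plot(diagnostics$hp, diagnostics$residual,
     xlab = "Horsepower (hp)", ylab = "Residual (mpg)",
     main = "Residuals vs. horsepower")
abline(h = 0, lty = 2)   # reference line at zero residual
```

If the model fits well, the points should scatter evenly around the dashed zero line with no systematic trend across the range of horsepower.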

Graph

ggplot2 Plots for Simple Linear Regression Using the mtcars Dataset

Q-Q Plots for Residuals

  • Code
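library(ggplot2)

# Fit mpg as a linear function of horsepower on the built-in mtcars data
lm_model <- lm(mpg ~ hp, data = mtcars)
residuals <- resid(lm_model)

# Compare the residual quantiles with those of a normal distribution;
# points close to the reference line suggest roughly normal residuals
ggplot(data = data.frame(Residuals = residuals),
       aes(sample = Residuals)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "Q-Q Plot for Residuals")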

Leverage Residual Plot