2024-10-20

Simple Linear Regression

Introduction to Linear Regression

Linear regression is a statistical method for predictive analysis: it predicts the value of one variable from the value of another. The variable being predicted is called the dependent variable, and the variable used to make the prediction is called the independent variable. The method models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. Simple linear regression, the case covered here, uses a single independent variable.

Mathematical Formulation of Linear Regression

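The equation of a simple linear regression line is given by: \[ y = \beta_0 + \beta_1 x + \epsilon \] where:

  • \(y\) is the dependent variable
  • \(x\) is the independent variable
  • \(\beta_0\) is the intercept (the expected value of \(y\) when \(x = 0\))
  • \(\beta_1\) is the slope (the expected change in \(y\) for a one-unit increase in \(x\))
  • \(\epsilon\) is the error term (the part of \(y\) the line does not explain)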
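
If the line is fit by ordinary least squares (an assumption at this point, although it is what R's lm() does), the estimates are the values that minimize the sum of squared residuals. For \(n\) observations \((x_i, y_i)\) with sample means \(\bar{x}\) and \(\bar{y}\), they have a closed form:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The slope is the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept forces the fitted line through the point \((\bar{x}, \bar{y})\).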

Dataset Overview

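We will use the mtcars dataset that ships with R. Specifically, we will model the relationship between mpg (fuel economy in miles per gallon) and wt (weight of the car, recorded in units of 1000 lb), with mpg as the dependent variable and wt as the independent variable.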
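
Before modeling, it can help to look at the two columns directly. The following is a minimal sketch, not part of the original analysis; the names cars, n_obs and r are chosen here for illustration.

```r
# mtcars ships with base R (package 'datasets'), so no import is needed.
cars <- mtcars[, c("mpg", "wt")]   # keep only the two variables of interest

n_obs <- nrow(cars)                # number of cars in the dataset
r <- cor(cars$mpg, cars$wt)        # Pearson correlation between mpg and wt

print(head(cars))     # first few rows
print(summary(cars))  # quartiles and mean of each column
print(n_obs)
print(r)
```

A correlation close to -1 or 1 suggests that a straight line is a reasonable first model. In simple linear regression the squared correlation equals the Multiple R-squared reported by the fitted model.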

Visualizing the Data
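
The original figure is not reproduced here. A scatter plot of the two variables could be drawn along these lines; this sketch assumes the ggplot2 package is installed, and the object name p_scatter, the title and the axis labels are illustrative choices rather than the original ones.

```r
library(ggplot2)  # assumed to be installed

# Scatter plot of fuel economy against weight, one point per car.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel economy vs. weight (mtcars)",
    x = "Weight (1000 lb)",
    y = "Miles per gallon"
  )

print(p_scatter)
```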

Performing Regression Analysis

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
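
The output above has the form that summary() prints for a fitted lm object, and its Call: line records the formula that was used. A minimal sketch that should reproduce it follows; the object names model and model_summary are chosen here for illustration, and the last two lines cross-check the estimates with the covariance/variance form of the least-squares solution.

```r
# Fit mpg as a linear function of wt by ordinary least squares.
model <- lm(mpg ~ wt, data = mtcars)

# Residual quantiles, the coefficient table, and overall fit statistics.
model_summary <- summary(model)
print(model_summary)

# Cross-check: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
slope_manual <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
intercept_manual <- mean(mtcars$mpg) - slope_manual * mean(mtcars$wt)
```

Read against the printed table, the wt estimate says that each additional 1000 lb of weight is associated with about 5.34 fewer miles per gallon, and the Multiple R-squared says that weight alone accounts for roughly 75% of the variance in mpg in this sample.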

Regression Analysis Output

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
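
A table in this form is what the coefficients element of the model summary contains. The sketch below extracts it and then uses the fit in two common ways: confidence intervals for the coefficients and a prediction for a new observation. The 3000 lb car is a hypothetical example, and the names coef_table, ci, new_car and pred are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Coefficient table as a numeric matrix (estimate, std. error, t value, p-value).
coef_table <- summary(model)$coefficients
print(coef_table)

# 95% confidence intervals for the intercept and the slope.
ci <- confint(model, level = 0.95)
print(ci)

# Predicted mpg for a hypothetical car weighing 3000 lb (wt = 3).
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car)
print(pred)
```

The prediction can be checked by hand from the table: 37.285 - 5.344 × 3 ≈ 21.25 miles per gallon.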

Plotting the Regression Line

## `geom_smooth()` using formula = 'y ~ x'
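
The message above is what geom_smooth() prints when no formula is supplied. A plot of this kind could be produced roughly as follows; the sketch assumes ggplot2 is installed, and the object name p_fit and the axis labels are illustrative. Passing formula = y ~ x explicitly states the default and suppresses the message.

```r
library(ggplot2)  # assumed to be installed

p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" overlays the least-squares line; se = TRUE (the default)
  # shades a 95% confidence band around it.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon")

print(p_fit)
```

The line drawn by method = "lm" with formula y ~ x is the same fit as lm(mpg ~ wt, data = mtcars), so it has the intercept and slope reported in the coefficient table.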

Advanced Visualization: 3D Plot