03-21-24

Introduction

Simple linear regression is a foundational statistical method used to predict a quantitative response based on a single predictor variable. It involves fitting a straight line through a set of data points in such a way that it best summarizes the relationship between the input (independent variable) and output (dependent variable).

Key Components

  • Dependent Variable (Y): The outcome or response we aim to predict or explain.
  • Independent Variable (X): The predictor or explanatory variable used to predict Y.
  • Intercept and Slope: The intercept (β₀) is the expected value of Y when X is zero, and the slope (β₁) is the expected change in Y for a one-unit increase in X.

Math Equation-1

The relationship is modeled through a linear equation:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the slope.
  • \(\epsilon\) is the error term.
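To make the role of each term concrete, the model can be simulated in R. This is only an illustrative sketch: the sample size, the "true" values of \(\beta_0\) and \(\beta_1\), and the noise level are assumptions chosen for the example, not values from this analysis.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon.
# All numeric settings below are hypothetical, chosen only for illustration.
set.seed(42)                          # make the random draws reproducible
n     <- 200                          # hypothetical sample size
beta0 <- 2                            # hypothetical true intercept
beta1 <- 0.5                          # hypothetical true slope
x     <- runif(n, min = 0, max = 10)  # predictor values
eps   <- rnorm(n, mean = 0, sd = 1)   # error term: random noise around the line
y     <- beta0 + beta1 * x + eps      # response generated by the linear model

# Fitting a line to the simulated data should give estimates close to
# beta0 and beta1, but not exactly equal because of the error term.
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Increasing `sd` in the `rnorm()` call scatters the points further from the line, which makes the estimates less precise.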

Estimating the Coefficients

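The coefficients \(\beta_0\) and \(\beta_1\) are unknown and are estimated from the data by ordinary least squares, which chooses the line that minimizes the sum of squared residuals \(\sum (y_i - \beta_0 - \beta_1 x_i)^2\). The resulting estimates are:

  • \(\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)
  • \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\)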

where:

  • \(\bar{x}\) is the mean of the \(x\) values.
  • \(\bar{y}\) is the mean of the \(y\) values.
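As a quick check, the two formulas can be evaluated directly in R and compared with the output of `lm()`. The sketch below uses the `wt` and `mpg` columns of the built-in mtcars dataset, the same variables analyzed later in these slides. The variable names (`b0`, `b1`, `manual`) are chosen for this example only.

```r
# Estimate the slope and intercept by hand, using the formulas above
data(mtcars)
x <- mtcars$wt    # predictor: car weight
y <- mtcars$mpg   # response: miles per gallon

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
# Intercept: forces the fitted line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

manual <- c(intercept = b0, slope = b1)
manual

# Compare with R's built-in least-squares fit
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(manual), unname(coef(fit)))
```

The final line should return `TRUE`, since `lm()` solves the same least-squares problem.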

Regression Analysis with Weight

This analysis uses the mtcars dataset to model miles per gallon (mpg) as a function of a car’s weight (wt, recorded in units of 1,000 lbs). The mtcars dataset ships with R’s datasets package and records 11 attributes for 32 cars, which makes it a convenient basis for a simple linear regression.

Objective

The primary aim is to ascertain the impact of a car’s weight on its fuel efficiency. Intuitively, we might anticipate that heavier cars exhibit lower fuel efficiency, given the greater power required for their propulsion. Through our regression model, we aim to quantitatively assess this relationship.

Analysis Procedure

  • We begin by plotting mpg against wt to explore their relationship visually.
  • Next, we fit a linear regression model of mpg on wt, which gives a statistical framework for quantifying the relationship.
  • The model’s summary reports the strength and significance of the relationship (the slope estimate, its p-value, and R²), showing how much of the variation in fuel efficiency is explained by weight.
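The fitting and summary steps above can be sketched in base R as follows. This is one possible way to carry out the procedure, not necessarily the exact code behind these slides. The object name `fit` and the example weight of 3 (that is, 3,000 lbs) are assumptions made for illustration.

```r
# Fit mpg as a linear function of wt and inspect the result
data(mtcars)

# Correlation between weight and fuel efficiency (a first numeric look)
cor(mtcars$wt, mtcars$mpg)

# Simple linear regression: mpg = beta0 + beta1 * wt + error
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table, standard errors, p-values, and R-squared
summary(fit)

# Individual pieces of the summary
coef(fit)                # estimated intercept and slope
summary(fit)$r.squared   # share of variance in mpg explained by wt
confint(fit)             # 95% confidence intervals for the coefficients

# Predicted mpg for a hypothetical car with wt = 3 (3,000 lbs)
predict(fit, newdata = data.frame(wt = 3))
```

A negative slope with a small p-value would support the expectation that heavier cars are less fuel efficient.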

Slide with R Code

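# Load the plotting libraries and the example dataset
library(ggplot2)  # static scatter plots with fitted lines
library(plotly)   # interactive 3D scatter plot
data(mtcars)      # built-in dataset from the datasets package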
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

GGPLOT-1

# Scatter plot of mpg against wt with a least-squares line (no confidence band)
plot1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE) + theme_minimal()
plot1
## `geom_smooth()` using formula = 'y ~ x'

GGPLOT-2

# Scatter plot of mpg against hp (horsepower) with a least-squares line
plot2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") + theme_minimal()
plot2
## `geom_smooth()` using formula = 'y ~ x'

Plotly 3D Plot

# Interactive 3D scatter plot of weight, mpg, and horsepower
plot3 <- plot_ly(data = mtcars, x = ~wt, y = ~mpg, z = ~hp,
                 type = "scatter3d", mode = "markers")
plot3