15 November 24

What is Linear Regression?

Definition

Linear regression is a statistical method to model the relationship between:

  • Dependent variable (\(y\)): The outcome or response variable.
  • Independent variable(s) (\(x\)): The predictor(s) or explanatory variables.

It assumes that the relationship between \(x\) and \(y\) can be described by a straight line:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Key Terms

  • \(\beta_0\) (Intercept): The value of \(y\) when \(x = 0\).
  • \(\beta_1\) (Slope): The change in \(y\) for a one-unit increase in \(x\).
  • \(\epsilon\) (Error): The difference between the observed \(y\) and the predicted \(y\).
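To make the roles of the intercept, slope, and error concrete, the sketch below simulates data from a known straight line and recovers the coefficients with `lm()`. The true values (2 and 3), the noise level, the sample size, and the seed are arbitrary assumptions chosen for illustration only.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon with known parameters.
# All numbers below are illustrative assumptions, not taken from any dataset.
set.seed(42)
n     <- 200
beta0 <- 2                              # true intercept
beta1 <- 3                              # true slope
x     <- runif(n, min = 0, max = 10)
eps   <- rnorm(n, mean = 0, sd = 1)     # random error term
y     <- beta0 + beta1 * x + eps

# Fit the line by ordinary least squares.
fit <- lm(y ~ x)
coef(fit)                               # estimates should land close to 2 and 3

# Residuals are the observed y minus the fitted y; they estimate the
# unobservable error term epsilon.
res <- y - fitted(fit)
all.equal(unname(res), unname(residuals(fit)))
```

In real data the true \(\beta_0\) and \(\beta_1\) are unknown, so only the estimates and the residuals are available.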

Dataset Overview: mtcars

Dataset

# Load the dataset
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
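The analysis that follows uses only three of these columns. As a quick orientation, the sketch below checks the size of the dataset and summarises those columns; per the `mtcars` help page, `mpg` is miles per (US) gallon, `hp` is gross horsepower, and `wt` is weight in units of 1000 lbs. The helper name `vars` is introduced here for illustration.

```r
# mtcars ships with base R (package 'datasets'); no download is needed.
data(mtcars)
dim(mtcars)                      # number of rows (cars) and columns (variables)

# Columns used in the regression examples.
vars <- c("mpg", "hp", "wt")
summary(mtcars[, vars])          # ranges and quartiles for each variable
```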

Scatterplot Analysis: mpg vs hp

Observations

  • Negative Relationship:
    • As horsepower (\(hp\)) increases, miles per gallon (\(mpg\)) decreases.
    • This suggests that vehicles with higher horsepower are generally less fuel-efficient.
  • Linear Trend:
    • The points lie roughly along the red regression line, so a straight line is a reasonable first description of how \(mpg\) changes with \(hp\).
    • This pattern is consistent with the linearity assumption of linear regression.
  • Spread of Data:
    • Data points show moderate variability around the regression line.
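    • Residuals (the differences between observed and predicted \(mpg\)) are moderate in size: horsepower alone accounts for roughly 60% of the variance in \(mpg\) (\(R^2 \approx 0.60\)), so the fit is reasonable but leaves substantial variation unexplained.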
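These observations can be checked numerically. The sketch below refits the line with `lm()` and redraws the scatterplot using base R graphics, an assumption made to avoid extra package dependencies; the styling is not meant to match the original figure. The object names (`fit_hp`, `r`, `r2`, `rse`) are illustrative.

```r
data(mtcars)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# Direction and strength of the linear association.
r <- cor(mtcars$hp, mtcars$mpg)        # negative: mpg falls as hp rises

# Share of the variance in mpg explained by hp, and typical residual size.
r2  <- summary(fit_hp)$r.squared
rse <- summary(fit_hp)$sigma           # residual standard error, in mpg

# Scatterplot with the fitted line in red.
plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower (hp)", ylab = "Miles per gallon (mpg)")
abline(fit_hp, col = "red", lwd = 2)

# Residuals vs fitted values: a curved pattern would argue against
# a straight-line model.
plot(fitted(fit_hp), residuals(fit_hp),
     xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lty = 2)
```

The correlation is about -0.78 and \(R^2\) is about 0.60, with a residual standard error near 3.9 mpg. The residual plot is a more direct check of linearity than the scatterplot alone.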

Model Insight

  • The slope of the regression line represents:
    • The expected decrease in \(mpg\) for a one-unit increase in \(hp\).
  • Practical Implication:
    • Manufacturers may need to trade off performance (high \(hp\)) against fuel efficiency (high \(mpg\)).
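The slope interpretation can be read directly off a fitted model. The following sketch extracts the slope, its 95% confidence interval, and the predicted difference between two cars; the horsepower values 100 and 200 are hypothetical inputs chosen for illustration.

```r
data(mtcars)
fit_hp <- lm(mpg ~ hp, data = mtcars)

slope <- coef(fit_hp)[["hp"]]    # change in mpg per additional horsepower
slope * 100                      # change in mpg per additional 100 hp

# 95% confidence interval for the slope.
confint(fit_hp, "hp")

# Predicted mpg at two illustrative (hypothetical) horsepower values.
new_cars <- data.frame(hp = c(100, 200))
pred <- predict(fit_hp, newdata = new_cars)
diff(pred)                       # equals 100 * slope
```

On this dataset the slope is about -0.068, so a car with 100 more horsepower is expected to get roughly 6.8 fewer miles per gallon. This describes an association in the sample rather than a causal effect.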

3D Scatterplot Analysis: mpg vs hp vs wt

Observations

  • 3D Relationships:
    • Vehicles with higher horsepower (\(hp\)) tend to have lower fuel efficiency (\(mpg\)).
    • Heavier vehicles (\(wt\)) also tend to have lower \(mpg\).
  • Combined Effect:
    • Higher \(hp\) and higher \(wt\) are each associated with lower \(mpg\), suggesting that power and weight jointly influence fuel efficiency.
  • Cluster Patterns:
    • Data points form distinct clusters, suggesting groups of vehicles with similar characteristics.
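The combined effect of power and weight can be quantified with a multiple regression, which extends the straight-line model to \(mpg = \beta_0 + \beta_1 hp + \beta_2 wt + \epsilon\). The sketch below is one way to fit and compare the models; the object names are illustrative, and the 3D plot is drawn only if the `plotly` package happens to be installed.

```r
data(mtcars)

# Multiple regression: mpg modelled by horsepower and weight together.
fit_both <- lm(mpg ~ hp + wt, data = mtcars)
coef(fit_both)                          # both slopes are negative
r2_both <- summary(fit_both)$r.squared

# Compare with the single-predictor model.
fit_hp <- lm(mpg ~ hp, data = mtcars)
r2_hp  <- summary(fit_hp)$r.squared
anova(fit_hp, fit_both)                 # F-test: does adding wt improve the fit?

# Optional 3D scatterplot; built only if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(mtcars, x = ~hp, y = ~wt, z = ~mpg,
                       type = "scatter3d", mode = "markers")
}
```

Adding weight raises \(R^2\) from about 0.60 to about 0.83. The `hp` slope shrinks to roughly -0.032 once weight is held fixed, because heavier cars also tend to be more powerful and part of the apparent horsepower effect in the simple model reflects weight.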