2025-11-09

What is a Linear Regression?

  • A statistical tool used to model the relationship between a dependent variable and one or more independent variables.
  • What is it useful for? Predictive analysis and fitting trendlines, so that the fitted relationship can be used to estimate the dependent variable for values of the independent variable that are not in the dataset.
  • A simple linear regression (one independent variable) is represented in the form \(y = a + bx\), where \(a\) is the intercept and \(b\) is the slope.
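To make the roles of \(a\) and \(b\) concrete, here is a minimal sketch that computes the least-squares slope and intercept by hand and compares them with R's built-in `lm()`. The five data points are invented purely for illustration; they do not come from any dataset used in these slides.

```r
# Toy data, made up for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates for y = a + b * x.
b <- cov(x, y) / var(x)       # slope
a <- mean(y) - b * mean(x)    # intercept

# lm() fits the same model; coef() returns the intercept, then the slope.
fit <- lm(y ~ x)
print(c(a = a, b = b))
print(coef(fit))
```

Both printouts should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.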

Example 1 - Motor Trend Car Road Tests Data - Summary

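  • We will use the mtcars dataset to illustrate linear regression. Below is a summary of the dataset. We will employ the linear regression line equation \(Y = \beta_0 + \beta_1 X\), where \(\beta_0\) is the intercept and \(\beta_1\) is the slope (the \(a\) and \(b\) of \(y = a + bx\)).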
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
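The slides do not show the code behind this summary. It was most likely produced with something like the sketch below, which also fits the regression line equation to weight and MPG. The variable names `fit_wt`, `b0`, and `b1` are chosen here for illustration.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
data(mtcars)
print(summary(mtcars))

# Fit mpg = B0 + B1 * wt by ordinary least squares.
fit_wt <- lm(mpg ~ wt, data = mtcars)
b0 <- unname(coef(fit_wt)["(Intercept)"])  # intercept
b1 <- unname(coef(fit_wt)["wt"])           # slope
cat(sprintf("mpg = %.2f + (%.2f) * wt\n", b0, b1))
```

On the standard mtcars data this gives an intercept of about 37.29 and a slope of about -5.34, meaning each additional 1000 lbs of weight is associated with roughly 5.3 fewer miles per gallon.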

Example 1.1 - GGPlot Linear Regression of Data - Weight v MPG

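library(ggplot2)

# Scatter plot of weight vs. MPG with a fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 4) +
  # method = "lm" fits a linear model; se = TRUE draws its confidence band
  geom_smooth(method = "lm", se = TRUE, color = "green") +
  labs(
    title = "Relationship between Weight and Miles per Gallon",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  ) +
  theme_minimal()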
## `geom_smooth()` using formula = 'y ~ x'
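The green line and shaded band in this plot can also be obtained numerically. The sketch below assumes the band is the default 95% confidence interval for the mean response, and uses three hypothetical car weights that are not rows of mtcars.

```r
# Same model that geom_smooth(method = "lm") fits for y ~ x.
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Hypothetical weights (in 1000 lbs) at which to read off the line.
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# fit = point on the regression line; lwr/upr = 95% confidence limits.
pred <- predict(fit_wt, newdata = new_cars,
                interval = "confidence", level = 0.95)
print(cbind(new_cars, pred))
```

For a 3000 lb car (`wt = 3.0`) the fitted value is about 21.25 mpg.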

Example 1.2 - Plotly Linear Regression of Data - Weight v MPG

## Warning in geom_point(aes(text = paste("Car:", rownames(mtcars))), color =
## "blue", : Ignoring unknown aesthetics: text
## `geom_smooth()` using formula = 'y ~ x'
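The code for this slide is not shown; only its warning survives. Based on that warning, a plausible reconstruction wraps the Example 1.1 plot in `plotly::ggplotly()` and adds a `text` aesthetic for hover labels. Everything apart from the `geom_point(aes(text = ...), color = "blue", ...)` fragment visible in the warning is an assumption, including the point size and the `tooltip` argument.

```r
library(ggplot2)
library(plotly)

# ggplot2 itself does not know the `text` aesthetic, which is why it warns
# "Ignoring unknown aesthetics: text". ggplotly() uses it for hover tooltips.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(text = paste("Car:", rownames(mtcars))),
             color = "blue", size = 4) +   # size is assumed
  geom_smooth(method = "lm", se = TRUE, color = "green") +
  labs(
    title = "Relationship between Weight and Miles per Gallon",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  ) +
  theme_minimal()

# Convert the static plot to an interactive widget.
p_interactive <- ggplotly(p, tooltip = c("text", "x", "y"))
if (interactive()) print(p_interactive)
```

The warning is harmless in this setup: the static layer ignores `text`, while the interactive version shows the car name when hovering over a point.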

Example 2 - GGPlot - MPG vs Horsepower based on # of Cylinders

## `geom_smooth()` using formula = 'y ~ x'
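The code for this plot is also not shown. A minimal sketch of how such a figure could be built follows, assuming horsepower on the x-axis, MPG on the y-axis, and one fitted line per cylinder count. The colours, labels, and `se = FALSE` are illustrative choices, not the original settings.

```r
library(ggplot2)

# factor(cyl) makes ggplot2 treat 4, 6, and 8 cylinders as separate groups,
# so geom_smooth() fits one regression line per group.
p_cyl <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "MPG vs Horsepower by Number of Cylinders",
    x = "Gross horsepower",
    y = "Miles per Gallon",
    color = "Cylinders"
  ) +
  theme_minimal()
print(p_cyl)
```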

Conclusions to be drawn from the examples

  • The GGPlot and Plotly graphs in Example 1 show that as the weight of a car increases, its MPG (miles per gallon of fuel) tends to decrease.

  • The GGPlot graph in Example 2 shows that cars with more cylinders tend to travel fewer miles per gallon of fuel.
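Both conclusions can be checked numerically as well as visually. The following is a rough sketch using base R only; the names `r_wt_mpg` and `mpg_by_cyl` are introduced here for illustration.

```r
# Pearson correlation between weight and MPG: a negative value means
# heavier cars tend to have lower MPG.
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
print(r_wt_mpg)

# Average MPG for each cylinder count (4, 6, 8).
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(mpg_by_cyl, 2))
```

The correlation is strongly negative (about -0.87), and the group means fall from roughly 26.7 mpg for 4 cylinders to 15.1 mpg for 8 cylinders.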

Advantages and Disadvantages of a Linear Regression

  • Linear regressions are simple and computationally efficient, which means that they can handle large datasets well.

  • However, a linear regression assumes that the relationship between the independent variable and the dependent variable is linear. It will fit poorly when the true relationship is not linear.
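A small synthetic example shows what happens when the linearity assumption fails. The data below are generated, not measured: `y` is exactly `x^2`, so a straight line cannot capture the relationship.

```r
# Synthetic, noise-free quadratic data (for illustration only).
x <- seq(-3, 3, by = 0.5)
y <- x^2

# A straight-line fit: the slope comes out at essentially zero,
# wrongly suggesting that x has no effect on y.
fit_lin <- lm(y ~ x)
print(coef(fit_lin))

# Adding a squared term captures the curve. The model is still "linear"
# in the regression sense, because it is linear in its coefficients.
fit_quad <- lm(y ~ x + I(x^2))

# Residual sum of squares: large for the line, about zero for the quadratic.
rss_lin  <- sum(resid(fit_lin)^2)
rss_quad <- sum(resid(fit_quad)^2)
print(c(linear = rss_lin, quadratic = rss_quad))
```

In practice, plotting the residuals of a fitted model against its fitted values is a common way to spot this kind of curvature before trusting the line.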