2025-10-28


Linear regression model

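A simple linear regression model describes the relationship between two variables, a predictor \(X_1\) and a response \(Y\), as a straight line.

This is represented by the equation \[ \hat{Y} = b_0 + b_1X_1 \] where \(\hat{Y}\) is the predicted response, \(b_0\) is the intercept, and \(b_1\) is the slope.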
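
As a minimal sketch of where \(b_0\) and \(b_1\) come from: for a single predictor, the least-squares estimates are \(b_1 = \mathrm{cov}(X_1, Y) / \mathrm{var}(X_1)\) and \(b_0 = \bar{Y} - b_1\bar{X}_1\). The example assumes the built-in mtcars data with hp as \(X_1\) and mpg as \(Y\), the same pairing used in the regression in these slides. The names x, y, b0, b1, and fit are illustrative.

```r
# Least-squares estimates for a one-predictor model, computed by hand
x <- mtcars$hp   # predictor X_1
y <- mtcars$mpg  # response Y

b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# Compare with the estimates from lm()
fit <- lm(mpg ~ hp, data = mtcars)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both print statements should show the same intercept and slope, since lm() solves the same least-squares problem.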

Set up

  • We will use two built-in R datasets: mtcars (first six rows shown below) and cars (summary shown below). The regression itself is fitted on mtcars.
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
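
The slide shows only the output, not the code that produced it. It looks like the first six rows of mtcars followed by a summary of cars, so a plausible reconstruction (an assumption, since the original chunk is not shown) is:

```r
# Assumed calls; both datasets ship with base R (package 'datasets')
head(mtcars)    # first six rows of mtcars
summary(cars)   # quartiles and mean of speed and dist
```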

Linear regression

regres <- lm(mpg ~ hp, data = mtcars)

summary(regres)
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
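
To make the coefficients concrete: the fitted line is roughly mpg = 30.10 - 0.068 * hp, so each additional horsepower is associated with about 0.068 fewer miles per gallon, and hp explains about 60% of the variance in mpg. A minimal sketch of using the fitted model for prediction follows. The value hp = 100 and the names new_car, pred, and manual are illustrative choices, not part of the original slides.

```r
# Refit the model from the slide so this block runs on its own
regres <- lm(mpg ~ hp, data = mtcars)

# Predict MPG for a hypothetical car with 100 horsepower
new_car <- data.frame(hp = 100)
pred <- predict(regres, newdata = new_car)
print(pred)  # about 23.3 mpg

# Same value computed directly from the coefficients
manual <- coef(regres)[["(Intercept)"]] + coef(regres)[["hp"]] * 100
print(manual)

# Share of variance in mpg explained by hp (Multiple R-squared)
print(summary(regres)$r.squared)
```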

Regression between MPG and HP

## `geom_smooth()` using formula = 'y ~ x'
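
The plot itself did not survive here, only the message from geom_smooth(). A sketch of code that would produce a scatter plot with a fitted regression line, along with that message, is given here. It assumes the ggplot2 package was used, as the message suggests. The axis labels and the object name p are guesses.

```r
library(ggplot2)

# Scatter plot of MPG against horsepower with a least-squares line
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +   # no formula given, so ggplot2 reports y ~ x
  labs(x = "Horsepower (HP)", y = "Miles per Gallon (MPG)")

print(p)
```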

The relationship between MPG and CYL in the data

Slide with care

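library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# Box plot of MPG grouped by cylinder count, with all points overlaid
plotly_mpg_cyl <- plot_ly(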
  data = mtcars,
  x = ~factor(cyl),
  y = ~mpg,
  type = "box",
  boxpoints = "all",
  jitter = 0.3,
  marker = list(color = 'rgba(7, 164, 181, 0.7)'),
  line = list(color = 'rgba(7, 164, 181, 1)')
) %>%
  layout(
    title = "MPG vs. Number of Cylinders",
    xaxis = list(title = "Number of Cylinders"),
    yaxis = list(title = "Miles per Gallon (MPG)")
  )

Slide with no care