2024-04-07

Dataset: mtcars

##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

Simple Linear Regression

Quarter-mile Time vs. Engine Displacement

model:  

\(T_\text{1/4 mile} = \beta_0 + \beta_1 \cdot \text{Displacement}\)
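As a rough sketch of how this model could be fitted, the snippet below uses base R's `lm()` on the stock `datasets::mtcars`. The object names `cars` and `fit` are illustrative choices, not taken from the slides.

```r
# Stock copy of the data, unaffected by any later recoding of columns.
cars <- datasets::mtcars

# Fit T_1/4 mile = beta_0 + beta_1 * Displacement by ordinary least squares.
fit <- lm(qsec ~ disp, data = cars)

# beta_0 is reported as "(Intercept)" and beta_1 as "disp".
coef(fit)
```

In this data the `disp` coefficient is negative, meaning larger engines tend to go with shorter quarter-mile times.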

Hypothesis Testing

“Early automatic transmissions from the 1970s gave cars worse gas mileage compared to manual transmissions.”

Hypothesis Testing

Continued

test statistic:  

\(t_{\hat{\beta}} = \frac{\hat{\beta} - \beta_{H_0}}{\text{SE}(\hat{\beta})}\)

where \(\beta_{H_0}\) is the value of the coefficient under the null hypothesis.
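The statistic has the generic form "estimate minus null value, divided by standard error". For a two-group comparison such as mpg by transmission, the estimate is the difference in group means and the null value is 0. Below is a minimal sketch for the 4-cylinder cars, assuming the Welch variant that `t.test()` uses by default and the stock `datasets::mtcars`, where `cyl` and `am` are still numeric. All object names are illustrative.

```r
cars <- datasets::mtcars

# Four-cylinder cars only, split by transmission (am: 0 = automatic, 1 = manual).
four        <- subset(cars, cyl == 4)
mpg_auto    <- four$mpg[four$am == 0]
mpg_manual  <- four$mpg[four$am == 1]

# Welch statistic: difference in means over its standard error.
se        <- sqrt(var(mpg_auto) / length(mpg_auto) +
                  var(mpg_manual) / length(mpg_manual))
t_by_hand <- (mean(mpg_auto) - mean(mpg_manual)) / se

# The same statistic as reported by t.test().
t_builtin <- unname(t.test(mpg ~ am, data = four)$statistic)
all.equal(t_by_hand, t_builtin)
```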

We will now conduct a t-test within each cylinder group, comparing mpg between automatic and manual transmissions:

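library(dplyr)  # provides group_by(), summarize() and %>%
# Welch two-sample t-test of mpg by transmission within each cylinder group
group_by(mtcars, cyl) %>% summarize(p_value = t.test(mpg ~ am)$p.value)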
## # A tibble: 3 × 2
##   cyl         p_value
##   <fct>         <dbl>
## 1 4 cylinders  0.0180
## 2 6 cylinders  0.187 
## 3 8 cylinders  0.704

As the p-values show, the difference in mpg between automatic and manual transmissions is statistically significant at the 5% level only for 4-cylinder cars (p = 0.018).

Although the graph on the previous slide suggested a clear mpg advantage for manual 6-cylinder cars, the mtcars sample is too small for that difference to reach statistical significance (p = 0.187).
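The quoted hypothesis is directional (automatics get worse mileage), whereas `t.test()` is two-sided by default. Here is a minimal sketch of the one-sided variant, assuming base R only and the stock `datasets::mtcars`, where `cyl` and `am` are still numeric. The names `cars` and `p_one_sided` are illustrative.

```r
cars <- datasets::mtcars  # am: 0 = automatic, 1 = manual

# One Welch t-test per cylinder group.
# alternative = "less" tests whether the first group (am = 0, automatic)
# has a lower mean mpg than the second group (am = 1, manual).
p_one_sided <- sapply(split(cars, cars$cyl), function(d) {
  t.test(mpg ~ am, data = d, alternative = "less")$p.value
})
p_one_sided
```

Because the automatic cars have the lower mean mpg in every cylinder group, each one-sided p-value is half of its two-sided counterpart.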

3D Linear Regression

Effects of horsepower and weight on quarter-mile time

model:  

\(Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \cdots + \beta_i \cdot X_i\)
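For the two predictors on this slide, the general formula reduces to \(Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2\). The sketch below fits it with `lm()` on the stock `datasets::mtcars`. The names `cars`, `fit2` and `new_car`, and the values given to the hypothetical car, are assumptions made purely for illustration.

```r
cars <- datasets::mtcars

# Two predictors: X_1 = weight (wt, in 1000 lbs), X_2 = horsepower (hp).
fit2 <- lm(qsec ~ wt + hp, data = cars)
coef(fit2)

# Predicted quarter-mile time for a hypothetical car (made-up values).
new_car   <- data.frame(wt = 3.0, hp = 150)
pred_qsec <- predict(fit2, newdata = new_car)
pred_qsec
```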

Behind the Scenes

Code for the simple linear regression graph:

library(ggplot2)

ggplot(mtcars, aes(x = disp, y = qsec)) +
  geom_point() +
  geom_smooth(formula = y ~ x, 
              method = "lm", 
              se = FALSE, 
              color = "#8C1D40") +
  labs(x = "Engine Displacement (CI)", 
       y = "Quarter-mile Time (s)") +
  theme_minimal()

Behind the Scenes

Code for the hypothesis test graph:

mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8),
                     labels = c("4 cylinders", 
                                "6 cylinders", 
                                "8 cylinders"))

ggplot(mtcars, aes(x = factor(am), y = mpg)) +
  geom_boxplot() +
  facet_grid(. ~ cyl) +
  labs(x = "Transmission Type", y = "Miles per Gallon (mpg)", 
       title="Gas mileage by Transmission Type and Cylinder Count") +
  scale_x_discrete(labels = c("0" = "Automatic", "1" = "Manual"))

Behind the Scenes

Code for the 3D regression graph:

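library(plotly)  # plot_ly(), layout(), add_surface() and %>%

# Plane fitted to quarter-mile time as a function of weight and horsepower
model <- lm(qsec ~ wt + hp, data = mtcars)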
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), 
              length.out = length(mtcars$wt))
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), 
              length.out = length(mtcars$hp))
grid <- expand.grid(wt = wt_seq, hp = hp_seq)
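# expand.grid() varies wt fastest, so filling by row gives one row per hp
# value and one column per wt value: the z[y, x] layout add_surface() expects.
z_matrix <- matrix(predict(model, newdata = grid), 
                   nrow = length(hp_seq), 
                   ncol = length(wt_seq), byrow = TRUE)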
scatter_3d <- plot_ly(data = mtcars, x = ~wt, y = ~hp, z = ~qsec, 
                      type = "scatter3d", 
                      mode = "markers", marker = list(size = 5)) %>%
  layout(scene = list(xaxis = list(title = "Weight (1000 lbs)"),
                      yaxis = list(title = "Horsepower (hp)"),
                      zaxis = list(title = "Quarter-mile Time (s)")))

add_surface(scatter_3d, x = wt_seq, y = hp_seq, z = z_matrix)

Thank you