October 18, 2025
data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Model (2D): \(Y_i = \beta_0 + \beta_1\cdot x_i + \varepsilon_i \qquad \varepsilon_i \sim N(0,\sigma^2)\)
Model (3D): \(Y_i = \beta_0 + \beta_1\cdot x_{1i} + \beta_2\cdot x_{2i} + \varepsilon_i\)
Where:
\(Y_i\) = The dependent variable you are trying to predict, for observation \(i\)
\(x_i\) (or \(x_{1i}, x_{2i}\) in 3D) = The independent variable(s) you are trying to associate with \(Y\)
\(\beta_0\) = The y-intercept
\(\beta_1, \beta_2\) = The slope (beta) coefficients of the independent variables; in the 3D model each is the change in \(Y\) per unit change in its variable with the other variable held fixed
\(\varepsilon_i\) = The regression error term, assumed normal with mean 0 and variance \(\sigma^2\)
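For the 2D model, the least-squares estimates have a closed form: \(\hat\beta_1 = \operatorname{cov}(x, Y)/\operatorname{var}(x)\) and \(\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{x}\). The sketch below, using only base R and the built-in `mtcars` data, computes them by hand for MPG vs weight and checks them against `lm()`; the variable names are illustrative.

```r
# Closed-form least-squares estimates for the 2D model mpg = b0 + b1 * wt + e
x <- mtcars$wt
y <- mtcars$mpg

b1_hat <- cov(x, y) / var(x)          # slope: covariance of x and y over variance of x
b0_hat <- mean(y) - b1_hat * mean(x)  # intercept: the line passes through (mean x, mean y)

# Compare with the estimates reported by lm()
fit_wt <- lm(mpg ~ wt, data = mtcars)
by_hand <- c(b0_hat, b1_hat)
from_lm <- unname(coef(fit_wt))
print(rbind(by_hand, from_lm))
```

The two rows should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.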
2D Plot of Miles per Gallon vs Weight: \(MPG = \beta_0 + \beta_1\cdot Wt + \varepsilon\)
2D Plot of Miles per Gallon vs Horsepower: \(MPG = \beta_0 + \beta_1\cdot Hp + \varepsilon\)
3D Plot of Miles per Gallon vs Weight and Horsepower: \(MPG = \beta_0 + \beta_1\cdot Wt + \beta_2\cdot Hp + \varepsilon\)
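With two predictors, the same least-squares idea is usually written in matrix form, \(\hat{\boldsymbol\beta} = (X^\top X)^{-1} X^\top Y\), where \(X\) has a column of ones for \(\beta_0\) plus one column per predictor. Below is a minimal base-R sketch for the MPG ~ Wt + Hp model. It solves the normal equations with `solve()` rather than forming the inverse explicitly. Note that `lm()` itself uses a QR decomposition, so this illustrates the estimate rather than how `lm()` computes it internally.

```r
# Design matrix: a column of ones (intercept) plus the two predictors
X <- cbind(1, mtcars$wt, mtcars$hp)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y for b = (b0, b1, b2)
beta_hat <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Compare with lm()
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)
print(rbind(normal_eq = beta_hat, lm = unname(coef(fit_wt_hp))))
```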
library(ggplot2)
# Run linear regression on the variables I am interested in (MPG and Weight)
fit <- lm(mpg ~ wt, data = mtcars)
# Use ggplot to make a scatterplot of these variables
plot1 <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # geom_smooth(method = "lm") refits the same least-squares line as fit
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs Weight with Fitted Line",
       x = "Weight (1000 lbs)", y = "MPG")
plot1
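The weight model can also be inspected directly rather than only through the plot. `coef()` returns the fitted \(\hat\beta_0\) and \(\hat\beta_1\), and `predict()` evaluates \(\hat\beta_0 + \hat\beta_1\cdot Wt\) at new weights. In this base-R sketch, the 3,000 lb weight is an arbitrary example value, and it shows that both routes give the same prediction.

```r
fit <- lm(mpg ~ wt, data = mtcars)
b <- coef(fit)   # named vector: "(Intercept)" and "wt"

# Predicted MPG for a hypothetical 3,000 lb car (wt is measured in 1000 lbs)
new_car <- data.frame(wt = 3)
by_formula <- unname(b["(Intercept)"] + b["wt"] * 3)
by_predict <- unname(predict(fit, newdata = new_car))
print(c(by_formula = by_formula, by_predict = by_predict))
```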
# Run linear regression on the variables I am interested in (MPG and Horsepower)
fit <- lm(mpg ~ hp, data = mtcars)
# Use ggplot to make a scatterplot of these variables
plot2 <- ggplot(mtcars, aes(hp, mpg)) +
  geom_point() +
  # geom_smooth(method = "lm") refits the same least-squares line as fit
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs Horsepower with Fitted Line",
       x = "Horsepower", y = "MPG")
plot2
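To compare the two single-predictor fits numerically rather than by eye, one option is the coefficient of determination \(R^2 = 1 - SS_{res}/SS_{tot}\), the share of the variance in MPG explained by each line. `summary()` on an `lm` object reports it as `r.squared`. This minimal base-R sketch also recomputes it by hand for the weight model.

```r
fit_wt <- lm(mpg ~ wt, data = mtcars)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# R^2 reported by summary() for each simple regression
r2 <- c(wt = summary(fit_wt)$r.squared,
        hp = summary(fit_hp)$r.squared)
print(round(r2, 3))

# The same quantity computed directly from the residuals of the weight model
ss_res <- sum(residuals(fit_wt)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_wt_manual <- 1 - ss_res / ss_tot
```

On `mtcars`, weight has the larger \(R^2\) of the two.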
library(plotly)
# Run linear regression on the variables I am interested in
# (MPG, Weight, and Horsepower)
fit <- lm(mpg ~ wt + hp, data = mtcars)
# Use plot_ly to make a 3D scatter plot of these variables
plot3d <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg, type = "scatter3d",
mode = "markers", marker = list(size = 4), showlegend = FALSE)
# Make the plane/surface of best fit for this data (cannot
# just be a line since we are in 3D now)
grid.lines <- 40
wt.seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = grid.lines)
hp.seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = grid.lines)
grid <- expand.grid(wt = wt.seq, hp = hp.seq)
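# predict() returns values with wt varying fastest, so matrix() puts wt on
# the rows; plotly surfaces expect rows to follow y (hp), hence the t()
Z <- t(matrix(predict(fit, newdata = grid), nrow = grid.lines,
ncol = grid.lines))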
# Add the plane/surface of best fit to the plot
plot3d <- plot3d %>%
add_surface(x = wt.seq, y = hp.seq, z = Z, opacity = 0.8,
showscale = FALSE, inherit = FALSE) %>%
layout(scene = list(xaxis = list(title = "Weight (1000 lbs)"),
yaxis = list(title = "Horsepower"), zaxis = list(title = "MPG")))
plot3d
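One way to confirm that the surface is oriented correctly is to check the matrix layout. Plotly treats the rows of `z` as the `y` axis (here `hp`) and the columns as `x` (`wt`). So cell `[j, i]` of the matrix given to `add_surface()` should equal \(\hat\beta_0 + \hat\beta_1\cdot Wt_i + \hat\beta_2\cdot Hp_j\). This base-R sketch rebuilds the same grid without plotly. It compares the `predict()` matrix, transposed so its rows follow `hp`, with the plane evaluated directly via `outer()`. The spot-checked indices are arbitrary.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
grid.lines <- 40
wt.seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = grid.lines)
hp.seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = grid.lines)
grid <- expand.grid(wt = wt.seq, hp = hp.seq)

# expand.grid() varies wt fastest, so matrix() puts wt on the rows;
# transposing puts hp on the rows, the layout a plotly surface expects
Z_plotly <- t(matrix(predict(fit, newdata = grid), nrow = grid.lines))

# Same plane evaluated directly: rows follow hp.seq, columns follow wt.seq
b <- unname(coef(fit))   # b[1] intercept, b[2] wt slope, b[3] hp slope
Z_direct <- outer(hp.seq, wt.seq, function(h, w) b[1] + b[2] * w + b[3] * h)

# Spot-check one cell: row 5 (hp.seq[5]) and column 30 (wt.seq[30])
cell_expected <- b[1] + b[2] * wt.seq[30] + b[3] * hp.seq[5]
print(c(matrix_cell = Z_plotly[5, 30], expected = cell_expected))
```

Without the transpose the two matrices disagree by several MPG at the corners, so a square grid can hide a plane that is drawn in the wrong orientation.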
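After observing each graph, it is evident that lower horsepower and lower weight are associated with higher MPG: both fitted lines, and the fitted plane along each of its axes, slope downward. This is likely in line with what you would expect, but the regression lets us support it with a stated degree of confidence (through the standard errors, p-values, and confidence intervals of the slope coefficients, e.g. from summary(fit)) rather than by intuition alone. Because mtcars is observational data, this shows an association, not proof that lowering weight or horsepower causes higher MPG.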
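To put a number on the degree of confidence, the usual tools are the fitted model's coefficient table and confidence intervals. `summary()` reports an estimate, standard error, t statistic and p-value per coefficient, and `confint()` gives intervals (95% by default). Below is a minimal base-R sketch for the MPG ~ Wt + Hp model. These intervals rely on the normal-error assumption \(\varepsilon_i \sim N(0,\sigma^2)\) from the model definition.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimate, standard error, t value and p-value for each coefficient
coef_table <- summary(fit)$coefficients
print(coef_table)

# 95% confidence intervals; an interval lying entirely below zero
# supports a negative slope at that confidence level
ci <- confint(fit, level = 0.95)
print(ci)
```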
Note that while the details of plot construction were not explained step by step, the full code is included in the slides for reference. Only slight changes (mainly the dataset and column names) are needed to produce similar results for a different dataset.
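As a sketch of what those slight changes might look like, the hypothetical helper `fit_simple()` below takes a data frame and the names of the response and predictor columns as strings. It builds the formula with `reformulate()` and returns the fitted model, so the same analysis can be pointed at another dataset by changing arguments only. The helper and its argument names are illustrative, not part of the original slides. The example applies it to `mtcars` and to R's built-in `cars` data (stopping distance vs speed).

```r
# Hypothetical helper: fit response ~ predictors for any data frame
fit_simple <- function(data, response, predictors) {
  # Fail early if a column name is misspelled
  stopifnot(all(c(response, predictors) %in% names(data)))
  form <- reformulate(predictors, response = response)
  lm(form, data = data)
}

# Same model as in the slides, with column names passed as strings
fit_mtcars <- fit_simple(mtcars, "mpg", c("wt", "hp"))
# A different built-in dataset: stopping distance vs speed
fit_cars <- fit_simple(cars, "dist", "speed")
print(coef(fit_mtcars))
print(coef(fit_cars))
```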