October 18, 2025

Abstract

My goal in this presentation is to explore the mtcars dataset using 2D and 3D linear regression. Specifically, I will do the following:


- Describe the mtcars Dataset

- Notate the Math used in Regression

- Display Two 2D ggplot2 Plots

- Display One 3D plotly Plot

About the Dataset

mtcars is built into R as a standard dataset. It contains 11 metrics (fuel consumption, weight, horsepower, and others) for 32 car models, taken from the 1974 Motor Trend US magazine.

data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
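As a quick orientation, the sketch below checks the size of the dataset and summarises the three columns used in the rest of this presentation (mpg, wt, hp). The name `vars_used` is my own and is only for illustration.

```r
# mtcars ships with base R (package "datasets"), so no install is needed
data(mtcars)

# Number of rows (car models) and columns (metrics)
dim(mtcars)

# The three columns used later: miles per gallon, weight (1000 lbs), horsepower
vars_used <- mtcars[, c("mpg", "wt", "hp")]   # illustrative name, not from the slides
summary(vars_used)
```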

Regression Math

Regression is a statistical method that analyzes the relationship between a dependent variable and one or more independent variables.

It estimates how strongly these variables are related and whether that relationship is statistically significant (assessed using confidence intervals and p-values).

The following formulas will be used:


Model (2D): \(Y_i = \beta_0 + \beta_1\cdot x_i + \varepsilon_i, \hspace{1 cm} \varepsilon_i \sim N(0, \sigma^2)\)

Model (3D): \(Y_i = \beta_0 + \beta_1\cdot x_{1i} + \beta_2\cdot x_{2i} + \varepsilon_i, \hspace{1 cm} \varepsilon_i \sim N(0, \sigma^2)\)

Where:

\(Y_i\) = The dependent variable you are trying to predict (for observation \(i\))
\(x_i\), \(x_{1i}\), \(x_{2i}\) = The independent variable(s) you are trying to associate with \(Y_i\)
\(\beta_0\) = The y-intercept
\(\beta_1\), \(\beta_2\) = (beta coefficients) The slope of each independent variable
\(\varepsilon_i\) = The regression error term
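To make the notation concrete, here is a minimal sketch of how the coefficients can be estimated by ordinary least squares, the criterion that lm() fits for these models. For the 2D model the estimates have a closed form, and for the 3D model they can be obtained from the normal equations. Names such as `b1_hat`, `X`, and `beta_hat` are my own and do not appear elsewhere in the slides.

```r
data(mtcars)
y <- mtcars$mpg
x <- mtcars$wt

# 2D model: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
b1_hat <- cov(x, y) / var(x)
b0_hat <- mean(y) - b1_hat * mean(x)

# Should agree with lm() up to floating-point error
fit_2d <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit_2d)), c(b0_hat, b1_hat))

# 3D model: solve the normal equations (X'X) beta = X'y
X <- cbind(1, mtcars$wt, mtcars$hp)      # column of 1s gives the intercept
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

fit_3d <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(unname(coef(fit_3d)), as.vector(beta_hat))
```

The hand-computed values are only a check on the formulas; in practice lm() is the more numerically stable route.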

Math for Each Graph

For the 2D graphs in this presentation, the generic variables have to be replaced. We want to track MPG, so that will become our \(Y\). I want to see how weight and horsepower might affect MPG, so those will be our \(x_i\), the independent variables.

For the 3D graph, MPG will again be our dependent variable, and both weight and horsepower will be the independent variables (in the same equation this time).


2D Plot of Miles per Gallon vs Weight: \(MPG = \beta_0 + \beta_1\cdot Wt + \varepsilon\)

2D Plot of Miles per Gallon vs Horsepower: \(MPG = \beta_0 + \beta_1\cdot Hp + \varepsilon\)

3D Plot of Miles per Gallon vs Weight and Horsepower: \(MPG = \beta_0 + \beta_1\cdot Wt + \beta_2\cdot Hp + \varepsilon\)

Plot Miles Per Gallon vs Weight

# Load ggplot2 for plotting
library(ggplot2)

# Run linear regression on the variables I am interested in (MPG and Weight)
fit <- lm(mpg ~ wt, data = mtcars)

# Use ggplot to make a scatterplot of these variables
plot1 <- ggplot(
  mtcars,
  aes(wt, mpg)
) +
  geom_point() +
  # Add the line of best fit (geom_smooth refits the same lm as above)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs Weight with Fitted Line", x = "Weight (1000 lbs)", y = "MPG")
plot1
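The plot shows the fitted line but not how precisely it is estimated. As a sketch of how the significance mentioned earlier can be checked, the coefficient table and confidence intervals can be pulled from the fitted model with base R. The name `fit_wt` is mine, used here to avoid overwriting `fit`.

```r
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Estimate, standard error, t statistic and p-value for each coefficient
coef_table <- summary(fit_wt)$coefficients
coef_table

# 95% confidence intervals for the intercept and the slope
ci <- confint(fit_wt, level = 0.95)
ci

# Interval for the slope; it excludes zero if both bounds share a sign
ci["wt", ]
```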

Plot Miles Per Gallon vs Horsepower

# Run linear regression on the variables I am interested in (MPG and Horsepower)
fit <- lm(mpg ~ hp, data = mtcars)

# Use ggplot to make a scatterplot of these variables
plot2 <- ggplot(
  mtcars,
  aes(hp, mpg)
) +
  geom_point() +
  # Add the line of best fit (geom_smooth refits the same lm as above)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs Horsepower with Fitted Line", x = "Horsepower", y = "MPG")
plot2
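One way to compare the two single-predictor fits is the coefficient of determination, R-squared, which is the share of the variance in MPG that each line accounts for. The sketch below is an addition for illustration; `fit_wt`, `fit_hp`, and `r2` are my own names.

```r
fit_wt <- lm(mpg ~ wt, data = mtcars)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# R-squared for each single-predictor model
r2 <- c(wt = summary(fit_wt)$r.squared,
        hp = summary(fit_hp)$r.squared)
round(r2, 3)

# For simple regression, R-squared equals the squared correlation
all.equal(unname(r2["wt"]), cor(mtcars$mpg, mtcars$wt)^2)
```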

Plot 3D Miles Per Gallon vs Weight and Horsepower

# Load plotly for the interactive 3D plot (it also provides the %>% pipe)
library(plotly)

# Run linear regression on the variables I am interested in
# (MPG, Weight, and Horsepower)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Use plot_ly to make a 3D scatter plot of these variables
plot3d <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg, type = "scatter3d",
    mode = "markers", marker = list(size = 4), showlegend = FALSE)


# Make the plane/surface of best fit for this data (cannot
# just be a line since we are in 3D now)
grid.lines <- 40
wt.seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = grid.lines)
hp.seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = grid.lines)
grid <- expand.grid(wt = wt.seq, hp = hp.seq)
# expand.grid varies wt fastest, so the raw matrix has wt along its rows.
# plotly surfaces expect rows to follow y (hp) and columns to follow x (wt),
# so the matrix is transposed with t()
Z <- t(matrix(predict(fit, newdata = grid), nrow = grid.lines,
    ncol = grid.lines))


# Add the plane/surface of best fit to the plot
plot3d <- plot3d %>%
    add_surface(x = wt.seq, y = hp.seq, z = Z, opacity = 0.8,
        showscale = FALSE, inherit = FALSE) %>%
    layout(scene = list(xaxis = list(title = "Weight (1000 lbs)"),
        yaxis = list(title = "Horsepower"), zaxis = list(title = "MPG")))
plot3d
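The fitted plane can also be read numerically. As a minimal sketch, predict() evaluates the 3D model at any weight and horsepower. The car below (3,000 lbs, 150 hp) is a made-up example, not a row of mtcars, and `fit_3d`, `new_car`, and `manual` are my own names.

```r
fit_3d <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical car: wt is in units of 1000 lbs, so 3.0 means 3,000 lbs
new_car <- data.frame(wt = 3.0, hp = 150)

# Point prediction with a 95% confidence interval for the mean MPG
pred <- predict(fit_3d, newdata = new_car, interval = "confidence")
pred

# The same point estimate, computed directly from the coefficients
b <- coef(fit_3d)
manual <- unname(b["(Intercept)"] + b["wt"] * 3.0 + b["hp"] * 150)
manual
```

The columns `fit`, `lwr`, and `upr` of `pred` give the estimate and the bounds of its interval.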

Conclusion

In this presentation I have:


- Described the mtcars Dataset

- Notated the Math used in Regression

- Displayed Two 2D ggplot2 Plots

- Displayed One 3D plotly Plot


After observing each graph, it is evident that lower horsepower and lower weight are associated with higher MPG: both fitted lines and the fitted plane slope downward. This is likely in line with what you would expect, and the fitted models support it, although the plots on their own show an association and do not quantify its statistical confidence.

Note that while the details of plot construction were not explicitly discussed, the code is included in the slides for reference. Only slight changes are needed for different datasets to yield results similar to the ones shown here.