October 26, 2025

Car Dataset Overview

Objective: Analyze factors influencing fuel efficiency using linear regression models


Dataset Features:

  • Fuel Efficiency: Miles per gallon (MPG)
  • Engine Specifications: Horsepower, cylinders, displacement
  • Vehicle Attributes: Weight, acceleration, model year
  • Vehicle Origin: Manufacturing region (USA, Europe, Asia)
  • Sample Size: 398 vehicles with 8 attributes each


Fuel Efficiency Questions:

  • How does engine size affect fuel consumption?
  • What is the relationship between weight and MPG?
  • Can we predict fuel efficiency from vehicle specifications?
  • How has fuel economy changed over model years?


cars <- read.csv("auto-mpg.csv")
key <- c("car.name", "weight", "displacement", "mpg")

# Preview of the data used in this presentation
head(cars[key], 5)
                   car.name weight displacement mpg
1 chevrolet chevelle malibu   3504          307  18
2         buick skylark 320   3693          350  15
3        plymouth satellite   3436          318  18
4             amc rebel sst   3433          304  16
5               ford torino   3449          302  17

Simple Linear Regression

What is a Simple Linear Regression?

It is a statistical method that models the relationship between two variables using a straight line. It helps us understand how changes in one variable (predictor) are associated with changes in another variable (response).


Mathematical Foundation

\[\Large Y_i = \beta_0 + \beta_1X_i + \varepsilon_i\]

  • Response variable (\(Y_i\)): Fuel efficiency we want to predict (miles per gallon)
  • Predictor variable (\(X_i\)): Vehicle feature used for prediction (horsepower, weight, cylinders)
  • Intercept (\(\beta_0\)): Predicted MPG when the predictor is zero
  • Slope (\(\beta_1\)): Change in MPG per one-unit change in the predictor
  • Error term (\(\varepsilon_i\)): Unexplained variation or random noise
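
As a minimal sketch of how the intercept and slope are estimated, the ordinary least squares formulas can be computed by hand and compared with lm(). The data frame below holds only the five preview rows shown earlier, so the numbers it prints are illustrative and are not the coefficients of the full dataset; the names preview, b0_hat, and b1_hat are assumptions made for this example.

```r
# Five preview rows from the dataset (illustration only, not the full data)
preview <- data.frame(
  weight = c(3504, 3693, 3436, 3433, 3449),
  mpg    = c(18, 15, 18, 16, 17)
)

x <- preview$weight
y <- preview$mpg

# Closed-form least squares estimates for one predictor
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same fit using R's built-in linear model function
fit <- lm(mpg ~ weight, data = preview)

print(c(intercept = b0_hat, slope = b1_hat))
print(coef(fit))
```

Both printouts should agree, which shows that lm() is solving the same least squares problem described by the equation above.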

Simple Linear Regression Example

MPG vs. Vehicle Weight

\[\Large MPG = \beta_0 + \beta_1(Weight) + \varepsilon\]


# Remove rows with N/A values
cars_clean <- na.omit(cars)

# Simple linear regression
mpg_model <- lm(mpg ~ weight, data = cars_clean)
b0 <- round(coef(mpg_model)[1], 4)
b1 <- round(coef(mpg_model)[2], 4)

cat("NOTE: For prediction, we typically assume epsilon = 0 (the average case)\n
General Equation
    MPG = ", b0, " + ", b1, "(Weight)\n\n", sep = "")
NOTE: For prediction, we typically assume epsilon = 0 (the average case)

General Equation
    MPG = 46.3174 + -0.0077(Weight)
# Solve for MPG (example)
example_weight <- 3000
prediction <- b0 + b1 * example_weight

cat("Example Calculation:
    If weight: ", example_weight, " lbs
    MPG = ", b0, " + ", b1, "(", example_weight, ")
    MPG = ", prediction, " mpg\n\n", sep = "")
Example Calculation:
    If weight: 3000 lbs
    MPG = 46.3174 + -0.0077(3000)
    MPG = 23.2174 mpg
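
The hand calculation above plugs rounded coefficients into the equation. A possible alternative, sketched here on the five preview rows only, is to let predict() use the unrounded fit and also report a prediction interval. The 3500 lbs example weight and the names preview, new_car, pred_full, and pred_rounded are assumptions for this illustration.

```r
# Five preview rows from the dataset (illustration only, not the full data)
preview <- data.frame(
  weight = c(3504, 3693, 3436, 3433, 3449),
  mpg    = c(18, 15, 18, 16, 17)
)

fit <- lm(mpg ~ weight, data = preview)

# Hypothetical vehicle to predict for
new_car <- data.frame(weight = 3500)

# Prediction from the unrounded coefficients
pred_full <- predict(fit, newdata = new_car)

# Prediction from coefficients rounded to 4 decimals, as in the hand calculation
b <- round(coef(fit), 4)
pred_rounded <- unname(b[1] + b[2] * new_car$weight)

# 95% prediction interval: columns are fit, lwr, upr
pred_interval <- predict(fit, newdata = new_car,
                         interval = "prediction", level = 0.95)

cat("predict():           ", pred_full, "\n")
cat("rounded coefficients:", pred_rounded, "\n")
print(pred_interval)
```

Because weight is measured in thousands of pounds, a small rounding difference in the slope is multiplied by a large number, so the two predictions can differ noticeably.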
# Load ggplot2 for plotting (cars_clean was created above)
library(ggplot2)

# Simple Linear Regression plot for fuel efficiency
ggplot(cars_clean, aes(x = weight, y = mpg)) +
  geom_point(color = "forestgreen", alpha = 0.7, size = 2.5) +
  geom_smooth(method = "lm", color = "blue", se = TRUE) +
  labs(title = "MPG vs. Vehicle Weight",
       subtitle = "Heavier vehicles correlate with decreased fuel efficiency",
       x = "Weight (lbs)", 
       y = "Miles per Gallon (MPG)") +
  scale_x_continuous(breaks = seq(1500, 5500, by = 500)) +
  scale_y_continuous(breaks = seq(5, 50, by = 5)) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, size = 12),
        axis.title.x = element_text(size = 10, vjust = -2),
        axis.title.y = element_text(size = 10, vjust = 4))

Simple Linear Regression Example 2

MPG vs. Engine Displacement

\[\Large MPG = \beta_0 + \beta_1(Displacement) + \varepsilon\]


# Remove rows with N/A values
cars_clean <- na.omit(cars)

# Simple linear regression
mpg_model <- lm(mpg ~ displacement, data = cars_clean)
b0 <- round(coef(mpg_model)[1], 4)
b1 <- round(coef(mpg_model)[2], 4)

cat("NOTE: For prediction, we typically assume epsilon = 0 (the average case)\n
General Equation
    MPG = ", b0, " + ", b1, "(Displacement)\n\n", sep = "")
NOTE: For prediction, we typically assume epsilon = 0 (the average case)

General Equation
    MPG = 35.1748 + -0.0603(Displacement)
# Solve for MPG (example)
example_displacement <- 200
prediction <- b0 + b1 * example_displacement

cat("Example Calculation:
    If displacement: ", example_displacement, " cu in
    MPG = ", b0, " + ", b1, "(", example_displacement, ")
    MPG = ", prediction, " mpg\n\n", sep = "")
Example Calculation:
    If displacement: 200 cu in
    MPG = 35.1748 + -0.0603(200)
    MPG = 23.1148 mpg
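
With two single-predictor models fitted, one common way to compare them is R², the share of MPG variation each model explains. The sketch below computes it by hand as 1 - SS_res / SS_tot and checks it against summary(). It uses only the five preview rows, so the values are illustrative; the helper r_squared is a hypothetical name introduced for this example.

```r
# Five preview rows from the dataset (illustration only, not the full data)
preview <- data.frame(
  weight       = c(3504, 3693, 3436, 3433, 3449),
  displacement = c(307, 350, 318, 304, 302),
  mpg          = c(18, 15, 18, 16, 17)
)

# Hypothetical helper: R-squared = 1 - residual sum of squares / total sum of squares
r_squared <- function(fit) {
  y <- model.response(model.frame(fit))
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

fit_weight       <- lm(mpg ~ weight, data = preview)
fit_displacement <- lm(mpg ~ displacement, data = preview)

cat("R-squared, weight model:      ", r_squared(fit_weight), "\n")
cat("R-squared, displacement model:", r_squared(fit_displacement), "\n")

# Built-in value for comparison
cat("summary() R-squared, weight:  ", summary(fit_weight)$r.squared, "\n")
```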
# Remove rows with N/A values
cars_clean <- na.omit(cars)

# Simple Linear Regression plot for fuel efficiency
ggplot(cars_clean, aes(x = displacement, y = mpg)) +
  geom_point(color = "forestgreen", alpha = 0.7, size = 2.5) +
  geom_smooth(method = "lm", color = "blue", se = TRUE) +
  labs(title = "MPG vs. Engine Displacement",
       subtitle = "Larger engines correlate with decreased fuel efficiency",
       x = "Displacement (cu in)", 
       y = "Miles per Gallon (MPG)") +
  scale_x_continuous(breaks = seq(50, 500, by = 50)) +
  scale_y_continuous(breaks = seq(5, 50, by = 5)) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, size = 12),
        axis.title.x = element_text(size = 10, vjust = -2),
        axis.title.y = element_text(size = 10, vjust = 4))

Multiple Linear Regression

What is a Multiple Linear Regression?

It is a statistical method that models the relationship between multiple predictor variables and a single response variable. It helps us understand how a change in each factor is associated with the outcome while the other predictors are held constant.


Mathematical Foundation

\[\Large Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \cdots + \beta_pX_{pi} + \varepsilon_i\]

  • Response variable (\(Y_i\)): Fuel efficiency we want to predict (miles per gallon)
  • Predictor variables (\(X_{1i}, X_{2i}, \ldots\)): Multiple vehicle parameters (weight, displacement, horsepower)
  • Intercept (\(\beta_0\)): Baseline MPG when all predictors are zero
  • Slopes (\(\beta_1, \beta_2, \ldots\)): Change in MPG per one-unit change in each predictor, holding the others constant
  • Error term (\(\varepsilon_i\)): Unexplained variation or random noise
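
As a supplementary note that is not part of the original slides, the same model is often written in matrix form. Assuming \(\mathbf{X}\) is the matrix of predictors with a leading column of ones and \(\mathbf{X}^\top\mathbf{X}\) is invertible, the least squares estimate of the coefficients is:

\[\Large \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}\]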

Multiple Linear Regression Example

MPG vs. Vehicle Weight and Engine Displacement

\[\Large MPG = \beta_0 + \beta_1(Weight) + \beta_2(Displacement) + \varepsilon\]


# Remove rows with N/A values
cars_clean <- na.omit(cars)

# Multiple linear regression
mpg_model <- lm(mpg ~ weight + displacement, data = cars_clean)
b0 <- round(coef(mpg_model)[1], 4)
b1 <- round(coef(mpg_model)[2], 4)
b2 <- round(coef(mpg_model)[3], 4)

cat("NOTE: For prediction, we typically assume epsilon = 0 (the average case)\n
General Equation
    MPG = ", b0, " + ", b1, "(Weight) + ", b2, "(Displacement)\n\n", sep = "")
NOTE: For prediction, we typically assume epsilon = 0 (the average case)

General Equation
    MPG = 43.9005 + -0.0058(Weight) + -0.0164(Displacement)
# Solve for MPG (example)
example_weight <- 3000
example_displacement <- 200
prediction <- b0 + b1 * example_weight + b2 * example_displacement

cat("Example Calculation:
    If weight: ", example_weight, " lbs
    If displacement: ", example_displacement, " cu in
    MPG = ", b0, " + ", b1, "(", example_weight, ") + ", b2, "(", example_displacement, ")
    MPG = ", prediction, " mpg\n\n", sep = "")
Example Calculation:
    If weight: 3000 lbs
    If displacement: 200 cu in
    MPG = 43.9005 + -0.0058(3000) + -0.0164(200)
    MPG = 23.2205 mpg
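
To illustrate what each slope means in this model, the sketch below reuses the rounded coefficients printed above and compares two vehicles that differ only in weight. The helper predict_mpg and the 3100 lbs comparison vehicle are hypothetical and introduced only for this example.

```r
# Coefficients as printed above (rounded to 4 decimals)
b0 <- 43.9005
b1 <- -0.0058   # weight slope (mpg per lb)
b2 <- -0.0164   # displacement slope (mpg per cu in)

# Hypothetical helper: evaluate the fitted equation with epsilon = 0
predict_mpg <- function(weight, displacement) {
  b0 + b1 * weight + b2 * displacement
}

base    <- predict_mpg(weight = 3000, displacement = 200)
heavier <- predict_mpg(weight = 3100, displacement = 200)

cat("Base prediction:", base, "mpg\n")
cat("Change for +100 lbs at fixed displacement:", heavier - base, "mpg\n")
```

With displacement fixed at 200 cu in, adding 100 lbs changes the prediction by 100 times the weight slope, about 0.58 mpg lower, which is the "holding the other predictor constant" reading of the coefficient.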
# Load plotly for the 3D plot and the %>% pipe (cars_clean was created above)
library(plotly)

# Multiple Linear Regression plane for fuel efficiency
mpg_model <- lm(mpg ~ weight + displacement, data = cars_clean)
weight_range <- seq(min(cars_clean$weight), max(cars_clean$weight), length.out = 20)
displacement_range <- seq(min(cars_clean$displacement), max(cars_clean$displacement), length.out = 20)
grid <- expand.grid(weight = weight_range, displacement = displacement_range)
grid$prediction <- predict(mpg_model, newdata = grid)

# 3D scatter plot
plot_ly() %>%
  add_trace(x = cars_clean$weight, y = cars_clean$displacement, z = cars_clean$mpg,
            type = "scatter3d", mode = "markers",
            marker = list(color = "forestgreen", size = 3),
            showlegend = FALSE) %>%
  add_surface(x = weight_range, y = displacement_range, 
              # plotly reads z[i, j] at (y[i], x[j]), so transpose the weight-by-displacement matrix
              z = t(matrix(grid$prediction, nrow = length(weight_range), ncol = length(displacement_range))),
              colorscale = list(list(0, "blue"), list(1, "blue")),
              opacity = 0.3,
              showscale = FALSE,
              name = "fitted") %>%
  layout(title = list(text = "MPG vs. Weight and Displacement",
                      x = 0.5, y = 0.95,
                      font = list(size = 20)),
         scene = list(
           xaxis = list(title = "Weight (lbs)"),
           yaxis = list(title = "Displacement (cu in)"), 
           zaxis = list(title = "MPG"),
           aspectmode = "cube",
           camera = list(eye = list(x = 1.5, y = 1.5, z = 1.5))
         ),
         annotations = list(
           text = "Heavier vehicles and larger engines correlate with decreased fuel efficiency",
           x = 0.5, y = 0.93,
           xref = "paper", yref = "paper",
           showarrow = FALSE,
           font = list(size = 15, color = "gray")
           ))


Automotive Engineering Application

Where Linear Regression powers vehicle design
- Fuel Efficiency Prediction: Estimating MPG from vehicle specifications
- Design Optimization: Balancing performance vs fuel economy
- Regulatory Compliance: Meeting emissions and efficiency standards
- Manufacturing: Quality control of engine and weight specifications
- Target Setting: Defining MPG requirements for new models

Broader engineering applications
- Environmental Engineering: Emissions impact analysis
- Mechanical Engineering: Engine efficiency optimization
- Transportation Engineering: Fleet management and planning
- Materials Engineering: Lightweight material development

Key Engineering Insights

Summary
- Linear regression effectively models vehicle fuel efficiency relationships
- Weight and engine displacement both show a strong negative correlation with MPG
- Combining several predictors gives a fuller picture of fuel economy than any single one
- The fitted model explains 69.8% of the variation in MPG (R² = 0.698)

Table of Contents
- Slide 1: Title
- Slide 2: Dataset Overview
- Slide 3: LaTeX of Simple Linear Regression
- Slide 4: ggplot of MPG vs. Weight
- Slide 5: ggplot of MPG vs. Displacement
- Slide 6: LaTeX of Multiple Linear Regression
- Slide 7: plotly of MPG vs. Weight and Displacement
- Slide 8: Engineering Application of Linear Regressions
- Slide 9: Conclusion

Engineering value
- Data-driven vehicle design decisions
- Fuel efficiency optimization insights
- Technical specification validation
- Predictive modeling for new vehicle designs