February 4, 2025

Introduction

Formula 1 & Data Science

Formula 1 teams rely on data analysis to improve performance.

Aerodynamics and lap speed are crucial for qualifying, race pace, and tire strategy.

Engineers use statistical methods like hypothesis testing to verify performance upgrades.

Problem Statement

An F1 team is testing a new aerodynamics package designed to increase the car’s average lap speed over a race distance.
The team claims that the new package improves average lap speed by at least 5 km/h compared to the previous version.

To verify this claim, engineers collect the average lap speed from 50 races with each aero package, old and new.

Statistical Methods Used

This analysis incorporates:

  • Hypothesis Testing

  • p-value Interpretation

  • Confidence Intervals

  • Data Visualization with ggplot2 and Plotly

  • Point & Interval Estimation for Speed Gains

Step 1: Define Hypotheses

  • Null Hypothesis (\(H_0\)): The new aerodynamics package does not improve lap speed, meaning the average speed remains the same or lower.

    \[ H_0: \mu_{\text{new}} \leq \mu_{\text{old}} \]

  • Alternative Hypothesis (\(H_a\)): The new aerodynamics package increases lap speed, indicating the upgrade is effective.

    \[ H_a: \mu_{\text{new}} > \mu_{\text{old}} \]
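A two-sample (Welch) t statistic is one standard way to test these hypotheses. The choice of test is an assumption here, since it is not named above. With \(\delta_0 = 0\) it tests the hypotheses above. Note that these only ask whether there is any gain. The team's stronger claim of at least 5 km/h corresponds to shifting the null to \(\delta_0 = 5\):

\[ t = \frac{(\bar{x}_{\text{new}} - \bar{x}_{\text{old}}) - \delta_0}{\sqrt{\dfrac{s_{\text{new}}^2}{n_{\text{new}}} + \dfrac{s_{\text{old}}^2}{n_{\text{old}}}}} \]

\[ H_0: \mu_{\text{new}} - \mu_{\text{old}} \leq 5 \quad \text{vs.} \quad H_a: \mu_{\text{new}} - \mu_{\text{old}} > 5 \]

The one-sided p-value is \(P(T \geq t)\), where \(T\) follows a t distribution with Welch–Satterthwaite degrees of freedom.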

Step 2: Data Simulation & Visualization

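library(ggplot2)

# Simulate race-average lap speeds for 50 races per aero package (fixed seed for reproducibility)
set.seed(44)
speed_data <- data.frame(
  Package = factor(rep(c("Old Aero", "New Aero"), each = 50),
                   levels = c("Old Aero", "New Aero")),  # keep Old before New on the x-axis
  Lap_Speed = c(rnorm(50, mean = 220, sd = 3),  # Old package ~220 km/h
                rnorm(50, mean = 225, sd = 3))  # New package ~225 km/h
)

# Boxplot of lap speed by aero package
speed_plot <- ggplot(speed_data, aes(x = Package, y = Lap_Speed, fill = Package)) +
  geom_boxplot() +
  labs(title = "Lap Speed Comparison: Old vs. New Aero Package",
       x = "Aerodynamics Package", y = "Lap Speed (km/h)") +
  theme_minimal()

print(speed_plot)  # assigning a ggplot does not draw it; print() renders it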

Figure: Lap Speed Comparison: Old vs. New Aero Package (boxplot)
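The simulated data can be fed directly into the one-sided test from Step 1. This is a minimal sketch that assumes a Welch two-sample t-test, which is R's `t.test()` default and does not assume equal variances. The vector names `old_speed` and `new_speed` are illustrative. It redraws the same speeds with `set.seed(44)` in the same order as Step 2, so it runs on its own.

```r
# Regenerate the Step 2 data (same seed, same draw order)
set.seed(44)
old_speed <- rnorm(50, mean = 220, sd = 3)  # Old Aero
new_speed <- rnorm(50, mean = 225, sd = 3)  # New Aero

alpha <- 0.05

# One-sided Welch two-sample t-test:
#   H0: mu_new <= mu_old   vs   Ha: mu_new > mu_old
aero_test <- t.test(new_speed, old_speed,
                    alternative = "greater",  # Ha: mean(new) - mean(old) > mu
                    mu = 0,
                    var.equal = FALSE)        # Welch correction (R's default)

aero_test$statistic  # t statistic
aero_test$p.value    # one-sided p-value

if (aero_test$p.value < alpha) {
  message("Reject H0: evidence that the new aero package increases lap speed.")
} else {
  message("Fail to reject H0: no statistical evidence of a lap speed gain.")
}
```

To test the team's "at least 5 km/h" claim instead, the same call with `mu = 5` shifts the null hypothesis accordingly.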

Step 3: Residual Plot to Check Assumptions

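What is a Residual Plot? A residual plot shows the residuals, the differences between the observed lap speeds and the values fitted by a linear model of lap speed on aero package (Residuals = Actual − Fitted). With the package as the only predictor, the fitted values are simply the two group means. The plot therefore checks whether both packages have a similar spread and no obvious outliers or patterns, which supports using a t-test.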
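The residual check above can be sketched as follows. It assumes the linear model is `lm(Lap_Speed ~ Package)` and uses base R graphics to stay dependency-free. The object names `speed_model`, `speed_resid`, and `speed_fitted` are illustrative.

```r
# Regenerate the Step 2 data (same seed and draw order)
set.seed(44)
speed_data <- data.frame(
  Package = factor(rep(c("Old Aero", "New Aero"), each = 50),
                   levels = c("Old Aero", "New Aero")),
  Lap_Speed = c(rnorm(50, mean = 220, sd = 3),
                rnorm(50, mean = 225, sd = 3))
)

# Linear model with aero package as the only predictor;
# its fitted values are the two group means
speed_model  <- lm(Lap_Speed ~ Package, data = speed_data)
speed_resid  <- residuals(speed_model)  # Actual - Fitted
speed_fitted <- fitted(speed_model)

# Residuals vs fitted: two vertical strips, one per package
plot(speed_fitted, speed_resid,
     xlab = "Fitted lap speed (km/h)", ylab = "Residual (km/h)",
     main = "Residuals vs Fitted: Lap_Speed ~ Package")
abline(h = 0, lty = 2)

# Numeric checks: residual spread per package, and normality of residuals
tapply(speed_resid, speed_data$Package, sd)
shapiro.test(speed_resid)
```

Similar standard deviations across packages and a non-small Shapiro–Wilk p-value are consistent with the t-test's assumptions. A strip that is much wider than the other would favour the Welch version of the test.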

Step 4: Confidence Interval Interpretation

The confidence interval gives a range of plausible values for the true mean lap speed with the new package. This range is then compared with the old package's baseline of about 220 km/h.

  • If the lower bound is above 220 km/h, the new package significantly improves lap speed.

  • If the interval includes 220 km/h, we cannot conclude a significant improvement.

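    \[ CI = \bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}} \]

    where \(\bar{x}\), \(s\), and \(n\) are the sample mean, standard deviation, and number of races for the new package.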

  • Engineers rely on statistical confidence before implementing changes.
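The interval formula above can be computed directly and cross-checked against R's one-sample `t.test()`. This is a minimal sketch on the Step 2 simulated data. The names `ci_new` and `baseline` are illustrative, and treating 220 km/h as a fixed reference follows the rule stated above.

```r
# Regenerate the Step 2 data; old_speed is drawn first only so that
# new_speed matches the Step 2 values
set.seed(44)
old_speed <- rnorm(50, mean = 220, sd = 3)
new_speed <- rnorm(50, mean = 225, sd = 3)

alpha <- 0.05
n     <- length(new_speed)
x_bar <- mean(new_speed)
s     <- sd(new_speed)

# Critical value t_{alpha/2, n-1}: upper alpha/2 quantile of the t distribution
t_crit <- qt(1 - alpha / 2, df = n - 1)

# CI = x_bar +/- t_{alpha/2, n-1} * s / sqrt(n)
ci_new <- x_bar + c(-1, 1) * t_crit * s / sqrt(n)
ci_new

# Cross-check against R's one-sample t.test() interval
all.equal(ci_new, as.numeric(t.test(new_speed, conf.level = 1 - alpha)$conf.int))

baseline <- 220  # old package's simulated mean, used as the reference
if (ci_new[1] > baseline) {
  message("Lower bound above 220 km/h: evidence of improvement.")
} else {
  message("Lower bound at or below 220 km/h: improvement not established.")
}
```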

Figure: 3D Scatterplot of Lap Speed vs. Downforce

Understanding the Results

  • p-value < 0.05 → Reject \(H_0\) → The new aerodynamics package significantly increases lap speed.
  • p-value ≥ 0.05 → Fail to reject \(H_0\) → No statistical evidence that the new package improves speed.
  • Confidence Interval Interpretation:
    • If lower bound > 220 km/h, there is strong evidence of improvement.
    • If interval includes 220 km/h, results are inconclusive.
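The "Point & Interval Estimation for Speed Gains" listed earlier can be sketched by estimating the gain \(\mu_{\text{new}} - \mu_{\text{old}}\) directly. This assumes a two-sided 95% Welch interval on the Step 2 simulated data; `gain_hat` and `gain_ci` are illustrative names. A lower bound above 0 agrees with rejecting \(H_0\). A lower bound above 5 would support the team's "at least 5 km/h" claim. A two-sided 95% interval corresponds to a one-sided test at the 2.5% level, so it is slightly more conservative than the α = 0.05 test.

```r
# Regenerate the Step 2 data (same seed and draw order)
set.seed(44)
old_speed <- rnorm(50, mean = 220, sd = 3)
new_speed <- rnorm(50, mean = 225, sd = 3)

# Point estimate of the speed gain (km/h)
gain_hat <- mean(new_speed) - mean(old_speed)

# Two-sided 95% Welch interval for mu_new - mu_old
gain_ci <- t.test(new_speed, old_speed, conf.level = 0.95)$conf.int

gain_hat
gain_ci

# Read the interval against 0 (any gain) and 5 km/h (the team's claim)
c(any_gain    = gain_ci[1] > 0,
  claim_5_kmh = gain_ci[1] > 5)
```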

🏁Conclusion🏁

Key Takeaways

  • Hypothesis testing is crucial for Formula 1 teams when testing performance upgrades.
  • Lap speed improvements impact qualifying, tire strategy, and race performance.
  • Statistical significance helps engineers make data-driven decisions.

Formula 1 teams use data science and statistics to optimize performance on race day!🏁🏎️💨