knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.2

## Warning: package 'lubridate' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(knitr)
library(scales)

## Warning: package 'scales' was built under R version 4.5.2

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

# Set the working directory (machine-specific path; adjust for your own setup)
setwd("C:/Users/Nicop/Downloads/RStudio Projects/f1")

# Load all CSV files
circuits <- read.csv("circuits.csv")
constructor_results <- read.csv("constructor_results.csv")
constructor_standings <- read.csv("constructor_standings.csv")
constructors <- read.csv("constructors.csv")
driver_standings <- read.csv("driver_standings.csv")
drivers <- read.csv("drivers.csv")
lap_times <- read.csv("lap_times.csv")
pit_stops <- read.csv("pit_stops.csv")
qualifying <- read.csv("qualifying.csv")
races <- read.csv("races.csv")
results <- read.csv("results.csv")
seasons <- read.csv("seasons.csv")
sprint_results <- read.csv("sprint_results.csv")
status <- read.csv("status.csv")

# Preview key columns from results
summary(select(results, grid, positionOrder, points))

##       grid       positionOrder       points      
##  Min.   : 0.00   Min.   : 1.00   Min.   : 0.000  
##  1st Qu.: 5.00   1st Qu.: 6.00   1st Qu.: 0.000  
##  Median :11.00   Median :12.00   Median : 0.000  
##  Mean   :11.13   Mean   :12.79   Mean   : 1.988  
##  3rd Qu.:17.00   3rd Qu.:18.00   3rd Qu.: 2.000  
##  Max.   :34.00   Max.   :39.00   Max.   :50.000

# Join year and names to data
results <- results %>%
  left_join(races %>% select(raceId, year), by = "raceId")

driver_standings <- driver_standings %>%
  left_join(races %>% select(raceId, year), by = "raceId") %>%
  left_join(drivers, by = "driverId") %>%
  mutate(Driver = paste(forename, surname))

constructor_standings <- constructor_standings %>%
  left_join(races %>% select(raceId, year), by = "raceId") %>%
  left_join(constructors, by = "constructorId")

# Filter to modern F1 era (2000+)
results_2000 <- results %>% filter(year >= 2000)
driver_standings_2000 <- driver_standings %>% filter(year >= 2000)
constructor_standings_2000 <- constructor_standings %>% filter(year >= 2000)

# Add helper fields
results_2000 <- results_2000 %>%
  mutate(
    grid = as.numeric(grid),
    finish = as.numeric(positionOrder),
    positions_gained = grid - finish
  )
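The summary of `grid` above has a minimum of 0, which cannot be a real grid slot. Below is a minimal sketch of one way to handle it, assuming (this report does not confirm it) that 0 marks a start without a grid position, such as a pit-lane start. The `toy_results` rows and the `grid_clean` column are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Toy rows standing in for results_2000; all values are made up.
toy_results <- data.frame(
  driver = c("A", "B", "C"),
  grid   = c(3, 0, 10),   # 0 is ASSUMED to mean "no grid slot", not a position ahead of pole
  finish = c(1, 12, 6)
)

toy_results <- toy_results %>%
  mutate(
    grid_clean       = na_if(grid, 0),       # treat 0 as missing
    positions_gained = grid_clean - finish   # positive = places gained during the race
  )

print(toy_results)
```

With the zero recoded as missing, driver B no longer shows up as having lost twelve places from a position that did not exist.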

1 Business Questions

Which constructors have dominated Formula 1 since 2000?
Which drivers have consistently finished at the top?
Does qualifying position predict race results?
Do pit stops affect finishing position?

2 Analysis

2.1 Constructor Dominance

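# Standings points are assumed to be running totals after each race, so the season
# total is the maximum (summing would count early races many times over)
top_constructors <- constructor_standings_2000 %>%
  group_by(year, name) %>%
  summarise(points = max(points, na.rm = TRUE), .groups = "drop")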

ggplot(top_constructors, aes(x = year, y = points, color = name)) +
  geom_line(linewidth = 1) +
  labs(title = "Constructor Points by Season (2000+)",
       x = "Year", y = "Total Points") +
  theme_minimal()

This chart shows the total constructor points for each team in every season since 2000. Clear periods of dominance emerge, with a few constructors outscoring the rest over multi-year spans. The chart cannot explain why on its own, but such runs are consistent with sustained engineering advantages, successful adaptation to regulation changes, and organizational stability rather than short-term performance spikes. The points system also changed during this period (a win has been worth 25 points rather than 10 since 2010), so totals are best compared within a season rather than across eras.
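Standings tables of this kind usually store a running total after each race. Assuming that holds for `constructor_standings` (this report does not document the column semantics), the toy example below, with made-up values, shows why a season total should be taken as the largest value rather than the sum.

```r
library(dplyr)

# Toy standings for one hypothetical team over a three-race season.
# ASSUMPTION: `points` is the cumulative total after each race.
toy_standings <- data.frame(
  year   = c(2020, 2020, 2020),
  name   = c("Team X", "Team X", "Team X"),
  round  = 1:3,
  points = c(25, 43, 68)
)

season_points <- toy_standings %>%
  group_by(year, name) %>%
  summarise(
    summed_points = sum(points),  # 136: earlier races are counted repeatedly
    season_total  = max(points),  # 68: the final running total
    .groups = "drop"
  )

print(season_points)
```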

# Number of wins per constructor
wins_per_team <- results_2000 %>%
  filter(positionOrder == 1) %>%
  left_join(constructors, by = "constructorId") %>%
  count(year, name)

ggplot(wins_per_team, aes(x = year, y = n, fill = name)) +
  geom_col(position = "stack") +
  labs(title = "Race Wins by Constructor (2000+)",
       x = "Year", y = "Number of Wins") +
  theme_minimal()

While total points reflect consistency, race wins capture outright competitive dominance. This chart shows how many races each constructor won per season, revealing sharper peaks of dominance than the points-based analysis. Certain teams convert performance advantages directly into victories, reinforcing the idea that Formula 1 success is highly concentrated. The visualization demonstrates how dominant teams often control a majority of race wins in a given season, leaving limited opportunities for competitors.
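To put a number on "a majority of race wins", win counts can be converted to a share of each season's races. This is a sketch on made-up data shaped like `wins_per_team`. The `toy_wins` table, its `wins` column (the real table calls it `n`), and the team names are hypothetical.

```r
library(dplyr)

# Made-up win counts per season and constructor.
toy_wins <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2021),
  name = c("Team X", "Team Y", "Team X", "Team Y", "Team Z"),
  wins = c(13, 4, 9, 11, 2)
)

top_share <- toy_wins %>%
  group_by(year) %>%
  mutate(share = wins / sum(wins)) %>%  # fraction of that season's races won
  slice_max(share, n = 1) %>%           # keep the leading constructor per season
  ungroup()

print(top_share)
```

A share above 0.5 marks a season in which one constructor won more than half of the races.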

2.2 Top Drivers by Season

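# Standings points are assumed cumulative, so each driver's yearly maximum is the
# season total; keep the three highest totals per season
top_drivers <- driver_standings_2000 %>%
  group_by(year, Driver) %>%
  summarise(points = max(points), .groups = "drop") %>%
  group_by(year) %>%
  slice_max(points, n = 3) %>%
  ungroup()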

ggplot(top_drivers, aes(x = year, y = points, color = Driver)) +
  geom_line(linewidth = 1) +
  labs(title = "Top 3 Drivers by Season (2000+)",
       x = "Year", y = "Points") +
  theme_minimal()

This plot tracks the points scored by the top three drivers in each season, illustrating patterns of driver consistency and competitive hierarchy. Drivers who repeatedly appear in the top three demonstrate sustained elite performance. The plot does not show team affiliation, but read alongside the constructor charts it suggests that driver success is closely tied to team performance, with top drivers often clustering within the dominant teams of each era.

# In how many seasons did each driver appear in the championship top 3?
driver_podiums <- top_drivers %>%
  count(Driver, sort = TRUE)

ggplot(driver_podiums, aes(x = reorder(Driver, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Seasons in the Championship Top 3 (2000+)",
       x = "Driver", y = "Seasons in Top 3") +
  theme_minimal()

This chart counts the number of seasons since 2000 in which each driver placed in the top three of the drivers' championship. It measures season-level consistency rather than individual race podiums or title wins. Drivers who appear most often showed long-term reliability and adaptability across seasons and regulation changes. The results suggest that sustained excellence in Formula 1 requires not only talent but also continued access to competitive machinery.
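A race-level view would count individual podium finishes instead of seasons. Here is a minimal sketch on made-up data, assuming a results table with one row per driver per race and a `positionOrder` column as in `results_2000`. The `toy_results` rows and the `podiums` column name are hypothetical.

```r
library(dplyr)

# Two made-up races with four drivers each.
toy_results <- data.frame(
  Driver        = c("A", "B", "C", "D", "A", "C", "B", "D"),
  raceId        = c(1, 1, 1, 1, 2, 2, 2, 2),
  positionOrder = c(1, 2, 3, 4, 2, 1, 4, 3)
)

race_podiums <- toy_results %>%
  filter(positionOrder <= 3) %>%                  # podium = finishing 1st, 2nd, or 3rd
  count(Driver, sort = TRUE, name = "podiums")    # one row per driver

print(race_podiums)
```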

2.3 Qualifying vs Finishing Position

ggplot(results_2000, aes(x = grid, y = finish)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Qualifying Position vs Finishing Position",
       x = "Starting Grid Position", y = "Finishing Position") +
  theme_minimal()

This scatter plot examines the relationship between a driver's starting grid position and finishing position. The upward-sloping trend line shows a positive association: drivers who start closer to the front tend to finish closer to the front, which reflects the importance of track position, overtaking difficulty, and race strategy. The wide scatter around the line also shows that grid position is far from the whole story, since incidents, retirements, and strategy can move a driver a long way from the starting slot. Qualifying is therefore an important, though not decisive, performance factor in Formula 1.

correlation <- cor(results_2000$grid, results_2000$finish, use = "complete.obs")
paste(
  "Correlation between grid position and finishing position:",
  round(correlation, 3),
  "indicating a moderate positive relationship between qualifying position and race outcomes."
)

## [1] "Correlation between grid position and finishing position: 0.556 indicating a moderate positive relationship between qualifying position and race outcomes."

The correlation coefficient quantifies the strength of the linear relationship between grid position and finishing position. A value of 0.556 is a moderate positive correlation: in a simple linear fit, grid position accounts for roughly 31% of the variance in finishing position (0.556² ≈ 0.31). This numerical summary is consistent with the trend in the scatter plot. It is a descriptive statistic rather than a formal significance test, and because both variables are ranks, a rank-based measure such as Spearman's correlation would be a natural cross-check.
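For a formal test alongside the coefficient, base R's `cor.test()` reports a p-value and confidence interval, and `method = "spearman"` gives the rank-based version. The sketch below runs on simulated stand-in data, not the race results, so its numbers say nothing about Formula 1. The vectors `grid_sim` and `finish_sim` are hypothetical.

```r
set.seed(42)

# Simulated stand-ins for grid and finishing positions (NOT the real data).
grid_sim   <- sample(1:20, 200, replace = TRUE)
finish_sim <- pmin(pmax(round(grid_sim + rnorm(200, sd = 5)), 1), 20)

# Pearson correlation with a t-test and 95% confidence interval
pearson <- cor.test(grid_sim, finish_sim)

# Spearman rank correlation; exact = FALSE because the data contain ties
spearman <- cor.test(grid_sim, finish_sim, method = "spearman", exact = FALSE)

r <- unname(pearson$estimate)
print(c(r = r, r_squared = r^2, rho = unname(spearman$estimate)))
print(pearson$conf.int)
```

On the real data the same two calls would take `results_2000$grid` and `results_2000$finish` in place of the simulated vectors.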

2.4 Pit Stops vs Finishing Position

pit_summary <- pit_stops %>%
  group_by(raceId, driverId) %>%
  summarise(pit_stops = n(), .groups = "drop")

pit_results <- pit_summary %>%
  left_join(results_2000, by = c("raceId", "driverId"))

ggplot(pit_results, aes(x = pit_stops, y = finish)) +
  geom_jitter(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Number of Pit Stops vs Finish",
       x = "Pit Stops", y = "Finishing Position") +
  theme_minimal()

This visualization explores how the number of pit stops a driver makes during a race relates to finishing position. The fitted line suggests that drivers with fewer pit stops tend to finish in better positions on average. This is an association rather than evidence of cause: extra stops are often forced by damage, punctures, or penalties that cost positions anyway, and drivers who retire early make few stops yet are classified near the back. Pit strategy also depends on race conditions and tire degradation, so the result is best read as a reminder of the trade-offs teams manage between tire performance, track position, and total race time.
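One way to reduce the influence of retirements is to compare only drivers who reached the finish. This is a sketch on made-up data. The `finished` flag is hypothetical and would have to be derived from the `status` table in the real data, whose coding is not described in this report.

```r
library(dplyr)

# Made-up rows: pit stop count, finishing position, and a hypothetical finish flag.
toy_pits <- data.frame(
  pit_stops = c(1, 1, 2, 2, 3, 1, 1),
  finish    = c(2, 5, 3, 8, 10, 19, 20),
  finished  = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)  # last two retired early
)

by_stops <- toy_pits %>%
  filter(finished) %>%                 # drop retirements before comparing
  group_by(pit_stops) %>%
  summarise(
    mean_finish = mean(finish),        # average finishing position per stop count
    drivers     = n(),
    .groups = "drop"
  )

print(by_stops)
```

In this toy table the two retirements would have dragged the one-stop average from 3.5 down to 11.5, which shows how much early exits can distort the comparison.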

3 Results

Constructors: Mercedes, Ferrari, and Red Bull dominated different eras of Formula 1.
Drivers: Hamilton, Vettel, Alonso, and Verstappen consistently finished near the front.
Qualifying: Moderate positive correlation between grid and finish (r = 0.556), so starting position matters.
Pit stops: Fewer stops are associated with better finishing positions on average.

4 Conclusion

This analysis shows how constructor strength, driver consistency, qualifying strategy, and pit management have shaped modern F1 performance.

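Red Bull’s recent dominance is consistent with strong team strategy and engineering precision.
Grid position has a clear influence on results, and is likely to matter most on tighter tracks where overtaking is difficult.
Pit strategy matters: drivers who make fewer stops tend to finish higher, and a well-timed stop can gain track position without an on-track overtake.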

5 LLM Usage Report

I used an AI assistant (ChatGPT – GPT‑4o) to help with:

Designing the structure of this R Markdown report
Writing and debugging R code for loading, filtering, and visualizing data
Creating additional charts to tell a clear story
Writing short narrative explanations of results and business insights

All code and analysis were reviewed and executed by me to ensure correctness.

5.1 Bonus Insight

One interesting pattern is that, even as the dominant teams change, the link between qualifying position and finishing position appears to persist. This suggests that while teams and drivers vary by era, track position remains one of the most stable performance advantages in Formula 1.
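The stability of this relationship can be checked by computing the correlation separately for each season. The sketch below uses simulated data with the same column names as `results_2000` (`year`, `grid`, `finish`). The `toy_seasons` table and its values are hypothetical and say nothing about the real results.

```r
library(dplyr)

set.seed(1)

# Simulated stand-in: three seasons with 100 result rows each (NOT the real data).
toy_seasons <- data.frame(
  year = rep(2000:2002, each = 100),
  grid = sample(1:20, 300, replace = TRUE)
)
toy_seasons$finish <- pmin(pmax(round(toy_seasons$grid + rnorm(300, sd = 5)), 1), 20)

cor_by_year <- toy_seasons %>%
  group_by(year) %>%
  summarise(
    r       = cor(grid, finish, use = "complete.obs"),  # per-season Pearson correlation
    entries = n(),
    .groups = "drop"
  )

print(cor_by_year)
```

If the per-season values on the real data stay within a narrow band, that would support the pattern described above; large swings between seasons would argue against it.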

Formula 1 Performance Analysis – Final Project

Nico Gonzalez