Hero or Villain? The Statistical Case for Max Verstappen’s Era of Dominance

Author

Cole Tenfelde

Introduction

If you have watched Formula One at any point in the last few years, you already know who Max Verstappen is. He is a 27-year-old Dutch driver for Red Bull Racing, and he has been really hard to beat. Between 2021 and 2024 he won four straight World Drivers' Championships, one of the best runs any driver has ever had in the sport. His 2023 season was honestly kind of crazy: he won 19 of the 22 races. To put that in perspective, most champions win somewhere around 10 to 13 races in a season.

For anyone who does not follow Formula One, here is basically how it works. Twenty drivers race at tracks around the world from March to December. The top ten finishers score points, from 25 for a win down to 1 for tenth place. Whoever has the most points at the end of the year is the World Drivers' Champion. Teams also compete for the Constructors' Championship, which adds up the points from both of their drivers.

This project looks at two things. First, what does the race data actually tell us about what it takes to do well in Formula One: does where you start matter, does rain change anything, and how often does the lead driver at each team beat their teammate in qualifying? Second, how does all of that connect to the championship standings from 2021 through 2025? To answer that I am using two datasets. The first has race-by-race data from 2000 to 2024, with things like starting positions, finishing results, and weather; most of my analysis uses the hybrid era from 2014 to 2024. The second is championship standings I scraped from f1-fansite.com, which go round by round for each season from 2021 to 2025.


Part 1: Primary Data

Loading the Data

library(readr)
library(dplyr)

library(tidyr)
library(ggplot2)
library(gt)
library(rvest)

library(httr)
library(stringr)

F1.Data <- read_csv(
  "https://myxavier-my.sharepoint.com/:x:/g/personal/tenfeldec_xavier_edu/IQDobfUqiw6RSJfaNL1xZ-uyAauxsOh1oFQlt8BzYxYYdP4?e=r4Q1w4&download=1",
  show_col_types = FALSE
)

Data Dictionary

Each row in this dataset is one driver’s entry in one race. Below are the variables I used.

tibble(
  Variable             = c("year", "round", "grid", "Top 3 Finish",
                           "rainy", "wins", "nro_cond_escuderia",
                           "constructorId", "driver_age", "statusId"),
  Type                 = c("Integer", "Integer", "Integer", "Logical",
                           "Binary (0/1)", "Integer", "Integer",
                           "Integer", "Numeric", "Integer"),
  Description          = c(
    "Year the race took place",
    "Round number within the season",
    "Starting grid position (1 = pole position, 0 = did not qualify)",
    "Whether the driver finished in the top 3",
    "Whether the race was held in rainy conditions (1 = yes, 0 = no)",
    "Number of wins the driver had in the dataset",
    "Driver number within their team (1 = lead driver, 2 = second driver)",
    "Unique ID for the constructor (team)",
    "Age of the driver at the time of the race",
    "Race outcome code (1 = finished, 11-19 = lapped by the leader)"
  )
) %>%
  gt() %>%
  tab_header(title = "Primary Dataset: Data Dictionary")
Primary Dataset: Data Dictionary
Variable | Type | Description
year | Integer | Year the race took place
round | Integer | Round number within the season
grid | Integer | Starting grid position (1 = pole position, 0 = did not qualify)
Top 3 Finish | Logical | Whether the driver finished in the top 3
rainy | Binary (0/1) | Whether the race was held in rainy conditions (1 = yes, 0 = no)
wins | Integer | Number of wins the driver had in the dataset
nro_cond_escuderia | Integer | Driver number within their team (1 = lead driver, 2 = second driver)
constructorId | Integer | Unique ID for the constructor (team)
driver_age | Numeric | Age of the driver at the time of the race
statusId | Integer | Race outcome code (1 = finished, 11-19 = lapped by the leader)

Summary Statistics

F1.Data <- F1.Data %>%
  mutate(got_lapped = statusId >= 11 & statusId <= 19)

F1.Data %>%
  summarise(
    `Total Race Entries`   = n(),
    `Years Covered`        = paste(min(year, na.rm = TRUE), "to", max(year, na.rm = TRUE)),
    `Avg Starting Grid`    = round(mean(grid[grid > 0], na.rm = TRUE), 2),
    `% Rainy Races`        = round(mean(rainy, na.rm = TRUE) * 100, 1),
    `% Podium Finishes`    = round(mean(`Top 3 Finish`, na.rm = TRUE) * 100, 1),
    `Max Wins in a Season` = max(wins, na.rm = TRUE)
  ) %>%
  gt() %>%
  tab_header(title = "Primary Dataset: Summary Statistics")
Primary Dataset: Summary Statistics
Total Race Entries | Years Covered | Avg Starting Grid | % Rainy Races | % Podium Finishes | Max Wins in a Season
9839 | 2000 to 2024 | 11.02 | 42.8 | 14.2 | 19

Visual 1: Does Rain Actually Level the Playing Field?

You hear a lot in Formula One that rain creates chaos and gives smaller teams a shot at an upset. The idea is that wet conditions are unpredictable enough that starting position does not matter as much. To check, I looked at the rainy races in 2024 and asked: if a driver started in the top 3, did they still end up on the podium?

top3_rain_2024 <- F1.Data %>%
  filter(year == 2024,
         rainy == 1,
         grid <= 3,
         grid != 0)

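# Fix both levels so the bar labels below stay aligned even if one outcome never occurs
podium_counts <- table(factor(top3_rain_2024$`Top 3 Finish`, levels = c(FALSE, TRUE)))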

barplot(podium_counts,
        names.arg = c("Did Not Podium", "Podiumed"),
        col = c("red", "green"),
        main = "Top 3 Starters in Rainy 2024 Races",
        ylab = "Number of Drivers")

The short answer is no: starting up front still matters a lot, even in the rain. Drivers who started in the top 3 finished on the podium 80% of the time in wet 2024 races. I honestly thought rain would shake things up more than that, but it turns out that if your car is fast enough to qualify at the front, it is probably fast enough to stay there even when it gets slippery. Keep in mind that this covers a single season, so the sample is small.
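One way to put a podium share like this in context is to compute the same share for dry races and compare the two. The sketch below does that in base R on invented toy data; the column names `rainy` and `podium` and every value are assumptions for illustration, not figures from the primary dataset.

```r
# Hypothetical toy data: one row per top-3 starter.
# rainy = 1 for a wet race, 0 for a dry one; podium = finished in the top 3.
# All values are invented for illustration.
toy <- data.frame(
  rainy  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  podium = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# The mean of a logical vector is the share of TRUE values,
# so tapply() gives the podium share separately for dry and wet races.
rate_by_weather <- tapply(toy$podium, toy$rainy, mean)

# Difference in podium share, wet minus dry
wet_minus_dry <- rate_by_weather[["1"]] - rate_by_weather[["0"]]

print(round(rate_by_weather * 100, 1))
print(round(wet_minus_dry * 100, 1))
```

If the wet share came out clearly lower than the dry share on the real data, that would support the idea that rain levels the field; a similar or higher share would point the other way.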


Visual 2: Does the Number 1 Driver Actually Beat Their Teammate in Qualifying?

Every Formula One team has two drivers, and there is usually a pretty clear pecking order between them. The number 1 driver is basically the team’s main guy and should theoretically be faster. Qualifying is probably the best way to test that, because both drivers have the same machinery and strategy and luck play a much smaller role than they do on race day. I looked at the hybrid era from 2014 through 2024 to see how often the number 1 driver actually outqualified their teammate.

hybrid_data <- F1.Data %>%
  filter(year >= 2014,
         year <= 2024,
         grid != 0)

teammates <- hybrid_data %>%
  filter(nro_cond_escuderia %in% c(1, 2))

teammate_compare <- teammates %>%
  select(year, round, constructorId, nro_cond_escuderia, grid) %>%
  pivot_wider(names_from = nro_cond_escuderia,
              values_from = grid,
              names_prefix = "driver_") %>%
  mutate(driver1_outqualified = driver_1 < driver_2)

outqualify_by_year <- teammate_compare %>%
  group_by(year) %>%
  summarise(times_driver1_outqualified = sum(driver1_outqualified, na.rm = TRUE))

barplot(outqualify_by_year$times_driver1_outqualified,
        names.arg = outqualify_by_year$year,
        col = "orange",
        main = "Times No.1 Driver Outqualified No.2 (2014-2024)",
        xlab = "Season",
        ylab = "Count")

For most of the hybrid era the number 1 driver pretty consistently beat their teammate in qualifying. The interesting drop comes in 2024, where that edge shrank noticeably compared to 2022 and 2023. That suggests the competition got tighter inside most teams, not just between teams. Two caveats: the bars are raw counts, so seasons with more races give more chances to outqualify a teammate, and the number 1 and number 2 labels are not always official, so there is some guesswork in how they get assigned in the data.
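Because seasons do not all have the same number of races, a share can be easier to compare across years than a count. The base R sketch below shows the idea on invented toy data; the `driver_1` and `driver_2` columns mirror the shape of `teammate_compare`, but the values are made up.

```r
# Hypothetical toy data: grid positions for the No.1 and No.2 driver
# of a team in each qualifying session. Values are invented.
toy <- data.frame(
  year     = rep(c(2023, 2024), times = c(4, 6)),
  driver_1 = c(1, 3, 2, 5,  4, 2, 6, 1, 3, 2),
  driver_2 = c(2, 4, 1, 6,  3, 5, 2, 4, 1, 7)
)

# A lower grid number means a better starting position
toy$driver1_outqualified <- toy$driver_1 < toy$driver_2

# Count of sessions won by the No.1 driver, and number of sessions compared
counts      <- tapply(toy$driver1_outqualified, toy$year, sum)
comparisons <- tapply(toy$driver1_outqualified, toy$year, length)

# Share of head-to-head sessions won by the No.1 driver
share <- counts / comparisons

print(counts)
print(round(share * 100, 1))
```

In this toy example both seasons have the same count, but the share drops from 75% to 50% because the second season has more sessions.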


Visual 3: Who Had the Most Wins in a Season?

To show how unusual 2023 was I looked at the most wins any single driver got in each season of the hybrid era. This gives a pretty easy way to see just how dominant certain seasons were compared to others.

max_wins_by_season <- hybrid_data %>%
  group_by(year) %>%
  summarise(max_wins = max(wins, na.rm = TRUE))

barplot(max_wins_by_season$max_wins,
        names.arg = max_wins_by_season$year,
        col = "purple",
        main = "Max Driver Wins Per Season (2014-2024)",
        xlab = "Season",
        ylab = "Maximum Wins")

The 2023 bar is not really close to anything else on the chart. Nineteen wins in a single season had not happened before in this era. Most years the top driver wins somewhere around 10 to 13 races, and that already feels like a lot. The other thing that stands out is 2024, which had the lowest peak win total of any season here. That basically means the race wins in 2024 were spread across a bunch of different drivers, which matches what the standings data shows later on.

Part 2: Secondary Data (Scraped from f1-fansite.com)

The second dataset I used was scraped from f1-fansite.com using the rvest package in R. This covers the 2021 through 2025 seasons and tracks every driver’s points round by round throughout each championship. Sprint race results are not included.

How I Scraped the Data

#user agent
set_config(user_agent("Rvest Practice/student scraper for academic research/tenfeldec@xavier.edu"))


points_map <- c(
  "1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
  "6" = 8,  "7" = 6,  "8" = 4,  "9" = 2,  "10" = 1
)


scrape_f1_standings <- function(year) {
  
  url <- paste0("https://www.f1-fansite.com/f1-results/f1-standings-", year, "-championship/")
  message("Scraping: ", url)
  
#Load live page
  page <- read_html_live(url)
  Sys.sleep(5)
  
#Confirm position of the correct table heading
  headings <- page %>%
    html_elements("h2") %>%
    html_text2()
  
  message("Headings found: ", paste(headings, collapse = " | "))
  
#Scrape standings table
  standings <- page %>%
    html_elements("table") %>%
    .[[3]] %>%
    html_table()
  

  colnames(standings)[1] <- "Position"
  colnames(standings)[2] <- "Driver"
  
#Cleaning data frame
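  standings_clean <- standings %>%
    mutate(across(everything(), as.character)) %>%
    pivot_longer(
      cols      = -c(Position, Driver),
      names_to  = "Race",
      values_to = "Result"
    ) %>%
    filter(Race != "Pts") %>%
    # Number rounds from the table's column order before dropping blank cells,
    # so a driver who missed a race keeps the correct round numbers
    mutate(Round = match(Race, unique(Race))) %>%
    filter(!is.na(Result), Result != "") %>%
    group_by(Driver) %>%
    mutate(
      Season         = year,
      # An asterisk in the cell marks the fastest lap
      Fastest_Lap    = str_detect(Result, "\\*"),
      Result         = str_remove(Result, "\\*"),
      # Results outside the top ten (or DNF, DSQ, etc.) score 0 points
      Points_Gained  = replace_na(as.numeric(points_map[Result]), 0),
      Cumulative_Pts = cumsum(Points_Gained)
    ) %>%
    ungroup()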
  
  return(standings_clean)
}

#Can't use 2026 because the table is in the wrong spot
years <- c(2021,2022,2023, 2024, 2025)

all_seasons <- list()

for (year in years) {
  all_seasons[[as.character(year)]] <- scrape_f1_standings(year)
  Sys.sleep(3)
}
Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2021-championship/
Headings found: F1 2021 Championship Overview
Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2022-championship/
Headings found: F1 Championship 2022 Overview
Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2023-championship/
Headings found: 2023 F1 Championship Results
Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2024-championship/
Headings found: 2024 F1 Championship Results
Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2025-championship/
Headings found: 2025 F1 Championship Results
#combine seasons
standings_all <- bind_rows(all_seasons)
write.csv(standings_all, "f1_standings_all.csv", row.names = FALSE)
f1_standings_data <- read.csv("f1_standings_all.csv")
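To make the point-assignment step inside the scraper easier to follow, here is a minimal base R sketch of the same idea on a short, invented vector of results. The `results` values are hypothetical; the lookup table is the same `points_map` used above, which does not include fastest-lap bonus points or sprint points.

```r
# Same lookup table as in the scraper: finishing position -> race points
points_map <- c(
  "1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
  "6" = 8,  "7" = 6,  "8" = 4,  "9" = 2,  "10" = 1
)

# Hypothetical results for one driver, in race order.
# An asterisk marks the fastest lap, as in the scraped table.
results <- c("1*", "2", "DNF", "10", "11", "3*")

# Flag the fastest laps, then strip the marker to get the bare result
fastest_lap <- grepl("*", results, fixed = TRUE)
position    <- sub("*", "", results, fixed = TRUE)

# Look up points by name; anything not in the map (DNF, 11th, ...) is NA
points <- unname(points_map[position])
points[is.na(points)] <- 0

# Running total through each round
cumulative <- cumsum(points)

print(data.frame(results, fastest_lap, points, cumulative))
```

The lookup works because indexing a named vector with a character vector matches on names, and anything without a match comes back as `NA`, which is then set to 0.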

Loading the Scraped Data

f1_standings_data <- read.csv(
  "https://myxavier-my.sharepoint.com/:x:/g/personal/tenfeldec_xavier_edu/IQB2IQb6aMbIRpaJMAFSv64lAU_S2xU7ajAT6u4YX03xnto?e=MD6UbX&download=1"
)

rivals <- c("Max Verstappen", "Lando Norris", "Charles Leclerc",
            "Lewis Hamilton", "Carlos Sainz")

rivals_data     <- f1_standings_data %>% filter(Driver %in% rivals)
verstappen_data <- f1_standings_data %>% filter(Driver == "Max Verstappen")

summary_table <- f1_standings_data %>%
  filter(Driver == "Max Verstappen") %>%
  mutate(
    Win    = Result == "1",
    Podium = Result %in% c("1", "2", "3"),
    DNF    = Result %in% c("Wd", "DSQ", "DNS", "DNF")
  ) %>%
  group_by(Season) %>%
  summarise(
    Races        = n(),
    Wins         = sum(Win,         na.rm = TRUE),
    Podiums      = sum(Podium,      na.rm = TRUE),
    DNFs         = sum(DNF,         na.rm = TRUE),
    Fastest_Laps = sum(Fastest_Lap, na.rm = TRUE),
    Total_Points = max(Cumulative_Pts, na.rm = TRUE)
  ) %>%
  mutate(
    Points_Possible = Races * 25,
    Points_Pct      = round(Total_Points / Points_Possible * 100, 1)
  )

Data Dictionary

tibble(
  Variable       = c("Driver", "Season", "Race", "Round", "Result",
                     "Points_Gained", "Cumulative_Pts", "Fastest_Lap", "Position"),
  Description    = c(
    "Driver full name",
    "Championship season year",
    "Grand Prix abbreviation (e.g. BAH, MON, SPA)",
    "Round number within the season",
    "Finishing position in the race",
    "Points awarded for that race",
    "Running total of points through that round",
    "Whether the driver recorded the fastest lap that race (TRUE/FALSE)",
    "Final season championship standing"
  )
) %>%
  gt() %>%
  tab_header(title = "Secondary Dataset: Data Dictionary")
Secondary Dataset: Data Dictionary
Variable | Description
Driver | Driver full name
Season | Championship season year
Race | Grand Prix abbreviation (e.g. BAH, MON, SPA)
Round | Round number within the season
Result | Finishing position in the race
Points_Gained | Points awarded for that race
Cumulative_Pts | Running total of points through that round
Fastest_Lap | Whether the driver recorded the fastest lap that race (TRUE/FALSE)
Position | Final season championship standing

Visual 4: Verstappen Season Summary

Before getting into the charts, here is a simple table breaking down Verstappen’s numbers season by season. This is probably the most useful single view of just how different 2023 was compared to everything else.

summary_table %>%
  gt() %>%
  tab_header(
    title    = "Max Verstappen Season Summary 2021-2025",
    subtitle = "Data scraped from f1-fansite.com"
  ) %>%
  cols_label(
    Season          = "Season",
    Races           = "Races",
    Wins            = "Wins",
    Podiums         = "Podiums",
    DNFs            = "DNFs",
    Fastest_Laps    = "Fastest Laps",
    Total_Points    = "Points",
    Points_Possible = "Pts Possible",
    Points_Pct      = "Pts %"
  )
Max Verstappen Season Summary 2021-2025
Data scraped from f1-fansite.com
Season | Races | Wins | Podiums | DNFs | Fastest Laps | Points | Pts Possible | Pts %
2021 | 22 | 10 | 18 | 3 | 6 | 396 | 550 | 72.0
2022 | 22 | 15 | 17 | 1 | 5 | 428 | 550 | 77.8
2023 | 22 | 19 | 21 | 0 | 9 | 521 | 550 | 94.7
2024 | 24 | 9 | 14 | 1 | 3 | 396 | 600 | 66.0
2025 | 24 | 8 | 15 | 1 | 3 | 389 | 600 | 64.8

In 2023 he won 19 of 22 races, had zero retirements, and scored 94.7% of the race points available that season. Every other year looks pretty normal by comparison. By 2025 the win total had dropped to 8, there was a DNF on the board again, and Lando Norris ended up with more points at the end.


Visual 5: How Much of the Available Points Did He Actually Get?

This chart compares how many points Verstappen scored each season to how many were theoretically available if he had won every single race.

summary_table %>%
  pivot_longer(
    cols      = c(Total_Points, Points_Possible),
    names_to  = "Type",
    values_to = "Points"
  ) %>%
  ggplot(aes(x = factor(Season), y = Points, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(
    values = c("Total_Points" = "red", "Points_Possible" = "blue"),
    labels = c("Total_Points" = "Points Scored", "Points_Possible" = "Points Possible")
  ) +
  labs(
    title = "Verstappen Points Scored vs Points Possible",
    x     = "Season",
    y     = "Points",
    fill  = ""
  ) +
  theme_minimal()

In 2023 the red bar almost reaches the blue one. That is what it looks like when a driver is winning almost everything. In 2024 and 2025 the gap gets noticeably wider, which means more races where he did not win and more ground given up to rivals.
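The percentages behind this comparison are simple arithmetic, and it can help to see them worked through once. The base R sketch below reuses the race counts and point totals from the season summary table; like the rest of this analysis, it treats 25 points per race as the maximum, so fastest-lap bonus points and sprint points are left out.

```r
# Values copied from the season summary table
season <- c(2021, 2022, 2023, 2024, 2025)
races  <- c(22, 22, 22, 24, 24)
points <- c(396, 428, 521, 396, 389)

# Maximum race points if he had won every race (25 per win)
points_possible <- races * 25

# Share of the available race points actually scored, in percent
points_pct <- round(points / points_possible * 100, 1)

print(data.frame(season, points, points_possible, points_pct))
```

For 2023 this is 521 / 550, which rounds to 94.7%.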


Visual 6: How Far Ahead Was He Each Season?

This one shows Verstappen’s final point total next to the highest-scoring other driver each year. In his title years that is the championship runner-up, and in 2025 it is the champion.

closest_rival <- f1_standings_data %>%
  filter(Driver != "Max Verstappen") %>%
  group_by(Season, Driver) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE), .groups = "drop") %>%
  group_by(Season) %>%
  slice_max(Total_Points, n = 1)

verstappen_pts <- f1_standings_data %>%
  filter(Driver == "Max Verstappen") %>%
  group_by(Season) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE)) %>%
  mutate(Driver = "Max Verstappen")

comparison <- bind_rows(verstappen_pts, closest_rival)

ggplot(comparison, aes(x = factor(Season), y = Total_Points, fill = Driver)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Verstappen vs Closest Rival by Season",
    x     = "Season",
    y     = "Total Points",
    fill  = "Driver"
  ) +
  theme_minimal()

The 2021 season was genuinely close. Lewis Hamilton and Verstappen finished just 8 points apart after 22 races and it came down to the very last lap of the last race. Then 2023 happened and it was not close at all. By 2025 the bars flip entirely and Norris finishes on top.
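The gap itself can also be written down as a single number per season: one driver's total minus the best total among everyone else. Here is a minimal base R sketch of that calculation. The driver names, season labels, and point totals are placeholders invented for illustration, and `margin_for_season()` is a hypothetical helper, not part of the analysis above.

```r
# Hypothetical season totals; names and numbers are invented
totals <- data.frame(
  Season       = c(1, 1, 1, 2, 2, 2),
  Driver       = c("Driver A", "Driver B", "Driver C",
                   "Driver A", "Driver B", "Driver C"),
  Total_Points = c(400, 392, 250, 380, 410, 300)
)

focus <- "Driver A"

# Margin for one season: the focus driver's total minus the best
# total among all other drivers (negative if someone beat them)
margin_for_season <- function(df) {
  own        <- df$Total_Points[df$Driver == focus]
  best_other <- max(df$Total_Points[df$Driver != focus])
  own - best_other
}

# Apply the helper to each season separately
margins <- sapply(split(totals, totals$Season), margin_for_season)

print(margins)
```

A positive margin means the focus driver finished ahead of everyone; a negative one means the bars have flipped.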


Visual 7: Points Progression Round by Round

This tracks how Verstappen and his four closest rivals built up their points throughout each season, one race at a time.

rivals_data %>%
  ggplot(aes(x = Round, y = Cumulative_Pts, color = Driver)) +
  geom_line(linewidth = 1) +
  facet_wrap(~Season) +
  labs(
    title    = "Points Progression Round by Round",
    subtitle = "Verstappen vs Top Rivals (2021-2025)",
    x        = "Round",
    y        = "Cumulative Points",
    color    = "Driver"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

This chart does a good job of showing the feel of each season. In 2021 the lines stay close together the whole way through. In 2023 Verstappen pulls away from literally everyone almost immediately and just keeps going. You can see exactly when the 2025 season turned because there is a point where Norris’s line crosses Verstappen’s and does not come back.
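The turning point in a chart like this can also be found numerically: it is the first round after the last time the chasing driver was level or behind. The base R sketch below shows one way to do that. The per-race points for the two drivers are invented, and the variable names are assumptions for illustration.

```r
# Hypothetical per-race points for two drivers over eight rounds
round_no   <- 1:8
leader_pts <- cumsum(c(25, 18, 25, 10,  0, 12,  8, 15))  # early leader
chaser_pts <- cumsum(c(18, 25, 15, 25, 25, 18, 25, 18))  # eventual winner

# Positive gap = the chaser is ahead after that round
gap <- chaser_pts - leader_pts

# Rounds after which the chaser was still level or behind
behind <- which(gap <= 0)

# The lead changes for good in the round after the last such round.
# NA means the chaser never ends up ahead.
turning_round <- if (length(behind) == 0) {
  round_no[1]
} else if (max(behind) == length(gap)) {
  NA_integer_
} else {
  round_no[max(behind) + 1]
}

print(gap)
print(turning_round)
```

Using the last round where the gap was zero or negative, rather than the first round where it was positive, is what captures the "does not come back" part.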


Part 3: Putting Both Datasets Together

The two datasets basically confirm the same thing from different angles. The race-level data tells us that starting position is a strong predictor of the race result, even in rain. The standings data fits that picture: Verstappen’s championships were built on converting speed into wins and almost never retiring from a race.

2024 is probably the most interesting year to look at across both datasets. From the race data it had the lowest peak win count of the whole hybrid era which means no single driver just ran away with it. From the standings Verstappen’s gap over his closest rival was a lot smaller than it was in 2022 or 2023. And the qualifying data shows that number 1 drivers across the whole grid were less dominant over their teammates in 2024 as well. Everything kind of points the same direction. The racing got more competitive everywhere at the same time and Red Bull was just not the best team on the grid anymore.

tibble(
  Finding = c(
    "Top 3 starters who podiumed in rainy 2024 races",
    "No. 1 driver qualifying edge in 2024",
    "Most wins in a single season",
    "Verstappen 2023 points efficiency",
    "Verstappen 2023 win rate",
    "Verstappen 2025 points efficiency"
  ),
  Value = c(
    "80%",
    "Lowest count of the hybrid era",
    "2023 with 19 wins",
    paste0(summary_table$Points_Pct[summary_table$Season == 2023], "%"),
    paste0(round(summary_table$Wins[summary_table$Season == 2023] /
                   summary_table$Races[summary_table$Season == 2023] * 100, 1), "%"),
    paste0(summary_table$Points_Pct[summary_table$Season == 2025], "%")
  ),
  Source = c(
    "Primary Data",
    "Primary Data",
    "Primary Data",
    "Scraped Standings",
    "Scraped Standings",
    "Scraped Standings"
  )
) %>%
  gt() %>%
  tab_header(
    title    = "Key Findings from Both Datasets"
  )
Key Findings from Both Datasets
Finding | Value | Source
Top 3 starters who podiumed in rainy 2024 races | 80% | Primary Data
No. 1 driver qualifying edge in 2024 | Lowest count of the hybrid era | Primary Data
Most wins in a single season | 2023 with 19 wins | Primary Data
Verstappen 2023 points efficiency | 94.7% | Scraped Standings
Verstappen 2023 win rate | 86.4% | Scraped Standings
Verstappen 2025 points efficiency | 64.8% | Scraped Standings

Conclusion

I already knew Verstappen was dominant going into this project but actually looking at the numbers made it feel a lot more real. Winning 19 races in a season sounds impressive when you hear it. Seeing it on a bar chart next to every other season in the hybrid era is a completely different thing. It honestly does not look like it belongs in the same graph. The race data helps explain why that kind of dominance even happens. Starting from the front matters more than a lot of casual fans probably realize and it still matters even when it rains. If your car is fast enough to qualify at the front it is usually fast enough to stay there once the race starts. Verstappen’s Red Bull was that car for most of 2022 and basically all of 2023.

The interesting thing about 2024 and 2025 is that the drop shows up in basically every part of the data. His win total went down, his points efficiency went down, and in 2024 number 1 drivers across the whole grid were less dominant over their teammates in qualifying. All of that happening at the same time makes me think it was not really Red Bull getting worse; it was more the whole field catching up. Norris took the most advantage of it and won the 2025 championship. Whether Verstappen is a hero or a villain probably just depends on who you were rooting for. But statistically, what he did from 2021 through 2023 is pretty hard to argue with. That is just a really dominant stretch of racing.

Primary data hosted on SharePoint. Championship standings scraped from f1-fansite.com using rvest in R.