Hero or Villain? The Statistical Case for Max Verstappen’s Era of Dominance

Author

Cole Tenfelde

Introduction

If you have watched Formula One at any point in the last few years you already know who Max Verstappen is. He is a 27-year-old Dutch driver for Red Bull Racing and he has been really hard to beat. Between 2021 and 2024 he won four straight World Drivers' Championships, which is one of the best runs any driver has ever had in the sport. His 2023 season was honestly kind of crazy: he won 19 out of 22 races. To put that in perspective, most champions win somewhere around 10 to 13 races in a season, so 19 shows how consistent he was the whole year.

For anyone who does not follow Formula One, here is basically how it works. There are twenty drivers who race at tracks around the world from March to December. You get points based on where you finish, from 25 for winning down to 1 point for finishing tenth. Whoever has the most points at the end of the year is the World Drivers' Champion. Teams also compete for the Constructors' Championship, which adds up the points from both of their drivers.

This project looks at two things. First, what does the race data actually tell us about what it takes to do well in Formula One: does where you start matter, does rain change anything, and how often does the top driver at each team beat their teammate in qualifying? Second, how does all of that connect to the championship standings from 2021 through 2025? To answer that I am using two datasets. The first one has race-by-race data from 2000 to 2024 with things like starting positions, finishing results, and weather; most of my analysis uses the hybrid era from 2014 to 2024. The second is championship standings I pulled from f1-fansite.com that go round by round for each season from 2021 to 2025.
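To make the points system concrete, here is a minimal sketch in R. The two drivers and their finishing positions are made up for illustration, and `points_for()` is a hypothetical helper, not something from the datasets. It only covers Grand Prix finishing positions and assumes positions are 1 or higher.

```r
# Points for a top-ten finish (1st = 25 ... 10th = 1).
# Sprint and fastest-lap bonus points are left out of this sketch.
race_points <- c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1)

# Hypothetical helper: look up points for a vector of finishing positions.
# Anything outside the top ten, or NA for a retirement, scores 0.
points_for <- function(position) {
  pts <- race_points[position]
  ifelse(is.na(pts), 0, pts)
}

# Made-up team: two drivers over three races (NA = did not finish)
driver_a <- c(1, 2, NA)
driver_b <- c(4, 11, 10)

driver_a_total    <- sum(points_for(driver_a))  # 25 + 18 + 0
driver_b_total    <- sum(points_for(driver_b))  # 12 + 0 + 1
constructor_total <- driver_a_total + driver_b_total
```

The drivers' title goes to the biggest individual total, while the constructors' title uses the combined total of both cars.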

Part 1: Primary Data

Loading the Data

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gt)
library(rvest)
library(httr)
library(stringr)

F1.Data <- read_csv(
  "https://myxavier-my.sharepoint.com/:x:/g/personal/tenfeldec_xavier_edu/IQDobfUqiw6RSJfaNL1xZ-uyAauxsOh1oFQlt8BzYxYYdP4?e=r4Q1w4&download=1",
  show_col_types = FALSE
)

Data Dictionary

Each row in this dataset is one driver’s entry in one race. Below are the variables I used.

tibble(
  Variable             = c("year", "round", "grid", "Top 3 Finish",
                           "rainy", "wins", "nro_cond_escuderia",
                           "constructorId", "driver_age", "statusId"),
  Type                 = c("Integer", "Integer", "Integer", "Logical",
                           "Binary (0/1)", "Integer", "Integer",
                           "Integer", "Numeric", "Integer"),
  Description          = c(
    "Year the race took place",
    "Round number within the season",
    "Starting grid position (1 = pole position, 0 = did not qualify)",
    "Whether the driver finished in the top 3",
    "Whether the race was held in rainy conditions (1 = yes, 0 = no)",
    "Number of wins the driver had in the dataset",
    "Driver number within their team (1 = lead driver, 2 = second driver)",
    "Unique ID for the constructor (team)",
    "Age of the driver at the time of the race",
    "Race outcome code (1 = finished, 11-19 = lapped by the leader)"
  )
) %>%
  gt() %>%
  tab_header(title = "Primary Dataset: Data Dictionary")

Primary Dataset: Data Dictionary
Variable	Type	Description
year	Integer	Year the race took place
round	Integer	Round number within the season
grid	Integer	Starting grid position (1 = pole position, 0 = did not qualify)
Top 3 Finish	Logical	Whether the driver finished in the top 3
rainy	Binary (0/1)	Whether the race was held in rainy conditions (1 = yes, 0 = no)
wins	Integer	Number of wins the driver had in the dataset
nro_cond_escuderia	Integer	Driver number within their team (1 = lead driver, 2 = second driver)
constructorId	Integer	Unique ID for the constructor (team)
driver_age	Numeric	Age of the driver at the time of the race
statusId	Integer	Race outcome code (1 = finished, 11-19 = lapped by the leader)

Summary Statistics

F1.Data <- F1.Data %>%
  mutate(got_lapped = statusId >= 11 & statusId <= 19)

F1.Data %>%
  summarise(
    `Total Race Entries`   = n(),
    `Years Covered`        = paste(min(year, na.rm = TRUE), "to", max(year, na.rm = TRUE)),
    `Avg Starting Grid`    = round(mean(grid[grid > 0], na.rm = TRUE), 2),
    `% Rainy Races`        = round(mean(rainy, na.rm = TRUE) * 100, 1),
    `% Podium Finishes`    = round(mean(`Top 3 Finish`, na.rm = TRUE) * 100, 1),
    `Max Wins in a Season` = max(wins, na.rm = TRUE)
  ) %>%
  gt() %>%
  tab_header(title = "Primary Dataset: Summary Statistics")

Primary Dataset: Summary Statistics
Total Race Entries	Years Covered	Avg Starting Grid	% Rainy Races	% Podium Finishes	Max Wins in a Season
9839	2000 to 2024	11.02	42.8	14.2	19

Visual 1: Does Rain Actually Level the Playing Field?

You hear a lot in Formula One that rain creates chaos and gives smaller teams a shot at an upset. The idea is that wet conditions are unpredictable enough that starting position does not matter as much. I wanted to see if that was actually true, so I looked at 2024 and asked: if a driver started in the top 3 in a rainy race, did they still end up on the podium?

top3_rain_2024 <- F1.Data %>%
  filter(year == 2024,
         rainy == 1,
         grid <= 3,
         grid != 0)

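# Fix both levels so the bar labels below always line up,
# even if every driver (or no driver) finished on the podium
podium_counts <- table(factor(top3_rain_2024$`Top 3 Finish`,
                              levels = c(FALSE, TRUE)))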

barplot(podium_counts,
        names.arg = c("Did Not Podium", "Podiumed"),
        col = c("red", "green"),
        main = "Top 3 Starters in Rainy 2024 Races",
        ylab = "Number of Drivers")

The short answer is no, rain does not level the field much: starting up front still matters a lot even in the wet. Drivers who started in the top 3 finished on the podium 80% of the time in wet 2024 races. I honestly thought rain would shake things up more than that, but it turns out that if your car is fast enough to qualify at the front, it is probably fast enough to stay there even when it gets slippery. This is only one season of wet races, though, so the sample is small.
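The bar chart shows counts, but the podium rate itself is just the mean of a logical column. Here is a small sketch of that calculation. The rows in `toy` are made up and only copy the column names of `F1.Data`, so the resulting rate is not a real result.

```r
library(dplyr)

# Made-up rows in the same shape as F1.Data (not real race results)
toy <- tibble(
  year           = 2024,
  rainy          = c(1, 1, 1, 1, 0, 0),
  grid           = c(1, 2, 3, 2, 1, 8),
  `Top 3 Finish` = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Keep wet-race entries that started in the top 3, then take the share
# that finished on the podium (the mean of TRUE/FALSE is a proportion)
podium_rate <- toy %>%
  filter(year == 2024, rainy == 1, grid >= 1, grid <= 3) %>%
  summarise(
    starters    = n(),
    podiums     = sum(`Top 3 Finish`),
    podium_rate = mean(`Top 3 Finish`)
  )
```

Reporting `starters` next to the rate makes it easier to judge how much weight a percentage deserves when the sample is small.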

Visual 2: Does the Number 1 Driver Actually Beat Their Teammate in Qualifying?

Every Formula One team has two drivers and there is usually a pretty clear pecking order between them. The number 1 driver is basically the team's main guy and should theoretically be faster. Qualifying is probably the best way to test that because both drivers are in the same car, so there is a lot less strategy and luck involved than in a race. I looked at the hybrid era from 2014 through 2024 to see how often the number 1 driver actually outqualified their teammate.

hybrid_data <- F1.Data %>%
  filter(year >= 2014,
         year <= 2024,
         grid != 0)

teammates <- hybrid_data %>%
  filter(nro_cond_escuderia %in% c(1, 2))

teammate_compare <- teammates %>%
  select(year, round, constructorId, nro_cond_escuderia, grid) %>%
  pivot_wider(names_from = nro_cond_escuderia,
              values_from = grid,
              names_prefix = "driver_") %>%
  mutate(driver1_outqualified = driver_1 < driver_2)

outqualify_by_year <- teammate_compare %>%
  group_by(year) %>%
  summarise(times_driver1_outqualified = sum(driver1_outqualified, na.rm = TRUE))

barplot(outqualify_by_year$times_driver1_outqualified,
        names.arg = outqualify_by_year$year,
        col = "orange",
        main = "Times No.1 Driver Outqualified No.2 (2014-2024)",
        xlab = "Season",
        ylab = "Count")

For most of the hybrid era the number 1 driver pretty consistently beat their teammate in qualifying. The interesting drop comes in 2024, where that gap closed noticeably compared to 2022 and 2023. That tells you something about how competitive things got inside most teams, not just between teams. Two caveats are worth noting. The chart shows raw counts, and seasons do not all have the same number of races, so a longer calendar gives more chances to add to the total. Also, the number 1 and number 2 labels are not always official, so there is some guesswork in how they get assigned in the data.
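One way to make seasons of different lengths comparable is to turn the count into a share of teammate comparisons. This is a sketch of that idea, not what the chart above plots. The grid positions in `toy` are made up and only reuse the column names from `hybrid_data`.

```r
library(dplyr)
library(tidyr)

# Made-up grid positions for one team (not real qualifying results)
toy <- tibble(
  year               = c(2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024, 2024, 2024),
  round              = c(1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
  constructorId      = 9,
  nro_cond_escuderia = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  grid               = c(1, 3, 2, 5, 4, 2, 1, 6, 7, 3)
)

outqualify_share <- toy %>%
  pivot_wider(names_from   = nro_cond_escuderia,
              values_from  = grid,
              names_prefix = "driver_") %>%
  # Only count rounds where both teammates have a grid position
  filter(!is.na(driver_1), !is.na(driver_2)) %>%
  group_by(year) %>%
  summarise(
    comparisons   = n(),
    driver1_ahead = sum(driver_1 < driver_2),
    share         = driver1_ahead / comparisons
  )
```

With a share, a 24-race season and a 17-race season sit on the same 0 to 1 scale, so a drop is less likely to be an effect of calendar length.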

Visual 3: Who Had the Most Wins in a Season?

To show how unusual 2023 was I looked at the most wins any single driver got in each season of the hybrid era. This gives a pretty easy way to see just how dominant certain seasons were compared to others.

max_wins_by_season <- hybrid_data %>%
  group_by(year) %>%
  summarise(max_wins = max(wins, na.rm = TRUE))

barplot(max_wins_by_season$max_wins,
        names.arg = max_wins_by_season$year,
        col = "purple",
        main = "Max Driver Wins Per Season (2014-2024)",
        xlab = "Season",
        ylab = "Maximum Wins")

The 2023 bar is not really close to anything else on the chart. Nineteen wins in a single season has not happened before in this era. Most years the top driver wins somewhere around 10 to 13 races and that already feels like a lot. The other thing that stands out is 2024, where the peak win total was only 9, level with 2017 for the lowest of any season here. That basically means the 2024 wins were spread out across a bunch of different drivers, which matches what the standings data shows later on.

Part 2: Secondary Data (Scraped from f1-fansite.com)

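The second dataset I used was scraped from f1-fansite.com using the rvest package in R. This covers the 2021 through 2025 seasons and tracks every driver's points round by round throughout each championship. Points are rebuilt from Grand Prix finishing positions only, so sprint race results and fastest-lap bonus points are not included and the totals can differ from the official ones.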

How I scraped the data

# Set a descriptive user agent
set_config(user_agent("Rvest Practice/student scraper for academic research/tenfeldec@xavier.edu"))


points_map <- c(
  "1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
  "6" = 8,  "7" = 6,  "8" = 4,  "9" = 2,  "10" = 1
)


scrape_f1_standings <- function(year) {
  
  url <- paste0("https://www.f1-fansite.com/f1-results/f1-standings-", year, "-championship/")
  message("Scraping: ", url)
  
#Load live page
  page <- read_html_live(url)
  Sys.sleep(5)
  
#Confirm position of the correct table heading
  headings <- page %>%
    html_elements("h2") %>%
    html_text2()
  
  message("Headings found: ", paste(headings, collapse = " | "))
  
#Scrape standings table
  standings <- page %>%
    html_elements("table") %>%
    .[[3]] %>%
    html_table()
  

  colnames(standings)[1] <- "Position"
  colnames(standings)[2] <- "Driver"
  
  # Reshape to one row per driver per race, then score each result
  standings_clean <- standings %>%
    mutate(across(everything(), as.character)) %>%
    pivot_longer(
      cols      = -c(Position, Driver),
      names_to  = "Race",
      values_to = "Result"
    ) %>%
    filter(Race != "Pts") %>%
    # Number rounds from the column order before dropping blank results,
    # so a driver who missed a race keeps the correct round numbers
    mutate(Round = match(Race, unique(Race))) %>%
    filter(!is.na(Result), Result != "") %>%
    group_by(Driver) %>%
    mutate(
      Season         = year,
      Fastest_Lap    = str_detect(Result, "\\*"),
      Result         = str_remove(Result, "\\*"),
      # Results with no entry in points_map (11th or lower, DNF, ...) score 0
      Points_Gained  = replace_na(as.numeric(points_map[Result]), 0),
      Cumulative_Pts = cumsum(Points_Gained)
    ) %>%
    ungroup()
  
  return(standings_clean)
}

# Can't use 2026 because the table is in the wrong spot
years <- c(2021, 2022, 2023, 2024, 2025)

all_seasons <- list()

for (year in years) {
  all_seasons[[as.character(year)]] <- scrape_f1_standings(year)
  Sys.sleep(3)
}

Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2021-championship/

Headings found: F1 2021 Championship Overview

Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2022-championship/

Headings found: F1 Championship 2022 Overview

Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2023-championship/

Headings found: 2023 F1 Championship Results

Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2024-championship/

Headings found: 2024 F1 Championship Results

Scraping: https://www.f1-fansite.com/f1-results/f1-standings-2025-championship/

Headings found: 2025 F1 Championship Results

#combine seasons
standings_all <- bind_rows(all_seasons)
write.csv(standings_all, "f1_standings_all.csv", row.names = FALSE)
f1_standings_data <- read.csv("f1_standings_all.csv")
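To show what the cleaning step does without hitting the website, here is a small sketch that runs the same kind of reshape and points lookup on a made-up table. `toy_wide`, its drivers, and its results are all invented for illustration; only `points_map` and the column names follow the scraper.

```r
library(dplyr)
library(tidyr)
library(stringr)

points_map <- c(
  "1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
  "6" = 8,  "7" = 6,  "8" = 4,  "9" = 2,  "10" = 1
)

# Made-up standings in the wide layout the scraper expects:
# one row per driver, one column per race, "*" marking a fastest lap
toy_wide <- tibble(
  Position = c("1", "2"),
  Driver   = c("Driver A", "Driver B"),
  BAH      = c("1*", "2"),
  SAU      = c("DNF", "1"),
  AUS      = c("3", "12*"),
  Pts      = c("40", "43")
)

toy_long <- toy_wide %>%
  pivot_longer(cols = -c(Position, Driver),
               names_to = "Race", values_to = "Result") %>%
  filter(Race != "Pts") %>%
  group_by(Driver) %>%
  mutate(
    Fastest_Lap    = str_detect(Result, "\\*"),
    Result         = str_remove(Result, "\\*"),
    # Anything without an entry in points_map (DNF, 12th, ...) scores 0
    Points_Gained  = replace_na(as.numeric(points_map[Result]), 0),
    Cumulative_Pts = cumsum(Points_Gained)
  ) %>%
  ungroup()
```

Each driver ends up with one row per race and a running total, which is the shape the later charts rely on.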

Loading the Scraped Data

f1_standings_data <- read.csv(
  "https://myxavier-my.sharepoint.com/:x:/g/personal/tenfeldec_xavier_edu/IQB2IQb6aMbIRpaJMAFSv64lAU_S2xU7ajAT6u4YX03xnto?e=MD6UbX&download=1"
)

rivals <- c("Max Verstappen", "Lando Norris", "Charles Leclerc",
            "Lewis Hamilton", "Carlos Sainz")

rivals_data     <- f1_standings_data %>% filter(Driver %in% rivals)
verstappen_data <- f1_standings_data %>% filter(Driver == "Max Verstappen")

summary_table <- f1_standings_data %>%
  filter(Driver == "Max Verstappen") %>%
  mutate(
    Win    = Result == "1",
    Podium = Result %in% c("1", "2", "3"),
    DNF    = Result %in% c("Wd", "DSQ", "DNS", "DNF")
  ) %>%
  group_by(Season) %>%
  summarise(
    Races        = n(),
    Wins         = sum(Win,         na.rm = TRUE),
    Podiums      = sum(Podium,      na.rm = TRUE),
    DNFs         = sum(DNF,         na.rm = TRUE),
    Fastest_Laps = sum(Fastest_Lap, na.rm = TRUE),
    Total_Points = max(Cumulative_Pts, na.rm = TRUE)
  ) %>%
  mutate(
    Points_Possible = Races * 25,
    Points_Pct      = round(Total_Points / Points_Possible * 100, 1)
  )

Data Dictionary

tibble(
  Variable       = c("Driver", "Season", "Race", "Round", "Result",
                     "Points_Gained", "Cumulative_Pts", "Fastest_Lap", "Position"),
  Description    = c(
    "Driver full name",
    "Championship season year",
    "Grand Prix abbreviation (e.g. BAH, MON, SPA)",
    "Round number within the season",
    "Finishing position in the race",
    "Points awarded for that race",
    "Running total of points through that round",
    "Whether the driver recorded the fastest lap that race (TRUE/FALSE)",
    "Final season championship standing"
  )
) %>%
  gt() %>%
  tab_header(title = "Secondary Dataset: Data Dictionary")

Secondary Dataset: Data Dictionary
Variable	Description
Driver	Driver full name
Season	Championship season year
Race	Grand Prix abbreviation (e.g. BAH, MON, SPA)
Round	Round number within the season
Result	Finishing position in the race
Points_Gained	Points awarded for that race
Cumulative_Pts	Running total of points through that round
Fastest_Lap	Whether the driver recorded the fastest lap that race (TRUE/FALSE)
Position	Final season championship standing

Visual 4: Verstappen Season Summary

Before getting into the charts, here is a simple table breaking down Verstappen’s numbers season by season. This is probably the most useful single view of just how different 2023 was compared to everything else.

summary_table %>%
  gt() %>%
  tab_header(
    title    = "Max Verstappen Season Summary 2021-2025",
    subtitle = "Data scraped from f1-fansite.com"
  ) %>%
  cols_label(
    Season          = "Season",
    Races           = "Races",
    Wins            = "Wins",
    Podiums         = "Podiums",
    DNFs            = "DNFs",
    Fastest_Laps    = "Fastest Laps",
    Total_Points    = "Points",
    Points_Possible = "Pts Possible",
    Points_Pct      = "Pts %"
  )

Max Verstappen Season Summary 2021-2025
Data scraped from f1-fansite.com
Season	Races	Wins	Podiums	DNFs	Fastest Laps	Points	Pts Possible	Pts %
2021	22	10	18	3	6	396	550	72.0
2022	22	15	17	1	5	428	550	77.8
2023	22	19	21	0	9	521	550	94.7
2024	24	9	14	1	3	396	600	66.0
2025	24	8	15	1	3	389	600	64.8

In 2023 he won 19 of 22 races, had no retirements, and scored almost 95% of the points available from race finishes that season. Every other year looks pretty normal by comparison. By 2025 the win total had dropped to 8 and Lando Norris ended up with more points at the end.
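The percentage column is simple arithmetic, and it can be checked by hand. This sketch copies the races and points from the summary table and recomputes the share of the maximum, using the same assumption as the analysis that 25 points per race is the ceiling.

```r
# Values copied from the season summary table
season       <- c(2021, 2022, 2023, 2024, 2025)
races        <- c(22, 22, 22, 24, 24)
total_points <- c(396, 428, 521, 396, 389)

# Ceiling if a driver won every race (no sprint or fastest-lap points)
points_possible <- races * 25
points_pct      <- round(total_points / points_possible * 100, 1)

# Win rate for 2023: 19 wins from 22 races
win_rate_2023 <- round(19 / 22 * 100, 1)

data.frame(season, races, total_points, points_possible, points_pct)
```

For 2023 this works out to 521 / 550, which is where the 94.7% in the table comes from.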

Visual 5: How Much of the Available Points Did He Actually Get?

This chart compares how many points Verstappen scored each season to how many were theoretically available if he had won every single race.

summary_table %>%
  pivot_longer(
    cols      = c(Total_Points, Points_Possible),
    names_to  = "Type",
    values_to = "Points"
  ) %>%
  ggplot(aes(x = factor(Season), y = Points, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(
    values = c("Total_Points" = "red", "Points_Possible" = "blue"),
    labels = c("Total_Points" = "Points Scored", "Points_Possible" = "Points Possible")
  ) +
  labs(
    title = "Verstappen Points Scored vs Points Possible",
    x     = "Season",
    y     = "Points",
    fill  = ""
  ) +
  theme_minimal()

In 2023 the red bar almost reaches the blue one. That is what it looks like when a driver is winning almost everything. In 2024 and 2025 that gap gets noticeably wider, which means more races where he did not win and more ground given up to rivals.

Visual 6: How Far Ahead Was He Each Season?

This one shows Verstappen's final point total next to the highest-scoring other driver each year. In the seasons he won, that is the runner-up. In 2025 it is Norris, who finished ahead of him.

closest_rival <- f1_standings_data %>%
  filter(Driver != "Max Verstappen") %>%
  group_by(Season, Driver) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE), .groups = "drop") %>%
  group_by(Season) %>%
  slice_max(Total_Points, n = 1)

verstappen_pts <- f1_standings_data %>%
  filter(Driver == "Max Verstappen") %>%
  group_by(Season) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE)) %>%
  mutate(Driver = "Max Verstappen")

comparison <- bind_rows(verstappen_pts, closest_rival)

ggplot(comparison, aes(x = factor(Season), y = Total_Points, fill = Driver)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Verstappen vs Closest Rival by Season",
    x     = "Season",
    y     = "Total Points",
    fill  = "Driver"
  ) +
  theme_minimal()

The 2021 season was genuinely close. Lewis Hamilton and Verstappen finished just 8 points apart after 22 races and it came down to the very last lap of the last race. Then 2023 happened and it was not close at all. By 2025 the bars flip entirely and Norris finishes on top.

Visual 7: Points Progression Round by Round

This tracks how Verstappen and four of his main rivals (Norris, Leclerc, Hamilton, and Sainz) built up their points throughout each season, one race at a time.

rivals_data %>%
  ggplot(aes(x = Round, y = Cumulative_Pts, color = Driver)) +
  geom_line(linewidth = 1) +
  facet_wrap(~Season) +
  labs(
    title    = "Points Progression Round by Round",
    subtitle = "Verstappen vs Top Rivals (2021-2025)",
    x        = "Round",
    y        = "Cumulative Points",
    color    = "Driver"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

This chart does a good job of showing the feel of each season. In 2021 the lines stay close together the whole way through. In 2023 Verstappen pulls away from literally everyone almost immediately and just keeps going. You can see exactly when the 2025 season turned because there is a point where Norris’s line crosses Verstappen’s and does not come back.
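The crossover point can also be pulled out of the data instead of read off the chart. Here is a rough sketch of one way to do it. The two drivers and their points in `toy` are made up, and the sketch assumes the trailing driver was behind or level at least once.

```r
library(dplyr)
library(tidyr)

# Made-up cumulative points for two hypothetical drivers (not real standings)
toy <- tibble(
  Round          = rep(1:6, times = 2),
  Driver         = rep(c("Driver A", "Driver B"), each = 6),
  Cumulative_Pts = c(18, 36, 61, 86,  86,  98,
                     25, 40, 58, 76, 101, 126)
)

# One row per round with both totals side by side, plus the gap between them
wide <- toy %>%
  pivot_wider(names_from = Driver, values_from = Cumulative_Pts) %>%
  mutate(gap = `Driver B` - `Driver A`)

# Last round where Driver B was behind or level; B leads for good after that
last_round_not_ahead <- max(wide$Round[wide$gap <= 0])
lead_for_good_from   <- last_round_not_ahead + 1
```

Looking for the last round behind, rather than the first round ahead, matters when the lead changes hands more than once, as it does in this toy example.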

Part 3: Putting Both Datasets Together

The two datasets basically confirm the same thing from different angles. The race-level data tells us that starting at the front is a strong predictor of finishing at the front, even in rain. The standings data shows what that looks like over a full season: Verstappen's championships were built on piling up wins and almost never retiring from a race, which is what you would expect from a driver who keeps starting at the front.

2024 is probably the most interesting year to look at across both datasets. From the race data, its peak win count of 9 was level with 2017 for the lowest of the whole hybrid era, which means no single driver just ran away with it. From the standings, Verstappen's gap over his closest rival was a lot smaller than it was in 2022 or 2023. And the qualifying data shows that number 1 drivers across the whole grid were less dominant over their teammates in 2024 as well. Everything kind of points in the same direction. The racing got more competitive everywhere at the same time and Red Bull was no longer clearly the best team on the grid.

tibble(
  Finding = c(
    "Top 3 starters who podiumed in rainy 2024 races",
    "No. 1 driver qualifying edge in 2024",
    "Most wins in a single season",
    "Verstappen 2023 points efficiency",
    "Verstappen 2023 win rate",
    "Verstappen 2025 points efficiency"
  ),
  Value = c(
    "80%",
    "Lowest count of the hybrid era",
    "2023 with 19 wins",
    paste0(summary_table$Points_Pct[summary_table$Season == 2023], "%"),
    paste0(round(summary_table$Wins[summary_table$Season == 2023] /
                   summary_table$Races[summary_table$Season == 2023] * 100, 1), "%"),
    paste0(summary_table$Points_Pct[summary_table$Season == 2025], "%")
  ),
  Source = c(
    "Primary Data",
    "Primary Data",
    "Primary Data",
    "Scraped Standings",
    "Scraped Standings",
    "Scraped Standings"
  )
) %>%
  gt() %>%
  tab_header(
    title    = "Key Findings from Both Datasets"
  )

Key Findings from Both Datasets
Finding	Value	Source
Top 3 starters who podiumed in rainy 2024 races	80%	Primary Data
No. 1 driver qualifying edge in 2024	Lowest count of the hybrid era	Primary Data
Most wins in a single season	2023 with 19 wins	Primary Data
Verstappen 2023 points efficiency	94.7%	Scraped Standings
Verstappen 2023 win rate	86.4%	Scraped Standings
Verstappen 2025 points efficiency	64.8%	Scraped Standings

Conclusion

I already knew Verstappen was dominant going into this project but actually looking at the numbers made it feel a lot more real. Winning 19 races in a season sounds impressive when you hear it. Seeing it on a bar chart next to every other season in the hybrid era is a completely different thing. It honestly does not look like it belongs in the same graph. The race data helps explain why that kind of dominance even happens. Starting from the front matters more than a lot of casual fans probably realize and it still matters even when it rains. If your car is fast enough to qualify at the front it is usually fast enough to stay there once the race starts. Verstappen’s Red Bull was that car for most of 2022 and basically all of 2023.

The interesting thing about 2024 and 2025 is that the drop shows up in basically every part of the data. His win total went down, his points efficiency went down, and in 2024 number 1 drivers across the whole grid were less dominant over their teammates in qualifying. All of that happening at the same time makes me think it was not really Red Bull getting worse; it was more just the whole field catching up. Norris ended up taking the most out of it and won the 2025 championship. Whether Verstappen is a hero or a villain probably just depends on who you were rooting for. But statistically what he did from 2021 through 2023 is pretty hard to argue with. That is just a really dominant stretch of racing.

Primary data hosted on SharePoint. Championship standings scraped from f1-fansite.com using rvest in R.