Hero or Villain? The Statistical Case for Max Verstappen’s Era of Dominance
Author
Cole Tenfelde
Introduction
If you have watched Formula One at any point in the last few years, you already know who Max Verstappen is. He is a 27-year-old Dutch driver for Red Bull Racing, and he has been really hard to beat. Between 2021 and 2024 he won four straight World Drivers Championships, which is one of the best runs any driver has ever had in the sport. His 2023 season was honestly kind of crazy: he won 19 out of 22 races. To put that in perspective, most champions win somewhere around 10 to 13 races in a season, so 19 shows how consistent he was the whole year.

For anyone who does not follow Formula One, here is basically how it works. There are twenty drivers who race at tracks around the world from March to December. You get points based on where you finish, from 25 for winning down to 1 point for finishing tenth. Whoever has the most points at the end of the year is the World Drivers Champion. Teams also compete for the Constructors Championship, which just adds up the points from both of their drivers.

This project looks at two things. First, what does the race data actually tell us about what it takes to do well in Formula One? Does where you start matter, does rain change anything, and how often does the top driver at each team beat their teammate in qualifying? Second, how does all of that connect to the championship standings from 2021 through 2025? To answer that I am using two datasets. The first one has race-by-race data from 2014 to 2024 with things like starting positions, finishing results, and weather. The second is championship standings I pulled from f1-fansite.com that go round by round for each season from 2021 to 2025.
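To make the scoring rules concrete, here is a minimal sketch of how race points and a constructors total could be computed. The drivers, teams, and finishing positions are made up for illustration, and sprint races and bonus points are ignored.

```r
# Race points for finishing positions 1 through 10
# (sprints and bonus points are ignored in this sketch)
points_map <- c("1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
                "6" = 8, "7" = 6, "8" = 4, "9" = 2, "10" = 1)

# Hypothetical results for one race: two teams, two drivers each
race <- data.frame(
  driver = c("A", "B", "C", "D"),
  team   = c("Team X", "Team X", "Team Y", "Team Y"),
  finish = c(1, 10, 2, 14)
)

# Look up points by position; anything outside the top 10 scores 0
pts <- unname(points_map[as.character(race$finish)])
race$points <- ifelse(is.na(pts), 0, pts)

# A constructors total is the sum of both drivers' points
team_points <- tapply(race$points, race$team, sum)
print(team_points)
```

The same lookup idea is used later when the scraped finishing positions are converted into points.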
Part 1: Primary Data
Loading the Data
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gt)
library(rvest)
Each row in this dataset is one driver’s entry in one race. Below are the variables I used.
tibble(
  Variable = c("year", "round", "grid", "Top 3 Finish", "rainy", "wins",
               "nro_cond_escuderia", "constructorId", "driver_age", "statusId"),
  Type = c("Integer", "Integer", "Integer", "Logical", "Binary (0/1)", "Integer",
           "Integer", "Integer", "Numeric", "Integer"),
  Description = c(
    "Year the race took place",
    "Round number within the season",
    "Starting grid position (1 = pole position, 0 = did not qualify)",
    "Whether the driver finished in the top 3",
    "Whether the race was held in rainy conditions (1 = yes, 0 = no)",
    "Number of wins the driver had in the dataset",
    "Driver number within their team (1 = lead driver, 2 = second driver)",
    "Unique ID for the constructor (team)",
    "Age of the driver at the time of the race",
    "Race outcome code (1 = finished, 11-19 = lapped by the leader)"
  )
) %>%
  gt() %>%
  tab_header(title = "Primary Dataset: Data Dictionary")
Primary Dataset: Data Dictionary

| Variable | Type | Description |
|---|---|---|
| year | Integer | Year the race took place |
| round | Integer | Round number within the season |
| grid | Integer | Starting grid position (1 = pole position, 0 = did not qualify) |
| Top 3 Finish | Logical | Whether the driver finished in the top 3 |
| rainy | Binary (0/1) | Whether the race was held in rainy conditions (1 = yes, 0 = no) |
| wins | Integer | Number of wins the driver had in the dataset |
| nro_cond_escuderia | Integer | Driver number within their team (1 = lead driver, 2 = second driver) |
| constructorId | Integer | Unique ID for the constructor (team) |
| driver_age | Numeric | Age of the driver at the time of the race |
| statusId | Integer | Race outcome code (1 = finished, 11-19 = lapped by the leader) |
Visual 1: Does Rain Actually Level the Playing Field?
You hear a lot in Formula One that rain creates chaos and gives smaller teams a shot at an upset. The idea is that wet conditions are unpredictable enough that starting position does not matter as much. I wanted to see if that was actually true, so I looked at 2024 and asked: if a driver started in the top 3 in a rainy race, did they still end up on the podium?
# Top 3 starters in rainy 2024 races (grid 0 = did not qualify)
top3_rain_2024 <- F1.Data %>%
  filter(year == 2024, rainy == 1, grid <= 3, grid != 0)

podium_counts <- table(top3_rain_2024$`Top 3 Finish`)

barplot(podium_counts,
        names.arg = c("Did Not Podium", "Podiumed"),
        col = c("red", "green"),
        main = "Top 3 Starters in Rainy 2024 Races",
        ylab = "Number of Drivers")
The short answer is no, rain does not level the playing field much. Starting up front still matters a lot even in the rain. Drivers who started in the top 3 finished on the podium 80% of the time in wet 2024 races. I honestly thought rain would shake things up more than that, but it turns out that if your car is fast enough to qualify at the front, it is probably fast enough to stay there even when it gets slippery.
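The podium percentage can be computed directly instead of being read off the bar chart. This is a minimal sketch on a made-up stand-in for the race data: `toy_races` and its values are hypothetical, and only the column names come from the data dictionary.

```r
library(dplyr)

# Toy stand-in for the race-level data; the real values will differ
toy_races <- tibble(
  year = 2024,
  rainy = c(1, 1, 1, 1, 1, 0),
  grid = c(1, 2, 3, 1, 2, 1),
  `Top 3 Finish` = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Share of top 3 starters in rainy races who finished on the podium
podium_rate <- toy_races %>%
  filter(year == 2024, rainy == 1, grid >= 1, grid <= 3) %>%
  summarise(
    starters = n(),
    podiums = sum(`Top 3 Finish`),
    podium_pct = 100 * mean(`Top 3 Finish`)
  )

print(podium_rate)
```

Reporting `starters` next to the percentage is worth doing, because one season only has a handful of wet races and the sample is small.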
Visual 2: Does the Number 1 Driver Actually Beat Their Teammate in Qualifying?
Every Formula One team has two drivers, and there is usually a pretty clear pecking order between them. The number 1 driver is basically the team’s main guy and should theoretically be faster. Qualifying is probably the best way to test that because both drivers have the same car, so race strategy and luck matter a lot less than they do on Sunday. I looked at the hybrid era from 2014 through 2024 to see how often the number 1 driver actually outqualified their teammate.
For most of the hybrid era the number 1 driver pretty consistently beat their teammate in qualifying. The interesting drop comes in 2024, where that gap closed noticeably compared to 2022 and 2023. That tells you something about how competitive the field got internally at most teams, not just between teams. One thing worth noting is that the number 1 and number 2 labels are not always official, so there is some guesswork in how that gets assigned in the data.
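The code for this comparison is not shown, so here is a minimal sketch of one way it could be done. It assumes `grid` is a reasonable proxy for qualifying position (grid penalties can break that) and uses a hypothetical `toy_grid` data frame with the column names from the data dictionary.

```r
library(dplyr)
library(tidyr)

# Toy stand-in: one team, two rounds per season (values are made up)
toy_grid <- tibble(
  year = c(2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024),
  round = c(1, 1, 2, 2, 1, 1, 2, 2),
  constructorId = 9,
  nro_cond_escuderia = c(1, 2, 1, 2, 1, 2, 1, 2),
  grid = c(1, 5, 2, 4, 3, 2, 1, 6)
)

head_to_head <- toy_grid %>%
  filter(grid != 0) %>%  # drop entries that did not qualify
  select(year, round, constructorId, nro_cond_escuderia, grid) %>%
  # One row per team per race, with each driver's grid slot side by side
  pivot_wider(names_from = nro_cond_escuderia,
              values_from = grid,
              names_prefix = "driver_") %>%
  filter(!is.na(driver_1), !is.na(driver_2)) %>%
  group_by(year) %>%
  summarise(
    pairs = n(),
    no1_ahead = sum(driver_1 < driver_2),
    no1_ahead_pct = 100 * mean(driver_1 < driver_2)
  )

print(head_to_head)
```

Using a percentage rather than a raw count keeps seasons with different numbers of races comparable.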
Visual 3: Who Had the Most Wins in a Season?
To show how unusual 2023 was I looked at the most wins any single driver got in each season of the hybrid era. This gives a pretty easy way to see just how dominant certain seasons were compared to others.
# Highest win total by any single driver in each season
max_wins_by_season <- hybrid_data %>%
  group_by(year) %>%
  summarise(max_wins = max(wins, na.rm = TRUE))

barplot(max_wins_by_season$max_wins,
        names.arg = max_wins_by_season$year,
        col = "purple",
        main = "Max Driver Wins Per Season (2014-2024)",
        xlab = "Season",
        ylab = "Maximum Wins")
The 2023 bar is not really close to anything else on the chart. Nineteen wins in a single season has not happened before in this era. Most years the top driver wins somewhere around 10 to 13 races, and that already feels like a lot. The other thing that stands out is 2024, which had the lowest peak win total of any season here. That basically means the race wins in 2024 were spread out across a bunch of different drivers, which matches what the standings data shows later on.
Part 2: Secondary Data (Scraped from f1-fansite.com)
The second dataset I used was scraped from f1-fansite.com using the rvest package in R. This covers the 2021 through 2025 seasons and tracks every driver’s points round by round throughout each championship. Sprint race results are not included.
How I scraped the data
library(httr)     # set_config(), user_agent()
library(stringr)  # str_detect(), str_remove()

# User agent
set_config(user_agent("Rvest Practice/student scraper for academic research/tenfeldec@xavier.edu"))

# Points by finishing position
points_map <- c("1" = 25, "2" = 18, "3" = 15, "4" = 12, "5" = 10,
                "6" = 8, "7" = 6, "8" = 4, "9" = 2, "10" = 1)

scrape_f1_standings <- function(year) {
  url <- paste0("https://www.f1-fansite.com/f1-results/f1-standings-", year, "-championship/")
  message("Scraping: ", url)

  # Load live page
  page <- read_html_live(url)
  Sys.sleep(5)

  # Confirm position of the correct table heading
  headings <- page %>% html_elements("h2") %>% html_text2()
  message("Headings found: ", paste(headings, collapse = " | "))

  # Scrape standings table
  standings <- page %>% html_elements("table") %>% .[[3]] %>% html_table()
  colnames(standings)[1] <- "Position"
  colnames(standings)[2] <- "Driver"

  # Cleaning data frame
  standings_clean <- standings %>%
    mutate(across(everything(), as.character)) %>%
    pivot_longer(cols = -c(Position, Driver),
                 names_to = "Race",
                 values_to = "Result") %>%
    filter(!is.na(Result), Result != "") %>%
    filter(Race != "Pts") %>%
    group_by(Driver) %>%
    mutate(
      Season = year,
      Fastest_Lap = str_detect(Result, "\\*"),
      Result = str_remove(Result, "\\*"),
      Points_Gained = replace_na(as.numeric(points_map[Result]), 0),
      Round = match(Race, unique(Race)),
      Cumulative_Pts = cumsum(Points_Gained)
    ) %>%
    ungroup()

  return(standings_clean)
}

# Can't use 2026 because the table is in the wrong spot
years <- c(2021, 2022, 2023, 2024, 2025)
all_seasons <- list()
for (year in years) {
  all_seasons[[as.character(year)]] <- scrape_f1_standings(year)
  Sys.sleep(3)
}

# Assumed step (not shown in the original): combine the seasons into the
# f1_standings_data frame that the later charts use
f1_standings_data <- bind_rows(all_seasons)
tibble(
  Variable = c("Driver", "Season", "Race", "Round", "Result",
               "Points_Gained", "Cumulative_Pts", "Fastest_Lap", "Position"),
  Description = c(
    "Driver full name",
    "Championship season year",
    "Grand Prix abbreviation (e.g. BAH, MON, SPA)",
    "Round number within the season",
    "Finishing position in the race",
    "Points awarded for that race",
    "Running total of points through that round",
    "Whether the driver recorded the fastest lap that race (TRUE/FALSE)",
    "Final season championship standing"
  )
) %>%
  gt() %>%
  tab_header(title = "Secondary Dataset: Data Dictionary")
Secondary Dataset: Data Dictionary

| Variable | Description |
|---|---|
| Driver | Driver full name |
| Season | Championship season year |
| Race | Grand Prix abbreviation (e.g. BAH, MON, SPA) |
| Round | Round number within the season |
| Result | Finishing position in the race |
| Points_Gained | Points awarded for that race |
| Cumulative_Pts | Running total of points through that round |
| Fastest_Lap | Whether the driver recorded the fastest lap that race (TRUE/FALSE) |
| Position | Final season championship standing |
Visual 4: Verstappen Season Summary
Before getting into the charts, here is a simple table breaking down Verstappen’s numbers season by season. This is probably the most useful single view of just how different 2023 was compared to everything else.
In 2023 he won 19 races, had almost no retirements, and scored about 95% (94.7%) of the total points available that season. Every other year looks pretty normal by comparison. By 2025 the win total dropped, the DNFs crept up, and Lando Norris ended up with more points at the end.
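The code that builds the summary table is not shown, so this is a minimal sketch of how a table with the same column names (`Season`, `Races`, `Wins`, `Total_Points`, `Points_Possible`, `Points_Pct`) could be produced. It assumes points possible means 25 points for every race entered. The `toy_standings` data and the "Ret" result code are hypothetical placeholders.

```r
library(dplyr)

# Toy stand-in for the scraped standings (values are made up)
toy_standings <- tibble(
  Season = c(2023, 2023, 2023, 2023),
  Driver = "Max Verstappen",
  Round = 1:4,
  Result = c("1", "2", "1", "Ret"),  # "Ret" is a placeholder retirement code
  Points_Gained = c(25, 18, 25, 0)
)

summary_table <- toy_standings %>%
  filter(Driver == "Max Verstappen") %>%
  group_by(Season) %>%
  summarise(
    Races = n(),
    Wins = sum(Result == "1"),
    Total_Points = sum(Points_Gained),
    Points_Possible = 25 * n(),  # assumes a win in every race entered
    Points_Pct = round(100 * Total_Points / Points_Possible, 1)
  )

print(summary_table)
```

Because the scraped data leaves out sprint results, both the points scored and the points possible here only count Grand Prix finishes.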
Visual 5: How Much of the Available Points Did He Actually Get?
This chart compares how many points Verstappen scored each season to how many were theoretically available if he had won every single race.
summary_table %>%
  pivot_longer(cols = c(Total_Points, Points_Possible),
               names_to = "Type",
               values_to = "Points") %>%
  ggplot(aes(x = factor(Season), y = Points, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(
    values = c("Total_Points" = "red", "Points_Possible" = "blue"),
    labels = c("Total_Points" = "Points Scored", "Points_Possible" = "Points Possible")
  ) +
  labs(title = "Verstappen Points Scored vs Points Possible",
       x = "Season",
       y = "Points",
       fill = "") +
  theme_minimal()
In 2023 the red bar almost reaches the blue one. That is what it looks like when a driver is winning almost everything. In 2024 and 2025 that gap gets noticeably wider, which means more races where he did not win, more mechanical issues, and more ground given up to rivals.
Visual 6: How Far Ahead Was He Each Season?
This one shows Verstappen’s final point total compared to whoever finished second in the championship each year.
# Best non-Verstappen points total in each season
closest_rival <- f1_standings_data %>%
  filter(Driver != "Max Verstappen") %>%
  group_by(Season, Driver) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE), .groups = "drop") %>%
  group_by(Season) %>%
  slice_max(Total_Points, n = 1)

# Verstappen's final points total in each season
verstappen_pts <- f1_standings_data %>%
  filter(Driver == "Max Verstappen") %>%
  group_by(Season) %>%
  summarise(Total_Points = max(Cumulative_Pts, na.rm = TRUE)) %>%
  mutate(Driver = "Max Verstappen")

comparison <- bind_rows(verstappen_pts, closest_rival)

ggplot(comparison, aes(x = factor(Season), y = Total_Points, fill = Driver)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Verstappen vs Closest Rival by Season",
       x = "Season",
       y = "Total Points",
       fill = "Driver") +
  theme_minimal()
The 2021 season was genuinely close. Lewis Hamilton and Verstappen finished just 8 points apart after 22 races and it came down to the very last lap of the last race. Then 2023 happened and it was not close at all. By 2025 the bars flip entirely and Norris finishes on top.
Visual 7: Points Progression Round by Round
This tracks how Verstappen and his four closest rivals built up their points throughout each season, one race at a time.
rivals_data %>%
  ggplot(aes(x = Round, y = Cumulative_Pts, color = Driver)) +
  geom_line(linewidth = 1) +
  facet_wrap(~Season) +
  labs(title = "Points Progression Round by Round",
       subtitle = "Verstappen vs Top Rivals (2021-2025)",
       x = "Round",
       y = "Cumulative Points",
       color = "Driver") +
  theme_minimal() +
  theme(legend.position = "bottom")
This chart does a good job of showing the feel of each season. In 2021 the lines stay close together the whole way through. In 2023 Verstappen pulls away from literally everyone almost immediately and just keeps going. You can see exactly when the 2025 season turned because there is a point where Norris’s line crosses Verstappen’s and does not come back.
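The `rivals_data` frame used for this chart is not defined in the code shown. A minimal sketch of one way to build it is below: pick the top drivers by final points in each season, then keep only their rows. The `toy` data and the `top_n_drivers` setting are hypothetical; the report itself would use 5 (Verstappen plus four rivals), assuming he is always in the top five.

```r
library(dplyr)

# Toy stand-in for the scraped standings (values are made up)
toy <- tibble(
  Season = 2021,
  Driver = rep(c("A", "B", "C"), each = 2),
  Round = rep(1:2, times = 3),
  Cumulative_Pts = c(25, 50, 18, 36, 15, 15)
)

top_n_drivers <- 2  # the report would use 5

# Rank drivers by their final points total within each season
top_drivers <- toy %>%
  group_by(Season, Driver) %>%
  summarise(Final_Pts = max(Cumulative_Pts), .groups = "drop") %>%
  group_by(Season) %>%
  slice_max(Final_Pts, n = top_n_drivers, with_ties = FALSE) %>%
  ungroup() %>%
  select(Season, Driver)

# Keep every round for the selected drivers only
rivals_data <- toy %>%
  semi_join(top_drivers, by = c("Season", "Driver"))

print(rivals_data)
```

Selecting drivers per season matters here because the closest rivals change from year to year.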
Part 3: Putting Both Datasets Together
The two datasets basically confirm the same thing from different angles. The race-level data tells us that starting position is a strong predictor of the race result, even in rain. The standings data fits that picture: Verstappen’s championships were built on starting at the front, converting that into wins, and almost never retiring from a race.
2024 is probably the most interesting year to look at across both datasets. From the race data it had the lowest peak win count of the whole hybrid era which means no single driver just ran away with it. From the standings Verstappen’s gap over his closest rival was a lot smaller than it was in 2022 or 2023. And the qualifying data shows that number 1 drivers across the whole grid were less dominant over their teammates in 2024 as well. Everything kind of points the same direction. The racing got more competitive everywhere at the same time and Red Bull was just not the best team on the grid anymore.
tibble(
  Finding = c(
    "Top 3 starters who podiumed in rainy 2024 races",
    "No. 1 driver qualifying edge in 2024",
    "Most wins in a single season",
    "Verstappen 2023 points efficiency",
    "Verstappen 2023 win rate",
    "Verstappen 2025 points efficiency"
  ),
  Value = c(
    "80%",
    "Lowest count of the hybrid era",
    "2023 with 19 wins",
    paste0(summary_table$Points_Pct[summary_table$Season == 2023], "%"),
    paste0(round(summary_table$Wins[summary_table$Season == 2023] /
                   summary_table$Races[summary_table$Season == 2023] * 100, 1), "%"),
    paste0(summary_table$Points_Pct[summary_table$Season == 2025], "%")
  ),
  Source = c("Primary Data", "Primary Data", "Primary Data",
             "Scraped Standings", "Scraped Standings", "Scraped Standings")
) %>%
  gt() %>%
  tab_header(title = "Key Findings from Both Datasets")
Key Findings from Both Datasets

| Finding | Value | Source |
|---|---|---|
| Top 3 starters who podiumed in rainy 2024 races | 80% | Primary Data |
| No. 1 driver qualifying edge in 2024 | Lowest count of the hybrid era | Primary Data |
| Most wins in a single season | 2023 with 19 wins | Primary Data |
| Verstappen 2023 points efficiency | 94.7% | Scraped Standings |
| Verstappen 2023 win rate | 86.4% | Scraped Standings |
| Verstappen 2025 points efficiency | 64.8% | Scraped Standings |
Conclusion
I already knew Verstappen was dominant going into this project but actually looking at the numbers made it feel a lot more real. Winning 19 races in a season sounds impressive when you hear it. Seeing it on a bar chart next to every other season in the hybrid era is a completely different thing. It honestly does not look like it belongs in the same graph. The race data helps explain why that kind of dominance even happens. Starting from the front matters more than a lot of casual fans probably realize and it still matters even when it rains. If your car is fast enough to qualify at the front it is usually fast enough to stay there once the race starts. Verstappen’s Red Bull was that car for most of 2022 and basically all of 2023.
The interesting thing about 2024 and 2025 is that the drop shows up in basically every part of the data. His win total went down, his points efficiency went down, and number 1 drivers across the whole grid were less dominant over their teammates in qualifying. All of that happening at the same time makes me think it was not really Red Bull getting worse, it was more just the whole field catching up. Norris ended up taking the most out of it and won the 2025 championship. Whether Verstappen is a hero or a villain probably just depends on who you were rooting for. But statistically what he did from 2021 through 2023 is pretty hard to argue with. That is just a really dominant stretch of racing.
Primary data hosted on SharePoint. Championship standings scraped from f1-fansite.com using rvest in R.