During the Formula 1 2022 season, fans of the sport had an opportunity to celebrate the 100th anniversary of a historic Italian racetrack known as the Monza Circuit. I’ve been a follower of F1 for a little under 10 years, this celebration piqued my curiosity about the history of the circuits and the sport itself. As such, I saw this assignment as an opportunity to uncover that story through data.


Overview

Taking time to understand the data available I chose to break the story down into 3 key areas—the history and origins of Formula 1 tracks, how those tracks and the sport have evolved over time, and whether there is any advantage for individuals and teams racing at their home circuit.

In terms of the visuals and report, I sought to follow key design principles and best practices. I utilized a bold high contrast color palette and kept colors consistent across circuits. I modified the default r markdown template to allow visuals to occupy more space and make it easier for the user to read. Visuals have been simplified, removing any clutter not vital for understanding key takeaways. Where possible I’ve leveraged interactivity to give the user control and allow them to explore both the macro and micro views of the data.

Dataset

I analyzed F1 data from http://ergast.com/mrd/db/. The full dataset contains 14 different csv formatted data tables and provides F1 race data from 1950 to present day. Data is updated within 2 to 6 hours after each race. The largest table is the results.csv, which contains 25,740 rows of data for the race results of each driver. Each table is connected by a series of keys such as driverId, raceId, circuitId, constructorId, and resultsId. These keys were utilized when joining data to create the final data frames for each visual. In this research, I’ve used 6 of the 14 tables—circuits.csv, drivers.csv, lap_times.csv, races.csv, results.csv, and constructors.csv. The table below outlines the variables contained in file.

circuits.csv drivers.csv lap_times.csv races.csv results.csv constructors.csv
circuitId driverId raceId raceId resultsId constructorId
circuitRef driverRef driverId year raceId constructorRef
name number lap round driverId name
location code position circuitId constructorId nationality
country forename time name number url
lat surname milliseconds date grid
lng dob time position
alt nationality url positionText
url url fp1_date positionOrder
fp1_time points
fp2_date laps
fp2_time time
fp3_date milliseconds
fp3_time fastestLap
quali_date rank
quali_time fastestLapTime
sprint_date fastestLapSpeed
sprint_time statusId

01. History and Origins

What are the oldest and most used tracks in F1? How do they differ? Where are they located and how many have there been? This section seeks to shed some light on these questions utilizing a dual plot graph and a bubble map to represent key data.

Most popular tracks with top speed on each track.

This dual plot chart utilizes a bar chart—to quickly show the user the most utilized tracks in Formula 1 history—and an additional layer of plotted points showing circuits’ top speed. With this top speed layer the chart begins to illustrate the nature of these top tracks. To gather this data I utilized the key values to merge tables. This allowed me to generate a data frame where I then summarized the number of races at each circuit and identified the fastest speeds using the max function. The order of the bars is determined by the number of races at each circuit. The colors of the top 5 tracks from this diagram are consistent in visualizations throughout the report. Names of circuits were relatively long, so to enhance readability I chose to display them at an angle. The legend, grid lines, and axis lines have been removed to eliminate any distractions from that data.

circuitCount <- data.frame(count(main_df, circuitName))
circuitCount <- circuitCount[order(circuitCount$n, decreasing = TRUE),]
circuitCount$n <- as.numeric(circuitCount$n)

circuitTopSpeed <- main_df %>%
  filter(circuitName %in% circuitCount[1:10,]$circuitName, !is.na(fastestLapSpeed)) %>%
  group_by(circuitName) %>%
  summarize(fastestLapSpeed = max(fastestLapSpeed)) %>%
  data.frame()

ylab <- seq(0, max(circuitTopSpeed$fastestLapSpeed)+40, 40)
my_labels <- paste0("", ylab, "km  ")

ggplot(circuitCount[1:10,], aes(x = reorder(circuitName, -n), y = n, fill = reorder(circuitName, -n))) +
  guides(fill = "none") +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 Circuits by Number of F1 Races", x = "", y = "Total Number of Races") +
  # scale_fill_brewer(palette="Spectral") +
  scale_fill_manual(values = c(
    "#0AC92D",
    "#F9CD05", 
    "#3498db",
    "#e74c3c", 
    "#FF6736", 
    "#00BC8C", 
    "#0BD468", 
    "#f39c12", 
    "#FF7070", 
    "#2ED7F2"
  )) +
  scale_x_discrete(labels = function(x) lapply(strwrap(x, width = 15, simplify = FALSE), paste, collapse = "\n")) +
  geom_point(inherit.aes = FALSE,
             data = circuitTopSpeed,
             aes(x = circuitName, y = fastestLapSpeed/4, group = 1.2),
             size = 8,
             alpha = 0.8,
             color = "#f1f1f1") +
  scale_color_manual(NULL, values = "#ffffff") +
  scale_y_continuous(labels=comma, sec.axis = sec_axis(~. *4, name = "Top Speed at Circuit", labels = my_labels, breaks = ylab)) +
  theme_dark(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 21),
    plot.background = element_rect(fill = "#222222", size = 0),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.title.y = element_text(angle = 90, margin = margin(r = 6, l = 6), size = rel(1.3)),
    axis.line = element_blank(),
    panel.background = element_rect(fill = "transparent", colour = "transparent" ),
    legend.position = c(.98, .98),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6),
    text = element_text(colour = "#ffffff"),
    axis.text.y = element_text(colour = "#ffffff", size = rel(1.2), margin = margin(r = 6, l = 6)),
    axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16)
    )

Areas of Interest

Monza vs. Monaco

As you can see Autodromo Nazionale di Monza (Monza) and Circuit de Monaco (Monaco) have been in F1 through most of its history, both seeing their first F1 race from the initial season in 1950. However, you’ll notice the top speed of each track is drastically different. Why is that? Monza is a historic racetrack, which dates back to the 1920s. It’s known for its simplicity having long straights and only 6 curves. It has been the fastest F1 track since 1991.

Aerial View of Monza (Source):

Aerial view of Monza F1 circuit

Monaco on the other hand is a street circuit. The race is performed through the closed off narrow streets of Monaco. The track contains about 19 corners, many are extremely tight with little to no room for error. As you can imagine, this leaves drivers with a desire to be conservative with their speed in order to avoid race ending crashes. It’s also worth noting Monaco is one of F1’s most popular events attracting the who’s who of celebrity attendees every year.

Map of Monaco Track (Source):

Map of Monaco F1 circuit


F1 Circuits Around the World

This interactive bubble map utilizes circuit location data to showcase where races are held across the world. The size of each bubble and its color represent the number of races which have occurred at a particular circuit. I took it a step further to provide hover interactivity for users to specifically explore the names, city, country, and number of races for each track shown. My desire was to eliminate nearly all map details, (labels, streets, roads, landmarks, etc.) using it simply as a reference point and backdrop for the user. After exploring the provider tile options available with the leaflet approach, I decided to pivot. Instead, I used an approach leveraging the packages maps and mapproj. This uses data provided as part of the maps package to draw countries and regions as shapes, which can be styled with ggplot.

circuitLocations <- main_df %>%
  group_by(circuitName, location, country, circuitURL, lng, lat) %>%
  dplyr::summarise(numRaces = length(circuitName), .groups = "keep") %>%
  arrange(numRaces) %>%
  dplyr::mutate( hoverText = paste(
    "<span style='font-weight: 800; font-style: bold;'>", circuitName, "</span>\n<em style='font-weigth: 400; font-style: normal;'>", location, ", ", country, "</em>\nRaces: ", numRaces, sep="")
  ) %>%
  data.frame()

World <- map_data("world")

colorBreaks <- seq(min(circuitLocations$numRaces), max(circuitLocations$numRaces),10)

p <- circuitLocations %>%
  ggplot() +
  geom_polygon(data = World, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.3) +
  geom_point( aes(x = lng, y = lat, size = numRaces, color = numRaces, text = hoverText, alpha = numRaces) ) +
  scale_size_continuous(range = c(1,12)) +
  scale_alpha_continuous(trans = "log") +
  scale_color_viridis(option = "mako", trans = "log") +
  theme_void() + ylim(-75, 90) + xlim(-168,192) + coord_map() +
  guides( colour = guide_legend()) +
  ggtitle("Circuits by Country and Number of Races") +
  theme(
    legend.position = "none",
    text = element_text(color = "#FFFFFF"),
    plot.background = element_rect(fill = "#222222", size = 0),
    panel.background = element_rect(fill = "transparent", colour = "transparent" ),
    axis.line = element_blank(),
    plot.title = element_text(
            size= 16,
            hjust = 0.5,
            color = "#ffffff",
            margin = margin(t = 24, r = 12, b = 12, l = 12, unit = "pt")
            )
    )

fontStyles <- list(
  family = "Helvetica",
  size = "14",
  color = "#FFFFFF"
)

p <- ggplotly(p, tooltip = "text") %>%
  style(hoverlabel = list( bgcolor = "#3b3b3b", bordercolor = "transparent", font = fontStyles))

p
Areas of Interest

Logevity of Global Circuits

I was not surprised to see a heavy presence of historical circuits throughout Europe. It was interesting to see how many of the non-European tracks still on the schedule today, such as Tokyo, Japan, Sāo Paulo, Brazil, and Montreal, Canada, have over 30 years of F1 race history.

Range of Locations

This visualization also showed quite a few regions which I was not aware were involved in F1, including India, California, and a street track in Adelaide, Australia.


02. Evolution

This section utilizes line charts to look at how the top tracks might have changed over the years based on lap times and number of laps. The visuals help to identify key moments where significant events or shifts in the sport may have occurred.

1996 - 2022 Lap Times and Number of Laps for Top 5 Circuits

The dataset only provides lap time data from 1996 to 2022, because of this I chose to omit dates prior to 1996 to create an easier comparison between charts. Both line plots utilize similar techniques and designs. The colors of each circuit match the corresponding color in the initial top circuits bar graph. Chart elements not vital to the understanding the data have been removed to allow the user to focus on the data. I made the decision to remove the axis titles as the user should have enough information from the title to understand the values being presented.

fastestLapYear_df <- main_df %>%
  filter(circuitName %in% circuitCount[1:5,]$circuitName, !is.na(milliseconds)) %>%
  select(year, circuitName, fastestLapTime, milliseconds) %>%
  group_by(year, circuitName) %>%
  filter(milliseconds == min(milliseconds)) %>%
  data.frame()

fastestLapYear_df$year <- as.factor(fastestLapYear_df$year)
fastestLapYear_df$time <- as.difftime(fastestLapYear_df$milliseconds/1000, units = 'secs')

ggplot(fastestLapYear_df, aes(x = year, y = time, group = circuitName)) +
  geom_line(aes(color = circuitName), size = 4) +
  scale_color_manual(values = c(
    "#0AC92D", 
    "#F9CD05", 
    "#e74c3c", 
    "#FF6736", 
    "#3498db"
  )) +
  labs(title = "Fastest Lap Times by Year on Top 5 Circuits", x = "", y = "", color = "") +
  scale_y_time(labels = function(l) strftime(l, '%M:%OS')) +
  geom_point(shape = 21, size = 5, color = "transparent", fill = "white", alpha = 0.7) +
  theme_dark() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 21, colour = "#ffffff"),
    panel.background = element_rect(fill = "transparent", colour = "transparent" ),
    plot.background = element_rect(fill = "#222222", size = 0),
    panel.border = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "#464545"), 
    text = element_text(color = "#ffffff"),
    axis.text.y = element_text(colour = "#ffffff", size = rel(1.8), margin = margin(r = 6, l = 6)),
    axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16),
    axis.line = element_blank(),
    legend.background = element_rect(fill="#303030", size=.5, linetype="dotted"),
    legend.text = element_text(color = "#ffffff", size = 18),
    legend.key = element_rect(fill = 'transparent', size = 0.5),
    legend.position = c(.04, .92),
    legend.justification = c("left", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6),
    legend.title = element_blank(),
    legend.key.size = unit(28, "pt")
  )


numLapYear_df <- main_df %>%
  filter(circuitName %in% circuitCount[1:5,]$circuitName, !is.na(milliseconds)) %>%
  left_join(results_df, by = "raceId") %>%
  select(year, circuitName, laps) %>%
  group_by(year, circuitName) %>%
  filter(laps == max(laps)) %>%
  data.frame()

numLapYear_df$year <- as.factor(numLapYear_df$year)

ggplot(numLapYear_df, aes(x = year, y = laps, group = circuitName)) +
  geom_line(aes(color = circuitName), size = 4) +
  scale_color_manual(values = c(
    "#0AC92D", 
    "#F9CD05", 
    "#e74c3c", 
    "#FF6736", 
    "#3498db"
  )) +
  labs(title = "Number of Race Laps by Year on Top 5 Circuits", x = "", y = "", color = "") +
  geom_point(shape = 21, size = 5, color = "transparent", fill = "white", alpha = 0.7) +
  theme_dark() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 21, colour = "#ffffff"),
    panel.background = element_rect(fill = "transparent", colour = "transparent" ),
    plot.background = element_rect(fill = "#222222", size = 0),
    panel.border = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "#464545"), 
    text = element_text(color = "#ffffff"),
    axis.text.y = element_text(colour = "#ffffff", size = rel(1.8), margin = margin(r = 6, l = 6)),
    axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16),
    axis.line = element_blank(),
    legend.background = element_rect(fill="#303030", size=.5, linetype="dotted"),
    legend.text = element_text(color = "#ffffff", size = 18),
    legend.key = element_rect(fill = 'transparent', size = 0.5),
    legend.position = c(.04, .4),
    legend.justification = c("left", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6),
    legend.title = element_blank(),
    legend.key.size = unit(28, "pt")
  )

Areas of Interest

Steady Lap Times

One might think F1 car builders and drivers would continually improve lap times with experience and new innovations. However, the line plot shows lap times going both up and down through the years. This is due to continual changes in rules and car requirements enforced by the FIA, the governing body of F1, to keep the sport competitive and enhance safety for both drivers and crew. A few examples where changes have had a clear impact on the speed of the cars include:

  • 2010:Refueling during races was banned, which required cars to need larger tanks.
  • 2011:Lap times are increased again with the introduction of DRS (Drag Reduction System), a system that intentionally puts aerodynamic drag on the rear spoiler of cars. This slowed cars down but made it more exciting as drivers could open the wing an reduce the drag for a speed boost at key moments in the race.

    DRS System (Source): Drag reduction system shown with Red Bull and Ferrari
  • 2014:New V6 turbo hybrid engines replaced the V8 engines
  • 2022:Budget caps were introduced for teams in 2021 along with new restrictions for car designs which increase the wheel size and restrict usage of some aerodynamic designs.

    Car Designs 2021 vs. 2022 (Source): Ferrari 2021 vs. 2022

What happened at spa in 2021?

At Circuit de Spa during the 2021 Belgian Grand Prix, extremely heavy rain causes the race to be red flagged, leaving drivers to stand ready for the race to resume. Unfortunately, the rain never let up and the race clock expired. Earlier in the race, cars made a few laps around the track following behind the safety car at a drastically reduced speed. The FIA decided to keep the results of the race—including the incredibly show fastest lap time—and award half points to drivers.

Silverstone

In 2010 we see drastic changes in lap times and number of laps at Britain’s Silverstone Circuit. This was due to a major track redesign introducing an infield area known as the “Arena” area.

Silverstone Circuit 2009 (Source): Diagram of Silverstone Circuit prior to 2010

Silverstone Circuit 2010 (Source): Diagram of Silverstone Circuit 2010 redesign


03. Homefield Advantage?

In many other sports teams and players will have homefield advantage. This may be due to knowing the nuances of their home arena or the boost in performance they receive on game day from the cheering fans. Whatever the cause for this homefield advantage, I wondered if the same held true for Formula 1. The doughnut charts and sparklines below explore this topic.

Drivers’ Podium Finishes by Nationality – Overall vs. at Home Circuit

This series of doughnut charts allow users to compare overall results of drivers (i.e. podium finishes—first, second, or third place) with a particular nationality versus the results at their home track. The data looks only at results for each instance when a driver of a particular nationality competed in a race, there for eliminating any years or races where there may not have been a driver of that nationality racing. In many cases there can also be multiple drivers of the same nationality competing in a race (e.g. Lance Stroll and Nicholas Latifi are two current Canadian drivers on separate teams). Additionally, I limited the visualizations to driver with nationalities from the top 5 circuits.

In terms of design choices, I continued utilizing specific colors for the circuits that were consistent with previous visualizations and removed clutter such as small labels and legends. To avoid the known issues users have when comparing sizes of slices from multiple doughnut and pie charts, I applied very large clear labels containing the values of the slices being compared.

top5Circuits <- circuitCount[1:5,] %>%
  left_join(circuits_df, by = c("circuitName" = "name")) %>%
  select(circuitName, n, circuitId, country) %>%
  data.frame()

top5CircuitCountries <-  c("Italian", "Monegasque", "British", "Belgian", "Canadian")
top5Circuits$nationality <- top5CircuitCountries

nationalityResults_df <- results_df %>%
  left_join(drivers_df, by = "driverId") %>%
  filter(nationality %in% top5CircuitCountries) %>%
  select(resultId, raceId, driverId, positionOrder, forename, surname, nationality) %>%
  mutate(podium = ifelse(positionOrder %in% 1:3, "Y", "N")) %>%
  group_by(nationality, podium) %>%
  summarise(n = length(podium), .groups = "keep") %>%
  group_by(nationality) %>%
  mutate(percent_of_total = round(100*n/sum(n), 1)) %>%
  ungroup() %>%
  data.frame()

homefieldResults_df <- results_df %>%
  left_join(drivers_df, by = "driverId") %>%
  left_join(races_df, by = "raceId") %>%
  filter(nationality %in% top5CircuitCountries & circuitId %in% top5Circuits$circuitId) %>%
  left_join(top5Circuits, by = "circuitId") %>%
  select(resultId, raceId, driverId, circuitId, circuitName, country, positionOrder, nationality.x) %>%
  rename(driverNationality = nationality.x) %>%
  mutate(podium = ifelse(positionOrder %in% 1:3, "Y", "N")) %>%
  group_by(driverNationality, circuitId, country, podium) %>%
  summarise(n = length(podium), .groups = "keep") %>%
  group_by(driverNationality, circuitId) %>%
  mutate(percent_of_total = round(100*n/sum(n), 1)) %>%
  ungroup() %>%
  data.frame()

Italian

Italian Driver Podium Finishes by Appearance

homefield_italy <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
  add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Italian",],
            values = ~nationalityResults_df[nationalityResults_df$nationality == "Italian", "n"], 
            title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #00BC8C;'>6.06%</span><br>Italian Driver Podiums Overall</span>",
            domain = list(row = 0, column = 0),
            textinfo = "none",
            marker = list(colors = c("#464545", "#00BC8C"))) %>%
  add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Italian" & homefieldResults_df$country == "Italy",],
            values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Italian" & homefieldResults_df$country == "Italy", "n"],
            title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #00BC8C;'>6.56%</span><br>...at Autodromo Nazionale di Monza</span>",
            domain = list(row = 0, column = 1),
            textinfo = "none",
            hoverinfo = "text") %>%
  layout(
    # title = list(text = "Italian Podium Finishes<br>by Appearance", font = list(size = 21, color = "#FFFFFF"), y = 0.92),
    showlegend = FALSE, 
    grid = list(rows = 1, columns = 2), 
    plot_bgcolor = "#222222", 
    paper_bgcolor = "#222222", 
    fg_bgcolor = "#222222"
    )

homefield_italy

Monegasque

Monegasque Drivers Podium Finishes by Appearance

homefield_monaco <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
  add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Monegasque",],
          values = ~nationalityResults_df[nationalityResults_df$nationality == "Monegasque", "n"], 
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #F9CD05;'>17.3%</span><br>Monegasque Driver Podiums Overall</span>",
          domain = list(row = 0, column = 0),
          textinfo = "none",
          marker = list(colors = c("#464545", "#F9CD05"))) %>%
  add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Monegasque" & homefieldResults_df$country == "Monaco",],
          values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Monegasque" & homefieldResults_df$country == "Monaco", "n"],
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #F9CD05;'>9.09%</span><br>...at Circuit de Monaco</span>",
          domain = list(row = 0, column = 1),
          textinfo = "none") %>%
  layout(
        # title = "Monegasque Podium Finishes by Appearance",
         showlegend = FALSE, 
         grid = list(rows = 1, columns = 2),
         plot_bgcolor = "#222222",
         paper_bgcolor = "#222222",
         fg_bgcolor = "#222222"
         )

homefield_monaco

British

British Drivers Podium Finishes by Appearance

homefield_uk <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
  add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "British",],
          values = ~nationalityResults_df[nationalityResults_df$nationality == "British", "n"], 
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #3498db;'>16.6%</span><br>British Driver Podiums Overall</span>",
          domain = list(row = 0, column = 0),
          textinfo = "none",
          marker = list(colors = c("#464545", "#3498db"))) %>%
  add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "British" & homefieldResults_df$country == "UK",],
          values = ~homefieldResults_df[homefieldResults_df$driverNationality == "British" & homefieldResults_df$country == "UK", "n"],
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #3498db;'>14.05%</span><br>...at Silverstone Circuit</span>",
          domain = list(row = 0, column = 1),
          textinfo = "none") %>%
  layout(
    # title = "British Podium Finishes by Appearance",
    showlegend = FALSE, 
    grid = list(rows = 1, columns = 2), 
    plot_bgcolor = "#222222", 
    paper_bgcolor = "#222222", 
    fg_bgcolor = "#222222"
    )

homefield_uk

Belgian

Belgian Drivers Podium Finishes by Appearance

homefield_belgium <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
  add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Belgian",],
          values = ~nationalityResults_df[nationalityResults_df$nationality == "Belgian", "n"], 
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #e74c3c;'>7.61%</span><br>Belgian Driver Podiums Overall</span>",
          domain = list(row = 0, column = 0),
          textinfo = "none",
          marker = list(colors = c("#464545", "#e74c3c")))%>%
  add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Belgian" & homefieldResults_df$country == "Belgium",],
          values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Belgian" & homefieldResults_df$country == "Belgium", "n"],
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #e74c3c;'>5.45%</span><br>...at Circuit de Spa-Francorchamps</span>",
          domain = list(row = 0, column = 1),
          textinfo = "none") %>%
  layout(
    # title = "Belgian Podium Finishes by Appearance",
    showlegend = FALSE, 
    grid = list(rows = 1, columns = 2), 
    plot_bgcolor = "#222222", 
    paper_bgcolor = "#222222", 
    fg_bgcolor = "#222222")

homefield_belgium

Canadian

Canadian Drivers Podium Finishes by Appearance

homefield_canada <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
  add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Canadian",],
          values = ~nationalityResults_df[nationalityResults_df$nationality == "Canadian", "n"], 
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #FF6736;'>8.82%</span><br>Canadian Driver Podiums Overall</span>",
          domain = list(row = 0, column = 0),
          textinfo = "none",
          marker = list(colors = c("#464545", "#FF6736")))%>%
  add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Canadian" & homefieldResults_df$country == "Canada",],
          values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Canadian" & homefieldResults_df$country == "Canada", "n"],
          title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #FF6736;'>19%</span><br>...at Circuit Gilles Villeneuve</span>",
          domain = list(row = 0, column = 1),
          textinfo = "none") %>%
  layout(
    # title = "Canadian Podium Finishes by Appearance",
    showlegend = FALSE, 
    grid = list(rows = 1, columns = 2), 
    plot_bgcolor = "#222222", 
    paper_bgcolor = "#222222", 
    fg_bgcolor = "#222222")

homefield_canada
Areas of Interest

Lack of Correlation

Overall the results didn’t appear to show a strong correlation between a driver’s nationality and success at their home circuit with the exception of possibly Canada. This might be due to the fact that the majority of popular tracks are in Europe and within close proximity of one another. It would be worth further investigation to see if there are similar results for other non-European drivers such as American, Brazilian, Japanese, and Australian drivers.

Number of Monegasque Drivers

There are only 4 drivers from Monaco in Formula 1 history, Olivier Beretta, André Testut, Louis Chiron, and Charles Leclerc. However, I was surprised to see even that many due to the size and smaller population of the country. The 4 also span many decades across the bulk of the sport’s history.


Constructor Podium Finishes – Overall vs. at Home Circuit

Looking beyond drivers I wondered if there might be a correlation of podiums and home circuits for constructors (teams) located in a particular country. As opposed to drivers, it is less common for every country that has a circuit to have an auto manufacturer who competes in F1. So for this set of visuals, I deviated from starting with the top circuits and instead started with the top countries where constructors from that country have competed over the most F1 seasons. Filtering it down further, I only including years when there was a race on a circuit from that country.

Each country represented has two win/loss sparklines and a podium win percentage of all results shown. The top sparkline shows all races a constructor from that country competed in. When there is an upper green line, it means at least one constructor from that country got a podium position in that race. The lower gray line represents when they did not achieve a podium. The lower sparkline shows the same results but only for the races at the constructors’ home circuit. I’ve provided hover over interactivity to the graphic so the user can see the year and race name of each result shown. Additionally, the sparklines scroll horizontally as one unit so the user can make a quick comparison between overall versus home circuit results.

IMPORTANT NOTE: If the sparklines are not showing or hover functionality isn’t working, please refresh the page. The output sparkline package seems to have some conflicts between other packages utilized in the R markdown file. Additionally, don’t forget to horizontally scroll to view all the data in the visualization.

top5Constructors_df <- results_df %>%
  left_join(constructors_df, by = "constructorId") %>%
  group_by(nationality) %>%
  summarise(n = length(nationality), .groups = "keep") %>%
  arrange(-n) %>%
  data.frame()
top5Constructors_df <-top5Constructors_df$nationality[1:5]
constructorHomefields <- c("UK", "Italy", "France", "Switzerland", "Germany")

constructorResults_df <- results_df %>%
  left_join(constructors_df, by = "constructorId") %>%
  left_join(races_df, by = "raceId") %>%
  left_join(circuits_df, by = "circuitId") %>%
  filter(nationality %in% top5Constructors_df) %>%
  select(resultId, raceId, name.y, constructorId, positionOrder, nationality, year, country) %>%
  rename(constructorNationality = nationality, circuitCountry = country, raceName = name.y) %>%
  mutate(podium = ifelse(positionOrder %in% 1:3, 1, 0)) %>%
  data.frame()

yearsWithHomefield <- constructorResults_df %>%
  filter(circuitCountry == "UK") %>%
  distinct(year)

yearsWithHomefield <- yearsWithHomefield$year

britishResults_df <- constructorResults_df %>%
  filter(constructorNationality == "British" & year %in% yearsWithHomefield) %>%
  group_by(constructorNationality, year, raceName, circuitCountry) %>%
  summarise(n = sum(podium), .groups = "keep") %>%
  data.frame()

britishResults_df$n <- replace(britishResults_df$n, britishResults_df$n == 0, -1)
britishResults_df$n <- replace(britishResults_df$n, britishResults_df$n > 0, 1)

britishPodiums <- round((count(britishResults_df[britishResults_df$n == 1,])/count(britishResults_df))*100, digits = 2)
britishHomePodiums <- round((count(britishResults_df[britishResults_df$circuitCountry == "UK" & britishResults_df$n == 1,])/count(britishResults_df[britishResults_df$circuitCountry == "UK",]))*100, digits = 2)

britishHomeResults_df <- britishResults_df
britishHomeResults_df$n[britishHomeResults_df$circuitCountry != "UK"] <- 0

UK_spark <- sparkline(
  x = britishResults_df$raceId,
  y = britishResults_df$n,
  values = britishResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width =  nrow(britishResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(britishResults_df$year), " ", as.character(britishResults_df$raceName)))
    )
  ))

UKhome_spark <- sparkline(
  x = britishHomeResults_df$raceId,
  y = britishHomeResults_df$n,
  values = britishHomeResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width =  nrow(britishResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(britishResults_df$year), " ", as.character(britishResults_df$raceName)))
    )
  ))

# Italy

yearsWithHomefield <- constructorResults_df %>%
  filter(circuitCountry == "Italy") %>%
  distinct(year)

yearsWithHomefield <- yearsWithHomefield$year

italianResults_df <- constructorResults_df %>%
  filter(constructorNationality == "Italian" & year %in% yearsWithHomefield) %>%
  group_by(constructorNationality, year, raceName, circuitCountry) %>%
  summarise(n = sum(podium), .groups = "keep") %>%
  data.frame()

italianResults_df$n <- replace(italianResults_df$n, italianResults_df$n == 0, -1)
italianResults_df$n <- replace(italianResults_df$n, italianResults_df$n > 0, 1)

italianPodiums <- round((count(italianResults_df[italianResults_df$n == 1,])/count(italianResults_df))*100, digits = 2)
italianHomePodiums <- round((count(italianResults_df[italianResults_df$circuitCountry == "Italy" & italianResults_df$n == 1,])/count(italianResults_df[italianResults_df$circuitCountry == "Italy",]))*100, digits = 2)

italianHomeResults_df <- italianResults_df
italianHomeResults_df$n[italianHomeResults_df$circuitCountry != "Italy"] <- 0

italy_spark <- sparkline(
  x = italianResults_df$raceId,
  y = italianResults_df$n,
  values = italianResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width =  nrow(italianResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(italianResults_df$year), " ", as.character(italianResults_df$raceName)))
    )
  ))

italyhome_spark <- sparkline(
  x = italianHomeResults_df$raceId,
  y = italianHomeResults_df$n,
  values = italianHomeResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width = nrow(italianResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(italianResults_df$year), " ", as.character(italianResults_df$raceName)))
    )
  ))

# France

yearsWithHomefield <- constructorResults_df %>%
  filter(circuitCountry == "France") %>%
  distinct(year)

yearsWithHomefield <- yearsWithHomefield$year

frenchResults_df <- constructorResults_df %>%
  filter(constructorNationality == "French" & year %in% yearsWithHomefield) %>%
  group_by(constructorNationality, year, raceName, circuitCountry) %>%
  summarise(n = sum(podium), .groups = "keep") %>%
  data.frame()

frenchResults_df$n <- replace(frenchResults_df$n, frenchResults_df$n == 0, -1)
frenchResults_df$n <- replace(frenchResults_df$n, frenchResults_df$n > 0, 1)

frenchPodiums <- round((count(frenchResults_df[frenchResults_df$n == 1,])/count(frenchResults_df))*100, digits = 2)
frenchHomePodiums <- round((count(frenchResults_df[frenchResults_df$circuitCountry == "France" & frenchResults_df$n == 1,])/count(frenchResults_df[frenchResults_df$circuitCountry == "France",]))*100, digits = 2)

frenchHomeResults_df <- frenchResults_df
frenchHomeResults_df$n[frenchHomeResults_df$circuitCountry != "France"] <- 0

france_spark <- sparkline(
  x = frenchResults_df$raceId,
  y = frenchResults_df$n,
  values = frenchResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width =  nrow(frenchResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(frenchResults_df$year), " ", as.character(frenchResults_df$raceName)))
    )
  ))

francehome_spark <- sparkline(
  x = frenchHomeResults_df$raceId,
  y = frenchHomeResults_df$n,
  values = frenchHomeResults_df$n,
  type = "bar",
  barColor = "black",
  negBarColor = "blue",
  width =  nrow(frenchResults_df) * 7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(frenchResults_df$year), " ", as.character(frenchResults_df$raceName)))
    )
  ))

# Germany

yearsWithHomefield <- constructorResults_df %>%
  filter(circuitCountry == "Germany") %>%
  distinct(year)

yearsWithHomefield <- yearsWithHomefield$year

germanResults_df <- constructorResults_df %>%
  filter(constructorNationality == "German" & year %in% yearsWithHomefield) %>%
  group_by(constructorNationality, year, raceName, circuitCountry) %>%
  summarise(n = sum(podium), .groups = "keep") %>%
  data.frame()

germanResults_df$n <- replace(germanResults_df$n, germanResults_df$n == 0, -1)
germanResults_df$n <- replace(germanResults_df$n, germanResults_df$n > 0, 1)

germanPodiums <- round((count(germanResults_df[germanResults_df$n == 1,])/count(germanResults_df))*100, digits = 2)
germanHomePodiums <- round((count(germanResults_df[germanResults_df$circuitCountry == "Germany" & germanResults_df$n == 1,])/count(germanResults_df[germanResults_df$circuitCountry == "Germany",]))*100, digits = 2)

germanHomeResults_df <- germanResults_df
germanHomeResults_df$n[germanHomeResults_df$circuitCountry != "Germany"] <- 0

germany_spark <- sparkline(
  x = germanResults_df$raceId,
  y = germanResults_df$n,
  values = germanResults_df$n,
  type = "bar",
  width = nrow(germanHomeResults_df)*7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:5" = "#00bc8c", "-1" = "#999999", "0" = "#222222"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(germanResults_df$year), " ", as.character(germanResults_df$raceName)))
    )
  ))

germanyhome_spark <- sparkline(
  x = germanHomeResults_df$raceId,
  y = germanHomeResults_df$n,
  values = germanHomeResults_df$n,
  type = "tristate",
  width = nrow(germanHomeResults_df)*7,
  height = 50,
  barSpacing = 10,
  colorMap = list("1:3" = "#00bc8c", "-1" = "#999999", "0" = "#222222"),
  tooltipFormatter = htmlwidgets::JS(
    sprintf(
      "function(sparkline, options, field){
        return %s[field[0].offset];
      }",
      jsonlite::toJSON(paste0(as.character(germanResults_df$year), " ", as.character(germanResults_df$raceName)))
    )
  ))



# NOTE: I've hidden the code for actually printing out the sparklines for aesthetic purposes. Additionally, expanding / collapsing this code sometimes seems to make the sparklines disappear and require refresh to rerender the charts

British Constructor Podium Finishes

Overall 74.81%

Home Circuit 75.32%

Italian Constructor Podium Finishes

Overall 63.51%

Home Circuit 76.92%

French Constructor Podium Finishes

Overall 20.83%

Home Circuit 26.53%

German Constructor Podium Finishes

Overall 35.87%

Home Circuit 22.22%

Areas of Interest

History of Results

I was very pleased with the way this visual came together. On top of making a quick comparison of results, it tells the story of certain countries teams over time. It does this in a compelling way, which helped to shed light on some stories I’ve heard from F1 before I regularly followed the sport. A few examples you can see in the visuals include:

  • British Constructors:Two major British teams, Williams and McLaren, have been in the sport a long time. Currently, the teams tend to be toward the back of the pack in each and every race—specifically Williams. But historically they’ve dominated the sport and it shows with all the green podium lines in the sparkline visual.
  • Italian Constructors:You don’t need to be a fan of baseball to know the Yankees and you don’t need to watch F1 to know Ferrari. Ferrari sells cars solely for the purpose of racing and being in F1. The sparkline for Italy shows some of the most consistent podium results across the full timeline of F1 history.
  • German Constructors:Since I started watching Formula 1 other constructors have raved about how Mercedes has consistently built better cars and dominated every season. But taking a look at the German constructor sparkline that domination seems to really only be prevalent since around 2014, which was the beginning of the V6 hybrid turbo engine era of F1 cars.

Conclusion

It was said in the introduction to this course that data visualization was a way of storytelling. That very much held true as I worked my way through the assignment. As I ventured through the data—admittedly struggling with the language at first but progressively getting more fluid and familiar with it, until it became almost second nature—a picture began to emerge of a truly global sport, which has been anything but dormant through its 70+ year history. It has weathered through wars, economic down turns, climate change, and now a global pandemic. Visualizing the data has shown how the sport is continually evolving as the world around it evolves. I hope my findings and visualizations pique the interest of others to dive deeper into the intricacies of Formula 1 as it did for me.