During the Formula 1 2022 season, fans of the sport had an opportunity to celebrate the 100th anniversary of a historic Italian racetrack known as the Monza Circuit. I’ve been a follower of F1 for a little under 10 years, this celebration piqued my curiosity about the history of the circuits and the sport itself. As such, I saw this assignment as an opportunity to uncover that story through data.
Taking time to understand the data available I chose to break the story down into 3 key areas—the history and origins of Formula 1 tracks, how those tracks and the sport have evolved over time, and whether there is any advantage for individuals and teams racing at their home circuit.
In terms of the visuals and report, I sought to follow key design principles and best practices. I utilized a bold high contrast color palette and kept colors consistent across circuits. I modified the default r markdown template to allow visuals to occupy more space and make it easier for the user to read. Visuals have been simplified, removing any clutter not vital for understanding key takeaways. Where possible I’ve leveraged interactivity to give the user control and allow them to explore both the macro and micro views of the data.
I analyzed F1 data from http://ergast.com/mrd/db/. The full dataset contains 14 different csv formatted data tables and provides F1 race data from 1950 to present day. Data is updated within 2 to 6 hours after each race. The largest table is the results.csv, which contains 25,740 rows of data for the race results of each driver. Each table is connected by a series of keys such as driverId, raceId, circuitId, constructorId, and resultsId. These keys were utilized when joining data to create the final data frames for each visual. In this research, I’ve used 6 of the 14 tables—circuits.csv, drivers.csv, lap_times.csv, races.csv, results.csv, and constructors.csv. The table below outlines the variables contained in file.
| circuits.csv | drivers.csv | lap_times.csv | races.csv | results.csv | constructors.csv |
|---|---|---|---|---|---|
| circuitId | driverId | raceId | raceId | resultsId | constructorId |
| circuitRef | driverRef | driverId | year | raceId | constructorRef |
| name | number | lap | round | driverId | name |
| location | code | position | circuitId | constructorId | nationality |
| country | forename | time | name | number | url |
| lat | surname | milliseconds | date | grid | |
| lng | dob | time | position | ||
| alt | nationality | url | positionText | ||
| url | url | fp1_date | positionOrder | ||
| fp1_time | points | ||||
| fp2_date | laps | ||||
| fp2_time | time | ||||
| fp3_date | milliseconds | ||||
| fp3_time | fastestLap | ||||
| quali_date | rank | ||||
| quali_time | fastestLapTime | ||||
| sprint_date | fastestLapSpeed | ||||
| sprint_time | statusId |
What are the oldest and most used tracks in F1? How do they differ? Where are they located and how many have there been? This section seeks to shed some light on these questions utilizing a dual plot graph and a bubble map to represent key data.
This dual plot chart utilizes a bar chart—to quickly show the user the most utilized tracks in Formula 1 history—and an additional layer of plotted points showing circuits’ top speed. With this top speed layer the chart begins to illustrate the nature of these top tracks. To gather this data I utilized the key values to merge tables. This allowed me to generate a data frame where I then summarized the number of races at each circuit and identified the fastest speeds using the max function. The order of the bars is determined by the number of races at each circuit. The colors of the top 5 tracks from this diagram are consistent in visualizations throughout the report. Names of circuits were relatively long, so to enhance readability I chose to display them at an angle. The legend, grid lines, and axis lines have been removed to eliminate any distractions from that data.
circuitCount <- data.frame(count(main_df, circuitName))
circuitCount <- circuitCount[order(circuitCount$n, decreasing = TRUE),]
circuitCount$n <- as.numeric(circuitCount$n)
circuitTopSpeed <- main_df %>%
filter(circuitName %in% circuitCount[1:10,]$circuitName, !is.na(fastestLapSpeed)) %>%
group_by(circuitName) %>%
summarize(fastestLapSpeed = max(fastestLapSpeed)) %>%
data.frame()
ylab <- seq(0, max(circuitTopSpeed$fastestLapSpeed)+40, 40)
my_labels <- paste0("", ylab, "km ")
ggplot(circuitCount[1:10,], aes(x = reorder(circuitName, -n), y = n, fill = reorder(circuitName, -n))) +
guides(fill = "none") +
geom_bar(stat = "identity") +
labs(title = "Top 10 Circuits by Number of F1 Races", x = "", y = "Total Number of Races") +
# scale_fill_brewer(palette="Spectral") +
scale_fill_manual(values = c(
"#0AC92D",
"#F9CD05",
"#3498db",
"#e74c3c",
"#FF6736",
"#00BC8C",
"#0BD468",
"#f39c12",
"#FF7070",
"#2ED7F2"
)) +
scale_x_discrete(labels = function(x) lapply(strwrap(x, width = 15, simplify = FALSE), paste, collapse = "\n")) +
geom_point(inherit.aes = FALSE,
data = circuitTopSpeed,
aes(x = circuitName, y = fastestLapSpeed/4, group = 1.2),
size = 8,
alpha = 0.8,
color = "#f1f1f1") +
scale_color_manual(NULL, values = "#ffffff") +
scale_y_continuous(labels=comma, sec.axis = sec_axis(~. *4, name = "Top Speed at Circuit", labels = my_labels, breaks = ylab)) +
theme_dark(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, size = 21),
plot.background = element_rect(fill = "#222222", size = 0),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.y = element_text(angle = 90, margin = margin(r = 6, l = 6), size = rel(1.3)),
axis.line = element_blank(),
panel.background = element_rect(fill = "transparent", colour = "transparent" ),
legend.position = c(.98, .98),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6),
text = element_text(colour = "#ffffff"),
axis.text.y = element_text(colour = "#ffffff", size = rel(1.2), margin = margin(r = 6, l = 6)),
axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16)
)As you can see Autodromo Nazionale di Monza (Monza) and Circuit de Monaco (Monaco) have been in F1 through most of its history, both seeing their first F1 race from the initial season in 1950. However, you’ll notice the top speed of each track is drastically different. Why is that? Monza is a historic racetrack, which dates back to the 1920s. It’s known for its simplicity having long straights and only 6 curves. It has been the fastest F1 track since 1991.
Monaco on the other hand is a street circuit. The race is performed through the closed off narrow streets of Monaco. The track contains about 19 corners, many are extremely tight with little to no room for error. As you can imagine, this leaves drivers with a desire to be conservative with their speed in order to avoid race ending crashes. It’s also worth noting Monaco is one of F1’s most popular events attracting the who’s who of celebrity attendees every year.
This interactive bubble map utilizes circuit location data to showcase where races are held across the world. The size of each bubble and its color represent the number of races which have occurred at a particular circuit. I took it a step further to provide hover interactivity for users to specifically explore the names, city, country, and number of races for each track shown. My desire was to eliminate nearly all map details, (labels, streets, roads, landmarks, etc.) using it simply as a reference point and backdrop for the user. After exploring the provider tile options available with the leaflet approach, I decided to pivot. Instead, I used an approach leveraging the packages maps and mapproj. This uses data provided as part of the maps package to draw countries and regions as shapes, which can be styled with ggplot.
circuitLocations <- main_df %>%
group_by(circuitName, location, country, circuitURL, lng, lat) %>%
dplyr::summarise(numRaces = length(circuitName), .groups = "keep") %>%
arrange(numRaces) %>%
dplyr::mutate( hoverText = paste(
"<span style='font-weight: 800; font-style: bold;'>", circuitName, "</span>\n<em style='font-weigth: 400; font-style: normal;'>", location, ", ", country, "</em>\nRaces: ", numRaces, sep="")
) %>%
data.frame()
World <- map_data("world")
colorBreaks <- seq(min(circuitLocations$numRaces), max(circuitLocations$numRaces),10)
p <- circuitLocations %>%
ggplot() +
geom_polygon(data = World, aes(x = long, y = lat, group = group), fill = "grey", alpha = 0.3) +
geom_point( aes(x = lng, y = lat, size = numRaces, color = numRaces, text = hoverText, alpha = numRaces) ) +
scale_size_continuous(range = c(1,12)) +
scale_alpha_continuous(trans = "log") +
scale_color_viridis(option = "mako", trans = "log") +
theme_void() + ylim(-75, 90) + xlim(-168,192) + coord_map() +
guides( colour = guide_legend()) +
ggtitle("Circuits by Country and Number of Races") +
theme(
legend.position = "none",
text = element_text(color = "#FFFFFF"),
plot.background = element_rect(fill = "#222222", size = 0),
panel.background = element_rect(fill = "transparent", colour = "transparent" ),
axis.line = element_blank(),
plot.title = element_text(
size= 16,
hjust = 0.5,
color = "#ffffff",
margin = margin(t = 24, r = 12, b = 12, l = 12, unit = "pt")
)
)
fontStyles <- list(
family = "Helvetica",
size = "14",
color = "#FFFFFF"
)
p <- ggplotly(p, tooltip = "text") %>%
style(hoverlabel = list( bgcolor = "#3b3b3b", bordercolor = "transparent", font = fontStyles))
pI was not surprised to see a heavy presence of historical circuits throughout Europe. It was interesting to see how many of the non-European tracks still on the schedule today, such as Tokyo, Japan, Sāo Paulo, Brazil, and Montreal, Canada, have over 30 years of F1 race history.
This visualization also showed quite a few regions which I was not aware were involved in F1, including India, California, and a street track in Adelaide, Australia.
This section utilizes line charts to look at how the top tracks might have changed over the years based on lap times and number of laps. The visuals help to identify key moments where significant events or shifts in the sport may have occurred.
The dataset only provides lap time data from 1996 to 2022, because of this I chose to omit dates prior to 1996 to create an easier comparison between charts. Both line plots utilize similar techniques and designs. The colors of each circuit match the corresponding color in the initial top circuits bar graph. Chart elements not vital to the understanding the data have been removed to allow the user to focus on the data. I made the decision to remove the axis titles as the user should have enough information from the title to understand the values being presented.
fastestLapYear_df <- main_df %>%
filter(circuitName %in% circuitCount[1:5,]$circuitName, !is.na(milliseconds)) %>%
select(year, circuitName, fastestLapTime, milliseconds) %>%
group_by(year, circuitName) %>%
filter(milliseconds == min(milliseconds)) %>%
data.frame()
fastestLapYear_df$year <- as.factor(fastestLapYear_df$year)
fastestLapYear_df$time <- as.difftime(fastestLapYear_df$milliseconds/1000, units = 'secs')
ggplot(fastestLapYear_df, aes(x = year, y = time, group = circuitName)) +
geom_line(aes(color = circuitName), size = 4) +
scale_color_manual(values = c(
"#0AC92D",
"#F9CD05",
"#e74c3c",
"#FF6736",
"#3498db"
)) +
labs(title = "Fastest Lap Times by Year on Top 5 Circuits", x = "", y = "", color = "") +
scale_y_time(labels = function(l) strftime(l, '%M:%OS')) +
geom_point(shape = 21, size = 5, color = "transparent", fill = "white", alpha = 0.7) +
theme_dark() +
theme(
plot.title = element_text(hjust = 0.5, size = 21, colour = "#ffffff"),
panel.background = element_rect(fill = "transparent", colour = "transparent" ),
plot.background = element_rect(fill = "#222222", size = 0),
panel.border = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "#464545"),
text = element_text(color = "#ffffff"),
axis.text.y = element_text(colour = "#ffffff", size = rel(1.8), margin = margin(r = 6, l = 6)),
axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16),
axis.line = element_blank(),
legend.background = element_rect(fill="#303030", size=.5, linetype="dotted"),
legend.text = element_text(color = "#ffffff", size = 18),
legend.key = element_rect(fill = 'transparent', size = 0.5),
legend.position = c(.04, .92),
legend.justification = c("left", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6),
legend.title = element_blank(),
legend.key.size = unit(28, "pt")
)numLapYear_df <- main_df %>%
filter(circuitName %in% circuitCount[1:5,]$circuitName, !is.na(milliseconds)) %>%
left_join(results_df, by = "raceId") %>%
select(year, circuitName, laps) %>%
group_by(year, circuitName) %>%
filter(laps == max(laps)) %>%
data.frame()
numLapYear_df$year <- as.factor(numLapYear_df$year)
ggplot(numLapYear_df, aes(x = year, y = laps, group = circuitName)) +
geom_line(aes(color = circuitName), size = 4) +
scale_color_manual(values = c(
"#0AC92D",
"#F9CD05",
"#e74c3c",
"#FF6736",
"#3498db"
)) +
labs(title = "Number of Race Laps by Year on Top 5 Circuits", x = "", y = "", color = "") +
geom_point(shape = 21, size = 5, color = "transparent", fill = "white", alpha = 0.7) +
theme_dark() +
theme(
plot.title = element_text(hjust = 0.5, size = 21, colour = "#ffffff"),
panel.background = element_rect(fill = "transparent", colour = "transparent" ),
plot.background = element_rect(fill = "#222222", size = 0),
panel.border = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "#464545"),
text = element_text(color = "#ffffff"),
axis.text.y = element_text(colour = "#ffffff", size = rel(1.8), margin = margin(r = 6, l = 6)),
axis.text.x = element_text(angle = 45, colour = "#ffffff", vjust = 1, hjust = 1.0, size = 16),
axis.line = element_blank(),
legend.background = element_rect(fill="#303030", size=.5, linetype="dotted"),
legend.text = element_text(color = "#ffffff", size = 18),
legend.key = element_rect(fill = 'transparent', size = 0.5),
legend.position = c(.04, .4),
legend.justification = c("left", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6),
legend.title = element_blank(),
legend.key.size = unit(28, "pt")
)One might think F1 car builders and drivers would continually improve lap times with experience and new innovations. However, the line plot shows lap times going both up and down through the years. This is due to continual changes in rules and car requirements enforced by the FIA, the governing body of F1, to keep the sport competitive and enhance safety for both drivers and crew. A few examples where changes have had a clear impact on the speed of the cars include:
2011:Lap times are increased again with the introduction of DRS (Drag Reduction System), a system that intentionally puts aerodynamic drag on the rear spoiler of cars. This slowed cars down but made it more exciting as drivers could open the wing an reduce the drag for a speed boost at key moments in the race.
2022:Budget caps were introduced for teams in 2021 along with new restrictions for car designs which increase the wheel size and restrict usage of some aerodynamic designs.
At Circuit de Spa during the 2021 Belgian Grand Prix, extremely heavy rain causes the race to be red flagged, leaving drivers to stand ready for the race to resume. Unfortunately, the rain never let up and the race clock expired. Earlier in the race, cars made a few laps around the track following behind the safety car at a drastically reduced speed. The FIA decided to keep the results of the race—including the incredibly show fastest lap time—and award half points to drivers.
In 2010 we see drastic changes in lap times and number of laps at Britain’s Silverstone Circuit. This was due to a major track redesign introducing an infield area known as the “Arena” area.
Silverstone Circuit 2009
(Source):
Silverstone Circuit 2010
(Source):
In many other sports teams and players will have homefield advantage. This may be due to knowing the nuances of their home arena or the boost in performance they receive on game day from the cheering fans. Whatever the cause for this homefield advantage, I wondered if the same held true for Formula 1. The doughnut charts and sparklines below explore this topic.
This series of doughnut charts allow users to compare overall results of drivers (i.e. podium finishes—first, second, or third place) with a particular nationality versus the results at their home track. The data looks only at results for each instance when a driver of a particular nationality competed in a race, there for eliminating any years or races where there may not have been a driver of that nationality racing. In many cases there can also be multiple drivers of the same nationality competing in a race (e.g. Lance Stroll and Nicholas Latifi are two current Canadian drivers on separate teams). Additionally, I limited the visualizations to driver with nationalities from the top 5 circuits.
In terms of design choices, I continued utilizing specific colors for the circuits that were consistent with previous visualizations and removed clutter such as small labels and legends. To avoid the known issues users have when comparing sizes of slices from multiple doughnut and pie charts, I applied very large clear labels containing the values of the slices being compared.
top5Circuits <- circuitCount[1:5,] %>%
left_join(circuits_df, by = c("circuitName" = "name")) %>%
select(circuitName, n, circuitId, country) %>%
data.frame()
top5CircuitCountries <- c("Italian", "Monegasque", "British", "Belgian", "Canadian")
top5Circuits$nationality <- top5CircuitCountries
nationalityResults_df <- results_df %>%
left_join(drivers_df, by = "driverId") %>%
filter(nationality %in% top5CircuitCountries) %>%
select(resultId, raceId, driverId, positionOrder, forename, surname, nationality) %>%
mutate(podium = ifelse(positionOrder %in% 1:3, "Y", "N")) %>%
group_by(nationality, podium) %>%
summarise(n = length(podium), .groups = "keep") %>%
group_by(nationality) %>%
mutate(percent_of_total = round(100*n/sum(n), 1)) %>%
ungroup() %>%
data.frame()
homefieldResults_df <- results_df %>%
left_join(drivers_df, by = "driverId") %>%
left_join(races_df, by = "raceId") %>%
filter(nationality %in% top5CircuitCountries & circuitId %in% top5Circuits$circuitId) %>%
left_join(top5Circuits, by = "circuitId") %>%
select(resultId, raceId, driverId, circuitId, circuitName, country, positionOrder, nationality.x) %>%
rename(driverNationality = nationality.x) %>%
mutate(podium = ifelse(positionOrder %in% 1:3, "Y", "N")) %>%
group_by(driverNationality, circuitId, country, podium) %>%
summarise(n = length(podium), .groups = "keep") %>%
group_by(driverNationality, circuitId) %>%
mutate(percent_of_total = round(100*n/sum(n), 1)) %>%
ungroup() %>%
data.frame()homefield_italy <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Italian",],
values = ~nationalityResults_df[nationalityResults_df$nationality == "Italian", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #00BC8C;'>6.06%</span><br>Italian Driver Podiums Overall</span>",
domain = list(row = 0, column = 0),
textinfo = "none",
marker = list(colors = c("#464545", "#00BC8C"))) %>%
add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Italian" & homefieldResults_df$country == "Italy",],
values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Italian" & homefieldResults_df$country == "Italy", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #00BC8C;'>6.56%</span><br>...at Autodromo Nazionale di Monza</span>",
domain = list(row = 0, column = 1),
textinfo = "none",
hoverinfo = "text") %>%
layout(
# title = list(text = "Italian Podium Finishes<br>by Appearance", font = list(size = 21, color = "#FFFFFF"), y = 0.92),
showlegend = FALSE,
grid = list(rows = 1, columns = 2),
plot_bgcolor = "#222222",
paper_bgcolor = "#222222",
fg_bgcolor = "#222222"
)
homefield_italyhomefield_monaco <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Monegasque",],
values = ~nationalityResults_df[nationalityResults_df$nationality == "Monegasque", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #F9CD05;'>17.3%</span><br>Monegasque Driver Podiums Overall</span>",
domain = list(row = 0, column = 0),
textinfo = "none",
marker = list(colors = c("#464545", "#F9CD05"))) %>%
add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Monegasque" & homefieldResults_df$country == "Monaco",],
values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Monegasque" & homefieldResults_df$country == "Monaco", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #F9CD05;'>9.09%</span><br>...at Circuit de Monaco</span>",
domain = list(row = 0, column = 1),
textinfo = "none") %>%
layout(
# title = "Monegasque Podium Finishes by Appearance",
showlegend = FALSE,
grid = list(rows = 1, columns = 2),
plot_bgcolor = "#222222",
paper_bgcolor = "#222222",
fg_bgcolor = "#222222"
)
homefield_monacohomefield_uk <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "British",],
values = ~nationalityResults_df[nationalityResults_df$nationality == "British", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #3498db;'>16.6%</span><br>British Driver Podiums Overall</span>",
domain = list(row = 0, column = 0),
textinfo = "none",
marker = list(colors = c("#464545", "#3498db"))) %>%
add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "British" & homefieldResults_df$country == "UK",],
values = ~homefieldResults_df[homefieldResults_df$driverNationality == "British" & homefieldResults_df$country == "UK", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #3498db;'>14.05%</span><br>...at Silverstone Circuit</span>",
domain = list(row = 0, column = 1),
textinfo = "none") %>%
layout(
# title = "British Podium Finishes by Appearance",
showlegend = FALSE,
grid = list(rows = 1, columns = 2),
plot_bgcolor = "#222222",
paper_bgcolor = "#222222",
fg_bgcolor = "#222222"
)
homefield_ukhomefield_belgium <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Belgian",],
values = ~nationalityResults_df[nationalityResults_df$nationality == "Belgian", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #e74c3c;'>7.61%</span><br>Belgian Driver Podiums Overall</span>",
domain = list(row = 0, column = 0),
textinfo = "none",
marker = list(colors = c("#464545", "#e74c3c")))%>%
add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Belgian" & homefieldResults_df$country == "Belgium",],
values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Belgian" & homefieldResults_df$country == "Belgium", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #e74c3c;'>5.45%</span><br>...at Circuit de Spa-Francorchamps</span>",
domain = list(row = 0, column = 1),
textinfo = "none") %>%
layout(
# title = "Belgian Podium Finishes by Appearance",
showlegend = FALSE,
grid = list(rows = 1, columns = 2),
plot_bgcolor = "#222222",
paper_bgcolor = "#222222",
fg_bgcolor = "#222222")
homefield_belgiumhomefield_canada <- plot_ly(hole = 0.9, labels = ~podium, textposition = "inside")%>%
add_pie(data = nationalityResults_df[nationalityResults_df$nationality == "Canadian",],
values = ~nationalityResults_df[nationalityResults_df$nationality == "Canadian", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #FF6736;'>8.82%</span><br>Canadian Driver Podiums Overall</span>",
domain = list(row = 0, column = 0),
textinfo = "none",
marker = list(colors = c("#464545", "#FF6736")))%>%
add_pie(data = homefieldResults_df[homefieldResults_df$driverNationality == "Canadian" & homefieldResults_df$country == "Canada",],
values = ~homefieldResults_df[homefieldResults_df$driverNationality == "Canadian" & homefieldResults_df$country == "Canada", "n"],
title = "<span style='color: #FFFFFF; font-size: 16px;'><br><span style='font-size: 40px; font-weight: bold; color: #FF6736;'>19%</span><br>...at Circuit Gilles Villeneuve</span>",
domain = list(row = 0, column = 1),
textinfo = "none") %>%
layout(
# title = "Canadian Podium Finishes by Appearance",
showlegend = FALSE,
grid = list(rows = 1, columns = 2),
plot_bgcolor = "#222222",
paper_bgcolor = "#222222",
fg_bgcolor = "#222222")
homefield_canadaOverall the results didn’t appear to show a strong correlation between a driver’s nationality and success at their home circuit with the exception of possibly Canada. This might be due to the fact that the majority of popular tracks are in Europe and within close proximity of one another. It would be worth further investigation to see if there are similar results for other non-European drivers such as American, Brazilian, Japanese, and Australian drivers.
There are only 4 drivers from Monaco in Formula 1 history, Olivier Beretta, André Testut, Louis Chiron, and Charles Leclerc. However, I was surprised to see even that many due to the size and smaller population of the country. The 4 also span many decades across the bulk of the sport’s history.
Looking beyond drivers I wondered if there might be a correlation of podiums and home circuits for constructors (teams) located in a particular country. As opposed to drivers, it is less common for every country that has a circuit to have an auto manufacturer who competes in F1. So for this set of visuals, I deviated from starting with the top circuits and instead started with the top countries where constructors from that country have competed over the most F1 seasons. Filtering it down further, I only including years when there was a race on a circuit from that country.
Each country represented has two win/loss sparklines and a podium win percentage of all results shown. The top sparkline shows all races a constructor from that country competed in. When there is an upper green line, it means at least one constructor from that country got a podium position in that race. The lower gray line represents when they did not achieve a podium. The lower sparkline shows the same results but only for the races at the constructors’ home circuit. I’ve provided hover over interactivity to the graphic so the user can see the year and race name of each result shown. Additionally, the sparklines scroll horizontally as one unit so the user can make a quick comparison between overall versus home circuit results.
IMPORTANT NOTE: If the sparklines are not showing or hover functionality isn’t working, please refresh the page. The output sparkline package seems to have some conflicts between other packages utilized in the R markdown file. Additionally, don’t forget to horizontally scroll to view all the data in the visualization.
top5Constructors_df <- results_df %>%
left_join(constructors_df, by = "constructorId") %>%
group_by(nationality) %>%
summarise(n = length(nationality), .groups = "keep") %>%
arrange(-n) %>%
data.frame()
top5Constructors_df <-top5Constructors_df$nationality[1:5]
constructorHomefields <- c("UK", "Italy", "France", "Switzerland", "Germany")
constructorResults_df <- results_df %>%
left_join(constructors_df, by = "constructorId") %>%
left_join(races_df, by = "raceId") %>%
left_join(circuits_df, by = "circuitId") %>%
filter(nationality %in% top5Constructors_df) %>%
select(resultId, raceId, name.y, constructorId, positionOrder, nationality, year, country) %>%
rename(constructorNationality = nationality, circuitCountry = country, raceName = name.y) %>%
mutate(podium = ifelse(positionOrder %in% 1:3, 1, 0)) %>%
data.frame()
yearsWithHomefield <- constructorResults_df %>%
filter(circuitCountry == "UK") %>%
distinct(year)
yearsWithHomefield <- yearsWithHomefield$year
britishResults_df <- constructorResults_df %>%
filter(constructorNationality == "British" & year %in% yearsWithHomefield) %>%
group_by(constructorNationality, year, raceName, circuitCountry) %>%
summarise(n = sum(podium), .groups = "keep") %>%
data.frame()
britishResults_df$n <- replace(britishResults_df$n, britishResults_df$n == 0, -1)
britishResults_df$n <- replace(britishResults_df$n, britishResults_df$n > 0, 1)
britishPodiums <- round((count(britishResults_df[britishResults_df$n == 1,])/count(britishResults_df))*100, digits = 2)
britishHomePodiums <- round((count(britishResults_df[britishResults_df$circuitCountry == "UK" & britishResults_df$n == 1,])/count(britishResults_df[britishResults_df$circuitCountry == "UK",]))*100, digits = 2)
britishHomeResults_df <- britishResults_df
britishHomeResults_df$n[britishHomeResults_df$circuitCountry != "UK"] <- 0
UK_spark <- sparkline(
x = britishResults_df$raceId,
y = britishResults_df$n,
values = britishResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(britishResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(britishResults_df$year), " ", as.character(britishResults_df$raceName)))
)
))
UKhome_spark <- sparkline(
x = britishHomeResults_df$raceId,
y = britishHomeResults_df$n,
values = britishHomeResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(britishResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(britishResults_df$year), " ", as.character(britishResults_df$raceName)))
)
))
# Italy
yearsWithHomefield <- constructorResults_df %>%
filter(circuitCountry == "Italy") %>%
distinct(year)
yearsWithHomefield <- yearsWithHomefield$year
italianResults_df <- constructorResults_df %>%
filter(constructorNationality == "Italian" & year %in% yearsWithHomefield) %>%
group_by(constructorNationality, year, raceName, circuitCountry) %>%
summarise(n = sum(podium), .groups = "keep") %>%
data.frame()
italianResults_df$n <- replace(italianResults_df$n, italianResults_df$n == 0, -1)
italianResults_df$n <- replace(italianResults_df$n, italianResults_df$n > 0, 1)
italianPodiums <- round((count(italianResults_df[italianResults_df$n == 1,])/count(italianResults_df))*100, digits = 2)
italianHomePodiums <- round((count(italianResults_df[italianResults_df$circuitCountry == "Italy" & italianResults_df$n == 1,])/count(italianResults_df[italianResults_df$circuitCountry == "Italy",]))*100, digits = 2)
italianHomeResults_df <- italianResults_df
italianHomeResults_df$n[italianHomeResults_df$circuitCountry != "Italy"] <- 0
italy_spark <- sparkline(
x = italianResults_df$raceId,
y = italianResults_df$n,
values = italianResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(italianResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(italianResults_df$year), " ", as.character(italianResults_df$raceName)))
)
))
italyhome_spark <- sparkline(
x = italianHomeResults_df$raceId,
y = italianHomeResults_df$n,
values = italianHomeResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(italianResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(italianResults_df$year), " ", as.character(italianResults_df$raceName)))
)
))
# France
yearsWithHomefield <- constructorResults_df %>%
filter(circuitCountry == "France") %>%
distinct(year)
yearsWithHomefield <- yearsWithHomefield$year
frenchResults_df <- constructorResults_df %>%
filter(constructorNationality == "French" & year %in% yearsWithHomefield) %>%
group_by(constructorNationality, year, raceName, circuitCountry) %>%
summarise(n = sum(podium), .groups = "keep") %>%
data.frame()
frenchResults_df$n <- replace(frenchResults_df$n, frenchResults_df$n == 0, -1)
frenchResults_df$n <- replace(frenchResults_df$n, frenchResults_df$n > 0, 1)
frenchPodiums <- round((count(frenchResults_df[frenchResults_df$n == 1,])/count(frenchResults_df))*100, digits = 2)
frenchHomePodiums <- round((count(frenchResults_df[frenchResults_df$circuitCountry == "France" & frenchResults_df$n == 1,])/count(frenchResults_df[frenchResults_df$circuitCountry == "France",]))*100, digits = 2)
frenchHomeResults_df <- frenchResults_df
frenchHomeResults_df$n[frenchHomeResults_df$circuitCountry != "France"] <- 0
france_spark <- sparkline(
x = frenchResults_df$raceId,
y = frenchResults_df$n,
values = frenchResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(frenchResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(frenchResults_df$year), " ", as.character(frenchResults_df$raceName)))
)
))
francehome_spark <- sparkline(
x = frenchHomeResults_df$raceId,
y = frenchHomeResults_df$n,
values = frenchHomeResults_df$n,
type = "bar",
barColor = "black",
negBarColor = "blue",
width = nrow(frenchResults_df) * 7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(frenchResults_df$year), " ", as.character(frenchResults_df$raceName)))
)
))
# Germany
yearsWithHomefield <- constructorResults_df %>%
filter(circuitCountry == "Germany") %>%
distinct(year)
yearsWithHomefield <- yearsWithHomefield$year
germanResults_df <- constructorResults_df %>%
filter(constructorNationality == "German" & year %in% yearsWithHomefield) %>%
group_by(constructorNationality, year, raceName, circuitCountry) %>%
summarise(n = sum(podium), .groups = "keep") %>%
data.frame()
germanResults_df$n <- replace(germanResults_df$n, germanResults_df$n == 0, -1)
germanResults_df$n <- replace(germanResults_df$n, germanResults_df$n > 0, 1)
germanPodiums <- round((count(germanResults_df[germanResults_df$n == 1,])/count(germanResults_df))*100, digits = 2)
germanHomePodiums <- round((count(germanResults_df[germanResults_df$circuitCountry == "Germany" & germanResults_df$n == 1,])/count(germanResults_df[germanResults_df$circuitCountry == "Germany",]))*100, digits = 2)
germanHomeResults_df <- germanResults_df
germanHomeResults_df$n[germanHomeResults_df$circuitCountry != "Germany"] <- 0
germany_spark <- sparkline(
x = germanResults_df$raceId,
y = germanResults_df$n,
values = germanResults_df$n,
type = "bar",
width = nrow(germanHomeResults_df)*7,
height = 50,
barSpacing = 10,
colorMap = list("1:5" = "#00bc8c", "-1" = "#999999", "0" = "#222222"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(germanResults_df$year), " ", as.character(germanResults_df$raceName)))
)
))
germanyhome_spark <- sparkline(
x = germanHomeResults_df$raceId,
y = germanHomeResults_df$n,
values = germanHomeResults_df$n,
type = "tristate",
width = nrow(germanHomeResults_df)*7,
height = 50,
barSpacing = 10,
colorMap = list("1:3" = "#00bc8c", "-1" = "#999999", "0" = "#222222"),
tooltipFormatter = htmlwidgets::JS(
sprintf(
"function(sparkline, options, field){
return %s[field[0].offset];
}",
jsonlite::toJSON(paste0(as.character(germanResults_df$year), " ", as.character(germanResults_df$raceName)))
)
))
# NOTE: I've hidden the code for actually printing out the sparklines for aesthetic purposes. Additionally, expanding / collapsing this code sometimes seems to make the sparklines disappear and require refresh to rerender the chartsOverall 74.81%
Home Circuit 75.32%
Overall 63.51%
Home Circuit 76.92%
Overall 20.83%
Home Circuit 26.53%
Overall 35.87%
Home Circuit 22.22%
I was very pleased with the way this visual came together. On top of making a quick comparison of results, it tells the story of certain countries teams over time. It does this in a compelling way, which helped to shed light on some stories I’ve heard from F1 before I regularly followed the sport. A few examples you can see in the visuals include: