1. Situation

Formula 1 has been racing continuously since 1950 — 75 seasons, more than 1,100 Grands Prix, and over 26,000 individual race entries. Casual fans usually frame the sport’s evolution in terms of dominance (Schumacher’s Ferrari era, Mercedes’ hybrid era, Red Bull’s recent run) or safety (the post-1994 reforms after Senna). What gets discussed less often is something more fundamental: Do cars actually finish the races they start?

This matters because if the share of cars that simply complete the race has shifted, then the very nature of “winning” has shifted with it. A race where half the field breaks down is decided differently from a race where almost everyone makes it to the flag.

2. Task

Research question: Has the share of cars that finish a Formula 1 race changed meaningfully between 1950 and 2024, and if so, what type of retirement is driving the change — mechanical failure or driver-side incidents?

Secondary question: Has any change in finishing rates corresponded to a change in how decisive qualifying performance is for race wins?

3. Action — Data & Method

3.1 Data source

Ergast Formula 1 World Championship dataset (1950–2024), 14 relational CSVs. Public, freely licensed, scraped from official FIA records. Used here without modification beyond joins and aggregation.

3.2 Load and join

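library(tidyverse)  # dplyr, tidyr, stringr, ggplot2
library(scales)     # percent(), percent_format()
library(knitr)      # kable()
# read_f1() is a project helper that is not shown in this report
races        <- read_f1("races")
results      <- read_f1("results")
status       <- read_f1("status")
qualifying   <- read_f1("qualifying")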

race_outcomes <- results %>%
  select(resultId, raceId, driverId, grid, position, statusId) %>%
  left_join(races %>% select(raceId, year), by = "raceId") %>%
  left_join(status, by = "statusId")

cat("Total race entries:", nrow(race_outcomes),
    "\nYear range:", min(race_outcomes$year), "to", max(race_outcomes$year))
## Total race entries: 26759 
## Year range: 1950 to 2024
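
The loading step relies on read_f1(), which the report calls but does not define. A minimal sketch of what such a helper could look like is given here, assuming one CSV per table in a local directory and assuming missing values are encoded as \N (as in the commonly distributed Ergast CSV dumps). The function body, the data_dir argument, and the demo file are hypothetical, not the author's code.

```r
# Hypothetical stand-in for the undefined read_f1() helper.
# Assumes <data_dir>/<name>.csv and "\N" as the missing-value marker.
read_f1 <- function(name, data_dir = "data") {
  path <- file.path(data_dir, paste0(name, ".csv"))
  read.csv(path, na.strings = c("\\N", ""), stringsAsFactors = FALSE)
}

# Demo on a throwaway file so the sketch runs without the real dataset
tmp <- tempfile("f1_")
dir.create(tmp)
writeLines(c("resultId,grid,position",
             "1,1,1",
             "2,3,\\N"),            # second entry has no finishing position
           file.path(tmp, "results.csv"))

demo <- read_f1("results", data_dir = tmp)
print(is.na(demo$position))
```

Reading \N as NA keeps position numeric, which matters later wherever position is compared against 1.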

3.3 Categorize each outcome

The status column has 130+ raw values. I collapse them into five interpretable categories:

categorize_status <- function(s) {
  case_when(
    s == "Finished"                                                       ~ "Finished",
    str_starts(s, "\\+")                                                  ~ "Finished",
    str_detect(tolower(s), "accident|collision|spun off|damage")          ~ "Accident/Collision",
    str_detect(tolower(s), "disqualified|excluded")                       ~ "Disqualified",
    str_detect(tolower(s),
      "did not qualify|did not prequalify|not classified|withdrew|did not start") ~ "DNS / DNQ",
    TRUE                                                                  ~ "Mechanical/Technical"
  )
}

race_outcomes <- race_outcomes %>%
  mutate(outcome = categorize_status(status))

race_outcomes <- race_outcomes %>%
  mutate(period = (year %/% 5) * 5)

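A “Finished” entry includes both lead-lap finishers (status “Finished”) and lapped-but-classified finishers (the “+1 Lap”, “+2 Laps”, etc. statuses). These are cars that completed the race under the rules, which is the right definition for this question. Note that the final TRUE branch is a catch-all: any status that matches none of the earlier patterns is counted as Mechanical/Technical.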
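
A quick spot check can make the classification rules and the five-year binning concrete. The sketch below repeats the rules from categorize_status() so that it runs on its own; the status strings are illustrative examples chosen for this check, not an extract of the dataset.

```r
library(dplyr)
library(stringr)

# Same rules as categorize_status() in 3.3, repeated so this runs alone
categorize_status <- function(s) {
  case_when(
    s == "Finished" | str_starts(s, "\\+")                       ~ "Finished",
    str_detect(tolower(s), "accident|collision|spun off|damage") ~ "Accident/Collision",
    str_detect(tolower(s), "disqualified|excluded")              ~ "Disqualified",
    str_detect(tolower(s),
      "did not qualify|did not prequalify|not classified|withdrew|did not start") ~ "DNS / DNQ",
    TRUE                                                         ~ "Mechanical/Technical"
  )
}

# Illustrative status strings (assumed examples)
examples <- c("Finished", "+2 Laps", "Collision damage", "Engine",
              "Not classified", "Illness")
checked <- data.frame(status = examples,
                      outcome = categorize_status(examples))
print(checked)

# Five-year bins: each year maps to the first year of its bin
years <- c(1950, 1954, 1955, 2019, 2024)
bins  <- (years %/% 5) * 5
print(bins)
```

Two things stand out in this check: a "Not classified" status lands in DNS / DNQ, and a non-mechanical status such as "Illness" falls through to the catch-all. Running count(status, outcome) on the real data would show how many entries each of these choices affects. The last bin, 2020, covers 2020–2024.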

4. Analysis — Original Findings

4.1 Finding 1: The finish rate nearly doubled

period_outcomes <- race_outcomes %>%
  count(period, outcome) %>%
  group_by(period) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

period_outcomes$outcome <- factor(
  period_outcomes$outcome,
  levels = c("Finished", "Mechanical/Technical", "Accident/Collision",
             "DNS / DNQ", "Disqualified")
)

ggplot(period_outcomes, aes(x = period, y = share, fill = outcome)) +
  geom_area(alpha = 0.9) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_x_continuous(breaks = seq(1950, 2020, 10)) +
  scale_fill_manual(values = c(
    "Finished"             = "#2E8B57",
    "Mechanical/Technical" = "#C0392B",
    "Accident/Collision"   = "#E67E22",
    "DNS / DNQ"            = "#7F8C8D",
    "Disqualified"         = "#34495E"
  )) +
  labs(
    title    = "F1 race outcomes, 1950–2024",
    subtitle = "Cars that finish (green) climb from ~50% to ~85% of all entries",
    x = NULL, y = "Share of race entries", fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))
Share of all race entries by outcome category, 5-year periods, 1950–2024.

The green band — cars that finished — was roughly half the field through the 1950s–1980s. From the mid-1990s it expands sharply, and from 2010 onwards green dominates the picture. The shrinking red band (mechanical retirements) is the mirror image.

era_table <- race_outcomes %>%
  mutate(era = case_when(
    year <= 1969 ~ "1950–1969",
    year <= 1989 ~ "1970–1989",
    year <= 2009 ~ "1990–2009",
    TRUE         ~ "2010–2024"
  )) %>%
  count(era, outcome) %>%
  group_by(era) %>%
  mutate(share = n / sum(n)) %>%
  select(-n) %>%
  pivot_wider(names_from = outcome, values_from = share, values_fill = 0) %>%
  mutate(across(where(is.numeric), ~ percent(.x, accuracy = 0.1)))

kable(era_table)
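|era       |Accident/Collision |DNS / DNQ |Disqualified |Finished |Mechanical/Technical |
|:---------|:------------------|:---------|:------------|:--------|:--------------------|
|1950–1969 |7.2%               |7.7%      |0.4%         |47.1%    |37.6%                |
|1970–1989 |11.3%              |12.7%     |1.0%         |42.3%    |32.8%                |
|1990–2009 |13.8%              |4.9%      |0.6%         |56.0%    |24.7%                |
|2010–2024 |7.0%               |0.3%      |0.2%         |82.0%    |10.4%                |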
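
To put a number on "nearly doubled", the arithmetic can be done directly from the era table. This is a small worked calculation using only the two Finished shares printed above; the variable names are introduced here for illustration.

```r
# Finished share by era, in percent, copied from the era table
finished <- c(early = 47.1, modern = 82.0)

# How many times larger the modern finish share is
finish_ratio <- finished[["modern"]] / finished[["early"]]

# Everything that is not a finish (retirements, DNS / DNQ, disqualifications)
not_finished <- 100 - finished
dnf_ratio <- not_finished[["early"]] / not_finished[["modern"]]

print(round(finish_ratio, 2))  # 1.74
print(round(dnf_ratio, 2))     # 2.94
```

On the 20-year eras the finish share grows by a factor of about 1.74, while the share of entries that do not finish shrinks to roughly a third of its earlier level. Individual five-year periods may show a somewhat larger swing.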

4.2 Finding 2: The change is mechanical, not behavioural

trend <- race_outcomes %>%
  filter(outcome %in% c("Mechanical/Technical", "Accident/Collision")) %>%
  count(period, outcome) %>%
  left_join(race_outcomes %>% count(period, name = "total"), by = "period") %>%
  mutate(rate = n / total)

ggplot(trend, aes(x = period, y = rate, color = outcome)) +
  geom_line(linewidth = 1.3) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2.2) +
  scale_y_continuous(labels = percent_format(accuracy = 1),
                     limits = c(0, 0.5)) +
  scale_x_continuous(breaks = seq(1950, 2020, 10)) +
  scale_color_manual(values = c(
    "Mechanical/Technical" = "#C0392B",
    "Accident/Collision"   = "#E67E22"
  )) +
  labs(
    title    = "Why cars don't finish: mechanical vs. accident",
    subtitle = "Mechanical retirements: 40% → 7%. Accidents: flat across 75 years.",
    x = NULL, y = "Share of race entries", color = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))
Mechanical retirements have collapsed; accident retirements have not.

This is the core insight. The drop in retirements is almost entirely a reliability story:

  • Mechanical retirements fell from ~40% of race entries in the 1950s–60s to ~7% in the 2020s, roughly an 80% relative decrease.
  • Accident/collision retirements have moved within a 6–17% band for 75 years with no clear directional trend.

Drivers are not crashing less; cars are breaking less.
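
The relative decrease quoted above can be reproduced in a few lines, and cross-checked against the coarser era table in 4.1. The helper name relative_decrease is introduced here for illustration only.

```r
# Relative decrease between two shares (any consistent unit)
relative_decrease <- function(before, after) (before - after) / before

# Headline figures: ~40% in the 1950s-60s vs ~7% in the 2020s
mech_headline <- relative_decrease(40, 7)

# Era-table figures: 37.6% (1950-1969) vs 10.4% (2010-2024)
mech_era <- relative_decrease(37.6, 10.4)

# Accident/Collision on the same era-table basis: 7.2% vs 7.0%
accident_era <- relative_decrease(7.2, 7.0)

print(round(c(mech_headline, mech_era, accident_era), 3))  # 0.825 0.723 0.028
```

The two mechanical figures differ because the era table averages 2010–2024 rather than isolating the 2020s. On either basis the mechanical share falls by well over two thirds, while the accident share barely moves between the first and last era.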

4.3 Finding 3: Qualifying may have become more decisive

If reliability used to be the great equalizer, removing it should make pre-race performance (qualifying) a stronger predictor of who wins. The data points in this direction, although the trend test below is not conclusive:

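pole_to_win <- race_outcomes %>%
  filter(grid == 1) %>%
  # Treat a missing finishing position (e.g. a retirement) as a non-win,
  # so retired pole-sitters stay in the denominator
  mutate(won = coalesce(position == 1, FALSE)) %>%
  group_by(period) %>%
  summarise(rate = mean(won),
            n_poles = n(), .groups = "drop")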

ggplot(pole_to_win, aes(x = period, y = rate)) +
  geom_line(linewidth = 1.3, color = "#2C3E50") +
  geom_point(size = 2.2, color = "#2C3E50") +
  geom_smooth(method = "lm", se = TRUE, color = "#3498DB",
              fill = "#3498DB", alpha = 0.15, linetype = "dashed") +
  scale_y_continuous(labels = percent_format(accuracy = 1),
                     limits = c(0, 0.7)) +
  scale_x_continuous(breaks = seq(1950, 2020, 10)) +
  labs(
    title    = "Pole-to-win conversion is rising",
    subtitle = "37% in the early decades → 51% in the 2010s–2020s",
    x = NULL, y = "Share of pole-sitters who won the race"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))
Pole-to-win conversion rate, 5-year periods.
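
How missing finishing positions are treated matters for this rate. The sketch below uses five made-up pole-sitter results and assumes that an entry with no classified position has position = NA, which depends on how the CSVs were read. It compares dropping those entries with counting them as non-wins.

```r
# Toy pole-sitter results: NA means no classified finishing position
# (assumed encoding; the values are invented for illustration)
position <- c(1, NA, 2, NA, 1)

won <- position == 1                      # TRUE, NA, FALSE, NA, TRUE

# Dropping NAs: wins divided by classified pole-sitters only
rate_dropping_na <- mean(won, na.rm = TRUE)

# Counting NAs as non-wins: wins divided by all pole-sitters
rate_counting_na <- mean(!is.na(won) & won)

print(c(dropping = rate_dropping_na, counting = rate_counting_na))
```

With these toy values the first rate is 2/3 and the second is 2/5. Only the second matches the chart's y-axis label, "Share of pole-sitters who won the race". Because retirements were far more common in the early decades, dropping NAs would inflate early-period rates more than recent ones.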

# Linear trend test
fit <- lm(rate ~ period, data = pole_to_win)
slope <- coef(fit)[["period"]]
pval  <- summary(fit)$coefficients["period", "Pr(>|t|)"]
cat(sprintf("Linear trend: %+.3f percentage points per year (p = %.4f)",
            slope * 100, pval))  # %+ prints the sign for either direction
## Linear trend: +0.073 percentage points per year (p = 0.4479)

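The slope is positive but small, and with p ≈ 0.45 the linear trend across the 15 five-year periods is not statistically distinguishable from zero. Combined with Finding 2, the direction fits a coherent story: as cars stopped breaking, the race became less of a survival lottery and more of a contest decided on Saturday. The pole-to-win data alone does not establish that story.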
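
To read the regression output in practical terms, the fitted slope can be converted into an implied change over the whole period. This is a worked calculation on the two printed numbers only; the 0.05 threshold is a conventional choice assumed here, not something the report specifies.

```r
# Values copied from the printed regression output
slope_pp_per_year <- 0.073
p_value           <- 0.4479

# The period variable runs from 1950 to 2020 in steps of 5
span_years <- 2020 - 1950

# Change implied by the fitted line across the full span
implied_change_pp <- slope_pp_per_year * span_years
print(round(implied_change_pp, 2))  # 5.11

# Conventional 5% significance check (assumed threshold)
is_significant <- p_value < 0.05
print(is_significant)               # FALSE
```

The fitted line implies a rise of about 5 percentage points over 70 years. That is smaller than the 37% to 51% contrast in the chart subtitle, and it does not clear the 5% threshold.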

5. Result — Implications and method check

5.1 What this means

A common narrative about modern F1 is that races are “boring” because overtaking is hard. The data here suggests a complementary mechanism: races are also more predictable because the field is no longer self-eliminating. In 1950–1969, fewer than half of all entries (47.1%) finished; in 2010–2024, 82.0% did. That removes a major source of variance, and the apparent rise in pole-to-win conversion would be one consequence if it holds up.

5.2 How good are these findings?

I used data from every single F1 race from 1950 to 2024, not a small sample. That means the shares I’m showing aren’t estimates; they describe what actually happened in the sport. I also ran a basic statistical test comparing the early years (1950–69) to the modern era (2010–24), and the difference between them is far larger than chance variation would produce. The pole-to-win result is weaker: it points in the same direction as the finish-rate result, but its linear trend (p ≈ 0.45) is not statistically significant, so I treat it as suggestive rather than as independent confirmation. The outcome categories also rest on keyword matching of status strings with a catch-all for Mechanical/Technical, so the exact split between categories is approximate.
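
The early-versus-modern comparison is described above but not shown. One plausible form for it is a two-sample test of proportions on the finish rate; the sketch below is an assumption about how such a check could be run, not the test actually used. The function name compare_finish_rates and the toy data (100 entries per era, chosen to mirror the 47% and 82% shares) are hypothetical.

```r
# Two-proportion test of the finish rate between two year ranges.
# Expects a data frame with columns `year` and `outcome`.
compare_finish_rates <- function(df,
                                 early  = c(1950, 1969),
                                 modern = c(2010, 2024)) {
  in_early  <- df$year >= early[1]  & df$year <= early[2]
  in_modern <- df$year >= modern[1] & df$year <= modern[2]
  finished  <- df$outcome == "Finished"
  prop.test(
    x = c(sum(finished & in_early), sum(finished & in_modern)),
    n = c(sum(in_early), sum(in_modern))
  )
}

# Toy data, NOT the real counts: 100 entries per era
toy <- data.frame(
  year    = rep(c(1960, 2020), each = 100),
  outcome = c(rep(c("Finished", "Mechanical/Technical"), times = c(47, 53)),
              rep(c("Finished", "Mechanical/Technical"), times = c(82, 18)))
)

res <- compare_finish_rates(toy)
print(res$estimate)  # finish share in each era
print(res$p.value)
```

With the real race_outcomes table the call would be compare_finish_rates(race_outcomes). Entries are not independent draws, since the same cars and drivers appear race after race, so the resulting p-value is best read as a rough indication.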