This assignment uses FiveThirtyEight’s airline safety dataset, which records incidents, fatal accidents, and fatalities for 56 major airlines over two periods: 1985–1999 and 2000–2014.
The related article I’m following is “Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?” by Nate Silver, which investigates whether an airline’s crash history predicts its future safety record. The findings suggest that while high-profile crashes affect passenger behavior, the correlation between past and future fatal accidents is weak. Instead, broader factors like a country’s wealth may be better predictors of airline safety.
airline <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv")
head(airline)
##                 airline avail_seat_km_per_week incidents_85_99
## 1            Aer Lingus              320906734               2
## 2             Aeroflot*             1197672318              76
## 3 Aerolineas Argentinas              385803648               6
## 4           Aeromexico*              596871813               3
## 5            Air Canada             1865253802               2
## 6            Air France             3004002661              14
##   fatal_accidents_85_99 fatalities_85_99 incidents_00_14 fatal_accidents_00_14
## 1                     0                0               0                     0
## 2                    14              128               6                     1
## 3                     0                0               1                     0
## 4                     1               64               5                     0
## 5                     0                0               2                     0
## 6                     4               79               6                     2
##   fatalities_00_14
## 1                0
## 2               88
## 3                0
## 4                0
## 5                0
## 6              337
I created a smaller subset that keeps each airline’s exposure measure (available seat kilometers per week) and its 2000–2014 outcomes. I also renamed the columns for clarity and removed the trailing * from airline names.
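# Keep the airline name, the exposure measure, and the 2000–2014 outcome columns
airline_clean <- airline[, c("airline",
                             "avail_seat_km_per_week",
                             "incidents_00_14",
                             "fatal_accidents_00_14",
                             "fatalities_00_14")]
# Give the columns more descriptive names
colnames(airline_clean) <- c("Airline",
                             "SeatKmPerWeek",
                             "Incidents_2000_2014",
                             "FatalAccidents_2000_2014",
                             "Fatalities_2000_2014")
# Strip a trailing asterisk from airline names such as "Aeroflot*"
airline_clean$Airline <- sub("\\*$", "", airline_clean$Airline)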
head(airline_clean)
##                 Airline SeatKmPerWeek Incidents_2000_2014
## 1            Aer Lingus     320906734                   0
## 2              Aeroflot    1197672318                   6
## 3 Aerolineas Argentinas     385803648                   1
## 4            Aeromexico     596871813                   5
## 5            Air Canada    1865253802                   2
## 6            Air France    3004002661                   6
##   FatalAccidents_2000_2014 Fatalities_2000_2014
## 1                        0                    0
## 2                        1                   88
## 3                        0                    0
## 4                        0                    0
## 5                        0                    0
## 6                        2                  337
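Stripping the asterisk also discards whatever it was marking. As I understand the dataset’s README, the asterisk indicates that regional subsidiaries are included in an airline’s figures, though that reading is an assumption here and worth checking against the source. The sketch below shows one way to record the marker in a logical column before removing it. The vector `raw_names` and the column name `HadAsterisk` are hypothetical; the names themselves are copied from the output above.

```r
# Example names copied from the head(airline) output; not the full dataset
raw_names <- c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", "Aeromexico*")

# Record which names carried a trailing asterisk before it is removed
has_marker <- grepl("\\*$", raw_names)

# Same substitution used in the cleaning step
clean_names <- sub("\\*$", "", raw_names)

# HadAsterisk is a hypothetical column name used only for illustration
data.frame(Airline = clean_names, HadAsterisk = has_marker)
```

On the full data, the same two lines could be applied to `airline$airline` before building the cleaned subset.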
To explore the dataset, I plotted the 12 airlines with the most fatalities from 2000 to 2014.
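# Sort airlines by fatalities (descending) and keep the top 12
ord <- order(-airline_clean$Fatalities_2000_2014)
top_n <- 12
air_top <- head(airline_clean[ord, ], top_n)
# Widen the bottom margin so the rotated airline names fit; save the old settings
op <- par(mar = c(8, 4, 3, 1))
barplot(air_top$Fatalities_2000_2014,
        names.arg = air_top$Airline,
        las = 2, cex.names = 0.7,
        main = "Top 12 Airlines by Fatalities (2000–2014)",
        ylab = "Fatalities")
# Restore the previous graphics parameters
par(op)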
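The chart ranks airlines by raw fatality counts, which favors carriers that fly less. Since the subset keeps `SeatKmPerWeek`, a rough exposure-adjusted rate can be computed as well. The sketch below is only an illustration of the arithmetic: it uses the six rows shown in the `head(airline_clean)` output rather than the full dataset, assumes weekly capacity stayed constant over 15 years of 52 weeks, and introduces the hypothetical names `sample_clean` and `FatalitiesPerBillionSeatKm`.

```r
# Six rows copied from the head(airline_clean) output; not the full dataset
sample_clean <- data.frame(
  Airline = c("Aer Lingus", "Aeroflot", "Aerolineas Argentinas",
              "Aeromexico", "Air Canada", "Air France"),
  SeatKmPerWeek = c(320906734, 1197672318, 385803648,
                    596871813, 1865253802, 3004002661),
  Fatalities_2000_2014 = c(0, 88, 0, 0, 0, 337)
)

# Assumption: weekly capacity is constant across 2000-2014 (15 years x 52 weeks)
weeks_in_period <- 52 * 15
seat_km_billions <- sample_clean$SeatKmPerWeek * weeks_in_period / 1e9

# Fatalities per billion available seat kilometers (illustrative unit)
sample_clean$FatalitiesPerBillionSeatKm <-
  sample_clean$Fatalities_2000_2014 / seat_km_billions

# Rank by the exposure-adjusted rate instead of the raw count
ranked <- sample_clean[order(-sample_clean$FatalitiesPerBillionSeatKm), ]
ranked[, c("Airline", "Fatalities_2000_2014", "FatalitiesPerBillionSeatKm")]
```

Replacing `sample_clean` with `airline_clean` would apply the same calculation to all airlines, and the resulting column could be passed to `barplot()` in place of the raw counts.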
The cleaned dataset provides a focused view of airline safety outcomes between 2000 and 2014. The bar chart shows raw fatality counts, which are not adjusted for how much each airline flies. While some airlines show higher incident or fatality counts, the article emphasizes that these numbers don’t strongly predict future crashes. Instead, incident rates (fatal and non-fatal) tend to be more consistent over time, and airline safety is closely related to broader socioeconomic factors such as the wealth and regulations of the airline’s home country.
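One way to check the consistency claim directly is to correlate each airline’s rate in 1985–1999 with its rate in 2000–2014. The sketch below shows the mechanics only. It uses the six rows from the `head(airline)` output, which is far too few to support any conclusion, and it assumes constant weekly capacity across both 15-year periods. The helpers `rate_per_trillion()` and `period_cor()` are hypothetical names, and the per-trillion-seat-kilometer unit is my choice for illustration, not necessarily the article’s exact method.

```r
# Six rows copied from the head(airline) output; not the full dataset
sample_airline <- data.frame(
  airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas",
              "Aeromexico*", "Air Canada", "Air France"),
  avail_seat_km_per_week = c(320906734, 1197672318, 385803648,
                             596871813, 1865253802, 3004002661),
  incidents_85_99 = c(2, 76, 6, 3, 2, 14),
  incidents_00_14 = c(0, 6, 1, 5, 2, 6),
  fatal_accidents_85_99 = c(0, 14, 0, 1, 0, 4),
  fatal_accidents_00_14 = c(0, 1, 0, 0, 0, 2)
)

# Hypothetical helper: events per trillion available seat kilometers,
# assuming weekly capacity is constant over a period of `years` years
rate_per_trillion <- function(events, seat_km_per_week, years = 15) {
  events / (seat_km_per_week * 52 * years / 1e12)
}

# Hypothetical helper: Pearson correlation between early- and late-period rates
period_cor <- function(df, early, late) {
  cor(rate_per_trillion(df[[early]], df$avail_seat_km_per_week),
      rate_per_trillion(df[[late]], df$avail_seat_km_per_week))
}

incident_cor <- period_cor(sample_airline, "incidents_85_99", "incidents_00_14")
fatal_cor <- period_cor(sample_airline,
                        "fatal_accidents_85_99", "fatal_accidents_00_14")
c(incidents = incident_cor, fatal_accidents = fatal_cor)
```

Passing the full `airline` data frame to `period_cor()` instead of `sample_airline` would give correlations that can be compared with the article’s findings.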