# Install the packages once, if they are not already installed
install.packages("fivethirtyeight")
install.packages("knitr")

library(tidyverse)
library(fivethirtyeight)

Introduction

I was inspired to research airline safety by the most recent plane crash and by my upcoming flight to Seattle in the next few months. I will look at available airline incident data to see how airline safety has changed over time. As technology improves each year, is airline safety improving too?

Data Overview

I am using a dataset from the fivethirtyeight package, titled airline_safety. For each airline, it records the number of incidents, the number of fatal accidents, and the total fatalities in each of two time periods, 1985-1999 and 2000-2014.

dplyr Data Wrangling

airline_safety_clean <- airline_safety %>%
  select(airline,
         incidents_85_99,
         incidents_00_14,
         fatalities_85_99,
         fatalities_00_14)

airline_safety_clean <- airline_safety_clean %>%
  mutate(total_incidents = incidents_85_99 + incidents_00_14)

Comparing Incidents Over Time

The graph below compares recorded airline incidents across the two periods, 1985-1999 and 2000-2014. Each point represents an airline. If an airline had the same number of incidents in both periods, its point would fall on the diagonal line y = x. Several points fall below that diagonal, which suggests that fewer incidents are occurring in the more recent period. Note that the line drawn on the plot is a linear fit to the data, not the diagonal itself.

This suggests that airlines are getting safer over time, with fewer incidents.

ggplot(airline_safety_clean, aes(x = incidents_85_99, y = incidents_00_14)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Airline Incidents: Past vs Recent",
    x = "Incidents (1985–1999)",
    y = "Incidents (2000–2014)"
  )
## `geom_smooth()` using formula = 'y ~ x'
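
The comparison against the diagonal can also be checked numerically. The sketch below is a minimal illustration in base R, not the analysis itself: the `toy` data frame, its airline names, and its counts are all made up, and only the column names `incidents_85_99` and `incidents_00_14` come from the dataset. It counts how many airlines fall below, on, or above the y = x line.

```r
# Toy data: hypothetical airlines and made-up counts, for illustration only.
toy <- data.frame(
  airline = c("Airline A", "Airline B", "Airline C", "Airline D"),
  incidents_85_99 = c(5, 2, 0, 7),
  incidents_00_14 = c(3, 2, 1, 4)
)

# Change in incidents between periods; negative means fewer recent incidents.
toy$change <- toy$incidents_00_14 - toy$incidents_85_99

# Position of each airline relative to the y = x diagonal.
position_counts <- c(
  below = sum(toy$change < 0),
  on    = sum(toy$change == 0),
  above = sum(toy$change > 0)
)
print(position_counts)
```

Replacing `toy` with airline_safety_clean should give the real counts. To show the diagonal on the scatter plot itself, ggplot2's geom_abline(slope = 1, intercept = 0) could be added as another layer.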

Airline-Specific Incidents

This bar chart shows each airline’s total incidents across both periods. The counts vary widely from airline to airline.

This suggests that the risk of flying is not evenly distributed and may depend on which carrier you choose. Raw incident counts do not account for how much each airline flies, so a larger airline can show more incidents simply because it operates more flights.

ggplot(airline_safety_clean, aes(x = reorder(airline, total_incidents), y = total_incidents)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Total Airline Incidents (1985–2014)",
    x = "Airline",
    y = "Total Incidents"
  )
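
Because raw counts favor smaller airlines, one possible refinement is to scale incidents by how much each airline flies. The sketch below is a rough illustration in base R on made-up values. It assumes the dataset's exposure column is named `avail_seat_km_per_week`, as in the fivethirtyeight package; that column is dropped by the select() step above, so it would need to be added there first. The `incidents_per_bn_seat_km` name is hypothetical.

```r
# Toy data: hypothetical airlines and made-up values, for illustration only.
toy <- data.frame(
  airline = c("Airline A", "Airline B", "Airline C"),
  avail_seat_km_per_week = c(2e9, 5e8, 1e9),  # assumed exposure column
  total_incidents = c(10, 5, 2)
)

# Rough exposure-adjusted index: total incidents per billion
# available seat kilometers flown per week.
toy$incidents_per_bn_seat_km <-
  toy$total_incidents / (toy$avail_seat_km_per_week / 1e9)

# Rank airlines by the adjusted index, highest first.
ranked <- toy[order(-toy$incidents_per_bn_seat_km),
              c("airline", "total_incidents", "incidents_per_bn_seat_km")]
print(ranked)
```

In this toy example the airline with the most incidents is not the one with the highest adjusted index, which is the kind of reordering that scaling by exposure can produce.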

Conclusion

Although airline safety has generally improved over time, consumers should still consider which airline they fly with if they want to keep their risk as low as possible. While there is an industry-wide downward trend in incidents, you may be able to lower your odds of experiencing an incident by choosing a carrier with a better record.

By using data visualization and analysis, we can make information more digestible for consumers. A bar chart showing the wide variation between carriers gets the point across more easily than handing over an Excel file of raw data. ggplot2 and dplyr streamline the work of turning large datasets into graphs and charts that anyone can read.

References

FiveThirtyEight Airline Safety Dataset