Import your data

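# Packages needed for the import and the plots
library(tidyverse)
library(readxl)
# Read the Excel file and convert the two measure columns to numeric
airlines <- read_excel("../00_data/MyData.xlsx") %>%
    mutate(n_events = as.numeric(n_events),
           avail_seat_km_per_week = as.numeric(avail_seat_km_per_week))
airlines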
## # A tibble: 336 × 6
##      Ref airline               avail_seat_km_per_week year_range type_…¹ n_eve…²
##    <dbl> <chr>                                  <dbl> <chr>      <chr>     <dbl>
##  1    NA Aer Lingus                         320906734 85_99      incide…       2
##  2     2 Aeroflot*                         1197672318 85_99      incide…      76
##  3     3 Aerolineas Argentinas              385803648 85_99      incide…       6
##  4     4 Aeromexico*                        596871813 85_99      incide…       3
##  5     5 Air Canada                        1865253802 85_99      incide…       2
##  6     6 Air France                        3004002661 85_99      incide…      14
##  7     7 Air India*                         869253552 85_99      incide…       2
##  8     8 Air New Zealand*                   710174817 85_99      incide…       3
##  9     9 Alaska Airlines*                   965346773 85_99      incide…       5
## 10    10 Alitalia                           698012498 85_99      incide…       7
## # … with 326 more rows, and abbreviated variable names ¹​type_of_event,
## #   ²​n_events
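
The import step coerces `n_events` and `avail_seat_km_per_week` to numeric. The source does not say why this is needed; one likely reason (an assumption here) is that `read_excel()` read those columns as text because some cells are not plain numbers. A minimal sketch, using a made-up vector `raw_counts`, of how `as.numeric()` treats such values and how to count what was lost in the conversion:

```r
# Hypothetical text column as it might arrive from a spreadsheet;
# the values are made up for illustration.
raw_counts <- c("2", "76", "6", "n/a")

# as.numeric() turns strings it cannot parse into NA and emits a warning;
# suppressWarnings() keeps the sketch quiet.
converted <- suppressWarnings(as.numeric(raw_counts))

# Count entries that were present as text but became NA
n_lost <- sum(is.na(converted) & !is.na(raw_counts))
n_lost
```

Checking this count after the import is a cheap way to confirm that the coercion did not silently drop real values.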

Data Preview

My data focuses on the safety records of major commercial airlines over the last 30 years. The data set was created by FiveThirtyEight.com and is split into two 15-year periods. It covers three types of safety events: incidents, fatal incidents, and fatalities. The other variables provided are the number of events, the available seat kilometers flown per week, and the year range. The goal is to use these variables to answer the question below.

Question

Should travelers be more concerned about, and avoid, airlines that have been more susceptible to crashes in years past?

Using a categorical and a continuous variable to determine whether there is any correlation between the first 15-year period and the most recent one

airlines %>%
    ggplot(aes(x = year_range, y = n_events)) +
    geom_boxplot()
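
The boxplot above pools all three event types, so fatality counts (which can run into the hundreds) share an axis with much smaller incident counts. A hedged sketch of one way to separate them is to facet by `type_of_event`. The data frame `airlines_demo` and all of its values are made up so the block runs on its own, and the category labels (`incidents`, `fatal_accidents`, `fatalities`, `00_14`) are assumptions, since the printed tibble truncates that column and shows only the `85_99` period. With the real data, `airlines` would take the place of `airlines_demo`.

```r
library(ggplot2)

# Made-up stand-in for `airlines`:
# 2 hypothetical airlines x 3 event types x 2 periods = 12 rows.
airlines_demo <- expand.grid(
    airline       = c("Airline A", "Airline B"),
    type_of_event = c("incidents", "fatal_accidents", "fatalities"),
    year_range    = c("85_99", "00_14"),
    stringsAsFactors = FALSE
)
# Illustrative counts only, not real safety records
airlines_demo$n_events <- c(5, 9, 1, 2, 40, 120, 3, 6, 0, 1, 0, 60)

# One panel per event type, each with its own y scale
p <- ggplot(airlines_demo, aes(x = year_range, y = n_events)) +
    geom_boxplot() +
    facet_wrap(~ type_of_event, scales = "free_y")
p
```

ggplot2 orders character values alphabetically, so `00_14` is drawn before `85_99` unless `year_range` is converted to a factor with explicit levels.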

Using geom_bin2d for a continuous and a categorical variable

# The pipe already supplies the data, so ggplot() needs no data argument
airlines %>%
    ggplot() +
    geom_bin2d(mapping = aes(x = n_events, y = airline)) +
    theme(axis.text.x = element_text(angle = 90))

Using geom_boxplot to show the distribution of events per airline

# The pipe already supplies the data, so ggplot() needs no data argument
airlines %>%
    ggplot() +
    geom_boxplot(mapping = aes(x = n_events, y = airline)) +
    theme(axis.text.x = element_text(angle = 90))
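
The per-airline plots above list airlines alphabetically, which makes the highest counts hard to spot. A minimal sketch of one alternative is to order the airline axis by count with `reorder()`. It uses only the ten rows visible in the printed tibble (the `85_99` rows, whose truncated event type is assumed here to be incidents), and `first_ten` and `p_ranked` are names introduced for this example.

```r
library(ggplot2)

# The ten rows shown in the printed tibble (85_99 period)
first_ten <- data.frame(
    airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", "Aeromexico*",
                "Air Canada", "Air France", "Air India*", "Air New Zealand*",
                "Alaska Airlines*", "Alitalia"),
    n_events = c(2, 76, 6, 3, 2, 14, 2, 3, 5, 7)
)

# reorder() sorts the airline levels by n_events (ascending),
# so the largest count is drawn at the top of the y axis
p_ranked <- ggplot(first_ten, aes(x = n_events, y = reorder(airline, n_events))) +
    geom_col() +
    labs(x = "n_events", y = "airline")
p_ranked
```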

Conclusion

The first chart compared the number of events across the two year ranges to see whether the periods are correlated, and the other two compared airlines by number of events. Together they show some correlation, but not much can be concluded without a more in-depth understanding of each airline's situation.

American Airlines reported a high number of events, with 17 incidents and 416 deaths in a 15-year span. Malaysia Airlines has the most fatalities, yet it suffered them in just 6 total incidents, 2 of them fatal. Delta/Northwest Airlines ranked high in both charts, with 24 incidents, 12 fatal accidents, and 407 deaths, which suggests it has been prone to issues.

Although airline safety issues occur, they do not occur at an alarming rate even for the most susceptible airlines. Passengers who are concerned about particular airlines may choose other companies once they know that Delta and American seem more prone to issues. Lastly, it is important to note that many airlines are in the top 10 for fatalities because of a single fatal incident. According to a study conducted by Harvard University, the odds of a fatal airline crash are about 1 in 1.2 million and the odds of dying in one are 1 in 11 million. These numbers should reassure passengers that planes are safe. For those who want to dive into specifics, the 30-year data suggests that some companies are more prone to safety events than others.
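
Raw counts do not account for how much each airline flies. One way to add the context mentioned above is to scale events by `avail_seat_km_per_week`. The sketch below does this for the ten rows visible in the printed tibble. The unit (events per billion weekly seat kilometers), the column name `events_per_bn_km`, and the reading of the truncated event type as incidents are assumptions made here, not part of the original analysis.

```r
library(dplyr)

# The ten rows shown in the printed tibble (85_99 period)
first_ten <- data.frame(
    airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", "Aeromexico*",
                "Air Canada", "Air France", "Air India*", "Air New Zealand*",
                "Alaska Airlines*", "Alitalia"),
    avail_seat_km_per_week = c(320906734, 1197672318, 385803648, 596871813,
                               1865253802, 3004002661, 869253552, 710174817,
                               965346773, 698012498),
    n_events = c(2, 76, 6, 3, 2, 14, 2, 3, 5, 7)
)

# Events per billion available seat kilometers per week, highest rate first
rates <- first_ten %>%
    mutate(events_per_bn_km = n_events / (avail_seat_km_per_week / 1e9)) %>%
    arrange(desc(events_per_bn_km))
rates
```

In this subset Air France has the second-highest raw count (14) but falls to seventh of ten once traffic is taken into account, which illustrates why raw counts alone can mislead.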