This is an extract from my work for week 19 of the #TidyTuesday project.
This week’s dataset was focused on airlines’ accidents, particularly on differences between the 1985-1999 and 2000-2014 years. I decided to highlight how much each airline company reduced (or not) the amount of incidents, fatalities and fatal accidents in recent years. More information on these categories of accidents can be found in the original article.
All code and data can be found in my dedicated GitHub repository MyTidyTuesday.
library(tidyverse)
library(fivethirtyeight)
data("airline_safety")
These are the starting data (the original article can be found here):
head(airline_safety)
## # A tibble: 6 x 9
## airline incl_reg_subsid… avail_seat_km_p… incidents_85_99
## <chr> <lgl> <dbl> <int>
## 1 Aer Li… FALSE 320906734 2
## 2 Aerofl… TRUE 1197672318 76
## 3 Aeroli… FALSE 385803648 6
## 4 Aerome… TRUE 596871813 3
## 5 Air Ca… FALSE 1865253802 2
## 6 Air Fr… FALSE 3004002661 14
## # … with 5 more variables: fatal_accidents_85_99 <int>,
## # fatalities_85_99 <int>, incidents_00_14 <int>,
## # fatal_accidents_00_14 <int>, fatalities_00_14 <int>
Let’s calculate the difference of accidents in 2000-2014 vs 1985-1999; lower values mean a reduced number of accidents in recent years. After that, we’ll gather these values.
airline_diff <- airline_safety %>%
mutate(fatal_accidents = fatal_accidents_00_14 - fatal_accidents_85_99,
fatalities = fatalities_00_14 - fatalities_85_99,
incidents = incidents_00_14 - incidents_85_99) %>%
gather(key = "event", value = "occurrences", fatal_accidents, fatalities, incidents) %>%
select(everything(), -c(fatal_accidents_85_99, fatal_accidents_00_14, fatalities_85_99, fatalities_00_14, incidents_85_99, incidents_00_14))
The tidy dataset looks like this:
head(airline_diff)
## # A tibble: 6 x 5
## airline incl_reg_subsidia… avail_seat_km_per… event occurrences
## <chr> <lgl> <dbl> <chr> <int>
## 1 Aer Lingus FALSE 320906734 fatal_ac… 0
## 2 Aeroflot TRUE 1197672318 fatal_ac… -13
## 3 Aerolineas A… FALSE 385803648 fatal_ac… 0
## 4 Aeromexico TRUE 596871813 fatal_ac… -1
## 5 Air Canada FALSE 1865253802 fatal_ac… 0
## 6 Air France FALSE 3004002661 fatal_ac… -2
airline_diff %>%
filter(event == "fatalities", occurrences != 0) %>%
ggplot(aes(x = reorder(airline, occurrences), y = occurrences, fill = occurrences)) +
geom_col() +
coord_flip() +
scale_fill_gradientn(colors = c("darkgreen", "aquamarine3", "seagreen3", "yellow", "orange", "darkred")) +
labs(x = "Airline", y = "Fatalities", fill = "", title = "Difference in number of fatalities", subtitle = "Years 1985-1999 vs 2000-2014")
airline_diff %>%
filter(event == "fatal_accidents", occurrences != 0) %>%
ggplot(aes(x = reorder(airline, occurrences), y = occurrences, fill = occurrences)) +
geom_col() +
coord_flip() +
scale_fill_gradientn(colors = c("darkgreen", "aquamarine3", "seagreen3", "orange", "darkred"), values = c(0, 0.6, 0.7, 0.8, 1)) +
labs(x = "Airline", y = "Fatal Accidents", fill = "", title = "Difference in number of fatal accidents", subtitle = "Years 1985-1999 vs 2000-2014")
airline_diff %>%
filter(event == "incidents", occurrences != 0) %>%
ggplot(aes(x = reorder(airline, occurrences), y = occurrences, fill = occurrences)) +
geom_col() +
coord_flip() +
scale_fill_gradientn(colors = c("darkgreen", "aquamarine3", "seagreen3", "orange", "darkred"), values = c(0, 0.7, 0.8, 0.9, 1)) +
labs(x = "Airline", y = "Incidents", fill = "", title = "Difference in number of incidents", subtitle = "Years 1985-1999 vs 2000-2014")
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] fivethirtyeight_0.4.0 forcats_0.4.0 stringr_1.4.0
## [4] dplyr_0.8.1 purrr_0.3.2 readr_1.3.1
## [7] tidyr_0.8.3 tibble_2.1.1 ggplot2_3.1.1
## [10] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.7 haven_2.1.0 lattice_0.20-38
## [5] colorspace_1.4-1 generics_0.0.2 vctrs_0.1.0 htmltools_0.3.6
## [9] yaml_2.2.0 utf8_1.1.4 rlang_0.3.4 pillar_1.4.1
## [13] glue_1.3.1 withr_2.1.2 modelr_0.1.4 readxl_1.3.1
## [17] plyr_1.8.4 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
## [21] rvest_0.3.4 evaluate_0.14 labeling_0.3 knitr_1.23
## [25] fansi_0.4.0 broom_0.5.2 Rcpp_1.0.1 scales_1.0.0
## [29] backports_1.1.4 jsonlite_1.6 hms_0.4.2 digest_0.6.19
## [33] stringi_1.4.3 grid_3.6.0 cli_1.1.0 tools_3.6.0
## [37] magrittr_1.5 lazyeval_0.2.2 crayon_1.3.4 pkgconfig_2.0.2
## [41] zeallot_0.1.0 xml2_1.2.0 lubridate_1.7.4 assertthat_0.2.1
## [45] rmarkdown_1.13 httr_1.4.0 rstudioapi_0.10 R6_2.4.0
## [49] nlme_3.1-139 compiler_3.6.0