Plane crashes always draw a great deal of attention when they occur, leaving many people afraid to fly. Gathering the crash data together lets us step back and review three questions:
1.) Is it safer to fly now than it was in the past?
2.) Do all crashes result in fatalities?
3.) Is one airline more likely to crash than another?
The data was collected from https://www.kaggle.com/saurograndi/airplane-crashes-since-1908. The dataset is a .csv file containing records of plane crashes around the world from 1908 to 2009, with 13 columns of data.
Dataset updated: Feb 6, 2024
Dataset provided by: data.world, Inc.
Authors: Data Society
Time period covered: Sep 17, 1908 - Jun 8, 2009
Data Description
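The dataset covers airplane crashes around the world since 1908.
Variables:
Date: date of the crash
Time: time of day of the crash
Location: location of the crash
Operator: airline or organization operating the aircraft (for example "Military - U.S. Army" or "Private")
Flight #: flight number
Route: route or purpose of the flight (for example "Demonstration" or "Test flight")
Type: aircraft type (for example "Wright Flyer III")
The remaining columns (Registration, cn/In, Aboard, Fatalities, Ground, Summary) hold further descriptive details.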
To begin, I will take a quick look at the column types and the data in general. I uploaded the .csv into Google Sheets before bringing it into RStudio. The columns Flight #, Route, Registration, cn/In, and Ground are mostly empty and are not needed for this review, so I decided to remove them. Time is also sparsely filled but was kept.
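A quick way to back up a claim like "mostly empty" is to compute the share of missing values in each column. The sketch below uses a small hypothetical sample (not the real dataset) and an assumed 75% cutoff. With the real data, you would pass the `Airplane` tibble instead of `crashes`.

```r
# Hypothetical sample rows; with the real data, use the Airplane tibble instead.
crashes <- data.frame(
  Date         = c("09/17/1908", "07/12/1912", "08/06/1913", "09/09/1913"),
  `Flight #`   = c(NA, NA, "-", NA),
  Registration = c(NA, NA, NA, NA),
  Aboard       = c(2, 5, 1, 20),
  check.names      = FALSE,  # keep the "Flight #" column name as-is
  stringsAsFactors = FALSE
)

# Proportion of NA values per column (0 = fully populated, 1 = fully empty)
missing_share <- colMeans(is.na(crashes))
print(round(missing_share, 2))

# Assumed rule: columns at least 75% empty are candidates for removal
drop_cols <- names(missing_share)[missing_share >= 0.75]
trimmed   <- crashes[, setdiff(names(crashes), drop_cols)]
print(names(trimmed))
```

The threshold is a judgment call. A column such as Ground may be fully populated yet still irrelevant to the questions, so dropping it is not something this check can decide.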
There was no stray white space that needed to be cleaned up. The dates are in a typical mm/dd/yyyy format, although read_csv() reads the Date column as character (chr), as the column specification below shows.
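Because Date arrives as a character string, it has to be parsed before any grouping by year or decade. Here is a minimal base-R sketch, using a few example date strings in the same mm/dd/yyyy layout. `lubridate::mdy()` would be a common alternative.

```r
# Example strings in the dataset's mm/dd/yyyy layout
date_strings <- c("09/17/1908", "07/12/1912", "06/08/2009")

# Parse into real Date objects (%m = month, %d = day, %Y = four-digit year)
crash_dates <- as.Date(date_strings, format = "%m/%d/%Y")

# Extract the calendar year and the decade it falls in (integer arithmetic)
crash_year   <- as.integer(format(crash_dates, "%Y"))
crash_decade <- (crash_year %/% 10L) * 10L

print(crash_dates)
print(crash_decade)
```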
Loading and reviewing the data:
library(readr)  # read_csv()
library(dplyr)  # %>%, select(), group_by(), summarise()
Airplane <- read_csv("AirplaneDatace.csv")
## Rows: 5268 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Date, Time, Location, Operator, Flight #, Route, Type, Registratio...
## dbl (3): Aboard, Fatalities, Ground
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Airplane)
## # A tibble: 6 × 13
## Date Time Location Operator `Flight #` Route Type Registration `cn/In`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 09/17/1908 17:18 Fort My… Militar… <NA> Demo… Wrig… <NA> 1
## 2 07/12/1912 6:30 Atlanti… Militar… <NA> Test… Diri… <NA> <NA>
## 3 08/06/1913 <NA> Victori… Private - <NA> Curt… <NA> <NA>
## 4 09/09/1913 18:30 Over th… Militar… <NA> <NA> Zepp… <NA> <NA>
## 5 10/17/1913 10:30 Near Jo… Militar… <NA> <NA> Zepp… <NA> <NA>
## 6 03/05/1915 1:00 Tienen,… Militar… <NA> <NA> Zepp… <NA> <NA>
## # ℹ 4 more variables: Aboard <dbl>, Fatalities <dbl>, Ground <dbl>,
## # Summary <chr>
colnames(Airplane)
## [1] "Date" "Time" "Location" "Operator" "Flight #"
## [6] "Route" "Type" "Registration" "cn/In" "Aboard"
## [11] "Fatalities" "Ground" "Summary"
str(Airplane)
## spc_tbl_ [5,268 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:5268] "09/17/1908" "07/12/1912" "08/06/1913" "09/09/1913" ...
## $ Time : chr [1:5268] "17:18" "6:30" NA "18:30" ...
## $ Location : chr [1:5268] "Fort Myer, Virginia" "AtlantiCity, New Jersey" "Victoria, British Columbia, Canada" "Over the North Sea" ...
## $ Operator : chr [1:5268] "Military - U.S. Army" "Military - U.S. Navy" "Private" "Military - German Navy" ...
## $ Flight # : chr [1:5268] NA NA "-" NA ...
## $ Route : chr [1:5268] "Demonstration" "Test flight" NA NA ...
## $ Type : chr [1:5268] "Wright Flyer III" "Dirigible" "Curtiss seaplane" "Zeppelin L-1 (airship)" ...
## $ Registration: chr [1:5268] NA NA NA NA ...
## $ cn/In : chr [1:5268] "1" NA NA NA ...
## $ Aboard : num [1:5268] 2 5 1 20 30 41 19 20 22 19 ...
## $ Fatalities : num [1:5268] 1 5 1 14 30 21 19 20 22 19 ...
## $ Ground : num [1:5268] 0 0 0 0 0 0 0 0 0 0 ...
## $ Summary : chr [1:5268] "During a demonstration flight, a U.S. Army flyer flown by Orville Wright nose-dived into the ground from a heig"| __truncated__ "First U.S. dirigible Akron exploded just offshore at an altitude of 1,000 ft. during a test flight." "The first fatal airplane accident in Canada occurred when American barnstormer, John M. Bryant, California aviator was killed." "The airship flew into a thunderstorm and encountered a severe downdraft crashing 20 miles north of Helgoland Is"| __truncated__ ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Time = col_character(),
## .. Location = col_character(),
## .. Operator = col_character(),
## .. `Flight #` = col_character(),
## .. Route = col_character(),
## .. Type = col_character(),
## .. Registration = col_character(),
## .. `cn/In` = col_character(),
## .. Aboard = col_double(),
## .. Fatalities = col_double(),
## .. Ground = col_double(),
## .. Summary = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Removing the columns Flight #, Route, Registration, cn/In, and Ground (Time is kept):
Airplane <- Airplane %>% select(-c('Flight #', 'Route', 'Registration', 'cn/In', 'Ground'))
The Summary column was kept so that potential outliers or military-related crashes can be reviewed later and removed if needed.
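One way to set aside military crashes is to rely on the Operator naming pattern visible in the str() output, for example "Military - U.S. Army". The minimal sketch below makes two assumptions: that every military operator starts with "Military", and that the rows are hypothetical sample data. The Summary text could be searched the same way with `grepl(..., ignore.case = TRUE)`.

```r
# Hypothetical sample rows; assumes military operators start with "Military"
crashes <- data.frame(
  Operator   = c("Military - U.S. Army", "Aeroflot", "Military - U.S. Navy", "Air France"),
  Fatalities = c(1, 7, 5, 12),
  stringsAsFactors = FALSE
)

# TRUE for operators whose name begins with "Military"
is_military <- grepl("^Military", crashes$Operator)

# Keep only non-military (civil) crashes
civil_crashes <- crashes[!is_military, ]
print(civil_crashes)
```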
Now that the data has been formatted and the packages loaded, we can begin our analysis.
colnames(Airplane)
## [1] "Date" "Time" "Location" "Operator" "Type"
## [6] "Aboard" "Fatalities" "Summary"
## # A tibble: 6 × 2
## Operator Total_Fatalities
## <chr> <dbl>
## 1 Aeroflot 7156
## 2 Military - U.S. Air Force 3717
## 3 Air France 1734
## 4 American Airlines 1421
## 5 Pan American World Airways 1302
## 6 Military - U.S. Army Air Forces 1070
The chart shows that the number of fatalities peaked in the 1970s and has been in steady decline since. This is especially encouraging because the total number of flights flown has increased over the same period. With fatalities falling while the number of passengers rises, the share of all airline passengers who die in crashes has dropped significantly.
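A decade-level trend like this one comes from parsing Date and summing Fatalities per decade. The sketch below assumes only the Date and Fatalities columns and uses hypothetical sample rows, not values from the dataset.

```r
# Hypothetical sample rows in the dataset's mm/dd/yyyy Date layout
crashes <- data.frame(
  Date       = c("03/05/1915", "07/12/1972", "11/20/1975", "01/02/1998", "06/01/2009"),
  Fatalities = c(21, 150, 80, 30, 60),
  stringsAsFactors = FALSE
)

# Parse dates, then bucket each crash into its decade
crashes$Year   <- as.integer(format(as.Date(crashes$Date, format = "%m/%d/%Y"), "%Y"))
crashes$Decade <- (crashes$Year %/% 10L) * 10L

# Total fatalities per decade, sorted by decade
by_decade <- aggregate(Fatalities ~ Decade, data = crashes, FUN = sum)
print(by_decade)
```

The dataset ends on Jun 8, 2009, so the 2000s bucket covers fewer than ten years. Its total is not directly comparable with earlier decades.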
library(ggplot2)
library(dplyr)
# Summarize total fatalities per operator
Operator_summary <- Airplane %>%
  group_by(Operator) %>%
  summarise(Total_Fatalities = sum(Fatalities, na.rm = TRUE)) %>%
  arrange(desc(Total_Fatalities)) %>%
  filter(Total_Fatalities > 0) %>%
  slice_max(order_by = Total_Fatalities, n = 10) # Select top 10 (ties can add rows)
# Percentage of the top-10 total (not of all fatalities)
Operator_summary <- Operator_summary %>%
  mutate(Percentage = Total_Fatalities / sum(Total_Fatalities) * 100)
# Create pie chart with percentages displayed
ggplot(Operator_summary, aes(x = "", y = Total_Fatalities, fill = Operator)) +
  geom_col(width = 1) + # Creates stacked bars (pie slices)
  coord_polar("y", start = 0) + # Converts the stacked bar to a pie chart
  theme_void() + # Removes background, grid, and axes
  labs(title = "Total Fatalities by Operator (Top 10)") +
  theme(legend.position = "right") + # Places the legend on the right
  geom_text(aes(label = paste0(round(Percentage, 1), "%")), # Add percentage labels
            position = position_stack(vjust = 0.5)) # Center labels within each slice
I originally broke the pie chart down by the top 10 operators because there are a large number of operators, many of which had very few crashes. I then wanted to show that the top 10 make up only a small percentage of the total, so I added an "11th" category that groups all remaining operators.
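The pie-chart code above computes percentages of the top-10 total only. To make the slices shares of all fatalities, the remaining operators can be folded into one extra row. Below is a minimal base-R sketch of that "11th" category. The helper `top_with_others()` and the sample rows are hypothetical, and the output reuses the `Total_Fatalities` and `Percentage` column names from the chart code.

```r
# Hypothetical sample rows with Operator and Fatalities columns
crashes <- data.frame(
  Operator   = c("A", "B", "C", "D", "A", "E"),
  Fatalities = c(50, 30, 10, 5, 20, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top `n` operators plus one "All others" row
top_with_others <- function(df, n = 10) {
  totals <- aggregate(Fatalities ~ Operator, data = df, FUN = sum)
  names(totals)[2] <- "Total_Fatalities"
  totals <- totals[order(-totals$Total_Fatalities), ]  # largest first
  top    <- head(totals, n)
  others <- data.frame(
    Operator         = "All others",
    Total_Fatalities = sum(totals$Total_Fatalities) - sum(top$Total_Fatalities),
    stringsAsFactors = FALSE
  )
  result <- rbind(top, others)
  # Percentages now sum to 100% of ALL fatalities, not just the top n
  result$Percentage <- result$Total_Fatalities / sum(result$Total_Fatalities) * 100
  rownames(result) <- NULL
  result
}

summary_tbl <- top_with_others(crashes, n = 2)
print(summary_tbl)
```

With the real data, `top_with_others(Airplane, n = 10)` would produce a table that the same `ggplot()` call could plot.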
Is it safer to fly now than it was in the past?
From what we have seen, it is safer to fly now than it was in the past. Total airline passenger fatalities peaked in the 1970s and have been in steady decline since. Combined with the growth in the total number of passengers traveling, this means the share of all airline passengers killed in crashes has dropped significantly.
Do all crashes result in fatalities?
Next, we want to know whether a crash always ends in fatalities. From the data given, the total number of fatalities has gone down since 1970. However, the share of people aboard a crashed aircraft who are killed has stayed roughly the same: about 25% of the people aboard a crashed flight survive. In other words, a crash does not necessarily mean that everyone on board is killed.
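Two quantities help answer this question directly: the number of crashes with zero fatalities, and the overall share of people aboard who survived. Here is a minimal sketch that uses the Aboard and Fatalities columns, with hypothetical sample rows.

```r
# Hypothetical sample rows
crashes <- data.frame(
  Aboard     = c(2, 5, 20, 30, 10),
  Fatalities = c(1, 5, 0, 30, 4)
)

# Ignore rows where either count is missing
complete <- crashes[!is.na(crashes$Aboard) & !is.na(crashes$Fatalities), ]

# Crashes in which nobody aboard died
n_no_deaths <- sum(complete$Fatalities == 0)

# Pooled survival share: survivors across all crashes / everyone aboard
survival_rate <- 1 - sum(complete$Fatalities) / sum(complete$Aboard)

cat("Crashes with zero fatalities:", n_no_deaths, "\n")
cat("Share of people aboard who survived:", round(survival_rate * 100, 1), "%\n")
```

This pooled ratio weights large crashes more heavily than averaging a survival rate per crash would, so the two approaches can give different figures.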
Is one airline more likely to crash than another?
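When we broke the data down by the top ten operators, Aeroflot looked like an airline you would want to avoid. However, although Aeroflot tops the list, the picture softens when we compare the top 10 with all other operators. Once the remaining operators are added as an 11th category, you can see that the top 10 do not account for the majority of fatalities. Keep in mind that these are raw totals: without the number of flights each operator flew, they cannot show on their own that one airline is more likely to crash than another.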
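Ranking operators by number of crashes and by total fatalities can give different answers, so it can help to look at both side by side. Below is a minimal sketch with hypothetical sample rows. Each row is assumed to be one crash.

```r
# Hypothetical sample rows; each row is one crash
crashes <- data.frame(
  Operator   = c("A", "A", "A", "B", "C"),
  Fatalities = c(2, 3, 1, 150, 40),
  stringsAsFactors = FALSE
)

# Number of crashes (rows) per operator
crash_counts <- as.data.frame(table(Operator = crashes$Operator),
                              responseName = "Crashes", stringsAsFactors = FALSE)

# Total fatalities per operator
fatal_totals <- aggregate(Fatalities ~ Operator, data = crashes, FUN = sum)

# Combine and sort by crash count
by_operator <- merge(crash_counts, fatal_totals, by = "Operator")
by_operator <- by_operator[order(-by_operator$Crashes), ]
print(by_operator)
```

In this toy data, operator A has the most crashes but the fewest fatalities. Neither column measures crash likelihood, because the dataset does not record how many flights each operator flew.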