The article I have chosen is “Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?” It looks at the number of incidents, fatal accidents and fatalities that airlines had in two time periods, 1985-1999 and 2000-2014. The data raise a number of questions, for example: does a history of previous incidents predict future incidents?
The article is available at https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/
Load the data from the GitHub repository linked in the article above.
library(RCurl)  # getURL()
library(dplyr)  # mutate(), select()
x <- getURL('https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv')
airlinesafety <- read.csv(text = x)
I will now give the columns more meaningful names.
names(airlinesafety)=c("Airline","Available Seat Kilometers Per Week","Incidents 1985-1999","Fatal Accidents 1985-1999","Fatalities 1985-1999","Incidents 2000-2014","Fatal Accidents 2000-2014","Fatalities 2000-2014")
colnames(airlinesafety)
## [1] "Airline" "Available Seat Kilometers Per Week"
## [3] "Incidents 1985-1999" "Fatal Accidents 1985-1999"
## [5] "Fatalities 1985-1999" "Incidents 2000-2014"
## [7] "Fatal Accidents 2000-2014" "Fatalities 2000-2014"
The article makes use of a safety score as defined below. 1
airlinesafety <- mutate(airlinesafety, "Safety Score 1985-1999 Incidents" = (mean(airlinesafety$`Incidents 1985-1999`) -`Incidents 1985-1999`) * sqrt( `Available Seat Kilometers Per Week` ) )
airlinesafety <- mutate(airlinesafety, "Safety Score 1985-1999 Fatal Accidents" = (mean(airlinesafety$`Fatal Accidents 1985-1999`) -`Fatal Accidents 1985-1999`) * sqrt( `Available Seat Kilometers Per Week` ) )
airlinesafety <- mutate(airlinesafety, "Safety Score 1985-1999 Fatalities" = (mean(airlinesafety$`Fatalities 1985-1999`) -`Fatalities 1985-1999`) * sqrt( `Available Seat Kilometers Per Week` ) )
airlinesafety <- mutate(airlinesafety, "Safety Score 2000-2014 Incidents" = (mean(airlinesafety$`Incidents 2000-2014`) -`Incidents 2000-2014`) * sqrt( `Available Seat Kilometers Per Week` ) )
airlinesafety <- mutate(airlinesafety, "Safety Score 2000-2014 Fatal Accidents" = (mean(airlinesafety$`Fatal Accidents 2000-2014`) -`Fatal Accidents 2000-2014`) * sqrt( `Available Seat Kilometers Per Week` ) )
airlinesafety <- mutate(airlinesafety, "Safety Score 2000-2014 Fatalities" = (mean(airlinesafety$`Fatalities 2000-2014`) -`Fatalities 2000-2014`) * sqrt( `Available Seat Kilometers Per Week` ) )
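The six mutate() calls above repeat one formula, and copying it by hand makes it easy to pair a score with the wrong period's column. Below is a minimal base-R sketch of a helper that applies the same formula used in this reproduction, (column mean minus the airline's value) multiplied by the square root of available seat kilometers per week. The function name safety_score and the toy data frame are hypothetical and only for illustration. The numbers are made up, not taken from the FiveThirtyEight dataset, and the formula is this post's reproduction, not a confirmed copy of the article's exact definition.

```r
# Hypothetical helper reproducing the formula used in the mutate() calls:
# (mean of the column - airline's value) * sqrt(available seat km per week)
safety_score <- function(counts, seat_km) {
  (mean(counts) - counts) * sqrt(seat_km)
}

# Toy data, made up for illustration (NOT the FiveThirtyEight figures)
toy <- data.frame(
  Airline = c("A", "B", "C"),
  seat_km = c(100, 400, 900),
  incidents_85_99 = c(2, 5, 8),
  incidents_00_14 = c(1, 3, 2)
)

# Build each score from its own period's column, so the source column
# cannot drift out of sync with the score's name
for (col in c("incidents_85_99", "incidents_00_14")) {
  toy[[paste0("score_", col)]] <- safety_score(toy[[col]], toy$seat_km)
}
print(toy)
# score_incidents_85_99: 30, 0, -90
# score_incidents_00_14: 10, -20, 0

# On the real data frame (not run here), one score would be:
# airlinesafety[["Safety Score 2000-2014 Fatalities"]] <- safety_score(
#   airlinesafety[["Fatalities 2000-2014"]],
#   airlinesafety[["Available Seat Kilometers Per Week"]])
```

Passing the column by name keeps the period in the score's name and the period of the data it is computed from in one place.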
# scale() returns a one-column matrix; as.numeric() stores each z-score as a plain vector
airlinesafety$"Safety Score 1985-1999 Incidents Normalized" = as.numeric( scale( airlinesafety$"Safety Score 1985-1999 Incidents" ) )
airlinesafety$"Safety Score 1985-1999 Fatal Accidents Normalized" = as.numeric( scale( airlinesafety$"Safety Score 1985-1999 Fatal Accidents" ) )
airlinesafety$"Safety Score 1985-1999 Fatalities Normalized" = as.numeric( scale( airlinesafety$"Safety Score 1985-1999 Fatalities" ) )
airlinesafety$"Safety Score 2000-2014 Incidents Normalized" = as.numeric( scale( airlinesafety$"Safety Score 2000-2014 Incidents" ) )
airlinesafety$"Safety Score 2000-2014 Fatal Accidents Normalized" = as.numeric( scale( airlinesafety$"Safety Score 2000-2014 Fatal Accidents" ) )
airlinesafety$"Safety Score 2000-2014 Fatalities Normalized" = as.numeric( scale( airlinesafety$"Safety Score 2000-2014 Fatalities" ) )
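The normalization step turns each raw score into a z-score. scale() subtracts the mean and divides by the sample standard deviation, so the three components end up on a comparable scale before they are averaged. The short sketch below uses made-up scores, not values from the dataset, to check that scale() matches the hand-written formula. It also shows that scale() returns a matrix rather than a plain vector.

```r
# Made-up scores for three airlines (illustration only)
scores <- c(30, 0, -90)

# scale() centres on the mean and divides by the sample standard deviation,
# but returns a one-column matrix; as.numeric() turns it into a plain vector
z <- as.numeric(scale(scores))

# The same z-scores written out by hand
z_manual <- (scores - mean(scores)) / sd(scores)

print(round(z, 3))  # approx. 0.801 0.320 -1.121
```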
airlinesafety <- mutate ( airlinesafety , "Average Safety Score 1985-1999"= rowMeans( select(airlinesafety ,"Safety Score 1985-1999 Incidents Normalized" ,"Safety Score 1985-1999 Fatal Accidents Normalized", "Safety Score 1985-1999 Fatalities Normalized" ) ) )
airlinesafety <- mutate ( airlinesafety , "Average Safety Score 2000-2014"= rowMeans( select(airlinesafety ,"Safety Score 2000-2014 Incidents Normalized" ,"Safety Score 2000-2014 Fatal Accidents Normalized", "Safety Score 2000-2014 Fatalities Normalized" ) ) )
airlinesafety <- mutate(airlinesafety, "Increase in Score" = ifelse((airlinesafety$"Average Safety Score 1985-1999" < airlinesafety$"Average Safety Score 2000-2014"), "True","False"))
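The last three steps average the three normalized components for each period and then flag airlines whose average rose from 1985-1999 to 2000-2014. The base-R sketch below does the same on a hypothetical two-airline data frame with made-up column names and values. It stores the flag as a logical TRUE/FALSE column instead of "True"/"False" strings. That is a design choice, not what the code above does, and it lets sum() count the airlines whose score went up.

```r
# Made-up normalized scores for two airlines (illustration only)
toy_avg <- data.frame(
  inc_85_99 = c(0.5, -1.0), fat_acc_85_99 = c(0.2, -0.4), deaths_85_99 = c(-0.1, 0.4),
  inc_00_14 = c(0.8, -1.2), fat_acc_00_14 = c(0.1, -0.3), deaths_00_14 = c(0.3, 0.0)
)

# Average the three normalized components within each period, row by row
toy_avg$avg_85_99 <- rowMeans(toy_avg[, c("inc_85_99", "fat_acc_85_99", "deaths_85_99")])
toy_avg$avg_00_14 <- rowMeans(toy_avg[, c("inc_00_14", "fat_acc_00_14", "deaths_00_14")])

# A logical column instead of "True"/"False" strings
toy_avg$increase <- toy_avg$avg_00_14 > toy_avg$avg_85_99
print(toy_avg[, c("avg_85_99", "avg_00_14", "increase")])

# sum() counts the TRUE values: airlines whose average score went up
print(sum(toy_avg$increase))  # 1
```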
For this assignment, I display only the average safety scores for 1985-1999 and 2000-2014, together with whether the score increased between the two periods.
library(reactable)  # interactive HTML table
reactable( select(airlinesafety, "Airline", "Average Safety Score 1985-1999", "Average Safety Score 2000-2014", "Increase in Score") )
Although I did not analyze it directly in this assignment, the data suggest a weak correlation between an airline's past incidents and its future incidents. The article went further, associating each airline with its country of origin, and, using each country's GDP, found a stronger correlation between a country's level of development and the safety of its airlines. This may be due to stricter safety rules in more developed countries.
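This post does not compute the correlation itself. One way to check the claim would be base R's cor() on the two incident columns. The sketch below runs it on five made-up pairs of counts, not the real airline figures, and reports both Pearson's r and Spearman's rank correlation. The rank version is less sensitive to a few airlines with very large counts. It does not reproduce the article's result.

```r
# Made-up incident counts for five airlines in two periods (illustration only)
incidents_85_99 <- c(1, 2, 3, 4, 5)
incidents_00_14 <- c(2, 1, 4, 3, 5)

# Pearson correlation between the two periods
r_pearson <- cor(incidents_85_99, incidents_00_14)

# Spearman (rank) correlation, less sensitive to a few very large counts
r_spearman <- cor(incidents_85_99, incidents_00_14, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))  # both 0.8 for these toy values

# On the real data frame (not run here):
# cor(airlinesafety[["Incidents 1985-1999"]], airlinesafety[["Incidents 2000-2014"]])
```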
The subsequent text was taken directly from the article, https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/ . I have tried to reproduce the score in the code below.