Data & Packages

Data Source

The “Accidents, Fatalities, and Rates, 1995 through 2014, U.S. General Aviation” data set was obtained from Data.gov, where its access level is classified as public. Although the title refers to 1995 through 2014, the rows used in this report run from 1975 through 2014. No license information was provided on the US Data Catalog metadata page, but the page states that the data is intended for public access and use. The data also appears to fall under the U.S. Government Work copyright guidelines.

# Load data
 aviation <- read.csv(url("http://www.ntsb.gov/investigations/data/Documents/datafiles/table10_2014.csv"), stringsAsFactors = FALSE)

Packages

Loaded Packages

  • rmarkdown
  • ggplot2
  • forcats
  • tidyverse
  • yaml
  • knitr
#Load packages (lib.loc omitted so the default library paths are used)
library(forcats)
library(ggplot2)
library(tidyverse)
library(yaml)
library(rmarkdown)
library(knitr)

Data Wrangling

The data manipulation and calculations performed for this analysis are shown below.

#Remove blank entries: rows 1-4, 41, and 45-65, plus columns 9-10
aviation2 <- aviation[-c(1,2,3,4,41, 45:65), -c(9,10)]

#Create aviation3 data frame for manipulation
aviation3 <- aviation2

#Rename columns
aviation3 <- aviation3 %>% rename("Year" = "Table.10...Accidents..Fatalities..and.Rates..1995.through.2014.")
aviation3 <- aviation3 %>% rename("All_Accidents" = X)
aviation3 <- aviation3 %>% rename("Fatal_Accidents" = X.1)
aviation3 <- aviation3 %>% rename("Fatalities" = X.2)
aviation3 <- aviation3 %>% rename("Aboard" = X.3)
aviation3 <- aviation3 %>% rename("Flight_Hours" = X.4)
aviation3 <- aviation3 %>% rename("Accidents_per_100,000_Flight_Hours" = X.5)
aviation3 <- aviation3 %>% rename("Fatal_Accidents_per_100,000_Flight_Hours" = X.6)

#Remove commas from numeric data
aviation3[, 'All_Accidents'] <- gsub(",","", aviation3[, 'All_Accidents'])
aviation3[, 'Fatal_Accidents'] <- gsub(",","", aviation3[, 'Fatal_Accidents'])
aviation3[, 'Fatalities'] <- gsub(",","", aviation3[, 'Fatalities'])
aviation3[, 'Aboard'] <- gsub(",","", aviation3[, 'Aboard'])
aviation3[, 'Flight_Hours'] <- gsub(",","", aviation3[, 'Flight_Hours'])
aviation3[, 'Accidents_per_100,000_Flight_Hours'] <- gsub(",","", aviation3[, 'Accidents_per_100,000_Flight_Hours'])
aviation3[, 'Fatal_Accidents_per_100,000_Flight_Hours'] <- gsub(",","", aviation3[, 'Fatal_Accidents_per_100,000_Flight_Hours'])

#Change data types to numeric
aviation3[, c(1:6)] <- sapply(aviation3[, c(1:6)], as.integer)
aviation3[, c(7:8)] <- sapply(aviation3[, c(7:8)], as.numeric)
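
As a compact illustration of the two cleaning steps above, the sketch below strips thousands separators and converts column types on a small made-up data frame. The data frame `toy`, its column names, and its values are hypothetical and are not taken from the NTSB file.

```r
# Hypothetical toy data that mimics comma-formatted strings in a raw CSV
toy <- data.frame(
  Year  = c("2001", "2002"),
  Count = c("1,234", "5,678"),
  Rate  = c("1.50", "2.25"),
  stringsAsFactors = FALSE
)

# Strip commas from every column in one pass; toy[] keeps the data frame shape
toy[] <- lapply(toy, function(x) gsub(",", "", x, fixed = TRUE))

# Convert count-like columns to integer and the rate column to numeric
toy[c("Year", "Count")] <- lapply(toy[c("Year", "Count")], as.integer)
toy$Rate <- as.numeric(toy$Rate)

str(toy)
```

Using `lapply()` over the columns avoids repeating one `gsub()` line per column, which may be easier to maintain if columns are added or renamed.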

#Calculations

#Proportion of accidents that resulted in fatalities (0 to 1, not multiplied by 100)
aviation3$Percent_Accidents_Fatal <- aviation3$Fatal_Accidents / aviation3$All_Accidents

#GGPLOT of the percentage of accidents that included fatalities
ggplot(aviation3, aes(Year, Percent_Accidents_Fatal)) + geom_point() + geom_smooth(method = "lm") + 
  labs(x = "Year", y = "Percentage") + 
  ggtitle("Percent of Accidents That Included Fatalities")

#GGPLOT of the accidents and fatalities
#ggplot(aviation3, aes(Year, All_Accidents)) + geom_point() + labs(x = "Year", y = "All Accident")

ggplot(aviation3, aes(x = Year))+ 
  geom_line(aes(y = All_Accidents, colour = "Accidents"))+
  geom_line(aes(y = Fatal_Accidents, colour = "Fatal Accidents")) +
  scale_y_continuous(sec.axis = sec_axis(~.*5, name = "Fatal Accidents")) +
  scale_colour_manual(values = c("blue", "red")) + 
  labs(x = "Year", y = "Accidents", colour = "Parameter")+
  ggtitle("Number of Accidents & Fatal Accidents")

Aviation Accidents and Fatalities

As a frequent flyer, I have always felt safe in the air. Reviewing accident data over a 40-year span has reaffirmed my belief that flying is the quickest and safest way to travel. While the number of aviation accidents and the number of fatal aviation accidents have both dropped steadily over the last 40 years, the share of accidents that are fatal has increased, from about 16% in 1975 to about 21% in 2014. This means that while you are less likely to be in an aviation accident, the accident is more likely to be fatal if you happen to be in one.
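
The trend described above can be checked against the first and last rows of the table later in this report. The sketch below recomputes the fatal share for 1975 and 2014 from the counts shown there; the variable names are illustrative only and do not appear in the analysis code.

```r
# Counts copied from the 1975 and 2014 rows of the table in this report
all_accidents   <- c(`1975` = 3995, `2014` = 1221)
fatal_accidents <- c(`1975` = 633,  `2014` = 253)

# Share of accidents that were fatal, as a percentage
fatal_share <- round(100 * fatal_accidents / all_accidents, 1)
print(fatal_share)

# Change in total accidents between the two years, as a percentage
accident_change <- round(100 * (all_accidents[["2014"]] / all_accidents[["1975"]] - 1), 1)
print(accident_change)
```

The first result shows the fatal share rising between the two years, while the second shows the total number of accidents falling over the same period.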

#Table
avi4 <- aviation3
avi4 <- select(avi4, -'Accidents_per_100,000_Flight_Hours')
avi4 <- select(avi4, -'Fatal_Accidents_per_100,000_Flight_Hours')

kable(avi4, caption="Aviation Accident Data")
Aviation Accident Data
Row Year All_Accidents Fatal_Accidents Fatalities Aboard Flight_Hours Percent_Accidents_Fatal
5 1975 3995 633 1252 1231 28799000 0.1584481
6 1976 4018 658 1216 1203 30476000 0.1637631
7 1977 4079 661 1276 1265 31578000 0.1620495
8 1978 4216 719 1556 1398 34887000 0.1705408
9 1979 3818 631 1221 1203 38641000 0.1652698
10 1980 3590 618 1239 1230 36402000 0.1721448
11 1981 3500 654 1282 1261 36803000 0.1868571
12 1982 3233 591 1187 1171 29640000 0.1828024
13 1983 3075 555 1068 1061 28673000 0.1804878
14 1984 3017 545 1042 1021 29099000 0.1806430
15 1985 2739 498 956 945 28322000 0.1818182
16 1986 2581 474 967 879 27073000 0.1836497
17 1987 2494 446 837 822 26972000 0.1788292
18 1988 2388 460 797 792 27446000 0.1926298
19 1989 2242 432 769 766 27920000 0.1926851
20 1990 2242 444 770 765 28510000 0.1980375
21 1991 2197 439 800 786 27678000 0.1998179
22 1992 2110 450 866 864 24780000 0.2132701
23 1993 2064 401 744 740 22796000 0.1942829
24 1994 2021 404 730 723 22235000 0.1999010
25 1995 2056 412 734 727 24906000 0.2003891
26 1996 1908 361 636 619 24881000 0.1892034
27 1997 1840 350 631 625 25591000 0.1902174
28 1998 1902 364 624 618 25518000 0.1913775
29 1999 1905 340 621 615 29246000 0.1784777
30 2000 1837 345 596 585 27838000 0.1878062
31 2001 1727 325 562 558 25431000 0.1881876
32 2002 1716 345 581 575 25545000 0.2010490
33 2003 1741 352 633 630 25998000 0.2021827
34 2004 1619 314 559 559 24888000 0.1939469
35 2005 1671 321 563 558 23168000 0.1921005
36 2006 1523 308 706 547 23963000 0.2022324
37 2007 1654 288 496 491 23819000 0.1741233
38 2008 1568 277 496 487 22805000 0.1766582
39 2009 1480 275 479 470 20862000 0.1858108
40 2010 1440 271 458 455 21688000 0.1881944
42 2012 1470 272 437 437 20881000 0.1850340
43 2013 1224 222 391 386 19492000 0.1813725
44 2014 1221 253 419 410 18103000 0.2072072

Issues and Concerns

  1. I struggled to adjust the secondary axis on my chart of accidents and fatal accidents.
  2. The other issue I encountered was finding data I wanted to use for the assignment.
  3. Data wrangling can be challenging.
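
On the first issue, one common approach with ggplot2's `sec_axis()` is to rescale the second series onto the primary axis and give the secondary axis the inverse transformation, so that both lines and both sets of tick labels agree. The sketch below assumes a scale factor of 5 and uses a few rows copied from the table above; `demo` and `scale_factor` are hypothetical names, and this is one possible fix rather than the code used for the chart in this report.

```r
library(ggplot2)

# A few rows copied from the table in this report
demo <- data.frame(
  Year            = c(1975, 1985, 1995, 2005, 2014),
  All_Accidents   = c(3995, 2739, 2056, 1671, 1221),
  Fatal_Accidents = c(633, 498, 412, 321, 253)
)

# Assumed factor: fatal accidents are roughly one fifth of all accidents
scale_factor <- 5

p <- ggplot(demo, aes(x = Year)) +
  geom_line(aes(y = All_Accidents, colour = "Accidents")) +
  # Multiply the second series so it is drawn on the primary axis scale
  geom_line(aes(y = Fatal_Accidents * scale_factor, colour = "Fatal Accidents")) +
  # Divide on the secondary axis so its labels show the original values
  scale_y_continuous(
    name = "Accidents",
    sec.axis = sec_axis(~ . / scale_factor, name = "Fatal Accidents")
  ) +
  scale_colour_manual(values = c("Accidents" = "blue", "Fatal Accidents" = "red")) +
  labs(x = "Year", colour = "Parameter")

print(p)
```

The key design choice is that the multiplication in `aes()` and the division in `sec_axis()` use the same factor; if only one of them is applied, the right-hand axis labels no longer match the plotted line.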