OVERVIEW

This report presents an Exploratory Data Analysis (EDA) of airline accidents, aiming to uncover patterns and trends in total incidents, fatal accidents, and total fatalities. By analyzing and visualizing the data, we can gain insight into how these accidents are distributed across airlines and how their impact on the aviation industry has changed.

The analysis relies primarily on visualization, using charts and graphs to present the information clearly and concisely. By comparing the frequency and distribution of airline accidents across two periods (1985-1999 and 2000-2014), we can identify significant changes. Additionally, we explore the relationship between fatal accidents and total fatalities, which sheds light on the severity of different incidents.

INTRODUCTION

Airline accidents have always been a matter of concern for both the aviation industry and the general public. Understanding the patterns and trends of these accidents is crucial for improving safety measures and preventing future incidents. Exploratory Data Analysis (EDA) provides valuable insights into the characteristics and factors associated with airline accidents. By analyzing data related to total incidents, fatal accidents, and total fatalities, we can gain a deeper understanding of the risks and challenges faced by the aviation industry.

ABOUT DATASET

The dataset is from Kaggle: https://www.kaggle.com/datasets/khaledshawky/airline-accidents

KEY TERMS

  1. Incidents and Accidents: Events involving airline passengers, tracked here by frequency and severity

  2. Fatal Accidents: The count of accidents resulting in fatalities

  3. Total Incidents: The sum of all incidents recorded in the dataset

  4. Total Fatalities: The total number of fatalities across all recorded accidents

DATA IMPORTATION

Data Cleaning

Total Number of missing values in each column
Column                  Missing values
airline                 0
incidents_85_99         0
fatal_accidents_85_99   0
fatalities_85_99        0
incidents_00_14         0
fatal_accidents_00_14   0
fatalities_00_14        0

Duplicated entries (anyDuplicated(); 0 means none found)
## [1] 0

Data Description

## tibble [56 × 7] (S3: tbl_df/tbl/data.frame)
##  $ airline              : chr [1:56] "Aer Lingus" "Aeroflot" "Aerolineas Argentinas" "Aeromexico" ...
##  $ incidents_85_99      : num [1:56] 2 76 6 3 2 14 2 3 5 7 ...
##  $ fatal_accidents_85_99: num [1:56] 0 14 0 1 0 4 1 0 0 2 ...
##  $ fatalities_85_99     : num [1:56] 0 128 0 64 0 79 329 0 0 50 ...
##  $ incidents_00_14      : num [1:56] 0 6 1 5 2 6 4 5 5 4 ...
##  $ fatal_accidents_00_14: num [1:56] 0 1 0 0 0 2 1 1 1 0 ...
##  $ fatalities_00_14     : num [1:56] 0 88 0 0 0 337 158 7 88 0 ...
airline: character column, Length 56

Variable               Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
incidents_85_99        0.000   2.000    4.000   7.179   8.000    76.000
fatal_accidents_85_99  0.000   0.000    1.000   2.179   3.000    14.000
fatalities_85_99       0.0     0.0      48.5    112.4   184.2    535.0
incidents_00_14        0.000   1.000    3.000   4.125   5.250    24.000
fatal_accidents_00_14  0.0000  0.0000   0.0000  0.6607  1.0000   3.0000
fatalities_00_14       0.00    0.00     0.00    55.52   83.25    537.00

DESCRIPTIVE SUMMARY STATISTICS

Total Incidents

## [1] 402
## [1] 231
## [1] 633
  • 402 incidents were recorded in the first 15-year period (1985-1999) and 231 in the second (2000-2014), 633 in total, so the number of incidents decreased between the two periods
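
The size of this drop can be expressed as a percentage change. The sketch below uses only the two period totals printed above; the helper name pct_change is hypothetical and is not part of the original analysis.

```r
# Period totals taken from the output above
incidents_85_99 <- 402
incidents_00_14 <- 231

# pct_change() is an illustrative helper, not part of the original code
pct_change <- function(old, new) (new - old) / old * 100

change <- pct_change(incidents_85_99, incidents_00_14)
round(change, 1)  # -42.5
```

Both periods cover 15 calendar years (1985 through 1999 and 2000 through 2014), so the two totals are compared over windows of equal length.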

Fatal Accidents

Count of fatal accidents by airlines, 1985-1999
Fatal accidents  Number of airlines
0 17
1 16
2 4
3 8
4 3
5 3
6 1
7 1
8 1
12 1
14 1
Count of fatal accidents by airlines, 2000-2014
Fatal accidents  Number of airlines
0 32
1 12
2 11
3 1
  • 17 of the 56 airlines recorded no fatal accidents from 1985 to 1999, and 16 more recorded exactly one

  • 32 of the 56 airlines recorded no fatal accidents from 2000 to 2014, showing an improvement
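
The two frequency tables can be checked directly. The sketch below copies the counts from the tables above into named vectors and computes the share of airlines with no fatal accidents in each period, then recovers the total number of fatal accidents from each table. The vector and function names are assumptions introduced for illustration.

```r
# Frequency tables copied from the output above:
# names = fatal accidents per airline, values = number of airlines
freq_85_99 <- c("0" = 17, "1" = 16, "2" = 4, "3" = 8, "4" = 3, "5" = 3,
                "6" = 1, "7" = 1, "8" = 1, "12" = 1, "14" = 1)
freq_00_14 <- c("0" = 32, "1" = 12, "2" = 11, "3" = 1)

# Share (in %) of airlines with zero fatal accidents
share_zero <- function(freq) unname(freq["0"] / sum(freq)) * 100
round(share_zero(freq_85_99))  # 30
round(share_zero(freq_00_14))  # 57

# Total fatal accidents recovered from a frequency table
total_from_freq <- function(freq) sum(as.numeric(names(freq)) * freq)
total_from_freq(freq_85_99)  # 122
total_from_freq(freq_00_14)  # 37
```

On these counts, airlines with no fatal accidents were a minority in 1985-1999 (about 30%) and a majority in 2000-2014 (about 57%).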

## fatal_accidents_85_99 fatal_accidents_00_14 
##                   122                    37
  • 122 fatal accidents happened from 1985 to 1999.

  • 37 fatal accidents happened from 2000 to 2014, a sharp decrease of about 70%.

Total Fatalities

## fatalities_85_99 fatalities_00_14 
##             6295             3109
  • 6295 deaths occurred from 1985 to 1999

  • 3109 deaths occurred from 2000 to 2014, a sharp decrease of about half in the number of people killed

## [1] 9404
  • 9404 deaths were recorded from 1985 up to 2014
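
To relate fatal accidents to fatalities, one option is to divide the deaths in each period by the number of fatal accidents in that period. This is a minimal sketch built only from the totals reported above; the ratio is an added illustration, not a figure computed in the original analysis.

```r
# Period totals taken from the outputs above
fatal_accidents <- c("1985-1999" = 122, "2000-2014" = 37)
fatalities      <- c("1985-1999" = 6295, "2000-2014" = 3109)

# Average number of deaths per fatal accident in each period
deaths_per_fatal_accident <- fatalities / fatal_accidents
round(deaths_per_fatal_accident, 1)  # 51.6 and 84.0
```

On these totals, fatal accidents became much rarer in 2000-2014, but the average fatal accident claimed more lives (about 84 against about 52).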

Comparison analysis

## [1] 159

  • Of the 633 incidents recorded from 1985 to 2014, 159 (about 25%) were fatal accidents
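
The percentage depends on which denominator is used. The sketch below computes it both ways from the totals above; it assumes that fatal accidents are counted among the incidents, which the dataset description does not state explicitly.

```r
# Totals taken from the outputs above
fatal_accidents <- 159   # 122 + 37
incidents       <- 633   # 402 + 231

# Share of incidents that were fatal accidents
# (assumes fatal accidents are a subset of the incidents)
share_of_incidents <- fatal_accidents / incidents * 100
round(share_of_incidents)  # 25

# Dividing by the sum of both counts treats them as two disjoint groups
share_of_sum <- fatal_accidents / (fatal_accidents + incidents) * 100
round(share_of_sum)  # 20
```

If fatal accidents are indeed a subset of incidents, the first figure (about 25%) is the one that answers "what fraction of incidents were fatal".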

TREND ANALYSIS

Incidents

  • Observation

    The number of incidents decreases toward 2014: the total fell from 402 in 1985-1999 to 231 in 2000-2014

Fatal Accidents

  • Observation

    Fatal accidents also show a decreasing trend, falling from 122 in 1985-1999 to 37 in 2000-2014

Fatalities

  • Observation

    From 2000 to 2014, most airlines recorded fewer deaths than in 1985-1999.
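
A per-airline claim like this one can be checked by comparing the two fatality columns row by row. The sketch below uses only the first ten values of each column as printed by str() in the Data Description, so its counts describe that subset, not the full 56 airlines; with the full data the same comparison would use ac$fatalities_85_99 and ac$fatalities_00_14.

```r
# First ten values of each column, as printed by str() above
fatalities_85_99 <- c(0, 128, 0, 64, 0, 79, 329, 0, 0, 50)
fatalities_00_14 <- c(0, 88, 0, 0, 0, 337, 158, 7, 88, 0)

# Direction of change per airline: -1 = fewer, 0 = same, 1 = more deaths
direction <- sign(fatalities_00_14 - fatalities_85_99)

# Count airlines in each category
comparison <- table(factor(direction, levels = c(-1, 0, 1),
                           labels = c("fewer", "same", "more")))
comparison
```

Airlines with zero deaths in both periods fall into the "same" category, so it helps to report all three counts instead of only the "fewer" group.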

Incident Distribution Per Airline

CODE APPENDIX

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
library(readxl)
#file.choose() opens an interactive file picker; select the downloaded Excel file
ac=read_xlsx(file.choose())

#Cleaning variable names

library(janitor)

ac=clean_names(ac)
#checking for missing values

library(knitr)

kable(colSums(is.na(ac)),caption="Total Number of missing values in each column")

#No missing values

#Checking for duplicated entries

anyDuplicated(ac)

#No duplicated entries in the dataset

str(ac)

kable(summary(ac),format = "pipe") #Summary Statistics

total_incidents_1=sum(ac$incidents_85_99) 

total_incidents_1 #402 incidents from 1985 up to 1999

total_incidents_2=sum(ac$incidents_00_14) 

total_incidents_2 #231 incidents from 2000 up to 2014

total=total_incidents_1+total_incidents_2 

total #633 total incidents recorded in the dataset


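#Fatal accidents

library(dplyr) #loaded before the first use of %>%

table(ac$fatal_accidents_85_99) %>%
  kable(caption = "Count of fatal accidents by airlines, 1985-1999",
        col.names = c("Fatal accidents","Number of airlines")) 

table(ac$fatal_accidents_00_14) %>%
  kable(caption = "Count of fatal accidents by airlines, 2000-2014",
        col.names = c("Fatal accidents","Number of airlines")) 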
ac %>% dplyr::select(fatal_accidents_85_99,fatal_accidents_00_14) %>%
  colSums()
#Total fatalities
ac %>% dplyr::select(fatalities_85_99,fatalities_00_14) %>%
  colSums()

ac %>% dplyr::select(fatalities_85_99,fatalities_00_14) %>%
  sum()

library(MASS)

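fatal=sum(ac$fatal_accidents_85_99)+sum(ac$fatal_accidents_00_14)

fatal #159

#Fatal accidents are treated as a subset of incidents, so the two slices
#are fatal accidents and the remaining (non-fatal) incidents
kpi1=c("fatal accidents","non-fatal incidents")

values=c(fatal,total-fatal) #159 and 474

pct=round(values/sum(values)*100) #25 and 75

kpi1=paste(kpi1,pct,"%",sep=" ")

pie(values,labels = kpi1,col=c("blue","green"),
    main = "Proportion of fatal accidents within total incidents")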

#trend analysis
library(ggplot2)

#Columns are referenced by name inside aes() rather than through ac$
ggplot(ac,aes(x=airline))+
  geom_line(aes(y=incidents_85_99,
                color="incidents from 1985 to 1999"),
            group=1,show.legend = TRUE)+
  geom_line(aes(y=incidents_00_14,
                color="incidents from 2000 to 2014"),group=1,
            show.legend = TRUE)+
  theme_bw()+
  theme(axis.text.x = element_text(size = 8,hjust=1,angle=90))+
  theme(legend.position="bottom")+
  labs(colour="",
       title = "Incident Trend Analysis Over Time Per Airline",
       y="Incidents",x="Airlines")

ggplot(ac,aes(x=airline))+
  geom_line(aes(y=fatal_accidents_85_99,
                color="fatal accidents from 1985 to 1999"),
            group=1,show.legend = TRUE)+
  geom_line(aes(y=fatal_accidents_00_14,
                color="fatal accidents from 2000 to 2014"),group=1,
            show.legend = TRUE)+
  theme_bw()+
  theme(axis.text.x = element_text(size = 8,hjust=1,angle=90))+
  theme(legend.position="bottom")+
  labs(colour="",
       title = "Fatal Accidents Trend Analysis Over Time Per Airline",
       y="Fatal Accidents",x="Airlines")

ggplot(ac,aes(x=airline))+
  geom_line(aes(y=fatalities_85_99,
                color="fatalities from 1985 to 1999"),
            group=1,show.legend = TRUE)+
  geom_line(aes(y=fatalities_00_14,
                color="fatalities from 2000 to 2014"),group=1,
            show.legend = TRUE)+
  theme_minimal()+
  theme(axis.text.x = element_text(size = 8,hjust=1,angle=90))+
  labs(title = "Fatalities Trend Analysis Over Time Per Airline",
       y="Fatalities",x="Airlines")

id2=ggplot(ac,aes(airline,incidents_85_99))+
  geom_segment(aes(x=airline,xend=airline,y=0,yend=incidents_85_99),
               color="skyblue",size=1)+
  geom_point(size=2,color="black")+
  coord_flip()+ theme_bw()+
  theme(axis.text.y = element_text(size = 6,angle=0))+
  labs(title = "Incident Distribution Per Airline",
       subtitle = "1985-1999",
       y="Incidents",x="Airlines")

id2

id3=ggplot(ac,aes(airline,incidents_00_14))+
  geom_segment(aes(x=airline,xend=airline,y=0,yend=incidents_00_14),
               color="orange",size=1, show.legend = F)+
  geom_point(size=2,color="black")+
  coord_flip()+ theme_bw()+
  theme(axis.text.y = element_text(size = 6,angle=0))+
  labs(title = "Incident Distribution Per Airline",
       subtitle = "2000-2014",
       y="Incidents",x="Airlines")

id3


library(dplyr)

safe= ac %>%
  dplyr::select(airline,fatal_accidents_85_99,fatal_accidents_00_14)

ggplot(safe,aes(airline, fatal_accidents_85_99))+
  geom_bar(aes(fill="fatal accident from 1985 to 1999"),
           stat = "identity",
           show.legend = T)+
  geom_bar(aes(y=fatal_accidents_00_14,
           fill="fatal accident from 2000 to 2014"),
           stat = "identity",show.legend = T)+
  theme_bw()+
  theme(axis.text.x = element_text(size = 8,vjust=-0,hjust=1,angle=90))+
  theme(legend.position = "bottom")+
  labs(fill="",title = "Safest Airline from 1985 to 2014",
       y="frequency",x="airlines")
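
The plots above draw each period as a separate layer. An alternative, shown here only as a sketch, is to stack the two period columns into one long table so that the period becomes a variable that can be mapped to colour. The data frame ac_demo is a toy subset built from the first four airlines printed by str(ac); the names ac_demo, long and period_totals are assumptions introduced for illustration.

```r
# Toy subset: the first four airlines as printed by str(ac)
ac_demo <- data.frame(
  airline = c("Aer Lingus", "Aeroflot", "Aerolineas Argentinas", "Aeromexico"),
  incidents_85_99 = c(2, 76, 6, 3),
  incidents_00_14 = c(0, 6, 1, 5)
)

# Stack the two period columns into one long table (base R only)
long <- rbind(
  data.frame(airline = ac_demo$airline, period = "1985-1999",
             incidents = ac_demo$incidents_85_99),
  data.frame(airline = ac_demo$airline, period = "2000-2014",
             incidents = ac_demo$incidents_00_14)
)

# Period totals for the toy subset
period_totals <- aggregate(incidents ~ period, data = long, FUN = sum)
period_totals

# Grouped bars, one colour per period (drawn only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = airline, y = incidents, fill = period)) +
    geom_col(position = "dodge") +
    coord_flip() +
    theme_bw() +
    labs(title = "Incidents per airline by period (toy subset)",
         x = "Airlines", y = "Incidents", fill = "")
  print(p)
}
```

With the full dataset, ac would take the place of ac_demo, and the same long table could feed the fatal accident and fatality plots by swapping the column names.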