Objective
Our objective is to take a .csv of untidy data of two airlines and their on time or delayed flying information into multiple destination. Once the data is loaced, we need to transform and tidy the data, and then do an analysis to compare the arrival delays for both airlines.
Setup
Environment Prep
if (!require("dplyr")) install.packages('dplyr')
if (!require("tidyr")) install.packages('tidyr')
Data Import
airlines <- read.csv("https://raw.githubusercontent.com/ahussan/DATA_607_CUNY_SPS/master/Week5_Assignment/airlineData.csv", strip.white=TRUE)
knitr::kable(airlines)
| ALASKA |
on time |
497 |
221 |
212 |
503 |
1841 |
|
delayed |
62 |
12 |
20 |
102 |
305 |
|
|
NA |
NA |
NA |
NA |
NA |
| AMWEST |
on time |
694 |
4840 |
383 |
320 |
201 |
|
delayed |
117 |
415 |
65 |
129 |
61 |
Data Analysis
Ratios by City
#Delay ratios
tidy <- tidy %>%
mutate(total=ontime + delayed, delayratio=round(delayed/total, 3))
#Plot these delay ratios
dotchart(tidy$delayratio, groups = tidy$Airline, labels = tidy$cities , main = "Ratio of Delayed Flights by City")

Ratios in Total
summary <- tidy %>%
group_by(Airline) %>%
summarise(delayed = sum(delayed), ontime = sum(ontime), total = sum(total)) %>%
mutate(delayratio = round(delayed/total, 3))
knitr::kable(summary)
| ALASKA |
501 |
3274 |
3775 |
0.133 |
| AMWEST |
787 |
6438 |
7225 |
0.109 |
#Plot delay ratios
dotchart(summary$delayratio, groups = summary$Airline, main = "Ratio of Delayed Flights Across All Cities")

Conclusion
The individual city delay ratios by airline indicates that ALASKA airlines had a lower individual delay ratio than AMWEST in every single city. But the aggregated data shows that AMWEST had a lower overall delay ratio, due to the larger volume of flights.