Assignment

The purpose of the assignment is to read in a file with on time and delayed number of flights for 2 airlines and perform an analysis compaing arrival delays of the two airlines

Load Libraries

library(tidyr)
library(dplyr)
library(knitr)

Read in CSV

airline_status <- read.csv("airline_status.csv", stringsAsFactors =  FALSE)
kable(airline_status)
Airline Status Los.Angeles Phoenix San.Diego San.Fransisco Seattle
Alaska on-time 497 221 212 503 1841
Alaska delayed 62 12 20 102 305
AM West on-time 694 4840 383 320 201
AM West delayed 117 415 65 129 61

Tidy Data Set

Our goal is to create a dataset that looks as follows:

Airline City Delayed On_Time
Delta Atlanta 1,900 3,000
Delta New York 2,000 1,900
AA Chicago 1,000 2,000

Since each observation is an ariline at a particular city.

First, we need to unpivot the column headings and place each column heading into a row.

airline_status$Status[airline_status$Status=="on-time"] <- "on_time" #mutate does not like '-' in a column header

airline_status <-
  airline_status %>% 
  gather("City", "n", 3:7)

kable(airline_status)
Airline Status City n
Alaska on_time Los.Angeles 497
Alaska delayed Los.Angeles 62
AM West on_time Los.Angeles 694
AM West delayed Los.Angeles 117
Alaska on_time Phoenix 221
Alaska delayed Phoenix 12
AM West on_time Phoenix 4840
AM West delayed Phoenix 415
Alaska on_time San.Diego 212
Alaska delayed San.Diego 20
AM West on_time San.Diego 383
AM West delayed San.Diego 65
Alaska on_time San.Fransisco 503
Alaska delayed San.Fransisco 102
AM West on_time San.Fransisco 320
AM West delayed San.Fransisco 129
Alaska on_time Seattle 1841
Alaska delayed Seattle 305
AM West on_time Seattle 201
AM West delayed Seattle 61

Now we need to pivot the on-time and delayed count to the column headings.

airline_status <-
  airline_status %>% 
  spread("Status", "n")
kable(airline_status)
Airline City delayed on_time
Alaska Los.Angeles 62 497
Alaska Phoenix 12 221
Alaska San.Diego 20 212
Alaska San.Fransisco 102 503
Alaska Seattle 305 1841
AM West Los.Angeles 117 694
AM West Phoenix 415 4840
AM West San.Diego 65 383
AM West San.Fransisco 129 320
AM West Seattle 61 201

Add Derived Values

To help with the analysis I am adding the ratio of on-time and delayed flights to the data frame.

airline_status <- 
  airline_status %>% 
  mutate(ratio_on_time = on_time/(delayed + on_time), ratio_delayed = 1 - ratio_on_time )
kable(airline_status)
Airline City delayed on_time ratio_on_time ratio_delayed
Alaska Los.Angeles 62 497 0.8890877 0.1109123
Alaska Phoenix 12 221 0.9484979 0.0515021
Alaska San.Diego 20 212 0.9137931 0.0862069
Alaska San.Fransisco 102 503 0.8314050 0.1685950
Alaska Seattle 305 1841 0.8578751 0.1421249
AM West Los.Angeles 117 694 0.8557337 0.1442663
AM West Phoenix 415 4840 0.9210276 0.0789724
AM West San.Diego 65 383 0.8549107 0.1450893
AM West San.Fransisco 129 320 0.7126949 0.2873051
AM West Seattle 61 201 0.7671756 0.2328244

Perform Analysis

My first analysis will be to see what airline has the best overall on-time rate. I will do this by grouping the data frame by airline and summing the on-time and delayed flights for that airline and then calculate the on-time ratio at the higher level of analysis.

total_percent_on_time <- 
  airline_status %>% 
  group_by(Airline) %>% 
  summarise (total_delayed = sum(delayed), total_on_time = sum(on_time)) %>% 
  mutate (total_ratio_on_time =  total_on_time / ( total_on_time + total_delayed ))

kable(total_percent_on_time)
Airline total_delayed total_on_time total_ratio_on_time
Alaska 501 3274 0.8672848
AM West 787 6438 0.8910727
most_on_time_airline <- 
  total_percent_on_time %>% 
  filter (total_ratio_on_time == max(total_ratio_on_time))

The second analysis I will perform is to see what airline has the most variance in its on-time performance. To do this I will calculate the straight average and standard deviation of the airlines on-time ratio, and then see what airline has the lowest standard deviation.

summary_stats_by_airline <-  
  airline_status %>% 
  group_by(Airline) %>% 
  summarise(mean_ratio_ontime = mean(ratio_on_time), sd_ratio_on_time = sd(ratio_on_time))

kable(summary_stats_by_airline)
Airline mean_ratio_ontime sd_ratio_on_time
Alaska 0.8881317 0.0459262
AM West 0.8223085 0.0821285
most_consistent_airline <- 
  summary_stats_by_airline %>% 
  filter (sd_ratio_on_time == min(sd_ratio_on_time))

The airline with the best on-time percentage is AM West with on on-time percentage of 89.11%

The airline with the most consistent performance is Alaska with a standard deviation of the percentage on-time of 4.59%