Assignment

The purpose of the assignment is to read in a file with on time and delayed number of flights for 2 airlines and perform an analysis compaing arrival delays of the two airlines

Load Libraries

library(tidyr)
library(dplyr)
library(knitr)

Read in CSV

airline_status <- read.csv("airline_status.csv", stringsAsFactors =  FALSE)
kable(airline_status)

Airline	Status	Los.Angeles	Phoenix	San.Diego	San.Fransisco	Seattle
Alaska	on-time	497	221	212	503	1841
Alaska	delayed	62	12	20	102	305
AM West	on-time	694	4840	383	320	201
AM West	delayed	117	415	65	129	61

Tidy Data Set

Our goal is to create a dataset that looks as follows:

Airline	City	Delayed	On_Time
Delta	Atlanta	1,900	3,000
Delta	New York	2,000	1,900
AA	Chicago	1,000	2,000

Since each observation is an ariline at a particular city.

First, we need to unpivot the column headings and place each column heading into a row.

airline_status$Status[airline_status$Status=="on-time"] <- "on_time" #mutate does not like '-' in a column header

airline_status <-
  airline_status %>% 
  gather("City", "n", 3:7)

kable(airline_status)

Airline	Status	City	n
Alaska	on_time	Los.Angeles	497
Alaska	delayed	Los.Angeles	62
AM West	on_time	Los.Angeles	694
AM West	delayed	Los.Angeles	117
Alaska	on_time	Phoenix	221
Alaska	delayed	Phoenix	12
AM West	on_time	Phoenix	4840
AM West	delayed	Phoenix	415
Alaska	on_time	San.Diego	212
Alaska	delayed	San.Diego	20
AM West	on_time	San.Diego	383
AM West	delayed	San.Diego	65
Alaska	on_time	San.Fransisco	503
Alaska	delayed	San.Fransisco	102
AM West	on_time	San.Fransisco	320
AM West	delayed	San.Fransisco	129
Alaska	on_time	Seattle	1841
Alaska	delayed	Seattle	305
AM West	on_time	Seattle	201
AM West	delayed	Seattle	61

Now we need to pivot the on-time and delayed count to the column headings.

airline_status <-
  airline_status %>% 
  spread("Status", "n")
kable(airline_status)

Airline	City	delayed	on_time
Alaska	Los.Angeles	62	497
Alaska	Phoenix	12	221
Alaska	San.Diego	20	212
Alaska	San.Fransisco	102	503
Alaska	Seattle	305	1841
AM West	Los.Angeles	117	694
AM West	Phoenix	415	4840
AM West	San.Diego	65	383
AM West	San.Fransisco	129	320
AM West	Seattle	61	201

Add Derived Values

To help with the analysis I am adding the ratio of on-time and delayed flights to the data frame.

airline_status <- 
  airline_status %>% 
  mutate(ratio_on_time = on_time/(delayed + on_time), ratio_delayed = 1 - ratio_on_time )
kable(airline_status)

Airline	City	delayed	on_time	ratio_on_time	ratio_delayed
Alaska	Los.Angeles	62	497	0.8890877	0.1109123
Alaska	Phoenix	12	221	0.9484979	0.0515021
Alaska	San.Diego	20	212	0.9137931	0.0862069
Alaska	San.Fransisco	102	503	0.8314050	0.1685950
Alaska	Seattle	305	1841	0.8578751	0.1421249
AM West	Los.Angeles	117	694	0.8557337	0.1442663
AM West	Phoenix	415	4840	0.9210276	0.0789724
AM West	San.Diego	65	383	0.8549107	0.1450893
AM West	San.Fransisco	129	320	0.7126949	0.2873051
AM West	Seattle	61	201	0.7671756	0.2328244

Perform Analysis

My first analysis will be to see what airline has the best overall on-time rate. I will do this by grouping the data frame by airline and summing the on-time and delayed flights for that airline and then calculate the on-time ratio at the higher level of analysis.

total_percent_on_time <- 
  airline_status %>% 
  group_by(Airline) %>% 
  summarise (total_delayed = sum(delayed), total_on_time = sum(on_time)) %>% 
  mutate (total_ratio_on_time =  total_on_time / ( total_on_time + total_delayed ))

kable(total_percent_on_time)

Airline	total_delayed	total_on_time	total_ratio_on_time
Alaska	501	3274	0.8672848
AM West	787	6438	0.8910727

most_on_time_airline <- 
  total_percent_on_time %>% 
  filter (total_ratio_on_time == max(total_ratio_on_time))

The second analysis I will perform is to see what airline has the most variance in its on-time performance. To do this I will calculate the straight average and standard deviation of the airlines on-time ratio, and then see what airline has the lowest standard deviation.

summary_stats_by_airline <-  
  airline_status %>% 
  group_by(Airline) %>% 
  summarise(mean_ratio_ontime = mean(ratio_on_time), sd_ratio_on_time = sd(ratio_on_time))

kable(summary_stats_by_airline)

Airline	mean_ratio_ontime	sd_ratio_on_time
Alaska	0.8881317	0.0459262
AM West	0.8223085	0.0821285

most_consistent_airline <- 
  summary_stats_by_airline %>% 
  filter (sd_ratio_on_time == min(sd_ratio_on_time))

The airline with the best on-time percentage is AM West with on on-time percentage of 89.11%

The airline with the most consistent performance is Alaska with a standard deviation of the percentage on-time of 4.59%

Week5-TidyData-Rnash