Objective

The goal of this analysis is to determine which Carriers have favorable departure times.

Analysis

I started off bringing in the data from the csv file.

Jan_flights <- read.csv("domestic_flights_jan_2016.csv", stringsAsFactors = FALSE)
library(dplyr)

Next I recreated the fields from the unit lecture notes. I did perform a filter to determine incomplete cases. I learned that observations with incomplete values consisted of flights that were either cancelled OR diverted.

Jan_flights$FlightDate <- as.Date(Jan_flights$FlightDate, format = "%m/%d/%Y")

Jan_flights <- Jan_flights %>% mutate(New_CRSDepTime = paste(FlightDate, sprintf("%04d", CRSDepTime)), New_CRSArrTime = paste(FlightDate, sprintf("%04d", CRSArrTime)), New_DepTime = paste(FlightDate, sprintf("%04d", DepTime)), New_ArrTime = paste(FlightDate, sprintf("%04d", ArrTime)), New_WheelsOff = paste(FlightDate, sprintf("%04d", WheelsOff)), New_WheelsOn = paste(FlightDate, sprintf("%04d", WheelsOn)))
Jan_flights$New_CRSDepTime <- as.POSIXct(Jan_flights$New_CRSDepTime, format="%Y-%m-%d %H%M")
Jan_flights$New_CRSArrTime <- as.POSIXct(Jan_flights$New_CRSArrTime, format="%Y-%m-%d %H%M")
Jan_flights$New_DepTime <- as.POSIXct(Jan_flights$New_DepTime, format="%Y-%m-%d %H%M")
Jan_flights$New_ArrTime <- as.POSIXct(Jan_flights$New_ArrTime, format="%Y-%m-%d %H%M")
Jan_flights$New_WheelsOff <- as.POSIXct(Jan_flights$New_WheelsOff, format="%Y-%m-%d %H%M")
Jan_flights$New_WheelsOn <- as.POSIXct(Jan_flights$New_WheelsOn, format="%Y-%m-%d %H%M")

All the applicable columns have been formatted similarly. I then attempted to filter out flights that were either cancelled or diverted as well as those that did not lend themselves well to analysis (e.g. flights that spanned over the course of two days).

Jan_flights <- Jan_flights %>% filter(Cancelled == 0, Diverted == 0) %>% filter(ArrTime>DepTime)

With the data ready for analysis, I decided to figure out which carriers had the greatest amount of early departures in terms of minutes.

Jan_flights <- Jan_flights %>% mutate(DepDelayMinutes = as.integer(difftime(New_DepTime, New_CRSDepTime, units = "mins")))
library(ggvis)
Jan_flights <- Jan_flights %>% mutate(EarlyDepartureMin = ifelse(DepDelayMinutes < 0, DepDelayMinutes*-1, 0))
Jan_flights %>% ggvis(x = ~Carrier, y = ~EarlyDepartureMin) %>% layer_bars()

Looking at this graph, it appears that carrier B6 is favorable because of the aggregate amounts of minutes that flights left early. However we need to look at that data in a bit more detail to determine if that carrier is actually favorable because this graph could easily be skewed based on a few flights that left incredibly early. I decided to determine the average early departure by carrier in terms of minutes by dividing the total early minutes for each Carrier by the number of flights for each Carrier. The new Jflights3 table contains only the necessary information to get this calculation.

Jflights <- Jan_flights %>% group_by(Carrier) %>% summarise(EarlyDepartureMin = sum(EarlyDepartureMin))
colnames(Jflights)[2] <- "EarlyDepartureMinTotal"
Jflights2 <- Jan_flights %>% count(Carrier)
Jflight3 <- inner_join(Jflights, Jflights2, by = "Carrier")
Jflights3 <- mutate(Jflight3, AvgEarlyDep = EarlyDepartureMinTotal / n)
colnames(Jflights3)[3] <- "NumberofFlights"
library(knitr)
kable(Jflights3)
Carrier EarlyDepartureMinTotal NumberofFlights AvgEarlyDep
AA 439700 69775 6.301684
AS 140312 13057 10.746113
B6 463897 20071 23.112800
DL 381196 66069 5.769665
EV 216157 39737 5.439691
F9 113133 6633 17.056083
HA 27324 6162 4.434275
NK 153449 9865 15.554891
OO 296924 45588 6.513205
UA 281197 36244 7.758443
VX 30820 4874 6.323348
WN 238259 99051 2.405417

Looking at the new graph, it appears that B6 is still the most favorable airline in terms of early departure minutes.

Jflights3 %>% ggvis(x = ~Carrier, y = ~AvgEarlyDep) %>% layer_bars()

This analysis does not include late departures as it was difficult to create a visualization in bar graph form with negative values. Therefore it is possible that this analysis will not present the most favorable Carrier. For example, carrier B6 might also have a large amount of delay minutes as well. This has the potential to allow another Carrier to take the spot as most favorable if they have few to no late departures. For this reason, I ran a similar scenario for flights that left late and netted early departure minutes with late departure minutes. I then divided these nettted amounts by the total number of flights for each Carrier to determine whether flights related to each Carrier departed late or early on average, and if so, how many minutes late or early.

Jan_flights <- Jan_flights %>% mutate(LateDepartureMin = ifelse(DepDelayMinutes > 0, DepDelayMinutes, 0))
JflightsL <- Jan_flights %>% group_by(Carrier) %>% summarise(LateDepartureMin = sum(LateDepartureMin))
January_Flights <- inner_join(Jflights3, JflightsL, by = "Carrier")
January_Flights <- mutate(January_Flights, NetDeptMin = EarlyDepartureMinTotal - LateDepartureMin)
January_Flights <- mutate(January_Flights, AvgDeptMin = NetDeptMin / NumberofFlights)
kable(January_Flights)
Carrier EarlyDepartureMinTotal NumberofFlights AvgEarlyDep LateDepartureMin NetDeptMin AvgDeptMin
AA 439700 69775 6.301684 616742 -177042 -2.537327
AS 140312 13057 10.746113 73166 67146 5.142529
B6 463897 20071 23.112800 285716 178181 8.877535
DL 381196 66069 5.769665 555333 -174137 -2.635684
EV 216157 39737 5.439691 371287 -155130 -3.903918
F9 113133 6633 17.056083 61384 51749 7.801749
HA 27324 6162 4.434275 14480 12844 2.084388
NK 153449 9865 15.554891 144865 8584 0.870147
OO 296924 45588 6.513205 572708 -275784 -6.049487
UA 281197 36244 7.758443 392826 -111629 -3.079931
VX 30820 4874 6.323348 69189 -38369 -7.872179
WN 238259 99051 2.405417 784851 -546592 -5.518289

As we can see from the table, flights associated with Carrier B6 left an average of almost 9min early. Once again, it has proven to be the most favorable airline to fly in terms of timeliness. Based on this information, if you desire to be on a flight that leaves earlier than expected, your best bet is to fly using Carrier B6. There is no way to guarantee an earlier flight departure, but your chances would be greater with Carrier B6.

Conclusion

Based on the analyses performed, if timeliness is important to you when traveling by airplane, Carrier B6 is the way to go!