The goal of this analysis is to determine which Carriers have favorable departure times.
I started off bringing in the data from the csv file.
Jan_flights <- read.csv("domestic_flights_jan_2016.csv", stringsAsFactors = FALSE)
library(dplyr)
Next I recreated the fields from the unit lecture notes. I did perform a filter to determine incomplete cases. I learned that observations with incomplete values consisted of flights that were either cancelled OR diverted.
Jan_flights$FlightDate <- as.Date(Jan_flights$FlightDate, format = "%m/%d/%Y")
Jan_flights <- Jan_flights %>% mutate(New_CRSDepTime = paste(FlightDate, sprintf("%04d", CRSDepTime)), New_CRSArrTime = paste(FlightDate, sprintf("%04d", CRSArrTime)), New_DepTime = paste(FlightDate, sprintf("%04d", DepTime)), New_ArrTime = paste(FlightDate, sprintf("%04d", ArrTime)), New_WheelsOff = paste(FlightDate, sprintf("%04d", WheelsOff)), New_WheelsOn = paste(FlightDate, sprintf("%04d", WheelsOn)))
Jan_flights$New_CRSDepTime <- as.POSIXct(Jan_flights$New_CRSDepTime, format="%Y-%m-%d %H%M")
Jan_flights$New_CRSArrTime <- as.POSIXct(Jan_flights$New_CRSArrTime, format="%Y-%m-%d %H%M")
Jan_flights$New_DepTime <- as.POSIXct(Jan_flights$New_DepTime, format="%Y-%m-%d %H%M")
Jan_flights$New_ArrTime <- as.POSIXct(Jan_flights$New_ArrTime, format="%Y-%m-%d %H%M")
Jan_flights$New_WheelsOff <- as.POSIXct(Jan_flights$New_WheelsOff, format="%Y-%m-%d %H%M")
Jan_flights$New_WheelsOn <- as.POSIXct(Jan_flights$New_WheelsOn, format="%Y-%m-%d %H%M")
All the applicable columns have been formatted similarly. I then attempted to filter out flights that were either cancelled or diverted as well as those that did not lend themselves well to analysis (e.g. flights that spanned over the course of two days).
Jan_flights <- Jan_flights %>% filter(Cancelled == 0, Diverted == 0) %>% filter(ArrTime>DepTime)
With the data ready for analysis, I decided to figure out which carriers had the greatest amount of early departures in terms of minutes.
Jan_flights <- Jan_flights %>% mutate(DepDelayMinutes = as.integer(difftime(New_DepTime, New_CRSDepTime, units = "mins")))
library(ggvis)
Jan_flights <- Jan_flights %>% mutate(EarlyDepartureMin = ifelse(DepDelayMinutes < 0, DepDelayMinutes*-1, 0))
Jan_flights %>% ggvis(x = ~Carrier, y = ~EarlyDepartureMin) %>% layer_bars()
Jflights <- Jan_flights %>% group_by(Carrier) %>% summarise(EarlyDepartureMin = sum(EarlyDepartureMin))
colnames(Jflights)[2] <- "EarlyDepartureMinTotal"
Jflights2 <- Jan_flights %>% count(Carrier)
Jflight3 <- inner_join(Jflights, Jflights2, by = "Carrier")
Jflights3 <- mutate(Jflight3, AvgEarlyDep = EarlyDepartureMinTotal / n)
colnames(Jflights3)[3] <- "NumberofFlights"
library(knitr)
kable(Jflights3)
| Carrier | EarlyDepartureMinTotal | NumberofFlights | AvgEarlyDep |
|---|---|---|---|
| AA | 439700 | 69775 | 6.301684 |
| AS | 140312 | 13057 | 10.746113 |
| B6 | 463897 | 20071 | 23.112800 |
| DL | 381196 | 66069 | 5.769665 |
| EV | 216157 | 39737 | 5.439691 |
| F9 | 113133 | 6633 | 17.056083 |
| HA | 27324 | 6162 | 4.434275 |
| NK | 153449 | 9865 | 15.554891 |
| OO | 296924 | 45588 | 6.513205 |
| UA | 281197 | 36244 | 7.758443 |
| VX | 30820 | 4874 | 6.323348 |
| WN | 238259 | 99051 | 2.405417 |
Looking at the new graph, it appears that B6 is still the most favorable airline in terms of early departure minutes.
Jflights3 %>% ggvis(x = ~Carrier, y = ~AvgEarlyDep) %>% layer_bars()
Jan_flights <- Jan_flights %>% mutate(LateDepartureMin = ifelse(DepDelayMinutes > 0, DepDelayMinutes, 0))
JflightsL <- Jan_flights %>% group_by(Carrier) %>% summarise(LateDepartureMin = sum(LateDepartureMin))
January_Flights <- inner_join(Jflights3, JflightsL, by = "Carrier")
January_Flights <- mutate(January_Flights, NetDeptMin = EarlyDepartureMinTotal - LateDepartureMin)
January_Flights <- mutate(January_Flights, AvgDeptMin = NetDeptMin / NumberofFlights)
kable(January_Flights)
| Carrier | EarlyDepartureMinTotal | NumberofFlights | AvgEarlyDep | LateDepartureMin | NetDeptMin | AvgDeptMin |
|---|---|---|---|---|---|---|
| AA | 439700 | 69775 | 6.301684 | 616742 | -177042 | -2.537327 |
| AS | 140312 | 13057 | 10.746113 | 73166 | 67146 | 5.142529 |
| B6 | 463897 | 20071 | 23.112800 | 285716 | 178181 | 8.877535 |
| DL | 381196 | 66069 | 5.769665 | 555333 | -174137 | -2.635684 |
| EV | 216157 | 39737 | 5.439691 | 371287 | -155130 | -3.903918 |
| F9 | 113133 | 6633 | 17.056083 | 61384 | 51749 | 7.801749 |
| HA | 27324 | 6162 | 4.434275 | 14480 | 12844 | 2.084388 |
| NK | 153449 | 9865 | 15.554891 | 144865 | 8584 | 0.870147 |
| OO | 296924 | 45588 | 6.513205 | 572708 | -275784 | -6.049487 |
| UA | 281197 | 36244 | 7.758443 | 392826 | -111629 | -3.079931 |
| VX | 30820 | 4874 | 6.323348 | 69189 | -38369 | -7.872179 |
| WN | 238259 | 99051 | 2.405417 | 784851 | -546592 | -5.518289 |
As we can see from the table, flights associated with Carrier B6 left an average of almost 9min early. Once again, it has proven to be the most favorable airline to fly in terms of timeliness. Based on this information, if you desire to be on a flight that leaves earlier than expected, your best bet is to fly using Carrier B6. There is no way to guarantee an earlier flight departure, but your chances would be greater with Carrier B6.
Based on the analyses performed, if timeliness is important to you when traveling by airplane, Carrier B6 is the way to go!