January 2016 Flight Data

The main data set in this week’s assignment provides an endless array of variables and insights. To keep from getting overwhelmed, I focused on my personal travel experiences and some of the questions that come to mind while traveling. Along the way, I aim to display compelling graphs that show insights about arrival delays nationally and regionally. To begin, I used the ‘install.packages()’, ‘library()’ and ‘read.csv()’ commands to configure my R environment.

library(dplyr)
library(lubridate)
library(ggvis)
library(tidyr)
library(ggplot2)
library(knitr)
flights <- read.csv("domestic_flights_jan_2016.csv", stringsAsFactors = FALSE)

Formatting & Variable Creation

Once here, I leveraged some of the commands from the “Unit 6 Lecture Notes” to configure my ‘data.frame’ called “flights”. Specifically, I used these commands to format my date fields and create new variables and calculations that will lead to some insights later. You will also see I changed some of the values to help streamline the results and plots later on. As a traveler, I find a delay of fifteen minutes tolerable. However, delays of three or more hours can really disrupt people’s daily plans, so I flag those arrivals in a variable called ‘ArrDel180’.

# Parse the flight date and drop cancelled flights, which have no actual times.
flights$FlightDate <- as.Date(flights$FlightDate, format = "%m/%d/%Y")
flights <- flights %>% filter(Cancelled == 0)

# Combine the date with each HHMM clock time (zero-padded to four digits) into a timestamp.
# Note: every time is stamped with FlightDate, so an arrival after midnight is placed a day early.
flights <- flights %>%
  mutate(new_CRSDepTime = as.POSIXct(paste(FlightDate, sprintf("%04d", CRSDepTime)), format = "%Y-%m-%d %H%M"),
         new_CRSArrTime = as.POSIXct(paste(FlightDate, sprintf("%04d", CRSArrTime)), format = "%Y-%m-%d %H%M"),
         new_DepTime = as.POSIXct(paste(FlightDate, sprintf("%04d", DepTime)), format = "%Y-%m-%d %H%M"),
         new_WheelsOff = as.POSIXct(paste(FlightDate, sprintf("%04d", WheelsOff)), format = "%Y-%m-%d %H%M"),
         new_WheelsOn = as.POSIXct(paste(FlightDate, sprintf("%04d", WheelsOn)), format = "%Y-%m-%d %H%M"),
         new_ArrTime = as.POSIXct(paste(FlightDate, sprintf("%04d", ArrTime)), format = "%Y-%m-%d %H%M"))

# Derive delays, taxi times, air time (minutes) and air speed (miles per hour) from the timestamps.
flights <- flights %>%
  mutate(DepDelay = as.integer(difftime(new_DepTime, new_CRSDepTime, units = "mins")),
         TaxiOut = as.integer(difftime(new_WheelsOff, new_DepTime, units = "mins")),
         TaxiIn = as.integer(difftime(new_ArrTime, new_WheelsOn, units = "mins")),
         ArrDelay = as.integer(difftime(new_ArrTime, new_CRSArrTime, units = "mins")),
         ArrDelayMinutes = ifelse(ArrDelay < 0, 0, ArrDelay),  # early arrivals count as zero delay
         ArrDel180 = ifelse(ArrDelay >= 180, 1, 0),            # 1 if three or more hours late
         FlightTimeBuffer = CRSElapsedTime - ActualElapsedTime,
         AirTime = ActualElapsedTime - TaxiOut - TaxiIn,
         AirSpeed = Distance / (AirTime / 60))
head(flights)
##   FlightDate Carrier TailNum FlightNum Origin        OriginCityName
## 1 2016-01-06      AA  N4YBAA        43    DFW Dallas/Fort Worth, TX
## 2 2016-01-07      AA  N434AA        43    DFW Dallas/Fort Worth, TX
## 3 2016-01-08      AA  N541AA        43    DFW Dallas/Fort Worth, TX
## 4 2016-01-09      AA  N489AA        43    DFW Dallas/Fort Worth, TX
## 5 2016-01-10      AA  N439AA        43    DFW Dallas/Fort Worth, TX
## 6 2016-01-11      AA  N468AA        43    DFW Dallas/Fort Worth, TX
##   OriginState Dest DestCityName DestState CRSDepTime DepTime WheelsOff
## 1          TX  DTW  Detroit, MI        MI       1100    1057      1112
## 2          TX  DTW  Detroit, MI        MI       1100    1056      1110
## 3          TX  DTW  Detroit, MI        MI       1100    1055      1116
## 4          TX  DTW  Detroit, MI        MI       1100    1102      1115
## 5          TX  DTW  Detroit, MI        MI       1100    1240      1300
## 6          TX  DTW  Detroit, MI        MI       1100    1107      1118
##   WheelsOn CRSArrTime ArrTime Cancelled Diverted CRSElapsedTime
## 1     1424       1438    1432         0        0            158
## 2     1416       1438    1426         0        0            158
## 3     1431       1438    1445         0        0            158
## 4     1424       1438    1433         0        0            158
## 5     1617       1438    1631         0        0            158
## 6     1426       1438    1435         0        0            158
##   ActualElapsedTime Distance      new_CRSDepTime      new_CRSArrTime
## 1               155      986 2016-01-06 11:00:00 2016-01-06 14:38:00
## 2               150      986 2016-01-07 11:00:00 2016-01-07 14:38:00
## 3               170      986 2016-01-08 11:00:00 2016-01-08 14:38:00
## 4               151      986 2016-01-09 11:00:00 2016-01-09 14:38:00
## 5               171      986 2016-01-10 11:00:00 2016-01-10 14:38:00
## 6               148      986 2016-01-11 11:00:00 2016-01-11 14:38:00
##           new_DepTime       new_WheelsOff        new_WheelsOn
## 1 2016-01-06 10:57:00 2016-01-06 11:12:00 2016-01-06 14:24:00
## 2 2016-01-07 10:56:00 2016-01-07 11:10:00 2016-01-07 14:16:00
## 3 2016-01-08 10:55:00 2016-01-08 11:16:00 2016-01-08 14:31:00
## 4 2016-01-09 11:02:00 2016-01-09 11:15:00 2016-01-09 14:24:00
## 5 2016-01-10 12:40:00 2016-01-10 13:00:00 2016-01-10 16:17:00
## 6 2016-01-11 11:07:00 2016-01-11 11:18:00 2016-01-11 14:26:00
##           new_ArrTime DepDelay TaxiOut TaxiIn ArrDelay ArrDelayMinutes
## 1 2016-01-06 14:32:00       -3      15      8       -6               0
## 2 2016-01-07 14:26:00       -4      14     10      -12               0
## 3 2016-01-08 14:45:00       -5      21     14        7               7
## 4 2016-01-09 14:33:00        2      13      9       -5               0
## 5 2016-01-10 16:31:00      100      20     14      113             113
## 6 2016-01-11 14:35:00        7      11      9       -3               0
##   ArrDel180 FlightTimeBuffer AirTime AirSpeed
## 1         0                3     132 448.1818
## 2         0                8     126 469.5238
## 3         0              -12     135 438.2222
## 4         0                7     129 458.6047
## 5         0              -13     137 431.8248
## 6         0               10     128 462.1875
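The timestamp step can be checked on a couple of made-up rows. This is only a sketch: the values are invented, and the helpers `to_timestamp` and `roll_forward` are hypothetical names, not part of the analysis above. It also shows one possible way to handle an arrival that lands after midnight, assuming that an actual time more than 12 hours before the scheduled time belongs to the next day.

```r
# Toy data: made-up values, not taken from the flights file.
toy <- data.frame(
  FlightDate = as.Date(c("2016-01-06", "2016-01-06")),
  CRSArrTime = c(1438L, 2350L),  # scheduled arrival, HHMM
  ArrTime    = c(1432L, 30L)     # actual arrival, HHMM (30 means 00:30)
)

# Pad HHMM to four digits and combine it with the date.
to_timestamp <- function(date, hhmm) {
  as.POSIXct(paste(date, sprintf("%04d", hhmm)),
             format = "%Y-%m-%d %H%M", tz = "UTC")
}

# Assumption: an actual time more than 12 hours before the scheduled time
# really belongs to the next calendar day, so shift it forward by 24 hours.
roll_forward <- function(actual, scheduled) {
  gap <- as.numeric(difftime(actual, scheduled, units = "hours"))
  actual + ifelse(gap < -12, 24 * 60 * 60, 0)
}

sched  <- to_timestamp(toy$FlightDate, toy$CRSArrTime)
actual <- roll_forward(to_timestamp(toy$FlightDate, toy$ArrTime), sched)

# Arrival delay in minutes: 6 minutes early, then 40 minutes late.
arr_delay <- as.integer(difftime(actual, sched, units = "mins"))
arr_delay
```

Without the 24-hour shift, the second row would look like an arrival more than 23 hours early instead of 40 minutes late.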

Data wranglin’ for days!

At this point in the exercise, I started to think about how weather can impact travel plans. January is a particularly tricky travel month, especially in the northern US, so I thought it would be interesting to line this data set up with some inclement weather. The plot below shows a significant spike in delays in the first part of January. I focused my attention on understanding where these delays were occurring and found these commands helpful.

flights2 <- flights %>% filter(ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, ArrDel180, AirSpeed)
flights2 %>% group_by(FlightDate) %>% summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>% ggplot(aes(x = FlightDate, y = Tot3hrDelayedArrivals)) + geom_line()
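The line plot counts delayed arrivals per day. A busy day will naturally have more of them, so it may also help to look at the share of flights that arrived three or more hours late. Here is a minimal sketch on made-up rows; the data frame `toy` and the column `ShareDelayed` are hypothetical, and it assumes the on-time flights are kept in the data rather than filtered out first.

```r
library(dplyr)

# Toy data: five made-up flights over two days (1 = arrived 3+ hours late).
toy <- data.frame(
  FlightDate = as.Date(c("2016-01-08", "2016-01-08", "2016-01-08",
                         "2016-01-09", "2016-01-09")),
  ArrDel180  = c(1, 0, 0, 1, 1)
)

# Count the delayed arrivals and compute their share of all flights per day.
daily <- toy %>%
  group_by(FlightDate) %>%
  summarize(Tot3hrDelayedArrivals = sum(ArrDel180),
            ShareDelayed = mean(ArrDel180))
daily
```

Because `ArrDel180` is coded 0/1, its mean is the fraction of flights that were delayed.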

I thought these were interesting on a couple of levels:

1) The histogram shows that US travel is fairly reliable, even in tough weather!

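# Dates must be compared as Date objects; an unquoted 2016-01-07 is arithmetic (2008) and filters nothing,
# which is how the histogram and table below were produced, so they cover all of January.
JanStorm <- flights2 %>% group_by(Origin) %>% filter(FlightDate > as.Date("2016-01-07"), FlightDate < as.Date("2016-01-11")) %>%
  summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>% arrange(desc(Tot3hrDelayedArrivals))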
JanStorm %>% ggvis(~Tot3hrDelayedArrivals) %>% layer_histograms(width = 20)

2) These airports experienced the most delayed flights in the early part of January. However, the mix of airports suggests that something other than weather also occurred at the beginning of the month.

kable(head(JanStorm))

|Origin | Tot3hrDelayedArrivals|
|:------|---------------------:|
|ATL    |                   244|
|DEN    |                   152|
|SFO    |                   151|
|CLT    |                   145|
|DFW    |                   131|
|ORD    |                   110|

3) In the future, it might be a good exercise to couple this data with other data sets like [this one](https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp?pn=1).
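Date windows like the one used for the storm filter are easy to get wrong in R. Here is a minimal sketch on made-up dates (the vector `dates` is hypothetical) of how an unquoted date behaves in a comparison, and how `as.Date()` gives the intended window.

```r
# Four made-up dates, two of them inside the window of interest.
dates <- as.Date(c("2016-01-05", "2016-01-08", "2016-01-10", "2016-01-15"))

# Unquoted, 2016-01-07 is plain subtraction: 2016 - 1 - 7 = 2008.
unquoted <- 2016-01-07

# A Date is stored as days since 1970-01-01 (about 16,800 for January 2016),
# so every date here is "greater than" 2008 and nothing gets filtered out.
always_true <- dates > unquoted

# Comparing against Date objects keeps only January 8 through January 10.
in_window <- dates > as.Date("2016-01-07") & dates < as.Date("2016-01-11")
dates[in_window]
```

The same `as.Date()` bounds can be used inside a dplyr `filter()` call, where separating the two conditions with a comma has the same effect as `&`.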

“Step on it! We’re Late!”

Next, I focused on the Portland, Maine (PWM) and Boston, Massachusetts (BOS) airports, as these are the airports I typically travel through. I applied some filters and selected the relevant variables to get the following ‘data.frames’. I find these insights compelling and may look to travel through more reliable connections.

BOS <- flights %>% filter(Dest == "BOS", ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, Dest, DepDelay, ArrDel180, AirSpeed)
kable(head(BOS))

|FlightDate |Origin |Dest | DepDelay| ArrDel180| AirSpeed|
|:----------|:------|:----|--------:|---------:|--------:|
|2016-01-06 |MIA    |BOS  |      200|         1| 490.1299|
|2016-01-04 |CLT    |BOS  |      218|         1| 455.0000|
|2016-01-07 |PHL    |BOS  |      172|         1| 373.3333|
|2016-01-10 |CLT    |BOS  |      318|         1| 496.3636|
|2016-01-13 |PHL    |BOS  |      193|         1| 336.0000|
|2016-01-23 |CLT    |BOS  |      124|         1| 297.1429|

# Order the bars from the most to the fewest delayed arrivals.
BOS %>% group_by(Origin) %>% summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>% ggplot(aes(x = reorder(Origin, -Tot3hrDelayedArrivals), y = Tot3hrDelayedArrivals)) + geom_col() + xlab("Origin")
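By default, ggplot2 lays out a character x-axis in alphabetical order. One common way to sort bars by height is to turn the category into a factor whose levels follow the counts, using `reorder()` from base R. A minimal sketch on made-up counts (the data frame `counts` is hypothetical):

```r
# Made-up totals for three origin airports.
counts <- data.frame(
  Origin = c("CLT", "MIA", "PHL"),
  Tot3hrDelayedArrivals = c(3, 1, 2)
)

# reorder() sorts the factor levels by the second argument in ascending order,
# so negating the counts puts the largest total first.
counts$Origin <- reorder(counts$Origin, -counts$Tot3hrDelayedArrivals)
levels(counts$Origin)
```

ggplot2 draws bars in factor-level order, so a column prepared this way plots from the tallest bar to the shortest.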

There was only one three-hour late arrival in Portland.

PWM <- flights %>% filter(Dest == "PWM", ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, Dest, DepDelay, ArrDel180, AirSpeed)
kable(head(PWM))

|FlightDate |Origin |Dest | DepDelay| ArrDel180| AirSpeed|
|:----------|:------|:----|--------:|---------:|--------:|
|2016-01-13 |CLT    |PWM  |      199|         1|    487.8|

There is nothing more compelling than your own home when you are on a trip that has been impacted by a significant delay. I have wanted to be beamed home (à la Star Trek) on several occasions. Quite simply, once on the plane, you want to know the pilot is flying at the highest possible speed! So, I wanted to see if pilots push the envelope when they have a plane full of tired travelers.

It seems they may when you compare the delayed Boston flights…

BOS_AS <- BOS %>% select(Origin, ArrDel180, AirSpeed)
BOS_AS %>% ggvis(~AirSpeed) %>% layer_histograms(width = 20, fill := "red") %>%  add_axis("x", title = "Air Speed") %>% add_axis("y", title = "Total Flights Arriving more than 3 Hours Late")

…with all the domestic flights that arrived more than three hours late in the same time frame.

flights2 %>% ggvis(~AirSpeed) %>% layer_histograms(width = 20, fill := "red") %>%  add_axis("x", title = "Air Speed") %>% add_axis("y", title = "Total Flights Arriving more than 3 Hours Late")
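The two histograms are compared by eye. To put a number on the question of whether late flights fly faster, one possible approach is to compare the median air speed of delayed and non-delayed flights. This is only a sketch on made-up values (the data frame `toy` is hypothetical), and it assumes the on-time flights are still available for comparison.

```r
# Toy data: made-up air speeds for three delayed and four on-time flights.
toy <- data.frame(
  ArrDel180 = c(1, 1, 1, 0, 0, 0, 0),
  AirSpeed  = c(490, 455, 373, 430, 445, 410, 460)
)

# Median air speed within each group; the result is named "0" and "1".
median_speed <- tapply(toy$AirSpeed, toy$ArrDel180, median)
median_speed
```

I use the median rather than the mean here so that a handful of very slow or very fast flights does not dominate the comparison.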

Thank You!