The main data set in this week’s assignment offers a seemingly endless array of variables and insights. To keep from getting overwhelmed, I focused on my personal travel experiences and some of the questions that come to mind while traveling. Along the way, I aim to display compelling graphs that show insights about arrival delays nationally and regionally. To begin, I used the ‘install.packages()’, ‘library()’ and ‘read.csv()’ commands to configure my R environment.
library(dplyr)
library(lubridate)
library(ggvis)
library(tidyr)
library(ggplot2)
library(knitr)
flights <- read.csv("domestic_flights_jan_2016.csv", stringsAsFactors = FALSE)
Once here, I leveraged some of the commands from the “Unit 6 Lecture Notes” to configure my ‘data.frame’ called “flights”. Specifically, I used these commands to format my date fields and create new variables and calculations that will lead to some insights later. You will also see I changed some of the values to help streamline the results and plots later on. As a traveler, I find a delay of fifteen minutes tolerable. However, delays of three or more hours can really disrupt people’s daily goals.
flights$FlightDate <- as.Date(flights$FlightDate, format = "%m/%d/%Y")
flights <- flights %>% mutate(new_CRSDepTime = paste(FlightDate, sprintf("%04d", CRSDepTime)))
flights$new_CRSDepTime <- as.POSIXct(flights$new_CRSDepTime, format="%Y-%m-%d %H%M")
flights <- flights %>% mutate(new_CRSArrTime = paste(FlightDate, sprintf("%04d", CRSArrTime)))
flights$new_CRSArrTime <- as.POSIXct(flights$new_CRSArrTime, format="%Y-%m-%d %H%M")
flights <- flights %>% filter(Cancelled == 0) %>%
mutate(new_DepTime = paste(FlightDate, sprintf("%04d", DepTime)), new_WheelsOff = paste(FlightDate, sprintf("%04d", WheelsOff)),
new_WheelsOn = paste(FlightDate, sprintf("%04d", WheelsOn)), new_ArrTime = paste(FlightDate, sprintf("%04d", ArrTime)))
flights$new_DepTime <- as.POSIXct(flights$new_DepTime, format="%Y-%m-%d %H%M")
flights$new_WheelsOff <- as.POSIXct(flights$new_WheelsOff, format="%Y-%m-%d %H%M")
flights$new_WheelsOn <- as.POSIXct(flights$new_WheelsOn, format="%Y-%m-%d %H%M")
flights$new_ArrTime <- as.POSIXct(flights$new_ArrTime, format="%Y-%m-%d %H%M")
# Cancelled flights were already filtered out above, so the derived columns can be built in one mutate().
flights <- flights %>%
  mutate(DepDelay = as.integer(difftime(new_DepTime, new_CRSDepTime, units = "mins")),
         TaxiOut = as.integer(difftime(new_WheelsOff, new_DepTime, units = "mins")),
         TaxiIn = as.integer(difftime(new_ArrTime, new_WheelsOn, units = "mins")),
         ArrDelay = as.integer(difftime(new_ArrTime, new_CRSArrTime, units = "mins")),
         ArrDelayMinutes = ifelse(ArrDelay < 0, 0, ArrDelay),  # early arrivals count as zero delay
         ArrDel180 = ifelse(ArrDelay >= 180, 1, 0),            # flag arrivals three or more hours late
         FlightTimeBuffer = CRSElapsedTime - ActualElapsedTime,
         AirTime = ActualElapsedTime - TaxiOut - TaxiIn,
         AirSpeed = Distance / (AirTime / 60))
head(flights)
## FlightDate Carrier TailNum FlightNum Origin OriginCityName
## 1 2016-01-06 AA N4YBAA 43 DFW Dallas/Fort Worth, TX
## 2 2016-01-07 AA N434AA 43 DFW Dallas/Fort Worth, TX
## 3 2016-01-08 AA N541AA 43 DFW Dallas/Fort Worth, TX
## 4 2016-01-09 AA N489AA 43 DFW Dallas/Fort Worth, TX
## 5 2016-01-10 AA N439AA 43 DFW Dallas/Fort Worth, TX
## 6 2016-01-11 AA N468AA 43 DFW Dallas/Fort Worth, TX
## OriginState Dest DestCityName DestState CRSDepTime DepTime WheelsOff
## 1 TX DTW Detroit, MI MI 1100 1057 1112
## 2 TX DTW Detroit, MI MI 1100 1056 1110
## 3 TX DTW Detroit, MI MI 1100 1055 1116
## 4 TX DTW Detroit, MI MI 1100 1102 1115
## 5 TX DTW Detroit, MI MI 1100 1240 1300
## 6 TX DTW Detroit, MI MI 1100 1107 1118
## WheelsOn CRSArrTime ArrTime Cancelled Diverted CRSElapsedTime
## 1 1424 1438 1432 0 0 158
## 2 1416 1438 1426 0 0 158
## 3 1431 1438 1445 0 0 158
## 4 1424 1438 1433 0 0 158
## 5 1617 1438 1631 0 0 158
## 6 1426 1438 1435 0 0 158
## ActualElapsedTime Distance new_CRSDepTime new_CRSArrTime
## 1 155 986 2016-01-06 11:00:00 2016-01-06 14:38:00
## 2 150 986 2016-01-07 11:00:00 2016-01-07 14:38:00
## 3 170 986 2016-01-08 11:00:00 2016-01-08 14:38:00
## 4 151 986 2016-01-09 11:00:00 2016-01-09 14:38:00
## 5 171 986 2016-01-10 11:00:00 2016-01-10 14:38:00
## 6 148 986 2016-01-11 11:00:00 2016-01-11 14:38:00
## new_DepTime new_WheelsOff new_WheelsOn
## 1 2016-01-06 10:57:00 2016-01-06 11:12:00 2016-01-06 14:24:00
## 2 2016-01-07 10:56:00 2016-01-07 11:10:00 2016-01-07 14:16:00
## 3 2016-01-08 10:55:00 2016-01-08 11:16:00 2016-01-08 14:31:00
## 4 2016-01-09 11:02:00 2016-01-09 11:15:00 2016-01-09 14:24:00
## 5 2016-01-10 12:40:00 2016-01-10 13:00:00 2016-01-10 16:17:00
## 6 2016-01-11 11:07:00 2016-01-11 11:18:00 2016-01-11 14:26:00
## new_ArrTime DepDelay TaxiOut TaxiIn ArrDelay ArrDelayMinutes
## 1 2016-01-06 14:32:00 -3 15 8 -6 0
## 2 2016-01-07 14:26:00 -4 14 10 -12 0
## 3 2016-01-08 14:45:00 -5 21 14 7 7
## 4 2016-01-09 14:33:00 2 13 9 -5 0
## 5 2016-01-10 16:31:00 100 20 14 113 113
## 6 2016-01-11 14:35:00 7 11 9 -3 0
## ArrDel180 FlightTimeBuffer AirTime AirSpeed
## 1 0 3 132 448.1818
## 2 0 8 126 469.5238
## 3 0 -12 135 438.2222
## 4 0 7 129 458.6047
## 5 0 -13 137 431.8248
## 6 0 10 128 462.1875
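To make the derived columns concrete, the sketch below recomputes row 5 of the output above by hand, using base R only. The input values are copied from that row; the helper names `to_datetime` and `mins_between` and the fixed `tz = "UTC"` are my own assumptions for illustration, not part of the assignment code.

```r
# Values copied from row 5 of the head(flights) output (DFW to DTW, 2016-01-10)
flight_date <- as.Date("2016-01-10")
crs_dep <- 1100L; dep <- 1240L; wheels_off <- 1300L
wheels_on <- 1617L; crs_arr <- 1438L; arr <- 1631L
actual_elapsed <- 171   # minutes
distance <- 986         # same units as the Distance column

# Hypothetical helper: pad HHMM to four digits, then parse it together with the date.
# tz = "UTC" is an assumption that keeps daylight-saving shifts out of the arithmetic.
to_datetime <- function(date, hhmm) {
  as.POSIXct(paste(date, sprintf("%04d", hhmm)),
             format = "%Y-%m-%d %H%M", tz = "UTC")
}

# Hypothetical helper: whole minutes between two timestamps
mins_between <- function(later, earlier) {
  as.integer(difftime(later, earlier, units = "mins"))
}

dep_delay <- mins_between(to_datetime(flight_date, dep), to_datetime(flight_date, crs_dep))
taxi_out  <- mins_between(to_datetime(flight_date, wheels_off), to_datetime(flight_date, dep))
taxi_in   <- mins_between(to_datetime(flight_date, arr), to_datetime(flight_date, wheels_on))
arr_delay <- mins_between(to_datetime(flight_date, arr), to_datetime(flight_date, crs_arr))

air_time  <- actual_elapsed - taxi_out - taxi_in   # minutes spent in the air
air_speed <- distance / (air_time / 60)            # distance units per hour

print(c(DepDelay = dep_delay, TaxiOut = taxi_out, TaxiIn = taxi_in,
        ArrDelay = arr_delay, AirTime = air_time))
print(round(air_speed, 4))
```

One caveat with this approach: pairing every time with FlightDate assumes the flight lands on the same calendar day it departs. A flight that arrives after midnight would show a large negative delay unless a day is added to the arrival timestamp.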
At this point in the exercise I started to think about how weather can impact travel plans. January is a particularly tricky travel month, especially in the northern US, so I thought it would be interesting to line this data set up against some inclement weather. The plot below shows a significant spike in delays in the first part of January. I focused my attention on understanding where these delays were occurring and found these commands helpful.
flights2 <- flights %>% filter(ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, ArrDel180, AirSpeed)
flights2 %>% group_by(FlightDate) %>% summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>% ggplot(aes(x = FlightDate, y = Tot3hrDelayedArrivals)) + geom_line()
I thought these results were interesting on a couple of levels:
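# Dates are quoted and wrapped in as.Date(); a bare 2016-01-07 is evaluated as the arithmetic 2016 - 1 - 7, which every Date exceeds, so it filters nothing.
# The upper bound (<) is an assumed reading of the intended storm window.
JanStorm <- flights2 %>%
  filter(FlightDate > as.Date("2016-01-07"), FlightDate < as.Date("2016-01-11")) %>%
  group_by(Origin) %>%
  summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>%
  arrange(desc(Tot3hrDelayedArrivals))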
JanStorm %>% ggvis(~Tot3hrDelayedArrivals) %>% layer_histograms(width = 20)
kable(head(JanStorm))
| Origin | Tot3hrDelayedArrivals |
|---|---|
| ATL | 244 |
| DEN | 152 |
| SFO | 151 |
| CLT | 145 |
| DFW | 131 |
| ORD | 110 |
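Filtering to a date window relies on comparing `FlightDate` against real `Date` values. As a small base-R illustration with made-up dates (not taken from the data set): an unquoted `2016-01-07` is parsed as subtraction, so the comparison silently keeps everything, whereas `as.Date("2016-01-07")` behaves as intended.

```r
# Made-up dates for illustration only
d <- as.Date(c("2016-01-05", "2016-01-08", "2016-01-09", "2016-01-12"))

# Unquoted, this is plain arithmetic: 2016 - 1 - 7
bare <- 2016-01-07
print(bare)

# A Date is stored as days since 1970-01-01, so every 2016 date is larger than `bare`
keeps_everything <- d > bare
print(keeps_everything)

# Comparing against proper Date values gives the intended window
in_window <- d > as.Date("2016-01-07") & d < as.Date("2016-01-11")
print(d[in_window])
```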
Next, I focused on the Portland, Maine (PWM) and Boston, Massachusetts (BOS) airports, as these are the airports I typically travel through. I applied some filters and selected the relevant variables to get the following ‘data.frames’. I find these insights compelling and may look to travel through more reliable connections.
BOS <- flights %>% filter(Dest == "BOS", ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, Dest, DepDelay, ArrDel180, AirSpeed)
kable(head(BOS))
| FlightDate | Origin | Dest | DepDelay | ArrDel180 | AirSpeed |
|---|---|---|---|---|---|
| 2016-01-06 | MIA | BOS | 200 | 1 | 490.1299 |
| 2016-01-04 | CLT | BOS | 218 | 1 | 455.0000 |
| 2016-01-07 | PHL | BOS | 172 | 1 | 373.3333 |
| 2016-01-10 | CLT | BOS | 318 | 1 | 496.3636 |
| 2016-01-13 | PHL | BOS | 193 | 1 | 336.0000 |
| 2016-01-23 | CLT | BOS | 124 | 1 | 297.1429 |
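# reorder() sorts the bars from most to fewest delayed arrivals; desc() passed as an extra ggplot() argument has no effect.
BOS %>% group_by(Origin) %>% summarize(Tot3hrDelayedArrivals = sum(ArrDel180)) %>% ggplot(aes(x = reorder(Origin, -Tot3hrDelayedArrivals), y = Tot3hrDelayedArrivals)) + geom_col() + xlab("Origin")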
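One way to sort bars from most to fewest delays is to reorder the factor levels of the x variable with base R's `reorder()`. The sketch below uses only the six origins visible in the `head(BOS)` table, so its counts are illustrative and are not the full Boston totals.

```r
# Origins of the six rows shown in head(BOS); not the complete BOS data frame
origin <- c("MIA", "CLT", "PHL", "CLT", "PHL", "CLT")

# Count rows per origin
counts <- as.data.frame(table(Origin = origin), stringsAsFactors = FALSE)
names(counts)[2] <- "Tot3hrDelayedArrivals"

# Negating the count puts the largest count first in the level order
counts$Origin <- reorder(counts$Origin, -counts$Tot3hrDelayedArrivals)
print(levels(counts$Origin))
```

Because ggplot2 draws a discrete axis in factor-level order, mapping `x = Origin` after this step places the bars in that order.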
Only one flight arrived in Portland three or more hours late.
PWM <- flights %>% filter(Dest == "PWM", ArrDel180 > 0, Cancelled == 0) %>% select(FlightDate, Origin, Dest, DepDelay, ArrDel180, AirSpeed)
kable(head(PWM))
| FlightDate | Origin | Dest | DepDelay | ArrDel180 | AirSpeed |
|---|---|---|---|---|---|
| 2016-01-13 | CLT | PWM | 199 | 1 | 487.8 |
There is nothing more compelling than your own home when you are on a trip that has been impacted by a significant delay. I have wanted to be beamed home (à la Star Trek) on several occasions. Quite simply, once on the plane, you want to know the pilot is flying at the highest possible speed! So, I wanted to see if pilots push the envelope when they have a plane full of tired travelers.
BOS_AS <- BOS %>% select(Origin, ArrDel180, AirSpeed)
BOS_AS %>% ggvis(~AirSpeed) %>% layer_histograms(width = 20, fill := "red") %>% add_axis("x", title = "Air Speed") %>% add_axis("y", title = "Total Flights Arriving more than 3 Hours Late")
flights2 %>% ggvis(~AirSpeed) %>% layer_histograms(width = 20, fill := "red") %>% add_axis("x", title = "Air Speed") %>% add_axis("y", title = "Total Flights Arriving more than 3 Hours Late")
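Both histograms draw only on flights that arrived three or more hours late, so on their own they cannot show whether late flights fly faster than on-time ones. A direct comparison needs both groups side by side. The base-R sketch below shows the shape of that comparison using the twelve `AirSpeed` values printed earlier (six on-time rows from `head(flights)` and six late rows from `head(BOS)`). These rows cover different routes, so the result demonstrates the technique and is not evidence either way. On the full data, the same call could be run on `flights$AirSpeed` and `flights$ArrDel180`.

```r
# Twelve AirSpeed values copied from the tables printed earlier.
# ArrDel180 = 0: head(flights) rows; ArrDel180 = 1: head(BOS) rows.
speeds <- data.frame(
  ArrDel180 = rep(c(0, 1), each = 6),
  AirSpeed = c(448.1818, 469.5238, 438.2222, 458.6047, 431.8248, 462.1875,
               490.1299, 455.0000, 373.3333, 496.3636, 336.0000, 297.1429)
)

# Median air speed within each group (names are "0" and "1")
median_speed <- tapply(speeds$AirSpeed, speeds$ArrDel180, median)
print(median_speed)
```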