Looking at flight data made me curious about the effects of jet streams and whether air speeds differ depending on the flight path. I refreshed my knowledge of jet streams using material from the North Carolina Climate Office and found that in winter the polar jet stream affects a greater portion of the continental U.S. Although jet streams do not follow an exact west-to-east pattern (see Jet Stream Diagram), I chose flight paths that run more directly eastbound or westbound along lines of latitude for simplicity. The questions I chose to address in this assignment were:
Jet Stream Diagram:
I downloaded the dataset, domestic_flights_jan_2016.csv, from the class website. I began by creating the calculated fields that were demonstrated in the Unit 6 lecture.
library(dplyr)
library(knitr)
usflights <- read.csv("domestic_flights_jan_2016.csv")
Next, I converted all date and time variables to proper date and date-time values using as.Date(), sprintf(), and as.POSIXct().
# Parse the flight date; each hhmm clock time is then zero-padded to four
# digits and pasted onto the date so it can be parsed as a date-time.
usflights$FlightDate <- as.Date(usflights$FlightDate, format = "%m/%d/%Y")
# Scheduled departure and arrival times
usflights <- usflights %>%
  mutate(new_CRSDepTime = paste(FlightDate, sprintf("%04d", CRSDepTime)))
usflights$new_CRSDepTime <- as.POSIXct(usflights$new_CRSDepTime, format = "%Y-%m-%d %H%M")
usflights <- usflights %>%
  mutate(new_CRSArrTime = paste(FlightDate, sprintf("%04d", CRSArrTime)))
usflights$new_CRSArrTime <- as.POSIXct(usflights$new_CRSArrTime, format = "%Y-%m-%d %H%M")
# Actual times exist only for flights that were not cancelled
usflights <- usflights %>%
  filter(Cancelled == 0) %>%
  mutate(new_DepTime = paste(FlightDate, sprintf("%04d", DepTime)),
         new_WheelsOff = paste(FlightDate, sprintf("%04d", WheelsOff)),
         new_WheelsOn = paste(FlightDate, sprintf("%04d", WheelsOn)),
         new_ArrTime = paste(FlightDate, sprintf("%04d", ArrTime)))
usflights$new_DepTime <- as.POSIXct(usflights$new_DepTime, format = "%Y-%m-%d %H%M")
usflights$new_WheelsOff <- as.POSIXct(usflights$new_WheelsOff, format = "%Y-%m-%d %H%M")
usflights$new_WheelsOn <- as.POSIXct(usflights$new_WheelsOn, format = "%Y-%m-%d %H%M")
usflights$new_ArrTime <- as.POSIXct(usflights$new_ArrTime, format = "%Y-%m-%d %H%M")
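The same pad-paste-parse pattern repeats for every time column above, so it could be wrapped in a small helper. The sketch below is only an illustration: the function name to_timestamp and the explicit tz = "UTC" argument are my own assumptions and are not part of the lecture code.

```r
# Hypothetical helper (not from the original analysis): combine a Date with an
# integer hhmm clock time (e.g. 5 for 00:05, 1430 for 14:30) into a POSIXct.
to_timestamp <- function(date, hhmm, tz = "UTC") {
  # sprintf("%04d", 5L) gives "0005", so short times still match "%H%M"
  as.POSIXct(paste(date, sprintf("%04d", hhmm)),
             format = "%Y-%m-%d %H%M", tz = tz)
}

# Illustrative inputs only
flight_date <- as.Date("2016-01-15")
early <- to_timestamp(flight_date, 5L)
afternoon <- to_timestamp(flight_date, 1430L)
format(early, "%Y-%m-%d %H:%M")
format(afternoon, "%Y-%m-%d %H:%M")
```

With a helper like this, each conversion could be a single line inside mutate(), for example new_DepTime = to_timestamp(FlightDate, DepTime).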
To answer my questions, I didn’t need all of the flight data, so I selected just the flights I’d be using. For the northern flight path, I chose Boston <-> Seattle. For the southern flight path, I chose Atlanta <-> Los Angeles. I then created new variables to help answer my questions, and I glanced through the data to make sure my variables Fdirection and Path were assigned correctly.
ewusflights <- usflights %>%
  # Derived fields: taxi times, arrival delay, time in the air (minutes), air speed (mph)
  mutate(TaxiOut = as.integer(difftime(new_WheelsOff, new_DepTime, units = "mins")),
         TaxiIn = as.integer(difftime(new_ArrTime, new_WheelsOn, units = "mins")),
         ArrDelay = as.integer(difftime(new_ArrTime, new_CRSArrTime, units = "mins")),
         ArrDelayMinutes = ifelse(ArrDelay < 0, 0, ArrDelay),
         ArrDel15 = ifelse(ArrDelay >= 15, 1, 0),
         AirTime = ActualElapsedTime - TaxiOut - TaxiIn,
         AirSpeed = Distance / (AirTime / 60)) %>%
  # Keep only the two city pairs, in both directions
  filter((OriginCityName == "Boston, MA" & DestCityName == "Seattle, WA") |
         (OriginCityName == "Seattle, WA" & DestCityName == "Boston, MA") |
         (OriginCityName == "Atlanta, GA" & DestCityName == "Los Angeles, CA") |
         (OriginCityName == "Los Angeles, CA" & DestCityName == "Atlanta, GA")) %>%
  # Label each flight with its direction and its path
  mutate(Fdirection = ifelse((OriginCityName == "Boston, MA" & DestCityName == "Seattle, WA") |
                             (OriginCityName == "Atlanta, GA" & DestCityName == "Los Angeles, CA"),
                             "Westbound", "Eastbound"),
         Path = ifelse((OriginCityName == "Boston, MA" & DestCityName == "Seattle, WA") |
                       (OriginCityName == "Seattle, WA" & DestCityName == "Boston, MA"),
                       "Northern Path", "Southern Path")) %>%
  select(OriginCityName, DestCityName, new_CRSDepTime, AirSpeed, ArrDel15, Fdirection, Path)
head(ewusflights)
summary(ewusflights$AirSpeed)
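Because every clock time is stamped with FlightDate, a flight that lands just before midnight and reaches the gate just after would get a large negative TaxiIn, which in turn distorts AirTime and AirSpeed. The sketch below walks through the air speed arithmetic with one possible guard. All values are made up for illustration, and the rule "a negative taxi time means the clock passed midnight" is my own assumption, not something from the lecture.

```r
# Illustrative values only (hypothetical, not taken from the dataset)
flight_date <- as.Date("2016-01-15")
stamp <- function(hhmm) {
  as.POSIXct(paste(flight_date, sprintf("%04d", hhmm)),
             format = "%Y-%m-%d %H%M", tz = "UTC")
}

wheels_on <- stamp(2355L)  # touched down at 23:55
arr_time <- stamp(5L)      # reached the gate at 00:05, really the next day

# Same-date stamping makes the raw taxi-in time negative
taxi_in_raw <- as.integer(difftime(arr_time, wheels_on, units = "mins"))

# Assumed fix: add one day (1440 minutes) when the difference is negative
taxi_in <- ifelse(taxi_in_raw < 0, taxi_in_raw + 1440L, taxi_in_raw)

# Air speed in miles per hour: distance divided by hours in the air
distance <- 2500   # miles
elapsed <- 360     # gate-to-gate minutes
taxi_out <- 20     # minutes
air_time <- elapsed - taxi_out - taxi_in
air_speed <- distance / (air_time / 60)
```

A quick check such as summary() on TaxiIn and TaxiOut would show whether any negative values like this are present in the real data.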
library(ggvis)
# Average air speed by direction
avg_speed <- na.omit(ewusflights) %>%
  group_by(Fdirection) %>%
  summarize(Average_Speed = mean(AirSpeed))
avg_speed %>% ggvis(~Fdirection, ~Average_Speed) %>% layer_bars(fill := "#6699CC")
# Eastbound air speed, northern vs. southern path
eastbound <- na.omit(ewusflights) %>%
  filter(Fdirection == "Eastbound") %>%
  ggvis(~Path, ~AirSpeed) %>%
  layer_boxplots(fill := "#6699CC")
eastbound
# Westbound air speed, northern vs. southern path
westbound <- na.omit(ewusflights) %>%
  filter(Fdirection == "Westbound") %>%
  ggvis(~Path, ~AirSpeed) %>%
  layer_boxplots(fill := "#6699CC")
westbound
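# Percent of flights arriving 15 or more minutes late, by direction
# (mean() of a 0/1 flag is the share of 1s; na.rm = TRUE skips flights with no arrival time)
directiondelay <- ewusflights %>%
  group_by(Fdirection) %>%
  summarize(Percent_Delayed = mean(ArrDel15, na.rm = TRUE) * 100)
directiondelay %>% ggvis(~Fdirection, ~Percent_Delayed) %>% layer_bars(fill := "#6699CC")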
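To make the percent-delayed calculation concrete, here is a toy version in base R. The six rows are invented for illustration and are not flight records; tapply() stands in for the group_by() and summarize() step, and the na.rm = TRUE handling is my own assumption about how missing arrival flags should be treated.

```r
# Toy data for illustration only (hypothetical, not from the dataset)
toy <- data.frame(
  Fdirection = c("Eastbound", "Eastbound", "Eastbound",
                 "Westbound", "Westbound", "Westbound"),
  ArrDel15 = c(1, 0, NA, 1, 0, 0)
)

# The mean of a 0/1 flag is the share of delayed flights; times 100 gives a percent.
# na.rm = TRUE drops the flight whose delay flag is missing.
percent_delayed <- tapply(toy$ArrDel15, toy$Fdirection, mean, na.rm = TRUE) * 100
percent_delayed
```

In this toy case one of the two eastbound flights with a known flag is delayed (50 percent) and one of the three westbound flights is delayed (about 33 percent).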
Without formally testing for significance, there appears to be a difference in average air speed between eastbound and westbound flights, and this difference matches the direction of the jet stream. The average eastbound air speed was 506 mph, while the average westbound air speed was 442 mph. Had these numbers been closer, I might have guessed that pilots successfully navigated around jet streams by adjusting the altitude at which they flew, or that westbound planes used more fuel to maintain faster speeds against the opposing wind. From my first question, I might conclude that eastbound flights gain an advantage from the jet stream.
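One way to check this difference formally would be a two-sample t-test. The sketch below runs on simulated speeds rather than the flight data, so its output says nothing about the real flights. The sample size of 50 per group and the standard deviation of 30 mph are arbitrary assumptions, and only the group means are borrowed from the averages reported above.

```r
# Simulated data for illustration only; these are NOT the flight records
set.seed(42)
sim <- data.frame(
  Fdirection = rep(c("Eastbound", "Westbound"), each = 50),
  AirSpeed = c(rnorm(50, mean = 506, sd = 30),
               rnorm(50, mean = 442, sd = 30))
)

# Welch two-sample t-test; t.test() does not assume equal variances by default
speed_test <- t.test(AirSpeed ~ Fdirection, data = sim)
speed_test$p.value
```

On the real data the equivalent call would be t.test(AirSpeed ~ Fdirection, data = na.omit(ewusflights)), with the usual caveat that flights on the same route and day are not fully independent observations.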
The boxplots for my next questions show no apparent difference in air speeds between the northern and southern paths. This lack of difference leads me to wonder whether it arises because I am using winter data without a summer comparison. In winter, the more extensive polar jet stream may equalize the magnitude of jet stream effects in the northern and southern regions of the country. I would be curious to test this theory by using summer data for comparison.
But the most interesting finding is the difference in the percent of delayed flights between the eastbound and westbound directions. A greater percent of eastbound flights arrived 15 or more minutes late compared to westbound flights. Even though eastbound flights averaged higher speeds, more eastbound flights were delayed. Perhaps eastbound flights traveled at higher speeds because they were more often running behind schedule. Maybe they don’t have the jet stream to thank at all! To tease out these questions, I would look at seasonal differences north to south, departure delay differences, and fuel use differences. Based on my exploration, I can’t say for sure that the jet stream is responsible for eastbound/westbound flight differences.