LaGuardia Airport (LGA) in New York City is a common destination for my family coming from New Mexico, but they often face delays on their connecting flights into LGA. To begin my analysis, I chose to look at all flights arriving at LGA.

To start, I will read in my CSV file.

flights <- read.csv("domestic_flights_jan_2016.csv", header = TRUE, stringsAsFactors = FALSE)

Next, I load the additional packages I will use to filter and analyze the data.

library(dplyr)
library(ggvis)

With dplyr loaded, I can create a new data frame containing only flights with LGA as their destination.

LGA_arrivals <- flights %>% filter(Dest == "LGA")

Beginning Metrics

Next, I will compute many of the same metrics Prof. Suleiman demonstrated in the Unit 6 lecture, this time on my new LGA data frame, LGA_arrivals. Because these steps repeat the lecture, I will move through this section without much narrative.

# Convert FlightDate from "m/d/Y" text to a Date
LGA_arrivals$FlightDate <- as.Date(LGA_arrivals$FlightDate, format = "%m/%d/%Y")

# Drop cancelled and diverted flights, which lack a complete set of actual times,
# then restore the leading zeros on the HHMM times (e.g. 745 -> "0745") and prefix the flight date.
# Caution: every time is pasted onto FlightDate, so events after midnight land on the wrong day.
LGA_arrivals <- LGA_arrivals %>%
  filter(Cancelled == 0, Diverted == 0) %>%
  mutate(new_CRSDepTime = paste(FlightDate, sprintf("%04d", CRSDepTime)),
         new_CRSArrTime = paste(FlightDate, sprintf("%04d", CRSArrTime)),
         new_DepTime = paste(FlightDate, sprintf("%04d", DepTime)),
         new_WheelsOff = paste(FlightDate, sprintf("%04d", WheelsOff)),
         new_WheelsOn = paste(FlightDate, sprintf("%04d", WheelsOn)),
         new_ArrTime = paste(FlightDate, sprintf("%04d", ArrTime)))

# Parse the "YYYY-MM-DD HHMM" strings into date-times
time_cols <- c("new_CRSDepTime", "new_CRSArrTime", "new_DepTime",
               "new_WheelsOff", "new_WheelsOn", "new_ArrTime")
LGA_arrivals[time_cols] <- lapply(LGA_arrivals[time_cols], as.POSIXct, format = "%Y-%m-%d %H%M")
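Because every time is pasted onto FlightDate, a wheels-on or arrival time just after midnight is parsed as the early morning of the flight date instead of the next day. For those flights TaxiIn or ArrDelay comes out as a large negative number (close to -1,440 minutes), and ArrDelayMinutes then floors a late arrival to zero. One possible workaround, sketched here, is to skip the date-times and subtract the raw HHMM integers, wrapping each difference into a window of plus or minus 12 hours. This assumes both times are read off the same local clock (the same airport, as with TaxiOut, TaxiIn, ArrDelay, and DepDelay) and that no real gap exceeds 12 hours. The helper names hhmm_to_min() and clock_diff_mins() are hypothetical, not part of dplyr or the lecture code.

```r
# Hypothetical helpers (not from the lecture code): work on raw HHMM integers such as 2355 or 5
hhmm_to_min <- function(hhmm) {
  # Minutes since midnight: 2355 -> 23 * 60 + 55 = 1435; 2400 -> 1440
  (hhmm %/% 100) * 60 + hhmm %% 100
}

clock_diff_mins <- function(later, earlier) {
  # Difference in minutes, wrapped into [-720, 720) so that a gap spanning
  # midnight is not off by a whole day. Assumes both times come from the
  # same local clock (same airport) and the true gap is under 12 hours.
  d <- hhmm_to_min(later) - hhmm_to_min(earlier)
  ((d + 720) %% 1440) - 720
}

clock_diff_mins(5, 2355)     # wheels on 23:55, at the gate 00:05 -> 10
clock_diff_mins(15, 2350)    # scheduled 23:50, arrived 00:15 -> 25
clock_diff_mins(1350, 1400)  # arrived 10 minutes early -> -10
```

Applied to the data, a call such as mutate(TaxiIn = clock_diff_mins(ArrTime, WheelsOn)) would use the original integer columns from the CSV directly.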
# Speed metrics: taxi times, arrival delay, schedule buffer, time airborne, and average speed in mph
# (cancelled and diverted flights were already removed above)
LGA_arrivals <- LGA_arrivals %>%
  mutate(TaxiOut = as.integer(difftime(new_WheelsOff, new_DepTime, units = "mins")),
         TaxiIn = as.integer(difftime(new_ArrTime, new_WheelsOn, units = "mins")),
         ArrDelay = as.integer(difftime(new_ArrTime, new_CRSArrTime, units = "mins")),
         ArrDelayMinutes = ifelse(ArrDelay < 0, 0, ArrDelay),
         ArrDel15 = ifelse(ArrDelay >= 15, 1, 0),
         FlightTimeBuffer = CRSElapsedTime - ActualElapsedTime,
         AirTime = ActualElapsedTime - TaxiOut - TaxiIn,
         AirSpeed = Distance / (AirTime / 60))
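To make the speed formulas concrete, here is the arithmetic for one made-up flight (the numbers are illustrative, not taken from the January 2016 file). Because Distance is the distance between the two airports in miles, AirSpeed is an average speed over that distance in miles per hour, not the aircraft's true airspeed.

```r
# A single hypothetical flight (toy numbers, not from domestic_flights_jan_2016.csv)
flight <- data.frame(Distance = 1750,          # miles between origin and LGA
                     ActualElapsedTime = 240,  # gate to gate, minutes
                     TaxiOut = 20,             # gate to wheels off, minutes
                     TaxiIn = 10)              # wheels on to gate, minutes

# Time in the air is what remains after removing both taxi periods
flight$AirTime <- flight$ActualElapsedTime - flight$TaxiOut - flight$TaxiIn  # 210 minutes

# Convert minutes to hours before dividing, giving miles per hour
flight$AirSpeed <- flight$Distance / (flight$AirTime / 60)  # 1750 / 3.5 = 500 mph
```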
# Departure delays: raw minutes, early departures floored at zero, and a 15-minute flag
LGA_arrivals <- LGA_arrivals %>%
  mutate(DepDelay = as.integer(difftime(new_DepTime, new_CRSDepTime, units = "mins")),
         DepDelayMinutes = ifelse(DepDelay < 0, 0, DepDelay),
         DepDel15 = ifelse(DepDelay >= 15, 1, 0))

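The two derived delay columns treat early departures differently: DepDelayMinutes floors them at zero, while DepDel15 flags only departures at least 15 minutes late. A toy vector of hypothetical delays shows both, using pmax() and a logical comparison, which produce the same values as the ifelse() calls in the departure-delay code.

```r
# Hypothetical departure delays in minutes (negative = left early)
dep_delay <- c(-5, 0, 14, 15, 42)

# Same values as ifelse(DepDelay < 0, 0, DepDelay): early departures count as zero
dep_delay_minutes <- pmax(dep_delay, 0)   # 0 0 14 15 42

# Same values as ifelse(DepDelay >= 15, 1, 0): 15 minutes late or more is "significant"
dep_del15 <- as.numeric(dep_delay >= 15)  # 0 0 0 1 1

# The mean of the 0/1 flag is the share of significantly delayed departures
mean(dep_del15)                           # 0.4
```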
Analysis

Now I can start manipulating the data to see if there is a way to avoid delays when flying into LGA.

First, I will look at significant departure delays (15 minutes or more) by carrier.

LGA_arrivals %>% ggvis(x = ~Carrier, y = ~DepDel15) %>% layer_bars()

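F9 (Frontier Airlines), OO (SkyWest Airlines), and VX (Virgin America) might be the better options. However, because layer_bars() adds up DepDel15 within each carrier, the bars show the number of significantly delayed departures rather than the share of flights delayed, so carriers that operate fewer flights into LGA will look better on this chart whether or not they are more punctual.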
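Turning the carrier counts into rates takes only a group_by() and summarize(), the same pattern used for origin airports in the next step. The toy data frame here is hypothetical; it shows one carrier with twice as many delayed departures as the other, even though the two have the same delay rate.

```r
library(dplyr)

# Hypothetical flights: AA operates four flights, VX two
toy <- data.frame(Carrier  = c("AA", "AA", "AA", "AA", "VX", "VX"),
                  DepDel15 = c(1, 1, 0, 0, 1, 0),
                  stringsAsFactors = FALSE)

by_carrier <- toy %>%
  group_by(Carrier) %>%
  summarize(Flights   = n(),             # flights per carrier
            Delayed   = sum(DepDel15),   # what the bar chart stacks: 2 for AA, 1 for VX
            DelayRate = mean(DepDel15))  # share delayed: 0.5 for both

by_carrier
```

On the real data, replacing toy with LGA_arrivals and piping the result into ggvis(~Carrier, ~DelayRate) %>% layer_bars() would chart rates instead of counts.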

Another parameter to look at is delays by location. Here I calculate the rate of significant departure delays by origin airport.

DelayRate_ByLocation <- LGA_arrivals %>% group_by(Origin) %>% summarize(DelayRate = sum(DepDel15) / n())
DelayRate_ByLocation %>% ggvis(~Origin, ~DelayRate) %>% layer_bars()

Based on this chart, I will advise my family not to connect through the origin airports with the highest rates of departure delays on flights to LGA.
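One caveat before ruling airports out: an origin with only a handful of flights to LGA can show an extreme delay rate by chance. The sketch here keeps only origins with at least a minimum number of flights and sorts them from worst to best. The cutoff of 30 flights and the airport codes "AAA" and "BBB" are hypothetical placeholders, not values from the data.

```r
library(dplyr)

# Hypothetical origins: "AAA" sends 40 flights (10 delayed), "BBB" sends 2 (1 delayed)
toy <- data.frame(Origin   = c(rep("AAA", 40), rep("BBB", 2)),
                  DepDel15 = c(rep(1, 10), rep(0, 30), 1, 0),
                  stringsAsFactors = FALSE)

min_flights <- 30  # hypothetical cutoff; choose one that suits the real data

reliable_rates <- toy %>%
  group_by(Origin) %>%
  summarize(Flights = n(), DelayRate = mean(DepDel15)) %>%
  filter(Flights >= min_flights) %>%  # drop origins with too few flights to judge
  arrange(desc(DelayRate))            # worst origins first

reliable_rates  # only AAA remains, with DelayRate 0.25; BBB's 0.5 rests on 2 flights
```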

Finally, because I want to graph a linear regression, even though this data set offers few obvious variable pairs for one, I will look at the relationship between plane speed (AirSpeed) and departure delay in minutes (DepDelayMinutes).

LGA_arrivals %>% ggvis(x = ~DepDelayMinutes, y = ~AirSpeed) %>% layer_points() %>% layer_model_predictions(model = "lm", se = TRUE, stroke := "red")

In spite of my hopes, there does not appear to be much of a relationship between a late departure and airplane speed. Although the hypothesis that a delay might prompt a pilot to fly faster seems logical, I suspect that other variables have a far greater effect on speed.
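The scatter plot leaves "not much of a relationship" as a visual judgment. Fitting the same straight-line model that layer_model_predictions(model = "lm") draws and reading off its slope and R-squared would put numbers on it. This sketch runs on simulated data, with speed generated independently of delay, so that it is self-contained; on the real data, toy would be replaced by LGA_arrivals. The simulated values are not results from the January 2016 file.

```r
set.seed(42)

# Simulated stand-in for LGA_arrivals: speed is generated independently of delay
toy <- data.frame(DepDelayMinutes = rpois(200, lambda = 20))
toy$AirSpeed <- rnorm(200, mean = 420, sd = 30)

# The same model layer_model_predictions(model = "lm") fits: AirSpeed ~ DepDelayMinutes
fit <- lm(AirSpeed ~ DepDelayMinutes, data = toy)

# Slope estimate (mph per minute of delay), its standard error, t value, and p-value
coef(summary(fit))["DepDelayMinutes", ]

# Share of the variation in speed explained by departure delay
summary(fit)$r.squared
```

On the real data, a slope near zero together with a small R-squared would support the visual impression that departure delay has little bearing on speed.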