Introduction

This report will answer the following question:

How does departure delay affect the average departure hour of flights departing NYC in 2013?

We will be using the flights data set from the nycflights13 package. The data set contains 336,776 observations of 19 variables. The variables relevant to this report are month (the month of departure), day (the day of the month of departure), hour (the scheduled hour of departure), dep_delay (the departure delay in minutes; negative values indicate an early departure), and dep_time (the actual departure time in HHMM format). The full data set is too large to display within the report.
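Because dep_time is stored as a number in HHMM form (for example, 517 means 5:17am), it is not a count of minutes and cannot be averaged or subtracted directly. A minimal sketch of splitting such values into hours and minutes, using made-up example values rather than rows from the data set:

```r
# Hypothetical HHMM values for illustration; not taken from flights
dep_time <- c(517, 1345, 2359)

dep_hour   <- dep_time %/% 100   # integer division keeps the hour part
dep_minute <- dep_time %% 100    # remainder keeps the minute part

# Minutes since midnight, a form in which times can be averaged or subtracted
mins_since_midnight <- dep_hour * 60 + dep_minute
mins_since_midnight
```

This report only uses dep_time to detect cancelled flights (where it is missing), so the conversion is shown for context rather than used in the analysis.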

Throughout, the tidyverse package will be used to create visualizations with the data provided from the nycflights13 package.

library(tidyverse)
library(nycflights13)

Dates with the Most Cancellations or Delays over an Hour

To find the relationship between the average departure hour and the departure delay, we must first determine the days on which at least 35% of flights were cancelled or delayed by an hour or more. A flight is treated as cancelled when its dep_time is missing. The code chunk below finds these days.

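perc_c_d <- flights %>% 
  group_by(month, day) %>% 
  # share of flights per day that were cancelled (missing dep_time) or delayed 60+ minutes
  summarise(percentage = mean(is.na(dep_time) | (dep_delay >= 60)), .groups = "drop") %>% 
  filter(percentage >= 0.35)
perc_c_d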
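The summary above relies on two R behaviours: the mean of a logical vector is the proportion of TRUE values, and TRUE | NA evaluates to TRUE, so a cancelled flight with a missing dep_delay is still counted. A small sketch on toy values (hypothetical, not taken from flights) illustrates both:

```r
# Toy values for illustration; NA marks a cancelled flight
dep_time  <- c(600, NA, 930, 1015)
dep_delay <- c(-2, NA, 75, 10)

# TRUE | NA evaluates to TRUE, so the cancelled flight is flagged
flagged <- is.na(dep_time) | (dep_delay >= 60)
flagged

# The mean of a logical vector is the proportion of TRUE values
mean(flagged)
```

Here two of the four toy flights are flagged (one cancelled, one delayed 75 minutes), so the proportion is 0.5.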

Having found the dates with the highest cancellation or delay percentages, it is reasonable to ask why these dates stand out. One factor in the number of cancellations or delays on a given day could be the weather in New York. To test this hypothesis, historical weather records for each date were consulted, with the following results:

Data obtained from https://www.timeanddate.com/weather/usa/new-york/historic?month=9&year=2013

2/8 - Snow and Fog with over 10 mph winds

2/9 - Snow in the early morning with freezing temperatures throughout the day.

3/8 - Clear and sunny

5/23 - Rain, Fog and Cloudy

6/24 - Clear and sunny

6/28 - Partly Cloudy

7/1 - Overcast with light Rain and Fog

7/10 - Partly Cloudy

7/23 - Light Rain in the early morning, Overcast

9/2 - Fog, Rain and Partly Cloudy

9/12 - Fog in the afternoon

12/5 - Fog and overcast

From the weather conditions above, we can see that only some of the dates with high cancellation or delay rates coincided with poor weather. On the other dates, the cause may have been something else, such as late-arriving inbound aircraft or technical issues. The weather data therefore only partially supports the original hypothesis.

Having identified the dates with cancellation or delay percentages of at least 35%, we can now compute the same percentage by scheduled departure hour, using all flights in the data set.

perc_c_d_hour <- flights %>% 
  group_by(hour) %>% 
  summarise(percentage = mean((is.na(dep_time))|(dep_delay >= 60)))
perc_c_d_hour

The table above summarizes the data adequately; however, a visualization makes the pattern easier to see. Because both variables are numeric, with one value per hour, a scatter plot is a suitable choice.

ggplot(perc_c_d_hour) +
  geom_point(mapping = aes(x = hour, y = percentage)) +
  labs(title = "Share of Flights Cancelled or Delayed 60+ Minutes by Scheduled Hour",
       x = "Scheduled departure hour",
       y = "Proportion cancelled or delayed")

The plot above shows that the percentage of flights cancelled or delayed by an hour or more was highest at 2100 hours (9:00pm) and lowest at 0500 (5:00am). In general, the later in the day a flight is scheduled, the more likely it is to be cancelled or delayed. This reading excludes the outlier at 0100 (1:00am), which had a cancellation or delay percentage of 100%. Several factors could explain the outlier: very few flights may have been scheduled at that hour, so that a single cancellation dominates the percentage; there may have been a data entry error for this time; or weather conditions may have been repeatedly unsafe for departure. Taken at face value, the riskiest hour to fly is 0100, with a cancellation or delay percentage of 100%. If we ignore the outlier, the riskiest hour is 2100, with a cancellation or delay percentage of roughly 20%.
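One way to judge whether an extreme percentage comes from a very small group is to report the number of flights in each hour alongside the percentage. The sketch below does this on a made-up table (the name toy and all of its values are hypothetical); replacing toy with flights would apply the same check to the real data.

```r
library(dplyr)

# Toy stand-in for flights (hypothetical values); NA marks a cancelled flight
toy <- tibble(
  hour      = c(1, 5, 5, 5, 21, 21, 21, 21),
  dep_time  = c(NA, 455, 500, 510, 2130, NA, 2245, 2105),
  dep_delay = c(NA, -5, 0, 10, 30, NA, 105, 5)
)

by_hour <- toy %>%
  group_by(hour) %>%
  summarise(
    n_flights  = n(),                                        # group size
    percentage = mean(is.na(dep_time) | (dep_delay >= 60))   # same measure as above
  )
by_hour
```

In the toy table, hour 1 has a single flight, so its 100% rate reflects one cancellation rather than a pattern. Whether the same is true of the real data has to be read from the n_flights column after running the check on flights.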

On Time or Early Flights on Days with High Cancellation or Delay Rates

Even though at least 35% of flights were cancelled or delayed by an hour or more on the dates identified above, many flights on those dates still left early or on time. The data set below, on_ear, contains those flights.

# keep flights that left early or on time on the high cancellation/delay dates (month/day)
on_ear <- flights %>% 
  filter(dep_delay <= 0,
         paste(month, day, sep = "/") %in% c("2/8", "2/9", "3/8", "5/23", "6/24", "6/28",
                                             "7/1", "7/10", "7/23", "9/2", "9/12", "12/5"))
on_ear
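Listing the dates by hand works, but it has to be updated if the 35% threshold changes. An alternative, shown here as a sketch on made-up tables rather than as the method used in this report, is to keep only the flights whose month and day appear in the table of high-rate dates with dplyr's semi_join. The names toy_flights and toy_dates are hypothetical stand-ins for flights and perc_c_d.

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for flights and perc_c_d
toy_flights <- tibble(
  month     = c(2, 2, 2, 3, 3),
  day       = c(8, 8, 10, 8, 8),
  dep_delay = c(-3, 45, -1, 0, NA)
)
toy_dates <- tibble(month = c(2, 3), day = c(8, 8))

# Keep early/on-time flights whose (month, day) appears in toy_dates
toy_on_ear <- toy_flights %>%
  filter(dep_delay <= 0) %>%
  semi_join(toy_dates, by = c("month", "day"))
toy_on_ear
```

Two toy flights remain: the delayed flight and the cancelled flight are removed by filter, and the flight on 2/10 is removed by semi_join because that date is not in toy_dates.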

It is reasonable to suppose that the flights that left early or on time despite the weather departed in the morning, before conditions became unsafe to fly in. To check this, the data set above can be summarized to find the average scheduled departure hour on each of the days with a cancellation or delay rate of at least 35%.

avg_hour <- on_ear %>% 
  group_by(month, day) %>% 
  # hour is the scheduled departure hour, so this is the mean scheduled hour per day
  summarise(avg_dep_hour = mean(hour, na.rm = TRUE), .groups = "drop")
avg_hour

For most of the dates, the average scheduled departure hour of the flights that left early or on time fell in the morning, between 8:00am and 10:00am. February 9th is the exception, with an average of 16.96 hours, or around 5:00pm. According to the weather data for this date, there was snow in the early morning, with temperatures remaining at or below freezing. By around 5:00pm the sky had cleared and visibility had improved to 10 mi, from around 1-2 mi earlier in the day. In this case, the afternoon average appears to reflect the weather rather than a technical issue with the aircraft or a late-arriving previous flight.
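The step from 16.96 hours to "around 5:00pm" is a conversion from decimal hours to clock time: the fractional part is multiplied by 60 to give minutes. A minimal sketch of that arithmetic, using a helper function named decimal_to_clock that is introduced here for illustration only:

```r
# Convert a decimal hour (e.g. a mean of the hour column) to an "HH:MM" label
decimal_to_clock <- function(x) {
  total_min <- as.integer(round(x * 60))  # round once, in whole minutes
  sprintf("%02d:%02d", total_min %/% 60L, total_min %% 60L)
}

decimal_to_clock(16.96)  # "16:58"
```

Since hour records only the whole scheduled hour, the minutes in such a label come from averaging whole hours and should be read as approximate.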

Conclusion

From the data extracted and calculated, together with the external weather data used to interpret it, we can conclude that on days with a high rate of cancellations or delays of an hour or more, many flights were cancelled or delayed due to poor weather, yet some were still able to depart on time or even early. The data supports this claim: using variables calculated from the existing data, we showed that on days with a high cancellation or delay rate, the majority of flights that left on time or early were scheduled to depart in the morning.