This report will answer the following question:
How does the Departure Delay affect the Average Departure Hour of flights departing NYC in 2013?
We will be using the flights data set from the
nycflights13 package. The data set contains a total
of 336776 observations of 19 variables; the variables relevant to this
report are month (the month of the date of departure),
day (the day of the date of departure),
hour (the scheduled hour of departure),
dep_delay (the departure delay in minutes), and
dep_time (the actual departure time in HHMM format). The full
data set is not shown in the report because it has too many
observations to display.
Throughout, the tidyverse package will be used to summarise and
visualize the data provided by the nycflights13
package.
library(tidyverse)
library(nycflights13)
To find the relationship between the average departure hour and the departure delay, we must first determine the days on which at least 35% of flights were cancelled or delayed by an hour or more. The code chunk below identifies these days.
perc_c_d <- flights %>%
  group_by(month, day) %>%
  # cancelled flights have a missing dep_time; TRUE | NA evaluates to TRUE
  summarise(percentage = mean(is.na(dep_time) | (dep_delay >= 60))) %>%
  filter(percentage >= 0.35)
perc_c_d
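The summarise() call above depends on how R combines logical values with NA: a cancelled flight has a missing dep_time and a missing dep_delay, and TRUE | NA evaluates to TRUE, so the flight is still counted. The sketch below illustrates this on a few made-up values; the toy vectors are assumptions for illustration only and are not taken from the flights data.

```r
# Hypothetical values standing in for dep_time and dep_delay.
# The second entry mimics a cancelled flight (both values missing).
dep_time  <- c(517, NA, 2230, 900)
dep_delay <- c(2, NA, 75, -3)

# Same condition as in the report: cancelled OR delayed by 60+ minutes.
flagged <- is.na(dep_time) | (dep_delay >= 60)
flagged

# The mean of a logical vector is the share of TRUE values.
mean(flagged)
```

Because no NA survives in the combined condition for these values, mean() needs no na.rm argument here.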
Having found the dates with higher cancellation or delay percentages, it is reasonable to ask why these dates had a larger percentage than other days. One factor in the number of cancellations or delays on each day could be the weather in New York. To test this hypothesis, external research was conducted, with the following results:
Data obtained from https://www.timeanddate.com/weather/usa/new-york/historic?month=9&year=2013
2/8 - Snow and Fog with over 10 mph winds
2/9 - Snow in the early morning with freezing temperatures throughout the day.
3/8 - Clear and sunny
5/23 - Rain, Fog and Cloudy
6/24 - Clear and sunny
6/28 - Partly Cloudy
7/1 - Overcast with light Rain and Fog
7/10 - Partly Cloudy
7/23 - Light Rain in the early morning, Overcast
9/2 - Fog, Rain and Partly Cloudy
9/12 - Fog in the afternoon
12/5 - Fog and overcast
From the weather conditions above, we can see that only some of the dates with high cancellation or delay percentages had poor weather. On the other dates, the cancellations or delays may have been caused by aircraft arriving late in NYC or by technical issues with the aircraft. This data only partially supports the original hypothesis.
Having identified the dates with cancellation or delay percentages of at least 35%, we can now find the cancellation or delay percentage for each scheduled departure hour, using all flights in the data set.
perc_c_d_hour <- flights %>%
  group_by(hour) %>%
  summarise(percentage = mean(is.na(dep_time) | (dep_delay >= 60)))
perc_c_d_hour
The data set above presents the extracted data adequately; however, a visualization makes it easier to analyze. Because both variables are numeric, a scatter plot of percentage against hour is a suitable choice.
ggplot(perc_c_d_hour) +
  geom_point(mapping = aes(x = hour, y = percentage)) +
  labs(title = "Percentage vs. Hour",
       x = "Hour",
       y = "Percentage")
The plot above shows that the percentage of flights cancelled or delayed by an hour or more is highest for flights scheduled at 2100 hours (9:00pm) and lowest at 0500 (5:00am). In general, the later in the day a flight is scheduled, the more likely it is to be cancelled or delayed. This reading excludes the outlier at 0100 (1:00am), which has a cancellation or delay percentage of 100%. Several factors could explain it: there may have been a data entry error for this hour, or the weather conditions may have been repeatedly unsafe for departure. Based on the data presented, the riskiest time to fly is 0100 hours, since its cancellation or delay percentage is 100%. If we ignore the outlier, the riskiest time to fly is 2100 hours, with a cancellation or delay percentage of roughly 20%.
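One way to judge how much weight the 0100 outlier deserves is to check how many flights each hourly percentage is based on. The sketch below repeats the hourly summary with an added flight count; the names hour_counts and n_flights are introduced here for illustration and do not appear in the original analysis.

```r
library(tidyverse)
library(nycflights13)

# Hourly cancellation-or-delay percentage, plus the number of scheduled
# flights behind each percentage (n_flights is a name chosen for this sketch).
hour_counts <- flights %>%
  group_by(hour) %>%
  summarise(
    n_flights  = n(),
    percentage = mean(is.na(dep_time) | (dep_delay >= 60))
  ) %>%
  arrange(n_flights)  # hours with the fewest flights appear first

hour_counts
```

If an hour turns out to contain only a handful of flights, its percentage says little about how risky that hour is in general.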
Even though at least 35% of flights were cancelled or delayed by
an hour or more on the dates given in the first data set,
many flights were still able to leave early or on time. The data set
below, on_ear, presents the flights that left either
early or on time on those dates.
on_ear <- flights %>%
  filter((dep_delay <= 0) &
    ((month == 2 & day == 8) | (month == 2 & day == 9) | (month == 3 & day == 8) |
     (month == 5 & day == 23) | (month == 6 & day == 24) | (month == 6 & day == 28) |
     (month == 7 & day == 1) | (month == 7 & day == 10) | (month == 7 & day == 23) |
     (month == 9 & day == 2) | (month == 9 & day == 12) | (month == 12 & day == 5)))
on_ear
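Typing each date by hand is error-prone, so the same selection could also be derived from the day-level summary with a filtering join. The sketch below is one possible alternative, not the method used in this report; high_days and on_ear_alt are names chosen for illustration, and the result matches on_ear only if the summary yields exactly the twelve dates listed earlier.

```r
library(tidyverse)
library(nycflights13)

# Days on which at least 35% of flights were cancelled or delayed 60+ minutes
# (recomputed here so the sketch is self-contained).
high_days <- flights %>%
  group_by(month, day) %>%
  summarise(percentage = mean(is.na(dep_time) | (dep_delay >= 60)),
            .groups = "drop") %>%
  filter(percentage >= 0.35)

# Keep early or on-time flights, then keep only rows whose (month, day)
# appears in high_days. semi_join() filters rows without adding columns.
on_ear_alt <- flights %>%
  filter(dep_delay <= 0) %>%
  semi_join(high_days, by = c("month", "day"))

on_ear_alt
```

With this approach, changing the 35% threshold updates the selected dates automatically.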
It can reasonably be assumed that the flights that left early or on time despite the weather did so in the morning, before conditions became unsafe to fly in. The data set above can be used to find the average scheduled departure hour on the days with a cancellation or delay rate of at least 35%.
avg_hour <- on_ear %>%
  group_by(month, day) %>%
  summarize("average departure hour" = mean(hour, na.rm = TRUE))
avg_hour
Most of the average departure hours for the flights that left early or on time fall in the morning, between 8:00am and 10:00am. On February 9th, however, the average departure hour was 16.96, or around 5:00pm. Returning to the weather data for this date, there was snow in the early morning, with temperatures remaining at or below freezing. By around 5:00pm the sky had cleared and visibility had risen to 10 mi, compared with around 1-2 mi earlier in the day. In this case, the afternoon average departure hour appears to have been influenced by the weather rather than by a technical issue with the aircraft or the late arrival of a previous flight.
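As a small worked example, a decimal average such as 16.96 can be converted to a clock time by turning the fractional part into minutes. The helper name to_clock below is hypothetical. Note that hour holds only the whole scheduled hour, so this average ignores the scheduled minutes and the result should be read as approximate.

```r
# Convert a decimal hour (e.g. 16.96) to an "HH:MM" label.
# to_clock is a helper name invented for this sketch.
to_clock <- function(decimal_hour) {
  hrs  <- floor(decimal_hour)
  mins <- round((decimal_hour - hrs) * 60)
  sprintf("%02d:%02d", as.integer(hrs), as.integer(mins))
}

label <- to_clock(16.96)
label  # "16:58", i.e. just before 5:00pm
```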
From the data extracted and calculated, together with the external weather data used to interpret it, we can conclude that on days with a high rate of cancellations or delays of an hour or more, many flights were cancelled or delayed because of poor weather, yet some were still able to depart on time or even early. The data supports this claim: by extracting and calculating new variables from the existing data, we showed that on days with a higher cancellation or delay rate, the majority of flights that left on time or early did so in the morning.