library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
data(flights)

Filter the data to show only flights to the major DC, Maryland, and Virginia airports (IAD, BWI, and DCA) in the summer months (June, July, and August).

# month is stored as an integer, so compare against numbers rather than strings
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA")) %>%
  filter(month %in% 6:8)
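As a quick sanity check on the filter, the sketch below counts how many flights land in each airport and month. The table name `flights_per_airport` and the column name `n_flights` are my own choices for illustration, and the filter is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(nycflights13)

# Rebuild the filtered data so this block is self-contained.
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA"), month %in% 6:8)

# Hypothetical helper table: number of flights per airport and month.
flights_per_airport <- DMV_flights %>%
  count(dest, month, name = "n_flights")

flights_per_airport
```

If the filter worked, only IAD, BWI, and DCA should appear in `dest`, and only 6, 7, and 8 in `month`.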

Graph 1: Bar graph showing arrival delays for flights from NY to DC, Maryland, and Virginia in the summer months

Graph1 <- DMV_flights %>%
  ggplot() +
  geom_bar(aes(x=month, y=arr_delay, fill = dest),
      position = "dodge", stat = "identity",  alpha = .2) +
  ggtitle("Arrival delays for flights from NY to DC, Maryland, Virginia") +
  xlab("Month") + 
  ylab("Arrival Delays") + 
  labs(fill = "Airport")
Graph1
## Warning: Removed 296 rows containing missing values (`geom_bar()`).
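The warning means some flights have no recorded arrival delay, so ggplot2 dropped them. A minimal sketch for seeing where those missing values sit is below. I am assuming that a missing `arr_delay` is what triggers the warning, in which case the counts should add up to the number of removed rows. The name `missing_arrivals` is just for illustration.

```r
library(dplyr)
library(nycflights13)

# Rebuild the filtered data so this block is self-contained.
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA"), month %in% 6:8)

# Count flights with no recorded arrival delay, per destination airport.
missing_arrivals <- DMV_flights %>%
  group_by(dest) %>%
  summarise(n_missing = sum(is.na(arr_delay)), .groups = "drop")

missing_arrivals
```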

Graph 2: Bar graph showing arrival delays split up by day

Graph2 <- DMV_flights %>%
  ggplot() +
  geom_bar(aes(x=month, y=arr_delay, fill = dest),
      position = "dodge2", stat = "identity",  alpha = 1) +
  ggtitle("Arrival delays for flights from NY to DC, Maryland, Virginia") +
  xlab("Month") + 
  ylab("Arrival Delays") + 
  labs(fill = "Airport")
Graph2
## Warning: Removed 296 rows containing missing values (`geom_bar()`).
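Both bar graphs draw one bar per flight, which makes the typical delay hard to read. One alternative, sketched here and not part of the original analysis, is to average the arrival delay per airport and month first and then plot one bar per group. The names `mean_delays` and `mean_delay_plot` are hypothetical, and the mean is my choice of summary; a median would be less sensitive to a few very long delays.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Rebuild the filtered data so this block is self-contained.
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA"), month %in% 6:8)

# Average arrival delay (in minutes) per airport and month, ignoring NAs.
mean_delays <- DMV_flights %>%
  group_by(dest, month) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE), .groups = "drop")

# One bar per airport within each month; factor() makes month a discrete axis.
mean_delay_plot <- ggplot(mean_delays,
                          aes(x = factor(month), y = mean_arr_delay, fill = dest)) +
  geom_col(position = "dodge") +
  labs(title = "Mean arrival delay for flights from NY to DC, Maryland, Virginia",
       x = "Month", y = "Mean arrival delay (minutes)", fill = "Airport")

mean_delay_plot
```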

Graph 3: Scatter plot of departure delays by day of the month, faceted by airport

# flights already has one row per flight, so it can be faceted by
# destination without reshaping.
Graph3 <- DMV_flights %>%
  ggplot(aes(x = day, y = dep_delay, color = dest)) +
  geom_point() +
  ggtitle("Departure Delays for Flights from NY to DC, Maryland, Virginia") +
  xlab("Day") + 
  ylab("Departure Delay") + 
  facet_wrap(~dest)
Graph3
## Warning: Removed 278 rows containing missing values (`geom_point()`).

Essay

One of the many perks of living in the DMV metro area is having access to three different airports. I like to travel and do so a lot, so I wanted to find out about the delays of flights going to the major airports in DC, Maryland, and Virginia, especially in the summer months, when many other people travel too. The results are not surprising. Dulles Airport (IAD), despite being the biggest, has the highest and most frequent delays. This is “understandable” to a certain extent because it is a major hub for both domestic and international flights. Reagan Airport (DCA) is next on the list, with more delays than BWI. DCA is a pretty small airport, so one delay can have a domino effect on other flights. Lastly, Thurgood Marshall Airport (BWI) shows the fewest delays; however, it also has the fewest flights coming in from NY. On the bright side, a decent number of flights arrive early as well. That does not make up for the delays, but I’m sure the passengers on those flights were happy!
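The comparisons above come from reading the graphs. A small summary table could put numbers on them. The sketch below is one way to do that. The column names are my own, and counting a flight as "late" when `arr_delay > 0` and "early" when `arr_delay < 0` is an assumption about how to define those terms.

```r
library(dplyr)
library(nycflights13)

# Rebuild the filtered data so this block is self-contained.
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA"), month %in% 6:8)

# Per-airport summary: flight count, mean delay, and share late or early.
airport_summary <- DMV_flights %>%
  group_by(dest) %>%
  summarise(
    n_flights      = n(),
    mean_arr_delay = mean(arr_delay, na.rm = TRUE),  # minutes
    share_late     = mean(arr_delay > 0, na.rm = TRUE),
    share_early    = mean(arr_delay < 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_arr_delay))

airport_summary
```

Sorting by `mean_arr_delay` puts the airport with the longest average delay first, and `n_flights` shows how many flights each figure is based on.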

Drawing a conclusion from these graphs, if I were to travel to and from NY during the summer months, I would consider:

  1. Flying out of and into BWI

  2. Traveling in June

  3. Or sticking with taking the train!
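The June recommendation could be checked the same way, by summarising arrival delays per month. This is only a sketch: `month_summary` is a hypothetical name, and using the median alongside the mean is my choice so that a handful of extreme delays does not dominate the comparison.

```r
library(dplyr)
library(nycflights13)

# Rebuild the filtered data so this block is self-contained.
DMV_flights <- flights %>%
  filter(dest %in% c("IAD", "BWI", "DCA"), month %in% 6:8)

# Per-month summary of arrival delays (in minutes), ignoring missing values.
month_summary <- DMV_flights %>%
  group_by(month) %>%
  summarise(
    n_flights        = n(),
    median_arr_delay = median(arr_delay, na.rm = TRUE),
    mean_arr_delay   = mean(arr_delay, na.rm = TRUE),
    .groups = "drop"
  )

month_summary
```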

However, this data set is from 2013, which was 10 years ago. I would be curious to see how things are now. I bet (hope) they are much better!