FlightsHW

Load in the necessary packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)

Load in the dataset “flights” from the nycflights13 library

data(flights)

Here we make three data frames, one for flights to each of our three local airports: National (DCA), Dulles (IAD), and Baltimore (BWI)

dca_flights <- flights %>%
  filter(dest == "DCA")

iad_flights <- flights %>%
  filter(dest == "IAD")

bwi_flights <- flights %>%
  filter(dest == "BWI")

Replace the airport codes with names people recognize

dca_flights$dest[dca_flights$dest == "DCA"] <- "National"
dca_flights$origin[dca_flights$origin == "EWR"] <- "Newark"
dca_flights$origin[dca_flights$origin == "JFK"] <- "Kennedy"
dca_flights$origin[dca_flights$origin == "LGA"] <- "LaGuardia"

iad_flights$dest[iad_flights$dest == "IAD"] <- "Dulles"
iad_flights$origin[iad_flights$origin == "EWR"] <- "Newark"
iad_flights$origin[iad_flights$origin == "JFK"] <- "Kennedy"
iad_flights$origin[iad_flights$origin == "LGA"] <- "LaGuardia"

bwi_flights$dest[bwi_flights$dest == "BWI"] <- "Baltimore"
bwi_flights$origin[bwi_flights$origin == "EWR"] <- "Newark"
bwi_flights$origin[bwi_flights$origin == "JFK"] <- "Kennedy"
bwi_flights$origin[bwi_flights$origin == "LGA"] <- "LaGuardia"
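
The twelve assignments above repeat the same code-to-name mapping for each data frame. As a rough sketch of an alternative, the mapping could live in one named lookup vector that is applied to any data frame. The names `airport_names`, `recode_codes`, `relabel_airports`, and `toy_flights` are made up for this illustration, and `toy_flights` is a tiny stand-in rather than real data.

```r
# Hypothetical lookup table: airport code -> familiar name
airport_names <- c(
  EWR = "Newark", JFK = "Kennedy", LGA = "LaGuardia",
  DCA = "National", IAD = "Dulles", BWI = "Baltimore"
)

# Translate a character vector of codes; codes missing from the
# lookup table are left unchanged instead of becoming NA
recode_codes <- function(codes) {
  out <- unname(airport_names[codes])
  ifelse(is.na(out), codes, out)
}

# Relabel both the origin and dest columns of a data frame
relabel_airports <- function(df) {
  df$origin <- recode_codes(df$origin)
  df$dest <- recode_codes(df$dest)
  df
}

# Toy stand-in for a few rows of flights (illustration only)
toy_flights <- data.frame(
  origin = c("EWR", "JFK", "LGA"),
  dest = c("DCA", "IAD", "BWI"),
  stringsAsFactors = FALSE
)

relabeled <- relabel_airports(toy_flights)
relabeled
```

Under these assumptions, a call such as `relabel_airports(dca_flights)` should give the same labels as the four DCA assignments above, since `origin` and `dest` are character columns in `flights`.
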
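
Compute the average departure delay (in minutes) for each destination, ignoring flights with no recorded departure delay

average_delays <- c(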
  mean(dca_flights$dep_delay, na.rm = TRUE),
  mean(iad_flights$dep_delay, na.rm = TRUE),
  mean(bwi_flights$dep_delay, na.rm = TRUE)
)
average_delays
[1] 10.29300 16.98293 16.39682

Put the averages in a data frame and plot them as a bar chart

avg_data <- data.frame(
  x_labels = c("National", "Dulles", "Baltimore"),
  Average = average_delays
)
ggplot(avg_data, aes(x = x_labels, y = Average, fill = x_labels)) +
  geom_col() +
  labs(
    fill = "DMV Airports",
    title = "Average Delays from NYC Area Airports to DMV Area Airports",
    x = "Destination",
    y = "Average Time (minutes)"
  ) +
  theme_minimal()

This graph shows the average departure delay of all flights from the three New York City area airports to each of the three airports in our area. Note that I refuse to call National Airport by its official name for political reasons. The most interesting part of the graph to me is that flights to National have a much shorter average delay (about 10 minutes) than flights to the other two airports (about 16 to 17 minutes), which surprises me because National is so close to the other two, especially Dulles.
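
The three averages in the graph could also be computed in one grouped summary instead of three separate data frames. This is only a sketch: `average_delay_by_dest`, `toy`, and the column names `avg_dep_delay` and `n_flights` are made up for illustration, and `toy` is a small stand-in with invented delays, not real flight data. The flight count is included because an average is easier to judge when you know how many flights it covers.

```r
library(dplyr)

# Hypothetical helper: average departure delay and number of flights
# for each destination in `dests`
average_delay_by_dest <- function(df, dests) {
  df %>%
    filter(dest %in% dests) %>%
    group_by(dest) %>%
    summarise(
      avg_dep_delay = mean(dep_delay, na.rm = TRUE), # skip missing delays
      n_flights = n(),                               # counts every row, including missing delays
      .groups = "drop"
    )
}

# Toy stand-in for flights (illustration only, values are invented)
toy <- data.frame(
  dest = c("DCA", "DCA", "IAD", "IAD", "BWI", "ORD"),
  dep_delay = c(10, 20, NA, 8, 5, 30),
  stringsAsFactors = FALSE
)

toy_summary <- average_delay_by_dest(toy, c("DCA", "IAD", "BWI"))
toy_summary
```

With nycflights13 loaded, `average_delay_by_dest(flights, c("DCA", "IAD", "BWI"))` should give the same three averages as `average_delays` above, with one row per destination code.
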