NYC Flights HW

Author

Charlie Roth

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
data(flights)
flights_nona <- flights |>
  filter(!is.na(dep_delay))
ex_delay <- flights_nona |>
  filter(dep_delay > 120 & distance < 4000)
ggplot(ex_delay, aes(distance, dep_delay, color = origin)) +
  geom_point(aes(shape = origin)) +
  scale_color_manual(values = c("purple", "pink", "lightgreen")) +
  labs(title = "NYC Flight Distance and Extreme Departure Delays\nby Airport (2013)",
       x = "Flight Distance (In Miles)",
       y = "Departure Delays Over 2 Hours (In Minutes)",
       color = "Airports",
       shape = "Airports",
       caption = "Source: RITA, Bureau of Transportation Statistics (via nycflights13)")

The plot shows flight distance in miles on the x-axis and departure delay in minutes on the y-axis for flights that left NYC in 2013 more than 2 hours late. I excluded 12 outliers with a distance of 4,000 miles or more. The color and shape of each point identify which of the three airports the flight departed from: the purple circles are Newark Liberty International Airport (EWR), the pink triangles are John F. Kennedy International Airport (JFK), and the green squares are LaGuardia Airport (LGA). I made this graph to answer the question of whether longer flights tend to have longer departure delays. Among these extremely delayed flights, I did not observe a correlation between the two variables: long delays appear at every distance, with no visible upward or downward trend.
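
One way to put numbers behind the visual impression is sketched below. It repeats the filtering steps from the plot code, counts how many delayed flights the distance cutoff removes, and computes the correlation between distance and departure delay overall and per airport. The object names (`delayed`, `n_excluded`, `r_pearson`, `r_spearman`, `by_origin`) are illustrative and not part of the original analysis, and using Spearman's rank correlation alongside Pearson's is an assumption on my part, since delays over 2 hours are strongly right-skewed.

```r
library(dplyr)
library(nycflights13)

# Flights with a recorded departure delay of more than 2 hours
delayed <- flights |>
  filter(!is.na(dep_delay), dep_delay > 120)

# How many of those flights the distance cutoff removes
n_excluded <- delayed |>
  filter(distance >= 4000) |>
  nrow()

# The data set used in the plot
ex_delay <- delayed |>
  filter(distance < 4000)

# Overall correlation between distance and departure delay
r_pearson  <- cor(ex_delay$distance, ex_delay$dep_delay)
r_spearman <- cor(ex_delay$distance, ex_delay$dep_delay, method = "spearman")

# Same check within each origin airport
by_origin <- ex_delay |>
  group_by(origin) |>
  summarise(n = n(), r = cor(distance, dep_delay))

print(n_excluded)
print(c(pearson = r_pearson, spearman = r_spearman))
print(by_origin)
```

Coefficients close to zero would be consistent with the plot. A value clearly away from zero for one airport would suggest the relationship is worth a second look for that airport.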