NYCflights

library(tidyverse)
library(nycflights23)
data(flights)
data(airlines)  # Load airline names
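# Drop extreme delays (300 minutes or more) so the bulk of the data stays readable.
# Note: filter() also drops rows where either delay is NA (e.g. cancelled flights).
flights_filtered <- flights %>%
  filter(dep_delay < 300, arr_delay < 300)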
# Scatterplot of departure delay vs. arrival delay
plot <- ggplot(flights_filtered, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.5, aes(color = dep_delay)) +  # Color by departure delay
  scale_color_gradientn(colors = c("pink", "orange", "yellow")) +  # Custom color scheme
  labs(
    title = "Relationship Between Departure and Arrival Delays (NYC, 2023)",
    x = "Departure Delay (minutes)",
    y = "Arrival Delay (minutes)",
    color = "Departure Delay (min)",
    caption = "Data Source: nycflights23 package"
  ) +
  theme_minimal()

# Print the plot
print(plot)

The scatterplot shows the relationship between departure and arrival delays for flights leaving New York City in 2023, after removing flights delayed by 300 minutes or more. Each point represents one flight, with departure delay (in minutes) on the x-axis and arrival delay (in minutes) on the y-axis. The color gradient runs from pink through orange to yellow as departure delay increases, which reinforces the horizontal position of each point. The plot's most important feature is its overall increasing trend: flights with longer departure delays typically arrive later as well. There is noticeable variation around that trend, since some flights with large departure delays make up time in the air and arrive with little delay.

The flights with the longest departure delays (yellow points) cluster in the upper-right area of the graph, which suggests that a late departure usually carries through to a late arrival. Transparency (alpha = 0.5) reduces overplotting and makes dense regions easier to see in a dataset this large. The visualization suggests that the operational factors behind departure delays also affect arrival delays, and it offers a useful view of on-time performance. To better understand the reasons behind these patterns, further analysis could look at seasonal variation in delays or at differences between individual airlines.
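
The visual trend described above could also be checked numerically. The following is a minimal sketch, not part of the original analysis: `delay_summary()` is a hypothetical helper introduced here for illustration, and the `toy` data frame contains made-up values (not taken from nycflights23) so that the example runs without the package. It computes the Pearson correlation and the slope of a simple linear fit of arrival delay on departure delay.

```r
# Hypothetical helper (an assumption, not from the original notebook):
# summarise how closely arrival delay tracks departure delay.
delay_summary <- function(df) {
  # Keep only rows where both delays are recorded
  ok <- complete.cases(df[, c("dep_delay", "arr_delay")])
  df <- df[ok, ]

  # Simple linear fit: arrival delay as a function of departure delay
  fit <- lm(arr_delay ~ dep_delay, data = df)

  list(
    n           = nrow(df),
    correlation = cor(df$dep_delay, df$arr_delay),
    slope       = unname(coef(fit)["dep_delay"]),
    intercept   = unname(coef(fit)["(Intercept)"])
  )
}

# Toy data for illustration only (made-up values, not real flights)
toy <- data.frame(
  dep_delay = c(-5, 0, 10, 30, 60, NA),
  arr_delay = c(-12, -4, 3, 25, 52, 40)
)

toy_result <- delay_summary(toy)
print(toy_result)
```

With the packages from this notebook loaded, the same helper could be applied to the real data with `delay_summary(flights_filtered)`. A slope near 1 would mean each extra minute of departure delay adds roughly one minute of arrival delay, while a slope below 1 would be consistent with flights making up some time in the air.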