Treemap of Departure Delays by Visibility and Wind Speed

Author

Latifah Traore

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
library(treemapify)
data(flights)
data(weather)
flight_weather <- flights %>%
  left_join(weather, by = c("origin","year", "month", "day", "hour"))
glimpse(weather)
Rows: 26,204
Columns: 15
$ origin     <chr> "JFK", "JFK", "JFK", "JFK", "JFK", "JFK", "JFK", "JFK", "JF…
$ year       <int> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023,…
$ month      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ hour       <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ temp       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ dewp       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ humid      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ wind_dir   <dbl> 0, 190, 190, 250, 170, 0, 250, 230, 260, 250, 240, 260, 260…
$ wind_speed <dbl> 0.00000, 4.60312, 5.75390, 5.75390, 8.05546, 0.00000, 9.206…
$ wind_gust  <dbl> 0.000000, 5.297178, 6.621473, 6.621473, 9.270062, 0.000000,…
$ precip     <dbl> NA, NA, NA, 0.02, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ pressure   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ visib      <dbl> 0.25, 2.50, 0.25, 4.00, 0.75, 0.75, 0.24, 0.50, 8.00, 5.00,…
$ time_hour  <dttm> 2023-01-01 03:00:00, 2023-01-01 04:00:00, 2023-01-01 05:00…
# Average departure delay for each observed visibility / wind speed pair
weather_summary <- flight_weather %>%
  filter(!is.na(dep_delay)) %>%   # cancelled flights have no departure delay
  group_by(visib, wind_speed) %>%
  # .groups = "drop" returns an ungrouped tibble and silences the regrouping message
  summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_dep_delay))
ggplot(weather_summary,
       aes(area = avg_dep_delay, fill = avg_dep_delay,
           label = paste("Visib:", visib, "\nWind:", wind_speed,
                         "\nDelay:", round(avg_dep_delay, 1)))) +
  geom_treemap() +
  geom_treemap_text(fontface = "bold", colour = "white", place = "centre", grow = TRUE) +
  scale_fill_gradient(low = "blue", high = "red", name = "Avg Delay (min)") +
  labs(title = "Treemap of Departure Delays by Visibility and Wind Speed",
       caption = "Source: flights and weather tables from the nycflights23 package") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"),
        legend.title = element_text(size = 12),
        legend.position = "right")

The treemap relates departure delays for flights leaving New York City airports in 2023 to two weather factors: visibility and wind speed. Each rectangle represents one observed combination of visibility and wind speed, and its area is proportional to the average departure delay (in minutes) for flights that departed under that combination. Larger rectangles therefore mean longer average delays, and the fill color repeats the same measure on a gradient from blue (short delays) to red (long delays). Because area and color both encode the average delay, the combinations with the worst delays stand out at a glance. Each average is computed over however many flights happened to match that combination, so a tile based on a handful of flights can look as prominent as one based on thousands.

The chart joins the flights and weather tables from the nycflights23 package by origin airport and hour. It shows which weather conditions coincide with longer delays, not that those conditions cause them. Even so, the pattern can be useful to airline operations staff and travelers, and it illustrates why weather monitoring matters for managing delays.
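
Because every distinct visibility and wind speed pair gets its own rectangle, the chart can end up with many small tiles, and some averages may rest on very few flights. The sketch below shows one possible variation, not the method used above: it bins both weather variables, records how many flights sit behind each average, and keeps only positive average delays, since a tile's area cannot meaningfully represent a negative value. The `toy` data, the bin edges, the bin labels, and the `n_flights` column are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined flight_weather table;
# the values are invented purely to illustrate the steps.
toy <- tibble(
  visib      = c(0.25, 0.5, 2, 5, 8, 10, 10, 10),
  wind_speed = c(20, 4, 12, 25, 3, 8, 16, NA),
  dep_delay  = c(60, 45, 20, 35, -5, -2, 10, 0)
)

binned_summary <- toy %>%
  # Drop rows that lack a delay or either weather reading
  filter(!is.na(dep_delay), !is.na(visib), !is.na(wind_speed)) %>%
  mutate(
    # Bin edges and labels are illustrative assumptions, not aviation standards
    visib_bin = cut(visib, breaks = c(0, 1, 5, 10),
                    labels = c("low", "medium", "high"),
                    include.lowest = TRUE),
    wind_bin = cut(wind_speed, breaks = c(0, 10, 20, Inf),
                   labels = c("calm", "breezy", "windy"),
                   include.lowest = TRUE)
  ) %>%
  group_by(visib_bin, wind_bin) %>%
  summarise(
    n_flights     = n(),             # how many flights support this average
    avg_dep_delay = mean(dep_delay),
    .groups = "drop"
  ) %>%
  # Keep only combinations whose flights were late on average
  filter(avg_dep_delay > 0)

print(binned_summary)
```

Applied to the real joined table, a summary like this could be passed to the same `ggplot()` call by labeling tiles with `visib_bin` and `wind_bin`. The `n_flights` column gives a quick check on how much data supports each tile.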