Heatmaps, Treemaps, and Alluvials

Author

Hana Rose

Load Libraries

library(ggalluvial)
Loading required package: ggplot2
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)

Load Data

flights <- nycflights13::flights
weather <- nycflights13::weather

Merge Data

flights_weather <- flights %>%
  left_join(weather, by = c("year", "month", "day", "hour", "origin"))
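
A left join repeats a flight row whenever its key matches more than one weather row, so it can be worth confirming that the join keys identify weather observations uniquely. The following is a minimal sketch of such a check and is not part of the original analysis; the names dup_keys and joined are introduced here for illustration. The later steps work from flights directly, so this only matters if flights_weather is reused.

```r
library(dplyr)
library(nycflights13)

# Count weather rows per join key. Any key with n > 1 would duplicate
# every flight that matches it in the left join.
dup_keys <- weather %>%
  count(year, month, day, hour, origin) %>%
  filter(n > 1)

dup_keys

# Compare row counts before and after the join to see whether
# any flight rows were duplicated.
joined <- flights %>%
  left_join(weather, by = c("year", "month", "day", "hour", "origin"))

c(flights = nrow(flights), joined = nrow(joined))
```

If dup_keys is not empty, joining on c("origin", "time_hour") instead is one option to consider, since both tables carry a time_hour column.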

Remove NAs

flights_filtered <- flights %>%
  select(month, origin, dest) %>%
  drop_na()

Aggregate Data and Group by Month, Origin, and Destination

flights_agg <- flights_filtered %>%
  group_by(month, origin, dest) %>%
  summarise(flight_count = n(), .groups = "drop")
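
The grouping and counting step above can also be written with dplyr's count() shorthand. The sketch below rebuilds both versions from the raw flights table and compares them; by_summarise and by_count are illustrative names, not objects from the original analysis.

```r
library(dplyr)
library(tidyr)
library(nycflights13)

# Shared starting point: the three columns of interest, with NAs removed.
base <- flights %>%
  select(month, origin, dest) %>%
  drop_na()

# Explicit version: group, then count the rows in each group.
by_summarise <- base %>%
  group_by(month, origin, dest) %>%
  summarise(flight_count = n(), .groups = "drop")

# Shorthand version: count() groups, tallies, and ungroups in one call.
by_count <- base %>%
  count(month, origin, dest, name = "flight_count")

# Both tables should have the same rows and the same counts.
nrow(by_summarise) == nrow(by_count)
all(by_summarise$flight_count == by_count$flight_count)
```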

Extract Top 10 Origin-Destination Pairs by Flight Count

flights_agg_top10 <- flights_agg %>%
  group_by(origin, dest) %>%
  summarise(total_flights = sum(flight_count), .groups = "drop") %>%
  # top_n() is superseded; slice_max() also keeps ties by default
  slice_max(total_flights, n = 10)

Filter for Top 10 Origin-Destination Pairs

flights_agg_filtered <- flights_agg %>%
  # Keep only rows whose origin-destination pair is among the top 10
  semi_join(flights_agg_top10, by = c("origin", "dest"))
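
Before plotting, it may help to confirm that every selected pair has a row for each month, since a pair with missing months would leave a gap in its alluvium. This is a rough sketch under the assumption that the steps above are reproduced as shown; monthly, top_pairs, and months_per_pair are illustrative names.

```r
library(dplyr)
library(tidyr)
library(nycflights13)

# Monthly flight counts per origin-destination pair.
monthly <- flights %>%
  drop_na(month, origin, dest) %>%
  count(month, origin, dest, name = "flight_count")

# The busiest pairs over the whole year (ties are kept).
top_pairs <- monthly %>%
  group_by(origin, dest) %>%
  summarise(total_flights = sum(flight_count), .groups = "drop") %>%
  slice_max(total_flights, n = 10)

# Number of distinct months in which each top pair has at least one flight.
months_per_pair <- monthly %>%
  semi_join(top_pairs, by = c("origin", "dest")) %>%
  count(origin, dest, name = "n_months")

months_per_pair

# Pairs listed here, if any, are missing at least one month.
filter(months_per_pair, n_months < 12)
```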

Plot

ggplot(flights_agg_filtered, aes(x = factor(month), y = flight_count, alluvium = interaction(origin, dest))) + 
  theme_bw() +
  geom_alluvium(aes(fill = interaction(origin, dest)), 
                color = "white",  
                width = 0.2,  
                alpha = 1,  
                decreasing = FALSE) +  
  scale_fill_brewer(palette = "Set3") +  
  scale_x_discrete(labels = c("1" = "January", "2" = "February", "3" = "March", "4" = "April", 
                              "5" = "May", "6" = "June", "7" = "July", "8" = "August", 
                              "9" = "September", "10" = "October", "11" = "November", "12" = "December")) +
  labs(title = "Top 10 Origin-Destination Pairs by Month",
       x = "Month",
       y = "Number of Flights",
       fill = "Origin-Destination",
       caption = "Source: nycflights13") +
  theme(legend.position = "bottom", 
        axis.text.x = element_text(angle = 45, hjust = 1))

Essay

This alluvial diagram shows how the number of flights for the top origin-destination pairs in the nycflights13 dataset changes from month to month. The x-axis shows months and the y-axis shows the number of flights, and each colored alluvium flowing across the plot corresponds to one origin-destination pair. The horizontal width of each alluvium at the month axes is fixed (width = 0.2), since the set of pairs does not change, while its thickness along the y-axis reflects that pair's flight count for the month, and its vertical position shows how the popularity of the route shifts relative to the others throughout the year.

It’s interesting to see how the flow of people from one place to another fluctuates seasonally, and some of the results were unexpected to me. For example, I assumed the number of flights from New York to Miami would rise in the warmer months because of summer vacations, but it actually increases as it gets colder, peaking in December. A likely explanation is that New York is far north and many New Yorkers seek a warm getaway in the frigid months. Travel from New Jersey to Chicago also peaks in August and declines sharply as fall begins, which fits the same pattern because Chicago is another cold city. This suggests that travel is heavily influenced by people wanting to move toward more comfortable temperatures and away from less comfortable ones.
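
The seasonal peaks described above can be checked against the data rather than read off the plot. The following is a minimal sketch that lists the busiest month for each of the top pairs; monthly, top_pairs, and peak_months are illustrative names, and the result is left for the reader to inspect.

```r
library(dplyr)
library(tidyr)
library(nycflights13)

# Monthly flight counts per origin-destination pair.
monthly <- flights %>%
  drop_na(month, origin, dest) %>%
  count(month, origin, dest, name = "flight_count")

# The busiest pairs over the whole year (ties are kept).
top_pairs <- monthly %>%
  group_by(origin, dest) %>%
  summarise(total_flights = sum(flight_count), .groups = "drop") %>%
  slice_max(total_flights, n = 10)

# For each top pair, keep the single month with the most flights.
peak_months <- monthly %>%
  semi_join(top_pairs, by = c("origin", "dest")) %>%
  group_by(origin, dest) %>%
  slice_max(flight_count, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(peak_month = month.name[month]) %>%
  select(origin, dest, peak_month, flight_count)

peak_months
```

Because months differ in length, dividing flight_count by the number of days in each month would give a fairer comparison of daily traffic.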