library(tidyverse)
library(nycflights23)
library(alluvial)
library(ggalluvial)
data("flights")
NYC Flights HW
Alluvial of Top Five Flight Destinations over 2023
First, find the number of flights to each destination and view the top five.
dest_count <- flights |>
  group_by(dest) |>
  count() |>
  arrange(desc(n)) |>
  head(5)
dest_count
# A tibble: 5 × 2
# Groups: dest [5]
dest n
<chr> <int>
1 BOS 19036
2 ORD 18200
3 MCO 17756
4 ATL 17570
5 MIA 16076
Filter for only the flights going to those five destinations, then count flights by destination per month.
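topdest <- flights |>
  # %in% tests each row's dest against all five codes at once
  filter(dest %in% c("BOS", "ORD", "MCO", "ATL", "MIA")) |>
  group_by(dest, month) |>
  count()

Note: this filter needs %in% rather than ==. The comparison dest == c("BOS", "ORD", "MCO", "ATL", "MIA") recycles the five codes down the rows, so it raises the warning "longer object length is not a multiple of shorter object length" and keeps only the flights whose destination happens to line up with the recycled code, roughly one in five. The monthly counts printed below were produced with that comparison and are correspondingly too low.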
head(topdest)
# A tibble: 6 × 3
# Groups: dest, month [6]
dest month n
<chr> <int> <int>
1 ATL 1 305
2 ATL 2 321
3 ATL 3 325
4 ATL 4 279
5 ATL 5 322
6 ATL 6 290
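Why == and %in% behave so differently when matching a column against several codes can be seen on a small vector. The destination values below are made up for illustration and are not taken from nycflights23; the five target codes are the ones used in the filter.

```r
# Hypothetical destination column (illustration only)
x <- c("BOS", "ATL", "BOS", "MIA", "ORD", "MCO", "ATL")
targets <- c("BOS", "ORD", "MCO", "ATL", "MIA")

# == compares element by element and recycles `targets` along `x`:
# x[1] vs "BOS", x[2] vs "ORD", x[3] vs "MCO", ..., x[6] vs "BOS", x[7] vs "ORD".
# Lengths 7 and 5 also trigger the "longer object length" warning, suppressed here.
pairwise <- suppressWarnings(x == targets)
pairwise    # TRUE FALSE FALSE FALSE FALSE FALSE FALSE

# %in% asks whether each element of `x` appears anywhere in `targets`
membership <- x %in% targets
membership  # all seven TRUE
```

Every value in x is one of the five codes, yet == keeps only the first, because only that position lines up with the matching code after recycling.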
Change the destination codes to the actual names. I did not expect MCO to be Orlando.
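One compact alternative to a chain of per-code assignments is a single named lookup vector indexed by the code column. This is a minimal sketch on a hypothetical three-row data frame standing in for topdest; the names airport_names and toy are illustrative only.

```r
# Hypothetical stand-in for `topdest` (only the code column matters here);
# stringsAsFactors = FALSE keeps codes as strings on R versions before 4.0
toy <- data.frame(dest = c("MCO", "ATL", "ORD"), stringsAsFactors = FALSE)

# One named vector maps each airport code to its display name
airport_names <- c(
  ATL = "Hartsfield–Jackson Atlanta International",
  BOS = "Boston Logan International",
  MCO = "Orlando International",
  MIA = "Miami International",
  ORD = "Chicago O'Hare International"
)

# Indexing by the codes looks up every row at once;
# unname() drops the codes that indexing carries along as names
toy$dest <- unname(airport_names[toy$dest])
toy$dest
```

A code missing from the lookup comes back as NA, which makes any gap in the mapping easy to spot.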
topdest$dest[topdest$dest == 'ATL'] <- 'Hartsfield–Jackson Atlanta International'
topdest$dest[topdest$dest == 'BOS'] <- 'Boston Logan International'
topdest$dest[topdest$dest == 'MCO'] <- 'Orlando International'
topdest$dest[topdest$dest == 'MIA'] <- 'Miami International'
topdest$dest[topdest$dest == 'ORD'] <- "Chicago O'Hare International"

The Alluvium:
destalluv <- topdest |>
  ggplot(aes(x = month, y = n, alluvium = dest)) +
  theme_bw() +
  geom_alluvium(aes(fill = dest),
                color = 'black',
                width = .1,
                alpha = .7,
                decreasing = FALSE) +
  scale_fill_brewer(palette = "Set2") +
  scale_x_continuous(breaks = seq(1, 12, by = 1)) +
  labs(title = 'Number of Flights From NYC Airports, 2023',
       y = 'Flights',
       x = 'Month',
       fill = 'Destination',
       caption = 'Source: FAA')
destalluv
Comments
I created an alluvial plot that shows the top five flight destinations in 2023 from New York City's two airports, JFK and LaGuardia, plus Newark. I chose the top five because, although the human eye can distinguish up to ten colors, the alluvial already feels visually ‘busy’ at five. Atlanta is the only airport on the list that surprised me. The Florida trends make sense: Miami is busiest in the depths of winter and around spring break, while Orlando, near Disney, has steadier traffic, with spikes at those same times and at the beginning and end of summer vacation.

I tried converting the months to character strings, but when I used them as the x axis, the plot gained a 13th tick mark labeled ‘NA’. I could not work out where that value came from, so I reverted to numeric months. Setting breaks rather than limits in scale_x_continuous() worked better for getting whole-number ticks instead of half-values, but the twelve breaks also left the default graph rather narrow, so I used the fig.width chunk option to make it easier to read.
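One way to get month names on the axis without converting the column to character might be to keep month numeric and pass labels = month.abb (a built-in base R constant) alongside the breaks. This is a minimal sketch on made-up placeholder data with two hypothetical destinations; it does not diagnose where the ‘NA’ tick came from.

```r
library(ggplot2)
library(ggalluvial)

# Placeholder data (invented for illustration): two destinations over 12 months
toy <- data.frame(
  month = rep(1:12, times = 2),
  dest  = rep(c("Destination A", "Destination B"), each = 12),
  n     = rep(c(10, 20), each = 12)
)

p <- ggplot(toy, aes(x = month, y = n, alluvium = dest)) +
  theme_bw() +
  geom_alluvium(aes(fill = dest), color = "black", width = .1, alpha = .7) +
  # month stays numeric, so the scale stays continuous with exactly
  # 12 breaks; only the tick text changes to "Jan", "Feb", ...
  scale_x_continuous(breaks = 1:12, labels = month.abb)

p
```

Because the data column is untouched, the ordering of months stays chronological rather than alphabetical, and there is one label per break.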