Load the libraries and feed data into the global environment
The data I will use come from ‘nycflights23’, an update of the now 10-year-old ‘nycflights13’ data package. ‘nycflights23’ contains information about all flights that departed from the three main New York City airports in 2023, plus metadata on airlines, airports, weather, and planes.
I will use the ‘flights’ dataset that is pre-built in the ‘nycflights23’ package.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights") # loads the flights dataset into my global environment
Notice although I didn’t use arrange(desc()) to sort the arrival delays from the longest to the shortest, I get that order because there are negative values in that column.
Group and summarize data
Calculate the total flights by carriers
# Calculate total flights by carrier
carriers_sum <- flights |>
  group_by(carrier) |>
  summarize(total_flights = n()) |> # total_flights is a newly created variable
  arrange(desc(total_flights))
carriers_sum
# A tibble: 14 × 2
carrier total_flights
<chr> <int>
1 YX 88785
2 UA 79641
3 B6 66169
4 DL 61562
5 9E 54141
6 AA 40525
7 NK 15189
8 WN 12385
9 AS 7843
10 OO 6432
11 F9 1286
12 G4 671
13 HA 366
14 MQ 357
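As a side note, the group_by() |> summarize() |> arrange() pattern above can also be written with dplyr's count(). The sketch below uses a small made-up tibble (toy_flights is a hypothetical stand-in, not the real flights data) to show that both forms give the same ranking.

```r
library(dplyr)

# Hypothetical toy data standing in for the flights dataset
toy_flights <- tibble(carrier = c("YX", "UA", "YX", "B6", "YX", "UA"))

# Long form, as used in this document
long_form <- toy_flights |>
  group_by(carrier) |>
  summarize(total_flights = n()) |>
  arrange(desc(total_flights))

# Shorthand: count() groups, tallies, and sorts in one call
short_form <- toy_flights |>
  count(carrier, name = "total_flights", sort = TRUE)

long_form
short_form
```

The long form is kept in this document because it makes each step explicit; count() is simply a more compact alternative.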
Create an alluvial plot of each carrier’s total flights by month throughout the year
Load the alluvial package
library(alluvial)
library(ggalluvial)
Examine the number of flights each carrier did per month
# Calculate total flights by carrier by month
monthly_flights <- flights |>
  group_by(carrier, month) |>
  summarize(total_flights = n()) |>
  arrange(month)
`summarise()` has grouped output by 'carrier'. You can override using the
`.groups` argument.
monthly_flights
# A tibble: 165 × 3
# Groups: carrier [14]
carrier month total_flights
<chr> <int> <int>
1 9E 1 3985
2 AA 1 3574
3 AS 1 542
4 B6 1 5917
5 DL 1 4836
6 F9 1 92
7 G4 1 42
8 HA 1 31
9 MQ 1 9
10 NK 1 1176
# ℹ 155 more rows
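The summarise() message and the "# Groups: carrier [14]" line in the output above appear because summarizing over two grouping variables drops only the last one. A minimal sketch of the `.groups` argument the message refers to, using a hypothetical toy tibble rather than the real data:

```r
library(dplyr)

# Hypothetical toy data: one row per flight
toy <- tibble(
  carrier = c("AA", "AA", "DL", "DL", "DL"),
  month   = c(1L, 1L, 1L, 2L, 2L)
)

# .groups = "drop" returns an ungrouped tibble and silences the message
monthly_toy <- toy |>
  group_by(carrier, month) |>
  summarize(total_flights = n(), .groups = "drop") |>
  arrange(month)

monthly_toy
is_grouped_df(monthly_toy)
```

Whether to drop the grouping is a design choice; leaving the result grouped by carrier, as this document does, does not affect the join or the plot that follow.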
Notice that Republic Airways (YX) operated the largest number of flights over the year.
The mutate() step converts the month column to a factor whose levels run from January to December, so the x-axis follows calendar order instead of alphabetical order. Because month is stored as an integer (1 to 12) in flights, the numbers must first be mapped to month names, for example with month.name[month]; calling factor() with month-name levels directly on the integers would turn every value into NA.
theme(plot.title = element_text(hjust = 0.5)) centers the title. The “hjust” parameter controls the horizontal justification of the title text, where 0 is left-aligned, 0.5 is centered, and 1 is right-aligned.
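The month-ordering step described above can be checked on a small vector. This is a base R sketch; month_num is a hypothetical example vector, not taken from the data.

```r
# Hypothetical month numbers, deliberately out of order
month_num <- c(3L, 1L, 12L)

# Integers never match month-name levels, so every element becomes NA
bad <- factor(month_num, levels = month.name)

# Mapping integers to names first gives a factor in calendar order
good <- factor(month.name[month_num], levels = month.name)

bad
sort(good)
```

month.name is a built-in constant in base R holding the twelve English month names in calendar order, which avoids typing the level vector by hand.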
# Merge annual total flights into monthly data
monthly_flights2 <- monthly_flights |>
  left_join(carriers_sum, by = "carrier", suffix = c("_monthly", "_annual")) |>
  mutate(monthly_rate = (total_flights_monthly / total_flights_annual) * 100)

# Create the alluvial plot
ggalluv <- monthly_flights2 |>
  # month is an integer (1-12): map it to month names before setting the factor levels,
  # otherwise factor() would return NA for every row
  mutate(month = factor(month.name[month], levels = month.name)) |>
  ggplot(aes(x = month, y = total_flights_monthly, alluvium = carrier)) +
  geom_alluvium(
    aes(
      fill = carrier,
      # text is not used by ggplot2 itself; ggplotly() reads it later for the tooltip
      text = paste(
        "Carrier:", carrier,
        "<br>Month:", month,
        "<br>Monthly Total:", total_flights_monthly,
        "<br>Annual Total:", total_flights_annual,
        "<br>Rate:", monthly_rate
      )
    ),
    color = "white",
    width = .1,
    alpha = .8,
    decreasing = FALSE
  ) +
  scale_fill_manual(
    values = c(
      "9E" = "darkgreen", "AA" = "#E6BBC7", "AS" = "#B6EEE2", "B6" = "#CAB8E5",
      "DL" = "#20DE8B", "F9" = "red", "G4" = "darkblue", "HA" = "yellow",
      "MQ" = "violet", "NK" = "#90CFFF", "OO" = "#FF0076", "UA" = "#FAEFAF",
      "WN" = "#98FB98", "YX" = "#EA967C"
    )
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "NYC Monthly Flights by Carriers",
    x = "Month",
    y = "Monthly Total Flights",
    fill = "Carrier",
    caption = "Source: FAA and Bureau of Transportation Statistics \n (https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236)"
  ) +
  # plot.title and plot.caption center both the title and the caption
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 0.5)
  )
Warning in geom_alluvium(aes(fill = carrier, text = paste("Carrier:", carrier,
: Ignoring unknown aesthetics: text
ggalluv
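The merge step in the code above joins two tables that both have a total_flights column, which is why `suffix` is needed. A minimal sketch on made-up numbers (monthly and annual are hypothetical toy tibbles) shows how the suffixes rename the clashing columns and how the monthly rate is derived:

```r
library(dplyr)

# Hypothetical monthly and annual totals sharing the column name total_flights
monthly <- tibble(
  carrier = c("AA", "AA", "DL"),
  month = c(1L, 2L, 1L),
  total_flights = c(30L, 70L, 50L)
)
annual <- tibble(
  carrier = c("AA", "DL"),
  total_flights = c(100L, 50L)
)

# suffix disambiguates the two total_flights columns after the join
joined <- monthly |>
  left_join(annual, by = "carrier", suffix = c("_monthly", "_annual")) |>
  mutate(monthly_rate = total_flights_monthly / total_flights_annual * 100)

joined
```

Here monthly_rate is each month's share of the carrier's annual total, expressed as a percentage.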
Convert the plot into an interactive alluvial plot
To create an interactive alluvial plot with tooltips using ggplotly() from the plotly package, the text aesthetic must be set in the ggplot object. ggplot2 itself does not use text, which is why geom_alluvium() reports it as an unknown aesthetic; ggplotly() then reads it to build the tooltip.
# Load plotly
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
# Convert ggplot object to an interactive plotly object
alluvial_plot_interactive <- ggplotly(ggalluv, tooltip = "text")
alluvial_plot_interactive
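The same text-aesthetic mechanism can be seen in isolation on a much simpler plot. The sketch below uses a hypothetical three-row data frame and geom_point() instead of geom_alluvium(), so it illustrates only how `tooltip = "text"` picks up the custom label, not the alluvial layout itself.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data
toy <- data.frame(
  month = 1:3,
  flights = c(10, 20, 15),
  carrier = "AA"
)

# The text aesthetic holds the tooltip content; <br> starts a new line in plotly tooltips
p <- ggplot(
  toy,
  aes(
    x = month, y = flights,
    text = paste("Carrier:", carrier, "<br>Flights:", flights)
  )
) +
  geom_point()

# tooltip = "text" tells ggplotly() to show only the text aesthetic on hover
p_interactive <- ggplotly(p, tooltip = "text")
p_interactive
```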
The alluvial plot titled ‘NYC Monthly Flights by Carriers’ gives an overall view of flight volumes for the carriers operating out of New York City throughout 2023. The x-axis shows the months from January to December, and the y-axis shows the total number of flights per month. Each colored curve corresponds to one airline carrier, with tooltips giving the carrier, month, monthly total flights, annual total flights, and the month’s share of the annual total. The wider a carrier’s curve, the higher its flight volume.
From the plot, we can observe that the five carriers with the highest flight volumes are Republic Airways “YX”, United Airlines “UA”, JetBlue Airways “B6”, Delta Air Lines “DL”, and Endeavor Air “9E”.
We notice that American Airlines “AA” has a consistently high number of flights each month, with a notable peak in March. Delta Air Lines “DL” also shows a significant volume, particularly in the second half of the year, while JetBlue Airways “B6” has a greater volume of flights in the first half of the year.
Frontier Airlines “F9”, Allegiant Air “G4”, Hawaiian Airlines “HA”, and Envoy Air (American Eagle) “MQ” are almost invisible because they have the fewest flights. Frontier Airlines “F9” is still slightly visible because it has more than one thousand annual flights, and its curve shows a small increase during the summer and the holiday season. Meanwhile, Allegiant Air “G4”, Hawaiian Airlines “HA”, and Envoy Air (American Eagle) “MQ” are barely visible on the plot since they each have fewer than one thousand annual flights.
Another carrier that maintains a steady volume throughout the year is Southwest Airlines “WN”, while the carriers with the largest volumes of flights, such as Republic Airways “YX”, United Airlines “UA”, and Delta Air Lines “DL”, exhibit more variability. We can also observe that December is the worst month in terms of flight volume for Republic Airways “YX”.
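The comparison of steadiness above is read visually from the plot. One possible way to put a number on it, offered only as a sketch, is the coefficient of variation (standard deviation divided by mean) of each carrier's monthly totals. The values below are made up for illustration; toy_monthly is a hypothetical stand-in for monthly_flights.

```r
library(dplyr)

# Hypothetical monthly totals for a steady carrier and a variable one
toy_monthly <- tibble(
  carrier = rep(c("WN", "YX"), each = 3),
  total_flights = c(100L, 102L, 98L, 700L, 900L, 500L)
)

# Coefficient of variation: a scale-free measure of month-to-month spread
variability <- toy_monthly |>
  group_by(carrier) |>
  summarize(
    mean_flights = mean(total_flights),
    sd_flights = sd(total_flights),
    cv = sd_flights / mean_flights,
    .groups = "drop"
  ) |>
  arrange(desc(cv))

variability
```

Dividing by the mean matters here because large carriers naturally have larger absolute swings; the ratio makes carriers of different sizes comparable.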
Overall, this visualization helps identify seasonal trends, compare carrier performance, and understand the dynamics of air travel in NYC over 2023.