# Load libraries
suppressPackageStartupMessages(library(tidyverse))
library(nycflights23)
Unhappy Air Travelers
A visualization of flight delays from NYC airports
Stacked bar graphs of delayed flights compared to on-time flights for the six largest carriers
# Step 1: Identify top 6 carriers by flight count
top6_carriers <- flights %>%
  count(carrier, sort = TRUE) %>%
  slice(1:6) %>%
  pull(carrier)
# Step 2: Filter and classify flights as 'On-Time' or 'Delayed'
delay_summary <- flights %>%
  filter(carrier %in% top6_carriers) %>%
  mutate(status = if_else(arr_delay <= 0, "On-Time", "Delayed")) %>%
  count(carrier, status) %>%
  left_join(airlines, by = "carrier")
# Step 3: Plot stacked bar chart
ggplot(delay_summary, aes(x = name, y = n, fill = status)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Flight Punctuality by Airline (Top 6 Carriers)",
    x = "Airline",
    y = "Number of Flights",
    fill = "Status",
    caption = "Source: nycflights23 package"
  ) +
  scale_fill_manual(values = c("On-Time" = "dodgerblue", "Delayed" = "forestgreen")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Tree map of late arrivals for the top 20 destinations
library(treemapify)
# Step 1: Filter for delayed flights and group by destination
delayed_dest <- flights %>%
  filter(arr_delay > 0) %>%
  group_by(dest) %>%
  summarize(
    avg_delay = mean(arr_delay, na.rm = TRUE),
    total_delays = n()
  ) %>%
  arrange(desc(total_delays)) %>%
  slice(1:20) # Keep only top 20 destinations
# Step 2: Create treemap using airport codes
ggplot(delayed_dest, aes(
  area = total_delays,
  fill = avg_delay,
  label = dest
)) +
  geom_treemap() +
  geom_treemap_text(
    fontface = "bold",
    color = "white",
    place = "center",
    grow = FALSE
  ) +
  scale_fill_viridis_c(name = "Avg Delay (min)") +
  labs(
    title = "Top 20 Destination Airports by Delayed Arrivals from NYC (FAA Codes)",
    caption = "Source: nycflights23 package"
  ) +
  theme_minimal()
N.B. Fitting the full airport names into the plot made either the tiles or the text too small to read, so the FAA codes are used instead. Most of these codes are familiar, with the possible exception of the Florida ones: MCO (Orlando), PBI (Palm Beach), FLL (Fort Lauderdale), and TPA (Tampa). SJU is in Puerto Rico, and LAS is where you go to gamble.
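For readers who do not know the codes, one option is to print a lookup table next to the treemap rather than labeling the tiles. The following is a minimal sketch, not part of the original analysis: the helper name `add_airport_names` and the toy data are hypothetical, and it assumes an airports table with `faa` and `name` columns (as in the `airports` table shipped with the nycflights packages).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper: attach airport names to the FAA codes in `dest`.
# left_join keeps every destination; codes with no match get airport = NA.
add_airport_names <- function(dest_df, airports_df) {
  dest_df %>%
    left_join(
      select(airports_df, faa, airport = name),
      by = c("dest" = "faa")
    )
}

# Toy data for illustration only (not real results)
toy_dest <- tibble(
  dest = c("MCO", "SJU", "XXX"),
  total_delays = c(3, 2, 1)
)
toy_airports <- tibble(
  faa = c("MCO", "SJU"),
  name = c("Orlando Intl", "Luis Munoz Marin Intl")
)

toy_lookup <- add_airport_names(toy_dest, toy_airports)
print(toy_lookup)
```

With the real data, the equivalent call would be `add_airport_names(delayed_dest, airports)`. Any code missing from the airports table would show up with an `NA` name rather than being dropped.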
Summary
The first visualization shows stacked bar graphs comparing the number of on-time and delayed flights for the six largest airlines operating out of New York City in 2023. The top six carriers by flight volume were selected, and each flight was classified as either “On-Time” or “Delayed” based on arrival delay: a flight counts as on-time if its arrival delay is zero or negative, and as delayed otherwise.
One key insight is the variation in punctuality across carriers. While all airlines have a mix of on-time and delayed flights, some show a higher proportion of on-time arrivals compared to others. The use of stacked bars allows for both absolute and relative comparisons. This visualization is more accessible than boxplots and provides a clear narrative for performance differences.
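The relative comparison can be made explicit by converting the counts into shares. The following is a minimal sketch, not part of the original analysis: the helper name `on_time_share` and the toy data are hypothetical, and it uses the same on-time definition as above (arrival delay of zero or less). Flights with a missing arrival delay are dropped here, which is an assumption of this sketch.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper: percentage of on-time arrivals per carrier.
# Rows with a missing arr_delay are removed before counting (assumption).
on_time_share <- function(flights_df) {
  flights_df %>%
    filter(!is.na(arr_delay)) %>%
    group_by(carrier) %>%
    summarize(
      n_flights = n(),
      n_on_time = sum(arr_delay <= 0),
      pct_on_time = round(100 * n_on_time / n_flights, 1),
      .groups = "drop"
    ) %>%
    arrange(desc(pct_on_time))
}

# Toy data for illustration only (not real results)
toy <- tibble(
  carrier   = c("AA", "AA", "AA", "AA", "B6", "B6", "B6", "B6"),
  arr_delay = c(-5, 10, 0, NA, 20, 30, -2, 15)
)

toy_share <- on_time_share(toy)
print(toy_share)
```

With the real data, `on_time_share(filter(flights, carrier %in% top6_carriers))` would give the percentages behind the stacked bars, sorted from most to least punctual.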
In the second visualization, a treemap shows the 20 destination airports with the most delayed arrivals among flights departing from New York City in 2023. Only flights with a positive arrival delay were kept, and these were grouped by destination. Tile area represents the number of delayed arrivals, and color represents the average delay in minutes. Of the top 20 destinations for delayed flights, 5 were in Florida, where weather-related delays due to tropical storms are more likely.
N.B. After attempting a side-by-side box plot of the top carriers' flight delays, I decided that stacked bar charts were the better choice. I used the assistance of Copilot for the tree map. The two plot ideas were mine.