Heatmaps Assignment

Author

Maisha Subin

# Loading the packages to view flight dataset
#install.packages("nycflights23")
library(nycflights23)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
data(flights)
# Creating a variable carrier_names to get specific carriers
carrier_names <- c("AA" = "American Airlines",
                   "AS" = "Alaska Airlines",
                   "B6" = "JetBlue Airways",
                   "DL" = "Delta Air Lines",
                   "F9" = "Frontier Airlines",
                   "HA" = "Hawaiian Airlines",
                   "NK" = "Spirit Airlines",
                   "UA" = "United Airlines",
                   "WN" = "Southwest Airlines",
                   "G4" = "Allegiant Air")
# remove na's for carrier and dep_delay
# filter for specific airline carriers 
flights_filtered <- flights |>
  filter(!is.na(carrier) & !is.na(dep_delay) & carrier %in% names(carrier_names))
# Convert month numbers to month names
flights_filtered$month <- factor(flights_filtered$month,
                                 levels = 1:12,
                                 labels = month.name)  # month.name is base R's vector of English month names

# Calculate average delay by carrier and month
avg_delay_by_month_carrier <- flights_filtered %>%
  group_by(carrier, month) %>%  # Group by carrier and month
  summarise(avg_dep_delay = mean(dep_delay), .groups = "drop")  # Calculate average delay
# Use carrier_names to replace carrier codes with full carrier names in the legend
ggplot(avg_delay_by_month_carrier, aes(x = month, y = avg_dep_delay, group = carrier, color = carrier)) +
  geom_line(linewidth = 1) +  # linewidth (rather than size) sets line thickness
  scale_x_discrete(labels = levels(flights_filtered$month)) +  # Show month names on the x-axis
  scale_color_brewer(palette = "Set3", 
                     labels = carrier_names) +  # Use full names for the legend
  coord_cartesian(ylim = c(-10, 70)) +  # Set y-axis limit to show negative and positive delays
  theme_minimal(base_size = 12) +
  labs(title = "Monthly Average Departure Delays by Airline Carrier",
       x = "Month",
       y = "Average Departure Delay (minutes)",
       caption = "Data: nycflights23") +
  theme(legend.title = element_blank(), 
        legend.position = "right",  # Legend on the right
        axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

HIGHLIGHTS

The main goal of this visualization was to identify which well-known, popular airlines tend to depart late. Based on published rankings, I selected the top-ranked U.S. airlines as of 2024 (Slotnick, 2024). This also kept the number of categories manageable, since ColorBrewer's qualitative palettes support at most 8 to 12 colors. To track delays over the course of a year, I chose a line graph, which displays trends effectively while accommodating one line per carrier. The plot shows an overall increase in average departure delays in July, likely due to the summer travel rush. I look forward to exploring further patterns and insights from this trend.
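
Two of the statements above can be checked directly in R. The following is a minimal sketch, not part of the original analysis: it lists how many colors each qualitative ColorBrewer palette supports, and it ranks the months by mean departure delay pooled across the ten selected carriers. The names `carrier_codes`, `qual_palettes`, and `monthly_overall` are introduced here for illustration only. The pooled mean weights every flight equally, so it can differ from an average of the per-carrier lines in the plot.

```r
library(nycflights23)
library(dplyr)
library(RColorBrewer)

data(flights)

# Carrier codes used in the plot (the names of carrier_names)
carrier_codes <- c("AA", "AS", "B6", "DL", "F9", "HA", "NK", "UA", "WN", "G4")

# Maximum number of colors offered by each qualitative ColorBrewer palette
qual_palettes <- brewer.pal.info[brewer.pal.info$category == "qual",
                                 "maxcolors", drop = FALSE]
print(qual_palettes)

# Mean departure delay per month, pooled over the selected carriers,
# sorted so the month with the largest average delay comes first
monthly_overall <- flights |>
  filter(!is.na(dep_delay), carrier %in% carrier_codes) |>
  group_by(month) |>
  summarise(avg_dep_delay = mean(dep_delay),
            n_flights = n(),
            .groups = "drop") |>
  arrange(desc(avg_dep_delay))

print(monthly_overall)
```

If July is the month with the highest pooled delay, it should appear in the first row of `monthly_overall`. The `n_flights` column helps judge whether a month's average rests on a comparable number of flights.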

Work Cited

Slotnick, D. (2024, June 12). The best airlines in the US for 2024. The Points Guy. https://thepointsguy.com/news/best-us-airlines-2024/