# Loading the packages to view flight dataset
# install.packages("nycflights23")
library(nycflights23)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
data(flights)

# Creating a variable carrier_names to get specific carriers
carrier_names <- c(
  "AA" = "American Airlines",
  "AS" = "Alaska Airlines",
  "B6" = "JetBlue Airways",
  "DL" = "Delta Air Lines",
  "F9" = "Frontier Airlines",
  "HA" = "Hawaiian Airlines",
  "NK" = "Spirit Airlines",
  "UA" = "United Airlines",
  "WN" = "Southwest Airlines",
  "G4" = "Allegiant Air"
)

# remove na's for carrier and dep_delay
# filter for specific airline carriers
flights_filtered <- flights |>
  filter(!is.na(carrier) & !is.na(dep_delay) & carrier %in% names(carrier_names))

# Convert month numbers to month names
flights_filtered$month <- factor(
  flights_filtered$month,
  levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  labels = c("January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December")
)

# Calculate average delay by carrier and month
avg_delay_by_month_carrier <- flights_filtered %>%
  group_by(carrier, month) %>%  # Group by carrier and month
  summarise(avg_dep_delay = mean(dep_delay), .groups = "drop")  # Calculate average delay
# Use carrier_names to replace carrier codes with full carrier names in the legend
ggplot(avg_delay_by_month_carrier,
       aes(x = month, y = avg_dep_delay, group = carrier, color = carrier)) +
  geom_line(linewidth = 1) +  # Replace size with linewidth for line thickness
  scale_x_discrete() +  # Display months on x-axis
  scale_color_brewer(palette = "Set3", labels = carrier_names) +  # Use full names for the legend
  coord_cartesian(ylim = c(-10, 70)) +  # Set y-axis limit to show negative and positive delays
  theme_minimal(base_size = 12) +
  labs(
    title = "Monthly Average Departure Delays by Airline Carrier",
    x = "Month",
    y = "Average Departure Delay (minutes)",
    caption = "Data: nycflights23"
  ) +
  scale_x_discrete(labels = levels(flights_filtered$month)) +  # Show month names on the x-axis
  theme(
    legend.title = element_blank(),
    legend.position = "right",  # Legend on the right
    axis.text.x = element_text(angle = 45, hjust = 1)  # Rotate x-axis labels for readability
  )
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
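To make the aggregation behind the plot easier to follow, here is a minimal sketch of the same filter, factor, and group-and-summarise steps on a small made-up table. The `toy_flights` data, the `keep_carriers` vector, and the `worst_month` step are assumptions for illustration only and are not part of the analysis above; the sketch assumes dplyr 1.0.0 or later and R 4.1 or later (for `|>`).

```r
library(dplyr)

# Hypothetical toy data standing in for the real flights table
toy_flights <- data.frame(
  carrier   = c("AA", "AA", "AA", "DL", "DL", "DL", "DL", "ZZ"),
  month     = c(1, 1, 7, 1, 7, 7, 7, 7),
  dep_delay = c(10, 20, 40, 5, 30, 50, NA, 99)
)

# Hypothetical subset of carrier codes to keep
keep_carriers <- c("AA", "DL")

toy_summary <- toy_flights |>
  # Drop rows with a missing delay and carriers outside the chosen set
  filter(!is.na(dep_delay), carrier %in% keep_carriers) |>
  # month.name is base R's built-in vector of English month names
  mutate(month = factor(month, levels = 1:12, labels = month.name)) |>
  group_by(carrier, month) |>
  summarise(avg_dep_delay = mean(dep_delay), .groups = "drop")

# For each carrier, keep the month with the highest average delay
worst_month <- toy_summary |>
  group_by(carrier) |>
  slice_max(avg_dep_delay, n = 1) |>
  ungroup()

print(toy_summary)
print(worst_month)
```

Running the same `slice_max()` step on `avg_delay_by_month_carrier` would be one way to list each carrier's worst month in the real data.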
HIGHLIGHTS
The main goal of this visualization was to identify which well-known, popular airlines tend to have delayed departures. Based on published rankings, I selected the top-ranked airlines in the U.S. as of 2024 (Slotnick, 2024). This also kept the number of categories manageable, since ColorBrewer palettes support at most 8 to 12 colors, depending on the palette. To track delays over the course of a year, I chose a line graph, as it displays trends effectively while accommodating one line per carrier. The plot shows an overall increase in delays in July, likely due to the summer travel rush. I look forward to exploring further patterns and insights from this trend.
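The palette constraint mentioned above can be checked directly. The sketch below is a minimal example, assuming the RColorBrewer package is installed; `carrier_codes` is an illustrative copy of the codes in `carrier_names`, and `palette_fits` is a hypothetical name introduced here.

```r
library(RColorBrewer)

# Illustrative copy of the carrier codes used in the plot
carrier_codes <- c("AA", "AS", "B6", "DL", "F9", "HA", "NK", "UA", "WN", "G4")

# brewer.pal.info is a data frame with one row per palette;
# the maxcolors column gives the largest number of colors each palette offers
qualitative <- brewer.pal.info[brewer.pal.info$category == "qual", ]
max_set3 <- brewer.pal.info["Set3", "maxcolors"]

# TRUE when Set3 has at least one distinct color per carrier
palette_fits <- length(carrier_codes) <= max_set3

print(qualitative["maxcolors"])
print(palette_fits)
```

If the check came back `FALSE`, the options would be to trim the carrier list or switch to a palette with more colors.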
Work Cited
Slotnick, D. (2024, June 12). The best airlines in the US for 2024. The Points Guy. https://thepointsguy.com/news/best-us-airlines-2024/