NYC Flights

library(tidyverse)
library(nycflights13)
nycflights13::flights
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
flights <- nycflights13::flights %>%
  filter(!is.na(dep_delay), !is.na(carrier))

# flights already has an `hour` column (the scheduled departure hour),
# so it does not need to be re-derived from time_hour.
# .groups = "drop" returns an ungrouped result and silences the
# summarise() regrouping message.
delay_by_hour <- flights %>%
  group_by(hour, carrier) %>%
  summarize(Avg_Departure_Delay = mean(dep_delay), .groups = "drop")
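# Colors for the five carriers shown in the plot
carrier_palette <- c(
  "AA" = "blue",
  "B6" = "green",
  "DL" = "orange",
  "UA" = "red",
  "WN" = "purple"
)

# Keep only carriers that have a color: with a named manual scale, any
# carrier missing from the palette would otherwise be drawn in grey.
delay_by_hour %>%
  filter(carrier %in% names(carrier_palette)) %>%
  ggplot(aes(x = hour, y = Avg_Departure_Delay, color = carrier)) +
  geom_line() +
  labs(
    title = "Average Departure Delay by Hour of the Day",
    x = "Hour of the Day",
    y = "Average Departure Delay (minutes)",
    color = "Carrier"
  ) +
  scale_color_manual(values = carrier_palette) +
  theme_minimal()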

The line plot above shows the average departure delay by hour of the day, with one colored line per carrier. The x-axis is the scheduled departure hour, and the y-axis is the average departure delay in minutes.
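To read values off the plot rather than eyeballing them, the same hourly averages can be summarized in a table. The sketch below is one possible approach, not part of the original analysis: it recomputes the averages so it runs on its own, and the name `peak_hours` is introduced here for illustration. It keeps, for each carrier, the hour with the highest average delay.

```r
library(dplyr)
library(nycflights13)

# Recompute the hourly averages so this block runs on its own.
delay_by_hour <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(hour, carrier) %>%
  summarise(Avg_Departure_Delay = mean(dep_delay), .groups = "drop")

# For each carrier, keep the single hour with the highest average delay.
# `peak_hours` is a hypothetical name used only in this sketch.
peak_hours <- delay_by_hour %>%
  group_by(carrier) %>%
  slice_max(Avg_Departure_Delay, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(Avg_Departure_Delay))

peak_hours
```

Some hour and carrier combinations may contain very few flights, so a peak found this way is worth checking against the flight count before drawing conclusions from it.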

One feature worth highlighting is the manual color scale: each of the five carriers named in the palette (AA, B6, DL, UA, WN) gets its own fixed color, which makes their lines easy to tell apart.
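Because a named `values` vector in `scale_color_manual()` is matched to the data by name, it can help to check which carrier codes the palette actually covers. Any plotted carrier without an entry falls back to the scale's `na.value`, which is grey by default. The following is a minimal sketch of such a check, with `palette_carriers`, `data_carriers` and `unmapped` as illustrative names that do not appear in the analysis above.

```r
library(nycflights13)

# Carrier codes that are assigned a color in the plot's manual scale.
palette_carriers <- c("AA", "B6", "DL", "UA", "WN")

# Carrier codes present among flights with a recorded departure delay.
data_carriers <- sort(unique(flights$carrier[!is.na(flights$dep_delay)]))

# Codes in the data that the palette does not name.
unmapped <- setdiff(data_carriers, palette_carriers)
unmapped
```

If `unmapped` is not empty, those carriers either need their own colors or should be filtered out before plotting.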