NYC flights assignment

Author

O Kandji

load the variables

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights")
str(flights)
tibble [435,352 × 19] (S3: tbl_df/tbl/data.frame)
 $ year          : int [1:435352] 2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ month         : int [1:435352] 1 1 1 1 1 1 1 1 1 1 ...
 $ day           : int [1:435352] 1 1 1 1 1 1 1 1 1 1 ...
 $ dep_time      : int [1:435352] 1 18 31 33 36 503 520 524 537 547 ...
 $ sched_dep_time: int [1:435352] 2038 2300 2344 2140 2048 500 510 530 520 545 ...
 $ dep_delay     : num [1:435352] 203 78 47 173 228 3 10 -6 17 2 ...
 $ arr_time      : int [1:435352] 328 228 500 238 223 808 948 645 926 845 ...
 $ sched_arr_time: int [1:435352] 3 135 426 2352 2252 815 949 710 818 852 ...
 $ arr_delay     : num [1:435352] 205 53 34 166 211 -7 -1 -25 68 -7 ...
 $ carrier       : chr [1:435352] "UA" "DL" "B6" "B6" ...
 $ flight        : int [1:435352] 628 393 371 1053 219 499 996 981 206 225 ...
 $ tailnum       : chr [1:435352] "N25201" "N830DN" "N807JB" "N265JB" ...
 $ origin        : chr [1:435352] "EWR" "JFK" "JFK" "JFK" ...
 $ dest          : chr [1:435352] "SMF" "ATL" "BQN" "CHS" ...
 $ air_time      : num [1:435352] 367 108 190 108 80 154 192 119 258 157 ...
 $ distance      : num [1:435352] 2500 760 1576 636 488 ...
 $ hour          : num [1:435352] 20 23 23 21 20 5 5 5 5 5 ...
 $ minute        : num [1:435352] 38 0 44 40 48 0 10 30 20 45 ...
 $ time_hour     : POSIXct[1:435352], format: "2023-01-01 20:00:00" "2023-01-01 23:00:00" ...
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     1     1        1           2038       203      328              3
2  2023     1     1       18           2300        78      228            135
3  2023     1     1       31           2344        47      500            426
4  2023     1     1       33           2140       173      238           2352
5  2023     1     1       36           2048       228      223           2252
6  2023     1     1      503            500         3      808            815
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Calculate average departure delay by carrier and month

avg_delay <- flights %>%
  group_by(carrier, month) %>%
  summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

Plotting the heatmap

library(ggplot2)

ggplot(avg_delay, aes(x = month, y = carrier, fill = avg_dep_delay)) +
  geom_tile() +
  scale_fill_gradient(low = "lightblue", high = "purple") +
  labs(title = "Average Departure Delay by Airline and Month",
       x = "Month",
       y = "Airline",
       fill = "Avg Departure Delay (minutes)")+
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(fill = guide_colorbar(title = "Avg Departure Delay\n(minutes)"))

The heatmap visualizes the average departure delays for various airlines across different months of the year. Each cell in the heatmap is a unique combination of an airline and a month, colored according to the average departure delay in minutes. The color gradient from lightblue to purple indicates the size of the delay, with lighter colors representing shorter delays and darker colors representing longer delays. One aspect of the plot to highlight is the seasonal variation in delays for specific airlines, which can provide insights into operational challenges faced during certain times of the year. This visualization helps in identifying patterns and trends in airline performance over time. Running the entire code should allow you to generate a heatmap with the specified color scheme that shows the average departure delays, giving you information about the performance and delays of the airline.