NYC flights 13 HW

Author

Andrew George

# tidyverse: dplyr for data wrangling, ggplot2 for plotting
library(tidyverse)
library(nycflights13)  # 2013 NYC departures data, including the flights table
library(ggalluvial)    # geom_alluvium() for alluvial plots

Looking at the data

head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Removing NAs

# Drop flights with no recorded departure delay (e.g. cancelled flights) or carrier
flights_nona <- flights |>
  filter(!is.na(dep_delay) & !is.na(carrier))
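As a quick check: dplyr's `filter()` keeps only rows where the condition is `TRUE`, so rows with a missing `dep_delay` would also be dropped by the later `dep_delay >= 60` condition. The sketch below compares both routes; the object names `with_na_step` and `without_na_step` are illustrative, not part of the assignment. The explicit NA step still makes the intent clear to a reader.

```r
library(dplyr)
library(nycflights13)

# Route 1: explicit NA removal, then the delay/carrier filter (as in this homework)
with_na_step <- flights |>
  filter(!is.na(dep_delay) & !is.na(carrier)) |>
  filter(dep_delay >= 60 & carrier == "UA")

# Route 2: delay/carrier filter only; filter() drops rows whose condition is NA
without_na_step <- flights |>
  filter(dep_delay >= 60 & carrier == "UA")

# Compare the two results
nrow(with_na_step)
nrow(without_na_step)
identical(with_na_step, without_na_step)
```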

Finding United Air Lines (UA) flights with a departure delay of at least an hour, then counting them and averaging their delay by month and origin airport

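del_flights <- flights_nona |>
  # Keep UA flights that left at least 60 minutes late
  filter(dep_delay >= 60 & carrier == "UA") |>
  # One summary row per month and NYC origin airport
  group_by(month, origin) |>
  summarize(
    count = n(),                 # number of such delayed flights
    avg_delay = mean(dep_delay)  # their mean departure delay, in minutes
  )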
`summarise()` has grouped output by 'month'. You can override using the
`.groups` argument.
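The message above appears because `summarize()` drops only the last grouping variable (`origin`) and leaves the result grouped by `month`. A minimal sketch, assuming an ungrouped result is what the plot needs, passes `.groups = "drop"` so the output is ungrouped and the message goes away:

```r
library(dplyr)
library(nycflights13)

del_flights <- flights |>
  filter(!is.na(dep_delay) & !is.na(carrier)) |>
  filter(dep_delay >= 60 & carrier == "UA") |>
  group_by(month, origin) |>
  summarize(
    count = n(),
    avg_delay = mean(dep_delay),
    .groups = "drop"  # return an ungrouped tibble; no grouping message
  )

group_vars(del_flights)  # character(0): no grouping remains
```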

Plotting

ggalluv <- del_flights |>
  ggplot(aes(x = month, y = avg_delay, alluvium = origin)) +
  theme_bw() +
  # Ribbons are stacked at each month; ribbon thickness = that airport's average delay
  geom_alluvium(aes(fill = origin),
                color = "white",
                width = .1,
                alpha = .8,
                decreasing = FALSE) +
  scale_fill_brewer(palette = "Set1") +
  # One tick per month
  scale_x_continuous(limits = c(1, 12), breaks = 1:12) +
  labs(title = "Average Departure Delay of United Air Lines Flights Delayed at Least an Hour\n(by NYC Origin Airport, 2013)",
       x = "Month",
       y = "Average Departure Delay (min, stacked by origin)",
       fill = "Origin",
       caption = "Source: nycflights13")
ggalluv
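`geom_alluvium()` stacks the airports' ribbons at each month, so each airport's average delay is the thickness of its ribbon, and the top edge of the plot is the sum of the averages rather than any single airport's value. The sketch below rebuilds the summary and computes that stacked height per month, which can help when reading the y axis; the name `stack_heights` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Rebuild the per-month, per-airport summary used for the plot
del_flights <- flights |>
  filter(!is.na(dep_delay) & !is.na(carrier)) |>
  filter(dep_delay >= 60 & carrier == "UA") |>
  group_by(month, origin) |>
  summarize(count = n(), avg_delay = mean(dep_delay), .groups = "drop")

# Top edge of the stacked ribbons at each month = sum of the per-airport averages
stack_heights <- del_flights |>
  group_by(month) |>
  summarize(
    n_origins = n(),               # airports with at least one such delay that month
    stack_height = sum(avg_delay)  # height of the stack on the y axis
  )
stack_heights
```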

Essay

This alluvial plot explores trends in departure delays for United Air Lines flights leaving NYC airports over the course of 2013. Specifically, it looks at the average departure delay among flights delayed by at least an hour (AVGDELHR+). The plot shows that LGA begins the year with the highest AVGDELHR+ and keeps that position until October, when, interestingly, it drops to the lowest AVGDELHR+ before returning to the highest afterward. JFK has the second-highest AVGDELHR+ for 7 months and peaks in October, when it ranks first. EWR, meanwhile, has the lowest AVGDELHR+ for most of the year. Overall, the ribbons have similar widths, which indicates that the differences in AVGDELHR+ between the airports are fairly small. This is especially visible toward the end of the year, when the ribbons fluctuate together while keeping similar widths.
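The rankings described above can also be checked numerically. This is a sketch, assuming that an airport's rank in a given month is its position when that month's `avg_delay` values are sorted from longest to shortest; `ranked` and `delay_rank` are illustrative names.

```r
library(dplyr)
library(nycflights13)

# Per-month, per-airport average delay among UA flights delayed >= 60 min
del_flights <- flights |>
  filter(!is.na(dep_delay) & !is.na(carrier)) |>
  filter(dep_delay >= 60 & carrier == "UA") |>
  group_by(month, origin) |>
  summarize(count = n(), avg_delay = mean(dep_delay), .groups = "drop")

# Rank airports within each month (1 = longest average delay)
ranked <- del_flights |>
  group_by(month) |>
  mutate(delay_rank = min_rank(desc(avg_delay))) |>
  ungroup() |>
  select(month, origin, count, avg_delay, delay_rank) |>
  arrange(month, delay_rank)

# Airport with the highest AVGDELHR+ in each month
ranked |> filter(delay_rank == 1)
```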