NYC Flights Homework

Author

Ryan Juica

library(tidyverse)
library(nycflights23)
data(flights)
flights_nona <- flights |>
  # drop flights missing a distance, arrival delay, or departure delay
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))
by_month <- flights_nona |>
  group_by(month) |>  # group the flights by month
  summarise(count = n(),   # number of flights in each month
            avg_dist = mean(distance), # mean distance traveled (miles)
            avg_arr_delay = mean(arr_delay),  # mean arrival delay (minutes)
            avg_dep_delay = mean(dep_delay), # mean departure delay (minutes)
            .groups = "drop") |>  # remove the grouping structure after summarizing
  arrange(avg_arr_delay) |>  # sort months from lowest to highest mean arrival delay
  filter(avg_dist < 3000)  # keep months whose mean distance is under 3000 miles
head(by_month)
# A tibble: 6 × 5
  month count avg_dist avg_arr_delay avg_dep_delay
  <int> <int>    <dbl>         <dbl>         <dbl>
1    12 33148    1026.         -6.59          8.28
2    11 34460    1000.         -6.28          4.38
3     5 38447     959.         -5.52          8.34
4    10 36483     985.         -5.00          5.25
5     1 35447     973.          3.28         14.0 
6     2 34084     953.          3.29         10.9 
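The filtering and summarizing steps above can be illustrated on a tiny made-up data set. This sketch uses base R's `tapply()` as a dependency-free stand-in for `group_by()` plus `summarise()` (a swapped-in technique, not the dplyr code used above); the `toy` data and the names `n_flights` and `by_month_toy` are hypothetical.

```r
# Hypothetical toy flights: six flights over two months; one has no arrival delay
toy <- data.frame(
  month     = c(1, 1, 1, 2, 2, 2),
  arr_delay = c(10, -4, NA, 0, 6, -3)
)

# mean() returns NA if any value is missing, which is why NAs are dropped first
print(mean(toy$arr_delay))  # prints NA

# Row-wise NA removal, like filter(!is.na(arr_delay))
toy_nona <- toy[!is.na(toy$arr_delay), ]

# Per-month count and mean, like n() and mean(arr_delay) inside summarise()
n_flights     <- tapply(toy_nona$arr_delay, toy_nona$month, length)
avg_arr_delay <- tapply(toy_nona$arr_delay, toy_nona$month, mean)

# One row per month, sorted like arrange(avg_arr_delay)
by_month_toy <- data.frame(
  month         = as.integer(names(n_flights)),
  n_flights     = as.vector(n_flights),
  avg_arr_delay = as.vector(avg_arr_delay)
)
by_month_toy <- by_month_toy[order(by_month_toy$avg_arr_delay), ]
print(by_month_toy)
```

Filtering whole rows, rather than passing `na.rm = TRUE` to each `mean()`, keeps the distance, arrival-delay, and departure-delay averages computed over the same set of flights.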
by_month$month_label <- month(by_month$month, label = TRUE, abbr = TRUE)
ggplot(by_month, aes(month_label, avg_dep_delay)) +
  geom_point(aes(size = count, color = avg_arr_delay), alpha = 0.5) +
  theme_minimal() +
  labs(x = "Month",
       y = "Average Departure Delay (minutes)",
       size = "Number of flights",
       color = "Average Arrival Delay (minutes)",
       caption = "Source: nycflights23 R package",
       title = "Average Departure and Arrival Delays\nby Month for NYC Flights (2023)")
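The x-axis runs in calendar order even though `by_month` was sorted by arrival delay, because `month(..., label = TRUE)` returns an ordered factor whose levels follow the calendar. A base R sketch of the same idea, assuming English abbreviations from the built-in `month.abb` constant (lubridate's labels follow the system locale); `month_nums` is a hypothetical name.

```r
# Month numbers in the order produced by arrange(avg_arr_delay), from the table above
month_nums <- c(12, 11, 5, 10, 1, 2)

# Map month numbers to abbreviations, keeping calendar order in the factor levels
month_label <- factor(month.abb[month_nums], levels = month.abb, ordered = TRUE)
print(month_label)

# Sorting the factor recovers calendar order regardless of the row order
print(sort(month_label))
```

ggplot2 places a discrete axis in factor-level order, so the levels, not the row order, decide where each month appears.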

I created a scatter plot that shows the average departure delay for each month, with the color of each point showing that month's average arrival delay. The size of each point reflects the number of flights in that month, and the color gradient makes it easy to see which months have larger arrival delays without interfering with the point sizes.

I noticed that months like June and July have the highest average departure delays, followed by a sudden drop in August. November and December have the lowest average arrival delays (flights arrived more than 6 minutes early on average), November also has one of the lowest average departure delays, and both months have relatively few flights. Overall, departure and arrival delays are worse in the summer and better at the end of the year. This trend was interesting to notice because it was not something I expected; I had assumed there would simply be more flights in the summertime because so many people are on vacation.
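The summer versus end-of-year pattern could be checked more directly by tagging each month with a season and averaging within each season. This is a sketch, assuming meteorological seasons (December to February as winter, June to August as summer); the helpers `season_of()` and `seasonal_delay()` are hypothetical, and the example numbers are made up rather than taken from the data.

```r
# Hypothetical helper: map month numbers (1-12) to meteorological seasons
season_of <- function(m) {
  c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
    "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")[m]
}

# Hypothetical helper: flight-weighted mean delay per season.
# Weighting each monthly average by its flight count reproduces the
# average over all flights in that season.
seasonal_delay <- function(month, count, avg_delay) {
  season <- season_of(month)
  sapply(split(seq_along(month), season), function(i) {
    weighted.mean(avg_delay[i], count[i])
  })
}

# Made-up example: two summer months and one winter month
example_delays <- seasonal_delay(
  month     = c(6, 7, 12),
  count     = c(100, 300, 200),
  avg_delay = c(20, 10, -5)
)
print(example_delays)
```

With the real summary this would be called as `seasonal_delay(by_month$month, by_month$count, by_month$avg_dep_delay)`. Whether December belongs with winter or with November is a modeling choice; grouping it with January and February can blur the end-of-year drop described above.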