NYC flights

Author

Mat Shaposhnikov

Average Arrival Delays by Airline

Loading tidyverse and Dataset

# Load tidyverse
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load nycflights23
library(nycflights23)
# Load data
data(flights)
data(airlines)

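Averaging Delays and Handling Missing Values with na.rm

Setting na.rm = TRUE tells mean() to drop missing (NA) values of arr_delay before averaging. Without it, any carrier with at least one missing arr_delay would get an average of NA. Background on na.rm: https://stackoverflow.com/questions/58443566/what-does-na-rm-true-actually-means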

delay_data <- summarize(
  group_by(flights, carrier),
  avg_arr_delay = mean(arr_delay, na.rm = TRUE)
)
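
To see what na.rm = TRUE changes, here is a minimal sketch on a made-up vector. The name toy_delays and its values are illustrative assumptions, not data from flights.

```r
# Hypothetical arrival delays in minutes; NA stands for a flight
# with no recorded arrival delay (values are made up)
toy_delays <- c(10, NA, -4, 6)

# Without na.rm, a single NA makes the whole mean NA
mean_with_na <- mean(toy_delays)

# With na.rm = TRUE, NA values are dropped before averaging
mean_without_na <- mean(toy_delays, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```

Dropping the NA rows means each carrier's average reflects only flights that have a recorded arrival delay, which is worth keeping in mind when reading the chart.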

Join with Airlines

delay_data <- left_join(delay_data, airlines, by = "carrier")
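
Because left_join() keeps every row of the left table, a carrier code with no match in airlines would end up with NA in the name column. The sketch below shows one way to check for that, using made-up stand-in tables (toy_delays, toy_airlines, and the carrier code "ZZ" are hypothetical).

```r
library(dplyr)

# Toy stand-ins for delay_data and airlines (values are made up)
toy_delays <- data.frame(
  carrier = c("AA", "ZZ"),
  avg_arr_delay = c(5.2, -1.3)
)
toy_airlines <- data.frame(
  carrier = c("AA", "DL"),
  name = c("American Airlines Inc.", "Delta Air Lines Inc.")
)

# left_join keeps both rows of toy_delays; "ZZ" has no match, so name is NA
joined <- left_join(toy_delays, toy_airlines, by = "carrier")

# anti_join returns the rows of toy_delays with no match in toy_airlines
unmatched <- anti_join(toy_delays, toy_airlines, by = "carrier")

print(joined)
print(unmatched)
```

Running the same anti_join() on delay_data and airlines should return zero rows if every carrier has a name.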

Sort the Data from Highest to Lowest Average Delay

delay_data <- arrange(delay_data, desc(avg_arr_delay))

Create Bar Plot

ggplot(
  delay_data,
  aes(
    x = reorder(name, avg_arr_delay),  # Order airlines by average delay
    y = avg_arr_delay,
    fill = avg_arr_delay > 0           # TRUE when the average flight arrives late
  )
) +
  geom_col() +  # Draw bars whose heights are the y values
  coord_flip() +  # Flip axes for easier reading
  labs(
    title = "Average Arrival Delay by Airline",
    x = "Airline",
    y = "Average Arrival Delay (minutes)",
    caption = "Source: nycflights23 package"
  ) +
  scale_fill_manual(
    name = "Delay Status",
    values = c("TRUE" = "red", "FALSE" = "blue"),
    # Labels follow the break order: FALSE first, then TRUE
    labels = c("On Time or Early", "Delayed")
  ) +
  theme_minimal()
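
The bar order in the plot comes from reorder(), not from the earlier arrange() step, because ggplot2 orders a discrete axis by factor levels rather than by row order. A small sketch with made-up names and values (toy_names and toy_avg are hypothetical) shows the levels reorder() produces:

```r
# Hypothetical airline names and average delays (made-up values)
toy_names <- c("Airline A", "Airline B", "Airline C")
toy_avg <- c(3.5, -2.0, 9.1)

# reorder() returns a factor whose levels are sorted by the
# second argument in ascending order
ordered_names <- reorder(toy_names, toy_avg)

print(levels(ordered_names))
```

With coord_flip(), the first level is drawn at the bottom, so the airline with the largest average delay ends up at the top of the chart.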
