NYC Flights Assignment

Author

Telesphore Kabore

Use the nycflights23 Dataset to Explore Airline Carriers' On-Time Performance

library(tidyverse)
library(nycflights23)

Data loading

data(flights)
data(airlines)

Create an initial scatterplot with loess smoother for distance to delays

Use “group_by” together with summarize functions

# Remove rows with missing distance, arrival delay, or departure delay
flights_nona <- flights |>
  filter(!is.na(distance), !is.na(arr_delay), !is.na(dep_delay))
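One possible sketch of the scatterplot described above: average the arrival delay for each distinct distance with group_by() and summarize(), then plot the averages with a loess smoother. The names delay_by_distance, mean_arr_delay, n_flights, and p_distance are assumptions made for this illustration, as is the choice of arrival delay (rather than departure delay) on the y-axis.

```r
library(tidyverse)
library(nycflights23)

# Assumed helper table: one row per distinct distance, with the mean
# arrival delay and the number of flights at that distance
delay_by_distance <- flights |>
  filter(!is.na(distance), !is.na(arr_delay), !is.na(dep_delay)) |>
  group_by(distance) |>
  summarize(mean_arr_delay = mean(arr_delay),
            n_flights = n())

# Scatterplot of mean arrival delay against distance with a loess smoother
p_distance <- ggplot(delay_by_distance, aes(x = distance, y = mean_arr_delay)) +
  geom_point(aes(size = n_flights), alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Distance (miles)",
       y = "Mean arrival delay (minutes)",
       size = "Flights") +
  theme_minimal()

p_distance
```

Summarizing first keeps the plot readable, since plotting every individual flight would put several hundred thousand points on one panel.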

Join the flights_nona dataset with the airlines dataset

Also remove “Inc.” or “Co.” from the Carrier Name

flights2 <- left_join(flights_nona, airlines, by = "carrier")
# Drop "Inc." / "Co." and trim the space they leave behind
flights2$name <- trimws(gsub("Inc\\.|Co\\.", "", flights2$name))
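To see what the suffix removal does to individual names, here is a small base-R sketch on three example carrier names. The helper clean_name() and the end-anchored pattern are assumptions for illustration, not part of the assignment code; anchoring at the end of the string removes the suffix together with the space before it.

```r
# Example carrier names used only for illustration
example_names <- c("Delta Air Lines Inc.", "Southwest Airlines Co.", "JetBlue Airways")

# Removing only the suffix leaves a trailing space behind
gsub("Inc\\.|Co\\.", "", example_names[1])
# → "Delta Air Lines "

# Hypothetical helper: strip a trailing " Inc." or " Co." including the space
clean_name <- function(x) sub("\\s+(Inc|Co)\\.$", "", x)

cleaned <- clean_name(example_names)
cleaned
# → "Delta Air Lines" "Southwest Airlines" "JetBlue Airways"
```

Trailing spaces do not show on the chart axis, but they can cause silent mismatches if the cleaned names are later compared or joined against other tables.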

Graph On-Time Performance using Departure Delay and Arrival Delay

Calculate the percentage of flights with a delay of 10 minutes or less (OTP)

delay_OTP <- flights2 |>
  group_by(name) |>
  summarize(
    Departure_Percentage = sum(dep_delay <= 10) / n() * 100,
    Arrival_Percentage = sum(arr_delay <= 10) / n() * 100
  )
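The percentage calculation can be checked by hand on a tiny made-up table. The sketch below uses a hypothetical toy_flights tibble (not real flight data) and the equivalent shortcut mean(condition) * 100, which gives the same result as sum(condition) / n() * 100 because TRUE counts as 1 and FALSE as 0.

```r
library(dplyr)

# Hypothetical toy data: 4 flights for carrier "A", 3 for carrier "B"
toy_flights <- tibble(
  name      = c("A", "A", "A", "A", "B", "B", "B"),
  dep_delay = c(-5, 0, 12, 30, 5, 10, 11),
  arr_delay = c(-10, 15, 8, 40, 0, 20, 10)
)

toy_otp <- toy_flights |>
  group_by(name) |>
  summarize(
    Departure_Percentage = mean(dep_delay <= 10) * 100,
    Arrival_Percentage   = mean(arr_delay <= 10) * 100
  )

toy_otp
# Carrier A: 2 of 4 flights within 10 minutes, so 50% for both columns
# Carrier B: 2 of 3 flights within 10 minutes, so about 66.7% for both columns
```

Note that a delay of exactly 10 minutes counts as on time here, matching the `<= 10` comparison in the code above.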

Create a bidirectional horizontal bar chart

ggplot(delay_OTP, aes(y = reorder(name, Departure_Percentage))) +
  # Departures extend to the left (negative x), arrivals to the right
  geom_col(aes(x = -Departure_Percentage, fill = "Departure_Percentage"), width = .75) +
  geom_col(aes(x = Arrival_Percentage, fill = "Arrival_Percentage"), width = .75) +
  geom_text(aes(x = -Departure_Percentage,
                label = paste0(round(Departure_Percentage, 0), "%")),
            hjust = 1.1, size = 3.5) +  # departure % labels
  geom_text(aes(x = Arrival_Percentage,
                label = paste0(round(Arrival_Percentage, 0), "%")),
            hjust = -.1, size = 3.5) +  # arrival % labels
  labs(x = "Departures < On-Time Performance > Arrivals",
       y = "Carrier",
       title = "On-Time Performance of Airline Carriers\n(Percent of Flights with at Most 10 Minutes Delay)",
       caption = "Source: FAA") +
  scale_fill_manual(
    name = "Performance",
    breaks = c("Departure_Percentage", "Arrival_Percentage"),  # order of legend items
    values = c("Departure_Percentage" = "#8bd3c7", "Arrival_Percentage" = "#beb9db"),
    labels = c("Departure_Percentage" = "Departure", "Arrival_Percentage" = "Arrival")
  ) +
  scale_x_continuous(labels = abs, limits = c(-120, 120)) +  # show absolute values on both sides
  theme_minimal()
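An alternative way to build the same kind of bidirectional chart is to reshape the summary into long format and give departures a negative sign, so a single geom_col() layer draws both sides. This is only a sketch under stated assumptions: otp_demo is a hypothetical two-carrier table with made-up percentages, and otp_long, direction, pct, signed_pct, and p_long are names introduced for the illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical summary table with made-up percentages
otp_demo <- tibble(
  name = c("Carrier A", "Carrier B"),
  Departure_Percentage = c(80, 70),
  Arrival_Percentage = c(75, 85)
)

# Long format: one row per carrier and direction; departures get a negative sign
otp_long <- otp_demo |>
  pivot_longer(c(Departure_Percentage, Arrival_Percentage),
               names_to = "direction", values_to = "pct") |>
  mutate(signed_pct = if_else(direction == "Departure_Percentage", -pct, pct))

# One bar layer draws both sides; abs() hides the negative sign on the axis
p_long <- ggplot(otp_long, aes(x = signed_pct, y = name, fill = direction)) +
  geom_col(width = .75) +
  scale_x_continuous(labels = abs) +
  theme_minimal()

p_long
```

With the data in long format, the fill mapping comes from a real column rather than from constant strings inside aes(), which makes the legend easier to maintain.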