NYC Flights HW Assignment

Author

C. Crabbe

Load the Libraries and Data

library(tidyverse)
library(nycflights23)
data(flights)
data(airlines)
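Before summarizing, it may help to check two things the next step relies on: how many flights lack a departure delay (most likely flights that never departed, such as cancellations), and whether every carrier code in `flights` has a matching row in `airlines`. The sketch below assumes `nycflights23` is installed. The object names `n_missing_delay` and `unmatched_carriers` are hypothetical and only used for illustration.

```r
library(dplyr)
library(nycflights23)

# Count flights with no recorded departure delay;
# these rows are dropped before averaging in the next step
n_missing_delay <- sum(is.na(flights$dep_delay))

# Carrier codes in flights that have no matching row in airlines.
# An empty result means left_join() can attach a name to every carrier.
unmatched_carriers <- flights |>
  distinct(carrier) |>
  anti_join(airlines, by = "carrier")

n_missing_delay
unmatched_carriers
```

If `unmatched_carriers` has zero rows, the `name` column produced by the join will have no missing values.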

Filtering Average Delays by Airline

Include at least one dplyr verb (filter, arrange, summarize, group_by, select, mutate, …)

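flight_delays_named <- flights |>
  # Drop flights with no recorded departure delay
  filter(!is.na(dep_delay)) |>
  # Average departure delay and number of flights for each carrier
  group_by(carrier) |>
  summarize(
    avg_delay = mean(dep_delay),
    total_flights = n()
  ) |>
  # Keep only carriers with more than 1,000 flights
  filter(total_flights > 1000) |>
  # Attach the full airline name from the airlines table
  left_join(airlines, by = "carrier")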
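To read the summary as a ranking, the same pipeline can be extended with `arrange()`, `mutate()`, and `select()`. This is a minimal sketch that rebuilds `flight_delays_named` so it runs on its own. The `over_10_min` column and the `delay_ranking` object are hypothetical additions that mirror the 10-minute threshold used in the plot.

```r
library(dplyr)
library(nycflights23)

# Rebuild the per-airline summary (same steps as above)
flight_delays_named <- flights |>
  filter(!is.na(dep_delay)) |>
  group_by(carrier) |>
  summarize(avg_delay = mean(dep_delay), total_flights = n()) |>
  filter(total_flights > 1000) |>
  left_join(airlines, by = "carrier")

# Rank airlines from most to least delayed and flag the 10-minute threshold
delay_ranking <- flight_delays_named |>
  arrange(desc(avg_delay)) |>
  mutate(over_10_min = avg_delay > 10) |>
  select(name, carrier, avg_delay, total_flights, over_10_min)

delay_ranking
```

Printing `delay_ranking` lists the same airlines that appear in the bar plot, in order from highest to lowest average delay.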

Bar Plot

This bar plot shows the average departure delay for each airline with more than 1,000 flights. Airlines whose average delay exceeds 10 minutes are highlighted in red.

# Bars are ordered by average delay; fill marks airlines averaging more than 10 minutes
ggplot(flight_delays_named, aes(x = reorder(name, avg_delay), y = avg_delay, fill = avg_delay > 10)) +
  # geom_col() plots the precomputed averages directly (same as geom_bar(stat = "identity"))
  geom_col() +
  labs(
    title = "Average Departure Delay by Airline",
    x = "Airline",
    y = "Average Delay (minutes)",
    caption = "Source: nycflights23 dataset"
  ) +
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "steelblue"), name = "Delay > 10 min") +
  theme_minimal() +
  # Rotate airline names so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
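One possible variation, offered as a sketch rather than a required change, is to draw horizontal bars so long airline names stay readable without rotation, and to add a dashed reference line at the 10-minute threshold so the color split is easier to interpret. The code rebuilds the summary so it is self-contained; `delay_plot` is a hypothetical name.

```r
library(dplyr)
library(ggplot2)
library(nycflights23)

flight_delays_named <- flights |>
  filter(!is.na(dep_delay)) |>
  group_by(carrier) |>
  summarize(avg_delay = mean(dep_delay), total_flights = n()) |>
  filter(total_flights > 1000) |>
  left_join(airlines, by = "carrier")

# Horizontal bars: the airline goes on the y axis, the average delay on the x axis.
# The dashed vertical line marks the 10-minute threshold used for the fill color.
delay_plot <- ggplot(
  flight_delays_named,
  aes(x = avg_delay, y = reorder(name, avg_delay), fill = avg_delay > 10)
) +
  geom_col() +
  geom_vline(xintercept = 10, linetype = "dashed") +
  scale_fill_manual(
    values = c("TRUE" = "red", "FALSE" = "steelblue"),
    name = "Delay > 10 min"
  ) +
  labs(
    title = "Average Departure Delay by Airline",
    x = "Average Delay (minutes)",
    y = "Airline",
    caption = "Source: nycflights23 dataset"
  ) +
  theme_minimal()

delay_plot
```

Because the discrete variable is mapped to `y`, `geom_col()` infers a horizontal orientation, so no `coord_flip()` call is needed.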