Load the Libraries and Data
library(tidyverse)
library(nycflights23)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data(flights)
data(airlines)
Filtering Average Delays by Airline
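This step uses several dplyr verbs (filter, group_by, summarize) plus a left_join. It drops flights with a missing departure delay, computes the average delay and the number of flights per carrier, keeps carriers with more than 1,000 flights, and attaches each carrier's full airline name.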
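flight_delays_named <- flights |>
  # Drop flights with no recorded departure delay so mean() does not return NA
  filter(!is.na(dep_delay)) |>
  group_by(carrier) |>
  summarize(
    avg_delay = mean(dep_delay),
    total_flights = n()
  ) |>
  # Keep only carriers with more than 1,000 flights
  filter(total_flights > 1000) |>
  # Add the full airline name for each carrier code
  left_join(airlines, by = "carrier")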
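To make the pipeline easier to follow, here is a minimal sketch of the same pattern on a tiny made-up dataset. The tables `toy_flights` and `toy_airlines` and their values are hypothetical stand-ins, not rows from nycflights23, and the flight-count threshold is lowered to 1 so the toy data can pass it.

```r
library(dplyr)

# Hypothetical toy data standing in for flights and airlines
toy_flights <- tibble(
  carrier   = c("AA", "AA", "AA", "UA", "UA", "DL"),
  dep_delay = c(5, 15, NA, 20, 30, 2)
)
toy_airlines <- tibble(
  carrier = c("AA", "UA", "DL"),
  name    = c("Airline A", "Airline U", "Airline D")
)

toy_summary <- toy_flights |>
  filter(!is.na(dep_delay)) |>          # the NA row for AA is dropped here
  group_by(carrier) |>
  summarize(
    avg_delay = mean(dep_delay),
    total_flights = n()
  ) |>
  filter(total_flights > 1) |>          # DL has only one flight, so it is removed
  left_join(toy_airlines, by = "carrier")

print(toy_summary)
```

Filtering out missing delays before summarizing means `total_flights` counts only flights with a recorded delay, which is worth keeping in mind when reading the threshold.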
Bar Plot
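This bar chart shows the average departure delay for each airline, ordered from lowest to highest. Airlines whose average delay exceeds 10 minutes are shown in red; all others are shown in steel blue.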
ggplot(
  flight_delays_named,
  aes(x = reorder(name, avg_delay), y = avg_delay, fill = avg_delay > 10)
) +
  geom_col() +
  labs(
    title = "Average Departure Delay by Airline",
    x = "Airline",
    y = "Average Delay (minutes)",
    caption = "Source: nycflights23 dataset"
  ) +
  scale_fill_manual(
    values = c("TRUE" = "red", "FALSE" = "steelblue"),
    name = "Delay > 10 min"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
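The fill colors depend on the logical expression `avg_delay > 10`, which produces the TRUE and FALSE values that `scale_fill_manual()` matches by name. As a rough sketch of that logic, the same condition can be stored as an explicit column and inspected before plotting. The table `toy_delays` and the column name `over_10` are hypothetical, with made-up values.

```r
library(dplyr)

# Hypothetical summary table; the values are made up for illustration
toy_delays <- tibble(
  name      = c("Airline A", "Airline B", "Airline C"),
  avg_delay = c(8.5, 10, 17.2)
)

# Same condition as in aes(fill = avg_delay > 10), stored as a column
toy_flagged <- toy_delays |>
  mutate(over_10 = avg_delay > 10) |>
  arrange(avg_delay)

print(toy_flagged)
```

Because the comparison is strict, an airline averaging exactly 10 minutes is flagged FALSE and would be drawn in steel blue.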