# Load relevant libraries
library(tidyverse, quietly = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
data(flights)
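# Keep only flights with a recorded distance and arrival delay
flights_no_na <- flights %>%
  filter(!is.na(distance) & !is.na(arr_delay))
# Flight count and mean arrival delay for each origin airport
by_airport <- flights_no_na %>%
  group_by(origin) %>%
  summarize(count = n(),
            delay = mean(arr_delay))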
# Which origin airports are linked with the longest delays? On average, the arrival delay is 5-10 minutes at each airport.
# What if we take out the early and on-time flights? When flights are delayed, how badly are they delayed?
# Explore the average delay by carrier, then once more using only delayed flights
# Delayed flights only (arr_delay > 0)
by_airline_3 <- flights_no_na %>%
  filter(arr_delay > 0) %>%
  group_by(carrier) %>%
  summarize(count = n(),
            delay = mean(arr_delay))
# All flights, including early and on-time arrivals
by_airline_4 <- flights_no_na %>%
  group_by(carrier) %>%
  summarize(count = n(),
            delay = mean(arr_delay))
# Combine the two summaries to show how the mean delay changes when early/on-time flights are ignored
by_airline_3$type <- "Delay > 0"
by_airline_4$type <- "All flights"
combined_delays <- rbind(by_airline_3, by_airline_4)
combined_delays <- combined_delays %>%
  arrange(carrier, type)
ggplot(combined_delays, aes(x = carrier, y = delay, color = type)) +
  theme_minimal() +
  geom_point(size = 2) +
  # Connect each carrier's two means with a vertical line
  geom_line(aes(group = carrier), linetype = 1, color = "royalblue1") +
  labs(x = "Airline", y = "Average Delay in Minutes",
       title = "Average Delay by Airline",
       caption = "Source: https://github.com/hadley/nycflights13 (flights that departed NYC in 2013)",
       color = "Type") +
  scale_color_manual(values = c("Delay > 0" = "navyblue", "All flights" = "forestgreen"))
My plot shows how airlines can use mean delay times to position themselves as more punctual than the data might indicate. Delta Air Lines (DL) looks rather punctual when its mean delay is averaged over 47,000+ flights, early and on-time arrivals included. Hawaiian Airlines (HA) and Alaska Airlines (AS) could market themselves as efficient airlines on the strength of their negative mean delay times. When either of the two airlines is delayed, however, its delay times are in line with those of the other carriers on the plot.
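
To check these figures without reading them off the plot, the two means can be placed side by side for each carrier. The block below is a minimal sketch and not part of the original analysis: `delay_summary`, `mean_all`, `mean_late`, and `share_late` are names introduced here for illustration, and the filtering step is repeated so the block runs on its own.

```r
library(dplyr)
library(nycflights13)

# Same filtering as in the analysis above
flights_no_na <- flights %>%
  filter(!is.na(distance) & !is.na(arr_delay))

# One row per carrier: mean delay over all flights, mean delay over
# late flights only, and the share of flights that arrived late.
# All column names below are illustrative, not from the original code.
delay_summary <- flights_no_na %>%
  group_by(carrier) %>%
  summarize(count      = n(),
            mean_all   = mean(arr_delay),
            mean_late  = mean(arr_delay[arr_delay > 0]),
            share_late = mean(arr_delay > 0)) %>%
  arrange(mean_all)

# Look at the three carriers discussed in the text
delay_summary %>%
  filter(carrier %in% c("AS", "DL", "HA"))
```

Sorting by `mean_all` puts the carriers with the lowest overall mean first, and `share_late` shows what fraction of each carrier's flights arrived late, which the mean alone hides.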