Heatmaps, Treemaps, Streamgraphs, and Alluvials Homework

Author

Ryan Seabold

Heatmaps, Treemaps, Streamgraphs, and Alluvials

Set the working directory

setwd("C:/Users/Ryan/OneDrive/School/DATA 110/Homework/Heatmaps Homework")

Load the libraries

library(tidyverse)
library(nycflights23)

data <- flights

Prepare the data using dplyr and tidyr

# Average departure and arrival delay (minutes) for each origin airport
airport_delays <- flights |>
  group_by(origin) |>
  summarise(mean_dep_delay = mean(dep_delay, na.rm = TRUE),
            mean_arr_delay = mean(arr_delay, na.rm = TRUE))

# Convert to a longer dataset: one row per airport and delay type
airport_delays_long <- airport_delays |>
  pivot_longer(cols = c(mean_dep_delay, mean_arr_delay),
               names_to = "delay_type",
               values_to = "mean_delay")
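To make the reshaping step concrete, here is a minimal sketch on a made-up three-row summary (the delay values are placeholders, not results from nycflights23). Each airport row becomes two rows, one per delay type, which is the shape ggplot2 needs in order to map `delay_type` to `fill`.

```r
library(tibble)
library(tidyr)

# Placeholder values only, used to show the change in shape
wide <- tibble(
  origin = c("EWR", "JFK", "LGA"),
  mean_dep_delay = c(1, 2, 3),
  mean_arr_delay = c(4, 5, 6)
)

# 3 rows x 2 delay columns -> 6 rows with delay_type / mean_delay columns
long <- wide |>
  pivot_longer(cols = c(mean_dep_delay, mean_arr_delay),
               names_to = "delay_type",
               values_to = "mean_delay")

long
```

Rows come out grouped by airport, with the departure row before the arrival row, because `pivot_longer()` walks each input row across the selected columns in order.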

Create a visualization of the average delay times for each airport

# Grouped bar graph: one pair of bars (departure, arrival) per airport
ggplot(airport_delays_long, aes(x = origin, y = mean_delay, fill = delay_type)) +
  geom_col(position = "dodge") +
  # Unnamed labels follow the alphabetical order of delay_type (arr before dep)
  scale_fill_manual(values = c("mean_arr_delay" = "red", "mean_dep_delay" = "blue"),
                    labels = c("Arrival Delay", "Departure Delay")) +
  labs(title = "Average Departure and Arrival Delays by Airport",
       x = "Airport",
       y = "Average Delay (minutes)",
       fill = "Delay Type",
       caption = "Data source: nycflights23")

Analysis paragraph

This bar graph shows the average departure and arrival delays at each of the three New York City airports. I chose a grouped bar graph because it makes it easy to compare two measures across several categories on the same scale.

As the graph shows, in 2023 the average arrival delays were considerably lower than the average departure delays. Interestingly, while EWR and JFK had very similar delay times, LGA's average delays were noticeably shorter for both arrivals and departures.
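One way to put a number on the departure/arrival difference is to compute the gap for each airport. The helper below, `delay_gap()`, is a hypothetical function written for illustration, not part of the original analysis; it repeats the summary from the data-preparation step and adds a `gap` column.

```r
library(dplyr)

# Hypothetical helper: mean departure and arrival delay per origin airport,
# plus their difference (positive gap = departures delayed more than arrivals)
delay_gap <- function(df) {
  df |>
    group_by(origin) |>
    summarise(mean_dep_delay = mean(dep_delay, na.rm = TRUE),
              mean_arr_delay = mean(arr_delay, na.rm = TRUE),
              .groups = "drop") |>
    mutate(gap = mean_dep_delay - mean_arr_delay) |>
    arrange(desc(gap))
}
```

For example, `delay_gap(flights)` would return one row per origin airport, sorted from the largest gap to the smallest.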

On further research, I found that these delays do not track the airports' traffic. In 2022, EWR handled 401,422 aircraft movements (arrivals plus departures), JFK 449,223, and LGA 349,298. EWR also served 43.4 million passengers, JFK 5.2 million, and LGA 32.4 million. Despite these differences in movements and passengers, JFK and EWR have very similar average delay times. LGA handles less traffic in both passengers and aircraft movements and also has shorter average delays, but its delays are much lower than they would be if delays scaled in proportion to its passenger and aircraft traffic.
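The proportionality argument can be checked with a little arithmetic: express each airport's 2022 aircraft movements and its mean departure delay as a fraction of the busiest airport's (JFK, by movements). If delays scaled with traffic, the two ratios would be close. This is a minimal sketch that assumes the `airport_delays` summary built earlier; `movements` and `compare_to_traffic()` are hypothetical names introduced here.

```r
library(dplyr)

# 2022 aircraft movements quoted in the paragraph above
movements <- tibble(
  origin = c("EWR", "JFK", "LGA"),
  movements_2022 = c(401422, 449223, 349298)
)

# Hypothetical helper: join mean delays to traffic, then express both
# relative to the airport with the most movements
compare_to_traffic <- function(delays, traffic) {
  delays |>
    inner_join(traffic, by = "origin") |>
    mutate(
      movement_ratio = movements_2022 / max(movements_2022),
      dep_delay_ratio = mean_dep_delay / mean_dep_delay[which.max(movements_2022)]
    )
}
```

Calling `compare_to_traffic(airport_delays, movements)` puts the two ratios side by side. LGA's movement ratio is about 0.78 (349,298 / 449,223), so a delay ratio well below that would support the point that its delays are lower than traffic alone explains.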