Heatmap HW

Author

J Amaya

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)

data(flights)
data(airports)
library(viridis)
Loading required package: viridisLite
flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))  

Group by month (rows with NA already removed)

by_month <- flights_nona |>
  group_by(month) |>
  summarise(
    avg_arr_delay = mean(arr_delay),
    avg_dep_delay = mean(dep_delay),
    .groups = "drop"
  )
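
The NA filter matters for this summary because `mean()` returns `NA` as soon as one input is missing. A minimal base R sketch on made-up numbers (not the flights data) shows the behavior. Passing `na.rm = TRUE` is an alternative to filtering first, though it would drop missing values column by column rather than removing whole rows.

```r
# Toy delays in minutes (made-up values, not taken from nycflights23)
delays <- c(10, NA, 30)

mean(delays)                # NA, because one value is missing
mean(delays, na.rm = TRUE)  # 20, the mean of the non-missing values
```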

Add labels to months

by_month$month <- month(by_month$month, label = TRUE, abbr = TRUE)

Line graph

ggplot(by_month, aes(x = month)) +
  geom_line(aes(y = avg_arr_delay, color = "Arrival Delay", group = 1), linewidth = 1) +
  geom_line(aes(y = avg_dep_delay, color = "Departure Delay", group = 1), linewidth = 1) +
  labs(x = "Month",
       y = "Average Delay (minutes)",
       title = "Seasonal Delays - Average Flight Delays by Month in NYC, 2023",
       caption = "Source: FAA, nycflights23 dataset") +
  scale_color_manual(values = c("Arrival Delay" = "pink", "Departure Delay" = "purple")) +
  theme_minimal()

I finished this line graph to visualize the average delays at the NYC airports by month. The summer months turned out to have the worst delays. I did not use this graph as my final visualization because I realized afterward that the homework instructions did not list a line graph as an option. So, I made the bar graph below.

Group by Destinations

by_dest <- flights_nona |>
  group_by(dest) |>
  summarise(
    count = n(),
    avg_dist = mean(distance),
    avg_arr_delay = mean(arr_delay),
    .groups = "drop"
  )

Keep only the 10 destination airports with the worst average arrival delay

worst_delays <- by_dest |>
  arrange(desc(avg_arr_delay)) |>
  head(10)

Change 3-letter airport names to full names

worst_delays_named <- left_join(worst_delays, airports, by = c("dest" = "faa"))
ggplot(worst_delays_named, aes(x = reorder(name, avg_arr_delay), y = avg_arr_delay, fill = avg_arr_delay)) +
  geom_col() +
  geom_text(aes(label = round(avg_arr_delay, 1)),
            size = 3.5,
            hjust = -0.1,
            color = "black") +
  coord_flip() +
  labs(
    x = "Destination Airport",
    y = "Average Arrival Delay (minutes)",
    title = "Top 10 Destinations by Worst Average Arrival Delay, Flights from NYC (2023)",
    fill = "Average\nDelay",
    caption = "Source: FAA, nycflights23 dataset"
  ) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  theme_minimal()

Bar Graph Visualization

I created a bar graph to visualize the 10 destination airports with the worst average arrival delays for flights leaving NYC. To prepare the data, I grouped the flights by destination and calculated the total flight count, average distance, and average arrival delay for each one. Then I arranged the results in descending order of average arrival delay and kept the ten highest with head(10). To make the graph easier to understand, I joined the result with the airports table so that it displays the full airport name instead of the three-letter FAA code. I also researched how to rotate the graph because the x-axis was very crowded with airport names, and I found coord_flip(), which flips the graph sideways.
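
As a side note, the "sort descending, then take the first ten" step can also be written with dplyr's `slice_max()`. The sketch below uses a made-up table (`toy_dest` and its values are hypothetical, standing in for `by_dest`) and keeps the top 2 rows so the result is easy to check by eye.

```r
library(dplyr)

# Toy summary table (made-up values, standing in for by_dest)
toy_dest <- tibble(
  dest = c("AAA", "BBB", "CCC", "DDD"),
  avg_arr_delay = c(5, 20, 12, -3)
)

# Keep the 2 rows with the highest average arrival delay
toy_worst <- toy_dest |>
  slice_max(avg_arr_delay, n = 2)

toy_worst$dest
```

One difference from `arrange()` plus `head()`: by default `slice_max()` keeps ties, so it can return more than `n` rows when several destinations share the cutoff value.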

One key aspect of this visualization is the color gradient, which goes from light to dark as the average delay increases. The lighter shades indicate the airports with shorter average delays, and the darkest tones represent the airports with the most severe delays. The color palette not only makes the graph more interesting but also helps the audience quickly distinguish the levels of delay.
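
A quick way to check which end of the reversed palette is light is to sample it directly. This is only a rough sketch: it assumes the viridisLite package (loaded with viridis above) and uses the mean of the RGB channels as a simple stand-in for brightness.

```r
library(viridisLite)

# Three colors from the reversed magma palette, in low-to-high order
pal <- magma(3, direction = -1)

# Rough brightness: mean of the red, green, and blue channels (0 to 255)
brightness <- colMeans(col2rgb(pal))

# Compare the low end of the scale with the high end
brightness[1] > brightness[3]
```

If the comparison is TRUE, the low end of the scale is the lighter one, which matches the reading of the bar graph given above.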

Resource: https://www.sthda.com/english/wiki/ggplot2-rotate-a-graph-reverse-and-flip-the-plot#google_vignette