nycflights13

Author

Kevin Sanchez

libraries are loaded

library(tidyverse)
library(nycflights13)

view dataset

View(flights)
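`View()` opens RStudio's spreadsheet-style data viewer, which is meant for interactive sessions. The sketch below is an addition, not part of the original analysis. It uses `glimpse()` to print the columns and their types to the console, and it counts the flights in the three months used for the heat map. The name `aug_oct_counts` is hypothetical.

```r
library(dplyr)
library(nycflights13)

# Print every column with its type and first few values (works outside RStudio too)
glimpse(flights)

# Number of flights in each of the months used for the heat map
aug_oct_counts <- flights %>%
  filter(month %in% c(8, 9, 10)) %>%
  count(month)
aug_oct_counts
```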

dplyr commands: filter to August through October and average the departure delay by day and hour

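heatmap_data <- flights %>%
  # keep August, September, and October flights with a recorded departure delay
  filter(month %in% c(8, 9, 10), !is.na(dep_delay)) %>%
  # one group per day of the month and scheduled departure hour
  group_by(day, hour) %>%
  # .groups = "drop" returns an ungrouped tibble and silences the regrouping message
  summarize(avg_delay = mean(dep_delay), .groups = "drop")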
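Because `month` is not part of `group_by(day, hour)`, each tile pools the same day number across all three months. For example, day 1 combines 1 August, 1 September, and 1 October. If a per-month view is wanted, one option (an assumption about the goal, not the original method) is to keep `month` in the grouping and draw one panel per month. The names `monthly_heatmap_data`, `month_name`, and `monthly_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Keep month in the grouping so each tile averages a single calendar date
# instead of pooling the same day number across the three months
monthly_heatmap_data <- flights %>%
  filter(month %in% c(8, 9, 10), !is.na(dep_delay)) %>%
  group_by(month, day, hour) %>%
  summarize(avg_delay = mean(dep_delay), .groups = "drop") %>%
  # readable facet labels in calendar order
  mutate(month_name = factor(month.name[month], levels = month.name[8:10]))

# One panel per month, using the same gold-to-red gradient as the pooled heat map
monthly_plot <- ggplot(monthly_heatmap_data, aes(x = hour, y = day, fill = avg_delay)) +
  geom_tile() +
  scale_fill_gradient(low = "gold", high = "red") +
  facet_wrap(~ month_name, ncol = 1) +
  labs(x = "Scheduled Departure Hour", y = "Day of Month", fill = "Avg. delay (min)") +
  theme_minimal()
```

Printing `monthly_plot` draws the three panels. Each panel's colors are still on one shared scale, so the months can be compared directly.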

the heat map is created and labels are added

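heatmap_plot <- ggplot(heatmap_data, aes(x = hour, y = day, fill = avg_delay)) +
  geom_tile() +
  # light gold = short average delay, red = long average delay
  scale_fill_gradient(low = "gold", high = "red") +
  labs(
    x = "Scheduled Departure Hour (24-hour clock)",
    y = "Day of Month",
    fill = "Avg. delay (min)",
    title = "Average Departure Delay Heatmap",
    subtitle = "NYC departures, August to October 2013",
    caption = "Data source: nycflights13"
  ) +
  theme_minimal() +
  theme(legend.position = "right")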

heatmap_plot
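The `hour` column uses a 24-hour clock. If 12-hour labels read more naturally on the x-axis, a small conversion helper can be used. The sketch below is an assumption: `hour_to_12h` is a hypothetical helper, not part of ggplot2 or nycflights13.

```r
# Convert 24-hour values (0-23) to 12-hour labels such as "8 pm"
hour_to_12h <- function(h) {
  suffix <- ifelse(h < 12, "am", "pm")
  h12 <- h %% 12
  h12[h12 == 0] <- 12  # midnight and noon display as 12
  paste(h12, suffix)
}

hour_to_12h(c(0, 5, 12, 20, 23))
# [1] "12 am" "5 am"  "12 pm" "8 pm"  "11 pm"
```

The helper can be passed as a label function, for example `heatmap_plot + scale_x_continuous(breaks = seq(6, 22, by = 4), labels = hour_to_12h)`. Setting explicit breaks keeps the labels on whole hours.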

summary

I created a heat map that shows the average departure delay for each hour of the day and each day of the month. The code filters the data set to August, September, and October. It then groups the flights by day and hour and calculates the average departure delay for each group. Because month is not part of the grouping, each tile pools the same day number across all three months. The x-axis shows the scheduled departure hour on a 24-hour clock. Hours 13 to 23 convert to 12-hour time by subtracting 12, so hour 20 is 8 pm. The y-axis shows the day of the month. The legend maps color to delay: lighter gold tiles indicate a short average delay and darker red tiles indicate a long one. The feature I want to highlight is where the darkest tiles fall. The highest average delays appear in the first few days of the month, at around the same time of day.
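To confirm which cells produce the darkest tiles, rather than judging by color alone, one option is to sort the summary table. This is an addition, not part of the original analysis. The aggregation is recomputed so that the block runs on its own, and the name `top_cells` is hypothetical.

```r
library(dplyr)
library(nycflights13)

# Recreate the day-by-hour averages (August to October pooled, as in the heat map)
heatmap_data <- flights %>%
  filter(month %in% c(8, 9, 10), !is.na(dep_delay)) %>%
  group_by(day, hour) %>%
  summarize(avg_delay = mean(dep_delay), .groups = "drop")

# Day/hour cells with the ten highest average delays (ties can return extra rows)
top_cells <- heatmap_data %>%
  slice_max(avg_delay, n = 10)
top_cells
```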