NYC Flights Homework
Load libraries and datasets
library(tidyverse)     # tidyverse 2.0+ also attaches lubridate, which provides month()
library(nycflights23)
library(viridis)
data(flights)
data(airlines)
Create a dataset grouped by month and carrier to find average delay by carrier each month.
#Convert abbreviations to airline names
flights <- left_join(flights, airlines, by = "carrier")
flights$name <- gsub("Inc\\.|Co\\.", "", flights$name)
#Convert month number to label
flights$month_label <- month(flights$month, label = TRUE, abbr = TRUE)
flights2 <- flights |>
  group_by(month_label, name) |>
  summarise(count = n(),
            avg_arr_delay = mean(arr_delay, na.rm = TRUE),
            avg_dep_delay = mean(dep_delay, na.rm = TRUE))
`summarise()` has grouped output by 'month_label'. You can override using the
`.groups` argument.
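The message comes from dplyr. When summarise() runs on data grouped by two variables, it drops the last grouping variable (name) and leaves the result grouped by month_label, which is why the printed tibble is still grouped. A minimal sketch on a hypothetical toy table (not the real flights data) shows that passing `.groups = "drop"` returns an ungrouped tibble and silences the message. It also shows that n() counts every row in a group, including rows whose arr_delay is NA, so `count` may include flights with no recorded arrival delay.

```r
library(dplyr)

# Hypothetical toy stand-in for `flights`: two months, two airlines,
# and one missing arrival delay (for example, a cancelled flight)
toy_flights <- tibble(
  month_label = c("Jan", "Jan", "Feb", "Feb"),
  name        = c("Airline A", "Airline A", "Airline A", "Airline B"),
  arr_delay   = c(10, NA, -5, 20)
)

toy_summary <- toy_flights |>
  group_by(month_label, name) |>
  summarise(count = n(),                                 # counts all rows, NA delays included
            avg_arr_delay = mean(arr_delay, na.rm = TRUE),
            .groups = "drop")                            # ungrouped result, no message

print(toy_summary)
is_grouped_df(toy_summary)  # FALSE
```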
head(flights2)
# A tibble: 6 × 5
# Groups:   month_label [1]
  month_label name                 count avg_arr_delay avg_dep_delay
  <ord>       <chr>                <int>         <dbl>         <dbl>
1 Jan         "Alaska Airlines "     542         -7.83         11.9
2 Jan         "Allegiant Air"         42         -8.80          3.07
3 Jan         "American Airlines "  3574          2.01         15.0
4 Jan         "Delta Air Lines "    4836          4.35         18.3
5 Jan         "Endeavor Air "       3985          8.07         15.1
6 Jan         "Envoy Air"              9         10.6          23.6
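Several names above keep a trailing space (for example "Alaska Airlines "), because gsub() removes only the "Inc." or "Co." text and leaves the space in front of it; that whitespace is why tibble prints the name column in quotes. A minimal base-R sketch, assuming the raw airlines$name values end in " Inc." or " Co." as this output suggests (the vector below is hypothetical), removes the suffix together with the preceding space by anchoring the pattern to the end of the string. Wrapping the original gsub() call in trimws() gives the same result.

```r
# Hypothetical raw names, assumed to follow the style of airlines$name
raw_names <- c("Alaska Airlines Inc.", "Allegiant Air", "Southwest Airlines Co.")

# Original approach: the space before the suffix survives
gsub("Inc\\.|Co\\.", "", raw_names)
# returns c("Alaska Airlines ", "Allegiant Air", "Southwest Airlines ")

# Anchored pattern: drop any whitespace plus "Inc." or "Co." at the end of the string
clean_names <- sub("\\s*(Inc|Co)\\.$", "", raw_names)
clean_names
# returns c("Alaska Airlines", "Allegiant Air", "Southwest Airlines")
```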
Prepare data for heatmap by converting to a matrix
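This step turns the long month-by-airline summary into a matrix with one row per airline and one column per month; any airline-month combination with no flights becomes NA, which heatmap() leaves blank. For comparison, base R's tapply() is an alternative (not the approach used in this homework) that builds the same kind of matrix in one call. The toy table below is hypothetical.

```r
# Hypothetical long summary: one row per airline and month
long_summary <- data.frame(
  name          = c("Airline A", "Airline A", "Airline B"),
  month_label   = factor(c("Jan", "Feb", "Jan"), levels = c("Jan", "Feb")),
  avg_arr_delay = c(5, -2, 12)
)

# Rows = airlines, columns = months; combinations with no rows are filled with NA
delay_matrix <- tapply(long_summary$avg_arr_delay,
                       list(long_summary$name, long_summary$month_label),
                       mean)
delay_matrix
```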
flights2_pwide <- flights2 |>
  select(month_label, name, avg_arr_delay) |>
  pivot_wider(names_from = month_label,
              values_from = avg_arr_delay)
# pivot_wider function from DATA101 and tidyr.tidyverse.org
flights2_matrix <- data.matrix(flights2_pwide[, -1])  # drop the name column, keep the monthly delays
row.names(flights2_matrix) <- flights2_pwide$name
Plot the heatmap
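flights2_heatmap <- heatmap(flights2_matrix,
                            Rowv = NA,   # no row dendrogram or reordering
                            Colv = NA,   # no column dendrogram or reordering
                            col = magma(25, direction = -1),  # darker = longer delay
                            cexCol = 0.7,
                            cexRow = 0.6,
                            main = "Heatmap of Flight Delays by Month")
The built-in heatmap() function does not include a color legend or a caption, so I rebuilt the plot with ggplot2.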
# Help from https://r-graph-gallery.com/79-levelplot-with-ggplot2.html
ggplot(flights2, aes(month_label, name, fill = avg_arr_delay)) +
  geom_tile() +
  scale_fill_viridis(name = "Avg. Arrival Delay (min)", option = "magma", direction = -1) +
  theme(panel.grid = element_blank()) + # <-- RStudio "Help" tab
  labs(title = "Heatmap of Flight Delays by Month",
       y = "",
       x = "",
       caption = "Blank tiles indicate missing data. \nSource: FAA")
Paragraph
I created a heatmap to display the average arrival delays of airlines by month. My visualization uses the “flights” dataset, which I grouped by month and airline to find each airline's mean arrival delay in minutes throughout the year. The heatmap displays the month on the x-axis and the airline name on the y-axis. The color of each tile shows the average arrival delay, with lighter tiles indicating shorter delays and darker tiles indicating longer ones. One aspect of the visualization I would like to highlight is that June and July appear to have the longest delays across airlines. This is likely due to high travel volume during the summer months. Conversely, May, October, November, and December have the shortest delays, which may suggest that fewer people travel during these months, although the heatmap shows only delays, not passenger volume. I also find it interesting that there is no data for Envoy Air from June through August, considering that the airline operates year-round.
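The two patterns described in the paragraph can also be checked directly from the summary table. The sketch below is a hedged base-R illustration on hypothetical toy data shaped like flights2 (columns month_label, name, avg_arr_delay); the helper missing_months() is made up for this example. It ranks months by the median of the airline averages, so that a single airline with extreme delays does not dominate a month, and it lists the months in which a given airline has no rows. On the real data it would be called as missing_months(flights2, "Envoy Air").

```r
# Hypothetical toy summary shaped like flights2
toy <- rbind(
  data.frame(month_label = c("Jan", "Jun", "Jul"), name = "Airline A",
             avg_arr_delay = c(2, 15, 20)),
  data.frame(month_label = c("Jan", "Jun", "Jul"), name = "Airline B",
             avg_arr_delay = c(-1, 25, 10)),
  data.frame(month_label = setdiff(month.abb, c("Jun", "Jul", "Aug")),
             name = "Envoy Air", avg_arr_delay = 4)
)

# Median of the airline averages in each month, largest first
month_medians <- sort(tapply(toy$avg_arr_delay, toy$month_label, median, na.rm = TRUE),
                      decreasing = TRUE)
month_medians

# Hypothetical helper: months in which `airline` has no rows
missing_months <- function(summary_df, airline, all_months = month.abb) {
  present <- unique(as.character(summary_df$month_label[summary_df$name == airline]))
  setdiff(all_months, present)
}

missing_months(toy, "Envoy Air")  # "Jun" "Jul" "Aug"
```

The median treats every airline equally regardless of how many flights it operates; a flight-weighted view would need the count column as weights.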