Loading the necessary packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# install.packages("nycflights13")
library(nycflights13)
data(flights)
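Before plotting, it can help to check the variable that will go on the y axis. The sketch below assumes only the packages loaded above. It counts how many `arr_delay` values are missing and how many are negative (flights that arrived early), and reports the range of delays. The name `arr_delay_check` is hypothetical.

```r
library(dplyr)
library(nycflights13)

# Summarise the y-axis variable: missing values, early arrivals, and range
arr_delay_check <- flights %>%
  summarise(
    n_missing = sum(is.na(arr_delay)),          # no recorded arrival delay
    n_early   = sum(arr_delay < 0, na.rm = TRUE), # negative delay = arrived early
    min_delay = min(arr_delay, na.rm = TRUE),
    max_delay = max(arr_delay, na.rm = TRUE)
  )

arr_delay_check
```

Both the missing values and the negative values matter for the heatmap. Missing delays are behind the `geom_tile()` warning, and negative delays are what the interpretation below refers to.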

Preparing the data

delay_heatmap <- flights %>%
  # One row per carrier/arrival-delay combination, with the number of flights
  group_by(carrier, arr_delay) %>%
  # .groups = "drop" returns an ungrouped tibble and silences the grouping message
  summarise(count = n(), .groups = "drop")
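The same table can be built with dplyr's `count()`, which groups, tallies with `n()`, and ungroups in a single call. This is a minimal sketch of that alternative, not the approach used above. `name = "count"` keeps the column name the plot expects, and the object names `delay_counts` and `delay_summarised` are hypothetical.

```r
library(dplyr)
library(nycflights13)

# count() version: one row per carrier/arr_delay pair with a "count" column
delay_counts <- flights %>%
  count(carrier, arr_delay, name = "count")

# group_by() + summarise() version, for comparison
delay_summarised <- flights %>%
  group_by(carrier, arr_delay) %>%
  summarise(count = n(), .groups = "drop")

# The two approaches should produce the same table
all.equal(delay_counts, delay_summarised)
```

Every flight falls into exactly one carrier/delay group (missing delays form their own `NA` group), so the counts add up to the number of rows in `flights`.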

Visualizing the data

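# Carrier on x, arrival delay on y, tile color = number of flights
ggplot(delay_heatmap, aes(x = carrier, y = arr_delay, fill = count)) +
  geom_tile() +
  # Few flights shade toward blue, many flights toward red
  scale_fill_gradient(low = "blue", high = "red") +
  xlab("Carrier") +
  ylab("Arrival Delay (in minutes)") +
  ggtitle("Carrier Flight Delay Comparison")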
## Warning: Removed 15 rows containing missing values (`geom_tile()`).
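The warning occurs because `delay_heatmap` contains rows where `arr_delay` is `NA` (at most one such row per carrier), and `geom_tile()` cannot place a tile without a y value. A minimal sketch that drops those flights before counting, presumably cancelled or diverted flights with no recorded arrival, follows. It produces the same heatmap without the warning. The names `delay_heatmap_complete` and `p` are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Remove flights with no recorded arrival delay before counting,
# so geom_tile() has no missing y values to drop
delay_heatmap_complete <- flights %>%
  filter(!is.na(arr_delay)) %>%
  count(carrier, arr_delay, name = "count")

p <- ggplot(delay_heatmap_complete,
            aes(x = carrier, y = arr_delay, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(x = "Carrier", y = "Arrival Delay (in minutes)",
       title = "Carrier Flight Delay Comparison")

p
```

Filtering explicitly makes the choice visible in the code instead of leaving it to a plotting warning.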

What is this graphic showing?

This graphic is intended to be a heatmap of arrival-delay counts by carrier. Each tile encodes two values: its position on the y axis gives the arrival delay (in minutes), and its color gives the count, i.e. how many of that carrier's flights arrived with that delay, on a gradient from blue (few flights) to red (many flights). So the tiles show not only which delay times occurred for each carrier but also how concentrated that carrier's flights are around those times. A flaw, however, is that negative arrival delays (flights that arrived early) are not visually set apart from positive ones, so early arrivals are easy to overlook. Thus, on the surface, the chart may introduce some unwanted bias.
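One way to make early arrivals easier to see is to mark zero on the y axis and zoom in on the range where most flights fall. The sketch below is one possible adjustment, not the original author's method. The window of -100 to 150 minutes is an arbitrary choice, and the names `delay_counts` and `p_zoom` are hypothetical. `coord_cartesian()` zooms the view without removing data, whereas setting limits on the scale would turn out-of-range values into `NA`.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Count flights per carrier/arrival delay, excluding missing delays
delay_counts <- flights %>%
  filter(!is.na(arr_delay)) %>%
  count(carrier, arr_delay, name = "count")

p_zoom <- ggplot(delay_counts, aes(x = carrier, y = arr_delay, fill = count)) +
  geom_tile() +
  # Dashed reference line: tiles below it are flights that arrived early
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_fill_gradient(low = "blue", high = "red") +
  # Zoom in on the bulk of the distribution (window chosen arbitrarily);
  # coord_cartesian() changes the view but keeps all rows in the data
  coord_cartesian(ylim = c(-100, 150)) +
  labs(x = "Carrier", y = "Arrival Delay (in minutes)",
       title = "Carrier Flight Delay Comparison (zoomed)")

p_zoom
```

The trade-off is that very long delays fall outside the zoomed window, so this view works best alongside the full-range heatmap rather than as a replacement for it.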