library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
# install.packages("nycflights13")
library(nycflights13)
data(flights)
# Count flights for each combination of carrier and arrival delay (in minutes)
delay_heatmap <- flights %>%
  group_by(carrier, arr_delay) %>%
  summarise(count = n()) %>%
  ungroup()
## `summarise()` has grouped output by 'carrier'. You can override using the
## `.groups` argument.
ggplot(delay_heatmap, aes(x = carrier, y = arr_delay, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  xlab("Carrier") +
  ylab("Arrival Delay (in minutes)") +
  ggtitle("Carrier Flight Delay Comparison")
## Warning: Removed 15 rows containing missing values (`geom_tile()`).
This graphic is a heatmap of arrival-delay counts by carrier. Two quantities are encoded: the arrival delay (in minutes) on the y axis and the count as the tile fill. Each tile therefore shows both a delay value and how many of that carrier's flights arrived with that delay, with the fill running from blue (few flights) to red (many flights). A flaw, however, is that it does not incorporate negative values. Thus, on the surface, it may introduce some unwanted bias.
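One way to check how early arrivals (negative `arr_delay`) are represented is to summarise them per carrier and to bin the delays before plotting. The following is a minimal sketch, not part of the original analysis: the names `delays`, `early_share`, `binned`, and `bin_width`, and the 15-minute bin width, are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Keep only flights that have a recorded arrival delay.
delays <- flights %>%
  filter(!is.na(arr_delay))

# Per carrier: number of flights and share that arrived early (arr_delay < 0).
early_share <- delays %>%
  group_by(carrier) %>%
  summarise(
    n_flights = n(),
    n_early = sum(arr_delay < 0),
    share_early = n_early / n_flights,
    .groups = "drop"
  )

# Bin delays into fixed-width intervals; 15 minutes is an assumed width.
# floor() keeps negative delays in their own bins (e.g. -7 falls in the -15 bin).
bin_width <- 15
binned <- delays %>%
  mutate(delay_bin = floor(arr_delay / bin_width) * bin_width) %>%
  count(carrier, delay_bin, name = "count")

# Same heatmap layout as above, with one tile per carrier and delay bin.
p <- ggplot(binned, aes(x = carrier, y = delay_bin, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  xlab("Carrier") +
  ylab("Arrival Delay (in minutes, 15-minute bins)") +
  ggtitle("Carrier Flight Delay Comparison (binned)")

print(early_share)
print(p)
```

Binning trades resolution for readability: each tile aggregates more flights, so sparse delay values are less likely to vanish into the low end of the colour scale. The `early_share` table gives a direct numeric check on how much of each carrier's traffic lies below zero.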