Heatmaps, Treemaps, and Alluvials HW

Author

E Lott

Treemap Code

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data(flights)
flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))  
flights_nona$month[flights_nona$month == 1]<- "January"
flights_nona$month[flights_nona$month == 2]<- "February"
flights_nona$month[flights_nona$month == 3]<- "March"
flights_nona$month[flights_nona$month == 4]<- "April"
flights_nona$month[flights_nona$month == 5]<- "May"
flights_nona$month[flights_nona$month == 6]<- "June"
flights_nona$month[flights_nona$month == 7]<- "July"
flights_nona$month[flights_nona$month == 8]<- "August"
flights_nona$month[flights_nona$month == 9]<- "September"
flights_nona$month[flights_nona$month == 10]<- "October"
flights_nona$month[flights_nona$month == 11]<- "November"
flights_nona$month[flights_nona$month == 12]<- "December"
summary(flights_nona$month)
   Length     Class      Mode 
   422818 character character 
flights_nona$month<-factor(flights_nona$month, 
                         levels=c("January", "February","March","April","May","June","July","August",
                                  "September","October","November","December"))
p2 <- flights_nona |>
  group_by(month)|>
  summarise(avg_dist = mean(distance), # calculates the mean distance traveled
            avg_arr_delay = mean(arr_delay))  # calculates the mean arrival delay
library(RColorBrewer)
library(treemap)
treemap(p2, 
        index="month",
        vSize="avg_dist", 
        vColor="avg_arr_delay", 
        type="manual",    
        palette="RdBu",
        title = "Average Distance and Arrival Delay by Month",
        title.legend = "Average Arrival Delay (min)  |   Source: FAA",
        fontsize.legend = 12 )
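
The month recoding earlier in this code relies on hand-typed labels that have to match exactly between the assignments and the factor levels; any label that does not match a level silently becomes NA. As a minimal sketch of a more compact alternative, base R's built-in `month.name` constant can supply both the labels and the levels. The vector `month_num` below is a hypothetical stand-in for the integer month column, not the real data.

```r
# Hypothetical stand-in for the integer month column (values 1-12).
month_num <- c(1, 2, 2, 11, 12, 6)

# month.name is a built-in base R constant: "January", ..., "December".
# Indexing it by month number recodes every value in one step, and reusing
# it as the level set keeps the labels and levels from drifting apart.
month_fct <- factor(month.name[month_num], levels = month.name)

# Levels stay in calendar order, and no value is lost to a mismatched label.
levels(month_fct)
table(month_fct, useNA = "ifany")
```

Applied to the real data, the same idea would be a single line on the original integer column, before any of the values have been replaced with text.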

Essay

The treemap above shows the average distance traveled and the average arrival delay of flights for each month. The size of each box represents the average flight distance for that month (across all carriers), and the color represents the average arrival delay in minutes. The redder the box, the shorter the delay (or even an early arrival on average); the bluer the box, the longer the delay.

I chose to compare months and delays to see which month had the longest delays and to consider possible reasons why. Going in, I assumed the holiday season would have the longest delays, particularly winter months like December and January because of storms, so I was surprised to see that June and July had the longest average delays instead. One possible explanation is that summer also has its fair share of storms. Another is that more people have time off in the summer, so demand is high and a large volume of people are flying in and out. I wonder what other reasons there could be.
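
A treemap's color scale makes exact comparisons hard, so a ranked table is one way to check which months had the longest average delays. The sketch below is only an illustration: `rank_months_by_delay()` is a hypothetical helper name, and it assumes a data frame with `month` and `arr_delay` columns like the `flights` data used above.

```r
library(dplyr)

# Hypothetical helper: rank months from longest to shortest mean arrival delay.
# Assumes `df` has columns `month` and `arr_delay` (delay in minutes).
rank_months_by_delay <- function(df) {
  df |>
    filter(!is.na(arr_delay)) |>                 # drop flights with no recorded arrival delay
    group_by(month) |>
    summarise(avg_arr_delay = mean(arr_delay),   # mean arrival delay per month
              n_flights = n()) |>                # flights contributing to each mean
    arrange(desc(avg_arr_delay))                 # longest average delay first
}

# Usage with the data from this document (assumes nycflights23 is installed):
# rank_months_by_delay(nycflights23::flights)
```

Including the flight count next to each mean also shows whether summer's higher delays coincide with higher traffic, which bears on the demand explanation suggested above.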