p2 <- flights_nona |>
  group_by(month) |>
  summarise(
    avg_dist = mean(distance),       # calculates the mean distance traveled
    avg_arr_delay = mean(arr_delay)  # calculates the mean arrival delay
  )
The treemap above shows the average distance traveled and the average arrival delay of flights for each month. The size of each box represents the average flight distance for that month (across all carriers). The color represents the average arrival delay in minutes for that month: the redder the box, the shorter the delay (or even an early arrival), and the bluer the box, the longer the delay.

I chose months and delays to see which month had the longest delays and to possibly come up with a reason why. Going in, I assumed the holidays would have longer delays, but I was surprised to see that only June and July had the longest. I expected the winter months, like December and January, to be the worst because of the storms. However, the data shows that the summer had longer delays. This could be because there is also a fair share of storms during the summer. It could also be that more people get time off in the summer, so demand is high and a large volume of people are flying in and out. I wonder what other reasons there could be.
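One quick way to double-check which months stand out is to sort the monthly summary by average arrival delay instead of reading it off the colors. The code below is only a minimal sketch under assumptions: it assumes the data comes from the `nycflights13` package and that `flights_nona` is the `flights` table with rows missing `arr_delay` removed, which may not match how it was actually built. The name `delay_rank` is made up for this example.

```r
library(dplyr)
library(nycflights13)  # assumption: the source of the flights data

# Assumption: flights_nona is `flights` with rows missing arr_delay removed
flights_nona <- flights |>
  filter(!is.na(arr_delay))

# Same monthly summary as p2, sorted so the most delayed months come first
delay_rank <- flights_nona |>
  group_by(month) |>
  summarise(
    avg_dist = mean(distance),
    avg_arr_delay = mean(arr_delay)
  ) |>
  arrange(desc(avg_arr_delay))

print(delay_rank)
```

The months at the top of this table should correspond to the bluest boxes in the treemap, and the months at the bottom to the reddest ones.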