Flight_Fleet_Delay

Author

Brian Caceres

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(treemap)
library(RColorBrewer)
library(ggplot2)
library(dplyr)
# Fleet size per carrier: number of distinct tail numbers seen in the data
planes_by_carrier <- flights |>
  group_by(carrier) |>
  summarise(`Number of Planes` = n_distinct(tailnum))
planes_by_carrier
# A tibble: 16 × 2
   carrier `Number of Planes`
   <chr>                <int>
 1 9E                     204
 2 AA                     601
 3 AS                      84
 4 B6                     193
 5 DL                     629
 6 EV                     316
 7 F9                      26
 8 FL                     129
 9 HA                      14
10 MQ                     238
11 OO                      28
12 UA                     621
13 US                     290
14 VX                      53
15 WN                     583
16 YV                      58
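One thing worth checking in a count like this is missing tail numbers, because `n_distinct()` treats `NA` as its own value unless `na.rm = TRUE` is set. The sketch below uses a small made-up tibble (`toy_flights` is hypothetical, not real flight records) to show the difference; whether it changes the fleet sizes above depends on how many `tailnum` values are missing for each carrier.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for `flights`; not real records.
toy_flights <- tibble(
  carrier = c("AA", "AA", "AA", "B6", "B6"),
  tailnum = c("N1", "N1", NA, "N2", "N3")
)

fleet_check <- toy_flights |>
  group_by(carrier) |>
  summarise(
    planes_with_na  = n_distinct(tailnum),                # NA counted as a plane
    planes_no_na    = n_distinct(tailnum, na.rm = TRUE),  # NA ignored
    missing_tailnum = sum(is.na(tailnum))                 # flights with no tail number
  )
fleet_check
```

Running the same `summarise()` on `flights` would show whether any carrier's plane count is inflated by one.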
# Total hours of late arrival per carrier (early and on-time arrivals excluded)
Total_Delay_Time <- flights |>
  drop_na(arr_delay) |>
  filter(arr_delay > 0) |>                               # keep late arrivals only
  group_by(carrier) |>
  summarise(`Total delay time` = sum(arr_delay) / 60)    # minutes to hours
Total_Delay_Time
# A tibble: 16 × 2
   carrier `Total delay time`
   <chr>                <dbl>
 1 9E                  5450. 
 2 AA                  6828. 
 3 AS                   108. 
 4 B6                 15743. 
 5 DL                 10325. 
 6 EV                 19697. 
 7 F9                   311. 
 8 FL                  1298. 
 9 HA                    56.6
10 MQ                  7377. 
11 OO                    10.1
12 UA                 13574. 
13 US                  3553. 
14 VX                  1276. 
15 WN                  3602. 
16 YV                   220. 
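To make the units concrete: `arr_delay` is in minutes, negative values are early arrivals, and the total is reported in hours. A minimal sketch on made-up numbers (`toy_delays` is hypothetical) shows why early arrivals and missing values have to be dropped before summing.

```r
library(dplyr)
library(tibble)

# Hypothetical arrival delays in minutes; negative means the flight was early.
toy_delays <- tibble(
  carrier   = c("AA", "AA", "AA", "AA", "B6", "B6"),
  arr_delay = c(30, -10, NA, 90, -5, 60)
)

toy_total <- toy_delays |>
  filter(!is.na(arr_delay), arr_delay > 0) |>   # late arrivals only
  group_by(carrier) |>
  summarise(total_delay_hrs = sum(arr_delay) / 60)
toy_total
```

Without the filter, the early arrivals would offset the late ones and the `NA` would turn the AA total into `NA`.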
# Match the two summaries on carrier rather than relying on row order
flights_data <- left_join(Total_Delay_Time, planes_by_carrier, by = "carrier")
flights_data2 <- flights_data |>
  mutate(ratio = `Total delay time` / `Number of Planes`)   # delay hours per plane
flights_data2
# A tibble: 16 × 4
   carrier `Total delay time` `Number of Planes`  ratio
   <chr>                <dbl>              <int>  <dbl>
 1 9E                  5450.                 204 26.7  
 2 AA                  6828.                 601 11.4  
 3 AS                   108.                  84  1.29 
 4 B6                 15743.                 193 81.6  
 5 DL                 10325.                 629 16.4  
 6 EV                 19697.                 316 62.3  
 7 F9                   311.                  26 12.0  
 8 FL                  1298.                 129 10.1  
 9 HA                    56.6                 14  4.05 
10 MQ                  7377.                 238 31.0  
11 OO                    10.1                 28  0.361
12 UA                 13574.                 621 21.9  
13 US                  3553.                 290 12.3  
14 VX                  1276.                  53 24.1  
15 WN                  3602.                 583  6.18 
16 YV                   220.                  58  3.79 
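The ratio is only meaningful if each carrier's delay total is paired with its own fleet size. The sketch below, on hypothetical toy tables (`delay` and `fleet` are made-up names and values), shows how matching on the `carrier` key with `left_join()` keeps the pairing correct even when the two tables are sorted differently.

```r
library(dplyr)
library(tibble)

# Hypothetical summaries; note the two tables list carriers in different order.
delay <- tibble(carrier = c("AA", "B6"), delay_hrs = c(10, 40))
fleet <- tibble(carrier = c("B6", "AA"), planes = c(4, 5))

joined <- left_join(delay, fleet, by = "carrier") |>
  mutate(ratio = delay_hrs / planes)   # delay hours per plane
joined
```

Binding the columns by position instead would pair AA's delay with B6's fleet here, so a key-based join is the safer choice whenever row order is not guaranteed.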
treemap(flights_data2,
        title = "Fleet Size vs. Arrival Delay per Plane, by Carrier",
        title.legend = "Total Annual Arrival Delay (hrs) per Plane",
        index = "carrier",
        vSize = "Number of Planes",   # tile area: fleet size
        vColor = "ratio",             # tile color: delay hours per plane
        type = "manual",
        palette = "RdYlBu")

Summary

I created a treemap that shows each airline's fleet size alongside its total arrival delay per plane. The larger fleets, led by the big four airlines (Delta, United, American, and Southwest), are on the left, and the smaller fleets are on the right. The color scale shows the ratio of total delay hours to number of planes for each airline. The big four keep that ratio low, between about 6 and 22 hours per plane, while some smaller airlines struggle to keep their overall delays down. The two outliers are JetBlue (B6) and ExpressJet (EV). JetBlue's fleet of 193 planes is about a third the size of any big-four fleet, and ExpressJet's 316 planes is roughly half, yet their delay per plane (81.6 and 62.3 hours) is about three to four times that of United, which has the highest ratio of the big four at 21.9 hours. That gap is a good reason to look into the practices of those two companies and help them lower their delay time.
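The comparison between the outliers and the big four can be checked directly from the table. The sketch below copies the relevant rows of `flights_data2` by hand and assumes "big four" means the four largest fleets in that table (AA, DL, UA, WN); the column names `fleet_vs_smallest_big4` and `ratio_vs_highest_big4` are made up for illustration.

```r
library(dplyr)
library(tibble)

# Values copied from the flights_data2 output above.
carriers <- tribble(
  ~carrier, ~planes, ~ratio,
  "AA",     601,     11.4,
  "DL",     629,     16.4,
  "UA",     621,     21.9,
  "WN",     583,     6.18,
  "B6",     193,     81.6,
  "EV",     316,     62.3
)

# Assumption: the "big four" are the four largest fleets in the table.
big_four <- filter(carriers, carrier %in% c("AA", "DL", "UA", "WN"))

outliers <- carriers |>
  filter(carrier %in% c("B6", "EV")) |>
  mutate(
    fleet_vs_smallest_big4 = planes / min(big_four$planes),  # fleet size relative to WN
    ratio_vs_highest_big4  = ratio / max(big_four$ratio)     # delay per plane relative to UA
  )
outliers
```

Comparing against the smallest big-four fleet and the highest big-four ratio gives the most conservative reading: any other pairing makes the two outliers look smaller in fleet and worse in delay per plane.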