NYC Flights Homework

Author

Tessa McCollum

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)

data(flights)

library(dplyr)
avg_dep_delay <- mean(flights$dep_delay, na.rm = TRUE)
avg_dep_delay
[1] 13.83737
avg_arr_delay <- mean(flights$arr_delay, na.rm = TRUE)
avg_arr_delay
[1] 4.344803
carrier_set <- flights |>
  select(dep_delay,
         arr_delay,
         carrier)
carrier_set
# A tibble: 435,352 × 3
   dep_delay arr_delay carrier
       <dbl>     <dbl> <chr>  
 1       203       205 UA     
 2        78        53 DL     
 3        47        34 B6     
 4       173       166 B6     
 5       228       211 UA     
 6         3        -7 AA     
 7        10        -1 B6     
 8        -6       -25 AA     
 9        17        68 UA     
10         2        -7 NK     
# ℹ 435,342 more rows
org_carrier_set <- carrier_set |>
  group_by(carrier) |>
  summarise(
    num_perc = n(),
    mean_dep_delay = mean(dep_delay, na.rm = TRUE),
    mean_arr_delay = mean(arr_delay, na.rm = TRUE),
    better_worse = ifelse(mean_dep_delay > avg_dep_delay, 'worse', 'better'),
    better_worse_2 = ifelse(mean_arr_delay > avg_arr_delay, 'worse', 'better')
    
  )
org_carrier_set
# A tibble: 14 × 6
   carrier num_perc mean_dep_delay mean_arr_delay better_worse better_worse_2
   <chr>      <int>          <dbl>          <dbl> <chr>        <chr>         
 1 9E         54141           7.44        -2.23   better       better        
 2 AA         40525          14.2          5.27   worse        worse         
 3 AS          7843          12.0          0.0844 better       better        
 4 B6         66169          23.8         15.6    worse        worse         
 5 DL         61562          15.1          1.64   worse        better        
 6 F9          1286          35.7         26.2    worse        worse         
 7 G4           671           3.98        -5.88   better       better        
 8 HA           366          22.9         21.4    worse        worse         
 9 MQ           357          10.5          0.119  better       better        
10 NK         15189          18.2          9.89   worse        worse         
11 OO          6432          19.8         13.7    worse        worse         
12 UA         79641          17.6          9.04   worse        worse         
13 WN         12385          16.1          5.76   worse        worse         
14 YX         88785           4.21        -4.64   better       better        
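# Keep only the first 10 rows; the summary is sorted alphabetically by
# carrier code, so this drops OO, UA, WN, and YX
org_carrier_set <- org_carrier_set |>
  head(10)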
org_carrier_set
# A tibble: 10 × 6
   carrier num_perc mean_dep_delay mean_arr_delay better_worse better_worse_2
   <chr>      <int>          <dbl>          <dbl> <chr>        <chr>         
 1 9E         54141           7.44        -2.23   better       better        
 2 AA         40525          14.2          5.27   worse        worse         
 3 AS          7843          12.0          0.0844 better       better        
 4 B6         66169          23.8         15.6    worse        worse         
 5 DL         61562          15.1          1.64   worse        better        
 6 F9          1286          35.7         26.2    worse        worse         
 7 G4           671           3.98        -5.88   better       better        
 8 HA           366          22.9         21.4    worse        worse         
 9 MQ           357          10.5          0.119  better       better        
10 NK         15189          18.2          9.89   worse        worse         
org_carrier_set <- org_carrier_set |>
  mutate(carrier=recode(carrier, 
                        '9E' = 'Endeavor',
                        'AA' = 'American',
                        'AS' = 'Alaska',
                        'B6' = 'JetBlue',
                        'DL' = 'Delta',
                        'F9' = 'Frontier',
                        'G4' = 'Allegiant',
                        'HA' = 'Hawaiian',
                        'MQ' = 'Envoy',
                        'NK' = 'Spirit')
         )
plot1 <- org_carrier_set |>
  ggplot() +
  geom_bar(aes(x = fct_reorder(carrier, - mean_dep_delay), y = mean_dep_delay, fill = better_worse),
      position = "dodge", stat = "identity") +
  labs(y = "Carrier's Average Departure Delay (min)",
       x = "Carriers",
       title = "Average Departure Delays Compared by Carrier",
       subtitle = "In the Year 2023", 
       fill = "Better or Worse Than\nAverage Departure Delay of All Flights (min)",
       caption = "Source: The Bureau of Transportation Statistics") +
  scale_fill_manual(values = c('better' = 'blue', 'worse' = 'red')) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot1

plot2 <- org_carrier_set |>
  ggplot() +
  geom_bar(aes(x = fct_reorder(carrier, - mean_arr_delay), y = mean_arr_delay, fill = better_worse_2),
      position = "dodge", stat = "identity") +
  labs(y = "Carrier's Average Arrival Delay (min)",
       x = "Carriers",
       title = "Average Arrival Delays Compared by Carrier",
       subtitle = "In the Year 2023", 
       fill = "Better or Worse Than\nAverage Arrival Delay of All Flights (min)",
       caption = "Source: The Bureau of Transportation Statistics") +
  scale_fill_manual(values = c('better' = 'blue', 'worse' = 'red')) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot2

These two bar charts compare the average departure and arrival delays for each of the ten carriers plotted, and the color shows whether a carrier’s average delay was below or above (better or worse than) the average delay of all the flights in the data set combined. Something I would like to highlight is that, according to the graphs, the price point of an airline in 2023 didn’t have the relationship to the promptness of its flights that one might have expected. Allegiant, an airline known for its heavily discounted prices, has the lowest average departure and arrival delays. Of course, the worst airline in both categories, Frontier, is also a discount carrier, but overall the picture is mixed, and I thought that was interesting.

Something I would have liked to do, if I could have figured out how, is show how much better or worse each carrier’s average was compared to the overall average. For example, marking where the overall average departure or arrival delay sits on the graph (a horizontal line at about y = 13.8 for departures or y = 4.3 for arrivals) would give a viewer more meaningful information, because they could see how close each carrier’s bar is to that line.

Right now, my knee-jerk reaction to these graphs is that the bigger the blue bar, the better the carrier did compared to the overall average, when the opposite is true: a taller bar always means a longer delay. Once you look closer and read the values on the y axis it makes sense, but I think marking the overall average on the graph would make not just the colors but the graph as a whole easier to understand.
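
One possible way to mark the overall average on a chart like these is ggplot2's `geom_hline()`. The sketch below is only an illustration, not part of the original analysis: it rebuilds a small data frame from the rounded departure-delay means printed in the summary table, and the names `carrier_means`, `diff_from_avg`, and `plot_with_line` are made up for this example. It also stores the signed gap between each carrier and the overall average, which is one way to express "how much better or worse" a carrier is.

```r
library(ggplot2)

# Overall average departure delay, as printed earlier in this document.
avg_dep_delay <- 13.83737

# Rounded per-carrier means copied from the summary table (first 10 carriers).
# carrier_means is a hypothetical name used only for this sketch.
carrier_means <- data.frame(
  carrier = c("9E", "AA", "AS", "B6", "DL", "F9", "G4", "HA", "MQ", "NK"),
  mean_dep_delay = c(7.44, 14.2, 12.0, 23.8, 15.1, 35.7, 3.98, 22.9, 10.5, 18.2)
)

# Same better/worse rule as the original summary, plus the signed gap
# (positive = longer delay than the overall average).
carrier_means$better_worse <- ifelse(
  carrier_means$mean_dep_delay > avg_dep_delay, "worse", "better"
)
carrier_means$diff_from_avg <- carrier_means$mean_dep_delay - avg_dep_delay

plot_with_line <- ggplot(carrier_means) +
  geom_col(aes(x = reorder(carrier, -mean_dep_delay),
               y = mean_dep_delay,
               fill = better_worse)) +
  # Dashed reference line at the overall average departure delay.
  geom_hline(yintercept = avg_dep_delay, linetype = "dashed") +
  # Label for the reference line, pinned to the right edge of the panel.
  annotate("text", x = Inf, y = avg_dep_delay,
           label = "Overall average", hjust = 1.1, vjust = -0.5) +
  scale_fill_manual(values = c(better = "blue", worse = "red")) +
  labs(x = "Carriers",
       y = "Average Departure Delay (min)",
       fill = "Compared to\nOverall Average") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot_with_line
```

With the line in place, blue bars end below it and red bars extend above it, so the colors can be read directly against the reference. The same approach should carry over to the arrival chart by swapping in the arrival means and `avg_arr_delay`.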