nycflights2023

Author

Zachary Rodavich

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
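library(nycflights23)  # flights that departed JFK, LGA and EWR in 2023, plus airlines, airports, planes and weather tables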
data(flights)
data(airlines)
nycflights23::airlines
# A tibble: 14 × 2
   carrier name                  
   <chr>   <chr>                 
 1 9E      Endeavor Air Inc.     
 2 AA      American Airlines Inc.
 3 AS      Alaska Airlines Inc.  
 4 B6      JetBlue Airways       
 5 DL      Delta Air Lines Inc.  
 6 F9      Frontier Airlines Inc.
 7 G4      Allegiant Air         
 8 HA      Hawaiian Airlines Inc.
 9 MQ      Envoy Air             
10 NK      Spirit Air Lines      
11 OO      SkyWest Airlines Inc. 
12 UA      United Air Lines Inc. 
13 WN      Southwest Airlines Co.
14 YX      Republic Airline      
nycflights23::flights
# A tibble: 435,352 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2023     1     1        1           2038       203      328              3
 2  2023     1     1       18           2300        78      228            135
 3  2023     1     1       31           2344        47      500            426
 4  2023     1     1       33           2140       173      238           2352
 5  2023     1     1       36           2048       228      223           2252
 6  2023     1     1      503            500         3      808            815
 7  2023     1     1      520            510        10      948            949
 8  2023     1     1      524            530        -6      645            710
 9  2023     1     1      537            520        17      926            818
10  2023     1     1      547            545         2      845            852
# ℹ 435,342 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
nycflights23::weather
# A tibble: 26,207 × 15
   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed
   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>
 1 JFK     2023     1     1     0  48    48     100        0       0   
 2 JFK     2023     1     1     1  48.2  48.2   100      190       4.60
 3 JFK     2023     1     1     2  49    49     100      190       5.75
 4 JFK     2023     1     1     3  49    49     100      250       5.75
 5 JFK     2023     1     1     4  49    49     100      170       8.06
 6 JFK     2023     1     1     5  48    48     100        0       0   
 7 JFK     2023     1     1     6  46.4  46.4   100      250       9.21
 8 JFK     2023     1     1     7  46    46     100      230       9.21
 9 JFK     2023     1     1     8  48    48     100      260      11.5 
10 JFK     2023     1     1     9  47    47     100      250      12.7 
# ℹ 26,197 more rows
# ℹ 5 more variables: wind_gust <dbl>, precip <dbl>, pressure <dbl>,
#   visib <dbl>, time_hour <dttm>
nycflights23::planes
# A tibble: 4,840 × 9
   tailnum  year type              manufacturer model engines seats speed engine
   <chr>   <int> <chr>             <chr>        <chr>   <int> <int> <int> <chr> 
 1 N101DQ   2020 Fixed wing multi… AIRBUS       A321…       2   199     0 Turbo…
 2 N101DU   2018 Fixed wing multi… C SERIES AI… BD-5…       2   133     0 Turbo…
 3 N101HQ   2007 Fixed wing multi… EMBRAER-EMP… ERJ …       2    80     0 Turbo…
 4 N101NN   2013 Fixed wing multi… AIRBUS INDU… A321…       2   379     0 Turbo…
 5 N102DN   2020 Fixed wing multi… AIRBUS       A321…       2   199     0 Turbo…
 6 N102DU     NA Fixed wing multi… C SERIES AI… BD-5…       2   133     0 Turbo…
 7 N102HQ   2007 Fixed wing multi… EMBRAER-EMP… ERJ …       2    80     0 Turbo…
 8 N102NN   2013 Fixed wing multi… AIRBUS       A321…       2   379     0 Turbo…
 9 N102UW   1998 Fixed wing multi… AIRBUS INDU… A320…       2   182     0 Turbo…
10 N103DU     NA Fixed wing multi… C SERIES AI… BD-5…       2   133     0 Turbo…
# ℹ 4,830 more rows
nycflights23::airports
# A tibble: 1,255 × 8
   faa   name                                 lat    lon   alt    tz dst   tzone
   <chr> <chr>                              <dbl>  <dbl> <dbl> <dbl> <chr> <chr>
 1 AAF   Apalachicola Regional Airport       29.7  -85.0    20    -5 A     Amer…
 2 AAP   Andrau Airpark                      29.7  -95.6    79    -6 A     Amer…
 3 ABE   Lehigh Valley International Airpo…  40.7  -75.4   393    -5 A     Amer…
 4 ABI   Abilene Regional Airport            32.4  -99.7  1791    -6 A     Amer…
 5 ABL   Ambler Airport                      67.1 -158.    334    -9 A     Amer…
 6 ABQ   Albuquerque International Sunport   35.0 -107.   5355    -7 A     Amer…
 7 ABR   Aberdeen Regional Airport           45.4  -98.4  1302    -6 A     Amer…
 8 ABY   Southwest Georgia Regional Airport  31.5  -84.2   197    -5 A     Amer…
 9 ACK   Nantucket Memorial Airport          41.3  -70.1    47    -5 A     Amer…
10 ACT   Waco Regional Airport               31.6  -97.2   516    -6 A     Amer…
# ℹ 1,245 more rows
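# Keep only flights with a recorded distance, arrival delay and departure delay
flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))
# Summarise each destination, then keep those whose flights average under 225 miles
by_dest <- flights_nona |>
  group_by(dest) |>
  summarise(count = n(),                      # flights with complete data
            avg_dist = mean(distance),        # miles
            avg_arr_delay = mean(arr_delay),  # minutes; negative = early
            avg_dep_delay = mean(dep_delay),  # minutes; negative = early
            .groups = "drop") |>
  arrange(avg_dep_delay) |>                   # lowest average departure delay first
  filter(avg_dist < 225)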
head(by_dest)
# A tibble: 6 × 5
  dest  count avg_dist avg_arr_delay avg_dep_delay
  <chr> <int>    <dbl>         <dbl>         <dbl>
1 AVP     140      93         -8.53         -0.957
2 MHT     586     209         -3.16          4.01 
3 SCE     269     208.        -4.70          4.11 
4 ALB    1510     139.        -3.15          4.92 
5 ORH     948     149.        -0.511         5.17 
6 MDT     471     141          0.380         7.85 
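Because `arrange(avg_dep_delay)` sorts in ascending order, `head(by_dest)` prints the six destinations with the *lowest* average departure delays, not the highest. A minimal sketch of how `dplyr::slice_max()` picks out the top destination by average delay and by flight count, applied here to the six printed rows retyped by hand (so values are rounded as displayed). The names `printed_rows`, `top_delay` and `busiest` are hypothetical; running the same two `slice_max()` calls on the full `by_dest` would check whether the results hold across every destination under 225 miles.

```r
library(dplyr)

# Hypothetical copy of the six rows of `by_dest` printed above, rounded as displayed
printed_rows <- tibble(
  dest          = c("AVP", "MHT", "SCE", "ALB", "ORH", "MDT"),
  count         = c(140L, 586L, 269L, 1510L, 948L, 471L),
  avg_dep_delay = c(-0.957, 4.01, 4.11, 4.92, 5.17, 7.85)
)

# Row with the largest average departure delay
top_delay <- printed_rows |> slice_max(avg_dep_delay, n = 1)

# Row with the most flights (flights with complete delay and distance data)
busiest <- printed_rows |> slice_max(count, n = 1)

top_delay$dest  # [1] "MDT"
busiest$dest    # [1] "ALB"
```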
# One bar per destination: bar height is the average departure delay,
# and destinations are ordered from nearest to farthest on average
p1 <- by_dest |>
  ggplot(aes(x = reorder(dest, avg_dist), y = avg_dep_delay)) +
  geom_col(alpha = 0.5, fill = "green", color = "darkgreen") +
  labs(x = "Destination (ordered by average flight distance)",
       y = "Average departure delay (minutes)",
       title = "Average NYC flight departure delays for destinations within 225 miles of NYC",
       caption = "Source: nycflights23 (Bureau of Transportation Statistics)")
p1

# One bar per destination: bar height is the average arrival delay at the
# destination, and destinations are ordered from nearest to farthest on average
p2 <- by_dest |>
  ggplot(aes(x = reorder(dest, avg_dist), y = avg_arr_delay)) +
  geom_col(alpha = 0.5, fill = "blue", color = "darkblue") +
  labs(x = "Destination (ordered by average flight distance)",
       y = "Average arrival delay (minutes)",
       title = "Average NYC flight arrival delays for destinations within 225 miles of NYC",
       caption = "Source: nycflights23 (Bureau of Transportation Statistics)")
p2
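The two plots above could also be drawn as a single figure with one panel per delay type, so departure and arrival delays share the same destination axis. A sketch assuming `tidyr::pivot_longer()` to stack the two delay columns and `facet_wrap()` to split the panels; it uses the six rows of `by_dest` printed earlier (rounded as displayed) under the hypothetical name `printed_rows`, and substituting the full `by_dest` would draw every destination.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical copy of the six rows of `by_dest` printed earlier, rounded as displayed
printed_rows <- tibble(
  dest          = c("AVP", "MHT", "SCE", "ALB", "ORH", "MDT"),
  avg_dist      = c(93, 209, 208, 139, 149, 141),
  avg_arr_delay = c(-8.53, -3.16, -4.70, -3.15, -0.511, 0.380),
  avg_dep_delay = c(-0.957, 4.01, 4.11, 4.92, 5.17, 7.85)
)

# Stack the two delay columns: one row per destination and delay type
delays_long <- printed_rows |>
  pivot_longer(c(avg_dep_delay, avg_arr_delay),
               names_to = "delay_type", values_to = "minutes")

# One panel per delay type; bars below the zero line are early on average
p_both <- ggplot(delays_long,
                 aes(x = reorder(dest, avg_dist), y = minutes, fill = delay_type)) +
  geom_col(alpha = 0.5, show.legend = FALSE) +
  geom_hline(yintercept = 0) +
  facet_wrap(~ delay_type, ncol = 1) +
  scale_fill_manual(values = c(avg_dep_delay = "green", avg_arr_delay = "blue")) +
  labs(x = "Destination (ordered by average flight distance)",
       y = "Average delay (minutes)")
p_both
```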

The charts above show average delays in 2023 for flights departing the three main New York City airports (JFK, LaGuardia and Newark Liberty) to destinations whose flights average less than 225 miles. The first chart, in green, shows departure delays on flights to these regional destinations, which include other major cities in the Northeastern U.S.; the second, in blue, shows the arrival delays of the same flights at their destinations, since nycflights23 records only flights that departed New York.

Negative values in the delay columns indicate flights that left or arrived ahead of schedule. Among the six destinations printed above, which are those with the lowest average departure delays, Harrisburg (MDT) has the highest average departure delay (7.85 minutes) even though its flights average only about 141 miles, while Albany (ALB), the state capital of New York, has the most flights with complete delay data (1,510). Flights to Wilkes-Barre/Scranton (AVP) left about one minute early and arrived about 8.5 minutes early on average. This is surprising for a destination only 93 miles from New York, because short routes are still exposed to real-world factors such as boarding time, airport and airspace congestion, and passengers making last-minute connections between flights.

Smaller regional aircraft often fly many short trips each day and can therefore pick up delays easily, for example when boarding slows because passengers must gate-check luggage too large for the overhead bins, or when conditions at an origin or destination airport ripple out to departures and arrivals elsewhere, including at major hubs such as JFK and EWR.
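An average delay below zero does not by itself show that *most* flights left on time or early, since a few long delays can outweigh many early departures (and vice versa). A minimal sketch of measuring the share directly: `mean()` of a logical vector such as `dep_delay <= 0` returns the proportion of `TRUE` values. Treating "on time" as `dep_delay <= 0` is an assumption, and the toy rows and the name `toy_flights` are hypothetical; grouping `flights_nona` by `dest` the same way would give the real shares.

```r
library(dplyr)

# Hypothetical toy rows standing in for `flights_nona` (delays in minutes)
toy_flights <- tibble(
  dest      = c("A", "A", "A", "A", "B", "B"),
  dep_delay = c(-5, -2, 0, 30, 3, -1)
)

# Per destination: the mean delay and the share of flights leaving on time or early
on_time_share <- toy_flights |>
  group_by(dest) |>
  summarise(avg_dep_delay = mean(dep_delay),
            share_on_time_or_early = mean(dep_delay <= 0),
            .groups = "drop")
on_time_share
# dest A: avg_dep_delay 5.75, share 0.75 (positive mean, yet 3 of 4 left on time or early)
# dest B: avg_dep_delay 1,    share 0.5
```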