NYC Flights 2023

Author

Paul Daniel-Orie

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the nycflights23 package, which provides the 2023 NYC flights data

library("nycflights23")

Load the flights dataset into the global environment

flights_data <- flights

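Prepare the data: count the flights per origin and month, converting the numeric month variable (1-12) to a categorical variable (a factor with levels Jan-Dec)

monthly_flights_alluvia <- flights_data %>%
  # One group per origin airport and month
  group_by(origin, month) %>%
  # Count the rows (flights) in each group; .groups = "drop" returns an ungrouped tibble
  summarize(num_flights = n(), .groups = "drop") %>%
  # Map 1-12 to "Jan"-"Dec" and keep calendar order rather than alphabetical order
  mutate(month = factor(month.abb[month], levels = month.abb))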
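
The month conversion works in two steps. It indexes R's built-in month.abb vector, then fixes the order of the factor levels. The base-R sketch below uses a few hypothetical month numbers to show why the levels argument matters. Without it, factor() sorts the labels alphabetically, so the months would appear in alphabetical rather than calendar order on the plot's Month axis.

```r
# Hypothetical month numbers, stored as integers 1-12 like flights$month
month_num <- c(3L, 1L, 12L, 1L)

# Without explicit levels, factor() sorts the labels alphabetically
alpha_fct <- factor(month.abb[month_num])

# With levels = month.abb, the levels follow calendar order (Jan ... Dec)
month_fct <- factor(month.abb[month_num], levels = month.abb)

print(levels(alpha_fct))
print(levels(month_fct))
print(as.character(month_fct))
```

The second factor keeps all 12 levels even though only three months occur in this toy vector. With the real data every month is present, so that makes no difference there.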

Check the structure of the prepared data

str(monthly_flights_alluvia)

tibble [36 × 3] (S3: tbl_df/tbl/data.frame)
 $ origin     : chr [1:36] "EWR" "EWR" "EWR" "EWR" ...
 $ month      : Factor w/ 12 levels "Jan","Feb","Mar",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ num_flights: int [1:36] 11623 10991 12593 12022 12371 11339 11646 11561 11373 11805 ...
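
str() reports 36 rows, which matches 3 origins × 12 months. The summarize(num_flights = n()) step is just a row count per origin-month group. The base-R sketch below reproduces that count with aggregate() on a hypothetical six-row table (not the real data). It also checks that every flight is counted exactly once.

```r
# Hypothetical mini version of the flights table (origin and month only)
toy <- data.frame(
  origin = c("EWR", "EWR", "JFK", "LGA", "JFK", "EWR"),
  month  = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# Count rows per origin/month, like group_by(origin, month) + summarize(n())
counts <- aggregate(
  list(num_flights = rep(1L, nrow(toy))),
  by  = list(origin = toy$origin, month = toy$month),
  FUN = sum
)
print(counts)

# Each row falls into exactly one group, so the counts add back to nrow(toy)
stopifnot(sum(counts$num_flights) == nrow(toy))
```

Like dplyr's summarize(), aggregate() returns only the combinations that occur. LGA in month 1 does not appear here, so the result has 5 rows rather than 6.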

View the first rows of the nycflights23 flights data

head(flights_data)

# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     1     1        1           2038       203      328              3
2  2023     1     1       18           2300        78      228            135
3  2023     1     1       31           2344        47      500            426
4  2023     1     1       33           2140       173      238           2352
5  2023     1     1       36           2048       228      223           2252
6  2023     1     1      503            500         3      808            815
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

View the head of the prepared dataset

head(monthly_flights_alluvia)

# A tibble: 6 × 3
  origin month num_flights
  <chr>  <fct>       <int>
1 EWR    Jan         11623
2 EWR    Feb         10991
3 EWR    Mar         12593
4 EWR    Apr         12022
5 EWR    May         12371
6 EWR    Jun         11339

Load the package ggalluvial

library(ggalluvial)

Draw an alluvial plot of monthly flights by origin

# Two axes: Origin (axis1) on the left, Month (axis2) on the right;
# each ribbon's thickness is proportional to num_flights for that origin-month pair
ggplot(data = monthly_flights_alluvia,
       aes(axis1 = origin, axis2 = month, y = num_flights, fill = origin)) +
  geom_alluvium(width = 1/12) +
  geom_stratum(width = 1/12, aes(fill = origin), color = "black") +
  # Label each stratum (box) with its name: the airport code or month
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Origin", "Month")) +
  # Comma-separated y-axis labels instead of scientific notation
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Monthly Flight Trends by Origin", x = "Category", y = "Number of Flights",
       caption = "Source: FAA and Bureau of Transportation Statistics (https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236)") +
  theme_bw() +
  scale_fill_manual(values = c("JFK" = "#F41818", "EWR" = "#187FF4", "LGA" = "#18F45A"))

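Turn off scientific notation so that large numbers print in standard notation. This only affects output produced after the line runs, not the plot above.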

options(scipen = 999)
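
The base-R illustration below shows what the setting changes. R chooses between fixed and scientific notation by comparing their printed widths, and scipen is a penalty added to the scientific width. A large value such as 999 therefore effectively forces fixed notation.

```r
# Save the current setting so it can be restored afterwards
old <- options(scipen = 0)

# With the default penalty (0), R picks the narrower form:
# "4e+05" (5 characters) beats "400000" (6 characters)
default_fmt <- format(400000)

# A large penalty makes fixed notation win for any realistic width
options(scipen = 999)
fixed_fmt <- format(400000)

options(old)  # restore the previous scipen value
print(c(default_fmt, fixed_fmt))
```

options() settings apply to the whole R session, which is why the sketch restores the previous value when it is done.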

Interpreting Flight Trends Based on Origin Using an Alluvial Plot

The alluvial plot above shows the monthly flow of flights from each origin airport (EWR, JFK, and LGA) in 2023. The x-axis has two categorical axes: Origin on the left and Month on the right. Each ribbon (alluvium) links an origin to a month, and its thickness is proportional to the number of flights from that origin in that month. The y-axis shows the cumulative number of flights, so the full height of either axis equals the total number of flights from the three airports in 2023.
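
A little arithmetic shows how ribbon thickness relates to the two axes. The base-R sketch below uses a small hypothetical table shaped like monthly_flights_alluvia, with invented counts. Each origin stratum's height is that origin's total over all months, and each month stratum's height is the total over all origins. Because both axes stack the same ribbons, they have the same overall height.

```r
# Hypothetical counts shaped like monthly_flights_alluvia (values are invented)
counts <- data.frame(
  origin      = rep(c("EWR", "JFK", "LGA"), times = 2),
  month       = rep(c("Jan", "Feb"), each = 3),
  num_flights = c(120L, 100L, 90L, 95L, 110L, 125L)
)

# Heights of the strata on the Origin axis: total flights per origin
origin_heights <- tapply(counts$num_flights, counts$origin, sum)

# Heights of the strata on the Month axis: total flights per month
month_heights <- tapply(counts$num_flights, counts$month, sum)

print(origin_heights)
print(month_heights)

# Both axes stack the same ribbons, so their total heights agree
stopifnot(sum(origin_heights) == sum(month_heights))
```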

Although the flows from the three airports look broadly similar, EWR had the most traffic in certain months: more flights departed from it in January, February, and March than from either of the other two airports. However, it lagged behind them in the fourth quarter of the year.

JFK shows a noticeable increase in flights during holiday periods and the summer vacation months, while LGA has the most traffic from September to December.
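
Comparisons like these are easier to confirm from the counts than from ribbon widths. One way to do that is sketched below in base R, again with a hypothetical table shaped like monthly_flights_alluvia and invented counts. It cross-tabulates origin by month, finds the busiest origin in each month, and computes each origin's share of that month's flights.

```r
# Hypothetical counts shaped like monthly_flights_alluvia (values are invented)
counts <- data.frame(
  origin      = rep(c("EWR", "JFK", "LGA"), times = 2),
  month       = factor(rep(c("Jan", "Feb"), each = 3), levels = c("Jan", "Feb")),
  num_flights = c(120L, 100L, 90L, 95L, 110L, 125L)
)

# Cross-tabulate: one row per origin, one column per month
wide <- xtabs(num_flights ~ origin + month, data = counts)
print(wide)

# Busiest origin in each month (column)
top_origin <- rownames(wide)[apply(wide, 2, which.max)]
names(top_origin) <- colnames(wide)
print(top_origin)

# Each origin's share of the month's flights (each column sums to 1)
shares <- prop.table(wide, margin = 2)
print(round(shares, 2))
```

With the real data, passing monthly_flights_alluvia in place of the hypothetical counts table to xtabs() would give the same summary for all 12 months.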

Conclusion

Reading this alluvial plot means following the ribbons from each origin across the months and looking for patterns, trends, and anomalies. Here, the three origins carry broadly similar volumes. EWR leads early in the year, JFK rises during the summer and holiday months, and LGA leads from September to December. The visualization gives a compact view of how flight volume varies between origins and across months, which can inform questions about travel behavior and airport operations. Adjusting the plot configuration and checking these observations against the underlying monthly counts would make the comparisons more precise and more useful for decision-making in aviation and travel-related fields.