NYC Flights 2023

Author

Paul Daniel-Orie

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the nycflights23 package

library("nycflights23")

Load the flights dataset into the global environment

flights_data <- flights

Prepare the data

monthly_flights_alluvia <- flights_data %>%
  # Count flights for each origin airport and month
  group_by(origin, month) %>%
  summarize(num_flights = n(), .groups = "drop") %>%
  # Convert month numbers (1-12) to a factor ordered Jan through Dec
  mutate(month = factor(month.abb[month], levels = month.abb))

Check the structure of the prepared data

str(monthly_flights_alluvia)
tibble [36 × 3] (S3: tbl_df/tbl/data.frame)
 $ origin     : chr [1:36] "EWR" "EWR" "EWR" "EWR" ...
 $ month      : Factor w/ 12 levels "Jan","Feb","Mar",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ num_flights: int [1:36] 11623 10991 12593 12022 12371 11339 11646 11561 11373 11805 ...
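
The grouping step above can also be written with dplyr's count(), which groups, tallies, and ungroups in one call. The sketch below is only an illustration: toy_flights and its values are made up to stand in for flights_data, and the two results are compared on the toy rows only.

```r
library(dplyr)

# Hypothetical toy rows standing in for flights_data (values are made up).
toy_flights <- tibble(
  origin = c("EWR", "EWR", "JFK", "JFK", "JFK", "LGA"),
  month  = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# The approach used in this document: group, then count rows per group.
by_summarize <- toy_flights |>
  group_by(origin, month) |>
  summarize(num_flights = n(), .groups = "drop")

# The shorter form: count() returns one row per origin/month combination.
by_count <- toy_flights |>
  count(origin, month, name = "num_flights")

# Both approaches give the same counts for the toy rows.
stopifnot(all(by_summarize$num_flights == by_count$num_flights))
```

Either form should yield the same 36-row table for the real data; count() simply avoids the separate ungrouping step.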

View the first rows of the flights dataset

head(flights_data)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     1     1        1           2038       203      328              3
2  2023     1     1       18           2300        78      228            135
3  2023     1     1       31           2344        47      500            426
4  2023     1     1       33           2140       173      238           2352
5  2023     1     1       36           2048       228      223           2252
6  2023     1     1      503            500         3      808            815
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Count the flights per origin and month, converting the numeric month variable to a categorical variable

monthly_flights_alluvia <- flights_data |>
  # Count flights for each origin airport and month
  group_by(origin, month) |>
  summarize(num_flights = n(), .groups = "drop") |>
  # Convert month numbers (1-12) to a factor ordered Jan through Dec
  mutate(month = factor(month.abb[month], levels = month.abb))

View the head of the prepared dataset

head(monthly_flights_alluvia)
# A tibble: 6 × 3
  origin month num_flights
  <chr>  <fct>       <int>
1 EWR    Jan         11623
2 EWR    Feb         10991
3 EWR    Mar         12593
4 EWR    Apr         12022
5 EWR    May         12371
6 EWR    Jun         11339
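
The levels = month.abb argument is what keeps the months in calendar order. A minimal base R sketch of the difference, using a made-up vector of month numbers (months_seen is a hypothetical name, not part of the dataset):

```r
# Hypothetical month numbers, deliberately out of order.
months_seen <- c(3L, 1L, 12L)

# Without explicit levels, factor() sorts the labels alphabetically.
alphabetical <- factor(month.abb[months_seen])
levels(alphabetical)          # "Dec" "Jan" "Mar"

# With levels = month.abb, the factor follows the calendar.
calendar <- factor(month.abb[months_seen], levels = month.abb)
as.character(sort(calendar))  # "Jan" "Mar" "Dec"
```

Plots and tables that use the factor then list the months from Jan to Dec instead of alphabetically.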

Load the package ggalluvial

library(ggalluvial)

Draw the alluvial plot of monthly flights

ggplot(data = monthly_flights_alluvia,
       aes(axis1 = origin, axis2 = month, y = num_flights, fill = origin)) +
  # Flows connect each origin airport to each month, sized by flight count
  geom_alluvium(width = 1/12) +
  # Strata are the stacked boxes on each axis, labeled with their category
  geom_stratum(width = 1/12, aes(fill = origin), color = "black") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Origin", "Month")) +
  labs(title = "Monthly Flight Trends by Origin",
       x = "Category",
       y = "Number of Flights",
       caption = "Source: FAA and Bureau of Transportation Statistics (https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236)") +
  theme_bw() +
  scale_fill_manual(values = c("JFK" = "#F41818", "EWR" = "#187FF4", "LGA" = "#18F45A"))
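
To read the plot, it helps to know what the box heights represent. When a y aesthetic is supplied, each stratum's height should correspond to the sum of num_flights over the rows in that category. The sketch below reproduces those totals on a made-up table; toy_counts and its values are hypothetical and only mirror the columns of monthly_flights_alluvia.

```r
library(dplyr)

# Hypothetical toy counts (made-up values) with the same columns as
# monthly_flights_alluvia.
toy_counts <- tibble(
  origin      = c("EWR", "EWR", "JFK", "JFK"),
  month       = c("Jan", "Feb", "Jan", "Feb"),
  num_flights = c(10L, 12L, 9L, 11L)
)

# Heights of the strata on the Origin axis: total flights per airport.
origin_heights <- toy_counts |>
  count(origin, wt = num_flights, name = "total")

# Heights of the strata on the Month axis: total flights per month.
month_heights <- toy_counts |>
  count(month, wt = num_flights, name = "total")

# Both axes stack to the same overall total.
stopifnot(sum(origin_heights$total) == sum(month_heights$total))
```

Running the same two count() calls on monthly_flights_alluvia gives the numbers behind the boxes in the plot.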

Switch from scientific notation to standard notation (run this before drawing the plot so that it applies to the axis labels)

options(scipen = 999)
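
The effect of scipen can be seen on a single number. The sketch below is an illustration only: it temporarily resets the option to its default of 0, compares the two formats, and then restores the previous setting. It also shows scales::label_comma() as a possible per-plot alternative; that is not the approach used in this document.

```r
# Start from R's default penalty so the comparison is reproducible.
old <- options(scipen = 0)

# By default R prefers scientific notation when it is the shorter form.
default_label <- format(100000)   # "1e+05"

# A large penalty makes fixed notation win instead.
options(scipen = 999)
fixed_label <- format(100000)     # "100000"

# Restore whatever setting was active before this sketch ran.
options(old)

# Per-plot alternative from the scales package (installed with ggplot2):
# a labeller that formats numbers with thousands separators.
comma_labels <- scales::label_comma()(c(100000, 435000))
```

To use the labeller on the plot, add scale_y_continuous(labels = scales::label_comma()) to the ggplot call; this leaves the global option unchanged.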

Conclusion

The alluvial plot shows how the flights departing from each origin airport (EWR, JFK, and LGA) are distributed across the months of 2023. Each flow links one airport to one month, and its width is the number of flights for that pair, so seasonal patterns and unusual months can be identified by comparing flow widths along the month axis. The plot therefore gives a compact view of how flight volumes vary by airport and by month. Because narrow flows are hard to compare by eye, the plot is best read alongside the summarized counts in monthly_flights_alluvia, and adjusting the plot configuration (for example, the axis order or the stratum width) may make specific comparisons easier.