Author

Penelope Pooler Eisenbies

Published

April 30, 2025

Import and Examine Data

Code
```{r}
#| label: import data

# data source: https://www.transtats.bts.gov/Data_Elements.aspx?Data=1

american <- read_csv("data/american_airlines_4_30_2025.csv",
                   show_col_types = F, skip=1)|>
  filter(Month != "TOTAL") |>
  mutate(date_som = ym(paste(Year, Month)),
         Date = ceiling_date(date_som, "month")-1,
         Total = (TOTAL/1000000) |> round(2)) |>
  select(Date, Total) |>
  glimpse(width=60)

american_pp <- american |>          # post_pandemic data 
  filter(Date >= "2021-06-01") |>
  glimpse(width=60)
```
Rows: 268
Columns: 2
$ Date  <date> 2002-10-31, 2002-11-30, 2002-12-31, 2003-01…
$ Total <dbl> 7.75, 7.09, 7.90, 7.01, 6.36, 7.63, 7.10, 7.…
Rows: 44
Columns: 2
$ Date  <date> 2021-06-30, 2021-07-31, 2021-08-31, 2021-09…
$ Total <dbl> 11.92, 12.87, 11.20, 10.00, 11.53, 11.79, 12…

Interactive Time Series Plot

Code
```{r}
#| label: create dygraph

# create interactive plot
american_xts <- xts(x=american[,2], order.by=american$Date)

(american_dg <- dygraph(american_xts[,1], 
                      main="American Airlines - Total Passengers") |>
    dySeries("Total", label="Total (Mill.)", color= "blue") |>
    dyAxis("y", label = "", drawGrid = FALSE) |>
    dyAxis("x", label = "", drawGrid = FALSE) |>
    dyShading(from="2020-3-12", to="2021-6-14", color = "lightgrey") |>
    dyRangeSelector())
```

Full and Truncated Data Plots

Plot of Full Time Series

Seasonal pattern was disrupted by the pandemic when air travel was considered dangerous.

Code
```{r}
#| label: Full Time Series
#| warning: false
#| message: false

(full_plot <- american |>
  ggplot() +
  geom_line(aes(x=Date, y=Total), linewidth=1, color="blue") +
  theme_classic() +
    scale_x_date(date_breaks = "2 years",  
                 date_labels = "%Y", 
                 limits=c(ymd("2001-12-31"), ymd("2025-01-31"))) +
  scale_y_continuous(breaks=seq(0,16,2), limits=c(0,17)) +
  labs(title="American Airlines: October 2002 - January 2025",
       subtitle="Total Passengers (Domestic and International)",
       x="Date", y="Millions of Passengers",
       caption="Data Source: https://www.bts.gov/" ) +
  theme(plot.title = element_text(size = 15),
        plot.caption = element_text(size = 10),
        axis.title.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 15)))

ggsave("img/american_full_plot_2025_04_30.png", width=6, height=4, unit="in")
```

Plot of Truncated Time Series

Once vaccines became readily available, air travel began to resume it’s traditional pattern:

  • Peaks occur at the end of July

  • Low points occur at the end of Jnauary

  • Pattern is not straightforward to discern because post-pandemic data are fairly limited.

  • Data for February and March of 2024 are not available yet.

Code
```{r}
#| label: Truncated Time Series
#| warning: false
#| message: false

(short_plot <- american_pp |>
  ggplot() +
  geom_line(aes(x=Date, y=Total), linewidth=1, color="blue") +
  theme_classic() +
  scale_x_date(date_breaks = "2 months",  
               date_labels = "%b", 
               limits=c(ymd("2021-05-31"), ymd("2025-1-31")))+
  scale_y_continuous(breaks=seq(9,16,1), limits=c(9,17)) +
  labs(title="American Airlines: June 2021 - January 2025",
       subtitle="Total Passengers (Domestic and International)",
       x="Date", y="Millions of Passengers",
       caption="Data Source: https://www.bts.gov/" ) +
  theme(plot.title = element_text(size = 15),
        plot.caption = element_text(size = 10),
        axis.title.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 15)))

#ggsave("img/american_trnc_plot_2024_04_21.png", width=6, height=4, unit="in")
```

Forecast Modeling

Create Time Series (ts)

Code
```{r}
#| label: create time series

american_ts <- ts(american_pp$Total, freq=12, start=c(2021,6)) # create time series

american_ts # display time series
```
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2021                               11.92 12.87 11.20 10.00 11.53 11.79 12.16
2022  9.30  9.52 12.71 12.39 13.21 13.66 14.14 13.45 12.50 13.72 12.83 13.38
2023 12.16 11.80 14.12 13.60 14.21 14.88 15.19 14.39 12.87 14.13 13.43 13.59
2024 13.04 13.00 14.99 14.38 15.39 15.66 16.12 15.08 13.40 14.49 13.14 14.10
2025 12.37                                                                  

Model 1

Code
```{r}
#| label: Model 1

american_model1 <- american_ts |> auto.arima(ic="aic", seasonal=T)

american_forecast1 <- american_model1 |> forecast(h=12)

(american_fcp1 <- autoplot(american_forecast1) + 
    labs(y = "Number of Passenger (Mill.)") + 
    theme_classic())

american_forecast1

checkresiduals(american_forecast1, test=F)

(acr1 <- accuracy(american_forecast1)) 
```
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Feb 2025        12.8223 12.10206 13.54253 11.72080 13.92380
Mar 2025        14.8123 14.00765 15.61694 13.58170 16.04290
Apr 2025        14.2023 13.32129 15.08330 12.85492 15.54968
May 2025        15.2123 14.26104 16.16356 13.75748 16.66712
Jun 2025        15.4823 14.46563 16.49896 13.92745 17.03715
Jul 2025        15.9423 14.86419 17.02041 14.29347 17.59113
Aug 2025        14.9023 13.76606 16.03854 13.16457 16.64003
Sep 2025        13.2223 12.03076 14.41383 11.40000 15.04460
Oct 2025        14.3123 13.06792 15.55668 12.40919 16.21541
Nov 2025        12.9623 11.66723 14.25736 10.98167 14.94293
Dec 2025        13.9223 12.57846 15.26614 11.86707 15.97753
Jan 2026        12.1923 10.80139 13.58321 10.06509 14.31951
                      ME      RMSE       MAE        MPE     MAPE      MASE
Training set -0.08623257 0.4640575 0.3073673 -0.6014574 2.309707 0.2775326
                  ACF1
Training set 0.0742644

Model 2

Code
```{r}
#| label: Model 2

american_model2 <- american_ts |> auto.arima(ic="aic", seasonal=F) 

american_forecast2 <- american_model2 |> forecast(h=12)

(american_fcp2 <- autoplot(american_forecast2) + 
    labs(y = "Number of Passenger (Mill.)") + 
    theme_classic())

american_forecast2

checkresiduals(american_forecast2, test=F)

(acr2 <- accuracy(american_forecast2)) 
```
         Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
Feb 2025       12.12161 10.709308 13.53392  9.961678 14.28155
Mar 2025       13.42754 11.628298 15.22678 10.675837 16.17924
Apr 2025       12.60706 10.720385 14.49373  9.721641 15.49247
May 2025       13.12255 11.027264 15.21783  9.918088 16.32701
Jun 2025       12.79868 10.590985 15.00637  9.422305 16.17505
Jul 2025       13.00216 10.643194 15.36112  9.394434 16.60988
Aug 2025       12.87431 10.400230 15.34840  9.090529 16.65810
Sep 2025       12.95464 10.354589 15.55468  8.978208 16.93106
Oct 2025       12.90417 10.193688 15.61465  8.758846 17.04950
Nov 2025       12.93588 10.113462 15.75829  8.619367 17.25239
Dec 2025       12.91596  9.989423 15.84249  8.440210 17.39170
Jan 2026       12.92847  9.899255 15.95769  8.295684 17.56126
                     ME     RMSE       MAE        MPE     MAPE      MASE
Training set 0.04029695 1.049845 0.8915019 -0.2025601 6.954573 0.8049679
                     ACF1
Training set -0.007175111