---
title: "BUA 345"
subtitle: "Time Series and Forecasting Practice Questions"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
## Setup
- Run the following chunk of R code to install and load the packages needed for this assignment.
- Click green triangle in the upper right corner of the setup chunk to run the setup code.
- Note that this may be slow if there are packages that have not been installed yet.
```{r}
#| label: setup
#| include: false
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman,tidyverse, magrittr, knitr, gridExtra,
forecast, tidyquant, lubridate, ggthemes,
RColorBrewer, dygraphs)
# verify packages
#p_loaded()
```
## Import and Examine Data
```{r}
#| label: import data
#| code-fold: true
# data source:
# https://www.transtats.bts.gov/Data_Elements.aspx?Data=1
united <- read_csv("data/united_passengers_2025_04_16.csv",
show_col_types = F, skip=1)|>
filter(Month != "TOTAL") |>
mutate(date_som = ym(paste(Year, Month)),
Date = ceiling_date(date_som, "month")-1,
Total = (TOTAL/1000000) |> round(2)) |>
select(Date, Total) |>
glimpse(width=60)
united_pp <- united |> # post_pandemic data
filter(Date >= "2021-06-01") |>
glimpse(width=60)
```
## Full and Truncated Data Plots
### Plot of Full Time Series
Seasonal pattern was disrupted by the pandemic when air travel was considered dangerous.
```{r}
#| label: Full Time Series
#| echo: false
#| warning: false
#| message: false
(full_plot <- united |>
ggplot() +
geom_line(aes(x=Date, y=Total), linewidth=1, color="blue") +
theme_classic() +
scale_x_date(date_breaks = "2 years",
date_labels = "%Y",
limits=c(ymd("2001-12-31"), ymd("2025-12-31"))) +
scale_y_continuous(breaks=seq(0,12,2), limits=c(0,13)) +
labs(title="United Airlines: October 2002 - January 2025",
subtitle="Total Passengers (Domestic and International)",
x="Date", y="Millions of Passengers",
caption="Data Source: https://www.bts.gov/" ) +
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15)))
ggsave("img/united_full_plot_2025_04_16.png", width=6, height=4, unit="in")
```
### Plot of Truncated Time Series
Once vaccines became readily available, air travel began to resume it's traditional pattern:
- Peaks occur at the end of July
- Low points occur at the end of Jnauary
- Pattern is not straightforward to discern because post-pandemic data are fairly limited.
- Data for February and March of 2024 are not available yet.
```{r}
#| label: Truncated Time Series
#| echo: false
#| warning: false
#| message: false
(short_plot <- united_pp |>
ggplot() +
geom_line(aes(x=Date, y=Total), linewidth=1, color="blue") +
theme_classic() +
scale_x_date(date_breaks = "3 months",
date_labels = "%b",
limits=c(ymd("2021-05-31"), ymd("2025-1-31")))+
scale_y_continuous(breaks=seq(0,12,2), limits=c(0,13)) +
labs(title="United Airlines: June 2021 - January 2025",
subtitle="Total Passengers (Domestic and International)",
x="Date", y="Millions of Passengers",
caption="Data Source: https://www.bts.gov/" ) +
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15)))
ggsave("img/united_trnc_plot_2025_04_16.png", width=6, height=4, unit="in")
```
## Forecast Modeling
### Create Time Series (`ts`)
```{r}
#| label: create time series
#| code-fold: true
# create and display time series
(united_ts <- ts(united_pp$Total, freq=12, start=c(2021,6)))
```
### Model 1
```{r}
#| label: Model 1
#| code-fold: true
united_model1 <- united_ts |> auto.arima(ic="aic", seasonal=F)
united_forecast1 <- united_model1 |> forecast(h=12)
(united_fcp1 <- autoplot(united_forecast1) +
labs(y = "Number of Passenger (Mill.)") +
theme_classic())
united_forecast1
checkresiduals(united_forecast1, test=F)
(acr1 <- accuracy(united_forecast1))
```
### Model 2
```{r}
#| label: Model 2
#| code-fold: true
united_model2 <- united_ts |> auto.arima(ic="aic", seasonal=T)
united_forecast2 <- united_model2 |> forecast(h=12)
(united_fcp2 <- autoplot(united_forecast2) +
labs(y = "Number of Passenger (Mill.)") +
theme_classic())
united_forecast2
checkresiduals(united_forecast2, test=F)
(acr2 <- accuracy(united_forecast2))
```