---
title: "BUA 345 - Lecture 26"
subtitle: "Forecasting - Part 2"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
## Housekeeping
```{r setup, echo=FALSE, warning=F, message=F, include=F}
#| include: false
# this line specifies options for default options for all R Chunks
knitr::opts_chunk$set(echo=F)
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman,tidyverse, magrittr, knitr, gridExtra, forecast, tidyquant, lubridate, maps, usdata, mapproj, ggthemes, RColorBrewer, dygraphs, kableExtra)
# verify packages
# p_loaded()
```
### Upcoming Dates
- **HW 9 is due on Wednesday, 4/15**.
- Grace Period ends Thursday (4/16) at 10:00 PM.
- **HW 10 is now posted and is due on Monday 4/27.**
- **OPTIONAL [GitHub Quarto Dashboard Workshop](https://peneloopy.github.io/bua_345_sem/#dashboard-options){target="_blank"} on Fri., 4/17, at 3:15 PM**.
- If you are interested in attending, please sign up using the [Google Form](https://forms.gle/nqcSgwCc4LQrJ7PF6){target="_blank"}.
- Lecture 27 (4/21): Come to class with a stock symbol that you are interested in.
- Stock must be traded on NYSE
- This [NYSE Directory](https://www.nyse.com/listings_directory/stock){target="_blank"} will help you.
- Verify that stock can be found on [Yahoo Finance](https://finance.yahoo.com/){target="_blank"}.
##
## Course Evaluations
- **Evaluations are VERY Important:**
- [**coursefeedback.syr.edu**](http://coursefeedback.syr.edu/)
- I will provide time in class for evaluations next week.
- Please complete evaluations for **ALL** courses.
##
### Today's plan
- **Review of Time Series Concepts**
- Review and New Terminology
- Brief Review of Time Series without Seasonality
- Seasonality in Time Series Data
- Forecasting Trends with Seasonality in R
- Example from HW 10: Alaska
- HW 10 is now posted and is due on 4/28
- Part of HW 10 pertains to today's lecture.
- Guest Speaker - [Steven Davis from DLA Piper](https://www.dlapiper.com/en-us/people/d/davis-steven){target="_blank"} will speak for 15 minutes.
##
### Lecture 26 In-class Exercises - Q1 - Review
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
The `AR` in ARIMA stands a type of regression when you regress a variable on itself by using previous observations to predict future ones.
<br>
**This is known as `___`-regression.**
## Cross-Sectional Data
**Shows a Snapshot of One Time Period**
```{r echo = F, message = FALSE, fig.align='center'}
gsb26 <- read_csv("data/gsb_26.csv", show_col_types = F, skip=25,
col_select = c(1,2,6), ) |>
slice(1:5) |>
rename("city" = "Highest – 7.9",
"2025-26" = "...2",
"Most" = "...6") |>
mutate(Most = substr(Most, 1,5) |> as.numeric())
gsb_long <- gsb26 |> pivot_longer(cols = `2025-26`:Most,
names_to = "type",
values_to = "inches")
(gsb_plt <- gsb_long |>
ggplot() +
geom_bar(aes(x=city, y=inches, fill=type),
stat="identity", position="dodge") +
scale_fill_manual(values=c("blue4", "lightblue")) +
theme_classic() +
labs(fill="", x="City", y="Snowfall (inches)",
caption="Data Source: https://goldensnowball.com/",
title="City Snowfall - Current and All-time Record")+
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 15),
plot.background =
element_rect(colour = "darkgrey", fill=NA, linewidth=2)))
```
## Time Series Data
**Shows Trend over Time**
```{r syr snowfall timeseries, echo=F, message=F, fig.align='center'}
snowfall <- read_csv("data/snowfall_upstateny_cities.csv",
show_col_types = F) |>
filter(!Season=="Season") |>
separate(Season, into=c("Season_Start", "Season_End"), sep = "-") |>
mutate(Season_Start = Season_Start |> as.integer(),
Season_End = Season_Start + 1 |> as.integer(),
Syracuse = Syracuse |> as.numeric(),
Buffalo = Buffalo |> as.numeric()) |>
rename("city_most" = "City With Most Snow") |>
select(Season_End, Syracuse, Buffalo, city_most) |>
filter(Season_End >= 1952) |>
pivot_longer(cols=Syracuse:Buffalo, names_to = "City", values_to = "Snowfall")
(line_plot <- snowfall |>
ggplot() +
geom_line(aes(x=Season_End, y=Snowfall, color=City), linewidth=1) +
theme_classic() +
scale_x_continuous(breaks=seq(1960, 2020, 10)) +
scale_color_manual(values=c("lightblue", "blue")) +
ylim(0,200) +
labs(title="Syracuse and Buffalo Annual Snowfall",
y="Snowfall (inches)",
x="Year Season Ended",
caption="Data Source: https://en.wikipedia.org/wiki/Golden_Snowball_Award") +
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15),
plot.background =
element_rect(colour = "darkgrey", fill=NA, size=2)))
```
## Time Series Terminology
- **auto-correlation:** A variable is correlated with itself
- **auto-regression (AR):** Using previous observations to predict into the future.
- **R function:** **`auto.arima` - ARIMA** is an acronym:
- **AR:** auto-regressive - `p` = number of lags to minimize auto-correlation
- **I:** integrated - `d` = order of differencing to achieve stationarity
- **MA:** moving average - `q` = number of terms in moving average
- All 3 components are optimized to provide a reliable forecast.
- R software optimizes ARIMA model by varying these components.
## More on Stationarity
- **Stationary Time Series:**
- Consistent mean and variance throughout time series
- Time series with trends, or with seasonality, are not stationary.
- Separating a time series into different parts is how we analyze it
- This is called **DECOMPOSITION**
- Time Series Modeling decomposes the data into:
- Trend
- Seasonality (repeated pattern)
- Residuals (what's left over)
## Decomposition and SARIMA Models
::: fragment
**NEW TERM: SARIMA MODEL**
:::
- Lecture 25: **ARIMA** models
- Today: ARIMA models with **SEASONAL** component.
- **SARIMA:** **S**easonal **A**uto-**R**egressive **I**ntegrated **M**oving **A**verage.
- **SARIMA** models:
- optimize `p`, `d`, and `q` for whole time series
- Also optimize `p`, `d`, and `q` within season (repeating intervals)
::: fragment
**DECOMPOSITION**
:::
- ARIMA models are decomposed into
- Trend \| Residuals
- SARIMA models are decomposed into
- Trend \| Seasonal patterns \| Residuals
------------------------------------------------------------------------
### Visualization of Decomposition:
::::: columns
::: {.column width="40%"}

:::
::: {.column width="60%"}
- **ARIMA:**
- 2nd Plot looks similar to Population Time Series
- ARIMA decomposes trend into:
- Trend (2nd Plot)
- Residuals (4th Plot)
- **SARIMA:**
- Plot 1: Time Series with a seasonal pattern.
- SARIMA decomposes trend into:
- Trend (2nd Plot)
- Seasonality (3rd Plot)
- Residuals (4th Plot)
:::
:::::
## Netflix Stock Prices - Review {background-color="aliceblue"}
**Dashed lines show peaks at irregular intervals.**
```{r dygraph nflx data, echo=F}
# import from yahoo finance and plot hchart
nflx <- getSymbols("NFLX", from = "2010-01-01", to = "2026-04-15")
(nflx_dyg <- NFLX[,6] |>
dygraph(main="Netflix: Non-Seasonal Trending Data") |>
dySeries("NFLX.Adjusted", "NFLX", color="red") |>
dyRangeSelector() |>
# remove grid lines and label axes to make plot more readable (optional)
dyAxis("y", label = "Adjusted Close", drawGrid = FALSE) |>
dyAxis("x", label = "Date", drawGrid = FALSE) |>
# add lines for events
dyEvent("2014-2-1", color="lightgrey") |>
dyEvent("2014-8-1", color="lightgrey") |>
dyEvent("2015-8-1", color="lightgrey") |>
dyEvent("2015-11-1", color="lightgrey") |>
dyEvent("2018-6-1", color="lightgrey") |>
dyEvent("2018-8-1", color="lightgrey") |>
dyEvent("2019-4-1", color="lightgrey") |>
dyEvent("2019-6-1", color="lightgrey") |>
dyEvent("2020-8-1", color="lightgrey") |>
dyEvent("2021-1-1", color="lightgrey") |>
dyEvent("2021-11-1", color="lightgrey") |>
dyEvent("2023-2-2", color="lightgrey") |>
dyEvent("2023-7-19", color="lightgrey") |>
dyEvent("2025-2-14", color="lightgrey") |>
dyEvent("2025-6-27", color="lightgrey"))
# convert xts to tibble
nflx <- NFLX |>
fortify.zoo() |> as_tibble(.name_repair = "minimal") |>
rename("Date" = "Index", "Adjusted" = "NFLX.Adjusted") |>
mutate(year=year(Date), month=month(Date), day=day(Date)) |>
group_by(year, month) |>
filter(day == min(day, na.rm=T)) |>
ungroup() |>
select(Date, Adjusted)
```
## Netflix Stock
- **Forecast Questions:**
- What will be the estimated stock price be in April of `r year(Sys.Date())+1`?
- What ARIMA model was chosen (p,d,q)?
- **Model Assessment Questions:**
- How valid is our model?
- Check residual plots.
- How are accurate are our estimates?
- Examine Prediction Intervals and Prediction Bands
- Check fit statistics
## Netflix Stock - Modeling Time Series Data
**Stock Trend Forecast**
- Create time series using Netflix Stock data
- Specify `freq = 12` - 12 observations per year
- Specify `start = c(2010, 1)` - first obs. in dataset is January 2010
- Model data using `auto.arima` function
- Specify `ic = aic` - `aic` is the information criterion used to determine model.
- Specify `seasonality = F` - no seasonal (repeating) pattern in the data.
- This code will create and save the model:
:::: fragment
::: r-fit-text
```{r create nflx time series and model, echo=T}
nflx_ts <- ts(nflx$Adjusted, freq=12, start=c(2010,1)) # create time series
nflx_model <- auto.arima(nflx_ts, ic="aic", seasonal=F) # model data using auto.arima
```
:::
::::
## Netflix Stock - Create and Plot Forecasts
- Create forecasts (until April `r year(Sys.Date())+1`)
- `h = 12` indicates we want to forecast 12 months
- Most recent date in forecast data is April 1, `r year(Sys.Date())`
- 12 Months until April 1, `r year(Sys.Date())+1`
- **Forecasts become less accurate the further into the future you specify.**
:::: fragment
::: r-fit-text
```{r create nflx forecasts, echo=T}
nflx_forecast <- forecast(nflx_model, h=12) # create forecasts (until April 2027)
nflx_pred_plot <- autoplot(nflx_forecast) + labs(y = "Adjusted Closing Price") +
theme_classic()
```
:::
::::
- Darker purple: 80% Prediction Interval Bounds
- Lighter purple: 95% Prediction Interval Bounds
- **Plot shows:**
- **Lags (p = `r summary(nflx_model)$arma[1]`), Differencing (d = `r summary(nflx_model)$arma[6]`), Moving Average (q = `r summary(nflx_model)$arma[2]`)**
## Netflix Stock - Forecast Plot
```{r plot nflx forecasts with pred intervals, echo=F}
nflx_pred_plot
```
##
### Lecture 26 In-class Exercises - Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
Based on the plot on the previous slide, we can tell how many previous periods (q) are included in each moving average in the model created by the `auto.arima` function in R.
<br>
**What is q, the number of previous time periods in the moving average?**
##
### Netflix Stock - Examine Numerical Forecasts
- Point Forecast is the forecasted estimate for each future time period
- Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
- Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval
::: fragment
```{r nflx numerical forecasts}
nflx_forecast # prints out forecast values
```
:::
##
### Lecture 26 In-class Exercises - Q3
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
Interpretation of Netflix Prediction Intervals
<br>
**In March of `r year(Sys.Date())+1`, the 80% prediction interval width for the Netflix stock price will be \$`____` wide.**
To find this width, subtract the lower bound (`Lo 80`) from the upper bound (`Hi 80`) and round to the closest whole dollar.
##
### Netflix Stock - Examine Residuals and Model Fit
:::::: columns
::: {.column width="40%"}
- Top Plot: Spikes get larger over time
- ACF: auto-correlation function.
- Ideally, all or most values are with dashed lines
- Histogram: Distribution of residuals should be approx. normal
- Assessment: Stock prices are very volatile and this is sufficient.
:::
:::: {.column width="60%"}
::: fragment
```{r nflx residual plots}
checkresiduals(nflx_forecast) # examine residuals
```
:::
::::
::::::
##
### Netflix Stock - Examine Residuals and Model Fit
:::: fragment
::: r-fit-text
```{r nflx fit statistics }
(acr <- accuracy(nflx_forecast)) # examine model accuracy (fit)
```
:::
::::
- For BUA 345: We will use MAPE = Mean Absolute Percent Error
- **100 – MAPE = Percent accuracy of model.**
- Despite increasing volatility, our stock price model is estimated to be `r round(100 - acr[5],2)`% accurate.
- This doesn’t guarantee that forecasts will be `r round(100 - acr[5])`% accurate but it does improve our chances of accurate forecasting.
##
### Seasonality - Not Just Seasons
- Seasonal periods can be days, months, seasons, decades, etc.
- Seasonality: repeating pattern of highs and lows of approx. equal timespans
- 2024 Example shows some clear up and down pattern
::: fragment

:::
##
### Seasonality - Sometimes Pattern is Subtle
- Seasonal periods can be days, months, seasons, decades, etc.
- Seasonality: repeating pattern of highs and lows of approx. equal timespans
- `r year(Sys.Date())` seasonal pattern in this week's data is much more subtle, but still present.
::: fragment
{fig-align="center" height="3.5in"}
:::
##
### Seasonality - Not Just Seasons
- Seasonal periods can be days, months, seasons, decades, etc.
- Seasonality: repeating pattern of highs and lows of approx. equal timespans
::: fragment
***Carbon Dioxide Trends - Monthly - 1958 to Present Day***
```{r import co2 data, echo=F, include=F, message=FALSE, results='hide'}
co2 <- read_csv("data/co2.csv", show_col_types = F) |>
dplyr::select(1,2,4,5) |>
mutate(date=ymd(paste(yr,month,15))) |>
glimpse()
# https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt
# https://gml.noaa.gov/dv/data/index.php?parameter_name=Carbon%2BDioxide&site=MLO%22
```
```{r co2 trends, echo=F, message=F, fig.height=4}
(co2_plot <- co2 |>
ggplot() +
geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) +
theme_classic() +
labs(title="Carbon Dioxide Measurements (1958 - 2025)",
subtitle="Monthly Data from Mauna Loa Observatory, Hawaii, USA",
y="Carbon Dioxide (ppm)",
x="Date",
caption="Data Source: https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt") +
scale_x_date(breaks = "10 years", date_labels = "%Y") +
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 8),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)))
```
:::
##
### Seasonality - Not Just Seasons
- Seasonal periods can be days, months, seasons, decades, etc.
- Seasonality: repeating pattern of highs and lows of approx. equal timespans
::: fragment
**Carbon Dioxide - Monthly - 2015 to Present Day**
```{r recent co2 trends, echo=F, message=F, fig.height=4}
co2_1 <- co2 |> filter(yr >= 2015)
(co2_plot1 <- co2_1|>
ggplot() +
geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) +
theme_classic() +
labs(title="Carbon Dioxide Measurements (2015 - 2026)",
subtitle="Monthly Data from Mauna Loa Observatory, Hawaii, USA",
y="Carbon Dioxide (ppm)",
x="Date",
caption="Data Source: https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt") +
scale_x_date(breaks = "1 year", date_labels = "%Y") +
theme(plot.title = element_text(size = 15),
plot.caption = element_text(size = 8),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)))
```
:::
## Seasonality and Trend
::::: columns
::: {.column width="40%"}
```{r smaller co2 plot, echo=F}
(co2_plot2 <- co2_1|>
ggplot() +
geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) +
theme_classic() +
labs(title="CO2 (2015 - 2026)",
y="Carbon Dioxide (ppm)",
x="Date") +
scale_x_date(breaks = "1 year", date_labels = "%Y") +
theme(plot.title = element_text(size = 15),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 15),
plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2)))
```
**Data above are decomposed into these components:**

:::
::: {.column width="60%"}
- Plot shows BOTH
- Upward Trend
- Seasonal Pattern
- Forecasting model is specified to account for both components
- Forecasting decomposes data into
- **Trend**
- **Seasonality**
- **Residuals**
:::
:::::
## Seasonal Data: Alaska Electricity Revenue
::::: columns
::: {.column width="60%"}
- **Alaska is very far north** so there is
- ***summer light*** (day and night)
- ***winter darkness*** (day and night)
- Alaska Electricity usage has a **strong** seasonal pattern.
- Data are quarterly residential revenues:
- 1st Qtr of 2001 to 4th Qtr of `r year(Sys.Date())-1`
:::
::: {.column width="40%"}

:::
:::::
```{r import ak data, echo=F, message=F}
# a little data mgmt (if you are interested)
ak_res <- read_csv("data/AK_Residential_Electricity_Revenue_Updated.csv",
show_col_types = F,
skip=5,
col_names = c("quarter","Revenue")) |>
separate(quarter, c("Quarter", "Year")) |>
mutate(Date= ceiling_date(yq(paste(Year, Quarter)), 'quarter') - 1 ) |>
select(Date, Revenue) |>
arrange(Date)
# https://www.eia.gov/electricity/data/browser/#/topic/5?agg=0,1&geo=000000000000g&endsec=8&linechart=ELEC.SALES.AK-RES.Q&columnchart=ELEC.SALES.AK-RES.Q&map=ELEC.SALES.AK-RES.Q&freq=Q&ctype=linechart<ype=pin&rtype=s&maptype=0&rse=0&pin=
```
::: fragment
```{r glimpse ak data}
ak_res |> glimpse(width=60) # Alaska Residential Electricity Usage
```
:::
## **Alaska Residential Electricity Time Series** {background-color="aliceblue"}
```{r ak_res to xts and plot, echo=F}
# convert to xts (extensible time series)
ak_res_xts <- xts(x = ak_res[,2], order.by = ak_res$Date)
# ak-res interactive time_series
(ak_dg <- dygraph(ak_res_xts, main="Alaska Residential Electricity Revenue ($Mill)") |>
dySeries("Revenue", label="Revenue", color= "purple") |>
dyAxis("y", label = "", drawGrid = FALSE) |>
dyAxis("x", label = "", drawGrid = FALSE) |>
dyRangeSelector())
```
## **Alaska Residential Time Series** 
- Create Time Series and Examine it:
::: fragment
```{r create ak time series}
ak_res_ts <- ts(ak_res$Revenue, freq=4, start=c(2001, 1)) # create time series
```
:::
- Format of Time Series with Quarters:
- `head(ak_res_ts, 20)` shows first 20 observations and format.
:::: fragment
::: {.r-fit-text max="1em"}
```{r head used to show ts format}
head(ak_res_ts, 20) # shows time series in ts matrix format
```
:::
::::
##
### Lecture 26 In-class Exercises - Q4
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
If our time series from Alaska were augmented so that it started in February of 1990 (2nd month) and we had data by month (12 observations per year), how would our `ts` command change in R?
Hint: Our current data, `ak_res` are quarterly, and begin in the first quarter of 2001. The command we used to create time series is:
**`ts(ak_res$Revenue, freq=4, start=c(2001,1))`**
<br>
::: nonincremental
- `ts(ak_res$Revenue, freq=1, start=c(1, 1990))`
- `ts(ak_res$Revenue, freq=4, start=c(2, 1990))`
- `ts(ak_res$Revenue, freq=12, start=c(1990, 2))`
- `ts(ak_res$Revenue, freq=12, start=c(2, 1990))`
- `ts(ak_res$Revenue, freq=4, start=c(1, 1990))`
:::
## **Alaska Residential Electricity Time Series**
**Incorrect Model: Ignores Seasonality (`seasonal = F`)**
- Notice how wide the prediction intervals are.
- Model only optimizes `p`, `d`, and `q` for full time series `(0,0,4)`.
::: fragment
```{r incorrect model plt lg, message=F, fig.dim=c(8,3)}
ak_res_forecast1 <- ak_res_ts |>
auto.arima(ic="aic", seasonal=F) |>
forecast(h=4)
(autoplot(ak_res_forecast1) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())
```
:::
## **Alaska Residential Electricity Time Series**
**Correct Model: Includes Seasonality (`seasonal = T`)**
- Prediction intervals are **MUCH** more narrow
- Optimizes `p`, `d` and `q` for full time series `(1,0,1)` and within season `(2,1,0)`.
- Indicates number of time periods within season, `[4]`
::: fragment
```{r correct model plt lg, message=F, fig.dim=c(8,3)}
ak_res_forecast2 <- ak_res_ts |>
auto.arima(ic="aic", seasonal=T) |>
forecast(h=4)
(autoplot(ak_res_forecast2) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())
```
:::
##
### Lecture 26 In-class Exercises - Q5
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
Our data is quarterly and has four observations per year ending in the 4th quarter of `r year(Sys.Date())-1`.
**If the state of Alaska wants to extend the forecast until the Fall of `r year(Sys.Date())+1` (3rd Quarter), how would they change the R command?**
Hint: Current forecast extends until the 4th quarter of `r year(Sys.Date())` and command is written as: **`forecast(ak_res_model2, h=4)`**
::: nonincremental
- `forecast(ak_res_model2, h=6)`
- `forecast(ak_res_model2, h=7)`
- `forecast(ak_res_model2, h=8)`
- `forecast(ak_res_model2, h=9)`
- `forecast(ak_res_model2, h=10)`
:::
##
::::: columns
::: {.column width="50%"}
**Incorrect Model: Less precise**
```{r incorrect model plt sm, echo=F, message=F, fig.dim=c(5,3)}
(autoplot(ak_res_forecast1) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())
```
```{r incorrect model forecasts, echo=F, message=F}
out1 <- round(bind_cols(ak_res_forecast1$mean,
ak_res_forecast1$lower, ak_res_forecast1$upper),2) |>
rename("Pt" = "...1", "Lo80"="80%...2", "Lo95"="95%...3",
"Hi80"="80%...4", "Hi95"="95%...5") |>
mutate(Year = year(Sys.Date()),
Qtr = 1:4) |>
select(Year, Qtr, "Pt", "Lo95", "Hi95")
knitr::kable(out1, format = 'html', )
```
**`Q4 Width = Hi - Lo = $201`**
:::
::: {.column width="50%"}
**Correct Model: More precise**
```{r correct model plt sm , echo=F, message=F, fig.dim=c(5,3)}
(autoplot(ak_res_forecast2) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())
```
```{r correct model forecasts, echo=F, message=F}
out2 <- round(bind_cols(ak_res_forecast2$mean,
ak_res_forecast2$lower, ak_res_forecast2$upper),2) |>
rename("Pt" = "...1", "Lo80"="80%...2", "Lo95"="95%...3",
"Hi80"="80%...4", "Hi95"="95%...5") |>
mutate(Year = year(Sys.Date()),
Qtr = 1:4) |>
select(Year, Qtr, "Pt", "Lo95", "Hi95")
knitr::kable(out2, format = 'html', )
```
**`Q4 Width = Hi - Lo = $70`**
:::
:::::
##
### Prediction Bands Indicate Model Precision
- Prediction bands are **MUCH** narrower when seasonality is accounted for.
::: fragment
**Incorrect Model Forecasts and Prediction Bounds**
```{r incorrect model forecasts full, echo=F, message=F, fig.height=3}
ak_res_forecast1
```
:::
::: fragment
**Correct Model Forecasts and Prediction Bounds**
```{r correct model forecasts full, echo=F, message=F, fig.height=3}
ak_res_forecast2
```
:::
- **Interpretation of 95% Prediction Bounds:**
- We are 95% certain that 4th qtr. revenue in `r year(Sys.Date())` will fall within:
- Incorrect model range: **`$614.35 - $413.63 = $201`**
- Correct model range: **`$596.72 - $526.75 = $70`**
##
### Comparison of Model Residuals
::::: columns
::: {.column width="50%"}
```{r incorrect residuals, echo=F, fig.height=6}
checkresiduals(ak_res_forecast1, test=F)
```
**Incorrect Model:**
- Residuals **MUCH** larger
- Are highly correlated
- See ACF plot
:::
::: {.column width="50%"}
```{r correct residuals, echo=F, fig.height=6}
checkresiduals(ak_res_forecast2, test=F)
```
**Correct Model:**
- Residuals **MUCH** smaller
- Auto-correlation assumption is met
- ACF plot: lags in range
:::
:::::
##
### Comparison of Model Accuracy
::: fragment
**Incorrect Model Accuracy:**
```{r incorrect model accuracy}
acr1 <- accuracy(ak_res_forecast1) # examine MAPE and model percent accuracy
acr1 |> kable() |> kable_styling(full_width = F, font_size = 18)
```
:::
- **The incorrect model's percent accuracy is `r round(100-acr1[5],1)`%.** - Better than expected.
::: fragment
**Correct Model Accuracy:**
```{r correct model accuracy}
acr2 <- accuracy(ak_res_forecast2) # examine MAPE and model percent accuracy
acr2 |> kable() |> kable_styling(full_width = F, font_size = 18)
```
:::
- **The correct model's percent accuracy is `r round(100-acr2[5],1)`%.**
- Always plot data, but if seasonality is difficult to discern, run both models and compare them.
- Residuals (previous slide) and model accuracy (this slide) of models will indicate which model is correct.
##
### Key Points from Today
- **R `forecast` package** - simplifies forecasting.
- Know terminology and how to read and interpret output.
- **Plot data FIRST:** - Check for seasonality, trend, other patterns
- **Lecture 27 (Tue. 4/21)** - Come to class with a stock symbol from the [NYSE](https://www.nyse.com/listings_directory/stock){target="_blank"} that can be found on [Yahoo Finance](https://finance.yahoo.com/){target="_blank"}.
- **Lecture 28 (Thu. 4/23)** - 20 min. of lecture with Poll Everywhere Questions then time for Q&A
- Come with questions!
- **Evaluations are VERY Important: [coursefeedback.syr.edu](http://coursefeedback.syr.edu/)**
- **HW 10 includes material from Lectures 24-26**
- **HW 9 was due on 4/16.**
::: fragment
**To submit an Engagement Question or Comment about material from Lecture 26:** Submit it by midnight today (day of lecture).
:::