BUA 345 - Lecture 26

Forecasting - Part 2

Author

Penelope Pooler Eisenbies

Published

April 15, 2026

Housekeeping

Upcoming Dates

HW 9 is due on Wednesday, 4/15.
- Grace Period ends Thursday (4/16) at 10:00 PM.
HW 10 is now posted and is due on Monday 4/27.
OPTIONAL GitHub Quarto Dashboard Workshop on Fri., 4/17, at 3:15 PM.
- If you are interested in attending, please sign up using the Google Form.
Lecture 27 (4/21): Come to class with a stock symbol that you are interested in.
- Stock must be traded on NYSE
- This NYSE Directory will help you.
- Verify that stock can be found on Yahoo Finance.

Course Evaluations

Evaluations are VERY Important:
- coursefeedback.syr.edu
- I will provide time in class for evaluations next week.
- Please complete evaluations for ALL courses.

Today’s plan

Review of Time Series Concepts
- Review and New Terminology
Brief Review of Time Series without Seasonality
Seasonality in Time Series Data
Forecasting Trends with Seasonality in R
- Example from HW 10: Alaska
HW 10 is now posted and is due on 4/28
- Part of HW 10 pertains to today’s lecture.
Guest Speaker - Steven Davis from DLA Piper will speak for 15 minutes.

Lecture 26 In-class Exercises - Q1 - Review

Poll Everywhere - My User Name: penelopepoolereisenbies685

The AR in ARIMA stands a type of regression when you regress a variable on itself by using previous observations to predict future ones.

This is known as ___-regression.

Cross-Sectional Data

Shows a Snapshot of One Time Period

Time Series Data

Shows Trend over Time

Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

Time Series Terminology

auto-correlation: A variable is correlated with itself
auto-regression (AR): Using previous observations to predict into the future.
R function: auto.arima - ARIMA is an acronym:
- AR: auto-regressive - p = number of lags to minimize auto-correlation
- I: integrated - d = order of differencing to achieve stationarity
- MA: moving average - q = number of terms in moving average
- All 3 components are optimized to provide a reliable forecast.
  - R software optimizes ARIMA model by varying these components.

More on Stationarity

Stationary Time Series:
- Consistent mean and variance throughout time series
- Time series with trends, or with seasonality, are not stationary.
- Separating a time series into different parts is how we analyze it
  - This is called DECOMPOSITION
  - Time Series Modeling decomposes the data into:
    - Trend
    - Seasonality (repeated pattern)
    - Residuals (what’s left over)

Decomposition and SARIMA Models

NEW TERM: SARIMA MODEL

Lecture 25: ARIMA models
Today: ARIMA models with SEASONAL component.
SARIMA: Seasonal Auto-Regressive Integrated Moving Average.
SARIMA models:
- optimize p, d, and q for whole time series
- Also optimize p, d, and q within season (repeating intervals)

DECOMPOSITION

ARIMA models are decomposed into
- Trend | Residuals
SARIMA models are decomposed into
- Trend | Seasonal patterns | Residuals

Visualization of Decomposition:

ARIMA:
2nd Plot looks similar to Population Time Series
ARIMA decomposes trend into:
- Trend (2nd Plot)
- Residuals (4th Plot)
SARIMA:
- Plot 1: Time Series with a seasonal pattern.
- SARIMA decomposes trend into:
  - Trend (2nd Plot)
  - Seasonality (3rd Plot)
  - Residuals (4th Plot)

Netflix Stock Prices - Review

Dashed lines show peaks at irregular intervals.

Netflix Stock

Forecast Questions:
- What will be the estimated stock price be in April of 2027?
- What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
- How valid is our model?
  - Check residual plots.
- How are accurate are our estimates?
  - Examine Prediction Intervals and Prediction Bands
  - Check fit statistics

Netflix Stock - Modeling Time Series Data

Stock Trend Forecast

Create time series using Netflix Stock data
- Specify freq = 12 - 12 observations per year
- Specify start = c(2010, 1) - first obs. in dataset is January 2010
Model data using auto.arima function
- Specify ic = aic - aic is the information criterion used to determine model.
- Specify seasonality = F - no seasonal (repeating) pattern in the data.
This code will create and save the model:

Code

```{r create nflx time series and model, echo=T}
nflx_ts <- ts(nflx$Adjusted, freq=12, start=c(2010,1))   # create time series
nflx_model <- auto.arima(nflx_ts, ic="aic", seasonal=F)  # model data using auto.arima
```

Netflix Stock - Create and Plot Forecasts

Create forecasts (until April 2027)
- h = 12 indicates we want to forecast 12 months
- Most recent date in forecast data is April 1, 2026
- 12 Months until April 1, 2027
Forecasts become less accurate the further into the future you specify.

Code

```{r create nflx forecasts, echo=T}
nflx_forecast <- forecast(nflx_model, h=12) # create forecasts (until April 2027)
nflx_pred_plot <- autoplot(nflx_forecast) + labs(y = "Adjusted Closing Price") +
  theme_classic()
```

Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
Plot shows:
- Lags (p = 2), Differencing (d = 1), Moving Average (q = 2)

Netflix Stock - Forecast Plot

Lecture 26 In-class Exercises - Q2

Poll Everywhere - My User Name: penelopepoolereisenbies685

Based on the plot on the previous slide, we can tell how many previous periods (q) are included in each moving average in the model created by the auto.arima function in R.

What is q, the number of previous time periods in the moving average?

Netflix Stock - Examine Numerical Forecasts

Point Forecast is the forecasted estimate for each future time period
Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
May 2026       96.28770 90.87356 101.7018 88.00749 104.5679
Jun 2026       97.88642 90.43508 105.3378 86.49058 109.2823
Jul 2026       99.36907 90.06138 108.6768 85.13418 113.6040
Aug 2026      100.03050 88.78655 111.2745 82.83436 117.2266
Sep 2026       99.90374 86.80360 113.0039 79.86881 119.9387
Oct 2026       99.59476 84.93684 114.2527 77.17741 122.0121
Nov 2026       99.72772 83.85743 115.5980 75.45620 123.9992
Dec 2026      100.48277 83.62884 117.3367 74.70690 126.2586
Jan 2027      101.54850 83.77874 119.3183 74.37199 128.7250
Feb 2027      102.44464 83.71553 121.1737 73.80094 131.0883
Mar 2027      102.90728 83.15997 122.6546 72.70638 133.1082
Apr 2027      103.04149 82.28462 123.7984 71.29660 134.7864

Lecture 26 In-class Exercises - Q3

Poll Everywhere - My User Name: penelopepoolereisenbies685

Interpretation of Netflix Prediction Intervals

In March of 2027, the 80% prediction interval width for the Netflix stock price will be $____ wide.

To find this width, subtract the lower bound (Lo 80) from the upper bound (Hi 80) and round to the closest whole dollar.

Netflix Stock - Examine Residuals and Model Fit

Top Plot: Spikes get larger over time
ACF: auto-correlation function.
- Ideally, all or most values are with dashed lines
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile and this is sufficient.


    Ljung-Box test

data:  Residuals from ARIMA(2,1,2) with drift
Q* = 29.59, df = 20, p-value = 0.07677

Model df: 4.   Total lags used: 24

Netflix Stock - Examine Residuals and Model Fit

                       ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.0007043361 4.159505 2.555861 -6.764627 13.76645 0.2189833
                  ACF1
Training set 0.0568807

For BUA 345: We will use MAPE = Mean Absolute Percent Error
- 100 – MAPE = Percent accuracy of model.
Despite increasing volatility, our stock price model is estimated to be 86.23% accurate.
This doesn’t guarantee that forecasts will be 86% accurate but it does improve our chances of accurate forecasting.

Seasonality - Not Just Seasons

Seasonal periods can be days, months, seasons, decades, etc.
Seasonality: repeating pattern of highs and lows of approx. equal timespans
2024 Example shows some clear up and down pattern

Seasonality - Sometimes Pattern is Subtle

Seasonal periods can be days, months, seasons, decades, etc.
Seasonality: repeating pattern of highs and lows of approx. equal timespans
2026 seasonal pattern in this week’s data is much more subtle, but still present.

Seasonality - Not Just Seasons

Seasonal periods can be days, months, seasons, decades, etc.
Seasonality: repeating pattern of highs and lows of approx. equal timespans

Carbon Dioxide Trends - Monthly - 1958 to Present Day

Seasonality - Not Just Seasons

Seasonal periods can be days, months, seasons, decades, etc.
Seasonality: repeating pattern of highs and lows of approx. equal timespans

Carbon Dioxide - Monthly - 2015 to Present Day

Seasonality and Trend

Data above are decomposed into these components:

Plot shows BOTH
- Upward Trend
- Seasonal Pattern
Forecasting model is specified to account for both components
Forecasting decomposes data into
- Trend
- Seasonality
- Residuals

Seasonal Data: Alaska Electricity Revenue

Alaska is very far north so there is
- summer light (day and night)
- winter darkness (day and night)
Alaska Electricity usage has a strong seasonal pattern.
- Data are quarterly residential revenues:
  - 1st Qtr of 2001 to 4th Qtr of 2025

Rows: 100
Columns: 2
$ Date    <date> 2001-03-31, 2001-06-30, 2001-09-30, 2001-…
$ Revenue <dbl> 542.9275, 424.4111, 394.4869, 529.6425, 57…

Alaska Residential Electricity Time Series

Alaska Residential Time Series

Create Time Series and Examine it:

Format of Time Series with Quarters:
- head(ak_res_ts, 20) shows first 20 observations and format.

         Qtr1     Qtr2     Qtr3     Qtr4
2001 542.9275 424.4111 394.4869 529.6425
2002 570.5655 439.6120 408.6286 513.4108
2003 571.3253 440.5069 418.7918 556.3850
2004 612.2230 459.5118 430.6615 559.5087
2005 611.2410 447.8371 436.4646 566.1093

Lecture 26 In-class Exercises - Q4

Poll Everywhere - My User Name: penelopepoolereisenbies685

If our time series from Alaska were augmented so that it started in February of 1990 (2nd month) and we had data by month (12 observations per year), how would our ts command change in R?

Hint: Our current data, ak_res are quarterly, and begin in the first quarter of 2001. The command we used to create time series is:

ts(ak_res$Revenue, freq=4, start=c(2001,1))

ts(ak_res$Revenue, freq=1, start=c(1, 1990))
ts(ak_res$Revenue, freq=4, start=c(2, 1990))
ts(ak_res$Revenue, freq=12, start=c(1990, 2))
ts(ak_res$Revenue, freq=12, start=c(2, 1990))
ts(ak_res$Revenue, freq=4, start=c(1, 1990))

Alaska Residential Electricity Time Series

Incorrect Model: Ignores Seasonality (seasonal = F)

Notice how wide the prediction intervals are.
Model only optimizes p, d, and q for full time series (0,0,4).

Alaska Residential Electricity Time Series

Correct Model: Includes Seasonality (seasonal = T)

Prediction intervals are MUCH more narrow
Optimizes p, d and q for full time series (1,0,1) and within season (2,1,0).
Indicates number of time periods within season, [4]

Lecture 26 In-class Exercises - Q5

Poll Everywhere - My User Name: penelopepoolereisenbies685

Our data is quarterly and has four observations per year ending in the 4th quarter of 2025.

If the state of Alaska wants to extend the forecast until the Fall of 2027 (3rd Quarter), how would they change the R command?

Hint: Current forecast extends until the 4th quarter of 2026 and command is written as: forecast(ak_res_model2, h=4)

forecast(ak_res_model2, h=6)
forecast(ak_res_model2, h=7)
forecast(ak_res_model2, h=8)
forecast(ak_res_model2, h=9)
forecast(ak_res_model2, h=10)

Incorrect Model: Less precise

Year	Qtr	Pt	Lo95	Hi95
2026	1	574.22	496.62	651.81
2026	2	457.56	379.41	535.71
2026	3	488.97	390.62	587.32
2026	4	550.52	450.82	650.21

Q4 Width = Hi - Lo = $201

Correct Model: More precise

Year	Qtr	Pt	Lo95	Hi95
2026	1	604.27	574.84	633.69
2026	2	458.26	426.12	490.41
2026	3	436.58	402.77	470.39
2026	4	564.57	529.71	599.43

Q4 Width = Hi - Lo = $70

Prediction Bands Indicate Model Precision

Prediction bands are MUCH narrower when seasonality is accounted for.

Incorrect Model Forecasts and Prediction Bounds

        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2026 Q1       574.2158 523.4787 624.9530 496.6201 651.8116
2026 Q2       457.5595 406.4619 508.6571 379.4125 535.7066
2026 Q3       488.9728 424.6662 553.2795 390.6243 587.3214
2026 Q4       550.5177 485.3301 615.7052 450.8220 650.2134

Correct Model Forecasts and Prediction Bounds

        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2026 Q1       604.2656 585.0237 623.5075 574.8376 633.6935
2026 Q2       458.2622 437.2441 479.2802 426.1179 490.4065
2026 Q3       436.5829 414.4750 458.6908 402.7718 470.3940
2026 Q4       564.5713 541.7754 587.3672 529.7080 599.4346

Interpretation of 95% Prediction Bounds:
We are 95% certain that 4th qtr. revenue in 2026 will fall within:
- Incorrect model range: $614.35 - $413.63 = $201
- Correct model range: $596.72 - $526.75 = $70

Comparison of Model Residuals

Incorrect Model:

Residuals MUCH larger
Are highly correlated
- See ACF plot

Correct Model:

Residuals MUCH smaller
Auto-correlation assumption is met
- ACF plot: lags in range

Comparison of Model Accuracy

Incorrect Model Accuracy:

	ME	RMSE	MAE	MPE	MAPE	MASE	ACF1
Training set	0.3034447	38.5874	31.8747	-0.8747044	6.255521	2.420947	0.0359741

The incorrect model’s percent accuracy is 93.7%. - Better than expected.

Correct Model Accuracy:

	ME	RMSE	MAE	MPE	MAPE	MASE	ACF1
Training set	0.9990916	14.40144	10.80413	0.1497032	2.012243	0.8205957	-0.0141138

The correct model’s percent accuracy is 98%.
Always plot data, but if seasonality is difficult to discern, run both models and compare them.
Residuals (previous slide) and model accuracy (this slide) of models will indicate which model is correct.

Key Points from Today

R forecast package - simplifies forecasting.
- Know terminology and how to read and interpret output.
Plot data FIRST: - Check for seasonality, trend, other patterns
Lecture 27 (Tue. 4/21) - Come to class with a stock symbol from the NYSE that can be found on Yahoo Finance.
Lecture 28 (Thu. 4/23) - 20 min. of lecture with Poll Everywhere Questions then time for Q&A
- Come with questions!
Evaluations are VERY Important: coursefeedback.syr.edu
HW 10 includes material from Lectures 24-26
HW 9 was due on 4/16.

To submit an Engagement Question or Comment about material from Lecture 26: Submit it by midnight today (day of lecture).

--- title: "BUA 345 - Lecture 26" subtitle: "Forecasting - Part 2" author: "Penelope Pooler Eisenbies" date: last-modified lightbox: true toc: true toc-depth: 3 toc-location: left toc-title: "Table of Contents" toc-expand: 1 format: html: code-line-numbers: true code-fold: true code-tools: true execute: echo: fenced --- ## Housekeeping ```{r setup, echo=FALSE, warning=F, message=F, include=F} #| include: false # this line specifies options for default options for all R Chunks knitr::opts_chunk$set(echo=F) # suppress scientific notation options(scipen=100) # install helper package that loads and installs other packages, if needed if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/") # install and load required packages pacman::p_load(pacman,tidyverse, magrittr, knitr, gridExtra, forecast, tidyquant, lubridate, maps, usdata, mapproj, ggthemes, RColorBrewer, dygraphs, kableExtra) # verify packages # p_loaded() ``` ### Upcoming Dates - **HW 9 is due on Wednesday, 4/15**. - Grace Period ends Thursday (4/16) at 10:00 PM. - **HW 10 is now posted and is due on Monday 4/27.** - **OPTIONAL [GitHub Quarto Dashboard Workshop](https://peneloopy.github.io/bua_345_sem/#dashboard-options){target="_blank"} on Fri., 4/17, at 3:15 PM**. - If you are interested in attending, please sign up using the [Google Form](https://forms.gle/nqcSgwCc4LQrJ7PF6){target="_blank"}. - Lecture 27 (4/21): Come to class with a stock symbol that you are interested in. - Stock must be traded on NYSE - This [NYSE Directory](https://www.nyse.com/listings_directory/stock){target="_blank"} will help you. - Verify that stock can be found on [Yahoo Finance](https://finance.yahoo.com/){target="_blank"}. ## ## Course Evaluations - **Evaluations are VERY Important:** - [**coursefeedback.syr.edu**](http://coursefeedback.syr.edu/) - I will provide time in class for evaluations next week. - Please complete evaluations for **ALL** courses. ## ### Today's plan - **Review of Time Series Concepts** - Review and New Terminology - Brief Review of Time Series without Seasonality - Seasonality in Time Series Data - Forecasting Trends with Seasonality in R - Example from HW 10: Alaska - HW 10 is now posted and is due on 4/28 - Part of HW 10 pertains to today's lecture. - Guest Speaker - [Steven Davis from DLA Piper](https://www.dlapiper.com/en-us/people/d/davis-steven){target="_blank"} will speak for 15 minutes. ## ### Lecture 26 In-class Exercises - Q1 - Review [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** The `AR` in ARIMA stands a type of regression when you regress a variable on itself by using previous observations to predict future ones. <br> **This is known as `___`-regression.** ## Cross-Sectional Data **Shows a Snapshot of One Time Period** ```{r echo = F, message = FALSE, fig.align='center'} gsb26 <- read_csv("data/gsb_26.csv", show_col_types = F, skip=25, col_select = c(1,2,6), ) |> slice(1:5) |> rename("city" = "Highest – 7.9", "2025-26" = "...2", "Most" = "...6") |> mutate(Most = substr(Most, 1,5) |> as.numeric()) gsb_long <- gsb26 |> pivot_longer(cols = `2025-26`:Most, names_to = "type", values_to = "inches") (gsb_plt <- gsb_long |> ggplot() + geom_bar(aes(x=city, y=inches, fill=type), stat="identity", position="dodge") + scale_fill_manual(values=c("blue4", "lightblue")) + theme_classic() + labs(fill="", x="City", y="Snowfall (inches)", caption="Data Source: https://goldensnowball.com/", title="City Snowfall - Current and All-time Record")+ theme(plot.title = element_text(size = 15), plot.caption = element_text(size = 10), axis.title.x = element_text(size = 15), axis.title.y = element_text(size = 15), axis.text.x = element_text(size = 8), axis.text.y = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))) ``` ## Time Series Data **Shows Trend over Time** ```{r syr snowfall timeseries, echo=F, message=F, fig.align='center'} snowfall <- read_csv("data/snowfall_upstateny_cities.csv", show_col_types = F) |> filter(!Season=="Season") |> separate(Season, into=c("Season_Start", "Season_End"), sep = "-") |> mutate(Season_Start = Season_Start |> as.integer(), Season_End = Season_Start + 1 |> as.integer(), Syracuse = Syracuse |> as.numeric(), Buffalo = Buffalo |> as.numeric()) |> rename("city_most" = "City With Most Snow") |> select(Season_End, Syracuse, Buffalo, city_most) |> filter(Season_End >= 1952) |> pivot_longer(cols=Syracuse:Buffalo, names_to = "City", values_to = "Snowfall") (line_plot <- snowfall |> ggplot() + geom_line(aes(x=Season_End, y=Snowfall, color=City), linewidth=1) + theme_classic() + scale_x_continuous(breaks=seq(1960, 2020, 10)) + scale_color_manual(values=c("lightblue", "blue")) + ylim(0,200) + labs(title="Syracuse and Buffalo Annual Snowfall", y="Snowfall (inches)", x="Year Season Ended", caption="Data Source: https://en.wikipedia.org/wiki/Golden_Snowball_Award") + theme(plot.title = element_text(size = 15), plot.caption = element_text(size = 10), axis.title.x = element_text(size = 15), axis.title.y = element_text(size = 15), axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, size=2))) ``` ## Time Series Terminology - **auto-correlation:** A variable is correlated with itself - **auto-regression (AR):** Using previous observations to predict into the future. - **R function:** **`auto.arima` - ARIMA** is an acronym: - **AR:** auto-regressive - `p` = number of lags to minimize auto-correlation - **I:** integrated - `d` = order of differencing to achieve stationarity - **MA:** moving average - `q` = number of terms in moving average - All 3 components are optimized to provide a reliable forecast. - R software optimizes ARIMA model by varying these components. ## More on Stationarity - **Stationary Time Series:** - Consistent mean and variance throughout time series - Time series with trends, or with seasonality, are not stationary. - Separating a time series into different parts is how we analyze it - This is called **DECOMPOSITION** - Time Series Modeling decomposes the data into: - Trend - Seasonality (repeated pattern) - Residuals (what's left over) ## Decomposition and SARIMA Models ::: fragment **NEW TERM: SARIMA MODEL** ::: - Lecture 25: **ARIMA** models - Today: ARIMA models with **SEASONAL** component. - **SARIMA:** **S**easonal **A**uto-**R**egressive **I**ntegrated **M**oving **A**verage. - **SARIMA** models: - optimize `p`, `d`, and `q` for whole time series - Also optimize `p`, `d`, and `q` within season (repeating intervals) ::: fragment **DECOMPOSITION** ::: - ARIMA models are decomposed into - Trend \| Residuals - SARIMA models are decomposed into - Trend \| Seasonal patterns \| Residuals ------------------------------------------------------------------------ ### Visualization of Decomposition: ::::: columns ::: {.column width="40%"} ![](img/decomp.png) ::: ::: {.column width="60%"} - **ARIMA:** - 2nd Plot looks similar to Population Time Series - ARIMA decomposes trend into: - Trend (2nd Plot) - Residuals (4th Plot) - **SARIMA:** - Plot 1: Time Series with a seasonal pattern. - SARIMA decomposes trend into: - Trend (2nd Plot) - Seasonality (3rd Plot) - Residuals (4th Plot) ::: ::::: ## Netflix Stock Prices - Review {background-color="aliceblue"} **Dashed lines show peaks at irregular intervals.** ```{r dygraph nflx data, echo=F} # import from yahoo finance and plot hchart nflx <- getSymbols("NFLX", from = "2010-01-01", to = "2026-04-15") (nflx_dyg <- NFLX[,6] |> dygraph(main="Netflix: Non-Seasonal Trending Data") |> dySeries("NFLX.Adjusted", "NFLX", color="red") |> dyRangeSelector() |> # remove grid lines and label axes to make plot more readable (optional) dyAxis("y", label = "Adjusted Close", drawGrid = FALSE) |> dyAxis("x", label = "Date", drawGrid = FALSE) |> # add lines for events dyEvent("2014-2-1", color="lightgrey") |> dyEvent("2014-8-1", color="lightgrey") |> dyEvent("2015-8-1", color="lightgrey") |> dyEvent("2015-11-1", color="lightgrey") |> dyEvent("2018-6-1", color="lightgrey") |> dyEvent("2018-8-1", color="lightgrey") |> dyEvent("2019-4-1", color="lightgrey") |> dyEvent("2019-6-1", color="lightgrey") |> dyEvent("2020-8-1", color="lightgrey") |> dyEvent("2021-1-1", color="lightgrey") |> dyEvent("2021-11-1", color="lightgrey") |> dyEvent("2023-2-2", color="lightgrey") |> dyEvent("2023-7-19", color="lightgrey") |> dyEvent("2025-2-14", color="lightgrey") |> dyEvent("2025-6-27", color="lightgrey")) # convert xts to tibble nflx <- NFLX |> fortify.zoo() |> as_tibble(.name_repair = "minimal") |> rename("Date" = "Index", "Adjusted" = "NFLX.Adjusted") |> mutate(year=year(Date), month=month(Date), day=day(Date)) |> group_by(year, month) |> filter(day == min(day, na.rm=T)) |> ungroup() |> select(Date, Adjusted) ``` ## Netflix Stock - **Forecast Questions:** - What will be the estimated stock price be in April of `r year(Sys.Date())+1`? - What ARIMA model was chosen (p,d,q)? - **Model Assessment Questions:** - How valid is our model? - Check residual plots. - How are accurate are our estimates? - Examine Prediction Intervals and Prediction Bands - Check fit statistics ## Netflix Stock - Modeling Time Series Data **Stock Trend Forecast** - Create time series using Netflix Stock data - Specify `freq = 12` - 12 observations per year - Specify `start = c(2010, 1)` - first obs. in dataset is January 2010 - Model data using `auto.arima` function - Specify `ic = aic` - `aic` is the information criterion used to determine model. - Specify `seasonality = F` - no seasonal (repeating) pattern in the data. - This code will create and save the model: :::: fragment ::: r-fit-text ```{r create nflx time series and model, echo=T} nflx_ts <- ts(nflx$Adjusted, freq=12, start=c(2010,1)) # create time series nflx_model <- auto.arima(nflx_ts, ic="aic", seasonal=F) # model data using auto.arima ``` ::: :::: ## Netflix Stock - Create and Plot Forecasts - Create forecasts (until April `r year(Sys.Date())+1`) - `h = 12` indicates we want to forecast 12 months - Most recent date in forecast data is April 1, `r year(Sys.Date())` - 12 Months until April 1, `r year(Sys.Date())+1` - **Forecasts become less accurate the further into the future you specify.** :::: fragment ::: r-fit-text ```{r create nflx forecasts, echo=T} nflx_forecast <- forecast(nflx_model, h=12) # create forecasts (until April 2027) nflx_pred_plot <- autoplot(nflx_forecast) + labs(y = "Adjusted Closing Price") + theme_classic() ``` ::: :::: - Darker purple: 80% Prediction Interval Bounds - Lighter purple: 95% Prediction Interval Bounds - **Plot shows:** - **Lags (p = `r summary(nflx_model)$arma[1]`), Differencing (d = `r summary(nflx_model)$arma[6]`), Moving Average (q = `r summary(nflx_model)$arma[2]`)** ## Netflix Stock - Forecast Plot ```{r plot nflx forecasts with pred intervals, echo=F} nflx_pred_plot ``` ## ### Lecture 26 In-class Exercises - Q2 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** Based on the plot on the previous slide, we can tell how many previous periods (q) are included in each moving average in the model created by the `auto.arima` function in R. <br> **What is q, the number of previous time periods in the moving average?** ## ### Netflix Stock - Examine Numerical Forecasts - Point Forecast is the forecasted estimate for each future time period - Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval - Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval ::: fragment ```{r nflx numerical forecasts} nflx_forecast # prints out forecast values ``` ::: ## ### Lecture 26 In-class Exercises - Q3 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** Interpretation of Netflix Prediction Intervals <br> **In March of `r year(Sys.Date())+1`, the 80% prediction interval width for the Netflix stock price will be \$`____` wide.** To find this width, subtract the lower bound (`Lo 80`) from the upper bound (`Hi 80`) and round to the closest whole dollar. ## ### Netflix Stock - Examine Residuals and Model Fit :::::: columns ::: {.column width="40%"} - Top Plot: Spikes get larger over time - ACF: auto-correlation function. - Ideally, all or most values are with dashed lines - Histogram: Distribution of residuals should be approx. normal - Assessment: Stock prices are very volatile and this is sufficient. ::: :::: {.column width="60%"} ::: fragment ```{r nflx residual plots} checkresiduals(nflx_forecast) # examine residuals ``` ::: :::: :::::: ## ### Netflix Stock - Examine Residuals and Model Fit :::: fragment ::: r-fit-text ```{r nflx fit statistics } (acr <- accuracy(nflx_forecast)) # examine model accuracy (fit) ``` ::: :::: - For BUA 345: We will use MAPE = Mean Absolute Percent Error - **100 – MAPE = Percent accuracy of model.** - Despite increasing volatility, our stock price model is estimated to be `r round(100 - acr[5],2)`% accurate. - This doesn’t guarantee that forecasts will be `r round(100 - acr[5])`% accurate but it does improve our chances of accurate forecasting. ## ### Seasonality - Not Just Seasons - Seasonal periods can be days, months, seasons, decades, etc. - Seasonality: repeating pattern of highs and lows of approx. equal timespans - 2024 Example shows some clear up and down pattern ::: fragment ![](img/weather_2024_04_15.png) ::: ## ### Seasonality - Sometimes Pattern is Subtle - Seasonal periods can be days, months, seasons, decades, etc. - Seasonality: repeating pattern of highs and lows of approx. equal timespans - `r year(Sys.Date())` seasonal pattern in this week's data is much more subtle, but still present. ::: fragment ![](img/weather_2026_04_15.png){fig-align="center" height="3.5in"} ::: ## ### Seasonality - Not Just Seasons - Seasonal periods can be days, months, seasons, decades, etc. - Seasonality: repeating pattern of highs and lows of approx. equal timespans ::: fragment ***Carbon Dioxide Trends - Monthly - 1958 to Present Day*** ```{r import co2 data, echo=F, include=F, message=FALSE, results='hide'} co2 <- read_csv("data/co2.csv", show_col_types = F) |> dplyr::select(1,2,4,5) |> mutate(date=ymd(paste(yr,month,15))) |> glimpse() # https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt # https://gml.noaa.gov/dv/data/index.php?parameter_name=Carbon%2BDioxide&site=MLO%22 ``` ```{r co2 trends, echo=F, message=F, fig.height=4} (co2_plot <- co2 |> ggplot() + geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) + theme_classic() + labs(title="Carbon Dioxide Measurements (1958 - 2025)", subtitle="Monthly Data from Mauna Loa Observatory, Hawaii, USA", y="Carbon Dioxide (ppm)", x="Date", caption="Data Source: https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt") + scale_x_date(breaks = "10 years", date_labels = "%Y") + theme(plot.title = element_text(size = 15), plot.caption = element_text(size = 8), axis.title.x = element_text(size = 15), axis.title.y = element_text(size = 15), axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))) ``` ::: ## ### Seasonality - Not Just Seasons - Seasonal periods can be days, months, seasons, decades, etc. - Seasonality: repeating pattern of highs and lows of approx. equal timespans ::: fragment **Carbon Dioxide - Monthly - 2015 to Present Day** ```{r recent co2 trends, echo=F, message=F, fig.height=4} co2_1 <- co2 |> filter(yr >= 2015) (co2_plot1 <- co2_1|> ggplot() + geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) + theme_classic() + labs(title="Carbon Dioxide Measurements (2015 - 2026)", subtitle="Monthly Data from Mauna Loa Observatory, Hawaii, USA", y="Carbon Dioxide (ppm)", x="Date", caption="Data Source: https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt") + scale_x_date(breaks = "1 year", date_labels = "%Y") + theme(plot.title = element_text(size = 15), plot.caption = element_text(size = 8), axis.title.x = element_text(size = 15), axis.title.y = element_text(size = 15), axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))) ``` ::: ## Seasonality and Trend ::::: columns ::: {.column width="40%"} ```{r smaller co2 plot, echo=F} (co2_plot2 <- co2_1|> ggplot() + geom_line(aes(x=date, y=mnth_avg), color="blue", linewidth=1) + theme_classic() + labs(title="CO2 (2015 - 2026)", y="Carbon Dioxide (ppm)", x="Date") + scale_x_date(breaks = "1 year", date_labels = "%Y") + theme(plot.title = element_text(size = 15), axis.title.x = element_text(size = 15), axis.title.y = element_text(size = 15), axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 15), plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))) ``` **Data above are decomposed into these components:** ![](img/decomp2.png) ::: ::: {.column width="60%"} - Plot shows BOTH - Upward Trend - Seasonal Pattern - Forecasting model is specified to account for both components - Forecasting decomposes data into - **Trend** - **Seasonality** - **Residuals** ::: ::::: ## Seasonal Data: Alaska Electricity Revenue ::::: columns ::: {.column width="60%"} - **Alaska is very far north** so there is - ***summer light*** (day and night) - ***winter darkness*** (day and night) - Alaska Electricity usage has a **strong** seasonal pattern. - Data are quarterly residential revenues: - 1st Qtr of 2001 to 4th Qtr of `r year(Sys.Date())-1` ::: ::: {.column width="40%"} ![](img/AK_image.png) ::: ::::: ```{r import ak data, echo=F, message=F} # a little data mgmt (if you are interested) ak_res <- read_csv("data/AK_Residential_Electricity_Revenue_Updated.csv", show_col_types = F, skip=5, col_names = c("quarter","Revenue")) |> separate(quarter, c("Quarter", "Year")) |> mutate(Date= ceiling_date(yq(paste(Year, Quarter)), 'quarter') - 1 ) |> select(Date, Revenue) |> arrange(Date) # https://www.eia.gov/electricity/data/browser/#/topic/5?agg=0,1&geo=000000000000g&endsec=8&linechart=ELEC.SALES.AK-RES.Q&columnchart=ELEC.SALES.AK-RES.Q&map=ELEC.SALES.AK-RES.Q&freq=Q&ctype=linechart&ltype=pin&rtype=s&maptype=0&rse=0&pin= ``` ::: fragment ```{r glimpse ak data} ak_res |> glimpse(width=60) # Alaska Residential Electricity Usage ``` ::: ## **Alaska Residential Electricity Time Series** {background-color="aliceblue"} ```{r ak_res to xts and plot, echo=F} # convert to xts (extensible time series) ak_res_xts <- xts(x = ak_res[,2], order.by = ak_res$Date) # ak-res interactive time_series (ak_dg <- dygraph(ak_res_xts, main="Alaska Residential Electricity Revenue ($Mill)") |> dySeries("Revenue", label="Revenue", color= "purple") |> dyAxis("y", label = "", drawGrid = FALSE) |> dyAxis("x", label = "", drawGrid = FALSE) |> dyRangeSelector()) ``` ## **Alaska Residential Time Series** ![](img/AK_small.png) - Create Time Series and Examine it: ::: fragment ```{r create ak time series} ak_res_ts <- ts(ak_res$Revenue, freq=4, start=c(2001, 1)) # create time series ``` ::: - Format of Time Series with Quarters: - `head(ak_res_ts, 20)` shows first 20 observations and format. :::: fragment ::: {.r-fit-text max="1em"} ```{r head used to show ts format} head(ak_res_ts, 20) # shows time series in ts matrix format ``` ::: :::: ## ### Lecture 26 In-class Exercises - Q4 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** If our time series from Alaska were augmented so that it started in February of 1990 (2nd month) and we had data by month (12 observations per year), how would our `ts` command change in R? Hint: Our current data, `ak_res` are quarterly, and begin in the first quarter of 2001. The command we used to create time series is: **`ts(ak_res$Revenue, freq=4, start=c(2001,1))`** <br> ::: nonincremental - `ts(ak_res$Revenue, freq=1, start=c(1, 1990))` - `ts(ak_res$Revenue, freq=4, start=c(2, 1990))` - `ts(ak_res$Revenue, freq=12, start=c(1990, 2))` - `ts(ak_res$Revenue, freq=12, start=c(2, 1990))` - `ts(ak_res$Revenue, freq=4, start=c(1, 1990))` ::: ## **Alaska Residential Electricity Time Series** **Incorrect Model: Ignores Seasonality (`seasonal = F`)** - Notice how wide the prediction intervals are. - Model only optimizes `p`, `d`, and `q` for full time series `(0,0,4)`. ::: fragment ```{r incorrect model plt lg, message=F, fig.dim=c(8,3)} ak_res_forecast1 <- ak_res_ts |> auto.arima(ic="aic", seasonal=F) |> forecast(h=4) (autoplot(ak_res_forecast1) + labs(y = "AK Resid. Elec. Revenue") + theme_classic()) ``` ::: ## **Alaska Residential Electricity Time Series** **Correct Model: Includes Seasonality (`seasonal = T`)** - Prediction intervals are **MUCH** more narrow - Optimizes `p`, `d` and `q` for full time series `(1,0,1)` and within season `(2,1,0)`. - Indicates number of time periods within season, `[4]` ::: fragment ```{r correct model plt lg, message=F, fig.dim=c(8,3)} ak_res_forecast2 <- ak_res_ts |> auto.arima(ic="aic", seasonal=T) |> forecast(h=4) (autoplot(ak_res_forecast2) + labs(y = "AK Resid. Elec. Revenue") + theme_classic()) ``` ::: ## ### Lecture 26 In-class Exercises - Q5 [***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685** Our data is quarterly and has four observations per year ending in the 4th quarter of `r year(Sys.Date())-1`. **If the state of Alaska wants to extend the forecast until the Fall of `r year(Sys.Date())+1` (3rd Quarter), how would they change the R command?** Hint: Current forecast extends until the 4th quarter of `r year(Sys.Date())` and command is written as: **`forecast(ak_res_model2, h=4)`** ::: nonincremental - `forecast(ak_res_model2, h=6)` - `forecast(ak_res_model2, h=7)` - `forecast(ak_res_model2, h=8)` - `forecast(ak_res_model2, h=9)` - `forecast(ak_res_model2, h=10)` ::: ## ::::: columns ::: {.column width="50%"} **Incorrect Model: Less precise** ```{r incorrect model plt sm, echo=F, message=F, fig.dim=c(5,3)} (autoplot(ak_res_forecast1) + labs(y = "AK Resid. Elec. Revenue") + theme_classic()) ``` ```{r incorrect model forecasts, echo=F, message=F} out1 <- round(bind_cols(ak_res_forecast1$mean, ak_res_forecast1$lower, ak_res_forecast1$upper),2) |> rename("Pt" = "...1", "Lo80"="80%...2", "Lo95"="95%...3", "Hi80"="80%...4", "Hi95"="95%...5") |> mutate(Year = year(Sys.Date()), Qtr = 1:4) |> select(Year, Qtr, "Pt", "Lo95", "Hi95") knitr::kable(out1, format = 'html', ) ``` **`Q4 Width = Hi - Lo = $201`** ::: ::: {.column width="50%"} **Correct Model: More precise** ```{r correct model plt sm , echo=F, message=F, fig.dim=c(5,3)} (autoplot(ak_res_forecast2) + labs(y = "AK Resid. Elec. Revenue") + theme_classic()) ``` ```{r correct model forecasts, echo=F, message=F} out2 <- round(bind_cols(ak_res_forecast2$mean, ak_res_forecast2$lower, ak_res_forecast2$upper),2) |> rename("Pt" = "...1", "Lo80"="80%...2", "Lo95"="95%...3", "Hi80"="80%...4", "Hi95"="95%...5") |> mutate(Year = year(Sys.Date()), Qtr = 1:4) |> select(Year, Qtr, "Pt", "Lo95", "Hi95") knitr::kable(out2, format = 'html', ) ``` **`Q4 Width = Hi - Lo = $70`** ::: ::::: ## ### Prediction Bands Indicate Model Precision - Prediction bands are **MUCH** narrower when seasonality is accounted for. ::: fragment **Incorrect Model Forecasts and Prediction Bounds** ```{r incorrect model forecasts full, echo=F, message=F, fig.height=3} ak_res_forecast1 ``` ::: ::: fragment **Correct Model Forecasts and Prediction Bounds** ```{r correct model forecasts full, echo=F, message=F, fig.height=3} ak_res_forecast2 ``` ::: - **Interpretation of 95% Prediction Bounds:** - We are 95% certain that 4th qtr. revenue in `r year(Sys.Date())` will fall within: - Incorrect model range: **`$614.35 - $413.63 = $201`** - Correct model range: **`$596.72 - $526.75 = $70`** ## ### Comparison of Model Residuals ::::: columns ::: {.column width="50%"} ```{r incorrect residuals, echo=F, fig.height=6} checkresiduals(ak_res_forecast1, test=F) ``` **Incorrect Model:** - Residuals **MUCH** larger - Are highly correlated - See ACF plot ::: ::: {.column width="50%"} ```{r correct residuals, echo=F, fig.height=6} checkresiduals(ak_res_forecast2, test=F) ``` **Correct Model:** - Residuals **MUCH** smaller - Auto-correlation assumption is met - ACF plot: lags in range ::: ::::: ## ### Comparison of Model Accuracy ::: fragment **Incorrect Model Accuracy:** ```{r incorrect model accuracy} acr1 <- accuracy(ak_res_forecast1) # examine MAPE and model percent accuracy acr1 |> kable() |> kable_styling(full_width = F, font_size = 18) ``` ::: - **The incorrect model's percent accuracy is `r round(100-acr1[5],1)`%.** - Better than expected. ::: fragment **Correct Model Accuracy:** ```{r correct model accuracy} acr2 <- accuracy(ak_res_forecast2) # examine MAPE and model percent accuracy acr2 |> kable() |> kable_styling(full_width = F, font_size = 18) ``` ::: - **The correct model's percent accuracy is `r round(100-acr2[5],1)`%.** - Always plot data, but if seasonality is difficult to discern, run both models and compare them. - Residuals (previous slide) and model accuracy (this slide) of models will indicate which model is correct. ## ### Key Points from Today - **R `forecast` package** - simplifies forecasting. - Know terminology and how to read and interpret output. - **Plot data FIRST:** - Check for seasonality, trend, other patterns - **Lecture 27 (Tue. 4/21)** - Come to class with a stock symbol from the [NYSE](https://www.nyse.com/listings_directory/stock){target="_blank"} that can be found on [Yahoo Finance](https://finance.yahoo.com/){target="_blank"}. - **Lecture 28 (Thu. 4/23)** - 20 min. of lecture with Poll Everywhere Questions then time for Q&A - Come with questions! - **Evaluations are VERY Important: [coursefeedback.syr.edu](http://coursefeedback.syr.edu/)** - **HW 10 includes material from Lectures 24-26** - **HW 9 was due on 4/16.** ::: fragment **To submit an Engagement Question or Comment about material from Lecture 26:** Submit it by midnight today (day of lecture). :::