Lecture 25 - Forecasting - Part 2

Penelope Pooler Eisenbies
BUA 345

2024-04-15

Housekeeping

Upcoming Dates

  • HW 9 was due on Monday, 4/15.

    • Grace Period ends tonight (Tues. 4/16) at midnight.
  • HW 10 is posted and due Monday, 4/22

    Grace period for HW 10 extended until Thu. 4/25 at midnight.

  • Lecture 26 on Thu. 4/18 is Optional.

  • No Lecture on 4/23

  • Course Review on 4/25

  • NEW PACKAGE FOR FORECASTING: forecast

    • If you are having trouble installing/loading any packages or components of R or RStudio, please come to office hours or make an appointment with me.

Course Evaluations

  • Evaluations are VERY Important:

    • coursefeedback.syr.edu

    • I will end class 5 min. early today and next Thursday to give you time to complete evaluations in class.

    • Please complete evaluations for ALL courses.

💥 Lecture 25 In-class Exercises - Q1 - Review 💥

Session ID: bua345s24

The AR in ARIMA stands for a type of regression in which you regress a variable on itself, using previous observations to predict future ones.

This is known as ___-regression.

Plan for Today

  • Review of Time Series Concepts

    • Review and New Terminology
  • Brief Review of Time Series without Seasonality

  • Seasonality in Time Series Data

  • Forecasting Trends with Seasonality in R

    • Example from HW 10: Alaska

Cross-Sectional Data

Shows a Snapshot of One Time Period

Time Series Data

Shows Trend over Time

Time Series Terminology

  • auto-correlation: A variable is correlated with itself

  • auto-regression (AR): Using previous observations to predict into the future.

  • R function: auto.arima - ARIMA is an acronym:

    • AR: auto-regressive - p = number of lags to minimize auto-correlation

    • I: integrated - d = order of differencing to achieve stationarity

    • MA: moving average - q = number of terms in moving average

    • All 3 components are optimized to provide a reliable forecast.

      • R software optimizes ARIMA model by varying these components.
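
A minimal sketch of the "AR" idea, using only base R rather than the forecast package: simulate a series in which each value depends on the previous one, then regress the series on its own lag. The simulated data, the seed, and the names y, y_now, y_prev, and ar_fit are illustrative assumptions, not course data.

```r
# Simulate 500 observations from an AR(1) process with coefficient 0.7
set.seed(345)
y <- arima.sim(model = list(ar = 0.7), n = 500)

# Pair each observation with the one before it (lag 1)
y_now  <- y[-1]            # observations 2, 3, ..., 500
y_prev <- y[-length(y)]    # observations 1, 2, ..., 499

# Auto-regression: regress the series on its own previous value
ar_fit <- lm(y_now ~ y_prev)
coef(ar_fit)               # slope on y_prev estimates the AR coefficient

# Auto-correlation at lag 1: correlation of the series with itself
cor(y_now, y_prev)
```

The slope on y_prev should land near the 0.7 used in the simulation. auto.arima automates the choice of how many such lags (p) to include.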

More on Stationarity

  • Stationary Time Series:

    • Consistent mean and variance throughout time series

    • Time series with trends, or with seasonality, are not stationary.

    • Separating a time series into different parts is how we analyze it

      • This is called DECOMPOSITION

      • Time Series Modeling decomposes the data into:

        • Trend

        • Seasonality (repeated pattern)

        • Residuals (what’s left over)
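
To illustrate why differencing (the "I" in ARIMA) helps with stationarity, here is a small base R sketch. It assumes a simulated upward-trending series. The helper half_gap is a hypothetical function written for this example that compares the mean of the first half of a series to the mean of the second half.

```r
# Simulated series with an upward trend (random walk with drift): NOT stationary
set.seed(345)
trend_series <- ts(cumsum(rnorm(200, mean = 0.5)))

# First-order differencing (d = 1): change from one period to the next
diff_series <- diff(trend_series, differences = 1)

# Hypothetical helper: gap between the means of the two halves of a series
half_gap <- function(x) {
  n <- length(x)
  h <- floor(n / 2)
  abs(mean(x[1:h]) - mean(x[(h + 1):n]))
}

gap_original    <- half_gap(trend_series)  # large: the mean shifts over time
gap_differenced <- half_gap(diff_series)   # small: the mean is roughly constant
c(original = gap_original, differenced = gap_differenced)
```

A much smaller gap after differencing is consistent with a mean that no longer drifts, which is the goal of the d component.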

Decomposition and SARIMA Models

NEW TERM: SARIMA MODEL

  • Lecture 24: ARIMA models

  • Today: ARIMA models with SEASONAL component.

  • SARIMA: Seasonal Auto-Regressive Integrated Moving Average.

  • SARIMA models:

    • optimize p, d, and q for whole time series

    • Also optimize p, d, and q within season (repeating intervals)

DECOMPOSITION

  • ARIMA models are decomposed into

    • Trend | Residuals
  • SARIMA models are decomposed into

    • Trend | Seasonal patterns | Residuals

Visualization of Decomposition:

  • ARIMA:

  • 2nd Plot looks similar to Population Time Series

  • ARIMA decomposes time series into:

    • Trend (2nd Plot)
    • Residuals (4th Plot)
  • SARIMA:

    • Plot 1: Time Series with a seasonal pattern.

    • SARIMA decomposes time series into:

      • Trend (2nd Plot)
      • Seasonality (3rd Plot)
      • Residuals (4th Plot)

Netflix Stock Prices - Review

Dashed lines show peaks at irregular intervals.

Netflix Stock

  • Forecast Questions:

    • What will the estimated stock price be in April of 2025?

    • What ARIMA model was chosen (p,d,q)?

  • Model Assessment Questions:

    • How valid is our model?

      • Check residual plots.
    • How accurate are our estimates?

      • Examine Prediction Intervals and Prediction Bands

      • Check fit statistics

Netflix Stock - Modeling Time Series Data

Stock Trend Forecast

  • Create time series using Netflix Stock data

    • Specify freq = 12 - 12 observations per year

    • Specify start = c(2010, 1) - first obs. in dataset is January 2010

  • Model data using auto.arima function

    • Specify ic = "aic" - AIC is the information criterion used to choose the model.

    • Specify seasonal = F - no seasonal (repeating) pattern in the data.

  • This chunk will create and save the model.

nflx_ts <- ts(nflx$Adjusted, freq=12, start=c(2010,1))   # create time series
nflx_model <- auto.arima(nflx_ts, ic="aic", seasonal=F)  # model data using auto.arima

Netflix Stock - Create and Plot Forecasts

  • Create forecasts (until April 2025)

    • h = 12 indicates we want to forecast 12 months

    • Most recent date in the data is April 1, 2024

    • 12 months until April 1, 2025

  • Forecasts become less accurate the further into the future you forecast.

nflx_forecast <- forecast(nflx_model, h=12) # create forecasts (until April 2025)
nflx_pred_plot <- autoplot(nflx_forecast) + labs(y = "Adjusted Closing Price") +
  theme_classic()
  • Darker purple: 80% Prediction Interval Bounds

  • Lighter purple: 95% Prediction Interval Bounds

  • Plot shows:

    • Lags (p = 2), Differencing (d = 1), Moving Average (q = 2)

Netflix Stock - Forecast Plot

Netflix Stock - Examine Numerical Forecasts

  • Point Forecast is the forecasted estimate for each future time period
  • Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
  • Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval
nflx_forecast                # prints out forecast values
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
May 2024       616.5980 577.8389 655.3570 557.3211 675.8749
Jun 2024       597.3982 540.9623 653.8340 511.0870 683.7094
Jul 2024       575.2199 501.5346 648.9053 462.5280 687.9119
Aug 2024       567.9570 477.2580 658.6559 429.2448 706.6691
Sep 2024       580.7199 475.2167 686.2230 419.3666 742.0731
Oct 2024       604.1698 487.2636 721.0759 425.3773 782.9623
Nov 2024       622.9879 497.5925 748.3833 431.2122 814.7636
Dec 2024       627.2257 494.8187 759.6327 424.7267 829.7247
Jan 2025       618.3484 478.9854 757.7114 405.2111 831.4857
Feb 2025       606.5939 459.5068 753.6811 381.6435 831.5443
Mar 2025       602.7019 447.2250 758.1789 364.9204 840.4835
Apr 2025       610.4861 446.7790 774.1931 360.1177 860.8544
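
A short sketch of how the bounds relate to the point forecast, using the April 2025 row copied from the output above. It assumes the forecast errors are treated as normally distributed, which is consistent with the printed numbers. The variable names are illustrative.

```r
# Values copied from the Apr 2025 row of the forecast output
point <- 610.4861
lo80  <- 446.7790
hi80  <- 774.1931

# Width of the 80% prediction interval
width80 <- hi80 - lo80

# Back out the forecast standard error from the 80% upper bound:
# hi80 = point + qnorm(0.90) * se
se_hat <- (hi80 - point) / qnorm(0.90)

# Rebuild the 95% bounds from the same standard error
lo95 <- point - qnorm(0.975) * se_hat
hi95 <- point + qnorm(0.975) * se_hat
c(width80 = width80, lo95 = lo95, hi95 = hi95)
```

The rebuilt 95% bounds should agree with the Lo 95 and Hi 95 columns up to rounding. The 95% interval is wider only because it uses a larger multiplier on the same standard error.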

💥 Lecture 25 In-class Exercises - Q2 💥

Session ID: bua345s24

Interpretation of Netflix Prediction Intervals


In February of 2025, the 80% prediction interval width for the Netflix stock price will be $____ wide.

To find this width, subtract the lower bound (Lo 80) from the upper bound (Hi 80) and round to the closest whole dollar.

How to input your answer:

  • Round to closest whole dollar.

  • Don’t include dollar sign.

Netflix Stock - Examine Residuals and Model Fit

  • Top Plot: Spikes get larger over time

  • ACF: auto-correlation function.

    • Ideally, all or most values are within the dashed lines
  • Histogram: Distribution of residuals should be approx. normal

  • Assessment: Stock prices are very volatile, so the residuals are not perfect, but the model is sufficient for our purposes.

checkresiduals(nflx_forecast) # examine residuals


    Ljung-Box test

data:  Residuals from ARIMA(2,1,2) with drift
Q* = 30.854, df = 20, p-value = 0.05715

Model df: 4.   Total lags used: 24
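
The Ljung-Box test reported above can also be run directly with base R's Box.test. This is a sketch on simulated data, not the Netflix model: the series sim, the AR(1) fit, and the choice of 24 lags are assumptions made for illustration.

```r
# Simulate a series and fit a simple AR(1) model with base R
set.seed(345)
sim <- arima.sim(model = list(ar = 0.5), n = 200)
fit <- arima(sim, order = c(1, 0, 0))

# Ljung-Box test on the residuals
#   lag   = total number of lags examined
#   fitdf = number of ARMA coefficients estimated (1 for this AR(1) fit)
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)
lb

# Degrees of freedom = lag - fitdf, just as 24 - 4 = 20 in the output above
lb$parameter
```

A large p-value is consistent with residuals that are not auto-correlated. A small p-value suggests the model has left some pattern unexplained.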

Netflix Stock - Examine Residuals and Model Fit

(acr <- accuracy(nflx_forecast))         # examine model accuracy (fit)
                     ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.02834461 29.71151 18.48178 -4.428438 12.66922 0.2157602
                    ACF1
Training set -0.02180208
  • For BUA 345: We will use MAPE = Mean Absolute Percent Error

    • 100 – MAPE = Percent accuracy of model.
  • Despite increasing volatility, our stock price model is estimated to be 87.33% accurate.

  • This doesn’t guarantee that forecasts will be 87% accurate but it does improve our chances of accurate forecasting.
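
A minimal sketch of what MAPE measures, assuming three made-up actual and fitted values. The function mape is a hypothetical helper written for this example, not a function from the forecast package.

```r
# Hypothetical helper: Mean Absolute Percent Error
mape <- function(actual, fitted) {
  mean(abs((actual - fitted) / actual)) * 100
}

# Made-up values: percent errors are 10%, 5%, and 0%
actual <- c(100, 200, 400)
fitted <- c(90, 210, 400)

toy_mape <- mape(actual, fitted)   # average of 10, 5, and 0
toy_accuracy <- 100 - toy_mape

# Same arithmetic applied to the MAPE reported for the Netflix model
nflx_mape <- 12.66922
nflx_accuracy <- 100 - nflx_mape
round(nflx_accuracy, 2)
```

Because MAPE is an average of percent errors, 100 - MAPE can be read as an average percent accuracy on the data used to fit the model.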

Seasonality - Not Just Seasons

  • Seasonal periods can be days, months, seasons, decades, etc.
  • Seasonality: repeating pattern of highs and lows of approx. equal timespans

Carbon Dioxide Trends - Monthly - 1958 to Present Day

Carbon Dioxide - Monthly - 2015 to Present Day

Seasonality and Trend

Data above are decomposed into these components:

  • Plot shows BOTH

    • Upward Trend
    • Seasonal Pattern
  • Forecasting model is specified to account for both components

  • Forecasting decomposes data into

    • Trend

    • Seasonality

    • Residuals
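
The decomposition can be sketched with base R's decompose function. This example assumes R's built-in co2 dataset (monthly Mauna Loa readings, 1959 to 1997), which is related to but not the same as the data plotted in the slides.

```r
# Built-in monthly CO2 series with 12 observations per year
frequency(co2)

# Classical additive decomposition: data = trend + seasonal + random
dc <- decompose(co2)
plot(dc)   # four panels: observed, trend, seasonal, random (residuals)

# The three components add back up to the original series
# (trend is NA at both ends because it is a centered moving average)
recon <- dc$trend + dc$seasonal + dc$random
max(abs(recon - co2), na.rm = TRUE)

# One seasonal effect per month, repeated every year
round(dc$figure, 2)
```

decompose is a simpler tool than a SARIMA model, but the panels it plots match the Trend, Seasonality, and Residuals components listed above.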

Seasonal Data: Alaska Electricity Revenue

  • Alaska is very far north so there is

    • summer light (day and night)

    • winter darkness (day and night)

  • Alaska Electricity usage has a strong seasonal pattern.

    • Data are quarterly residential revenues:

      • 1st Qtr of 2001 to 4th Qtr of 2023

ak_res |> glimpse(width=60)  # Alaska Residential Electricity Revenue
Rows: 92
Columns: 2
$ Date    <date> 2001-03-31, 2001-06-30, 2001-09-30, 2001-…
$ Revenue <dbl> 542.9275, 424.4111, 394.4869, 529.6425, 57…

Alaska Residential Electricity Time Series

Alaska Residential Time Series

  • Create Time Series and Examine it:
ak_res_ts <- ts(ak_res$Revenue, freq=4, start=c(2001, 1)) # create time series
  • Format of Time Series with Quarters:

    • head(ak_res_ts, 20) shows first 20 observations and format.
head(ak_res_ts, 20)                   # shows time series in ts matrix format
         Qtr1     Qtr2     Qtr3     Qtr4
2001 542.9275 424.4111 394.4869 529.6425
2002 570.5655 439.6120 408.6286 513.4108
2003 571.3253 440.5069 418.7918 556.3850
2004 612.2230 459.5118 430.6615 559.5087
2005 611.2410 447.8371 436.4646 566.1093
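
A small sketch of how a quarterly ts object keeps track of time, using only the 2001 and 2002 values printed above. The names rev_2001_2002, demo_ts, and q_means are illustrative.

```r
# First eight quarterly revenues copied from the output above
rev_2001_2002 <- c(542.9275, 424.4111, 394.4869, 529.6425,
                   570.5655, 439.6120, 408.6286, 513.4108)

# frequency = 4 observations per year, starting in year 2001, quarter 1
demo_ts <- ts(rev_2001_2002, frequency = 4, start = c(2001, 1))

end(demo_ts)      # year and quarter of the last observation
cycle(demo_ts)    # quarter number (1 to 4) of each observation

# Pull out a specific stretch of time
window(demo_ts, start = c(2002, 1), end = c(2002, 4))

# Average revenue by quarter: a quick look at the seasonal pattern
q_means <- tapply(as.numeric(demo_ts), as.numeric(cycle(demo_ts)), mean)
q_means
```

The start argument is always c(year, period within year), and frequency tells R how many periods make up one year.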

💥 Lecture 25 In-class Exercises - Q3 💥

Session ID: bua345s24

If our time series from Alaska were augmented so that it started in February of 1990 (2nd month) and we had data by month (12 observations per year), how would our ts command change in R?

Hint: Our current data, ak_res are quarterly, and begin in the first quarter of 2001. The command we used to create time series is:

ts(ak_res$Revenue, freq=4, start=c(2001,1))

  1. ts(ak_res$Revenue, freq=1, start=c(1, 1990))

  2. ts(ak_res$Revenue, freq=4, start=c(2, 1990))

  3. ts(ak_res$Revenue, freq=12, start=c(1990, 2))

  4. ts(ak_res$Revenue, freq=12, start=c(2, 1990))

  5. ts(ak_res$Revenue, freq=4, start=c(1, 1990))

Alaska Residential Electricity Time Series

Incorrect Model: Ignores Seasonality (seasonal = F)

  • Notice how wide prediction intervals are.
  • Model only optimizes p, d, and q for full time series (0,0,4).
ak_res_forecast1 <- ak_res_ts |> 
  auto.arima(ic="aic", seasonal=F) |>
  forecast(h=4)
(autoplot(ak_res_forecast1) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())

Alaska Residential Electricity Time Series

Correct Model: Includes Seasonality (seasonal = T)

  • Prediction intervals are MUCH narrower
  • Optimizes p, d and q for full time series (2,0,0) and within season (2,1,1).
  • Indicates number of time periods within season, [4]
ak_res_forecast2 <- ak_res_ts |> 
  auto.arima(ic="aic", seasonal=T) |>
  forecast(h=4) 
(autoplot(ak_res_forecast2) + labs(y = "AK Resid. Elec. Revenue") + theme_classic())
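
The same comparison can be sketched with base R's arima function. This example makes several assumptions: it uses the built-in AirPassengers dataset (monthly, so the season length is 12) rather than the Alaska data, it works on the log scale, and the orders are fixed by hand instead of being chosen by auto.arima.

```r
y <- log(AirPassengers)   # built-in monthly series with a strong seasonal pattern

# Model that ignores seasonality: only (p, d, q) for the full series
fit_plain <- arima(y, order = c(0, 1, 1))

# Model with a seasonal part: (p, d, q) for the full series
# plus (P, D, Q) within season, where the season length is 12
fit_seasonal <- arima(y, order = c(0, 1, 1),
                      seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead with each model
pred_plain    <- predict(fit_plain, n.ahead = 12)
pred_seasonal <- predict(fit_seasonal, n.ahead = 12)

# Width of the 95% prediction interval at each forecast horizon
width_plain    <- 2 * qnorm(0.975) * pred_plain$se
width_seasonal <- 2 * qnorm(0.975) * pred_seasonal$se
round(cbind(width_plain, width_seasonal), 3)
```

As with the Alaska data, the model that accounts for the seasonal pattern should produce noticeably narrower prediction intervals.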

💥 Lecture 25 In-class Exercises - Q4 💥

Session ID: bua345s24

Our data is quarterly and has four observations per year ending in the 4th quarter of 2023.

If the state of Alaska wants to extend the forecast until the Fall of 2025 (3rd Quarter), how would they change the R command?

Hint: Current forecast extends until the 4th quarter of 2024 and the command is written as:

forecast(ak_res_model2, h=4)

  1. forecast(ak_res_model2, h=6)

  2. forecast(ak_res_model2, h=7)

  3. forecast(ak_res_model2, h=8)

  4. forecast(ak_res_model2, h=9)

  5. forecast(ak_res_model2, h=10)

Incorrect Model: Less precise

Year  Qtr  Point   Lo 95   Hi 95
2024  1    560.67  483.24  638.09
2024  2    502.41  423.87  580.94
2024  3    501.34  401.84  600.84
2024  4    500.90  400.34  601.45

Q4 Width = Hi - Lo = $201

Correct Model: More precise

Year  Qtr  Point   Lo 95   Hi 95
2024  1    582.31  553.21  611.41
2024  2    441.45  408.40  474.51
2024  3    425.77  390.74  460.79
2024  4    556.41  520.50  592.32

Q4 Width = Hi - Lo = $72

Prediction Bands Indicate Model Precision

  • Prediction bands are MUCH narrower when seasonality is accounted for.

Incorrect Model Forecasts and Prediction Bounds

        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2024 Q1       560.6661 510.0430 611.2892 483.2448 638.0874
2024 Q2       502.4057 451.0553 553.7561 423.8720 580.9394
2024 Q3       501.3386 436.2777 566.3995 401.8365 600.8407
2024 Q4       500.8975 435.1466 566.6484 400.3402 601.4548

Correct Model Forecasts and Prediction Bounds

        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2024 Q1       582.3095 563.2840 601.3350 553.2126 611.4064
2024 Q2       441.4524 419.8389 463.0659 408.3974 474.5074
2024 Q3       425.7673 402.8662 448.6684 390.7431 460.7916
2024 Q4       556.4087 532.9286 579.8889 520.4990 592.3185
  • Interpretation of 95% Prediction Bounds:

  • We are 95% certain that 4th qtr. revenue in 2024 will fall between the Lo 95 and Hi 95 bounds. Width of that range (Hi 95 - Lo 95):

    • Incorrect model range: $601.45 - $400.34 = $201
    • Correct model range: $592.32 - $520.50 = $72
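
The widths for all four quarters can be computed directly from the two tables above. This sketch copies the Lo 95 and Hi 95 columns by hand. The data frame names are illustrative.

```r
# 95% bounds copied from the forecast tables above
incorrect <- data.frame(
  qtr  = paste("2024", c("Q1", "Q2", "Q3", "Q4")),
  lo95 = c(483.2448, 423.8720, 401.8365, 400.3402),
  hi95 = c(638.0874, 580.9394, 600.8407, 601.4548)
)
correct <- data.frame(
  qtr  = paste("2024", c("Q1", "Q2", "Q3", "Q4")),
  lo95 = c(553.2126, 408.3974, 390.7431, 520.4990),
  hi95 = c(611.4064, 474.5074, 460.7916, 592.3185)
)

# Interval width = upper bound - lower bound
incorrect$width <- incorrect$hi95 - incorrect$lo95
correct$width   <- correct$hi95 - correct$lo95

# Side-by-side comparison, rounded to whole dollars
data.frame(qtr = correct$qtr,
           incorrect_width = round(incorrect$width),
           correct_width   = round(correct$width))
```

In every quarter the seasonal model's interval is less than half as wide as the interval from the model that ignores seasonality.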

Comparison of Model Residuals

Incorrect Model:

  • Residuals MUCH larger
  • Residuals are highly auto-correlated
    • See ACF plot

Correct Model:

  • Residuals MUCH smaller
  • Auto-correlation assumption is met
    • ACF plot: lags in range

Incorrect Model:

(acr1 <- accuracy(ak_res_forecast1))   # examine MAPE and model percent accuracy
                    ME     RMSE      MAE        MPE    MAPE     MASE       ACF1
Training set 0.3758477 38.41288 31.54499 -0.8532147 6.18953 2.364354 0.04042622
  • The incorrect model’s percent accuracy is 93.8%, which is better than expected.

Correct Model:

(acr2 <- accuracy(ak_res_forecast2))   # examine MAPE and model percent accuracy
                     ME     RMSE      MAE         MPE     MAPE      MASE
Training set -0.2292964 14.01561 10.46793 -0.07894875 1.962027 0.7845902
                     ACF1
Training set -0.004452267
  • The correct model’s percent accuracy is 98%.

  • Always plot data, but if seasonality is difficult to discern, run both models and compare them.

  • Residuals (previous slide) and model accuracy (this slide) of models will indicate which model is correct.

Key Points from Today

  • R forecast package - simplifies forecasting

    • Know terminology and how to read and interpret output.
  • Plot data FIRST - Check for seasonality, trend, other patterns

  • HW 10 covers Lectures 23-25 (Due Mon. 4/22)

  • Lecture 26 (Thu. 4/18) - Optional - Students will learn to download stock or other time series data and create interactive displays, and forecasts

  • April 23rd - No Lecture

  • Lecture 28 (Thu. 4/25) - 20 min. of lecture with Point Solutions, then Q&A

    • Come with questions!
  • Evaluations are VERY Important: coursefeedback.syr.edu

To submit an Engagement Question or Comment about material from today’s lecture: Submit by midnight today (day of lecture) using the link under today’s lecture.