2024-04-15
HW 9 was due on Monday, 4/15.
HW 10 is posted and due Monday, 4/22
Grace period for HW 10 extended until Thu. 4/25 at midnight.
Lecture 26 on Thu. 4/18 is Optional.
No Lecture on 4/23
Course Review on 4/25
NEW PACKAGE FOR FORECASTING: forecast
Evaluations are VERY Important:
I will end class 5 min. early today and next Thursday to give you time to complete evaluations in class.
Please complete evaluations for ALL courses.
Session ID: bua345s24
The AR in ARIMA stands for a type of regression in which a variable is regressed on itself, using previous observations to predict future ones.
This is known as ___-regression.
Review of Time Series Concepts
Brief Review of Time Series without Seasonality
Seasonality in Time Series Data
Forecasting Trends with Seasonality in R
Shows a Snapshot of One Time Period
Shows Trend over Time
auto-correlation: A variable is correlated with its own previous (lagged) values
auto-regression (AR): Using previous observations to predict into the future.
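A minimal base-R sketch of both definitions. It uses the illustrative series 1:10, not course data. It computes the lag-1 auto-correlation by hand and checks it against `acf()`. It then fits an auto-regression of each value on the previous one with `lm()` and uses that fit to predict the next value.

```r
# Illustrative series with a steady upward trend (not course data)
x <- 1:10

# Lag-1 auto-correlation by hand: relate x_t to x_{t-1} around the mean
xbar <- mean(x)
r1_manual <- sum((x[-1] - xbar) * (x[-length(x)] - xbar)) / sum((x - xbar)^2)

# Same quantity from base R's acf(); element 1 is lag 0, element 2 is lag 1
r1_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

# Auto-regression: regress each value on its own previous value
ar_data <- data.frame(current = x[-1], previous = x[-length(x)])
ar_fit <- lm(current ~ previous, data = ar_data)

# Use the last observation to predict the next one
next_value <- predict(ar_fit, newdata = data.frame(previous = x[length(x)]))
print(c(r1_manual = r1_manual, r1_acf = r1_acf, next_value = unname(next_value)))
```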
R function: auto.arima
- ARIMA is an acronym:
AR: auto-regressive - p = number of lags (previous observations) used as predictors, chosen so that little auto-correlation remains
I: integrated - d = order of differencing needed to achieve stationarity
MA: moving average - q = number of lagged forecast errors (terms) in the moving average
All 3 components are optimized to provide a reliable forecast.
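`auto.arima` searches for p, d, and q automatically. Base R's `arima()` takes them directly through `order = c(p, d, q)`, which makes the role of each number visible. This sketch uses the built-in `WWWusage` series as a stand-in for course data. The order c(1, 1, 1) is only an illustrative choice, not a value from the lecture.

```r
# WWWusage: built-in series of users connected to a server, recorded each minute
fit <- arima(WWWusage, order = c(1, 1, 1))   # order = c(p, d, q)

# p = 1 -> one AR coefficient (ar1); q = 1 -> one MA coefficient (ma1);
# d = 1 -> the model is fit to the first differences of the series
print(coef(fit))

# Information criterion (AIC) used to compare candidate orders
print(fit$aic)
```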
Stationary Time Series:
Consistent mean and variance throughout time series
Time series with trends, or with seasonality, are not stationary.
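Differencing is how the I (integrated) step handles this. In this sketch the series is made up: a linear trend of 2 per month plus a hypothetical 12-month pattern that sums to zero. Its yearly mean keeps rising, so it is not stationary. Subtracting the same month one year earlier (`diff(..., lag = 12)`) removes both the trend and the seasonal pattern and leaves a constant.

```r
# Hypothetical seasonal shape (sums to zero) and a linear trend of +2 per month
season <- c(6, 4, 2, 0, -2, -4, -6, -4, -2, 0, 2, 4)
y <- ts(10 + 2 * (1:36) + rep(season, 3), frequency = 12, start = c(2010, 1))

# Not stationary: the yearly mean keeps rising (23, 47, 71)
yearly_means <- aggregate(y, nfrequency = 1, FUN = mean)
print(yearly_means)

# Seasonal difference: each month minus the same month last year = 12 * 2 = 24
y_seasonal_diff <- diff(y, lag = 12)
print(unique(as.numeric(y_seasonal_diff)))
```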
Separating a time series into different parts is how we analyze it
This is called DECOMPOSITION
Time Series Modeling decomposes the data into:
Trend
Seasonality (repeated pattern)
Residuals (what’s left over)
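Base R's `decompose()` performs this kind of split using a moving-average trend. As an illustration (not the lecture's code), it is applied here to the built-in `co2` series, which holds monthly Mauna Loa CO2 readings from 1959 to 1997.

```r
# co2: built-in monthly Mauna Loa CO2 concentrations (ppm), 1959-1997
dec <- decompose(co2, type = "additive")

trend_part    <- dec$trend     # long-run movement (NA for the first and last 6 months)
seasonal_part <- dec$seasonal  # repeated 12-month pattern
leftover_part <- dec$random    # residuals: what's left over

# One year of the estimated seasonal pattern (Jan..Dec), centered on zero
print(round(dec$figure, 2))

# Additive model: observed = trend + seasonal + residual wherever the trend exists
print(max(abs(co2 - (trend_part + seasonal_part + leftover_part)), na.rm = TRUE))
```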
NEW TERM: SARIMA MODEL
Lecture 24: ARIMA models
Today: ARIMA models with SEASONAL component.
SARIMA: Seasonal Auto-Regressive Integrated Moving Average.
SARIMA models:
optimize p, d, and q for the whole time series
Also optimize seasonal p, d, and q within season (repeating intervals)
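In base R's `arima()`, the two sets of orders are passed separately. `order` holds p, d, and q for the whole series, and `seasonal` holds the within-season orders plus the season length. This sketch fits the well-known "airline" model to the built-in `AirPassengers` data. That model and data set are illustrative choices, not the lecture's.

```r
# AirPassengers: built-in monthly airline passenger totals, 1949-1960
fit_sarima <- arima(log(AirPassengers),
                    order    = c(0, 1, 1),                          # p, d, q for the whole series
                    seasonal = list(order = c(0, 1, 1), period = 12)) # p, d, q within the 12-month season

# ma1 is the non-seasonal MA term; sma1 is the seasonal MA term
print(coef(fit_sarima))
```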
DECOMPOSITION
ARIMA models are decomposed into trend and residuals.
SARIMA models are decomposed into trend, seasonality, and residuals.
ARIMA:
2nd plot looks similar to the Population time series.
ARIMA decomposition (plot)
SARIMA:
Plot 1: Time series with a seasonal pattern.
SARIMA decomposition (plot)
Dashed lines show peaks at irregular intervals.
Forecast Questions:
What will the estimated stock price be in April 2025?
What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
How valid is our model?
How accurate are our estimates?
Examine Prediction Intervals and Prediction Bands
Check fit statistics
Stock Trend Forecast
Create time series using Netflix stock data
Specify freq = 12
- 12 observations per year
Specify start = c(2010, 1)
- first obs. in dataset is January 2010
Model data using the auto.arima function
Specify ic = "aic"
- AIC (Akaike Information Criterion) is the information criterion used to select the model.
Specify seasonal = F
- no seasonal (repeating) pattern in the data.
This chunk will create and save the model.
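Here is a sketch of this chunk, assuming the forecast package is available. The Netflix data are not reproduced here, so `nflx_prices` is a hypothetical placeholder filled with simulated values. Only the `ts()` and `auto.arima()` arguments mirror the slide. The model chosen for the simulated data will not match the lecture's model.

```r
# Placeholder data: 172 SIMULATED monthly values standing in for the
# Netflix prices (Jan 2010 - Apr 2024). Replace with the course data set.
set.seed(345)
nflx_prices <- 100 + cumsum(rnorm(172, mean = 3, sd = 20))

# frequency = 12 (written freq = 12 on the slides): 12 observations per year
# start = c(2010, 1): first observation is January 2010
nflx_ts <- ts(nflx_prices, frequency = 12, start = c(2010, 1))
print(start(nflx_ts))
print(end(nflx_ts))

# Model step: needs the forecast package, so it is skipped if not installed
if (requireNamespace("forecast", quietly = TRUE)) {
  nflx_model <- forecast::auto.arima(nflx_ts, ic = "aic", seasonal = FALSE)
  print(nflx_model)
}
```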
Create forecasts (until April 2025)
h = 12 indicates we want to forecast 12 months ahead
Most recent date in the data is April 1, 2024
12 Months until April 1, 2025
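The value of h is the number of monthly steps between the last observation and the target month. The helper `months_ahead` is hypothetical and written only for this sketch.

```r
# Hypothetical helper: monthly steps from last_obs to target, each given as c(year, month)
months_ahead <- function(last_obs, target) {
  (target[1] - last_obs[1]) * 12 + (target[2] - last_obs[2])
}

# Last observation April 2024, target April 2025
h <- months_ahead(last_obs = c(2024, 4), target = c(2025, 4))
print(h)
```

With the forecast package, this value would be passed as `forecast(nflx_model, h = h)`, where `nflx_model` stands for a fitted `auto.arima` model.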
Forecasts become less accurate the further into the future you specify.
Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
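Both bands come from the same forecast standard error. This sketch assumes normal-theory intervals, where each bound = point forecast plus or minus z times se. It backs out the standard error from the May 2024 row's 80% upper bound and then rebuilds the 95% bounds. The result matches the printed table to within rounding.

```r
# May 2024 row of the Netflix forecast table
point <- 616.5980
hi80  <- 655.3570

# Assumption: bound = point +/- z * se, with z from the normal distribution
se   <- (hi80 - point) / qnorm(0.90)   # 80% interval: z = qnorm(0.90), about 1.28
hi95 <- point + qnorm(0.975) * se      # 95% interval: z = qnorm(0.975), about 1.96
lo95 <- point - qnorm(0.975) * se
print(round(c(lo95 = lo95, hi95 = hi95), 2))
```

Because 1.96 > 1.28, the 95% band (lighter purple) is always wider than the 80% band (darker purple).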
Plot shows: Auto-Regressive (p = 2), Differencing (d = 1), Moving Average (q = 2), i.e., ARIMA(2,1,2)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
May 2024 616.5980 577.8389 655.3570 557.3211 675.8749
Jun 2024 597.3982 540.9623 653.8340 511.0870 683.7094
Jul 2024 575.2199 501.5346 648.9053 462.5280 687.9119
Aug 2024 567.9570 477.2580 658.6559 429.2448 706.6691
Sep 2024 580.7199 475.2167 686.2230 419.3666 742.0731
Oct 2024 604.1698 487.2636 721.0759 425.3773 782.9623
Nov 2024 622.9879 497.5925 748.3833 431.2122 814.7636
Dec 2024 627.2257 494.8187 759.6327 424.7267 829.7247
Jan 2025 618.3484 478.9854 757.7114 405.2111 831.4857
Feb 2025 606.5939 459.5068 753.6811 381.6435 831.5443
Mar 2025 602.7019 447.2250 758.1789 364.9204 840.4835
Apr 2025 610.4861 446.7790 774.1931 360.1177 860.8544
Interpretation of Netflix Prediction Intervals
In February 2025, the 80% prediction interval for the Netflix stock price will be $____ wide.
To find this width, subtract the lower bound (Lo 80) from the upper bound (Hi 80) and round to the closest whole dollar.
How to input your answer:
Round to closest whole dollar.
Don’t include dollar sign.
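The same subtraction can be written in R for the last three rows of the Netflix forecast table, with values copied from that table. The output also shows the 80% interval widening the further ahead the forecast goes.

```r
# Lo 80 and Hi 80 from the Netflix forecast table
lo80 <- c(Feb2025 = 459.5068, Mar2025 = 447.2250, Apr2025 = 446.7790)
hi80 <- c(Feb2025 = 753.6811, Mar2025 = 758.1789, Apr2025 = 774.1931)

# Interval width = upper bound - lower bound, rounded to whole dollars
width80 <- hi80 - lo80
print(round(width80))
```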
Top Plot: Spikes get larger over time
ACF: auto-correlation function.
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile, so this model's fit is sufficient.
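The three panels described here match what forecast's `checkresiduals()` draws. That function also reports a Ljung-Box test. The sketch below computes the same checks numerically with base R on a stand-in model (ARIMA(1,1,1) on the built-in `WWWusage` series), because the course's Netflix model is not reproduced here.

```r
# Stand-in model (not the Netflix model)
fit <- arima(WWWusage, order = c(1, 1, 1))
res <- residuals(fit)

# ACF of residuals: spikes outside about +/- 1.96/sqrt(n) suggest leftover auto-correlation
res_acf <- acf(res, lag.max = 10, plot = FALSE)
band <- 1.96 / sqrt(length(res))
print(which(abs(res_acf$acf[-1]) > band))   # lags (1..10) outside the band, if any

# Ljung-Box test on 10 lags; fitdf = p + q = 2 estimated ARMA coefficients
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)

# Histogram counts of residuals, to eyeball approximate normality
print(hist(res, plot = FALSE)$counts)
```

A large Ljung-Box p-value means there is no evidence of leftover auto-correlation in the residuals.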
For BUA 345: We will use MAPE = Mean Absolute Percent Error
Despite increasing volatility, our stock price model is estimated to be 87.33% accurate.
This doesn’t guarantee that forecasts will be 87% accurate but it does improve our chances of accurate forecasting.
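MAPE averages the absolute errors expressed as a percentage of the actual values. The numbers below are hypothetical. The sketch also assumes that "percent accuracy" on these slides means 100 - MAPE. In the forecast package, `accuracy()` reports MAPE as one of its columns.

```r
# Hypothetical actual and predicted values (illustration only)
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 380)

# MAPE = mean of |actual - predicted| / |actual|, in percent
mape <- mean(abs(actual - predicted) / abs(actual)) * 100

# Assumption: percent accuracy = 100 - MAPE
accuracy_pct <- 100 - mape
print(round(c(MAPE = mape, accuracy = accuracy_pct), 2))
```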
Carbon Dioxide Trends - Monthly - 1958 to Present Day
Carbon Dioxide - Monthly - 2015 to Present Day
Data above are decomposed into these components:
Plot shows BOTH trend and seasonality
Forecasting model is specified to account for both components
Forecasting decomposes data into
Trend
Seasonality
Residuals
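This sketch shows a forecasting model that accounts for both components, using base R. It uses the built-in `co2` series (monthly, 1959 to 1997) as a stand-in for the lecture's 1958-to-present data. The non-seasonal differencing (d = 1) handles the trend, and the seasonal differencing (period 12) handles the repeating pattern. The 95% bounds assume normally distributed forecast errors.

```r
# Seasonal ARIMA: d = 1 for the trend, seasonal D = 1 for the 12-month pattern
fit_co2 <- arima(co2, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12))

# 12-month-ahead forecasts with standard errors
fc <- predict(fit_co2, n.ahead = 12)

# Approximate 95% prediction bounds (normal-error assumption)
lo95 <- fc$pred - qnorm(0.975) * fc$se
hi95 <- fc$pred + qnorm(0.975) * fc$se
print(round(cbind(point = fc$pred, lo95, hi95), 2))
```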
Alaska is very far north so there is
summer light (day and night)
winter darkness (day and night)
Alaska Electricity usage has a strong seasonal pattern.
Data are quarterly residential revenues:
Format of Time Series with Quarters:
head(ak_res_ts, 20)
shows first 20 observations and format.
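This sketch shows the quarterly format using eight placeholder values. These are not the Alaska revenue data. Printing a `ts` with `frequency = 4` lays it out by year and Qtr1 to Qtr4, and `cycle()` gives each observation's quarter number.

```r
# Placeholder quarterly values (NOT the Alaska data), 2001 Q1 - 2002 Q4
ak_demo_ts <- ts(c(150, 120, 110, 160, 155, 125, 112, 165),
                 frequency = 4, start = c(2001, 1))

print(ak_demo_ts)          # Year x Qtr1..Qtr4 layout
print(cycle(ak_demo_ts))   # quarter index of each observation
```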
If our time series from Alaska were augmented so that it started in February of 1990 (2nd month) and we had data by month (12 observations per year), how would our ts command change in R?
Hint: Our current data, ak_res, are quarterly and begin in the first quarter of 2001. The command we used to create the time series is:
ts(ak_res$Revenue, freq=4, start=c(2001,1))
ts(ak_res$Revenue, freq=1, start=c(1, 1990))
ts(ak_res$Revenue, freq=4, start=c(2, 1990))
ts(ak_res$Revenue, freq=12, start=c(1990, 2))
ts(ak_res$Revenue, freq=12, start=c(2, 1990))
ts(ak_res$Revenue, freq=4, start=c(1, 1990))
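`ts()` reads `start` as c(year, period). Reversing the order does not raise an error. This sketch uses 24 placeholder values to show what R does with each version. The reversed call is read as year 2 plus 1989 extra months, which lands in year 167, month 10.

```r
# Placeholder monthly values, only to show how start = c(year, period) works
x <- seq_len(24)

good <- ts(x, frequency = 12, start = c(1990, 2))   # February 1990
bad  <- ts(x, frequency = 12, start = c(2, 1990))   # arguments reversed

print(start(good))   # year 1990, period 2
print(start(bad))    # year 167, period 10: not what was intended
```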
Incorrect Model: Ignores Seasonality (seasonal = F)
p, d, and q for full time series: (0,0,4).
Correct Model: Includes Seasonality (seasonal = T)
p, d, and q for full time series: (2,0,0), and within season: (2,1,1)[4], where [4] is the seasonal period (4 quarters per year).
Our data is quarterly and has four observations per year ending in the 4th quarter of 2023.
If the state of Alaska wants to extend the forecast until the Fall of 2025 (3rd Quarter), how would they change the R command?
Hint: Current forecast extends until the 4th quarter of 2024 and the command is written as:
forecast(ak_res_model2, h=4)
forecast(ak_res_model2, h=6)
forecast(ak_res_model2, h=7)
forecast(ak_res_model2, h=8)
forecast(ak_res_model2, h=9)
forecast(ak_res_model2, h=10)
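One way to count the steps is to convert each quarter to ts-style decimal time (year + (quarter - 1)/4), subtract, and multiply by 4. The helper `qtr_time` is hypothetical and written only for this sketch.

```r
# Hypothetical helper: ts-style decimal time for a given year and quarter
qtr_time <- function(year, quarter) year + (quarter - 1) / 4

last_obs <- qtr_time(2023, 4)   # last quarter in the data
target   <- qtr_time(2025, 3)   # Fall 2025 (3rd quarter)

# Number of quarterly steps ahead
h <- round((target - last_obs) * 4)
print(h)
```

The same arithmetic confirms the hint: h = 4 from 2023 Q4 reaches 2024 Q4.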
Incorrect Model: Less precise
Year | Qtr | Point | Lo95 | Hi95 |
---|---|---|---|---|
2024 | 1 | 560.67 | 483.24 | 638.09 |
2024 | 2 | 502.41 | 423.87 | 580.94 |
2024 | 3 | 501.34 | 401.84 | 600.84 |
2024 | 4 | 500.90 | 400.34 | 601.45 |
Q4 Width = Hi - Lo = $201
Correct Model: More precise
Year | Qtr | Point | Lo95 | Hi95 |
---|---|---|---|---|
2024 | 1 | 582.31 | 553.21 | 611.41 |
2024 | 2 | 441.45 | 408.40 | 474.51 |
2024 | 3 | 425.77 | 390.74 | 460.79 |
2024 | 4 | 556.41 | 520.50 | 592.32 |
Q4 Width = Hi - Lo = $72
Incorrect Model Forecasts and Prediction Bounds
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2024 Q1 560.6661 510.0430 611.2892 483.2448 638.0874
2024 Q2 502.4057 451.0553 553.7561 423.8720 580.9394
2024 Q3 501.3386 436.2777 566.3995 401.8365 600.8407
2024 Q4 500.8975 435.1466 566.6484 400.3402 601.4548
Correct Model Forecasts and Prediction Bounds
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2024 Q1 582.3095 563.2840 601.3350 553.2126 611.4064
2024 Q2 441.4524 419.8389 463.0659 408.3974 474.5074
2024 Q3 425.7673 402.8662 448.6684 390.7431 460.7916
2024 Q4 556.4087 532.9286 579.8889 520.4990 592.3185
Interpretation of 95% Prediction Bounds:
We are 95% certain that 4th qtr. revenue in 2024 will fall within these bounds:
Incorrect Model: $400.34 to $601.45, a width of $601.45 - $400.34 = $201
Correct Model: $520.50 to $592.32, a width of $592.32 - $520.50 = $72
The correct model’s percent accuracy is 98%.
Always plot data, but if seasonality is difficult to discern, run both models and compare them.
The residuals (previous slide) and the accuracy (this slide) of each model indicate which model is correct.
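This sketch runs both models and compares their accuracy. It uses the built-in quarterly `UKgas` series as a stand-in for the Alaska data. The model orders are illustrative, not the ones chosen in the lecture. In-sample fitted values are taken as data minus residuals. Percent accuracy is again assumed to be 100 - MAPE.

```r
# UKgas: built-in quarterly UK gas consumption, 1960-1986 (stand-in data)
fit_nonseasonal <- arima(UKgas, order = c(0, 1, 1))
fit_seasonal    <- arima(UKgas, order = c(0, 1, 1),
                         seasonal = list(order = c(0, 1, 1), period = 4))

# MAPE of in-sample fit: residual = data - fitted value
mape <- function(model, y) mean(abs(residuals(model) / y)) * 100

results <- c(nonseasonal = mape(fit_nonseasonal, UKgas),
             seasonal    = mape(fit_seasonal, UKgas))
print(round(results, 2))         # lower MAPE = more accurate in-sample fit
print(round(100 - results, 2))   # "percent accuracy", assuming 100 - MAPE
```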
R forecast package - simplifies forecasting
Plot data FIRST: - Check for seasonality, trend, other patterns
HW 10 covers Lectures 23-25 (Due Mon. 4/22)
Lecture 26 (Thu. 4/18) - Optional - Students will learn to download stock or other time series data and create interactive displays and forecasts
April 23rd - No Lecture
Lecture 28 (Thu. 4/25) - 20 min. of lecture with Point Solutions, then Q&A
Evaluations are VERY Important: coursefeedback.syr.edu
To submit an Engagement Question or Comment about material from Today’s Lecture: Submit by midnight today (day of lecture). Click on the link next to the ❓ under today’s lecture.