Introduction to Forecasting
2025-04-16
HW 9 is due on Wednesday, 4/16.
HW 10 is now posted and is due on Monday 4/28.
Additional Practice Questions will be posted next week.
Course Review on 4/24.
Introduction to Forecasting
Cross-Sectional Data vs. Time Series Data
Basic Forecasting Terminology
Forecasting Trends without Seasonality in R
Example 1 - US Population
Example 2 - Netflix Stock Prices
NEW PACKAGE FOR FORECASTING: forecast
HW 10 is now posted and is due on 4/28
Part of HW 10 pertains to today’s lecture.
Demo videos for HW 10 will be posted this weekend.
In-class Polling (Session ID: bua345s25)
Review Question from Linear Regression Modeling:
We have data for 2024 annual salaries of 75 Upstate NY residents ranging from $50K to $150K, and we use that data to model how much someone spends on their first house.
Is it valid to apply that model to someone with an annual salary of $350K?
A. Yes, this is extrapolation and it is valid.
B. No, this is extrapolation and it is invalid.
C. Yes, this is interpolation and it is valid.
D. No, this is interpolation and it is invalid.
Cross-Sectional Data: Shows a Snapshot of One Time Period
Example: Population by County in 2019
Time Series Data: Shows Trend over Time
In time series data, new observations are often correlated with prior observations
This is referred to as auto-correlation
A variable is correlated with itself
When data are auto-correlated, we use that information
This process is called auto-regression
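To make auto-correlation concrete, here is a minimal sketch in base R. The series `y` is made-up illustrative data (not the course dataset), and the variable names are hypothetical.

```r
# Hypothetical data: a short, steadily rising series (not the course dataset).
y <- c(10, 12, 13, 15, 18, 20, 21, 24, 26, 29, 31, 34)
n <- length(y)

# Lag-1 auto-correlation by hand: correlate the series with itself
# shifted back by one time period.
lag1_cor <- cor(y[-1], y[-n])
print(lag1_cor)

# acf() reports sample auto-correlations at several lags. Its formula differs
# slightly from cor(), so the lag-1 values will not match exactly.
# The first value it returns is lag 0, which is always 1.
acf_vals <- acf(y, lag.max = 3, plot = FALSE)
print(acf_vals)
```

A value near 1 at lag 1 means each observation closely tracks the one before it, which is the information an auto-regressive model uses.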
R function: auto.arima function in the forecast package
ARIMA is an acronym:
AR: auto-regressive
I: integrated
MA: moving average
In ARIMA models, all three components are optimized to provide a reliable forecast.
Auto-Regressive Models (AR)
Similar to a simple linear regression model or non-linear regression model
Key difference: The regressor or predictor variable (X) is the dependent variable (Y) at a specific LAG
Lag (p) is how many previous time periods the model looks back to estimate the next time period.
If p = 1, the model estimates the next time period based on the most recent one.
If p = 2, the model estimates the next time period based on the two most recent ones.
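The idea that the predictor is simply Y at a lag can be sketched with an ordinary regression in base R. The series below is simulated (an assumption for illustration, not course data), and fitting with `lm()` is a simplification of how ARIMA software estimates these terms.

```r
set.seed(345)  # arbitrary seed so the sketch is reproducible

# Simulated series: each value is about 0.7 times the previous value plus noise.
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))

# p = 1: regress Y on Y lagged by one period.
# embed(y, 2) has two columns: column 1 is y[t], column 2 is y[t-1].
lag_1 <- embed(y, 2)
ar1_fit <- lm(lag_1[, 1] ~ lag_1[, 2])

# p = 2: also include the value from two periods back.
# embed(y, 3) columns are y[t], y[t-1], y[t-2].
lag_2 <- embed(y, 3)
ar2_fit <- lm(lag_2[, 1] ~ lag_2[, 2] + lag_2[, 3])

print(coef(ar1_fit))  # slope should land near the simulated 0.7
print(coef(ar2_fit))
```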
Differencing (I = Integration)
Stationarity: mean and variance of the data are consistent over the timespan
Needed for accurate modeling
Can be verified by examining residuals
Differencing transforms non-stationary data into stationary data
Differencing order (d) determined by model:
if d = 1: each obs. is replaced by its difference from the previous one (removes a linear trend)
if d = 2: the differences are differenced again (removes a quadratic trend)
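A small base R sketch of differencing, using two made-up series (a straight line and a quadratic curve) chosen only to show what d = 1 and d = 2 do:

```r
time_index <- 1:8
linear_trend    <- 5 + 2 * time_index   # hypothetical linear series
quadratic_trend <- 3 + time_index^2     # hypothetical quadratic series

# d = 1 on a linear trend: every difference is the same constant.
d1_linear <- diff(linear_trend, differences = 1)

# d = 1 on a quadratic trend: the differences still trend upward.
d1_quadratic <- diff(quadratic_trend, differences = 1)

# d = 2 on a quadratic trend: differencing the differences gives a constant.
d2_quadratic <- diff(quadratic_trend, differences = 2)

print(d1_linear)     # 2 2 2 2 2 2 2
print(d1_quadratic)  # 3 5 7 9 11 13 15
print(d2_quadratic)  # 2 2 2 2 2 2
```

Each round of differencing shortens the series by one observation.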
Moving Average (MA)
Moving average order (q): how many lagged forecast errors are incorporated into the model.
Algorithm uses the forecast errors from a specific number of lagged time periods.
Moving average terms adjust for temporary instability (short-lived shocks) in the data.
If q = 1: the model uses the forecast error from the previous time period.
If q = 2: the model uses the forecast errors from the two previous time periods.
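In ARIMA software, the MA order q counts lagged forecast errors (past shocks) rather than raw observations. The base R sketch below fits a model with q = 1 to a simulated series and, for contrast, computes a simple rolling mean. The data and names are assumptions for illustration only.

```r
set.seed(345)  # arbitrary seed so the sketch is reproducible

# Simulated series: each value is new noise plus 0.6 times the previous noise term.
y <- arima.sim(model = list(ma = 0.6), n = 500)

# Fit a model with p = 0, d = 0, q = 1.
ma1_fit <- arima(y, order = c(0, 0, 1))
print(coef(ma1_fit))  # "ma1" should land near the simulated 0.6

# For contrast: a 3-term rolling mean smooths the series,
# but it is a different calculation from the MA term above.
rolling_mean <- stats::filter(y, rep(1 / 3, 3), sides = 1)
print(head(rolling_mean))  # first two values are NA (not enough history yet)
```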
Population Trend Forecast
Create time series using population data
Specify freq = 1 - one observation per year
Specify start = 1950 - first year in dataset
Model data using the auto.arima function
Specify ic = "aic" - aic is the information criterion used to determine the model.
Specify seasonal = F - no seasonal (repeating) pattern in the data.
These commands will create and save the model:
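The course data file is not reproduced in these notes, so the following is only a sketch of what such commands might look like. The series `pop_millions` is simulated stand-in data, and the names `pop_ts` and `pop_model` are hypothetical. `frequency` is the full name of the `ts()` argument abbreviated as freq above.

```r
library(forecast)  # provides auto.arima()

# Simulated stand-in data: 75 yearly values (1950 to 2024), in millions.
# This is NOT the actual US population data.
set.seed(345)
years <- 1950:2024
pop_millions <- 150 + 2.6 * (years - 1950) +
  cumsum(rnorm(length(years), sd = 0.3))

# Create the time series: one observation per year, starting in 1950.
pop_ts <- ts(pop_millions, frequency = 1, start = 1950)

# Let auto.arima() choose p, d, and q using AIC, with no seasonal terms,
# and save the result.
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)
print(pop_model)
```

Because the data here are simulated, the (p, d, q) values chosen will generally differ from those reported for the real population data.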
Create forecasts (until 2040)
h = 16 indicates we want to forecast 16 years
Most recent year in our data is 2024
Forecasts become less accurate the further into the future you specify.
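A sketch of the forecasting step, again on simulated stand-in data (not the real population series) with hypothetical object names. It shows where h = 16 comes from and how the prediction intervals can be pulled out of the forecast object.

```r
library(forecast)

# Simulated stand-in for a yearly series ending in 2024.
set.seed(345)
pop_ts <- ts(150 + 2.6 * (0:74) + cumsum(rnorm(75, sd = 0.3)),
             frequency = 1, start = 1950)
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)

# Forecast horizon: years from the last observation (2024) to 2040.
h <- 2040 - 2024   # 16
pop_forecast <- forecast(pop_model, h = h)

# Printed columns: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95.
print(pop_forecast)

# Width of the 95% prediction interval at each forecast step.
width_95 <- pop_forecast$upper[, "95%"] - pop_forecast$lower[, "95%"]
print(width_95)
```

The interval widths grow with the horizon, which is the numerical form of the statement that forecasts become less accurate further into the future.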
Model chosen: Lags (p = 2), Differencing (d = 1), Moving Average (q = 1)
Year  Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
2025        343.6569  343.4946  343.8193  343.4086  343.9052
2026        345.5929  345.0776  346.1082  344.8048  346.3809
2027        347.6223  346.6686  348.5760  346.1638  349.0808
2028        349.7332  348.2972  351.1691  347.5371  351.9293
2029        351.9131  349.9744  353.8517  348.9481  354.8780
2030        354.1509  351.7028  356.5991  350.4068  357.8950
2031        356.4375  353.4816  359.3933  351.9169  360.9581
2032        358.7648  355.3083  362.2214  353.4785  364.0512
2033        361.1264  357.1795  365.0734  355.0901  367.1628
2034        363.5167  359.0917  367.9418  356.7492  370.2843
2035        365.9311  361.0414  370.8209  358.4529  373.4094
2036        368.3657  363.0251  373.7064  360.1980  376.5335
2037        370.8173  365.0398  376.5947  361.9814  379.6531
2038        373.2830  367.0826  379.4835  363.8003  382.7658
2039        375.7607  369.1507  382.3707  365.6516  385.8698
2040        378.2483  371.2419  385.2548  367.5328  388.9639
In-class Polling (Session ID: bua345s25)
Based on the US Population forecast output, we are 95% certain that the U.S. population in 2030 will be less than ______ million people?
How to input your answer:
Round to closest million (whole number)
If the answer were 123 million (e.g. 123.4233), you would enter 123.
Top Plot: No spikes should be too large
ACF: auto-correlation function.
Histogram: Distribution of residuals should be approx. normal
Assessment: Trend is very smooth so small aberrations are exaggerated in residuals.
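The three residual views described above can be approximated in base R. This sketch uses the built-in `Nile` dataset as a stand-in and an arbitrarily chosen model order, so it illustrates the checks rather than reproducing the course plots.

```r
# Stand-in data: built-in Nile series (annual river flow), not the course data.
# The order (1, 0, 1) is arbitrary and only for illustration.
fit <- arima(Nile, order = c(1, 0, 1))
res <- residuals(fit)

op <- par(mfrow = c(3, 1))                 # stack three plots
plot(res, main = "Residuals over time")    # look for unusually large spikes
acf(res, main = "ACF of residuals")        # bars should stay within the bands
hist(res, main = "Histogram of residuals") # should look roughly normal
par(op)

# Optional numeric check: a Ljung-Box test for leftover auto-correlation.
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
print(lb)
```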
                      ME      RMSE        MAE         MPE       MAPE       MASE        ACF1
Training set 0.003495822 0.1223609 0.08692589 0.003017647 0.03838409 0.03323733 0.007946428
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
Despite an outlier and one relatively large ACF value, our population model is estimated to be 99.96% accurate (100% minus the MAPE of about 0.04%).
This doesn’t guarantee that forecasts will be 100% accurate but it does improve our chances of accurate forecasting.
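The accuracy figure is 100 minus MAPE. A short base R sketch of that arithmetic, first on three made-up values and then on the MAPE reported in the output above:

```r
# Hypothetical actual and fitted values, for illustration only.
actual      <- c(100, 200, 400)
fitted_vals <- c(110, 190, 400)

# MAPE: average of the absolute errors expressed as a percent of the actual value.
# Here the percent errors are 10%, 5%, and 0%, so MAPE is 5%.
mape <- mean(abs((actual - fitted_vals) / actual)) * 100
accuracy_pct <- 100 - mape
print(c(mape = mape, accuracy = accuracy_pct))

# Same arithmetic with the MAPE from the population model output.
pop_mape <- 0.03838409
pop_accuracy <- 100 - pop_mape
print(round(pop_accuracy, 2))  # 99.96
```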
Data from Yahoo Finance
Forecast Questions:
What will the estimated stock price be in April of 2026?
What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
How valid is our model?
How accurate are our estimates?
Examine Prediction Intervals and Prediction Bands
Check fit statistics
Stock Trend Forecast
Create time series using Netflix Stock data
Specify freq = 12 - 12 observations per year
Specify start = c(2010, 1) - first obs. in dataset is January 2010
Model data using the auto.arima function
Specify ic = "aic" - aic is the information criterion used to determine the model.
Specify seasonal = F - no seasonal (repeating) pattern in the data.
This code will create and save the model:
Create forecasts (until April 2026)
h = 12 indicates we want to forecast 12 months
Most recent date in forecast data is April 1, 2025
12 Months until April 1, 2026
Forecasts become less accurate the further into the future you specify.
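How the monthly settings map to calendar dates can be checked in base R. The values below are placeholders (not Netflix prices), and the count of 184 observations assumes one complete observation per month from January 2010 through April 2025.

```r
# Placeholder values only: 184 monthly observations, Jan 2010 through Apr 2025.
n_months <- 184
placeholder_prices <- seq(10, 900, length.out = n_months)

# 12 observations per year, first observation in January 2010.
nflx_ts <- ts(placeholder_prices, frequency = 12, start = c(2010, 1))
print(start(nflx_ts))  # 2010 1
print(end(nflx_ts))    # 2025 4

# A 12-month forecast starts the month after the data end (May 2025)
# and therefore ends in April 2026.
forecast_index <- ts(rep(NA_real_, 12), frequency = 12, start = c(2025, 5))
print(end(forecast_index))  # 2026 4
```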
Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
Plot shows: Lags (p = 0), Differencing (d = 1), Moving Average (q = 3)
Month     Point Forecast      Lo 80      Hi 80      Lo 95      Hi 95
May 2025        931.5097   887.9080   975.1115   864.8266   998.1929
Jun 2025        917.5952   854.4637   980.7266   821.0439  1014.1464
Jul 2025        909.0272   825.7946   992.2599   781.7339  1036.3206
Aug 2025        909.0272   805.4485  1012.6060   750.6172  1067.4372
Sep 2025        909.0272   788.4891  1029.5653   724.6801  1093.3744
Oct 2025        909.0272   773.6377  1044.4167   701.9668  1116.0876
Nov 2025        909.0272   760.2616  1057.7928   681.5099  1136.5446
Dec 2025        909.0272   747.9928  1070.0616   662.7463  1155.3081
Jan 2026        909.0272   736.5947  1081.4597   645.3145  1172.7400
Feb 2026        909.0272   725.9047  1092.1497   628.9655  1189.0889
Mar 2026        909.0272   715.8053  1102.2492   613.5197  1204.5347
Apr 2026        909.0272   706.2081  1111.8464   598.8421  1219.2124
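The widening of the prediction intervals can be read directly off the Netflix forecast output. This base R sketch copies the Lo 95 and Hi 95 values for three of the forecast months and computes the interval widths; the variable names are hypothetical.

```r
# Lo 95 and Hi 95 values copied from the Netflix forecast output.
month <- c("May 2025", "Oct 2025", "Apr 2026")
lo_95 <- c(864.8266, 701.9668, 598.8421)
hi_95 <- c(998.1929, 1116.0876, 1219.2124)

# Width of each 95% prediction interval, in dollars.
width_95 <- hi_95 - lo_95
print(data.frame(month, lo_95, hi_95, width_95))
# Widths: 133.3663, 414.1208, 620.3703
```

The interval for April 2026 is more than four times as wide as the one for May 2025, even though the point forecast barely changes.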
In-class Polling (Session ID: bua345s25)
Interpretation of Netflix Prediction Intervals
In January of 2026, the Netflix stock price is forecasted to be approximately $909. However, the 95% prediction interval indicates it may be as low as ____.
How to input your answer:
Round to closest whole dollar.
Don’t include dollar sign.
Top Plot: Spikes get larger over time
ACF: auto-correlation function.
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile, so this level of fit is sufficient.
                   ME    RMSE      MAE      MPE     MAPE     MASE         ACF1
Training set 3.464136 33.6508 21.59716 1.309471 10.99867 0.213375 -0.006420417
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
Despite increasing volatility, our stock price model is estimated to be 89% accurate.
This doesn’t guarantee that forecasts will be 89% accurate but it does improve our chances of accurate forecasting.
The forecast package in R simplifies forecasting.
Extrapolation OK in this case
You should know terminology and how to read and interpret output.
You will be given data, R code, and output
You will answer questions based on provided output.
HW 10 includes material from Lectures 24-26
HW 9 is due on 4/16.
Including today, there are four lectures and engagement questions remaining.
To submit an Engagement Question or Comment about material from Lecture 25: Submit it by midnight today (day of lecture).