BUA 345 - Lecture 25
Introduction to Forecasting
Housekeeping
Upcoming Dates
HW 9 is due on Wednesday, 4/15.
- Grace Period ends Thursday (4/16) at 10:00 PM.
HW 10 is now posted and is due on Monday, 4/27.
OPTIONAL GitHub Quarto Dashboard Workshop on Fri., 4/17, at 3:15 PM.
- If you are interested in attending, please sign up using the Google Form.
- Space is limited on Friday so that I can help individual students troubleshoot this process.
Additional Practice Questions will be posted next week.
Course Review on 4/23.
Today’s plan
Introduction to Forecasting
Cross-Sectional Data vs. Time Series Data
Basic Forecasting Terminology
Forecasting Trends without Seasonality in R
Example 1 - US Population
Example 2 - Netflix Stock Prices
NEW PACKAGE FOR FORECASTING: forecast
Part of HW 10 pertains to today’s lecture.
Demo videos for HW 10 will be posted this weekend.
Lecture 25 In-class Exercises - Q1
Poll Everywhere - My User Name: penelopepoolereisenbies685
Review Question from Linear Regression Modeling:
We have 2025 data for annual salaries of 75 Upstate NY residents ranging from $50K to $150K and we use that data to model how much someone spends on their first house.
Is it valid to apply that model to someone with an annual salary of $350K?
Yes, this is extrapolation and it is valid.
No, this is extrapolation and it is invalid.
Yes, this is interpolation and it is valid.
No, this is interpolation and it is invalid.
Cross-Sectional Data
Shows a Snapshot of One Time Period
Time Series Data
Shows Trend over Time
U.S. Population - Cross-Sectional Data
Population by County in 2019
U.S. Population - Time Series Data
Time Series Terminology
In time series data, new observations are often correlated with prior observations
This is referred to as auto-correlation
A variable is correlated with its own earlier values
When data are auto-correlated, we can use that information to forecast
This process is called auto-regression
- Using previous observations to predict into the future.
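A minimal base-R sketch of auto-correlation, using a simulated trending series (hypothetical, not course data): each value is paired with the value one period earlier and the two are correlated. The stats function acf() reports the same idea for several lags at once, using a slightly different estimator.

```r
# Simulated trending series (NOT real data): a random walk with upward drift
set.seed(345)
y <- cumsum(1 + rnorm(100))

# Pair each observation with the one just before it (lag 1)
current  <- y[-1]            # y[2], ..., y[100]
previous <- y[-length(y)]    # y[1], ..., y[99]

lag1_cor <- cor(current, previous)
lag1_cor                     # close to 1 for a strongly trending series

# acf() from base R's stats package estimates auto-correlation at several lags
acf_y <- acf(y, lag.max = 5, plot = FALSE)
acf_y
```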
Introductory Time Series R Functions
R function:
auto.arima function in forecast package
ARIMA is an acronym:
AR: auto-regressive
I: integrated
MA: moving average
In ARIMA models, all three components are optimized to provide a reliable forecast.
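The three letters correspond to the three numbers in an ARIMA(p, d, q) order. As a sketch under simple assumptions (a simulated series, and base R's stats::arima() with a hand-picked order instead of auto.arima()), one candidate order can be fit and scored by AIC; auto.arima() automates this kind of search across many candidate orders.

```r
set.seed(25)
# Simulated ARIMA(1,1,0) series (NOT course data):
# AR coefficient 0.5, integrated (differenced) once, no MA term
sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 200)

# Fit a hand-picked order: c(p, d, q) = c(1, 1, 0)
fit_110 <- arima(sim, order = c(1, 1, 0))
fit_110          # shows the estimated ar1 coefficient
AIC(fit_110)     # the criterion auto.arima(ic = "aic") compares across orders
```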
Terminology: ARIMA model components (p, d, q)
Auto-Regressive Models (AR)
Similar to a simple linear regression model or non-linear regression model
Key difference: the regressor (predictor variable, X) is the dependent variable (Y) itself at a specific LAG
Lag (p) is how many previous time periods the model looks back to estimate the next time period.
If p = 1, the model estimates the next time period based on the most recent one.
- Looks back one time period
If p = 2, the model estimates the next time period based on the two most recent ones (the most recent one AND the one BEFORE it).
- Looks back two time periods
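To illustrate "the predictor is Y itself at a lag", the sketch below builds lagged copies of a simulated series with base R's embed() and regresses y_t on y_(t-1) and y_(t-2) with ordinary least squares, i.e. an AR(2) regression. This is for intuition only: arima() and auto.arima() estimate AR coefficients by maximum likelihood rather than with lm().

```r
set.seed(2)
# Simulated stationary AR(2) series (NOT course data)
y <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = 300))

# embed(y, 3) returns a matrix whose columns are y_t, y_(t-1), y_(t-2)
lagged <- as.data.frame(embed(y, 3))
names(lagged) <- c("y_t", "lag1", "lag2")

# p = 2: regress the current value on the two previous values
ar2_ls <- lm(y_t ~ lag1 + lag2, data = lagged)
coef(ar2_ls)   # slopes should land near the simulated 0.6 and 0.2
```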
Terminology: ARIMA model components (p, d, q)
Differencing (I = Integration)
Stationarity: mean and variance of data are constant over the time span
needed for accurate modeling
Can be verified by examining residuals
Differencing transforms non-stationary data to stationary
Differencing order (d) determined by model:
if d = 1: each obs. is replaced by its difference from the previous one (removes a linear trend)
if d = 2: the differences are differenced again (removes a quadratic trend)
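A small worked check of what differencing does, using made-up linear and quadratic trends (not course data) and base R's diff(): one difference turns a straight-line trend into a constant, and a second difference does the same for a quadratic trend.

```r
idx <- 1:10
linear    <- 5 + 3 * idx              # straight-line trend, slope 3
quadratic <- 2 + idx + 0.5 * idx^2    # curved (quadratic) trend

d1_linear    <- diff(linear, differences = 1)
d1_quadratic <- diff(quadratic, differences = 1)
d2_quadratic <- diff(quadratic, differences = 2)

d1_linear      # all 3: one difference removes the linear trend
d1_quadratic   # still increasing: one difference is not enough here
d2_quadratic   # all 1: a second difference removes the quadratic trend
```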
Terminology: ARIMA model components (p, d, q)
Moving Average (MA)
Moving average order (q): how many lagged terms are incorporated into each average.
Algorithm calculates the average for a specific number of lagged terms
Moving averages smooth out temporary instability in the data
If q = 1: moving average is the average of the current term with the one from the previous time period.
If q = 2: moving average is the average of the current term with the ones from the two previous time periods.
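In everyday usage a moving average is a rolling mean, which is the smoothing idea described above; the sketch below computes it with base R's stats::filter() on a short hypothetical series. Strictly speaking, the MA(q) term inside an ARIMA model is a weighted combination of the current and q most recent random errors (forecast errors), not a plain average of observed values, so treat this as intuition rather than as what auto.arima() fits.

```r
y <- c(10, 12, 11, 15, 14, 18)   # hypothetical values

# Slide's q = 1 idea: average of the current and 1 previous value
ma_2 <- stats::filter(y, filter = rep(1 / 2, 2), sides = 1)

# Slide's q = 2 idea: average of the current and 2 previous values
ma_3 <- stats::filter(y, filter = rep(1 / 3, 3), sides = 1)

ma_2   # first value is NA: no earlier value to average with
ma_3   # first two values are NA
```

Writing stats::filter() explicitly avoids a clash with dplyr::filter() when the tidyverse is loaded.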
Example 1: U.S. Population - 1950 to Present
U.S. Population - Interactive Plot
U.S. Population - Modeling Time Series Data
Population Trend Forecast
Create time series using population data
- Specify freq = 1: one observation per year
- Specify start = 1950: first year in dataset
Model data using auto.arima function
- Specify ic = "aic": aic is the information criterion used to determine the model.
- Specify seasonal = F: no seasonal (repeating) pattern in the data.
These commands will create and save the model:
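A minimal sketch of such commands, assuming the forecast package is installed. The course's population data are not reproduced here, so pop_values is a simulated stand-in (76 hypothetical annual values for 1950 to 2025), and the object names are illustrative. The sketch passes seasonal = FALSE, which is the argument name documented for forecast::auto.arima().

```r
library(forecast)

# Hypothetical stand-in for the population data (millions), NOT real values:
# 76 annual observations covering 1950-2025
set.seed(1950)
pop_values <- 150 + cumsum(2.5 + rnorm(76, sd = 0.3))

# frequency = 1: one observation per year (freq = 1 also works; R matches
# the abbreviation); start = 1950: first year in the data
pop_ts <- ts(pop_values, frequency = 1, start = 1950)

# auto.arima() searches over (p, d, q), choosing by AIC, with no seasonal terms
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)

pop_model               # printed summary names the chosen ARIMA(p,d,q)
arimaorder(pop_model)   # the (p, d, q) values as a named vector
```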
U.S. Population - Create and Plot Forecasts
Create forecasts (until 2041)
- h = 16 indicates we want to forecast 16 years
- Most recent year in our data is 2025
- 2041 - 2025 = 16
Forecasts become less accurate the further into the future they extend.
- Darker purple: 80% Prediction Interval Bounds
- Lighter purple: 95% Prediction Interval Bounds
Plot shows:
- Lags (p = 2), Differencing (d = 1), Moving Average (q = 1)
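A sketch of the forecasting step under the same assumptions (forecast package installed, simulated stand-in data, illustrative object names). It computes the horizon h from the years and requests both interval levels explicitly; because the data are simulated, the chosen model and the numbers will differ from the slide's.

```r
library(forecast)

# Simulated stand-in for the population series (hypothetical, NOT real data)
set.seed(1950)
pop_ts <- ts(150 + cumsum(2.5 + rnorm(76, sd = 0.3)),
             frequency = 1, start = 1950)
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)

# Horizon = target year minus most recent year in the data
h <- 2041 - 2025   # 16 years

# 80% and 95% prediction intervals (also forecast()'s default levels)
pop_fc <- forecast(pop_model, h = h, level = c(80, 95))

plot(pop_fc)   # point forecasts with shaded 80% and 95% bands
pop_fc         # columns: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95
```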
U.S. Population - Forecast Plot
U.S. Population - Examine Numerical Forecasts
- Point Forecast is the forecasted estimate for each future time period
- Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
- Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval
     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2026       345.4317 345.2705 345.5929 345.1851 345.6783
2027       347.3536 346.8411 347.8660 346.5699 348.1373
2028       349.3666 348.4173 350.3159 347.9148 350.8184
2029       351.4591 350.0280 352.8901 349.2705 353.6476
2030       353.6191 351.6845 355.5537 350.6604 356.5778
2031       355.8363 353.3899 358.2827 352.0949 359.5778
2032       358.1019 355.1441 361.0597 353.5783 362.6255
2033       360.4083 356.9447 363.8718 355.1112 365.7053
2034       362.7491 358.7891 366.7091 356.6927 368.8055
2035       365.1191 360.6739 369.5643 358.3207 371.9174
2036       367.5136 362.5959 372.4314 359.9926 375.0347
2037       369.9289 364.5519 375.3060 361.7055 378.1524
2038       372.3618 366.5390 378.1846 363.4566 381.2670
2039       374.8095 368.5544 381.0646 365.2431 384.3759
2040       377.2697 370.5955 383.9439 367.0623 387.4770
2041       379.7405 372.6599 386.8210 368.9117 390.5692
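The interval columns are built from the point forecast and its forecast standard error. Assuming normally distributed errors (the basis of the default ARIMA intervals), each bound is the point forecast plus or minus a standard normal quantile times the standard error. The point forecast and standard error below are hypothetical values chosen only to show the arithmetic.

```r
# Hypothetical point forecast and forecast standard error (illustrative only)
point <- 100
se    <- 5

z80 <- qnorm(0.90)    # about 1.28: leaves 10% in each tail
z95 <- qnorm(0.975)   # about 1.96: leaves 2.5% in each tail

bounds <- c(
  "Lo 80" = point - z80 * se, "Hi 80" = point + z80 * se,
  "Lo 95" = point - z95 * se, "Hi 95" = point + z95 * se
)
round(bounds, 2)   # Lo 80 93.59, Hi 80 106.41, Lo 95 90.20, Hi 95 109.80
```

The 95% interval is wider than the 80% interval because it has to cover the future value with higher probability.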
Lecture 25 In-class Exercises - Q2
Based on the U.S. Population forecast output, we are 95% certain that the U.S. population in 2030 will be less than ______ million people.
Round to the closest million (whole number), e.g., 345 million.
U.S. Population - Examine Residuals and Model Fit
Top Plot: No spikes should be too large
- One obs. should be checked.
ACF: auto-correlation function.
- Ideally, most values fall within the dashed lines
Histogram: Distribution of residuals should be approx. normal
- One large low outlier
Assessment: Trend is very smooth so small aberrations are exaggerated in residuals.
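The three panels described above correspond to what forecast::checkresiduals() draws for a fitted model, together with a Ljung-Box test. The sketch below reproduces the same checks with base R only, on a simulated series and a hand-picked ARIMA(1,1,0) fit (not the course model).

```r
set.seed(7)
# Simulated series and a fitted ARIMA(1,1,0) model (NOT the course model)
y   <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.4), n = 150)
fit <- arima(y, order = c(1, 1, 0))
res <- residuals(fit)

# Same three views as the slide: time plot, ACF, histogram
op <- par(mfrow = c(3, 1))
plot(res, main = "Residuals over time")
acf(res, main = "ACF of residuals")   # dashed lines: approx. 95% bounds
hist(res, main = "Histogram of residuals")
par(op)

# Ljung-Box test: a large p-value means no strong evidence of leftover
# auto-correlation in the residuals (fitdf = p + q = 1 here)
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value
```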
U.S. Population - Examine Residuals and Model Fit
                      ME      RMSE        MAE        MPE       MAPE       MASE
Training set 0.003349508 0.1215547 0.08631584 0.00303556 0.03801874 0.03314369
                    ACF1
Training set 0.009871574
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
- (100 – MAPE) = Percent accuracy of model.
Despite the outlier and one relatively large ACF value, our population model is estimated to be 99.96% accurate.
This doesn’t guarantee that forecasts will be almost 100% accurate, but it does improve our chances of accurate forecasting.
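A sketch of the MAPE calculation in base R, with hypothetical actual and fitted values. forecast::accuracy() reports MAPE in its output table; here it is written out by hand so the formula is visible. Note that MAPE is undefined when any actual value is 0.

```r
# MAPE: mean of |actual - fitted| / |actual|, expressed as a percent
mape <- function(actual, fitted) {
  mean(abs((actual - fitted) / actual)) * 100
}

actual <- c(100, 200, 400)   # hypothetical observed values
fitted <- c(90, 210, 400)    # hypothetical model fits

m <- mape(actual, fitted)    # (10% + 5% + 0%) / 3 = 5
accuracy_pct <- 100 - m      # BUA 345 convention: 95% "accurate"

# The population model's reported MAPE converted the same way
round(100 - 0.03801874, 2)   # 99.96
```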
Example 2: Netflix Stock Prices
Data from Yahoo Finance
Netflix Stock
- Was mostly trending upward, but had a downturn and then another recent upturn.
- Data shown are daily adjusted closing values
- For analysis, we will use monthly adjusted closing values (first trading day of each month)
Netflix Stock
Forecast Questions:
What will the estimated stock price be in April 2027?
What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
How valid is our model?
- Check residual plots.
How accurate are our estimates?
- Examine Prediction Intervals and Prediction Bands
- Check fit statistics
Netflix Stock - Modeling Time Series Data
Stock Trend Forecast
Create time series using Netflix Stock data
- Specify freq = 12: 12 observations per year
- Specify start = c(2010, 1): first obs. in dataset is January 2010
Model data using auto.arima function
- Specify ic = "aic": aic is the information criterion used to determine the model.
- Specify seasonal = F: no seasonal (repeating) pattern in the data.
This code will create and save the model:
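A sketch of the monthly time series setup, using a simulated stand-in for the price data (hypothetical values, not Netflix prices). Monthly data from January 2010 through April 2026 contain 16 * 12 + 4 = 196 observations; start() and end() confirm how R labels them.

```r
# Hypothetical stand-in for monthly adjusted closes (NOT real Netflix prices):
# January 2010 through April 2026 = 16 * 12 + 4 = 196 months
set.seed(2010)
n_months <- 16 * 12 + 4
prices <- 10 * exp(cumsum(rnorm(n_months, mean = 0.015, sd = 0.08)))

# frequency = 12: monthly data; start = c(2010, 1): January 2010
nflx_ts <- ts(prices, frequency = 12, start = c(2010, 1))

start(nflx_ts)   # c(2010, 1): January 2010
end(nflx_ts)     # c(2026, 4): April 2026
```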
Netflix Stock - Create and Plot Forecasts
Create forecasts (until April 2027)
- h = 12 indicates we want to forecast 12 months
- Most recent date in our data is April 1, 2026
- 12 months until April 1, 2027
Forecasts become less accurate the further into the future they extend.
- Darker purple: 80% Prediction Interval Bounds
- Lighter purple: 95% Prediction Interval Bounds
Plot shows:
- Lags (p = 2), Differencing (d = 1), Moving Average (q = 2)
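A sketch of the 12-month forecast under the same assumptions (forecast package installed, simulated stand-in series ending April 2026, illustrative names). It confirms that the last forecast month is April 2027 and computes the width of the 95% interval at each horizon; for ARIMA models these widths never shrink as the horizon grows, which is the sense in which distant forecasts are less precise.

```r
library(forecast)

# Simulated stand-in for monthly prices, Jan 2010 to Apr 2026 (NOT real data)
set.seed(2010)
prices  <- 10 * exp(cumsum(rnorm(196, mean = 0.015, sd = 0.08)))
nflx_ts <- ts(prices, frequency = 12, start = c(2010, 1))

nflx_model <- auto.arima(nflx_ts, ic = "aic", seasonal = FALSE)

# h = 12 monthly steps: May 2026 through April 2027
nflx_fc <- forecast(nflx_model, h = 12)
end(nflx_fc$mean)   # c(2027, 4), i.e. April 2027

# Width of the 95% prediction interval at each forecast month
width95 <- nflx_fc$upper[, "95%"] - nflx_fc$lower[, "95%"]
round(width95, 2)
```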
Netflix Stock - Forecast Plot
Netflix Stock - Examine Numerical Forecasts
- Point Forecast is the forecasted estimate for each future time period
- Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
- Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
May 2026       96.28770 90.87356 101.7018 88.00749 104.5679
Jun 2026       97.88642 90.43508 105.3378 86.49058 109.2823
Jul 2026       99.36907 90.06138 108.6768 85.13418 113.6040
Aug 2026      100.03050 88.78655 111.2745 82.83436 117.2266
Sep 2026       99.90374 86.80360 113.0039 79.86881 119.9387
Oct 2026       99.59476 84.93684 114.2527 77.17741 122.0121
Nov 2026       99.72772 83.85743 115.5980 75.45620 123.9992
Dec 2026      100.48277 83.62884 117.3367 74.70690 126.2586
Jan 2027      101.54850 83.77874 119.3183 74.37199 128.7250
Feb 2027      102.44464 83.71553 121.1737 73.80094 131.0883
Mar 2027      102.90728 83.15997 122.6546 72.70638 133.1082
Apr 2027      103.04149 82.28462 123.7984 71.29660 134.7864
Lecture 25 In-class Exercises - Q3
Interpretation of Netflix Prediction Intervals
In February 2027, the Netflix stock price is forecasted to be approximately $102. However, the 95% prediction interval indicates it may be as low as ____.
- Round your answer to the closest whole dollar.
Netflix Stock - Examine Residuals and Model Fit
Top Plot: Spikes get larger over time
ACF: auto-correlation function.
- Ideally, all or most values are within the dashed lines
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile, so this fit is sufficient.
Netflix Stock - Examine Residuals and Model Fit
                       ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.0007043361 4.159505 2.555861 -6.764627 13.76645 0.2189833
                  ACF1
Training set 0.0568807
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
- 100 – MAPE = Percent accuracy of model.
Despite increasing volatility, our stock price model is estimated to be ____% accurate. (Next question)
This doesn’t guarantee that forecasts will be this accurate, but it does improve our chances of accurate forecasting.
Lecture 25 In-class Exercises - Q4
Based on the Netflix model's Mean Absolute Percent Error (MAPE), what is the percent accuracy of our forecast model?
Report your answer as a percentage with two decimal places.
Key Points from Today
forecast package in R simplifies forecasting.
Extrapolation OK in this case
- Report uncertainty as prediction bounds
You should know terminology and how to read and interpret output.
You will be given data, R code, and output
You will answer questions based on provided output.
HW 10 includes material from Lectures 24-26
HW 9 is due on 4/15.
To submit an Engagement Question or Comment about material from Lecture 25: Submit it by midnight today (day of lecture).