BUA 345 - Lecture 25

Introduction to Forecasting

Penelope Pooler Eisenbies

2026-04-14

Housekeeping

Upcoming Dates

HW 9 is due on Wednesday, 4/15.
- Grace Period ends Thursday (4/16) at 10:00 PM.
HW 10 is now posted and is due on Monday 4/27.
OPTIONAL GitHub Quarto Dashboard Workshop on Fri., 4/17, at 3:15 PM.
- If you are interested in attending, please sign up using the Google Form.
- Space is limited on Friday so that I can help individual students troubleshoot this process.
Additional Practice Questions will be posted next week.
Course Review on 4/23.

Today’s plan 📋

Introduction to Forecasting

Cross-Sectional Data vs. Time Series Data
Basic Forecasting Terminology
Forecasting Trends without Seasonality in R
- Example 1 - US Population
- Example 2 - Netflix Stock Prices
NEW PACKAGE FOR FORECASTING: forecast
- Part of HW 10 pertains to today’s lecture.
- Demo videos for HW 10 will be posted this weekend.

💥 Lecture 25 In-class Exercises - Q1 💥

Poll Everywhere - My User Name: penelopepoolereisenbies685

Review Question from Linear Regression Modeling:

We have 2025 data for annual salaries of 75 Upstate NY residents ranging from $50K to $150K and we use that data to model how much someone spends on their first house.

Is it valid to apply that model to someone with an annual salary of $350K?

Yes, this is extrapolation and it is valid.
No, this is extrapolation and it is invalid.
Yes, this is interpolation and it is valid.
No, this is interpolation and it is invalid.

Cross-Sectional Data

Shows a Snapshot of One Time Period

Time Series Data

Shows Trend over Time

U.S. Population - Cross-Sectional Data

Population by County in 2019

U.S. Population - Time Series Data

U.S. Population 1950 - 2025

Time Series Terminology

In time series data, new observations are often correlated with prior observations

This is referred to as auto-correlation
- A variable is correlated with itself
- When data are auto-correlated, we use that information
- This process is called auto-regression
  - Using previous observations to predict into the future.
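
To make auto-correlation concrete, the sketch below correlates a short series with its own previous values (lag 1). The values in `x` are made up for illustration and are not from the lecture data.

```r
# Hypothetical yearly values (illustration only, not lecture data)
x <- c(10, 12, 13, 15, 18, 20, 23, 25, 28, 30)

# Pair each observation with the one before it
previous <- head(x, -1)   # observations 1 through 9
current  <- tail(x, -1)   # observations 2 through 10

# Lag-1 auto-correlation: correlation of the series with its own past
lag1_cor <- cor(current, previous)
lag1_cor
```

R's `acf()` function computes a similar quantity for many lags at once, using a slightly different formula.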

Introductory Time Series R Functions

R function: auto.arima function in forecast package
- ARIMA is an acronym:
  - AR: auto-regressive
  - I: integrated
  - MA: moving average
In ARIMA models, all three components are optimized to provide a reliable forecast.

Terminology: ARIMA model components (p, d, q)

Auto-Regressive Models (AR)

Similar to a simple linear regression model or non-linear regression model
Key difference: The regressor or predictor variable (X) is the dependent variable (Y) itself at a specific LAG
Lag (p) is how many previous time periods the model looks back to estimate the next time period.
- If p = 1, the model estimates the next time period based on most recent one.
  - Looks back one time period
- If p = 2, the model estimates the next time period based on the two most recent ones.
  - Looks back two time periods
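
The idea of regressing a series on its own lags can be sketched with ordinary `lm()`. This is only an illustration on simulated data: the series `y` and the names `y_now`, `y_lag1`, and `y_lag2` are assumptions for this example, and `auto.arima` estimates its coefficients by a different method.

```r
set.seed(345)  # arbitrary seed so the simulation is repeatable

# Simulated series where each value depends on the previous one (illustration only)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# embed() lines up each observation with its lags:
# column 1 = y at time t, column 2 = y at t-1, column 3 = y at t-2
lags <- embed(y, 3)
y_now  <- lags[, 1]
y_lag1 <- lags[, 2]
y_lag2 <- lags[, 3]

ar1_fit <- lm(y_now ~ y_lag1)            # p = 1: one lag as predictor
ar2_fit <- lm(y_now ~ y_lag1 + y_lag2)   # p = 2: two lags as predictors

coef(ar1_fit)
coef(ar2_fit)
```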

Terminology: ARIMA model components (p, d, q)

Differencing (I = Integration)

Stationarity: mean and variance of data are consistent over timespan
- needed for accurate modeling
- Can be verified by examining residuals
Differencing transforms non-stationary data to stationary
Differencing order (d) determined by model:
- if d = 1: each obs. is replaced by its difference from the previous one (removes a linear trend)
- if d = 2: the differences are differenced again (removes a quadratic trend)
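
A small sketch of differencing with base R's `diff()`, using two made-up series (`linear_trend` and `quadratic_trend` are illustrative names, not lecture data). One round of differencing flattens a linear trend; a quadratic trend needs two.

```r
linear_trend    <- c(2, 5, 8, 11, 14)   # increases by 3 each period
quadratic_trend <- (1:6)^2              # 1, 4, 9, 16, 25, 36

d1_linear    <- diff(linear_trend)                      # d = 1
d1_quadratic <- diff(quadratic_trend)                   # d = 1: still trending
d2_quadratic <- diff(quadratic_trend, differences = 2)  # d = 2

d1_linear      # 3 3 3 3
d1_quadratic   # 3 5 7 9 11
d2_quadratic   # 2 2 2 2
```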

Terminology: ARIMA model components (p, d, q)

Moving Average (MA)

Moving average (q): how many lagged forecast errors are incorporated into each estimate.
Algorithm adjusts each estimate using a weighted combination of a specific number of lagged errors
The moving average component smooths out temporary instability in the data
- If q = 1: estimate is adjusted using the error from the previous time period.
- If q = 2: estimate is adjusted using the errors from the two previous time periods.

Example 1: U.S. Population - 1950 to Present

Forecast Questions:
- What will the U.S. Population be in 2041?
- What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
- How valid is our model?
  - Check residual plots.
- How accurate are our estimates?
  - Examine Prediction Intervals and Prediction Bands
  - Check fit statistics

U.S. Population - Interactive Plot

U.S. Population - Modeling Time Series Data

Population Trend Forecast

Create time series using population data
- Specify freq = 1 - one observation per year
- Specify start = 1950 - first year in dataset
Model data using auto.arima function
- Specify ic = "aic" - AIC is the information criterion used to select the model.
- Specify seasonal = FALSE - no seasonal (repeating) pattern in the data.
These commands will create and save the model:

library(forecast)                                           # provides auto.arima and forecast
pop_ts <- ts(uspop$popM, freq=1, start=1950)                # create time series
pop_model <- auto.arima(pop_ts, ic="aic", seasonal=FALSE)   # model data using auto.arima
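
To see what the frequency and start arguments do, here is a minimal sketch with a handful of made-up values (`demo_values` and `demo_ts` are hypothetical and not the lecture dataset). It only uses base R.

```r
# Hypothetical population values in millions (illustration only)
demo_values <- c(151.3, 153.9, 156.4, 159.0, 161.7)

# frequency = 1: one observation per year; start = 1950: first value belongs to 1950
demo_ts <- ts(demo_values, frequency = 1, start = 1950)

time(demo_ts)     # years attached to each value: 1950 through 1954
end(demo_ts)[1]   # last year in the series
```

The last year in the series is the reference point for deciding how many periods to forecast.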

U.S. Population - Create and Plot Forecasts

Create forecasts (until 2041)

h = 16 indicates we want to forecast 16 years
Most recent year in our data is 2025
- 2041 - 2025 = 16
Forecasts become less accurate the further into the future you forecast.

pop_forecast <- forecast(pop_model, h=16) # create forecasts (until 2041)
uspop_pred_plot <- autoplot(pop_forecast) + 
  labs(y = "U.S. Population (Millions)") +
  theme_classic()

Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
Plot shows:
- Lags (p = 2), Differencing (d = 1), Moving Average (q = 1)
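
The shaded bands can be understood as the point forecast plus or minus a multiple of the forecast standard error. The sketch below shows that arithmetic under stated assumptions: it uses simulated data (`sim_ts`), swaps in base R's `arima()` with a fixed order in place of `auto.arima`, and assumes normally distributed errors. It is not the lecture's population model.

```r
set.seed(345)  # arbitrary seed so the simulation is repeatable

# Simulated annual series, 1950 through 2025 (illustration only)
sim_ts <- ts(cumsum(rnorm(76)), frequency = 1, start = 1950)

# Base R arima() with a fixed order, standing in for auto.arima()
sim_model <- arima(sim_ts, order = c(0, 1, 1))

# Point forecasts and their standard errors for the next 16 years
sim_pred <- predict(sim_model, n.ahead = 16)

# Interval = point forecast +/- normal quantile * standard error
lo95 <- sim_pred$pred - qnorm(0.975) * sim_pred$se
hi95 <- sim_pred$pred + qnorm(0.975) * sim_pred$se
lo80 <- sim_pred$pred - qnorm(0.90) * sim_pred$se
hi80 <- sim_pred$pred + qnorm(0.90) * sim_pred$se

# Band width grows with the forecast horizon
round(as.numeric(hi95 - lo95), 2)
```

The standard errors increase with each additional year, which is why the bands fan out toward 2041.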

U.S. Population - Forecast Plot

U.S. Population - Examine Numerical Forecasts

Point Forecast is the forecasted estimate for each future time period
Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval

     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2026       345.4317 345.2705 345.5929 345.1851 345.6783
2027       347.3536 346.8411 347.8660 346.5699 348.1373
2028       349.3666 348.4173 350.3159 347.9148 350.8184
2029       351.4591 350.0280 352.8901 349.2705 353.6476
2030       353.6191 351.6845 355.5537 350.6604 356.5778
2031       355.8363 353.3899 358.2827 352.0949 359.5778
2032       358.1019 355.1441 361.0597 353.5783 362.6255
2033       360.4083 356.9447 363.8718 355.1112 365.7053
2034       362.7491 358.7891 366.7091 356.6927 368.8055
2035       365.1191 360.6739 369.5643 358.3207 371.9174
2036       367.5136 362.5959 372.4314 359.9926 375.0347
2037       369.9289 364.5519 375.3060 361.7055 378.1524
2038       372.3618 366.5390 378.1846 363.4566 381.2670
2039       374.8095 368.5544 381.0646 365.2431 384.3759
2040       377.2697 370.5955 383.9439 367.0623 387.4770
2041       379.7405 372.6599 386.8210 368.9117 390.5692
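
As a quick check on how uncertainty grows, the sketch below compares interval widths for the first and last forecast years, using values copied from the table. The variable names are illustrative. It also compares the 95% and 80% widths against the ratio expected under a normal approximation, which is an assumption about how the bounds were computed.

```r
# Values copied from the forecast table above
lo80_2026 <- 345.2705; hi80_2026 <- 345.5929
lo95_2026 <- 345.1851; hi95_2026 <- 345.6783
lo95_2041 <- 368.9117; hi95_2041 <- 390.5692

width95_2026 <- hi95_2026 - lo95_2026   # about 0.49 million
width95_2041 <- hi95_2041 - lo95_2041   # about 21.66 million

# Under a normal approximation, the 95% band is wider than the 80% band
# by the ratio of the normal quantiles (roughly 1.53)
observed_ratio <- (hi95_2026 - lo95_2026) / (hi80_2026 - lo80_2026)
expected_ratio <- qnorm(0.975) / qnorm(0.90)

c(width95_2026, width95_2041)
c(observed_ratio, expected_ratio)
```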

💥 Lecture 25 In-class Exercises - Q2 💥

Based on the U.S. Population forecast output, we are 95% certain that the U.S. population in 2030 will be less than ______ million people?

Round to closest million (whole number), e.g. 345 million.

U.S. Population - Examine Residuals and Model Fit

Top Plot: No spikes should be too large
- One obs. should be checked.
ACF: auto-correlation function.
- Ideally, most values fall within lines
Histogram: Distribution of residuals should be approx. normal
- One large low outlier
Assessment: Trend is very smooth so small aberrations are exaggerated in residuals.


    Ljung-Box test

data:  Residuals from ARIMA(2,1,1) with drift
Q* = 3.4469, df = 7, p-value = 0.8408

Model df: 3.   Total lags used: 10
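
The numbers in the Ljung-Box output fit together as follows. This sketch recomputes the degrees of freedom and the p-value from the printed values, on the assumption that the test compares Q* to a chi-squared distribution with (total lags minus model df) degrees of freedom. The variable names are illustrative.

```r
q_star     <- 3.4469   # test statistic from the output
total_lags <- 10       # "Total lags used"
model_df   <- 3        # p + q = 2 + 1 estimated coefficients

test_df <- total_lags - model_df                             # 7, matches "df = 7"
p_value <- pchisq(q_star, df = test_df, lower.tail = FALSE)  # upper-tail probability
round(p_value, 4)
```

The null hypothesis of this test is that the residuals are not auto-correlated, so a large p-value gives no evidence of leftover auto-correlation.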

U.S. Population - Examine Residuals and Model Fit

                      ME      RMSE        MAE        MPE       MAPE       MASE
Training set 0.003349508 0.1215547 0.08631584 0.00303556 0.03801874 0.03314369
                    ACF1
Training set 0.009871574

Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
- (100 – MAPE) = Percent accuracy of model.
Despite outlier and one relatively large ACF value, our population model is estimated to be 99.96% accurate.
This doesn’t guarantee that forecasts will be almost 100% accurate but it does improve our chances of accurate forecasting.
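
A minimal sketch of the MAPE calculation and the percent-accuracy conversion. The `mape` helper and the toy values are written for this illustration; the population MAPE is copied from the output above.

```r
# MAPE: average of |actual - fitted| / |actual|, expressed as a percentage
mape <- function(actual, fitted) {
  mean(abs((actual - fitted) / actual)) * 100
}

# Hypothetical actual and fitted values (illustration only)
toy_mape <- mape(actual = c(100, 200), fitted = c(90, 210))   # errors of 10% and 5%

# Percent accuracy for the population model, using MAPE from the output
pop_mape     <- 0.03801874
pop_accuracy <- round(100 - pop_mape, 2)
pop_accuracy
```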

Example 2: Netflix Stock Prices

Data from Yahoo Finance

Netflix Stock

Was mostly trending upward, but had a downturn and then another recent upturn.
Data shown are daily adjusted closing value
For analysis, we will use the monthly adjusted close (1st day of trading for each month)

Netflix Stock

Forecast Questions:
- What will the estimated stock price be in April of 2027?
- What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
- How valid is our model?
  - Check residual plots.
- How accurate are our estimates?
  - Examine Prediction Intervals and Prediction Bands
  - Check fit statistics

Netflix Stock - Modeling Time Series Data

Stock Trend Forecast

Create time series using Netflix Stock data
- Specify freq = 12 - 12 observations per year
- Specify start = c(2010, 1) - first obs. in dataset is January 2010
Model data using auto.arima function
- Specify ic = "aic" - AIC is the information criterion used to select the model.
- Specify seasonal = FALSE - no seasonal (repeating) pattern in the data.
This code will create and save the model:

nflx_ts <- ts(nflx$Adjusted, freq=12, start=c(2010,1))       # create time series
nflx_model <- auto.arima(nflx_ts, ic="aic", seasonal=FALSE)  # model data using auto.arima
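
For monthly data, the start argument takes a year and a month. The sketch below uses a few made-up prices (`demo_prices` and `demo_monthly` are hypothetical, not the Netflix data) to show how base R labels each observation and how a date range can be pulled out.

```r
# Hypothetical monthly prices (illustration only, not the Netflix data)
demo_prices <- c(7.6, 8.9, 10.5, 14.1, 15.9, 15.5, 14.6, 17.9)

# frequency = 12: twelve observations per year; start = c(2010, 1): January 2010
demo_monthly <- ts(demo_prices, frequency = 12, start = c(2010, 1))

end(demo_monthly)   # year and month of the last value

# Extract March through May 2010
spring_2010 <- window(demo_monthly, start = c(2010, 3), end = c(2010, 5))
spring_2010
```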

Netflix Stock - Create and Plot Forecasts

Create forecasts (until April 2027)
- h = 12 indicates we want to forecast 12 months
- Most recent date in our data is April 1, 2026
- 12 Months until April 1, 2027
Forecasts become less accurate the further into the future you forecast.

nflx_forecast <- forecast(nflx_model, h=12) # create forecasts (until April 2027)
nflx_pred_plot <- autoplot(nflx_forecast) + labs(y = "Adjusted Closing Price") +
  theme_classic()

Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
Plot shows:
- Lags (p = 2), Differencing (d = 1), Moving Average (q = 2)

Netflix Stock - Forecast Plot

Netflix Stock - Examine Numerical Forecasts

Point Forecast is the forecasted estimate for each future time period
Lo 80 and Hi 80 are lower and upper bounds for the 80% prediction interval
Lo 95 and Hi 95 are lower and upper bounds for the 95% prediction interval

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
May 2026       96.28770 90.87356 101.7018 88.00749 104.5679
Jun 2026       97.88642 90.43508 105.3378 86.49058 109.2823
Jul 2026       99.36907 90.06138 108.6768 85.13418 113.6040
Aug 2026      100.03050 88.78655 111.2745 82.83436 117.2266
Sep 2026       99.90374 86.80360 113.0039 79.86881 119.9387
Oct 2026       99.59476 84.93684 114.2527 77.17741 122.0121
Nov 2026       99.72772 83.85743 115.5980 75.45620 123.9992
Dec 2026      100.48277 83.62884 117.3367 74.70690 126.2586
Jan 2027      101.54850 83.77874 119.3183 74.37199 128.7250
Feb 2027      102.44464 83.71553 121.1737 73.80094 131.0883
Mar 2027      102.90728 83.15997 122.6546 72.70638 133.1082
Apr 2027      103.04149 82.28462 123.7984 71.29660 134.7864

💥 Lecture 25 In-class Exercises - Q3 💥

Interpretation of Netflix Prediction Intervals

In February of 2027, the Netflix stock price is forecasted to be approximately $102. However, the 95% prediction interval indicates it may be as low as ____.

Round your answer to the closest whole dollar.

Netflix Stock - Examine Residuals and Model Fit

Top Plot: Spikes get larger over time
ACF: auto-correlation function.
- Ideally, all or most values are within the dashed lines
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile, so this fit is sufficient.


    Ljung-Box test

data:  Residuals from ARIMA(2,1,2) with drift
Q* = 29.59, df = 20, p-value = 0.07677

Model df: 4.   Total lags used: 24

Netflix Stock - Examine Residuals and Model Fit

                       ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.0007043361 4.159505 2.555861 -6.764627 13.76645 0.2189833
                  ACF1
Training set 0.0568807

Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
- 100 – MAPE = Percent accuracy of model.
Despite increasing volatility, our stock price model is estimated to be ____% accurate. (Next question)
This doesn’t guarantee that forecasts will be this accurate but it does improve our chances of accurate forecasting.

💥 Lecture 25 In-class Exercises - Q4 💥

Based on the Netflix Model Mean Absolute Percent Error (MAPE), what is the percent accuracy of our forecast model?

Answer is reported as a percentage with two decimal places.

Key Points from Today

forecast package in R simplifies forecasting.
Extrapolation OK in this case
- Report uncertainty as prediction bounds
You should know terminology and how to read and interpret output.
- You will be given data, R code, and output
- You will answer questions based on provided output.
HW 10 includes material from Lectures 24-26
HW 9 is due on 4/15.

To submit an Engagement Question or Comment about material from Lecture 25: Submit it by midnight today (day of lecture).