2024-04-10
HW 9 is due on Monday, 4/15.
HW 10 will be posted on 4/16 and is due Monday, 4/22.
Lecture 26 on Thu. 4/18 is Optional.
No Lecture on 4/23
Course Review on 4/25
NEW PACKAGE FOR FORECASTING: forecast
Session ID: bua345s24
We have data for 2022 annual salaries of 75 Upstate NY residents ranging from $50K to $150K, and we use that data to model how much someone spends on their first house.
Is it valid to apply that model to someone with an annual salary of $350K?
A. Yes, this is extrapolation and it is valid.
B. No, this is extrapolation and it is invalid.
C. Yes, this is interpolation and it is valid.
D. No, this is interpolation and it is invalid.
Today’s Topics:
Cross-Sectional Data vs. Time Series Data
Basic Forecasting Terminology
Forecasting Trends without Seasonality in R
Example 1 - US Population
Example 2 - Netflix Stock Prices
Cross-Sectional Data: Shows a Snapshot of One Time Period
Example: Population by County in 2019
Time Series Data: Shows Trend over Time
In time series data, new observations are often correlated with prior observations
This is referred to as auto-correlation
A variable is correlated with itself
When data are auto-correlated, we use that information
This process is called auto-regression
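As a rough illustration of auto-correlation, the sketch below correlates a series with a copy of itself shifted back one time period. It uses R's built-in LakeHuron series (annual lake levels) as a stand-in, not the course data; the variable names are illustrative.

```r
# Stand-in data: built-in annual series, NOT the course dataset.
y <- as.numeric(LakeHuron)
n <- length(y)

# Lag-1 auto-correlation by hand: each value paired with the value
# from the previous time period.
lag1_cor <- cor(y[-1], y[-n])
lag1_cor

# Base R's acf() reports the auto-correlation at each lag.
# Its lag-1 value is close to, but not identical to, cor() above
# because acf() uses the overall mean and variance of the series.
acf_values <- acf(LakeHuron, lag.max = 5, plot = FALSE)
acf_values
```

A lag-1 value near 1 means each observation closely tracks the one before it, which is the information an auto-regressive model uses.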
R function: auto.arima function in forecast package
ARIMA is an acronym:
AR: auto-regressive
I: integrated
MA: moving average
In ARIMA models, all three components are optimized to provide a reliable forecast.
Auto-Regressive Models (AR)
Similar to a simple linear regression model or non-linear regression model
Key difference: The regressor or predictor variable (X) is the dependent variable (Y) itself at a specific LAG
Lag (p) is how many previous time periods the model looks back to estimate the next time period.
If p = 1, the model estimates the next time period based on the most recent one.
If p = 2, the model estimates the next time period based on the two most recent ones.
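An auto-regressive model with p = 2 can be sketched as an ordinary regression of each value on the two values before it. The example below uses lm() on R's built-in LakeHuron series purely for illustration; auto.arima estimates its coefficients differently, so the numbers will not match exactly. Column names are hypothetical.

```r
# Stand-in data: built-in annual series, NOT the course dataset.
y <- as.numeric(LakeHuron)
n <- length(y)

# Line up each observation with its two previous values (lags 1 and 2).
lagged <- data.frame(
  y_now  = y[3:n],        # value being estimated
  y_lag1 = y[2:(n - 1)],  # one period back
  y_lag2 = y[1:(n - 2)]   # two periods back
)

# Regress the current value on its own lagged values (p = 2).
ar2_fit <- lm(y_now ~ y_lag1 + y_lag2, data = lagged)
coef(ar2_fit)
```

Two observations are lost at the start of the series because they have no complete set of lagged values.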
Forecast Questions:
What will the U.S. Population be in 2040?
What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
How valid is our model?
How accurate are our estimates?
Examine Prediction Intervals and Prediction Bands
Check fit statistics
Differencing (I = Integration)
Stationarity: mean and variance of data are consistent over the timespan (needed for accurate modeling)
Can be verified by examining residuals
Differencing transforms non-stationary data to stationary
Differencing order (d) determined by model:
if d = 1: each obs. is difference from previous one (linear)
if d = 2: each obs. is difference of difference from previous one (quadratic)
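The effect of differencing can be checked on small made-up series with base R's diff(): one round of differencing flattens a linear trend, and two rounds flatten a quadratic trend. The numbers below are illustrative only.

```r
# Linear trend: goes up by 3 each period.
linear_trend <- c(2, 5, 8, 11, 14)
d1_linear <- diff(linear_trend)      # 3 3 3 3 (constant after d = 1)

# Quadratic trend: 1 4 9 16 25 36.
quadratic_trend <- (1:6)^2
d1_quad <- diff(quadratic_trend)                   # 3 5 7 9 11 (still trending)
d2_quad <- diff(quadratic_trend, differences = 2)  # 2 2 2 2 (constant after d = 2)

d1_linear
d1_quad
d2_quad
```

Each round of differencing shortens the series by one observation.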
Moving Average (MA)
Moving average (q): how many terms are incorporated into each average within the data.
Algorithm calculates the average for a specific number of lagged terms
Moving averages smooth out temporary instability in the data
If q = 1: moving average is the average of the current term with the one from the previous time period.
If q = 2: moving average is the average of the current term with the ones from the two previous time periods.
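The smoothing idea described above can be sketched with base R's stats::filter() on a made-up zig-zag series. This only illustrates averaging the current term with the previous one; in a fitted ARIMA model the MA component is applied to past forecast errors rather than to the raw observations.

```r
# Made-up series that alternates between 10 and 20.
x <- c(10, 20, 10, 20, 10, 20)

# Average of the current term and the one from the previous period.
# sides = 1 uses only current and past values; the first result is NA
# because the first observation has no previous term.
smoothed <- stats::filter(x, rep(1 / 2, 2), sides = 1)
as.numeric(smoothed)  # NA 15 15 15 15 15
```

The alternating values are replaced by a steady 15, which is the sense in which averaging smooths out temporary instability.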
Population Trend Forecast
Create time series using population data
Specify freq = 1 - one observation per year
Specify start = 1950 - first year in dataset
Model data using auto.arima function
Specify ic = "aic" - aic is the information criterion used to determine the model.
Specify seasonal = F - no seasonal (repeating) pattern in the data.
These commands will create and save the model:
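A minimal sketch of what such commands might look like is shown here. The data are placeholder values generated for illustration, not the course population dataset, and the object names (pop_ts, pop_model) are assumptions.

```r
library(forecast)  # provides auto.arima() and arimaorder()

# PLACEHOLDER data (NOT the course dataset): 74 made-up yearly values
# covering 1950 through 2023, trending upward with small random changes.
set.seed(345)
placeholder_values <- 100 + cumsum(rnorm(74, mean = 1))

# One observation per year, first year 1950.
# (freq = 1 also works because R matches the abbreviated argument name.)
pop_ts <- ts(placeholder_values, frequency = 1, start = 1950)

# Let auto.arima() choose p, d, and q using AIC, with no seasonal terms.
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)

pop_model              # prints the chosen model and coefficients
arimaorder(pop_model)  # the chosen p, d, q as a vector
```

With the real data, the printed model is where the chosen (p,d,q) values are read from.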
Create forecasts (until 2040)
h = 17 indicates we want to forecast 17 years
Most recent year in our data is 2023
Forecasts become less accurate the further into the future you specify.
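The forecasting step might be sketched as follows, again on placeholder data rather than the course dataset. It shows where h = 17 comes from and how the point forecasts and interval bounds can be pulled out of the forecast object; object names are illustrative.

```r
library(forecast)  # provides auto.arima() and forecast()

# PLACEHOLDER yearly series, 1950 through 2023 (NOT the course data).
set.seed(345)
pop_ts <- ts(100 + cumsum(rnorm(74, mean = 1)), frequency = 1, start = 1950)
pop_model <- auto.arima(pop_ts, ic = "aic", seasonal = FALSE)

# Number of years to forecast: target year minus last year in the data.
last_year <- end(pop_ts)[1]  # 2023
h <- 2040 - last_year        # 17

pop_forecast <- forecast(pop_model, h = h)
pop_forecast  # table: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95

# Individual pieces. By default the levels are 80 and 95, so column 2
# of $lower and $upper holds the 95% bounds.
point_2030 <- window(pop_forecast$mean, start = 2030, end = 2030)
hi95_2030  <- window(pop_forecast$upper[, 2], start = 2030, end = 2030)
point_2030
hi95_2030
```

The gap between the Lo and Hi columns widens with each additional year, which is how the output shows forecasts becoming less certain further into the future.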
Auto-Regressive (p = 2), Differencing (d = 1), Moving Average (q = 1)
Year Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2024 341.9140 341.7513 342.0767 341.6651 342.1628
2025 343.9590 343.4409 344.4770 343.1666 344.7513
2026 346.0985 345.1394 347.0577 344.6316 347.5654
2027 348.3135 346.8715 349.7556 346.1081 350.5189
2028 350.5901 348.6480 352.5323 347.6198 353.5605
2029 352.9173 350.4718 355.3628 349.1772 356.6574
2030 355.2861 352.3424 358.2298 350.7841 359.7881
2031 357.6891 354.2573 361.1209 352.4406 362.9375
2032 360.1202 356.2133 364.0272 354.1451 366.0953
2033 362.5746 358.2070 366.9422 355.8950 369.2542
2034 365.0480 360.2350 369.8611 357.6871 372.4090
2035 367.5372 362.2939 372.7804 359.5183 375.5560
2036 370.0393 364.3809 375.6976 361.3855 378.6930
2037 372.5520 366.4930 378.6109 363.2856 381.8183
2038 375.0734 368.6280 381.5189 365.2159 384.9309
2039 377.6021 370.7834 384.4208 367.1738 388.0304
2040 380.1367 372.9575 387.3160 369.1570 391.1165
Based on the U.S. Population forecast output, we are 95% certain that the U.S. population in 2030 will be less than ______ million people.
How to input your answer:
Round to closest million (whole number)
If the answer were 123 million (e.g. 123.4233), you would enter 123.
Top Plot: No spikes should be too large
ACF: auto-correlation function.
Histogram: Distribution of residuals should be approx. normal
Assessment: Trend is very smooth so small aberrations are exaggerated in residuals.
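These three residual diagnostics (time plot, ACF, histogram) can be produced in one call with checkresiduals() from the forecast package. The sketch below fits a model to R's built-in LakeHuron series as a stand-in for the course data.

```r
library(forecast)  # provides auto.arima() and checkresiduals()

# Stand-in data: built-in annual series, NOT the course dataset.
fit <- auto.arima(LakeHuron, ic = "aic", seasonal = FALSE)

# Draws the residual time plot, the ACF plot, and a histogram of
# residuals, and prints a Ljung-Box test for leftover auto-correlation.
checkresiduals(fit)

# The residuals themselves: one per observation in the series.
model_residuals <- residuals(fit)
length(model_residuals)
```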
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
Despite outlier and one large ACF value, our population model is estimated to be 99.96% accurate.
This doesn’t guarantee that forecasts will be 100% accurate but it does improve our chances of accurate forecasting.
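MAPE can be computed by hand as the average of the absolute percent errors. The sketch below uses three made-up values, and it assumes the "percent accurate" figure quoted above is 100 minus MAPE. The mape() helper is hypothetical; the forecast package's accuracy() function reports MAPE for a fitted model.

```r
# Hypothetical helper: mean absolute percent error, in percent.
mape <- function(actual, fitted) {
  mean(abs((actual - fitted) / actual)) * 100
}

# Made-up values for illustration only.
actual <- c(100, 200, 400)
fitted <- c(110, 190, 400)

# Percent errors are 10%, 5%, and 0%, so MAPE is 5.
model_mape <- mape(actual, fitted)
model_mape        # 5

# Assumed reading of "percent accurate": 100 minus MAPE.
100 - model_mape  # 95
```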
Data from Yahoo Finance
Forecast Questions:
What will the estimated stock price be in April of 2025?
What ARIMA model was chosen (p,d,q)?
Model Assessment Questions:
How valid is our model?
How accurate are our estimates?
Examine Prediction Intervals and Prediction Bands
Check fit statistics
Stock Trend Forecast
Create time series using Netflix Stock data
Specify freq = 12 - 12 observations per year
Specify start = c(2010, 1) - first obs. in dataset is January 2010
Model data using auto.arima function
Specify ic = "aic" - aic is the information criterion used to determine the model.
Specify seasonal = F - no seasonal (repeating) pattern in the data.
This chunk will create and save the model.
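For monthly data, the time series setup might be sketched as below. The prices are placeholder values, not the Netflix data, and the count of 172 months assumes one observation per month from January 2010 through April 2024 with no gaps.

```r
# PLACEHOLDER prices (NOT the Netflix data): 172 made-up monthly values.
n_months <- 172
placeholder_prices <- seq(10, 600, length.out = n_months)

# 12 observations per year; first observation is year 2010, month 1.
stock_ts <- ts(placeholder_prices, frequency = 12, start = c(2010, 1))

start(stock_ts)  # 2010 1  (January 2010)
end(stock_ts)    # 2024 4  (April 2024)
```

Checking start() and end() confirms the dates line up before modeling. With a series ending in April 2024, a 12-month forecast runs from May 2024 through April 2025.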
Create forecasts (until April 2025)
h = 12 indicates we want to forecast 12 months
Most recent date in forecast data is April 1, 2024
12 Months until April 1, 2025
Forecasts become less accurate the further into the future you specify.
Darker purple: 80% Prediction Interval Bounds
Lighter purple: 95% Prediction Interval Bounds
Plot shows:
Auto-Regressive (p = 2), Differencing (d = 1), Moving Average (q = 2)
Month Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
May 2024 616.5980 577.8389 655.3570 557.3211 675.8749
Jun 2024 597.3982 540.9623 653.8340 511.0870 683.7094
Jul 2024 575.2199 501.5346 648.9053 462.5280 687.9119
Aug 2024 567.9570 477.2580 658.6559 429.2448 706.6691
Sep 2024 580.7199 475.2167 686.2230 419.3666 742.0731
Oct 2024 604.1698 487.2636 721.0759 425.3773 782.9623
Nov 2024 622.9879 497.5925 748.3833 431.2122 814.7636
Dec 2024 627.2257 494.8187 759.6327 424.7267 829.7247
Jan 2025 618.3484 478.9854 757.7114 405.2111 831.4857
Feb 2025 606.5939 459.5068 753.6811 381.6435 831.5443
Mar 2025 602.7019 447.2250 758.1789 364.9204 840.4835
Apr 2025 610.4861 446.7790 774.1931 360.1177 860.8544
Interpretation of Netflix Prediction Intervals
In January of 2025, the Netflix stock price is forecasted to be approximately $618. However, the 95% prediction interval indicates it may be as low as ____.
How to input your answer:
Round to closest whole dollar.
Don’t include dollar sign.
Top Plot: Spikes get larger over time
ACF: auto-correlation function.
Histogram: Distribution of residuals should be approx. normal
Assessment: Stock prices are very volatile and this is sufficient.
Many options for comparing models
For BUA 345: We will use MAPE = Mean Absolute Percent Error
Despite increasing volatility, our stock price model is estimated to be 87.33% accurate.
This doesn’t guarantee that forecasts will be 87% accurate but it does improve our chances of accurate forecasting.
The forecast package in R simplifies forecasting
Extrapolation OK in this case
You should know terminology and how to read and interpret output.
You will be given data, R code, and output
You will answer questions based on provided output.
HW 10 will cover Lectures 23-25
To submit an Engagement Question or Comment about material from Today’s Lecture: Submit by midnight today (day of lecture). Click on Link next to the ❓ under today’s lecture.