Predictive Homework #3

Loading Packages and Data

Packages loaded: dplyr; tidyverse 2.0.0 (forcats 1.0.0, ggplot2 3.4.2, lubridate 1.9.2, purrr 1.0.1, readr 2.1.4, stringr 1.5.0, tibble 3.2.1, tidyr 1.3.0); tsibble; fpp3 0.5 (tsibbledata 0.4.1, fable 0.3.3, feasts 0.3.1, fabletools 0.3.3); magrittr; and forecast. Loading them produces several masking conflicts:
- dplyr::filter() and dplyr::lag() mask their stats versions.
- tsibble::interval() masks lubridate::interval().
- magrittr::set_names() and magrittr::extract() mask purrr::set_names() and tidyr::extract().
- forecast::accuracy() masks fabletools::accuracy().

Data Overview

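The data for this assignment come from the Federal Reserve Bank of St. Louis (FRED). They track the 10-year breakeven inflation rate (series T10YIE). This is a market-implied measure of expected inflation, derived from the yields on 10-year nominal Treasury securities and 10-year Treasury Inflation-Protected Securities. In this assignment, I build several models on four years of monthly data. I then use those models to forecast the following year and predict the breakeven inflation rate over that period.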
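The train/test setup described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions:
- The series is simulated; it is not the FRED T10YIE data.
- The January 2018 start date is hypothetical.
- Monthly frequency is inferred from the [12] seasonal period in the model printouts.

```r
# Toy monthly series standing in for T10YIE. These values are simulated,
# NOT the FRED data, and the January 2018 start date is hypothetical.
set.seed(1)
y <- ts(2 + cumsum(rnorm(60, sd = 0.05)), start = c(2018, 1), frequency = 12)

# Training set: the first four years (48 months).
train <- window(y, end = c(2021, 12))
# Test set: the final year (12 months), held out to evaluate the forecasts.
test <- window(y, start = c(2022, 1))

c(train = length(train), test = length(test))
```

Holding out the last year, rather than a random sample of months, keeps the evaluation honest for time series. Each model only sees data from before the period it forecasts.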

The following plots outline the decomposition and general trend of the series. The data show little seasonality, and the lag plots and seasonal plots suggest that the series is difficult to predict. The high autocorrelation (ACF) values visible in the lag plots most likely reflect non-stationarity: the series drifts in level rather than fluctuating around a fixed mean. Inflation expectations are driven by factors that do not follow seasonal patterns, including government policy, consumer behavior, and job market strength. This makes forecasting the test period more challenging. Still, forecasting inflation is useful to a wide range of people, from investors to everyday consumers.
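The sketch below illustrates why non-stationarity produces high autocorrelation. It simulates a random walk as a stand-in for a drifting series; this is not the T10YIE data. It then compares the lag-1 sample autocorrelation of the series with that of its first differences, using base R's acf(). The level series should show autocorrelation close to 1, and the differenced series should be close to 0.

```r
# Simulated random walk (non-stationary), for illustration only.
set.seed(42)
rw <- cumsum(rnorm(500))

# acf()$acf is indexed from lag 0, so element [2] is the lag-1 autocorrelation.
acf_level <- acf(rw, lag.max = 12, plot = FALSE)$acf[2]
acf_diff  <- acf(diff(rw), lag.max = 12, plot = FALSE)$acf[2]

round(c(level = acf_level, differenced = acf_diff), 3)
```

First differencing is also the step behind the "1" in the ARIMA(0,1,0)(0,0,1)[12] model fitted later in this report.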

Neural Net Model

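The first model I construct is a neural network autoregression, NNAR(1,1,2)[12]. Each network takes two inputs, one non-seasonal lag (the previous month) and one seasonal lag (the same month a year earlier), and has two nodes in its hidden layer. The fitted model averages 20 networks, each a 2-2-1 network with 9 weights, and sigma squared is estimated as 0.01947.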

As the accuracy metrics and forecast plot show, the neural network does not forecast the test data well. A likely explanation is the small size of the dataset: four years of monthly data (48 observations) is a small training sample for a neural network. In addition, the training data show little seasonality, so the seasonal lag input contributes little useful information.

## Series: T10YIE 
## Model: NNAR(1,1,2)[12] 
## 
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units 
## 
## sigma^2 estimated as 0.01947
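The "2-2-1 network with 9 weights" in the printout follows from the NNAR structure. The helpers `nnar_lags()` and `nnar_weights()` below are hypothetical names written for this illustration. They list the input lags and count the weights of a network with one hidden layer, bias terms, and a single linear output unit, which is the architecture the printout describes.

```r
# Input lags of an NNAR(p, P, k)[m] model: the first p lags plus the first
# P seasonal lags (m, 2m, ..., Pm), with duplicates removed.
nnar_lags <- function(p, P, m) sort(unique(c(seq_len(p), m * seq_len(P))))

# Weight count for one hidden layer of k nodes and a single linear output:
# every input feeds every hidden node, and each hidden and output node has a bias.
nnar_weights <- function(n_inputs, k) {
  input_to_hidden <- (n_inputs + 1) * k  # n_inputs weights + 1 bias per hidden node
  hidden_to_output <- k + 1              # k weights + 1 bias for the output node
  input_to_hidden + hidden_to_output
}

lags <- nnar_lags(p = 1, P = 1, m = 12)
lags                               # 1 12: last month and the same month a year earlier
nnar_weights(length(lags), k = 2)  # 9, a 2-2-1 network
```

The arithmetic is (2 + 1) x 2 = 6 weights into the hidden layer, plus 2 + 1 = 3 weights into the output, for 9 in total. Nine weights per network is modest, but 48 monthly observations, fewer once the 12-month lag is used, still give the networks little data to learn from.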

ETS Model

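The second model is an ETS(A,N,N) model: simple exponential smoothing with additive errors and no trend or seasonal component. The smoothing parameter alpha is 0.9999. The level is therefore almost entirely reset to each new observation, and the forecast is essentially the last observed value carried forward. The model has an AIC of 7.845 and a sigma squared of 0.0226. On the test data, its forecast is somewhat more accurate than the neural network's (test RMSE 0.311 versus 0.342).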

## Series: T10YIE 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##      l[0]
##  2.114284
## 
##   sigma^2:  0.0226
## 
##       AIC      AICc       BIC 
##  7.844727  8.390181 13.458330
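The behavior of ETS(A,N,N) with alpha close to 1 can be seen from the simple exponential smoothing recursion. In this sketch, alpha and the initial level l[0] are the estimates printed above, but the observations are hypothetical toy values, not the T10YIE data. The function name `ses_level()` is also hypothetical. The final level is the point forecast for every horizon, and it lands essentially on the last observation.

```r
# Simple exponential smoothing, the recursion behind ETS(A,N,N):
#   level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
# The point forecast for every horizon h is the final level.
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

# Hypothetical observations; alpha and l[0] are the printed estimates.
y <- c(2.10, 2.25, 2.31, 2.18, 2.40, 2.36)
alpha <- 0.9999
l0 <- 2.114284

level_T <- ses_level(y, alpha, l0)
forecast_h <- rep(level_T, 12)  # flat 12-month point forecast

level_T - tail(y, 1)            # close to zero: effectively a naive forecast
```

A flat forecast anchored at the last training value cannot anticipate any movement during the test year. This is consistent with the ETS model's larger test errors.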

ARIMA

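The next model is an ARIMA(0,1,0)(0,0,1)[12]. The series is differenced once, and the differenced series is modeled with a single seasonal moving-average term (sma1 = -0.4968). The model has an AIC of -45.33 and a sigma squared of 0.01949. This AIC should not be compared with the ETS model's AIC, because the two model classes compute their likelihoods differently. The accuracy metrics and forecast show that the ARIMA model predicts the test data well. It has the lowest errors of the three individual models: a test RMSE of 0.107, compared with 0.311 for ETS and 0.342 for the neural network.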

## Series: T10YIE 
## Model: ARIMA(0,1,0)(0,0,1)[12] 
## 
## Coefficients:
##          sma1
##       -0.4968
## s.e.   0.2646
## 
## sigma^2 estimated as 0.01949:  log likelihood=24.66
## AIC=-45.33   AICc=-45.06   BIC=-41.63
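The fitted ARIMA(0,1,0)(0,0,1)[12] can be written as y[t] - y[t-1] = e[t] + Theta * e[t-12], where Theta is the sma1 coefficient (-0.4968). Setting future errors to zero gives the point-forecast recursion sketched below. Only Theta comes from the printout above. The function name, the last training value, and the twelve residuals are hypothetical. Over the first 12 months, each forecast adds Theta times the residual from a year before the target month. After that, every residual the model needs lies in the future, and the forecast stays flat.

```r
# Point forecasts from ARIMA(0,1,0)(0,0,1)[12] with no constant:
#   y[t] - y[t-1] = e[t] + Theta * e[t-12]
# resid_last12 holds e[T-11], ..., e[T] (oldest first); future errors are zero.
arima010_001_forecast <- function(y_last, resid_last12, theta, h = 12) {
  stopifnot(length(resid_last12) == 12)
  fc <- numeric(h)
  prev <- y_last
  for (i in seq_len(h)) {
    # resid_last12[i] is e[T + i - 12], which is only observed for i <= 12.
    shock <- if (i <= 12) theta * resid_last12[i] else 0
    prev <- prev + shock
    fc[i] <- prev
  }
  fc
}

theta <- -0.4968  # sma1 estimate printed above
y_last <- 2.30    # hypothetical last training value
resid_last12 <- c(0.05, -0.02, 0.10, -0.08, 0.03, 0.00,
                  -0.04, 0.06, -0.01, 0.02, -0.03, 0.07)  # hypothetical residuals

fc <- arima010_001_forecast(y_last, resid_last12, theta, h = 18)
round(fc, 4)
```

Because Theta is negative, the model partly reverses the surprises from the same months of the previous year. This is the only way the ARIMA forecast departs from a naive random-walk forecast.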

Ensemble Model

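Lastly, I construct an ensemble model that takes the simple average of the three models: the neural network, ETS, and ARIMA. The accuracy metrics and forecast plot of the ensemble can be seen below. In the printed specification, the nested COMBINATION entries correspond to summing the three component models and multiplying the result by one third. Neural network fits are random, so the network refitted inside the ensemble has a slightly different sigma squared (0.01868) from the standalone neural network (0.01947). The ETS and ARIMA components are identical to the models above.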

## Series: T10YIE 
## Model: COMBINATION 
## Combination: T10YIE * 0.333333333333333
## 
## =======================================
## 
## Series: T10YIE 
## Model: COMBINATION 
## Combination: T10YIE + T10YIE
## 
## ============================
## 
## Series: T10YIE 
## Model: COMBINATION 
## Combination: T10YIE + T10YIE
## 
## ============================
## 
## Series: T10YIE 
## Model: NNAR(1,1,2)[12] 
## 
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units 
## 
## sigma^2 estimated as 0.01868
## 
## Series: T10YIE 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##      l[0]
##  2.114284
## 
##   sigma^2:  0.0226
## 
##       AIC      AICc       BIC 
##  7.844727  8.390181 13.458330 
## 
## 
## Series: T10YIE 
## Model: ARIMA(0,1,0)(0,0,1)[12] 
## 
## Coefficients:
##          sma1
##       -0.4968
## s.e.   0.2646
## 
## sigma^2 estimated as 0.01949:  log likelihood=24.66
## AIC=-45.33   AICc=-45.06   BIC=-41.63
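At each horizon, the point forecast of an equal-weight ensemble is the average of the component point forecasts. The printed structure above, a sum of the three models multiplied by one third, is consistent with this. The sketch below uses hypothetical forecast values and a hypothetical helper name, `ensemble_mean()`. It averages point forecasts only, whereas fable's combination also carries a forecast distribution.

```r
# Equal-weight ensemble: average the component point forecasts horizon by horizon.
ensemble_mean <- function(...) {
  fcs <- cbind(...)  # one column per component model, one row per horizon
  rowMeans(fcs)
}

# Hypothetical 3-month point forecasts from each component model.
nnar_fc  <- c(2.60, 2.62, 2.65)
ets_fc   <- c(2.55, 2.55, 2.55)
arima_fc <- c(2.40, 2.42, 2.44)

ens_fc <- ensemble_mean(nnar_fc, ets_fc, arima_fc)
ens_fc
```

Averaging pulls the ensemble toward the middle of its components. When one component (here, ARIMA) is clearly better than the other two, the ensemble improves on the weaker models but can still trail the best one.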

Accuracy Metrics

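Finally, each model can be compared on its forecast accuracy over the test year. The ARIMA model outperformed all other models on every error measure (ME, RMSE, MAE, MPE, and MAPE), and the ensemble model was second best. All four models have negative mean errors, meaning that their forecasts were, on average, higher than the observed values. Because the data lack seasonality and contain some noise, the ARIMA model performs particularly well. By contrast, the ETS model selected here has no seasonal component and, with alpha close to 1, amounts to carrying the last training value forward. The neural network needs more observations than four years of monthly data provide. Given the scope of the dataset, ARIMA proved to be the most effective forecasting tool.

The scaled measures MASE and RMSSE are reported as NaN. This is most likely because they are scaled by in-sample errors and the training data were not passed to the accuracy calculation.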

## # A tibble: 4 × 10
##   .model         .type      ME  RMSE    MAE    MPE  MAPE  MASE RMSSE  ACF1
## * <chr>          <chr>   <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(T10YIE)  Test  -0.0562 0.107 0.0849  -2.43  3.68   NaN   NaN 0.206
## 2 Ensemble_model Test  -0.247  0.254 0.247  -10.7  10.7    NaN   NaN 0.184
## 3 ETS(T10YIE)    Test  -0.300  0.311 0.300  -13.1  13.1    NaN   NaN 0.568
## 4 NNETAR(T10YIE) Test  -0.335  0.342 0.335  -14.6  14.6    NaN   NaN 0.440
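The point-forecast measures in the table can be reproduced by hand, as in the sketch below. The helper name `forecast_accuracy()` and all numbers are hypothetical toy values, not the T10YIE results. Errors are defined as actual minus forecast. The scaled measures (MASE and RMSSE) divide by the in-sample errors of a naive forecast at lag m, so they require the training data. Here m is an explicit parameter; fabletools applies its own default, so check its documentation before comparing values. When no training data are supplied, the scale is NaN, which mirrors the NaN entries in the table.

```r
# Point-forecast accuracy measures; errors are actual minus forecast,
# so a negative ME means the forecasts were too high on average.
forecast_accuracy <- function(actual, predicted, train, m = 1) {
  e <- actual - predicted
  pe <- 100 * e / actual                # percentage errors
  naive_diff <- diff(train, lag = m)    # in-sample naive errors used for scaling
  c(
    ME    = mean(e),
    RMSE  = sqrt(mean(e^2)),
    MAE   = mean(abs(e)),
    MPE   = mean(pe),
    MAPE  = mean(abs(pe)),
    MASE  = mean(abs(e)) / mean(abs(naive_diff)),
    RMSSE = sqrt(mean(e^2) / mean(naive_diff^2))
  )
}

# Toy numbers for illustration only.
train     <- c(1, 2, 4, 7)
actual    <- c(2, 2, 2, 2)
predicted <- c(2.1, 2.1, 1.9, 2.3)

round(forecast_accuracy(actual, predicted, train, m = 1), 4)

# Without training data the scaling factor is mean(numeric(0)), i.e. NaN.
forecast_accuracy(actual, predicted, train = numeric(0))[c("MASE", "RMSSE")]  # both NaN
```

RMSE and MAE are in the units of the series (percentage points of breakeven inflation). MPE and MAPE are relative to the observed values, which makes them easier to compare across series.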