R Packages Utilized

library(readr)
library(ggplot2)
library(ggfortify)
library(vars)
library(forecast)
library(psych)

Importing Dataset into R Markdown

## Walmart Stock (NYSE:WMT)
## Monthly adjusted closing price from 1/1/2013 to 12/31/2018
wmt <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 6 (PA&F)/WMT.csv")
## Source: https://finance.yahoo.com/quote/WMT/history?period1=1357016400&period2=1546232400&interval=1mo&filter=history&frequency=1mo

Data Exploration and Plots

## Creating time series for close variable
wmtTS <- ts(wmt$close, frequency=12, start=c(2013,1))

describe(wmtTS)
##    vars  n  mean    sd median trimmed  mad   min    max range skew
## X1    1 72 71.02 11.36  66.97    69.8 6.81 51.71 102.52 50.81 0.97
##    kurtosis   se
## X1     0.16 1.34
summary(wmtTS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   51.71   64.17   66.97   71.02   75.42  102.52
## Plotting the time series
autoplot(wmtTS) + xlab("Month") + ylab("Adjusted Monthly Close Price")

acf(wmtTS)

pacf(wmtTS)

## Residual-style diagnostics (ACF plot and Ljung-Box test) applied to the raw series
checkresiduals(wmtTS)

Model 1 - ETS

## Automatic selection: each "Z" in the model argument lets R choose that component
model1 <- ets(wmtTS, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
    gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
    additive.only=FALSE, restrict=TRUE,
    allow.multiplicative.trend=FALSE)

model1
## ETS(M,N,N) 
## 
## Call:
##  ets(y = wmtTS, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
##      gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
##      biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 58.944 
## 
##   sigma:  0.0525
## 
##      AIC     AICc      BIC 
## 498.9205 499.2734 505.7505
## Plotting the Model
autoplot(model1)

## Plotting the Residuals of the Model
cbind('Residuals' = residuals(model1),
      'Forecast errors' = residuals(model1,type='response')) %>%
  autoplot(facet=TRUE) + xlab("Year") + ylab("")

checkresiduals(model1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,N)
## Q* = 6.8593, df = 12, p-value = 0.8668
## 
## Model df: 2.   Total lags used: 14

Model 1 Discussion

I utilized the automatic selection feature of the ets() function by passing model="ZZZ", which lets R choose the best-fitting specification for each component. The ETS framework characterizes a model along three dimensions: error, trend, and seasonality; here the selected model is ETS(M,N,N), i.e., multiplicative errors with no trend or seasonal component. The ACF of the residuals shows values inside the critical bounds, indicating no significant correlation between lags; in other words, the residuals behave like white noise, which is what we want from a well-specified model.

In evaluating the ETS models produced, I focused on the AIC, AICc, and BIC values in order to judge the model’s effectiveness. The model produces AIC, AICc, and BIC values of 498.92, 499.27, and 505.75 respectively.
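
Unlike the ARIMA and Holt-Winters models below, model1 is never forecast forward in this report. For parity, a minimal sketch of a 12-month ETS forecast (model1F is an illustrative name, not used elsewhere):

## Forecasting 12 months of data with the fitted ETS model
model1F <- forecast(model1, h=12)
autoplot(model1F) + xlab("Month") + ylab("Adjusted Monthly Close Price")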

Model 2 - ARIMA Model

auto.arima(wmtTS)
## Series: wmtTS 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 15.86:  log likelihood=-198.86
## AIC=399.72   AICc=399.78   BIC=401.99
## Refitting with a seasonal difference added on top of the auto.arima() selection
model2 <- Arima(wmtTS, order=c(0,1,0), seasonal=c(0,1,0))
checkresiduals(model2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0)(0,1,0)[12]
## Q* = 22.963, df = 14, p-value = 0.06087
## 
## Model df: 0.   Total lags used: 14
summary(model2)
## Series: wmtTS 
## ARIMA(0,1,0)(0,1,0)[12] 
## 
## sigma^2 estimated as 29.97:  log likelihood=-184.02
## AIC=370.04   AICc=370.11   BIC=372.12
## 
## Training set error measures:
##                      ME     RMSE      MAE        MPE    MAPE      MASE
## Training set -0.1237914 4.955486 3.337329 -0.2568675 4.56592 0.3471073
##                    ACF1
## Training set 0.06592207
## Forecasting 12 months of data for the time series
model2F <- forecast(model2, h=12)
model2F
##          Point Forecast    Lo 80     Hi 80    Lo 95    Hi 95
## Jan 2019       99.26008 92.24451 106.27564 88.53070 109.9895
## Feb 2019       83.30517 73.38366  93.22668 68.13153  98.4788
## Mar 2019       82.30498 70.15366  94.45629 63.72115 100.8888
## Apr 2019       82.32169 68.29057  96.35282 60.86293 103.7805
## May 2019       76.59438 60.90711  92.28166 52.60276 100.5860
## Jun 2019       80.12519 62.94064  97.30974 53.84369 106.4067
## Jul 2019       83.61049 65.04905 102.17193 55.22322 111.9978
## Aug 2019       90.06510 70.22209 109.90811 59.71783 120.4124
## Sep 2019       88.69771 67.65102 109.74440 56.50957 120.8858
## Oct 2019       94.93522 72.75006 117.12038 61.00594 128.8645
## Nov 2019       92.35992 69.09192 115.62791 56.77459 127.9452
## Dec 2019       87.95351 63.65088 112.25613 50.78585 125.1212
## Plotting Forecast of ARIMA Model
autoplot(model2F) + xlab("Month") + ylab("Adjusted Monthly Close Price")

summary(model2F)
## 
## Forecast method: ARIMA(0,1,0)(0,1,0)[12]
## 
## Model Information:
## Series: wmtTS 
## ARIMA(0,1,0)(0,1,0)[12] 
## 
## sigma^2 estimated as 29.97:  log likelihood=-184.02
## AIC=370.04   AICc=370.11   BIC=372.12
## 
## Error measures:
##                      ME     RMSE      MAE        MPE    MAPE      MASE
## Training set -0.1237914 4.955486 3.337329 -0.2568675 4.56592 0.3471073
##                    ACF1
## Training set 0.06592207
## 
## Forecasts:
##          Point Forecast    Lo 80     Hi 80    Lo 95    Hi 95
## Jan 2019       99.26008 92.24451 106.27564 88.53070 109.9895
## Feb 2019       83.30517 73.38366  93.22668 68.13153  98.4788
## Mar 2019       82.30498 70.15366  94.45629 63.72115 100.8888
## Apr 2019       82.32169 68.29057  96.35282 60.86293 103.7805
## May 2019       76.59438 60.90711  92.28166 52.60276 100.5860
## Jun 2019       80.12519 62.94064  97.30974 53.84369 106.4067
## Jul 2019       83.61049 65.04905 102.17193 55.22322 111.9978
## Aug 2019       90.06510 70.22209 109.90811 59.71783 120.4124
## Sep 2019       88.69771 67.65102 109.74440 56.50957 120.8858
## Oct 2019       94.93522 72.75006 117.12038 61.00594 128.8645
## Nov 2019       92.35992 69.09192 115.62791 56.77459 127.9452
## Dec 2019       87.95351 63.65088 112.25613 50.78585 125.1212

Model 2 Discussion

The auto.arima() function chose ARIMA(0,1,0), a random walk, as the most effective specification for this series. A non-seasonal ARIMA of this form suits data without strong seasonal structure, and a well-fitting model should leave residuals that resemble white noise. In the case of this dataset, the residual ACF plot does not show strong correlation between the lags, signifying that the residuals are approximately white noise. Note that model2 as fitted adds a seasonal (0,1,0)[12] differencing term on top of auto.arima()'s selection; this seasonal fit produced AIC, AICc, and BIC values of 370.04, 370.11, and 372.12 respectively, versus 399.72, 399.78, and 401.99 for the non-seasonal ARIMA(0,1,0).
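
As a sanity check, the exact specification auto.arima() selected can also be fit directly so its diagnostics can be compared with the seasonal variant; a minimal sketch (model2ns is an illustrative name):

## Fitting the non-seasonal ARIMA(0,1,0) exactly as selected by auto.arima()
model2ns <- Arima(wmtTS, order=c(0,1,0))
summary(model2ns)
checkresiduals(model2ns)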

Model 3 - Holt-Winters’ Seasonal Method

## Multiplicative
model3 <- hw(wmtTS, seasonal="multiplicative", damped=FALSE, h=12)
model3
##          Point Forecast    Lo 80     Hi 80    Lo 95    Hi 95
## Jan 2019       93.47171 86.30469 100.63872 82.51070 104.4327
## Feb 2019       88.60682 78.88205  98.33159 73.73406 103.4796
## Mar 2019       88.94795 76.38961 101.50628 69.74163 108.1543
## Apr 2019       91.34337 75.58683 107.09990 67.24583 115.4409
## May 2019       90.27694 71.83695 108.71693 62.07541 118.4785
## Jun 2019       89.96567 68.66690 111.26443 57.39201 122.5393
## Jul 2019       92.26816 67.34791 117.18842 54.15592 130.3804
## Aug 2019       91.45616 63.62052 119.29181 48.88522 134.0271
## Sep 2019       90.46297 59.74075 121.18519 43.47739 137.4486
## Oct 2019       91.90139 57.35937 126.44342 39.07393 144.7289
## Nov 2019       95.18209 55.86056 134.50362 35.04500 155.3192
## Dec 2019       93.99684 51.56705 136.42664 29.10607 158.8876
## Additive
model4 <- hw(wmtTS, seasonal="additive", damped=FALSE, h=12)

summary(model3)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = wmtTS, h = 12, seasonal = "multiplicative", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.8895 
##     beta  = 0.1349 
##     gamma = 0.0027 
## 
##   Initial states:
##     l = 60.191 
##     b = 0.0211 
##     s = 1.016 1.0307 0.9975 0.9835 0.9968 1.0076
##            0.9844 0.99 1.0039 0.9791 0.977 1.0335
## 
##   sigma:  0.0598
## 
##      AIC     AICc      BIC 
## 530.2609 541.5942 568.9642 
## 
## Error measures:
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.01225194 3.807178 3.085969 -0.00127714 4.351595 0.3209639
##                    ACF1
## Training set 0.09558867
## 
## Forecasts:
##          Point Forecast    Lo 80     Hi 80    Lo 95    Hi 95
## Jan 2019       93.47171 86.30469 100.63872 82.51070 104.4327
## Feb 2019       88.60682 78.88205  98.33159 73.73406 103.4796
## Mar 2019       88.94795 76.38961 101.50628 69.74163 108.1543
## Apr 2019       91.34337 75.58683 107.09990 67.24583 115.4409
## May 2019       90.27694 71.83695 108.71693 62.07541 118.4785
## Jun 2019       89.96567 68.66690 111.26443 57.39201 122.5393
## Jul 2019       92.26816 67.34791 117.18842 54.15592 130.3804
## Aug 2019       91.45616 63.62052 119.29181 48.88522 134.0271
## Sep 2019       90.46297 59.74075 121.18519 43.47739 137.4486
## Oct 2019       91.90139 57.35937 126.44342 39.07393 144.7289
## Nov 2019       95.18209 55.86056 134.50362 35.04500 155.3192
## Dec 2019       93.99684 51.56705 136.42664 29.10607 158.8876
summary(model4)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = wmtTS, h = 12, seasonal = "additive", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.9949 
##     beta  = 3e-04 
##     gamma = 4e-04 
## 
##   Initial states:
##     l = 61.32 
##     b = 0.5091 
##     s = 1.7027 3.1976 -0.1927 -2.3956 -2.0006 -0.3869
##            -1.5906 -1.1701 0.5495 0.6365 -0.0023 1.6525
## 
##   sigma:  4.206
## 
##      AIC     AICc      BIC 
## 530.6827 542.0160 569.3860 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE     MASE
## Training set -0.1157349 3.709333 2.762224 -0.333041 3.860816 0.287292
##                     ACF1
## Training set 0.003665816
## 
## Forecasts:
##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## Jan 2019       91.68583 86.29563  97.07602 83.44224  99.92941
## Feb 2019       90.53531 82.93075  98.13987 78.90514 102.16548
## Mar 2019       91.68271 82.37577 100.98966 77.44897 105.91646
## Apr 2019       92.10442 81.36086 102.84797 75.67357 108.53526
## May 2019       90.89076 78.88062 102.90090 72.52284 109.25868
## Jun 2019       90.97712 77.82123 104.13302 70.85692 111.09733
## Jul 2019       92.68786 78.47777 106.89795 70.95541 114.42031
## Aug 2019       91.58156 76.38978 106.77335 68.34773 114.81539
## Sep 2019       91.69432 75.58007 107.80856 67.04970 116.33893
## Oct 2019       94.39990 77.41278 111.38701 68.42034 120.37945
## Nov 2019       98.30003 80.48233 116.11772 71.05021 125.54984
## Dec 2019       97.31279 78.70115 115.92443 68.84874 125.77684
checkresiduals(model3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 15.489, df = 3, p-value = 0.001443
## 
## Model df: 16.   Total lags used: 19
checkresiduals(model4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 14.362, df = 3, p-value = 0.002452
## 
## Model df: 16.   Total lags used: 19
## Plotting forecast of Holt-Winters' Model

## Model 3 - Multiplicative
autoplot(model3) + xlab("Month") + ylab("Adjusted Monthly Close Price")

## Model 4 - Additive
autoplot(model4) + xlab("Month") + ylab("Adjusted Monthly Close Price")

Model 3 Discussion

The Holt-Winters’ seasonal method produces a much wider forecast range than the previous models analyzed. I also fit the additive seasonal method for comparison, which resulted in a slightly narrower range. The additive method arguably fit this dataset better, despite the increasing trend seen in the time series, but I still chose the multiplicative method because it seemed more appropriate given the trend and make-up of the dataset. The Holt-Winters’ multiplicative model produced AIC, AICc, and BIC values of 530.26, 541.59, and 568.96 respectively; the additive model produced 530.68, 542.02, and 569.39. The multiplicative method produced slightly better information criteria, but ultimately the two were about the same. Note that the Ljung-Box tests reject the white-noise hypothesis for both variants (p = 0.0014 and p = 0.0025), so some autocorrelation remains in the residuals of each.
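
One way to adjudicate between the two variants beyond the in-sample summaries is a simple holdout comparison; a sketch, assuming 2018 is held out as a test set (wmtTrain, wmtTest, hwMult, and hwAdd are illustrative names):

## Hold out 2018 and score both Holt-Winters variants against it
wmtTrain <- window(wmtTS, end=c(2017,12))
wmtTest <- window(wmtTS, start=c(2018,1))
hwMult <- hw(wmtTrain, seasonal="multiplicative", damped=FALSE, h=12)
hwAdd <- hw(wmtTrain, seasonal="additive", damped=FALSE, h=12)
accuracy(hwMult, wmtTest)  ## test-set errors, multiplicative
accuracy(hwAdd, wmtTest)   ## test-set errors, additive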

Best Model Selection

One important note in comparing the above models is that AIC values should not be compared between model classes. It is appropriate to compare them within each class of model, but not across classes. For example, the AIC value of the ARIMA model can be compared with other ARIMA models, but the ETS AIC should not be compared directly with the ARIMA AIC, because the two classes compute the likelihood in different ways (an ARIMA model with differencing, in particular, is fit to differenced data). For this reason, the models are compared here outside of these values, though the information criteria are still useful for selecting the best model within each class.
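
A class-agnostic alternative is to score each class on out-of-sample forecast errors via time-series cross-validation; a minimal sketch using tsCV() from the forecast package (fETS and fARIMA are illustrative wrapper names, and refitting at every origin can be slow):

## One-step time-series cross-validation across model classes
fETS <- function(x, h) forecast(ets(x), h=h)
fARIMA <- function(x, h) forecast(auto.arima(x), h=h)
eETS <- tsCV(wmtTS, fETS, h=1)
eARIMA <- tsCV(wmtTS, fARIMA, h=1)
sqrt(mean(eETS^2, na.rm=TRUE))    ## one-step RMSE, ETS
sqrt(mean(eARIMA^2, na.rm=TRUE))  ## one-step RMSE, ARIMA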

Ultimately, the ARIMA model worked best in forecasting this dataset. Its forecast graph appears to capture the trend of the data best, and it produced AIC, AICc, and BIC values of 370.04, 370.11, and 372.12 respectively for the seasonal specification fitted here (399.72, 399.78, and 401.99 for the non-seasonal ARIMA(0,1,0) that auto.arima() selected). The residual diagnostics showed limited correlation between lags and a roughly normal distribution of residuals. The Ljung-Box test returned a p-value (0.06087) above the 0.05 significance level, so we fail to reject the hypothesis that the residuals are white noise, and I believe the model overall did the best job of fitting and forecasting the stock price data. The automatic selection tool produced a strong model when compared against the other ARIMA specifications tested.
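
For a final visual check, the candidate point forecasts can be overlaid on the original series; a sketch (model1F is the illustrative ETS forecast from the Model 1 sketch above):

## Overlaying point forecasts from each model class
model1F <- forecast(model1, h=12)
autoplot(wmtTS) +
  autolayer(model1F, series="ETS(M,N,N)", PI=FALSE) +
  autolayer(model2F, series="ARIMA", PI=FALSE) +
  autolayer(model3, series="HW multiplicative", PI=FALSE) +
  xlab("Month") + ylab("Adjusted Monthly Close Price")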