Note: conditions for a white noise series (violating either one suggests the series is not white noise): (1) no single large spike lies outside the 95% bounds; (2) no more than 5% of spikes lie outside the bounds.
x1 might not be white noise (its lags appear autocorrelated), since the large spike at lag 12 seems to cross the 95% bounds. However, this is borderline, so we may have to rely on the Ljung-Box test to see whether the p-value falls below 0.05.
x2 also might not be white noise, given the relatively large spikes at lags 2 and 6 that cross the bounds, but again the Ljung-Box test is required to confirm this.
x3 looks like white noise, since all lags are within the bounds and no large spikes fall outside them. However, the spike at lag 20 is borderline, so we cannot be completely sure; the Ljung-Box test is required once again.
It is clear, however, that none of these series has more than 5% of its spikes outside the bounds, so the 2nd condition is satisfied for all three. The 1st condition, about one or more large spikes crossing the bounds, cannot be settled visually, so the Ljung-Box test is required to confirm whether each series really is white noise.
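A minimal sketch of that check (the names x1, x2, x3 for the three plotted series are assumed; they are not defined in this document):
# hypothetical objects: x1, x2, x3 hold the three random-number series
for (x in list(x1, x2, x3)) {
  print(Box.test(x, lag = 10, type = "Ljung-Box"))  # H0: no autocorrelation (white noise)
}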
The critical values are centred on a mean of zero because we expect each autocorrelation of a white noise series to be close to zero; a spike that deviates far enough from zero to cross the bounds may indicate the data are autocorrelated (not white noise).
The autocorrelations and the critical values differ between the figures because the sample sizes differ: the bounds are +-2/sqrt(N), where N is the sample size, so the larger the N, the narrower the bounds. The fluctuation of the spikes at each lag also tends to shrink as the sample size grows.
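A quick sketch of how the bounds shrink with N (the sample sizes 36, 360, and 1,000 are assumed from the standard version of this exercise):
# 95% bounds for the ACF of a white noise series: +-2/sqrt(N)
N <- c(36, 360, 1000)
data.frame(N = N, bound = 2 / sqrt(N))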
ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggAcf shows that both conditions for white noise are violated: the autocorrelations are large at every lag and decay very slowly, which is the classic signature of a non-stationary series. ggPacf shows a single very large spike at lag 1, violating the 1st condition, which alone is enough to say the data are not white noise (not stationary). The series should therefore be differenced.
autoplot(ibmclose) # + autolayer(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
Box.test(ibmclose, type = c("Ljung-Box"))
##
## Box-Ljung test
##
## data: ibmclose
## X-squared = 367.11, df = 1, p-value < 2.2e-16
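As a sketch of the implied fix, forecast::ndiffs() estimates how many first differences are needed, and the differenced series can be re-checked:
ndiffs(ibmclose)                  # estimated number of first differences required
ggAcf(diff(ibmclose))             # ACF of the differenced series
Box.test(diff(ibmclose), type = "Ljung-Box")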
The data sets are generated with the following code.
# simulate an AR(1) with phi_1 = 0.6 and error variance 1
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
y
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] 0.000000000 0.818123693 1.750733181 1.935355795 0.778304985
## [6] 0.187865212 0.695718462 1.491346666 1.070980186 0.551181763
## [11] 1.208260372 0.955434253 0.545495272 0.670635464 0.298774950
## [16] 0.650644785 -0.213995766 -1.383048208 -0.636606908 0.965388454
## [21] 1.248386521 2.622791257 1.109780687 1.010431007 1.197863621
## [26] 0.892816207 1.688625561 -0.407721928 -2.231313452 -0.783508432
## [31] -1.094433287 -1.082789709 -0.151499491 -1.022531250 -0.817622554
## [36] -0.686311827 -0.001718324 -0.091683729 -1.511514193 -1.123975817
## [41] -1.104596824 1.505670479 1.790316999 -1.116293798 -1.057907490
## [46] -1.177534024 -2.986856518 -0.738838020 0.063507723 -0.534658415
## [51] -0.879884098 -0.526992087 -0.288159465 -0.415629111 0.308349963
## [56] 0.488099034 -1.381228842 0.752162099 0.948320015 1.911475750
## [61] 1.596326986 0.974267160 0.903681052 -0.267089739 0.083034640
## [66] -0.631678318 1.408528651 0.670087945 0.170090229 0.891703469
## [71] 0.068699791 0.296097411 -1.030075412 0.842359841 0.372255176
## [76] 0.115854416 -0.741962074 -1.217073061 -1.407538114 -0.676621702
## [81] -1.316654921 -0.355456752 1.167305332 -1.129610403 -1.143347940
## [86] -0.034693579 -0.341072559 -0.380792818 -0.274650591 1.105549231
## [91] 0.724730742 -0.691651596 -0.442948976 -0.821664212 0.340567110
## [96] 0.511240722 -2.047177517 -0.650056179 -0.626510300 -1.465252847
The following conditions hold for an AR(1) model:
1. When phi_1 = 0, y_t is white noise.
2. When phi_1 = 1 and c = 0, y_t is a random walk.
3. When phi_1 = 1 and c != 0, y_t is a random walk with drift.
4. When phi_1 < 0, y_t tends to oscillate around the mean.
Normally, we restrict AR(1) models to stationary data, which requires -1 < phi_1 < 1.
As you can see, as the parameter approaches 0, y_t tends to revolve around 0 and behaves like white noise; as the parameter approaches 1, y_t tends to become a random walk.
Mathematically, this makes sense. When phi_1 = 0, the equation becomes y_t = e_t, so y_t is indeed white noise (the error term is the only thing driving y_t). It is apparent from ggAcf(y2) and autoplot(y2) that the series has no particular seasonality or trend, and all spikes are within the bounds.
When phi_1 = 1, the equation becomes y_t = y_t-1 + e_t. Defining y't = y_t - y_t-1, we get y't = e_t: the first difference (the change between consecutive observations) equals the error term, so the differenced series is white noise. It is apparent from ggAcf(y3) and autoplot(y3) that the series has long periods of apparent trends up or down, as well as sudden and unpredictable changes in direction.
# 0.6
autoplot(y)
# lower - 0.5
y2 <- ts(numeric(100))
e2 <- rnorm(100)
for(i in 2:100)
y2[i] <- 0.5*y2[i-1] + e2[i]
autoplot(y2)
# 0.3
for(i in 2:100)
y2[i] <- 0.3*y2[i-1] + e2[i]
autoplot(y2)
# 0
for(i in 2:100)
y2[i] <- 0*y2[i-1] + e2[i]
# param = 0 graphs
autoplot(y2)
ggAcf(y2)
# higher - 0.7
y3 <- ts(numeric(100))
e3 <- rnorm(100)
for(i in 2:100)
y3[i] <- 0.7*y3[i-1] + e3[i]
autoplot(y3)
# higher - 0.8
for(i in 2:100)
y3[i] <- 0.8*y3[i-1] + e3[i]
autoplot(y3)
# higher - 1
for(i in 2:100)
y3[i] <- 1*y3[i-1] + e3[i]
# param = 1 graphs
autoplot(y3)
ggAcf(y3)
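As a quick check of the identity y't = e_t above, differencing the phi_1 = 1 series should leave approximately white noise:
y3diff <- diff(y3)     # y'_t = y_t - y_{t-1} = e_t for a random walk
autoplot(y3diff)
ggAcf(y3diff)          # spikes should now stay within the bounds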
# c. Write your own code to generate data from an MA(1) model with theta_1 = 0.6 and var = 1.
The plot below shows the simulated time series.
## MA(1) with theta_1 = 0.6 and error variance 1
ma.sim <- arima.sim(n=100, model=list(ma=0.6), sd=1)
autoplot(ma.sim)
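Since the exercise asks for hand-written code, here is an equivalent manual simulation (a sketch; the variable names are my own):
ma1 <- ts(numeric(100))
e <- rnorm(100)                    # white noise errors with variance 1
for(i in 2:100)
  ma1[i] <- e[i] + 0.6*e[i-1]      # MA(1): y_t = e_t + theta_1 * e_{t-1}
autoplot(ma1)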
# d. Produce a time plot for the series. How does the plot change as you change theta_1?
As theta_1 increases, the lag-1 autocorrelation grows and the series looks smoother, with consecutive observations moving together more; compare theta_1 = 0.95 with theta_1 = 0.05. Unlike an AR(1) as phi_1 approaches 1, though, an MA(1) remains stationary for every value of theta_1.
## set up plot region
par(mfrow=c(3,3))
## loop over values of theta_1
for(q in c(0.05, 0.1, 0.3, 0.7, 0.8, 0.95)) {
  MA.pos <- list(order=c(0,0,1), ma=q)
  ma.sim <- arima.sim(n=100, model=MA.pos, sd=1)
  plot.ts(ma.sim, ylab=paste("theta_1 = ", q, sep=""))
  acf(ma.sim)
  pacf(ma.sim)
}
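The theoretical ACF explains this behaviour: for an MA(1), rho_1 = theta_1/(1 + theta_1^2) and rho_k = 0 for k > 1, so a larger theta_1 only strengthens the lag-1 correlation. A quick confirmation with stats::ARMAacf():
# theoretical autocorrelations at lags 0-2 for several values of theta_1
sapply(c(0.05, 0.5, 0.95), function(th) ARMAacf(ma = th, lag.max = 2))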
# e. Generate data from an ARMA(1,1) model with phi_1 = 0.6, theta_1 = 0.6, var = 1
The data are generated below.
arma.pos <- list(order=c(1,0,1), ar=0.6, ma=0.6)
arma.sim <- arima.sim(n=100, model=arma.pos, sd=1)  # sd is an argument of arima.sim, not of the model list
Arima(arma.sim)
## Series: arma.sim
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 0.446
## s.e. 0.157
##
## sigma^2 estimated as 2.489: log likelihood=-186.99
## AIC=377.99 AICc=378.11 BIC=383.2
Box.test(arma.sim, type = c("Ljung-Box"))
##
## Box-Ljung test
##
## data: arma.sim
## X-squared = 49.59, df = 1, p-value = 1.895e-12
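Note that Arima() called with no order argument defaults to order = c(0,0,0), which is why the fit above contains no AR or MA terms. A sketch fitting the order that actually generated the data:
Arima(arma.sim, order = c(1,0,1))  # fit the true ARMA(1,1) order
# alternatively, auto.arima(arma.sim) would select an order automatically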
# f. Generate data from an AR(2) model with phi_1 = -0.8, phi_2 = 0.3, var = 1
The data are generated below.
# arima.sim() rejects these coefficients because the AR(2) is non-stationary,
# so the series is simulated manually below:
#ar.pos <- list(order=c(2,0,0), ar=c(-0.8, 0.3), sd=1)
#ar.sim <- arima.sim(n=100, model=ar.pos)
#Arima(ar.sim)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- (-0.8*y[i-1] + 0.3*y[i-2] + e[i])
head(y)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 0.0000000 0.0000000 -0.0883549 -0.2750319 0.9716661 -1.2132951
For the ARMA(1,1), the data keep revolving around the mean through the whole period (stationary), whereas the AR(2) data oscillate in a strong cyclic pattern whose amplitude grows toward the end (non-stationary).
autoplot(arma.sim)
autoplot(y)
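The non-stationarity of the AR(2) can be confirmed from its characteristic polynomial: stationarity requires both roots of 1 - phi_1 z - phi_2 z^2 to lie outside the unit circle. A quick check with the coefficients used above:
# roots of 1 + 0.8z - 0.3z^2 (phi_1 = -0.8, phi_2 = 0.3)
abs(polyroot(c(1, 0.8, -0.3)))  # one modulus is below 1, so the process is not stationary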
# 8.8. Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
# a. Use auto.arima() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
ARIMA(0,1,1) with drift was selected.
The Ljung-Box test fails to reject the null hypothesis that the residuals are uncorrelated, so the residuals are consistent with white noise. They also look roughly normally distributed.
Thus, we can conclude that the model should forecast reliably.
The forecast plot shows an upward trend over the next 10 periods.
model <- auto.arima(austa)
checkresiduals(model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
##
## Model df: 2. Total lags used: 7
pred <- forecast(model, h = 10)
autoplot(pred)
summary(pred)
##
## Forecast method: ARIMA(0,1,1) with drift
##
## Model Information:
## Series: austa
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## 0.3006 0.1735
## s.e. 0.1647 0.0390
##
## sigma^2 estimated as 0.03376: log likelihood=10.62
## AIC=-15.24 AICc=-14.46 BIC=-10.57
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008313383 0.1759116 0.1520309 -1.069983 5.513269 0.7461559
## ACF1
## Training set -0.000571993
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 7.108647 6.873183 7.344111 6.748536 7.468758
## 2017 7.282129 6.895831 7.668428 6.691337 7.872922
## 2018 7.455612 6.962652 7.948571 6.701695 8.209529
## 2019 7.629094 7.048756 8.209432 6.741543 8.516644
## 2020 7.802576 7.146393 8.458758 6.799031 8.806120
## 2021 7.976058 7.251932 8.700184 6.868603 9.083513
## 2022 8.149540 7.363321 8.935760 6.947121 9.351960
## 2023 8.323023 7.479266 9.166779 7.032609 9.613436
## 2024 8.496505 7.598893 9.394117 7.123725 9.869284
## 2025 8.669987 7.721572 9.618402 7.219512 10.120462
The Ljung-Box test for both models fails to reject the null hypothesis that the residuals are uncorrelated, so the residuals are consistent with white noise, and they look roughly normally distributed.
Thus, we can conclude that both models should forecast reliably.
The forecast plot in b. shows a flat trend over the next 10 periods, unlike a., where the trend was increasing.
model2 <- Arima(austa, order = c(0,1,1), include.drift = FALSE)
checkresiduals(model2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 3.8403, df = 6, p-value = 0.6983
##
## Model df: 1. Total lags used: 7
pred2 <- forecast(model2, h = 10)
autoplot(pred2)
summary(pred2)
##
## Forecast method: ARIMA(0,1,1)
##
## Model Information:
## Series: austa
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## 0.4936
## s.e. 0.1265
##
## sigma^2 estimated as 0.04833: log likelihood=3.73
## AIC=-3.45 AICc=-3.08 BIC=-0.34
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1150344 0.2136411 0.1714887 3.817629 5.649995 0.8416532
## ACF1
## Training set -0.1892256
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 7.015837 6.734107 7.297567 6.584969 7.446705
## 2017 7.015837 6.509440 7.522234 6.241369 7.790305
## 2018 7.015837 6.357426 7.674248 6.008885 8.022789
## 2019 7.015837 6.234446 7.797228 5.820803 8.210871
## 2020 7.015837 6.128347 7.903327 5.658539 8.373135
## 2021 7.015837 6.033643 7.998031 5.513701 8.517973
## 2022 7.015837 5.947300 8.084374 5.381651 8.650023
## 2023 7.015837 5.867430 8.164244 5.259501 8.772173
## 2024 7.015837 5.792765 8.238909 5.145310 8.886364
## 2025 7.015837 5.722403 8.309271 5.037701 8.993973
model3 <- Arima(austa, order = c(0,1,0), include.drift = FALSE)
checkresiduals(model3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0)
## Q* = 5.1592, df = 7, p-value = 0.6405
##
## Model df: 0. Total lags used: 7
pred3 <- forecast(model3, h = 10)
autoplot(pred3)
summary(pred3)
##
## Forecast method: ARIMA(0,1,0)
##
## Model Information:
## Series: austa
## ARIMA(0,1,0)
##
## sigma^2 estimated as 0.06423: log likelihood=-1.62
## AIC=5.24 AICc=5.36 BIC=6.8
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1674969 0.2498996 0.1981155 5.471861 6.479997 0.9723354
## ACF1
## Training set 0.2583422
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 6.858953 6.534151 7.183755 6.362211 7.355695
## 2017 6.858953 6.399614 7.318292 6.156454 7.561452
## 2018 6.858953 6.296379 7.421527 5.998571 7.719335
## 2019 6.858953 6.209349 7.508557 5.865469 7.852437
## 2020 6.858953 6.132673 7.585233 5.748204 7.969702
## 2021 6.858953 6.063354 7.654552 5.642189 8.075717
## 2022 6.858953 5.999607 7.718299 5.544697 8.173209
## 2023 6.858953 5.940274 7.777632 5.453955 8.263951
## 2024 6.858953 5.884547 7.833359 5.368727 8.349179
## 2025 6.858953 5.831839 7.886067 5.288117 8.429789
Given that d > 0, the constant is omitted by default anyway, but I set include.constant = FALSE explicitly. The estimation method also needs to be changed to maximum likelihood ('ML') or conditional sum of squares ('CSS'); otherwise Arima() returns an error for this model.
model4 <- Arima(austa, order = c(2,1,3), include.constant = FALSE, method = 'CSS')
checkresiduals(model4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,3)
## Q* = 3.634, df = 3, p-value = 0.3038
##
## Model df: 5. Total lags used: 8
pred4 <- forecast(model4, h = 10)
## Warning in predict.Arima(object, n.ahead = h): MA part of model is not
## invertible
autoplot(pred4)
summary(pred4)
##
## Forecast method: ARIMA(2,1,3)
##
## Model Information:
## Series: austa
## ARIMA(2,1,3)
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3
## 0.5138 0.7818 -0.1307 -0.9960 -0.4001
## s.e. 0.3718 0.5367 0.4625 0.5151 0.3345
##
## sigma^2 estimated as 0.03799: part log likelihood=9.24
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04981184 0.1779344 0.1390249 2.155024 4.803313 0.6823234
## ACF1
## Training set -0.1202229
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 7.416168 7.111898 7.720437 6.950828 7.881508
## 2017 8.036829 7.415544 8.658113 7.086655 8.987002
## 2018 8.778174 7.819677 9.736671 7.312279 10.244069
## 2019 9.644357 8.308738 10.979975 7.601705 11.687009
## 2020 10.669041 8.886710 12.451371 7.943201 13.394880
## 2021 11.872771 9.565477 14.180064 8.344070 15.401471
## 2022 13.292422 10.361632 16.223213 8.810165 17.774679
## 2023 14.963006 11.294913 18.631099 9.353142 20.572870
## 2024 16.931343 12.389743 21.472942 9.985566 23.877119
## 2025 19.248862 13.674560 24.823164 10.723703 27.774021
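The non-invertibility warning from the CSS fit can be checked directly: invertibility requires all roots of the MA polynomial 1 + theta_1 z + theta_2 z^2 + theta_3 z^3 to lie outside the unit circle. A sketch using the fitted coefficients:
# any root with modulus <= 1 means the MA part is not invertible
abs(polyroot(c(1, model4$coef[c("ma1", "ma2", "ma3")])))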
model4 <- Arima(austa, order = c(2,1,3), include.constant = FALSE, method = 'ML')
checkresiduals(model4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,3)
## Q* = 3.6694, df = 3, p-value = 0.2994
##
## Model df: 5. Total lags used: 8
pred4 <- forecast(model4, h = 10)
autoplot(pred4)
summary(pred4)
##
## Forecast method: ARIMA(2,1,3)
##
## Model Information:
## Series: austa
## ARIMA(2,1,3)
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3
## 0.0004 0.9996 0.4633 -0.9893 -0.4625
## s.e. 0.0031 0.0031 0.2283 0.0329 0.2261
##
## sigma^2 estimated as 0.03568: log likelihood=9.24
## AIC=-6.48 AICc=-3.48 BIC=2.85
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03972287 0.1724412 0.1402048 1.685829 4.592996 0.6881143
## ACF1
## Training set -0.07513543
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 7.178948 6.932158 7.425739 6.801515 7.556382
## 2017 7.316096 6.877422 7.754770 6.645201 7.986991
## 2018 7.510224 6.933117 8.087331 6.627615 8.392833
## 2019 7.647387 6.956180 8.338593 6.590278 8.704496
## 2020 7.841488 7.046627 8.636348 6.625853 9.057122
## 2021 7.978666 7.090007 8.867325 6.619579 9.337753
## 2022 8.172740 7.194381 9.151099 6.676469 9.669011
## 2023 8.309934 7.247714 9.372154 6.685409 9.934459
## 2024 8.503981 7.359766 9.648197 6.754055 10.253908
## 2025 8.641190 7.419069 9.863312 6.772117 10.510264
Note that neither of these models has residuals resembling white noise: the Ljung-Box p-values are essentially zero, so the residuals are strongly autocorrelated, and they do not look normally distributed either. Since these assumptions are not met, the models are not reliable for prediction. Note that for (0,0,0) the point forecast is flat from the start, whereas for (0,0,1) it becomes flat for h > 1.
model5 <- Arima(austa, order = c(0,0,1), include.constant = TRUE)
checkresiduals(model5)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 116.97, df = 5, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 7
pred5 <- forecast(model5, h = 10)
autoplot(pred5)
summary(pred5)
##
## Forecast method: ARIMA(0,0,1) with non-zero mean
##
## Model Information:
## Series: austa
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 1.0000 3.5515
## s.e. 0.0907 0.3090
##
## sigma^2 estimated as 0.9347: log likelihood=-50.64
## AIC=107.28 AICc=108.03 BIC=112.04
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01997167 0.9395462 0.8063899 -26.87733 42.56419 3.957698
## ACF1
## Training set 0.8109528
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 4.997220 3.741604 6.252836 3.076921 6.917519
## 2017 3.551491 1.799302 5.303680 0.871749 6.231232
## 2018 3.551491 1.799302 5.303680 0.871749 6.231232
## 2019 3.551491 1.799302 5.303680 0.871749 6.231232
## 2020 3.551491 1.799302 5.303680 0.871749 6.231232
## 2021 3.551491 1.799302 5.303680 0.871749 6.231232
## 2022 3.551491 1.799302 5.303680 0.871749 6.231232
## 2023 3.551491 1.799302 5.303680 0.871749 6.231232
## 2024 3.551491 1.799302 5.303680 0.871749 6.231232
## 2025 3.551491 1.799302 5.303680 0.871749 6.231232
model6 <- Arima(austa, order = c(0,0,0), include.constant = TRUE)
checkresiduals(model6)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 137.9, df = 6, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 7
pred6 <- forecast(model6, h = 10)
autoplot(pred6)
summary(pred6)
##
## Forecast method: ARIMA(0,0,0) with non-zero mean
##
## Model Information:
## Series: austa
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 3.5414
## s.e. 0.2972
##
## sigma^2 estimated as 3.27: log likelihood=-71.9
## AIC=147.8 AICc=148.17 BIC=150.97
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 5.00254e-13 1.783072 1.580774 -52.92811 82.83469 7.758317
## ACF1
## Training set 0.9099807
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2017 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2018 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2019 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2020 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2021 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2022 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2023 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2024 3.541428 1.223915 5.858941 -0.002902594 7.085758
## 2025 3.541428 1.223915 5.858941 -0.002902594 7.085758
The Ljung-Box test fails to reject white noise for the residuals, so the model should be a reliable predictor. Note that this model's forecast has an increasing trend.
model7 <- Arima(austa, order = c(0,2,1), include.constant = FALSE)
checkresiduals(model7)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 3.8054, df = 6, p-value = 0.703
##
## Model df: 1. Total lags used: 7
pred7 <- forecast(model7, h = 10)
autoplot(pred7)
summary(pred7)
##
## Forecast method: ARIMA(0,2,1)
##
## Model Information:
## Series: austa
## ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.7263
## s.e. 0.2277
##
## sigma^2 estimated as 0.03969: log likelihood=6.74
## AIC=-9.48 AICc=-9.09 BIC=-6.43
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02907781 0.1907537 0.1567877 1.216128 4.984081 0.7695018
## ACF1
## Training set 0.1284994
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 7.179291 6.923960 7.434622 6.788796 7.569786
## 2017 7.499628 7.086148 7.913108 6.867265 8.131992
## 2018 7.819966 7.248051 8.391881 6.945297 8.694634
## 2019 8.140304 7.403196 8.877411 7.012995 9.267612
## 2020 8.460641 7.549896 9.371387 7.067776 9.853506
## 2021 8.780979 7.687705 9.874253 7.108960 10.452997
## 2022 9.101316 7.816610 10.386023 7.136527 11.066106
## 2023 9.421654 7.936765 10.906543 7.150712 11.692596
## 2024 9.741992 8.048388 11.435595 7.151848 12.332135
## 2025 10.062329 8.151718 11.972940 7.140302 12.984357
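To compare the candidate models on a single scale, here is a sketch collecting AICc from the fitted objects above (model4 must be the ML refit, since the CSS fit reports no AICc; note also that AICc is only comparable between models with the same order of differencing):
sapply(list(auto = model, m011 = model2, m010 = model3, m213 = model4,
            m001 = model5, m000 = model6, m021 = model7),
       function(m) m$aicc)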