library(forecast)
## Warning: package 'forecast' was built under R version 4.0.4
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(ggplot2)
library(TTR)
## Warning: package 'TTR' was built under R version 4.0.4
library(stats)
library(readr)
library(fpp)
## Warning: package 'fpp' was built under R version 4.0.5
## Loading required package: fma
## Warning: package 'fma' was built under R version 4.0.5
## Loading required package: expsmooth
## Warning: package 'expsmooth' was built under R version 4.0.5
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 4.0.4
library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.5
##
## Attaching package: 'fpp2'
## The following objects are masked from 'package:fpp':
##
## ausair, ausbeer, austa, austourists, debitcards, departures,
## elecequip, euretail, guinearice, oil, sunspotarea, usmelec
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.5
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(psych)
## Warning: package 'psych' was built under R version 4.0.5
##
## Attaching package: 'psych'
## The following object is masked from 'package:seasonal':
##
## outlier
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(fastDummies)
## Warning: package 'fastDummies' was built under R version 4.0.5
library(urca)
## Warning: package 'urca' was built under R version 4.0.4
clearing global first since it hasnt been in a while
rm(list=ls())
ses.pigs <- ses(pigs, h = 4)
summary(ses.pigs)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
s <- sd(ses.pigs$residuals)
pigs.point.value <- ses.pigs$mean[1]
pigs.upper <- pigs.point.value + 1.96*s
pigs.lower <- pigs.point.value - 1.96*s
pigs.upper
## [1] 118952.8
pigs.lower
## [1] 78679.97
ses.function <- function(y, alpha, level){
y.hat <- level
for (i in 1:length(y)){
y.hat <- alpha*y[i] + (1 - alpha)*y.hat
}
return(y.hat)
}
pig.alpha <- ses.pigs$model$par[1]
pig.level <- ses.pigs$model$par[2]
ses.function(y = pigs, alpha = pig.alpha, level = pig.level)
## alpha
## 98816.41
If we follow the equation from 7.2 to create the function and use the same alpha and level that we found fom the first model our point estimation is the same
Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of α and ℓ. Do you get the same values as the ses() function?
Combine your previous two functions to produce a function which both finds the optimal values of α and ℓ0, and produces a forecast of the next observation in the series
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
summary(books)
## Paperback Hardcover
## Min. :111.0 Min. :128.0
## 1st Qu.:167.2 1st Qu.:170.5
## Median :189.0 Median :200.5
## Mean :186.4 Mean :198.8
## 3rd Qu.:207.2 3rd Qu.:222.0
## Max. :247.0 Max. :283.0
autoplot(books)
double time series where it seems as if they follow some sort of seasonal trend both.
books.paperback <- books[,1]
books.hardcover <- books[,2]
ses.books.paperback <- ses(books.paperback)
summary(ses.books.paperback)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books.paperback)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
## 35 207.1097 160.0215 254.1979 135.0945 279.1249
## 36 207.1097 159.4247 254.7946 134.1818 280.0375
## 37 207.1097 158.8353 255.3840 133.2804 280.9389
## 38 207.1097 158.2531 255.9663 132.3899 281.8294
## 39 207.1097 157.6777 256.5417 131.5099 282.7094
## 40 207.1097 157.1089 257.1105 130.6400 283.5793
ses.books.hardcover <- ses(books.hardcover)
summary(ses.books.hardcover)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books.hardcover)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
## 35 239.5601 188.8895 290.2306 162.0662 317.0540
## 36 239.5601 187.0164 292.1038 159.2014 319.9188
## 37 239.5601 185.2077 293.9124 156.4353 322.6848
## 38 239.5601 183.4574 295.6628 153.7584 325.3618
## 39 239.5601 181.7600 297.3602 151.1625 327.9577
## 40 239.5601 180.1111 299.0091 148.6406 330.4795
autoplot(ses.books.paperback)
autoplot(ses.books.hardcover)
rsme.paperback <- sqrt(mean(ses.books.paperback$residuals^2))
rsme.hardcover <- sqrt(mean(ses.books.hardcover$residuals^2))
rsme.paperback
## [1] 33.63769
rsme.hardcover
## [1] 31.93101
holt.paperback <- holt(books.paperback, h = 4)
summary(holt.paperback)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books.paperback, h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
holt.hardcover <- holt(books.hardcover, h = 4)
summary(holt.hardcover)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books.hardcover, h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
holt.rsme.paper <- sqrt(mean(holt.paperback$residuals^2))
holt.rsme.hard <- sqrt(mean(holt.hardcover$residuals^2))
holt.rsme.paper
## [1] 31.13692
holt.rsme.hard
## [1] 27.19358
We can see from our plots from question 5 that there is a trend in the time series data. Similarly, if we are to go off of rsme it is lower for both values in the holt model even with an extra parameter. We should use holt’s model if there is a trend otherwise for non-obvious trended data it might be better to use a ses()
first comparing paperback forecasts
summary(ses.books.paperback)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books.paperback)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
## 35 207.1097 160.0215 254.1979 135.0945 279.1249
## 36 207.1097 159.4247 254.7946 134.1818 280.0375
## 37 207.1097 158.8353 255.3840 133.2804 280.9389
## 38 207.1097 158.2531 255.9663 132.3899 281.8294
## 39 207.1097 157.6777 256.5417 131.5099 282.7094
## 40 207.1097 157.1089 257.1105 130.6400 283.5793
summary(holt.paperback)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books.paperback, h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
now compare hardcovers
summary(ses.books.hardcover)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books.hardcover)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
## 35 239.5601 188.8895 290.2306 162.0662 317.0540
## 36 239.5601 187.0164 292.1038 159.2014 319.9188
## 37 239.5601 185.2077 293.9124 156.4353 322.6848
## 38 239.5601 183.4574 295.6628 153.7584 325.3618
## 39 239.5601 181.7600 297.3602 151.1625 327.9577
## 40 239.5601 180.1111 299.0091 148.6406 330.4795
summary(holt.hardcover)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books.hardcover, h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
for example, what if we plot one out
autoplot(holt.hardcover) + autolayer(books.hardcover) + autolayer(ses.books.hardcover, PI = FALSE)
We can see that the holts model (trended line) does a much better job with predictions. We would imagine the same thing for the hardcover since there was a trend in that one as well
paperback comparison
holt.paperback$upper[1, "95%"]
## 95%
## 275.0205
holt.paperback$lower[1, "95%"]
## 95%
## 143.913
rsme.paperback.upper <- holt.paperback$mean[1] + 1.96*holt.rsme.paper
rsme.paperback.lower <- holt.paperback$mean[1] - 1.96*holt.rsme.paper
rsme.paperback.upper
## [1] 270.4951
rsme.paperback.lower
## [1] 148.4384
ses interval comparison
ses.books.paperback$upper[1, "95%"]
## 95%
## 275.3523
ses.books.paperback$lower[1, "95%"]
## 95%
## 138.867
Our 95% usign the RSME has slightly smaller intervals.
eggs.data <- eggs
summary(eggs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 62.27 148.87 209.15 206.15 276.71 358.78
autoplot(eggs)
eggs is yearly data
eggs.1 <- holt(eggs, h = 100)
autoplot(eggs.1) + autolayer(eggs)
predictions go below 0 so that is not good dampened trend
eggs.damp <- holt(eggs, damped = TRUE, phi = 0.83, h = 100)
autoplot(eggs.damp) + autolayer(eggs)
after trying multiple values of phi, the dampened holt model gives me a flat line each time. However, the point predictions do not go below 0 at least but the intervals still do
trying boxcox dampened
eggs.bc.damp <- holt(eggs, lambda = BoxCox.lambda(eggs), damped = TRUE, h = 100)
autoplot(eggs.bc.damp) + autolayer(eggs)
Lower boundary does a much better job this time in regards to realistic predictions (above 0) but still not super useful. However, we can see a very slight downward trend in point prediction line so overall box cox transformed + dampened yielded best results
Why is multiplicative seasonality necessary for this series? The magnitude of seasonality was changing so to model that accurately was best done with multiplicative. Additive seasonality works better when magnitude doesnt change
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
retaildata <- readxl::read_excel("C:\\Users\\wanga\\OneDrive\\Desktop\\PREDICTIVE ANALYTICS\\retail.xlsx", skip=1)
tsretail <- ts(retaildata[,"A3349336V"], frequency=12, start=c(1982,4))
autoplot(tsretail)
using holts and dampened
holt.retail <- hw(tsretail, seasonal = "multiplicative")
holt.retail.damp <- hw(tsretail, seasonal = "multiplicative", damped = TRUE)
autoplot(tsretail) + autolayer(holt.retail, series = "Multiplicative holts, damped = FALSE", PI = FALSE) + autolayer(holt.retail.damp, series = "Multiplicative holts, damped = TRUE", PI = FALSE)
We can see that the non damped peaks are higher than the damped ones
rsme.hw.retail <- sqrt(mean(holt.retail$residuals^2))
rsme.hw.retail
## [1] 0.05325072
rsme.hw.retail.damped <- sqrt(mean(holt.retail.damp$residuals^2))
rsme.hw.retail.damped
## [1] 0.05372678
The RSMEs are essentially the same. In short term, either model would be fine but if projecting long term sales, the damped model would be better
First is the non damped then damped plots
checkresiduals(holt.retail)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 48.323, df = 8, p-value = 8.57e-08
##
## Model df: 16. Total lags used: 24
checkresiduals(holt.retail.damp)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 50.505, df = 7, p-value = 1.15e-08
##
## Model df: 17. Total lags used: 24
Both residuals look almost exactly alike but the damped residuals has one less significant lag. However, the non damped residual histogram is more normally shaped.
setting up train and test sets and getting baseline goal
tsretail.train <- window(tsretail, end=c(2010,12))
tsretail.test <- window(tsretail, start=2011)
fc <- snaive(tsretail.train)
accuracy(fc, tsretail.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 13.51592 25.11534 19.49850 4.932556 7.778814 1.000000 0.5717573
## Test set -42.55833 45.76343 42.55833 -9.898213 9.898213 2.182647 0.1302427
## Theil's U
## Training set NA
## Test set 0.5709353
comparing hw method
hw.fc <- hw(tsretail.train, seasonal = "multiplicative", damped = TRUE)
accuracy(hw.fc, tsretail.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.115971 13.55181 9.841741 0.1805902 4.150258 0.5047435
## Test set -42.342749 44.73978 42.342749 -9.8406154 9.840615 2.1715902
## ACF1 Theil's U
## Training set 0.05881823 NA
## Test set 0.51543084 0.5740024
The holts winter damped model does a better job than the seasonal naive across all measurements
fc.tsretail.train <- stlm(tsretail.train, s.window = 13, robust = TRUE, method = "ets", lambda = BoxCox.lambda(tsretail.train))
fc.tsretail.train %>% forecast(h = 36, lambda = BoxCox.lambda(tsretail.train)) %>% autoplot()
## Warning in InvBoxCox(fcast$mean, lambda, biasadj, fcast): biasadj information
## not found, defaulting to FALSE.
fc.tsretail.train.values <- forecast(fc.tsretail.train, h = 36, lambda = BoxCox.lambda(tsretail.train))
## Warning in InvBoxCox(fcast$mean, lambda, biasadj, fcast): biasadj information
## not found, defaulting to FALSE.
fc.tsretail.train.values
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2011 494.7580 465.7418 525.5305 451.0589 542.5665
## Feb 2011 416.2886 389.7528 444.5793 376.3797 460.3041
## Mar 2011 445.6222 415.2818 478.1157 400.0443 496.2384
## Apr 2011 432.7747 401.4868 466.4304 385.8268 485.2641
## May 2011 470.3515 434.5847 508.9768 416.7371 530.6558
## Jun 2011 509.6897 469.1336 553.6499 448.9541 578.3928
## Jul 2011 486.3406 445.8895 530.3544 425.8213 555.1990
## Aug 2011 476.1973 434.9722 521.2151 414.5770 546.6958
## Sep 2011 458.5043 417.2982 503.6598 396.9677 529.2864
## Oct 2011 483.7487 438.8404 533.1173 416.7374 561.2020
## Nov 2011 524.8653 474.6833 580.1958 450.0415 611.7429
## Dec 2011 765.2840 690.6420 847.7543 654.0478 894.8490
## Jan 2012 521.8167 468.9733 580.4373 443.1468 614.0145
## Feb 2012 439.1642 393.3083 490.2070 370.9558 519.5189
## Mar 2012 470.0639 419.8343 526.1234 395.4000 558.3806
## Apr 2012 456.5308 406.5466 512.4762 382.2856 544.7372
## May 2012 496.1112 440.6778 558.3082 413.8231 594.2405
## Jun 2012 537.5422 476.3043 606.4162 446.6925 646.2778
## Jul 2012 512.9515 453.2366 580.2964 424.4224 619.3532
## Aug 2012 502.2683 442.6085 569.7261 413.8791 608.9252
## Sep 2012 483.6328 425.0410 550.0563 396.8830 588.7303
## Oct 2012 510.2217 447.3669 581.6388 417.2131 623.2909
## Nov 2012 553.5240 484.2812 632.3654 451.1171 678.4204
## Dec 2012 806.6388 704.9932 922.4923 656.3482 990.2197
## Jan 2013 550.3134 479.1588 631.7109 445.2025 679.4263
## Feb 2013 463.2613 402.1574 533.3642 373.0632 574.5485
## Mar 2013 495.8083 429.5296 571.9994 398.0203 616.8274
## Apr 2013 481.5541 416.1858 556.8728 385.1652 601.2646
## May 2013 523.2419 451.3509 606.2299 417.2842 655.2099
## Jun 2013 566.8738 488.0682 658.0098 450.7779 711.8727
## Jul 2013 540.9773 464.6735 629.4224 428.6312 681.7847
## Aug 2013 529.7263 453.9955 617.6971 418.2841 669.8632
## Sep 2013 510.0996 436.1810 596.1548 401.3840 647.2696
## Oct 2013 538.1025 459.2689 630.0456 422.2104 684.7316
## Nov 2013 583.7033 497.3365 684.5997 456.7892 744.6858
## Dec 2013 850.1652 724.0794 997.5201 664.9031 1085.2990
comparing accuracy below
accuracy(fc.tsretail.train.values, tsretail.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1259679 11.41780 8.278065 -0.1266136 3.531921 0.4245489
## Test set -85.5004980 92.06434 85.500498 -19.5488790 19.548879 4.3849786
## ACF1 Theil's U
## Training set 0.001568912 NA
## Test set 0.730119222 1.179901
using RSME and MASE as a metric, this model performed even better than the holts winter multiplicative model
head(ukcars)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1977 330.371 371.051 270.670 343.880
## 1978 358.491 362.822
autoplot(ukcars)
Lots of seasonality in the data and a very clear trend
stl.ukcars <- ukcars %>% stl(s.window = "periodic", robust = TRUE)
stl.ukcars.seasadj <- stl.ukcars %>% seasadj()
etsAAN.stl.ukcars.seasadj <- stlf(stl.ukcars.seasadj, etsmodel = "AAN", damped = TRUE, h = 8)
autoplot(stl.ukcars.seasadj) + autolayer(etsAAN.stl.ukcars.seasadj, series = "ETS AAN model")
holt.stl.ukcars.seasadj <- holt(stl.ukcars.seasadj, h = 8)
autoplot(stl.ukcars.seasadj) + autolayer(holt.stl.ukcars.seasadj, series = "Holts linear damped = FALSE")
ets.ukcars <- ets(ukcars)
summary(ets.ukcars)
## ETS(A,N,A)
##
## Call:
## ets(y = ukcars)
##
## Smoothing parameters:
## alpha = 0.6199
## gamma = 1e-04
##
## Initial states:
## l = 314.2568
## s = -1.7579 -44.9601 21.1956 25.5223
##
## sigma: 25.9302
##
## AIC AICc BIC
## 1277.752 1278.819 1296.844
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
## ACF1
## Training set 0.02573334
fc.ets.ukcars <- forecast(ets.ukcars, h = 8)
autoplot(ukcars) + autolayer(fc.ets.ukcars, series = "ETS ANA forecasts")
ETS function automatically selected model specificataion “ANA”
accuracy(etsAAN.stl.ukcars.seasadj)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.551267 23.32113 18.48987 0.07585736 5.955086 0.602576 0.02262668
accuracy(holt.stl.ukcars.seasadj)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2697968 25.32878 20.05443 -0.6328861 6.474306 0.653564
## ACF1
## Training set 0.0276835
accuracy(ets.ukcars)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
## ACF1
## Training set 0.02573334
The first model “AAN” had the lowest RSME
Let us refer back to the plots of the three above models. We can dismiss the linear holt model off the bat as it does not account for seasonality. That leaves us with either the ETS AAN or ANA model. They both look similar but we might note that towards the later section of our plot, the magnitude of seasonality becomes smaller. Our AAN model (first one) captures this while the second one (ANA) does not. Therefore I would use the ets AAN model
checkresiduals(etsAAN.stl.ukcars.seasadj)
## Warning in checkresiduals(etsAAN.stl.ukcars.seasadj): The fitted degrees of
## freedom is based on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,Ad,N)
## Q* = 24.138, df = 3, p-value = 2.337e-05
##
## Model df: 5. Total lags used: 8
We can see seasonality in the ACF plots and p-value would indicate that the residuals are not white noise
str(visitors)
## Time-Series [1:240] from 1985 to 2005: 75.7 75.4 83.1 82.9 77.3 ...
autoplot(visitors)
It already seems to be in time series format. Clear upwards trend and seasonality. Seasonality increases in magnitude so it may be best to use multiplicative methods or perform boxcox transformations
visitors.train <- visitors[1:216]
visitors.test <- visitors[217:240]
ts.visitors.train <- ts(visitors.train, frequency = 12, start = c(1985, 5))
ts.visitors.test <- ts(visitors.test, frequency = 12, start = c(2003, 5))
hw.visit.test <- hw(ts.visitors.train, h = 24, seasonal = "multiplicative")
summary(hw.visit.test)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = ts.visitors.train, h = 24, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4379
## beta = 0.0164
## gamma = 1e-04
##
## Initial states:
## l = 90.9387
## b = 2.584
## s = 0.945 1.0553 1.0882 0.9681 1.3072 1.0711
## 1.0264 0.9095 0.938 1.0401 0.8509 0.8001
##
## sigma: 0.0571
##
## AIC AICc BIC
## 2326.608 2329.699 2383.988
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.9749466 14.06539 10.35763 -0.5792169 4.223204 0.3970304
## ACF1
## Training set 0.1356528
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## May 2003 293.5512 272.0790 315.0235 260.7123 326.3902
## Jun 2003 311.3817 286.3466 336.4168 273.0939 349.6695
## Jul 2003 379.6865 346.4414 412.9316 328.8425 430.5305
## Aug 2003 341.5672 309.2321 373.9023 292.1150 391.0195
## Sep 2003 330.3820 296.7647 363.9994 278.9688 381.7953
## Oct 2003 371.9441 331.4615 412.4267 310.0313 433.8569
## Nov 2003 387.1607 342.2705 432.0509 318.5070 455.8144
## Dec 2003 471.3185 413.3054 529.3316 382.5951 560.0419
## Jan 2004 348.1741 302.8173 393.5309 278.8068 417.5414
## Feb 2004 390.3863 336.7055 444.0672 308.2885 472.4841
## Mar 2004 377.6150 322.9349 432.2951 293.9890 461.2410
## Apr 2004 337.3049 285.9782 388.6316 258.8075 415.8024
## May 2004 284.8762 239.4088 330.3437 215.3398 354.4127
## Jun 2004 302.1571 251.6621 352.6521 224.9316 379.3825
## Jul 2004 368.4105 304.0470 432.7739 269.9751 466.8459
## Aug 2004 331.3981 270.9579 391.8384 238.9627 423.8335
## Sep 2004 320.5215 259.5777 381.4653 227.3160 413.7270
## Oct 2004 360.8154 289.3783 432.2525 251.5618 470.0690
## Nov 2004 375.5478 298.2123 452.8832 257.2733 493.8222
## Dec 2004 457.1458 359.3352 554.9564 307.5574 606.7342
## Jan 2005 337.6781 262.6845 412.6717 222.9853 452.3709
## Feb 2005 378.5881 291.3958 465.7805 245.2390 511.9373
## Mar 2005 366.1740 278.7938 453.5542 232.5375 499.8105
## Apr 2005 327.0594 246.2591 407.8596 203.4861 450.6326
autoplot(ts.visitors.train) + autolayer(ts.visitors.test, series = "True values") + autolayer(hw.visit.test, series = "Forecasted values", PI = FALSE) + xlim(2003, 2007)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
As mentioned above, the level of seasonality over time becomes greater. Additive cant capture this properly so we have to use multiplicative
Forecast the two-year test set using each of the following methods:
an ETS model;
ets.visitors <- ets(ts.visitors.train)
fc.ets.visitors <- forecast(ets.visitors, h = 24)
autoplot(ts.visitors.train) + autolayer(fc.ets.visitors, PI = FALSE) + autolayer(ts.visitors.test, series = "actual values")
ets.add.visitors.train <- ets(ts.visitors.train, additive.only = TRUE, lambda = BoxCox.lambda(ts.visitors.train))
fc.ets.add.visitors.train <- forecast(ets.add.visitors.train, h = 24)
autoplot(ts.visitors.train) + autolayer(fc.ets.add.visitors.train, PI = FALSE) + autolayer(ts.visitors.test, series = "actual values")
sn.visitors <- snaive(ts.visitors.train)
autoplot(ts.visitors.train) + autolayer(sn.visitors, PI = FALSE) + autolayer(ts.visitors.test)
stlf.visitors <- stlf(ts.visitors.train, s.window = 13, robust = TRUE, method = "ets", lambda = BoxCox.lambda(ts.visitors.train))
fc.stlf.visitors <- forecast(stlf.visitors, h = 24, lambda = BoxCox.lambda(ts.visitors.train))
autoplot(stlf.visitors) + autolayer(fc.stlf.visitors, PI = FALSE) + autolayer(ts.visitors.test, series = "actual values")
for some reason, all my models have underestimated but it looks here as if the seasonal naive does the best job in predictions. this could change
it seems like the stlf model does the best based off the accuracy charts
checkresiduals(stlf.visitors)
## Warning in checkresiduals(stlf.visitors): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,Ad,N)
## Q* = 18.517, df = 19, p-value = 0.4882
##
## Model df: 5. Total lags used: 24
Residuals look like white noise and p value large so it seems to pass
accuracy(fc.ets.visitors, ts.visitors.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7640074 14.53480 10.57657 0.1048224 3.994788 0.405423
## Test set 72.1992664 80.23124 74.55285 15.9202832 16.822384 2.857773
## ACF1 Theil's U
## Training set -0.05311217 NA
## Test set 0.58716982 1.127269
accuracy(fc.ets.add.visitors.train, ts.visitors.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.001363 14.97096 10.82396 0.1609336 3.974215 0.4149057
## Test set 69.458843 78.61032 72.41589 15.1662261 16.273089 2.7758586
## ACF1 Theil's U
## Training set -0.02535299 NA
## Test set 0.67684148 1.086953
accuracy(sn.visitors, ts.visitors.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 17.29363 31.15613 26.08775 7.192445 10.285961 1.000000 0.6327669
## Test set 32.87083 50.30097 42.24583 6.640781 9.962647 1.619375 0.5725430
## Theil's U
## Training set NA
## Test set 0.6594016
accuracy(fc.stlf.visitors, ts.visitors.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.5803348 13.36431 9.551391 0.08767744 3.51950 0.3661256
## Test set 76.3637263 84.24658 78.028992 16.87750474 17.51578 2.9910209
## ACF1 Theil's U
## Training set -0.05924203 NA
## Test set 0.64749552 1.178154
If we are to refer to the RSME value than the stlf model performed the best of the bunch.
autoplot(qcement)
fets <- function(y, h) {
forecast(ets(y), h = h)
}
tsCV.ets.qcement <- tsCV(qcement, forecastfunction = fets, h = 4)
tsCV.sn.qcement <- tsCV(qcement, forecastfunction = snaive, h = 4)
summary(tsCV.ets.qcement)
## h=1 h=2 h=3 h=4
## Min. :-0.284697 Min. :-0.505359 Min. :-0.467304 Min. :-0.45008
## 1st Qu.:-0.040635 1st Qu.:-0.043311 1st Qu.:-0.052818 1st Qu.:-0.05847
## Median : 0.003049 Median : 0.005511 Median : 0.017342 Median : 0.01804
## Mean : 0.001308 Mean : 0.003296 Mean : 0.002916 Mean : 0.00374
## 3rd Qu.: 0.046562 3rd Qu.: 0.065395 3rd Qu.: 0.076212 3rd Qu.: 0.07748
## Max. : 0.273002 Max. : 0.307940 Max. : 0.316248 Max. : 0.34043
## NA's :1 NA's :2 NA's :3 NA's :4
MSE.tsCV.ets.qcement <- mean(tsCV.ets.qcement^2, na.rm = TRUE)
MSE.tsCV.sn.qcement <- mean(tsCV.sn.qcement^2, na.rm = TRUE)
The ets forecast seems to be more accurate. Our autoplot revealed increasing magnitude of seasonality so some sort of multiplicative model should do better than a naive model. The NA’s seem to correspond to the number of steps ahead that our model is predicting. I am not completely certain on why this occurs but it looks similar to the tables on 7.1 and 7.2
ausbeer
str(ausbeer)
## Time-Series [1:218] from 1956 to 2010: 284 213 227 308 262 228 236 320 272 233 ...
autoplot(ausbeer)
ausbeer.train <- ts(ausbeer[1:206], frequency = 4, start = c(1956,1))
ausbeer.test <- ts(ausbeer[207:218], frequency = 4, start = c(2007,3))
ets.ausbeer <- ets(ausbeer.train)
fc.ets.ausbeer <- forecast(ets.ausbeer, h = 12)
sn.ausbeer <- snaive(ausbeer.train, h = 12)
stlf.ausbeer <- stlf(ausbeer.train, h = 12, lambda = BoxCox.lambda(ausbeer.train))
autoplot(ausbeer.train) + autolayer(fc.ets.ausbeer, series = "ets", PI = FALSE) + autolayer(sn.ausbeer, series = "snaive", PI = FALSE) + autolayer(stlf.ausbeer, series = "stlf", PI = FALSE) + autolayer(ausbeer.test, series = "actual") + xlim(2006, 2011)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
bricksq
str(bricksq)
## Time-Series [1:155] from 1956 to 1994: 189 204 208 197 187 214 227 223 199 229 ...
autoplot(bricksq)
bricksq.train <- ts(bricksq[1:143], frequency = 4, start = c(1956,1))
bricksq.test <- ts(bricksq[144:155], frequency = 4, start = c(1991,4))
ets.bricksq <- ets(bricksq.train)
fc.ets.bricksq <- forecast(ets.bricksq, h = 12)
sn.bricksq <- snaive(bricksq.train, h = 12)
stlf.bricksq <- stlf(bricksq.train, h = 12, lambda = BoxCox.lambda(bricksq.train))
autoplot(bricksq.train) + autolayer(fc.ets.bricksq, series = "ets", PI = FALSE) + autolayer(sn.bricksq, series = "snaive", PI = FALSE) + autolayer(stlf.bricksq, series = "stlf", PI = FALSE) + autolayer(bricksq.test, series = "actual") + xlim(1991, 1996)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
dole
str(dole)
## Time-Series [1:439] from 1956 to 1992: 4742 6128 6494 5379 6011 ...
autoplot(dole)
dole.train<- ts(dole[1:403], frequency = 12, start = c(1956,1))
dole.test <- ts(dole[404:439], frequency = 12, start = c(1989,8))
ets.dole <- ets(dole.train)
fc.ets.dole <- forecast(ets.dole, h = 36)
sn.dole <- snaive(dole.train, h = 36)
stlf.dole <- stlf(dole.train, h = 36, lambda = BoxCox.lambda(dole.train))
autoplot(dole.train) + autolayer(fc.ets.dole, series = "ets", PI = FALSE) + autolayer(sn.dole, series = "snaive", PI = FALSE) + autolayer(stlf.dole, series = "stlf", PI = FALSE) + autolayer(dole.test, series = "actual") + xlim(1988, 1993)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
bicoal %>% ets() %>% forecast() %>% autoplot()
chicken
chicken %>% ets() %>% forecast() %>% autoplot()
dole
dole %>% ets() %>% forecast() %>% autoplot()
usdeaths
usdeaths %>% ets() %>% forecast() %>% autoplot()
lynx
lynx %>% ets() %>% forecast() %>% autoplot()
ibmclose
ibmclose %>% ets() %>% forecast() %>% autoplot()
eggs
eggs %>% ets() %>% forecast() %>% autoplot()
Not always. A few ets models are able to capture seasonality but some of them also have straight line predictions which seem to essentially just be naive models
str(eggs)
## Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...
It did not work well on eggs. Just by looking at the plot, it is hard to distinguish any sort of seasonality. Furthermore, price specifically affected by inflation and demand. With the data seemingly looking random (dont quote me on this) then the ets model may have a hard time finding a pattern. Perhaps something power transformed/seasonally adjusted/inflation adjusted would have a better prediction.
I will use the tsretail example from earlier
holt.retail <- hw(tsretail, seasonal = “multiplicative”)
summary(holt.retail)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = tsretail, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4297
## beta = 0.0067
## gamma = 0.1849
##
## Initial states:
## l = 103.7415
## b = 0.5713
## s = 0.9592 0.867 0.9124 1.4858 1.0326 0.9468
## 0.9277 0.9808 0.9907 0.9776 1.0141 0.9053
##
## sigma: 0.0544
##
## AIC AICc BIC
## 4249.805 4251.491 4316.832
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03233263 13.58846 9.970385 -0.1710066 4.010066 0.5047153
## ACF1
## Training set 0.04424191
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 424.9538 395.3247 454.5830 379.6399 470.2677
## Feb 2014 365.4465 337.6432 393.2498 322.9250 407.9680
## Mar 2014 395.6692 363.1791 428.1593 345.9799 445.3585
## Apr 2014 381.2116 347.7055 414.7178 329.9684 432.4549
## May 2014 417.4747 378.4541 456.4952 357.7979 477.1514
## Jun 2014 457.9087 412.6335 503.1838 388.6663 527.1511
## Jul 2014 429.6804 384.9339 474.4269 361.2466 498.1142
## Aug 2014 418.7774 373.0110 464.5438 348.7838 488.7710
## Sep 2014 402.4656 356.4511 448.4801 332.0924 472.8387
## Oct 2014 413.4669 364.1440 462.7899 338.0340 488.8998
## Nov 2014 453.4647 397.1553 509.7742 367.3469 539.5826
## Dec 2014 667.8661 581.7138 754.0183 536.1076 799.6245
## Jan 2015 428.9451 369.8583 488.0319 338.5797 519.3105
## Feb 2015 368.8764 316.3727 421.3800 288.5789 449.1738
## Mar 2015 399.3800 340.7168 458.0431 309.6624 489.0975
## Apr 2015 384.7842 326.5248 443.0435 295.6842 473.8842
## May 2015 421.3842 355.6885 487.0799 320.9113 521.8571
## Jun 2015 462.1937 388.0676 536.3198 348.8275 575.5598
## Jul 2015 433.6983 362.2100 505.1866 324.3664 543.0301
## Aug 2015 422.6905 351.1406 494.2404 313.2644 532.1166
## Sep 2015 406.2235 335.6638 476.7831 298.3118 514.1352
## Oct 2015 417.3247 342.9957 491.6538 303.6482 531.0013
## Nov 2015 457.6926 374.1591 541.2261 329.9392 585.4461
## Dec 2015 674.0884 548.0993 800.0776 481.4046 866.7722
Above are the point forecasts for 2 years worth of data using a holt winters multiplicative method
ets.MAM.retail <- ets(tsretail, model = "MAM")
fc.ets.MAM.retail <- forecast(ets.MAM.retail, h = 24)
summary(fc.ets.MAM.retail)
##
## Forecast method: ETS(M,Ad,M)
##
## Model Information:
## ETS(M,Ad,M)
##
## Call:
## ets(y = tsretail, model = "MAM")
##
## Smoothing parameters:
## alpha = 0.4449
## beta = 0.0052
## gamma = 0.1465
## phi = 0.98
##
## Initial states:
## l = 102.6514
## b = 1.0174
## s = 0.9604 0.8751 0.9041 1.4974 1.0453 0.9542
## 0.922 0.9679 1.003 0.9665 1.0121 0.8919
##
## sigma: 0.0545
##
## AIC AICc BIC
## 4249.441 4251.331 4320.411
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.160435 13.72365 9.950394 0.2020476 3.977401 0.5037034 0.02578996
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 424.9125 395.2275 454.5976 379.5132 470.3119
## Feb 2014 365.7091 337.6839 393.7343 322.8482 408.5700
## Mar 2014 396.3098 363.4186 429.2009 346.0071 446.6124
## Apr 2014 381.6636 347.6836 415.6437 329.6957 433.6316
## May 2014 418.1393 378.4969 457.7818 357.5114 478.7672
## Jun 2014 454.9767 409.3133 500.6401 385.1405 524.8129
## Jul 2014 426.6709 381.5558 471.7859 357.6734 495.6684
## Aug 2014 415.8787 369.7362 462.0211 345.3099 486.4475
## Sep 2014 399.4849 353.1334 445.8364 328.5964 470.3734
## Oct 2014 411.6595 361.8558 461.4633 335.4912 487.8278
## Nov 2014 449.1881 392.6669 505.7093 362.7463 535.6298
## Dec 2014 658.1044 572.1693 744.0395 526.6780 789.5308
## Jan 2015 424.1100 365.4669 482.7531 334.4231 513.7969
## Feb 2015 365.0336 312.9211 417.1462 285.3344 444.7329
## Mar 2015 395.5940 337.3692 453.8188 306.5468 484.6412
## Apr 2015 380.9896 323.2524 438.7268 292.6882 469.2910
## May 2015 417.4174 352.3623 482.4724 317.9242 516.9105
## Jun 2015 454.2087 381.4874 526.9299 342.9911 565.4262
## Jul 2015 425.9667 355.9759 495.9576 318.9249 533.0085
## Aug 2015 415.2077 345.2577 485.1578 308.2283 522.1872
## Sep 2015 398.8549 330.0180 467.6919 293.5779 504.1320
## Oct 2015 411.0250 338.4125 483.6375 299.9737 522.0763
## Nov 2015 448.5114 367.4658 529.5570 324.5629 572.4599
## Dec 2015 657.1356 535.7630 778.5081 471.5123 842.7588
If we compare the two, we recieve nearly exactly the same forecasts
Show that the forecast variance for an ETS(A,N,N) model is given by σ^2[1 + α^2(h−1)]
Write down 95% prediction intervals for an ETS(A,N,N) model as a function of ℓT, α, h and σ, assuming normally distributed errors.
CHAPTER 8
All of the ACF plot lags fall within the non statistically significant line so they would indicate white noise. Additionally, the more observations we have, the more the lags should tend towards no autocorrelation if it is white noise. That looks to be the case here
The more observations, the less variance the residuals should have and the tighter the confidence interval should be. It si the same idea as flipping coins; there is a high chance for two coins flipped to both be heads up but a essentially zero chance for 1000 coins flipped to land 1000 heads up. White noise refers to random and not following any sort of pattern. The ACF plots may differ in magnitude but each of them doesnt seem to follow any cycles and are therefore white noise
autoplot(ibmclose)
acf(ibmclose)
pacf(ibmclose)
The book tells us that time series which have trend or seasonality are not stationary. There is a clear trend when the ts() is autoploted so it is not stationary. The ACF plot shows a significant amount of autocorrelation. If the data were stationry, we would expect to see few to none lags outside of the dotted blue lines with lags appearing in both the positive and negatives. The partial acf plot indicates that this is an autoregressive term and depends heavily on the previous observation. Not stationary
ggtsdisplay(usnetelec)
diff.usnetelec <- diff(usnetelec, 1)
checkresiduals(diff.usnetelec)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
It seems that by differencing by an order of 1, we got rid of the first term and the resulting residuals look like white noise
ggtsdisplay(usgdp)
Trying the same trick as the other one did not seem to work so i will try transforming first
bc.usgdp <- BoxCox.lambda(usgdp)
ndiffs(usgdp)
## [1] 2
usgdp.tr <- usgdp %>% BoxCox(bc.usgdp) %>% diff(1)
ggtsdisplay(usgdp.tr)
Ndiffs tells us 2 differences will make series stationary
usgdp.tr <- usgdp %>% BoxCox(bc.usgdp) %>% diff(2)
ggtsdisplay(usgdp.tr)
ggtsdisplay(mcopper)
bc.mcopper <- BoxCox.lambda(mcopper)
ndiffs(mcopper)
## [1] 1
mcopper.tr <- mcopper %>% BoxCox(bc.mcopper) %>% diff(1)
ggtsdisplay(mcopper.tr)
Even after differencing once there is a large first ACF and PACF lag but it looks as if the rest of the levels drop off to white noise fairly quickly
ggtsdisplay(enplanements)
diff.enpl <- diff(log(enplanements), 12)
ggtsdisplay(diff.enpl)
Following what the book did for seasonal data we were able to transform the graph into something that looks stationary. However, the ACF and PACF still have some concerning features.
diff.enpl <- diff(diff(log(enplanements), 12))
ggtsdisplay(diff.enpl)
I then differenced again and was able to obtain a better looking ACF and PACF plot
ggtsdisplay(visitors)
There is strong seasonality and a trend. We will try box cox here and then difference
bc.visitors <- BoxCox.lambda(visitors)
ndiffs(visitors)
## [1] 1
visitors.tr <- visitors %>% BoxCox(bc.visitors) %>% diff(12)
ggtsdisplay(visitors.tr)
Same deal, will try differencing a second time
visitors.tr <- visitors %>% BoxCox(bc.visitors) %>% diff(12) %>% diff(1)
ggtsdisplay(visitors.tr)
There is still some issues in the PACF and ACF but it looks a bit better than the original after transforming, adjusting for seasonality and differencing
A backshift notation of seasonal data would be (1 - B)^12 y(t)
plot(tsretail)
ggtsdisplay(tsretail)
We might note that there is strong seasonality and increasing magnitude in the dependent variable. We could boxcox transform and check ndiffs() just like the other one.
bc.tsretail <- BoxCox.lambda(tsretail)
ndiffs(tsretail)
## [1] 1
tsretail.tr <- tsretail %>% BoxCox(bc.tsretail) %>% diff(12) %>% diff(1)
ggtsdisplay(tsretail.tr)
tsretail.tr %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0189
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Now after using the kpss() test on the differenced and transformed data, we can see that based off the kpss() results it shouldnt require any further differencing. The test statistic is small and the p-values are large and non-significant
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]
}
ggtsdisplay(y)
The graph above is our initial time series data. We can run some tests to see what happens as phi changes. Same code but phi is now 0.8
y <- ts(numeric(100))
y2 <- ts(numeric(100))
y3 <- ts(numeric(100))
y4 <- ts(numeric(100))
y5 <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]
y2[i] <- 0.1*y2[i-1] + e[i]
y3[i] <- 0.3*y3[i-1] + e[i]
y4[i] <- 0.9*y4[i-1] + e[i]
y5[i] <- 1.3*y5[i-1] + e[i]
}
ggtsdisplay(y)
ggtsdisplay(y2)
ggtsdisplay(y3)
ggtsdisplay(y4)
ggtsdisplay(y5)
As phi increases, more autocorrelation is present in the time series
moving.avg <- function(theta, sd = 1){
y <- ts(numeric(100))
e <- rnorm(100)
for (i in 2:100){
y[i] <- e[i] + 0.6*y[i - 1]
}
return(y)
}
ma.1 <- moving.avg(0.6)
ma.2 <- moving.avg(0.1)
ma.3 <- moving.avg(1.3)
ggtsdisplay(ma.1)
ggtsdisplay(ma.2)
ggtsdisplay(ma.3)
As theta increases in value, the ACF plots become more correlated. However, the autocorrelation at higher thetas is not nearly as much as the AR(1) models
Do we just join the two equations together? If so then
ARMA.model <- function(phi, theta, sd = 1){
x <- ts(numeric(100))
z <- rnorm(100)
for (i in 2:length(z)){
x[i] <- phi * x[i - 1] + theta * z[i - 1] + 2 * z[i]
}
return(x)
}
ARMA.1 <- ARMA.model(0.6, 0.6)
ARMA.2 <- ARMA.model(0.2, 0.8)
ARMA.3 <- ARMA.model(1.3, 0.5)
ggtsdisplay(ARMA.1)
ggtsdisplay(ARMA.2)
ggtsdisplay(ARMA.3)
I am unsure if this is correct but the higher the phi value the more autocorrelation there is
AR.model <- function(phi, theta, sd = 1){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100){
y[i] <- phi * y[i-1] + theta * y[i - 2] + e[i]
}
return(y)
}
AR.2 <- AR.model(-0.8, 0.3)
ggtsdisplay(AR.2)
The ARMA(1,1) looks relatively stationary and still resembles white noise while the AR(2) model has an oscillatory pattern
ggtsdisplay(wmurders)
We can reasonably expect that we will need a non-zero amount of differences and possibly a transformation. Our ndiffs function tells us we need 2 differences
ndiffs(wmurders)
## [1] 2
diff.wmurd.2 <- diff(wmurders, differences = 2)
checkresiduals(diff.wmurd.2)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
ggtsdisplay(diff.wmurd.2)
We have chosen to difference twice since that is what our ndiffs() function told us.Similarly, I will try a p value of 1 from the PACF and q value of 2 from the ACF graph
arima.wmurders <- arima(wmurders, order = c(1,2,2))
summary(arima.wmurders)
##
## Call:
## arima(x = wmurders, order = c(1, 2, 2))
##
## Coefficients:
## ar1 ma1 ma2
## -0.7677 -0.2812 -0.4977
## s.e. 0.2350 0.2879 0.2762
##
## sigma^2 estimated as 0.04295: log likelihood = 7.38, aic = -6.75
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01109526 0.2034302 0.1504565 -0.2279984 4.285732 0.9252368
## ACF1
## Training set 0.01083757
No, if d > 1 then the constant will be omitted automatically.
In the backshift operator, this model should be (1 - phi * B)(1 - B)^2 y(t) = c + (1 + theta * B - theta2 * B2)e(t)
checkresiduals(arima.wmurders)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
##
## Model df: 3. Total lags used: 10
Our ACF plot has no lags outside of the significant dotted blue and residuals are relatively normal. Additionally, the Ljung box test has a non-significant p value. I would say that this is sastisfactory
fc.arima.wmurder <- forecast(arima.wmurders, h = 3)
summary(fc.arima.wmurder)
##
## Forecast method: ARIMA(1,2,2)
##
## Model Information:
##
## Call:
## arima(x = wmurders, order = c(1, 2, 2))
##
## Coefficients:
## ar1 ma1 ma2
## -0.7677 -0.2812 -0.4977
## s.e. 0.2350 0.2879 0.2762
##
## sigma^2 estimated as 0.04295: log likelihood = 7.38, aic = -6.75
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01109526 0.2034302 0.1504565 -0.2279984 4.285732 0.9252368
## ACF1
## Training set 0.01083757
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.534015 2.268436 2.799594 2.127847 2.940183
## 2006 2.404157 2.037630 2.770684 1.843602 2.964712
## 2007 2.331482 1.844079 2.818885 1.586064 3.076901
autoplot(wmurders) + autolayer(fc.arima.wmurder, series = "ARIMA(1,2,2) predictions")
a.arima.wmurders <- auto.arima(wmurders)
checkresiduals(a.arima.wmurders)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
fc.a.arima.wmurders <- forecast(a.arima.wmurders, h = 3)
autoplot(wmurders) + autolayer(fc.a.arima.wmurders, series = "AUTO ARIMA predictions")
Both forecasts look relatively similar but the model where we selected our own p,d and q values seems to flatten out which seems more realistic; the auto arima model looks like it will trend into the negatives. For that reason, I would go with the arima(1,2,2) instead of the auto.arima(1,2,1)
autoplot(austa)
a.arima.austa <- auto.arima(austa)
summary(a.arima.austa)
## Series: austa
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## 0.3006 0.1735
## s.e. 0.1647 0.0390
##
## sigma^2 estimated as 0.03376: log likelihood=10.62
## AIC=-15.24 AICc=-14.46 BIC=-10.57
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008313383 0.1759116 0.1520309 -1.069983 5.513269 0.7461559
## ACF1
## Training set -0.000571993
checkresiduals(a.arima.austa)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
##
## Model df: 2. Total lags used: 7
fc.a.arima.austa <- forecast(a.arima.austa, h = 10)
autoplot(austa) + autolayer(fc.a.arima.austa, series = "AUTO ARIMA predictions for austa")
Checkresiduals() statistics would indicate white noise. Auto selected model with p,d,q of 0,1,1.
Summary tells us austa drift is 0.1735. How do we remove? Can we just subtract that constant from austa?
detrend.austa <- austa - a.arima.austa$coef[2]
autoplot(detrend.austa)
doesnt seem to do anything
a.arima.austa.nd <- arima(austa, order = c(0,1,1))
fc.a.arima.austa.nd <- forecast(a.arima.austa.nd, h = 10)
autoplot(austa) + autolayer(fc.a.arima.austa.nd, series = "No trend forecast?")
Using arima instead of auto.arima seems to detrend
austa213 <- arima(austa, order = c(2,1,3), method = "ML")
fc.austa213 <- forecast(austa213, h = 10)
autoplot(austa) + autolayer(fc.austa213, series = "ARIMA(2,1,3) predictions")
austa.001 <- arima(austa, order = c(0,0,1))
fc.austa.001 <- forecast(austa.001)
autoplot(austa) + autolayer(fc.austa.001, series = "0,0,1 with constant?")
austa.021 <- arima(austa, order = c(0,2,1))
fc.austa.021 <- forecast(austa.021)
autoplot(austa) + autolayer(fc.austa.021, series = "0,2,1 with no constant?")
Reading through the documentation I did not see any apparent variable for constant so I have just plugged in the orders of p,d and q
ggtsdisplay(usgdp)
a. if necessary, find a suitable Box-Cox transformation for the data;
print(bc.usgdp)
## [1] 0.366352
a.arima.usgdp <- auto.arima(usgdp, lambda = bc.usgdp)
summary(a.arima.usgdp)
## Series: usgdp
## ARIMA(2,1,0) with drift
## Box Cox transformation: lambda= 0.366352
##
## Coefficients:
## ar1 ar2 drift
## 0.2795 0.1208 0.1829
## s.e. 0.0647 0.0648 0.0202
##
## sigma^2 estimated as 0.03518: log likelihood=61.56
## AIC=-115.11 AICc=-114.94 BIC=-101.26
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.195275 39.2224 29.29521 -0.01363259 0.6863491 0.1655687
## ACF1
## Training set -0.03824844
checkresiduals(a.arima.usgdp)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,0) with drift
## Q* = 6.5772, df = 5, p-value = 0.254
##
## Model df: 3. Total lags used: 8
ndiffs(usgdp)
## [1] 2
diff.usgdp <- diff(usgdp, differences = 2)
ggtsdisplay(diff.usgdp)
Based off of this, I will select p = 1 due to PACF and q = 1 due to ACF
usgdp.121 <- arima(usgdp, order = c(1,2,1))
summary(usgdp.121)
##
## Call:
## arima(x = usgdp, order = c(1, 2, 1))
##
## Coefficients:
## ar1 ma1
## 0.2899 -0.9555
## s.e. 0.0668 0.0197
##
## sigma^2 estimated as 1651: log likelihood = -1204.96, aic = 2415.92
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.488314 40.46031 30.09537 0.09247046 0.6936754 0.6032826
## ACF1
## Training set -0.07106256
checkresiduals(usgdp.121)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 19.044, df = 6, p-value = 0.00409
##
## Model df: 2. Total lags used: 8
Our own selected model performed worse on the error measurements and also residuals showed signs of autocorrelation
I will stick with the auto.arima of 2,1,0
checkresiduals(a.arima.usgdp)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,0) with drift
## Q* = 6.5772, df = 5, p-value = 0.254
##
## Model df: 3. Total lags used: 8
Residual diagnostics look good here. Besides lag 12, everything else indicates that the model is satisfactory
fc.a.arima.usgdp <- forecast(a.arima.usgdp)
usgdp.bc <- usgdp %>% BoxCox(bc.usgdp)
autoplot(usgdp) + autolayer(fc.a.arima.usgdp)
autoplot(usgdp)
Yes, the forecasts look reasonable
ets.usgdp <- ets(usgdp)
summary(ets.usgdp)
## ETS(A,A,N)
##
## Call:
## ets(y = usgdp)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.278
##
## Initial states:
## l = 1557.4589
## b = 18.6862
##
## sigma: 41.8895
##
## AIC AICc BIC
## 3072.303 3072.562 3089.643
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.30185 41.53448 31.85324 0.02319607 0.7543931 0.180026 0.07596492
fc.ets.usgdp <- forecast(ets.usgdp)
autoplot(usgdp) + autolayer(fc.ets.usgdp, series = "ETS prediction of usgdp")
The results look almost identical but it seems as if the ETS model does not flatten out over time. For this reason only, I would say that the auto.arima model does a better job in predictions then
autoplot(austourists)
The time series data is highly seasonal and trending upwards. It seems as if the magnitude of peaks and troughs is getting larger so we might need to transform
ggtsdisplay(austourists)
The data is not stationary. It looks seasonal as you can see the same spike in lag every 4 lags
The PACF graph shows us that there are issues that will need to be resolved first by either differencing or another method before we can perform analysis on it
ndiffs(austourists)
## [1] 1
diff.austourists <- diff(diff(austourists, 4), differences = 1)
ggtsdisplay(diff.austourists)
After accounting for the quarterly seasonality and differencing, the ACF and PACF plots look a decent amount better. We could check the kpss() test to verify
diff.austourists %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0359
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Test statistic is small and would seemingly tell us that we do not need to difference anymore.
a.arima.austourists <- auto.arima(austourists)
summary(a.arima.austourists)
## Series: austourists
## ARIMA(1,0,0)(1,1,0)[4] with drift
##
## Coefficients:
## ar1 sar1 drift
## 0.4705 -0.5305 0.5489
## s.e. 0.1154 0.1122 0.0864
##
## sigma^2 estimated as 5.15: log likelihood=-142.48
## AIC=292.97 AICc=293.65 BIC=301.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02200144 2.149384 1.620917 -0.7072593 4.388288 0.5378929
## ACF1
## Training set -0.06393238
checkresiduals(a.arima.austourists)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(1,1,0)[4] with drift
## Q* = 4.0937, df = 5, p-value = 0.536
##
## Model df: 3. Total lags used: 8
The auto.arima did not give me the same model. My p,d,q values were different. I would choose the auto.arima model as the ACF plot is seemingly better than the one i was able to construct with my model
The ARIMA(1,0,0)(1,1,0)[4] model could be written in backshift as (1 - phi * B)(1 - phi * B)^4 y(t) = (1 + THETA * B)^4 e(t)
ma.usmelec <- ma(usmelec, order = 12)
autoplot(usmelec) + autolayer(ma.usmelec, series = "12 month moving average")
## Warning: Removed 12 row(s) containing missing values (geom_path).
bc.usmelec <- BoxCox.lambda(usmelec)
print(bc.usmelec)
## [1] -0.5738331
ggtsdisplay(usmelec)
ndiffs(usmelec)
## [1] 1
Data is non-stationary. Will need to difference and also account for monthly seasonality
usmelec.tr <- usmelec %>% BoxCox(bc.usmelec) %>% diff(12) %>% diff(1)
ggtsdisplay(usmelec.tr)
diff.usmelec <- diff(diff(usmelec, 12), differences = 1)
ggtsdisplay(diff.usmelec)
Both transformed and differenced data look pretty similar. Its not great but its better than when it started
a.arima.usmelec1 <- auto.arima(usmelec.tr)
summary(a.arima.usmelec1)
## Series: usmelec.tr
## ARIMA(1,0,1)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1 mean
## 0.3994 -0.8398 0.0434 -0.0947 -0.8560 0e+00
## s.e. 0.0630 0.0374 0.0586 0.0550 0.0362 1e-04
##
## sigma^2 estimated as 1.194e-06: log likelihood=2548.5
## AIC=-5083 AICc=-5082.76 BIC=-5053.89
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.677983e-05 0.001085716 0.0008355439 29.61005 152.7566 0.4004158
## ACF1
## Training set 0.008433402
a.arima.usmelec2 <- auto.arima(diff.usmelec)
summary(a.arima.usmelec2)
## Series: diff.usmelec
## ARIMA(1,0,1)(0,0,1)[12] with zero mean
##
## Coefficients:
## ar1 ma1 sma1
## 0.423 -0.8870 -0.7232
## s.e. 0.059 0.0318 0.0285
##
## sigma^2 estimated as 57.85: log likelihood=-1634.35
## AIC=3276.71 AICc=3276.79 BIC=3293.34
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1240643 7.58155 5.738308 12.75308 170.0034 0.4177248
## ACF1
## Training set 0.009545742
When letting auto.arima() select between the differenced and differenced/transformed data there is a significant increase in model accuracy based on AICc value.
checkresiduals(a.arima.usmelec1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(2,0,1)[12] with non-zero mean
## Q* = 32.631, df = 18, p-value = 0.01849
##
## Model df: 6. Total lags used: 24
We can see here that the checkresiduals function indicates that our residuals do not indicate white noise. However, the plots look alright. I will try the other residual plot to see how it compares. (Note that it has a higher AICc value)
checkresiduals(a.arima.usmelec2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(0,0,1)[12] with zero mean
## Q* = 46.59, df = 21, p-value = 0.001066
##
## Model df: 3. Total lags used: 24
The arima model with the higher AICc value had an even more significant p-value on the ljung box test. By that relation, I would believe that most models with a higher AICc value would not be able to fit our model for white noise better.
There is no way that this is accurate? 15 years is quite far out the only way that maybe it would work is with a damped model but we will have to see what we get
fc.a.arima.usmelec1 <- forecast(a.arima.usmelec1, h = 180)
bt.fc.a.arima.usmelec1 <- as.data.frame(fc.a.arima.usmelec1)
autoplot(usmelec.tr) + autolayer(fc.a.arima.usmelec1, series = "15 year forecast")
Well, my predictions above did not work according to how they should have been. However, I would believe by reading the documentation of forecast that predictions more than twice the frequency of the time series data would start to become useless.
autoplot(mcopper)
This could use a boxcox transformation
print(bc.mcopper)
## [1] 0.1919047
a.arima.mcopper <- auto.arima(mcopper, lambda = bc.mcopper)
summary(a.arima.mcopper)
## Series: mcopper
## ARIMA(0,1,1)
## Box Cox transformation: lambda= 0.1919047
##
## Coefficients:
## ma1
## 0.3720
## s.e. 0.0388
##
## sigma^2 estimated as 0.04997: log likelihood=45.05
## AIC=-86.1 AICc=-86.08 BIC=-77.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3.480533 77.27254 44.92858 0.166202 4.303677 0.2021433 -0.08442198
We recieve an arima model of (0,1,1).
As mcopper doesnt seem seasonal, we can use ndiffs() only
ndiffs(mcopper)
## [1] 1
autoplot(mcopper^bc.mcopper)
diff.mcopper <- diff(mcopper^bc.mcopper, 1)
ggtsdisplay(diff.mcopper)
diff.mcopper %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
I will choose an arima model of (1,1,1) based off of PACF and ACF.
mcopper.111 <- arima(mcopper, order = c(1,1,1))
summary(mcopper.111)
##
## Call:
## arima(x = mcopper, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## -0.1091 0.3948
## s.e. 0.2573 0.2453
##
## sigma^2 estimated as 6013: log likelihood = -3248.45, aic = 6502.9
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.439553 77.47704 45.25739 0.2286332 4.311034 0.9348025
## ACF1
## Training set 0.0007511998
d.choose what you think is the best model and check the residual diagnostics
We can try to compare the two models now
checkresiduals(a.arima.mcopper)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 22.913, df = 23, p-value = 0.4659
##
## Model df: 1. Total lags used: 24
checkresiduals(mcopper.111)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)
## Q* = 29.543, df = 22, p-value = 0.13
##
## Model df: 2. Total lags used: 24
I think that both models performed decently. However, it is clear that the auto.arima model of (0,1,1) performed better than the (1,1,1) model. The auto.arima() model had less significant lags in the ACF plot, more normals shaped residuals, better error test statistics and a lower p-value on the Ljung Box test
fc.a.arima.mcopper <- forecast(a.arima.mcopper)
summary(fc.a.arima.mcopper)
##
## Forecast method: ARIMA(0,1,1)
##
## Model Information:
## Series: mcopper
## ARIMA(0,1,1)
## Box Cox transformation: lambda= 0.1919047
##
## Coefficients:
## ma1
## 0.3720
## s.e. 0.0388
##
## sigma^2 estimated as 0.04997: log likelihood=45.05
## AIC=-86.1 AICc=-86.08 BIC=-77.43
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3.480533 77.27254 44.92858 0.166202 4.303677 0.2021433 -0.08442198
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2007 3387.143 3188.095 3596.115 3086.619 3710.882
## Feb 2007 3387.143 3054.898 3747.998 2889.994 3951.229
## Mar 2007 3387.143 2964.977 3856.608 2759.364 4125.610
## Apr 2007 3387.143 2893.279 3947.003 2656.458 4272.299
## May 2007 3387.143 2832.337 4026.647 2569.883 4402.686
## Jun 2007 3387.143 2778.707 4098.983 2494.387 4522.024
## Jul 2007 3387.143 2730.462 4165.940 2427.034 4633.250
## Aug 2007 3387.143 2686.396 4228.725 2365.987 4738.201
## Sep 2007 3387.143 2645.693 4288.152 2310.004 4838.118
## Oct 2007 3387.143 2607.770 4344.804 2258.202 4933.884
## Nov 2007 3387.143 2572.198 4399.111 2209.926 5026.155
## Dec 2007 3387.143 2538.644 4451.405 2164.672 5115.434
## Jan 2008 3387.143 2506.849 4501.947 2122.046 5202.116
## Feb 2008 3387.143 2476.603 4550.944 2081.731 5286.517
## Mar 2008 3387.143 2447.735 4598.569 2043.468 5368.896
## Apr 2008 3387.143 2420.103 4644.964 2007.042 5449.469
## May 2008 3387.143 2393.588 4690.247 1972.273 5528.415
## Jun 2008 3387.143 2368.088 4734.520 1939.008 5605.888
## Jul 2008 3387.143 2343.517 4777.871 1907.114 5682.019
## Aug 2008 3387.143 2319.799 4820.374 1876.478 5756.923
## Sep 2008 3387.143 2296.867 4862.096 1847.001 5830.699
## Oct 2008 3387.143 2274.664 4903.095 1818.596 5903.434
## Nov 2008 3387.143 2253.139 4943.421 1791.184 5975.204
## Dec 2008 3387.143 2232.247 4983.121 1764.699 6046.080
autoplot(fc.a.arima.mcopper)
Interestingly enough, I dont think the forecasts were particularly useful. It seems as if the ARIMA model couldnt distinguish a pattern and opted for a naive model.
ets.mcopper <- ets(mcopper)
fc.ets.mcopper <- forecast(ets.mcopper)
summary(fc.ets.mcopper)
##
## Forecast method: ETS(M,Ad,N)
##
## Model Information:
## ETS(M,Ad,N)
##
## Call:
## ets(y = mcopper)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.2141
## phi = 0.8
##
## Initial states:
## l = 265.0541
## b = -3.9142
##
## sigma: 0.0632
##
## AIC AICc BIC
## 8052.038 8052.189 8078.049
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2.691483 78.87699 46.76047 0.1883633 4.503026 0.2103854 0.1052005
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2007 3362.057 3089.9095 3634.205 2945.84316 3778.271
## Feb 2007 3303.048 2886.5408 3719.555 2666.05529 3940.040
## Mar 2007 3255.840 2712.0515 3799.629 2424.18712 4087.493
## Apr 2007 3218.074 2556.4686 3879.680 2206.23587 4229.912
## May 2007 3187.861 2415.5047 3960.217 2006.64386 4369.078
## Jun 2007 3163.691 2286.5145 4040.867 1822.16554 4505.216
## Jul 2007 3144.354 2167.5706 4121.138 1650.49263 4638.216
## Aug 2007 3128.885 2057.1539 4200.616 1489.81367 4767.956
## Sep 2007 3116.510 1954.0218 4278.998 1338.63788 4894.382
## Oct 2007 3106.609 1857.1395 4356.079 1195.71014 5017.509
## Nov 2007 3098.689 1765.6373 4431.741 1059.96220 5137.416
## Dec 2007 3092.353 1678.7802 4505.926 930.47992 5254.226
## Jan 2008 3087.284 1595.9457 4578.622 806.47889 5368.089
## Feb 2008 3083.229 1516.6053 4649.852 687.28494 5479.173
## Mar 2008 3079.985 1440.3094 4719.660 572.31785 5587.651
## Apr 2008 3077.389 1366.6751 4788.103 461.07766 5693.701
## May 2008 3075.313 1295.3751 4855.251 353.13295 5797.493
## Jun 2008 3073.652 1226.1299 4921.174 248.11080 5899.193
## Jul 2008 3072.323 1158.6994 4985.947 145.68821 5998.958
## Aug 2008 3071.260 1092.8773 5049.643 45.58475 6096.935
## Sep 2008 3070.410 1028.4857 5112.334 -52.44361 6193.263
## Oct 2008 3069.729 965.3704 5174.088 -148.60993 6288.068
## Nov 2008 3069.185 903.3978 5234.972 -243.10074 6381.471
## Dec 2008 3068.750 842.4512 5295.048 -336.07992 6473.579
autoplot(fc.ets.mcopper)
I think the ets transformation looks more reasonable as it was able to capture the most recent data’s trend
I will choose auscafe since we have already been working with aus data
autoplot(auscafe)
Could use both differencing and power transformation
nsdiffs(auscafe)
## [1] 1
bc.auscafe <- BoxCox.lambda(auscafe)
print(bc.auscafe)
## [1] 0.109056
diff.auscafe <- diff(auscafe^bc.auscafe, 12)
autoplot(diff.auscafe)
dbl.diff.auscafe <- diff(diff.auscafe, 1)
autoplot(dbl.diff.auscafe)
ggtsdisplay(dbl.diff.auscafe)
As seen above, it seemed that by taking a BoxCox() transformation, seasonal difference and another difference we were able to obtain a white noise graph
First, we always start with the auto.arima
aa.auscafe <- auto.arima(auscafe, lambda = bc.auscafe)
summary(aa.auscafe)
## Series: auscafe
## ARIMA(1,0,1)(2,1,1)[12] with drift
## Box Cox transformation: lambda= 0.109056
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1 drift
## 0.9718 -0.3190 0.1270 -0.0527 -0.8423 0.0056
## s.e. 0.0131 0.0478 0.0649 0.0585 0.0431 0.0004
##
## sigma^2 estimated as 0.0005754: log likelihood=952.96
## AIC=-1891.92 AICc=-1891.65 BIC=-1863.74
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0008622186 0.03664217 0.02686113 0.01635299 1.79394 0.2570775
## ACF1
## Training set -0.00872097
I think because of the two large lags in both the ACF and PACF of our double differenced graph, the model I select will be (1,0,1)(2,1,2).
a.auscafe <- arima(auscafe, order = c(1,0,1), seasonal = c(2,1,2))
summary(a.auscafe)
##
## Call:
## arima(x = auscafe, order = c(1, 0, 1), seasonal = c(2, 1, 2))
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1 sma2
## 0.9891 -0.3691 0.9808 -0.1295 -1.6831 0.7959
## s.e. 0.0069 0.0455 0.0803 0.0812 0.0654 0.0490
##
## sigma^2 estimated as 0.001502: log likelihood = 749.82, aic = -1485.64
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.003036117 0.03820743 0.02792056 0.2228518 1.873709 0.3475727
## ACF1
## Training set -0.01674511
checkresiduals(a.auscafe)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(2,1,2)[12]
## Q* = 54.512, df = 18, p-value = 1.527e-05
##
## Model df: 6. Total lags used: 24
If I were to select my model based off of AIC value, than the auto.arima model with specification (1,0,1)(2,1,1) is the best model.
checkresiduals(aa.auscafe)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(2,1,1)[12] with drift
## Q* = 56.069, df = 18, p-value = 8.692e-06
##
## Model df: 6. Total lags used: 24
The residuals here have a clear pattern and the Ljung Box test would indicate that there is significant autocorrelation. Therefore, the arima model (1,0,1)(2,1,1) is not the best. Surprisingly, the model I chose had residuals that resembled white noise better than the auto.arima
I will use my own model of (1,0,1)(2,1,2)
fc.a.auscafe <- forecast(a.auscafe, h = 24)
summary(fc.a.auscafe)
##
## Forecast method: ARIMA(1,0,1)(2,1,2)[12]
##
## Model Information:
##
## Call:
## arima(x = auscafe, order = c(1, 0, 1), seasonal = c(2, 1, 2))
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1 sma2
## 0.9891 -0.3691 0.9808 -0.1295 -1.6831 0.7959
## s.e. 0.0069 0.0455 0.0803 0.0812 0.0654 0.0490
##
## sigma^2 estimated as 0.001502: log likelihood = 749.82, aic = -1485.64
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.003036117 0.03820743 0.02792056 0.2228518 1.873709 0.2672169
## ACF1
## Training set -0.01674511
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2017 3.791126 3.741454 3.840798 3.715160 3.867092
## Nov 2017 3.773986 3.715541 3.832430 3.684603 3.863368
## Dec 2017 4.119101 4.053194 4.185007 4.018305 4.219896
## Jan 2018 3.715501 3.643033 3.787969 3.604670 3.826331
## Feb 2018 3.381145 3.302787 3.459502 3.261307 3.500983
## Mar 2018 3.736766 3.653045 3.820486 3.608727 3.864805
## Apr 2018 3.662392 3.573739 3.751045 3.526809 3.797976
## May 2018 3.702740 3.609512 3.795968 3.560161 3.845320
## Jun 2018 3.607380 3.509884 3.704876 3.458273 3.756487
## Jul 2018 3.824932 3.723434 3.926430 3.669704 3.980160
## Aug 2018 3.862112 3.756845 3.967379 3.701120 4.023104
## Sep 2018 3.863592 3.754764 3.972420 3.697154 4.030030
## Oct 2018 3.924134 3.807445 4.040822 3.745674 4.102593
## Nov 2018 3.908343 3.786172 4.030514 3.721499 4.095187
## Dec 2018 4.257761 4.130455 4.385067 4.063063 4.452459
## Jan 2019 3.853904 3.721767 3.986042 3.651817 4.055991
## Feb 2019 3.504011 3.367312 3.640711 3.294947 3.713075
## Mar 2019 3.867038 3.726018 4.008059 3.651367 4.082710
## Apr 2019 3.790868 3.645745 3.935990 3.568921 4.012814
## May 2019 3.828141 3.679113 3.977168 3.600222 4.056059
## Jun 2019 3.727596 3.574844 3.880348 3.493982 3.961210
## Jul 2019 3.956621 3.800312 4.112931 3.717567 4.195676
## Aug 2019 3.994562 3.834849 4.154276 3.750301 4.238824
## Sep 2019 4.006758 3.843782 4.169734 3.757508 4.256008
autoplot(fc.a.auscafe) + xlim(2016, 2019)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 405 row(s) containing missing values (geom_path).
ets.auscafe <- ets(auscafe)
fc.ets.auscafe <- forecast(ets.auscafe, h = 24)
summary(fc.ets.auscafe)
##
## Forecast method: ETS(M,A,M)
##
## Model Information:
## ETS(M,A,M)
##
## Call:
## ets(y = auscafe)
##
## Smoothing parameters:
## alpha = 0.6263
## beta = 0.0065
## gamma = 0.0755
##
## Initial states:
## l = 0.3477
## b = 0.0038
## s = 0.996 0.9357 1.0124 1.1498 1.0105 1.0106
## 0.9831 0.9911 0.9918 0.9509 0.9968 0.9714
##
## sigma: 0.0249
##
## AIC AICc BIC
## -319.1016 -317.6016 -250.1762
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002601818 0.03997611 0.02952838 0.1121434 1.982609 0.2826048
## ACF1
## Training set 0.1249781
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2017 3.833084 3.710682 3.955486 3.645886 4.020282
## Nov 2017 3.813389 3.669399 3.957380 3.593176 4.033603
## Dec 2017 4.217663 4.036861 4.398464 3.941150 4.494175
## Jan 2018 3.826123 3.644415 4.007830 3.548225 4.104020
## Feb 2018 3.514610 3.332694 3.696526 3.236394 3.792827
## Mar 2018 3.870005 3.654231 4.085779 3.540007 4.200003
## Apr 2018 3.783810 3.558549 4.009072 3.439302 4.128318
## May 2018 3.811589 3.570956 4.052222 3.443573 4.179605
## Jun 2018 3.678277 3.433373 3.923180 3.303729 4.052825
## Jul 2018 3.875118 3.604236 4.146000 3.460840 4.289396
## Aug 2018 3.886458 3.602300 4.170615 3.451876 4.321039
## Sep 2018 3.854391 3.560568 4.148215 3.405027 4.303755
## Oct 2018 3.977564 3.659530 4.295598 3.491173 4.463955
## Nov 2018 3.956678 3.628713 4.284642 3.455100 4.458255
## Dec 2018 4.375647 4.000411 4.750883 3.801774 4.949521
## Jan 2019 3.968996 3.617487 4.320505 3.431410 4.506582
## Feb 2019 3.645445 3.312548 3.978342 3.136323 4.154567
## Mar 2019 4.013625 3.636243 4.391006 3.436469 4.590780
## Apr 2019 3.923799 3.544417 4.303181 3.343584 4.504014
## May 2019 3.952173 3.559686 4.344659 3.351916 4.552429
## Jun 2019 3.813529 3.424956 4.202101 3.219258 4.407799
## Jul 2019 4.017173 3.597609 4.436738 3.375505 4.658842
## Aug 2019 4.028495 3.597607 4.459384 3.369509 4.687482
## Sep 2019 3.994830 3.557605 4.432055 3.326152 4.663508
autoplot(auscafe) + autolayer(fc.a.auscafe, series = "ARIMA predictions", PI = FALSE) + autolayer(fc.ets.auscafe, series = "ETS predictions", PI = FALSE) + xlim(2016, 2019)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
stlf.auscafe <- stlf(auscafe, s.window = 13, lambda = bc.auscafe, robust = TRUE, method = "arima", h = 24)
summary(stlf.auscafe)
##
## Forecast method: STL + ARIMA(2,1,1) with drift
##
## Model Information:
## Series: x
## ARIMA(2,1,1) with drift
##
## Coefficients:
## ar1 ar2 ma1 drift
## -1.0831 -0.3835 0.7323 0.0056
## s.e. 0.1144 0.0464 0.1193 0.0007
##
## sigma^2 estimated as 0.0004465: log likelihood=1038.06
## AIC=-2066.12 AICc=-2065.97 BIC=-2045.86
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0008151595 0.03248321 0.02356182 -0.01628907 1.616348 0.2255012
## ACF1
## Training set -0.01144721
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2017 3.824049 3.735511 3.914452 3.689384 3.963075
## Nov 2017 3.807828 3.702880 3.915418 3.648374 3.973462
## Dec 2017 4.167131 4.038000 4.299928 3.971096 4.371741
## Jan 2018 3.748711 3.613285 3.888641 3.543374 3.964583
## Feb 2018 3.433847 3.298884 3.573709 3.229369 3.649783
## Mar 2018 3.800795 3.639629 3.968288 3.556802 4.059590
## Apr 2018 3.739425 3.570268 3.915687 3.483510 4.011958
## May 2018 3.788447 3.606846 3.978150 3.513887 4.081960
## Jun 2018 3.679796 3.493315 3.875094 3.398047 3.982172
## Jul 2018 3.904318 3.698334 4.120461 3.593262 4.239141
## Aug 2018 3.951270 3.733847 4.179901 3.623122 4.305640
## Sep 2018 3.943194 3.717508 4.181007 3.602759 4.311999
## Oct 2018 4.057762 3.817609 4.311285 3.695677 4.451123
## Nov 2018 4.034787 3.787632 4.296209 3.662336 4.440617
## Dec 2018 4.413511 4.137116 4.706241 3.997136 4.868100
## Jan 2019 3.974720 3.715449 4.249991 3.584393 4.402478
## Feb 2019 3.641024 3.394524 3.903346 3.270151 4.048916
## Mar 2019 4.029092 3.752010 4.324261 3.612316 4.488183
## Apr 2019 3.963550 3.683594 4.262307 3.542646 4.428444
## May 2019 4.015559 3.725513 4.325556 3.579658 4.498139
## Jun 2019 3.901051 3.611952 4.210590 3.466775 4.383150
## Jul 2019 4.137444 3.826071 4.471197 3.669843 4.657409
## Aug 2019 4.186996 3.865721 4.531847 3.704703 4.724454
## Sep 2019 4.178402 3.851309 4.530020 3.687564 4.726625
autoplot(stlf.auscafe) + xlim(2016,2019)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 405 row(s) containing missing values (geom_path).
It seems as if all 3 models were able to capture the seasonality and trend very well. If I had to choose one, I would most likely go with the non stl arima model since it gave us the tightest prediction intervals
autoplot(tsretail)
Both seasonality and boxcox transformation necessary
nsdiffs(tsretail)
## [1] 1
diff.tsretail <- diff(tsretail^bc.tsretail,12)
autoplot(diff.tsretail)
dbl.diff.tsretail <- diff(diff.tsretail, 1)
autoplot(dbl.diff.tsretail)
ggtsdisplay(dbl.diff.tsretail)
I will let auto.arima select for me.
aa.tsretail <- auto.arima(tsretail, lambda = bc.tsretail)
summary(aa.tsretail)
## Series: tsretail
## ARIMA(1,0,2)(2,1,2)[12] with drift
## Box Cox transformation: lambda= 0.02481794
##
## Coefficients:
## ar1 ma1 ma2 sar1 sar2 sma1 sma2 drift
## 0.9764 -0.4646 -0.1315 -0.0748 0.0328 -0.6570 -0.1160 0.0044
## s.e. 0.0146 0.0564 0.0578 0.8512 0.0862 0.8494 0.6711 0.0010
##
## sigma^2 estimated as 0.003736: log likelihood=506.46
## AIC=-994.93 AICc=-994.43 BIC=-959.73
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09968008 13.58123 10.03156 -0.1383827 4.008321 0.507812
## ACF1
## Training set -0.01734961
Seems good, I would also have selected the (2,1,2) for the seasonal portion due to the ACF and PACF plots so I will stick with this. AIC and AICc values seem decent and training error measures look fine (dont quote me on this I may be wrong)
Checking residuals
checkresiduals(aa.tsretail)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2)(2,1,2)[12] with drift
## Q* = 44.239, df = 16, p-value = 0.0001814
##
## Model df: 8. Total lags used: 24
fc.aa.tsretail <- forecast(aa.tsretail, h = 36)
summary(fc.aa.tsretail)
##
## Forecast method: ARIMA(1,0,2)(2,1,2)[12] with drift
##
## Model Information:
## Series: tsretail
## ARIMA(1,0,2)(2,1,2)[12] with drift
## Box Cox transformation: lambda= 0.02481794
##
## Coefficients:
## ar1 ma1 ma2 sar1 sar2 sma1 sma2 drift
## 0.9764 -0.4646 -0.1315 -0.0748 0.0328 -0.6570 -0.1160 0.0044
## s.e. 0.0146 0.0564 0.0578 0.8512 0.0862 0.8494 0.6711 0.0010
##
## sigma^2 estimated as 0.003736: log likelihood=506.46
## AIC=-994.93 AICc=-994.43 BIC=-959.73
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09968008 13.58123 10.03156 -0.1383827 4.008321 0.507812
## ACF1
## Training set -0.01734961
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 434.8374 406.4862 465.1137 392.2161 481.9633
## Feb 2014 375.1611 347.6968 404.7369 333.9621 421.3015
## Mar 2014 404.7567 373.6910 438.3357 358.1999 457.1957
## Apr 2014 388.9923 357.8114 422.8172 342.3061 441.8671
## May 2014 424.5109 389.2664 462.8605 371.7814 484.5083
## Jun 2014 466.6444 426.6951 510.2328 406.9180 534.8891
## Jul 2014 443.2055 404.0843 486.0115 384.7596 510.2772
## Aug 2014 434.7032 395.2891 477.9404 375.8582 502.4981
## Sep 2014 419.3598 380.3871 462.2166 361.2099 486.6029
## Oct 2014 430.8155 389.9289 475.8721 369.8429 501.5512
## Nov 2014 472.3130 426.6946 522.6751 404.3153 551.4171
## Dec 2014 700.4016 632.1584 775.8099 598.7036 818.8759
## Jan 2015 457.1131 409.4053 510.2268 386.1530 540.7325
## Feb 2015 394.1759 351.6263 441.7316 330.9505 469.1254
## Mar 2015 425.5723 378.6619 478.1328 355.9111 508.4664
## Apr 2015 409.4092 363.2963 461.2122 340.9779 491.1681
## May 2015 446.9067 395.7166 504.5340 370.9810 537.9106
## Jun 2015 490.7574 433.6889 555.1256 406.1538 592.4602
## Jul 2015 465.7676 410.6700 528.0502 384.1314 564.2354
## Aug 2015 457.0819 402.1896 519.2558 375.7905 555.4320
## Sep 2015 441.1062 387.3657 502.0927 361.5592 537.6292
## Oct 2015 453.9049 397.9443 517.5132 371.1053 554.6226
## Nov 2015 497.0774 435.2073 567.4950 405.5642 608.6175
## Dec 2015 736.4093 644.5943 840.9333 600.6121 901.9840
## Jan 2016 480.6395 418.1288 552.2304 388.3233 594.2344
## Feb 2016 414.0465 359.0271 477.2574 332.8579 514.4328
## Mar 2016 447.6423 387.4237 516.9539 358.8225 557.7738
## Apr 2016 430.6693 371.9219 498.4311 344.0655 538.4018
## May 2016 470.5112 405.6817 545.4042 374.9783 589.6329
## Jun 2016 516.2587 444.4771 599.3005 410.5187 648.3938
## Jul 2016 489.8815 420.9695 569.7505 388.4153 617.0333
## Aug 2016 480.7597 412.4401 560.0707 380.2066 607.0806
## Sep 2016 464.1605 397.5444 541.6180 366.1537 587.5844
## Oct 2016 477.9132 408.7851 558.3942 376.2431 606.2007
## Nov 2016 523.1131 447.0017 611.8100 411.1993 664.5352
## Dec 2016 774.2016 661.7328 905.2341 608.8174 983.1103
autoplot(fc.aa.tsretail)
The only direct comparison that can be done right now is between the holt and holt damped models.
summary(holt.retail)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = tsretail, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4297
## beta = 0.0067
## gamma = 0.1849
##
## Initial states:
## l = 103.7415
## b = 0.5713
## s = 0.9592 0.867 0.9124 1.4858 1.0326 0.9468
## 0.9277 0.9808 0.9907 0.9776 1.0141 0.9053
##
## sigma: 0.0544
##
## AIC AICc BIC
## 4249.805 4251.491 4316.832
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03233263 13.58846 9.970385 -0.1710066 4.010066 0.5047153
## ACF1
## Training set 0.04424191
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 424.9538 395.3247 454.5830 379.6399 470.2677
## Feb 2014 365.4465 337.6432 393.2498 322.9250 407.9680
## Mar 2014 395.6692 363.1791 428.1593 345.9799 445.3585
## Apr 2014 381.2116 347.7055 414.7178 329.9684 432.4549
## May 2014 417.4747 378.4541 456.4952 357.7979 477.1514
## Jun 2014 457.9087 412.6335 503.1838 388.6663 527.1511
## Jul 2014 429.6804 384.9339 474.4269 361.2466 498.1142
## Aug 2014 418.7774 373.0110 464.5438 348.7838 488.7710
## Sep 2014 402.4656 356.4511 448.4801 332.0924 472.8387
## Oct 2014 413.4669 364.1440 462.7899 338.0340 488.8998
## Nov 2014 453.4647 397.1553 509.7742 367.3469 539.5826
## Dec 2014 667.8661 581.7138 754.0183 536.1076 799.6245
## Jan 2015 428.9451 369.8583 488.0319 338.5797 519.3105
## Feb 2015 368.8764 316.3727 421.3800 288.5789 449.1738
## Mar 2015 399.3800 340.7168 458.0431 309.6624 489.0975
## Apr 2015 384.7842 326.5248 443.0435 295.6842 473.8842
## May 2015 421.3842 355.6885 487.0799 320.9113 521.8571
## Jun 2015 462.1937 388.0676 536.3198 348.8275 575.5598
## Jul 2015 433.6983 362.2100 505.1866 324.3664 543.0301
## Aug 2015 422.6905 351.1406 494.2404 313.2644 532.1166
## Sep 2015 406.2235 335.6638 476.7831 298.3118 514.1352
## Oct 2015 417.3247 342.9957 491.6538 303.6482 531.0013
## Nov 2015 457.6926 374.1591 541.2261 329.9392 585.4461
## Dec 2015 674.0884 548.0993 800.0776 481.4046 866.7722
summary(holt.retail.damp)
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = tsretail, seasonal = "multiplicative", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.421
## beta = 0.0124
## gamma = 0.1865
## phi = 0.98
##
## Initial states:
## l = 102.6435
## b = 0.7816
## s = 0.9613 0.865 0.9192 1.4879 1.0274 0.9526
## 0.9316 0.9684 0.9981 0.9838 1.0087 0.8961
##
## sigma: 0.055
##
## AIC AICc BIC
## 4256.462 4258.351 4327.432
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.8180411 13.62155 9.970526 0.1068599 4.006783 0.5047225
## ACF1
## Training set 0.04436687
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 423.3278 393.5073 453.1484 377.7212 468.9344
## Feb 2014 363.4773 335.5622 391.3925 320.7848 406.1698
## Mar 2014 392.9316 360.3176 425.5456 343.0528 442.8104
## Apr 2014 378.0210 344.3457 411.6963 326.5190 429.5229
## May 2014 413.4527 374.1458 452.7595 353.3380 473.5673
## Jun 2014 453.0615 407.3111 498.8119 383.0923 523.0307
## Jul 2014 424.6531 379.2876 470.0186 355.2725 494.0336
## Aug 2014 413.3171 366.7662 459.8680 342.1237 484.5106
## Sep 2014 396.6597 349.7019 443.6175 324.8439 468.4754
## Oct 2014 406.8195 356.3321 457.3069 329.6057 484.0334
## Nov 2014 445.6304 387.7913 503.4695 357.1731 534.0877
## Dec 2014 655.6345 566.8262 744.4428 519.8139 791.4551
## Jan 2015 420.1640 359.0975 481.2305 326.7708 513.5571
## Feb 2015 360.8154 306.4037 415.2271 277.5999 444.0309
## Mar 2015 390.1118 329.1539 451.0698 296.8847 483.3390
## Apr 2015 375.3629 314.6641 436.0617 282.5321 468.1937
## May 2015 410.6042 341.9716 479.2368 305.6397 515.5687
## Jun 2015 450.0032 372.3382 527.6682 331.2248 568.7816
## Jul 2015 421.8446 346.7481 496.9412 306.9944 536.6949
## Aug 2015 410.6391 335.3093 485.9689 295.4321 525.8461
## Sep 2015 394.1418 319.7021 468.5814 280.2961 507.9874
## Oct 2015 404.2897 325.7447 482.8347 284.1655 524.4140
## Nov 2015 442.9157 354.4714 531.3601 307.6518 578.1797
## Dec 2015 651.7220 518.0618 785.3822 447.3064 856.1377
We could note that our arima model predictions are higher than both the holt retail predictions by about 10 units each.
Created another arima model with the training set to compare to test set
aa.tsretail.train <- auto.arima(tsretail.train, lambda = bc.tsretail)
fc.aa.tsretail.train <- forecast(aa.tsretail.train, h = 36)
summary(fc.aa.tsretail.train)
##
## Forecast method: ARIMA(1,0,2)(2,1,2)[12] with drift
##
## Model Information:
## Series: tsretail.train
## ARIMA(1,0,2)(2,1,2)[12] with drift
## Box Cox transformation: lambda= 0.02481794
##
## Coefficients:
## ar1 ma1 ma2 sar1 sar2 sma1 sma2 drift
## 0.96 -0.4656 -0.1390 -0.2132 0.0310 -0.5225 -0.2199 0.0052
## s.e. 0.02 0.0608 0.0621 1.3403 0.0852 1.3394 1.0495 0.0007
##
## sigma^2 estimated as 0.003914: log likelihood=449.14
## AIC=-880.29 AICc=-879.73 BIC=-846.01
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2643411 13.27723 9.811702 -0.1423951 4.122531 0.503203
## ACF1
## Training set -0.04820509
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2011 492.8142 460.0494 527.8507 443.5748 547.3693
## Feb 2011 415.8596 385.0095 449.1156 369.5938 467.7559
## Mar 2011 449.7190 415.0019 487.2625 397.6963 508.3567
## Apr 2011 439.9298 404.7260 478.1134 387.2182 499.6157
## May 2011 477.0289 437.7528 519.7338 418.2568 543.8269
## Jun 2011 521.0062 477.0541 568.8982 455.2731 595.9617
## Jul 2011 492.4080 449.8417 538.8930 428.7835 565.2055
## Aug 2011 483.8018 441.1038 530.5207 420.0120 557.0043
## Sep 2011 461.4125 419.9059 506.9104 399.4314 532.7373
## Oct 2011 487.3067 442.8230 536.1373 420.9041 563.8857
## Nov 2011 526.1307 477.5024 579.5762 453.5639 609.9752
## Dec 2011 772.0259 700.4074 850.7679 665.1613 895.5676
## Jan 2012 522.9580 471.3688 580.0385 446.1039 612.6696
## Feb 2012 441.6789 396.8669 491.4116 374.9726 519.9075
## Mar 2012 476.9549 427.8331 531.5611 403.8639 562.8886
## Apr 2012 466.1353 417.3782 520.4309 393.6191 551.6213
## May 2012 505.3445 451.8764 564.9643 425.8481 599.2471
## Jun 2012 550.8616 491.9942 616.5777 463.3631 654.3989
## Jul 2012 521.3073 464.9217 584.3418 437.5276 620.6585
## Aug 2012 512.1098 456.1551 574.7378 428.9957 610.8529
## Sep 2012 489.3031 435.3117 549.8049 409.1292 584.7251
## Oct 2012 516.4785 459.1070 580.8200 431.3028 617.9790
## Nov 2012 557.5292 495.2620 627.4071 465.1006 667.7835
## Dec 2012 818.2591 727.0960 920.5344 682.9275 979.6171
## Jan 2013 554.0844 490.0350 626.2714 459.1134 668.1172
## Feb 2013 468.4102 413.3255 530.6305 386.7774 566.7587
## Mar 2013 505.6291 445.7115 573.3757 416.8564 612.7422
## Apr 2013 493.6790 434.6443 560.5069 406.2407 599.3744
## May 2013 535.1555 470.7870 608.0778 439.8355 650.5146
## Jun 2013 583.0972 512.6111 663.0031 478.7356 709.5272
## Jul 2013 551.9954 484.7582 628.2964 452.4698 672.7558
## Aug 2013 542.0230 475.5900 617.4745 443.7085 661.4668
## Sep 2013 518.0803 454.1800 590.7177 423.5345 633.0965
## Oct 2013 546.7821 479.1028 623.7526 446.6571 668.6759
## Nov 2013 590.3794 517.1143 673.7320 482.0005 722.3933
## Dec 2013 865.6803 758.7767 987.2211 707.5141 1058.1406
accuracy(fc.aa.tsretail.train, tsretail.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2643411 13.27723 9.811702 -0.1423951 4.122531 0.503203
## Test set -92.8516478 99.72352 92.851648 -21.2208020 21.220802 4.761990
## ACF1 Theil's U
## Training set -0.04820509 NA
## Test set 0.71273848 1.283224
Accuracy looks decent. Lets plot with comparison of actual values
autoplot(tsretail.train) + autolayer(fc.aa.tsretail.train, series = "ARIMA predictions") + autolayer(tsretail.test, series = "True values") + xlim(2009, 2014)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
Our ARIMA predictions overshot by a bit unfortunately.
note: we chose A3349336V for our analysis
retaildata.updated <- read.csv("C:\\Users\\wanga\\OneDrive\\Desktop\\PREDICTIVE ANALYTICS\\Book1.csv")
tsretail.updated <- ts(retaildata.updated$A3349336V, frequency=12, start=c(1982,4))
autoplot(tsretail.updated) + autolayer(fc.aa.tsretail, series = "ARIMA predictions", PI = FALSE) + xlim(2013, 2017)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
Our ARIMA model did okay when compared with the actual values. I would say that it is usable but there is definitely a way that it could have been better. The actual values were captured in the intervals but not point predctions
autoplot(sheep)
I believe that this should be an ARIMA(3,1,0). Three phi’s, one difference in the form of y(t - 1) and no MA(q) terms leaving only an e(t)
ggtsdisplay(sheep)
If we follow what the book says, we would have an arima(p,d,0) model if: the ACF exponentially decreases or is sinusoidal OR there is a significant lag spike in PACF at point p but none beyond there
We let p = 3 in this case and see that there are 3 significant lags and none past that so p = 3 checks out. Similarly, the ACF plot looks relatively sinusoidal? However, it does not oscillate to the negative lag section. For the d, if we check our ndiffs() function for sheep it gives us 1. Therefore, arima(3,1,0) checks out
ndiffs(sheep)
## [1] 1
If we follow the above formula, than the values are as follows:
sheep.1940 <- 1797 + 0.42*(1797 - 1791) - 0.2*(1791 - 1627) - 0.3*(1627 - 1665)
sheep.1941 <- sheep.1940 + 0.42*(sheep.1940 - 1797) -0.20*(1797 - 1791) - 0.30*(1791 - 1627)
sheep.1942 <- sheep.1941 + 0.42*(sheep.1941 - sheep.1940) -0.20*(sheep.1940 - 1797) - 0.30*(1797 - 1791)
sheep.1940
## [1] 1778.12
sheep.1941
## [1] 1719.79
sheep.1942
## [1] 1697.268
a.sheep <- arima(sheep, order = c(3,1,0))
fc.a.sheep <- forecast(a.sheep, h = 3)
summary(fc.a.sheep)
##
## Forecast method: ARIMA(3,1,0)
##
## Model Information:
##
## Call:
## arima(x = sheep, order = c(3, 1, 0))
##
## Coefficients:
## ar1 ar2 ar3
## 0.4210 -0.2018 -0.3044
## s.e. 0.1193 0.1363 0.1243
##
## sigma^2 estimated as 4783: log likelihood = -407.56, aic = 823.12
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -6.566385 68.68715 53.43974 -0.4246247 2.977279 0.7962876
## ACF1
## Training set -0.06149874
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1940 1777.996 1689.361 1866.630 1642.441 1913.551
## 1941 1718.869 1564.857 1872.880 1483.329 1954.408
## 1942 1695.985 1498.402 1893.568 1393.808 1998.162
The values are almost exactly the same. The only reason I could think of on why they would be different is the white noise term e(t) that is unobserved when we calculate by hand
autoplot(bicoal)
This looks like an ARIMA(4,0,0) model
ggtsdisplay(bicoal)
The arima(p,d,0) hold because ACF is clearly sinusoidal and PACF has some significant lags for the first few and none after. Howver, I cannot seem to understand why p = 4 as the first four lags in PACF are not significant. Only 1 and 4 are significant. Maybe the p value is determined by the last PACF lag that is significant?
The formula is: y(t) = c + phi1 * y(t - 1) + phi2 * y(t - 2) + phi3 * y(t - 3) + phi4 * y(t - 4) + e(t)
c = 162.00
phi1 = 0.83
phi2 = -0.34
phi3 = 0.55
phi4 = -0.38
bicoal.1969 <- c + phi1 * 545 + phi2 * 552 + phi3 * 534 + phi4 * 512
bicoal.1970 <- c + phi1 * bicoal.1969 + phi2 * 545 + phi3 * 552 + phi4 * 534
bicoal.1971 <- c + phi1 * bicoal.1970 + phi2 * bicoal.1969 + phi3 * 545 + phi4 * 552
bicoal.1969
## [1] 525.81
bicoal.1970
## [1] 513.8023
bicoal.1971
## [1] 499.6705
arima.bicoal <- arima(bicoal, order = c(4,0,0))
fc.arima.bicoal <- forecast(arima.bicoal, h = 3)
summary(fc.arima.bicoal)
##
## Forecast method: ARIMA(4,0,0) with non-zero mean
##
## Model Information:
##
## Call:
## arima(x = bicoal, order = c(4, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 ar4 intercept
## 0.8334 -0.3443 0.5525 -0.3780 481.5221
## s.e. 0.1366 0.1752 0.1733 0.1414 21.0591
##
## sigma^2 estimated as 2509: log likelihood = -262.05, aic = 536.1
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.9191614 50.09487 36.28915 -1.426663 8.003388 0.7797132
## ACF1
## Training set 0.02170264
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1969 527.6291 463.4300 591.8283 429.4450 625.8133
## 1970 517.1923 433.6220 600.7626 389.3826 645.0021
## 1971 503.8051 417.2635 590.3466 371.4512 636.1590
Perhaps it is the unobserved error which throws our values off a little bit. Alternatively, if we are certain that the arima model in R is unsing the same formula as us above, then the difference could be due to rounding errors or different coefficients for the phi’s
library(Quandl)
## Warning: package 'Quandl' was built under R version 4.0.5
## Loading required package: xts
## Warning: package 'xts' was built under R version 4.0.4
Quandl.data.COP <- Quandl("EIA/PET_RWTC_D", api_key="gbkCBbuAsc5hW2DcSjrF")
head(Quandl.data.COP)
## Date Value
## 1 2021-04-19 63.33
## 2 2021-04-16 63.16
## 3 2021-04-15 63.42
## 4 2021-04-14 63.15
## 5 2021-04-13 60.20
## 6 2021-04-12 59.70
first need to reverse data
Quandl.data.COP$Date <- rev(Quandl.data.COP$Date)
Quandl.data.COP$Value <- rev(Quandl.data.COP$Value)
head(Quandl.data.COP)
## Date Value
## 1 1986-01-02 25.56
## 2 1986-01-03 26.00
## 3 1986-01-06 26.53
## 4 1986-01-07 25.85
## 5 1986-01-08 25.87
## 6 1986-01-09 26.03
time series
ts.COP <- ts(Quandl.data.COP$Value, frequency = 365, start = c(1986, 2))
autoplot(ts.COP)
ggtsdisplay(ts.COP)
ndiffs(ts.COP)
## [1] 1
diff.ts.COP <- diff(ts.COP, 1)
ggtsdisplay(diff.ts.COP)
diff.ts.COP %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 12 lags.
##
## Value of test-statistic is: 0.0437
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Differencing once seems to make it stationary based on kpss() test
what if we try auto.arima(). What happens?
aa.COP <- auto.arima(ts.COP)
summary(aa.COP)
## Series: ts.COP
## ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## -0.1701 -0.0238
## s.e. 0.0106 0.0105
##
## sigma^2 estimated as 1.889: log likelihood=-15481.33
## AIC=30968.66 AICc=30968.66 BIC=30989.94
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.005252856 1.374028 0.7495 0.04585652 1.866166 0.05760728
## ACF1
## Training set -0.0001926594
checkresiduals(aa.COP)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)
## Q* = 953.61, df = 728, p-value = 3.151e-08
##
## Model df: 2. Total lags used: 730
Lots of significant lag values in the ACF and long tailed residual histograms. Similarly, the Ljung Box test resulted in a very small p-value. All of this indicates that our data is not white noise and there is still autocorrelation going on
fc.aa.COP <- forecast(aa.COP, h = 1200)
autoplot(fc.aa.COP)
These results were similar to one of our earlier questions forecasts. ARIMA was not able to discern any sort of pattern so it went with a naive model
ets.COP <- ets(ts.COP)
## Warning in ets(ts.COP): I can't handle data with frequency greater than 24.
## Seasonality will be ignored. Try stlf() if you need seasonal forecasts.
summary(ets.COP)
## ETS(A,Ad,N)
##
## Call:
## ets(y = ts.COP)
##
## Smoothing parameters:
## alpha = 0.8179
## beta = 0.0049
## phi = 0.9722
##
## Initial states:
## l = 26.8288
## b = -0.4839
##
## sigma: 1.3743
##
## AIC AICc BIC
## 86761.80 86761.81 86804.37
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.006161016 1.37388 0.748335 0.0621108 1.864268 0.05751773
## ACF1
## Training set 0.0057575
Seasonality was ignored due to high frequency.ETS mode returned with specifications “AAN”
checkresiduals(ets.COP)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 950.19, df = 725, p-value = 3.148e-08
##
## Model df: 5. Total lags used: 730
ETS residuals are almost identical to the ARIMA model. Even the p-value in both box tests are nearly the same.
fc.ets.COP <- forecast(ets.COP, h = 1200)
autoplot(fc.ets.COP)
Both forecasts look essentially the same so I would say either one would work. I am not saying that either one is satisfactory as I believe that engineering our data a bit more would probably yield better results. Additionally, it seems as if the lower bound on both go below zero which doesnt make any practical sense.
Perhaps moving averages would help R understand our data better or even different bounded models. Such work will be left for another time