require(tidyverse)
require(readxl)
require(fpp2)
require(plotly)
require(kableExtra)
require(DataExplorer)
require(GGally)
require(ggplot2)
require(ggpubr)
require(smooth)
Part A – ATM Forecast, ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that, it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also, please submit the forecast, which you will put in an Excel-readable file.
Quick look at the Data
## Rows: 1,474
## Columns: 3
## $ DATE <dbl> 39934, 39934, 39935, 39935, 39936, 39936, 39937, 39937, 39938, 39~
## $ ATM <chr> "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "~
## $ Cash <dbl> 96, 107, 82, 89, 85, 90, 90, 55, 99, 79, 88, 19, 8, 2, 104, 103, ~
There are 1,474 rows and 3 columns.
Data Preparation
Missing Data
Both ATM and Cash have missing values. These rows are removed with na.omit, which reduces the data from 1,474 to 1,455 rows. Imputation would have been another option, but there are not many missing values.
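The missing-value check and removal can be sketched in base R as follows. The data frame `atm_toy` is a hypothetical stand-in with the same three columns as the ATM file, not the real data.

```r
# Hypothetical toy data with the same columns as the ATM file (assumption)
atm_toy <- data.frame(
  DATE = c(39934, 39934, 39935, 39935, 39936),
  ATM  = c("ATM1", "ATM2", "ATM1", NA, "ATM1"),
  Cash = c(96, 107, NA, 89, 85)
)

# Count missing values per column
na_counts <- colSums(is.na(atm_toy))
print(na_counts)

# Drop every row that has at least one missing value
atm_clean <- na.omit(atm_toy)
print(nrow(atm_clean))
```

Counting the missing values per column first makes it easy to judge whether dropping rows is acceptable or whether imputation is worth the effort.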
Convert the Date
## DATE ATM Cash
## Min. :2009-05-01 Length:1455 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-10-31 Mode :character Median : 73.0
## Mean :2009-10-30 Mean : 155.6
## 3rd Qu.:2010-01-29 3rd Qu.: 114.0
## Max. :2010-04-30 Max. :10919.8
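The DATE column arrives as Excel serial numbers (39934, 39935, ...), while the summary above shows calendar dates starting on 2009-05-01. A minimal sketch of that conversion, assuming the file uses Excel's 1900 date system (origin 1899-12-30 in R terms):

```r
# First few serial numbers as they appear in the raw DATE column
excel_serial <- c(39934, 39935, 39936)

# Assumption: Excel 1900 date system, so day 0 is 1899-12-30
dates <- as.Date(excel_serial, origin = "1899-12-30")
print(dates)
```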
Split the Data
ATM1<-atm_data %>% filter(ATM=="ATM1")
ATM2<-atm_data %>% filter(ATM=="ATM2")
ATM3<-atm_data %>% filter(ATM=="ATM3")
ATM4<-atm_data %>% filter(ATM=="ATM4")
# specify the graphs to be combined
a<-ggplot(ATM1, aes(x = DATE, y = Cash, col = ATM)) +
geom_line(stat="identity") +
ggtitle("ATM1")
b<-ggplot(ATM2, aes(x = DATE, y = Cash, col = ATM)) +
geom_line(stat="identity")+
ggtitle("ATM2")
c<-ggplot(ATM3, aes(x = DATE, y = Cash, col = ATM)) +
geom_line(stat="identity")+
ggtitle("ATM3")
d<-ggplot(ATM4, aes(x = DATE, y = Cash, col = ATM)) +
geom_line(stat="identity")+
ggtitle("ATM4")
figure <- ggarrange(a, b, c, d,
ncol = 1, nrow = 4)
figure
Looking at this you can see that ATM1 and ATM2 look almost the same, while ATM3 seems to have been offline until the last few days of April 2010. ATM4 has one big spike, but otherwise it still appears to follow a seasonal pattern.
Models
Now we fit different models for ATM1, ATM2, ATM3, and ATM4. For ATM1, ATM2, and ATM4 we look at Simple Exponential Smoothing, the Holt method (fitted with hw(), so the outputs below are labelled Holt-Winters' additive method) and ARIMA, and compare them on AIC and RMSE. AIC and BIC are not directly comparable between exponential smoothing and ARIMA models, so RMSE is the more reliable yardstick across model classes. For ATM3 we use simple time series forecasting methods: the mean method, the naive method and the snaive method. We choose the model with the best RMSE.
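The comparison idea can be sketched on a synthetic daily series with a weekly pattern. This is only an illustration: it uses base R's `HoltWinters()` as a stand-in for the `ses()` and `hw()` calls shown in the outputs (base R minimises squared one-step errors, so the numbers would not match those functions exactly), and the series `toy` is made up.

```r
set.seed(1)

# Synthetic daily series with a weekly pattern (assumption, not the ATM data)
weekly <- c(10, 5, 0, -40, 0, 5, 20)
toy <- ts(80 + rep(weekly, 52) + rnorm(364, sd = 8), frequency = 7)

# Simple exponential smoothing: level only, no trend, no seasonality
fit_ses <- HoltWinters(toy, beta = FALSE, gamma = FALSE)

# Additive Holt-Winters: level, trend and weekly seasonal component
fit_hw <- HoltWinters(toy, seasonal = "additive")

# In-sample RMSE from the sum of squared one-step errors
rmse <- function(fit) sqrt(fit$SSE / nrow(fit$fitted))
rmse_ses <- rmse(fit_ses)
rmse_hw  <- rmse(fit_hw)
print(c(SES = rmse_ses, HoltWinters = rmse_hw))
```

On a series with a clear weekly cycle, the seasonal model should show a noticeably lower RMSE, which is the same pattern the ATM1 and ATM2 outputs show.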
ATM1
Simple Exponential Smoothing
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 351.1, df = 12, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 14
The residuals appear somewhat skewed, and the Ljung-Box p-value (< 2.2e-16) indicates that they are still autocorrelated.
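The Ljung-Box output above reports 14 lags and 2 model degrees of freedom, which leaves df = 12. A rough base R sketch of the same kind of test with `Box.test()` follows; the series `toy` is synthetic, and the residuals come from a base R exponential smoothing fit rather than from the model in the report.

```r
set.seed(42)

# Synthetic daily series with a weekly pattern (assumption)
weekly <- c(10, 5, 0, -40, 0, 5, 20)
toy <- ts(80 + rep(weekly, 52) + rnorm(364, sd = 8), frequency = 7)

# Level-only exponential smoothing; fitted values start at the 2nd observation
fit <- HoltWinters(toy, beta = FALSE, gamma = FALSE)
res <- as.numeric(toy)[-1] - as.numeric(fit$fitted[, "xhat"])

# 14 lags, 2 fitted parameters, so the test has 14 - 2 = 12 degrees of freedom
lb <- Box.test(res, lag = 14, type = "Ljung-Box", fitdf = 2)
print(lb)
```

A small p-value means the residuals still carry structure the model missed, here the weekly cycle that a level-only model cannot capture.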
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = ATM1_ts)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 83.7797
##
## sigma: 36.7094
##
## AIC AICc BIC
## 4745.365 4745.432 4757.040
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.07802025 36.60783 27.31876 -175.78 199.2779 1.451603 0.06365945
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.71429 83.78256 36.7376 130.8275 11.83350 155.7316
## 52.85714 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.00000 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.14286 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.28571 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.42857 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.57143 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.71429 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.85714 83.78256 36.7376 130.8275 11.83349 155.7316
## 54.00000 83.78256 36.7376 130.8275 11.83349 155.7316
Looking at this we can see that the AIC is 4745.365, the BIC is 4757.040 and the RMSE is 36.60783.
HOLT
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 16.871, df = 3, p-value = 0.0007513
##
## Model df: 11. Total lags used: 14
The residuals look almost normally distributed, although the Ljung-Box p-value (0.0007513) shows that some autocorrelation remains.
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = ATM1_ts)
##
## Smoothing parameters:
## alpha = 0.0041
## beta = 1e-04
## gamma = 0.4349
##
## Initial states:
## l = 98.7123
## b = -0.0333
## s = 9.0364 12.5733 14.6333 -42.341 -0.675 6.2016
## 0.5715
##
## sigma: 27.1149
##
## AIC AICc BIC
## 4534.864 4535.758 4581.564
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.3248614 26.69971 17.28292 -113.4744 129.995 0.9183413 0.1421671
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.71429 86.159205 51.41012 120.90829 33.01507 139.30334
## 52.85714 99.761831 65.01244 134.51122 46.61722 152.90644
## 53.00000 74.085952 39.33624 108.83566 20.94085 127.23105
## 53.14286 3.804039 -30.94601 38.55409 -49.34157 56.94965
## 53.28571 99.124747 64.37435 133.87515 45.97860 152.27090
## 53.42857 78.469711 43.71895 113.22048 25.32300 131.61642
## 53.57143 84.554925 49.80378 119.30607 31.40763 137.70222
## 53.71429 85.843459 47.88104 123.80588 27.78495 143.90197
## 53.85714 99.446085 61.48328 137.40889 41.38699 157.50518
## 54.00000 73.770206 35.80701 111.73340 15.71050 131.82991
## 54.14286 3.488293 -34.47532 41.45191 -54.57204 61.54863
## 54.28571 98.809002 60.84496 136.77304 40.74801 156.86999
## 54.42857 78.153965 40.18948 116.11845 20.09229 136.21564
## 54.57143 84.239179 46.27423 122.20413 26.17680 142.30156
Looking at this we can see that the AIC is 4534.864, the BIC is 4581.564 and the RMSE is 26.69971, which is better than simple exponential smoothing on all three measures.
ARIMA
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
##
## Model df: 2. Total lags used: 14
The residuals look almost normally distributed, and the Ljung-Box p-value (0.5062) gives no evidence of remaining autocorrelation.
## Series: ATM1_ts
## ARIMA(0,0,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## 0.1674 -0.5813
## s.e. 0.0565 0.0472
##
## sigma^2 estimated as 666.6: log likelihood=-1658.32
## AIC=3322.63 AICc=3322.7 BIC=3334.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
## ACF1
## Training set -0.01116544
Looking at this we can see that the AIC is 3322.63, the BIC is 3334.25 and the RMSE is 25.49555. ARIMA seems to be the best model because its AIC, BIC and RMSE are the lowest of the three models.
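To make the notation ARIMA(0,0,1)(0,1,1)[7] concrete, here is a sketch that fits the same model order with base R's `arima()`. The data are simulated from a process of that form with coefficients close to the reported ones, so the series `y` and its noise level are assumptions, and the fit is not a reproduction of the ATM1 model.

```r
set.seed(7)

# Simulate y_t = y_{t-7} + w_t, where w_t = (1 + 0.17 B)(1 - 0.58 B^7) e_t
n <- 364
e <- rnorm(n + 8, sd = 25)
idx <- 9:(n + 8)
w <- e[idx] + 0.17 * e[idx - 1] - 0.58 * e[idx - 7] - 0.17 * 0.58 * e[idx - 8]

weekly <- c(10, 5, 0, -40, 0, 5, 20)
y <- numeric(n)
y[1:7] <- 80 + weekly + w[1:7]
for (t in 8:n) y[t] <- y[t - 7] + w[t]
y <- ts(y, frequency = 7)

# One non-seasonal MA term, one seasonal difference, one seasonal MA term
fit <- arima(y, order = c(0, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 7))
print(fit)

# In-sample RMSE from the model residuals
rmse <- sqrt(mean(residuals(fit)^2))
print(rmse)
```

The two estimated coefficients are labelled `ma1` and `sma1`, matching the coefficient names in the output above.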
ATM2
Simple Exponential Smoothing
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 515.46, df = 12, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 14
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = ATM2_ts)
##
## Smoothing parameters:
## alpha = 0.016
##
## Initial states:
## l = 72.14
##
## sigma: 38.681
##
## AIC AICc BIC
## 4797.445 4797.512 4809.128
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -2.563596 38.57427 33.09136 -Inf Inf 1.539938 -0.02208177
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.85714 57.26713 7.695463 106.8388 -18.54619 133.0804
## 53.00000 57.26713 7.689132 106.8451 -18.55587 133.0901
## 53.14286 57.26713 7.682802 106.8514 -18.56555 133.0998
## 53.28571 57.26713 7.676473 106.8578 -18.57523 133.1095
## 53.42857 57.26713 7.670145 106.8641 -18.58491 133.1192
## 53.57143 57.26713 7.663817 106.8704 -18.59459 133.1288
## 53.71429 57.26713 7.657491 106.8768 -18.60426 133.1385
## 53.85714 57.26713 7.651165 106.8831 -18.61394 133.1482
## 54.00000 57.26713 7.644840 106.8894 -18.62361 133.1579
## 54.14286 57.26713 7.638515 106.8957 -18.63328 133.1675
Looking at this we can see that the AIC is 4797.445, the BIC is 4809.128 and the RMSE is 38.57427.
HOLT
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 36.563, df = 3, p-value = 5.693e-08
##
## Model df: 11. Total lags used: 14
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = ATM2_ts)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
## gamma = 0.4749
##
## Initial states:
## l = 86.8442
## b = -0.0568
## s = 2.7954 18.428 -25.6811 -11.8901 10.6037 0.3422
## 5.4019
##
## sigma: 28.4654
##
## AIC AICc BIC
## 4583.646 4584.537 4630.379
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.004498074 28.03076 19.38286 -Inf Inf 0.9019996 -0.007054699
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.85714 68.0961667 31.61633 104.57600 12.305071 123.88726
## 53.00000 74.5158256 38.03599 110.99566 18.724728 130.30692
## 53.14286 10.1858821 -26.29396 46.66572 -45.605218 65.97698
## 53.28571 0.7160432 -35.76380 37.19589 -55.075061 56.50715
## 53.42857 102.0247290 65.54488 138.50458 46.233618 157.81584
## 53.57143 92.4676151 55.98776 128.94747 36.676494 148.25874
## 53.71429 71.7327034 35.25284 108.21257 15.941569 127.52384
## 53.85714 67.6974509 27.30091 108.09399 5.916269 129.47863
## 54.00000 74.1171098 33.72055 114.51367 12.335907 135.89831
## 54.14286 9.7871663 -30.60941 50.18374 -51.994061 71.56839
## 54.28571 0.3173274 -40.07927 40.71392 -61.463931 62.09859
## 54.42857 101.6260132 61.22940 142.02263 39.844719 163.40731
## 54.57143 92.0688993 51.67225 132.46554 30.287562 153.85024
## 54.71429 71.3339876 30.93731 111.73066 9.552601 133.11537
Looking at this we can see that the AIC is 4583.646, the BIC is 4630.379 and the RMSE is 28.03076.
ARIMA
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
##
## Model df: 6. Total lags used: 14
## Series: ATM2_ts
## ARIMA(2,0,2)(0,1,2)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## -0.4148 -0.8922 0.4008 0.7534 -0.7220 0.0801
## s.e. 0.0674 0.0510 0.1027 0.0688 0.0567 0.0550
##
## sigma^2 estimated as 708.1: log likelihood=-1671.98
## AIC=3357.96 AICc=3358.28 BIC=3385.08
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf Inf 0.8421495 0.009277035
Looking at this we can see that the AIC is 3357.96, the BIC is 3385.08 and the RMSE is 26.12942. The ARIMA model seems to be the best model for ATM2.
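The -Inf and Inf entries for MPE and MAPE in the ATM2 outputs most likely arise because the series contains days with zero cash withdrawn: both measures divide each error by the actual value. A small sketch with made-up numbers shows the effect; `actual` and `fitted` are hypothetical.

```r
# Hypothetical actual and fitted values; note the zero in 'actual'
actual <- c(107, 89, 0, 55, 79)
fitted <- c(100, 90, 12, 60, 75)
err <- actual - fitted

rmse <- sqrt(mean(err^2))              # scale-dependent, unaffected by zeros
mae  <- mean(abs(err))                 # scale-dependent, unaffected by zeros
mpe  <- mean(100 * err / actual)       # divides by 0 on the third day
mape <- mean(100 * abs(err) / actual)  # divides by 0 on the third day

print(c(RMSE = rmse, MAE = mae, MPE = mpe, MAPE = mape))
```

This is one reason to compare the ATM models on RMSE or MASE rather than on the percentage errors.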
ATM3
## # A tibble: 6 x 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2010-04-25 ATM3 0
## 2 2010-04-26 ATM3 0
## 3 2010-04-27 ATM3 0
## 4 2010-04-28 ATM3 96
## 5 2010-04-29 ATM3 82
## 6 2010-04-30 ATM3 85
Since ATM3 has only three non-zero daily values (the last three days of April 2010), we use simple time series forecasting methods: the mean method, the naive method, and the seasonal naive method.
The Mean Method
##
## Ljung-Box test
##
## data: Residuals from Mean
## Q* = 196.67, df = 13, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 14
##
## Forecast method: Mean
##
## Model Information:
## $mu
## [1] 0.7205479
##
## $mu.se
## [1] 0.4158487
##
## $sd
## [1] 7.944778
##
## $bootstrap
## [1] FALSE
##
## $call
## meanf(y = ATM3_ts, h = 7)
##
## attr(,"class")
## [1] "meanf"
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -5.634762e-17 7.933887 1.429251 -Inf Inf 1.945521 0.6403876
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.28571 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.42857 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.57143 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.71429 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.85714 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 54.00000 0.7205479 -9.49357 10.93467 -14.92427 16.36536
The RMSE of this model is 7.933887.
Naive Method
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 8.4931, df = 14, p-value = 0.8621
##
## Model df: 0. Total lags used: 14
##
## Forecast method: Naive method
##
## Model Information:
## Call: naive(y = ATM3_ts, h = 7)
##
## Residual sd: 5.0874
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2335165 5.087423 0.3104396 28.81875 40.20086 0.4225755
## ACF1
## Training set -0.1494714
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 85 78.48021 91.51979 75.02884 94.97116
## 53.28571 85 75.77962 94.22038 70.89864 99.10136
## 53.42857 85 73.70738 96.29262 67.72944 102.27056
## 53.57143 85 71.96041 98.03959 65.05767 104.94233
## 53.71429 85 70.42130 99.57870 62.70380 107.29620
## 53.85714 85 69.02983 100.97017 60.57573 109.42427
## 54.00000 85 67.75025 102.24975 58.61878 111.38122
The RMSE of this model is 5.087423.
Seasonal Naive Method
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 192.93, df = 14, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 14
##
## Forecast method: Seasonal naive method
##
## Model Information:
## Call: snaive(y = ATM3_ts, h = 7)
##
## Residual sd: 8.044
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.7346369 8.044048 0.7346369 100 100 1 0.6403809
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 0 -10.30886 10.30886 -15.76604 15.76604
## 53.28571 0 -10.30886 10.30886 -15.76604 15.76604
## 53.42857 0 -10.30886 10.30886 -15.76604 15.76604
## 53.57143 0 -10.30886 10.30886 -15.76604 15.76604
## 53.71429 96 85.69114 106.30886 80.23396 111.76604
## 53.85714 82 71.69114 92.30886 66.23396 97.76604
## 54.00000 85 74.69114 95.30886 69.23396 100.76604
The RMSE for this model is 8.044048. The best model for ATM3 seems to be the naive method, with an RMSE of 5.087423.
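The three benchmark RMSE values can be checked by hand. The sketch below assumes the ATM3 series is 362 zeros followed by 96, 82 and 85, which is what the tail of the data and the reported mean (0.7205479 × 365 = 263) imply; the vector `atm3` is a reconstruction, not the original object.

```r
# Reconstructed ATM3 series (assumption based on the outputs above)
atm3 <- c(rep(0, 362), 96, 82, 85)
n <- length(atm3)

rmse <- function(e) sqrt(mean(e^2))

# Mean method: every fitted value is the overall mean
rmse_mean <- rmse(atm3 - mean(atm3))

# Naive method: each fitted value is the previous observation
rmse_naive <- rmse(atm3[-1] - atm3[-n])

# Seasonal naive method: each fitted value is the observation 7 days earlier
rmse_snaive <- rmse(atm3[8:n] - atm3[1:(n - 7)])

print(c(mean = rmse_mean, naive = rmse_naive, snaive = rmse_snaive))
```

The naive method has the lowest RMSE largely because almost all of its one-step errors are exactly zero, so with only three real observations the ranking should be read with caution.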
ATM4
Simple Exponential Smoothing
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 4.6668, df = 12, p-value = 0.9682
##
## Model df: 2. Total lags used: 14
The residuals appear to be skewed.
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = ATM4_ts)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 480.6291
##
## sigma: 651.8968
##
## AIC AICc BIC
## 6887.774 6887.840 6899.474
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -6.328783 650.1083 324.4922 -626.1841 655.7821 0.8074068
## ACF1
## Training set -0.009353838
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.28571 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.42857 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.57143 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.71429 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.85714 480.3981 -355.0412 1315.837 -797.2961 1758.092
## 54.00000 480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.14286 480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.28571 480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.42857 480.3981 -355.0413 1315.837 -797.2962 1758.092
Looking at this we can see that the AIC is 6887.774, the BIC is 6899.474 and the RMSE is 650.1083.
HOLT
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 3.7364, df = 3, p-value = 0.2914
##
## Model df: 11. Total lags used: 14
The residuals seem to be normally distributed.
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = ATM4_ts)
##
## Smoothing parameters:
## alpha = 0.0131
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 602.1216
## b = 0.1908
## s = -301.5132 -59.4532 174.6504 11.3506 61.4457 25.7597
## 87.76
##
## sigma: 650.5207
##
## AIC AICc BIC
## 6895.068 6895.954 6941.866
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -7.026146 640.6433 301.9819 -590.6452 619.007 0.7513964
## ACF1
## Training set 0.002054593
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 502.3071 -331.3687 1335.983 -772.6900 1777.304
## 53.28571 537.9914 -295.7575 1371.740 -737.1174 1813.100
## 53.42857 487.7923 -346.0308 1321.615 -787.4300 1763.014
## 53.57143 651.0432 -182.8551 1484.941 -624.2942 1926.381
## 53.71429 416.8786 -417.0961 1250.853 -858.5756 1692.333
## 53.85714 174.7465 -659.3056 1008.799 -1100.8262 1450.319
## 54.00000 564.0173 -270.1135 1398.148 -711.6757 1839.710
## 54.14286 501.8473 -332.3644 1336.059 -773.9694 1777.664
## 54.28571 537.5316 -296.7610 1371.824 -738.4089 1813.472
## 54.42857 487.3325 -347.0423 1321.707 -788.7336 1763.399
## 54.57143 650.5834 -183.8746 1485.041 -625.6100 1926.777
## 54.71429 416.4188 -418.1236 1250.961 -859.9037 1692.741
## 54.85714 174.2868 -660.3413 1008.915 -1102.1667 1450.740
## 55.00000 563.5575 -271.1573 1398.272 -713.0287 1840.144
Looking at this we can see that the AIC is 6895.068, the BIC is 6941.866 and the RMSE is 640.6433.
ARIMA
For ATM4, the ARIMA model is fitted with a Box-Cox transformation (the lambda argument).
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 17.625, df = 11, p-value = 0.09069
##
## Model df: 3. Total lags used: 14
The residuals seem to be skewed.
## Series: ATM4_ts
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= -0.0737252
##
## Coefficients:
## sar1 sar2 mean
## 0.2487 0.1947 4.4864
## s.e. 0.0521 0.0525 0.0841
##
## sigma^2 estimated as 0.8418: log likelihood=-485.59
## AIC=979.18 AICc=979.29 BIC=994.78
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 209.7821 678.9957 317.1943 -239.7645 304.0411 0.7892481
## ACF1
## Training set -0.005100991
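Looking at this we can see that the AIC is 979.18, the BIC is 994.78 and the RMSE is 678.9957. The ARIMA model is chosen as the best model for ATM4 on the basis of its AIC and BIC. Note, however, that these two values are computed on the Box-Cox-transformed scale, so they are not directly comparable with those of the other two models, and that the ARIMA RMSE is the highest of the three.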
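The effect of the Box-Cox transformation can be sketched with the standard one-parameter formula and the lambda reported above. The helper names `box_cox` and `inv_box_cox` and the sample `cash` values are illustrative assumptions; only the lambda and the maximum value 10919.8 come from the outputs.

```r
lambda <- -0.0737252  # value reported in the ARIMA output above

# Standard one-parameter Box-Cox transform for positive data
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to bring forecasts back to the original scale
inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Illustrative cash values, including the spike seen in the summary
cash <- c(50, 480, 1500, 10919.8)
z <- box_cox(cash, lambda)
back <- inv_box_cox(z, lambda)

print(round(z, 3))
print(back)
```

With a lambda close to zero the transform behaves much like a logarithm, which compresses the spike. It also means the likelihood, and therefore the AIC, is measured on a very different scale from the untransformed models.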
## ATM1
ATM1ac <- rbind(accuracy(sesatm1), accuracy(holtatm1), accuracy(ATM1arima))
row.names(ATM1ac) <- c("Simple Exponential Smoothing", "Holt", "ARIMA")
## ATM2
ATM2ac <- rbind(accuracy(sesatm2), accuracy(holtatm2), accuracy(ATM2arima))
row.names(ATM2ac) <- c("Simple Exponential Smoothing", "Holt", "ARIMA")
## ATM4
ATM4ac <- rbind(accuracy(sesatm4), accuracy(holtatm4), accuracy(ATM4arima))
row.names(ATM4ac) <- c("Simple Exponential Smoothing", "Holt", "ARIMA")
ATM 1 Accuracy
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Simple Exponential Smoothing | 0.0780203 | 36.60784 | 27.31876 | -175.7800 | 199.2779 | 1.4516029 | 0.0636595 |
| Holt | -0.3248614 | 26.69971 | 17.28292 | -113.4744 | 129.9950 | 0.9183413 | 0.1421671 |
| ARIMA | -0.0970009 | 25.49555 | 16.17552 | -106.2792 | 122.4165 | 0.8594986 | -0.0111654 |
ATM 2 Accuracy
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Simple Exponential Smoothing | -2.5635958 | 38.57427 | 33.09136 | -Inf | Inf | 1.5399380 | -0.0220818 |
| Holt | -0.0044981 | 28.03076 | 19.38286 | -Inf | Inf | 0.9019996 | -0.0070547 |
| ARIMA | -0.4583759 | 26.12942 | 18.09675 | -Inf | Inf | 0.8421495 | 0.0092770 |
ATM 4 Accuracy
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Simple Exponential Smoothing | -6.328783 | 650.1083 | 324.4922 | -626.1841 | 655.7821 | 0.8074068 | -0.0093538 |
| Holt | -7.026146 | 640.6433 | 301.9819 | -590.6452 | 619.0070 | 0.7513964 | 0.0020546 |
| ARIMA | 209.782088 | 678.9957 | 317.1943 | -239.7645 | 304.0411 | 0.7892481 | -0.0051010 |
Forecast
Best Models for the Forecast
ATM1 - ARIMA(0,0,1)(0,1,1)[7] - AIC: 3322.63 RMSE: 25.49555
ATM2 - ARIMA(2,0,2)(0,1,2)[7] - AIC: 3357.96 RMSE: 26.12942
ATM3 - Naive Model - RMSE: 5.087423
ATM4 - ARIMA(0,0,0)(2,0,0)[7] - AIC: 979.18 RMSE: 678.9957
Summary of ATM Data
As noted before, ATM1, ATM2, and ATM4 all seem to have seasonality. Three different models were fitted for ATM1, ATM2, and ATM4: Simple Exponential Smoothing, Holt, and ARIMA. The ARIMA model was selected as the best for all three. These were the results: ATM1 - ARIMA(0,0,1)(0,1,1)[7], AIC: 3322.63, RMSE: 25.49555; ATM2 - ARIMA(2,0,2)(0,1,2)[7], AIC: 3357.96, RMSE: 26.12942; ATM4 - ARIMA(0,0,0)(2,0,0)[7], AIC: 979.18, RMSE: 678.9957. For ATM3 the mean, naive, and snaive methods were used; the naive model performed best, with an RMSE of 5.087423.
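The May 2010 forecast needs 31 daily values per ATM. As a rough sketch of how a 31-step forecast lines up with calendar dates, the block below fits a base R Holt-Winters model to a synthetic series and pairs its predictions with the dates. The series `toy`, the stand-in model and the names `may_dates` and `out` are assumptions, not the objects used in the report.

```r
set.seed(3)

# Synthetic daily series with a weekly pattern (assumption)
weekly <- c(10, 5, 0, -40, 0, 5, 20)
toy <- ts(80 + rep(weekly, 52) + rnorm(364, sd = 8), frequency = 7)

# Stand-in model; the report selects ARIMA models instead
fit <- HoltWinters(toy, seasonal = "additive")

# 31 daily steps ahead, one for each day of May 2010
may_dates <- seq(as.Date("2010-05-01"), by = "day", length.out = 31)
fc <- predict(fit, n.ahead = 31)

# Plain numeric column so the data frame writes cleanly to CSV
out <- data.frame(DATE = may_dates, ATM = "ATM1", Cash = as.numeric(fc))
print(head(out))
```

Whatever model is used, the forecast horizon has to be exactly 31 so that the values and the date vector have the same length.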
# Forecast dates: 1 May 2010 through 31 May 2010
sd <- as.Date("2010-05-01")
fd <- sd + 0:30
# forecastATM1 ... forecastATM4 must each hold a 31-step forecast
a1 <- forecastATM1
b1 <- data.frame(DATE=fd, ATM=rep("ATM1", 31), Cash=as.numeric(a1$mean))
a2 <- forecastATM2
b2 <- data.frame(DATE=fd, ATM=rep("ATM2", 31), Cash=as.numeric(a2$mean))
a3 <- forecastATM3
b3 <- data.frame(DATE=fd, ATM=rep("ATM3", 31), Cash=as.numeric(a3$mean))
a4 <- forecastATM4
b4 <- data.frame(DATE=fd, ATM=rep("ATM4", 31), Cash=as.numeric(a4$mean))
# Stack the four ATMs and write an Excel-readable CSV file
df <- bind_rows(list(b1, b2, b3, b4))
write.csv(df, file = "All ATMs Predicted.csv", row.names = FALSE)
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward.
Quick look at the Data
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
## Rows: 192
## Columns: 3
## $ CaseSequence <dbl> 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 74~
## $ `YYYY-MMM` <chr> "1998-Jan", "1998-Feb", "1998-Mar", "1998-Apr", "1998-May~
## $ KWH <dbl> 6862583, 5838198, 5420658, 5010364, 4665377, 6467147, 891~
There are 192 rows and 3 columns.
Data Preparation
Missing Data
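Only one column, KWH, has missing data, and only one row is affected.
To remove the missing value we use na.omit. Once it is removed, we convert the data into a time series. Note that dropping a month leaves 191 observations, so the series ends one month early; this is why the forecast tables below start at Dec 2013 rather than Jan 2014.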
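The effect of dropping a month, and one possible alternative, can be sketched on a small example. The 24 values in `kwh` are hypothetical, and linear interpolation with `approx()` is only one of several imputation choices; it is not the method used in the report.

```r
# Hypothetical 24-month series starting Jan 1998 with one missing month
kwh <- c(6.9, 5.8, 5.4, 5.0, 4.7, 6.5, 8.9, 8.0, NA, 5.1, 4.8, 6.2,
         7.2, 6.0, 5.5, 5.1, 4.9, 6.7, 9.1, 8.3, 6.6, 5.2, 4.9, 6.4) * 1e6

# Dropping the NA shortens the series, so every later month shifts by one
dropped <- ts(as.numeric(na.omit(kwh)), start = c(1998, 1), frequency = 12)

# Linear interpolation keeps all 24 months in their calendar positions
idx <- seq_along(kwh)
filled_vals <- approx(idx, kwh, xout = idx)$y
filled <- ts(filled_vals, start = c(1998, 1), frequency = 12)

print(end(dropped))  # last period of the shortened series
print(end(filled))   # last period of the interpolated series
```

With interpolation the series keeps its full length, so a 12-step forecast would cover the following calendar year exactly.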
Models
For the power dataset we will create three different models: Simple Exponential Smoothing, Holt, and ARIMA. We will look at the AIC, BIC, and RMSE and pick the model that has the lowest of all three.
We can see that there is seasonality, and there was a big outlier in June 2010.
You can see here that the series is strongly correlated with itself at lag 12, which points to yearly seasonality.
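The lag-12 pattern can be illustrated with the autocorrelation function on a synthetic monthly series. The series `toy` below is made up (a yearly sine wave plus noise) and only loosely resembles the scale of the KWH data.

```r
set.seed(12)

# Synthetic monthly series with a yearly cycle (assumption)
months <- 1:192
toy <- ts(6.5e6 + 1.5e6 * sin(2 * pi * months / 12) + rnorm(192, sd = 3e5),
          start = c(1998, 1), frequency = 12)

# Autocorrelations up to 24 months; element 1 is lag 0
a <- acf(toy, lag.max = 24, plot = FALSE)
acf_lag6  <- a$acf[7]   # half a year apart
acf_lag12 <- a$acf[13]  # one year apart

print(c(lag6 = acf_lag6, lag12 = acf_lag12))
```

A high positive value at lag 12 together with a negative value at lag 6 is the typical signature of a yearly cycle in monthly data.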
Simple Exponential Smoothing
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 581.22, df = 22, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 24
The residuals look roughly normally distributed, but the Ljung-Box p-value (< 2.2e-16) shows that they are strongly autocorrelated.
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = power_datats)
##
## Smoothing parameters:
## alpha = 0.0389
##
## Initial states:
## l = 6150702.2675
##
## sigma: 1424322
##
## AIC AICc BIC
## 6419.811 6419.939 6429.567
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 141716.7 1416845 1176581 -5.38936 21.55473 1.581218 0.3671121
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Dec 2013 7204795 5379453 9030137 4413175 9996414
## Jan 2014 7204795 5378069 9031520 4411059 9998530
## Feb 2014 7204795 5376687 9032903 4408945 10000645
## Mar 2014 7204795 5375305 9034284 4406832 10002757
## Apr 2014 7204795 5373925 9035665 4404721 10004869
## May 2014 7204795 5372546 9037044 4402611 10006978
## Jun 2014 7204795 5371167 9038422 4400503 10009086
## Jul 2014 7204795 5369790 9039800 4398397 10011193
## Aug 2014 7204795 5368414 9041176 4396292 10013297
## Sep 2014 7204795 5367038 9042551 4394189 10015401
Looking at this we can see that the AIC is 6419.811, the BIC is 6429.567 and the RMSE is 1416845.
HOLT
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 35.221, df = 8, p-value = 2.437e-05
##
## Model df: 16. Total lags used: 24
The residuals look almost normally distributed, although the Ljung-Box p-value (2.437e-05) shows that some autocorrelation remains.
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = power_datats)
##
## Smoothing parameters:
## alpha = 0.0141
## beta = 0.0014
## gamma = 0.354
##
## Initial states:
## l = 6365037.9004
## b = 5957.8189
## s = 290670.2 -935083.3 -900245.6 422442.6 1494020 1462846
## -89005.15 -844980.1 -1259442 -731886.3 6783.244 1083881
##
## sigma: 1040969
##
## AIC AICc BIC
## 6313.337 6316.875 6368.626
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 57861.12 996414.8 656759 -4.275331 13.55529 0.8826242 0.2362087
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Dec 2013 9708882 8374827 11042938 7668621 11749144
## Jan 2014 8527516 7193300 9861732 6487009 10568023
## Feb 2014 7122739 5788332 8457145 5081940 9163537
## Mar 2014 6434973 5100343 7769603 4393833 8476113
## Apr 2014 6212687 4877799 7547576 4171151 8254223
## May 2014 7941345 6606159 9276530 5899355 9983335
## Jun 2014 8345415 7009892 9680938 6302909 10387921
## Jul 2014 9634765 8298861 10970668 7591677 11677853
## Aug 2014 8566996 7230666 9903326 6523256 10610736
## Sep 2014 6395805 5059001 7732610 4351339 8440272
## Oct 2014 6269112 4931782 7606442 4223842 8314382
## Nov 2014 8391313 7053404 9729222 6345157 10437469
## Dec 2014 9966065 8533020 11399111 7774411 12157719
## Jan 2015 8784699 7351005 10218392 6592054 10977343
## Feb 2015 7379922 5945523 8814320 5186199 9573645
## Mar 2015 6692156 5256993 8127318 4497264 8887048
## Apr 2015 6469870 5033881 7905859 4273715 8666025
## May 2015 8198528 6761649 9635406 6001011 10396044
## Jun 2015 8602598 7164763 10040433 6403619 10801577
## Jul 2015 9891947 8453087 11330808 7691400 12092494
## Aug 2015 8824179 7384223 10264135 6621956 11026402
## Sep 2015 6652988 5211863 8094114 4448977 8856999
## Oct 2015 6526295 5083925 7968665 4320380 8732210
## Nov 2015 8648496 7204803 10092188 6440559 10856433
Looking at this we can see that the AIC is 6313.337, the BIC is 6368.626 and the RMSE is 996414.8, which is better than simple exponential smoothing on all three measures.
ARIMA
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,2)[12] with drift
## Q* = 17.494, df = 20, p-value = 0.6207
##
## Model df: 4. Total lags used: 24
The residuals look almost normally distributed, and the Ljung-Box p-value (0.6207) gives no evidence of remaining autocorrelation.
## Series: power_datats
## ARIMA(0,0,1)(0,1,2)[12] with drift
##
## Coefficients:
## ma1 sma1 sma2 drift
## 0.2431 -0.7287 0.1929 8501.749
## s.e. 0.0773 0.0800 0.0861 3639.647
##
## sigma^2 estimated as 9.396e+11: log likelihood=-2722.65
## AIC=5455.3 AICc=5455.65 BIC=5471.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -10349.99 927823.9 576140 -5.029934 12.4629 0.7742797 -0.002808329
Looking at this we can see that the AIC is 5455.3, the BIC is 5471.24 and the RMSE is 927823.9. ARIMA seems to be the best model because its AIC, BIC and RMSE are the lowest of the three models.
Accuracy
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Simple Exponential Smoothing | 141716.74 | 1416845.0 | 1176581 | -5.389360 | 21.55473 | 1.5812182 | 0.3671121 |
| Holt | 57861.12 | 996414.8 | 656759 | -4.275331 | 13.55529 | 0.8826242 | 0.2362087 |
| ARIMA | -10349.99 | 927823.9 | 576140 | -5.029935 | 12.46290 | 0.7742797 | -0.0028083 |
Summary of the Power Data
Three different models were fitted: Simple Exponential Smoothing, Holt, and ARIMA. Of the three, the best was the ARIMA model, with an AIC of 5455.3, a BIC of 5471.24 and an RMSE of 927823.9, the lowest values on all three measures.