In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
Missing values were removed from the data.
Here we read in the data.
Clean up missing values from data.
## DATE ATM Cash
## 0 14 19
## DATE ATM Cash
## 0 0 0
Atm transaction aata is divided up by weeks. The data seems to be static b/c of the ACF Model. We do not need to do further transformation. There is some seasonality as we can see downward trends on Mondays and Wednesdays.
The dataset does not show very prevalent seasonality. Exponential smoothing is applied to where there is no seasonality.
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm1TimeSeries)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 83.7797
##
## sigma: 36.7094
##
## AIC AICc BIC
## 4745.365 4745.432 4757.040
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.07802025 36.60783 27.31876 -175.78 199.2779 1.451603 0.06365945
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.71429 83.78256 36.7376 130.8275 11.83350 155.7316
## 52.85714 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.00000 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.14286 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.28571 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.42857 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.57143 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.71429 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.85714 83.78256 36.7376 130.8275 11.83349 155.7316
## 54.00000 83.78256 36.7376 130.8275 11.83349 155.7316
Based on the results, the ARIMA model seems to be the most optimal
model here with the best values for AIC, BIC, and RMSE.
#### ARIMA
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
##
## Model df: 2. Total lags used: 14
## Series: atm1TimeSeries
## ARIMA(0,0,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## 0.1674 -0.5813
## s.e. 0.0565 0.0472
##
## sigma^2 = 666.6: log likelihood = -1658.32
## AIC=3322.63 AICc=3322.7 BIC=3334.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
## ACF1
## Training set -0.01116544
AIC, BIC, and RMSE values are more optimal here, rather than with
simple exponential smoothing.
#### ETS
## ETS(A,N,A)
##
## Call:
## ets(y = atm1TimeSeries)
##
## Smoothing parameters:
## alpha = 1e-04
## gamma = 0.415
##
## Initial states:
## l = 87.7888
## s = -54.0598 16.2223 21.048 -26.068 8.3991 35.2058
## -0.7474
##
## sigma: 26.384
##
## AIC AICc BIC
## 4513.138 4513.765 4552.055
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.5473075 26.05397 16.9637 -110.6377 126.3744 0.9013793 0.1374786
ATM2 does not need further transformation and the data is stationary. There is white noise.
Using Simple Exponential Smoothing Model, AIC, BIC, and RMSE values seem as expected.
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 515.46, df = 12, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 14
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm2_timeSeries)
##
## Smoothing parameters:
## alpha = 0.016
##
## Initial states:
## l = 72.14
##
## sigma: 38.681
##
## AIC AICc BIC
## 4797.445 4797.512 4809.128
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -2.563596 38.57427 33.09136 -Inf Inf 1.539938 -0.02208177
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.85714 57.26713 7.695463 106.8388 -18.54619 133.0804
## 53.00000 57.26713 7.689132 106.8451 -18.55587 133.0901
## 53.14286 57.26713 7.682802 106.8514 -18.56555 133.0998
## 53.28571 57.26713 7.676473 106.8578 -18.57523 133.1095
## 53.42857 57.26713 7.670145 106.8641 -18.58491 133.1192
## 53.57143 57.26713 7.663817 106.8704 -18.59459 133.1288
## 53.71429 57.26713 7.657491 106.8768 -18.60426 133.1385
## 53.85714 57.26713 7.651165 106.8831 -18.61394 133.1482
## 54.00000 57.26713 7.644840 106.8894 -18.62361 133.1579
## 54.14286 57.26713 7.638515 106.8957 -18.63328 133.1675
This is the best model and has the best AIC, BIC, and RMSE (lowest) values compared to the other two models. Also the data is stationary; data is normal and does not need to be transformed.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
##
## Model df: 6. Total lags used: 14
## Series: atm2_timeSeries
## ARIMA(2,0,2)(0,1,2)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## -0.4148 -0.8922 0.4008 0.7534 -0.7220 0.0801
## s.e. 0.0674 0.0510 0.1027 0.0688 0.0567 0.0550
##
## sigma^2 = 708.1: log likelihood = -1671.98
## AIC=3357.96 AICc=3358.28 BIC=3385.08
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf Inf 0.8421495 0.009277035
Transformation is not necessary since the data is stationary. AIC, BIC, and RMSE values are better here than with Simple Exponential Smoothing which is good.
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 38.485, df = 5, p-value = 3.015e-07
##
## Model df: 9. Total lags used: 14
## ETS(A,N,A)
##
## Call:
## ets(y = atm2_timeSeries)
##
## Smoothing parameters:
## alpha = 1e-04
## gamma = 0.4647
##
## Initial states:
## l = 76.1082
## s = 11.3136 -47.8461 -13.3718 -1.8486 18.6895 4.6056
## 28.4579
##
## sigma: 27.8902
##
## AIC AICc BIC
## 4566.882 4567.507 4605.826
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.6349959 27.54227 19.09383 -Inf Inf 0.8885496 -0.02291393
Much of the values are 0 in the ATM 3 data, so applying all 3 models here will not be effective.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.7206 0.0000 96.0000
Data for ATM 4 is stationary based on the ACF plot but data is not
normal. Using Box Cox transformation we see more optimal, lower values
for AIC, BIC, and RMSE.
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 68.994, df = 12, p-value = 4.939e-10
##
## Model df: 2. Total lags used: 14
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm4_timeSeries, lambda = "auto")
##
## Box-Cox transformation: lambda= -0.0737
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 4.4957
##
## sigma: 0.9807
##
## AIC AICc BIC
## 2143.231 2143.298 2154.931
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 238.6335 692.468 345.1077 -255.9182 328.5578 0.8587028
## ACF1
## Training set -0.009363934
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.28571 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.42857 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.57143 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.71429 235.4013 40.47783 1780.964 17.35251 5957.287
## 53.85714 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.00000 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.14286 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.28571 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.42857 235.4013 40.47783 1780.965 17.35251 5957.287
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 17.625, df = 11, p-value = 0.09069
##
## Model df: 3. Total lags used: 14
## Series: atm4_timeSeries
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= -0.0737252
##
## Coefficients:
## sar1 sar2 mean
## 0.2487 0.1947 4.4864
## s.e. 0.0521 0.0525 0.0841
##
## sigma^2 = 0.8418: log likelihood = -485.59
## AIC=979.18 AICc=979.29 BIC=994.78
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 209.7821 678.9957 317.1943 -239.7645 304.0411 0.7892481
## ACF1
## Training set -0.005100991
AIC, BIC, and RMSE values are getting better here than with Simple
Exponential Smoothing.
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 18.809, df = 5, p-value = 0.002086
##
## Model df: 9. Total lags used: 14
## ETS(A,N,A)
##
## Call:
## ets(y = atm4_timeSeries, lambda = "auto")
##
## Box-Cox transformation: lambda= -0.0737
##
## Smoothing parameters:
## alpha = 0.0138
## gamma = 0.1424
##
## Initial states:
## l = 4.4335
## s = -1.3504 -0.0467 0.0246 0.3089 0.3808 0.2859
## 0.397
##
## sigma: 0.8982
##
## AIC AICc BIC
## 2085.957 2086.578 2124.956
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 177.0985 664.9255 308.4432 -271.5266 333.9661 0.7674735
## ACF1
## Training set -0.004544246
Lower and upper values are in 80th percentile and 95th percentile.
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.71429 86.805607 53.71784 119.89337 36.20224 137.40898
## 52.85714 100.640560 67.09247 134.18865 49.33319 151.94793
## 53.00000 74.714560 41.16647 108.26265 23.40719 126.02193
## 53.14286 4.762635 -28.78545 38.31072 -46.54474 56.07001
## 53.28571 100.063192 66.51510 133.61128 48.75582 151.37057
Lower and upper values are in 80th percentile and 95th percentile.
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.85714 69.432448 35.33003 103.53487 17.27730 121.58759
## 53.00000 69.881366 35.77560 103.98713 17.72111 122.04163
## 53.14286 12.308924 -22.09671 46.71456 -40.30996 64.92780
## 53.28571 2.758906 -31.72395 37.24176 -49.97807 55.49588
## 53.42857 99.519624 64.89886 134.14038 46.57174 152.46751
Lower and upper values are in 80th percentile and 95th percentile.
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 53.14286 90.23023 19.20879 517.3860 9.049301 1441.116
## 53.28571 275.70713 51.80185 1856.9627 23.087713 5748.043
## 53.42857 328.95851 60.54706 2275.9529 26.741696 7171.648
## 53.57143 72.60334 15.82001 404.1714 7.530804 1104.380
## 53.71429 303.15712 56.33393 2071.4372 24.985772 6473.369
##Conclusion
Missing values were removed/cleaned out from the data. Data was measured
with time series on a weekly interval. SES, ARIMA, and ETS modelling
were used and compared between ATM1, ATM2, and ATM4. ARIMA had the best
results b/c of the best RMSE, AIC, and BIC values. The values were
forecasted for several weeks - there was 80% and 95% confidence interval
for lower and upper values as displayed in excel exports.
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
## CaseSequence YYYY-MMM KWH
## 0 0 1
## Jan Feb Mar Apr May Jun Jul Aug
## 1998 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428
## 1999 7183759 5759262 4847656 5306592 4426794 5500901 7444416 7564391
## 2000 7068296 5876083 4807961 4873080 5050891 7092865 6862662 7517830
## 2001 7538529 6602448 5779180 4835210 4787904 6283324 7855129 8450717
## 2002 7099063 6413429 5839514 5371604 5439166 5850383 7039702 8058748
## 2003 7256079 6190517 6120626 4885643 5296096 6051571 6900676 8476499
## 2004 7584596 6560742 6526586 4831688 4878262 6421614 7307931 7309774
## 2005 8225477 6564338 5581725 5563071 4453983 5900212 8337998 7786659
## 2006 7793358 5914945 5819734 5255988 4740588 7052275 7945564 8241110
## 2007 8031295 7928337 6443170 4841979 4862847 5022647 6426220 7447146
## 2008 7964293 7597060 6085644 5352359 4608528 6548439 7643987 8037137
## 2009 8072330 6976800 5691452 5531616 5264439 5804433 7713260 8350517
## 2010 9397357 8390677 7347915 5776131 4919289 6696292 770523 7922701
## 2011 8394747 8898062 6356903 5685227 5506308 8037779 10093343 10308076
## 2012 8991267 7952204 6356961 5569828 5783598 7926956 8886851 9612423
## 2013 10655730 7681798 6517514 6105359 5940475 7920627 8415321 9080226
## Sep Oct Nov Dec
## 1998 6989888 6345620 4640410 4693479
## 1999 7899368 5358314 4436269 4419229
## 2000 8912169 5844352 5041769 6220334
## 2001 7112069 5242535 4461979 5240995
## 2002 8245227 5865014 4908979 5779958
## 2003 7791791 5344613 4913707 5756193
## 2004 6690366 5444948 4824940 5791208
## 2005 7057213 6694523 4313019 6181548
## 2006 7296355 5104799 4458429 6226214
## 2007 7666970 5785964 4907057 6047292
## 2008 6283324 5101803 4555602 6442746
## 2009 7583146 5566075 5339890 7089880
## 2010 7819472 5875917 4800733 6152583
## 2011 8943599 5603920 6154138 8273142
## 2012 7559148 5576852 5731899 6609694
## 2013 7968220 5759367 5769083 9606304
## [1] "Differencing: 1"
## [1] "How much more differencing is needed? 0"
## [1] "Differencing : 0"
Lower and upper values had confidence intervals of 80% and 95%.
## Series: power_diff
## ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 sma2 mean
## 0.1195 -0.9264 0.0890 0.0616
## s.e. 0.0712 0.0810 0.0811 0.0198
##
## sigma^2 = 0.9557: log likelihood = -257.11
## AIC=524.23 AICc=524.57 BIC=540.19
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
## ACF1
## Training set 0.006469743
The power data was broken up into monthly intervals. Missing values were cleaned up and in place, the median values were used. Using Time Series, applied were BoxCox transformations and ARIMA model to ultimately find the best method. ARIMA model was selected b/c for monthly prediction of power consumption, this model displayed most optimal results with the best RMSE, AIC, and BIC values.