Part A – ATM Forecast, ATM624Data.xlsx

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

Data Preparation and Cleaning

Missing values were removed from the data.

Here we read in the data.

Clean up missing values from data.

## DATE  ATM Cash 
##    0   14   19
## DATE  ATM Cash 
##    0    0    0

Data Exploration and Modelling

Atm transaction aata is divided up by weeks. The data seems to be static b/c of the ACF Model. We do not need to do further transformation. There is some seasonality as we can see downward trends on Mondays and Wednesdays.

The dataset does not show very prevalent seasonality. Exponential smoothing is applied to where there is no seasonality.

ATM 1

Exponential Smoothing

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = atm1TimeSeries) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 83.7797 
## 
##   sigma:  36.7094
## 
##      AIC     AICc      BIC 
## 4745.365 4745.432 4757.040 
## 
## Error measures:
##                      ME     RMSE      MAE     MPE     MAPE     MASE       ACF1
## Training set 0.07802025 36.60783 27.31876 -175.78 199.2779 1.451603 0.06365945
## 
## Forecasts:
##          Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 52.71429       83.78256 36.7376 130.8275 11.83350 155.7316
## 52.85714       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.00000       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.14286       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.28571       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.42857       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.57143       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.71429       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.85714       83.78256 36.7376 130.8275 11.83349 155.7316
## 54.00000       83.78256 36.7376 130.8275 11.83349 155.7316

Based on the results, the ARIMA model seems to be the most optimal model here with the best values for AIC, BIC, and RMSE.
#### ARIMA

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
## 
## Model df: 2.   Total lags used: 14
## Series: atm1TimeSeries 
## ARIMA(0,0,1)(0,1,1)[7] 
## 
## Coefficients:
##          ma1     sma1
##       0.1674  -0.5813
## s.e.  0.0565   0.0472
## 
## sigma^2 = 666.6:  log likelihood = -1658.32
## AIC=3322.63   AICc=3322.7   BIC=3334.25
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
##                     ACF1
## Training set -0.01116544

AIC, BIC, and RMSE values are more optimal here, rather than with simple exponential smoothing.
#### ETS

## ETS(A,N,A) 
## 
## Call:
##  ets(y = atm1TimeSeries) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     gamma = 0.415 
## 
##   Initial states:
##     l = 87.7888 
##     s = -54.0598 16.2223 21.048 -26.068 8.3991 35.2058
##            -0.7474
## 
##   sigma:  26.384
## 
##      AIC     AICc      BIC 
## 4513.138 4513.765 4552.055 
## 
## Training set error measures:
##                      ME     RMSE     MAE       MPE     MAPE      MASE      ACF1
## Training set -0.5473075 26.05397 16.9637 -110.6377 126.3744 0.9013793 0.1374786

ATM 2

Exponential Smoothing

ATM2 does not need further transformation and the data is stationary. There is white noise.

Using Simple Exponential Smoothing Model, AIC, BIC, and RMSE values seem as expected.

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 515.46, df = 12, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 14
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = atm2_timeSeries) 
## 
##   Smoothing parameters:
##     alpha = 0.016 
## 
##   Initial states:
##     l = 72.14 
## 
##   sigma:  38.681
## 
##      AIC     AICc      BIC 
## 4797.445 4797.512 4809.128 
## 
## Error measures:
##                     ME     RMSE      MAE  MPE MAPE     MASE        ACF1
## Training set -2.563596 38.57427 33.09136 -Inf  Inf 1.539938 -0.02208177
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 52.85714       57.26713 7.695463 106.8388 -18.54619 133.0804
## 53.00000       57.26713 7.689132 106.8451 -18.55587 133.0901
## 53.14286       57.26713 7.682802 106.8514 -18.56555 133.0998
## 53.28571       57.26713 7.676473 106.8578 -18.57523 133.1095
## 53.42857       57.26713 7.670145 106.8641 -18.58491 133.1192
## 53.57143       57.26713 7.663817 106.8704 -18.59459 133.1288
## 53.71429       57.26713 7.657491 106.8768 -18.60426 133.1385
## 53.85714       57.26713 7.651165 106.8831 -18.61394 133.1482
## 54.00000       57.26713 7.644840 106.8894 -18.62361 133.1579
## 54.14286       57.26713 7.638515 106.8957 -18.63328 133.1675

ARIMA

This is the best model and has the best AIC, BIC, and RMSE (lowest) values compared to the other two models. Also the data is stationary; data is normal and does not need to be transformed.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
## 
## Model df: 6.   Total lags used: 14
## Series: atm2_timeSeries 
## ARIMA(2,0,2)(0,1,2)[7] 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1    sma2
##       -0.4148  -0.8922  0.4008  0.7534  -0.7220  0.0801
## s.e.   0.0674   0.0510  0.1027  0.0688   0.0567  0.0550
## 
## sigma^2 = 708.1:  log likelihood = -1671.98
## AIC=3357.96   AICc=3358.28   BIC=3385.08
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf  Inf 0.8421495 0.009277035

ETS

Transformation is not necessary since the data is stationary. AIC, BIC, and RMSE values are better here than with Simple Exponential Smoothing which is good.

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 38.485, df = 5, p-value = 3.015e-07
## 
## Model df: 9.   Total lags used: 14
## ETS(A,N,A) 
## 
## Call:
##  ets(y = atm2_timeSeries) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     gamma = 0.4647 
## 
##   Initial states:
##     l = 76.1082 
##     s = 11.3136 -47.8461 -13.3718 -1.8486 18.6895 4.6056
##            28.4579
## 
##   sigma:  27.8902
## 
##      AIC     AICc      BIC 
## 4566.882 4567.507 4605.826 
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set -0.6349959 27.54227 19.09383 -Inf  Inf 0.8885496 -0.02291393

ATM 3

Much of the values are 0 in the ATM 3 data, so applying all 3 models here will not be effective.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7206  0.0000 96.0000

ATM 4

Exponential Smoothing

Data for ATM 4 is stationary based on the ACF plot but data is not normal. Using Box Cox transformation we see more optimal, lower values for AIC, BIC, and RMSE.

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 68.994, df = 12, p-value = 4.939e-10
## 
## Model df: 2.   Total lags used: 14
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = atm4_timeSeries, lambda = "auto") 
## 
##   Box-Cox transformation: lambda= -0.0737 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 4.4957 
## 
##   sigma:  0.9807
## 
##      AIC     AICc      BIC 
## 2143.231 2143.298 2154.931 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE
## Training set 238.6335 692.468 345.1077 -255.9182 328.5578 0.8587028
##                      ACF1
## Training set -0.009363934
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 53.14286       235.4013 40.47784 1780.964 17.35251 5957.286
## 53.28571       235.4013 40.47784 1780.964 17.35251 5957.286
## 53.42857       235.4013 40.47784 1780.964 17.35251 5957.286
## 53.57143       235.4013 40.47784 1780.964 17.35251 5957.286
## 53.71429       235.4013 40.47783 1780.964 17.35251 5957.287
## 53.85714       235.4013 40.47783 1780.965 17.35251 5957.287
## 54.00000       235.4013 40.47783 1780.965 17.35251 5957.287
## 54.14286       235.4013 40.47783 1780.965 17.35251 5957.287
## 54.28571       235.4013 40.47783 1780.965 17.35251 5957.287
## 54.42857       235.4013 40.47783 1780.965 17.35251 5957.287

ARIMA

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 17.625, df = 11, p-value = 0.09069
## 
## Model df: 3.   Total lags used: 14
## Series: atm4_timeSeries 
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean 
## Box Cox transformation: lambda= -0.0737252 
## 
## Coefficients:
##         sar1    sar2    mean
##       0.2487  0.1947  4.4864
## s.e.  0.0521  0.0525  0.0841
## 
## sigma^2 = 0.8418:  log likelihood = -485.59
## AIC=979.18   AICc=979.29   BIC=994.78
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 209.7821 678.9957 317.1943 -239.7645 304.0411 0.7892481
##                      ACF1
## Training set -0.005100991

ETS

AIC, BIC, and RMSE values are getting better here than with Simple Exponential Smoothing.

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 18.809, df = 5, p-value = 0.002086
## 
## Model df: 9.   Total lags used: 14
## ETS(A,N,A) 
## 
## Call:
##  ets(y = atm4_timeSeries, lambda = "auto") 
## 
##   Box-Cox transformation: lambda= -0.0737 
## 
##   Smoothing parameters:
##     alpha = 0.0138 
##     gamma = 0.1424 
## 
##   Initial states:
##     l = 4.4335 
##     s = -1.3504 -0.0467 0.0246 0.3089 0.3808 0.2859
##            0.397
## 
##   sigma:  0.8982
## 
##      AIC     AICc      BIC 
## 2085.957 2086.578 2124.956 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 177.0985 664.9255 308.4432 -271.5266 333.9661 0.7674735
##                      ACF1
## Training set -0.004544246

Predictions

ATM 1 Test

Lower and upper values are in 80th percentile and 95th percentile.

##          Point.Forecast     Lo.80     Hi.80     Lo.95     Hi.95
## 52.71429      86.805607  53.71784 119.89337  36.20224 137.40898
## 52.85714     100.640560  67.09247 134.18865  49.33319 151.94793
## 53.00000      74.714560  41.16647 108.26265  23.40719 126.02193
## 53.14286       4.762635 -28.78545  38.31072 -46.54474  56.07001
## 53.28571     100.063192  66.51510 133.61128  48.75582 151.37057

ATM 2 Test

Lower and upper values are in 80th percentile and 95th percentile.

##          Point.Forecast     Lo.80     Hi.80     Lo.95     Hi.95
## 52.85714      69.432448  35.33003 103.53487  17.27730 121.58759
## 53.00000      69.881366  35.77560 103.98713  17.72111 122.04163
## 53.14286      12.308924 -22.09671  46.71456 -40.30996  64.92780
## 53.28571       2.758906 -31.72395  37.24176 -49.97807  55.49588
## 53.42857      99.519624  64.89886 134.14038  46.57174 152.46751

ATM 4 Test

Lower and upper values are in 80th percentile and 95th percentile.

##          Point.Forecast    Lo.80     Hi.80     Lo.95    Hi.95
## 53.14286       90.23023 19.20879  517.3860  9.049301 1441.116
## 53.28571      275.70713 51.80185 1856.9627 23.087713 5748.043
## 53.42857      328.95851 60.54706 2275.9529 26.741696 7171.648
## 53.57143       72.60334 15.82001  404.1714  7.530804 1104.380
## 53.71429      303.15712 56.33393 2071.4372 24.985772 6473.369

##Conclusion
Missing values were removed/cleaned out from the data. Data was measured with time series on a weekly interval. SES, ARIMA, and ETS modelling were used and compared between ATM1, ATM2, and ATM4. ARIMA had the best results b/c of the best RMSE, AIC, and BIC values. The values were forecasted for several weeks - there was 80% and 95% confidence interval for lower and upper values as displayed in excel exports.

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

Data Preparation and Cleaning

##   CaseSequence     YYYY-MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1
## CaseSequence     YYYY-MMM          KWH 
##            0            0            1
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1998  6862583  5838198  5420658  5010364  4665377  6467147  8914755  8607428
## 1999  7183759  5759262  4847656  5306592  4426794  5500901  7444416  7564391
## 2000  7068296  5876083  4807961  4873080  5050891  7092865  6862662  7517830
## 2001  7538529  6602448  5779180  4835210  4787904  6283324  7855129  8450717
## 2002  7099063  6413429  5839514  5371604  5439166  5850383  7039702  8058748
## 2003  7256079  6190517  6120626  4885643  5296096  6051571  6900676  8476499
## 2004  7584596  6560742  6526586  4831688  4878262  6421614  7307931  7309774
## 2005  8225477  6564338  5581725  5563071  4453983  5900212  8337998  7786659
## 2006  7793358  5914945  5819734  5255988  4740588  7052275  7945564  8241110
## 2007  8031295  7928337  6443170  4841979  4862847  5022647  6426220  7447146
## 2008  7964293  7597060  6085644  5352359  4608528  6548439  7643987  8037137
## 2009  8072330  6976800  5691452  5531616  5264439  5804433  7713260  8350517
## 2010  9397357  8390677  7347915  5776131  4919289  6696292   770523  7922701
## 2011  8394747  8898062  6356903  5685227  5506308  8037779 10093343 10308076
## 2012  8991267  7952204  6356961  5569828  5783598  7926956  8886851  9612423
## 2013 10655730  7681798  6517514  6105359  5940475  7920627  8415321  9080226
##           Sep      Oct      Nov      Dec
## 1998  6989888  6345620  4640410  4693479
## 1999  7899368  5358314  4436269  4419229
## 2000  8912169  5844352  5041769  6220334
## 2001  7112069  5242535  4461979  5240995
## 2002  8245227  5865014  4908979  5779958
## 2003  7791791  5344613  4913707  5756193
## 2004  6690366  5444948  4824940  5791208
## 2005  7057213  6694523  4313019  6181548
## 2006  7296355  5104799  4458429  6226214
## 2007  7666970  5785964  4907057  6047292
## 2008  6283324  5101803  4555602  6442746
## 2009  7583146  5566075  5339890  7089880
## 2010  7819472  5875917  4800733  6152583
## 2011  8943599  5603920  6154138  8273142
## 2012  7559148  5576852  5731899  6609694
## 2013  7968220  5759367  5769083  9606304

## [1] "Differencing: 1"
## [1] "How much more differencing is needed? 0"

## [1] "Differencing : 0"

Modelling and Predictions

ARIMA

Lower and upper values had confidence intervals of 80% and 95%.

## Series: power_diff 
## ARIMA(0,0,1)(0,0,2)[12] with non-zero mean 
## 
## Coefficients:
##          ma1     sma1    sma2    mean
##       0.1195  -0.9264  0.0890  0.0616
## s.e.  0.0712   0.0810  0.0811  0.0198
## 
## sigma^2 = 0.9557:  log likelihood = -257.11
## AIC=524.23   AICc=524.57   BIC=540.19
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
##                     ACF1
## Training set 0.006469743

Conclusion

The power data was broken up into monthly intervals. Missing values were cleaned up and in place, the median values were used. Using Time Series, applied were BoxCox transformations and ARIMA model to ultimately find the best method. ARIMA model was selected b/c for monthly prediction of power consumption, this model displayed most optimal results with the best RMSE, AIC, and BIC values.