ATM Forecast

Forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast.

##       DATE           ATM                 Cash        
##  Min.   :39934   Length:1474        Min.   :    0.0  
##  1st Qu.:40026   Class :character   1st Qu.:    0.5  
##  Median :40118   Mode  :character   Median :   73.0  
##  Mean   :40118                      Mean   :  155.6  
##  3rd Qu.:40210                      3rd Qu.:  114.0  
##  Max.   :40312                      Max.   :10919.8  
##                                     NA's   :19

We need to get the DATE column to its standard format and convert the data into a tsibble. The missing values will be substituted with the median values for the atms.

atm$DATE = as.Date(atm$DATE, origin = '1900-01-01')

impute.median = function(x) replace(x, is.na(x), median(x, na.rm=TRUE))

atm = atm %>%
  filter(DATE < "2010-05-01") %>%
  group_by(ATM) %>%
  mutate(Cash = impute.median(Cash)
  )

summary(atm)
##       DATE                ATM                 Cash        
##  Min.   :2009-05-03   Length:1452        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.0  
##  Median :2009-10-31   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.5  
##  3rd Qu.:2010-01-30                      3rd Qu.:  114.0  
##  Max.   :2010-04-30                      Max.   :10919.8

Now we can plot the monthly time series. As we can see, ATM 1, 2, 3 and 4 have no trend or seasonality associated with the data. ATM 3 has data where no transactions were made before April 2010. ATM 4 shows a spike in February 2010.

## Plot variable not specified, automatically selected `.vars = Cash`
## `mutate_if()` ignored the following grouping variables:

There is no noticeable trend in ATM 1 but there is seasonality. The models SNAIVE, ETS, and ARIMA will be evaluated.

The ARIMA model without transformation performed best according to the RMSE. Lets evaluate the model’s components.

## # A tibble: 8 × 11
##   ATM   .model      .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr> <chr>       <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 ATM1  multi_tran  Train…  2.78    22.9  15.0  -32.6  50.8 0.843 0.821  0.112  
## 2 ATM1  arima       Train… -0.0792  23.4  14.7 -103.  118.  0.825 0.840 -0.00814
## 3 ATM1  add         Train… -0.174   23.8  15.0 -107.  122.  0.847 0.855  0.128  
## 4 ATM1  arima_tran  Train…  1.74    24.5  15.7  -90.9 109.  0.884 0.881  0.00488
## 5 ATM1  add_tran    Train…  1.80    24.6  15.8  -94.9 113.  0.891 0.885  0.129  
## 6 ATM1  multi       Train… -1.16    26.3  16.3 -128.  144.  0.916 0.945  0.0788 
## 7 ATM1  snaive      Train… -0.0730  27.8  17.8  -97.1 117.  1     1      0.159  
## 8 ATM1  snaive_tran Train… -0.0730  27.8  17.8  -97.1 117.  1     1      0.159

The model that will be used to forecast ATM1 in May 2010 is ARIMA(0,0,2)(0,1,2)[7]

## Series: Cash 
## Model: ARIMA(0,0,1)(0,1,2)[7] 
## 
## Coefficients:
##          ma1     sma1     sma2
##       0.1698  -0.5863  -0.0994
## s.e.  0.0554   0.0510   0.0500
## 
## sigma^2 estimated as 562.8:  log likelihood=-1632.92
## AIC=3273.84   AICc=3273.95   BIC=3289.34
## `mutate_if()` ignored the following grouping variables:
## • Column `ATM`

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima     16.4     0.175

There is no noticeable trend in ATM 2 but there is seasonality. The models SNAIVE, ETS, and ARIMA will be evaluated.

The ARIMA model without transformation performed best according to the RMSE. Lets evaluate the model’s components.

## # A tibble: 8 × 11
##   ATM   .model      .type        ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr> <chr>       <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 ATM2  arima       Traini… -1.00    24.1  17.0 -Inf   Inf  0.820 0.801 -0.00230
## 2 ATM2  arima_tran  Traini…  0.183   24.4  17.1 -Inf   Inf  0.825 0.812 -0.0106 
## 3 ATM2  add         Traini… -0.798   25.0  17.7 -Inf   Inf  0.854 0.831  0.0161 
## 4 ATM2  add_tran    Traini…  0.532   25.4  17.8 -Inf   Inf  0.858 0.843  0.0170 
## 5 ATM2  snaive      Traini… -0.0674  30.1  20.7 -Inf   Inf  1     1      0.0476 
## 6 ATM2  snaive_tran Traini… -0.0674  30.1  20.7 -Inf   Inf  1     1      0.0476 
## 7 ATM2  multi_tran  Traini…  5.38    33.5  26.7 -Inf   Inf  1.29  1.11   0.00508
## 8 ATM2  multi       Traini…  0.972   34.8  26.9 -278.  307. 1.30  1.16   0.0205

The model that will be used to forecast ATM2 in May 2010 is ARIMA(2,0,2)(0,1,1)[7]

## Series: Cash 
## Model: ARIMA(2,0,2)(0,1,1)[7] 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1
##       -0.4292  -0.9169  0.4699  0.8071  -0.7566
## s.e.   0.0515   0.0404  0.0810  0.0548   0.0384
## 
## sigma^2 estimated as 600.4:  log likelihood=-1643.82
## AIC=3299.64   AICc=3299.88   BIC=3322.89
## `mutate_if()` ignored the following grouping variables:
## • Column `ATM`

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima     9.21     0.685

ATM 3 has many 0 values, therefore noticing a trend or seasonality is non-existent. This ATM can be considered out of service until April 10. A model cannot predict on the limited data.

There is no noticeable trend in ATM 4 but there is seasonality. The models SNAIVE, ETS, and ARIMA will be evaluated. There is a noticeable spike in the data in February 2010.

The additive model without transformation performed best according to the RMSE. Lets evaluate the model’s components.

## # A tibble: 8 × 11
##   ATM   .model      .type        ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr> <chr>       <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 ATM4  add         Trai… -5.18e- 1  636.  301. -579.  608. 0.749 0.708 -0.00412
## 2 ATM4  arima       Trai… -1.41e-13  651.  324. -619.  649. 0.806 0.725 -0.00970
## 3 ATM4  multi_tran  Trai…  1.88e+ 2  670.  312. -266.  330. 0.775 0.746 -0.00315
## 4 ATM4  add_tran    Trai…  1.93e+ 2  671.  313. -261.  327. 0.778 0.747 -0.00332
## 5 ATM4  arima_tran  Trai…  2.24e+ 2  685.  321. -222.  289. 0.799 0.762 -0.00636
## 6 ATM4  multi       Trai… -8.12e+ 1  850.  377. -748.  781. 0.938 0.946 -0.00884
## 7 ATM4  snaive      Trai… -3.65e+ 0  898.  402. -395.  450. 1     1     -0.0149 
## 8 ATM4  snaive_tran Trai… -3.65e+ 0  898.  402. -395.  450. 1     1     -0.0149

The model that will be used to forecast ATM1 in May 2010 is ETS(A,N,A).

## Series: Cash 
## Model: ETS(A,N,A) 
##   Smoothing parameters:
##     alpha = 0.0001000812 
##     gamma = 0.0001000586 
## 
##   Initial states:
##      l[0]      s[0]     s[-1]    s[-2]    s[-3]    s[-4]    s[-5]   s[-6]
##  475.0999 -298.6338 -59.56414 179.8411 5.029765 60.52828 19.86394 92.9349
## 
##   sigma^2:  415042.5
## 
##      AIC     AICc      BIC 
## 6846.372 6846.997 6885.316
## `mutate_if()` ignored the following grouping variables:
## • Column `ATM`

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima     4.48     0.973

The ATM forecast will be exported to an Excel file. Code was in reference to https://stackoverflow.com/questions/34659586/combining-forecasts-into-a-data-frame-in-r-and-then-exporting-into-excel

Forecasting Power

A simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward.

##   CaseSequence     YYYY-MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1

Since there is a missing value, we will replace it with the median number. Then we need to convert the data to a time series with a monthly time interval.

With the data in a time series, we can plot the data.

In the plot, we can see seasonality within the data but no trend in the data. We can visualize this using STL decomposition, season plot and sub series plot.

Lets determine how much differencing is needed to obtain stationary time series. We require a seasonal difference to obtain a stationary time series.

## [1] 1

Now the model is stationary and can begin modeling.

It seems the best model would be ARIMA(1,0,0)(0,1,1)[12] with drift

## Series: rcf 
## ARIMA(1,0,0)(0,1,1)[12] with drift 
## Box Cox transformation: lambda= -0.1484337 
## 
## Coefficients:
##          ar1     sma1  drift
##       0.3001  -0.7364  1e-04
## s.e.  0.0722   0.0693  1e-04
## 
## sigma^2 = 7.855e-05:  log likelihood = 594.93
## AIC=-1181.87   AICc=-1181.64   BIC=-1169.09
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 22217.68 588817.2 451817.4 -0.220072 6.902593 0.7424511 0.05056403

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(0,1,1)[12] with drift
## Q* = 26.668, df = 22, p-value = 0.2241
## 
## Model df: 2.   Total lags used: 24

Lets see if changing components will improve the model. I will begin with ARIMA(1,0,1)(0,1,1)[12]

## Series: rcf 
## ARIMA(1,0,1)(0,1,1)[12] 
## Box Cox transformation: lambda= -0.1484337 
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.8444  -0.5949  -0.6886
## s.e.  0.1288   0.2018   0.0644
## 
## sigma^2 = 8.251e-05:  log likelihood = 591.2
## AIC=-1174.39   AICc=-1174.16   BIC=-1161.62
## 
## Training set error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
## Training set 114904.1 604994.1 468753.6 1.207013 7.111408 0.7702816 0.08236063

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(0,1,1)[12]
## Q* = 26.429, df = 21, p-value = 0.1906
## 
## Model df: 3.   Total lags used: 24

Though the auto arima model suggested ARIMA(1,0,0)(0,1,1)[12] with drift, I will use ARIMA(1,0,1)(0,1,1)[12] due to the AICc being minimal.

fc_rcf = Arima(rcf, order=c(1,0,1), seasonal=c(0,1,1), lambda=BoxCox.lambda(rcf))
pred_rcf = forecast(fc_rcf,h=12,level=95)
autoplot(pred_rcf)

Lastly, the forecast will be exported to Excel.