Forecasting ATM Cash Withdrawals and Residential Power Usage

Document Author

2021-04-12

library(tidyverse)
library(readxl)
library(fpp2)
library(plotly)
library(kableExtra)
library(DataExplorer)
library(GGally)
library(ggplot2)
library(ggpubr)
library(smooth)

Part A – ATM Forecast, ATM624Data.xlsx

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion, and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel-readable file.

Quick look at the Data

atm_data<-readxl::read_excel("ATM624Data.xlsx")
glimpse(atm_data)
## Rows: 1,474
## Columns: 3
## $ DATE <dbl> 39934, 39934, 39935, 39935, 39936, 39936, 39937, 39937, 39938, 39~
## $ ATM  <chr> "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "~
## $ Cash <dbl> 96, 107, 82, 89, 85, 90, 90, 55, 99, 79, 88, 19, 8, 2, 104, 103, ~

There are 1,474 rows and 3 columns.

ggpairs(atm_data)

Data Preparation

Missing Data

plot_missing(atm_data)

By looking at this you can see that both ATM and Cash have missing values. Since only a small number of rows are affected, they can be removed with na.omit(); imputation would have been another option, but with so few missing values it is not worth the added complexity.

atm_data<-na.omit(atm_data)
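
For reference, a minimal sketch of the imputation alternative (hypothetical; not used here), filling each ATM's missing Cash with that ATM's median withdrawal:

# hypothetical alternative to na.omit(): per-ATM median imputation
# (this would run instead of the na.omit() call above; rows where ATM
# itself is NA would still need to be dropped)
atm_imputed <- atm_data %>%
  group_by(ATM) %>%
  mutate(Cash = ifelse(is.na(Cash), median(Cash, na.rm = TRUE), Cash)) %>%
  ungroup()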

Convert the Date

# Excel stores dates as serial day counts; this origin converts them to calendar dates
atm_data$DATE<-as.Date(atm_data$DATE, origin = "1899-12-30")
summary(atm_data)
##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1455        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-10-31   Mode  :character   Median :   73.0  
##  Mean   :2009-10-30                      Mean   :  155.6  
##  3rd Qu.:2010-01-29                      3rd Qu.:  114.0  
##  Max.   :2010-04-30                      Max.   :10919.8
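
As a quick sanity check on the origin, the first serial value in the raw file (39934) should map to the minimum date shown in the summary:

as.Date(39934, origin = "1899-12-30")
## [1] "2009-05-01"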

Split the Data

ATM1<-atm_data %>% filter(ATM=="ATM1")
ATM2<-atm_data %>% filter(ATM=="ATM2")
ATM3<-atm_data %>% filter(ATM=="ATM3")
ATM4<-atm_data %>% filter(ATM=="ATM4")
# specify the graphs to be combined
a<-ggplot(ATM1, aes(x = DATE, y = Cash, col = ATM)) + 
    geom_line() +
    ggtitle("ATM1")

b<-ggplot(ATM2, aes(x = DATE, y = Cash, col = ATM)) + 
    geom_line()+
    ggtitle("ATM2")

c<-ggplot(ATM3, aes(x = DATE, y = Cash, col = ATM)) + 
    geom_line()+
    ggtitle("ATM3")

d<-ggplot(ATM4, aes(x = DATE, y = Cash, col = ATM)) + 
    geom_line()+
    ggtitle("ATM4")

figure <- ggarrange(a, b, c, d,
                    ncol = 1, nrow = 4)

figure

Looking at this you can see that ATM1 and ATM2 look very similar, while ATM3 appears to have been inactive until the last few days of April 2010. ATM4 shows one extreme spike but otherwise appears to follow a similar weekly seasonal pattern.
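
ATM3's inactivity can be confirmed directly (a quick added check; the tail of ATM3 is printed again in the ATM3 section below):

atm_data %>% filter(ATM == "ATM3", Cash > 0)
# returns just three rows: 2010-04-28 through 2010-04-30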

Convert to Time Series

The data is converted to a time series with a frequency of 7, since the observations are daily and the dominant seasonal pattern is weekly.

ATM1_ts<-ts(ATM1$Cash, frequency = 7)
ATM2_ts<-ts(ATM2$Cash, frequency = 7)
ATM3_ts<-ts(ATM3$Cash, frequency = 7)
ATM4_ts<-ts(ATM4$Cash, frequency = 7)

Models

Now we fit different models for ATM1, ATM2, ATM3, and ATM4. For ATM1, ATM2, and ATM4 we will compare simple exponential smoothing, the Holt-Winters method, and ARIMA to see which gives the best outcome in terms of AIC and RMSE. For ATM3, which has almost no usable history, we will use simple benchmark forecasting methods instead: the mean method, the naive method, and the seasonal naive method. In each case we choose the model with the best RMSE.
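
To keep these comparisons compact, a small helper can pull the AIC and training RMSE out of each fitted object. This is an addition for illustration, not part of the original workflow; it relies only on accuracy() from the forecast package and on the fact that ses()/hw() store their underlying ets fit (including its AIC) in $model.

# helper sketch: AIC and training RMSE for the ets- and ARIMA-based fits below
model_scores <- function(fit) {
  # AIC() works for Arima fits; for ses()/hw() forecast objects it has no
  # method, so fall back to the AIC stored inside the underlying ets model
  aic <- tryCatch(AIC(fit), error = function(e) fit$model$aic)
  c(AIC = aic, RMSE = accuracy(fit)[1, "RMSE"])
}
# e.g. model_scores(sesatm1) once the ATM1 models below have been fit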

ATM1

autoplot(ATM1_ts)

ggseasonplot(ATM1_ts)

ggtsdisplay(ATM1_ts)

Simple Exponential Smoothing

sesatm1<-ses(ATM1_ts)
autoplot(sesatm1)+
  autolayer(fitted(sesatm1), series='Fitted')

checkresiduals(sesatm1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 351.1, df = 12, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 14

Looking at the residuals, they appear somewhat skewed, and the Ljung-Box test (p < 2.2e-16) shows substantial remaining autocorrelation, which is expected since simple exponential smoothing ignores the weekly seasonality.

summary(sesatm1)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = ATM1_ts) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 83.7797 
## 
##   sigma:  36.7094
## 
##      AIC     AICc      BIC 
## 4745.365 4745.432 4757.040 
## 
## Error measures:
##                      ME     RMSE      MAE     MPE     MAPE     MASE       ACF1
## Training set 0.07802025 36.60783 27.31876 -175.78 199.2779 1.451603 0.06365945
## 
## Forecasts:
##          Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 52.71429       83.78256 36.7376 130.8275 11.83350 155.7316
## 52.85714       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.00000       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.14286       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.28571       83.78256 36.7376 130.8275 11.83350 155.7316
## 53.42857       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.57143       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.71429       83.78256 36.7376 130.8275 11.83349 155.7316
## 53.85714       83.78256 36.7376 130.8275 11.83349 155.7316
## 54.00000       83.78256 36.7376 130.8275 11.83349 155.7316

Looking at this we can see that the AIC is 4745.365, the BIC is 4757.040 and the RMSE is 36.60783.

Holt-Winters

holtatm1 <- hw(ATM1_ts)
autoplot(holtatm1)

checkresiduals(holtatm1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 16.871, df = 3, p-value = 0.0007513
## 
## Model df: 11.   Total lags used: 14

The residuals look close to normally distributed, although the Ljung-Box test (p ≈ 0.00075) still indicates some remaining autocorrelation.

summary(holtatm1)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = ATM1_ts) 
## 
##   Smoothing parameters:
##     alpha = 0.0041 
##     beta  = 1e-04 
##     gamma = 0.4349 
## 
##   Initial states:
##     l = 98.7123 
##     b = -0.0333 
##     s = 9.0364 12.5733 14.6333 -42.341 -0.675 6.2016
##            0.5715
## 
##   sigma:  27.1149
## 
##      AIC     AICc      BIC 
## 4534.864 4535.758 4581.564 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE      ACF1
## Training set -0.3248614 26.69971 17.28292 -113.4744 129.995 0.9183413 0.1421671
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 52.71429      86.159205  51.41012 120.90829  33.01507 139.30334
## 52.85714      99.761831  65.01244 134.51122  46.61722 152.90644
## 53.00000      74.085952  39.33624 108.83566  20.94085 127.23105
## 53.14286       3.804039 -30.94601  38.55409 -49.34157  56.94965
## 53.28571      99.124747  64.37435 133.87515  45.97860 152.27090
## 53.42857      78.469711  43.71895 113.22048  25.32300 131.61642
## 53.57143      84.554925  49.80378 119.30607  31.40763 137.70222
## 53.71429      85.843459  47.88104 123.80588  27.78495 143.90197
## 53.85714      99.446085  61.48328 137.40889  41.38699 157.50518
## 54.00000      73.770206  35.80701 111.73340  15.71050 131.82991
## 54.14286       3.488293 -34.47532  41.45191 -54.57204  61.54863
## 54.28571      98.809002  60.84496 136.77304  40.74801 156.86999
## 54.42857      78.153965  40.18948 116.11845  20.09229 136.21564
## 54.57143      84.239179  46.27423 122.20413  26.17680 142.30156

Looking at this we can see that the AIC is 4534.864, the BIC is 4581.564 and the RMSE is 26.69971, all better than the simple exponential smoothing model.

ARIMA

ATM1arima<-auto.arima(ATM1_ts)
autoplot(forecast(ATM1arima))

checkresiduals(ATM1arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
## 
## Model df: 2.   Total lags used: 14

The residuals look almost normally distributed, and the Ljung-Box test (p = 0.5062) suggests they are indistinguishable from white noise.

summary(ATM1arima)
## Series: ATM1_ts 
## ARIMA(0,0,1)(0,1,1)[7] 
## 
## Coefficients:
##          ma1     sma1
##       0.1674  -0.5813
## s.e.  0.0565   0.0472
## 
## sigma^2 estimated as 666.6:  log likelihood=-1658.32
## AIC=3322.63   AICc=3322.7   BIC=3334.25
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
##                     ACF1
## Training set -0.01116544

Looking at this we can see that the AIC is 3322.63, the BIC is 3334.25 and the RMSE is 25.49555. ARIMA seems to be the best model for ATM1 because its AIC, BIC, and RMSE are all the lowest of the three models.

ATM2

autoplot(ATM2_ts)

ggseasonplot(ATM2_ts)

ggtsdisplay(ATM2_ts)

Simple Exponential Smoothing

sesatm2<-ses(ATM2_ts)
autoplot(sesatm2)+
  autolayer(fitted(sesatm2), series='Fitted')

checkresiduals(sesatm2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 515.46, df = 12, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 14
summary(sesatm2)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = ATM2_ts) 
## 
##   Smoothing parameters:
##     alpha = 0.016 
## 
##   Initial states:
##     l = 72.14 
## 
##   sigma:  38.681
## 
##      AIC     AICc      BIC 
## 4797.445 4797.512 4809.128 
## 
## Error measures:
##                     ME     RMSE      MAE  MPE MAPE     MASE        ACF1
## Training set -2.563596 38.57427 33.09136 -Inf  Inf 1.539938 -0.02208177
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 52.85714       57.26713 7.695463 106.8388 -18.54619 133.0804
## 53.00000       57.26713 7.689132 106.8451 -18.55587 133.0901
## 53.14286       57.26713 7.682802 106.8514 -18.56555 133.0998
## 53.28571       57.26713 7.676473 106.8578 -18.57523 133.1095
## 53.42857       57.26713 7.670145 106.8641 -18.58491 133.1192
## 53.57143       57.26713 7.663817 106.8704 -18.59459 133.1288
## 53.71429       57.26713 7.657491 106.8768 -18.60426 133.1385
## 53.85714       57.26713 7.651165 106.8831 -18.61394 133.1482
## 54.00000       57.26713 7.644840 106.8894 -18.62361 133.1579
## 54.14286       57.26713 7.638515 106.8957 -18.63328 133.1675

Looking at this we can see that the AIC is 4797.445, the BIC is 4809.128 and the RMSE is 38.57427.

Holt-Winters

holtatm2 <- hw(ATM2_ts)
autoplot(holtatm2)

checkresiduals(holtatm2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 36.563, df = 3, p-value = 5.693e-08
## 
## Model df: 11.   Total lags used: 14
summary(holtatm2)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = ATM2_ts) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
##     gamma = 0.4749 
## 
##   Initial states:
##     l = 86.8442 
##     b = -0.0568 
##     s = 2.7954 18.428 -25.6811 -11.8901 10.6037 0.3422
##            5.4019
## 
##   sigma:  28.4654
## 
##      AIC     AICc      BIC 
## 4583.646 4584.537 4630.379 
## 
## Error measures:
##                        ME     RMSE      MAE  MPE MAPE      MASE         ACF1
## Training set -0.004498074 28.03076 19.38286 -Inf  Inf 0.9019996 -0.007054699
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80      Lo 95     Hi 95
## 52.85714     68.0961667  31.61633 104.57600  12.305071 123.88726
## 53.00000     74.5158256  38.03599 110.99566  18.724728 130.30692
## 53.14286     10.1858821 -26.29396  46.66572 -45.605218  65.97698
## 53.28571      0.7160432 -35.76380  37.19589 -55.075061  56.50715
## 53.42857    102.0247290  65.54488 138.50458  46.233618 157.81584
## 53.57143     92.4676151  55.98776 128.94747  36.676494 148.25874
## 53.71429     71.7327034  35.25284 108.21257  15.941569 127.52384
## 53.85714     67.6974509  27.30091 108.09399   5.916269 129.47863
## 54.00000     74.1171098  33.72055 114.51367  12.335907 135.89831
## 54.14286      9.7871663 -30.60941  50.18374 -51.994061  71.56839
## 54.28571      0.3173274 -40.07927  40.71392 -61.463931  62.09859
## 54.42857    101.6260132  61.22940 142.02263  39.844719 163.40731
## 54.57143     92.0688993  51.67225 132.46554  30.287562 153.85024
## 54.71429     71.3339876  30.93731 111.73066   9.552601 133.11537

Looking at this we can see that the AIC is 4583.646, the BIC is 4630.379 and the RMSE is 28.03076.

ARIMA

ATM2arima<-auto.arima(ATM2_ts)
autoplot(forecast(ATM2arima))

checkresiduals(ATM2arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
## 
## Model df: 6.   Total lags used: 14
summary(ATM2arima)
## Series: ATM2_ts 
## ARIMA(2,0,2)(0,1,2)[7] 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1    sma2
##       -0.4148  -0.8922  0.4008  0.7534  -0.7220  0.0801
## s.e.   0.0674   0.0510  0.1027  0.0688   0.0567  0.0550
## 
## sigma^2 estimated as 708.1:  log likelihood=-1671.98
## AIC=3357.96   AICc=3358.28   BIC=3385.08
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf  Inf 0.8421495 0.009277035

Looking at this we can see that the AIC is 3357.96, the BIC is 3385.08 and the RMSE is 26.12942. The ARIMA model seems to be the best model for ATM2.

ATM3

autoplot(ATM3_ts)

tail(ATM3)
## # A tibble: 6 x 3
##   DATE       ATM    Cash
##   <date>     <chr> <dbl>
## 1 2010-04-25 ATM3      0
## 2 2010-04-26 ATM3      0
## 3 2010-04-27 ATM3      0
## 4 2010-04-28 ATM3     96
## 5 2010-04-29 ATM3     82
## 6 2010-04-30 ATM3     85

Since ATM3 has only three recorded transactions, at the very end of the sample, we are going to use simple time series forecasting benchmarks: the mean method, the naive method, and the seasonal naive method. (A simple moving average was also considered but not pursued, given how little usable data there is.)

The Mean Method

ATM3_mean<-meanf(ATM3_ts,h=7)
autoplot(ATM3_mean)

checkresiduals(ATM3_mean)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 196.67, df = 13, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 14
summary(ATM3_mean)
## 
## Forecast method: Mean
## 
## Model Information:
## $mu
## [1] 0.7205479
## 
## $mu.se
## [1] 0.4158487
## 
## $sd
## [1] 7.944778
## 
## $bootstrap
## [1] FALSE
## 
## $call
## meanf(y = ATM3_ts, h = 7)
## 
## attr(,"class")
## [1] "meanf"
## 
## Error measures:
##                         ME     RMSE      MAE  MPE MAPE     MASE      ACF1
## Training set -5.634762e-17 7.933887 1.429251 -Inf  Inf 1.945521 0.6403876
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 53.14286      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.28571      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.42857      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.57143      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.71429      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.85714      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 54.00000      0.7205479 -9.49357 10.93467 -14.92427 16.36536

The RMSE of this model is 7.933887.

Naive Method

ATM3_naive<-naive(ATM3_ts,h=7)
autoplot(ATM3_naive)

checkresiduals(ATM3_naive)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 8.4931, df = 14, p-value = 0.8621
## 
## Model df: 0.   Total lags used: 14
summary(ATM3_naive)
## 
## Forecast method: Naive method
## 
## Model Information:
## Call: naive(y = ATM3_ts, h = 7) 
## 
## Residual sd: 5.0874 
## 
## Error measures:
##                     ME     RMSE       MAE      MPE     MAPE      MASE
## Training set 0.2335165 5.087423 0.3104396 28.81875 40.20086 0.4225755
##                    ACF1
## Training set -0.1494714
## 
## Forecasts:
##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 53.14286             85 78.48021  91.51979 75.02884  94.97116
## 53.28571             85 75.77962  94.22038 70.89864  99.10136
## 53.42857             85 73.70738  96.29262 67.72944 102.27056
## 53.57143             85 71.96041  98.03959 65.05767 104.94233
## 53.71429             85 70.42130  99.57870 62.70380 107.29620
## 53.85714             85 69.02983 100.97017 60.57573 109.42427
## 54.00000             85 67.75025 102.24975 58.61878 111.38122

The RMSE of this model is 5.087423.

Seasonal Naive Method

ATM3_snaive<-snaive(ATM3_ts,h=7)
autoplot(ATM3_snaive)

checkresiduals(ATM3_snaive)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 192.93, df = 14, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 14
summary(ATM3_snaive)
## 
## Forecast method: Seasonal naive method
## 
## Model Information:
## Call: snaive(y = ATM3_ts, h = 7) 
## 
## Residual sd: 8.044 
## 
## Error measures:
##                     ME     RMSE       MAE MPE MAPE MASE      ACF1
## Training set 0.7346369 8.044048 0.7346369 100  100    1 0.6403809
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 53.14286              0 -10.30886  10.30886 -15.76604  15.76604
## 53.28571              0 -10.30886  10.30886 -15.76604  15.76604
## 53.42857              0 -10.30886  10.30886 -15.76604  15.76604
## 53.57143              0 -10.30886  10.30886 -15.76604  15.76604
## 53.71429             96  85.69114 106.30886  80.23396 111.76604
## 53.85714             82  71.69114  92.30886  66.23396  97.76604
## 54.00000             85  74.69114  95.30886  69.23396 100.76604

The RMSE for this model is 8.044048. The best model for ATM3 is therefore the naive method, with an RMSE of 5.087423; it simply carries the last observed value forward, which is reasonable for a machine that only just became active.

ATM4

autoplot(ATM4_ts)

ggseasonplot(ATM4_ts)

ggtsdisplay(ATM4_ts)

Simple Exponential Smoothing

sesatm4<-ses(ATM4_ts)
autoplot(sesatm4)+
  autolayer(fitted(sesatm4), series='Fitted')

checkresiduals(sesatm4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 4.6668, df = 12, p-value = 0.9682
## 
## Model df: 2.   Total lags used: 14

Looking at the residuals, they seem to be skewed, although the Ljung-Box test (p = 0.9682) shows no significant autocorrelation.

summary(sesatm4)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = ATM4_ts) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 480.6291 
## 
##   sigma:  651.8968
## 
##      AIC     AICc      BIC 
## 6887.774 6887.840 6899.474 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -6.328783 650.1083 324.4922 -626.1841 655.7821 0.8074068
##                      ACF1
## Training set -0.009353838
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 53.14286       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.28571       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.42857       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.57143       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.71429       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 53.85714       480.3981 -355.0412 1315.837 -797.2961 1758.092
## 54.00000       480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.14286       480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.28571       480.3981 -355.0413 1315.837 -797.2961 1758.092
## 54.42857       480.3981 -355.0413 1315.837 -797.2962 1758.092

Looking at this we can see that the AIC is 6887.774, the BIC is 6899.474 and the RMSE is 650.1083.

Holt-Winters

holtatm4 <- hw(ATM4_ts)
autoplot(holtatm4)

checkresiduals(holtatm4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 3.7364, df = 3, p-value = 0.2914
## 
## Model df: 11.   Total lags used: 14

The residuals seem to be normally distributed.

summary(holtatm4)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = ATM4_ts) 
## 
##   Smoothing parameters:
##     alpha = 0.0131 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 602.1216 
##     b = 0.1908 
##     s = -301.5132 -59.4532 174.6504 11.3506 61.4457 25.7597
##            87.76
## 
##   sigma:  650.5207
## 
##      AIC     AICc      BIC 
## 6895.068 6895.954 6941.866 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -7.026146 640.6433 301.9819 -590.6452 619.007 0.7513964
##                     ACF1
## Training set 0.002054593
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## 53.14286       502.3071 -331.3687 1335.983  -772.6900 1777.304
## 53.28571       537.9914 -295.7575 1371.740  -737.1174 1813.100
## 53.42857       487.7923 -346.0308 1321.615  -787.4300 1763.014
## 53.57143       651.0432 -182.8551 1484.941  -624.2942 1926.381
## 53.71429       416.8786 -417.0961 1250.853  -858.5756 1692.333
## 53.85714       174.7465 -659.3056 1008.799 -1100.8262 1450.319
## 54.00000       564.0173 -270.1135 1398.148  -711.6757 1839.710
## 54.14286       501.8473 -332.3644 1336.059  -773.9694 1777.664
## 54.28571       537.5316 -296.7610 1371.824  -738.4089 1813.472
## 54.42857       487.3325 -347.0423 1321.707  -788.7336 1763.399
## 54.57143       650.5834 -183.8746 1485.041  -625.6100 1926.777
## 54.71429       416.4188 -418.1236 1250.961  -859.9037 1692.741
## 54.85714       174.2868 -660.3413 1008.915 -1102.1667 1450.740
## 55.00000       563.5575 -271.1573 1398.272  -713.0287 1840.144

Looking at this we can see that the AIC is 6895.068, the BIC is 6941.866 and the RMSE is 640.6433.

ARIMA

ATM4arima<-auto.arima(ATM4_ts,lambda = "auto")
autoplot(forecast(ATM4arima))

A Box-Cox transformation (lambda = "auto") is used in the ARIMA fit for ATM4 to stabilize the series' large variance.
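
The transformation parameter can also be estimated directly with the forecast-package helper that auto.arima() relies on; a quick check, shown for illustration:

BoxCox.lambda(ATM4_ts)
# should be close to the lambda = -0.0737252 reported in the model summary below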

checkresiduals(ATM4arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 17.625, df = 11, p-value = 0.09069
## 
## Model df: 3.   Total lags used: 14

The residuals seem to be skewed, though the Ljung-Box test (p = 0.0907) does not reject white noise at the 5% level.

summary(ATM4arima)
## Series: ATM4_ts 
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean 
## Box Cox transformation: lambda= -0.0737252 
## 
## Coefficients:
##         sar1    sar2    mean
##       0.2487  0.1947  4.4864
## s.e.  0.0521  0.0525  0.0841
## 
## sigma^2 estimated as 0.8418:  log likelihood=-485.59
## AIC=979.18   AICc=979.29   BIC=994.78
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 209.7821 678.9957 317.1943 -239.7645 304.0411 0.7892481
##                      ACF1
## Training set -0.005100991

Looking at this we can see that the AIC is 979.18, the BIC is 994.78 and the RMSE is 678.9957. Because of the Box-Cox transformation, the AIC and BIC are on a different scale here and cannot be compared directly with the other two models, and the training RMSE is in fact slightly higher than Holt-Winters'. Even so, the ARIMA model with the Box-Cox transformation is selected for ATM4, since the transformation keeps the extreme spike from dominating the fit.

Accuracy

## ATM1
ATM1ac <- rbind(accuracy(sesatm1), accuracy(holtatm1), accuracy(ATM1arima))
row.names(ATM1ac) <- c("Simple Exponential Smoothing", "Holt-Winters", "ARIMA")

## ATM2
ATM2ac <- rbind(accuracy(sesatm2), accuracy(holtatm2), accuracy(ATM2arima))
row.names(ATM2ac) <- c("Simple Exponential Smoothing", "Holt-Winters", "ARIMA")

## ATM4
ATM4ac <- rbind(accuracy(sesatm4), accuracy(holtatm4), accuracy(ATM4arima))
row.names(ATM4ac) <- c("Simple Exponential Smoothing", "Holt-Winters", "ARIMA")

ATM1 Accuracy

                                       ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
Simple Exponential Smoothing    0.0780203 36.60784 27.31876 -175.7800 199.2779 1.4516029  0.0636595
Holt-Winters                   -0.3248614 26.69971 17.28292 -113.4744 129.9950 0.9183413  0.1421671
ARIMA                          -0.0970009 25.49555 16.17552 -106.2792 122.4165 0.8594986 -0.0111654

ATM2 Accuracy

                                       ME     RMSE      MAE  MPE MAPE      MASE       ACF1
Simple Exponential Smoothing   -2.5635958 38.57427 33.09136 -Inf  Inf 1.5399380 -0.0220818
Holt-Winters                   -0.0044981 28.03076 19.38286 -Inf  Inf 0.9019996 -0.0070547
ARIMA                          -0.4583759 26.12942 18.09675 -Inf  Inf 0.8421495  0.0092770

ATM4 Accuracy

                                       ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
Simple Exponential Smoothing    -6.328783 650.1083 324.4922 -626.1841 655.7821 0.8074068 -0.0093538
Holt-Winters                    -7.026146 640.6433 301.9819 -590.6452 619.0070 0.7513964  0.0020546
ARIMA                          209.782088 678.9957 317.1943 -239.7645 304.0411 0.7892481 -0.0051010

Forecast

Best Models for the Forecast

ATM1 - ARIMA(0,0,1)(0,1,1)[7] - AIC: 3322.63, RMSE: 25.49555

ATM2 - ARIMA(2,0,2)(0,1,2)[7] - AIC: 3357.96, RMSE: 26.12942

ATM3 - Naive method - RMSE: 5.087423

ATM4 - ARIMA(0,0,0)(2,0,0)[7] - AIC: 979.18, RMSE: 678.9957

Forecast of ATM1

forecastATM1 <- forecast(ATM1arima, h = 31)  # 31 days ahead, covering May 2010
autoplot(forecastATM1)

Forecast of ATM2

forecastATM2 <- forecast(ATM2arima, h = 31)
autoplot(forecastATM2)

Forecast of ATM3

forecastATM3 <- naive(ATM3_ts, h = 31)  # the naive model chosen above
autoplot(forecastATM3)

Forecast of ATM4

forecastATM4 <- forecast(ATM4arima, h = 31)
autoplot(forecastATM4)

Summary of ATM Data

As noted earlier, ATM1, ATM2, and ATM4 all show weekly seasonality. For each of these three, simple exponential smoothing, Holt-Winters, and ARIMA models were fit, and in each case the ARIMA model was selected:

ATM1 - ARIMA(0,0,1)(0,1,1)[7], AIC: 3322.63, RMSE: 25.49555
ATM2 - ARIMA(2,0,2)(0,1,2)[7], AIC: 3357.96, RMSE: 26.12942
ATM4 - ARIMA(0,0,0)(2,0,0)[7], AIC: 979.18, RMSE: 678.9957

For ATM3, the mean, naive, and seasonal naive methods were compared, and the naive method performed best with an RMSE of 5.087423.

# the 31 dates of May 2010 covered by the forecasts
sd <- as.Date("2010-05-01")
fd <- sd + 0:30

a1 <- forecastATM1
b1 <- data.frame(DATE=fd, ATM=rep("ATM1", 31), Cash=a1$mean)

a2 <- forecastATM2
b2 <- data.frame(DATE=fd, ATM=rep("ATM2", 31), Cash=a2$mean)

a3 <- forecastATM3
b3 <- data.frame(DATE=fd, ATM=rep("ATM3", 31), Cash=a3$mean)

a4 <- forecastATM4
b4 <- data.frame(DATE=fd, ATM=rep("ATM4", 31), Cash=a4$mean)

df <- bind_rows(list(b1, b2, b3, b4))
write.csv(df, file = "All ATMs Predicted.csv", row.names = FALSE)
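
The CSV above opens directly in Excel; if a native .xlsx workbook is preferred, the same data frame could be written with the writexl package (an assumption: writexl is not loaded above):

# optional: write a true Excel workbook instead of a CSV
# (assumes the writexl package is installed)
writexl::write_xlsx(df, "All ATMs Predicted.xlsx")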

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and provide a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward.

power_data<-readxl::read_excel("ResidentialCustomerForecastLoad-624.xlsx")

Quick look at the Data

summary(power_data)
##   CaseSequence     YYYY-MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1
glimpse(power_data)
## Rows: 192
## Columns: 3
## $ CaseSequence <dbl> 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 74~
## $ `YYYY-MMM`   <chr> "1998-Jan", "1998-Feb", "1998-Mar", "1998-Apr", "1998-May~
## $ KWH          <dbl> 6862583, 5838198, 5420658, 5010364, 4665377, 6467147, 891~

There are 192 rows and 3 columns.

Data Preparation

Missing Data

plot_missing(power_data)

Only the KWH column has missing data, and only one row is affected.

power_data<- na.omit(power_data)

The single missing value is removed with na.omit(). One caveat: dropping the month (rather than imputing it) shifts every later observation back one position in the monthly index, which is why the forecasts below are labeled starting "Dec 2013" rather than "Jan 2014". With a single missing value the effect on the fitted models is minor, and we proceed with the row removed before converting to a time series.
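
A sketch of the imputation alternative, which would keep every month at its calendar position; it assumes power_data still contains the NA row, i.e. it would run instead of the na.omit() call above:

# alternative sketch: seasonally-aware imputation with forecast::na.interp()
kwh_ts <- ts(power_data$KWH, start = c(1998, 1), frequency = 12)
kwh_ts <- na.interp(kwh_ts)  # fills the single NA using the seasonal pattern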

Convert data to a Time Series

power_datats<-ts(power_data[, "KWH"], start = c(1998, 1), frequency = 12)

Models

For the power dataset we will create three different models: simple exponential smoothing, Holt-Winters, and ARIMA. We will compare the AIC, BIC, and RMSE, and pick the model with the lowest values across all three.

autoplot(power_datats)

ggseasonplot(power_datats)

We can see that there is clear annual seasonality, along with one extreme downward spike in mid-2010 (the 770,523 KWH minimum in the summary above).

ggtsdisplay(power_datats)

You can see here that the ACF spikes at lag 12, consistent with the annual seasonality.

Simple Exponential Smoothing

sespower<-ses(power_datats)
autoplot(sespower)+
  autolayer(fitted(sespower), series='Fitted')

checkresiduals(sespower)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 581.22, df = 22, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 24

The residual histogram looks roughly normal, but the Ljung-Box test (p < 2.2e-16) shows substantial remaining autocorrelation, as expected since simple exponential smoothing ignores the strong annual seasonality.

summary(sespower)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = power_datats) 
## 
##   Smoothing parameters:
##     alpha = 0.0389 
## 
##   Initial states:
##     l = 6150702.2675 
## 
##   sigma:  1424322
## 
##      AIC     AICc      BIC 
## 6419.811 6419.939 6429.567 
## 
## Error measures:
##                    ME    RMSE     MAE      MPE     MAPE     MASE      ACF1
## Training set 141716.7 1416845 1176581 -5.38936 21.55473 1.581218 0.3671121
## 
## Forecasts:
##          Point Forecast   Lo 80   Hi 80   Lo 95    Hi 95
## Dec 2013        7204795 5379453 9030137 4413175  9996414
## Jan 2014        7204795 5378069 9031520 4411059  9998530
## Feb 2014        7204795 5376687 9032903 4408945 10000645
## Mar 2014        7204795 5375305 9034284 4406832 10002757
## Apr 2014        7204795 5373925 9035665 4404721 10004869
## May 2014        7204795 5372546 9037044 4402611 10006978
## Jun 2014        7204795 5371167 9038422 4400503 10009086
## Jul 2014        7204795 5369790 9039800 4398397 10011193
## Aug 2014        7204795 5368414 9041176 4396292 10013297
## Sep 2014        7204795 5367038 9042551 4394189 10015401

Looking at this we can see that the AIC is 6419.811, the BIC is 6429.567 and the RMSE is 1416845.

Holt-Winters

holtpower <- hw(power_datats)
autoplot(holtpower)

checkresiduals(holtpower)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 35.221, df = 8, p-value = 2.437e-05
## 
## Model df: 16.   Total lags used: 24

The residuals look almost normally distributed, although the Ljung-Box test (p ≈ 2.4e-05) indicates some autocorrelation remains.

summary(holtpower)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = power_datats) 
## 
##   Smoothing parameters:
##     alpha = 0.0141 
##     beta  = 0.0014 
##     gamma = 0.354 
## 
##   Initial states:
##     l = 6365037.9004 
##     b = 5957.8189 
##     s = 290670.2 -935083.3 -900245.6 422442.6 1494020 1462846
##            -89005.15 -844980.1 -1259442 -731886.3 6783.244 1083881
## 
##   sigma:  1040969
## 
##      AIC     AICc      BIC 
## 6313.337 6316.875 6368.626 
## 
## Error measures:
##                    ME     RMSE    MAE       MPE     MAPE      MASE      ACF1
## Training set 57861.12 996414.8 656759 -4.275331 13.55529 0.8826242 0.2362087
## 
## Forecasts:
##          Point Forecast   Lo 80    Hi 80   Lo 95    Hi 95
## Dec 2013        9708882 8374827 11042938 7668621 11749144
## Jan 2014        8527516 7193300  9861732 6487009 10568023
## Feb 2014        7122739 5788332  8457145 5081940  9163537
## Mar 2014        6434973 5100343  7769603 4393833  8476113
## Apr 2014        6212687 4877799  7547576 4171151  8254223
## May 2014        7941345 6606159  9276530 5899355  9983335
## Jun 2014        8345415 7009892  9680938 6302909 10387921
## Jul 2014        9634765 8298861 10970668 7591677 11677853
## Aug 2014        8566996 7230666  9903326 6523256 10610736
## Sep 2014        6395805 5059001  7732610 4351339  8440272
## Oct 2014        6269112 4931782  7606442 4223842  8314382
## Nov 2014        8391313 7053404  9729222 6345157 10437469
## Dec 2014        9966065 8533020 11399111 7774411 12157719
## Jan 2015        8784699 7351005 10218392 6592054 10977343
## Feb 2015        7379922 5945523  8814320 5186199  9573645
## Mar 2015        6692156 5256993  8127318 4497264  8887048
## Apr 2015        6469870 5033881  7905859 4273715  8666025
## May 2015        8198528 6761649  9635406 6001011 10396044
## Jun 2015        8602598 7164763 10040433 6403619 10801577
## Jul 2015        9891947 8453087 11330808 7691400 12092494
## Aug 2015        8824179 7384223 10264135 6621956 11026402
## Sep 2015        6652988 5211863  8094114 4448977  8856999
## Oct 2015        6526295 5083925  7968665 4320380  8732210
## Nov 2015        8648496 7204803 10092188 6440559 10856433

Looking at this we can see that the AIC is 6313.337, the BIC is 6368.626 and the RMSE is 996414.8, all better than the simple exponential smoothing model.

ARIMA

powerarima<-auto.arima(power_datats)
autoplot(forecast(powerarima))

checkresiduals(powerarima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,1,2)[12] with drift
## Q* = 17.494, df = 20, p-value = 0.6207
## 
## Model df: 4.   Total lags used: 24

The residuals look almost normally distributed, and the Ljung-Box test (p = 0.6207) suggests they resemble white noise.

summary(powerarima)
## Series: power_datats 
## ARIMA(0,0,1)(0,1,2)[12] with drift 
## 
## Coefficients:
##          ma1     sma1    sma2     drift
##       0.2431  -0.7287  0.1929  8501.749
## s.e.  0.0773   0.0800  0.0861  3639.647
## 
## sigma^2 estimated as 9.396e+11:  log likelihood=-2722.65
## AIC=5455.3   AICc=5455.65   BIC=5471.24
## 
## Training set error measures:
##                     ME     RMSE    MAE       MPE    MAPE      MASE         ACF1
## Training set -10349.99 927823.9 576140 -5.029934 12.4629 0.7742797 -0.002808329

Looking at this we can see that the AIC is 5455.3, the BIC is 5471.24 and the RMSE is 927823.9. ARIMA seems to be the best model because its AIC, BIC, and RMSE are the lowest of the three models.

Accuracy

                                      ME      RMSE     MAE       MPE     MAPE      MASE       ACF1
Simple Exponential Smoothing   141716.74 1416845.0 1176581 -5.389360 21.55473 1.5812182  0.3671121
Holt-Winters                    57861.12  996414.8  656759 -4.275331 13.55529 0.8826242  0.2362087
ARIMA                          -10349.99  927823.9  576140 -5.029935 12.46290 0.7742797 -0.0028083
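
This table can be reproduced with the same pattern used for the ATM accuracy tables in Part A (a sketch; the code for this step was not shown in the original):

powerac <- rbind(accuracy(sespower), accuracy(holtpower), accuracy(powerarima))
row.names(powerac) <- c("Simple Exponential Smoothing", "Holt-Winters", "ARIMA")
powerac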

Forecast of Power Data

forecastpower <- forecast(powerarima, h = 12, level = 95)  # 12 monthly forecasts for 2014
autoplot(forecastpower)

Summary of the Power Data

Three different models were fit: simple exponential smoothing, Holt-Winters, and ARIMA. Of the three, the best was the ARIMA(0,0,1)(0,1,2)[12] with drift model, with an AIC of 5455.3, a BIC of 5471.24, and an RMSE of 927823.9, the lowest on all three criteria.

ab <- seq(as.Date("2014-01-01"), as.Date("2014-12-01"), by="months")
df <- data.frame("YYYY-MMM" = format(ab, "%Y-%b"), KWH = forecastpower$mean)
colnames(df) <- c("YYYY-MMM", "KWH")
write.csv(df, file = "Forecast for Power.csv", row.names = FALSE)