In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file. Also, please submit the forecast which you will put in an Excel readable file.
data_raw <- read_excel ('./data/ATM624Data.xlsx', col_names = TRUE,
col_types = c('date','text','numeric'))%>%
rename(date = DATE, atm = ATM, cash = Cash)%>%
mutate(date = as.Date(date, origin = "1899-12-30"))
summary(data_raw)
## date atm cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10919.8
## NA's :19
After loading the data, we see there are a number of NA’s which need to be dealt with. Generally, dropping the cases is not advisable as the time series should have all the dates. In this case I have adjust that logic as not only the amount of cash is missing but also which ATM the cash is in is missing. Since I can not imply the location it does not make sense to do math manipulation for missing ‘cash’ values.
I will perform the following process:
atm <- data_raw
atm <- data_raw[complete.cases(data_raw),]
atm$cash <- imputeTS::na_ma(atm$cash, k=5, weighting = "simple")
# as the data is in hundreds of dollars &
atm <- atm %>%
mutate(across(where(is.numeric), round, 0))
atm <- atm%>%
spread(atm, cash)
atm_ts <- ts(atm %>% select(-date))
#str(atm_ts)
autoplot(atm_ts)+
labs(title = "Cash per all ATMs", subtitle = "By week")+
scale_y_continuous("Withdrawn in Hundreds")
atm1 <- atm_ts[,1]
atm1<-ts(atm1, frequency = 7)
ggtsdisplay(atm1, points = FALSE, plot.type = "histogram")
## Warning: Removed 3 rows containing non-finite values (stat_bin).
atm2 <- atm_ts[,2]
atm2<-ts(atm2, frequency = 7)
ggtsdisplay(atm2, points = FALSE, plot.type = "histogram")
## Warning: Removed 2 rows containing non-finite values (stat_bin).
atm3 <- atm_ts[,3]
atm3[which(atm3 == 0)]<-NA
atm3<-ts(atm3, frequency = 7)
ggtsdisplay(atm3, points = FALSE, plot.type = "histogram")
## Warning: Removed 362 rows containing non-finite values (stat_bin).
atm4 <- atm_ts[,4]
atm4[which.max(atm4)]<-median(atm4, na.rm = TRUE)
atm4<-ts(atm4, frequency = 7)
ggtsdisplay(atm4, points = FALSE, plot.type = "histogram")
All Models will start with a Box-Cox transform as it is the simplest transform method. Each will be evaluated to see if the transform is beneficial, gets removed, or if other steps are required.
atm1_lambda <- BoxCox.lambda(atm1)
ggtsdisplay(atm1 %>% BoxCox(atm1_lambda))
## Warning: Removed 3 rows containing missing values (geom_point).
ur.kpss(atm1 %>% BoxCox(atm1_lambda)) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.3929
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(diff(atm1, 7), points = FALSE)
ur.kpss(diff(atm1, 7)) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0259
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(atm1 %>% BoxCox(atm1_lambda)%>% diff(7))
## Warning: Removed 6 rows containing missing values (geom_point).
ur.kpss(atm1 %>% BoxCox(atm1_lambda) %>% diff(7)) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.016
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
atm_model1 <- Arima(atm1, order=c(2,0,2), seasonal = c(0,1,1), lambda = atm1_lambda)
summary(atm_model1)
## Series: atm1
## ARIMA(2,0,2)(0,1,1)[7]
## Box Cox transformation: lambda= 0.2081476
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.5362 -0.8226 0.6000 0.7602 -0.6945
## s.e. 0.1342 0.1455 0.1418 0.1770 0.0433
##
## sigma^2 estimated as 1.244: log likelihood=-542.33
## AIC=1096.67 AICc=1096.9 BIC=1119.95
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 2.522885 24.94303 16.01159 -85.35837 104.6165 0.9058306
## ACF1
## Training set 0.06631228
checkresiduals(atm_model1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7]
## Q* = 7.3162, df = 9, p-value = 0.6042
##
## Model df: 5. Total lags used: 14
atm1_check <- auto.arima(atm1, approximation = FALSE, lambda = atm1_lambda)
summary(atm1_check)
## Series: atm1
## ARIMA(0,0,2)(0,1,1)[7]
## Box Cox transformation: lambda= 0.2081476
##
## Coefficients:
## ma1 ma2 sma1
## 0.0989 -0.1107 -0.6482
## s.e. 0.0532 0.0527 0.0425
##
## sigma^2 estimated as 1.247: log likelihood=-543.68
## AIC=1095.37 AICc=1095.48 BIC=1110.89
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 2.493867 25.19797 16.12913 -88.12706 107.3975 0.9124807
## ACF1
## Training set 0.01895515
checkresiduals(atm1_check)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2)(0,1,1)[7]
## Q* = 8.5811, df = 11, p-value = 0.6605
##
## Model df: 3. Total lags used: 14
No Transforms
ggtsdisplay(atm2, points = FALSE, plot.type = "histogram")
## Warning: Removed 2 rows containing non-finite values (stat_bin).
ur.kpss(atm2) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 2.1698
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Box-Cox Transform
atm2_lambda <- BoxCox.lambda(atm2)
ggtsdisplay(atm2 %>% BoxCox(atm2_lambda))
## Warning: Removed 2 rows containing missing values (geom_point).
ur.kpss(atm2 %>% BoxCox(atm2_lambda)) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 2.1556
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ndiffs(atm2)
## [1] 1
ggtsdisplay(diff(atm2, 7), points = FALSE)
ur.kpss(diff(atm2, 7)) %>% summary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0168
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
atm_model2<- Arima(atm2, order=c(2,1,2), seasonal = c(0,1,1), lambda = atm2_lambda)
summary(atm_model2)
## Series: atm2
## ARIMA(2,1,2)(0,1,1)[7]
## Box Cox transformation: lambda= 0.7164431
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.0359 -0.1078 -0.9285 -0.0715 -0.6228
## s.e. 0.2651 0.0569 0.2633 0.2631 0.0450
##
## sigma^2 estimated as 65.42: log likelihood=-1249.21
## AIC=2510.42 AICc=2510.66 BIC=2533.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.6080112 24.9517 17.33436 -Inf Inf 0.8461616 -0.01891811
checkresiduals(atm_model2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2)(0,1,1)[7]
## Q* = 26.39, df = 9, p-value = 0.001763
##
## Model df: 5. Total lags used: 14
atm2_check <- auto.arima(atm2, approximation = FALSE, lambda = atm2_lambda)
summary(atm2_check)
## Series: atm2
## ARIMA(2,0,2)(0,1,1)[7]
## Box Cox transformation: lambda= 0.7164431
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4188 -0.9019 0.4587 0.7886 -0.7187
## s.e. 0.0587 0.0471 0.0884 0.0619 0.0414
##
## sigma^2 estimated as 62.19: log likelihood=-1240.08
## AIC=2492.16 AICc=2492.4 BIC=2515.45
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.2631418 24.22403 16.90092 -Inf Inf 0.8250038 -0.01464052
checkresiduals(atm2_check)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7]
## Q* = 9.8051, df = 9, p-value = 0.3665
##
## Model df: 5. Total lags used: 14
I had a hard time figuring out what to do with this model. There is only 3 observations in the dataset. After doing research, I chose a random walk model vs an average model using the non-zero observations.
#atm3_lambda <- BoxCox.lambda(atm3) # lambda for ATM3 will not be required
atm_model3 <- rwf(atm3, h=31)
atm4_lambda <- BoxCox.lambda(atm4)
ggtsdisplay(atm4, points = FALSE, plot.type = "histogram")
ndiffs(atm4)
## [1] 0
ggtsdisplay(atm4 %>% diff(7))
atm_model4 <- Arima(atm4, order=c(1,0,0), seasonal = c(0,1,0), lambda = atm4_lambda)
summary(atm_model4)
## Series: atm4
## ARIMA(1,0,0)(0,1,0)[7]
## Box Cox transformation: lambda= 0.4524377
##
## Coefficients:
## ar1
## 0.0440
## s.e. 0.0528
##
## sigma^2 estimated as 297.2: log likelihood=-1526.79
## AIC=3057.58 AICc=3057.61 BIC=3065.34
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.622377 447.8441 339.2516 -371.1099 425.4804 0.9793969
## ACF1
## Training set 0.01466724
checkresiduals(atm_model4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,0)[7]
## Q* = 120.15, df = 13, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 14
atm4_check <- auto.arima(atm4, approximation = FALSE, lambda = atm4_lambda)
summary(atm4_check)
## Series: atm4
## ARIMA(1,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= 0.4524377
##
## Coefficients:
## ar1 sar1 sar2 mean
## 0.0770 0.2091 0.1995 28.9903
## s.e. 0.0526 0.0516 0.0525 1.2615
##
## sigma^2 estimated as 181.9: log likelihood=-1466.06
## AIC=2942.11 AICc=2942.28 BIC=2961.61
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 84.40696 351.7723 274.2572 -354.4169 398.8866 0.7917625
## ACF1
## Training set 0.01970019
checkresiduals(atm4_check)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,0,0)[7] with non-zero mean
## Q* = 15.867, df = 10, p-value = 0.1035
##
## Model df: 4. Total lags used: 14
# imputing the Auto Arima into the list
atm_model5 <- Arima(atm4, order=c(1,0,0), seasonal = c(2,0,0), lambda = atm4_lambda)
atm1_forecast <- forecast(atm_model1, 31, level = 95)
autoplot(atm1_forecast) +
labs(title = "ATM1: ARIMA(2,0,2)(0,1,1)", x = "Week", y = NULL) +
theme(legend.position = "none")
atm2_forecast <- forecast(atm_model2, 31, level = 95)
autoplot(atm2_forecast) +
labs(title = "ATM2: ARIMA(2,1,2)(0,1,1)", x = "Week", y = NULL) +
theme(legend.position = "none")
autoplot(atm_model3) +
labs(title = "ATM3: Random Walk", x = "Week", y = NULL) +
theme(legend.position = "none")
## Warning: Removed 362 row(s) containing missing values (geom_path).
atm3_forecast <- forecast(atm_model3)
atm4_forecast <- forecast(atm_model5, 31, level = 95)
autoplot(atm2_forecast) +
labs(title = "ATM4: ARIMA(1,0,0)(2,0,0)", x = "Week", y = NULL) +
theme(legend.position = "none")
export <- rbind(atm1_forecast$mean, atm2_forecast$mean, atm3_forecast$mean, atm4_forecast$mean)
write.csv(export, "ATM Forecasts.csv")