Automated Teller Machines (ATMs) play a critical role in modern banking by providing customers with continuous access to cash. Efficient management of ATM cash levels is essential to ensure reliable service while minimizing operational costs. The dataset used in this analysis contains daily cash withdrawal amounts from four ATM machines over a specified time period, with withdrawals measured in hundreds of dollars.
Understanding the patterns in cash demand—such as trends, variability, and potential seasonality—is important for developing accurate forecasts. Time series methods, including transformation, decomposition, and ARIMA modeling, are used to capture these patterns and generate reliable predictions for future cash requirements.
The primary objective of this analysis is to forecast daily cash withdrawals for each ATM for the month of May 2010. These forecasts will support better planning and decision-making for ATM cash management.
Financial institutions must ensure that ATMs are adequately stocked with cash at all times. Underestimating demand can lead to empty machines, resulting in poor customer experience and potential revenue loss. Overestimating demand, on the other hand, leads to excess cash being held in machines, increasing security risks and operational costs.
The key business problem is to determine how much cash should be allocated to each ATM on a daily basis. This requires accurate forecasting of withdrawal demand based on historical data.
By developing robust time series models, this analysis aims to:
Predict future cash withdrawal patterns Optimize cash replenishment schedules Reduce operational and transportation costs Minimize the risk associated with holding excess cash Improve overall service reliability
Ultimately, the goal is to provide a data-driven approach to ATM cash management, enabling more efficient and cost-effective operations.
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
## The following object is masked from 'package:lubridate':
##
## interval
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 methods overwritten by 'ggtime':
## method from
## +.gg_tsensemble feasts
## autolayer.fbl_ts fabletools
## autolayer.tbl_ts fabletools
## autoplot.dcmp_ts fabletools
## autoplot.fbl_ts fabletools
## autoplot.tbl_cf feasts
## autoplot.tbl_ts fabletools
## chooseOpsMethod.gg_tsensemble feasts
## fortify.fbl_ts fabletools
## grid.draw.gg_tsensemble feasts
## print.gg_tsensemble feasts
## scale_type.cf_lag feasts
## Warning: replacing previous import 'feasts::scale_x_cf_lag' by
## 'ggtime::scale_x_cf_lag' when loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_season' by 'ggtime::gg_season'
## when loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_tsresiduals' by
## 'ggtime::gg_tsresiduals' when loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_irf' by 'ggtime::gg_irf' when
## loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_arma' by 'ggtime::gg_arma' when
## loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_tsdisplay' by
## 'ggtime::gg_tsdisplay' when loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_subseries' by
## 'ggtime::gg_subseries' when loading 'fpp3'
## Warning: replacing previous import 'feasts::gg_lag' by 'ggtime::gg_lag' when
## loading 'fpp3'
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.3 ──
## ✔ tibble 3.2.1 ✔ ggtime 0.2.0
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ ggplot2 3.5.2 ✔ fable 0.5.0
## ✔ tsibbledata 0.4.1
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'ggtime' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ feasts::gg_arma() masks ggtime::gg_arma()
## ✖ feasts::gg_irf() masks ggtime::gg_irf()
## ✖ feasts::gg_lag() masks ggtime::gg_lag()
## ✖ feasts::gg_season() masks ggtime::gg_season()
## ✖ feasts::gg_subseries() masks ggtime::gg_subseries()
## ✖ feasts::gg_tsdisplay() masks ggtime::gg_tsdisplay()
## ✖ feasts::gg_tsresiduals() masks ggtime::gg_tsresiduals()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ feasts::scale_x_cf_lag() masks ggtime::scale_x_cf_lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyr)
library(ggplot2)
library(zoo)
## Warning: package 'zoo' was built under R version 4.4.3
##
## Attaching package: 'zoo'
## The following object is masked from 'package:tsibble':
##
## index
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(janitor)
## Warning: package 'janitor' was built under R version 4.4.3
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
ATMDATA <- read_csv("https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/ATM624Data.csv")
ATMDATA <- ATMDATA %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y")) %>%
filter(!is.na(DATE))
head(ATMDATA)
summary(ATMDATA)
## DATE ATM Cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10920.0
## NA's :19
str(ATMDATA)
## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: num [1:1474] 96 107 82 89 85 90 90 55 99 79 ...
head(ATMDATA)
summary(ATMDATA)
## DATE ATM Cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10920.0
## NA's :19
library(naniar)
gg_miss_var(ATMDATA)
atm <- ATMDATA %>%
mutate(Cash = as.integer(Cash))
str(atm)
## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int [1:1474] 96 107 82 89 85 90 90 55 99 79 ...
Because the four ATM machines may behave differently, each series will be analyzed separately.
atm1 <- atm %>%
filter(ATM == "ATM1") %>%
as_tsibble(index = DATE)
atm2 <- atm %>%
filter(ATM == "ATM2") %>%
as_tsibble(index = DATE)
atm3 <- atm %>%
filter(ATM == "ATM3") %>%
as_tsibble(index = DATE)
atm4 <- atm %>%
filter(ATM == "ATM4") %>%
as_tsibble(index = DATE)
autoplot(atm1) +
labs(title = "ATM1", subtitle = "Cash withdrawals per day", y = "Hundreds USD")
autoplot(atm2) +
labs(title = "ATM2", subtitle = "Cash withdrawals per day", y = "Hundreds USD")
autoplot(atm3) +
labs(title = "ATM3", subtitle = "Cash withdrawals per day", y = "Hundreds USD")
autoplot(atm4) +
labs(title = "ATM4", subtitle = "Cash withdrawals per day", y = "Hundreds USD")
From the plots, ATM1 and ATM2 appear to have relatively stable variation
and possible seasonality. ATM3 contains very few observations. ATM4
contains one clear outlier. # Outlier Treatment
which(atm4$Cash == 10919)
## integer(0)
The value 10919 is an obvious outlier, so it is replaced with the mean of the ATM4 series.
atm4 <- atm4 %>%
mutate(Cash = ifelse(Cash == 10919,
round(mean(Cash, na.rm = TRUE), 0),
Cash))
autoplot(atm4) +
labs(title = "ATM4 After Outlier Correction",
subtitle = "Cash withdrawals per day",
y = "Hundreds USD")
sum(is.na(atm1$Cash))
## [1] 3
sum(is.na(atm2$Cash))
## [1] 2
sum(is.na(atm3$Cash))
## [1] 0
sum(is.na(atm4$Cash))
## [1] 0
which(is.na(atm1$Cash))
## [1] 44 47 53
which(is.na(atm2$Cash))
## [1] 49 55
We inspect the distributions of ATM1 and ATM2 before imputing the missing values.
hist(atm1$Cash, main = "ATM1 Cash Distribution", xlab = "Cash")
hist(atm2$Cash, main = "ATM2 Cash Distribution", xlab = "Cash")
Since the distributions do not appear heavily skewed, the missing values
are imputed using the median.
atm1$Cash[44] <- round(median(atm1$Cash, na.rm = TRUE), 0)
atm1$Cash[47] <- round(median(atm1$Cash, na.rm = TRUE), 0)
atm1$Cash[53] <- round(median(atm1$Cash, na.rm = TRUE), 0)
atm2$Cash[49] <- round(median(atm2$Cash, na.rm = TRUE), 0)
atm2$Cash[55] <- round(median(atm2$Cash, na.rm = TRUE), 0)
lambda1 <- atm1 %>%
filter(!is.na(Cash)) %>%
features(Cash + 1, guerrero) %>%
pull(lambda_guerrero)
lambda2 <- atm2 %>%
filter(!is.na(Cash)) %>%
features(Cash + 1, guerrero) %>%
pull(lambda_guerrero)
lambda3 <- atm3 %>%
filter(!is.na(Cash)) %>%
features(Cash + 1, guerrero) %>%
pull(lambda_guerrero)
lambda4 <- atm4 %>%
filter(!is.na(Cash)) %>%
features(Cash + 1, guerrero) %>%
pull(lambda_guerrero)
plot_trans1 <- atm1 %>%
autoplot(box_cox(Cash + 1, lambda1)) +
labs(title = "ATM1 Transformed", subtitle = "Cash withdrawals per day", y = "USD")
plot_trans2 <- atm2 %>%
autoplot(box_cox(Cash + 1, lambda2)) +
labs(title = "ATM2 Transformed", subtitle = "Cash withdrawals per day", y = "USD")
plot_trans3 <- atm3 %>%
autoplot(box_cox(Cash + 1, lambda3)) +
labs(title = "ATM3 Transformed", subtitle = "Cash withdrawals per day", y = "USD")
plot_trans4 <- atm4 %>%
autoplot(box_cox(Cash + 1, lambda4)) +
labs(title = "ATM4 Transformed", subtitle = "Cash withdrawals per day", y = "USD")
atm1 %>%
model(classical_decomposition(box_cox(Cash, lambda1), type = "additive")) %>%
components() %>%
autoplot() +
labs(title = "Classical Additive Decomposition of ATM1")
atm2 %>%
model(classical_decomposition(box_cox(Cash, lambda2), type = "additive")) %>%
components() %>%
autoplot() +
labs(title = "Classical Additive Decomposition of ATM2")
atm3 %>%
model(classical_decomposition(box_cox(Cash, lambda3), type = "additive")) %>%
components() %>%
autoplot() +
labs(title = "Classical Additive Decomposition of ATM3")
atm4 %>%
model(classical_decomposition(box_cox(Cash, lambda4), type = "additive")) %>%
components() %>%
autoplot() +
labs(title = "Classical Additive Decomposition of ATM4")
plot_acf1 <- atm1 %>%
ACF(box_cox(Cash, lambda1)) %>%
autoplot() +
labs(title = "Autocorrelation of Cash - ATM1")
plot_acf2 <- atm2 %>%
ACF(box_cox(Cash, lambda2)) %>%
autoplot() +
labs(title = "Autocorrelation of Cash - ATM2")
plot_acf3 <- atm3 %>%
ACF(box_cox(Cash, lambda3)) %>%
autoplot() +
labs(title = "Autocorrelation of Cash - ATM3")
plot_acf4 <- atm4 %>%
ACF(box_cox(Cash, lambda4)) %>%
autoplot() +
labs(title = "Autocorrelation of Cash - ATM4")
library(gridExtra)
grid.arrange(plot_acf1, plot_acf2, plot_acf3, plot_acf4, nrow = 2)
atm1 %>% features(box_cox(Cash, lambda1), unitroot_nsdiffs)
atm2 %>% features(box_cox(Cash, lambda2), unitroot_nsdiffs)
atm3 %>% features(box_cox(Cash, lambda3), unitroot_nsdiffs)
atm4 %>% features(box_cox(Cash, lambda4), unitroot_nsdiffs)
The results indicate that ATM1 and ATM2 require one seasonal difference, while ATM3 and ATM4 do not.
We next check whether any additional regular differencing is required.
atm1 %>% features(difference(box_cox(Cash, lambda1), 7), unitroot_ndiffs)
atm2 %>% features(difference(box_cox(Cash, lambda2), 7), unitroot_ndiffs)
atm3 %>% features(box_cox(Cash, lambda3), unitroot_ndiffs)
atm4 %>% features(box_cox(Cash, lambda4), unitroot_ndiffs)
No additional non-seasonal differencing appears necessary.
atm1 %>%
ACF(difference(box_cox(Cash, lambda1), 7)) %>%
autoplot() +
labs(title = "ACF of Differenced ATM1")
atm2 %>%
ACF(difference(box_cox(Cash, lambda2), 7)) %>%
autoplot() +
labs(title = "ACF of Differenced ATM2")
After differencing, the series look much closer to white noise. # Model Building
Because ATM1 and ATM2 need seasonal differencing, ARIMA models are appropriate. ATM3 has too little data, so a naive model is more reasonable. # ATM1 Model
atm1_fit <- atm1 %>%
model(ARIMA(box_cox(Cash, lambda1)))
report(atm1_fit)
## Series: Cash
## Model: ARIMA(0,0,2)(0,1,1)[7]
## Transformation: box_cox(Cash, lambda1)
##
## Coefficients:
## ma1 ma2 sma1
## 0.0980 -0.1143 -0.6463
## s.e. 0.0524 0.0527 0.0425
##
## sigma^2 estimated as 1.483: log likelihood=-578.9
## AIC=1165.81 AICc=1165.92 BIC=1181.33
atm1_fit %>% gg_tsresiduals()
atm2_fit <- atm2 %>%
model(ARIMA(box_cox(Cash, lambda2)))
report(atm2_fit)
## Series: Cash
## Model: ARIMA(2,0,2)(0,1,1)[7]
## Transformation: box_cox(Cash, lambda2)
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4246 -0.9072 0.4679 0.8003 -0.7257
## s.e. 0.0587 0.0439 0.0890 0.0584 0.0409
##
## sigma^2 estimated as 67.61: log likelihood=-1261.88
## AIC=2535.75 AICc=2535.99 BIC=2559.03
atm2_fit %>% gg_tsresiduals()
atm3_fit <- atm3 %>%
model(NAIVE(box_cox(Cash, lambda3)))
report(atm3_fit)
## Series: Cash
## Model: NAIVE
## Transformation: box_cox(Cash, lambda3)
##
## sigma^2: 5.0877
atm3_fit %>% gg_tsresiduals()
atm4_fit <- atm4 %>%
model(ARIMA(box_cox(Cash, lambda4)))
report(atm4_fit)
## Series: Cash
## Model: ARIMA(0,0,0)(2,0,0)[7] w/ mean
## Transformation: box_cox(Cash, lambda4)
##
## Coefficients:
## sar1 sar2 constant
## 0.2501 0.1951 2.4713
## s.e. 0.0521 0.0525 0.0460
##
## sigma^2 estimated as 0.8133: log likelihood=-479.32
## AIC=966.65 AICc=966.76 BIC=982.25
atm4_fit %>% gg_tsresiduals()
We now forecast 31 days ahead for May 2010.
forecast_atm1 <- atm1_fit %>% forecast(h = 31)
forecast_atm2 <- atm2_fit %>% forecast(h = 31)
forecast_atm3 <- atm3_fit %>% forecast(h = 31)
forecast_atm4 <- atm4_fit %>% forecast(h = 31)
autoplot(forecast_atm1, atm1) + labs(title = "Forecast for ATM1")
autoplot(forecast_atm2, atm2) + labs(title = "Forecast for ATM2")
autoplot(forecast_atm3, atm3) + labs(title = "Forecast for ATM3")
autoplot(forecast_atm4, atm4) + labs(title = "Forecast for ATM4")
write.csv(as.data.frame(forecast_atm1), "forecast_atm1.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm2), "forecast_atm2.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm3), "forecast_atm3.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm4), "forecast_atm4.csv", row.names = FALSE)
The business problem is to determine how much cash each ATM should hold in order to avoid shortages while minimizing excess cash. The time series models developed in this analysis directly address this problem by converting historical withdrawal patterns into quantitative forecasts of future demand.
First, the models capture key characteristics of the data, such as trend, variability, and seasonality. For example, ATM1 and ATM2 showed clear weekly patterns, which were incorporated through seasonal differencing in the ARIMA models. By accounting for these patterns, the models are able to produce more accurate and reliable forecasts.
Second, the forecasting step provides daily predictions of cash withdrawals for each ATM. These forecasts allow decision-makers to estimate how much cash will be needed on each day of May 2010. Instead of relying on intuition or simple averages, the bank can use these statistically grounded predictions to plan cash allocation.
As a result, the models help solve the business problem in the following ways:
Prevent stockouts: Forecasts ensure that enough cash is loaded into each ATM to meet expected demand, reducing the risk of machines running empty. Reduce excess cash: By avoiding overestimation, the bank minimizes unnecessary cash holding, lowering security risks and opportunity costs. Optimize replenishment schedules: Knowing expected demand allows for efficient planning of cash delivery routes and frequencies. Improve operational efficiency: Resources such as staff and transportation can be allocated more effectively based on predicted needs.
In summary, the models transform raw historical data into actionable insights, enabling the bank to make informed, data-driven decisions about ATM cash management. This directly improves service reliability while reducing operational costs.
Accurate forecasting of residential electricity demand is critical for efficient energy planning and resource allocation. The dataset used in this analysis contains monthly residential power consumption from January 1998 through December 2013. The goal is to build a time series model and generate forecasts for 2014.
Energy providers must anticipate future electricity demand to ensure reliable service while minimizing operational costs. Underestimating demand can lead to shortages, while overestimating demand results in inefficient resource allocation. This analysis aims to develop a data-driven forecasting model to support better planning and decision-making.
power <- read_csv("https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/ResidentialCustomerForecastLoad-624.csv") %>%
rename(Date = `YYYY-MMM`) %>%
mutate(Date = yearmonth(Date)) %>%
as_tsibble(index = Date)
power
power %>% filter(is.na(KWH))
There is one missing value (September 2008). This value is replaced using the median of the series.
power <- power %>%
mutate(KWH = ifelse(is.na(KWH),
median(KWH, na.rm = TRUE),
KWH))
power %>%
autoplot(KWH) +
labs(title = "Monthly Residential Power Consumption",
y = "KWH")
The data shows strong seasonal patterns and variability over time.
power %>%
gg_season(KWH) +
labs(title = "Seasonal Plot")
power %>%
gg_subseries(KWH) +
labs(title = "Subseries Plot")
These plots confirm strong yearly seasonality.
lambda <- power %>%
features(KWH, guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.1041666
power %>%
autoplot(box_cox(KWH, lambda)) +
labs(title = "Box-Cox Transformed Series")
The transformation stabilizes variance and improves model suitability.
power %>%
ACF(box_cox(KWH, lambda)) %>%
autoplot() +
labs(title = "ACF Plot")
The slow decay of the ACF suggests non-stationarity.
power %>%
features(KWH, unitroot_nsdiffs)
power %>%
features(KWH, unitroot_ndiffs)
Seasonal differencing is required, which will be handled internally by the ARIMA model.
fit_arima <- power %>%
model(ARIMA(box_cox(KWH, lambda)))
report(fit_arima)
## Series: KWH
## Model: ARIMA(0,1,2)(0,0,2)[12]
## Transformation: box_cox(KWH, lambda)
##
## Coefficients:
## ma1 ma2 sma1 sma2
## -0.7266 -0.2492 0.2269 0.2104
## s.e. 0.0717 0.0706 0.0787 0.0656
##
## sigma^2 estimated as 1.294: log likelihood=-295.53
## AIC=601.05 AICc=601.38 BIC=617.31
fit_arima %>%
gg_tsresiduals()
The residual diagnostics indicate that the residuals behave like white noise, suggesting a good model fit.
fit_ets <- power %>%
model(ETS(KWH))
report(fit_ets)
## Series: KWH
## Model: ETS(M,N,M)
## Smoothing parameters:
## alpha = 0.1167131
## gamma = 0.0001038822
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 6133758 0.9567738 0.7524052 0.8702234 1.184894 1.261653 1.185654 1.000083
## s[-7] s[-8] s[-9] s[-10] s[-11]
## 0.7742886 0.8153702 0.9116166 1.065073 1.221965
##
## sigma^2: 0.0144
##
## AIC AICc BIC
## 6224.787 6227.514 6273.649
fit_ets %>%
gg_tsresiduals()
accuracy(fit_arima)
accuracy(fit_ets)
The ARIMA model shows lower RMSE and better residual behavior, making it the preferred model.
fc <- fit_arima %>%
forecast(h = "12 months")
fc %>%
autoplot(power) +
labs(title = "Forecast for 2014")
fc_final <- fc %>%
as_tibble() %>%
mutate(KWH = inv_box_cox(.mean, lambda))
fc_final
write.csv(fc_final, "results_rcfl.csv", row.names = FALSE)
The ARIMA model captures both trend and seasonal patterns in electricity consumption and generates reliable monthly forecasts. These forecasts allow energy providers to anticipate demand, optimize production, and improve operational efficiency.
This analysis developed a forecasting model for residential electricity demand using historical monthly data. After handling missing values and applying a Box-Cox transformation, the ARIMA model was selected based on its superior performance and residual diagnostics. The forecasts for 2014 provide a reliable basis for planning and decision-making, helping energy providers manage resources efficiently and reduce operational risk.