This analysis forecasts daily cash withdrawals for four ATM machines (ATM1, ATM2, ATM3, ATM4) for the month of May 2010. The data, provided in hundreds of dollars, spans May 1, 2009 through April 30, 2010, which is exactly one full year of daily observations per machine. Each ATM tells a different story, and as a result each requires its own modeling strategy rather than a one-size-fits-all approach.
library(tidyverse)
library(readxl)
library(forecast)
library(tseries)
library(lubridate)
url <- "https://github.com/vincent-usny/624_pro1/raw/refs/heads/main/ATM624Data.xlsx"
tmp <- tempfile(fileext = ".xlsx")
download.file(url, tmp, mode = "wb")
df <- read_excel(tmp, sheet = "ATM Data") %>%
filter(!is.na(ATM)) %>%
mutate(DATE = as.Date(DATE))
df_clean <- df %>%
group_by(ATM) %>%
mutate(Cash = if_else(
ATM == "ATM4" & Cash == max(Cash, na.rm = TRUE),
median(Cash[abs(row_number() - which.max(Cash)) <= 3 &
row_number() != which.max(Cash)], na.rm = TRUE),
Cash
)) %>%
ungroup()
forecast_atm <- function(atm_name, data, h = 31) {
vals <- data %>%
filter(ATM == atm_name) %>%
arrange(DATE) %>%
pull(Cash)
fit <- ts(vals, frequency = 7) %>%
auto.arima(seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
fc <- forecast(fit, h = h)
tibble(
ATM = atm_name,
Date = seq(as.Date("2010-05-01"), by = "day", length.out = h),
Forecast = as.numeric(fc$mean),
Lo80 = as.numeric(fc$lower[, 1]),
Hi80 = as.numeric(fc$upper[, 1]),
Lo95 = as.numeric(fc$lower[, 2]),
Hi95 = as.numeric(fc$upper[, 2])
)
}
atm3_mean <- df %>%
filter(ATM == "ATM3", Cash > 0) %>%
pull(Cash) %>%
mean()
atm3_fc <- tibble(
ATM = "ATM3",
Date = seq(as.Date("2010-05-01"), by = "day", length.out = 31),
Forecast = atm3_mean,
Lo80 = atm3_mean,
Hi80 = atm3_mean,
Lo95 = atm3_mean,
Hi95 = atm3_mean
)
The dataset contains 1,460 clean observations across four ATMs (365 per machine). There are no missing dates. However two significant data issues were identified before modeling:
ATM3: Near-zero machine. Of its 365 observations, 362 are exactly zero. Only the final three days of the sample period (April 28–30, 2010) show any activity, with values of 96, 82, and 85 ($00s). This machine was effectively offline or newly installed for nearly the entire year.
ATM4 — Extreme outlier. February 9, 2010 records a withdrawal of $10,919.76, which is 20 times the next highest value and clearly a data entry error.
df %>%
group_by(ATM) %>%
summarise(
Mean = round(mean(Cash, na.rm = TRUE), 2),
SD = round(sd(Cash, na.rm = TRUE), 2),
Min = round(min(Cash, na.rm = TRUE), 2),
Max = round(max(Cash, na.rm = TRUE), 2)
)
## # A tibble: 4 × 5
## ATM Mean SD Min Max
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ATM1 83.9 36.7 1 180
## 2 ATM2 62.6 38.9 0 147
## 3 ATM3 0.72 7.94 0 96
## 4 ATM4 474. 651. 1.56 10920.
ATM1 is the most consistent machine. ATM2 shows a gradual decline from ~$2,338 hundred in May 2009 to ~$1,842 hundred by April 2010. ATM4 is by far the highest-volume machine, averaging nearly 6x ATM1 daily.
forecasts <- map_dfr(c("ATM1", "ATM2", "ATM4"), forecast_atm, data = df_clean) %>%
bind_rows(atm3_fc) %>%
arrange(ATM, Date)
forecasts %>%
group_by(ATM) %>%
summarise(May_Total = round(sum(Forecast), 1))
## # A tibble: 4 × 2
## ATM May_Total
## <chr> <dbl>
## 1 ATM1 2435.
## 2 ATM2 1790.
## 3 ATM3 2718.
## 4 ATM4 13226.
ggplot(forecasts, aes(Date, Forecast, color = ATM, fill = ATM)) +
geom_ribbon(aes(ymin = Lo95, ymax = Hi95), alpha = 0.15, color = NA) +
geom_ribbon(aes(ymin = Lo80, ymax = Hi80), alpha = 0.25, color = NA) +
geom_line(size = 1) +
facet_wrap(~ATM, scales = "free_y", ncol = 2) +
scale_x_date(date_labels = "%b %d") +
labs(
title = "ATM Cash Withdrawal Forecasts - May 2010",
subtitle = "Shaded bands show 80% and 95% prediction intervals",
x = NULL,
y = "Cash ($00s)"
) +
theme_minimal() +
theme(legend.position = "none")
Each ATM follows its established weekly rhythm into May 2010 –
ATM1 and ATM2 show moderate, stable cycles with wide prediction
intervals reflecting day-to-day variability, ATM3 produces a flat line
at $87.67 hundred due to insufficient history, and ATM4 dominates in
volume with the widest uncertainty bands consistent with its
high-variance operating pattern.
Please see the report in the separate docx file.