Introduction

This analysis forecasts daily cash withdrawals for four ATM machines (ATM1, ATM2, ATM3, ATM4) for the month of May 2010. The data, provided in hundreds of dollars, spans May 1, 2009 through April 30, 2010, which is exactly one full year of daily observations per machine. Each ATM tells a different story, and as a result each requires its own modeling strategy rather than a one-size-fits-all approach.

library(tidyverse)
library(readxl)
library(forecast)
library(tseries)
library(lubridate)
url <- "https://github.com/vincent-usny/624_pro1/raw/refs/heads/main/ATM624Data.xlsx"
tmp <- tempfile(fileext = ".xlsx")
download.file(url, tmp, mode = "wb")

df <- read_excel(tmp, sheet = "ATM Data") %>%
  filter(!is.na(ATM)) %>%
  mutate(DATE = as.Date(DATE))
df_clean <- df %>%
  group_by(ATM) %>%
  mutate(Cash = if_else(
    ATM == "ATM4" & Cash == max(Cash, na.rm = TRUE),
    median(Cash[abs(row_number() - which.max(Cash)) <= 3 &
                  row_number() != which.max(Cash)], na.rm = TRUE),
    Cash
  )) %>%
  ungroup()
forecast_atm <- function(atm_name, data, h = 31) {
  vals <- data %>%
    filter(ATM == atm_name) %>%
    arrange(DATE) %>%
    pull(Cash)

  fit <- ts(vals, frequency = 7) %>%
    auto.arima(seasonal = TRUE, stepwise = FALSE, approximation = FALSE)

  fc <- forecast(fit, h = h)

  tibble(
    ATM      = atm_name,
    Date     = seq(as.Date("2010-05-01"), by = "day", length.out = h),
    Forecast = as.numeric(fc$mean),
    Lo80     = as.numeric(fc$lower[, 1]),
    Hi80     = as.numeric(fc$upper[, 1]),
    Lo95     = as.numeric(fc$lower[, 2]),
    Hi95     = as.numeric(fc$upper[, 2])
  )
}
atm3_mean <- df %>%
  filter(ATM == "ATM3", Cash > 0) %>%
  pull(Cash) %>%
  mean()

atm3_fc <- tibble(
  ATM      = "ATM3",
  Date     = seq(as.Date("2010-05-01"), by = "day", length.out = 31),
  Forecast = atm3_mean,
  Lo80     = atm3_mean,
  Hi80     = atm3_mean,
  Lo95     = atm3_mean,
  Hi95     = atm3_mean
)

Data Analysis

Data Quality

The dataset contains 1,460 clean observations across four ATMs (365 per machine). There are no missing dates. However two significant data issues were identified before modeling:

ATM3: Near-zero machine. Of its 365 observations, 362 are exactly zero. Only the final three days of the sample period (April 28–30, 2010) show any activity, with values of 96, 82, and 85 ($00s). This machine was effectively offline or newly installed for nearly the entire year.

ATM4 — Extreme outlier. February 9, 2010 records a withdrawal of $10,919.76, which is 20 times the next highest value and clearly a data entry error.

df %>%
  group_by(ATM) %>%
  summarise(
    Mean = round(mean(Cash, na.rm = TRUE), 2),
    SD   = round(sd(Cash,   na.rm = TRUE), 2),
    Min  = round(min(Cash,  na.rm = TRUE), 2),
    Max  = round(max(Cash,  na.rm = TRUE), 2)
  )
## # A tibble: 4 × 5
##   ATM     Mean     SD   Min    Max
##   <chr>  <dbl>  <dbl> <dbl>  <dbl>
## 1 ATM1   83.9   36.7   1      180 
## 2 ATM2   62.6   38.9   0      147 
## 3 ATM3    0.72   7.94  0       96 
## 4 ATM4  474.   651.    1.56 10920.

ATM1 is the most consistent machine. ATM2 shows a gradual decline from ~$2,338 hundred in May 2009 to ~$1,842 hundred by April 2010. ATM4 is by far the highest-volume machine, averaging nearly 6x ATM1 daily.

forecasts <- map_dfr(c("ATM1", "ATM2", "ATM4"), forecast_atm, data = df_clean) %>%
  bind_rows(atm3_fc) %>%
  arrange(ATM, Date)
forecasts %>%
  group_by(ATM) %>%
  summarise(May_Total = round(sum(Forecast), 1))
## # A tibble: 4 × 2
##   ATM   May_Total
##   <chr>     <dbl>
## 1 ATM1      2435.
## 2 ATM2      1790.
## 3 ATM3      2718.
## 4 ATM4     13226.

Forcast Results

ggplot(forecasts, aes(Date, Forecast, color = ATM, fill = ATM)) +
  geom_ribbon(aes(ymin = Lo95, ymax = Hi95), alpha = 0.15, color = NA) +
  geom_ribbon(aes(ymin = Lo80, ymax = Hi80), alpha = 0.25, color = NA) +
  geom_line(size = 1) +
  facet_wrap(~ATM, scales = "free_y", ncol = 2) +
  scale_x_date(date_labels = "%b %d") +
  labs(
    title    = "ATM Cash Withdrawal Forecasts - May 2010",
    subtitle = "Shaded bands show 80% and 95% prediction intervals",
    x        = NULL,
    y        = "Cash ($00s)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Each ATM follows its established weekly rhythm into May 2010 – ATM1 and ATM2 show moderate, stable cycles with wide prediction intervals reflecting day-to-day variability, ATM3 produces a flat line at $87.67 hundred due to insufficient history, and ATM4 dominates in volume with the widest uncertainty bands consistent with its high-variance operating pattern.

Please see the report in the separate docx file.