Introduction

Accurately forecasting cash demand is important for ensuring ATM availability while minimizing excess cash holdings. In this report, historical ATM withdrawal data is used to forecast daily cash withdrawals for four ATM machines for May 2010. The analysis follows a practical, business-oriented approach using standard time-series forecasting techniques. Results are presented through visualizations and summarized forecasts, and the final output is provided in an Excel-readable format to support operational planning and decision-making.

Objective

This report forecasts daily cash withdrawals for each ATM for May 1–May 31, 2010 using the provided ATM624 dataset. Because the variable Cash is recorded in hundreds of dollars, the analysis first converts Cash to dollars and then builds time-series models per ATM. The goal is to produce a practical forecast that can be used for staffing and cash-replenishment planning, along with visuals and an Excel-readable forecast file.

File Location

The working directory is set to the folder containing this Rmd so the data file can be read reliably when knitting.

Set working directory to the folder where this Rmd lives (RStudio only)

try(setwd(dirname(rstudioapi::getActiveDocumentContext()$path)), silent = TRUE)

Forecast Dates and Settings

These constants define the training cutoff (end of April 2010), the May 2010 forecast window, and a zoom window used in one plot.

TRAIN_END  <- as.Date("2010-05-01")     # forecast May 2010, so train ends before this
MAY_START  <- as.Date("2010-05-01")
MAY_END    <- as.Date("2010-05-31")
ZOOM_START <- as.Date("2010-02-01")

Data Overview and Preparation

The dataset is loaded, the date is converted to a proper Date, and Cash is converted from hundreds of dollars into dollars. The result is converted into a tsibble so each ATM becomes its own time series.

atm_ts <- read.csv("ATM624Data.csv", stringsAsFactors = FALSE) %>%
  mutate(
  DATE = as.Date(mdy_hms(DATE)),
  Cash_dollars = Cash * 100) %>%
  select(DATE, ATM, Cash_dollars) %>%
  arrange(ATM, DATE) %>%
  as_tsibble(key = ATM, index = DATE)

Forecast Target Period (May 2010)

A “future” dataset is created for May 1–May 31, 2010 for each ATM. This is the date grid the models will forecast into.

new_data <- tidyr::crossing(
  ATM  = unique(atm_ts$ATM),
  DATE = seq(MAY_START, MAY_END, by = "day")) %>%
  as_tsibble(key = ATM, index = DATE)

Modeling Approach

Models are trained using data before May 1, 2010. ETS and ARIMA are fit per ATM, and if ARIMA fails for an ATM, the workflow falls back to ETS to ensure a forecast is produced.

train_ts <- atm_ts %>% filter(DATE < TRAIN_END)

fits <- train_ts %>%
model(
ets   = ETS(Cash_dollars),
arima = ARIMA(Cash_dollars)
) %>%
mutate(best = if_else(purrr::map_lgl(arima, is.null), "ets", "arima"))

Forecast Generation

The May 2010 forecasts are produced from both models and then filtered so each ATM keeps only its selected “best” model (ARIMA when available, otherwise ETS).

fc_ets <- fits %>%
select(ATM, ets) %>%
forecast(new_data = new_data) %>%
mutate(model_used = "ets")

fc_arima <- fits %>%
select(ATM, arima) %>%
forecast(new_data = new_data) %>%
mutate(model_used = "arima")

may_fc <- bind_rows(fc_ets, fc_arima) %>%
left_join(fits %>% as_tibble() %>% select(ATM, best), by = "ATM") %>%
filter(model_used == best)

Visuals: Historical Series

This plot shows the full history of daily cash withdrawals by ATM to provide context for the forecast period.

atm_ts %>%
  autoplot(Cash_dollars) +labs( title = "Daily Cash Withdrawals by ATM",
                                y = "Cash Withdrawn ($)",x = "Date")

Visuals: May 2010 Forecast

This plot overlays the May 2010 forecast on top of the historical data and includes uncertainty bands.

may_fc %>% autoplot(atm_ts, level = 95) + labs( title = "Forecast for May 2010 (ARIMA with ETS fallback)", y = "Cash Withdrawn ($)",x = "Date")

Visuals: Zoomed Forecast View

This zoomed view focuses on the most recent months leading into May, which makes the forecast region easier to read.

atm_recent <- atm_ts %>% filter(DATE >= ZOOM_START)

may_fc %>% autoplot(atm_recent, level = 95) + labs(title = "Forecast for May 2010 (Zoomed In)",y = "Cash Withdrawn ($)",x = "Date")

Model Used per ATM

This table shows which model was used for each ATM after applying the ARIMA/ETS fallback rule.

may_fc %>%
distinct(ATM, model_used) %>%
arrange(ATM)

## # A tibble: 4 × 2
##   ATM   model_used
##   <chr> <chr>     
## 1 ATM1  arima     
## 2 ATM2  arima     
## 3 ATM3  arima     
## 4 ATM4  arima

Forecast Summary (Totals and Daily Average)

The forecast is summarized into a May total and an average daily value per ATM, which is useful for comparing machines and planning cash loading.

may_summary <- may_fc %>%
mutate(Forecast = as.numeric(.mean)) %>%
group_by(ATM) %>%
summarise(
May2010_Total = sum(Forecast, na.rm = TRUE),
Avg_Daily     = mean(Forecast, na.rm = TRUE),
.groups = "drop"
)

may_summary

## # A tsibble: 124 x 4 [1D]
## # Key:       ATM [4]
##    ATM   DATE       May2010_Total Avg_Daily
##    <chr> <date>             <dbl>     <dbl>
##  1 ATM1  2010-05-01         8519.     8519.
##  2 ATM1  2010-05-02         9810.     9810.
##  3 ATM1  2010-05-03         7546.     7546.
##  4 ATM1  2010-05-04          353.      353.
##  5 ATM1  2010-05-05         9849.     9849.
##  6 ATM1  2010-05-06         7814.     7814.
##  7 ATM1  2010-05-07         8398.     8398.
##  8 ATM1  2010-05-08         8490.     8490.
##  9 ATM1  2010-05-09         9749.     9749.
## 10 ATM1  2010-05-10         7764.     7764.
## # ℹ 114 more rows

Visual: Forecasted May Totals by ATM

This bar chart compares the forecasted total withdrawals across ATMs for the month of May 2010.

may_summary %>%
ggplot(aes(x = ATM, y = May2010_Total)) +
geom_col() +
labs(
title = "Forecasted Total Cash Withdrawals – May 2010",
y = "Total ($)",
x = "ATM"
)

Export Forecast to Excel-Readable File

The daily May 2010 forecast is exported as a CSV for submission. It includes the forecast mean and 95% prediction intervals.

may_export <- may_fc %>%
  as_tibble() %>%
  transmute(
    ATM,
    DATE,
    Forecast = as.numeric(.mean)
  ) %>%
  arrange(ATM, DATE)

write.csv(may_export, "ATM_May2010_Forecast.csv", row.names = FALSE)

Conclusion

This analysis produced daily cash withdrawal forecasts for each ATM for May 2010 using time-series models built on historical transaction data. By converting the cash values into dollars and modeling each ATM separately, the forecasts reflect differences in usage patterns across machines. The combination of ARIMA and ETS models provided flexibility, with a fallback approach ensuring a forecast was generated for every ATM. The final results, supported by visualizations and an Excel-readable output, offer a practical estimate of expected cash demand that can be used for operational planning. While the models rely solely on historical patterns and do not incorporate external factors such as holidays or special events, they provide a reasonable and transparent baseline forecast for decision-making.

Part B – Forecasting Power (ResidentialCustomerForecastLoad-624.xlsx)

Objective (Part B)

This section models monthly residential electricity consumption (KWH) from January 1998 through December 2013 and produces a monthly forecast for all of 2014. The goal is to generate a practical, reproducible forecast with visuals and an Excel-readable output file.

Data Overview and Preparation (Part B)

The dataset is loaded from a single Excel file and converted into a monthly time series. The KWH variable represents monthly power usage in kilowatt-hours. Converting the date column into a proper monthly index ensures models can correctly learn long-term trend and yearly seasonality.

Load the Data (CSV)

The dataset is read from a single CSV file. A quick preview is included to confirm the column names and verify that the data loaded correctly before converting it into a time-series format.

power_raw_csv <- read.csv("ResidentialCustomerForecastLoad-624.csv", stringsAsFactors = FALSE)

# quick check
head(power_raw_csv)

##   CaseSequence YYYY.MMM     KWH
## 1          733 1998-Jan 6862583
## 2          734 1998-Feb 5838198
## 3          735 1998-Mar 5420658
## 4          736 1998-Apr 5010364
## 5          737 1998-May 4665377
## 6          738 1998-Jun 6467147

str(power_raw_csv)

## 'data.frame':    192 obs. of  3 variables:
##  $ CaseSequence: int  733 734 735 736 737 738 739 740 741 742 ...
##  $ YYYY.MMM    : chr  "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
##  $ KWH         : int  6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428 6989888 6345620 ...

Data Preparation and Time-Series Setup

The YYYY.MMM column is converted into a monthly date index and the dataset is converted into a tsibble. This ensures the forecasting models correctly interpret the data as monthly observations with yearly seasonality.

power_ts <- power_raw_csv %>%
  transmute(
    DATE = yearmonth(parse_date_time(paste0(.data[["YYYY.MMM"]], "-01"),
                                 orders = c("Y-b-d", "Y-B-d"))),
    
    KWH  = as.numeric(.data[["KWH"]])
  ) %>%
  arrange(DATE) %>%
  as_tsibble(index = DATE)

power_ts

## # A tsibble: 192 x 2 [1M]
##        DATE     KWH
##       <mth>   <dbl>
##  1 1998 Jan 6862583
##  2 1998 Feb 5838198
##  3 1998 Mar 5420658
##  4 1998 Apr 5010364
##  5 1998 May 4665377
##  6 1998 Jun 6467147
##  7 1998 Jul 8914755
##  8 1998 Aug 8607428
##  9 1998 Sep 6989888
## 10 1998 Oct 6345620
## # ℹ 182 more rows

Forecast Window for 2014

Models are trained using data through December 2013 and used to generate forecasts for January–December 2014. Creating an explicit 2014 “future” index ensures the forecast output has correct monthly timestamps.

FC_START <- yearmonth("2014 Jan")
FC_END   <- yearmonth("2014 Dec")

train_b <- power_ts %>% filter(DATE < FC_START)

new_data_2014 <- tibble(
DATE = seq(FC_START, FC_END, by = 1)
) %>%
as_tsibble(index = DATE)

Modeling Approach

Because the series is monthly and spans many years, the models should capture trend and annual seasonality. ETS and ARIMA are used as standard forecasting models, and a seasonal naïve method is included as a simple baseline for comparison.

fits_b <- train_b %>%
model(
snaive = SNAIVE(KWH ~ season("year")),
ets    = ETS(KWH),
arima  = ARIMA(KWH)
)

fits_b

## # A mable: 1 x 3
##         snaive          ets                              arima
##        <model>      <model>                            <model>
## 1 <NULL model> <NULL model> <ARIMA(0,0,2)(2,1,0)[12] w/ drift>

Model Comparison Using a Holdout Set (2013)

To choose a final model, the year 2013 is treated as a holdout period. Models are fit on data through 2012 and evaluated by forecasting 2013, then comparing forecast accuracy to the actual 2013 values.

train_2012 <- power_ts %>% filter(DATE < yearmonth("2013 Jan"))
test_2013  <- power_ts %>% filter(DATE >= yearmonth("2013 Jan") & DATE < yearmonth("2014 Jan"))

fits_2012 <- train_2012 %>%
model(
snaive = SNAIVE(KWH ~ season("year")),
ets    = ETS(KWH),
arima  = ARIMA(KWH)
)

acc_b <- fits_2012 %>%
forecast(new_data = test_2013) %>%
accuracy(test_2013)

acc_b

## # A tibble: 3 × 10
##   .model .type      ME     RMSE     MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>   <dbl>    <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 arima  Test  467241. 1500763. 971623.   4.63  11.8   NaN   NaN  0.0219
## 2 ets    Test     NaN      NaN     NaN  NaN    NaN     NaN   NaN NA     
## 3 snaive Test     NaN      NaN     NaN  NaN    NaN     NaN   NaN NA

Model Comparison Using a Holdout Set (2013)

best_model <- acc_b %>%
arrange(RMSE) %>%
slice(1) %>%
pull(.model)

best_model

## [1] "arima"

Select the Best Model

The final model is selected using the holdout accuracy results. In this report, the model with the lowest RMSE is chosen as the best-performing option for forecasting 2014.

final_fit_b <- train_b %>%
model(
snaive = SNAIVE(KWH ~ season("year")),
ets    = ETS(KWH),
arima  = ARIMA(KWH)
)

fc_2014 <- final_fit_b %>%
select(!!best_model) %>%
forecast(new_data = new_data_2014)

fc_2014

## # A fable: 12 x 4 [1M]
## # Key:     .model [1]
##    .model     DATE                 KWH    .mean
##    <chr>     <mth>              <dist>    <dbl>
##  1 arima  2014 Jan N(9790458, 7.8e+11) 9790458.
##  2 arima  2014 Feb   N(8624753, 8e+11) 8624753.
##  3 arima  2014 Mar N(6623643, 8.1e+11) 6623643.
##  4 arima  2014 Apr   N(6e+06, 8.1e+11) 5974459.
##  5 arima  2014 May N(5935072, 8.1e+11) 5935072.
##  6 arima  2014 Jun N(8199158, 8.1e+11) 8199158.
##  7 arima  2014 Jul N(9498803, 8.1e+11) 9498803.
##  8 arima  2014 Aug   N(1e+07, 8.1e+11) 9999104.
##  9 arima  2014 Sep N(8456634, 8.1e+11) 8456634.
## 10 arima  2014 Oct N(5860011, 8.1e+11) 5860011.
## 11 arima  2014 Nov N(6143003, 8.1e+11) 6143003.
## 12 arima  2014 Dec N(8245603, 8.1e+11) 8245603.

dist_col <- names(fc_2014)[sapply(fc_2014, function(x) inherits(x, "distribution"))]
dist_col

## [1] "KWH"

Historical Monthly Usage

This plot shows the full history of monthly residential electricity consumption. It helps confirm overall trend and repeating seasonal patterns before focusing on the forecast period.

power_ts %>%
autoplot(KWH) +
labs(
title = "Residential Power Usage (Monthly KWH), 1998–2013",
x = "Month",
y = "KWH"
)

2014 Forecast

This plot overlays the 2014 forecast on the historical data and includes uncertainty bands. The result provides a clear view of the expected seasonal pattern across the forecast year.

fc_2014 %>%
autoplot(power_ts, level = 95) +
labs(
title = paste("2014 Monthly Power Forecast (", best_model, ")", sep = ""),
x = "Month",
y = "KWH"
)

## Export Forecast to Excel-Readable File The final step exports the 2014 monthly forecast to a CSV file that can be opened in Excel. The output includes the forecast mean and 95% prediction interval bounds.

if (length(dist_col) == 0) {
  stop("No distribution column found in fc_2014. Recreate fc_2014 directly from forecast() and try again.")
}

power_export_2014 <- fc_2014 %>%
  mutate(`95%` = hilo(.data[[dist_col[1]]], 95)) %>%  
  unpack_hilo(`95%`) %>%
  as_tibble() %>%
  transmute(
    Month = as.character(DATE),
    Forecast_KWH = as.numeric(.mean),
    Lo95 = as.numeric(`95%_lower`),
    Hi95 = as.numeric(`95%_upper`)
  )

write.csv(power_export_2014, "ResidentialPower_Forecast_2014.csv", row.names = FALSE)

power_export_2014

## # A tibble: 12 × 4
##    Month    Forecast_KWH     Lo95      Hi95
##    <chr>           <dbl>    <dbl>     <dbl>
##  1 2014 Jan     9790458. 8059904. 11521011.
##  2 2014 Feb     8624753. 6868233. 10381273.
##  3 2014 Mar     6623643. 4864954.  8382333.
##  4 2014 Apr     5974459. 4215769.  7733148.
##  5 2014 May     5935072. 4176382.  7693762.
##  6 2014 Jun     8199158. 6440469.  9957848.
##  7 2014 Jul     9498803. 7740113. 11257493.
##  8 2014 Aug     9999104. 8240414. 11757794.
##  9 2014 Sep     8456634. 6697944. 10215323.
## 10 2014 Oct     5860011. 4101322.  7618701.
## 11 2014 Nov     6143003. 4384314.  7901693.
## 12 2014 Dec     8245603. 6486913. 10004292.

names(power_raw_csv)

## [1] "CaseSequence" "YYYY.MMM"     "KWH"

Conclusion

Using monthly KWH data from 1998–2013, this analysis generated a monthly forecast for 2014 that reflects both long-term trend and strong annual seasonality. The final model was selected using a simple holdout accuracy check and the forecast was exported in an Excel-ready format for practical use.

Project_1

Joe Garcia

2025-12-18

Introduction

Objective

File Location

Set working directory to the folder where this Rmd lives (RStudio only)

Forecast Dates and Settings

Data Overview and Preparation

Forecast Target Period (May 2010)

Modeling Approach

Forecast Generation

Visuals: Historical Series

Visuals: May 2010 Forecast

Visuals: Zoomed Forecast View

Model Used per ATM

Forecast Summary (Totals and Daily Average)

Visual: Forecasted May Totals by ATM

Export Forecast to Excel-Readable File

Conclusion

Part B – Forecasting Power (ResidentialCustomerForecastLoad-624.xlsx)

Objective (Part B)

Data Overview and Preparation (Part B)

Load the Data (CSV)

Data Preparation and Time-Series Setup

Forecast Window for 2014

Modeling Approach

Model Comparison Using a Holdout Set (2013)

Model Comparison Using a Holdout Set (2013)

Select the Best Model

Historical Monthly Usage

2014 Forecast

Conclusion