Part A – ATM Forecast, ATM624Data.xlsx

I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

Objective

The purpose of this project was to forecast daily cash withdrawals (in dollars) for four ATM machines for the month of May 2010, based on historical data from May 2009 to April 2010.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(readr)
library(openxlsx)
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Load and prepare Data

About the Data

The dataset included one year of daily cash withdrawal values per ATM. Data was provided in hundreds of dollars and converted to actual dollar values. After removing rows with missing values, the data was structured into a time series format for each ATM.

atm_data <- read_csv("/Users/michaelrobinson/Downloads/ATM_Daily_Cash_Withdrawals.csv")

atm_data_interp <- atm_data
atm_data_interp[,-1] <- lapply(atm_data[,-1], function(col) na.approx(col, na.rm = FALSE))

atm_data_ts <- ts(atm_data_interp[,-1], start = c(2009, 5), frequency = 365)

Data Visualization

atm_long <- atm_data %>%
  pivot_longer(cols = -DATE, names_to = "ATM", values_to = "Cash") %>%
  mutate(DATE = as.Date(DATE))

ggplot(atm_long, aes(x = DATE, y = Cash)) +
  geom_line() +
  facet_wrap(~ATM, ncol = 1, scales = "free_y") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Daily Cash Withdrawals per ATM (in $100s)", x = "Year", y = "Cash") +
  theme_minimal()

STL Decomposition

STL was used to understand the structure of each ATM’s withdrawal pattern. By breaking the time series into three parts: the trend, seasonality, and remainder. The STL plots shows that each ATM has a clear weekly seasonality and consistent trends, supporting the decision to use ETS models for forecasting. Additionally, any irregular behaviorlike spikes or dips—was easily visible in the residuals, such as identify anomalies like the sharp increase observed in ATM4.

par(mfrow = c(2, 2))
for (i in 1:ncol(atm_data_ts)) {
  atm_name <- colnames(atm_data_ts)[i]
  ts_data <- ts(atm_data_ts[, i], frequency = 7)  # use weekly seasonality
  stl_decomp <- stl(ts_data, s.window = "periodic")
  plot(stl_decomp, main = paste("STL Decomposition -", atm_name))
}

par(mfrow = c(1, 1))

Creating Forecasting

Techniques Used

ETS is my method of choice to model because it captures level, trend, and seasonality with minimal manual tuning. Each ATM was modeled independently using the ets() function.

Techniques Not Used

ARIMA: I chose not to use ARIMA although ARIMA is a powerful model for time series forecasting, it usually needs careful parameter tuning (p, d, q) and stationarity checks. I think in this case, the ETS model was able to capture level, trend, and seasonality adequately without such manual tuning.

forecast_list <- list()
for (i in 1:ncol(atm_data_ts)) {
  atm_name <- colnames(atm_data_ts)[i]
  ets_model <- ets(atm_data_ts[, i], allow.multiplicative.trend = TRUE)
  forecast_list[[atm_name]] <- forecast(ets_model, h = 31)
}

Forecast Visualization

par(mfrow=c(2,2))
for (atm in names(forecast_list)) {
  forecast_obj <- forecast_list[[atm]]
  plot(forecast_obj, main = paste("Forecast for", atm), xaxt = "n")
  time_range <- time(forecast_obj$mean)
  axis(1, at = floor(time_range[1]):ceiling(time_range[length(time_range)]),
       labels = floor(time_range[1]):ceiling(time_range[length(time_range)]))
}

par(mfrow=c(1,1))

saving Forecasts to file

Forecast Output

The final forecasts for May 2010 are exported to ATM_May2010_Forecast.xlsx. Each ATM’s daily cash demand is projected in full dollars.

forecast_df <- data.frame(Date = seq.Date(from = as.Date("2010-05-01"), by = "day", length.out = 31))
for (atm in names(forecast_list)) {
  forecast_df[[atm]] <- as.numeric(forecast_list[[atm]]$mean) * 100  # Convert to dollars
}

write.xlsx(forecast_df, "ATM_May2010_Forecast.xlsx", rowNames = FALSE)

Conclusion

The ETS model was a simple and reliable way to forecast the ATM withdrawals. Even though there are more advanced methods, ETS worked well and was easy to understand for this project.

The ETS models produced clear and stable forecasts for ATM1 and ATM2, both of which showed consistent seasonal patterns and moderate daily fluctuations. ATM3’s forecast seems to have been affected by drastic change in withdrawal behavior closer to the end of the data, which the model projected. ATM4 show an extreme outlier close to the end of the time series, it caused the forecast to spike.

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward.

For this analysis we will model monthly residential power consumption (KWH) from January 1998 to December 2013, and generate forecasts for the entire year of 2014 using a TBATS model.

Data Preparation

We will first load the Excel file, Converts the YYYY-MMM format to a proper Date, Sort the data chronologically, Fills missing values and convert the clean data into a monthly time series starting in January 1998.

data <- read_excel("/Users/michaelrobinson/Downloads/ResidentialCustomerForecastLoad-624.xlsx", sheet = "ResidentialCustomerForecastLoad")

data <- data |>
  mutate(Date = as.Date(paste0(substr(`YYYY-MMM`, 1, 4), "-", match(substr(`YYYY-MMM`, 6, 8), month.abb), "-01"))) |>
  arrange(Date)

data$KWH <- na_interpolation(data$KWH)

ts_kwh <- ts(data$KWH, start = c(1998, 1), frequency = 12)

Plot Data

Plot the entire time series (1998–2013) to visually examine consumption trends, outliers, and seasonal fluctuations.

plot(ts_kwh, main = "Original Residential Power Consumption Data (1998–2013)", ylab = "KWH", xlab = "Year")

Time Series Decomposition

Decomposes the time series using STL to spot Trend, Seasonality and Remainder. This analysis helps with chosing an suitable model.

decomposed <- stl(ts_kwh, s.window = "periodic")
plot(decomposed)

The STL decomposition shows Strong seasonality with consistent annual cycles. a growing trend starting around 2010 and small residuals, except for an outlier dip in 2010.

Modeling & Forecasting

We will use the TBATS model for this time series data as it’s ideal for complex seasonal structure, non-linear trends, and non-constant variance. we will then forecasts 12 months to 2014, and visualizes the forecast with confidence intervals.

model <- tbats(ts_kwh)
summary(model)
##                   Length Class  Mode     
## lambda               0   -none- NULL     
## alpha                1   -none- numeric  
## beta                 0   -none- NULL     
## damping.parameter    0   -none- NULL     
## gamma.one.values     1   -none- numeric  
## gamma.two.values     1   -none- numeric  
## ar.coefficients      0   -none- NULL     
## ma.coefficients      1   -none- numeric  
## likelihood           1   -none- numeric  
## optim.return.code    1   -none- numeric  
## variance             1   -none- numeric  
## AIC                  1   -none- numeric  
## parameters           2   -none- list     
## seed.states         12   -none- numeric  
## fitted.values      192   ts     numeric  
## errors             192   ts     numeric  
## x                 2304   -none- numeric  
## seasonal.periods     1   -none- numeric  
## k.vector             1   -none- numeric  
## y                  192   ts     numeric  
## p                    1   -none- numeric  
## q                    1   -none- numeric  
## call                 2   -none- call     
## series               1   -none- character
## method               1   -none- character
forecast_2014 <- forecast(model, h = 12)
plot(forecast_2014, main = "Forecast of Monthly KWH Usage for 2014")

forecast_2014
##          Point Forecast   Lo 80    Hi 80   Lo 95    Hi 95
## Jan 2014        9652315 8625618 10679013 8082117 11222514
## Feb 2014        7914446 6844563  8984329 6278202  9550691
## Mar 2014        6863193 5788802  7937584 5220054  8506332
## Apr 2014        6161878 5083740  7240017 4513008  7810748
## May 2014        5904393 4822730  6986056 4250133  7558654
## Jun 2014        7455357 6370316  8540398 5795930  9114784
## Jul 2014        8365892 7277704  9454079 6701653 10030131
## Aug 2014        9211425 8119969 10302881 7542186 10880664
## Sep 2014        8572564 7478226  9666901 6898919 10246208
## Oct 2014        6625037 5527513  7722561 4946519  8303555
## Nov 2014        5854523 4754453  6954593 4172112  7536934
## Dec 2014        7265369 6162215  8368523 5578240  8952498

Closer look at the Forecast

Improves visualization by: - Overlaying the last two years of actual data. - Adding shaded ribbons for 80% and 95% forecast confidence intervals.

df_hist <- data |>
  filter(Date >= as.Date("2012-01-01")) |>
  select(Date, KWH)

df_forecast <- data.frame(
  Date = seq(as.Date("2014-01-01"), by = "month", length.out = 12),
  KWH = forecast_2014$mean,
  Lo80 = forecast_2014$lower[, 1],
  Hi80 = forecast_2014$upper[, 1],
  Lo95 = forecast_2014$lower[, 2],
  Hi95 = forecast_2014$upper[, 2]
)

ggplot() +
  geom_line(data = df_hist, aes(x = Date, y = KWH), color = "blue") +
  geom_ribbon(data = df_forecast, aes(x = Date, ymin = Lo95, ymax = Hi95), fill = "orange", alpha = 0.2) +
  geom_ribbon(data = df_forecast, aes(x = Date, ymin = Lo80, ymax = Hi80), fill = "orange", alpha = 0.4) +
  geom_line(data = df_forecast, aes(x = Date, y = KWH), color = "yellow", size = 1) +
  labs(title = "Power Consumption Forecast (Last Two Years of Historical Data)",
       x = "Date", y = "KWH") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Conclusion

We used the TBATS model to forecast monthly residential power consumption for 2014 based on 16 years of historical data. This model was chosen because it best suites complex seasonal patterns. STL decomposition revealed strong seasonality and trend, making TBATS a suitable choice. The results demonstrate a clear understanding of consumption dynamics and offer valuable insights for energy planning, capacity management, and sustainability forecasting.