I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file. Please provide your written report on your findings, visuals, discussion, and your R code via an RPubs link along with the actual .rmd file. Also, please submit the forecast in an Excel-readable file.
The purpose of this project was to forecast daily cash withdrawals (in dollars) for four ATM machines for the month of May 2010, based on historical data from May 2009 to April 2010.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(readr)
library(openxlsx)
library(zoo)
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
The dataset included one year of daily cash withdrawal values per ATM, recorded in hundreds of dollars; values are converted to full dollars in the exported forecast. Missing values were filled by linear interpolation (zoo::na.approx), and the data was then structured into a time series for each ATM.
# Load the daily withdrawal data and interpolate missing values linearly
atm_data <- read_csv("/Users/michaelrobinson/Downloads/ATM_Daily_Cash_Withdrawals.csv")
atm_data_interp <- atm_data
atm_data_interp[, -1] <- lapply(atm_data[, -1], function(col) na.approx(col, na.rm = FALSE))
# Daily series starting May 1, 2009, which is day 121 of the year
atm_data_ts <- ts(atm_data_interp[, -1], start = c(2009, 121), frequency = 365)
atm_long <- atm_data %>%
  pivot_longer(cols = -DATE, names_to = "ATM", values_to = "Cash") %>%
  mutate(DATE = as.Date(DATE))
ggplot(atm_long, aes(x = DATE, y = Cash)) +
  geom_line() +
  facet_wrap(~ATM, ncol = 1, scales = "free_y") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Daily Cash Withdrawals per ATM (in $100s)", x = "Year", y = "Cash") +
  theme_minimal()
STL decomposition was used to understand the structure of each ATM’s withdrawal pattern by breaking the time series into three parts: trend, seasonality, and remainder. The STL plots show that each ATM has a clear weekly seasonality and a consistent trend, supporting the decision to use ETS models for forecasting. Irregular behavior, like spikes or dips, is also easy to see in the remainder, which helps identify anomalies such as the sharp increase observed in ATM4.
# plot.stl() manages its own panel layout, so each decomposition is drawn on its own page
for (i in 1:ncol(atm_data_ts)) {
  atm_name <- colnames(atm_data_ts)[i]
  ts_data <- ts(atm_data_ts[, i], frequency = 7) # use weekly seasonality
  stl_decomp <- stl(ts_data, s.window = "periodic")
  plot(stl_decomp, main = paste("STL Decomposition -", atm_name))
}
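As a quick check on the ATM4 anomaly, the spike can be located directly in the data. A minimal sketch, assuming the file’s columns are named DATE and ATM1 through ATM4:

# Sketch: find the date of the largest ATM4 withdrawal
# (assumes the source file has columns named DATE and ATM4)
spike_row <- which.max(atm_data_interp$ATM4)
atm_data_interp[spike_row, c("DATE", "ATM4")]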
ETS is my method of choice because it captures level, trend, and seasonality with minimal manual tuning. Each ATM was modeled independently using the ets() function.

ARIMA: I chose not to use ARIMA. Although it is a powerful model for time series forecasting, it usually needs careful parameter tuning (p, d, q) and stationarity checks. In this case, the ETS model captured level, trend, and seasonality adequately without that manual work.
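That said, an ARIMA baseline is cheap to obtain with auto.arima(), which automates the (p, d, q) search. A rough comparison sketch on the first ATM, using training-set accuracy (illustrative only, since AIC is not comparable across model classes):

# Sketch: compare ETS and auto.arima() on one ATM via training-set errors
ts1 <- ts(atm_data_ts[, 1], frequency = 7) # weekly seasonality, as above
accuracy(ets(ts1))
accuracy(auto.arima(ts1))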
forecast_list <- list()
for (i in 1:ncol(atm_data_ts)) {
  atm_name <- colnames(atm_data_ts)[i]
  # Re-wrap each series at weekly frequency so ets() can model the weekly
  # seasonality (ets() ignores seasonality when frequency exceeds 24)
  ts_data <- ts(atm_data_ts[, i], frequency = 7)
  ets_model <- ets(ts_data, allow.multiplicative.trend = TRUE)
  forecast_list[[atm_name]] <- forecast(ets_model, h = 31) # 31 days of May
}
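As a sanity check on the fitted models (not required by the assignment), checkresiduals() runs a Ljung-Box test and plots the residual ACF. A sketch for a single ATM, assuming the columns are named ATM1 through ATM4:

# Sketch: residual diagnostics for one ETS fit (repeat per ATM)
checkresiduals(forecast_list[["ATM1"]])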
par(mfrow = c(2, 2))
for (atm in names(forecast_list)) {
  forecast_obj <- forecast_list[[atm]]
  plot(forecast_obj, main = paste("Forecast for", atm), xaxt = "n")
  # Draw whole-number ticks on the (weekly-index) time axis
  time_range <- time(forecast_obj$mean)
  axis(1, at = floor(time_range[1]):ceiling(time_range[length(time_range)]),
       labels = floor(time_range[1]):ceiling(time_range[length(time_range)]))
}
par(mfrow = c(1, 1))
The final forecasts for May 2010 are exported to ATM_May2010_Forecast.xlsx. Each ATM’s daily cash demand is projected in full dollars.
# Build a data frame of daily point forecasts for May 2010
forecast_df <- data.frame(Date = seq.Date(from = as.Date("2010-05-01"), by = "day", length.out = 31))
for (atm in names(forecast_list)) {
  forecast_df[[atm]] <- as.numeric(forecast_list[[atm]]$mean) * 100 # convert $100s to dollars
}
write.xlsx(forecast_df, "ATM_May2010_Forecast.xlsx", rowNames = FALSE)
The ETS model was a simple and reliable way to forecast the ATM withdrawals. Even though more advanced methods exist, ETS worked well and was easy to explain for this project.

The ETS models produced clear and stable forecasts for ATM1 and ATM2, both of which showed consistent seasonal patterns and moderate daily fluctuations. ATM3’s forecast appears to be driven by a drastic change in withdrawal behavior near the end of the data, which the model projected forward. ATM4 shows an extreme outlier near the end of the series, which caused its forecast to spike.
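Had we wanted to damp the ATM4 spike, forecast::tsclean() replaces identified outliers (and fills missing values) before modeling. A hedged sketch, not part of the submitted forecast, assuming a column named ATM4:

# Sketch: refit ATM4 after outlier cleaning (assumes column name "ATM4")
atm4_clean <- tsclean(ts(atm_data_ts[, "ATM4"], frequency = 7))
plot(forecast(ets(atm4_clean, allow.multiplicative.trend = TRUE), h = 31))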
For this analysis we will model monthly residential power consumption (KWH) from January 1998 to December 2013, and generate forecasts for the entire year of 2014 using a TBATS model.

We will first load the Excel file, convert the YYYY-MMM format to a proper Date, sort the data chronologically, fill missing values, and convert the clean data into a monthly time series starting in January 1998.
library(readxl)   # for read_excel(); not loaded earlier
library(imputeTS) # for na_interpolation(); not loaded earlier

data <- read_excel("/Users/michaelrobinson/Downloads/ResidentialCustomerForecastLoad-624.xlsx", sheet = "ResidentialCustomerForecastLoad")
data <- data |>
  # Parse "YYYY-MMM" (e.g., "1998-Jan") into a Date on the first of the month
  mutate(Date = as.Date(paste0(substr(`YYYY-MMM`, 1, 4), "-", match(substr(`YYYY-MMM`, 6, 8), month.abb), "-01"))) |>
  arrange(Date)
data$KWH <- na_interpolation(data$KWH) # linear interpolation of missing KWH values
ts_kwh <- ts(data$KWH, start = c(1998, 1), frequency = 12)
Plot the entire time series (1998–2013) to visually examine consumption trends, outliers, and seasonal fluctuations.
plot(ts_kwh, main = "Original Residential Power Consumption Data (1998–2013)", ylab = "KWH", xlab = "Year")
Decompose the time series using STL to separate trend, seasonality, and remainder. This analysis helps with choosing a suitable model.
decomposed <- stl(ts_kwh, s.window = "periodic")
plot(decomposed)
The STL decomposition shows strong seasonality with consistent annual cycles, a growing trend starting around 2010, and small residuals, except for an outlier dip in 2010.
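That 2010 dip can also be flagged programmatically: forecast::tsoutliers() identifies outlying observations and suggests replacement values. A purely diagnostic sketch; the TBATS fit below still uses the raw series:

# Sketch: flag outliers in the KWH series (diagnostic only)
tsoutliers(ts_kwh)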
We will use a TBATS model for this time series, as it handles complex seasonal structure, non-linear trends, and non-constant variance. We then forecast 12 months into 2014 and visualize the forecast with confidence intervals.
model <- tbats(ts_kwh)
summary(model)
## Length Class Mode
## lambda 0 -none- NULL
## alpha 1 -none- numeric
## beta 0 -none- NULL
## damping.parameter 0 -none- NULL
## gamma.one.values 1 -none- numeric
## gamma.two.values 1 -none- numeric
## ar.coefficients 0 -none- NULL
## ma.coefficients 1 -none- numeric
## likelihood 1 -none- numeric
## optim.return.code 1 -none- numeric
## variance 1 -none- numeric
## AIC 1 -none- numeric
## parameters 2 -none- list
## seed.states 12 -none- numeric
## fitted.values 192 ts numeric
## errors 192 ts numeric
## x 2304 -none- numeric
## seasonal.periods 1 -none- numeric
## k.vector 1 -none- numeric
## y 192 ts numeric
## p 1 -none- numeric
## q 1 -none- numeric
## call 2 -none- call
## series 1 -none- character
## method 1 -none- character
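The summary() method for tbats objects only lists component lengths, as seen above. Printing the model object is more informative, reporting the fitted specification (Box-Cox lambda, ARMA error orders, and the number of Fourier terms):

# Printing the model shows the fitted TBATS specification string
print(model)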
forecast_2014 <- forecast(model, h = 12)
plot(forecast_2014, main = "Forecast of Monthly KWH Usage for 2014")
forecast_2014
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 9652315 8625618 10679013 8082117 11222514
## Feb 2014 7914446 6844563 8984329 6278202 9550691
## Mar 2014 6863193 5788802 7937584 5220054 8506332
## Apr 2014 6161878 5083740 7240017 4513008 7810748
## May 2014 5904393 4822730 6986056 4250133 7558654
## Jun 2014 7455357 6370316 8540398 5795930 9114784
## Jul 2014 8365892 7277704 9454079 6701653 10030131
## Aug 2014 9211425 8119969 10302881 7542186 10880664
## Sep 2014 8572564 7478226 9666901 6898919 10246208
## Oct 2014 6625037 5527513 7722561 4946519 8303555
## Nov 2014 5854523 4754453 6954593 4172112 7536934
## Dec 2014 7265369 6162215 8368523 5578240 8952498
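Since the assignment asks for the forecast in an Excel-readable file, the 2014 point forecasts can be exported the same way as in Part A. A sketch (the file name is my own choice; openxlsx was loaded earlier):

# Sketch: write 2014 monthly point forecasts to an Excel file
kwh_forecast_df <- data.frame(
  Date = seq(as.Date("2014-01-01"), by = "month", length.out = 12),
  KWH_Forecast = as.numeric(forecast_2014$mean)
)
write.xlsx(kwh_forecast_df, "KWH_2014_Forecast.xlsx", rowNames = FALSE)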
We improve the visualization by:
- Overlaying the last two years of actual data.
- Adding shaded ribbons for the 80% and 95% forecast confidence intervals.
df_hist <- data |>
filter(Date >= as.Date("2012-01-01")) |>
select(Date, KWH)
df_forecast <- data.frame(
Date = seq(as.Date("2014-01-01"), by = "month", length.out = 12),
KWH = forecast_2014$mean,
Lo80 = forecast_2014$lower[, 1],
Hi80 = forecast_2014$upper[, 1],
Lo95 = forecast_2014$lower[, 2],
Hi95 = forecast_2014$upper[, 2]
)
ggplot() +
  geom_line(data = df_hist, aes(x = Date, y = KWH), color = "blue") +
  geom_ribbon(data = df_forecast, aes(x = Date, ymin = Lo95, ymax = Hi95), fill = "orange", alpha = 0.2) +
  geom_ribbon(data = df_forecast, aes(x = Date, ymin = Lo80, ymax = Hi80), fill = "orange", alpha = 0.4) +
  geom_line(data = df_forecast, aes(x = Date, y = KWH), color = "yellow", linewidth = 1) +
  labs(title = "Power Consumption Forecast (Last Two Years of Historical Data)",
       x = "Date", y = "KWH") +
  theme_minimal()
We used the TBATS model to forecast monthly residential power consumption for 2014 based on 16 years of historical data. This model was chosen because it suits complex seasonal patterns, and the strong seasonality and trend revealed by the STL decomposition confirmed it as a reasonable choice. The results give a clear picture of consumption dynamics and offer valuable insight for energy planning, capacity management, and sustainability forecasting.
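A natural validation step, not performed above, would be to hold out the final year: fit TBATS on 1998–2012, forecast 2013, and score the forecasts against the actuals. A sketch:

# Sketch: holdout validation of TBATS on the last observed year
train <- window(ts_kwh, end = c(2012, 12))
test <- window(ts_kwh, start = c(2013, 1))
accuracy(forecast(tbats(train), h = 12), test)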