Part A – ATM Cash Forecast

Introduction

Automated Teller Machines (ATMs) play a critical role in modern banking by providing customers with continuous access to cash. Efficient management of ATM cash levels is essential to ensure reliable service while minimizing operational costs. The dataset used in this analysis contains daily cash withdrawal amounts from four ATM machines over a specified time period, with withdrawals measured in hundreds of dollars.

Understanding the patterns in cash demand—such as trends, variability, and potential seasonality—is important for developing accurate forecasts. Time series methods, including transformation, decomposition, and ARIMA modeling, are used to capture these patterns and generate reliable predictions for future cash requirements.

The primary objective of this analysis is to forecast daily cash withdrawals for each ATM for the month of May 2010. These forecasts will support better planning and decision-making for ATM cash management.

Business Problem

Financial institutions must ensure that ATMs are adequately stocked with cash at all times. Underestimating demand can lead to empty machines, resulting in poor customer experience and potential revenue loss. Overestimating demand, on the other hand, leads to excess cash being held in machines, increasing security risks and operational costs.

The key business problem is to determine how much cash should be allocated to each ATM on a daily basis. This requires accurate forecasting of withdrawal demand based on historical data.

By developing robust time series models, this analysis aims to:

Predict future cash withdrawal patterns Optimize cash replenishment schedules Reduce operational and transportation costs Minimize the risk associated with holding excess cash Improve overall service reliability

Ultimately, the goal is to provide a data-driven approach to ATM cash management, enabling more efficient and cost-effective operations.

Introduction

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(readr)
library(dplyr)

## Warning: package 'dplyr' was built under R version 4.4.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tsibble)

## Warning: package 'tsibble' was built under R version 4.4.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## 
## Attaching package: 'tsibble'

## The following object is masked from 'package:lubridate':
## 
##     interval

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.3

## Registered S3 methods overwritten by 'ggtime':
##   method                        from      
##   +.gg_tsensemble               feasts    
##   autolayer.fbl_ts              fabletools
##   autolayer.tbl_ts              fabletools
##   autoplot.dcmp_ts              fabletools
##   autoplot.fbl_ts               fabletools
##   autoplot.tbl_cf               feasts    
##   autoplot.tbl_ts               fabletools
##   chooseOpsMethod.gg_tsensemble feasts    
##   fortify.fbl_ts                fabletools
##   grid.draw.gg_tsensemble       feasts    
##   print.gg_tsensemble           feasts    
##   scale_type.cf_lag             feasts

## Warning: replacing previous import 'feasts::scale_x_cf_lag' by
## 'ggtime::scale_x_cf_lag' when loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_season' by 'ggtime::gg_season'
## when loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_tsresiduals' by
## 'ggtime::gg_tsresiduals' when loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_irf' by 'ggtime::gg_irf' when
## loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_arma' by 'ggtime::gg_arma' when
## loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_tsdisplay' by
## 'ggtime::gg_tsdisplay' when loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_subseries' by
## 'ggtime::gg_subseries' when loading 'fpp3'

## Warning: replacing previous import 'feasts::gg_lag' by 'ggtime::gg_lag' when
## loading 'fpp3'

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.3 ──

## ✔ tibble      3.2.1     ✔ ggtime      0.2.0
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ ggplot2     3.5.2     ✔ fable       0.5.0
## ✔ tsibbledata 0.4.1

## Warning: package 'ggplot2' was built under R version 4.4.3

## Warning: package 'tsibbledata' was built under R version 4.4.3

## Warning: package 'ggtime' was built under R version 4.4.3

## Warning: package 'feasts' was built under R version 4.4.3

## Warning: package 'fabletools' was built under R version 4.4.3

## Warning: package 'fable' was built under R version 4.4.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()        masks base::date()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ feasts::gg_arma()        masks ggtime::gg_arma()
## ✖ feasts::gg_irf()         masks ggtime::gg_irf()
## ✖ feasts::gg_lag()         masks ggtime::gg_lag()
## ✖ feasts::gg_season()      masks ggtime::gg_season()
## ✖ feasts::gg_subseries()   masks ggtime::gg_subseries()
## ✖ feasts::gg_tsdisplay()   masks ggtime::gg_tsdisplay()
## ✖ feasts::gg_tsresiduals() masks ggtime::gg_tsresiduals()
## ✖ tsibble::intersect()     masks base::intersect()
## ✖ tsibble::interval()      masks lubridate::interval()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ feasts::scale_x_cf_lag() masks ggtime::scale_x_cf_lag()
## ✖ tsibble::setdiff()       masks base::setdiff()
## ✖ tsibble::union()         masks base::union()

library(tidyr)
library(ggplot2)
library(zoo)

## Warning: package 'zoo' was built under R version 4.4.3

## 
## Attaching package: 'zoo'

## The following object is masked from 'package:tsibble':
## 
##     index

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(janitor)

## Warning: package 'janitor' was built under R version 4.4.3

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Data Loading

ATMDATA <- read_csv("https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/ATM624Data.csv")

ATMDATA <- ATMDATA %>%
  mutate(DATE = as.Date(DATE, format = "%m/%d/%Y")) %>%
  filter(!is.na(DATE))

head(ATMDATA)

summary(ATMDATA)

##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1474        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-11-01   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.6  
##  3rd Qu.:2010-02-01                      3rd Qu.:  114.0  
##  Max.   :2010-05-14                      Max.   :10920.0  
##                                          NA's   :19

str(ATMDATA)

## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
##  $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
##  $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
##  $ Cash: num [1:1474] 96 107 82 89 85 90 90 55 99 79 ...

head(ATMDATA)

summary(ATMDATA)

##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1474        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-11-01   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.6  
##  3rd Qu.:2010-02-01                      3rd Qu.:  114.0  
##  Max.   :2010-05-14                      Max.   :10920.0  
##                                          NA's   :19

Missing Data Inspection

library(naniar)
gg_miss_var(ATMDATA)

Data Preparation

atm <- ATMDATA %>%
  mutate(Cash = as.integer(Cash))

str(atm)

## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
##  $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
##  $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
##  $ Cash: int [1:1474] 96 107 82 89 85 90 90 55 99 79 ...

Because the four ATM machines may behave differently, each series will be analyzed separately.

atm1 <- atm %>%
  filter(ATM == "ATM1") %>%
  as_tsibble(index = DATE)

atm2 <- atm %>%
  filter(ATM == "ATM2") %>%
  as_tsibble(index = DATE)

atm3 <- atm %>%
  filter(ATM == "ATM3") %>%
  as_tsibble(index = DATE)

atm4 <- atm %>%
  filter(ATM == "ATM4") %>%
  as_tsibble(index = DATE)

Initial Time Plots

autoplot(atm1) +
  labs(title = "ATM1", subtitle = "Cash withdrawals per day", y = "Hundreds USD")

autoplot(atm2) +
  labs(title = "ATM2", subtitle = "Cash withdrawals per day", y = "Hundreds USD")

autoplot(atm3) +
  labs(title = "ATM3", subtitle = "Cash withdrawals per day", y = "Hundreds USD")

autoplot(atm4) +
  labs(title = "ATM4", subtitle = "Cash withdrawals per day", y = "Hundreds USD")

From the plots, ATM1 and ATM2 appear to have relatively stable variation and possible seasonality. ATM3 contains very few observations. ATM4 contains one clear outlier. # Outlier Treatment

which(atm4$Cash == 10919)

## integer(0)

The value 10919 is an obvious outlier, so it is replaced with the mean of the ATM4 series.

atm4 <- atm4 %>%
  mutate(Cash = ifelse(Cash == 10919,
                       round(mean(Cash, na.rm = TRUE), 0),
                       Cash))

autoplot(atm4) +
  labs(title = "ATM4 After Outlier Correction",
       subtitle = "Cash withdrawals per day",
       y = "Hundreds USD")

Missing Value Treatment

sum(is.na(atm1$Cash))

## [1] 3

sum(is.na(atm2$Cash))

## [1] 2

sum(is.na(atm3$Cash))

## [1] 0

sum(is.na(atm4$Cash))

## [1] 0

which(is.na(atm1$Cash))

## [1] 44 47 53

which(is.na(atm2$Cash))

## [1] 49 55

We inspect the distributions of ATM1 and ATM2 before imputing the missing values.

hist(atm1$Cash, main = "ATM1 Cash Distribution", xlab = "Cash")

hist(atm2$Cash, main = "ATM2 Cash Distribution", xlab = "Cash")

Since the distributions do not appear heavily skewed, the missing values are imputed using the median.

atm1$Cash[44] <- round(median(atm1$Cash, na.rm = TRUE), 0)
atm1$Cash[47] <- round(median(atm1$Cash, na.rm = TRUE), 0)
atm1$Cash[53] <- round(median(atm1$Cash, na.rm = TRUE), 0)

atm2$Cash[49] <- round(median(atm2$Cash, na.rm = TRUE), 0)
atm2$Cash[55] <- round(median(atm2$Cash, na.rm = TRUE), 0)

Box-Cox Transformation

lambda1 <- atm1 %>%
  filter(!is.na(Cash)) %>%
  features(Cash + 1, guerrero) %>%
  pull(lambda_guerrero)

lambda2 <- atm2 %>%
  filter(!is.na(Cash)) %>%
  features(Cash + 1, guerrero) %>%
  pull(lambda_guerrero)

lambda3 <- atm3 %>%
  filter(!is.na(Cash)) %>%
  features(Cash + 1, guerrero) %>%
  pull(lambda_guerrero)

lambda4 <- atm4 %>%
  filter(!is.na(Cash)) %>%
  features(Cash + 1, guerrero) %>%
  pull(lambda_guerrero)

plot_trans1 <- atm1 %>%
  autoplot(box_cox(Cash + 1, lambda1)) +
  labs(title = "ATM1 Transformed", subtitle = "Cash withdrawals per day", y = "USD")

plot_trans2 <- atm2 %>%
  autoplot(box_cox(Cash + 1, lambda2)) +
  labs(title = "ATM2 Transformed", subtitle = "Cash withdrawals per day", y = "USD")

plot_trans3 <- atm3 %>%
  autoplot(box_cox(Cash + 1, lambda3)) +
  labs(title = "ATM3 Transformed", subtitle = "Cash withdrawals per day", y = "USD")

plot_trans4 <- atm4 %>%
  autoplot(box_cox(Cash + 1, lambda4)) +
  labs(title = "ATM4 Transformed", subtitle = "Cash withdrawals per day", y = "USD")

Decomposition

atm1 %>%
  model(classical_decomposition(box_cox(Cash, lambda1), type = "additive")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Additive Decomposition of ATM1")

atm2 %>%
  model(classical_decomposition(box_cox(Cash, lambda2), type = "additive")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Additive Decomposition of ATM2")

atm3 %>%
  model(classical_decomposition(box_cox(Cash, lambda3), type = "additive")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Additive Decomposition of ATM3")

atm4 %>%
  model(classical_decomposition(box_cox(Cash, lambda4), type = "additive")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Additive Decomposition of ATM4")

ACF Analysis

plot_acf1 <- atm1 %>%
  ACF(box_cox(Cash, lambda1)) %>%
  autoplot() +
  labs(title = "Autocorrelation of Cash - ATM1")

plot_acf2 <- atm2 %>%
  ACF(box_cox(Cash, lambda2)) %>%
  autoplot() +
  labs(title = "Autocorrelation of Cash - ATM2")

plot_acf3 <- atm3 %>%
  ACF(box_cox(Cash, lambda3)) %>%
  autoplot() +
  labs(title = "Autocorrelation of Cash - ATM3")

plot_acf4 <- atm4 %>%
  ACF(box_cox(Cash, lambda4)) %>%
  autoplot() +
  labs(title = "Autocorrelation of Cash - ATM4")
library(gridExtra)
grid.arrange(plot_acf1, plot_acf2, plot_acf3, plot_acf4, nrow = 2)

Differencing Checks

atm1 %>% features(box_cox(Cash, lambda1), unitroot_nsdiffs)

atm2 %>% features(box_cox(Cash, lambda2), unitroot_nsdiffs)

atm3 %>% features(box_cox(Cash, lambda3), unitroot_nsdiffs)

atm4 %>% features(box_cox(Cash, lambda4), unitroot_nsdiffs)

The results indicate that ATM1 and ATM2 require one seasonal difference, while ATM3 and ATM4 do not.

We next check whether any additional regular differencing is required.

atm1 %>% features(difference(box_cox(Cash, lambda1), 7), unitroot_ndiffs)

atm2 %>% features(difference(box_cox(Cash, lambda2), 7), unitroot_ndiffs)

atm3 %>% features(box_cox(Cash, lambda3), unitroot_ndiffs)

atm4 %>% features(box_cox(Cash, lambda4), unitroot_ndiffs)

No additional non-seasonal differencing appears necessary.

ACF After Seasonal Differencing

atm1 %>%
  ACF(difference(box_cox(Cash, lambda1), 7)) %>%
  autoplot() +
  labs(title = "ACF of Differenced ATM1")

atm2 %>%
  ACF(difference(box_cox(Cash, lambda2), 7)) %>%
  autoplot() +
  labs(title = "ACF of Differenced ATM2")

After differencing, the series look much closer to white noise. # Model Building

Because ATM1 and ATM2 need seasonal differencing, ARIMA models are appropriate. ATM3 has too little data, so a naive model is more reasonable. # ATM1 Model

atm1_fit <- atm1 %>%
  model(ARIMA(box_cox(Cash, lambda1)))

report(atm1_fit)

## Series: Cash 
## Model: ARIMA(0,0,2)(0,1,1)[7] 
## Transformation: box_cox(Cash, lambda1) 
## 
## Coefficients:
##          ma1      ma2     sma1
##       0.0980  -0.1143  -0.6463
## s.e.  0.0524   0.0527   0.0425
## 
## sigma^2 estimated as 1.483:  log likelihood=-578.9
## AIC=1165.81   AICc=1165.92   BIC=1181.33

atm1_fit %>% gg_tsresiduals()

ATM2 Model

atm2_fit <- atm2 %>%
  model(ARIMA(box_cox(Cash, lambda2)))

report(atm2_fit)

## Series: Cash 
## Model: ARIMA(2,0,2)(0,1,1)[7] 
## Transformation: box_cox(Cash, lambda2) 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1
##       -0.4246  -0.9072  0.4679  0.8003  -0.7257
## s.e.   0.0587   0.0439  0.0890  0.0584   0.0409
## 
## sigma^2 estimated as 67.61:  log likelihood=-1261.88
## AIC=2535.75   AICc=2535.99   BIC=2559.03

atm2_fit %>% gg_tsresiduals()

ATM3 Model

atm3_fit <- atm3 %>%
  model(NAIVE(box_cox(Cash, lambda3)))

report(atm3_fit)

## Series: Cash 
## Model: NAIVE 
## Transformation: box_cox(Cash, lambda3) 
## 
## sigma^2: 5.0877

atm3_fit %>% gg_tsresiduals()

ATM4 Model

atm4_fit <- atm4 %>%
  model(ARIMA(box_cox(Cash, lambda4)))

report(atm4_fit)

## Series: Cash 
## Model: ARIMA(0,0,0)(2,0,0)[7] w/ mean 
## Transformation: box_cox(Cash, lambda4) 
## 
## Coefficients:
##         sar1    sar2  constant
##       0.2501  0.1951    2.4713
## s.e.  0.0521  0.0525    0.0460
## 
## sigma^2 estimated as 0.8133:  log likelihood=-479.32
## AIC=966.65   AICc=966.76   BIC=982.25

atm4_fit %>% gg_tsresiduals()

Forecasting

We now forecast 31 days ahead for May 2010.

forecast_atm1 <- atm1_fit %>% forecast(h = 31)
forecast_atm2 <- atm2_fit %>% forecast(h = 31)
forecast_atm3 <- atm3_fit %>% forecast(h = 31)
forecast_atm4 <- atm4_fit %>% forecast(h = 31)

Forecast Plots

autoplot(forecast_atm1, atm1) + labs(title = "Forecast for ATM1")

autoplot(forecast_atm2, atm2) + labs(title = "Forecast for ATM2")

autoplot(forecast_atm3, atm3) + labs(title = "Forecast for ATM3")

autoplot(forecast_atm4, atm4) + labs(title = "Forecast for ATM4")

Export Forecasts

write.csv(as.data.frame(forecast_atm1), "forecast_atm1.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm2), "forecast_atm2.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm3), "forecast_atm3.csv", row.names = FALSE)
write.csv(as.data.frame(forecast_atm4), "forecast_atm4.csv", row.names = FALSE)

How the Model and Forecast Solve the Business Problem

The business problem is to determine how much cash each ATM should hold in order to avoid shortages while minimizing excess cash. The time series models developed in this analysis directly address this problem by converting historical withdrawal patterns into quantitative forecasts of future demand.

First, the models capture key characteristics of the data, such as trend, variability, and seasonality. For example, ATM1 and ATM2 showed clear weekly patterns, which were incorporated through seasonal differencing in the ARIMA models. By accounting for these patterns, the models are able to produce more accurate and reliable forecasts.

Second, the forecasting step provides daily predictions of cash withdrawals for each ATM. These forecasts allow decision-makers to estimate how much cash will be needed on each day of May 2010. Instead of relying on intuition or simple averages, the bank can use these statistically grounded predictions to plan cash allocation.

As a result, the models help solve the business problem in the following ways:

Prevent stockouts: Forecasts ensure that enough cash is loaded into each ATM to meet expected demand, reducing the risk of machines running empty. Reduce excess cash: By avoiding overestimation, the bank minimizes unnecessary cash holding, lowering security risks and opportunity costs. Optimize replenishment schedules: Knowing expected demand allows for efficient planning of cash delivery routes and frequencies. Improve operational efficiency: Resources such as staff and transportation can be allocated more effectively based on predicted needs.

Conclusion

In summary, the models transform raw historical data into actionable insights, enabling the bank to make informed, data-driven decisions about ATM cash management. This directly improves service reliability while reducing operational costs.

Part B – Residential Power Forecast

Introduction

Accurate forecasting of residential electricity demand is critical for efficient energy planning and resource allocation. The dataset used in this analysis contains monthly residential power consumption from January 1998 through December 2013. The goal is to build a time series model and generate forecasts for 2014.

Business Problem

Energy providers must anticipate future electricity demand to ensure reliable service while minimizing operational costs. Underestimating demand can lead to shortages, while overestimating demand results in inefficient resource allocation. This analysis aims to develop a data-driven forecasting model to support better planning and decision-making.

Data Loading

power <- read_csv("https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/ResidentialCustomerForecastLoad-624.csv") %>%
  rename(Date = `YYYY-MMM`) %>%
  mutate(Date = yearmonth(Date)) %>%
  as_tsibble(index = Date)

power

Missing Value Handling

power %>% filter(is.na(KWH))

There is one missing value (September 2008). This value is replaced using the median of the series.

power <- power %>%
  mutate(KWH = ifelse(is.na(KWH),
                      median(KWH, na.rm = TRUE),
                      KWH))

Exploratory Analysis

power %>%
  autoplot(KWH) +
  labs(title = "Monthly Residential Power Consumption",
       y = "KWH")

The data shows strong seasonal patterns and variability over time.

power %>%
  gg_season(KWH) +
  labs(title = "Seasonal Plot")

power %>%
  gg_subseries(KWH) +
  labs(title = "Subseries Plot")

These plots confirm strong yearly seasonality.

Box-Cox Transformation

lambda <- power %>%
  features(KWH, guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.1041666

power %>%
  autoplot(box_cox(KWH, lambda)) +
  labs(title = "Box-Cox Transformed Series")

The transformation stabilizes variance and improves model suitability.

ACF Analysis

power %>%
  ACF(box_cox(KWH, lambda)) %>%
  autoplot() +
  labs(title = "ACF Plot")

The slow decay of the ACF suggests non-stationarity.

Differencing Check

power %>%
  features(KWH, unitroot_nsdiffs)

power %>%
  features(KWH, unitroot_ndiffs)

Seasonal differencing is required, which will be handled internally by the ARIMA model.

Model Building

ARIMA Model (Correct Approach)

fit_arima <- power %>%
  model(ARIMA(box_cox(KWH, lambda)))
report(fit_arima)

## Series: KWH 
## Model: ARIMA(0,1,2)(0,0,2)[12] 
## Transformation: box_cox(KWH, lambda) 
## 
## Coefficients:
##           ma1      ma2    sma1    sma2
##       -0.7266  -0.2492  0.2269  0.2104
## s.e.   0.0717   0.0706  0.0787  0.0656
## 
## sigma^2 estimated as 1.294:  log likelihood=-295.53
## AIC=601.05   AICc=601.38   BIC=617.31

fit_arima %>%
  gg_tsresiduals()

The residual diagnostics indicate that the residuals behave like white noise, suggesting a good model fit.

ETS Model

fit_ets <- power %>%
  model(ETS(KWH))

report(fit_ets)

## Series: KWH 
## Model: ETS(M,N,M) 
##   Smoothing parameters:
##     alpha = 0.1167131 
##     gamma = 0.0001038822 
## 
##   Initial states:
##     l[0]      s[0]     s[-1]     s[-2]    s[-3]    s[-4]    s[-5]    s[-6]
##  6133758 0.9567738 0.7524052 0.8702234 1.184894 1.261653 1.185654 1.000083
##      s[-7]     s[-8]     s[-9]   s[-10]   s[-11]
##  0.7742886 0.8153702 0.9116166 1.065073 1.221965
## 
##   sigma^2:  0.0144
## 
##      AIC     AICc      BIC 
## 6224.787 6227.514 6273.649

fit_ets %>%
  gg_tsresiduals()

Model Comparison

accuracy(fit_arima)

accuracy(fit_ets)

The ARIMA model shows lower RMSE and better residual behavior, making it the preferred model.

Forecasting (2014)

fc <- fit_arima %>%
  forecast(h = "12 months")

fc %>%
  autoplot(power) +
  labs(title = "Forecast for 2014")

Back Transformation

fc_final <- fc %>%
  as_tibble() %>%   
  mutate(KWH = inv_box_cox(.mean, lambda))

fc_final

Export Results

write.csv(fc_final, "results_rcfl.csv", row.names = FALSE)

How the Model Solves the Business Problem

The ARIMA model captures both trend and seasonal patterns in electricity consumption and generates reliable monthly forecasts. These forecasts allow energy providers to anticipate demand, optimize production, and improve operational efficiency.

Conclusion

This analysis developed a forecasting model for residential electricity demand using historical monthly data. After handling missing values and applying a Box-Cox transformation, the ARIMA model was selected based on its superior performance and residual diagnostics. The forecasts for 2014 provide a reliable basis for planning and decision-making, helping energy providers manage resources efficiently and reduce operational risk.

Part C – Business Case: Water Flow Forecasting Using ARIMA

1. Business Objective

The objective of this analysis is to model and forecast total water flow using time series techniques. Accurate forecasting supports operational planning, system efficiency, and resource allocation for water infrastructure management.

2. Data Preparation

Two datasets representing water flow from Pipe 1 and Pipe 2 were combined to create a unified time series. Missing values were handled by replacing them with zero, ensuring continuity in the dataset.

The data was aggregated to hourly intervals to align both datasets and create a consistent time index. The resulting time series reflects total water flow across both pipelines.

library(readr)
library(dplyr)
library(tsibble)
library(lubridate)
library(forecast)
library(ggplot2)

# Load Pipe 1
wfp1 <- read_csv(
  "https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/Waterflow_Pipe1.csv",
  show_col_types = FALSE
) %>%
  rename(WaterFlow = WaterFlow) %>%
  mutate(DateTime = mdy_hm(DateTime))

# Load Pipe 2
wfp2 <- read_csv(
  "https://raw.githubusercontent.com/tanzil64/DATA624/refs/heads/main/Waterflow_Pipe2.csv",
  show_col_types = FALSE
) %>%
  rename(WaterFlow = WaterFlow) %>%
  mutate(DateTime = mdy_hm(DateTime))

# Match Pipe 1 hour format to project example
wfp1 <- wfp1 %>%
  mutate(Date = as.Date(DateTime),
         Time = hour(DateTime) + 1) %>%
  group_by(Date, Time) %>%
  summarise(Water = mean(WaterFlow, na.rm = TRUE), .groups = "drop") %>%
  mutate(DateTime = ymd_h(paste(Date, Time))) %>%
  select(DateTime, Water)

# Match Pipe 2 hour format to project example
wfp2 <- wfp2 %>%
  mutate(Date = as.Date(DateTime),
         Time = hour(DateTime)) %>%
  group_by(Date, Time) %>%
  summarise(Water = mean(WaterFlow, na.rm = TRUE), .groups = "drop") %>%
  mutate(DateTime = ymd_h(paste(Date, Time))) %>%
  select(DateTime, Water)

# Combine the two pipes
water <- full_join(wfp1, wfp2, by = "DateTime", suffix = c("_1", "_2")) %>%
  mutate(
    Water_1 = ifelse(is.na(Water_1), 0, Water_1),
    Water_2 = ifelse(is.na(Water_2), 0, Water_2),
    Water = Water_1 + Water_2
  ) %>%
  select(DateTime, Water)

4. Data Transformation & Stationarity

A Box-Cox transformation was applied to stabilize variance. Differencing analysis showed:

Non-seasonal differencing required (d = 1) No seasonal differencing required (D = 0)

After transformation and differencing, the series exhibited stationarity, making it suitable for ARIMA modeling.

# Create time series
water_ts <- ts(water$Water, frequency = 24)

# Plot series
autoplot(water_ts) +
  labs(title = "Combined Water Flow Time Series",
       x = "Time",
       y = "Water Flow")

# Seasonal plots
ggseasonplot(water_ts) +
  theme(legend.title = element_blank()) +
  labs(title = "Seasonal Plot of Water Flow")

ggsubseriesplot(water_ts) +
  labs(title = "Subseries Plot of Water Flow")

Box-Cox transformation

# Box-Cox transformation
lambda <- BoxCox.lambda(water_ts)
water_bc <- BoxCox(water_ts, lambda)

# Display transformed series
ggtsdisplay(water_bc, main = "Box-Cox Transformed Water Flow")

# Differencing checks
ndiffs(water_ts)

## [1] 1

nsdiffs(water_ts)

## [1] 0

# Differenced transformed series
ggtsdisplay(diff(water_bc), points = FALSE,
            main = "Differenced Box-Cox Transformed Water Flow")

ARIMA model

# Manual ARIMA model
water_arima <- Arima(
  water_ts,
  order = c(0, 1, 1),
  seasonal = c(0, 0, 1),
  lambda = lambda
)

summary(water_arima)

## Series: water_ts 
## ARIMA(0,1,1)(0,0,1)[24] 
## Box Cox transformation: lambda= 0.8713037 
## 
## Coefficients:
##           ma1    sma1
##       -0.9578  0.0712
## s.e.   0.0101  0.0319
## 
## sigma^2 = 103.3:  log likelihood = -3734.4
## AIC=7474.8   AICc=7474.83   BIC=7489.52
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.1591549 16.33716 13.34589 -28.20783 50.26558 0.7507364
##                     ACF1
## Training set -0.04444759

checkresiduals(water_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,0,1)[24]
## Q* = 57.881, df = 46, p-value = 0.1124
## 
## Model df: 2.   Total lags used: 48

# Auto ARIMA model
water_auto <- auto.arima(
  water_ts,
  approximation = FALSE,
  lambda = lambda
)

summary(water_auto)

## Series: water_ts 
## ARIMA(0,1,1)(1,0,0)[24] 
## Box Cox transformation: lambda= 0.8713037 
## 
## Coefficients:
##           ma1    sar1
##       -0.9578  0.0714
## s.e.   0.0101  0.0322
## 
## sigma^2 = 103.3:  log likelihood = -3734.4
## AIC=7474.8   AICc=7474.82   BIC=7489.52
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE     MASE
## Training set 0.1613506 16.33729 13.34564 -28.19083 50.25213 0.750722
##                     ACF1
## Training set -0.04431537

checkresiduals(water_auto)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(1,0,0)[24]
## Q* = 57.858, df = 46, p-value = 0.1128
## 
## Model df: 2.   Total lags used: 48

# Final model
water_model <- water_auto

# Forecast 7 days = 168 hours
water_f <- forecast(water_model, h = 7 * 24, level = 95)

# Plot forecast
autoplot(water_f) +
  labs(title = "Water Usage Forecast",
       subtitle = "ARIMA Forecast for Combined Pipe Flow",
       x = "Time",
       y = "Water Flow")

# Export forecast
write.csv(as.data.frame(water_f), "Waterflow_Forecast.csv", row.names = FALSE)

# Show point forecasts
forecast_df <- as.data.frame(water_f) %>%
  rename(Point_Forecast = `Point Forecast`) %>%
  select(Point_Forecast)



forecast_df

Key observations:

Forecast values gradually stabilize around the mean Limited variability in predictions Confidence intervals reflect uncertainty in long-term forecasting

8. Business Insight

While the ARIMA model provides statistically valid forecasts, it shows limited responsiveness to short-term fluctuations. The forecasts converge toward the mean, suggesting:

The system has low predictable structure Water flow may be influenced by external factors not captured in the model Short-term variability is difficult to model with ARIMA alone

The ARIMA model provides a statistically sound approach for modeling water flow data. However, its forecasting capability is limited in capturing variability, indicating that additional modeling approaches may be required for enhanced predictive performance.

Conclusion

The ARIMA model delivers stable baseline forecasts for water flow but lacks the ability to capture short-term variability, limiting its effectiveness for operational decision-making.

Tanzil_DATA 624_PROJECT_01

Md. Tanzil Ehsan

2026-03-29

Part A – ATM Cash Forecast

Introduction

Business Problem

Introduction

Data Loading

Missing Data Inspection

Data Preparation

Initial Time Plots

Missing Value Treatment

Box-Cox Transformation

Decomposition

ACF Analysis

Differencing Checks

ACF After Seasonal Differencing

ATM2 Model

ATM3 Model

ATM4 Model

Forecasting

Forecast Plots

Export Forecasts

How the Model and Forecast Solve the Business Problem

Conclusion

Part B – Residential Power Forecast

Introduction

Business Problem

Data Loading

Missing Value Handling

Exploratory Analysis

Box-Cox Transformation

ACF Analysis

Differencing Check

Model Building

ARIMA Model (Correct Approach)

ETS Model

Model Comparison

Forecasting (2014)

Back Transformation

Export Results

How the Model Solves the Business Problem

Conclusion

Part C – Business Case: Water Flow Forecasting Using ARIMA

1. Business Objective

2. Data Preparation

4. Data Transformation & Stationarity

Box-Cox transformation

ARIMA model

Key observations:

8. Business Insight

Conclusion