Introduction

This report provides a comprehensive time series analysis and forecasting for three distinct business scenarios: ATM cash withdrawals, residential power consumption, and water flow rates. Using R, we evaluate data stationarity, identify seasonal patterns, and apply ARIMA models to generate accurate forecasts for the required future periods. The goal is to provide data-driven insights to optimize resource management.

PART A

1. Data Pre-processing

Before modeling, we must address the “business feeling” of the data. We check for:

Missing Values: Identifying gaps in daily records.

Outliers: Machines often have “zero” days due to maintenance or error.

Transformations: Using Box-Cox to stabilize variance if the withdrawal fluctuations increase with the mean.

library(readxl)
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.5.2
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tsibble' was built under R version 4.5.2
## Warning: package 'tsibbledata' was built under R version 4.5.2
## Warning: package 'feasts' was built under R version 4.5.2
## Warning: package 'fabletools' was built under R version 4.5.2
## Warning: package 'fable' was built under R version 4.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(writexl)
## Warning: package 'writexl' was built under R version 4.5.2
# Load data and convert Excel dates
atm_data <- read_excel("ATM624Data.xlsx") %>%
  mutate(Date = as.Date(DATE, origin = "1899-12-30")) %>%
  filter(!is.na(ATM)) %>%
  as_tsibble(index = Date, key = ATM)

# Fill gaps and interpolate missing values
atm_ts <- atm_data %>%
  fill_gaps() %>%
  group_by(ATM) %>%
  mutate(Cash = if_else(Cash == 0, NA_real_, Cash)) %>%
  model(arima = ARIMA(Cash)) %>%
  interpolate(atm_data)

2. Exploratory Data Analysis

We visualize the series to identify patterns. A clear weekly seasonality is observed in most ATMs, which suggests that a Seasonal ARIMA model is appropriate.

# Time Series Plot with rotated dates
atm_ts %>%
  autoplot(Cash) +
  facet_wrap(~ATM, scales = "free_y") +
  labs(title = "Daily Cash Withdrawals by ATM", y = "Cash (Hundreds)") +
  # Fix: Rotate date text 45 degrees
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Seasonal Plot
atm_ts %>%
  filter(ATM != "ATM4") %>% 
  gg_season(Cash, period = "week") +
  labs(title = "Weekly Seasonality Analysis") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

3. Model Selection and Diagnostics

We compare the Automated ARIMA model with a Seasonal Naive (SNAIVE) model. The SNAIVE model simply assumes that the cash demand for a given day will be the same as it was on the same day of the previous week.

We use the AICc (Akaike Information Criterion) to select the best ARIMA parameters and RMSE (Root Mean Squared Error) to compare performance against the benchmark.

# Fit multiple models to compare performance
fit <- atm_ts %>%
  model(
    arima_model = ARIMA(Cash),
    snaive_model = SNAIVE(Cash)
  )

# Compare model fit using accuracy measures
fit_accuracy <- fit %>%
  accuracy() %>%
  select(ATM, .model, RMSE, MAE, MAPE)

# Display the accuracy table to justify model choice
print(fit_accuracy)
## # A tibble: 8 × 5
##   ATM   .model         RMSE     MAE  MAPE
##   <chr> <chr>         <dbl>   <dbl> <dbl>
## 1 ATM1  arima_model   23.2   14.4   117. 
## 2 ATM1  snaive_model  27.6   17.5   116. 
## 3 ATM2  arima_model   23.8   16.7   Inf  
## 4 ATM2  snaive_model  29.6   20.3   Inf  
## 5 ATM3  arima_model    5.03   0.271  34.6
## 6 ATM3  snaive_model   8.04   0.735 100  
## 7 ATM4  arima_model  650.   324.    647. 
## 8 ATM4  snaive_model 896.   402.    450.

Justification of Model Choice:

ARIMA vs. SNAIVE: For ATM1, ATM2, and ATM4, the ARIMA model significantly reduces the error (RMSE) compared to the SNAIVE benchmark. This indicates that the ARIMA model successfully captured the complex autocorrelations beyond simple weekly repetition.

ATM3 Note: Due to the near-total lack of historical data, both models struggle to find a pattern, and the forecast for this specific machine should be treated with extreme caution.

4. Final Forecast and Export

We generate a 31-day forecast for May 2010. The visual includes the point forecast along with 80% and 95% prediction intervals. The numerical forecasts are then exported to an Excel file to fulfill reporting requirements.

# 1. Clean ATM 4 outlier before final forecast

atm_ts_final <- atm_data %>%
  fill_gaps() %>%
  group_by(ATM) %>%
  mutate(Cash = if_else(ATM == "ATM4" & Cash > 5000, NA_real_, Cash),
         Cash = if_else(Cash == 0, NA_real_, Cash)) %>%
  model(linear = TSLM(Cash ~ trend() + season())) %>%
  interpolate(atm_data)

# 2. Re-fit the ARIMA model with cleaned data
fit_final <- atm_ts_final %>%
  model(arima_model = ARIMA(Cash))

# 3. Generate 31-day forecast (ARIMA only)
atm_forecast <- fit_final %>%
  select(ATM, arima_model) %>%
  forecast(h = "31 days")

# 4. Plot ATMs 1, 2, and 3 (Classic Theme)
atm_forecast %>%
  filter(ATM != "ATM4") %>%
  autoplot(atm_ts_final) +
  labs(title = "ATM Cash Forecast - May 2010 (ATMs 1-3)",
       y = "Cash (Hundreds)", x = "Date") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 5. Plot ATM 4 Separately (Classic Theme)
atm_forecast %>%
  filter(ATM == "ATM4") %>%
  autoplot(atm_ts_final) +
  labs(title = "ATM 4 Cash Forecast - May 2010 (Cleaned Scale)",
       y = "Cash (Hundreds)", x = "Date") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 6. Export all numerical results to Excel
forecast_output <- atm_forecast %>%
  as_tibble() %>%
  select(ATM, Date, .mean) %>%
  rename(Forecasted_Cash_Hundreds = .mean)

write_xlsx(forecast_output, "ATM_May2010_Forecast.xlsx")

Process Summary

Techniques Used: Time series interpolation, STL decomposition, and Seasonal ARIMA modeling.

Findings: Most ATMs exhibit a strong weekly cycle, which the ARIMA models successfully captured to provide reliable estimates. The lack of historical data for ATM3 remains a notable limitation.

PART B

1. Introduction

This section analyzes monthly residential power usage from January 1998 to December 2013. The goal is to forecast the kilowatt-hours (KWH) for the year 2014. Electricity data usually presents a long-term upward trend combined with a strong annual seasonal cycle driven by weather conditions. We will use a seasonal ARIMA model with a Box-Cox transformation to ensure the forecast is robust and variance is stabilized.

2. Data Loading and Pre-processing

The data is loaded from Excel. We create a proper yearmonth index for the time series. We also check for missing values or anomalies, similar to our approach in Part A.

# Load Power Data using your specific column names
power_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx") %>%
  mutate(Month = yearmonth(`YYYY-MMM`)) %>% 
  as_tsibble(index = Month)

# Handling potential missing values
power_ts <- power_data %>%
  fill_gaps() %>%
  model(arima = ARIMA(KWH)) %>%
  interpolate(power_data)

3. Exploratory Data Analysis

We plot the series to observe the trend and the seasonal peaks. A seasonal plot is also generated to confirm the timing of the highest consumption months.

# Time Series Plot
power_ts %>%
  autoplot(KWH) +
  labs(title = "Monthly Residential Power Usage (1998-2013)",
       y = "KWH", x = "Year") +
  theme_classic()

# Seasonal Plot
power_ts %>%
  gg_season(KWH, labels = "both") +
  labs(title = "Seasonal Pattern Analysis", y = "KWH") +
  theme_classic()

4. Model selection and Accuracy

We calculate the optimal Lambda for a Box-Cox transformation to handle the increasing variance. We then fit an Automated ARIMA model and compare its performance against the Seasonal Naive (SNAIVE) benchmark.

# Determine optimal Lambda for Box-Cox transformation
lambda_power <- power_ts %>%
  features(KWH, features = guerrero) %>%
  pull(lambda_guerrero)

# Fit models
fit_power <- power_ts %>%
  model(
    arima_model = ARIMA(box_cox(KWH, lambda_power)),
    snaive_model = SNAIVE(KWH)
  )

# Display accuracy table (RMSE, MAE, MAPE)
fit_power %>% 
  accuracy() %>% 
  select(.model, RMSE, MAE, MAPE)
## # A tibble: 2 × 4
##   .model           RMSE     MAE  MAPE
##   <chr>           <dbl>   <dbl> <dbl>
## 1 arima_model  1194982. 853631.  16.4
## 2 snaive_model 1172842. 685446.  14.5

The SNAIVE model shows a slightly lower error rate, suggesting that the seasonal consumption pattern is highly consistent year over year. However, we proceed with the ARIMA model for the final forecast as it provides a more mathematically sophisticated treatment of the underlying trend and allows for better estimation of prediction intervals.

5. Final Forecast 2024

We generate the monthly demand forecast for 2014. The model captures the expected “dual-peak” behavior: high consumption in January (winter heating) and July-August (summer cooling).

# Generate the 2014 Forecast using the ARIMA model
power_forecast_final <- fit_power %>%
  select(arima_model) %>%
  forecast(h = "12 months")

# Final Visualization
power_forecast_final %>%
  autoplot(power_ts) +
  labs(title = "Residential Power Demand Forecast - 2014",
       subtitle = "Forecast based on Seasonal ARIMA (1998-2013 data)",
       y = "KWH", x = "Month") +
  theme_classic()

# Export the values to Excel
power_forecast_final %>%
  as_tibble() %>%
  select(Month, .mean) %>%
  rename(Forecasted_KWH = .mean) %>%
  write_xlsx("Residential_Power_Forecast_2014.xlsx")

Key Findings for Part B:

Seasonality: The dominant factor in residential power is the annual cycle. The peaks are remarkably stable, which explains why the SNAIVE model performed so well.

Trend: There is a slight long-term upward trend in consumption that the ARIMA model incorporates into the 2014 projections.

PART C

1. Introduction

This section analyzes water flow data from two separate sources: Waterflow_Pipe1.xlsx and Waterflow_Pipe2.xlsx. Since the recordings have different timestamps, the data is first synchronized and aggregated to an hourly frequency by calculating the mean flow per hour. The objective is to determine stationarity and provide a one week forward forecast.

2. Data Loading and Preparation

We load both datasets, convert the timestamps to a standard format, and compute the hourly average. Then, we join them into a single time-series object.

library(readxl)
library(fpp3)
library(writexl)
library(lubridate)
library(forecast) 
## Warning: package 'forecast' was built under R version 4.5.2
# Function to process each pipe file consistently
process_pipe <- function(file_path, pipe_name) {
  read_excel(file_path) %>%
    # Converting Excel numeric date to POSIXct format
    mutate(DateTime = as_datetime(`Date Time` * 86400, origin = "1899-12-30"),
           Hour = floor_date(DateTime, "hour")) %>%
    group_by(Hour) %>%
    # Taking the mean for multiple recordings within the same hour
    summarise(Flow = mean(WaterFlow, na.rm = TRUE), .groups = "drop") %>%
    mutate(Pipe = pipe_name)
}

# Loading and combining both datasets
pipe1_clean <- process_pipe("Waterflow_Pipe1.xlsx", "Pipe1")
pipe2_clean <- process_pipe("Waterflow_Pipe2.xlsx", "Pipe2")

# Create a unified tsibble and handle missing values
water_hourly <- bind_rows(pipe1_clean, pipe2_clean) %>%
  as_tsibble(index = Hour, key = Pipe) %>%
  fill_gaps() %>%
  group_by(Pipe) %>%
  # Linear interpolation for gaps to ensure a continuous series for ARIMA
  mutate(Flow = as.numeric(na.interp(Flow)))

3. Visual Analysis and Stationary

We visualize the hourly flow to identify potential patterns or shifts in the mean. We also apply the KPSS test to determine if differencing is required to achieve stationarity, which is a prerequisite for ARIMA modeling.

# Hourly Flow Visualization
water_hourly %>%
  autoplot(Flow) +
  facet_wrap(~Pipe, scales = "free_y") +
  labs(title = "Hourly Water Flow Rates (Aggregated)", 
       y = "Mean Flow", x = "Date") +
  theme_classic()
## `mutate_if()` ignored the following grouping variables:
## • Column `Pipe`

# Stationarity Test (KPSS)
water_hourly %>%
  features(Flow, unitroot_kpss)
## # A tibble: 2 × 3
##   Pipe  kpss_stat kpss_pvalue
##   <chr>     <dbl>       <dbl>
## 1 Pipe1     0.193      0.1   
## 2 Pipe2     0.475      0.0473

4. Forecasting

We fit an automated ARIMA model to the aggregated data. The forecast is projected for 7 days to provide a one week forward view of the expected water flow.

# Fit the automated ARIMA model
water_fit <- water_hourly %>%
  model(arima_model = ARIMA(Flow))

# Generate the one week forecast
water_forecast <- water_fit %>%
  forecast(h = 168)

# Visualization of the forecast with confidence intervals
water_forecast %>%
  autoplot(water_hourly) +
  facet_wrap(~Pipe, scales = "free_y") +
  labs(title = "One-Week Water Flow Forecast",
       subtitle = "Hourly aggregation with ARIMA (80% and 95% intervals)",
       y = "Flow Rate", x = "Hour") +
  theme_classic()
## `mutate_if()` ignored the following grouping variables:
## • Column `Pipe`

# Export results to an Excel-readable file
water_forecast %>%
  as_tibble() %>%
  select(Pipe, Hour, .mean) %>%
  rename(Forecasted_Flow = .mean) %>%
  write_xlsx("Water_Flow_Weekly_Forecast.xlsx")

Key Findings for Part C: Time Alignment: Aggregating to the hour effectively synchronized the irregular timestamps from both pipes into a single, cohesive time series.

Stationarity: The ARIMA model automatically identified the necessary level of differencing to stabilize the series before forecasting.