Part A – ATM Forecast

This section forecasts daily cash withdrawals (in hundreds of dollars) for four ATM machines (ATM1 to ATM4) for May 2010 using historical transaction data.

I applied two forecasting techniques: - ARIMA: A model that captures autocorrelation and trends in time series. - ETS (Error-Trend-Seasonality): A model that captures seasonality and trend components directly.

I then compared these models on performance and generated forecasts.

Data Loading and Preprocessing

# Load and clean data 
atm_data <- read_excel("ATM624Data.xlsx", sheet = "ATM Data") %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30")) %>%
  filter(!is.na(ATM))

glimpse(atm_data)
## Rows: 1,460
## Columns: 3
## $ DATE <date> 2009-05-01, 2009-05-01, 2009-05-02, 2009-05-02, 2009-05-03, 2009…
## $ ATM  <chr> "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "…
## $ Cash <dbl> 96, 107, 82, 89, 85, 90, 90, 55, 99, 79, 88, 19, 8, 2, 104, 103, …

Time Series

To prepare for modeling, I converted the dataset into a daily time series using the tsibble package.

atm_ts <- atm_data %>%
  as_tsibble(index = DATE, key = ATM) %>%
  fill_gaps(.full = TRUE) %>%
  mutate(Cash = replace_na(Cash, 0)) %>%
  arrange(ATM, DATE)

atm_ts %>% group_by(ATM) %>% summarise(min = min(DATE), max = max(DATE))
## # A tsibble: 1,460 x 4 [1D]
## # Key:       ATM [4]
##    ATM   DATE       min        max       
##    <chr> <date>     <date>     <date>    
##  1 ATM1  2009-05-01 2009-05-01 2009-05-01
##  2 ATM1  2009-05-02 2009-05-02 2009-05-02
##  3 ATM1  2009-05-03 2009-05-03 2009-05-03
##  4 ATM1  2009-05-04 2009-05-04 2009-05-04
##  5 ATM1  2009-05-05 2009-05-05 2009-05-05
##  6 ATM1  2009-05-06 2009-05-06 2009-05-06
##  7 ATM1  2009-05-07 2009-05-07 2009-05-07
##  8 ATM1  2009-05-08 2009-05-08 2009-05-08
##  9 ATM1  2009-05-09 2009-05-09 2009-05-09
## 10 ATM1  2009-05-10 2009-05-10 2009-05-10
## # ℹ 1,450 more rows
has_gaps(atm_ts)
## # A tibble: 4 × 2
##   ATM   .gaps
##   <chr> <lgl>
## 1 ATM1  FALSE
## 2 ATM2  FALSE
## 3 ATM3  FALSE
## 4 ATM4  FALSE

Visualization

The plot below shows the actual daily cash withdrawals for each ATM. Noticeable weekly seasonality is present in ATM1 and ATM2. ATM3 shows limited activity until a late spike.

atm_ts %>%
  ggplot(aes(x = DATE, y = Cash)) +
  geom_line(color = "steelblue") +
  facet_wrap(~ATM, scales = "free_y") +
  labs(title = "Daily Cash Withdrawals per ATM", y = "Cash ($100s)", x = "Date") +
  theme_minimal()

Model Fitting

I fit both ARIMA and ETS models to each ATM’s time series. These models are evaluated using metrics like RMSE and AICc to understand performance.

# ARIMA and ETS models
models <- atm_ts %>%
  model(
    arima = ARIMA(Cash),
    ets = ETS(Cash)
  )

accuracy(models)
## # A tibble: 8 × 11
##   ATM   .model .type         ME   RMSE     MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr> <chr>  <chr>      <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 ATM1  arima  Train… -1.03e- 1  25.9   16.3   -Inf   Inf   0.847 0.831  6.50e-2
## 2 ATM1  ets    Train… -1.89e- 1  25.9   16.4   -Inf   Inf   0.854 0.831  5.63e-2
## 3 ATM2  arima  Train… -8.54e- 1  23.9   16.8   -Inf   Inf   0.822 0.802  7.18e-4
## 4 ATM2  ets    Train… -7.47e- 1  24.7   17.4   -Inf   Inf   0.849 0.832  1.19e-2
## 5 ATM3  arima  Train…  2.71e- 1   5.03   0.271   34.6  34.6 0.370 0.625  1.24e-2
## 6 ATM3  ets    Train…  2.70e- 1   5.03   0.273  Inf   Inf   0.371 0.625 -7.36e-3
## 7 ATM4  arima  Train… -3.16e-13 650.   324.    -617.  647.  0.805 0.725 -9.36e-3
## 8 ATM4  ets    Train…  7.70e+ 1 645.   312.    -510.  552.  0.777 0.720 -1.06e-2

Forecast for May 2010

The following plot presents forecasts from both ARIMA and ETS models for each ATM. Forecast intervals (80% and 95%) are shaded. This allows visual comparison of uncertainty and projected demand patterns across models.

# Forecast horizon
forecast_days <- 31

forecasts <- models %>% forecast(h = forecast_days)

autoplot(forecasts, atm_ts) +
  labs(title = "Forecast for May 2010", y = "Cash ($100s)", x = "Date") +
  facet_wrap(~ATM, scales = "free_y")

Forecast Table (Exportable)

forecast_tbl <- forecasts %>%
  filter_index("2010-05-01" ~ "2010-05-31") %>%
  as_tibble()

print(colnames(forecast_tbl))
## [1] "ATM"    ".model" "DATE"   "Cash"   ".mean"
writexl::write_xlsx(forecast_tbl, "ATM_Forecast_May2010_raw.xlsx")

Conclusion

I evaluated two forecasting methods—ARIMA and ETS—across four ATMs. Key findings include:

  • ATM1 & ATM2 exhibit regular daily patterns, ideal for both models.
  • ATM3 had minimal activity with a late spike, making predictions less stable.
  • ATM4 contains a large outlier; forecasts revert to the typical range.

Part B – Forecasting Power

This section focuses on forecasting monthly residential power usage based on the dataset provided in ResidentialCustomerForecastLoad-624.xlsx. The goal is to model electricity usage (in kilowatt-hours) from January 1998 to December 2013 and forecast consumption for 2014.

Data Loading and Preprocessing

# Load residential power data
power_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx") %>%
  rename(CaseSequence = 1, `YYYY_MMM` = 2, KWH = 3) %>%
  mutate(Date = yearmonth(parse_date_time(`YYYY_MMM`, orders = "Y!b"))) %>%
  select(CaseSequence, Date, KWH)
  
glimpse(power_data)
## Rows: 192
## Columns: 3
## $ CaseSequence <dbl> 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 74…
## $ Date         <mth> 1998 Jan, 1998 Feb, 1998 Mar, 1998 Apr, 1998 May, 1998 Ju…
## $ KWH          <dbl> 6862583, 5838198, 5420658, 5010364, 4665377, 6467147, 891…

Time Series Preparation

Convert the dataset into a complete monthly time series using a manually created monthly sequence and left join.

power_data <- power_data %>% 
  filter(!is.na(Date)) %>% 
  arrange(Date)

full_dates <- seq(as.Date("1998-01-01"), as.Date("2013-12-01"), by = "month")

power_ts <- tibble(Date = yearmonth(full_dates)) %>%
  left_join(power_data, by = "Date") %>%
  as_tsibble(index = Date) %>%
  fill_gaps() %>%
  filter(!is.na(KWH))

Visualization

The plot below illustrates the monthly power consumption over time. We can observe a strong seasonal pattern, with periodic peaks likely corresponding to heating or cooling months.

power_ts %>%
  ggplot(aes(x = Date, y = KWH)) +
  geom_line(color = "darkgreen") +
  labs(title = "Monthly Residential Power Usage", x = "Date", y = "KWH") +
  theme_minimal()

Model Fitting and Forecasting

power_ts_base <- ts(na.omit(power_data$KWH), start = c(1998, 1), frequency = 12)

# Fit ETS and ARIMA models
ets_model <- ets(power_ts_base)
arima_model <- auto.arima(power_ts_base)

# Forecast 12 months ahead
ets_forecast <- forecast(ets_model, h = 12)
arima_forecast <- forecast(arima_model, h = 12)

p1 <- autoplot(ets_forecast) + ggtitle("ETS Forecast for 2014") + theme_minimal()
p2 <- autoplot(arima_forecast) + ggtitle("ARIMA Forecast for 2014") + theme_minimal()

p1 + p2

# Accuracy comparison
accuracy(ets_model)
##                    ME    RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 29765.89 1006104 660162.2 -4.650549 13.79142 0.8871978 0.1563559
accuracy(arima_model)
##                     ME     RMSE    MAE       MPE    MAPE      MASE         ACF1
## Training set -10349.99 927823.9 576140 -5.029934 12.4629 0.7742797 -0.002808329

Power Forecast Results Summary

Both ARIMA and ETS models successfully generated monthly forecasts for residential electricity usage in 2014.

  • ETS captured strong seasonal fluctuations and upward trends but showed sensitivity to historical spikes.
  • ARIMA performed comparably, with slightly tighter confidence bands and lower residual autocorrelation.
  • Accuracy metrics showed similar RMSE and MAPE values, suggesting both models are viable. However, ETS had slightly better interpretability for seasonality.

Both models capture strong seasonal patterns. ETS shows sharper seasonal cycles, while ARIMA follows long-term level shifts more smoothly. The drop around 2009–2010 is interpreted and extended into the forecast period with caution. Metrics suggest that while both models are appropriate, ARIMA offers slightly better performance for this particular dataset

Forecast Table Export

# Export to Excel
ets_tbl <- as_tibble(ets_forecast) %>% mutate(Model = "ETS")
arima_tbl <- as_tibble(arima_forecast) %>% mutate(Model = "ARIMA")

power_forecast_tbl <- bind_rows(ets_tbl, arima_tbl)

write_xlsx(power_forecast_tbl, "Power_Forecast_2014.xlsx")

Conclusion

In Part B, I explored monthly residential electricity usage trends and generated forecasts for 2014 using both ARIMA and ETS models. These methods allowed us to capture seasonality, trends, and variability in the power consumption data.

  • Both models yielded strong forecasts, with ARIMA offering slightly tighter prediction intervals.
  • ETS was more sensitive to seasonal swings, making it valuable for identifying recurring usage cycles.
  • Forecasted values for 2014 fell within reasonable seasonal expectations.

Part C – Water Flow Forecasting

This bonus section focuses on two datasets of water flow readings from different pipes. The objective is to align and aggregate these time-based recordings to hourly intervals, test for stationarity, and forecast one week ahead if applicable.

Data Loading and Preprocessing

# Load data from both pipes
pipe1 <- read_excel("Waterflow_Pipe1.xlsx")
pipe2 <- read_excel("Waterflow_Pipe2.xlsx")

pipe1 <- pipe1 %>% rename(timestamp = 1, flow = 2)
pipe2 <- pipe2 %>% rename(timestamp = 1, flow = 2)

pipe1 <- pipe1 %>% mutate(timestamp = as.POSIXct(timestamp * 86400, origin = "1899-12-30", tz = "UTC"))
pipe2 <- pipe2 %>% mutate(timestamp = as.POSIXct(timestamp * 86400, origin = "1899-12-30", tz = "UTC"))

Hourly Aggregation

tagged1 <- pipe1 %>% mutate(hour = floor_date(timestamp, "hour")) %>%
  group_by(hour) %>% summarise(flow = mean(flow, na.rm = TRUE))

tagged2 <- pipe2 %>% mutate(hour = floor_date(timestamp, "hour")) %>%
  group_by(hour) %>% summarise(flow = mean(flow, na.rm = TRUE))

Visualization

ggplot() +
  geom_line(data = tagged1, aes(x = hour, y = flow), color = "steelblue", alpha = 0.7) +
  geom_line(data = tagged2, aes(x = hour, y = flow), color = "darkred", alpha = 0.5) +
  labs(title = "Hourly Water Flow - Pipe 1 and Pipe 2", x = "Time", y = "Flow") +
  theme_minimal()

Stationarity Check and Forecasting

pipe_ts <- tagged1 %>%
  as_tsibble(index = hour) %>%
  fill_gaps()

# Check for stationarity
library(feasts)
pipe_ts %>% features(flow, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.247         0.1
# forecast a week (168 hours)
library(fable)
model_pipe <- pipe_ts %>%
  model(
    arima = ARIMA(flow),
    ets = ETS(flow)
  )
## Warning: 1 error encountered for ets
## [1] ETS does not support missing values.
forecast_pipe <- model_pipe %>% forecast(h = "168 hours")

autoplot(forecast_pipe, pipe_ts) +
  labs(title = "Forecast for Pipe 1 – Next Week", y = "Flow", x = "Hour")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 168 rows containing missing values or values outside the scale range
## (`geom_line()`).

### Forecast Export

forecast_pipe %>%
  as_tibble() %>%
  write_xlsx("Pipe1_Weekly_Forecast.xlsx")

Conclusion

The KPSS test returned a p-value of 0.1, suggesting that the flow data from Pipe 1 is stationary and suitable for time series forecasting.