This section forecasts daily cash withdrawals (in hundreds of dollars) for four ATM machines (ATM1 to ATM4) for May 2010 using historical transaction data.
I applied two forecasting techniques: - ARIMA: A model that captures autocorrelation and trends in time series. - ETS (Error-Trend-Seasonality): A model that captures seasonality and trend components directly.
I then compared these models on performance and generated forecasts.
# Load and clean data
atm_data <- read_excel("ATM624Data.xlsx", sheet = "ATM Data") %>%
mutate(DATE = as.Date(DATE, origin = "1899-12-30")) %>%
filter(!is.na(ATM))
glimpse(atm_data)
## Rows: 1,460
## Columns: 3
## $ DATE <date> 2009-05-01, 2009-05-01, 2009-05-02, 2009-05-02, 2009-05-03, 2009…
## $ ATM <chr> "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "…
## $ Cash <dbl> 96, 107, 82, 89, 85, 90, 90, 55, 99, 79, 88, 19, 8, 2, 104, 103, …
To prepare for modeling, I converted the dataset into a daily time
series using the tsibble
package.
atm_ts <- atm_data %>%
as_tsibble(index = DATE, key = ATM) %>%
fill_gaps(.full = TRUE) %>%
mutate(Cash = replace_na(Cash, 0)) %>%
arrange(ATM, DATE)
atm_ts %>% group_by(ATM) %>% summarise(min = min(DATE), max = max(DATE))
## # A tsibble: 1,460 x 4 [1D]
## # Key: ATM [4]
## ATM DATE min max
## <chr> <date> <date> <date>
## 1 ATM1 2009-05-01 2009-05-01 2009-05-01
## 2 ATM1 2009-05-02 2009-05-02 2009-05-02
## 3 ATM1 2009-05-03 2009-05-03 2009-05-03
## 4 ATM1 2009-05-04 2009-05-04 2009-05-04
## 5 ATM1 2009-05-05 2009-05-05 2009-05-05
## 6 ATM1 2009-05-06 2009-05-06 2009-05-06
## 7 ATM1 2009-05-07 2009-05-07 2009-05-07
## 8 ATM1 2009-05-08 2009-05-08 2009-05-08
## 9 ATM1 2009-05-09 2009-05-09 2009-05-09
## 10 ATM1 2009-05-10 2009-05-10 2009-05-10
## # ℹ 1,450 more rows
has_gaps(atm_ts)
## # A tibble: 4 × 2
## ATM .gaps
## <chr> <lgl>
## 1 ATM1 FALSE
## 2 ATM2 FALSE
## 3 ATM3 FALSE
## 4 ATM4 FALSE
The plot below shows the actual daily cash withdrawals for each ATM. Noticeable weekly seasonality is present in ATM1 and ATM2. ATM3 shows limited activity until a late spike.
atm_ts %>%
ggplot(aes(x = DATE, y = Cash)) +
geom_line(color = "steelblue") +
facet_wrap(~ATM, scales = "free_y") +
labs(title = "Daily Cash Withdrawals per ATM", y = "Cash ($100s)", x = "Date") +
theme_minimal()
I fit both ARIMA and ETS models to each ATM’s time series. These models are evaluated using metrics like RMSE and AICc to understand performance.
# ARIMA and ETS models
models <- atm_ts %>%
model(
arima = ARIMA(Cash),
ets = ETS(Cash)
)
accuracy(models)
## # A tibble: 8 × 11
## ATM .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ATM1 arima Train… -1.03e- 1 25.9 16.3 -Inf Inf 0.847 0.831 6.50e-2
## 2 ATM1 ets Train… -1.89e- 1 25.9 16.4 -Inf Inf 0.854 0.831 5.63e-2
## 3 ATM2 arima Train… -8.54e- 1 23.9 16.8 -Inf Inf 0.822 0.802 7.18e-4
## 4 ATM2 ets Train… -7.47e- 1 24.7 17.4 -Inf Inf 0.849 0.832 1.19e-2
## 5 ATM3 arima Train… 2.71e- 1 5.03 0.271 34.6 34.6 0.370 0.625 1.24e-2
## 6 ATM3 ets Train… 2.70e- 1 5.03 0.273 Inf Inf 0.371 0.625 -7.36e-3
## 7 ATM4 arima Train… -3.16e-13 650. 324. -617. 647. 0.805 0.725 -9.36e-3
## 8 ATM4 ets Train… 7.70e+ 1 645. 312. -510. 552. 0.777 0.720 -1.06e-2
The following plot presents forecasts from both ARIMA and ETS models for each ATM. Forecast intervals (80% and 95%) are shaded. This allows visual comparison of uncertainty and projected demand patterns across models.
# Forecast horizon
forecast_days <- 31
forecasts <- models %>% forecast(h = forecast_days)
autoplot(forecasts, atm_ts) +
labs(title = "Forecast for May 2010", y = "Cash ($100s)", x = "Date") +
facet_wrap(~ATM, scales = "free_y")
forecast_tbl <- forecasts %>%
filter_index("2010-05-01" ~ "2010-05-31") %>%
as_tibble()
print(colnames(forecast_tbl))
## [1] "ATM" ".model" "DATE" "Cash" ".mean"
writexl::write_xlsx(forecast_tbl, "ATM_Forecast_May2010_raw.xlsx")
I evaluated two forecasting methods—ARIMA and ETS—across four ATMs. Key findings include:
This section focuses on forecasting monthly residential power usage
based on the dataset provided in
ResidentialCustomerForecastLoad-624.xlsx
. The goal is to
model electricity usage (in kilowatt-hours) from January 1998 to
December 2013 and forecast consumption for
2014.
# Load residential power data
power_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx") %>%
rename(CaseSequence = 1, `YYYY_MMM` = 2, KWH = 3) %>%
mutate(Date = yearmonth(parse_date_time(`YYYY_MMM`, orders = "Y!b"))) %>%
select(CaseSequence, Date, KWH)
glimpse(power_data)
## Rows: 192
## Columns: 3
## $ CaseSequence <dbl> 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 74…
## $ Date <mth> 1998 Jan, 1998 Feb, 1998 Mar, 1998 Apr, 1998 May, 1998 Ju…
## $ KWH <dbl> 6862583, 5838198, 5420658, 5010364, 4665377, 6467147, 891…
Convert the dataset into a complete monthly time series using a manually created monthly sequence and left join.
power_data <- power_data %>%
filter(!is.na(Date)) %>%
arrange(Date)
full_dates <- seq(as.Date("1998-01-01"), as.Date("2013-12-01"), by = "month")
power_ts <- tibble(Date = yearmonth(full_dates)) %>%
left_join(power_data, by = "Date") %>%
as_tsibble(index = Date) %>%
fill_gaps() %>%
filter(!is.na(KWH))
The plot below illustrates the monthly power consumption over time. We can observe a strong seasonal pattern, with periodic peaks likely corresponding to heating or cooling months.
power_ts %>%
ggplot(aes(x = Date, y = KWH)) +
geom_line(color = "darkgreen") +
labs(title = "Monthly Residential Power Usage", x = "Date", y = "KWH") +
theme_minimal()
power_ts_base <- ts(na.omit(power_data$KWH), start = c(1998, 1), frequency = 12)
# Fit ETS and ARIMA models
ets_model <- ets(power_ts_base)
arima_model <- auto.arima(power_ts_base)
# Forecast 12 months ahead
ets_forecast <- forecast(ets_model, h = 12)
arima_forecast <- forecast(arima_model, h = 12)
p1 <- autoplot(ets_forecast) + ggtitle("ETS Forecast for 2014") + theme_minimal()
p2 <- autoplot(arima_forecast) + ggtitle("ARIMA Forecast for 2014") + theme_minimal()
p1 + p2
# Accuracy comparison
accuracy(ets_model)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 29765.89 1006104 660162.2 -4.650549 13.79142 0.8871978 0.1563559
accuracy(arima_model)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -10349.99 927823.9 576140 -5.029934 12.4629 0.7742797 -0.002808329
Both ARIMA and ETS models successfully generated monthly forecasts for residential electricity usage in 2014.
Both models capture strong seasonal patterns. ETS shows sharper seasonal cycles, while ARIMA follows long-term level shifts more smoothly. The drop around 2009–2010 is interpreted and extended into the forecast period with caution. Metrics suggest that while both models are appropriate, ARIMA offers slightly better performance for this particular dataset
# Export to Excel
ets_tbl <- as_tibble(ets_forecast) %>% mutate(Model = "ETS")
arima_tbl <- as_tibble(arima_forecast) %>% mutate(Model = "ARIMA")
power_forecast_tbl <- bind_rows(ets_tbl, arima_tbl)
write_xlsx(power_forecast_tbl, "Power_Forecast_2014.xlsx")
In Part B, I explored monthly residential electricity usage trends and generated forecasts for 2014 using both ARIMA and ETS models. These methods allowed us to capture seasonality, trends, and variability in the power consumption data.
This bonus section focuses on two datasets of water flow readings from different pipes. The objective is to align and aggregate these time-based recordings to hourly intervals, test for stationarity, and forecast one week ahead if applicable.
# Load data from both pipes
pipe1 <- read_excel("Waterflow_Pipe1.xlsx")
pipe2 <- read_excel("Waterflow_Pipe2.xlsx")
pipe1 <- pipe1 %>% rename(timestamp = 1, flow = 2)
pipe2 <- pipe2 %>% rename(timestamp = 1, flow = 2)
pipe1 <- pipe1 %>% mutate(timestamp = as.POSIXct(timestamp * 86400, origin = "1899-12-30", tz = "UTC"))
pipe2 <- pipe2 %>% mutate(timestamp = as.POSIXct(timestamp * 86400, origin = "1899-12-30", tz = "UTC"))
tagged1 <- pipe1 %>% mutate(hour = floor_date(timestamp, "hour")) %>%
group_by(hour) %>% summarise(flow = mean(flow, na.rm = TRUE))
tagged2 <- pipe2 %>% mutate(hour = floor_date(timestamp, "hour")) %>%
group_by(hour) %>% summarise(flow = mean(flow, na.rm = TRUE))
ggplot() +
geom_line(data = tagged1, aes(x = hour, y = flow), color = "steelblue", alpha = 0.7) +
geom_line(data = tagged2, aes(x = hour, y = flow), color = "darkred", alpha = 0.5) +
labs(title = "Hourly Water Flow - Pipe 1 and Pipe 2", x = "Time", y = "Flow") +
theme_minimal()
pipe_ts <- tagged1 %>%
as_tsibble(index = hour) %>%
fill_gaps()
# Check for stationarity
library(feasts)
pipe_ts %>% features(flow, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.247 0.1
# forecast a week (168 hours)
library(fable)
model_pipe <- pipe_ts %>%
model(
arima = ARIMA(flow),
ets = ETS(flow)
)
## Warning: 1 error encountered for ets
## [1] ETS does not support missing values.
forecast_pipe <- model_pipe %>% forecast(h = "168 hours")
autoplot(forecast_pipe, pipe_ts) +
labs(title = "Forecast for Pipe 1 – Next Week", y = "Flow", x = "Hour")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 168 rows containing missing values or values outside the scale range
## (`geom_line()`).
### Forecast Export
forecast_pipe %>%
as_tibble() %>%
write_xlsx("Pipe1_Weekly_Forecast.xlsx")
The KPSS test returned a p-value of 0.1, suggesting that the flow data from Pipe 1 is stationary and suitable for time series forecasting.