PART A: ATM Forecast Report
Objective
The objective of this analysis is to forecast daily cash withdrawals for four ATMs for May 2010 using historical data. The variable Cash is measured in hundreds of dollars. The goal is to apply time series forecasting methods and choose among them based on the structure of the data.
Data Preparation
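The dataset consists of daily cash withdrawals for four ATMs over approximately one year, starting on 1 May 2009. Each ATM represents its own time series. The tsibble printed below reports five keys rather than four because some rows have no ATM label; those rows also have no Cash values, which is why one model per method fails to estimate in the output that follows.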
data <- read_excel("ATM624Data.xlsx") %>%
mutate(DATE = as.Date(DATE, origin = "1899-12-30")) %>%
as_tsibble(index = DATE, key = ATM)
data
## # A tsibble: 1,474 x 3 [1D]
## # Key: ATM [5]
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-05-01 ATM1 96
## 2 2009-05-02 ATM1 82
## 3 2009-05-03 ATM1 85
## 4 2009-05-04 ATM1 90
## 5 2009-05-05 ATM1 99
## 6 2009-05-06 ATM1 88
## 7 2009-05-07 ATM1 8
## 8 2009-05-08 ATM1 104
## 9 2009-05-09 ATM1 87
## 10 2009-05-10 ATM1 93
## # ℹ 1,464 more rows
The DATE variable was converted into a proper date format, and the data was converted into a tsibble object so that forecasting models could be applied separately to each ATM.
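The `origin = "1899-12-30"` argument matters when the DATE column arrives as Excel serial day counts rather than as dates. A minimal base-R sketch, assuming that is how DATE was stored (the serial numbers below are illustrative values, not read from the file):

```r
# Excel's 1900 date system stores dates as day counts; in R the matching
# origin is 1899-12-30. The serials below are illustrative assumptions.
serials <- c(39934, 39935, 39936)
dates <- as.Date(serials, origin = "1899-12-30")
print(dates)
```

Under this assumption, serial 39934 maps to 2009-05-01, the first date in the printout above.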
Analysis
autoplot(data, Cash) +
facet_wrap(~ATM, scales = "free_y") +
labs(
title = "Daily Cash Withdrawals by ATM",
x = "Date",
y = "Cash (hundreds of dollars)"
)
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).
Initial visualizations suggest that the series are fairly stable over time. There is no strong long-term trend or obvious seasonality, although there is some day-to-day variation.
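A visual impression of "no obvious seasonality" in daily data can be checked numerically, for example by looking at the autocorrelation at lag 7 and at mean levels by position in the week. The sketch below uses a synthetic series with a built-in weekly pattern (hypothetical values, not the ATM data) purely to show what such a check looks like:

```r
set.seed(42)

# Synthetic daily series with a weekly pattern (hypothetical, not the ATM data).
n_days <- 7 * 52
weekly_effect <- rep(c(10, 0, 0, 0, 0, 5, 8), times = 52)
cash <- 80 + weekly_effect + rnorm(n_days, sd = 2)

# Sample autocorrelation; element 1 of $acf is lag 0, so lag 7 is element 8.
ac <- acf(cash, lag.max = 14, plot = FALSE)
lag7 <- ac$acf[8]

# Mean level by position in the week, a second view of the same pattern.
day_of_week <- rep(1:7, times = 52)
weekday_means <- tapply(cash, day_of_week, mean)

print(round(lag7, 2))
print(round(weekday_means, 1))
```

A lag-7 autocorrelation near zero and similar weekday means would support the reading given above; a large lag-7 value would point toward a weekly seasonal model.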
Forecasting Methodology
ETS Models
Exponential smoothing models were fit to each ATM time series.
fit_ets <- data %>%
model(ETS(Cash))
## Warning in min(y, na.rm = TRUE): no non-missing arguments to min; returning Inf
## Warning: 1 error encountered for ETS(Cash)
## [1] 0 (non-NA) cases
report(fit_ets)
## Warning in report.mdl_df(fit_ets): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 4 × 10
## ATM .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ATM1 ETS(Cash) 0.190 -2389. 4783. 4783. 4795. 1342. 1343. NA
## 2 ATM2 ETS(Cash) 1464. -2405. 4820. 4820. 4839. 1448. 1450. NA
## 3 ATM3 ETS(Cash) 25.4 -1666. 3338. 3338. 3350. 25.3 44.3 0.273
## 4 ATM4 ETS(Cash) 1.66 -3335. 6691. 6691. 6730. 416178. 416926. 0.777
ETS models were considered appropriate because they can capture level, trend, and seasonality when present.
ARIMA Models
ARIMA models were also fit for comparison.
fit_arima <- data %>%
model(ARIMA(Cash))
## Warning: 1 error encountered for ARIMA(Cash)
## [1] All observations are missing, a model cannot be estimated without data.
report(fit_arima)
## Warning in report.mdl_df(fit_arima): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 4 × 9
## ATM .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ATM1 ARIMA(Cash) 602. -1641. 3290. 3290. 3306. <cpl [15]> <cpl [0]>
## 2 ATM2 ARIMA(Cash) 649. -1659. 3325. 3325. 3336. <cpl [14]> <cpl [0]>
## 3 ATM3 ARIMA(Cash) 25.4 -1109. 2223. 2223. 2235. <cpl [0]> <cpl [2]>
## 4 ATM4 ARIMA(Cash) 423718. -2882. 5768. 5768. 5776. <cpl [0]> <cpl [0]>
ARIMA models allow for autocorrelation and differencing if needed. In this case, ARIMA was explored to confirm whether any additional structure beyond a stable level was present.
Techniques Considered but Not Used
Several techniques were considered but not selected as the final approach:
Trend models were not emphasized because the data did not show a consistent upward or downward movement.
Seasonal models were not selected because no clear repeating seasonal pattern was visible in the data.
More complex ARIMA models were not necessary because the data appeared relatively simple.
Regression or causal forecasting methods were not used because no external predictors were provided.
Model Comparison
accuracy(fit_ets)
## # A tibble: 5 × 11
## ATM .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ATM1 ETS(Cash) Training -0.0687 36.8 27.5 -180. 204. 1.56 1.32
## 2 ATM2 ETS(Cash) Training 0.709 38.2 32.7 -Inf Inf 1.60 1.28
## 3 ATM3 ETS(Cash) Training 0.270 5.03 0.273 Inf Inf 0.371 0.625
## 4 ATM4 ETS(Cash) Training 77.0 645. 312. -510. 552. 0.777 0.720
## 5 <NA> ETS(Cash) Training NaN NaN NaN NaN NaN NaN NaN
## # ℹ 1 more variable: ACF1 <dbl>
accuracy(fit_arima)
## # A tibble: 5 × 11
## ATM .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ATM1 ARIMA(Cash) Trai… -8.94e- 2 24.3 15.4 -98.7 115. 0.870 0.873
## 2 ATM2 ARIMA(Cash) Trai… -1.18e- 1 25.2 17.3 -Inf Inf 0.843 0.846
## 3 ATM3 ARIMA(Cash) Trai… 2.71e- 1 5.03 0.271 34.6 34.6 0.370 0.625
## 4 ATM4 ARIMA(Cash) Trai… -1.51e-10 650. 324. -617. 647. 0.805 0.725
## 5 <NA> ARIMA(Cash) Trai… NaN NaN NaN NaN NaN NaN NaN
## # ℹ 1 more variable: ACF1 <dbl>
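Both ETS and ARIMA models were examined. On the training data, ARIMA had lower RMSE for ATM1 (24.3 vs. 36.8) and ATM2 (25.2 vs. 38.2), the two methods were tied for ATM3 (5.03), and ETS was marginally better for ATM4 (645 vs. 650). ETS was nevertheless selected as the final model because the series appear relatively stable and simple and the ETS forecasts are easier to interpret, at the cost of a weaker in-sample fit for ATM1 and ATM2.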
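The `Inf` and `-Inf` entries in the MPE and MAPE columns are consistent with series that contain zero observations, since each percentage error is divided by the observed value. A small sketch with made-up numbers (not the ATM data) shows the effect:

```r
# Hypothetical observations and fitted values; the second observation is zero.
actual <- c(96, 0, 85, 90)
fitted <- c(90, 1, 88, 86)
err <- actual - fitted

# Percentage errors divide by the observation, so a zero gives -Inf or Inf.
pct_err <- 100 * err / actual
mape <- mean(abs(pct_err))

# Scale-dependent measures such as MAE and RMSE stay finite.
mae <- mean(abs(err))
rmse <- sqrt(mean(err^2))

print(c(MAPE = mape, MAE = mae, RMSE = rmse))
```

For series like these, RMSE, MAE, and MASE are the more informative columns to compare.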
Final Model Selection
The final model selected was an ETS model for each ATM. This model was chosen because it captures the stable level of the series without imposing unnecessary complexity.
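For these data, the resulting forecasts are relatively flat, which is consistent with a simple exponential smoothing form such as ETS(A,N,N): with no trend or seasonal component, every forecast horizon takes the value of the last estimated level.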
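The link between a level-only model and flat forecasts can be sketched by hand. The function below is a bare-bones simple exponential smoothing recursion, not the estimation routine used by `ETS()`; the name `ses_forecast`, the smoothing weight `alpha = 0.2`, and the choice of the first observation as the initial level are all assumptions made for illustration:

```r
# Simple exponential smoothing: update one level, then project it forward.
ses_forecast <- function(y, alpha, h) {
  level <- y[1]                      # assumed initial level: first observation
  for (obs in y[-1]) {
    level <- alpha * obs + (1 - alpha) * level
  }
  rep(level, h)                      # every horizon gets the same value
}

# First six ATM1 values from the tsibble printout; alpha = 0.2 is an assumption.
y <- c(96, 82, 85, 90, 99, 88)
fc <- ses_forecast(y, alpha = 0.2, h = 31)
print(head(fc))
```

All 31 values are identical because the recursion carries only a level forward, which is the same shape seen in the May 2010 forecasts.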
Residual Diagnostics
To evaluate the adequacy of the selected forecasting models, residual diagnostics were performed. Because each ATM represents a separate time series, residual diagnostics were conducted individually for each ATM model.
atm1_data <- data %>%
filter(ATM == "ATM1")
fit_ets_atm1 <- atm1_data %>%
model(ETS(Cash))
fit_ets_atm1 %>%
gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_rug()`).
atm2_data <- data %>%
filter(ATM == "ATM2")
fit_ets_atm2 <- atm2_data %>%
model(ETS(Cash))
fit_ets_atm2 %>%
gg_tsresiduals()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_rug()`).
atm3_data <- data %>%
filter(ATM == "ATM3")
fit_ets_atm3 <- atm3_data %>%
model(ETS(Cash))
fit_ets_atm3 %>%
gg_tsresiduals()
atm4_data <- data %>%
filter(ATM == "ATM4")
# Fit the ATM4 model on the ATM4 subset
fit_ets_atm4 <- atm4_data %>%
model(ETS(Cash))
fit_ets_atm4 %>%
gg_tsresiduals()
Residual diagnostics indicate that the models for ATM1, ATM2, and ATM4 are adequate, with residuals behaving like white noise, while ATM3 shows a large outlier, suggesting an anomaly rather than a systematic pattern.
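The visual white-noise judgment can be backed by a portmanteau test. The sketch below applies the Ljung-Box test from base R to two hypothetical residual series (neither is taken from the fitted ATM models): one is pure noise and one still contains a weekly pattern.

```r
set.seed(123)

# Hypothetical residual series (not the fitted ATM residuals).
white_noise <- rnorm(364)
weekly_left_over <- rep(c(3, -3, 0, 0, 0, 0, 0), times = 52) + rnorm(364)

# Ljung-Box test over two weeks of lags; small p-values reject white noise.
lb_noise <- Box.test(white_noise, lag = 14, type = "Ljung-Box")
lb_weekly <- Box.test(weekly_left_over, lag = 14, type = "Ljung-Box")

print(c(noise = lb_noise$p.value, weekly = lb_weekly$p.value))
```

When the test is run on residuals from a fitted model, the `fitdf` argument of `Box.test()` can be set to the number of estimated parameters so that the degrees of freedom are adjusted.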
Forecasts for May 2010
fc_ets <- fit_ets %>%
forecast(h = "31 days")
autoplot(data, Cash) +
autolayer(fc_ets) +
facet_wrap(~ATM, scales = "free_y") +
labs(
title = "Forecasts for May 2010",
x = "Date",
y = "Cash (hundreds of dollars)"
)
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 31 rows containing missing values or values outside the scale range
## (`geom_line()`).
The forecasts for May 2010 are relatively constant across days for each ATM. This occurs because the selected ETS model assumes that the series has a stable level with no strong trend or seasonality.
Forecast Interpretation
The flat forecasts suggest that ATM withdrawal behavior is stable over time and that the best prediction for future values is the current estimated level of the series. This is a reasonable result when no strong systematic patterns are present. It means the expected level of cash demand is predictable, even though individual days still vary around that level.
Forecast Output File
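The forecast values for May 2010 were exported to an Excel-readable file. The export has 155 rows (31 days for each of the five keys), so it includes 31 rows for the unlabeled ATM key, for which no model could be estimated.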
forecast_output <- fc_ets %>%
as_tibble() %>%
select(DATE, ATM, .mean) %>%
rename(Forecast_Cash = .mean)
write_xlsx(forecast_output, "ATM_Forecast_May2010.xlsx")
forecast_output
## # A tibble: 155 × 3
## DATE ATM Forecast_Cash
## <date> <chr> <dbl>
## 1 2010-05-01 ATM1 78.3
## 2 2010-05-02 ATM1 78.3
## 3 2010-05-03 ATM1 78.3
## 4 2010-05-04 ATM1 78.3
## 5 2010-05-05 ATM1 78.3
## 6 2010-05-06 ATM1 78.3
## 7 2010-05-07 ATM1 78.3
## 8 2010-05-08 ATM1 78.3
## 9 2010-05-09 ATM1 78.3
## 10 2010-05-10 ATM1 78.3
## # ℹ 145 more rows
Limitations
This analysis has several limitations:
Only one year of data was available.
No external variables, such as holidays or local events, were included.
Additional historical data and explanatory variables could improve the forecasts.
Conclusion
This analysis used ETS and ARIMA methods to forecast ATM cash withdrawals for May 2010. Based on the observed stability of the series and the residual diagnostics, a simple ETS model was selected as the most appropriate forecasting approach. The resulting forecasts are consistent with the historical behavior of the data and provide a practical basis for ATM cash planning.
PART B: Forecasting Power Report
Objective
The objective of this analysis is to model monthly residential power consumption and produce forecasts for each month of 2014. The variable KWH represents power consumption in kilowatt hours.
Data Preparation
power_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx") %>%
mutate(
Month = yearmonth(`YYYY-MMM`)
) %>%
select(Month, KWH) %>%
as_tsibble(index = Month)
power_data
## # A tsibble: 192 x 2 [1M]
## Month KWH
## <mth> <dbl>
## 1 1998 Jan 6862583
## 2 1998 Feb 5838198
## 3 1998 Mar 5420658
## 4 1998 Apr 5010364
## 5 1998 May 4665377
## 6 1998 Jun 6467147
## 7 1998 Jul 8914755
## 8 1998 Aug 8607428
## 9 1998 Sep 6989888
## 10 1998 Oct 6345620
## # ℹ 182 more rows
The source data runs from January 1998 through December 2013. During data review, one missing observation was identified for September 2008. Because this is monthly data with a strong seasonal pattern, that missing value was imputed using the average of the same month from the adjacent years.
power_data <- power_data %>%
mutate(
KWH = if_else(
Month == yearmonth("2008 Sep"),
mean(c(
KWH[Month == yearmonth("2007 Sep")],
KWH[Month == yearmonth("2009 Sep")]
), na.rm = TRUE),
KWH
)
)
power_data %>% filter(Month >= yearmonth("2008 Aug"), Month <= yearmonth("2008 Oct"))
## # A tsibble: 3 x 2 [1M]
## Month KWH
## <mth> <dbl>
## 1 2008 Aug 8037137
## 2 2008 Sep 7625058
## 3 2008 Oct 5101803
Exploratory Analysis
autoplot(power_data, KWH) +
labs(
title = "Monthly Residential Power Consumption",
x = "Month",
y = "KWH"
)
The time plot shows strong seasonality and moderate long-term level changes. Consumption is generally highest in winter and summer months and lower during other seasons, which is consistent with residential heating and cooling demand. Because the data are monthly and seasonal, models that can handle recurring seasonal structure are appropriate.
Forecasting Process
To choose a forecasting method, I compared three approaches: a seasonal naive benchmark, ETS models, and ARIMA models.
To evaluate performance, the final 12 months (2013) were held out as a test set, and the remaining data were used for training.
train <- power_data %>% filter_index(. ~ "2012 Dec")
test <- power_data %>% filter_index("2013 Jan" ~ .)
fit_models <- train %>%
model(
SNAIVE = SNAIVE(KWH),
ETS = ETS(KWH),
ARIMA = ARIMA(KWH)
)
# Training accuracy
accuracy(fit_models)
## # A tibble: 3 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE Training 72034. 1182035. 690220. -4.53 15.0 1 1.00 0.255
## 2 ETS Training 33044. 817772. 489325. -5.07 12.4 0.709 0.692 0.162
## 3 ARIMA Training -7773. 780465. 476603. -5.00 11.4 0.691 0.660 0.00522
# Test forecasts
fc_models <- fit_models %>%
forecast(new_data = test)
# Test accuracy
accuracy(fc_models, test)
## # A tibble: 3 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Test 691396. 1583865. 1028245. 7.68 12.5 NaN NaN 0.0170
## 2 ETS Test 406318. 1068359. 691602. 4.23 8.26 NaN NaN 0.0963
## 3 SNAIVE Test 405195. 1035538. 618606. 4.55 7.06 NaN NaN -0.0313
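The MASE and RMSSE columns are `NaN` in the test table. Both measures scale the test errors by the in-sample errors of a seasonal naive forecast, so they need the training observations as well as the test rows; the `NaN` values are consistent with `accuracy()` having received only `test`. The calculation can be sketched in base R with the built-in `AirPassengers` series standing in for the power data (it is not the dataset analyzed here):

```r
# AirPassengers ships with R; it stands in for the power series here.
y <- as.numeric(AirPassengers)
n <- length(y)
train <- y[1:(n - 12)]
test <- y[(n - 11):n]

# Seasonal naive forecast: each test month repeats the same month one year earlier.
snaive_fc <- tail(train, 12)

# MASE scales the test MAE by the in-sample MAE of seasonal naive (lag 12),
# so it cannot be computed from the test rows alone.
scale <- mean(abs(diff(train, lag = 12)))
mase <- mean(abs(test - snaive_fc)) / scale

print(round(mase, 3))
```

With fabletools, the usual way to obtain these columns is to pass the full series (here that would be `power_data`) as the second argument to `accuracy()`; that variant was not run for this report.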
Techniques Used
Seasonal Naive
Seasonal naive was used as a benchmark because it is often a reasonable starting point for strongly seasonal monthly series.
ETS
ETS models were used because they are well suited to time series with level, trend, and seasonal structure.
ARIMA
ARIMA models were also fit because they can flexibly model autocorrelation and seasonal differencing when needed.
Techniques Not Used
More complicated methods such as regression with external predictors were not used because no explanatory variables were provided. Highly complex models were also avoided because the series is relatively structured and interpretable.
Model Selection
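The ARIMA model was selected as the final model. It had the best training-set fit (RMSE 780,465 and MAPE 11.4, against 817,772 and 12.4 for ETS) and the least residual autocorrelation (ACF1 = 0.005). On the 2013 test set, however, ARIMA had the highest RMSE (1,583,865, against 1,068,359 for ETS and 1,035,538 for seasonal naive), so the choice rests on in-sample fit and residual behavior rather than on hold-out accuracy.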
report(fit_models)
## Warning in report.mdl_df(fit_models): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 11
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE ar_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
## 1 SNAIVE 1.40e+12 NA NA NA NA NA NA NA <NULL>
## 2 ETS 1.37e- 2 -2892. 5814. 5817. 5862. 6.69e11 7.14e11 0.0751 <NULL>
## 3 ARIMA 6.73e+11 -2532. 5076. 5076. 5094. NA NA NA <cpl>
## # ℹ 1 more variable: ma_roots <list>
Residual Diagnostics
Residual diagnostics were checked for the final ARIMA model.
fit_models %>%
select(ARIMA) %>%
gg_tsresiduals()
The residual diagnostics suggest that the model is adequate. The residuals are centered around zero, and the autocorrelation in the residuals appears limited.
Final Forecast for 2014
The final model was re-fit using the full dataset and then used to generate forecasts for the 12 months of 2014.
final_fit <- power_data %>%
model(ARIMA(KWH))
final_fc <- final_fit %>%
forecast(h = "12 months")
autoplot(final_fc, power_data) +
labs(
title = "Forecast of Monthly Residential Power Consumption for 2014",
x = "Month",
y = "KWH"
)
forecast_output <- final_fc %>%
hilo(level = 95) %>%
unpack_hilo(`95%`) %>%
as_tibble() %>%
select(Month, .mean, `95%_lower`, `95%_upper`) %>%
rename(
Forecast_KWH = .mean,
Lower_95 = `95%_lower`,
Upper_95 = `95%_upper`
)
forecast_output
## # A tibble: 12 × 4
## Month Forecast_KWH Lower_95 Upper_95
## <mth> <dbl> <dbl> <dbl>
## 1 2014 Jan 9691775. 8005231. 11378320.
## 2 2014 Feb 8175614. 6442987. 9908240.
## 3 2014 Mar 6739530. 5006904. 8472157.
## 4 2014 Apr 5959783. 4227156. 7692409.
## 5 2014 May 5728339. 3995713. 7460965.
## 6 2014 Jun 7527559. 5794933. 9260185.
## 7 2014 Jul 7919359. 6186733. 9651985.
## 8 2014 Aug 9286185. 7553559. 11018811.
## 9 2014 Sep 8257839. 6525213. 9990465.
## 10 2014 Oct 6022478. 4289851. 7755104.
## 11 2014 Nov 5767536. 4034910. 7500162.
## 12 2014 Dec 7479410. 5746784. 9212035.
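The interval columns can be read back into a forecast standard deviation. The sketch below does this for the January 2014 row, assuming the forecast distribution is normal so that the 95% interval is the point forecast plus or minus `qnorm(0.975)` standard deviations:

```r
# January 2014 row from the forecast table above.
point <- 9691775
lower <- 8005231
upper <- 11378320

# Assuming a normal forecast distribution, a 95% interval is
# point +/- z * sd with z = qnorm(0.975), so sd can be backed out.
z <- qnorm(0.975)
implied_sd <- (upper - lower) / (2 * z)

# Rebuild the interval as a check on the arithmetic.
rebuilt <- c(lower = point - z * implied_sd, upper = point + z * implied_sd)
print(round(implied_sd))
print(round(rebuilt))
```

The same arithmetic applied to other rows shows how the uncertainty changes, or stays constant, across the forecast horizon.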
Forecast Interpretation
The 2014 forecast preserves the strong seasonal pattern seen in the historical data. Higher KWH values are forecast for winter and late summer months, while lower values are forecast in spring and autumn. This pattern is reasonable given the historical shape of the series and the seasonal nature of residential electricity demand.
Conclusion
The series displays strong and recurring monthly seasonality, so a seasonal forecasting framework was necessary. After comparing seasonal naive, ETS, and ARIMA models, the ARIMA model was selected because it had the best training-set fit and passed residual diagnostics reasonably well, although seasonal naive and ETS both had lower error on the 2013 test set. The final 2014 forecasts are therefore based on the fitted ARIMA model and are consistent with the historical seasonal pattern.