DATA 624 Project 1

PART A: ATM Forecast Report

Objective

The objective of this analysis is to forecast daily cash withdrawals at four ATMs for May 2010 using historical data. The variable Cash is measured in hundreds of dollars. The goal is to apply time series forecasting methods and choose among them based on the structure of the data.

Data Preparation

The dataset consists of daily cash withdrawals for four ATMs over approximately one year. Each ATM represents its own time series.

data <- read_excel("ATM624Data.xlsx") %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30")) %>%
  as_tsibble(index = DATE, key = ATM)

data

## # A tsibble: 1,474 x 3 [1D]
## # Key:       ATM [5]
##    DATE       ATM    Cash
##    <date>     <chr> <dbl>
##  1 2009-05-01 ATM1     96
##  2 2009-05-02 ATM1     82
##  3 2009-05-03 ATM1     85
##  4 2009-05-04 ATM1     90
##  5 2009-05-05 ATM1     99
##  6 2009-05-06 ATM1     88
##  7 2009-05-07 ATM1      8
##  8 2009-05-08 ATM1    104
##  9 2009-05-09 ATM1     87
## 10 2009-05-10 ATM1     93
## # ℹ 1,464 more rows

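The DATE variable was converted into a proper date format, and the data were converted into a tsibble so that forecasting models could be fit separately to each ATM. The tsibble reports five keys rather than four: the fifth is a group of rows with a missing ATM label and no Cash values, which appears to be the source of the model-fitting errors and the NaN accuracy rows reported later.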
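
The date conversion relies on Excel's serial-date convention. As a minimal sketch, assuming the raw DATE column arrives as Excel serial numbers (the toy rows below are illustrative and are not read from ATM624Data.xlsx), the conversion and a check for rows without an ATM label could look like this in base R:

```r
# Toy stand-in for the raw sheet; the third row is a hypothetical unlabelled row.
raw <- data.frame(
  DATE = c(39934, 39935, 39936),
  ATM  = c("ATM1", "ATM1", NA),
  Cash = c(96, 82, NA),
  stringsAsFactors = FALSE
)

# Excel serial dates count days from 1899-12-30, so that date is used as the origin.
raw$DATE <- as.Date(raw$DATE, origin = "1899-12-30")

# Rows with no ATM label cannot be assigned to a series; count them and drop them.
n_unlabelled <- sum(is.na(raw$ATM))
clean <- raw[!is.na(raw$ATM), ]

print(clean)
cat("Rows without an ATM label:", n_unlabelled, "\n")
```

Dropping unlabelled rows before calling as_tsibble() is one option for keeping the key count at four; whether that is appropriate depends on what those rows represent in the workbook.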

Analysis

autoplot(data, Cash) +
  facet_wrap(~ATM, scales = "free_y") +
  labs(
    title = "Daily Cash Withdrawals by ATM",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  )

## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).

Initial visualizations suggest that the series are fairly stable over time. There is no strong long-term trend or obvious seasonality, although there is some day-to-day variation.
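
A visual impression of seasonality can be checked numerically. The sketch below uses a made-up series with a deliberate 7-day cycle (not the ATM data) to show how the lag-7 sample autocorrelation flags weekly structure. Applied to each ATM series, values inside the white-noise bound at lags 7, 14, and 21 would support the no-seasonality reading, while large values would argue for a seasonal model.

```r
# Hypothetical daily series: 8 weeks of an exactly repeating weekly pattern.
weekly_pattern <- c(10, 12, 11, 13, 15, 5, 4)
x <- rep(weekly_pattern, times = 8)

# Sample autocorrelations up to lag 21; element 1 of $acf is lag 0.
ac <- acf(x, lag.max = 21, plot = FALSE)
lag7 <- ac$acf[8]

# Approximate 95% bound for white noise: +/- 1.96 / sqrt(n).
bound <- qnorm(0.975) / sqrt(length(x))

cat("lag-7 autocorrelation:", round(lag7, 3), " bound:", round(bound, 3), "\n")
```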

Forecasting Methodology

ETS Models

Exponential smoothing models were fit to each ATM time series.

fit_ets <- data %>%
  model(ETS(Cash))

## Warning in min(y, na.rm = TRUE): no non-missing arguments to min; returning Inf

## Warning: 1 error encountered for ETS(Cash)
## [1] 0 (non-NA) cases

report(fit_ets)

## Warning in report.mdl_df(fit_ets): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 × 10
##   ATM   .model      sigma2 log_lik   AIC  AICc   BIC      MSE     AMSE    MAE
##   <chr> <chr>        <dbl>   <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>  <dbl>
## 1 ATM1  ETS(Cash)    0.190  -2389. 4783. 4783. 4795.   1342.    1343.  NA    
## 2 ATM2  ETS(Cash) 1464.     -2405. 4820. 4820. 4839.   1448.    1450.  NA    
## 3 ATM3  ETS(Cash)   25.4    -1666. 3338. 3338. 3350.     25.3     44.3  0.273
## 4 ATM4  ETS(Cash)    1.66   -3335. 6691. 6691. 6730. 416178.  416926.   0.777

ETS models were considered appropriate because they can capture level, trend, and seasonality when present.

ARIMA Models

ARIMA models were also fit for comparison.

fit_arima <- data %>%
  model(ARIMA(Cash))

## Warning: 1 error encountered for ARIMA(Cash)
## [1] All observations are missing, a model cannot be estimated without data.

report(fit_arima)

## Warning in report.mdl_df(fit_arima): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 × 9
##   ATM   .model        sigma2 log_lik   AIC  AICc   BIC ar_roots   ma_roots 
##   <chr> <chr>          <dbl>   <dbl> <dbl> <dbl> <dbl> <list>     <list>   
## 1 ATM1  ARIMA(Cash)    602.   -1641. 3290. 3290. 3306. <cpl [15]> <cpl [0]>
## 2 ATM2  ARIMA(Cash)    649.   -1659. 3325. 3325. 3336. <cpl [14]> <cpl [0]>
## 3 ATM3  ARIMA(Cash)     25.4  -1109. 2223. 2223. 2235. <cpl [0]>  <cpl [2]>
## 4 ATM4  ARIMA(Cash) 423718.   -2882. 5768. 5768. 5776. <cpl [0]>  <cpl [0]>

ARIMA models allow for autocorrelation and differencing if needed. In this case, ARIMA was explored to confirm whether any additional structure beyond a stable level was present.

Techniques Considered but Not Used

Several techniques were considered but not selected as the final approach:

Trend models were not emphasized because the data did not show a consistent upward or downward movement.

Seasonal models were not selected because no clear repeating seasonal pattern was visible in the data.

More complex ARIMA models were not necessary because the data appeared relatively simple.

Regression or causal forecasting methods were not used because no external predictors were provided.

Model Comparison

accuracy(fit_ets)

## # A tibble: 5 × 11
##   ATM   .model    .type          ME   RMSE     MAE   MPE  MAPE    MASE   RMSSE
##   <chr> <chr>     <chr>       <dbl>  <dbl>   <dbl> <dbl> <dbl>   <dbl>   <dbl>
## 1 ATM1  ETS(Cash) Training  -0.0687  36.8   27.5   -180.  204.   1.56    1.32 
## 2 ATM2  ETS(Cash) Training   0.709   38.2   32.7   -Inf   Inf    1.60    1.28 
## 3 ATM3  ETS(Cash) Training   0.270    5.03   0.273  Inf   Inf    0.371   0.625
## 4 ATM4  ETS(Cash) Training  77.0    645.   312.    -510.  552.   0.777   0.720
## 5 <NA>  ETS(Cash) Training NaN      NaN    NaN      NaN   NaN  NaN     NaN    
## # ℹ 1 more variable: ACF1 <dbl>

accuracy(fit_arima)

## # A tibble: 5 × 11
##   ATM   .model      .type         ME   RMSE     MAE    MPE  MAPE    MASE   RMSSE
##   <chr> <chr>       <chr>      <dbl>  <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>
## 1 ATM1  ARIMA(Cash) Trai…  -8.94e- 2  24.3   15.4    -98.7 115.    0.870   0.873
## 2 ATM2  ARIMA(Cash) Trai…  -1.18e- 1  25.2   17.3   -Inf   Inf     0.843   0.846
## 3 ATM3  ARIMA(Cash) Trai…   2.71e- 1   5.03   0.271   34.6  34.6   0.370   0.625
## 4 ATM4  ARIMA(Cash) Trai…  -1.51e-10 650.   324.    -617.  647.    0.805   0.725
## 5 <NA>  ARIMA(Cash) Trai… NaN        NaN    NaN      NaN   NaN   NaN     NaN    
## # ℹ 1 more variable: ACF1 <dbl>

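Both ETS and ARIMA models were examined. On the training data, ARIMA had lower RMSE for ATM1 (24.3 versus 36.8) and ATM2 (25.2 versus 38.2), the two models tied for ATM3 (5.03), and ETS was slightly lower for ATM4 (645 versus 650). Because the series appear relatively stable and simple, ETS was nevertheless selected as the final model for its interpretability.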
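
For reference, the main columns of the accuracy tables can be written out in a few lines of base R. This is a sketch of the standard definitions, not the fabletools implementation, and the numbers are hypothetical values chosen to be easy to verify by hand. It also shows why MAPE becomes infinite when an observed value is zero, which is the likely reason for the Inf entries reported for ATM2 and ATM3.

```r
# Point-forecast accuracy measures, written out in base R.
# `m` is the seasonal period used to scale MASE (m = 1 gives the naive scaling).
accuracy_measures <- function(actual, fitted, train, m = 1) {
  e <- actual - fitted
  scale <- mean(abs(diff(train, lag = m)))  # in-sample (seasonal) naive MAE
  c(
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = mean(abs(e / actual)) * 100,     # Inf when an actual is 0 with nonzero error
    MASE = mean(abs(e)) / scale
  )
}

# Hypothetical numbers, not taken from the ATM data.
train  <- c(1, 3, 2, 5)
actual <- c(10, 12, 14)
fitted <- c(11, 11, 15)
res <- accuracy_measures(actual, fitted, train)
print(res)

# A single zero in the actuals is enough to make MAPE infinite.
res_zero <- accuracy_measures(c(0, 12, 14), fitted, train)
print(res_zero["MAPE"])
```

Because of this sensitivity to zeros, scale-based measures such as RMSE, MAE, and MASE are the safer basis for comparing models on these series.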

Final Model Selection

The final model selected was an ETS model for each ATM. This model was chosen because it captures the stable level of the series without imposing unnecessary complexity.

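For these data, the resulting forecasts are relatively flat, which is consistent with a model that has no trend or seasonal component, such as ETS(A,N,N). The glance output above does not show the specific form that ETS() selected for each ATM.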

Residual Diagnostics

To evaluate the adequacy of the selected forecasting models, residual diagnostics were performed. Because each ATM represents a separate time series, residual diagnostics were conducted individually for each ATM model.

atm1_data <- data %>%
  filter(ATM == "ATM1")

fit_ets_atm1 <- atm1_data %>%
  model(ETS(Cash))

fit_ets_atm1 %>%
  gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_rug()`).

atm2_data <- data %>%
  filter(ATM == "ATM2")

fit_ets_atm2 <- atm2_data %>%
  model(ETS(Cash))

fit_ets_atm2 %>%
  gg_tsresiduals()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_rug()`).

atm3_data <- data %>%
  filter(ATM == "ATM3")

fit_ets_atm3 <- atm3_data %>%
  model(ETS(Cash))

fit_ets_atm3 %>%
  gg_tsresiduals()

atm4_data <- data %>%
  filter(ATM == "ATM4")

fit_ets_atm4 <- atm4_data %>%
  model(ETS(Cash))

fit_ets_atm4 %>%
  gg_tsresiduals()

## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_rug()`).

Residual diagnostics indicate that the models for ATM1, ATM2, and ATM4 are adequate, with residuals that behave like white noise. ATM3 shows a single large outlier, which suggests an anomaly rather than a systematic pattern.
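
A visual white-noise assessment can be supplemented with a Ljung-Box test. The sketch below uses base R's Box.test() on simulated stand-in series rather than the actual model residuals, and the lag of 14 (two weeks of daily data) is an assumption. A large p-value is consistent with white noise; a small one points to remaining autocorrelation.

```r
# Ljung-Box test for leftover autocorrelation in a residual series.
# The residuals here are simulated stand-ins, not residuals from the ATM models.
set.seed(624)
resid_like_noise <- rnorm(200)

# When testing residuals from a fitted model, `fitdf` can be set to the number
# of estimated parameters; it is left at its default of 0 here.
lb <- Box.test(resid_like_noise, lag = 14, type = "Ljung-Box")
print(lb)

# Contrast: a series with an exact weekly cycle is rejected decisively.
resid_weekly <- rep(c(3, 1, 0, -1, 2, -3, -2), times = 20)
lb_weekly <- Box.test(resid_weekly, lag = 14, type = "Ljung-Box")
cat("p-value with weekly cycle:", format.pval(lb_weekly$p.value), "\n")
```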

Forecasts for May 2010

fc_ets <- fit_ets %>%
  forecast(h = "31 days")

autoplot(data, Cash) +
  autolayer(fc_ets) +
  facet_wrap(~ATM, scales = "free_y") +
  labs(
    title = "Forecasts for May 2010",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  )

## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 31 rows containing missing values or values outside the scale range
## (`geom_line()`).

The forecasts for May 2010 are relatively constant across days for each ATM. This occurs because the selected ETS model assumes that the series has a stable level with no strong trend or seasonality.
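
The reason a level-only model gives a flat forecast can be seen from the simple exponential smoothing recursion that underlies ETS(A,N,N). The sketch below is a hand-written illustration: the smoothing parameter and starting level are fixed by assumption, whereas ETS() estimates them from the data.

```r
# Simple exponential smoothing, the recursion behind ETS(A,N,N).
# alpha and the initial level are fixed here for illustration only.
ses_forecast <- function(y, alpha, level0, h) {
  level <- level0
  for (obs in y) {
    # New level = weighted average of the latest observation and the previous level.
    level <- alpha * obs + (1 - alpha) * level
  }
  # With no trend or seasonal term, every horizon gets the same point forecast.
  rep(level, h)
}

y <- c(96, 82, 85, 90, 99)  # first five ATM1 values from the printed data
fc <- ses_forecast(y, alpha = 0.2, level0 = y[1], h = 3)
print(fc)
```

Every forecast horizon returns the last estimated level, so a 31-day forecast from such a model is a horizontal line; only the prediction intervals widen with the horizon.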

Forecast Interpretation

The flat forecasts suggest that ATM withdrawal behavior is stable over time and that the best prediction for future values is the current estimated level of the series. This is a reasonable result when no strong systematic patterns are present. In practical terms, the expected level of daily cash demand is predictable, although individual days will still vary around it.

Forecast Output File

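The forecast values for May 2010 were exported to an Excel file. The output has 155 rows (five keys times 31 days) because it still includes the group with a missing ATM label; those 31 rows carry no usable forecast and should be ignored.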

forecast_output <- fc_ets %>%
  as_tibble() %>%
  select(DATE, ATM, .mean) %>%
  rename(Forecast_Cash = .mean)

write_xlsx(forecast_output, "ATM_Forecast_May2010.xlsx")
forecast_output

## # A tibble: 155 × 3
##    DATE       ATM   Forecast_Cash
##    <date>     <chr>         <dbl>
##  1 2010-05-01 ATM1           78.3
##  2 2010-05-02 ATM1           78.3
##  3 2010-05-03 ATM1           78.3
##  4 2010-05-04 ATM1           78.3
##  5 2010-05-05 ATM1           78.3
##  6 2010-05-06 ATM1           78.3
##  7 2010-05-07 ATM1           78.3
##  8 2010-05-08 ATM1           78.3
##  9 2010-05-09 ATM1           78.3
## 10 2010-05-10 ATM1           78.3
## # ℹ 145 more rows

Limitations

This analysis has several limitations:

Only one year of data was available. No external variables, such as holidays or local events, were included.

Additional historical data and explanatory variables could improve the forecasts.

Conclusion

This analysis used ETS and ARIMA methods to forecast ATM cash withdrawals for May 2010. Based on the observed stability of the series and the residual diagnostics, a simple ETS model was selected as the most appropriate forecasting approach. The resulting forecasts are consistent with the historical behavior of the data and provide a practical basis for ATM cash planning.

PART B: Forecasting Power Report

Objective

The objective of this analysis is to model monthly residential power consumption and produce forecasts for each month of 2014. The variable KWH represents power consumption in kilowatt hours.

Data Preparation

power_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx") %>%
  mutate(
    Month = yearmonth(`YYYY-MMM`)
  ) %>%
  select(Month, KWH) %>%
  as_tsibble(index = Month)

power_data

## # A tsibble: 192 x 2 [1M]
##       Month     KWH
##       <mth>   <dbl>
##  1 1998 Jan 6862583
##  2 1998 Feb 5838198
##  3 1998 Mar 5420658
##  4 1998 Apr 5010364
##  5 1998 May 4665377
##  6 1998 Jun 6467147
##  7 1998 Jul 8914755
##  8 1998 Aug 8607428
##  9 1998 Sep 6989888
## 10 1998 Oct 6345620
## # ℹ 182 more rows

The source data runs from January 1998 through December 2013. During data review, one missing observation was identified for September 2008. Because this is monthly data with a strong seasonal pattern, that missing value was imputed using the average of the same month from the adjacent years.

power_data <- power_data %>%
  mutate(
    KWH = if_else(
      Month == yearmonth("2008 Sep"),
      mean(c(
        KWH[Month == yearmonth("2007 Sep")],
        KWH[Month == yearmonth("2009 Sep")]
      ), na.rm = TRUE),
      KWH
    )
  )

power_data %>% filter(Month >= yearmonth("2008 Aug"), Month <= yearmonth("2008 Oct"))

## # A tsibble: 3 x 2 [1M]
##      Month     KWH
##      <mth>   <dbl>
## 1 2008 Aug 8037137
## 2 2008 Sep 7625058
## 3 2008 Oct 5101803

Exploratory Analysis

autoplot(power_data, KWH) +
  labs(
    title = "Monthly Residential Power Consumption",
    x = "Month",
    y = "KWH"
  )

The time plot shows strong seasonality and moderate long-term level changes. Consumption is generally highest in winter and summer months and lower during other seasons, which is consistent with residential heating and cooling demand. Because the data are monthly and seasonal, models that can handle recurring seasonal structure are appropriate.

Forecasting Process

To choose a forecasting method, I compared three approaches: a seasonal naive benchmark, ETS models, and ARIMA models.

To evaluate performance, the final 12 months (2013) were held out as a test set, and the remaining data were used for training.

train <- power_data %>% filter_index(. ~ "2012 Dec")
test  <- power_data %>% filter_index("2013 Jan" ~ .)

fit_models <- train %>%
  model(
    SNAIVE = SNAIVE(KWH),
    ETS = ETS(KWH),
    ARIMA = ARIMA(KWH)
  )

# Training accuracy
accuracy(fit_models)

## # A tibble: 3 × 10
##   .model .type        ME     RMSE     MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>     <dbl>    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 SNAIVE Training 72034. 1182035. 690220. -4.53  15.0 1     1.00  0.255  
## 2 ETS    Training 33044.  817772. 489325. -5.07  12.4 0.709 0.692 0.162  
## 3 ARIMA  Training -7773.  780465. 476603. -5.00  11.4 0.691 0.660 0.00522

# Test forecasts
fc_models <- fit_models %>%
  forecast(new_data = test)

# Test accuracy
accuracy(fc_models, test)

## # A tibble: 3 × 10
##   .model .type      ME     RMSE      MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>   <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ARIMA  Test  691396. 1583865. 1028245.  7.68 12.5    NaN   NaN  0.0170
## 2 ETS    Test  406318. 1068359.  691602.  4.23  8.26   NaN   NaN  0.0963
## 3 SNAIVE Test  405195. 1035538.  618606.  4.55  7.06   NaN   NaN -0.0313
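
The hold-out comparison can also be ranked in code. The sketch below copies the test-set figures from the table above into a small data frame and orders the models by RMSE; using RMSE as the ranking criterion is an assumption, and MAE or MAPE could be substituted.

```r
# Test-set accuracy copied from the table above (2013 hold-out).
test_acc <- data.frame(
  model = c("ARIMA", "ETS", "SNAIVE"),
  RMSE  = c(1583865, 1068359, 1035538),
  MAE   = c(1028245, 691602, 618606),
  MAPE  = c(12.5, 8.26, 7.06),
  stringsAsFactors = FALSE
)

# Order models from best to worst by hold-out RMSE.
ranked <- test_acc[order(test_acc$RMSE), ]
print(ranked)

best_model <- ranked$model[1]
cat("Lowest test RMSE:", best_model, "\n")
```

On these figures the seasonal naive benchmark ranks first on the hold-out and ARIMA ranks last, the reverse of the training-set ordering.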

Techniques Used

Seasonal Naive

Seasonal naive was used as a benchmark because for strongly seasonal monthly series it is often a reasonable starting point.
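
The seasonal naive rule is simple enough to write by hand. The sketch below is a base R illustration on a toy two-year series (hypothetical values, not the KWH data): each forecast repeats the observation from the same month one year earlier.

```r
# Seasonal naive: each forecast repeats the value from the same season one cycle earlier.
snaive_forecast <- function(y, m, h) {
  last_cycle <- tail(y, m)
  # Recycle the last full cycle to cover horizons longer than one season.
  rep(last_cycle, length.out = h)
}

# Toy monthly series: year 1 is 1..12, year 2 is 101..112.
y <- c(1:12, 101:112)

# Forecast 14 months ahead with a 12-month season.
fc <- snaive_forecast(y, m = 12, h = 14)
print(fc)
```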

ETS

ETS models were used because they are well suited to time series with level, trend, and seasonal structure.

ARIMA

ARIMA models were also fit because they can flexibly model autocorrelation and seasonal differencing when needed.
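
Seasonal differencing, the operation a seasonal ARIMA model may apply, subtracts the value from the same month one year earlier. The sketch below uses a hypothetical series built from a linear trend plus an exactly repeating 12-month pattern to show that the seasonal pattern cancels out of the differenced series.

```r
# Hypothetical monthly series: linear trend plus an exactly repeating 12-month pattern.
months   <- 1:36
seasonal <- rep(c(5, 3, 1, -1, -3, 2, 6, 5, 1, -2, -4, -13), times = 3)
y <- 2 * months + seasonal

# Seasonal difference: y[t] - y[t - 12]. The repeating pattern cancels,
# leaving only the year-over-year change from the trend (2 * 12 = 24).
d12 <- diff(y, lag = 12)
print(d12)
```

Real data will not cancel this cleanly, but a seasonally differenced series that looks roughly stationary is the usual sign that one seasonal difference is enough.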

Techniques Not Used

More complicated methods such as regression with external predictors were not used because no explanatory variables were provided. Highly complex models were also avoided because the series is relatively structured and interpretable.

Model Selection

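The ARIMA model had the best training-set accuracy (lowest RMSE, MAE, and MAPE, with residual ACF1 near zero) and was selected as the final model. This choice is not supported by the 2013 hold-out, where ARIMA had the highest RMSE (1,583,865) and the seasonal naive benchmark the lowest (1,035,538), so the ARIMA forecasts should be read with that caveat.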

report(fit_models)

## Warning in report.mdl_df(fit_models): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 3 × 11
##   .model   sigma2 log_lik   AIC  AICc   BIC        MSE     AMSE     MAE ar_roots
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>      <dbl>    <dbl>   <dbl> <list>  
## 1 SNAIVE 1.40e+12     NA    NA    NA    NA    NA       NA       NA      <NULL>  
## 2 ETS    1.37e- 2  -2892. 5814. 5817. 5862.    6.69e11  7.14e11  0.0751 <NULL>  
## 3 ARIMA  6.73e+11  -2532. 5076. 5076. 5094.   NA       NA       NA      <cpl>   
## # ℹ 1 more variable: ma_roots <list>

Residual Diagnostics

Residual diagnostics were checked for the final ARIMA model.

fit_models %>%
  select(ARIMA) %>%
  gg_tsresiduals()

The residual diagnostics suggest that the model is adequate. The residuals are centered around zero, and the autocorrelation in the residuals appears limited.

Final Forecast for 2014

The final model was re-fit using the full dataset and then used to generate forecasts for the 12 months of 2014.

final_fit <- power_data %>%
  model(ARIMA(KWH))

final_fc <- final_fit %>%
  forecast(h = "12 months")

autoplot(final_fc, power_data) +
  labs(
    title = "Forecast of Monthly Residential Power Consumption for 2014",
    x = "Month",
    y = "KWH"
  )

forecast_output <- final_fc %>%
  hilo(level = 95) %>%
  unpack_hilo(`95%`) %>%
  as_tibble() %>%
  select(Month, .mean, `95%_lower`, `95%_upper`) %>%
  rename(
    Forecast_KWH = .mean,
    Lower_95 = `95%_lower`,
    Upper_95 = `95%_upper`
  )

forecast_output

## # A tibble: 12 × 4
##       Month Forecast_KWH Lower_95  Upper_95
##       <mth>        <dbl>    <dbl>     <dbl>
##  1 2014 Jan     9691775. 8005231. 11378320.
##  2 2014 Feb     8175614. 6442987.  9908240.
##  3 2014 Mar     6739530. 5006904.  8472157.
##  4 2014 Apr     5959783. 4227156.  7692409.
##  5 2014 May     5728339. 3995713.  7460965.
##  6 2014 Jun     7527559. 5794933.  9260185.
##  7 2014 Jul     7919359. 6186733.  9651985.
##  8 2014 Aug     9286185. 7553559. 11018811.
##  9 2014 Sep     8257839. 6525213.  9990465.
## 10 2014 Oct     6022478. 4289851.  7755104.
## 11 2014 Nov     5767536. 4034910.  7500162.
## 12 2014 Dec     7479410. 5746784.  9212035.
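
The interval columns can be sanity-checked with a little arithmetic. Assuming a normal forecast distribution, so that the 95% interval is the mean plus or minus about 1.96 standard deviations, the sketch below backs out the implied forecast standard deviation from the January and February rows of the table above and confirms that the point forecast sits at the midpoint of each interval.

```r
# Back out the forecast standard deviation implied by a 95% interval,
# assuming a normal forecast distribution (interval = mean +/- z * sd).
z <- qnorm(0.975)  # about 1.96

# 2014 Jan and Feb rows copied from the forecast table above.
fc <- data.frame(
  month = c("2014 Jan", "2014 Feb"),
  mean  = c(9691775, 8175614),
  lower = c(8005231, 6442987),
  upper = c(11378320, 9908240),
  stringsAsFactors = FALSE
)

fc$half_width <- (fc$upper - fc$lower) / 2
fc$implied_sd <- fc$half_width / z

# The point forecast should sit at the midpoint of a symmetric interval.
fc$midpoint <- (fc$upper + fc$lower) / 2
print(fc)
```

The half-widths of roughly 1.7 million KWH show how much uncertainty surrounds each monthly point forecast.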

Forecast Interpretation

The 2014 forecast preserves the strong seasonal pattern seen in the historical data. Higher KWH values are forecast for winter and late summer months, while lower values are forecast in spring and autumn. This pattern is reasonable given the historical shape of the series and the seasonal nature of residential electricity demand.

Conclusion

The series displays strong and recurring monthly seasonality, so a seasonal forecasting framework was necessary. After comparing seasonal naive, ETS, and ARIMA models, the ARIMA model was selected because it provided the best training-set fit and passed residual diagnostics reasonably well, although the seasonal naive and ETS models had lower error on the 2013 test set. The final 2014 forecasts are therefore based on the fitted ARIMA model and are consistent with the historical seasonal pattern.

Chanice Mckenzie