Part A - Forecasting ATM Cash Withdrawals for May 2010

1. Introduction

1a. Goal:

Forecast the amount of cash withdrawn from 4 ATMs for May 2010 using historical withdrawal data.

1b. Dataset Overview:

The dataset consists of time series data for daily cash withdrawals from 4 different ATMs between May 1, 2009 and April 30, 2010. There are three variables:

DATE: $\text{date}$ format
ATM: $\text{character}$ format
Cash: $\text{dbl}$ format

2. Data Preparation

To prepare the data for analysis and forecasting, I made sure that the $\text{DATE}$ column was in Date format and that the $\text{Cash}$ column was in numeric format. The dataset $\text{Cash}$ value was in hundreds of dollars, so I multiplied by 100 to convert to USD. The rows with NAs in the ATM column were dropped since only the 4 ATMs were being considered in this analysis. The tibble was then converted to a tsibble with key being $\text{ATM}$ and index being $\text{DATE}$.

# get the data
atm_data <- read_xlsx(link1)

# Convert to tsibble
atm <- atm_data %>%
  mutate(
    DATE = as.Date(DATE, origin = "1899-12-30"),
    Cash = as.numeric(Cash) * 100  # convert from hundreds to USD
  ) %>%
  drop_na(ATM) %>% # drop rows with NAs in ATM column
  as_tsibble(key = ATM, index = DATE)

3. Exploratory Data Analysis

In this section, I explored the data further by looking at the spread of the data by column.

3a. Data Summary

# summarize data
summary(atm)

##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1460        Min.   :      0  
##  1st Qu.:2009-07-31   Class :character   1st Qu.:     50  
##  Median :2009-10-30   Mode  :character   Median :   7300  
##  Mean   :2009-10-30                      Mean   :  15558  
##  3rd Qu.:2010-01-29                      3rd Qu.:  11400  
##  Max.   :2010-04-30                      Max.   :1091976  
##                                          NA's   :5

# Looking at NA Cash values
atm %>%
  filter(is.na(Cash))

## # A tsibble: 5 x 3 [1D]
## # Key:       ATM [2]
##   DATE       ATM    Cash
##   <date>     <chr> <dbl>
## 1 2009-06-13 ATM1     NA
## 2 2009-06-16 ATM1     NA
## 3 2009-06-22 ATM1     NA
## 4 2009-06-18 ATM2     NA
## 5 2009-06-24 ATM2     NA

3b. Handling Missing Values

There are 5 NA $\text{Cash}$ values in the whole dataset, with 3 belonging to ATM1 and 2 belonging to ATM2. Given that removing these values would result in a gap in the data that is already limited by the dataset being daily values for a period of 1 year, I decided to impute these NAs with the mean value of the corresponding ATM.

# Imputing NA Cash values with the ATM's mean
atm <- atm %>%
  group_by(ATM) %>%
  mutate(
    Cash = ifelse(is.na(Cash),
                  mean(Cash, na.rm = TRUE),
                  Cash)
  ) %>%
  ungroup()

3c. Plotting Time Series

# plotting the time series for each ATM
autoplot(atm, Cash) +
  facet_wrap(~ATM, scales = "free_y") +
  labs(
    title = "Daily Cash Withdrawals by ATM",
    x = "Date",
    y = "Withdrawal (USD)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = 'bold')
  )

The time series plots show us:

ATM1 may have a seasonal pattern, likely weekly seasonality
ATM2 may have a seasonal pattern, likely weekly seasonality
ATM3 had no withdrawals until towards the end of the period, then had a sudden spike in April 2010
ATM4 exhibits a large outlier in February 2010

3d. Looking Closer at ATM3

atm %>%
  filter(ATM == "ATM3" & Cash >0)

## # A tsibble: 3 x 3 [1D]
## # Key:       ATM [1]
##   DATE       ATM    Cash
##   <date>     <chr> <dbl>
## 1 2010-04-28 ATM3   9600
## 2 2010-04-29 ATM3   8200
## 3 2010-04-30 ATM3   8500

Looking at the time series plot for ATM 3, it looks like there is one extreme outlier in the time series, with all other dates having zero withdrawals. However, looking at the above table, we can see that there were 3 consecutive days with withdrawals. These are still outliers compared to the rest of the data, but the fact that they occurred consecutively means they are likely not errors and can be easily explained.

3e. Looking Closer at ATM4

atm %>%
  filter(ATM == "ATM4" & Cash > 175000)

## # A tsibble: 1 x 3 [1D]
## # Key:       ATM [1]
##   DATE       ATM       Cash
##   <date>     <chr>    <dbl>
## 1 2010-02-09 ATM4  1091976.

The outlier occurred on February 9th, 2010 with a value of $1,091,976. No other withdrawals exceeded $175,000, making this a clear and extreme anomaly. I decided to impute the value with the mean value of the day before and after.

atm <- atm %>%
  group_by(ATM) %>%
  mutate(Cash = ifelse(ATM == "ATM4" & DATE == as.Date("2010-02-09"),
                       mean(Cash[ATM == "ATM4" & 
                                 DATE %in% c(as.Date("2010-02-08"), as.Date("2010-02-10"))],
                            na.rm = TRUE),
                       Cash)) %>%
  ungroup()

3f. Variance Stabilization Check

# box cox
atm_lambda <- atm %>%
  features(Cash, features = guerrero)

atm_lambda

## # A tibble: 4 × 2
##   ATM   lambda_guerrero
##   <chr>           <dbl>
## 1 ATM1            0.237
## 2 ATM2            0.724
## 3 ATM3            1.45 
## 4 ATM4            0.450

ATM1: $\lambda = 0.2365721 \Rightarrow$ Use Box-Cox
ATM2: $\lambda = 0.7240522 \Rightarrow$ No transformation needed since close to 1
ATM3: $\lambda = 1.4461493 \Rightarrow$ No transformation needed since close to 1
ATM4: $\lambda = 0.4497706 \Rightarrow$ Square root of the data is used since $\lambda \approx 0.5$.

However, since the ATM data spans a single year and residual diagnostics later confirm constant variance, a Box–Cox transformation was not applied. I experimented with applying the square root transformation for ATM4, but it did not materially improve residual behavior or model performance. Therefore, the analysis proceeded using the untransformed data to preserve interpretability and maintain consistency across all ATMs.

4. Model Development

This section describes the development, training, and evaluation of forecasting models used to predict daily cash withdrawals for each ATM.

4a. Training

I tested 3 different forecasting models per ATM:

Seasonal Naive (SNAIVE): This is a baseline model. Each day’s forecast equals the value from 7 days earlier.
Exponential Smoothing (ETS): Captures level, trend, and seasonality automatically.
ARIMA: Autoregressive Integrated Moving Average model with automatic order selection.

# Fit Models
fit_models_atm <- atm %>%
  model(
    snaive = SNAIVE(Cash ~ lag("week")),
    ets    = ETS(Cash),
    arima  = ARIMA(Cash)
  )

4b. Model Overview

# Model info
glimpse(fit_models_atm)

## Rows: 4
## Columns: 4
## Key: ATM [4]
## $ ATM    <chr> "ATM1", "ATM2", "ATM3", "ATM4"
## $ snaive <model> [SNAIVE], [SNAIVE], [SNAIVE], [SNAIVE]
## $ ets    <model> [ETS(A,N,A)], [ETS(A,N,A)], [ETS(A,N,N)], [ETS(M,N,M)]
## $ arima  <model> [ARIMA(0,0,1)(0,1,2)[7]], [ARIMA(2,0,2)(0,1,1)[7]], [ARIMA(0,0,…

4c. Best Model Selection

# Check accuracy for each model - focusing on MAE, RMSE, and MAPE
accuracy(fit_models_atm)

## # A tibble: 12 × 11
##    ATM   .model .type         ME   RMSE    MAE    MPE  MAPE  MASE RMSSE     ACF1
##    <chr> <chr>  <chr>      <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
##  1 ATM1  snaive Training -3.63e0  2787. 1.78e3  -96.6 116.  1     1      0.151  
##  2 ATM1  ets    Training -2.20e1  2384. 1.52e3 -107.  122.  0.853 0.855  0.125  
##  3 ATM1  arima  Training -7.36e0  2341. 1.47e3 -103.  118.  0.825 0.840 -0.00795
##  4 ATM2  snaive Training  2.23e0  3005. 2.08e3 -Inf   Inf   1     1      0.0459 
##  5 ATM2  ets    Training -7.10e1  2506. 1.79e3 -Inf   Inf   0.860 0.834  0.0198 
##  6 ATM2  arima  Training -8.81e1  2407. 1.70e3 -Inf   Inf   0.819 0.801 -0.00507
##  7 ATM3  snaive Training  7.35e1   804. 7.35e1  100   100   1.00  1.00   0.640  
##  8 ATM3  ets    Training  2.70e1   503. 2.72e1  Inf   Inf   0.371 0.625 -0.00717
##  9 ATM3  arima  Training  2.71e1   503. 2.71e1   34.6  34.6 0.370 0.625  0.0124 
## 10 ATM4  snaive Training -3.70e2 45304. 3.46e4 -392.  447.  1     1      0.0621 
## 11 ATM4  ets    Training -3.04e3 35425. 2.77e4 -452.  484.  0.800 0.782  0.0574 
## 12 ATM4  arima  Training  3.16e0 33952. 2.80e4 -507.  540.  0.809 0.749  0.00472

The best model was determined for each ATM by assessing which model has lower MAE, RMSE, and MAPEs.

ATM1: ARIMA achieved the lowest MAE and RMSE, outperforming ETS and SNAIVE. The ARIMA model best captured short-term fluctuations and weekly seasonality. Specifically, ARIMA(0,0,1)(0,1,2)[7] is selected, which is a seasonal moving average model where weekly seasonality is handled by 1 seasonal difference and two seasonal MA terms.
ATM2: ARIMA had the lowest MAE and RMSE. MAPE was infinite due to their being zero-values at some dates, but relative values confirm the advantage of using ARIMA. Specifically, ARIMA(2,0,2)(0,1,1)[7] is selected, which combines both AR and MA components plus a single seasonal MA term. It models both short-term autocorrelation and weekly seasonal fluctuations.
ATM3: The series remained flat for most of the period and then had a sudden spike towards the end of the period. The ETS and ARIMA models have nearly identical MAE and RMSE values. MAPE is infinite for the ETS model, while ARIMA has a finite number. ARIMA was therefore selected as the preferred model, as it handled long zero periods more realistically and yielded stable percentage errors. Specifically, ARIMA(0,0,2) is selected, which is a simple non-seasonal moving average model. The series is stationary and primarily influenced by the average of the last two random shocks. It does not account for seasonality.
ATM4: For ATM4, ARIMA and ETS both performed similarly. The ARIMA model achieved the lowest RMSE and residual autocorrelation, indicating a more accurate fit after accounting for the February 2010 outlier. Specifically, ARIMA(3,0,2)(1,0,0)[7] w/ mean is selected, which effectively captures the short-term fluctuations and the weekly seasonal pattern.

4d. Residual Diagnostics

ATM1

The below residual diagnostics indicate that the model is well-fit.

Time series plot shows that the residuals are scattered about 0 with no visible upward or downward trend
ACF confirms no significant autocorrelation
Residuals appear approximately normal based on the histogram

These diagnostics confirm that the model adequately captures the trend and seasonality, and no further transformations are necessary.

# ATM1
fit_models_atm %>%
  filter(ATM == "ATM1") %>%
  select(arima) %>%  
  gg_tsresiduals()

ATM2

The below residual diagnostics indicate that the model is well-fit.

Time series plot shows that the residuals are scattered about 0 with no visible upward or downward trend
ACF confirms no significant autocorrelation
Residuals appear approximately normal based on the histogram

These diagnostics confirm that the model adequately captures the trend and seasonality, and no further transformations are necessary.

# ATM2
fit_models_atm %>%
  filter(ATM == "ATM2") %>%
  select(arima) %>%  
  gg_tsresiduals()

ATM3

The residual diagnostics for ATM3 indicate that the ARIMA(0,0,2) model provides a reasonable fit given the atypical data structure.

The residuals are nearly zero for most of the series, with isolated large deviations corresponding to the few active days in late April 2010.
ACF confirms that there is no significant autocorrelation or trend remaining.
While we cannot confirm normality, there is not enough data present for this ATM, so this is expected and we can continue with the forecast.

These diagnostics confirms that a box-cox transformation would not be necessary here as the issue is not variance growth, but lack of activity.

# ATM3
fit_models_atm %>%
  filter(ATM == "ATM3") %>%
  select(arima) %>%  
  gg_tsresiduals()

ATM4

The below residual diagnostics for ATM4 indicate that the ARIMA(3,0,2)(1,0,0)[7] w/ mean model provides a reasonable fit, with a few notable points.

The residuals fluctuate around zero with no clear trend or seasonal pattern.
The majority of autocorrelation spikes fall within the blue confidence bounds, suggesting that residuals behave approximately as white noise. There are a few small spikes, with one being outside of the blue bounds, which indicates that there may be some weak remaining weekly seasonality.
The histogram is slightly right-skewed, but most of the residuals cluster near zero.

Overall, these diagnostics confirm that the model adequately captures the underlying structure of ATM4 withdrawals. As discussed previously, I experimented with applying the square root of the data due to the lambda being approximately 0.5, but the model performance and residual behavior did not improve. I decided that no transformation was necessary because of this.

# ATM4
fit_models_atm %>%
  filter(ATM == "ATM4") %>%
  select(arima) %>%  
  gg_tsresiduals()

5. Forecasting

Once the best models were determined for each ATM, I forecasted the cash withdrawals for each ATM for the month of May.

5a. Forecasting

acc_tables_atm <- accuracy(fit_models_atm)

# select best models
best_models <- acc_tables_atm %>%
  group_by(ATM) %>%
  arrange(RMSE, MAE, .by_group = TRUE) %>%
  slice(1) %>%
  ungroup() %>%
  select(ATM, .model)

# forecast
future <- new_data(atm, 31)

# Forecast with ALL models 
fc_all <- forecast(fit_models_atm, new_data = future)

# Keep only the chosen model per ATM
final_fc <- fc_all %>%
  inner_join(best_models, by = c("ATM", ".model"))

# write to excel
final_fc %>%
  write_xlsx('May2010_ATM_Forecasts.xlsx')

5b. Forecast Plots

# Plot forecasts for each ATM
autoplot(final_fc, atm) +
  labs(title = "Forecasted Cash Withdrawals for May 2010", y = "USD") +
  facet_wrap(~ATM, scales = "free_y") +
  xlab("Date") +
  ylab("Withdrawal (USD)")+
  theme(
    plot.title = element_text(hjust=0.5, face='bold')
  )

6. Results

The forecast plots present the 31 day forecasts for May 2010, overlaid on the historical withdrawal data for each ATM. The shaded regions represent the 80% and 95% prediction intervals, showing the expected range of cash withdrawals based on the selected model for each ATM.

For ATM 1 and ATM 2, the ARIMA model was used and produces forecasts that closely follow the historical weekly pattern observed throughout the period. The prediction intervals remain relatively narrow, reflecting stable withdrawal behavior and strong weekly seasonality.

ATM 3 exhibited zero activity for most of the period and then had a sudden spike in April 2010. ARIMA generates a largely flat forecast with wide uncertainty. This is consistent with its historical behavior and lack of seasonal structure. If there was more data available from previous years, we’d be able to see if there was a cyclical trend with this data.

For ATM 4, the ARIMA model was applied. The forecasts show moderately volatile but centered withdrawals around historical averages. The prediction intervals are very wide, indicating higher uncertainty. The reason for this is likely due to the ATM’s more irregular withdrawal behavior. Despite this, the model still captures the overall level and weekly seasonality adequately.

Overall, the forecasts suggest that ATM 1 and ATM 2 are expected to maintain consistent withdrawal volumes through the month of May, while ATM 3 and ATM 4 show greater variability and uncertainty due to their irregular historical behavior.

Part B - Forecasting Power

1. Introduction

1a. Goal

Using a simple dataset of residential power usage between January 1998 and December 2013, the goal is to model this data and provide a monthly forecast for 2014.

1b. Dataset Overview

# read data
power <- read_xlsx(link2)

# look at first 5 rows
head(power, 5)

## # A tibble: 5 × 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377

The dataset contains three columns:

CaseSequence: Sequence in the time series in $\text{dbl}$
YYYY-MMM: Month and year in $\text{chr}$ format
KWH: Power consumption in Kilowatt hours in $\text{dbl}$ format.

2. Data Preparation

To prepare the data for analysis and forecasting, I converted the YYYY-MMM column into a $\text{date}$ object and renamed to Date, removed the original YYYY-MMM column, and then converted the tibble into a tsibble.

# convert YYYY-MMM column into date format and convert data into tsibble for time series
power <- power %>%
  mutate(
    Date = yearmonth(`YYYY-MMM`)  # convert to yearmonth object
  ) %>%
  select(CaseSequence, Date, KWH) %>%
  as_tsibble(index = Date)

3. Exploratory Data Analysis

In this section, I explored the data further by looking at the spread of the data by column.

3a. Data Summary

summary(power$KWH)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   770523  5429912  6283324  6502475  7620524 10655730        1

range(power$Date)

## <yearmonth[2]>
## [1] "1998 Jan" "2013 Dec"

3b. Handling Missing Values

power %>%
  filter(is.na(KWH))

## # A tsibble: 1 x 3 [1M]
##   CaseSequence     Date   KWH
##          <dbl>    <mth> <dbl>
## 1          861 2008 Sep    NA

# plotting the power time series
autoplot(power, KWH) +
  labs(
    title = "Residential Power Usage\nBetween January 1998 and December 2013",
    x = "Month",
    y = "Power (KWH)"
  )

We can see that there is 1 NA in the KWH column. Looking at the time series plot, this NA causes a gap in the data. I decided to impute with the mean value for that month (September) across all years.

# Impute missing September values with the average September KWH across all years
power <- power %>%
  mutate(
    month = month(Date, label = TRUE),   # extract month name
    KWH = ifelse(
      is.na(KWH) & month == "Sep",
      mean(KWH[month == "Sep"], na.rm = TRUE),  # average KWH of all Septembers
      KWH
    )
  ) %>%
  select(-month)

3c. Plotting the Time Series After Handling Missing Values

# plotting the power time series
autoplot(power, KWH) +
  labs(
    title = "Residential Power Usage\nBetween January 1998 and December 2013",
    x = "Month",
    y = "Power (KWH)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = 'bold')
  )

We can see after imputing the missing value, the gap is now gone.

This time series plot shows us:

There is strong seasonality to this data.
There is a slight upward trend towards the end of the time series, but remains pretty stable between 1998 and 2009.
There is an outlier around 2010/2011. This could represent missing data, a reporting error, or a major event like a power outage.
After the major outlier, the seasonal pattern resumes with even stronger peaks.

3d. Looking Closer at the Outlier

power %>%
  filter(KWH<3000000)

## # A tsibble: 1 x 3 [1M]
##   CaseSequence     Date    KWH
##          <dbl>    <mth>  <dbl>
## 1          883 2010 Jul 770523

The outlier occurred in July 2010. The outlier is not 0, it is 770,523 KWH. There is not enough information to explain whether this is a reporting error or a major event (like a power outage). We will consider this outlier in our forecasting.

3f. Variance Check

# box cox
power_lambda <- power %>%
  features(KWH, features = guerrero) %>%
  pull(lambda_guerrero)

power_lambda

## [1] 0.1226627

The lambda is 0.1226627, which is very close to 0. I will apply a box-cox transformation with $\lambda = 0.1226627$.

# applying box-cox transformation
power$KWH_transformed <- box_cox(power$KWH, power_lambda)

# plot after transformation
autoplot(power, KWH_transformed) +
  labs(
    title = "Residential Power Usage\nBetween January 1998 and December 2013\nBox-Cox Transformation",
    x = "Month",
    y = "Power (KWH)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = 'bold')
  )

After the Box-Cox transformation, the variance appears slightly smoother, but it wasn’t highly unstable in the first place, so the overall structure didn’t change much. The seasonal structure of the data is preserved. The transformation does not eliminate the outlier, which confirms that the outlier is a genuine anomaly. Given the limited improvement and loss of direct interpretability, I decided to not apply the box-cox transformation to the forecasting model.

4. Model Development

This section describes the development, training, and selection of the forecasting models used to predict the monthly residential power usage for 2014.

4a. Training

I tested 3 different forecasting models for this data:

ETS: Captures smooth changes in trend and seasonality automatically
ARIMA: Handles both autoregression and seasonality
STL + ETS: Separates trend + seasonality first, then forecasts the components.

The SNAIVE model was not considered because it does not capture trend + seasonality interaction.

# fit models
fit_models_power <- power %>%
  model(
    ets = ETS(KWH),
    arima = ARIMA(KWH),
    stl_ets = decomposition_model(
      STL(KWH ~ season(window = "periodic")),
      ETS(season_adjust ~ error("A") + trend("A") + season("A"))
    )
  )

4b. Model Overview

# MODEL INFO
glimpse(fit_models_power)

## Rows: 1
## Columns: 3
## $ ets     <model> [ETS(M,N,M)]
## $ arima   <model> [ARIMA(0,0,1)(1,1,1)[12] w/ drift]
## $ stl_ets <model> [STL decomposition model]

4c. Best Model Selection

# Check accuracy for each model - focusing on MAE, RMSE, and MAPE
accuracy(fit_models_power)

## # A tibble: 3 × 10
##   .model  .type         ME    RMSE     MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>   <chr>      <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ets     Training  62242. 826660. 500307. -4.33  12.0 0.729 0.705 0.173 
## 2 arima   Training -25758. 823919. 489801. -5.52  11.6 0.714 0.702 0.0131
## 3 stl_ets Training  53043. 821514. 504335. -4.46  11.8 0.735 0.700 0.233

Both the STL_ETS and ARIMA produce strong forecasts. Although STL+ETS yielded the smallest RMSE, ARIMA achieved lower average errors (MAE and MAPE) and cleaner residual diagnostics (near-zero autocorrelation). So, ARIMA was selected as the preferred model for forecasting 2014 residential power usage. Specifically, the selected model is ARIMA(0,0,1)(1,1,1)[12] w/ drift.

5. Forecasting

Once the best model was selected, I forecasted the power usage for 2014.

5a. Forecasting

# ARIMA model
arima_model <- power %>%
  model(ARIMA(KWH))

# future 12 months
future_power <- new_data(power,12)

# generate forecasts
fc_power <- forecast(arima_model, new_data = future_power)

# write to excel
fc_power %>%
  write_xlsx('2014_Power_Usage_Forecasts.xlsx')

5b. Forecast Plots

# plot forecasts
autoplot(fc_power, power) +
  labs(
    title = "Residential Power Usage Forecast for 2014\n(ARIMA Model)",
    x = "Month",
    y = "Power (KWH)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face='bold')
  )

6. Results

fc_power

## # A fable: 12 x 4 [1M]
## # Key:     .model [1]
##    .model         Date
##    <chr>         <mth>
##  1 ARIMA(KWH) 2014 Jan
##  2 ARIMA(KWH) 2014 Feb
##  3 ARIMA(KWH) 2014 Mar
##  4 ARIMA(KWH) 2014 Apr
##  5 ARIMA(KWH) 2014 May
##  6 ARIMA(KWH) 2014 Jun
##  7 ARIMA(KWH) 2014 Jul
##  8 ARIMA(KWH) 2014 Aug
##  9 ARIMA(KWH) 2014 Sep
## 10 ARIMA(KWH) 2014 Oct
## 11 ARIMA(KWH) 2014 Nov
## 12 ARIMA(KWH) 2014 Dec
## # ℹ 2 more variables: KWH <dist>, .mean <dbl>

The forecast plots show the 12 month forecasts for 2014, overlaid on historical residential power usage data between 1998 and 2013. The shaded regions represent the 80% and 95% prediction intervals, showing the expected range of power usage. The forecasts show a clear seasonal pattern consistent with prior years, with peaks in the winter (January) and late summer (August), consistent with heating/cooling demand cycles. Prediction uncertainty remains moderate and stable across the year. Overall, we can conclude that this is a relatively reliable forecast.

DATA 624 Project 1

Kristin Lussi

2025-10-26

Part A - Forecasting ATM Cash Withdrawals for May 2010

1. Introduction

1a. Goal:

1b. Dataset Overview:

2. Data Preparation

3. Exploratory Data Analysis

3a. Data Summary

3b. Handling Missing Values

3c. Plotting Time Series

3d. Looking Closer at ATM3

3e. Looking Closer at ATM4

3f. Variance Stabilization Check

4. Model Development

4a. Training

4b. Model Overview

4c. Best Model Selection

4d. Residual Diagnostics

ATM1

ATM2

ATM3

ATM4

5. Forecasting

5a. Forecasting

5b. Forecast Plots

6. Results

Part B - Forecasting Power

1. Introduction

1a. Goal

1b. Dataset Overview

2. Data Preparation

3. Exploratory Data Analysis

3a. Data Summary

3b. Handling Missing Values

3c. Plotting the Time Series After Handling Missing Values

3d. Looking Closer at the Outlier

3f. Variance Check

4. Model Development

4a. Training

4b. Model Overview

4c. Best Model Selection

5. Forecasting

5a. Forecasting

5b. Forecast Plots

6. Results