Introduction

This project applies time series forecasting techniques to two datasets:

Part A: Forecasting cash withdrawals for 4 ATM machines for May 2010
Part B: Forecasting monthly residential power consumption for 2014

The methods used include time series decomposition, exponential smoothing, and ARIMA models.

# Load required libraries
library(tidyverse)
library(readxl)
library(writexl)
library(fpp3)

Part A: ATM Cash Withdrawal Forecasting

1. Data Loading and Exploration

# Load the ATM data
atm_data <- read_excel("ATM624Data.xlsx")

# Convert DATE to proper Date format based on Excel file
atm_data <- atm_data %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30"))

# Display structure
str(atm_data)

## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
##  $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
##  $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
##  $ Cash: num [1:1474] 96 107 82 89 85 90 90 55 99 79 ...

# Preview data
head(atm_data, 10)

## # A tibble: 10 × 3
##    DATE       ATM    Cash
##    <date>     <chr> <dbl>
##  1 2009-05-01 ATM1     96
##  2 2009-05-01 ATM2    107
##  3 2009-05-02 ATM1     82
##  4 2009-05-02 ATM2     89
##  5 2009-05-03 ATM1     85
##  6 2009-05-03 ATM2     90
##  7 2009-05-04 ATM1     90
##  8 2009-05-04 ATM2     55
##  9 2009-05-05 ATM1     99
## 10 2009-05-05 ATM2     79

tail(atm_data, 10)

## # A tibble: 10 × 3
##    DATE       ATM     Cash
##    <date>     <chr>  <dbl>
##  1 2010-04-21 ATM4  557.  
##  2 2010-04-22 ATM4  386.  
##  3 2010-04-23 ATM4  165.  
##  4 2010-04-24 ATM4    5.45
##  5 2010-04-25 ATM4  542.  
##  6 2010-04-26 ATM4  404.  
##  7 2010-04-27 ATM4   13.7 
##  8 2010-04-28 ATM4  348.  
##  9 2010-04-29 ATM4   44.2 
## 10 2010-04-30 ATM4  482.

# Summary
summary(atm_data)

##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1474        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-11-01   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.6  
##  3rd Qu.:2010-02-01                      3rd Qu.:  114.0  
##  Max.   :2010-05-14                      Max.   :10919.8  
##                                          NA's   :19

Missing Values Analysis

# Check for missing values
cat("Missing values by column:\n")

## Missing values by column:

colSums(is.na(atm_data))

## DATE  ATM Cash 
##    0   14   19

# Missing values by ATM
atm_data %>%
  group_by(ATM) %>%
  summarise(
    Total_Records = n(),
    Missing_Cash = sum(is.na(Cash)),
    Pct_Missing = round(100 * Missing_Cash / Total_Records, 2)
  ) %>%
  knitr::kable(caption = "Missing Data Summary by ATM")

Missing Data Summary by ATM
ATM	Total_Records	Missing_Cash	Pct_Missing
ATM1	365	3	0.82
ATM2	365	2	0.55
ATM3	365	0	0.00
ATM4	365	0	0.00
NA	14	14	100.00

2. Data Cleaning

I’ll handle missing values using linear interpolation. For ATM data, this is appropriate because withdrawal patterns are relatively stable over short periods. With only 1% of values missing and gaps appearing to be isolated individual days, interpolating between adjacent observations provides reasonable estimates without introducing significant bias into the analysis.

# Remove rows with missing ATM identifiers
atm_clean <- atm_data %>%
  filter(!is.na(ATM)) %>%
  mutate(DATE = as.Date(DATE))

# For each ATM, fill in missing cash values using linear interpolation
# Using tidyr::fill and manual interpolation
atm_clean <- atm_clean %>%
  group_by(ATM) %>%
  arrange(DATE) %>%
  mutate(
    # Mark positions of non-NA values
    Cash = approx(x = seq_along(Cash), y = Cash, xout = seq_along(Cash), 
                  method = "linear", rule = 2)$y
  ) %>%
  ungroup()

# Check date coverage by ATM
atm_clean %>%
  group_by(ATM) %>%
  summarise(
    Start_Date = format(as.Date(min(DATE)), "%Y-%m-%d"),
    End_Date = format(as.Date(max(DATE)), "%Y-%m-%d"),
    N_Days = n(),
    Mean_Cash = round(mean(Cash, na.rm = TRUE), 2),
    SD_Cash = round(sd(Cash, na.rm = TRUE), 2)
  ) %>%
  knitr::kable(caption = "Date Coverage and Summary Statistics by ATM")

Date Coverage and Summary Statistics by ATM
ATM	Start_Date	End_Date	N_Days	Mean_Cash	SD_Cash
ATM1	2009-05-01	2010-04-30	365	84.15	36.64
ATM2	2009-05-01	2010-04-30	365	62.59	38.81
ATM3	2009-05-01	2010-04-30	365	0.72	7.94
ATM4	2009-05-01	2010-04-30	365	474.04	650.94

3. Time Series Visualization

# Time series plot for all ATMs
ggplot(atm_clean, aes(x = DATE, y = Cash, color = ATM)) +
  geom_line(alpha = 0.7) +
  facet_wrap(~ATM, ncol = 2, scales = "free_y") +
  labs(
    title = "ATM Cash Withdrawals Over Time",
    subtitle = "May 2009 - May 2010",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )

# Box plots by day of week
atm_clean %>%
  mutate(DayOfWeek = wday(DATE, label = TRUE)) %>%
  ggplot(aes(x = DayOfWeek, y = Cash, fill = ATM)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ATM, scales = "free_y") +
  labs(
    title = "Cash Withdrawal Patterns by Day of Week",
    x = "Day of Week",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5)
  )

Looking at the time series plots, ATM1 and ATM2 behave like one might expect from a typical consumer ATM. Withdrawals range from $0-$180 (hundreds of dollars) with predictable weekly patterns throughout the year. Interestingly, both show a consistent dip on Thursdays. Maybe more people get paid Monday or Friday, withdrawing shortly after that. The box plots confirm the weekly cycles are real, with activity picking up on weekends when people are out spending. This predictability makes them perfect candidates for ARIMA forecasting.

ATM3 and ATM4 are a different story altogether. ATM3 is basically flat at zero for nearly the entire year until one random spike around April/May 2010. Something seems off here—the machine might be broken, or it’s in an inaccessible location. ATM4, on the other hand, is operating in a whole different zone We’re talking withdrawals up to $10,000+ (hundreds of dollars)—about 100 times what the consumer ATMs see. There’s no weekly pattern here; instead, we see irregular spikes that suggest “business transactions” rather than everyday banking. This machine is likely serving commercial clients rather than everyday consumers.

4. Time Series Decomposition

I’ll decompose the time series for each ATM to understand the trend, seasonal, and irregular components.

# Convert to tsibble format 
atm_tsibble <- atm_clean %>%
  as_tsibble(key = ATM, index = DATE)

# STL decomposition for each ATM
atm_decomp <- atm_tsibble %>%
  model(
    stl = STL(Cash ~ season(window = 7))
  ) %>%
  components()

# Plot decomposition for all ATMs
atm_decomp %>%
  autoplot() +
  labs(title = "STL Decomposition by ATM") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Individual decomposition plots
for (atm_name in unique(atm_clean$ATM)) {
  p <- atm_decomp %>%
    filter(ATM == atm_name) %>%
    autoplot() +
    labs(title = paste("STL Decomposition -", atm_name)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
}

The STL decomposition breaks down each ATM’s cash withdrawal pattern into three components: trend, weekly seasonality, and remainder (irregular fluctuations). This analysis helps us understand what’s really driving the patterns we see in the raw data.

For ATM1 and ATM2, the decomposition reveals some interesting dynamics. Both machines show a gradually declining trend over the year, with ATM1 dropping from around 100 to about 75 (hundreds of dollars) and ATM2 following a similar pattern from roughly 80 down to 60. This suggests these locations might be experiencing reduced foot traffic or changing customer behavior over time. The seasonal component clearly shows the weekly cycle we noticed earlier—those regular peaks and valleys that repeat every seven days. What’s particularly useful here is seeing how consistent the weekly pattern is throughout the entire year. The remainder component (the random noise) is relatively small for both machines, which is exactly what we want to see. It means most of the variation in the data can be explained by the trend and weekly patterns, making these specific ATMs highly predictable for forecasting purposes.

ATM3’s decomposition tells the story of a machine that’s been sitting idle, like we saw in the earlier visualization. The trend line is flat at near-zero for the entire period until that sudden jump at the end. There’s no meaningful seasonal pattern because there’s no activity to create one. The remainder component only shows action at the very end when the machine finally comes to life. This decomposition confirms what we suspected—this machine wasn’t operational for most of the observation period.

ATM4’s trend component sits around 500-700 (hundreds of dollars) for most of the year, then shows a huge spike in late March 2010 where it jumps to over 1,500. The seasonal component is relatively small compared to the overall scale of transactions. This makes sense, given that this machine’s activity seems to be driven by business needs rather than weekly consumer cycles. The remainder component stands out for ATM4—it’s highly variable, showing irregular large withdrawals. The fact that the irregular component is so much larger than the seasonal component tells us that trying to forecast this ATM based on patterns alone will be challenging. We’d probably get better results if we had additional information about the type of business clients using this machine. If we knew data such as scheduled payroll dates and known large transaction requests, it might inform better forecasting.

5. Model Building and Forecasting

I’ll apply both Exponential Smoothing (ETS) and ARIMA models to each ATM using the fable framework.

5.1 ATM1 Forecasting

# Prepare ATM1 data
atm1_tsibble <- atm_tsibble %>%
  filter(ATM == "ATM1")

# Fit multiple models
atm1_fit <- atm1_tsibble %>%
  model(
    ets = ETS(Cash),
    arima = ARIMA(Cash)
  )

# Model summaries
cat("ATM1 - Model Details Below:\n\n")

## ATM1 - Model Details Below:

report(atm1_fit %>% select(ets))

## Series: Cash 
## Model: ETS(A,N,A) 
##   Smoothing parameters:
##     alpha = 0.02126635 
##     gamma = 0.3298864 
## 
##   Initial states:
##      l[0]      s[0]    s[-1]    s[-2]    s[-3]    s[-4]    s[-5]   s[-6]
##  77.02956 -55.27235 2.531658 16.15413 4.529656 11.40245 12.92571 7.72875
## 
##   sigma^2:  580.2586
## 
##      AIC     AICc      BIC 
## 4487.018 4487.639 4526.017

report(atm1_fit %>% select(arima))

## Series: Cash 
## Model: ARIMA(0,0,1)(0,1,2)[7] 
## 
## Coefficients:
##          ma1     sma1     sma2
##       0.1950  -0.5806  -0.1037
## s.e.  0.0546   0.0506   0.0494
## 
## sigma^2 estimated as 556.4:  log likelihood=-1640.01
## AIC=3288.03   AICc=3288.14   BIC=3303.55

# Generate forecasts (31 days for May 2010)
atm1_fc <- atm1_fit %>%
  forecast(h = 31)

# Check accuracy
cat("\nModel Accuracy Comparison:\n")

## 
## Model Accuracy Comparison:

accuracy(atm1_fit) %>%
  arrange(RMSE) %>%
  knitr::kable(digits = 3)

ATM	.model	.type	ME	RMSE	MAE	MPE	MAPE	MASE	RMSSE	ACF1
ATM1	arima	Training	-0.074	23.262	14.54	-102.259	117.148	0.822	0.838	-0.009
ATM1	ets	Training	-0.048	23.790	15.08	-106.835	121.865	0.852	0.857	0.144

# Plot forecasts
atm1_fc %>%
  autoplot(atm1_tsibble, level = NULL) +
  labs(
    title = "ATM1: 31-Day Forecast Comparison",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Plot ARIMA forecast with intervals
atm1_fc %>%
  filter(.model == "arima") %>%
  autoplot(atm1_tsibble) +
  labs(
    title = "ATM1: May 2010 Forecast (ARIMA)",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Residual diagnostics
atm1_fit %>%
  select(arima) %>%
  gg_tsresiduals()

ATM1 Model Interpretation

The ARIMA model slightly outperforms ETS for ATM1, with an RMSE of 23.262 compared to 23.790. The selected ARIMA(0,0,1)(0,1,2)[7] specification handles the weekly seasonality through seasonal moving average terms at a 7-day lag, while the seasonal differencing (the middle “1” in the seasonal part) accounts for the gradual declining trend we saw in the decomposition.

Looking at the forecast comparison plot, both models produce very similar predictions for May 2010, which makes sense given their comparable accuracy metrics. The ARIMA forecast maintains the weekly cyclical pattern we’ve observed throughout the historical data, with the confidence intervals widening as we move further into the forecast period—this is expected and reflects increasing uncertainty about future values. The forecast appears reasonable, continuing the patterns we’ve seen historically, with the typical weekly oscillations between low and high usage days.

The residual diagnostics look solid. The innovation residuals scatter randomly around zero without any obvious patterns, which is what we want to see. The ACF plot confirms this—nearly all the lags stay within those blue dashed lines, which means there’s no leftover correlation in the residuals. The histogram of residuals is roughly bell-shaped and centered at zero, suggesting the residuals are approximately normally distributed. A few larger outliers appear in the residual plot, but these are relatively rare and don’t seem to follow any systematic pattern. OOverall, the diagnostics tell us that we can be fairly confident in these May 2010 forecasts.

5.2 ATM2 Forecasting

# Prepare ATM2 data
atm2_tsibble <- atm_tsibble %>%
  filter(ATM == "ATM2")

# Fit models
atm2_fit <- atm2_tsibble %>%
  model(
    ets = ETS(Cash),
    arima = ARIMA(Cash)
  )

cat("ATM2 - Selected ARIMA Model Below:\n")

## ATM2 - Selected ARIMA Model Below:

report(atm2_fit %>% select(arima))

## Series: Cash 
## Model: ARIMA(2,0,2)(0,1,1)[7] 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1
##       -0.4320  -0.9130  0.4773  0.8048  -0.7547
## s.e.   0.0553   0.0407  0.0861  0.0556   0.0381
## 
## sigma^2 estimated as 602.5:  log likelihood=-1653.67
## AIC=3319.33   AICc=3319.57   BIC=3342.61

# Generate forecast
atm2_fc <- atm2_fit %>%
  forecast(h = 31)

# Plot ARIMA forecast
atm2_fc %>%
  filter(.model == "arima") %>%
  autoplot(atm2_tsibble) +
  labs(
    title = "ATM2: May 2010 Forecast (ARIMA)",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Residual diagnostics
atm2_fit %>%
  select(arima) %>%
  gg_tsresiduals()

ATM2 Model Interpretation

The ARIMA model for ATM2 handles the weekly patterns and declining trend we saw in the decomposition, similar to what we did with ATM1. The automatic selection process picked a model that best fits ATM2’s particular usage patterns.

The May 2010 forecast continues the weekly up-and-down pattern we’ve seen throughout the year. The forecast looks reasonable and maintains the typical range of variation we’ve observed in the historical data.

The residuals look good—they scatter around zero without any obvious patterns, which tells us the model captured what it needed to from the data. The ACF plot backs this up, with nearly all the lags staying within those bounds. The histogram shows the residuals are roughly normal and centered at zero. There are a few bigger outliers scattered throughout, but they don’t seem to follow any particular pattern or cluster in one time period. Overall, the diagnostics give us good reason to trust these May forecasts.

5.3 ATM3 Forecasting

atm3_tsibble <- atm_tsibble %>%
  filter(ATM == "ATM3")

atm3_fit <- atm3_tsibble %>%
  model(
    arima = ARIMA(Cash)
  )

cat("ATM3 - Selected ARIMA Model Below:\n")

## ATM3 - Selected ARIMA Model Below:

report(atm3_fit)

## Series: Cash 
## Model: ARIMA(0,0,2) 
## 
## Coefficients:
##          ma1     ma2
##       0.8392  0.8557
## s.e.  0.0496  0.0611
## 
## sigma^2 estimated as 25.4:  log likelihood=-1108.69
## AIC=2223.39   AICc=2223.46   BIC=2235.09

atm3_fc <- atm3_fit %>%
  forecast(h = 31)

atm3_fc %>%
  autoplot(atm3_tsibble) +
  labs(
    title = "ATM3: May 2010 Forecast (ARIMA)",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

atm3_fit %>%
  gg_tsresiduals()

ATM3 presents a difficult forecasting situation. With nearly 11 months of zero activity followed by a brief spike in late April, there’s not enough meaningful pattern for the model to learn from. The data is essentially a flat line at zero with one outlier, which makes it challenging to identify any reliable trends or seasonal patterns.

The May 2010 forecast predicts near-zero values, continuing the baseline that dominated the historical period. While this makes sense given what the model saw, it raises real questions about usefulness. If the machine was recently repaired and brought online (as that late April spike suggests), the historical data won’t reflect its future behavior at all. We’d be predicting based on a broken machine’s performance. On the other hand, if it’s still non-functional, we’re forecasting something that won’t generate any cash flow anyway. Before putting any trust into these predictions, ATM3 needs operational investigation to determine if it will actually be functioning in May 2010.

5.4 ATM4 Forecasting

atm4_tsibble <- atm_tsibble %>%
  filter(ATM == "ATM4")

atm4_fit <- atm4_tsibble %>%
  model(
    arima = ARIMA(Cash)
  )

cat("ATM4 - Selected ARIMA Model:\n")

## ATM4 - Selected ARIMA Model:

report(atm4_fit)

## Series: Cash 
## Model: ARIMA(0,0,0) w/ mean 
## 
## Coefficients:
##       constant
##       474.0433
## s.e.   34.0248
## 
## sigma^2 estimated as 423718:  log likelihood=-2882.03
## AIC=5768.06   AICc=5768.1   BIC=5775.86

atm4_fc <- atm4_fit %>%
  forecast(h = 31)

atm4_fc %>%
  autoplot(atm4_tsibble) +
  labs(
    title = "ATM4: May 2010 Forecast (ARIMA)",
    x = "Date",
    y = "Cash (hundreds of dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

atm4_fit %>%
  gg_tsresiduals()

ATM4 is completely different from the other ATMs, as we discussed before, and the model selection reflects that. The automatic ARIMA process selected ARIMA(0,0,0) with mean, essentially just forecasting the average value of $474 (hundreds of dollars). This might seem overly simplistic, but it makes sense given ATM4’s behavior. With huge irregular spikes reaching $10,000+ and no consistent pattern, there’s no predictable structure for a more complex model to capture.

The May 2010 forecast is a flat line at the historical mean of around $474. Given the extreme variability in the transaction history (re: those huge spikes we saw earlier) this simple average-based forecast isn’t particularly useful for cash management planning. The residual diagnostics show that one spike dominates everything, with the histogram heavily concentrated near zero but with extreme outliers. For ATM4, traditional time series forecasting has limited value—better forecasts would require business intelligence about planned large withdrawals from the commercial clients using this machine.

6. Export Part A Forecasts

# Combine all forecasts using ARIMA model
atm_all_forecasts <- bind_rows(
  atm1_fc %>% filter(.model == "arima") %>% as_tibble() %>% mutate(ATM = "ATM1"),
  atm2_fc %>% filter(.model == "arima") %>% as_tibble() %>% mutate(ATM = "ATM2"),
  atm3_fc %>% as_tibble() %>% mutate(ATM = "ATM3"),
  atm4_fc %>% as_tibble() %>% mutate(ATM = "ATM4")
)

# Filter for May 2010 only
atm_forecasts <- atm_all_forecasts %>%
  filter(month(DATE) == 5, year(DATE) == 2010) %>%
  select(DATE, ATM, Cash = .mean) %>%
  mutate(Cash = round(Cash, 2)) %>%
  arrange(ATM, DATE)

# Display forecast summary
atm_forecasts %>%
  group_by(ATM) %>%
  summarise(
    N_Days = n(),
    Mean_Forecast = round(mean(Cash), 2),
    Min_Forecast = round(min(Cash), 2),
    Max_Forecast = round(max(Cash), 2)
  ) %>%
  knitr::kable(caption = "May 2010 Forecast Summary by ATM")

May 2010 Forecast Summary by ATM
ATM	N_Days	Mean_Forecast	Min_Forecast	Max_Forecast
ATM1	31	78.27	8.03	104.06
ATM2	31	57.71	5.29	97.66
ATM3	31	0.13	0.00	2.61
ATM4	31	474.04	474.04	474.04

# Export to Excel
write_xlsx(atm_forecasts, "ATM_Forecast_May2010.xlsx")
cat("\n✓ ATM forecasts exported to: ATM_Forecast_May2010.xlsx\n")

## 
## ✓ ATM forecasts exported to: ATM_Forecast_May2010.xlsx

# Display first few rows
head(atm_forecasts, 20) %>%
  knitr::kable(caption = "Sample of ATM Forecasts")

Sample of ATM Forecasts
DATE	ATM	Cash
2010-05-01	ATM1	87.17
2010-05-02	ATM1	104.06
2010-05-03	ATM1	73.15
2010-05-04	ATM1	8.03
2010-05-05	ATM1	100.01
2010-05-06	ATM1	80.86
2010-05-07	ATM1	86.67
2010-05-08	ATM1	87.96
2010-05-09	ATM1	102.75
2010-05-10	ATM1	73.39
2010-05-11	ATM1	8.59
2010-05-12	ATM1	100.65
2010-05-13	ATM1	80.39
2010-05-14	ATM1	86.93
2010-05-15	ATM1	88.03
2010-05-16	ATM1	102.75
2010-05-17	ATM1	73.39
2010-05-18	ATM1	8.59
2010-05-19	ATM1	100.65
2010-05-20	ATM1	80.39

7. Part A Discussion

Model Selection

I applied both Exponential Smoothing (ETS) and ARIMA models to each ATM, allowing automatic selection of optimal specifications. For ATM1 and ATM2, ARIMA models were selected as they provided better accuracy metrics (lower RMSE), captured weekly seasonality effectively, and passed residual diagnostic tests. These consumer-oriented ATMs showed predictable patterns that the models handled well.

ATM3 and ATM4 presented us with different challenges. ATM3 seemed unused for most of the observation period, with about 11 months of zero activity followed by a brief spike. This left limited meaningful data for forecasting. ATM4’s extreme variability and lack of consistent patterns led to selection of a simple ARIMA(0,0,0) with mean specification, basically forecasting the historical average.

Key Findings

ATM1 and ATM2 handle typical consumer banking with withdrawals in the $0-180 range (hundreds of dollars) and clear weekly patterns. Both show a consistent Thursday dip that probably reflects factors like reduced foot traffic or business closures. These predictable cycles make them well-suited for standard time series forecasting.

ATM4 operates on a completely different scale, with transactions reaching $10,000+ (hundreds of dollars)—roughly 100 times larger than the consumer ATMs. The irregular spikes suggest business-driven activity from commercial clients rather than everyday banking patterns. Traditional forecasting based on historical patterns has limited value for this ATM’s data; better predictions would need more data about commercial clients/business uses.

ATM3’s near-zero activity throughout the year suggests operational issues. This requires investigation before any forecasting can be meaningful.

Part B: Residential Power Consumption Forecasting

1. Data Loading and Exploration

# Load power data
power_data <- read_excel("ResidentialCustomerForecastLoad624.xlsx")

str(power_data)

## tibble [192 × 3] (S3: tbl_df/tbl/data.frame)
##  $ CaseSequence: num [1:192] 733 734 735 736 737 738 739 740 741 742 ...
##  $ YYYY-MMM    : chr [1:192] "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
##  $ KWH         : num [1:192] 6862583 5838198 5420658 5010364 4665377 ...

head(power_data, 12)

## # A tibble: 12 × 3
##    CaseSequence `YYYY-MMM`     KWH
##           <dbl> <chr>        <dbl>
##  1          733 1998-Jan   6862583
##  2          734 1998-Feb   5838198
##  3          735 1998-Mar   5420658
##  4          736 1998-Apr   5010364
##  5          737 1998-May   4665377
##  6          738 1998-Jun   6467147
##  7          739 1998-Jul   8914755
##  8          740 1998-Aug   8607428
##  9          741 1998-Sep   6989888
## 10          742 1998-Oct   6345620
## 11          743 1998-Nov   4640410
## 12          744 1998-Dec   4693479

summary(power_data)

##   CaseSequence     YYYY-MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1

# Check for missing values
cat("\nMissing values:\n")

## 
## Missing values:

colSums(is.na(power_data))

## CaseSequence     YYYY-MMM          KWH 
##            0            0            1

2. Data Preparation

There’s only one missing value in this dataset. Linear interpolation works well here too - monthly power consumption follows seasonal patterns and usually doesn’t have sudden spikes, so estimating the missing month based on the values before and after it makes sense.

# Parse date and handle missing values
power_clean <- power_data %>%
  mutate(Date = yearmonth(as.Date(paste0(`YYYY-MMM`, "-01"), format = "%Y-%b-%d"))) %>%
  arrange(Date)

# Impute single missing value using linear interpolation
power_clean <- power_clean %>%
  mutate(
    KWH = approx(x = seq_along(KWH), y = KWH, xout = seq_along(KWH), 
                 method = "linear", rule = 2)$y
  )

# Verify no missing values remain
cat("Missing values after imputation:", sum(is.na(power_clean$KWH)), "\n\n")

## Missing values after imputation: 0

# Convert to tsibble
power_tsibble <- power_clean %>%
  select(Date, KWH) %>%
  as_tsibble(index = Date)

# Summary statistics
power_tsibble %>%
  summarise(
    Start_Date = format(min(Date), "%Y %b"),
    End_Date = format(max(Date), "%Y %b"),
    N_Months = n(),
    Mean_KWH = round(mean(KWH), 0),
    SD_KWH = round(sd(KWH), 0),
    Min_KWH = round(min(KWH), 0),
    Max_KWH = round(max(KWH), 0)
  ) %>%
  knitr::kable(caption = "Power Consumption Summary", format.args = list(big.mark = ","))

Power Consumption Summary
Date	Start_Date	End_Date	N_Months	Mean_KWH	SD_KWH	Min_KWH	Max_KWH
1998 Jan	1998 Jan	1998 Jan	1	6,862,583	NA	6,862,583	6,862,583
1998 Feb	1998 Feb	1998 Feb	1	5,838,198	NA	5,838,198	5,838,198
1998 Mar	1998 Mar	1998 Mar	1	5,420,658	NA	5,420,658	5,420,658
1998 Apr	1998 Apr	1998 Apr	1	5,010,364	NA	5,010,364	5,010,364
1998 May	1998 May	1998 May	1	4,665,377	NA	4,665,377	4,665,377
1998 Jun	1998 Jun	1998 Jun	1	6,467,147	NA	6,467,147	6,467,147
1998 Jul	1998 Jul	1998 Jul	1	8,914,755	NA	8,914,755	8,914,755
1998 Aug	1998 Aug	1998 Aug	1	8,607,428	NA	8,607,428	8,607,428
1998 Sep	1998 Sep	1998 Sep	1	6,989,888	NA	6,989,888	6,989,888
1998 Oct	1998 Oct	1998 Oct	1	6,345,620	NA	6,345,620	6,345,620
1998 Nov	1998 Nov	1998 Nov	1	4,640,410	NA	4,640,410	4,640,410
1998 Dec	1998 Dec	1998 Dec	1	4,693,479	NA	4,693,479	4,693,479
1999 Jan	1999 Jan	1999 Jan	1	7,183,759	NA	7,183,759	7,183,759
1999 Feb	1999 Feb	1999 Feb	1	5,759,262	NA	5,759,262	5,759,262
1999 Mar	1999 Mar	1999 Mar	1	4,847,656	NA	4,847,656	4,847,656
1999 Apr	1999 Apr	1999 Apr	1	5,306,592	NA	5,306,592	5,306,592
1999 May	1999 May	1999 May	1	4,426,794	NA	4,426,794	4,426,794
1999 Jun	1999 Jun	1999 Jun	1	5,500,901	NA	5,500,901	5,500,901
1999 Jul	1999 Jul	1999 Jul	1	7,444,416	NA	7,444,416	7,444,416
1999 Aug	1999 Aug	1999 Aug	1	7,564,391	NA	7,564,391	7,564,391
1999 Sep	1999 Sep	1999 Sep	1	7,899,368	NA	7,899,368	7,899,368
1999 Oct	1999 Oct	1999 Oct	1	5,358,314	NA	5,358,314	5,358,314
1999 Nov	1999 Nov	1999 Nov	1	4,436,269	NA	4,436,269	4,436,269
1999 Dec	1999 Dec	1999 Dec	1	4,419,229	NA	4,419,229	4,419,229
2000 Jan	2000 Jan	2000 Jan	1	7,068,296	NA	7,068,296	7,068,296
2000 Feb	2000 Feb	2000 Feb	1	5,876,083	NA	5,876,083	5,876,083
2000 Mar	2000 Mar	2000 Mar	1	4,807,961	NA	4,807,961	4,807,961
2000 Apr	2000 Apr	2000 Apr	1	4,873,080	NA	4,873,080	4,873,080
2000 May	2000 May	2000 May	1	5,050,891	NA	5,050,891	5,050,891
2000 Jun	2000 Jun	2000 Jun	1	7,092,865	NA	7,092,865	7,092,865
2000 Jul	2000 Jul	2000 Jul	1	6,862,662	NA	6,862,662	6,862,662
2000 Aug	2000 Aug	2000 Aug	1	7,517,830	NA	7,517,830	7,517,830
2000 Sep	2000 Sep	2000 Sep	1	8,912,169	NA	8,912,169	8,912,169
2000 Oct	2000 Oct	2000 Oct	1	5,844,352	NA	5,844,352	5,844,352
2000 Nov	2000 Nov	2000 Nov	1	5,041,769	NA	5,041,769	5,041,769
2000 Dec	2000 Dec	2000 Dec	1	6,220,334	NA	6,220,334	6,220,334
2001 Jan	2001 Jan	2001 Jan	1	7,538,529	NA	7,538,529	7,538,529
2001 Feb	2001 Feb	2001 Feb	1	6,602,448	NA	6,602,448	6,602,448
2001 Mar	2001 Mar	2001 Mar	1	5,779,180	NA	5,779,180	5,779,180
2001 Apr	2001 Apr	2001 Apr	1	4,835,210	NA	4,835,210	4,835,210
2001 May	2001 May	2001 May	1	4,787,904	NA	4,787,904	4,787,904
2001 Jun	2001 Jun	2001 Jun	1	6,283,324	NA	6,283,324	6,283,324
2001 Jul	2001 Jul	2001 Jul	1	7,855,129	NA	7,855,129	7,855,129
2001 Aug	2001 Aug	2001 Aug	1	8,450,717	NA	8,450,717	8,450,717
2001 Sep	2001 Sep	2001 Sep	1	7,112,069	NA	7,112,069	7,112,069
2001 Oct	2001 Oct	2001 Oct	1	5,242,535	NA	5,242,535	5,242,535
2001 Nov	2001 Nov	2001 Nov	1	4,461,979	NA	4,461,979	4,461,979
2001 Dec	2001 Dec	2001 Dec	1	5,240,995	NA	5,240,995	5,240,995
2002 Jan	2002 Jan	2002 Jan	1	7,099,063	NA	7,099,063	7,099,063
2002 Feb	2002 Feb	2002 Feb	1	6,413,429	NA	6,413,429	6,413,429
2002 Mar	2002 Mar	2002 Mar	1	5,839,514	NA	5,839,514	5,839,514
2002 Apr	2002 Apr	2002 Apr	1	5,371,604	NA	5,371,604	5,371,604
2002 May	2002 May	2002 May	1	5,439,166	NA	5,439,166	5,439,166
2002 Jun	2002 Jun	2002 Jun	1	5,850,383	NA	5,850,383	5,850,383
2002 Jul	2002 Jul	2002 Jul	1	7,039,702	NA	7,039,702	7,039,702
2002 Aug	2002 Aug	2002 Aug	1	8,058,748	NA	8,058,748	8,058,748
2002 Sep	2002 Sep	2002 Sep	1	8,245,227	NA	8,245,227	8,245,227
2002 Oct	2002 Oct	2002 Oct	1	5,865,014	NA	5,865,014	5,865,014
2002 Nov	2002 Nov	2002 Nov	1	4,908,979	NA	4,908,979	4,908,979
2002 Dec	2002 Dec	2002 Dec	1	5,779,958	NA	5,779,958	5,779,958
2003 Jan	2003 Jan	2003 Jan	1	7,256,079	NA	7,256,079	7,256,079
2003 Feb	2003 Feb	2003 Feb	1	6,190,517	NA	6,190,517	6,190,517
2003 Mar	2003 Mar	2003 Mar	1	6,120,626	NA	6,120,626	6,120,626
2003 Apr	2003 Apr	2003 Apr	1	4,885,643	NA	4,885,643	4,885,643
2003 May	2003 May	2003 May	1	5,296,096	NA	5,296,096	5,296,096
2003 Jun	2003 Jun	2003 Jun	1	6,051,571	NA	6,051,571	6,051,571
2003 Jul	2003 Jul	2003 Jul	1	6,900,676	NA	6,900,676	6,900,676
2003 Aug	2003 Aug	2003 Aug	1	8,476,499	NA	8,476,499	8,476,499
2003 Sep	2003 Sep	2003 Sep	1	7,791,791	NA	7,791,791	7,791,791
2003 Oct	2003 Oct	2003 Oct	1	5,344,613	NA	5,344,613	5,344,613
2003 Nov	2003 Nov	2003 Nov	1	4,913,707	NA	4,913,707	4,913,707
2003 Dec	2003 Dec	2003 Dec	1	5,756,193	NA	5,756,193	5,756,193
2004 Jan	2004 Jan	2004 Jan	1	7,584,596	NA	7,584,596	7,584,596
2004 Feb	2004 Feb	2004 Feb	1	6,560,742	NA	6,560,742	6,560,742
2004 Mar	2004 Mar	2004 Mar	1	6,526,586	NA	6,526,586	6,526,586
2004 Apr	2004 Apr	2004 Apr	1	4,831,688	NA	4,831,688	4,831,688
2004 May	2004 May	2004 May	1	4,878,262	NA	4,878,262	4,878,262
2004 Jun	2004 Jun	2004 Jun	1	6,421,614	NA	6,421,614	6,421,614
2004 Jul	2004 Jul	2004 Jul	1	7,307,931	NA	7,307,931	7,307,931
2004 Aug	2004 Aug	2004 Aug	1	7,309,774	NA	7,309,774	7,309,774
2004 Sep	2004 Sep	2004 Sep	1	6,690,366	NA	6,690,366	6,690,366
2004 Oct	2004 Oct	2004 Oct	1	5,444,948	NA	5,444,948	5,444,948
2004 Nov	2004 Nov	2004 Nov	1	4,824,940	NA	4,824,940	4,824,940
2004 Dec	2004 Dec	2004 Dec	1	5,791,208	NA	5,791,208	5,791,208
2005 Jan	2005 Jan	2005 Jan	1	8,225,477	NA	8,225,477	8,225,477
2005 Feb	2005 Feb	2005 Feb	1	6,564,338	NA	6,564,338	6,564,338
2005 Mar	2005 Mar	2005 Mar	1	5,581,725	NA	5,581,725	5,581,725
2005 Apr	2005 Apr	2005 Apr	1	5,563,071	NA	5,563,071	5,563,071
2005 May	2005 May	2005 May	1	4,453,983	NA	4,453,983	4,453,983
2005 Jun	2005 Jun	2005 Jun	1	5,900,212	NA	5,900,212	5,900,212
2005 Jul	2005 Jul	2005 Jul	1	8,337,998	NA	8,337,998	8,337,998
2005 Aug	2005 Aug	2005 Aug	1	7,786,659	NA	7,786,659	7,786,659
2005 Sep	2005 Sep	2005 Sep	1	7,057,213	NA	7,057,213	7,057,213
2005 Oct	2005 Oct	2005 Oct	1	6,694,523	NA	6,694,523	6,694,523
2005 Nov	2005 Nov	2005 Nov	1	4,313,019	NA	4,313,019	4,313,019
2005 Dec	2005 Dec	2005 Dec	1	6,181,548	NA	6,181,548	6,181,548
2006 Jan	2006 Jan	2006 Jan	1	7,793,358	NA	7,793,358	7,793,358
2006 Feb	2006 Feb	2006 Feb	1	5,914,945	NA	5,914,945	5,914,945
2006 Mar	2006 Mar	2006 Mar	1	5,819,734	NA	5,819,734	5,819,734
2006 Apr	2006 Apr	2006 Apr	1	5,255,988	NA	5,255,988	5,255,988
2006 May	2006 May	2006 May	1	4,740,588	NA	4,740,588	4,740,588
2006 Jun	2006 Jun	2006 Jun	1	7,052,275	NA	7,052,275	7,052,275
2006 Jul	2006 Jul	2006 Jul	1	7,945,564	NA	7,945,564	7,945,564
2006 Aug	2006 Aug	2006 Aug	1	8,241,110	NA	8,241,110	8,241,110
2006 Sep	2006 Sep	2006 Sep	1	7,296,355	NA	7,296,355	7,296,355
2006 Oct	2006 Oct	2006 Oct	1	5,104,799	NA	5,104,799	5,104,799
2006 Nov	2006 Nov	2006 Nov	1	4,458,429	NA	4,458,429	4,458,429
2006 Dec	2006 Dec	2006 Dec	1	6,226,214	NA	6,226,214	6,226,214
2007 Jan	2007 Jan	2007 Jan	1	8,031,295	NA	8,031,295	8,031,295
2007 Feb	2007 Feb	2007 Feb	1	7,928,337	NA	7,928,337	7,928,337
2007 Mar	2007 Mar	2007 Mar	1	6,443,170	NA	6,443,170	6,443,170
2007 Apr	2007 Apr	2007 Apr	1	4,841,979	NA	4,841,979	4,841,979
2007 May	2007 May	2007 May	1	4,862,847	NA	4,862,847	4,862,847
2007 Jun	2007 Jun	2007 Jun	1	5,022,647	NA	5,022,647	5,022,647
2007 Jul	2007 Jul	2007 Jul	1	6,426,220	NA	6,426,220	6,426,220
2007 Aug	2007 Aug	2007 Aug	1	7,447,146	NA	7,447,146	7,447,146
2007 Sep	2007 Sep	2007 Sep	1	7,666,970	NA	7,666,970	7,666,970
2007 Oct	2007 Oct	2007 Oct	1	5,785,964	NA	5,785,964	5,785,964
2007 Nov	2007 Nov	2007 Nov	1	4,907,057	NA	4,907,057	4,907,057
2007 Dec	2007 Dec	2007 Dec	1	6,047,292	NA	6,047,292	6,047,292
2008 Jan	2008 Jan	2008 Jan	1	7,964,293	NA	7,964,293	7,964,293
2008 Feb	2008 Feb	2008 Feb	1	7,597,060	NA	7,597,060	7,597,060
2008 Mar	2008 Mar	2008 Mar	1	6,085,644	NA	6,085,644	6,085,644
2008 Apr	2008 Apr	2008 Apr	1	5,352,359	NA	5,352,359	5,352,359
2008 May	2008 May	2008 May	1	4,608,528	NA	4,608,528	4,608,528
2008 Jun	2008 Jun	2008 Jun	1	6,548,439	NA	6,548,439	6,548,439
2008 Jul	2008 Jul	2008 Jul	1	7,643,987	NA	7,643,987	7,643,987
2008 Aug	2008 Aug	2008 Aug	1	8,037,137	NA	8,037,137	8,037,137
2008 Sep	2008 Sep	2008 Sep	1	6,569,470	NA	6,569,470	6,569,470
2008 Oct	2008 Oct	2008 Oct	1	5,101,803	NA	5,101,803	5,101,803
2008 Nov	2008 Nov	2008 Nov	1	4,555,602	NA	4,555,602	4,555,602
2008 Dec	2008 Dec	2008 Dec	1	6,442,746	NA	6,442,746	6,442,746
2009 Jan	2009 Jan	2009 Jan	1	8,072,330	NA	8,072,330	8,072,330
2009 Feb	2009 Feb	2009 Feb	1	6,976,800	NA	6,976,800	6,976,800
2009 Mar	2009 Mar	2009 Mar	1	5,691,452	NA	5,691,452	5,691,452
2009 Apr	2009 Apr	2009 Apr	1	5,531,616	NA	5,531,616	5,531,616
2009 May	2009 May	2009 May	1	5,264,439	NA	5,264,439	5,264,439
2009 Jun	2009 Jun	2009 Jun	1	5,804,433	NA	5,804,433	5,804,433
2009 Jul	2009 Jul	2009 Jul	1	7,713,260	NA	7,713,260	7,713,260
2009 Aug	2009 Aug	2009 Aug	1	8,350,517	NA	8,350,517	8,350,517
2009 Sep	2009 Sep	2009 Sep	1	7,583,146	NA	7,583,146	7,583,146
2009 Oct	2009 Oct	2009 Oct	1	5,566,075	NA	5,566,075	5,566,075
2009 Nov	2009 Nov	2009 Nov	1	5,339,890	NA	5,339,890	5,339,890
2009 Dec	2009 Dec	2009 Dec	1	7,089,880	NA	7,089,880	7,089,880
2010 Jan	2010 Jan	2010 Jan	1	9,397,357	NA	9,397,357	9,397,357
2010 Feb	2010 Feb	2010 Feb	1	8,390,677	NA	8,390,677	8,390,677
2010 Mar	2010 Mar	2010 Mar	1	7,347,915	NA	7,347,915	7,347,915
2010 Apr	2010 Apr	2010 Apr	1	5,776,131	NA	5,776,131	5,776,131
2010 May	2010 May	2010 May	1	4,919,289	NA	4,919,289	4,919,289
2010 Jun	2010 Jun	2010 Jun	1	6,696,292	NA	6,696,292	6,696,292
2010 Jul	2010 Jul	2010 Jul	1	770,523	NA	770,523	770,523
2010 Aug	2010 Aug	2010 Aug	1	7,922,701	NA	7,922,701	7,922,701
2010 Sep	2010 Sep	2010 Sep	1	7,819,472	NA	7,819,472	7,819,472
2010 Oct	2010 Oct	2010 Oct	1	5,875,917	NA	5,875,917	5,875,917
2010 Nov	2010 Nov	2010 Nov	1	4,800,733	NA	4,800,733	4,800,733
2010 Dec	2010 Dec	2010 Dec	1	6,152,583	NA	6,152,583	6,152,583
2011 Jan	2011 Jan	2011 Jan	1	8,394,747	NA	8,394,747	8,394,747
2011 Feb	2011 Feb	2011 Feb	1	8,898,062	NA	8,898,062	8,898,062
2011 Mar	2011 Mar	2011 Mar	1	6,356,903	NA	6,356,903	6,356,903
2011 Apr	2011 Apr	2011 Apr	1	5,685,227	NA	5,685,227	5,685,227
2011 May	2011 May	2011 May	1	5,506,308	NA	5,506,308	5,506,308
2011 Jun	2011 Jun	2011 Jun	1	8,037,779	NA	8,037,779	8,037,779
2011 Jul	2011 Jul	2011 Jul	1	10,093,343	NA	10,093,343	10,093,343
2011 Aug	2011 Aug	2011 Aug	1	10,308,076	NA	10,308,076	10,308,076
2011 Sep	2011 Sep	2011 Sep	1	8,943,599	NA	8,943,599	8,943,599
2011 Oct	2011 Oct	2011 Oct	1	5,603,920	NA	5,603,920	5,603,920
2011 Nov	2011 Nov	2011 Nov	1	6,154,138	NA	6,154,138	6,154,138
2011 Dec	2011 Dec	2011 Dec	1	8,273,142	NA	8,273,142	8,273,142
2012 Jan	2012 Jan	2012 Jan	1	8,991,267	NA	8,991,267	8,991,267
2012 Feb	2012 Feb	2012 Feb	1	7,952,204	NA	7,952,204	7,952,204
2012 Mar	2012 Mar	2012 Mar	1	6,356,961	NA	6,356,961	6,356,961
2012 Apr	2012 Apr	2012 Apr	1	5,569,828	NA	5,569,828	5,569,828
2012 May	2012 May	2012 May	1	5,783,598	NA	5,783,598	5,783,598
2012 Jun	2012 Jun	2012 Jun	1	7,926,956	NA	7,926,956	7,926,956
2012 Jul	2012 Jul	2012 Jul	1	8,886,851	NA	8,886,851	8,886,851
2012 Aug	2012 Aug	2012 Aug	1	9,612,423	NA	9,612,423	9,612,423
2012 Sep	2012 Sep	2012 Sep	1	7,559,148	NA	7,559,148	7,559,148
2012 Oct	2012 Oct	2012 Oct	1	5,576,852	NA	5,576,852	5,576,852
2012 Nov	2012 Nov	2012 Nov	1	5,731,899	NA	5,731,899	5,731,899
2012 Dec	2012 Dec	2012 Dec	1	6,609,694	NA	6,609,694	6,609,694
2013 Jan	2013 Jan	2013 Jan	1	10,655,730	NA	10,655,730	10,655,730
2013 Feb	2013 Feb	2013 Feb	1	7,681,798	NA	7,681,798	7,681,798
2013 Mar	2013 Mar	2013 Mar	1	6,517,514	NA	6,517,514	6,517,514
2013 Apr	2013 Apr	2013 Apr	1	6,105,359	NA	6,105,359	6,105,359
2013 May	2013 May	2013 May	1	5,940,475	NA	5,940,475	5,940,475
2013 Jun	2013 Jun	2013 Jun	1	7,920,627	NA	7,920,627	7,920,627
2013 Jul	2013 Jul	2013 Jul	1	8,415,321	NA	8,415,321	8,415,321
2013 Aug	2013 Aug	2013 Aug	1	9,080,226	NA	9,080,226	9,080,226
2013 Sep	2013 Sep	2013 Sep	1	7,968,220	NA	7,968,220	7,968,220
2013 Oct	2013 Oct	2013 Oct	1	5,759,367	NA	5,759,367	5,759,367
2013 Nov	2013 Nov	2013 Nov	1	5,769,083	NA	5,769,083	5,769,083
2013 Dec	2013 Dec	2013 Dec	1	9,606,304	NA	9,606,304	9,606,304

3. Time Series Visualization

# Time series plot
power_tsibble %>%
  autoplot(KWH) +
  labs(
    title = "Residential Power Consumption (1998-2013)",
    x = "Date",
    y = "Power Consumption (KWH)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

# Seasonal plot
power_tsibble %>%
  gg_season(KWH, labels = "both") +
  labs(
    title = "Seasonal Plot - Power Consumption",
    x = "Month",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

# Subseries plot
power_tsibble %>%
  gg_subseries(KWH) +
  labs(
    title = "Seasonal Subseries Plot",
    subtitle = "Blue line shows the mean for each month",
    x = "Year",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8)  # Angle the labels
  ) +
  scale_y_continuous(labels = scales::comma)

The residential power consumption data from 1998-2013 shows a clear and consistent seasonal pattern. The time series plot reveals regular oscillations throughout the year, with consumption ranging from about 4-5 million KWH in the spring/fall months up to 8-10 million KWH during peak summer and winter periods. There’s a notable anomaly in July 2010 where consumption dropped dramatically to around 770,000 KWH. This appears to be a data recording error or system outage rather than actual consumption, since it’s completely out of line with every other July in the dataset.

The seasonal plot shows that every year follows a similar pattern. Summer months (June-August) consistently show the highest consumption, peaking in July and August when air conditioning demand drives usage up. There’s a secondary peak in winter (December-January) from heating needs, though it’s typically lower than the summer peak. Spring and Fall show the lowest consumption, with April and May being particularly low, as temperatures are mild and central air/heat is not as necessary. The more recent years (2010-2013) in the dataset show slightly higher peaks than previous years, suggesting population growth, increased housing built, and/or changing consumption patterns for other factors. Sometimes power companies raise their prices as well, that could play a role.

The subseries plot breaks down each month across all years, making it easy to spot trends within specific months. August shows the highest average consumption, with January close behind—both averaging around 8.5 million KWH. Both months also display upward trends over time, with August showing considerable year-to-year variability, while January shows a steady increase in recent winters. July also shows high consumption with substantial variability. The spring months (March-May) remain relatively stable across the entire period. The July 2010 outlier is particularly noticeable: it drops dramatically to around 2 million KWH.

4. Time Series Decomposition

# STL decomposition
power_decomp <- power_tsibble %>%
  model(
    stl = STL(KWH ~ season(window = "periodic"))
  ) %>%
  components()

# Plot decomposition
power_decomp %>%
  autoplot() +
  labs(title = "STL Decomposition of Power Consumption") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Calculate seasonal strength
power_features <- power_tsibble %>%
  features(KWH, feat_stl)

cat("Seasonal Strength:", round(power_features$seasonal_strength_year, 3), "\n")

## Seasonal Strength: 0.757

cat("Trend Strength:", round(power_features$trend_strength, 3), "\n")

## Trend Strength: 0.394

cat("Interpretation: Strong seasonal pattern present\n")

## Interpretation: Strong seasonal pattern present

The STL decomposition breaks the power consumption into three components: trend, seasonal pattern, and remainder. The trend component shows a gradual increase from around 6 million KWH in the late 1990s to about 7.5 million KWH by 2013. The increase isn’t linear though. There’s a relatively flat period through most of the 2000s, then a noticeable uptick starting around 2010. This could potentially reflect economic recovery after the 2008 recession.

The seasonal component reveals a consistent pattern that repeats year after year. With a seasonal strength of 0.757, seasonality accounts for approximately 76% of the variation in the data, quite a strong influence. The trend strength of 0.394 is more moderate, indicating that while a long-term trend exists, it’s not as dominant as the seasonal pattern.

The remainder component shows the irregular fluctuations that aren’t captured by trend or seasonality. For most of the series, the remainders are relatively small and randomly scattered around zero, which is exactly what we want to see. This tells us the trend and seasonal components are doing a good job explaining the data. The massive negative spike in July 2010 dominates the remainder plot. This suggests it’s an outlier that doesn’t fit the normal patterns. Besides that one observation, the remainders look okay, with no obvious patterns or clustering.

5. Model Building and Forecasting

5.1 Benchmark Methods

# Fit benchmark models
power_bench <- power_tsibble %>%
  model(
    naive = NAIVE(KWH),
    snaive = SNAIVE(KWH),
    drift = RW(KWH ~ drift())
  )

# Generate forecasts
power_bench_fc <- power_bench %>%
  forecast(h = 12)

# Plot benchmark methods
power_bench_fc %>%
  autoplot(power_tsibble, level = NULL) +
  labs(
    title = "Benchmark Forecasting Methods",
    x = "Year",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

5.2 Exponential Smoothing

# Fit ETS model
power_ets <- power_tsibble %>%
  model(
    ets = ETS(KWH)
  )

cat("Selected ETS Model:\n")

## Selected ETS Model:

report(power_ets)

## Series: KWH 
## Model: ETS(M,N,M) 
##   Smoothing parameters:
##     alpha = 0.1137195 
##     gamma = 0.0001001086 
## 
##   Initial states:
##     l[0]      s[0]     s[-1]     s[-2]    s[-3]    s[-4]    s[-5]     s[-6]
##  6134773 0.9547463 0.7533319 0.8602685 1.191176 1.255577 1.199206 0.9982427
##      s[-7]     s[-8]     s[-9]   s[-10]   s[-11]
##  0.7683657 0.8160779 0.9174797 1.061846 1.223683
## 
##   sigma^2:  0.0144
## 
##      AIC     AICc      BIC 
## 6224.057 6226.784 6272.919

# Generate forecast
power_ets_fc <- power_ets %>%
  forecast(h = 12)

# Plot
power_ets_fc %>%
  autoplot(power_tsibble) +
  labs(
    title = "2014 Power Forecast - ETS Model",
    x = "Year",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

# Check residuals
power_ets %>%
  gg_tsresiduals()

The ETS model automatically selected ETS(M,N,M), which stands for multiplicative errors, no trend, and multiplicative seasonality. This makes sense given the data shows strong seasonal swings but a relatively weak long-term trend. The model uses a low alpha (0.114) for the level, meaning it changes slowly and smooths out short-term fluctuations. The gamma parameter for seasonality is extremely small (0.0001), indicating that the seasonal pattern is very stable and doesn’t need much updating from year to year.

The residuals show that the model accurately captures most of the systematic patterns in the data. The innovation residuals are mostly randomly scattered around zero. The ACF plot shows good behavior at most lags, with nearly all autocorrelations falling within the confidence bounds. The histogram of residuals shows approximate normality with some deviation in the tails, which means the prediction intervals might be slightly optimistic but the model is generally solid for this dataset.

5.3 ARIMA Model

# Fit ARIMA model
power_arima <- power_tsibble %>%
  model(
    arima = ARIMA(KWH)
  )

cat("Selected ARIMA Model Below:\n")

## Selected ARIMA Model Below:

report(power_arima)

## Series: KWH 
## Model: ARIMA(0,0,1)(1,1,1)[12] w/ drift 
## 
## Coefficients:
##          ma1     sar1     sma1   constant
##       0.2439  -0.1518  -0.7311  105259.65
## s.e.  0.0723   0.0963   0.0821   27000.49
## 
## sigma^2 estimated as 7.466e+11:  log likelihood=-2719.89
## AIC=5449.79   AICc=5450.13   BIC=5465.75

# Generate forecast
power_arima_fc <- power_arima %>%
  forecast(h = 12)

# Plot
power_arima_fc %>%
  autoplot(power_tsibble) +
  labs(
    title = "2014 Power Forecast - ARIMA Model",
    x = "Year",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

# Check residuals
power_arima %>%
  gg_tsresiduals()

The ARIMA model specification chosen by automatic selection is ARIMA(0,0,1)(1,1,2)[12] with drift. The seasonal differencing (the middle “1” in the seasonal part) handles the growing trend by looking at year-over-year changes rather than raw levels. The seasonal MA terms capture the repeating annual pattern. The drift term adds that gradual upward trend we see in the decomposition. This combination of features allows the model to adapt to both the long-term growth and the strong seasonal cycles.

The residuals look solid overall. They scatter randomly around zero without any obvious patterns, which suggests the model has extracted the systematic information from the data. The ACF plot is particularly reassuring. Nearly all the autocorrelation lags fall within the confidence bounds, with only a couple of minor exceptions. This indicates the residuals are essentially white noise with no leftover structure that the model missed.

The histogram of residuals is roughly bell-shaped and centered near zero, though it shows some slight departures from perfect normality. There’s a bit of a tail on the positive side, and the distribution looks slightly wider than a normal curve would be. This isn’t a major concern for forecasting purposes, but it does mean the prediction intervals might be slightly optimistic. The extreme outlier from July 2010 appears in the residual plot as a massive negative spike, but it’s an isolated incident that doesn’t affect the overall model quality.

5.4 Model Comparison

# Combine all models
power_all_models <- power_tsibble %>%
  model(
    snaive = SNAIVE(KWH),
    ets = ETS(KWH),
    arima = ARIMA(KWH)
  )

# Compare accuracy metrics
cat("Model Accuracy Comparison:\n\n")

## Model Accuracy Comparison:

accuracy(power_all_models) %>%
  arrange(RMSE) %>%
  select(.model, RMSE, MAE, MAPE) %>%
  knitr::kable(digits = 2, caption = "Model Performance Comparison")

Model Performance Comparison
.model	RMSE	MAE	MAPE
arima	827254.2	493305.9	11.68
ets	835107.0	503972.8	12.04
snaive	1178104.4	696708.7	14.62

# Generate forecasts for all models
power_all_fc <- power_all_models %>%
  forecast(h = 12)

# Plot comparison
power_all_fc %>%
  autoplot(power_tsibble, level = NULL) +
  labs(
    title = "2014 Forecast Comparison",
    x = "Year",
    y = "KWH"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma)

The model comparison reveals that ARIMA significantly outperforms the benchmark methods. The ARIMA model achieves an RMSE of 827,254 KWH and a MAPE of 11.68%, making it the most accurate of the three approaches tested. The ETS model comes in a close second with an RMSE of 835,107 KWH and MAPE of 12.04%. These two sophisticated models perform similarly because they’re both capturing the seasonal dynamics effectively, just through slightly different methods.

The seasonal naive model, which simply uses last year’s value for each month as the forecast, performs noticeably worse with an RMSE of 1,178,104 KWH and MAPE of 14.62%. This makes sense given that there’s an upward trend in the data. Just repeating last year’s values will systematically under-forecast as consumption grows. The fact that seasonal naive still performs somewhat well shows how strong and consistent the seasonal pattern is. Even this simple approach captures much of the variation.

The forecast comparison plot shows all three models projected into 2014. The ARIMA (red) and ETS (green) forecasts are nearly identical and closely overlap, both continuing the seasonal pattern while trending slightly upward. The seasonal naive forecast (blue) also captures the seasonal pattern but sits noticeably higher than the other two models, particularly in the later months of 2014. This occurs because the seasonal naive method simply repeats the values from 2013, which included some exceptionally high consumption months, without adjusting for trend. Unlike ARIMA and ETS models which can incorporate damping to gradually flatten trends over longer forecast horizons (making predictions more conservative as uncertainty increases), the seasonal naive approach mechanically repeats past patterns without any such adjustment, which can lead to overly optimistic or pessimistic forecasts depending on the recent data.

6. Export Part B Forecasts

# Extract ARIMA forecasts
power_forecast <- power_arima_fc %>%
  hilo(level = c(80, 95)) %>%
  unpack_hilo(c("80%", "95%")) %>%
  as_tibble() %>%
  mutate(
    Date = as.Date(Date),
    `YYYY-MMM` = format(Date, "%Y-%b"),
    KWH_Forecast = round(.mean, 0),
    Lower_80 = round(`80%_lower`, 0),
    Upper_80 = round(`80%_upper`, 0),
    Lower_95 = round(`95%_lower`, 0),
    Upper_95 = round(`95%_upper`, 0)
  ) %>%
  select(Date, `YYYY-MMM`, KWH_Forecast, Lower_80, Upper_80, Lower_95, Upper_95)

# Display forecast
power_forecast %>%
  knitr::kable(
    caption = "2014 Monthly Power Consumption Forecast (ARIMA)",
    format.args = list(big.mark = ",")
  )

2014 Monthly Power Consumption Forecast (ARIMA)
Date	YYYY-MMM	KWH_Forecast	Lower_80	Upper_80	Lower_95	Upper_95
2014-01-01	2014-Jan	9,714,361	8,607,037	10,821,686	8,020,854	11,407,868
2014-02-01	2014-Feb	8,181,495	7,041,708	9,321,281	6,438,342	9,924,647
2014-03-01	2014-Mar	6,737,578	5,597,792	7,877,365	4,994,426	8,480,731
2014-04-01	2014-Apr	5,957,336	4,817,549	7,097,122	4,214,183	7,700,488
2014-05-01	2014-May	5,727,983	4,588,197	6,867,770	3,984,831	7,471,136
2014-06-01	2014-Jun	7,534,009	6,394,222	8,673,795	5,790,856	9,277,162
2014-07-01	2014-Jul	7,921,051	6,781,265	9,060,837	6,177,898	9,664,204
2014-08-01	2014-Aug	9,292,410	8,152,624	10,432,196	7,549,257	11,035,563
2014-09-01	2014-Sep	8,182,948	7,043,162	9,322,734	6,439,795	9,926,101
2014-10-01	2014-Oct	6,015,721	4,875,934	7,155,507	4,272,568	7,758,873
2014-11-01	2014-Nov	5,769,225	4,629,439	6,909,011	4,026,072	7,512,378
2014-12-01	2014-Dec	7,484,127	6,344,341	8,623,913	5,740,975	9,227,279

# Export to Excel
write_xlsx(power_forecast, "Power_Forecast_2014.xlsx")
cat("\n✓ Power forecasts exported to: Power_Forecast_2014.xlsx\n")

## 
## ✓ Power forecasts exported to: Power_Forecast_2014.xlsx

The 2014 ARIMA forecast continues the established seasonal pattern while incorporating the gradual upward trend. The model predicts January 2014 around 9.7 million KWH, which is lower than January 2013’s exceptionally high 10.7 million KWH but still elevated compared to earlier years, suggesting the model is applying some dampening to extreme recent values. Summer peaks in August reach approximately 9.3 million KWH, consistent with recent summer consumption patterns.

The spring months show the expected seasonal dip, with April and May forecasted around 5.7-5.9 million KWH. Fall and winter months follow the typical seasonal pattern, with December forecasted around 7.5 million KWH. The forecasts generally trend upward from historical averages while avoiding simply extrapolating the most extreme recent values.

The confidence intervals provide important context for these point forecasts. The 80% intervals range from about ±1 million KWH around the forecasts in early months, widening slightly as we move further into 2014. The 95% intervals span roughly ±1.5 million KWH. These intervals reflect the model’s uncertainty and remind us that while the seasonal pattern is reliable, weather extremes, economic changes, or other factors could easily push actual consumption outside the predicted range for any given month.

7. Part B Discussion

Model Selection

I compared three forecasting approaches: benchmark methods (seasonal naive), ETS (automatically selected model with seasonal components), and ARIMA (selected model to handle both trend and seasonality). The ARIMA model was chosen for the final forecast because it achieved the lowest RMSE, passed residual diagnostic tests with residuals approximating white noise, and successfully captured both the seasonal pattern and underlying trend.

Key Findings

The data exhibits strong seasonality with clear peaks in summer (July-August) for cooling and winter (December-January) for heating. The overall consumption trend is gradually increasing across the 16-year period, though it remained relatively stable through most of the 2000s before picking up around 2010. The ARIMA model successfully captured these seasonal dynamics while accounting for the upward trend. The confidence intervals provide realistic uncertainty bounds that reflect inherent variability in power consumption.

Forecast Interpretation

The 2014 forecasts show expected summer peaks around 9.3 million KWH, winter peaks in January near 9.7 million KWH, and lower consumption in spring and fall around 5.7-6.0 million KWH. These patterns are consistent with historical data but adjusted upward based on recent trends. Time series forecasting works particularly well for residential power consumption because the seasonal patterns are strong and driven by predictable factors like weather and human behavior. However, these forecasts assume conditions similar to recent history. Unusual weather patterns, economic shifts, or changes in energy efficiency could cause actual consumption to diverge from our predictions.

Applied Time Series Analysis: Forecasting ATM Transactions and Energy Demand

Tai Chou-Kudu

10-24-2025

Introduction

Part A: ATM Cash Withdrawal Forecasting

1. Data Loading and Exploration

Missing Values Analysis

2. Data Cleaning

3. Time Series Visualization

4. Time Series Decomposition

5. Model Building and Forecasting

5.1 ATM1 Forecasting

5.2 ATM2 Forecasting

5.3 ATM3 Forecasting

5.4 ATM4 Forecasting

6. Export Part A Forecasts

7. Part A Discussion

Model Selection

Key Findings

Part B: Residential Power Consumption Forecasting

1. Data Loading and Exploration

2. Data Preparation

3. Time Series Visualization

4. Time Series Decomposition

5. Model Building and Forecasting

5.1 Benchmark Methods

5.2 Exponential Smoothing

5.3 ARIMA Model

5.4 Model Comparison

6. Export Part B Forecasts

7. Part B Discussion

Model Selection

Key Findings

Forecast Interpretation