Overview

This project has three parts. Part A forecasts daily ATM cash withdrawals for May 2010. Part B forecasts monthly residential electricity usage for 2014. Part C is the bonus section and forecasts one week of hourly waterflow.

Load Packages

library(readxl)
library(dplyr)
library(tidyr)
library(lubridate)
library(forecast)
library(writexl)
library(zoo)
library(ggplot2)
library(knitr)

Load Data

# read the four Excel files from the same folder as the R file
atm_raw <- read_excel("ATM624Data.xlsx")

power_raw <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")

pipe1_raw <- read_excel("Waterflow_Pipe1.xlsx")

pipe2_raw <- read_excel("Waterflow_Pipe2.xlsx")

# convert the Excel serial date into a proper R date
atm_raw <- atm_raw %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30"))

# convert the Excel numeric date time values into a proper R date time value
pipe1_raw <- pipe1_raw %>%
  mutate(`Date Time` = as.POSIXct(`Date Time` * 86400,
                                  origin = "1899-12-30",
                                  tz = "UTC")) %>%
  mutate(`Date Time` = lubridate::floor_date(`Date Time`, unit = "minute"))

pipe2_raw <- pipe2_raw %>%
  mutate(`Date Time` = as.POSIXct(`Date Time` * 86400,
                                  origin = "1899-12-30",
                                  tz = "UTC")) %>%
  mutate(`Date Time` = lubridate::floor_date(`Date Time`, unit = "minute"))

# quick preview for each dataset
head(atm_raw)

## # A tibble: 6 × 3
##   DATE       ATM    Cash
##   <date>     <chr> <dbl>
## 1 2009-05-01 ATM1     96
## 2 2009-05-01 ATM2    107
## 3 2009-05-02 ATM1     82
## 4 2009-05-02 ATM2     89
## 5 2009-05-03 ATM1     85
## 6 2009-05-03 ATM2     90

head(power_raw)

## # A tibble: 6 × 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377
## 6          738 1998-Jun   6467147

head(pipe1_raw)

## # A tibble: 6 × 2
##   `Date Time`         WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 00:24:00     23.4 
## 2 2015-10-23 00:40:00     28.0 
## 3 2015-10-23 00:53:00     23.1 
## 4 2015-10-23 00:55:00     30.0 
## 5 2015-10-23 01:19:00      6.00
## 6 2015-10-23 01:23:00     15.9

head(pipe2_raw)

## # A tibble: 6 × 2
##   `Date Time`         WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 01:00:00      18.8
## 2 2015-10-23 01:59:00      43.1
## 3 2015-10-23 03:00:00      38.0
## 4 2015-10-23 04:00:00      36.1
## 5 2015-10-23 04:59:00      31.9
## 6 2015-10-23 06:00:00      28.2

Part A

The goal in Part A is to forecast how much cash will be taken out of four different ATM machines for May 2010. The cash variable is recorded in hundreds of dollars.

I treated each ATM as a separate time series because the four machines do not behave the same way. ATM1, ATM2, and ATM4 show short term day of week behavior. ATM3 is very unusual because it is inactive for almost the whole data set and only becomes active very late in the series. Because of that, I did not force one single model onto all four machines.

Clean & Explore The ATM Data

# keep only rows with a valid ATM label and cash value 
atm_data <- atm_raw %>%
  filter(!is.na(ATM), !is.na(Cash)) %>%
  mutate(DATE = as.Date(DATE)) %>%
  arrange(ATM, DATE)

# check the date range and ATM labels
range(atm_data$DATE)

## [1] "2009-05-01" "2010-04-30"

unique(atm_data$ATM)

## [1] "ATM1" "ATM2" "ATM3" "ATM4"

# count the number of rows for each ATM
atm_data %>%
  count(ATM)

## # A tibble: 4 × 2
##   ATM       n
##   <chr> <int>
## 1 ATM1    362
## 2 ATM2    363
## 3 ATM3    365
## 4 ATM4    365

The ATM data runs from May 1, 2009 through April 30, 2010. There are four machines in the file, and each one shows a different pattern of activity.

# plot each ATM separately to compare level, volatility, and unusual behavior
atm_data %>%
  ggplot(aes(x = DATE, y = Cash)) +
  geom_line() +
  facet_wrap(~ ATM, scales = "free_y", ncol = 2) +
  labs(
    title = "ATM daily cash withdrawls",
    x = "Date",
    y = "Cash in hundreds of dollars"
  )

The time plot shows that ATM1, ATM2, and ATM4 each have their own level and pattern. ATM4 has much larger values and more volatility than the other machines. ATM3 is almost all zeros for most of the history, so it behaves more like a newly activated machine than a stable series. ATM4 also contains one very large spike, so I wanted a method that would be less sensitive to extreme values.

Modeling Approach For Part A

For ATM1, ATM2, and ATM4, I used a recent weekday median forecast. This was a practical business style approach because ATM usage often depends on the day of the week, and a recent window keeps the forecast responsive to the most current behavior. I used the median instead of the mean because it is less affected by large unusual withdrawals, especially for ATM4.

For ATM3, I used the average of the last 3 active days. I chose this because the machine had almost no normal operating history. A formal time series model would look more technical, but it would not be very trustworthy when the series is mostly zeros and only turns on at the end.

I did not use one single model for all four ATMs because the series behaved very differently. I also did not use a standard time series model for ATM3 because there was not enough stable history to support it.

Build The ATM Forecasts

# helper function for a weekday median forecast using a recent observations for each ATM
weekday_median_forecast <- function(data, atm_name, future_dates, window_days) {
  recent_data <- data %>%
    filter(ATM == atm_name) %>%
    arrange(DATE) %>%
    tail(window_days) %>%
    mutate(weekday = wday(DATE, week_start = 1, label = TRUE))

  weekday_profile <- recent_data %>%
    group_by(weekday) %>%
    summarise(forecast = median(Cash, na.rm = TRUE), .groups = "drop")

  tibble(Date = future_dates) %>%
    mutate(weekday = wday(Date, week_start = 1, label = TRUE)) %>%
    left_join(weekday_profile, by = "weekday") %>%
    select(Date, forecast)
}

# create the May 2010 forecast dates
may_dates <- seq(as.Date("2010-05-01"), as.Date("2010-05-31"), by = "day")

# ATM1 and ATM2 use the last 56 days
atm1_fc <- weekday_median_forecast(atm_data, "ATM1", may_dates, 56) %>%
  mutate(ATM = "ATM1", Model = "Weekday median of last 56 days")

atm2_fc <- weekday_median_forecast(atm_data, "ATM2", may_dates, 56) %>%
  mutate(ATM = "ATM2", Model = "Weekday median of last 56 days")

# ATM3 uses the mean of the last 3 active days
atm3_level <- atm_data %>%
  filter(ATM == "ATM3", Cash > 0) %>%
  arrange(DATE) %>%
  tail(3) %>%
  summarise(level = mean(Cash, na.rm = TRUE)) %>%
  pull(level)

atm3_fc <- tibble(
  Date = may_dates,
  forecast = atm3_level,
  ATM = "ATM3",
  Model = "Average of last 3 active days"
)

# ATM4 uses a shorter recent window because the series is more volatile
atm4_fc <- weekday_median_forecast(atm_data, "ATM4", may_dates, 28) %>%
  mutate(ATM = "ATM4", Model = "Weekday median of last 28 days")

# combine the four ATM forecasts into one table
atm_final <- bind_rows(atm1_fc, atm2_fc, atm3_fc, atm4_fc) %>%
  rename(Forecast_Cash_Hundreds = forecast) %>%
  select(Date, ATM, Model, Forecast_Cash_Hundreds) %>%
  arrange(Date, ATM)

atm_final %>%
  head(12)

## # A tibble: 12 × 4
##    Date       ATM   Model                          Forecast_Cash_Hundreds
##    <date>     <chr> <chr>                                           <dbl>
##  1 2010-05-01 ATM1  Weekday median of last 56 days                   87.5
##  2 2010-05-01 ATM2  Weekday median of last 56 days                   71  
##  3 2010-05-01 ATM3  Average of last 3 active days                    87.7
##  4 2010-05-01 ATM4  Weekday median of last 28 days                  268. 
##  5 2010-05-02 ATM1  Weekday median of last 56 days                  110. 
##  6 2010-05-02 ATM2  Weekday median of last 56 days                   74.5
##  7 2010-05-02 ATM3  Average of last 3 active days                    87.7
##  8 2010-05-02 ATM4  Weekday median of last 28 days                  412. 
##  9 2010-05-03 ATM1  Weekday median of last 56 days                   68  
## 10 2010-05-03 ATM2  Weekday median of last 56 days                    9  
## 11 2010-05-03 ATM3  Average of last 3 active days                    87.7
## 12 2010-05-03 ATM4  Weekday median of last 28 days                  437.

Part A Forecast Table

The table below shows the first three forecast days for each ATM. The full May forecast will be exported to Excel at the end of the report.

atm_final %>%
  filter(Date <= as.Date("2010-05-03")) %>%
  kable(digits = 2, caption = "Part A forecast sample")

Part A forecast sample
Date	ATM	Model	Forecast_Cash_Hundreds
2010-05-01	ATM1	Weekday median of last 56 days	87.50
2010-05-01	ATM2	Weekday median of last 56 days	71.00
2010-05-01	ATM3	Average of last 3 active days	87.67
2010-05-01	ATM4	Weekday median of last 28 days	268.04
2010-05-02	ATM1	Weekday median of last 56 days	110.50
2010-05-02	ATM2	Weekday median of last 56 days	74.50
2010-05-02	ATM3	Average of last 3 active days	87.67
2010-05-02	ATM4	Weekday median of last 28 days	411.70
2010-05-03	ATM1	Weekday median of last 56 days	68.00
2010-05-03	ATM2	Weekday median of last 56 days	9.00
2010-05-03	ATM3	Average of last 3 active days	87.67
2010-05-03	ATM4	Weekday median of last 28 days	437.13

For Part A, I forecasted each ATM separately because the four machines showed very different behavior. ATM1, ATM2, and ATM4 showed clear day of week structure, so I used a recent weekday median forecast for those machines. This approach was simple, practical, and matched the short term business pattern in the data. It was also less affected by unusual spikes than a mean based forecast, which was especially important for ATM4.

ATM3 was different because it was inactive for almost the whole series and only turned on near the end. Since there was not enough stable operating history for a standard time series model, I used the average of the last 3 active days as a start up style forecast for May 2010. I did not use the same modeling approach for all four ATMs because ATM3 did not have enough normal historical behavior to support it.

Overall, the results suggest that each ATM has its own demand profile, so forecasting them separately was more appropriate than pooling them together. ATM1 and ATM2 show regular weekday movement, ATM3 behaves like a newly activated machine, and ATM4 has the highest scale and the most volatility.

Part B

The goal in Part B is to model residential power usage from January 1998 through December 2013 and forecast the monthly values for 2014.

This series is monthly, long, and seasonal. For utility usage, a model that captures a smooth trend and repeated yearly seasonality is a natural choice.

Clean & Explore The Power Data

# create a monthly date variable and keep the KWH series
power_data <- power_raw %>%
  transmute(
    Date = as.Date(paste0(`YYYY-MMM`, "-01"), format = "%Y-%b-%d"),
    KWH = as.numeric(KWH)
  ) %>%
  arrange(Date)

# check the date range and missing values
range(power_data$Date)

## [1] "1998-01-01" "2013-12-01"

sum(is.na(power_data$KWH))

## [1] 1

# display the missing month if exists
power_data %>%
  filter(is.na(KWH))

## # A tibble: 1 × 2
##   Date         KWH
##   <date>     <dbl>
## 1 2008-09-01    NA

# plot the monthly power series
power_data %>%
  ggplot(aes(x = Date, y = KWH)) +
  geom_line() +
  labs(
    title = "Residential monthly power usage",
    x = "Date",
    y = "KWH"
  )

The plot shows strong seasonality and a changing long term level. There is one missing month in September 2008, so I filled that single gap before fitting the model. The series also shows a very sharp drop around 2010, which appears unusual relative to the rest of the history, but I kept it in the series because it is part of the observed data.

Build The Power Forecast

# convert the data to a monthly time series object
power_ts <- ts(power_data$KWH, start = c(1998, 1), frequency = 12)

# fill the one missing value using interpolation designed for time series
power_ts <- na.interp(power_ts)

# fill the one missing month using time series interpolation
power_fit <- hw(
  power_ts,
  h = 12,
  seasonal = "additive",
  damped = TRUE
)

# create the 2014 forecast table 
power_final <- tibble(
  Month = seq(as.Date("2014-01-01"), as.Date("2014-12-01"), by = "month"),
  Model = "Holt Winters additive trend and additive seasonality",
  Forecast_KWH = as.numeric(power_fit$mean)
)

power_final

## # A tibble: 12 × 3
##    Month      Model                                                Forecast_KWH
##    <date>     <chr>                                                       <dbl>
##  1 2014-01-01 Holt Winters additive trend and additive seasonality     9064999.
##  2 2014-02-01 Holt Winters additive trend and additive seasonality     8153696.
##  3 2014-03-01 Holt Winters additive trend and additive seasonality     7154152.
##  4 2014-04-01 Holt Winters additive trend and additive seasonality     6480311.
##  5 2014-05-01 Holt Winters additive trend and additive seasonality     6239751.
##  6 2014-06-01 Holt Winters additive trend and additive seasonality     7715486.
##  7 2014-07-01 Holt Winters additive trend and additive seasonality     8622397.
##  8 2014-08-01 Holt Winters additive trend and additive seasonality     9505771.
##  9 2014-09-01 Holt Winters additive trend and additive seasonality     8932850.
## 10 2014-10-01 Holt Winters additive trend and additive seasonality     6925018.
## 11 2014-11-01 Holt Winters additive trend and additive seasonality     6181632.
## 12 2014-12-01 Holt Winters additive trend and additive seasonality     7466246.

# plot the historical series with the 2014 forecast
plot(power_fit, main = "Residential power forecast for 2014")

Part B Forecast Table

power_final %>%
  mutate(
    Month = format(Month, "%Y-%m"),
    Forecast_KWH = format(round(Forecast_KWH, 0), big.mark = ",", scientific = FALSE)
  ) %>%
  kable(caption = "Part B 2014 monthly forecast")

Part B 2014 monthly forecast
Month	Model	Forecast_KWH
2014-01	Holt Winters additive trend and additive seasonality	9,064,999
2014-02	Holt Winters additive trend and additive seasonality	8,153,696
2014-03	Holt Winters additive trend and additive seasonality	7,154,152
2014-04	Holt Winters additive trend and additive seasonality	6,480,311
2014-05	Holt Winters additive trend and additive seasonality	6,239,751
2014-06	Holt Winters additive trend and additive seasonality	7,715,486
2014-07	Holt Winters additive trend and additive seasonality	8,622,397
2014-08	Holt Winters additive trend and additive seasonality	9,505,771
2014-09	Holt Winters additive trend and additive seasonality	8,932,850
2014-10	Holt Winters additive trend and additive seasonality	6,925,018
2014-11	Holt Winters additive trend and additive seasonality	6,181,632
2014-12	Holt Winters additive trend and additive seasonality	7,466,246

For Part B, I treated the residential power data as a monthly seasonal time series. After checking the series, I found one missing month and filled it with interpolation so the data would be complete before modeling. I then fit a Holt Winters model with additive trend and additive seasonality because the series shows both a changing long term level and a repeating yearly pattern.

I did not use a nonseasonal model because the seasonal pattern is very clear in the plot, and I did not move to a more complex model because a seasonal smoothing approach already captured the main structure of the data well. The final result is a 12 month forecast for 2014.

Part C

The goal in Part C is to time base sequence the waterflow data, aggregate the readings by hour using the mean, check whether the data is stable enough to forecast, and then produce a one week forecast.

The two waterflow files have different timestamp structures. Pipe1 has irregular timestamps and Pipe2 is already hourly, but I still aggregate both to hourly mean flow so the treatment is consistent.

Clean & Explore The Waterflow Data

# convert the raw timestamps into hourly mean waterflow
prepare_pipe <- function(data) {
  data %>%
    mutate(hour = floor_date(`Date Time`, unit = "hour")) %>%
    group_by(hour) %>%
    summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE), .groups = "drop") %>%
    arrange(hour) %>%
    complete(hour = seq(min(hour), max(hour), by = "hour")) %>%
    mutate(WaterFlow = na.approx(WaterFlow, na.rm = FALSE)) %>%
    mutate(WaterFlow = na.locf(WaterFlow, na.rm = FALSE)) %>%
    mutate(WaterFlow = na.locf(WaterFlow, fromLast = TRUE, na.rm = FALSE))
}

pipe1_hourly <- prepare_pipe(pipe1_raw)
pipe2_hourly <- prepare_pipe(pipe2_raw)

# quick check
head(pipe1_hourly)

## # A tibble: 6 × 2
##   hour                WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 00:00:00      26.1
## 2 2015-10-23 01:00:00      18.9
## 3 2015-10-23 02:00:00      15.2
## 4 2015-10-23 03:00:00      23.1
## 5 2015-10-23 04:00:00      15.5
## 6 2015-10-23 05:00:00      22.7

head(pipe2_hourly)

## # A tibble: 6 × 2
##   hour                WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 01:00:00      30.9
## 2 2015-10-23 02:00:00      34.5
## 3 2015-10-23 03:00:00      38.0
## 4 2015-10-23 04:00:00      34.0
## 5 2015-10-23 05:00:00      31.1
## 6 2015-10-23 06:00:00      28.2

range(pipe1_hourly$hour)

## [1] "2015-10-23 00:00:00 UTC" "2015-11-01 23:00:00 UTC"

range(pipe2_hourly$hour)

## [1] "2015-10-23 01:00:00 UTC" "2015-12-03 16:00:00 UTC"

par(mfrow = c(2, 1))

plot(pipe1_hourly$hour, pipe1_hourly$WaterFlow, type = "l",
     main = "Pipe1 hourly mean flow", xlab = "Hour", ylab = "WaterFlow")

plot(pipe2_hourly$hour, pipe2_hourly$WaterFlow, type = "l",
     main = "Pipe2 hourly mean flow", xlab = "Hour", ylab = "WaterFlow")

par(mfrow = c(1,1))

After hourly aggregation, both pipe series are much easier to work with. Pipe1 has a shorter history and some movement in level. Pipe2 is longer and looks more stable overall.

Build The Waterflow Forecasts

# create hourly time series objects
pipe1_ts <- ts(pipe1_hourly$WaterFlow, frequency = 24)
pipe2_ts <- ts(pipe2_hourly$WaterFlow, frequency = 24)

# check whether differencing is needed before forecasting
pipe_stationarity <- tibble(
  Series = c("Pipe1", "Pipe2"),
  ndiffs = c(ndiffs(pipe1_ts), ndiffs(pipe2_ts)),
  nsdiffs = c(nsdiffs(pipe1_ts), nsdiffs(pipe2_ts))
)

pipe_stationarity %>%
  kable(caption = "Part C stationarity check")

Part C stationarity check
Series	ndiffs	nsdiffs
Pipe1	0	0
Pipe2	1	0

# pipe1 uses Holt linear trend because the level changes over time
pipe1_fit <- holt(pipe1_ts, h = 168)

# pipe2 uses simple exponential smoothing because the level is steadier
pipe2_fit <- ses(pipe2_ts, h = 168)

# create future hourly time stamps for one week ahead
pipe1_future_time <- seq(max(pipe1_hourly$hour) + hours(1), by = "hour", length.out = 168)

pipe2_future_time <- seq(max(pipe2_hourly$hour) + hours(1), by = "hour", length.out = 168)

# build final forecast tables
pipe1_final <- tibble(
  DateTime = pipe1_future_time,
  Model = "Holt linear trend on hourly mean flow",
  Forecast_WaterFlow = as.numeric(pipe1_fit$mean)
)

pipe2_final <- tibble(
  DateTime = pipe2_future_time,
  Model = "Simple exponential smoothing on hourly mean flow",
  Forecast_WaterFlow = as.numeric(pipe2_fit$mean)
)

Part C Forecast Tables

The next tables show the first 12 forecast hours for each pipe. The full 168 hour forecast is exported to Excel at the end of the report.

pipe1_final %>%
  head(12) %>%
  kable(digits = 4, caption = "Part C Pipe1 forecast Sample")

Part C Pipe1 forecast Sample
DateTime	Model	Forecast_WaterFlow
2015-11-02 00:00:00	Holt linear trend on hourly mean flow	20.1320
2015-11-02 01:00:00	Holt linear trend on hourly mean flow	20.1324
2015-11-02 02:00:00	Holt linear trend on hourly mean flow	20.1328
2015-11-02 03:00:00	Holt linear trend on hourly mean flow	20.1333
2015-11-02 04:00:00	Holt linear trend on hourly mean flow	20.1337
2015-11-02 05:00:00	Holt linear trend on hourly mean flow	20.1341
2015-11-02 06:00:00	Holt linear trend on hourly mean flow	20.1345
2015-11-02 07:00:00	Holt linear trend on hourly mean flow	20.1350
2015-11-02 08:00:00	Holt linear trend on hourly mean flow	20.1354
2015-11-02 09:00:00	Holt linear trend on hourly mean flow	20.1358
2015-11-02 10:00:00	Holt linear trend on hourly mean flow	20.1363
2015-11-02 11:00:00	Holt linear trend on hourly mean flow	20.1367

pipe2_final %>%
  head(12) %>%
  kable(digits = 4, caption = "Part C Pipe2 forecast sample")

Part C Pipe2 forecast sample
DateTime	Model	Forecast_WaterFlow
2015-12-03 17:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 18:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 19:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 20:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 21:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 22:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-03 23:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-04 00:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-04 01:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-04 02:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-04 03:00:00	Simple exponential smoothing on hourly mean flow	44.3649
2015-12-04 04:00:00	Simple exponential smoothing on hourly mean flow	44.3649

For Part C, I first converted both waterflow files into hourly time series by taking the mean of all readings within each hour. This created a regular hourly sequence for both pipes. I then checked whether differencing was needed by using ndiffs and nsdiffs. The results suggested that Pipe1 did not need differencing, while Pipe2 may need one regular difference and no seasonal differencing. Since the goal of this part was a practical one week forecast, I kept simple forecasting methods after hourly aggregation rather than moving to a more complex differenced model. For Pipe1, I used Holt linear trend because the hourly series showed some movement in level over time. For Pipe2, I used simple exponential smoothing because the series looked more stable overall. I did not keep the raw irregular timestamps because the assignment required hourly aggregation before forecasting. The forecast starts one hour after the final observed timestamp in each data set.

Export The Final Forecasts To Excel

The next chunk saves all final forecast tables into one Excel workbook. This is useful because the project instructions ask for an Excel readable forecast file.

model_summary <- tibble(
  Part = c("Part A ATM1", "Part A ATM2", "Part A ATM3", "Part A ATM4",
           "Part B Power", "Part C Pipe1", "Part C Pipe2"),
  Chosen_Model = c(
    "Recent weekday median",
    "Recent weekday median",
    "Average of last 3 active days",
    "Recent weekday median",
    "Holt Winters additive trend and additive seasonality",
    "Holt linear trend on hourly mean flow",
    "Simple exponential smoothing on hourly mean flow"
  )
)

# write all forecast output to one workbook
write_xlsx(
  list(
    Model_Summary = model_summary,
    ATM_May2010_Forecast = atm_final,
    Power_2014_Forecast = power_final,
    Pipe1_1Week_Forecast = pipe1_final,
    Pipe2_1Week_Forecast = pipe2_final
  ),
  "Project1_Final_Forecasts.xlsx"
)

Final Conclusion

This project used simple and practical forecasting methods that fit the structure of each data set. The ATM section needed different treatment across machines because the series were not similar. The power section was a classic monthly seasonal forecast problem, so Holt Winters was a strong choice. The waterflow section required hourly aggregation first, then a short horizon forecast using Holt linear trend for Pipe1 and simple exponential smoothing for Pipe2.

Data 624 Project 1

Jayden Jiang

2026-03-23