This project has three parts. Part A forecasts daily ATM cash withdrawals for May 2010. Part B forecasts monthly residential electricity usage for 2014. Part C is the bonus section and forecasts one week of hourly waterflow.
library(readxl)
library(dplyr)
library(tidyr)
library(lubridate)
library(forecast)
library(writexl)
library(zoo)
library(ggplot2)
library(knitr)
# read the four Excel files from the same folder as the R file
atm_raw <- read_excel("ATM624Data.xlsx")
power_raw <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")
pipe1_raw <- read_excel("Waterflow_Pipe1.xlsx")
pipe2_raw <- read_excel("Waterflow_Pipe2.xlsx")
# convert the Excel serial date into a proper R date
atm_raw <- atm_raw %>%
mutate(DATE = as.Date(DATE, origin = "1899-12-30"))
# convert the Excel numeric date time values into a proper R date time value
pipe1_raw <- pipe1_raw %>%
mutate(`Date Time` = as.POSIXct(`Date Time` * 86400,
origin = "1899-12-30",
tz = "UTC")) %>%
mutate(`Date Time` = lubridate::floor_date(`Date Time`, unit = "minute"))
pipe2_raw <- pipe2_raw %>%
mutate(`Date Time` = as.POSIXct(`Date Time` * 86400,
origin = "1899-12-30",
tz = "UTC")) %>%
mutate(`Date Time` = lubridate::floor_date(`Date Time`, unit = "minute"))
# quick preview for each dataset
head(atm_raw)
## # A tibble: 6 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-05-01 ATM1 96
## 2 2009-05-01 ATM2 107
## 3 2009-05-02 ATM1 82
## 4 2009-05-02 ATM2 89
## 5 2009-05-03 ATM1 85
## 6 2009-05-03 ATM2 90
head(power_raw)
## # A tibble: 6 × 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
head(pipe1_raw)
## # A tibble: 6 × 2
## `Date Time` WaterFlow
## <dttm> <dbl>
## 1 2015-10-23 00:24:00 23.4
## 2 2015-10-23 00:40:00 28.0
## 3 2015-10-23 00:53:00 23.1
## 4 2015-10-23 00:55:00 30.0
## 5 2015-10-23 01:19:00 6.00
## 6 2015-10-23 01:23:00 15.9
head(pipe2_raw)
## # A tibble: 6 × 2
## `Date Time` WaterFlow
## <dttm> <dbl>
## 1 2015-10-23 01:00:00 18.8
## 2 2015-10-23 01:59:00 43.1
## 3 2015-10-23 03:00:00 38.0
## 4 2015-10-23 04:00:00 36.1
## 5 2015-10-23 04:59:00 31.9
## 6 2015-10-23 06:00:00 28.2
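The Pipe2 preview above shows a few timestamps one minute before the hour (01:59:00 and 04:59:00). One possible explanation, not confirmed against the raw workbook, is that Excel stores date times as fractional days, so a value meant to sit exactly on the hour can land a fraction of a second early after conversion, and flooring to the minute then moves it back a full minute. The sketch below uses a made-up serial value (an assumption, not a value from the file) to show the effect and how rounding to the nearest second would avoid it.

```r
library(lubridate)

# hypothetical Excel serial value: 02:00:00 on 2015-10-23, stored about one millisecond early
serial_value <- 42300 + 2 / 24 - 1e-8

stamp <- as.POSIXct(serial_value * 86400, origin = "1899-12-30", tz = "UTC")

# flooring to the minute pushes the reading back to 01:59:00
floored <- floor_date(stamp, unit = "minute")

# rounding to the nearest second recovers 02:00:00
rounded <- round_date(stamp, unit = "second")

format(floored, "%H:%M:%S")
format(rounded, "%H:%M:%S")
```

If the same thing happened in the Pipe2 file, a reading meant for 02:00 would be averaged into the 01:00 hour during hourly aggregation, and the 02:00 hour would be filled by interpolation instead.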
The goal in Part A is to forecast how much cash will be withdrawn from four ATMs in May 2010. The cash variable is recorded in hundreds of dollars.
I treated each ATM as a separate time series because the four machines do not behave the same way. ATM1, ATM2, and ATM4 show short term day of week behavior. ATM3 is very unusual because it is inactive for almost the whole data set and only becomes active very late in the series. Because of that, I did not force one single model onto all four machines.
# keep only rows with a valid ATM label and cash value
atm_data <- atm_raw %>%
filter(!is.na(ATM), !is.na(Cash)) %>%
mutate(DATE = as.Date(DATE)) %>%
arrange(ATM, DATE)
# check the date range and ATM labels
range(atm_data$DATE)
## [1] "2009-05-01" "2010-04-30"
unique(atm_data$ATM)
## [1] "ATM1" "ATM2" "ATM3" "ATM4"
# count the number of rows for each ATM
atm_data %>%
count(ATM)
## # A tibble: 4 × 2
## ATM n
## <chr> <int>
## 1 ATM1 362
## 2 ATM2 363
## 3 ATM3 365
## 4 ATM4 365
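The ATM data runs from May 1, 2009 through April 30, 2010, a span of 365 days. There are four machines in the file, and each one shows a different pattern of activity. After filtering, ATM1 has 362 rows and ATM2 has 363, so those two series are a few days short of the full span.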
# plot each ATM separately to compare level, volatility, and unusual behavior
atm_data %>%
ggplot(aes(x = DATE, y = Cash)) +
geom_line() +
facet_wrap(~ ATM, scales = "free_y", ncol = 2) +
labs(
title = "ATM daily cash withdrawals",
x = "Date",
y = "Cash in hundreds of dollars"
)
The time plot shows that ATM1, ATM2, and ATM4 each have their own level and pattern. ATM4 has much larger values and more volatility than the other machines. ATM3 is almost all zeros for most of the history, so it behaves more like a newly activated machine than a stable series. ATM4 also contains one very large spike, so I wanted a method that would be less sensitive to extreme values.
For ATM1, ATM2, and ATM4, I used a recent weekday median forecast: each forecast day takes the median of the same weekday within a recent window of observations. This is a practical approach because ATM usage often depends on the day of the week, and a recent window keeps the forecast responsive to current behavior. I used the median instead of the mean because it is less affected by large unusual withdrawals, especially for ATM4.
For ATM3, I used the average of the last 3 active days. I chose this because the machine had almost no normal operating history. A formal time series model would look more technical, but it would not be very trustworthy when the series is mostly zeros and only turns on at the end.
I did not use one single model for all four ATMs because the series behaved very differently. I also did not use a standard time series model for ATM3 because there was not enough stable history to support it.
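To illustrate why the median is the safer summary when a window contains a spike, here is a small sketch with made-up numbers. The values are hypothetical and are not taken from the ATM file.

```r
# hypothetical withdrawals for one weekday across eight recent weeks, in hundreds of dollars
# the 1500 value stands in for a single extreme spike
same_weekday_cash <- c(310, 295, 330, 1500, 305, 320, 300, 315)

mean_forecast <- mean(same_weekday_cash)
median_forecast <- median(same_weekday_cash)

c(mean = mean_forecast, median = median_forecast)
```

With these numbers the mean is 459.375 while the median is 312.5, so one unusual day moves the mean based forecast by almost half while the median stays close to the typical level.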
# helper function: weekday median forecast from the most recent observations for one ATM
# note: tail() keeps the last `window_days` rows, which can span more calendar days if dates are missing
weekday_median_forecast <- function(data, atm_name, future_dates, window_days) {
  recent_data <- data %>%
    filter(ATM == atm_name) %>%
    arrange(DATE) %>%
    tail(window_days) %>%
    mutate(weekday = wday(DATE, week_start = 1, label = TRUE))
  weekday_profile <- recent_data %>%
    group_by(weekday) %>%
    summarise(forecast = median(Cash, na.rm = TRUE), .groups = "drop")
  tibble(Date = future_dates) %>%
    mutate(weekday = wday(Date, week_start = 1, label = TRUE)) %>%
    left_join(weekday_profile, by = "weekday") %>%
    select(Date, forecast)
}
# create the May 2010 forecast dates
may_dates <- seq(as.Date("2010-05-01"), as.Date("2010-05-31"), by = "day")
# ATM1 and ATM2 use the last 56 days
atm1_fc <- weekday_median_forecast(atm_data, "ATM1", may_dates, 56) %>%
mutate(ATM = "ATM1", Model = "Weekday median of last 56 days")
atm2_fc <- weekday_median_forecast(atm_data, "ATM2", may_dates, 56) %>%
mutate(ATM = "ATM2", Model = "Weekday median of last 56 days")
# ATM3 uses the mean of the last 3 active days
atm3_level <- atm_data %>%
filter(ATM == "ATM3", Cash > 0) %>%
arrange(DATE) %>%
tail(3) %>%
summarise(level = mean(Cash, na.rm = TRUE)) %>%
pull(level)
atm3_fc <- tibble(
Date = may_dates,
forecast = atm3_level,
ATM = "ATM3",
Model = "Average of last 3 active days"
)
# ATM4 uses a shorter recent window because the series is more volatile
atm4_fc <- weekday_median_forecast(atm_data, "ATM4", may_dates, 28) %>%
mutate(ATM = "ATM4", Model = "Weekday median of last 28 days")
# combine the four ATM forecasts into one table
atm_final <- bind_rows(atm1_fc, atm2_fc, atm3_fc, atm4_fc) %>%
rename(Forecast_Cash_Hundreds = forecast) %>%
select(Date, ATM, Model, Forecast_Cash_Hundreds) %>%
arrange(Date, ATM)
atm_final %>%
head(12)
## # A tibble: 12 × 4
## Date ATM Model Forecast_Cash_Hundreds
## <date> <chr> <chr> <dbl>
## 1 2010-05-01 ATM1 Weekday median of last 56 days 87.5
## 2 2010-05-01 ATM2 Weekday median of last 56 days 71
## 3 2010-05-01 ATM3 Average of last 3 active days 87.7
## 4 2010-05-01 ATM4 Weekday median of last 28 days 268.
## 5 2010-05-02 ATM1 Weekday median of last 56 days 110.
## 6 2010-05-02 ATM2 Weekday median of last 56 days 74.5
## 7 2010-05-02 ATM3 Average of last 3 active days 87.7
## 8 2010-05-02 ATM4 Weekday median of last 28 days 412.
## 9 2010-05-03 ATM1 Weekday median of last 56 days 68
## 10 2010-05-03 ATM2 Weekday median of last 56 days 9
## 11 2010-05-03 ATM3 Average of last 3 active days 87.7
## 12 2010-05-03 ATM4 Weekday median of last 28 days 437.
The table below shows the first three forecast days for each ATM. The full May forecast will be exported to Excel at the end of the report.
atm_final %>%
filter(Date <= as.Date("2010-05-03")) %>%
kable(digits = 2, caption = "Part A forecast sample")
| Date | ATM | Model | Forecast_Cash_Hundreds |
|---|---|---|---|
| 2010-05-01 | ATM1 | Weekday median of last 56 days | 87.50 |
| 2010-05-01 | ATM2 | Weekday median of last 56 days | 71.00 |
| 2010-05-01 | ATM3 | Average of last 3 active days | 87.67 |
| 2010-05-01 | ATM4 | Weekday median of last 28 days | 268.04 |
| 2010-05-02 | ATM1 | Weekday median of last 56 days | 110.50 |
| 2010-05-02 | ATM2 | Weekday median of last 56 days | 74.50 |
| 2010-05-02 | ATM3 | Average of last 3 active days | 87.67 |
| 2010-05-02 | ATM4 | Weekday median of last 28 days | 411.70 |
| 2010-05-03 | ATM1 | Weekday median of last 56 days | 68.00 |
| 2010-05-03 | ATM2 | Weekday median of last 56 days | 9.00 |
| 2010-05-03 | ATM3 | Average of last 3 active days | 87.67 |
| 2010-05-03 | ATM4 | Weekday median of last 28 days | 437.13 |
For Part A, I forecasted each ATM separately because the four machines showed very different behavior. ATM1, ATM2, and ATM4 showed clear day of week structure, so I used a recent weekday median forecast for those machines. This approach was simple, practical, and matched the short term business pattern in the data. It was also less affected by unusual spikes than a mean based forecast, which was especially important for ATM4.
ATM3 was different because it was inactive for almost the whole series and only turned on near the end. Since there was not enough stable operating history for a standard time series model, I used the average of the last 3 active days as a start up style forecast for May 2010. I did not use the same modeling approach for all four ATMs because ATM3 did not have enough normal historical behavior to support it.
Overall, the results suggest that each ATM has its own demand profile, so forecasting them separately was more appropriate than pooling them together. ATM1 and ATM2 show regular weekday movement, ATM3 behaves like a newly activated machine, and ATM4 has the highest scale and the most volatility.
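One way to check whether a weekday median adds value over a single flat level is a simple holdout test. The sketch below is only an illustration on a synthetic series with an assumed weekday profile (it does not use the ATM data, and the variable names are hypothetical): it holds out the last 28 days, builds the weekday profile from the 56 days before them, and compares mean absolute error against an overall median.

```r
set.seed(624)

# synthetic daily series: 12 weeks with a fixed weekday profile plus noise (not the ATM data)
dates <- seq(as.Date("2010-02-06"), by = "day", length.out = 84)
weekday <- format(dates, "%u")  # "1" = Monday ... "7" = Sunday
weekday_level <- c("1" = 70, "2" = 85, "3" = 90, "4" = 20, "5" = 100, "6" = 95, "7" = 110)
cash <- unname(weekday_level[weekday]) + rnorm(84, sd = 5)

# hold out the last 28 days and build the weekday profile from the 56 days before them
train <- 1:56
test <- 57:84
profile <- tapply(cash[train], weekday[train], median)

# weekday median forecast versus one overall median for every day
weekday_pred <- as.numeric(profile[weekday[test]])
flat_pred <- median(cash[train])

mae_weekday <- mean(abs(cash[test] - weekday_pred))
mae_flat <- mean(abs(cash[test] - flat_pred))

c(weekday_median = mae_weekday, overall_median = mae_flat)
```

The same split could be applied to each real ATM series to compare the 28 day and 56 day windows before settling on one.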
The goal in Part B is to model residential power usage from January 1998 through December 2013 and forecast the monthly values for 2014.
This series is monthly, long, and seasonal. For utility usage, a model that captures a smooth trend and repeated yearly seasonality is a natural choice.
# create a monthly date variable and keep the KWH series
power_data <- power_raw %>%
transmute(
Date = as.Date(paste0(`YYYY-MMM`, "-01"), format = "%Y-%b-%d"),
KWH = as.numeric(KWH)
) %>%
arrange(Date)
# check the date range and missing values
range(power_data$Date)
## [1] "1998-01-01" "2013-12-01"
sum(is.na(power_data$KWH))
## [1] 1
# display the missing month, if one exists
power_data %>%
filter(is.na(KWH))
## # A tibble: 1 × 2
## Date KWH
## <date> <dbl>
## 1 2008-09-01 NA
# plot the monthly power series
power_data %>%
ggplot(aes(x = Date, y = KWH)) +
geom_line() +
labs(
title = "Residential monthly power usage",
x = "Date",
y = "KWH"
)
The plot shows strong seasonality and a changing long term level. There is one missing month in September 2008, so I filled that single gap before fitting the model. The series also shows a very sharp drop around 2010, which appears unusual relative to the rest of the history, but I kept it in the series because it is part of the observed data.
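As a rough illustration of how a single gap in a seasonal series can be filled, the sketch below removes one month from a synthetic monthly series (not the power data) and fills it with `na.interp()` from the forecast package, which for seasonal series interpolates on the seasonally adjusted data and then adds the seasonal component back. A straight line between the two neighbouring months is included for comparison.

```r
library(forecast)

# synthetic monthly series with a yearly cycle (not the power data)
set.seed(624)
months <- 1:72
usage <- 100 + 20 * sin(2 * pi * months / 12) + rnorm(72, sd = 2)
usage_ts <- ts(usage, start = c(2005, 1), frequency = 12)

# remove one month to imitate the single gap
true_value <- usage_ts[45]
usage_ts[45] <- NA

# seasonal aware fill from the forecast package
filled_ts <- na.interp(usage_ts)

# a straight line between the two neighbours, for comparison
linear_value <- mean(c(usage_ts[44], usage_ts[46]))

c(true = true_value, seasonal_fill = filled_ts[45], linear_fill = linear_value)
```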
# convert the data to a monthly time series object
power_ts <- ts(power_data$KWH, start = c(1998, 1), frequency = 12)
# fill the one missing value using interpolation designed for time series
power_ts <- na.interp(power_ts)
# fit Holt Winters with additive seasonality and a damped additive trend
power_fit <- hw(
power_ts,
h = 12,
seasonal = "additive",
damped = TRUE
)
# create the 2014 forecast table
power_final <- tibble(
Month = seq(as.Date("2014-01-01"), as.Date("2014-12-01"), by = "month"),
Model = "Holt Winters additive trend and additive seasonality",
Forecast_KWH = as.numeric(power_fit$mean)
)
power_final
## # A tibble: 12 × 3
## Month Model Forecast_KWH
## <date> <chr> <dbl>
## 1 2014-01-01 Holt Winters additive trend and additive seasonality 9064999.
## 2 2014-02-01 Holt Winters additive trend and additive seasonality 8153696.
## 3 2014-03-01 Holt Winters additive trend and additive seasonality 7154152.
## 4 2014-04-01 Holt Winters additive trend and additive seasonality 6480311.
## 5 2014-05-01 Holt Winters additive trend and additive seasonality 6239751.
## 6 2014-06-01 Holt Winters additive trend and additive seasonality 7715486.
## 7 2014-07-01 Holt Winters additive trend and additive seasonality 8622397.
## 8 2014-08-01 Holt Winters additive trend and additive seasonality 9505771.
## 9 2014-09-01 Holt Winters additive trend and additive seasonality 8932850.
## 10 2014-10-01 Holt Winters additive trend and additive seasonality 6925018.
## 11 2014-11-01 Holt Winters additive trend and additive seasonality 6181632.
## 12 2014-12-01 Holt Winters additive trend and additive seasonality 7466246.
# plot the historical series with the 2014 forecast
plot(power_fit, main = "Residential power forecast for 2014")
power_final %>%
mutate(
Month = format(Month, "%Y-%m"),
Forecast_KWH = format(round(Forecast_KWH, 0), big.mark = ",", scientific = FALSE)
) %>%
kable(caption = "Part B 2014 monthly forecast")
| Month | Model | Forecast_KWH |
|---|---|---|
| 2014-01 | Holt Winters additive trend and additive seasonality | 9,064,999 |
| 2014-02 | Holt Winters additive trend and additive seasonality | 8,153,696 |
| 2014-03 | Holt Winters additive trend and additive seasonality | 7,154,152 |
| 2014-04 | Holt Winters additive trend and additive seasonality | 6,480,311 |
| 2014-05 | Holt Winters additive trend and additive seasonality | 6,239,751 |
| 2014-06 | Holt Winters additive trend and additive seasonality | 7,715,486 |
| 2014-07 | Holt Winters additive trend and additive seasonality | 8,622,397 |
| 2014-08 | Holt Winters additive trend and additive seasonality | 9,505,771 |
| 2014-09 | Holt Winters additive trend and additive seasonality | 8,932,850 |
| 2014-10 | Holt Winters additive trend and additive seasonality | 6,925,018 |
| 2014-11 | Holt Winters additive trend and additive seasonality | 6,181,632 |
| 2014-12 | Holt Winters additive trend and additive seasonality | 7,466,246 |
For Part B, I treated the residential power data as a monthly seasonal time series. After checking the series, I found one missing month and filled it with interpolation so the data would be complete before modeling. I then fit a Holt Winters model with a damped additive trend and additive seasonality because the series shows both a changing long term level and a repeating yearly pattern. Damping flattens the trend as the horizon grows, which keeps the 2014 forecast from extrapolating the recent slope too far.
I did not use a nonseasonal model because the seasonal pattern is very clear in the plot, and I did not move to a more complex model because a seasonal smoothing approach already captured the main structure of the data well. The final result is a 12 month forecast for 2014.
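A holdout comparison is one way to back up the claim that seasonal smoothing captures the main structure. The sketch below is an illustration only: it uses `USAccDeaths`, a monthly series that ships with R, as a stand in for the power data, holds out the final year, and compares the same `hw()` settings against a seasonal naive benchmark.

```r
library(forecast)

# USAccDeaths is a built-in monthly series; it stands in for the power data here
train <- window(USAccDeaths, end = c(1977, 12))
test <- window(USAccDeaths, start = c(1978, 1))

# same settings as the report: additive seasonality with a damped trend
hw_fc <- hw(train, h = length(test), seasonal = "additive", damped = TRUE)

# seasonal naive benchmark: repeat the same month from the previous year
naive_fc <- snaive(train, h = length(test))

# compare accuracy on the held-out year
rbind(
  holt_winters = accuracy(hw_fc, test)["Test set", c("RMSE", "MAE")],
  seasonal_naive = accuracy(naive_fc, test)["Test set", c("RMSE", "MAE")]
)
```

Running the same split on `power_ts` (for example, holding out 2013) would show whether the damped Holt Winters fit actually beats the benchmark on this series.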
The goal in Part C is to put the waterflow readings on a regular time base, aggregate them to hourly means, check whether each series is stable enough to forecast, and then produce a one week forecast.
The two waterflow files have different timestamp structures. Pipe1 has irregular timestamps, while Pipe2 is recorded roughly once an hour. I still aggregated both to hourly mean flow so the treatment is consistent.
# convert the raw timestamps into hourly mean waterflow
prepare_pipe <- function(data) {
  data %>%
    # assign each reading to the hour it falls in and average within the hour
    mutate(hour = floor_date(`Date Time`, unit = "hour")) %>%
    group_by(hour) %>%
    summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE), .groups = "drop") %>%
    arrange(hour) %>%
    # add a row for any hour with no readings so the sequence is regular
    complete(hour = seq(min(hour), max(hour), by = "hour")) %>%
    # fill interior gaps by linear interpolation, then carry values into any remaining edge gaps
    mutate(WaterFlow = na.approx(WaterFlow, na.rm = FALSE)) %>%
    mutate(WaterFlow = na.locf(WaterFlow, na.rm = FALSE)) %>%
    mutate(WaterFlow = na.locf(WaterFlow, fromLast = TRUE, na.rm = FALSE))
}
pipe1_hourly <- prepare_pipe(pipe1_raw)
pipe2_hourly <- prepare_pipe(pipe2_raw)
# quick check
head(pipe1_hourly)
## # A tibble: 6 × 2
## hour WaterFlow
## <dttm> <dbl>
## 1 2015-10-23 00:00:00 26.1
## 2 2015-10-23 01:00:00 18.9
## 3 2015-10-23 02:00:00 15.2
## 4 2015-10-23 03:00:00 23.1
## 5 2015-10-23 04:00:00 15.5
## 6 2015-10-23 05:00:00 22.7
head(pipe2_hourly)
## # A tibble: 6 × 2
## hour WaterFlow
## <dttm> <dbl>
## 1 2015-10-23 01:00:00 30.9
## 2 2015-10-23 02:00:00 34.5
## 3 2015-10-23 03:00:00 38.0
## 4 2015-10-23 04:00:00 34.0
## 5 2015-10-23 05:00:00 31.1
## 6 2015-10-23 06:00:00 28.2
range(pipe1_hourly$hour)
## [1] "2015-10-23 00:00:00 UTC" "2015-11-01 23:00:00 UTC"
range(pipe2_hourly$hour)
## [1] "2015-10-23 01:00:00 UTC" "2015-12-03 16:00:00 UTC"
par(mfrow = c(2, 1))
plot(pipe1_hourly$hour, pipe1_hourly$WaterFlow, type = "l",
main = "Pipe1 hourly mean flow", xlab = "Hour", ylab = "WaterFlow")
plot(pipe2_hourly$hour, pipe2_hourly$WaterFlow, type = "l",
main = "Pipe2 hourly mean flow", xlab = "Hour", ylab = "WaterFlow")
par(mfrow = c(1,1))
After hourly aggregation, both pipe series are much easier to work with. Pipe1 has a shorter history and some movement in level. Pipe2 is longer and looks more stable overall.
# create hourly time series objects
pipe1_ts <- ts(pipe1_hourly$WaterFlow, frequency = 24)
pipe2_ts <- ts(pipe2_hourly$WaterFlow, frequency = 24)
# check whether differencing is needed before forecasting
pipe_stationarity <- tibble(
Series = c("Pipe1", "Pipe2"),
ndiffs = c(ndiffs(pipe1_ts), ndiffs(pipe2_ts)),
nsdiffs = c(nsdiffs(pipe1_ts), nsdiffs(pipe2_ts))
)
pipe_stationarity %>%
kable(caption = "Part C stationarity check")
| Series | ndiffs | nsdiffs |
|---|---|---|
| Pipe1 | 0 | 0 |
| Pipe2 | 1 | 0 |
# pipe1 uses Holt linear trend because the level changes over time
pipe1_fit <- holt(pipe1_ts, h = 168)
# pipe2 uses simple exponential smoothing because the level is steadier
pipe2_fit <- ses(pipe2_ts, h = 168)
# create future hourly time stamps for one week ahead
pipe1_future_time <- seq(max(pipe1_hourly$hour) + hours(1), by = "hour", length.out = 168)
pipe2_future_time <- seq(max(pipe2_hourly$hour) + hours(1), by = "hour", length.out = 168)
# build final forecast tables
pipe1_final <- tibble(
DateTime = pipe1_future_time,
Model = "Holt linear trend on hourly mean flow",
Forecast_WaterFlow = as.numeric(pipe1_fit$mean)
)
pipe2_final <- tibble(
DateTime = pipe2_future_time,
Model = "Simple exponential smoothing on hourly mean flow",
Forecast_WaterFlow = as.numeric(pipe2_fit$mean)
)
The next tables show the first 12 forecast hours for each pipe. The full 168 hour forecast is exported to Excel at the end of the report.
pipe1_final %>%
head(12) %>%
kable(digits = 4, caption = "Part C Pipe1 forecast sample")
| DateTime | Model | Forecast_WaterFlow |
|---|---|---|
| 2015-11-02 00:00:00 | Holt linear trend on hourly mean flow | 20.1320 |
| 2015-11-02 01:00:00 | Holt linear trend on hourly mean flow | 20.1324 |
| 2015-11-02 02:00:00 | Holt linear trend on hourly mean flow | 20.1328 |
| 2015-11-02 03:00:00 | Holt linear trend on hourly mean flow | 20.1333 |
| 2015-11-02 04:00:00 | Holt linear trend on hourly mean flow | 20.1337 |
| 2015-11-02 05:00:00 | Holt linear trend on hourly mean flow | 20.1341 |
| 2015-11-02 06:00:00 | Holt linear trend on hourly mean flow | 20.1345 |
| 2015-11-02 07:00:00 | Holt linear trend on hourly mean flow | 20.1350 |
| 2015-11-02 08:00:00 | Holt linear trend on hourly mean flow | 20.1354 |
| 2015-11-02 09:00:00 | Holt linear trend on hourly mean flow | 20.1358 |
| 2015-11-02 10:00:00 | Holt linear trend on hourly mean flow | 20.1363 |
| 2015-11-02 11:00:00 | Holt linear trend on hourly mean flow | 20.1367 |
pipe2_final %>%
head(12) %>%
kable(digits = 4, caption = "Part C Pipe2 forecast sample")
| DateTime | Model | Forecast_WaterFlow |
|---|---|---|
| 2015-12-03 17:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 18:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 19:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 20:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 21:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 22:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-03 23:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-04 00:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-04 01:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-04 02:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-04 03:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
| 2015-12-04 04:00:00 | Simple exponential smoothing on hourly mean flow | 44.3649 |
For Part C, I first converted both waterflow files into hourly time series by taking the mean of all readings within each hour, which created a regular hourly sequence for both pipes. I did not keep the raw irregular timestamps because the assignment required hourly aggregation before forecasting.
I then checked whether differencing was needed by using ndiffs and nsdiffs. The results suggested that Pipe1 did not need differencing, while Pipe2 may need one regular difference and no seasonal differencing. Since the goal of this part was a practical one week forecast, I kept simple forecasting methods rather than moving to a more complex differenced model. Simple exponential smoothing is still consistent with the Pipe2 result, because it corresponds to an ARIMA(0,1,1) model and so already includes one regular difference.
For Pipe1, I used Holt linear trend because the hourly series showed some movement in level over time. For Pipe2, I used simple exponential smoothing because the series looked more stable overall. The forecast starts one hour after the final observed timestamp in each data set.
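The Pipe2 forecast table repeats a single value because simple exponential smoothing carries the last estimated level forward at every horizon. The sketch below shows this on a synthetic hourly series (not the pipe data), under the assumption of a steady level with random noise.

```r
library(forecast)

# synthetic hourly series around a steady level (not the pipe data)
set.seed(624)
flow <- ts(40 + rnorm(240, sd = 5), frequency = 24)

# simple exponential smoothing forecast for one week of hours
flow_fc <- ses(flow, h = 168)

# the smallest and largest point forecasts are the same value
range(flow_fc$mean)
```

Holt linear trend behaves the same way except that it adds a constant slope per step, which is why the Pipe1 forecast rises by the same small amount each hour.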
The next chunk saves all final forecast tables into one Excel workbook. This is useful because the project instructions ask for an Excel readable forecast file.
model_summary <- tibble(
Part = c("Part A ATM1", "Part A ATM2", "Part A ATM3", "Part A ATM4",
"Part B Power", "Part C Pipe1", "Part C Pipe2"),
Chosen_Model = c(
"Recent weekday median",
"Recent weekday median",
"Average of last 3 active days",
"Recent weekday median",
"Holt Winters additive trend and additive seasonality",
"Holt linear trend on hourly mean flow",
"Simple exponential smoothing on hourly mean flow"
)
)
# write all forecast output to one workbook
write_xlsx(
list(
Model_Summary = model_summary,
ATM_May2010_Forecast = atm_final,
Power_2014_Forecast = power_final,
Pipe1_1Week_Forecast = pipe1_final,
Pipe2_1Week_Forecast = pipe2_final
),
"Project1_Final_Forecasts.xlsx"
)
This project used simple and practical forecasting methods that fit the structure of each data set. The ATM section needed different treatment across machines because the series were not similar. The power section was a classic monthly seasonal forecast problem, so Holt Winters was a strong choice. The waterflow section required hourly aggregation first, then a short horizon forecast using Holt linear trend for Pipe1 and simple exponential smoothing for Pipe2.