PROJECT 1- DATA 624- ATM FORECAST

library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tsibble)

## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(fable)

## Loading required package: fabletools

library(feasts)

## Registered S3 methods overwritten by 'ggtime':
##   method           from      
##   autolayer.fbl_ts fabletools
##   autolayer.tbl_ts fabletools
##   autoplot.dcmp_ts fabletools
##   autoplot.fbl_ts  fabletools
##   autoplot.tbl_ts  fabletools
##   fortify.fbl_ts   fabletools

atm_data <- read_excel("ATM624Data (1).xlsx")
head(atm_data)

## # A tibble: 6 × 3
##    DATE ATM    Cash
##   <dbl> <chr> <dbl>
## 1 39934 ATM1     96
## 2 39934 ATM2    107
## 3 39935 ATM1     82
## 4 39935 ATM2     89
## 5 39936 ATM1     85
## 6 39936 ATM2     90

colnames(atm_data)

## [1] "DATE" "ATM"  "Cash"

atm_data <- atm_data %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30")) %>%
  as_tsibble(index = DATE, key = ATM)

autoplot(atm_data, Cash) +
  facet_wrap(~ATM, scales = "free_y")

## Warning: `autoplot.tbl_ts()` was deprecated in fabletools 0.6.0.
## ℹ Please use `ggtime::autoplot.tbl_ts()` instead.
## ℹ Graphics functions have been moved to the {ggtime} package. Please use
##   `library(ggtime)` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).

atm_data <- atm_data %>%
  mutate(
    DATE = as.Date(DATE, origin = "1899-12-30"),
    Cash = as.numeric(Cash)
  ) %>%
  drop_na(Cash) %>%
  as_tsibble(index = DATE, key = ATM)

atm_data <- atm_data %>%
  group_by(ATM) %>%
  fill_gaps()

fit <- atm_data %>%
  model(
    ARIMA = ARIMA(Cash),
    ETS = ETS(Cash)
  )

forecast_data <- fit %>%
  forecast(h = "1 month")

autoplot(forecast_data, atm_data)

## Warning: `autoplot.fbl_ts()` was deprecated in fabletools 0.6.0.
## ℹ Please use `ggtime::autoplot.fbl_ts()` instead.
## ℹ Graphics functions have been moved to the {ggtime} package. Please use
##   `library(ggtime)` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `mutate_if()` ignored the following grouping variables:
## • Column `ATM`

Data Preparation

Before building forecasting models, several preprocessing steps were required to ensure the dataset was suitable for time series analysis.

First, the DATE variable was not initially stored in a standard date format. Instead, it appeared as numeric values due to Excel’s internal date representation system. To address this, the variable was converted into a proper date format in R using as.Date() with the appropriate origin (1899-12-30). This step was necessary to correctly interpret the temporal structure of the data and allow it to function as a time index.

Next, the Cash variable, which represents ATM withdrawals (in hundreds of dollars), was explicitly converted to numeric format. This ensured compatibility with forecasting models such as ARIMA and ETS, which require numeric input. During this step, any non-numeric or invalid entries were handled, and rows with missing values were removed to prevent model estimation issues.

After cleaning the variables, the dataset was converted into a time series object using a tsibble, with DATE as the time index and ATM as the key. This allowed each ATM machine to be treated as an individual time series, which is important given the variation in usage patterns across machines.

An additional issue identified in the dataset was the presence of implicit gaps in the time index, meaning that certain dates were missing from the sequence. Since forecasting models like ARIMA and ETS assume a continuous time structure, these gaps were addressed using fill_gaps(), which inserts the missing time points and preserves alignment across observations.

Overall, these preprocessing steps ensured that the data was properly structured, consistent, and suitable for accurate time series modeling and forecasting.

power_data <- read_excel("ResidentialCustomerForecastLoad-624 (1).xlsx")

head(power_data)

## # A tibble: 6 × 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377
## 6          738 1998-Jun   6467147

colnames(power_data)

## [1] "CaseSequence" "YYYY-MMM"     "KWH"

power_data <- power_data %>%
  mutate(Month = yearmonth(`YYYY-MMM`)) %>%
  as_tsibble(index = Month)

autoplot(power_data, KWH)

Data Exploration (Part B)

The dataset consists of monthly residential power consumption (measured in kilowatt-hours) from January 1998 to December 2013. The time variable was provided in a year-month format and was converted into a proper monthly time index using the yearmonth() function.

A time series plot was generated to visualize the data. The plot shows clear seasonal patterns, with recurring peaks and troughs across years, indicating strong seasonality in electricity usage. There also appears to be some variation in overall consumption levels over time.

power_fit <- power_data %>%
  model(
    ARIMA = ARIMA(KWH),
    ETS = ETS(KWH)
  )

power_forecast <- power_fit %>%
  forecast(h = "12 months")

autoplot(power_forecast, power_data)

Modeling Approach (Part B)

The monthly power consumption data was modeled using both ARIMA and Exponential Smoothing (ETS) methods. ARIMA models are well-suited for capturing autocorrelation and seasonal patterns, while ETS models capture level, trend, and seasonality directly.

Given the strong seasonal structure observed in the data, both models were expected to perform well. Each model was fit to the full dataset, and forecasts were generated for the 12 months of 2014.

Forecast Results (Part B)

The forecast for 2014 reflects the seasonal nature of electricity consumption, with expected increases during peak usage periods and decreases during off-peak months.

Both ARIMA and ETS models produced consistent forecasts, reinforcing the presence of stable seasonal patterns in the data. These results provide a reasonable estimate of future residential electricity demand.

write.csv(power_forecast, "Power_Forecast_2014.csv")

pipe1 <- read_excel("Waterflow_Pipe1 (1).xlsx")
pipe2 <- read_excel("Waterflow_Pipe2 (1).xlsx")

head(pipe1)

## # A tibble: 6 × 2
##   `Date Time` WaterFlow
##         <dbl>     <dbl>
## 1      42300.     23.4 
## 2      42300.     28.0 
## 3      42300.     23.1 
## 4      42300.     30.0 
## 5      42300.      6.00
## 6      42300.     15.9

head(pipe2)

## # A tibble: 6 × 2
##   `Date Time` WaterFlow
##         <dbl>     <dbl>
## 1      42300.      18.8
## 2      42300.      43.1
## 3      42300.      38.0
## 4      42300.      36.1
## 5      42300.      31.9
## 6      42300.      28.2

colnames(pipe1)

## [1] "Date Time" "WaterFlow"

colnames(pipe2)

## [1] "Date Time" "WaterFlow"

pipe1 <- pipe1 %>% mutate(Source = "Pipe1")
pipe2 <- pipe2 %>% mutate(Source = "Pipe2")

combined <- bind_rows(pipe1, pipe2)

combined <- combined %>%
  mutate(DateTime = as.POSIXct(`Date Time`, origin = "1899-12-30"))

hourly_data <- combined %>%
  mutate(Hour = lubridate::floor_date(DateTime, "hour")) %>%
  group_by(Hour) %>%
  summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE)) %>%
  ungroup()

hourly_data <- hourly_data %>%
  as_tsibble(index = Hour)

autoplot(hourly_data, WaterFlow)

## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

Data Preparation (Part C)

The two datasets were combined into a single dataset to create a unified time series. Each dataset contained timestamped water flow measurements, but with potentially differing time intervals.

The timestamp variable was converted into a proper datetime format using as.POSIXct() with the appropriate origin. To standardize the data, timestamps were rounded down to the nearest hour using floor_date().

Since multiple observations could occur within the same hour, the data was aggregated by hour, taking the mean water flow value for each hour. This ensured a consistent time interval suitable for time series analysis.

gg_tsdisplay(hourly_data, WaterFlow)

## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## ℹ Graphics functions have been moved to the {ggtime} package. Please use
##   `library(ggtime)` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

Stationarity Assessment

A visual inspection of the time series and its autocorrelation structure suggests that the data exhibits some temporal dependence. While the series may not be perfectly stationary, it appears stable enough to proceed with forecasting models such as ARIMA.

hourly_data <- combined %>%
  mutate(
    DateTime = as.POSIXct(`Date Time` * 86400, origin = "1899-12-30"),
    Hour = lubridate::floor_date(DateTime, "hour")
  ) %>%
  group_by(Hour) %>%
  summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE)) %>%
  ungroup() %>%
  as_tsibble(index = Hour) %>%
  fill_gaps()

nrow(hourly_data)

## [1] 1001

pipe_fit <- hourly_data %>%
  model(ARIMA = ARIMA(WaterFlow))

pipe_forecast <- pipe_fit %>%
  forecast(h = "168 hours")

autoplot(pipe_forecast, hourly_data)

Data Preparation (Part C)

The two datasets were combined into a single dataset to create a unified time series of water flow measurements. Each dataset contained timestamp values recorded in Excel numeric format, which were converted into proper datetime values using as.POSIXct() with the appropriate origin.

To standardize the time intervals, the timestamps were rounded down to the nearest hour using floor_date(). Since multiple observations could occur within the same hour, the data was aggregated by hour by taking the mean water flow value.

This process ensured that the dataset had a consistent hourly time structure suitable for time series modeling.

Stationarity Assessment

A visual inspection of the time series plot suggests that the data exhibits variability over time but remains relatively stable. While not perfectly stationary, the data is suitable for applying ARIMA modeling for short-term forecasting.

Forecast Results (Part C)

An ARIMA model was applied to the hourly aggregated data to generate a one-week (168-hour) forecast. The forecast captures short-term patterns in water flow and provides a reasonable estimate of future values based on historical trends.

PROJECT 1- DATA 624- ATM FORECAST

IZZA KHAN

2026-03-29