library(fpp3)
library(httr)
library(readxl)
library(lubridate)
library(zoo)
library(writexl)

Data loading and preparation

url <- "https://raw.githubusercontent.com/farhodibr/CUNY-SPS-MSDS/main/DATA624/PROJECT1/Waterflow_Pipe1.xlsx"

dest_file <- tempfile(fileext = ".xlsx")

GET(url, write_disk(dest_file, overwrite = TRUE))
## Response [https://raw.githubusercontent.com/farhodibr/CUNY-SPS-MSDS/main/DATA624/PROJECT1/Waterflow_Pipe1.xlsx]
##   Date: 2026-03-30 05:22
##   Status: 200
##   Content-Type: application/octet-stream
##   Size: 39.1 kB
## <ON DISK>  C:\Users\farho\AppData\Local\Temp\RtmpUD9til\file7c302d5627b3.xlsx
water1 <- read_excel(
  dest_file,
  col_types = c("date", "numeric")
)

water1 <- water1 |>
  rename(
    Date_Time = `Date Time`,
    WaterFlow = WaterFlow
  ) |>
  as_tsibble(index = Date_Time)

water1
## # A tsibble: 1,000 x 2 [0.01493697254773µs] <UTC>
##    Date_Time           WaterFlow
##    <dttm>                  <dbl>
##  1 2015-10-23 00:24:06     23.4 
##  2 2015-10-23 00:40:02     28.0 
##  3 2015-10-23 00:53:51     23.1 
##  4 2015-10-23 00:55:40     30.0 
##  5 2015-10-23 01:19:17      6.00
##  6 2015-10-23 01:23:58     15.9 
##  7 2015-10-23 01:50:05     26.6 
##  8 2015-10-23 01:55:33     33.3 
##  9 2015-10-23 01:59:15     12.4 
## 10 2015-10-23 02:51:51     21.8 
## # ℹ 990 more rows
url <- "https://raw.githubusercontent.com/farhodibr/CUNY-SPS-MSDS/main/DATA624/PROJECT1/Waterflow_Pipe2.xlsx"

dest_file <- tempfile(fileext = ".xlsx")

GET(url, write_disk(dest_file, overwrite = TRUE))
## Response [https://raw.githubusercontent.com/farhodibr/CUNY-SPS-MSDS/main/DATA624/PROJECT1/Waterflow_Pipe2.xlsx]
##   Date: 2026-03-30 05:22
##   Status: 200
##   Content-Type: application/octet-stream
##   Size: 34.5 kB
## <ON DISK>  C:\Users\farho\AppData\Local\Temp\RtmpUD9til\file7c305d6d7793.xlsx
water2 <- read_excel(
  dest_file,
  col_types = c("date", "numeric")
)

water2 <- water2 |>
  rename(
    Date_Time = `Date Time`,
    WaterFlow = WaterFlow
  ) |>
  as_tsibble(index = Date_Time)

water2
## # A tsibble: 1,000 x 2 [1h] <UTC>
##    Date_Time           WaterFlow
##    <dttm>                  <dbl>
##  1 2015-10-23 01:00:00     18.8 
##  2 2015-10-23 02:00:00     43.1 
##  3 2015-10-23 03:00:00     38.0 
##  4 2015-10-23 04:00:00     36.1 
##  5 2015-10-23 05:00:00     31.9 
##  6 2015-10-23 06:00:00     28.2 
##  7 2015-10-23 07:00:00      9.86
##  8 2015-10-23 08:00:00     26.7 
##  9 2015-10-23 09:00:00     55.8 
## 10 2015-10-23 10:00:00     54.2 
## # ℹ 990 more rows
#water1 |> autoplot()

WATER2 Analysis

water2 |> autoplot(WaterFlow)+
  labs(title = 'Pipe2 water flow series')

This plot shows that the Pipe2 water flow stays fairly stable over time, fluctuating around the same average level with mostly constant variance. There is no clear trend, but the series is noisy with many short-term ups and downs. Overall, it already looks stationary and likely does not need differencing first.

Let’s check if there is seasonal pattern with ACF function

water2 |> ACF(log(WaterFlow), lag_max = 72) |> autoplot()

I see only one barely spiking over significance bound at lag 24. There are also several other pikes at other lags. I suggest there is a very weak seasonality in this series, and I think the data is more likely is a random white noise.

water2 |>
  gg_season(WaterFlow, period = "day")
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

As we see from gg_season function plot there is no seasonal pattern in this series.

Let’s see if differencing needed

water2 |>
  features(
    WaterFlow,
    list(
      ndiffs = unitroot_nsdiffs,
      nsdiffs = unitroot_ndiffs
    )
  )
## # A tibble: 1 × 2
##   ndiffs_nsdiffs nsdiffs_ndiffs
##            <int>          <int>
## 1              0              0

The results show no differencing is required.

From above findings I think the best option to forecast is to use SNAIVE model. The reason is because there almost non-existing seasonal pattern and series appear as a random white noise.

I’ll fit auto ARIMA and SNAIVE model to forecast.

fit_water2 <- water2 |> 
  model(
    arima = ARIMA(WaterFlow),
    snaive = SNAIVE(WaterFlow),
    ets = ETS(WaterFlow)
  )

fc_water2 <- fit_water2 |> 
  forecast(h = 7 )
fc_water2 |> 
  autoplot(water2, level = NULL) +
  coord_cartesian(xlim = c(as.POSIXct("2015-12-01"), NA))

As we see from the plot SNAIVE method is the best to forecast the waterflow. ETS model looks just projecting the mean as a forecast, while ARIMA didn’t catch full structure of the data.

 fc_water2_snaive <- fc_water2 |>
  filter(.model == "snaive") |>
  as_tibble() |>
  select(1, .mean) |>
  rename(Date = 1, Forecast = .mean)

write_xlsx(fc_water2_snaive, "water2_snaive_forecast.xlsx")