library(fpp3)
library(tidyverse)
library(fable)
library(latex2exp)
Firstly, after importing the data, it must be pre-processed before running it through any model, so the data was transformed into a tsibble for easy viewing. The tsibble atm was assigned DATE (converted to a date format) as the index and ATM, holding the machine assignment, as the key, with the only other column, Cash, left as the measured value. After graphing and inspecting the data there was some exploration to do. First, several rows were missing the machine assignment in the ATM column; these rows were also missing Cash values, and their dates fell after the end date of each machine's series, so they were removed. Next, a handful of rows were missing Cash data for ATM1 and ATM2; these were given the average Cash value for their respective machines. The average was chosen because it is unlikely to adversely affect the prediction models while not attempting to guess the actual value. Finally, there was one visible outlier for ATM4, on 2010-02-09, with a Cash value of 10920. Since the data for these machines is in hundreds of dollars, that would mean the machine dispensed $1,092,000 that day. A preliminary Google search suggests most ATMs hold at most around $800,000, so the machine would have had to be refilled at some point during the day. While this is not impossible, it seems unlikely: February 9th is not a major holiday in the USA and it fell on a Tuesday. Given this, and the fact that the data point ends in a zero, it is most likely a data-entry error, so it was corrected to 1092, which falls in line with the rest of the data. With the pre-processing complete, the data was ready for testing forecasting models.
#reading in ATM data
atm <- read.csv('ATM624Data.csv')
colnames(atm)[1] <- "DATE"
#transforming raw data into a tsibble
atm <- atm |>
mutate(DATE = as.Date(as_datetime(DATE, format = "%m/%d/%Y %H:%M"))) |>
as_tsibble(index = DATE, key = ATM)
#viewing the data
atm |> autoplot() +
facet_wrap(~ATM, scale = "free_y") +
ggtitle("ATM Data")
#checking to see number of entries with missing atm value in ATM column
atm |>
filter(ATM == "")
## # A tsibble: 14 x 3 [1D]
## # Key: ATM [1]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-05-01 "" NA
## 2 2010-05-02 "" NA
## 3 2010-05-03 "" NA
## 4 2010-05-04 "" NA
## 5 2010-05-05 "" NA
## 6 2010-05-06 "" NA
## 7 2010-05-07 "" NA
## 8 2010-05-08 "" NA
## 9 2010-05-09 "" NA
## 10 2010-05-10 "" NA
## 11 2010-05-11 "" NA
## 12 2010-05-12 "" NA
## 13 2010-05-13 "" NA
## 14 2010-05-14 "" NA
#removing rows without ATM value
atm <- atm |>
filter(ATM != "")
#viewing rows without cash value
atm |>
filter(is.na(Cash)) |>
print()
## # A tsibble: 5 x 3 [1D]
## # Key: ATM [2]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-22 ATM1 NA
## 4 2009-06-18 ATM2 NA
## 5 2009-06-24 ATM2 NA
#viewing the top 10 highest cash days over all the atms
atm |>
arrange(desc(Cash)) |>
head(10) |>
print()
## # A tsibble: 10 x 3 [1D]
## # Key: ATM [1]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-02-09 ATM4 10920
## 2 2009-09-22 ATM4 1712
## 3 2010-01-29 ATM4 1575
## 4 2009-09-04 ATM4 1495
## 5 2009-06-09 ATM4 1484
## 6 2009-09-05 ATM4 1301
## 7 2010-02-28 ATM4 1276
## 8 2009-08-23 ATM4 1246
## 9 2009-06-14 ATM4 1221
## 10 2009-09-29 ATM4 1195
#finding the mean value for each atm
means <- atm |>
as_tibble() |>
select(ATM, Cash) |>
group_by(ATM) |>
dplyr::summarize(mean = mean(Cash, na.rm = TRUE)) |>
print()
## # A tibble: 4 × 2
## ATM mean
## <chr> <dbl>
## 1 ATM1 83.9
## 2 ATM2 62.6
## 3 ATM3 0.721
## 4 ATM4 474.
#data clean-up for missing values using mean for the atm and outliers
#with data errors being corrected
atm <- atm |>
mutate(Cash = case_when(
is.na(Cash) & ATM == "ATM1" ~ means$mean[means$ATM == "ATM1"],
is.na(Cash) & ATM == "ATM2" ~ means$mean[means$ATM == "ATM2"],
Cash == 10920 ~ 1092,
.default = Cash)
)
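For reference, the same per-ATM mean imputation could be written more compactly with tsibble's group_by_key(); the sketch below is an equivalent alternative to the case_when approach above (the object atm_imputed is hypothetical and not used later).
#sketch: impute each ATM's missing Cash with that ATM's own mean (atm_imputed is illustrative only)
atm_imputed <- atm |>
group_by_key() |>
mutate(Cash = if_else(is.na(Cash), mean(Cash, na.rm = TRUE), as.numeric(Cash))) |>
ungroup()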
#viewing all the data now adjusted
atm |> autoplot() +
facet_wrap(~ATM, scale = "free_y") +
ggtitle("ATM Adjusted Data")
To see whether this data was viable for ARIMA, it was tested to see if it could be transformed to be stationary. The first check, the function unitroot_kpss, indicated that only ATM4 was clearly stationary. Since the Box-Cox lambdas (found with the Guerrero method) were not appreciably close to 1, it seemed worthwhile to apply a Box-Cox transformation to each ATM's series. Testing again with unitroot_kpss, the p-values moved closer to the cap of 0.1 (which indicates stationarity cannot be rejected), but unitroot_ndiffs showed differencing was still needed for ATM2. After differencing each series once, the unitroot_kpss p-values were all 0.1 or close enough to justify the use of ARIMA.
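For reference, the classic Box-Cox transformation is $w_t = \frac{y_t^{\lambda} - 1}{\lambda}$ for $\lambda \neq 0$ and $w_t = \log(y_t)$ for $\lambda = 0$, so a lambda near 1 leaves the series essentially unchanged, while values far from 1 imply a meaningful rescaling.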
#Testing initial data
atm |>
features(Cash, unitroot_kpss) |>
print()
## # A tibble: 4 × 3
## ATM kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 ATM1 0.452 0.0547
## 2 ATM2 2.15 0.01
## 3 ATM3 0.389 0.0818
## 4 ATM4 0.105 0.1
#calculating lambda for all the atm data
lambda <- atm |>
features(Cash, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] 0.2365721 0.7240522 1.3654211 0.4638517
#applying the Box-Cox transformation to all the atm data
bc_atm <- atm |>
mutate(Cash = case_when(
ATM == "ATM1" ~ box_cox(atm$Cash, lambda[1]),
ATM == "ATM2" ~ box_cox(atm$Cash, lambda[2]),
ATM == "ATM3" ~ box_cox(atm$Cash, lambda[3]),
ATM == "ATM4" ~ box_cox(atm$Cash, lambda[4]))
)
#Testing Box-Cox transformed data
bc_atm |>
features(Cash, unitroot_kpss) |>
print()
## # A tibble: 4 × 3
## ATM kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 ATM1 0.409 0.0733
## 2 ATM2 2.14 0.01
## 3 ATM3 0.389 0.0817
## 4 ATM4 0.0737 0.1
#Finding suggested difference
bc_atm |>
features(Cash, unitroot_ndiffs) |>
print()
## # A tibble: 4 × 2
## ATM ndiffs
## <chr> <int>
## 1 ATM1 0
## 2 ATM2 1
## 3 ATM3 0
## 4 ATM4 0
#applying difference of one found above
diff_bc_atm <- bc_atm |>
mutate(Cash = difference(Cash))
#Testing Box-Cox Differenced data
diff_bc_atm |>
features(Cash, unitroot_kpss) |>
print()
## # A tibble: 4 × 3
## ATM kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 ATM1 0.00876 0.1
## 2 ATM2 0.0129 0.1
## 3 ATM3 0.348 0.0997
## 4 ATM4 0.260 0.1
To test the models, the training data is limited to roughly the first 75% of the data. STL is applied to the training data to view its components: trend, seasonality, and remainder. Then the ARIMA and ETS functions, run with no constraints, each suggest a model for every ATM. The models suggested for each ATM are then fit to the whole set (under the assumption that the same model specification has to be used for all the ATMs) to check which works best across all the data. In addition, SNAIVE is included as a basic fail-safe.
#selecting about 75% of the data for training
train <- atm |>
filter_index("2009-05-01" ~ "2010-01-31")
#STL decomp to see each of the components
train |>
model(stl = STL(Cash)) |>
components() |>
autoplot() +
ggtitle("ATM Training Data STL Decomp")
#testing ARIMA and ETS with the training data
train |>
model(
ARIMA = ARIMA(Cash),
ETS = ETS(Cash)
) |>
print()
## # A mable: 4 x 3
## # Key: ATM [4]
## ATM ARIMA ETS
## <chr> <model> <model>
## 1 ATM1 <ARIMA(0,0,3)(0,1,1)[7]> <ETS(M,N,M)>
## 2 ATM2 <ARIMA(0,0,0)(1,1,1)[7] w/ drift> <ETS(A,N,A)>
## 3 ATM3 <ARIMA(0,0,0)(0,0,1)[7]> <ETS(A,N,N)>
## 4 ATM4 <ARIMA(0,0,0)(1,0,0)[7] w/ mean> <ETS(M,N,M)>
#fitting the candidate models on the training data and forecasting the test period
train_atm <- train |>
model(
SNAIVE = SNAIVE(Cash, type = "additive"),
`ETS(A,N,A)` = ETS(Cash ~ error("A") + trend("N") + season("A")),
`ETS(A,A,A)` = ETS(Cash ~ error("A") + trend("A") + season("A")),
`ETS(M,N,M)` = ETS(Cash ~ error("M") + trend("N") + season("M")),
ARIMA1 = ARIMA(Cash ~ pdq(0,0,3) + PDQ(0,1,1)),
ARIMA2 = ARIMA(Cash ~ 1 + pdq(0,0,0) + PDQ(1,1,1)),
ARIMA3 = ARIMA(Cash ~ pdq(0,0,0) + PDQ(1,0,0))
) |>
print() |>
forecast(h = 90)
## # A mable: 4 x 8
## # Key: ATM [4]
## ATM SNAIVE `ETS(A,N,A)` `ETS(A,A,A)` `ETS(M,N,M)`
## <chr> <model> <model> <model> <model>
## 1 ATM1 <SNAIVE> <ETS(A,N,A)> <ETS(A,A,A)> <ETS(M,N,M)>
## 2 ATM2 <SNAIVE> <ETS(A,N,A)> <ETS(A,A,A)> <ETS(M,N,M)>
## 3 ATM3 <SNAIVE> <ETS(A,N,A)> <ETS(A,A,A)> <NULL model>
## 4 ATM4 <SNAIVE> <ETS(A,N,A)> <ETS(A,A,A)> <ETS(M,N,M)>
## # ℹ 3 more variables: ARIMA1 <model>, ARIMA2 <model>, ARIMA3 <model>
After running all the selected models on the training data, some returned NULL models, mostly for ATM3, since most of that series has Cash = 0 (multiplicative components cannot be estimated on a series of zeros). After filtering out these models, the accuracy function is used to view the error measures and see how well the forecasts match the remaining actual data. While each model does best for a different ATM, ETS(A,N,A) seems to do the best overall when all the ATMs are taken into account across the error measurements.
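To support the claim that a different model wins for each ATM, the lowest-RMSE model per machine can also be pulled directly; the short sketch below is illustrative and not part of the original output.
#sketch: lowest-RMSE model for each ATM among the fitted candidates
train_atm |>
accuracy(atm) |>
group_by(ATM) |>
slice_min(RMSE, n = 1) |>
select(ATM, .model, RMSE)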
# comparing RMSE per model and ATM for the shortlisted models
train_atm |>
accuracy(atm) |>
select(.model, ATM, RMSE) |>
filter(.model != "ETS(M,N,M)" & .model != "ARIMA1" & .model != "ARIMA2"
& .model != "ARIMA3") |>
arrange(ATM) |>
arrange(RMSE) |>
print()
## # A tibble: 12 × 3
## .model ATM RMSE
## <chr> <chr> <dbl>
## 1 ETS(A,A,A) ATM3 16.1
## 2 ETS(A,N,A) ATM3 16.1
## 3 SNAIVE ATM3 16.1
## 4 SNAIVE ATM1 49.1
## 5 ETS(A,N,A) ATM1 49.9
## 6 ETS(A,A,A) ATM1 50.2
## 7 SNAIVE ATM2 58.6
## 8 ETS(A,A,A) ATM2 59.5
## 9 ETS(A,N,A) ATM2 59.6
## 10 ETS(A,N,A) ATM4 377.
## 11 ETS(A,A,A) ATM4 377.
## 12 SNAIVE ATM4 605.
# averaging the error per model
train_atm |>
accuracy(atm) |>
select(!ATM & !.type) |>
filter(.model != "ETS(M,N,M)" & .model != "ARIMA1" & .model != "ARIMA2"
& .model != "ARIMA3") |>
group_by(.model) |>
summarize(across(everything(), mean)) |>
arrange(RMSE) |>
print()
## # A tibble: 3 × 9
## .model ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(A,N,A) -6.12 126. 100. -Inf Inf Inf Inf 0.188
## 2 ETS(A,A,A) -9.78 126. 101. -Inf Inf Inf Inf 0.193
## 3 SNAIVE -48.6 182. 149. -Inf Inf Inf Inf 0.0572
While the forecast does not match the actual data exactly for most of the ATMs, it approximates it fairly closely.
#plotting the training model ETS(A,N,A) vs the actual ATM data
train_atm |>
filter(.model == "ETS(A,N,A)") |>
autoplot(atm) +
ggtitle("ATM actual vs ETS(A,N,A)")
Now that the model has been selected, it is just a matter of applying it to the whole data set, forecasting 31 days ahead to cover May 2010, and writing the forecast to a csv.
#produces forecast for May 2010 using ETS(A,N,A)
atm_forecast <- atm |>
model(ETS = ETS(Cash ~ error("A") + trend("N") + season("A"))) |>
forecast(h = 31)
#graphs forecast above
atm_forecast |>
autoplot(atm) +
ggtitle("ATM ETS(A,N,A) Forecast May 2010")
#saves forecast to separate file
atm_forecast |>
as_tibble() |>
select(DATE, ATM, .mean) |>
rename(Cash = .mean) |>
print() |>
write.csv("atmforecast.csv")
## # A tibble: 124 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2010-05-01 ATM1 87.1
## 2 2010-05-02 ATM1 101.
## 3 2010-05-03 ATM1 73.1
## 4 2010-05-04 ATM1 5.84
## 5 2010-05-05 ATM1 100.
## 6 2010-05-06 ATM1 79.5
## 7 2010-05-07 ATM1 85.7
## 8 2010-05-08 ATM1 87.1
## 9 2010-05-09 ATM1 101.
## 10 2010-05-10 ATM1 73.1
## # ℹ 114 more rows
As before, the data must be pre-processed before being fed into the forecasting model. It was first put into a tsibble and viewed with autoplot(), which made a missing data point and an outlier very clear. Filtering to the missing data showed only one missing value, so it was replaced with the mean of the series. The outlier is similar to the ATM case, except that where the ATM value appeared to have an extra digit, this value appeared to be missing one. This is especially plausible because it is power data, which is extremely periodic and would be unlikely to show such a major dip unless there was a long-term outage. Since the data is residential usage in the millions of KWH, it aggregates many residences, so unless everyone went on vacation at the same time and used less electricity, or there was a natural disaster, it is unlikely to be an accurate data point. After the data was mutated, the former outlier fit much better into the seasonality and the data seemed ready for testing.
#reading in data
power <- read.csv('ResidentialCustomerForecastLoad-624.csv')
#transforming data into a tsibble
power <- power |>
mutate(DATE = yearmonth(YYYY.MMM)) |>
select(DATE, KWH) |>
as_tsibble(index = DATE)
#plotting data for first look
power |> autoplot()
#viewing all missing data
power |>
filter(is.na(KWH)) |>
print()
## # A tsibble: 1 x 2 [1M]
## DATE KWH
## <mth> <int>
## 1 2008 Sep NA
#looking at the outliers for the data
power |>
ggplot() +
geom_boxplot(aes(KWH))
#mutating so missing data is replaced with average and outlier is corrected
power <- power |>
mutate(KWH = case_when(
is.na(KWH) ~ mean(power$KWH, na.rm = TRUE),
KWH == 770523 ~ 7705230,
.default = KWH)
)
#plotting data after final transformations
power |> autoplot()
The same process used for the ATM data was used here, with the same outcome. Testing the initial data with unitroot_kpss showed it needed transformation before ARIMA could be used. The Guerrero lambda was far from 1, making the series a good candidate for a Box-Cox transformation. After the Box-Cox was applied, unitroot_ndiffs indicated that a single difference would make the data suitable for ARIMA. After differencing KWH once, the data passed unitroot_kpss, returning a p-value of 0.1.
#Testing initial data
power |>
features(KWH, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.65 0.01
#calculating lambda for all the power data
lambda <- power |>
features(KWH, features = guerrero) |>
pull(lambda_guerrero) |>
print()
## [1] -0.2094801
#applying the Box-Cox transformation to all the power data
bc_power <- power |>
mutate(KWH = box_cox(KWH, lambda))
#Testing Box-Cox transformed data
bc_power |>
features(KWH, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.64 0.01
#Finding suggested difference
bc_power |>
features(KWH, unitroot_ndiffs) |>
print()
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
#applying difference of one found above
diff_bc_power<- bc_power |>
mutate(KWH = difference(KWH))
#Testing Box-Cox Differenced data
diff_bc_power |>
features(KWH, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0412 0.1
A subset of the data was selected as training data to test the viability of each model. ARIMA (with the Box-Cox transformation found above applied), ETS, and SNAIVE were chosen and run without constraints so the functions could select their own parameters. A forecast covering the remainder of the actual data was then created.
#selecting about 75% of the data for training
train <- power |>
filter_index("1998-Jan" ~ "2009-Dec")
#STL decomp to see each of the components
train |>
model(stl = STL(KWH)) |>
components() |>
autoplot() +
ggtitle("power Training Data STL Decomp")
#fitting the candidate models on the training data and forecasting the test period
train_power <- train |>
model(
SNAIVE = SNAIVE(KWH, type = "additive"),
ETS = ETS(KWH),
ARIMA = ARIMA(box_cox(KWH, lambda))
) |>
print() |>
forecast(h = 48)
## # A mable: 1 x 3
## SNAIVE ETS ARIMA
## <model> <model> <model>
## 1 <SNAIVE> <ETS(M,N,A)> <ARIMA(0,0,1)(0,1,1)[12] w/ drift>
The training forecasts were then tested against the actual data using the accuracy function to see which model had the lowest error. ARIMA did the best by far, having the lowest value for every error measure across the board.
#arranging data by RMSE to see which is the best model according to training
train_power |>
accuracy(power) |>
arrange(RMSE) |>
print()
## # A tibble: 3 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Test 738027. 1065875. 792335. 9.00 9.90 1.41 1.49 0.263
## 2 SNAIVE Test 772956. 1144846. 888024. 9.25 11.2 1.58 1.60 0.254
## 3 ETS Test 972863. 1304574. 1000636. 12.0 12.5 1.78 1.83 0.297
To confirm that ARIMA was the right fit, the residuals for the model were plotted to check that they were roughly normally distributed and looked like white noise, which they did. The forecast also matches the actual data extremely well, which gives a lot of confidence in the model.
#plotting residuals for arima function to see if it is valid for this dataset
power |>
model(ARIMA(box_cox(KWH, lambda) ~ pdq(0,0,3) + PDQ(0,1,1))) |>
gg_tsresiduals()
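As a complementary check (a sketch, not part of the original write-up), a Ljung-Box test on the innovation residuals can back up the visual white-noise assessment; here dof = 4 is an assumption covering the four estimated MA coefficients (q = 3 plus Q = 1).
#sketch: Ljung-Box test on the ARIMA residuals; a large p-value supports white noise
power |>
model(ARIMA(box_cox(KWH, lambda) ~ pdq(0,0,3) + PDQ(0,1,1))) |>
augment() |>
features(.innov, ljung_box, lag = 24, dof = 4)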
#plotting the training model ARIMA vs the actual power data
train_power |>
filter(.model == "ARIMA") |>
autoplot(power) +
ggtitle("power actual vs ARIMA")
After ARIMA was selected it was used to forecast the actual data with the parameters produced by the function running on the training data. It forecasted the monthly data for 2014 and then exported into a csv.
#produces forecast for 2014 using ARIMA
power_forecast <- power |>
model(ARIMA(box_cox(KWH, lambda) ~ pdq(0,0,3) + PDQ(0,1,1))) |>
print() |>
forecast(h = 12)
## # A mable: 1 x 1
## `ARIMA(box_cox(KWH, lambda) ~ pdq(0, 0, 3) + PDQ(0, 1, 1))`
## <model>
## 1 <ARIMA(0,0,3)(0,1,1)[12] w/ drift>
#graphs forecast above
power_forecast |>
autoplot(power) +
ggtitle("power ARIMA Forecast May 2010")
#saves forecast to separate file
power_forecast |>
as_tibble() |>
select(DATE, .mean) |>
rename(KWH = .mean) |>
print() |>
write.csv("powerforecast.csv")
## # A tibble: 12 × 2
## DATE KWH
## <mth> <dbl>
## 1 2014 Jan 10655013.
## 2 2014 Feb 8549197.
## 3 2014 Mar 7136705.
## 4 2014 Apr 5996602.
## 5 2014 May 5756337.
## 6 2014 Jun 7696568.
## 7 2014 Jul 8922429.
## 8 2014 Aug 9497618.
## 9 2014 Sep 8297594.
## 10 2014 Oct 5974682.
## 11 2014 Nov 5753939.
## 12 2014 Dec 7891127.
#reading in the water flow data for both pipes
water1 <- read.csv('Waterflow_Pipe1.csv')
water2 <- read.csv('Waterflow_Pipe2.csv')
#pipe 1 has multiple readings per hour, so timestamps are floored to the hour,
#a few hours with no readings are padded with zero-flow rows so the hourly
#index has no gaps, and the readings within each hour are averaged
tswater1 <- water1 |>
mutate(Date.Time =
floor_date(
as_datetime(Date.Time, format = "%m/%d/%y %H:%M %Op"),
unit = "hour")
) |>
rbind(c(as_datetime("10/27/15 5:00 PM", format = "%m/%d/%y %H:%M %Op"), 0)) |>
rbind(c(as_datetime("10/28/15 12:00 AM", format = "%m/%d/%y %H:%M %Op"), 0)) |>
rbind(c(as_datetime("11/01/15 5:00 AM", format = "%m/%d/%y %H:%M %Op"), 0)) |>
rbind(c(as_datetime("11/01/15 9:00 AM", format = "%m/%d/%y %H:%M %Op"), 0)) |>
group_by(Date.Time) |>
summarize(WaterFlow = mean(WaterFlow, na.rm = TRUE)) |>
as_tsibble(index = Date.Time)
#plotting the hourly pipe 1 data
tswater1 |>
autoplot()
#pipe 2 is already hourly, so it only needs its timestamps parsed before
#converting to a tsibble
tswater2 <- water2 |>
mutate(Date.Time =
floor_date(
as_datetime(Date.Time, format = "%m/%d/%y %H:%M %Op"),
unit = "hour")
) |>
as_tsibble(index = Date.Time)
#plotting the hourly pipe 2 data
tswater2 |>
autoplot()
An easy way to test whether a data set is stationary is to use the unitroot_kpss function and look at the kpss_pvalue. The KPSS null hypothesis is that the series is stationary, and the reported p-value is capped at 0.1; since both data sets return that capped value, stationarity cannot be rejected and both series can be treated as stationary.
#Testing if data is stationary
tswater1 |>
features(WaterFlow, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0915 0.1
#Testing if data is stationary
tswater2 |>
features(WaterFlow, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.105 0.1
Since these data sets are stationary, the best bet is to use ARIMA to determine the forecasts. Unfortunately, ARIMA does not do well here, likely because the data is too much like white noise. White noise is hard to forecast, so the best ARIMA model it is able to produce is pdq = 0,0,0, i.e. a constant mean.
#fitting ARIMA to pipe 1 with seasonal terms turned off via PDQ(0,0,0)
arimawater1 <- tswater1 |>
model(ARIMA(WaterFlow ~ PDQ(0,0,0))) |>
print()
## # A mable: 1 x 1
## `ARIMA(WaterFlow ~ PDQ(0, 0, 0))`
## <model>
## 1 <ARIMA(0,0,0) w/ mean>
#fitting the same specification to pipe 2
arimawater2 <- tswater2 |>
model(ARIMA(WaterFlow ~ PDQ(0,0,0))) |>
print()
## # A mable: 1 x 1
## `ARIMA(WaterFlow ~ PDQ(0, 0, 0))`
## <model>
## 1 <ARIMA(0,0,0) w/ mean>
#checking the residuals for both models
arimawater1 |>
gg_tsresiduals()
arimawater2 |>
gg_tsresiduals()
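The residual plots above essentially describe the raw series, since ARIMA(0,0,0) only fits a constant mean. As an additional check of the white-noise claim (a sketch, not part of the original analysis), a Ljung-Box test could be run on each hourly series; a large p-value would be consistent with white noise.
#sketch: Ljung-Box test on the raw hourly water flow series
tswater1 |>
features(WaterFlow, ljung_box, lag = 24)
tswater2 |>
features(WaterFlow, ljung_box, lag = 24)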
#forecasting one week (24*7 hours) ahead for each pipe
waterforecast1 <- arimawater1 |>
forecast(h = 24*7)
waterforecast2 <- arimawater2 |>
forecast(h = 24*7)
#plotting the forecasts against the observed data
waterforecast1 |>
autoplot(tswater1)
waterforecast2 |>
autoplot(tswater2)
#saving both forecasts to separate files
waterforecast1 |>
as_tibble() |>
select(Date.Time, .mean) |>
rename(WaterFlow = .mean) |>
write.csv("water1.csv")
waterforecast2 |>
as_tibble() |>
select(Date.Time, .mean) |>
rename(WaterFlow = .mean) |>
write.csv("water2.csv")
P.S. I don't know if the forecast produced by the ARIMA function for the water flow is what you were looking for, but I do remember you saying very early on in the class that there are some data sets out there for which the MEAN function is the best available forecast, so I'm sticking with what I got even if it doesn't look like much of a forecast.
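For reference, the ARIMA(0,0,0) with a mean fitted above gives essentially the same point forecast as fable's simple mean model, so the same result could be reproduced directly with MEAN() (a sketch, not part of the submitted output).
#sketch: the mean model, equivalent in spirit to ARIMA(0,0,0) w/ mean
tswater1 |>
model(MEAN(WaterFlow)) |>
forecast(h = 24*7) |>
autoplot(tswater1)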