Project 1 Description



This project consists of 3 parts - two required and one bonus and is worth 15% of your grade. The project is due at 11:59 PM on Sunday March 31. I will accept late submissions with a penalty until the meetup after that when we review some projects.

library(tidyverse)
library(scales)
library(readxl)
library(forecast)
library(lubridate)

Part A – ATM Forecast, ATM624Data.xlsx

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
df<-read_excel("/Users/hovig/Downloads/ATM624Data.xlsx")
df<-df %>%
  drop_na() %>%
  spread(ATM, Cash) %>% 
  mutate(DATE = as.Date(DATE, origin = "1899-12-30")) 

atm_ts<-ts(df %>% select(-DATE))
autoplot(atm_ts) +
  labs(title = "Withdrawn Cash From All 4 ATMS", subtitle = "05/09 - 04/10", x = "Day") +
  scale_y_continuous("Cash withdrawn in hundreds", labels = dollar) +
  scale_color_discrete(NULL) +
  theme(legend.position = c(0.1, 0.8))

df %>% gather(atm, Cash, -DATE) %>% 
  ggplot(aes(x = DATE, y = Cash, col = atm)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~ atm, ncol = 1, scales = "free_y") +
  labs(title = "Withdrawn Cash From All 4 ATMs", subtitle = "05/09 - 04/10", x = "Date") +
  scale_y_continuous("Cash withdrawn in hundreds", labels = dollar)

As explained above, each of the ATMs behaves differently, so each will be forecasted separately using the following approach based on the previously mentioned observations:

Separate timeseries objects are created to perform these forecasts:

atm1<-atm_ts[, "ATM1"]
atm2<-atm_ts[, "ATM2"]
atm3<-atm_ts[(nrow(atm_ts) - 2):nrow(atm_ts), "ATM3"]
atm3<-ts(atm3, start = 363)
atm3<-atm_ts[, "ATM3"]
atm3[which(atm3 == 0)]<-NA
atm4<-atm_ts[, "ATM4"]
atm4[which.max(atm4)]<-median(atm4, na.rm = TRUE)
autoplot(ts(atm_ts[1:61, ])) +
  labs(title = "Withdrawn Cash From All ATMs", subtitle = "05/09 - 06/10", x = "Day") +
  scale_y_continuous("Withdrawn Cash in hundreds", labels = dollar) +
  scale_color_discrete(NULL)

atm1<-ts(atm1, frequency = 7)
atm2<-ts(atm2, frequency = 7)
atm2[which(is.na(atm2))]<-median(atm2, na.rm = TRUE)
atm4<-ts(atm4, frequency = 7)
ATM 1
  • The ATM1 timeseries is displayed below with its ACF & spectrum plots
ggtsdisplay(atm1, points = FALSE, main = "Withdrawals from ATM1", xlab = "Week", ylab = "Cash in hundreds")

  • The ACF and spectrum plots show a very clear weekly seasonality
  • There are large spikes in the ACF lags 7, 14, and 21 as well as large spikes in the spectrum plot at frequencies 1, 2, and 3.
  • Both of these suggest a seasonal ARIMA model.
  • To account for the above-identified autocorrelation, the timeseries is differenced with a lag of 7
ggtsdisplay(diff(atm1, 7), points = FALSE,  main = "Differenced (lag-7) withdrawals from ATM1", xlab = "Week", ylab = "Cash in hundreds")

  • The timeseries appears stationary, so no non-seasonal differencing is suggested by the data.
  • The significant spikes in the ACF and PACF at \(k = 1\) suggest non-seasonal AR(1) and/or MA(1) components of the model.
  • The spikes in the ACF and PACF at \(k = 7\) followed by decreasing spikes at \(k = 14\) and \(k = 21\) suggest seasonal AR(1) and/or seasonal MA(1) components.
  • This suggests 15 possible models: ARIMA(p, 0, q)(P, 1, Q) for \(p, q, P, Q \in [0, 1]\) excluding the case where \(p, q, P, Q = 0\)
atm1_lambda<-BoxCox.lambda(atm1)
  • define function to create models & return AIC values for timeseries
  • create model with Box-Cox and specified ARIMA parameters; extract AIC
atm_aic<-function(p, d, q, P, D, Q) {
  AIC(Arima(atm1, order = c(p, d, q), seasonal = c(P, D, Q), lambda = atm1_lambda))
}
  • We will create possible combinations of p, q, P, Q except all zero
expand.grid(p = 0:1, q = 0:1, P = 0:1, Q = 0:1) %>%
  filter(p > 0 | q > 0 | P > 0 | Q > 0) %>% 
  mutate(aic = pmap_dbl(list(p, 0, q, P, 1, Q), atm_aic)) %>% 
  slice(which.min(aic))
##   p q P Q      aic
## 1 1 1 0 1 1096.583
  • The minimum AIC value is for non-seasonal AR(1) & MA(1) and seasonal AR(0) & MA(1). The model used is ARIMA(1,0,1)(0,1,1)
atm1_fit<-Arima(atm1, order = c(1, 0, 1), seasonal = c(0, 1, 1), lambda = atm1_lambda)
  • The residuals are investigated using a Ljung-Box test and diagnostic plotting:
Box.test(resid(atm1_fit), type = "L", fitdf = 3, lag = 7)
## 
##  Box-Ljung test
## 
## data:  resid(atm1_fit)
## X-squared = 8.0185, df = 4, p-value = 0.0909
ggtsdisplay(resid(atm1_fit), points = FALSE, plot.type = "histogram", main = "Residuals for ARIMA(1,0,1)(0,1,1) fit of ATM1 withdrawals", xlab = "Week", ylab = "Residual")
## Warning: Removed 3 rows containing non-finite values (stat_bin).

  • The Ljung-Box test returns a p-value > 0.05, suggesting that the residuals may be white noise.
  • The residuals appear to be approximately normally distributed with a mean around zero.
  • They do not appear to be autocorrelated, but there is an almost-significant spike at \(k=6\).
  • This model is acceptable and will be used for forecasting.
ATM 2
  • The same procedure is repeated for ATM2
ggtsdisplay(atm2, points = FALSE, main = "Withdrawals from ATM2", xlab = "Week", ylab = "Cash in hundreds")

  • The same weekly seasonality is seen as for ATM1 where it is also differenced with lag = 7
ggtsdisplay(diff(atm2, 7), points = FALSE, main = "Differenced (lag-7) withdrawals from ATM2", xlab = "Week", ylab = "Cash in hundreds")

  • As above, the large spike at \(k=7\) suggests \(D = 1\), while the stationary nature of the timeseries suggests \(d = 0\).
  • The spikes in ACF & PACF in the non-differenced series at \(k = 2\) & \(k = 5\) suggest \(p, q \in [0, 2, 5]\).
  • The spikes in the ACF & PACF of the differenced series do not strongly suggest any need for seasonal AR or MA elements, but since the values at \(k = 1\) are followed by decreasing values, \(P, Q \in [0,1]\) are also investigated.
  • Each of the above mentioned models are investigated using the function created above
atm2_lambda<-BoxCox.lambda(atm2)
atm_aic<-function(p, d, q, P, D, Q) {
  AIC(Arima(atm2, order = c(p, d, q), seasonal = c(P, D, Q), lambda = atm2_lambda))
}
  • We will get optimal lambda for Box-cox transformation
  • We will create model with Box-Cox and specified ARIMA parameters and extract AIC
  • We will create possible combinations of p, q, P, Q except all zeros
expand.grid(p = c(0, 2, 5), q = c(0, 2, 5), P = 0:1, Q = 0:1) %>%
  filter(p > 0 | q > 0 | P > 0 | Q > 0) %>% 
  mutate(aic = pmap_dbl(list(p, 0, q, P, 1, Q), atm_aic)) %>% 
  slice(which.min(aic))
##   p q P Q      aic
## 1 5 5 0 1 2544.791
  • The minimum AIC value is for non-seasonal AR(5) & MA(5) and seasonal AR(0) & MA(1).
  • ARIMA(5,0,5)(0,1,1) is the model used
atm2_fit<-Arima(atm2, order = c(5, 0, 5), seasonal = c(0, 1, 1), lambda = atm2_lambda)
  • The residuals are investigated using a Ljung-Box test and diagnostic plotting
Box.test(resid(atm2_fit), type = "L", fitdf = 11, lag = 14)
## 
##  Box-Ljung test
## 
## data:  resid(atm2_fit)
## X-squared = 4.6959, df = 3, p-value = 0.1955
ggtsdisplay(resid(atm2_fit), points = FALSE, plot.type = "histogram", main = "Residuals for ARIMA(5,0,5)(0,1,1) of ATM2 withdrawals", xlab = "Week", ylab = "Residual")

  • The Ljung-Box test, when using lag = 14 due to the high number of parameters in the fit, returns a p-value >> 0.05. This suggests that the residuals may be white noise.
  • The residuals appear to be approximately normally distributed with a mean around zero.
  • This model is acceptable and will be used for forecasting.
ATM 3
  • As mentioned above, there are only 3 observations at ATM3 and only these observations are used for the forecast.
  • A simple mean forecast will be used for this ATM.
ATM 4
  • Finally, the procedure used for ATM1 & ATM2 is repeated for ATM4
ggtsdisplay(atm4, points = FALSE, main = "Withdrawals from ATM4", xlab = "Week", ylab = "Cash in hundreds")

  • The same weekly seasonality is seen as for ATM1 & ATM2 and it is also differenced with lag = 7
ggtsdisplay(diff(atm4, 7), points = FALSE, main = "Differenced (lag-7) withdrawals from ATM4", xlab = "Week", ylab = "Cash in hundreds")

  • Again, the stationary timeseries with a large spike at \(k=7\) suggests \(D = 1\) and \(d = 0\).
  • Similar spikes in the ACF & PACF of both the original and differenced timeseries as ATM2 suggest \(p, q \in [0, 2, 5]\) and \(P, Q \in [0,1]\) (though the evidence for seasonal AR and/or MA components are stronger in this case).
  • The code from above is reused to investigate the same possible models for ATM4
atm4_lambda<-BoxCox.lambda(atm4)
atm_aic<-function(p, d, q, P, D, Q) {
  AIC(Arima(atm4, order = c(p, d, q), seasonal = c(P, D, Q), lambda = atm4_lambda))
}
  • We will get optimal lambda for Box-cox transformation
  • We will create model with Box-Cox and specified ARIMA parameters; extract AIC
  • We will create possible combinations of p, q, P, Q except all zeros
expand.grid(p = c(0, 2, 5), q = c(0, 2, 5), P = 0:1, Q = 0:1) %>%
  filter(p > 0 | q > 0 | P > 0 | Q > 0) %>% 
  mutate(aic = pmap_dbl(list(p, 0, q, P, 1, Q), atm_aic)) %>% 
  slice(which.min(aic))
##   p q P Q      aic
## 1 5 2 0 1 2863.061
  • The minimum AIC value is for non-seasonal AR(0) & MA(2) and seasonal AR(0) & MA(1).
  • ARIMA(0,0,2)(0,1,1) will be the model used here
atm4_fit<-Arima(atm4, order = c(0, 0, 2), seasonal = c(0, 1, 1), lambda = atm4_lambda)

The residuals are investigated using a Ljung-Box test and diagnostic plotting:

Box.test(resid(atm4_fit), type = "L", fitdf = 3, lag = 7)
## 
##  Box-Ljung test
## 
## data:  resid(atm4_fit)
## X-squared = 8.6819, df = 4, p-value = 0.06956
ggtsdisplay(resid(atm4_fit), points = FALSE, plot.type = "histogram", main = "Residuals for ARIMA(0,0,2)(0,1,1) of ATM4 withdrawals", xlab = "Week", ylab = "Residual")

  • The Ljung-Box test again returns a p-value >> 0.05, with residuals approximately normally distributed with a mean around zero.
  • This model is acceptable and will be used for forecasting.

The four forecasts identified above are performed for May 2010 (31 days):

atm1_forecast<-forecast(atm1_fit, 31, level = 95)
atm2_forecast<-forecast(atm2_fit, 31, level = 95)
atm3_forecast<-meanf(atm3, 31, level = 95)
atm4_forecast<-forecast(atm4_fit, 31, level = 95)

Each of the forecasts are plotted below:

gridExtra::grid.arrange(
  autoplot(atm1_forecast) + 
    labs(title = "ATM1: ARIMA(1,0,1)(0,1,1)", x = "Week", y = NULL) +
    theme(legend.position = "none"),
  autoplot(atm2_forecast) + 
    labs(title = "ATM2: ARIMA(5,0,5)(0,1,1)", x = "Week", y = NULL) +
    theme(legend.position = "none"),
  autoplot(atm3_forecast) + 
    labs(title = "ATM3: mean", x = "Day", y = NULL) +
    theme(legend.position = "none"),
  autoplot(atm4_forecast) + 
    labs(title = "ATM4: ARIMA(0,0,2)(0,1,1)", x = "Week", y = NULL) +
    theme(legend.position = "none"),
  top = grid::textGrob("Forecasted ATM withdrawals (in hundreds of dollars) for 05/10\n")
)
## Warning: Removed 362 rows containing missing values (geom_path).

  • As expected, these values show seasonality for ATMs 1, 2, and 4, with a single value forecast for ATM3.
  • The forecast values are gathered and output to a .csv, which is manually tranferred to Excel for submission
data_frame(DATE = rep(max(df$DATE) + 1:31, 4), ATM = rep(names(df)[-1], each = 31), Cash = c(atm1_forecast$mean, atm2_forecast$mean, atm3_forecast$mean, atm4_forecast$mean)) %>% 
  write_csv("/Users/hovig/Downloads/output_data624_project1_partA.csv")
## Warning: `data_frame()` is deprecated, use `tibble()`.
## This warning is displayed once per session.


Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
kwh<-read_excel("/Users/hovig/Downloads/ResidentialCustomerForecastLoad-624.xlsx")
kwh<-ts(kwh[, "KWH"], start = c(1998, 1), frequency = 12)

The timeseries is plotted to inspect its features:

autoplot(kwh) +
  labs(title = "Monthly Residential Power Usage", subtitle = "01/98 - 12/13")

  • There is a clear seasonality in this data where it appears to be semi-annual with a peak every six months, but it may be annual, as the peaks seem to alternate in their height.
  • There is a very noticeable dip in value in July 2010.
  • It seems that the variance of the series may increase with its level. Therefore a Box-cox transformation is also investigated.
kwh_lambda<-BoxCox.lambda(kwh)
kwh_trans<-BoxCox(kwh, kwh_lambda)

The data transformed using \(\lambda = 0.113\) are plotted below with lag-12 differencing:

ggtsdisplay(diff(kwh_trans, 12), points = FALSE, main = "Differenced (lag-12) Box-Cox transformed residential power usage")

  • The series appears stationary, so no non-seasonal differencing appears necessary.
  • The decaying seasonal spikes in the PACF suggests a seasonal AR(1) component, while the very quickly-decaying seasonal spikes in the ACF suggest the possibility of a seasonal MA(1) component.
  • Spikes in the PACF and ACF at \(k = 1\) and \(k = 4\) suggest non-seasonal AR(1) or AR(4) components, and non-seasonal MA(1) or MA(4) components.
  • The function used to select the model with lowest AIC in Part A is redefined for use on the kWh timeseries with \(D = 1\) and \(d = 0\)
kwh_aic<-function(p, q, P, Q) {
  AIC(Arima(kwh, order = c(p, 0, q), seasonal = c(P, 1, Q), lambda = kwh_lambda))
}
  • We will create a model with Box-Cox and specified ARIMA parameters
expand.grid(p = c(0, 1, 4), q = c(0, 1, 4), P = 0:1, Q = 0:1) %>%
  filter(p > 0 | q > 0 | P > 0 | Q > 0, p < 4 | q < 4 | P < 1 | Q < 1) %>%
  mutate(aic = pmap_dbl(list(p, q, P, Q), kwh_aic)) %>% 
  slice(which.min(aic))
##   p q P Q      aic
## 1 1 0 0 1 575.2659
  • We will create possible combinations except all zeros and p = q = 4; P = Q = 1 (returns error)
  • The minimum AIC value returned is for the ARIMA(1,0,0)(0,1,1) model
kwh_fit<-Arima(kwh, order = c(1, 0, 0), seasonal = c(0, 1, 1), lambda = kwh_lambda)
  • The residuals of this fit are investigated with a Ljung-Box test and diagnostic plotting
Box.test(resid(kwh_fit), type = "L", fitdf = 3, lag = 12)
## 
##  Box-Ljung test
## 
## data:  resid(kwh_fit)
## X-squared = 6.4543, df = 9, p-value = 0.6937
ggtsdisplay(resid(kwh_fit), points = FALSE, main = "Residential Power Usage Residuals for ARIMA(1,0,0)(0,1,1)")

  • The Ljung-Box test returns a p-value >> 0.05, but the spikes in ACF & PACF at \(k = 3\) and \(k = 4\) suggest the possibility of AR(3) or MA(3) components (since the spike at \(k = 4\) was addressed above).
  • Investigation of these does not yield any AIC values lower than that of the above-identified model
expand.grid(p = c(1, 3), q = c(1, 3)) %>%
  mutate(aic = pmap_dbl(list(p, q, 0, 1), kwh_aic))
##   p q      aic
## 1 1 1 575.7090
## 2 3 1 579.7385
## 3 1 3 579.9391
## 4 3 3 581.4511
  • Viewing the residuals of the fit model again with a histogram, the model is acceptable.
  • The residuals appear to be roughly normally distributed around zero (with the exception of the significant dip in July 2010) without any significant autocorrelation
ggtsdisplay(resid(kwh_fit), points = FALSE, plot.type = "histogram", main = "Residuals for ARIMA(1,0,0)(0,1,1) of residential power usage")
## Warning: Removed 1 rows containing non-finite values (stat_bin).

  • Using the ARIMA(1,0,0)(0,1,1) model, the next 12 months is forecasted and this forecast is plotted
kwh_forecast<-forecast(kwh_fit, 12, level = 95)
autoplot(kwh_forecast) + 
    labs(title = "Residential Energy Forecast for 2014", subtitle = "Using ARIMA(1,0,0)(0,1,1)", x = "Month", y = "kWh") +
    theme(legend.position = "none")

  • As expected, the forecast shows annual seasonality while showing some drift due to the non-seasonal autocorrelation.
  • The forecast values are output to a .csv for inclusion in the required Excel submission
data_frame(`YYYY-MMM` = paste0(2014, "-", month.abb), KWH = kwh_forecast$mean) %>% 
  write_csv("/Users/hovig/Downloads/output_data624_project1_partB.csv")


Part C – BONUS, optional (part or all), Waterflow_Pipe1.xlsx and Waterflow_Pipe2.xlsx

Part C consists of two data sets. These are simple 2 columns sets, however they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (example of what this looks like, follows). Note for multiple recordings within an hour, take the mean. Then to determine if the data is stationary and can it be forecast. If so, provide a week forward forecast and present results via Rpubs and .rmd and the forecast in an Excel readable file.

Prior to loading in the provided data for the two pipelines, the following changes are made in both files to make the data more easily readable by R:

  • Cell A1 is renamed from “Date Time” to “DateTime”
  • The format of column A is changed to yyyy-mm-dd hh:mm
  • The format of column A is changed to a number with 13 decimal places
water1<-read_excel("/Users/hovig/Downloads/Waterflow_Pipe1.xlsx",col_types =c("date", "numeric"))
water2<-read_excel("/Users/hovig/Downloads/Waterflow_Pipe2.xlsx",col_types =c("date", "numeric"))

colnames(water1)= c("DateTime","WaterFlow")
colnames(water2)= c("DateTime","WaterFlow")
  • Both sets of readings have the same number of observations and start on the same date (10/23/2015), but end on different dates and have different timestamps
  • Pipeline 2 has readings at the end of every hour through 12/3/2105, while pipeline 1 has readings in the middle of hours and sometimes more than once per hour through 11/1/2015.
  • In order to use the two series together, the readings for pipeline 1 must be converted to hourly
water1<-water1 %>% 
  mutate(Date = date(DateTime),
         Hour = hour(DateTime) + 1) %>% 
  group_by(Date, Hour) %>% 
  summarize(WaterFlow = mean(WaterFlow)) %>% 
  ungroup() %>%
  mutate(DateTime = ymd_h(paste(Date, Hour))) %>% 
  select(DateTime, WaterFlow)

water1
## # A tibble: 236 x 2
##    DateTime            WaterFlow
##    <dttm>                  <dbl>
##  1 2015-10-23 01:00:00      26.1
##  2 2015-10-23 02:00:00      18.9
##  3 2015-10-23 03:00:00      15.2
##  4 2015-10-23 04:00:00      23.1
##  5 2015-10-23 05:00:00      15.5
##  6 2015-10-23 06:00:00      22.7
##  7 2015-10-23 07:00:00      20.6
##  8 2015-10-23 08:00:00      18.4
##  9 2015-10-23 09:00:00      20.8
## 10 2015-10-23 10:00:00      21.2
## # … with 226 more rows
  • The dataset is separated by date and hours
  • Convert hour-to-hour ending to match pipeline 2
  • Get an average reading for each date and hour
  • Convert back to DateTime and drop separate date/hour columns
  • Noted is that only observations for pipeline 1 in 236 of the 1000 hours with observations for pipeline 2.
water_df<-full_join(water1, water2, by = "DateTime", suffix = c("_1", "_2")) %>% 
  mutate(WaterFlow_1 = ifelse(is.na(WaterFlow_1), 0, WaterFlow_1)) %>% 
  mutate(WaterFlow = WaterFlow_1 + WaterFlow_2) %>% 
  select(DateTime, WaterFlow)

water_ts<-ts(water_df$WaterFlow, frequency = 24)
  • The two datasets are joined and then converted to a timeseries with both observations for each hour
  • Convert missing pipeline 1 readings to zero and get total waterflow by hour through mutations
  • Drop individual numbers and create hourly timeseries object

Plot timeseries:

autoplot(water_ts) +
  labs(title = "Two Pipelines Hourly Water Flow", subtitle = "[October 23, 2015 - December 3, 2015]", x = "Day", y = "Total Waterflow")

  • This plot shows a decent amount of variability across the whole range, with an initial downward trend before day 10 followed by a roughly flat period through the end of the time window.

  • The variance seems roughly constant, but a Box-Cox transformation is performed nonetheless.
  • Due to the apparent non-stationarity, a lag-1 difference is taken

water_lambda<-BoxCox.lambda(water_ts)
water_trans<-BoxCox(water_ts, water_lambda)
ggtsdisplay(diff(water_trans), points = FALSE, main = "Transformed WaterFlow Differenced Box-Cox")

  • Get Box-cox paramater and transform
  • Plot differenced transformed series
  • This timeseries appears to be stationary but shows significant spikes in the ACF and PACF at \(k = 1\), strongly suggesting non-seasonal AR(1) and MA(1) components.
  • There is no apparent seasonal behavior.

ARIMA(1,1,1) model is used:

water_fit<-Arima(water_ts, order = c(1, 1, 1), lambda = water_lambda)

The residuals of this fit are investigated:

Box.test(resid(water_fit), type = "L")
## 
##  Box-Ljung test
## 
## data:  resid(water_fit)
## X-squared = 2.0096e-05, df = 1, p-value = 0.9964
ggtsdisplay(resid(water_fit), points = FALSE, plot.type = "histogram", main = "WaterFlow Residuals for ARIMA(1,1,1)")

  • The Ljung-Box test returns a value of almost 1
  • The residuals appear to be roughly normally distributed around 0 without significant autocorrelation.
  • The model is acceptable and will be used for forecasting.

Using the ARIMA(1,1,1) model with 168 hours for forecast:

water_forecast<-forecast(water_fit, 168, level = 95)

autoplot(water_forecast) + 
    labs(title = "WaterFlow Forecasted", subtitle = "Using ARIMA(1,1,1)", x = "Day", y = "Total WaterFlow") +
    theme(legend.position = "none")

  • Lacking seasonality
data_frame(DateTime = max(water_df$DateTime) + 
  hours(1:168), WaterFlow = water_forecast$mean) %>% 
  write_csv("/Users/hovig/Downloads/output_data624_project1_partC.csv")