ATM Cash Flow
Executive Summary
Given the differences in the data for each ATM, each ATM is modeled and forecast separately. Attached are the forecasts of cash flow needs for each ATM. The predictions should be considered a starting point for modeling cash needs, as further testing and refinement are needed. This is especially true for ATM 3, which has insufficient data for a meaningful prediction.
Load Data and Clean Up
We will start by renaming the columns and pulling some trickery with the date to match the time interval listed in the project description.
data <- read_excel('./data/ATM624Data.xlsx')
data <-
data %>%
mutate(DATE= as.Date(DATE, origin="1899-12-30")) %>%
rename(date=DATE, atm=ATM, cash=Cash)
Summarize the data looking for oddities.
## date atm cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10919.8
## NA's :19
NA’s are present, so we should investigate.
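A quick way to see where the NAs live (a minimal sketch, assuming the tibble is still named data as above):
unique(data$atm)
data %>%
  filter(is.na(cash) | is.na(atm)) %>%   # rows missing either value
  head()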
## [1] "ATM1" "ATM2" NA "ATM3" "ATM4"
## # A tibble: 6 x 3
## date atm cash
## <date> <chr> <dbl>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-18 ATM2 NA
## 4 2009-06-22 ATM1 NA
## 5 2009-06-24 ATM2 NA
## 6 2010-05-01 <NA> NA
We are missing the cash value for a few rows, and some rows are missing both the ATM label and the cash value. Since cash flow is what we are trying to predict, we will drop those NA rows. Several of the dates also fall beyond the one-year interval from May 1, 2009 to May 1, 2010, so they are removed as well.
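A sketch of that clean-up step (the exact cutoff date is an assumption based on the summary below):
data <-
  data %>%
  filter(!is.na(cash),
         date >= as.Date("2009-05-01"),
         date <= as.Date("2010-04-30"))   # keep the one-year window
summary(data)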
## date atm cash
## Min. :2009-05-01 Length:1455 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-10-31 Mode :character Median : 73.0
## Mean :2009-10-30 Mean : 155.6
## 3rd Qu.:2010-01-29 3rd Qu.: 114.0
## Max. :2010-04-30 Max. :10919.8
Time Series Exploration
Next we will need to explore the time series data. This will be a series of plots to investigate the characteristics of the ATMs.
data_ts <-
data %>%
group_by(atm) %>%
group_map(~ ts(.x$cash, start=c(2009, 05, 01), frequency=7))
str(data_ts)
## List of 4
## $ : Time-Series [1:362] from 2010 to 2061: 96 82 85 90 99 88 8 104 87 93 ...
## $ : Time-Series [1:363] from 2010 to 2061: 107 89 90 55 79 19 2 103 107 118 ...
## $ : Time-Series [1:365] from 2010 to 2062: 0 0 0 0 0 0 0 0 0 0 ...
## $ : Time-Series [1:365] from 2010 to 2062: 777 524.4 792.8 908.2 52.8 ...
ATM 3 looks weird; it seems like it was newly installed or fixed right at the end of the time series. ATMs 1 and 2 appear to be standard time series data, and ATM 4 has a serious outlier. We will need to handle that outlier so it does not skew the prediction. This is especially important because the outlier falls fairly close to the end of the series, where it carries more weight in future predictions.
Outlier Remediation
Since there appears to be only one outlier in the ATM 4 data, we will take the easy way out and fix it using the forecast package. Hyndman gives a very concise description of this method on Stack Exchange. Hilariously, his answer is neither the most up-voted nor the one checked as correct. How dare they.
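A minimal sketch of that fix with forecast::tsclean(), which flags the outlier via tsoutliers() and replaces it by interpolation (assuming the ATM 4 series is the fourth element of data_ts):
data_ts[[4]] <- tsclean(data_ts[[4]])   # replace the single outlier by interpolation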
Seasonality
It seems reasonable to consider these data sets seasonal, but we will confirm with subseries plots.
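A sketch of the confirmation plots with forecast::ggsubseriesplot():
ggsubseriesplot(data_ts[[1]])   # repeat for data_ts[[2]], data_ts[[3]], data_ts[[4]]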
Each of these data sets has weekly seasonality in terms of the days on which money is drawn out. ATMs 1 and 2 have low use on Sunday and/or Monday, ATM 3 is only used mid-week, and ATM 4 gets consistent use except on Wednesday.
Transformation and Differencing
ATM 3 has too little information to support a choice of transformation, but ATMs 1, 2, and 4 appear to be stationary, though with fairly large variance. We tried several transformations (Box-Cox, log, etc.) and landed on Box-Cox as a good choice for these data.
data_ts[[1]]<-
data_ts[[1]] %>%
BoxCox(., lambda = BoxCox.lambda(data_ts[[1]]))
data_ts[[2]]<-
data_ts[[2]] %>%
BoxCox(., lambda = BoxCox.lambda(data_ts[[2]]))
data_ts[[3]]<- data_ts[[3]]
data_ts[[4]]<-
data_ts[[4]] %>%
BoxCox(., lambda = BoxCox.lambda(data_ts[[4]]))
Differencing does not seem to be needed given the appearance of the data, though a call to the ndiffs function shows that a KPSS test says otherwise for ATM 2. For ATM 1, the KPSS result marginally sides with no differencing; however, after viewing the ACF and PACF plots, it seems reasonable to first-difference ATM 1 as well.
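A sketch of the calls behind the output below (the urca package supplies the KPSS printout; which ATM produced which line is an assumption):
ndiffs(data_ts[[2]])      # KPSS-based count of required differences
library(urca)
ur.kpss(data_ts[[1]])     # raw KPSS statistic; repeated for each ATM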
## [1] 1
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 2.1544
We will difference the data for ATM 1 and 2 only.
Modeling
We will forecast each ATM series separately as stated above.
ATM 1
We have determined that ATMs 1 and 2 might need differencing, and both have been transformed using Box-Cox. Next we will determine some parameters for the AR and MA components of the time series.
This looks like an MA of order two and an AR of order six: MA(2) because the ACF drops out after two lags, and AR(6) because the PACF drops out at lag 6. We will also run a broad auto.arima search, because this type of modeling involves a fair amount of guesswork.
Seasonal differencing should also be considered, as we see strong ACF peaks at multiples of 7 lags and decreasing PACF peaks. This is covered in the text as a possible seasonal (0,1,1) due to the presence of the MA term.
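A sketch of the manual fit (the object name manual_fit_1 is an assumption; the order matches the call shown below):
manual_fit_1 <- arima(data_ts[[1]], order = c(6, 1, 2), seasonal = c(0, 1, 1))
summary(manual_fit_1)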
##
## Call:
## arima(x = data_ts[[1]], order = c(6, 1, 2), seasonal = c(0, 1, 1))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ma1 ma2
## 1.0014 -0.2146 0.1077 -0.0769 0.0229 0.0341 -1.9203 0.9203
## s.e. 0.0871 0.0753 0.0761 0.0752 0.0753 0.0572 0.0673 0.0678
## sma1
## -0.5713
## s.e. 0.0526
##
## sigma^2 estimated as 1.12: log likelihood = -527.99, aic = 1075.98
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.04742113 1.046658 0.5757712 -Inf Inf 0.3927448 -0.005033385
And we will check the residuals.
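With the forecast package loaded, the residual check is a one-liner (using the hypothetical manual_fit_1 from the sketch above; the same call applies to the later fits):
checkresiduals(manual_fit_1)   # Ljung-Box test, residual ACF, and histogram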
##
## Ljung-Box test
##
## data: Residuals from ARIMA(6,1,2)(0,1,1)[7]
## Q* = 4.9523, df = 5, p-value = 0.4217
##
## Model df: 9. Total lags used: 14
This appears to be a solid result in terms of residuals: they are centered at zero, appear to be normal, and show no significant ACF lags. The residual plot seems like it might have some variance issues, but those are not much of a concern given the other points. The Ljung-Box test also indicates the residuals are just noise.
Next we will try a full-space auto arima to see if we are way off base on this analysis.
auto_fit<- auto.arima(data_ts[[1]],
seasonal = TRUE,
stepwise = FALSE,
approximation = FALSE)
summary(auto_fit)
## Series: data_ts[[1]]
## ARIMA(1,0,2)(0,1,1)[7]
##
## Coefficients:
## ar1 ma1 ma2 sma1
## 0.7621 -0.6802 -0.1779 -0.5593
## s.e. 0.1761 0.1839 0.0611 0.0531
##
## sigma^2 estimated as 1.135: log likelihood=-525.74
## AIC=1061.48 AICc=1061.65 BIC=1080.84
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.01406161 1.049017 0.5835782 -Inf Inf 0.9321596 -0.006180272
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2)(0,1,1)[7]
## Q* = 6.7256, df = 10, p-value = 0.7511
##
## Model df: 4. Total lags used: 14
Well, we were not way off in our choice of model compared to the auto.arima search. The auto.arima model is simpler, its AIC is lower by more than 10 (a meaningful difference), and fewer terms are generally better.
For the reasons above, we will keep the auto arima model.
ATM 2
We have a similar result as for ATM 1: seasonal differencing is needed (strong peaks at multiples of 7), and the PACF drops out at lag six. The last strong ACF peak is at lag 5 (omitting the 7s). So AR = 6 and MA = 5 with d = 1.
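A sketch of this fit, stored under the name the forecasting code at the end uses (manual_fit_2):
manual_fit_2 <- arima(data_ts[[2]], order = c(6, 1, 5), seasonal = c(0, 1, 1))
summary(manual_fit_2)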
##
## Call:
## arima(x = data_ts[[2]], order = c(6, 1, 5), seasonal = c(0, 1, 1))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ma1 ma2
## -0.9043 0.5583 0.7532 0.3242 -0.0638 -0.1703 -0.1053 -1.6201
## s.e. 0.5849 0.1776 0.2307 0.3244 0.1276 0.1750 0.5729 0.5996
## ma3 ma4 ma5 sma1
## -0.2102 0.7139 0.2254 -0.4876
## s.e. 0.3888 0.0961 0.3080 0.0957
##
## sigma^2 estimated as 54.38: log likelihood = -1220.64, aic = 2467.28
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.02396336 7.292414 5.127516 NaN Inf 0.4093278 -0.002975616
Checking residuals
##
## Ljung-Box test
##
## data: Residuals from ARIMA(6,1,5)(0,1,1)[7]
## Q* = 3.9237, df = 3, p-value = 0.2698
##
## Model df: 12. Total lags used: 15
Pretty good, though the residual plot shows some non-constant variance and the ACF has one significant peak at lag 23.
Now the auto fit. The auto.arima model does not pass the Ljung-Box test and has slightly worse RMSE results. The AIC of the manual fit is also marginally better. We will stick with the manual fit in this case.
auto_fit_2 <- auto.arima(data_ts[[2]],
stepwise = FALSE,
approximation = FALSE,
seasonal = TRUE)
summary(auto_fit_2)
## Series: data_ts[[2]]
## ARIMA(2,0,2)(0,1,1)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4195 -0.9000 0.4133 0.7751 -0.6432
## s.e. 0.0624 0.0502 0.0942 0.0679 0.0455
##
## sigma^2 estimated as 58.55: log likelihood=-1228.56
## AIC=2469.11 AICc=2469.35 BIC=2492.36
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.1421577 7.52426 5.163704 -Inf Inf 0.8533972 0.02379373
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7]
## Q* = 17.794, df = 9, p-value = 0.03764
##
## Model df: 5. Total lags used: 14
ATM 3
ATM 3 has very few usable points, so it makes sense to predict using a simple, low-stakes method.
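A sketch of that simple fit, a mean-only ARIMA(0,0,0) over the three non-zero observations, stored as fit_3 (the name referenced in the forecasting code):
fit_3 <- arima(data_ts[[3]][363:365], order = c(0, 0, 0))
summary(fit_3)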
##
## Call:
## arima(x = data_ts[[3]][363:365], order = c(0, 0, 0))
##
## Coefficients:
## intercept
## 87.6667
## s.e. 3.4749
##
## sigma^2 estimated as 36.22: log likelihood = -9.64, aic = 23.28
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.157622e-11 6.01849 5.555556 -0.4557562 6.242793 0.6535948
## ACF1
## Training set -0.295501
ATM 4
Other than a strong seasonal component, there does not appear to be much here. Perhaps an autoregressive term, given the persistence of the PACF peaks compared to the ACF.
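A sketch of the manual fit, stored as manual_fit_4 (the name used in the forecasting code below):
manual_fit_4 <- arima(data_ts[[4]], order = c(1, 0, 0), seasonal = c(0, 1, 1))
summary(manual_fit_4)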
##
## Call:
## arima(x = data_ts[[4]], order = c(1, 0, 0), seasonal = c(0, 1, 1))
##
## Coefficients:
## ar1 sma1
## 0.1005 -0.8703
## s.e. 0.0525 0.0314
##
## sigma^2 estimated as 163.7: log likelihood = -1425.5, aic = 2857.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02419802 12.67171 9.960528 -68.17711 93.65638 0.6276575
## ACF1
## Training set -0.001924168
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,1)[7]
## Q* = 16.67, df = 12, p-value = 0.1625
##
## Model df: 2. Total lags used: 14
Seems like a reasonable model. A broad search of the space is probably the best choice though.
auto_fit_4 <- auto.arima(data_ts[[4]],
stepwise = FALSE,
approximation = FALSE,
seasonal = TRUE)
summary(auto_fit_4)
## Series: data_ts[[4]]
## ARIMA(1,0,0)(2,0,0)[7] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 mean
## 0.0801 0.2076 0.2031 28.5695
## s.e. 0.0526 0.0516 0.0524 1.2477
##
## sigma^2 estimated as 175.6: log likelihood=-1459.6
## AIC=2929.2 AICc=2929.37 BIC=2948.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.008143396 13.177 10.94165 -75.56991 101.3042 0.8208886
## ACF1
## Training set 0.0008660554
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,0,0)[7] with non-zero mean
## Q* = 16.891, df = 10, p-value = 0.07681
##
## Model df: 4. Total lags used: 14
This is not a better choice than the manual fit. We will keep the manually adjusted model.
Forecasting
We have our four models, so let's now predict the next month's worth of cash needs.
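A sketch of how the forecasts and their plots might be generated (h = 10 to match the table below; the model names follow the earlier sketches):
forecast(auto_fit, h = 10) %>% autoplot()
forecast(manual_fit_2, h = 10) %>% autoplot()
forecast(fit_3, h = 10) %>% autoplot()
forecast(manual_fit_4, h = 10) %>% autoplot()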
These all appear to be reasonable results, and the 80% PI covers a wide enough span of the data that this prediction is unlikely to leave the machines empty. That is, a relative surge in demand will not (likely) cause the ATMs to run out of money.
How Much Money?
So how much should we put in those machines? Given the state of the models and the time interval used, we should err on the side of caution without better knowledge of cash demands around summer travel, holidays, crypto price run-ups, stimulus checks, etc.
atm1<-as.integer(InvBoxCox(forecast(auto_fit, h=10)$mean,
lambda = attr(data_ts[[1]],'lambda')))
atm2<- as.integer(InvBoxCox(forecast(manual_fit_2, h=10)$mean,
lambda = attr(data_ts[[2]], 'lambda')))
atm3<- as.integer(forecast(fit_3, h=10)$mean)
atm4<- as.integer(InvBoxCox(forecast(manual_fit_4, h=10)$mean,
lambda = attr(data_ts[[4]],'lambda')))
atm_df<-
bind_cols(atm1, atm2, atm3, atm4) %>%
rename(ATM_1='...1', ATM_2='...2',
ATM_3='...3', ATM_4='...4') %>%
mutate(day=1:10)
## # A tibble: 7 x 5
## ATM_1 ATM_2 ATM_3 ATM_4 day
## <int> <int> <int> <int> <int>
## 1 88 65 87 312 1
## 2 101 63 87 388 2
## 3 75 7 87 398 3
## 4 3 1 87 68 4
## 5 100 88 87 456 5
## 6 79 90 87 306 6
## 7 85 66 87 511 7
Residential Power Usage
Executive Summary
An auto.arima model provides a nice forecast of the following year of electrical load for the area. Given the relative consistency of the data, we have a reasonable prediction for mean electric load.
Data Load and Clean Up
data_power<- read_excel("./data/ResidentialCustomerForecastLoad-624.xlsx")
data_power<-
data_power %>%
rename(date='YYYY-MMM', case_seq=CaseSequence, kwh=KWH)
head(data_power)
## # A tibble: 6 x 3
## case_seq date kwh
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
This time format seems to puzzle R. I found an SO post about string manipulation and dates, which led me to assign an arbitrary day to each month; this makes it easier for R to parse as a date. I chose the 15th of each month for no meaningful reason.
data_power<-
data_power %>%
mutate(date=as.Date(paste(date,"15",sep="-"), format="%Y-%b-%d")) %>%
select(date, kwh)
summary(data_power)
## date kwh
## Min. :1998-01-15 Min. : 770523
## 1st Qu.:2002-01-07 1st Qu.: 5429912
## Median :2005-12-30 Median : 6283324
## Mean :2005-12-29 Mean : 6502475
## 3rd Qu.:2009-12-22 3rd Qu.: 7620524
## Max. :2013-12-15 Max. :10655730
## NA's :1
One NA in the power data. We will deal with it later, or maybe ignore it; it is unlikely to have a meaningful impact.
Time Series Exploration
Big drop in early 2010. Perhaps a squirrel met its maker. Again, we will take the easy way out for a single outlier.
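A sketch of the series construction and the same tsclean() shortcut used for ATM 4 (the name power_ts is an assumption):
power_ts <- ts(data_power$kwh, start = c(1998, 1), frequency = 12)
power_ts <- tsclean(power_ts)   # interpolates the single outlier and the NA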
That took care of the outlier and the NA values.
We definitely have strong seasonality and a reasonable indication of a trend.
This seems like less of a clear upward trend and more like a cycle in the year-over-year power demand. Let us take a quick look at a decomposition.
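A sketch of the decomposition (assuming the power_ts object from above):
power_ts %>% stl(s.window = "periodic") %>% autoplot()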
So there is a trend, but with some uncertainty in how electrical demand is rising.
Transformation and Differencing
The log transform and Box-Cox are indistinguishable here, and both do a reasonable job of reducing the scale; we will go with the log. The series stabilizes with the first difference, and a call to the ndiffs function yields the same result, d = 1.
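A minimal sketch; log_power is the name the model output below refers to:
log_power <- log(power_ts)   # Box-Cox gave an indistinguishable result
ndiffs(log_power)            # returns 1, i.e. d = 1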
The strong bi-annual seasonality is hard to see past in the ACF, but at least one MA term would be a good choice, and the PACF suggests a couple of AR terms as well; perhaps a (1,1,2). The seasonal part has at least a second-order MA.
Modeling
As above, we will fit our best guess and also run a broad auto.arima search to cast a wide net; we can sort through the results from there.
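A sketch of both fits; the object names power_fit and power_auto are assumptions, but the orders match the output below:
power_fit <- arima(log_power, order = c(1, 1, 2), seasonal = c(0, 1, 2))
power_auto <- auto.arima(log_power,
                         stepwise = FALSE,
                         approximation = FALSE,
                         seasonal = TRUE)
summary(power_fit)
summary(power_auto)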
##
## Call:
## arima(x = log_power, order = c(1, 1, 2), seasonal = c(0, 1, 2))
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2
## -0.4913 -0.2120 -0.6654 -0.8027 0.1204
## s.e. 0.1711 0.1427 0.1198 0.0828 0.0941
##
## sigma^2 estimated as 0.001434: log likelihood = 326.19, aic = -640.38
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001645432 0.03660635 0.02801831 0.02233794 0.4113568 0.373005
## ACF1
## Training set 0.0150959
## Series: log_power
## ARIMA(0,0,3)(2,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 sar1 sar2 drift
## 0.2805 0.0596 0.2271 -0.7115 -0.4215 5e-04
## s.e. 0.0736 0.0809 0.0669 0.0724 0.0743 2e-04
##
## sigma^2 estimated as 0.001419: log likelihood=333.83
## AIC=-653.66 AICc=-653.01 BIC=-631.31
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.0001121366 0.03586286 0.02757534 -0.003724069 0.4051243
## MASE ACF1
## Training set 0.7008807 -0.0016198
These are quite different models with very similar results. A difference of 10 in AIC is meaningful for simplicity and overfitting, while the difference in RMSE is marginal at best, so we will go with the auto.arima model. Both sets of residuals looked great, so this is just a matter of picking based on the metrics.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,3)(2,1,0)[12] with drift
## Q* = 11.659, df = 18, p-value = 0.8643
##
## Model df: 6. Total lags used: 24
Forecasting
We want a one year forecast.
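A sketch of the forecast call (power_auto is the assumed name of the chosen auto.arima model):
power_fc <- forecast(power_auto, h = 12)
autoplot(power_fc)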
Looks good, even the 80 and 90% PIs seem tight and well-bounded.
How Much Demand?
For practical interpretation, the model predicts the following year's demand in millions of KWH.
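Because the model was fit on the log scale, the point forecasts are exponentiated and scaled to millions (a sketch using the hypothetical power_fc from above):
exp(power_fc$mean) / 1e6   # back-transform from log KWH to millions of KWH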
## Jan Feb Mar Apr May Jun Jul
## 2014 9.470666 8.583415 6.591807 5.950789 5.888873 8.222853 9.521278
## Aug Sep Oct Nov Dec
## 2014 10.045253 8.498445 5.818158 6.103728 7.628745
Water Flow Pipes
Executive Summary
The best model for these two pipes, as aggregated, is a simple prediction of the mean value. Pipe 1 is missing significant amounts of data, but a forecast could still be made; more data from that pipe would be helpful.
Data Load and Clean Up
This was way easier to clean up and aggregate than the power usage data set. I guess it’s all about what you already know.
data_pipe_1<-
read_excel("./data/Waterflow_Pipe1.xlsx",
col_types = c('date', 'numeric'),
col_names = c("datetime", "water_flow")) %>%
na.omit() # read_excel and name/type function add one extra NA row
data_pipe_2<-
read_excel("./data/Waterflow_Pipe2.xlsx",
col_types=c('date', 'numeric'),
col_names = c("datetime", "water_flow")) %>%
na.omit()
data_pipe_1<-
data_pipe_1 %>%
mutate(date = format(datetime,
format = "%Y-%m-%d:%H")) %>%
group_by(date) %>%
summarize(average_flow = mean(water_flow))
data_pipe_2<-
data_pipe_2 %>%
mutate(date= format(datetime,
format = "%Y-%m-%d:%H")) %>%
group_by(date) %>%
summarize(average_flow = mean(water_flow))
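The two hourly summaries are then matched on their date-hour key; a sketch with a full join (data_pipe is the name used in the clean-up code below):
data_pipe <-
  full_join(data_pipe_1, data_pipe_2,
            by = "date", suffix = c("_1", "_2"))   # average_flow_1 / average_flow_2
summary(data_pipe)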
## date average_flow_1 average_flow_2
## Length:1001 Min. : 8.923 Min. : 1.885
## Class :character 1st Qu.:17.033 1st Qu.:28.140
## Mode :character Median :19.784 Median :39.682
## Mean :19.893 Mean :39.556
## 3rd Qu.:22.790 3rd Qu.:50.782
## Max. :31.730 Max. :78.303
## NA's :765 NA's :1
Now that we have the flow through each pipe aggregated to the hour and matched in time, we see that the first pipe has far fewer measurements than the second. It seems like pipe 1 is measured in an auxiliary fashion or is defective, whereas pipe 2 consistently reads a flow rate every hour. Both are normally distributed, which is good. Pipe 1 is missing most of the days and hours of data and runs at roughly half the flow of pipe 2. As such, we will also add the flows of both pipes to get a total, model all three scenarios, and see what happens.
Final clean up and convert to time series object
# replace NA with zero for total flow values.
data_pipe<-
data_pipe %>%
mutate(average_flow_1 = replace_na(average_flow_1,0),
average_flow_2 = replace_na(average_flow_2,0),
total_flow = average_flow_1+average_flow_2)
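A sketch of the conversion to time series objects; the names pipe_1_ts and pipe_2_ts are used in the model fits below, while total_ts and the choice of frequency = 24 (one day per cycle) are assumptions:
pipe_1_ts <- ts(data_pipe$average_flow_1, frequency = 24)
pipe_2_ts <- ts(data_pipe$average_flow_2, frequency = 24)
total_ts  <- ts(data_pipe$total_flow, frequency = 24)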
We will keep all three and model them separately and evaluate.
Flow Exploration
Transformation and Differencing
Box Cox showed no meaningful gain in terms of reduced variance or linearity. The first difference was returned by ndiffs
for both pipe 1 and the total flow.
## [1] 1
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0858
So pipe 2 seems to be stationary, while pipe 1 and the total flow could use a first difference.
Modeling
Pipe 2
This might actually have day-over-day seasonality. The first difference of this data looks pretty good in terms of having reasonable patterns for MA and AR; the non-differenced data seems less clear.
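A sketch of the plot behind this observation (forecast::ggtsdisplay shows the series with its ACF and PACF):
ggtsdisplay(diff(pipe_2_ts))   # first difference of pipe 2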
Modeling Results
After looking at the data, it seems like the best path forward is to model pipe 1 and pipe 2 separately; the total flow is really just pipe 2 anyway. For both, we will use auto.arima set to a long search. For pipe 1, we will truncate the series to ignore the long tail of zeros after index 236.
pipe_1_fit <- auto.arima(pipe_1_ts[1:236],
stepwise = FALSE,
approximation = FALSE,
seasonal= TRUE)
summary(pipe_1_fit)
## Series: pipe_1_ts[1:236]
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 19.8927
## s.e. 0.2754
##
## sigma^2 estimated as 17.98: log likelihood=-675.29
## AIC=1354.59 AICc=1354.64 BIC=1361.51
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 3.043898e-13 4.231136 3.332455 -5.064092 18.29517 0.7036506
## ACF1
## Training set 0.02101267
pipe_2_fit <- auto.arima(pipe_2_ts,
stepwise = FALSE,
approximation = FALSE,
seasonal= FALSE)
summary(pipe_2_fit)
## Series: pipe_2_ts
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 39.5160
## s.e. 0.5083
##
## sigma^2 estimated as 258.9: log likelihood=-4200.83
## AIC=8405.65 AICc=8405.67 BIC=8415.47
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 6.915431e-13 16.08186 13.10713 -Inf Inf 0.7458201 -0.01840344
We tried several models, including forcing a first difference and seasonality, and the simple (0,0,0) models above were the best.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 5.432, df = 9, p-value = 0.7951
##
## Model df: 1. Total lags used: 10
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 61.426, df = 47, p-value = 0.07701
##
## Model df: 1. Total lags used: 48
The residuals for both models appear to be white noise. There are some concerning ACF peaks for the pipe 2 model, but given the approximately normal histogram, we can conclude that the residuals are acceptable.
Forecasting
A one-week forecast is 168 hours. These models simply predict the mean value.
pipe_1_pred<- pipe_1_fit %>% forecast(h=168)
pipe_2_pred<- pipe_2_fit %>% forecast(h=168)
pred_df<- bind_cols(pipe_1_pred$mean, pipe_2_pred$mean) %>%
rename(pipe_1='...1', pipe_2='...2') %>%
mutate(hour=1:168)
head(pred_df)
## # A tibble: 6 x 3
## pipe_1 pipe_2 hour
## <dbl> <dbl> <int>
## 1 19.9 39.5 1
## 2 19.9 39.5 2
## 3 19.9 39.5 3
## 4 19.9 39.5 4
## 5 19.9 39.5 5
## 6 19.9 39.5 6