Data 624 Project

Part A – ATM Forecast, ATM624Data.xlsx

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable 'Cash' is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file. Please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel readable file.

library(tidyverse)   # dplyr/tidyr/ggplot2/readr
library(forecast)    # ts plotting, Box-Cox, ARIMA modeling

# Read the raw data, drop the empty May-2010 rows, and pivot to one column per ATM.
# DATE is stored as an Excel serial number, so convert it using the Excel origin.
atmdata <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\project1\\ATM624Data.xlsx')
atmdf <- atmdata %>%
      drop_na() %>%
      spread(ATM, Cash) %>% 
      mutate(DATE = as.Date(DATE, origin='1899-12-30'))
head(atmdf)
head(atmdf)
## # A tibble: 6 x 5
##   DATE        ATM1  ATM2  ATM3  ATM4
##   <date>     <dbl> <dbl> <dbl> <dbl>
## 1 2009-05-01    96   107     0 777. 
## 2 2009-05-02    82    89     0 524. 
## 3 2009-05-03    85    90     0 793. 
## 4 2009-05-04    90    55     0 908. 
## 5 2009-05-05    99    79     0  52.8
## 6 2009-05-06    88    19     0  52.2

Let's check the withdrawal patterns for the four ATMs.

atmdf %>% gather(atm, Cash, -DATE) %>% 
  ggplot(aes(x = DATE, y = Cash, col = atm)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~ atm, ncol = 1, scales = "free_y") +
  labs(title = "Withdrawals from different ATMs", subtitle = "from May-2009 - Apr-2010", x = "Date") +
  scale_y_continuous("withdrawals - Amount")

Below are the observations:

ATM1 - There is weekly seasonality, but no trend.
ATM2 - There is weekly seasonality, but no trend.
ATM3 - The series is zero for almost the entire year; the only non-zero values appear in the last few days.
ATM4 - There is no obvious seasonality or trend, but there is one extreme outlier.

# Daily data with weekly seasonality, so use frequency = 7
atm1_ts <- ts(atmdf[, "ATM1"], frequency = 7)
atm2_ts <- ts(atmdf[, "ATM2"], frequency = 7)
atm3_ts <- ts(atmdf[, "ATM3"], frequency = 7)
atm4_ts <- ts(atmdf[, "ATM4"], frequency = 7)
ggtsdisplay(atm1_ts, points = FALSE, main = "Withdrawals ATM1", xlab = "Day", ylab = "Cash Amounts")

Let's make this series stationary and analyze the AR and MA components. To make it stationary, we stabilize the variance with a Box-Cox transformation and apply a seasonal difference at lag 7.

atm1_lambda <- BoxCox.lambda(atm1_ts)       # estimate the variance-stabilizing lambda
atm1_bc <- atm1_ts %>% BoxCox(atm1_lambda)  # apply the Box-Cox transformation
atm1_bc_diff <- atm1_bc %>% diff(lag = 7)   # take the seasonal (weekly) difference
ggtsdisplay(atm1_bc_diff, points = FALSE, main = "with transformation and differencing", xlab = "Day", ylab = "Cash Amounts")

Observations from the ATM1 plots

From the ACF plot, there are significant spikes at every 7th lag, which indicates weekly seasonality. From the PACF plot, there are significant spikes at lags 7, 14 and 21 that gradually die out.

This indicates that we may need a seasonal ARIMA model for this series.

After applying the transformation and the seasonal difference, the ACF shows a significant spike at lag 1, suggesting a non-seasonal MA(1) component, and a spike at lag 7, suggesting a seasonal MA(1) component.

The PACF shows a significant spike at lag 1, suggesting a non-seasonal AR(1) component, and spikes at lags 7, 14 and 21, which may suggest a seasonal AR(3) component.

Later, we will rely on auto.arima() to select the orders, since it searches both the non-seasonal and seasonal components.
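Before that, the differencing choices can be cross-checked with the unit-root helpers in the forecast package. A quick sketch (an extra check, not part of the original analysis):

# How many seasonal differences do the built-in tests recommend?
nsdiffs(atm1_ts)
# After the seasonal difference, are further ordinary differences needed?
ndiffs(diff(atm1_ts, lag = 7))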

Let's look at the pattern for ATM2.

ggtsdisplay(atm2_ts, points = FALSE, main = "Withdrawals ATM2", xlab = "Day", ylab = "Cash Amounts")

Let's make this series stationary and analyze the AR and MA components. As before, we stabilize the variance with a Box-Cox transformation and apply a seasonal difference at lag 7.

atm2_lambda <- BoxCox.lambda(atm2_ts)
atm2_bc <- atm2_ts %>% BoxCox(atm2_lambda)
atm2_bc_diff <- atm2_bc %>% diff(lag=7)
ggtsdisplay(atm2_bc_diff, points = FALSE, main = "with transformation and differencing", xlab = "Day", ylab = "Cash Amounts")

Observations from the ATM2 plots

From the ACF plot, there are significant spikes at every 7th lag, which indicates weekly seasonality. From the PACF plot, there are significant spikes at lags 7, 14 and 21 that gradually die out.

After applying the transformation and the seasonal difference, the ACF shows a spike at lag 7, which may indicate a seasonal MA(1) component; there is no significant non-seasonal MA component.

The PACF shows spikes at lags 7, 14 and 21, which may indicate a seasonal AR(3) component; there is no significant non-seasonal AR component.

This is very similar to ATM1. We will use a seasonal ARIMA model for this series as well.

Let's look at the ATM3 plots.

ggtsdisplay(atm3_ts, points = FALSE, main = "Withdrawals ATM3", xlab = "Day", ylab = "Cash Amounts")

Observations from the ATM3 plots

From the ACF plot, there is significant autocorrelation at lags 1 and 2; in the PACF, the autocorrelation dies out gradually. With only a handful of non-zero observations at the very end of the series, however, these correlations are not very informative for model identification.

Let's look at the ATM4 plots.

ggtsdisplay(atm4_ts, points = FALSE, main = "Withdrawals ATM4", xlab = "Day", ylab = "Cash Amounts")

Observations from the ATM4 plots

Neither the ACF nor the PACF shows significant autocorrelation.
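Before modeling, it is worth locating the extreme ATM4 outlier noted earlier. A minimal sketch using forecast::tsoutliers(), which flags unusual points and suggests replacement values (an extra diagnostic, not part of the original workflow):

# Flag the suspected ATM4 outlier; we inspect it rather than overwrite the data
out <- tsoutliers(atm4_ts)
out$index            # position(s) of the flagged observation(s)
atm4_ts[out$index]   # the extreme withdrawal value(s)
out$replacements     # replacement value(s) tsoutliers() would suggest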

Modeling

# Lambdas are kept for the Box-Cox refits later; auto.arima() here selects
# the (seasonal) ARIMA orders on the raw series.
atm1_lambda <- BoxCox.lambda(atm1_ts)
atm2_lambda <- BoxCox.lambda(atm2_ts)
arima_atm1 <- auto.arima(atm1_ts)
arima_atm2 <- auto.arima(atm2_ts)
summary(arima_atm1)
## Series: atm1_ts 
## ARIMA(1,0,0)(2,1,0)[7] 
## 
## Coefficients:
##          ar1     sar1     sar2
##       0.1511  -0.4698  -0.2048
## s.e.  0.0535   0.0517   0.0514
## 
## sigma^2 estimated as 606.7:  log likelihood=-1641.05
## AIC=3290.11   AICc=3290.22   BIC=3305.63
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.08937144 24.28804 15.37384 -98.72979 115.434 0.8697514
##                     ACF1
## Training set 0.009201474
summary(arima_atm2)
## Series: atm2_ts 
## ARIMA(0,0,0)(2,1,0)[7] 
## 
## Coefficients:
##          sar1     sar2
##       -0.5747  -0.1766
## s.e.   0.0521   0.0518
## 
## sigma^2 estimated as 652.8:  log likelihood=-1659.31
## AIC=3324.61   AICc=3324.68   BIC=3336.26
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE       ACF1
## Training set -0.1179362 25.23195 17.27829 -Inf  Inf 0.8434248 0.01502367

auto.arima() selected ARIMA(1,0,0)(2,1,0)[7] for ATM1 and ARIMA(0,0,0)(2,1,0)[7] for ATM2. Note that both models include one seasonal difference at lag 7, as the ACF and PACF indicated.
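As a sanity check, the auto.arima() choice for ATM1 could be compared against the model our manual PACF reading suggested (a seasonal AR(3)). A sketch, with lower AICc indicating the better fit (not run in the original analysis):

# Fit the manually identified candidate and compare AICc values
manual_atm1 <- Arima(atm1_ts, order = c(1, 0, 0), seasonal = c(3, 1, 0))
c(auto = arima_atm1$aicc, manual = manual_atm1$aicc)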

Let's check the residuals for ATM1.

checkresiduals(arima_atm1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(2,1,0)[7]
## Q* = 18.297, df = 11, p-value = 0.07494
## 
## Model df: 3.   Total lags used: 14

The residuals look roughly normal, and the Ljung-Box p-value (0.075) is above 0.05, so they are consistent with white noise.

Let's check the residuals for ATM2.

checkresiduals(arima_atm2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0)(2,1,0)[7]
## Q* = 28.213, df = 12, p-value = 0.005149
## 
## Model df: 2.   Total lags used: 14

The residuals look roughly normal, but the Ljung-Box p-value (0.005) is below 0.05, so some autocorrelation remains in the residuals. The model is not ideal, but we will proceed with it as the best auto.arima() candidate.
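Given the failed Ljung-Box test, a more exhaustive search might find a better ATM2 model. A sketch of one option (not run here): disable auto.arima()'s stepwise shortcut and likelihood approximation, and supply the Box-Cox lambda.

# Slower but more thorough search over the ARIMA model space
arima_atm2_full <- auto.arima(atm2_ts, stepwise = FALSE,
                              approximation = FALSE, lambda = atm2_lambda)
checkresiduals(arima_atm2_full)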

Let's model ATM3. There is no trend or seasonality, and nearly all observations are zero with activity only in the last few days, so we will compare the mean and naive models; the naive model, which carries the last observed value forward, should suit the recent shift better.

atm3_mean <- meanf(atm3_ts, h = 14)
atm3_naive <- naive(atm3_ts, h = 14)
autoplot(atm3_ts) +
  autolayer(atm3_mean, series = "Average", PI = FALSE) +
  autolayer(atm3_naive, series = "Naive", PI = FALSE) 

checkresiduals(atm3_mean)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 196.67, df = 13, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 14
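The mean model's residuals clearly fail the Ljung-Box test above. To make the mean-versus-naive choice concrete, their training-set accuracy can also be compared (a quick sketch; these numbers were not part of the original run):

# Training-set error measures for both ATM3 candidates; the long run of
# zeros drags the historical mean far from the recent non-zero values
accuracy(atm3_mean)
accuracy(atm3_naive)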

Finally, let's model ATM4. We will try auto.arima().

atm4_lambda <- BoxCox.lambda(atm4_ts)
arima_atm4 <- auto.arima(atm4_ts)

summary(arima_atm4)
## Series: atm4_ts 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##           mean
##       474.0433
## s.e.   34.0248
## 
## sigma^2 estimated as 423718:  log likelihood=-2882.03
## AIC=5768.06   AICc=5768.1   BIC=5775.86
## 
## Training set error measures:
##                         ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -1.514697e-10 650.0437 323.6001 -616.5219 646.8627 0.8051872
##                      ACF1
## Training set -0.009358419
checkresiduals(arima_atm4)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 4.6668, df = 13, p-value = 0.9818
## 
## Model df: 1.   Total lags used: 14

Fitting the final models and forecasting the 31 days of May 2010.

ATM1 forecast

atm1_fit <- Arima(atm1_ts, order = c(1, 0, 0), seasonal = c(2, 1, 0), lambda = atm1_lambda)
atm1_forecast <- forecast(atm1_fit, 31)  # forecast the 31 days of May 2010

autoplot(atm1_ts) +
  autolayer(atm1_forecast, series = "ARIMA forecast - ATM1", PI = FALSE)

ATM2 forecast

atm2_fit <- Arima(atm2_ts, order = c(0, 0, 0), seasonal = c(2, 1, 0), lambda = atm2_lambda)  # orders from auto.arima()
atm2_forecast <- forecast(atm2_fit, 31)  # forecast the 31 days of May 2010

autoplot(atm2_ts) +
  autolayer(atm2_forecast, series = "ARIMA forecast - ATM2", PI = FALSE)

ATM3 forecast

atm3_naive <- naive(atm3_ts, h = 31)
autoplot(atm3_ts) +
  autolayer(atm3_naive, series = "Naive forecast - ATM3", PI = FALSE)

ATM4 forecast

atm4_fit <- Arima(atm4_ts, order = c(0, 0, 0), lambda = atm4_lambda)
atm4_forecast <- forecast(atm4_fit, 31)  # forecast the 31 days of May 2010

autoplot(atm4_ts) +
  autolayer(atm4_forecast, series = "ARIMA forecast - ATM4", PI = FALSE)

max(atmdf$DATE) 
## [1] "2010-04-30"
names(atmdf)[-1]
## [1] "ATM1" "ATM2" "ATM3" "ATM4"
results <- tibble(
  DATE = rep(max(atmdf$DATE) + 1:31, 4),
  ATM  = rep(names(atmdf)[-1], each = 31),
  Cash = c(atm1_forecast$mean, atm2_forecast$mean, atm3_naive$mean, atm4_forecast$mean)
)
# write_csv(results, "DATA624_partA_Forecast_Results.csv")
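The CSV above is Excel readable; a native .xlsx could also be written. A sketch, assuming the writexl package is installed (the filename is illustrative):

# Hypothetical export to a native Excel workbook
writexl::write_xlsx(results, "DATA624_partA_Forecast_Results.xlsx")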

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and provide a monthly forecast for 2014. The data is given in a single file. The variable 'KWH' is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.

powerdf <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\project1\\ResidentialCustomerForecastLoad-624.xlsx')
head(powerdf)
## # A tibble: 6 x 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377
## 6          738 1998-Jun   6467147
powerts <- ts(powerdf[, "KWH"], start = c(1998, 1), frequency = 12)
ggtsdisplay(powerts, points = FALSE, main = "Monthly Power Usage 01/98 - 12/13 ", xlab = "Years", ylab = "Power Usage")

From the above plot, we see clear annual seasonality but no strong trend.
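To see the seasonal shape more directly, a seasonal subseries plot could be added (an extra visual, not in the original analysis):

# One mini-series per calendar month, with each month's mean drawn as a
# horizontal line; the seasonal peaks should stand out
ggsubseriesplot(powerts) +
  labs(title = "Monthly power usage by calendar month", y = "KWH")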

Modeling

We will use auto.arima() to determine the ARIMA orders, since there is a seasonal component.

power_lambda <- BoxCox.lambda(powerts)        # lambda reused in the refit below
power_trans <- BoxCox(powerts, power_lambda)  # transformed series (for inspection)

fc_arima_power <- auto.arima(powerts)
summary(fc_arima_power)
## Series: powerts 
## ARIMA(0,0,2)(2,1,0)[12] with drift 
## 
## Coefficients:
##          ma1     ma2     sar1     sar2     drift
##       0.1739  0.0505  -0.7591  -0.4124  8750.907
## s.e.  0.0766  0.0844   0.0697   0.0682  3214.839
## 
## sigma^2 estimated as 7.841e+11:  log likelihood=-2707.12
## AIC=5426.25   AICc=5426.73   BIC=5445.4
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE     MASE
## Training set -9730.731 845160.6 506279.9 -5.055329 11.57781 0.730904
##                     ACF1
## Training set 0.008533477

Fitting

power_fit <- Arima(powerts, order = c(0, 0, 2), seasonal = c(2, 1, 0), lambda = power_lambda)
power_forecast <- forecast(power_fit, 12)  # forecast the 12 months of 2014

ggtsdisplay(resid(power_fit), points = FALSE, plot.type = "histogram", main = "Residuals for ARIMA(0,0,2)(2,1,0)[12] of power consumption", xlab = "Year", ylab = "Residual")
## Warning: Removed 1 rows containing non-finite values (stat_bin).
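A formal residual check could also be run on the fitted model (a sketch; its output was not part of the original document):

# Ljung-Box test plus residual ACF and histogram for the power model
checkresiduals(power_fit)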

autoplot(power_forecast) + 
    labs(title = "Energy Forecast for 2014", x = "Month", y = "kWh") +
    theme(legend.position = "none")

Results

result <- tibble(`YYYY-MMM` = paste0(2014, "-", month.abb), KWH = power_forecast$mean)
# write_csv(result, "DATA624_partB_Power_Results.csv")

Part C (Bonus Question)

Requirement

Part C consists of two data sets. These are simple 2-column sets; however, they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (an example of what this looks like follows). Note for multiple recordings within an hour, take the mean. Then determine if the data is stationary and whether it can be forecast. If so, provide a week-forward forecast and present results via RPubs and .rmd and the forecast in an Excel readable file.

We will first load the two datasets and inspect a sample.

water1 <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\project1\\Waterflow_Pipe1.xlsx', col_types = c("date","numeric"))
water2 <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\project1\\Waterflow_Pipe2.xlsx', col_types = c("date","numeric"))

head(water1)
## # A tibble: 6 x 2
##   `Date Time`         WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 00:24:06     23.4 
## 2 2015-10-23 00:40:02     28.0 
## 3 2015-10-23 00:53:51     23.1 
## 4 2015-10-23 00:55:40     30.0 
## 5 2015-10-23 01:19:17      6.00
## 6 2015-10-23 01:23:58     15.9
head(water2)
## # A tibble: 6 x 2
##   `Date Time`         WaterFlow
##   <dttm>                  <dbl>
## 1 2015-10-23 01:00:00      18.8
## 2 2015-10-23 02:00:00      43.1
## 3 2015-10-23 03:00:00      38.0
## 4 2015-10-23 04:00:00      36.1
## 5 2015-10-23 05:00:00      31.9
## 6 2015-10-23 06:00:00      28.2
# Join the two pipes and order chronologically (`Date Time` is already POSIXct)
water <- rbind(water1, water2)
water <- water[order(water$`Date Time`), ]
# Aggregate by hour and take the mean of readings within each hour
water <- water %>%
  separate(`Date Time`, into = c("date", "time"), sep = " ") %>%
  separate(time, into = c("hour", "minute", "second"), sep = ":") %>%
  group_by(date, hour) %>%
  summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE)) %>%
  ungroup() %>%
  rename(time = hour)
## `summarise()` regrouping output by 'date' (override with `.groups` argument)
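An equivalent and arguably simpler aggregation truncates each timestamp to the hour with lubridate::floor_date() (an alternative sketch, not the approach used above):

library(lubridate)

# Truncate each reading to its containing hour, then average within hours
water_alt <- bind_rows(water1, water2) %>%
  mutate(hour = floor_date(`Date Time`, unit = "hour")) %>%
  group_by(hour) %>%
  summarise(WaterFlow = mean(WaterFlow, na.rm = TRUE))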
water_ts <- ts(water$WaterFlow,frequency=24)
head(water_ts)
## Time Series:
## Start = c(1, 1) 
## End = c(1, 6) 
## Frequency = 24 
## [1] 26.10280 18.84515 24.46806 25.56366 22.36159 23.63798
ggtsdisplay(water_ts, points = FALSE, main = "Water flow - hourly", xlab = "Date(hourly)", ylab = "Water flow")

This plot indicates high variance, and we don't see a trend or seasonality in the data. Let's make the series stationary by applying a Box-Cox transformation and first differencing (lag = 1).

water_lambda <- BoxCox.lambda(water_ts)
water_bc <- BoxCox(water_ts, water_lambda)
water_bc_d <- water_bc %>% diff()

ggtsdisplay(water_bc_d, points = FALSE, main = "Water flow - After tranformation and difference", xlab = "Date(hourly)", ylab = "Water flow")

From the ACF and PACF plots, there is a significant spike at lag 1 in the ACF and exponential decay in the PACF. This indicates a non-seasonal MA(1) model with d = 1, i.e. ARIMA(0,1,1).
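A unit-root test can corroborate the choice of a single difference (an extra check, not part of the original run):

# KPSS-based estimate of how many ordinary differences the transformed
# series needs; this should agree with the d = 1 used below
ndiffs(water_bc)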

water_fit <- Arima(water_ts, order = c(0, 1, 1), lambda = water_lambda)
water_fc <- forecast(water_fit, 168)  # forecast 168 hours ahead (7 days x 24 hours)

summary(water_fit)
## Series: water_ts 
## ARIMA(0,1,1) 
## Box Cox transformation: lambda= -0.4973507 
## 
## Coefficients:
##           ma1
##       -0.9794
## s.e.   0.0065
## 
## sigma^2 estimated as 0.01188:  log likelihood=796.49
## AIC=-1588.98   AICc=-1588.97   BIC=-1579.16
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 5.970137 15.94753 12.06652 -7.811153 42.86658 0.7980706 0.02200742
autoplot(water_fc) + 
    labs(title = "Water flow - hourly - MA(1) ", x = "Date - Hourly", y = "water flow") +
    theme(legend.position = "none")

tail(water)
## # A tibble: 6 x 3
##   date       time  WaterFlow
##   <chr>      <chr>     <dbl>
## 1 2015-12-03 11         73.0
## 2 2015-12-03 12         31.5
## 3 2015-12-03 13         66.8
## 4 2015-12-03 14         42.9
## 5 2015-12-03 15         33.4
## 6 2015-12-03 16         66.7
library(lubridate)  
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
water_result <- data.frame(water_fc$mean)
colnames(water_result) <- "WaterFlow"
row.names(water_result) <- seq(ymd_hm("2015-12-3 17:00"), ymd_hm("2015-12-10 16:00"), by = "hour")
head(water_result)
##                     WaterFlow
## 2015-12-03 17:00:00  35.02095
## 2015-12-03 18:00:00  35.02095
## 2015-12-03 19:00:00  35.02095
## 2015-12-03 20:00:00  35.02095
## 2015-12-03 21:00:00  35.02095
## 2015-12-03 22:00:00  35.02095

All 168 hourly point forecasts equal 35.02095, because an ARIMA(0,1,1) model produces a flat point forecast; only the first rows are shown here.

Let's try auto.arima() to confirm p, d and q.

fc_arima_water <- auto.arima(water_ts, seasonal = F)
summary(fc_arima_water)
## Series: water_ts 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.9629
## s.e.   0.0083
## 
## sigma^2 estimated as 212.9:  log likelihood=-4100.12
## AIC=8204.24   AICc=8204.25   BIC=8214.05
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.531199 14.57593 11.14363 -26.38446 47.91859 0.7370315
##                     ACF1
## Training set -0.02311872

auto.arima() also suggested ARIMA(0,1,1) (fit here without the Box-Cox transformation), matching our manual identification.
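Finally, the assignment also asks for this forecast in an Excel readable file. A sketch of the export (the filename is illustrative):

# Move the timestamp row names into a column, then write an Excel readable CSV
water_result %>%
  tibble::rownames_to_column("DateTime") %>%
  readr::write_csv("DATA624_partC_Waterflow_Forecast.csv")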