ATM Forecast, ATM624Data.xlsx

In Part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose. I am giving you data; please provide your written report on your findings, visuals, discussion and your R code all within a Word readable document, except the forecast, which you will put in an Excel readable file. I must be able to cut and paste your R code and run it in R Studio or another interpreter.
library("readxl")
library("dplyr")
library("tidyr")
library("plotly")
library("forecast")
library("fpp2")
library("lubridate")
"I start by reading the excel file from my local machine using readxl package. I then transformed the date column from a numeric to date and finally convert each atm dataset into a time series. Rows with no ATMs and Nas are removed from the final dataset."
## [1] "I start by reading the excel file from my local machine using readxl package. I then transformed the date column from a numeric to date and finally convert each atm dataset into a time series. Rows with no ATMs and Nas are removed from the final dataset."
ATM_Data = read_xlsx("ATM624Data.xlsx")
# Windows Excel stores dates as days since 1899-12-30, hence this origin
ATM_Data$DATE = as.Date(ATM_Data$DATE, origin = "1899-12-30")
ATM_Data = ATM_Data[!is.na(ATM_Data$ATM),]   # drop rows with no ATM label
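As a quick sanity check on the serial-date conversion, a sketch with an illustrative serial number of my own choosing (not a value quoted from the file):

# 39934 should map to the first day of the sample under the Windows origin
as.Date(39934, origin = "1899-12-30")   # expect "2009-05-01"
range(ATM_Data$DATE)                    # expect 2009-05-01 through 2010-04-30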
" With the help of:
https://stackoverflow.com/questions/33128865/starting-a-daily-time-series-in-r
I found a solution for generating a proper time series. The second input of start in the ts function calculates the date of the year. Finally, I replace missing values with 0. Each series is separated and plotted."
## [1] " With the help of:\nhttps://stackoverflow.com/questions/33128865/starting-a-daily-time-series-in-r\n\n I found a solution for generating a proper time series. The second input of start in the ts function calculates the date of the year. Finally, I replace missing values with 0. Each series is separated and plotted."
ATM1 = ATM_Data %>% filter(ATM=="ATM1")%>%replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2009,as.numeric(format(as.Date("2009-05-01"), "%j"))), frequency = 365)
ATM2 = ATM_Data %>% filter(ATM=="ATM2") %>% replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2009,as.numeric(format(as.Date("2009-05-01"), "%j"))), frequency = 365)
ATM3 = ATM_Data %>% filter(ATM=="ATM3",Cash>0) %>%replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2010,as.numeric(format(as.Date("2010-04-28"), "%j"))),end=c(2010,as.numeric(format(as.Date("2010-04-30"), "%j"))) ,frequency = 365)
ATM4 = ATM_Data %>% filter(ATM=="ATM4") %>% replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2009,as.numeric(format(as.Date("2009-05-01"), "%j"))), frequency = 365)
autoplot(ATM1) + labs(title = "Cash withdrawal ATM1", subtitle = "5/1/2009 - 4/30/2010", x = "Date", y = "Cash")
autoplot(ATM2) + labs(title = "Cash withdrawal ATM2", subtitle = "5/1/2009 - 4/30/2010", x = "Date", y = "Cash")
autoplot(ATM3) + labs(title = "Cash withdrawal ATM3", subtitle = "5/1/2009 - 4/30/2010", x = "Date", y = "Cash")
autoplot(ATM4) + labs(title = "Cash withdrawal ATM4", subtitle = "5/1/2009 - 4/30/2010", x = "Date", y = "Cash")
ATM 1 & 2 seems to have consistent transactions. ATM 3 might have been recently installed or data was not gathered considering the missing values.ATM 4 seems to have a spike in cash widthdrawl for the date 2/9/2010,the high withdrawl from this machine might be due to all money being withdrawned from the machine.
"I separated the data into test and training, but I was unable to apply the model correctly to compare the accuracy of the test and training model."
## [1] "I separated the data into test and training, but I was unable to apply the model correctly to compare the accuracy of the test and training model."
#ATM1_Train = ATM_Data %>% filter(ATM=="ATM1")%>%replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2009,as.numeric(format(as.Date("2009-05-01"), "%j"))),end=c(2010,as.numeric(format(as.Date("2010-03-30"), "%j"))), frequency = 365)
#ATM1_Test =ATM_Data %>% filter(ATM=="ATM1")%>%replace_na(list(Cash=0)) %>%select(-DATE,-ATM) %>% ts(start=c(2010,as.numeric(format(as.Date("2010-03-31"), "%j"))),end=c(2010,as.numeric(format(as.Date("2010-04-30"), "%j"))), frequency = 365)
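A sketch of the comparison I was attempting, using window() to split a weekly-frequency version of the series and accuracy() to score the holdout; the split index c(48, 5) is my own arithmetic for 3/30/2010 (observation 334) and should be double-checked:

# Hold out the last 31 days, fit on the rest, and score the holdout
ATM1_Weekly = ts(ATM1, frequency = 7)
ATM1_train  = window(ATM1_Weekly, end = c(48, 5))
ATM1_test   = window(ATM1_Weekly, start = c(48, 6))
fit_train   = auto.arima(ATM1_train, lambda = BoxCox.lambda(ATM1_train))
accuracy(forecast(fit_train, h = length(ATM1_test)), ATM1_test)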
" The ACF plot reveals that the data is not correlated and is not white noise."
## [1] " The ACF plot reveals that the data is not correlated and is not white noise."
ggtsdisplay(ATM1)
" The model chosen by the function is ETS(A,N,N) additive errors and no trend or seasonality.Furthermore, residuals are not normally distributed, and the model did not pass a Ljung-Box test indicating that interval might not be trusted. The model did not require a lambda value."
## [1] " The model chosen by the function is ETS(A,N,N) additive errors and no trend or seasonality.Furthermore, residuals are not normally distributed, and the model did not pass a Ljung-Box test indicating that interval might not be trusted. The model did not require a lambda value."
Model1_ATM1 = ets(ATM1,lambda=BoxCox.lambda(ATM1))
checkresiduals(Model1_ATM1)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 2556.9, df = 362, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 364
"The summary of the model reveals an AICc of 4,800.129 and RMSE of 37.23543."
## [1] "The summary of the model reveals an AICc of 4,800.129 and RMSE of 37.23543."
summary(Model1_ATM1)
## ETS(A,N,N)
##
## Call:
## ets(y = ATM1, lambda = BoxCox.lambda(ATM1))
##
## Box-Cox transformation: lambda= 1
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 82.1913
##
## sigma: 37.2354
##
## AIC AICc BIC
## 4800.063 4800.129 4811.763
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.01180702 37.23543 27.9423 -Inf Inf NaN 0.0215371
"Evident by the ets model it fails to capture the data trend."
## [1] "Evident by the ets model it fails to capture the data trend."
autoplot(forecast(Model1_ATM1, 31))
" Model 2 is a non-seasonal ARIMA(2,0,2) with 2 lag differences no ordinary AR lags and 2 Ma lags. The model fails the Ljung-Box test, and the ACF test indicates no white noise and residual are not normally distributed around zero."
## [1] " Model 2 is a non-seasonal ARIMA(2,0,2) with 2 lag differences no ordinary AR lags and 2 Ma lags. The model fails the Ljung-Box test, and the ACF test indicates no white noise and residual are not normally distributed around zero."
Model2_ATM1 =auto.arima(ATM1,lambda=BoxCox.lambda(ATM1))
checkresiduals(Model2_ATM1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2) with non-zero mean
## Q* = 2156.7, df = 359, p-value < 2.2e-16
##
## Model df: 5. Total lags used: 364
"Summary of the model reveals an AICc of 3,665.34 and RMSE of 36.0517."
## [1] "Summary of the model reveals an AICc of 3,665.34 and RMSE of 36.0517."
summary(Model2_ATM1)
## Series: ATM1
## ARIMA(2,0,2) with non-zero mean
## Box Cox transformation: lambda= 1
##
## Coefficients:
## ar1 ar2 ma1 ma2 mean
## -0.2414 0.3195 0.2672 -0.5364 82.1961
## s.e. 0.1282 0.1107 0.1105 0.0987 1.4986
##
## sigma^2 estimated as 1318: log likelihood=-1826.55
## AIC=3665.1 AICc=3665.34 BIC=3688.5
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.01600632 36.0517 27.43585 -Inf Inf NaN 0.01751342
autoplot(forecast(Model2_ATM1, 31))
"Seen how the two previous models failed I group the series based on weeks. This allowed me to apply a lambda value and convert to white noise."
## [1] "Seen how the two previous models failed I group the series based on weeks. This allowed me to apply a lambda value and convert to white noise."
ATM1_Weekly = ts(ATM1, frequency = 7)
ggtsdisplay(ATM1_Weekly)
"The model given is ARIMA(0,0,0)(2,0,0)[7] with non-zero mean with 2 seasonal diferencing. The Ljung-Box test fails and the residuals are not normally distributed, ACF showed improvement."
## [1] "The model given is ARIMA(0,0,0)(2,0,0)[7] with non-zero mean with 2 seasonal diferencing. The Ljung-Box test fails and the residuals are not normally distributed, ACF showed improvement."
Model3_ATM1_Weekly =auto.arima(ATM1_Weekly,lambda=BoxCox.lambda(ATM1_Weekly))
checkresiduals(Model3_ATM1_Weekly)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 38.675, df = 11, p-value = 6.016e-05
##
## Model df: 3. Total lags used: 14
"Summary reveals AICc=2200.81 and RMSE of 27.88534 by RMSE this model appears to be the best."
## [1] "Summary reveals AICc=2200.81 and RMSE of 27.88534 by RMSE this model appears to be the best."
summary(Model3_ATM1_Weekly)
## Series: ATM1_Weekly
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= 0.5621795
##
## Coefficients:
## sar1 sar2 mean
## 0.4440 0.3191 18.5695
## s.e. 0.0487 0.0494 0.9870
##
## sigma^2 estimated as 23.64: log likelihood=-1096.35
## AIC=2200.7 AICc=2200.81 BIC=2216.29
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3.330723 27.88534 18.27435 -Inf Inf 0.9502131 0.05114076
#c =ATM1 %>% ets() %>% forecast(h=30, level = 95)
autoplot(forecast(Model3_ATM1_Weekly, 31))
ATM1_Forecast = forecast(Model3_ATM1_Weekly, 31, level = 95)
# https://stackoverflow.com/questions/37818975/how-to-store-forecasted-values-using-forecast-library-in-r-into-a-csv-file
F_ATM1 = tibble(DATE = max(ATM_Data$DATE) + 1:31,
                ATM  = "ATM1",
                Cash = as.numeric(ATM1_Forecast$mean))
For ATM 2, the data needed differencing, and an ARIMA(3,1,0)(2,0,0)[7] was selected: three non-seasonal AR lags, one difference, and two seasonal AR lags. The model fails the Ljung-Box test and the residuals do not appear normally distributed.
ATM2_Weekly = ts(ATM2, frequency = 7)
ggtsdisplay(ATM2_Weekly)
Model1_ATM2_Weekly =auto.arima(ATM2_Weekly,lambda=BoxCox.lambda(ATM2_Weekly))
checkresiduals(Model1_ATM2_Weekly)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,0)(2,0,0)[7]
## Q* = 41.27, df = 9, p-value = 4.467e-06
##
## Model df: 5. Total lags used: 14
"summary of the model reveals a RMSE of 28.72892 and AICc=2470.25"
## [1] "summary of the model reveals a RMSE of 28.72892 and AICc=2470.25"
summary(Model1_ATM2_Weekly)
## Series: ATM2_Weekly
## ARIMA(3,1,0)(2,0,0)[7]
## Box Cox transformation: lambda= 0.6498018
##
## Coefficients:
## ar1 ar2 ar3 sar1 sar2
## -0.7085 -0.5474 -0.3192 0.4521 0.4024
## s.e. 0.0499 0.0551 0.0498 0.0478 0.0485
##
## sigma^2 estimated as 49.57: log likelihood=-1229.01
## AIC=2470.01 AICc=2470.25 BIC=2493.4
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.774661 28.72892 20.17921 -Inf Inf 0.9866373 -0.04450353
autoplot(forecast(Model1_ATM2_Weekly, 31, level = 95))
ATM2_Forecast = forecast(Model1_ATM2_Weekly, 31, level = 95)
F_ATM2 = tibble(DATE = max(ATM_Data$DATE) + 1:31,
                ATM  = "ATM2",
                Cash = as.numeric(ATM2_Forecast$mean))
For ATM 3, I filtered out the zero values, since including them produced erroneous forecasts; only the last three days of the sample contain actual withdrawals, so a simple mean forecast is fitted to those.
ggtsdisplay(ATM3)
ATM3_Forecast_fit =meanf(ATM3, 31, level = 95)
autoplot(ATM3_Forecast_fit)
checkresiduals(ATM3_Forecast_fit)
##
## Ljung-Box test
##
## data: Residuals from Mean
## Q* = 1.2822, df = 1, p-value = 0.2575
##
## Model df: 1. Total lags used: 2
summary(ATM3_Forecast_fit)
##
## Forecast method: Mean
##
## Model Information:
## $mu
## [1] 87.66667
##
## $mu.se
## [1] 4.255715
##
## $sd
## [1] 7.371115
##
## $call
## meanf(y = ATM3, h = 31, level = 95)
##
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -4.737024e-15 6.01849 5.555556 -0.4557562 6.242793 NaN
## ACF1
## Training set -0.295501
##
## Forecasts:
## Point Forecast Lo 95 Hi 95
## 2010.3288 87.66667 51.04494 124.2884
## 2010.3315 87.66667 51.04494 124.2884
## 2010.3342 87.66667 51.04494 124.2884
## 2010.3370 87.66667 51.04494 124.2884
## 2010.3397 87.66667 51.04494 124.2884
## 2010.3425 87.66667 51.04494 124.2884
## 2010.3452 87.66667 51.04494 124.2884
## 2010.3479 87.66667 51.04494 124.2884
## 2010.3507 87.66667 51.04494 124.2884
## 2010.3534 87.66667 51.04494 124.2884
## 2010.3562 87.66667 51.04494 124.2884
## 2010.3589 87.66667 51.04494 124.2884
## 2010.3616 87.66667 51.04494 124.2884
## 2010.3644 87.66667 51.04494 124.2884
## 2010.3671 87.66667 51.04494 124.2884
## 2010.3699 87.66667 51.04494 124.2884
## 2010.3726 87.66667 51.04494 124.2884
## 2010.3753 87.66667 51.04494 124.2884
## 2010.3781 87.66667 51.04494 124.2884
## 2010.3808 87.66667 51.04494 124.2884
## 2010.3836 87.66667 51.04494 124.2884
## 2010.3863 87.66667 51.04494 124.2884
## 2010.3890 87.66667 51.04494 124.2884
## 2010.3918 87.66667 51.04494 124.2884
## 2010.3945 87.66667 51.04494 124.2884
## 2010.3973 87.66667 51.04494 124.2884
## 2010.4000 87.66667 51.04494 124.2884
## 2010.4027 87.66667 51.04494 124.2884
## 2010.4055 87.66667 51.04494 124.2884
## 2010.4082 87.66667 51.04494 124.2884
## 2010.4110 87.66667 51.04494 124.2884
F_ATM3 = tibble(DATE = max(ATM_Data$DATE) + 1:31,
                ATM  = "ATM3",
                Cash = as.numeric(ATM3_Forecast_fit$mean))
ATM4_Weekly = ts(ATM4, frequency = 7)
ggtsdisplay(ATM4_Weekly)
" The model was ARIMA(0,0,0)(1,0,0)[7] with non-zero mean based on RMSE I will use the second model."
## [1] " The model was ARIMA(0,0,0)(1,0,0)[7] with non-zero mean based on RMSE I will use the second model."
Model1_ATM4_Weekly =auto.arima(ATM4_Weekly,lambda=BoxCox.lambda(ATM4_Weekly))
checkresiduals(Model1_ATM4_Weekly)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(1,0,0)[7] with non-zero mean
## Q* = 20.98, df = 12, p-value = 0.05068
##
## Model df: 2. Total lags used: 14
"The summary had AICc=990.71 RMSE 682.3723"
## [1] "The summary had AICc=990.71 RMSE 682.3723"
summary(Model1_ATM4_Weekly)
## Series: ATM4_Weekly
## ARIMA(0,0,0)(1,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= -0.07372558
##
## Coefficients:
## sar1 mean
## 0.3080 4.4898
## s.e. 0.0507 0.0699
##
## sigma^2 estimated as 0.8723: log likelihood=-492.32
## AIC=990.65 AICc=990.71 BIC=1002.35
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 217.7942 682.3723 322.4247 -225.6727 291.2924 0.8022625
## ACF1
## Training set -0.01275589
autoplot(forecast(Model1_ATM4_Weekly, 31, level = 95))
ATM4_Forecast = forecast(Model1_ATM4_Weekly, 31, level = 95)
"Model 2 is an ETS(M,N,A) multiplicative errors, no trend and additive seasonality."
## [1] "Model 2 is an ETS(M,N,A) multiplicative errors, no trend and additive seasonality."
Model2_ATM4_Weekly = ets(ATM4_Weekly)
"The model fails Ljung-Box test, but appear to have residual around 0 somewhat normal."
## [1] "The model fails Ljung-Box test, but appear to have residual around 0 somewhat normal."
checkresiduals(Model2_ATM4_Weekly)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,A)
## Q* = 13.559, df = 5, p-value = 0.01866
##
## Model df: 9. Total lags used: 14
"The model summary revealed an AICc 6718.904 and RMSE 635.1095"
## [1] "The model summary revealed an AICc 6718.904 and RMSE 635.1095"
summary(Model2_ATM4_Weekly)
## ETS(M,N,A)
##
## Call:
## ets(y = ATM4_Weekly)
##
## Smoothing parameters:
## alpha = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 461.9642
## s=-256.1942 -59.7629 175.6576 8.4921 60.8334 18.4638
## 52.5102
##
## sigma: 1.1478
##
## AIC AICc BIC
## 6718.282 6718.904 6757.281
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 11.86083 635.1095 301.9556 -568.2801 598.7681 0.7513309
## ACF1
## Training set -0.004710367
autoplot(forecast(Model2_ATM4_Weekly, 31, level = 95))
ATM4_Forecast_ets = forecast(Model2_ATM4_Weekly, 31, level = 95)
F_ATM4 = tibble(DATE = max(ATM_Data$DATE) + 1:31,
                ATM  = "ATM4",
                Cash = as.numeric(ATM4_Forecast_ets$mean))
ATM_Fs = rbind(F_ATM1,F_ATM2,F_ATM3,F_ATM4)
write.csv(ATM_Fs,"Forecast_ATM.csv")
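The CSV opens directly in Excel, but if a native .xlsx file is preferred, the writexl package (an extra dependency, not loaded above) can write the same data frame:

# writexl::write_xlsx(ATM_Fs, "Forecast_ATM.xlsx")   # optional .xlsx output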
Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and provide a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
For this series, I started by checking for NAs and found one missing KWH value (case 861); it was replaced with the median. I will fit ARIMA and ETS models with a Box-Cox transformation (lambda of about 0.083).
Power_Data = read_xlsx("ResidentialCustomerForecastLoad-624.xlsx")
sum(is.na(Power_Data))
## [1] 1
Power_Data$KWH[is.na(Power_Data$KWH)]= median(Power_Data$KWH,na.rm = TRUE)
Power_ts_All =ts(Power_Data[,"KWH"],start = c(1998,1),frequency = 12)
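An alternative to the median fill, sketched here but not applied below, is seasonal interpolation with forecast's na.interp(), which respects the strong monthly pattern when imputing:

# Hypothetical alternative: impute the missing KWH by seasonal interpolation
Power_raw = read_xlsx("ResidentialCustomerForecastLoad-624.xlsx")
Power_seasonal = na.interp(ts(Power_raw$KWH, start = c(1998, 1), frequency = 12))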
" I will also split the dataset into training and test set."
## [1] " I will also split the dataset into training and test set."
# window() subsets by date; ts(start=, end=) would merely relabel the values
Power_ts_Train = window(Power_ts_All, end = c(2012, 12))
Power_ts_Test = window(Power_ts_All, start = c(2013, 1))
autoplot(Power_ts_All)+labs(title = "Monthly Power Usage Jan-1998-Dec-2013")
"Seasonality can be observed in the below plot with a slight dip in case 883 Jul 2010."
## [1] "Seasonality can be observed in the below plot with a slight dip in case 883 Jul 2010."
ggseasonplot(Power_ts_All)
"Checking the ts display function identified the seasonality previously mentioned."
## [1] "Checking the ts display function identified the seasonality previously mentioned."
ggtsdisplay(Power_ts_Train)
"The first model for power time series is the ARIMA(1,0,0)(2,0,0)[12] ordinary lag of 1 and 2 seasonal lags."
## [1] "The first model for power time series is the ARIMA(1,0,0)(2,0,0)[12] ordinary lag of 1 and 2 seasonal lags."
Power_Model1 =auto.arima(Power_ts_Train,lambda=BoxCox.lambda(Power_ts_Train))
"The residual appears normal residuals mostly around 0, and Ljung-Box test had a p-value = 0.7106 which indicated white noise."
## [1] "The residual appears normal residuals mostly around 0, and Ljung-Box test had a p-value = 0.7106 which indicated white noise."
checkresiduals(Power_Model1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,0,0)[12] with non-zero mean
## Q* = 16.097, df = 20, p-value = 0.7106
##
## Model df: 4. Total lags used: 24
"Summary of the model reveals a AICc=437.5 and RMSE of 1,089,831 "
## [1] "Summary of the model reveals a AICc=437.5 and RMSE of 1,089,831 "
summary(Power_Model1)
## Series: Power_ts_Train
## ARIMA(1,0,0)(2,0,0)[12] with non-zero mean
## Box Cox transformation: lambda= 0.0827662
##
## Coefficients:
## ar1 sar1 sar2 mean
## 0.1875 0.2781 0.2735 32.0788
## s.e. 0.0746 0.0693 0.0727 0.1439
##
## sigma^2 estimated as 0.6291: log likelihood=-213.58
## AIC=437.15 AICc=437.5 BIC=453.12
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 138495.8 1089831 729536 -4.273385 15.06852 1.033785 0.1669375
" In this step, I will compare my predictions to my testing set visually. The model appears to capture the trend of the data."
## [1] " In this step, I will compare my predictions to my testing set visually. The model appears to capture the trend of the data."
autoplot(forecast(Power_Model1, 12, level = 95))+autolayer(Power_ts_Test,series = "Test Data")
Power_Model1_Forecast = forecast(Power_Model1, 12, level = 95)
# Training accuracy is based on residuals; test accuracy on forecast errors
The accuracy() call reports both training errors (based on residuals) and test errors (based on forecast errors); I am looking for the model with the lowest test-set RMSE. Here the test RMSE is 1,126,323.
accuracy(Power_Model1_Forecast,Power_ts_Test)
## ME RMSE MAE MPE MAPE MASE
## Training set 138495.8 1089831 729536.0 -4.273385 15.06852 1.033785
## Test set -678188.8 1126323 995955.3 -14.142039 18.06837 1.411313
## ACF1 Theil's U
## Training set 0.1669375 NA
## Test set 0.3492143 1.062158
"Model 2 is based on ets the model returned is ETS(A,N,A) with Additive errors no trend and additive seasonality. The Ljung-Box test has a p-value = 0.1249, ACF and residuals seem normally distributed."
## [1] "Model 2 is based on ets the model returned is ETS(A,N,A) with Additive errors no trend and additive seasonality. The Ljung-Box test has a p-value = 0.1249, ACF and residuals seem normally distributed."
Power_Model2 =ets(Power_ts_Train,lambda=BoxCox.lambda(Power_ts_Train))
checkresiduals(Power_Model2)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 15.202, df = 10, p-value = 0.1249
##
## Model df: 14. Total lags used: 24
"RMSE 87,7944.1 and AICc of 826.9934 is observed in the summary."
## [1] "RMSE 87,7944.1 and AICc of 826.9934 is observed in the summary."
summary(Power_Model2)
## ETS(A,N,A)
##
## Call:
## ets(y = Power_ts_Train, lambda = BoxCox.lambda(Power_ts_Train))
##
## Box-Cox transformation: lambda= 0.0828
##
## Smoothing parameters:
## alpha = 0.0446
## gamma = 1e-04
##
## Initial states:
## l = 31.9427
## s=-0.1377 -0.9081 -0.3601 0.7124 0.9859 0.1758
## 0.0453 -0.8169 -0.6327 -0.2031 0.3195 0.8197
##
## sigma: 0.6766
##
## AIC AICc BIC
## 824.0665 826.9934 871.9609
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 165832.5 877944.1 571839.9 -2.476022 12.30692 0.8103228
## ACF1
## Training set 0.2687275
"Comparing training model vs test data."
## [1] "Comparing training model vs test data."
autoplot(forecast(Power_Model2, 12, level = 95))+autolayer(Power_ts_Test,series = "Test Data")
Power_Model2_Forecast = forecast(Power_Model2, 12, level = 95)
"The RMSE for the test data is 1,165,274.4"
## [1] "The RMSE for the test data is 1,165,274.4"
accuracy(Power_Model2_Forecast,Power_ts_Test)
## ME RMSE MAE MPE MAPE MASE
## Training set 165832.5 877944.1 571839.9 -2.476022 12.30692 0.8103228
## Test set -690572.9 1165274.4 1014612.8 -13.347726 17.11358 1.4377519
## ACF1 Theil's U
## Training set 0.2687275 NA
## Test set 0.1956006 0.9513105
"After comparing the RMSE of in the accuracy test, I will choose the first model due to the lower score and better prediction capabilities of the model."
## [1] "After comparing the RMSE of in the accuracy test, I will choose the first model due to the lower score and better prediction capabilities of the model."
powerdf = data.frame(Date = paste0(month.abb,"-2014"),
KWH = as.numeric(Power_Model1_Forecast$mean) )
write.csv(powerdf,"Power_Forecast2014.csv")
Part C consists of two data sets. These are simple 2-column sets; however, they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (for multiple recordings within an hour, take the mean), then determine whether the data is stationary and can be forecast. If so, provide a week-forward forecast and present the results as above: the write-up in a Word readable file and the forecast in an Excel readable file.
For this part, I start by reading the Excel files with readxl, importing the columns as date and numeric, which reduces the conversion steps needed. The timestamps are then rounded to the hour, grouped, and averaged.
w1 = read_excel("Waterflow_Pipe1.xlsx",col_types =c("date", "numeric"))
w2 = read_excel("Waterflow_Pipe2.xlsx",col_types =c("date", "numeric"))
colnames(w1)= c("date_time","WaterFlow")
colnames(w2)= c("Date_Time","WaterFlow")
Waterdf= w1 %>% mutate(Date_Time = lubridate::round_date(date_time,"hour") ) %>% select(Date_Time,WaterFlow) %>% bind_rows(w2) %>% group_by(Date_Time) %>% summarize(WaterFlowF = mean(WaterFlow, na.rm = T))
"The next step is to create the time series. I choose a frequency of 24 to simulate hourly other forms frequencies and times were tested. I also tried to separate the data into a training and test set but when testing the model the forecast was at the end and test data at the beginning of time series."
## [1] "The next step is to create the time series. I choose a frequency of 24 to simulate hourly other forms frequencies and times were tested. I also tried to separate the data into a training and test set but when testing the model the forecast was at the end and test data at the beginning of time series."
Water_ts = ts(Waterdf$WaterFlowF,frequency = 24)
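The other frequencies mentioned above can be explored with msts() and tbats() from the forecast package; this sketch assumes the hourly data carries both daily (24) and weekly (168) cycles, which is my assumption rather than a fitted result from this report:

Water_msts  = msts(Waterdf$WaterFlowF, seasonal.periods = c(24, 168))
Water_tbats = tbats(Water_msts)     # TBATS accommodates multiple seasonal periods
# forecast(Water_tbats, h = 168)    # week-ahead forecast under that specification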
"Here we examine the complete time series data and proceed to view the ts display."
## [1] "Here we examine the complete time series data and proceed to view the ts display."
autoplot(Water_ts)
"The ts ACF plot reveals that correlations are not random and that the time series is not white noise and can be used to forecast future values."
## [1] "The ts ACF plot reveals that correlations are not random and that the time series is not white noise and can be used to forecast future values."
ggtsdisplay(Water_ts)
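Since the assignment asks whether the series is stationary, a quick unit-root check using forecast's helpers (a sketch; the KPSS-based ndiffs() should agree with the single difference auto.arima() selects below):

ndiffs(Water_ts)    # differences needed for stationarity; expect 1
nsdiffs(Water_ts)   # seasonal differences needed; expect 0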
"
We will first run an ARIMA model we get is ARIMA(0,1,1).The Arima model has been differenced once, and one past error has been used. We passed lambda -0.82 simulating an inverse. The residuals are close to zero with few outliers."
## [1] "\nWe will first run an ARIMA model we get is ARIMA(0,1,1).The Arima model has been differenced once, and one past error has been used. We passed lambda -0.82 simulating an inverse. The residuals are close to zero with few outliers."
Water_Model1 =auto.arima(Water_ts,lambda=BoxCox.lambda(Water_ts),stepwise = FALSE)
checkresiduals(Water_Model1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 41.388, df = 47, p-value = 0.7034
##
## Model df: 1. Total lags used: 48
"The RMSE for the first model is 16.73475 and AICc=-3270.76. "
## [1] "The RMSE for the first model is 16.73475 and AICc=-3270.76. "
summary(Water_Model1)
## Series: Water_ts
## ARIMA(0,1,1)
## Box Cox transformation: lambda= -0.8243695
##
## Coefficients:
## ma1
## -0.9869
## s.e. 0.0055
##
## sigma^2 estimated as 0.002209: log likelihood=1637.39
## AIC=-3270.77 AICc=-3270.76 BIC=-3260.96
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 7.689601 16.73475 12.72097 -1.78783 42.43292 0.847649
## ACF1
## Training set 0.05335534
Water_Model1_Forecast = forecast(Water_Model1, 168, level=95)
autoplot(forecast(Water_Model1, 168, level = 95))
" Model two consist of an ets model. The Ljung-Box test identifies white noise residuals close to zero and some random autocorrelations. The model selected was ETS(A,N,N) with additive errors no seasonality and no trend. RMSE 16.76812 and AICc 798.2221."
## [1] " Model two consist of an ets model. The Ljung-Box test identifies white noise residuals close to zero and some random autocorrelations. The model selected was ETS(A,N,N) with additive errors no seasonality and no trend. RMSE 16.76812 and AICc 798.2221."
Water_Model2 = ets(Water_ts,lambda=BoxCox.lambda(Water_ts))
checkresiduals(Water_Model2)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 41.298, df = 46, p-value = 0.6692
##
## Model df: 2. Total lags used: 48
summary(Water_Model2)
## ETS(A,N,N)
##
## Call:
## ets(y = Water_ts, lambda = BoxCox.lambda(Water_ts))
##
## Box-Cox transformation: lambda= -0.8244
##
## Smoothing parameters:
## alpha = 0.0119
##
## Initial states:
## l = 1.1247
##
## sigma: 0.0469
##
## AIC AICc BIC
## 798.1980 798.2221 812.9243
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 7.738348 16.76812 12.7472 -1.624566 42.41892 0.8493971
## ACF1
## Training set 0.05609743
autoplot(forecast(Water_Model2, 168, level = 95))
Water_Model2_Forecast = forecast(Water_Model2, 168, level=95)
"I chose the first model due to the slightly better RMSE and AICc."
## [1] "I chose the first model due to the slightly better RMSE and AICc."
Water_csv = tibble(Date_Time = max(Waterdf$Date_Time) + lubridate::hours(1:168),
                   WaterFlowF = as.numeric(Water_Model1_Forecast$mean))
write.csv(Water_csv,"Water_ForecastWeek.csv")