In Part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to give this a little more of a business feel. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you the data via an Excel file. Please provide your written report on your findings, visuals, discussion, and your R code via an RPubs link along with the actual .rmd file. Also, please submit the forecast in an Excel-readable file.
The first step is to read in the data and examine it to determine the next steps: we read in the Excel file, convert it to a data frame, and convert the date column into a readable format.
# Packages used throughout this report (pipes, models and plots, CSV export)
library(dplyr); library(forecast)
library(ggplot2); library(readr)
setwd("/Users/elinaazrilyan/Downloads/")
atmdata <- readxl::read_excel("ATM624Data.xlsx", skip = 0)
atmdf<-as.data.frame(atmdata)
atmdf$DATE <- as.Date(atmdf$DATE, origin = "1899-12-30")
summary(atmdf)
## DATE ATM Cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10919.8
## NA's :19
We see that there are NAs. The 14 rows with a missing ATM label correspond to the empty May 2010 rows we are asked to forecast, so we drop them; the few missing Cash values that remain will be imputed per ATM below.
atmdf <- atmdf %>% filter(!is.na(ATM))
summary(atmdf)
## DATE ATM Cash
## Min. :2009-05-01 Length:1460 Min. : 0.0
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 0.5
## Median :2009-10-30 Mode :character Median : 73.0
## Mean :2009-10-30 Mean : 155.6
## 3rd Qu.:2010-01-29 3rd Qu.: 114.0
## Max. :2010-04-30 Max. :10919.8
## NA's :5
We can see that we have 1 year worth of data ranging from 5/1/2009 to 4/30/2010. Since ‘Cash’ is recorded in hundreds of dollars, daily withdrawals range from $0 to $1,091,980; that maximum looks suspiciously high next to the mean ($15,560) and median ($7,300).
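Since ‘Cash’ is in hundreds of dollars, a quick conversion confirms the dollar figures quoted above (a minimal sketch):
# Convert the summary statistics from hundreds of dollars to dollars
c(median = 73.0, mean = 155.6, max = 10919.8) * 100
# -> median 7,300; mean 15,560; max 1,091,980 dollars per day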
We will now separate the data for the 4 ATMs and examine them individually. We will also replace NAs with the median value for each ATM.
atm1 <- subset(atmdf, ATM == "ATM1")
atm2 <- subset(atmdf, ATM == "ATM2")
atm3 <- subset(atmdf, ATM == "ATM3")
atm4 <- subset(atmdf, ATM == "ATM4")
atm1$Cash <- atm1$Cash %>% replace(is.na(.), median(atm1$Cash, na.rm=T))
summary(atm1)
## DATE ATM Cash
## Min. :2009-05-01 Length:365 Min. : 1.00
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 73.00
## Median :2009-10-30 Mode :character Median : 91.00
## Mean :2009-10-30 Mean : 83.95
## 3rd Qu.:2010-01-29 3rd Qu.:108.00
## Max. :2010-04-30 Max. :180.00
atm2$Cash <- atm2$Cash %>% replace(is.na(.), median(atm2$Cash, na.rm=T))
summary(atm2)
## DATE ATM Cash
## Min. :2009-05-01 Length:365 Min. : 0.0
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 26.0
## Median :2009-10-30 Mode :character Median : 67.0
## Mean :2009-10-30 Mean : 62.6
## 3rd Qu.:2010-01-29 3rd Qu.: 93.0
## Max. :2010-04-30 Max. :147.0
atm3$Cash <- atm3$Cash %>% replace(is.na(.), median(atm3$Cash, na.rm=T))
summary(atm3)
## DATE ATM Cash
## Min. :2009-05-01 Length:365 Min. : 0.0000
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 0.0000
## Median :2009-10-30 Mode :character Median : 0.0000
## Mean :2009-10-30 Mean : 0.7206
## 3rd Qu.:2010-01-29 3rd Qu.: 0.0000
## Max. :2010-04-30 Max. :96.0000
atm4$Cash <- atm4$Cash %>% replace(is.na(.), median(atm4$Cash, na.rm=T))
summary(atm4)
## DATE ATM Cash
## Min. :2009-05-01 Length:365 Min. : 1.563
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 124.334
## Median :2009-10-30 Mode :character Median : 403.839
## Mean :2009-10-30 Mean : 474.043
## 3rd Qu.:2010-01-29 3rd Qu.: 704.507
## Max. :2010-04-30 Max. :10919.762
The next step is to convert the data into time series format and see if any additional adjustments and clean up will be needed.
atm1ts <- ts(atm1["Cash"], frequency = 7)
atm2ts <- ts(atm2["Cash"], frequency = 7)
atm3ts <- ts(atm3["Cash"], frequency = 7)
atm4ts <- ts(atm4["Cash"], frequency = 7)
autoplot(atm1ts)
autoplot(atm2ts)
autoplot(atm3ts)
autoplot(atm4ts)
Two things concern me in the plots above: there is almost no data for ATM3, and ATM4 contains one significant outlier that is obvious from visual inspection. I will remove the ATM4 outlier and replace it with the next-highest, more realistic value.
tail(sort(atm4$Cash))
## [1] 1301.396 1484.127 1495.155 1574.779 1712.075 10919.762
atm4$Cash[atm4$Cash >=2000] <- 1712.075
tail(sort(atm4$Cash))
## [1] 1301.396 1484.127 1495.155 1574.779 1712.075 1712.075
atm4ts <- ts(atm4["Cash"], frequency = 7)
autoplot(atm4ts)
The data now looks much more realistic, and it is worth noting that the scale of ATM4’s daily withdrawals is very different from ATM1 and ATM2.
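To quantify the scale differences, here is a quick per-ATM summary (a sketch using dplyr on atmdf; note atmdf still contains the ATM4 outlier, which we fixed only in the atm4 subset):
# Median and mean daily withdrawals per ATM (hundreds of dollars)
atmdf %>%
  group_by(ATM) %>%
  summarise(median = median(Cash, na.rm = TRUE),
            mean = mean(Cash, na.rm = TRUE))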
I have decided to fill the missing ATM3 history with the day-by-day mean of ATM1 and ATM2; I believe this will work better than forecasting from only the 3 recent data points available.
# Replace the first 362 days (all zeros) with the ATM1/ATM2 daily average;
# the last 3 days are ATM3's only real observations and are kept as-is
atm3$Cash[1:362] <- (atm1$Cash[1:362] + atm2$Cash[1:362]) / 2
atm3ts <- ts(atm3["Cash"], frequency = 7)
autoplot(atm3ts)
Let’s determine whether there is weekly seasonality in our data.
ggseasonplot(atm1ts, polar=TRUE) +
ylab("ATM Withdrawals") +
ggtitle("Polar seasonal plot: ATM 1")
atm1ts %>% decompose(type="additive") %>%
autoplot() + xlab("Week") +
ggtitle("Classical additive decomposition of ATM 1")
It is obvious from the plots above that there is seasonality and no trend. I am a little concerned about the pattern in the remainder plot: there is more variation as we get into the more recent data.
I will now fit Holt-Winters’ additive method to the seasonally adjusted data, with an automatic lambda for the Box-Cox transformation.
atm1fit1 <- decompose(atm1ts, type="additive")
atm1fit1 <- atm1fit1 %>% seasadj() %>% hw(h=14, damped=TRUE, lambda = "auto")
autoplot(forecast(atm1fit1))
accuracy(atm1fit1)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2515983 23.98107 15.41124 11.92344 64.67484 0.8703618
## ACF1
## Training set 0.1332457
Now let’s check residuals.
checkresiduals(atm1fit1)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 24.449, df = 3, p-value = 2.013e-05
##
## Model df: 12. Total lags used: 15
The Ljung-Box p-value is very small (2.0e-05), so we reject the white-noise hypothesis: the residuals are still distinguishable from white noise and retain some autocorrelation.
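For reference, the test that checkresiduals runs can be reproduced directly with stats::Box.test (the lag and fitdf values match those reported above):
# Ljung-Box test on the Holt-Winters residuals: 15 lags, 12 model parameters
Box.test(residuals(atm1fit1), lag = 15, type = "Ljung-Box", fitdf = 12)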
I am going to try an Arima model as well and see what results that yields.
atm1fit2 <- auto.arima(atm1ts)
atm1fit2
## Series: atm1ts
## ARIMA(0,0,1)(0,1,2)[7]
##
## Coefficients:
## ma1 sma1 sma2
## 0.1695 -0.5867 -0.0989
## s.e. 0.0553 0.0508 0.0497
##
## sigma^2 estimated as 559.8: log likelihood=-1641.12
## AIC=3290.23 AICc=3290.34 BIC=3305.75
autoplot(forecast(atm1fit2))
accuracy(atm1fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.07363941 23.3332 14.60281 -102.7359 117.6027 0.8247052
## ACF1
## Training set -0.008134059
Now let’s check the residuals of the ARIMA model.
checkresiduals(atm1fit2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,2)[7]
## Q* = 16.789, df = 11, p-value = 0.1143
##
## Model df: 3. Total lags used: 14
The ACF plot and the residual histogram look fine, and the large Ljung-Box p-value (0.114) means we cannot reject the hypothesis that the residuals are white noise.
The ARIMA model yields a slightly lower RMSE and better-behaved residuals, but the difference in RMSE is not that significant, so I am going to stay with the Holt-Winters’ forecast.
atm1fcdf <- data.frame(forecast(atm1fit1, h=14))
write_csv(atm1fcdf, "ATM1_Result.csv")
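Optionally, the export is easier to read with the forecast dates attached (a sketch; this assumes the 14 forecast days are 2010-05-01 through 2010-05-14, matching the blank rows in the source file):
# Attach calendar dates to the point forecasts and re-export
atm1fcdf$DATE <- seq(as.Date("2010-05-01"), by = "day", length.out = 14)
write_csv(atm1fcdf, "ATM1_Result.csv")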
Next, let’s check ATM2 for weekly seasonality.
ggseasonplot(atm2ts, polar=TRUE) +
ylab("ATM Withdrawals") +
ggtitle("Polar seasonal plot: ATM 2")
atm2ts %>% decompose(type="additive") %>%
autoplot() + xlab("Week") +
ggtitle("Classical additive decomposition of ATM 2")
It is obvious from the plots above that there is seasonality and a slight downward trend. The remainder plot again shows more variation in the recent data.
I will again fit Holt-Winters’ additive method to the seasonally adjusted data with an automatic Box-Cox lambda.
atm2fit1 <- decompose(atm2ts, type="additive")
atm2fit1 <- atm2fit1 %>% seasadj() %>% hw(h=14, damped=TRUE, lambda = "auto")
autoplot(forecast(atm2fit1))
accuracy(atm2fit1)
## ME RMSE MAE MPE MAPE MASE
## Training set -4.438064 25.35359 18.42074 -2.799921 94.77185 0.8851845
## ACF1
## Training set 0.02623264
Now let’s check residuals.
checkresiduals(atm2fit1)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 25.901, df = 3, p-value = 1e-05
##
## Model df: 12. Total lags used: 15
Here too the Ljung-Box p-value is tiny (1e-05), so the residuals are distinguishable from white noise.
I am going to try an Arima model as well and see what results that yields.
atm2fit2 <- auto.arima(atm2ts, lambda = "auto")
atm2fit2
## Series: atm2ts
## ARIMA(3,0,3)(0,1,1)[7] with drift
## Box Cox transformation: lambda= 0.7278107
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 sma1 drift
## 0.4890 -0.4982 0.8330 -0.4843 0.3277 -0.7863 -0.7141 -0.0204
## s.e. 0.0855 0.0742 0.0606 0.1057 0.0946 0.0615 0.0456 0.0074
##
## sigma^2 estimated as 69.29: log likelihood=-1265.19
## AIC=2548.39 AICc=2548.9 BIC=2583.31
autoplot(forecast(atm2fit2))
accuracy(atm2fit2)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.44981 24.25163 17.12271 -Inf Inf 0.8228094 0.005867841
Now let’s check the residuals of the ARIMA model.
checkresiduals(atm2fit2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,3)(0,1,1)[7] with drift
## Q* = 8.859, df = 6, p-value = 0.1817
##
## Model df: 8. Total lags used: 14
The ACF plot and histogram look fine, and the Ljung-Box p-value (0.182) gives no evidence against white-noise residuals.
Since the ARIMA model has the lower RMSE and its residuals pass the Ljung-Box test, I am going to use it for the ATM2 forecast.
atm2fcdf <- data.frame(forecast(atm2fit2, h=14))
write_csv(atm2fcdf, "ATM2_Result.csv")
Now let’s check the reconstructed ATM3 series for weekly seasonality.
ggseasonplot(atm3ts, polar=TRUE) +
ylab("ATM Withdrawals") +
ggtitle("Polar seasonal plot: ATM 3")
atm3ts %>% decompose(type="additive") %>%
autoplot() + xlab("Week") +
ggtitle("Classical additive decomposition of ATM 3")
It is obvious from the plots above that there is seasonality and no trend. The remainder plot shows more variation in the recent data.
Since only the last few observations are real ATM3 data, I want Holt-Winters’ additive method to give more weight to recent observations, so I will set a high smoothing parameter (alpha = 0.75). As before, I fit the model to the seasonally adjusted data with an automatic Box-Cox lambda.
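To see why a high alpha emphasizes recent data, recall that exponential smoothing places weight alpha * (1 - alpha)^j on the observation j days back, so the weights decay geometrically (a quick illustration):
# Weights on the six most recent observations with alpha = 0.75
alpha <- 0.75
round(alpha * (1 - alpha)^(0:5), 4)
# -> 0.7500 0.1875 0.0469 0.0117 0.0029 0.0007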
atm3fit1 <- decompose(atm3ts, type="additive")
atm3fit1 <- atm3fit1 %>% seasadj() %>% hw(h=14, alpha = 0.75, lambda = "auto")
autoplot(forecast(atm3fit1))
accuracy(atm3fit1)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.6183751 31.87542 20.7277 22.12999 94.21412 1.309426
## ACF1
## Training set -0.1383788
checkresiduals(atm3fit1)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 166.19, df = 3, p-value < 2.2e-16
##
## Model df: 11. Total lags used: 14
Both the ACF plot and the extremely small Ljung-Box p-value (< 2.2e-16) indicate that the residuals are strongly autocorrelated, i.e. clearly distinguishable from white noise.
I am going to try an Arima model as well and see what results that yields.
atm3fit2 <- auto.arima(atm3ts, lambda = "auto")
atm3fit2
## Series: atm3ts
## ARIMA(2,0,2)(0,1,1)[7]
## Box Cox transformation: lambda= 0.7606979
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4636 -0.8721 0.5521 0.7807 -0.7124
## s.e. 0.0807 0.0528 0.1077 0.0720 0.0433
##
## sigma^2 estimated as 62.98: log likelihood=-1249.08
## AIC=2510.15 AICc=2510.39 BIC=2533.44
autoplot(forecast(atm3fit2))
accuracy(atm3fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.17537 20.89342 13.80117 -56.17381 72.769 0.8718577
## ACF1
## Training set -0.008116625
Now let’s check the residuals of the ARIMA model.
checkresiduals(atm3fit2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7]
## Q* = 9.7159, df = 9, p-value = 0.374
##
## Model df: 5. Total lags used: 14
The ACF plot and histogram look fine, and the Ljung-Box p-value (0.374) gives no evidence against white-noise residuals.
Even though the ARIMA forecast has a much lower RMSE, Holt-Winters’ method lets me weight the recent (real) ATM3 observations heavily, so I am going to use that model.
atm3fcdf <- data.frame(forecast(atm3fit1, h=14))
write_csv(atm3fcdf, "ATM3_Result.csv")
Finally, let’s check ATM4 for weekly seasonality.
ggseasonplot(atm4ts, polar=TRUE) +
ylab("ATM Withdrawals") +
ggtitle("Polar seasonal plot: ATM 4")
atm4ts %>% decompose(type="additive") %>%
autoplot() + xlab("Week") +
ggtitle("Classical additive decomposition of ATM 4")
There is no clear seasonality or trend in ATM4 data.
I will now use simple exponential smoothing with an automatic Box-Cox lambda; SES has no trend component, so there is nothing to damp.
atm4fit1 <- ses(atm4ts, lambda = "auto", h = 14)
autoplot(forecast(atm4fit1))
accuracy(atm4fit1)
## ME RMSE MAE MPE MAPE MASE
## Training set 97.41179 369.7028 295.4643 -431.2928 479.2372 0.8430885
## ACF1
## Training set 0.06045465
Now let’s check residuals.
checkresiduals(atm4fit1)
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 54.867, df = 12, p-value = 1.912e-07
##
## Model df: 2. Total lags used: 14
The Ljung-Box p-value is very small (1.9e-07), so the SES residuals show significant autocorrelation and are distinguishable from white noise.
I am going to try an Arima model as well and see what results that yields.
atm4fit2 <- auto.arima(atm4ts, lambda = "auto")
atm4fit2
## Series: atm4ts
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= 0.447289
##
## Coefficients:
## sar1 sar2 mean
## 0.2157 0.1788 28.4276
## s.e. 0.0518 0.0522 1.1206
##
## sigma^2 estimated as 175.5: log likelihood=-1459.98
## AIC=2927.95 AICc=2928.06 BIC=2943.55
autoplot(forecast(atm4fit2))
accuracy(atm4fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set 87.94535 359.4022 279.3686 -380.113 425.0579 0.7971604
## ACF1
## Training set 0.07698303
Now let’s check the residuals of the ARIMA model.
checkresiduals(atm4fit2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 14.337, df = 11, p-value = 0.2149
##
## Model df: 3. Total lags used: 14
The ACF plot and histogram look fine, and the Ljung-Box p-value (0.215) gives no evidence against white-noise residuals.
Since the ARIMA model has the lower RMSE and cleaner residuals, I am going to use it for the ATM4 forecast.
atm4fcdf <- data.frame(forecast(atm4fit2, h=14))
write_csv(atm4fcdf, "ATM4_Result.csv")
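Since the deliverable must be Excel-readable, the four forecasts can also be combined into a single workbook (a sketch; assumes the writexl package is installed):
# One sheet per ATM in a single Excel file
library(writexl)
write_xlsx(list(ATM1 = atm1fcdf, ATM2 = atm2fcdf, ATM3 = atm3fcdf, ATM4 = atm4fcdf),
           "ATM_Forecasts.xlsx")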
Part B consists of a simple dataset of residential power usage from January 1998 through December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
Reading in the data from Excel:
setwd("/Users/elinaazrilyan/Downloads/")
powerdata <- readxl::read_excel("ResidentialCustomerForecastLoad-624.xlsx", skip=0)
powerdf<-as.data.frame(powerdata)
summary(powerdf)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
After reviewing the summary, we will proceed with data clean-up: I will replace the single NA with the median of the series.
powerdf$KWH <- powerdf$KWH %>% replace(is.na(.), median(powerdf$KWH, na.rm=T))
summary(powerdf)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5434539
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6501333
## 3rd Qu.:876.2 3rd Qu.: 7608792
## Max. :924.0 Max. :10655730
I will convert the data to time series and review the plot.
powerdfts <- ts(powerdf["KWH"], start = c(1998, 1), frequency = 12)
autoplot(powerdfts)
The plot shows one implausibly low value (the minimum); I will replace this outlier with the 75th percentile of the series.
powerdf$KWH <- powerdf$KWH %>% replace(powerdf$KWH==min(powerdf$KWH), quantile(powerdf$KWH, 0.75))
summary(powerdf)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 4313019
## 1st Qu.:780.8 Class :character 1st Qu.: 5443502
## Median :828.5 Mode :character Median : 6314472
## Mean :828.5 Mean : 6536949
## 3rd Qu.:876.2 3rd Qu.: 7617591
## Max. :924.0 Max. :10655730
powerdfts <- ts(powerdf["KWH"], start = c(1998, 1), frequency = 12)
autoplot(powerdfts)
ggAcf(powerdfts)
There is a clear indication of seasonality in both plots above, and there may be a slight upward trend.
autoplot(decompose(powerdfts))
Decomposition confirms seasonality and upward trend.
The polar plot makes it easy to see the seasonal pattern.
ggseasonplot(powerdfts, polar=TRUE) +
ggtitle("Polar seasonal plot: POWER")
It is clear that power consumption increases during the summer and winter months, when air conditioning and heating consume additional electricity; the values are much lower during the spring and fall months. Here is another illustration of that:
ggsubseriesplot(powerdfts) +
ylab("KWH") +
ggtitle("Seasonal subseries plot: Power")
I will split the data into training and test sets, using the last 2 years of data (2012–2013) for testing.
powerdfts.train <- window(powerdfts, end=c(2011,12))
powerdfts.test <- window(powerdfts, start=2012)
autoplot(powerdfts) +
autolayer(powerdfts.train, series="Training") +
autolayer(powerdfts.test, series="Test")
I am going to try a variety of methods and see which results in the best forecast.
fc1 <- snaive(powerdfts.train,24)
accuracy(fc1, powerdfts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 114098.3 805602.6 624581.7 0.9213942 9.544575 1.000000
## Test set -272199.3 965567.1 742063.5 -3.8050476 9.295010 1.188097
## ACF1 Theil's U
## Training set 0.3472363 NA
## Test set -0.1987993 0.5349666
checkresiduals(fc1)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 82.562, df = 24, p-value = 2.365e-08
##
## Model df: 0. Total lags used: 24
The Ljung-Box p-value is tiny (2.4e-08), so the seasonal naive residuals are clearly autocorrelated and distinguishable from white noise.
For my next forecast I will try Holt-Winters’ method.
fc2 <- powerdfts.train %>% hw(h=12, lambda = "auto") # note: h=12 covers only the first year of the 2-year test set
autoplot(forecast(fc2))
accuracy(fc2, powerdfts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 22451.44 579996.7 446863.3 -0.3552597 6.961553 0.715460
## Test set -828557.80 983466.9 828955.6 -11.9153503 11.920368 1.327217
## ACF1 Theil's U
## Training set 0.20847896 NA
## Test set 0.08904977 0.6768765
checkresiduals(fc2)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 45.879, df = 8, p-value = 2.506e-07
##
## Model df: 16. Total lags used: 24
The Ljung-Box p-value is again very small (2.5e-07), so the Holt-Winters’ residuals are still autocorrelated.
I am going to try an Arima model next using auto.arima.
fc3 <- auto.arima(powerdfts.train, lambda = "auto")
fc3
## Series: powerdfts.train
## ARIMA(1,0,1)(2,1,0)[12] with drift
## Box Cox transformation: lambda= -0.2079193
##
## Coefficients:
## ar1 ma1 sar1 sar2 drift
## 0.8342 -0.6300 -0.7816 -0.4217 1e-04
## s.e. 0.1602 0.2144 0.0846 0.0814 1e-04
##
## sigma^2 estimated as 1.405e-05: log likelihood=658.19
## AIC=-1304.38 AICc=-1303.82 BIC=-1286.08
autoplot(forecast(fc3))
accuracy(forecast(fc3), powerdfts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 60279.67 604409.0 475986.4 0.3481779 7.429188 0.7620882
## Test set -105222.67 784633.6 631177.1 -2.0058979 8.212349 1.0105597
## ACF1 Theil's U
## Training set 0.16860725 NA
## Test set 0.05313159 0.452865
Auto.arima came up with ARIMA(1,0,1)(2,1,0)[12] with drift and it looks very promising so far.
checkresiduals(fc3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(2,1,0)[12] with drift
## Q* = 31.121, df = 19, p-value = 0.03916
##
## Model df: 5. Total lags used: 24
The Ljung-Box p-value (0.039) is marginal: at the 5% level there is still some residual autocorrelation, though far less than in the previous models.
The last method I will try is an STL decomposition with ETS method:
fc4 <- stlf(powerdfts.train, method='ets', lambda = "auto")
autoplot(forecast(fc4))
accuracy(forecast(fc4), powerdfts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 75998.45 520886.4 395888.9 0.4939691 6.115148 0.6338464
## Test set -402705.94 848414.3 705178.0 -6.1638482 9.454028 1.1290405
## ACF1 Theil's U
## Training set 0.15959805 NA
## Test set 0.08677054 0.4642975
checkresiduals(fc4)
## Warning in checkresiduals(fc4): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 41.309, df = 22, p-value = 0.00757
##
## Model df: 2. Total lags used: 24
The Ljung-Box p-value (0.008) indicates that the STL-ETS residuals also retain some autocorrelation.
finalpower <- data.frame(rbind(accuracy(forecast(fc1), powerdfts.test), accuracy(forecast(fc2), powerdfts.test), accuracy(forecast(fc3), powerdfts.test), accuracy(forecast(fc4), powerdfts.test)))
rownames(finalpower) <- c("Seasonal Naive Train", "Seasonal Naive Test", "Holt-Winter's Train", "Holt-Winter's Test", "ARIMA Train", "ARIMA Test", "STL-ETS Train", "STL-ETS Test")
finalpower
## ME RMSE MAE MPE MAPE
## Seasonal Naive Train 114098.31 805602.6 624581.7 0.9213942 9.544575
## Seasonal Naive Test -272199.29 965567.1 742063.5 -3.8050476 9.295010
## Holt-Winter's Train 22451.44 579996.7 446863.3 -0.3552597 6.961553
## Holt-Winter's Test -828557.80 983466.9 828955.6 -11.9153503 11.920368
## ARIMA Train 60279.67 604409.0 475986.4 0.3481779 7.429188
## ARIMA Test -105222.67 784633.6 631177.1 -2.0058979 8.212349
## STL-ETS Train 75998.45 520886.4 395888.9 0.4939691 6.115148
## STL-ETS Test -402705.94 848414.3 705178.0 -6.1638482 9.454028
## MASE ACF1 Theil.s.U
## Seasonal Naive Train 1.0000000 0.34723632 NA
## Seasonal Naive Test 1.1880967 -0.19879933 0.5349666
## Holt-Winter's Train 0.7154600 0.20847896 NA
## Holt-Winter's Test 1.3272171 0.08904977 0.6768765
## ARIMA Train 0.7620882 0.16860725 NA
## ARIMA Test 1.0105597 0.05313159 0.4528650
## STL-ETS Train 0.6338464 0.15959805 NA
## STL-ETS Test 1.1290405 0.08677054 0.4642975
The lowest RMSE is with ARIMA(1,0,1)(2,1,0)[12] with drift so that’s what I will use for my forecast.
fc_total <- auto.arima(powerdfts, lambda = "auto")
autoplot(forecast(fc_total))
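Note that auto.arima run on the full series is free to select a different order than it chose on the training set. If we want to impose the exact ARIMA(1,0,1)(2,1,0)[12] with drift specification selected above, forecast::Arima allows that (a sketch):
# Refit the train-selected specification on the full series
fc_fixed <- Arima(powerdfts, order = c(1, 0, 1), seasonal = c(2, 1, 0),
                  include.drift = TRUE, lambda = "auto")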
powerfcdf <- data.frame(forecast(fc_total, h=12))
write_csv(powerfcdf, "Power_Result.csv")