In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
library(tidyverse)
library(fpp2)
library(plotly)
library(readxl)
library(knitr)
library(kableExtra)
First step is to import the data.
atm <- readxl::read_excel('ATM624Data.xlsx')
atm$DATE<-lubridate::ymd(atm$DATE)
Let us check for missing values
# Checking for missing values
colSums(is.na(atm))
## DATE ATM Cash
## 0 14 19
There are 14 blanks in ATM and cash is also blank for the same records. These records need to be removed as we cannot do much with these.
There are 5 records for ATM 1 and 2 which do not have cash data, and these will also be removed as the no of missing cases is insignificant.
# Remove empty rows with no data
atm <- atm[complete.cases(atm), ]
colSums(is.na(atm))
## DATE ATM Cash
## 0 0 0
Let us have a look at the data
ggplot(atm, aes(DATE, Cash)) + geom_line() + facet_grid(ATM~., scales="free") +
labs(title="ATM Withdrawals", y="Hundreds of dollars", x="") +
theme_minimal()
ATM3 has only 3 values and all other values are 0s. ATM 4 seems to have an outlier that we will be replacing with the median value.
Let us split the data into the four ATMs.
# Splitting the data
atm1 <- atm %>% filter(ATM == 'ATM1')
atm2 <- atm %>% filter(ATM == 'ATM2')
atm3 <- atm %>% filter(ATM == 'ATM3')
atm4 <- atm %>% filter(ATM == 'ATM4')
I converted the time series to weekly. There seems to be weekly seasonality in the transaction level with withdrawals dipping on Mondays and Wednesdays. I will check the ACF and PACF plots, run ndiffs, nsdiffs and Boxcox.lambda functions to see if any differencing is recommended and what type of model would be suitable.
atm1_ts <- ts(atm1$Cash, frequency = 7)
ggseasonplot(atm1_ts)
ggtsdisplay(atm1_ts, main="ATM 1 Transactions - Weekly")
ndiffs(atm1_ts)
## [1] 0
nsdiffs(atm1_ts)
## [1] 1
atm1_lambda <- BoxCox.lambda(atm1_ts)
atm1_lambda
## [1] 0.1714881
Let us plot the data after a first order seasonal difference and boxcox transformation.
atm1_ts %>% BoxCox(atm1_lambda) %>% diff(lag=7) %>% ggtsdisplay()
Most of the weekly seasonality has been eliminated and we are left with an almost stationery time series. There is however a significant spike at lag 7.
We are using the additive seasonality model as the data hints towards that.
atm1_holt <- atm1_ts %>% hw(h=31, seasonal="additive",
damped=TRUE, lambda = atm1_lambda)
autoplot(atm1_holt) +
theme_minimal()
data.frame(accuracy(atm1_holt))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3.064041 28.60584 18.75831 -97.8536 119.717 0.996737 0.1206602
checkresiduals(atm1_holt)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 15.29, df = 3, p-value = 0.001585
##
## Model df: 12. Total lags used: 15
The residuals plot looks ok. The Ljung Box test with very small p values hints there is still some autocorrelation in the data. The forecast plot does not look too bad either although the confidence intervals look very wide.
atm1_ets <- atm1_ts %>% ets(model="ZZZ", lambda = atm1_lambda)
autoplot(atm1_ets)
autoplot(forecast(atm1_ets, h=31)) + theme_minimal()
data.frame(accuracy(atm1_ets))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2.915666 28.01873 18.32632 -95.51671 116.9153 0.9737827 0.120238
checkresiduals(atm1_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 17.389, df = 5, p-value = 0.003818
##
## Model df: 9. Total lags used: 14
The ets method has returned a slightly better RMSE.
atm1_arima <- auto.arima(atm1_ts)
autoplot(forecast(atm1_arima, h=31)) + theme_minimal()
data.frame(accuracy(atm1_arima))
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
## ACF1
## Training set -0.01116544
checkresiduals(atm1_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
##
## Model df: 2. Total lags used: 14
Let us compare the results from the 3 models
results <- data.frame(rbind(accuracy(atm1_holt), accuracy(atm1_ets), accuracy(atm1_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(0,1,1)[7]")
results
## ME RMSE MAE MPE MAPE
## Holt-Winter's 3.06404062 28.60584 18.75831 -97.85360 119.7170
## ETS 2.91566598 28.01873 18.32632 -95.51671 116.9153
## ARIMA(0,0,1)(0,1,1)[7] -0.09700089 25.49555 16.17552 -106.27922 122.4165
## MASE ACF1
## Holt-Winter's 0.9967370 0.12066022
## ETS 0.9737827 0.12023805
## ARIMA(0,0,1)(0,1,1)[7] 0.8594986 -0.01116544
The arima method has the lowest RMSE. Looking at the p value from the Ljung-Box, the series is consistent with white noise. We will therefore use arima to predict cash withdrawal from ATM1.
atm2_ts <- ts(atm2$Cash, frequency = 7)
ggseasonplot(atm2_ts)
ggsubseriesplot(atm2_ts) +
labs(title = "ATM #2 Cash Withdrawal", subtitle ="1 May, 2009 to 30 April, 2010",
x = "Days of the Week")
ggtsdisplay(atm2_ts, main="ATM 2 Transactions - Weekly")
There seems to be weekly seasonality with ATM 2 also. We will follow the same procedure as ATM1.
ndiffs(atm2_ts)
## [1] 1
nsdiffs(atm2_ts)
## [1] 1
atm2_lambda <- BoxCox.lambda(atm2_ts)
## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.
atm2_lambda
## [1] 0.6846687
Both first order differencing and seasonal differencing is recommended for this data series. Boxcox transformation with a lambda value of 0.68 will done. We will look at the data after these transformations.
atm2_ts %>% BoxCox(atm2_lambda) %>% diff(1)%>%diff(lag=7) %>% ggtsdisplay()
first order differencing results in too many spikes outside the critical values, so we will skip that
atm2_ts %>% BoxCox(atm2_lambda) %>% diff(lag=7) %>% ggtsdisplay()
The seasonality is clear from the chart, we will test the same three models as ATM1.
atm2_holt <- atm2_ts %>% hw(h=31, seasonal="additive", damped=TRUE,
lambda = atm2_lambda)
autoplot(atm2_holt) + theme_minimal()
data.frame(accuracy(atm2_holt))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.090429 28.38626 19.60476 -Inf Inf 0.912326 -0.005558446
checkresiduals(atm2_holt)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 35.201, df = 3, p-value = 1.105e-07
##
## Model df: 12. Total lags used: 15
atm2_ets <- atm2_ts %>% ets(model="ZZZ", lambda =atm2_lambda)
autoplot(atm2_ets) + theme_minimal()
autoplot(forecast(atm2_ets, h=31)) + theme_minimal()
data.frame(accuracy(atm2_ets))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.6626207 27.85318 19.35218 -Inf Inf 0.9005719 -0.0179467
checkresiduals(atm2_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 30.585, df = 5, p-value = 1.131e-05
##
## Model df: 9. Total lags used: 14
atm2_arima <- auto.arima(atm2_ts, lambda = atm2_lambda)
autoplot(forecast(atm2_arima, h=31)) + theme_minimal()
data.frame(accuracy(atm2_arima))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.2336435 26.13957 18.36376 -Inf Inf 0.8545749 -0.03105451
checkresiduals(atm2_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,3)(0,1,1)[7]
## Q* = 10.925, df = 7, p-value = 0.1419
##
## Model df: 7. Total lags used: 14
Comparing the results
results <- data.frame(rbind(accuracy(atm2_holt), accuracy(atm2_ets),
accuracy(atm2_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(3,0,3)(0,1,1)[7]")
results
## ME RMSE MAE MPE MAPE MASE
## Holt-Winter's 1.0904294 28.38626 19.60476 -Inf Inf 0.9123260
## ETS 0.6626207 27.85318 19.35218 -Inf Inf 0.9005719
## ARIMA(3,0,3)(0,1,1)[7] -0.2336435 26.13957 18.36376 -Inf Inf 0.8545749
## ACF1
## Holt-Winter's -0.005558446
## ETS -0.017946703
## ARIMA(3,0,3)(0,1,1)[7] -0.031054513
Again Arima seems to be the winner. The RMSE is the lowest and p value from Ljung Box is consistent with a white noise. We will be using arima to predict ATM2 withdrawals.
atm3_ts <- ts(atm3$Cash, frequency = 7)
ggseasonplot(atm3_ts)
ggtsdisplay(atm3_ts, main="ATM 3 Transactions - Weekly")
ATM3 has only 3 valid cash withdrawal data points and the rest of the rows are filled with 0s. The only prediction we can do here is a mean of these 3 points but that does not really mean anything in business decision-making. We will therefore not attempt any of the models for ATM3.
plot_ly(y = atm3$Cash, type='box', name="Cash Transactions - ATM3")
We have seen that there is an outlier value in ATM 4. We will replace this value with the median value before proceeding.
summary(atm4$Cash)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.563 124.334 403.839 474.043 704.507 10919.762
# Replace max value with median
atm4$Cash[atm4$Cash==max(atm4$Cash)] <- median(atm4$Cash, na.rm = TRUE)
Let us visualize the series now.
atm4_ts <- ts(atm4$Cash, frequency = 7)
ggseasonplot(atm4_ts)
ggsubseriesplot(atm4_ts) +
labs(title = "ATM #4 Cash Withdrawal", subtitle ="1 May, 2009 to 30 April, 2010",
x = "Days of the Week")
ggtsdisplay(atm4_ts, main="ATM 4 Transactions - Weekly")
Again, there is weekly seasonality and we will follow the same procedures as ATM1 and 2.
ndiffs(atm4_ts)
## [1] 0
nsdiffs(atm4_ts)
## [1] 0
atm4_lambda <- BoxCox.lambda(atm4_ts)
atm4_lambda
## [1] 0.4525697
Only Boxcox transformation with lambda 0.45 is suggested for this dataset. However, taking out the weekly seasoanlity results in a transformation closer to white noise, so we are doing that.
atm4_ts %>% BoxCox(atm4_lambda) %>% diff(lag=7)%>%ggtsdisplay()
atm4_holt <- atm4_ts %>% hw(h=31, seasonal="additive", damped=TRUE,
lambda = atm2_lambda)
autoplot(atm4_holt) + theme_minimal()
data.frame(accuracy(atm4_holt))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 34.25693 332.6153 259.7949 -436.896 473.7205 0.7499503 0.09914502
checkresiduals(atm4_holt)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 19.044, df = 3, p-value = 0.0002678
##
## Model df: 12. Total lags used: 15
atm4_ets <- atm4_ts %>% ets(model="ZZZ", lambda =atm4_lambda)
autoplot(atm4_ets) + theme_minimal()
autoplot(forecast(atm4_ets, h=31)) + theme_minimal()
data.frame(accuracy(atm4_ets))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 67.23932 338.2241 259.7873 -376.9491 420.7907 0.7499281 0.09736364
checkresiduals(atm4_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 19.488, df = 5, p-value = 0.001559
##
## Model df: 9. Total lags used: 14
atm4_arima <- auto.arima(atm4_ts, seasonal = TRUE, lambda = atm4_lambda)
autoplot(forecast(atm4_arima, h=31)) + theme_minimal()
data.frame(accuracy(atm4_arima))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 84.4142 351.8346 274.3411 -370.6976 415.1683 0.7919408 0.01996198
checkresiduals(atm4_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(2,0,0)[7] with non-zero mean
## Q* = 15.872, df = 10, p-value = 0.1034
##
## Model df: 4. Total lags used: 14
Comparing the results
results <- data.frame(rbind(accuracy(atm4_holt), accuracy(atm4_ets),
accuracy(atm4_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(2,0,0)[7]")
results
## ME RMSE MAE MPE MAPE MASE
## Holt-Winter's 34.25693 332.6153 259.7949 -436.8960 473.7205 0.7499503
## ETS 67.23932 338.2241 259.7873 -376.9491 420.7907 0.7499281
## ARIMA(0,0,1)(2,0,0)[7] 84.41420 351.8346 274.3411 -370.6976 415.1683 0.7919408
## ACF1
## Holt-Winter's 0.09914502
## ETS 0.09736364
## ARIMA(0,0,1)(2,0,0)[7] 0.01996198
Holt Winter’s seems to be the best model for this one with the lowest RMSE. The residuals plot looks the best for Holt, along with the ACF and PACF with no values outside critical range. We will use Holt to predict ATM4 cash withdrawals.
We will forecast the withdrawals from the 4 ATMS separately.
ATM 1
temp<-forecast(atm1_arima, h=5)
temp1<-data.frame(temp)
temp1
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.71429 86.805607 53.71784 119.89337 36.20224 137.40898
## 52.85714 100.640560 67.09247 134.18865 49.33319 151.94793
## 53.00000 74.714560 41.16647 108.26265 23.40719 126.02193
## 53.14286 4.762635 -28.78545 38.31072 -46.54474 56.07001
## 53.28571 100.063192 66.51510 133.61128 48.75582 151.37057
#xlsx::write.xlsx(temp1, 'atm1_forecast.xlsx', row.names=FALSE, col.names=TRUE)
ATM 2
temp<-forecast(atm2_arima, h=5)
temp2<-data.frame(temp)
temp2
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.85714 64.209763 31.801756 102.98313 17.83250 125.70979
## 53.00000 70.343442 36.733087 110.11133 21.97421 133.31149
## 53.14286 7.074171 -4.657668 31.01796 -15.06804 47.40373
## 53.28571 1.390290 -11.943070 20.73025 -24.83506 35.53029
## 53.42857 94.165833 56.045458 137.98579 38.56514 163.21840
#xlsx::write.xlsx(temp2, 'atm2_forecast.xlsx', row.names=FALSE, col.names=TRUE)
ATM 3
atm3_mean<-meanf(atm3_ts, h=31) #forecast using mean
temp<-forecast(atm3_mean, h=5)
temp3<-data.frame(temp)
temp3
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 53.14286 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.28571 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.42857 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.57143 0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.71429 0.7205479 -9.49357 10.93467 -14.92427 16.36536
#xlsx::write.xlsx(temp3, 'atm3_forecast.xlsx', row.names=FALSE, col.names=TRUE)
ATM 4
temp<-forecast(atm4_holt, h=5)
temp4<-data.frame(temp)
temp4
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 53.14286 395.9198 56.797252 885.6359 -18.916820 1190.2341
## 53.28571 449.2195 87.216023 953.7610 -2.877173 1264.8552
## 53.42857 426.1876 73.720197 924.4600 -8.645367 1232.7987
## 53.57143 220.1593 -7.516375 650.2278 -125.566908 929.4993
## 53.71429 429.2477 75.484232 928.3654 -7.766152 1237.0750
#xlsx::write.xlsx(temp4, 'atm4_forecast.xlsx', row.names=FALSE, col.names=TRUE)
The missing values for all datasets were removed. Cash was missing for only 5 data points for ATM1 and ATM 2 and those were removed. ATM 4 had an outlier which was replaved with median value. Mean was used to forecast ATM 3 withdrawals as there were only 3 datapoints. For the rest, Holt, ETS and ARIMA was tested and transformation was done using suggested by ndiff and nsdiff. Transformed data were visualized to make final decisions. RMSE and Ljung tests were used to chose the models. Values for next 5 weeks were forecasted using best model, along with upper and lower points for 80% and 95% confidence levels.
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
Let us read in the data first
power <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")
summary(power)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
head(power) %>% kable() %>% kable_styling(full_width = FALSE)
| CaseSequence | YYYY-MMM | KWH |
|---|---|---|
| 733 | 1998-Jan | 6862583 |
| 734 | 1998-Feb | 5838198 |
| 735 | 1998-Mar | 5420658 |
| 736 | 1998-Apr | 5010364 |
| 737 | 1998-May | 4665377 |
| 738 | 1998-Jun | 6467147 |
There is one missing value in the dataset.
We will replace the missing value with median and convert to a timeseries object.
# Replacing missing value with median
power$KWH[is.na(power$KWH)] = median(power$KWH, na.rm=TRUE)
# Converting the dataframe into timeseries object
power_ts <- ts(power$KWH, start=c(1998,1), frequency = 12)
power_ts
## Jan Feb Mar Apr May Jun Jul Aug
## 1998 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428
## 1999 7183759 5759262 4847656 5306592 4426794 5500901 7444416 7564391
## 2000 7068296 5876083 4807961 4873080 5050891 7092865 6862662 7517830
## 2001 7538529 6602448 5779180 4835210 4787904 6283324 7855129 8450717
## 2002 7099063 6413429 5839514 5371604 5439166 5850383 7039702 8058748
## 2003 7256079 6190517 6120626 4885643 5296096 6051571 6900676 8476499
## 2004 7584596 6560742 6526586 4831688 4878262 6421614 7307931 7309774
## 2005 8225477 6564338 5581725 5563071 4453983 5900212 8337998 7786659
## 2006 7793358 5914945 5819734 5255988 4740588 7052275 7945564 8241110
## 2007 8031295 7928337 6443170 4841979 4862847 5022647 6426220 7447146
## 2008 7964293 7597060 6085644 5352359 4608528 6548439 7643987 8037137
## 2009 8072330 6976800 5691452 5531616 5264439 5804433 7713260 8350517
## 2010 9397357 8390677 7347915 5776131 4919289 6696292 770523 7922701
## 2011 8394747 8898062 6356903 5685227 5506308 8037779 10093343 10308076
## 2012 8991267 7952204 6356961 5569828 5783598 7926956 8886851 9612423
## 2013 10655730 7681798 6517514 6105359 5940475 7920627 8415321 9080226
## Sep Oct Nov Dec
## 1998 6989888 6345620 4640410 4693479
## 1999 7899368 5358314 4436269 4419229
## 2000 8912169 5844352 5041769 6220334
## 2001 7112069 5242535 4461979 5240995
## 2002 8245227 5865014 4908979 5779958
## 2003 7791791 5344613 4913707 5756193
## 2004 6690366 5444948 4824940 5791208
## 2005 7057213 6694523 4313019 6181548
## 2006 7296355 5104799 4458429 6226214
## 2007 7666970 5785964 4907057 6047292
## 2008 6283324 5101803 4555602 6442746
## 2009 7583146 5566075 5339890 7089880
## 2010 7819472 5875917 4800733 6152583
## 2011 8943599 5603920 6154138 8273142
## 2012 7559148 5576852 5731899 6609694
## 2013 7968220 5759367 5769083 9606304
We can see that the data is recorded on a monthly basis from 1998 till 2013. We are asked to forecast for 12 months in 2014.
Let us visualize the data
# Plot the series
ggtsdisplay(power_ts, main="Before Transformation, Monthly Power Consumption")
There seems to be one outlier in the dataset.That outlier is the minimum value and we will replace it with the median value.
power$KWH[which.min(power$KWH)] = median(power$KWH, na.rm=TRUE)
power_ts <- ts(power$KWH, start=c(1998,1), frequency = 12)
Let us look at the seasonal plots and subseries plots.
ggseasonplot(power_ts, polar=T)
ggsubseriesplot(power_ts)
There is clear monthly seasonality in the data with Kwh dropping in May and Nov.
power_ts %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot()
The seasonality is additive and there is a clear upward trend.
We will use ndiff and nsdiff to find recommended transformations.
ndiffs(power_ts)
## [1] 1
nsdiffs(power_ts)
## [1] 1
power_lambda <- BoxCox.lambda(power_ts)
power_lambda
## [1] -0.2134819
We will do the transformations and look at the series.
power_ts %>% BoxCox(power_lambda) %>% diff() %>% diff(lag=12) %>% ggtsdisplay()
The data is more stationery now. We will try the Holt Winters Additive seasonal method, ets and arima on this one also as it worked well with the ATM data.
power_holt <- power_ts %>% hw(h=12, seasonal="additive", damped=TRUE,
lambda = power_lambda)
autoplot(power_holt) + theme_minimal()
data.frame(accuracy(power_holt))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 68172.65 635326.8 467792.1 0.2440629 7.066414 0.7324877 0.1997796
checkresiduals(power_holt)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 42.687, df = 7, p-value = 3.834e-07
##
## Model df: 17. Total lags used: 24
power_ets <- power_ts %>% ets(model="ZZZ", lambda = power_lambda)
autoplot(power_ets) + theme_minimal()
autoplot(forecast(power_ets, h=12)) + theme_minimal()
data.frame(accuracy(power_ets))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 83565.8 651760.3 492095.4 0.5214465 7.396141 0.7705427 0.2319153
checkresiduals(power_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 34.232, df = 10, p-value = 0.0001687
##
## Model df: 14. Total lags used: 24
power_arima <- auto.arima(power_ts, seasonal = TRUE, lambda = power_lambda)
autoplot(forecast(power_arima, h=12)) + theme_minimal()
data.frame(accuracy(power_arima))
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 55793.68 652393.9 498502.7 0.2329465 7.561432 0.7805756 0.1092852
checkresiduals(power_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,1,0)[12] with drift
## Q* = 32.613, df = 20, p-value = 0.03719
##
## Model df: 4. Total lags used: 24
This is an interesting method and combines ets and arima models.
library(forecastHybrid)
## Warning: package 'forecastHybrid' was built under R version 4.0.4
## Loading required package: thief
## Warning: package 'thief' was built under R version 4.0.4
power_hybrid<-power_ts%>%hybridModel(model = 'ae', lambda = power_lambda)
## Fitting the auto.arima model
## Fitting the ets model
autoplot(forecast(power_hybrid, h=12)) + theme_minimal()
data.frame(accuracy(power_hybrid))
## ME RMSE MAE MPE MAPE ACF1 Theil.s.U
## Test set 69679.74 627641.4 467851.8 0.3771965 7.03551 0.1499348 0.4407237
checkresiduals(power_hybrid)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
Comparing the results, the hybrid model results in the lowest RMSE. The ACF resembles a white noise and the residuals look almost normal. We will therefore be using this hybrid model to forecast the power data.
test <- forecast(power_hybrid, h=12)
test1 <- data.frame(test)
#xlsx::write.xlsx(test1, 'POwer 12 months prediction.xlsx')
The power dataset had monthly data from 1998 to 2013. We were asked to predict for 12 months in 2014. There was one missing value and one outlier, both replaced by the median values. Different techniques that included seasonality were tested and finally the hybrid model with the lowest RMSE was used to predict the series.