Part A ATM

Problem

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

library(tidyverse)
library(fpp2)
library(plotly)
library(readxl)
library(knitr)
library(kableExtra)

First step is to import the data.

atm <- readxl::read_excel('ATM624Data.xlsx')
atm$DATE<-lubridate::ymd(atm$DATE)

Let us check for missing values

# Checking for missing values
colSums(is.na(atm))

## DATE  ATM Cash 
##    0   14   19

There are 14 blanks in ATM and cash is also blank for the same records. These records need to be removed as we cannot do much with these.

There are 5 records for ATM 1 and 2 which do not have cash data, and these will also be removed as the no of missing cases is insignificant.

# Remove empty rows with no data
atm <- atm[complete.cases(atm), ]
colSums(is.na(atm))

## DATE  ATM Cash 
##    0    0    0

Data Visualize

Let us have a look at the data

ggplot(atm, aes(DATE, Cash)) + geom_line() + facet_grid(ATM~., scales="free") +
  labs(title="ATM Withdrawals", y="Hundreds of dollars", x="") +
  theme_minimal()

ATM3 has only 3 values and all other values are 0s. ATM 4 seems to have an outlier that we will be replacing with the median value.

Let us split the data into the four ATMs.

# Splitting the data
atm1 <- atm %>% filter(ATM == 'ATM1')
atm2 <- atm %>% filter(ATM == 'ATM2')
atm3 <- atm %>% filter(ATM == 'ATM3')
atm4 <- atm %>% filter(ATM == 'ATM4')

Models ATM1

I converted the time series to weekly. There seems to be weekly seasonality in the transaction level with withdrawals dipping on Mondays and Wednesdays. I will check the ACF and PACF plots, run ndiffs, nsdiffs and Boxcox.lambda functions to see if any differencing is recommended and what type of model would be suitable.

atm1_ts <- ts(atm1$Cash, frequency = 7)
ggseasonplot(atm1_ts)

ggtsdisplay(atm1_ts, main="ATM 1 Transactions - Weekly")

ndiffs(atm1_ts)

## [1] 0

nsdiffs(atm1_ts)

## [1] 1

atm1_lambda <- BoxCox.lambda(atm1_ts)
atm1_lambda

## [1] 0.1714881

Let us plot the data after a first order seasonal difference and boxcox transformation.

atm1_ts %>% BoxCox(atm1_lambda) %>% diff(lag=7) %>% ggtsdisplay()

Most of the weekly seasonality has been eliminated and we are left with an almost stationery time series. There is however a significant spike at lag 7.

Holt Winters

We are using the additive seasonality model as the data hints towards that.

atm1_holt <- atm1_ts %>% hw(h=31, seasonal="additive", 
                           damped=TRUE, lambda = atm1_lambda)
autoplot(atm1_holt) + 
  theme_minimal()

data.frame(accuracy(atm1_holt))

##                    ME     RMSE      MAE      MPE    MAPE     MASE      ACF1
## Training set 3.064041 28.60584 18.75831 -97.8536 119.717 0.996737 0.1206602

checkresiduals(atm1_holt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 15.29, df = 3, p-value = 0.001585
## 
## Model df: 12.   Total lags used: 15

The residuals plot looks ok. The Ljung Box test with very small p values hints there is still some autocorrelation in the data. The forecast plot does not look too bad either although the confidence intervals look very wide.

ETS

atm1_ets <- atm1_ts %>% ets(model="ZZZ", lambda = atm1_lambda)
autoplot(atm1_ets)

autoplot(forecast(atm1_ets, h=31)) + theme_minimal()

data.frame(accuracy(atm1_ets))

##                    ME     RMSE      MAE       MPE     MAPE      MASE     ACF1
## Training set 2.915666 28.01873 18.32632 -95.51671 116.9153 0.9737827 0.120238

checkresiduals(atm1_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 17.389, df = 5, p-value = 0.003818
## 
## Model df: 9.   Total lags used: 14

The ets method has returned a slightly better RMSE.

ARIMA

atm1_arima <- auto.arima(atm1_ts)
autoplot(forecast(atm1_arima, h=31)) + theme_minimal()

data.frame(accuracy(atm1_arima))

##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
##                     ACF1
## Training set -0.01116544

checkresiduals(atm1_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
## 
## Model df: 2.   Total lags used: 14

Let us compare the results from the 3 models

results <- data.frame(rbind(accuracy(atm1_holt), accuracy(atm1_ets), accuracy(atm1_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(0,1,1)[7]")
results

##                                 ME     RMSE      MAE        MPE     MAPE
## Holt-Winter's           3.06404062 28.60584 18.75831  -97.85360 119.7170
## ETS                     2.91566598 28.01873 18.32632  -95.51671 116.9153
## ARIMA(0,0,1)(0,1,1)[7] -0.09700089 25.49555 16.17552 -106.27922 122.4165
##                             MASE        ACF1
## Holt-Winter's          0.9967370  0.12066022
## ETS                    0.9737827  0.12023805
## ARIMA(0,0,1)(0,1,1)[7] 0.8594986 -0.01116544

The arima method has the lowest RMSE. Looking at the p value from the Ljung-Box, the series is consistent with white noise. We will therefore use arima to predict cash withdrawal from ATM1.

Models ATM2

atm2_ts <- ts(atm2$Cash, frequency = 7)
ggseasonplot(atm2_ts)

ggsubseriesplot(atm2_ts) +
  labs(title = "ATM #2 Cash Withdrawal", subtitle ="1 May, 2009 to 30 April, 2010",
       x = "Days of the Week")

ggtsdisplay(atm2_ts, main="ATM 2 Transactions - Weekly")

There seems to be weekly seasonality with ATM 2 also. We will follow the same procedure as ATM1.

ndiffs(atm2_ts)

## [1] 1

nsdiffs(atm2_ts)

## [1] 1

atm2_lambda <- BoxCox.lambda(atm2_ts)

## Warning in guerrero(x, lower, upper): Guerrero's method for selecting a Box-Cox
## parameter (lambda) is given for strictly positive data.

atm2_lambda

## [1] 0.6846687

Both first order differencing and seasonal differencing is recommended for this data series. Boxcox transformation with a lambda value of 0.68 will done. We will look at the data after these transformations.

atm2_ts %>% BoxCox(atm2_lambda) %>% diff(1)%>%diff(lag=7) %>% ggtsdisplay()

first order differencing results in too many spikes outside the critical values, so we will skip that

atm2_ts %>% BoxCox(atm2_lambda) %>% diff(lag=7) %>% ggtsdisplay()

The seasonality is clear from the chart, we will test the same three models as ATM1.

Holt Winters

atm2_holt <- atm2_ts %>% hw(h=31, seasonal="additive", damped=TRUE, 
                            lambda = atm2_lambda)
autoplot(atm2_holt) + theme_minimal()

data.frame(accuracy(atm2_holt))

##                    ME     RMSE      MAE  MPE MAPE     MASE         ACF1
## Training set 1.090429 28.38626 19.60476 -Inf  Inf 0.912326 -0.005558446

checkresiduals(atm2_holt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 35.201, df = 3, p-value = 1.105e-07
## 
## Model df: 12.   Total lags used: 15

ETS

atm2_ets <- atm2_ts %>% ets(model="ZZZ", lambda =atm2_lambda)
autoplot(atm2_ets) + theme_minimal()

autoplot(forecast(atm2_ets, h=31)) + theme_minimal()

data.frame(accuracy(atm2_ets))

##                     ME     RMSE      MAE  MPE MAPE      MASE       ACF1
## Training set 0.6626207 27.85318 19.35218 -Inf  Inf 0.9005719 -0.0179467

checkresiduals(atm2_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 30.585, df = 5, p-value = 1.131e-05
## 
## Model df: 9.   Total lags used: 14

Arima

atm2_arima <- auto.arima(atm2_ts, lambda = atm2_lambda)
autoplot(forecast(atm2_arima, h=31)) + theme_minimal()

data.frame(accuracy(atm2_arima))

##                      ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set -0.2336435 26.13957 18.36376 -Inf  Inf 0.8545749 -0.03105451

checkresiduals(atm2_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,0,3)(0,1,1)[7]
## Q* = 10.925, df = 7, p-value = 0.1419
## 
## Model df: 7.   Total lags used: 14

Comparing the results

results <- data.frame(rbind(accuracy(atm2_holt), accuracy(atm2_ets),
                            accuracy(atm2_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(3,0,3)(0,1,1)[7]")
results

##                                ME     RMSE      MAE  MPE MAPE      MASE
## Holt-Winter's           1.0904294 28.38626 19.60476 -Inf  Inf 0.9123260
## ETS                     0.6626207 27.85318 19.35218 -Inf  Inf 0.9005719
## ARIMA(3,0,3)(0,1,1)[7] -0.2336435 26.13957 18.36376 -Inf  Inf 0.8545749
##                                ACF1
## Holt-Winter's          -0.005558446
## ETS                    -0.017946703
## ARIMA(3,0,3)(0,1,1)[7] -0.031054513

Again Arima seems to be the winner. The RMSE is the lowest and p value from Ljung Box is consistent with a white noise. We will be using arima to predict ATM2 withdrawals.

Models ATM 3

atm3_ts <- ts(atm3$Cash, frequency = 7)
ggseasonplot(atm3_ts)

ggtsdisplay(atm3_ts, main="ATM 3 Transactions - Weekly")

ATM3 has only 3 valid cash withdrawal data points and the rest of the rows are filled with 0s. The only prediction we can do here is a mean of these 3 points but that does not really mean anything in business decision-making. We will therefore not attempt any of the models for ATM3.

plot_ly(y = atm3$Cash, type='box', name="Cash Transactions - ATM3")

Models ATM4

We have seen that there is an outlier value in ATM 4. We will replace this value with the median value before proceeding.

summary(atm4$Cash)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     1.563   124.334   403.839   474.043   704.507 10919.762

# Replace max value with median
atm4$Cash[atm4$Cash==max(atm4$Cash)] <- median(atm4$Cash, na.rm = TRUE)

Let us visualize the series now.

atm4_ts <- ts(atm4$Cash, frequency = 7)
ggseasonplot(atm4_ts)

ggsubseriesplot(atm4_ts) +
  labs(title = "ATM #4 Cash Withdrawal", subtitle ="1 May, 2009 to 30 April, 2010",
       x = "Days of the Week")

ggtsdisplay(atm4_ts, main="ATM 4 Transactions - Weekly")

Again, there is weekly seasonality and we will follow the same procedures as ATM1 and 2.

ndiffs(atm4_ts)

## [1] 0

nsdiffs(atm4_ts)

## [1] 0

atm4_lambda <- BoxCox.lambda(atm4_ts)
atm4_lambda

## [1] 0.4525697

Only Boxcox transformation with lambda 0.45 is suggested for this dataset. However, taking out the weekly seasoanlity results in a transformation closer to white noise, so we are doing that.

atm4_ts %>% BoxCox(atm4_lambda) %>% diff(lag=7)%>%ggtsdisplay()

Holt Winters

atm4_holt <- atm4_ts %>% hw(h=31, seasonal="additive", damped=TRUE, 
                            lambda = atm2_lambda)
autoplot(atm4_holt) + theme_minimal()

data.frame(accuracy(atm4_holt))

##                    ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
## Training set 34.25693 332.6153 259.7949 -436.896 473.7205 0.7499503 0.09914502

checkresiduals(atm4_holt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 19.044, df = 3, p-value = 0.0002678
## 
## Model df: 12.   Total lags used: 15

ETS

atm4_ets <- atm4_ts %>% ets(model="ZZZ", lambda =atm4_lambda)
autoplot(atm4_ets) + theme_minimal()

autoplot(forecast(atm4_ets, h=31)) + theme_minimal()

data.frame(accuracy(atm4_ets))

##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 67.23932 338.2241 259.7873 -376.9491 420.7907 0.7499281 0.09736364

checkresiduals(atm4_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 19.488, df = 5, p-value = 0.001559
## 
## Model df: 9.   Total lags used: 14

Arima

atm4_arima <- auto.arima(atm4_ts, seasonal = TRUE, lambda = atm4_lambda)
autoplot(forecast(atm4_arima, h=31)) + theme_minimal()

data.frame(accuracy(atm4_arima))

##                   ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 84.4142 351.8346 274.3411 -370.6976 415.1683 0.7919408 0.01996198

checkresiduals(atm4_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(2,0,0)[7] with non-zero mean
## Q* = 15.872, df = 10, p-value = 0.1034
## 
## Model df: 4.   Total lags used: 14

Comparing the results

results <- data.frame(rbind(accuracy(atm4_holt), accuracy(atm4_ets),
                            accuracy(atm4_arima)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(2,0,0)[7]")
results

##                              ME     RMSE      MAE       MPE     MAPE      MASE
## Holt-Winter's          34.25693 332.6153 259.7949 -436.8960 473.7205 0.7499503
## ETS                    67.23932 338.2241 259.7873 -376.9491 420.7907 0.7499281
## ARIMA(0,0,1)(2,0,0)[7] 84.41420 351.8346 274.3411 -370.6976 415.1683 0.7919408
##                              ACF1
## Holt-Winter's          0.09914502
## ETS                    0.09736364
## ARIMA(0,0,1)(2,0,0)[7] 0.01996198

Holt Winter’s seems to be the best model for this one with the lowest RMSE. The residuals plot looks the best for Holt, along with the ACF and PACF with no values outside critical range. We will use Holt to predict ATM4 cash withdrawals.

Predictions

We will forecast the withdrawals from the 4 ATMS separately.

ATM 1

temp<-forecast(atm1_arima, h=5)
temp1<-data.frame(temp)
temp1

##          Point.Forecast     Lo.80     Hi.80     Lo.95     Hi.95
## 52.71429      86.805607  53.71784 119.89337  36.20224 137.40898
## 52.85714     100.640560  67.09247 134.18865  49.33319 151.94793
## 53.00000      74.714560  41.16647 108.26265  23.40719 126.02193
## 53.14286       4.762635 -28.78545  38.31072 -46.54474  56.07001
## 53.28571     100.063192  66.51510 133.61128  48.75582 151.37057

#xlsx::write.xlsx(temp1, 'atm1_forecast.xlsx', row.names=FALSE, col.names=TRUE)

ATM 2

temp<-forecast(atm2_arima, h=5)
temp2<-data.frame(temp)
temp2

##          Point.Forecast      Lo.80     Hi.80     Lo.95     Hi.95
## 52.85714      64.209763  31.801756 102.98313  17.83250 125.70979
## 53.00000      70.343442  36.733087 110.11133  21.97421 133.31149
## 53.14286       7.074171  -4.657668  31.01796 -15.06804  47.40373
## 53.28571       1.390290 -11.943070  20.73025 -24.83506  35.53029
## 53.42857      94.165833  56.045458 137.98579  38.56514 163.21840

#xlsx::write.xlsx(temp2, 'atm2_forecast.xlsx', row.names=FALSE, col.names=TRUE)

ATM 3

atm3_mean<-meanf(atm3_ts, h=31) #forecast using mean
temp<-forecast(atm3_mean, h=5)
temp3<-data.frame(temp)
temp3

##          Point.Forecast    Lo.80    Hi.80     Lo.95    Hi.95
## 53.14286      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.28571      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.42857      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.57143      0.7205479 -9.49357 10.93467 -14.92427 16.36536
## 53.71429      0.7205479 -9.49357 10.93467 -14.92427 16.36536

#xlsx::write.xlsx(temp3, 'atm3_forecast.xlsx', row.names=FALSE, col.names=TRUE)

ATM 4

temp<-forecast(atm4_holt, h=5)
temp4<-data.frame(temp)
temp4

##          Point.Forecast     Lo.80    Hi.80       Lo.95     Hi.95
## 53.14286       395.9198 56.797252 885.6359  -18.916820 1190.2341
## 53.28571       449.2195 87.216023 953.7610   -2.877173 1264.8552
## 53.42857       426.1876 73.720197 924.4600   -8.645367 1232.7987
## 53.57143       220.1593 -7.516375 650.2278 -125.566908  929.4993
## 53.71429       429.2477 75.484232 928.3654   -7.766152 1237.0750

#xlsx::write.xlsx(temp4, 'atm4_forecast.xlsx', row.names=FALSE, col.names=TRUE)

Conclusion

The missing values for all datasets were removed. Cash was missing for only 5 data points for ATM1 and ATM 2 and those were removed. ATM 4 had an outlier which was replaved with median value. Mean was used to forecast ATM 3 withdrawals as there were only 3 datapoints. For the rest, Holt, ETS and ARIMA was tested and transformation was done using suggested by ndiff and nsdiff. Transformed data were visualized to make final decisions. RMSE and Ljung tests were used to chose the models. Values for next 5 weeks were forecasted using best model, along with upper and lower points for 80% and 95% confidence levels.

Part B - Forecasting power

Problem

Data Exploration

Let us read in the data first

power <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")
summary(power)

##   CaseSequence     YYYY-MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1

head(power) %>% kable() %>% kable_styling(full_width = FALSE)

CaseSequence	YYYY-MMM	KWH
733	1998-Jan	6862583
734	1998-Feb	5838198
735	1998-Mar	5420658
736	1998-Apr	5010364
737	1998-May	4665377
738	1998-Jun	6467147

There is one missing value in the dataset.

We will replace the missing value with median and convert to a timeseries object.

# Replacing missing value with median
power$KWH[is.na(power$KWH)] = median(power$KWH, na.rm=TRUE)

# Converting the dataframe into timeseries object
power_ts <- ts(power$KWH, start=c(1998,1), frequency = 12)
power_ts

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1998  6862583  5838198  5420658  5010364  4665377  6467147  8914755  8607428
## 1999  7183759  5759262  4847656  5306592  4426794  5500901  7444416  7564391
## 2000  7068296  5876083  4807961  4873080  5050891  7092865  6862662  7517830
## 2001  7538529  6602448  5779180  4835210  4787904  6283324  7855129  8450717
## 2002  7099063  6413429  5839514  5371604  5439166  5850383  7039702  8058748
## 2003  7256079  6190517  6120626  4885643  5296096  6051571  6900676  8476499
## 2004  7584596  6560742  6526586  4831688  4878262  6421614  7307931  7309774
## 2005  8225477  6564338  5581725  5563071  4453983  5900212  8337998  7786659
## 2006  7793358  5914945  5819734  5255988  4740588  7052275  7945564  8241110
## 2007  8031295  7928337  6443170  4841979  4862847  5022647  6426220  7447146
## 2008  7964293  7597060  6085644  5352359  4608528  6548439  7643987  8037137
## 2009  8072330  6976800  5691452  5531616  5264439  5804433  7713260  8350517
## 2010  9397357  8390677  7347915  5776131  4919289  6696292   770523  7922701
## 2011  8394747  8898062  6356903  5685227  5506308  8037779 10093343 10308076
## 2012  8991267  7952204  6356961  5569828  5783598  7926956  8886851  9612423
## 2013 10655730  7681798  6517514  6105359  5940475  7920627  8415321  9080226
##           Sep      Oct      Nov      Dec
## 1998  6989888  6345620  4640410  4693479
## 1999  7899368  5358314  4436269  4419229
## 2000  8912169  5844352  5041769  6220334
## 2001  7112069  5242535  4461979  5240995
## 2002  8245227  5865014  4908979  5779958
## 2003  7791791  5344613  4913707  5756193
## 2004  6690366  5444948  4824940  5791208
## 2005  7057213  6694523  4313019  6181548
## 2006  7296355  5104799  4458429  6226214
## 2007  7666970  5785964  4907057  6047292
## 2008  6283324  5101803  4555602  6442746
## 2009  7583146  5566075  5339890  7089880
## 2010  7819472  5875917  4800733  6152583
## 2011  8943599  5603920  6154138  8273142
## 2012  7559148  5576852  5731899  6609694
## 2013  7968220  5759367  5769083  9606304

We can see that the data is recorded on a monthly basis from 1998 till 2013. We are asked to forecast for 12 months in 2014.

Let us visualize the data

# Plot the series
ggtsdisplay(power_ts, main="Before Transformation, Monthly Power Consumption")

There seems to be one outlier in the dataset.That outlier is the minimum value and we will replace it with the median value.

power$KWH[which.min(power$KWH)] = median(power$KWH, na.rm=TRUE)
power_ts <- ts(power$KWH, start=c(1998,1), frequency = 12)

Let us look at the seasonal plots and subseries plots.

ggseasonplot(power_ts, polar=T)

ggsubseriesplot(power_ts)

There is clear monthly seasonality in the data with Kwh dropping in May and Nov.

power_ts %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot()

The seasonality is additive and there is a clear upward trend.

We will use ndiff and nsdiff to find recommended transformations.

ndiffs(power_ts)

## [1] 1

nsdiffs(power_ts)

## [1] 1

power_lambda <- BoxCox.lambda(power_ts)
power_lambda

## [1] -0.2134819

We will do the transformations and look at the series.

power_ts %>% BoxCox(power_lambda) %>% diff() %>% diff(lag=12) %>% ggtsdisplay()

The data is more stationery now. We will try the Holt Winters Additive seasonal method, ets and arima on this one also as it worked well with the ATM data.

Holt Winters

power_holt <- power_ts %>% hw(h=12, seasonal="additive", damped=TRUE, 
                           lambda = power_lambda)
autoplot(power_holt) + theme_minimal()

data.frame(accuracy(power_holt))

##                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 68172.65 635326.8 467792.1 0.2440629 7.066414 0.7324877 0.1997796

checkresiduals(power_holt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 42.687, df = 7, p-value = 3.834e-07
## 
## Model df: 17.   Total lags used: 24

Ets

power_ets <- power_ts %>% ets(model="ZZZ", lambda = power_lambda)
autoplot(power_ets) + theme_minimal()

autoplot(forecast(power_ets, h=12)) + theme_minimal()

data.frame(accuracy(power_ets))

##                   ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 83565.8 651760.3 492095.4 0.5214465 7.396141 0.7705427 0.2319153

checkresiduals(power_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 34.232, df = 10, p-value = 0.0001687
## 
## Model df: 14.   Total lags used: 24

Auto Arima

power_arima <- auto.arima(power_ts, seasonal = TRUE, lambda = power_lambda)
autoplot(forecast(power_arima, h=12)) + theme_minimal()

data.frame(accuracy(power_arima))

##                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 55793.68 652393.9 498502.7 0.2329465 7.561432 0.7805756 0.1092852

checkresiduals(power_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(2,1,0)[12] with drift
## Q* = 32.613, df = 20, p-value = 0.03719
## 
## Model df: 4.   Total lags used: 24

Hybrid

This is an interesting method and combines ets and arima models.

library(forecastHybrid)

## Warning: package 'forecastHybrid' was built under R version 4.0.4

## Loading required package: thief

## Warning: package 'thief' was built under R version 4.0.4

power_hybrid<-power_ts%>%hybridModel(model = 'ae', lambda = power_lambda)

## Fitting the auto.arima model

## Fitting the ets model

autoplot(forecast(power_hybrid, h=12)) + theme_minimal()

data.frame(accuracy(power_hybrid))

##                ME     RMSE      MAE       MPE    MAPE      ACF1 Theil.s.U
## Test set 69679.74 627641.4 467851.8 0.3771965 7.03551 0.1499348 0.4407237

checkresiduals(power_hybrid)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

Comparing the results, the hybrid model results in the lowest RMSE. The ACF resembles a white noise and the residuals look almost normal. We will therefore be using this hybrid model to forecast the power data.

test <- forecast(power_hybrid, h=12)
test1 <- data.frame(test)
#xlsx::write.xlsx(test1, 'POwer 12 months prediction.xlsx')

Conclusion

The power dataset had monthly data from 1998 to 2013. We were asked to predict for 12 months in 2014. There was one missing value and one outlier, both replaced by the median values. Different techniques that included seasonality were tested and finally the hybrid model with the lowest RMSE was used to predict the series.

Data 624 Project01

Farhana Zahir

3/28/2021

Part A ATM

Problem

Data Visualize

Models ATM1

Holt Winters

ETS

ARIMA

Models ATM2

Holt Winters

ETS

Arima

Models ATM 3

Models ATM4

Holt Winters

ETS

Arima

Predictions

Conclusion

Part B - Forecasting power

Problem

Data Exploration

Holt Winters

Ets

Auto Arima

Hybrid

Conclusion