Files: ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion, and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast in an Excel-readable file.
First step is to import the data from Excel. The first time I imported it, the dates were not being read correctly; by simply opening the file in Excel, formatting the column as a date, and saving the file, I was able to import the dates correctly the second time.
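The import itself is a one-liner with readxl; a minimal sketch, assuming the file sits in the working directory:
library(readxl)
atm <- read_excel("ATM624Data.xlsx")
str(atm)  # confirm DATE was read as a date rather than an Excel serial number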
Next step is to examine the data before converting it to a time series, to see if there is any missing data or other problems, and since we have 4 ATMs we need to move each ATM to a separate column. Upon doing this I discovered a few problems that needed to be dealt with:

- NA's in the ATM field (rows that needed to be removed)
- NA's in the Cash field: 3 for ATM1 and 2 for ATM2

# Format DATE column in R date format
atm$DATE <- lubridate::ymd(atm$DATE)
# Remove empty rows with no data
atm <- atm[complete.cases(atm), ]
# Plot data
ggplot(atm, aes(DATE, Cash)) + geom_line() + facet_grid(ATM~., scales="free") +
labs(title="ATM Withdrawals", y="Hundreds of dollars", x="") +
theme(panel.background = element_blank())
# Convert each ATM to a separate column
atm <- atm %>% pivot_wider(names_from = ATM, values_from = Cash)
atm %>% select(-DATE) %>% summary()
## ATM1 ATM2 ATM3 ATM4
## Min. : 1.00 Min. : 0.00 Min. : 0.0000 Min. : 1.563
## 1st Qu.: 73.00 1st Qu.: 25.50 1st Qu.: 0.0000 1st Qu.: 124.334
## Median : 91.00 Median : 67.00 Median : 0.0000 Median : 403.839
## Mean : 83.89 Mean : 62.58 Mean : 0.7206 Mean : 474.043
## 3rd Qu.:108.00 3rd Qu.: 93.00 3rd Qu.: 0.0000 3rd Qu.: 704.507
## Max. :180.00 Max. :147.00 Max. :96.0000 Max. :10919.762
## NA's :3 NA's :2
# Replace NA's and Outlier
atm$ATM1 <- atm$ATM1 %>% replace(is.na(.), median(atm$ATM1, na.rm = TRUE))
atm$ATM2 <- atm$ATM2 %>% replace(is.na(.), median(atm$ATM2, na.rm = TRUE))
atm$ATM4[atm$ATM4==max(atm$ATM4)] <- median(atm$ATM4, na.rm = TRUE)
The NA's in the ATM field turned out to be rows for the dates that will be forecast, with no data in the other columns, so they were removed using the complete.cases function.
The NA’s in ATM1 and ATM2’s data were imputed using the median since the number of missing values was so small.
ATM4’s outlier was also replaced with the median: it was so far above the norm, and the only abnormal data point in the entire ATM4 series, that it seemed likely to be an error.
# Plot and summarize again after cleaning up
atm2 <- atm %>% pivot_longer(cols = starts_with("ATM"),
names_to = "ATM", values_to = "Cash")
ggplot(atm2, aes(DATE, Cash)) + geom_line() + facet_grid(ATM~., scales="free") +
labs(title="ATM Withdrawals", y="Hundreds of dollars", x="") +
theme(panel.background = element_blank())
## ATM1 ATM2 ATM3 ATM4
## Min. : 1.00 Min. : 0.0 Min. : 0.0000 Min. : 1.563
## 1st Qu.: 73.00 1st Qu.: 26.0 1st Qu.: 0.0000 1st Qu.: 124.334
## Median : 91.00 Median : 67.0 Median : 0.0000 Median : 403.839
## Mean : 83.95 Mean : 62.6 Mean : 0.7206 Mean : 445.233
## 3rd Qu.:108.00 3rd Qu.: 93.0 3rd Qu.: 0.0000 3rd Qu.: 704.192
## Max. :180.00 Max. :147.0 Max. :96.0000 Max. :1712.075
ATMs 1 and 2 appear to have some weekly seasonality. ATM3, with only 3 non-zero data points, does not have enough data for a proper forecast, so a simple mean will be used. ATM4 appears to be white noise, but there may still be some seasonality, which we will investigate further. There is no apparent trend in any of the series.
Time series objects were created for each ATM since the plots suggested weekly and possibly monthly seasonality (although I was tempted to simply sum the data by month, since the instructions did not make clear whether daily forecasts were required or whether a monthly forecast would be adequate).
atm1.ts <- atm2 %>% filter(ATM=="ATM1") %>% select(Cash) %>%
ts(start = 1, frequency = 7)
atm2.ts <- atm2 %>% filter(ATM=="ATM2") %>% select(Cash) %>%
ts(start = 1, frequency = 7)
atm3.ts <- atm2 %>% filter(ATM=="ATM3") %>% select(Cash) %>%
ts(start = 1, frequency = 2)
atm4.ts <- atm2 %>% filter(ATM=="ATM4") %>% select(Cash) %>%
ts(start = 1, frequency = 7)
For each of ATM1, ATM2 and ATM4, I want to try at least a Holt-Winters additive model with damped trend, since the seasonal variations are roughly constant through the series, as well as ETS and ARIMA models, to see whether they perform better than the Holt-Winters model.
Before running any models I will check the ACF and PACF plots, and the ndiffs, nsdiffs, and BoxCox.lambda functions to see what they recommend for differencing and what type of model they suggest might be most appropriate.
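The diagnostic calls themselves are not echoed below; they were presumably along these lines (a sketch; atm1.lambda is reused when fitting the models):
ndiffs(atm1.ts)
nsdiffs(atm1.ts)
atm1.lambda <- BoxCox.lambda(atm1.ts)
atm1.lambda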
## [1] 0
## [1] 1
## [1] 0.2446101
For ATM1, no first-order differencing is recommended, only a first-order seasonal difference and a Box-Cox transformation with \(\lambda\) = 0.2446101. Let's plot the data again after these transformations are performed to see what impact they have.
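A sketch of that check, assuming the transformations are a Box-Cox transform followed by one seasonal difference at lag 7:
atm1.ts %>%
  BoxCox(lambda = atm1.lambda) %>%
  diff(lag = 7) %>%
  ggtsdisplay()  # plots the series together with its ACF and PACF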
The plot above shows that most of the seasonality has been eliminated and we are left with an almost stationary time series, although there are still spikes in the ACF plot at lag 7 and in the PACF plot at lags 7, 14, and 21. By adding first-order differencing we could make the series even closer to stationary.
atm1.fit1 <- atm1.ts %>% hw(h=31, seasonal="additive",
damped=TRUE, lambda = atm1.lambda)
autoplot(atm1.fit1) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 2.100449 | 25.21299 | 16.10613 | -92.92647 | 111.5487 | 0.9096064 | 0.1149874 |
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 19.421, df = 3, p-value = 0.0002237
##
## Model df: 12. Total lags used: 15
The residuals plot looks reasonable, but the Ljung-Box test has an extremely small p-value, indicating that some autocorrelation remains in the residuals, as we saw in the plot of the transformed data. The forecast plot also looks reasonable, although the prediction intervals extend well beyond anything seen historically in the data.
atm1.fit2 <- atm1.ts %>% ets(model="ZZZ", lambda = atm1.lambda)
# as.character(atm1.fit2)
autoplot(atm1.fit2) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 2.337381 | 25.04745 | 16.1103 | -91.70147 | 110.6442 | 0.9098418 | 0.1213482 |
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 19.696, df = 5, p-value = 0.001425
##
## Model df: 9. Total lags used: 14
The ETS model gave us almost exactly the same results with only slightly better RMSE and Ljung-Box results.
atm1.fit3 <- auto.arima(atm1.ts)
# as.character(atm1.fit3)
autoplot(forecast(atm1.fit3, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | -0.0736394 | 23.3332 | 14.60281 | -102.7359 | 117.6027 | 0.8247052 | -0.0081341 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,2)[7]
## Q* = 16.789, df = 11, p-value = 0.1143
##
## Model df: 3. Total lags used: 14
The ARIMA model resulted in the best fit, with the lowest RMSE and a Ljung-Box p-value that means we cannot reject the null hypothesis that the residuals are consistent with white noise. The forecast plot also looks like a more reasonable estimate of what we can expect based on the historical data.
results <- data.frame(rbind(accuracy(atm1.fit1), accuracy(atm1.fit2), accuracy(atm1.fit3)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(0,1,2)[7]")
kab_tab2(results)
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Holt-Winter’s | 2.1004493 | 25.21299 | 16.10613 | -92.92647 | 111.5487 | 0.9096064 | 0.1149874 |
| ETS | 2.3373807 | 25.04745 | 16.11030 | -91.70147 | 110.6442 | 0.9098418 | 0.1213482 |
| ARIMA(0,0,1)(0,1,2)[7] | -0.0736394 | 23.33320 | 14.60281 | -102.73591 | 117.6027 | 0.8247052 | -0.0081341 |
Next, we follow the same procedure for ATM2, beginning with the same diagnostics.
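Again a sketch of the calls that presumably produced the output below; atm2.lambda is reused by the models:
ndiffs(atm2.ts)
nsdiffs(atm2.ts)
atm2.lambda <- BoxCox.lambda(atm2.ts)
atm2.lambda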
## [1] 1
## [1] 1
## [1] 0.7278107
For ATM2, both first-order and seasonal differencing are recommended, along with a Box-Cox transformation with \(\lambda\) = 0.7278107. Let's plot the data again after these transformations are performed to see what impact they have.
Once again we can see clear seasonality in the ACF and PACF plots, before transformation in the plot above and after transformation in the plot below. Note that the first-order difference was not applied because it resulted in a PACF with many spikes outside the critical values.
atm2.fit1 <- atm2.ts %>% hw(h=31, seasonal="additive", damped=TRUE,
lambda = atm2.lambda)
autoplot(atm2.fit1) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.6284115 | 25.40858 | 17.92711 | -Inf | Inf | 0.8614639 | 0.0199519 |
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 34.137, df = 3, p-value = 1.853e-07
##
## Model df: 12. Total lags used: 15
atm2.fit2 <- atm2.ts %>% ets(model="ZZZ", lambda =atm2.lambda)
# as.character(atm2.fit2)
autoplot(atm2.fit2) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.5166014 | 25.36411 | 17.84184 | -Inf | Inf | 0.8573662 | 0.019386 |
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 33.355, df = 5, p-value = 3.199e-06
##
## Model df: 9. Total lags used: 14
atm2.fit3 <- auto.arima(atm2.ts, lambda = atm2.lambda)
# as.character(atm2.fit3)
autoplot(forecast(atm2.fit3, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 1.44981 | 24.25163 | 17.12271 | -Inf | Inf | 0.8228094 | 0.0058678 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,3)(0,1,1)[7] with drift
## Q* = 8.859, df = 6, p-value = 0.1817
##
## Model df: 8. Total lags used: 14
Since the ndiffs function recommended first-order differencing but the auto.arima function did not include it in the model, let's add it manually to see if we can improve the model. (Note that fitting Arima with d = 1 to the already-differenced diff(atm2.ts) actually applies two regular differences.)
atm2.fit4 <- Arima(diff(atm2.ts), order=c(3,1,3),seasonal=c(0,1,1), lambda = atm2.lambda)
# as.character(atm2.fit4)
autoplot(forecast(atm2.fit4, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 2.332289 | 28.9457 | 20.52568 | NaN | Inf | 0.7141976 | -0.0447085 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,3)(0,1,1)[7]
## Q* = 24.274, df = 7, p-value = 0.001019
##
## Model df: 7. Total lags used: 14
results <- data.frame(rbind(accuracy(atm2.fit1), accuracy(atm2.fit2),
accuracy(atm2.fit3),accuracy(atm2.fit4)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(3,0,3)(0,1,1)[7]",
"ARIMA(3,1,3)(0,1,1)[7]")
kab_tab2(results)
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Holt-Winter’s | 0.6284115 | 25.40858 | 17.92711 | -Inf | Inf | 0.8614639 | 0.0199519 |
| ETS | 0.5166014 | 25.36411 | 17.84184 | -Inf | Inf | 0.8573662 | 0.0193860 |
| ARIMA(3,0,3)(0,1,1)[7] | 1.4498098 | 24.25163 | 17.12271 | -Inf | Inf | 0.8228094 | 0.0058678 |
| ARIMA(3,1,3)(0,1,1)[7] | 2.3322890 | 28.94570 | 20.52568 | NaN | Inf | 0.7141976 | -0.0447085 |
The auto.arima function gave us the best results so that model will be used for predictions.
As mentioned earlier there just is not enough data to make a good forecast for ATM3, so a simple mean will be used to forecast until more data can be collected.
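A sketch of the ATM3 forecast using meanf() from the forecast package; treating the leading zeros as pre-service placeholders and averaging only the non-zero observations is an assumption:
# Mean forecast from the three non-zero ATM3 observations
atm3.nonzero <- as.numeric(atm3.ts[atm3.ts > 0])
atm3.fc <- meanf(atm3.nonzero, h = 31)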
Once again we follow the same procedure for ATM4 as was done earlier for ATMs 1 and 2, beginning with the diagnostics.
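The same diagnostic calls, sketched for ATM4 (atm4.lambda is reused by the models below):
ndiffs(atm4.ts)
nsdiffs(atm4.ts)
atm4.lambda <- BoxCox.lambda(atm4.ts)
atm4.lambda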
## [1] 0
## [1] 0
## [1] 0.4525697
Here neither first-order nor seasonal differencing was recommended, but a Box-Cox transformation with \(\lambda\) = 0.4525697 was. Looking at the plots, however, it is not clear that the Box-Cox transformation improved the stationarity of the data; seasonal spikes are still apparent.
atm4.fit1 <- atm4.ts %>% hw(h=31, seasonal="additive", damped=TRUE,
lambda = atm4.lambda)
autoplot(atm4.fit1) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 72.46451 | 340.8815 | 261.8729 | -369.0055 | 413.0543 | 0.7559486 | 0.1006656 |
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 21.115, df = 3, p-value = 9.965e-05
##
## Model df: 12. Total lags used: 15
atm4.fit2 <- atm4.ts %>% ets(model="ZZZ", lambda = atm4.lambda)
# as.character(atm4.fit2)
autoplot(atm4.fit2) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 67.23932 | 338.2241 | 259.7873 | -376.9491 | 420.7907 | 0.7499281 | 0.0973636 |
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 19.488, df = 5, p-value = 0.001559
##
## Model df: 9. Total lags used: 14
atm4.fit3 <- auto.arima(atm4.ts, seasonal = TRUE, lambda = atm4.lambda)
# as.character(atm4.fit3)
autoplot(forecast(atm4.fit3, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 84.4142 | 351.8346 | 274.3411 | -370.6976 | 415.1683 | 0.7919408 | 0.019962 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(2,0,0)[7] with non-zero mean
## Q* = 15.872, df = 10, p-value = 0.1034
##
## Model df: 4. Total lags used: 14
On ATM4 the ARIMA model performs poorly compared with either the Holt-Winters or the ETS model. But since auto.arima did not apply any seasonal differencing and some seasonality seems apparent in the plots, various ARIMA models were tested with first-order seasonal differencing until the best performance was attained with the model below.
atm4.fit4 <- Arima(atm4.ts, order=c(0,0,1),seasonal=c(14,1,0),
lambda = atm4.lambda)
# as.character(atm4.fit4)
autoplot(forecast(atm4.fit4, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 58.30159 | 334.008 | 251.7076 | -339.9261 | 382.9995 | 0.7266044 | 0.0143578 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(14,1,0)[7]
## Q* = 16.453, df = 3, p-value = 0.0009157
##
## Model df: 15. Total lags used: 18
results <- data.frame(rbind(accuracy(atm4.fit1), accuracy(atm4.fit2),
accuracy(atm4.fit3),accuracy(atm4.fit4)))
rownames(results) <- c("Holt-Winter's", "ETS", "ARIMA(0,0,1)(2,0,0)[7]",
"ARIMA(0,0,1)(14,1,0)[7]")
kab_tab2(results)
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Holt-Winter’s | 72.46451 | 340.8815 | 261.8729 | -369.0055 | 413.0543 | 0.7559486 | 0.1006656 |
| ETS | 67.23932 | 338.2241 | 259.7873 | -376.9491 | 420.7907 | 0.7499281 | 0.0973636 |
| ARIMA(0,0,1)(2,0,0)[7] | 84.41420 | 351.8346 | 274.3411 | -370.6976 | 415.1683 | 0.7919408 | 0.0199620 |
| ARIMA(0,0,1)(14,1,0)[7] | 58.30159 | 334.0080 | 251.7076 | -339.9261 | 382.9995 | 0.7266044 | 0.0143578 |
Although similar performance was eventually reached with the ARIMA model, the much less complex ETS model is preferred for ATM4 since the improvement in performance is minimal.
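With the models chosen above (ARIMA for ATM1 and ATM2, a simple mean for ATM3, ETS for ATM4), the May 2010 forecasts can be collected and written to an Excel-readable file. A sketch, assuming the writexl package and a hypothetical output file name:
# Daily forecasts for the 31 days of May 2010, one column per ATM
library(writexl)
fc <- data.frame(
  DATE = seq(as.Date("2010-05-01"), by = "day", length.out = 31),
  ATM1 = as.numeric(forecast(atm1.fit3, h = 31)$mean),
  ATM2 = as.numeric(forecast(atm2.fit3, h = 31)$mean),
  ATM3 = rep(mean(atm3.ts[atm3.ts > 0]), 31),  # the simple-mean forecast
  ATM4 = as.numeric(forecast(atm4.fit2, h = 31)$mean)
)
write_xlsx(fc, "ATM624Forecasts.xlsx")  # file name assumed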
Files: ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage from January 1998 through December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
# Format DATE column in R date format
pow$`YYYY-MMM` <- paste0(pow$`YYYY-MMM`,"-01")
pow$date <- lubridate::ymd(pow$`YYYY-MMM`)
# Plot data
ggplot(pow, aes(date, KWH)) + geom_line() +
labs(title="residential power usage", y="KWH", x="") +
theme(panel.background = element_blank())
Here again, as in the ATM data, we can clearly see an outlier that is most likely a data error, so that point was imputed with the mean of the other observations for the same month. The same was done for an NA data point.
| CaseSequence | YYYY-MMM | KWH | date | month |
|---|---|---|---|---|
| 861 | 2008-Sep-01 | NA | 2008-09-01 | 9 |
pow$KWH[is.na(pow$KWH)] <- mean(pow$KWH[pow$month==9], na.rm = TRUE)
# Outlier is in July 2010
pow[pow$KWH==min(pow$KWH),]
| CaseSequence | YYYY-MMM | KWH | date | month |
|---|---|---|---|---|
| 883 | 2010-Jul-01 | 770523 | 2010-07-01 | 7 |
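The replacement step itself is not echoed above; a minimal sketch of the imputation described, i.e. the mean of the other July observations, assuming the month column shown in the table:
out <- which.min(pow$KWH)  # row index of the July 2010 outlier
pow$KWH[out] <- mean(pow$KWH[pow$month == 7 & seq_len(nrow(pow)) != out])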
pow.ts <- ts(pow$KWH, start = c(1998,1), frequency = 12)
# Plot data
autoplot(pow.ts) + theme(panel.background = element_blank())
## [1] 1
## [1] 1
## [1] -0.2018638
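As with the ATM series, the three values above are presumably the output of ndiffs, nsdiffs, and BoxCox.lambda; a sketch that also defines the pow.lambda used in the fits below:
ndiffs(pow.ts)
nsdiffs(pow.ts)
pow.lambda <- BoxCox.lambda(pow.ts)
pow.lambda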
pow.fit1 <- pow.ts %>% hw(h=31, seasonal="additive", damped=TRUE,
lambda = pow.lambda)
autoplot(pow.fit1) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 78844.59 | 620901.7 | 455088.7 | 0.477213 | 6.802764 | 0.7435267 | 0.2858823 |
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 44.62, df = 7, p-value = 1.621e-07
##
## Model df: 17. Total lags used: 24
pow.fit2 <- pow.ts %>% hw(h=31, seasonal="multiplicative", damped=TRUE)
autoplot(pow.fit2) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 42476.76 | 617283 | 459499.3 | -0.1199566 | 6.936035 | 0.7507327 | 0.2469939 |
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 45.599, df = 7, p-value = 1.046e-07
##
## Model df: 17. Total lags used: 24
Here the Holt-Winters multiplicative model resulted in just slightly better performance than the additive model with the Box-Cox transformation.
pow.fit3 <- pow.ts %>% ets(model="ZZZ", lambda = pow.lambda)
# pow.fit1
autoplot(pow.fit3) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 32423.43 | 613027.6 | 453744.7 | -0.2772083 | 6.836306 | 0.7413308 | 0.2817121 |
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,A)
## Q* = 45.157, df = 8, p-value = 3.436e-07
##
## Model df: 16. Total lags used: 24
pow.fit4 <- auto.arima(pow.ts, seasonal = TRUE, lambda = pow.lambda)
# pow.fit2
autoplot(forecast(pow.fit4, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 48157 | 626456.3 | 477342.9 | 0.1542706 | 7.247545 | 0.7798858 | 0.0998182 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(2,1,0)[12] with drift
## Q* = 29.88, df = 20, p-value = 0.07183
##
## Model df: 4. Total lags used: 24
pow.fit5 <- Arima(pow.ts, order=c(2,1,1), seasonal=c(2,1,2))
autoplot(forecast(pow.fit5, h=31)) + theme(panel.background = element_blank())
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 45040.95 | 594600.3 | 432060.5 | 0.1024936 | 6.5269 | 0.7059031 | -0.0003831 |
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(2,1,2)[12]
## Q* = 20.892, df = 17, p-value = 0.2312
##
## Model df: 7. Total lags used: 24
results <- data.frame(rbind(accuracy(pow.fit1), accuracy(pow.fit2),
accuracy(pow.fit3), accuracy(pow.fit4),
accuracy(pow.fit5)))
rownames(results) <- c("Holt-Winter's Additive", "Holt-Winter's Multiplicative",
"ETS", "ARIMA(0,0,1)(2,1,0)[12]",
"ARIMA(2,1,1)(2,1,2)[12]")
kab_tab2(results)
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Holt-Winter’s Additive | 78844.59 | 620901.7 | 455088.7 | 0.4772130 | 6.802764 | 0.7435267 | 0.2858823 |
| Holt-Winter’s Multiplicative | 42476.76 | 617283.0 | 459499.3 | -0.1199566 | 6.936035 | 0.7507327 | 0.2469939 |
| ETS | 32423.43 | 613027.6 | 453744.7 | -0.2772083 | 6.836306 | 0.7413308 | 0.2817121 |
| ARIMA(0,0,1)(2,1,0)[12] | 48157.00 | 626456.3 | 477342.9 | 0.1542706 | 7.247545 | 0.7798858 | 0.0998182 |
| ARIMA(2,1,1)(2,1,2)[12] | 45040.95 | 594600.3 | 432060.5 | 0.1024936 | 6.526900 | 0.7059031 | -0.0003831 |
Some trial and error resulted in an ARIMA(2,1,1)(2,1,2)[12] model (without any Box-Cox transformation) that is not overly complex and has significantly improved performance, so it is the preferred model for forecasting.
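Finally, the deliverable calls for a monthly forecast for 2014, i.e. twelve periods rather than the h = 31 used in the plots above. A sketch of the forecast and Excel export, again assuming writexl and a hypothetical output file name:
# Twelve monthly forecasts for 2014 from the chosen ARIMA model
pow.fc <- forecast(pow.fit5, h = 12)
pow.out <- data.frame(
  `YYYY-MMM` = format(seq(as.Date("2014-01-01"), by = "month",
                          length.out = 12), "%Y-%b"),
  KWH = as.numeric(pow.fc$mean),
  check.names = FALSE
)
writexl::write_xlsx(pow.out, "ResidentialForecast2014.xlsx")  # file name assumed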