Given historical data of daily ATM withdrawals, from 4 different machines, forecast the withdrawal amounts for the next month (Mat 2010).
First, we load the data. The given data file was changed the Excel file into a CSV to avoid trouble with how Microsoft stores dates.
# Excel file was saved as CSV to avoid issues with how Microsoft stores dates
rawATM <- read_csv("ATM624Data.csv")
head(rawATM)
## # A tibble: 6 x 3
## DATE ATM Cash
## <chr> <chr> <dbl>
## 1 5/1/2009 12:00:00 AM ATM1 96
## 2 5/1/2009 12:00:00 AM ATM2 107
## 3 5/2/2009 12:00:00 AM ATM1 82
## 4 5/2/2009 12:00:00 AM ATM2 89
## 5 5/3/2009 12:00:00 AM ATM1 85
## 6 5/3/2009 12:00:00 AM ATM2 90
The data was imported as character types and a double. Before we beginning cleaning, we will put the data into a new data frame so the original data stays as it was imported.
atm <- rawATM
First, we do some basic cleansing: correct the DATE column to be an actual date data type, and rename the columns
# Change DATE column to a proper date format
atm$DATE <- as.Date(mdy_hms(atm$DATE, tz="EST"))
# Change column names
names(atm) <- c("date","machine","amount")
head(atm)
## # A tibble: 6 x 3
## date machine amount
## <date> <chr> <dbl>
## 1 2009-05-01 ATM1 96
## 2 2009-05-01 ATM2 107
## 3 2009-05-02 ATM1 82
## 4 2009-05-02 ATM2 89
## 5 2009-05-03 ATM1 85
## 6 2009-05-03 ATM2 90
Next I check for missing values:
missing <- atm %>% filter(is.na(date)) %>% nrow() %>% tibble(date = .)
missing <- atm %>% filter(is.na(machine)) %>% nrow() %>%
tibble(machine = .) %>% cbind(missing,.)
missing <- atm %>% filter(is.na(amount)) %>% nrow() %>%
tibble(amount = .) %>% cbind(missing,.)
rownames(missing) <- "NA Values"
kbl(missing) %>% kable_paper("striped", full_width = F, position="left") %>%
add_header_above(c("Column Name" = 4))
|
Column Name
|
|||
|---|---|---|---|
| date | machine | amount | |
| NA Values | 0 | 14 | 19 |
We have 19 records where no amount is recorded and 14 records where no machine is recorded. We’ll look at which records those are:
atm %>% filter(is.na(amount) | is.na(machine))
## # A tibble: 19 x 3
## date machine amount
## <date> <chr> <dbl>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-18 ATM2 NA
## 4 2009-06-22 ATM1 NA
## 5 2009-06-24 ATM2 NA
## 6 2010-05-01 <NA> NA
## 7 2010-05-02 <NA> NA
## 8 2010-05-03 <NA> NA
## 9 2010-05-04 <NA> NA
## 10 2010-05-05 <NA> NA
## 11 2010-05-06 <NA> NA
## 12 2010-05-07 <NA> NA
## 13 2010-05-08 <NA> NA
## 14 2010-05-09 <NA> NA
## 15 2010-05-10 <NA> NA
## 16 2010-05-11 <NA> NA
## 17 2010-05-12 <NA> NA
## 18 2010-05-13 <NA> NA
## 19 2010-05-14 <NA> NA
Looking at the output, the majority of the records that are missing amount are also missing machine. They are also dates that we are supposed to forecast. So, we’ll remove those from the data set.
# Keep only where both amount and machine are not NA
atm <- atm %>% filter(!(is.na(amount) & is.na(machine)))
# Re-check
atm %>% filter(is.na(amount))
## # A tibble: 5 x 3
## date machine amount
## <date> <chr> <dbl>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-18 ATM2 NA
## 4 2009-06-22 ATM1 NA
## 5 2009-06-24 ATM2 NA
What’s left are missing amounts for 5 dates. We will have to deal with that missing data.
For our missing values of amount we will impute values using the previous amount for the same machine on the same day of the week.
First we add the day of the week:
atm$weekday <- atm$date %>% wday()
Now we get the last value:
# Function to return the value of the amount from 7 days previous for the
# same machine.
lastWeekday <- function(d, m){
d <- d-7
i <- atm %>% filter(date == d, machine == m) %>%
select(amount) %>% as.double()
return(i)
}
lastWeekday <- Vectorize(lastWeekday)
atm[is.na(atm$amount),"amount"] <- atm %>% filter(is.na(amount)) %>%
mutate(imputed = lastWeekday(date, machine)) %>% select(imputed)
Next we search for outliers. We’ll start by plotting the data:
atm %>% ggplot(aes(x=date,y=amount,color=machine)) + geom_line() +
facet_wrap(machine ~ ., scales = "free") +
theme_light() +
scale_x_date(date_breaks = "3 months", date_labels = "%b %y") +
scale_color_discrete(name="Machine") +
labs(title="Cash Withdrawals by ATM", y="Amount (100's of $)",
x="Date")
Two issues can be seen in the graphs above. The first is that ATM3 only has a few non-zero data points. Forecasting with those data will be nearly impossible to do accurately.
The second issue is that ATM4 appears to have a single, large outlier at date 2010-02-09. Let’s look more closely at that.
days <- c("Sunday","Monday","Tuesday","Wednesday","Thursday",
"Friday","Saturday")
atm %>% filter(machine == "ATM4") %>% ggplot(aes(x=date, y=amount)) +
geom_line() + theme_light() +
theme(axis.text.x = element_blank()) +
facet_wrap(ordered(weekday,labels=days) ~ .,scales="free_y") +
labs(title="ATM4", subtitle="By Day of Week", x=NULL, y= NULL)
We can see that the suspected outlier was recorded on a Tuesday and is well above the other values recorded on a Tuesday (or any other weekday for that matter).
If we look at the data points surrounding it, that outlier value does not look like a part of a localized trend (say, in advance of Valentine’s Day):
atm %>% filter(date <= as.Date('2010-03-09'),
date >= as.Date('2010-01-09'),
machine=="ATM4") %>%
ggplot(aes(x=date,y=amount)) + geom_line() +
theme_light() +
scale_x_date(date_breaks = "1 week", date_labels = "%b %d") +
labs(title="Cash Withdrawals at ATM4",
subtitle="01/09/10 - 03/09/10",
y="Amount (100's of $)",
x="Date")
So, we assume this is some sort of data entry issue and will replace this value with the mean of the Tuesday amounts for ATM4.
# Mean value withdrawn on Tuesdays at ATM4
replacement <- atm %>% filter(machine=="ATM4",weekday==3) %>% summarize(mean(amount))
# Replace outlier value
atm[which(atm$date==as.Date("2010-02-09") & atm$machine=="ATM4"),"amount"] <- replacement
Now that we have the data more or less how we want it in terms of data types, we will split each machine into it’s own data set and convert them into time series.
atm1 <- atm %>% filter(machine=="ATM1") %>% select(amount) %>%
ts(frequency = 7, start=c(strftime(as.Date("2009-05-01"),'%w'),
strftime(as.Date("2009-05-01"),'%W')))
atm2 <- atm %>% filter(machine=="ATM2") %>% select(amount) %>%
ts(frequency = 7, start=c(strftime(as.Date("2009-05-01"),'%w'),
strftime(as.Date("2009-05-01"),'%W')))
atm3 <- atm %>% filter(machine=="ATM3") %>% select(amount) %>%
ts(frequency = 7, start=c(strftime(as.Date("2009-05-01"),'%w'),
strftime(as.Date("2009-05-01"),'%W')))
atm4 <- atm %>% filter(machine=="ATM4") %>% select(amount) %>%
ts(frequency = 7, start=c(strftime(as.Date("2009-05-01"),'%w'),
strftime(as.Date("2009-05-01"),'%W')))
First, let’s plot the series for each ATM:
atm1 %>% autoplot() + theme_light() +
labs(title="Withdrawals from ATM1", y="Amount (100's of $)",
x="Week")
atm2 %>% autoplot() + theme_light() +
labs(title="Withdrawals from ATM2", y="Amount (100's of $)",
x="Week")
atm3 %>% autoplot() + theme_light() +
labs(title="Withdrawals from ATM3", y="Amount (100's of $)",
x="Week")
atm4 %>% autoplot() + theme_light() +
labs(title="Withdrawals from ATM4", y="Amount (100's of $)",
x="Week")
ATM1 appears to have some weekly seasonality.
ATM2 also appears to have a similar weekly seasonality.
ATM3, as previously discussed, does not have enough data to really forecast well.
ATM4, unlike the other machines’ data, appears to be rather random - like white noise.
Now we’ll look at models for each ATM, starting with ATM1.
First we’ll try to stabilize the series with a Box Cox transform:
l <- BoxCox.lambda(atm1)
atm1.trans <- BoxCox(atm1,lambda=l)
autoplot(cbind(original = atm1, transformed=atm1.trans), facets=T) +
theme_light() +
labs(title="Withdrawals from ATM1", y="Amount (100's of $)",
x="Week")
Looking at the plots above, the Box Cox transform seems to stabilize some of the variance in the data using \(\lambda =\) 0.25.
Next we’ll split the data sets into a training set and a testing set, holding out 60 days for testing and the rest for training the models:
atm1.train <- subset(atm1.trans,end=length(atm1.trans)-60)
atm1.test <- subset(atm1.trans, start=length(atm1.trans)-59)
Next, we’ll look at various models to see how they perform on the training data set.
We’ll begin using a seasonal naive forecast:
atm1.snaive <- snaive(atm1.train,h=31, biasadj=T)
accuracy(atm1.snaive)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.004699424 1.632284 0.8453411 -Inf Inf 1 0.1045403
There appears to be some numerical instability in some of the error rates, but the RMSE and MASE are interpretable. The MASE shows us that this forecast is equal to a naive forecast (which this is a seasonal version of). Let’s look at the plot:
autoplot(atm1.snaive, PI=F) + theme_light() +
labs(title="Withdrawals from ATM1",
subtitle = "Seasonal Naive Model",
y="Amount (100's of $)",
x="Week")
As expected, the snaive() model is really just adding in the seasonal component to the last value.
Next, we’ll try the Holt-Winters method using additive seasonality:
# Use Holt-Winters method
atm1.hw <- hw(atm1.train, seasonal = "additive", h=31, biasadj = T)
accuracy(atm1.hw)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.000598947 1.308741 0.6893755 -Inf Inf 0.8154998 0.1040176
Our RMSE and MASE accuracy measures both decreased from the seasonal naive method above.
autoplot(atm1.hw, PI=F, series = "Holt-Winters") + theme_light() +
labs(title="Withdrawals from ATM1",
subtitle = "Holt-Winters Seasonal Model",
y="Amount (100's of $)",
x="Week")
The predictions from the Holt-Winters are flatter than the Seasonal Naive forecast, and don’t seem to take into account some of the positive spikes seen in the past data. This may be ok, though, as it may generalize better on unseen data.
The error measures RMSE and MASE are an improvement over the Seasonal Naive model, though.
Finally, we’ll try a seasonal ARIMA model:
atm1.arima <- auto.arima(atm1.train)
atm1.arima
## Series: atm1.train
## ARIMA(0,0,2)(0,1,1)[7]
##
## Coefficients:
## ma1 ma2 sma1
## 0.1282 -0.0945 -0.8596
## s.e. 0.0579 0.0570 0.0493
##
## sigma^2 estimated as 1.726: log likelihood=-507.35
## AIC=1022.7 AICc=1022.84 BIC=1037.49
The auto.arima() function chose an ARIMA\((0,0,2)(0,1,1)_{[7]}\) model. This makes sense, as it there should be little correlation with previous withdrawal amounts, yet some change in seasonality variance over time.
accuracy(atm1.arima)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.02862017 1.291929 0.6996114 -Inf Inf 0.8276084 -0.001612681
The RMSE is only slightly lower with this ARIMA model and the MASE is actually a bit worse, when compared to the Holt-Winters model.
atm1.pred <- forecast(atm1.arima,h=31)
autoplot(atm1.pred, PI=F) + theme_light() +
labs(title="Withdrawals from ATM1",
subtitle = "ARIMA(0,0,1)(0,1,2)[7] Model",
y="Amount (100's of $)",
x="Week")
The ARIMA model predictions strongly resemble the Holt-Winters model above. Let’s look at the residuals:
checkresiduals(atm1.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2)(0,1,1)[7]
## Q* = 7.7234, df = 11, p-value = 0.7378
##
## Model df: 3. Total lags used: 14
The residuals are not really normal, though the Ljung-Box test confirms there is a lack of significant correlation.
Now, we will evaluate the accuracy measures with the test set:
# Check test set accuracy measures
testMeasures <- cbind(method = "Seasonal Naive",
data.frame(
as.list(accuracy(atm1.snaive, atm1.test)["Test set",])
))
testMeasures <- rbind(testMeasures,
cbind(method = "Holt-Winters",
data.frame(
as.list(accuracy(atm1.hw, atm1.test)["Test set",])
))
)
atm1.pred <- Arima(atm1.test,model=atm1.arima,h=31, biasadj=T)
testMeasures <- rbind(testMeasures,
cbind(method = "ARIMA",
data.frame(
as.list(accuracy(atm1.pred)["Training set",])
), Theil.s.U = NA))
testMeasures %>% select(method, RMSE, MASE) %>%
kable() %>% kable_styling(bootstrap_options = "striped", full_width = F)
| method | RMSE | MASE |
|---|---|---|
| Seasonal Naive | 1.2043282 | 1.1263666 |
| Holt-Winters | 2.8380034 | 2.3113934 |
| ARIMA | 0.5211845 | 0.7523489 |
The ARIMA model seems to generalize very well, as it has the best error metrics using the testing data set. So we will use that model for our predicted values for May 2010:
atm1.forecast <- forecast(atm1, model = atm1.arima, h=31, biasadj = T)
atm1.forecast %>%
kable() %>% kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "200px")
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| 59.42857 | 88.54968 | 86.86618 | 90.23319 | 85.97499 | 91.12438 |
| 59.57143 | 104.26506 | 102.56778 | 105.96234 | 101.66929 | 106.86083 |
| 59.71429 | 74.26779 | 72.56307 | 75.97250 | 71.66064 | 76.87493 |
| 59.85714 | 27.14883 | 25.44411 | 28.85355 | 24.54169 | 29.75597 |
| 60.00000 | 96.76130 | 95.05659 | 98.46602 | 94.15416 | 99.36844 |
| 60.14286 | 73.60436 | 71.89964 | 75.30908 | 70.99722 | 76.21150 |
| 60.28571 | 91.53222 | 89.82750 | 93.23693 | 88.92507 | 94.13936 |
| 60.42857 | 90.03139 | 88.31036 | 91.75242 | 87.39930 | 92.66348 |
| 60.57143 | 103.59256 | 101.87126 | 105.31385 | 100.96006 | 106.22505 |
| 60.71429 | 74.26779 | 72.54634 | 75.98923 | 71.63507 | 76.90050 |
| 60.85714 | 27.14883 | 25.42739 | 28.87027 | 24.51611 | 29.78155 |
| 61.00000 | 96.76130 | 95.03986 | 98.48275 | 94.12858 | 99.39402 |
| 61.14286 | 73.60436 | 71.88292 | 75.32580 | 70.97164 | 76.23708 |
| 61.28571 | 91.53222 | 89.81077 | 93.25366 | 88.89950 | 94.16493 |
| 61.42857 | 90.03139 | 88.29379 | 91.76899 | 87.37397 | 92.68882 |
| 61.57143 | 103.59256 | 101.85469 | 105.33042 | 100.93472 | 106.25039 |
| 61.71429 | 74.26779 | 72.52978 | 76.00579 | 71.60974 | 76.92584 |
| 61.85714 | 27.14883 | 25.41082 | 28.88684 | 24.49078 | 29.80688 |
| 62.00000 | 96.76130 | 95.02330 | 98.49931 | 94.10325 | 99.41935 |
| 62.14286 | 73.60436 | 71.86635 | 75.34236 | 70.94631 | 76.26241 |
| 62.28571 | 91.53222 | 89.79421 | 93.27022 | 88.87417 | 94.19027 |
| 62.42857 | 90.03139 | 88.27738 | 91.78540 | 87.34887 | 92.71392 |
| 62.57143 | 103.59256 | 101.83828 | 105.34683 | 100.90963 | 106.27548 |
| 62.71429 | 74.26779 | 72.51337 | 76.02220 | 71.58464 | 76.95093 |
| 62.85714 | 27.14883 | 25.39442 | 28.90324 | 24.46569 | 29.83197 |
| 63.00000 | 96.76130 | 95.00689 | 98.51572 | 94.07816 | 99.44445 |
| 63.14286 | 73.60436 | 71.84995 | 75.35877 | 70.92122 | 76.28750 |
| 63.28571 | 91.53222 | 89.77780 | 93.28663 | 88.84907 | 94.21536 |
| 63.42857 | 90.03139 | 88.26113 | 91.80166 | 87.32400 | 92.73878 |
| 63.57143 | 103.59256 | 101.82203 | 105.36308 | 100.88477 | 106.30034 |
| 63.71429 | 74.26779 | 72.49712 | 76.03845 | 71.55978 | 76.97579 |
# Output point forecasts
output <- cbind(date = seq(from=as.Date("2010-05-01"),
to=as.Date("2010-05-31"),by=1),
point_forecast = as.data.frame(atm1.forecast$mean))
write.xlsx(output,"forecast_ATM1.xlsx")
autoplot(atm1.forecast, PI=T) +
theme_light() +
labs(title="Withdrawals from ATM1",
subtitle = "ARIMA(0,0,1)(0,1,2)[7] Forecasts",
y="Amount (100's of $)",
x="Week")
As we did for ATM1 above, we’ll try to stabilize the series with a Box Cox transform:
l <- BoxCox.lambda(atm2)
atm2.trans <- BoxCox(atm2,lambda=l)
autoplot(cbind(original = atm2, transformed=atm2.trans), facets=T) +
theme_light() +
labs(title="Withdrawals from ATM2", y="Amount (100's of $)",
x="Week")
There seems to be some stability gained by the transform, so we’ll keep it. As before, we split the data into train and test sets.
atm2.train <- subset(atm2.trans,end=length(atm2.trans)-60)
atm2.test <- subset(atm2.trans, start=length(atm2.trans)-59)
We’ll again begin using a seasonal naive forecast:
atm2.snaive <- snaive(atm2.train,h=31, biasadj=T)
accuracy(atm2.snaive)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.05817628 8.21813 5.634022 -Inf Inf 1 0.09936693
autoplot(atm2.snaive, PI = F) +
theme_light() +
labs(title="Withdrawals from ATM2",
subtitle = "Seasonal Naive Model",
y="Amount (100's of $)",
x="Week")
Visually, there appears to be a bit of a downward trend in the data, but this Seasonal Naive model doesn’t take that into account. This might not be a good model, so we’ll keep looking.
We’ll try Holt-Winters as our second model:
# Use Holt-Winters method
atm2.hw <- hw(atm2.train, seasonal = "additive", h=31, biasadj = T)
accuracy(atm2.hw)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.02198949 6.697468 4.46445 -Inf Inf 0.7924091 0.02699244
Our chosen error metrics: RMSE and MASE, both decrease compared to the Seasonal Naive model above.
autoplot(atm2.hw, PI = F) +
theme_light() +
labs(title="Withdrawals from ATM2",
subtitle = "Holt-Winters Seasonal Model",
y="Amount (100's of $)",
x="Week")
We see that this model does use some of the downward trend we saw, though not much.
Finally, we’ll try ARIMA on these data:
atm2.arima <- auto.arima(atm2.train)
atm2.arima
## Series: atm2.train
## ARIMA(0,0,0)(0,1,2)[7]
##
## Coefficients:
## sma1 sma2
## -0.7271 -0.0828
## s.e. 0.0601 0.0594
##
## sigma^2 estimated as 46.04: log likelihood=-995.94
## AIC=1997.88 AICc=1997.97 BIC=2008.98
Interestingly, the auto.arima() function chose a different model than it did for the ATM1 data. Let’s check the accuracy measures:
accuracy(atm2.arima)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4826825 6.684235 4.445472 -Inf Inf 0.7890405 0.04564369
The ARIMA model has very nearly the same RMSE and MASE that the Holt-Winter model did.
atm2.pred <- forecast(atm2.arima,h=31)
autoplot(atm2.pred, PI=T) + theme_light() +
labs(title="Withdrawals from ATM2",
subtitle = "ARIMA(0,0,0)(0,1,2)[7] Model",
y="Amount (100's of $)",
x="Week")
We’ll check the accuracy over the testing data set to see how well the models generalize:
# Check test set accuracy measures
testMeasures <- cbind(method = "Seasonal Naive",
data.frame(
as.list(accuracy(atm2.snaive, atm1.test)["Test set",])
))
testMeasures <- rbind(testMeasures,
cbind(method = "Holt-Winters",
data.frame(
as.list(accuracy(atm2.hw, atm1.test)["Test set",])
))
)
atm2.pred <- Arima(atm2.test,model=atm2.arima,h=31, biasadj=T)
testMeasures <- rbind(testMeasures,
cbind(method = "ARIMA",
data.frame(
as.list(accuracy(atm2.pred)["Training set",])
), Theil.s.U = NA))
testMeasures %>% select(method, RMSE, MASE) %>%
kable() %>% kable_styling(bootstrap_options = "striped", full_width = F)
| method | RMSE | MASE |
|---|---|---|
| Seasonal Naive | 14.703665 | 2.0390192 |
| Holt-Winters | 12.178901 | 1.8777154 |
| ARIMA | 4.629508 | 0.7121923 |
As before, ARIMA does better with data it has not seen. So, we’ll use this model to make our forecasts:
atm2.forecast <- forecast(atm2, model = atm2.arima, h=31, biasadj = T)
atm2.forecast %>%
kable() %>% kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "200px")
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| 59.42857 | 64.80428 | 56.108830 | 73.49972 | 51.505739 | 78.10281 |
| 59.57143 | 73.03464 | 64.339191 | 81.73008 | 59.736100 | 86.33318 |
| 59.71429 | 17.74685 | 9.051407 | 26.44230 | 4.448316 | 31.04539 |
| 59.85714 | 10.85213 | 2.156684 | 19.54758 | -2.446407 | 24.15067 |
| 60.00000 | 93.34042 | 84.644969 | 102.03586 | 80.041878 | 106.63895 |
| 60.14286 | 81.31553 | 72.620085 | 90.01098 | 68.016994 | 94.61407 |
| 60.28571 | 67.14308 | 58.447630 | 75.83852 | 53.844538 | 80.44161 |
| 60.42857 | 65.44402 | 56.430659 | 74.45739 | 51.659272 | 79.22878 |
| 60.57143 | 71.08524 | 62.071880 | 80.09861 | 57.300492 | 84.87000 |
| 60.71429 | 18.44272 | 9.429351 | 27.45608 | 4.657964 | 32.22747 |
| 60.85714 | 11.72797 | 2.714601 | 20.74133 | -2.056786 | 25.51272 |
| 61.00000 | 91.90718 | 82.893816 | 100.92055 | 78.122429 | 105.69193 |
| 61.14286 | 80.67954 | 71.666178 | 89.69291 | 66.894791 | 94.46430 |
| 61.28571 | 64.36705 | 55.353684 | 73.38041 | 50.582296 | 78.15180 |
| 61.42857 | 65.44402 | 56.280326 | 74.60772 | 51.429358 | 79.45869 |
| 61.57143 | 71.08524 | 61.921546 | 80.24894 | 57.070578 | 85.09991 |
| 61.71429 | 18.44272 | 9.279018 | 27.60641 | 4.428050 | 32.45738 |
| 61.85714 | 11.72797 | 2.564268 | 20.89166 | -2.286701 | 25.74263 |
| 62.00000 | 91.90718 | 82.743483 | 101.07088 | 77.892514 | 105.92185 |
| 62.14286 | 80.67954 | 71.515845 | 89.84324 | 66.664877 | 94.69421 |
| 62.28571 | 64.36705 | 55.203350 | 73.53075 | 50.352382 | 78.38172 |
| 62.42857 | 65.44402 | 56.132420 | 74.75563 | 51.203154 | 79.68489 |
| 62.57143 | 71.08524 | 61.773640 | 80.39685 | 56.844375 | 85.32611 |
| 62.71429 | 18.44272 | 9.131112 | 27.75432 | 4.201846 | 32.68359 |
| 62.85714 | 11.72797 | 2.416362 | 21.03957 | -2.512904 | 25.96884 |
| 63.00000 | 91.90718 | 82.595576 | 101.21878 | 77.666311 | 106.14805 |
| 63.14286 | 80.67954 | 71.367939 | 89.99115 | 66.438673 | 94.92041 |
| 63.28571 | 64.36705 | 55.055444 | 73.67865 | 50.126179 | 78.60792 |
| 63.42857 | 65.44402 | 55.986827 | 74.90122 | 50.980488 | 79.90756 |
| 63.57143 | 71.08524 | 61.628047 | 80.54244 | 56.621709 | 85.54878 |
| 63.71429 | 18.44272 | 8.985518 | 27.89991 | 3.979180 | 32.90625 |
# Output point forecasts
output <- cbind(date = seq(from=as.Date("2010-05-01"),
to=as.Date("2010-05-31"),by=1),
point_forecast = as.data.frame(atm2.forecast$mean))
write.xlsx(output,"forecast_ATM2.xlsx")
autoplot(atm2.forecast, PI=T) +
theme_light() +
labs(title="Withdrawals from ATM2",
subtitle = "ARIMA(0,0,0)(0,1,2)[7] Forecasts",
y="Amount (100's of $)",
x="Week")
Given the fact that there are only 3 data points for ATM3, there isn’t a good model choice. The best we could likely do is a naive forecast, but even that I would not be comfortable using in practice:
autoplot(atm3) + autolayer(naive(atm3))
The data for ATM4, as mentioned above, seemed to have no real seasonal pattern and almost resembled white noise. So, we’ll take a slightly different approach.
First, though, we will check to see if a Box-Cox transformation is useful:
l <- BoxCox.lambda(atm4)
atm4.trans <- BoxCox(atm4,lambda=l)
autoplot(cbind(original = atm4, transformed=atm4.trans), facets=T) +
theme_light() +
labs(title="Withdrawals from ATM4", y="Amount (100's of $)",
x="Week")
The transformation did reduce some of the variance, so we’ll keep it.
Next, we’ll see if there is any evidence of seasonality that we did not see in the overall plot.
ggAcf(atm4.trans)
Surprisingly, there does appear to be some weekly seasonality like ATM1 and ATM2. We’ll split the data and use an STL decomposition to see what that looks like:
atm4.train <- subset(atm4.trans,end=length(atm2.trans)-60)
atm4.test <- subset(atm4.trans, start=length(atm2.trans)-59)
atm4.decomp <- stl(atm4.train, s.window="periodic")
autoplot(atm4.decomp)
The stl() function seemed to tease out the seasonality pretty well. We’ll try to use this decomposition to forecast, and see how well it does:
atm4.stl <- seasadj(atm4.decomp) %>% snaive(h=31)
accuracy(atm4.stl)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.1070168 17.97907 14.21384 -34.43799 174.691 1 0.04499377
atm4.stl %>% autoplot() +
labs(title="Withdrawals from ATM4",
subtitle="STL Seasonal Naive Model",
y="Amount (100's of $)",
x="Week")
The STL method does ok, but we can probably do better. We’ll try a Holt-Winter model:
atm4.hw <- hw(atm4.train, h = 31, seasonal = "additive", biasadj = T,
lambda=l)
accuracy(atm4.hw)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.269489 13.16263 10.47636 -58.62438 82.00934 0.7370535
## ACF1
## Training set 0.06498851
The Holt-Winter model does well in both RMSE and MASE. Let’s plot the results and see what the model predictions look like based on the training data:
autoplot(atm4.hw, PI = T) +
theme_light() +
labs(title="Withdrawals from ATM4",
subtitle = "Holt-Winters Seasonal Model",
y="Amount (100's of $)",
x="Week")
The predictions look ok, though they have a large prediction interval.
Finally, we try an ARIMA model next to see if it works better:
atm4.arima <- auto.arima(atm4.train, biasadj = T)
atm4.arima
## Series: atm4.train
## ARIMA(0,0,1)(2,0,0)[7] with non-zero mean
##
## Coefficients:
## ma1 sar1 sar2 mean
## 0.0883 0.2092 0.1443 30.0424
## s.e. 0.0576 0.0577 0.0586 1.3337
##
## sigma^2 estimated as 200.8: log likelihood=-1239.71
## AIC=2489.43 AICc=2489.63 BIC=2508.03
The auto.arima() function chose an ARIMA\((0,0,1)(2,0,0)_{[7]}\) model. We’ll first check the accuracy measures on the training set and then plot them.
accuracy(atm2.arima)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4826825 6.684235 4.445472 -Inf Inf 0.7890405 0.04564369
atm4.pred <- forecast(atm4.arima,h=31)
autoplot(atm4.pred, PI=T) + theme_light() +
labs(title="Withdrawals from ATM4",
subtitle = "ARIMA(0,0,1)(2,0,0)[7] Model",
y="Amount (100's of $)",
x="Week")
The RMSE accuracy measure appears better than the STL Seasonal Naive model, but the MASE is actually a bit worse. Also, looking at the graph above, the predictions don’t look very stable.
We’ll check the residuals next:
checkresiduals(atm4.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(2,0,0)[7] with non-zero mean
## Q* = 11.319, df = 10, p-value = 0.3332
##
## Model df: 4. Total lags used: 14
The residuals aren’t exactly normal, though there appear to be no significant correlations between them and the Ljung-Box confirms the same.
Finally, let’s look at all of the models’ performance with the test data:
# Check test set accuracy measures
testMeasures <- cbind(method = "STL",
data.frame(
as.list(accuracy(atm4.stl, atm4.test)["Test set",])
))
testMeasures <- rbind(testMeasures,
cbind(method = "Holt-Winters",
data.frame(
as.list(accuracy(atm4.hw, atm4.test)["Test set",])
))
)
atm4.pred <- Arima(atm4.test,model=atm4.arima,biasadj=T)
testMeasures <- rbind(testMeasures,
cbind(method = "ARIMA",
data.frame(
as.list(accuracy(atm4.pred)["Training set",])
), Theil.s.U = NA))
testMeasures %>% select(method, RMSE, MASE) %>%
kable() %>% kable_styling(bootstrap_options = "striped", full_width = F)
| method | RMSE | MASE |
|---|---|---|
| STL | 21.28007 | 1.1667665 |
| Holt-Winters | 17.59295 | 0.9856173 |
| ARIMA | 12.33961 | 0.8749154 |
Surprisingly, the ARIMA model does better than all the others in terms of our chosen accuracy measures on the test data set. However, the instability of the model predictions makes me hesitant to use it.
In conclusion, I would choose the Holt-Winter model here, despite it’s worse performance in accuracy measures, because I feel it provides more stable predictions of the three models.
atm4.forecast <- forecast(atm4.trans, model=atm4.hw, h=31,
use.initial.values=TRUE,
biasadj = T)
atm4.forecast %>%
kable() %>% kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "200px")
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| 59.42857 | 26.05965 | 10.470474 | 49.50389 | 5.2010605 | 65.25192 |
| 59.57143 | 27.51257 | 11.324439 | 51.64204 | 5.7757149 | 67.78627 |
| 59.71429 | 28.18119 | 11.699643 | 52.66995 | 6.0224496 | 69.03117 |
| 59.85714 | 27.16074 | 11.031793 | 51.32039 | 5.5456069 | 67.52032 |
| 60.00000 | 23.22162 | 8.635555 | 45.77789 | 3.9248643 | 61.11479 |
| 60.14286 | 10.16172 | 1.927026 | 25.85077 | 0.2560098 | 37.39999 |
| 60.28571 | 28.53184 | 11.760629 | 53.51730 | 6.0073660 | 70.23031 |
| 60.42857 | 26.05043 | 10.209555 | 50.09025 | 4.9321807 | 66.30303 |
| 60.57143 | 27.50308 | 11.052246 | 52.24166 | 5.4910460 | 68.85870 |
| 60.71429 | 28.17157 | 11.422702 | 53.27568 | 5.7313119 | 70.11364 |
| 60.85714 | 27.15131 | 10.763731 | 51.91736 | 5.2675011 | 68.58935 |
| 61.00000 | 23.21296 | 8.401166 | 46.33875 | 3.6950086 | 62.12720 |
| 61.14286 | 10.15619 | 1.824018 | 26.26222 | 0.2059312 | 38.17594 |
| 61.28571 | 28.52215 | 11.483438 | 54.12705 | 5.7171902 | 71.32070 |
| 61.42857 | 26.04121 | 9.953083 | 50.67825 | 4.6719199 | 67.35948 |
| 61.57143 | 27.49359 | 10.784547 | 52.84291 | 5.2150999 | 69.93647 |
| 61.71429 | 28.16195 | 11.150264 | 53.88306 | 5.4489254 | 71.20148 |
| 61.85714 | 27.14189 | 10.500105 | 52.51602 | 4.9980256 | 69.66383 |
| 62.00000 | 23.20431 | 8.171013 | 46.90141 | 3.4733746 | 63.14521 |
| 62.14286 | 10.15067 | 1.724389 | 26.67582 | 0.1618406 | 38.95794 |
| 62.28571 | 28.51247 | 11.210683 | 54.73854 | 5.4356753 | 72.41664 |
| 62.42857 | 26.03200 | 9.700924 | 51.26806 | 4.4200735 | 68.42159 |
| 62.57143 | 27.48410 | 10.521203 | 53.44597 | 4.9476692 | 71.01990 |
| 62.71429 | 28.15234 | 10.882190 | 54.49225 | 5.1750833 | 72.29501 |
| 62.85714 | 27.13246 | 10.240785 | 53.11654 | 4.7369799 | 70.74407 |
| 63.00000 | 23.19565 | 7.944979 | 47.46604 | 3.2597776 | 64.16911 |
| 63.14286 | 10.14514 | 1.628061 | 27.09168 | 0.1235512 | 39.74622 |
| 63.28571 | 28.50279 | 10.942235 | 55.35195 | 5.1626242 | 73.51843 |
| 63.42857 | 26.02278 | 9.452956 | 51.85985 | 4.1764548 | 69.48964 |
| 63.57143 | 27.47461 | 10.262092 | 54.05102 | 4.6885653 | 72.10929 |
| 63.71429 | 28.14273 | 10.618357 | 55.10345 | 4.9095971 | 73.39452 |
# Output point forecasts
output <- cbind(date = seq(from=as.Date("2010-05-01"),
to=as.Date("2010-05-31"),by=1),
point_forecast = as.data.frame(InvBoxCox(atm4.forecast$mean,lambda=l)))
write.xlsx(output,"forecast_ATM4.xlsx")
autoplot(atm4.forecast, PI=T) +
theme_light() +
labs(title="Withdrawals from ATM4",
subtitle = "Holt-Winters Forecasts",
y="Amount (100's of $)",
x="Week")
Given a dataset of residential power usage for January 1998 until December 2013, model these data and produce a monthly forecast for 2014.
# Read in the data
power_raw <- read_xlsx("power.xlsx")
# Create a new data frame, keeping the original data safe
power <- power_raw
As we did with the ATM data, we first do some basic cleansing: correct the DATE column to be an actual date data type, and rename the columns.
# Assign easier names
colnames(power) <- c("seq","date","kwh")
# Fix the date column so it can be converted
power$date <- paste0(power$date,"-01")
power$date <- ymd(power$date)
head(power)
## # A tibble: 6 x 3
## seq date kwh
## <dbl> <date> <dbl>
## 1 733 1998-01-01 6862583
## 2 734 1998-02-01 5838198
## 3 735 1998-03-01 5420658
## 4 736 1998-04-01 5010364
## 5 737 1998-05-01 4665377
## 6 738 1998-06-01 6467147
Next, we check if any values are missing:
missing <- power %>% filter(is.na(date)) %>% nrow() %>% tibble(date = .)
missing <- power %>% filter(is.na(seq)) %>% nrow() %>%
tibble(seq = .) %>% cbind(missing,.)
missing <- power %>% filter(is.na(kwh)) %>% nrow() %>%
tibble(kwh = .) %>% cbind(missing,.)
rownames(missing) <- "NA Values"
kbl(missing) %>% kable_paper("striped", full_width = F, position="left") %>%
add_header_above(c("Column Name" = 4))
|
Column Name
|
|||
|---|---|---|---|
| date | seq | kwh | |
| NA Values | 0 | 0 | 1 |
Thankfully, we only have one missing value in the kwh column. Specifically:
power %>% filter(is.na(kwh)) %>% kbl %>%
kable_styling(bootstrap_options = "striped", full_width = F)
| seq | date | kwh |
|---|---|---|
| 861 | 2008-09-01 | NA |
For our missing value of kwh, we’ll look at all the values in the same month over the data set to see if the values are stable enough to impute our missing value:
power %>% filter(month(date)==9) %>%
ggplot(aes(x=year(date),y=kwh)) + geom_col() + theme_light() +
labs(title="Residential Power Usage", subtitle="September Months",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
The plot above shows that using the previous value (of the same month) should be an acceptable replacement for the missing value. This way, if there is a seasonality change, we don’t break that pattern with our imputation.
imputed <- power %>% filter(date == as.Date("2007-09-01")) %>%
select(kwh) %>% as.numeric()
power[is.na(power$kwh),"kwh"] <- imputed
Next we’ll look for outliers. We’ll start by plotting the raw data:
power %>% ggplot(aes(x=date,y=kwh)) + geom_line() +
theme_light() +
labs(title="Residential Power Usage",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
There appears to be a visual inconsistency with the data around late 2010. We’ll look closer:
power %>% filter(date >= as.Date("2009-07-01"), date <= as.Date("2011-07-01")) %>%
ggplot(aes(x=date,y=kwh)) + geom_line() +
theme_light() +
labs(title="Residential Power Usage", subtitle="(selected date range)",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
The unusually low point in question is July 2010. Let’s compare to other July months to see if it is indeed that unusual:
power %>% filter(month(date)==7) %>%
ggplot(aes(x=year(date),y=kwh)) + geom_col() + theme_light() +
labs(title="Residential Power Usage", subtitle="July Months",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
Compared to fellow July entries, it is pretty obvious that the 2010 entry is likely wrong. Not knowing anything about the data set (where it is from, for example) we don’t know if there is a legitimate reason for it to be so low.
In fact, the value for July 2010 seems to be about 10x smaller than other July entries:
power %>% filter(month(date)==7) %>%
mutate(kwh = comma(kwh)) %>%
select(date, kwh) %>%
kbl() %>% kable_styling(bootstrap_options = "striped", full_width = F) %>%
row_spec(13, bold = T, color = "white", background = "#D7261E")
| date | kwh |
|---|---|
| 1998-07-01 | 8,914,755 |
| 1999-07-01 | 7,444,416 |
| 2000-07-01 | 6,862,662 |
| 2001-07-01 | 7,855,129 |
| 2002-07-01 | 7,039,702 |
| 2003-07-01 | 6,900,676 |
| 2004-07-01 | 7,307,931 |
| 2005-07-01 | 8,337,998 |
| 2006-07-01 | 7,945,564 |
| 2007-07-01 | 6,426,220 |
| 2008-07-01 | 7,643,987 |
| 2009-07-01 | 7,713,260 |
| 2010-07-01 | 770,523 |
| 2011-07-01 | 10,093,343 |
| 2012-07-01 | 8,886,851 |
| 2013-07-01 | 8,415,321 |
It looks like what was probably intended to be entered was 7,705,230. That puts it in-line with the previous July value.
So, we will fix that value:
power[which(power$date==as.Date("2010-07-01")),"kwh"] <- power[which(power$date==as.Date("2010-07-01")),"kwh"] * 10
Now our data looks a lot better:
power %>% ggplot(aes(x=date,y=kwh)) + geom_line() +
theme_light() +
labs(title="Residential Power Usage",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
Now we can begin our modeling. First, we turn the data into a formal time series:
power <- ts(power$kwh, frequency = 12, start=c(1998,01))
power %>% autoplot() +
theme_light() +
labs(title="Residential Power Usage",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
Obviously there is seasonality here as well as a generally upward trend. We’ll split the series into training and testing sets first:
# Split data. Use 3 years as a test set
power.train <- window(power, end=c(2010,12))
power.test <- window(power, start=c(2011,01))
Next we will see if a Box-Cox transformation is necessary to stabilize variance:
l <- BoxCox.lambda(power)
cbind(original = power, transformed = BoxCox(power, lambda=l)) %>%
autoplot(facets=T) +
theme_light() +
labs(title="Residential Power Usage",
subtitle="Training Data - Transformed",
y="kWh", x="year")
The Box-Cox transform didn’t seem to do very much, so we will proceed without it.
As far as models, we will start with a seasonal naive model:
power.snaive <- snaive(power.train, h=12, biasadj = T)
accuracy(power.snaive)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 57974.93 715450.2 558876.6 0.2900295 8.851673 1 0.2682315
The accuracy measures are reasonable as a baseline for other models we’ll fit below. Let’s look at the forecasts provided by the model:
power.snaive %>% autoplot() +
theme_light() +
labs(title="Residential Power Usage",
subtitle="Training data with Seasonal Naive predictions",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
Next we’ll try an exponential smoothing model to see how well it works in comparison:
power.ets <- ets(power.train, model="ZZZ", biasadj=T)
summary(power.ets)
## ETS(M,N,M)
##
## Call:
## ets(y = power.train, model = "ZZZ", biasadj = T)
##
## Smoothing parameters:
## alpha = 0.0561
## gamma = 1e-04
##
## Initial states:
## l = 6197932.6926
## s = 0.921 0.7451 0.8944 1.2076 1.2518 1.1998
## 0.9861 0.7702 0.814 0.9345 1.0588 1.2168
##
## sigma: 0.0881
##
## AIC AICc BIC
## 4923.378 4926.807 4969.126
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 50934.76 535570 406708 0.147575 6.413358 0.7277242 0.3239843
We see that the ets() function chose a model with multiplicative error and seasonality terms, with no trend term.
power.ets %>% autoplot()
autoplot(power.train) + autolayer(forecast(power.ets, h=12, biasadj=T)) +
theme_light() +
labs(title="Residential Power Usage",
subtitle="Training data with ETS(M,N,M) predictions",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
The ETS model seemed to do ok, as the predictions on the test data set look reasonable.
Next we will try an ARIMA model:
power.arima <- auto.arima(power.train)
summary(power.arima)
## Series: power.train
## ARIMA(0,0,5)(2,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 sar1 sar2 drift
## 0.2665 -0.0552 0.2594 -0.053 -0.1339 -0.7438 -0.4444 4818.548
## s.e. 0.0852 0.0866 0.0838 0.083 0.0820 0.0840 0.0819 2233.982
##
## sigma^2 estimated as 2.787e+11: log likelihood=-2102.32
## AIC=4222.64 AICc=4223.98 BIC=4249.37
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -9898.183 492916.6 374398.5 -0.6403311 6.032486 0.6699127
## ACF1
## Training set -0.002832901
Interestingly, the auto.arima() function chose an ARIMA\((0,0,5)(2,1,0)_{[12]}\) model with drift. Looking at the training set accuracy measures, both RMSE and MASE were better than the ETS model above.
Let’s look at the residuals of this ARIMA model:
checkresiduals(power.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,5)(2,1,0)[12] with drift
## Q* = 11, df = 16, p-value = 0.8095
##
## Model df: 8. Total lags used: 24
The plot of the residuals is relatively close to normal. The ACF plot shows no significant autocorrelation (with one small exception on lag 35). The Ljung-Box test has a high p-value, indicating that the residuals are uncorrelated.
Let’s look at the predictions we get from the training data:
power.pred <- forecast(power.arima, h=12, biasadj=T)
power.pred %>% autoplot() +
theme_light() +
labs(title="Residential Power Usage",
subtitle="Training data with ARIMA predictions",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))
The predictions look good and have smaller estimated prediction intervals than the ETS model did.
Now, we will compare accuracy measures when these two models are applied to the test data (which was not used in training the models):
# Check test set accuracy measures
testMeasures <- cbind(method = "Seasonal Naive",
data.frame(as.list(
accuracy(power.snaive, power.test)["Test set",])
)
)
power.pred.ets <- predict(power.ets,h=36, biasadj=T)
testMeasures <- rbind(testMeasures,
cbind(method = "ETS",
data.frame(as.list(
accuracy(power.pred.ets, power.test)["Test set",])
)
))
power.pred.arima <- forecast(power.arima,h=36,biasadj=T)
testMeasures <- rbind(testMeasures,
cbind(method = "ARIMA",
data.frame(
as.list(
accuracy(power.pred.arima, power.test)["Test set",]
)
)))
testMeasures %>% select(method, RMSE, MASE) %>%
kable() %>% kable_styling(bootstrap_options = "striped", full_width = F)
| method | RMSE | MASE |
|---|---|---|
| Seasonal Naive | 1398107 | 2.111974 |
| ETS | 1205701 | 1.667069 |
| ARIMA | 1058430 | 1.392549 |
The accuracy measures on the test set tell us that the ARIMA model outperforms the other two.
Thus we will use the ARIMA model for our forecast of 2014:
power.forecast <- forecast(power, model = power.arima, h=12, biasadj = T)
power.forecast %>%
kable() %>% kable_styling(bootstrap_options = "striped")
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| Jan 2014 | 10051232 | 9374678 | 10727786 | 9016532 | 11085932 |
| Feb 2014 | 8353735 | 7653568 | 9053901 | 7282922 | 9424547 |
| Mar 2014 | 7136993 | 6435833 | 7838154 | 6064662 | 8209325 |
| Apr 2014 | 5728649 | 5005862 | 6451435 | 4623242 | 6834055 |
| May 2014 | 5488768 | 4765092 | 6212445 | 4382001 | 6595536 |
| Jun 2014 | 8101105 | 7371782 | 8830428 | 6985701 | 9216509 |
| Jul 2014 | 9428686 | 8699363 | 10158010 | 8313282 | 10544090 |
| Aug 2014 | 9911712 | 9182388 | 10641035 | 8796308 | 11027115 |
| Sep 2014 | 8405698 | 7676375 | 9135021 | 7290294 | 9521102 |
| Oct 2014 | 5762170 | 5032846 | 6491493 | 4646766 | 6877574 |
| Nov 2014 | 6055582 | 5326258 | 6784905 | 4940178 | 7170985 |
| Dec 2014 | 8243222 | 7513898 | 8972545 | 7127818 | 9358625 |
# Output point forecasts
output <- cbind(date = seq(from=as.Date("2014-01-01"),
to=as.Date("2014-12-01"),by="month"),
point_forecast = as.data.frame(power.forecast$mean))
write.xlsx(output,"forecast_Power.xlsx")
autoplot(power.forecast, PI=T) +
theme_light() +
labs(title="Residential Power Usage",
subtitle="Forecasts using ARIMA(0,0,5)(2,1,0)[12] w/ drift",
y="kWh", x="year") +
scale_y_continuous(labels=number_format(scale=10^-6, suffix = "M"))