library(forecast)
library(fpp3)
library(ggplot2)
library(knitr)
library(openxlsx)
library(readxl)
library(seasonal)
library(tidyverse)
library(VIM)
Part A – ATM Forecast, ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file. Also please submit the forecast which you will put in an Excel readable file.
We’ve been tasked with analyzing historical ATM data and forecasting what the next month of transactions could look like. As with any good analysis, we must start off by getting familiar with the data. Afterwards we address any issues we may have found in the data that could impede a smooth analysis. Then, once prepared, we begin testing different forecast models to try and see which one fits best. Finally, we use the best fit model to forecast the solicited data and present it to our new boss.
We begin by importing and loading our desired data set and running some summary functions to get familiar with it. We then make any necessary fixes and transformations to prepare it for analysis.
atm_raw <- read.csv("https://raw.githubusercontent.com/Stevee-G/Data624/refs/heads/main/Project1/ATMData.csv")
str(atm_raw)
## 'data.frame': 1474 obs. of 3 variables:
## $ DATE: chr "5/1/2009 12:00:00 AM" "5/1/2009 12:00:00 AM" "5/2/2009 12:00:00 AM" "5/2/2009 12:00:00 AM" ...
## $ ATM : chr "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int 96 107 82 89 85 90 90 55 99 79 ...
head(atm_raw)
## DATE ATM Cash
## 1 5/1/2009 12:00:00 AM ATM1 96
## 2 5/1/2009 12:00:00 AM ATM2 107
## 3 5/2/2009 12:00:00 AM ATM1 82
## 4 5/2/2009 12:00:00 AM ATM2 89
## 5 5/3/2009 12:00:00 AM ATM1 85
## 6 5/3/2009 12:00:00 AM ATM2 90
summary(atm_raw)
## DATE ATM Cash
## Length:1474 Length:1474 Min. : 0.0
## Class :character Class :character 1st Qu.: 0.5
## Mode :character Mode :character Median : 73.0
## Mean : 155.6
## 3rd Qu.: 114.0
## Max. :10920.0
## NA's :19
atm_raw %>%
filter(is.na(Cash))
## DATE ATM Cash
## 1 6/13/2009 12:00:00 AM ATM1 NA
## 2 6/16/2009 12:00:00 AM ATM1 NA
## 3 6/18/2009 12:00:00 AM ATM2 NA
## 4 6/22/2009 12:00:00 AM ATM1 NA
## 5 6/24/2009 12:00:00 AM ATM2 NA
## 6 5/1/2010 12:00:00 AM NA
## 7 5/2/2010 12:00:00 AM NA
## 8 5/3/2010 12:00:00 AM NA
## 9 5/4/2010 12:00:00 AM NA
## 10 5/5/2010 12:00:00 AM NA
## 11 5/6/2010 12:00:00 AM NA
## 12 5/7/2010 12:00:00 AM NA
## 13 5/8/2010 12:00:00 AM NA
## 14 5/9/2010 12:00:00 AM NA
## 15 5/10/2010 12:00:00 AM NA
## 16 5/11/2010 12:00:00 AM NA
## 17 5/12/2010 12:00:00 AM NA
## 18 5/13/2010 12:00:00 AM NA
## 19 5/14/2010 12:00:00 AM NA
As can be seen above, there are a few issues already. The
DATE variable is in character format, rendering useless as
a time series, and there are both missing values and blank rows. Below,
we address this.
atm_raw$DATE <- str_extract(atm_raw$DATE, "^[^ ]+") %>%
as.Date(format = "%m/%d/%Y")
str(atm_raw)
## 'data.frame': 1474 obs. of 3 variables:
## $ DATE: Date, format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int 96 107 82 89 85 90 90 55 99 79 ...
atm_filtered <- atm_raw %>%
filter(ATM != "")
atm_wider <- atm_filtered %>%
pivot_wider(names_from = ATM,
values_from = Cash)
str(atm_wider)
## tibble [365 × 5] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:365], format: "2009-05-01" "2009-05-02" ...
## $ ATM1: int [1:365] 96 82 85 90 99 88 8 104 87 93 ...
## $ ATM2: int [1:365] 107 89 90 55 79 19 2 103 107 118 ...
## $ ATM3: int [1:365] 0 0 0 0 0 0 0 0 0 0 ...
## $ ATM4: int [1:365] 777 524 793 908 53 52 55 559 904 879 ...
atm_imputed <- kNN(atm_wider, k = 5) %>%
select(c(1:5))
sum(is.na(atm_imputed))
## [1] 0
summary(atm_imputed)
## DATE ATM1 ATM2 ATM3
## Min. :2009-05-01 Min. : 1.00 Min. : 0.00 Min. : 0.0000
## 1st Qu.:2009-07-31 1st Qu.: 73.00 1st Qu.: 25.00 1st Qu.: 0.0000
## Median :2009-10-30 Median : 91.00 Median : 66.00 Median : 0.0000
## Mean :2009-10-30 Mean : 83.99 Mean : 62.32 Mean : 0.7205
## 3rd Qu.:2010-01-29 3rd Qu.:108.00 3rd Qu.: 93.00 3rd Qu.: 0.0000
## Max. :2010-04-30 Max. :180.00 Max. :147.00 Max. :96.0000
## ATM4
## Min. : 2
## 1st Qu.: 124
## Median : 404
## Mean : 474
## 3rd Qu.: 705
## Max. :10920
ggplot(atm_imputed %>%
pivot_longer(cols = c(ATM1, ATM2, ATM3, ATM4),
names_to = "ATM",
values_to = "Cash"),
aes(x = DATE, y = Cash, color = ATM)) +
geom_line() +
labs(title = "ATM Cashflow")
As shown above, we addressed the DATE variable format
conflict by transforming it into an official date variable. We then
filtered out the rows with blank ATM entries as these
cannot be used for analysis. Afterwards, we pivot the data frame wider
in order to have all cash values correspond to their respective ATM. We
then impute the remaining empty Cash cells by using the
K-Nearest Neighbor method looking towards up to 5 neighbor cells. The
summary and plot of the outcome can be seen above.
atm1 <- select(atm_imputed, c(DATE, ATM1)) %>%
as_tsibble(index = DATE)
head(atm1)
## # A tsibble: 6 x 2 [1D]
## DATE ATM1
## <date> <int>
## 1 2009-05-01 96
## 2 2009-05-02 82
## 3 2009-05-03 85
## 4 2009-05-04 90
## 5 2009-05-05 99
## 6 2009-05-06 88
autoplot(atm1) +
labs(title = "ATM1 Cashflow")
atm2 <- select(atm_imputed, c(DATE, ATM2)) %>%
as_tsibble(index = DATE)
head(atm2)
## # A tsibble: 6 x 2 [1D]
## DATE ATM2
## <date> <int>
## 1 2009-05-01 107
## 2 2009-05-02 89
## 3 2009-05-03 90
## 4 2009-05-04 55
## 5 2009-05-05 79
## 6 2009-05-06 19
autoplot(atm2) +
labs(title = "ATM2 Cashflow")
atm3 <- select(atm_imputed, c(DATE, ATM3)) %>%
as_tsibble(index = DATE)
head(atm3)
## # A tsibble: 6 x 2 [1D]
## DATE ATM3
## <date> <int>
## 1 2009-05-01 0
## 2 2009-05-02 0
## 3 2009-05-03 0
## 4 2009-05-04 0
## 5 2009-05-05 0
## 6 2009-05-06 0
autoplot(atm3) +
labs(title = "ATM3 Cashflow")
atm4 <- select(atm_imputed, c(DATE, ATM4)) %>%
as_tsibble(index = DATE)
head(atm4)
## # A tsibble: 6 x 2 [1D]
## DATE ATM4
## <date> <int>
## 1 2009-05-01 777
## 2 2009-05-02 524
## 3 2009-05-03 793
## 4 2009-05-04 908
## 5 2009-05-05 53
## 6 2009-05-06 52
autoplot(atm4) +
labs(title = "ATM4 Cashflow")
Above we prepared individual time series tables, one for each ATM, in order to better analyze the patterns of each. Their respective plots are also displayed.
gg_tsdisplay(atm1)
lambda_atm1 <- atm1 %>%
features(ATM1, features = guerrero) %>%
pull(lambda_guerrero)
lambda_atm1
## [1] 0.2395458
afit_atm1 <- auto.arima(atm1, seasonal = TRUE)
afit_atm1
## Series: atm1
## ARIMA(0,0,1)(0,1,2)[7]
##
## Coefficients:
## ma1 sma1 sma2
## 0.1671 -0.5893 -0.0962
## s.e. 0.0558 0.0508 0.0497
##
## sigma^2 = 564.3: log likelihood = -1642.57
## AIC=3293.14 AICc=3293.25 BIC=3308.66
checkresiduals(afit_atm1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,2)[7]
## Q* = 19.089, df = 11, p-value = 0.05951
##
## Model df: 3. Total lags used: 14
tfit_atm1 <- Arima(atm1$ATM1, order = c(0, 0, 1), seasonal = c(0, 1, 2), lambda = lambda_atm1)
fc_atm1 <- tfit_atm1 %>%
forecast(h = 31)
autoplot(fc_atm1) +
labs(title = "ATM1 ARIMA Forecast")
mam_fit_a1 <- atm1 %>%
model(MAM = ETS(box_cox(ATM1, lambda_atm1) ~ error('M') + trend('A') + season('M')))
mam_fc_a1 <- mam_fit_a1 %>%
forecast(h = 31)
mam_fc_a1 %>%
autoplot(atm1) +
labs(title = "ATM1 ETS Forecast")
ATM1 displays seasonal behavior and does not contain any
abnormalities in its entries. Above we get a sense of its distribution,
variation, and residuals using the gg_tsdisplay() function.
We start off the analysis by calculating both lambda and ARIMA fit
values in order to understand which direction we should go in. As can be
seen, we were suggested an ARIMA (0,0,1)(0,1,2) model. Using what we
know, we went ahead and performed the forecast and obtained the results
shown. Unfortunately, the ARIMA model did not provide such an ideal
forecast. For this reason, we took a second approach and looked towards
ETS modeling. Given the features of the time series, an MAM ETS model
was fit and used to forecast. The results can be seen directly the
results for the ARIMA model.
gg_tsdisplay(atm2)
lambda_atm2 <- atm2 %>%
features(ATM2, features = guerrero) %>%
pull(lambda_guerrero)
lambda_atm2
## [1] 0.6691865
afit_atm2 <- auto.arima(atm2)
afit_atm2
## Series: atm2
## ARIMA(2,0,2)(0,1,1)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4237 -0.9118 0.4594 0.7985 -0.7488
## s.e. 0.0548 0.0423 0.0849 0.0576 0.0388
##
## sigma^2 = 585.9: log likelihood = -1648.6
## AIC=3309.2 AICc=3309.44 BIC=3332.49
checkresiduals(afit_atm2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7]
## Q* = 8.9793, df = 9, p-value = 0.4392
##
## Model df: 5. Total lags used: 14
tfit_atm2 <- Arima(atm2$ATM2, order = c(2, 0, 2), seasonal = c(0, 1, 1), lambda = lambda_atm2)
fc_atm2 <- tfit_atm2 %>%
forecast(h = 31)
autoplot(fc_atm2) +
labs(title = "ATM2 ARIMA Forecast")
madm_fit_a2 <- atm2 %>%
model(MAdM = ETS(box_cox(ATM2, lambda_atm2) ~ error('M') + trend('Ad') + season('M')))
madm_fc_a2 <- madm_fit_a2 %>%
forecast(h = 31)
madm_fc_a2 %>%
autoplot(atm2) +
labs(title = "ATM2 ETS Forecast")
ATM2 also displays seasonal behavior. However, its variability also
seems to dip as time goes on. For this reason, it was especially
necessary to consider the appropriate lambda to address this decrease in
variance. As with ATM1, we get a sense of its distribution, variation,
and residuals using the gg_tsdisplay() function. Just like
before, we again calculate both lambda and ARIMA fit values in order to
understand which direction we should go in. As can be seen, we were
suggested an ARIMA (2,0,2)(0,1,1) model. Using this, we went ahead and
performed the forecast and obtained the results shown. Sadly, yet again,
the ARIMA model did not provide such an ideal forecast. Therefore, we
once again looked towards ETS modeling. Given the features of the time
series, an MAdM ETS model was fit and used to forecast. The results can
be seen directly the results for the ARIMA model.
gg_tsdisplay(atm3)
lambda_atm3 <- atm3 %>%
features(ATM3, features = guerrero) %>%
pull(lambda_guerrero)
lambda_atm3
## [1] 1.365421
afit_atm3 <- auto.arima(atm3)
afit_atm3
## Series: atm3
## ARIMA(0,0,2) with zero mean
##
## Coefficients:
## ma1 ma2
## 0.8392 0.8557
## s.e. 0.0496 0.0611
##
## sigma^2 = 25.4: log likelihood = -1108.69
## AIC=2223.39 AICc=2223.46 BIC=2235.09
checkresiduals(afit_atm3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2) with zero mean
## Q* = 0.1322, df = 12, p-value = 1
##
## Model df: 2. Total lags used: 14
tfit_atm3 <- Arima(atm3$ATM3, order = c(2, 0, 2), include.mean = FALSE, lambda = lambda_atm3)
fc_atm3 <- tfit_atm3 %>%
forecast(h = 31)
autoplot(fc_atm3) +
labs(title = "ATM3 ARIMA Forecast")
ses_fit_a3 <- atm3 %>%
model(SES = ETS(box_cox(ATM3, lambda_atm3) ~ error('A') + trend('N') + season('N')))
ses_fc_a3 <- ses_fit_a3 %>%
forecast(h = 31)
ses_fc_a3 %>%
autoplot(atm3) +
labs(title = "ATM3 ETS Forecast")
With ATM3, we will continue what we’ve already done with ATM1 and
ATM2. ATM3 does not show any seasonal behavior or anything we could use
to identify a pattern since the majority of Cash values are
0 up until the very last 3 entries. Seeing this would suggest that we
could get away with just performing a NAIVE or SNAIVE model analysis
with the final entries, but we will continue to approach as previously.
As with ATM1, we get a sense of its distribution, variation, and
residuals using the gg_tsdisplay() function. Just like
before, we again calculate both lambda and ARIMA fit values in order to
understand which direction we should go in. As can be seen, we were
suggested an ARIMA (0,0,2) model with zero mean. Using this, we went
ahead and performed the forecast and obtained the results shown. Once
again the ARIMA model did not provide a good forecast so we fitted an
ETS model. Given the features of the time series, an SES ETS model was
fit and used to forecast. The results can be seen directly the results
for the ARIMA model.
atm4 %>%
sort_by(atm4$ATM4, decreasing = TRUE)
## # A tsibble: 365 x 2 [1D]
## DATE ATM4
## <date> <int>
## 1 2010-02-09 10920
## 2 2009-09-22 1712
## 3 2010-01-29 1575
## 4 2009-09-04 1495
## 5 2009-06-09 1484
## 6 2009-09-05 1301
## 7 2010-02-28 1276
## 8 2009-08-23 1246
## 9 2009-06-14 1221
## 10 2009-09-29 1195
## # ℹ 355 more rows
atm4$ATM4[which.max(atm4$ATM4)] <- median(atm4$ATM4, na.rm = TRUE)
atm4 %>%
sort_by(atm4$ATM4, decreasing = TRUE)
## # A tsibble: 365 x 2 [1D]
## DATE ATM4
## <date> <int>
## 1 2009-09-22 1712
## 2 2010-01-29 1575
## 3 2009-09-04 1495
## 4 2009-06-09 1484
## 5 2009-09-05 1301
## 6 2010-02-28 1276
## 7 2009-08-23 1246
## 8 2009-06-14 1221
## 9 2009-09-29 1195
## 10 2009-06-12 1191
## # ℹ 355 more rows
gg_tsdisplay(atm4)
lambda_atm4 <- atm4 %>%
features(ATM4, features = guerrero) %>%
pull(lambda_guerrero)
lambda_atm4
## [1] 0.4524373
afit_atm4 <- auto.arima(atm4)
afit_atm4
## Series: atm4
## ARIMA(0,0,3)(1,0,0)[7] with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 sar1 mean
## 0.0875 0.0162 0.0886 0.1838 444.5817
## s.e. 0.0532 0.0583 0.0516 0.0526 26.1028
##
## sigma^2 = 119373: log likelihood = -2648.96
## AIC=5309.92 AICc=5310.15 BIC=5333.32
checkresiduals(afit_atm4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,3)(1,0,0)[7] with non-zero mean
## Q* = 16.207, df = 10, p-value = 0.09385
##
## Model df: 4. Total lags used: 14
tfit_atm4 <- Arima(atm4$ATM4, order = c(0, 0, 3), seasonal = c(1, 0, 0), include.mean = TRUE, lambda = lambda_atm4)
fc_atm4 <- tfit_atm4 %>%
forecast(h = 31)
autoplot(fc_atm4) +
labs(title = "ATM4 ARIMA Forecast")
mam_fit_a4 <- atm4 %>%
model(MAM = ETS(box_cox(ATM4, lambda_atm4) ~ error('M') + trend('A') + season('M')))
mam_fc_a4 <- mam_fit_a4 %>%
forecast(h = 31)
mam_fc_a4 %>%
autoplot(atm4) +
labs(title = "ATM4 ETS Forecast")
ATM4 also displays seasonal behavior. However, it seems to contain an
overtly obvious outlier. We deal with this be first identifying how many
outliers there are. Doing this we can see there is only one. We addrress
this outlier by replacing it with the mean for the data set. We once
again check to see if it worked and everything looked fine. We continue
on to get a sense of its distribution, variation, and residuals using
the gg_tsdisplay() function. Just like before, we calculate
both lambda and ARIMA fit values in order to understand which direction
we should go in. As can be seen, we were suggested an ARIMA
(0,0,3)(1,0,0) model with non-zero mean. Using this, we went ahead and
performed the forecast and obtained the results shown. Once again the
ARIMA model did not provide a good forecast so we fitted an ETS model.
Given the features of the time series, an MAM ETS model was fit and used
to forecast. The results can be seen directly the results for the ARIMA
model.
data_frame(DATE = seq(as.Date("2010-05-01"), by ="day", length.out = 31),
ATM1 = mam_fc_a1$.mean,
ATM2 = madm_fc_a2$.mean,
ATM3 = ses_fc_a3$.mean,
ATM4 = mam_fc_a4$.mean**2) %>%
gather(key = ATM, value = Cash, -DATE) %>%
write.xlsx("ATM_Forecasts.xlsx")
Finally, we take the ETS forecast results and export them as one table to an excel spreadsheet.
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
For this portion, we’ve been tasked with analyzing historical residential power data and forecasting what the next year of usage could look like. Just as with the previous analysis, we start off by getting familiar with the data. We then address any issues we may have found in the data that could impede a smooth analysis. Then, we begin testing different forecast models to try and see which one fits best. Finally, we use the best fit model to forecast the solicited data and present it to our new boss.
Once again, we begin by importing and loading our desired data set and running some summary functions to get familiar with it. We then make any necessary fixes and transformations to prepare it for analysis.
pwr_raw <- read.csv("https://raw.githubusercontent.com/Stevee-G/Data624/refs/heads/main/Project1/PowerData.csv")
str(pwr_raw)
## 'data.frame': 192 obs. of 3 variables:
## $ CaseSequence: int 733 734 735 736 737 738 739 740 741 742 ...
## $ YYYY.MMM : chr "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
## $ KWH : int 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428 6989888 6345620 ...
head(pwr_raw)
## CaseSequence YYYY.MMM KWH
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
summary(pwr_raw)
## CaseSequence YYYY.MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
pwr_raw %>%
filter(is.na(KWH))
## CaseSequence YYYY.MMM KWH
## 1 861 2008-Sep NA
As can be seen above, the 2 main issues with this data set are its
poorly formatted date variable, YYYY.MMM, and one missing
value. We make sure to address these discrepancies below.
pwr_raw$YYYY.MMM <- yearmonth(pwr_raw$YYYY.MMM)
str(pwr_raw)
## 'data.frame': 192 obs. of 3 variables:
## $ CaseSequence: int 733 734 735 736 737 738 739 740 741 742 ...
## $ YYYY.MMM : mth [1:192] 1998 Jan, 1998 Feb, 1998 Mar, 1998 Apr, 1998 May, 1998 Jun...
## $ KWH : int 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428 6989888 6345620 ...
pwr_imputed <- kNN(pwr_raw, k = 5) %>%
select(c(1:3))
sum(is.na(pwr_imputed))
## [1] 0
head(pwr_imputed)
## CaseSequence YYYY.MMM KWH
## 1 733 1998 Jan 6862583
## 2 734 1998 Feb 5838198
## 3 735 1998 Mar 5420658
## 4 736 1998 Apr 5010364
## 5 737 1998 May 4665377
## 6 738 1998 Jun 6467147
ggplot(pwr_imputed, aes(x = YYYY.MMM, y = KWH)) +
geom_line() +
labs(title = "KWH per Month")
As shown above, we addressed the YYYY.MMM variable
formatting by transforming it into an official date variable using the
yearmonth function. We then imputed the empty
KWH cell using the K-Nearest Neighbor method looking
towards up to 5 neighbor cells. A glimpse and plot of the outcome can be
seen above.
power <- pwr_imputed %>%
select(-CaseSequence) %>%
as_tsibble(index = YYYY.MMM)
head(power)
## # A tsibble: 6 x 2 [1M]
## YYYY.MMM KWH
## <mth> <int>
## 1 1998 Jan 6862583
## 2 1998 Feb 5838198
## 3 1998 Mar 5420658
## 4 1998 Apr 5010364
## 5 1998 May 4665377
## 6 1998 Jun 6467147
autoplot(power) +
labs(title = "KWH per Month")
Above we performed a simple tsibble transformation to
format the data set into an analyzable time series.
gg_tsdisplay(power)
lambda_p <- power %>%
features(KWH, features = guerrero) %>%
pull(lambda_guerrero)
lambda_p
## [1] 0.107132
afit_p <- auto.arima(power, seasonal = TRUE)
afit_p
## Series: power
## ARIMA(0,0,1)(1,1,1)[12] with drift
##
## Coefficients:
## ma1 sar1 sma1 drift
## 0.2440 -0.1519 -0.7310 7614.784
## s.e. 0.0723 0.0963 0.0821 1954.202
##
## sigma^2 = 7.468e+11: log likelihood = -2719.93
## AIC=5449.85 AICc=5450.2 BIC=5465.82
checkresiduals(afit_p)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(1,1,1)[12] with drift
## Q* = 14.261, df = 21, p-value = 0.8581
##
## Model df: 3. Total lags used: 24
tfit_p <- Arima(power$KWH, order = c(0, 0, 1), seasonal = c(1, 1, 1), lambda = lambda_p)
fc_p <- tfit_p %>%
forecast(h = 12)
autoplot(fc_p) +
labs(title = "KWH ARIMA Forecast")
mam_fit_p <- power %>%
model(MAM = ETS(box_cox(KWH, lambda_p) ~ error('M') + trend('A') + season('M')))
mam_fc_p <- mam_fit_p %>%
forecast(h = 12)
mam_fc_p %>%
autoplot(power) +
labs(title = "KWH ETS Forecast")
The power usage data displays seasonal behavior and does not contain
any abnormalities in its entries. Above we get a sense of its
distribution, variation, and residuals using the
gg_tsdisplay() function. We start off the analysis by
calculating both lambda and ARIMA fit values in order to understand
which direction we should go in. As can be seen, we were suggested an
ARIMA (0,0,1)(0,1,1) model. Using what we know, we went ahead and
performed the forecast and obtained the results shown. Unfortunately,
and yet again, the ARIMA model did not provide such an ideal forecast.
Therefore, we once again looked towards ETS modeling. Given the features
of the time series, an MAM ETS model was fit and used to forecast. The
results can be seen directly the results for the ARIMA model.
data_frame(`YYYY.MMM` = paste0(2014, "-", month.abb), KWH = mam_fc_p$.mean) %>%
write.xlsx("Power_KWH_Forecasts.xlsx")
Finally, we take the ETS forecast result and export it to an excel spreadsheet.