This project consists of 3 parts - two required and one bonus and is worth 15% of your grade. The project is due at 11:59 PM on Sunday Oct 25. I will accept late submissions with a penalty until the meetup after that when we review some projects. Part A – ATM Forecast, ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
Part C – BONUS, optional (part or all), Waterflow_Pipe1.xlsx and Waterflow_Pipe2.xlsx
Part C consists of two data sets. These are simple 2 columns sets, however they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (example of what this looks like, follows). Note for multiple recordings within an hour, take the mean. Then to determine if the data is stationary and can it be forecast. If so, provide a week forward forecast and present results via Rpubs and .rmd and the forecast in an Excel readable file.
library(readxl)
library(fpp3)
library(forecast)
library(ggplot2)
library(imputeTS)
library(readr)
library(summarytools)
library(tsibbledata)
library(lubridate)
library(DataExplorer)
library(ggfortify)
library(stats)
library(latex2exp)
library(aTSA)
library(zoo)
#read in data and specify column types
atm_raw <- read_xlsx("ATM624Data.xlsx", col_types = c("date", "text", "numeric"))
dfSummary(atm_raw,
plain.ascii = FALSE,
style = "grid",
graph.magnif = 0.75,
valid.col = FALSE,
tmp.img.dir = "/tmp")
## ### Data Frame Summary
## #### atm_raw
## **Dimensions:** 1474 x 3
## **Duplicates:** 0
##
## +----+-------------------+----------------------------+---------------------+----------------------+---------+
## | No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing |
## +====+===================+============================+=====================+======================+=========+
## | 1 | DATE\ | min : 2009-05-01\ | 379 distinct values |  | 0\ |
## | | [POSIXct, POSIXt] | med : 2009-11-01\ | | | (0.0%) |
## | | | max : 2010-05-14\ | | | |
## | | | range : 1y 0m 13d | | | |
## +----+-------------------+----------------------------+---------------------+----------------------+---------+
## | 2 | ATM\ | 1\. ATM1\ | 365 (25.0%)\ |  | 14\ |
## | | [character] | 2\. ATM2\ | 365 (25.0%)\ | | (0.9%) |
## | | | 3\. ATM3\ | 365 (25.0%)\ | | |
## | | | 4\. ATM4 | 365 (25.0%) | | |
## +----+-------------------+----------------------------+---------------------+----------------------+---------+
## | 3 | Cash\ | Mean (sd) : 155.6 (376.5)\ | 509 distinct values |  | 19\ |
## | | [numeric] | min < med < max:\ | | | (1.3%) |
## | | | 0 < 73 < 10919.8\ | | | |
## | | | IQR (CV) : 113.5 (2.4) | | | |
## +----+-------------------+----------------------------+---------------------+----------------------+---------+
head(atm_raw)
## # A tibble: 6 × 3
## DATE ATM Cash
## <dttm> <chr> <dbl>
## 1 2009-05-01 00:00:00 ATM1 96
## 2 2009-05-01 00:00:00 ATM2 107
## 3 2009-05-02 00:00:00 ATM1 82
## 4 2009-05-02 00:00:00 ATM2 89
## 5 2009-05-03 00:00:00 ATM1 85
## 6 2009-05-03 00:00:00 ATM2 90
#filter raw data by ATM
atm_raw %>%
filter(ATM == "ATM1")%>%
head(10)
## # A tibble: 10 × 3
## DATE ATM Cash
## <dttm> <chr> <dbl>
## 1 2009-05-01 00:00:00 ATM1 96
## 2 2009-05-02 00:00:00 ATM1 82
## 3 2009-05-03 00:00:00 ATM1 85
## 4 2009-05-04 00:00:00 ATM1 90
## 5 2009-05-05 00:00:00 ATM1 99
## 6 2009-05-06 00:00:00 ATM1 88
## 7 2009-05-07 00:00:00 ATM1 8
## 8 2009-05-08 00:00:00 ATM1 104
## 9 2009-05-09 00:00:00 ATM1 87
## 10 2009-05-10 00:00:00 ATM1 93
Let’s look at the correlations
atm_raw$DATE = ymd(as.Date(atm_raw$DATE, origin = "1899-12-30"))
atm_ts = as_tsibble(tibble(atm_raw),index='DATE',key='ATM')
plot_correlation(atm_raw)
ATM2
#filter raw data by ATM
atm_raw %>%
filter(ATM == "ATM2")%>%
head(10)
## # A tibble: 10 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-05-01 ATM2 107
## 2 2009-05-02 ATM2 89
## 3 2009-05-03 ATM2 90
## 4 2009-05-04 ATM2 55
## 5 2009-05-05 ATM2 79
## 6 2009-05-06 ATM2 19
## 7 2009-05-07 ATM2 2
## 8 2009-05-08 ATM2 103
## 9 2009-05-09 ATM2 107
## 10 2009-05-10 ATM2 118
ATM3
#filter raw data by ATM
atm_raw %>%
filter(ATM == "ATM3")%>%
head(10)
## # A tibble: 10 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-05-01 ATM3 0
## 2 2009-05-02 ATM3 0
## 3 2009-05-03 ATM3 0
## 4 2009-05-04 ATM3 0
## 5 2009-05-05 ATM3 0
## 6 2009-05-06 ATM3 0
## 7 2009-05-07 ATM3 0
## 8 2009-05-08 ATM3 0
## 9 2009-05-09 ATM3 0
## 10 2009-05-10 ATM3 0
ATM4
#filter raw data by ATM
atm_raw %>%
filter(ATM == "ATM4")%>%
head(10)
## # A tibble: 10 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-05-01 ATM4 777.
## 2 2009-05-02 ATM4 524.
## 3 2009-05-03 ATM4 793.
## 4 2009-05-04 ATM4 908.
## 5 2009-05-05 ATM4 52.8
## 6 2009-05-06 ATM4 52.2
## 7 2009-05-07 ATM4 55.5
## 8 2009-05-08 ATM4 559.
## 9 2009-05-09 ATM4 904.
## 10 2009-05-10 ATM4 879.
ATM4 Contains values with decimals.
Let’s visualize the cash withdrawals
atm_raw %>%
filter(DATE < "2010-05-01", !is.na(ATM)) %>%
ggplot(aes(x = Cash)) +
geom_histogram(bins = 30) +
facet_wrap(~ ATM, ncol = 2, scales = "free") +
labs(title = "ATM Withdrawals") +
scale_y_continuous("Withdrawals (hundreds)")
atm_raw %>%
filter(DATE < "2010-05-01", !is.na(ATM)) %>%
ggplot(aes(x = DATE, y = Cash, col = ATM)) +
geom_line(show.legend = FALSE) +
facet_wrap(~ ATM, ncol = 1, scales = "free_y") +
labs(title = "ATM Withdrawals", x = "Date") +
scale_y_continuous("Withdrawals (hundreds)")
atm <- atm_raw %>%
#lowercase column names
rename(date=DATE, atm=ATM, cash=Cash) %>%
#POSIXct to date type , cash = cash*100
mutate(date = as.Date(date)) %>%
#convert from long to wide by expanding the atm column to get individual column for each atm
pivot_wider(names_from=atm, values_from = cash) %>%
#remove the NA column
select(-"NA") %>%
#filter out what we will be forecasting
filter(date < "2010-05-01") %>%
as_tsibble(index=date)
head(atm)
## # A tsibble: 6 x 5 [1D]
## date ATM1 ATM2 ATM3 ATM4
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2009-05-01 96 107 0 777.
## 2 2009-05-02 82 89 0 524.
## 3 2009-05-03 85 90 0 793.
## 4 2009-05-04 90 55 0 908.
## 5 2009-05-05 99 79 0 52.8
## 6 2009-05-06 88 19 0 52.2
NA value
apply(is.na(atm),2,sum)
## date ATM1 ATM2 ATM3 ATM4
## 0 3 2 0 0
atm[!complete.cases(atm), ]
## # A tsibble: 5 x 5 [1D]
## date ATM1 ATM2 ATM3 ATM4
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2009-06-13 NA 91 0 746.
## 2 2009-06-16 NA 82 0 373.
## 3 2009-06-18 21 NA 0 92.5
## 4 2009-06-22 NA 90 0 80.6
## 5 2009-06-24 66 NA 0 90.6
There are only 5 NA values. 3 from ATM1 and 2 from ATM2
plot_boxplot(atm_ts,by="ATM")
Summary statistics
summary(atm)
## date ATM1 ATM2 ATM3
## Min. :2009-05-01 Min. : 1.00 Min. : 0.00 Min. : 0.0000
## 1st Qu.:2009-07-31 1st Qu.: 73.00 1st Qu.: 25.50 1st Qu.: 0.0000
## Median :2009-10-30 Median : 91.00 Median : 67.00 Median : 0.0000
## Mean :2009-10-30 Mean : 83.89 Mean : 62.58 Mean : 0.7206
## 3rd Qu.:2010-01-29 3rd Qu.:108.00 3rd Qu.: 93.00 3rd Qu.: 0.0000
## Max. :2010-04-30 Max. :180.00 Max. :147.00 Max. :96.0000
## NA's :3 NA's :2
## ATM4
## Min. : 1.563
## 1st Qu.: 124.334
## Median : 403.839
## Mean : 474.043
## 3rd Qu.: 704.507
## Max. :10919.762
##
ATM1: Contains 3 NA values. Seasonal pattern detected with no trend where the time series is affected by the day of the week. The distribution of the cash values are left skewed.
ATM2: Contains 2 NA values. Seasonal pattern detected with no trend where the time series is affected by the day of the week. The distribution of the cash values are left skewed.
ATM3: There are only three dates with amounts which are 04/28/2010, 04/29/2010, and 04/30/2010, all other dates have cash values of 0. Conducting a forecast for ATM3 will not produce much value due to lack of data
ATM4: ATM4 shows seasonality with no trend. The distribution of the cash values are right skewed. For the purpose of this project, the data will be rounded to get rid of the additional values suggesting odd dollar and coin amounts.
atm$ATM4 <- round(atm$ATM4, 0)
head(atm)
## # A tsibble: 6 x 5 [1D]
## date ATM1 ATM2 ATM3 ATM4
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2009-05-01 96 107 0 777
## 2 2009-05-02 82 89 0 524
## 3 2009-05-03 85 90 0 793
## 4 2009-05-04 90 55 0 908
## 5 2009-05-05 99 79 0 53
## 6 2009-05-06 88 19 0 52
Using the na.interp() function from the forecast
library, the missing values for ATM1 and ATM2 are generated through
interpolation.
#generate estimates for NAs and replace with new values
atm$ATM1 <- na.interp(atm$ATM1)
atm$ATM2 <- na.interp(atm$ATM2)
Next, using the tsoutliers() function from the forecast
library, we identify and replace the outlier skewing the data for
ATM4.
#ATM4
#identify the outlier index and generate a replacement
tsoutliers(atm$ATM4)
## $index
## [1] 285
##
## $replacements
## [1] 230
#do the replacement
atm$ATM4[285] <- tsoutliers(atm$ATM4)$replacements
As skews detected for ATMs 1 and 2, we will also use
tsoutliers() to detect any existing outliers and replace
them accordingly.
tsoutliers(atm$ATM1)
## $index
## integer(0)
##
## $replacements
## numeric(0)
tsoutliers(atm$ATM2)
## $index
## integer(0)
##
## $replacements
## numeric(0)
Let’s check if there is any NA values remaining in ATMs 1 and 2.
#verify how many NA values
apply(is.na(atm),2,sum)
## date ATM1 ATM2 ATM3 ATM4
## 0 0 0 0 0
summary(atm)
## date ATM1 ATM2 ATM3
## Min. :2009-05-01 Min. : 1.00 Min. : 0.00 Min. : 0.0000
## 1st Qu.:2009-07-31 1st Qu.: 73.00 1st Qu.: 26.00 1st Qu.: 0.0000
## Median :2009-10-30 Median : 91.00 Median : 67.00 Median : 0.0000
## Mean :2009-10-30 Mean : 84.15 Mean : 62.59 Mean : 0.7206
## 3rd Qu.:2010-01-29 3rd Qu.:108.00 3rd Qu.: 93.00 3rd Qu.: 0.0000
## Max. :2010-04-30 Max. :180.00 Max. :147.00 Max. :96.0000
## ATM4
## Min. : 2.0
## 1st Qu.: 124.0
## Median : 403.0
## Mean : 444.7
## 3rd Qu.: 704.0
## Max. :1712.0
ATM4 has a high right skew on the data which is an indicator that a transformation may be required.
ATM_1 <- atm %>%
select(date, ATM1)
ATM_1 %>%
model(
STL(ATM1 ~ trend(window = 7) +
season(window = "periodic"),
robust = TRUE)) %>%
components() %>%
autoplot()
ATM_1 %>%
ACF(ATM1, lag_max = 28) %>%
autoplot()
#gg_tsdisplay(plot_type='partial', lag_max = 28)
The ACF plot suggest lags at 2, 5, and 7 with 7 being the most significant lag. The ACF seems to be decreasing relatively slowly which could be an indicator that the data is non-stationary and could potentially benefit from differencing but additional checks are required.
Following from above and in preparation for an ARIMA model,
ndiffs() is used to determine if any differencing is
required.
ndiffs(ATM_1$ATM1)
## [1] 0
Based on the output, there is no differencing required.
Alternative models for this series are ETS and ARIMA models. An
additional model that has been included is the Auto ARIMA,
which is an automatically selected model by R with optimal values.
atm4_out = atm_ts %>% filter(ATM=='ATM4') %>% select(Cash)
atm3_out = atm_ts %>% filter(ATM=='ATM3') %>% select(Cash)
atm2_out = atm_ts %>% filter(ATM=='ATM2') %>% select(Cash)
atm1_out = atm_ts %>% filter(ATM=='ATM1') %>% select(Cash)
atm4_out$Cash[tsoutliers(atm4_out$Cash)$index] = tsoutliers(atm4_out$Cash)$replacements
clean_atm_ts = atm_ts %>% drop_na()
plot_missing(clean_atm_ts)
clean_atm_ts %>% filter(ATM=='ATM1') %>% autoplot(.vars=Cash)
clean_atm_ts %>% filter(ATM=='ATM2') %>% autoplot(.vars=Cash)
clean_atm_ts %>% filter(ATM=='ATM3') %>% autoplot(.vars=Cash)
clean_atm_ts %>% filter(ATM=='ATM4') %>% autoplot(.vars=Cash)
atm_ts %>% filter(ATM=='ATM1') %>% ACF() %>% autoplot(.vars=Cash)
atm_ts %>% filter(ATM=='ATM2') %>% ACF() %>% autoplot(.vars=Cash)
atm_ts %>% filter(ATM=='ATM3') %>% ACF() %>% autoplot(.vars=Cash)
atm_ts %>% filter(ATM=='ATM4') %>% ACF() %>% autoplot(.vars=Cash)
atm4_lambda = atm4_out %>%
features(Cash, features = guerrero) %>%
pull(lambda_guerrero)
atm4_trans = atm4_out %>% ACF(difference(box_cox(Cash,atm4_lambda),31)) %>% autoplot()
atm4_trans_data = atm4_out %>% mutate(Cash = difference(box_cox(Cash,atm4_lambda),31))
atm4_trans
atm3_lambda = atm3_out %>%
features(Cash, features = guerrero) %>%
pull(lambda_guerrero)
atm3_trans = atm3_out %>% ACF(difference(box_cox(Cash,atm3_lambda),31)) %>% autoplot()
atm3_trans_data = atm3_out %>% mutate(Cash = difference(box_cox(Cash,atm3_lambda),31))
atm3_trans
atm2_lambda = atm2_out %>%
features(Cash, features = guerrero) %>%
pull(lambda_guerrero)
atm2_trans = atm2_out %>% ACF(difference(box_cox(Cash,atm2_lambda),31)) %>% autoplot()
atm2_trans_data = atm2_out %>% mutate(Cash = difference(box_cox(Cash,atm2_lambda),31))
atm2_trans
atm1_lambda = atm1_out %>%
features(Cash, features = guerrero) %>%
pull(lambda_guerrero)
atm1_trans = atm1_out %>% ACF(difference(box_cox(Cash,atm1_lambda),31)) %>% autoplot()
atm1_trans_data = atm1_out %>% mutate(Cash = difference(box_cox(Cash,atm1_lambda),31))
atm1_trans
atm1_trans = atm1_trans_data %>% drop_na()
atm2_trans = atm2_trans_data %>% drop_na()
atm3_trans = atm3_trans_data %>% drop_na()
atm4_trans = atm4_trans_data %>% drop_na()
atmfit_arima_seach1 = atm1_trans_data %>% model(search = ARIMA(Cash,stepwise=FALSE))
atmfit_arima_seach2 = atm2_trans_data %>% model(search = ARIMA(Cash,stepwise=FALSE))
atmfit_arima_seach3 = atm3_trans_data %>% model(search = ARIMA(Cash,stepwise=FALSE))
atmfit_arima_seach4 = atm4_trans_data %>% model(search = ARIMA(Cash,stepwise=FALSE))
report(atmfit_arima_seach1)
## Series: Cash
## Model: ARIMA(0,0,0)(2,1,0)[7]
##
## Coefficients:
## sar1 sar2
## -0.5846 -0.2379
## s.e. 0.0557 0.0558
##
## sigma^2 estimated as 2.743: log likelihood=-636.12
## AIC=1278.24 AICc=1278.31 BIC=1289.88
report(atmfit_arima_seach2)
## Series: Cash
## Model: ARIMA(5,0,0)(1,1,0)[7]
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 sar1
## 0.0307 -0.1299 -0.0581 0.0208 -0.1488 -0.4586
## s.e. 0.0552 0.0556 0.0556 0.0553 0.0555 0.0511
##
## sigma^2 estimated as 128.1: log likelihood=-1257.68
## AIC=2529.36 AICc=2529.68 BIC=2556.52
report(atmfit_arima_seach3)
## Series: Cash
## Model: ARIMA(2,0,0) w/ mean
##
## Coefficients:
## ar1 ar2 constant
## 0.7955 0.1624 0.9352
## s.e. 0.0541 0.0615 1.2004
##
## sigma^2 estimated as 383.4: log likelihood=-1482.05
## AIC=2972.1 AICc=2972.21 BIC=2987.7
report(atmfit_arima_seach4)
## Series: Cash
## Model: ARIMA(0,0,0)(2,0,0)[7]
##
## Coefficients:
## sar1 sar2
## 0.2068 0.2231
## s.e. 0.0538 0.0546
##
## sigma^2 estimated as 346.6: log likelihood=-1465.07
## AIC=2936.14 AICc=2936.21 BIC=2947.84
atmfit_arima_seach1 %>% gg_tsresiduals()
atmfit_arima_seach2 %>% gg_tsresiduals()
atmfit_arima_seach3 %>% gg_tsresiduals()
atmfit_arima_seach4 %>% gg_tsresiduals()
glance(atmfit_arima_seach1)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 2.74 -636. 1278. 1278. 1290. <cpl [14]> <cpl [0]>
glance(atmfit_arima_seach2)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 128. -1258. 2529. 2530. 2557. <cpl [12]> <cpl [0]>
glance(atmfit_arima_seach3)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 383. -1482. 2972. 2972. 2988. <cpl [2]> <cpl [0]>
glance(atmfit_arima_seach4)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 347. -1465. 2936. 2936. 2948. <cpl [14]> <cpl [0]>
atm_arima_model1 = auto.arima(atm1_trans, lambda = "auto", seasonal = TRUE)
forcast1 = forecast::forecast(atm_arima_model1, h=31)
autoplot(forcast1)
atm_arima_model2 = auto.arima(atm2_trans, lambda = "auto", seasonal = TRUE)
forcast2 = forecast::forecast(atm_arima_model2, h=30)
autoplot(forcast2)
atm_arima_model3 = auto.arima(atm3_trans, lambda = "auto", seasonal = TRUE)
forcast3 = forecast::forecast(atm_arima_model3, h=30)
autoplot(forcast3)
atm_arima_model4 = auto.arima(atm4_trans, lambda = "auto", seasonal = TRUE)
forcast4 = forecast::forecast(atm_arima_model4, h=30)
autoplot(forcast4)
power_raw <- read_xlsx("ResidentialCustomerForecastLoad-624.xlsx")
str(power_raw)
## tibble [192 × 3] (S3: tbl_df/tbl/data.frame)
## $ CaseSequence: num [1:192] 733 734 735 736 737 738 739 740 741 742 ...
## $ YYYY-MMM : chr [1:192] "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
## $ KWH : num [1:192] 6862583 5838198 5420658 5010364 4665377 ...
The YYYY-MM column will need to be converted to a date
type in order to be used as the index for the required tsibble
object.
#NA
power_raw[!complete.cases(power_raw), ]
## # A tibble: 1 × 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 861 2008-Sep NA
colnames(power_raw) = c('CS','DATE','KWH')
power_raw = power_raw %>% select(DATE,KWH)
power_raw$DATE = yearmonth(as.Date(as.yearmon(power_raw$DATE, "%Y-%b")))
power_raw$KWH = as.numeric(power_raw$KWH)
power_ts = as_tsibble(tibble(power_raw),index='DATE')
plot_str(power_ts)
plot_density(power_ts)
plot_missing(power_ts)
plot_histogram(power_ts)
power_ts %>% ACF() %>% autoplot()
power_ts %>% autoplot(.vars=KWH)
power_ts$KWH[tsoutliers(ts(power_ts$KWH, frequency=12))$index] = tsoutliers(ts(power_ts$KWH, frequency=12))$replacements
power_lambda = power_ts %>%
features(KWH, features = guerrero) %>%
pull(lambda_guerrero)
power_trans = power_ts %>% ACF(difference(box_cox(KWH,power_lambda),12)) %>% autoplot()
power_trans_data = power_ts %>% mutate(KWH = difference(box_cox(KWH,power_lambda),12))
power_trans
power_trans_data %>% autoplot()
plot_density(power_trans_data)
t1_p = power_trans_data %>% drop_na()
adf.test(t1_p$KWH)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -9.82 0.01
## [2,] 1 -7.86 0.01
## [3,] 2 -5.80 0.01
## [4,] 3 -6.19 0.01
## [5,] 4 -5.13 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -9.88 0.01
## [2,] 1 -7.93 0.01
## [3,] 2 -5.88 0.01
## [4,] 3 -6.29 0.01
## [5,] 4 -5.24 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -9.90 0.01
## [2,] 1 -7.96 0.01
## [3,] 2 -5.89 0.01
## [4,] 3 -6.33 0.01
## [5,] 4 -5.26 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
power_arima_search = power_trans_data %>% model(search = ARIMA(KWH,stepwise=FALSE))
report(power_arima_search)
## Series: KWH
## Model: ARIMA(4,0,0)(2,0,0)[12] w/ mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 sar1 sar2 constant
## 0.2556 -0.0062 0.2442 -0.1481 -0.7166 -0.4151 0.0023
## s.e. 0.0750 0.0750 0.0742 0.0742 0.0739 0.0747 0.0008
##
## sigma^2 estimated as 9.298e-05: log likelihood=565.58
## AIC=-1115.16 AICc=-1114.38 BIC=-1089.1
glance(power_arima_search)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 0.0000930 566. -1115. -1114. -1089. <cpl [28]> <cpl [0]>
power_arima_search %>% gg_tsresiduals()
power_arima_search = auto.arima(power_trans_data, lambda = "auto", stepwise=FALSE, seasonal = TRUE)
power_forcast = forecast::forecast(power_arima_search, h=12)
autoplot(power_forcast)