load("workspace.RData")
library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
The KPSS test is a unit root test for stationarity. Its null hypothesis is that the series is stationary, so a small p-value (p < 0.05), or equivalently a test statistic above the critical value, means differencing is required. The function is ur.kpss() from the urca package. Let's apply it to the Google stock price:
library(urca)
goog %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 10.7223
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (10.7223) is far greater than even the 1% critical value (0.739), so differencing is required. Let's difference the data and apply the test again:
goog %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 0.0324
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Here the test statistic (0.0324) is below even the 10% critical value (0.347), so no further differencing is required.
You can also find the number of differences required directly with ndiffs():
ndiffs(goog)
## [1] 1
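As a quick sanity check (a sketch; output not shown), applying ndiffs() to the already-differenced series should report that no further differencing is needed:
goog %>% diff() %>% ndiffs()  # expect 0: one difference suffices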
A similar function for seasonal differencing is nsdiffs(). We apply nsdiffs() first; if it returns 1, we take one seasonal difference and then apply ndiffs() to the result. For example:
usmelec %>% log() %>% nsdiffs()
## [1] 1
Now we apply ndiffs() to the seasonally differenced series:
usmelec %>% log() %>% diff(lag=12) %>% ndiffs()
## [1] 1
So we apply ndiffs() after the seasonal differencing, diff(lag=12): here one seasonal difference followed by one first difference makes the series stationary.
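Putting the two results together, a quick sketch (output not shown) that applies both differences and re-checks stationarity with the KPSS test:
usmelec %>% log() %>% diff(lag=12) %>% diff() %>%  # seasonal, then first difference
  ur.kpss() %>% summary()                          # test statistic should now be below the critical values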
If we combine differencing with an autoregressive (AR) and a moving average (MA) model, we obtain a non-seasonal ARIMA(p, d, q) model.
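In standard notation, the ARIMA(p, d, q) model for the d-times differenced series $y'_t$ can be written as

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where the $\phi_i$ are the AR coefficients, the $\theta_j$ are the MA coefficients, and $\varepsilon_t$ is white noise. As an example, let's plot the quarterly percentage change in US consumption: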
autoplot(uschange[,"Consumption"]) +
xlab("Year") + ylab("Quarterly percentage change")
Fitting a model with auto.arima():
(fit <- auto.arima(uschange[,"Consumption"]))
## Series: uschange[, "Consumption"]
## ARIMA(1,0,3)(1,0,1)[4] with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 ma3 sar1 sma1 mean
## -0.3548 0.5958 0.3437 0.4111 -0.1376 0.3834 0.7460
## s.e. 0.1592 0.1496 0.0960 0.0825 0.2117 0.1780 0.0886
##
## sigma^2 estimated as 0.3481: log likelihood=-163.34
## AIC=342.67 AICc=343.48 BIC=368.52
Forecasting 10 quarters ahead (include=80 plots only the last 80 observations of the series):
fit %>% forecast(h=10) %>% autoplot(include=80)
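The numeric forecasts can also be printed instead of plotted; a quick sketch (output not shown):
fit %>% forecast(h=10) %>% summary()  # point forecasts with 80% and 95% intervals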
Plotting the ACF (ggAcf() ignores a main= argument, so we blank the title with ggtitle()):
ggAcf(uschange[,"Consumption"]) + ggtitle("")
Plotting the PACF:
ggPacf(uschange[,"Consumption"]) + ggtitle("")
These plots help in choosing p and q when the model is ARIMA(p, d, 0) or ARIMA(0, d, q); if both p and q are positive, the plots are not helpful. The data suggest an AR(p) process if the PACF cuts off after lag p (a spike pattern) while the ACF decays gradually; they suggest an MA(q) process if the ACF cuts off after lag q while the PACF decays gradually.
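To see these patterns in isolation, here is a small sketch with a simulated AR(2) series (arima.sim() from the stats package; the coefficients are illustrative):
set.seed(42)
ar2 <- arima.sim(model=list(ar=c(0.6, 0.3)), n=200)  # simulate a stationary AR(2)
ggAcf(ar2)   # ACF decays gradually
ggPacf(ar2)  # PACF cuts off after lag 2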
Returning to the US consumption series: the ACF shows three significant spikes followed by a gradual, almost sinusoidal decay, while the PACF shows three significant spikes and then none (the PACF cuts off). So it looks like an AR process of order 3. Fitting an ARIMA(3,0,0):
(fit2 <- Arima(uschange[,"Consumption"], order=c(3,0,0)))
## Series: uschange[, "Consumption"]
## ARIMA(3,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 mean
## 0.2274 0.1604 0.2027 0.7449
## s.e. 0.0713 0.0723 0.0712 0.1029
##
## sigma^2 estimated as 0.3494: log likelihood=-165.17
## AIC=340.34 AICc=340.67 BIC=356.5
This model has a lower AICc (340.67) than the earlier auto.arima() fit, ARIMA(1,0,3)(1,0,1)[4] (AICc = 343.48). auto.arima() can be made to search more thoroughly by setting stepwise=FALSE and approximation=FALSE (here we also restrict it to non-seasonal models with seasonal=FALSE):
(fit3 <- auto.arima(uschange[,"Consumption"], seasonal=FALSE,
stepwise=FALSE, approximation=FALSE))
## Series: uschange[, "Consumption"]
## ARIMA(3,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 mean
## 0.2274 0.1604 0.2027 0.7449
## s.e. 0.0713 0.0723 0.0712 0.1029
##
## sigma^2 estimated as 0.3494: log likelihood=-165.17
## AIC=340.34 AICc=340.67 BIC=356.5
As you can see, the wider search arrives at the same ARIMA(3,0,0) model.
Steps for ARIMA
1. Plot the data and identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilise the variance.
3. If the data are non-stationary, take first differences of the data until the data are stationary.
4. Examine the ACF/PACF: is an ARIMA(p,d,0) or ARIMA(0,d,q) model appropriate?
5. Try your chosen model(s), and use the AICc to search for a better model.
6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals (see the sketch after this list). If they do not look like white noise, try a modified model.
7. Once the residuals look like white noise, calculate forecasts.
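For step 6, the forecast package provides checkresiduals(), which plots the residuals, their ACF and a histogram, and runs a Ljung-Box portmanteau test in one call. A minimal sketch with our fitted model (output not shown):
checkresiduals(fit3)  # residual plots plus Ljung-Box test on the residuals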
Let's forecast with our last model:
autoplot(forecast(fit3))
Checking the training-set accuracy (note that MPE and MAPE are inflated here because the quarterly changes are often close to zero):
accuracy(forecast(fit3))
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001106571 0.5847286 0.4294193 47.28068 171.0212 0.6728123
## ACF1
## Training set 0.01499451
save.image("workspace.RData")