load("workspace.RData")
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method             from    
##   fitted.fracdiff    fracdiff
##   residuals.fracdiff fracdiff
library(fpp2)
## Warning: package 'fpp2' was built under R version 3.6.1
## Loading required package: ggplot2
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.6.1
## Loading required package: fma
## Warning: package 'fma' was built under R version 3.6.1
## Loading required package: expsmooth
## Warning: package 'expsmooth' was built under R version 3.6.1

KPSS is the unit root test of stationariy, p<0.05 means differencing is required. function is ur.kpss().Lets apply it to Google stock Price

library(urca)
goog %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 10.7223 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Test statistic is greater than critical value, so differencing is required. Lets difference the data and apply the test again.

goog %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0324 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Here critical value is less than the test value, hence differencing is not required.

You can also find the number of differences required

ndiffs(goog)
## [1] 1

A similar function is if we require seasonal difference it is nsdiffs. We can apply nsdiffs first to a series, if it says 1 then we apply ndiffs. eg.

usmelec %>% log() %>% nsdiffs()
## [1] 1

Now we apply ndiffs

usmelec%>% log() %>% diff(lag=12)%>% ndiffs()
## [1] 1

So we apply nidffs after applying seasonal differencing which is diff(lag=12)

ARIMA

If we apply differencing with Autoregression and moving average model, we obtain a non-seasonal ARIMA model. eg.

autoplot(uschange[,"Consumption"]) +
xlab("Year") + ylab("Quarterly percentage change")

Fitting AUTOARIMA

(fit <- auto.arima(uschange[,"Consumption"]))
## Series: uschange[, "Consumption"] 
## ARIMA(1,0,3)(1,0,1)[4] with non-zero mean 
## 
## Coefficients:
##           ar1     ma1     ma2     ma3     sar1    sma1    mean
##       -0.3548  0.5958  0.3437  0.4111  -0.1376  0.3834  0.7460
## s.e.   0.1592  0.1496  0.0960  0.0825   0.2117  0.1780  0.0886
## 
## sigma^2 estimated as 0.3481:  log likelihood=-163.34
## AIC=342.67   AICc=343.48   BIC=368.52

forcasting

fit %>% forecast(h=10) %>% autoplot(include=80)

Plotting ACF

ggAcf(uschange[,"Consumption"],main="")
## Warning: Ignoring unknown parameters: main

PACF

ggPacf(uschange[,"Consumption"],main="")
## Warning: Ignoring unknown parameters: main

graph help in finding p and q if (0,d,q) or (p,d,0), if both p and q are positive then graphs are not helpful. Data is AR if spike at PACF and decreasing at ACF Data is MA if decreasing at PACF and spike at ACF

Here in ACF there are three significant spike and then one more spike ( almost sinusidal) In PACF there are three significanat spike then no spike ( Spike at PACF)

So it is an AR process of order 3. Fitting

(fit2 <- Arima(uschange[,"Consumption"], order=c(3,0,0)))
## Series: uschange[, "Consumption"] 
## ARIMA(3,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     ar3    mean
##       0.2274  0.1604  0.2027  0.7449
## s.e.  0.0713  0.0723  0.0712  0.1029
## 
## sigma^2 estimated as 0.3494:  log likelihood=-165.17
## AIC=340.34   AICc=340.67   BIC=356.5

It is better than earlier model of auto arima (2,0,2) Auto Arima can be made better by using stepwise=FALSE and approximation=FALSE

(fit3 <- auto.arima(uschange[,"Consumption"], seasonal=FALSE,
stepwise=FALSE, approximation=FALSE))
## Series: uschange[, "Consumption"] 
## ARIMA(3,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     ar3    mean
##       0.2274  0.1604  0.2027  0.7449
## s.e.  0.0713  0.0723  0.0712  0.1029
## 
## sigma^2 estimated as 0.3494:  log likelihood=-165.17
## AIC=340.34   AICc=340.67   BIC=356.5

You see it is giving the same result.

Steps for ARIMA

1.Plot the data and identify any unusual observations. 2. If necessary, transform the data (using a Box-Cox transformation) to stabilise the variance. 3. If the data are non-stationary, take rst di|erences of the data until the data are stationary. 4. Examine the ACF/PACF: Is an ARIMA(p,d,0) or ARIMA(0,d,q ) model appropriate? 5. Try your chosen model(s), and use the AICc to search for a better model. 6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modied model. 7. Once the residuals look like white noise, calculate forecasts.

Lets forecast with out last model

autoplot(forecast(fit3))

accuracy

accuracy(forecast(fit3))
##                       ME      RMSE       MAE      MPE     MAPE      MASE
## Training set 0.001106571 0.5847286 0.4294193 47.28068 171.0212 0.6728123
##                    ACF1
## Training set 0.01499451
save.image("workspace.Rdata")