ARIMA Model in R

by Kunaal Naik YouTube - www.youtube.com/fxexcel GitHub Link - Download Code and Dataset

1/ Import Libraries

library(forecast)
library(tseries)

2/ Change working Directory

Provide the path to the folder that contains your data

setwd("C:\\Users\\DELL\\Documents\\__Fun_X_Excel_Channel_Videos\\Arima\\R")

3/ Import Sales Dataset

sales <- read.csv("sales.csv")
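Before going further it can help to confirm that the file loaded with the expected column. The sketch below is only an illustration: it writes a tiny made-up CSV to a temporary file (the three values are hypothetical, not the real sales data) and checks that a numeric Sales_k column is present.

```r
# Hypothetical stand-in for sales.csv; the real file's contents are not shown here.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Sales_k = c(410.5, 422.1, 431.8)), tmp, row.names = FALSE)

sales_demo <- read.csv(tmp)

# Fail early if the expected column is missing or was read as text.
stopifnot("Sales_k" %in% names(sales_demo), is.numeric(sales_demo$Sales_k))
str(sales_demo)
```

The same two checks can be run on the real sales data frame right after read.csv().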

4/ Convert Sales_k column to Time Series object

sales_ts <- ts(sales$Sales_k,start=c(1972),frequency=12)
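To see how start and frequency map observations onto calendar months, here is a minimal sketch on synthetic data. It assumes 156 monthly values, which is what a series starting in January 1972 and ending in December 1984 would contain; the numbers themselves are simulated, not the real sales.

```r
# Simulated stand-in for sales$Sales_k (assumption: 13 years of monthly data).
set.seed(1)
sales_k <- 500 + 2 * seq_len(156) + rnorm(156, sd = 10)

# frequency = 12 means 12 observations per year; start = c(1972, 1) is January 1972.
sales_ts_demo <- ts(sales_k, start = c(1972, 1), frequency = 12)

ts_start <- start(sales_ts_demo)      # year and month of the first observation
ts_end   <- end(sales_ts_demo)        # year and month of the last observation
ts_freq  <- frequency(sales_ts_demo)

# Show the last six months with their calendar labels.
print(window(sales_ts_demo, start = c(1984, 7)))
```

Writing start = c(1972) as in the step above is equivalent to start = c(1972, 1).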

5/ Plot Sales Time series using autoplot (forecast library)

autoplot(sales_ts)

6/ Check Stationarity

Stationarity: A stationary process has a mean and variance that do not change over time, and the process has no trend.
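A rough way to see what "mean does not change over time" looks like is to compare the mean of the first half of a series with the mean of the second half. The sketch below does this on a simulated trending series and on its first difference; half_mean_gap is a hypothetical helper written for this illustration, not a formal test.

```r
set.seed(42)
n <- 200
trended     <- 0.5 * seq_len(n) + rnorm(n)  # mean drifts upward: not stationary
differenced <- diff(trended)                # first difference removes the linear trend

# Hypothetical helper: absolute gap between second-half and first-half means.
half_mean_gap <- function(x) {
  h <- length(x) %/% 2
  abs(mean(x[(h + 1):length(x)]) - mean(x[1:h]))
}

gap_trended     <- half_mean_gap(trended)
gap_differenced <- half_mean_gap(differenced)

cat("gap in level series:      ", gap_trended, "\n")
cat("gap in differenced series:", gap_differenced, "\n")
```

A large gap in the level series and a small gap after differencing is the informal picture behind the formal test in this step.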

Perform ADF Test

Null Hypothesis - the series is Non Stationary (Do NOT reject if the p-value is greater than the significance level: 1%, 5% or 10%)

adf.test(sales_ts)

## Warning in adf.test(sales_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  sales_ts
## Dickey-Fuller = -8.7644, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
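The decision rule can be written out explicitly. The helper below is hypothetical (it is not part of tseries); it only encodes "do not reject the null when the p-value exceeds the significance level" and applies it to the printed p-value of 0.01.

```r
# Hypothetical helper: applies the ADF decision rule to a p-value.
adf_decision <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) {
    "do not reject H0 (treat as non-stationary)"
  } else {
    "reject H0 (evidence of stationarity)"
  }
}

# adf.test prints 0.01 as a floor; the warning says the true p-value is smaller.
p_printed <- 0.01
for (alpha in c(0.01, 0.05, 0.10)) {
  cat(sprintf("alpha = %.2f: %s\n", alpha, adf_decision(p_printed, alpha)))
}
```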

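Since the p-value (printed as 0.01, and smaller according to the warning) is below every significance level, the test rejects the null hypothesis of non-stationarity. Note that adf.test fits a regression with a constant and a linear trend, so this result supports stationarity around a trend, not around a constant mean.

We will still take the first difference to remove any trend from the series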
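As a small worked example of what a first difference does, the sketch below differences a four-value vector and then reverses the operation with diffinv. The numbers are made up for illustration.

```r
x <- c(10, 12, 15, 19)

# First difference: each value minus the previous one.
d1 <- diff(x, differences = 1)
print(d1)

# diffinv undoes the differencing when given the first original value.
restored <- diffinv(d1, xi = x[1])
print(restored)
```

The differenced series is one observation shorter than the original, which is why forecasts from a differenced model have to be integrated back to the original scale (Arima does this internally when d = 1).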

Perform ADF Test on First Difference

sales_ts_d1 <- diff(sales_ts, differences = 1)
adf.test(sales_ts_d1)

## Warning in adf.test(sales_ts_d1): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  sales_ts_d1
## Dickey-Fuller = -10.501, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

autoplot(sales_ts_d1)

Much better and Stationary.

ARIMA (p,d,q)

d term will be 1 - since we took the first difference
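In ARIMA(p,d,q), p is the number of autoregressive terms, d the number of differences and q the number of moving average terms. The sketch below simulates a series from a known ARIMA(1,1,0) process and fits the same order with base R's arima, to show where each number ends up in the fitted object. The simulated process and its coefficient of 0.6 are assumptions for illustration.

```r
set.seed(123)
# Simulate ARIMA(1,1,0): an AR(1) with coefficient 0.6, integrated once.
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 300)

fit <- arima(y, order = c(1, 1, 0))

# fit$arma stores c(p, q, P, Q, period, d, D).
p <- fit$arma[1]
q <- fit$arma[2]
d <- fit$arma[6]
ar_hat <- unname(coef(fit)["ar1"])

cat("p =", p, " d =", d, " q =", q, " estimated ar1 =", round(ar_hat, 3), "\n")
```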

7/ Plot the ACF to select the MA term or the q term

ACF - Correlation between the series and its own lagged values

Acf(sales_ts)

We will run the same test with differenced series

Acf(sales_ts_d1)
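For a pure moving average process the ACF cuts off after lag q, which is why the ACF is usually read for the MA order. A minimal sketch on a simulated MA(1) series, using base R's acf instead of forecast's Acf (the coefficient 0.8 is an assumption for illustration):

```r
set.seed(7)
ma1 <- arima.sim(model = list(ma = 0.8), n = 5000)

# acf() returns lag 0 first, so drop it to keep lags 1..5.
acf_vals <- as.numeric(acf(ma1, lag.max = 5, plot = FALSE)$acf)[-1]

# Approximate 95% band used on ACF plots.
band <- qnorm(0.975) / sqrt(length(ma1))

print(round(acf_vals, 3))
cat("95% band: +/-", round(band, 3), "\n")
```

Only lag 1 should stand clearly outside the band here; on real data the cut-off is rarely this clean.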

8/ Plot the PACF to select the AR term or the p term

PACF - Correlation between the series and a given lag after removing the effect of the shorter lags

Pacf(sales_ts)

We will run the same test with differenced series

Pacf(sales_ts_d1)
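For a pure autoregressive process the PACF cuts off after lag p, which is why the PACF is usually read for the AR order. A minimal sketch on a simulated AR(2) series, using base R's pacf instead of forecast's Pacf (the coefficients 0.5 and 0.3 are assumptions for illustration):

```r
set.seed(11)
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 5000)

# pacf() starts at lag 1, so element k is the partial autocorrelation at lag k.
pacf_vals <- as.numeric(pacf(ar2, lag.max = 6, plot = FALSE)$acf)

# Approximate 95% band used on PACF plots.
band <- qnorm(0.975) / sqrt(length(ar2))

print(round(pacf_vals, 3))
cat("95% band: +/-", round(band, 3), "\n")
```

Lags 1 and 2 should stand outside the band and the later lags should sit close to zero.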

9/ BASIC ARIMA

We will set p and q to 6; d will be 1

tsMod <- Arima(y = sales_ts,order = c(6,1,6))

10/ Summary of the model

tsMod

## Series: sales_ts 
## ARIMA(6,1,6) 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6     ma1      ma2
##       0.0083  -0.0185  0.0170  -0.0218  0.0152  -0.9819  0.0057  -0.1451
## s.e.  0.0161   0.0172  0.0222   0.0190  0.0155   0.0120  0.0560   0.0478
##           ma3      ma4      ma5     ma6
##       -0.3816  -0.1804  -0.0486  0.9697
## s.e.   0.0509   0.0439   0.0560  0.0574
## 
## sigma^2 estimated as 223.7:  log likelihood=-643.19
## AIC=1312.38   AICc=1314.96   BIC=1351.94
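ARIMA(6,1,6) estimates 12 coefficients, so it can be worth checking whether a smaller order reaches a similar AIC. The sketch below is one possible way to do that with base R's arima on a simulated series. The grid of orders and the simulated ARIMA(1,1,1) data are assumptions for illustration, not results for the sales data.

```r
set.seed(2024)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.4), n = 200)

# Candidate (p, q) pairs; d is fixed at 1 as in the tutorial.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(y, order = c(grid$p[i], 1, grid$q[i]), method = "ML"),
    error = function(e) NULL   # skip orders that fail to fit
  )
  if (!is.null(fit)) grid$aic[i] <- fit$aic
}

# Lower AIC is better; ties and near-ties favour the simpler model.
best <- grid[which.min(grid$aic), ]
print(grid[order(grid$aic), ])
```

AIC values are only comparable between models fitted to the same series with the same d.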

11/ Forecast 12 periods ahead

forecast(tsMod,h=12)

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1985       847.1119 827.7200 866.5039 817.4546 876.7693
## Feb 1985       887.8929 860.4025 915.3834 845.8499 929.9360
## Mar 1985       904.2875 872.3911 936.1838 855.5062 953.0688
## Apr 1985       928.0817 894.8973 961.2662 877.3306 978.8329
## May 1985       943.8799 910.2542 977.5055 892.4539 995.3058
## Jun 1985       935.9272 901.9288 969.9257 883.9312 987.9233
## Jul 1985       889.9770 855.6982 924.2557 837.5521 942.4018
## Aug 1985       849.6974 815.1469 884.2480 796.8569 902.5379
## Sep 1985       833.9970 798.6300 869.3639 779.9079 888.0861
## Oct 1985       810.8785 772.6859 849.0711 752.4680 869.2891
## Nov 1985       795.6589 753.4106 837.9072 731.0457 860.2722
## Dec 1985       803.6800 757.5914 849.7686 733.1936 874.1665
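The Lo/Hi columns are normal-theory intervals around the point forecast: point +/- z * standard error, with z = qnorm(0.90) for 80% and z = qnorm(0.975) for 95%. The Jan 1985 row is consistent with this, since both intervals imply a standard error of about 15.13. The sketch below rebuilds such a table with base R's predict on a simulated series; the data and model order are assumptions for illustration.

```r
set.seed(99)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 120)
fit <- arima(y, order = c(1, 1, 0))

# predict() returns point forecasts ($pred) and their standard errors ($se).
pr  <- predict(fit, n.ahead = 12)
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

fc <- data.frame(
  point = as.numeric(pr$pred),
  se    = as.numeric(pr$se),
  lo80  = as.numeric(pr$pred - z80 * pr$se),
  hi80  = as.numeric(pr$pred + z80 * pr$se),
  lo95  = as.numeric(pr$pred - z95 * pr$se),
  hi95  = as.numeric(pr$pred + z95 * pr$se)
)
print(round(fc, 2))
```

With d = 1 the standard error grows with the horizon, which matches the widening intervals in the table above.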

12/ Plot Sales with forecast

autoplot(forecast(tsMod,h=12))

13/ Ljung-Box test for serial correlation in the residuals

Null Hypothesis: No serial correlation up to a certain lag
Do NOT reject the Null Hypothesis if the p-value is greater than the significance level

Box.test(tsMod$residuals, type = 'Ljung-Box')

## 
##  Box-Ljung test
## 
## data:  tsMod$residuals
## X-squared = 0.037092, df = 1, p-value = 0.8473
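The output above reports df = 1 because Box.test defaults to lag = 1, so only the first residual autocorrelation is tested. A common variation is to test more lags and pass fitdf = p + q so the degrees of freedom account for the estimated coefficients. The sketch below shows the call on a simulated series with a small model; the data, the order and lag = 10 are assumptions for illustration.

```r
set.seed(5)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 300)
fit <- arima(y, order = c(1, 1, 1))

# lag = 10 tests autocorrelations 1..10 jointly; fitdf = p + q = 2.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)

# Degrees of freedom are lag - fitdf.
lb_df <- unname(lb$parameter)
```

For the ARIMA(6,1,6) model above, fitdf would be 12, so the lag would need to be larger than 12 for the test to have positive degrees of freedom.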