library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)
Here we use WIPRO (ticker WIPRO.NS on Yahoo Finance) as the stock to forecast, using roughly the last 10 years of daily price data.
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)
getSymbols("WIPRO.NS",src = 'yahoo', from='2011-05-01', to='2021-06-20')
## [1] "WIPRO.NS"
class(WIPRO.NS)
## [1] "xts" "zoo"
First we will fill the missing values in our dataset (carrying the last observation forward) and then convert the closing prices into a time series. Then we have a look at this time series.
WIPRO.NS1 <- na.locf(WIPRO.NS, fromLast = FALSE) # fill missing values by carrying the last observation forward
DAta <- ts(WIPRO.NS1[,4], start = c(2011,1), end = c(2020,1), frequency = 252) # column 4 is the closing price; 252 trading days per year
autoplot(DAta)
class(DAta)
## [1] "ts"
Before applying an ARIMA model, we need to make sure that our time series is stationary. For that we look at the ACF and PACF plots, and then run an ADF test to be certain.
par(mfrow = c(1,2))
acf(DAta)
Pacf(DAta)
In the ACF plot we can see that the data is not stationary: the autocorrelations decay very slowly as the lag increases. To make the data stationary, we take the log of the series and then difference it once.
par(mfrow = c(1,2))
lddata <- diff(log(DAta))
acf(lddata)
pacf(lddata)
As we can see in these plots, the differenced log series looks stationary. The PACF suggests an AR term at lag 1, the ACF suggests no MA term (q = 0), and d = 1 since we differenced the series once. Now, to verify that the series is stationary, we perform the ADF (Augmented Dickey-Fuller) test.
lddata <- na.omit(lddata)
adf.test(lddata)
##
## Augmented Dickey-Fuller Test
##
## data: lddata
## Dickey-Fuller = -12.634, Lag order = 13, p-value = 0.01
## alternative hypothesis: stationary
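As an optional cross-check (a minimal sketch, not part of the original analysis), forecast::ndiffs() estimates how many differences a series needs to become stationary; applied to the raw closing prices it should agree with the single difference we took:
ndiffs(DAta) # expected to suggest 1 for this series, matching d = 1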
Since the p-value is 0.01, we reject the null hypothesis of a unit root, i.e. the differenced series is stationary, and we can move on to fitting ARIMA models. Note: in ARIMA(p,d,q), p is the autoregressive order, d is the number of times we need to difference the data to make it stationary (here d = 1), and q is the moving-average order, so our ACF/PACF analysis points to ARIMA(1,1,0). For comparison, we first fit a baseline ARIMA(1,0,0) on the closing-price series with the arima() function from base R's stats package:
fitA <- arima(DAta, order = c(1,0,0)) # baseline ARIMA(1,0,0) fit on the closing prices
summary(fitA)
##
## Call:
## arima(x = DAta, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.9967 198.4091
## s.e. 0.0016 17.4154
##
## sigma^2 estimated as 9.632: log likelihood = -5791.87, aic = 11589.74
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02519473 3.103611 2.220736 -0.01399217 1.137449 1.000045
## ACF1
## Training set -0.03621876
We now fit other ARIMA orders and compare them against this baseline to see which model fits best. This time we take (1,1,0), the order suggested by our ACF/PACF analysis:
fitB <- arima(DAta, order = c(1,1,0)) # ARIMA(1,1,0): AR(1) on the once-differenced series
summary(fitB)
##
## Call:
## arima(x = DAta, order = c(1, 1, 0))
##
## Coefficients:
## ar1
## -0.0385
## s.e. 0.0210
##
## sigma^2 estimated as 9.635: log likelihood = -5787.12, aic = 11578.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02732817 3.10335 2.222613 0.0007961679 1.13857 1.00089
## ACF1
## Training set -3.666146e-05
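For a direct side-by-side comparison of the two fits (a minimal sketch, assuming fitA and fitB are still in the workspace), the information criteria can be tabulated with base R:
AIC(fitA, fitB) # lower values indicate a better trade-off between fit and complexity
BIC(fitA, fitB)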
Now, to cross-check our choice, we can run auto.arima() on the time series:
fitA <- auto.arima(DAta, seasonal = FALSE) # note: this overwrites the earlier ARIMA(1,0,0) fit stored in fitA
summary(fitA)
## Series: DAta
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0385
## s.e. 0.0210
##
## sigma^2 estimated as 9.639: log likelihood=-5787.12
## AIC=11578.24 AICc=11578.25 BIC=11589.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02732817 3.10335 2.222613 0.0007961679 1.13857 0.07481713
## ACF1
## Training set -3.666146e-05
The ARIMA(1,1,0) model has the lowest AIC (11578.24, versus 11589.74 for the (1,0,0) fit), and it is also the order selected by auto.arima, so we go on to forecast with the (1,1,0) model.
Before that, we take a look at our model’s residuals:
tsdisplay(residuals(fitB), lag.max = 40, main = "ARIMA(1,1,0) residuals")
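As an extra diagnostic (a minimal sketch, not part of the original write-up), forecast::checkresiduals() combines similar residual plots with a Ljung-Box test for leftover autocorrelation; a large p-value suggests the residuals behave like white noise:
checkresiduals(fitB)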
Predicting the next 30 trading days with our model:
term <- 30
predict(fitB, n.ahead = term)
## $pred
## Time Series:
## Start = c(2020, 2)
## End = c(2020, 31)
## Frequency = 252
## [1] 227.3884 227.3966 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [9] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [17] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [25] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
##
## $se
## Time Series:
## Start = c(2020, 2)
## End = c(2020, 31)
## Frequency = 252
## [1] 3.104032 4.306165 5.241997 6.034315 6.734049 7.367624 7.950871
## [8] 8.494163 9.004736 9.487873 9.947572 10.386946 10.808474 11.214169
## [15] 11.605690 11.984427 12.351557 12.708085 13.054880 13.392698 13.722201
## [22] 14.043976 14.358542 14.666363 14.967854 15.263391 15.553314 15.837930
## [29] 16.117522 16.392345
To obtain forecasts with 80% and 95% prediction intervals that are easy to plot, we also build a forecast object for the same horizon:
fcast1 <- forecast(fitA, h = term, level = c(80, 95)) # fitA now holds the auto.arima (1,1,0) fit
Plotting the forecast:
autoplot(fcast1)
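The forecast object also stores the point forecasts and interval bounds, which can be inspected directly (a brief sketch using the object's standard components):
head(fcast1$mean) # forecasted closing prices
head(fcast1$lower) # lower bounds of the 80% and 95% intervals
head(fcast1$upper) # upper bounds of the 80% and 95% intervals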