library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)
Here we use WIPRO (ticker WIPRO.NS on Yahoo Finance) as the stock to forecast, using roughly the last 10 years of daily price data.
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)
getSymbols("WIPRO.NS",src = 'yahoo', from='2011-05-01', to='2021-06-20')
## [1] "WIPRO.NS"
class(WIPRO.NS)
## [1] "xts" "zoo"
First we will fill the missing values in our dataset (carrying the last observation forward) and then convert the closing prices into a time series. Then we have a look at this time series.
WIPRO.NS1 <- na.locf(WIPRO.NS, fromLast = FALSE) # fill missing values by carrying the last observation forward
DAta <- ts(WIPRO.NS1[,4], start = c(2011,1), end = c(2020,1), frequency = 252) # column 4 is the closing price; 252 trading days per year
autoplot(DAta)
class(DAta)
## [1] "ts"
Before applying an ARIMA model, we need to make sure that our time series is stationary. For that we look at the ACF and PACF plots, and then run an ADF test to be certain.
par(mfrow = c(1,2))
acf(DAta)
Pacf(DAta)
In the ACF plot we can see that the data is not stationary: the autocorrelations decay very slowly as the lag increases. To make the data stationary, we take the log of the series and then difference it once.
par(mfrow = c(1,2))
lddata <- diff(log(DAta))
acf(lddata)
pacf(lddata)
As we can see in these plots, the differenced log series looks stationary. The PACF suggests an AR term at lag 1, the ACF suggests no MA term (q = 0), and d = 1 since we differenced the series once. Now, to verify that the series is stationary, we perform the ADF (Augmented Dickey-Fuller) test.
lddata <- na.omit(lddata)
adf.test(lddata)
##
## Augmented Dickey-Fuller Test
##
## data: lddata
## Dickey-Fuller = -12.634, Lag order = 13, p-value = 0.01
## alternative hypothesis: stationary
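As an optional cross-check (a minimal sketch, not part of the original analysis), forecast::ndiffs() estimates how many differences a series needs to become stationary; applied to the raw closing prices it should agree with the single difference we took:
ndiffs(DAta) # expected to suggest 1 for this series, matching d = 1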
Since the p-value is 0.01, we reject the null hypothesis of a unit root, i.e. the differenced series is stationary, and we can move on to fitting ARIMA models. Note: in ARIMA(p,d,q), p is the autoregressive order, d is the number of times we need to difference the data to make it stationary (here d = 1), and q is the moving-average order, so our ACF/PACF analysis points to ARIMA(1,1,0). For comparison, we first fit a baseline ARIMA(1,0,0) on the closing-price series with the arima() function from base R's stats package:
fitA <- arima(DAta, order = c(1,0,0)) # baseline ARIMA(1,0,0) fit on the closing prices
summary(fitA)
##
## Call:
## arima(x = DAta, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.9967 198.4091
## s.e. 0.0016 17.4154
##
## sigma^2 estimated as 9.632: log likelihood = -5791.87, aic = 11589.74
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02519473 3.103611 2.220736 -0.01399217 1.137449 1.000045
## ACF1
## Training set -0.03621876
We now fit other ARIMA orders and compare them against this baseline to see which model fits best. This time we take (1,1,0), the order suggested by our ACF/PACF analysis:
fitB <- arima(DAta, order = c(1,1,0)) # ARIMA(1,1,0): AR(1) on the once-differenced series
summary(fitB)
##
## Call:
## arima(x = DAta, order = c(1, 1, 0))
##
## Coefficients:
## ar1
## -0.0385
## s.e. 0.0210
##
## sigma^2 estimated as 9.635: log likelihood = -5787.12, aic = 11578.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02732817 3.10335 2.222613 0.0007961679 1.13857 1.00089
## ACF1
## Training set -3.666146e-05
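For a direct side-by-side comparison of the two fits (a minimal sketch, assuming fitA and fitB are still in the workspace), the information criteria can be tabulated with base R:
AIC(fitA, fitB) # lower values indicate a better trade-off between fit and complexity
BIC(fitA, fitB)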
Now, to cross-check our choice, we can run auto.arima() on the time series:
fitA <- auto.arima(DAta, seasonal = FALSE) # note: this overwrites the earlier ARIMA(1,0,0) fit stored in fitA
summary(fitA)
## Series: DAta
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0385
## s.e. 0.0210
##
## sigma^2 estimated as 9.639: log likelihood=-5787.12
## AIC=11578.24 AICc=11578.25 BIC=11589.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02732817 3.10335 2.222613 0.0007961679 1.13857 0.07481713
## ACF1
## Training set -3.666146e-05
The ARIMA(1,1,0) model has the lowest AIC (11578.24, versus 11589.74 for the (1,0,0) fit), and it is also the order selected by auto.arima, so we go on to forecast with the (1,1,0) model.
Before that, we take a look at our model’s residuals:
tsdisplay(residuals(fitB), lag.max = 40, main = "ARIMA(1,1,0) residuals")
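As an extra diagnostic (a minimal sketch, not part of the original write-up), forecast::checkresiduals() combines similar residual plots with a Ljung-Box test for leftover autocorrelation; a large p-value suggests the residuals behave like white noise:
checkresiduals(fitB)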
Predicting the next 30 trading days with our model:
term <- 30
predict(fitB, n.ahead = term)
## $pred
## Time Series:
## Start = c(2020, 2)
## End = c(2020, 31)
## Frequency = 252
## [1] 227.3884 227.3966 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [9] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [17] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
## [25] 227.3963 227.3963 227.3963 227.3963 227.3963 227.3963
##
## $se
## Time Series:
## Start = c(2020, 2)
## End = c(2020, 31)
## Frequency = 252
## [1] 3.104032 4.306165 5.241997 6.034315 6.734049 7.367624 7.950871
## [8] 8.494163 9.004736 9.487873 9.947572 10.386946 10.808474 11.214169
## [15] 11.605690 11.984427 12.351557 12.708085 13.054880 13.392698 13.722201
## [22] 14.043976 14.358542 14.666363 14.967854 15.263391 15.553314 15.837930
## [29] 16.117522 16.392345
To obtain forecasts with 80% and 95% prediction intervals that are easy to plot, we also build a forecast object for the same horizon:
fcast1 <- forecast(fitA, h = term, level = c(80, 95)) # fitA now holds the auto.arima (1,1,0) fit
Plotting the forecast:
autoplot(fcast1)
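The forecast object also stores the point forecasts and interval bounds, which can be inspected directly (a brief sketch using the object's standard components):
head(fcast1$mean) # forecasted closing prices
head(fcast1$lower) # lower bounds of the 80% and 95% intervals
head(fcast1$upper) # upper bounds of the 80% and 95% intervals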