Time Series Analysis of Reliance Share Price
Abstract
This article applies time series analysis to the Reliance stock price for the years 2005-2023: we analyze trends and patterns and forecast future values using different statistical methods.
Introduction
Reliance Industries Limited is an Indian multinational conglomerate headquartered in Mumbai. Its businesses include energy, petrochemicals, natural gas, retail, telecommunications, mass media, and textiles. Reliance is the largest public company in India by market capitalization and revenue, and the 100th largest company worldwide. The number of outstanding shares is 644.51 crore. The company's equity shares are listed on the National Stock Exchange (NSE) and the Bombay Stock Exchange (BSE).
The data we use is taken from the Yahoo Finance website.
Let's start by loading the required libraries.
# Load libraries
library(forecast)
library(ggplot2)
library(tseries)
library(car)
library(lmtest)
library(xts)
library(dygraphs)
library(tidyverse)
library(RColorBrewer)
Let's import our data and convert it into a time series object.
reliance = read.csv("RELIANCE.BO1.csv")
reliance = reliance$Close
reliance_data = ts(reliance, start = c(2005, 1), end = c(2023, 52), frequency = 52)
plot(reliance_data, main = "Closing weekly stock price of Reliance", xlab = "Year", ylab = "Stock price of Reliance")
# taking log
reliance = ts(log(reliance), start = c(2005, 1), end = c(2023, 52), frequency = 52)
plot(reliance, main = "Closing weekly stock price of Reliance", xlab = "Year", ylab = "log(closing stock price)")
In the first graph above, we plot the weekly closing Reliance stock prices in the original units and can see that the trend and variation are very large (from 0 to about 2,500 rupees). To curb this exponential growth, we take the log of the weekly closing prices; on the log scale the closing prices lie between roughly 5 and 8.
Now we will split the data into training and testing sets. The training set, used to fit our models, covers January 2005 to December 2021. The test set, used to check the performance of our models by comparing forecasts against actual values, covers January 2022 to December 2023.
ts_train = window(reliance, start = c(2005, 1), end = c(2021, 52), frequency = 52)
ts_test = window(reliance, start = c(2022, 1), end = c(2023, 52), frequency = 52)
Exploratory Data Analysis
In this section, we use different graphs to visualize the underlying trend and seasonality in our stock price data.
ggtsdisplay(ts_train, points = T, cex = 0.5, smooth = T, plot.type = "partial", main = "Log weekly closing stock price of Reliance", xlab = "Years", ylab = "Log(closing stock prices)")
ggtsdisplay gives three plots: (1) the original series, (2) the ACF, and (3) the PACF.
We can see that the level of the Reliance stock price is not constant, and its variability is not constant either.
The ACF decays slowly, which implies that future values of the series are correlated with, i.e. heavily affected by, past values.
The PACF cuts off after the first few lags.
So, this tells us that the data is not stationary.
ggseasonplot(ts_train, year.labels = T) +
  ggtitle("Seasonal Plot", subtitle = "Closing stock prices")
Here, we have a plot to visualize the seasonal behaviour.
Every year gets its own curve.
From the plot, it is visible that there is no common seasonal pattern across years.
The Y-axis indicates that the variation due to seasonality is not large.
decomposed = stl(ts_train, s.window = "periodic")
plot(decomposed)
Here, we break the stock price down into Trend + Seasonality + Randomness. Since the three components are added, this is an additive decomposition. The other option would be to multiply them, giving a multiplicative decomposition.
The grey bars on the right-hand side of the STL plot show the relative scale of each component. The large bar next to the seasonal panel indicates that the seasonal component is small relative to the others, so seasonality is not an important feature of this series.
Let’s check the range of seasonality and randomness.
range(decomposed$time.series[,1]) # range of seasonality
[1] -0.05635890 0.05419218
range(decomposed$time.series[,3]) # range of randomness
[1] -0.5011871 0.2838602
The range of seasonality is -0.0564 to 0.0542 on the log scale, i.e. seasonal effects move the stock price by roughly -5.5% to +5.6% (small log differences translate almost directly into percentage changes). This is the size of our seasonal variation.
The range of randomness is -0.5012 to 0.2839 on the log scale, i.e. randomness moves the stock price by roughly -39% to +33% (exp(-0.5012) - 1 and exp(0.2839) - 1), which is very large. This is the unpredictable part of the data.
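As a minimal sketch, these log-scale ranges can be converted into percentage price changes directly in R (exp(x) - 1 turns a log difference into a relative change):
# Convert log-scale component ranges to percentage changes in price
100 * (exp(range(decomposed$time.series[, 1])) - 1)  # seasonal effect in %
100 * (exp(range(decomposed$time.series[, 3])) - 1)  # remainder (randomness) in %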
Let's visualize the trend, seasonality and randomness.
We saw that the data is not stationary, so we need to difference the series. Differencing means subtracting a previous value from the current value; if we subtract the immediately preceding value, the order of differencing is one (a toy example is sketched below). We then use a for loop to determine the correct order of differencing.
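A toy illustration of differencing (the values are made up, not from the dataset):
# First and second differences of a short made-up series
diff(c(100, 103, 101))                   # first differences: 3, -2
diff(c(100, 103, 101), differences = 2)  # second difference: -5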
# for loop
for (i in 1:5) {
print(paste("Order of Differencing :",i,
"| Value of Variance :",
round(var(diff(ts_train, differences = i)),4))
)
}
[1] "Order of Differencing : 1 | Value of Variance : 0.002"
[1] "Order of Differencing : 2 | Value of Variance : 0.0041"
[1] "Order of Differencing : 3 | Value of Variance : 0.0128"
[1] "Order of Differencing : 4 | Value of Variance : 0.0441"
[1] "Order of Differencing : 5 | Value of Variance : 0.1587"
The for loop differences the series at each order and calculates the variance. If differencing increases the variability, we should stop. Since the variance increases after the first order, we difference the series once.
diff1 = diff(ts_train, lag = 1, differences = 1)
ggtsdisplay(diff1, points = T, plot.type = "partial", smooth = T, cex = 0.5,
            main = "Differenced Series", xlab = "Years", ylab = "Change in log(stock price)")
Looking at this differenced series, we can see that the trend has died down and the series fluctuates around a constant level. However, to confirm that it is stationary, we need to perform some unit root tests.
Augmented Dickey Fuller (ADF) Test
H0: Series is not stationary
H1: Series is stationary
adf.test(ts_train)
Augmented Dickey-Fuller Test
data: ts_train
Dickey-Fuller = -1.8003, Lag order = 9, p-value = 0.6629
alternative hypothesis: stationary
adf.test(diff1)
Augmented Dickey-Fuller Test
data: diff1
Dickey-Fuller = -9.8652, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary
Looking at the p-values, we fail to reject H0 for the original series (p = 0.66) but reject it for the differenced series (p = 0.01): the differenced series is stationary, while the original is not.
Kwiatkowski Philips Schmidt Shin (KPSS) Test
H0: Series is stationary
H1: Series is not stationary
Here, the hypotheses are the opposite of those in the ADF test, so the KPSS results serve to confirm the ADF results and vice-versa.
kpss.test(ts_train)
KPSS Test for Level Stationarity
data: ts_train
KPSS Level = 8.8764, Truncation lag parameter = 6, p-value = 0.01
kpss.test(diff1)
KPSS Test for Level Stationarity
data: diff1
KPSS Level = 0.11127, Truncation lag parameter = 6, p-value = 0.1
The KPSS results agree with the ADF test: we reject stationarity for the original series (p = 0.01) but not for the differenced series (p = 0.1), so differencing once is sufficient.
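As a sketch, a hypothetical helper (not part of the original analysis) can run both tests in one call; suppressWarnings() hides the notes tseries prints when a p-value falls outside its lookup table:
# Hypothetical convenience wrapper around both unit root tests
check_stationarity <- function(x) {
  suppressWarnings(c(adf_p = adf.test(x)$p.value, kpss_p = kpss.test(x)$p.value))
}
check_stationarity(diff1)  # small ADF p-value and large KPSS p-value => stationary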
Forecasting Models
Now it is time to make some predictions! For this task, we take up the following models one by one.
# Naive Model
ts_train_naive = naive(ts_train, h = 104)
plot(ts_train_naive, xlab = "Year", ylab = "log(closing share price)")
# SNaive Model
ts_train_snaive = snaive(ts_train, h = 104)
plot(ts_train_snaive, xlab = "Year", ylab = "log(closing share price)")
Note: the dark and light blue shaded regions around the forecast line represent 80% and 95% prediction intervals.
Looking at the forecasts for 2022 and 2023:
Naive models are very simple:
forecast_tomorrow = actual_today
The last observed value, in week 52 of 2021, is 7.8075.
Thus, all forecasts are equal to 7.8075 (a quick check follows below).
Note that these are log values, not the actual prices.
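A quick sketch to verify this with the objects created above:
# All naive forecasts equal the last observed value
tail(ts_train, 1)          # last observed log price (week 52 of 2021)
head(ts_train_naive$mean)  # first few forecasts -- all identical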
SNaive, or seasonal naive, extends the naive model by introducing seasonality:
Forecast for the first week of January 2022 = observed value in the first week of January 2021 = 7.597.
Forecast for the second week of January 2022 = observed value in the second week of January 2021 = 7.5978, and so on.
Naive models are vital because they act as a baseline against which the other models are compared. If a model performs worse than the naive models, it is best to re-evaluate or discard it.
ts_train_rwd <- forecast(ts_train, method = "rwdrift", h = 104)
plot(ts_train_rwd, xlab = "Year", ylab = "log(closing share price)")
Here, "drift" means the model has the ability to pick up the trend and seasonality.
We can see that the forecast is not constant: it has picked up some trend and seasonality.
As before, the dark and light shaded regions around the forecast represent the 80% and 95% prediction intervals.
SARIMA stands for Seasonal Auto-regressive Integrated Moving Average.
sarima = auto.arima(ts_train, ic="aic")
sarima
Series: ts_train
ARIMA(2,1,2)(1,0,0)[52] with drift
Coefficients:
ar1 ar2 ma1 ma2 sar1 drift
-0.0266 -0.5603 0.0052 0.6781 -0.0001 0.0033
s.e. 0.1452 0.1478 0.1291 0.1313 0.0343 0.0016
sigma^2 = 0.001929: log likelihood = 1509.76
AIC=-3005.53 AICc=-3005.4 BIC=-2972.04
auto.arima selects d = 1, i.e. the series needs to be differenced once. This confirms what we concluded with the for loop earlier; a further cross-check is sketched below.
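The forecast package can also estimate the required order of differencing directly from unit root tests; a minimal sketch:
# Estimate how many differences are needed for stationarity
ndiffs(ts_train)  # expected to return 1 for this series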
sarima212 = auto.arima(diff1, ic = "aic")
sarima212
Series: diff1
ARIMA(2,0,2)(1,0,0)[52] with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 sar1 mean
-0.0266 -0.5603 0.0052 0.6781 -0.0001 0.0033
s.e. 0.1452 0.1478 0.1291 0.1313 0.0343 0.0016
sigma^2 = 0.001929: log likelihood = 1509.76
AIC=-3005.53 AICc=-3005.4 BIC=-2972.04
SARIMA alone does not provide a good fit, so we fit a SARIMA model together with linear regression.
The idea is to model the series as a linear regression on time whose errors follow a SARIMA model.
# Creating time covariate object
time = seq(2005,2021+51/52, by = 1/52)
# Model
reg.arma = auto.arima(ts_train, xreg = time, ic="aic")
# Prediction
reg.arma.p = forecast(reg.arma, xreg = seq(2022, 2023 + 51/52, 1/52))
reg.arma
Series: ts_train
Regression with ARIMA(0,0,3)(0,0,2)[52] errors
Coefficients:
ma1 ma2 ma3 sma1 sma2 intercept xreg
1.3506 1.1972 0.6124 0.3930 0.2513 -242.6653 0.1237
s.e. 0.0440 0.0352 0.0261 0.0355 0.0327 7.6570 0.0038
sigma^2 = 0.007754: log likelihood = 889.64
AIC=-1763.28 AICc=-1763.12 BIC=-1725.01
plot(reg.arma.p, xlab = "Year", ylab = "log(closing share price)")
Next, we try Simple Exponential Smoothing. This model forecasts future values based on the recursion:
forecast_tomorrow = alpha * actual_today + (1 - alpha) * forecast_today
The equation weights the most recent observation against the previous forecast. It is like a weighted average of the past, with more weight on recent observations; since the weights decay exponentially, the method is also called the Exponentially Weighted Moving Average.
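A minimal sketch of this recursion; alpha here is a hypothetical smoothing weight chosen for illustration, whereas ses() below estimates it from the data:
# Hand-rolled SES: repeatedly blend each new observation into the level
alpha <- 0.9
level <- ts_train[1]
for (y in ts_train[-1]) level <- alpha * y + (1 - alpha) * level
level  # one-step-ahead forecast; compare with ts_train_ses$mean[1] below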
ts_train_ses <- ses(ts_train, h = 104)
plot(ts_train_ses, xlab = "Year", ylab = "log(closing share price)")
R gives the same value for every week because SES has no trend component: the forecast is flat at the final smoothed level.
Thus forecast_week1_2022 = forecast_week2_2022 = ... = 7.807, essentially the last observed value.
Holt's method extends SES by adding a trend component.
ts_train_HW <- holt(ts_train, h = 104)  # Holt's linear trend method
plot(ts_train_HW, xlab = "Year", ylab = "log(closing share price)")
Conclusion
Accuracy Measures: RMSE
RMSE is the square root of the mean squared error. It is analogous to a standard deviation.
What does RMSE say?
If RMSE = 2 and the forecast is 20, it roughly implies that we are 95% confident of observing the actual value between about 16 and 24, i.e. 20 ± 1.96 × 2.
Accuracy Measures: MAPE
MAPE is the Mean Absolute Percentage Error. It measures the prediction accuracy of a forecasting method as a percentage.
What does MAPE say?
If MAPE = 0.036 (i.e. 3.6%), it implies that on average we are 3.6% away from the actual value; it does not specify the direction (+/-).
For a good forecast we want small values of both RMSE and MAPE, as this means we have committed smaller errors (helper functions for both are sketched below).
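As a sketch, hypothetical helper functions for the two measures (accuracy() in the forecast package reports both directly, with MAPE in percent):
# Root mean squared error and mean absolute percentage error
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
mape <- function(actual, forecast) 100 * mean(abs((actual - forecast) / actual))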
Bias-Variance Trade-off
This is another important aspect of building and choosing any statistical model.
Bias refers to the assumptions a model makes to simplify the task at hand, whereas variance refers to the model's sensitivity to changes in the training data.
If we want to reduce the variability of the model, we need to estimate fewer parameters.
But as soon as we estimate fewer parameters, the model becomes less flexible, i.e. we introduce bias.
Reducing variability => fewer parameters => increasing bias.
From an analytical perspective, and only to some extent, RMSE tracks variance and MAPE tracks bias.
So, a model with low MAPE has less bias and a model with low RMSE has less variability in its results.
Model Performance Comparison (tabulated in Excel)
This table was compiled in Excel to compare the results properly; the same numbers can also be computed in R, as sketched below.
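A minimal sketch of that comparison, assuming the forecast objects created earlier; accuracy() from the forecast package reports RMSE and MAPE on the test set:
# Test-set RMSE and MAPE for each candidate model
models <- list(naive = ts_train_naive, snaive = ts_train_snaive,
               rw_drift = ts_train_rwd, reg_sarima = reg.arma.p,
               ses = ts_train_ses, holt = ts_train_HW)
sapply(models, function(m) accuracy(m, ts_test)["Test set", c("RMSE", "MAPE")])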
Based on the above results, the Simple Exponential Smoothing model is the best fit, with the lowest RMSE and MAPE, and the linear regression with SARIMA errors is the second best.
However, it is important to remember that:
As George Box put it: "All models are wrong, but some are useful!"
Final Model
Let's finally run the Simple Exponential Smoothing model on the original (untransformed) stock price data.
ts_train_data <- window(reliance_data, start = c(2005, 1),
end = c(2021, 52), frequency = 52)
ts_test_data <- window(reliance_data, start = c(2022, 1),
frequency = 52)
original <- ses(ts_train_data, h = 104)
plot(original, xlab = "Year", ylab = "Share Price")
We also run the linear regression with SARIMA errors model:
# Creating time covariate object
time = seq(2005,2021+51/52, by = 1/52)
# Model
reg.arma_data = auto.arima(ts_train_data, xreg = time, ic="aic")
# Prediction
reg.arma.p_data = forecast(reg.arma_data, xreg = seq(2022, 2023+51/52, 1/52))
plot(reg.arma.p_data, xlab = "Year", ylab = "Share Price")
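As a closing sketch, the two final models can be scored against the held-out 2022-2023 prices on the original rupee scale:
# Test-set accuracy of the final models
accuracy(original, ts_test_data)["Test set", c("RMSE", "MAPE")]
accuracy(reg.arma.p_data, ts_test_data)["Test set", c("RMSE", "MAPE")]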