Data analysis involves examining available data to identify patterns and trends that can be used for future predictions and decision making. The Russell 2000 index tracks 2000 small-cap companies in the United States; market capitalisation is the total value of a company's outstanding shares of stock. The historical data obtained cover a ten-year period (22nd November 2011 to 22nd November 2021), and the index level at the time of download was 2343.50. The data are obtained from the Yahoo Finance website.
The objectives of the analysis are: to obtain the trend of the time series data and analyse it extensively; to fit a non-linear trend to the time series and study the behaviour of the residuals resulting from the trend; to obtain log returns and fit a geometric Brownian motion (GBM) process; and to study the residual properties resulting from the GBM process.
The data are for the Russell 2000 index, obtained from the Yahoo Finance website. The data were downloaded on 22nd November 2021 and cover a period of 10 years (22nd November 2011 to 22nd November 2021); the index level at the time of download was 2343.50. The computations are carried out in R. The data are downloaded, cleaned and arranged in date order, saved in CSV format, and then imported into R using the read.csv command. The data set has 2516 observations of seven variables, namely: Date, opening price, high price, low price, closing price, adjusted closing price and trade volume.
setwd("C:\\Users\\Peter Mokua\\Desktop\\SAC 703")
RUSSEL2000<-read.csv("RUSSEL2000.CSV", sep=",", header=TRUE)
dim(RUSSEL2000)
## [1] 2516 7
attach(RUSSEL2000)
names(RUSSEL2000)
## [1] "Date" "Open" "High" "Low" "Close" "Adj.Close"
## [7] "Volume"
head(RUSSEL2000)
tail(RUSSEL2000)
We are interested in the adjusted closing prices. We subset this column and view its head and tail.
Adj.Close<-subset(RUSSEL2000,select = c(6))
head(Adj.Close)
tail(Adj.Close)
We check whether the adjusted closing prices are stored as a data frame or a time series. If they turn out to be a data frame, we convert them to a time series.
class(Adj.Close)
## [1] "data.frame"
Adj.Close.ts<-ts(Adj.Close, frequency=365)
class(Adj.Close.ts)
## [1] "ts"
The data has been converted to a time series for analysis.
We load the e1071 package, which provides the skewness and kurtosis functions.
require(e1071)
## Loading required package: e1071
Checking for missing values in the data.
sum(is.na(Adj.Close.ts))
## [1] 0
There are no missing values in the data. We proceed to check the dimensions and summary statistics of the data.
dim(Adj.Close.ts)
## [1] 2516 1
summary(Adj.Close.ts)
## Adj.Close
## Min. : 666.2
## 1st Qu.:1114.6
## Median :1264.1
## Mean :1351.1
## 3rd Qu.:1552.4
## Max. :2442.7
var(Adj.Close.ts)
## Adj.Close
## Adj.Close 148013.8
skewness(Adj.Close.ts)
## [1] 0.7901006
kurtosis(Adj.Close.ts)
## [1] 0.4045816
plot(Adj.Close.ts,main='PLOT OF ADJUSTED CLOSING STOCK PRICES 2011-2021',col='red',ylab='Adj.Close',xlab='TIME(years)')
The graph confirms that the Russell 2000 has an upward trend, with some outlying dips towards the lower values. The series does not appear to be stationary; this can be confirmed using the ACF. The data are first decomposed to separate the observed series into trend, seasonal and error components.
Trend<-decompose(Adj.Close.ts)
plot(Trend,col='red')
Both the observed and trend components display the upward trend noted earlier.
Using the ACF, we confirm whether the time series has a trend.
acf(Adj.Close.ts,lag.max = 40)
The ACF decays very slowly and does not cut off to zero, indicating the presence of a trend (non-stationarity).
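As a complementary check beyond the ACF inspection (this is an added suggestion, not part of the original analysis), a formal unit-root test can be applied. A minimal sketch, assuming the tseries package is available:
require(tseries)
# Augmented Dickey-Fuller test: the null hypothesis is that the series has a unit root
# (i.e. is non-stationary); a large p-value is consistent with the trend seen in the ACF.
adf.test(as.numeric(Adj.Close.ts))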
We assume the data can be modelled by a GBM process. A GBM for a random process \(X\) is specified by
\(dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, \qquad \mu, \sigma > 0\)
We obtain the solution of the process using Ito's formula,
\(df = f_t\,dt + f_x\,dx + 0.5 f_{xx}\,(dx)^2\)
We get the solution,
\(X_T = X_0\exp\left([\mu - 0.5\sigma^2]T + \sigma W_T\right)\)
\(\log X_T \sim N\left(\log X_0 + [\mu - 0.5\sigma^2]T,\; \sigma^2 T\right)\)
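To illustrate the closed-form solution, a short simulation sketch can be written in R. The parameter values below (mu, sigma, X0 and the number of steps) are assumed for illustration only and are not estimated from the Russell 2000 data.
# Simulate one GBM path from the exact solution X_t = X0*exp((mu - 0.5*sigma^2)*t + sigma*W_t).
# mu, sigma, X0 and n are illustrative assumptions, not fitted values.
set.seed(123)
mu <- 0.10                                        # drift
sigma <- 0.20                                     # volatility
X0 <- 100                                         # starting value
n <- 252                                          # number of time steps
dt <- 1/n
W <- cumsum(rnorm(n, mean = 0, sd = sqrt(dt)))    # Brownian motion path W_t
t <- (1:n) * dt
X <- X0 * exp((mu - 0.5*sigma^2)*t + sigma*W)
plot.ts(X, main='SIMULATED GBM PATH', col='blue', xlab='Time', ylab='X')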
We assume the trend of the process is non-linear and we try to fit a non-linear equation given by
\(P_t = A\exp[Bt + \theta_t], \qquad \theta_t \sim N(0, \sigma^2)\)
We convert the non-linear equation to a linear one by taking natural logarithms,
\(\log P_t = \log A + Bt + \theta_t\)
Writing \(Y = \log P_t\) and \(X = \log A\), this becomes
\(Y = X + Bt + \theta_t\)
where \(\theta_t\) is a residual term. We transform the adjusted closing prices to logs.
log.pt<-log(Adj.Close.ts)
head(log.pt)
## Adj.Close
## [1,] 6.545723
## [2,] 6.513734
## [3,] 6.501530
## [4,] 6.548076
## [5,] 6.545493
## [6,] 6.603158
tail(log.pt)
## Adj.Close
## [1,] 7.788120
## [2,] 7.783611
## [3,] 7.785314
## [4,] 7.773599
## [5,] 7.767937
## [6,] 7.759256
We estimate the values of \(X\) and \(B\) using simple linear regression in R.
lm(log.pt~time(log.pt))
##
## Call:
## lm(formula = log.pt ~ time(log.pt))
##
## Coefficients:
## (Intercept) time(log.pt)
## 6.5993 0.1283
summary(lm(log.pt~time(log.pt)))
##
## Call:
## lm(formula = log.pt ~ time(log.pt))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56388 -0.06735 0.01177 0.08625 0.21619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.599348 0.005409 1220.1 <2e-16 ***
## time(log.pt) 0.128316 0.001111 115.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1108 on 2514 degrees of freedom
## Multiple R-squared: 0.8415, Adjusted R-squared: 0.8415
## F-statistic: 1.335e+04 on 1 and 2514 DF, p-value: < 2.2e-16
The following results are obtained:
\(B = 0.1283\), \(X = 6.5993\)
We estimate the value of \(A\): \(A = \exp(6.5993) = 734.58\), giving
\(P_t = 734.58\exp[0.1283t + \theta_t], \qquad \theta_t \sim N(0, \sigma^2)\)
From the GBM model, \(X_T = X_0\exp([\mu - 0.5\sigma^2]T + \sigma W_T)\), so
\(X_0 = 734.58\) and \([\mu - 0.5\sigma^2]T = 0.1283\,T\)
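The trend regression identifies \([\mu - 0.5\sigma^2]\) but not \(\sigma\) itself. A minimal sketch of one way to complete the GBM parameter estimates is given below; it is an added suggestion, and scaling the log-return volatility by the series frequency so that it is on the same time scale as \(B\) is an assumption.
# Sketch: estimate sigma from the log returns, then recover the GBM drift mu.
# Scaling by frequency(Adj.Close.ts) to match the time scale of B = 0.1283 is an assumption.
log.returns <- diff(log(as.numeric(Adj.Close.ts)))
sigma.hat <- sd(log.returns) * sqrt(frequency(Adj.Close.ts))   # volatility per unit time
mu.hat <- 0.1283 + 0.5 * sigma.hat^2                           # from mu - 0.5*sigma^2 = B
sigma.hat
mu.hat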
plot(lm(log.pt~time(log.pt)),col='red')
Using the fitted equation, we can predict future values.
Predict.pt<-predict(lm(log.pt~time(log.pt)))
head(Predict.pt)
## 1 2 3 4 5 6
## 6.727665 6.728016 6.728368 6.728719 6.729071 6.729422
We can plot the predicted values.
plot.ts(Predict.pt,main='PREDICTED ADJUSTED CLOSING PRICES',col='red',xlab='Time',ylab='Predicted.ts')
We transform the predicted values back to the original scale by exponentiating.
ActualPredict.pt<-exp(Predict.pt)
head(ActualPredict.pt)
## 1 2 3 4 5 6
## 835.1945 835.4882 835.7820 836.0758 836.3698 836.6639
plot.ts(ActualPredict.pt,main='PREDICTED ADJUSTED CLOSING PRICES',col='red',xlab='Time',ylab='Predicted.ts')
Residuals<-residuals(lm(log.pt~time(log.pt)))
summary(Residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.56388 -0.06735 0.01177 0.00000 0.08625 0.21619
We test for normality of the residuals using the Shapiro-Wilk test.
Null hypothesis: the residuals follow a normal distribution.
Alternative hypothesis: the residuals do not follow a normal distribution.
shapiro.test(Residuals)
##
## Shapiro-Wilk normality test
##
## data: Residuals
## W = 0.94574, p-value < 2.2e-16
The p-value is less than the level of significance (0.05), so we reject the null hypothesis and conclude that the residuals do not conform to a normal distribution. This departure from normality can be examined further using a histogram and a Q-Q plot.
hist(Residuals)
qqnorm(Residuals)
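A reference line can be added to the Q-Q plot to make departures from normality in the tails easier to see; this is a small extension of the plot above.
# Re-draw the normal Q-Q plot and overlay a reference line through the quartiles.
qqnorm(Residuals)
qqline(Residuals, col='red')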