Data analysis involves examining available data to identify patterns and trends that can be used for future predictions and decision making. The Russell 2000 index tracks 2000 small-cap companies in the United States; market capitalisation is the total value of a company's outstanding shares of stock. The historical data obtained cover a ten-year period (22nd November 2011 to 22nd November 2021), and the index level at the time of download was 2343.50. The data are obtained from the Yahoo Finance website.
The objectives of the analysis are: to obtain the trend of the time series data and analyse it extensively; to fit a non-linear trend to the time series and study the behaviour of the residuals resulting from the trend; to obtain log returns and fit a geometric Brownian motion (GBM) process; and to study the residual properties resulting from the GBM process.
The data are for the Russell 2000 index, obtained from the Yahoo Finance website. The data were downloaded on 22nd November 2021 and cover a period of 10 years (22nd November 2011 to 22nd November 2021); the index level at the time of download was 2343.50. The computations are carried out in R. The data are downloaded, cleaned and arranged in date order, saved in CSV format, and then imported into R using the read.csv command. The data set has 2516 observations of seven variables, namely: Date, opening price, high price, low price, closing price, adjusted closing price and trade volume.
setwd("C:\\Users\\Peter Mokua\\Desktop\\SAC 703")
RUSSEL2000<-read.csv("RUSSEL2000.CSV", sep=",", header=TRUE)
dim(RUSSEL2000)
## [1] 2516 7
attach(RUSSEL2000)
names(RUSSEL2000)
## [1] "Date" "Open" "High" "Low" "Close" "Adj.Close"
## [7] "Volume"
head(RUSSEL2000)
tail(RUSSEL2000)
We are interested in the adjusted closing prices. We subset this column and view its head and tail.
Adj.Close<-subset(RUSSEL2000,select = c(6))
head(Adj.Close)
tail(Adj.Close)
We check whether the adjusted closing prices are stored as a data frame or a time series. If they turn out to be a data frame, we convert them to a time series.
class(Adj.Close)
## [1] "data.frame"
Adj.Close.ts<-ts(Adj.Close, frequency=365)
class(Adj.Close.ts)
## [1] "ts"
The data has been converted to a time series for analysis.
We load the e1071 package, which provides the skewness and kurtosis functions.
require(e1071)
## Loading required package: e1071
Checking for missing values in the data.
sum(is.na(Adj.Close.ts))
## [1] 0
There are no missing values in the data. We proceed to check the dimensions and summary statistics of the data.
dim(Adj.Close.ts)
## [1] 2516 1
summary(Adj.Close.ts)
## Adj.Close
## Min. : 666.2
## 1st Qu.:1114.6
## Median :1264.1
## Mean :1351.1
## 3rd Qu.:1552.4
## Max. :2442.7
var(Adj.Close.ts)
## Adj.Close
## Adj.Close 148013.8
skewness(Adj.Close.ts)
## [1] 0.7901006
kurtosis(Adj.Close.ts)
## [1] 0.4045816
plot(Adj.Close.ts,main='PLOT OF ADJUSTED CLOSING STOCK PRICES 2011-2021',col='red',ylab='Adj.Close',xlab='TIME(years)')
The graph confirms that the Russell 2000 has an upward trend, with some outlying dips towards the lower values. The series does not appear to be stationary; this can be confirmed using the ACF. The data are first decomposed to separate the observed series into trend, seasonal and error components.
Trend<-decompose(Adj.Close.ts)
plot(Trend,col='red')
Both the observed and trend components display the upward trend noted earlier.
Using the ACF, we confirm whether the time series has a trend.
acf(Adj.Close.ts,lag.max = 40)
The ACF decays very slowly and does not cut off to zero, indicating the presence of a trend (non-stationarity).
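As a complementary check beyond the ACF inspection (this is an added suggestion, not part of the original analysis), a formal unit-root test can be applied. A minimal sketch, assuming the tseries package is available:
require(tseries)
# Augmented Dickey-Fuller test: the null hypothesis is that the series has a unit root
# (i.e. is non-stationary); a large p-value is consistent with the trend seen in the ACF.
adf.test(as.numeric(Adj.Close.ts))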
We assume the data can be modelled by a GBM process. A GBM for a random process \(X\) is specified by
\(dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, \qquad \mu, \sigma > 0\)
We obtain the solution of the process using Ito's formula,
\(df = f_t\,dt + f_x\,dx + 0.5 f_{xx}\,(dx)^2\)
We get the solution,
\(X_T = X_0\exp\left([\mu - 0.5\sigma^2]T + \sigma W_T\right)\)
\(\log X_T \sim N\left(\log X_0 + [\mu - 0.5\sigma^2]T,\; \sigma^2 T\right)\)
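To illustrate the closed-form solution, a short simulation sketch can be written in R. The parameter values below (mu, sigma, X0 and the number of steps) are assumed for illustration only and are not estimated from the Russell 2000 data.
# Simulate one GBM path from the exact solution X_t = X0*exp((mu - 0.5*sigma^2)*t + sigma*W_t).
# mu, sigma, X0 and n are illustrative assumptions, not fitted values.
set.seed(123)
mu <- 0.10                                        # drift
sigma <- 0.20                                     # volatility
X0 <- 100                                         # starting value
n <- 252                                          # number of time steps
dt <- 1/n
W <- cumsum(rnorm(n, mean = 0, sd = sqrt(dt)))    # Brownian motion path W_t
t <- (1:n) * dt
X <- X0 * exp((mu - 0.5*sigma^2)*t + sigma*W)
plot.ts(X, main='SIMULATED GBM PATH', col='blue', xlab='Time', ylab='X')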
We assume the trend of the process is non-linear and we try to fit a non-linear equation given by
\(P_t = A\exp[Bt + \theta_t], \qquad \theta_t \sim N(0, \sigma^2)\)
We convert the non-linear equation to a linear one by taking natural logarithms,
\(\log P_t = \log A + Bt + \theta_t\)
Writing \(Y = \log P_t\) and \(X = \log A\), this becomes
\(Y = X + Bt + \theta_t\)
where \(\theta_t\) is a residual term. We transform the adjusted closing prices to logs.
log.pt<-log(Adj.Close.ts)
head(log.pt)
## Adj.Close
## [1,] 6.545723
## [2,] 6.513734
## [3,] 6.501530
## [4,] 6.548076
## [5,] 6.545493
## [6,] 6.603158
tail(log.pt)
## Adj.Close
## [1,] 7.788120
## [2,] 7.783611
## [3,] 7.785314
## [4,] 7.773599
## [5,] 7.767937
## [6,] 7.759256
We estimate the values of \(X\) and \(B\) using simple linear regression in R.
lm(log.pt~time(log.pt))
##
## Call:
## lm(formula = log.pt ~ time(log.pt))
##
## Coefficients:
## (Intercept) time(log.pt)
## 6.5993 0.1283
summary(lm(log.pt~time(log.pt)))
##
## Call:
## lm(formula = log.pt ~ time(log.pt))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56388 -0.06735 0.01177 0.08625 0.21619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.599348 0.005409 1220.1 <2e-16 ***
## time(log.pt) 0.128316 0.001111 115.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1108 on 2514 degrees of freedom
## Multiple R-squared: 0.8415, Adjusted R-squared: 0.8415
## F-statistic: 1.335e+04 on 1 and 2514 DF, p-value: < 2.2e-16
The following results are obtained:
\(B = 0.1283\), \(X = 6.5993\)
We estimate the value of \(A\): \(A = \exp(6.5993) = 734.58\), giving
\(P_t = 734.58\exp[0.1283t + \theta_t], \qquad \theta_t \sim N(0, \sigma^2)\)
From the GBM model, \(X_T = X_0\exp([\mu - 0.5\sigma^2]T + \sigma W_T)\), so
\(X_0 = 734.58\) and \([\mu - 0.5\sigma^2]T = 0.1283\,T\)
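The trend regression identifies \([\mu - 0.5\sigma^2]\) but not \(\sigma\) itself. A minimal sketch of one way to complete the GBM parameter estimates is given below; it is an added suggestion, and scaling the log-return volatility by the series frequency so that it is on the same time scale as \(B\) is an assumption.
# Sketch: estimate sigma from the log returns, then recover the GBM drift mu.
# Scaling by frequency(Adj.Close.ts) to match the time scale of B = 0.1283 is an assumption.
log.returns <- diff(log(as.numeric(Adj.Close.ts)))
sigma.hat <- sd(log.returns) * sqrt(frequency(Adj.Close.ts))   # volatility per unit time
mu.hat <- 0.1283 + 0.5 * sigma.hat^2                           # from mu - 0.5*sigma^2 = B
sigma.hat
mu.hat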
plot(lm(log.pt~time(log.pt)),col='red')
Using the fitted equation, we can predict future values.
Predict.pt<-predict(lm(log.pt~time(log.pt)))
head(Predict.pt)
## 1 2 3 4 5 6
## 6.727665 6.728016 6.728368 6.728719 6.729071 6.729422
We can plot the predicted values.
plot.ts(Predict.pt,main='PREDICTED ADJUSTED CLOSING PRICES',col='red',xlab='Time',ylab='Predicted.ts')
We transform the predicted values back to the original scale by exponentiating.
ActualPredict.pt<-exp(Predict.pt)
head(ActualPredict.pt)
## 1 2 3 4 5 6
## 835.1945 835.4882 835.7820 836.0758 836.3698 836.6639
plot.ts(ActualPredict.pt,main='PREDICTED ADJUSTED CLOSING PRICES',col='red',xlab='Time',ylab='Predicted.ts')
Residuals<-residuals(lm(log.pt~time(log.pt)))
summary(Residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.56388 -0.06735 0.01177 0.00000 0.08625 0.21619
We test for normality of the residuals using the Shapiro-Wilk test.
Null hypothesis: the residuals follow a normal distribution.
Alternative hypothesis: the residuals do not follow a normal distribution.
shapiro.test(Residuals)
##
## Shapiro-Wilk normality test
##
## data: Residuals
## W = 0.94574, p-value < 2.2e-16
The p-value is less than the level of significance (0.05), so we reject the null hypothesis and conclude that the residuals do not conform to a normal distribution. This departure from normality can be examined further using a histogram and a Q-Q plot.
hist(Residuals)
qqnorm(Residuals)
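A reference line can be added to the Q-Q plot to make departures from normality in the tails easier to see; this is a small extension of the plot above.
# Re-draw the normal Q-Q plot and overlay a reference line through the quartiles.
qqnorm(Residuals)
qqline(Residuals, col='red')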