Testing for normality of financial returns

Because the dates were imported as character strings rather than as dates, the Date columns also had to be converted to the Date format.

Once all the data had been imported and the Date columns formatted correctly, the two data sets could be merged.
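As an illustration of these two steps, here is a minimal sketch on two tiny data frames standing in for the CSV files. The prices and the names sp, jnj and merged are made up for the example and are not the assignment data. It shows that read.csv-style character dates become class Date after conversion, and that merge with all = FALSE keeps only the dates present in both sets, adding the suffixes .x and .y to the shared Adj.Close column.

```r
# Made-up data standing in for SP.csv and JNJ.csv (hypothetical values)
sp  <- data.frame(Date = c("2018-03-27", "2018-03-28", "2018-03-29"),
                  Adj.Close = c(2612.6, 2605.0, 2640.9),
                  stringsAsFactors = FALSE)
jnj <- data.frame(Date = c("2018-03-28", "2018-03-29", "2018-04-02"),
                  Adj.Close = c(126.1, 128.2, 125.9),
                  stringsAsFactors = FALSE)

class(sp$Date)   # "character": the dates are still plain strings

# Convert ISO "YYYY-MM-DD" strings to Date objects
sp$Date  <- as.Date(sp$Date,  format = "%Y-%m-%d")
jnj$Date <- as.Date(jnj$Date, format = "%Y-%m-%d")

# all = FALSE is an inner join: only dates found in BOTH frames are kept.
# The shared column Adj.Close becomes Adj.Close.x (from x) and Adj.Close.y (from y).
merged <- merge(x = sp, y = jnj, by = "Date", all = FALSE)
print(merged)
```

Here only 28 and 29 March survive the merge, because 27 March is missing from jnj and 2 April is missing from sp.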

For the purpose of this exercise, the date range 9/4/2014 to 29/3/2018 was chosen, giving 1001 daily prices for each series and therefore 1000 daily log returns.

#SETUP

#set working directory 
setwd('C:/Users/tesso/Documents/R Assignment')

#data ranges from 29/3/2018 to 9/4/2014

#import S&P 500 data
SP<-read.csv(file="SP.csv", stringsAsFactors = FALSE)
#convert the character Date column to class Date
SP$Date<-as.Date(SP$Date, format = "%Y-%m-%d")


#import Johnson & Johnson data
JNJ<-read.csv(file="JNJ.csv", stringsAsFactors = FALSE)
#convert the character Date column to class Date
JNJ$Date<-as.Date(JNJ$Date, format = "%Y-%m-%d")


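#Merge SP and JNJ by Date (all = FALSE keeps only dates present in both);
#shared column names get the suffix .x for SP and .y for JNJ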
New_1<-merge(x=SP,y=JNJ, by="Date", all=FALSE)

As the merged data may not be in date order, the rows must first be sorted by date.

This matters when calculating the log returns: each return is the difference between the log prices on consecutive trading days, so the rows must be in sequence for those subtractions to be correct.
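To see why the ordering matters, the sketch below uses four made-up prices (the prices data frame is hypothetical) stored out of order. After sorting newest first, as Data_ordered does with decreasing = TRUE, row i minus row i+1 is the log price on one day minus the log price on the previous trading day, i.e. the daily log return ln(P_t / P_t-1).

```r
# Hypothetical prices for four consecutive trading days, deliberately shuffled
prices <- data.frame(
  Date  = as.Date(c("2018-03-27", "2018-03-29", "2018-03-26", "2018-03-28")),
  Price = c(101, 103, 100, 102)
)

# Sort newest first, mirroring order(..., decreasing = TRUE) in the text
prices <- prices[order(prices$Date, decreasing = TRUE), ]

# Row i minus row i+1 is now "today" minus "previous trading day"
n <- nrow(prices)
prices$R <- NA_real_
for (i in 1:(n - 1)) {
  prices$R[i] <- log(prices$Price[i]) - log(prices$Price[i + 1])
}
print(prices)   # the oldest row keeps R = NA: it has no earlier price
```

Without the sort, the same loop would subtract prices from non-adjacent days (here 27 March minus 29 March), producing meaningless returns.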


#sort newest first, then keep the 1001 most recent trading days
Data_ordered<-New_1[order(New_1$Date, decreasing = TRUE),]
Data_1001<-Data_ordered[1:1001,]

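After the initial setup was complete, a for-loop could be used to calculate the log returns. Each daily log return is the log adjusted closing price on one trading day minus the log adjusted closing price on the previous trading day.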


n_1<-nrow(Data_1001)

#create new column for S&P 500 log returns and initialise to NA
Data_1001$R_SPt<-NA

#rows are newest first, so row i minus row i+1 is the daily log return;
#the last (oldest) row has no earlier price and stays NA
#Adj.Close.x holds the S&P 500 prices (SP was the x argument of merge)
for(i in 1:(n_1-1)) {
  Data_1001$R_SPt[i] <- log(Data_1001$Adj.Close.x[i]) - log(Data_1001$Adj.Close.x[i+1])
}

#create new column for Johnson & Johnson log returns and initialise to NA
Data_1001$R_t<-NA

#same calculation on Adj.Close.y, the Johnson & Johnson prices
for(j in 1:(n_1-1)) {
  Data_1001$R_t[j] <- log(Data_1001$Adj.Close.y[j]) - log(Data_1001$Adj.Close.y[j+1])
}

View(Data_1001)
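The loop can also be written without explicit iteration. The sketch below, on a short hypothetical vector p of newest-first prices, checks that the loop and a vectorised version built on base R's diff() give identical results. Because diff(log(p)) returns log(p[i+1]) - log(p[i]), it is negated and padded with NA for the oldest observation.

```r
# Hypothetical newest-first adjusted closing prices (stand-ins for Adj.Close.x)
p <- c(105, 104, 106, 103, 100)

# Loop version, as in the text: r_i = log(p_i) - log(p_{i+1}); oldest row stays NA
r_loop <- rep(NA_real_, length(p))
for (i in seq_len(length(p) - 1)) {
  r_loop[i] <- log(p[i]) - log(p[i + 1])
}

# Vectorised version: negate diff(log(p)) and append NA for the oldest price
r_vec <- c(-diff(log(p)), NA)

all.equal(r_loop, r_vec)   # TRUE
```

Applied to Data_1001, the same idea would read c(-diff(log(Data_1001$Adj.Close.x)), NA): 1001 prices give 1000 returns plus one NA, which is why row 1001 is dropped before testing.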

The log returns were then compared with a normal distribution to check whether they are consistent with normality.

Three tests were chosen for this comparison: the Jarque-Bera, Cramer-von Mises and Anderson-Darling tests.

In each case the first test compares the S&P 500 index returns with a normal distribution, and the second compares the Johnson & Johnson returns.
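Of the three tests that follow, the Jarque-Bera statistic is simple enough to compute directly. Below is a minimal base-R sketch of its textbook form, JB = n/6 * (S^2 + (K - 3)^2 / 4), where S and K are the moment-based sample skewness and kurtosis (0 and 3 for a normal distribution). The names jb_stat, normal_like and heavy_tail are hypothetical, and the data are simulated rather than the S&P 500 or Johnson & Johnson returns. The p-value here uses the asymptotic chi-squared distribution with 2 degrees of freedom; normtest's jb.norm.test may compute its p-value differently (for example by simulation), so the two need not agree exactly.

```r
# Textbook Jarque-Bera statistic from moment-based skewness and kurtosis
jb_stat <- function(x) {
  x  <- x[!is.na(x)]          # drop NA returns, e.g. the oldest row
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)             # second central moment (1/n version)
  S  <- mean(d^3) / m2^1.5    # sample skewness: 0 for a normal
  K  <- mean(d^4) / m2^2      # sample kurtosis: 3 for a normal
  JB <- n / 6 * (S^2 + (K - 3)^2 / 4)
  c(skewness = S, kurtosis = K, JB = JB,
    p_asymptotic = pchisq(JB, df = 2, lower.tail = FALSE))
}

set.seed(1)
normal_like <- rnorm(1000, mean = 0, sd = 0.01)  # simulated normal "returns"
heavy_tail  <- 0.01 * rt(1000, df = 3)           # simulated fat-tailed "returns"

round(jb_stat(normal_like), 4)
round(jb_stat(heavy_tail), 4)
```

Fat tails push the sample kurtosis well above 3, and because the (K - 3)^2 term is multiplied by n/24, even a moderate excess kurtosis yields a large JB value with 1000 observations.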


library(fitdistrplus)
#> Loading required package: MASS
#> Loading required package: survival
library(metRology)
#> 
#> Attaching package: 'metRology'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
library(normtest)
library(goftest)



#drop row 1001 (the oldest date), whose log returns are NA;
#R_SPt holds the S&P 500 log returns and R_t the Johnson & Johnson log returns
Data_1000<-Data_1001[-1001,]

attach(Data_1000)

n<-nrow(Data_1000)

#compute a Jarque-Bera test for normality
jb.norm.test(R_SPt)
#> 
#>  Jarque-Bera test for normality
#> 
#> data:  R_SPt
#> JB = 564.57, p-value < 2.2e-16
jb.norm.test(R_t)
#> 
#>  Jarque-Bera test for normality
#> 
#> data:  R_t
#> JB = 482.03, p-value < 2.2e-16

#compute a Cramer-von Mises test; with null = "pnorm" and no mean/sd given,
#the null distribution is the standard normal N(0, 1)
cvm.test(R_SPt, null = "pnorm")
#> 
#>  Cramer-von Mises test of goodness-of-fit
#>  Null hypothesis: Normal distribution
#> 
#> data:  R_SPt
#> omega2 = 81.658, p-value = 0.003435
cvm.test(R_t, null="pnorm")
#> 
#>  Cramer-von Mises test of goodness-of-fit
#>  Null hypothesis: Normal distribution
#> 
#> data:  R_t
#> omega2 = 81.322, p-value = 0.003383

#compute an Anderson-Darling test; as above, null = "pnorm" means N(0, 1)
ad.test(R_SPt, null = "pnorm")
#> 
#>  Anderson-Darling test of goodness-of-fit
#>  Null hypothesis: Normal distribution
#> 
#> data:  R_SPt
#> An = 379.59, p-value = 6e-07
ad.test(R_t, null="pnorm")
#> 
#>  Anderson-Darling test of goodness-of-fit
#>  Null hypothesis: Normal distribution
#> 
#> data:  R_t
#> An = 378.25, p-value = 6e-07

Three tests were used to compare the S&P 500 index and Johnson & Johnson stock returns with a normal distribution. The Jarque-Bera test compares the sample skewness and kurtosis of each series with those of the normal distribution (skewness 0, kurtosis 3). The Cramer-von Mises and Anderson-Darling tests, in contrast, measure how well the whole empirical distribution fits the normal distribution. Note that, as called above with null = "pnorm" and no mean or sd arguments, these two tests compare the returns with the standard normal N(0, 1) rather than with a normal distribution using the sample mean and standard deviation. At the 5% significance level, the null hypothesis that the returns are normally distributed is rejected in every case, as all p-values are well below 0.05.
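The N(0, 1) point explains the size of the Cramer-von Mises statistics above. Daily returns are tiny relative to a standard deviation of 1, so pnorm maps almost every return to about 0.5, and the statistic then lands near n/12 (about 83.3 for n = 1000), close to the reported omega2 values of roughly 81. The base-R sketch below computes the statistic W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i - 1)/(2n))^2 on simulated normal returns (cvm_stat, r, w_std and w_fit are hypothetical names) against both N(0, 1) and a normal with the sample mean and sd plugged in. It shows statistics only: with estimated parameters the usual p-values no longer apply without adjustment. Recent versions of goftest document an estimated = TRUE argument for cvm.test and ad.test for this case; check the package documentation for your installed version.

```r
# Cramer-von Mises statistic for a fully specified continuous CDF
cvm_stat <- function(x, cdf) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  u <- cdf(x)                                # CDF evaluated at order statistics
  1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(42)
r <- rnorm(1000, mean = 0, sd = 0.01)  # simulated, genuinely normal "returns"

# Against N(0, 1): tiny returns all map to roughly 0.5, so W^2 is near n/12
w_std <- cvm_stat(r, pnorm)

# Against a normal with the sample mean and sd plugged in
w_fit <- cvm_stat(r, function(q) pnorm(q, mean = mean(r), sd = sd(r)))

c(n_over_12 = length(r) / 12, w_std = w_std, w_fit = w_fit)
```

Even though r is drawn from a normal distribution, w_std is huge while w_fit is small, so a rejection against N(0, 1) on its own says little about the shape of the return distribution; the Jarque-Bera result, which depends only on skewness and kurtosis, is not affected by this scale issue.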