Installation of the Libraries

Libraries

For financial analysis of stock-exchange data, the following packages are widely used:

  1. quantmod
  2. PerformanceAnalytics
  3. tidyquant
#install.packages("PerformanceAnalytics", repos = "http://cran.us.r-project.org")
#install.packages("dplyr", repos = "http://cran.us.r-project.org")
#install.packages("tidyquant", repos = "http://cran.us.r-project.org")
#install.packages("quantmod", repos = "http://cran.us.r-project.org")
#install.packages("tseries", repos = "http://cran.us.r-project.org")
#install.packages("tidyverse", repos = "http://cran.us.r-project.org")



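library(PerformanceAnalytics)  # return calculation, risk measures, performance charts
library(quantmod)              # getSymbols() and Ad(): download and extract price series
library(tidyquant)             # tidyverse-style wrappers for financial data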

# other libraries

library(ggplot2)
library(dplyr)

library(tseries)
library(tidyverse)
library(plotly)
library(hrbrthemes)
library(xts)
library(knitr)
library(kableExtra)
library(car)
library(mathjaxr)
library(zoo)
rm(list=ls())

Data download

To download stock-exchange data, we use the function “getSymbols” from the quantmod package. The function “Ad” extracts only the adjusted closing prices of each day.

symbol_name <- c("AAPL", "GOOG", "AMZN", "F", "T", "TQQQ")
#symbol_name <- "AAPL"

# Download each ticker and keep only its adjusted closing price
for (i in seq_along(symbol_name)) {
  prac <- Ad(getSymbols(symbol_name[i], from = "2020-01-01", to = "2022-12-31", auto.assign = FALSE))
  if (i == 1) {
    price <- prac
  } else {
    price <- merge(price, prac)   # join on the date index
  }
}
rm(prac)    # prac is only a temporary variable
colnames(price) <- symbol_name  # label the columns with the ticker symbols
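
The accumulate-and-merge pattern of the loop can be tried without network access. The following is a minimal sketch in base R, assuming three small made-up price tables (the tickers AAA, BBB, CCC and all values are hypothetical); it uses data frames keyed by a date column rather than xts objects, and an outer join so that dates missing in one series become NA.

```r
# Hypothetical toy price tables keyed by date (values are made up)
a  <- data.frame(date = as.Date("2020-01-01") + 0:2, AAA = c(10, 11, 12))
b  <- data.frame(date = as.Date("2020-01-01") + 1:3, BBB = c(20, 21, 22))
c3 <- data.frame(date = as.Date("2020-01-01") + 0:3, CCC = c(30, 31, 32, 33))

# Outer join on the date column, applied pairwise across the list
price_toy <- Reduce(
  function(x, y) merge(x, y, by = "date", all = TRUE),
  list(a, b, c3)
)

print(price_toy)
```

Dates that are absent from one table are kept and filled with NA, which is why the return series later need a rule for missing values.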

Analysis of one asset

Above, we downloaded the adjusted closing prices. Now take the first asset from “symbol_name” and plot it. We clearly see that the stochastic properties of the asset differ significantly between the two periods (before the first vertical line and after the second one).

title <- paste(symbol_name[1], "Share" ) 
dolna_hranica <- "2020-07-01"   # lower boundary: end of the first period
horna_hranica <- "2022-07-01"   # upper boundary: start of the second period
#data(sample_matrix)
#sample.xts <- as.xts(price[,1])
events <- xts(c(" "," "),as.Date(c(dolna_hranica, horna_hranica)))
plot(price[,1], col="red", main=title)

addEventLines(events, srt=90, pos=2,col="blue")

The differences in the stochastic properties of the stock prices from the two periods can also be shown with basic descriptive statistics.

subT_D <- price["2020-01-01/2020-07-01",1]   # first period, first asset
subT_H <- price["2022-07-01/2022-12-28",1]   # second period, first asset

# Get summary statistics using summary() function
summary_data <- summary(data.frame(cbind(subT_D,subT_H)))
summary_data <- summary_data[1:6,]

# Label the columns by period
colnames(summary_data) <- c("1st period","2nd period")
summary_data
 1st period      2nd period     
 Min.   :55.00   Min.   :125.8  
 1st Qu.:69.33   1st Qu.:142.7  
 Median :75.70   Median :149.0  
 Mean   :74.21   Mean   :149.9  
 3rd Qu.:78.59   3rd Qu.:155.3  
 Max.   :90.09   Max.   :174.0  
#summary_table <- kable(summary_data)

# Customize the table using kableExtra package
#summary_table %>%
#  kable_styling(full_width = FALSE) %>%
#  add_header_above(c("Summary Statistics" = 3))

Conversion of the level data time series to the returns

Empirical experience shows that the probability distribution of prices changes over time, \(f_t(P) \neq f_{t \pm i}(P)\); that is, the price series is not stationary. For this reason we cannot apply the standard tools of probability theory to the price levels directly. Economists address the problem by working with the returns of the asset price, i.e.

\[r_t = \frac{\Delta P_t}{P_{t-1}}\] where \(\Delta P_t = P_t - P_{t-1}\). Osborne proposed an alternative formulation, the logarithmic return \[r_t = \ln(P_t) - \ln(P_{t-1}).\]
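
The two definitions can be compared on a short price vector. This is a minimal sketch in base R, assuming four made-up closing prices; it also checks the identity \(\ln(P_t) - \ln(P_{t-1}) = \ln(1 + \Delta P_t / P_{t-1})\) that links the two formulas.

```r
# Hypothetical closing prices, chosen only for illustration
P <- c(100, 102, 99, 105)

# Discrete (simple) returns: r_t = (P_t - P_{t-1}) / P_{t-1}
r_discrete <- diff(P) / head(P, -1)

# Log returns: r_t = ln(P_t) - ln(P_{t-1})
r_log <- diff(log(P))

# Side-by-side comparison; there is no return for the first observation
print(cbind(r_discrete, r_log))

# The two definitions are linked by r_log = ln(1 + r_discrete)
print(all.equal(r_log, log1p(r_discrete)))

# Log returns add up over time: their sum equals ln(P_T / P_1)
print(all.equal(sum(r_log), log(P[length(P)] / P[1])))
```

For small daily moves the two series are nearly identical; they diverge as the price changes become larger.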

First, look at the stochastic properties of the returns, i.e. compare their summary statistics in the two periods under study.


return_l <- CalculateReturns(price, method="log")
return_p <- CalculateReturns(price, method="discrete")

for(i in seq_len(ncol(return_l))){   # impute missing values with the column median
  return_l[,i][is.na(return_l[,i])] <- median(return_l[,i],na.rm = TRUE)
}

for(i in seq_len(ncol(return_p))){   # impute missing values with the column median
  return_p[,i][is.na(return_p[,i])] <- median(return_p[,i],na.rm = TRUE)
}

subT_Dr_p <- return_p["2020-01-01/2020-07-01",1]  # split return_p into two subsets
subT_Hr_p <- return_p["2022-07-01/2022-12-28",1]

# plot only the first share
title <- paste(symbol_name[1], "Share" ) 
events <- xts(c(" "," "),as.Date(c(dolna_hranica, horna_hranica)))
plot(return_p[,1], col="red", main=title)

addEventLines(events, srt=90, pos=2,col="blue")



# Get summary statistics using summary() function


summary_data <- summary(data.frame(cbind(subT_Dr_p,subT_Hr_p)))
summary_data <- summary_data[1:6,]

# Label the columns by period
colnames(summary_data) <- c("1st period","2nd period")
summary_data
 1st period         2nd period        
 Min.   :-0.12865   Min.   :-0.05868  
 1st Qu.:-0.01154   1st Qu.:-0.01458  
 Median : 0.00148   Median :-0.00195  
 Mean   : 0.00212   Mean   :-0.00038  
 3rd Qu.: 0.01876   3rd Qu.: 0.01297  
 Max.   : 0.11981   Max.   : 0.08897  
#summary_table <- kable(summary_data)

# Customize the table using kableExtra package
#summary_table %>%
#  kable_styling(full_width = FALSE) %>%
#  add_header_above(c("Summary Statistics" = 3))

At first glance, the returns in the two periods appear to be similarly distributed. Let us at least test the equality of the mean returns,

\[H_0: M_{1st Period} = M_{2nd Period}\]

against

\[H_1: M_{1st Period} \neq M_{2nd Period},\] and also

\[H_0: \sigma^2_{1st Period} = \sigma^2_{2nd Period}\]

against

\[H_1: \sigma^2_{1st Period} \neq \sigma^2_{2nd Period},\]


# Wilcoxon rank sum (Mann-Whitney) test for a shift in location
resultM <- wilcox.test(as.vector(subT_Dr_p), as.vector(subT_Hr_p))

# F test for the equality of two variances
resultSigma <- var.test(as.vector(subT_Dr_p), as.vector(subT_Hr_p))

resultM

    Wilcoxon rank sum test with continuity correction

data:  as.vector(subT_Dr_p) and as.vector(subT_Hr_p)
W = 8575, p-value = 0.2239
alternative hypothesis: true location shift is not equal to 0
resultSigma

    F test to compare two variances

data:  as.vector(subT_Dr_p) and as.vector(subT_Hr_p)
F = 2.1779, num df = 125, denom df = 124, p-value = 1.881e-05
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.530087 3.099320
sample estimates:
ratio of variances 
          2.177942 

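The p-value of the Wilcoxon test (see footnote 1) is 0.2238738, while the p-value of the F test of equal variances is \(1.88054 \times 10^{-5}\). At the 5% significance level we therefore do not reject the equality of the locations, but we do reject the equality of the variances.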
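
The same pair of tests can be tried on simulated data where the truth is known. This is a minimal sketch, assuming two normally distributed samples with equal means and different volatilities (the sample sizes, standard deviations and the seed are arbitrary choices for illustration, not market data).

```r
set.seed(1)  # fixed seed so the simulated samples are reproducible

# Simulated daily returns: same mean, but the first sample
# has twice the standard deviation of the second
x <- rnorm(126, mean = 0, sd = 0.04)
y <- rnorm(125, mean = 0, sd = 0.02)

# Rank-based test of a location shift between two independent samples
res_location <- wilcox.test(x, y)

# F test of the null hypothesis that the ratio of variances equals 1
res_variance <- var.test(x, y)

cat("Wilcoxon p-value:", res_location$p.value, "\n")
cat("F test p-value:  ", res_variance$p.value, "\n")
cat("Variance ratio:  ", res_variance$estimate, "\n")
```

The F test is known to be sensitive to departures from normality, so with heavy-tailed returns its p-value should be read with some caution.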

Normally distributed returns?

Many theories (Markowitz, Black-Scholes, VaR) assume normally distributed data. In practice, the assumption is often violated: the empirical distribution has heavier tails than the normal distribution. See the result of the following Jarque-Bera normality test. Note that the call below is applied to the price series price[,1]; to test the returns themselves, pass return_p[,1] instead.

test_result <- jarque.bera.test(price[,1])
print(test_result)

    Jarque Bera Test

data:  price[, 1]
X-squared = 56.844, df = 2, p-value = 4.534e-13
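
The Jarque-Bera statistic combines the sample skewness \(S\) and kurtosis \(K\) as \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\) and is compared with a \(\chi^2\) distribution with 2 degrees of freedom. The following is a minimal sketch of that calculation in base R; the helper name jb_stat and the simulated samples are assumptions made for illustration.

```r
# Hypothetical helper: Jarque-Bera statistic from sample moments
jb_stat <- function(x) {
  x <- as.numeric(x)
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                 # second central moment (divisor n)
  skew <- mean(d^3) / m2^1.5      # sample skewness
  kurt <- mean(d^4) / m2^2        # sample kurtosis (3 for a normal law)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  p <- pchisq(stat, df = 2, lower.tail = FALSE)
  list(statistic = stat, p.value = p, skewness = skew, kurtosis = kurt)
}

set.seed(42)
normal_sample <- rnorm(500)        # simulated normal data
heavy_sample  <- rt(500, df = 3)   # simulated heavy-tailed data (Student t)

print(jb_stat(normal_sample)$p.value)
print(jb_stat(heavy_sample)$p.value)
```

A large excess kurtosis, as is typical for heavy-tailed data, inflates the statistic and drives the p-value toward zero.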

Histograms

dev.new() # open a new plot window
par(mfrow=c(2,3)) # arrange plots in a grid of 2 rows and 3 columns
for(i in seq_len(ncol(return_p))){ # iterate through each symbol
  etf <- return_p[,i] # returns of the i-th asset; the column keeps its ticker name
  chart.Histogram( etf, main=paste(colnames(etf), "Return Distribution"),
    breaks=15, methods=c("add.normal"), # overlay the fitted normal curve
    colorset=c("steelblue", "darkgreen", "navy") # colors for each (middle color not used)
  )
}


  chart.Histogram( return_p[,1], main= paste(" Return Distribution of ",colnames(return_p[,1])), 
    breaks=15, methods=c("add.normal", "add.risk"), # Add the normal curve, and the VaR levels
    colorset=c("steelblue", "darkgreen", "navy") # colors for each (middle color not used)
  )

Correlation structure of the portfolio assets

To manage risk, an investment needs to be diversified effectively; in this way, we can reduce the unsystematic risk. The correlations within our hypothetical portfolio are shown in the following figure.

#chart.Correlation(return_p[2:ncol(return_p)])
chart.Correlation(return_p)
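
Why the correlations matter for diversification can be seen in a two-asset portfolio, whose variance is \(\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2\). The following is a minimal sketch, assuming equal weights and two hypothetical assets with the same daily volatility of 2%; the function name portfolio_sd is an illustrative choice.

```r
# Hypothetical helper: standard deviation of a two-asset portfolio
portfolio_sd <- function(w, sd, rho) {
  # Build the 2x2 covariance matrix from volatilities and correlation
  Sigma <- matrix(c(sd[1]^2,             rho * sd[1] * sd[2],
                    rho * sd[1] * sd[2], sd[2]^2), nrow = 2)
  sqrt(as.numeric(t(w) %*% Sigma %*% w))
}

w   <- c(0.5, 0.5)     # equal weights (assumption)
vol <- c(0.02, 0.02)   # identical volatilities (assumption)

# Portfolio volatility for decreasing correlation between the assets
for (rho in c(1, 0.5, 0)) {
  cat("rho =", rho, " portfolio sd =", portfolio_sd(w, vol, rho), "\n")
}
```

With perfect correlation the portfolio is as risky as each asset alone; the lower the correlation, the more of the individual risk cancels out.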

Estimation of the variance-covariance matrix of the portfolio returns

The variance-covariance matrix is a central topic of portfolio theory. It contains the return variances on the main diagonal and the covariances off the diagonal. In the correlation matrix, the diagonal terms are 1’s and the off-diagonal terms are the corresponding correlations.

var_covar_p <- cov(return_p)
var_covar_p
             AAPL         GOOG         AMZN            F            T         TQQQ
AAPL 0.0005405816 0.0003660638 0.0003793896 0.0003066270 0.0001693079 0.0011384589
GOOG 0.0003660638 0.0004681645 0.0003637674 0.0002879402 0.0001535278 0.0010263473
AMZN 0.0003793896 0.0003637674 0.0006053833 0.0002239328 0.0001049222 0.0010747271
F    0.0003066270 0.0002879402 0.0002239328 0.0009734113 0.0002524982 0.0008633685
T    0.0001693079 0.0001535278 0.0001049222 0.0002524982 0.0003164058 0.0004406637
TQQQ 0.0011384589 0.0010263473 0.0010747271 0.0008633685 0.0004406637 0.0030440217
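
The relation between the covariance and the correlation matrix can be checked numerically. This is a minimal sketch in base R, assuming simulated returns for three hypothetical assets A, B and C; it converts the covariance matrix with cov2cor() and rebuilds it from the correlations and the volatilities as \(\Sigma = D \, \mathrm{P} \, D\), where \(D\) is the diagonal matrix of standard deviations and \(\mathrm{P}\) is the correlation matrix.

```r
set.seed(7)

# Simulated returns for three hypothetical assets
R <- cbind(A = rnorm(250, sd = 0.01),
           B = rnorm(250, sd = 0.02),
           C = rnorm(250, sd = 0.03))
R[, "B"] <- R[, "B"] + 0.5 * R[, "A"]   # make A and B correlated

Sigma <- cov(R)            # variances on the diagonal, covariances elsewhere
vol   <- sqrt(diag(Sigma)) # standard deviations of the individual assets
Rho   <- cov2cor(Sigma)    # correlation matrix with 1's on the diagonal

print(round(Rho, 3))

# cov2cor() agrees with computing the correlations directly
print(all.equal(Rho, cor(R)))

# The covariance matrix is recovered from correlations and volatilities
D <- diag(vol)
print(all.equal(D %*% Rho %*% D, Sigma, check.attributes = FALSE))
```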

  1. R calls this test the Wilcoxon rank sum test; it is equivalent to the Mann-Whitney test, which compares the location of two independent (unpaired) samples.

