Fitting Flexible Two-Piece Distributions to the Daily Log-Returns of GME

In line with my dissertation, I have published the R code needed to fit flexible two-piece distributions to the daily log-returns to any financial asset. In this R markdown, we will investigate the daily logarithmic returns of GameStop’s stock over a 1 year period. As a brief overview, we will use both point and interval estimation, namely the method of maximum likelihood estimation and confidence intervals. We will begin by defining the likelihood function and log-likelihood function and optimise the latter to obtain the MLEs for each respective distribution. From this, we can generate a 95% (asymptotic) confidence interval for the parameters in question. Finally, we will implement the code I have written for value at risk (VaR) estimation.

We begin by loading the data set into question an begin defining several equations that we will later use.
Data
We also plot a histogram just to observe the shape of the log-returns density.

GME <- read.table("R_Diss\\GME.csv",header=T,sep=",")
attach(GME)
str(GME)

## 'data.frame':    254 obs. of  7 variables:
##  $ Date     : Factor w/ 254 levels "2020-01-28","2020-01-29",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Open     : num  4.3 4.22 4.1 3.91 3.88 4.03 4.15 4.2 4.11 3.85 ...
##  $ High     : num  4.3 4.32 4.12 3.94 3.98 4.25 4.41 4.3 4.13 4.1 ...
##  $ Low      : num  4.18 4.13 3.88 3.83 3.85 3.97 4.14 4.14 3.77 3.74 ...
##  $ Close    : num  4.21 4.13 3.93 3.84 3.95 4.07 4.18 4.14 3.81 3.94 ...
##  $ Adj.Close: num  4.21 4.13 3.93 3.84 3.95 4.07 4.18 4.14 3.81 3.94 ...
##  $ Volume   : int  2144900 1552600 3006500 2891700 2026200 3563100 2641700 1510300 2742300 2777000 ...

n=length(Adj.Close)
ret = log(Adj.Close[-1]/(Adj.Close[-n]-1))
hist(ret,breaks = 50, probability=T)

data <- log(Adj.Close[-1]/(Adj.Close[-n]-1))

Normal Distribution and VaR
We begin with the normal disribution. We first define the log-likelihood function (“log.lik”), and use the (“optim”) function to optimise the log-likelihood function. With these obtained maximum likelihood estimators, we can plot the corresponding distribution over the log-returns density.

#Normal 
log.lik <- function(par){
  mu <- par[1]; sigma <- exp(par[2])
  out <- -sum( dnorm(data, mu, sigma, log=TRUE) )
  return(out)
}
OPT <- optim(par = c(0,0), fn = log.lik)
MLE <- c(OPT$par[1], exp(OPT$par[2]))
MLE

## [1] 0.2069991 0.1403370

We use these maximum likelihood estimators obtained and plot it over the log-return densities:

fitn <- Vectorize(function(x) dnorm(x, MLE[1], MLE[2]))
hist(data, probability = T, breaks = 70)
curve(fitn,-1,1,add=T, n = 1000)

Using these MLEs we also can obtain the 5% value at risk (VaR) by taking the lower quantile of the respective distribution:

#VaR for Normal
alpha<- 0.05
conf <- 1 - alpha
VaRn95 <- qnorm(alpha, mean = MLE[1],sd = MLE[2], lower.tail = TRUE)
VaRn95

## [1] -0.02383483

We will also define a function for the confidence intervals, and obtain the CIs for the parameters of the normal distribution.

library(numDeriv)

Conf.Int <- function(FUN,MLE,level=0.95){
  sd.int <- abs(qnorm(0.5*(1-level)))
  HESS <- hessian(FUN,x=MLE)
  Fisher.Info <- solve(HESS, tol = 1e-18)
  Sigma <- (sqrt(diag(Fisher.Info)))
  U<- MLE + sd.int*Sigma
  L<- MLE - sd.int*Sigma
  C.I <- cbind(L,U,MLE, Sigma)
  rownames(C.I)<- paste0("par", seq_along(1:length(MLE)))
  colnames(C.I)<- c("Lower","Upper","Transf MLE", "Std. Error")
  return(C.I)
}

Conf.Int(FUN=log.lik,OPT$par, level = 0.95)

##           Lower      Upper Transf MLE  Std. Error
## par1  0.1897065  0.2242916  0.2069991 0.008822914
## par2 -2.0508352 -1.8765815 -1.9637084 0.044453272

Two piece normal and VaR
In a similar way, we will obtain the MLEs and CIs of the two-piece normal distribution. We use the “twopiece” package by [Rubio & Steel, 2020] (https://rpubs.com/FJRubio/TPD) for the two-piece normal distribution:

library(twopiece)
dtpn <- function(x, mu, sigma1, sigma2){
  val <- dtp3(x, mu = mu, par1 = sigma1, par2 = sigma2, FUN = dnorm, param = "tp", log = FALSE)
  return(val)}
ldtpn <- function(x, mu, sigma1, sigma2){
  val <- dtp3(x, mu = mu, par1 = sigma1, par2 = sigma2, FUN = dnorm, param = "tp", log = TRUE)
  return(val)}

log.liktpn <- function(par){
  mu <- par[1]; sigma1 <- exp(par[2]); sigma2 <- exp(par[3])
  out <- -sum(ldtpn(data, mu,sigma1,sigma2) )
  return(out)
}

OPTTPN <- optim(par = c(0,0,0), fn = log.liktpn)
MLETPN <- c(OPTTPN$par[1], exp(OPTTPN$par[2]), exp(OPTTPN$par[3]))
MLETPN

## [1] 0.2108834 0.1429300 0.1377128

fittpn <- Vectorize(function(x) dtpn(x, MLETPN[1], MLETPN[2],MLETPN[3]))
hist(data, probability = T, breaks = 80)
curve(fittpn,-1,1,add=T, n = 1000, col = 'red')

Conf.Int(FUN=log.liktpn,OPTTPN$par, level = 0.95)

##           Lower      Upper Transf MLE Std. Error
## par1  0.1765326  0.2452343  0.2108834 0.01752626
## par2 -2.1093765 -1.7814242 -1.9454003 0.08366282
## par3 -2.1510537 -1.8141156 -1.9825847 0.08595518

#VaR for dtpn
Vartpn95 <- qtp3(alpha, mu = MLETPN[1], par1 = MLETPN[2], par2 = MLETPN[3],FUN = dnorm, param = "tp")
Vartpn95

## [1] 0.2678356

t-distribution and VaR In this section, we will obtain the MLEs and CIs for the t-distribution:

#t distribution
log.likt <- function(par){
  mu <- par[1]; sigma <- exp(par[2]); nu <- exp(par[3])
  out <- -sum( dt( (data -  mu)/sigma, df = nu , log=TRUE) - log(sigma) )
  return(out)
}

OPTt <- optim(par = c(0,0,0), fn = log.likt)
MLEt <- c(OPTt$par[1], exp(OPTt$par[2]), exp(OPTt$par[3]))

From this we obtain the MLE and Confidence Interval:

MLEt

## [1] 0.2066455 0.1127515 6.2042186

Conf.Int(FUN=log.likt,OPTt$par[], level = 0.95)

##          Lower     Upper Transf MLE  Std. Error
## par1  0.190886  0.222405  0.2066455 0.008040705
## par2 -2.311717 -2.053422 -2.1825692 0.065892820
## par3  1.218013  2.432446  1.8252295 0.309809920

fitdt <- Vectorize(function(x) dt((x-MLEt[1])/MLEt[2], df = MLEt[3], log=FALSE)/MLEt[2])
hist(data, probability = T, breaks = 80)
curve(fitdt,-1,1,add=T, n = 1000, col = "red")

Here we obtain the 5% VaR:

Vart95 <- qt(alpha, MLEt[3])
Vart95

## [1] -1.931785

Two Piece t Distribution and VaR

library(twopiece)
dtpt <- function(x, mu, sigma1, sigma2, df){
  val <- dtp4(x, mu = mu, par1 = sigma1, par2 = sigma2,delta = df, FUN = dt, param = "tp", log = FALSE)
  return(val)}
ldtpt <- function(x, mu, sigma1, sigma2, df){
  val <- dtp4(x, mu = mu, par1 = sigma1, par2 = sigma2,delta = df, FUN = dt, param = "tp", log = TRUE)
  return(val)}
log.liktpt <- function(par){
  mu <- par[1]; sigma1 <- exp(par[2]); sigma2 <- exp(par[3]); df <- exp(par[4])
  out <- -sum(ldtpt(data,mu,sigma1,sigma2,df))
  return(out)}
OPTTPt <- optim(par = c(0,0,0,0), fn = log.liktpt)
MLETPt <- c(OPTTPt$par[1], exp(OPTTPt$par[2]), exp(OPTTPt$par[3]), exp(OPTTPt$par[4]))
MLETPt

## [1] 0.2150003 0.1172765 0.1069059 6.0138730

Conf.Int(FUN=log.liktpt, OPTTPt$par[], level=0.95)

##           Lower      Upper Transf MLE Std. Error
## par1  0.1746819  0.2553186  0.2150003 0.02057096
## par2 -2.3586839 -1.9277580 -2.1432209 0.10993210
## par3 -2.5137064 -1.9579069 -2.2358067 0.14178820
## par4  1.1732823  2.4148557  1.7940690 0.31673372

fittpt <- Vectorize(function(x) dtpt(x, MLETPt[1], MLETPt[2],MLETPt[3], MLETPt[4]))
hist(data, probability = T, breaks = 80)
curve(fittpt,-1,1,add=T, n = 1000, col = 'blue')

#VaR tpt
Vartpt <- qtp4(alpha,MLETPt[1],MLETPt[2],MLETPt[3],MLETPt[4], FUN = dt, param = "tp")
Vartpt

## [1] 0.2598304

TPSAS Distribution and VaR Here we will fit the two-piece sinh-arcsinh distribution. We will use the “TPSAS” package from [Rubio & Ogundimu, 2017] (https://rpubs.com/FJRubio/TPSAS) :

#TPSAS
library(TPSAS)

Again we begin by defining the log-likelihood function and optimise it to obtain the MLEs:

log.liktpsas <- function(par){
  if(par[2]>0 & abs(par[3])<1 & par[4]>0){
    return( - sum(dtpsas(data,par[1],par[2],par[3],par[4],param="eps",log=TRUE)))
  }
  else return(Inf)
}
OPTtpsas = optim(c(MLETPN,1),log.liktpsas, control = list(maxit=50000),method = c("Nelder-Mead"))
MLE.sas<- OPTtpsas$par
MLE.sas

## [1] 0.22862652 0.07742275 0.10087809 0.70388876

We use these MLEs and plot the distribution over the daily log-return density of GME.

fit.tpsas <- Vectorize(function(x) dtpsas(x, MLE.sas[1], MLE.sas[2], MLE.sas[3], MLE.sas[4], param="eps") )
hist(data, breaks = 80, probability = T)
curve(fit.tpsas, -1,1, add= T, lwd = 2, col = "blue", n = 1000)

summary(data)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.5821  0.1148  0.2145  0.2070  0.2809  0.8605

#VaR tpsas
Vartpsas95 <- qtpsas(alpha,MLE.sas[1], MLE.sas[2], MLE.sas[3], MLE.sas[4], param="eps")
Vartpsas95

## [1] -0.03336937

AIC Values
Finally we define the AIC function in order to calculate the AIC value and determine which function has the best relative fit:

AIC.n <- 2*OPT$value + 2*length(OPT$par)
AIC.TPN <- 2*OPTTPN$value + 2*length(OPTTPN$par)
AIC.t <- 2*OPTt$value + 2*length(OPTt$par)
AIC.tpt <- 2*OPTTPt$value + 2*length(OPTTPt$par)
AIC.sas <- 2*OPTtpsas$value + 2*length(OPTtpsas$par)

AIC.n

## [1] -271.6291

AIC.TPN

## [1] -269.6953

AIC.t

## [1] -295.9732

AIC.tpt

## [1] -294.1578

AIC.sas

## [1] -285.844

For further reading and the mathematical theory behind this, please refer to my undergraduate dissertation found here: