Objective

This is a stock price analysis report for Toronto-Dominion Bank (TD) from 2000 to 2020. The objective of the report is to provide suggestions based on two risk measures, Value at Risk and Expected Shortfall. The report consists of three parts: 1) univariate data analysis, 2) multivariate data analysis, and 3) time series data analysis.

Univariate Data Analysis

Data Import and Model Transformation

The data set was downloaded from Yahoo Finance. We loaded the data and plotted the time series to check stationarity, and found an increasing trend indicating non-stationarity. To better analyze the stock data, we calculated the return rate (the current day's adjusted close divided by the previous day's adjusted close) and applied a log transformation. In the time series of log returns we can see two large fluctuations; these correspond to the two specific time frames that we discuss later.

TD <- read.csv(file = 'TD.csv')
plot.ts(TD$Adj.Close)                 # adjusted close shows a clear upward trend

n=length(TD$Adj.Close)
X=log(TD$Adj.Close[2:n]/TD$Adj.Close[1:(n-1)])   # daily log returns
plot.ts(X)                            # two volatility clusters stand out

Normality

As with most financial data, the normal QQ plot looks more like a curve than a straight line, meaning the data do not meet normality.

n=length(X);
library(evir);
qqnorm(X);

Model Fitting

We drew a mean excess plot of the data and compared it with mean excess plots simulated from the exponential, Weibull, and t distributions. After the comparison, we found that our data fit a t-distribution with 3 degrees of freedom. For further confirmation, we plotted the sample quantiles (data) against the true quantiles (t with df = 3); the points fall close to a straight line, indicating a good fit.

pvector=seq(1,n,by=1)/(n+1);
true.quantiles=qt(pvector,3); # qt stands for quantiles of t distribution
sample.quantiles=sort(X);
plot(true.quantiles,sample.quantiles);
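
The mean excess comparison described above is not included in the code shown; the following is a minimal sketch of how it could be reproduced, assuming simulated exponential and t(3) samples of the same size (a Weibull comparison would be analogous, with some chosen shape parameter).

# Minimal sketch (assumption: compare the sample mean excess plot with simulated references)
par(mfrow=c(1,3))
meplot(X)                 # sample mean excess plot of the TD log returns (evir)
meplot(rexp(n))           # reference: exponential data (roughly flat mean excess)
meplot(rt(n,df=3))        # reference: t with 3 df (increasing mean excess, heavy tail)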

Value at Risk

In the next step, we estimated the location and scale of the data in two ways: one using the degrees of freedom estimated by R (fitdistr), the other using the degrees of freedom we chose ourselves. We then calculated the Value at Risk in three ways: 1) using the df provided by R, 2) using the df estimated by us, and 3) a non-parametric method. For the t-distribution, the Value at Risk takes the form \(VaR_X(p) = m + s\,T^{-1}(1-p)\), where \(T^{-1}\) is the quantile function of the t distribution, and we used \(p=0.05\) and \(p=0.01\). Appendix (1) provides the sample code for model 1 and model 3 (non-parametric); a minimal sketch follows.
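
The sketch below is not the appendix code itself; it assumes p1 = 0.05 and p2 = 0.01, with m1, s1 and df1 denoting the location, scale and degrees of freedom returned by fitdistr, and it keeps the upper-tail convention used in the code elsewhere in this report.

# Minimal sketch (assumptions: p1, p2 as above; parameters taken from fitdistr)
library(MASS)
p1=0.05; p2=0.01
fit.t<-fitdistr(X,'t')
m1=fit.t$estimate['m']; s1=fit.t$estimate['s']; df1=fit.t$estimate['df']

# model 1: parametric t VaR, VaR(p) = m + s*T^{-1}(1-p), df estimated by R
VaR.1.005=m1+s1*qt(1-p1,df1)
VaR.1.001=m1+s1*qt(1-p2,df1)

# model 3: non-parametric VaR from the empirical quantile of the returns
VaR.3.005=quantile(X,1-p1)
VaR.3.001=quantile(X,1-p2)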

VaR <- matrix(c(0.02326914, 0.04524155,0.04727082,0.04727082, 0.02296376,0.07998851),
              ncol=2,byrow=TRUE)
colnames(VaR) <- c("VaR(p=0.05)","VaR(p=0.01)")
rownames(VaR) <- c("model 1","model 2","model 3")
VaR <- as.table(VaR)
VaR
##         VaR(p=0.05) VaR(p=0.01)
## model 1  0.02326914  0.04524155
## model 2  0.04727082  0.04727082
## model 3  0.02296376  0.07998851

Since Value at Risk estimates the amount of assets that must be set aside (and therefore frozen) to cover possible losses, we prefer the smallest Value at Risk among the three methods. The non-parametric approach gives the lowest VaR; however, because it does not rely on any distributional assumption, it is a less specific choice even when a parametric model is valid. We therefore consider the parametric approaches first, and model 1 is the most appropriate model so far.

Backtesting

Next, we backtested the VaR estimates to check their accuracy and effectiveness. For each model we report the violation rate and a confidence interval at both the 5% and the 1% significance level.

We calculated the violation rates as follows (a sketch of the computation is given after the table):

Violation <- matrix(c( Violations.rate.1.005, Violations.rate.1.001,
                       Violations.rate.2.005, Violations.rate.2.001, 
                       Violations.rate.3.005, Violations.rate.3.001),ncol=2,byrow=TRUE)
colnames(Violation) <- c("p=0.05","p=0.01")
rownames(Violation) <- c("model 1","model 2","model 3")
as.table(Violation)
##               p=0.05       p=0.01
## model 1 0.0485282418 0.0087509944
## model 2 0.0079554495 0.0079554495
## model 3 0.0499204455 0.0009944312
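
The violation rates above come from counting exceedances of each estimated VaR; the following minimal sketch for model 1 assumes, as in the bivariate section later in the report, that a violation is a day whose return exceeds the VaR level.

# Minimal sketch (assumption: violation = return above the VaR level, as in the bivariate section)
Violations.rate.1.005=mean(X>VaR.1.005)   # model 1, p = 0.05
Violations.rate.1.001=mean(X>VaR.1.001)   # model 1, p = 0.01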

Table 2 displays the confidence intervals for the violation rates of models 1-3 at the 5% significance level.

Backtesting <- matrix(c( 0.042580190, 0.054456990, 0.005499005, 
                         0.010408730, 0.043892540, 0.055928490),ncol=2,byrow=TRUE)
colnames(Backtesting) <- c("Lower Bound","Upper Bound")
rownames(Backtesting) <- c("model 1","model 2","model 3")
Backtesting <- as.table(Backtesting)
Backtesting
##         Lower Bound Upper Bound
## model 1 0.042580190 0.054456990
## model 2 0.005499005 0.010408730
## model 3 0.043892540 0.055928490

Table 3 displays the confidence intervals at the 1% significance level.

Backtesting2 <- matrix(c( 0.005361492, 0.012137017,0.004722467, 
                          0.011185268, -0.0001522397,0.0021407066),ncol=2,byrow=TRUE)
colnames(Backtesting2) <- c("Lower Bound","Upper Bound")
rownames(Backtesting2) <- c("model 1","model 2","model 3")
Backtesting2 <- as.table(Backtesting2)
Backtesting2
##           Lower Bound   Upper Bound
## model 1  0.0053614920  0.0121370170
## model 2  0.0047224670  0.0111852680
## model 3 -0.0001522397  0.0021407066
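
The code for the interval bounds is not shown; the values in Tables 2 and 3 are consistent with a normal approximation centred at the observed violation rate (a 95% interval for the 5% VaR and a 99% interval for the 1% VaR), so the following sketch for model 1 reflects that assumption.

# Minimal sketch (assumption: normal-approximation interval around the observed violation rate)
n=length(X)
se.005=sqrt(Violations.rate.1.005*(1-Violations.rate.1.005)/n)
Violations.rate.1.005+c(-1,1)*qnorm(0.975)*se.005   # compare with Table 2, model 1
se.001=sqrt(Violations.rate.1.001*(1-Violations.rate.1.001)/n)
Violations.rate.1.001+c(-1,1)*qnorm(0.995)*se.001   # compare with Table 3, model 1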

The plot below shows the log return series together with the estimated VaR levels of the three models at both significance levels.

plot.ts(X);
lines(VAR.1.005.est,type='l',col='blue')
lines(VAR.1.001.est,type='l',col='red')
lines(VAR.2.005.est,type='l',col='orange')
lines(VAR.2.001.est,type='l',col='purple')
lines(VAR.3.005.est,type='l',col='green')
lines(VAR.3.001.est,type='l',col='yellow')

All violation rates fall within their confidence intervals, meaning that all three models pass the backtesting of Value at Risk. Hence, model 1 remains the most appropriate model.

Expected Shortfall

Then we estimated the Expected Shortfall for model 1, using \(ES(p) = m + s\,\frac{\nu + (T^{-1}(1-p))^2}{\nu - 1}\,\frac{f_{\nu}(T^{-1}(1-p))}{p}\) with \(\nu = 3\):

ES.1.005=m1+s1*(3+(qt(1-p1,3))^2)/2*dt(qt(1-p1,3),3)/p1
ES.1.001=m1+s1*(3+(qt(1-p2,3))^2)/2*dt(qt(1-p2,3),3)/p2
ES.1.005;ES.1.001
## [1] 0.03697392
## [1] 0.1449315
  • For 5%, we have 0.0369739.
  • For 1%, we have 0.1449315.

Conclusion

  1. With 95% confidence, the worst expected log return rate of TD will not exceed 0.0232. If the loss does exceed the Value at Risk, the expected shortfall will be 0.0369.
  2. With 99% confidence, the worst expected log return rate of TD will not exceed 0.04524. If the loss does exceed the Value at Risk, the expected shortfall will be 0.1449.

Multivariate Comparison

Marginal & Bivariate Model Fitting

Treating TD as one component of the broader financial industry, in the multivariate comparison we would like to compare this small portion with the macro background, so we chose the S&P 500 index (Standard and Poor's) as the benchmark.

SP<- read.csv(file = 'GSPC.csv')
n1=length(SP$Adj.Close)
Y=log(SP$Adj.Close[2:n1]/SP$Adj.Close[1:(n1-1)])   # daily log returns of the S&P 500

We compared the time series and the normal QQ plots of TD and the S&P 500:

n1=length(Y);
par(mfrow=c(2,2))
library(evir);library(MASS);
plot.ts(X);plot.ts(Y);
qqnorm(X);qqnorm(Y)

Their shapes look broadly similar: neither meets normality, and the TD data show a wider range (-0.15, 0.15), meaning their fluctuations are larger. We therefore concluded that the S&P 500 data also follow a t-distribution with 3 degrees of freedom.

We applied the MASS package again to estimate the parameters for TD and the S&P 500, and obtained:

fitdistr(X,'t')
##         m              s              df     
##   0.0007116962   0.0093570729   2.8345144204 
##  (0.0001626059) (0.0001737758) (0.1320291483)
fitdistr(Y,'t')
##         m              s              df     
##   0.0005857715   0.0069029154   2.5314693922 
##  (0.0001209456) (0.0001342558) (0.1131139323)

Note that copula theory shows that two marginal data sets having the same distribution does not guarantee any particular joint distribution. However, since the marginal degrees of freedom of the two series are quite similar (2.83 and 2.53), and a bivariate t-distribution forces all marginals to share the same degrees of freedom, it is reasonable to fit the two series jointly with a bivariate t, using the 'mvtnorm' and 'QRM' packages.

We obtained the location (mean) vector:

library(mvtnorm);library(QRM)
Z=cbind(X,Y)
fit<-fit.mst(Z); #fit biv t distr
mu.fit<-fit$mu; #estimated location vector
Sigma.fit<-as.matrix(fit$Sigma); #estimated scale matrix
df.fit <-fit$df; #estimated df
mu.fit
##            X            Y 
## 0.0009214342 0.0005083723

the estimated scale matrix:

Sigma.fit
##               X             Y
## X  8.664308e-05 -3.178662e-06
## Y -3.178662e-06  5.039172e-05

and the estimated degrees of freedom:

df.fit
## [1] 2.694494

The estimated degrees of freedom are close to the marginal values, which suggests the multivariate t is a good fit. To demonstrate that the bivariate t is the best option, we conducted a goodness-of-fit comparison between the multivariate normal and the multivariate t. The 3D density plot of our data is quite similar to that of the simulated multivariate t, and so are the contour plots. Therefore, the multivariate t is the best option.

n=nrow(Z);                          # number of observations (Z is an n x 2 matrix)
Znorm <- rmvnorm(n,mean=mu.fit,sigma=Sigma.fit*df.fit/(df.fit-2),method='chol'); # normal with the matching covariance
Zt <- rmvt(n,sigma=Sigma.fit,df=df.fit);  # bivariate t with the fitted scale and df
Bivnorm.kde <- kde2d(Znorm[,1], Znorm[,2], n = 50);
Bivt.kde <- kde2d(Zt[,1], Zt[,2], n = 50);
Biv.kde <- kde2d(X, Y, n = 50);
par(mfrow=c(1,3));
persp(Biv.kde,phi=45, theta=30, shade=.1, border = NA);
persp(Bivnorm.kde, phi=45, theta=30, shade=.1, border = NA);
persp(Bivt.kde, phi = 45, theta = 30, shade = .1, border = NA);

par(mfrow=c(2,3));
contour(Biv.kde); contour(Bivnorm.kde); contour(Bivt.kde)

Bivariate Value at Risk

Then we calculated the Value at Risk for the bivariate case. Since both marginals are t-distributed, we can represent our data \((X_1, X_2)\) as \((\mu_1, \mu_2) + \sqrt{W}\,(Z_1, Z_2)\), where \((Z_1, Z_2)\) is bivariate normal whose covariance matrix is the scale matrix of the bivariate t and \(W\) is an independent mixing variable. Consequently, the sum \(X + Y\) is again univariate t with the same degrees of freedom, location \(\mu_1 + \mu_2\) and scale \(\sqrt{\Sigma_{11} + \Sigma_{22} + 2\Sigma_{12}}\). Hence we obtained:

newalpha=df.fit                     # degrees of freedom of the fitted bivariate t
# X + Y is univariate t with location mu1+mu2 and scale sqrt(S11 + S22 + 2*S12)
VAR=mu.fit[1]+mu.fit[2]+sqrt(Sigma.fit[1,1]+Sigma.fit[2,2]+2*Sigma.fit[1,2])*qt(1-p1,newalpha);
n=length(X);
Combined=X+Y;
Index=seq(1,n,by=1);
VAR.line=c(rep(VAR,n))
par(mfrow=c(1,1));
plot(Index,Combined,type="l");
points(Index,VAR.line,col="blue",type="l");

length(Combined[Combined>VAR])/n
## [1] 0.04514718

Time Series Analysis

As above, we work with the log-transformed return rate, but this time we also take first differences (to compare with Part 1, we later repeat the analysis without differencing).

In the univariate analysis we made the strong assumption that the data are independent and identically distributed, but in most cases financial data exhibit dependence. Indeed, in Part 1 we already established non-normality; comparing the ACF of the data with the ACF of the squared data, we observe weak correlation in the former but strong correlation in the latter, which means the data are not independent and identically distributed. Time series analysis should therefore be more appropriate.

library(tseries)
Q=diff(X)
par(mfrow=c(1,4))
plot.ts(Q)
acf(Q);acf(Q^2);pacf(Q^2)
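
As an additional check that is not part of the original analysis, a Ljung-Box test on the squared series gives a formal counterpart to the visual ACF comparison; the choice of 10 lags is arbitrary.

# Optional check: Ljung-Box test for autocorrelation in the squared returns
Box.test(Q^2, lag=10, type="Ljung-Box")   # a small p-value indicates dependence in Q^2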

We then fitted an ARCH(1) model and obtained the two coefficients \(a_0 = 0.0002042\) and \(a_1 = 0.597489\); since \(a_1\) is smaller than 1, the model is stationary. We therefore continued and extracted the fitted values from the ARCH(1) model. The volatility plot below shows two highly volatile periods, corresponding to the 2008 financial crisis and the recent COVID-19 impact.

fit.ARCH1<-garch(Q,order=c(0,1))          # order=c(0,1) fits an ARCH(1) model
Coefficients<-fit.ARCH1$coef
alpha0=Coefficients[1]                    # a0
alpha1=Coefficients[2]                    # a1
past.prediction=fit.ARCH1$fitted.values   # fitted conditional standard deviations
# Compare with
n=length(Q)
sigmat=past.prediction[2:n]
par(mfrow=c(1,1))
plot.ts(sigmat^2)

Without differencing:

Q=X                                       # undifferenced log returns
fit.ARCH1<-garch(Q,order=c(0,1))
Coefficients<-fit.ARCH1$coef
alpha0=Coefficients[1]
alpha1=Coefficients[2]
past.prediction=fit.ARCH1$fitted.values
n=length(Q)
sigmat=past.prediction[2:n]
par(mfrow=c(1,1))
plot.ts(sigmat^2)

Backtesting and Value at Risk

Now we backtested the fitted model and calculated the Value at Risk. At both the 5% and the 1% significance level, the violation rate passes the backtest (a sketch of the computation follows the list):

  1. Violation rate (5%): 0.04019; confidence interval: [0.0347, 0.0456]
  2. Violation rate (1%): 0.01412; confidence interval: [0.0098, 0.01842]
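
The backtesting code itself is not shown; the following minimal sketch assumes that the conditional VaR combines the fitted ARCH(1) volatility with normal innovations (as in the one-step prediction below) and that violations are counted with the same exceedance convention as before.

# Minimal sketch (assumptions: normal innovations; sigmat is the fitted conditional sd for t = 2..n)
VaR.arch.005=mean(Q)+sigmat*qnorm(1-p1)   # conditional VaR series at p = 0.05
VaR.arch.001=mean(Q)+sigmat*qnorm(1-p2)   # conditional VaR series at p = 0.01
Violations.rate.arch.005=mean(Q[2:n]>VaR.arch.005)
Violations.rate.arch.001=mean(Q[2:n]>VaR.arch.001)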

Hence the prediction of the next Value at Risk, \(\widehat{VaR}_{n+1}(p) = \bar{Q} + \sqrt{a_0 + a_1 Q_n^2}\,\Phi^{-1}(1-p)\), is calculated as below:

next.VAR.005=sqrt(alpha0+alpha1*Q[n]^2)*qnorm(1-p1)+mean(Q)
next.VAR.001=sqrt(alpha0+alpha1*Q[n]^2)*qnorm(1-p2)+mean(Q)
next.VAR.005; next.VAR.001
##        a0 
## 0.1634653
##        a0 
## 0.2310014

Conclusion

  1. With 95% confidence, the largest change in the log return rate of TD will not exceed 0.1634653.
  2. With 99% confidence, the largest change in the log return rate of TD will not exceed 0.2310014.

To compare with the results from Part 1, we repeated the time series analysis without differencing and obtained:

  1. With 95% confidence, the worst expected log return rate for TD will not exceed 0.1634.
  2. With 99% confidence, the worst expected log return rate for TD will not exceed 0.2310.

Since the data are not independent and identically distributed, the results from the time series analysis should be more appropriate.