ContextBase Logo

Synopsis
References

Section 1a - Working Directory, Required Packages, and Session Information
Section 1b - Formatting Proprietary Data
Section 1c - Downloading Stock Ticker Data

Section 2a - Exploratory Data Analysis
Section 2b - VaR Estimation of the 10 Stocks Index
Section 2c - Estimated Shortfall (CvaR) 10 Stocks Index
Section 2d - Hill Estimation of 10 Stocks Index
Section 2e - Anderson-Darling Test of Normal Distribution
Section 2f - Table of Results

Section 3a - EVT Block Maxima Estimation of 10 Stocks Index
Section 3b - VaR Forecasting of Block Maxima
Section 3c - Estimated Shortfall (CvaR) of Block Maxima
Section 3d - Hill Estimation of Block Maxima
Section 3e - Anderson-Darling Test of Normal Distribution
Section 3f - Table of Results
Section 3g - 100 days GARCH Forecasting of Block Maxima

Section 4a - EVT Peaks-Over-Threshold Estimation - 10 Stocks Index
Section 4b - 100, and 500 days VaR Forecasting of POT
Section 4c - Estimated Shortfall (CvaR) of POT
Section 4d - Hill Estimation of POT
Section 4e - Anderson-Darling Test of Normal Distribution
Section 4f - Table of Results
Section 4g - 100 days GARCH Forecasting of Peaks-Over-Threshold

Section 5a - Table of Estimation Method Influence
Section 5b - Conclusions

Section 6 - R Language Source Code

Synopsis

This document is an examination of implementing Extreme Value Theory (EVT) to determine Value at Risk, (and Conditional VaR), of an index of 10 stocks, utilizing the R programming language. The combined data on 10 stocks is tested for normality with an Anderson-Darling Test, and the EVT methods of Block Maxima and Peak-Over-Threshold are used for VaR/CvaR estimation. Finally, Generalized AutoRegression of Conditional Heteroskadacity (GARCH) processing is used to predict future values of the index, 20 days into the future. This document will determine the influence on model results of different methods for calculating Risk Factors.

Extreme Value Theory, (initially developed by Fisher, Tippett, and Gnedenko), demonstrates that the distribution of the block-maxima of a sample of independent identically distributed, (iid), variables converges to one of the three Extreme Value distributions.

Renewed interest by Statisticians to modeling Extreme Values has recently occurred. Extreme Value Thoery has proven useful in a variety of Risk Factor cases. After Financial Market instabilities from 1999 - 2008, Extreme Value Analysis gained in effectiveness, as opposed to previous Value-at-Risk analysis. Extreme Values represent extreme fluctuations of a system. EVT offers the ability to model the relationships of probability of extreme events, magnitude, damage, and cost of protection.

References

https://arxiv.org/pdf/1310.3222.pdf
https://www.ma.utexas.edu/mp_arc/c/11/11-33.pdf
http://evt2013.weebly.com/uploads/1/2/6/9/12699923/penalva.pdf
http://pubs.sciepub.com/ijefm/2/5/4/

http://www4.stat.ncsu.edu/~mannshardt/st810EVA/Lectures/Lec3GEV.pdf
http://www.sfu.ca/~rjones/econ811/readings/McNeil%201999.pdf
http://www.bankofcanada.ca/wp-content/uploads/2010/01/wp00-20.pdf

Section 1a - Working Directory, Required Packages, and Session Information

To begin the analysis, the Working Directory is set to the folder containing the stock ticker comma separated values files. Then, the needed R programming language packages are installed and included in the package library. The R packages include packages for Extreme Value Theory functions, VaR functions, time series analysis, quantitative trade analysis, regression analysis, plotting, and html formatting.

	List of Required Packages
Required Packages	‘dplyr’ ‘magrittr’ ‘ggplot2’ ‘tseries’ ‘vars’ ‘evd’ ‘evir’ ‘POT’ ‘fBasics’ ‘fExtremes’ ‘quantmod’ ‘PerformanceAnalytics’ ‘rugarch’ ‘fGarch’ ‘nortest’ ‘knitr’

The session information below is for reference when running the required packages, and R code.

	Session Information
R Version	R version 3.3.1 (2016-06-21)
Platform	x86_64-w64-mingw32/x64 (64-bit)
Running	Windows 10 x64 (build 14393)
RStudio Citation	RStudio: Integrated Development for R.
RStudio Version	1.0.44

Section 1b - Formatting the Proprietary Data

The first file to explore for this analysis is “Data_CSV.csv”. This file contains stock ticker data for 15 German corporations listed on the DAX stock exchange, and market portfolio data for the DAX exchange. Ten corporations were selected from this data file, and the last ten years of stock price information for these corporations was downloaded from Google Finance.

##  [1] "Volkswagen VG"          "Adidas"                
##  [3] "Siemens"                "Deutsche Bank"         
##  [5] "Henkel"                 "SAP"                   
##  [7] "BASF"                   "Bayer"                 
##  [9] "Allianz"                "Daimler"               
## [11] "Fresenius"              "BMW"                   
## [13] "Commerzbank"            "Lufthansa"             
## [15] "Deutsche Post"          "DAX (Market portfolio)"

##  [1] "VOW3" "ADS"  "SIE"  "DB"   "HEN3" "SAP"  "BAS"  "BAYN" "ALV"  "DAI" 
## [11] "FMS"  "BMW"  "CBK"  "LHA"  "DPW"  "DAX"

Section 1c - Downloading Stock Ticker Data

The stock price data was downloaded and read into the R programming environment. The returns are calculated with “Open Price / Close Price - 1”, and the data from the ten corporations is combined in one dataframe, (with one column per corporation).

Each row of the resulting dataframe represents one business day of the 10 years of recorded stock prices. The means for each row in the dataframe is then calculated. A column of 10 years of dates is appended to the dataframe. A second dataframe containing only the row means, and date information, is also created.

Section 2a - Exploratory Data Analysis

A table of dataframe statistics is created containing the Minimum, Median, Mean, Maximum, Standard Deviation, 1% Quantile, 5% Quantile, 95% Quantile, 99% Quantile, and Number of Observations for each column, (or Corporation). The quantile percentages are for extreme values. A times series chart of the means of all the returns is also created.

Stats	Volkswagen	Adidas	Siemens	DeutscheBank	Henkel
Min.	-0.99	-0.98	-0.99	-0.99	-0.98
Median	3.67	18.72	10.94	23.52	19.28
Mean	9.29	15.71	10.10	21.35	16.73
Max.	38.42	21.96	17.24	32.08	24.06
Std. Dev.	9.76	6.02	4.89	7.28	6.97
1% Quantile	-0.81	-0.79	-0.83	-0.78	-0.83
5% Quantile	-0.07	-0.20	-0.16	0.12	-0.27
95% Quantile	25.22	20.69	16.42	28.18	23.01
99% Quantile	33.31	20.93	16.87	28.82	23.49
No. Of Observations	2528.00	2528.00	2528.00	2528.00	2528.00

Stats	SAP	BASF	Fresenius	Allianz	Daimler
Min.	-0.99	-0.98	-0.96	-0.99	-0.98
Median	17.39	14.08	23.27	4.06	16.75
Mean	15.20	13.01	23.24	7.86	14.93
Max.	22.74	20.13	181.02	20.31	23.66
Std. Dev.	6.23	5.04	14.10	6.93	6.04
1% Quantile	-0.18	0.00	0.62	-0.87	-0.21
5% Quantile	2.66	3.09	5.18	-0.31	2.99
95% Quantile	21.95	18.75	30.39	17.29	21.74
99% Quantile	22.31	19.07	31.23	17.78	22.19
No. Of Observations	2527.00	2527.00	2528.00	2528.00	2528.00

Section 2b - VaR Estimation of the 10 Stocks Index

To begin estimation of the future events implied by the data, an initial Value at Risk estimation is made. First the dataframe of all row means and date information is converted to time series format, then a VaR is calculated from this time series. Predictions are made from the VaR calculation of values 100, and 500, days in the future. In the subsequent predictions-only plot, the blue circles represent values for 100 days in the future, and red circles represents return values for 500 days.

Section 2c - Estimated Shortfall (CvaR) 10 Stocks Index

For comparison purposes, the Conditional Value at Risk (CvaR or Estimated Shortfall), of the 10 Stocks Index data is calculated. First, the times series of the data is utilized to find the maximum of the worst 0.95% drawdowns. Then, an Estimated Shortfall is calculated, via the “gaussian” method, and the results of both calculations are presented in table format.

Estimated Drawdown Table
	Max Drawdown	from	to	Estimated Shortfall
	21.10127	1933	2171	-9.508332

Section 2d - Hill Estimation of 10 Stocks Index

Because it is presumed that the 10 Stocks Index data is a heavy-tailed distribution, with infrequent extreme changes in the data, Hill Estimation is used for parametric estimation of the tail index. The objective is to verify that the 10 Stocks data is an Extreme Value Distribution. The plot generated by the Hill Estimation confirmes that is factual.

Section 2e - Anderson-Darling Test of Normal Distribution

The Anderson-Darling Test is mainly used on families of distributions, and is a powerful determinent of non-normality of a distribution. With a large sample size, (as in the 10 Stocks Index), a P-Value of less than 0.05 indicates that the distribution varies from normality. That is expected of an Extreme Value Distribution. Probability Value found with the Anderson-Darling Test was 3.7^-24, therefore confirming non-normality.

##   Statistic P-Value
## A  43.35969 3.7e-24

Section 2f - Table of Results

Finally, a Table of Results of the estimations of the future values of the 10 Stocks Index is presented. The Point Estimates and ranges of the 3 VaR estimations, (and Estimated Shortfall), are tabulated for comparison.

VaR Table of Results
Method	Point Estimate	lower	upper	CI
VaR 100 Days	14.1053	10.1452	18.0654	3.9601
VaR 500 Days	13.962	9.8798	18.0443	4.0823
ES	-9.5083	1933	2171	NA

Section 3a - EVT Block Maxima Estimation of 10 Stocks Index

The Block Maxima approach in Extreme Value Theory is the most basic method of EVT analysis. Block Maxima consists of dividing the observation period into non-overlapping periods of equal size, and restricts attention to the maximum observation in each period. The observations created follow domain of attraction conditions, approximately an Extreme Value Distribution. Parametric statistical methods for Extreme Value Distributions are then applied to those observations.

Extreme Value Theorists have developed the Generalized Extreme Value distribution. The GEV contains a family of continuous probability distributions, the Gumbel, Frechet and Weibull distributions, (also known as Type I, II and III Extreme Value Distributions).

In the following EVT Block Maxima analysis, the 10 Stocks Index data is fitted to a GEV. The resulting distribution is plotted. A time series plot is created in order to localize the extreme events on a timeline, from 2006 to 2016. Four plots of the order of Block Maxima data are then created. Finally, table of the Block Maxima analysis parameters is created from the gev() function.

Block Maxima Table
	Block Maxima Parameters
xi	0.1598266
sigma	0.0008699
mu	0.0027634
loc	0.0030876
scale	0.0012700
shape	0.0000035

Section 3b - VaR Forecasting of Block Maxima

In order to create a Value at Risk, (VaR), estimation from the Block Maxima data, the 10 Stocks Index GEV data is converted into a time series. A VaR estimation is made from the GEV time series data. Predictions of future values, (100 and 500 days in the future), are extrapolated from the VaR data. In the resulting plot, the blue circles represent values for 100 days in the future, and red circles represents return values for 500 days.

Section 3c - Estimated Shortfall (CvaR) of Block Maxima

The Conditional Value at Risk, (“CvaR” or “Estimated Shortfall”), of the 10 Stocks Index GEV data is calculated. First, the times series of the data is utilized to find the maximum of the worst 0.95% drawdowns. Then, an Estimated Shortfall is calculated via the “modified” method for extreme distributions, and the results of both calculations are presented in table format.

	Max Drawdown	from	to	Estimated Shortfall
	0.0980278	1243	1933	-2.19e-05

Section 3d - Hill Estimation of Block Maxima

The Hill Estimation, (used for parametric estimation of the tail index), verifies that the 10 Stocks GEV data is an Extreme Value Distribution.

Section 3e - Anderson-Darling Test of Normal Distribution

The Anderson-Darling Test is a powerful determinent of non-normality of large sample size distributions. If the P-Value is less than 0.05, the distribution varies from normality. An insignificant Probability Value of 3.7^-24 was found via this test.

##  Statistic P-Value
##   422.5289 3.7e-24

Section 3f - Table of Results

Finally, a Table of Results of the estimations of the future values of the 10 Stocks Index GEV is presented. The Point Estimates and ranges of the 3 GEV VaR estimations, (and GEV Estimated Shortfall), are tabulated for comparison.

Method	Point Estimate	lower	upper	CI
GEV VaR 100 Days	0.0038	-0.0023	0.0099	0.0061
GEV VaR 500 Days	0.0039	-0.0022	0.01	0.0061
GEV ES	-2.18696219987451e-05	1243	1933	NA

Section 3g - 100 days GARCH Forecasting of Block Maxima

Predictions are made for the Block Maxima EVT data via fitting the Block Maxima GEV distribution, (of the index of 10 stocks), to a GARCH(1,1), (Generalized Auto-Regressive Conditional Heteroskadacity), model. A table of the prediction formula parameters is displayed. An “Auto Correlation Function” (ACF) plot is created that displays the significant events over time. Then, an array of plots of the fitted model results is displayed. A prediction for the next 20 days, (of the stock index performance), is created. Lastly, the 20 day prediction is displayed in 2 plots.

## [1] 13440

Block Maxima Prediction Table
	GARCH Parameters
mu	0.0060439
ar1	0.9976487
ma1	-0.2595056
omega	-3.2076335
alpha1	0.5405849
beta1	0.7848842
gamma1	1.8662047
shape	2.1000002

Section 4a - Peaks-Over-Threshold Estimation - 10 Stocks Index

In the peaks-over-threshold approach in EVT, the initial observations that exceed a certain high threshold are selected. The probability distribution of those selected observations is approximately a generalized Pareto distribution. A maximum likelihood estimation (mle) is created by fitting a Generalized Pareto Distribution. The MLE statistics are presented in tabular form. The resulting estimation is then graphically diagnosed via MLE plotting.

Peaks-Over-Threshold Table
	POT Parameters
Fitted Values	14.3934587025131
Standard Error	0.286270436634011
Standard Error Type	observed
Log Likelihood	-9269.60418661959
Hessian	12.2024489428441
Threshold	0.35

Section 4b - VaR Forecasting of POT

A Value at Risk, (VaR), estimation from the POT data is created via converting the 10 Stocks Index MLE data into a time series. A VaR estimation is made from the MLE time series data. Predictions of future values, (100 and 500 days in the future), are extrapolated from the MLE VaR data. In the resulting plot, the blue circles represent values for 100 days in the future, and red circles represents return values for 500 days.

Section 4c - Estimated Shortfall (CvaR) Forecasting of POT

The Conditional Value at Risk, (“CvaR” or “Estimated Shortfall”), of the 10 Stocks Index MLE data is then calculated. The times series of the data is utilized to find the maximum of the worst 0.95% drawdowns. An Estimated Shortfall is calculated, via the “modified” method for extreme distributions, and the results of both calculations are presented in table format.

	Max Drawdown	from	to	Estimated Shortfall
	21.10127	1933	2171	-0.0523317

Section 4d - Hill Estimation of Peaks-Over-Threshold

The Hill Estimation, (used for parametric estimation of the tail index), verifies that the 10 Stocks MLE data is an Extreme Value Distribution.

Section 4e - Anderson-Darling Test of Normal Distribution

##  Statistic P-Value
##   43.35969 3.7e-24

Section 4f - Table of Results

Finally, a Table of Results of the estimations of the future values of the 10 Stocks Index MLE is presented. The Point Estimates and ranges of the 3 MLE VaR estimations, (and MLE Estimated Shortfall), are tabulated for comparison.

Method	Point Estimate	lower	upper	CI
MLE VaR 100 Days	14.1053	10.1452	18.0654	3.9601
MLE VaR 500 Days	13.962	9.8798	18.0443	4.0823
MLE ES	-0.0523	1933	2171	NA

Section 4g - 100 days GARCH Forecasting of Peaks-Over-Threshold

Predictions are made for the Peaks-Over-Threshold EVT data via fitting the MLE, (maximum likelihood estimation of the index of 10 stocks), to a GARCH(1,1), (Generalized Auto-Regressive Conditional Heteroskadacity), model. A table of the prediction formula parameters is displayed. An “Auto Correlation Function” (ACF) plot is created, that displays the significant events over time. Then, an array of plots of the fitted model results are displayed. Then a prediction for the next 20 days, (of the stock index performance), is created. Lastly, the 20 day prediction, (derived from Peaks-Over-Threshold EVT extimation), is displayed in 2 plots.

Peaks-Over-Threshold Prediction Table
	POT Parameters
mu	10.3337633
ar1	0.9963151
ma1	-0.2938808
omega	-0.1349384
alpha1	-0.1570853
beta1	0.8676732
gamma1	0.5664016
shape	2.6095115

Section 5a - Table of Estimation Method Influence

The following table assembles the results of the four methods of examining the Extreme Distribution 10 Stocks Index. The first column has the names of the four estimation methods. Statistics on VaR, ES, mu (or “shape”) statistic, and Anderson-Darling P-Values are presented.

Test	VaR	ES	mu	P-Value
VaR	0.3222	NA	NA	3.7e-24
ES	NA	-9.50833218037079	NA	3.7e-24
Block Maxima	0.0421	-2.18696219987451e-05	0.00276335504443396	3.7e-24
Peaks Over Threshold	0.3222	-0.0523317246992927	14.3934587025131	3.7e-24

Section 5b - Conclusions

After thorough examination of 10 years of stock returns of 10 Corporations, (listed on the German DAX stock exchange), the validity of characterising the changes in return percentage as an Extreme Value Distribution is confirmed. All Anderson-Darling Tests of the fitted values of the four analysis methods show insignificant probabilty that the distribution has normality, or all non-exteme values. The methods are in agreement regarding the Value at Risk within the returns data. The Block Maxima method produces a slight devergence of VaR estimation. Traditional VaR estimation, and POT estimation, produce the same Value at Risk. The 2 EVT methods predict lower Expected Shortfall, versus traditional CvaR estimation of stock return data. The Standard Q-Q Plots demonstrate that Peaks-Over-Threshold is the most reliable estimation method, in the case of the Index of 10 Stocks.

Section 6 - R Language Source Code

# Section 1a - Working Directory, Required Packages, and Session Information

# Set Working Directory to the directory with .csv, or .RData, files
# setwd(" ")

# Required Packages
# install.packages("dplyr")
# install.packages("magrittr")
# install.packages("ggplot2")
# install.packages("tseries")
# install.packages("vars")
# install.packages("evd")
# install.packages("evir")
# install.packages("POT")
# install.packages("fBasics")
# install.packages("fExtremes")
# install.packages("quantmod")
# install.packages("PerformanceAnalytics")
# install.packages("rugarch")
# install.packages("fGarch")
# install.packages("nortest")
# install.packages("knitr")

# Initialize libraries
library(dplyr)
library(magrittr)
library(ggplot2)
library(tseries)
library(vars)
library(evd)
library(evir)
library(POT)
library(fBasics)
library(fExtremes)
library(quantmod)
library(PerformanceAnalytics)
library(rugarch)
library(fGarch)
library(nortest)
library(knitr)

PackageTable <- data.frame(matrix(nrow=1, ncol=1))
rownames(PackageTable) <- "Required Packages"
colnames(PackageTable) <- "List of Required Packages"
PackageTable[1,] <- paste("'dplyr'", "'magrittr'", "'ggplot2'",
                          "'tseries'", "'vars'", "'evd'", "'evir'",
                          "'POT'", "'fBasics'", "'fExtremes'",
                          "'quantmod'", "'PerformanceAnalytics'",
                          "'rugarch'", "'fGarch'", "'nortest'",
                          "'knitr'")
kable(PackageTable)

session <- sessionInfo()
SessionTable <- data.frame(matrix(nrow=5, ncol=1))
rownames(SessionTable) <- c("R Version", "Platform", "Running",
                         "RStudio Citation","RStudio Version")
colnames(SessionTable) <- "Session Information"
SessionTable[1,] <- session$R.version$version.string
SessionTable[2,] <- session$platform
SessionTable[3,] <- session$running
SessionTable[4,] <- "RStudio: Integrated Development for R."
SessionTable[5,] <- "1.0.44"
kable(SessionTable)

## Section 1b - Formatting the Proprietary Data

stockData <- read.csv("Data_CSV.csv", header = F)

companyList <- c("Volkswagen VG", "Adidas", "Siemens",
                 "Deutsche Bank", "Henkel", "SAP", "BASF", "Bayer",
                 "Allianz", "Daimler", "Fresenius", "BMW",
                 "Commerzbank", "Lufthansa", "Deutsche Post",
                 "DAX (Market portfolio)")
companyList

stockSymbols <- c("VOW3", "ADS", "SIE", "DB", "HEN3", "SAP", "BAS",
                  "BAYN", "ALV", "DAI", "FMS", "BMW", "CBK", "LHA",
                  "DPW", "DAX")
stockSymbols

## Section 1c - Downloading Stock Ticker Data

stockData_Volkswagen <- read.csv("vow3.csv")
stockData_Adidas <- read.csv("ads.csv")
stockData_Siemens <- read.csv("sie.csv")
stockData_DeutscheBank <- read.csv("dbk.csv")
stockData_Henkel <- read.csv("hen3.csv")
stockData_SAP <- read.csv("sap.csv")
stockData_BASF <- read.csv("bas.csv")
stockData_Fresenius <- read.csv("fre.csv")
stockData_Allianz <- read.csv("alv.csv")
stockData_Daimler <- read.csv("dai.csv")

returnsFunction <- function(x) {
  as.numeric(x$Open) / as.numeric(x$Close) - 1
}

returnsData_Volkswagen <- returnsFunction(stockData_Volkswagen)
returnsData_Adidas <- returnsFunction(stockData_Adidas)
returnsData_Siemens <- returnsFunction(stockData_Siemens)
returnsData_DeutscheBank <- returnsFunction(stockData_DeutscheBank)
returnsData_Henkel <- returnsFunction(stockData_Henkel)
returnsData_SAP <- returnsFunction(stockData_SAP)
returnsData_BASF <- returnsFunction(stockData_BASF)
returnsData_Fresenius <- returnsFunction(stockData_Fresenius)
returnsData_Allianz <- returnsFunction(stockData_Allianz)
returnsData_Daimler <- returnsFunction(stockData_Daimler)

allData <- cbind(returnsData_Volkswagen, returnsData_Adidas,
                 returnsData_Siemens, returnsData_DeutscheBank,
                 returnsData_Henkel, returnsData_SAP, returnsData_BASF,
                 returnsData_Fresenius, returnsData_Allianz,
                 returnsData_Daimler)
allData <- data.frame(allData)
names(allData) <- c("Volkswagen", "Adidas", "Siemens", "DeutscheBank",
                    "Henkel", "SAP", "BASF", "Fresenius",
                    "Allianz", "Daimler")

allData_Means <- rowMeans(allData)

names(stockData_Adidas) <- c("Date", "Open", "High", "Low", "Close", "Volume")
allData$Time <- as.Date(stockData_Adidas$Date[1:2528], format="%d-%b-%y")

allDF <- data.frame(allData$Time, allData_Means)
names(allDF) <- c("Time", "Observations")

# save(allData, file = "allData.RData")
# load("allData.RData")

# Section 2a - Exploratory Data Analysis

tableStats <- matrix(nrow=10, ncol=11)
tableStats <- data.frame(tableStats)
names(tableStats) <- c("Stats", "Volkswagen", "Adidas", "Siemens",
                     "DeutscheBank", "Henkel", "SAP", "BASF", "Fresenius",
                     "Allianz", "Daimler")

tableStats$Stats <- c("Min.", "Median", "Mean", "Max.", "Std. Dev.",
                    "1% Quantile", "5% Quantile", "95% Quantile",
                    "99% Quantile", "No. Of Observations")

statsFunction <- function(x) {
  tableStats$x <- c(min(x), median(x), mean(x),
                    max(x), sd(x), quantile(x, .01),
                    quantile(x, .05), quantile(x, .95),
                    quantile(x, .99), length(x))
  tableStats$x
}

tableStats$Volkswagen <- round(statsFunction(returnsData_Volkswagen), 2)
tableStats$Adidas <- round(statsFunction(returnsData_Adidas), 2)
tableStats$Siemens <- round(statsFunction(returnsData_Siemens), 2)
tableStats$DeutscheBank <- round(statsFunction(returnsData_DeutscheBank), 2)
tableStats$Henkel <- round(statsFunction(returnsData_Henkel), 2)
tableStats$SAP <- round(statsFunction(returnsData_SAP), 2)
tableStats$BASF <- round(statsFunction(returnsData_BASF), 2)
tableStats$Fresenius <- round(statsFunction(returnsData_Fresenius), 2)
tableStats$Allianz <- round(statsFunction(returnsData_Allianz), 2)
tableStats$Daimler <- round(statsFunction(returnsData_Daimler), 2)

tableStats$mean <- rowMeans(tableStats[,2:11])

firstTable <- cbind(tableStats[,1], tableStats[,2:6])
colnames(firstTable) <- c("Stats", "Volkswagen", "Adidas", "Siemens",
                          "DeutscheBank", "Henkel")

secondTable <- cbind(tableStats[,1], tableStats[,7:11])
colnames(secondTable) <- c("Stats", "SAP", "BASF", "Fresenius",
                           "Allianz", "Daimler")
firstTable
secondTable

tsData_All <- ts(allData_Means, start=c(2006, 11, 26), frequency=250)
plot.ts(tsData_All, ylab="Returns", main="Time Series of All Data")

## Section 2b - VaR Estimation of the 10 Stocks Index

# VaR estimation of Block Maxima data
allDF_var_ts <- ts(allDF)
allDF_var.2c <- VAR(allDF_var_ts, p = 2, type = "const")

# Predict the values for the next 125, 250, and 500 days
allDF_var.prd.100 <- predict(allDF_var.2c, n.ahead = 100, ci = 0.9)
allDF_var.prd.500 <- predict(allDF_var.2c, n.ahead = 500, ci = 0.9)

plot(allDF_var.prd.100)
plot(allDF_var.prd.500)

plot(allDF_var.prd.500$fcst$Time[1:1200],
     allDF_var.prd.500$fcst$Observations[1:1200], col='red',
     main="Plot of VaR Forecasts", xlab = "Time",
     ylab = "Observations")
points(allDF_var.prd.100$fcst$Time[1:1200],
       allDF_var.prd.100$fcst$Observations[1:1200], col='blue')
legend('topleft', legend=c('100 Days = blue', '500 Days = red'),
       col=c('blue','red'))

## Section 2c - Estimated Shortfall (CvaR) 10 Stocks Index

# The conditional drawdown is the the mean of the worst 0.95% drawdowns
mdd <- maxdrawdown(allDF_var_ts[,2])

estSF <- ES(ts(allDF[1:2528, 2, drop=FALSE]), p=0.95, method="gaussian",
            portfolio_method = "component", invert = F, operational = F)

estSFtable <- data.frame(mdd, estSF$ES)
rownames(estSFtable) <- ""
colnames(estSFtable) <- c("Max Drawdown", "from", "to", "Estimated Shortfall")
estSFtable

## Section 2d - Hill Estimation of 10 Stocks Index

hill(allDF$Observations, option="xi", start=15, end=45)

## Section 2e - Anderson-Darling Test of Robustness

adtest <- ad.test(allDF$Observations)
adtestDF <- data.frame(adtest$statistic, adtest$p.value)
names(adtestDF) <- c("Statistic", "P-Value")
adtestDF

## Section 2f - Table of Results

VaR_ES_Results <- matrix(nrow = 3, ncol = 5)
VaR_ES_Results <- data.frame(VaR_ES_Results)
names(VaR_ES_Results) <- c("Method","Point Estimate","lower","upper","CI")

VaR_ES_Results[1,] <- c("VaR 100 Days",
round(mean(allDF_var.prd.100$fcst$Observations[,1]), 4),
round(mean(allDF_var.prd.100$fcst$Observations[,2]), 4),
round(mean(allDF_var.prd.100$fcst$Observations[,3]), 4),
round(mean(allDF_var.prd.100$fcst$Observations[,4]), 4))

VaR_ES_Results[2,] <- c("VaR 500 Days",
round(mean(allDF_var.prd.500$fcst$Observations[,1]), 4),
round(mean(allDF_var.prd.500$fcst$Observations[,2]), 4),
round(mean(allDF_var.prd.500$fcst$Observations[,3]), 4),
round(mean(allDF_var.prd.500$fcst$Observations[,4]), 4))

VaR_ES_Results[3,] <- c("ES", round(estSFtable[1], 4),
                        estSFtable[2], estSFtable[3],
                        round(estSFtable[4], 4))
VaR_ES_Results

# Section 3a - EVT Block Maxima Estimation - 10 Stocks Index

allGEV <- dgev(allData_Means, xi=0.8, mu=0)
allGEVDF <- data.frame(allDF$Time, allGEV)
names(allGEVDF) <- c("Time", "Returns")

plot(allGEVDF, col="green", pch=19, cex=0.8, ylab="Returns",
     main="Plot of Extreme Value Distribution - All Data")

tsGEVDF <- ts(allGEVDF, start=c(2006, 11, 26), frequency=250)
plot.ts(tsGEVDF, col="green", pch=19, cex=0.8, ylab="Data",
     main="Times Series Plot of Extreme Value Distribution")

par(mfrow = c(2,2))
plot(density(allGEVDF$Returns), xlab="Returns",
     main="Block Maxima of All Data", lwd=2)
hist(allGEVDF$Returns, xlab="Returns",
     main="Histogram of Block Maxima")
plot(allGEVDF[1401:2528,], xlab="Time", ylab="Observations",
     main="Left Tail Plot of Block Maxima", lwd=2)
plot(allGEVDF[1:1400,], xlab="Time", ylab="Observations",
     main="Right Tail Plot of Block Maxima", lwd=2)
par(mfrow = c(1,1))

all_BM_GEV <- gev(as.numeric(allGEV))
all_BM_FGEV <- fgev(allGEV, std.err = FALSE)

BlockMaximaTable <- matrix(nrow=6, ncol=1)
BlockMaximaTable <- data.frame(BlockMaximaTable)
BlockMaximaTable[1:3,] <- all_BM_GEV$par.ests[1:3]
BlockMaximaTable[4:6,] <- all_BM_FGEV$param[1:3]
rownames(BlockMaximaTable) <- c(names(all_BM_GEV$par.ests[1:3]),
                                names(all_BM_FGEV$param[1:3]))
colnames(BlockMaximaTable) <- "Block Maxima Parameters"
BlockMaximaTable

## Section 3b - VaR Forecasting of Block Maxima

# VaR estimation of Block Maxima data
allGEV_var_ts <- ts(allGEVDF)
allGEV_var.2c <- VAR(allGEV_var_ts, p = 2, type = "const")

# Predict the values for the next 125, 250, and 500 days
allGEV_var.prd.100 <- predict(allGEV_var.2c, n.ahead = 100, ci = 0.9)
allGEV_var.prd.500 <- predict(allGEV_var.2c, n.ahead = 500, ci = 0.9)

plot(allGEV_var.prd.100)
plot(allGEV_var.prd.500)

plot(allGEV_var.prd.500$fcst$Time[1:1200],
     allGEV_var.prd.500$fcst$Returns[1:1200], col='red',
     main="Plot of GEV VaR Forecasts", xlab = "Time",
     ylab = "Returns")
points(allGEV_var.prd.100$fcst$Time[1:1200],
       allGEV_var.prd.100$fcst$Returns[1:1200], col='blue')
legend('topleft', legend=c('100 Days = blue', '500 Days = red'),
       col=c('blue','red'))

## Section 3c - Estimated Shortfall (CvaR) Forecasting of Block Maxima

# The conditional drawdown is the the mean of the worst 0.95% drawdowns
mddGEV <- maxdrawdown(allGEV_var_ts[,2])

# CvaR (Expected Shortfall) Estmation
estSFGEV <- ES(ts(allGEV), p=0.95, method="modified",
               portfolio_method = "component", weights = ts(allGEV[2]),
               invert = F, operational = F)

estSFGEVtable <- data.frame(mddGEV, estSFGEV$MES)
rownames(estSFGEVtable) <- ""
colnames(estSFGEVtable) <- c("Max Drawdown", "from", "to", "Estimated Shortfall")
estSFGEVtable

## Section 3d - Hill Estimation of Block Maxima

hill(allGEVDF$Returns, option="xi", start=15 , end=45)

## Section 3e - Anderson-Darling Test of Robustness

adtestGEV <- ad.test(allGEVDF$Returns)
adtestGEVdf <- data.frame(adtestGEV$statistic, adtestGEV$p.value)
names(adtestGEVdf) <- c("Statistic", "P-Value")
rownames(adtestGEVdf) <- ""
adtestGEVdf

## Section 3f - Table of Results

GEV_Results <- matrix(nrow = 3, ncol = 5)
GEV_Results <- data.frame(GEV_Results)
names(GEV_Results) <- c("Method","Point Estimate","lower","upper","CI")

GEV_Results[1,] <- c("GEV VaR 100 Days",
round(mean(allGEV_var.prd.100$fcst$Returns[,1]), 4),
round(mean(allGEV_var.prd.100$fcst$Returns[,2]), 4),
round(mean(allGEV_var.prd.100$fcst$Returns[,3]), 4),
round(mean(allGEV_var.prd.100$fcst$Returns[,4]), 4))

GEV_Results[2,] <- c("GEV VaR 500 Days",
round(mean(allGEV_var.prd.500$fcst$Returns[,1]), 4),
round(mean(allGEV_var.prd.500$fcst$Returns[,2]), 4),
round(mean(allGEV_var.prd.500$fcst$Returns[,3]), 4),
round(mean(allGEV_var.prd.500$fcst$Returns[,4]), 4))

GEV_Results[3,] <- c("GEV ES", estSFGEVtable[1],
                     estSFGEVtable[2], estSFGEVtable[3],
                     "NA")
GEV_Results

## Section 3g - 100 days GARCH(1,1) Forecasting of Block Maxima

spec = ugarchspec()
nrow(expand.grid(GARCH = 1:14, VEX = 0:1, VT = 0:1, Mean = 0:1, ARCHM = 0:2, ARFIMA = 0:1, MEX = 0:1, DISTR = 1:10))
spec = ugarchspec(variance.model = list(model = 'eGARCH',
                                        garchOrder = c(1, 1)),
                  distribution = 'std')

# Fit models with Generalized Auto-Regressive Conditional Heteroskadacity
all.fitted.model = ugarchfit(spec, allGEV, solver = 'hybrid')

coefBMtable <- data.frame(coef(all.fitted.model))
names(coefBMtable) <- "GARCH Parameters"
coefBMtable

plot(all.fitted.model, which=10)
par(mfrow = c(2,2))
hist(all.fitted.model@fit$residuals)
plot(all.fitted.model@fit$residuals)
plot(all.fitted.model, which=8)
plot(all.fitted.model, which=9)
par(mfrow = c(1,1))

forc1 = ugarchforecast(all.fitted.model, n.ahead = 100)

par(mfrow=c(1,2))
plot(forc1, which = 1)
plot(forc1, which = 3)
par(mfrow=c(1,1))

# Section 4a - Peaks-Over-Threshold Estimation - 10 Stocks Index

# Determine threshold
par(mfrow = c(1,2))
tcplot(allDF$Observations, u.range=c(0.3, 0.35))
par(mfrow = c(1,1))

# Fit the Generalizaed Pareto Distribution
mle <- fitgpd(allDF$Observations, thresh=.35, shape=0, est="mle")

POTtable <- matrix(nrow=6, ncol=1)
POTtable <- data.frame(POTtable)
POTtable[1,] <- mle$fitted.values
POTtable[2,] <- mle$std.err
POTtable[3,] <- mle$std.err.type
POTtable[4,] <- mle$logLik
POTtable[5,] <- mle$hessian
POTtable[6,] <- mle$threshold
rownames(POTtable) <- c("Fitted Values", "Standard Error",
                        "Standard Error Type", "Log Likelihood",
                        "Hessian", "Threshold")
colnames(POTtable) <- "POT Parameters"
POTtable

# graphic diagnostics for the fitted model
par(mfrow=c(2,2))
plot(mle, npy = 1, which = 1)
plot(mle, npy = 1, which = 4)
plot(mle$data[1:1400], xlab="Time", ylab="Observations",
     main="Left Tail Plot of POT", lwd=2)
plot(mle$data[1401:2528], xlab="Time", ylab="Observations",
     main="Right Tail Plot of POT", lwd=2)
par(mfrow=c(1,1))

## Section 4b - VaR Forecasting of POT

newMLE <- data.frame(allDF$Time, mle$data)
mle_var_ts <- ts(newMLE)
mle.var.2c <- VAR(mle_var_ts, p = 2, type = "const")

# Predict the values for the next 125, 250, and 500 days
mle_var.prd.100 <- predict(mle.var.2c, n.ahead = 100, ci = 0.9)
mle_var.prd.500 <- predict(mle.var.2c, n.ahead = 500, ci = 0.9)

plot(mle_var.prd.100)
plot(mle_var.prd.500)

plot(mle_var.prd.500$fcst$allDF.Time[1:200],
     mle_var.prd.500$fcst$mle.data[1:200], col='red',
     main="Plot of MLE VaR Forecasts", xlab = "Time",
     ylab = "Returns")
points(mle_var.prd.100$fcst$allDF.Time[1:200],
       mle_var.prd.100$fcst$mle.data[1:200], col='blue')
legend('topleft', legend=c('100 Days = blue', '500 Days = red'),
       col=c('blue','red'))

## Section 4c - Estimated Shortfall (CvaR) Forecasting of POT

# The conditional drawdown is the the mean of the worst 0.95% drawdowns
mddMLE <- maxdrawdown(mle_var_ts[,2])

# CvaR (Expected Shortfall) Estmation
estSFmle <- ES(ts(mle$data), p=0.95, method="modified",
               portfolio_method = "component", weights = ts(allGEV[2]),
               invert = F, operational = F)

estSFMLEtable <- data.frame(mddMLE, estSFmle$MES)
rownames(estSFMLEtable) <- ""
colnames(estSFMLEtable) <- c("Max Drawdown", "from", "to", "Estimated Shortfall")
estSFMLEtable

## Section 4d - Hill Estimation of Peaks-Over-Threshold
hill(mle$data, option="xi", start=15 , end=45)

## Section 4e - Anderson-Darling Test of Robustness
adtestMLE <- ad.test(mle$data)
adtestMLEdf <- data.frame(adtestMLE$statistic, adtestMLE$p.value)
names(adtestMLEdf) <- c("Statistic", "P-Value")
rownames(adtestMLEdf) <- ""
adtestMLEdf

## Section 4f - Table of Results  
MLE_Results <- matrix(nrow = 3, ncol = 5)
MLE_Results <- data.frame(MLE_Results)
names(MLE_Results) <- c("Method","Point Estimate","lower","upper","CI")

MLE_Results[1,] <- c("MLE VaR 100 Days",
round(mean(mle_var.prd.100$fcst$mle.data[,1]), 4),
round(mean(mle_var.prd.100$fcst$mle.data[,2]), 4),
round(mean(mle_var.prd.100$fcst$mle.data[,3]), 4),
round(mean(mle_var.prd.100$fcst$mle.data[,4]), 4))

MLE_Results[2,] <- c("MLE VaR 500 Days",
round(mean(mle_var.prd.500$fcst$mle.data[,1]), 4),
round(mean(mle_var.prd.500$fcst$mle.data[,2]), 4),
round(mean(mle_var.prd.500$fcst$mle.data[,3]), 4),
round(mean(mle_var.prd.500$fcst$mle.data[,4]), 4))

MLE_Results[3,] <- c("MLE ES", estSFMLEtable[1],
                     estSFMLEtable[2], estSFMLEtable[3],
                     "NA")
MLE_Results

## Section 4g - 100 days GARCH(1,1) Forecasting of POT

# Fit models with Generalized Auto-Regressive Conditional Heteroskadacity
pot.fitted.model = ugarchfit(spec, mle$data, solver = 'hybrid')

coefPOTtable <- data.frame(coef(pot.fitted.model))
names(coefPOTtable) <- "POT Parameters"
coefPOTtable

par(mfrow = c(2,2))
plot(pot.fitted.model, which=8)
plot(pot.fitted.model, which=3)
plot(pot.fitted.model, which=7)
plot(pot.fitted.model, which=9)
par(mfrow = c(1,1))

forc1 = ugarchforecast(pot.fitted.model, n.ahead = 100)

par(mfrow=c(1,2))
plot(forc1, which = 1)
plot(forc1, which = 3)
par(mfrow=c(1,1))

# Section 5a - Table of Estimation Method Influence
All_Results <- matrix(nrow = 4, ncol = 5)
All_Results <- data.frame(All_Results)
colnames(All_Results) <- c("Test","VaR","ES","mu","P-Value")

All_Results[1,] <- c("VaR",
round(mean(allDF_var.2c$varresult$Observations$coefficients),4),
"NA", "NA", adtest$p.value)

All_Results[2,] <- c("ES", "NA", estSF$ES, "NA", adtest$p.value)

All_Results[3,] <- c("Block Maxima",
round(mean(allGEV_var.2c$varresult$Returns$coefficients),4),
estSFGEV$MES, all_BM_GEV$par.ests[3], adtestGEV$p.value)

All_Results[4,] <- c("Peaks Over Threshold", 
round(mean(mle.var.2c$varresult$mle.data$coefficients), 4),
estSFmle$MES, mle$fitted.values, adtestMLE$p.value)

All_Results

Diversified Portfolio Predictive Analytics

ContextBase, http://contextbase.github.io

All programming by John Akwei, ECMp ERMp Data Scientist

December 14, 2016

Table of Contents

Synopsis

References

Section 1a - Working Directory, Required Packages, and Session Information

Section 1b - Formatting the Proprietary Data

Section 1c - Downloading Stock Ticker Data

Section 2a - Exploratory Data Analysis

Section 2b - VaR Estimation of the 10 Stocks Index

Section 2c - Estimated Shortfall (CvaR) 10 Stocks Index

Section 2d - Hill Estimation of 10 Stocks Index

Section 2e - Anderson-Darling Test of Normal Distribution

Section 2f - Table of Results

Section 3a - EVT Block Maxima Estimation of 10 Stocks Index

Section 3b - VaR Forecasting of Block Maxima

Section 3c - Estimated Shortfall (CvaR) of Block Maxima

Section 3d - Hill Estimation of Block Maxima

Section 3e - Anderson-Darling Test of Normal Distribution

Section 3f - Table of Results

Section 3g - 100 days GARCH Forecasting of Block Maxima

Section 4a - Peaks-Over-Threshold Estimation - 10 Stocks Index

Section 4b - VaR Forecasting of POT

Section 4c - Estimated Shortfall (CvaR) Forecasting of POT

Section 4d - Hill Estimation of Peaks-Over-Threshold

Section 4e - Anderson-Darling Test of Normal Distribution

Section 4f - Table of Results

Section 4g - 100 days GARCH Forecasting of Peaks-Over-Threshold

Section 5a - Table of Estimation Method Influence

Section 5b - Conclusions

Section 6 - R Language Source Code