price = tq_get_cafef(symbol = 'VNM', from = '2010-01-01', to = '2022-01-01', src = 'CAFEF')
#VNM from 2010-01-01 to 2022-01-01 already cloned
glimpse(as.data.frame(price))
Rows: 2,994
Columns: 14
$ date <date> 2021-12-31, 2021-12-30, 2021-12-29, 2021-12-28, 2021-1…
$ change.percent <chr> "1.10 (1.29 %)", "0.10 (0.12 %)", "-0.30 (-0.35 %)", "-…
$ open <dbl> 85.5, 85.3, 85.5, 86.2, 86.0, 84.7, 85.4, 85.7, 85.8, 8…
$ high <dbl> 87.5, 85.6, 85.6, 86.2, 86.4, 86.0, 85.4, 86.2, 86.0, 8…
$ low <dbl> 85.3, 85.1, 85.1, 85.2, 85.7, 84.7, 84.5, 85.3, 85.5, 8…
$ close <dbl> 86.4, 85.3, 85.2, 85.5, 86.1, 86.0, 84.7, 85.4, 85.5, 8…
$ adjusted <dbl> 84.98, 83.90, 83.80, 84.10, 84.69, 84.59, 83.31, 84.00,…
$ match.volume <dbl> 2325100, 893100, 945400, 1544400, 1138300, 1148400, 214…
$ reconcile.volume <dbl> 143400, 147400, 20000, 393400, 147400, 280000, 471100, …
$ match.value <dbl> 200645000000, 76247000000, 80583000000, 132401000000, 9…
$ reconcile.value <dbl> 13078080000, 13428140000, 1751200000, 34511340000, 1179…
$ volume.round1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ volume.round2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ volume.round3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
df <- price %>%
arrange(date) %>%
select(date, adjusted)
tail(df)
ggplot(df, aes(date, adjusted, group = 1, color = adjusted)) +
  geom_line() +
  geom_hline(yintercept = mean(df$adjusted), color = 'red') +
  labs(x = "Date", y = "Price", title = "VNM")
# Return Plot
vnm_logret = xts(df$adjusted, as.Date(df$date))
vnm_logret = vnm_logret %>%
  log %>%
  diff
vnm_logret = vnm_logret[-1, ]
qplot(x = 1:length(vnm_logret) , y = vnm_logret , geom = 'line') + geom_line(color = 'darkblue') +
geom_hline(yintercept = mean(vnm_logret) , color = 'red' , size = 1) +
labs(x = '' , y = 'Daily Returns')
To verify the stationarity of the returns, we use the Augmented Dickey-Fuller test, whose null hypothesis is that the time series is non-stationary.
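For reference, the test is based on a regression of the form (standard ADF specification, shown here only for context): \[\Delta y_t = \alpha + \beta t + \gamma\,y_{t-1} + \sum_{i=1}^{k}\delta_i\,\Delta y_{t-i} + \varepsilon_t\] with \(H_0: \gamma = 0\) (unit root, non-stationary) against \(H_1: \gamma < 0\) (stationary).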
adf.test(vnm_logret)
Augmented Dickey-Fuller Test
data: vnm_logret
Dickey-Fuller = -14.737, Lag order = 14, p-value = 0.01
alternative hypothesis: stationary
For time series analysis, the Box-Jenkins approach applies ARIMA models to find the model that best represents the stochastic process generating the series. The method uses a three-stage modelling approach: a) identification, b) estimation, c) diagnostic checking.
### Identification
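For reference, identification amounts to choosing the orders p and q of the ARMA(p, q) representation (standard notation, with \(\varepsilon_t\) white noise): \[y_t = c + \sum_{i=1}^{p}\phi_i\,y_{t-i} + \sum_{j=1}^{q}\theta_j\,\varepsilon_{t-j} + \varepsilon_t\]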
model.arima = auto.arima(vnm_logret , max.order = c(3 , 0 ,3) , stationary = TRUE , trace = T , ic = 'aicc')
Fitting models using approximations to speed things up...
ARIMA(2,0,2) with non-zero mean : -16640.05
ARIMA(0,0,0) with non-zero mean : -16627.37
ARIMA(1,0,0) with non-zero mean : -16632.69
ARIMA(0,0,1) with non-zero mean : -16625.63
ARIMA(0,0,0) with zero mean : -16621.77
ARIMA(1,0,2) with non-zero mean : -16629
ARIMA(2,0,1) with non-zero mean : -16629.1
ARIMA(3,0,2) with non-zero mean : -16640.76
ARIMA(3,0,1) with non-zero mean : -16641.24
ARIMA(3,0,0) with non-zero mean : -16638.24
ARIMA(4,0,1) with non-zero mean : Inf
ARIMA(2,0,0) with non-zero mean : -16629.98
ARIMA(4,0,0) with non-zero mean : -16642.4
ARIMA(5,0,0) with non-zero mean : -16641.9
ARIMA(5,0,1) with non-zero mean : -16641.61
ARIMA(4,0,0) with zero mean : -16635.79
Now re-fitting the best model(s) without approximations...
ARIMA(4,0,0) with non-zero mean : -16629.58
Best model: ARIMA(4,0,0) with non-zero mean
model.arima
Series: vnm_logret
ARIMA(4,0,0) with non-zero mean
Coefficients:
ar1 ar2 ar3 ar4 mean
0.0084 0.0083 -0.0472 -0.0321 0.0008
s.e. 0.0183 0.0183 0.0183 0.0183 0.0003
sigma^2 = 0.0002257: log likelihood = 8320.81
AIC=-16629.61 AICc=-16629.58 BIC=-16593.59
The procedure consists of inspecting the residual plot and its ACF & PACF diagrams, and checking the Ljung-Box test result. If the ACF & PACF of the model residuals show no significant lags, the selected model is appropriate.
model.arima$residuals %>% ggtsdisplay(plot.type = 'hist' , lag.max = 14)
Both the ACF and PACF plots are similar and the autocorrelations appear to be essentially zero. The lower-right plot shows the histogram of the residuals compared to a normal distribution \(N(0, \sigma^2)\).
ar.res = model.arima$residuals
Box.test(model.arima$residuals , lag = 14 , fitdf = 2 , type = 'Ljung-Box')
Box-Ljung test
data: model.arima$residuals
X-squared = 5.8647, df = 12, p-value = 0.9227
We cannot reject the null hypothesis, therefore the residuals behave like white noise and there is no indication of a remaining pattern that could be modeled.
Although ACF & PACF of residuals have no significant lags, the time series plot of residuals shows some cluster volatility. It is important to remember that ARIMA is a method to linearly model the data and the forecast width remains constant because the model does not reflect recent changes or incorporate new information. In order to model volatility we use the Autoregressive Conditional Heteroscedasticity (ARCH) model. ARCH is a statistical model for time series data that describes the variance of the current error term as a function of the actual sizes of the previous time periods’ error terms.
A GARCH process is appropriate when the squared residuals are correlated, and the ACF and PACF plots clearly indicate significant correlation.
tsdisplay(ar.res^2 , main = 'Squared Residuals')
Another way to test for heteroscedasticity of the squared residuals is to perform significance tests on the \(\alpha_1\) and \(\beta_1\) parameters of a GARCH(1,1) model.
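Here \(\alpha_1\) and \(\beta_1\) are the parameters of the GARCH(1,1) conditional variance equation (standard form, matching the sGARCH(1,1) specification below): \[\sigma_t^2 = \omega + \alpha_1\,\epsilon_{t-1}^2 + \beta_1\,\sigma_{t-1}^2\]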
# Model specification
model.spec = ugarchspec(variance.model = list(model = 'sGARCH' , garchOrder = c(1 , 1)) ,
mean.model = list(armaOrder = c(0 , 0)))
model.fit = ugarchfit(spec = model.spec , data = ar.res , solver = 'solnp')
options(scipen = 999)
model.fit@fit$matcoef
Estimate Std. Error t value Pr(>|t|)
mu -0.00006781119 0.000225598332 -0.3005837 0.76373195
omega 0.00001562176 0.000006460835 2.4179160 0.01560968
alpha1 0.16338532058 0.018628800583 8.7705765 0.00000000
beta1 0.77278283793 0.046550170076 16.6010744 0.00000000
Both \(\alpha_1\) and \(\beta_1\) are significantly different from zero, therefore it is reasonable to assume time-varying volatility of the residuals. With successive replacement of \(\sigma_{t-1}^{2}\), the GARCH(1,1) variance equation can be expanded so that the conditional variance depends on the entire history of squared shocks.
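Substituting repeatedly (a standard identity, valid for \(0 \le \beta_1 < 1\)) shows that the GARCH(1,1) variance is an ARCH(\(\infty\)) process: \[\sigma_t^2 = \frac{\omega}{1-\beta_1} + \alpha_1\sum_{i=1}^{\infty}\beta_1^{\,i-1}\epsilon_{t-i}^2\]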
Value at Risk (VaR) is a statistical measure of the downside risk of a position. It estimates how much a set of investments might lose, given normal market conditions, over a set time period. A VaR statistic has three components: a) a time period, b) a confidence level, c) a loss amount (or loss percentage). At a 95% confidence level, we can say that the worst daily loss will not exceed the VaR estimate. Using historical data, we can estimate VaR by taking the 5% quantile of the returns. For our data this estimation is:
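A minimal sketch of that estimate (assuming `vnm_logret` holds the daily log returns built above; the red bars in the histogram below are the returns falling below this quantile):
# Historical (non-parametric) one-day VaR at the 95% level:
# the 5% empirical quantile of the daily log returns
quantile(as.numeric(vnm_logret), probs = 0.05)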
qplot(vnm_logret , geom = 'histogram') + geom_histogram(fill = 'lightblue' , bins = 30) +
geom_histogram(aes(vnm_logret[vnm_logret < quantile(vnm_logret , 0.05)]) , fill = 'red' , bins = 30) +
labs(x = 'Daily Returns')
Red bars refer to returns lower than the 5% quantile.
To estimate VaR, we need to properly define the corresponding quantile of the assumed distribution. For the normal distribution, the quantile corresponding to a = 5% is -1.645. Empirical evidence suggests that the assumption of normality often produces weak results. The Jarque-Bera test checks the hypothesis that stock returns follow a normal distribution: \[JB = \frac{n-k+1}{6}\left(S^2+\frac{1}{4}(C-3)^2\right)\] where S is the skewness and C is the kurtosis. A normally distributed sample would return a JB score close to zero. The low p-value indicates that the stock returns are not normally distributed.
jarque.test(vnm_logret)
p2_1 = qplot(vnm_logret , geom = 'density') + geom_density(fill = 'blue' , alpha = 0.4) +
geom_density(aes(rnorm(200000 , 0 , sd(vnm_logret))) , fill = 'red' , alpha = 0.25) +
labs(x = '')
p2_2 = qplot(vnm_logret , geom = 'density') + geom_density(fill = 'blue' , alpha = 0.4) +
geom_density(aes(rnorm(200000 , 0 , sd(vnm_logret))) , fill = 'red' , alpha = 0.25) +
coord_cartesian(xlim = c(-0.07 , -0.02) , ylim = c(0 , 10)) +
geom_vline(xintercept = c(qnorm(p = c(0.01 , 0.05) , mean = mean(vnm_logret) , sd = sd(vnm_logret))) ,
color = c('darkgreen' , 'green') , size = 1) + labs(x = 'Daily Returns')
grid.arrange(p2_1 , p2_2 , ncol = 1)
In the figure above, density plots are shown for the stock returns (blue) and for normally distributed data (red). The vertical lines in the lower plot represent the normal quantiles for a = 0.05 (light green) and a = 0.01 (dark green). The lower plot indicates that at the 95% confidence level, using the normal distribution may overestimate the Value at Risk, while at the 99% confidence level it would underestimate the risk.
### Student’s t-distribution
fitdist(distribution = 'std' , x = vnm_logret)$pars
mu sigma shape
0.000174047 0.017529840 2.827809536
numnum = fitdist(distribution = 'std' , x = vnm_logret)$pars
numnum[3]
shape
2.82781
cat("For a = 0.05 the quantile value of normal distribution is: " ,
qnorm(p = 0.05) , "\n" ,
"For a = 0.05 the quantile value of t-distribution is: " ,
qdist(distribution = 'std' , shape = 2.9365168555 , p = 0.05) , "\n" , "\n" ,
'For a = 0.01 the quantile value of normal distribution is: ' ,
qnorm(p = 0.01) , "\n" ,
"For a = 0.01 the quantile value of t-distribution is: " ,
qdist(distribution = 'std' , shape = 2.9365168555 , p = 0.01) , sep = "")
For a = 0.05 the quantile value of normal distribution is: -1.644854
For a = 0.05 the quantile value of t-distribution is: -1.340812
For a = 0.01 the quantile value of normal distribution is: -2.326348
For a = 0.01 the quantile value of t-distribution is: -2.609131
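To see how these quantiles translate into a one-day VaR estimate, here is a minimal sketch using the unconditional mean and standard deviation of the returns and the fitted shape parameter `numnum[3]` (this mirrors the delta approach used for the comparison plot later):
# One-day 95% VaR under the normal and the fitted standardized t assumptions
mean(vnm_logret) + sd(vnm_logret) * qnorm(p = 0.05)
mean(vnm_logret) + sd(vnm_logret) * qdist(distribution = 'std', p = 0.05, shape = numnum[3])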
head(vnm_logret)
[,1]
2010-01-05 0.043412493
2010-01-06 0.006514681
2010-01-07 -0.030771659
2010-01-08 -0.031748698
2010-01-11 -0.013921339
2010-01-12 -0.033257222
summary(vnm_logret)
Index vnm_logret
Min. :2010-01-05 Min. :-0.072469
1st Qu.:2013-01-03 1st Qu.:-0.007086
Median :2016-01-08 Median : 0.000000
Mean :2016-01-08 Mean : 0.000758
3rd Qu.:2019-01-07 3rd Qu.: 0.007734
Max. :2021-12-31 Max. : 0.067646
sd(vnm_logret)
[1] 0.01503838
skewness(vnm_logret)
[1] 0.14798
attr(,"method")
[1] "moment"
kurtosis(vnm_logret) - 3
[1] 0.1439202
attr(,"method")
[1] "excess"
qqnorm(vnm_logret)
qqline(vnm_logret, col = "blue")
# Calculate ACF
acf(vnm_logret,lag.max=40,plot=F)
Autocorrelations of series ‘vnm_logret’, by lag
0 1 2 3 4 5 6 7 8 9 10
1.000 0.009 0.008 -0.047 -0.033 0.023 -0.001 0.029 0.005 -0.012 0.006
11 12 13 14 15 16 17 18 19 20 21
0.014 -0.010 -0.007 0.004 -0.041 0.030 -0.017 -0.016 0.002 -0.011 0.034
22 23 24 25 26 27 28 29 30 31 32
-0.002 0.010 -0.002 0.000 0.009 0.000 0.011 -0.013 0.029 -0.006 -0.006
33 34 35 36 37 38 39 40
-0.001 -0.003 0.018 0.017 0.007 0.015 -0.014 0.010
# Plot ACF
acf(vnm_logret,lag.max=40)
h=hist(vnm_logret)
xfit = seq(min(vnm_logret), max(vnm_logret), length = 1000)
yfit = dnorm(xfit, mean = mean(vnm_logret), sd = sd(vnm_logret)) * diff(h$mids[1:2]) * length(vnm_logret)
lines(xfit, yfit, col = "blue", lwd = 2)
We use historical simulation to estimate VaR at the 5% level with different rolling estimation windows (500, 750 and 1000 observations).
### (VaR, 0.05, 500)
alpha = 0.05
window = 500
VaR = c()
for (i in 1:(length(vnm_logret) - window)) {
y = vnm_logret[i:(i + window)]
ys = sort(y)
ysta = ys[round(alpha * window)] # 25 is 5% of 500...
VaR = c(VaR, ysta)
}
plot(df$date[(window+2):length(df$date)], VaR, type = 'l', main = "(VaR, 0.05, 500)", xlab = "Year", col = 'blue')
alpha = 0.05
window = 750
VaR = c()
for (i in 1:(length(vnm_logret) - window)) {
y = vnm_logret[i:(i + window)]
ys = sort(y)
ysta = ys[round(alpha * window)] # 5% of 750...
VaR = c(VaR, ysta)
}
plot(df$date[(window+2):length(df$date)], VaR, type = 'l', main = "(VaR, 0.05, 750)", xlab = "Year", col = 'blue')
alpha = 0.05
window = 1000
VaR = c()
for (i in 1:(length(vnm_logret) - window)) {
y = vnm_logret[i:(i + window)]
ys = sort(y)
ysta = ys[round(alpha * window)] # 5% of 1000...
VaR = c(VaR, ysta)
}
plot( df$date[(window+2):length(df$date)], VaR, type = 'l', main = "(VaR, 0.05, 1000)", xlab = "Year", col = 'blue')
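The three loops above differ only in the window length; a small helper function (a sketch, assuming `vnm_logret` and `df` as defined above) removes the duplication:
# Historical-simulation VaR over a rolling window (same logic as the loops above)
hist_var = function(returns, alpha = 0.05, window = 500) {
  sapply(1:(length(returns) - window), function(i) {
    y = sort(as.numeric(returns[i:(i + window)]))
    y[round(alpha * window)]   # empirical alpha-quantile of the window
  })
}
plot(df$date[(500 + 2):length(df$date)], hist_var(vnm_logret, 0.05, 500),
     type = 'l', main = "(VaR, 0.05, 500)", xlab = "Year", col = 'blue')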
The ugarchroll method allows us to perform a rolling estimation and forecasting of a model/dataset combination. It returns the distributional forecast parameters necessary to calculate any required measure on the forecasted density. We set the last 500 observations as the test set and perform a rolling 1-step-ahead forecast of the conditional standard deviation, re-estimating the GARCH parameters every 50 observations.
var_roll_forecast = function(par_arr) {
  # GARCH specification with the requested (p, q) variance order
  model.spec = ugarchspec(variance.model = list(model = 'sGARCH' ,
                                                garchOrder = par_arr) ,
                          mean.model = list(armaOrder = c(0 , 0)))
  # Rolling 1-step-ahead forecasts: last 500 observations as test set, refit every 50
  model.roll = ugarchroll(spec = model.spec , data = vnm_logret,
                          n.start = length(vnm_logret) - 500, refit.every = 50,
                          refit.window = 'moving')
  # 95% VaR from the forecasted sigma and the fitted t-distribution quantile
  VaR95_td = mean(vnm_logret) +
    model.roll@forecast$density[ , 'Sigma'] * qdist(distribution = 'std',
                                                    shape = numnum[3], p = 0.05)
  return(VaR95_td)
}
qplot(y = vnm_logret , x = 1:length(vnm_logret) , geom = 'point') + geom_point(colour = 'lightgrey' , size = 2) +
geom_line(aes(y = model.fit@fit$sigma*(-1.340812) , x = 1:length(vnm_logret)) , colour = 'red') +
geom_hline(yintercept = sd(vnm_logret)*qnorm(0.05) , colour = 'darkgreen' , size = 1.2) + theme_light() +
labs(x = '' , y = 'Daily Returns' , title = 'Value at Risk Comparison')
The red line denotes the VaR produced by the GARCH model and the green line refers to the delta-normal VaR.
# Probability mass of the number of exceptions, X ~ Binomial(500, 0.05)
p = dbinom(x = 0:49 , size = 500 , prob = 0.05)
qplot(y = p , x = 1:50 , geom = 'line') + scale_x_continuous(breaks = seq(0 , 50 , 2)) +
annotate('segment' , x = c(16 , 35) , xend = c(16 , 35) , y = c(0 , 0) , yend = p[c(16 , 35)] , color = 'red' ,
size = 1) + labs(y = 'Probability' , x = 'Number of Exceptions') + theme_light()
The plot above represents the probability distribution of the number of exceptions implied by the binomial distribution. The expected number is 25 (= 500 obs. × 5%). The two red lines denote the 95% confidence region, the lower bound being 16 and the upper 35. Therefore, when we count the exceptions on the test set, we expect a number between 16 and 35 in order to regard the GARCH model as adequately predictive.
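These bounds can be cross-checked directly from the binomial distribution (a sketch; the exact cut-offs depend on the tail convention, so they may differ by one from the values read off the plot):
# Central ~95% region for the exception count, X ~ Binomial(500, 0.05)
qbinom(p = c(0.025, 0.975), size = 500, prob = 0.05)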
show_VaR = function(VaR) {
cat('Number of exceptions: ', (sum(vnm_logret[(length(vnm_logret)-499):length(vnm_logret)] < VaR)) , sep = '')
qplot(y = VaR, x = 1:500, geom = 'line') +
geom_point(aes(x = 1:500, y = vnm_logret[(length(vnm_logret)-499):length(vnm_logret)],
color = as.factor(vnm_logret[(length(vnm_logret)-499):length(vnm_logret)] < VaR)), size = 2) +
scale_color_manual(values = c('gray', 'red')) +
labs(y = 'Daily Returns', x = 'Test set Observation') + theme_light() +
theme(legend.position = 'none')
}
VaR95_Garch = var_roll_forecast(c(1, 1))
show_VaR(VaR95_Garch)
Number of exceptions: 38
argarch_spec = ugarchspec(mean.model = list(armaOrder = c(1,0),
include.mean = T),
variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
distribution.model = "norm")
argarch_roll = ugarchroll(argarch_spec,
data = vnm_logret,
window.size = 300,
calculate.VaR = TRUE,
VaR.alpha = 0.05)
var_argarch_df = as.data.frame(argarch_roll, which = "VaR")
var_argarch_df = var_argarch_df %>%
mutate(date = as.Date(rownames(var_argarch_df)),
var = `alpha(5%)`)
var_argarch_df %>%
ggplot(aes(date,var)) +
geom_line(color = "red") +
scale_y_continuous()
ugarchforecast(ugarchfit(argarch_spec, vnm_logret), n.ahead = 5)
*------------------------------------*
* GARCH Model Forecast *
*------------------------------------*
Model: sGARCH
Horizon: 5
Roll Steps: 0
Out of Sample: 0
0-roll forecast [T0=2021-12-31]:
Series Sigma
T+1 0.0010104 0.01071
T+2 0.0007003 0.01109
T+3 0.0006921 0.01143
T+4 0.0006919 0.01174
T+5 0.0006919 0.01203