About

In this worksheet we work through several variance, covariance, volatility, and causality calculations. We finish with a short mathematical proof (no R required).


Task 1: Variance, Covariance, and Volatility

This task follows two examples in the book: R Example 2.5 (p. 58) and R Example 2.6 (p. 66).

# require() loads the package if installed and returns FALSE otherwise,
# in which case we install it; dependencies = TRUE also installs dependencies
if(!require("quantmod",quietly = TRUE))
  install.packages("quantmod",dependencies = TRUE, repos = "https://cloud.r-project.org")

##### 1A) Calculate the correlation and covariance matrix of the adjusted daily log returns for four different stocks of your choice. Explain your observations in terms of potential relationships.

# Covariance and correlation of adjusted daily log returns
library("quantmod")
symbols <- c("MSFT", "AAPL", "TWTR", "GOOG")
getSymbols(symbols, src = "yahoo", from = "2017-11-28", to = "2018-11-28")
## [1] "MSFT" "AAPL" "TWTR" "GOOG"
# Daily log returns from the adjusted closing prices
MSRd <- periodReturn(MSFT$MSFT.Adjusted, period = "daily", type = "log")
msrd2 <- as.numeric(MSRd)
APRd <- periodReturn(AAPL$AAPL.Adjusted, period = "daily", type = "log")
aprd2 <- as.numeric(APRd)
TWRd <- periodReturn(TWTR$TWTR.Adjusted, period = "daily", type = "log")
twrd2 <- as.numeric(TWRd)
GOOGRd <- periodReturn(GOOG$GOOG.Adjusted, period = "daily", type = "log")
googrd2 <- as.numeric(GOOGRd)
# Combine the four return series into a matrix, one column per stock
M <- cbind(msrd2, aprd2, twrd2, googrd2)
# Covariance matrix
cov(M)
##                msrd2        aprd2        twrd2      googrd2
## msrd2   0.0002687491 0.0001732819 0.0002393885 0.0002182014
## aprd2   0.0001732819 0.0002715042 0.0002025488 0.0001756354
## twrd2   0.0002393885 0.0002025488 0.0012505897 0.0002678249
## googrd2 0.0002182014 0.0001756354 0.0002678249 0.0002743206
#Correlation (Pearson method)
cor(M, method = "pearson")
##             msrd2     aprd2     twrd2   googrd2
## msrd2   1.0000000 0.6414924 0.4129260 0.8036276
## aprd2   0.6414924 1.0000000 0.3476034 0.6435684
## twrd2   0.4129260 0.3476034 1.0000000 0.4572611
## googrd2 0.8036276 0.6435684 0.4572611 1.0000000

All entries of the covariance matrix are positive, indicating that the daily log returns of the four chosen stocks tend to move together: each pair of return series exhibits a positive linear dependence.

Correlation is the unitless measure of the strength of this linear relationship and ranges from -1 to 1, with perfect correlation at the extremes and no correlation at zero. In the correlation matrix the value 1 runs along the diagonal, as expected: each stock's log return series is perfectly correlated with itself. In general we see moderate to strong positive correlation between the log return distributions of each pair of stocks, with the strongest correlation (0.8036276) between the daily log returns of Google and Microsoft.
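As a consistency check, each correlation is simply the corresponding covariance rescaled by the two standard deviations (the square roots of the diagonal entries of the covariance matrix). For the MSFT/GOOG pair:

\[\rho = \frac{0.0002182014}{\sqrt{0.0002687491 \times 0.0002743206}} \approx 0.8036\]

which matches the entry in the correlation matrix above.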

##### 1B) Calculate the three types of volatility for a particular stock of your choice. Consider a time window extending one year back from the most recent obtainable closing-day price. Order the three estimates from low to high volatility and explain how the ordering makes sense.

# Volatility calculations: three estimators for RPM over one year
library("TTR")
library("quantmod")
getSymbols("RPM", src = "yahoo", from = "2017-11-30", to = "2018-11-30")
## [1] "RPM"
rpm <- RPM['2017-11/2018-11']
m <- length(rpm$RPM.Close)  # number of trading days in the sample
ohlc <- rpm[, c("RPM.Open", "RPM.High", "RPM.Low", "RPM.Close")]
# n = m uses the full sample as a single window; N = 252 annualizes
vClose <- volatility(ohlc, n = m, calc = "close", N = 252)
vParkinson <- volatility(ohlc, n = m, calc = "parkinson", N = 252)
vGK <- volatility(ohlc, n = m, calc = "garman.klass", N = 252)
vClose[m]
##                 [,1]
## 2018-11-29 0.2216536
vParkinson[m]
##                 [,1]
## 2018-11-29 0.2134643
vGK[m]
##                 [,1]
## 2018-11-29 0.2142447

Ordering these volatility values from low to high we have:

Low - Parkinson estimator (0.2134643)

Mid - Garman & Klass estimator (0.2142447)

High - Close-to-close volatility (0.2216536)

In these calls, m is the number of RPM closing prices in the sample and n is the window length passed to volatility(). Since n = m, each estimator is computed over the entire one-year sample, so the last entry of each series is the single full-sample estimate reported above.

This ordering makes sense given the information used by each volatility estimator. The close-to-close estimate is the simplest, incorporating only the day-to-day closing prices of the stock. The Parkinson estimator is slightly more involved, incorporating each day's high and low prices to obtain a more descriptive range-based estimate. Finally, the Garman & Klass estimator uses not only the high and low prices but also the open and close. The last two estimators offer a "sharper" estimate of volatility by capturing the variation in daily prices in a range-based way.
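For reference, the per-period forms of the three estimators are given below (these are the standard formulas; TTR computes them over the n-day window and annualizes by a factor of \(\sqrt{N}\)):

\[\hat{\sigma}^2_{\text{close}} = \frac{1}{n-1}\sum_{t}\left(r_t - \bar{r}\right)^2, \qquad r_t = \ln\frac{C_t}{C_{t-1}}\]

\[\hat{\sigma}^2_{\text{Parkinson}} = \frac{1}{4n\ln 2}\sum_{t}\left(\ln\frac{H_t}{L_t}\right)^2\]

\[\hat{\sigma}^2_{\text{GK}} = \frac{1}{n}\sum_{t}\left[\frac{1}{2}\left(\ln\frac{H_t}{L_t}\right)^2 - \left(2\ln 2 - 1\right)\left(\ln\frac{C_t}{O_t}\right)^2\right]\]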

Task 2: Auto-Correlation and Auto-Regression

Follow the examples in the book: R Example 3.2 (p. 74) and R Example 4.1 (p. 115).

##### 2A) Calculate the ACF for a stock of your choice. Consider both the log return and squared log return. Interpret your results in terms of possible existence of autocorrelation.

# Compute the ACF of Netflix log returns and squared log returns
library("quantmod")
getSymbols("NFLX", src = "yahoo", from = "2017-11-26", to = "2018-11-26")
## [1] "NFLX"
nfRd <- periodReturn(NFLX$NFLX.Adjusted, period = "daily", type = "log")

acf(na.omit(nfRd),main="ACF of NFLX", ylim=c(-0.2,0.2))

acf(na.omit(nfRd)^2,main="ACF of NFLX square of log returns", ylim=c(-0.3,0.5))

In these plots, only a few lags marginally exceed the confidence bands: around lag 8 for the log returns and around lag 10 for the squared log returns. Taken at face value, this would suggest some linear dependence of Netflix log returns on their values 8 or 10 trading days in the past. However, isolated spikes of this size at scattered lags are consistent with sampling noise, so relative to other examples of ACFs these values may be considered insignificant and are not indicative of meaningful autocorrelation with past log return values.
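A more formal check is a Ljung-Box test on the first several lags; a minimal sketch (the choice of 10 lags is an assumption, not from the book):

# Ljung-Box test: H0 is no autocorrelation up to the chosen lag
Box.test(as.numeric(na.omit(nfRd)), lag = 10, type = "Ljung-Box")
Box.test(as.numeric(na.omit(nfRd))^2, lag = 10, type = "Ljung-Box")

Large p-values here would support the visual impression that the autocorrelations are not meaningful.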

##### 2B) Plot the exchange rate for USD versus another currency of your choice. Interpret your results in terms of behavior.

library("quantmod")
#Download USD/GBP rates
getFX("USD/GBP")
## [1] "USDGBP"
#Plot the results
plot(USDGBP)

These results indicate a general appreciation of the US Dollar against the British Pound Sterling over the period, with some obvious fluctuations along the way, ultimately settling at or near the then-current rate of about £0.79 per $1.

##### 2C) Test for the possible existence of an underlying AR(1) – Markov process in your exchange rate currency pair. To this end, plot the ACF and the partial ACF (PACF). Interpret your results. Clearly refer to the lags, and their impacts in determining the order.

#Plot the ACF
acf(USDGBP)

#Plot the partial ACF
pacf(USDGBP)

The ACF of the USD/GBP series decays slowly and roughly exponentially from lag 1 through lag 22, which typifies the behavior of an underlying AR(1) Markov process. The PACF shows a single large spike at lag 1, with all higher lags falling inside the confidence bands; this cutoff after lag 1 confirms the order of the underlying autoregressive process to be 1.
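To back up the visual diagnosis, we could fit an AR(1) model directly; a minimal sketch, assuming the USDGBP series downloaded above is still in the workspace:

# Fit an AR(1) model to the exchange-rate series
fit <- arima(as.numeric(USDGBP), order = c(1, 0, 0))
fit$coef  # an ar1 coefficient near 1 is consistent with the slow ACF decay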

Task 3: Granger Causality Test

To conduct this test the package lmtest is required; it is installed (if necessary) and loaded in the code chunk below.

# require() loads the package if installed and returns FALSE otherwise,
# in which case we install it; dependencies = TRUE also installs dependencies
if(!require("lmtest",quietly = TRUE))
  install.packages("lmtest",dependencies = TRUE, repos = "https://cloud.r-project.org")

##### 3A) Include below the code chunk to solve for 3.5.7 R Lab/p. 106. Write your conclusions.

library("lmtest")
# Which came first: the chicken or the egg? Based on information regarding yearly population and production of chicken and eggs from 1930 to 1983
data(ChickEgg)
#Granger Causality tests
grangertest(egg ~ chicken, order = 3, data = ChickEgg)
## Granger causality test
## 
## Model 1: egg ~ Lags(egg, 1:3) + Lags(chicken, 1:3)
## Model 2: egg ~ Lags(egg, 1:3)
##   Res.Df Df      F Pr(>F)
## 1     44                 
## 2     47 -3 0.5916 0.6238
grangertest(chicken ~ egg, order = 3, data = ChickEgg)
## Granger causality test
## 
## Model 1: chicken ~ Lags(chicken, 1:3) + Lags(egg, 1:3)
## Model 2: chicken ~ Lags(chicken, 1:3)
##   Res.Df Df     F   Pr(>F)   
## 1     44                     
## 2     47 -3 5.405 0.002966 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The first test asks whether egg is Granger-"caused" by chicken (egg ~ chicken), i.e., whether egg can be predicted as a function of lagged chicken values; the second test asks the reverse, whether chicken is Granger-"caused" by egg (chicken ~ egg).

To determine significance, or causality, we refer to the p-values of the two Granger tests. In the first test we observe a p-value of 0.6238, so we fail to reject the null hypothesis: egg is not caused by, and is not a function of, chicken. In the second test, however, we observe a p-value of 0.002966, so we reject the null hypothesis at the 1% level and conclude that chicken is a function of egg: past egg production helps predict the chicken population.

More information about the data used in testing for causality can be obtained by typing ?ChickEgg in R.

##### 3B) Briefly describe the data in terms of time range and variables. Similar to the linear autoregressive model described in class, write the mathematical regression model solved in each Granger test, including the proper order. Use naming conventions and notations reflective of the ChickEgg data set.

The ChickEgg data set contains the yearly US chicken population and egg production from 1930 to 1983. The observed variables are "chicken", the number of chickens on December 1 of the given year (excluding commercial broilers), and "egg", US egg production in millions of dozens for the given year.
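These details can be verified directly in R; a minimal sketch:

# Inspect the ChickEgg data set
library("lmtest")
data(ChickEgg)
str(ChickEgg)  # multivariate yearly time series, 1930-1983
?ChickEgg      # full variable descriptions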

Mathematical regression model (Test 1):

\[E_t = a_0 + a_1E_{t-1} + a_2E_{t-2} + a_3E_{t-3} + b_1C_{t-1} + b_2C_{t-2} + b_3C_{t-3} + R_t\]

Mathematical regression model (Test 2):

\[C_t = a_0 + a_1C_{t-1} + a_2C_{t-2} + a_3C_{t-3} + b_1E_{t-1} + b_2E_{t-2} + b_3E_{t-3} + R_t\]

Null hypothesis (in each test, the lagged cross terms have no predictive power):

\[H_0 : b_1 = b_2 = b_3 = 0\]

In Test 1 this states that C does not Granger-cause E; in Test 2, that E does not Granger-cause C.

In these regression models, E and C denote egg and chicken, respectively, R is the error term, and t indexes the year. The coefficients a and b are regression (slope) coefficients, each giving the change in the response variable per unit change in the corresponding lagged predictor.
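Each grangertest() call compares the unrestricted model above (Model 1) against the restricted model that drops the lagged cross terms (Model 2) using a standard F-test. With \(q = 3\) restrictions and residual degrees of freedom \(df_u\) for the unrestricted model (44 in the output above):

\[F = \frac{(RSS_{\text{restricted}} - RSS_{\text{unrestricted}})/q}{RSS_{\text{unrestricted}}/df_u}\]

A small p-value for this F-statistic leads to rejecting \(H_0\).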

Task 4: Mathematical Proof

##### 4A) Prove the two results in Eq (2.32)/p. 53. No R-coding is needed here. Clearly show your steps. Hint: Use the definition of \(E(X^n)\) for X-log normally distributed. Observe also that \(Var(X) = E(X^2)-E^2(X)\) for any random variable X.

knitr::include_graphics("Proof.png")
Equation 2.32 Proof
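For reference, here is a sketch of the derivation shown in the image, under the assumption that Eq. (2.32) gives the mean and variance of a log-normal random variable. If \(X\) is log-normally distributed, then \(X = e^Y\) with \(Y \sim N(\mu, \sigma^2)\), and \(E(X^n) = e^{n\mu + n^2\sigma^2/2}\). Taking \(n = 1\) and \(n = 2\):

\[E(X) = e^{\mu + \sigma^2/2}\]

\[Var(X) = E(X^2) - E^2(X) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)\]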