About

In this worksheet we look at different variance, covariance, volatility, and causality calculations. We finish with a short mathematical proof (no R required).

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer the questions. Submit your work to RPubs as detailed in previous notes.

Note

For clarity, tasks/questions to be completed/answered are highlighted in red (the color is visible only in preview mode) and numbered according to their placement in the task section. Type your answers outside the red color tags!

You will often need to add your own code chunks. Execute all code chunks in order, preview, publish, and submit the link on Sakai following the naming convention. Make sure to add comments to your code where appropriate. Use your own words!

Any sign of plagiarism will result in dismissal of the work!


Task 1: Variance, Covariance, and Volatility

This task follows the two examples in the book R Example 2.5/p. 58 and R Example 2.6/p. 66

# require() attaches the package and returns FALSE if it is not installed;
# in that case, install it together with its dependencies and then attach it
if(!require("quantmod",quietly = TRUE)) {
  install.packages("quantmod",dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(quantmod)
}


##### 1A) Calculate the correlation and covariance matrix of the adjusted daily log returns for four different stocks of your choice. Explain your observations in terms of potential relationships.

# Once you have obtained the adjusted daily log returns for your stocks, omitting the time index, you will need to combine them to create a matrix. Below is an example.  For more details see the Help command in R on cbind, cov, and cor.
getSymbols('AAPL',src='yahoo', from="2007-01-01", to="2017-12-30")
[1] "AAPL"
# daily log returns as a plain numeric vector (as.numeric() drops the time index);
# to base the returns explicitly on adjusted closes, pass e.g. Ad(AAPL) to periodReturn()
AAPLRd=as.numeric(periodReturn(AAPL,period="daily",type="log"))
getSymbols('GE',src='yahoo', from="2007-01-01", to="2017-12-30")
[1] "GE"
GERd=as.numeric(periodReturn(GE,period="daily",type="log"))
getSymbols('MCD',src='yahoo', from="2007-01-01", to="2017-12-30")
[1] "MCD"
MCDRd=as.numeric(periodReturn(MCD,period="daily",type="log"))
getSymbols('KO',src='yahoo', from="2007-01-01", to="2017-12-30")
[1] "KO"
KORd=as.numeric(periodReturn(KO,period="daily",type="log"))
# M <- cbind(A,B,C) # create a matrix where each column is an array/vector of numerical values 
M <- cbind(AAPLRd,GERd,MCDRd,KORd)
# cov(M) # compute the covariance matrix
cov(M)
             AAPLRd         GERd        MCDRd         KORd
AAPLRd 4.043223e-04 1.700594e-04 8.576271e-05 7.507611e-05
GERd   1.700594e-04 3.735289e-04 9.831640e-05 9.052457e-05
MCDRd  8.576271e-05 9.831640e-05 1.374065e-04 6.487736e-05
KORd   7.507611e-05 9.052457e-05 6.487736e-05 1.306746e-04
# cor(M, method="pearson") # compute the correlation matrix based on the Pearson method
cor(M, method="pearson")
          AAPLRd      GERd     MCDRd      KORd
AAPLRd 1.0000000 0.4375975 0.3638572 0.3266196
GERd   0.4375975 1.0000000 0.4339705 0.4097403
MCDRd  0.3638572 0.4339705 1.0000000 0.4841655
KORd   0.3266196 0.4097403 0.4841655 1.0000000

The covariance matrix shows that all pairwise covariances are positive, and its diagonal shows that AAPL and GE returns are roughly three times as variable as MCD and KO returns (daily standard deviations of about 2.0% and 1.9%, versus about 1.2% and 1.1%). The Pearson correlation coefficients of the daily log returns are 0.44 for AAPL and GE, 0.36 for AAPL and MCD, 0.33 for AAPL and KO, 0.43 for GE and MCD, 0.41 for GE and KO, and 0.48 for MCD and KO. All six correlations are positive but moderate, between about 0.33 and 0.48: the daily returns of these four stocks tend to move in the same direction, plausibly because all of them are exposed to overall market movements, but the linear relationship between any two of them is far from perfect. The strongest co-movement is between MCD and KO, and the weakest between AAPL and KO.

##### 1B) Calculate the three types of volatility for a particular stock of your choice. Consider a time window extending one year back from most recent obtainable closing day price. Order the three estimates from low to high volatility and explain how the ordering makes sense.

# For this task make sure you understand well what the variables n,m represent in the book's referenced example.
getSymbols('AAPL',src='yahoo', from="2007-01-01", to="2018-12-30")
[1] "AAPL"
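# one-year window ending at the most recent closing price available when this was run
aapl=AAPL['2017-12-04/2018-12-04']
m=nrow(aapl)   # m = number of trading days in the window
ohlc <-aapl[,c("AAPL.Open","AAPL.High","AAPL.Low","AAPL.Close")]
# n = m: a single window covering the whole year, so the full-year estimate sits in the last row
# N = 252: annualize using 252 trading days per year
vClose <- volatility(ohlc, n= m,calc="close",N=252)          # close-to-close
vParkinson <- volatility(ohlc, n= m,calc="parkinson",N=252)  # high/low range (Parkinson)
vGK <- volatility(ohlc, n= m,calc="garman",N=252)            # open/high/low/close (Garman-Klass)
vClose[m]; vParkinson[m]; vGK[m]   # estimates at the last date of the window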
                [,1]
2018-12-04 0.2692029
                [,1]
2018-12-04 0.2036672
                [,1]
2018-12-04 0.2035915

Ordered from low to high, the annualized volatility estimates are 0.2036 for Garman-Klass, 0.2037 for Parkinson and 0.2692 for close-to-close. The close-to-close estimator uses only closing prices, the Parkinson estimator uses the daily high and low, and the Garman-Klass estimator uses the open, high, low and close. The two range-based estimators (Parkinson and Garman-Klass) almost coincide and are both clearly below the close-to-close estimate. This ordering makes sense: the range-based formulas measure only the price variation within each trading day, so they do not capture jumps between one day's close and the next day's open, whereas close-to-close returns include these overnight moves. Using intraday information also makes the range-based estimators more efficient (less noisy) than the close-to-close estimator, but this alone does not imply that their values must be lower.

Task 2: Auto-Correlation and Auto-Regression

Follow the examples in the book: R Example 3.2/p. 74 and R Example 4.1/p. 115.

##### 2A) Calculate the ACF for a stock of your choice. Consider both the log return and squared log return. Interpret your results in terms of possible existence of autocorrelation.

acf(AAPLRd,main="acf of AAPL",ylim=c(-0.2,0.2))

acf(AAPLRd^2,main="",ylim=c(-0.2,0.2))

The horizontal dashed lines mark the approximate 95% confidence bounds, \(\pm 1.96/\sqrt{n}\), for the sample autocorrelations of an uncorrelated series of length \(n\). In the ACF of the AAPL daily log returns, the sample autocorrelations at lags 2, 4, 12, 16, 18 and 24 lie outside these bounds, so they are significantly different from zero at the 5% level. The daily log returns of AAPL over this period therefore show some autocorrelation with their past values (for example at lag 4). Because about 5% of the lags would cross the bounds by chance even for an uncorrelated series, isolated exceedances should nevertheless be interpreted with caution.

In the ACF of the squared daily log returns, the autocorrelations lie outside the bounds at all lags shown, so they are all significant at the 5% level. Because the mean daily log return is close to zero, the sample variance of the returns \(\{r_t\}\) is approximately the average of the squared returns, so \(r_t^2\) serves as a rough proxy for the variance on day \(t\). The significant autocorrelation of \(r_t^2\) at all lags therefore indicates a strong dependence of the variance of AAPL daily log returns on its past values (volatility clustering): large price moves tend to be followed by large moves, and small moves by small moves.

##### 2B) Plot the exchange rate for USD versus another currency of your choice. Interpret your results in terms of behavior.

getFX("GBP/USD") # download GBP/USD rates (US dollars per British pound) from oanda.com
[1] "GBPUSD"
plot(GBPUSD)

The plot shows the GBP/USD exchange rate, i.e. the number of US dollars per British pound, over roughly the past six months. The pound has followed a declining trend against the dollar: it appreciated in September but depreciated again in November. The rate has also moved within a fairly narrow band over this period, from about 1.26 to 1.34 US dollars per pound.

##### 2C) Test for the possible existence of an underlying AR(1) – Markov process in your exchange rate currency pair. To this end, plot the ACF and the partial ACF (PACF). Interpret your results. Clearly refer to the lags, and their impacts in determining the order.

acf(GBPUSD)

pacf(GBPUSD)

The ACF decays slowly, in a roughly exponential (geometric) way, over successive lags. This is the pattern expected from an AR(1) process with a coefficient \(\phi\) close to 1, whose theoretical autocorrelation at lag \(k\) is \(\phi^k\), so the GBP/USD series behaves like an AR(1) process. The partial ACF (PACF) supports an order of 1: it shows a significant partial autocorrelation at lag 1, and since the PACF of an AR(p) process cuts off after lag p, the last significant lag indicates the order of the autoregressive process.

Task 3: Granger Causality Test

This test requires the package lmtest, which is loaded in the code chunk below.

# require() attaches the package and returns FALSE if it is not installed;
# in that case, install it together with its dependencies and then attach it
if(!require("lmtest",quietly = TRUE)) {
  install.packages("lmtest",dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(lmtest)
}

##### 3A) Include below the code chunk to solve for 3.5.7 R Lab/p. 106. Write your conclusions.

# More information about the data used in testing for causality can be obtained by typing the name of the data set `ChickEgg` in the R Help menu.
data(ChickEgg)
grangertest(egg~chicken, order=3, data=ChickEgg)
Granger causality test

Model 1: egg ~ Lags(egg, 1:3) + Lags(chicken, 1:3)
Model 2: egg ~ Lags(egg, 1:3)
  Res.Df Df      F Pr(>F)
1     44                 
2     47 -3 0.5916 0.6238
grangertest(chicken~egg, order=3, data=ChickEgg)
Granger causality test

Model 1: chicken ~ Lags(chicken, 1:3) + Lags(egg, 1:3)
Model 2: chicken ~ Lags(chicken, 1:3)
  Res.Df Df     F   Pr(>F)   
1     44                     
2     47 -3 5.405 0.002966 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

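The first test, grangertest(egg ~ chicken), checks whether chicken Granger-causes egg, and the second, grangertest(chicken ~ egg), checks whether egg Granger-causes chicken. In each test the null hypothesis is that the three lagged values of the second variable add no predictive power once three lags of the dependent variable are included. In test 1 the F-statistic is 0.5916 and its p-value is 0.6238, which is greater than 0.05, so we cannot reject the null at the 5% significance level: there is no evidence that chicken Granger-causes egg.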

In test 2 the F-statistic is 5.405 and its p-value is 0.002966, which is less than 0.01, so we reject the null at the 1% significance level: egg Granger-causes chicken. In other words, past egg production helps to predict the chicken population, whereas past chicken numbers do not help to predict egg production.

##### 3B) Briefly describe the data in terms of time range and variables. Similar to the linear autoregressive model described in class, write the mathematical regression model solved in each Granger test, including the proper order. Use naming conventions, and notations more reflective of the data set considered for ChickEgg.

The ChickEgg data set is an annual time series from 1930 to 1983 (54 observations) with two variables: chicken, the population of all US chickens in each year, and egg, the US egg production in millions of dozens per year. Let \(E_t\) denote egg and \(C_t\) denote chicken in year \(t\). Each Granger test of order 3 is an F-test of a restricted model (Model 2, own lags only) against an unrestricted model (Model 1, which adds three lags of the other variable), with null hypothesis \(b_1 = b_2 = b_3 = 0\). The regression models solved in testing whether chicken causes egg are:

For Model 1, the unrestricted model of the F-test: \(E_{t} = a_0 + a_1 E_{t-1} + a_2 E_{t-2} + a_3 E_{t-3} + b_1 C_{t-1} + b_2 C_{t-2} + b_3 C_{t-3} + \varepsilon_{t}\)

For Model 2, the restricted model of the F-test: \(E_{t} = a_0 + a_1 E_{t-1} + a_2 E_{t-2} + a_3 E_{t-3} + \varepsilon_{t}\)

The mathematical regression models solved in testing whether egg causes chicken are as below:

For Model 1, the unrestricted model of the F-test: \(C_{t} = a_0 + a_1 C_{t-1} + a_2 C_{t-2} + a_3 C_{t-3} + b_1 E_{t-1} + b_2 E_{t-2} + b_3 E_{t-3} + \varepsilon_{t}\)

For Model 2, the restricted model of the F-test: \(C_{t} = a_0 + a_1 C_{t-1} + a_2 C_{t-2} + a_3 C_{t-3} + \varepsilon_{t}\)

Task 4: Mathematical Proof

##### 4A) Prove the two results in Eq (2.32)/p. 53. No R-coding is needed here. Clearly show your steps. Hint: Use the definition of \(E(X^n)\) for X log-normally distributed. Observe also that \(Var(X) = E(X^2)-E^2(X)\) for any random variable X.

Solution of the proof (provided as an image).

*http://computationalfinance.lsi.upc.edu

---
title: "FINC621 Winter 2018-19 Lab Worksheet 03"
author: "Yue Huang"
date: "12-5-2018"
output:
  html_notebook: default
  html_document: default
subtitle: Variance, Covariance, Correlation & Causality (finc621-lab03)
---

### About

In this worksheet we look at different variance, covariance, volatility, and causality calculations. We finish with a short mathematical proof (no R required).

### Setup

Remember to always set your working directory to the source file location. Go to 'Session', scroll down to 'Set Working Directory', and click 'To Source File Location'. Read the instructions below carefully and follow them to complete the tasks and answer the questions. Submit your work to RPubs as detailed in previous notes.

### Note

For clarity, tasks/questions to be completed/answered are highlighted in red (the color is visible only in preview mode) and numbered according to their placement in the task section. Type your answers outside the red color tags!

You will often need to add your own code chunks. Execute all code chunks in order, preview, publish, and submit the link on Sakai following the naming convention. Make sure to add comments to your code where appropriate. Use your own words!

**Any sign of plagiarism will result in dismissal of the work!**

--------------

### Task 1: Variance, Covariance, and Volatility

This task follows the two examples in the book `R Example 2.5/p. 58` and `R Example 2.6/p. 66` 

```{r}
# require() attaches the package and returns FALSE if it is not installed;
# in that case, install it together with its dependencies and then attach it
if(!require("quantmod",quietly = TRUE)) {
  install.packages("quantmod",dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(quantmod)
}
```


<span style="color:red">
##### 1A) Calculate the correlation and covariance matrix of the adjusted daily log returns for four different stocks of your choice. Explain your observations in terms of potential relationships.
</span>

```{r}
# Once you have obtained the adjusted daily log returns for your stocks, omitting the time index, you will need to combine them to create a matrix. Below is an example.  For more details see the Help command in R on cbind, cov, and cor.
getSymbols('AAPL',src='yahoo', from="2007-01-01", to="2017-12-30")
# daily log returns as a plain numeric vector (as.numeric() drops the time index);
# to base the returns explicitly on adjusted closes, pass e.g. Ad(AAPL) to periodReturn()
AAPLRd=as.numeric(periodReturn(AAPL,period="daily",type="log"))
getSymbols('GE',src='yahoo', from="2007-01-01", to="2017-12-30")
GERd=as.numeric(periodReturn(GE,period="daily",type="log"))
getSymbols('MCD',src='yahoo', from="2007-01-01", to="2017-12-30")
MCDRd=as.numeric(periodReturn(MCD,period="daily",type="log"))
getSymbols('KO',src='yahoo', from="2007-01-01", to="2017-12-30")
KORd=as.numeric(periodReturn(KO,period="daily",type="log"))
# M <- cbind(A,B,C) # create a matrix where each column is an array/vector of numerical values 
M <- cbind(AAPLRd,GERd,MCDRd,KORd)
# cov(M) # compute the covariance matrix
cov(M)
# cor(M, method="pearson") # compute the correlation matrix based on the Pearson method
cor(M, method="pearson")
```
The covariance matrix shows that all pairwise covariances are positive, and its diagonal shows that AAPL and GE returns are roughly three times as variable as MCD and KO returns (daily standard deviations of about 2.0% and 1.9%, versus about 1.2% and 1.1%). The Pearson correlation coefficients of the daily log returns are 0.44 for AAPL and GE, 0.36 for AAPL and MCD, 0.33 for AAPL and KO, 0.43 for GE and MCD, 0.41 for GE and KO, and 0.48 for MCD and KO. All six correlations are positive but moderate, between about 0.33 and 0.48: the daily returns of these four stocks tend to move in the same direction, plausibly because all of them are exposed to overall market movements, but the linear relationship between any two of them is far from perfect. The strongest co-movement is between MCD and KO, and the weakest between AAPL and KO.
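
The correlation matrix is the covariance matrix rescaled by the standard deviations, $\rho_{ij} = \sigma_{ij}/(\sigma_i \sigma_j)$. The sketch below copies the covariance values from the `cov(M)` output and rescales them both by hand and with `stats::cov2cor()`; it should reproduce the `cor(M)` values up to rounding, and `sqrt(diag(S))` gives the daily standard deviations quoted above. The matrix `S` is simply a hard-coded copy of that output, so the sketch runs without downloading any data.

```r
# covariance matrix of the four daily log-return series, copied from the cov(M) output
tickers <- c("AAPLRd", "GERd", "MCDRd", "KORd")
S <- matrix(c(4.043223e-04, 1.700594e-04, 8.576271e-05, 7.507611e-05,
              1.700594e-04, 3.735289e-04, 9.831640e-05, 9.052457e-05,
              8.576271e-05, 9.831640e-05, 1.374065e-04, 6.487736e-05,
              7.507611e-05, 9.052457e-05, 6.487736e-05, 1.306746e-04),
            nrow = 4, byrow = TRUE, dimnames = list(tickers, tickers))

sds <- sqrt(diag(S))            # daily standard deviations (about 0.020, 0.019, 0.012, 0.011)
R_manual <- S / outer(sds, sds) # rho_ij = cov_ij / (sd_i * sd_j)
R <- cov2cor(S)                 # base R performs the same rescaling
round(R, 4)
```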

<span style="color:red">
##### 1B) Calculate the three types of volatility for a particular stock of your choice. Consider a time window extending one year back from most recent obtainable closing day price. Order the three estimates from low to high volatility and explain how the ordering makes sense.
</span>

```{r}
# For this task make sure you understand well what the variables n,m represent in the book's referenced example.
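getSymbols('AAPL',src='yahoo', from="2007-01-01", to="2018-12-30")
# one-year window ending at the most recent closing price available when this was run
aapl=AAPL['2017-12-04/2018-12-04']
m=nrow(aapl)   # m = number of trading days in the window
ohlc <-aapl[,c("AAPL.Open","AAPL.High","AAPL.Low","AAPL.Close")]
# n = m: a single window covering the whole year, so the full-year estimate sits in the last row
# N = 252: annualize using 252 trading days per year
vClose <- volatility(ohlc, n= m,calc="close",N=252)          # close-to-close
vParkinson <- volatility(ohlc, n= m,calc="parkinson",N=252)  # high/low range (Parkinson)
vGK <- volatility(ohlc, n= m,calc="garman",N=252)            # open/high/low/close (Garman-Klass)
vClose[m]; vParkinson[m]; vGK[m]   # estimates at the last date of the window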
```
Ordered from low to high, the annualized volatility estimates are 0.2036 for Garman-Klass, 0.2037 for Parkinson and 0.2692 for close-to-close. The close-to-close estimator uses only closing prices, the Parkinson estimator uses the daily high and low, and the Garman-Klass estimator uses the open, high, low and close. The two range-based estimators (Parkinson and Garman-Klass) almost coincide and are both clearly below the close-to-close estimate. This ordering makes sense: the range-based formulas measure only the price variation within each trading day, so they do not capture jumps between one day's close and the next day's open, whereas close-to-close returns include these overnight moves. Using intraday information also makes the range-based estimators more efficient (less noisy) than the close-to-close estimator, but this alone does not imply that their values must be lower.
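
To see why the range-based estimates can come out lower, here is a sketch of the three estimators written out by hand. It assumes the textbook forms of the formulas (sample variance of close-to-close log returns, the Parkinson high/low formula, and the basic Garman-Klass formula); `TTR::volatility()` may use slightly different small-sample scaling, so its numbers need not match exactly. The function names `vol_close`, `vol_parkinson` and `vol_garman_klass` are hypothetical.

```r
# Textbook volatility estimators, annualized with N trading days per year.
# o, h, l, cl: numeric vectors of daily open, high, low and close prices.
vol_close <- function(cl, N = 252) {
  r <- diff(log(cl))                 # close-to-close log returns
  sqrt(N * var(r))                   # annualized sample standard deviation
}
vol_parkinson <- function(h, l, N = 252) {
  hl <- log(h / l)                   # daily log range
  sqrt(N * mean(hl^2) / (4 * log(2)))
}
vol_garman_klass <- function(o, h, l, cl, N = 252) {
  hl <- log(h / l)
  co <- log(cl / o)                  # open-to-close log return
  sqrt(N * mean(0.5 * hl^2 - (2 * log(2) - 1) * co^2))
}

# Example: a price that jumps overnight but never moves within a day
px <- c(100, 110, 100)
vol_close(px)                        # positive: the overnight jumps appear in close-to-close returns
vol_parkinson(px, px)                # zero: high equals low on every day
vol_garman_klass(px, px, px, px)     # zero: no intraday range and open equals close
```

The example isolates the overnight effect: only the close-to-close estimator sees a move that happens between sessions.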

### Task 2: Auto-Correlation and Auto-Regression

Follow the examples in the book: `R Example 3.2/p. 74` and `R Example 4.1/p. 115`.

<span style="color:red">
##### 2A) Calculate the ACF for a stock of your choice. Consider both the log return and squared log return. Interpret your results in terms of possible existence of autocorrelation.  
</span>

```{r}
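# AAPLRd: AAPL daily log returns for 2007-2017 from Task 1A
acf(AAPLRd,main="ACF of AAPL daily log returns",ylim=c(-0.2,0.2))
# squared log returns serve as a rough proxy for the daily variance
acf(AAPLRd^2,main="ACF of AAPL squared daily log returns",ylim=c(-0.2,0.2))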
```
The horizontal dashed lines mark the approximate 95% confidence bounds, $\pm 1.96/\sqrt{n}$, for the sample autocorrelations of an uncorrelated series of length $n$. In the ACF of the AAPL daily log returns, the sample autocorrelations at lags 2, 4, 12, 16, 18 and 24 lie outside these bounds, so they are significantly different from zero at the 5% level. The daily log returns of AAPL over this period therefore show some autocorrelation with their past values (for example at lag 4). Because about 5% of the lags would cross the bounds by chance even for an uncorrelated series, isolated exceedances should nevertheless be interpreted with caution.

In the ACF of the squared daily log returns, the autocorrelations lie outside the bounds at all lags shown, so they are all significant at the 5% level. Because the mean daily log return is close to zero, the sample variance of the returns $\{r_t\}$ is approximately the average of the squared returns, so $r_t^2$ serves as a rough proxy for the variance on day $t$. The significant autocorrelation of $r_t^2$ at all lags therefore indicates a strong dependence of the variance of AAPL daily log returns on its past values (volatility clustering): large price moves tend to be followed by large moves, and small moves by small moves.
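
A minimal sketch of what `acf()` computes and where the dashed lines come from, using simulated returns as a stand-in for `AAPLRd`. The helper names `sample_acf` and `acf_bound` are hypothetical; the sketch assumes the default `acf()` settings (demeaned series, correlation type) and the default 95% "white noise" bounds drawn by the plot.

```r
# lag-k sample autocorrelation, computed like stats::acf:
# demean, sum the lagged cross-products, divide by the lag-0 sum of squares
sample_acf <- function(x, k) {
  n  <- length(x)
  xc <- x - mean(x)
  sum(xc[1:(n - k)] * xc[(1 + k):n]) / sum(xc^2)
}
# half-width of the approximate 95% bounds drawn as dashed lines
acf_bound <- function(n) qnorm(0.975) / sqrt(n)

set.seed(42)
r <- rnorm(1000, sd = 0.02)              # simulated i.i.d. "returns"
a <- acf(r, lag.max = 10, plot = FALSE)
a$acf[3]                                 # lag-2 value (element 1 is lag 0)
sample_acf(r, 2)                         # the same number, computed by hand
acf_bound(length(r))                     # about 0.062 for n = 1000
```

The same helper applied to `r^2` gives the squared-return autocorrelations. For roughly eleven years of daily data (about 2,770 observations, an estimate for `AAPLRd`) the bound is about 0.037, so even small autocorrelations can be flagged as significant.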


<span style="color:red">
##### 2B) Plot the exchange rate for USD versus another currency of your choice. Interpret your results in terms of behavior.
</span>

```{r}
getFX("GBP/USD") # download GBP/USD rates (US dollars per British pound) from oanda.com
plot(GBPUSD)

```
The plot shows the GBP/USD exchange rate, i.e. the number of US dollars per British pound, over roughly the past six months. The pound has followed a declining trend against the dollar: it appreciated in September but depreciated again in November. The rate has also moved within a fairly narrow band over this period, from about 1.26 to 1.34 US dollars per pound.
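
The behavior read off the plot can also be quantified. Below is a small hypothetical helper (`fx_summary`, not part of quantmod) that reports the low, the high, the width of the band relative to its low, and the net change from the first to the last observation; it would be called as `fx_summary(as.numeric(GBPUSD))`. The example uses made-up values rather than the downloaded data.

```r
# Hypothetical helper: numeric summary of an exchange-rate series
fx_summary <- function(rate) {
  c(low            = min(rate),
    high           = max(rate),
    range_pct      = 100 * (max(rate) - min(rate)) / min(rate),  # band width relative to the low
    net_change_pct = 100 * (rate[length(rate)] / rate[1] - 1))    # first-to-last change
}

fx_summary(c(1.30, 1.34, 1.26, 1.28))   # made-up USD-per-GBP values
```

For a band from 1.26 to 1.34, as described above, `range_pct` works out to about 6.3%.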

<span style="color:red">
##### 2C) Test for the possible existence of an underlying AR(1) – Markov process in your exchange rate currency pair. To this end, plot the ACF and the partial ACF (PACF). Interpret your results.  Clearly refer to the lags, and their impacts in determining the order.
</span>

```{r}
acf(GBPUSD)    # ACF of the exchange-rate level
pacf(GBPUSD)   # partial ACF, used to identify the order of the AR process
```
The ACF decays slowly, in a roughly exponential (geometric) way, over successive lags. This is the pattern expected from an AR(1) process with a coefficient $\phi$ close to 1, whose theoretical autocorrelation at lag $k$ is $\phi^k$, so the GBP/USD series behaves like an AR(1) process. The partial ACF (PACF) supports an order of 1: it shows a significant partial autocorrelation at lag 1, and since the PACF of an AR(p) process cuts off after lag p, the last significant lag indicates the order of the autoregressive process.
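
A minimal sketch of this ACF/PACF signature on a simulated AR(1) series, with an assumed coefficient $\phi = 0.9$ (chosen for illustration, not estimated from GBP/USD): the sample ACF tracks the geometric decay $\phi^k$, the PACF has a single large spike at lag 1, and `arima()` recovers $\phi$.

```r
# Simulated AR(1): x_t = phi * x_{t-1} + e_t  (a stand-in, not the GBP/USD data)
set.seed(123)
phi <- 0.9
x <- arima.sim(model = list(ar = phi), n = 2000)

acf_x  <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]   # drop lag 0
pacf_x <- pacf(x, lag.max = 10, plot = FALSE)$acf      # starts at lag 1

# sample ACF next to the theoretical AR(1) decay rho_k = phi^k
round(cbind(lag = 1:10, sample = acf_x, theory = phi^(1:10)), 3)
round(pacf_x[1:5], 3)   # large at lag 1, close to zero afterwards

# fit an AR(1) model (with mean) by maximum likelihood
fit <- arima(x, order = c(1, 0, 0))
coef(fit)["ar1"]
```

On the exchange-rate data, `arima(as.numeric(GBPUSD), order = c(1, 0, 0))` would estimate the AR(1) coefficient of the currency pair; a value close to 1 would be in line with the slowly decaying ACF.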

### Task 3: Granger Causality Test

This test requires the package `lmtest`, which is loaded in the code chunk below.

```{r}
# require() attaches the package and returns FALSE if it is not installed;
# in that case, install it together with its dependencies and then attach it
if(!require("lmtest",quietly = TRUE)) {
  install.packages("lmtest",dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(lmtest)
}
```

<span style="color:red">
##### 3A) Include below the code chunk to solve for 3.5.7 R Lab/p. 106.  Write your conclusions.
</span>

```{r}
# More information about the data used in testing for causality can be obtained by typing the name of the data set `ChickEgg` in the R Help menu.
data(ChickEgg)
grangertest(egg~chicken, order=3, data=ChickEgg)
grangertest(chicken~egg, order=3, data=ChickEgg)
```
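The first test, `grangertest(egg ~ chicken)`, checks whether chicken Granger-causes egg, and the second, `grangertest(chicken ~ egg)`, checks whether egg Granger-causes chicken. In each test the null hypothesis is that the three lagged values of the second variable add no predictive power once three lags of the dependent variable are included. In test 1 the F-statistic is 0.5916 and its p-value is 0.6238, which is greater than 0.05, so we cannot reject the null at the 5% significance level: there is no evidence that chicken Granger-causes egg.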

In test 2 the F-statistic is 5.405 and its p-value is 0.002966, which is less than 0.01, so we reject the null at the 1% significance level: egg Granger-causes chicken. In other words, past egg production helps to predict the chicken population, whereas past chicken numbers do not help to predict egg production.
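
The F-test behind `grangertest()` can be sketched with two `lm()` fits and `anova()`. This is a minimal illustration on simulated data, not a re-run of the ChickEgg test: the series `x` and `y` and the coefficients 0.3 and 0.8 are made up, with `x` constructed so that it Granger-causes `y`.

```r
set.seed(1)
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))          # AR(1) driver series
y <- numeric(n)
for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)  # y depends on lagged x

p <- 3  # lag order, as in order = 3 above
# embed(z, p + 1) has rows (z_t, z_{t-1}, ..., z_{t-p}) for t = p + 1, ..., n
Y <- embed(y, p + 1)
X <- embed(x, p + 1)
yt    <- Y[, 1]    # dependent variable y_t
ylags <- Y[, -1]   # y_{t-1}, ..., y_{t-p}
xlags <- X[, -1]   # x_{t-1}, ..., x_{t-p}

unrestricted <- lm(yt ~ ylags + xlags)  # own lags plus lags of x
restricted   <- lm(yt ~ ylags)          # own lags only
res <- anova(restricted, unrestricted)  # F-test of H0: all coefficients on xlags are 0
res
```

With `lmtest` loaded, `grangertest(x, y, order = 3)` tests the same hypothesis (whether `x` Granger-causes `y`) on these vectors, and we would expect it to report the same F-statistic. Note that `anova()` lists the restricted model first, whereas `grangertest()` lists the unrestricted model as Model 1.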


<span style="color:red">
##### 3B) Briefly describe the data in terms of time range and variables. Similar to the linear autoregressive model described in class, write the mathematical regression model solved in each Granger test, including the proper order. Use naming conventions, and notations more reflective of the data set considered for `ChickEgg`.
</span>

The `ChickEgg` data set is an annual time series from 1930 to 1983 (54 observations) with two variables: `chicken`, the population of all US chickens in each year, and `egg`, the US egg production in millions of dozens per year. Let $E_t$ denote egg and $C_t$ denote chicken in year $t$. Each Granger test of order 3 is an F-test of a restricted model (Model 2, own lags only) against an unrestricted model (Model 1, which adds three lags of the other variable), with null hypothesis $b_1 = b_2 = b_3 = 0$.
The regression models solved in testing whether chicken causes egg are:

For Model 1, the unrestricted model of the F-test: $E_{t} = a_0 + a_1 E_{t-1} + a_2 E_{t-2} + a_3 E_{t-3} + b_1 C_{t-1} + b_2 C_{t-2} + b_3 C_{t-3} + \varepsilon_{t}$

For Model 2, the restricted model of the F-test: $E_{t} = a_0 + a_1 E_{t-1} + a_2 E_{t-2} + a_3 E_{t-3} + \varepsilon_{t}$

The mathematical regression models solved in testing whether egg causes chicken are as below: 

For Model 1, the unrestricted model of the F-test: $C_{t} = a_0 + a_1 C_{t-1} + a_2 C_{t-2} + a_3 C_{t-3} + b_1 E_{t-1} + b_2 E_{t-2} + b_3 E_{t-3} + \varepsilon_{t}$

For Model 2, the restricted model of the F-test: $C_{t} = a_0 + a_1 C_{t-1} + a_2 C_{t-2} + a_3 C_{t-3} + \varepsilon_{t}$
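
For reference, the F-statistic reported by `grangertest()` compares the residual sums of squares of the two models. A sketch in the notation above, assuming the standard nested-model F-test, with $\mathrm{RSS}_u$ and $\mathrm{RSS}_r$ the residual sums of squares of Model 1 and Model 2, $q = 3$ restrictions, $k = 7$ coefficients in Model 1, and $T$ usable observations:

$$
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/(T - k)} \sim F(q,\, T - k) \quad \text{under } H_0: b_1 = b_2 = b_3 = 0.
$$

With 54 annual observations and 3 lags, $T = 54 - 3 = 51$, so $T - k = 51 - 7 = 44$ and $T - 4 = 47$. These match the `Res.Df` values of Model 1 and Model 2 in the `grangertest()` output, and their difference, 3, is the `Df` column.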


### Task 4: Mathematical Proof

<span style="color:red">
##### 4A) Prove the two results in Eq (2.32)/p. 53. No R-coding is needed here. Clearly show your steps. Hint: Use the definition of $E(X^n)$ for X log-normally distributed. Observe also that $Var(X) = E(X^2)-E^2(X)$ for any random variable X.
</span>

![Solution of the proof](image1.jpg)
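
A typeset sketch of the derivation, assuming the two results in Eq (2.32) are the mean and variance of a log-normal variable $X = e^{Y}$ with $Y \sim N(\mu, \sigma^2)$ (the equation itself is not reproduced in this worksheet). It uses the moment generating function of the normal distribution, $E(e^{sY}) = e^{\mu s + \sigma^2 s^2/2}$, which gives $E(X^n)$ directly by setting $s = n$:

$$
\begin{aligned}
E(X^n) &= E\left(e^{nY}\right) = e^{n\mu + n^2\sigma^2/2},\\
E(X) &= e^{\mu + \sigma^2/2} \quad (n = 1),\\
E(X^2) &= e^{2\mu + 2\sigma^2} \quad (n = 2),\\
Var(X) &= E(X^2) - E^2(X) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).
\end{aligned}
$$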



*[http://computationalfinance.lsi.upc.edu](http://computationalfinance.lsi.upc.edu)
