1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.

At the beginning of the R Notebook write Workshop 2 - Financial Econometrics II and your name (as we did in the previous workshop).

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.

2 Q Analyzing Histograms of returns

We start by clearing our R environment:

rm(list=ls())
# To avoid scientific notation for numbers: 
options(scipen=999)

Using the getSymbols function, download MONTHLY prices of the Mexican market index (^MXX), the IPyC (Índice de Precios y Cotizaciones), and also download the S&P500 index from the US market (^GSPC). For both indices download data from January 2000 to date. Do the following:

2.1 Data collection with getSymbols

The getSymbols function downloads price (quotation) data for financial instruments.

# Load the packages I need for this workshop:

library(quantmod)
#library(dplyr)

# Downloading the historical quotation data for both indexes:
getSymbols(c("^MXX", "^GSPC"), from="2000-01-01",  src="yahoo", periodicity="monthly")
## [1] "^MXX"  "^GSPC"

Your environment now contains the MXX and GSPC xts datasets, with historical data for the Mexican and the US financial market indexes.

Remember that xts datasets are time-series datasets with a time index.

Both datasets have columns for open, high, low, close, and adjusted prices, plus trading volume. As we have mentioned, it is recommended to use adjusted prices to calculate returns. In the case of indexes, close prices will always be equal to adjusted prices.

2.2 Return calculation

Before calculating returns, we can create an integrated dataset with both indexes:

prices = merge(MXX,GSPC)

By doing this, we can easily calculate returns for both indexes using the integrated dataset.

We select only adjusted prices and rename the columns with meaningful names for the indexes:

prices = Ad(prices)
names(prices) = c("MXX","GSPC")

With adjusted prices we calculate continuously compounded returns:

r = diff(log(prices))

Remember that continuously compounded returns (r) can be calculated as the first difference of the natural log of prices (or of the index). The first difference is equal to the log of the price at period t minus the log of the price at period (t-1).
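As a small illustration of the definition above, the following sketch computes log returns for a made-up price vector. The prices and the names p, r_cc, r_ratio and total_growth are hypothetical and only serve the example. Note that diff() on a plain numeric vector drops the first element, whereas on an xts dataset it keeps the first row as NA.

```r
# Hypothetical monthly closing prices (illustrative values, not market data)
p <- c(100, 110, 99, 104.5)

# Continuously compounded returns: first difference of the log prices
r_cc <- diff(log(p))

# The same returns written out as ln(P_t / P_{t-1})
r_ratio <- log(p[-1] / p[-length(p)])

# Log returns add up over time: exp of their sum is the total growth factor,
# that is, the last price divided by the first price
total_growth <- exp(sum(r_cc))

print(round(r_cc, 6))
print(total_growth)
```

Because log returns are additive, the return over several months is simply the sum of the monthly log returns.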

We visualize the first historical returns:

head(r)
##                    MXX        GSPC
## 2000-01-01          NA          NA
## 2000-02-01  0.11232485 -0.02031300
## 2000-03-01  0.01410906  0.09232375
## 2000-04-01 -0.11811558 -0.03127991
## 2000-05-01 -0.10795263 -0.02215875
## 2000-06-01  0.15323959  0.02365163

We see that the first value is NA since it is not possible to calculate returns for the first month. We can delete any NA values with the na.omit function:

r = na.omit(r)

Now we calculate simple returns:

R =  na.omit(prices / lag(prices,k=1) - 1)

Remember that simple returns are calculated as the percentage change between the current price and the previous price.

lag is a function that can be used with xts datasets to get previous values of a variable. In this case, we indicate k=1 to get the price of the previous period.

We can have a look at the content of the simple returns by looking at the oldest and the most recent returns:

head(R)
##                    MXX        GSPC
## 2000-02-01  0.11887627 -0.02010808
## 2000-03-01  0.01420906  0.09671983
## 2000-04-01 -0.11140666 -0.03079576
## 2000-05-01 -0.10232989 -0.02191505
## 2000-06-01  0.16560422  0.02393355
## 2000-07-01 -0.06247834 -0.01634128
tail(R)
##                    MXX        GSPC
## 2021-03-01  0.05950165 0.042438633
## 2021-04-01  0.01615910 0.052425321
## 2021-05-01  0.05990934 0.005486489
## 2021-06-01 -0.01171638 0.022214010
## 2021-07-01  0.01150474 0.022748055
## 2021-08-01  0.01015543 0.017673625

In this case, there are no NA values, but it might still be a good idea to apply the na.omit function in case there is an NA value for any month of any index:

R <- na.omit(R)
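The two types of returns are linked by R = exp(r) - 1 and r = log(1 + R). The sketch below checks this relation using the first three MXX log returns printed by head(r) above. The variable names r_mxx, R_from_r and r_back are assumptions made for this example.

```r
# First three MXX continuously compounded returns, copied from head(r) above
r_mxx <- c(0.11232485, 0.01410906, -0.11811558)

# Simple return implied by each log return: R = exp(r) - 1
R_from_r <- exp(r_mxx) - 1

# And back again: r = log(1 + R)
r_back <- log(1 + R_from_r)

# Compare these values with the MXX column printed by head(R)
print(round(R_from_r, 8))
```

For small returns the two measures are very close; the gap widens as the absolute value of the return grows.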

2.3 Q Histograms

Do a histogram of the simple return of the Mexican index:

hist(R$MXX, main="Histogram of IPC monthly returns", 
     xlab="Simple returns", col="dark blue")

INTERPRET this histogram in your own words.

Do a histogram of the simple return of the S&P500 index.

hist(R$GSPC, main="Histogram of S&P500 monthly returns", 
     xlab="Simple returns", col="blue")

INTERPRET this histogram in your own words.

2.4 Q Appreciating risk by looking at histograms

Here is a graph that shows both histograms together. To better appreciate the histograms, I set the width of each bar to 2 percentage points.

The S&P returns are represented in blue; the IPyC returns are represented in yellow; the gray-yellow area is the area shared by both histograms.
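One way to draw two overlapping histograms with bars of 2 percentage points is sketched below. This is not necessarily how the graph above was produced. The sketch uses simulated series (ret_a and ret_b are made-up stand-ins, not the real index returns); in your notebook you could pass as.numeric(R$GSPC) and as.numeric(R$MXX) instead.

```r
set.seed(123)  # only so the illustration is reproducible

# Simulated stand-ins for the two return series (NOT the real index returns)
ret_a <- rnorm(260, mean = 0.005, sd = 0.04)
ret_b <- rnorm(260, mean = 0.008, sd = 0.06)

# Common break points every 0.02 (2 percentage points) covering both series
bin_width <- 0.02
all_ret <- c(ret_a, ret_b)
breaks <- seq(floor(min(all_ret) / bin_width),
              ceiling(max(all_ret) / bin_width)) * bin_width

# Semi-transparent colors so the overlapping area stays visible
col_a <- rgb(0, 0, 1, alpha = 0.5)  # blue
col_b <- rgb(1, 1, 0, alpha = 0.5)  # yellow

# Draw the first histogram, then add the second one on the same axes
h_a <- hist(ret_a, breaks = breaks, col = col_a,
            main = "Histograms of monthly returns", xlab = "Simple returns")
h_b <- hist(ret_b, breaks = breaks, col = col_b, add = TRUE)
legend("topright", legend = c("Series A", "Series B"),
       fill = c(col_a, col_b))
```

Using the same breaks for both series is what makes the bars comparable. The series with the smaller dispersion is drawn first because it has the taller bars, so the vertical axis fits both histograms.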

LOOK CAREFULLY AT THIS PLOT WITH BOTH HISTOGRAMS. WHICH INSTRUMENT IS RISKIER? EXPLAIN

2.5 Q Calculate the mean and standard deviation

CALCULATE the mean and standard deviation of monthly returns for both market indexes. Hint: you can check how we did this in Workshop 1.

Just by looking at the mean and standard deviation of the returns of each instrument, WHICH INSTRUMENT LOOKS MORE ATTRACTIVE TO INVEST IN? EXPLAIN

2.6 Q (OPTIONAL) Calculating the holding-period return

If you had invested $1 peso in the Mexican index in Jan 2000, WHAT WOULD BE THE VALUE OF YOUR INVESTMENT TODAY?

If you had invested $1 USD in the S&P500 index in Jan 2000, WHAT WOULD BE THE VALUE OF YOUR INVESTMENT TODAY?
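The mechanics of a holding-period return can be sketched with made-up numbers. The three returns below are hypothetical, and R_toy, value_path, final_value, hpr and hpr_from_logs are names chosen only for this example. To answer the questions above you would apply the same idea to the index returns.

```r
# Hypothetical simple returns for three consecutive months (illustrative only)
R_toy <- c(0.10, -0.05, 0.02)

# Value of 1 unit of currency after compounding each monthly return
value_path <- cumprod(1 + R_toy)
final_value <- value_path[length(value_path)]

# Holding-period return over the whole window
hpr <- final_value - 1

# Same result from continuously compounded returns: exp(sum(r)) - 1
r_toy <- log(1 + R_toy)
hpr_from_logs <- exp(sum(r_toy)) - 1

print(final_value)
print(hpr)
```

Equivalently, the final value of 1 unit invested equals the last price divided by the first price.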

3 Q The Central Limit Theorem

The Central Limit Theorem is one of the most important discoveries in mathematics and statistics. Actually, thanks to this discovery, the field of Statistics was developed at the beginning of the 20th century.

We will do an exercise using simulated numbers. I hope that it helps you understand what the Central Limit Theorem is about.

Let’s do the following.

3.1 Q Monte Carlo simulation to create variables

Create the x variable as a random variable with normal distribution, with mean=20 and standard deviation=40. Create 100,000 observations:

x <- rnorm(n=100000, mean = 20, sd = 40)

Create the variable y as a random variable with uniform distribution in the range [0,60]:

y <- runif(n=100000, min = 0, max = 60)

Learn about the variance of the uniform distribution. HOW CAN YOU ESTIMATE THE VARIANCE OF A UNIFORM DISTRIBUTED VARIABLE?

3.2 Q Histograms of x and y

Run a histogram for y and another histogram for x.

hist(x, main="Histogram of x", 
     xlab="x values", col="dark blue")

hist(y, main="Histogram of y", 
     xlab="y values", col="green")

WHAT DO YOU SEE? BRIEFLY EXPLAIN

3.3 Calculating standard deviation and variance

Calculate the mean of x and y and save them as xbar and ybar:

xbar= mean(x)
ybar= mean(y)

Now we will manually calculate the variance of x and y. Remember that the VARIANCE of a variable is the AVERAGE OF ITS SQUARED DEVIATIONS.

Calculate first the squared deviations of x and y:

xdesv2=(x-xbar)^2
ydesv2=(y-ybar)^2

Now we just calculate the mean of the squared deviations to get the variance:

varx=mean(xdesv2)
varx
## [1] 1604.346
vary=mean(ydesv2)
vary
## [1] 300.1798

Compare the variances you computed with the theoretical variance of each variable (the variances implied by the parameters we used to generate the simulated values for x and y). You will see that the computed variances varx and vary are very similar to the theoretical values.
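The comparison can be sketched as follows, assuming the standard formulas: the variance of a normal variable is its standard deviation squared, and the variance of a uniform variable on [a, b] is (b - a)^2 / 12. The chunk re-simulates the data so that it runs on its own; the seed and the names x_sim, y_sim, var_x_n and var_y_n are arbitrary choices for this example.

```r
# Theoretical variance of a normal variable: standard deviation squared
var_x_theory <- 40^2

# Theoretical variance of a uniform variable on [a, b]: (b - a)^2 / 12
a <- 0
b <- 60
var_y_theory <- (b - a)^2 / 12

# Re-simulate so this chunk runs on its own (the seed is an arbitrary choice)
set.seed(2024)
x_sim <- rnorm(n = 100000, mean = 20, sd = 40)
y_sim <- runif(n = 100000, min = a, max = b)

# Variance as the mean of squared deviations (divides by n)
var_x_n <- mean((x_sim - mean(x_sim))^2)
var_y_n <- mean((y_sim - mean(y_sim))^2)

# The built-in var() divides by n - 1, so it differs by the factor n / (n - 1)
n <- length(x_sim)
ratio <- var(x_sim) / var_x_n

print(c(theory = var_x_theory, computed = var_x_n))
print(c(theory = var_y_theory, computed = var_y_n))
```

With 100,000 observations the difference between dividing by n and by n - 1 is negligible, which is why the manual calculation and var() give almost the same number.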

3.4 Calculating mean of groups for x and y

Create a data frame with x and y as columns

dataset <- cbind(x,y)
dataset <- as.data.frame(dataset)

Now assign a group number to each observation. We will create 4,000 groups of 25 observations each. Each group will be labeled from 1 to 4,000. You can use the function rep and seq:

# I create a column called group where each group will have 25 
#   observations:

dataset$group <- rep(1:4000, each=25)

With the group_by() and summarise() functions, compute the sample mean of x and of y for EACH group.

Before running the following code, you must INSTALL the dplyr PACKAGE. You can do this from the Packages tab in RStudio or by running install.packages("dplyr") in the console.

# Now I group the observations by the group number previously assigned to them
#   and compute the mean of x and y within each group:
library(dplyr)
group_means <- dataset %>%
   group_by(group) %>%
   summarise(x_mean = mean(x),
             y_mean = mean(y))
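If dplyr is not available, a similar result can be obtained with base R. The sketch below is an alternative, not the workshop's method. It uses a smaller, hypothetical dataset (toy, with 100 groups instead of 4,000) so that it runs quickly on its own.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Rebuild a small version of the dataset: 100 groups of 25 observations
n_groups <- 100
group_size <- 25
toy <- data.frame(
  x = rnorm(n_groups * group_size, mean = 20, sd = 40),
  y = runif(n_groups * group_size, min = 0, max = 60)
)
toy$group <- rep(1:n_groups, each = group_size)

# Check that every group has exactly 25 observations
group_counts <- table(toy$group)

# One row per group with the mean of x and y (base R, no dplyr needed)
toy_means <- aggregate(cbind(x, y) ~ group, data = toy, FUN = mean)
names(toy_means) <- c("group", "x_mean", "y_mean")

head(toy_means)
```

Checking the group sizes with table() before summarising is a cheap way to confirm that the labels were assigned as intended.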

Now do a histogram of mean of x and another for the mean of y:

hist(group_means$x_mean, main="Histogram of mean of X", 
     xlab="Mean of X ", col="green")

hist(group_means$y_mean, main="Histogram of mean of Y", 
     xlab="Mean of Y", col="dark blue")

LOOKING AT THE HISTOGRAM OF THE SAMPLE MEAN OF Y (y_mean), HOW DIFFERENT IS IT FROM THE ORIGINAL HISTOGRAM OF Y (y)? EXPLAIN

CALCULATE THE MEAN AND STANDARD DEVIATION OF BOTH SAMPLE MEANS (COLUMNS x_mean AND y_mean). HINT: YOU CAN USE THE mean AND sd FUNCTIONS

IS THE VARIANCE OF THE RANDOM SAMPLE MEANS EQUAL TO THE VARIANCE OF THE ORIGINAL RANDOM VARIABLES? BRIEFLY EXPLAIN

DO SOME RESEARCH ABOUT THE CENTRAL LIMIT THEOREM. IN YOUR OWN WORDS, EXPLAIN WHAT THE CENTRAL LIMIT THEOREM IS
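To see the same idea with a different variable, here is a sketch that repeats the grouping exercise with a strongly skewed distribution. The exponential distribution, the seed, and the names z, z_means, sd_z, sd_z_means and sd_predicted are illustrative choices that are not part of the workshop.

```r
set.seed(42)  # arbitrary seed for reproducibility

# A skewed variable: exponential with rate 1 (mean 1, variance 1).
# This distribution is an illustrative choice, not part of the workshop.
n_groups <- 4000
group_size <- 25
z <- rexp(n_groups * group_size, rate = 1)

# Put each group of 25 consecutive draws in its own column,
# then average each column to get one sample mean per group
z_means <- colMeans(matrix(z, nrow = group_size, ncol = n_groups))

# Spread of the group means versus the spread of the raw values
sd_z <- sd(z)
sd_z_means <- sd(z_means)
sd_predicted <- sd_z / sqrt(group_size)

print(c(sd_z = sd_z, sd_z_means = sd_z_means, sd_predicted = sd_predicted))

hist(z, main = "Histogram of z", xlab = "z values", col = "green")
hist(z_means, main = "Histogram of mean of z",
     xlab = "Mean of z", col = "dark blue")
```

Compare the shape of the two histograms and the two standard deviations, then relate what you see to your answers about x and y.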

4 Datacamp exercises

Go to Datacamp.com. You have already been invited to join our course in Datacamp.com. Datacamp is one of the best learning sites for Data Science. You have to do the following chapters from the course “Introduction to Statistics in R”:

(OPTIONAL) CHAPTER 1: Summary Statistics

CHAPTER 3: More Distributions and the Central Limit Theorem

You will receive points for each exercise you complete.

5 Quiz 2 and W2 submission

Go to Canvas and answer Quiz 1 about Basics of Return and Risk. You will be able to try this quiz up to 3 times. The questions in this Quiz are related to concepts from the readings for this Workshop. The grade of this Workshop will be the following:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions.
  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions, and/or you did not do all the activities and responded to only some of the questions.
  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or you completed more but parts of your work are copy-pasted from other workshops.
  • Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.