Abstract
In this workshop we practice the concepts of a) Histogram and b) the Central Limit Theorem. You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.
At the beginning of the R Notebook write Workshop 2 - Financial Econometrics II and your name (as we did in the previous workshop).
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.
We start by clearing our R environment:
rm(list=ls())
# To avoid scientific notation for numbers:
options(scipen=999)
Using the getSymbols function, download MONTHLY prices of the Mexican market index, the IPyC (Índice de Precios y Cotizaciones, ticker ^MXX), and of the US S&P500 index (^GSPC). For both indexes, download data from January 2000 to date. Do the following:
The getSymbols function downloads price (quotation) data for financial instruments.
# Load the packages I need for this workshop:
library(quantmod)
#library(dplyr)
# Downloading the historical quotation data for both indexes:
getSymbols(c("^MXX", "^GSPC"), from="2000-01-01", src="yahoo", periodicity="monthly")
## [1] "^MXX" "^GSPC"
You now have the MXX and GSPC xts datasets in your environment, with historical data for the Mexican and US financial market indexes.
Remember that xts datasets are time-series datasets with a time index.
Both datasets have columns for the open, high, low, close, and adjusted prices (plus volume). As we have mentioned, it is recommended to use adjusted prices to calculate returns. For indexes, close prices will always be equal to adjusted prices.
Before calculating returns, we can create an integrated dataset with both indexes:
prices = merge(MXX,GSPC)
By doing this, we can easily calculate returns for both indexes using the integrated dataset.
We select only adjusted prices and rename the columns with meaningful names for the indexes:
prices = Ad(prices)
names(prices) = c("MXX","GSPC")
With adjusted prices we calculate continuously compounded returns:
r = diff(log(prices))
Remember that continuously compounded returns (r) can be calculated as the first difference of the natural log of prices (or index values). The first difference is the log of the price at period t minus the log of the price at period t-1: r_t = ln(P_t) - ln(P_(t-1)) = ln(P_t / P_(t-1)).
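A minimal sketch of this definition on a short vector of made-up prices (the values in p are hypothetical, not index data). It checks that diff(log(p)) gives the same numbers as the log of each price divided by the previous one:

```r
# Hypothetical monthly prices, used only to illustrate the formula
p <- c(100, 105, 98, 110)
# First difference of log prices: log(P_t) - log(P_(t-1))
r <- diff(log(p))
# Same quantity written as the log of the ratio P_t / P_(t-1)
r_ratio <- log(p[-1] / p[-length(p)])
all.equal(r, r_ratio)  # TRUE
```

Note that diff() returns one value fewer than the number of prices; on an xts object it keeps the length and puts NA in the first period instead, which is why the NA appears in head(r).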
We visualize the first historical returns:
head(r)
## MXX GSPC
## 2000-01-01 NA NA
## 2000-02-01 0.11232485 -0.02031300
## 2000-03-01 0.01410906 0.09232375
## 2000-04-01 -0.11811558 -0.03127991
## 2000-05-01 -0.10795263 -0.02215875
## 2000-06-01 0.15323959 0.02365163
We see that the first value is NA since there is no previous price with which to calculate a return for the first month. We can delete NA values with the na.omit function:
r = na.omit(r)
Now we calculate simple returns:
R = na.omit(prices / lag(prices, k=1) - 1)
Remember that simple returns are calculated as the percentage change between the current price and the previous price.
lag is a function that can be used with xts datasets to get previous values of a variable. Here we indicate k=1 to get the value from one period before (the previous month's price).
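The lag step can be mimicked on a plain numeric vector to see what it does. A minimal sketch with hypothetical prices: shift the vector one position so each price lines up with the previous one, then compute the percentage change:

```r
# Hypothetical prices (illustration only, not index data)
p <- c(100, 105, 98, 110)
# Manual one-period lag: the previous price, with NA for the first period
p_prev <- c(NA, p[-length(p)])
# Simple return: percentage change with respect to the previous price
R_simple <- p / p_prev - 1
round(R_simple, 4)
# values: NA, 0.0500, -0.0667, 0.1224
```

With xts data the lag function does the same shift but keeps the dates aligned, so the first month also ends up as NA.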
We can look at the simple returns by checking the oldest and the most recent values:
head(R)
## MXX GSPC
## 2000-02-01 0.11887627 -0.02010808
## 2000-03-01 0.01420906 0.09671983
## 2000-04-01 -0.11140666 -0.03079576
## 2000-05-01 -0.10232989 -0.02191505
## 2000-06-01 0.16560422 0.02393355
## 2000-07-01 -0.06247834 -0.01634128
tail(R)
## MXX GSPC
## 2021-03-01 0.05950165 0.042438633
## 2021-04-01 0.01615910 0.052425321
## 2021-05-01 0.05990934 0.005486489
## 2021-06-01 -0.01171638 0.022214010
## 2021-07-01 0.01150474 0.022748055
## 2021-08-01 0.01015543 0.017673625
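The two kinds of returns are linked by R = exp(r) - 1 (equivalently, r = log(1 + R)). A quick check using the February 2000 values printed above by head(r) and head(R):

```r
# February 2000 returns copied from the head(r) and head(R) output above
r_feb <- c(MXX = 0.11232485, GSPC = -0.02031300)  # continuously compounded
R_feb <- c(MXX = 0.11887627, GSPC = -0.02010808)  # simple
# Simple return from the cc return: R = exp(r) - 1
exp(r_feb) - 1    # matches R_feb up to the rounding of the printed digits
# cc return from the simple return: r = log(1 + R)
log(1 + R_feb)    # matches r_feb up to the same rounding
```

This also shows why cc returns are always slightly smaller than simple returns for the same month.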
In this case, there are no NA values, but it is still a good idea to apply the na.omit function in case any month of either index has an NA value:
R <- na.omit(R)
Do a histogram of the simple return of the Mexican index:
hist(R$MXX, main="Histogram of IPC monthly returns",
xlab="Simple returns", col="dark blue")
INTERPRET this histogram with your words.
Do a histogram of the simple return of the S&P500 index.
hist(R$GSPC, main="Histogram of S&P500 monthly returns",
xlab="Simple returns", col="blue")
INTERPRET this histogram with your words.
Here is a graph that shows both histograms together. To better appreciate the histograms, I set the width of each bar (bin) to 2 percentage points.
The S&P500 returns are represented in blue, the IPyC returns in yellow, and the grayish-yellow area is where both histograms overlap.
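One way to draw a plot like this in base R is sketched below. It assumes a common bin width of 0.02 (2 percentage points) and semi-transparent colors so the overlap is visible. overlay_hist is a hypothetical helper name, and the returns are simulated placeholders, not the IPyC or S&P500 data:

```r
# Hypothetical helper: two return histograms on the same axes with a common bin width
overlay_hist <- function(a, b, width = 0.02, labels = c("a", "b"), plot = TRUE) {
  a <- as.numeric(a)  # works for plain vectors and for xts columns such as R$GSPC
  b <- as.numeric(b)
  # Common break points so the bars of both histograms line up
  lo <- width * floor(min(a, b) / width)
  nbins <- ceiling((max(a, b) - lo) / width)
  brks <- lo + width * (0:nbins)
  ha <- hist(a, breaks = brks, plot = FALSE)
  hb <- hist(b, breaks = brks, plot = FALSE)
  if (plot) {
    col_a <- rgb(0, 0, 1, alpha = 0.5)  # semi-transparent blue
    col_b <- rgb(1, 1, 0, alpha = 0.5)  # semi-transparent yellow
    plot(ha, col = col_a, ylim = c(0, max(ha$counts, hb$counts)),
         main = "Histograms of monthly returns", xlab = "Simple returns")
    plot(hb, col = col_b, add = TRUE)
    legend("topright", legend = labels, fill = c(col_a, col_b))
  }
  invisible(list(a = ha, b = hb))
}

# Simulated placeholder returns (NOT the workshop data), only to make the sketch runnable
set.seed(123)
sp_sim  <- rnorm(250, mean = 0.006, sd = 0.04)
ipc_sim <- rnorm(250, mean = 0.008, sd = 0.06)
h <- overlay_hist(sp_sim, ipc_sim, labels = c("S&P500 (simulated)", "IPyC (simulated)"))
```

With the workshop data, the equivalent call would be overlay_hist(R$GSPC, R$MXX, labels = c("S&P500", "IPyC")), which draws the S&P500 in blue and the IPyC in yellow.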
LOOK CAREFULLY AT THIS PLOT WITH BOTH HISTOGRAMS. WHICH INSTRUMENT IS RISKIER? EXPLAIN
CALCULATE the mean and standard deviation of monthly returns for both market indexes. Hint: you can check how we did this in Workshop 1.
Just by looking at the mean and standard deviation of each instrument's returns, WHICH INSTRUMENT LOOKS MORE ATTRACTIVE TO INVEST IN? EXPLAIN
If you had invested $1 peso in the Mexican index in Jan 2000, WHAT WOULD BE THE VALUE OF YOUR INVESTMENT TODAY?
If you had invested $1 USD in the S&P500 index in Jan 2000, WHAT WOULD BE THE VALUE OF YOUR INVESTMENT TODAY?
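The arithmetic behind these two questions can be sketched with made-up prices (not the index data). The final value of 1 unit invested at the start equals the product of (1 + simple return) over all months, which also equals exp of the sum of the continuously compounded returns, and the ratio of the last price to the first price:

```r
# Hypothetical prices (illustration only): the first value is the starting price
p <- c(100, 105, 98, 110)
R_simple <- p[-1] / p[-length(p)] - 1  # simple monthly returns
r_cc     <- diff(log(p))               # continuously compounded monthly returns
# Three equivalent ways to get the final value of 1 unit invested at the start
v1 <- prod(1 + R_simple)   # compounding simple returns
v2 <- exp(sum(r_cc))       # summing cc returns, then exponentiating
v3 <- p[length(p)] / p[1]  # ratio of last to first price
c(v1, v2, v3)              # all equal to 1.1, up to floating-point rounding
```

With the workshop data the same idea applies to the columns of R, r, or prices; just make sure the starting point you use matches the month in which the investment is made.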
The Central Limit Theorem is one of the most important discoveries in mathematics and statistics. Actually, thanks to this discovery, the field of Statistics was developed at the beginning of the 20th century.
We will do an exercise using simulated numbers. I hope it helps you understand what the Central Limit Theorem is about.
Let’s do the following.
Create the variable x as a random variable with a normal distribution, with mean = 20 and standard deviation = 40. Create 100,000 observations:
x <- rnorm(n=100000, mean = 20, sd = 40)
Create the variable y as a random variable with a uniform distribution in the range [0,60]:
y <- runif(n=100000, min = 0, max = 60)
Learn about the variance of the uniform distribution. HOW CAN YOU ESTIMATE THE VARIANCE OF A UNIFORMLY DISTRIBUTED VARIABLE?
Plot a histogram of y and another histogram of x.
hist(x, main="Histogram of x",
xlab="x values", col="dark blue")
hist(y, main="Histogram of y",
xlab="y values", col="green")
WHAT DO YOU SEE? BRIEFLY EXPLAIN
Calculate the mean of x and y and save them as xbar and ybar:
xbar = mean(x)
ybar = mean(y)
Now we will manually calculate the variance of x and y. Remember that the VARIANCE of a variable is the AVERAGE OF ITS SQUARED DEVIATIONS.
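In symbols, for N observations this definition reads as follows (this is the form that divides by N, which is what taking the mean of the squared deviations computes):

```latex
\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2,
\qquad
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
```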
Calculate first the squared deviations of x and y:
xdesv2 = (x-xbar)^2
ydesv2 = (y-ybar)^2
Now we just calculate the mean of the squared deviations to get the variance:
varx = mean(xdesv2)
varx
## [1] 1604.346
vary = mean(ydesv2)
vary
## [1] 300.1798
Compare the variances you computed with the theoretical variance of each variable (the variance implied by the parameters we used to simulate x and y; for x it is 40^2 = 1,600). You will see that the computed variances varx and vary are very close to the theoretical values.
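R's built-in var() function divides by N - 1 instead of N, so it will not exactly match the manual calculation. A minimal sketch of the comparison; the seed is an assumption added only to make the sketch reproducible:

```r
set.seed(42)  # assumption: fixed seed only for reproducibility
x <- rnorm(n = 100000, mean = 20, sd = 40)
xbar <- mean(x)
varx_manual <- mean((x - xbar)^2)  # divides by N, as in varx above
varx_r <- var(x)                   # R's var() divides by N - 1
n <- length(x)
# The two differ only by the factor n / (n - 1), negligible for n = 100,000
all.equal(varx_r, varx_manual * n / (n - 1))  # TRUE
# Both should be close to the theoretical variance 40^2 = 1600
c(manual = varx_manual, r_var = varx_r, theoretical = 40^2)
```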
Create a data frame with x and y as columns:
dataset <- cbind(x,y)
dataset <- as.data.frame(dataset)
Now assign a group number to each observation. We will create 4,000 groups of 25 observations each, labeled from 1 to 4,000. You can use the functions rep and seq:
# Create a column called group, where each group will have 25
# observations:
dataset$group <- rep(seq(1:4000),each=25)
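A small illustration of how rep(..., each = ...) builds the group labels: each label is repeated consecutively, so the first 25 rows get label 1, the next 25 get label 2, and so on:

```r
# Each label is repeated `each` times in a row
rep(1:3, each = 2)
# [1] 1 1 2 2 3 3
# 4,000 groups of 25 observations give exactly one label per observation
length(rep(1:4000, each = 25))
# [1] 100000
# seq(1:4000) returns the same sequence as 1:4000
identical(seq(1:4000), 1:4000)
# [1] TRUE
```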
With the group_by() and summarise() functions, compute the sample mean of x and of y for EACH group.
Before running the following code, you must INSTALL the dplyr PACKAGE. You can do this from the Packages tab in RStudio.
# Group the observations by the group number assigned above and
# compute the mean of x and of y within each group:
library(dplyr)
group_means <- dataset %>%
  group_by(group) %>%
  summarise(x_mean = mean(x),
            y_mean = mean(y))
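If installing dplyr is a problem, the same group means can be computed in base R. A minimal sketch; the seed and the regenerated data are assumptions added only to make it self-contained:

```r
set.seed(2021)  # assumption: fixed seed only for reproducibility
x <- rnorm(n = 100000, mean = 20, sd = 40)
y <- runif(n = 100000, min = 0, max = 60)
group <- rep(1:4000, each = 25)
# Base-R equivalent of group_by(group) %>% summarise(...): one mean per group
x_mean <- tapply(x, group, mean)
y_mean <- tapply(y, group, mean)
# Because groups are consecutive blocks of 25, filling a 25-row matrix column
# by column puts each group in its own column, so colMeans() gives the same means
x_mean2 <- colMeans(matrix(x, nrow = 25))
all.equal(as.numeric(x_mean), x_mean2)  # TRUE
```

The matrix shortcut only works because the groups are contiguous and all the same size; tapply() works for any grouping.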
Now plot a histogram of the mean of x and another of the mean of y:
hist(group_means$x_mean, main="Histogram of mean of X",
xlab="Mean of X ", col="green")
hist(group_means$y_mean, main="Histogram of mean of Y",
xlab="Mean of Y", col="dark blue")
LOOKING AT THE HISTOGRAM OF THE SAMPLE MEAN OF Y (y_mean), HOW DIFFERENT IS IT FROM THE ORIGINAL HISTOGRAM OF Y (y)? EXPLAIN
CALCULATE THE MEAN AND STANDARD DEVIATION OF BOTH SAMPLE MEANS (COLUMNS x_mean AND y_mean). HINT: YOU CAN USE THE mean AND sd FUNCTIONS
IS THE VARIANCE OF THE RANDOM SAMPLE MEANS EQUAL TO THE VARIANCE OF THE ORIGINAL RANDOM VARIABLES? BRIEFLY EXPLAIN
DO SOME RESEARCH ABOUT THE CENTRAL LIMIT THEOREM. IN YOUR OWN WORDS, EXPLAIN WHAT THE CENTRAL LIMIT THEOREM IS
Go to Datacamp.com. You have already been invited to join our course on Datacamp.com. Datacamp is one of the best learning sites for Data Science. You have to complete the following chapters from the course “Introduction to Statistics in R”:
(OPTIONAL) CHAPTER 1: Summary Statistics
CHAPTER 3: More Distributions and the Central Limit Theorem
You will receive points for each exercise you complete.
Go to Canvas and answer Quiz 1 about Basics of Return and Risk. You can attempt this quiz up to 3 times. The questions in this quiz are related to concepts from the readings for this Workshop. The grade of this Workshop will be the following: