1 General directions for this Workshop

You will work in RStudio. It is strongly recommended that you install the latest versions of R and RStudio. Once you are in RStudio, do the following.

Create an R Notebook document (File -> New File -> R Notebook), where you will write everything that is asked in this workshop. More specifically, you have to:

  • Replicate all the R Code along with its output.

  • Do whatever is asked in the workshop: respond to specific questions and/or complete an exercise/challenge.

Any QUESTION or any INTERPRETATION you need to do will be written in CAPITAL LETTERS. For ANY QUESTION or INTERPRETATION, you have to RESPOND IN CAPITAL LETTERS right after the question.

  • It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your personal notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file as you work, and submit ONLY the .html version of your .Rmd file. Pay attention in class to learn how to generate an .html file from your .Rmd file.

2 Set up the name of your R Notebook for this workshop

Set up the title and name of your workshop

Once you have created a new R Notebook, you will see a sample R Notebook document. You must DELETE all the lines of this sample document except the first lines related to title and output. As the title, write the workshop number and the course name, and add a new line with your name. You should end up with something like:


---
title: "Workshop 1, Financial Econometrics I"
author: YourName
output: html_notebook
---


Now you are ready to continue writing your first R Notebook.

You can start writing your own notes/explanations about the topics we cover in this workshop. When you need to write R code, click Insert at the top of the RStudio window and select R. A chunk of R code will immediately be set up for you to write your code. You can execute this piece of code by clicking the play button (green triangle).

Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.

Save your R Notebook file as W1-YourName.Rmd. Go to the File menu and select Save As.

To generate the .html file, you have to knit your R Notebook. Pay attention in class to learn how to do this.

3 Downloading online financial data and calculating returns

We start by clearing our R environment:

rm(list=ls())
# To avoid scientific notation for numbers: 
options(scipen=999)

3.1 Install the quantmod package

In order to import and manage financial data in R, you need to install the quantmod package. This package contains the getSymbols() function, which downloads data from the Internet and stores it in the environment as an xts (extensible time series) object. To install a package, go to the Packages tab in the bottom-right pane of RStudio, click Install, type quantmod, and then click the Install button.

Once you install a package, it stays installed on your computer, so you do not need to install it again. You might reinstall a package when a new version of it is released.

3.2 Load the quantmod package

You do not need to install quantmod again. However, every time you start a new R session and want to use it, you have to load it using the library() function:

library(quantmod)

3.3 Downloading real financial prices

The getSymbols() function downloads online, up-to-date financial data, such as stock prices, ETF prices, interest rates, and exchange rates. getSymbols() can download this data from multiple sources, including Yahoo Finance, FRED, and Oanda. These sources offer thousands of financial and economic series from many market exchanges, as well as macroeconomic variables from around the world.

Type ?function (replacing function with the name of the function) in the console or in a code chunk and run it to learn more about the syntax of any function. This displays the R documentation of the function in the bottom-right pane. Use this trick to get help on getSymbols():

?getSymbols

getSymbols() from the quantmod package is not the only way to download financial data into R. The Quandl package has the Quandl() function, which accomplishes the same task with slightly different syntax and output. To learn more about Quandl(), display its R documentation with ?Quandl. You can also search the Internet for R cheat sheets, documents that summarize the most relevant functions along with examples.

Now, we will work with historical data of the Bitcoin cryptocurrency. Using getSymbols(), download the daily prices of Bitcoin in USD (BTC-USD) from January 1, 2017 to date from Yahoo Finance:

getSymbols(Symbols=c("BTC-USD"), from="2017-01-01", src="yahoo", periodicity="daily")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## Warning: BTC-USD contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "BTC-USD"

This function creates an xts-zoo R object with the price data of the financial instrument. In this case, we downloaded historical daily prices of Bitcoin. xts stands for extensible time series. An xts-zoo object is designed to make time series data easy to manipulate.

In the Symbols argument you can specify one or more tickers; to specify several, combine them with the c() function, separated by commas. The from argument indicates the initial date of the data you want to download, and the to argument indicates the end date. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case Yahoo Finance. Finally, the periodicity argument specifies the frequency of the data (for example, daily, weekly, or monthly).

3.4 Show the content of datasets

Now you have a data set with daily prices of Bitcoin from 2017 to date. You can check the content of the data set with View(`BTC-USD`). This opens a different tab that shows the data as a table.

Return to your R Notebook. You can list the FIRST 5 rows of the data set by using head():

head(`BTC-USD`,5)
##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2017-01-01      963.658      1003.08     958.699       998.325      147775008
## 2017-01-02      998.617      1031.39     996.702      1021.750      222184992
## 2017-01-03     1021.600      1044.08    1021.600      1043.840      185168000
## 2017-01-04     1044.400      1159.42    1044.400      1154.730      344945984
## 2017-01-05     1156.730      1191.10     910.417      1013.380      510199008
##            BTC-USD.Adjusted
## 2017-01-01          998.325
## 2017-01-02         1021.750
## 2017-01-03         1043.840
## 2017-01-04         1154.730
## 2017-01-05         1013.380

You can also list the LAST 5 rows of the data set with tail(). Note that you can change the number of rows you want to display.

tail(`BTC-USD`, 5)
##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2021-08-06     40865.87     43271.66    39932.18      42816.50    38226483046
## 2021-08-07     42832.80     44689.86    42618.57      44555.80    40030862141
## 2021-08-08     44574.44     45282.35    43331.91      43798.12    36302664750
## 2021-08-09     43791.93     46456.83    42848.69      46365.40    38734079049
## 2021-08-10     46274.19     46503.17    44705.55      45640.66    35090677760
##            BTC-USD.Adjusted
## 2021-08-06         42816.50
## 2021-08-07         44555.80
## 2021-08-08         43798.12
## 2021-08-09         46365.40
## 2021-08-10         45640.66

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices, as well as the volume traded in that period. Adjusted prices account for dividend payments and stock splits, so they matter for stocks but not for currencies. For the Bitcoin series the close and adjusted prices are the same (compare the BTC-USD.Close and BTC-USD.Adjusted columns above), so we can use either the close or the adjusted price to calculate daily returns.
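A minimal sketch, using hypothetical numbers (not Yahoo Finance data), of why splits distort returns computed from raw closing prices. It assumes a 2-for-1 split and ignores dividends; Yahoo Finance's actual adjustment procedure also accounts for dividends and is not reproduced here.

```r
# Hypothetical closing prices around a 2-for-1 stock split (illustrative numbers only)
close_before_split <- 100  # close on the day before the split
close_after_split  <- 52   # close on the split day; each old share became 2 shares

# Return computed on raw closing prices: looks like a crash
R_raw <- close_after_split / close_before_split - 1

# Adjusted approach: divide the pre-split price by the split ratio so both prices are comparable
split_ratio <- 2
adj_before_split <- close_before_split / split_ratio
R_adj <- close_after_split / adj_before_split - 1

# The raw return is about -48%, while the split-adjusted return is about +4%
round(c(raw = R_raw, adjusted = R_adj), 4)
```

The investor's wealth did not fall by 48%: they simply hold twice as many shares, which is what the adjusted price captures.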

Let’s see some of the benefits of using xts-zoo objects. For example, we can select columns using any of the following quantmod functions, where x represents a generic xts-zoo object:

  • Op(x): extracts the opening prices of each period.
  • Hi(x): extracts the highest prices of each period.
  • Lo(x): extracts the lowest prices of each period.
  • Cl(x): extracts the closing prices of each period.
  • Vo(x): extracts the volume traded in each period.
  • Ad(x): extracts the adjusted prices of each period.
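To see roughly what these extractor functions do, the sketch below selects columns by name on a plain base-R matrix built from the head() output above. select_field() is a hypothetical helper written for illustration, not part of quantmod; quantmod's own functions work on xts objects and may include extra checks.

```r
# Small matrix with the same column names getSymbols() produced for BTC-USD
# (values copied from the head() output above)
btc <- matrix(
  c(963.658, 1003.08, 958.699, 998.325, 147775008, 998.325,
    998.617, 1031.39, 996.702, 1021.750, 222184992, 1021.750),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("2017-01-01", "2017-01-02"),
                  c("BTC-USD.Open", "BTC-USD.High", "BTC-USD.Low",
                    "BTC-USD.Close", "BTC-USD.Volume", "BTC-USD.Adjusted"))
)

# Hypothetical helper: keep the column(s) whose name contains the given field
select_field <- function(x, field) {
  x[, grep(field, colnames(x), ignore.case = TRUE), drop = FALSE]
}

select_field(btc, "Close")     # similar in spirit to Cl(x)
select_field(btc, "Adjusted")  # similar in spirit to Ad(x)
```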

3.5 Visualization of prices

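Visualize how the price of Bitcoin has changed over time. Use the following command to display the graph:

# Plot only the adjusted price: plotting all columns at once would draw the volume
# (in the billions) on the same axis as the prices
plot(Ad(`BTC-USD`))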

The same information can also be plotted by using a different function:

chartSeries(`BTC-USD`, theme = "white")

As you can see, there is more than one way of doing the same thing in R.

3.6 Data cleaning

In finance, it is very common to have gaps in daily data series. What does this mean? It means that the series contains some missing days. For example, stock series have no data for weekends or holidays. Because our data are stored in a time series (xts) object indexed by date, these gaps do not cause problems when computing returns: each return is calculated with respect to the previous available observation. However, R does not automatically deal with empty values (called NAs), such as the missing values reported in the warning when we downloaded BTC-USD. It is a good idea to have a data set free of NAs, so we remove them with na.omit():

`BTC-USD` <- na.omit(`BTC-USD`)
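A minimal sketch of why NAs are a problem, using a hypothetical plain numeric vector (the BTC prices from head() with one value replaced by NA). A single NA contaminates two returns: the one ending at the missing date and the one starting from it. On an xts object, na.omit() drops the whole row (date) instead of a vector element.

```r
# Hypothetical price vector with one missing value (NA) in the middle
prices_raw <- c(998.325, 1021.750, NA, 1154.730, 1013.380)

# Simple returns P(t) / P(t-1) - 1 computed on the raw series
R_raw <- prices_raw[-1] / prices_raw[-length(prices_raw)] - 1
sum(is.na(R_raw))    # 2 returns are lost because of one NA

# After removing the NA, every return can be computed
prices_clean <- as.numeric(na.omit(prices_raw))
R_clean <- prices_clean[-1] / prices_clean[-length(prices_clean)] - 1
sum(is.na(R_clean))  # 0
```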

3.7 Financial returns

The simple return of a stock (\(R_{t}\)) is the percentage change in price from the previous period (t-1) to the present period (t):

\[ R_{t}=\frac{Adjprice_{t}-Adjprice_{t-1}}{Adjprice_{t-1}}=\frac{Adjprice_{t}}{Adjprice_{t-1}}-1 \]

For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:

\[ R_{Jan2021}=\frac{Adjprice_{Jan2021}}{Adjprice_{Dec2020}}-1=\frac{100}{80}-1=0.25 \]

We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.
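The January 2021 example above can be reproduced in R as a quick check of the formula; the variable names are chosen here for illustration.

```r
# Numbers from the January 2021 example above
adjprice_dec2020 <- 80
adjprice_jan2021 <- 100

# Simple return: P(t) / P(t-1) - 1
R_jan2021 <- adjprice_jan2021 / adjprice_dec2020 - 1
R_jan2021        # 0.25 as a decimal
R_jan2021 * 100  # 25 as a percentage
```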

In finance, it is highly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models. cc returns are also called log returns.

One way to calculate cc returns is to subtract the log of the previous adjusted price (at t-1) from the log of the current adjusted price (at t):

\[ r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1}) \]

This is also called the difference of the log prices.

We can also calculate cc returns as the log of the ratio between the current adjusted price (at t) and the previous adjusted price (at t-1):

\[ r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right) \]

cc returns are usually represented by small r, while simple returns are represented by capital R.
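A quick numerical check that the two cc-return formulas above agree, using the first five adjusted BTC-USD prices from the head() output as sample data (the vector name adj_price is chosen here for illustration). The results should be close to the r column shown later in this workshop.

```r
# Adjusted BTC-USD prices for 2017-01-01 to 2017-01-05 (from the head() output above)
adj_price <- c(998.325, 1021.750, 1043.840, 1154.730, 1013.380)

# Method 1: difference of log prices, r_t = log(P_t) - log(P_{t-1})
r_diff <- diff(log(adj_price))

# Method 2: log of the price ratio, r_t = log(P_t / P_{t-1})
r_ratio <- log(adj_price[-1] / adj_price[-length(adj_price)])

all.equal(r_diff, r_ratio)  # TRUE: both formulas give the same cc returns
r_diff
```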

3.8 Calculation of financial returns

It is recommended to always use adjusted prices to calculate financial returns. Yahoo Finance adjusts closing prices to account for any stock splits and/or dividend payments.

We create a new object that contains the adjusted prices of BTC-USD:

prices <- Ad(`BTC-USD`)

The Ad() function extracts the adjusted prices of an xts-zoo object.

We assign a column name for the adjusted prices:

colnames(prices) <- "adj_price"

Now we will create a column with the price of the previous period, and another column with simple daily returns:

# First we create a column that shifts the prices of BTC one period forward.
# That is, it applies a lag of 1 (backshift operation); k=2 would shift the data two periods.
# For xts objects, the number of periods is set with the k argument of lag().
# This allows us to have the data for time t and t-1 in the same row.
prices$lag1 <- lag(prices$adj_price, k=1)
# Then, we follow the formula for simple returns (P(t)/P(t-1))-1
prices$R_a <- prices$adj_price / prices$lag1 - 1
# Another way to do this is:
prices$R_b <- prices$adj_price / lag(prices$adj_price, k=1) - 1
# Notice the results are the same for both ways

As you can see, the dollar sign operator ($) is used to refer to a specific column of an object (matrix, data frame, xts) as well as to create new columns, while assignment in R is denoted by the <- operator.
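The lag step above can be mimicked on a plain numeric vector to see exactly what the backshift does. lag_vec() is a hypothetical base-R helper for illustration only; on xts objects you should keep using lag().

```r
# Hypothetical base-R helper that mimics a lag of k periods on a numeric vector
lag_vec <- function(x, k = 1) {
  # Pad the first k positions with NA and drop the last k observations
  c(rep(NA, k), head(x, -k))
}

# Adjusted BTC-USD prices from the head() output above
adj_price <- c(998.325, 1021.750, 1043.840, 1154.730, 1013.380)
lag1 <- lag_vec(adj_price, k = 1)
R_a  <- adj_price / lag1 - 1

# Prices at t and t-1 side by side, as in the prices object
data.frame(adj_price, lag1, R_a)
```

The first row has no previous price, which is why the first return is NA in the prices object as well.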

We can calculate a column for the daily growth factor:

prices$GrowthR <- 1 + prices$R_a

Now create a column with daily continuously compounded returns calculated from prices:

prices$r_b <- log(prices$adj_price / prices$lag1)
# or
prices$r <- diff(log(prices$adj_price))
# the result must be the same for both columns

Create a column with daily continuously compounded returns calculated from simple returns (column R_a). Call it r_from_R.

prices$r_from_R <- log(1 + prices$R_a)

Now create a column to calculate simple daily returns using continuously compounded returns.

prices$R_from_r <- exp(prices$r) - 1

Check the first rows of the return variables. We used two methods to calculate both R and r, and both methods must give the same values:

head(prices, 5)
##            adj_price     lag1         R_a         R_b   GrowthR         r_b
## 2017-01-01   998.325       NA          NA          NA        NA          NA
## 2017-01-02  1021.750  998.325  0.02346429  0.02346429 1.0234643  0.02319324
## 2017-01-03  1043.840 1021.750  0.02161974  0.02161974 1.0216197  0.02138934
## 2017-01-04  1154.730 1043.840  0.10623277  0.10623277 1.1062328  0.10096034
## 2017-01-05  1013.380 1154.730 -0.12240955 -0.12240955 0.8775905 -0.13057525
##                      r    r_from_R    R_from_r
## 2017-01-01          NA          NA          NA
## 2017-01-02  0.02319324  0.02319324  0.02346429
## 2017-01-03  0.02138934  0.02138934  0.02161974
## 2017-01-04  0.10096034  0.10096034  0.10623277
## 2017-01-05 -0.13057525 -0.13057525 -0.12240955

You can see that the R_a and R_from_r columns have the same values, and the r and r_from_R columns also have the same values.
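The two conversions used above, r = log(1 + R) and R = exp(r) - 1, undo each other. A minimal check on plain vectors, using the simple returns printed in the head() output (first NA row dropped):

```r
# Simple returns R_a from the head() output above (first row dropped because it is NA)
R <- c(0.02346429, 0.02161974, 0.10623277, -0.12240955)

# Simple -> cc: r = log(1 + R)
r <- log(1 + R)

# cc -> simple: R = exp(r) - 1
R_back <- exp(r) - 1

all.equal(R, R_back)  # TRUE: the two conversions undo each other
r
```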

Visualize the daily returns over time:

plot(prices$R_a)

You can observe increased volatility at the end of 2017, at the beginning of 2018, and during 2020 and early 2021. For now, we can define volatility as how widely returns move up and down within a short period of time. Later, we will use formal measures of volatility such as the standard deviation.

3.9 Calculation of holding-period return

Calculate the holding-period return using initial and end prices (HPR1).

# First, we need to define the number of rows of the prices dataset:
n <- as.numeric(nrow(prices))
# as.numeric() converts the integer returned by nrow() into a numeric (double) value
# Then, we define the first and last prices of the dataset:
price_0 <- as.numeric(prices$adj_price[1])
price_n <- as.numeric(prices$adj_price[n])
# Here we tell R to take the n-th (last) observation of the adj_price column
# Now we can compute the HPR and store it in a new object:
HPR1 <- (price_n / price_0) - 1
cat("HPR1 = ", HPR1)
## HPR1 =  44.71724

Calculate the holding-period return using continuously compounded returns:

sumr <- sum(prices$r, na.rm = TRUE)
HPR2 <- exp(sumr) - 1

Even though HPR1 and HPR2 were calculated using different methodologies, they must have the same value:

cat(HPR1, HPR2)
## 44.71724 44.71724
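The same equality can be checked offline on a short series, here the five adjusted BTC-USD prices from the head() output. The sketch also compounds the daily growth factors (1 + R), which telescope to the same holding-period return.

```r
# Adjusted BTC-USD prices from the head() output above (sample data)
adj_price <- c(998.325, 1021.750, 1043.840, 1154.730, 1013.380)
n <- length(adj_price)

# HPR from the first and last prices
HPR1 <- adj_price[n] / adj_price[1] - 1

# HPR from the sum of cc returns
r <- diff(log(adj_price))
HPR2 <- exp(sum(r)) - 1

# HPR from compounding the daily growth factors (1 + R)
R <- adj_price[-1] / adj_price[-n] - 1
HPR3 <- prod(1 + R) - 1

all.equal(HPR1, HPR2)  # TRUE
all.equal(HPR1, HPR3)  # TRUE
```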

To see why both calculations of the HPR work, read the Note “Basics of Return and Risk”.
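In short, the sum of cc returns telescopes. Writing \(Adjprice_{0}\) for the first price and \(Adjprice_{n}\) for the last (an indexing chosen here for the derivation; in the R code, price_0 is row 1 and price_n is row n):

```latex
\[ \sum_{t=1}^{n} r_{t}=\sum_{t=1}^{n}\left[log(Adjprice_{t})-log(Adjprice_{t-1})\right]=log(Adjprice_{n})-log(Adjprice_{0}) \]

\[ e^{\sum_{t=1}^{n} r_{t}}-1=\frac{Adjprice_{n}}{Adjprice_{0}}-1=HPR \]
```

Every intermediate log price appears once with a plus sign and once with a minus sign, so only the first and last prices survive.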

4 Q Descriptive Statistics of financial returns

Start by clearing your environment (this will erase all the variables and objects that are in your environment):

rm(list = ls())

Use the getSymbols command to download monthly data from Yahoo Finance for Starbucks from January 2008 to date. Type the following command:

getSymbols(Symbols = "SBUX", from="2008-01-01", periodicity="monthly", src = "yahoo")
## [1] "SBUX"

Calculate continuously compounded (cc) monthly returns:

returns.df <- as.data.frame(diff(log(Ad(SBUX))))
# change the name of the column in returns.df
colnames(returns.df) <- "r_SBUX"

A data frame is a basic data structure in R that stores tabular data (rows and columns). Data frames look like matrices, but a data frame can store different types of data in different columns, whereas a matrix can store only one type of data. We transform the xts object (created by getSymbols) into a data frame to make manipulation easier.
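A small illustration of the difference between a data frame and a matrix, with hypothetical monthly values: the data frame keeps a text column and a numeric column side by side, while the matrix coerces everything to a single type (here, character).

```r
# Hypothetical monthly data (illustrative values only)
months  <- c("2008-02-01", "2008-03-01")
returns <- c(-0.05, 0.02)

# A data frame keeps each column with its own type
df <- data.frame(month = months, r = returns, stringsAsFactors = FALSE)
sapply(df, class)   # month is "character", r is "numeric"

# A matrix holds a single type, so the numbers are coerced to character
m <- cbind(month = months, r = returns)
class(m[, "r"])     # "character"
```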

4.1 Mean, standard deviation and variance of cc returns

Calculate the mean, standard deviation and variance of continuously compounded (cc) monthly returns using the summary command:

summary(returns.df)
##      r_SBUX        
##  Min.   :-0.38548  
##  1st Qu.:-0.02651  
##  Median : 0.01956  
##  Mean   : 0.01663  
##  3rd Qu.: 0.05766  
##  Max.   : 0.26354  
##  NA's   :1

As you can see, summary() does not show the standard deviation or variance. You can also try the table.Stats() function; however, you must first install and load the PerformanceAnalytics package.

Go to the Packages tab in the bottom-right pane of RStudio, click Install, type PerformanceAnalytics as the name of the package, and then click Install. After this, you can load this package into your R session:

library(PerformanceAnalytics)

Now we use the table.Stats function from this package to estimate basic descriptive statistics of returns:

table.Stats(returns.df$r_SBUX)
##                         
## Observations    163.0000
## NAs               1.0000
## Minimum          -0.3855
## Quartile 1       -0.0265
## Median            0.0196
## Arithmetic Mean   0.0166
## Geometric Mean    0.0136
## Quartile 3        0.0577
## Maximum           0.2635
## SE Mean           0.0060
## LCL Mean (0.95)   0.0049
## UCL Mean (0.95)   0.0284
## Variance          0.0058
## Stdev             0.0761
## Skewness         -0.6933
## Kurtosis          4.7025

This function shows several statistical measures and indicators that may be useful. You can also obtain the specific measures you were asked for by using the following functions:

mean_r_SBUX <- mean(returns.df$r_SBUX, na.rm=TRUE) # arithmetic mean
sd_r_SBUX <- sd(returns.df$r_SBUX, na.rm=TRUE) # standard deviation
var_r_SBUX <- var(returns.df$r_SBUX, na.rm=TRUE) # variance
# Note that the na.rm argument is set to TRUE. This means that NA values will be removed.
# The variables are kept in the environment, so we have to print them to see them in console.
cat("Mean =", mean_r_SBUX)
## Mean = 0.01663307
cat("Standard deviation = ", sd_r_SBUX)
## Standard deviation =  0.07606456
cat("Variance = ", var_r_SBUX)
## Variance =  0.005785818
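To see what mean(), var() and sd() compute, the sketch below applies the formulas by hand. Since the SBUX data require a download, it uses the cc returns of the five BTC-USD prices from Section 3 as sample data. Note that R's var() and sd() use the sample formulas, dividing by n - 1.

```r
# Adjusted BTC-USD prices from the head() output in Section 3 (used only as sample data)
adj_price <- c(998.325, 1021.750, 1043.840, 1154.730, 1013.380)
r <- diff(log(adj_price))   # four daily cc returns
n <- length(r)

mean_r <- sum(r) / n                      # arithmetic mean
var_r  <- sum((r - mean_r)^2) / (n - 1)   # sample variance divides by n - 1
sd_r   <- sqrt(var_r)                     # standard deviation = square root of variance

# Compare with R's built-in functions
all.equal(c(mean_r, var_r, sd_r), c(mean(r), var(r), sd(r)))  # TRUE
```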

4.2 Q Mean, standard deviation and variance of simple returns

Calculate the mean, standard deviation and variance of simple monthly returns for Starbucks:

# First, calculate simple returns as before
returns.df$R_SBUX <- SBUX$SBUX.Adjusted / lag(SBUX$SBUX.Adjusted, k=1) - 1

# Then, apply the previous functions
table.Stats(returns.df$R_SBUX)
##                         
## Observations    163.0000
## NAs               1.0000
## Minimum          -0.3199
## Quartile 1       -0.0262
## Median            0.0197
## Arithmetic Mean   0.0197
## Geometric Mean    0.0168
## Quartile 3        0.0594
## Maximum           0.3015
## SE Mean           0.0060
## LCL Mean (0.95)   0.0079
## UCL Mean (0.95)   0.0314
## Variance          0.0058
## Stdev             0.0762
## Skewness         -0.0502
## Kurtosis          3.1035
mean(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.01965467
sd(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.07616188
var(returns.df$R_SBUX, na.rm=TRUE)
##             [,1]
## [1,] 0.005800632

QUESTION: DO YOU SEE A DIFFERENCE BETWEEN THE SIMPLE AND CONTINUOUSLY COMPOUNDED RETURNS? BRIEFLY EXPLAIN.

5 Q The Histogram

Make sure you remember what a histogram is. Read the Note Basics of Statistics for Finance.

5.1 Histogram using historical data

Create a histogram of Starbucks cc returns:

hist(returns.df$r_SBUX, main="Histogram of SBUX monthly returns", 
     xlab="Continuously Compounded returns", col="dark green")

QUESTIONS:

A) INTERPRET THIS HISTOGRAM WITH YOUR OWN WORDS

B) HOW ARE THE MEAN AND STANDARD DEVIATION RELATED TO THE HISTOGRAM?

5.2 Q Histogram using simulated data for returns

Using the actual mean and standard deviation of the monthly cc returns of Starbucks, create (simulate) a random variable with that mean and standard deviation for the same number of periods. Use the rnorm function for this:

rSBUX_sim <- rnorm(n=nrow(returns.df), mean = mean_r_SBUX, sd=sd_r_SBUX)
# We will use the same number of observations as returns.df
# The nrow function gets the number of rows of an R object

Plot a histogram of the simulated returns together with a histogram of the real returns:

# First, omit NAs. Many functions return NA or throw errors when the data contain NAs,
    # so removing them makes the coding easier
rSBUX <- na.omit(returns.df$r_SBUX)

# Calculate the histograms and store their information in variables (don't plot yet)
hist_sim_SBUX<- hist(rSBUX_sim,plot = FALSE)
hist_SBUX <- hist(rSBUX,plot = FALSE)

# Calculate the range of the graph
xlim <- range(hist_SBUX$breaks,hist_sim_SBUX$breaks)
ylim <- range(0,hist_SBUX$density,
              hist_sim_SBUX$density)

# Plot the first histogram
plot(hist_sim_SBUX,xlim = xlim, ylim = ylim,
     col = rgb(1,0,0,0.4), xlab = 'Continuously compounded returns',
     freq = FALSE, ## relative, not absolute frequency
     main = 'Distribution of simulated and real Starbucks Returns')

# Plot the second histogram on top of the 1st one
opar <- par(new = FALSE)
plot(hist_SBUX,xlim = xlim, ylim = ylim,
     xaxt = 'n', yaxt = 'n', ## don't add axes
     col = rgb(0,0,1,0.4), add = TRUE,
     freq = FALSE) ## relative, not absolute frequency

# Add a legend in the corner
legend('topleft',c('Simulated Returns','Real Returns'),
       fill = rgb(1:0,0,0:1,0.4), bty = 'n')

par(opar)

As you can see, the peach bars represent the simulated normal returns, while the light purple bars represent the real returns of Starbucks. The dark purple color appears where the two histograms overlap.
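The overlay code passes freq = FALSE so that both histograms show densities instead of counts. A minimal sketch of what that means, using hypothetical normal draws with roughly the SBUX mean and standard deviation reported above: each bar's density is its count divided by (number of observations × bin width), so the bar areas add up to 1 for any sample size.

```r
set.seed(1)  # make the hypothetical draws reproducible
x <- rnorm(200, mean = 0.0166, sd = 0.0761)  # hypothetical draws, roughly SBUX-like moments

h <- hist(x, plot = FALSE)  # compute the histogram without drawing it

# density = counts / (n * bin width)
bin_width <- diff(h$breaks)
all.equal(h$density, h$counts / (length(x) * bin_width))  # TRUE

# Bar areas add up to 1 (up to floating-point rounding)
sum(h$density * bin_width)
```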

QUESTION: WHAT DIFFERENCES DO YOU SEE IN THE HISTOGRAMS? HOW ARE REAL RETURNS DIFFERENT FROM THE THEORETICAL NORMAL DISTRIBUTION OF RETURNS? BRIEFLY EXPLAIN.

ASSUMING THAT THE MONTHLY RETURNS OF STARBUCKS FOLLOW A NORMAL DISTRIBUTION, WHAT WOULD BE THE 95% CONFIDENCE INTERVAL? WHAT IS THE INTERPRETATION OF THIS INTERVAL? EXPLAIN.

6 Quiz 1 and W1 submission

Go to Canvas and complete Quiz 1 about Basics of Return and Risk. You can attempt this quiz up to 3 times. The questions in this quiz cover concepts from the readings related to this workshop. This workshop will be graded as follows:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions

  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but did NOT RESPOND to the questions, and/or if you did not do all the activities and responded to only some of the questions.

  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or if you completed more but parts of your work are copied and pasted from other workshops.

  • Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.