1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.

2 Downloading online financial data and calculating returns

  1. Once you have created a new R Notebook, you will see a sample R Notebook document. You must DELETE all the lines of this sample document except the first lines related to title and output. As title, write the workshop # and course, and add a new line with your name. You have to end up with something like:

Now you are ready to continue writing your first R Notebook.

You can start writing your own notes/explanations we cover in this workshop. When you need to write lines of R Code, you need to click Insert at the top of the RStudio Window and select R. Immediately a chunk of R code will be set up to start writing your R code. You can execute this piece of code by clicking in the play button (green triangle).

Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.

We start clearing our R environment:

rm(list=ls())
# To avoid scientific notation for numbers: 
options(scipen=999)
  1. Change your working directory to the folder in which you want to save your work. The working directory is the file path on your computer. Use the setwd() function. Type your working directory inside quotation marks (""). For example:
# Example of a Windows folder:
setwd("C:/Users/Alberto/Dropbox/202011/EC2003/Workshops/W1")
# Example of a Mac folder: 
setwd("/Users/Kelly/Desktop/Financial/programming")
  1. In order to import and manage financial data in R, the quantmod package must be installed. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the downloaded data from the Internet. In order to install packages in R, use the install.packages() function:
install.packages("quantmod")
  1. Now, you have installed a package and it is not necessary to install it again in further occasions. It will stay in your computer. However, next time you want to use it, you have to load it using the library() function
library(quantmod)

The getSymbols() function enables its user to download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, Google Finance, FRED and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables around the world.

  1. Type ?function in the console or the R Script and run it to know more about the syntaxis of any function. This will display the R documentation of the function in the bottom-right pane. Apply this trick for searching help to getSymbols.
?getSymbols
## starting httpd help server ... done

getSymbols() from the quantmod package is not the only way to download financial data into R. The Quandl package has the Quandl() function that will also accomplish this with a little difference in syntax and output though. To know more about Quandl(), display the R documentation by using ?function. You can also search in the Internet for the R cheat sheet, a document that contains all the relevant information of functions along with examples.

  1. Now, we will work with historical data of the Bitcoin cryptocurrency. Using getSymbols(), download the daily prices of Bitcoin in USD (BTC-USD) from January 1, 2017 to date from Yahoo Finance:
getSymbols("BTC-USD", from="2017-01-01", src="yahoo", periodicity="daily")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## Warning: BTC-USD contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "BTC-USD"

BTC-USD is the ticker name for this series in Yahoo Finance. It should always be specified as a character vector (inside ""). The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.

  1. Now you have a data set with daily prices of BITCOIN from 2017 to date. You can check the content of the data set with View(). This will take you to a different tab showing the data as a table.

Return to your R script. You can list the FIRST 5 rows of the data set by using head():

head(`BTC-USD`,5)
##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2017-01-01      963.658      1003.08     958.699       998.325      147775008
## 2017-01-02      998.617      1031.39     996.702      1021.750      222184992
## 2017-01-03     1021.600      1044.08    1021.600      1043.840      185168000
## 2017-01-04     1044.400      1159.42    1044.400      1154.730      344945984
## 2017-01-05     1156.730      1191.10     910.417      1013.380      510199008
##            BTC-USD.Adjusted
## 2017-01-01          998.325
## 2017-01-02         1021.750
## 2017-01-03         1043.840
## 2017-01-04         1154.730
## 2017-01-05         1013.380

Also, you can list the LAST 5 rows of the data set. Note that you can change number of rows you want to display.

tail(`BTC-USD`, 5)
##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2021-02-04     37475.11     38592.18    36317.50      36926.07    68838074392
## 2021-02-05     36931.55     38225.91    36658.76      38144.31    58598066402
## 2021-02-06     38138.39     40846.55    38138.39      39266.01    71326033653
## 2021-02-07     39250.19     39621.84    37446.15      38903.44    65500641143
## 2021-02-08     39169.54     44112.34    38080.92      43014.96    88248238080
##            BTC-USD.Adjusted
## 2021-02-04         36926.07
## 2021-02-05         38144.31
## 2021-02-06         39266.01
## 2021-02-07         38903.44
## 2021-02-08         43014.96

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close price to calculate daily returns.

  1. Save your do-file as W1-YourName.R. Go to the File menu and select Save As. Now you can execute all lines of code by clicking the Run button (top-right of the R Studio window).

  2. Visualize how Bitcoin has been valued over time. Use the following command to display the graph and save it:

plot(`BTC-USD`)

The same information can also be plotted by using a different function:

chartSeries(`BTC-USD`, theme=("white"))

As you can see, there is more than one way of doing the same thing in R.

  1. In Finance, when managing daily data it is very common to have gaps in the series. What does this mean? It means that the contains some missing days. For example, for stock series there is no data for weekends or holidays. However, R deals with gaps because it recognizes that we are working with a time series object. Thus, we have a time variable with NO GAPS, which avoids problems when computing returns. However, R does not deal automatically with empty values (called NA’s). It is a good idea to have a data set free of NA’s.
`BTC-USD` <- na.omit(`BTC-USD`)
  1. Create a new object that contains the adjusted prices of BTC-USD:
prices <- Ad(`BTC-USD`)
# Ad() extract the adjusted prices of an object You can also extract open (Op()), high (Hi()), etc.
  1. Rename the column:
colnames(prices) <- "adj_price"
  1. Generate a new column in the prices object that contains simple daily returns:
# First we create a column that will shift the prices of BTC one period forward. 
# That is, it will apply a lag 1 (backshift operation). n=2 will shift the data two periods. 
# This allows to have the data for time t and t-1 in the same row.
prices$lag1 <- lag(prices$adj_price, n=1)
# Then, we will follow the formula for simple returns (P(t)/P(t-1))-1
prices$R_a <- prices$adj_price / prices$lag1 -1  
# Another way to do this is:
prices$R_b <- prices$adj_price / lag(prices$adj_price, n=1) -1
# Notice the results are the same for both ways

As you can see, the dollar sign operator ($) is used to refer to a specific column of an object (matrix, data frame, xts) as well as to create new columns, while assignment in R is denoted by the <- operator.

  1. Calculate daily gross return:
prices$GrossR <- 1 + prices$R_a
  1. Create a column to calculate daily continuously compounded returns using prices
prices$r_b <- log(prices$adj_price / prices$lag1)
# or
prices$r <- diff(log(prices$adj_price))
# the result must be the same for both columns
  1. Create a column to calculate daily continuously compounded returns using simple returns (column R). Call it r_from_R.
prices$r_from_R <- log(1 + prices$R_a)
  1. Create a column to calculate simple daily returns using continuously compounded returns. Call it R_from_r
prices$R_from_r <- exp(prices$r) - 1
  1. Check the first 10 rows of the return variables. We used 2 methods to calculate both R and r, but they must have the same values:
head(prices, 5)
##            adj_price     lag1         R_a         R_b    GrossR         r_b
## 2017-01-01   998.325       NA          NA          NA        NA          NA
## 2017-01-02  1021.750  998.325  0.02346429  0.02346429 1.0234643  0.02319324
## 2017-01-03  1043.840 1021.750  0.02161974  0.02161974 1.0216197  0.02138934
## 2017-01-04  1154.730 1043.840  0.10623277  0.10623277 1.1062328  0.10096034
## 2017-01-05  1013.380 1154.730 -0.12240955 -0.12240955 0.8775905 -0.13057525
##                      r    r_from_R    R_from_r
## 2017-01-01          NA          NA          NA
## 2017-01-02  0.02319324  0.02319324  0.02346429
## 2017-01-03  0.02138934  0.02138934  0.02161974
## 2017-01-04  0.10096034  0.10096034  0.10623277
## 2017-01-05 -0.13057525 -0.13057525 -0.12240955

You can see that R and R_from_r columns have the same information, and r and r_from_R also have the same information.

  1. Visualize the daily returns over time:
plot(prices$R_a)

You can observe increasing volatility at the end of 2017, beginning of 2018, and at the end of the first quarter of 2020. For now on, we can define volatility as how disperse the returns move up and down in a short period of time. Later, we will use different measures of volatility such as standard deviation.

  1. Calculate the holding-period return using initial and end prices (HPR1).
# First, we need to define the number of rows of the prices dataset:
n <- as.numeric(nrow(prices))
  # Use as.numeric() to keep the format as a number (float)
# Then, the first and last prices of the dataset must be defined:
price_0 <- as.numeric(prices$adj_price[1])
price_n <- as.numeric(prices$adj_price[n])
    # Here we are indicating R to save the n observation of a specific column of the dataset
# Now we can compute the HPR. For this, we will create a different object:
HPR1 <- (price_n / price_0) - 1
print(paste("HPR1 = " ,HPR1))
## [1] "HPR1 =  42.0871274524373"

Answer the following question: HOW CAN YOU INTERPRET THE RESULT OF HPR1?

  1. Calculate the holding-period return using continuously compounded returns:
sumr <- sum(prices$r, na.rm = TRUE)
HPR2 <- exp(sumr) - 1

Even though HPR1 and HPR2 were calculated using different methodologies, they must have the same value:

print(paste(HPR1, HPR2))
## [1] "42.0871274524373 42.0871274524373"

To see why both calculations of the HPR works, you can read the Note “Basics of Return and Risk”.

3 Descriptive Statistics

Start by clearing your environment (this will erase all the variables and objects that are in your environment):

rm(list = ls())

Use the getSymbols command to download monthly data from Yahoo Finance for Starbucks from January 2008 to date. Type the following command:

getSymbols(Symbols = "SBUX", from="2008-01-01", periodicity="monthly", src = "yahoo")
## [1] "SBUX"

The data is being pulled from Yahoo.com; the variables of this data set are: Open, High, Low, Close, Volume, Adjusted (prices).

  1. Calculate continuously compounded (cc) monthly returns
returns.df <- as.data.frame(diff(log(Ad(SBUX))))
# change the name of the column in ccreturns.df
colnames(returns.df) <- "r_SBUX"

A data frame is a basic object in R. It is a data structure of R that stores tabular data (rows and columns). Data frames look like matrices but data frames can store different types of objects in different columns. On the other hand, matrices can store only one kind of data. We transform the xts object (created by getSymbols) to a data frame in order to make manipulation easier.

  1. Calculate the mean, standard deviation and variance of continuously compounded (cc) monthly returns using the summary command:
summary(returns.df)
##      r_SBUX        
##  Min.   :-0.38548  
##  1st Qu.:-0.02706  
##  Median : 0.02003  
##  Mean   : 0.01657  
##  3rd Qu.: 0.05767  
##  Max.   : 0.26354  
##  NA's   :1

As you can see, summary() does not show standard deviation or variance. You can also try the table.Stats() function. However, you must install and load the Performance Analytics package fist since table.Stats() belongs to such package.

install.packages("PerformanceAnalytics")
library(PerformanceAnalytics)
table.Stats(returns.df$r_SBUX)
##                         
## Observations    157.0000
## NAs               1.0000
## Minimum          -0.3855
## Quartile 1       -0.0271
## Median            0.0200
## Arithmetic Mean   0.0166
## Geometric Mean    0.0135
## Quartile 3        0.0577
## Maximum           0.2635
## SE Mean           0.0061
## LCL Mean (0.95)   0.0044
## UCL Mean (0.95)   0.0287
## Variance          0.0059
## Stdev             0.0770
## Skewness         -0.6972
## Kurtosis          4.5868

This function will show several statistical measures and indicators that may be useful. You can also try to obtain the specific measures you were asked by using the following functions:

mean_r_SBUX <- mean(returns.df$r_SBUX, na.rm=TRUE) # arithmetic mean
sd_r_SBUX <- sd(returns.df$r_SBUX, na.rm=TRUE) # standard deviation
var_r_SBUX <- var(returns.df$r_SBUX, na.rm=TRUE) # variance
# Note that the na.rm argument is set to TRUE. This means that NA values will be removed.
# The variables are kept in the environment, so we have to print them to see them in console.
print(paste("Mean =", mean_r_SBUX))
## [1] "Mean = 0.0165653064244857"
print(paste("Standard deviation = ", sd_r_SBUX))
## [1] "Standard deviation =  0.0770462054869134"
print(paste("Variance = ", var_r_SBUX))
## [1] "Variance =  0.00593611777993169"
  1. Calculate the mean, standard deviation and variance of simple monthly returns:
# First, calculate simple returns as before
returns.df$R_SBUX <- SBUX$SBUX.Adjusted / lag(SBUX$SBUX.Adjusted, n=1) - 1

# The, apply the previous functions
table.Stats(returns.df$R_SBUX)
##                         
## Observations    157.0000
## NAs               1.0000
## Minimum          -0.3199
## Quartile 1       -0.0267
## Median            0.0202
## Arithmetic Mean   0.0197
## Geometric Mean    0.0167
## Quartile 3        0.0594
## Maximum           0.3015
## SE Mean           0.0062
## LCL Mean (0.95)   0.0075
## UCL Mean (0.95)   0.0318
## Variance          0.0059
## Stdev             0.0771
## Skewness         -0.0577
## Kurtosis          3.0209
mean(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.01965889
sd(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.07710913
var(returns.df$R_SBUX, na.rm=TRUE)
##             [,1]
## [1,] 0.005945818
  1. In your R Script, as comments (using # before your responses), answer the following in CAPITAL LETTERS: DO YOU SEE A DIFFERENCE BETWEEN THE SIMPLE AND CONTINUOUSLY COMPOUNDED RETURNS? EXPLAIN.

  2. Do a histogram of returns

hist(returns.df$r_SBUX, main="Histogram of SBUX monthly returns", 
     xlab="Continuously Compounded returns", col="dark green")

  1. With the descriptive statistics you computed in (1) and with this histogram, INTERPRET WITH YOUR OWN WORDS THE RESULTS OF THE MEAN AND STANDARD DEVIATION OF CONTINUOUSLY COMPOUNDED RETURNS.

  2. With the mean, and standard deviation of monthly returns of r, create (simulate) a random variable with that mean and variance for the same time period. Use the function rnorm for this.

rSBUX_sim <- rnorm(n=150, mean = mean_r_SBUX, sd=sd_r_SBUX)
# We will use the same number of observations as returns.df

To do a histogram for the simulated returns:

# First, omit NA's. This will make your analysis more accurate 
    # and coding easier since many functions throw errors while working with NA's
rSBUX <- na.omit(returns.df$r_SBUX)

# Calculate the histograms and store their information in variables (don't plot yet)
hist_sim_SBUX<- hist(rSBUX_sim,plot = FALSE)
hist_SBUX <- hist(rSBUX,plot = FALSE)

# Calculate the range of the graph
xlim <- range(hist_SBUX$breaks,hist_sim_SBUX$breaks)
ylim <- range(0,hist_SBUX$density,
              hist_sim_SBUX$density)

# Plot the first histogram
plot(hist_sim_SBUX,xlim = xlim, ylim = ylim,
     col = rgb(1,0,0,0.4),xlab = 'Lengths',
     freq = FALSE, ## relative, not absolute frequency
     main = 'Distribution of simulated and real Starbucks Returns')

# Plot the second histogram on top of the 1st one
opar <- par(new = FALSE)
plot(hist_SBUX,xlim = xlim, ylim = ylim,
     xaxt = 'n', yaxt = 'n', ## don't add axes
     col = rgb(0,0,1,0.4), add = TRUE,
     freq = FALSE) ## relative, not absolute frequency

# Add a legend in the corner
legend('topleft',c('Simulated Returns','Real Returns'),
       fill = rgb(1:0,0,0:1,0.4), bty = 'n')

par(opar)

As you can see, the peach color represents the normally simulated returns, while the light purple bars represent the real returns of Starbucks. The dark purple color appears when both real and simulated returns meet.

  1. Respond to the following question: WHAT DIFFERENCE DO YOU SEE IN THE HISTOGRAMS? HOW REAL RETURNS ARE DIFFERENT FROM THE THEORETICAL NORMAL DISTRIBUTION OF RETURNS? BRIEFLY EXPLAIN.

  2. Assuming that the monthly returns of BITCOIN follow a normal distribution, WHAT WOULD BE THE 95% CONFIDENCE INTERVAL? WHAT IS THE INTERPRETATION OF THIS INTERVAL? EXPLAIN.

4 Quiz 1 and W1 submission

Go to Canvas and respond Quiz 1 about Basics of Return and Risk. You will be able to try this quiz up to 3 times. Questions in this Quiz are related to concepts of the readings related to this Workshop. The grade of this Workshop will be the following:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work is a copy-paste from other workshops.
  • Not submitted (0%)

Remember that you have to submit your do-file through Canvas BEFORE NEXT CLASS.