Abstract
This is an INDIVIDUAL workshop. In this workshop, besides learning the basics of R, we will work on calculation of returns and basic statistical concepts of Finance. In addition, we will learn to download financial data from public data sources such as Yahoo Finance and we will work with the properties of simple and continuously compounded returns. You have to submit your R file through Canvas.You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.
Now you are ready to continue writing your first R Notebook.
You can start writing your own notes/explanations we cover in this workshop. When you need to write lines of R Code, you need to click Insert at the top of the RStudio Window and select R. Immediately a chunk of R code will be set up to start writing your R code. You can execute this piece of code by clicking in the play button (green triangle).
Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.
We start clearing our R environment:
# Example of a Windows folder:
setwd("C:/Users/Alberto/Dropbox/202011/EC2003/Workshops/W1")
# Example of a Mac folder:
setwd("/Users/Kelly/Desktop/Financial/programming")
The getSymbols() function enables its user to download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, Google Finance, FRED and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables around the world.
## starting httpd help server ... done
getSymbols() from the quantmod package is not the only way to download financial data into R. The Quandl package has the Quandl() function that will also accomplish this with a little difference in syntax and output though. To know more about Quandl(), display the R documentation by using ?function. You can also search in the Internet for the R cheat sheet, a document that contains all the relevant information of functions along with examples.
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## Warning: BTC-USD contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "BTC-USD"
BTC-USD is the ticker name for this series in Yahoo Finance. It should always be specified as a character vector (inside ""). The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.
Return to your R script. You can list the FIRST 5 rows of the data set by using head():
## BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2017-01-01 963.658 1003.08 958.699 998.325 147775008
## 2017-01-02 998.617 1031.39 996.702 1021.750 222184992
## 2017-01-03 1021.600 1044.08 1021.600 1043.840 185168000
## 2017-01-04 1044.400 1159.42 1044.400 1154.730 344945984
## 2017-01-05 1156.730 1191.10 910.417 1013.380 510199008
## BTC-USD.Adjusted
## 2017-01-01 998.325
## 2017-01-02 1021.750
## 2017-01-03 1043.840
## 2017-01-04 1154.730
## 2017-01-05 1013.380
Also, you can list the LAST 5 rows of the data set. Note that you can change number of rows you want to display.
## BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2021-02-04 37475.11 38592.18 36317.50 36926.07 68838074392
## 2021-02-05 36931.55 38225.91 36658.76 38144.31 58598066402
## 2021-02-06 38138.39 40846.55 38138.39 39266.01 71326033653
## 2021-02-07 39250.19 39621.84 37446.15 38903.44 65500641143
## 2021-02-08 39169.54 44112.34 38080.92 43014.96 88248238080
## BTC-USD.Adjusted
## 2021-02-04 36926.07
## 2021-02-05 38144.31
## 2021-02-06 39266.01
## 2021-02-07 38903.44
## 2021-02-08 43014.96
For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close price to calculate daily returns.
Save your do-file as W1-YourName.R. Go to the File menu and select Save As. Now you can execute all lines of code by clicking the Run button (top-right of the R Studio window).
Visualize how Bitcoin has been valued over time. Use the following command to display the graph and save it:
The same information can also be plotted by using a different function:
As you can see, there is more than one way of doing the same thing in R.
prices <- Ad(`BTC-USD`)
# Ad() extract the adjusted prices of an object You can also extract open (Op()), high (Hi()), etc.
# First we create a column that will shift the prices of BTC one period forward.
# That is, it will apply a lag 1 (backshift operation). n=2 will shift the data two periods.
# This allows to have the data for time t and t-1 in the same row.
prices$lag1 <- lag(prices$adj_price, n=1)
# Then, we will follow the formula for simple returns (P(t)/P(t-1))-1
prices$R_a <- prices$adj_price / prices$lag1 -1
# Another way to do this is:
prices$R_b <- prices$adj_price / lag(prices$adj_price, n=1) -1
# Notice the results are the same for both ways
As you can see, the dollar sign operator ($) is used to refer to a specific column of an object (matrix, data frame, xts) as well as to create new columns, while assignment in R is denoted by the <- operator.
prices$r_b <- log(prices$adj_price / prices$lag1)
# or
prices$r <- diff(log(prices$adj_price))
# the result must be the same for both columns
## adj_price lag1 R_a R_b GrossR r_b
## 2017-01-01 998.325 NA NA NA NA NA
## 2017-01-02 1021.750 998.325 0.02346429 0.02346429 1.0234643 0.02319324
## 2017-01-03 1043.840 1021.750 0.02161974 0.02161974 1.0216197 0.02138934
## 2017-01-04 1154.730 1043.840 0.10623277 0.10623277 1.1062328 0.10096034
## 2017-01-05 1013.380 1154.730 -0.12240955 -0.12240955 0.8775905 -0.13057525
## r r_from_R R_from_r
## 2017-01-01 NA NA NA
## 2017-01-02 0.02319324 0.02319324 0.02346429
## 2017-01-03 0.02138934 0.02138934 0.02161974
## 2017-01-04 0.10096034 0.10096034 0.10623277
## 2017-01-05 -0.13057525 -0.13057525 -0.12240955
You can see that R and R_from_r columns have the same information, and r and r_from_R also have the same information.
You can observe increasing volatility at the end of 2017, beginning of 2018, and at the end of the first quarter of 2020. For now on, we can define volatility as how disperse the returns move up and down in a short period of time. Later, we will use different measures of volatility such as standard deviation.
# First, we need to define the number of rows of the prices dataset:
n <- as.numeric(nrow(prices))
# Use as.numeric() to keep the format as a number (float)
# Then, the first and last prices of the dataset must be defined:
price_0 <- as.numeric(prices$adj_price[1])
price_n <- as.numeric(prices$adj_price[n])
# Here we are indicating R to save the n observation of a specific column of the dataset
# Now we can compute the HPR. For this, we will create a different object:
HPR1 <- (price_n / price_0) - 1
print(paste("HPR1 = " ,HPR1))
## [1] "HPR1 = 42.0871274524373"
Answer the following question: HOW CAN YOU INTERPRET THE RESULT OF HPR1?
Even though HPR1 and HPR2 were calculated using different methodologies, they must have the same value:
## [1] "42.0871274524373 42.0871274524373"
To see why both calculations of the HPR works, you can read the Note “Basics of Return and Risk”.
Start by clearing your environment (this will erase all the variables and objects that are in your environment):
Use the getSymbols command to download monthly data from Yahoo Finance for Starbucks from January 2008 to date. Type the following command:
## [1] "SBUX"
The data is being pulled from Yahoo.com; the variables of this data set are: Open, High, Low, Close, Volume, Adjusted (prices).
returns.df <- as.data.frame(diff(log(Ad(SBUX))))
# change the name of the column in ccreturns.df
colnames(returns.df) <- "r_SBUX"
A data frame is a basic object in R. It is a data structure of R that stores tabular data (rows and columns). Data frames look like matrices but data frames can store different types of objects in different columns. On the other hand, matrices can store only one kind of data. We transform the xts object (created by getSymbols) to a data frame in order to make manipulation easier.
## r_SBUX
## Min. :-0.38548
## 1st Qu.:-0.02706
## Median : 0.02003
## Mean : 0.01657
## 3rd Qu.: 0.05767
## Max. : 0.26354
## NA's :1
As you can see, summary() does not show standard deviation or variance. You can also try the table.Stats() function. However, you must install and load the Performance Analytics package fist since table.Stats() belongs to such package.
##
## Observations 157.0000
## NAs 1.0000
## Minimum -0.3855
## Quartile 1 -0.0271
## Median 0.0200
## Arithmetic Mean 0.0166
## Geometric Mean 0.0135
## Quartile 3 0.0577
## Maximum 0.2635
## SE Mean 0.0061
## LCL Mean (0.95) 0.0044
## UCL Mean (0.95) 0.0287
## Variance 0.0059
## Stdev 0.0770
## Skewness -0.6972
## Kurtosis 4.5868
This function will show several statistical measures and indicators that may be useful. You can also try to obtain the specific measures you were asked by using the following functions:
mean_r_SBUX <- mean(returns.df$r_SBUX, na.rm=TRUE) # arithmetic mean
sd_r_SBUX <- sd(returns.df$r_SBUX, na.rm=TRUE) # standard deviation
var_r_SBUX <- var(returns.df$r_SBUX, na.rm=TRUE) # variance
# Note that the na.rm argument is set to TRUE. This means that NA values will be removed.
# The variables are kept in the environment, so we have to print them to see them in console.
print(paste("Mean =", mean_r_SBUX))
## [1] "Mean = 0.0165653064244857"
## [1] "Standard deviation = 0.0770462054869134"
## [1] "Variance = 0.00593611777993169"
# First, calculate simple returns as before
returns.df$R_SBUX <- SBUX$SBUX.Adjusted / lag(SBUX$SBUX.Adjusted, n=1) - 1
# The, apply the previous functions
table.Stats(returns.df$R_SBUX)
##
## Observations 157.0000
## NAs 1.0000
## Minimum -0.3199
## Quartile 1 -0.0267
## Median 0.0202
## Arithmetic Mean 0.0197
## Geometric Mean 0.0167
## Quartile 3 0.0594
## Maximum 0.3015
## SE Mean 0.0062
## LCL Mean (0.95) 0.0075
## UCL Mean (0.95) 0.0318
## Variance 0.0059
## Stdev 0.0771
## Skewness -0.0577
## Kurtosis 3.0209
## [1] 0.01965889
## [1] 0.07710913
## [,1]
## [1,] 0.005945818
In your R Script, as comments (using # before your responses), answer the following in CAPITAL LETTERS: DO YOU SEE A DIFFERENCE BETWEEN THE SIMPLE AND CONTINUOUSLY COMPOUNDED RETURNS? EXPLAIN.
Do a histogram of returns
hist(returns.df$r_SBUX, main="Histogram of SBUX monthly returns",
xlab="Continuously Compounded returns", col="dark green")
With the descriptive statistics you computed in (1) and with this histogram, INTERPRET WITH YOUR OWN WORDS THE RESULTS OF THE MEAN AND STANDARD DEVIATION OF CONTINUOUSLY COMPOUNDED RETURNS.
With the mean, and standard deviation of monthly returns of r, create (simulate) a random variable with that mean and variance for the same time period. Use the function rnorm for this.
rSBUX_sim <- rnorm(n=150, mean = mean_r_SBUX, sd=sd_r_SBUX)
# We will use the same number of observations as returns.df
To do a histogram for the simulated returns:
# First, omit NA's. This will make your analysis more accurate
# and coding easier since many functions throw errors while working with NA's
rSBUX <- na.omit(returns.df$r_SBUX)
# Calculate the histograms and store their information in variables (don't plot yet)
hist_sim_SBUX<- hist(rSBUX_sim,plot = FALSE)
hist_SBUX <- hist(rSBUX,plot = FALSE)
# Calculate the range of the graph
xlim <- range(hist_SBUX$breaks,hist_sim_SBUX$breaks)
ylim <- range(0,hist_SBUX$density,
hist_sim_SBUX$density)
# Plot the first histogram
plot(hist_sim_SBUX,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of simulated and real Starbucks Returns')
# Plot the second histogram on top of the 1st one
opar <- par(new = FALSE)
plot(hist_SBUX,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
# Add a legend in the corner
legend('topleft',c('Simulated Returns','Real Returns'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n')
As you can see, the peach color represents the normally simulated returns, while the light purple bars represent the real returns of Starbucks. The dark purple color appears when both real and simulated returns meet.
Respond to the following question: WHAT DIFFERENCE DO YOU SEE IN THE HISTOGRAMS? HOW REAL RETURNS ARE DIFFERENT FROM THE THEORETICAL NORMAL DISTRIBUTION OF RETURNS? BRIEFLY EXPLAIN.
Assuming that the monthly returns of BITCOIN follow a normal distribution, WHAT WOULD BE THE 95% CONFIDENCE INTERVAL? WHAT IS THE INTERPRETATION OF THIS INTERVAL? EXPLAIN.
Go to Canvas and respond Quiz 1 about Basics of Return and Risk. You will be able to try this quiz up to 3 times. Questions in this Quiz are related to concepts of the readings related to this Workshop. The grade of this Workshop will be the following:
Remember that you have to submit your do-file through Canvas BEFORE NEXT CLASS.