Abstract
In this workshop we will use real financial time series data to learn about a) data management such as return calculation and dataset merging, b) data visualization, c) and also we will start learning how to estimate expected stock return and expected portfolio return and risk using matrix algebra.You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.
You have to solve CHALLENGE exercises.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.
In this section I will work on the following: a) Data management of financial data, b) return calculations, c) descriptive statistics for finance, d) visualization of financial data, and e) Portfolio formation and basic estimations.
We will use the quantmod package to import real online data from Yahoo Finance. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the downloaded data from the Internet. :
library(quantmod)
The getSymbols() function download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, FRED database and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables of most of the countries.
You can type ?function in the console or the R Script and run it to know more about the syntax of any function. This will display the R documentation of the function in the bottom-right pane:
?getSymbols
Now, we will work with historical data of the Bitcoin cryptocurrency and the TESLA stock. We download daily quotations of these instruments from January 1, 2019 to date from Yahoo Finance:
getSymbols(c("BTC-USD","TSLA"), from="2019-01-01", src="yahoo", periodicity="daily")
## [1] "BTC-USD" "TSLA"
This function will create an xts-zoo R object for each ticker. Each object has the corresponding historical daily prices. xts stands for extensible time-series. An xts-zoo object is designed to easily manipulate time series data.
BTC-USD and TSLA are the ticker names in Yahoo Finance. The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.
You can check the content of any of these dataset with View(). When tickers have special characters, we have to make reference to the object with simple quotes (``):
You can list the FIRST 5 rows of a dataset by using head():
head(`BTC-USD`,5)
## BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2019-01-01 3746.713 3850.914 3707.231 3843.520 4324200990
## 2019-01-02 3849.216 3947.981 3817.409 3943.409 5244856836
## 2019-01-03 3931.049 3935.685 3826.223 3836.741 4530215219
## 2019-01-04 3832.040 3865.935 3783.854 3857.718 4847965467
## 2019-01-05 3851.974 3904.903 3836.900 3845.195 5137609824
## BTC-USD.Adjusted
## 2019-01-01 3843.520
## 2019-01-02 3943.409
## 2019-01-03 3836.741
## 2019-01-04 3857.718
## 2019-01-05 3845.195
Also, you can list the LAST 5 rows of a data set. Note that you can change number of rows you want to display.
tail(`BTC-USD`, 5)
## BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2022-08-28 20041.04 20139.05 19616.81 19616.81 24366810591
## 2022-08-29 19615.15 20357.46 19600.79 20297.99 32637854078
## 2022-08-30 20298.61 20542.64 19617.64 19796.81 34483360283
## 2022-08-31 19799.58 20420.99 19799.58 20049.76 33225232872
## 2022-09-01 20068.56 20192.18 19867.73 19955.56 30635261952
## BTC-USD.Adjusted
## 2022-08-28 19616.81
## 2022-08-29 20297.99
## 2022-08-30 19796.81
## 2022-08-31 20049.76
## 2022-09-01 19955.56
For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close of adjusted price to calculate daily returns.
Let’s see some of the benefits of using xts-zoo objects. We can, for
example, select columns using any of the following functions, where
x
represents a generic xts
zoo
object:
Op(x)
: Extract the Opening prices of the period.Hi(x)
: Extract the Highest price of the period.Lo(x)
: Extract the Lowest price of the period.Cl(x)
: Extract the closing prices of the period.Vo(x)
: Extract the volume traded of the period.Ad(x)
: Extract the Adjusted prices of the period.We can use the merge()
function to create a consolidated
dataset of one or more xts-zoo objects:
<- merge(`BTC-USD`, TSLA)
prices # I can select only the adjusted prices in order to calculate returns:
<-Ad(prices) adjprices
Now we have an xts-zoo objects with 2 columns. I can change the names of the columns with simple names:
names(adjprices)<-c("bitcoin","tesla")
Now I can make reference to the adjusted prices using these names.
In Finance, when managing daily data it is very common to have gaps in the series. What does this mean? It means that the contains some missing days. For example, for stock series there is no data for weekends or holidays. However, R deals with gaps because it recognizes that we are working with a time series object. Thus, we have a time variable with NO GAPS, which avoids problems when computing returns. However, R does not deal automatically with empty values (called NA’s). It is a good idea to have a data set free of NA’s. So, I can use the function na.omit:
<- na.omit(adjprices) adjprices
We can create xts-zoo objects for simple and continuously compounded returns of both instruments:
# Calculating continuously compounded daily returns:
<- diff(log(adjprices))
ccreturns # Calculating simple daily returns:
<- adjprices / lag(adjprices,n=1) - 1
returns # We can also calculate simple returns using the diff function:
<- diff(adjprices) / lag(adjprices,n=1) returns2
The diff function works with xts to calculate the difference between the value of one period minus the previous one. The lag function gets the previous value of the time series.
We can calculate holding returns for any time period for each instrument (HPR). For example, if we want to calculate the whole period return from the first day of the series to the most recent one, we can do the following:
<- 100* as.vector(last(adjprices)) / as.vector(first(adjprices)) - 1
HPR1 HPR1
## [1] 507.4373 1332.0806
cat("The holding period return for Bitcoin is: ", HPR1[1], "%")
## The holding period return for Bitcoin is: 507.4373 %
cat("The holding period return for Bitcoin is: ", HPR1[2], "%")
## The holding period return for Bitcoin is: 1332.081 %
We could also use the continuously compounded returns to calculate the same holding simple returns:
<-100*exp(colSums(ccreturns,na.rm=TRUE)) - 1
HPR2 HPR2
## bitcoin tesla
## 507.4373 1332.0806
One of the advantages of the quantmod
package is that it
has some built-in data visualization capabilities. For example, the
chartSeries
function is a plotting tool designed to create
standard financial charts given a time series object. The
chartSeries
function also includes some arguments that
allows the user to modify the cosmetics of the plot such as the
theme
argument.
Let’s see the performance of Bitcoin prices:
chartSeries(`BTC-USD`, theme = ("white"))
Let’s see the
performance of TESLA prices:
chartSeries(TSLA, theme = ("black"))
We can see exponential growth of both, the Bitcoin prices and Tesla prices for the last days. If you had invested in Bitcoin or Tesla in July 2020 and has sold your position early January 2021, you would have multiplied your investment by around 4 times (300% period return)!
We can visualize the daily returns over time:
plot(returns$bitcoin)
plot(returns$tesla)
With this plot we can appreciate the daily volatility of each instrument over time. Volatility is a measure of how disperse the returns move up and down from its mean; it is basically the standard deviation of returns.
We will use the PerformanceAnalytics and related packages for this section:
# Load the packages:
library(tidyverse)
library(PerformanceAnalytics)
library(highcharter)
We will do an exercise of a simple portfolio composed by 5 US stocks: Tesla, Microsoft, Nextera, WalMart and Pfizer. You can change the tickers of the stocks as you wish as long as Yahoo Finance has data for your tickers. I selected stocks from the industries: high-tech, clean energies, pharmaceutical, and retail.
With the following code we will automatically download data from the 5 stocks, merge the data and calculate returns:
<- c("TSLA","MSFT","NEE","WMT","PFE")
symbols
<- getSymbols(symbols,src = 'yahoo',
prices periodicity = "monthly",
from = "2018-01-01",
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
If you see I am doing several data management process in sequential process. The %>% is used to indicate the separation of each data management process. The previous code does the following:
Download price data of the 5 stocks
Apply (map) the adjusted function to all xts-zoo datasets in order to get the adjusted stock prices
Do the merge of all the xts-zoo objects into one object
Finally rename the columns of the integrated object, and return
the object as prices
.
We calculate returns of all stocks using the Return.calculate function:
<-Return.calculate(prices) %>%
Returnsna.omit()
The na.omit function was also applied to clean the dataset and drop the rows with NA values.
We calculate the continuous compounded return (also called log returns) of all stocks:
<-Return.calculate(prices,method = "log") %>%
returnsna.omit()
# We could calculate this using the diff and log functions:
<-na.omit(diff(log(prices)))
returns2# The returns and returns2 objects will have exactly the same returns
We calculate descriptive statistics for the returns of all stocks:
table.Stats(returns)
## TSLA MSFT NEE WMT PFE
## Observations 55.0000 55.0000 55.0000 55.0000 55.0000
## NAs 0.0000 0.0000 0.0000 0.0000 0.0000
## Minimum -1.2192 -0.1052 -0.1782 -0.1734 -0.1453
## Quartile 1 -0.1095 -0.0215 -0.0031 -0.0284 -0.0356
## Median 0.0272 0.0223 0.0192 0.0108 -0.0034
## Arithmetic Mean 0.0247 0.0194 0.0157 0.0055 0.0077
## Geometric Mean NaN 0.0177 0.0138 0.0038 0.0050
## Quartile 3 0.1870 0.0565 0.0528 0.0431 0.0652
## Maximum 0.5547 0.1624 0.1614 0.0969 0.2057
## SE Mean 0.0342 0.0079 0.0083 0.0078 0.0101
## LCL Mean (0.95) -0.0440 0.0036 -0.0009 -0.0101 -0.0125
## UCL Mean (0.95) 0.0934 0.0353 0.0324 0.0211 0.0280
## Variance 0.0645 0.0034 0.0038 0.0033 0.0056
## Stdev 0.2540 0.0586 0.0617 0.0577 0.0750
## Skewness -1.8102 -0.0738 -0.9233 -0.8602 0.3509
## Kurtosis 8.8040 -0.4505 1.9539 1.2442 -0.0818
We can see arithmetic, geometric mean, median of returns. Also we can see risk measures such as standard deviation, variance. Finally we can see skewness and kurtosis, which are important financial measures to understand the data.
We can do a Box plot of the returns to better appreciate historical median return and risk by looking at the median, the quartiles Q1 and Q3, volatility and extreme values of these returns:
chart.Boxplot(returns)
It is easy to see that Tesla is the stock with the highest risk. The red circles show the mean, the mid line is the median (50 percentile), the boxes include the 50% of the data from the Q1 (25 percentile) to the Q3 (75 percentile). The dots are considered extreme values for in the context of its own distribution.
Now we can start evaluating the performance over time for each stock by looking at how much $ we would have made if we had invested $1.00 in each stock at the beginning of the time periods:
charts.PerformanceSummary(Returns,
main = "Performance of $1.00 over time",
wealth.index = TRUE)
Since Tesla has had an extraordinary performance for the last months, it is hard to appreciate the performance of the rest of the stocks. So, we can drop Tesla from the plot:
charts.PerformanceSummary(Returns[,2:5],
main = "Performance of $1.00 over time",
wealth.index = TRUE)
I start creating the weights of 2 portfolios. One will be an equally-weighted portfolio, and the other will be an aggressive portfolio assigning high weights to risky stocks, and low weights to conservative stocks.
I start creating a vector of weights for the equally weighted portfolio:
= rep(0.2,5)
w_ew # The rep function repeats a value n times
Now I create a vector of weights for the aggressive portfolio. I will assign the following weights:
Tesla: 40% Microsoft: 30% Nextera: 20% WalMart:0% Pfizer: 10%
= c(0.4,0.3,0.2,0,0.1) w_aggressive
We calculate the historical return of the equally-weighted portfolio using the Return.portfolio function:
<-
portfolio_returns_ew Return.portfolio(Returns,
weights = w_ew) %>%
`colnames<-`("returns")
We calculate the historical return of the aggressive portfolio using the Return.portfolio function:
<-
portfolio_returns_ag Return.portfolio(Returns,
weights = w_aggressive) %>%
`colnames<-`("returns")
With the portfolio returns I can plot a performance chart of each portfolio:
charts.PerformanceSummary(portfolio_returns_ew,
main = "Equally-weighted Portfolio")
charts.PerformanceSummary(portfolio_returns_ag,
main = "Aggressive Portfolio Performance")
We finally do a dynamic plot of historical monthly returns of each portfolio using the highchart function from the highcharter package:
highchart(type = "stock") %>%
hc_title(text = "Equally-weighted Portfolio") %>%
hc_add_series(portfolio_returns_ew$returns,
name = "Monthly Returns",
color = "cornflowerblue") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_navigator(enabled = FALSE) %>%
hc_scrollbar(enabled = FALSE) %>%
hc_legend(enabled = TRUE) %>%
hc_exporting(enabled = TRUE)
We can interact with the plot to zoom in and out, and selecting time periods.
We do the same for the aggressive portfolio:
highchart(type = "stock") %>%
hc_title(text = "Aggressive Portfolio") %>%
hc_add_series(portfolio_returns_ag$returns,
name = "Monthly Returns",
color = "cornflowerblue") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_navigator(enabled = FALSE) %>%
hc_scrollbar(enabled = FALSE) %>%
hc_legend(enabled = TRUE) %>%
hc_exporting(enabled = TRUE)
Regardless the approach you use to download and manipulate the source data, once that your data frame of monthly returns is ready for analysis you will be in a good position to create the Variance-covariance matrix which is needed to compute the expected risk of the portfolio.
The first step is to transform the data frame into a matrix using the
as.matrix
function. We save the result in a
matrix
class object called ret.mat
.
<- as.matrix(Returns) ret.mat
We compute the Var-Covariance matrix. The var
function
receives a matrix of returns as parameter. The returns of stock 1 are in
column 1, returns of stock 2 are in column 2, and so on. This function
calculates the Variance-Covariance Matrix of stock returns.
Remember that the Variance-Covariance matrix contains Variances of stock returns in the diagonal, and it contains the covariances of pairs of stock returns in the non-diagonals. Also, this matrix is symetric, since Cov(Reti, Retj) = Cov(Retj,Reti). More formally:
For example, the variance-covariance matrix of 2 variables \(X1\) and \(X2\) contains the variance of each variable in its diagonal and covariances between the variables in its non-diagonal terms. In other words, the elements in the upright part of the matrix are repeated in the down-left part of the matrix. More formally:
\[\mathbf{\sum_{cov}} = \left[\begin{array} {rr} Var(X_1) & Cov(X_1, X_2) \\ Cov(X_2, X_1) & Var(X_2) \\ \end{array}\right] \]
<- var(ret.mat) COV
We can see the results of the Var-Covar matrix as shown below:
COV
## TSLA MSFT NEE WMT PFE
## TSLA 0.056019849 0.008324568 0.0057045311 0.0049511133 0.004838516
## MSFT 0.008324568 0.003572705 0.0012970568 0.0013178024 0.001396100
## NEE 0.005704531 0.001297057 0.0037436062 0.0008943766 0.001882363
## WMT 0.004951113 0.001317802 0.0008943766 0.0032274386 0.001236388
## PFE 0.004838516 0.001396100 0.0018823633 0.0012363875 0.005905838
We can also calculate the Correlation Matrix to better understand the relationships of all different pairs of stock returns.
cor(ret.mat)
## TSLA MSFT NEE WMT PFE
## TSLA 1.0000000 0.5884264 0.3939163 0.3682162 0.2660115
## MSFT 0.5884264 1.0000000 0.3546625 0.3880813 0.3039325
## NEE 0.3939163 0.3546625 1.0000000 0.2573039 0.4003295
## WMT 0.3682162 0.3880813 0.2573039 1.0000000 0.2831945
## PFE 0.2660115 0.3039325 0.4003295 0.2831945 1.0000000
Remember that the correlation between 2 stock returns can be any value between -1 and 1. When the correlation is close to 0, it means that both stocks have no relationship. If the correlation is close to 1 it means that both stock returns move in a very similar way in the same direction.
We calculate the expected simple returns of the stocks in another matrix using the apply function. It is important to double check that the second parameter is equal to 2. This means that we are applying the function mean by column. If we want to apply the mean function by raw, we have to specify the second parameter as 1.
<- exp(apply(ret.mat, 2, mean)) - 1 ER
This is the simple way to calculate the expected return of each stock according to Markowitz theory. Remember that the expected return of one stock is simply the Geometric average return of its historical returns.
To calculate this expected return we have 2 methods:
Calculate the products of Gross Returns of each period, substract 1, and then apply the N root to get the geometric average.
Get the arithmetic average of continuously compounded returns and then convert this amount to simple returns by raising \(e\) to this average and then substract 1.
We have estimated the geometric average using the method 2, but not
only for one stock, but for all stocks in the the matrix
ret.mat
. The result will be a vector of geometric returns
that will be saved in er
. We can estimate the vector \(M_r\) of expected returns as:
\[\mathbf{M_{R}} = exp[{M_r}] = exp \left[\begin{array} {r} \bar{r_1} \\ \bar{r_2} \\ .\\ .\\ .\\ \bar{r_N} \end{array}\right] -1 \]
We can see the expected simple returns by calling the er
object.
ER
## TSLA MSFT NEE WMT PFE
## 0.055677643 0.021545764 0.017896575 0.007137089 0.010636478
We calculate the variance, standard deviation, skewness and kurtosis of all asset returns (by columns). We do this just to see descriptive statistics of the stock returns.
apply(ret.mat, 2, var)
## TSLA MSFT NEE WMT PFE
## 0.056019849 0.003572705 0.003743606 0.003227439 0.005905838
apply(ret.mat, 2, sd)
## TSLA MSFT NEE WMT PFE
## 0.23668513 0.05977211 0.06118502 0.05681055 0.07684945
apply(ret.mat, 2, skewness)
## TSLA MSFT NEE WMT PFE
## 0.25435998 0.06414829 -0.64113072 -0.65372480 0.55382268
apply(ret.mat, 2, kurtosis)
## TSLA MSFT NEE WMT PFE
## 1.5869720 -0.3504060 1.4714191 0.7056401 0.1744541
Remember what is skewness and kurtosis. Skewness is a measure of the shape of the probability distribution of the returns while kurtosis gives us information about how much the stock has had extreme negative and positive returns. In other words, kurtosis helps us to identify how big are the fat-tails of the return distribution. We can complement this with the histogram of returns to better appreciate the level of risk of the stock.
Assume that you invest in a portfolio with the following weights: Apple 30%, Walmart 10%, Microsoft 30%, GE 10% and Tesla 20%. Create a vector with these 5 weights:
<- c(0.30,0.10,0.30,0.10,0.20) W
According to Portfolio Theory, the expected risk of a portfolio \(P\) is calculated as the squared root of the Portfolio Variance.
\[SD(P) = \sqrt{(Var(P))}\]
The portfolio variance can be calculated with the following matrix multiplication:
\[Var(P) = W' * \sum_{cov} * W\] where:
\(W\) is the weight vector of the stocks, and \(W'\) is the tranposed matrix of weights.
We now can estimate the expected return and expected risk of this portfolio.
Now I am ready to start estimating the expected return and risk of the portfolio using Matrix Algebra:
<- t(W)%*%ER
ERP1 ERP1
## [,1]
## [1,] 0.02706785
The vector ERP1 will have one value with the expected return of the portfolio. The t function is the transpose function of a vector or matrix. The result of multiplying the vector transposed times the vector of expected returns of the stocks is the same as calculating a weighted average.
Now I can estimate the expected risk of the portfolio using Matrix Albegra:
<- t(W)%*%COV%*%W
EVARPORT <- sqrt(EVARPORT)
ERISK ERISK
## [,1]
## [1,] 0.09260642
With this matrix multiplication I am applying Markowitz theory to estimate the expected variance of the portfolio. You can review why this is the case in the Notes about Portfolio Theory. Make sure you understand why we do this matrix multiplication.
Go to datacamp.com site. You have to take (view and do the exercises) of the following chapters of the following courses:
Coruse: Introduction to Portfolio Analysis in R, complete 3 chapters:
Chapter 1: “The building blocks”,
Chapter 2: “Analyzing Performance”, and
Chapter 3 “Performance Drivers”
The grade of this Workshop will be the following:
You have to submit your .html file of this workshop through Canvas BEFORE NEXT CLASS.
The grade of this Workshop will be the following:
Remember that you have to submit your .html file of your workshop.