1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.

You have to solve CHALLENGE exercises.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.

2 Data management and visualization for financial portfolios

In this section I will work on the following: a) Data management of financial data, b) return calculations, c) descriptive statistics for finance, d) visualization of financial data, and e) Portfolio formation and basic estimations.

2.1 Data management of financial data

2.1.1 Data collection

We will use the quantmod package to import real online data from Yahoo Finance. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the downloaded data from the Internet. :

library(quantmod)

The getSymbols() function download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, FRED database and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables of most of the countries.

You can type ?function in the console or the R Script and run it to know more about the syntax of any function. This will display the R documentation of the function in the bottom-right pane:

?getSymbols

Now, we will work with historical data of the Bitcoin cryptocurrency and the TESLA stock. We download daily quotations of these instruments from January 1, 2019 to date from Yahoo Finance:

getSymbols(c("BTC-USD","TSLA"), from="2019-01-01", src="yahoo", periodicity="daily")

## [1] "BTC-USD" "TSLA"

This function will create an xts-zoo R object for each ticker. Each object has the corresponding historical daily prices. xts stands for extensible time-series. An xts-zoo object is designed to easily manipulate time series data.

BTC-USD and TSLA are the ticker names in Yahoo Finance. The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.

You can check the content of any of these dataset with View(). When tickers have special characters, we have to make reference to the object with simple quotes (``):

You can list the FIRST 5 rows of a dataset by using head():

head(`BTC-USD`,5)

##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2019-01-01     3746.713     3850.914    3707.231      3843.520     4324200990
## 2019-01-02     3849.216     3947.981    3817.409      3943.409     5244856836
## 2019-01-03     3931.049     3935.685    3826.223      3836.741     4530215219
## 2019-01-04     3832.040     3865.935    3783.854      3857.718     4847965467
## 2019-01-05     3851.974     3904.903    3836.900      3845.195     5137609824
##            BTC-USD.Adjusted
## 2019-01-01         3843.520
## 2019-01-02         3943.409
## 2019-01-03         3836.741
## 2019-01-04         3857.718
## 2019-01-05         3845.195

Also, you can list the LAST 5 rows of a data set. Note that you can change number of rows you want to display.

tail(`BTC-USD`, 5)

##            BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close BTC-USD.Volume
## 2022-08-28     20041.04     20139.05    19616.81      19616.81    24366810591
## 2022-08-29     19615.15     20357.46    19600.79      20297.99    32637854078
## 2022-08-30     20298.61     20542.64    19617.64      19796.81    34483360283
## 2022-08-31     19799.58     20420.99    19799.58      20049.76    33225232872
## 2022-09-01     20068.56     20192.18    19867.73      19955.56    30635261952
##            BTC-USD.Adjusted
## 2022-08-28         19616.81
## 2022-08-29         20297.99
## 2022-08-30         19796.81
## 2022-08-31         20049.76
## 2022-09-01         19955.56

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close of adjusted price to calculate daily returns.

Let’s see some of the benefits of using xts-zoo objects. We can, for example, select columns using any of the following functions, where x represents a generic xts zoo object:

Op(x): Extract the Opening prices of the period.
Hi(x): Extract the Highest price of the period.
Lo(x): Extract the Lowest price of the period.
Cl(x): Extract the closing prices of the period.
Vo(x): Extract the volume traded of the period.
Ad(x): Extract the Adjusted prices of the period.

2.1.2 Merging and cleaning financial datasets:

We can use the merge() function to create a consolidated dataset of one or more xts-zoo objects:

prices <- merge(`BTC-USD`, TSLA)
# I can select only the adjusted prices in order to calculate returns:
adjprices <-Ad(prices)

Now we have an xts-zoo objects with 2 columns. I can change the names of the columns with simple names:

names(adjprices)<-c("bitcoin","tesla")

Now I can make reference to the adjusted prices using these names.

In Finance, when managing daily data it is very common to have gaps in the series. What does this mean? It means that the contains some missing days. For example, for stock series there is no data for weekends or holidays. However, R deals with gaps because it recognizes that we are working with a time series object. Thus, we have a time variable with NO GAPS, which avoids problems when computing returns. However, R does not deal automatically with empty values (called NA’s). It is a good idea to have a data set free of NA’s. So, I can use the function na.omit:

adjprices <- na.omit(adjprices)

2.2 Return calculation

We can create xts-zoo objects for simple and continuously compounded returns of both instruments:

# Calculating continuously compounded daily returns:
ccreturns <- diff(log(adjprices)) 
# Calculating simple daily returns:
returns <- adjprices / lag(adjprices,n=1) - 1
# We can also calculate simple returns using the diff function:
returns2<- diff(adjprices) / lag(adjprices,n=1)

The diff function works with xts to calculate the difference between the value of one period minus the previous one. The lag function gets the previous value of the time series.

We can calculate holding returns for any time period for each instrument (HPR). For example, if we want to calculate the whole period return from the first day of the series to the most recent one, we can do the following:

HPR1<- 100* as.vector(last(adjprices)) / as.vector(first(adjprices)) - 1 
HPR1

## [1]  507.4373 1332.0806

cat("The holding period return for Bitcoin is: ", HPR1[1], "%")

## The holding period return for Bitcoin is:  507.4373 %

cat("The holding period return for Bitcoin is: ", HPR1[2], "%")

## The holding period return for Bitcoin is:  1332.081 %

We could also use the continuously compounded returns to calculate the same holding simple returns:

HPR2<-100*exp(colSums(ccreturns,na.rm=TRUE)) - 1
HPR2

##   bitcoin     tesla 
##  507.4373 1332.0806

2.3 Visualization of financial data

One of the advantages of the quantmod package is that it has some built-in data visualization capabilities. For example, the chartSeries function is a plotting tool designed to create standard financial charts given a time series object. The chartSeries function also includes some arguments that allows the user to modify the cosmetics of the plot such as the theme argument.

Let’s see the performance of Bitcoin prices:

chartSeries(`BTC-USD`, theme = ("white"))

Let’s see the performance of TESLA prices:

chartSeries(TSLA, theme = ("black"))

We can see exponential growth of both, the Bitcoin prices and Tesla prices for the last days. If you had invested in Bitcoin or Tesla in July 2020 and has sold your position early January 2021, you would have multiplied your investment by around 4 times (300% period return)!

We can visualize the daily returns over time:

plot(returns$bitcoin)

plot(returns$tesla)

With this plot we can appreciate the daily volatility of each instrument over time. Volatility is a measure of how disperse the returns move up and down from its mean; it is basically the standard deviation of returns.

2.4 Basics of Portfolio analysis

We will use the PerformanceAnalytics and related packages for this section:

# Load the packages:
library(tidyverse)
library(PerformanceAnalytics)
library(highcharter)

We will do an exercise of a simple portfolio composed by 5 US stocks: Tesla, Microsoft, Nextera, WalMart and Pfizer. You can change the tickers of the stocks as you wish as long as Yahoo Finance has data for your tickers. I selected stocks from the industries: high-tech, clean energies, pharmaceutical, and retail.

2.4.1 Automation of data collection and data management

With the following code we will automatically download data from the 5 stocks, merge the data and calculate returns:

symbols <- c("TSLA","MSFT","NEE","WMT","PFE")

prices <- getSymbols(symbols,src = 'yahoo',
             periodicity = "monthly",
             from = "2018-01-01",
             auto.assign = TRUE,
             warnings = FALSE) %>%
  map(~Ad(get(.))) %>%
  reduce(merge) %>%
  `colnames<-`(symbols)

If you see I am doing several data management process in sequential process. The %>% is used to indicate the separation of each data management process. The previous code does the following:

Download price data of the 5 stocks
Apply (map) the adjusted function to all xts-zoo datasets in order to get the adjusted stock prices
Do the merge of all the xts-zoo objects into one object
Finally rename the columns of the integrated object, and return the object as prices.

2.4.2 Return calculations

We calculate returns of all stocks using the Return.calculate function:

Returns<-Return.calculate(prices) %>%
  na.omit()

The na.omit function was also applied to clean the dataset and drop the rows with NA values.

We calculate the continuous compounded return (also called log returns) of all stocks:

returns<-Return.calculate(prices,method = "log") %>%
         na.omit() 
# We could calculate this using the diff and log functions:
returns2<-na.omit(diff(log(prices)))
# The returns and returns2 objects will have exactly the same returns

2.4.3 Descriptive statistics of returns

We calculate descriptive statistics for the returns of all stocks:

table.Stats(returns)

##                    TSLA    MSFT     NEE     WMT     PFE
## Observations    55.0000 55.0000 55.0000 55.0000 55.0000
## NAs              0.0000  0.0000  0.0000  0.0000  0.0000
## Minimum         -1.2192 -0.1052 -0.1782 -0.1734 -0.1453
## Quartile 1      -0.1095 -0.0215 -0.0031 -0.0284 -0.0356
## Median           0.0272  0.0223  0.0192  0.0108 -0.0034
## Arithmetic Mean  0.0247  0.0194  0.0157  0.0055  0.0077
## Geometric Mean      NaN  0.0177  0.0138  0.0038  0.0050
## Quartile 3       0.1870  0.0565  0.0528  0.0431  0.0652
## Maximum          0.5547  0.1624  0.1614  0.0969  0.2057
## SE Mean          0.0342  0.0079  0.0083  0.0078  0.0101
## LCL Mean (0.95) -0.0440  0.0036 -0.0009 -0.0101 -0.0125
## UCL Mean (0.95)  0.0934  0.0353  0.0324  0.0211  0.0280
## Variance         0.0645  0.0034  0.0038  0.0033  0.0056
## Stdev            0.2540  0.0586  0.0617  0.0577  0.0750
## Skewness        -1.8102 -0.0738 -0.9233 -0.8602  0.3509
## Kurtosis         8.8040 -0.4505  1.9539  1.2442 -0.0818

We can see arithmetic, geometric mean, median of returns. Also we can see risk measures such as standard deviation, variance. Finally we can see skewness and kurtosis, which are important financial measures to understand the data.

2.4.4 Visualization of risk and return of stocks

We can do a Box plot of the returns to better appreciate historical median return and risk by looking at the median, the quartiles Q1 and Q3, volatility and extreme values of these returns:

chart.Boxplot(returns)

It is easy to see that Tesla is the stock with the highest risk. The red circles show the mean, the mid line is the median (50 percentile), the boxes include the 50% of the data from the Q1 (25 percentile) to the Q3 (75 percentile). The dots are considered extreme values for in the context of its own distribution.

Now we can start evaluating the performance over time for each stock by looking at how much $ we would have made if we had invested $1.00 in each stock at the beginning of the time periods:

charts.PerformanceSummary(Returns, 
                          main = "Performance of $1.00 over time",
                          wealth.index = TRUE)

Since Tesla has had an extraordinary performance for the last months, it is hard to appreciate the performance of the rest of the stocks. So, we can drop Tesla from the plot:

charts.PerformanceSummary(Returns[,2:5], 
                        main = "Performance of $1.00 over time",
                          wealth.index = TRUE)

2.5 Portfolio formation

I start creating the weights of 2 portfolios. One will be an equally-weighted portfolio, and the other will be an aggressive portfolio assigning high weights to risky stocks, and low weights to conservative stocks.

I start creating a vector of weights for the equally weighted portfolio:

w_ew = rep(0.2,5)
# The rep function repeats a value n times

Now I create a vector of weights for the aggressive portfolio. I will assign the following weights:

Tesla: 40% Microsoft: 30% Nextera: 20% WalMart:0% Pfizer: 10%

w_aggressive = c(0.4,0.3,0.2,0,0.1)

2.6 Portfolio return calculation

We calculate the historical return of the equally-weighted portfolio using the Return.portfolio function:

portfolio_returns_ew <-
  Return.portfolio(Returns,
                   weights = w_ew) %>%
  `colnames<-`("returns")

We calculate the historical return of the aggressive portfolio using the Return.portfolio function:

portfolio_returns_ag <-
  Return.portfolio(Returns,
                   weights = w_aggressive) %>%
  `colnames<-`("returns")

With the portfolio returns I can plot a performance chart of each portfolio:

charts.PerformanceSummary(portfolio_returns_ew, 
                          main = "Equally-weighted Portfolio")

charts.PerformanceSummary(portfolio_returns_ag, 
                          main = "Aggressive Portfolio Performance")

We finally do a dynamic plot of historical monthly returns of each portfolio using the highchart function from the highcharter package:

highchart(type = "stock") %>% 
  hc_title(text = "Equally-weighted Portfolio") %>% 
  hc_add_series(portfolio_returns_ew$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)

We can interact with the plot to zoom in and out, and selecting time periods.

We do the same for the aggressive portfolio:

highchart(type = "stock") %>% 
  hc_title(text = "Aggressive Portfolio") %>% 
  hc_add_series(portfolio_returns_ag$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)

3 Introduction to Portfolio Theory

Regardless the approach you use to download and manipulate the source data, once that your data frame of monthly returns is ready for analysis you will be in a good position to create the Variance-covariance matrix which is needed to compute the expected risk of the portfolio.

The first step is to transform the data frame into a matrix using the as.matrix function. We save the result in a matrix class object called ret.mat.

ret.mat <- as.matrix(Returns)

We compute the Var-Covariance matrix. The var function receives a matrix of returns as parameter. The returns of stock 1 are in column 1, returns of stock 2 are in column 2, and so on. This function calculates the Variance-Covariance Matrix of stock returns.

Remember that the Variance-Covariance matrix contains Variances of stock returns in the diagonal, and it contains the covariances of pairs of stock returns in the non-diagonals. Also, this matrix is symetric, since Cov(Reti, Retj) = Cov(Retj,Reti). More formally:

For example, the variance-covariance matrix of 2 variables $X1$ and $X2$ contains the variance of each variable in its diagonal and covariances between the variables in its non-diagonal terms. In other words, the elements in the upright part of the matrix are repeated in the down-left part of the matrix. More formally:

\[\mathbf{\sum_{cov}} = \left[\begin{array} {rr} Var(X_1) & Cov(X_1, X_2) \\ Cov(X_2, X_1) & Var(X_2) \\ \end{array}\right] \]

COV <- var(ret.mat)

We can see the results of the Var-Covar matrix as shown below:

COV

##             TSLA        MSFT          NEE          WMT         PFE
## TSLA 0.056019849 0.008324568 0.0057045311 0.0049511133 0.004838516
## MSFT 0.008324568 0.003572705 0.0012970568 0.0013178024 0.001396100
## NEE  0.005704531 0.001297057 0.0037436062 0.0008943766 0.001882363
## WMT  0.004951113 0.001317802 0.0008943766 0.0032274386 0.001236388
## PFE  0.004838516 0.001396100 0.0018823633 0.0012363875 0.005905838

We can also calculate the Correlation Matrix to better understand the relationships of all different pairs of stock returns.

cor(ret.mat)

##           TSLA      MSFT       NEE       WMT       PFE
## TSLA 1.0000000 0.5884264 0.3939163 0.3682162 0.2660115
## MSFT 0.5884264 1.0000000 0.3546625 0.3880813 0.3039325
## NEE  0.3939163 0.3546625 1.0000000 0.2573039 0.4003295
## WMT  0.3682162 0.3880813 0.2573039 1.0000000 0.2831945
## PFE  0.2660115 0.3039325 0.4003295 0.2831945 1.0000000

Remember that the correlation between 2 stock returns can be any value between -1 and 1. When the correlation is close to 0, it means that both stocks have no relationship. If the correlation is close to 1 it means that both stock returns move in a very similar way in the same direction.

We calculate the expected simple returns of the stocks in another matrix using the apply function. It is important to double check that the second parameter is equal to 2. This means that we are applying the function mean by column. If we want to apply the mean function by raw, we have to specify the second parameter as 1.

ER<- exp(apply(ret.mat, 2, mean)) - 1

This is the simple way to calculate the expected return of each stock according to Markowitz theory. Remember that the expected return of one stock is simply the Geometric average return of its historical returns.

To calculate this expected return we have 2 methods:

Calculate the products of Gross Returns of each period, substract 1, and then apply the N root to get the geometric average.
Get the arithmetic average of continuously compounded returns and then convert this amount to simple returns by raising $e$ to this average and then substract 1.

We have estimated the geometric average using the method 2, but not only for one stock, but for all stocks in the the matrix ret.mat. The result will be a vector of geometric returns that will be saved in er. We can estimate the vector $M_r$ of expected returns as:

\[\mathbf{M_{R}} = exp[{M_r}] = exp \left[\begin{array} {r} \bar{r_1} \\ \bar{r_2} \\ .\\ .\\ .\\ \bar{r_N} \end{array}\right] -1 \]

We can see the expected simple returns by calling the er object.

ER

##        TSLA        MSFT         NEE         WMT         PFE 
## 0.055677643 0.021545764 0.017896575 0.007137089 0.010636478

We calculate the variance, standard deviation, skewness and kurtosis of all asset returns (by columns). We do this just to see descriptive statistics of the stock returns.

apply(ret.mat, 2, var)

##        TSLA        MSFT         NEE         WMT         PFE 
## 0.056019849 0.003572705 0.003743606 0.003227439 0.005905838

apply(ret.mat, 2, sd)

##       TSLA       MSFT        NEE        WMT        PFE 
## 0.23668513 0.05977211 0.06118502 0.05681055 0.07684945

apply(ret.mat, 2, skewness)

##        TSLA        MSFT         NEE         WMT         PFE 
##  0.25435998  0.06414829 -0.64113072 -0.65372480  0.55382268

apply(ret.mat, 2, kurtosis)

##       TSLA       MSFT        NEE        WMT        PFE 
##  1.5869720 -0.3504060  1.4714191  0.7056401  0.1744541

Remember what is skewness and kurtosis. Skewness is a measure of the shape of the probability distribution of the returns while kurtosis gives us information about how much the stock has had extreme negative and positive returns. In other words, kurtosis helps us to identify how big are the fat-tails of the return distribution. We can complement this with the histogram of returns to better appreciate the level of risk of the stock.

4 Calculation of expected variance and risk of a Portfolio

Assume that you invest in a portfolio with the following weights: Apple 30%, Walmart 10%, Microsoft 30%, GE 10% and Tesla 20%. Create a vector with these 5 weights:

W <- c(0.30,0.10,0.30,0.10,0.20)

According to Portfolio Theory, the expected risk of a portfolio $P$ is calculated as the squared root of the Portfolio Variance.

\[SD(P) = \sqrt{(Var(P))}\]

The portfolio variance can be calculated with the following matrix multiplication:

\[Var(P) = W' * \sum_{cov} * W\] where:

$W$ is the weight vector of the stocks, and $W'$ is the tranposed matrix of weights.

We now can estimate the expected return and expected risk of this portfolio.

Now I am ready to start estimating the expected return and risk of the portfolio using Matrix Algebra:

ERP1 <- t(W)%*%ER
ERP1

##            [,1]
## [1,] 0.02706785

The vector ERP1 will have one value with the expected return of the portfolio. The t function is the transpose function of a vector or matrix. The result of multiplying the vector transposed times the vector of expected returns of the stocks is the same as calculating a weighted average.

Now I can estimate the expected risk of the portfolio using Matrix Albegra:

EVARPORT <- t(W)%*%COV%*%W
ERISK <- sqrt(EVARPORT)
ERISK

##            [,1]
## [1,] 0.09260642

With this matrix multiplication I am applying Markowitz theory to estimate the expected variance of the portfolio. You can review why this is the case in the Notes about Portfolio Theory. Make sure you understand why we do this matrix multiplication.

5 Datacamp activities

Go to datacamp.com site. You have to take (view and do the exercises) of the following chapters of the following courses:

Coruse: Introduction to Portfolio Analysis in R, complete 3 chapters:

Chapter 1: “The building blocks”,

Chapter 2: “Analyzing Performance”, and

Chapter 3 “Performance Drivers”

The grade of this Workshop will be the following:

6 Workshop submission

You have to submit your .html file of this workshop through Canvas BEFORE NEXT CLASS.