Alberto Dorantes Dosamantes, Ph.D. Monterrey Tech, Queretaro Campus Sep 1st, 2022

Abstract

In this workshop we will use real financial time series data to learn about a) data management such as return calculation and dataset merging, b) data visualization, c) and also we will start learning how to estimate expected stock return and expected portfolio return and risk using matrix algebra.

Data management and visualization for financial portfolios

In this section I will work on the following: a) Data management of financial data, b) return calculations, c) descriptive statistics for finance, d) visualization of financial data, and e) Portfolio formation and basic estimations.

2.1 Data management of financial data

2.1.1 Data collection

We will use the quantmod package to import real online data from Yahoo Finance. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the downloaded data from the Internet. :

library(quantmod)
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: TTR
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 

The getSymbols() function download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, FRED database and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables of most of the countries.

You can type ?function in the console or the R Script and run it to know more about the syntax of any function. This will display the R documentation of the function in the bottom-right pane:

?getSymbols

Now, we will work with historical data of the Bitcoin cryptocurrency and the TESLA stock. We download daily quotations of these instruments from January 1, 2019 to date from Yahoo Finance:

getSymbols(c("BTC-USD","TSLA"), from="2019-01-01", src="yahoo", periodicity="daily")
Warning: BTC-USD contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
[1] "BTC-USD" "TSLA"   

This function will create an xts-zoo R object for each ticker. Each object has the corresponding historical daily prices. xts stands for extensible time-series. An xts-zoo object is designed to easily manipulate time series data.

BTC-USD and TSLA are the ticker names in Yahoo Finance. The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.

You can check the content of any of these dataset with View(). When tickers have special characters, we have to make reference to the object with simple quotes (``):

You can list the FIRST 5 rows of a dataset by using head():

head(`BTC-USD`,5)
           BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close
2019-01-01     3746.713     3850.914    3707.231      3843.520
2019-01-02     3849.216     3947.981    3817.409      3943.409
2019-01-03     3931.049     3935.685    3826.223      3836.741
2019-01-04     3832.040     3865.935    3783.854      3857.718
2019-01-05     3851.974     3904.903    3836.900      3845.195
           BTC-USD.Volume BTC-USD.Adjusted
2019-01-01     4324200990         3843.520
2019-01-02     5244856836         3943.409
2019-01-03     4530215219         3836.741
2019-01-04     4847965467         3857.718
2019-01-05     5137609824         3845.195

Also, you can list the LAST 5 rows of a data set. Note that you can change number of rows you want to display.

tail(`BTC-USD`, 5)
           BTC-USD.Open BTC-USD.High BTC-USD.Low BTC-USD.Close
2022-09-01     20050.50     20198.39    19653.97      20127.14
2022-09-02     20126.07     20401.57    19814.77      19969.77
2022-09-03     19969.72     20037.01    19698.36      19832.09
2022-09-04           NA           NA          NA            NA
2022-09-05     20027.30     20027.30    19840.12      19853.44
           BTC-USD.Volume BTC-USD.Adjusted
2022-09-01    30182031010         20127.14
2022-09-02    29123998928         19969.77
2022-09-03    23613051457         19832.09
2022-09-04             NA               NA
2022-09-05    26419550208         19853.44

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close of adjusted price to calculate daily returns.

Let’s see some of the benefits of using xts-zoo objects. We can, for example, select columns using any of the following functions, where x represents a generic xts zoo object:

Op(x): Extract the Opening prices of the period. Hi(x): Extract the Highest price of the period. Lo(x): Extract the Lowest price of the period. Cl(x): Extract the closing prices of the period. Vo(x): Extract the volume traded of the period. Ad(x): Extract the Adjusted prices of the period.

2.1.2 Merging and cleaning financial datasets:

We can use the merge() function to create a consolidated dataset of one or more xts-zoo objects:

prices <- merge(`BTC-USD`, TSLA)
# I can select only the adjusted prices in order to calculate returns:
adjprices <-Ad(prices) 

Now we have an xts-zoo objects with 2 columns. I can change the names of the columns with simple names:

names(adjprices)<-c("bitcoin","tesla")

Now I can make reference to the adjusted prices using these names.

In Finance, when managing daily data it is very common to have gaps in the series. What does this mean? It means that the contains some missing days. For example, for stock series there is no data for weekends or holidays. However, R deals with gaps because it recognizes that we are working with a time series object. Thus, we have a time variable with NO GAPS, which avoids problems when computing returns. However, R does not deal automatically with empty values (called NA’s). It is a good idea to have a data set free of NA’s. So, I can use the function na.omit:

adjprices <- na.omit(adjprices)

2.2 Return calculation

We can create xts-zoo objects for simple and continuously compounded returns of both instruments:

# Calculating continuously compounded daily returns:
ccreturns <- diff(log(adjprices)) 
# Calculating simple daily returns:
returns <- adjprices / lag(adjprices,n=1) - 1
# We can also calculate simple returns using the diff function:
returns2<- diff(adjprices) / lag(adjprices,n=1)

The diff function works with xts to calculate the difference between the value of one period minus the previous one. The lag function gets the previous value of the time series.

We can calculate holding returns for any time period for each instrument (HPR). For example, if we want to calculate the whole period return from the first day of the series to the most recent one, we can do the following:

HPR1<- 100* as.vector(last(adjprices)) / as.vector(first(adjprices)) - 1 
HPR1
[1]  505.4088 1305.9618
cat("The holding period return for Bitcoin is: ", HPR1[1], "%")
The holding period return for Bitcoin is:  505.4088 %
cat("The holding period return for Bitcoin is: ", HPR1[2], "%")
The holding period return for Bitcoin is:  1305.962 %

We could also use the continuously compounded returns to calculate the same holding simple returns:

HPR2<-100*exp(colSums(ccreturns,na.rm=TRUE)) - 1
HPR2
  bitcoin     tesla 
 505.4088 1305.9618 

2.3 Visualization of financial data

One of the advantages of the quantmod package is that it has some built-in data visualization capabilities. For example, the chartSeries function is a plotting tool designed to create standard financial charts given a time series object. The chartSeries function also includes some arguments that allows the user to modify the cosmetics of the plot such as the theme argument.

Let’s see the performance of Bitcoin prices:

chartSeries(`BTC-USD`, theme = ("white"))

Let’s see the performance of TESLA prices:

chartSeries(TSLA, theme = ("black"))

We can see exponential growth of both, the Bitcoin prices and Tesla prices for the last days. If you had invested in Bitcoin or Tesla in July 2020 and has sold your position early January 2021, you would have multiplied your investment by around 4 times (300% period return)!

We can visualize the daily returns over time:

plot(returns$bitcoin)

plot(returns$tesla)

With this plot we can appreciate the daily volatility of each instrument over time. Volatility is a measure of how disperse the returns move up and down from its mean; it is basically the standard deviation of returns.

2.4 Basics of Portfolio analysis

We will use the PerformanceAnalytics and related packages for this section:

# Load the packages:
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
── Attaching packages ─────────────────────── tidyverse 1.3.2 ──✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.0      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 ── Conflicts ────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::first()  masks xts::first()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::last()   masks xts::last()
library(PerformanceAnalytics)

Attaching package: ‘PerformanceAnalytics’

The following object is masked _by_ ‘.GlobalEnv’:

    prices

The following object is masked from ‘package:graphics’:

    legend
library(highcharter)
Registered S3 method overwritten by 'htmlwidgets':
  method           from         
  print.htmlwidget tools:rstudio
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use

We will do an exercise of a simple portfolio composed by 5 US stocks: Tesla, Microsoft, Nextera, WalMart and Pfizer. You can change the tickers of the stocks as you wish as long as Yahoo Finance has data for your tickers. I selected stocks from the industries: high-tech, clean energies, pharmaceutical, and retail.

2.4.1 Automation of data collection and data management

With the following code we will automatically download data from the 5 stocks, merge the data and calculate returns:

symbols <- c("TSLA","MSFT","NEE","WMT","PFE")

prices <- getSymbols(symbols,src = 'yahoo',
             periodicity = "monthly",
             from = "2018-01-01",
             auto.assign = TRUE,
             warnings = FALSE) %>%
  map(~Ad(get(.))) %>%
  reduce(merge) %>%
  `colnames<-`(symbols)

If you see I am doing several data management process in sequential process. The %>% is used to indicate the separation of each data management process. The previous code does the following:

Download price data of the 5 stocks

Apply (map) the adjusted function to all xts-zoo datasets in order to get the adjusted stock prices

Do the merge of all the xts-zoo objects into one object

Finally rename the columns of the integrated object, and return the object as prices.

2.4.2 Return calculations

We calculate returns of all stocks using the Return.calculate function:

Returns<-Return.calculate(prices) %>%
  na.omit() 

The na.omit function was also applied to clean the dataset and drop the rows with NA values.

We calculate the continuous compounded return (also called log returns) of all stocks:

returns<-Return.calculate(prices,method = "log") %>%
         na.omit() 
# We could calculate this using the diff and log functions:
returns2<-na.omit(diff(log(prices)))
# The returns and returns2 objects will have exactly the same returns

2.4.3 Descriptive statistics of returns

We calculate descriptive statistics for the returns of all stocks:

table.Stats(returns) 

We can see arithmetic, geometric mean, median of returns. Also we can see risk measures such as standard deviation, variance. Finally we can see skewness and kurtosis, which are important financial measures to understand the data.

2.4.4 Visualization of risk and return of stocks

We can do a Box plot of the returns to better appreciate historical median return and risk by looking at the median, the quartiles Q1 and Q3, volatility and extreme values of these returns:

chart.Boxplot(returns)

It is easy to see that Tesla is the stock with the highest risk. The red circles show the mean, the mid line is the median (50 percentile), the boxes include the 50% of the data from the Q1 (25 percentile) to the Q3 (75 percentile). The dots are considered extreme values for in the context of its own distribution.

Now we can start evaluating the performance over time for each stock by looking at how much $ we would have made if we had invested $1.00 in each stock at the beginning of the time periods:

charts.PerformanceSummary(Returns, 
                          main = "Performance of $1.00 over time",
                          wealth.index = TRUE)

Since Tesla has had an extraordinary performance for the last months, it is hard to appreciate the performance of the rest of the stocks. So, we can drop Tesla from the plot:

charts.PerformanceSummary(Returns[,2:5], 
                        main = "Performance of $1.00 over time",
                          wealth.index = TRUE)

2.5 Portfolio formation

I start creating the weights of 2 portfolios. One will be an equally-weighted portfolio, and the other will be an aggressive portfolio assigning high weights to risky stocks, and low weights to conservative stocks.

I start creating a vector of weights for the equally weighted portfolio:

w_ew = rep(0.2,5)
# The rep function repeats a value n times

Now I create a vector of weights for the aggressive portfolio. I will assign the following weights:

Tesla: 40% Microsoft: 30% Nextera: 20% WalMart:0% Pfizer: 10%

w_aggressive = c(0.4,0.3,0.2,0,0.1)

2.6 Portfolio return calculation

We calculate the historical return of the equally-weighted portfolio using the Return.portfolio function:

portfolio_returns_ew <-
  Return.portfolio(Returns,
                   weights = w_ew) %>%
  `colnames<-`("returns")

We calculate the historical return of the aggressive portfolio using the Return.portfolio function:

portfolio_returns_ag <-
  Return.portfolio(Returns,
                   weights = w_aggressive) %>%
  `colnames<-`("returns")

With the portfolio returns I can plot a performance chart of each portfolio:

charts.PerformanceSummary(portfolio_returns_ew, 
                          main = "Equally-weighted Portfolio")

charts.PerformanceSummary(portfolio_returns_ag, 
                          main = "Aggressive Portfolio Performance")

We finally do a dynamic plot of historical monthly returns of each portfolio using the highchart function from the highcharter package:

highchart(type = "stock") %>% 
  hc_title(text = "Equally-weighted Portfolio") %>% 
  hc_add_series(portfolio_returns_ew$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)

We can interact with the plot to zoom in and out, and selecting time periods.

We do the same for the aggressive portfolio:

highchart(type = "stock") %>% 
  hc_title(text = "Aggressive Portfolio") %>% 
  hc_add_series(portfolio_returns_ag$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)

3 Introduction to Portfolio Theory

Regardless the approach you use to download and manipulate the source data, once that your data frame of monthly returns is ready for analysis you will be in a good position to create the Variance-covariance matrix which is needed to compute the expected risk of the portfolio.

The first step is to transform the data frame into a matrix using the as.matrix function. We save the result in a matrix class object called ret.mat.

ret.mat <- as.matrix(Returns)

We compute the Var-Covariance matrix. The var function receives a matrix of returns as parameter. The returns of stock 1 are in column 1, returns of stock 2 are in column 2, and so on. This function calculates the Variance-Covariance Matrix of stock returns.

Remember that the Variance-Covariance matrix contains Variances of stock returns in the diagonal, and it contains the covariances of pairs of stock returns in the non-diagonals. Also, this matrix is symetric, since Cov(Reti, Retj) = Cov(Retj,Reti). More formally:

For example, the variance-covariance matrix of 2 variables X1 and X2 contains the variance of each variable in its diagonal and covariances between the variables in its non-diagonal terms. In other words, the elements in the upright part of the matrix are repeated in the down-left part of the matrix. More formally:

∑cov=[Var(X1)Cov(X2,X1)Cov(X1,X2)Var(X2)]

COV <- var(ret.mat)

We can see the results of the Var-Covar matrix as shown below:

COV
            TSLA        MSFT          NEE          WMT
TSLA 0.044324810 0.007089377 0.0036288008 0.0043913507
MSFT 0.007089377 0.003480521 0.0012656788 0.0012728561
NEE  0.003628801 0.001265679 0.0036179444 0.0008644815
WMT  0.004391351 0.001272856 0.0008644815 0.0031130716
PFE  0.003429573 0.001350195 0.0018184272 0.0011935499
             PFE
TSLA 0.003429573
MSFT 0.001350195
NEE  0.001818427
WMT  0.001193550
PFE  0.005696879

We can also calculate the Correlation Matrix to better understand the relationships of all different pairs of stock returns.

cor(ret.mat)
          TSLA      MSFT       NEE       WMT       PFE
TSLA 1.0000000 0.5707718 0.2865555 0.3738351 0.2158232
MSFT 0.5707718 1.0000000 0.3566731 0.3866897 0.3032187
NEE  0.2865555 0.3566731 1.0000000 0.2575907 0.4005405
WMT  0.3738351 0.3866897 0.2575907 1.0000000 0.2834181
PFE  0.2158232 0.3032187 0.4005405 0.2834181 1.0000000

Remember that the correlation between 2 stock returns can be any value between -1 and 1. When the correlation is close to 0, it means that both stocks have no relationship. If the correlation is close to 1 it means that both stock returns move in a very similar way in the same direction.

We calculate the expected simple returns of the stocks in another matrix using the apply function. It is important to double check that the second parameter is equal to 2. This means that we are applying the function mean by column. If we want to apply the mean function by raw, we have to specify the second parameter as 1.

ER<- exp(apply(ret.mat, 2, mean)) - 1

This is the simple way to calculate the expected return of each stock according to Markowitz theory. Remember that the expected return of one stock is simply the Geometric average return of its historical returns.

To calculate this expected return we have 2 methods:

Calculate the products of Gross Returns of each period, substract 1, and then apply the N root to get the geometric average.

Get the arithmetic average of continuously compounded returns and then convert this amount to simple returns by raising e to this average and then substract 1.

We have estimated the geometric average using the method 2, but not only for one stock, but for all stocks in the the matrix ret.mat. The result will be a vector of geometric returns that will be saved in er. We can estimate the vector Mr of expected returns as:

MR=exp[Mr]=exp⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢r1¯r2¯…rN¯⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥−1

We can see the expected simple returns by calling the er object.

ER
       TSLA        MSFT         NEE         WMT         PFE 
0.064287042 0.020448711 0.017360339 0.007022978 0.010445553 

We calculate the variance, standard deviation, skewness and kurtosis of all asset returns (by columns). We do this just to see descriptive statistics of the stock returns.

apply(ret.mat, 2, var)
       TSLA        MSFT         NEE         WMT         PFE 
0.044324810 0.003480521 0.003617944 0.003113072 0.005696879 
apply(ret.mat, 2, sd)
      TSLA       MSFT        NEE        WMT        PFE 
0.21053458 0.05899594 0.06014935 0.05579491 0.07547767 
apply(ret.mat, 2, skewness)
      TSLA       MSFT        NEE        WMT        PFE 
 1.0645506  0.1130058 -0.6246087 -0.6591064  0.5710369 
apply(ret.mat, 2, kurtosis)
      TSLA       MSFT        NEE        WMT        PFE 
 0.8277288 -0.2992434  1.5910755  0.8327515  0.2933593 

Remember what is skewness and kurtosis. Skewness is a measure of the shape of the probability distribution of the returns while kurtosis gives us information about how much the stock has had extreme negative and positive returns. In other words, kurtosis helps us to identify how big are the fat-tails of the return distribution. We can complement this with the histogram of returns to better appreciate the level of risk of the stock.

4 Calculation of expected variance and risk of a Portfolio

Assume that you invest in a portfolio with the following weights: Apple 30%, Walmart 10%, Microsoft 30%, GE 10% and Tesla 20%. Create a vector with these 5 weights:

W <- c(0.30,0.10,0.30,0.10,0.20)

According to Portfolio Theory, the expected risk of a portfolio P is calculated as the squared root of the Portfolio Variance.

SD(P)=(Var(P))−−−−−−−−√

The portfolio variance can be calculated with the following matrix multiplication:

Var(P)=W′∗∑cov∗W where:

W is the weight vector of the stocks, and W′ is the tranposed matrix of weights.

We now can estimate the expected return and expected risk of this portfolio.

Now I am ready to start estimating the expected return and risk of the portfolio using Matrix Algebra:

ERP1 <- t(W)%*%ER
ERP1
           [,1]
[1,] 0.02933049

The vector ERP1 will have one value with the expected return of the portfolio. The t function is the transpose function of a vector or matrix. The result of multiplying the vector transposed times the vector of expected returns of the stocks is the same as calculating a weighted average.

Now I can estimate the expected risk of the portfolio using Matrix Albegra:

EVARPORT <- t(W)%*%COV%*%W
ERISK <- sqrt(EVARPORT)
ERISK
           [,1]
[1,] 0.08267686

With this matrix multiplication I am applying Markowitz theory to estimate the expected variance of the portfolio. You can review why this is the case in the Notes about Portfolio Theory. Make sure you understand why we do this matrix multiplication.

---
title: "Finance Programming - Workshop 4"
author: Stefan Schweitzer A01209755
output: html_notebook
---

Alberto Dorantes Dosamantes, Ph.D.
Monterrey Tech, Queretaro Campus
Sep 1st, 2022

# Abstract
In this workshop we will use real financial time series data to learn about a) data management such as return calculation and dataset merging, b) data visualization, c) and also we will start learning how to estimate expected stock return and expected portfolio return and risk using matrix algebra.

# Data management and visualization for financial portfolios
In this section I will work on the following: a) Data management of financial data, b) return calculations, c) descriptive statistics for finance, d) visualization of financial data, and e) Portfolio formation and basic estimations.

# 2.1 Data management of financial data
## 2.1.1 Data collection
We will use the quantmod package to import real online data from Yahoo Finance. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the downloaded data from the Internet. :

```{r}
library(quantmod)
```

The getSymbols() function download online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() allows to download this data from multiple sources: Yahoo Finance, FRED database and Oanda. These sources have thousands of finance and economic data series from many market exchanges and other macroeconomic variables of most of the countries.

You can type ?function in the console or the R Script and run it to know more about the syntax of any function. This will display the R documentation of the function in the bottom-right pane:

```{r}
?getSymbols
```

Now, we will work with historical data of the Bitcoin cryptocurrency and the TESLA stock. We download daily quotations of these instruments from January 1, 2019 to date from Yahoo Finance:

```{r}
getSymbols(c("BTC-USD","TSLA"), from="2019-01-01", src="yahoo", periodicity="daily")
```

This function will create an xts-zoo R object for each ticker. Each object has the corresponding historical daily prices. xts stands for extensible time-series. An xts-zoo object is designed to easily manipulate time series data.

BTC-USD and TSLA are the ticker names in Yahoo Finance. The from argument is used to indicate the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case it is Yahoo Finance. Finally, the periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly) also as a character vector.

You can check the content of any of these dataset with View(). When tickers have special characters, we have to make reference to the object with simple quotes (``):

You can list the FIRST 5 rows of a dataset by using head():

```{r}
head(`BTC-USD`,5)
```
Also, you can list the LAST 5 rows of a data set. Note that you can change number of rows you want to display.

```{r}
tail(`BTC-USD`, 5)
```

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. Also, it keeps track of volume that was traded in every specific period. The adjusted prices are used for stocks, not for currencies. Adjusted prices considers dividend payments and also stock splits. Then, for the Bitcoin series we can use close of adjusted price to calculate daily returns.

Let’s see some of the benefits of using xts-zoo objects. We can, for example, select columns using any of the following functions, where x represents a generic xts zoo object:

Op(x): Extract the Opening prices of the period.
Hi(x): Extract the Highest price of the period.
Lo(x): Extract the Lowest price of the period.
Cl(x): Extract the closing prices of the period.
Vo(x): Extract the volume traded of the period.
Ad(x): Extract the Adjusted prices of the period.


## 2.1.2 Merging and cleaning financial datasets:
We can use the merge() function to create a consolidated dataset of one or more xts-zoo objects:

```{r}
prices <- merge(`BTC-USD`, TSLA)
# I can select only the adjusted prices in order to calculate returns:
adjprices <-Ad(prices) 
```

Now we have an xts-zoo objects with 2 columns. I can change the names of the columns with simple names:

```{r}
names(adjprices)<-c("bitcoin","tesla")
```

Now I can make reference to the adjusted prices using these names.

In Finance, when managing daily data it is very common to have gaps in the series. What does this mean? It means that the contains some missing days. For example, for stock series there is no data for weekends or holidays. However, R deals with gaps because it recognizes that we are working with a time series object. Thus, we have a time variable with NO GAPS, which avoids problems when computing returns. However, R does not deal automatically with empty values (called NA’s). It is a good idea to have a data set free of NA’s. So, I can use the function na.omit:

```{r}
adjprices <- na.omit(adjprices)
```

## 2.2 Return calculation

We can create xts-zoo objects for simple and continuously compounded returns of both instruments:

```{r}
# Calculating continuously compounded daily returns:
ccreturns <- diff(log(adjprices)) 
# Calculating simple daily returns:
returns <- adjprices / lag(adjprices,n=1) - 1
# We can also calculate simple returns using the diff function:
returns2<- diff(adjprices) / lag(adjprices,n=1)
```

The diff function works with xts to calculate the difference between the value of one period minus the previous one. The lag function gets the previous value of the time series.

We can calculate holding returns for any time period for each instrument (HPR). For example, if we want to calculate the whole period return from the first day of the series to the most recent one, we can do the following:

```{r}
HPR1<- 100* as.vector(last(adjprices)) / as.vector(first(adjprices)) - 1 
HPR1
```
```{r}
cat("The holding period return for Bitcoin is: ", HPR1[1], "%")
```
```{r}
cat("The holding period return for Bitcoin is: ", HPR1[2], "%")
```
We could also use the continuously compounded returns to calculate the same holding simple returns:

```{r}
HPR2<-100*exp(colSums(ccreturns,na.rm=TRUE)) - 1
HPR2
```
## 2.3 Visualization of financial data

One of the advantages of the quantmod package is that it has some built-in data visualization capabilities. For example, the chartSeries function is a plotting tool designed to create standard financial charts given a time series object. The chartSeries function also includes some arguments that allows the user to modify the cosmetics of the plot such as the theme argument.

Let’s see the performance of Bitcoin prices:

```{r}
chartSeries(`BTC-USD`, theme = ("white"))
```
Let’s see the performance of TESLA prices:
```{r}
chartSeries(TSLA, theme = ("black"))
```
We can see exponential growth of both, the Bitcoin prices and Tesla prices for the last days. If you had invested in Bitcoin or Tesla in July 2020 and has sold your position early January 2021, you would have multiplied your investment by around 4 times (300% period return)!

We can visualize the daily returns over time:

```{r}
plot(returns$bitcoin)
```

```{r}
plot(returns$tesla)
```
With this plot we can appreciate the daily volatility of each instrument over time. Volatility is a measure of how disperse the returns move up and down from its mean; it is basically the standard deviation of returns.

## 2.4 Basics of Portfolio analysis
We will use the PerformanceAnalytics and related packages for this section:

```{r}
# Load the packages:
library(tidyverse)
library(PerformanceAnalytics)
library(highcharter)
```

We will do an exercise of a simple portfolio composed by 5 US stocks: Tesla, Microsoft, Nextera, WalMart and Pfizer. You can change the tickers of the stocks as you wish as long as Yahoo Finance has data for your tickers. I selected stocks from the industries: high-tech, clean energies, pharmaceutical, and retail.

## 2.4.1 Automation of data collection and data management
With the following code we will automatically download data from the 5 stocks, merge the data and calculate returns:

```{r}
symbols <- c("TSLA","MSFT","NEE","WMT","PFE")

prices <- getSymbols(symbols,src = 'yahoo',
             periodicity = "monthly",
             from = "2018-01-01",
             auto.assign = TRUE,
             warnings = FALSE) %>%
  map(~Ad(get(.))) %>%
  reduce(merge) %>%
  `colnames<-`(symbols)
```
If you see I am doing several data management process in sequential process. The %>% is used to indicate the separation of each data management process. The previous code does the following:

Download price data of the 5 stocks

Apply (map) the adjusted function to all xts-zoo datasets in order to get the adjusted stock prices

Do the merge of all the xts-zoo objects into one object

Finally rename the columns of the integrated object, and return the object as prices.

## 2.4.2 Return calculations

We calculate returns of all stocks using the Return.calculate function:

```{r}
Returns<-Return.calculate(prices) %>%
  na.omit() 
```

The na.omit function was also applied to clean the dataset and drop the rows with NA values.

We calculate the continuous compounded return (also called log returns) of all stocks:

```{r}
returns<-Return.calculate(prices,method = "log") %>%
         na.omit() 
# We could calculate this using the diff and log functions:
returns2<-na.omit(diff(log(prices)))
# The returns and returns2 objects will have exactly the same returns
```

## 2.4.3 Descriptive statistics of returns
We calculate descriptive statistics for the returns of all stocks:

```{r}
table.Stats(returns) 
```
We can see arithmetic, geometric mean, median of returns. Also we can see risk measures such as standard deviation, variance. Finally we can see skewness and kurtosis, which are important financial measures to understand the data.

## 2.4.4 Visualization of risk and return of stocks
We can do a Box plot of the returns to better appreciate historical median return and risk by looking at the median, the quartiles Q1 and Q3, volatility and extreme values of these returns:

```{r}
chart.Boxplot(returns)
```
It is easy to see that Tesla is the stock with the highest risk. The red circles show the mean, the mid line is the median (50 percentile), the boxes include the 50% of the data from the Q1 (25 percentile) to the Q3 (75 percentile). The dots are considered extreme values for in the context of its own distribution.

Now we can start evaluating the performance over time for each stock by looking at how much $ we would have made if we had invested $1.00 in each stock at the beginning of the time periods:

```{r}
charts.PerformanceSummary(Returns, 
                          main = "Performance of $1.00 over time",
                          wealth.index = TRUE)
```

Since Tesla has had an extraordinary performance for the last months, it is hard to appreciate the performance of the rest of the stocks. So, we can drop Tesla from the plot:

```{r}
charts.PerformanceSummary(Returns[,2:5], 
                        main = "Performance of $1.00 over time",
                          wealth.index = TRUE)
```

## 2.5 Portfolio formation

I start creating the weights of 2 portfolios. One will be an equally-weighted portfolio, and the other will be an aggressive portfolio assigning high weights to risky stocks, and low weights to conservative stocks.

I start creating a vector of weights for the equally weighted portfolio:

```{r}
w_ew = rep(0.2,5)
# The rep function repeats a value n times
```

Now I create a vector of weights for the aggressive portfolio. I will assign the following weights:

Tesla: 40% Microsoft: 30% Nextera: 20% WalMart:0% Pfizer: 10%

```{r}
w_aggressive = c(0.4,0.3,0.2,0,0.1)
```

## 2.6 Portfolio return calculation

We calculate the historical return of the equally-weighted portfolio using the Return.portfolio function:

```{r}
portfolio_returns_ew <-
  Return.portfolio(Returns,
                   weights = w_ew) %>%
  `colnames<-`("returns")
```

We calculate the historical return of the aggressive portfolio using the Return.portfolio function:

```{r}
portfolio_returns_ag <-
  Return.portfolio(Returns,
                   weights = w_aggressive) %>%
  `colnames<-`("returns")
```

With the portfolio returns I can plot a performance chart of each portfolio:

```{r}
charts.PerformanceSummary(portfolio_returns_ew, 
                          main = "Equally-weighted Portfolio")
```

```{r}
charts.PerformanceSummary(portfolio_returns_ag, 
                          main = "Aggressive Portfolio Performance")
```
We finally do a dynamic plot of historical monthly returns of each portfolio using the highchart function from the highcharter package:

```{r}
highchart(type = "stock") %>% 
  hc_title(text = "Equally-weighted Portfolio") %>% 
  hc_add_series(portfolio_returns_ew$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)
```
We can interact with the plot to zoom in and out, and selecting time periods.

We do the same for the aggressive portfolio:

```{r}
highchart(type = "stock") %>% 
  hc_title(text = "Aggressive Portfolio") %>% 
  hc_add_series(portfolio_returns_ag$returns, 
                name = "Monthly Returns", 
                color = "cornflowerblue") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_navigator(enabled = FALSE) %>% 
  hc_scrollbar(enabled = FALSE) %>% 
  hc_legend(enabled = TRUE) %>% 
  hc_exporting(enabled = TRUE)
```

# 3 Introduction to Portfolio Theory

Regardless the approach you use to download and manipulate the source data, once that your data frame of monthly returns is ready for analysis you will be in a good position to create the Variance-covariance matrix which is needed to compute the expected risk of the portfolio.

The first step is to transform the data frame into a matrix using the as.matrix function. We save the result in a matrix class object called ret.mat.

```{r}
ret.mat <- as.matrix(Returns)
```

We compute the Var-Covariance matrix. The var function receives a matrix of returns as parameter. The returns of stock 1 are in column 1, returns of stock 2 are in column 2, and so on. This function calculates the Variance-Covariance Matrix of stock returns.

Remember that the Variance-Covariance matrix contains Variances of stock returns in the diagonal, and it contains the covariances of pairs of stock returns in the non-diagonals. Also, this matrix is symetric, since Cov(Reti, Retj) = Cov(Retj,Reti). More formally:

For example, the variance-covariance matrix of 2 variables X1 and X2 contains the variance of each variable in its diagonal and covariances between the variables in its non-diagonal terms. In other words, the elements in the upright part of the matrix are repeated in the down-left part of the matrix. More formally:

∑cov=[Var(X1)Cov(X2,X1)Cov(X1,X2)Var(X2)]

```{r}
COV <- var(ret.mat)
```

We can see the results of the Var-Covar matrix as shown below:

```{r}
COV
```

We can also calculate the Correlation Matrix to better understand the relationships of all different pairs of stock returns.

```{r}
cor(ret.mat)
```

Remember that the correlation between 2 stock returns can be any value between -1 and 1. When the correlation is close to 0, it means that both stocks have no relationship. If the correlation is close to 1 it means that both stock returns move in a very similar way in the same direction.

We calculate the expected simple returns of the stocks in another matrix using the apply function. It is important to double check that the second parameter is equal to 2. This means that we are applying the function mean by column. If we want to apply the mean function by raw, we have to specify the second parameter as 1.

```{r}
ER<- exp(apply(ret.mat, 2, mean)) - 1
```

This is the simple way to calculate the expected return of each stock according to Markowitz theory. Remember that the expected return of one stock is simply the Geometric average return of its historical returns.

To calculate this expected return we have 2 methods:

Calculate the products of Gross Returns of each period, substract 1, and then apply the N root to get the geometric average.

Get the arithmetic average of continuously compounded returns and then convert this amount to simple returns by raising e to this average and then substract 1.

We have estimated the geometric average using the method 2, but not only for one stock, but for all stocks in the the matrix ret.mat. The result will be a vector of geometric returns that will be saved in er. We can estimate the vector Mr of expected returns as:

MR=exp[Mr]=exp⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢r1¯r2¯...rN¯⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥−1

We can see the expected simple returns by calling the er object.

```{r}
ER
```

We calculate the variance, standard deviation, skewness and kurtosis of all asset returns (by columns). We do this just to see descriptive statistics of the stock returns.

```{r}
apply(ret.mat, 2, var)
```
```{r}
apply(ret.mat, 2, sd)
```
```{r}
apply(ret.mat, 2, skewness)
```
```{r}
apply(ret.mat, 2, kurtosis)
```
Remember what is skewness and kurtosis. Skewness is a measure of the shape of the probability distribution of the returns while kurtosis gives us information about how much the stock has had extreme negative and positive returns. In other words, kurtosis helps us to identify how big are the fat-tails of the return distribution. We can complement this with the histogram of returns to better appreciate the level of risk of the stock.

# 4 Calculation of expected variance and risk of a Portfolio
Assume that you invest in a portfolio with the following weights: Apple 30%, Walmart 10%, Microsoft 30%, GE 10% and Tesla 20%. Create a vector with these 5 weights:

```{r}
W <- c(0.30,0.10,0.30,0.10,0.20)
```

According to Portfolio Theory, the expected risk of a portfolio P is calculated as the squared root of the Portfolio Variance.

SD(P)=(Var(P))−−−−−−−−√

The portfolio variance can be calculated with the following matrix multiplication:

Var(P)=W′∗∑cov∗W
where:

W is the weight vector of the stocks, and W′ is the tranposed matrix of weights.

We now can estimate the expected return and expected risk of this portfolio.

Now I am ready to start estimating the expected return and risk of the portfolio using Matrix Algebra:

```{r}
ERP1 <- t(W)%*%ER
ERP1
```

The vector ERP1 will have one value with the expected return of the portfolio. The t function is the transpose function of a vector or matrix. The result of multiplying the vector transposed times the vector of expected returns of the stocks is the same as calculating a weighted average.

Now I can estimate the expected risk of the portfolio using Matrix Albegra:

```{r}
EVARPORT <- t(W)%*%COV%*%W
ERISK <- sqrt(EVARPORT)
ERISK
```
With this matrix multiplication I am applying Markowitz theory to estimate the expected variance of the portfolio. You can review why this is the case in the Notes about Portfolio Theory. Make sure you understand why we do this matrix multiplication.
