Workshop 3, Financial Modeling and Programming

Author

Alberto Dorantes, Ph.D.

Published

November 11, 2024

Abstract
This is an INDIVIDUAL workshop. In this workshop we review the basics of portfolio theory and portfolio optimization. We learn how to program estimations for portfolio return and risk, and portfolio optimization.

1 General directions for this Workshop

Using an RNotebook you have to:

  • Replicate all the R Code along with its output.

  • You have to do whatever is asked in the workshop. It can be: a) Responses to specific questions and/or do an exercise/challenge.

Any QUESTION or any INTERPRETATION you need to do will be written in CAPITAL LETTERS. For ANY QUESTION or INTERPRETATION, you have to RESPOND IN CAPITAL LETTERS right after the question.

  • It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your personal notebook. Your own workshop/notebook will be very helpful for your further study.

You have to submit ONLY the .html version of your .Rmd file.

2 What is a Portfolio?

A portfolio is a set of 2 or more financial assets. A financial asset can of any type such as stock, bond, risk-free instrument, derivative, commodity. The portfolio owner needs to decide how much money allocate to each individual asset. The main advantage of financial portfolios is the possibility to diversify risk, while maintaining an expected rate of return.

In this workshop we review practical easy examples of 2-asset and 3-asset portfolios with the purpose to illustrate portfolio theory and portfolio optimization.

It is recommended that you review the lecture notes posted in the course site where portfolio theory, matrix algebra, and portfolio optimization are explain in more detail.

3 Calculating historical returns and risk of a portfolio

3.1 Historical return of a portfolio

Let’s start with an example of calculating historical returns of a 2-asset portfolio. Imagine that we want to calculate how much return you would have realized if you had invested in the following portfolio between January 2019 to December 2020:

Wal Mart: 70%

Tesla: 30%

We start downloading historical monthly prices from Yahoo Finance:

library(quantmod)
library(tidyverse)
rm(list=ls())
tickers = c("WMT","TSLA")
getSymbols(Symbols = tickers, periodicity="monthly",from="2020-01-01", to="2024-10-31")
[1] "WMT"  "TSLA"
datasets_list <- lapply(tickers, get)

Now we merge all datasets using the function call. I do this to generalize our program for any number of tickers:

prices <- do.call(merge, datasets_list)
# We select only Adjusted prices:
prices <- Ad(prices)

# We change the name of the columns to be equal to the tickers vector:
names(prices) = tickers

We calculate holding period returns for both stocks. We can divide the price of each stock by its own price in the first month and then subtract 1:

firstprices = as.vector(prices[1,])

HPR = sweep(prices,2,firstprices,'/') - 1

#Also, the HPR can be calculated in the prices dataset as:
#prices$HPRWMT = prices$WMT / as.numeric(prices$WMT[1]) - 1 
#prices$HPRTSLA = prices$TSLA / as.numeric(prices$TSLA[1]) - 1 

The sweep function divides each row of prices by the fistprices vector. The parameter 2 indicates that the division is by row.

We set the weight for each stock to form a portfolio:

w1 = 0.70
w2 = 0.30

The holding period return for each month of this portfolio would be the weighted average of the holding period stock returns:

PortfolioHPR_t=0.7*HPR1_t+0.3*HPR2_t At the end of the first month the weights assigned to each stock will change depending on the return of each stock. For example, if stock 1 had a much higher return than stock 2, then the weight w1 will increase to more than 70% and weight w2 will be less than 30%.

We can calculate the holding portfolio return as follows:

HPR$Port = w1 * HPR$WMT + w2 * HPR$TSLA

The historical holding portfolio monthly return would look like:

plot(HPR$Port)

We can calculate how much $1 invested in our portfolio would have grew over time. We can use the HPR of the portfolio and add 1 to get a growth factor for each month:

HPR$invport = 1 * (1+HPR$Port)

plot(HPR$invport)

We can do the previous portfolio return calculations and the same plot using the PerformanceAnalytics package:

library(PerformanceAnalytics)
# I create a vector with the portfolio weights:
w = c(w1,w2)

# I create the monthly return for each stock:

R = prices / lag(prices,1) - 1 

# I calculate the portfolio historical returns using the weight vector and the stock historical returns:
PR <- Return.portfolio(R,weights=w)

# The charts.PerformanceSummary function calculates and plots the $1.0 performance invested in the portfolio:
charts.PerformanceSummary(PR, 
                          main = "Performance of $1.00 over time",
                          wealth.index = TRUE)

This plot shows the same portfolio performance than the previous one, but with this we can also see the monthly return and the drowdowns, which are the negative % declines over time.

3.2 Historical risk (volatility) of a portfolio

The historical risk of a portfolio can be measured with the standard deviation of the portfolio historical period returns. Another important measure for portfolio risk is Value at Risk, but we will not cover that measure in this workshop.

I recommend to use the continuously compounded (cc) returns to calculate the standard deviation of returns. The main reason is that cc returns behave more like a normal distribution, and also because cc returns are more conservative measure than simple returns: for negative returns, cc returns have higher magnitude, and for positive returns, cc returns are less than simple returns.

Then, we can calculate the cc returns of the monthly portfolio returns as follows. We can use the monthly returns calculated by the Return.Portfolio function and convert it to continuously compounded returns:

pr = log(1+PR$portfolio.returns) 

Now we can use the table.Stats from PeformanceAnalytics to quickly calculate the standard deviation and also other important descriptive statistics of returns:

returns_statistics = table.Stats(pr)
returns_statistics
                portfolio.returns
Observations              58.0000
NAs                        0.0000
Minimum                   -0.2776
Quartile 1                -0.0597
Median                     0.0151
Arithmetic Mean            0.0208
Geometric Mean             0.0144
Quartile 3                 0.0837
Maximum                    0.3188
SE Mean                    0.0152
LCL Mean (0.95)           -0.0097
UCL Mean (0.95)            0.0513
Variance                   0.0134
Stdev                      0.1159
Skewness                   0.3066
Kurtosis                   0.1916
returns_statistics["Stdev",]
[1] 0.1159

Also we can use the sd and the mean function:

portfolio_volatility = sd(pr,na.rm = TRUE)
portfolio_volatility 
[1] 0.1159008
portfolio_mean_return = mean(pr,na.rm=TRUE)
portfolio_mean_return
[1] 0.02081747

Remember that in Finance, the standard deviation of returns is usually called volatility.

We can say that the monthly volatility of this portfolio composed of 70% of Wal Mart and 30% of Tesla calculated from 2018 up to Oct 2022 was 11.5900759%.

We can complement this including the mean historical return of the portfolio. We can say that this portfolio had an average monthly return of 2.0817467%, and a monthly volatility of 11.5900759%.

We can also appreciate the historical risk of a portfolio if we do a Box Plot of its monthly returns:

chart.Boxplot(PR)

We can compare this portfolio risk with the individual stock risk:

chart.Boxplot(merge(R,PR))

The box contains the returns between the quartile 1 (Q1) and the quartile 3 (Q3), while the mean return is the red point and the mdian (Q2) is the mid vertical line.

Can you appreciate the risk of the stocks and the portfolio?

It is very important to be aware of the granularity of the historical data used to calculate volatility and mean returns. Granularity of a dataset refers to the frequency of periods. In this case, we are using monthly returns, so the granularity is monthly.

Then, we can have a better interpretation of portfolio volatility and portfolio return if we convert both values from monthly to annual. It is very common to report annual mean portfolio return and annual portfolio volatility based on monthly data.

To convert monthly portfolio mean return to annual portfolio mean return, we just multiply the monthly average return times 12:

annual_portfolio_mean_return = 12 * portfolio_mean_return
annual_portfolio_mean_return
[1] 0.2498096

However, to convert from monthly to annual volatility, we need to multiply by the square root of the numbers of the periods in the year, in this case, 12:

annual_portfolio_volatility = sqrt(12) * portfolio_volatility
annual_portfolio_volatility
[1] 0.401492

Now we can do a better interpretation using these annual figures:

We can say that this portfolio had an average annual return of 24.9809605%, and an annual volatility of 40.1492005%.

In the following sections we review what is variance and standard deviation of returns, and how they are calculated (without using a function)

3.2.1 What is variance of returns?

Variance of any variable is actually the arithmetic mean of squared deviations. A squared deviation is the value resulting from subtracting the value of the variable minus its mean, and the square the value.

The mean of returns is estimated as:

\bar{r} =\frac{r_{1}+r_{2}+...+r_{N}}{N}

The variance is estimated as:

VAR(r)=\frac{(r_{1}-\bar{r})^{2}+(r_{2}-\bar{r})^{2}+...+(r_{N}-\bar{r})^{2}}{N}

Variance is a measure of dispersion. The higher the variance, the more dispersed the values from its mean. It is hard to interpret the magnitude of variance. That is the reason why we need to calculate standard deviation, which is basically the squared root of the variance.

3.2.2 What is standard deviation?

Standard deviation of a variable is the squared root of the variance of the variable:

SD(r)=\sqrt{VAR(r)} Then, the standard deviation of returns can be interpreted as a standardized average distance from each value of the variable from its mean.

The standard deviation of returns is called volatility. Then, volatility of returns tells us an idea of how much on average (above or below) the period returns move from its mean.

We can calculate volatility of a single stock, or volatility of a portfolio composed of 2 or more stocks.

3.3 Historical Sharpe ratio of a portfolio

The Sharpe ratio is a standardized measure of portfolio premium return after considering its volatility.

A premium return is the return above the risk-free rate. In Mexico the risk-free rate is the CETES; in the US, the risk-free rate is the Treasury Bills.

Then, the Sharpe ratio tells us how much portfolio returns (above the risk free rate) we can expect for each percent point of volatility. Let’s see the formula:

SharpeRatio=\frac{(PortfolioReturn-riskfreeRate)}{PortfolioVolatility}

4 Calculating expected portfolio return

Up to now we have calculated historical portfolio returns and risk. Here we review how to estimate future expected portfolio returns and risk based on Portfolio Theory developed by Harry Markowitz.

4.1 Calculating expected asset return

We will use the simpler method to estimate the expected return of each stock, which is the geometric mean of historical returns. Other methods to estimate expected stock return are a) the CAPM regression model, b) ARIMA models.

The geometric mean of historical returns is the average period return needed so that holding an investment at that geometric mean return per period, we will get the final holding return of the stock.

The mathematical formula of geometric mean return is:

GeomMean(R)=\sqrt[N]{(1+R_{1})(1+R_{2})...(1+R_{N})}-1 Where R_t is the historical return for period t. In this formula we have N historical periods (can be months)

Another easier way to calculate geometric mean of returns is to calculate the arithmetic mean of continuously compounded returns, and then convert the result to simple return by applying the exponential function:

GeomMean(R)=e^{\bar{r}}-1 Where \bar{r} is the arithmetic mean of historical returns:

\bar{r}=\frac{r_{1}+r_{2}+...+r_{N}}{N}

Let’s do an example. Calculate the expected monthly return of Wal Mart and Tesla using the same historical data from 2018 to Oct 2022.

We need to calculate the continuously compounded returns of each stock:

#Calculating continuously compounded returns:
r = diff(log(prices))

Now we get the expected return for each stock as the geometric mean of their historical returns:

ccmean_returns  = colMeans(r,na.rm=TRUE)

colMeans calculate the arithmetic mean of 1 or more columns of a dataset.

Once we have the arithmetic mean cc returns, we convert them to simple returns:

ER = exp(ccmean_returns) - 1
ER
       WMT       TSLA 
0.01479262 0.03119713 

Now that we have individual expected returns we can estimate the expected return of a portfolio composed of the stocks.

4.2 Calculating expected portfolio return

The expected portfolio returns is the weighted average of the individual expected return of the stocks of the portfolio.

Imagine we have a 2-stock portfolio composed as follows:

WalMart: 70%

Tesla: 30%

4.2.1 Method 1: weighted average using a sum of products

We use the weights (%) allocated for each asset to calculate the weighted average as the portfolio expected return:

w1 = 0.7
w2 = 0.3
ER_Portfolio = w1 * ER[1] + w2 * ER[2]
#  ER is a vector of 2 numbers: the expected return of stock 1 and the expected return of stock 2 
names(ER_Portfolio)=c("ER_Portfolio")
ER_Portfolio
ER_Portfolio 
  0.01971397 

Then, the expected return of this portfolio for the future month is 1.9713969%.

4.2.2 Method 2: weighted average using matrix algebra

Another way to calculate the expected return of a portfolio is using matrix algebra. This is a very useful method when we have many assets in the portfolio since it is very easy to compute.

If you do not remember how to multiply matrices, it is strongly recommended to review this (you can read Note 2 of Portfolio Theory).

Matrix multiplication is used to compute sum of products or sum of multiplications.

For example, the way to estimate the expected return of our portfolio is the following:

ERPort=t\left(W\right)*ER

ERPort=\begin{bmatrix}w_{1} & w_{2}\end{bmatrix}*\begin{bmatrix}ER_{1}\\ ER_{2} \end{bmatrix}

We compute this in R as follows:

# We set a vector composed of the asset weights:
W = c(w1,w2)
# We multiply the transpose of the weight vector times the vector of expected returns
ER_portfolio = t(W) %*% ER
ER_portfolio
           [,1]
[1,] 0.01971397

The transposed of a matrix or a vector is the conversion of the rows by columns and columns by rows. Transposing is like rotating a vector or a matrix 90 degrees:

t\left(\left[\begin{array}{c} w_1\\ w_2 \end{array}\right]\right)=\left[\begin{array}{cc} w_1 & w_2\end{array}\right]

Then, the previous matrix multiplication was:

\begin{bmatrix}w_{1} & w_{2}\end{bmatrix}*\begin{bmatrix}ER_{1}\\ ER_{2} \end{bmatrix}

\begin{bmatrix}0.7 & 0.3\end{bmatrix}*\begin{bmatrix}0.00775\\ 0.05588 \end{bmatrix}=0.7*0.00775+0.3*0.05588

With this multiplication we got the same expected portfolio return as above.

5 Expected portfolio risk

Before we calculate the expected portfolio risk we need to understand what is the expected portfolio variance.

According to Portfolio Theory, the expected variance of a 2-asset portfolio returns is given by:

VAR(PortReturns)=w_{1}^{2}VAR(r_{1})+w_{2}^{2}VAR(r_{2})+2w_{1}w_{2}COV(r_{1},r_{2}) r_1 refers to returns of stock 1, and r_2 refers to returns of stock 2.

COV(r_1,r_2) is the covariance of return 1 and return 2.

Check the lecture note Basics of Portfolio Theory-Note 1 to understand why this is the way to estimate the expected variance of a 2-asset portfolio.

It is worth to remember what is covariance.

5.1 What is covariance of 2 stock returns?

The covariance of 2 stock returns is the arithmetic mean of the product return deviations. A deviation is the difference between the stock return in a period t and its mean. Here is the formula:

COV(r_{1},r_{2})=\frac{(r_{(1,1)}-\bar{r_{1}})(r_{(2,1)}-\bar{r_{2}})+(r_{(1,2)}-\bar{r_{1}})(r_{(2,2)}-\bar{r_{2}})+(r_{(1,3)}-\bar{r_{1}})(r_{(2,3)}-\bar{r_{2}})+...}{N}

Were:

r_{(1,1)} is the return of stock 1 in the period 1

r_{(2,1)} is the return of stock 2 in the period 1

Then:

r_{(i,j)} is the return of stock i in the period j

\bar{r_{1}} is the average return of stock 1

\bar{r_{2}} is the average return of stock 2

Then, in the numerator we have a sum of product deviations. Each product deviation is the deviation of the stock 1 return multiplied by the deviation of the stock 2 return.

The covariance is a measure of linear relationship between 2 variables. If covariance between stock return 1 and stock return 2 is positive this means that both stock returns are positively related. In other words, when stock 1 return moves up it is likely that stock 2 return moves up and vice versa; both returns move in the same direction (not always, but mostly).

If covariance is negative that means that stock 1 return is negatively related to stock 2 return; when stock 1 return moves up it is likely that stock 2 return moves down.

Covariance can be a negative or positive number (it is not limited to any number). It is very difficult to interpret the magnitude of covariance. It is much more intuitive if we standardize covariance.

The standardization of the covariance is called correlation.

5.2 What is correlation of 2 stock returns?

Correlation of 2 stock returns is also a measure of linear relationship between both returns. The difference compared to covariance is that the possible values of correlation is between -1 and +1, and the correlation gives us a percentage of linear relationship.

Correlation between 2 returns is the covariance between the 2 returns divided by the product of standard deviation of return 1 and standard deviation of return 2.

We calculate correlation as follows:

CORR(r_{1},r_{2})=\frac{COV(r_{1},r_{2})}{SD(r_{1})SD(r_{2})}

Correlation has the following possible values:

-1<=CORR(r_{1},r_{2})<=+1

5.3 Expected variance of a portfolio

Now we can calculate the expected variance of our portfolio according to the previous formula:

5.3.1 Method 1: using sum of products

VAR1 = var(r$WMT,na.rm=TRUE)
VAR2 = var(r$TSLA,na.rm=TRUE)
COV = var(x=r$WMT,y=r$TSLA,na.rm=TRUE)

VarPortfolio = w1^2 * VAR1 + w2^2 * VAR2 + 2*w1*w2*COV
VarPortfolio
            WMT
WMT 0.006121081

5.3.2 Method 2: using matrix algebra

Another faster method to get the expected variance of a portfolio is by multiplying the following matrices:

t\left(W\right)*COV*W Where:

W=\begin{bmatrix}w_{1}\\ w_{2} \end{bmatrix}

And COV is the Variance-Covariance matrix, which has the return variances in the diagonal and the pair correlations in the non-diagonal:

COV=\begin{bmatrix}VAR(r_{2}) & COV(r_{1},r_{2})\\ COV(r_{2},r_{1}) & VAR(r_{1}) \end{bmatrix}

We can easily calculate the expected portfolio variance with matrix multiplication in R as follows:

# We calculate the Variance-Covariance matrix with the var function:
COV =var(r,na.rm=TRUE)
# Note var function calculates the variance-covariance matrix if r has more than 1 column.
# If r has only 1 column, then var calculates only the variance of that column.
COV
             WMT        TSLA
WMT  0.002988332 0.002924079
TSLA 0.002924079 0.038096498
VarPortfolio = t(W) %*% COV %*% W
VarPortfolio
            [,1]
[1,] 0.006121081

It is always good to review the correlation matrix of the asset returns of a portfolios. We can get the correlation matrix from the covariance matrix as follows:

CORR = cov2cor(COV)
CORR
           WMT      TSLA
WMT  1.0000000 0.2740515
TSLA 0.2740515 1.0000000

In this case we see 1’s in the diagonal since the correlation of an asset return with itself will always be equal to 1. In the diagonal we see the level of correlation between both asset returns.

We got the same result as above, but the result now is a matrix of 1 row and 1 element.

5.4 Expected risk of a portfolio

To get the expected risk of a portfolio, we simply get the squared root of the expected variance:

PortRisk=SD(PortReturns)=\sqrt{VAR(PortReturns)}

We do this in R as follows:

PortRisk = sqrt(VarPortfolio[1,1])
PortRisk
[1] 0.07823733

The expected portfolio monthly risk is 7.8237335%. Remember that portfolio risk is also called portfolio volatility. We can annualize portfolio volatility:

AnnualPortRisk = sqrt(12) * PortRisk
AnnualPortRisk
[1] 0.2710221

The expected portfolio annual risk is 27.1022077%.

5.5 Drivers of Portfolio Diversification

Financial portfolios allow an investor to diversify the risk. The main drivers of portfolio diversification are:

  1. N - the number of financial assets. The more the financial assets, the higher the expected diversification.

  2. Combination of pair of correlations between asset returns. The less the correlation of the pair of assets, the more the diversification.

The first driver is just given by N, the number of assets.

The second driver can actually be manipulated by changing the weights allocated for each asset.

In the following section we illustrate how changing asset weights actually change both, the expected return of the portfolio and the expected portfolio volatility.

6 Risk-return space - set of feasible portfolios

Weight combination determines both expected risk and expected return of a portfolio. Let’s do an example with the same 2-asset portfolio:

6.1 Frontier of 2-asset portfolios

Let’s create a set of 11 portfolios where we change the weight assigned to the first stock from 0 to 1 changing by 0.10. We can create a matrix of weights, where each column represents one portfolio:

W=\begin{bmatrix}0 & 0.1 & 0.2 & 0.3 & 0.4 & 0.5 & 0.6 & 0.7 & 0.8 & 0.9\\ 1 & 0.9 & 0.8 & 0.7 & 0.6 & 0.5 & 0.4 & 0.3 & 0.2 & 0.1 \end{bmatrix}

We had already calculated the vector of expected return for each stock:

ER=\begin{bmatrix}ER_{1}\\ ER_{2} \end{bmatrix}=\begin{bmatrix}0.00775\\ 0.05588 \end{bmatrix}

Using the advantage of matrix algebra we can easily get the 11 portfolio expected returns as follows:

ERPortfolios=t(W)*ER

We compute this in R:

# I first create a vector for the weights of the first stock:
w1= seq(from=0,to=1,by=0.1)
w1
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
# The weight vector for stock 2 is just the complement of w1:
w2 = 1 - w1
w2
 [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
# I construct the weight matrix with the cbind function:

W = rbind(w1,w2)
W
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
w1    0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8   0.9     1
w2    1  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2   0.1     0
# I compute the expected return of the 11 portfolios:
ERPortfolios = t(W) %*% ER

We see that the portfolios with higher expected returns are the first, which are the portfolios with higher weight in stock 2.

Now we can compute the expected risk of the 11 portfolio as follows:

VARPortfolios = t(W) %*% COV %*% W

When we have more than 1 portfolio in the weight matrix, the result of the previous matrix multiplication will be a matrix of 11 rows and 11 columns. The expected variance of the 11 portfolio result in the diagonal of the previous multiplication:

VARPortfolios = diag(VARPortfolios)
VARPortfolios
 [1] 0.038096498 0.031414381 0.025436997 0.020164347 0.015596430 0.011733247
 [7] 0.008574797 0.006121081 0.004372098 0.003327848 0.002988332

Finally, the expected portfolio volatility is the squared root of these 11 expected variances:

RISKPortfolios = sqrt(VARPortfolios)
RISKPortfolios
 [1] 0.19518324 0.17724102 0.15948980 0.14200122 0.12488567 0.10832011
 [7] 0.09260020 0.07823733 0.06612184 0.05768750 0.05466564

Now that we have the 11 portfolio risk and 11 portfolio returns we can plot the 11 portfolios in the risk-return space:

plot(x=RISKPortfolios,y= ERPortfolios,
     xlabel="Portfolio Expected Risk", ylabel="Portfolio Expected Return")
text(x=RISKPortfolios,y= ERPortfolios,
     labels=w1, cex=0.7, pos=3)

Weights of stock 1 (WMT) are shown in each portfolio. We see that less the w1, the less the expected portfolio risk, but the less the expected portfolio return.

With this illustration we can see how asset weight combination in a portfolio actually change the expected return and risk of a portfolio!

This plot is called the frontier of feasible portfolios of 2 assets. Interestingly, for portfolios with 3 or more assets, the possibilities of feasible portfolios become will not be a nice frontier like this. The feasible portfolios will be in between a frontier. In other words, for these portfolios some feasible portfolios will not be efficient since there will be other feasible portfolios with the same level of expected risk, but with higher expected return.

Let’s do an example of 4-asset portfolios.

6.2 Risk-return space of 4-asset portfolios

We will add 2 stocks to our portfolio: Oracle (ORCL) and Freeport-McMoran, Inc (FCX). This last firm is a mining company. I will collect the data for the 4 assets using the same code we used above:

tickers = c("WMT","TSLA","ORCL","FCX")
getSymbols(Symbols = tickers, periodicity="monthly",from="2020-01-01", to="2024-10-31")
[1] "WMT"  "TSLA" "ORCL" "FCX" 
datasets_list <- lapply(tickers, get)

# Merging the datasets:
prices <- do.call(merge, datasets_list)
# We select only Adjusted prices:
prices <- Ad(prices)

# We change the name of the columns to be equal to the tickers vector:
names(prices) = tickers

#Calculating continuously compounded returns:

r = diff(log(prices))

Since we have now 4 assets, let’s create many feasible portfolios with random weights, but with the condition that the sum of the weights must be 1 (100%). Let’s create 1,000 random portfolios:

# Initialize the Weight matrix
W = c()
# Create 4 vectors of 1000 random values from 0 to 1:
for(i in 1:4) {
  W = rbind(W,runif(1000))
}
# The problem I have now is that some of the 1,000 portfolios
#  might end up having a sum of weights higher than one. 
# I can do a simple "trick" by 
#   dividing each of the 4 weights by the sum of these 4
#   weights. And I can do this # for all 1000 portfolios: 

# I first create a vector with the sum of weights for all portfolios:
sumw <- colSums(W)
# I do another loop to divide each weight by the sum of weights: 
for(i in 1:4)  {
  W[i,]<-W[i,]/sumw
  # In each iteration I divide one raw of W2 by the vector sumw, 
  #  which is the sum of the weights of all 1000 portfolios
}

# I check that the sum of weights is 1 (I do this for 5 portfolios)
W[,1:5]
           [,1]       [,2]      [,3]      [,4]       [,5]
[1,] 0.30533803 0.22863123 0.2467750 0.3076700 0.19416852
[2,] 0.02561224 0.10385961 0.2982268 0.1418303 0.46844257
[3,] 0.38843321 0.07696658 0.1547991 0.2489081 0.09539128
[4,] 0.28061653 0.59054258 0.3001990 0.3015917 0.24199763
colSums(W[,1:5])
[1] 1 1 1 1 1
# All sums are equal to 1, as expected

# Then each column of this matrix represents a random portfolio without allowing for short sales. 

I calculate the expected return of each asset:

ER = exp(colMeans(r,na.rm=TRUE)) - 1
ER
       WMT       TSLA       ORCL        FCX 
0.01479262 0.03119713 0.02195815 0.02579882 

I calculate the variance-covariance matrix:

COV = var(r,na.rm=TRUE)
COV
             WMT        TSLA        ORCL         FCX
WMT  0.002988332 0.002924079 0.001617215 0.001719990
TSLA 0.002924079 0.038096498 0.004776271 0.012022469
ORCL 0.001617215 0.004776271 0.006880790 0.004332892
FCX  0.001719990 0.012022469 0.004332892 0.019741561

I calculate the correlation matrix to review the correlation levels of pair of assets:

CORR = cov2cor(COV)
CORR
           WMT      TSLA      ORCL       FCX
WMT  1.0000000 0.2740515 0.3566435 0.2239343
TSLA 0.2740515 1.0000000 0.2950036 0.4383898
ORCL 0.3566435 0.2950036 1.0000000 0.3717646
FCX  0.2239343 0.4383898 0.3717646 1.0000000

The lowest correlations are the one between ORCL and TSLA and the correlation between WMT and FCX (around 0.12).

We can calculate the expected return of the 1,000 portfolios:

ERPortfolios = t(W) %*% ER

We can calculate the expected risk of the 1,000 portfolios:

VARPortfolios = diag(t(W) %*% COV %*% W) 
RISKPortfolios = sqrt(VARPortfolios)

Now we can visualize the feasible portfolios in the risk-return space:

plot(x=RISKPortfolios,y= ERPortfolios,
     xlabel="Portfolio Expected Risk", ylabel="Portfolio Expected Return")

We can see that for each level of expected risk there are many possible portfolios. The best portfolios will be those that lie in the frontier of this feasible portfolios.

The problem we have when we have 3 or more assets in a portfolio is that we need to find the efficient frontier to get only those portfolios that maximize the expected return for each level of expected risk.

Here is where we need to find optimization algorithms to do this.

In the lecture note Basics of Portfolio Theory - Part III I explain what type of optimization methods can be used.

Fortunately, we can use R packages that already do these complicated optimization algorithms.

7 Portfolio optimization

Out of all feasible portfolios we might be interested in the following:

  1. The portfolio with the least expected risk of all - The Global Minimum Variance Portfolio (GMV)

  2. The efficient frontier - all portfolios that offer the highest expected return for any level of expected risk

  3. The tangent/optimal portfolio - The portfolio with the highest Sharpe Ratio

You need to install the IntroCompFinR package. You need to install this package using the Console, and typing:

install.packages(“IntroCompFinR”, repos=“http://R-Forge.R-project.org”)

If you copy.paste this line from this Workshop, re-type the double quotes (“) before you run in the Console.

7.1 Estimating the global minimum variance portfolio

We can easily get the GMV portfolio as follows:

library(IntroCompFinR)

GMVPort = globalMin.portfolio(ER,COV)
GMVPort
Call:
globalMin.portfolio(er = ER, cov.mat = COV)

Portfolio expected return:     0.01618212 
Portfolio standard deviation:  0.05167417 
Portfolio weights:
    WMT    TSLA    ORCL     FCX 
 0.7916 -0.0280  0.1961  0.0403 

Allowing for short-sales, the portfolio with the minimum variance has an expected return of 0.0161821, and expected risk of 0.0516742. We see that FCX has a negative weight, meaning that we need to short FCX with 3.4%.

We can estimate the GMV portfolio without allowing for short sales:

GMVPort = globalMin.portfolio(ER,COV, shorts = FALSE)
GMVPort
Call:
globalMin.portfolio(er = ER, cov.mat = COV, shorts = FALSE)

Portfolio expected return:     0.01645 
Portfolio standard deviation:  0.05189667 
Portfolio weights:
   WMT   TSLA   ORCL    FCX 
0.7828 0.0000 0.1908 0.0263 

Now the GMV assigned 0% to FCX and 0% to TSLA. This makes sense since both stocks are very volatile. We can check the volatility of each of the stock returns:

table.Stats(r)
                    WMT    TSLA    ORCL     FCX
Observations    57.0000 57.0000 57.0000 57.0000
NAs              1.0000  1.0000  1.0000  1.0000
Minimum         -0.1734 -0.4578 -0.1941 -0.3890
Quartile 1      -0.0174 -0.1187 -0.0241 -0.0762
Median           0.0171  0.0214  0.0197  0.0263
Arithmetic Mean  0.0147  0.0307  0.0217  0.0255
Geometric Mean   0.0132  0.0124  0.0184  0.0155
Quartile 3       0.0508  0.1719  0.0790  0.1270
Maximum          0.1179  0.5547  0.2456  0.2993
SE Mean          0.0072  0.0259  0.0110  0.0186
LCL Mean (0.95)  0.0002 -0.0211 -0.0003 -0.0118
UCL Mean (0.95)  0.0292  0.0825  0.0437  0.0628
Variance         0.0030  0.0381  0.0069  0.0197
Stdev            0.0547  0.1952  0.0830  0.1405
Skewness        -0.6311  0.2881  0.1072 -0.2875
Kurtosis         0.9903  0.0331  0.2674  0.1121

We can see that the historical volatility (Stdev) of Tesla and FCX are much higher than the volatility of ORCL and WMT.

7.2 Estimating the efficient frontier

efrontier <- efficient.frontier(ER, COV, nport = 100, 
                         alpha.min = -0.5, 
                         alpha.max = 1.5, shorts = FALSE)
plot(efrontier, plot.assets=TRUE, col="blue")

7.3 Estimating the optimal/tangent portfolio

We need to define a risk-free rate before estimating the optimal portfolio, since the returns that are important for an investor is the premium returns, which are the returns above the risk-free instrument.

In this case, I check the current risk-free rate or the risk-free rate at the time of the portfolio formation. In this case, I assume that I am at the end of Oct 2024 (I downloaded stock price data until Oct 31, 2024), which was around 5.5% annual.

It is important that I have to use the corresponding rate at granularity of my data. In this case, I am using monthly data, so I need to convert the risk-free rate from annual to monthly:

annual_rfree = 0.055 
# I convert from simple to continuously compounded:
ccannual_rfree = log(1+annual_rfree)
# I convert from annual cc rate to monthly cc rate:
rfree = ccannual_rfree / 12 
rfree
[1] 0.004461731
tangentPort = tangency.portfolio(ER, COV,rfree, shorts=FALSE)
tangentPort
Call:
tangency.portfolio(er = ER, cov.mat = COV, risk.free = rfree, 
    shorts = FALSE)

Portfolio expected return:     0.01926056 
Portfolio standard deviation:  0.05806525 
Portfolio weights:
   WMT   TSLA   ORCL    FCX 
0.4861 0.0461 0.3740 0.0937 
tangentPortWeights = getPortfolio(ER, COV, weights=tangentPort$weights)
plot(tangentPortWeights, col="blue")

Finally, we can plot the efficient frontier, the tangent portfolio and the Capital Market Line:

plot(efrontier, plot.assets=TRUE, col="blue", pch=16)
points(GMVPort$sd, GMVPort$er, col="green", pch=16, cex=2)
points(tangentPort$sd, tangentPort$er, col="red", pch=16, cex=2)
text(GMVPort$sd, GMVPort$er, labels="GLOBAL MIN", pos=2)
text(tangentPort$sd, tangentPort$er, labels="TANGENCY", pos=2)

SharpeRatio = (tangentPort$er - rfree)/tangentPort$sd

abline(a=rfree, b=SharpeRatio, col="green", lwd=2)

The Capital Market Line (CML) is the green line that goes from the risk-free rate to the tangent portfolio. One of the main findings of portfolio theory is that when we add 1 risk-free instrument to a portfolio composed of stocks, the new efficient frontier becomes the Capital Market Line (instead of the hyperbola), which is more efficient that the previous efficient frontier (the hyperbola).

Then, an investor can play between the tangent portfolio and the risk-free rate to move in the CML. If an investor has a middle-level aversion to risk, he/she might allocate 50% to the risk-free asset and the rest 50% in the tangent portfolio, locating the portfolio in a mid-point in the CML (the green line) between the risk-free rate and the tangent portfolio.

7.4 Assumptions of Portfolio Theory

The main assumption of portfolio theory is that all investors behave as rational participants all the time. In other words, they always maximize return and minimize risk using the available information disclosed to the market in a rational way. In other words, they act with no emotions, no fear and they always understand what happen in the market.

8 What if most of the investors are not rational all the time?

If most of the investors are not rational all the time, how can you take advantage of this and define a disruptive portfolio? Which trends, beliefs, fears, regulations, opportunities do you think that few investors are looking for, so that you can beat the rational optimized portfolio?

Based on your own set of beliefs and your understanding of portfolio risk-return trade-off, propose weights for a “disruptive” portfolio that might beat most of the optimized portfolios in the market?

9 Challenge

Create a vector of 10 tickers (select any you prefer). If you prefer, select the best 10 tickers you got from Workshop 2.

With this vector, bring monthly data from Yahoo from Jan 2020 to Oct 2024, and do the following:

1.- Generate the Global Minimum Variance Portfolio

2.- Generate 10,000 portfolios with random weights for each asset.

3.- (Optional) Propose 2 portfolios: Conservative, Aggressive. Include the description of each portfolio, expected return and volatility.

10 Challenge Solution

I will use the best stocks I got from Workshop 2. Then, I will do the same stock screening (Solution of Workshop 2) here before I optimize the portfolio with the selected stocks:

# Load the datasets 
uspanel = read.csv("dataus2024.csv")
usfirms = read.csv("firmsus2024.csv")

I calculate the period EBIT for all stocks-quarters:

# I check the cases with sgae=NA and revenue>0: 
uspanel |> filter(q=="2024q2", !is.na(revenue), is.na(sgae)) |> select(firm,q,revenue,cogs,sgae)
       firm      q   revenue     cogs sgae
1       AFL 2024q2  10575000        0   NA
2       AIG 2024q2  13323000        0   NA
3       AJX 2024q2    -28436    54044   NA
4       ALL 2024q2  30973000        0   NA
5      AMSF 2024q2    156319        0   NA
6      AMTB 2024q2    324942   273607   NA
7       ARR 2024q2    271405   305239   NA
8       ASB 2024q2   1183806   932729   NA
9      AUBN 2024q2     20596    16641   NA
10      AXP 2024q2  36204000 26732000   NA
11      BAC 2024q2  96600000 78951000   NA
12     BCBP 2024q2     97605    80773   NA
13      BFH 2024q2   2418000  1438000   NA
14     BFIN 2024q2     37737    32798   NA
15       BK 2024q2  19542000 16664000   NA
16      BOH 2024q2    508258   410187   NA
17     BOKF 2024q2   1738434  1404510   NA
18      BWB 2024q2    122860   100648   NA
19     BXMT 2024q2    269165    63041   NA
20        C 2024q2  86453000 72758000   NA
21     CCBG 2024q2    133289    98574   NA
22      CFB 2024q2    253627   203081   NA
23     CFFI 2024q2     81836    65557   NA
24      CFR 2024q2   1404352  1038203   NA
25     CINF 2024q2   5479000        0   NA
26      CMA 2024q2   2517000  2067000   NA
27      CNA 2024q2   6963000        0   NA
28     CNOB 2024q2    267861   212534   NA
29      COF 2024q2  26331000 17506000   NA
30      CPF 2024q2    173607   129861   NA
31     CRBG 2024q2   9546000        0   NA
32     DCOM 2024q2    342702   281600   NA
33      DFS 2024q2  11656000  6946000   NA
34       DX 2024q2    199844   168029   NA
35     EFSC 2024q2    447019   328628   NA
36      EIG 2024q2    440100        0   NA
37      EQH 2024q2   5740000        0   NA
38     ERIE 2024q2    357926        0   NA
39      FAF 2024q2   3036900        0   NA
40     FBIZ 2024q2    127875   100864   NA
41     FCBC 2024q2     91419    57560   NA
42     FFIN 2024q2    363818   228605   NA
43      FHN 2024q2   2546000  1927000   NA
44     FIBK 2024q2    733400   564000   NA
45     FLIC 2024q2     89680    79505   NA
46  FNM_old 2024q2  75100000 64543000   NA
47  FRE_old 2024q2  59507000 52018000   NA
48     FRME 2024q2    530253   404995   NA
49     GBCI 2024q2    615429   513073   NA
50      GNW 2024q2   3633000        0   NA
51       GS 2024q2  63089000 53336000   NA
52     HBAN 2024q2   5814000  4511000   NA
53     HBCP 2024q2     96888    73966   NA
54      HBT 2024q2    140021    92846   NA
55      HIG 2024q2  12905000        0   NA
56     HOMB 2024q2    728791   452487   NA
57     HTGC 2024q2    217855    85039   NA
58      HTH 2024q2    791680   715973   NA
59     IBOC 2024q2    513523   243673   NA
60     ISTR 2024q2     77010    67871   NA
61      JPM 2024q2 142257000 96593000   NA
62      KEY 2024q2   5394000  4580000   NA
63     KNSL 2024q2    757344        0   NA
64     KREF 2024q2    315704   255682   NA
65        L 2024q2   8498000        0   NA
66     LADR 2024q2    258691   198479   NA
67     LMND 2024q2    241100        0   NA
68      LNC 2024q2   9269000        0   NA
69     MBWM 2024q2    176152   120850   NA
70      MCB 2024q2    241239   191005   NA
71      MCY 2024q2   2579080        0   NA
72      MET 2024q2  33880000        0   NA
73      MKL 2024q2   8168498        0   NA
74      MTG 2024q2    599638        0   NA
75     MYFW 2024q2     90701    83301   NA
76     NAVI 2024q2   2252000  2090000   NA
77     NBHC 2024q2    295902   222484   NA
78      NLY 2024q2   2775496  2308167   NA
79     NMFC 2024q2    185144   108916   NA
80     NMIH 2024q2    318375        0   NA
81     NRIM 2024q2     90138    68049   NA
82     NTRS 2024q2   8263400  6799700   NA
83     NYMT 2024q2     81792   185701   NA
84     OCFC 2024q2    344299   269833   NA
85      OMF 2024q2   2746000  1444000   NA
86      ONB 2024q2   1424437  1060081   NA
87      ORC 2024q2    128037   113240   NA
88      ORI 2024q2   3887600        0   NA
89     PEBO 2024q2    307846   222333   NA
90      PFG 2024q2   8364100        0   NA
91      PMT 2024q2    488771   427781   NA
92      PNC 2024q2  17058000 13193000   NA
93     PPBI 2024q2    465481   340164   NA
94      PRK 2024q2    310538   214501   NA
95      PRU 2024q2  38392000        0   NA
96      RDN 2024q2    640565        0   NA
97      RGA 2024q2  11215000        0   NA
98      RLI 2024q2    861273        0   NA
99     SBSI 2024q2    228225   172634   NA
100    SFST 2024q2    105096    97479   NA
101    SIGI 2024q2   2360964        0   NA
102     SLM 2024q2   1620997   866094   NA
103     SNV 2024q2   1573989  1354650   NA
104 SOV_old 2024q2   8692481  7090498   NA
105    SRCE 2024q2    283014   190749   NA
106     SSB 2024q2   1195162   852221   NA
107    STBA 2024q2    282654   197576   NA
108    STEL 2024q2    312314   239700   NA
109     STT 2024q2  10728000  9218000   NA
110     SYF 2024q2  10850000  4723000   NA
111    TCBI 2024q2    931189   798657   NA
112    TFIN 2024q2    241128   222726   NA
113     THG 2024q2   3087800        0   NA
114    TIPT 2024q2   1044894        0   NA
115    TRTX 2024q2    184198   142430   NA
116     TRV 2024q2  22511000        0   NA
117     TWO 2024q2    686827   400731   NA
118    UBFO 2024q2     32421    20325   NA
119    UMBF 2024q2   1362510  1077676   NA
120     UNM 2024q2   6433700        0   NA
121     UVE 2024q2    748173        0   NA
122    UVSP 2024q2    245016   194729   NA
123    VBTX 2024q2    393074   312049   NA
124     VEL 2024q2    236625   192296   NA
125    VOYA 2024q2   4084000        0   NA
126    WAFD 2024q2   1042814   849924   NA
127     WAL 2024q2   2447600  1915600   NA
128     WFC 2024q2  63126000 49205000   NA
129     WRB 2024q2   6570805        0   NA
130    WSBC 2024q2    460311   367352   NA
131    WSFS 2024q2    694314   481972   NA
132    WTBA 2024q2     96783    83220   NA
uspanel = uspanel |>
  mutate(sgae = ifelse(!is.na(revenue) & is.na(sgae),0,sgae),
        # for some reason there are some cases of firms with revenue and cogs, but with sgae=NA;
        #    this should be the case for service firms, so I will replace sgae=0 when revenue>0;
        #    if I do not replace sgae=0 for these cases, then YTDebit cannot be calculated! 
         YTDebit = revenue - cogs - sgae, 
         ebitp = ifelse(fiscalq==1,YTDebit, YTDebit - lag(YTDebit)))

I create a column for the industry in the uspanel dataset by pulling the industry from the usfirms dataset:

# I create a temporal usfirms1 dataset with only 2 columns. I rename the empresa column with firm to have the same name for the ticker in both files:
usfirms1 = usfirms |>
  mutate(firm = empresa) |> 
  select(firm,naics1, naics2)
# I merge the usfirms1 with the uspanel to pull the industry:
uspanel = left_join(uspanel,usfirms1,by='firm')

I calculate annual stock return for all firms-quarters:

uspanel = uspanel |>
  # I sort by firm-quarter:
  arrange(firm,q) |>
  # I group by firm to make sure that the lag function works only on the 
  #   quarters of the firm : 
  group_by(firm) |>
  # I create market value, operating earnings per share deflated by stock price,  and stock annual return:
  mutate(marketvalue = originalprice * sharesoutstanding,
         oepsp = (YTDebit / sharesoutstanding) / originalprice,
         # The annual return is equal to the current stock price minus its
         #   stock price of 4 quarters ago:
         r = log(adjprice) - dplyr::lag(log(adjprice),4)
  ) |>
  # I do ungroup to undo the group_by
  ungroup()

I winsorize the oepsp:

library(statar)
hist(uspanel$oepsp)

uspanel$oepspw= winsorize(uspanel$oepsp, probs=c(0.005,0.999))
hist(uspanel$oepspw)

For each quarter-industry I calculate the median return and the binary variable rabove that indicates whether the stock returns beats the industry returns:

uspanel = uspanel |>
  group_by(q,naics1) |>
  mutate(industry_ret = median(r,na.rm=TRUE),
         rabove = ifelse(r>industry_ret,1,0)) |>
  ungroup()

I calculate another variable equal to the rabove, but 1 quarter in the future (1 lag in the future):

uspanel = uspanel |> 
  group_by(firm) |>
  mutate(F1rabove = lead(rabove,1))
# The lead function gets the future value of the variable

# I check how many rows/cases have 1 and 0's:
table(uspanel$F1rabove)

     0      1 
115114 114211 

I run the logit model to examine whether oepspw is related to the probability of a stock to beat its industry return 1 quarter in the future:

logitm1 <- glm(F1rabove ~ oepspw, data= uspanel, family="binomial",na.action=na.omit)
summary(logitm1)

Call:
glm(formula = F1rabove ~ oepspw, family = "binomial", data = uspanel, 
    na.action = na.omit)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.020277   0.004256  -4.764  1.9e-06 ***
oepspw       0.884870   0.017116  51.699  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 311904  on 224995  degrees of freedom
Residual deviance: 308217  on 224994  degrees of freedom
  (332185 observations deleted due to missingness)
AIC: 308221

Number of Fisher Scoring iterations: 4
coefficients = coef(logitm1)
# The coefficients for the odd ratio interpretation are:
exp(0.1*coefficients)
(Intercept)      oepspw 
  0.9979744   1.0925201 

11 Prediction with the logit model

We can use the last model to predict the probability whether the firm will beat the return of the industry 1 year in the future. For this prediction, it might be a good idea to select only the firms in the last quarter of the dataset (2024Q2), so we can predict which firm will beat the industry in 2025:

We create a dataset with only the Q2 of 2024:

uspanel2024 <- uspanel |>
          select(firm,q,yearf, oepspw, F1rabove) |>
          filter(q=="2024q2") |>
        as.data.frame()

Now we run the prediction using the model 2 with this new dataset:

uspanel2024 <- uspanel2024 |> 
  mutate(pred=predict(logitm1,newdata=uspanel2024,type=c("response")) )

The type=c(“response”) parameter indicates to get the prediction in probability. If I do not specify this option, the prediction is calculated in the logarithm of ODDS RATIO.

11.1 Selection of best stocks based on results of the logit model

We can sort the firms according to the predicted probability of beating the benchmark, and then select the top 100 with the highest probability:

top110<- uspanel2024 |>
     merge(usfirms1,by="firm") |> # merge with firms1 to pull firm name and industry
     arrange(desc(pred)) |>  # sort by probability, from highest to lowest
     #slice_head(prop=0.05) |> # select the top 5% firms  
     head(n=110)

# Show the # of firms selected:
nrow(top110)
[1] 110

I extract the 100 firms (tickers) column and put it into a char vector:

tickers = as.vector(top110$firm)

I selected the top 110 just in case there are some tickers that do not exist in Yahoo.

Now I write a for loop to get the historical prices of the 100 tickers. Here is a difficult problem to solve related to data validation. It is possible that some of the tickers are not in Yahoo Finance. If I use getSymbols with the ticker vector and it is one ticker that does not exist in Yahoo, then the getSymbols will stop running. Then, I can write a loop to run the getSymbols for each ticker and detect when a ticker is not in Yahoo.

The tryCatch function is used to run a set of R lines of code in the try mode, so if there is an error in any R code, R DOES NOT STOP RUNNING AND I can cach the error and do a specific action:

# I do a loop to run getSymbols and in each iteration I can detect which one has an error because
#   it does not exist in Yahoo Finance:
# I initialize an empty list where I will be storing the tickers that exist in Yahoo Finance:
finaltickerlist = c()
# I initilize a counter i=0 and tottickers=0. 
#  The counter i will be the counter of ticker #, while the counter tottickers 
#  will count the tickers that are found in Yahoo to know when to stop the loop after finding 100 tickers in Yahoo:
i=0
tottickers = 0
library(quantmod)
#for (t in tickers) {
# I will use a while loop instead of a for to make sure that I get exactly 100 tickers:
while (tottickers<100 & i<110) {
 # I increment i in 1:
 i = i + 1  
 # I save the ticker i in t: 
 t = tickers[i]
  tryCatch(
    {
    getSymbols(t, from="2020-01-01", to="2024-10-31",periodicity="monthly")
    # Since the ticker was found in Yahoo Finance, then:
    tottickers=tottickers + 1
    # I add the ticker found in Yahoo into the finaltickerlist vector: 
    finaltickerlist = c(finaltickerlist,t)
    cat("The ticker ", t, " was found in Yahoo \n")
    if (tottickers==1) {
      # If I am in the ticker 1 I only get the Adjusted price, but I do not merge
      #    the historical prices with the prices dataset:
      temp = get(t)
      # prices will be the final big dataset with all prices for all stocks:
      prices = Ad(temp)
    }
    else { # if tottickers>1, so this means that I already created the price dataset, 
      #  so I need to do a merge:
      temp=get(t)
      prices = merge(prices,Ad(temp))
    }
    }, 
    error = function(e) {
      cat("THE TICKER ",t," WAS NOT FOUND IN YAHOO FINANCE \n")
    }
  )
}
The ticker  AJG  was found in Yahoo 
The ticker  APA  was found in Yahoo 
The ticker  LNC  was found in Yahoo 
The ticker  TIPT  was found in Yahoo 
The ticker  UVE  was found in Yahoo 
The ticker  GNW  was found in Yahoo 
THE TICKER  OCN  WAS NOT FOUND IN YAHOO FINANCE 
The ticker  CYH  was found in Yahoo 
The ticker  PRU  was found in Yahoo 
The ticker  MCY  was found in Yahoo 
The ticker  RGA  was found in Yahoo 
The ticker  ALL  was found in Yahoo 
The ticker  THG  was found in Yahoo 
The ticker  MET  was found in Yahoo 
The ticker  UNM  was found in Yahoo 
The ticker  ATUS  was found in Yahoo 
THE TICKER  VIA__old  WAS NOT FOUND IN YAHOO FINANCE 
The ticker  VOYA  was found in Yahoo 
The ticker  CNA  was found in Yahoo 
The ticker  FAF  was found in Yahoo 
The ticker  CRBG  was found in Yahoo 
The ticker  L  was found in Yahoo 
The ticker  ORI  was found in Yahoo 
The ticker  TRV  was found in Yahoo 
The ticker  PFG  was found in Yahoo 
The ticker  BFH  was found in Yahoo 
The ticker  HIG  was found in Yahoo 
The ticker  EQH  was found in Yahoo 
The ticker  SIGI  was found in Yahoo 
The ticker  EIG  was found in Yahoo 
The ticker  UNIT  was found in Yahoo 
The ticker  MKL  was found in Yahoo 
The ticker  WRB  was found in Yahoo 
The ticker  VNCE  was found in Yahoo 
The ticker  SYF  was found in Yahoo 
The ticker  THC  was found in Yahoo 
The ticker  CINF  was found in Yahoo 
The ticker  DTIL  was found in Yahoo 
The ticker  ILPT  was found in Yahoo 
The ticker  WLFC  was found in Yahoo 
The ticker  AIG  was found in Yahoo 
The ticker  VGZ  was found in Yahoo 
The ticker  RM  was found in Yahoo 
The ticker  BIPC  was found in Yahoo 
The ticker  OMF  was found in Yahoo 
The ticker  TWO  was found in Yahoo 
The ticker  AFL  was found in Yahoo 
The ticker  LMND  was found in Yahoo 
The ticker  OI  was found in Yahoo 
The ticker  AAL  was found in Yahoo 
The ticker  AMSF  was found in Yahoo 
The ticker  SLM  was found in Yahoo 
The ticker  COF  was found in Yahoo 
The ticker  ENVA  was found in Yahoo 
The ticker  SABR  was found in Yahoo 
The ticker  EZPW  was found in Yahoo 
The ticker  CHTR  was found in Yahoo 
The ticker  DIT  was found in Yahoo 
The ticker  DFS  was found in Yahoo 
The ticker  HZO  was found in Yahoo 
The ticker  GMS  was found in Yahoo 
The ticker  RDN  was found in Yahoo 
The ticker  KEQU  was found in Yahoo 
The ticker  GM  was found in Yahoo 
The ticker  JACK  was found in Yahoo 
The ticker  HOV  was found in Yahoo 
The ticker  DAKT  was found in Yahoo 
The ticker  RLI  was found in Yahoo 
The ticker  SR  was found in Yahoo 
The ticker  AMWD  was found in Yahoo 
The ticker  GBX  was found in Yahoo 
The ticker  JBL  was found in Yahoo 
The ticker  UAL  was found in Yahoo 
The ticker  AL  was found in Yahoo 
The ticker  MIND  was found in Yahoo 
The ticker  DLHC  was found in Yahoo 
The ticker  INN  was found in Yahoo 
The ticker  GPI  was found in Yahoo 
The ticker  PDCO  was found in Yahoo 
The ticker  CIVI  was found in Yahoo 
The ticker  NMIH  was found in Yahoo 
The ticker  CABO  was found in Yahoo 
The ticker  CZR  was found in Yahoo 
The ticker  C  was found in Yahoo 
The ticker  SJM  was found in Yahoo 
The ticker  PHM  was found in Yahoo 
The ticker  VLGEA  was found in Yahoo 
The ticker  TNL  was found in Yahoo 
The ticker  OPY  was found in Yahoo 
The ticker  FDP  was found in Yahoo 
The ticker  MCB  was found in Yahoo 
The ticker  MHO  was found in Yahoo 
The ticker  VGR  was found in Yahoo 
The ticker  TRN  was found in Yahoo 
The ticker  DLX  was found in Yahoo 
The ticker  MTG  was found in Yahoo 
The ticker  CALM  was found in Yahoo 
The ticker  SNCY  was found in Yahoo 
The ticker  MTN  was found in Yahoo 
The ticker  NAVI  was found in Yahoo 
The ticker  COOP  was found in Yahoo 
The ticker  CFFI  was found in Yahoo 

I check whether I ended up with a price dataset of 100 columns, one for each ticker:

ncol(prices)
[1] 100

I create a dataset for returns of these 100 stocks:

returns = diff(log(prices))
# finaltickerlist has the tickers that were found in Yahoo Finance:
colnames(returns) = finaltickerlist

Now I get the price data of the market index and calculate its monthly returns:

getSymbols(Symbol="^GSPC", from="2020-01-01", to="2024-10-31",periodicity="monthly")
[1] "GSPC"
sp500ret = diff(log(Ad(GSPC)))

I now run a loop to estimate all market regression models and store them in a data frame:

# I initialize an empty vector/matrix betas: 
betas =c()
for (i in 1:ncol(returns)) {
  # I run the market regression model for stock i:
  model <- lm(returns[,i] ~ sp500ret$GSPC)
  #print(summary(model))
  # I store the summary of the model in a temporal object:
  smodel = summary(model)
  # From the coefficients matrix I extract the coefficients along with their standard errors and pvalues:
  b0 = smodel$coefficients[1,1]
  b1 = smodel$coefficients[2,1]
  seb0 = smodel$coefficients[1,2]
  seb1 =smodel$coefficients[2,2]
  pb0 = smodel$coefficients[1,4]
  pb1 = smodel$coefficients[2,4]
  # I save the number of valid observations (months) used in the regression:
  N = model$df.residual + 2 
  # I save N to later drop those tickers with very few history available since their beta estimations will not be reliable
  # I put the values together in a temporal vector:
  vectorbetas = c(b0,b1,seb0,seb1,pb0,pb1, N)
  # I bind this vector with the betas cumulative matrix:
  betas = rbind(betas,vectorbetas) 
}
# Finally I rename columns and rows of the betas matrix:
# I use the finaltickerlist since it has the tickers that where FOUND IN YAHOO FINANCE:
rownames(betas) = finaltickerlist[1:ncol(returns)]
colnames(betas) =c("b0","b1","seb0","seb1","pb0","pb1","N")
# I create a data frame from the beta matrix:
betasdf = data.frame(betas)
betasdf
b0 b1 seb0 seb1 pb0 pb1 N
AJG 0.0113380 0.7393096 0.0071630 0.1330145 0.1191871 0.0000008 57
APA -0.0345240 3.3370946 0.0360089 0.6686752 0.3418732 0.0000064 57
LNC -0.0223796 1.8342302 0.0159093 0.2954323 0.1651429 0.0000001 57
TIPT 0.0087913 1.1767828 0.0165934 0.3081359 0.5983811 0.0003425 57
UVE -0.0089182 0.9499327 0.0154648 0.2871777 0.5665117 0.0016617 57
GNW -0.0001226 0.8838090 0.0169328 0.3144380 0.9942496 0.0068344 57
CYH -0.0185765 1.7556430 0.0313874 0.5828554 0.5563784 0.0039148 57
PRU -0.0035397 1.2985853 0.0081189 0.1507666 0.6645558 0.0000000 57
MCY 0.0006371 0.8561274 0.0099851 0.1854206 0.9493547 0.0000238 57
RGA -0.0006143 0.9260021 0.0115070 0.2136816 0.9576220 0.0000629 57
ALL 0.0052743 0.4791599 0.0087642 0.1627479 0.5497781 0.0047364 57
THG -0.0041453 0.7352856 0.0080893 0.1502162 0.6103913 0.0000090 57
MET 0.0005412 1.0361106 0.0099504 0.1847764 0.9568180 0.0000007 57
UNM 0.0107548 0.8126782 0.0128720 0.2390300 0.4070410 0.0012612 57
ATUS -0.0584363 1.5950116 0.0214332 0.3980079 0.0085722 0.0001862 57
VOYA -0.0038743 1.0277504 0.0078811 0.1463498 0.6249630 0.0000000 57
CNA 0.0007392 0.6721978 0.0073793 0.1370321 0.9205697 0.0000087 57
FAF -0.0103280 1.3776780 0.0067748 0.1258053 0.1331170 0.0000000 57
CRBG 0.0114967 0.6665819 0.0187263 0.4615249 0.5455552 0.1627457 24
L -0.0005005 0.8390363 0.0067960 0.1261993 0.9415619 0.0000000 57
ORI 0.0042975 0.9366384 0.0082018 0.1523056 0.6024031 0.0000001 57
TRV 0.0063405 0.6503172 0.0074961 0.1392007 0.4013039 0.0000197 57
PFG -0.0010812 1.2059125 0.0091295 0.1695326 0.9061573 0.0000000 57
BFH -0.0287154 2.1501948 0.0216137 0.4013605 0.1894751 0.0000017 57
HIG 0.0032758 0.9601573 0.0093990 0.1745376 0.7287730 0.0000010 57
EQH -0.0006974 1.4085886 0.0098480 0.1828745 0.9438043 0.0000000 57
SIGI 0.0009735 0.5749528 0.0084151 0.1562667 0.9083283 0.0005334 57
EIG 0.0033086 0.2181561 0.0106267 0.1973352 0.7567106 0.2737510 57
UNIT -0.0115029 1.4583344 0.0230401 0.4278492 0.6195909 0.0012288 57
MKL -0.0029650 0.7759603 0.0075975 0.1410843 0.6978552 0.0000010 57
WRB 0.0048429 0.6560228 0.0082789 0.1537367 0.5609585 0.0000787 57
VNCE -0.0558057 1.8868295 0.0346208 0.6428990 0.1127056 0.0048607 57
SYF -0.0054996 1.7062412 0.0127636 0.2370162 0.6682396 0.0000000 57
THC 0.0064087 2.1458508 0.0155155 0.2881181 0.6811760 0.0000000 57
CINF 0.0003360 0.7061352 0.0105423 0.1957681 0.9746900 0.0006687 57
DTIL -0.0716242 1.2937344 0.0287758 0.5343592 0.0158637 0.0188011 57
ILPT -0.0476199 1.7844572 0.0226104 0.4198693 0.0397732 0.0000833 57
WLFC 0.0080878 1.1833150 0.0222588 0.4133396 0.7177354 0.0059303 57
AIG -0.0025594 1.2047645 0.0124726 0.2316120 0.8381705 0.0000030 57
VGZ -0.0169125 1.6111618 0.0187612 0.3483901 0.3712742 0.0000232 57
RM -0.0133970 1.6511918 0.0163004 0.3026945 0.4146952 0.0000012 57
BIPC -0.0060592 1.3036748 0.0118454 0.2457618 0.6111502 0.0000024 54
OMF -0.0023353 1.5450772 0.0139289 0.2586555 0.8674658 0.0000002 57
TWO -0.0439982 2.4489828 0.0205205 0.3810600 0.0364597 0.0000000 57
AFL 0.0048980 0.9596294 0.0078087 0.1450050 0.5330908 0.0000000 57
LMND -0.0343659 1.6666532 0.0302652 0.6291802 0.2618073 0.0108976 50
OI -0.0143482 1.2178123 0.0149605 0.2778133 0.3417204 0.0000531 57
AAL -0.0272285 1.5097844 0.0155172 0.2881509 0.0848761 0.0000026 57
AMSF -0.0010850 0.3538946 0.0100081 0.1858480 0.9140626 0.0621183 57
SLM 0.0015965 1.2379163 0.0121828 0.2262315 0.8962195 0.0000011 57
COF -0.0049734 1.5078016 0.0122869 0.2281644 0.6872173 0.0000000 57
ENVA 0.0075924 1.4216987 0.0150296 0.2790961 0.6154638 0.0000044 57
SABR -0.0531091 2.0126269 0.0260138 0.4830690 0.0460022 0.0001102 57
EZPW 0.0009183 0.9842858 0.0141193 0.2621924 0.9483806 0.0004213 57
CHTR -0.0188152 1.0789512 0.0120383 0.2235479 0.1238023 0.0000115 57
DIT 0.0048551 0.4953524 0.0161000 0.2989720 0.7641269 0.1032439 57
DFS -0.0010414 1.4975452 0.0131692 0.2445494 0.9372599 0.0000001 57
HZO -0.0125688 1.9216400 0.0165536 0.3073959 0.4509275 0.0000001 57
GMS 0.0042707 1.7003253 0.0112299 0.2085371 0.7051886 0.0000000 57
RDN -0.0024941 1.1346989 0.0112744 0.2093626 0.8257412 0.0000014 57
KEQU 0.0088525 0.7396444 0.0179389 0.3331211 0.6236399 0.0305344 57
GM -0.0068359 1.4796292 0.0122657 0.2277702 0.5795692 0.0000000 57
JACK -0.0270817 1.9923881 0.0161809 0.3004750 0.0998722 0.0000000 57
HOV 0.0050693 2.8768282 0.0299148 0.5555090 0.8660591 0.0000033 57
DAKT 0.0027270 1.1335075 0.0165543 0.3074088 0.8697611 0.0005202 57
RLI 0.0079262 0.3608799 0.0094364 0.1752320 0.4045702 0.0441983 57
SR -0.0068911 0.5478918 0.0073282 0.1360830 0.3511483 0.0001751 57
AMWD -0.0210336 1.7696686 0.0142693 0.2649776 0.1461706 0.0000000 57
GBX 0.0042405 1.4256696 0.0156000 0.2896869 0.7867708 0.0000082 57
JBL 0.0081466 1.2517573 0.0098525 0.1829589 0.4118950 0.0000000 57
UAL -0.0147082 1.5492610 0.0183997 0.3416773 0.4275133 0.0000317 57
AL -0.0144037 1.6639302 0.0120150 0.2231146 0.2357412 0.0000000 57
MIND -0.0944896 1.7749395 0.0479676 0.8907463 0.0538982 0.0512723 57
DLHC -0.0006415 1.1622485 0.0181187 0.3364592 0.9718843 0.0010695 57
INN -0.0299584 2.1157986 0.0155649 0.2890368 0.0594412 0.0000000 57
GPI 0.0083219 1.4810676 0.0143332 0.2661636 0.5638814 0.0000008 57
PDCO -0.0082330 1.0626403 0.0117684 0.2185362 0.4871348 0.0000101 57
CIVI 0.0081122 1.4536969 0.0187859 0.3488486 0.6675556 0.0001099 57
NMIH -0.0099721 1.3334456 0.0147531 0.2739616 0.5019157 0.0000099 57
CABO -0.0356232 0.8512587 0.0120768 0.2242626 0.0046638 0.0003689 57
CZR -0.0397440 3.2698129 0.0243790 0.4527115 0.1087619 0.0000000 57
C -0.0140461 1.4632022 0.0104668 0.1943658 0.1851202 0.0000000 57
SJM 0.0019157 0.2254669 0.0077158 0.1432808 0.8048456 0.1213150 57
PHM 0.0020400 1.7541108 0.0124673 0.2315148 0.8706261 0.0000000 57
VLGEA 0.0048138 0.2857608 0.0090340 0.1677589 0.5962808 0.0941379 57
TNL -0.0147136 1.7793751 0.0129833 0.2410970 0.2620173 0.0000000 57
OPY 0.0034775 1.1347066 0.0107546 0.1997094 0.7476558 0.0000005 57
FDP -0.0014145 0.3716576 0.0119085 0.2211379 0.9058804 0.0985005 57
MCB -0.0102722 1.1753253 0.0186966 0.3471917 0.5849451 0.0013182 57
MHO -0.0024442 2.3977187 0.0165943 0.3081516 0.8834399 0.0000000 57
VGR 0.0040314 1.0809228 0.0111707 0.2074369 0.7195638 0.0000029 57
TRN -0.0011655 1.3427477 0.0109939 0.2041545 0.9159584 0.0000000 57
DLX -0.0274098 1.4926079 0.0126830 0.2355205 0.0350550 0.0000000 57
MTG -0.0016569 1.4118013 0.0116711 0.2167298 0.8876269 0.0000000 57
CALM 0.0198805 -0.1541523 0.0108815 0.2020659 0.0731274 0.4487939 57
SNCY -0.0363092 1.4843406 0.0200047 0.4175285 0.0770229 0.0009883 42
MTN -0.0161188 1.1945486 0.0090680 0.1683896 0.0810056 0.0000000 57
NAVI -0.0115599 1.5118820 0.0109989 0.2042466 0.2978507 0.0000000 57
COOP 0.0197037 1.4805047 0.0141093 0.2620058 0.1681725 0.0000006 57
CFFI 0.0031484 0.3621739 0.0115736 0.2149179 0.7866148 0.0976218 57

Based on the data of each company select a set of 10 companies based on any of these criteria:

CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market

CRITERIA B: Stocks that are SIGNIFICANTLY less risky than the market according to the market model regressions.

Selecting the stocks of CRITERIA A:

stocks_over_market <- betasdf |>
  filter(b0>0,pb0<=0.05) |>
  arrange(desc(b0))
stocks_over_market
[1] b0   b1   seb0 seb1 pb0  pb1  N   
<0 rows> (or 0-length row.names)
nrow(stocks_over_market)
[1] 0

There is only 0 stock(s) that have a positive and significant beta0.

Filtering the stocks according to CRITERIA B:

# I first add the 95% minimum and the maximum values of the C.I. for beta1:
betasdf <- betasdf |> 
  mutate(
    minb1= b1 - 2*seb1,
    maxb1= b1 + 2*seb1)
# I filter those that might be significantly less risky than the market:
less_risky_market <- betasdf |> 
  filter(maxb1<1) |>
  arrange(b1)
less_risky_market
                 b0         b1        seb0      seb1        pb0          pb1  N
CALM   0.0198804656 -0.1541523 0.010881466 0.2020659 0.07312738 4.487939e-01 57
EIG    0.0033086412  0.2181561 0.010626712 0.1973352 0.75671063 2.737510e-01 57
SJM    0.0019156530  0.2254669 0.007715824 0.1432808 0.80484564 1.213150e-01 57
VLGEA  0.0048138060  0.2857608 0.009033997 0.1677589 0.59628083 9.413785e-02 57
AMSF  -0.0010850108  0.3538946 0.010008117 0.1858480 0.91406258 6.211828e-02 57
RLI    0.0079261776  0.3608799 0.009436433 0.1752320 0.40457022 4.419835e-02 57
CFFI   0.0031483978  0.3621739 0.011573561 0.2149179 0.78661477 9.762175e-02 57
FDP   -0.0014145242  0.3716576 0.011908517 0.2211379 0.90588040 9.850051e-02 57
ALL    0.0052742926  0.4791599 0.008764152 0.1627479 0.54977812 4.736408e-03 57
SR    -0.0068911133  0.5478918 0.007328219 0.1360830 0.35114827 1.751330e-04 57
SIGI   0.0009734538  0.5749528 0.008415131 0.1562667 0.90832829 5.334472e-04 57
TRV    0.0063405497  0.6503172 0.007496107 0.1392007 0.40130388 1.968401e-05 57
WRB    0.0048429224  0.6560228 0.008278888 0.1537367 0.56095848 7.866353e-05 57
CNA    0.0007392236  0.6721978 0.007379326 0.1370321 0.92056970 8.666018e-06 57
            minb1     maxb1
CALM  -0.55828405 0.2499795
EIG   -0.17651424 0.6128264
SJM   -0.06109466 0.5120284
VLGEA -0.04975695 0.6212786
AMSF  -0.01780149 0.7255907
RLI    0.01041584 0.7113439
CFFI  -0.06766193 0.7920096
FDP   -0.07061824 0.8139335
ALL    0.15366406 0.8046558
SR     0.27572577 0.8200579
SIGI   0.26241942 0.8874862
TRV    0.37191579 0.9287185
WRB    0.34854936 0.9634962
CNA    0.39813363 0.9462620
nrow(less_risky_market)
[1] 14

There is only 14 stock(s) that are significantly less risky than the market.

Applying these 2 criteria I only got 2 stocks. If I want to get the best 10 stocks with these criteria, I can relax both criteria and do a filter by steps. Let’s do the following:

I will first use beta0 information to select the best 20 stocks. I can sort all stocks by the minimum of the 95% CI of beta0 (by the maximum to the minimum value):

# I first add the 95% minimum and the maximum values of the C.I. for beta1:
betasdf <- betasdf |> 
  mutate(
    minb0= b0 - 2*seb0,
    maxb0= b0 + 2*seb0)
# I do the first filter selecting 20 stocks based on the min of b0. 
# I will add the restriction that the estimation of the market regression model had more than 30 months, to ensure that the beta estimations are more stable:
filter1 <- betasdf |>
  filter(N>30) |> 
  arrange(desc(minb0)) |>
  head(20)
filter1
b0 b1 seb0 seb1 pb0 pb1 N minb1 maxb1 minb0 maxb0
CALM 0.0198805 -0.1541523 0.0108815 0.2020659 0.0731274 0.4487939 57 -0.5582840 0.2499795 -0.0018825 0.0416434
AJG 0.0113380 0.7393096 0.0071630 0.1330145 0.1191871 0.0000008 57 0.4732807 1.0053385 -0.0029880 0.0256639
COOP 0.0197037 1.4805047 0.0141093 0.2620058 0.1681725 0.0000006 57 0.9564930 2.0045163 -0.0085149 0.0479223
TRV 0.0063405 0.6503172 0.0074961 0.1392007 0.4013039 0.0000197 57 0.3719158 0.9287185 -0.0086517 0.0213328
AFL 0.0048980 0.9596294 0.0078087 0.1450050 0.5330908 0.0000000 57 0.6696195 1.2496393 -0.0107194 0.0205153
RLI 0.0079262 0.3608799 0.0094364 0.1752320 0.4045702 0.0441983 57 0.0104158 0.7113439 -0.0109467 0.0267990
JBL 0.0081466 1.2517573 0.0098525 0.1829589 0.4118950 0.0000000 57 0.8858396 1.6176751 -0.0115585 0.0278516
WRB 0.0048429 0.6560228 0.0082789 0.1537367 0.5609585 0.0000787 57 0.3485494 0.9634962 -0.0117149 0.0214007
ORI 0.0042975 0.9366384 0.0082018 0.1523056 0.6024031 0.0000001 57 0.6320273 1.2412496 -0.0121061 0.0207012
ALL 0.0052743 0.4791599 0.0087642 0.1627479 0.5497781 0.0047364 57 0.1536641 0.8046558 -0.0122540 0.0228026
VLGEA 0.0048138 0.2857608 0.0090340 0.1677589 0.5962808 0.0941379 57 -0.0497570 0.6212786 -0.0132542 0.0228818
SJM 0.0019157 0.2254669 0.0077158 0.1432808 0.8048456 0.1213150 57 -0.0610947 0.5120284 -0.0135160 0.0173473
CNA 0.0007392 0.6721978 0.0073793 0.1370321 0.9205697 0.0000087 57 0.3981336 0.9462620 -0.0140194 0.0154979
L -0.0005005 0.8390363 0.0067960 0.1261993 0.9415619 0.0000000 57 0.5866376 1.0914350 -0.0140924 0.0130915
UNM 0.0107548 0.8126782 0.0128720 0.2390300 0.4070410 0.0012612 57 0.3346183 1.2907382 -0.0149892 0.0364989
HIG 0.0032758 0.9601573 0.0093990 0.1745376 0.7287730 0.0000010 57 0.6110821 1.3092324 -0.0155222 0.0220739
SIGI 0.0009735 0.5749528 0.0084151 0.1562667 0.9083283 0.0005334 57 0.2624194 0.8874862 -0.0158568 0.0178037
EIG 0.0033086 0.2181561 0.0106267 0.1973352 0.7567106 0.2737510 57 -0.1765142 0.6128264 -0.0179448 0.0245621
OPY 0.0034775 1.1347066 0.0107546 0.1997094 0.7476558 0.0000005 57 0.7352878 1.5341255 -0.0180316 0.0249866
MKL -0.0029650 0.7759603 0.0075975 0.1410843 0.6978552 0.0000010 57 0.4937917 1.0581289 -0.0181601 0.0122301

After this first screening using beta0 information, I now use the information of beta1 to further select only 10 stocks.

For beta1 I will consider conservative stocks that are not too risky compared to the market, assuming that I will create a portfolio in the sort-term:

filter2 <- filter1 |> 
  arrange(maxb1) |>
  head(10)
filter2
b0 b1 seb0 seb1 pb0 pb1 N minb1 maxb1 minb0 maxb0
CALM 0.0198805 -0.1541523 0.0108815 0.2020659 0.0731274 0.4487939 57 -0.5582840 0.2499795 -0.0018825 0.0416434
SJM 0.0019157 0.2254669 0.0077158 0.1432808 0.8048456 0.1213150 57 -0.0610947 0.5120284 -0.0135160 0.0173473
EIG 0.0033086 0.2181561 0.0106267 0.1973352 0.7567106 0.2737510 57 -0.1765142 0.6128264 -0.0179448 0.0245621
VLGEA 0.0048138 0.2857608 0.0090340 0.1677589 0.5962808 0.0941379 57 -0.0497570 0.6212786 -0.0132542 0.0228818
RLI 0.0079262 0.3608799 0.0094364 0.1752320 0.4045702 0.0441983 57 0.0104158 0.7113439 -0.0109467 0.0267990
ALL 0.0052743 0.4791599 0.0087642 0.1627479 0.5497781 0.0047364 57 0.1536641 0.8046558 -0.0122540 0.0228026
SIGI 0.0009735 0.5749528 0.0084151 0.1562667 0.9083283 0.0005334 57 0.2624194 0.8874862 -0.0158568 0.0178037
TRV 0.0063405 0.6503172 0.0074961 0.1392007 0.4013039 0.0000197 57 0.3719158 0.9287185 -0.0086517 0.0213328
CNA 0.0007392 0.6721978 0.0073793 0.1370321 0.9205697 0.0000087 57 0.3981336 0.9462620 -0.0140194 0.0154979
WRB 0.0048429 0.6560228 0.0082789 0.1537367 0.5609585 0.0000787 57 0.3485494 0.9634962 -0.0117149 0.0214007

These will be my selected stocks based on their beta0 and beta1 information. These stocks can be candidates to include them in a winning portfolio.

I can show the name of these firms and their industry:

selected_tickers = rownames(filter2)
selected_tickers = as.data.frame(selected_tickers)
colnames(selected_tickers) = c("firm")

selected_tickers = merge(selected_tickers,usfirms,by.x="firm", by.y="empresa")
selected_tickers
firm Nombre status partind naics1 naics2 SectorEconomatica
ALL Allstate Corp activo 0.10 Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
CALM Cal-Maine Foods, Inc activo NA Agricultura, ganadería, aprovechamiento forestal, pesca y caza Producción de Animales y Acuicultura Agro & Pesca
CNA CNA Financial Corp activo NA Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
EIG Employers Hldg Inc activo NA Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
RLI Rli Corp activo NA Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
SIGI Selective Insurance Group, Inc activo NA Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
SJM The J. M. Smucker Company activo 0.03 Industrias manufactureras Conservación de frutas, verduras, guisos y otros alimentos preparados Alimentos y Beb
TRV Travelers Companies, Inc activo 0.12 Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros
VLGEA Village Super Market, Inc activo NA Comercio al por menor Almacenes de alimentos Comercio
WRB W. R. Berkley Corp activo 0.04 Servicios financieros y de seguros Instituciones de seguros Finanzas y Seguros

I select the tickers column:

tickers = selected_tickers$firm

I bring monthly prices from Yahoo Finance:

library(quantmod)
getSymbols(Symbols = tickers, periodicity="monthly",from="2020-01-01", to="2024-10-31")
 [1] "ALL"   "CALM"  "CNA"   "EIG"   "RLI"   "SIGI"  "SJM"   "TRV"   "VLGEA"
[10] "WRB"  

I do the data management to create a return dataset:

datasets_list <- lapply(tickers, get)
prices <- do.call(merge, datasets_list)
# We select only Adjusted prices:
prices <- Ad(prices)

# We change the name of the columns to be equal to the tickers vector:
names(prices) = tickers

#Calculating continuously compounded returns:
r = diff(log(prices))

I get the Expected Returns for the stocks:

ccmean_returns  = colMeans(r,na.rm=TRUE)
ER = exp(ccmean_returns) - 1
ER
        ALL        CALM         CNA         EIG         RLI        SIGI 
0.010119504 0.018507230 0.007493000 0.005506566 0.011603861 0.006748959 
        SJM         TRV       VLGEA         WRB 
0.004180331 0.012930345 0.007702569 0.011472225 

I get the Variance-Covariance matrix:

# We calculate the Variance-Covariance matrix with the var function:
COV =var(r,na.rm=TRUE)
# Note var function calculates the variance-covariance matrix if r has more than 1 column.
# If r has only 1 column, then var calculates only the variance of that column.
COV
                ALL          CALM           CNA          EIG          RLI
ALL    4.805863e-03 -0.0008049815  0.0028619725 0.0004942691 0.0010617025
CALM  -8.049815e-04  0.0064675163 -0.0006040997 0.0013452828 0.0003903121
CNA    2.861973e-03 -0.0006040997  0.0042309221 0.0016923103 0.0019709493
EIG    4.942691e-04  0.0013452828  0.0016923103 0.0062392775 0.0040160842
RLI    1.061702e-03  0.0003903121  0.0019709493 0.0040160842 0.0051840367
SIGI   2.110362e-03 -0.0002059750  0.0026340015 0.0021269729 0.0026080845
SJM    1.922001e-06  0.0008226165  0.0001326381 0.0010027191 0.0008089074
TRV    2.835639e-03 -0.0004794668  0.0031211579 0.0020064495 0.0024491445
VLGEA  1.108544e-03  0.0015632195  0.0010005230 0.0021102787 0.0025779227
WRB    3.067534e-03 -0.0003185802  0.0033795999 0.0018220343 0.0020181785
               SIGI          SJM           TRV        VLGEA           WRB
ALL    0.0021103621 1.922001e-06  0.0028356393 0.0011085443  0.0030675338
CALM  -0.0002059750 8.226165e-04 -0.0004794668 0.0015632195 -0.0003185802
CNA    0.0026340015 1.326381e-04  0.0031211579 0.0010005230  0.0033795999
EIG    0.0021269729 1.002719e-03  0.0020064495 0.0021102787  0.0018220343
RLI    0.0026080845 8.089074e-04  0.0024491445 0.0025779227  0.0020181785
SIGI   0.0047695496 6.514068e-04  0.0026280577 0.0006681966  0.0026370310
SJM    0.0006514068 3.362650e-03  0.0002252621 0.0008141834  0.0001672133
TRV    0.0026280577 2.252621e-04  0.0042423514 0.0012092477  0.0035212997
VLGEA  0.0006681966 8.141834e-04  0.0012092477 0.0046438550  0.0009514900
WRB    0.0026370310 1.672133e-04  0.0035212997 0.0009514900  0.0049310149

I now generate 10,000 random portfolios:

# Initialize the Weight matrix
W = c()
# Create 4 vectors of 1000 random values from 0 to 1:
for(i in 1:10) {
  W = rbind(W,runif(10000))
}
# The problem I have now is that some of the 1,000 portfolios
#  might end up having a sum of weights higher than one. 
# I can do a simple "trick" by 
#   dividing each of the 4 weights by the sum of these 4
#   weights. And I can do this # for all 1000 portfolios: 

# I first create a vector with the sum of weights for all portfolios:
sumw <- colSums(W)
# I do another loop to divide each weight by the sum of weights: 
for(i in 1:10)  {
  W[i,]<-W[i,]/sumw
  # In each iteration I divide one raw of W2 by the vector sumw, 
  #  which is the sum of the weights of all 1000 portfolios
}

# I check that the sum of weights is 1 (I do this for 5 portfolios)
W[,1:5]
             [,1]        [,2]        [,3]       [,4]        [,5]
 [1,] 0.164194821 0.133227989 0.229972350 0.16296711 0.011870374
 [2,] 0.238535227 0.077625858 0.126307719 0.01121542 0.178613086
 [3,] 0.157410213 0.151120432 0.076791777 0.11815910 0.178386763
 [4,] 0.225814659 0.103510809 0.004533933 0.09593400 0.000270053
 [5,] 0.057418770 0.069527936 0.096700015 0.05455689 0.090638681
 [6,] 0.041345846 0.081167242 0.046281369 0.09738743 0.191569866
 [7,] 0.027976689 0.003707023 0.041371959 0.11494273 0.086374875
 [8,] 0.015185007 0.056791268 0.165067279 0.08657030 0.043265799
 [9,] 0.001387582 0.196604614 0.111840610 0.14369416 0.153140971
[10,] 0.070731186 0.126716827 0.101132988 0.11457285 0.065869531

I calculate the Expected Return of the 10,000 portfolios:

ERPortfolios = t(W) %*% ER

I calculate the expected risk of the 10,000 portfolios:

VARPortfolios = diag(t(W) %*% COV %*% W) 
RISKPortfolios = sqrt(VARPortfolios)

Now we can visualize the feasible portfolios in the risk-return space:

plot(x=RISKPortfolios,y= ERPortfolios,
     xlabel="Portfolio Expected Risk", ylabel="Portfolio Expected Return")

I now directly estimate the GMV portfolio:

GMVPort = globalMin.portfolio(ER,COV, shorts = FALSE)
GMVPort
Call:
globalMin.portfolio(er = ER, cov.mat = COV, shorts = FALSE)

Portfolio expected return:     0.009354169 
Portfolio standard deviation:  0.03689871 
Portfolio weights:
   ALL   CALM    CNA    EIG    RLI   SIGI    SJM    TRV  VLGEA    WRB 
0.1624 0.1892 0.1250 0.0013 0.0309 0.0526 0.3152 0.0543 0.0691 0.0000