Final Evidence - Financial Modeling and Programming

Problem Setup

Imagine I have been working as a portfolio analyst at RSG bank for the last 2 years. On my first working day of 2021, I am notified that I will be promoted to portfolio manager of a new division focused on US securities listed on the NYSE and NASDAQ exchanges. So, I have to design 3 “winning” portfolios based on US stocks in 5 weeks.

The challenge for me is to significantly increase the Bank's revenues in this investment division. My potential customers for these new products will be both companies and individuals.

In 5 weeks the marketing department will produce the advertising of my star products:

Portfolio A, Portfolio B, Portfolio C

Each of them addresses a risk profile: conservative, aggressive, and an additional one, disruptive. These products will offer investors who usually invest in the “Bolsa Mexicana de Valores” options for investing internationally in the US market.

Fundamental analysis with a probabilistic model

Explain what type of modeling is used for the first screening process.

The logit regression and its applications in Finance

According to Carlos Alberto Dorantes, the logit model is one type of non-linear regression model where the dependent variable is binary. The logistic or logit model is used to examine the relationship between one or more quantitative variables and the probability of an event happening (1=the event happens; 0 otherwise). In this work, we will define the event to be whether a firm in a specific quarter had higher stock return compared to the market (S&P500 index) return. If the stock return is higher than the market return, then we codify the binary variable equal to 1; 0 otherwise.

Then, in this case, the dependent variable of the regression is the binary variable with 1 if the stock beats the market and 0 otherwise. The independent or explanatory variables can be any financial indicator/ratio/variable that we believe is related to the likelihood of a stock to beat the market in the near future.

During the course we learned that we can define this logistic model using the following mathematical function. So, let's imagine that Event is the binary dependent variable; then the probability that the event happens (Event=1) can be defined as:

\[Prob(Event=1)=f(X_{1},X_{2},...,X_{n})\] The binary variable Event can be either 1 or 0, but the probability of Event=1 is a continuous value from 0 to 1. The function f is non-linear and is defined as follows:

\[Prob(Event=1)=\frac{1}{1+e^{-(b_{0}+b_{1}X_{1}+b_{2}X_{2}+...+b_{n}X_{n})}}\] As we can see, the argument of the exponential function is actually a traditional regression equation. We can re-express this equation as follows:

\[Y=b_{0}+b_{1}X_{1}+b_{2}X_{2}+...+b_{n}X_{n}\] Now we use Y in the original non-linear function:

\[Prob(Event=1)=\frac{1}{1+e^{-Y}}\] This is a non-linear function since the value of the function does not move in a linear way with a change in the value of one independent variable X.
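As a minimal sketch of the function above, the following R snippet evaluates the logistic function for a few values of the linear index Y. The helper name logistic_prob and the coefficient values are hypothetical, chosen only for illustration; base R's plogis computes the same quantity.

```r
# Hypothetical helper: logistic function mapping any real Y to a probability
logistic_prob <- function(y) {
  1 / (1 + exp(-y))
}

# Illustrative (made-up) coefficients and values of one explanatory variable
b0 <- -0.25
b1 <- 0.05
x1 <- c(-10, 0, 10, 20)

# Linear index Y = b0 + b1*X1, then the probability of Event=1
y <- b0 + b1 * x1
p <- logistic_prob(y)
print(round(p, 4))

# Equal steps in X1 do not produce equal steps in the probability
print(diff(p))
```

The differences printed at the end are not constant, which is what "non-linear" means here.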

Let’s continue with the same event, which is that the firm return beats the market (in other words, that the firm return is higher than the market return). Then:

p = probability that the firm beats the market (Event=1); or that the event happens.

(1-p) = probability that the firm does not beat the market (Event=0); or that the event does not happen.

To have a dependent variable that can get any numeric value from a negative value to a positive value, we can do the following mathematical transformation with these probabilities:

\[Y=\log\left(\frac{p}{1-p}\right)\] The ratio p/(1−p) is called the odds of the event:

\[ODDS=\frac{p}{1-p}\] The odds are the ratio of the probability of the event happening to the probability of the event NOT happening. The odds themselves can only take values from 0 to infinity. Since we want a variable that can take any value from negative to positive, we apply the logarithmic function: the log of the odds ranges from any negative value to any positive value.

Now that we have a transformed variable Y (the log of the odds) that uses the probability p, we can use this variable as the dependent variable of our regression model:

\[Y = \log\left(\frac{p}{1-p}\right) = b_{0}+b_{1}X_{1}\] This is the actual regression model that R estimates, so the coefficients b0 and b1 describe the logarithm of the odds, not the actual probability p of the event happening. Then, when we see the output of a logistic regression, we need to apply the exponential function to the coefficients to interpret their magnitude: the exponential of b1 is the odds ratio, the factor by which the odds are multiplied when X1 increases by one unit.
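To illustrate why the exponential of a coefficient is read as a multiplier of the odds p/(1−p), here is a small sketch on simulated data. The data, the seed, and the "true" coefficients are assumptions made up for this example; they are not part of the dataset used in this work.

```r
# Simulated data (hypothetical): one explanatory variable and a binary event
set.seed(123)
n <- 5000
x <- rnorm(n)
true_b0 <- -0.2
true_b1 <- 0.7
p <- 1 / (1 + exp(-(true_b0 + true_b1 * x)))
event <- rbinom(n, size = 1, prob = p)

# Logit regression: the coefficients are expressed in log-odds
fit <- glm(event ~ x, family = "binomial")
odds_multiplier <- unname(exp(coef(fit)["x"]))

# Predicted probabilities at x = 0 and x = 1, converted to odds
p0 <- unname(predict(fit, newdata = data.frame(x = 0), type = "response"))
p1 <- unname(predict(fit, newdata = data.frame(x = 1), type = "response"))
odds0 <- p0 / (1 - p0)
odds1 <- p1 / (1 - p1)

# The ratio of the two odds equals exp(b1)
print(c(exp_b1 = odds_multiplier, ratio_of_odds = odds1 / odds0))
```

The two printed numbers coincide, while the ratio of the probabilities p1/p0 is a different number.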

Justify which explanatory variables you propose to use in the model with bibliographic references

To determine which explanatory variables could help us choose the stocks with the highest probability of beating the market, I reviewed several sources and selected the following ones:

Book-to-Market Ratio

The book-to-market ratio helps investors assess a company’s value by comparing the firm’s book value to its market value. According to Investopedia, a company’s book value is its accounting value, based on historical cost, while a firm’s market value is its market capitalization, that is, its share price in the stock market multiplied by the number of shares it has outstanding.

If a company’s market value is higher than its book value, the stock is considered to be overvalued. If the book value is higher than the market value, analysts consider the company to be undervalued.

A high book-to-market ratio might mean that the market is valuing the company’s equity cheaply compared to its book value.

Price-to-Earnings (P/E) Ratio

The price-to-earnings (P/E) ratio relates a company’s share price to its earnings per share (EPS). According to Investopedia, it is a ratio for valuing a company that measures its current share price relative to its EPS, and it is also sometimes known as the price multiple or the earnings multiple.

P/E ratios are used by investors and analysts to determine the relative value of a company’s shares in an apples-to-apples comparison. It can also be used to compare a company against its own historical record or to compare aggregate markets against one another or over time.

To determine the P/E value, one must simply divide the current stock price by the earnings per share (EPS). A high P/E ratio could mean that a company’s stock is overvalued, or else that investors are expecting high growth rates in the future.

In general, a high P/E suggests that investors are expecting higher earnings growth in the future compared to companies with a lower P/E. A low P/E can indicate either that a company may currently be undervalued or that the company is doing exceptionally well relative to its past trends.

Price-To-Book (P/B) Ratio

The market-to-book ratio, also called the price-to-book ratio, is the reverse of the book-to-market ratio. Like the book-to-market ratio, it seeks to evaluate whether a company’s stock is over or undervalued by comparing the market price of all outstanding shares with the net assets of the company.

According to Investopedia, companies use the price-to-book (P/B) ratio to compare a firm’s market capitalization to its book value. It’s calculated by dividing the company’s stock price per share by its book value per share (BVPS). An asset’s book value is equal to its carrying value on the balance sheet, and companies calculate it by netting the asset against its accumulated depreciation.

Book value is also the tangible net asset value of a company, calculated as total assets minus intangible assets (e.g., patents, goodwill) and liabilities. For the initial outlay of an investment, book value may be net or gross of expenses, such as trading costs, sales taxes, and service charges. Some people may know this ratio by its less common name, the price-equity ratio.

Example of How to Use the P/B Ratio: assume that a company has $100 million in assets on the balance sheet and $75 million in liabilities. The book value of that company would be calculated simply as $25 million ($100M - $75M). If there are 10 million shares outstanding, each share would represent $2.50 of book value. If the share price is $5, then the P/B ratio would be 2x (5 / 2.50). This illustrates that the market price is valued at twice its book value.
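The arithmetic of this example can be reproduced in a few lines of R. The variable names are hypothetical; the numbers are the ones used in the example above.

```r
# Figures from the worked example (in dollars)
total_assets       <- 100e6
total_liabilities  <- 75e6
shares_outstanding <- 10e6
share_price        <- 5

# Book value, book value per share, and price-to-book ratio
book_value <- total_assets - total_liabilities
bvps       <- book_value / shares_outstanding
pb_ratio   <- share_price / bvps

# The book-to-market ratio is the reciprocal of the price-to-book ratio
bm_ratio <- 1 / pb_ratio

print(c(book_value = book_value, bvps = bvps,
        pb_ratio = pb_ratio, bm_ratio = bm_ratio))
```

A P/B of 2 therefore corresponds to a book-to-market ratio of 0.5 for the same firm.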

So, basically, the price-to-book ratio is one of the most widely-used financial ratios. It compares a company’s market price to its book value, essentially showing the value given by the market for each dollar of the company’s net worth. High-growth companies will often show price-to-book ratios well above 1.0, whereas companies facing severe distress will occasionally show ratios below 1.0.

Explain the data management steps to create the dependent variables and the financial ratios/variables.

The dataset I will use for this work is available on a public website. It contains real historical information on all US public firms listed on NASDAQ and NYSE.

#Clear the environment
rm(list=ls())

#To avoid scientific notation:
options(scipen=999)

#Activate the libraries
library(readxl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#Download the panel dataset from an online text file:
download.file("http://www.apradie.com/datos/usactive2020q.csv",
              "usactive2020q.csv")

#Read the downloaded CSV file into a data frame
uspanel <- read.csv("usactive2020q.csv")

download.file("http://www.apradie.com/datos/usdatadictionary.csv",
              "usdatadictionary.csv")
dictionary <- read.csv("usdatadictionary.csv")

#Transforming uspanel into a panel data frame indexed by firm (ticker) and quarter (q)
library(plm)

## 
## Attaching package: 'plm'

## The following objects are masked from 'package:dplyr':
## 
##     between, lag, lead

uspanel <- pdata.frame(uspanel, index= c("ticker","q"))

#Filtering the information to work with data from 2020
usfirms2020 <- uspanel %>%
      filter(year==2020, fiscalmonth==12)

Note that all the financial variables are expressed in thousands (1000’s).

#Once we have downloaded the dataset, we can start to create the variables and add them to our dataset.
usfirms2020$marketcap = usfirms2020$originalprice * usfirms2020$sharesoutstanding

uspanel$grossprofit = uspanel$revenue - uspanel$cogs

uspanel$ebit = uspanel$grossprofit - uspanel$sgae

uspanel$netincome = uspanel$ebit - uspanel$finexp - uspanel$incometax

uspanel$stockannual_R = uspanel$adjprice / plm::lag(uspanel$adjprice,k=4) - 1

uspanel$marketcap = uspanel$originalprice * uspanel$sharesoutstanding

uspanel$oeps = ifelse(uspanel$sharesoutstanding==0,NA,uspanel$ebit / uspanel$sharesoutstanding)

uspanel$oepsp = ifelse(uspanel$originalprice==0,NA,uspanel$oeps / uspanel$originalprice)

uspanel$bookvalue = uspanel$totalassets - uspanel$totalliabilities

uspanel$bmr = ifelse(uspanel$marketcap==0,NA,uspanel$bookvalue / uspanel$marketcap)

uspanel$bookvalueshare = uspanel$bookvalue / uspanel$sharesoutstanding

uspanel$pricetobook = uspanel$originalprice / uspanel$bookvalueshare

Once I have created my variables, I winsorize them. Winsorizing is a transformation that limits the extreme values of a variable, replacing them with less extreme cutoff values, to reduce the effect of possibly spurious outliers.

library(statar)
uspanel$bmr_w <- winsorize(uspanel$bmr, probs = c(0.01,0.995))

## 0.71 % observations replaced at the bottom

## 0.36 % observations replaced at the top

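#Note: pricetoearnings is not created in the code shown above; it is assumed to already be a column of uspanel
uspanel$pricetoearnings_w <- winsorize(uspanel$pricetoearnings, probs = c(0.01,0.99))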

## 0.72 % observations replaced at the bottom

## 0.72 % observations replaced at the top

uspanel$pricetobook_w <- winsorize(uspanel$pricetobook, probs = c(0.01,0.99))

## 0.71 % observations replaced at the bottom

## 0.71 % observations replaced at the top
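As a rough sketch of what the winsorize calls above do, the following base R function replaces the values beyond two percentiles with the percentile values themselves. The function name winsorize_simple and the simulated vector are hypothetical; this is an illustration of the idea, not the implementation used by the statar package.

```r
# Hypothetical helper: cap a numeric vector at two percentile cutoffs
winsorize_simple <- function(x, probs = c(0.01, 0.99)) {
  cutoffs <- quantile(x, probs = probs, na.rm = TRUE)
  # Values below the lower cutoff are replaced by the lower cutoff
  x[!is.na(x) & x < cutoffs[1]] <- cutoffs[1]
  # Values above the upper cutoff are replaced by the upper cutoff
  x[!is.na(x) & x > cutoffs[2]] <- cutoffs[2]
  x
}

# Simulated ratio with two extreme outliers and one missing value
set.seed(1)
ratio <- c(rnorm(1000), -50, 80, NA)
ratio_w <- winsorize_simple(ratio)

# The range shrinks, but no observation is dropped
print(range(ratio, na.rm = TRUE))
print(range(ratio_w, na.rm = TRUE))
print(length(ratio) == length(ratio_w))
```

Unlike trimming, winsorizing keeps every observation, so the panel structure of the dataset is preserved.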

#Let's review the distribution of our winsorized variables with histograms
hist(uspanel$bmr_w)

hist(uspanel$pricetoearnings_w)

hist(uspanel$pricetobook_w)

I have already decided on my independent variables; now I will define my dependent variable. I will evaluate 2 different scenarios: the first one uses historical data, to check whether my ratios have been related to stocks beating the market in the same period; the second one is predictive, relating the ratios to whether the stock beats the market one year ahead. In both cases my dependent variable is a binary variable that equals 1 when the stock's annual return is higher than the market's annual return, so I first need to compute the market return.

#Creating the market annual return:
uspanel$marketannual_R = uspanel$SP500index / plm::lag(uspanel$SP500index,k=4) - 1

#Creating the binary variable to see if the stock return is higher than the market return:
uspanel$r_above_market = ifelse(is.na(uspanel$stockannual_R),NA,
                            ifelse(uspanel$stockannual_R>uspanel$marketannual_R,1,0))

#We can see how many 1's and 0's were calculated:
table(uspanel$r_above_market)

## 
##     0     1 
## 54941 44919

In 54941 firm-quarter observations the stock's annual return did not beat the market's annual return (0). On the other hand, in 44919 observations the stock did beat the market (1).

Run and show the results of the model. Provide a clear interpretation of the logit model. Make sure you interpret the type of effect and the magnitude of the coefficients and their significant levels.

#My first logistic model checks whether my variables have historically been related to the stock return being above the market return
logitmodel <- glm(r_above_market ~ bmr_w + pricetoearnings_w + pricetobook_w ,data = uspanel, family = "binomial",na.action = na.omit)
summary(logitmodel)

## 
## Call:
## glm(formula = r_above_market ~ bmr_w + pricetoearnings_w + pricetobook_w, 
##     family = "binomial", data = uspanel, na.action = na.omit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9702  -1.0916  -0.7788   1.1790   2.8088  
## 
## Coefficients:
##                      Estimate  Std. Error z value            Pr(>|z|)    
## (Intercept)        0.17103721  0.01174770   14.56 <0.0000000000000002 ***
## bmr_w             -0.83724813  0.01511447  -55.39 <0.0000000000000002 ***
## pricetoearnings_w  0.00101986  0.00005885   17.33 <0.0000000000000002 ***
## pricetobook_w      0.02174414  0.00106451   20.43 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 134335  on 97568  degrees of freedom
## Residual deviance: 128625  on 97565  degrees of freedom
##   (53791 observations deleted due to missingness)
## AIC: 128633
## 
## Number of Fisher Scoring iterations: 4

In this model, as mentioned, we look at how these financial ratios have historically been related to the probability of a stock beating the market. Looking at the beta coefficients of the explanatory variables (bmr_w, pricetoearnings_w, pricetobook_w), we can see that the coefficients of pricetoearnings_w and pricetobook_w are positive, the coefficient of bmr_w is negative, and all three are statistically significant (since their p-values are much less than 0.05). Then, we can say that these 3 explanatory variables are significantly related to the probability of a stock beating the market annual return: positively for the price-to-earnings and price-to-book ratios, and negatively for the book-to-market ratio.

Nevertheless, to really interpret the magnitude of the beta coefficients, we need to remember that they are expressed in terms of the logarithm of the odds, so we need to extract the coefficients and get their exponential.

# I extract the coefficients from the model:
coefficients1 <- coef(logitmodel)

# I apply the exponential function to all betas:
coefficient_exp_m1 = exp(coefficients1)
coefficient_exp_m1

##       (Intercept)             bmr_w pricetoearnings_w     pricetobook_w 
##         1.1865349         0.4329002         1.0010204         1.0219823

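Now, we can say that if a firm's bmr_w increases by 1 unit (holding the other variables constant), its odds of beating the market in the same quarter are multiplied by 0.4329, that is, they fall by about 56.7%. If pricetoearnings_w increases by 1 unit, the odds are multiplied by 1.0010 (about 0.1% higher), and if pricetobook_w increases by 1 unit, the odds are multiplied by 1.0220 (about 2.2% higher). These factors apply to the odds p/(1−p), not directly to the probability p.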
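To make the difference between odds and probability concrete, the sketch below plugs the coefficients reported in the summary of the first model into the logistic function for a hypothetical firm. The ratio values of that firm (bmr_w of 0.5 and 1.5, P/E of 15, P/B of 2) are assumptions chosen only for illustration, not values taken from the dataset.

```r
# Coefficients copied from the summary of the first logit model
b <- c(intercept = 0.17103721, bmr_w = -0.83724813,
       pricetoearnings_w = 0.00101986, pricetobook_w = 0.02174414)

# Hypothetical firm, and the same firm with bmr_w one unit higher
firm_low_bmr  <- c(bmr_w = 0.5, pricetoearnings_w = 15, pricetobook_w = 2)
firm_high_bmr <- firm_low_bmr
firm_high_bmr["bmr_w"] <- 1.5

# Log-odds implied by the model for a given set of ratios
log_odds <- function(firm) {
  unname(b["intercept"] + sum(b[names(firm)] * firm))
}

# Odds and probabilities for both cases
odds_low  <- exp(log_odds(firm_low_bmr))
odds_high <- exp(log_odds(firm_high_bmr))
p_low  <- plogis(log_odds(firm_low_bmr))
p_high <- plogis(log_odds(firm_high_bmr))

# The odds change by the factor exp(b_bmr); the probability changes by another factor
print(c(odds_factor = odds_high / odds_low, prob_factor = p_high / p_low))
print(c(p_low = p_low, p_high = p_high))
```

The odds factor reproduces the exponentiated bmr_w coefficient, while the ratio of the two probabilities is a different number that depends on the starting values of the ratios.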

class(uspanel)

## [1] "pdata.frame" "data.frame"

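#Binary variable 4 quarters ahead (a lag of -4 is a lead): did the stock beat the market one year in the future?
uspanel$F4_r_above_market <- plm::lag(uspanel$r_above_market,-4)

#Logit model for beating the market one year in the future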
logitmodel2 <- glm(F4_r_above_market ~ bmr_w + pricetoearnings_w + pricetobook_w, data = uspanel, family = "binomial",na.action = na.omit)
summary(logitmodel2)

## 
## Call:
## glm(formula = F4_r_above_market ~ bmr_w + pricetoearnings_w + 
##     pricetobook_w, family = "binomial", data = uspanel, na.action = na.omit)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.274  -1.095  -1.082   1.261   1.408  
## 
## Coefficients:
##                      Estimate  Std. Error z value             Pr(>|z|)    
## (Intercept)       -0.24987694  0.01095480 -22.810 < 0.0000000000000002 ***
## bmr_w              0.04778504  0.01223995   3.904       0.000094606532 ***
## pricetoearnings_w  0.00029785  0.00005815   5.122       0.000000302048 ***
## pricetobook_w      0.00639322  0.00100978   6.331       0.000000000243 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 130070  on 94451  degrees of freedom
## Residual deviance: 130000  on 94448  degrees of freedom
##   (56908 observations deleted due to missingness)
## AIC: 130008
## 
## Number of Fisher Scoring iterations: 3

For the second model, we look at how these financial ratios are related to the probability of beating the market one year in the future. Looking at the beta coefficients of the explanatory variables (bmr_w, pricetoearnings_w, pricetobook_w), we can see that all the coefficients are positive and statistically significant, since their p-values are much less than 0.05. Then, we can say that these 3 explanatory variables are significantly and positively related to the probability of a stock beating the future market annual return. However, the residual deviance (130000) is only slightly lower than the null deviance (130070), so the explanatory power of the model is small.

Nevertheless, as with model 1, to really interpret the magnitude of the beta coefficients we need to remember that they are expressed in terms of the logarithm of the odds, so we need to extract the coefficients and get their exponential.

# I extract the coefficients from the model:
coefficients2 <- coef(logitmodel2)

# I apply the exponential function to all betas:
coefficient_exp_m2 = exp(coefficients2)
coefficient_exp_m2

##       (Intercept)             bmr_w pricetoearnings_w     pricetobook_w 
##         0.7788966         1.0489451         1.0002979         1.0064137

Now, we can say that if a firm's bmr_w ratio increases by 1 unit, its odds of beating the market one year in the future are multiplied by 1.0489 (about 4.9% higher) compared with the case in which the ratio does not change. If its pricetoearnings_w ratio increases by 1 unit, the odds are multiplied by 1.0003 (about 0.03% higher). Finally, if its pricetobook_w ratio increases by 1 unit, the odds are multiplied by 1.0064 (about 0.64% higher). In each case the other variables are held constant, and the factor applies to the odds, not directly to the probability.

Explain the screening process to select 40 firms. Show the list of firms along with their industries.

We can use the last model to predict the probability that a firm will beat the market return 1 year in the future. For this prediction, we select only the observations of the last quarter of 2020, so we can predict which firms will beat the market in 2021.

#We create a dataset with only the Q4 of 2020:
firmsQ42020 <- uspanel %>%
  select(ticker, NAICS, q, year, F4_r_above_market, pricetoearnings_w, bmr_w, pricetobook_w) %>%
  filter(q=="2020-10-01") %>%
  as.data.frame()
firmsQ42020

The market model

Explain what type of model you used in the second screening.

The Linear regression model

According to Carlos Alberto Dorantes, the simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV).

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. In mathematical terms:

\[E[R_{i}]=\alpha+\beta R_{M}\] We can express the same equation using b0 as alpha, and b1 as market beta:

\[E[R_{i}]=b_{0}+b_{1}R_{M}\] We can estimate the alpha and market beta coefficients by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model.

The market regression model can be expressed as: \[r_{i,t}=b_{0}+b_{1}r_{M,t}+\varepsilon_{t}\] Where:

εt = the error at time t. Appealing to the Central Limit Theorem, this error is assumed to behave like a normally distributed random variable ∼ N(0, σε); the error term εt is expected to have mean=0 and a specific standard deviation σε (also called volatility).

r(i,t) = the return of stock i at time t, written r_{i,t} in the equation.

r(M,t) = the market return at time t, written r_{M,t} in the equation.

b0 and b1 are called regression coefficients.
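A minimal sketch of the market regression model, estimated on simulated monthly returns. The return series, the seed, and the "true" alpha and beta are hypothetical values invented for this illustration; with real data the two vectors would be the continuously compounded returns of the stock and of the market.

```r
# Simulated (hypothetical) monthly continuously compounded returns
set.seed(42)
n_months   <- 48
r_market   <- rnorm(n_months, mean = 0.01, sd = 0.04)
true_alpha <- 0.002
true_beta  <- 1.3
r_stock    <- true_alpha + true_beta * r_market + rnorm(n_months, mean = 0, sd = 0.05)

# Market model: stock return explained by the market return
market_model <- lm(r_stock ~ r_market)
coefs <- summary(market_model)$coefficients
print(coefs)

# b0 (alpha) and b1 (market beta) with their estimates
b0 <- coefs["(Intercept)", "Estimate"]
b1 <- coefs["r_market", "Estimate"]

# b1 equals the covariance of both returns divided by the market variance
print(c(b1 = b1, cov_over_var = cov(r_stock, r_market) / var(r_market)))
```

The coefficient table has one row per coefficient and four columns: estimate, standard error, t value, and p-value.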

Explain the data management process used before you run the market regression models.

Having chosen model 2 (the predictive model), we now run the prediction with it on this new dataset:

#We create the pred variable, which gives each stock's predicted probability of beating the market.
#We use our dataset with the Q4 2020 observations of all firms
firmsQ42020 <- firmsQ42020 %>%
  mutate(pred=predict.glm(logitmodel2,newdata=firmsQ42020,type=c("response")))

hist(firmsQ42020$pred,breaks=100)

Selection of best stocks based on results of the logit model

#We can sort the firms according to the predicted probability of beating the benchmark, and then select those with the highest probability
top40firms <- firmsQ42020 %>%
  arrange(desc(pred)) %>%
  head(40)
top40firms

#Before we create our market regression model, we download the price data of our top 40 firms and of the S&P 500 index (market).

library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## Loading required package: TTR

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

#We map the symbol GSPC to the Yahoo ticker ^GSPC so we can use it in the loop later.
setSymbolLookup(GSPC = list(name="^GSPC"))

#We convert the ticker column to a character vector
tickers_40_firms = as.vector(top40firms$ticker)

#We add the GSPC (S&P 500) to the end of the tickers vector
tickers_40_firms = c(tickers_40_firms, "GSPC")
getSymbols(tickers_40_firms, periodicity = "monthly",
             from = "2017-01-01", to = "2020-12-31")

## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

## pausing 1 second between requests for more than 5 symbols

##  [1] "NOW"  "SQ"   "TSLA" "ZM"   "ADSK" "PODD" "WMG"  "APPS" "ENPH" "FICO"
## [11] "PCTY" "UPS"  "CSBR" "IDXX" "PAYC" "COR"  "MA"   "MTD"  "IAA"  "SPGI"
## [21] "CTXS" "WAT"  "BKNG" "CL"   "HD"   "STAA" "NSP"  "KMB"  "ORLY" "WU"  
## [31] "NRT"  "BPT"  "EVFM" "TVTY" "BLMN" "EXPI" "BBIO" "TTD"  "BURL" "RPD" 
## [41] "GSPC"

Explain the algorithm used to run one regression model for each stock

To explain the algorithm used to run one regression model for each stock, we can take Apple as an example.

library(quantmod)
getSymbols(c("AAPL", "^GSPC"), from="2017-01-01", to= "2020-12-31", periodicity="monthly")

## [1] "AAPL"  "^GSPC"

#Merge both objects into one dataset by their adj price.
adjprices_FE<-Ad(merge(AAPL,GSPC))

#I calculate continuously compounded returns for Apple and the S&P500
returns_FEE <- diff(log(adjprices_FE)) 

#I drop the NAs and rename the columns
returns_FEE <- na.omit(returns_FEE)
colnames(returns_FEE) <- c("AAPL", "GSPC")

#Now we can visualize the relationship
plot.default(x=returns_FEE$GSPC,y=returns_FEE$AAPL)
abline(lm(returns_FEE$AAPL ~ returns_FEE$GSPC),col='blue')

In this graph, we can see the linear relationship between the Apple stock returns and the S&P 500 returns. On the x axis we have the independent variable, the S&P 500 returns; and on the y axis we have the dependent variable, the Apple returns. The blue regression line shows the return we would expect from Apple for a given market return, according to their historical relationship.

Generate the regression models for all stocks and show the main results for all models: Beta0, Beta1, standard errors, p-values.

To avoid writing the name of each ticker in the code, we can use the get function. The get function receives the name of an object (here, the dataset that getSymbols created for each ticker) and returns that object.
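A small sketch of this pattern with two made-up vectors in place of the downloaded datasets. The object names stock_a and stock_b are hypothetical, and cbind stands in for the merge of xts datasets; the envir argument is passed explicitly here only so that the snippet also runs outside the global environment.

```r
# Two hypothetical objects standing in for the datasets created by getSymbols
stock_a <- c(10, 11, 12)
stock_b <- c(20, 19, 21)

# Vector with the NAMES of the objects (character strings)
object_names <- c("stock_a", "stock_b")

# get() returns the object whose name is given as a string
first_object <- get(object_names[1])
print(first_object)

# lapply applies get to every name and returns a list of objects
objects_list <- lapply(object_names, get, envir = environment())

# do.call passes the elements of the list as the arguments of cbind
combined <- do.call(cbind, objects_list)
colnames(combined) <- object_names
print(combined)
```

With this approach the code does not change when the list of tickers changes, since only the vector of names has to be updated.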

#We create our datasetlist
datasets_list_FE <- lapply(tickers_40_firms, get)

#Now we merge all datasets using the function do.call
prices_FE <- do.call(merge, datasets_list_FE)

# We select only Adjusted prices:
prices_FE <- Ad(prices_FE)

#Calculating continuously compounded returns:
r_FE = diff(log(prices_FE))

Now we have the 41 datasets (the 40 stocks plus the S&P 500) in a list, where each element of the list is an xts dataset with the price data, and we have merged them into r_FE, one dataset with one column of continuously compounded returns per ticker.

#Now we build the algorithm used to run one regression model for each stock
matrix_results = c()

#A loop helps us to execute a set of commands many times.
for (i in 1:40) {
  #Column name of the stock i (ticker plus the ".Adjusted" suffix)
  s_FE = colnames(r_FE)[i]

  #We run the regression only if the stock has at least 20 monthly returns
  if (sum(complete.cases(r_FE[,i]))>=20){
    #Column 41 is the S&P 500 (GSPC), our market return
    model_FE <- lm(r_FE[,i] ~ r_FE[,41])
    s2 = summary(model_FE)
    coefs = s2$coefficients
  } else {
    #Not enough data: we keep the stock in the table with NA results
    coefs = matrix(NA_real_, nrow = 2, ncol = 4)
  }

  #We store the estimates as numbers in a one-row data frame
  #(combining them with the ticker using c() would turn them into text)
  result = data.frame(Ticker = s_FE,
                      b0_Coefficient = coefs[1,1],
                      b0_Standard_Error = coefs[1,2],
                      b0_P_value = coefs[1,4],
                      b1_Coefficient = coefs[2,1],
                      b1_Standard_Error = coefs[2,2],
                      b1_P_value = coefs[2,4])
  matrix_results = rbind(matrix_results, result)
}

#We make sure the results are a data frame and set the column titles
matrix_results = as.data.frame(matrix_results)
names(matrix_results) = c("Ticker", "b0_Coefficient", "b0_Standard_Error", "b0_P_value", "b1_Coefficient", "b1_Standard_Error", "b1_P_value")

#We see our results
matrix_results

This is my final matrix; here I have stored the required values (Beta0, Beta1, standard errors, p-values) for each stock.

Justify the criteria you used to select only 10 firms based on the previous results of the models. You have to show that you understand the meaning of beta0 and beta1 and their corresponding standard errors and p-values.

Beta0 (B0, or alpha) is the return the stock is expected to deliver when the market return is zero, while beta1 (B1, or market beta) measures how sensitive the stock return is to the market return: a B1 above 1 means the stock tends to amplify market movements. Assuming we are building this portfolio in 2020, I decided to work with B1, because due to the pandemic the US Federal Reserve (FED) injected more money than ever and most of this economic stimulus was expected to flow into the markets. In addition, most stocks had dropped about 30% and their recovery seemed imminent. If we were in 2021 instead, I would choose B0, because there is a lot of fear, doubt, and uncertainty about macroeconomic topics. A positive and statistically significant B0 means the stock is expected to deliver a positive return even when the market return is 0%. Those stocks are real jewels.

Once I decided to work with B1, I check which of these coefficients are statistically supported by looking at their corresponding p-values. If the p-value is lower than 0.05, the coefficient is statistically different from zero at the 95% confidence level. We can also look at the standard error, which measures how precisely the coefficient is estimated: it is the estimated standard deviation of the coefficient, so the larger the standard error relative to the coefficient, the less we can trust the estimate. The risk of the stock itself is reflected in the size of B1 (market risk) and in the volatility of the regression errors.
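The link between a coefficient, its standard error, and its p-value can be sketched as follows. The estimate, the standard error, and the number of observations are hypothetical numbers chosen for illustration; they do not come from the regression results of this work.

```r
# Hypothetical market beta, standard error, and number of monthly returns
b1_estimate  <- 1.4
b1_std_error <- 0.3
n_obs        <- 48
deg_free     <- n_obs - 2   # two coefficients (b0 and b1) are estimated

# t value: how many standard errors the estimate is away from zero
t_value <- b1_estimate / b1_std_error

# Two-sided p-value for the null hypothesis b1 = 0
p_value <- 2 * pt(-abs(t_value), df = deg_free)

# 95% confidence interval: estimate plus/minus about 2 standard errors
t_crit <- qt(0.975, df = deg_free)
ci_95 <- c(lower = b1_estimate - t_crit * b1_std_error,
           upper = b1_estimate + t_crit * b1_std_error)

print(c(t_value = t_value, p_value = p_value))
print(ci_95)
```

When the p-value is below 0.05, the 95% confidence interval does not include zero; a larger standard error widens the interval and raises the p-value.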

Select and show the top 10 firms that satisfy your criteria.

As I have said, I will work with B1, but to satisfy my curiosity I took a quick look at which stocks the code gives me if I choose B0.

#B0 Criteria
#as.numeric() makes sure the filter and the sorting are numeric even if the values are stored as text
above_market_b0 <- matrix_results %>%
  filter(as.numeric(b0_P_value) < .05) %>%
  arrange(desc(as.numeric(b0_Coefficient))) %>%
  head(10)
above_market_b0

#B1 Criteria
above_market <- matrix_results %>%
  filter(as.numeric(b1_P_value) < .05) %>%
  arrange(desc(as.numeric(b1_Coefficient))) %>%
  head(10)
above_market

Finally, I get my top 10 stocks arranged by B1 coefficient and filtered by a p value lower than 0.05.

Portfolio optimization – design of portfolios

Explain how you use portfolio theory for the design of your final portfolios with the 10 selected stocks.

Once we have determined our top 10 winning stocks, the next step is to create the smartest portfolio. As financial analysts we look for the maximum return for the minimum risk, and there are many tools that can help us with this task, such as the optimal portfolio, the Sharpe ratio, the efficient frontier, VaR, the capital market line, etc. I will explain these concepts in the next points.
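As a minimal sketch of the quantities behind these tools, the snippet below computes the expected return, the risk, and the Sharpe ratio of a portfolio of three assets. All the inputs (expected returns, volatilities, correlations, weights, and risk-free rate) are hypothetical monthly figures invented for this illustration.

```r
# Hypothetical monthly expected returns and volatilities of 3 assets
er  <- c(A = 0.010, B = 0.020, C = 0.015)
vol <- c(A = 0.05,  B = 0.10,  C = 0.07)

# Hypothetical correlation matrix and the implied variance-covariance matrix
corr <- matrix(c(1.0, 0.3, 0.2,
                 0.3, 1.0, 0.4,
                 0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)
cov_mat <- diag(vol) %*% corr %*% diag(vol)

# Portfolio weights (they must add up to 1)
w <- c(0.5, 0.2, 0.3)

# Portfolio expected return: weighted average of the expected returns
port_er <- sum(w * er)

# Portfolio variance: w' * COV * w; the risk is its square root
port_var <- as.numeric(t(w) %*% cov_mat %*% w)
port_sd  <- sqrt(port_var)

# Sharpe ratio: premium return per unit of risk
rf <- 0.001
sharpe <- (port_er - rf) / port_sd

print(c(expected_return = port_er, risk = port_sd, sharpe = sharpe))
```

Because the correlations are below 1, the portfolio risk is lower than the weighted average of the individual volatilities, which is the diversification effect that portfolio optimization exploits.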

Explain your data management and estimation processes needed (expected returns, variance-covariance matrix, etc) before you do portfolio optimization

We start by downloading historical monthly prices from Yahoo Finance:

#I create a new dataset with my top 10 firms
tickers_top_10 = c("TVTY", "EXPI", "SQ", "BLMN", "TTD", "APPS", "TSLA", "NSP", "NRT", "PAYC" )
getSymbols(Symbols = tickers_top_10, periodicity="monthly",from="2018-01-01", to="2021-10-01")

## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols

##  [1] "TVTY" "EXPI" "SQ"   "BLMN" "TTD"  "APPS" "TSLA" "NSP"  "NRT"  "PAYC"

Now we merge all datasets using the do.call function.

datasets_list_10 <- lapply(tickers_top_10, get)

prices_10 <- do.call(merge, datasets_list_10)

#I select only Adjusted prices
prices_10 <- Ad(prices_10)

#I change the name of the columns to be equal to the tickers vector
names(prices_10) = tickers_top_10

Expected return

For this final project I will use the simplest method to estimate the expected return of each stock: the geometric mean of historical returns. According to Dorantes, the geometric mean of historical returns is the constant per-period return that, if earned in every period, would produce the same final holding-period return as the stock actually delivered.

#Calculating continuously compounded returns
r_top_10 = diff(log(prices_10))

#Now we get the expected return for each stock as the geometric mean of their historical returns
#colMeans calculate the arithmetic mean of 1 or more columns of a dataset.
ccmean_returns  = colMeans(r_top_10, na.rm=TRUE)

#Once we have the arithmetic mean cc returns, we convert them to simple returns:
ER = exp(ccmean_returns) - 1
ER

##         TVTY         EXPI           SQ         BLMN          TTD         APPS 
## -0.011726864  0.048158959  0.037781256  0.003949159  0.062662339  0.083347166 
##         TSLA          NSP          NRT         PAYC 
##  0.055886431  0.014792271  0.012733782  0.039113838
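To see why averaging the continuously compounded returns and then applying exp() gives the geometric mean, here is a small sketch with hypothetical prices for a single stock. The price values are assumptions chosen only for illustration.

```r
# Hypothetical month-end prices for one stock (illustrative values only)
prices <- c(100, 110, 99, 120)

# Simple and continuously compounded (log) monthly returns
simple_returns <- prices[-1] / prices[-length(prices)] - 1
cc_returns     <- diff(log(prices))

# Route 1: average the log returns, then convert back to a simple return
geo_mean_from_cc <- exp(mean(cc_returns)) - 1

# Route 2: textbook geometric mean of the gross simple returns
geo_mean_direct <- prod(1 + simple_returns)^(1 / length(simple_returns)) - 1

# Compounding the geometric mean recovers the total holding-period return
hpr_total    <- prices[length(prices)] / prices[1] - 1
hpr_from_geo <- (1 + geo_mean_direct)^length(simple_returns) - 1

c(geo_mean_from_cc, geo_mean_direct)
c(hpr_total, hpr_from_geo)
```

Both routes give the same number, and compounding it over the three periods reproduces the total holding-period return of the stock.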

Covariance

Covariance measures the directional relationship between the returns on two assets. According to Investopedia, a positive covariance means that asset returns move together, while a negative covariance means they move inversely. Covariance is calculated by analyzing return surprises (deviations from the expected return) or by multiplying the correlation between the two variables by the standard deviation of each variable.

#I can calculate the Variance-Covariance matrix with the var function:
COV = var(r_top_10,na.rm=TRUE)
COV

##            TVTY        EXPI          SQ       BLMN         TTD          APPS
## TVTY 0.04249159 0.018860905 0.017120865 0.02767962 0.020118189  0.0224362517
## EXPI 0.01886091 0.065106403 0.024159600 0.01125635 0.020724836  0.0252370256
## SQ   0.01712087 0.024159600 0.027630435 0.01645213 0.026442486  0.0162575144
## BLMN 0.02767962 0.011256348 0.016452130 0.03570464 0.016458649  0.0212645852
## TTD  0.02011819 0.020724836 0.026442486 0.01645865 0.042341819  0.0231013573
## APPS 0.02243625 0.025237026 0.016257514 0.02126459 0.023101357  0.0527282115
## TSLA 0.01842546 0.013631225 0.013952653 0.01519728 0.011223084  0.0138179869
## NSP  0.01390310 0.011152650 0.014124863 0.01632423 0.015913045  0.0085288079
## NRT  0.01543794 0.004638885 0.004052939 0.01248453 0.003520868 -0.0001361726
## PAYC 0.01322317 0.011581701 0.013855906 0.01315852 0.018467853  0.0105536702
##             TSLA         NSP           NRT        PAYC
## TVTY 0.018425464 0.013903104  0.0154379405 0.013223167
## EXPI 0.013631225 0.011152650  0.0046388852 0.011581701
## SQ   0.013952653 0.014124863  0.0040529386 0.013855906
## BLMN 0.015197275 0.016324231  0.0124845293 0.013158523
## TTD  0.011223084 0.015913045  0.0035208678 0.018467853
## APPS 0.013817987 0.008528808 -0.0001361726 0.010553670
## TSLA 0.035156948 0.008457586  0.0031586758 0.004407346
## NSP  0.008457586 0.022450073  0.0095530416 0.010243578
## NRT  0.003158676 0.009553042  0.0264989639 0.006036331
## PAYC 0.004407346 0.010243578  0.0060363315 0.019186406
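The two ways of calculating covariance described above can be checked with a short sketch. The two return series are hypothetical values used only for illustration.

```r
# Hypothetical monthly returns for two assets (illustrative values only)
r_a <- c(0.02, -0.01, 0.04, 0.03, -0.02)
r_b <- c(0.01, -0.02, 0.05, 0.01, -0.01)

# Route 1: average product of the "return surprises" (deviations from the mean),
# using the n - 1 divisor that cov() and var() also use
surprise_a <- r_a - mean(r_a)
surprise_b <- r_b - mean(r_b)
cov_manual <- sum(surprise_a * surprise_b) / (length(r_a) - 1)

# Route 2: correlation times the two standard deviations
cov_from_cor <- cor(r_a, r_b) * sd(r_a) * sd(r_b)

# Both should match the built-in function
c(cov_manual, cov_from_cor, cov(r_a, r_b))
```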

Correlation

Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. According to Investopedia, correlations are used in advanced portfolio management, computed as the correlation coefficient, which has a value that must fall between -1.0 and +1.0.

It is always good to review the correlation matrix of the asset returns of a portfolio.

#I can get the correlation matrix from the covariance matrix as follows
CORR = cov2cor(COV)
CORR

##           TVTY      EXPI        SQ      BLMN       TTD         APPS      TSLA
## TVTY 1.0000000 0.3585909 0.4996666 0.7106346 0.4742995  0.473998866 0.4767178
## EXPI 0.3585909 1.0000000 0.5696184 0.2334658 0.3947249  0.430729678 0.2849164
## SQ   0.4996666 0.5696184 1.0000000 0.5238005 0.7730786  0.425930521 0.4476693
## BLMN 0.7106346 0.2334658 0.5238005 1.0000000 0.4232989  0.490086857 0.4289412
## TTD  0.4742995 0.3947249 0.7730786 0.4232989 1.0000000  0.488912592 0.2908854
## APPS 0.4739989 0.4307297 0.4259305 0.4900869 0.4889126  1.000000000 0.3209354
## TSLA 0.4767178 0.2849164 0.4476693 0.4289412 0.2908854  0.320935437 1.0000000
## NSP  0.4501439 0.2917141 0.5671285 0.5765826 0.5161306  0.247889387 0.3010454
## NRT  0.4600695 0.1116831 0.1497827 0.4058779 0.1051116 -0.003642955 0.1034868
## PAYC 0.4631132 0.3276904 0.6017890 0.5027453 0.6479401  0.331806721 0.1696972
##            NSP          NRT      PAYC
## TVTY 0.4501439  0.460069493 0.4631132
## EXPI 0.2917141  0.111683066 0.3276904
## SQ   0.5671285  0.149782719 0.6017890
## BLMN 0.5765826  0.405877883 0.5027453
## TTD  0.5161306  0.105111568 0.6479401
## APPS 0.2478894 -0.003642955 0.3318067
## TSLA 0.3010454  0.103486816 0.1696972
## NSP  1.0000000  0.391668230 0.4935669
## NRT  0.3916682  1.000000000 0.2677083
## PAYC 0.4935669  0.267708288 1.0000000
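As a sketch of what cov2cor does, the correlation matrix can be rebuilt by hand by dividing each covariance by the product of the two standard deviations. The example uses three of the stocks, with values copied from the COV output above; restricting it to three stocks is a simplification made only to keep the example short.

```r
# Covariance matrix for three of the stocks, copied from the COV output above
stocks <- c("TSLA", "NRT", "PAYC")
cov_3 <- matrix(c(0.035156948,  0.0031586758, 0.004407346,
                  0.0031586758, 0.0264989639, 0.0060363315,
                  0.004407346,  0.0060363315, 0.019186406),
                nrow = 3, byrow = TRUE, dimnames = list(stocks, stocks))

# Standard deviations are the square roots of the diagonal (the variances)
sds <- sqrt(diag(cov_3))

# Correlation = covariance / (sd_i * sd_j) for every pair of stocks
corr_manual  <- cov_3 / outer(sds, sds)
corr_builtin <- cov2cor(cov_3)

round(corr_manual, 4)
```

Low correlations, such as the one between TSLA and NRT, are what make diversification effective, because the stocks tend not to fall at the same time.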

Calculate the Global Minimum Variance (GMV) portfolio and interpret the result.

library(IntroCompFinR)
#With the function globalMin.portfolio I get the weights that correspond to the min variance combination
GMVPort = globalMin.portfolio(ER,COV, shorts = FALSE)
GMVPort

## Call:
## globalMin.portfolio(er = ER, cov.mat = COV, shorts = FALSE)
## 
## Portfolio expected return:     0.03533639 
## Portfolio standard deviation:  0.1052184 
## Portfolio weights:
##   TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC 
## 0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

As we can see, these are the weights we should use if we want to create the portfolio with the minimum variance. In this case, we got:

EXPI  0.12%
APPS  7.24%
TSLA 18.94%
NSP  12.11%
NRT  27.38%
PAYC 34.22%

These are the weights we should allocate in a portfolio if we have a conservative risk profile.
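For intuition, when short sales are allowed the minimum variance weights have a closed-form solution, w = S^-1 1 / (1' S^-1 1), where S is the covariance matrix. The sketch below applies it to three of the stocks, with values copied from the COV output above. It is not the calculation behind the result shown above, which forbids short sales (shorts = FALSE) and therefore needs a numerical optimizer.

```r
# Covariance matrix for three of the stocks, copied from the COV output above
stocks <- c("TSLA", "NRT", "PAYC")
cov_3 <- matrix(c(0.035156948,  0.0031586758, 0.004407346,
                  0.0031586758, 0.0264989639, 0.0060363315,
                  0.004407346,  0.0060363315, 0.019186406),
                nrow = 3, byrow = TRUE, dimnames = list(stocks, stocks))

# Closed-form minimum variance weights: w = S^-1 1 / (1' S^-1 1)
ones <- rep(1, 3)
inv_cov_ones <- solve(cov_3, ones)
w_gmv <- inv_cov_ones / sum(inv_cov_ones)
names(w_gmv) <- stocks

# Portfolio risk of the GMV mix versus a naive equal-weight mix
sd_gmv   <- sqrt(drop(t(w_gmv) %*% cov_3 %*% w_gmv))
w_equal  <- ones / 3
sd_equal <- sqrt(drop(t(w_equal) %*% cov_3 %*% w_equal))

round(w_gmv, 4)
c(sd_gmv = sd_gmv, sd_equal = sd_equal)
```

The minimum variance mix should never be riskier than the equal-weight mix of the same stocks, which is a quick sanity check on the weights.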

Generate 1,000 portfolios with random weights and explain the process and plot of these 1,000 portfolios

#I initialize the weight matrix
W = c()

#I build a 10 x 1000 matrix: each row holds 1,000 random values between 0 and 1 for one stock
for(i in 1:10) {
  W = rbind(W,runif(1000))
}

#I create a vector with the sum of weights for each portfolio (each column)
sumw <- colSums(W)

#I do another loop to divide each weight by the sum of weights, so each portfolio adds up to 1
for(i in 1:10)  {
  W[i,]<-W[i,]/sumw
}

#To check that I have done it right, I show the first 5 portfolios (columns) and verify that each adds up to 1.
W[,1:5]

##             [,1]       [,2]        [,3]       [,4]       [,5]
##  [1,] 0.09629904 0.09879849 0.010081813 0.06185115 0.12885088
##  [2,] 0.12074207 0.13491302 0.134310928 0.12237752 0.15077208
##  [3,] 0.03579889 0.02466442 0.020581158 0.13964685 0.07348370
##  [4,] 0.06782198 0.14655598 0.003425133 0.21210159 0.18256282
##  [5,] 0.10972881 0.07127964 0.043461037 0.06937533 0.04966814
##  [6,] 0.09525344 0.03825557 0.147652372 0.11870507 0.13469365
##  [7,] 0.14242957 0.02532555 0.266256121 0.01265410 0.09843562
##  [8,] 0.09255346 0.17532385 0.136407641 0.06148205 0.04137371
##  [9,] 0.05527265 0.11709360 0.183870314 0.08483872 0.03929109
## [10,] 0.18410008 0.16778988 0.053953483 0.11696762 0.10086832

colSums(W[,1:5])

## [1] 1 1 1 1 1

The 1 1 1 1 1 in the last line tells us that the weights of each portfolio add up to exactly 1 (100%). So, now we can plot the 1,000 portfolios we generated.
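The same weight matrix can also be built without loops. The following is a minimal sketch of that alternative; the seed and the names W_raw and W_alt are assumptions introduced only for this example.

```r
set.seed(123)  # assumed seed, only so the sketch is reproducible

n_assets     <- 10
n_portfolios <- 1000

# One column per portfolio, one row per asset, filled with uniform draws
W_raw <- matrix(runif(n_assets * n_portfolios), nrow = n_assets)

# Divide every column by its own sum so each portfolio's weights add up to 1
W_alt <- sweep(W_raw, 2, colSums(W_raw), "/")

dim(W_alt)
range(colSums(W_alt))
```

One thing to keep in mind with this sampling scheme is that normalized uniform draws tend to stay fairly close to equal weights, so the random portfolios cover only part of the feasible risk-return region.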

#I calculate the expected return of the 1,000 portfolios
ERPortfolios = t(W) %*% ER

#I calculate the expected risk of the 1,000 portfolios
VARPortfolios = diag(t(W) %*% COV %*% W)
RISKPortfolios = sqrt(VARPortfolios)

#Now we can visualize the portfolios in the risk-return space
#The axis label arguments of plot are xlab and ylab
plot(x=RISKPortfolios,y= ERPortfolios,
     xlab="Portfolio Expected Risk", ylab="Portfolio Expected Return")

This graph shows the risk-return space of the 1,000 portfolios; each point is a portfolio that our loop has created. As we can see, most of them have an expected return between 3% and 4% and a risk between 12% and 14%.
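The expected return of each portfolio is w' ER and its variance is w' COV w. The sketch below shows both with hypothetical inputs and also a column-wise way of getting the variances that avoids building the full 1000 x 1000 matrix that diag(t(W) %*% COV %*% W) creates. All inputs and names here (er_toy, cov_toy, W_toy) are assumptions for illustration.

```r
set.seed(42)  # assumed seed, for reproducibility of the sketch only

n_assets     <- 4
n_portfolios <- 200

# Hypothetical inputs: expected returns and a valid covariance matrix
er_toy  <- c(0.01, 0.02, 0.03, 0.04)
X       <- matrix(rnorm(60 * n_assets, sd = 0.1), ncol = n_assets)
cov_toy <- var(X)

# Random weights, one portfolio per column, each column summing to 1
W_toy <- matrix(runif(n_assets * n_portfolios), nrow = n_assets)
W_toy <- sweep(W_toy, 2, colSums(W_toy), "/")

# Expected return of every portfolio: w' ER
er_port <- drop(t(W_toy) %*% er_toy)

# Variance of every portfolio: w' COV w
# Route 1 (as in the text): full n x n product, keep only the diagonal
var_diag <- diag(t(W_toy) %*% cov_toy %*% W_toy)
# Route 2: column-wise sums, which never builds the n x n matrix
var_cols <- colSums(W_toy * (cov_toy %*% W_toy))

risk_port <- sqrt(var_cols)
summary(risk_port)
```

With 1,000 portfolios either route is fast; the column-wise version mainly matters if the number of simulated portfolios grows much larger.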

Calculate the efficient frontier and interpret the results

library(IntroCompFinR)

#I first estimate my efficient frontier
efrontier <- efficient.frontier(ER, COV, nport = 100, 
                         alpha.min = -0.5, 
                         alpha.max = 1.5, shorts = FALSE)

#Then I plot it
plot(efrontier, plot.assets=TRUE, col="red")

This graph gives us an idea of which combinations of our stocks are the most efficient. In other words, the efficient frontier is the set of optimal portfolios that offer the highest expected return for a defined level of risk or the lowest risk for a given level of expected return.

A successful optimization of return versus risk should place a portfolio along the efficient frontier, which is the line we have just calculated.

Calculate the tangent (optimal) portfolio and interpret the results

#We declare our risk free rate
rfree = 0

#We get our optimal weights
tangentPort = tangency.portfolio(ER, COV, rfree, shorts=FALSE)
tangentPort

## Call:
## tangency.portfolio(er = ER, cov.mat = COV, risk.free = rfree, 
##     shorts = FALSE)
## 
## Portfolio expected return:     0.05826172 
## Portfolio standard deviation:  0.1320321 
## Portfolio weights:
##   TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC 
## 0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

#We make visual the weights
tangentPortWeights = getPortfolio(ER, COV, weights=tangentPort$weights)
plot(tangentPortWeights, col="red")

#Finally, we plot all the created elements, where the red dot is our optimal portfolio

plot(efrontier, plot.assets=TRUE, col="blue", pch=16)
points(GMVPort$sd, GMVPort$er, col="green", pch=16, cex=2)
points(tangentPort$sd, tangentPort$er, col="red", pch=16, cex=2)
text(GMVPort$sd, GMVPort$er, labels="GLOBAL MIN", pos=2)
text(tangentPort$sd, tangentPort$er, labels="TANGENCY", pos=2)

SharpeRatio = (tangentPort$er - rfree)/tangentPort$sd

abline(a=rfree, b=SharpeRatio, col="green", lwd=2)

This is our final plot of the efficient frontier. Recall that the Capital Market Line (CML) is the green line that goes from the risk-free rate to the tangent portfolio. The point where this line touches the frontier is the optimal portfolio, so if we choose to build a portfolio with these firms, we should use the tangency weights. If we prefer a less risky portfolio, we look at the global minimum variance portfolio.
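When short sales are allowed, the tangency weights are proportional to S^-1 (ER - rf), where S is the covariance matrix. The sketch below applies this to three of the stocks, with expected returns and covariances copied from the ER and COV outputs above. It is an illustration of the idea and not the calculation behind the result above, which forbids short sales and uses all 10 stocks.

```r
# Expected returns and covariances for three stocks, copied from the outputs above
stocks <- c("TSLA", "NRT", "PAYC")
er_3  <- c(TSLA = 0.055886431, NRT = 0.012733782, PAYC = 0.039113838)
cov_3 <- matrix(c(0.035156948,  0.0031586758, 0.004407346,
                  0.0031586758, 0.0264989639, 0.0060363315,
                  0.004407346,  0.0060363315, 0.019186406),
                nrow = 3, byrow = TRUE, dimnames = list(stocks, stocks))
rfree <- 0  # same risk-free rate the text assumes

# Closed-form tangency weights (short sales allowed): w proportional to S^-1 (ER - rf)
z     <- solve(cov_3, er_3 - rfree)
w_tan <- z / sum(z)

# Helper functions for portfolio return, risk, and Sharpe ratio
port_er <- function(w) sum(w * er_3)
port_sd <- function(w) sqrt(drop(t(w) %*% cov_3 %*% w))
sharpe  <- function(w) (port_er(w) - rfree) / port_sd(w)

w_equal <- rep(1 / 3, 3)

round(w_tan, 4)
c(tangency = sharpe(w_tan), equal_weight = sharpe(w_equal))
```

The Sharpe ratio of the tangency portfolio is also the slope of the CML, which is why the code above passes it to abline as the b argument.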

Propose 2 portfolios: the aggressive and the conservative portfolios. Explain your criteria for this proposal. Show the expected return and expected risk of these 2 portfolios.

Conservative portfolio

#For the conservative portfolio I use once again the global minimum variance portfolio
GMVPort = globalMin.portfolio(ER,COV, shorts = FALSE)
GMVPort

## Call:
## globalMin.portfolio(er = ER, cov.mat = COV, shorts = FALSE)
## 
## Portfolio expected return:     0.03533639 
## Portfolio standard deviation:  0.1052184 
## Portfolio weights:
##   TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC 
## 0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

#I visualize the weights
GMVPort_weights = getPortfolio(ER, COV, weights=GMVPort$weights)
plot(GMVPort_weights, col="black")

Aggressive portfolio

#As in the previous part, I set a risk-free rate of 0 and get the optimal weights at the tangency point
rfree = 0
tangentPort = tangency.portfolio(ER, COV,rfree, shorts=FALSE)
tangentPort

## Call:
## tangency.portfolio(er = ER, cov.mat = COV, risk.free = rfree, 
##     shorts = FALSE)
## 
## Portfolio expected return:     0.05826172 
## Portfolio standard deviation:  0.1320321 
## Portfolio weights:
##   TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC 
## 0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

#I visualize the weights
tangentPortWeights = getPortfolio(ER, COV, weights=tangentPort$weights)
plot(tangentPortWeights, col="black")

At this point, I have determined two of the three winning portfolios I would suggest to the bank. I decided to use the GMV portfolio for the conservative profile because it is designed for people who do not like risk: it gives us the combination of assets with the lowest possible risk.

On the other hand, I decided to use the tangency portfolio for the aggressive profile. As we can see, this portfolio takes on more risk, but with an optimal combination of assets: it offers the highest expected return per unit of risk (the highest Sharpe ratio).

Propose a disruptive portfolio based on 1 cognitive bias from Behavioral Finance. Make any assumption you wish so that you can justify your portfolio design based on the cognitive bias you select. Estimate the expected return and risk of this portfolio.

Selected bias: Herd behavior bias

Herd behavior happens when investors follow others rather than making their own decisions based on financial data. For example, if all your friends are investing in penny stocks, you might start too even though it’s risky.

To create the disruptive portfolio, I plot the efficient frontier again, but I reduce the number of portfolios.

#I first estimate my efficient frontier
efrontier_2 <- efficient.frontier(ER, COV, nport = 25, 
                         alpha.min = -0.5, 
                         alpha.max = 1.5, shorts = FALSE)

#Then I plot it
plot(efrontier_2, plot.assets=TRUE, col="red")

Now, I look for the portfolio that best fits the bias I selected.

efrontier_2$weights

##         TVTY     EXPI SQ BLMN      TTD     APPS     TSLA      NSP      NRT
## port 1     0 0.001221  0    0 0.000000 0.072389 0.189357 0.121050 0.273782
## port 2     0 0.000000  0    0 0.000000 0.090906 0.201104 0.090929 0.263820
## port 3     0 0.000000  0    0 0.000000 0.109043 0.212672 0.060530 0.253782
## port 4     0 0.000000  0    0 0.000000 0.127180 0.224239 0.030130 0.243743
## port 5     0 0.000000  0    0 0.000000 0.145375 0.235783 0.000000 0.233539
## port 6     0 0.000000  0    0 0.000000 0.170030 0.244729 0.000000 0.204735
## port 7     0 0.000000  0    0 0.000000 0.194685 0.253674 0.000000 0.175931
## port 8     0 0.000000  0    0 0.006742 0.217053 0.261663 0.000000 0.148705
## port 9     0 0.000000  0    0 0.019091 0.237521 0.268858 0.000000 0.122790
## port 10    0 0.000000  0    0 0.031440 0.257988 0.276052 0.000000 0.096875
## port 11    0 0.000000  0    0 0.043789 0.278456 0.283247 0.000000 0.070960
## port 12    0 0.000000  0    0 0.056138 0.298923 0.290442 0.000000 0.045045
## port 13    0 0.000000  0    0 0.068487 0.319391 0.297636 0.000000 0.019131
## port 14    0 0.000000  0    0 0.084185 0.342896 0.302787 0.000000 0.000000
## port 15    0 0.000000  0    0 0.109328 0.374969 0.302172 0.000000 0.000000
## port 16    0 0.000000  0    0 0.134471 0.407041 0.301558 0.000000 0.000000
## port 17    0 0.000000  0    0 0.159614 0.439114 0.300944 0.000000 0.000000
## port 18    0 0.000000  0    0 0.184756 0.471186 0.300329 0.000000 0.000000
## port 19    0 0.000000  0    0 0.196998 0.514305 0.288697 0.000000 0.000000
## port 20    0 0.000000  0    0 0.165416 0.594946 0.239639 0.000000 0.000000
## port 21    0 0.000000  0    0 0.133834 0.675586 0.190580 0.000000 0.000000
## port 22    0 0.000000  0    0 0.102252 0.756227 0.141522 0.000000 0.000000
## port 23    0 0.000000  0    0 0.070669 0.836867 0.092463 0.000000 0.000000
## port 24    0 0.000000  0    0 0.039087 0.917508 0.043405 0.000000 0.000000
## port 25    0 0.000000  0    0 0.000000 1.000000 0.000000 0.000000 0.000000
##             PAYC
## port 1  0.342201
## port 2  0.353241
## port 3  0.363974
## port 4  0.374707
## port 5  0.385303
## port 6  0.380507
## port 7  0.375710
## port 8  0.365837
## port 9  0.351741
## port 10 0.337644
## port 11 0.323548
## port 12 0.309452
## port 13 0.295356
## port 14 0.270132
## port 15 0.213531
## port 16 0.156930
## port 17 0.100329
## port 18 0.043728
## port 19 0.000000
## port 20 0.000000
## port 21 0.000000
## port 22 0.000000
## port 23 0.000000
## port 24 0.000000
## port 25 0.000000

I will work with portfolio number 17:

        TVTY     EXPI SQ BLMN      TTD     APPS     TSLA      NSP      NRT     PAYC
port 17    0 0.000000  0    0 0.159614 0.439114 0.300944 0.000000 0.000000 0.100329

#I check the expected return and risk
efrontier_2

## Call:
## efficient.frontier(er = ER, cov.mat = COV, nport = 25, alpha.min = -0.5, 
##     alpha.max = 1.5, shorts = FALSE)
## 
## Frontier portfolios' expected returns and standard deviations
##    port 1 port 2 port 3 port 4 port 5 port 6 port 7 port 8 port 9 port 10
## ER 0.0353 0.0373 0.0393 0.0413 0.0433 0.0453 0.0473 0.0493 0.0513  0.0533
## SD 0.1052 0.1054 0.1060 0.1069 0.1082 0.1100 0.1123 0.1150 0.1182  0.1218
##    port 11 port 12 port 13 port 14 port 15 port 16 port 17 port 18 port 19
## ER  0.0553  0.0573  0.0593  0.0613  0.0633  0.0653  0.0673  0.0693  0.0713
## SD  0.1257  0.1300  0.1345  0.1393  0.1445  0.1502  0.1561  0.1624  0.1691
##    port 20 port 21 port 22 port 23 port 24 port 25
## ER  0.0733  0.0753  0.0773  0.0793  0.0813  0.0833
## SD  0.1767  0.1855  0.1953  0.2061  0.2175  0.2296

In this case, we have an expected return of 6.73% and a standard deviation of 15.61%.

I chose this portfolio and this bias because, when most of us are learning about investing and trading, we tend to buy whatever our friends buy and never stop to ask ourselves whether it is a good investment. The main reason we do this is that we all want easy money, and easy money doesn’t exist. So, instead of buying the assets that “a friend told me about”, we can take riskier financial positions that are backed by a full statistical analysis.

3 portfolios along with their expected risk and return.

Conservative portfolio

Portfolio expected return: 3.53%
Portfolio standard deviation: 10.52%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

Aggressive portfolio

Portfolio expected return: 5.83%
Portfolio standard deviation: 13.20%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

Disruptive portfolio

Portfolio expected return: 6.73%
Portfolio standard deviation: 15.61%

TVTY     EXPI SQ BLMN      TTD     APPS     TSLA      NSP      NRT     PAYC
   0 0.000000  0    0 0.159614 0.439114 0.300944 0.000000 0.000000 0.100329
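To compare the three portfolios on the same scale, we can compute their Sharpe ratios and a rough annualization. The figures are monthly, because they were estimated from monthly prices, and are copied from the outputs above. The annualization is a sketch that assumes monthly returns are independent, which is a simplification.

```r
# Monthly expected return and standard deviation, copied from the outputs above
portfolios <- data.frame(
  name = c("Conservative", "Aggressive", "Disruptive"),
  er   = c(0.03533639, 0.05826172, 0.0673),
  sd   = c(0.1052184, 0.1320321, 0.1561)
)

rfree <- 0  # same risk-free rate the text assumes

# Sharpe ratio: excess return per unit of risk
portfolios$sharpe <- (portfolios$er - rfree) / portfolios$sd

# Rough annualization of the monthly figures (assumes independent monthly returns)
portfolios$er_annual <- (1 + portfolios$er)^12 - 1
portfolios$sd_annual <- portfolios$sd * sqrt(12)

portfolios
```

The aggressive (tangency) portfolio has the highest Sharpe ratio of the three, as expected from its definition, while the disruptive portfolio accepts a slightly lower Sharpe ratio in exchange for a higher expected return.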

Backtesting

Do a back-testing to calculate the return of each portfolio in the period from January 2020 to December 2020.

Conservative

Portfolio expected return: 3.53%
Portfolio standard deviation: 10.52%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the conservative portfolio.
w1 = .0012
w2 = .0724
w3 = .1894
w4 = .1211
w5 = .2738
w6 = .3422
weights_conservative = c(w1, w2, w3, w4, w5, w6)

#I declare the variable with the tickers of conservative inside.
tickers_conservative = c("EXPI", "APPS", "TSLA", "NSP", "NRT", "PAYC")

#January 2020 - December 2020
getSymbols(Symbols = tickers_conservative, periodicity="monthly",from="2020-01-01", to="2020-12-31")

## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols

## [1] "EXPI" "APPS" "TSLA" "NSP"  "NRT"  "PAYC"

datasets_list_conservative <- lapply(tickers_conservative, get)
prices_conservative <- do.call(merge, datasets_list_conservative)

#I select only Adjusted prices:
prices_conservative <- Ad(prices_conservative)

#I change the name of the columns to be equal to the tickers vector:
names(prices_conservative) = tickers_conservative

firstprices_conservative = as.vector(prices_conservative[1,])

#I can calculate the holding portfolio return as follows:
HPR_conservative = sweep(prices_conservative, 2, firstprices_conservative,'/') - 1
HPR_conservative$Port = w1*HPR_conservative$EXPI + w2*HPR_conservative$APPS + w3*HPR_conservative$TSLA + w4*HPR_conservative$NSP + w5*HPR_conservative$NRT + w6*HPR_conservative$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_conservative$inv_port = 1 * (1+HPR_conservative$Port)
plot(HPR_conservative$inv_port)

Aggressive

Portfolio expected return: 5.83%
Portfolio standard deviation: 13.20%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the aggressive portfolio.
w1 = .0618
w2 = .3083
w3 = .2938
w4 = .0331
w5 = .3030
weights_aggressive = c(w1, w2, w3, w4, w5)

#I declare the variable with the tickers of the aggressive portfolio.
tickers_aggressive = c("TTD", "APPS", "TSLA", "NRT", "PAYC")

#January 2020 - December 2020
getSymbols(Symbols = tickers_aggressive, periodicity="monthly",from="2020-01-01", to="2020-12-31")

## [1] "TTD"  "APPS" "TSLA" "NRT"  "PAYC"

datasets_list_aggressive <- lapply(tickers_aggressive, get)
prices_aggressive <- do.call(merge, datasets_list_aggressive)

#I select only Adjusted prices:
prices_aggressive <- Ad(prices_aggressive)

#I change the name of the columns to be equal to the tickers vector:
names(prices_aggressive) = tickers_aggressive

firstprices_aggressive = as.vector(prices_aggressive[1,])

#I can calculate the holding portfolio return as follows:
HPR_aggressive = sweep(prices_aggressive, 2, firstprices_aggressive,'/') - 1
HPR_aggressive$Port = w1*HPR_aggressive$TTD + w2*HPR_aggressive$APPS + w3*HPR_aggressive$TSLA + w4*HPR_aggressive$NRT + w5*HPR_aggressive$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_aggressive$inv_port = 1 * (1+HPR_aggressive$Port)
plot(HPR_aggressive$inv_port)

Disruptive

Portfolio expected return: 6.73%
Portfolio standard deviation: 15.61%

TVTY     EXPI SQ BLMN      TTD     APPS     TSLA      NSP      NRT     PAYC
   0 0.000000  0    0 0.159614 0.439114 0.300944 0.000000 0.000000 0.100329

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the disruptive portfolio.
w1 = .1596
w2 = .4391
w3 = .3009
w4 = .1003
weights_disruptive = c(w1, w2, w3, w4)

#I declare the variable with the tickers of the disruptive portfolio.
tickers_disruptive = c("TTD", "APPS", "TSLA", "PAYC")

#January 2020 - December 2020
getSymbols(Symbols = tickers_disruptive, periodicity="monthly",from="2020-01-01", to="2020-12-31")

## [1] "TTD"  "APPS" "TSLA" "PAYC"

datasets_list_disruptive <- lapply(tickers_disruptive, get)
prices_disruptive <- do.call(merge, datasets_list_disruptive)

#I select only Adjusted prices:
prices_disruptive <- Ad(prices_disruptive)

#I change the name of the columns to be equal to the tickers vector:
names(prices_disruptive) = tickers_disruptive

firstprices_disruptive = as.vector(prices_disruptive[1,])

#I can calculate the holding portfolio return as follows:
HPR_disruptive = sweep(prices_disruptive, 2, firstprices_disruptive,'/') - 1
HPR_disruptive$Port = w1*HPR_disruptive$TTD + w2*HPR_disruptive$APPS + w3*HPR_disruptive$TSLA + w4*HPR_disruptive$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_disruptive$inv_port = 1 * (1+HPR_disruptive$Port)
plot(HPR_disruptive$inv_port)

This is our first backtest, where we checked the historical performance of our portfolios. As we have seen, the less conservative the portfolio, the higher its return. This suggests that the portfolios were well constructed, since it is consistent with the principle that higher risk goes with higher expected return.

In this first backtest, if we had invested 1 USD in our conservative portfolio from January 2020 to December 2020, we would have ended with around 2.5 USD. If we had invested 1 USD in our aggressive portfolio over the same period, we would have ended with around 5 USD. Finally, if we had invested 1 USD in our disruptive portfolio over the same period, we would have ended with around 6 USD.
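The backtest logic is the same for the three portfolios, so it can be wrapped in one function. Below is a minimal sketch; the function name backtest_growth, the tickers AAA and BBB, and the prices are hypothetical and used only to show the arithmetic.

```r
# Growth of 1 unit of currency in a buy-and-hold portfolio.
# prices:  matrix of prices, one row per date and one column per asset
# weights: initial portfolio weights, in the same column order
backtest_growth <- function(prices, weights) {
  # A small tolerance allows for weights that were rounded to 4 decimals
  stopifnot(ncol(prices) == length(weights),
            abs(sum(weights) - 1) < 1e-3)
  # Holding-period return of each asset relative to the first row
  hpr <- sweep(prices, 2, prices[1, ], "/") - 1
  # Weighted sum of asset HPRs gives the portfolio HPR; add 1 for growth
  1 + drop(hpr %*% weights)
}

# Hypothetical prices for two assets over four months (illustrative values only)
toy_prices <- matrix(c(10, 50,
                       12, 45,
                       15, 55,
                       20, 60),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("AAA", "BBB")))

growth <- backtest_growth(toy_prices, weights = c(0.4, 0.6))
growth
```

In this toy case the first asset doubles and the second gains 20%, so with weights of 40% and 60% the final growth factor is 1 + 0.4 * 1.0 + 0.6 * 0.2 = 1.52.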

Do a back-testing to calculate the return you would have reached for each portfolio during 2021 (from Jan 2021 to Jun or Oct 2021).

Conservative

Portfolio expected return: 3.53%
Portfolio standard deviation: 10.52%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the conservative portfolio.
w1 = .0012
w2 = .0724
w3 = .1894
w4 = .1211
w5 = .2738
w6 = .3422
weights_conservative_2 = c(w1, w2, w3, w4, w5, w6)

#I declare the variable with the tickers of conservative inside.
tickers_conservative_2 = c("EXPI", "APPS", "TSLA", "NSP", "NRT", "PAYC")

#January 2021 - October 2021
getSymbols(Symbols = tickers_conservative_2, periodicity="monthly",from="2021-01-01", to="2021-10-31")

## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols

## [1] "EXPI" "APPS" "TSLA" "NSP"  "NRT"  "PAYC"

datasets_list_conservative_2 <- lapply(tickers_conservative_2, get)
prices_conservative_2 <- do.call(merge, datasets_list_conservative_2)

#I select only Adjusted prices:
prices_conservative_2 <- Ad(prices_conservative_2)

#I change the name of the columns to be equal to the tickers vector:
names(prices_conservative_2) = tickers_conservative_2

firstprices_conservative_2 = as.vector(prices_conservative_2[1,])

#I can calculate the holding portfolio return as follows:
HPR_conservative_2 = sweep(prices_conservative_2, 2, firstprices_conservative_2,'/') - 1
HPR_conservative_2$Port = w1*HPR_conservative_2$EXPI + w2*HPR_conservative_2$APPS + w3*HPR_conservative_2$TSLA + w4*HPR_conservative_2$NSP + w5*HPR_conservative_2$NRT + w6*HPR_conservative_2$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_conservative_2$inv_port = 1 * (1+HPR_conservative_2$Port)
plot(HPR_conservative_2$inv_port)

Aggressive

Portfolio expected return: 5.83%
Portfolio standard deviation: 13.20%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the aggressive portfolio.
w1 = .0618
w2 = .3083
w3 = .2938
w4 = .0331
w5 = .3030
weights_aggressive_2 = c(w1, w2, w3, w4, w5)

#I declare the variable with the tickers of the aggressive portfolio.
tickers_aggressive_2 = c("TTD", "APPS", "TSLA", "NRT", "PAYC")

#January 2021 - October 2021
getSymbols(Symbols = tickers_aggressive_2, periodicity="monthly",from="2021-01-01", to="2021-10-31")

## [1] "TTD"  "APPS" "TSLA" "NRT"  "PAYC"

datasets_list_aggressive_2 <- lapply(tickers_aggressive_2, get)
prices_aggressive_2 <- do.call(merge, datasets_list_aggressive_2)

#I select only Adjusted prices:
prices_aggressive_2 <- Ad(prices_aggressive_2)

#I change the name of the columns to be equal to the tickers vector:
names(prices_aggressive_2) = tickers_aggressive_2

firstprices_aggressive_2 = as.vector(prices_aggressive_2[1,])

#I can calculate the holding portfolio return as follows:
HPR_aggressive_2 = sweep(prices_aggressive_2, 2, firstprices_aggressive_2,'/') - 1
HPR_aggressive_2$Port = w1*HPR_aggressive_2$TTD + w2*HPR_aggressive_2$APPS + w3*HPR_aggressive_2$TSLA + w4*HPR_aggressive_2$NRT + w5*HPR_aggressive_2$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_aggressive_2$inv_port = 1 * (1+HPR_aggressive_2$Port)
plot(HPR_aggressive_2$inv_port)

Disruptive

Portfolio expected return: 6.73%
Portfolio standard deviation: 15.61%

TVTY     EXPI SQ BLMN      TTD     APPS     TSLA      NSP      NRT     PAYC
   0 0.000000  0    0 0.159614 0.439114 0.300944 0.000000 0.000000 0.100329

We start by downloading historical monthly prices from Yahoo Finance:

#I indicate the respective weight obtained in the disruptive portfolio.
w1 = .1596
w2 = .4391
w3 = .3009
w4 = .1003
weights_disruptive_2 = c(w1, w2, w3, w4)

#I declare the variable with the tickers of the disruptive portfolio.
tickers_disruptive_2 = c("TTD", "APPS", "TSLA", "PAYC")

#January 2021 - October 2021
getSymbols(Symbols = tickers_disruptive_2, periodicity="monthly",from="2021-01-01", to="2021-10-31")

## [1] "TTD"  "APPS" "TSLA" "PAYC"

datasets_list_disruptive_2 <- lapply(tickers_disruptive_2, get)
prices_disruptive_2 <- do.call(merge, datasets_list_disruptive_2)

#I select only Adjusted prices:
prices_disruptive_2 <- Ad(prices_disruptive_2)

#I change the name of the columns to be equal to the tickers vector:
names(prices_disruptive_2) = tickers_disruptive_2

firstprices_disruptive_2 = as.vector(prices_disruptive_2[1,])

#I can calculate the holding portfolio return as follows:
HPR_disruptive_2 = sweep(prices_disruptive_2, 2, firstprices_disruptive_2,'/') - 1
HPR_disruptive_2$Port = w1*HPR_disruptive_2$TTD + w2*HPR_disruptive_2$APPS + w3*HPR_disruptive_2$TSLA + w4*HPR_disruptive_2$PAYC

#I can calculate how much $1 invested in this portfolio would have grown over time. We use the HPR of the portfolio and add 1 to get a growth factor for each month
HPR_disruptive_2$inv_port = 1 * (1+HPR_disruptive_2$Port)
plot(HPR_disruptive_2$inv_port)

Now we have run a backtest for this year. If we had invested 1 USD in our conservative portfolio from January 2021 to October 2021, we would have ended with around 1.8 USD. If we had invested 1 USD in our aggressive portfolio over the same period, we would have ended with around 1.4 USD. Finally, if we had invested 1 USD in our disruptive portfolio over the same period, we would have ended with around 1.3 USD. Every portfolio performed better last year (2020) than this year.

Conclusion

In conclusion, after in-depth research I determined that good explanatory variables are the price-to-earnings, price-to-book and book-to-market ratios. We ran a logit model to estimate the probability that a stock beats the market in the future. The three variables were statistically significant, since their p-values were lower than 0.05. Then we took the top 40 companies with the highest probability of beating the market and ran a market model for each one; from these, we kept the 10 companies with the highest beta1 (B1) and a statistically significant p-value.
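As a reminder of what the second screening looks at, the sketch below fits one market model on simulated returns and extracts beta0, beta1, the standard error and the p-value. Everything in it is an assumption made for illustration: the returns are simulated rather than real, the variable names are hypothetical, and the 0.05 threshold simply mirrors the significance level mentioned above.

```r
set.seed(123)  # reproducible simulated data

# Hypothetical monthly returns: simulated, NOT real market data
n <- 60
market_ret <- rnorm(n, mean = 0.01, sd = 0.05)
stock_ret  <- 0.01 + 1.5 * market_ret + rnorm(n, mean = 0, sd = 0.02)

# Market model: stock return regressed on market return
model <- lm(stock_ret ~ market_ret)
coefs <- summary(model)$coefficients

beta0        <- coefs["(Intercept)", "Estimate"]
beta1        <- coefs["market_ret", "Estimate"]
beta1_se     <- coefs["market_ret", "Std. Error"]
beta1_pvalue <- coefs["market_ret", "Pr(>|t|)"]

# Illustrative screening rule (an assumption): keep the stock only if
# beta1 is statistically significant at the 5% level
passes <- beta1_pvalue < 0.05
print(c(beta0 = beta0, beta1 = beta1, se = beta1_se, p = beta1_pvalue))
```

In a full screening this regression would be repeated for each of the 40 stocks, and the stocks that pass the rule would then be ranked by beta1.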

Once we had our 10 winning stocks, we designed 3 different financial portfolios based on optimal portfolio theory:

Conservative portfolio

Portfolio expected return: 3.53%
Portfolio standard deviation: 10.52%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0012 0.0000 0.0000 0.0000 0.0724 0.1894 0.1211 0.2738 0.3422

Aggressive portfolio

Portfolio expected return: 5.82%
Portfolio standard deviation: 13.20%

  TVTY   EXPI     SQ   BLMN    TTD   APPS   TSLA    NSP    NRT   PAYC
0.0000 0.0000 0.0000 0.0000 0.0618 0.3083 0.2938 0.0000 0.0331 0.3030

Disruptive portfolio

Portfolio expected return: 6.73%
Portfolio standard deviation: 15.61%

    TVTY     EXPI       SQ     BLMN      TTD     APPS     TSLA      NSP      NRT     PAYC
       0 0.000000        0        0 0.159614 0.439114 0.300944 0.000000 0.000000 0.100329
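The expected return and standard deviation reported for each portfolio follow from the weights, the vector of expected returns and the variance-covariance matrix. The sketch below shows that calculation for a two-asset case; the expected returns, the covariance matrix and the weights are hypothetical values chosen for illustration, not the estimates used for the portfolios above.

```r
# Hypothetical expected returns of two assets (illustrative values only)
mu <- c(0.02, 0.05)

# Hypothetical variance-covariance matrix of the two assets
sigma <- matrix(c(0.0100, 0.0020,
                  0.0020, 0.0400), nrow = 2)

# Hypothetical portfolio weights (they must sum to 1)
w <- c(0.5, 0.5)

# Portfolio expected return: weighted sum of the expected returns
port_er <- sum(w * mu)

# Portfolio variance: w' * Sigma * w, and its square root as the risk
port_var <- as.numeric(t(w) %*% sigma %*% w)
port_sd  <- sqrt(port_var)

cat(sprintf("Portfolio expected return: %.2f%%\n", 100 * port_er))
cat(sprintf("Portfolio standard deviation: %.2f%%\n", 100 * port_sd))
```

Because the covariance between the two assets is low in this example, the portfolio standard deviation is smaller than the weighted average of the individual standard deviations, which is the diversification effect that portfolio theory exploits.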

After creating the portfolios, we ran two backtests to review the historical performance of each one. We found that each portfolio performed better last year (2020) than in 2021. In my opinion, the three portfolios are excellent options; nevertheless, given the macroeconomic context, I would invest in the conservative one.

Bibliography

https://www.investopedia.com/terms/b/booktomarketratio.asp

https://www.investopedia.com/terms/p/price-earningsratio.asp

https://www.investopedia.com/terms/p/price-to-bookratio.asp

https://www.investopedia.com/terms/c/covariance.asp

https://www.investopedia.com/terms/c/correlation.asp

https://rpubs.com/cdorante/fz2024_w1

https://rpubs.com/cdorante/fz2024_w2

https://rpubs.com/cdorante/fz2024_w3


Rafael Romo - A01701493
