Workshop 2, Financial Modeling and Programming

Author

Alberto Dorantes, Ph.D.

Published

November 12, 2024

Abstract

This is an INDIVIDUAL workshop. In this workshop we review the market regression model to calculate aplha and beta, and also we learn how to run the model many times using a loop. We use the results of alpha and beta to select stocks.

1 The Linear regression model

The simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied times the market return. In mathematical terms:

E[R_i] = α + β(R_M)

We can express the same equation using B0 as alpha, and B1 as market beta:

E[R_i] = β_0 + β_1(R_M)

We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

r_{(i,t)} = b_0 + b_1*r_{(M,t)} + ε_t

Where:

ε_t is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a Normal distributed random variable ∼ N(0, σ_ε); the error term ε_t is expected to have mean=0 and a specific standard deviation σ_ε (also called volatility).

r_{(i,t)} is the return of the stock i at time t.

r_{(M,t)} is the market return at time t.

b_0 and b_1 are called regression coefficients.

2 Running a market regression model with real data

2.1 Data collection

We first load the quantmod package and download monthly price data for Tesla and the S&P500 market index. We also merge both datasets into one:

# load package quantmod
library(quantmod)

# Download the data
getSymbols(c("TSLA", "^GSPC"), from="2019-01-01", to= "2022-10-31", periodicity="monthly")

[1] "TSLA" "GSPC"

#Merge both xts-zoo objects into one dataset, but selecting only adjusted prices:

adjprices<-Ad(merge(TSLA,GSPC))

2.2 Return calculation

We calculate continuously returns for both, Tesla and the S&P500:

returns <- diff(log(adjprices)) 
#I dropped the na's:
returns <- na.omit(returns)

#I renamed the columns:
colnames(returns) <- c("TSLA", "GSPC")

2.3 Visualize the relationship

Do a scatter plot putting the S&P500 returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add a line that better represents the relationship between the stock returns and the market returns.Type:

plot.default(x=returns$GSPC,y=returns$TSLA)
abline(lm(returns$TSLA ~ returns$GSPC),col='blue')

# As you see, I indicated that the Market returns goes in the X axis and 
#   Tesla returns in the Y axis. 
# In the market model, the independent variable is the market returns, while
#   the dependent variable is the stock return

Sometimes graphs can be deceiving. Always check the Y sale and the X scale. In this case, the X goes from -0.10 to 0.10, while the Y scale goes from -0.20 to 0.40. Then, the real slope of the line should be steeper.

We can change the X scale so that both Y and X axis have similar ranges:

# I indicate that X axis goes from -0.7 to 0.7
plot.default(x=returns$GSPC,y=returns$TSLA, xlim=c(-0.7,0.7))
abline(lm(returns$TSLA ~ returns$GSPC),col='blue')

Now we see that the the market and stock returns have a similar scale. With this we can better appreciate their linear relationship.

WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

2.4 Running the market regression model

We can run the market regression model with the lm() function. The first parameter of the function is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).

What you will get is called The Market Regression Model. You are trying to examine how the market returns can explain stock returns from Jan 2017 to Dec 2020.

Assign your market model to an object named “model1”:

model1 <- lm(TSLA ~ GSPC, data=returns)
summary(model1)


Call:
lm(formula = TSLA ~ GSPC, data = returns)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.35579 -0.07596 -0.00766  0.10567  0.40883 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.03627    0.02386   1.520    0.136    
GSPC         2.16271    0.42682   5.067 8.13e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1584 on 43 degrees of freedom
Multiple R-squared:  0.3739,    Adjusted R-squared:  0.3593 
F-statistic: 25.67 on 1 and 43 DF,  p-value: 8.126e-06

YOU HAVE TO INTERPRET THIS MODEL. MAKE SURE YOU RESPOND THE FOLLOWING THE QUESTIONS:

- IS THE STOCK SIGNIFICANTLY OFFERING RETURNS OVER THE MARKET?

SINCE BETA0 IS POSITIVE WE CAN SAY THAT, ON AVERAGE, TSLA IS OFFERING RETURNS OVER THE MARKET. HOWEVER, IT IS NOT SIGNIFICANTLY OFFERING RETURNS OVER THE MARKET AT THE 95% CONFIDENCE LEVEL SINCE ITS PVALUE IS GREATER THAN 0.05 (IT IS 0.136).

- IS THE STOCK SIGNIFICANTLY RISKIER THAN THE MARKET?

BETA1 IS 2.1627124, SO IT IS RISKIER THAN THE MARKET ON AVERAGE. TO KNOW WHETHER IT IS SIGNIFICANTLY RISKIER THAN THE MARKET WE NEED TO GET THE 95% CONFIDENCE INTERVAL (CI) AND CHECK WHETHER THIS INTERVAL IS GREATER THAN 1. IF THE MINIMUM OF THIS 95% CI IS GREATER THAN 1, THEN WE CAN SAY THAT THE STOCK IS SIGNIFICANTLY RISKIER THAN THE MARKET.

THEN, THE MINIMUM OF BETA 1 IS 2.1627124 MINUS 2 TIMES ITS STANDARD ERROR 0.4268225. THEN, THE MINIMUM 95%CI OF BETA1 IS 1.3090674. THEN, SINCE WE GOT A MINIMUM OF BETA1 THAT IS GREATER THAN 1, WE CAN SAY THAT TESLA IS SIGNIFICANTLY RISKIER THAN THE MARKET.

We can save the regression summary in another variable and display the regression coefficients:

s <-summary(model1)
s$coefficients

              Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 0.03626792 0.02386133 1.519945 1.358444e-01
GSPC        2.16271240 0.42682249 5.067007 8.125628e-06

The beta0 coefficient is in the first row (the intercept), while the beta1 coefficient is in the second raw.

The regression coefficients will be in the first column of this matrix. The second column is the standard error of the coefficients, which are the standard deviation of the coefficients. The third column is the t-value of each coefficient, and finally their p-values in the fourth column.

We can get beta0, beta1, standard errors, t-values and p-values and store in other variables:

b0= s$coefficients[1,1]
b1= s$coefficients[2,1]

stderr_b0 = s$coefficients[1,2]
stderr_b1 = s$coefficients[2,2]

tval_b0 = s$coefficients[1,3]
tval_b1 = s$coefficients[2,3]

pvalue_b0 = s$coefficients[1,4]
pvalue_b1 = s$coefficients[2,4]

The reference [2,1] of the matrix s$coefficients means that I want to get the element of the matrix that is in the row #2 and the column #1. In this position we can find the beta1 of the model.

Then, any R matrix you can make reference as matrix[#row,#column] to get the value of any cell according to its position in row and column numbers.

We can also get the coefficients applying the function coef to the regression object:

coefs <- coef(model1)
coefs

(Intercept)        GSPC 
 0.03626792  2.16271240

b0 = coefs[1]
b1 = coefs[2]

3 Automating the calculation of market models for many stocks

3.1 Automating downloading and return calculation

If we want to analyze the market risk of many stocks, we need to calculate the market model for many stocks. The challenge here will be to automate the process of estimating market models for any list of tickers.

Let’s start with an easy example of 2 stocks:

Here we will learn how to automate the downloading of many stock prices and calculate returns starting with a definition of a set of tickers.

We define a vector with the tickers for Oracle, Netflix, and the S&P500, and download the monthly stock prices:

# Set the ticker ^GSPC as GSPC (previously defined name)
# The seSymbolLookup function helps to avoid having problems with tickers
#   with special characters such as ^GSPC
setSymbolLookup(GSPC = list(name="^GSPC"))
#Now I can keep using the ticker without the ^character, which sometimes 
#   cause problems

# I define a list of tickers including the market index GSPC:
tickers <- c("ORCL","NFLX", "GSPC")

# I get monthly data for the 3 tickers:
getSymbols(tickers, periodicity = "monthly",
             from = "2019-01-01", to = "2022-10-31")

[1] "ORCL" "NFLX" "GSPC"

We can clone a dataset with the get function:

mktindex <- get("GSPC")

The get function receives a name of a dataset and then create a new dataset.

Now mktindex is a clone of the GSPC dataset. Why this can be useful?

As programmers, sometimes we do not know which stocks (tickers) we will be using in our program. Then, it is a good idea to find a way to clone a dataset that has a specific name, which can be any ticker name.

We can clone 2 datasets as data1 and data2 for the stocks:

data1 = get("ORCL")
data2 = get("NFLX")

To avoid writing one get function for each ticker, we can use the lapply function, which applies a function to a list of elements of a vector.

We apply the get function to the tickers vector:

datasets_list <- lapply(tickers, get)

Now we have the 3 datasets inside a list; each element of the list is an xts dataset with the stock price data.

We can see the content of the first dataset:

head(datasets_list[[1]])

           ORCL.Open ORCL.High ORCL.Low ORCL.Close ORCL.Volume ORCL.Adjusted
2019-01-01     44.48     50.50    44.41      50.23   324808100      45.72005
2019-02-01     50.52     53.03    49.82      52.13   257843100      47.63733
2019-03-01     52.50     54.38    51.00      53.71   370697000      49.08115
2019-04-01     53.88     55.53    53.33      55.33   193506500      50.56153
2019-05-01     55.32     55.42    50.51      50.60   240092700      46.44626
2019-06-01     50.56     57.47    49.89      56.97   417219900      52.29334

We can merge the 3 datasets into one with merge:

prices<-merge(datasets_list[[1]], datasets_list[[2]], datasets_list[[3]])

Another faster way to do this is using the do.call function, which runs a function like merge with more than 1 argument:

prices <- do.call(merge, datasets_list)

I do this to generalize our program for any number of tickers! The do.call function is similar to the lapply function. For the do.call function, the first argument is the function, and the second argument is the list of datasets.

Now I select adjusted prices and name the columns with the ticker names:

# We select only Adjusted prices:
prices <- Ad(prices)
# We change the name of the columns to be equal to the tickers vector:
names(prices)<-tickers

With this code, you can change the list of tickers from 2 to any set of tickers, and you will end up with a clean dataset of only adjusted prices for all the tickers.

Another much efficient way to do the previous data management is using tidyverse. Let’s see a chunk that performs all the previous processes from getSymbols to merge:

library(tidyverse)
setSymbolLookup(GSPC = list(name="^GSPC"))
tickers <- c("ORCL","NFLX", "GSPC")

prices2 <- getSymbols(tickers,periodicity = "monthly",
             from = "2020-01-01", to = "2023-10-31") |>
  map(~Ad(get(.))) |>
  reduce(merge) |>
  `colnames<-`(tickers)

prices2 dataset will be exactly the same as prices.

Can you follow the logic of this chunk?

Now we calculate continuously compounded returns for all tickers:

returns <- na.omit(diff(log(prices)))

3.2 Running manually the market model for 2 stocks

Now we can run the market model for both stocks manually:

model1 <- lm(returns$ORCL ~ returns$GSPC)
summary(model1)


Call:
lm(formula = returns$ORCL ~ returns$GSPC)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.100983 -0.034276 -0.005166  0.033795  0.166705 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.003327   0.008236   0.404    0.688    
returns$GSPC 0.983316   0.147328   6.674 3.82e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05469 on 43 degrees of freedom
Multiple R-squared:  0.5088,    Adjusted R-squared:  0.4974 
F-statistic: 44.55 on 1 and 43 DF,  p-value: 3.819e-08

model2 <- lm(returns$NFLX ~ returns$GSPC)
summary(model2)


Call:
lm(formula = returns$NFLX ~ returns$GSPC)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.54727 -0.04276  0.00577  0.06175  0.19953 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.01343    0.01860  -0.722 0.474206    
returns$GSPC  1.26228    0.33270   3.794 0.000459 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1235 on 43 degrees of freedom
Multiple R-squared:  0.2508,    Adjusted R-squared:  0.2334 
F-statistic:  14.4 on 1 and 43 DF,  p-value: 0.0004587

We can also run these models using the reference of the columns:

# Since ORCL is in the 1st column, and the GSPC index in the 3rd column:
model1 <- lm(returns[,1] ~ returns[,3])
# Since NFLX is in the 2nd column, and the GSPC index in the 3rd column:
model2 <- lm(returns[,2] ~ returns[,3])
summary(model1)


Call:
lm(formula = returns[, 1] ~ returns[, 3])

Residuals:
      Min        1Q    Median        3Q       Max 
-0.100983 -0.034276 -0.005166  0.033795  0.166705 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.003327   0.008236   0.404    0.688    
returns[, 3] 0.983316   0.147328   6.674 3.82e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05469 on 43 degrees of freedom
Multiple R-squared:  0.5088,    Adjusted R-squared:  0.4974 
F-statistic: 44.55 on 1 and 43 DF,  p-value: 3.819e-08

summary(model2)


Call:
lm(formula = returns[, 2] ~ returns[, 3])

Residuals:
     Min       1Q   Median       3Q      Max 
-0.54727 -0.04276  0.00577  0.06175  0.19953 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.01343    0.01860  -0.722 0.474206    
returns[, 3]  1.26228    0.33270   3.794 0.000459 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1235 on 43 degrees of freedom
Multiple R-squared:  0.2508,    Adjusted R-squared:  0.2334 
F-statistic:  14.4 on 1 and 43 DF,  p-value: 0.0004587

The only thing that changes in the lm models is the column name used for the dependent variable (1 or 2); the rest of the parameters is exactly the same.

How can we run these models using a loop?

A loop helps us to execute a set of commands many times. Let’s see this example:

for (i in 1:2) {
  model <- lm(returns[,i] ~ returns[,3])
  print(summary(model))
  
}


Call:
lm(formula = returns[, i] ~ returns[, 3])

Residuals:
      Min        1Q    Median        3Q       Max 
-0.100983 -0.034276 -0.005166  0.033795  0.166705 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.003327   0.008236   0.404    0.688    
returns[, 3] 0.983316   0.147328   6.674 3.82e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05469 on 43 degrees of freedom
Multiple R-squared:  0.5088,    Adjusted R-squared:  0.4974 
F-statistic: 44.55 on 1 and 43 DF,  p-value: 3.819e-08


Call:
lm(formula = returns[, i] ~ returns[, 3])

Residuals:
     Min       1Q   Median       3Q      Max 
-0.54727 -0.04276  0.00577  0.06175  0.19953 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.01343    0.01860  -0.722 0.474206    
returns[, 3]  1.26228    0.33270   3.794 0.000459 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1235 on 43 degrees of freedom
Multiple R-squared:  0.2508,    Adjusted R-squared:  0.2334 
F-statistic:  14.4 on 1 and 43 DF,  p-value: 0.0004587

4 Use a Loop to generate information and store it in a Matrix.

An analyst made an analysis and tells you that financial leverage has a negative quadratic effect on the expected financial performance. This is a typical inverted U-shaped relationship. In other words, financial leverage has a positive effect on performance up to a certain level of leverage; after this level the relationship becomes negative: the more financial leverage, the less the firm performance. The equation is the following:

Performance=20-(leverage-4)^{2} Leverage is measure from level 0 (no leverage) up to the maximum leverage =10. Calculate the expected firm performance for leverage levels 0 to 10 and store the results in a matrix:

# I initialize the matrix as empty:
matrix_results = c()
# I loop over i that will take values from 0 to 1 jumping by 1:
for (i in seq(from=0,to=10, by=1)) {
  # I calculate the performance according to the leverage level i:
  performance = 20 - (i - 4)^2
  # I put the leverage and performance in a vector:
  result = c(i,performance)
  # I append/ attach the result vector to the matrix
  # rbind stands for row bind, so I attach the result vector as a row below to whatever the matrix has 
  #   at the moment of the iteration
  matrix_results = rbind(matrix_results,result)
}
# I change the matrix to a data frame; data frames have advantages compared to matrices

matrix_results = as.data.frame(matrix_results)
# I set the names for the columns:
names(matrix_results) = c("leverage","performance")
matrix_results

          leverage performance
result           0           4
result.1         1          11
result.2         2          16
result.3         3          19
result.4         4          20
result.5         5          19
result.6         6          16
result.7         7          11
result.8         8           4
result.9         9          -5
result.10       10         -16

# plotting performance for different leverage levels:
plot(x=matrix_results$leverage, y=matrix_results$performance)

Imagine that I want to select only those cases where performance was positive. I can do a selection of a data frame as follows:

library(dplyr)
selected_cases <- matrix_results |>
      filter(performance>0)
selected_cases

         leverage performance
result          0           4
result.1        1          11
result.2        2          16
result.3        3          19
result.4        4          20
result.5        5          19
result.6        6          16
result.7        7          11
result.8        8           4

5 CHALLENGE 1

The problem here is that we run both models to create the linear regressions, but we only displayed the models, but did not save the beta information for each stock. Adjust the loop of section 4.2 to store the data in a matrix that contains B0, and B1 coefficients, standard errors and PValues for each company.

Answer the following questions: HOW CAN WE STORE THIS INFORMATION FOR BOTH STOCKS? WHICH STRUCTURE CAN YOU USE?

HOW CAN YOU SELECT THOSE STOCKS THAT HAVE A SIGNIFICANT BETA0, GREATER THA ZERO?

6 CHALLENGE 1 SOLUTION

I first write the STEPS of the algorithm:

For each stock I need to store beta0, beta1, and their standard errors and pvalues. I start initializing an empty matrix betas where I will be storing the beta results for each stock. Then, I write a for loop from i=1 to 2, and in each iteration I do the following:

Run the market regression model for stock i
Save the summary of the regression along with the coefficients matrix
From the coefficients matrix extract b0,b1,stderrb0,stderrb1,pvalueb0, pvalueb1
Save these values in a temporal vector
Accumulate this vector into the beta matrix

After the loop ends, rename the rows with the ticker names, and the columns with the corresponding naems beta0, stdb0, etc.

Finally, I identify whether any of these stocks are offering returns over the market by selecting the stocks with beta0 that are positive and significant.

# I initialize an empty vector/matrix betas: 
betas =c()
for (i in 1:2) {
  # I run the market regression model for stock i:
  model <- lm(returns[,i] ~ returns$GSPC)
  #print(summary(model))
  # I store the summary of the model in a temporal object:
  smodel = summary(model)
  # From the coefficients matrix I extract the coefficients along with their standard errors and pvalues:
  b0 = smodel$coefficients[1,1]
  b1 = smodel$coefficients[2,1]
  seb0 = smodel$coefficients[1,2]
  seb1 =smodel$coefficients[2,2]
  pb0 = smodel$coefficients[1,4]
  pb1 = smodel$coefficients[2,4]
  # I put the values together in a temporal vector:
  vectorbetas = c(b0,b1,seb0,seb1,pb0,pb1)
  # I bind this vector with the betas cumulative matrix:
  betas = rbind(betas,vectorbetas) 
}
# Finally I rename columns and rows of the betas matrix:
rownames(betas) = tickers[1:2]
colnames(betas) =c("b0","b1","seb0","seb1","pvalueb0","pvalueb1")
# I create a data frame from the beta matrix:
betasdf = data.frame(betas)
betasdf

               b0        b1        seb0      seb1  pvalueb0     pvalueb1
ORCL  0.003326706 0.9833164 0.008236294 0.1473277 0.6882821 3.818593e-08
NFLX -0.013428536 1.2622757 0.018599214 0.3326957 0.4742058 4.586850e-04

Looking at the content of the data frame, none of these 2 stocks have a significant and positive beta0 since their pvalues are greater than 0.05. If I had not only 2 stocks, but many stocks, I can write a query with dplyr to check which stocks have significant and positive beta0:

library(dplyr)
stocks_over_market <- betasdf |> 
          arrange(desc(b0)) |>
          filter(b0>0,pvalueb0<0.05)
stocks_over_market

[1] b0       b1       seb0     seb1     pvalueb0 pvalueb1
<0 rows> (or 0-length row.names)

In this case, the stocks_over_market data frame ended up empty since there is no stocks with positive and significant beta0

7 CHALLENGE 2

Based on the preliminary selection of 100 stocks on the Challenge of workshop 1:

Create a list with the ticker of each company
Using a Loop, generate the market model for each company and store the data
Based on the data of each company select a set of 10 companies based on any of these criteria:
1. CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market
2. CRITERIA B: Stocks that are SIGNIFICANTLY less risky than the market according to the market model regressions.

8 CHALLENGE 2 SOLUTION

I will use the data management code of Workshop 1 to prepare the data for the logistic model:

# Load the datasets 
uspanel = read.csv("dataus2024.csv")
usfirms = read.csv("firmsus2024.csv")

I calculate the period EBIT for all stocks-quarters:

# I check the cases with sgae=NA and revenue>0: 
uspanel |> filter(q=="2024q2", !is.na(revenue), is.na(sgae)) |> select(firm,q,revenue,cogs,sgae)

       firm      q   revenue     cogs sgae
1       AFL 2024q2  10575000        0   NA
2       AIG 2024q2  13323000        0   NA
3       AJX 2024q2    -28436    54044   NA
4       ALL 2024q2  30973000        0   NA
5      AMSF 2024q2    156319        0   NA
6      AMTB 2024q2    324942   273607   NA
7       ARR 2024q2    271405   305239   NA
8       ASB 2024q2   1183806   932729   NA
9      AUBN 2024q2     20596    16641   NA
10      AXP 2024q2  36204000 26732000   NA
11      BAC 2024q2  96600000 78951000   NA
12     BCBP 2024q2     97605    80773   NA
13      BFH 2024q2   2418000  1438000   NA
14     BFIN 2024q2     37737    32798   NA
15       BK 2024q2  19542000 16664000   NA
16      BOH 2024q2    508258   410187   NA
17     BOKF 2024q2   1738434  1404510   NA
18      BWB 2024q2    122860   100648   NA
19     BXMT 2024q2    269165    63041   NA
20        C 2024q2  86453000 72758000   NA
21     CCBG 2024q2    133289    98574   NA
22      CFB 2024q2    253627   203081   NA
23     CFFI 2024q2     81836    65557   NA
24      CFR 2024q2   1404352  1038203   NA
25     CINF 2024q2   5479000        0   NA
26      CMA 2024q2   2517000  2067000   NA
27      CNA 2024q2   6963000        0   NA
28     CNOB 2024q2    267861   212534   NA
29      COF 2024q2  26331000 17506000   NA
30      CPF 2024q2    173607   129861   NA
31     CRBG 2024q2   9546000        0   NA
32     DCOM 2024q2    342702   281600   NA
33      DFS 2024q2  11656000  6946000   NA
34       DX 2024q2    199844   168029   NA
35     EFSC 2024q2    447019   328628   NA
36      EIG 2024q2    440100        0   NA
37      EQH 2024q2   5740000        0   NA
38     ERIE 2024q2    357926        0   NA
39      FAF 2024q2   3036900        0   NA
40     FBIZ 2024q2    127875   100864   NA
41     FCBC 2024q2     91419    57560   NA
42     FFIN 2024q2    363818   228605   NA
43      FHN 2024q2   2546000  1927000   NA
44     FIBK 2024q2    733400   564000   NA
45     FLIC 2024q2     89680    79505   NA
46  FNM_old 2024q2  75100000 64543000   NA
47  FRE_old 2024q2  59507000 52018000   NA
48     FRME 2024q2    530253   404995   NA
49     GBCI 2024q2    615429   513073   NA
50      GNW 2024q2   3633000        0   NA
51       GS 2024q2  63089000 53336000   NA
52     HBAN 2024q2   5814000  4511000   NA
53     HBCP 2024q2     96888    73966   NA
54      HBT 2024q2    140021    92846   NA
55      HIG 2024q2  12905000        0   NA
56     HOMB 2024q2    728791   452487   NA
57     HTGC 2024q2    217855    85039   NA
58      HTH 2024q2    791680   715973   NA
59     IBOC 2024q2    513523   243673   NA
60     ISTR 2024q2     77010    67871   NA
61      JPM 2024q2 142257000 96593000   NA
62      KEY 2024q2   5394000  4580000   NA
63     KNSL 2024q2    757344        0   NA
64     KREF 2024q2    315704   255682   NA
65        L 2024q2   8498000        0   NA
66     LADR 2024q2    258691   198479   NA
67     LMND 2024q2    241100        0   NA
68      LNC 2024q2   9269000        0   NA
69     MBWM 2024q2    176152   120850   NA
70      MCB 2024q2    241239   191005   NA
71      MCY 2024q2   2579080        0   NA
72      MET 2024q2  33880000        0   NA
73      MKL 2024q2   8168498        0   NA
74      MTG 2024q2    599638        0   NA
75     MYFW 2024q2     90701    83301   NA
76     NAVI 2024q2   2252000  2090000   NA
77     NBHC 2024q2    295902   222484   NA
78      NLY 2024q2   2775496  2308167   NA
79     NMFC 2024q2    185144   108916   NA
80     NMIH 2024q2    318375        0   NA
81     NRIM 2024q2     90138    68049   NA
82     NTRS 2024q2   8263400  6799700   NA
83     NYMT 2024q2     81792   185701   NA
84     OCFC 2024q2    344299   269833   NA
85      OMF 2024q2   2746000  1444000   NA
86      ONB 2024q2   1424437  1060081   NA
87      ORC 2024q2    128037   113240   NA
88      ORI 2024q2   3887600        0   NA
89     PEBO 2024q2    307846   222333   NA
90      PFG 2024q2   8364100        0   NA
91      PMT 2024q2    488771   427781   NA
92      PNC 2024q2  17058000 13193000   NA
93     PPBI 2024q2    465481   340164   NA
94      PRK 2024q2    310538   214501   NA
95      PRU 2024q2  38392000        0   NA
96      RDN 2024q2    640565        0   NA
97      RGA 2024q2  11215000        0   NA
98      RLI 2024q2    861273        0   NA
99     SBSI 2024q2    228225   172634   NA
100    SFST 2024q2    105096    97479   NA
101    SIGI 2024q2   2360964        0   NA
102     SLM 2024q2   1620997   866094   NA
103     SNV 2024q2   1573989  1354650   NA
104 SOV_old 2024q2   8692481  7090498   NA
105    SRCE 2024q2    283014   190749   NA
106     SSB 2024q2   1195162   852221   NA
107    STBA 2024q2    282654   197576   NA
108    STEL 2024q2    312314   239700   NA
109     STT 2024q2  10728000  9218000   NA
110     SYF 2024q2  10850000  4723000   NA
111    TCBI 2024q2    931189   798657   NA
112    TFIN 2024q2    241128   222726   NA
113     THG 2024q2   3087800        0   NA
114    TIPT 2024q2   1044894        0   NA
115    TRTX 2024q2    184198   142430   NA
116     TRV 2024q2  22511000        0   NA
117     TWO 2024q2    686827   400731   NA
118    UBFO 2024q2     32421    20325   NA
119    UMBF 2024q2   1362510  1077676   NA
120     UNM 2024q2   6433700        0   NA
121     UVE 2024q2    748173        0   NA
122    UVSP 2024q2    245016   194729   NA
123    VBTX 2024q2    393074   312049   NA
124     VEL 2024q2    236625   192296   NA
125    VOYA 2024q2   4084000        0   NA
126    WAFD 2024q2   1042814   849924   NA
127     WAL 2024q2   2447600  1915600   NA
128     WFC 2024q2  63126000 49205000   NA
129     WRB 2024q2   6570805        0   NA
130    WSBC 2024q2    460311   367352   NA
131    WSFS 2024q2    694314   481972   NA
132    WTBA 2024q2     96783    83220   NA

uspanel = uspanel |>
  mutate(sgae = ifelse(!is.na(revenue) & is.na(sgae),0,sgae),
        # for some reason there are some cases of firms with revenue and cogs, but with sgae=NA;
        #    this should be the case for service firms, so I will replace sgae=0 when revenue>0;
        #    if I do not replace sgae=0 for these cases, then YTDebit cannot be calculated! 
         YTDebit = revenue - cogs - sgae, 
         ebitp = ifelse(fiscalq==1,YTDebit, YTDebit - lag(YTDebit)))

I create a column for the industry in the uspanel dataset by pulling the industry from the usfirms dataset:

# I create a temporal usfirms1 dataset with only 2 columns. I rename the empresa column with firm to have the same name for the ticker in both files:
usfirms1 = usfirms |>
  mutate(firm = empresa) |> 
  select(firm,naics1, naics2)
# I merge the usfirms1 with the uspanel to pull the industry:
uspanel = left_join(uspanel,usfirms1,by='firm')

I calculate annual stock return for all firms-quarters:

uspanel = uspanel |>
  # I sort by firm-quarter:
  arrange(firm,q) |>
  # I group by firm to make sure that the lag function works only on the 
  #   quarters of the firm : 
  group_by(firm) |>
  # I create market value, operating earnings per share deflated by stock price,  and stock annual return:
  mutate(marketvalue = originalprice * sharesoutstanding,
         oepsp = (YTDebit / sharesoutstanding) / originalprice,
         # The annual return is equal to the current stock price minus its
         #   stock price of 4 quarters ago:
         r = log(adjprice) - dplyr::lag(log(adjprice),4)
  ) |>
  # I do ungroup to undo the group_by
  ungroup()

I winsorize the oepsp:

library(statar)
hist(uspanel$oepsp)

uspanel$oepspw= winsorize(uspanel$oepsp, probs=c(0.005,0.999))
hist(uspanel$oepspw)

For each quarter-industry I calculate the median return and the binary variable rabove that indicates whether the stock returns beats the industry returns:

uspanel = uspanel |>
  group_by(q,naics1) |>
  mutate(industry_ret = median(r,na.rm=TRUE),
         rabove = ifelse(r>industry_ret,1,0)) |>
  ungroup()

I calculate another variable equal to the rabove, but 1 quarter in the future (1 lag in the future):

uspanel = uspanel |> 
  group_by(firm) |>
  mutate(F1rabove = lead(rabove,1))
# The lead function gets the future value of the variable

# I check how many rows/cases have 1 and 0's:
table(uspanel$F1rabove)


     0      1 
115114 114211

I run the logit model to examine whether oepspw is related to the probability of a stock to beat its industry return 1 quarter in the future:

logitm1 <- glm(F1rabove ~ oepspw, data= uspanel, family="binomial",na.action=na.omit)
summary(logitm1)


Call:
glm(formula = F1rabove ~ oepspw, family = "binomial", data = uspanel, 
    na.action = na.omit)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.020277   0.004256  -4.764  1.9e-06 ***
oepspw       0.884870   0.017116  51.699  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 311904  on 224995  degrees of freedom
Residual deviance: 308217  on 224994  degrees of freedom
  (332185 observations deleted due to missingness)
AIC: 308221

Number of Fisher Scoring iterations: 4

coefficients = coef(logitm1)
# The coefficients for the odd ratio interpretation are:
exp(0.1*coefficients)

(Intercept)      oepspw 
  0.9979744   1.0925201

9 Prediction with the logit model

We can use the last model to predict the probability whether the firm will beat the return of the industry 1 year in the future. For this prediction, it might be a good idea to select only the firms in the last quarter of the dataset (2024Q2), so we can predict which firm will beat the industry in 2025:

We create a dataset with only the Q2 of 2024:

uspanel2024 <- uspanel |>
          select(firm,q,yearf, oepspw, F1rabove) |>
          filter(q=="2024q2") |>
        as.data.frame()

Now we run the prediction using the model 2 with this new dataset:

uspanel2024 <- uspanel2024 |> 
  mutate(pred=predict(logitm1,newdata=uspanel2024,type=c("response")) )

The type=c(“response”) parameter indicates to get the prediction in probability. If I do not specify this option, the prediction is calculated in the logarithm of ODDS RATIO.

9.1 Selection of best stocks based on results of the logit model

We can sort the firms according to the predicted probability of beating the benchmark, and then select the top 100 with the highest probability:

top110<- uspanel2024 |>
     merge(usfirms1,by="firm") |> # merge with firms1 to pull firm name and industry
     arrange(desc(pred)) |>  # sort by probability, from highest to lowest
     #slice_head(prop=0.05) |> # select the top 5% firms  
     head(n=110)

# Show the # of firms selected:
nrow(top110)

[1] 110

I extract the 100 firms (tickers) column and put it into a char vector:

tickers = as.vector(top110$firm)

I selected the top 110 just in case there are some tickers that do not exist in Yahoo.

Now I write a for loop to get the historical prices of the 100 tickers. Here is a difficult problem to solve related to data validation. It is possible that some of the tickers are not in Yahoo Finance. If I use getSymbols with the ticker vector and it is one ticker that does not exist in Yahoo, then the getSymbols will stop running. Then, I can write a loop to run the getSymbols for each ticker and detect when a ticker is not in Yahoo.

The tryCatch function is used to run a set of R lines of code in the try mode, so if there is an error in any R code, R DOES NOT STOP RUNNING AND I can cach the error and do a specific action:

# I do a loop to run getSymbols and in each iteration I can detect which one has an error because
#   it does not exist in Yahoo Finance:
# I initialize an empty list where I will be storing the tickers that exist in Yahoo Finance:
finaltickerlist = c()
# I initilize a counter i=0 and tottickers=0. 
#  The counter i will be the counter of ticker #, while the counter tottickers 
#  will count the tickers that are found in Yahoo to know when to stop the loop after finding 100 tickers in Yahoo:
i=0
tottickers = 0
library(quantmod)
#for (t in tickers) {
# I will use a while loop instead of a for to make sure that I get exactly 100 tickers:
while (tottickers<100 & i<110) {
 # I increment i in 1:
 i = i + 1  
 # I save the ticker i in t: 
 t = tickers[i]
  tryCatch(
    {
    getSymbols(t, from="2020-01-01", to="2024-10-31",periodicity="monthly")
    # Since the ticker was found in Yahoo Finance, then:
    tottickers=tottickers + 1
    # I add the ticker found in Yahoo into the finaltickerlist vector: 
    finaltickerlist = c(finaltickerlist,t)
    cat("The ticker ", t, " was found in Yahoo \n")
    if (tottickers==1) {
      # If I am in the ticker 1 I only get the Adjusted price, but I do not merge
      #    the historical prices with the prices dataset:
      temp = get(t)
      # prices will be the final big dataset with all prices for all stocks:
      prices = Ad(temp)
    }
    else { # if tottickers>1, so this means that I already created the price dataset, 
      #  so I need to do a merge:
      temp=get(t)
      prices = merge(prices,Ad(temp))
    }
    }, 
    error = function(e) {
      cat("THE TICKER ",t," WAS NOT FOUND IN YAHOO FINANCE \n")
    }
  )
}

The ticker  AJG  was found in Yahoo 
The ticker  APA  was found in Yahoo 
The ticker  LNC  was found in Yahoo 
The ticker  TIPT  was found in Yahoo 
The ticker  UVE  was found in Yahoo 
The ticker  GNW  was found in Yahoo 
THE TICKER  OCN  WAS NOT FOUND IN YAHOO FINANCE 
The ticker  CYH  was found in Yahoo 
The ticker  PRU  was found in Yahoo 
The ticker  MCY  was found in Yahoo 
The ticker  RGA  was found in Yahoo 
The ticker  ALL  was found in Yahoo 
The ticker  THG  was found in Yahoo 
The ticker  MET  was found in Yahoo 
The ticker  UNM  was found in Yahoo 
The ticker  ATUS  was found in Yahoo 
THE TICKER  VIA__old  WAS NOT FOUND IN YAHOO FINANCE 
The ticker  VOYA  was found in Yahoo 
The ticker  CNA  was found in Yahoo 
The ticker  FAF  was found in Yahoo 
The ticker  CRBG  was found in Yahoo 
The ticker  L  was found in Yahoo 
The ticker  ORI  was found in Yahoo 
The ticker  TRV  was found in Yahoo 
The ticker  PFG  was found in Yahoo 
The ticker  BFH  was found in Yahoo 
The ticker  HIG  was found in Yahoo 
The ticker  EQH  was found in Yahoo 
The ticker  SIGI  was found in Yahoo 
The ticker  EIG  was found in Yahoo 
The ticker  UNIT  was found in Yahoo 
The ticker  MKL  was found in Yahoo 
The ticker  WRB  was found in Yahoo 
The ticker  VNCE  was found in Yahoo 
The ticker  SYF  was found in Yahoo 
The ticker  THC  was found in Yahoo 
The ticker  CINF  was found in Yahoo 
The ticker  DTIL  was found in Yahoo 
The ticker  ILPT  was found in Yahoo 
The ticker  WLFC  was found in Yahoo 
The ticker  AIG  was found in Yahoo 
The ticker  VGZ  was found in Yahoo 
The ticker  RM  was found in Yahoo 
The ticker  BIPC  was found in Yahoo 
The ticker  OMF  was found in Yahoo 
The ticker  TWO  was found in Yahoo 
The ticker  AFL  was found in Yahoo 
The ticker  LMND  was found in Yahoo 
The ticker  OI  was found in Yahoo 
The ticker  AAL  was found in Yahoo 
The ticker  AMSF  was found in Yahoo 
The ticker  SLM  was found in Yahoo 
The ticker  COF  was found in Yahoo 
The ticker  ENVA  was found in Yahoo 
The ticker  SABR  was found in Yahoo 
The ticker  EZPW  was found in Yahoo 
The ticker  CHTR  was found in Yahoo 
The ticker  DIT  was found in Yahoo 
The ticker  DFS  was found in Yahoo 
The ticker  HZO  was found in Yahoo 
The ticker  GMS  was found in Yahoo 
The ticker  RDN  was found in Yahoo 
The ticker  KEQU  was found in Yahoo 
The ticker  GM  was found in Yahoo 
The ticker  JACK  was found in Yahoo 
The ticker  HOV  was found in Yahoo 
The ticker  DAKT  was found in Yahoo 
The ticker  RLI  was found in Yahoo 
The ticker  SR  was found in Yahoo 
The ticker  AMWD  was found in Yahoo 
The ticker  GBX  was found in Yahoo 
The ticker  JBL  was found in Yahoo 
The ticker  UAL  was found in Yahoo 
The ticker  AL  was found in Yahoo 
The ticker  MIND  was found in Yahoo 
The ticker  DLHC  was found in Yahoo 
The ticker  INN  was found in Yahoo 
The ticker  GPI  was found in Yahoo 
The ticker  PDCO  was found in Yahoo 
The ticker  CIVI  was found in Yahoo 
The ticker  NMIH  was found in Yahoo 
The ticker  CABO  was found in Yahoo 
The ticker  CZR  was found in Yahoo 
The ticker  C  was found in Yahoo 
The ticker  SJM  was found in Yahoo 
The ticker  PHM  was found in Yahoo 
The ticker  VLGEA  was found in Yahoo 
The ticker  TNL  was found in Yahoo 
The ticker  OPY  was found in Yahoo 
The ticker  FDP  was found in Yahoo 
The ticker  MCB  was found in Yahoo 
The ticker  MHO  was found in Yahoo 
The ticker  VGR  was found in Yahoo 
The ticker  TRN  was found in Yahoo 
The ticker  DLX  was found in Yahoo 
The ticker  MTG  was found in Yahoo 
The ticker  CALM  was found in Yahoo 
The ticker  SNCY  was found in Yahoo 
The ticker  MTN  was found in Yahoo 
The ticker  NAVI  was found in Yahoo 
The ticker  COOP  was found in Yahoo 
The ticker  CFFI  was found in Yahoo

I check whether I ended up with a price dataset of 100 columns, one for each ticker:

ncol(prices)

[1] 100

I create a dataset for returns of these 100 stocks:

returns = diff(log(prices))
# finaltickerlist has the tickers that were found in Yahoo Finance:
colnames(returns) = finaltickerlist

Now I get the price data of the market index and calculate its monthly returns:

getSymbols(Symbol="^GSPC", from="2020-01-01", to="2024-10-31",periodicity="monthly")

[1] "GSPC"

sp500ret = diff(log(Ad(GSPC)))

I now run a loop to estimate all market regression models and store them in a data frame:

# I initialize an empty vector/matrix betas: 
betas =c()
for (i in 1:ncol(returns)) {
  # I run the market regression model for stock i:
  model <- lm(returns[,i] ~ sp500ret$GSPC)
  #print(summary(model))
  # I store the summary of the model in a temporal object:
  smodel = summary(model)
  # From the coefficients matrix I extract the coefficients along with their standard errors and pvalues:
  b0 = smodel$coefficients[1,1]
  b1 = smodel$coefficients[2,1]
  seb0 = smodel$coefficients[1,2]
  seb1 =smodel$coefficients[2,2]
  pb0 = smodel$coefficients[1,4]
  pb1 = smodel$coefficients[2,4]
  # I save the number of valid observations (months) used in the regression:
  N = model$df.residual + 2 
  # I save N to later drop those tickers with very few history available since their beta estimations will not be reliable
  # I put the values together in a temporal vector:
  vectorbetas = c(b0,b1,seb0,seb1,pb0,pb1, N)
  # I bind this vector with the betas cumulative matrix:
  betas = rbind(betas,vectorbetas) 
}
# Finally I rename columns and rows of the betas matrix:
# I use the finaltickerlist since it has the tickers that where FOUND IN YAHOO FINANCE:
rownames(betas) = finaltickerlist[1:ncol(returns)]
colnames(betas) =c("b0","b1","seb0","seb1","pb0","pb1","N")
# I create a data frame from the beta matrix:
betasdf = data.frame(betas)

betasdf

	b0	b1	seb0	seb1	pb0	pb1	N
AJG	0.0113380	0.7393096	0.0071630	0.1330144	0.1191870	0.0000008	57
APA	-0.0345240	3.3370945	0.0360089	0.6686753	0.3418732	0.0000064	57
LNC	-0.0223796	1.8342301	0.0159093	0.2954323	0.1651429	0.0000001	57
TIPT	0.0087913	1.1767830	0.0165935	0.3081359	0.5983811	0.0003425	57
UVE	-0.0089182	0.9499332	0.0154648	0.2871777	0.5665114	0.0016617	57
GNW	-0.0001226	0.8838090	0.0169328	0.3144380	0.9942496	0.0068344	57
CYH	-0.0185765	1.7556430	0.0313874	0.5828554	0.5563784	0.0039148	57
PRU	-0.0035397	1.2985852	0.0081189	0.1507666	0.6645560	0.0000000	57
MCY	0.0006371	0.8561270	0.0099851	0.1854206	0.9493542	0.0000238	57
RGA	-0.0006142	0.9260017	0.0115070	0.2136816	0.9576226	0.0000629	57
ALL	0.0052743	0.4791604	0.0087641	0.1627479	0.5497784	0.0047364	57
THG	-0.0041453	0.7352850	0.0080893	0.1502162	0.6103921	0.0000090	57
MET	0.0005412	1.0361110	0.0099504	0.1847764	0.9568183	0.0000007	57
UNM	0.0107548	0.8126791	0.0128720	0.2390300	0.4070415	0.0012612	57
ATUS	-0.0584363	1.5950116	0.0214332	0.3980079	0.0085722	0.0001862	57
VOYA	-0.0038743	1.0277517	0.0078811	0.1463499	0.6249618	0.0000000	57
CNA	0.0007392	0.6721973	0.0073793	0.1370322	0.9205692	0.0000087	57
FAF	-0.0103280	1.3776770	0.0067747	0.1258053	0.1331174	0.0000000	57
CRBG	0.0114968	0.6665792	0.0187263	0.4615248	0.5455533	0.1627472	24
L	-0.0005005	0.8390366	0.0067960	0.1261994	0.9415607	0.0000000	57
ORI	0.0042975	0.9366377	0.0082018	0.1523056	0.6024027	0.0000001	57
TRV	0.0063405	0.6503174	0.0074961	0.1392006	0.4013040	0.0000197	57
PFG	-0.0010812	1.2059120	0.0091295	0.1695326	0.9061577	0.0000000	57
BFH	-0.0287154	2.1501943	0.0216137	0.4013606	0.1894753	0.0000017	57
HIG	0.0032758	0.9601577	0.0093990	0.1745376	0.7287737	0.0000010	57
EQH	-0.0006974	1.4085885	0.0098480	0.1828744	0.9438045	0.0000000	57
SIGI	0.0009735	0.5749528	0.0084151	0.1562667	0.9083283	0.0005334	57
EIG	0.0033086	0.2181556	0.0106267	0.1973353	0.7567107	0.2737523	57
UNIT	-0.0115029	1.4583343	0.0230401	0.4278492	0.6195911	0.0012288	57
MKL	-0.0029650	0.7759603	0.0075975	0.1410843	0.6978552	0.0000010	57
WRB	0.0048429	0.6560227	0.0082789	0.1537367	0.5609585	0.0000787	57
VNCE	-0.0558057	1.8868295	0.0346208	0.6428990	0.1127056	0.0048607	57
SYF	-0.0054996	1.7062420	0.0127636	0.2370161	0.6682390	0.0000000	57
THC	0.0064087	2.1458508	0.0155155	0.2881181	0.6811760	0.0000000	57
CINF	0.0003360	0.7061352	0.0105423	0.1957681	0.9746898	0.0006687	57
DTIL	-0.0716242	1.2937344	0.0287758	0.5343592	0.0158637	0.0188011	57
ILPT	-0.0476199	1.7844570	0.0226104	0.4198693	0.0397732	0.0000833	57
WLFC	0.0080878	1.1833150	0.0222588	0.4133396	0.7177353	0.0059303	57
AIG	-0.0025594	1.2047644	0.0124726	0.2316120	0.8381706	0.0000030	57
VGZ	-0.0169125	1.6111618	0.0187612	0.3483901	0.3712742	0.0000232	57
RM	-0.0133970	1.6511920	0.0163004	0.3026945	0.4146951	0.0000012	57
BIPC	-0.0060592	1.3036751	0.0118454	0.2457618	0.6111498	0.0000024	54
OMF	-0.0023353	1.5450776	0.0139289	0.2586556	0.8674655	0.0000002	57
TWO	-0.0439982	2.4489830	0.0205205	0.3810600	0.0364597	0.0000000	57
AFL	0.0048980	0.9596290	0.0078087	0.1450049	0.5330902	0.0000000	57
LMND	-0.0343659	1.6666532	0.0302652	0.6291802	0.2618073	0.0108976	50
OI	-0.0143482	1.2178123	0.0149605	0.2778133	0.3417204	0.0000531	57
AAL	-0.0272285	1.5097844	0.0155172	0.2881509	0.0848761	0.0000026	57
AMSF	-0.0010850	0.3538947	0.0100081	0.1858480	0.9140621	0.0621182	57
SLM	0.0015965	1.2379177	0.0121828	0.2262315	0.8962208	0.0000011	57
COF	-0.0049734	1.5078013	0.0122869	0.2281643	0.6872175	0.0000000	57
ENVA	0.0075924	1.4216987	0.0150296	0.2790961	0.6154638	0.0000044	57
SABR	-0.0531091	2.0126269	0.0260138	0.4830690	0.0460022	0.0001102	57
EZPW	0.0009183	0.9842858	0.0141193	0.2621924	0.9483806	0.0004213	57
CHTR	-0.0188152	1.0789512	0.0120383	0.2235479	0.1238023	0.0000115	57
DIT	0.0048551	0.4953523	0.0161000	0.2989720	0.7641270	0.1032439	57
DFS	-0.0010414	1.4975458	0.0131692	0.2445494	0.9372593	0.0000001	57
HZO	-0.0125688	1.9216400	0.0165536	0.3073959	0.4509275	0.0000001	57
GMS	0.0042707	1.7003253	0.0112299	0.2085371	0.7051886	0.0000000	57
RDN	-0.0024941	1.1346986	0.0112744	0.2093626	0.8257416	0.0000014	57
KEQU	0.0088525	0.7396444	0.0179389	0.3331211	0.6236399	0.0305344	57
GM	-0.0068359	1.4796292	0.0122657	0.2277702	0.5795693	0.0000000	57
JACK	-0.0270817	1.9923884	0.0161809	0.3004750	0.0998722	0.0000000	57
HOV	0.0050693	2.8768282	0.0299148	0.5555090	0.8660591	0.0000033	57
DAKT	0.0027270	1.1335075	0.0165543	0.3074088	0.8697611	0.0005202	57
RLI	0.0079262	0.3608798	0.0094364	0.1752318	0.4045696	0.0441982	57
SR	-0.0068911	0.5478912	0.0073282	0.1360831	0.3511486	0.0001751	57
AMWD	-0.0210336	1.7696686	0.0142693	0.2649776	0.1461706	0.0000000	57
GBX	0.0042405	1.4256692	0.0156000	0.2896870	0.7867707	0.0000082	57
JBL	0.0081466	1.2517571	0.0098525	0.1829589	0.4118946	0.0000000	57
UAL	-0.0147082	1.5492610	0.0183997	0.3416773	0.4275133	0.0000317	57
AL	-0.0144037	1.6639299	0.0120150	0.2231146	0.2357412	0.0000000	57
MIND	-0.0944896	1.7749395	0.0479676	0.8907463	0.0538982	0.0512723	57
DLHC	-0.0006415	1.1622485	0.0181187	0.3364592	0.9718843	0.0010695	57
INN	-0.0299584	2.1157988	0.0155649	0.2890368	0.0594413	0.0000000	57
GPI	0.0083219	1.4810674	0.0143332	0.2661637	0.5638814	0.0000008	57
PDCO	-0.0082330	1.0626405	0.0117684	0.2185363	0.4871346	0.0000101	57
CIVI	0.0081122	1.4536967	0.0187859	0.3488485	0.6675553	0.0001099	57
NMIH	-0.0099721	1.3334456	0.0147531	0.2739616	0.5019157	0.0000099	57
CABO	-0.0356232	0.8512585	0.0120768	0.2242626	0.0046638	0.0003689	57
CZR	-0.0397440	3.2698129	0.0243790	0.4527115	0.1087619	0.0000000	57
C	-0.0140461	1.4632023	0.0104668	0.1943659	0.1851203	0.0000000	57
SJM	0.0019157	0.2254668	0.0077158	0.1432808	0.8048457	0.1213152	57
PHM	0.0020400	1.7541107	0.0124673	0.2315148	0.8706258	0.0000000	57
VLGEA	0.0048138	0.2857615	0.0090340	0.1677588	0.5962813	0.0941369	57
TNL	-0.0147136	1.7793744	0.0129833	0.2410970	0.2620176	0.0000000	57
OPY	0.0034775	1.1347066	0.0107546	0.1997094	0.7476556	0.0000005	57
FDP	-0.0014145	0.3716580	0.0119085	0.2211380	0.9058802	0.0985002	57
MCB	-0.0102722	1.1753253	0.0186966	0.3471917	0.5849451	0.0013182	57
MHO	-0.0024442	2.3977187	0.0165943	0.3081516	0.8834399	0.0000000	57
VGR	0.0040314	1.0809234	0.0111707	0.2074369	0.7195642	0.0000029	57
TRN	-0.0011655	1.3427481	0.0109939	0.2041545	0.9159580	0.0000000	57
DLX	-0.0274098	1.4926078	0.0126830	0.2355206	0.0350550	0.0000000	57
MTG	-0.0016569	1.4118011	0.0116711	0.2167298	0.8876271	0.0000000	57
CALM	0.0198805	-0.1541520	0.0108815	0.2020658	0.0731274	0.4487946	57
SNCY	-0.0363092	1.4843406	0.0200047	0.4175285	0.0770229	0.0009883	42
MTN	-0.0161188	1.1945487	0.0090680	0.1683895	0.0810054	0.0000000	57
NAVI	-0.0115599	1.5118823	0.0109989	0.2042466	0.2978506	0.0000000	57
COOP	0.0197037	1.4805047	0.0141093	0.2620058	0.1681725	0.0000006	57
CFFI	0.0031484	0.3621739	0.0115736	0.2149178	0.7866149	0.0976215	57

Based on the data of each company select a set of 10 companies based on any of these criteria:

CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market

CRITERIA B: Stocks that are SIGNIFICANTLY less risky than the market according to the market model regressions.

Selecting the stocks of CRITERIA A:

stocks_over_market <- betasdf |>
  filter(b0>0,pb0<=0.05) |>
  arrange(desc(b0))
stocks_over_market

[1] b0   b1   seb0 seb1 pb0  pb1  N   
<0 rows> (or 0-length row.names)

nrow(stocks_over_market)

[1] 0

There is only 0 stock(s) that have a positive and significant beta0.

Filtering the stocks according to CRITERIA B:

# I first add the 95% minimum and the maximum values of the C.I. for beta1:
betasdf <- betasdf |> 
  mutate(
    minb1= b1 - 2*stderr_b1,
    maxb1= b1 + 2*stderr_b1)
# I filter those that might be significantly less risky than the market:
less_risky_market <- betasdf |> 
  filter(maxb1<1) |>
  arrange(b1)
less_risky_market

             b0        b1       seb0      seb1        pb0       pb1  N
CALM 0.01988046 -0.154152 0.01088146 0.2020658 0.07312736 0.4487946 57
         minb1    maxb1
CALM -1.007797 0.699493

nrow(less_risky_market)

[1] 1

There is only 1 stock(s) that are significantly less risky than the market.

Applying these 2 criteria I only got 2 stocks. If I want to get the best 10 stocks with these criteria, I can relax both criteria and do a filter by steps. Let’s do the following:

I will first use beta0 information to select the best 20 stocks. I can sort all stocks by the minimum of the 95% CI of beta0 (by the maximum to the minimum value):

# I first add the 95% minimum and the maximum values of the C.I. for beta1:
betasdf <- betasdf |> 
  mutate(
    minb0= b0 - 2*stderr_b0,
    maxb0= b0 + 2*stderr_b0)

# I do the first filter selecting 20 stocks based on the min of b0. 
# I will add the restriction that the estimation of the market regression model had more than 30 months, to ensure that the beta estimations are more stable:
filter1 <- betasdf |>
  filter(N>30) |> 
  arrange(desc(minb0)) |>
  head(20)

filter1

	b0	b1	seb0	seb1	pb0	pb1	N	minb1	maxb1	minb0	maxb0
CALM	0.0198805	-0.1541520	0.0108815	0.2020658	0.0731274	0.4487946	57	-1.0077970	0.699493	-0.0278422	0.0676031
COOP	0.0197037	1.4805047	0.0141093	0.2620058	0.1681725	0.0000006	57	0.6268597	2.334150	-0.0280189	0.0674264
AJG	0.0113380	0.7393096	0.0071630	0.1330144	0.1191870	0.0000008	57	-0.1143354	1.592955	-0.0363847	0.0590607
UNM	0.0107548	0.8126791	0.0128720	0.2390300	0.4070415	0.0012612	57	-0.0409659	1.666324	-0.0369679	0.0584775
KEQU	0.0088525	0.7396444	0.0179389	0.3331211	0.6236399	0.0305344	57	-0.1140005	1.593289	-0.0388702	0.0565751
TIPT	0.0087913	1.1767830	0.0165935	0.3081359	0.5983811	0.0003425	57	0.3231380	2.030428	-0.0389314	0.0565139
GPI	0.0083219	1.4810674	0.0143332	0.2661637	0.5638814	0.0000008	57	0.6274224	2.334712	-0.0394008	0.0560445
JBL	0.0081466	1.2517571	0.0098525	0.1829589	0.4118946	0.0000000	57	0.3981121	2.105402	-0.0395761	0.0558692
CIVI	0.0081122	1.4536967	0.0187859	0.3488485	0.6675553	0.0001099	57	0.6000517	2.307342	-0.0396104	0.0558349
WLFC	0.0080878	1.1833150	0.0222588	0.4133396	0.7177353	0.0059303	57	0.3296700	2.036960	-0.0396349	0.0558104
RLI	0.0079262	0.3608798	0.0094364	0.1752318	0.4045696	0.0441982	57	-0.4927652	1.214525	-0.0397965	0.0556488
ENVA	0.0075924	1.4216987	0.0150296	0.2790961	0.6154638	0.0000044	57	0.5680537	2.275344	-0.0401303	0.0553151
THC	0.0064087	2.1458508	0.0155155	0.2881181	0.6811760	0.0000000	57	1.2922058	2.999496	-0.0413140	0.0541313
TRV	0.0063405	0.6503174	0.0074961	0.1392006	0.4013040	0.0000197	57	-0.2033276	1.503962	-0.0413821	0.0540632
ALL	0.0052743	0.4791604	0.0087641	0.1627479	0.5497784	0.0047364	57	-0.3744846	1.332805	-0.0424484	0.0529970
HOV	0.0050693	2.8768282	0.0299148	0.5555090	0.8660591	0.0000033	57	2.0231832	3.730473	-0.0426534	0.0527919
AFL	0.0048980	0.9596290	0.0078087	0.1450049	0.5330902	0.0000000	57	0.1059840	1.813274	-0.0428247	0.0526207
DIT	0.0048551	0.4953523	0.0161000	0.2989720	0.7641270	0.1032439	57	-0.3582927	1.348997	-0.0428676	0.0525777
WRB	0.0048429	0.6560227	0.0082789	0.1537367	0.5609585	0.0000787	57	-0.1976223	1.509668	-0.0428797	0.0525656
VLGEA	0.0048138	0.2857615	0.0090340	0.1677588	0.5962813	0.0941369	57	-0.5678835	1.139406	-0.0429089	0.0525365

After this first screening using beta0 information, I now use the information of beta1 to further select only 10 stocks.

For beta1 I will consider conservative stocks that are not too risky compared to the market, assuming that I will create a portfolio in the sort-term:

filter2 <- filter1 |> 
  arrange(maxb1) |>
  head(10)

filter2

	b0	b1	seb0	seb1	pb0	pb1	N	minb1	maxb1	minb0	maxb0
CALM	0.0198805	-0.1541520	0.0108815	0.2020658	0.0731274	0.4487946	57	-1.0077970	0.699493	-0.0278422	0.0676031
VLGEA	0.0048138	0.2857615	0.0090340	0.1677588	0.5962813	0.0941369	57	-0.5678835	1.139406	-0.0429089	0.0525365
RLI	0.0079262	0.3608798	0.0094364	0.1752318	0.4045696	0.0441982	57	-0.4927652	1.214525	-0.0397965	0.0556488
ALL	0.0052743	0.4791604	0.0087641	0.1627479	0.5497784	0.0047364	57	-0.3744846	1.332805	-0.0424484	0.0529970
DIT	0.0048551	0.4953523	0.0161000	0.2989720	0.7641270	0.1032439	57	-0.3582927	1.348997	-0.0428676	0.0525777
TRV	0.0063405	0.6503174	0.0074961	0.1392006	0.4013040	0.0000197	57	-0.2033276	1.503962	-0.0413821	0.0540632
WRB	0.0048429	0.6560227	0.0082789	0.1537367	0.5609585	0.0000787	57	-0.1976223	1.509668	-0.0428797	0.0525656
AJG	0.0113380	0.7393096	0.0071630	0.1330144	0.1191870	0.0000008	57	-0.1143354	1.592955	-0.0363847	0.0590607
KEQU	0.0088525	0.7396444	0.0179389	0.3331211	0.6236399	0.0305344	57	-0.1140005	1.593289	-0.0388702	0.0565751
UNM	0.0107548	0.8126791	0.0128720	0.2390300	0.4070415	0.0012612	57	-0.0409659	1.666324	-0.0369679	0.0584775

These will be my selected stocks based on their beta0 and beta1 information. These stocks can be candidates to include them in a winning portfolio.

I can show the name of these firms and their industry:

selected_tickers = rownames(filter2)
selected_tickers = as.data.frame(selected_tickers)
colnames(selected_tickers) = c("firm")

selected_tickers = merge(selected_tickers,usfirms,by.x="firm", by.y="empresa")

selected_tickers

firm	Nombre	status	partind	naics1	naics2	SectorEconomatica
AJG	Arthur J. Gallagher & Co.	activo	0.12	Servicios financieros y de seguros	Servicios relacionados con los seguros y las fianzas	Finanzas y Seguros
ALL	Allstate Corp	activo	0.10	Servicios financieros y de seguros	Instituciones de seguros	Finanzas y Seguros
CALM	Cal-Maine Foods, Inc	activo	NA	Agricultura, ganadería, aprovechamiento forestal, pesca y caza	Producción de Animales y Acuicultura	Agro & Pesca
DIT	Amcon Distributing Co	activo	NA	Comercio al por mayor	Comercio al por mayor de productos de tienda de comestibles	Comercio
KEQU	Kewaunee Scientific Corp	activo	NA	Industrias manufactureras	Fabricación de maquinaria y equipo para las industrias manufactureras, excepto la metalmecánica	Maquinaria Indust
RLI	Rli Corp	activo	NA	Servicios financieros y de seguros	Instituciones de seguros	Finanzas y Seguros
TRV	Travelers Companies, Inc	activo	0.12	Servicios financieros y de seguros	Instituciones de seguros	Finanzas y Seguros
UNM	Unum Group	activo	NA	Servicios financieros y de seguros	Instituciones de seguros	Finanzas y Seguros
VLGEA	Village Super Market, Inc	activo	NA	Comercio al por menor	Almacenes de alimentos	Comercio
WRB	W. R. Berkley Corp	activo	0.04	Servicios financieros y de seguros	Instituciones de seguros	Finanzas y Seguros