Workshop 2 Solution, Financial Modeling and Programming

Author

Alberto Dorantes, Ph.D.

Published

November 10, 2025

Abstract

This is an INDIVIDUAL workshop. In this workshop we review the market regression model to calculate aplha and beta, and also we learn how to run the model many times using a loop. We use the results of alpha and beta to select stocks.

1 Review of the Market Regression model

The simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied times the market return. In mathematical terms:

E[R_i] = α + β(R_M)

We can express the same equation using B0 as alpha, and B1 as market beta:

E[R_i] = β_0 + β_1(R_M)

We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

r_{(i,t)} = b_0 + b_1*r_{(M,t)} + ε_t

Where:

ε_t is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a Normal distributed random variable ∼ N(0, σ_ε); the error term ε_t is expected to have mean=0 and a specific standard deviation σ_ε (also called volatility).

r_{(i,t)} is the return of the stock i at time t.

r_{(M,t)} is the market return at time t.

b_0 and b_1 are called regression coefficients.

In the next sections we get real data and run a Market Model for Tesla

1.1 Data collection

We first load the yfinance package and download monthly price data for Tesla and the S&P500 Index:

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib
import matplotlib.pyplot as plt

# Download a dataset with prices for Alfa and the Mexican IPyC:
data = yf.download("TSLA, ^GSPC", start="2020-01-01", end="2025-09-30", interval='1mo')

YF.download() has changed argument auto_adjust default to True

[                       0%                       ]
[*********************100%***********************]  2 of 2 completed


# I create another dataset with the Adjusted Closing price of both instruments:
adjprices = data['Close']

1.2 Return calculation

We calculate continuously returns for both, Alfa and the IPCyC. We use the diff function to get the monthly difference of the log of prices, which is the % change of the price (in continuous compounding)

returns = np.log(adjprices).diff(1).dropna()
returns.columns

Index(['TSLA', '^GSPC'], dtype='object', name='Ticker')

# I change the name of the columns to avoid special characters like ^MXX
returns.columns=['TSLA','SP500']
returns.columns

Index(['TSLA', 'SP500'], dtype='object')

1.3 Visualize the relationship

We do a scatter plot putting the SP500 returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add a line that better represents the relationship between the stock returns and the market returns.Type:

import seaborn as sb
#plt.clf()
x = returns['SP500']
y = returns['TSLA']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('SP500 returns')
plt.ylabel('TSLA returns') 
plt.show()

Sometimes graphs can be deceiving. In this case, the range of X axis and Y axis are different, so it is better to do a graph where we can make both X and Y ranges with equal distance. Type:

plt.clf()

sb.regplot(x=x,y=y)
# I adjust the scale of the X axis so that the magnitude of each unit of X is similar to that of the Y axis 
plt.xticks(np.arange(-1,1,0.20))

([<matplotlib.axis.XTick object at 0x0000011EAC833250>, <matplotlib.axis.XTick object at 0x0000011EAC832AD0>, <matplotlib.axis.XTick object at 0x0000011EAC8016D0>, <matplotlib.axis.XTick object at 0x0000011EAC87E850>, <matplotlib.axis.XTick object at 0x0000011EAC87EFD0>, <matplotlib.axis.XTick object at 0x0000011EAC87F750>, <matplotlib.axis.XTick object at 0x0000011EAC87FED0>, <matplotlib.axis.XTick object at 0x0000011EAC8A0690>, <matplotlib.axis.XTick object at 0x0000011EAC8A0E10>, <matplotlib.axis.XTick object at 0x0000011EAC8A1590>], [Text(-1.0, 0, '−1.0'), Text(-0.8, 0, '−0.8'), Text(-0.6000000000000001, 0, '−0.6'), Text(-0.40000000000000013, 0, '−0.4'), Text(-0.20000000000000018, 0, '−0.2'), Text(-2.220446049250313e-16, 0, '0.0'), Text(0.19999999999999973, 0, '0.2'), Text(0.3999999999999997, 0, '0.4'), Text(0.5999999999999996, 0, '0.6'), Text(0.7999999999999996, 0, '0.8')])

# I label the axis:
plt.xlabel('SP500 returns')

plt.ylabel('TSLA returns') 
plt.show()

We can see that the slope of the line that represents the points is much more steep compared to the previous plot. This means that the returns have a strong positive relationship with the market returns, but we see that the SENSIBILITY of the stock return is high since the slope of the line is very steep compared with the previous plot.

1.4 Running the Regression Model

The OLS function from the statsmodel package is used to estimate a regression model. We run a simple regression model to see how the monthly returns of the stock are related with the market return.

The first parameter of the OLS function is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).

Before we run the OLS function, we need to add a column of 1’s to the X vector in order to estimate the beta0 coefficient (the constant).

What you will get is called The Single-Index Model. You are trying to examine how the market returns can explain stock returns.

Run the Single-Index model (Y=stock return, the X=market return). You can use the function OLS from the statsmodels.api library:

import statsmodels.formula.api as smf

# I estimate the OLS regression model:
mkmodel = smf.ols('TSLA ~ SP500',data=returns).fit()
# The Dependent variable Y is the first one in the formula, and the second is the IV: Y ~ X

# I display the summary of the regression: 
print(mkmodel.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   TSLA   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.306
Method:                 Least Squares   F-statistic:                     30.53
Date:              jue., 27 nov. 2025   Prob (F-statistic):           6.01e-07
Time:                        14:06:48   Log-Likelihood:                 28.718
No. Observations:                  68   AIC:                            -53.44
Df Residuals:                      66   BIC:                            -49.00
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0113      0.020      0.568      0.572      -0.029       0.051
SP500          2.1348      0.386      5.525      0.000       1.363       2.906
==============================================================================
Omnibus:                        1.000   Durbin-Watson:                   1.708
Prob(Omnibus):                  0.607   Jarque-Bera (JB):                1.085
Skew:                          -0.246   Prob(JB):                        0.581
Kurtosis:                       2.623   Cond. No.                         19.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The regression output shows a lot of information about the relationship between the Market return (X) and the stock return (Y).

Looking at the table of beta coefficients, the first row (Intercept) shows the information of the beta0 coefficient, which is the intercept of the regression equation, also known as constant.

The second row (SP500) shows the information of the beta1 coefficient, which represents the slope of the regression line. In this example, since the X variable is the market return and the Y variable is the stock return, beta1 can be interpreted as the sensitivity or market risk of the stock.

For each beta coefficient, the following is calculated and shown:

coef : this is the average value of the beta coefficient
std err : this is the standard error of the coefficient, which is the standard deviation of the beta coefficient.
t : this is the t-Statistics of the following Hypothesis test:

H0: beta = 0;

Ha: beta <> 0;
P>|t| : is the p-value of the above hypothesis test; if it is a value < 0.05, we can say that the beta coefficient is SIGNIFICANTLY different than ZERO with a 95% confidence.
[0.025 0.975] : This is the 95% Confidence Interval of the beta coefficient. This shows the possible values that the beta coefficient can take in the future with 95% probability.

How the t-Statistic, p-value and the 95% C.I. are related?

Below is a way to extract the important values of the regression output.

After running/fitting the regression model, Python generated the object mkmodel that stores detailed information of the model. This objet has attributes and functions that we can use to extract important outputs for our analysis.

Important attributes of this mkmodel object are:

params: It is a vector that contains the beta coefficients
bse: It is a vector that contains the standard errors of the beta coefficients
pvalues: It is a vector with the p-values for each coefficient
conf_int(): it is a vector of vectors, which contains the min and max of the 95% confidence interval for each coefficient

We can get these values as follows:

# I get the beta coefficients:
b0=mkmodel.params[0]
b1=mkmodel.params[1]
# I get the standard errors: 
seb0 = mkmodel.bse[0]
seb1 = mkmodel.bse[1]
# I calculate the t-values of each beta by dividing the coefficient by its standard error
tb0 = b0 / seb0 
tb1 = b1 / seb1 
# I get the p-values of each coefficient: 
pvalueb0 = mkmodel.pvalues[0]
pvalueb1 = mkmodel.pvalues[1]

# I can calculate the confidence level of beta0 according to its pvalue
confb0 = 100 * (1-pvalueb0)
# I can calculate the confidence level of beta1 according to its pvalue
confb1 = 100 * (1-pvalueb1)

# I can construct the 95% Confidence Interval for each coefficient: 
# I get the min and max of beta0: 
minb0 = mkmodel.conf_int()[0][0]
maxb0 = mkmodel.conf_int()[1][0]
# I get the min and max of beta1:  
minb1 = mkmodel.conf_int()[0][1]
maxb1 = mkmodel.conf_int()[1][1]

THE REGRESSION EQUATION.

ACCORDING TO THE REGRESSION OUTPUT, THE REGRESSION EQUATION THAT EXPLAINS THE EXPECTED RETURN OF TESLA BASED ON THE SP500 RETURN IS:

E[TSLAret]= beta0 + beta1(SP500ret)

E[TSLAAret]= 0.0113 + 2.1348(SP500ret)

2 Automating the estimation of many market models

2.1 Review of how a Loop works and how to store values in a Matrix

An analyst made an analysis and tells you that financial leverage has a negative quadratic effect on the expected financial performance. This is a typical inverted U-shaped relationship. In other words, financial leverage has a positive effect on performance up to a certain level of leverage; after this level the relationship becomes negative: the more financial leverage, the less the firm performance. The equation is the following:

Performance=20-(leverage-4)^{2} Leverage is measure from level 0 (no leverage) up to the maximum leverage =10. Calculate the expected firm performance for leverage levels 0 to 10 and store the results in a matrix:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Initialize empty list (instead of matrix)
matrix_results = []

# Loop from 0 to 10 (inclusive) by 1
for i in range(0, 11):
    # Calculate performance
    performance = 20 - (i - 4)**2
    # Append the result as a row (list)
    matrix_results.append([i, performance])

# Convert to DataFrame
matrix_results = pd.DataFrame(matrix_results, columns=["leverage", "performance"])

# Display the DataFrame
print(matrix_results)

    leverage  performance
0          0            4
1          1           11
2          2           16
3          3           19
4          4           20
5          5           19
6          6           16
7          7           11
8          8            4
9          9           -5
10        10          -16

# Plot
plt.plot(matrix_results["leverage"], matrix_results["performance"])
plt.xlabel("Leverage")
plt.ylabel("Performance")
plt.title("Performance vs Leverage")
plt.show()

3 CHALLENGE 1

Write a detailed Python Pseudo-Code, and its Python code to automatically run the Market Regression Model for a list of 20 US active stocks (You can check for active stocks in Yahoo Finance or in the usfirms dataset we used in Workshop 1). Think how you can implement it with a loop.

For each stock you have to come up with a way to store the the beta coefficients along with their standard errors, tvalues, and 95% confidence interval.

4 CHALLENGE 1 SOLUTION

I first write the STEPS of the algorithm:

General approach: For each stock I need to run a market regression model, and then extract the beta0 and beta1 coefficients along with their corresponding standard errors and pvalues.

Define a tickerlist as a vector of 20 tickers; add the ticker “^GSPC” as the first element of the vector

Define tickersfound as an empty vector

Start with an empty dataset called prices

Write a loop to bring close prices for each stock:

for each ticker in tickerlist:

Using yf.download, download close prices of the ticker and name it as price
If the ticker exist in Yahoo Finance and if the ticker is not ^GSPC:
- Add ticker to the tickersfound vector/list
- If it is the first ticker, assign prices = price
- else
  - Merge price with prices matching by the date index
Else # if the ticker does not exist:
- Print a message: The ticker ___ does not exist

At this point I will have prices as a time-series dataset with all close prices that existed in Yahoo Finance.

The tickersfound will be a vector with the tickers that were found in Yahoo Finance

Using prices, create returns as the log returns of prices

Initialize betamatrix as an empty matrix with 8 columns (where the beta data will be stored for each stock). Name the columns as b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1

Then, write a for loop that iterate for each ticker of tickersfound, and with each iteration i do the following:

Using the prices dataframe, run the a regression model where the Y variable is the ‘^GSPC’ column, and the X is the ticker column
From the regression summary extract extract: intercept, beta coefficient, standard error of intercept, standard error of beta, tvalue of intercept, tvalue of beta, pvalue of intercept, pvalue of beta.
Save these values in a temporal vector
Append this vector into the betamatrix

Finally, I identify whether any of these stocks are offering returns over the market by selecting the stocks with beta0 that are positive and significant.

Python implementation:

I set the tickerlist:

tickerlist = ['^GSPC','MSFT', 'TSLA', 'UNH', 'ORCL', 'NVDA', 'WMT', 'COST', 'SPGI', 'AMD', 'AAPL', 'JPM', 'TGT', 'AMZN', 'CAT', 'ADBE', 'NKE', 'PFE', 'BA', 'XOM', 'KO', 'MCD']
print(f"'tickerlist' initialized with {len(tickerlist)} tickers.")

'tickerlist' initialized with 22 tickers.

Define tickersfound as an empty vector

Start with an empty dataset called prices

Write a loop to bring close prices for each stock:

for each ticker in tickerlist:

Using yf.download, download close prices of the ticker and name it as price
If the ticker exist in Yahoo Finance and if the ticker is not ^GSPC:
- Add ticker to the tickersfound vector/list
- If it is the first ticker, assign prices = price
- else
  - Merge price with prices matching by the date index
Else # if the ticker does not exist:
- Print a message: The ticker ___ does not exist

import yfinance as yf
import pandas as pd

# Initialize an empty DataFrame to store all prices
prices = pd.DataFrame()
tickersfound = [] # To keep track of successfully downloaded tickers
marketticker = "^GSPC"
start_date = "2020-01-01"
end_date = "2024-12-31"

for ticker in tickerlist:
    try:
        # Download monthly data, ensuring auto_adjust for accurate close prices
        # yf.download returns a DataFrame, we need the 'Close' column which is a Series
        price_series = yf.download(ticker, start=start_date, end=end_date, interval='1mo', auto_adjust=True)['Close']

        if not price_series.empty:
            if ticker!=marketticker: 
                tickersfound.append(ticker)
            # Rename the Series to the ticker symbol for easy column identification
            if prices.empty:
                # If 'prices' is empty, this is the first ticker, so initialize 'prices' DataFrame
                price_series.name=ticker            
                prices = price_series 
            else:
                # Otherwise, merge the current ticker's prices with the existing 'prices' DataFrame
                # Use outer merge to keep all dates and fill missing values with NaN
                prices = pd.merge(prices, price_series, left_index=True, right_index=True, how='outer')
        else:
            print(f"The ticker {ticker} does not exist or has no 'Close' price data in the specified range.")

    except Exception as e:
        print(f"An error occurred while downloading data for {ticker}: {e}")


[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

[*********************100%***********************]  1 of 1 completed

# Display the first few rows of the final 'prices' DataFrame
if not prices.empty:
    print("\nSuccessfully downloaded and merged monthly close prices into 'prices' DataFrame. Here's a preview:")
    prices.head()
else:
    print("No stock data was successfully downloaded or merged.")


Successfully downloaded and merged monthly close prices into 'prices' DataFrame. Here's a preview:
Ticker            ^GSPC        MSFT  ...         KO         MCD
Date                                 ...                       
2020-01-01  3225.520020  161.630188  ...  48.923050  187.353989
2020-02-01  2954.219971  153.825455  ...  44.809826  170.016983
2020-03-01  2584.590088  150.151718  ...  37.069267  145.687973
2020-04-01  2912.429932  170.621307  ...  38.780270  165.256943
2020-05-01  3044.310059  174.467697  ...  39.447884  164.164383

[5 rows x 22 columns]

print("\nFinal 'tickersfound' list:")


Final 'tickersfound' list:

print(tickersfound)

['MSFT', 'TSLA', 'UNH', 'ORCL', 'NVDA', 'WMT', 'COST', 'SPGI', 'AMD', 'AAPL', 'JPM', 'TGT', 'AMZN', 'CAT', 'ADBE', 'NKE', 'PFE', 'BA', 'XOM', 'KO', 'MCD']

At this point I will have prices as a time-series dataset with all close prices that existed in Yahoo Finance.

The tickersfound will be a vector with the tickers that were found in Yahoo Finance

Using prices, create returns as the log returns of prices

import numpy as np
logprices = np.log(prices)
returns = logprices.diff().dropna()
returns.head()

Ticker         ^GSPC      MSFT      TSLA  ...       XOM        KO       MCD
Date                                      ...                              
2020-02-01 -0.087860 -0.049492  0.026424  ... -0.188652 -0.087821 -0.097101
2020-03-01 -0.133668 -0.024172 -0.242781  ... -0.289365 -0.189639 -0.154431
2020-04-01  0.119421  0.127800  0.400210  ...  0.202010  0.045123  0.126034
2020-05-01  0.044287  0.022293  0.065730  ... -0.021754  0.017069 -0.006633
2020-06-01  0.018221  0.107645  0.257109  ...  0.002572 -0.043790 -0.003334

[5 rows x 22 columns]

Initialize betamatrix as an empty matrix with 8 columns (where the beta data will be stored for each stock). Name the columns as b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1

Then, write a for loop that iterate for each ticker of tickersfound, and with each iteration i do the following:

Using the prices dataframe, run the a regression model where the Y variable is the ‘^GSPC’ column, and the X is the ticker column
From the regression summary extract extract: intercept, beta coefficient, standard error of intercept, standard error of beta, tvalue of intercept, tvalue of beta, pvalue of intercept, pvalue of beta.
Save these values in a temporal vector
Append this vector into the betamatrix

import statsmodels.api as sm
import pandas as pd

# Create an empty Pandas DataFrame for betamatrix with specified columns
betamatrix = pd.DataFrame(columns=['b0', 'b1', 'stderrb0', 'stderrb1', 'tvalueb0', 'tvalueb1', 'pvalueb0', 'pvalueb1', 'minb0','maxb0','minb1','maxb1'])

print("Libraries 'statsmodels.api' and 'numpy' imported. 'betamatrix' DataFrame initialized.")

Libraries 'statsmodels.api' and 'numpy' imported. 'betamatrix' DataFrame initialized.

for ticker in tickersfound:
    # Create a temporary DataFrame with the market and current ticker prices
    df_reg = returns[[marketticker, ticker]].dropna()

    # Define the dependent variable (Y) as the individual stock returns
    Y = df_reg[ticker]

    # Define the independent variable (X) as the market returns
    X = df_reg[marketticker]

    # Add a constant to the independent variable for the intercept
    X = sm.add_constant(X)

    # Create and fit the OLS model
    model = sm.OLS(Y, X).fit()

    # Extract parameters using .iloc to avoid FutureWarning
    b0 = model.params.iloc[0]
    b1 = model.params.iloc[1]

    # Extract standard errors using .iloc to avoid FutureWarning
    stderrb0 = model.bse.iloc[0]
    stderrb1 = model.bse.iloc[1]

    # Extract t-values using .iloc to avoid FutureWarning
    tvalueb0 = model.tvalues.iloc[0]
    tvalueb1 = model.tvalues.iloc[1]

    # Extract p-values using .iloc to avoid FutureWarning
    pvalueb0 = model.pvalues.iloc[0]
    pvalueb1 = model.pvalues.iloc[1]

    # I construct the 95% Confidence Interval for each coefficient: 
    # I get the min and max of beta0: 
    minb0 = model.conf_int().iloc[0][0]
    maxb0 = model.conf_int().iloc[0][1]
    # I get the min and max of beta1:  
    minb1 = model.conf_int().iloc[1][0]
    maxb1 = model.conf_int().iloc[1][1]

    # Append the results to the betamatrix DataFrame
    betamatrix.loc[ticker] = [b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1, minb0, maxb0, minb1, maxb1]

# Display the betamatrix
print("Beta matrix calculated:")

Beta matrix calculated:

betamatrix.head()

            b0        b1  stderrb0  ...     maxb0     minb1     maxb1
MSFT  0.007135  0.882545  0.005702  ...  0.018553  0.669231  1.095858
TSLA  0.016396  2.103818  0.021589  ...  0.059628  1.296147  2.911489
UNH   0.005613  0.594790  0.007218  ...  0.020066  0.324769  0.864811
ORCL  0.010318  1.039361  0.008441  ...  0.027220  0.723589  1.355134
NVDA  0.036180  1.652489  0.014778  ...  0.065773  1.099635  2.205344

[5 rows x 12 columns]

5 CHALLENGE 2

Interpret with your words ONE of the 20 regression models

I selected Tesla:

ticker='TSLA'
df_reg = returns[[marketticker, ticker]].dropna()
# Define the dependent variable (Y) as the individual stock returns
Y = df_reg[ticker]
# Define the independent variable (X) as the market returns
X = df_reg[marketticker]
# Add a constant to the independent variable for the intercept
X = sm.add_constant(X)
# Create and fit the OLS model
model = sm.OLS(Y, X).fit()
print(model.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   TSLA   R-squared:                       0.323
Model:                            OLS   Adj. R-squared:                  0.311
Method:                 Least Squares   F-statistic:                     27.21
Date:              jue., 27 nov. 2025   Prob (F-statistic):           2.65e-06
Time:                        14:06:59   Log-Likelihood:                 24.397
No. Observations:                  59   AIC:                            -44.79
Df Residuals:                      57   BIC:                            -40.64
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0164      0.022      0.759      0.451      -0.027       0.060
^GSPC          2.1038      0.403      5.216      0.000       1.296       2.911
==============================================================================
Omnibus:                        0.803   Durbin-Watson:                   1.704
Prob(Omnibus):                  0.669   Jarque-Bera (JB):                0.896
Skew:                          -0.243   Prob(JB):                        0.639
Kurtosis:                       2.642   Cond. No.                         19.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

INTERPRETATION:

ABOUT BETA0:

BETA0=0.0164. SINCE BETA0 IS POSITIVE WE CAN SAY THAT, ON AVERAGE, TSLA IS OFFERING RETURNS OVER THE MARKET. HOWEVER, IT IS NOT SIGNIFICANTLY OFFERING RETURNS OVER THE MARKET AT THE 95% CONFIDENCE LEVEL SINCE ITS PVALUE IS GREATER THAN 0.05 (IT IS 0.451).

THEN, ON AVERAGE, TESLA IS OFFERING EXCESS RETURNS IN ABOUT 1.64% ABOVE THE MARKET. HOWEVER, THIS EXCESS RETURN IS NOT STATISTICALLY SIGNIFICANT.

WE CAN ALSO USE ITS 95% CONFIDENCE INTERVAL TO CONCLUDE THAT BETA0 NOT ALWAYS STATS POSITIVE SINCE IT CAN MOVE FROM -0.027 UP TO 0.060.

ABOUT BETA1

BETA1 IS 2.1, SO IT IS RISKIER THAN THE MARKET ON AVERAGE. TO KNOW WHETHER IT IS SIGNIFICANTLY RISKIER THAN THE MARKET WE NEED TO SEE THE 95% CONFIDENCE INTERVAL (CI) AND CHECK WHETHER THIS INTERVAL IS GREATER THAN 1. IF THE MINIMUM OF THIS 95% CI IS GREATER THAN 1, THEN WE CAN SAY THAT THE STOCK IS SIGNIFICANTLY RISKIER THAN THE MARKET.

THEN, THE MINIMUM OF BETA 1 IS 1.2 (BETA1 - 2*STDERR). SINCE THE MINIMUM OF BETA1 IS GREATER THAN 1, WE CAN SAY THAT TESLA IS SIGNIFICANTLY RISKIER THAN THE MARKET.

ABOUT R-SQUARED

THE R-SQUARED OF THE MODEL IS ABOUT 32%. THIS MEANS THAT THE 32% OF THE VARIABILITY OF TESLA RETURNS ARE EXPLAINED BY THE VARIABILITY OF THE MARKET RETURNS.

Write the Python code and run it to do the following:

Based on results you got for all market models, check how many firms satisfy each of the following conditions/criteria:
1. CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market
2. CRITERIA B: Stocks that are SIGNIFICANTLY more risky than the market at the 95% confidence level

Show and explain your results

CRITERIA A. Stocks that are significantly offering positive returns over the market:

These stocks are those with beta0>0 and with its pvalue<0.05 (considering 95% confidence level):

# Examine evidence for alpha (significant positive intercept b0)
# Using a common significance level, e.g., p-value < 0.05
significant_alpha_threshold = 0.05
positive_alpha_stocks = betamatrix[(betamatrix['b0'] > 0) & (betamatrix['pvalueb0'] < significant_alpha_threshold)]
positive_alpha_stocks

           b0        b1  stderrb0  ...     maxb0     minb1     maxb1
NVDA  0.03618  1.652489  0.014778  ...  0.065773  1.099635  2.205344

[1 rows x 12 columns]

In this sample, only Nvidia stock is offering significant monthly returns over the market in about 3.6% greater than the market returns.

CRITERIA B. Stocks that are significantly riskier than the market.

In this case, we cannot use the p-value of beta1 since this pvalue is calculated to see whether beta1 is significantly greater than zero.

We can use the 95% confidence interval. We identify these significantly riskier stocks if its minimum value of the 95% C.I. is greater than 1:

risky_stocks = betamatrix[(betamatrix['minb1'] >= 1)]
risky_stocks

            b0        b1  stderrb0  ...     maxb0     minb1     maxb1
TSLA  0.016396  2.103818  0.021589  ...  0.059628  1.296147  2.911489
NVDA  0.036180  1.652489  0.014778  ...  0.065773  1.099635  2.205344
AMD  -0.000830  1.652776  0.015984  ...  0.031177  1.054807  2.250744
BA   -0.026441  1.629959  0.015556  ...  0.004710  1.047998  2.211920

[4 rows x 12 columns]

We see that TSLA, NVDA, AMD and BA are significantly riskier than the market since about 95% of the time its beta1 will be greater than 1.

Bellow is a code to classify stocks based on their average beta1’s and beta0:

print("\n--- Beta Value Analysis ---\n")


--- Beta Value Analysis ---

# 1. Rank stocks by beta values (b1)
ranked_by_beta = betamatrix.sort_values(by='b1', ascending=False)
print("Stocks ranked by Beta (b1, high to low):\n")

Stocks ranked by Beta (b1, high to low):

ranked_by_beta[['b1']].head(10)

            b1
TSLA  2.103818
AMD   1.652776
NVDA  1.652489
BA    1.629959
ADBE  1.325609
AAPL  1.182445
SPGI  1.172302
JPM   1.133878
AMZN  1.125345
TGT   1.093662

# 2. Categorize stocks based on beta1 values
defensive_stocks = ranked_by_beta[ranked_by_beta['b1'] < 0.75]
market_neutral_stocks = ranked_by_beta[(ranked_by_beta['b1'] >= 0.75) & (ranked_by_beta['b1'] <= 1.25)]
aggressive_stocks = ranked_by_beta[ranked_by_beta['b1'] > 1.25]

print("\nDefensive Stocks (Beta < 0.75):\n")


Defensive Stocks (Beta < 0.75):

defensive_stocks[['b1']]

           b1
MCD  0.729690
KO   0.642710
UNH  0.594790
PFE  0.586247
WMT  0.518398

print("\nMarket-Neutral Stocks (0.75 <= Beta <= 1.25):\n")


Market-Neutral Stocks (0.75 <= Beta <= 1.25):

market_neutral_stocks[['b1']]

            b1
AAPL  1.182445
SPGI  1.172302
JPM   1.133878
AMZN  1.125345
TGT   1.093662
CAT   1.093628
ORCL  1.039361
NKE   1.019652
MSFT  0.882545
XOM   0.880933
COST  0.814994

print("\nAggressive Stocks (Beta > 1.25):\n")


Aggressive Stocks (Beta > 1.25):

aggressive_stocks[['b1']]

            b1
TSLA  2.103818
AMD   1.652776
NVDA  1.652489
BA    1.629959
ADBE  1.325609

# 3. Examine evidence for alpha (significant positive intercept b0)
# Using a common significance level, e.g., p-value < 0.05
significant_alpha_threshold = 0.05
positive_alpha_stocks = betamatrix[(betamatrix['b0'] > 0) & (betamatrix['pvalueb0'] < significant_alpha_threshold)]

if not positive_alpha_stocks.empty:
    print(f"\nStocks with statistically significant positive Alpha (b0 > 0 and p-value < {significant_alpha_threshold}):\n")
    positive_alpha_stocks[['b0', 'pvalueb0']].sort_values(by='b0', ascending=False)
else:
    print("\nNo stocks found with statistically significant positive Alpha at the 0.05 level.")


Stocks with statistically significant positive Alpha (b0 > 0 and p-value < 0.05):

           b0  pvalueb0
NVDA  0.03618  0.017457