Workshop 2 Solution, Financial Modeling and Programming
Author
Alberto Dorantes, Ph.D.
Published
November 10, 2025
Abstract
This is an INDIVIDUAL workshop. In this workshop we review the market regression model to calculate aplha and beta, and also we learn how to run the model many times using a loop. We use the results of alpha and beta to select stocks.
1 Review of the Market Regression model
The simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.
The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied times the market return. In mathematical terms:
E[R_i] = α + β(R_M)
We can express the same equation using B0 as alpha, and B1 as market beta:
E[R_i] = β_0 + β_1(R_M)
We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:
r_{(i,t)} = b_0 + b_1*r_{(M,t)} + ε_t
Where:
ε_t is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a Normal distributed random variable ∼ N(0, σ_ε); the error term ε_t is expected to have mean=0 and a specific standard deviation σ_ε (also called volatility).
r_{(i,t)} is the return of the stock i at time t.
r_{(M,t)} is the market return at time t.
b_0 and b_1 are called regression coefficients.
In the next sections we get real data and run a Market Model for Tesla
1.1 Data collection
We first load the yfinance package and download monthly price data for Tesla and the S&P500 Index:
import numpy as npimport pandas as pdimport yfinance as yfimport matplotlibimport matplotlib.pyplot as plt# Download a dataset with prices for Alfa and the Mexican IPyC:data = yf.download("TSLA, ^GSPC", start="2020-01-01", end="2025-09-30", interval='1mo')
YF.download() has changed argument auto_adjust default to True
[ 0% ]
[*********************100%***********************] 2 of 2 completed
# I create another dataset with the Adjusted Closing price of both instruments:adjprices = data['Close']
1.2 Return calculation
We calculate continuously returns for both, Alfa and the IPCyC. We use the diff function to get the monthly difference of the log of prices, which is the % change of the price (in continuous compounding)
# I change the name of the columns to avoid special characters like ^MXXreturns.columns=['TSLA','SP500']returns.columns
Index(['TSLA', 'SP500'], dtype='object')
1.3 Visualize the relationship
We do a scatter plot putting the SP500 returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add a line that better represents the relationship between the stock returns and the market returns.Type:
import seaborn as sb#plt.clf()x = returns['SP500']y = returns['TSLA']# I plot the (x,y) values along with the regression line that fits the data:sb.regplot(x=x,y=y)plt.xlabel('SP500 returns')plt.ylabel('TSLA returns') plt.show()
Sometimes graphs can be deceiving. In this case, the range of X axis and Y axis are different, so it is better to do a graph where we can make both X and Y ranges with equal distance. Type:
plt.clf()sb.regplot(x=x,y=y)# I adjust the scale of the X axis so that the magnitude of each unit of X is similar to that of the Y axis plt.xticks(np.arange(-1,1,0.20))
([<matplotlib.axis.XTick object at 0x0000011EAC833250>, <matplotlib.axis.XTick object at 0x0000011EAC832AD0>, <matplotlib.axis.XTick object at 0x0000011EAC8016D0>, <matplotlib.axis.XTick object at 0x0000011EAC87E850>, <matplotlib.axis.XTick object at 0x0000011EAC87EFD0>, <matplotlib.axis.XTick object at 0x0000011EAC87F750>, <matplotlib.axis.XTick object at 0x0000011EAC87FED0>, <matplotlib.axis.XTick object at 0x0000011EAC8A0690>, <matplotlib.axis.XTick object at 0x0000011EAC8A0E10>, <matplotlib.axis.XTick object at 0x0000011EAC8A1590>], [Text(-1.0, 0, '−1.0'), Text(-0.8, 0, '−0.8'), Text(-0.6000000000000001, 0, '−0.6'), Text(-0.40000000000000013, 0, '−0.4'), Text(-0.20000000000000018, 0, '−0.2'), Text(-2.220446049250313e-16, 0, '0.0'), Text(0.19999999999999973, 0, '0.2'), Text(0.3999999999999997, 0, '0.4'), Text(0.5999999999999996, 0, '0.6'), Text(0.7999999999999996, 0, '0.8')])
# I label the axis:plt.xlabel('SP500 returns')plt.ylabel('TSLA returns') plt.show()
We can see that the slope of the line that represents the points is much more steep compared to the previous plot. This means that the returns have a strong positive relationship with the market returns, but we see that the SENSIBILITY of the stock return is high since the slope of the line is very steep compared with the previous plot.
1.4 Running the Regression Model
The OLS function from the statsmodel package is used to estimate a regression model. We run a simple regression model to see how the monthly returns of the stock are related with the market return.
The first parameter of the OLS function is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).
Before we run the OLS function, we need to add a column of 1’s to the X vector in order to estimate the beta0 coefficient (the constant).
What you will get is called The Single-Index Model. You are trying to examine how the market returns can explain stock returns.
Run the Single-Index model (Y=stock return, the X=market return). You can use the function OLS from the statsmodels.api library:
import statsmodels.formula.api as smf# I estimate the OLS regression model:mkmodel = smf.ols('TSLA ~ SP500',data=returns).fit()# The Dependent variable Y is the first one in the formula, and the second is the IV: Y ~ X# I display the summary of the regression: print(mkmodel.summary())
The regression output shows a lot of information about the relationship between the Market return (X) and the stock return (Y).
Looking at the table of beta coefficients, the first row (Intercept) shows the information of the beta0 coefficient, which is the intercept of the regression equation, also known as constant.
The second row (SP500) shows the information of the beta1 coefficient, which represents the slope of the regression line. In this example, since the X variable is the market return and the Y variable is the stock return, beta1 can be interpreted as the sensitivity or market risk of the stock.
For each beta coefficient, the following is calculated and shown:
coef : this is the average value of the beta coefficient
std err : this is the standard error of the coefficient, which is the standard deviation of the beta coefficient.
t : this is the t-Statistics of the following Hypothesis test:
H0: beta = 0;
Ha: beta <> 0;
P>|t| : is the p-value of the above hypothesis test; if it is a value < 0.05, we can say that the beta coefficient is SIGNIFICANTLY different than ZERO with a 95% confidence.
[0.025 0.975] : This is the 95% Confidence Interval of the beta coefficient. This shows the possible values that the beta coefficient can take in the future with 95% probability.
How the t-Statistic, p-value and the 95% C.I. are related?
Below is a way to extract the important values of the regression output.
After running/fitting the regression model, Python generated the object mkmodel that stores detailed information of the model. This objet has attributes and functions that we can use to extract important outputs for our analysis.
Important attributes of this mkmodel object are:
params: It is a vector that contains the beta coefficients
bse: It is a vector that contains the standard errors of the beta coefficients
pvalues: It is a vector with the p-values for each coefficient
conf_int(): it is a vector of vectors, which contains the min and max of the 95% confidence interval for each coefficient
We can get these values as follows:
# I get the beta coefficients:b0=mkmodel.params[0]b1=mkmodel.params[1]# I get the standard errors: seb0 = mkmodel.bse[0]seb1 = mkmodel.bse[1]# I calculate the t-values of each beta by dividing the coefficient by its standard errortb0 = b0 / seb0 tb1 = b1 / seb1 # I get the p-values of each coefficient: pvalueb0 = mkmodel.pvalues[0]pvalueb1 = mkmodel.pvalues[1]# I can calculate the confidence level of beta0 according to its pvalueconfb0 =100* (1-pvalueb0)# I can calculate the confidence level of beta1 according to its pvalueconfb1 =100* (1-pvalueb1)# I can construct the 95% Confidence Interval for each coefficient: # I get the min and max of beta0: minb0 = mkmodel.conf_int()[0][0]maxb0 = mkmodel.conf_int()[1][0]# I get the min and max of beta1: minb1 = mkmodel.conf_int()[0][1]maxb1 = mkmodel.conf_int()[1][1]
THE REGRESSION EQUATION.
ACCORDING TO THE REGRESSION OUTPUT, THE REGRESSION EQUATION THAT EXPLAINS THE EXPECTED RETURN OF TESLA BASED ON THE SP500 RETURN IS:
E[TSLAret]= beta0 + beta1(SP500ret)
E[TSLAAret]= 0.0113 + 2.1348(SP500ret)
2 Automating the estimation of many market models
2.1 Review of how a Loop works and how to store values in a Matrix
An analyst made an analysis and tells you that financial leverage has a negative quadratic effect on the expected financial performance. This is a typical inverted U-shaped relationship. In other words, financial leverage has a positive effect on performance up to a certain level of leverage; after this level the relationship becomes negative: the more financial leverage, the less the firm performance. The equation is the following:
Performance=20-(leverage-4)^{2}
Leverage is measure from level 0 (no leverage) up to the maximum leverage =10. Calculate the expected firm performance for leverage levels 0 to 10 and store the results in a matrix:
import numpy as npimport pandas as pdimport matplotlib.pyplot as plt# Initialize empty list (instead of matrix)matrix_results = []# Loop from 0 to 10 (inclusive) by 1for i inrange(0, 11):# Calculate performance performance =20- (i -4)**2# Append the result as a row (list) matrix_results.append([i, performance])# Convert to DataFramematrix_results = pd.DataFrame(matrix_results, columns=["leverage", "performance"])# Display the DataFrameprint(matrix_results)
# Plotplt.plot(matrix_results["leverage"], matrix_results["performance"])plt.xlabel("Leverage")plt.ylabel("Performance")plt.title("Performance vs Leverage")plt.show()
3 CHALLENGE 1
Write a detailed Python Pseudo-Code, and its Python code to automatically run the Market Regression Model for a list of 20 US active stocks (You can check for active stocks in Yahoo Finance or in the usfirms dataset we used in Workshop 1). Think how you can implement it with a loop.
For each stock you have to come up with a way to store the the beta coefficients along with their standard errors, tvalues, and 95% confidence interval.
4 CHALLENGE 1 SOLUTION
I first write the STEPS of the algorithm:
General approach: For each stock I need to run a market regression model, and then extract the beta0 and beta1 coefficients along with their corresponding standard errors and pvalues.
Define a tickerlist as a vector of 20 tickers; add the ticker “^GSPC” as the first element of the vector
Define tickersfound as an empty vector
Start with an empty dataset called prices
Write a loop to bring close prices for each stock:
for each ticker in tickerlist:
Using yf.download, download close prices of the ticker and name it as price
If the ticker exist in Yahoo Finance and if the ticker is not ^GSPC:
Add ticker to the tickersfound vector/list
If it is the first ticker, assign prices = price
else
Merge price with prices matching by the date index
Else # if the ticker does not exist:
Print a message: The ticker ___ does not exist
At this point I will have prices as a time-series dataset with all close prices that existed in Yahoo Finance.
The tickersfound will be a vector with the tickers that were found in Yahoo Finance
Using prices, create returns as the log returns of prices
Initialize betamatrix as an empty matrix with 8 columns (where the beta data will be stored for each stock). Name the columns as b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1
Then, write a for loop that iterate for each ticker of tickersfound, and with each iteration i do the following:
Using the prices dataframe, run the a regression model where the Y variable is the ‘^GSPC’ column, and the X is the ticker column
From the regression summary extract extract: intercept, beta coefficient, standard error of intercept, standard error of beta, tvalue of intercept, tvalue of beta, pvalue of intercept, pvalue of beta.
Save these values in a temporal vector
Append this vector into the betamatrix
Finally, I identify whether any of these stocks are offering returns over the market by selecting the stocks with beta0 that are positive and significant.
Write a loop to bring close prices for each stock:
for each ticker in tickerlist:
Using yf.download, download close prices of the ticker and name it as price
If the ticker exist in Yahoo Finance and if the ticker is not ^GSPC:
Add ticker to the tickersfound vector/list
If it is the first ticker, assign prices = price
else
Merge price with prices matching by the date index
Else # if the ticker does not exist:
Print a message: The ticker ___ does not exist
import yfinance as yfimport pandas as pd# Initialize an empty DataFrame to store all pricesprices = pd.DataFrame()tickersfound = [] # To keep track of successfully downloaded tickersmarketticker ="^GSPC"start_date ="2020-01-01"end_date ="2024-12-31"for ticker in tickerlist:try:# Download monthly data, ensuring auto_adjust for accurate close prices# yf.download returns a DataFrame, we need the 'Close' column which is a Series price_series = yf.download(ticker, start=start_date, end=end_date, interval='1mo', auto_adjust=True)['Close']ifnot price_series.empty:if ticker!=marketticker: tickersfound.append(ticker)# Rename the Series to the ticker symbol for easy column identificationif prices.empty:# If 'prices' is empty, this is the first ticker, so initialize 'prices' DataFrame price_series.name=ticker prices = price_series else:# Otherwise, merge the current ticker's prices with the existing 'prices' DataFrame# Use outer merge to keep all dates and fill missing values with NaN prices = pd.merge(prices, price_series, left_index=True, right_index=True, how='outer')else:print(f"The ticker {ticker} does not exist or has no 'Close' price data in the specified range.")exceptExceptionas e:print(f"An error occurred while downloading data for {ticker}: {e}")
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
# Display the first few rows of the final 'prices' DataFrameifnot prices.empty:print("\nSuccessfully downloaded and merged monthly close prices into 'prices' DataFrame. Here's a preview:") prices.head()else:print("No stock data was successfully downloaded or merged.")
Successfully downloaded and merged monthly close prices into 'prices' DataFrame. Here's a preview:
Ticker ^GSPC MSFT ... KO MCD
Date ...
2020-01-01 3225.520020 161.630188 ... 48.923050 187.353989
2020-02-01 2954.219971 153.825455 ... 44.809826 170.016983
2020-03-01 2584.590088 150.151718 ... 37.069267 145.687973
2020-04-01 2912.429932 170.621307 ... 38.780270 165.256943
2020-05-01 3044.310059 174.467697 ... 39.447884 164.164383
[5 rows x 22 columns]
Initialize betamatrix as an empty matrix with 8 columns (where the beta data will be stored for each stock). Name the columns as b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1
Then, write a for loop that iterate for each ticker of tickersfound, and with each iteration i do the following:
Using the prices dataframe, run the a regression model where the Y variable is the ‘^GSPC’ column, and the X is the ticker column
From the regression summary extract extract: intercept, beta coefficient, standard error of intercept, standard error of beta, tvalue of intercept, tvalue of beta, pvalue of intercept, pvalue of beta.
Save these values in a temporal vector
Append this vector into the betamatrix
import statsmodels.api as smimport pandas as pd# Create an empty Pandas DataFrame for betamatrix with specified columnsbetamatrix = pd.DataFrame(columns=['b0', 'b1', 'stderrb0', 'stderrb1', 'tvalueb0', 'tvalueb1', 'pvalueb0', 'pvalueb1', 'minb0','maxb0','minb1','maxb1'])print("Libraries 'statsmodels.api' and 'numpy' imported. 'betamatrix' DataFrame initialized.")
Libraries 'statsmodels.api' and 'numpy' imported. 'betamatrix' DataFrame initialized.
for ticker in tickersfound:# Create a temporary DataFrame with the market and current ticker prices df_reg = returns[[marketticker, ticker]].dropna()# Define the dependent variable (Y) as the individual stock returns Y = df_reg[ticker]# Define the independent variable (X) as the market returns X = df_reg[marketticker]# Add a constant to the independent variable for the intercept X = sm.add_constant(X)# Create and fit the OLS model model = sm.OLS(Y, X).fit()# Extract parameters using .iloc to avoid FutureWarning b0 = model.params.iloc[0] b1 = model.params.iloc[1]# Extract standard errors using .iloc to avoid FutureWarning stderrb0 = model.bse.iloc[0] stderrb1 = model.bse.iloc[1]# Extract t-values using .iloc to avoid FutureWarning tvalueb0 = model.tvalues.iloc[0] tvalueb1 = model.tvalues.iloc[1]# Extract p-values using .iloc to avoid FutureWarning pvalueb0 = model.pvalues.iloc[0] pvalueb1 = model.pvalues.iloc[1]# I construct the 95% Confidence Interval for each coefficient: # I get the min and max of beta0: minb0 = model.conf_int().iloc[0][0] maxb0 = model.conf_int().iloc[0][1]# I get the min and max of beta1: minb1 = model.conf_int().iloc[1][0] maxb1 = model.conf_int().iloc[1][1]# Append the results to the betamatrix DataFrame betamatrix.loc[ticker] = [b0, b1, stderrb0, stderrb1, tvalueb0, tvalueb1, pvalueb0, pvalueb1, minb0, maxb0, minb1, maxb1]# Display the betamatrixprint("Beta matrix calculated:")
Interpret with your words ONE of the 20 regression models
I selected Tesla:
ticker='TSLA'df_reg = returns[[marketticker, ticker]].dropna()# Define the dependent variable (Y) as the individual stock returnsY = df_reg[ticker]# Define the independent variable (X) as the market returnsX = df_reg[marketticker]# Add a constant to the independent variable for the interceptX = sm.add_constant(X)# Create and fit the OLS modelmodel = sm.OLS(Y, X).fit()print(model.summary())
BETA0=0.0164. SINCE BETA0 IS POSITIVE WE CAN SAY THAT, ON AVERAGE, TSLA IS OFFERING RETURNS OVER THE MARKET. HOWEVER, IT IS NOT SIGNIFICANTLY OFFERING RETURNS OVER THE MARKET AT THE 95% CONFIDENCE LEVEL SINCE ITS PVALUE IS GREATER THAN 0.05 (IT IS 0.451).
THEN, ON AVERAGE, TESLA IS OFFERING EXCESS RETURNS IN ABOUT 1.64% ABOVE THE MARKET. HOWEVER, THIS EXCESS RETURN IS NOT STATISTICALLY SIGNIFICANT.
WE CAN ALSO USE ITS 95% CONFIDENCE INTERVAL TO CONCLUDE THAT BETA0 NOT ALWAYS STATS POSITIVE SINCE IT CAN MOVE FROM -0.027 UP TO 0.060.
ABOUT BETA1
BETA1 IS 2.1, SO IT IS RISKIER THAN THE MARKET ON AVERAGE. TO KNOW WHETHER IT IS SIGNIFICANTLY RISKIER THAN THE MARKET WE NEED TO SEE THE 95% CONFIDENCE INTERVAL (CI) AND CHECK WHETHER THIS INTERVAL IS GREATER THAN 1. IF THE MINIMUM OF THIS 95% CI IS GREATER THAN 1, THEN WE CAN SAY THAT THE STOCK IS SIGNIFICANTLY RISKIER THAN THE MARKET.
THEN, THE MINIMUM OF BETA 1 IS 1.2 (BETA1 - 2*STDERR). SINCE THE MINIMUM OF BETA1 IS GREATER THAN 1, WE CAN SAY THAT TESLA IS SIGNIFICANTLY RISKIER THAN THE MARKET.
ABOUT R-SQUARED
THE R-SQUARED OF THE MODEL IS ABOUT 32%. THIS MEANS THAT THE 32% OF THE VARIABILITY OF TESLA RETURNS ARE EXPLAINED BY THE VARIABILITY OF THE MARKET RETURNS.
Write the Python code and run it to do the following:
Based on results you got for all market models, check how many firms satisfy each of the following conditions/criteria:
CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market
CRITERIA B: Stocks that are SIGNIFICANTLY more risky than the market at the 95% confidence level
Show and explain your results
CRITERIA A. Stocks that are significantly offering positive returns over the market:
These stocks are those with beta0>0 and with its pvalue<0.05 (considering 95% confidence level):
# Examine evidence for alpha (significant positive intercept b0)# Using a common significance level, e.g., p-value < 0.05significant_alpha_threshold =0.05positive_alpha_stocks = betamatrix[(betamatrix['b0'] >0) & (betamatrix['pvalueb0'] < significant_alpha_threshold)]positive_alpha_stocks
We see that TSLA, NVDA, AMD and BA are significantly riskier than the market since about 95% of the time its beta1 will be greater than 1.
Bellow is a code to classify stocks based on their average beta1’s and beta0:
print("\n--- Beta Value Analysis ---\n")
--- Beta Value Analysis ---
# 1. Rank stocks by beta values (b1)ranked_by_beta = betamatrix.sort_values(by='b1', ascending=False)print("Stocks ranked by Beta (b1, high to low):\n")