Workshop 2, Financial Modeling and Programming

Author

Alberto Dorantes, Ph.D.

Published

November 10, 2025

Abstract

This is an INDIVIDUAL workshop. In this workshop we review the market regression model to calculate alpha and beta, and we learn how to run the model many times using a loop. We use the estimated alpha and beta to select stocks.

0.1 General Directions for each workshop

You have to work on Google Colab for all your workshops. In Google Colab, you MUST LOGIN with your @tec.mx account and then create a google colab document for each workshop.

You must share each Colab document (workshop) with the following account:

cdorante@tec.mx

You must give Edit privileges to this account.

In Google Colab you can work with Python or R notebooks. The default is Python notebooks.

Your Notebook will have a default name like “Untitled2.ipynb”. Click on this name and change it to “W1-Programming-YourFirstName-YourLastname”.

In your Workshop Notebook you have to:

You have to do all the challenges to get full credit for the workshop. The accuracy of the challenge will not significantly affect your grade; completion will have more weight for your workshop grade.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your personal notebook to study for the FINAL EXAM. Your own workshop/notebook will be very helpful for your further study.

Once you finish your workshop, make sure that you RUN ALL CHUNKS. You can run each code chunk by clicking on the “Run” button located in the top-left section of each chunk. You can also run all the chunks in one-shot with Ctrl-F9. You have to submit to Canvas the web link of your Google Colab workshop.

1 Review of the Market Regression model

The simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. In mathematical terms:

E[R_i] = α + β(R_M)

We can express the same equation using B0 as alpha, and B1 as market beta:

E[R_i] = β_0 + β_1(R_M)

We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

r_{(i,t)} = b_0 + b_1*r_{(M,t)} + ε_t

Where:

ε_t is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a normally distributed random variable, ε_t ∼ N(0, σ_ε); that is, the error term is expected to have mean = 0 and a specific standard deviation σ_ε (also called volatility).

r_{(i,t)} is the return of the stock i at time t.

r_{(M,t)} is the market return at time t.

b_0 and b_1 are called regression coefficients.
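As a minimal sketch of what this regression estimates, the snippet below simulates returns from the equation above and then recovers the coefficients with the closed-form formulas of a simple regression: b_1 = Cov(r_i, r_M) / Var(r_M) and b_0 = mean(r_i) - b_1 * mean(r_M). The data are synthetic, and the "true" coefficients (0.01 and 1.5), the means and the volatilities are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic data: every number below is an assumption chosen for illustration
rng = np.random.default_rng(seed=123)
n_months = 600
true_b0, true_b1 = 0.01, 1.5

market_ret = rng.normal(loc=0.008, scale=0.04, size=n_months)  # r_(M,t)
errors = rng.normal(loc=0.0, scale=0.05, size=n_months)        # epsilon_t
stock_ret = true_b0 + true_b1 * market_ret + errors            # r_(i,t)

# Closed-form OLS estimates for a regression with one independent variable
b1_hat = np.cov(stock_ret, market_ret, ddof=1)[0, 1] / np.var(market_ret, ddof=1)
b0_hat = stock_ret.mean() - b1_hat * market_ret.mean()

print(f"Estimated b0 (alpha): {b0_hat:.4f}")
print(f"Estimated b1 (beta):  {b1_hat:.4f}")
```

The estimates should be close to, but not exactly equal to, the assumed coefficients, because the error term adds noise to every observation.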

In the next sections we get real data and run a Market Model for Tesla.

1.1 Data collection

We first load the yfinance package and download monthly price data for Tesla and the S&P500 Index:

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib
import matplotlib.pyplot as plt

# Download a dataset with monthly prices for Tesla and the S&P 500 index:
data = yf.download("TSLA, ^GSPC", start="2020-01-01", end="2025-09-30", interval='1mo')

YF.download() has changed argument auto_adjust default to True

[*********************100%***********************]  2 of 2 completed


# I create another dataset with the Adjusted Closing price of both instruments:
adjprices = data['Close']

1.2 Return calculation

We calculate continuously compounded returns for both Tesla and the S&P 500. We use the diff function to get the monthly difference of the log of prices, which is the % change of the price (in continuous compounding):

returns = np.log(adjprices).diff(1).dropna()
returns.columns

Index(['TSLA', '^GSPC'], dtype='object', name='Ticker')

# I change the name of the columns to avoid special characters like the ^ in ^GSPC
returns.columns=['TSLA','SP500']
returns.columns

Index(['TSLA', 'SP500'], dtype='object')
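To make the return calculation concrete, here is a small worked sketch with made-up prices (the column name STOCK and the three prices are hypothetical). It shows that the difference of log prices equals ln(P_t / P_(t-1)), and that the continuously compounded return r_t and the simple return R_t are linked by r_t = ln(1 + R_t).

```python
import numpy as np
import pandas as pd

# Hypothetical month-end prices (made-up numbers, not real quotes)
prices = pd.DataFrame({"STOCK": [100.0, 110.0, 99.0]})

# Continuously compounded return: r_t = ln(P_t) - ln(P_(t-1)) = ln(P_t / P_(t-1))
log_ret = np.log(prices).diff(1).dropna()

# Simple return: R_t = P_t / P_(t-1) - 1
simple_ret = prices.pct_change().dropna()

# Both measures are linked by r_t = ln(1 + R_t)
print(log_ret["STOCK"].round(4).tolist())
print(simple_ret["STOCK"].round(4).tolist())
```

A 10% simple gain corresponds to a log return of about 9.53%, and a 10% simple loss to about -10.54%. For small monthly changes the two measures are very close.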

1.3 Visualize the relationship

We do a scatter plot putting the SP500 returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add the line that best represents the relationship between the stock returns and the market returns. Type:

import seaborn as sb
#plt.clf()
x = returns['SP500']
y = returns['TSLA']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('SP500 returns')
plt.ylabel('TSLA returns') 
plt.show()

Sometimes graphs can be deceiving. In this case, the ranges of the X axis and the Y axis are different, so it is better to do a graph where one unit of X and one unit of Y cover a similar distance. Type:

plt.clf()

sb.regplot(x=x,y=y)
# I adjust the scale of the X axis so that the magnitude of each unit of X is similar to that of the Y axis 
plt.xticks(np.arange(-1,1,0.20))


# I label the axis:
plt.xlabel('SP500 returns')

plt.ylabel('TSLA returns') 
plt.show()

We can see that the slope of the line that represents the points is much steeper than in the previous plot. The stock returns have a positive relationship with the market returns, and the SENSITIVITY of the stock return to the market return is high, since the slope of the line is very steep once both axes use a similar scale.

1.4 Running the Regression Model

The OLS function from the statsmodels package is used to estimate a regression model. We run a simple regression model to see how the monthly returns of the stock are related to the market return.

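In the OLS function of statsmodels.api, the first parameter is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).

With that function, before we run the model we need to add a column of 1’s to the X vector in order to estimate the beta0 coefficient (the constant). The formula interface (the ols function of statsmodels.formula.api, which is the one used in the code of this section) adds this constant automatically.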
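To see what the column of 1’s does, here is a minimal sketch that solves the least-squares problem directly with NumPy (np.linalg.lstsq is used here in place of statsmodels, and the six return pairs are made-up numbers). With the column of 1’s the model estimates both b0 and b1; without it, the line is forced through the origin.

```python
import numpy as np

# Made-up market (x) and stock (y) returns, for illustration only
x = np.array([0.02, -0.01, 0.03, 0.00, -0.02, 0.04])
y = np.array([0.05, -0.03, 0.06, 0.01, -0.05, 0.09])

# Design matrix: a column of 1's (for b0) next to the market return (for b1)
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of y = X @ [b0, b1]
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coefs

# Without the column of 1's the line is forced through the origin (b0 = 0)
b1_no_const = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

print(f"With constant:    b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"Without constant: b1 = {b1_no_const:.4f}")
```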

What you will get is called The Single-Index Model. You are trying to examine how the market returns can explain stock returns.

Run the Single-Index model (Y = stock return, X = market return). You can use the ols function from the statsmodels.formula.api library:

import statsmodels.formula.api as smf

# I estimate the OLS regression model:
mkmodel = smf.ols('TSLA ~ SP500',data=returns).fit()
# The Dependent variable Y is the first one in the formula, and the second is the IV: Y ~ X

# I display the summary of the regression: 
print(mkmodel.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   TSLA   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.306
Method:                 Least Squares   F-statistic:                     30.53
Date:              lun., 10 nov. 2025   Prob (F-statistic):           6.01e-07
Time:                        14:30:21   Log-Likelihood:                 28.718
No. Observations:                  68   AIC:                            -53.44
Df Residuals:                      66   BIC:                            -49.00
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0113      0.020      0.568      0.572      -0.029       0.051
SP500          2.1348      0.386      5.525      0.000       1.363       2.906
==============================================================================
Omnibus:                        1.000   Durbin-Watson:                   1.708
Prob(Omnibus):                  0.607   Jarque-Bera (JB):                1.085
Skew:                          -0.246   Prob(JB):                        0.581
Kurtosis:                       2.623   Cond. No.                         19.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The regression output shows a lot of information about the relationship between the Market return (X) and the stock return (Y).

Looking at the table of beta coefficients, the first row (Intercept) shows the information of the beta0 coefficient, which is the intercept of the regression equation, also known as constant.

The second row (SP500) shows the information of the beta1 coefficient, which represents the slope of the regression line. In this example, since the X variable is the market return and the Y variable is the stock return, beta1 can be interpreted as the sensitivity or market risk of the stock.

For each beta coefficient, the following is calculated and shown:

coef : this is the estimated value of the beta coefficient
std err : this is the standard error of the coefficient, which is the standard deviation of the estimated beta coefficient.
t : this is the t-Statistic of the following Hypothesis test:

H0: beta = 0;

Ha: beta ≠ 0;
P>|t| : this is the p-value of the above hypothesis test; if it is < 0.05, we can say that the beta coefficient is SIGNIFICANTLY different from ZERO at the 95% confidence level.
[0.025 0.975] : this is the 95% Confidence Interval of the beta coefficient. It shows the range of values where the true beta coefficient is expected to lie with 95% confidence.

How are the t-Statistic, the p-value and the 95% C.I. related?
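One way to explore this question is sketched below with the rounded numbers of the regression output. The t-Statistic is the coefficient divided by its standard error, and the 95% C.I. is approximately the coefficient plus and minus 2 standard errors. The value 2 is a rule-of-thumb approximation of the critical t-value (an assumption here; the exact value depends on the degrees of freedom). When |t| is greater than that critical value, the p-value is below 0.05 and zero falls outside the 95% C.I., so the three measures lead to the same conclusion.

```python
# Rounded values copied from the regression output: (coef, std err)
coefs = {"Intercept": (0.0113, 0.020), "SP500": (2.1348, 0.386)}

# Rule-of-thumb critical value for a 95% two-sided test (assumption: about 2
# for a sample of this size; the exact value depends on the degrees of freedom)
T_CRIT = 2.0

results = {}
for name, (coef, se) in coefs.items():
    t_stat = coef / se
    ci_low, ci_high = coef - T_CRIT * se, coef + T_CRIT * se
    significant = abs(t_stat) > T_CRIT          # same conclusion as p-value < 0.05
    zero_outside_ci = ci_low > 0 or ci_high < 0  # same conclusion again
    results[name] = (t_stat, ci_low, ci_high, significant, zero_outside_ci)
    print(f"{name}: t = {t_stat:.2f}, CI = [{ci_low:.3f}, {ci_high:.3f}], "
          f"significant = {significant}, zero outside CI = {zero_outside_ci}")
```

The small differences with respect to the table come from rounding and from using 2 instead of the exact critical value.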

Below is a way to extract the important values of the regression output.

After running/fitting the regression model, Python generated the object mkmodel that stores detailed information of the model. This object has attributes and functions that we can use to extract important outputs for our analysis.

Important attributes of this mkmodel object are:

params: It is a vector that contains the beta coefficients
bse: It is a vector that contains the standard errors of the beta coefficients
pvalues: It is a vector with the p-values for each coefficient
conf_int(): it is a function that returns a table with one row per coefficient, which contains the min and max of the 95% confidence interval for each coefficient

We can get these values as follows:

# I get the beta coefficients (.iloc selects elements by position):
b0 = mkmodel.params.iloc[0]
b1 = mkmodel.params.iloc[1]
# I get the standard errors: 
seb0 = mkmodel.bse.iloc[0]
seb1 = mkmodel.bse.iloc[1]
# I calculate the t-values of each beta by dividing the coefficient by its standard error
tb0 = b0 / seb0 
tb1 = b1 / seb1 
# I get the p-values of each coefficient: 
pvalueb0 = mkmodel.pvalues.iloc[0]
pvalueb1 = mkmodel.pvalues.iloc[1]

# I can calculate the confidence level of beta0 according to its pvalue
confb0 = 100 * (1-pvalueb0)
# I can calculate the confidence level of beta1 according to its pvalue
confb1 = 100 * (1-pvalueb1)

# I can construct the 95% Confidence Interval for each coefficient: 
# conf_int() returns one row per coefficient; column 0 is the min and column 1 is the max
ci = mkmodel.conf_int()
# I get the min and max of beta0: 
minb0 = ci.iloc[0, 0]
maxb0 = ci.iloc[0, 1]
# I get the min and max of beta1:  
minb1 = ci.iloc[1, 0]
maxb1 = ci.iloc[1, 1]

THE REGRESSION EQUATION.

ACCORDING TO THE REGRESSION OUTPUT, THE REGRESSION EQUATION THAT EXPLAINS THE EXPECTED RETURN OF TESLA BASED ON THE SP500 RETURN IS:

E[TSLAret]= beta0 + beta1(SP500ret)

E[TSLAret]= 0.0113 + 2.1348(SP500ret)
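As a short sketch of how this equation can be used, the snippet below plugs a few market returns into it. The function name expected_tsla_return and the three market scenarios are hypothetical; the coefficients are the rounded estimates from the regression output.

```python
# Coefficients taken from the regression output (rounded)
b0, b1 = 0.0113, 2.1348

def expected_tsla_return(sp500_ret):
    """Expected monthly TSLA log return for a given monthly S&P 500 log return."""
    return b0 + b1 * sp500_ret

# Hypothetical market scenarios (assumed values, for illustration only)
for market_ret in (-0.05, 0.0, 0.05):
    print(f"SP500 = {market_ret:+.2%} -> E[TSLA] = {expected_tsla_return(market_ret):+.2%}")
```

When the market return is zero, the expected stock return equals beta0. Each additional 1% of market return changes the expected stock return by about 2.13%, which is beta1.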

2 Automating the estimation of many market models

2.1 Review of how a Loop works and how to store values in a Matrix

An analyst made an analysis and tells you that financial leverage has a negative quadratic effect on the expected financial performance. This is a typical inverted U-shaped relationship. In other words, financial leverage has a positive effect on performance up to a certain level of leverage; after this level the relationship becomes negative: the more financial leverage, the less the firm performance. The equation is the following:

Performance = 20 - (leverage - 4)^{2}

Leverage is measured from level 0 (no leverage) up to the maximum leverage = 10. Calculate the expected firm performance for leverage levels 0 to 10 and store the results in a matrix:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Initialize empty list (instead of matrix)
matrix_results = []

# Loop from 0 to 10 (inclusive) by 1
for i in range(0, 11):
    # Calculate performance
    performance = 20 - (i - 4)**2
    # Append the result as a row (list)
    matrix_results.append([i, performance])

# Convert to DataFrame
matrix_results = pd.DataFrame(matrix_results, columns=["leverage", "performance"])

# Display the DataFrame
print(matrix_results)

    leverage  performance
0          0            4
1          1           11
2          2           16
3          3           19
4          4           20
5          5           19
6          6           16
7          7           11
8          8            4
9          9           -5
10        10          -16

# Plot
plt.plot(matrix_results["leverage"], matrix_results["performance"])
plt.xlabel("Leverage")
plt.ylabel("Performance")
plt.title("Performance vs Leverage")
plt.show()
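The same loop-and-store pattern also works when each iteration produces several results. Below is a minimal sketch, assuming made-up return series for three hypothetical tickers (AAA, BBB and CCC): each pass of the loop stores a dictionary with several statistics, and the list of dictionaries is then converted into a DataFrame with one row per ticker.

```python
import numpy as np
import pandas as pd

# Made-up monthly returns for three hypothetical tickers (not real data)
rng = np.random.default_rng(seed=42)
fake_returns = pd.DataFrame(
    rng.normal(loc=0.01, scale=0.05, size=(24, 3)),
    columns=["AAA", "BBB", "CCC"],
)

rows = []  # one dictionary per ticker
for ticker in fake_returns.columns:
    series = fake_returns[ticker]
    rows.append({
        "ticker": ticker,
        "mean_ret": series.mean(),
        "volatility": series.std(),
        "n_obs": series.count(),
    })

# One row per ticker, one column per stored statistic
summary = pd.DataFrame(rows).set_index("ticker")
print(summary)
```

Using a dictionary for each row keeps every value attached to its column name, so the order of the columns does not have to be remembered when the DataFrame is built.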

3 CHALLENGE 1

Write a detailed Python Pseudo-Code, and its Python code to automatically run the Market Regression Model for a list of 20 US active stocks (You can check for active stocks in Yahoo Finance or in the usfirms dataset we used in Workshop 1). Think how you can implement it with a loop.

For each stock you have to come up with a way to store the beta coefficients along with their standard errors, t-values, and 95% confidence intervals.

4 CHALLENGE 2

Interpret in your own words ONE of the 20 regression models
Write the Python code and run it to do the following:

Based on results you got for all market models, check how many firms satisfy each of the following conditions/criteria:
1. CRITERIA A: Stocks that are SIGNIFICANTLY offering returns over the market
2. CRITERIA B: Stocks that are SIGNIFICANTLY more risky than the market at the 95% confidence level

(You might find that few or none of the stocks satisfy these criteria)

Show and explain your results

5 W2 submission

The grade of this Workshop will be the following:

Complete (100%): If you submit an ORIGINAL and COMPLETE work with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL work with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work are copy-pasted from other workshops.
Not submitted (0%)

Remember that you have to submit your Colab link file through Canvas BEFORE THE FIRST CLASS OF NEXT WEEK.