Workshop 3, Econometric Models

Author

Alberto Dorantes

Published

February 24, 2026

Abstract

This is an INDIVIDUAL workshop. In this workshop we learn more about linear regression and the Capital Asset Pricing Model. The CAPM can be estimated by running a regression model with premium returns.

1 Estimating the CAPM model for a stock

2 The CAPM model

The Capital Asset Pricing Model states that the expected return of a stock is given by the risk-free rate plus its beta coefficient multiplied by the expected market premium return. In mathematical terms:

$$E[R_i] = R_f + \beta_1 \left( E[R_M] - R_f \right)$$

We can express the same equation as:

$$E[R_i] - R_f = \beta_1 \left( E[R_M] - R_f \right)$$

In other words, the expected premium return of a stock is equal to the expected market premium return multiplied by the stock's market beta coefficient. You can estimate the beta coefficient of the CAPM with a regression model, using continuously compounded returns instead of simple returns. Although the CAPM equation has no intercept, you must include the intercept $\beta_0$ in the regression equation:

$$(r_i - r_f) = \beta_0 + \beta_1 (r_M - r_f) + \varepsilon$$

where $\varepsilon \sim N(0, \sigma_\varepsilon)$. The error is a random shock with an expected value of 0 and a specific standard deviation (volatility) $\sigma_\varepsilon$. It represents the combined effect of all the factors that influence the stock's returns but cannot be explained by the model (that is, by the market).

In the market model, the dependent variable was the stock return and the independent variable was the market return. Unlike the market model, here the dependent variable is the stock return minus the risk-free rate (the stock premium return), and the independent variable is the market premium return, which is equal to the market return minus the risk-free rate. Let’s run this model with a couple of stocks.
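This is a minimal sketch of what the regression does, using simulated data instead of real stocks. We generate premium returns from the regression equation above with assumed values for β_0, β_1 and σ_ε, chosen only for illustration. We then check that ordinary least squares recovers them approximately. Here `np.linalg.lstsq` stands in for the statsmodels `ols` call used later in the workshop.

```python
import numpy as np

# Assumed "true" parameters, chosen only for illustration (not estimated from real data)
rng = np.random.default_rng(42)
n = 60                  # number of months
beta0_true = 0.002      # intercept
beta1_true = 1.5        # market beta
sigma_eps = 0.05        # volatility of the error term

# Simulated monthly market premium returns (r_M - r_f)
mkt_prem = rng.normal(0.008, 0.045, size=n)
# Stock premium returns generated by the CAPM regression equation
eps = rng.normal(0.0, sigma_eps, size=n)
stock_prem = beta0_true + beta1_true * mkt_prem + eps

# OLS: regress the stock premium on a column of ones (intercept) and the market premium
X = np.column_stack([np.ones(n), mkt_prem])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, stock_prem, rcond=None)
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")

# With an intercept in the model, the OLS residuals average zero (up to rounding)
resid = stock_prem - X @ np.array([b0_hat, b1_hat])
print(f"mean residual = {resid.mean():.2e}")
```

With only 60 simulated months, the estimates will not match the assumed values exactly. That gap is the sampling error that the standard errors in a regression summary try to measure.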

3 Data collection

We load the libraries to collect, process and visualize stock data from Yahoo Finance:

import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

3.1 Download stock data

We download monthly stock data for Apple, Tesla and the S&P500 from Dec 2020 to Jan 31, 2026 from Yahoo Finance using the yf.download function, and then obtain continuously compounded returns for each. The Dec 2020 price is only needed to compute the Jan 2021 return:

data = yf.download("^GSPC AAPL TSLA", start='2020-12-01', end='2026-01-31', interval='1mo', auto_adjust=True)



# Get adjusted close prices
adjprices = data['Close']  
# Calculate continuously compounded returns for the 3 prices:
#returns = np.log(adjprices) - np.log(adjprices.shift(1))
#returns = returns.dropna()
returns = np.log(adjprices).diff(1).dropna()
# I used the diff function instead of subtracting the log price and its previous log price
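The commented-out formula, `diff` of the log prices, and the log of the growth factor (1 + simple return) all give the same numbers. This small sketch shows it with made-up prices, not Yahoo Finance data:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly prices, used only to illustrate the formulas
prices = pd.Series([100.0, 105.0, 99.75, 110.0],
                   index=pd.date_range("2021-01-01", periods=4, freq="MS"))

# Continuously compounded returns: difference of log prices
cc_returns = np.log(prices).diff(1).dropna()

# The same returns computed from simple returns: r = ln(1 + R)
simple_returns = (prices / prices.shift(1) - 1).dropna()
cc_from_simple = np.log(1 + simple_returns)

print(cc_returns.round(6))
print(np.allclose(cc_returns, cc_from_simple))  # True
```

A convenient property of cc returns is that they add up over time. The sum of the three monthly cc returns equals ln(110/100), the cc return of the whole period.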

I have monthly returns from Jan 2021:

returns

Ticker          AAPL      TSLA     ^GSPC
Date                                    
2021-01-01 -0.005517  0.117344 -0.011199
2021-02-01 -0.084562 -0.161038  0.025757
2021-03-01  0.008806 -0.011270  0.041563
2021-04-01  0.073453  0.060293  0.051097
2021-05-01 -0.053514 -0.126372  0.005471
...              ...       ...       ...
2025-09-01  0.093605  0.286693  0.034714
2025-10-01  0.059980  0.026275  0.022433
2025-11-01  0.030883 -0.059540  0.001299
2025-12-01 -0.024418  0.044445 -0.000524
2026-01-01 -0.046608 -0.043887  0.013570

[61 rows x 3 columns]

3.2 Download risk-free data from the FED

We download the US risk-free rate (the 3-month Treasury bill rate), which is available as a monthly series under the FRED ticker TB3MS. We do this with the pandas_datareader library:

# You have to install the pandas-datareader package:
#!pip install pandas-datareader
import pandas_datareader.data as pdr
import datetime
# I define start as the month Jan 2021
start = datetime.datetime(2021,1,1)
# I define the end month as Jan 2026
end = datetime.datetime(2026,1,31)
Tbills = pdr.DataReader('TB3MS','fred',start,end)

We see the content of Tbills:

Tbills

            TB3MS
DATE             
2021-01-01   0.08
2021-02-01   0.04
2021-03-01   0.03
2021-04-01   0.02
2021-05-01   0.02
...           ...
2025-09-01   3.92
2025-10-01   3.82
2025-11-01   3.78
2025-12-01   3.59
2026-01-01   3.57

[61 rows x 1 columns]

The TB3MS series is expressed in percentage and as an annual rate. I divide it by 100 and by 12 to get a monthly simple rate, since I am using monthly returns for the stocks:

rfrate = Tbills / 100 / 12

Now I get the continuously compounded return from the simple return:

rfrate = np.log(1+rfrate)

I used the formula to get cc returns from simple returns, which applies the natural log to the growth factor (1+rfrate).
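As a worked check of the two steps above, take the Jan 2026 TB3MS value shown earlier (3.57% annual). The result should match the last row printed later by `rfrate.tail()`:

```python
import numpy as np

tb3ms_percent = 3.57                      # annual rate in percent (Jan 2026 value from the table above)
monthly_simple = tb3ms_percent / 100 / 12 # monthly simple rate
monthly_cc = np.log(1 + monthly_simple)   # monthly continuously compounded rate

print(round(monthly_simple, 6), round(monthly_cc, 6))  # 0.002975 0.002971
```

For rates this small, the cc rate is only slightly below the simple rate, because ln(1 + x) ≈ x − x²/2.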

3.3 Estimating the premium returns

Now you have to generate new variables (columns) for the premium returns for the stocks and the S&P 500.

The premium returns will be equal to the returns minus the risk-free rate. However, it is a good idea to check whether the returns dataset and the rfrate dataset have the same time periods of information:

print(returns.shape)

(61, 3)

print(rfrate.shape)

(61, 1)

Both data frames have 61 rows (months) of data. We can check the beginning and end of each dataset to make sure they have the same time periods:

print(returns.head())

Ticker          AAPL      TSLA     ^GSPC
Date                                    
2021-01-01 -0.005517  0.117344 -0.011199
2021-02-01 -0.084562 -0.161038  0.025757
2021-03-01  0.008806 -0.011270  0.041563
2021-04-01  0.073453  0.060293  0.051097
2021-05-01 -0.053514 -0.126372  0.005471

print(returns.tail())

Ticker          AAPL      TSLA     ^GSPC
Date                                    
2025-09-01  0.093605  0.286693  0.034714
2025-10-01  0.059980  0.026275  0.022433
2025-11-01  0.030883 -0.059540  0.001299
2025-12-01 -0.024418  0.044445 -0.000524
2026-01-01 -0.046608 -0.043887  0.013570

print(rfrate.head())

               TB3MS
DATE                
2021-01-01  0.000067
2021-02-01  0.000033
2021-03-01  0.000025
2021-04-01  0.000017
2021-05-01  0.000017

print(rfrate.tail())

               TB3MS
DATE                
2025-09-01  0.003261
2025-10-01  0.003178
2025-11-01  0.003145
2025-12-01  0.002987
2026-01-01  0.002971

Both data frames have the same time periods, so we are ready to calculate the premium returns:

# I create new columns for the Premium returns in the returns dataset:
returns['TSLA_Premr'] = returns['TSLA'] - rfrate['TB3MS'] 
returns['GSPC_Premr'] = returns['^GSPC'] - rfrate['TB3MS']
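The subtraction above works because pandas aligns the two series by their date index, not by row position. The visual check of head and tail could also be done programmatically with something like `returns.index.equals(rfrate.index)`, assuming both indexes hold plain (timezone-naive) monthly dates. The sketch below uses hypothetical toy series to show both behaviors:

```python
import pandas as pd

# Hypothetical toy data: two monthly series indexed by date
idx = pd.date_range("2021-01-01", periods=3, freq="MS")
stock = pd.Series([0.10, -0.05, 0.02], index=idx, name="TSLA")
rf = pd.Series([0.001, 0.001, 0.002], index=idx, name="TB3MS")

# Check that both indexes contain exactly the same dates (index names are not compared)
print(stock.index.equals(rf.index))  # True

# Premium returns: pandas matches the values date by date
premium = stock - rf
print(premium)

# If a date is missing in one series, the result has NaN for that date
rf_short = rf.iloc[:2]
print((stock - rf_short).isna().sum())  # 1
```

A NaN in the premium column would silently drop that month from the regression, so the index check is worth doing before estimating the model.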

4 Visualize the relationship

We do a scatter plot with the S&P500 premium returns as the independent variable (X) and the Tesla premium returns as the dependent variable (Y). We also add the regression line that best fits the relationship between the stock premium returns and the market premium returns:

import seaborn as sb
plt.clf()
x = returns['GSPC_Premr']
y = returns['TSLA_Premr']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('Market Premium returns')
plt.ylabel('TSLA Premium returns') 
plt.show()

Sometimes graphs can be deceiving. In this case, the X and Y axes have different ranges, so it is better to make a graph where both axes use ranges of equal length.

plt.clf()

sb.regplot(x=x,y=y)
# I adjust the scale of the X axis so that the magnitude of each unit of X is equal to that of the Y axis 
plt.xticks(np.arange(-0.6,0.8,0.2))


# I label the axis:
plt.xlabel('Market Premium returns')

plt.ylabel('TSLA Premium returns') 
plt.show()
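Setting tick positions with `plt.xticks` works here because matplotlib expands the axis view limits so that all given ticks are visible. A more general alternative is to set the same limits on both axes and request an equal aspect ratio. The sketch below uses made-up data in place of the TSLA and S&P500 premium returns. It uses plain matplotlib with `np.polyfit` for the fitted line instead of seaborn, which is a substitution and not the workshop's method.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical premium returns, made up for illustration only
rng = np.random.default_rng(0)
x = rng.normal(0.01, 0.05, 60)           # "market" premium returns
y = 1.5 * x + rng.normal(0.0, 0.15, 60)  # "stock" premium returns

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)

# Straight-line least-squares fit (the same kind of line sb.regplot draws)
slope, intercept = np.polyfit(x, y, 1)
grid = np.linspace(x.min(), x.max(), 100)
ax.plot(grid, intercept + slope * grid, color="red")

# Same symmetric range on both axes, slightly larger than the largest absolute value
lim = 1.1 * max(np.abs(x).max(), np.abs(y).max())
ax.set_xlim(-lim, lim)
ax.set_ylim(-lim, lim)
# Make one unit on X as long as one unit on Y on screen
ax.set_aspect("equal")

ax.set_xlabel("Market Premium returns")
ax.set_ylabel("Stock Premium returns")
plt.show()
```

With equal ranges and an equal aspect ratio, the visual steepness of the line directly reflects the size of the slope (the beta).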

QUESTION: WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

5 Estimating the CAPM model for a stock

Use the premium returns to run the CAPM regression model for each stock.

We run the CAPM for TESLA:

import statsmodels.formula.api as smf

# I estimate the OLS regression model:
mkmodel = smf.ols('TSLA_Premr ~ GSPC_Premr',data=returns).fit()
# I display the summary of the regression: 
print(mkmodel.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:             TSLA_Premr   R-squared:                       0.203
Model:                            OLS   Adj. R-squared:                  0.190
Method:                 Least Squares   F-statistic:                     15.04
Date:                Thu, 26 Feb 2026   Prob (F-statistic):           0.000268
Time:                        10:28:49   Log-Likelihood:                 29.894
No. Observations:                  61   AIC:                            -55.79
Df Residuals:                      59   BIC:                            -51.57
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0057      0.020     -0.291      0.772      -0.045       0.033
GSPC_Premr     1.7518      0.452      3.878      0.000       0.848       2.656
==============================================================================
Omnibus:                        1.587   Durbin-Watson:                   1.888
Prob(Omnibus):                  0.452   Jarque-Bera (JB):                1.536
Skew:                          -0.293   Prob(JB):                        0.464
Kurtosis:                       2.489   Cond. No.                         23.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The beta0 coefficient of the model is -0.0057, while beta1 is 1.7518.

The 95% confidence interval for beta0 goes from -0.0449 to 0.0335, while the 95% confidence interval for beta1 goes from 0.8479 to 2.6558.

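If we subtract and add about 2 times the standard error of beta0 to beta0, we get the 95% confidence interval for beta0. Why? Thanks to the Central Limit Theorem, the estimated beta coefficients behave approximately like normally distributed variables, since each of them can be expressed as a linear combination of random variables. Strictly speaking, statsmodels uses the critical value of the t distribution with 59 degrees of freedom (the Df Residuals in the summary), which is about 2.00 rather than exactly 2.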

We can construct the 95% confidence interval for beta1 in the same way we calculate the 95% C.I. for beta0.
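The interval can be built by hand as coefficient ± t critical value × standard error and compared with statsmodels. This sketch uses simulated premium returns for a hypothetical stock, not the Tesla data, so it runs without downloading anything. `scipy.stats.t.ppf` provides the critical value.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated premium returns (hypothetical stock, not the Tesla data)
rng = np.random.default_rng(1)
n = 61
sim = pd.DataFrame({"GSPC_Premr": rng.normal(0.008, 0.045, n)})
sim["STOCK_Premr"] = 1.7 * sim["GSPC_Premr"] + rng.normal(0.0, 0.15, n)

model = smf.ols("STOCK_Premr ~ GSPC_Premr", data=sim).fit()

# Critical value of the t distribution: about 2.00 with n - 2 = 59 degrees of freedom
tcrit = stats.t.ppf(0.975, df=model.df_resid)

# 95% C.I. by hand: coefficient +/- tcrit * standard error
manual_ci = pd.DataFrame({0: model.params - tcrit * model.bse,
                          1: model.params + tcrit * model.bse})

print(manual_ci)
print(model.conf_int(alpha=0.05))  # statsmodels reports the same interval
```

Applied to the Tesla output above, 1.7518 ± 2.001 × 0.452 gives roughly 0.847 to 2.656. This matches the reported interval up to the rounding of the standard error in the summary.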

6 CHALLENGE 1

Answer the following questions about Tesla's CAPM model:

(a) INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.

(b) ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 in the CAPM REGRESSION MODEL?

(c) ACCORDING TO YOUR RESULTS, IS TESLA SIGNIFICANTLY RISKIER THAN THE MARKET? WHAT t-TEST DO YOU NEED TO DO TO ANSWER THIS QUESTION? Do the test and provide your interpretation. (Hint: Here you have to change the null hypothesis for b1: H0: b1 = 1; Ha: b1 ≠ 1)
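Here is a sketch of how the test in (c) could be computed. It uses simulated data for a hypothetical stock with an assumed beta of 1.3, so it does not give away the Tesla answer. Only the numerator of the t-statistic changes: instead of (b1 − 0)/SE(b1), we use (b1 − 1)/SE(b1). The `t_test` method of statsmodels regression results also accepts the hypothesis as a string and should give the same numbers.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated premium returns for a hypothetical stock (assumed beta = 1.3)
rng = np.random.default_rng(7)
n = 61
sim = pd.DataFrame({"GSPC_Premr": rng.normal(0.008, 0.045, n)})
sim["STOCK_Premr"] = 1.3 * sim["GSPC_Premr"] + rng.normal(0.0, 0.10, n)

model = smf.ols("STOCK_Premr ~ GSPC_Premr", data=sim).fit()
b1 = model.params["GSPC_Premr"]
se_b1 = model.bse["GSPC_Premr"]

# t-statistic for H0: b1 = 1 (instead of the default H0: b1 = 0)
t_stat = (b1 - 1) / se_b1
# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=model.df_resid)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")

# The same test using statsmodels with a string hypothesis
tt = model.t_test("GSPC_Premr = 1")
print(tt)
```

If the p-value is below 0.05, we reject H0 and conclude that the stock's beta is significantly different from 1. The sign of the t-statistic then tells us whether the beta is above or below 1.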

7 CHALLENGE 2

Follow the same procedure to get Apple’s CAPM and answer the following questions:

(a) INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.

(b) ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 in the CAPM REGRESSION MODEL?

WHAT IS THE EFFICIENT MARKET HYPOTHESIS? BRIEFLY DESCRIBE WHAT THIS HYPOTHESIS SAYS.

YOU HAVE TO DO YOUR OWN RESEARCH

8 READING

Read carefully: Basics of Linear Regression Models.

9 Quiz 3 and W3 submission

Go to Canvas and answer Quiz 3 about Linear Regression. You will be able to try this quiz up to 3 times. The questions in this Quiz are about concepts from the readings for this Workshop. This Workshop will be graded as follows:

Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but did NOT RESPOND to the questions, or if you did not do all the activities but responded to some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or if you completed more but parts of your work are copied and pasted from other workshops.
Not submitted (0%)