Workshop 3, Econometric Models
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
1 Estimating the CAPM model for a stock
2 The CAPM model
The Capital Asset Pricing Model states that the expected return of a stock is given by the risk-free rate plus its beta coefficient multiplied by the market premium return. In mathematical terms:
E[R_i] = R_f + β_1(R_M − R_f )
We can express the same equation as:
(E[R_i] − R_f ) = β_1(R_M − R_f )
In other words, the expected premium return of a stock equals the market premium return multiplied by the stock's market beta coefficient. You can estimate the beta coefficient of the CAPM with a regression model, using continuously compounded returns instead of simple returns. However, you must include the intercept β_0 (b0) in the regression equation:
(r_i − r_f ) = β_0 + β_1(r_M − r_f ) + ε
Where ε ∼ N(0, σ_ε); the error is a random shock with an expected value of 0 and a specific standard deviation or volatility. This error represents the combined effect of all factors that influence stock returns but cannot be explained by the model (by the market).
In the market model, the dependent variable was the stock return and the independent variable was the market return. Unlike the market model, here the dependent variable is the stock premium return (the stock return minus the risk-free rate), and the independent variable is the market premium return (the market return minus the risk-free rate). Let's run this model with a couple of stocks.
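To make the regression equation concrete, here is a minimal sketch that simulates premium returns from the CAPM regression and then recovers the coefficients by least squares. All numbers (the true beta of 1.5, the volatilities, the sample size) are illustrative assumptions, not estimates for any real stock.

```python
import numpy as np

# Simulated data: every number below is an illustrative assumption
rng = np.random.default_rng(42)
n_months = 5000                      # large sample so estimates settle near the true values
true_beta0, true_beta1 = 0.0, 1.5

mkt_premium = rng.normal(0.005, 0.04, n_months)   # (r_M - r_f)
shock = rng.normal(0.0, 0.05, n_months)           # epsilon ~ N(0, sigma_eps)
stock_premium = true_beta0 + true_beta1 * mkt_premium + shock   # (r_i - r_f)

# Least-squares fit with an intercept; for degree 1, polyfit returns [slope, intercept]
beta1_hat, beta0_hat = np.polyfit(mkt_premium, stock_premium, 1)

# The slope is also the covariance of x and y divided by the variance of x
beta1_cov = np.cov(mkt_premium, stock_premium)[0, 1] / np.var(mkt_premium, ddof=1)

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

With only about 60 monthly observations, as in this workshop, the estimates would be much noisier than in this large simulated sample, which is why the standard errors and confidence intervals matter.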
3 Data collection
The libraries imported at the start of this document (yfinance, numpy, pandas and matplotlib) are used to collect, process and visualize stock data from Yahoo Finance.
3.1 Download stock data
We download monthly stock data for Apple, Tesla and the S&P500 from Dec 2020 to Jan 31, 2026 from Yahoo Finance using the download function of the yfinance library, and obtain continuously compounded returns for each:
data = yf.download("^GSPC AAPL TSLA", start='2020-12-01', end='2026-01-31', interval='1mo', auto_adjust=True)
# Get adjusted close prices
adjprices = data['Close']
# Calculate continuously compounded returns for the 3 prices:
#returns = np.log(adjprices) - np.log(adjprices.shift(1))
#returns = returns.dropna()
returns = np.log(adjprices).diff(1).dropna()
# I used the diff function instead of subtracting the previous log price from the log price
I have monthly returns from Jan 2021:
returns
Ticker AAPL TSLA ^GSPC
Date
2021-01-01 -0.005517 0.117344 -0.011199
2021-02-01 -0.084562 -0.161038 0.025757
2021-03-01 0.008806 -0.011270 0.041563
2021-04-01 0.073453 0.060293 0.051097
2021-05-01 -0.053514 -0.126372 0.005471
... ... ... ...
2025-09-01 0.093605 0.286693 0.034714
2025-10-01 0.059980 0.026275 0.022433
2025-11-01 0.030883 -0.059540 0.001299
2025-12-01 -0.024418 0.044445 -0.000524
2026-01-01 -0.046608 -0.043887 0.013570
[61 rows x 3 columns]
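As a quick check on the return calculation, the following sketch shows that the diff approach and the explicit subtraction of lagged log prices give the same continuously compounded returns, and that these can be converted back to simple returns. The prices and the column names AAA and BBB are hypothetical, used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly prices, used only to illustrate the calculation
prices = pd.DataFrame({"AAA": [100.0, 110.0, 99.0, 104.0],
                       "BBB": [50.0, 48.0, 54.0, 54.0]})

# Option 1: log price minus the previous log price
r_shift = (np.log(prices) - np.log(prices.shift(1))).dropna()

# Option 2: first difference of the log prices
r_diff = np.log(prices).diff(1).dropna()

same = np.allclose(r_shift, r_diff)

# Converting cc returns back to simple returns: R = exp(r) - 1
simple = prices.pct_change().dropna()
back = np.exp(r_diff) - 1

print(same, np.allclose(simple, back))
```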
3.2 Download risk-free data from the FED (FRED database)
We download the monthly risk-free rate for the US (the 3-month Treasury bill rate), which is the TB3MS series in FRED. We do this with the pandas_datareader library:
# You have to install the pandas-datareader package:
#!pip install pandas-datareader
import pandas_datareader.data as pdr
import datetime
# I define start as the month Jan 2021
start = datetime.datetime(2021,1,1)
# I define the end month as Jan 2026
end = datetime.datetime(2026,1,31)
Tbills = pdr.DataReader('TB3MS','fred',start,end)
We see the content of Tbills:
Tbills
TB3MS
DATE
2021-01-01 0.08
2021-02-01 0.04
2021-03-01 0.03
2021-04-01 0.02
2021-05-01 0.02
... ...
2025-09-01 3.92
2025-10-01 3.82
2025-11-01 3.78
2025-12-01 3.59
2026-01-01 3.57
[61 rows x 1 columns]
The TB3MS series is given as an annual rate in percentage. I divide it by 100 and by 12 to get a monthly simple rate, since I am using monthly returns for the stocks:
rfrate = Tbills / 100 / 12
Now I get the continuously compounded return from the simple return:
rfrate = np.log(1+rfrate)
I used the formula to get cc returns from simple returns, which is the natural log of the growth factor (1+rfrate).
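The plot and the regression in the following sections use the columns GSPC_Premr and TSLA_Premr, but the step that creates them is not shown. A minimal sketch of that step, assuming the premium return is simply the cc return minus the cc risk-free rate for the same month, could look like this. To keep it runnable offline, it rebuilds the first three months from the values printed above; the column AAPL_Premr is a name assumed here by analogy.

```python
import numpy as np
import pandas as pd

# First three months of the values printed above (not a new download)
dates = pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"])
returns = pd.DataFrame({"AAPL": [-0.005517, -0.084562, 0.008806],
                        "TSLA": [0.117344, -0.161038, -0.011270],
                        "^GSPC": [-0.011199, 0.025757, 0.041563]}, index=dates)
Tbills = pd.DataFrame({"TB3MS": [0.08, 0.04, 0.03]}, index=dates)

# Annual percentage -> monthly simple rate -> monthly cc rate
rfrate = np.log(1 + Tbills / 100 / 12)

# Premium returns: cc return minus the cc risk-free rate (aligned by date)
returns["TSLA_Premr"] = returns["TSLA"] - rfrate["TB3MS"]
returns["AAPL_Premr"] = returns["AAPL"] - rfrate["TB3MS"]   # assumed column name
returns["GSPC_Premr"] = returns["^GSPC"] - rfrate["TB3MS"]

print(returns[["TSLA_Premr", "AAPL_Premr", "GSPC_Premr"]])
```

The subtraction works row by row because both objects are indexed by the first day of each month; if the two indexes did not match, pandas would produce missing values instead of premium returns.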
4 Visualize the relationship
We do a scatter plot with the S&P500 premium returns as the independent variable (X) and the Tesla premium returns as the dependent variable (Y). We also add the line that best fits the relationship between the stock premium returns and the market premium returns:
import seaborn as sb
plt.clf()
x = returns['GSPC_Premr']
y = returns['TSLA_Premr']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('Market Premium returns')
plt.ylabel('TSLA Premium returns')
plt.show()
Sometimes graphs can be deceiving. In this case, the ranges of the X axis and the Y axis are different, so it is better to do a graph where one unit of X has the same length as one unit of Y.
plt.clf()
sb.regplot(x=x,y=y)
# I adjust the scale of the X axis so that the magnitude of each unit of X is equal to that of the Y axis
plt.xticks(np.arange(-0.6,0.8,0.2))
# I label the axis:
plt.xlabel('Market Premium returns')
plt.ylabel('TSLA Premium returns')
plt.show()
QUESTION: WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN
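As an alternative to setting the tick marks by hand, matplotlib can be asked to draw one unit of X with the same length as one unit of Y. The sketch below uses simulated premium returns (an assumption, so that it runs without downloading data) and a plain least-squares line instead of seaborn's regplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated premium returns; the values are illustrative assumptions
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.04, 60)              # market premium returns
y = 1.5 * x + rng.normal(0.0, 0.12, 60)    # stock premium returns

fig, ax = plt.subplots()
ax.scatter(x, y)

# Least-squares line through the points
slope, intercept = np.polyfit(x, y, 1)
xs = np.linspace(x.min(), x.max(), 2)
ax.plot(xs, intercept + slope * xs)

# One unit of X is drawn with the same length as one unit of Y
ax.set_aspect("equal", adjustable="datalim")
ax.set_xlabel("Market Premium returns")
ax.set_ylabel("Stock Premium returns")
```

With equal scaling, a slope steeper than 45 degrees indicates a beta above 1 at a glance.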
5 Estimating the CAPM model for a stock
Use the premium returns to run the CAPM regression model for each stock.
We run the CAPM for TESLA:
import statsmodels.formula.api as smf
# I estimate the OLS regression model:
mkmodel = smf.ols('TSLA_Premr ~ GSPC_Premr',data=returns).fit()
# I display the summary of the regression:
print(mkmodel.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:             TSLA_Premr   R-squared:                       0.203
Model:                            OLS   Adj. R-squared:                  0.190
Method:                 Least Squares   F-statistic:                     15.04
Date:                Thu, 26 Feb 2026   Prob (F-statistic):           0.000268
Time:                        10:28:49   Log-Likelihood:                 29.894
No. Observations:                  61   AIC:                            -55.79
Df Residuals:                      59   BIC:                            -51.57
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0057      0.020     -0.291      0.772      -0.045       0.033
GSPC_Premr     1.7518      0.452      3.878      0.000       0.848       2.656
==============================================================================
Omnibus:                        1.587   Durbin-Watson:                   1.888
Prob(Omnibus):                  0.452   Jarque-Bera (JB):                1.536
Skew:                          -0.293   Prob(JB):                        0.464
Kurtosis:                       2.489   Cond. No.                         23.4
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The beta0 coefficient of the model is -0.0057, while beta1 is 1.7518.
The 95% confidence interval for beta0 goes from -0.0449 to 0.0335, while the 95% confidence interval for beta1 goes from 0.8479 to 2.6558.
If we subtract and add about 2 times the standard error of beta0 to the estimate of beta0, we get the 95% confidence interval for beta0. Why? Because, thanks to the Central Limit Theorem, the estimated beta coefficients behave approximately like normally distributed variables, since each estimate can be expressed as a linear combination of random variables.
We can construct the 95% confidence interval for beta1 in the same way we calculate the 95% C.I. for beta0.
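The "about 2 times the standard error" rule can be checked directly against the regression table. The sketch below takes the rounded coefficients and standard errors printed above and uses the exact critical value of the t distribution with 59 residual degrees of freedom, which is where the "about 2" comes from. Because the inputs are rounded, the limits differ slightly from the ones statsmodels reports.

```python
from scipy import stats

# Rounded values copied from the regression summary above
coef_b0, se_b0 = -0.0057, 0.020
coef_b1, se_b1 = 1.7518, 0.452
df_resid = 59

# Critical value that leaves 2.5% in each tail of the t distribution
t_crit = stats.t.ppf(0.975, df_resid)

ci_b0 = (coef_b0 - t_crit * se_b0, coef_b0 + t_crit * se_b0)
ci_b1 = (coef_b1 - t_crit * se_b1, coef_b1 + t_crit * se_b1)

print(f"t critical value: {t_crit:.4f}")
print(f"95% C.I. for beta0: {ci_b0[0]:.4f} to {ci_b0[1]:.4f}")
print(f"95% C.I. for beta1: {ci_b1[0]:.4f} to {ci_b1[1]:.4f}")
```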
6 CHALLENGE 1
Answer the following questions regarding Tesla's CAPM model:
(a) INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.
(b) ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 in the CAPM REGRESSION MODEL?
(c) ACCORDING TO YOUR RESULTS, IS TESLA SIGNIFICANTLY RISKIER THAN THE MARKET? WHAT IS THE t-test YOU NEED TO DO TO ANSWER THIS QUESTION? Do the test and provide your interpretation. (Hint: Here you have to change the null hypothesis for b1: H0: b1=1; Ha: b1<>1)
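For the hint in (c), keep in mind that the t and P>|t| columns of the regression summary test the null hypothesis that each coefficient equals 0, so the t-statistic has to be recalculated when the null value is 1. The helper below is a minimal sketch of that calculation; the function name t_test_against and the input numbers are hypothetical, so replace them with the values from your own regression.

```python
from scipy import stats

def t_test_against(coef, std_err, null_value, df_resid):
    """Two-sided t-test of H0: coefficient = null_value (hypothetical helper)."""
    # Distance between the estimate and the null value, in standard errors
    t_stat = (coef - null_value) / std_err
    # Probability of a t at least this extreme in either tail
    p_value = 2 * stats.t.sf(abs(t_stat), df_resid)
    return t_stat, p_value

# Hypothetical inputs, for illustration only
t_stat, p_value = t_test_against(coef=1.30, std_err=0.12, null_value=1.0, df_resid=58)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

If the p-value is below 0.05, H0: b1=1 is rejected at the 95% confidence level; an equivalent check is whether the 95% confidence interval for b1 contains 1.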
7 CHALLENGE 2
Follow the same procedure to get Apple's CAPM and answer the following questions:
(a) INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.
(b) ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 in the CAPM REGRESSION MODEL?
(c) ACCORDING TO YOUR RESULTS, IS APPLE SIGNIFICANTLY RISKIER THAN THE MARKET? WHAT IS THE t-test YOU NEED TO DO TO ANSWER THIS QUESTION? Do the test and provide your interpretation. (Hint: Here you have to change the null hypothesis for b1: H0: b1=1; Ha: b1<>1)
WHAT IS THE EFFICIENT MARKET HYPOTHESIS? BRIEFLY DESCRIBE WHAT THIS HYPOTHESIS SAYS.
YOU HAVE TO DO YOUR OWN RESEARCH
8 READING
Read carefully: Basics of Linear Regression Models.
9 Quiz 3 and W3 submission
Go to Canvas and answer Quiz 3 about Linear Regression. You can attempt this quiz up to 3 times. The questions in this Quiz cover concepts from the readings related to this Workshop. The Workshop will be graded as follows:
Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to the questions.
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions, and/or you did not do all the activities and responded to only some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or you completed more but parts of your work are a copy-paste from other workshops.
Not submitted (0%)