import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
Portfolio Index, Econometric Models
1 Download prices
I load the libraries to collect, process and visualize stock data from Yahoo Finance:
I download monthly stock prices for several firms and the S&P500 for at least 3 years
= yf.download('NVDA MSFT GOOG AMZN ^GSPC', start='2020-12-01', end='2024-07-31', interval='1mo') data
[ 0%% ]
[******************* 40%% ] 2 of 5 completed
[**********************60%%*** ] 3 of 5 completed
[**********************80%%************ ] 4 of 5 completed
[*********************100%%**********************] 5 of 5 completed
I select only adjusted prices:
= data['Adj Close']
adjprices adjprices.head()
AMZN GOOG MSFT NVDA ^GSPC
Date
2020-12-01 162.846497 87.494438 215.425507 13.020117 3756.070068
2021-01-01 160.309998 91.682671 224.665512 12.958867 3714.239990
2021-02-01 154.646500 101.727242 225.072327 13.681893 3811.149902
2021-03-01 154.703995 103.313934 228.881668 13.316516 3972.889893
2021-04-01 173.371002 120.369019 244.812225 14.978984 4181.169922
The first 4 are stock prices. The 5th column is the S&P500 index. For now I will use only the stock prices to create the portfolio index.
2 Creating an index for an equally-weighted portfolio:
I calculate the holing period returns (HPR) for each price. The HPR is the cumulative return for a stock over time, asumming that we hold the investment for all periods.
= adjprices / adjprices.iloc[0] - 1
HPR # adjprices.iloc[0] refers to the FIRST ROW of the adjprices dataset, which is the FIRST MONTH OF DATA
I create the equally-weighted portfolio:
# I create a dataset with the growth factors for each stock.
# I assume an equally-weighted portfolio. For 4 stocks I assign 25% each:
= 1 + (0.25* HPR.iloc[:,0] + 0.25* HPR.iloc[:,1] + 0.25* HPR.iloc[:,2] + 0.25* HPR.iloc[:,3])
GROWTHFACTOR1 # I create the Portfolio Index starting in 100:
= 100 * GROWTHFACTOR1
INDEXPORT1 # I show the first rows:
INDEXPORT1.head()
Date
2020-12-01 100.000000
2021-01-01 101.762005
2021-02-01 105.198109
2021-03-01 105.400813
2021-04-01 118.180587
dtype: float64
type(INDEXPORT1)
<class 'pandas.core.series.Series'>
INDEXPORT1 is a Series object since it has only 1 column. I convert it to a data frame
I have monthly returns from Dec 2020:
= INDEXPORT1.to_frame()
INDEXPORT1 # I name the column as INDEXPORT1:
= ['INDEXPORT1']
INDEXPORT1.columns INDEXPORT1.head()
INDEXPORT1
Date
2020-12-01 100.000000
2021-01-01 101.762005
2021-02-01 105.198109
2021-03-01 105.400813
2021-04-01 118.180587
I can visualize the Index over time:
'INDEXPORT1'])
plt.plot(INDEXPORT1['AI Portfolio Index')
plt.title( plt.show()
3 Adding the INDEXPORT to the price dataset
Now I merge the INDEXPORT1 with the adjprices data frame before I calculate returns:
# To add the INDEXPORT1 as a column to adjprices, I join the INDEXPORT into the adjprices:
= adjprices.join(INDEXPORT1)
adjprices adjprices.head()
AMZN GOOG ... ^GSPC INDEXPORT1
Date ...
2020-12-01 162.846497 87.494438 ... 3756.070068 100.000000
2021-01-01 160.309998 91.682671 ... 3714.239990 101.762005
2021-02-01 154.646500 101.727242 ... 3811.149902 105.198109
2021-03-01 154.703995 103.313934 ... 3972.889893 105.400813
2021-04-01 173.371002 120.369019 ... 4181.169922 118.180587
[5 rows x 6 columns]
4 Calculate returns for all prices and the Index
# I create returns as the first difference of log prices
= np.log(adjprices) - np.log(adjprices.shift(1))
returns # I drop the first row that has null values:
= returns.dropna()
returns returns.head()
AMZN GOOG MSFT NVDA ^GSPC INDEXPORT1
Date
2021-01-01 -0.015699 0.046758 0.041998 -0.004715 -0.011199 0.017467
2021-02-01 -0.035968 0.103962 0.001809 0.054293 0.025757 0.033209
2021-03-01 0.000372 0.015477 0.016783 -0.027068 0.041563 0.001925
2021-04-01 0.113920 0.152790 0.067286 0.117643 0.051097 0.114443
2021-05-01 -0.073076 0.000597 -0.009963 0.079071 0.005471 0.001943
5 Download the risk-free rate
I download the Treasury Bills rate of the US:
import pandas_datareader.data as pdr
import datetime
# I define start as the month Jan 2020
= datetime.datetime(2021,1,1)
start # I define the end month as July 2024
= datetime.datetime(2024,7,1)
end = pdr.DataReader('TB3MS','fred',start,end) Tbills
We see the content of Tbills:
Tbills.head()
TB3MS
DATE
2021-01-01 0.08
2021-02-01 0.04
2021-03-01 0.03
2021-04-01 0.02
2021-05-01 0.02
The TB3MS serie is given in percentage and in annual rate. I divide it by 100 and 12 to get a monthly simple rate since I am using monthly rates for the stocks:
= Tbills / 100 / 12 rfrate
Now I get the continuously compounded return from the simple return:
= np.log(1+rfrate) rfrate
8 RUNNNG THE CAPM FOR THE INDEX1 PORTFOLIO
import statsmodels.formula.api as smf
# I estimate the OLS regression model:
= smf.ols('INDEX1_Premr ~ SP500_Premr',data=returns).fit()
mkmodel # I display the summary of the regression:
print(mkmodel.summary())
OLS Regression Results
==============================================================================
Dep. Variable: INDEX1_Premr R-squared: 0.620
Model: OLS Adj. R-squared: 0.611
Method: Least Squares F-statistic: 67.02
Date: jue., 29 ago. 2024 Prob (F-statistic): 3.68e-10
Time: 20:08:29 Log-Likelihood: 60.751
No. Observations: 43 AIC: -117.5
Df Residuals: 41 BIC: -114.0
Df Model: 1
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 0.0164 0.009 1.761 0.086 -0.002 0.035
SP500_Premr 1.5916 0.194 8.186 0.000 1.199 1.984
==============================================================================
Omnibus: 1.837 Durbin-Watson: 1.991
Prob(Omnibus): 0.399 Jarque-Bera (JB): 0.942
Skew: 0.076 Prob(JB): 0.624
Kurtosis: 3.709 Cond. No. 21.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.