Portfolio Index, Econometric Models

Author

Alberto Dorantes

Published

August 29, 2024

Abstract
This is an example to create an index for any portfolio of stocks. It also illustrate how to create premium returns and prepare a dataset before running a regression model

1 Download prices

I load the libraries to collect, process and visualize stock data from Yahoo Finance:

import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

I download monthly stock prices for several firms and the S&P500 for at least 3 years

data = yf.download('NVDA MSFT GOOG AMZN ^GSPC', start='2020-12-01', end='2024-07-31', interval='1mo')

[                       0%%                      ]
[*******************   40%%                      ]  2 of 5 completed
[**********************60%%***                   ]  3 of 5 completed
[**********************80%%************          ]  4 of 5 completed
[*********************100%%**********************]  5 of 5 completed

I select only adjusted prices:

adjprices = data['Adj Close']
adjprices.head()  
                  AMZN        GOOG        MSFT       NVDA        ^GSPC
Date                                                                  
2020-12-01  162.846497   87.494438  215.425507  13.020117  3756.070068
2021-01-01  160.309998   91.682671  224.665512  12.958867  3714.239990
2021-02-01  154.646500  101.727242  225.072327  13.681893  3811.149902
2021-03-01  154.703995  103.313934  228.881668  13.316516  3972.889893
2021-04-01  173.371002  120.369019  244.812225  14.978984  4181.169922

The first 4 are stock prices. The 5th column is the S&P500 index. For now I will use only the stock prices to create the portfolio index.

2 Creating an index for an equally-weighted portfolio:

I calculate the holing period returns (HPR) for each price. The HPR is the cumulative return for a stock over time, asumming that we hold the investment for all periods.

HPR = adjprices / adjprices.iloc[0] - 1
# adjprices.iloc[0] refers to the FIRST ROW of the adjprices dataset, which is the FIRST MONTH OF DATA

I create the equally-weighted portfolio:

# I create a dataset with the growth factors for each stock.
# I assume an equally-weighted portfolio. For 4 stocks I assign 25% each: 
GROWTHFACTOR1 = 1 + (0.25* HPR.iloc[:,0] + 0.25* HPR.iloc[:,1] + 0.25* HPR.iloc[:,2] + 0.25* HPR.iloc[:,3])
# I create the Portfolio Index starting in 100: 
INDEXPORT1 = 100 * GROWTHFACTOR1 
# I show the first rows:
INDEXPORT1.head()
Date
2020-12-01    100.000000
2021-01-01    101.762005
2021-02-01    105.198109
2021-03-01    105.400813
2021-04-01    118.180587
dtype: float64
type(INDEXPORT1)
<class 'pandas.core.series.Series'>

INDEXPORT1 is a Series object since it has only 1 column. I convert it to a data frame

I have monthly returns from Dec 2020:

INDEXPORT1 = INDEXPORT1.to_frame()
# I name the column as INDEXPORT1:
INDEXPORT1.columns = ['INDEXPORT1']
INDEXPORT1.head()
            INDEXPORT1
Date                  
2020-12-01  100.000000
2021-01-01  101.762005
2021-02-01  105.198109
2021-03-01  105.400813
2021-04-01  118.180587

I can visualize the Index over time:

plt.plot(INDEXPORT1['INDEXPORT1'])
plt.title('AI Portfolio Index')
plt.show()

3 Adding the INDEXPORT to the price dataset

Now I merge the INDEXPORT1 with the adjprices data frame before I calculate returns:

# To add the INDEXPORT1 as a column to adjprices, I join the INDEXPORT into the adjprices: 
adjprices = adjprices.join(INDEXPORT1)
adjprices.head()  
                  AMZN        GOOG  ...        ^GSPC  INDEXPORT1
Date                                ...                         
2020-12-01  162.846497   87.494438  ...  3756.070068  100.000000
2021-01-01  160.309998   91.682671  ...  3714.239990  101.762005
2021-02-01  154.646500  101.727242  ...  3811.149902  105.198109
2021-03-01  154.703995  103.313934  ...  3972.889893  105.400813
2021-04-01  173.371002  120.369019  ...  4181.169922  118.180587

[5 rows x 6 columns]

4 Calculate returns for all prices and the Index

# I create returns as the first difference of log prices
returns = np.log(adjprices) - np.log(adjprices.shift(1))
# I drop the first row that has null values:
returns = returns.dropna()
returns.head()
                AMZN      GOOG      MSFT      NVDA     ^GSPC  INDEXPORT1
Date                                                                    
2021-01-01 -0.015699  0.046758  0.041998 -0.004715 -0.011199    0.017467
2021-02-01 -0.035968  0.103962  0.001809  0.054293  0.025757    0.033209
2021-03-01  0.000372  0.015477  0.016783 -0.027068  0.041563    0.001925
2021-04-01  0.113920  0.152790  0.067286  0.117643  0.051097    0.114443
2021-05-01 -0.073076  0.000597 -0.009963  0.079071  0.005471    0.001943

5 Download the risk-free rate

I download the Treasury Bills rate of the US:

import pandas_datareader.data as pdr
import datetime
# I define start as the month Jan 2020
start = datetime.datetime(2021,1,1)
# I define the end month as July 2024
end = datetime.datetime(2024,7,1)
Tbills = pdr.DataReader('TB3MS','fred',start,end)

We see the content of Tbills:

Tbills.head()
            TB3MS
DATE             
2021-01-01   0.08
2021-02-01   0.04
2021-03-01   0.03
2021-04-01   0.02
2021-05-01   0.02

The TB3MS serie is given in percentage and in annual rate. I divide it by 100 and 12 to get a monthly simple rate since I am using monthly rates for the stocks:

rfrate = Tbills / 100 / 12

Now I get the continuously compounded return from the simple return:

rfrate = np.log(1+rfrate)

6 Adding the premium returns to the price dataset

I first change the index as date (with no time) to keep the same index as the rfrate data frame:

# I
returns.index = returns.index.date
returns.head()
                AMZN      GOOG      MSFT      NVDA     ^GSPC  INDEXPORT1
2021-01-01 -0.015699  0.046758  0.041998 -0.004715 -0.011199    0.017467
2021-02-01 -0.035968  0.103962  0.001809  0.054293  0.025757    0.033209
2021-03-01  0.000372  0.015477  0.016783 -0.027068  0.041563    0.001925
2021-04-01  0.113920  0.152790  0.067286  0.117643  0.051097    0.114443
2021-05-01 -0.073076  0.000597 -0.009963  0.079071  0.005471    0.001943

I add the risk-free rate to the returns dataset:

returns = returns.join(rfrate)
returns.head()
                AMZN      GOOG      MSFT  ...     ^GSPC  INDEXPORT1     TB3MS
2021-01-01 -0.015699  0.046758  0.041998  ... -0.011199    0.017467  0.000067
2021-02-01 -0.035968  0.103962  0.001809  ...  0.025757    0.033209  0.000033
2021-03-01  0.000372  0.015477  0.016783  ...  0.041563    0.001925  0.000025
2021-04-01  0.113920  0.152790  0.067286  ...  0.051097    0.114443  0.000017
2021-05-01 -0.073076  0.000597 -0.009963  ...  0.005471    0.001943  0.000017

[5 rows x 7 columns]

I add the premium returns to the returns dataset:

returns['INDEX1_Premr'] = returns['INDEXPORT1'] - returns['TB3MS']
returns['SP500_Premr'] = returns['^GSPC'] - returns['TB3MS']
returns.head()
                AMZN      GOOG      MSFT  ...     TB3MS  INDEX1_Premr  SP500_Premr
2021-01-01 -0.015699  0.046758  0.041998  ...  0.000067      0.017400    -0.011266
2021-02-01 -0.035968  0.103962  0.001809  ...  0.000033      0.033175     0.025724
2021-03-01  0.000372  0.015477  0.016783  ...  0.000025      0.001900     0.041538
2021-04-01  0.113920  0.152790  0.067286  ...  0.000017      0.114427     0.051081
2021-05-01 -0.073076  0.000597 -0.009963  ...  0.000017      0.001926     0.005455

[5 rows x 9 columns]

7 Visualize relationship between premium returns

I do a scatter plot putting the S&P500 premium returns as the independent variable (X) and the Index1 premium return as the dependent variable (Y). I also add the regression line to the plot:

import seaborn as sb
plt.clf()
x = returns['SP500_Premr']
y = returns['INDEX1_Premr']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('Market Premium returns')
plt.ylabel('INDEX Premium returns') 
plt.xticks(np.arange(-0.3,0.4,0.1))
([<matplotlib.axis.XTick object at 0x000002016E569AE0>, <matplotlib.axis.XTick object at 0x000002016E569F90>, <matplotlib.axis.XTick object at 0x000002016E54F8E0>, <matplotlib.axis.XTick object at 0x000002016E5D40A0>, <matplotlib.axis.XTick object at 0x000002016E593880>, <matplotlib.axis.XTick object at 0x000002016E593430>, <matplotlib.axis.XTick object at 0x000002016E5937F0>], [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.show()

8 RUNNNG THE CAPM FOR THE INDEX1 PORTFOLIO

import statsmodels.formula.api as smf

# I estimate the OLS regression model:
mkmodel = smf.ols('INDEX1_Premr ~ SP500_Premr',data=returns).fit()
# I display the summary of the regression: 
print(mkmodel.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:           INDEX1_Premr   R-squared:                       0.620
Model:                            OLS   Adj. R-squared:                  0.611
Method:                 Least Squares   F-statistic:                     67.02
Date:              jue., 29 ago. 2024   Prob (F-statistic):           3.68e-10
Time:                        20:08:29   Log-Likelihood:                 60.751
No. Observations:                  43   AIC:                            -117.5
Df Residuals:                      41   BIC:                            -114.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.0164      0.009      1.761      0.086      -0.002       0.035
SP500_Premr     1.5916      0.194      8.186      0.000       1.199       1.984
==============================================================================
Omnibus:                        1.837   Durbin-Watson:                   1.991
Prob(Omnibus):                  0.399   Jarque-Bera (JB):                0.942
Skew:                           0.076   Prob(JB):                        0.624
Kurtosis:                       3.709   Cond. No.                         21.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.