# yfinance downloads data from Yahoo Finance
import yfinance as yf
# numpy is used to do numeric calculations
import numpy as np
# pandas is used for data management
import pandas as pd
# matplotlib is used for graphs
import matplotlib
import matplotlib.pyplot as plt
Workshop 4, Business Analytics for Decision Making
1 Workshop Directions
For this Workshop you have to read the following chapters of my e-book:
In your Google Colab Notebook you have to replicate any Python code and do all CHALLENGES of this Workshop.
2 Introduction to Linear Regression
Up to now we have learned about:
• Descriptive Statistics
• The Histogram
• The Central Limit Theorem
• Hypothesis Testing
• Covariance and Correlation
Without the idea of summarizing data with descriptive statistics, we cannot conceive the histogram. Without the idea of the histogram we cannot conceive the CLT, and without the CLT we cannot make inferences for hypothesis testing. We can apply hypothesis testing to test claims about random variables. These random variables can be one mean, the difference of 2 means, a correlation, and also the coefficients of the linear regression model. But what is the linear regression model?
We learned that covariance and correlation are measures of the linear relationship between 2 random variables, X and Y. The simple regression model also measures the linear relationship between 2 random variables (X and Y), but the difference is that X is supposed to explain the movements of Y, so Y (the dependent variable) depends on the movement of X (the independent variable). In addition, the regression model estimates a linear equation (the regression line) to represent how much Y moves (on average) with movements of X, and what the expected value of Y is when X=0.
The simple linear regression model is used to understand the linear relationship between two variables assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV).
Besides using linear regression models to better understand how the dependent variable moves or changes according to changes in the independent variable, linear regression models are also used for prediction or forecasting of the dependent variable.
The simple regression model considers only one independent variable, while the multiple regression model can include more than one independent variable. But both models only consider one dependent variable. Then, we can use regression models for:
• Understanding the relationship between a dependent variable and one or more independent variables - also called explanatory variables
• Predicting or estimating the expected value of the dependent variable according to specific values of the independent variables
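The link between covariance and the regression line can be sketched with a few lines of numpy. This is a minimal illustration on hypothetical toy numbers (the arrays x and y are made up for the example, not taken from the workshop data): the slope is the covariance of X and Y divided by the variance of X, and the intercept is the expected value of Y when X=0.

```python
import numpy as np

# Hypothetical toy data: x is the independent variable, y the dependent one
x = np.array([-0.04, -0.01, 0.00, 0.02, 0.03, 0.05])
y = np.array([-0.06, -0.02, 0.01, 0.02, 0.05, 0.06])

# Slope: covariance between x and y divided by the variance of x
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Intercept: the expected value of y when x = 0
b0 = y.mean() - b1 * x.mean()

# Cross-check against numpy's least-squares line fit
slope_check, intercept_check = np.polyfit(x, y, deg=1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"polyfit gives intercept = {intercept_check:.4f}, slope = {slope_check:.4f}")
```

Both approaches should give the same line, which is what the regression functions used later in this workshop estimate.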
3 Application: The Market Regression Model
The Market Model (also named the Single-Index Model) in Finance states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. In mathematical terms:
E[R_i] = α + β(R_M)
We can express the same equation using β_0 as alpha, and β_1 as market beta:
E[R_i] = β_0 + β_1(R_M)
We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model.
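One reason for preferring continuously compounded (cc) returns can be sketched as follows. The price series below is hypothetical and only serves the illustration: cc returns add up over time to the cc return of the whole period, while simple returns do not.

```python
import numpy as np

# Hypothetical monthly prices, used only for illustration
prices = np.array([100.0, 110.0, 99.0, 108.9])

# Simple returns: R_t = P_t / P_(t-1) - 1
simple = prices[1:] / prices[:-1] - 1
# Continuously compounded returns: r_t = ln(P_t) - ln(P_(t-1))
cc = np.log(prices[1:]) - np.log(prices[:-1])

# The sum of cc returns equals the cc return of the whole holding period
total_cc = cc.sum()
total_from_prices = np.log(prices[-1] / prices[0])

print("simple returns:", np.round(simple, 4))
print("cc returns:    ", np.round(cc, 4))
print("sum of cc returns:", round(total_cc, 6))
print("ln(P_end / P_start):", round(total_from_prices, 6))
print("sum of simple returns:", round(simple.sum(), 6))
```

In this toy series the simple returns sum to about 10%, although the price only rose from 100 to 108.9, while the cc returns sum exactly to the log price change.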
The market regression model can be expressed as:
r_{(i,t)} = b_0 + b_1*r_{(M,t)} + ε_t
Where:
• ε_t is the error at time t. Thanks to the Central Limit Theorem, this error is assumed to behave like a normally distributed random variable, ε_t ∼ N(0, σ_ε); that is, the error term is expected to have mean=0 and a specific standard deviation σ_ε (also called volatility).
• r_{(i,t)} is the return of stock i at time t.
• r_{(M,t)} is the market return at time t.
• b_0 and b_1 are called regression coefficients.
Now it’s time to use real data to better understand this model. Download monthly prices for Alfa (ALFAA.MX) and the Mexican market index IPCyC (^MXX) from Yahoo from January 2018 to Jan 2023.
3.1 Data collection
We first load the yfinance package and download monthly price data for Alfa and the Mexican market index.
Import the Python libraries
# Download a dataset with prices for Alfa and the Mexican IPyC:
data = yf.download("ALFAA.MX, ^MXX", start="2018-01-01", end="2023-01-31", interval='1mo')
YF.download() has changed argument auto_adjust default to True
[*********************100%***********************] 2 of 2 completed
# I create another dataset with the Adjusted Closing price of both instruments:
adjprices = data['Close']
3.2 Return calculation
We calculate continuously compounded returns for both Alfa and the IPCyC:
returns = (np.log(adjprices) - np.log(adjprices.shift(1))).dropna()
returns.columns=['ALFA','MXX']
3.3 Visualize the relationship
Do a scatter plot putting the IPCyC returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add the line that best represents the relationship between the stock returns and the market returns. Type:
import seaborn as sb
plt.clf()
x = returns['MXX']
y = returns['ALFA']
# I plot the (x,y) values along with the regression line that fits the data:
sb.regplot(x=x,y=y)
plt.xlabel('Market returns')
plt.ylabel('Alfa returns')
plt.show()
Sometimes graphs can be deceiving. In this case, the X axis and the Y axis have different ranges, so it is better to draw the graph so that one unit on the X axis has the same size as one unit on the Y axis. Type:
plt.clf()
sb.regplot(x=x,y=y)
# I adjust the scale of the X axis so that the magnitude of each unit of X is equal to that of the Y axis
plt.xticks(np.arange(-1,1,0.2))
# I label the axis:
plt.xlabel('Market returns')
plt.ylabel('Alfa returns')
plt.show()
3.3.1 CHALLENGE: WHAT DOES THE PLOT TELL YOU? BRIEFLY RESPOND
3.4 RUNNING THE MARKET REGRESSION MODEL
The OLS function from the statsmodels package is used to estimate a regression model. We run a simple regression model to see how the monthly returns of the stock are related to the market return.
The first parameter of the OLS function is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).
Before we run the OLS function, we need to add a column of 1’s to the X vector in order to estimate the beta0 coefficient (the constant).
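To see why the column of 1's matters, here is a minimal sketch using numpy's least-squares solver on simulated data. The returns and the "true" coefficients (0.01 and 1.2) are assumptions made up for the illustration. Without the column of 1's the line is forced through the origin; with it, the solver also estimates the constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) market and stock returns
x = rng.normal(0.0, 0.05, size=60)
y = 0.01 + 1.2 * x + rng.normal(0.0, 0.02, size=60)

# Without a constant: only a slope is estimated, so the line passes through (0, 0)
X_no_const = x.reshape(-1, 1)
slope_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a column of 1's: the first coefficient is beta0, the second is beta1
X = np.column_stack([np.ones_like(x), x])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coefs

print("slope without constant:", round(float(slope_only[0]), 4))
print("beta0 and beta1 with constant:", round(float(b0), 4), round(float(b1), 4))
```

The function sm.add_constant used in this workshop plays the same role as np.column_stack with a column of 1's in this sketch.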
What you will get is called The Single-Index Model. You are trying to examine how the market returns can explain stock returns.
Run the Single-Index model (Y = stock return, X = market return). You can use the function OLS from the statsmodels.api library:
import statsmodels.api as sm
# I add a column of 1's to the X dataframe in order to include the beta0 coefficient (intercept) in the model:
X = sm.add_constant(x)
# I estimate the OLS regression model:
mkmodel = sm.OLS(y,X).fit()
# I display the summary of the regression:
print(mkmodel.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   ALFA   R-squared:                       0.345
Model:                            OLS   Adj. R-squared:                  0.334
Method:                 Least Squares   F-statistic:                     30.53
Date:                Fri, 16 Jan 2026   Prob (F-statistic):           8.15e-07
Time:                        18:15:42   Log-Likelihood:                 49.515
No. Observations:                  60   AIC:                            -95.03
Df Residuals:                      58   BIC:                            -90.84
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0106      0.014     -0.762      0.449      -0.038       0.017
MXX            1.3823      0.250      5.525      0.000       0.881       1.883
==============================================================================
Omnibus:                       13.385   Durbin-Watson:                   2.213
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               47.953
Skew:                          -0.206   Prob(JB):                     3.86e-11
Kurtosis:                       7.360   Cond. No.                         18.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
ANOTHER EASIER WAY TO RUN THE REGRESSION MODEL:
import statsmodels.formula.api as smf
# I estimate the OLS regression model:
mkmodel2 = smf.ols('ALFA ~ MXX',data=returns).fit()
# I display the summary of the regression:
print(mkmodel2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   ALFA   R-squared:                       0.345
Model:                            OLS   Adj. R-squared:                  0.334
Method:                 Least Squares   F-statistic:                     30.53
Date:                Fri, 16 Jan 2026   Prob (F-statistic):           8.15e-07
Time:                        18:15:42   Log-Likelihood:                 49.515
No. Observations:                  60   AIC:                            -95.03
Df Residuals:                      58   BIC:                            -90.84
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0106      0.014     -0.762      0.449      -0.038       0.017
MXX            1.3823      0.250      5.525      0.000       0.881       1.883
==============================================================================
Omnibus:                       13.385   Durbin-Watson:                   2.213
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               47.953
Skew:                          -0.206   Prob(JB):                     3.86e-11
Kurtosis:                       7.360   Cond. No.                         18.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The regression output shows a lot of information about the relationship between the X (independent) and the Y (dependent) variables.
For now we can focus on the table of 2 rows. The first row (Intercept) shows the information of the beta0 coefficient, which is the intercept of the regression equation, also known as constant.
The second row (MXX) shows the information of the beta1 coefficient, which represents the slope of the regression line. In this example, since the X variable is the market return and the Y variable is the stock return, beta1 can be interpreted as the sensitivity or market risk of the stock.
For each beta coefficient, the following is calculated and shown:
coef : this is the estimated (average) value of the beta coefficient.
std err : this is the standard error of the coefficient, which is the standard deviation of the beta coefficient.
t : this is the t-statistic of the following hypothesis test:
H0: beta = 0; Ha: beta <> 0;
P>|t| : this is the p-value of the above hypothesis test; if it is a value < 0.05, we can say that the beta coefficient is SIGNIFICANTLY different from ZERO with 95% confidence.
[0.025 0.975] : this is the 95% Confidence Interval of the beta coefficient. This shows the possible values that the beta coefficient can take in the future with 95% probability.
How are the t-statistic, the p-value and the 95% C.I. related?
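One way to explore this question is to rebuild the three quantities from the coefficient and its standard error. The sketch below uses the values shown in the regression output (with the standard errors 0.0139 and 0.2502, the extra decimals quoted in the text). It relies on two simplifying assumptions: the p-value is approximated with the normal distribution (statsmodels uses the t distribution with 58 degrees of freedom), and the interval uses the rough critical value of 2.

```python
import math

def two_sided_p_normal(t_value):
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(t_value) / math.sqrt(2))

# (coefficient, standard error) read from the regression output
rows = {"const": (-0.0106, 0.0139), "MXX": (1.3823, 0.2502)}

results = {}
for name, (coef, se) in rows.items():
    t_value = (coef - 0) / se                 # distance from zero in standard errors
    p_value = two_sided_p_normal(t_value)     # approximate two-sided p-value
    ci = (coef - 2 * se, coef + 2 * se)       # rough 95% confidence interval
    results[name] = (t_value, p_value, ci)
    print(f"{name}: t = {t_value:.3f}, approx. p-value = {p_value:.4f}, "
          f"rough 95% C.I. = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The three measures tell the same story: when the absolute t-statistic is greater than about 2, the p-value is below 0.05 and the 95% interval does not contain zero.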
INTERPRETATION OF THE REGRESSION OUTPUT
IN A SIMPLE REGRESSION MODEL, BETA0 (THE INTERCEPT WHERE THE LINE CROSSES THE Y AXIS), AND BETA1 (THE INCLINATION OR SLOPE OF THE LINE) ARE ESTIMATED.
THE REGRESSION MODEL FINDS THE LINE THAT BEST REPRESENTS ALL THE POINTS. THE BETA0 AND BETA1 COEFFICIENTS TOGETHER DEFINE THE REGRESSION LINE.
THE REGRESSION EQUATION.
ACCORDING TO THE REGRESSION OUTPUT, THE REGRESSION EQUATION THAT EXPLAINS THE RETURN OF ALFA BASED ON THE IPC’S RETURN IS:
E[ALFAret]= b0 + b1(MXXret)
E[ALFAret]= -0.0106 + 1.3823(MXXret)
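As a small illustration of how this equation can be used, the sketch below plugs a few market returns into it. The market-return scenarios are hypothetical, and the function name expected_alfa_return is made up for this example.

```python
# Coefficients taken from the regression output
b0 = -0.0106
b1 = 1.3823

def expected_alfa_return(market_return):
    """Expected monthly cc return of ALFA for a given market cc return."""
    return b0 + b1 * market_return

# Hypothetical market-return scenarios
for rm in (-0.05, 0.0, 0.01, 0.05):
    print(f"market return = {rm:+.2%} -> expected ALFA return = {expected_alfa_return(rm):+.4%}")
```

When the market return is zero the expected return equals b0, and each additional 1% of market return adds about 1.3823% to the expected return of the stock.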
THE REGRESSION MODEL AUTOMATICALLY PERFORMS ONE HYPOTHESIS TEST FOR EACH COEFFICIENT. IN THIS CASE WE HAVE 2 BETA COEFFICIENTS, SO 2 HYPOTHESIS TESTS ARE DONE. YOU CAN SEE THAT IN THE COEFFICIENTS TABLE IN THE OUTPUT.
WE START LOOKING AT THE TABLE OF COEFFICIENTS. WHERE IT SAYS (Intercept), YOU CAN SEE THE RESULT OF THE HYPOTHESIS TESTING FOR BETA0. WHERE IT SAYS THE NAME OF THE INDEPENDENT VARIABLE, IN THIS CASE, THE MARKET RETURN (MXX), YOU CAN SEE THE RESULT FOR THE BETA1 OF THE STOCK.
THE HYPOTHESIS TEST FOR BETA0 IS THE FOLLOWING:
H0: BETA0=0; THIS MEANS THAT THE INTERCEPT OF THE LINE (THE POINT WHERE THE LINE CROSSES THE Y AXIS) HAS AN AVERAGE OF ZERO. IN THE CONTEXT OF THE MARKET MODEL THIS MEANS THAT THE ALFA STOCK DOES NOT OFFER SIGNIFICANTLY LOWER NOR HIGHER RETURNS THAN THE MARKET.
HA: BETA0 <>0; THIS MEANS THAT THE INTERCEPT IS SIGNIFICANTLY DIFFERENT THAN ZERO; IN OTHER WORDS, ALFA OFFERS RETURNS ABOVE (OR BELOW) THE MARKET.
ABOUT STANDARD ERROR, T-VALUE AND P-VALUE OF THE HYPOTHESIS TESTS:
ACCORDING TO THE CENTRAL LIMIT THEOREM, SINCE BETA0 CAN BE EXPRESSED AS A LINEAR COMBINATION OF THE STOCK AND THE MARKET RETURN, BETA0 WILL HAVE A DISTRIBUTION SIMILAR TO A NORMAL DISTRIBUTION, WITH ITS MEAN EQUAL TO THE ESTIMATED COEFFICIENT AND ITS STANDARD DEVIATION EQUAL TO THE STANDARD ERROR.
IN OTHER WORDS, BETA0 WILL MOVE IN THE FUTURE, AND THE MEAN VALUE WILL BE ABOUT -0.0106, AND IT WILL VARY ON AVERAGE ABOUT 0.0139, WHICH IS THE STANDARD DEVIATION OR STANDARD ERROR OF BETA0.
WHAT DOES THIS MEAN? THIS MEANS THAT IF WE COULD TRAVEL INTO THE FUTURE AND COLLECT A NEW SAMPLE FOR EACH FUTURE MONTH, WE COULD ESTIMATE ONE BETA0 FOR EACH SAMPLE, SO WE COULD IMAGINE MANY BETA0’s THAT WILL CHANGE, BUT ALL THESE VALUES WILL BE AROUND ITS CURRENT MEAN.
IF WE COULD TRAVEL TO THE FUTURE, COLLECT THESE SAMPLES, AND FOR EACH SAMPLE CALCULATE A BETA0, THE HISTOGRAM WITH THESE BETA0’s WILL LOOK LIKE:
ACCORDING TO THIS HISTOGRAM, THE AVERAGE MIGHT BE BETWEEN -0.01 AND -0.005 SINCE IT IS THE RANGE OF BETA0 VALUES THAT APPEARS MORE OFTEN (IT HAS THE HIGHEST BAR). IF WE ADD AND SUBTRACT ABOUT 2 TIMES 0.014 (THE STANDARD ERROR OF BETA0) FROM THE MIDPOINT -0.01, WE COVER ABOUT 95% OF THE DIFFERENT VALUES OF BETA0!
THE ESTIMATION FOR BETA0 IS -0.0106. THIS IS THE MEAN FOR BETA0. SINCE REALITY ALWAYS CHANGES, BETA0 MIGHT CHANGE IN THE FUTURE. HOW MUCH CAN IT CHANGE? THAT IS GIVEN BY ITS STANDARD DEVIATION, WHICH IS CALLED STANDARD ERROR. AND THANKS TO THE CENTRAL LIMIT THEOREM, BETA0 WILL BEHAVE LIKE A NORMALLY DISTRIBUTED VARIABLE.
IN THIS CASE, THE STANDARD ERROR OF BETA0 IS 0.0139. THIS MEANS THAT IN THE FUTURE BETA0 WILL HAVE A MEAN OF -0.0106, AND ABOUT 68% OF THE TIME IT WILL VARY BETWEEN ONE STANDARD DEVIATION BELOW ITS MEAN AND ONE STANDARD DEVIATION ABOVE ITS MEAN. IN ADDITION, WE CAN SAY THAT 95% OF THE TIME BETA0 WILL MOVE BETWEEN -2 STANDARD DEVIATIONS AND +2 STANDARD DEVIATIONS FROM -0.0106.
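The idea of collecting many samples and estimating one beta0 per sample can be imitated with a simulation. The sketch below is only an illustration under stated assumptions: the "true" coefficients, the volatilities and the number of samples are hypothetical values, and every sample is drawn from the same unchanging process.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" model and volatilities (assumptions for the illustration)
true_b0, true_b1 = -0.01, 1.4
sigma_market, sigma_error = 0.05, 0.10
n_months, n_samples = 60, 2000

b0_estimates = np.empty(n_samples)
for s in range(n_samples):
    # Draw one new sample of 60 monthly returns from the market model
    rm = rng.normal(0.0, sigma_market, n_months)
    ri = true_b0 + true_b1 * rm + rng.normal(0.0, sigma_error, n_months)
    # Estimate the regression line for this sample and keep its intercept
    b1_hat, b0_hat = np.polyfit(rm, ri, 1)
    b0_estimates[s] = b0_hat

mean_b0 = b0_estimates.mean()
sd_b0 = b0_estimates.std(ddof=1)
# Share of the estimated beta0's within 2 standard deviations of their mean
within_2sd = np.mean(np.abs(b0_estimates - mean_b0) <= 2 * sd_b0)

print(f"mean of the beta0 estimates: {mean_b0:.4f}")
print(f"standard deviation of the beta0 estimates: {sd_b0:.4f}")
print(f"share within +/- 2 standard deviations: {within_2sd:.1%}")
```

The standard deviation of these simulated beta0's plays the role of the standard error, and a histogram of b0_estimates (for example with plt.hist) should look roughly bell-shaped.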
FOLLOWING THE HYPOTHESIS TEST METHOD, WE CALCULATE THE CORRESPONDING t-value OF THIS HYPOTHESIS AS FOLLOWS:
t=\frac{(B_{0}-0)}{SD(B_{0})}
THEN, t = (-0.0106 - 0 ) / 0.0139 = -0.7622. THIS VALUE IS AUTOMATICALLY CALCULATED IN THE REGRESSION OUTPUT IN THE COEFFICIENTS TABLE IN THE ROW (intercept)!
REMEMBER THAT THE t-value IS THE DISTANCE BETWEEN THE ESTIMATED VALUE OF THE VARIABLE OF ANALYSIS (IN THIS CASE, B_0=-0.0106) AND ITS HYPOTHETICAL VALUE, WHICH IS ZERO. BUT THIS DISTANCE IS MEASURED IN STANDARD DEVIATIONS OF THE VARIABLE OF ANALYSIS. REMEMBER THAT THE STANDARD DEVIATION OF THE VARIABLE OF ANALYSIS IS CALLED STANDARD ERROR (IN THIS CASE, THE STD.ERROR OF B_0 = 0.0139).
SINCE THE ABSOLUTE VALUE OF THE t-value OF B_0 IS LESS THAN 2, THEN WE CANNOT REJECT THE NULL HYPOTHESIS. IN OTHER WORDS, WE CAN SAY THAT B_0 IS NOT SIGNIFICANTLY DIFFERENT FROM ZERO (AT THE 95% CONFIDENCE LEVEL).
THE HYPOTHESIS TEST FOR BETA1 IS THE FOLLOWING:
H0: B_1 = 0 (THERE IS NO RELATIONSHIP BETWEEN THE MARKET AND THE STOCK RETURN)
Ha: B_1 > 0 (THERE IS A POSITIVE RELATIONSHIP BETWEEN THE MARKET AND THE STOCK RETURN)
IN THIS HYPOTHESIS, THE VARIABLE OF ANALYSIS IS BETA1 (B_1).
FOLLOWING THE HYPOTHESIS TEST METHOD, WE CALCULATE THE CORRESPONDING t-value OF THIS HYPOTHESIS AS FOLLOWS:
t=\frac{(B_{1}-0)}{SD(B_{1})}
THEN, t = (1.3823 - 0 ) / 0.2502 = 5.525. THIS VALUE IS AUTOMATICALLY CALCULATED IN THE REGRESSION OUTPUT IN THE SECOND ROW OF THE COEFFICIENTS TABLE.
REMEMBER THAT THE t-value IS THE DISTANCE BETWEEN THE ESTIMATED VALUE OF THE VARIABLE OF ANALYSIS (IN THIS CASE, B_1=1.3823) AND ITS HYPOTHETICAL VALUE, WHICH IS ZERO. BUT THIS DISTANCE IS MEASURED IN STANDARD DEVIATIONS OF THE VARIABLE OF ANALYSIS. REMEMBER THAT THE STANDARD DEVIATION OF THE VARIABLE OF ANALYSIS IS CALLED STANDARD ERROR (IN THIS CASE, THE STD.ERROR OF B_1 = 0.2502).
THE ESTIMATION FOR BETA1 IS 1.3823. THIS IS THE MEAN FOR BETA1. SINCE REALITY ALWAYS CHANGES, BETA1 MIGHT CHANGE IN THE FUTURE. HOW MUCH CAN IT CHANGE? THAT IS GIVEN BY ITS STANDARD DEVIATION, WHICH IS CALLED THE STANDARD ERROR OF BETA1. THANKS TO THE CENTRAL LIMIT THEOREM, WE CAN EXPECT THAT BETA1 WILL MOVE LIKE A NORMALLY DISTRIBUTED VARIABLE IN THE FUTURE WITH THE MEAN AND STANDARD DEVIATION (STANDARD ERROR) CALCULATED IN THE REGRESSION OUTPUT.
WE CAN SAY THAT 95% OF THE TIME BETA1 WILL MOVE BETWEEN -2 STANDARD DEVIATIONS AND + 2 STANDARD DEVIATIONS FROM 1.3823.
SINCE THE ABSOLUTE VALUE OF THE t-value OF B_1 IS MUCH GREATER THAN 2, THEN WE HAVE ENOUGH STATISTICAL EVIDENCE AT THE 95% CONFIDENCE TO SAY THAT WE REJECT THE NULL HYPOTHESIS. IN OTHER WORDS, WE CAN SAY THAT B_1 IS SIGNIFICANTLY GREATER THAN ZERO. WE CAN ALSO SAY THAT WE HAVE ENOUGH STATISTICAL EVIDENCE TO SAY THAT THERE IS A POSITIVE RELATIONSHIP BETWEEN THE STOCK AND THE MARKET RETURN.
3.4.0.1 MORE ABOUT THE INTERPRETATION OF THE BETA COEFFICIENTS AND THEIR t-values AND p-values
THEN, IN THIS OUTPUT WE SEE THAT B_0 = -0.0106, AND B_1 = 1.3823. WE CAN ALSO SEE THE STANDARD ERROR, t-value AND p-value OF BOTH B_0 AND B_1.
B_0 ON AVERAGE IS NEGATIVE, BUT IT IS NOT SIGNIFICANTLY NEGATIVE (AT THE 95% CONFIDENCE LEVEL) SINCE ITS p-value>0.05 AND THE ABSOLUTE VALUE OF ITS t-value<2. THEN I CAN SAY THAT IT SEEMS THAT ALFA RETURN ON AVERAGE UNDERPERFORMS THE MARKET RETURN BY 1.0613% (SINCE B_0 = -0.0106). IN OTHER WORDS, THE EXPECTED RETURN OF ALFA IF THE MARKET RETURN IS ZERO IS NEGATIVE. HOWEVER, THIS IS NOT SIGNIFICANTLY LESS THAN ZERO SINCE ITS p-value>0.05! THEN, I DO NOT HAVE STATISTICAL EVIDENCE AT THE 95% CONFIDENCE LEVEL TO SAY THAT ALFA UNDERPERFORMS THE MARKET.
B_1 IS +1.3823 (ON AVERAGE). SINCE ITS p-value<0.05 I CAN SAY THAT B_1 IS SIGNIFICANTLY GREATER THAN ZERO (AT THE 95% CONFIDENCE LEVEL). IN OTHER WORDS, I HAVE STRONG STATISTICAL EVIDENCE TO SAY THAT ALFA RETURN IS POSITIVELY RELATED TO THE MARKET RETURN SINCE ITS B_1 IS SIGNIFICANTLY GREATER THAN ZERO.
INTERPRETING THE MAGNITUDE OF B_1, WE CAN SAY THAT IF THE MARKET RETURN INCREASES BY +1%, I SHOULD EXPECT THAT, ON AVERAGE, THE RETURN OF ALFA WILL INCREASE BY 1.3823%. THE SAME HAPPENS IF THE MARKET RETURN LOSES 1%: THEN IT IS EXPECTED THAT ALFA RETURN, ON AVERAGE, LOSES ABOUT 1.3823%. THEN, IT SEEMS THAT ALFA IS RISKIER THAN THE MARKET (ON AVERAGE). BUT WE NEED TO CHECK WHETHER IT IS SIGNIFICANTLY RISKIER THAN THE MARKET.
AN IMPORTANT ANALYSIS OF B_1 IS TO CHECK WHETHER THE STOCK IS SIGNIFICANTLY MORE RISKY OR LESS RISKY THAN THE MARKET. IN OTHER WORDS, IT IS IMPORTANT TO CHECK WHETHER B_1 IS LESS THAN 1 OR GREATER THAN 1. TO DO THIS WE CAN DO ANOTHER HYPOTHESIS TEST TO CHECK WHETHER B_1 IS SIGNIFICANTLY GREATER THAN 1!
WE CAN DO THE FOLLOWING HYPOTHESIS TEST TO CHECK WHETHER ALFA IS RISKIER THAN THE MARKET:
H0: B_1 = 1 (ALFA IS AS RISKY AS THE MARKET)
Ha: B_1 > 1 (ALFA IS RISKIER THAN THE MARKET)
IN THIS HYPOTHESIS, THE VARIABLE OF ANALYSIS IS BETA1 (B_1).
FOLLOWING THE HYPOTHESIS TEST METHOD, WE CALCULATE THE CORRESPONDING t-value OF THIS HYPOTHESIS AS FOLLOWS:
t=\frac{(B_{1}-1)}{SD(B_{1})}
THEN, t = (1.3823 - 1 ) / 0.2502 = 1.5281. THIS VALUE IS NOT AUTOMATICALLY CALCULATED IN THE REGRESSION OUTPUT.
SINCE t-value < 2, WE CANNOT REJECT THE NULL HYPOTHESIS. IN OTHER WORDS, ALTHOUGH B_1 IS GREATER THAN 1 ON AVERAGE, WE DO NOT HAVE ENOUGH STATISTICAL EVIDENCE TO SAY THAT ALFA IS SIGNIFICANTLY RISKIER THAN THE MARKET (AT THE 95% CONFIDENCE LEVEL).
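Since this t-value is not part of the regression output, it can be useful to compute it directly. The sketch below uses the coefficient and standard error reported for B_1 and the rough critical value of 2 used throughout this workshop (an approximation of the exact critical t-value).

```python
# Values read from the regression output for B_1
b1 = 1.3823
se_b1 = 0.2502

# Hypothetical value under H0: the stock is as risky as the market
hypothesized = 1.0

# Distance between the estimate and the hypothetical value, in standard errors
t_value = (b1 - hypothesized) / se_b1

# Rough decision rule with a critical value of 2
reject_null = abs(t_value) > 2

print(f"t-value for H0: B_1 = 1 -> {t_value:.4f}")
print("Reject H0 at the 95% confidence level?", reject_null)
```

With the rounded inputs the result is about 1.528, very close to the 1.5281 obtained from the unrounded estimates.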
3.4.1 95% CONFIDENCE INTERVAL OF THE BETA COEFFICIENTS
WE CAN USE THE 95% CONFIDENCE INTERVAL OF BETA COEFFICIENTS AS AN ALTERNATIVE TO MAKE CONCLUSIONS ABOUT B_0 AND B_1 (INSTEAD OF USING t-values AND p-values).
THE 95% CONFIDENCE INTERVALS FOR BOTH BETAS ARE DISPLAYED IN THE REGRESSION OUTPUT.
THE FIRST ROW SHOWS THE 95% CONFIDENCE INTERVAL FOR B_0, AND THE SECOND ROW SHOWS THE CONFIDENCE INTERVAL OF B_1. WE CAN SEE THAT THESE VALUES ARE VERY SIMILAR TO THE “ROUGH” ESTIMATE USING t-critical-value = 2. THE EXACT CRITICAL t-value DEPENDS ON THE # OF OBSERVATIONS OF THE SAMPLE.
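The "rough" estimate can be checked with a short sketch: add and subtract 2 standard errors from each coefficient and compare the result with the reported 95% confidence intervals. The reported bounds below are the more precise values quoted in this section; the critical value of 2 is the approximation mentioned above.

```python
# name: (coefficient, standard error, reported 95% confidence interval)
rows = {
    "b0": (-0.0106, 0.0139, (-0.0385, 0.0173)),
    "b1": (1.3823, 0.2502, (0.8815, 1.8831)),
}

rough = {}
for name, (coef, se, reported) in rows.items():
    # Rough 95% interval using a critical t-value of 2
    low, high = coef - 2 * se, coef + 2 * se
    rough[name] = (low, high)
    print(f"{name}: rough = ({low:.4f}, {high:.4f}), "
          f"reported = ({reported[0]:.4f}, {reported[1]:.4f})")

# Values inside the interval cannot be ruled out at the 95% confidence level
b0_contains_zero = rough["b0"][0] <= 0 <= rough["b0"][1]
b1_contains_zero = rough["b1"][0] <= 0 <= rough["b1"][1]
b1_contains_one = rough["b1"][0] <= 1 <= rough["b1"][1]
print("Interval of b0 contains 0:", b0_contains_zero)
print("Interval of b1 contains 0:", b1_contains_zero)
print("Interval of b1 contains 1:", b1_contains_one)
```

Checking whether a hypothetical value (0 or 1) falls inside the interval gives the same conclusion as comparing the corresponding t-value with 2.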
HOW DO WE INTERPRET THE 95% CONFIDENCE INTERVAL FOR B_0?
IN THE NEAR FUTURE, B_0 CAN HAVE A VALUE BETWEEN -0.0385 AND 0.0173 95% OF THE TIME. IN OTHER WORDS B_0 CAN MOVE FROM A NEGATIVE VALUE TO ZERO TO A POSITIVE VALUE. THEN, WE CANNOT SAY THAT 95% OF THE TIME, B_0 WILL BE NEGATIVE. IN OTHER WORDS, WE CONCLUDE THAT B_0 IS NOT SIGNIFICANTLY NEGATIVE AT THE 95% CONFIDENCE LEVEL.
HOW OFTEN WILL B_0 BE NEGATIVE? LOOKING AT THE 95% CONFIDENCE INTERVAL, B_0 WILL BE NEGATIVE MORE THAN 50% OF THE TIME. BEING MORE SPECIFIC, SINCE THE p-value IN THE OUTPUT IS TWO-SIDED, WE CALCULATE THIS BY SUBTRACTING HALF OF THE p-value FROM 1: (1 - pvalue/2). IN THIS CASE, THE P-VALUE = 0.449. THEN ABOUT 77.5% OF THE TIME B_0 WILL BE NEGATIVE!
HOW DO WE INTERPRET THE 95% CONFIDENCE INTERVAL FOR B_1?
IN THE NEAR FUTURE, B_1 CAN MOVE BETWEEN 0.8815 AND 1.8831 95% OF THE TIME. SINCE THE WHOLE INTERVAL IS ABOVE ZERO, WE CAN SAY THAT B_1 IS SIGNIFICANTLY POSITIVE. HOWEVER, THE INTERVAL INCLUDES 1 (ITS LOWER BOUND IS 0.8815), SO WE CANNOT SAY THAT B_1 WILL BE GREATER THAN 1 AT LEAST 95% OF THE TIME. IN OTHER WORDS, ALTHOUGH B_1 IS GREATER THAN 1 ON AVERAGE, ALFA IS NOT SIGNIFICANTLY RISKIER THAN THE MARKET AT THE 95% CONFIDENCE LEVEL.
3.5 OPTIONAL EXERCISE: Estimate moving betas for the market regression model
How do the beta coefficients of a stock move over time? Are the b_1 and b_0 of a stock stable? If not, do they change gradually or can they change radically over time? We will run several rolling regressions for Alfa to try to answer these questions.
Before we do the exercise, I will review the meaning of the beta coefficients in the context of the market model.
In the market regression model, b_1 is a measure of sensitivity; it measures how much the stock return might move (on average) when the market return moves by +1%.
Then, according to the market regression model, the stock return will change if the market return changes, and also it will change by many other external factors. The aggregation of these external factors is what the error term represents.
It is said that b_1 in the market model measures the systematic risk of the stock, which depends on changes in the market return. The unsystematic risk of the stock is given by the error term, that is also named the random shock, which is the summary of the overall reaction of all investors to news that might affect the stock (news about the company, its industry, regulations, national news, global news).
We can make predictions of the stock return by measuring the systematic risk with the market regression model, but we cannot predict the unsystematic risk. The most we can measure with the market model is the variability of this unsystematic risk (the variance of the error).
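The split between systematic and unsystematic risk can be sketched numerically. The example below uses simulated returns (all parameters are hypothetical assumptions): the variance of the stock return is separated into the part explained by the market (b_1 squared times the market variance) and the variance of the error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (hypothetical) monthly returns following the market model
rm = rng.normal(0.005, 0.05, 120)      # market returns
eps = rng.normal(0.0, 0.08, 120)       # random shocks (unsystematic part)
ri = 0.0 + 1.3 * rm + eps              # stock returns

# Estimate the market regression model
b1, b0 = np.polyfit(rm, ri, 1)
residuals = ri - (b0 + b1 * rm)

# Variance decomposition
total_var = ri.var(ddof=1)
systematic_var = b1**2 * rm.var(ddof=1)    # risk explained by the market
unsystematic_var = residuals.var(ddof=1)   # variance of the error term
r_squared = systematic_var / total_var     # share of systematic risk

print(f"total variance:        {total_var:.6f}")
print(f"systematic variance:   {systematic_var:.6f}")
print(f"unsystematic variance: {unsystematic_var:.6f}")
print(f"share explained by the market (R-squared): {r_squared:.3f}")
```

In this sketch the share explained by the market is the R-squared of the regression, the same statistic reported at the top of the regression output.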
In this exercise you have to estimate rolling regressions by moving time windows and run 1 regression for each time window.
For the same ALFAA.MX stock, run rolling regressions using a time window of 36 months, starting from Jan 2010.
The first regression has to start in Jan 2010 and end in Dec 2012 (36 months). For the second you have to move the time window 1 month ahead, so it will start in Feb 2010 and end in Jan 2013. For the third regression you move another month ahead and run the regression. You continue running all possible regressions until you end up with a window with the last 36 months of the dataset.
This sounds complicated, but fortunately we can use the function RollingOLS, which automatically performs rolling regressions by shifting the 36-month window by 1 month in each iteration.
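To make the window logic concrete, here is a minimal sketch of a rolling regression written as an explicit loop. It is not how RollingOLS is implemented; it only illustrates the idea on simulated returns (the data, the seed and the names mkt, stock and manual_betas are hypothetical).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Simulated (hypothetical) monthly returns for 48 months
n, window = 48, 36
dates = pd.date_range("2010-02-01", periods=n, freq="MS")
mkt = pd.Series(rng.normal(0.005, 0.05, n), index=dates)
stock = 0.002 + 1.1 * mkt + rng.normal(0.0, 0.06, n)

rows = []
for end in range(window, n + 1):
    # Select the 36 months that end at position `end`
    xw = mkt.iloc[end - window:end].to_numpy()
    yw = stock.iloc[end - window:end].to_numpy()
    # One regression per window: keep its intercept and slope
    b1, b0 = np.polyfit(xw, yw, 1)
    rows.append({"const": b0, "beta1": b1})

# Each pair of coefficients is labeled with the last month of its window
manual_betas = pd.DataFrame(rows, index=dates[window - 1:])
print(manual_betas.head())
print("number of windows:", len(manual_betas))
```

With 48 months and a 36-month window there are 48 - 36 + 1 = 13 windows, so 13 pairs of coefficients.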
Then, you have to do the following:
- Download monthly stock prices for ALFAA.MX and the market (^MXX) from Jan 2010 to the most recent month available, and calculate cc returns.
# Getting price data and selecting adjusted price columns:
sprices = yf.download("ALFAA.MX ^MXX",start="2010-01-01",interval="1mo")
[*********************100%***********************] 2 of 2 completed
sprices = sprices['Close']
# Calculating returns:
sr = np.log(sprices) - np.log(sprices.shift(1))
# Deleting the first month with NAs:
sr=sr.dropna()
sr.columns=['ALFAAret','MXXret']
- Run rolling regressions and save the moving b_0 and b_1 coefficients for all time windows.
from statsmodels.regression.rolling import RollingOLS
import statsmodels.api as sm
x=sm.add_constant(sr['MXXret'])
y = sr['ALFAAret']
rolreg = RollingOLS(y,x,window=36).fit()
betas = rolreg.params
# I check the last pairs of beta values:
betas.tail()
               const    MXXret
Date                          
2025-09-01 -0.001343  0.815850
2025-10-01 -0.001452  0.931314
2025-11-01 -0.000932  0.935685
2025-12-01  0.001846  0.911521
2026-01-01  0.001134  0.922324
- Do a plot to see how b_1 and b_0 have changed over time.
plt.clf()
plt.plot(betas['MXXret'])
plt.title('Moving beta1 for Alfaa')
plt.xlabel('Date')
plt.ylabel('beta1')
plt.show()
plt.clf()
plt.plot(betas['const'])
plt.title('Moving beta0 for Alfaa')
plt.xlabel('Date')
plt.ylabel('beta0')
plt.show()
We can see that both beta coefficients move over time; they are not constant. There is no apparent pattern in the changes of the beta coefficients, but we can appreciate how much they can move over time; in other words, we can visualize their standard deviation, which is the average movement from their means.
We can actually calculate the mean and standard deviation of all these pairs of moving beta coefficients and see how they compare with the beta coefficients and standard errors of the original regression, which used only 1 sample of 60 monthly returns:
betas.describe()
            const      MXXret
count  157.000000  157.000000
mean    -0.002675    1.323990
std      0.013632    0.503794
min     -0.025590    0.428319
25%     -0.012623    1.024404
50%     -0.006710    1.266704
75%      0.004951    1.699193
max      0.030649    2.343647
We calculated more than 150 regressions using 36-month rolling windows! For each regression we calculated a pair of b_0 and b_1.
Compared with the first market regression of Alfa, which used the most recent months from 2018, we see that the mean of the moving b_1 is very similar to the b_1 estimated in the first regression. Also, we see that the standard deviation of the moving b_0 is very similar to the standard error of b_0 estimated in the first regression. The standard deviation of the moving b_1 was much higher than the standard error of b_1 in the first regression. This difference might be because the moving betas were estimated using data from 2010, while the first regression used data from 2018, so it seems that the systematic risk of Alfa (measured by its b_1) has been decreasing in recent months.
I hope that now you can understand why we need an estimation of the standard error of the beta coefficients (standard deviation of the coefficients).
4 CHALLENGE: Interpret a market regression model
Select any stock from the US market and run a corresponding Market regression model.
Ask Gemini for the code to collect prices, calculate log returns and run the linear regression.
Respond to the following questions:
Is the stock offering returns significantly above the market? Why or why not? Explain.
Is the stock significantly riskier or less risky than the market? Explain.