Introduction

In financial analysis, simple regression models serve as powerful tools for understanding the linear relationship between two variables. This research delves into a fundamental aspect of financial modeling: the Market Model. The Market Model posits that a stock’s anticipated return is determined by its alpha coefficient (intercept) plus its market beta coefficient (slope) multiplied by the market return. This study estimates these coefficients with a simple linear regression model, using monthly returns of Alfa (ALFAA.MX) and the IPCyC (^MXX) from January 2015 to December 2019.

Simple regression model

In a simple regression model, we aim to understand the linear relationship between two variables. Here, one variable, known as the independent variable (IV), is considered a predictor of the other variable, called the dependent variable (DV).

Let’s delve into a simple regression model using the Market Model.

According to the Market Model, the anticipated return of a stock is determined by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. Mathematically, this is expressed as:

\(E[R_i]=\alpha+\beta(R_M)\)

We can express the same equation using \(\beta_0\) for alpha and \(\beta_1\) for the market beta:

\(E[R_i]=\beta_0+\beta_1(R_M)\)

We can estimate the alpha and market beta coefficients by running a simple linear regression model, specifying the market return as the independent variable and the stock return as the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

\(r_{i,t}=\beta_0+\beta_1 r_{M,t}+\varepsilon_t\)

Where:

\(\varepsilon_t\) is the error at time \(t\). Thanks to the Central Limit Theorem, this error behaves like a normally distributed random variable \(\sim N(0, \sigma_{\varepsilon})\); the error term is expected to have mean \(0\) and a specific standard deviation (also called volatility).

\(r_{i,t}\) is the return of stock \(i\) at time \(t\).

\(r_{M,t}\) is the market return at time \(t\).

\(\beta_0\) and \(\beta_1\) are the regression coefficients (constants).

Data download

Now, let’s dive into using actual data to gain a deeper understanding of this model. Please download monthly prices for Alfa (ALFAA.MX) and the IPCyC (^MXX) from Yahoo Finance, spanning from January 2015 to December 2019. We’ll use ALFAA.MX and the IPCyC to build our own market model. Here’s what you need to do:

# Load package quantmod
library(quantmod)

# Download the data
getSymbols(c("ALFAA.MX", "^MXX"), from = "2015-01-01", to = "2019-12-31", 
           periodicity = "monthly", src = "yahoo")
## [1] "ALFAA.MX" "MXX"
First 5 rows of ALFAA.MX
ALFAA.MX.Open ALFAA.MX.High ALFAA.MX.Low ALFAA.MX.Close ALFAA.MX.Volume ALFAA.MX.Adjusted
30.09299 30.61373 24.94046 25.09576 203906166 24.69958
25.09576 29.91941 24.75774 29.59053 263032096 29.12338
29.58139 29.85546 27.05080 28.13795 209991890 27.69374
28.48511 31.05224 27.90042 28.45770 189243285 28.00844
28.45770 31.44507 27.13302 27.64462 202740723 27.23527
First 5 rows of MXX
MXX.Open MXX.High MXX.Low MXX.Close MXX.Volume MXX.Adjusted
43146.52 43325.43 40723.66 40950.58 4064376800 40950.58
40954.62 44439.99 40944.81 44190.17 3934966300 44190.17
44179.93 44441.15 42674.25 43724.78 4235277600 43724.78
43709.49 46078.07 43697.45 44582.39 3899846400 44582.39
44589.10 45540.68 44124.59 44703.62 3729981800 44703.62
# Calculate continuously compounded returns for the stock and the market index
r_ALFAA <- na.omit(diff(log(ALFAA.MX$ALFAA.MX.Adjusted))) # I dropped the NAs
# For the IPC:
r_MXX <- na.omit(diff(log(MXX$MXX.Adjusted)))

# I merge them into the same object using the merge function:
all_rets <- merge(r_ALFAA, r_MXX)

#I renamed the columns:
colnames(all_rets) <- c("ALFAA", "MXX")

# Take a look at your objects!
Returns for ALFAA.MX and MXX
ALFAA MXX
0.1647553103 0.0761364806
-0.0503348786 -0.0105873906
0.0112994670 0.0194239340
-0.0279932223 0.0027155551
-0.0076297082 0.0078005872
0.0641637798 -0.0066981839
0.0418957917 -0.0233063960
-0.0126585173 -0.0252327250
0.0427491676 0.0438318758
-0.0259065675 -0.0255628781
0.0168568047 -0.0102100642
-0.0112061966 0.0150859017
-0.0267485101 0.0019270596
0.0551087268 0.0483631066
-0.0690366645 -0.0021012980
0.0006182966 -0.0071307902
-0.0275706922 0.0110919128
-0.0254155435 0.0149890505
-0.0032636739 0.0186975683
-0.0108464675 -0.0062354566
-0.0543242520 0.0160305740
-0.0530210254 -0.0577350426
-0.0559903475 0.0071887205
0.0515654548 0.0293220952
-0.0519547136 -0.0030742159
0.0644411982 0.0353244004
-0.0560826696 0.0147190446
0.0397856083 -0.0096459255
-0.0417201538 0.0216752730
-0.0451562036 0.0228896708
0.0140818570 0.0038858356
-0.0850156652 -0.0170238718
-0.1378282204 -0.0347716086
0.0362784460 -0.0320362189
0.0401092179 0.0469148557
0.0774062207 0.0220777116
-0.0424107056 -0.0616829627
0.0409801659 -0.0280702289
0.0304463617 0.0472830961
-0.1590216009 -0.0794996199
0.1203837632 0.0650242349
0.0933405347 0.0418053314
-0.0091107887 -0.0030294947
-0.0406064332 -0.0008787224
-0.1195980201 -0.1191735929
-0.0453859935 -0.0515961757
0.1334089633 -0.0022192245
0.0290855074 0.0548478208
-0.0921993101 -0.0268213139
-0.0620206878 0.0106260144
-0.0728582053 0.0299535367
-0.0685829331 -0.0423242248
0.0528011096 0.0095917422
-0.1276404334 -0.0547140851
0.0308446998 0.0421550761
0.0121917659 0.0090798806
-0.0369491213 0.0075510999
-0.0824437042 -0.0120037839
0.0408220082 0.0166939920

Visualize the relationship

Create a scatter plot with the IPCyC returns as the independent variable (X) and the stock return as the dependent variable (Y). Additionally, include a line that best represents the relationship between the stock returns and the market returns.

# Scatter plot: market returns on the X-axis, stock returns on the Y-axis
plot.default(x = all_rets$MXX, y = all_rets$ALFAA)
# Add the fitted regression (best-fit) line in red
abline(lm(all_rets$ALFAA ~ all_rets$MXX), col = 'red')

As you can see, I’ve placed the market returns on the X-axis and the Alfa returns on the Y-axis. In the market model, the independent variable is the market return, while the dependent variable is the stock return.

Graphs can sometimes be misleading. In this instance, the range of the X-axis and Y-axis differs, so it’s preferable to create a graph where both the X and Y ranges have equal distances. Additionally, we’ll include a line that more accurately represents the relationship between the stock returns and the market returns.

# Use the same range on both axes so the slope is not visually distorted
plot.default(x = all_rets$MXX, y = all_rets$ALFAA, xlim = c(-0.30, 0.30), ylim = c(-0.30, 0.30))
abline(lm(all_rets$ALFAA ~ all_rets$MXX), col = 'red')

What does the plot indicate?

In this case, the regression line is the most accurate representation of the relationship between the market return and the stock return.

I observe a positive correlation between market returns and Alfa returns. As market returns rise, Alfa returns tend to increase as well. It appears that for every 1% increase in market returns, Alfa returns increase by slightly more than 1%, as indicated by the slope of the line, which appears to be slightly steeper than 45 degrees. When the angle of the regression line (relative to the X-axis) is 45 degrees, the slope equals 1.
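
If you want a quick numeric check of this visual impression before running the regression, recall that the slope of an ordinary-least-squares line is the covariance of the two return series divided by the variance of the independent variable. A minimal sketch using the all_rets object created above:

# OLS slope = cov(market returns, stock returns) / var(market returns)
x <- as.numeric(all_rets$MXX)   # market returns
y <- as.numeric(all_rets$ALFAA) # stock returns
cov(x, y) / var(x)              # slightly above 1, consistent with the plotted line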

Running the market regression model

Utilizing the lm() function, run a simple regression model to analyze how the monthly returns of the stock are associated with the market return. The first parameter of the function should be the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter should be the INDEPENDENT VARIABLE, also known as the EXPLANATORY VARIABLE (in this case, the market return).

What you’ll obtain is referred to as the Market Regression Model. You’re investigating how market returns can explain stock returns from January 2015 to December 2019.

Assign your market model to an object named “reg”.

# Run the regression with the lm function:
reg <- lm(r_ALFAA ~ r_MXX)
# The first variable is the Dependent variable (the stock return), and 
#   the variable after the ~ is the Independent variable or explanatory
#   variable (the market return)

# I get the summary of the regression output into a variable
sumreg<- summary(reg)
# I display the main results of the regression:
sumreg
## 
## Call:
## lm(formula = r_ALFAA ~ r_MXX)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.097620 -0.036817 -0.004479  0.030757  0.146254 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.010251   0.006642  -1.543    0.128    
## r_MXX        1.168903   0.187718   6.227  6.1e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.051 on 57 degrees of freedom
## Multiple R-squared:  0.4049, Adjusted R-squared:  0.3944 
## F-statistic: 38.77 on 1 and 57 DF,  p-value: 6.096e-08

To calculate the sum of squares of total deviations from the mean of Y (SST), follow these steps:

Compute a variable representing the mean of the dependent variable Y (in this scenario, the stock return):

meanY = mean(r_ALFAA)
print(meanY)
## [1] -0.00903579

Calculate a variable with the squared deviations of each value of Y (stock returns) from its mean, and get the sum of these values:

# Calculate a vector for the squared deviations of each value of stock returns
#   from its mean:
squared_deviations_1 <- (r_ALFAA - meanY)^2
# Now I get the sum of these squared deviations
SST = sum(squared_deviations_1)
SST
## [1] 0.2490846

To calculate the sum of squares of the regression model (SSRM), you need to utilize the predicted values of the regression model, also known as the fitted values of the model. These values are stored in the regression object “reg” that we created with the lm function.

The fitted (predicted) values of the regression model are stored in the fitted.values attribute of the regression object:

fittedY = reg$fitted.values
fittedY
##                     [,1]
## 2015-02-01  0.0787451497
## 2015-03-01 -0.0226266356
## 2015-04-01  0.0124536890
## 2015-05-01 -0.0070767837
## 2015-06-01 -0.0011328750
## 2015-07-01 -0.0180805306
## 2015-08-01 -0.0374939181
## 2015-09-01 -0.0397456097
## 2015-10-01  0.0409842032
## 2015-11-01 -0.0401315266
## 2015-12-01 -0.0221855778
## 2016-01-01  0.0073829504
## 2016-02-01 -0.0079984585
## 2016-03-01  0.0462807720
## 2016-04-01 -0.0127072173
## 2016-05-01 -0.0185862054
## 2016-06-01  0.0027143651
## 2016-07-01  0.0072697407
## 2016-08-01  0.0116046380
## 2016-09-01 -0.0175396473
## 2016-10-01  0.0084871805
## 2016-11-01 -0.0777376633
## 2016-12-01 -0.0018480877
## 2017-01-01  0.0240236784
## 2017-02-01 -0.0138444639
## 2017-03-01  0.0310397905
## 2017-04-01  0.0069541301
## 2017-05-01 -0.0215261544
## 2017-06-01  0.0150852857
## 2017-07-01  0.0165047988
## 2017-08-01 -0.0057088395
## 2017-09-01 -0.0301502573
## 2017-10-01 -0.0508956385
## 2017-11-01 -0.0476982335
## 2017-12-01  0.0445879074
## 2018-01-01  0.0155556973
## 2018-02-01 -0.0823523986
## 2018-03-01 -0.0430623762
## 2018-04-01  0.0450183446
## 2018-05-01 -0.1031783411
## 2018-06-01  0.0657560134
## 2018-07-01  0.0386153695
## 2018-08-01 -0.0137921891
## 2018-09-01 -0.0112781452
## 2018-10-01 -0.1495533635
## 2018-11-01 -0.0705619239
## 2018-12-01 -0.0128450619
## 2019-01-01  0.0538607733
## 2019-02-01 -0.0416025159
## 2019-03-01  0.0021697751
## 2019-04-01  0.0247617722
## 2019-05-01 -0.0597239136
## 2019-06-01  0.0009608113
## 2019-07-01 -0.0742064573
## 2019-08-01  0.0390241871
## 2019-09-01  0.0003624949
## 2019-10-01 -0.0014245013
## 2019-11-01 -0.0242822619
## 2019-12-01  0.0092626518

Now you can get the SSRM following a process similar to the one we used to get the SST. Remember that you have to get the sum of the squared deviations of each fitted value from the mean of Y.

# Calculate a vector for the squared deviations of each fitted value 
#   from the Y mean:
squared_deviations_2 = (fittedY-meanY)^2
# Sum these squared deviations to get SSRM
SSRM = sum(squared_deviations_2)
SSRM
## [1] 0.1008424

Following a similar process, you can get the sum of squared errors (SSE). To get the SSE, you have to sum the squared differences between the actual values of Y (the stock returns) and the predicted values (fittedY), as sketched below.
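
A minimal sketch of that calculation, reusing the r_ALFAA and fittedY objects defined above, might look like this:

# Squared differences between the actual stock returns and the fitted values
squared_errors <- (r_ALFAA - fittedY)^2
# Sum them to get the sum of squared errors (SSE)
SSE <- sum(squared_errors)
SSE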

You can check whether your sum-of-squares calculations are correct by running the anova function as follows:

anova(reg)
## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100842  38.775 6.096e-08 ***
## Residuals 57 0.14824 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The standard errors of the beta coefficients (b0 and b1) indicate the variability or uncertainty associated with these coefficients in the regression model. Specifically, the standard error of b0 is the standard deviation of b0, and the standard error of b1 is the standard deviation of b1. In this case, the value of b0 is -0.010251, with a standard error of 0.006642, and the value of b1 is 1.168903, with a standard error of 0.187718.

The standard error of a coefficient provides information about the expected standard deviation (average variation) of the coefficient in the near future. Since regression coefficients can change over time, the standard error indicates how much the coefficient might deviate from its mean value in the future. Regression coefficients, such as b0 and b1, are linear combinations of random variables, so, according to the Central Limit Theorem, they behave like normally distributed variables. Therefore, we can use the standard error of a coefficient to construct its 95% confidence interval.
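
As an illustration of the last point, here is a short sketch of how these 95% confidence intervals could be obtained, both approximately (estimate plus or minus roughly two standard errors) and with R’s confint() function; it reuses the reg and sumreg objects created above:

# Coefficient estimates and standard errors from the summary object
coefs <- sumreg$coefficients
b1    <- coefs["r_MXX", "Estimate"]
se_b1 <- coefs["r_MXX", "Std. Error"]

# Approximate 95% confidence interval for beta1: estimate +/- 2 standard errors
c(lower = b1 - 2 * se_b1, upper = b1 + 2 * se_b1)

# Exact t-based 95% confidence intervals for both coefficients
confint(reg, level = 0.95)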

What is the total sum of squares (SST)?

We can calculate the sum of squares of a regression model using the function anova. To do this, we need to apply the function anova to a regression object. Here’s how we can do it in this case:

sumsquares <- anova(reg)
sumsquares
## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100842  38.775 6.096e-08 ***
## Residuals 57 0.14824 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the column labeled “Sum Sq,” we can observe: A) the sum of squares of the regression model (SSRM), which in this case equals about 0.10084, and B) the sum of squares of the errors or residuals (SSE), which in this case equals about 0.14824.
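
If you prefer to pull these values programmatically instead of reading them off the printed table, a small sketch using the sumsquares object created above could be:

# Extract the "Sum Sq" column from the anova table
ss <- sumsquares[["Sum Sq"]]
SSRM_anova <- ss[1] # explained (regression) sum of squares
SSE_anova  <- ss[2] # residual (error) sum of squares
# Adding them gives the total sum of squares (SST)
SSRM_anova + SSE_anova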

The total sum of squares (SST) is equal to SSRM + SSE. Let me explain what each of these sums of squares represents:

The SST, or Total Sum of Squares, represents the total variability in the dependent variable (Y). It measures how much the observed values of Y deviate from the mean of Y.

If we express the general formula for a regression line as:

\(E[Y_i]=b_0+b_1(X_i)\)

where i goes from 1 to N observations, then:

\(SST=\sum_{i=1}^{N}(Y_i-\bar{Y})^2\)

Then, SST is the sum of all squared distances from each point \(Y_i\) to the mean of \(Y\), denoted \(\bar{Y}\).

We consider \(\bar{Y}\) the UNCONDITIONAL mean, since it is independent of the values of \(X\).

With the anova function, the SST will be equal to the sum of the SSRM and the SSE. In this case, SST = 0.10084 + 0.14824 ≈ 0.24908, which matches the SST of 0.2490846 we computed directly from the data.

We can decompose SST into two parts: the sum of squared distances that are explained by the regression model (the Sum of Squares of the Regression Model), and the sum of squared distances that are NOT explained by the regression model (the Sum of Squared Errors):

\(SST = SSRM + SSE\)

The SSE is the sum of squared errors:

\(SSE=\sum_{i=1}^{N}(Y_i-E[Y_i])^2\)

The distances from \(Y_i\) to \(E[Y_i]\) are the distances that cannot be explained by the regression model.

The SSRM is the sum of the squared distances from the unconditional mean \(\bar{Y}\) to the expected mean according to the regression model:

\(SSRM=\sum_{i=1}^{N}(E[Y_i]-\bar{Y})^2\)

The coefficient of determination of the regression model, R-squared, is defined as:

\(R^2=\frac{SSRM}{SST}\)

\(R^2\) is the percentage of the variance of Y that is explained by the variance of X. In other words, it tells us what percentage of the variation in Y is explained by variations in X.
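
As a quick check, here is a minimal sketch that reproduces this R-squared from the sums of squares computed earlier (it assumes the SSRM and SST objects are still in memory) and compares it with the value reported by summary(reg):

# R-squared as the share of total variation explained by the regression
R2 <- SSRM / SST
R2               # should be close to 0.4049
sumreg$r.squared # the Multiple R-squared reported by summary(reg)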

Conclusion

This blog post illustrates how the Market Model can be used to understand the relationship between market returns and individual stock returns. The estimated alpha and beta coefficients provide valuable insights for investors seeking to gauge the sensitivity of a stock’s performance to market movements. Future research could explore alternative regression methodologies and incorporate additional explanatory variables to enhance the predictive accuracy of financial models.