1 Spurious Regression

When we want to examine the relationship between two non-stationary variables by running a regression model, we have the risk to end up with a non-valid - spurious - regression.

#we need to install wbstats package
library(wbstats)
## Warning: package 'wbstats' was built under R version 4.0.5
# Mexico - Infant mortality
infantm<-wb_data(indicator = c("SP.DYN.IMRT.IN"), 
      country="MEX", start_date = 1980, end_date = 2020)
# Mexico - Export value
exports<-wb_data(indicator = c("TX.VAL.MRCH.XD.WD"), 
      country="MEX", start_date = 1980, end_date = 2020)

The wb function brings a data frame with the requested data. We can plot the data to have an idea of these 2 variables:

plot.ts(infantm$SP.DYN.IMRT.IN)

plot.ts(exports$TX.VAL.MRCH.XD.WD)

Now run a regression using these series. Report the result of the regression.

m1 <- lm(exports$TX.VAL.MRCH.XD.WD ~ infantm$SP.DYN.IMRT.IN)
summary(m1)
## 
## Call:
## lm(formula = exports$TX.VAL.MRCH.XD.WD ~ infantm$SP.DYN.IMRT.IN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.331 -39.723  -7.456  37.848  77.194 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            275.1783    16.0408   17.16  < 2e-16 ***
## infantm$SP.DYN.IMRT.IN  -6.1847     0.5327  -11.61  4.6e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 41.65 on 38 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7801, Adjusted R-squared:  0.7743 
## F-statistic: 134.8 on 1 and 38 DF,  p-value: 4.602e-14

Did you find significant relationship? is your result what you expected?

AFTER LOOKING AT THE RESULTS OF THE MODEL, I CAN SAY THAT THE INFANT MORTALITY IS SIGNIFICANTLY AND NEGATIVELY RELATED TO THE EXPORTS IN MEXICO, THE T-VALUE OF B1 IS MUCH BIGGER THAN 2 WHICH REPRESENTS A REALLY STRONG RELATIONSHIP BETWEEN THIS VARIABLES. “THE LOWER THE INFANT MORTALITY THE HIGHER THE EXPORTS” ALTHOUGH WE HAVE THIS STATISTICAL EVIDENCE, WE CAN EASILY SAY THAT THIS IS REALLY WEIRD DUE TO THE FACT THAT IT´S NOT EXPECTED TO FIND A RELATIONSHIP BETWEEN INFANT MORTALITY AND EXPORTS IN MEXICO.

2 Cointegration between Financial series (From Jan 1, 2015 to Oct 2, 2017.)

Using daily data of Mexico IPCyC market index and the S&P 500, examine whether two series are cointegrated. Generate an index for each instrument. To do these indexes, create a variable that represents how 1.00 peso or 1.00 dollar invested in each instrument would be changing over time.

For both cases, run a cointegration test and INTERPRET your results.

library(quantmod)
## Warning: package 'quantmod' was built under R version 4.0.3
## Loading required package: xts
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
getSymbols(Symbols<-c("^MXX", "^GSPC"), periodicity= "daily", from= "2015-01-01", to= "2017-10-02")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "^MXX"  "^GSPC"
data = na.omit(merge (MXX, GSPC))
firstmxx = as.numeric(MXX$MXX.Adjusted[1])
firstusa = as.numeric(GSPC$GSPC.Adjusted[1])
invmxx <- data$MXX.Adjusted / firstmxx
invusa <- data$GSPC.Adjusted / firstusa
plot(invmxx)

plot(invusa)

m1 <- lm(invmxx$MXX.Adjusted ~ invusa$GSPC.Adjusted)
summary(m1)
## 
## Call:
## lm(formula = invmxx$MXX.Adjusted ~ invusa$GSPC.Adjusted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.06656 -0.01835  0.00244  0.01944  0.06810 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.35668    0.01383   25.79   <2e-16 ***
## invusa$GSPC.Adjusted  0.69852    0.01311   53.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02702 on 672 degrees of freedom
## Multiple R-squared:  0.8086, Adjusted R-squared:  0.8083 
## F-statistic:  2840 on 1 and 672 DF,  p-value: < 2.2e-16
adf.test(m1$residuals, k=0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  m1$residuals
## Dickey-Fuller = -3.5752, Lag order = 0, p-value = 0.03508
## alternative hypothesis: stationary

Están cointegrados, se ouede confar en el modelo

AFTER LOOKING AT THE RESULTS OF THE MODEL AND AT THE RESIDUALS WE CAN SAY THAT THERE´S COINTEGRATION DUE TO THE P-VALUE GREATER THAN 0.05 WHICH REPRESENTS STATIONARITY,AND WE CAN EASILY TRUST IN OUR MODEL. FOR EACH 1 MXN YOU GET 0.69852 USD.

3 Cointegration between Financial series (From Oct 3, 2017 to Feb 28, 2018)

getSymbols(Symbols<-c("^MXX", "^GSPC"), periodicity= "daily", from = "2017-10-03", to = "2018-02-28")
## [1] "^MXX"  "^GSPC"
data_1 = na.omit(merge (MXX, GSPC))
firstmxx_1 = as.numeric(MXX$MXX.Adjusted[1])
firstusa_1 = as.numeric(GSPC$GSPC.Adjusted[1])



invmxx_1 <- data_1$MXX.Adjusted / firstmxx_1
invusa_1 <- data_1$GSPC.Adjusted / firstusa_1
plot(invmxx_1)

plot(invusa_1)

m2 <- lm(invmxx_1$MXX.Adjusted ~ invusa_1$GSPC.Adjusted)
summary(m2)
## 
## Call:
## lm(formula = invmxx_1$MXX.Adjusted ~ invusa_1$GSPC.Adjusted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.037454 -0.014008 -0.002567  0.017749  0.040777 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.79716    0.05928  13.447  < 2e-16 ***
## invusa_1$GSPC.Adjusted  0.16206    0.05640   2.874  0.00501 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02006 on 95 degrees of freedom
## Multiple R-squared:  0.07997,    Adjusted R-squared:  0.07028 
## F-statistic: 8.257 on 1 and 95 DF,  p-value: 0.005008
adf.test(m2$residuals, k=0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  m2$residuals
## Dickey-Fuller = -2.0418, Lag order = 0, p-value = 0.5592
## alternative hypothesis: stationary

AFTER LOOKING AT THE RESULTS OF THE MODEL AND AT THE RESIDUALS WE CAN SAY THAT THERE´S NO COINTEGRATION DUE TO THE P-VALUE SMALLER THAN 0.05 WHICH REPRESENTS A LACK OF STATIONARITY,AND WE CAN EASILY SAY THE RESULTS OF OUR MODEL ARE NOT RELIABLE. (SPURIOUS RELATIONSHIP)

4 Holding return of a portfolio of 2 stocks

Once we have an idea about cointegration, we will review how to form a portfolio of 2 assets and calculate its holding return over time. The holding period return of an asset from day 1 to day N (HPR) can be calculated with any of the following ways:

HPR = (Price of the stock at day N / Price of the stock at day 1) - 1

HPR = exp ( Sum of all continuous compounded returns from day 1 to day N) - 1

The expected return of a portfolio of 2 assets is estimated as a weighted average of the expected stock returns:

E[Rp]=w1∗E[R1]+w2∗E(R2)

The holding return of a portfolio of 2 assets for a specific period of time is estimated as a weighted average of the holding returns of the assets:

HPRp=w1∗HPR1+w2∗HPR2

Download daily prices from CEMEX and ALFA from Jan 1, 2015 to Dec 31, 2017.

Calculate their holding period returns of both stocks

getSymbols(Symbols<-c("CEMEXCPO.MX", "ALFAA.MX"), periodicity= "daily", from = "2015-01-01", to = "2017-12-31")
## [1] "CEMEXCPO.MX" "ALFAA.MX"

CEMEX STOCK

n <- (nrow(CEMEXCPO.MX))
price0 <- as.numeric(CEMEXCPO.MX$CEMEXCPO.MX.Adjusted[1])
pricen <- as.numeric(CEMEXCPO.MX$CEMEXCPO.MX.Adjusted[n])
HPRCEMEX <- ((pricen /price0)-1)*100
print(paste("HPR of CEMEX = ", HPRCEMEX))
## [1] "HPR of CEMEX =  12.6396524035649"

ALFA STOCK

n_1 <- (nrow(ALFAA.MX))
price0_1 <- as.numeric(ALFAA.MX$ALFAA.MX.Adjusted[1])
pricen_1 <- as.numeric(ALFAA.MX$ALFAA.MX.Adjusted[n])
HPRALFAA <- ((pricen_1 /price0_1)-1 )*100
print(paste("HPR of ALFAA = ", HPRALFAA))
## [1] "HPR of ALFAA =  -32.2857992258085"

Create a portfolio 1 assigning 30% to CEMEX and 70% for ALFA and calculate the HPR of the portfolio

p1 <- HPRCEMEX*0.3 + HPRALFAA*0.7 
print(paste("HPR of my portfolio 1 ", p1))
## [1] "HPR of my portfolio 1  -18.8081637369965"

Create a portfolio 2 assigning -100% to CEMEX and +200% to ALFA and calcuate the HPR of this portfolio

p2 <- HPRCEMEX*-1 + HPRALFAA*2 
print(paste("HPR of my portfolio 2 ", p2))
## [1] "HPR of my portfolio 2  -77.2112508551819"

What does a negative sign mean a portfolio? Briefly explain with the previous example.

THE NEGATIVE SIGN REPRESENTS A SALE WE DO, TO BUY MORE OF THE OTHER STOCK.

5 CHALLENGE: Statistical arbitrage

dataset<- merge(ALFAA.MX$ALFAA.MX.Adjusted, CEMEXCPO.MX$CEMEXCPO.MX.Adjusted)
reg1 <- lm( dataset$ALFAA.MX.Adjusted  ~ dataset$CEMEXCPO.MX.Adjusted)
summary(reg1)
## 
## Call:
## lm(formula = dataset$ALFAA.MX.Adjusted ~ dataset$CEMEXCPO.MX.Adjusted)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9402 -1.0178  0.5681  1.8485  5.5081 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   42.9127     0.5181   82.83   <2e-16 ***
## dataset$CEMEXCPO.MX.Adjusted  -1.0468     0.0382  -27.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.784 on 752 degrees of freedom
## Multiple R-squared:  0.4996, Adjusted R-squared:  0.4989 
## F-statistic: 750.8 on 1 and 752 DF,  p-value: < 2.2e-16
adf.test(reg1$residuals, k=0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  reg1$residuals
## Dickey-Fuller = -3.8726, Lag order = 0, p-value = 0.01537
## alternative hypothesis: stationary

Then, if this is the case, what can you do to take advantage in financial trading?

I CAN CONCLUDE THAT IF THE STOCKS SEPARATE CEMEX WOULD GO UP AND ALFA DOWN

If you were to invest from Jan 1, 2018 to Feb 28, 2018 in a portfolio of these stocks, which weights would you assign?

rm(list=ls())
getSymbols(("CEMEXCPO.MX"), periodicity= "daily", from = "2018-01-01", to = "2018-02-28")
## [1] "CEMEXCPO.MX"
getSymbols(("ALFAA.MX"), periodicity= "daily", from = "2018-01-01", to = "2018-02-28")
## [1] "ALFAA.MX"
n <- (nrow(CEMEXCPO.MX))
price0 <- as.numeric(CEMEXCPO.MX$CEMEXCPO.MX.Adjusted[1])
pricen <- as.numeric(CEMEXCPO.MX$CEMEXCPO.MX.Adjusted[n])
HPRCEMEX <- ((pricen /price0)-1)*100
print(paste("HPR of CEMEX = ", HPRCEMEX))
## [1] "HPR of CEMEX =  -15.7929656158527"
n_1 <- (nrow(ALFAA.MX))
price0_1 <- as.numeric(ALFAA.MX$ALFAA.MX.Adjusted[1])
pricen_1 <- as.numeric(ALFAA.MX$ALFAA.MX.Adjusted[n])
HPRALFAA <- ((pricen_1 /price0_1)-1 )*100
print(paste("HPR of ALFAA = ", HPRALFAA))
## [1] "HPR of ALFAA =  1.18954999293077"

I WOULD ASIGN +200% TO ALFA AND -100% TO CEMEX

PCH <- HPRALFAA*2 + HPRCEMEX*-1
print(paste("HPR of my Portafolio = ", PCH))
## [1] "HPR of my Portafolio =  18.1720656017143"