Overview

This blog post focuses on running a regression between the returns of two technology stocks, Apple and Amazon. We use the quantmod package to calculate daily and monthly stock returns. We then regress one stock's returns on the other's to see if we can use one stock's returns to predict the returns of the other. Finally, we add three more technology stocks and run a multiple linear regression to see if we can come up with a better fit.

Load Libraries

library(quantmod)
library(tidyverse)
library(corrplot)

Extracting the data

We will be using prices starting 1 January 2015. First, we extract the adjusted closing prices for both tickers (column 6 of the series returned by getSymbols.yahoo()).

# Download daily prices from Yahoo Finance; column 6 is the adjusted close
aapl <- getSymbols.yahoo("AAPL", from = '2015-01-01', auto.assign = FALSE)[, 6]
amzn <- getSymbols.yahoo("AMZN", from = '2015-01-01', auto.assign = FALSE)[, 6]

Calculate returns

Next, we calculate the daily and monthly log returns with quantmod's periodReturn() function.

aapl_daily<-periodReturn(aapl, period='daily', type='log')
aapl_monthly<-periodReturn(aapl, period='monthly', type='log')
amzn_daily<-periodReturn(amzn, period='daily', type='log')
amzn_monthly<-periodReturn(amzn, period='monthly', type='log')
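For readers who want to see what a log return is, here is a base-R sketch of the formula r_t = log(P_t / P_(t-1)) on a made-up price series rather than the Yahoo data. It also shows why log returns are convenient: the log return over a month equals the sum of the daily log returns inside that month. This illustrates the arithmetic only, not quantmod's internals; how periodReturn() treats the very first observation may differ.

```r
# Hypothetical closing prices over six trading days spanning two months
prices <- c(100, 102, 101, 105, 104, 108)
month  <- c(1, 1, 1, 2, 2, 2)

# Daily log returns: r_t = log(P_t / P_(t-1)); one fewer value than prices
daily_log <- diff(log(prices))

# Each return is assigned to the month of the later price, then summed
monthly_log <- tapply(daily_log, month[-1], sum)

# The same monthly log returns computed directly from prices
# (month 1 measured from the first available price, month 2 from month-1 close)
direct <- c(log(101 / 100), log(108 / 101))

print(round(daily_log, 4))
print(all.equal(as.numeric(monthly_log), direct))
```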

Let us look at the correlations first. The daily returns of the two stocks have a correlation of about 0.56 (56%), and the monthly returns about 0.47.

print('Correlation daily returns')
## [1] "Correlation daily returns"
cor(aapl_daily, amzn_daily)
##               daily.returns
## daily.returns      0.559679
print('Correlation monthly returns')
## [1] "Correlation monthly returns"
cor(aapl_monthly, amzn_monthly)
##                 monthly.returns
## monthly.returns       0.4663392
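cor() on two return series only makes sense if row i of each series refers to the same date. Both tickers trade on the same US exchanges, so the rows line up here, but joining on the date first is a safer habit. The sketch below uses base R's merge() on two small hypothetical data frames (the `date` and `ret` column names are illustrative). For the xts objects in this post, cbind() and merge() align series on their date index using an outer join by default, leaving NA on unmatched dates, which is why the correlation plot passes use = "complete.obs".

```r
# Two hypothetical daily return series, each with one date the other lacks
a <- data.frame(date = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07")),
                ret  = c(-0.010, 0.020, 0.005, -0.003))
b <- data.frame(date = as.Date(c("2015-01-02", "2015-01-05", "2015-01-07", "2015-01-08")),
                ret  = c(-0.008, 0.015, -0.001, 0.012))

# Inner join on date keeps only the trading days present in both series
both <- merge(a, b, by = "date", suffixes = c(".a", ".b"))

nrow(both)                      # number of shared trading days
cor(both$ret.a, both$ret.b)     # correlation computed on aligned dates only
```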

We now plot the daily returns of both stocks.

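# chartSeries() manages its own plot layout, so par(mfrow = ...) has no effect here;
# each call draws a separate chart
chartSeries(aapl_daily, theme = chartTheme("white"), name = "AAPL daily log returns")

chartSeries(amzn_daily, theme = chartTheme("white"), name = "AMZN daily log returns")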

Simple Linear Regression

We now run a linear regression on the two return series. First, we combine the data into one object and rename the columns.

# cbind() on xts objects merges the two series by date
data <- cbind(aapl_daily, amzn_daily)
names(data) <- c("AAPL", "AMZN")

A scatter plot shows the pattern. As expected, most of the daily returns fall between roughly -2.5% and 2.5%.

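# AMZN on the x-axis and AAPL on the y-axis, matching the model AAPL ~ AMZN
scatter.smooth(x = as.numeric(data$AMZN), y = as.numeric(data$AAPL), main = "AAPL ~ AMZN",
               xlab = "AMZN daily log return", ylab = "AAPL daily log return")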

We now fit the linear model, with AAPL returns as the response and AMZN returns as the predictor.

linearMod <- lm(AAPL ~ AMZN, data=data)  # build linear regression model on full data
summary(linearMod)
## 
## Call:
## lm(formula = AAPL ~ AMZN, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.108204 -0.006794  0.000048  0.007318  0.084976 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0001933  0.0004039   0.478    0.632    
## AMZN        0.5380647  0.0206884  26.008   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01552 on 1483 degrees of freedom
## Multiple R-squared:  0.3132, Adjusted R-squared:  0.3128 
## F-statistic: 676.4 on 1 and 1483 DF,  p-value: < 2.2e-16
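One way to read the fitted line: on average, a 1% daily move in AMZN is associated with roughly a 0.54% move in AAPL on the same day. The sketch below plugs the coefficients printed above into the regression equation; `fitted_aapl` is a hypothetical helper name. With the live model object, `predict(linearMod, newdata = data.frame(AMZN = c(-0.01, 0, 0.01)))` should give the same numbers up to coefficient rounding.

```r
# Coefficients copied from the summary(linearMod) output above
intercept <- 0.0001933
slope     <- 0.5380647

# Fitted AAPL log return for given AMZN log returns: a + b * x
fitted_aapl <- function(amzn_ret) intercept + slope * amzn_ret

fitted_aapl(c(-0.01, 0, 0.01))
```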

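The R-squared is only 0.31, even though the correlation is 0.56. This is not a contradiction: with a single predictor, R-squared is simply the square of the correlation (0.5597² ≈ 0.313), so AMZN's daily returns account for about 31% of the variance in AAPL's daily returns.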
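In a simple regression with one predictor, R-squared equals the squared Pearson correlation r, and the slope equals r * sd(y) / sd(x). The sketch below checks both identities on simulated data; the seed, sample size and coefficients are arbitrary choices and are not the AAPL/AMZN series.

```r
set.seed(42)
n <- 1000
x <- rnorm(n, sd = 0.02)               # stand-in for AMZN daily returns
y <- 0.5 * x + rnorm(n, sd = 0.015)    # stand-in for AAPL daily returns

fit <- lm(y ~ x)
r   <- cor(x, y)

# R-squared reported by lm() versus the squared correlation
r2_lm  <- summary(fit)$r.squared
r2_cor <- r^2

# Fitted slope versus r * sd(y) / sd(x)
slope_lm  <- unname(coef(fit)["x"])
slope_cor <- r * sd(y) / sd(x)

all.equal(r2_lm, r2_cor)
all.equal(slope_lm, slope_cor)
```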

Multiple Regression

We now bring in three other technology stocks, Facebook, Google and Netflix, and repeat the exercise to see if we can come up with a better fit. Together with Apple and Amazon, these five stocks are commonly referred to as FAANG.

# Facebook now trades as META, so the "FB" symbol may no longer return data
fb   <- getSymbols.yahoo("FB",   from = '2015-01-01', auto.assign = FALSE)[, 6]
goog <- getSymbols.yahoo("GOOG", from = '2015-01-01', auto.assign = FALSE)[, 6]
nflx <- getSymbols.yahoo("NFLX", from = '2015-01-01', auto.assign = FALSE)[, 6]
fb_daily<-periodReturn(fb, period='daily', type='log')
goog_daily<-periodReturn(goog, period='daily', type='log')
nflx_daily<-periodReturn(nflx, period='daily', type='log')
data<-cbind(data, fb_daily, goog_daily, nflx_daily)
names(data)<-c("AAPL", "AMZN", "FB", "GOOG", "NFLX")
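Repeating one line per ticker invites copy-paste slips, such as assigning one ticker's prices to another ticker's variable. A less error-prone pattern is to keep all prices in one matrix and compute every column's log returns at once. The sketch below uses base R and a small hypothetical price matrix; with real data, the columns would be the adjusted prices downloaded above.

```r
# Hypothetical adjusted closing prices: rows are days, columns are tickers
prices <- cbind(
  AAPL = c(100, 101, 99, 102),
  AMZN = c(300, 306, 303, 309),
  FB   = c(80, 79, 81, 82)
)

# diff() on a matrix differences consecutive rows, so this yields the daily
# log returns of every ticker in one step (one fewer row than prices)
returns <- diff(log(prices))

dim(returns)
colnames(returns)
```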

Now let us run the multiple regression.

MultMod <- lm(AAPL ~ AMZN+FB+GOOG+NFLX, data=data)  # build linear regression model on full data
summary(MultMod)
## 
## Call:
## lm(formula = AAPL ~ AMZN + FB + GOOG + NFLX, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.085362 -0.007011  0.000159  0.006945  0.086697 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0002336  0.0003608   0.648  0.51737    
## AMZN        0.1722568  0.0267929   6.429 1.73e-10 ***
## FB          0.2149750  0.0251871   8.535  < 2e-16 ***
## GOOG        0.3367687  0.0319229  10.549  < 2e-16 ***
## NFLX        0.0513347  0.0164420   3.122  0.00183 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01386 on 1480 degrees of freedom
## Multiple R-squared:  0.4534, Adjusted R-squared:  0.452 
## F-statistic:   307 on 4 and 1480 DF,  p-value: < 2.2e-16
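The adjusted R-squared penalises extra predictors: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. Here n = 1485 (the 1480 residual degrees of freedom plus the 5 estimated coefficients) and p = 4, and plugging in the printed R-squared values recovers the printed adjusted values for both models. The helper name `adj_r2` is illustrative.

```r
# Adjusted R-squared for a model with p predictors fitted on n observations
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# R-squared values taken from the two summary() outputs above
multiple <- adj_r2(r2 = 0.4534, n = 1485, p = 4)
simple   <- adj_r2(r2 = 0.3132, n = 1485, p = 1)

round(c(multiple = multiple, simple = simple), 4)
```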

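Although the overall F-test p-value is tiny and all four coefficients are statistically significant, the four technology stocks together explain only about 45% of the variation in Apple's daily returns (adjusted R-squared of 0.452). The rest of the variation is likely driven by company-specific opportunities and risks.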

Here is a correlation plot (corrplot) of all five return series.

data %>% 
  cor(., use = "complete.obs") %>%
  corrplot(., method = "number", type = "upper", tl.col = "black", tl.cex=.8, diag = FALSE)
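The predictors are themselves correlated, which helps explain why the AMZN coefficient falls from about 0.54 in the simple model to about 0.17 once FB, GOOG and NFLX are added: the four regressors share much of the same information. A common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below computes it in base R on simulated correlated predictors; the `vif_base` helper and the simulated columns are illustrative and not part of the original analysis (the car package also provides a `vif()` function).

```r
# Variance inflation factor for each column of a data frame of predictors
vif_base <- function(X) {
  sapply(names(X), function(j) {
    others <- setdiff(names(X), j)
    f  <- reformulate(others, response = j)   # e.g. AMZN ~ FB + GOOG + NFLX
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(1)
sector <- rnorm(500, sd = 0.01)   # simulated common "sector" factor
X <- data.frame(
  AMZN = sector + rnorm(500, sd = 0.010),
  FB   = sector + rnorm(500, sd = 0.010),
  GOOG = sector + rnorm(500, sd = 0.008),
  NFLX = sector + rnorm(500, sd = 0.020)
)

round(vif_base(X), 2)
```

On the post's own data, something like `vif_base(na.omit(as.data.frame(data[, c("AMZN", "FB", "GOOG", "NFLX")])))` is one way to apply it (an untested suggestion).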

Conclusion

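It does not look like we can reliably predict Apple's returns from the four other technology tickers used here. Note also that these regressions use same-day returns, so they describe how the stocks move together rather than forecast future returns. Analysts would do better to rely on company-specific research when picking stocks for their portfolios.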