Overview
This post runs a regression between two technology stocks, Apple and Amazon. We use the quantmod package to calculate the stock returns on a daily and monthly basis, then regress one stock's daily returns on the other's to see if we can use one stock's returns to predict the returns of the other. We then bring in three other technology stocks and run a multiple linear regression to see if we can come up with a better fit.
Load Libraries
library(quantmod)
library(tidyverse)
library(corrplot)
Extracting the data
We will be using prices starting 1 January 2015. First we extract the price data for both tickers.
# Column 6 of the Yahoo data is the adjusted closing price
aapl <- getSymbols.yahoo("AAPL", from='2015-01-01', auto.assign = FALSE)[,6]
amzn <- getSymbols.yahoo("AMZN", from='2015-01-01', auto.assign = FALSE)[,6]
Calculate returns
Next we will calculate the daily and monthly returns using the quantmod package.
aapl_daily<-periodReturn(aapl, period='daily', type='log')
aapl_monthly<-periodReturn(aapl, period='monthly', type='log')
amzn_daily<-periodReturn(amzn, period='daily', type='log')
amzn_monthly<-periodReturn(amzn, period='monthly', type='log')
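For reference, a log return is the difference between the log prices of consecutive periods. The sketch below reproduces that calculation in base R on a made-up price vector (the prices are illustrative, not market data). It is only meant to show what type='log' asks for, not how quantmod implements it internally.

```r
# Hypothetical closing prices, for illustration only
prices <- c(100, 102, 101, 105)

# Log return for period t: log(P_t / P_{t-1}) = log(P_t) - log(P_{t-1})
log_returns <- diff(log(prices))

# Log returns add up across periods: their sum is the log return
# over the whole window, log(105 / 100)
total_log_return <- sum(log_returns)

print(log_returns)
print(total_log_return)
```

This additivity is why a monthly log return equals the sum of the daily log returns within that month.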
Let us have a look at the correlations first. The daily returns of the two stocks have a correlation of about 56%, and the monthly returns about 47%.
print('Correlation daily returns')
## [1] "Correlation daily returns"
cor(aapl_daily, amzn_daily)
##               daily.returns
## daily.returns      0.559679
print('Correlation monthly returns')
## [1] "Correlation monthly returns"
cor(aapl_monthly, amzn_monthly)
##                 monthly.returns
## monthly.returns       0.4663392
We will now plot the returns for these stocks
par(mfrow=c(1,2))
chartSeries(aapl_daily,theme=chartTheme("white"))
chartSeries(amzn_daily,theme=chartTheme("white"))
Simple Linear Regression
We will now run a linear regression on the two daily return series. First we combine the data into one object and rename the columns.
data<-cbind(aapl_daily, amzn_daily)
names(data)<-c("AAPL", "AMZN")
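We draw a scatter plot to look at the patterns. As expected, most of the daily return values fall in the range -2.5% to 2.5%.
# Response (AAPL) on the y-axis, predictor (AMZN) on the x-axis, matching the model AAPL ~ AMZN
scatter.smooth(x=as.numeric(data$AMZN), y=as.numeric(data$AAPL), xlab="AMZN", ylab="AAPL", main="AAPL ~ AMZN")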
We now fit the linear model
linearMod <- lm(AAPL ~ AMZN, data=data) # build linear regression model on full data
summary(linearMod)
##
## Call:
## lm(formula = AAPL ~ AMZN, data = data)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.108204 -0.006794  0.000048  0.007318  0.084976
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0001933  0.0004039   0.478    0.632
## AMZN        0.5380647  0.0206884  26.008   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01552 on 1483 degrees of freedom
## Multiple R-squared: 0.3132, Adjusted R-squared: 0.3128
## F-statistic: 676.4 on 1 and 1483 DF, p-value: < 2.2e-16
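We see a low R-squared of 31%. This is exactly what the 56% correlation implies: in a regression with a single predictor, R-squared equals the squared correlation (0.5597^2 is about 0.313), so Amazon's daily returns explain less than a third of the variation in Apple's.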
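In a regression with one predictor, R-squared is the square of the correlation, so a correlation of 0.56 corresponds to an R-squared of about 0.31. The sketch below checks this identity on simulated data (the series are random numbers, not the stock returns used in this post), and also shows that the fitted slope is the correlation scaled by the ratio of standard deviations.

```r
set.seed(42)  # reproducible simulated data, not market data

# Simulate two correlated "return" series (hypothetical values)
n <- 1000
x <- rnorm(n, sd = 0.02)
y <- 0.5 * x + rnorm(n, sd = 0.015)

fit <- lm(y ~ x)
r <- cor(x, y)
r_squared <- summary(fit)$r.squared
slope <- unname(coef(fit)["x"])

# With one predictor, R-squared is the squared correlation ...
print(c(r_squared = r_squared, cor_squared = r^2))

# ... and the slope is the correlation times sd(y) / sd(x)
print(c(slope = slope, cor_scaled = r * sd(y) / sd(x)))
```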
Multiple Regression
We will now bring in three other technology stocks, Facebook, Google and Netflix, and do the same to see if we can come up with a better fit. Together these five stocks are commonly referred to as FAANG.
# Facebook (Meta) has traded as META since June 2022, so "FB" may no longer resolve on Yahoo
fb <- getSymbols.yahoo("FB", from='2015-01-01', auto.assign = FALSE)[,6]
# Assign only to goog and nflx; the original chained assignment also overwrote aapl
goog <- getSymbols.yahoo("GOOG", from='2015-01-01', auto.assign = FALSE)[,6]
nflx <- getSymbols.yahoo("NFLX", from='2015-01-01', auto.assign = FALSE)[,6]
fb_daily<-periodReturn(fb, period='daily', type='log')
goog_daily<-periodReturn(goog, period='daily', type='log')
nflx_daily<-periodReturn(nflx, period='daily', type='log')
data<-cbind(data, fb_daily, goog_daily, nflx_daily)
names(data)<-c("AAPL", "AMZN", "FB", "GOOG", "NFLX")
Let us run the regression now
MultMod <- lm(AAPL ~ AMZN+FB+GOOG+NFLX, data=data) # build linear regression model on full data
summary(MultMod)
##
## Call:
## lm(formula = AAPL ~ AMZN + FB + GOOG + NFLX, data = data)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.085362 -0.007011  0.000159  0.006945  0.086697
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0002336  0.0003608   0.648  0.51737
## AMZN        0.1722568  0.0267929   6.429 1.73e-10 ***
## FB          0.2149750  0.0251871   8.535  < 2e-16 ***
## GOOG        0.3367687  0.0319229  10.549  < 2e-16 ***
## NFLX        0.0513347  0.0164420   3.122  0.00183 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01386 on 1480 degrees of freedom
## Multiple R-squared: 0.4534, Adjusted R-squared: 0.452
## F-statistic: 307 on 4 and 1480 DF, p-value: < 2.2e-16
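We find that even though every coefficient is highly significant (low p-values), the four technology stocks together return an adjusted R-squared of only 45%. The rest of the variation in Apple's returns is not explained by these stocks and is presumably due to company-specific opportunities and other factors outside the model.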
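As a quick check on the adjusted R-squared, it can be recomputed by hand from the figures in the regression output above. The sketch uses the standard adjustment formula, with the number of observations inferred from the residual degrees of freedom. The variable names are illustrative.

```r
# Values read from the summary(MultMod) output above
r_squared <- 0.4534      # Multiple R-squared
p <- 4                   # number of predictors: AMZN, FB, GOOG, NFLX
resid_df <- 1480         # residual degrees of freedom
n <- resid_df + p + 1    # implied number of observations

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(adj_r_squared, 3))  # 0.452, matching the reported value
```

With almost 1,500 observations and only four predictors the penalty is tiny, which is why the adjusted and unadjusted figures are nearly identical.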
Here is a corrplot for all the returns.
data %>%
cor(., use = "complete.obs") %>%
corrplot(., method = "number", type = "upper", tl.col = "black", tl.cex=.8, diag = FALSE)
Conclusion
It does not look like we can predict the returns of Apple stock well with the four other technology tickers we have used here. Analysts are better off doing their own company-specific research when picking stocks for their portfolios.