Now write a function to automate the task:
library(tidyverse)
library(tidyquant)
library(tibbletime)
library(moments)
ticks <- c("QQQ","GLD","ALB")
yourfunction <- function(tick=ticks,fromDate="2019-01-01",toDate="2019-10-31") {
x <- tq_get(tick, get = "stock.prices", from = fromDate, to = toDate)
x %>% group_by(symbol) %>%
dplyr::mutate(value = log(adjusted)-log(dplyr::lag(adjusted))) %>%
stats::na.omit() %>%
dplyr::summarise(mean = mean(value),
sd = sd(value),
skewness= moments::skewness(value),
kurt = moments::kurtosis(value))
}
# test your function as per below
yourfunction(tick=c("FB","V","PFE"),fromDate="2019-01-01",toDate=as.character(Sys.Date()))
## # A tibble: 3 x 5
## symbol mean sd skewness kurt
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 FB 0.00169 0.0183 0.418 8.30
## 2 PFE -0.000546 0.0126 -0.978 6.18
## 3 V 0.00135 0.0120 -0.479 5.05
You have chosen the following sector ETFs:
XLV - Health Care.XLE - Energy.XLK - Technology.Use the period from “2018-01-01” to “2019-10-31”.
tick=c("SPY","XLV","XLE","XLK")
df <- tq_get(tick, get = "stock.prices",
from = "2018-01-01",
to = "2019-10-31")
df.wide <- df %>% group_by(symbol) %>% dplyr::mutate(value = log(adjusted)-log(dplyr::lag(adjusted))) %>%
stats::na.omit() %>% select(date, symbol, value) %>%
pivot_wider(names_from = symbol, values_from = value)
Compute the correlations between the three independent variables and, in a few words, discuss its implications on assumptions made in multivariate regression models.
library(PerformanceAnalytics)
dftxs <- df.wide %>%
timetk::tk_xts(rename_index = "date")
PerformanceAnalytics::chart.Correlation(dftxs)
#or to calculate it by formula
cor(df.wide$XLV, df.wide$XLE)
## [1] 0.5875933
cor(df.wide$XLV, df.wide$XLK)
## [1] 0.7651036
cor(df.wide$XLE, df.wide$XLK)
## [1] 0.6265774
Correlations are shown in the graph. In a multivariate regression analysis, we assume no multicollinearity. According to the correlation between the three variables, there are multicollinearity between them, because they are all correlated. This can be a problem for the regression model since the “indenpendent varaibles” are not indenpendent. With multicollinearity, one should be careful when adding variables into the model, since it can falsely increase R square from correlation between the variables. #### C) Build Your Multiple Regressions (6 points)
XLK.XLV.XLE.Print your results and explain in a few bullets your thoughts and conclusions on the quality and significance of this multiple regression.
fit1 <- lm(SPY ~ XLK, df.wide)
summary(fit1)
##
## Call:
## lm(formula = SPY ~ XLK, data = df.wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0139371 -0.0018300 -0.0000934 0.0019978 0.0107294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.133e-05 1.597e-04 -0.447 0.655
## XLK 6.722e-01 1.171e-02 57.421 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.003422 on 458 degrees of freedom
## Multiple R-squared: 0.878, Adjusted R-squared: 0.8778
## F-statistic: 3297 on 1 and 458 DF, p-value: < 2.2e-16
fit2 <- lm(SPY ~ XLK + XLV, df.wide)
summary(fit2)
##
## Call:
## lm(formula = SPY ~ XLK + XLV, data = df.wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0067542 -0.0016756 -0.0000728 0.0017316 0.0075759
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.707e-05 1.175e-04 -0.571 0.568
## XLK 4.703e-01 1.337e-02 35.163 <2e-16 ***
## XLV 3.561e-01 1.805e-02 19.729 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002518 on 457 degrees of freedom
## Multiple R-squared: 0.9341, Adjusted R-squared: 0.9338
## F-statistic: 3241 on 2 and 457 DF, p-value: < 2.2e-16
fit3 <- lm(SPY ~ XLK + XLV + XLE, df.wide)
summary(fit3)
##
## Call:
## lm(formula = SPY ~ XLK + XLV + XLE, data = df.wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0080237 -0.0013128 -0.0000201 0.0013766 0.0063243
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.573e-05 9.868e-05 0.362 0.717
## XLK 4.137e-01 1.191e-02 34.740 <2e-16 ***
## XLV 3.094e-01 1.548e-02 19.989 <2e-16 ***
## XLE 1.369e-01 9.787e-03 13.989 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002108 on 456 degrees of freedom
## Multiple R-squared: 0.9539, Adjusted R-squared: 0.9536
## F-statistic: 3146 on 3 and 456 DF, p-value: < 2.2e-16
plot(fit3)
The model, at first glance, looks very significant with an adjusted R sqaure of 95.36% and very low P values. However, due to the multicollinearity between the variables, the incremental value increased in R square might be a results from the correlation between the variables. This could lead to inaccurate coefficient estimate since the model has hard time isolating variables. The residue analysis shows that the model has high goodness of fit. In conclusion, the regression model shows that SPY is positively correlated with all three variables and the correlations are significant. One should also take multicollinearity into the consideration when discussing the reliability of the model.
Thinking hedge ratios, how many units of each of the three sectors ETFs would you buy to replicate the SPY return?
cor(df.wide$SPY, df.wide$XLV)*sd(df.wide$SPY)/sd(df.wide$XLV)
## [1] 0.8417804
cor(df.wide$SPY, df.wide$XLE)*sd(df.wide$SPY)/sd(df.wide$XLE)
## [1] 0.5437735
cor(df.wide$SPY, df.wide$XLK)*sd(df.wide$SPY)/sd(df.wide$XLK)
## [1] 0.6721947
One can buy 0.8417804 unit of XLV, 0.5437735 unit of XLE, 0.6721947 units of XLK to replicate SPY return.