Gold vs Gold - Multiple Linear Regression with a Dichotomous Variable

Using Gold ETF (GLD) prices and S&P 500 prices from Yahoo, and Gold spot prices from FRED, I fit a multiple linear regression to test whether monthly percent changes in the Gold spot price and negative-return months in the S&P 500 (“down market” months) have a contemporaneous effect on monthly percent changes in the Gold ETF (the dependent variable).

The aim of this exercise is to look for investor “safe-haven” behavior when equity markets (as proxied by the S&P 500) are down, and to see whether changes in the Gold spot price explain changes in the GLD ETF. One caveat is that the GLD ETF is supposed to mirror the Gold spot price, so the two time series overlap heavily in both pattern and distribution.
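
In equation form, the regression described above can be sketched as follows. The symbol names ($\beta_0$, $\beta_1$, $\beta_2$, $D_t$, and the $\Delta$ prefix for a monthly percent change) are illustrative assumptions and do not come from the original write-up.

```latex
\Delta\mathrm{GLD}_t = \beta_0 + \beta_1\,\Delta\mathrm{GOLD}_t + \beta_2\,D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
1 & \text{if } \Delta\mathrm{SPX}_t < 0 \\
0 & \text{otherwise}
\end{cases}
```

Under this reading, a down month in the S&P 500 shifts the intercept by $\beta_2$, so a safe-haven effect beyond what the spot price already explains would be expected to show up as a positive and statistically significant $\beta_2$.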

library(quantmod)  # supplies getSymbols() and attaches xts (endpoints(), plot.xts())
getSymbols("GLD",src="yahoo",auto.assign=TRUE)
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## [1] "GLD"
getSymbols("^GSPC",src="yahoo",auto.assign=TRUE)
## [1] "GSPC"
GOLD = getSymbols("GOLDPMGBD228NLBM",src="FRED",auto.assign=FALSE)

# drop NAs
GLD = na.omit(GLD)
GOLD = na.omit(GOLD)
SPX = na.omit(GSPC)

# convert to monthly using last
GLD.monthly = GLD[endpoints(GLD,on="months",k=1),]
GOLD.monthly = GOLD[endpoints(GOLD,on="months",k=1),]
SPX.monthly = SPX[endpoints(SPX,on="months",k=1),]


# Plot monthly data
par(mfrow=c(1,1))
plot.xts(GLD.monthly$GLD.Adjusted["2007-02-28::2018-03-29"], main="Gold ETF Price")

plot.xts(GOLD.monthly["2007-02-28::2018-03-29"], main="Gold Spot Price")

plot.xts(SPX.monthly$GSPC.Adjusted["2007-02-28::2018-03-29"], main="S&P 500")

# calculate monthly log returns in percent (approximately the pct change)
GLD.pct = log(GLD.monthly/lag(GLD.monthly,1))*100
GOLD.pct = log(GOLD.monthly/lag(GOLD.monthly,1))*100
SPX.pct = log(SPX.monthly/lag(SPX.monthly,1))*100

# select same sample period
GLD.pct05 = GLD.pct$GLD.Adjusted["2007-02-28::2018-03-29"]
GOLD.pct05 = GOLD.pct["2007-02-28::2018-03-29"]
SPX.pct05 = SPX.pct$GSPC.Adjusted["2007-02-28::2018-03-29"]
# dummy variable: 1 if the S&P 500 fell that month, 0 otherwise
SPX_DV = ifelse(SPX.pct05 < 0, 1, 0)
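
The code above computes log returns (scaled by 100) and treats them as percent changes. A minimal sketch of how the two measures compare, and how the down-month dummy is derived, is given below. The prices are made-up numbers chosen for illustration and are not taken from the GLD, FRED, or S&P 500 series; only base R is used.

```r
# Hypothetical month-end prices (made-up values, not the series used above)
prices <- c(100, 105, 98, 98.5, 110)

# Log return in percent: the same transformation as log(x / lag(x, 1)) * 100
log_ret <- diff(log(prices)) * 100

# Simple percent change, for comparison
simple_ret <- (prices[-1] / prices[-length(prices)] - 1) * 100

# Down-month dummy: 1 when the return is negative, 0 otherwise
down <- ifelse(log_ret < 0, 1, 0)

print(round(cbind(log_ret, simple_ret, down), 3))
```

Log and simple returns always share the same sign, so the dummy is the same whichever measure is used; the two only diverge noticeably for large moves.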

EDA

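There is a strong positive contemporaneous correlation (0.97) between the dependent variable (pct change in the GLD ETF) and the pct change in gold spot prices. However, the dummy variable (1 if the S&P 500 was down for the month, 0 if it was up) isn’t statistically significant (p = 0.756). The adjusted R-squared is high at 0.948, which is essentially the squared correlation between the two gold series (0.974² ≈ 0.949), so the dummy adds almost no explanatory power.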
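
To make the role of the dummy coefficient concrete, here is a small sketch on simulated data. Everything in it is an assumption made for illustration: the variable names `x`, `d`, and `y`, the slope of 0.95, and the noise levels are not the real series. It shows that the dummy coefficient is the shift in the predicted value between a down month and an up month, holding the spot-price change fixed.

```r
set.seed(1)
n <- 134                     # same number of months as the sample above
x <- rnorm(n, sd = 5)        # stand-in for the spot-price % change
d <- rbinom(n, 1, 0.4)       # stand-in for the down-month dummy
y <- 0.95 * x + rnorm(n)     # the true dummy effect is zero by construction

fit <- lm(y ~ x + d)

# Predicted y at d = 1 minus predicted y at d = 0, with x held at 0
gap <- predict(fit, data.frame(x = 0, d = 1)) -
       predict(fit, data.frame(x = 0, d = 0))

print(coef(fit))
print(unname(gap))           # equals the coefficient on d
```

Because the simulated dummy has no real effect, its estimate should be small relative to its standard error, which is the same pattern the regression output for `SPX_DV` shows.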

The residuals look roughly random, as we would like (see the Q-Q plot and the residuals-vs-fitted plot), but a few outliers (observations 61, 62, and 96) are influencing the fitted relationship.

par(mfrow=c(2,2))

# Boxplots of continuous variables
boxplot(coredata(GLD.pct05),main="Boxplot of GLD ETF, Monthly % Ch.")
boxplot(coredata(GOLD.pct05),main="Boxplot of GOLD Spot Prices, Monthly % Ch.")
par(mfrow=c(1,1))

# Table of how many down (1) vs up (0) months in the S&P 500
table(coredata(SPX_DV))
## 
##  0  1 
## 83 51
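# Mosaic plot of GLD ETF direction vs S&P 500 down/up months
GLD_DV = ifelse(GLD.pct05 < 0, 1, 0)  # 1 if the GLD ETF fell that month, 0 otherwise
plot(table(GLD=as.vector(coredata(GLD_DV)),SPX=as.vector(coredata(SPX_DV))),main="Mosaic plot of GLD ETF and S&P down/up months")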

# Plot of Gold ETF % ch. vs Gold Spot Price % ch.
plot(coredata(GLD.pct05)~coredata(GOLD.pct05), main="Gold ETF % Ch. vs Gold Spot Price % Ch.", xlab="Gold Spot Price % ch.", ylab="Gold ETF % ch.", col="orange", pch=16, cex=1.3)
abline(lm(GLD.pct05 ~ GOLD.pct05))

# Correlation
cor(coredata(GLD.pct05), coredata(GOLD.pct05))
##              GOLDPMGBD228NLBM
## GLD.Adjusted        0.9740247
# Multiple linear regression
model2 = lm(GLD.pct05 ~ GOLD.pct05 + SPX_DV)
summary(model2)
## 
## Call:
## lm(formula = GLD.pct05 ~ GOLD.pct05 + SPX_DV)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4366 -0.6306 -0.0451  0.5627  4.6367 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.01425    0.13387   0.106    0.915    
## GOLD.pct05   0.95449    0.01940  49.207   <2e-16 ***
## SPX_DV      -0.06759    0.21702  -0.311    0.756    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.218 on 131 degrees of freedom
## Multiple R-squared:  0.9488, Adjusted R-squared:  0.948 
## F-statistic:  1213 on 2 and 131 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model2)
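
The diagnostic plots flag outliers visually. One way to quantify their influence is Cook's distance, sketched below on simulated data. The data, the planted outlier positions, and the 4/n cutoff (a common rule of thumb, not a formal test) are all assumptions made for illustration; this is not a rerun of `model2`.

```r
set.seed(42)
n <- 134
x <- rnorm(n, sd = 5)                      # stand-in for the spot-price % change
y <- 0.95 * x + rnorm(n)                   # stand-in for the ETF % change

# Plant three large outliers at hypothetical positions
planted <- c(61, 62, 96)
y[planted] <- y[planted] + c(10, -10, 10)

fit <- lm(y ~ x)

# Cook's distance measures how much each observation moves the fitted coefficients
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)               # rule-of-thumb cutoff

# Refit without the flagged observations and compare the coefficients
keep <- setdiff(seq_len(n), flagged)
fit_trim <- lm(y ~ x, subset = keep)

print(flagged)
print(rbind(full = coef(fit), trimmed = coef(fit_trim)))
```

If the slope barely changes after dropping the flagged observations, the outliers affect the residual spread more than the estimated relationship itself.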