Using Gold ETF (GLD) prices from Yahoo Finance and gold spot prices from FRED, I estimated a simple linear regression to see whether monthly percent changes in the gold spot price (independent variable) explain monthly percent changes in the Gold ETF (dependent variable) on a contemporaneous basis. Percent changes are computed as log changes multiplied by 100.
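Written out, the model being estimated looks like the following. The symbols are notation introduced here for illustration: P_t is a month-end price, r_t its log percent change (matching the `log(x/lag(x,1))*100` step in the code), and alpha, beta and epsilon_t are the intercept, slope and error term of the regression.

```latex
r_t = 100\,\ln\!\left(\frac{P_t}{P_{t-1}}\right), \qquad
r^{\mathrm{GLD}}_t = \alpha + \beta\, r^{\mathrm{Gold}}_t + \varepsilon_t
```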
library(quantmod)  # provides getSymbols() and attaches xts (endpoints(), plot.xts())
getSymbols("GLD",src="yahoo",auto.assign=TRUE)
## [1] "GLD"
GOLD = getSymbols("GOLDPMGBD228NLBM",src="FRED",auto.assign=FALSE)
# drop NAs
GLD = na.omit(GLD)
GOLD = na.omit(GOLD)
# convert to monthly using last
GLD.monthly = GLD[endpoints(GLD,on="months",k=1),]
GOLD.monthly = GOLD[endpoints(GOLD,on="months",k=1),]
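To make the "last observation of each month" step concrete without needing quantmod or a network connection, here is a minimal base-R sketch on a made-up weekday-only series. It picks the same rows that `endpoints(x, on = "months")` is meant to select; the dates and prices are synthetic, not GLD data.

```r
# Synthetic weekday-only "daily" series (made-up prices, illustration only)
dates  <- seq(as.Date("2018-01-01"), as.Date("2018-03-31"), by = "day")
dates  <- dates[!as.POSIXlt(dates)$wday %in% c(0, 6)]  # drop Sundays (0) and Saturdays (6)
prices <- 100 + seq_along(dates)

# Row index of the last observation in each calendar month,
# i.e. the rows that x[endpoints(x, on = "months"), ] keeps
month_id <- format(dates, "%Y-%m")
last_idx <- as.vector(tapply(seq_along(dates), month_id, max))

monthly <- data.frame(date = dates[last_idx], price = prices[last_idx])
print(monthly)
```

March 31, 2018 fell on a Saturday, so the March row is the last weekday (March 30), which is exactly the behaviour wanted for month-end market prices.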
# Plot monthly data
par(mfrow=c(1,1))
plot.xts(GLD.monthly$GLD.Adjusted["2007-02-28::2018-03-29"], main="Gold ETF Price")
plot.xts(GOLD.monthly["2007-02-28::2018-03-29"], main="Gold Spot Price")
# monthly percent change, computed as a log change: 100 * log(P_t / P_{t-1})
GLD.pct = log(GLD.monthly/lag(GLD.monthly,1))*100
GOLD.pct = log(GOLD.monthly/lag(GOLD.monthly,1))*100
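The "pct change" above is a log (continuously compounded) percent change rather than a simple percent change. A small base-R sketch with made-up prices shows how the two differ; the prices are hypothetical.

```r
# Made-up month-end prices (illustration only)
p <- c(100, 105, 99.75, 110)

# Log percent change, as in the code above: 100 * ln(P_t / P_{t-1})
log_pct <- 100 * diff(log(p))

# Simple (arithmetic) percent change: 100 * (P_t / P_{t-1} - 1)
simple_pct <- 100 * (p[-1] / p[-length(p)] - 1)

round(cbind(log_pct, simple_pct), 3)
```

For a +5% simple move the log change is about 4.88, and for a -5% move it is about -5.13; the two are close for small monthly moves, and log changes have the convenient property of adding up across months.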
# select same sample period
GLD.pct05 = GLD.pct$GLD.Adjusted["2007-02-28::2018-03-29"]
GOLD.pct05 = GOLD.pct["2007-02-28::2018-03-29"]
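`lm()` and `cor()` pair the two subsets row by row, so the analysis implicitly assumes both contain the same months in the same order. A minimal base-R sketch of making that pairing explicit is shown below, matching on calendar month rather than exact date. The two data frames and their values are hypothetical, chosen so that one month-end date differs between the series.

```r
# Hypothetical monthly % changes; the February month-end dates differ
etf  <- data.frame(date = as.Date(c("2018-01-31", "2018-02-28", "2018-03-29")),
                   etf_pct = c(3.1, -2.0, 0.5))
spot <- data.frame(date = as.Date(c("2018-01-31", "2018-02-27", "2018-03-29")),
                   spot_pct = c(3.0, -2.1, 0.6))

# Joining on the exact date silently drops February
exact <- merge(etf, spot, by = "date")

# Joining on the calendar month keeps every month present in both series
etf$month  <- format(etf$date,  "%Y-%m")
spot$month <- format(spot$date, "%Y-%m")
paired <- merge(etf[, c("month", "etf_pct")],
                spot[, c("month", "spot_pct")],
                by = "month")  # inner join on month

fit <- lm(etf_pct ~ spot_pct, data = paired)
print(paired)
print(coef(fit))
```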
There is a strong positive contemporaneous correlation between the two variables (about 0.97). A simple linear regression captures the relationship well: the slope on the spot-price change is about 0.95 and highly statistically significant (t ≈ 49), the intercept is not significantly different from zero, and R-squared is about 0.95. The four diagnostic plots of the model show that the residuals are scattered roughly randomly around zero, as we would like (see plot “Residuals vs. Fitted”), and are roughly normally distributed apart from some outliers (see plot “Normal Q-Q”). A few of those outliers, however, appear to influence the estimated relationship (points 62, 77, and 96; see plot “Residuals vs. Leverage”).
plot(coredata(GLD.pct05)~coredata(GOLD.pct05), main="Gold ETF % Ch. vs Gold Spot Price % Ch.", xlab="Gold Spot Price % ch.", ylab="Gold ETF % ch.", col="orange", pch=16, cex=1.3)
abline(lm(GLD.pct05 ~ GOLD.pct05))
# Correlation
cor(coredata(GLD.pct05), coredata(GOLD.pct05))
##              GOLDPMGBD228NLBM
## GLD.Adjusted        0.9740247
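With a single regressor, the regression R-squared equals the squared correlation, so the correlation above already implies the fit quality: 0.9740247² ≈ 0.9487, the Multiple R-squared in the regression summary. A quick base-R check of the identity on synthetic data (the simulated series are made up, loosely mimicking the scale of the real ones):

```r
set.seed(1)
x <- rnorm(134, sd = 5)               # synthetic "spot" % changes
y <- 0.95 * x + rnorm(134, sd = 1.2)  # synthetic "ETF" % changes

r_xy <- cor(x, y)
r2   <- summary(lm(y ~ x))$r.squared
print(c(cor_squared = r_xy^2, lm_r_squared = r2))

# The same identity applied to the correlation reported above
round(0.9740247^2, 4)
```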
# Simple linear regression
model = lm(GLD.pct05 ~ GOLD.pct05)
summary(model)
##
## Call:
## lm(formula = GLD.pct05 ~ GOLD.pct05)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.4105 -0.6285 -0.0221  0.5218  4.6604
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01132    0.10538  -0.107    0.915
## GOLD.pct05   0.95420    0.01931  49.420   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.214 on 132 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9483
## F-statistic: 2442 on 1 and 132 DF, p-value: < 2.2e-16
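The t value in the table tests whether the slope is zero. For an ETF that tracks gold, a perhaps more interesting question is whether the slope is 1. A sketch of that test using the rounded numbers printed above (so the result is approximate, and it relies on the usual OLS standard errors):

```r
# Values copied from the coefficient table above (rounded as printed)
beta_hat <- 0.95420
se_beta  <- 0.01931
df_resid <- 132

# t statistic for H0: beta = 1 instead of the default H0: beta = 0
t_stat  <- (beta_hat - 1) / se_beta
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
print(c(t = t_stat, p = p_value))
```

The statistic comes out at about -2.37, which points to a slope somewhat below 1 at the 5% level, provided the standard OLS assumptions are reasonable for these data.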
par(mfrow=c(2,2))
plot(model)
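`plot(model)` labels a few extreme observations by row number (three by default, via its `id.n` argument). The quantities behind the panels are also available numerically from base R's `rstandard()`, `hatvalues()` and `cooks.distance()`. Below is a sketch on synthetic data with one injected outlier; the data and the size of the injected shock are made up. Sorting the real model's observations by Cook's distance in the same way gives a numeric counterpart to the "Residuals vs. Leverage" panel.

```r
set.seed(42)
x <- rnorm(134, sd = 5)               # synthetic regressor
y <- 0.95 * x + rnorm(134, sd = 1.2)  # synthetic response
y[100] <- y[100] + 8                  # inject one large residual (synthetic)
fit <- lm(y ~ x)

diag_tab <- data.frame(
  obs       = seq_along(y),
  std_resid = rstandard(fit),       # standardized residuals
  leverage  = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit)   # combined influence measure
)

# Observations ranked by Cook's distance, most influential first
head(diag_tab[order(-diag_tab$cooks_d), ], 3)
```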