library(tidyr)
library(dplyr)
library(ggplot2)
library(sqldf)
Silver and gold have historically been seen as close substitutes for one another. Both precious metals have been used as currency in the past and both can be used to back currency. There is significant evidence that these metals being an attractive investment in their own right and can play a useful role in diversifying risk. Therefore, we can expect that the prices share similar dynamics. However, there are also economic fundamentals that may act to drive the prices of silver and gold apart.
The main objective of this project is to study the relationship and correlation between prices of silver and gold in commodity market.
Each case represents the price of silver and gold for a day. The full dataset represents data for 48 years with approximately 12400 cases. But for this study we have taken a subset of price data for only 13 year period from 2001 to 2013.
The data is collected from Quandl website which provides free historic data on registration on the site.
This is observational study.
The datasets for this project are downloaded from the following sites after free registration :
Price of gold is the response variable. It is numerical continuous variable.
The explanatory variable is the price of silver and is numerical.
setwd(getwd())
gold_df = read.csv(file="DATA606\\LBMA-GOLD_FULL.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)
silver_df = read.csv(file="DATA606\\LBMA-SILVER_FULL.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)
gold_df$TransactionDate = as.Date(as.character(gold_df$TransactionDate), "%m/%d/%Y")
silver_df$TransactionDate = as.Date(as.character(silver_df$TransactionDate))
head(gold_df)
## TransactionDate USD GBP EURO
## 1 2017-03-28 1257.25 1001.65 1156.92
## 2 2017-03-27 1257.55 998.62 1155.56
## 3 2017-03-24 1247.50 999.62 1155.31
## 4 2017-03-23 1247.50 996.38 1156.43
## 5 2017-03-22 1249.05 1003.36 1154.62
## 6 2017-03-21 1241.60 996.88 1148.36
head(silver_df)
## TransactionDate USD GBP EURO
## 1 2017-03-28 17.94 14.2914 16.5330
## 2 2017-03-27 17.94 14.2505 16.5102
## 3 2017-03-24 17.63 14.1108 16.3135
## 4 2017-03-23 17.55 14.0434 16.2711
## 5 2017-03-22 17.58 14.1228 16.2974
## 6 2017-03-21 17.31 13.8824 16.0130
summary(gold_df)
## TransactionDate USD GBP EURO
## Min. :1968-04-01 Min. : 34.75 Min. : 14.48 Min. : 236.7
## 1st Qu.:1980-06-18 1st Qu.: 271.25 1st Qu.: 173.34 1st Qu.: 326.2
## Median :1992-09-21 Median : 374.25 Median : 227.51 Median : 566.0
## Mean :1992-09-22 Mean : 492.67 Mean : 301.55 Mean : 674.4
## 3rd Qu.:2004-12-22 3rd Qu.: 516.81 3rd Qu.: 281.07 3rd Qu.:1007.1
## Max. :2017-03-28 Max. :1895.00 Max. :1182.82 Max. :1382.3
## NA's :10 NA's :7785
summary(silver_df)
## TransactionDate USD GBP EURO
## Min. :1968-04-01 Min. : 1.272 Min. : 0.510 Min. : 3.812
## 1st Qu.:1980-06-18 1st Qu.: 4.510 1st Qu.: 2.684 1st Qu.: 5.220
## Median :1992-09-21 Median : 5.511 Median : 3.420 Median : 9.927
## Mean :1992-09-22 Mean : 8.919 Mean : 5.334 Mean :11.263
## 3rd Qu.:2004-12-22 3rd Qu.:11.962 3rd Qu.: 6.537 3rd Qu.:14.996
## Max. :2017-03-28 Max. :49.450 Max. :29.258 Max. :32.883
## NA's :7786
gold_df1 = select (gold_df, matches("TransactionDate|USD"))
gold_df1 = mutate (gold_df1, MetalPrice = "GoldPrice")
head(gold_df1, 20)
## TransactionDate USD MetalPrice
## 1 2017-03-28 1257.25 GoldPrice
## 2 2017-03-27 1257.55 GoldPrice
## 3 2017-03-24 1247.50 GoldPrice
## 4 2017-03-23 1247.50 GoldPrice
## 5 2017-03-22 1249.05 GoldPrice
## 6 2017-03-21 1241.60 GoldPrice
## 7 2017-03-20 1232.40 GoldPrice
## 8 2017-03-17 1229.60 GoldPrice
## 9 2017-03-16 1229.35 GoldPrice
## 10 2017-03-15 1198.80 GoldPrice
## 11 2017-03-14 1204.60 GoldPrice
## 12 2017-03-13 1204.20 GoldPrice
## 13 2017-03-10 1202.65 GoldPrice
## 14 2017-03-09 1206.55 GoldPrice
## 15 2017-03-08 1209.20 GoldPrice
## 16 2017-03-07 1216.65 GoldPrice
## 17 2017-03-06 1230.95 GoldPrice
## 18 2017-03-03 1226.50 GoldPrice
## 19 2017-03-02 1238.10 GoldPrice
## 20 2017-03-01 1240.40 GoldPrice
nrow(gold_df1)
## [1] 12396
silver_df1 = select (silver_df, matches("TransactionDate|USD"))
silver_df1 = mutate (silver_df1, MetalPrice = "SilverPrice")
head(silver_df1, 20)
## TransactionDate USD MetalPrice
## 1 2017-03-28 17.94 SilverPrice
## 2 2017-03-27 17.94 SilverPrice
## 3 2017-03-24 17.63 SilverPrice
## 4 2017-03-23 17.55 SilverPrice
## 5 2017-03-22 17.58 SilverPrice
## 6 2017-03-21 17.31 SilverPrice
## 7 2017-03-20 17.23 SilverPrice
## 8 2017-03-17 17.40 SilverPrice
## 9 2017-03-16 17.46 SilverPrice
## 10 2017-03-15 16.91 SilverPrice
## 11 2017-03-14 17.00 SilverPrice
## 12 2017-03-13 17.02 SilverPrice
## 13 2017-03-10 16.89 SilverPrice
## 14 2017-03-09 17.14 SilverPrice
## 15 2017-03-08 17.40 SilverPrice
## 16 2017-03-07 17.70 SilverPrice
## 17 2017-03-06 17.81 SilverPrice
## 18 2017-03-03 17.66 SilverPrice
## 19 2017-03-02 18.33 SilverPrice
## 20 2017-03-01 18.33 SilverPrice
nrow(silver_df1)
## [1] 12396
total_df = rbind(gold_df1, silver_df1)
total_df = arrange(total_df, TransactionDate)
head(total_df, 20)
## TransactionDate USD MetalPrice
## 1 1968-04-01 37.700 GoldPrice
## 2 1968-04-01 2.259 SilverPrice
## 3 1968-04-02 37.300 GoldPrice
## 4 1968-04-02 2.202 SilverPrice
## 5 1968-04-03 37.600 GoldPrice
## 6 1968-04-03 2.195 SilverPrice
## 7 1968-04-04 36.950 GoldPrice
## 8 1968-04-04 2.125 SilverPrice
## 9 1968-04-05 37.000 GoldPrice
## 10 1968-04-05 2.104 SilverPrice
## 11 1968-04-08 37.050 GoldPrice
## 12 1968-04-08 2.134 SilverPrice
## 13 1968-04-09 37.500 GoldPrice
## 14 1968-04-09 2.205 SilverPrice
## 15 1968-04-10 37.700 GoldPrice
## 16 1968-04-10 2.203 SilverPrice
## 17 1968-04-11 38.000 GoldPrice
## 18 1968-04-11 2.161 SilverPrice
## 19 1968-04-16 38.000 GoldPrice
## 20 1968-04-16 2.253 SilverPrice
nrow(total_df)
## [1] 24792
total_df2 = spread(total_df, MetalPrice, USD)
head(total_df2, 20)
## TransactionDate GoldPrice SilverPrice
## 1 1968-04-01 37.70 2.259
## 2 1968-04-02 37.30 2.202
## 3 1968-04-03 37.60 2.195
## 4 1968-04-04 36.95 2.125
## 5 1968-04-05 37.00 2.104
## 6 1968-04-08 37.05 2.134
## 7 1968-04-09 37.50 2.205
## 8 1968-04-10 37.70 2.203
## 9 1968-04-11 38.00 2.161
## 10 1968-04-16 38.00 2.253
## 11 1968-04-17 37.80 2.303
## 12 1968-04-18 37.55 2.232
## 13 1968-04-19 37.65 2.281
## 14 1968-04-22 38.00 2.360
## 15 1968-04-23 38.40 2.360
## 16 1968-04-24 38.25 2.299
## 17 1968-04-25 38.30 2.308
## 18 1968-04-26 38.65 2.327
## 19 1968-04-29 38.75 2.396
## 20 1968-04-30 39.10 2.397
nrow(total_df2)
## [1] 12396
calcPriceRatio = function(goldPrice, silverPrice) {
round((goldPrice/silverPrice), 4)
}
total_df3 = mutate(total_df2, PriceRatio=calcPriceRatio(GoldPrice, SilverPrice), TransactionYear=substring(TransactionDate,1,4))
tail(total_df3, 20)
## TransactionDate GoldPrice SilverPrice PriceRatio TransactionYear
## 12377 2017-03-01 1240.40 18.33 67.6705 2017
## 12378 2017-03-02 1238.10 18.33 67.5450 2017
## 12379 2017-03-03 1226.50 17.66 69.4507 2017
## 12380 2017-03-06 1230.95 17.81 69.1157 2017
## 12381 2017-03-07 1216.65 17.70 68.7373 2017
## 12382 2017-03-08 1209.20 17.40 69.4943 2017
## 12383 2017-03-09 1206.55 17.14 70.3938 2017
## 12384 2017-03-10 1202.65 16.89 71.2049 2017
## 12385 2017-03-13 1204.20 17.02 70.7521 2017
## 12386 2017-03-14 1204.60 17.00 70.8588 2017
## 12387 2017-03-15 1198.80 16.91 70.8930 2017
## 12388 2017-03-16 1229.35 17.46 70.4095 2017
## 12389 2017-03-17 1229.60 17.40 70.6667 2017
## 12390 2017-03-20 1232.40 17.23 71.5264 2017
## 12391 2017-03-21 1241.60 17.31 71.7273 2017
## 12392 2017-03-22 1249.05 17.58 71.0495 2017
## 12393 2017-03-23 1247.50 17.55 71.0826 2017
## 12394 2017-03-24 1247.50 17.63 70.7601 2017
## 12395 2017-03-27 1257.55 17.94 70.0975 2017
## 12396 2017-03-28 1257.25 17.94 70.0808 2017
total_2001_2013_df = filter(total_df3, TransactionYear %in% c("2001", "2002", "2003", "2004", "2005", "2006", "2007",
"2008", "2009", "2010", "2011", "2012", "2013"))
nrow(total_2001_2013_df)
## [1] 3285
tail(total_2001_2013_df, 20)
## TransactionDate GoldPrice SilverPrice PriceRatio TransactionYear
## 3266 2013-12-02 1229.50 19.75 62.2532 2013
## 3267 2013-12-03 1217.25 19.17 63.4977 2013
## 3268 2013-12-04 1227.50 19.05 64.4357 2013
## 3269 2013-12-05 1222.50 19.46 62.8212 2013
## 3270 2013-12-06 1233.00 19.49 63.2632 2013
## 3271 2013-12-09 1237.00 19.50 63.4359 2013
## 3272 2013-12-10 1266.25 20.05 63.1546 2013
## 3273 2013-12-11 1260.75 20.39 61.8318 2013
## 3274 2013-12-12 1225.25 19.80 61.8813 2013
## 3275 2013-12-13 1232.00 19.55 63.0179 2013
## 3276 2013-12-16 1234.75 19.50 63.3205 2013
## 3277 2013-12-17 1231.75 20.02 61.5260 2013
## 3278 2013-12-18 1230.50 19.94 61.7101 2013
## 3279 2013-12-19 1196.00 19.34 61.8407 2013
## 3280 2013-12-20 1195.25 19.33 61.8339 2013
## 3281 2013-12-23 1199.00 19.37 61.8998 2013
## 3282 2013-12-24 1214.50 19.40 62.6031 2013
## 3283 2013-12-27 1214.50 19.92 60.9689 2013
## 3284 2013-12-30 1204.50 19.65 61.2977 2013
## 3285 2013-12-31 1225.00 19.50 62.8205 2013
summary(total_2001_2013_df)
## TransactionDate GoldPrice SilverPrice PriceRatio TransactionYear
## Min. :2001-01-02 Min. : 255.9 Min. : 4.065 Min. :31.53 Length:3285
## 1st Qu.:2004-03-31 1st Qu.: 399.3 1st Qu.: 6.200 1st Qu.:53.09 Class :character
## Median :2007-07-03 Median : 673.0 Median :12.850 Median :60.28 Mode :character
## Mean :2007-07-01 Mean : 831.9 Mean :14.807 Mean :60.03
## 3rd Qu.:2010-09-29 3rd Qu.:1244.0 3rd Qu.:19.650 3rd Qu.:65.72
## Max. :2013-12-31 Max. :1895.0 Max. :48.700 Max. :85.54
ggplot (gold_df, aes(TransactionDate, USD)) + geom_line (aes(color="Gold")) +
labs (color="Legend") + scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
ggtitle ("Gold Prices") + theme (plot.title = element_text(lineheight=0.7, face="bold"))
ggplot(data=gold_df, aes(gold_df$USD)) + geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "yellow", high = "goldenrod3")
ggplot (silver_df, aes(TransactionDate, USD)) +geom_line (aes(color="Silver")) +
labs (color="Legend") + scale_colour_manual ("", breaks = c("silver"), values = c("ivory4")) +
ggtitle ("Silver Prices") + theme (plot.title = element_text(lineheight=0.7, face="bold"))
ggplot(data=silver_df, aes(silver_df$USD)) + geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "grey", high = "ivory4")
ggplot (total_2001_2013_df, aes(TransactionDate, PriceRatio)) + geom_line (aes(color="Blue")) +
labs (color="Legend") + scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
ggtitle ("Gold and Silver Price Ratio") + theme (plot.title = element_text(lineheight=0.7, face="bold"))
ggplot(data=total_2001_2013_df, aes(total_2001_2013_df$PriceRatio)) + geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "lightblue", high = "green") +
ggtitle ("Gold and Silver Price Ratio Histogram")
In this section we will create a linear regression model and calculate the correlation between the data to see if there is a strong relationship between silver and gold prices.
findCorrelation <- function() {
x = total_2001_2013_df$SilverPrice
y = total_2001_2013_df$GoldPrice
corr = round(cor(x, y),4)
print (paste0("Correlation = ",corr))
return (corr)
}
c = findCorrelation()
## [1] "Correlation = 0.9641"
findStatsFunction <- function() {
m = lm (GoldPrice ~ SilverPrice, data = total_2001_2013_df)
s = summary(m)
print(s)
slp = round(m$coefficients[2], 4)
int = round(m$coefficients[1], 4)
return (m)
}
m = findStatsFunction()
##
## Call:
## lm(formula = GoldPrice ~ SilverPrice, data = total_2001_2013_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -868.52 -63.32 -28.43 98.98 266.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 145.1343 3.9866 36.41 <2e-16 ***
## SilverPrice 46.3836 0.2228 208.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 128.3 on 3283 degrees of freedom
## Multiple R-squared: 0.9296, Adjusted R-squared: 0.9296
## F-statistic: 4.334e+04 on 1 and 3283 DF, p-value: < 2.2e-16
\[ \hat{goldPrice} = 145.1343 + 46.3836 * silverPrice \]
plot = ggplot(total_2001_2013_df, aes(SilverPrice, GoldPrice)) + geom_point(colour="blue") +
xlab("Silver Price") + ylab("Gold Price") + labs(title = "Gold Price vs Silver Price") +
geom_abline(aes(slope=round(m$coefficients[2], 4), intercept=round(m$coefficients[1], 4), color="red"))
print(plot)
| Linear Regression Equation | Correlation Coefficient | Multiple R-Square | R-Square | Description |
|---|---|---|---|---|
| goldPrice = 145.1343 + (46.3836 * silverPrice) | 0.9641 | 0.9296 | 0.9296 | Strong correlation. Model fits the data |
| H_0 : | Null Hypothesis | There is no relationship between silver and gold prices |
| H_A : | Alternative Hypothesis | There is a relationship between silver and gold prices |
Here the multiple R value is 0.9296 which shows that there is significant correlation between silver and gold prices. Also the value of R square is 0.9296 which shows the extent to which the silver price affects the gold price. Therefore, we reject the null hypothesis (H_0) and accept the Alternative hypothesis (H_1).
We notice that the two variables (silver price and gold Price) change in the same direction. If the silver price increases the gold price increases as well. Therefore, there is a positive correlation of 0.9641 between the two variables. Also, from the linear regression model, we can reject the null hypothesis and accept the alternative hypothesis. We conclude that there is a strong relationship between silver and gold prices for the 13 year period of study (2001-2013).