library(tidyr)
library(dplyr)
library(ggplot2)
library(sqldf)

Correlation between Gold and Silver Prices

(1) Introduction:

Silver and gold have historically been seen as close substitutes for one another. Both precious metals have been used as currency in the past and both can be used to back currency. There is significant evidence that these metals being an attractive investment in their own right and can play a useful role in diversifying risk. Therefore, we can expect that the prices share similar dynamics. However, there are also economic fundamentals that may act to drive the prices of silver and gold apart.

The main objective of this project is to study the relationship and correlation between prices of silver and gold in commodity market.

(2) Data:

(2.1) Cases:

Each case represents the price of silver and gold for a day. The full dataset represents data for 48 years with approximately 12400 cases. But for this study we have taken a subset of price data for only 13 year period from 2001 to 2013.

(2.2) Data Collection

The data is collected from Quandl website which provides free historic data on registration on the site.

(2.3) Type of study

This is observational study.

(2.4) Data Source

The datasets for this project are downloaded from the following sites after free registration :

(2.5) Response

Price of gold is the response variable. It is numerical continuous variable.

(2.6) Explanatory

The explanatory variable is the price of silver and is numerical.

(3) Data Load:

(3.1) Silver and Gold price data are loaded from the datasets

setwd(getwd())

gold_df = read.csv(file="DATA606\\LBMA-GOLD_FULL.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)
silver_df = read.csv(file="DATA606\\LBMA-SILVER_FULL.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)

gold_df$TransactionDate = as.Date(as.character(gold_df$TransactionDate), "%m/%d/%Y")
silver_df$TransactionDate = as.Date(as.character(silver_df$TransactionDate))

head(gold_df)
##   TransactionDate     USD     GBP    EURO
## 1      2017-03-28 1257.25 1001.65 1156.92
## 2      2017-03-27 1257.55  998.62 1155.56
## 3      2017-03-24 1247.50  999.62 1155.31
## 4      2017-03-23 1247.50  996.38 1156.43
## 5      2017-03-22 1249.05 1003.36 1154.62
## 6      2017-03-21 1241.60  996.88 1148.36
head(silver_df)
##   TransactionDate   USD     GBP    EURO
## 1      2017-03-28 17.94 14.2914 16.5330
## 2      2017-03-27 17.94 14.2505 16.5102
## 3      2017-03-24 17.63 14.1108 16.3135
## 4      2017-03-23 17.55 14.0434 16.2711
## 5      2017-03-22 17.58 14.1228 16.2974
## 6      2017-03-21 17.31 13.8824 16.0130
summary(gold_df)
##  TransactionDate           USD               GBP               EURO       
##  Min.   :1968-04-01   Min.   :  34.75   Min.   :  14.48   Min.   : 236.7  
##  1st Qu.:1980-06-18   1st Qu.: 271.25   1st Qu.: 173.34   1st Qu.: 326.2  
##  Median :1992-09-21   Median : 374.25   Median : 227.51   Median : 566.0  
##  Mean   :1992-09-22   Mean   : 492.67   Mean   : 301.55   Mean   : 674.4  
##  3rd Qu.:2004-12-22   3rd Qu.: 516.81   3rd Qu.: 281.07   3rd Qu.:1007.1  
##  Max.   :2017-03-28   Max.   :1895.00   Max.   :1182.82   Max.   :1382.3  
##                                         NA's   :10        NA's   :7785
summary(silver_df)
##  TransactionDate           USD              GBP              EURO       
##  Min.   :1968-04-01   Min.   : 1.272   Min.   : 0.510   Min.   : 3.812  
##  1st Qu.:1980-06-18   1st Qu.: 4.510   1st Qu.: 2.684   1st Qu.: 5.220  
##  Median :1992-09-21   Median : 5.511   Median : 3.420   Median : 9.927  
##  Mean   :1992-09-22   Mean   : 8.919   Mean   : 5.334   Mean   :11.263  
##  3rd Qu.:2004-12-22   3rd Qu.:11.962   3rd Qu.: 6.537   3rd Qu.:14.996  
##  Max.   :2017-03-28   Max.   :49.450   Max.   :29.258   Max.   :32.883  
##                                                         NA's   :7786

(4) Data Transformtion, Cleanup and Preparation:

(4.1) Apply tidyr select function and select only columns (Date and USD Price) that are relevant for our purpose

gold_df1 = select (gold_df, matches("TransactionDate|USD"))

(4.2) Apply tidyr mutate function to identify the price of the metal as Gold

gold_df1 = mutate (gold_df1, MetalPrice = "GoldPrice")
head(gold_df1, 20)
##    TransactionDate     USD MetalPrice
## 1       2017-03-28 1257.25  GoldPrice
## 2       2017-03-27 1257.55  GoldPrice
## 3       2017-03-24 1247.50  GoldPrice
## 4       2017-03-23 1247.50  GoldPrice
## 5       2017-03-22 1249.05  GoldPrice
## 6       2017-03-21 1241.60  GoldPrice
## 7       2017-03-20 1232.40  GoldPrice
## 8       2017-03-17 1229.60  GoldPrice
## 9       2017-03-16 1229.35  GoldPrice
## 10      2017-03-15 1198.80  GoldPrice
## 11      2017-03-14 1204.60  GoldPrice
## 12      2017-03-13 1204.20  GoldPrice
## 13      2017-03-10 1202.65  GoldPrice
## 14      2017-03-09 1206.55  GoldPrice
## 15      2017-03-08 1209.20  GoldPrice
## 16      2017-03-07 1216.65  GoldPrice
## 17      2017-03-06 1230.95  GoldPrice
## 18      2017-03-03 1226.50  GoldPrice
## 19      2017-03-02 1238.10  GoldPrice
## 20      2017-03-01 1240.40  GoldPrice
nrow(gold_df1)
## [1] 12396

(4.3) Apply tidyr select function and select only columns (Date and USD Price) that are relevant for our purpose

silver_df1 = select (silver_df, matches("TransactionDate|USD"))

(4.4) Apply tidyr mutate function to identify the price of the metal as Silver

silver_df1 = mutate (silver_df1, MetalPrice = "SilverPrice")
head(silver_df1, 20)
##    TransactionDate   USD  MetalPrice
## 1       2017-03-28 17.94 SilverPrice
## 2       2017-03-27 17.94 SilverPrice
## 3       2017-03-24 17.63 SilverPrice
## 4       2017-03-23 17.55 SilverPrice
## 5       2017-03-22 17.58 SilverPrice
## 6       2017-03-21 17.31 SilverPrice
## 7       2017-03-20 17.23 SilverPrice
## 8       2017-03-17 17.40 SilverPrice
## 9       2017-03-16 17.46 SilverPrice
## 10      2017-03-15 16.91 SilverPrice
## 11      2017-03-14 17.00 SilverPrice
## 12      2017-03-13 17.02 SilverPrice
## 13      2017-03-10 16.89 SilverPrice
## 14      2017-03-09 17.14 SilverPrice
## 15      2017-03-08 17.40 SilverPrice
## 16      2017-03-07 17.70 SilverPrice
## 17      2017-03-06 17.81 SilverPrice
## 18      2017-03-03 17.66 SilverPrice
## 19      2017-03-02 18.33 SilverPrice
## 20      2017-03-01 18.33 SilverPrice
nrow(silver_df1)
## [1] 12396

(4.5) Now combine both data frames (Silver and Gold price data frames) into one consolidated data frame for analysis

total_df = rbind(gold_df1, silver_df1)

(4.6) Apply arrange function to sort the data by Date

total_df = arrange(total_df, TransactionDate)
head(total_df, 20)
##    TransactionDate    USD  MetalPrice
## 1       1968-04-01 37.700   GoldPrice
## 2       1968-04-01  2.259 SilverPrice
## 3       1968-04-02 37.300   GoldPrice
## 4       1968-04-02  2.202 SilverPrice
## 5       1968-04-03 37.600   GoldPrice
## 6       1968-04-03  2.195 SilverPrice
## 7       1968-04-04 36.950   GoldPrice
## 8       1968-04-04  2.125 SilverPrice
## 9       1968-04-05 37.000   GoldPrice
## 10      1968-04-05  2.104 SilverPrice
## 11      1968-04-08 37.050   GoldPrice
## 12      1968-04-08  2.134 SilverPrice
## 13      1968-04-09 37.500   GoldPrice
## 14      1968-04-09  2.205 SilverPrice
## 15      1968-04-10 37.700   GoldPrice
## 16      1968-04-10  2.203 SilverPrice
## 17      1968-04-11 38.000   GoldPrice
## 18      1968-04-11  2.161 SilverPrice
## 19      1968-04-16 38.000   GoldPrice
## 20      1968-04-16  2.253 SilverPrice
nrow(total_df)
## [1] 24792

(4.7) Apply spread function to spread the data for each metal by adding one column for Gold and another for Silver

total_df2 = spread(total_df, MetalPrice, USD)
head(total_df2, 20)
##    TransactionDate GoldPrice SilverPrice
## 1       1968-04-01     37.70       2.259
## 2       1968-04-02     37.30       2.202
## 3       1968-04-03     37.60       2.195
## 4       1968-04-04     36.95       2.125
## 5       1968-04-05     37.00       2.104
## 6       1968-04-08     37.05       2.134
## 7       1968-04-09     37.50       2.205
## 8       1968-04-10     37.70       2.203
## 9       1968-04-11     38.00       2.161
## 10      1968-04-16     38.00       2.253
## 11      1968-04-17     37.80       2.303
## 12      1968-04-18     37.55       2.232
## 13      1968-04-19     37.65       2.281
## 14      1968-04-22     38.00       2.360
## 15      1968-04-23     38.40       2.360
## 16      1968-04-24     38.25       2.299
## 17      1968-04-25     38.30       2.308
## 18      1968-04-26     38.65       2.327
## 19      1968-04-29     38.75       2.396
## 20      1968-04-30     39.10       2.397
nrow(total_df2)
## [1] 12396

(4.8) Create a function to calculate the price ratio and round it to 4 decimal digits

calcPriceRatio = function(goldPrice, silverPrice) {
  round((goldPrice/silverPrice), 4)
}

(4.9) Apply tidyr mutate function to add PriceRatio and TransactionYear columns

total_df3 = mutate(total_df2, PriceRatio=calcPriceRatio(GoldPrice, SilverPrice), TransactionYear=substring(TransactionDate,1,4))
tail(total_df3, 20)
##       TransactionDate GoldPrice SilverPrice PriceRatio TransactionYear
## 12377      2017-03-01   1240.40       18.33    67.6705            2017
## 12378      2017-03-02   1238.10       18.33    67.5450            2017
## 12379      2017-03-03   1226.50       17.66    69.4507            2017
## 12380      2017-03-06   1230.95       17.81    69.1157            2017
## 12381      2017-03-07   1216.65       17.70    68.7373            2017
## 12382      2017-03-08   1209.20       17.40    69.4943            2017
## 12383      2017-03-09   1206.55       17.14    70.3938            2017
## 12384      2017-03-10   1202.65       16.89    71.2049            2017
## 12385      2017-03-13   1204.20       17.02    70.7521            2017
## 12386      2017-03-14   1204.60       17.00    70.8588            2017
## 12387      2017-03-15   1198.80       16.91    70.8930            2017
## 12388      2017-03-16   1229.35       17.46    70.4095            2017
## 12389      2017-03-17   1229.60       17.40    70.6667            2017
## 12390      2017-03-20   1232.40       17.23    71.5264            2017
## 12391      2017-03-21   1241.60       17.31    71.7273            2017
## 12392      2017-03-22   1249.05       17.58    71.0495            2017
## 12393      2017-03-23   1247.50       17.55    71.0826            2017
## 12394      2017-03-24   1247.50       17.63    70.7601            2017
## 12395      2017-03-27   1257.55       17.94    70.0975            2017
## 12396      2017-03-28   1257.25       17.94    70.0808            2017

(4.10) Select price data for years 2001 to 2013

total_2001_2013_df = filter(total_df3, TransactionYear %in% c("2001", "2002", "2003", "2004", "2005", "2006", "2007",
                                                              "2008", "2009", "2010", "2011", "2012", "2013"))
nrow(total_2001_2013_df) 
## [1] 3285
tail(total_2001_2013_df, 20)
##      TransactionDate GoldPrice SilverPrice PriceRatio TransactionYear
## 3266      2013-12-02   1229.50       19.75    62.2532            2013
## 3267      2013-12-03   1217.25       19.17    63.4977            2013
## 3268      2013-12-04   1227.50       19.05    64.4357            2013
## 3269      2013-12-05   1222.50       19.46    62.8212            2013
## 3270      2013-12-06   1233.00       19.49    63.2632            2013
## 3271      2013-12-09   1237.00       19.50    63.4359            2013
## 3272      2013-12-10   1266.25       20.05    63.1546            2013
## 3273      2013-12-11   1260.75       20.39    61.8318            2013
## 3274      2013-12-12   1225.25       19.80    61.8813            2013
## 3275      2013-12-13   1232.00       19.55    63.0179            2013
## 3276      2013-12-16   1234.75       19.50    63.3205            2013
## 3277      2013-12-17   1231.75       20.02    61.5260            2013
## 3278      2013-12-18   1230.50       19.94    61.7101            2013
## 3279      2013-12-19   1196.00       19.34    61.8407            2013
## 3280      2013-12-20   1195.25       19.33    61.8339            2013
## 3281      2013-12-23   1199.00       19.37    61.8998            2013
## 3282      2013-12-24   1214.50       19.40    62.6031            2013
## 3283      2013-12-27   1214.50       19.92    60.9689            2013
## 3284      2013-12-30   1204.50       19.65    61.2977            2013
## 3285      2013-12-31   1225.00       19.50    62.8205            2013
summary(total_2001_2013_df)
##  TransactionDate        GoldPrice       SilverPrice       PriceRatio    TransactionYear   
##  Min.   :2001-01-02   Min.   : 255.9   Min.   : 4.065   Min.   :31.53   Length:3285       
##  1st Qu.:2004-03-31   1st Qu.: 399.3   1st Qu.: 6.200   1st Qu.:53.09   Class :character  
##  Median :2007-07-03   Median : 673.0   Median :12.850   Median :60.28   Mode  :character  
##  Mean   :2007-07-01   Mean   : 831.9   Mean   :14.807   Mean   :60.03                     
##  3rd Qu.:2010-09-29   3rd Qu.:1244.0   3rd Qu.:19.650   3rd Qu.:65.72                     
##  Max.   :2013-12-31   Max.   :1895.0   Max.   :48.700   Max.   :85.54

(5) Data Visualization:

(5.1) Gold Price Data

ggplot (gold_df, aes(TransactionDate, USD)) + geom_line (aes(color="Gold")) +
  labs (color="Legend") + scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
  ggtitle ("Gold Prices") + theme (plot.title = element_text(lineheight=0.7, face="bold"))

(5.2) Gold Price Histogram

ggplot(data=gold_df, aes(gold_df$USD)) + geom_histogram(aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "yellow", high = "goldenrod3")

(5.3) Silver Price Data

ggplot (silver_df, aes(TransactionDate, USD)) +geom_line (aes(color="Silver")) +
  labs (color="Legend") + scale_colour_manual ("", breaks = c("silver"), values = c("ivory4")) +
  ggtitle ("Silver Prices") + theme (plot.title = element_text(lineheight=0.7, face="bold"))

(5.4) Silver Price Histogram

ggplot(data=silver_df, aes(silver_df$USD)) + geom_histogram(aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "grey", high = "ivory4")

(5.5) Gold/Silver Price Ratio Data

ggplot (total_2001_2013_df, aes(TransactionDate, PriceRatio)) + geom_line (aes(color="Blue")) +
  labs (color="Legend") + scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
  ggtitle ("Gold and Silver Price Ratio") + theme (plot.title = element_text(lineheight=0.7, face="bold"))

(5.6) Gold/Silver Price Ratio Histogram

ggplot(data=total_2001_2013_df, aes(total_2001_2013_df$PriceRatio)) + geom_histogram(aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "lightblue", high = "green") +
  ggtitle ("Gold and Silver Price Ratio Histogram")

(6) Statistical Analysis:

In this section we will create a linear regression model and calculate the correlation between the data to see if there is a strong relationship between silver and gold prices.

(6.1) Create a function to calculate the correlation and round it to 4 decimal digits

findCorrelation <- function() {
  x = total_2001_2013_df$SilverPrice
  y = total_2001_2013_df$GoldPrice
  corr = round(cor(x, y),4)
  print (paste0("Correlation = ",corr))
  return (corr)
}

c = findCorrelation()
## [1] "Correlation = 0.9641"

(6.2) Create a function for Linear Model

findStatsFunction <- function() {
  m = lm (GoldPrice ~ SilverPrice, data = total_2001_2013_df)
  s = summary(m)
  print(s)
  
  slp = round(m$coefficients[2], 4)
  int = round(m$coefficients[1], 4)

  return (m)
}
m = findStatsFunction()
## 
## Call:
## lm(formula = GoldPrice ~ SilverPrice, data = total_2001_2013_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -868.52  -63.32  -28.43   98.98  266.19 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 145.1343     3.9866   36.41   <2e-16 ***
## SilverPrice  46.3836     0.2228  208.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 128.3 on 3283 degrees of freedom
## Multiple R-squared:  0.9296, Adjusted R-squared:  0.9296 
## F-statistic: 4.334e+04 on 1 and 3283 DF,  p-value: < 2.2e-16

\[ \hat{goldPrice} = 145.1343 + 46.3836 * silverPrice \]

(6.3) Display the Linear Model

plot = ggplot(total_2001_2013_df, aes(SilverPrice, GoldPrice)) + geom_point(colour="blue") + 
    xlab("Silver Price") + ylab("Gold Price") + labs(title = "Gold Price vs Silver Price") +
    geom_abline(aes(slope=round(m$coefficients[2], 4), intercept=round(m$coefficients[1], 4), color="red"))
  print(plot)

(6.4) Regression Statistics

Linear Regression Equation Correlation Coefficient Multiple R-Square R-Square Description
goldPrice = 145.1343 + (46.3836 * silverPrice) 0.9641 0.9296 0.9296 Strong correlation. Model fits the data


(6.5) Hypothesis Testing

H_0 :   Null Hypothesis   There is no relationship between silver and gold prices
H_A :   Alternative Hypothesis   There is a relationship between silver and gold prices

Here the multiple R value is 0.9296 which shows that there is significant correlation between silver and gold prices. Also the value of R square is 0.9296 which shows the extent to which the silver price affects the gold price. Therefore, we reject the null hypothesis (H_0) and accept the Alternative hypothesis (H_1).

(7) Conclusion:

We notice that the two variables (silver price and gold Price) change in the same direction. If the silver price increases the gold price increases as well. Therefore, there is a positive correlation of 0.9641 between the two variables. Also, from the linear regression model, we can reject the null hypothesis and accept the alternative hypothesis. We conclude that there is a strong relationship between silver and gold prices for the 13 year period of study (2001-2013).