Silver and gold are close substitutes for one another. Both have been used as currency in the pas. There is significant evidence that these metals being an attractive investment and can play a useful role in diversifying risk. There are also economic fundamentals that may act to drive the prices of silver and gold apart.
The main objective of this project is to study the relationship and correlation between prices of silver and gold in commodity market.
The datasets for this project are from the following sites:
Each case represents the price of silver and gold for a day. The full dataset represents data for 48 years with approximately 12618 cases.
The data is collected from quandl.com
This is observational study.
Price of gold is the response variable. It is numerical continuous variable.
The explanatory variable is the price of silver and is numerical.
#install.packages("tidyr")
#install.packages("magrittr")
#install.packages("sqldf")
#install.packages("ggplot2")
#install.packages("dplyr")
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
## dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libfontconfig.1.dylib
## Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
## Reason: Incompatible library version: R_X11.so requires version 11.0.0 or later, but libfontconfig.1.dylib provides version 10.0.0
## Could not load tcltk. Will use slower R code instead.
## Loading required package: RSQLite
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
##
## extract
Silver and Gold price data are loaded from the datasets
gold_df <- read.csv(file="/Users/anjalhussan/Desktop/data_science/proposal/GOLD.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)
head(gold_df)
## Date USD..AM. USD..PM. GBP..AM. GBP..PM. EURO..AM. EURO..PM.
## 1 2017-11-30 1282.15 1280.20 952.64 948.88 1084.06 1074.98
## 2 2017-11-29 1294.85 1283.85 965.70 957.50 1092.46 1085.11
## 3 2017-11-28 1293.90 1291.85 972.75 974.18 1088.95 1087.61
## 4 2017-11-27 1294.70 1294.90 969.73 969.36 1084.83 1085.00
## 5 2017-11-24 1289.15 1290.50 967.89 966.58 1086.37 1082.60
## 6 2017-11-23 1290.15 1290.35 969.93 969.96 1089.40 1089.18
silver_df = read.csv(file="/Users/anjalhussan/Desktop/data_science/proposal/SILVER.csv", head=TRUE, sep=",",stringsAsFactors = FALSE)
head(silver_df)
## Date USD GBP EURO
## 1 2017-11-30 16.570 12.32 14.00
## 2 2017-11-29 16.895 12.60 14.26
## 3 2017-11-28 17.070 12.84 14.36
## 4 2017-11-27 17.100 12.81 14.32
## 5 2017-11-24 17.050 12.80 14.38
## 6 2017-11-23 17.095 12.84 14.43
Apply tidyr select function and select only columns (Date and USD..AM Price) form gold_df data set that are relevant for our purpose
select(gold_df)
## data frame with 0 columns and 12618 rows
gold_dataSet = select (gold_df, matches("Date|USD..AM"))
Apply tidyr mutate functin to identify the price of the metal as gold
gold_dataSet = mutate (gold_dataSet, MetalPrice = "GoldPrice")
colnames(gold_dataSet)[2] <- "USD"
head(gold_dataSet)
## Date USD MetalPrice
## 1 2017-11-30 1282.15 GoldPrice
## 2 2017-11-29 1294.85 GoldPrice
## 3 2017-11-28 1293.90 GoldPrice
## 4 2017-11-27 1294.70 GoldPrice
## 5 2017-11-24 1289.15 GoldPrice
## 6 2017-11-23 1290.15 GoldPrice
Apply tidyr select function and select only columns (Date and USD Price) form silver_df data set that are relevant for our purpose. Also Apply tidyr mutate function to identify the price of the metal as Silver
select(silver_df)
## data frame with 0 columns and 12628 rows
Silver_dataSet = select (silver_df, matches("Date|USD"))
silver_df$Date = as.Date(as.character(silver_df$Date))
Silver_dataSet = mutate (Silver_dataSet, MetalPrice = "SilverPrice")
head(Silver_dataSet)
## Date USD MetalPrice
## 1 2017-11-30 16.570 SilverPrice
## 2 2017-11-29 16.895 SilverPrice
## 3 2017-11-28 17.070 SilverPrice
## 4 2017-11-27 17.100 SilverPrice
## 5 2017-11-24 17.050 SilverPrice
## 6 2017-11-23 17.095 SilverPrice
Now combine both data frames (Silver and Gold price data frames) into one consolidated data frame for analysis
Silver_dataSet <- Silver_dataSet[1:nrow(gold_dataSet),1:3]
gold_dataSet <- gold_dataSet[,1:3]
#do.call(rbind, gold_dataSet)
combinedDataSets = rbind(gold_dataSet, Silver_dataSet)
head(combinedDataSets)
## Date USD MetalPrice
## 1 2017-11-30 1282.15 GoldPrice
## 2 2017-11-29 1294.85 GoldPrice
## 3 2017-11-28 1293.90 GoldPrice
## 4 2017-11-27 1294.70 GoldPrice
## 5 2017-11-24 1289.15 GoldPrice
## 6 2017-11-23 1290.15 GoldPrice
Apply arrange function to sort the data by Date
combinedDataSets = arrange(combinedDataSets, Date)
head(combinedDataSets)
## Date USD MetalPrice
## 1 1968-01-02 35.18 GoldPrice
## 2 1968-01-03 35.16 GoldPrice
## 3 1968-01-04 35.14 GoldPrice
## 4 1968-01-05 35.14 GoldPrice
## 5 1968-01-08 35.14 GoldPrice
## 6 1968-01-09 35.14 GoldPrice
Apply spread function to spread the data for each metal by adding one column for Gold and another for Silver
spreadedTotal = spread(combinedDataSets, MetalPrice, USD)
head(spreadedTotal, 20)
## Date GoldPrice SilverPrice
## 1 1968-01-02 35.18 NA
## 2 1968-01-03 35.16 NA
## 3 1968-01-04 35.14 NA
## 4 1968-01-05 35.14 NA
## 5 1968-01-08 35.14 NA
## 6 1968-01-09 35.14 NA
## 7 1968-01-10 35.15 NA
## 8 1968-01-11 35.17 NA
## 9 1968-01-12 35.18 NA
## 10 1968-01-15 35.18 NA
## 11 1968-01-16 35.19 2.069
## 12 1968-01-17 35.20 2.067
## 13 1968-01-18 35.20 2.058
## 14 1968-01-19 35.19 2.078
## 15 1968-01-22 35.19 2.088
## 16 1968-01-23 35.19 2.080
## 17 1968-01-24 35.20 2.064
## 18 1968-01-25 35.20 2.018
## 19 1968-01-26 35.20 2.069
## 20 1968-01-29 35.19 2.039
Create a function to calculate the price ratio and round it to 4 decimal digits
calcPriceRatio = function(goldPrice, silverPrice)
{
round((goldPrice/silverPrice), 4)
}
Apply tidyr mutate function to add PriceRatio and transactionYear columns
dataSetsWithPriceRatio = mutate(spreadedTotal, PriceRatio=calcPriceRatio(GoldPrice, SilverPrice), transactionYear=substring(Date,1,4))
tail(dataSetsWithPriceRatio, 20)
## Date GoldPrice SilverPrice PriceRatio transactionYear
## 12611 2017-11-03 1275.30 17.085 74.6444 2017
## 12612 2017-11-06 1271.60 16.915 75.1759 2017
## 12613 2017-11-07 1276.35 17.005 75.0573 2017
## 12614 2017-11-08 1282.25 16.995 75.4487 2017
## 12615 2017-11-09 1284.00 17.100 75.0877 2017
## 12616 2017-11-10 1284.45 17.000 75.5559 2017
## 12617 2017-11-13 1278.40 16.925 75.5332 2017
## 12618 2017-11-14 1273.70 16.935 75.2111 2017
## 12619 2017-11-15 1285.70 17.115 75.1212 2017
## 12620 2017-11-16 1277.70 17.040 74.9824 2017
## 12621 2017-11-17 1283.85 17.085 75.1449 2017
## 12622 2017-11-20 1292.35 17.145 75.3777 2017
## 12623 2017-11-21 1280.00 17.000 75.2941 2017
## 12624 2017-11-22 1283.95 16.965 75.6823 2017
## 12625 2017-11-23 1290.15 17.095 75.4694 2017
## 12626 2017-11-24 1289.15 17.050 75.6100 2017
## 12627 2017-11-27 1294.70 17.100 75.7135 2017
## 12628 2017-11-28 1293.90 17.070 75.7996 2017
## 12629 2017-11-29 1294.85 16.895 76.6410 2017
## 12630 2017-11-30 1282.15 16.570 77.3778 2017
Select price data for years from 2001 to 2013
PriceDataYear_2001_2013 = filter(dataSetsWithPriceRatio, transactionYear %in% c("2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013"))
tail(PriceDataYear_2001_2013, 20)
## Date GoldPrice SilverPrice PriceRatio transactionYear
## 3266 2013-12-02 1237.50 19.75 62.6582 2013
## 3267 2013-12-03 1219.00 19.17 63.5889 2013
## 3268 2013-12-04 1213.00 19.05 63.6745 2013
## 3269 2013-12-05 1234.00 19.46 63.4121 2013
## 3270 2013-12-06 1230.75 19.49 63.1478 2013
## 3271 2013-12-09 1228.50 19.50 63.0000 2013
## 3272 2013-12-10 1245.75 20.05 62.1322 2013
## 3273 2013-12-11 1255.25 20.39 61.5620 2013
## 3274 2013-12-12 1243.50 19.80 62.8030 2013
## 3275 2013-12-13 1222.75 19.55 62.5448 2013
## 3276 2013-12-16 1229.50 19.50 63.0513 2013
## 3277 2013-12-17 1237.25 20.02 61.8007 2013
## 3278 2013-12-18 1233.25 19.94 61.8480 2013
## 3279 2013-12-19 1205.25 19.34 62.3190 2013
## 3280 2013-12-20 1195.00 19.33 61.8210 2013
## 3281 2013-12-23 1192.75 19.37 61.5772 2013
## 3282 2013-12-24 1196.50 19.40 61.6753 2013
## 3283 2013-12-27 1209.25 19.92 60.7053 2013
## 3284 2013-12-30 1201.50 19.65 61.1450 2013
## 3285 2013-12-31 1201.50 19.50 61.6154 2013
summary(PriceDataYear_2001_2013)
## Date GoldPrice SilverPrice PriceRatio
## Length:3285 Min. : 256.7 Min. : 4.065 Min. :31.44
## Class :character 1st Qu.: 399.3 1st Qu.: 6.200 1st Qu.:53.19
## Mode :character Median : 673.6 Median :12.850 Median :60.22
## Mean : 832.2 Mean :14.807 Mean :60.04
## 3rd Qu.:1243.0 3rd Qu.:19.650 3rd Qu.:65.73
## Max. :1896.5 Max. :48.700 Max. :83.79
## transactionYear
## Length:3285
## Class :character
## Mode :character
##
##
##
Gold Price Data
ggplot (gold_df, aes(as.Date(Date), USD..AM.)) +
geom_line (aes(color="Gold")) +
labs (color="Legend") +
scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
ggtitle ("Gold Prices") + theme (plot.title = element_text(lineheight=0.7, face="bold"))
Gold Price Histogram
ggplot(data=gold_df, aes(gold_df$USD..AM.)) +
geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "yellow", high = "goldenrod3")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).
Silver Price Data
ggplot (silver_df, aes(as.Date(Date), USD)) +
geom_line (aes(color="Silver")) +
labs (color="Legend") +
scale_colour_manual ("", breaks = c("silver"), values = c("ivory4")) +
ggtitle ("Silver Prices") +
theme (plot.title = element_text(lineheight=0.7, face="bold"))
Silver Price Histogram
ggplot(data=silver_df, aes(silver_df$USD)) +
geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "grey", high = "ivory4")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 19 rows containing non-finite values (stat_bin).
Gold/Silver Price Ratio Data
ggplot (PriceDataYear_2001_2013, aes(as.Date(Date), PriceRatio)) +
geom_line (aes(color="Blue")) +
labs (color="Legend") +
scale_colour_manual ("", breaks = c("gold"), values = c("goldenrod3")) +
ggtitle ("Gold and Silver Price Ratio") +
theme (plot.title = element_text(lineheight=0.7, face="bold"))
Gold/Silver Price Ratio Histogram
ggplot(data=PriceDataYear_2001_2013, aes(PriceDataYear_2001_2013$PriceRatio)) +
geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "grey", high = "blue") +
ggtitle ("Gold and Silver Price Ratio Histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In this section we will create a linear regression model and calculate the correlation between the data to see if there is a strong relationship between silver and gold prices.
Create a function to calculate the correlation and round it to 4 decimal digits
findCorrelation <- function() {
x = PriceDataYear_2001_2013$SilverPrice
y = PriceDataYear_2001_2013$GoldPrice
corr = round(cor(x, y),4)
print (paste0("Correlation = ",corr))
return (corr)
}
c = findCorrelation()
## [1] "Correlation = 0.9643"
Create a function for Linear Model
findStatsFunction <- function() {
m = lm (GoldPrice ~ SilverPrice, data = PriceDataYear_2001_2013)
s = summary(m)
print(s)
slp = round(m$coefficients[2], 4)
int = round(m$coefficients[1], 4)
return (m)
}
m = findStatsFunction()
##
## Call:
## lm(formula = GoldPrice ~ SilverPrice, data = PriceDataYear_2001_2013)
##
## Residuals:
## Min 1Q Median 3Q Max
## -874.17 -63.04 -28.61 100.34 263.92
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 144.9725 3.9801 36.42 <2e-16 ***
## SilverPrice 46.4107 0.2224 208.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 128.1 on 3283 degrees of freedom
## Multiple R-squared: 0.9299, Adjusted R-squared: 0.9298
## F-statistic: 4.353e+04 on 1 and 3283 DF, p-value: < 2.2e-16
Display the Linear Model
plot = ggplot(PriceDataYear_2001_2013, aes(SilverPrice, GoldPrice)) + geom_point(colour="blue") +
xlab("Silver Price") + ylab("Gold Price") + labs(title = "Gold Price vs Silver Price") +
geom_abline(aes(slope=round(m$coefficients[2], 4), intercept=round(m$coefficients[1], 4), color="red"))
print(plot)
Linear Regression Equation: goldPrice = 144.9725 + (46.4107 * silverPrice) Correlation Coefficient: 0.9643 Multiple R-Square: 0.9299 Adjusted R-Square: 0.9298 Description: Strong correlation. Model fits the data
H_0 : Null Hypothesis: There is no relationship between silver and gold prices H_A : Alternative Hypothesis: There is a relationship between silver and gold prices
The multiple R value is 0.9299 which indicats that there is significant correlation between silver and gold prices. The value of Adjusted R square is 0.9298 which also indicates that silver price affects the gold price. Therefore, we reject the null hypothesis (H_0) and accept the Alternative hypothesis (H_1).
The two variables (silver price and gold Price) change in the same direction. If the silver price increases the gold price increases as well. Therefore, there is a positive correlation of 0.9641 between the two variables. Also, from the linear regression model, we can reject the null hypothesis and accept the alternative hypothesis. In conclusion, there is a strong relationship between silver and gold prices for the 13 year period of study (2001-2013).