In this post I will show how to easily plot the correlation of the Standard & Poor’s 500 (S&P 500) with other world stock indices, in a reproducible way. The motivation is that every investor needs to find assets that are negatively correlated (or, at least, less correlated) with his/her portfolio; if he/she is heavily positioned in United States stocks, the S&P 500 is a good starting point. Negative correlation matters because it reduces the portfolio’s risk while maintaining its return.
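To see why correlation matters, recall the standard two-asset portfolio variance: the portfolio variance equals w1²·s1² + w2²·s2² + 2·w1·w2·rho·s1·s2, where w are the weights, s the volatilities and rho the correlation. The sketch below evaluates it for a hypothetical equal-weight portfolio of two assets with 20% volatility each (assumed numbers, not estimates from this post's data). The weights, and therefore the expected return, stay fixed, yet the risk falls as rho falls.

```r
# Standard deviation of a two-asset portfolio:
# sigma_p^2 = w1^2 * s1^2 + w2^2 * s2^2 + 2 * w1 * w2 * rho * s1 * s2
# Default weights and volatilities are hypothetical, chosen only for illustration.
portfolio_sd <- function(rho, w1 = 0.5, w2 = 0.5, s1 = 0.20, s2 = 0.20) {
  variance <- w1^2 * s1^2 + w2^2 * s2^2 + 2 * w1 * w2 * rho * s1 * s2
  sqrt(variance)
}

# Portfolio volatility for perfectly correlated, uncorrelated and negatively correlated assets
sapply(c(1, 0, -0.5), portfolio_sd)
## [1] 0.2000000 0.1414214 0.1000000
```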
First, we load the packages needed to run the project:
# install package "pacman" (if it is not installed)
if (!require("pacman")) install.packages("pacman")
# load packages (or install and load them, if they are not installed);
# the pacman:: prefix works even if pacman was just installed and is not attached yet
pacman::p_load(tidyquant, tidyverse, janitor, lares, rvest)
We start by web scraping the “World Indices” page from Yahoo Finance, in order to get the names and symbols of the world indices:
# read url into R
webpage <- read_html('https://finance.yahoo.com/world-indices')
# get all table cells as one flat character vector
(table <- webpage %>%
    html_nodes("tr td") %>%
    html_text()
) %>%
  head(n = 12)
## [1] "^GSPC" "S&P 500"
## [3] "4,697.96" "-6.58"
## [5] "-0.14%" "2.442B"
## [7] "" ""
## [9] "" "^DJI"
## [11] "Dow Jones Industrial Average" "35,601.98"
# get symbols: each table row has 9 cells and the symbol is the 1st cell of each row
(symbols <- table[seq(1, length(table), 9)]) %>%
  head()
## [1] "^GSPC" "^DJI" "^IXIC" "^NYA" "^XAX" "^BUK100P"
# get names of indices: the name is the 2nd cell of each 9-cell row
(names <- table[seq(2, length(table), 9)]) %>%
  head()
## [1] "S&P 500" "Dow Jones Industrial Average"
## [3] "NASDAQ Composite" "NYSE COMPOSITE (DJ)"
## [5] "NYSE AMEX COMPOSITE INDEX" "Cboe UK 100"
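Taking every ninth cell works because, judging from the output above, the scraped table has nine cells per row (symbol, name, price, change, % change, volume and three empty cells); that layout is an observation about the page at the time of writing and may change if Yahoo redesigns it. An equivalent way to see it, sketched here in base R on an invented vector of cells (the `_demo` names are hypothetical, to avoid overwriting the post's objects), is to reshape the flat vector into a nine-column matrix:

```r
# A made-up flat vector of table cells, 9 cells per row, mimicking what
# html_text() returns for the Yahoo Finance table (values are invented).
cells <- c("^AAA", "Index A", "100.00", "1.00", "1.01%", "1M", "", "", "",
           "^BBB", "Index B", "200.00", "-2.00", "-0.99%", "2M", "", "", "")

# Reshape into one row per index: column 1 holds symbols, column 2 holds names
cell_matrix <- matrix(cells, ncol = 9, byrow = TRUE)
symbols_demo <- cell_matrix[, 1]
names_demo <- cell_matrix[, 2]

# Same result as taking every 9th cell with seq()
stopifnot(identical(symbols_demo, cells[seq(1, length(cells), 9)]))
symbols_demo
## [1] "^AAA" "^BBB"
```

If the number of cells is not a multiple of nine, `matrix()` warns about it, which is a cheap signal that the page layout has changed.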
# data set of tickers and names of indices
(names_symbols <- tibble(symbol = symbols, name = names)) %>%
  head()
## # A tibble: 6 x 2
## symbol name
## <chr> <chr>
## 1 ^GSPC S&P 500
## 2 ^DJI Dow Jones Industrial Average
## 3 ^IXIC NASDAQ Composite
## 4 ^NYA NYSE COMPOSITE (DJ)
## 5 ^XAX NYSE AMEX COMPOSITE INDEX
## 6 ^BUK100P Cboe UK 100
Next, we download the daily prices of the world indices since January 2019 (dropping missing values) and check whether the download worked properly:
# download data
data <- tq_get(symbols,
               complete_cases = TRUE,
               from = "2019-01-01",
               to = Sys.Date(),
               get = "stock.prices") %>%
  drop_na()
# check if it returned data correctly (number of daily observations per index)
data %>%
  group_by(symbol) %>%
  summarise(n = n()) %>%
  summary()
## symbol n
## Length:36 Min. : 1.0
## Class :character 1st Qu.:704.8
## Mode :character Median :724.0
## Mean :680.0
## 3rd Qu.:729.0
## Max. :740.0
# check world indices with fewer observations than the mean (680)
data %>%
  group_by(symbol) %>%
  summarise(n = n()) %>%
  filter(n < 680)
## # A tibble: 3 x 2
## symbol n
## <chr> <int>
## 1 ^CASE30 1
## 2 ^IPSA 115
## 3 ^TA125.TA 574
As we can see, there are very few observations for the Egyptian (^CASE30) and Chilean (^IPSA) indices. Let’s remove them, because we want a more robust analysis, and then calculate daily returns:
# calculate daily returns
daily <- data %>%
  filter(!symbol %in% c("^CASE30", "^IPSA")) %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "daily",
               col_rename = "return") %>%
  ungroup() %>%
  # add the index names, matching on the ticker symbol
  left_join(names_symbols, by = "symbol")
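For reference, with `period = "daily"` and its default `type = "arithmetic"`, quantmod's `periodReturn()` returns simple returns between consecutive prices, r_t = P_t / P_(t-1) - 1. A minimal base-R sketch of that formula on a made-up price vector follows; the function name `daily_returns` is hypothetical, and it simply leaves the first day as `NA` (there is no previous price), which `periodReturn()` may handle differently.

```r
# Simple (arithmetic) daily returns from a vector of adjusted closing prices.
# daily_returns() is a hypothetical helper; the prices are made-up numbers.
daily_returns <- function(prices) {
  # r_t = P_t / P_{t-1} - 1; the first day has no previous price, so it is NA
  c(NA, prices[-1] / prices[-length(prices)] - 1)
}

prices <- c(100, 102, 99.96, 101.9592)
daily_returns(prices)
```

Here the three returns are +2%, -2% and +2%.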
Finally, we convert the data set from long to wide format and plot the correlation of each index with the S&P 500:
daily %>%
  pivot_wider(id_cols = date, names_from = name, values_from = return) %>%
  rename(sp500 = `S&P 500`) %>%
  # 34 remaining indices minus the S&P 500 itself = 33
  corr_var(sp500, plot = TRUE, top = 33)
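Conceptually, the plot ranks every index by the correlation of its daily returns with those of the S&P 500. A rough base-R sketch of that ranking, on an invented wide table whose column names `idx_a` and `idx_b` are hypothetical, uses Pearson correlation via `cor()`. Because markets close on different holidays, the real wide table contains `NA`s, so the sketch drops incomplete pairs with `use = "complete.obs"`; `lares::corr_var()` may use different defaults.

```r
# Toy wide table: one row per date, one column of daily returns per index.
# Values are invented for illustration; NA marks a market holiday.
wide <- data.frame(
  sp500 = c(0.010, -0.004, 0.006, -0.012, 0.003),
  idx_a = c(0.008, -0.003, 0.005, -0.010, NA),
  idx_b = c(-0.006, 0.002, -0.004, 0.009, -0.001)
)

# Pearson correlation of every other column with sp500,
# ignoring days where either value is missing
corr_with_sp500 <- sapply(wide[-1], function(x) cor(wide$sp500, x, use = "complete.obs"))

# Order from least to most correlated with the S&P 500
sort(corr_with_sp500)
```

In this toy table `idx_b` moves against the S&P 500 and comes first, which is the kind of index a diversifying investor would look for.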
As the plot shows, if your portfolio is heavily concentrated in the United States, a good way to diversify your investments geographically, reducing risk while maintaining returns, is to invest new money on the other side of the world (for instance, Malaysia, New Zealand, China, Taiwan or Japan). I hope you get rich. Good luck!