Fetching Public Data With R

Packages

There are several R packages used to fetch public data from varied sources, Yahoo Finance, Google Finance, oanda, FRED, WDI, Quandl, datamarket, to name a few. Among them, pdfetch, quantmod, tseries, fImport, Quandl and rdatamarket are all very excellent.

Package pdfetch can retrieve data from Yahoo, FRED, world bank, US Bureau of Labor Statistics, the Bank of England interactive Statistical Database, European Central Bank’s statistical data warehouse, Eurostat, the French National Institute of Statistics and Economic Studies and UK Office of National Statistics.

Package quantmod can fetch data from Yahoo, Google, FRED and oanda.

Package WDI deals with data from the world bank (world development indicators). Package Quandl can fetch data from Quandl. Package rdatamarket is used to fetch data from datamarket

We need only data from Yahoo (shares, indices, financial statemengts etc.), FRED (US and world economic data), oanda (foreign exchages, precious metal prices), Quandl (commodities, futures etc.). We introduce the basic usage of quantmod and Quandl.

Basic Info about quantmod, Quandl and pdfetch

packageDescription('quantmod', 
                   fields = c('Package', 'Version', 'Author', 'Description'))

## Package: quantmod
## Version: 0.4-0
## Author: Jeffrey A. Ryan
## Description: Specify, build, trade, and analyse quantitative
##        financial trading strategies
## 
## -- File: /home/rho/RLib/quantmod/Meta/package.rds 
## -- Fields read: Package, Version, Author, Description

packageDescription('Quandl', 
                   fields = c('Package', 'Version', 'Author', 'Description'))

## Package: Quandl
## Version: 2.3.2
## Author: Raymond McTaggart, Gergely Daroczi
## Description: This package interacts directly with the Quandl API
##        to offer data in a number of formats usable in R, as well
##        as the ability to upload and search.
## 
## -- File: /home/rho/RLib/Quandl/Meta/package.rds 
## -- Fields read: Package, Version, Author, Description

packageDescription('pdfetch', 
                   fields = c('Package', 'Version', 'Author', 'Description'))

## Package: pdfetch
## Version: 0.1.5
## Author: Abiel Reinhart <abielr@gmail.com>
## Description: A package for downloading economic and financial time
##        series from public sources.
## 
## -- File: /home/rho/RLib/pdfetch/Meta/package.rds 
## -- Fields read: Package, Version, Author, Description

Data from YAHOO

Tabe 1 : Frequently Used ticker symbols in YAHOO

name	ticker symbol	name	ticker Symbol
上证综指	000001.ss or ^SSEC	恒生指数	^HSI
上证A指	000002.ss	Taiwan weighted	^TWII
上证B指	000003.ss	NIKKEI225	^N225
深证成指	399001.sz	Seoul composite	^KSI
深成指A	399002.sz	straights times	^STI
深成指B	399003.sz	NASDAQ	^IXIC
深证综指	399106.sz	DJIA	^DJI
深综指A	399107.sz	SP500	^GSPC
深综指B	399108.sz	RUSSELL 1000	^RUI
中小板指	399005.sz	Wilshire Total Market	^W5000
沪深300	000300.ss	FTSE 100	^FTSE

We will retrieve the historical dividends of “Hershey Foods” (HSY, 好时) and “Industrial and Commercial Bank of China” (601398.ss). Then we will get the historical trading prices of “Yinxing Energy” (000862.sz).

require('quantmod')
div.HSY <- getDividends('HSY', from = '2000-01-01', to = '2014-08-20')
div.ICBC <- getDividends('601398.ss', from = '2007-01-01', to = '2014-08-20')
# HSY pays its dividends quarterly, ICBC pays per annual
index.HSY <- endpoints(div.HSY, on = 'years')
ydiv.HSY <- period.sum(div.HSY, INDEX = index.HSY)  # annual dividends

yinxing <- getSymbols('000862.sz', auto.assign = FALSE, from = '2010-10-10')
# number cannot used as object name, we set auto.assign as FALSE (default is TRUE), # or we can
setSymbolLookup(YINXING = list(name = '000862.sz'))
getSymbols('YINXING', from = '2010-10-10')

## [1] "YINXING"

Function getSplits() download split data from Yahoo, getQuote() fetch current stock quotes, getOptionChain() download option chain data.

getQuote('HSY; MSFT')

As a demonstration, we use pdfetch package to download historical trading data of “Yinxing Energy”.

require('pdfetch')
Yinxing.pdfetch <- pdfetch_YAHOO(identifiers = '000862.sz', fields = 'adjclose',
                         from = '2010-10-10')

Data from Google

Using Google Finance, we can retrieve financial statements of listed companies. But as you know, we can not link google freely. So you have to use some climbing over wall techniques to use this function.

getFinancials('HSY')
viewFinancials(HSY, type = 'BS', period = 'A')
# 'BS' for balance sheet, 'IS' for income statement, 'CF' for cash flow statement
# 'A' for annual, 'Q' for quarterly

Data from Oanda

We can retrieve foreign exchanges or precious metal data from oanda. Beware that there is a limit of 500 days per request.

getSymbols('USD/CNY', src = 'oanda', from = '2013-10-10')

## [1] "USDCNY"

# auto.assign = TRUE, or just getFX() 
getMetals(Metals = 'gold', base.currency = 'CNY', from = '2013-10-10')

## [1] "XAUCNY"

Metals can be expressed in common name or symbol form.

Table 2 : 3-character symbols of precious metals

name	symbol	name	symbol
gold	XAU	silver	XAG
palladium	XPD	platinum	XPT

Data from FRED

getSymbols('GDPC96', src = 'FRED') # us real gdp

## [1] "GDPC96"

getSymbols('AAA', src = 'FRED')

## [1] "AAA"

# monthly Moody's seasoned AAA corporate bond yield
# 'DAAA' for daily, 'WAAA' for weekly'

FRED provides both metadata and time series data for all of its economic data. We can get the metadata in this way.

base <- 'http://research.stlouisfed.org/fred2/series/'
seriesID <- 'GDPC96'
URL <- paste0(base, seriesID, '/downloaddata/', seriesID, '.xls')
tmp <- tempfile()
download.file(url = URL, destfile = tmp)
require('gdata')
gdpmetadata <- read.xls(tmp, nrows = 11, header = FALSE)
unlink(tmp)
require('xtable')
print(xtable(gdpmetadata, caption = 'Table 3: FRED metadata for GDPC96', 
             align = rep('l', 3)), 
      type = 'html', caption.placement = 'top', include.rownames = FALSE,
      include.colnames = FALSE)

Table 3: FRED metadata for GDPC96
Title:	Real Gross Domestic Product, 3 Decimal
Series ID:	GDPC96
Source:	U.S. Department of Commerce: Bureau of Economic Analysis
Release:	Gross Domestic Product
Seasonal Adjustment:	Seasonally Adjusted Annual Rate
Frequency:	Quarterly
Units:	Billions of Chained 2009 Dollars
Date Range:	1947-01-01 to 2014-04-01
Last Updated:	2014-07-30 10:16 AM CDT
Notes:	A Guide to the National Income and Product Accounts of the United
	States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)

Data from World Bank

Package WDI (world development indicators) used to fetch data from WDI. Here we want to fetch annual data of the ratio of listed companies’ market capitalization to GDP from year 2000 to 2012 in China, USA, GB, JP and IN.

require('WDI')
WDIsearch(string = 'market capitalization')

##      indicator          
## [1,] "CM.MKT.LCAP.CD"   
## [2,] "CM.MKT.LCAP.GD.ZS"
##      name                                                     
## [1,] "Market capitalization of listed companies (current US$)"
## [2,] "Market capitalization of listed companies (% of GDP)"

Now we know the series ID is “CM.MKT.LCAP.GD.ZS”. We can also search series IdD in < http://data.worldbank.org/indicator>.

mc <- WDI(indicator = 'CM.MKT.LCAP.GD.ZS',
          country = c('CN', 'US', 'GB', 'JP', 'IN'),
          start = 2000, end = 2012)
mc[mc$iso2c == 'CN', ]

##    iso2c country CM.MKT.LCAP.GD.ZS year
## 1     CN   China             44.93 2012
## 2     CN   China             46.29 2011
## 3     CN   China             80.31 2010
## 4     CN   China            100.35 2009
## 5     CN   China             61.78 2008
## 6     CN   China            178.20 2007
## 7     CN   China             89.43 2006
## 8     CN   China             34.59 2005
## 9     CN   China             33.12 2004
## 10    CN   China             41.51 2003
## 11    CN   China             31.85 2002
## 12    CN   China             39.55 2001
## 13    CN   China             48.48 2000

Table 4: Frequently Used Indicators (WDI)

indicator	name
AG.LND.ARBL.HA.PC	arable land (hectares per person)
AG.LND.ARBL.ZS	arable land (% of land area)
AG.LND.FRST.ZS	forest area (% of land area)
SP.RUR.TOTL.ZS	rural population (% of total population)
EN.ATM.CO2E.PC	CO2 emissions (metric tons per capita)
EN.ATM.CO2E.PP.GD	CO2 emissions (kg per PPP $ of GDP)
NY.GDP.MKTP.KD	GDP (constant 2000 US$)
NY.GDP.PCAP.KD	GDP per capita (constant 2000 US$)
NY.GDP.DEFL.KD.ZG	Inflation, GDP deflator (annual %)
NY.GDP.MKTP.CD.XD	GDP deflator, index (2000=100; US$ series)
GC.TAX.TOTL.GD.ZS	tax revenue (% of GDP)
SE.XPD.TOTL.GD.ZS	Public spending on education, total (% of GDP)
SL.UEM.TOTL.ZS	unemployment, total (% of total labor force)
CM.MKT.LCAP.GD.ZS	market capitalization of listed companies (% of GDP)
CM.MKT.TRNR	stocks traded, turnover ratio (%)
FR.INR.LNDP	interest rate spread (lending rate minus deposit rate)
SP.POP.65UP.TO.ZS	population ages 65 and above (% of total)
IT.NET.USER.P3	internet users (per 1,000 people)
IS.VEH.NVEH.P3	motor vehicles (per 1,000 people)
SI.DST.10TH.10	income share held by highest 10%
SI.DST.FRST.10	Income share held by lowest 10%
MS.MIL.XPND.GD.ZS	ilitary expenditure (% of GDP)
GB.XPD.RSDV.GD.ZS	research and development expenditure (% of GDP)
TX.VAL.TECH.MF.ZS	high-technology exports (% of manufactured exports)

arable: adj. (of farmland) capable of being farmed productively

hectare: noun, (abbreviated `ha’) equal to 100 ares (or 10000 square meters)

Data from Quandl

You should get a free account first. Click your account to get the authentication token.

Here we will get historical data of China crude oil consumption.

require('Quandl')
Quandl.auth('yourAuthenticationToken')
Quandl.search(query = 'China crude oil consumption')
# we get code: INDEXMUNDI/ENERGY_CHINA_CRUDEOIL
oilCons <- Quandl('INDEXMUNDI/ENERGY_CHINA_CRUDEOIL', type = 'xts',
                  start_date = '2005-01-01', end_date = '2013-12-31')

We can also upload our own data to Quandl with Quandl.push() function.