Disclaimer: The content of this RMarkdown note came from a course called Importing and Managing Financial Data in R in datacamp.

Importing and Managing Financial Data in R

If you’ve ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.

# Install packages
#install.packages("quantmod") #Once it's installed, you won't have to run this code again
#install.packages("xts")
#install.packages("Quandl") 

# Load packages
library(quantmod) #for use of getSymbols()
library(xts)
library(Quandl) #for useof  Quandl()

Chapter 1: Introduction and downloading data

A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.

1.1 Introducing getSymbols()

The getSymbols() function from the quantmod package provides a consistent interface to import data from various sources into your workspace. By default, getSymbols() imports the data as a xts object.

getSymbols() data sources:

  • Yahoo! Finance
  • Google Finance
  • FRED for economic data
  • Oanda for foreign exchange
  • csv
  • Yahoo! Finance Japan
  • MySQL
  • SQLite
  • RData
  • rds (created by saveRDS())
# Import QQQ data from Yahoo! Finance
getSymbols(Symbols = "QQQ", auto_assign = TRUE)
## [1] "QQQ"

# Look at the structure of the object getSymbols created
str(QQQ)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
##   Data: num [1:2643, 1:6] 47.8 47.7 48.4 48.3 48.4 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "yahoo"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:25:56"

# Look at the first few rows of QQQ
head(QQQ)
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03   47.832   48.492  46.797     43.24  167689500     39.28798
## 2007-01-04   47.656   48.657  47.491     44.06  136853500     40.03301
## 2007-01-05   48.371   48.371  47.854     43.85  138958800     39.84221
## 2007-01-08   48.305   48.558  48.030     43.88  106401600     39.86947
## 2007-01-09   48.437   48.745  48.019     44.10  121577500     40.06936
## 2007-01-10   48.382   49.152  48.228     44.62  121070100     40.54183

1.2 Data sources

Import data from Google Finance and FRED. Google Finance is a source similar to Yahoo! Finance. FRED is an online database of economic time series data created and maintained by the Federal Reserve Bank of St. Louis.

getSymbols() imports data from Yahoo! Finance by default because src = "yahoo" by default. The src values for Google Finance and FRED are "google" and "FRED", respectively.

# Import QQQ data from Google Finance
getSymbols(Symbols = "QQQ", src = "google")
## [1] "QQQ"

# Look at the structure of QQQ
str(QQQ)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
##   Data: num [1:2642, 1:5] 43.5 43.3 43.9 43.9 44 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:5] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "google"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:25:57"

# Import GDP data from FRED
getSymbols(Symbols = "GDP", src = "FRED")
## [1] "GDP"

# Look at the structure of GDP
str(GDP)
## An 'xts' object on 1947-01-01/2017-01-01 containing:
##   Data: num [1:281, 1] 243 246 250 260 266 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "GDP"
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "FRED"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"

1.3 Make getSymbols() return the data it retrieves

In the last exercise, getSymbols() automatically created an object for you, based on the symbol you provided. Sometimes you may want to assign the data to an object yourself, so you need getSymbols() to return the data instead.

# Assign SPY data to 'spy' object using auto.assign argument
spy <- getSymbols(Symbols = "SPY", auto.assign = FALSE)

# Look at the structure of the 'spy' object
str(spy)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
##   Data: num [1:2643, 1:6] 177 176 176 175 176 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "yahoo"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"

# Assign JNJ data to 'jnj' object using env argument
jnj <- getSymbols(Symbols = "JNJ", env = NULL) #env = NULL does the same as auto.assign = FALSE

# Look at the structure of the 'jnj' object
str(jnj)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
##   Data: num [1:2643, 1:6] 91.2 90.8 92.1 91.7 91.8 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "JNJ.Open" "JNJ.High" "JNJ.Low" "JNJ.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "yahoo"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"

1.4 Introducing Quandl()

Similar to how the quantmod package provides getSymbols() to import data from various sources, the Quandl package provides access to the Quandl databases via one simple function: Quandl().

Recall that getSymbols() uses the Symbols and src arguments to specify the instrument and data source, respectively. The Quandl() function specifies both the data source and the instrument via its code argument, in the form “DATABASE/DATASET”.

Two other ways Quandl() differs from getSymbols() are:

  • Quandl() returns a data.frame by default.
  • Quandl() will not automatically assign the data to an object.

If you plan on importing a lot of data using Quandl(), you might consider opening a free account with them in order to get an API key. You can find your API key on your account settings page. Click here for more information.

# Provide your api_key so that you
#Quandl.api_key('your api_key')

# Import GDP data from FRED
gdp <- Quandl(code = "FRED/GDP")

# Look at the structure of the object returned by Quandl
str(gdp)
## 'data.frame':    281 obs. of  2 variables:
##  $ Date : Date, format: "2017-01-01" "2016-10-01" ...
##  $ Value: num  19027 18869 18675 18450 18282 ...
##  - attr(*, "freq")= chr "quarterly"

1.5 Return data type

Learn how to use the type argument to Quandl() to make it return an xts and a zoo object.

The possible values for type are:

  • “raw” (a data.frame),
  • “ts” (time-series objects from the stats package),
  • “zoo”,
  • “xts”, and
  • “timeSeries” (from the timeSeries package in the RMetrics suite).
# Import GDP data from FRED as xts
gdp_xts <- Quandl(code = "FRED/GDP", type = "xts")

# Look at the structure of gdp_xts
str(gdp_xts)
## An 'xts' object on 1947 Q1/2017 Q1 containing:
##   Data: num [1:281, 1] 243 246 250 260 266 ...
##   Indexed by objects of class: [yearqtr] TZ: 
##   xts Attributes:  
##  NULL

# Import GDP data from FRED as zoo
gdp_zoo <- Quandl(code = "FRED/GDP", type = "zoo")

# Look at the structure of gdp_zoo
str(gdp_zoo)
## 'zooreg' series from 1947 Q1 to 2017 Q1
##   Data: num [1:281] 243 246 250 260 266 ...
##   Index: Class 'yearqtr'  num [1:281] 1947 1947 1948 1948 1948 ...
##   Frequency: 4

1.6 Find stock ticker from Google Finance

Before you can import data from any internet data source, you need to know the instrument identifier. These can generally be found on the website of the data source provider. In this exercise, you’ll need to search Google Finance to find the ticker symbol for Pfizer stock.

Note that some data series may not be available for download from certain providers. This may be true even if you can see the data displayed on their website in tables and/or charts. getSymbols() will throw an error if the data is not available for download.

# Create an object containing the Pfizer ticker symbol
symbol = "PFE"

# Use getSymbols to import the data
getSymbols(Symbols = symbol, src = "google")
## [1] "PFE"

# Look at the first few rows of data
head(PFE)
##            PFE.Open PFE.High PFE.Low PFE.Close PFE.Volume
## 2007-01-03    26.15    26.42   25.98     26.29   40645100
## 2007-01-04    26.38    26.57   26.29     26.38   32258100
## 2007-01-05    26.55    26.63   26.17     26.30   31355800
## 2007-01-08    26.29    26.42   25.89     26.16   43224400
## 2007-01-09    26.24    26.34   26.04     26.17   31321400
## 2007-01-10    26.10    26.26   26.00     26.20   34548400

1.7 Download exchange rate data from Oanda

Oanda.com provides historical foreign exchange data for many currency pairs. Currency pairs are expressed as two currencies separated by a “/”, where the first currency is the “base” and the second is the “quote”. For example, the U.S. Dollar to Euro exchange rate would be expressed as "USD/EUR".

Since object names can’t contain “/”, getSymbols() will convert USD/EUR to a valid name by removing the “/”. For example, getSymbols("USD/EUR") would automatically create an object named USDEUR.

Also, Oanda.com only provides historical data for the past 180 days. A warning will be thrown if you request data from more than 180 days ago, but getSymbols() will return as much data as possible. You can change the date range of your request by using the from and to arguments; both of which are dates expressed by character strings in "%Y-%m-%d" format (e.g. "2016-02-06").

# There seems to be an error in IBM's system itself. It doesn't import data from Oanda.com

# Create a currency_pair object
currency_pair <- "GBP/CAD"

# Load British Pound to Canadian Dollar exchange rate data
getSymbols(Symbols = currency_pair, src = "oanda")
## [1] "GBPCAD"

# Examine object using str()
str(GBPCAD)
## An 'xts' object on 2017-01-03/2017-06-30 containing:
##   Data: num [1:179, 1] 1.65 1.64 1.64 1.64 1.63 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "GBP.CAD"
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "oanda"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:26:01"

# Try to load data from 190 days ago
getSymbols(Symbols = currency_pair, from = Sys.Date() - 190, to = Sys.Date(), src = "oanda")
## [1] "GBPCAD"

1.8 Find and download US Civilian Unemployment Rate data from FRED

Both getSymbols() andQuandl() provide access to the FRED database.

# Create a series_name object
series_name <- "UNRATE"   # To find the series name, go to the FRED website and search for "United States civilian unemployment rate".

# Load the data using getSymbols
getSymbols(Symbols = series_name, src = "FRED")
## [1] "UNRATE"

# Create a quandl_code object
quandl_code <- "FRED/UNRATE"

# Load the data using Quandl
unemploy_rate <- Quandl(code = quandl_code)

Chapter 2: Extracting and transforming data

You’ve learned how to import data from online sources, now it’s time to see how to extract columns from the imported data. After you’ve learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.

2.1 Extract one column from one instrument

The quantmod package provides several helper functions to extract specific columns from an object, based on the column name. Six of the functions extract a single column.

  • Op(): for opening price
  • Hi(): for high price
  • Lo(): for low price
  • Cl(): for close price
  • Vo(): for traded volume
  • Ad(): for adjusted close price
# Load data
DC <- read.csv("data/Importing and Managing Financial Data in R/DC.csv")
DC$Date <- as.POSIXct(DC$Date, format("%Y-%m-%d %H:%M:%S"))
DC <- xts(DC[, 2:6], order.by = DC$Date)
str(DC)
## An 'xts' object on 2016-01-16 01:00:00/2016-01-17 23:00:00 containing:
##   Data: num [1:47, 1:5] 20.8 20.8 20.8 20.8 20.8 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:5] "DC.Open" "DC.High" "DC.Low" "DC.Close" ...
##   Indexed by objects of class: [POSIXct,POSIXt] TZ: %Y-%m-%d %H:%M:%S
##   xts Attributes:  
##  NULL

# Look at the head of DC
head(DC)
##                     DC.Open DC.High DC.Low DC.Close DC.Volume
## 2016-01-16 01:00:00  20.845  20.850 20.835   20.845       157
## 2016-01-16 02:00:00  20.845  20.850 20.835   20.845       214
## 2016-01-16 03:00:00  20.845  20.850 20.835   20.845       103
## 2016-01-16 04:00:00  20.845  20.855 20.835   20.845       180
## 2016-01-16 05:00:00  20.845  20.845 20.845   20.845       211
## 2016-01-16 06:00:00  20.845  20.845 20.840   20.845        35

# Extract the close column
dc_close <- Cl(DC)

# Look at the head of dc_close
head(dc_close)
##                     DC.Close
## 2016-01-16 01:00:00   20.845
## 2016-01-16 02:00:00   20.845
## 2016-01-16 03:00:00   20.845
## 2016-01-16 04:00:00   20.845
## 2016-01-16 05:00:00   20.845
## 2016-01-16 06:00:00   20.845

# Extract the volume column
dc_volume <- Vo(DC)

# Look at the head of dc_volume
head(dc_volume)
##                     DC.Volume
## 2016-01-16 01:00:00       157
## 2016-01-16 02:00:00       214
## 2016-01-16 03:00:00       103
## 2016-01-16 04:00:00       180
## 2016-01-16 05:00:00       211
## 2016-01-16 06:00:00        35

2.2 Extract multiple columns from one instrument

The quantmod package also provides several functions you can use to extract a specific set of columns. See help("OHLC.Transformations") for more details. For example:

  • OHLC(): for open, high, low, close
  • HLC()
  • OHLCV()
# Extract the high, low, and close columns
dc_hlc <- HLC(DC)

# Look at the head of dc_hlc
head(dc_hlc)
##                     DC.High DC.Low DC.Close
## 2016-01-16 01:00:00  20.850 20.835   20.845
## 2016-01-16 02:00:00  20.850 20.835   20.845
## 2016-01-16 03:00:00  20.850 20.835   20.845
## 2016-01-16 04:00:00  20.855 20.835   20.845
## 2016-01-16 05:00:00  20.845 20.845   20.845
## 2016-01-16 06:00:00  20.845 20.840   20.845

# Extract the open, high, low, close, and volume columns
dc_ohlcv <- OHLCV(DC)

# Look at the head of dc_ohlcv
head(dc_ohlcv)
##                     DC.Open DC.High DC.Low DC.Close DC.Volume
## 2016-01-16 01:00:00  20.845  20.850 20.835   20.845       157
## 2016-01-16 02:00:00  20.845  20.850 20.835   20.845       214
## 2016-01-16 03:00:00  20.845  20.850 20.835   20.845       103
## 2016-01-16 04:00:00  20.845  20.855 20.835   20.845       180
## 2016-01-16 05:00:00  20.845  20.845 20.845   20.845       211
## 2016-01-16 06:00:00  20.845  20.845 20.840   20.845        35

2.3 Use getPrice to extract other columns

Sometimes you might have one object that contains the same price column for multiple instruments. Other times, you might have an object with price data (e.g. bid and/or ask prices) that do not have an explicit extractor function.

The quantmod package also provides a getPrice() function that is flexible enough to handle these cases. You can extract a column with any name (e.g. bid, ask, trade) with it by specifying the name via the prefer argument. It is also useful if you have an object that contains multiple instruments with the same price type, because you can use the symbol argument to extract columns for a specific instrument.

# Download CME data for CL and BZ as an xts object
oil_data <- Quandl(code = c("CME/CLH2016", "CME/BZH2016"), type = "xts")

# Look at the column names of the oil_data object
colnames(oil_data)
##  [1] "CME.CLH2016 - Open"          "CME.CLH2016 - High"         
##  [3] "CME.CLH2016 - Low"           "CME.CLH2016 - Last"         
##  [5] "CME.CLH2016 - Change"        "CME.CLH2016 - Settle"       
##  [7] "CME.CLH2016 - Volume"        "CME.CLH2016 - Open Interest"
##  [9] "CME.BZH2016 - Open"          "CME.BZH2016 - High"         
## [11] "CME.BZH2016 - Low"           "CME.BZH2016 - Last"         
## [13] "CME.BZH2016 - Change"        "CME.BZH2016 - Settle"       
## [15] "CME.BZH2016 - Volume"        "CME.BZH2016 - Open Interest"

# Extract the Open price for CLH2016
cl_open <- getPrice(oil_data, symbol = "CLH2016", prefer = "Open$") #$ is to prevent you from also retrieving 'Open Interest'

# Look at January, 2016 using xts' ISO-8601 subsetting
cl_open["2016-01"]
##            CME.CLH2016 - Open
## 2016-01-04              38.75
## 2016-01-05              38.07
## 2016-01-06              37.39
## 2016-01-07              35.35
## 2016-01-08              34.50
## 2016-01-11              34.11
## 2016-01-12              32.24
## 2016-01-13              31.60
## 2016-01-14              31.47
## 2016-01-15              32.11
## 2016-01-19              30.17
## 2016-01-20              29.47
## 2016-01-21              28.35
## 2016-01-22              29.84
## 2016-01-25              32.05
## 2016-01-26              29.81
## 2016-01-27              30.55
## 2016-01-28              32.19
## 2016-01-29              33.70

2.4 Use Quandl to download weekly returns data

Sometimes you need to aggregate and/or transform raw data before you can continue your analysis. The Quandl() function makes this easy by allowing you to request common aggregations and transformations. You just need to specify the collapse and/or transform arguments and the Quandl API takes care of the details for you.

The collapse argument gives simply the last observation of the period. Note that aggregations are peroformed before transformations.

trasnforming with Quandl:

  • diff: change from the prior period
  • rdiff: % change from the prior period
  • rdiff-from: % change from the latest period
  • cumul: cumulative sum
  • normalize: scale to start at 100
# CL and BZ Quandl codes
quandl_codes <- c("CME/CLH2016","CME/BZH2016")

# Download quarterly CL and BZ prices
qtr_price <- Quandl(quandl_codes, collapse = "quarterly", type = "xts") #collapse gives the last observation of the period

# View the high prices for both series
Hi(qtr_price)
##         CME.CLH2016 - High CME.BZH2016 - High
## 2010 Q4              92.94                 NA
## 2011 Q1             101.43                 NA
## 2011 Q2             100.19                 NA
## 2011 Q3              91.59                 NA
## 2011 Q4               0.00               0.00
## 2012 Q1               0.00               0.00
## 2012 Q2               0.00               0.00
## 2012 Q3               0.00               0.00
## 2012 Q4               0.00               0.00
## 2013 Q1               0.00               0.00
## 2013 Q2               0.00               0.00
## 2013 Q3               0.00               0.00
## 2013 Q4               0.00               0.00
## 2014 Q1              86.00               0.00
## 2014 Q2              93.05                 NA
## 2014 Q3               0.00                 NA
## 2014 Q4              61.03              69.45
## 2015 Q1              57.17              64.45
## 2015 Q2              61.63              66.72
## 2015 Q3              47.95              51.37
## 2015 Q4              38.87              38.34
## 2016 Q1              32.05              35.00

# Download quarterly CL and BZ returns
qtr_return <- Quandl(quandl_codes, collapse = "quarterly", type = "xts", transform = "rdiff")

# View the settle price returns for both series
getPrice(qtr_return, prefer = "Settle")
##         CME.CLH2016 - Settle CME.BZH2016 - Settle
## 2011 Q1          0.091349258                   NA
## 2011 Q2         -0.012225180                   NA
## 2011 Q3         -0.085836910                   NA
## 2011 Q4         -0.012337591                   NA
## 2012 Q1          0.036038028          0.040508511
## 2012 Q2         -0.074797268         -0.040381031
## 2012 Q3          0.016030446          0.029779888
## 2012 Q4          0.008286039          0.006496228
## 2013 Q1         -0.011595182          0.001561524
## 2013 Q2         -0.045899772         -0.041679659
## 2013 Q3          0.025784887          0.027440347
## 2013 Q4         -0.011521005          0.038847250
## 2014 Q1          0.014480810          0.008535718
## 2014 Q2          0.079377974          0.052191436
## 2014 Q3         -0.069454897         -0.070669348
## 2014 Q4         -0.295436164         -0.306543019
## 2015 Q1         -0.073958675         -0.073551263
## 2015 Q2          0.088188419          0.071692061
## 2015 Q3         -0.229129373         -0.237503741
## 2015 Q4         -0.194215748         -0.260647694
## 2016 Q1         -0.175268535         -0.079108044

2.5 Combine objects from an environment using do.call and eapply

There may be times where you need to aggregate or transform your data in ways Quandl() does not support. In those cases, you can use the flexibility of R.

One paradigm you can use in the quantmod workflow involves environments. Store all your data in one environment. Then you can use eapply() to call a function on each object in the environment, much like what lapply() does for each element of a list. Also like lapply(), eapply() returns a list.

Then you can merge all the elements of the list into one object by using do.call(), which is like having R programmatically type and run a command for you. Instead of typing merge(my_list[[1]], my_list[[2]]], ...), you can type do.call(merge, my_list).

# Creat a new environment
data_env <- new.env()

# Load data into the environment
getSymbols(Symbols = c("QQQ"), env = data_env, auto.assign = TRUE) #This will not generate the exact same restult as is  in datacamp but the point is to show how to combine objects
## [1] "QQQ"


# Call head on each object in data_env using eapply
data_list <- eapply(data_env, head)
head(data_list)
## $QQQ
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03   47.832   48.492  46.797     43.24  167689500     39.28798
## 2007-01-04   47.656   48.657  47.491     44.06  136853500     40.03301
## 2007-01-05   48.371   48.371  47.854     43.85  138958800     39.84221
## 2007-01-08   48.305   48.558  48.030     43.88  106401600     39.86947
## 2007-01-09   48.437   48.745  48.019     44.10  121577500     40.06936
## 2007-01-10   48.382   49.152  48.228     44.62  121070100     40.54183

# Merge all the list elements into one xts object
data_merged <- do.call(merge, data_list)
head(data_merged)
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03   47.832   48.492  46.797     43.24  167689500     39.28798
## 2007-01-04   47.656   48.657  47.491     44.06  136853500     40.03301
## 2007-01-05   48.371   48.371  47.854     43.85  138958800     39.84221
## 2007-01-08   48.305   48.558  48.030     43.88  106401600     39.86947
## 2007-01-09   48.437   48.745  48.019     44.10  121577500     40.06936
## 2007-01-10   48.382   49.152  48.228     44.62  121070100     40.54183

# Ensure the columns are ordered: open, high, low, close
data_ohlc <- OHLC(data_merged)
head(data_ohlc)
##            QQQ.Open QQQ.High QQQ.Low QQQ.Close
## 2007-01-03   47.832   48.492  46.797     43.24
## 2007-01-04   47.656   48.657  47.491     44.06
## 2007-01-05   48.371   48.371  47.854     43.85
## 2007-01-08   48.305   48.558  48.030     43.88
## 2007-01-09   48.437   48.745  48.019     44.10
## 2007-01-10   48.382   49.152  48.228     44.62

2.6 Use quantmod to download multiple instruments and extract the close column

The previous exercise taught you how to use do.call(merge, eapply(env, fun)) to apply a function to each object in an environment and then combine all the results into one object.

Let’s use what you learned to solve a very common problem. Often you will need to load similar data for many instruments, extract a column, and create one object that contains that specific column for every instrument.

# Symbols
symbols <- c("AAPL", "MSFT", "IBM")

# Create new environment
data_env <- new.env()

# Load symbols into data_env
getSymbols(Symbols = symbols, env = data_env)
## [1] "AAPL" "MSFT" "IBM"

# Extract the close column from each object and merge into one xts object
close_data <- do.call(merge, eapply(data_env, Cl))

# View the head of close_data
head(close_data)
##            AAPL.Close IBM.Close MSFT.Close
## 2007-01-03   83.80000     97.27      29.86
## 2007-01-04   85.66000     98.31      29.81
## 2007-01-05   85.04999     97.42      29.64
## 2007-01-08   85.47000     98.90      29.93
## 2007-01-09   92.57000    100.07      29.96
## 2007-01-10   97.00000     98.89      29.66

Chapter 3: Managing data from multiple sources

Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols(). You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.

3.1 Setting the default data source

Change the default data source for getSymbols(). The setDefaults() function allows you to change the default argument values for getSymbols(). Setting a default source can be useful if you use that source often.

# Set the default to pull data from Google Finance
setDefaults(getSymbols, src = "google")

# Get GOOG data
getSymbols(Symbols = "GOOG")
## [1] "GOOG"

# Verify the data was actually pulled from Google Finance
str(GOOG)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
##   Data: num [1:2642, 1:5] 233 234 241 244 242 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:5] "GOOG.Open" "GOOG.High" "GOOG.Low" "GOOG.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "google"
##  $ updated: POSIXct[1:1], format: "2017-07-01 10:26:06"

3.2 Set default arguments for a getSymbols source method

Change the default value for the from argument to getSymbols.yahoo(), using setDefaults(). Calling getDefaults() will show you the current default values (if there are any).

# Look at getSymbols.yahoo arguments
args(getSymbols.yahoo)
## function (Symbols, env, return.class = "xts", index.class = "Date", 
##     from = "2007-01-01", to = Sys.Date(), ..., periodicity = "daily") 
## NULL

# Set default 'from' value for getSymbols.yahoo
setDefaults(getSymbols.yahoo, from = "2000-01-01")

# Confirm defaults were set correctly
getDefaults("getSymbols.yahoo")
## $from
## [1] "'2000-01-01'"

3.3 Set default data source for one symbol

Use setSymbolLookup() to specify the default data source for an instrument.

It is useful to set a single symbol to be pulled from a specific source, if multiple sources use the same symbol for different instruments. For example, getSymbols(“CP”, src = “yahoo”) would load Canadian Pacific Railway data from the New York Stock Exchange. But getSymbols(“CP”, src = “FRED”) would load Corporate Profits After Tax from the U.S. Bureau of Economic Analysis.

  • setSymbolLookup(AAPL = “google”): set default
  • getSymbolLookup(): verify the default changed
  • saveSymbolLookup(“symbol_lookup.rda”): save lookup
  • loadSymbolLookup(“symbol_lookup.rda”): load lookup
  • setSymbolLookup(AAPL = NULL): remove lookup
# Load CP data using defaults where default source is yahoo
getSymbols("CP", src = "yahoo") # Canadian railway stock prices
## [1] "CP"

# Look at the first few rows of CP
head(CP)
##            CP.Open CP.High CP.Low CP.Close CP.Volume CP.Adjusted
## 2000-01-03  29.069  29.236 27.896  20.8750    358200    7.787133
## 2000-01-04  27.729  29.153 27.729  21.0000   1160200    7.833765
## 2000-01-05  28.147  29.153 28.147  21.5625    852200    8.043595
## 2000-01-06  29.404  30.744 29.236  22.6250   1960400    8.439946
## 2000-01-07  30.493  31.163 29.990  23.0000   1758800    8.579834
## 2000-01-10  30.912  31.331 30.074  23.1875   1273600    8.649777

# Set the source for CP to FRED
setSymbolLookup("CP" = "FRED") # corporate profits from USBEA

# Load CP data again
getSymbols("CP")
## [1] "CP"

# Look at the first few rows of CP
head(CP)
##            CP.Open CP.High CP.Low CP.Close CP.Volume
## 2007-01-03   53.43   53.68  52.40    52.74    207200
## 2007-01-04   52.42   52.48  51.54    51.62    238700
## 2007-01-05   51.63   51.77  51.39    51.53    180000
## 2007-01-08   51.23   52.14  51.21    51.94    276200
## 2007-01-09   51.44   52.42  51.44    52.25    419300
## 2007-01-10   52.21   52.41  51.92    52.07    206100

3.4 Save and load symbol lookup table

Changing default settings only last for the current session. Save and load symbol-based defaults by using saveSymbolLookup() and loadSymbolLookup(), respectively. You can use the file arguments to specify a file to store your defaults.

# Save symbol lookup table
saveSymbolLookup("my_symbol_lookup.rda")

# Set default source for CP to "yahoo"
setSymbolLookup(CP = list(src = "yahoo"))

# Verify the default source is "yahoo"
getSymbolLookup("CP")
## $CP
## $CP$src
## [1] "yahoo"


# Load symbol lookup table
loadSymbolLookup("my_symbol_lookup.rda")

# Verify the default source is "FRED"
getSymbolLookup("CP")
## $CP
## $CP$src
## [1] "FRED"

3.5 Access the object using get() or backticks

Problem: Sometimes, you download data for an instrument that does not have a syntactically valid name. Syntactically valid names contain letters, numbers, “.”, and “_“, and must start with a letter or a”." followed by a non-number.

For example, the symbol for Berkshire Hathaway class A shares is “BRK-A”, which is not a syntactically valid name because it contains a “-” character. Another example are Chinese stocks, which have symbols composed of numbers. The Yahoo Finance symbol for the SSE Composite Index is “000001.SS”.

Solution: To access objects that do not have syntactically valid names, you can use:

  • the get function or
  • backticks (`)
  • getSymbols(…, assign = FALSE)
# Load BRK-A data
getSymbols(Symbols = "BRK-A")
## [1] "BRK-A"

# Use backticks and head() to look at the loaded data
head(`BRK-A`)
##            BRK-A.Open BRK-A.High BRK-A.Low BRK-A.Close BRK-A.Volume
## 2007-01-03     110400     110600  109000.0    109000.0          670
## 2007-01-04     109400     109400  108250.0    108850.0          460
## 2007-01-05     108800     108800  107000.1    107200.0          420
## 2007-01-08     107600     107800  107300.0    107500.1          340
## 2007-01-09     107510     109600  107510.0    109350.0          400
## 2007-01-10     109350     109550  108500.0    109490.0          370

# Use get() to assign the BRK-A data to an object named BRK.A
BRK.A <- get("BRK-A")

3.6 Create valid name for one instrument

If you are only downloading data for a single symbol and that symbol is not a syntactically valid name, you can set auto.assign = FALSE in your call to getSymbols().

# Create BRK.A object
BRK.A <- getSymbols("BRK-A", auto.assign = FALSE)

# Create col_names object with the column names of BRK.A
col_names <- colnames(BRK.A)

# Set BRK.A column names to syntactically valid names
colnames(BRK.A) <- make.names(col_names)
head(BRK.A)
##            BRK.A.Open BRK.A.High BRK.A.Low BRK.A.Close BRK.A.Volume
## 2007-01-03     110400     110600  109000.0    109000.0          670
## 2007-01-04     109400     109400  108250.0    108850.0          460
## 2007-01-05     108800     108800  107000.1    107200.0          420
## 2007-01-08     107600     107800  107300.0    107500.1          340
## 2007-01-09     107510     109600  107510.0    109350.0          400
## 2007-01-10     109350     109550  108500.0    109490.0          370

# Alternative
#colnames(BRK.A) <- paste("BRK.A", c("Open", "High", "Low", "Close", "Volume"), sep = ".")

3.7 Create valid names for multiple instruments

Not only can you use setSymbolLookup() to set a default data source for getSymbols(), you can also use setSymbolLookup() to create a mapping between the instrument symbol and the name of the R object.

This is useful if you want to download data for a lot symbols that are not syntactically valid names, or symbols that have names that conflict with other R variable names.

An example of a name that conflicts is the symbol for AT&T’s stock, T, which is often used as a short form for the logical value TRUE.

# Set name for BRK-A to BRK.A
setSymbolLookup(BRK.A = list(name = "BRK-A"))

# Set name for T (AT&T) to ATT
setSymbolLookup(ATT = list(name = "T"))

# Load BRK.A and ATT data
getSymbols(c("BRK.A", "ATT"))
## [1] "BRK.A" "ATT"

Chapter 4: Aligning data with different periodicities

You’ve learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.

4.1 Create an zero-width xts object with a regular index

Create a regular sequence of date-times that span the dates of your irregular data set.

# Import data
irregular_xts <- read.csv("data/Importing and Managing Financial Data in R/irregular_xts.csv")
irregular_xts$date <- as.Date(irregular_xts$date, format = "%m/%d/%Y")
irregular_xts <- xts(irregular_xts[, 2], order.by = irregular_xts$date)
irregular_xts
##            [,1]
## 2016-01-02    4
## 2016-01-05   21
## 2016-01-07    1
## 2016-01-11   34

# Extract the start date of the series
start_date <- start(irregular_xts)

# Extract the end date of the series
end_date <- end(irregular_xts)

# Create a regular date-time sequence
regular_index <- seq(from = start_date, to = end_date, by = "day")

# Create a zero-width xts object
regular_xts <- xts(, order.by = regular_index)

4.2 Merge irregular data with zero-width xts object with regular time index

Create a regular xts object from an irregular one.

# Merge irregular_xts and regular_xts
merged_xts <- merge(irregular_xts, regular_xts)

# Look at the first few rows of merged_xts
head(merged_xts)
##            irregular_xts
## 2016-01-02             4
## 2016-01-03            NA
## 2016-01-04            NA
## 2016-01-05            21
## 2016-01-06            NA
## 2016-01-07             1

# Use the fill argument to fill NA with their previous value
merged_filled_xts <- merge(irregular_xts, regular_xts, fill = na.locf)

# Look at the first few rows of merged_filled_xts
head(merged_filled_xts)
##            irregular_xts
## 2016-01-02             4
## 2016-01-03             4
## 2016-01-04             4
## 2016-01-05            21
## 2016-01-06            21
## 2016-01-07             1

4.3 Aggregate daily series to monthly, convert index to yearmon, merge with monthly series

Sometimes you may have two series with the same periodicity, but that use different conventions to represent a timestamp. For example:

  • FEDFUNDS is the monthly Fed Funds rate from FRED that is the average of all the days in the month, but the first day of the month is used for the timestamp.
  • DFF is the daily Fed Funds rate from FRED.

If you then try to merge FEDFUNDS with the monthly aggregate of DFF, the result will have a lot of NA because the timestamps representing the month are the first and last days of the month, respectively.

# Import data
getSymbols(c("FEDFUNDS", "DFF"), src = "FRED")
## [1] "FEDFUNDS" "DFF"

# Aggregate DFF to monthly
monthly_fedfunds <- apply.monthly(DFF, mean, na.rm = TRUE)
head(monthly_fedfunds) # monthly aggregates with the timestaps representing the last days of the month
##                  DFF
## 1954-07-31 0.7993548
## 1954-08-31 1.2206452
## 1954-09-30 1.0666667
## 1954-10-31 0.8487097
## 1954-11-30 0.8336667
## 1954-12-31 1.2777419

# Convert index to yearmon
index(monthly_fedfunds) <- as.yearmon(index(monthly_fedfunds))
head(monthly_fedfunds) # monthly aggregates with timestaps representing only months not days
##                DFF
## Jul 1954 0.7993548
## Aug 1954 1.2206452
## Sep 1954 1.0666667
## Oct 1954 0.8487097
## Nov 1954 0.8336667
## Dec 1954 1.2777419
head(FEDFUNDS) # monthly series with the timestamps representing the first days of the month
##            FEDFUNDS
## 1954-07-01     0.80
## 1954-08-01     1.22
## 1954-09-01     1.06
## 1954-10-01     0.85
## 1954-11-01     0.83
## 1954-12-01     1.28

# Merge FEDFUNDS with the monthly aggregate
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)

# Look at the first few rows of the merged object
head(merged_fedfunds)
##            FEDFUNDS       DFF
## 1954-07-01     0.80 0.7993548
## 1954-08-01     1.22 1.2206452
## 1954-09-01     1.06 1.0666667
## 1954-10-01     0.85 0.8487097
## 1954-11-01     0.83 0.8336667
## 1954-12-01     1.28 1.2777419

4.4 Align aggregated series with first day of month, then last day

When you can’t use convenience classes like yearmon to represent timestamps, like you did in the previous exercise, manually align merged data to the timestamp representation you prefer.

# Load monthly_fedfunds and merged_fedfunds without converting the monthly_fedfunds index to yearmon first.

monthly_fedfunds <- apply.monthly(DFF, mean)
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)

# Look at the first few rows of merged_fedfunds
head(merged_fedfunds)
##            FEDFUNDS       DFF
## 1954-07-01     0.80        NA
## 1954-07-31       NA 0.7993548
## 1954-08-01     1.22        NA
## 1954-08-31       NA 1.2206452
## 1954-09-01     1.06        NA
## 1954-09-30       NA 1.0666667

# Fill NA forward
merged_fedfunds_locf <- na.locf(merged_fedfunds)
head(merged_fedfunds_locf)
##            FEDFUNDS       DFF
## 1954-07-01     0.80        NA
## 1954-07-31     0.80 0.7993548
## 1954-08-01     1.22 0.7993548
## 1954-08-31     1.22 1.2206452
## 1954-09-01     1.06 1.2206452
## 1954-09-30     1.06 1.0666667

# Extract index values containing last day of month
aligned_last_day <- merged_fedfunds_locf[index(monthly_fedfunds)]
head(aligned_last_day)
##            FEDFUNDS       DFF
## 1954-07-31     0.80 0.7993548
## 1954-08-31     1.22 1.2206452
## 1954-09-30     1.06 1.0666667
## 1954-10-31     0.85 0.8487097
## 1954-11-30     0.83 0.8336667
## 1954-12-31     1.28 1.2777419

# Fill NA backward
merged_fedfunds_locb <- na.locf(merged_fedfunds, fromLast = TRUE)
head(merged_fedfunds_locb)
##            FEDFUNDS       DFF
## 1954-07-01     0.80 0.7993548
## 1954-07-31     1.22 0.7993548
## 1954-08-01     1.22 1.2206452
## 1954-08-31     1.06 1.2206452
## 1954-09-01     1.06 1.0666667
## 1954-09-30     0.85 1.0666667

# Extract index values containing first day of month
aligned_first_day <- merged_fedfunds_locb[index(FEDFUNDS)]
head(aligned_first_day)
##            FEDFUNDS       DFF
## 1954-07-01     0.80 0.7993548
## 1954-08-01     1.22 1.2206452
## 1954-09-01     1.06 1.0666667
## 1954-10-01     0.85 0.8487097
## 1954-11-01     0.83 0.8336667
## 1954-12-01     1.28 1.2777419

4.5 Aggregate to weekly, ending on Wednesdays

Aggregate daily data to weekly, but with weeks ending on Wednesdays, by using period.apply() and endpoints(). This is often done in stock market research papers to avoid some of the intra-week seasonality.

# Extract index weekdays
index_weekdays <- .indexwday(DFF)
head(index_weekdays)
## [1] 4 5 6 0 1 2

# Find locations of Wednesdays
wednesdays <- which(index_weekdays == 3)
head(wednesdays)
## [1]  7 14 21 28 35 42

# Create custom end points
end_points <- c(0, wednesdays, nrow(DFF))

head(end_points)
## [1]  0  7 14 21 28 35

# Calculate weekly mean using custom end points
weekly_mean <- period.apply(DFF, end_points, mean)
head(weekly_mean)
##                  DFF
## 1954-07-07 1.0014286
## 1954-07-14 1.2157143
## 1954-07-21 0.5714286
## 1954-07-28 0.6257143
## 1954-08-04 0.2700000
## 1954-08-11 1.3071429