Disclaimer: The content of this RMarkdown note came from a course called Importing and Managing Financial Data in R in datacamp.
If you’ve ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.
# Install packages
#install.packages("quantmod") #Once it's installed, you won't have to run this code again
#install.packages("xts")
#install.packages("Quandl")
# Load packages
library(quantmod) #for use of getSymbols()
library(xts)
library(Quandl) #for useof Quandl()
A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.
The getSymbols()
function from the quantmod package provides a consistent interface to import data from various sources into your workspace. By default, getSymbols()
imports the data as a xts object.
getSymbols()
data sources:
# Import QQQ data from Yahoo! Finance
getSymbols(Symbols = "QQQ", auto_assign = TRUE)
## [1] "QQQ"
# Look at the structure of the object getSymbols created
str(QQQ)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
## Data: num [1:2643, 1:6] 47.8 47.7 48.4 48.3 48.4 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:25:56"
# Look at the first few rows of QQQ
head(QQQ)
## QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03 47.832 48.492 46.797 43.24 167689500 39.28798
## 2007-01-04 47.656 48.657 47.491 44.06 136853500 40.03301
## 2007-01-05 48.371 48.371 47.854 43.85 138958800 39.84221
## 2007-01-08 48.305 48.558 48.030 43.88 106401600 39.86947
## 2007-01-09 48.437 48.745 48.019 44.10 121577500 40.06936
## 2007-01-10 48.382 49.152 48.228 44.62 121070100 40.54183
Import data from Google Finance and FRED. Google Finance is a source similar to Yahoo! Finance. FRED is an online database of economic time series data created and maintained by the Federal Reserve Bank of St. Louis.
getSymbols()
imports data from Yahoo! Finance by default because src = "yahoo"
by default. The src
values for Google Finance and FRED are "google"
and "FRED"
, respectively.
# Import QQQ data from Google Finance
getSymbols(Symbols = "QQQ", src = "google")
## [1] "QQQ"
# Look at the structure of QQQ
str(QQQ)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
## Data: num [1:2642, 1:5] 43.5 43.3 43.9 43.9 44 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "google"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:25:57"
# Import GDP data from FRED
getSymbols(Symbols = "GDP", src = "FRED")
## [1] "GDP"
# Look at the structure of GDP
str(GDP)
## An 'xts' object on 1947-01-01/2017-01-01 containing:
## Data: num [1:281, 1] 243 246 250 260 266 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "GDP"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "FRED"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"
In the last exercise, getSymbols() automatically created an object for you, based on the symbol you provided. Sometimes you may want to assign the data to an object yourself, so you need getSymbols() to return the data instead.
# Assign SPY data to 'spy' object using auto.assign argument
spy <- getSymbols(Symbols = "SPY", auto.assign = FALSE)
# Look at the structure of the 'spy' object
str(spy)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
## Data: num [1:2643, 1:6] 177 176 176 175 176 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"
# Assign JNJ data to 'jnj' object using env argument
jnj <- getSymbols(Symbols = "JNJ", env = NULL) #env = NULL does the same as auto.assign = FALSE
# Look at the structure of the 'jnj' object
str(jnj)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
## Data: num [1:2643, 1:6] 91.2 90.8 92.1 91.7 91.8 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "JNJ.Open" "JNJ.High" "JNJ.Low" "JNJ.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:25:58"
Similar to how the quantmod package provides getSymbols()
to import data from various sources, the Quandl package provides access to the Quandl databases via one simple function: Quandl()
.
Recall that getSymbols()
uses the Symbols
and src
arguments to specify the instrument and data source, respectively. The Quandl()
function specifies both the data source and the instrument via its cod
e argument, in the form “DATABASE/DATASET”.
Two other ways Quandl() differs from getSymbols() are:
If you plan on importing a lot of data using Quandl()
, you might consider opening a free account with them in order to get an API key. You can find your API key on your account settings page. Click here for more information.
# Provide your api_key so that you
#Quandl.api_key('your api_key')
# Import GDP data from FRED
gdp <- Quandl(code = "FRED/GDP")
# Look at the structure of the object returned by Quandl
str(gdp)
## 'data.frame': 281 obs. of 2 variables:
## $ Date : Date, format: "2017-01-01" "2016-10-01" ...
## $ Value: num 19027 18869 18675 18450 18282 ...
## - attr(*, "freq")= chr "quarterly"
Learn how to use the type
argument to Quandl()
to make it return an xts and a zoo object.
The possible values for type
are:
# Import GDP data from FRED as xts
gdp_xts <- Quandl(code = "FRED/GDP", type = "xts")
# Look at the structure of gdp_xts
str(gdp_xts)
## An 'xts' object on 1947 Q1/2017 Q1 containing:
## Data: num [1:281, 1] 243 246 250 260 266 ...
## Indexed by objects of class: [yearqtr] TZ:
## xts Attributes:
## NULL
# Import GDP data from FRED as zoo
gdp_zoo <- Quandl(code = "FRED/GDP", type = "zoo")
# Look at the structure of gdp_zoo
str(gdp_zoo)
## 'zooreg' series from 1947 Q1 to 2017 Q1
## Data: num [1:281] 243 246 250 260 266 ...
## Index: Class 'yearqtr' num [1:281] 1947 1947 1948 1948 1948 ...
## Frequency: 4
Before you can import data from any internet data source, you need to know the instrument identifier. These can generally be found on the website of the data source provider. In this exercise, you’ll need to search Google Finance to find the ticker symbol for Pfizer stock.
Note that some data series may not be available for download from certain providers. This may be true even if you can see the data displayed on their website in tables and/or charts. getSymbols() will throw an error if the data is not available for download.
# Create an object containing the Pfizer ticker symbol
symbol = "PFE"
# Use getSymbols to import the data
getSymbols(Symbols = symbol, src = "google")
## [1] "PFE"
# Look at the first few rows of data
head(PFE)
## PFE.Open PFE.High PFE.Low PFE.Close PFE.Volume
## 2007-01-03 26.15 26.42 25.98 26.29 40645100
## 2007-01-04 26.38 26.57 26.29 26.38 32258100
## 2007-01-05 26.55 26.63 26.17 26.30 31355800
## 2007-01-08 26.29 26.42 25.89 26.16 43224400
## 2007-01-09 26.24 26.34 26.04 26.17 31321400
## 2007-01-10 26.10 26.26 26.00 26.20 34548400
Oanda.com provides historical foreign exchange data for many currency pairs. Currency pairs are expressed as two currencies separated by a “/”, where the first currency is the “base” and the second is the “quote”. For example, the U.S. Dollar to Euro exchange rate would be expressed as "USD/EUR"
.
Since object names can’t contain “/”, getSymbols()
will convert USD/EUR
to a valid name by removing the “/”. For example, getSymbols("USD/EUR")
would automatically create an object named USDEUR
.
Also, Oanda.com only provides historical data for the past 180 days. A warning will be thrown if you request data from more than 180 days ago, but getSymbols()
will return as much data as possible. You can change the date range of your request by using the from
and to
arguments; both of which are dates expressed by character strings in "%Y-%m-%d"
format (e.g. "2016-02-06"
).
# There seems to be an error in IBM's system itself. It doesn't import data from Oanda.com
# Create a currency_pair object
currency_pair <- "GBP/CAD"
# Load British Pound to Canadian Dollar exchange rate data
getSymbols(Symbols = currency_pair, src = "oanda")
## [1] "GBPCAD"
# Examine object using str()
str(GBPCAD)
## An 'xts' object on 2017-01-03/2017-06-30 containing:
## Data: num [1:179, 1] 1.65 1.64 1.64 1.64 1.63 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "GBP.CAD"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "oanda"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:26:01"
# Try to load data from 190 days ago
getSymbols(Symbols = currency_pair, from = Sys.Date() - 190, to = Sys.Date(), src = "oanda")
## [1] "GBPCAD"
Both getSymbols()
andQuandl()
provide access to the FRED database.
# Create a series_name object
series_name <- "UNRATE" # To find the series name, go to the FRED website and search for "United States civilian unemployment rate".
# Load the data using getSymbols
getSymbols(Symbols = series_name, src = "FRED")
## [1] "UNRATE"
# Create a quandl_code object
quandl_code <- "FRED/UNRATE"
# Load the data using Quandl
unemploy_rate <- Quandl(code = quandl_code)
You’ve learned how to import data from online sources, now it’s time to see how to extract columns from the imported data. After you’ve learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.
The quantmod package provides several helper functions to extract specific columns from an object, based on the column name. Six of the functions extract a single column.
# Load data
DC <- read.csv("data/Importing and Managing Financial Data in R/DC.csv")
DC$Date <- as.POSIXct(DC$Date, format("%Y-%m-%d %H:%M:%S"))
DC <- xts(DC[, 2:6], order.by = DC$Date)
str(DC)
## An 'xts' object on 2016-01-16 01:00:00/2016-01-17 23:00:00 containing:
## Data: num [1:47, 1:5] 20.8 20.8 20.8 20.8 20.8 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5] "DC.Open" "DC.High" "DC.Low" "DC.Close" ...
## Indexed by objects of class: [POSIXct,POSIXt] TZ: %Y-%m-%d %H:%M:%S
## xts Attributes:
## NULL
# Look at the head of DC
head(DC)
## DC.Open DC.High DC.Low DC.Close DC.Volume
## 2016-01-16 01:00:00 20.845 20.850 20.835 20.845 157
## 2016-01-16 02:00:00 20.845 20.850 20.835 20.845 214
## 2016-01-16 03:00:00 20.845 20.850 20.835 20.845 103
## 2016-01-16 04:00:00 20.845 20.855 20.835 20.845 180
## 2016-01-16 05:00:00 20.845 20.845 20.845 20.845 211
## 2016-01-16 06:00:00 20.845 20.845 20.840 20.845 35
# Extract the close column
dc_close <- Cl(DC)
# Look at the head of dc_close
head(dc_close)
## DC.Close
## 2016-01-16 01:00:00 20.845
## 2016-01-16 02:00:00 20.845
## 2016-01-16 03:00:00 20.845
## 2016-01-16 04:00:00 20.845
## 2016-01-16 05:00:00 20.845
## 2016-01-16 06:00:00 20.845
# Extract the volume column
dc_volume <- Vo(DC)
# Look at the head of dc_volume
head(dc_volume)
## DC.Volume
## 2016-01-16 01:00:00 157
## 2016-01-16 02:00:00 214
## 2016-01-16 03:00:00 103
## 2016-01-16 04:00:00 180
## 2016-01-16 05:00:00 211
## 2016-01-16 06:00:00 35
The quantmod package also provides several functions you can use to extract a specific set of columns. See help("OHLC.Transformations")
for more details. For example:
# Extract the high, low, and close columns
dc_hlc <- HLC(DC)
# Look at the head of dc_hlc
head(dc_hlc)
## DC.High DC.Low DC.Close
## 2016-01-16 01:00:00 20.850 20.835 20.845
## 2016-01-16 02:00:00 20.850 20.835 20.845
## 2016-01-16 03:00:00 20.850 20.835 20.845
## 2016-01-16 04:00:00 20.855 20.835 20.845
## 2016-01-16 05:00:00 20.845 20.845 20.845
## 2016-01-16 06:00:00 20.845 20.840 20.845
# Extract the open, high, low, close, and volume columns
dc_ohlcv <- OHLCV(DC)
# Look at the head of dc_ohlcv
head(dc_ohlcv)
## DC.Open DC.High DC.Low DC.Close DC.Volume
## 2016-01-16 01:00:00 20.845 20.850 20.835 20.845 157
## 2016-01-16 02:00:00 20.845 20.850 20.835 20.845 214
## 2016-01-16 03:00:00 20.845 20.850 20.835 20.845 103
## 2016-01-16 04:00:00 20.845 20.855 20.835 20.845 180
## 2016-01-16 05:00:00 20.845 20.845 20.845 20.845 211
## 2016-01-16 06:00:00 20.845 20.845 20.840 20.845 35
Sometimes you might have one object that contains the same price column for multiple instruments. Other times, you might have an object with price data (e.g. bid and/or ask prices) that do not have an explicit extractor function.
The quantmod package also provides a getPrice()
function that is flexible enough to handle these cases. You can extract a column with any name (e.g. bid, ask, trade) with it by specifying the name via the prefer
argument. It is also useful if you have an object that contains multiple instruments with the same price type, because you can use the symbol
argument to extract columns for a specific instrument.
# Download CME data for CL and BZ as an xts object
oil_data <- Quandl(code = c("CME/CLH2016", "CME/BZH2016"), type = "xts")
# Look at the column names of the oil_data object
colnames(oil_data)
## [1] "CME.CLH2016 - Open" "CME.CLH2016 - High"
## [3] "CME.CLH2016 - Low" "CME.CLH2016 - Last"
## [5] "CME.CLH2016 - Change" "CME.CLH2016 - Settle"
## [7] "CME.CLH2016 - Volume" "CME.CLH2016 - Open Interest"
## [9] "CME.BZH2016 - Open" "CME.BZH2016 - High"
## [11] "CME.BZH2016 - Low" "CME.BZH2016 - Last"
## [13] "CME.BZH2016 - Change" "CME.BZH2016 - Settle"
## [15] "CME.BZH2016 - Volume" "CME.BZH2016 - Open Interest"
# Extract the Open price for CLH2016
cl_open <- getPrice(oil_data, symbol = "CLH2016", prefer = "Open$") #$ is to prevent you from also retrieving 'Open Interest'
# Look at January, 2016 using xts' ISO-8601 subsetting
cl_open["2016-01"]
## CME.CLH2016 - Open
## 2016-01-04 38.75
## 2016-01-05 38.07
## 2016-01-06 37.39
## 2016-01-07 35.35
## 2016-01-08 34.50
## 2016-01-11 34.11
## 2016-01-12 32.24
## 2016-01-13 31.60
## 2016-01-14 31.47
## 2016-01-15 32.11
## 2016-01-19 30.17
## 2016-01-20 29.47
## 2016-01-21 28.35
## 2016-01-22 29.84
## 2016-01-25 32.05
## 2016-01-26 29.81
## 2016-01-27 30.55
## 2016-01-28 32.19
## 2016-01-29 33.70
Sometimes you need to aggregate and/or transform raw data before you can continue your analysis. The Quandl()
function makes this easy by allowing you to request common aggregations and transformations. You just need to specify the collapse
and/or transform
arguments and the Quandl API takes care of the details for you.
The collapse argument gives simply the last observation of the period. Note that aggregations are peroformed before transformations.
trasnforming with Quandl:
# CL and BZ Quandl codes
quandl_codes <- c("CME/CLH2016","CME/BZH2016")
# Download quarterly CL and BZ prices
qtr_price <- Quandl(quandl_codes, collapse = "quarterly", type = "xts") #collapse gives the last observation of the period
# View the high prices for both series
Hi(qtr_price)
## CME.CLH2016 - High CME.BZH2016 - High
## 2010 Q4 92.94 NA
## 2011 Q1 101.43 NA
## 2011 Q2 100.19 NA
## 2011 Q3 91.59 NA
## 2011 Q4 0.00 0.00
## 2012 Q1 0.00 0.00
## 2012 Q2 0.00 0.00
## 2012 Q3 0.00 0.00
## 2012 Q4 0.00 0.00
## 2013 Q1 0.00 0.00
## 2013 Q2 0.00 0.00
## 2013 Q3 0.00 0.00
## 2013 Q4 0.00 0.00
## 2014 Q1 86.00 0.00
## 2014 Q2 93.05 NA
## 2014 Q3 0.00 NA
## 2014 Q4 61.03 69.45
## 2015 Q1 57.17 64.45
## 2015 Q2 61.63 66.72
## 2015 Q3 47.95 51.37
## 2015 Q4 38.87 38.34
## 2016 Q1 32.05 35.00
# Download quarterly CL and BZ returns
qtr_return <- Quandl(quandl_codes, collapse = "quarterly", type = "xts", transform = "rdiff")
# View the settle price returns for both series
getPrice(qtr_return, prefer = "Settle")
## CME.CLH2016 - Settle CME.BZH2016 - Settle
## 2011 Q1 0.091349258 NA
## 2011 Q2 -0.012225180 NA
## 2011 Q3 -0.085836910 NA
## 2011 Q4 -0.012337591 NA
## 2012 Q1 0.036038028 0.040508511
## 2012 Q2 -0.074797268 -0.040381031
## 2012 Q3 0.016030446 0.029779888
## 2012 Q4 0.008286039 0.006496228
## 2013 Q1 -0.011595182 0.001561524
## 2013 Q2 -0.045899772 -0.041679659
## 2013 Q3 0.025784887 0.027440347
## 2013 Q4 -0.011521005 0.038847250
## 2014 Q1 0.014480810 0.008535718
## 2014 Q2 0.079377974 0.052191436
## 2014 Q3 -0.069454897 -0.070669348
## 2014 Q4 -0.295436164 -0.306543019
## 2015 Q1 -0.073958675 -0.073551263
## 2015 Q2 0.088188419 0.071692061
## 2015 Q3 -0.229129373 -0.237503741
## 2015 Q4 -0.194215748 -0.260647694
## 2016 Q1 -0.175268535 -0.079108044
There may be times where you need to aggregate or transform your data in ways Quandl()
does not support. In those cases, you can use the flexibility of R.
One paradigm you can use in the quantmod workflow involves environments. Store all your data in one environment. Then you can use eapply()
to call a function on each object in the environment, much like what lapply()
does for each element of a list. Also like lapply()
, eapply()
returns a list.
Then you can merge all the elements of the list into one object by using do.call()
, which is like having R programmatically type and run a command for you. Instead of typing merge(my_list[[1]], my_list[[2]]], ...)
, you can type do.call(merge, my_list)
.
# Creat a new environment
data_env <- new.env()
# Load data into the environment
getSymbols(Symbols = c("QQQ"), env = data_env, auto.assign = TRUE) #This will not generate the exact same restult as is in datacamp but the point is to show how to combine objects
## [1] "QQQ"
# Call head on each object in data_env using eapply
data_list <- eapply(data_env, head)
head(data_list)
## $QQQ
## QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03 47.832 48.492 46.797 43.24 167689500 39.28798
## 2007-01-04 47.656 48.657 47.491 44.06 136853500 40.03301
## 2007-01-05 48.371 48.371 47.854 43.85 138958800 39.84221
## 2007-01-08 48.305 48.558 48.030 43.88 106401600 39.86947
## 2007-01-09 48.437 48.745 48.019 44.10 121577500 40.06936
## 2007-01-10 48.382 49.152 48.228 44.62 121070100 40.54183
# Merge all the list elements into one xts object
data_merged <- do.call(merge, data_list)
head(data_merged)
## QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
## 2007-01-03 47.832 48.492 46.797 43.24 167689500 39.28798
## 2007-01-04 47.656 48.657 47.491 44.06 136853500 40.03301
## 2007-01-05 48.371 48.371 47.854 43.85 138958800 39.84221
## 2007-01-08 48.305 48.558 48.030 43.88 106401600 39.86947
## 2007-01-09 48.437 48.745 48.019 44.10 121577500 40.06936
## 2007-01-10 48.382 49.152 48.228 44.62 121070100 40.54183
# Ensure the columns are ordered: open, high, low, close
data_ohlc <- OHLC(data_merged)
head(data_ohlc)
## QQQ.Open QQQ.High QQQ.Low QQQ.Close
## 2007-01-03 47.832 48.492 46.797 43.24
## 2007-01-04 47.656 48.657 47.491 44.06
## 2007-01-05 48.371 48.371 47.854 43.85
## 2007-01-08 48.305 48.558 48.030 43.88
## 2007-01-09 48.437 48.745 48.019 44.10
## 2007-01-10 48.382 49.152 48.228 44.62
The previous exercise taught you how to use do.call(merge, eapply(env, fun))
to apply a function to each object in an environment and then combine all the results into one object.
Let’s use what you learned to solve a very common problem. Often you will need to load similar data for many instruments, extract a column, and create one object that contains that specific column for every instrument.
# Symbols
symbols <- c("AAPL", "MSFT", "IBM")
# Create new environment
data_env <- new.env()
# Load symbols into data_env
getSymbols(Symbols = symbols, env = data_env)
## [1] "AAPL" "MSFT" "IBM"
# Extract the close column from each object and merge into one xts object
close_data <- do.call(merge, eapply(data_env, Cl))
# View the head of close_data
head(close_data)
## AAPL.Close IBM.Close MSFT.Close
## 2007-01-03 83.80000 97.27 29.86
## 2007-01-04 85.66000 98.31 29.81
## 2007-01-05 85.04999 97.42 29.64
## 2007-01-08 85.47000 98.90 29.93
## 2007-01-09 92.57000 100.07 29.96
## 2007-01-10 97.00000 98.89 29.66
Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols()
. You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.
Change the default data source for getSymbols()
. The setDefaults()
function allows you to change the default argument values for getSymbols()
. Setting a default source can be useful if you use that source often.
# Set the default to pull data from Google Finance
setDefaults(getSymbols, src = "google")
# Get GOOG data
getSymbols(Symbols = "GOOG")
## [1] "GOOG"
# Verify the data was actually pulled from Google Finance
str(GOOG)
## An 'xts' object on 2007-01-03/2017-06-30 containing:
## Data: num [1:2642, 1:5] 233 234 241 244 242 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5] "GOOG.Open" "GOOG.High" "GOOG.Low" "GOOG.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "google"
## $ updated: POSIXct[1:1], format: "2017-07-01 10:26:06"
Change the default value for the from
argument to getSymbols.yahoo()
, using setDefaults()
. Calling getDefaults()
will show you the current default values (if there are any).
# Look at getSymbols.yahoo arguments
args(getSymbols.yahoo)
## function (Symbols, env, return.class = "xts", index.class = "Date",
## from = "2007-01-01", to = Sys.Date(), ..., periodicity = "daily")
## NULL
# Set default 'from' value for getSymbols.yahoo
setDefaults(getSymbols.yahoo, from = "2000-01-01")
# Confirm defaults were set correctly
getDefaults("getSymbols.yahoo")
## $from
## [1] "'2000-01-01'"
Use setSymbolLookup()
to specify the default data source for an instrument.
It is useful to set a single symbol to be pulled from a specific source, if multiple sources use the same symbol for different instruments. For example, getSymbols(“CP”, src = “yahoo”) would load Canadian Pacific Railway data from the New York Stock Exchange. But getSymbols(“CP”, src = “FRED”) would load Corporate Profits After Tax from the U.S. Bureau of Economic Analysis.
# Load CP data using defaults where default source is yahoo
getSymbols("CP", src = "yahoo") # Canadian railway stock prices
## [1] "CP"
# Look at the first few rows of CP
head(CP)
## CP.Open CP.High CP.Low CP.Close CP.Volume CP.Adjusted
## 2000-01-03 29.069 29.236 27.896 20.8750 358200 7.787133
## 2000-01-04 27.729 29.153 27.729 21.0000 1160200 7.833765
## 2000-01-05 28.147 29.153 28.147 21.5625 852200 8.043595
## 2000-01-06 29.404 30.744 29.236 22.6250 1960400 8.439946
## 2000-01-07 30.493 31.163 29.990 23.0000 1758800 8.579834
## 2000-01-10 30.912 31.331 30.074 23.1875 1273600 8.649777
# Set the source for CP to FRED
setSymbolLookup("CP" = "FRED") # corporate profits from USBEA
# Load CP data again
getSymbols("CP")
## [1] "CP"
# Look at the first few rows of CP
head(CP)
## CP.Open CP.High CP.Low CP.Close CP.Volume
## 2007-01-03 53.43 53.68 52.40 52.74 207200
## 2007-01-04 52.42 52.48 51.54 51.62 238700
## 2007-01-05 51.63 51.77 51.39 51.53 180000
## 2007-01-08 51.23 52.14 51.21 51.94 276200
## 2007-01-09 51.44 52.42 51.44 52.25 419300
## 2007-01-10 52.21 52.41 51.92 52.07 206100
Changing default settings only last for the current session. Save and load symbol-based defaults by using saveSymbolLookup()
and loadSymbolLookup()
, respectively. You can use the file
arguments to specify a file to store your defaults.
# Save symbol lookup table
saveSymbolLookup("my_symbol_lookup.rda")
# Set default source for CP to "yahoo"
setSymbolLookup(CP = list(src = "yahoo"))
# Verify the default source is "yahoo"
getSymbolLookup("CP")
## $CP
## $CP$src
## [1] "yahoo"
# Load symbol lookup table
loadSymbolLookup("my_symbol_lookup.rda")
# Verify the default source is "FRED"
getSymbolLookup("CP")
## $CP
## $CP$src
## [1] "FRED"
Problem: Sometimes, you download data for an instrument that does not have a syntactically valid name. Syntactically valid names contain letters, numbers, “.”, and “_“, and must start with a letter or a”." followed by a non-number.
For example, the symbol for Berkshire Hathaway class A shares is “BRK-A”, which is not a syntactically valid name because it contains a “-” character. Another example are Chinese stocks, which have symbols composed of numbers. The Yahoo Finance symbol for the SSE Composite Index is “000001.SS”.
Solution: To access objects that do not have syntactically valid names, you can use:
get
function or# Load BRK-A data
getSymbols(Symbols = "BRK-A")
## [1] "BRK-A"
# Use backticks and head() to look at the loaded data
head(`BRK-A`)
## BRK-A.Open BRK-A.High BRK-A.Low BRK-A.Close BRK-A.Volume
## 2007-01-03 110400 110600 109000.0 109000.0 670
## 2007-01-04 109400 109400 108250.0 108850.0 460
## 2007-01-05 108800 108800 107000.1 107200.0 420
## 2007-01-08 107600 107800 107300.0 107500.1 340
## 2007-01-09 107510 109600 107510.0 109350.0 400
## 2007-01-10 109350 109550 108500.0 109490.0 370
# Use get() to assign the BRK-A data to an object named BRK.A
BRK.A <- get("BRK-A")
If you are only downloading data for a single symbol and that symbol is not a syntactically valid name, you can set auto.assign = FALSE
in your call to getSymbols()
.
# Create BRK.A object
BRK.A <- getSymbols("BRK-A", auto.assign = FALSE)
# Create col_names object with the column names of BRK.A
col_names <- colnames(BRK.A)
# Set BRK.A column names to syntactically valid names
colnames(BRK.A) <- make.names(col_names)
head(BRK.A)
## BRK.A.Open BRK.A.High BRK.A.Low BRK.A.Close BRK.A.Volume
## 2007-01-03 110400 110600 109000.0 109000.0 670
## 2007-01-04 109400 109400 108250.0 108850.0 460
## 2007-01-05 108800 108800 107000.1 107200.0 420
## 2007-01-08 107600 107800 107300.0 107500.1 340
## 2007-01-09 107510 109600 107510.0 109350.0 400
## 2007-01-10 109350 109550 108500.0 109490.0 370
# Alternative
#colnames(BRK.A) <- paste("BRK.A", c("Open", "High", "Low", "Close", "Volume"), sep = ".")
Not only can you use setSymbolLookup() to set a default data source for getSymbols()
, you can also use setSymbolLookup()
to create a mapping between the instrument symbol and the name of the R object.
This is useful if you want to download data for a lot symbols that are not syntactically valid names, or symbols that have names that conflict with other R variable names.
An example of a name that conflicts is the symbol for AT&T’s stock, T
, which is often used as a short form for the logical value TRUE
.
# Set name for BRK-A to BRK.A
setSymbolLookup(BRK.A = list(name = "BRK-A"))
# Set name for T (AT&T) to ATT
setSymbolLookup(ATT = list(name = "T"))
# Load BRK.A and ATT data
getSymbols(c("BRK.A", "ATT"))
## [1] "BRK.A" "ATT"
You’ve learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.
Create a regular sequence of date-times that span the dates of your irregular data set.
# Import data
irregular_xts <- read.csv("data/Importing and Managing Financial Data in R/irregular_xts.csv")
irregular_xts$date <- as.Date(irregular_xts$date, format = "%m/%d/%Y")
irregular_xts <- xts(irregular_xts[, 2], order.by = irregular_xts$date)
irregular_xts
## [,1]
## 2016-01-02 4
## 2016-01-05 21
## 2016-01-07 1
## 2016-01-11 34
# Extract the start date of the series
start_date <- start(irregular_xts)
# Extract the end date of the series
end_date <- end(irregular_xts)
# Create a regular date-time sequence
regular_index <- seq(from = start_date, to = end_date, by = "day")
# Create a zero-width xts object
regular_xts <- xts(, order.by = regular_index)
Create a regular xts object from an irregular one.
# Merge irregular_xts and regular_xts
merged_xts <- merge(irregular_xts, regular_xts)
# Look at the first few rows of merged_xts
head(merged_xts)
## irregular_xts
## 2016-01-02 4
## 2016-01-03 NA
## 2016-01-04 NA
## 2016-01-05 21
## 2016-01-06 NA
## 2016-01-07 1
# Use the fill argument to fill NA with their previous value
merged_filled_xts <- merge(irregular_xts, regular_xts, fill = na.locf)
# Look at the first few rows of merged_filled_xts
head(merged_filled_xts)
## irregular_xts
## 2016-01-02 4
## 2016-01-03 4
## 2016-01-04 4
## 2016-01-05 21
## 2016-01-06 21
## 2016-01-07 1
Sometimes you may have two series with the same periodicity, but that use different conventions to represent a timestamp. For example:
If you then try to merge FEDFUNDS with the monthly aggregate of DFF, the result will have a lot of NA because the timestamps representing the month are the first and last days of the month, respectively.
# Import data
getSymbols(c("FEDFUNDS", "DFF"), src = "FRED")
## [1] "FEDFUNDS" "DFF"
# Aggregate DFF to monthly
monthly_fedfunds <- apply.monthly(DFF, mean, na.rm = TRUE)
head(monthly_fedfunds) # monthly aggregates with the timestaps representing the last days of the month
## DFF
## 1954-07-31 0.7993548
## 1954-08-31 1.2206452
## 1954-09-30 1.0666667
## 1954-10-31 0.8487097
## 1954-11-30 0.8336667
## 1954-12-31 1.2777419
# Convert index to yearmon
index(monthly_fedfunds) <- as.yearmon(index(monthly_fedfunds))
head(monthly_fedfunds) # monthly aggregates with timestaps representing only months not days
## DFF
## Jul 1954 0.7993548
## Aug 1954 1.2206452
## Sep 1954 1.0666667
## Oct 1954 0.8487097
## Nov 1954 0.8336667
## Dec 1954 1.2777419
head(FEDFUNDS) # monthly series with the timestamps representing the first days of the month
## FEDFUNDS
## 1954-07-01 0.80
## 1954-08-01 1.22
## 1954-09-01 1.06
## 1954-10-01 0.85
## 1954-11-01 0.83
## 1954-12-01 1.28
# Merge FEDFUNDS with the monthly aggregate
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
# Look at the first few rows of the merged object
head(merged_fedfunds)
## FEDFUNDS DFF
## 1954-07-01 0.80 0.7993548
## 1954-08-01 1.22 1.2206452
## 1954-09-01 1.06 1.0666667
## 1954-10-01 0.85 0.8487097
## 1954-11-01 0.83 0.8336667
## 1954-12-01 1.28 1.2777419
When you can’t use convenience classes like yearmon to represent timestamps, like you did in the previous exercise, manually align merged data to the timestamp representation you prefer.
# Load monthly_fedfunds and merged_fedfunds without converting the monthly_fedfunds index to yearmon first.
monthly_fedfunds <- apply.monthly(DFF, mean)
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
# Look at the first few rows of merged_fedfunds
head(merged_fedfunds)
## FEDFUNDS DFF
## 1954-07-01 0.80 NA
## 1954-07-31 NA 0.7993548
## 1954-08-01 1.22 NA
## 1954-08-31 NA 1.2206452
## 1954-09-01 1.06 NA
## 1954-09-30 NA 1.0666667
# Fill NA forward
merged_fedfunds_locf <- na.locf(merged_fedfunds)
head(merged_fedfunds_locf)
## FEDFUNDS DFF
## 1954-07-01 0.80 NA
## 1954-07-31 0.80 0.7993548
## 1954-08-01 1.22 0.7993548
## 1954-08-31 1.22 1.2206452
## 1954-09-01 1.06 1.2206452
## 1954-09-30 1.06 1.0666667
# Extract index values containing last day of month
aligned_last_day <- merged_fedfunds_locf[index(monthly_fedfunds)]
head(aligned_last_day)
## FEDFUNDS DFF
## 1954-07-31 0.80 0.7993548
## 1954-08-31 1.22 1.2206452
## 1954-09-30 1.06 1.0666667
## 1954-10-31 0.85 0.8487097
## 1954-11-30 0.83 0.8336667
## 1954-12-31 1.28 1.2777419
# Fill NA backward
merged_fedfunds_locb <- na.locf(merged_fedfunds, fromLast = TRUE)
head(merged_fedfunds_locb)
## FEDFUNDS DFF
## 1954-07-01 0.80 0.7993548
## 1954-07-31 1.22 0.7993548
## 1954-08-01 1.22 1.2206452
## 1954-08-31 1.06 1.2206452
## 1954-09-01 1.06 1.0666667
## 1954-09-30 0.85 1.0666667
# Extract index values containing first day of month
aligned_first_day <- merged_fedfunds_locb[index(FEDFUNDS)]
head(aligned_first_day)
## FEDFUNDS DFF
## 1954-07-01 0.80 0.7993548
## 1954-08-01 1.22 1.2206452
## 1954-09-01 1.06 1.0666667
## 1954-10-01 0.85 0.8487097
## 1954-11-01 0.83 0.8336667
## 1954-12-01 1.28 1.2777419
Aggregate daily data to weekly, but with weeks ending on Wednesdays, by using period.apply()
and endpoints()
. This is often done in stock market research papers to avoid some of the intra-week seasonality.
# Extract index weekdays
index_weekdays <- .indexwday(DFF)
head(index_weekdays)
## [1] 4 5 6 0 1 2
# Find locations of Wednesdays
wednesdays <- which(index_weekdays == 3)
head(wednesdays)
## [1] 7 14 21 28 35 42
# Create custom end points
end_points <- c(0, wednesdays, nrow(DFF))
head(end_points)
## [1] 0 7 14 21 28 35
# Calculate weekly mean using custom end points
weekly_mean <- period.apply(DFF, end_points, mean)
head(weekly_mean)
## DFF
## 1954-07-07 1.0014286
## 1954-07-14 1.2157143
## 1954-07-21 0.5714286
## 1954-07-28 0.6257143
## 1954-08-04 0.2700000
## 1954-08-11 1.3071429