The getSymbols() function from the quantmod package provides a consistent interface to import data from various sources into your workspace. By default, getSymbols() imports the data as an xts object.
This exercise will introduce you to getSymbols(). You will use it to import QQQ data from Yahoo! Finance. QQQ is an exchange-traded fund that tracks the Nasdaq 100 index, and Yahoo! Finance is the default data source for getSymbols().
You use the Symbols argument to specify the instrument (i.e. the ticker symbol) you want to import. Since Symbols is the first argument to getSymbols(), you usually just type the instrument name and omit Symbols =.
# Load the quantmod package
library(quantmod)
# Import QQQ data from Yahoo! Finance
getSymbols(Symbols = "QQQ", auto.assign = TRUE)
[1] "QQQ"
# Look at the structure of the object getSymbols created
str(QQQ)
An ‘xts’ object on 2000-01-03/2020-07-31 containing:
Data: num [1:5178, 1:6] 96.2 92 87.5 86.9 82.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "QQQ.Open" "QQQ.High" "QQQ.Low" "QQQ.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:04"
# Look at the first few rows of QQQ
head(QQQ)
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2000-01-03 96.1875 96.1875 90.7500 94.75000 36345200 82.61639
2000-01-04 92.0000 93.5000 87.9375 88.25000 33786600 76.94878
2000-01-05 87.5000 89.6250 84.2500 86.00000 42496600 74.98690
2000-01-06 86.8750 88.0000 79.7500 80.09375 37134800 69.83700
2000-01-07 82.9375 90.0000 82.5000 90.00000 28138200 78.47465
2000-01-10 91.0000 93.9375 89.9375 92.50000 29675600 80.65449
In the last exercise, you imported data from Yahoo! Finance. The src argument allows you to tell getSymbols() to import data from a different data source.
In this exercise, you will import data from Alpha Vantage and FRED. Alpha Vantage is a source similar to Yahoo! Finance. FRED is an online database of economic time series data created and maintained by the Federal Reserve Bank of St. Louis.
getSymbols() imports data from Yahoo! Finance because the default value of src is "yahoo". The src values for Alpha Vantage and FRED are "av" and "FRED", respectively.
# Import QQQ data from Alpha Vantage
# getSymbols(Symbols = "QQQ", src = "av")
# https://www.alphavantage.co/ to get API key
# Look at the structure of QQQ
# str(QQQ)
# Import GDP data from FRED
getSymbols(Symbols = "GDP", src = "FRED")
[1] "GDP"
# Look at the structure of GDP
str(GDP)
An ‘xts’ object on 1947-01-01/2020-04-01 containing:
Data: num [1:294, 1] 243 246 250 260 266 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "GDP"
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "FRED"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:05"
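The Alpha Vantage call above is commented out because Alpha Vantage requires a (free) API key. A minimal sketch of how you might supply one, assuming getSymbols.av() accepts an api.key argument; "YOUR_AV_KEY" is a placeholder:
# Register a placeholder Alpha Vantage key (get a real one at https://www.alphavantage.co/)
setDefaults(getSymbols.av, api.key = "YOUR_AV_KEY")
# Import QQQ from Alpha Vantage and inspect it
getSymbols("QQQ", src = "av")
str(QQQ)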
In the last exercise, getSymbols() automatically created an object named after the symbol you provided. This exercise will teach you how to make getSymbols() return the data instead, so you can assign the output yourself.
There are two arguments that will make getSymbols() return the data: setting auto.assign = FALSE, or setting env = NULL.
The two methods are functionally equivalent, but auto.assign = FALSE describes the behavior better. Use it because you will be more likely to remember what auto.assign = FALSE means in the future.
# Assign SPY data to 'spy' using auto.assign argument
spy <- getSymbols(Symbols = "SPY", auto.assign = FALSE)
# Look at the structure of the 'spy' object
str(spy)
An ‘xts’ object on 2000-01-03/2020-07-31 containing:
Data: num [1:5178, 1:6] 148 144 140 140 140 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:05"
# Assign JNJ data to 'jnj' using env argument
jnj <- getSymbols(Symbols = "JNJ", env = NULL)
# Look at the structure of the 'jnj' object
str(jnj)
An ‘xts’ object on 2000-01-03/2020-07-31 containing:
Data: num [1:5178, 1:6] 46.6 45.6 44.4 45.2 47.1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "JNJ.Open" "JNJ.High" "JNJ.Low" "JNJ.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:06"
Turning off auto.assign is useful if you want to assign the data to an object yourself.
Similar to how the quantmod package provides getSymbols() to import data from various sources, the Quandl package provides access to the Quandl databases via one simple function: Quandl().
Recall that getSymbols() uses the Symbols and src arguments to specify the instrument and data source, respectively. The Quandl() function specifies both the data source and the instrument via its code argument, in the form “DATABASE/DATASET”.
Two other ways Quandl() differs from getSymbols() are that Quandl() returns the data directly (it does not automatically create an object in your workspace, so you must assign the result yourself) and that it returns a data.frame by default rather than an xts object.
If you plan on importing a lot of data using Quandl(), you might consider opening a free account with them in order to get an API key.
# Load the Quandl package
library(Quandl)
# Import GDP data from FRED
# gdp <- Quandl(code = "FRED/GDP")
# Look at the structure of the object returned by Quandl
# str(gdp)
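The Quandl() calls above are commented out because Quandl requires an API key for most requests. A minimal sketch, assuming the Quandl package's Quandl.api_key() helper; "YOUR_QUANDL_KEY" is a placeholder:
# Register a placeholder Quandl key (replace with your own)
Quandl.api_key("YOUR_QUANDL_KEY")
# Import GDP from FRED via Quandl and inspect the data.frame it returns
gdp <- Quandl(code = "FRED/GDP")
str(gdp)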
Quandl provides access to a large number of data series. Their website has documentation for it all!
The Quandl() function returns a data.frame by default. It can return other classes via the type argument.
The possible values for type are "raw" (a data.frame, the default), "ts", "zoo", "xts", and "timeSeries".
In this exercise, you will learn how to use the type argument to make Quandl() return an xts and a zoo object.
# Import GDP data from FRED as xts
# gdp_xts <- Quandl(code = "FRED/GDP", type = "xts")
# Look at the structure of gdp_xts
# str(gdp_xts)
# Import GDP data from FRED as zoo
# gdp_zoo <- Quandl(code = "FRED/GDP", type = "zoo")
# Look at the structure of gdp_zoo
# str(gdp_zoo)
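To illustrate, here is a hedged sketch (it assumes an API key is registered as shown earlier) that requests the same series in several classes and checks what comes back:
# Same series, different return classes
gdp_raw <- Quandl("FRED/GDP")                # data.frame ("raw" is the default type)
gdp_ts  <- Quandl("FRED/GDP", type = "ts")   # ts
gdp_zoo <- Quandl("FRED/GDP", type = "zoo")  # zoo
gdp_xts <- Quandl("FRED/GDP", type = "xts")  # xts
sapply(list(gdp_raw, gdp_ts, gdp_zoo, gdp_xts), function(x) class(x)[1])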
Having the flexibility to return different data types is a great bonus and less work for you!
You need the instrument identifier to import data from an internet data source. Identifiers can often be found on the data source's website. In this exercise, you will search Yahoo Finance for the ticker symbol for Pfizer stock.
Note that some sources may not provide data for certain symbols, even if you can see the data displayed on their website in tables and/or charts. getSymbols() will error if the data is not available for download.
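If you want a script to keep running when a download fails, you can wrap the call yourself. A minimal sketch using base R's tryCatch():
# Guard a download against errors for symbols the source cannot provide
pfe <- tryCatch(
  getSymbols("PFE", auto.assign = FALSE),
  error = function(e) {
    message("Download failed: ", conditionMessage(e))
    NULL
  }
)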
# Create an object containing the Pfizer ticker symbol
symbol <- "PFE"
# Use getSymbols to import the data
getSymbols("PFE")
[1] "PFE"
# Look at the first few rows of data
head(PFE)
PFE.Open PFE.High PFE.Low PFE.Close PFE.Volume PFE.Adjusted
2000-01-03 32.0625 32.375 31.6250 31.8750 12213800 15.85582
2000-01-04 31.2500 31.500 30.0000 30.6875 13481000 15.26512
2000-01-05 30.8750 31.875 30.8125 31.1875 12316500 15.51383
2000-01-06 31.2500 32.750 31.1875 32.3125 10545800 16.07345
2000-01-07 34.0000 34.875 32.3125 34.5000 17041900 17.16160
2000-01-10 35.0000 35.000 34.0625 34.4375 9880200 17.13050
Looking up identifiers online is common when seeking data about a new instrument, so it’s good to get comfortable with the process!
Oanda.com provides historical foreign exchange data for many currency pairs. Currency pairs are expressed as two currencies, the “base” and the “quote”, separated by a “/”. For example, the U.S. Dollar to Euro exchange rate would be “USD/EUR”.
Note that getSymbols() will automatically convert “USD/EUR” to a valid name by removing the “/”. For example, getSymbols(“USD/EUR”) would create an object named USDEUR.
Also, Oanda.com only provides 180 days of historical data. getSymbols() will warn and return as much data as possible if you request data from more than 180 days ago. You can use the from and to arguments to set a date range; both should be strings in “%Y-%m-%d” format (e.g. “2016-02-06”).
quantmod::oanda.currencies contains a list of currencies provided by Oanda.com.
# Create a currency_pair object
currency_pair <- "GBP/CAD"
# Load British Pound to Canadian Dollar exchange rate data
getSymbols(currency_pair, src = "oanda")
[1] "GBP/CAD"
# Examine object using str()
str(GBPCAD)
An ‘xts’ object on 2020-02-06/2020-08-02 containing:
Data: num [1:179, 1] 1.72 1.72 1.71 1.71 1.72 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "GBP.CAD"
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "oanda"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:07"
# Try to load data from 190 days ago
getSymbols(currency_pair, from = Sys.Date() - 190, to = Sys.Date(), src = "oanda")
Oanda only provides historical data for the past 180 days. Symbol: GBP/CAD
[1] "GBP/CAD"
Both getSymbols() and Quandl() provide access to the FRED database. In this exercise, you will find the FRED symbol for the United States civilian unemployment rate. Then you will use the series name to download the data directly from FRED using getSymbols(), and from the Quandl database using Quandl().
Remember that getSymbols() specifies the data source using the src argument and that Quandl() specifies it as part of the Quandl code (i.e. database/series).
# Create a series_name object
series_name <- "UNRATE"
# Load the data using getSymbols
getSymbols(series_name, src = "FRED")
[1] "UNRATE"
# Create a quandl_code object
# quandl_code <- "FRED/UNRATE"
# Load the data using Quandl
# unemploy_rate <- Quandl(quandl_code)
The quantmod package provides several helper functions to extract specific columns from an object, based on the column name. The Op(), Hi(), Lo(), Cl(), Vo(), and Ad() functions can be used to extract the open, high, low, close, volume, and adjusted close column, respectively.
In this exercise, you will use two of these functions on an xts object named DC. The DC object contains fictitious DataCamp OHLC (open, high, low, close) stock prices created by randomizing some real financial market data. DC is similar to the xts objects created by getSymbols().
While it’s not necessary to complete the exercise, you can learn more about all the extractor functions from help("OHLC.Transformations").
load(file = "DC.RData")
DC <- DC[,c(1,2)]
DC <- to.hourly(DC, indexAt = "startof")
missing values removed from data
library(quantmod)
# Extract the close column
dc_close <- Cl(DC)
# Look at the head of dc_close
head(dc_close)
timezone of object (GMT) is different than current timezone ().
DC.Close
2016-01-16 00:00:44 20.84
2016-01-16 01:00:01 20.85
2016-01-16 02:00:00 20.85
2016-01-16 03:02:52 20.85
2016-01-16 04:02:37 20.85
2016-01-16 05:00:00 20.85
# Extract the volume column
dc_volume <- Vo(DC)
# Look at the head of dc_volume
head(dc_volume)
timezone of object (GMT) is different than current timezone ().
DC.Volume
2016-01-16 00:00:44 157
2016-01-16 01:00:01 214
2016-01-16 02:00:00 103
2016-01-16 03:02:52 180
2016-01-16 04:02:37 211
2016-01-16 05:00:00 35
The quantmod package provides functions to extract a single column, and also has functions to extract specific sets of columns.
Recall OHLC stands for open, high, low, close. Now you can guess which columns the OHLC() and HLC() functions extract. There’s also an OHLCV() function, which adds the volume column.
These functions are helpful when you need to pass a set of columns to another function. For example, you might need to pass the high, low, and close columns (in that order) to a technical indicator function.
# Extract the high, low, and close columns
dc_hlc <- HLC(DC)
# Look at the head of dc_hlc
head(dc_hlc)
timezone of object (GMT) is different than current timezone ().
DC.High DC.Low DC.Close
2016-01-16 00:00:44 20.85 20.83 20.84
2016-01-16 01:00:01 20.85 20.83 20.85
2016-01-16 02:00:00 20.85 20.84 20.85
2016-01-16 03:02:52 20.85 20.84 20.85
2016-01-16 04:02:37 20.85 20.84 20.85
2016-01-16 05:00:00 20.85 20.84 20.85
# Extract the open, high, low, close, and volume columns
dc_ohlcv <- OHLCV(DC)
# Look at the head of dc_ohlcv
head(dc_ohlcv)
timezone of object (GMT) is different than current timezone ().
DC.Open DC.High DC.Low DC.Close DC.Volume
2016-01-16 00:00:44 20.84 20.85 20.83 20.84 157
2016-01-16 01:00:01 20.85 20.85 20.83 20.85 214
2016-01-16 02:00:00 20.85 20.85 20.84 20.85 103
2016-01-16 03:02:52 20.85 20.85 20.84 20.85 180
2016-01-16 04:02:37 20.85 20.85 20.84 20.85 211
2016-01-16 05:00:00 20.84 20.85 20.84 20.85 35
The extractor functions you learned in the previous two exercises do not cover all use cases. Sometimes you might have one object that contains the same price column for multiple instruments. Other times, you might have an object with price columns (e.g. bid, ask, trade) that do not have explicit extractor functions.
The getPrice() function in the quantmod package can extract any column by name by using the prefer argument. It can also extract columns for a specific instrument by using the symbol argument, which is useful when an object contains several instruments with the same price type.
You can use regular expressions for both the prefer and symbol arguments, because they are passed to the base::grep() function internally.
# Download two Quandl series as a single xts object
oil_data <- Quandl(code = c("CHRIS/CME_QX7", "CFTC/067653_FO_L_ALL_CR"), type = "xts")
# Look at the column names of the oil_data object
colnames(oil_data)
[1] "CHRIS.CME_QX7 - Open"
[2] "CHRIS.CME_QX7 - High"
[3] "CHRIS.CME_QX7 - Low"
[4] "CHRIS.CME_QX7 - Last"
[5] "CHRIS.CME_QX7 - Change"
[6] "CHRIS.CME_QX7 - Settle"
[7] "CHRIS.CME_QX7 - Volume"
[8] "CHRIS.CME_QX7 - Previous Day Open Interest"
[9] "CFTC.067653_FO_L_ALL_CR - Largest 4 Longs; Gross"
[10] "CFTC.067653_FO_L_ALL_CR - Largest 4 Shorts; Gross"
[11] "CFTC.067653_FO_L_ALL_CR - Largest 8 Longs; Gross"
[12] "CFTC.067653_FO_L_ALL_CR - Largest 8 Shorts; Gross"
[13] "CFTC.067653_FO_L_ALL_CR - Largest 4 Longs; Net"
[14] "CFTC.067653_FO_L_ALL_CR - Largest 4 Shorts; Net"
[15] "CFTC.067653_FO_L_ALL_CR - Largest 8 Longs; Net"
[16] "CFTC.067653_FO_L_ALL_CR - Largest 8 Shorts; Net"
# Extract the Open price for the CME_QX7 series
cl_open <- getPrice(oil_data, symbol = "CME_QX7", prefer = "Open$")
# Look at January, 2016 using xts' ISO-8601 subsetting
cl_open["2016-01"]
CHRIS.CME_QX7 - Open
2016-01-04 39.42
2016-01-05 39.40
2016-01-06 39.40
2016-01-07 40.20
2016-01-08 40.65
2016-01-11 40.70
2016-01-12 40.68
2016-01-13 40.00
2016-01-14 40.00
2016-01-15 40.00
2016-01-19 39.90
2016-01-20 39.90
2016-01-21 39.00
2016-01-22 38.58
2016-01-25 38.55
2016-01-26 39.13
2016-01-27 39.13
2016-01-28 38.47
2016-01-29 38.47
getPrice() is a very flexible way to retrieve the columns you need.
Sometimes you need to aggregate and/or transform raw data before you can continue your analysis. The Quandl() function allows you to specify common aggregations and transformations via the collapse and/or transform arguments. The Quandl API takes care of the details for you.
# Download quarterly CL and BZ prices
qtr_price <- Quandl(code = c("CHRIS/CME_QM1", "CHRIS/CME_QG1"), collapse = "quarterly", type = "xts")
# View the high prices for both series
Hi(qtr_price)
CHRIS.CME_QM1 - High CHRIS.CME_QG1 - High
2014 Q1 101.950 4.485
2014 Q2 105.800 4.490
2014 Q3 94.925 4.175
2014 Q4 56.600 3.145
2015 Q1 48.750 2.685
2015 Q2 59.700 2.840
2015 Q3 45.850 2.610
2015 Q4 37.775 2.375
2016 Q1 39.025 2.030
2016 Q2 49.600 2.940
2016 Q3 48.300 2.975
2016 Q4 54.100 3.850
2017 Q1 50.850 3.245
2017 Q2 46.375 3.055
2017 Q3 51.775 3.040
2017 Q4 60.500 3.010
2018 Q1 65.275 2.765
2018 Q2 74.475 2.955
2018 Q3 73.725 3.060
2018 Q4 46.525 3.145
2019 Q1 60.725 2.725
2019 Q2 59.800 2.365
2019 Q3 56.600 2.395
2019 Q4 61.900 2.205
2020 Q1 21.875 1.730
2020 Q2 40.100 1.780
2020 Q3 40.550 1.860
# Download quarterly CL and BZ returns
qtr_return <- Quandl(code = c("CHRIS/CME_QM1", "CHRIS/CME_QG1"), collapse = "quarterly", transform = "rdiff", type = "xts")
# View the settle price returns for both series
getPrice(qtr_return, prefer = "Settle")
CHRIS.CME_QM1 - Settle CHRIS.CME_QG1 - Settle
2014 Q2 0.037310494 0.020590254
2014 Q3 -0.134858119 -0.076216095
2014 Q4 -0.399627029 -0.264013589
2015 Q1 -0.130275900 -0.129574679
2015 Q2 0.249369748 0.072727273
2015 Q3 -0.241802590 -0.108757062
2015 Q4 -0.178531825 -0.074088748
2016 Q1 0.035097192 -0.161745828
2016 Q2 0.260563380 0.492598264
2016 Q3 -0.001862197 -0.006155951
2016 Q4 0.113598673 0.281486579
2017 Q1 -0.058078928 -0.143394200
2017 Q2 -0.090118577 -0.048589342
2017 Q3 0.122284970 -0.009225700
2017 Q4 0.169343913 -0.017958098
2018 Q1 0.074809666 -0.074500508
2018 Q2 0.141823221 0.069886572
2018 Q3 -0.012137559 0.028727770
2018 Q4 -0.380068259 -0.022606383
2019 Q1 0.324377890 -0.094557823
2019 Q2 -0.027768540 -0.132982720
2019 Q3 -0.075252266 0.009532062
2019 Q4 0.129276863 -0.060515021
2020 Q1 -0.664592204 -0.250799452
2020 Q2 0.917480469 0.067682927
2020 Q3 0.025464731 0.027412907
What if you need to aggregate or transform your data in ways Quandl() does not support? In those cases, you can use the flexibility of R.
One paradigm involves importing data into a new environment. Then you can use eapply() to call a function on each object in the environment, much like what lapply() does for each element of a list. Also like lapply(), eapply() returns a list.
Then you can merge all the elements of the list into one object by using do.call(), which is like having R programmatically type and run a command for you. Instead of typing merge(my_list[[1]], my_list[[2]], ...), you can type do.call(merge, my_list).
# Create new environment
data_env <- new.env()
# Use getSymbols to load data into the environment
getSymbols(c("SPY", "QQQ"), env = data_env, auto.assign = TRUE)
[1] "SPY" "QQQ"
# Look at a few rows of the SPY data
head(data_env$SPY, 3)
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
2000-01-03 148.2500 148.2500 143.8750 145.4375 8164300 98.95903
2000-01-04 143.5312 144.0625 139.6406 139.7500 8089800 95.08910
2000-01-05 139.9375 141.5312 137.2500 140.0000 12177900 95.25919
# Look at a few rows of each object in data_env
eapply(data_env, head)
$QQQ
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2000-01-03 96.1875 96.1875 90.7500 94.75000 36345200 82.61639
2000-01-04 92.0000 93.5000 87.9375 88.25000 33786600 76.94878
2000-01-05 87.5000 89.6250 84.2500 86.00000 42496600 74.98690
2000-01-06 86.8750 88.0000 79.7500 80.09375 37134800 69.83700
2000-01-07 82.9375 90.0000 82.5000 90.00000 28138200 78.47465
2000-01-10 91.0000 93.9375 89.9375 92.50000 29675600 80.65449
$SPY
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
2000-01-03 148.2500 148.2500 143.8750 145.4375 8164300 98.95903
2000-01-04 143.5312 144.0625 139.6406 139.7500 8089800 95.08910
2000-01-05 139.9375 141.5312 137.2500 140.0000 12177900 95.25919
2000-01-06 139.6250 141.5000 137.7500 137.7500 6227200 93.72825
2000-01-07 140.3125 145.7500 140.0625 145.7500 8066500 99.17164
2000-01-10 146.2500 146.9062 145.0312 146.2500 5741700 99.51184
# Call head on each object in data_env using eapply
data_list <- eapply(data_env, head)
# Merge all the list elements into one xts object
data_merged <- do.call(merge, data_list)
# Ensure the columns are ordered: open, high, low, close
data_ohlc <- OHLC(data_merged)
data_ohlc
QQQ.Open SPY.Open QQQ.High SPY.High QQQ.Low SPY.Low QQQ.Close
2000-01-03 96.1875 148.2500 96.1875 148.2500 90.7500 143.8750 94.75000
2000-01-04 92.0000 143.5312 93.5000 144.0625 87.9375 139.6406 88.25000
2000-01-05 87.5000 139.9375 89.6250 141.5312 84.2500 137.2500 86.00000
2000-01-06 86.8750 139.6250 88.0000 141.5000 79.7500 137.7500 80.09375
2000-01-07 82.9375 140.3125 90.0000 145.7500 82.5000 140.0625 90.00000
2000-01-10 91.0000 146.2500 93.9375 146.9062 89.9375 145.0312 92.50000
SPY.Close
2000-01-03 145.4375
2000-01-04 139.7500
2000-01-05 140.0000
2000-01-06 137.7500
2000-01-07 145.7500
2000-01-10 146.2500
# Extract the adjusted close column from each object
adjusted_list <- lapply(data_env, Ad)
# Merge each list element into one object
adjusted <- do.call(merge, adjusted_list)
head(adjusted)
QQQ.Adjusted SPY.Adjusted
2000-01-03 82.61639 98.95903
2000-01-04 76.94878 95.08910
2000-01-05 74.98690 95.25919
2000-01-06 69.83700 93.72825
2000-01-07 78.47465 99.17164
2000-01-10 80.65449 99.51184
The previous exercise taught you how to use do.call(merge, eapply(env, fun)) to apply a function to each object in an environment and then combine all the results into one object.
Let’s use what you learned to solve a very common problem. Often you will need to load similar data for many instruments, extract a column, and create one object that contains that specific column for every instrument.
# Symbols
symbols <- c("AAPL", "MSFT", "IBM")
# Create new environment
data_env <- new.env()
# Load symbols into data_env
getSymbols(symbols, env = data_env)
[1] "AAPL" "MSFT" "IBM"
# Extract the close column from each object and combine into one xts object
close_data <- do.call(merge, eapply(data_env, Cl))
# View the head of close_data
head(close_data)
AAPL.Close IBM.Close MSFT.Close
2000-01-03 3.997768 116.0000 58.28125
2000-01-04 3.660714 112.0625 56.31250
2000-01-05 3.714286 116.0000 56.90625
2000-01-06 3.392857 114.0000 55.00000
2000-01-07 3.553571 113.5000 55.71875
2000-01-10 3.491071 118.0000 56.12500
Recall that getSymbols() imports from Yahoo Finance by default. This exercise will teach you how to change the default data source with the setDefaults() function.
The first argument to setDefaults() is the function you want to update, and the remaining arguments are name = value pairs of the arguments you want to update and the new default value.
Note that this only works with getSymbols() because getSymbols() actively checks to see if you want to use a different default value.
# Set the default to pull data from Alpha Vantage
setDefaults(getSymbols, src = "av")
# Get GOOG data
# getSymbols("GOOG")
# Verify the data was actually pulled from Alpha Vantage
# str(GOOG)
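The GOOG download above is commented out because Alpha Vantage needs an API key (see the earlier sketch). You can still confirm that the default source changed, and reset it when you are done:
# Check the defaults currently registered for getSymbols
getDefaults("getSymbols")
# Reset the default source to Yahoo! Finance
setDefaults(getSymbols, src = "yahoo")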
Setting a default source can be useful if you use that source often.
You can also use setDefaults() on individual getSymbols() source methods. This exercise will teach you how to change the default value for the from argument to getSymbols.yahoo().
You can find the arguments for a specific method by using help() (e.g. help("getSymbols.yahoo")) or by calling args() to print them to the console (e.g. args(getSymbols.yahoo)). Calling getDefaults() will show you the current default values (if there are any).
Remember, you are not supposed to call getSymbols.yahoo() directly!
# Look at getSymbols.yahoo arguments
args(getSymbols.yahoo)
function (Symbols, env, return.class = "xts", index.class = "Date",
from = "2007-01-01", to = Sys.Date(), ..., periodicity = "daily",
curl.options = list())
NULL
# Set default 'from' value for getSymbols.yahoo
setDefaults(getSymbols.yahoo, from = "2000-01-01")
# Confirm defaults were set correctly
getDefaults("getSymbols.yahoo")
$from
[1] "'2000-01-01'"
Changing the default source for one instrument is useful if multiple sources use the same symbol for different instruments. For example, getSymbols(“CP”, src = “yahoo”) would load Canadian Pacific Railway data from the New York Stock Exchange. But getSymbols(“CP”, src = “FRED”) would load Corporate Profits After Tax from the U.S. Bureau of Economic Analysis.
You can use setSymbolLookup() to specify the default data source for an instrument. In this exercise, you will learn how to make getSymbols(“CP”) load the corporate profit data from FRED instead of the railway stock data from Yahoo Finance.
setSymbolLookup() can take any number of name = value pairs, where name is the symbol and value is a named list of getSymbols() arguments for that one symbol.
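For example, either of the calls below sets FRED as the default source for the CP symbol. The list form matches the description above, while the bare-string shorthand (used in the code that follows) sets just the source:
setSymbolLookup(CP = list(src = "FRED"))  # named list of getSymbols() arguments
setSymbolLookup(CP = "FRED")              # shorthand when you only need to set the source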
setDefaults(getSymbols, src = "yahoo")
setSymbolLookup("CP" = "yahoo")
# Load CP data again
getSymbols("CP")
[1] "CP"
# Look at the first few rows of CP
head(CP)
CP.Open CP.High CP.Low CP.Close CP.Volume CP.Adjusted
2000-01-03 10.84375 10.90625 10.40625 10.43750 358200 7.454829
2000-01-04 10.34375 10.87500 10.34375 10.50000 1160200 7.499471
2000-01-05 10.50000 10.87500 10.50000 10.78125 852200 7.700349
2000-01-06 10.96875 11.46875 10.90625 11.31250 1960400 8.079788
2000-01-07 11.37500 11.62500 11.18750 11.50000 1758800 8.213708
2000-01-10 11.53125 11.68750 11.21875 11.59375 1273600 8.280663
setSymbolLookup("CP" = NULL)
# Set the source for CP to FRED
setSymbolLookup("CP" = "FRED")
# Load CP data again
getSymbols("CP")
[1] "CP"
# Look at the first few rows of CP
head(CP)
CP.Open CP.High CP.Low CP.Close CP.Volume CP.Adjusted
2000-01-03 10.84375 10.90625 10.40625 10.43750 358200 7.454829
2000-01-04 10.34375 10.87500 10.34375 10.50000 1160200 7.499471
2000-01-05 10.50000 10.87500 10.50000 10.78125 852200 7.700349
2000-01-06 10.96875 11.46875 10.90625 11.31250 1960400 8.079788
2000-01-07 11.37500 11.62500 11.18750 11.50000 1758800 8.213708
2000-01-10 11.53125 11.68750 11.21875 11.59375 1273600 8.280663
Occasionally this happens, and it is useful to set a single symbol to be pulled from a specific source.
The previous exercise taught you how to set default arguments on a per-symbol basis, but those settings only last for the current session.
This exercise will teach you how to save and load symbol-based defaults by using saveSymbolLookup() and loadSymbolLookup(), respectively. You can use the file argument to specify a file to store your defaults.
You can also use the getSymbolLookup() function to check per-symbol defaults before you try to load data using getSymbols().
# Save symbol lookup table
saveSymbolLookup("my_symbol_lookup.rda")
# Set default source for CP to "yahoo"
setSymbolLookup("CP" = "yahoo")
# Verify the default source is "yahoo"
getSymbolLookup("CP")
$CP
$CP$src
[1] "yahoo"
# Load symbol lookup table
loadSymbolLookup("my_symbol_lookup.rda")
getSymbolLookup("CP")
$CP
$CP$src
[1] "FRED"
This will let you load the same lookup table even if you close out of R.
At some point, you might download data for an instrument that does not have a syntactically valid name. Syntactically valid names contain letters, numbers, ".", and "_", and must start with a letter or a "." followed by a non-number.
For example, the symbol for Berkshire Hathaway class A shares is "BRK-A", which is not a syntactically valid name because it contains a "-" character. Other examples are Chinese stocks, which have symbols composed of numbers. The Yahoo Finance symbol for the SSE Composite Index is "000001.SS".
You can use the get function or backticks (`) to access objects that do not have syntactically valid names.
# Load BRK-A data
getSymbols("BRK-A")
[1] "BRK-A"
# Use backticks and head() to look at the loaded data
head(`BRK-A`)
BRK-A.Open BRK-A.High BRK-A.Low BRK-A.Close BRK-A.Volume
2000-01-03 56100 56100 53800 54800 36000
2000-01-04 53700 53800 52000 52000 44000
2000-01-05 51700 54700 51700 53200 51000
2000-01-06 53300 55000 53100 55000 57000
2000-01-07 55600 56500 55200 56500 67000
2000-01-10 57300 58000 56000 56000 31000
BRK-A.Adjusted
2000-01-03 54800
2000-01-04 52000
2000-01-05 53200
2000-01-06 55000
2000-01-07 56500
2000-01-10 56000
# Use get() to assign the BRK-A data to an object named BRK.A
BRK.A <- get("BRK-A")
Just remember to use backticks or get() if you ever run into invalid characters.
If you are only downloading data for a single symbol and that symbol is not a syntactically valid name, you can set auto.assign = FALSE in your call to getSymbols(). That will allow you to directly assign the output to a syntactically valid name.
You may also want to convert the column names to syntactically valid names. That is a good idea if you plan to use the data in functions that expect column names to be syntactically valid names (e.g. lm()).
# Create BRK.A object
BRK.A <- getSymbols("BRK-A", auto.assign = FALSE)
# Create col_names object with the column names of BRK.A
col_names <- colnames(BRK.A)
# Set BRK.A column names to syntactically valid names
colnames(BRK.A) <- make.names(col_names)
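To see what make.names() does here, a quick illustration on two of the column names:
# Hyphens are not valid in R names, so make.names() replaces them with "."
make.names(c("BRK-A.Open", "BRK-A.Close"))
# [1] "BRK.A.Open"  "BRK.A.Close"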
Now you can fix tricky ticker symbols in the column names of your data.
An earlier exercise taught you how to use setSymbolLookup() to set a default data source for getSymbols(). You can also use setSymbolLookup() to create a mapping between the instrument symbol and the name of the R object.
This is useful if you want to download data for a lot of symbols that are not syntactically valid names, or symbols whose names conflict with other R variable names.
An example of a name that conflicts is the symbol for AT&T’s stock, T, which is often used as a short form for the logical value TRUE.
To change the name of a given symbol, arguments must be passed to setSymbolLookup() as a list, like so: setSymbolLookup(NEW_NAME = list(name = “OLD_NAME”)).
# Set name for BRK-A to BRK.A
setSymbolLookup(BRK.A = list(name = "BRK-A"))
# Set name for T (AT&T) to ATT
setSymbolLookup(ATT = list(name = "T"))
# Load BRK.A and ATT data
getSymbols(c("BRK.A", "ATT"))
[1] "BRK.A" "ATT"
Now you can map troublesome tickers to new names with setSymbolLookup().
In order to create regular data from an irregular data set, the first thing you need is a regular sequence of date-times that span the dates of your irregular data set. A “regular” sequence of date-times has equally-spaced time points.
In this exercise, you will use the irregular_xts object to create a zero-width xts object that has a regular daily index. A zero-width xts object has an index of date-times, but no data columns.
# Extract the start date of the series
start_date <- start(irregular_xts)
# Extract the end date of the series
end_date <- end(irregular_xts)
# Create a regular date sequence
regular_index <- seq(from = start_date, to = end_date, by = "day")
# Create an xts object with the regular index (a sequence is used as data here so the object prints; a zero-width object would use x = NULL)
regular_xts <- xts(seq_along(regular_index), order.by = regular_index)
regular_xts
[,1]
2016-01-02 1
2016-01-03 2
2016-01-04 3
2016-01-05 4
2016-01-06 5
2016-01-07 6
2016-01-08 7
2016-01-09 8
2016-01-10 9
2016-01-11 10
Making regular date-time sequences is useful in many time-series applications.
The previous exercise taught you how to make a zero-width xts object with a regular time index. You can use the zero-width object to regularize an irregular xts object.
The regularized series usually has missing values (NA) because the irregular data does not have a value for all observations in the regular index. This exercise will teach you how to handle these missing values when you merge() the two series.
# Merge irregular_xts and regular_xts
merged_xts <- merge(irregular_xts, regular_xts)
# Look at the first few rows of merged_xts
head(merged_xts)
data regular_xts
2016-01-02 4 1
2016-01-03 NA 2
2016-01-04 NA 3
2016-01-05 21 4
2016-01-06 NA 5
2016-01-07 1 6
# Use the fill argument to fill NA with their previous value
merged_filled_xts <- merge(irregular_xts, regular_xts, fill = na.locf)
# Look at the first few rows of merged_filled_xts
head(merged_filled_xts)
data regular_xts
2016-01-02 4 1
2016-01-03 4 2
2016-01-04 4 3
2016-01-05 21 4
2016-01-06 21 5
2016-01-07 1 6
Filling forward is a useful operation, but be careful to make sure it is what you want!
Sometimes two series have the same periodicity, but use different conventions to represent a timestamp. For example, monthly series may be timestamped with the first or last date of the month. The different timestamp conventions can cause many NAs when the series are merged. The yearmon class from the zoo package helps solve this problem.
In this exercise, you will aggregate the FRED daily Fed Funds rate (DFF) to a monthly periodicity and merge it with the FRED monthly Fed Funds rate (FEDFUNDS). The DFF aggregate will be timestamped with the last day of the month, while FEDFUNDS is timestamped with the first day of the month.
getSymbols(c("FEDFUNDS", "DFF"), src = "FRED")
[1] "FEDFUNDS" "DFF"
# Aggregate DFF to monthly
monthly_fedfunds <- apply.monthly(DFF, mean, na.rm = TRUE)
# Convert index to yearmon
index(monthly_fedfunds) <- as.yearmon(index(monthly_fedfunds))
# Merge FEDFUNDS with the monthly aggregate
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
# Look at the first few rows of the merged object
head(merged_fedfunds)
FEDFUNDS DFF
1954-07-01 0.80 0.7993548
1954-08-01 1.22 1.2206452
1954-09-01 1.07 1.0666667
1954-10-01 0.85 0.8487097
1954-11-01 0.83 0.8336667
1954-12-01 1.28 1.2777419
You will often need to aggregate to a lower frequency to align multiple time series.
Sometimes you may not be able to use convenience classes like yearmon to represent timestamps. This exercise will teach you how to manually align merged data to the timestamp representation you prefer.
First you merge the lower-frequency data with the aggregate data, then use na.locf() to fill the NA forward (or backward, using fromLast = TRUE). Then you can subset the result using the index of the object with the representation you prefer.
# Aggregate DFF to monthly
monthly_fedfunds <- apply.monthly(DFF, mean, na.rm = TRUE)
# Merge FEDFUNDS with the monthly aggregate
merged_fedfunds <- merge(FEDFUNDS, monthly_fedfunds)
# Look at the first few rows of the merged object
head(merged_fedfunds)
FEDFUNDS DFF
1954-07-01 0.80 NA
1954-07-31 NA 0.7993548
1954-08-01 1.22 NA
1954-08-31 NA 1.2206452
1954-09-01 1.07 NA
1954-09-30 NA 1.0666667
# Fill NA forward
merged_fedfunds_locf <- na.locf(merged_fedfunds)
# Extract index values containing last day of month
aligned_last_day <- merged_fedfunds_locf[index(monthly_fedfunds)]
head(aligned_last_day)
FEDFUNDS DFF
1954-07-31 0.80 0.7993548
1954-08-31 1.22 1.2206452
1954-09-30 1.07 1.0666667
1954-10-31 0.85 0.8487097
1954-11-30 0.83 0.8336667
1954-12-31 1.28 1.2777419
# Fill NA backward
merged_fedfunds_locb <- na.locf(merged_fedfunds, fromLast = TRUE)
# Extract index values containing first day of month
aligned_first_day <- merged_fedfunds_locb[index(FEDFUNDS)]
head(aligned_first_day)
FEDFUNDS DFF
1954-07-01 0.80 0.7993548
1954-08-01 1.22 1.2206452
1954-09-01 1.07 1.0666667
1954-10-01 0.85 0.8487097
1954-11-01 0.83 0.8336667
1954-12-01 1.28 1.2777419
Knowing how to manually align merged data will definitely come in handy!
In this exercise, you will learn a general aggregation technique to aggregate daily data to weekly, but with weeks ending on Wednesdays. This is often done in stock market research to avoid intra-week seasonality.
You can supply your own end points to period.apply() (versus using endpoints()). Recall endpoints() returns locations of the last observation in each period specified by the on argument. The first and last elements of the result are always zero and the total number of observations, respectively. The end points you pass to period.apply() must follow this convention.
# Extract index weekdays
index_weekdays <- .indexwday(DFF)
# Find locations of Wednesdays
wednesdays <- which(index_weekdays == 3)
# Create custom end points
end_points <- c(0, wednesdays, nrow(DFF))
# Calculate weekly mean using custom end points
weekly_mean <- period.apply(DFF, end_points, mean)
head(weekly_mean)
timezone of object (UTC) is different than current timezone ().
DFF
1954-07-07 1.0014286
1954-07-14 1.2157143
1954-07-21 0.5714286
1954-07-28 0.6257143
1954-08-04 0.2700000
1954-08-11 1.3071429
There are many ways to convert a time series to a lower frequency.
Recall that xts objects store the time index as seconds since midnight, 1970-01-01 in the UTC timezone. merge() uses this underlying index and returns a result with the first object’s timezone.
This exercise provides an example. The two objects in your workspace are identical except for the index timezone. The index values are the same instances in time, but measured in different locations. The london object’s timezone is Europe/London and the chicago object’s timezone is America/Chicago.
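The london and chicago objects are assumed to already be in your workspace. A hypothetical sketch of how two such objects could be built (the dates and values below are invented, not the exercise data):
# Two small xts objects whose indexes are the same instants in time,
# expressed in different timezones
library(xts)
idx <- seq(as.POSIXct("2010-01-06 06:00", tz = "Europe/London"),
           by = "2 days", length.out = 5)
london  <- xts(matrix(1:5, dimnames = list(NULL, "London")), order.by = idx)
chicago <- xts(matrix(1:5, dimnames = list(NULL, "Chicago")), order.by = idx)
tzone(chicago) <- "America/Chicago"  # same instants, displayed in Chicago time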
# Create merged object with a Europe/London timezone
tz_london <- merge(london, chicago)
# Look at tz_london structure
str(tz_london)
An ‘xts’ object on 2010-01-06 06:00:00/2010-01-14 08:00:00 containing:
Data: int [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "London" "Chicago"
Indexed by objects of class: [POSIXct,POSIXt] TZ: Europe/London
xts Attributes:
NULL
# Create merged object with a America/Chicago timezone
tz_chicago <- merge(chicago, london)
# Look at tz_chicago structure
str(tz_chicago)
An ‘xts’ object on 2010-01-06/2010-01-14 02:00:00 containing:
Data: int [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "Chicago" "London"
Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Chicago
xts Attributes:
NULL
Earlier you learned how to create a regular daily series from irregular daily data. Now you will create regular intra-day data from an irregular series.
Intra-day financial data often does not span a full 24 hour period. Most markets are usually closed at least part of the day. This exercise assumes markets open at 9AM and close at 4PM Monday-Friday.
Your data may not have an observation exactly at the market open and/or close. So, you would not be able to use start() and end() as you could with the daily data. You need to specify the start and end date-times to create this regular sequence.
The regular date-time sequence will include periods when markets are closed, but you can use xts’ time-of-day subsetting to extract periods the market is open.
# Create a regular date-time sequence
regular_index <- seq(as.POSIXct("2010-01-04 09:00"), as.POSIXct("2010-01-08 16:00"), by = "30 min")
# Create a zero-width xts object
regular_xts <- xts(x = NULL, order.by = regular_index)
# Merge irregular_xts and regular_xts, filling NA with their previous value
merged_xts <- merge(irregular_xts, regular_xts, fill = na.locf)
# Subset to trading day (9AM - 4PM)
trade_day <- merged_xts["T09:00/T16:00"]
'indexTZ' is deprecated.
Use 'tzone' instead.
See help("Deprecated") and help("xts-deprecated").
trade_day
data
2010-01-04 09:00:00 NA
2010-01-04 09:30:00 NA
2010-01-04 10:00:00 NA
2010-01-04 10:30:00 NA
2010-01-04 11:00:00 NA
2010-01-04 11:30:00 NA
2010-01-04 12:00:00 1
2010-01-04 12:30:00 1
2010-01-04 13:00:00 1
2010-01-04 13:30:00 1
2010-01-04 14:00:00 2
2010-01-04 14:30:00 2
2010-01-04 15:00:00 2
2010-01-04 15:30:00 2
2010-01-04 16:00:00 2
2010-01-05 09:00:00 7
2010-01-05 09:30:00 7
2010-01-05 10:00:00 7
2010-01-05 10:30:00 7
2010-01-05 11:00:00 8
2010-01-05 11:30:00 8
2010-01-05 12:00:00 8
2010-01-05 12:30:00 8
2010-01-05 13:00:00 8
2010-01-05 13:30:00 8
2010-01-05 14:00:00 8
2010-01-05 14:30:00 8
2010-01-05 15:00:00 8
2010-01-05 15:30:00 8
2010-01-05 16:00:00 8
2010-01-06 09:00:00 10
2010-01-06 09:30:00 10
2010-01-06 10:00:00 10
2010-01-06 10:30:00 10
2010-01-06 11:00:00 10
2010-01-06 11:30:00 10
2010-01-06 12:00:00 10
2010-01-06 12:30:00 10
2010-01-06 13:00:00 10
2010-01-06 13:30:00 10
2010-01-06 14:00:00 11
2010-01-06 14:30:00 11
2010-01-06 15:00:00 11
2010-01-06 15:30:00 11
2010-01-06 16:00:00 11
2010-01-07 09:00:00 14
2010-01-07 09:30:00 14
2010-01-07 10:00:00 14
2010-01-07 10:30:00 14
2010-01-07 11:00:00 14
2010-01-07 11:30:00 14
2010-01-07 12:00:00 14
2010-01-07 12:30:00 14
2010-01-07 13:00:00 14
2010-01-07 13:30:00 14
2010-01-07 14:00:00 14
2010-01-07 14:30:00 14
2010-01-07 15:00:00 14
2010-01-07 15:30:00 14
2010-01-07 16:00:00 14
2010-01-08 09:00:00 20
2010-01-08 09:30:00 20
2010-01-08 10:00:00 20
2010-01-08 10:30:00 20
2010-01-08 11:00:00 20
2010-01-08 11:30:00 20
2010-01-08 12:00:00 20
2010-01-08 12:30:00 20
2010-01-08 13:00:00 20
2010-01-08 13:30:00 20
2010-01-08 14:00:00 20
2010-01-08 14:30:00 20
2010-01-08 15:00:00 20
2010-01-08 15:30:00 20
2010-01-08 16:00:00 20
Now you know how to subset your intra-day data to only contain the trading day!
The previous exercise carried the last observation of the prior day forward into the first observation of the following day. This exercise will show you how to fill missing values by trading day, without using the prior day’s final value.
You will use the same split-lapply-rbind paradigm from the Introduction to xts and zoo course. For reference, the pattern is below.
x_split <- split(x, f = "months")
x_list <- lapply(x_split, cummax)
x_list_rbind <- do.call(rbind, x_list)
Recall that the do.call(rbind, …) syntax allows you to pass a list of objects to rbind() instead of having to type all their names.
# Split trade_day into days
daily_list <- split(trade_day, f = "days")
# Use lapply to call na.locf for each day in daily_list
daily_filled <- lapply(daily_list, FUN = na.locf)
# Use do.call to rbind the results
filled_by_trade_day <- do.call(rbind, daily_filled)
filled_by_trade_day
data
2010-01-04 09:00:00 NA
2010-01-04 09:30:00 NA
2010-01-04 10:00:00 NA
2010-01-04 10:30:00 NA
2010-01-04 11:00:00 NA
2010-01-04 11:30:00 NA
2010-01-04 12:00:00 1
2010-01-04 12:30:00 1
2010-01-04 13:00:00 1
2010-01-04 13:30:00 1
2010-01-04 14:00:00 2
2010-01-04 14:30:00 2
2010-01-04 15:00:00 2
2010-01-04 15:30:00 2
2010-01-04 16:00:00 2
2010-01-05 09:00:00 7
2010-01-05 09:30:00 7
2010-01-05 10:00:00 7
2010-01-05 10:30:00 7
2010-01-05 11:00:00 8
2010-01-05 11:30:00 8
2010-01-05 12:00:00 8
2010-01-05 12:30:00 8
2010-01-05 13:00:00 8
2010-01-05 13:30:00 8
2010-01-05 14:00:00 8
2010-01-05 14:30:00 8
2010-01-05 15:00:00 8
2010-01-05 15:30:00 8
2010-01-05 16:00:00 8
2010-01-06 09:00:00 NA
2010-01-06 09:30:00 NA
2010-01-06 10:00:00 NA
2010-01-06 10:30:00 NA
2010-01-06 11:00:00 NA
2010-01-06 11:30:00 NA
2010-01-06 12:00:00 NA
2010-01-06 12:30:00 NA
2010-01-06 13:00:00 NA
2010-01-06 13:30:00 NA
2010-01-06 14:00:00 11
2010-01-06 14:30:00 11
2010-01-06 15:00:00 11
2010-01-06 15:30:00 11
2010-01-06 16:00:00 11
2010-01-07 09:00:00 NA
2010-01-07 09:30:00 NA
2010-01-07 10:00:00 NA
2010-01-07 10:30:00 NA
2010-01-07 11:00:00 NA
2010-01-07 11:30:00 NA
2010-01-07 12:00:00 NA
2010-01-07 12:30:00 NA
2010-01-07 13:00:00 NA
2010-01-07 13:30:00 NA
2010-01-07 14:00:00 NA
2010-01-07 14:30:00 NA
2010-01-07 15:00:00 NA
2010-01-07 15:30:00 NA
2010-01-07 16:00:00 NA
2010-01-08 09:00:00 NA
2010-01-08 09:30:00 NA
2010-01-08 10:00:00 NA
2010-01-08 10:30:00 NA
2010-01-08 11:00:00 NA
2010-01-08 11:30:00 NA
2010-01-08 12:00:00 NA
2010-01-08 12:30:00 NA
2010-01-08 13:00:00 NA
2010-01-08 13:30:00 NA
2010-01-08 14:00:00 NA
2010-01-08 14:30:00 NA
2010-01-08 15:00:00 NA
2010-01-08 15:30:00 NA
2010-01-08 16:00:00 NA
You used advanced functions to transform data for each trading day!
Intraday data can be huge, with hundreds of thousands of observations per day, millions per month, and hundreds of millions per year. These data sets often need to be aggregated before you can work with them.
You learned how to aggregate daily data in the Introduction to xts and zoo course. This exercise will use to.period() to aggregate intraday data to an OHLC series. You often need to specify both period and k arguments to aggregate intraday data.
load("DC.RData")
dc_intraday <- DC[,1]
# Convert raw prices to 5-second prices
xts_5sec <- to.period(dc_intraday, period = "seconds", k = 5)
missing values removed from data
head(xts_5sec)
timezone of object (GMT) is different than current timezone ().
dc_intraday.Open dc_intraday.High dc_intraday.Low
2016-01-16 00:00:44 20.84 20.84 20.84
2016-01-16 00:01:53 20.84 20.84 20.84
2016-01-16 00:01:58 20.84 20.84 20.84
2016-01-16 00:02:06 20.84 20.84 20.84
2016-01-16 00:03:05 20.84 20.84 20.84
2016-01-16 00:03:33 20.84 20.84 20.84
dc_intraday.Close
2016-01-16 00:00:44 20.84
2016-01-16 00:01:53 20.84
2016-01-16 00:01:58 20.84
2016-01-16 00:02:06 20.84
2016-01-16 00:03:05 20.84
2016-01-16 00:03:33 20.84
# Convert raw prices to 10-minute prices
xts_10min <- to.period(dc_intraday, period = "minutes", k = 10)
missing values removed from data
head(xts_10min)
timezone of object (GMT) is different than current timezone ().
dc_intraday.Open dc_intraday.High dc_intraday.Low
2016-01-16 00:08:21 20.84 20.84 20.84
2016-01-16 00:19:03 20.84 20.84 20.84
2016-01-16 00:25:48 20.84 20.84 20.83
2016-01-16 00:38:33 20.85 20.85 20.84
2016-01-16 00:48:02 20.84 20.84 20.84
2016-01-16 00:57:34 20.85 20.85 20.84
dc_intraday.Close
2016-01-16 00:08:21 20.84
2016-01-16 00:19:03 20.84
2016-01-16 00:25:48 20.84
2016-01-16 00:38:33 20.84
2016-01-16 00:48:02 20.84
2016-01-16 00:57:34 20.84
# Convert raw prices to 1-hour prices
xts_1hour <- to.period(dc_intraday, period = "hours", k = 1)
missing values removed from data
head(xts_1hour)
timezone of object (GMT) is different than current timezone ().
dc_intraday.Open dc_intraday.High dc_intraday.Low
2016-01-16 00:57:34 20.84 20.85 20.83
2016-01-16 01:58:28 20.85 20.85 20.83
2016-01-16 02:59:39 20.85 20.85 20.84
2016-01-16 03:54:31 20.85 20.85 20.84
2016-01-16 04:59:58 20.85 20.85 20.84
2016-01-16 05:58:43 20.84 20.85 20.84
dc_intraday.Close
2016-01-16 00:57:34 20.84
2016-01-16 01:58:28 20.85
2016-01-16 02:59:39 20.85
2016-01-16 03:54:31 20.85
2016-01-16 04:59:58 20.85
2016-01-16 05:58:43 20.85
You can use getSymbols() to import a well-formatted CSV. In this case, well-formatted means the file contains data for a single instrument with date, open, high, low, close, volume, and adjusted close columns, in that order. You might have noticed that this is the same format as getSymbols() returns when you download data from internet sources.
getSymbols() can also treat a directory of CSV files as a data source, just as it uses Yahoo! Finance or FRED. In this exercise, you will be using AMZN.csv in your working directory. It contains some randomized Amazon.com data from the first half of 2002. You can use dir() to see the file in your working directory.
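Before importing, you can peek at the raw file with read.csv() to confirm it has the expected layout (one instrument, with date, OHLC, volume, and adjusted close columns):
# Peek at the first rows of the raw file
read.csv("AMZN.csv", nrows = 2)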
# Load AMZN.csv
getSymbols("AMZN", src = "csv")
[1] "AMZN"
# Look at AMZN structure
str(AMZN)
An ‘xts’ object on 2002-01-02/2002-06-28 containing:
Data: num [1:124, 1:6] 11.1 11.3 12.5 11.8 12.8 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "csv"
$ updated: POSIXct[1:1], format: "2020-08-03 18:02:15"
The previous exercise taught you how to import well-formatted CSV data using getSymbols(). Unfortunately, most data are not well-formatted.
The zoo package provides several functions to import text files as zoo objects. The main function is read.zoo(), which wraps read.table(). The xts class extends zoo, so you can easily convert the result of read.zoo() into an xts object by using as.xts().
# Import AMZN.csv using read.zoo
amzn_zoo <- read.zoo("AMZN.csv", sep = ",", header = TRUE)
# Convert to xts
amzn_xts <- as.xts(amzn_zoo)
# Look at the first few rows of amzn_xts
head(amzn_xts)
AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume
2002-01-02 11.13 11.01 10.46 10.87 6674703
2002-01-03 11.26 12.25 10.76 11.99 11441553
2002-01-04 12.46 12.62 11.71 12.10 12619402
2002-01-07 11.76 12.73 12.01 12.48 12296059
2002-01-08 12.82 12.43 11.35 11.96 6821418
2002-01-09 12.07 12.49 10.93 11.59 10669376
AMZN.Adjusted
2002-01-02 10.87
2002-01-03 11.99
2002-01-04 12.10
2002-01-07 12.48
2002-01-08 11.96
2002-01-09 11.59
As you will see, read.zoo() is a very flexible import function for time series.
read.zoo() makes it easy to import data when the date and time are in separate columns. The index.column argument allows you to specify the name or number of the column(s) containing the index data. That’s all you need to do if the date and time are specified in the standard format (“%Y-%m-%d” for date, and “%H:%M:%S” for time).
In this exercise, you will use the index.column argument to specify the date and time columns of the file. Your working directory has a file named UNE.csv that contains some 5-minute OHLC data for the energy company, Unron. You will first use read.csv() to find the names of the date and time columns.
# Read data with read.csv
une_data <- read.csv("UNE.csv", nrows = 5)
# Look at the structure of une_data
str(une_data)
'data.frame': 5 obs. of 6 variables:
$ Date : chr "2016-11-14" "2016-11-14" "2016-11-14" "2016-11-14" ...
$ Time : chr "09:05:00" "09:10:00" "09:15:00" "09:20:00" ...
$ Open : num 10.4 10.1 10.2 10.1 10.2
$ High : num 10.4 10.2 10.2 10.2 10.2
$ Low : num 10.2 10.1 10.2 10.1 10.2
$ Close: num 10.2 10.2 10.2 10.2 10.2
# Read data with read.zoo, specifying index columns
une_zoo <- read.zoo("UNE.csv", index.column = c("Date", "Time"), sep = ",", header = TRUE)
# Look at first few rows of data
head(une_zoo)
Open High Low Close
2016-11-14 09:05:00 10.38 10.38 10.21 10.22
2016-11-14 09:10:00 10.12 10.22 10.12 10.22
2016-11-14 09:15:00 10.18 10.22 10.15 10.22
2016-11-14 09:20:00 10.09 10.22 10.07 10.22
2016-11-14 09:25:00 10.22 10.22 10.21 10.22
2016-11-14 09:30:00 10.16 10.22 10.15 10.22
The index.column argument is great if your dates and times are in separate columns!
The previous exercises work if each file contains only one instrument. Some software and data vendors may provide data for all instruments in one file. This exercise will teach you how to import a file that contains multiple instruments.
Once again, you can use read.zoo(). This time you will be using its split argument, which allows you to specify the name or number of the column(s) that contain the variables that identify unique observations.
The two_symbols.csv file in your working directory contains bid/ask data for two instruments, where each row has one bid or ask observation for one instrument. You will use the split argument to import the data into an object that has both bid and ask prices for both instruments on one row.
# Read data with read.csv
two_symbols_data <- read.csv("two_symbols.csv", nrows = 5)
# Look at the structure of two_symbols_data
str(two_symbols_data)
'data.frame': 5 obs. of 4 variables:
$ Date : chr "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" "2016-01-01 10:43:01" ...
$ Symbol: chr "A" "A" "B" "B" ...
$ Type : chr "Bid" "Ask" "Bid" "Ask" ...
$ Price : num 58.2 58.2 29 29 58.2
# Read data with read.zoo, specifying index columns
two_symbols_zoo <- read.zoo("two_symbols.csv", split = c("Symbol", "Type"), sep = ",", header = TRUE)
# Look at first few rows of data
head(two_symbols_zoo)
A.Ask B.Ask A.Bid B.Bid
2016-01-01 10:43:01 58.24 28.98 58.23 28.96
2016-01-01 10:43:02 58.25 28.99 58.24 28.97
In chapter 3, you used na.locf() to fill missing values with the previous non-missing value. You can use interpolation when carrying the previous value forward isn’t appropriate. In this exercise, you will explore two interpolation methods: linear and spline.
Linear interpolation calculates values that lie on a line between two known data points. This is a good choice for fairly linear data, like a series with a strong trend. Spline interpolation is more appropriate for series without a strong trend, because it calculates a non-linear approximation using multiple data points.
Use these two methods to interpolate the three missing values for the 10-year Treasury rate in the object DGS10. Then compare the results with the output of na.locf().
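The DGS10 object is assumed to already be in your workspace; the full series could be pulled from FRED, for example (the exercise version is a short window containing three NAs):
# 10-Year Treasury Constant Maturity Rate from FRED
getSymbols("DGS10", src = "FRED")
# Count the missing values before interpolating
sum(is.na(DGS10))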
# fill NA using last observation carried forward
locf <- na.locf(DGS10)
# fill NA using linear interpolation
approx <- na.approx(DGS10)
# fill NA using spline interpolation
spline <- na.spline(DGS10)
# merge into one object
na_filled <- merge(locf, approx, spline)
# plot combined object
plot(na_filled, col = c("black", "red", "green"))
It’s important to check that your imported data is reasonable. A plot is a quick and easy way to spot oddities. In this exercise, you will use the plot() function to visualize some AAPL data from Yahoo Finance.
A stock split caused a huge price change in June 2014. Apple simultaneously increased the number of shares outstanding and decreased its stock price, leaving the company value unchanged. For example, a 2-for-1 split would double the shares outstanding, and reduce the stock price by 1/2.
You will also use the quantmod extractor functions Cl() and Ad() to access the close and adjusted close columns, respectively. Yahoo Finance provides the split- and/or dividend-adjusted close column.
getSymbols("AAPL", src='yahoo', from = "2007-01-01", to = "2017-09-17")
[1] "AAPL"
head(AAPL)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
2007-01-03 12.32714 12.36857 11.70000 11.97143 309579900
2007-01-04 12.00714 12.27857 11.97429 12.23714 211815100
2007-01-05 12.25286 12.31428 12.05714 12.15000 208685400
2007-01-08 12.28000 12.36143 12.18286 12.21000 199276700
2007-01-09 12.35000 13.28286 12.16429 13.22429 837324600
2007-01-10 13.53571 13.97143 13.35000 13.85714 738220000
AAPL.Adjusted
2007-01-03 10.36364
2007-01-04 10.59366
2007-01-05 10.51822
2007-01-08 10.57016
2007-01-09 11.44823
2007-01-10 11.99609
# Look at the last few rows of AAPL data
tail(AAPL)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
2017-09-08 160.86 161.15 158.53 158.63 28611500
2017-09-11 160.50 162.05 159.89 161.50 31580800
2017-09-12 162.61 163.96 158.77 160.86 71714000
2017-09-13 159.87 159.96 157.91 159.65 44907400
2017-09-14 158.99 159.40 158.09 158.28 23760700
2017-09-15 158.47 160.97 158.00 159.88 49114600
AAPL.Adjusted
2017-09-08 152.6423
2017-09-11 155.4040
2017-09-12 154.7882
2017-09-13 153.6238
2017-09-14 152.3055
2017-09-15 153.8451
# Plot close price
plot(AAPL$AAPL.Close)
# Plot adjusted close price
plot(AAPL$AAPL.Adjusted)
In this exercise, you will cross-reference the AAPL raw price data from the previous exercise with AAPL data from another source.
The new data is already adjusted for splits, but not dividends. So the close prices from the new data won’t align closely with the adjusted close prices from the previous exercise (which are adjusted for both splits and dividends). You will learn more about the adjustment process in the next video.
You will compare raw, unadjusted AAPL data with split-adjusted AAPL data. The data have already been loaded to your workspace in aapl_raw and aapl_split_adjusted, respectively.
# Look at first few rows of aapl_raw
head(aapl_raw)
# Look at first few rows of aapl_split_adjusted
head(aapl_split_adjusted)
# Plot difference between adjusted close and split-adjusted close
plot(Ad(aapl_raw$AAPL.Adjusted) - Cl(aapl_split_adjusted$AAPL.Close))
# Plot difference between volume from the raw and split-adjusted sources
plot(Vo(aapl_raw$AAPL.Volume) - Vo(aapl_split_adjusted$AAPL.Volume))
The volumes agree on most (but not all) days, whereas the close prices are completely different.
Stock splits can create large historical price changes even though they do not change the value of the company. So, you must adjust all pre-split prices in order to calculate historical returns correctly.
Similarly, you must adjust all pre-dividend prices. Dividends do reduce the company’s value by the amount of the dividend payment, but the investor’s return isn’t affected because they receive the offsetting dividend payment.
In this exercise, you will learn how to use the adjustOHLC() function to adjust raw historical OHLC prices for splits and dividends, so historical returns can be calculated accurately.
Yahoo Finance provides raw prices and a split- and dividend-adjusted close column. The output of adjustOHLC() should match Yahoo’s adjusted close column. AAPL data from Yahoo Finance is already loaded in your workspace.
While not necessary to complete this exercise, Yahoo Finance provides an accessible example of the adjusted close calculation, if you’re interested in learning more.
# Look at first few rows of AAPL
head(AAPL)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
2007-01-03 12.32714 12.36857 11.70000 11.97143 309579900
2007-01-04 12.00714 12.27857 11.97429 12.23714 211815100
2007-01-05 12.25286 12.31428 12.05714 12.15000 208685400
2007-01-08 12.28000 12.36143 12.18286 12.21000 199276700
2007-01-09 12.35000 13.28286 12.16429 13.22429 837324600
2007-01-10 13.53571 13.97143 13.35000 13.85714 738220000
AAPL.Adjusted
2007-01-03 10.36364
2007-01-04 10.59366
2007-01-05 10.51822
2007-01-08 10.57016
2007-01-09 11.44823
2007-01-10 11.99609
# Adjust AAPL for splits and dividends
aapl_adjusted <- adjustOHLC(AAPL)
incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=-2208988800&period2=1596412800&interval=1d&events=split&crumb=xy3lzdxTZvG'incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=-2208988800&period2=1596412800&interval=1d&events=split&crumb=xy3lzdxTZvG'
# Look at first few rows of aapl_adjusted
head(aapl_adjusted)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
2007-01-03 1.584312 1.589637 1.503711 1.538595 309579900
2007-01-04 1.543185 1.578070 1.538962 1.572745 211815100
2007-01-05 1.574765 1.582660 1.549611 1.561546 208685400
2007-01-08 1.578254 1.588719 1.565769 1.569257 199276700
2007-01-09 1.587250 1.707143 1.563382 1.699615 837324600
2007-01-10 1.739641 1.795640 1.715772 1.780951 738220000
AAPL.Adjusted
2007-01-03 10.36364
2007-01-04 10.59366
2007-01-05 10.51822
2007-01-08 10.57016
2007-01-09 11.44823
2007-01-10 11.99609
In the previous exercise, you used adjustOHLC() to adjust raw historical OHLC prices for splits and dividends, but it only works for OHLC data. It will not work if you only have close prices, and it does not return any of the split or dividend data it uses.
You need the dates and values for each split and dividend to adjust a non-OHLC price series, or if you simply want to analyze the raw split and dividend data.
You can download the split and dividend data from Yahoo Finance using the quantmod functions getSplits() and getDividends(), respectively. The historical dividend data from Yahoo Finance is adjusted for splits. If you want to download unadjusted dividend data, you need to set split.adjust = FALSE in your call to getDividends().
# Download AAPL split data
splits <- getSplits("AAPL")
incomplete final line found by readTableHeader on 'https://query2.finance.yahoo.com/v7/finance/download/AAPL?period1=0&period2=1596412800&interval=1d&events=split&crumb=xy3lzdxTZvG'
# Download AAPL dividend data
dividends <- getDividends("AAPL")
incomplete final line found by readTableHeader on 'https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=-2208988800&period2=1596412800&interval=1d&events=split&crumb=xy3lzdxTZvG'
# Look at the first few rows of dividends
head(dividends)
AAPL.div
1987-05-11 3.821429e-05
1987-08-10 7.642857e-05
1987-11-17 1.021429e-04
1988-02-12 1.021429e-04
1988-05-16 1.021429e-04
1988-08-15 1.021429e-04
# Download unadjusted AAPL dividend data
raw_dividends <- getDividends("AAPL", split.adjust = FALSE)
# Look at the first few rows of raw_dividends
head(raw_dividends)
AAPL.div
1987-05-11 3.821429e-05
1987-08-10 7.642857e-05
1987-11-17 1.021429e-04
1988-02-12 1.021429e-04
1988-05-16 1.021429e-04
1988-08-15 1.021429e-04
It’s important to get splits and dividends correct when calculating historical returns.
If you only have close prices, you can adjust them with adjRatios(). It has 3 arguments: splits, dividends, and close. It returns an xts object with split and dividend adjustment ratios in columns “Split” and “Div”, respectively.
You need to provide split data via the splits argument to calculate the split ratio. To calculate the dividend ratio, you need to provide raw dividends and raw prices via the dividends and close arguments, respectively.
Once you have the split and dividend adjustment ratios, you calculate the adjusted price by multiplying the unadjusted price by both the split and dividend adjustment ratios.
# Calculate split and dividend adjustment ratios
ratios <- adjRatios(splits = splits, dividends = raw_dividends, close = Cl(AAPL))
# Calculate adjusted close for AAPL
aapl_adjusted <- Cl(AAPL) * ratios[, "Split"] * ratios[, "Div"]
# Look at first few rows of Yahoo adjusted close
head(Ad(AAPL))
AAPL.Adjusted
2007-01-03 10.36364
2007-01-04 10.59366
2007-01-05 10.51822
2007-01-08 10.57016
2007-01-09 11.44823
2007-01-10 11.99609
# Look at first few rows of aapl_adjusted
head(aapl_adjusted)
AAPL.Close
2007-01-03 1.538595
2007-01-04 1.572745
2007-01-05 1.561546
2007-01-08 1.569257
2007-01-09 1.699615
2007-01-10 1.780951