ITCH is the outbound protocol NASDAQ uses to communicate market data to its clients: market status, orders, trades, circuit breakers, and more, each message carrying a nanosecond timestamp, for each trading day and each exchange.
---
title: "NASDAQ ITCH Datafiles - Market Microstructure"
author: "R. Brown"
output:
  tufte::tufte_handout: default
  tufte::tufte_html: default
---
The RITCH package by David Zimmermann reads ITCH files into R for market microstructure analysis.
The ITCH protocol allows NASDAQ to distribute financial information to market participants. The financial information includes orders, trades, order modifications, trading status, traded stocks, and more.
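To make the idea of a binary message concrete, here is a minimal sketch in base R that builds and then decodes one System Event message. The 12-byte field layout is an assumption taken from the published ITCH 5.0 specification, the bytes are hand-built for illustration rather than read from a real file, and `parse_system_event()` is a hypothetical helper, not part of RITCH.

```r
# Assumed layout of a System Event message (ITCH 5.0 specification):
#   1 byte  message type ("S")
#   2 bytes stock locate     (big-endian unsigned integer)
#   2 bytes tracking number  (big-endian unsigned integer)
#   6 bytes timestamp        (nanoseconds since midnight, big-endian)
#   1 byte  event code       (e.g. "Q" = start of market hours)
ts_ns <- 34200 * 1e9  # 09:30:00 expressed in nanoseconds

# Split the timestamp into six big-endian bytes.
ts_bytes <- as.raw((ts_ns %/% 256^(5:0)) %% 256)
msg <- c(charToRaw("S"), as.raw(c(0, 0)), as.raw(c(0, 0)), ts_bytes, charToRaw("Q"))

# Hypothetical helper: decode the fields of one System Event message.
parse_system_event <- function(msg) {
  # Combine big-endian bytes into one number (doubles are exact up to 2^53).
  be_uint <- function(bytes) {
    sum(as.integer(bytes) * 256^((length(bytes) - 1):0))
  }
  list(
    msg_type        = rawToChar(msg[1]),
    locate_code     = be_uint(msg[2:3]),
    tracking_number = be_uint(msg[4:5]),
    timestamp       = be_uint(msg[6:11]),
    event_code      = rawToChar(msg[12])
  )
}

ev <- parse_system_event(msg)
ev$timestamp / 1e9  # seconds since midnight: 34200
```

The field names mirror the column names that appear in the tables further down; a real parser would also have to handle the message framing and every other message type.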
A typical file covering a single trading day holds roughly 30-50 million messages (BX exchange) and up to 230 million messages (NASDAQ), so speed makes a crucial difference. Because the data is streamed from the file, execution time depends mainly on the read/write speed of the drive.
ITCH datafiles are available from NASDAQ:
1. NASDAQ FTP: numerous files are available, named with a simple convention. They tend to be large.
I use the pipe operator %>% from the magrittr package to process the data. Chaining the sorting and aggregation steps avoids storing large intermediate objects.
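library(RITCH)      # get_meta_data(), count_messages(), get_orders(), get_trades()
library(data.table) # the [i, j, by] syntax used in the data munging step
library(ggplot2)    # plotting
start = date()      # start time, reported at the end of the run
oldwd = getwd()
mypath = "/home/hduser/zdata/nasdaq/"
setwd(mypath)
file = paste0(mypath, "20170530.PSX_ITCH_50") # original 20170830
## already DONE: skip = TRUE leaves a previously decompressed file untouched
R.utils::gunzip("20170530.PSX_ITCH_50.gz", "20170530.PSX_ITCH_50", remove = FALSE, skip = TRUE)
md = get_meta_data() # lookup table of message types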
head(md)
## msg_type msg_name msg_group doc_nr
## 1: S System Event Message System Event Message 4.1
## 2: R Stock Directory Stock Related Messages 4.2.1
## 3: H Stock Trading Action Stock Related Messages 4.2.2
## 4: Y Reg SHO Restriction Stock Related Messages 4.2.3
## 5: L Market Participant Position Stock Related Messages 4.2.4
## 6: V MWCB Decline Level Message Stock Related Messages 4.2.5.1
msg_count <- count_messages(file, add_meta_data = TRUE)
## [Counting] 26889612 messages found
## [Converting] to data.table
head(msg_count)
## msg_type count msg_name msg_group
## 1: S 6 System Event Message System Event Message
## 2: R 8414 Stock Directory Stock Related Messages
## 3: H 8457 Stock Trading Action Stock Related Messages
## 4: Y 8521 Reg SHO Restriction Stock Related Messages
## 5: L 6107 Market Participant Position Stock Related Messages
## 6: V 2 MWCB Decline Level Message Stock Related Messages
## doc_nr
## 1: 4.1
## 2: 4.2.1
## 3: 4.2.2
## 4: 4.2.3
## 5: 4.2.4
## 6: 4.2.5.1
#orders <- get_orders(file, 1, count_orders(msg_count))
#trades <- get_trades(file, 1, count_trades(msg_count))
library(magrittr)
get_orders(file, 1, count_orders(msg_count), quiet = TRUE) %>%
  .$stock %>%
  table %>%
  sort(decreasing = TRUE) %>%
  head(4)
## .
## QQQ IWM XLE SPY
## 122761 112533 98226 79776
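The pipe chain above reads left to right but is equivalent to a set of nested calls. The sketch below shows that equivalence on a made-up vector of tickers. To stay free of package dependencies it uses base R's native pipe `|>` (this assumes R 4.1 or later) as a stand-in for magrittr's `%>%`; the two differ in some details, such as the `.` placeholder.

```r
# Toy ticker vector, made up for illustration.
stocks <- c("QQQ", "IWM", "QQQ", "SPY", "IWM", "QQQ")

# Piped form: count, sort by frequency, keep the two most frequent.
piped <- stocks |> table() |> sort(decreasing = TRUE) |> head(2)

# Nested form of the same computation, read inside-out.
nested <- head(sort(table(stocks), decreasing = TRUE), 2)

names(piped)      # "QQQ" "IWM"
as.vector(piped)  # 3 2
```

Neither form assigns the intermediate table or the sorted table to a name, which is the property the text relies on when the input has millions of rows.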
# 0. load the data
orders <- get_orders(file, 1, count_orders(msg_count))
## 11997579 messages found
## [Loading] .........
## [Converting] to data.table
## [Formatting]
head(orders)
## msg_type locate_code tracking_number timestamp order_ref buy
## 1: A 2635 0 2.880001e+13 42086 TRUE
## 2: A 973 0 2.880002e+13 42089 FALSE
## 3: A 3771 0 2.880002e+13 42090 TRUE
## 4: A 4982 0 2.880002e+13 42091 TRUE
## 5: A 8394 0 2.880002e+13 42092 TRUE
## 6: A 2635 0 2.880002e+13 42093 TRUE
## shares stock price mpid date datetime
## 1: 100 F 1.00 <NA> 2017-05-30 2017-05-30 08:00:00
## 2: 4600 BRCD 12.64 <NA> 2017-05-30 2017-05-30 08:00:00
## 3: 1000 IBDB 25.58 <NA> 2017-05-30 2017-05-30 08:00:00
## 4: 5000 MNR-B 24.99 <NA> 2017-05-30 2017-05-30 08:00:00
## 5: 546 ZNH 37.32 <NA> 2017-05-30 2017-05-30 08:00:00
## 6: 100 F 1.00 <NA> 2017-05-30 2017-05-30 08:00:00
trades <- get_trades(file, 1, count_trades(msg_count))
## 27160 messages found
## [Loading] .........
## [Converting] to data.table
## [Formatting]
head(trades)
## msg_type locate_code tracking_number timestamp order_ref buy shares
## 1: P 7155 2 3.003448e+13 0 TRUE 500
## 2: P 7155 2 3.005588e+13 0 TRUE 300
## 3: P 2499 2 3.105399e+13 0 TRUE 100
## 4: P 2499 4 3.105399e+13 0 TRUE 14
## 5: P 2499 2 3.323261e+13 0 TRUE 100
## 6: P 2499 2 3.330769e+13 0 TRUE 100
## stock price match_number cross_type date datetime
## 1: STO 17.62 16848 <NA> 2017-05-30 2017-05-30 08:20:34
## 2: STO 17.62 16849 <NA> 2017-05-30 2017-05-30 08:20:55
## 3: ESV 6.29 16869 <NA> 2017-05-30 2017-05-30 08:37:33
## 4: ESV 6.29 16870 <NA> 2017-05-30 2017-05-30 08:37:33
## 5: ESV 6.46 17004 <NA> 2017-05-30 2017-05-30 09:13:52
## 6: ESV 6.47 17009 <NA> 2017-05-30 2017-05-30 09:15:07
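The `timestamp` and `datetime` columns carry the same information. As a minimal sketch, assuming `timestamp` counts nanoseconds since midnight of the trading day, the conversion can be reproduced in base R with the rounded values printed in the trades table above. The choice of GMT is only there to keep the arithmetic free of time-zone effects.

```r
# Rounded timestamps copied from rows 1, 3, and 5 of the trades table.
ts_ns <- c(3.003448e13, 3.105399e13, 3.323261e13)

# Midnight of the trading day; GMT avoids daylight-saving shifts.
trade_date <- as.POSIXct("2017-05-30", tz = "GMT")

# Nanoseconds -> seconds, added to midnight.
datetime <- trade_date + ts_ns / 1e9

format(datetime, "%H:%M:%S")  # "08:20:34" "08:37:33" "09:13:52"
```

The results match the `datetime` column shown for those rows; the sub-second part is still present in `datetime` and is only dropped by the display format.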
# 1. data munging
tickers <- c("SPY", "IWO", "IWM", "VXX")
dt_orders <- orders[stock %in% tickers]
dt_trades <- trades[stock %in% tickers]
# for each ticker, keep only orders priced no more than 1% below the lowest or 1% above the highest traded price
ranges <- dt_trades[, .(min_price = min(price), max_price = max(price)), by = stock]
# filter the orders
dt_orders <- dt_orders[ranges, on = "stock"][price >= 0.99 * min_price & price <= 1.01 * max_price]
# replace the logical buy flag with a readable side label
dt_orders[, buy := ifelse(buy, "Bid", "Ask")]
dt_orders[, stock := factor(stock, levels = tickers)]
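The range filter above is a join followed by a band check. The following sketch repeats the same logic on a few made-up rows, using base R's `merge()` in place of the data.table join so that it runs without extra packages. The data frames `toy_trades` and `toy_orders` are hypothetical; only the column names are borrowed from the real tables.

```r
# Made-up trades and orders for two tickers.
toy_trades <- data.frame(stock = c("IWM", "IWM", "SPY", "SPY"),
                         price = c(136.29, 137.25, 241.00, 242.00))
toy_orders <- data.frame(stock = c("IWM", "IWM", "SPY", "SPY"),
                         price = c(137.13, 150.00, 200.00, 241.50))

# Per-ticker range of traded prices.
toy_ranges <- do.call(rbind, lapply(split(toy_trades, toy_trades$stock), function(d) {
  data.frame(stock = d$stock[1], min_price = min(d$price), max_price = max(d$price))
}))

# Attach the range to every order, then keep orders inside the widened band.
banded <- merge(toy_orders, toy_ranges, by = "stock")
kept <- banded[banded$price >= 0.99 * banded$min_price &
               banded$price <= 1.01 * banded$max_price, ]

kept$price  # 137.13 241.50
```

For IWM the band is 0.99 * 136.29 to 1.01 * 137.25, so the order at 150.00 is dropped; the SPY order at 200.00 falls below its band in the same way.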
# 2. data visualization
p1 = ggplot() +
  # add the orders to the plot
  geom_point(data = dt_orders,
             aes(x = datetime, y = price, color = buy), size = 0.5) +
  # add the trades as a black line to the plot
  geom_step(data = dt_trades,
            aes(x = datetime, y = price)) +
  # add a facet for each ETF
  facet_wrap(~stock, scales = "free_y") +
  # some aesthetics
  theme_light() +
  labs(title = "Orders and Trades of the largest ETFs",
       subtitle = "Date: 2017-05-30 | Exchange: PSX",
       caption = "Source: NASDAQ",
       x = "Time", y = "Price",
       color = "Side") +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(palette = "Set1")
print(p1)
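Figure (full width): orders as points colored by side and trades as a black step line, one panel each for SPY, IWO, IWM, and VXX on PSX, 2017-05-30.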
##
head(dt_orders)
## msg_type locate_code tracking_number timestamp order_ref buy shares
## 1: A 4135 0 3.368433e+13 184492 Ask 500
## 2: A 4135 0 3.368816e+13 184616 Ask 500
## 3: A 4135 0 3.368816e+13 184617 Ask 500
## 4: A 4135 0 3.368816e+13 184618 Ask 500
## 5: A 4135 0 3.368816e+13 184619 Ask 500
## 6: A 4135 0 3.368816e+13 184620 Ask 500
## stock price mpid date datetime min_price max_price
## 1: IWM 137.13 <NA> 2017-05-30 2017-05-30 09:21:24 136.29 137.25
## 2: IWM 137.12 <NA> 2017-05-30 2017-05-30 09:21:28 136.29 137.25
## 3: IWM 137.15 <NA> 2017-05-30 2017-05-30 09:21:28 136.29 137.25
## 4: IWM 137.16 <NA> 2017-05-30 2017-05-30 09:21:28 136.29 137.25
## 5: IWM 137.14 <NA> 2017-05-30 2017-05-30 09:21:28 136.29 137.25
## 6: IWM 137.17 <NA> 2017-05-30 2017-05-30 09:21:28 136.29 137.25
setwd(oldwd)
end = date()
cat("\n started at: ", start)
##
## started at: Fri May 11 12:49:06 2018
cat("\n ended at ", end)
##
## ended at Fri May 11 13:16:43 2018
#kable(dt_orders)