NASDAQ ITCH Datafiles - Market Microstructure

An implementation in R Markdown

R. Brown

2018-05-11

Introduction

ITCH is the outbound protocol NASDAQ uses to communicate market data to its clients. A daily file for a given exchange carries all of that information, including market status, orders, trades, and circuit breakers, with nanosecond timestamps.
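
ITCH timestamps count nanoseconds since midnight, which is why the raw `timestamp` values in the output further down look like 2.88e+13. A minimal sketch of the conversion to a clock time, using two of those printed (rounded) values as inputs; `ns_to_clock` is a hypothetical helper name, not part of RITCH:

```r
# Convert an ITCH timestamp (nanoseconds since midnight) to "HH:MM:SS".
# ns_to_clock is an illustrative helper, not a RITCH function.
ns_to_clock <- function(ns) {
  secs <- floor(ns / 1e9)  # whole seconds since midnight
  sprintf("%02d:%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60),
          as.integer(secs %% 60))
}

# Rounded timestamps as printed in the orders and trades output
clock <- ns_to_clock(c(2.880001e13, 3.003448e13))
print(clock)
```

The two results agree with the `datetime` column that RITCH adds for the same rows (08:00:00 and 08:20:34).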

The RITCH package by David Zimmermann provides tools for reading ITCH files into R for market microstructure analysis.

The ITCH protocol allows NASDAQ to distribute financial information to market participants. The financial information includes orders, trades, order modifications, trading status, traded stocks, and more.

A file covering a single trading day typically contains roughly 30 to 50 million messages (BX exchange) and as many as 230 million (NASDAQ), so speed makes a crucial difference. Because the data is streamed from the file, execution time depends mainly on the read/write speed of the hard drive.

Data

ITCH datafiles are available from the NASDAQ FTP server. Numerous files are available, named with a simple convention (the file used here is 20170530.PSX_ITCH_50.gz). They tend to be large.
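
The file name used in this document can be split into its parts with a regular expression. This is a sketch that assumes the pattern `<yyyymmdd>.<EXCHANGE>_ITCH_<version>` holds for every file; that pattern is inferred from the single file name used here, and reading the trailing `50` as the protocol version is likewise an assumption.

```r
# Split an ITCH file name into date, exchange, and version suffix.
# The pattern is inferred from one example and may not cover all files.
fname <- "20170530.PSX_ITCH_50"
parts <- regmatches(
  fname,
  regexec("^([0-9]{8})\\.([A-Z]+)_ITCH_([0-9]+)$", fname)
)[[1]]

file_date <- as.Date(parts[2], format = "%Y%m%d")  # trading day
exchange  <- parts[3]                              # exchange code
version   <- parts[4]                              # suffix, assumed to be the version
print(list(date = file_date, exchange = exchange, version = version))
```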

I use the magrittr package for its pipe operator %>% when processing the data. Piping avoids storing large intermediate objects in the workspace during the sorting and aggregating steps.

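library(RITCH)       # get_meta_data(), count_messages(), get_orders(), get_trades()
library(data.table)  # RITCH returns data.tables
library(ggplot2)     # used for the figure below
start = date()       # start time, reported at the end of the run
oldwd = getwd()
mypath = "/home/hduser/zdata/nasdaq/"
setwd(mypath)
file = paste0(mypath, "20170530.PSX_ITCH_50") # original 20170830
# unzip only once; skip if the unzipped file already exists
if (!file.exists(file)) R.utils::gunzip("20170530.PSX_ITCH_50.gz", "20170530.PSX_ITCH_50", remove = FALSE)
md = get_meta_data() # metadata describing each ITCH message type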
head(md)
##    msg_type                    msg_name              msg_group  doc_nr
## 1:        S        System Event Message   System Event Message     4.1
## 2:        R             Stock Directory Stock Related Messages   4.2.1
## 3:        H        Stock Trading Action Stock Related Messages   4.2.2
## 4:        Y         Reg SHO Restriction Stock Related Messages   4.2.3
## 5:        L Market Participant Position Stock Related Messages   4.2.4
## 6:        V  MWCB Decline Level Message Stock Related Messages 4.2.5.1
msg_count <- count_messages(file, add_meta_data = TRUE) 
## [Counting]   26889612 messages found
## [Converting] to data.table
head(msg_count)
##    msg_type count                    msg_name              msg_group
## 1:        S     6        System Event Message   System Event Message
## 2:        R  8414             Stock Directory Stock Related Messages
## 3:        H  8457        Stock Trading Action Stock Related Messages
## 4:        Y  8521         Reg SHO Restriction Stock Related Messages
## 5:        L  6107 Market Participant Position Stock Related Messages
## 6:        V     2  MWCB Decline Level Message Stock Related Messages
##     doc_nr
## 1:     4.1
## 2:   4.2.1
## 3:   4.2.2
## 4:   4.2.3
## 5:   4.2.4
## 6: 4.2.5.1
#orders <- get_orders(file, 1, count_orders(msg_count))
#trades <- get_trades(file, 1, count_trades(msg_count))

library(magrittr)

get_orders(file, 1, count_orders(msg_count), quiet = TRUE) %>% 
  .$stock %>% 
  table %>% 
  sort(decreasing = TRUE) %>% 
  head(4)
## .
##    QQQ    IWM    XLE    SPY 
## 122761 112533  98226  79776
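
The pipeline above tabulates the `stock` column, sorts the counts in descending order, and keeps the top entries. The same steps can be written as nested base R calls. This is a toy sketch on a made-up symbol vector, since the real input needs the ITCH file:

```r
# Toy symbol vector standing in for orders$stock (made-up data).
stock <- c("QQQ", "IWM", "QQQ", "SPY", "IWM", "QQQ", "XLE")

# Same steps as the pipeline, read from the inside out:
# tabulate -> sort by count (descending) -> keep the top entries.
top <- head(sort(table(stock), decreasing = TRUE), 2)
print(top)
```

The nested form reads inside-out, which is the main reason the piped version is easier to follow once more steps are added.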

Figures

# 0. load the data
orders <- get_orders(file, 1, count_orders(msg_count))
## 11997579 messages found
## [Loading]    .........
## [Converting] to data.table
## [Formatting]
head(orders)
##    msg_type locate_code tracking_number    timestamp order_ref   buy
## 1:        A        2635               0 2.880001e+13     42086  TRUE
## 2:        A         973               0 2.880002e+13     42089 FALSE
## 3:        A        3771               0 2.880002e+13     42090  TRUE
## 4:        A        4982               0 2.880002e+13     42091  TRUE
## 5:        A        8394               0 2.880002e+13     42092  TRUE
## 6:        A        2635               0 2.880002e+13     42093  TRUE
##    shares stock price mpid       date            datetime
## 1:    100     F  1.00 <NA> 2017-05-30 2017-05-30 08:00:00
## 2:   4600  BRCD 12.64 <NA> 2017-05-30 2017-05-30 08:00:00
## 3:   1000  IBDB 25.58 <NA> 2017-05-30 2017-05-30 08:00:00
## 4:   5000 MNR-B 24.99 <NA> 2017-05-30 2017-05-30 08:00:00
## 5:    546   ZNH 37.32 <NA> 2017-05-30 2017-05-30 08:00:00
## 6:    100     F  1.00 <NA> 2017-05-30 2017-05-30 08:00:00
trades <- get_trades(file, 1, count_trades(msg_count))
## 27160 messages found
## [Loading]    .........
## [Converting] to data.table
## [Formatting]
head(trades)
##    msg_type locate_code tracking_number    timestamp order_ref  buy shares
## 1:        P        7155               2 3.003448e+13         0 TRUE    500
## 2:        P        7155               2 3.005588e+13         0 TRUE    300
## 3:        P        2499               2 3.105399e+13         0 TRUE    100
## 4:        P        2499               4 3.105399e+13         0 TRUE     14
## 5:        P        2499               2 3.323261e+13         0 TRUE    100
## 6:        P        2499               2 3.330769e+13         0 TRUE    100
##    stock price match_number cross_type       date            datetime
## 1:   STO 17.62        16848       <NA> 2017-05-30 2017-05-30 08:20:34
## 2:   STO 17.62        16849       <NA> 2017-05-30 2017-05-30 08:20:55
## 3:   ESV  6.29        16869       <NA> 2017-05-30 2017-05-30 08:37:33
## 4:   ESV  6.29        16870       <NA> 2017-05-30 2017-05-30 08:37:33
## 5:   ESV  6.46        17004       <NA> 2017-05-30 2017-05-30 09:13:52
## 6:   ESV  6.47        17009       <NA> 2017-05-30 2017-05-30 09:15:07
# 1. data munging
tickers <- c("SPY", "IWO", "IWM", "VXX")
dt_orders <- orders[stock %in% tickers]
dt_trades <- trades[stock %in% tickers]

# for each ticker, use only orders that are within 1% of the range of traded prices
ranges <- dt_trades[, .(min_price = min(price), max_price = max(price)), by = stock]

# filter the orders
dt_orders <- dt_orders[ranges, on = "stock"][price >= 0.99 * min_price & price <= 1.01 * max_price]

# replace the buy-factor with something more useful
dt_orders[, buy := ifelse(buy, "Bid", "Ask")]
dt_orders[, stock := factor(stock, levels = tickers)]

# 2. data visualization
p1 = ggplot() +
  # add the orders to the plot
  geom_point(data = dt_orders,
             aes(x = datetime, y = price, color = buy), size = 0.5) +
  # add the trades as a black line to the plot
  geom_step(data = dt_trades,
            aes(x = datetime, y = price)) +
  # add a facet for each ETF
  facet_wrap(~stock, scales = "free_y") +
  # some Aesthetics
  theme_light() +
  labs(title = "Orders and Trades of the largest ETFs",
       subtitle = "Date: 2017-05-30 | Exchange: PSX",
       caption = "Source: NASDAQ",
       x = "Time", y = "Price",
       color = "Side") +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(palette = "Set1")

print(p1)
A full width figure.
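
The filter step in the code above keeps, for each ticker, only the orders priced between 99% of the lowest traded price and 101% of the highest. A minimal base R sketch of the same rule on made-up data (the tickers `AAA` and `BBB` and all prices are hypothetical), using `tapply` in place of the data.table join:

```r
# Made-up trades and orders for two hypothetical tickers.
trades <- data.frame(stock = c("AAA", "AAA", "BBB"),
                     price = c(100, 102, 50),
                     stringsAsFactors = FALSE)
orders <- data.frame(stock = c("AAA", "AAA", "AAA", "BBB", "BBB"),
                     price = c(98, 99.5, 103.5, 50.4, 51),
                     stringsAsFactors = FALSE)

# Per-ticker range of traded prices.
min_price <- tapply(trades$price, trades$stock, min)
max_price <- tapply(trades$price, trades$stock, max)

# Look up each order's band by ticker name, then apply the 1% margins.
lower <- 0.99 * as.numeric(min_price[orders$stock])
upper <- 1.01 * as.numeric(max_price[orders$stock])
filtered <- orders[orders$price >= lower & orders$price <= upper, ]
print(filtered)
```

Orders far from the traded range would otherwise stretch the price axis of each facet and flatten the trade line.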

##
head(dt_orders)
##    msg_type locate_code tracking_number    timestamp order_ref buy shares
## 1:        A        4135               0 3.368433e+13    184492 Ask    500
## 2:        A        4135               0 3.368816e+13    184616 Ask    500
## 3:        A        4135               0 3.368816e+13    184617 Ask    500
## 4:        A        4135               0 3.368816e+13    184618 Ask    500
## 5:        A        4135               0 3.368816e+13    184619 Ask    500
## 6:        A        4135               0 3.368816e+13    184620 Ask    500
##    stock  price mpid       date            datetime min_price max_price
## 1:   IWM 137.13 <NA> 2017-05-30 2017-05-30 09:21:24    136.29    137.25
## 2:   IWM 137.12 <NA> 2017-05-30 2017-05-30 09:21:28    136.29    137.25
## 3:   IWM 137.15 <NA> 2017-05-30 2017-05-30 09:21:28    136.29    137.25
## 4:   IWM 137.16 <NA> 2017-05-30 2017-05-30 09:21:28    136.29    137.25
## 5:   IWM 137.14 <NA> 2017-05-30 2017-05-30 09:21:28    136.29    137.25
## 6:   IWM 137.17 <NA> 2017-05-30 2017-05-30 09:21:28    136.29    137.25
setwd(oldwd)

end = date()
cat("\n started at:  ", start)
## 
##  started at:   Fri May 11 12:49:06 2018
cat("\n ended at     ", end)
## 
##  ended at      Fri May 11 13:16:43 2018
#kable(dt_orders)
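
The start and end stamps printed above are character strings from `date()`, so the elapsed time has to be worked out by hand. A minimal sketch that parses the two printed times and subtracts them; in a live run `Sys.time()` would supply the values directly, and the UTC time zone is assumed here only to keep the subtraction unambiguous:

```r
# The two wall-clock stamps printed by the run above.
# tz = "UTC" is an assumption made only so the arithmetic is unambiguous.
t_start <- as.POSIXct("2018-05-11 12:49:06", tz = "UTC")
t_end   <- as.POSIXct("2018-05-11 13:16:43", tz = "UTC")

# difftime() returns the elapsed time in the requested units.
elapsed <- difftime(t_end, t_start, units = "mins")
print(round(elapsed, 2))
```

The run took a little under 28 minutes.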
