ritch

An R interface to the ITCH Protocol

https://github.com/davzim/ritch

Keywords

cpp itch r rdatatable

Repository

Basic Info

Host: GitHub
Owner: DavZim
License: other
Language: R
Default Branch: master
Homepage: https://davzim.github.io/RITCH/
Size: 10.8 MB

Statistics

Stars: 18
Watchers: 5
Forks: 6
Open Issues: 2
Releases: 0

Created about 8 years ago · Last pushed over 1 year ago

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
options(width = 120)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# RITCH - an R interface to the ITCH Protocol


[![CRAN status](https://www.r-pkg.org/badges/version/RITCH)](https://CRAN.R-project.org/package=RITCH) [![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/RITCH)](https://www.r-pkg.org/pkg/RITCH) [![R-CMD-check](https://github.com/DavZim/RITCH/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/DavZim/RITCH/actions/workflows/R-CMD-check.yaml)


The `RITCH` library provides an `R` interface to NASDAQ's ITCH protocol, which is used to distribute financial messages to participants.
Messages include orders, trades, market status, and much more financial information.
A full list of messages is shown later.
The main purpose of this package is to parse the binary ITCH files to a [`data.table`](https://CRAN.R-project.org/package=data.table) in `R`.

The package leverages [`Rcpp`](https://CRAN.R-project.org/package=Rcpp) and `C++` for efficient message parsing.

Note that the package provides a small simulated sample dataset in the `ITCH_50` format for testing and example purposes.
Helper functions are provided to list and download sample files from NASDAQ's official server.

## Install

To install `RITCH` you can use the following:

```R
# stable version:
install.packages("RITCH")

# development version:
# install.packages("remotes")
remotes::install_github("DavZim/RITCH")
```

## Quick Overview

The main functions of `RITCH` are read-related and are easily identified by their `read_` prefix.

Due to the inherent structural differences between message classes, each class has its own read function.
A list of message types and their respective classes is provided later in this Readme.

Example message classes used in this example are *orders* and *trades*.
First we define the file to load and count the messages, then we read in the orders and the first 100 trades:

```{r}
library(RITCH)
# use built in example dataset
file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")

# count the number of messages in the file
msg_count <- count_messages(file)
dim(msg_count)
names(msg_count)

# read the orders into a data.table
orders <- read_orders(file)
dim(orders)
names(orders)

# read the first 100 trades
trades <- read_trades(file, n_max = 100)
dim(trades)
names(trades)
```
Note that the file can be a plain `ITCH_50` file or a gzipped `ITCH_50.gz` file; a gzipped file is decompressed to the current directory before it is read.
You may also notice that the output reports a fairly low read speed in MB/s.
The figure is low because it includes the parsing time and because the fixed overhead of the setup code weighs heavily on small files; it increases on larger files.
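
To see why a fixed setup cost depresses the reported speed on small files, here is a back-of-the-envelope sketch in base R. The overhead and raw throughput values are made-up illustrative assumptions, not measurements of `RITCH`, and `effective_speed()` is a hypothetical helper that is not part of the package.

```r
# Hypothetical helper (not part of RITCH): effective speed in MB/s when every
# read pays a fixed setup cost before data is processed at a constant raw rate.
effective_speed <- function(size_mb, overhead_s, raw_mb_per_s) {
  size_mb / (overhead_s + size_mb / raw_mb_per_s)
}

# Illustrative assumptions only: 0.05 s setup cost, 500 MB/s raw rate
sizes_mb <- c(small = 0.5, medium = 50, large = 5000)
speeds <- effective_speed(sizes_mb, overhead_s = 0.05, raw_mb_per_s = 500)
round(speeds, 1)
```

Under these assumptions the effective speed rises with file size and approaches, but never reaches, the raw rate.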

If you want to know more about the functions of the package, read on.

## Main Functions

`RITCH` provides the following main functions:

- `read_itch(file, ...)` to read an ITCH file. Convenient wrappers for the different message classes, such as `orders` and `trades`, are also provided as `read_orders()`, `read_trades()`, etc.
- `filter_itch(infile, outfile, ...)` to filter an ITCH file and write directly to another file without loading the data into R
- `write_itch(data, file, ...)` to write a dataset to an ITCH file

Some helper functions are also provided, including:

- `download_sample_file(choice)` to download a sample file from the NASDAQ server and `list_sample_files()` to get a list of all available sample files
- `download_stock_directory(exchange, date)` to download the stock locate information for a given exchange and date
- `open_itch_sample_server()` to open the official NASDAQ server in your browser, which hosts among other things example data files
- `gzip_file(infile, outfile)` and `gunzip_file(infile, outfile)` for gzip functionality
- `open_itch_specification()` to open the official NASDAQ ITCH specification PDF in your browser

## Writing ITCH Files

`RITCH` also provides functionality for writing ITCH files.
Although the data could be stored in other file formats (for example a database or a [`qs`](https://CRAN.R-project.org/package=qs) file), ITCH files are quite optimized regarding size as well as write/read speeds.
Thus the `write_itch()` function allows you to write one or more message types to an `ITCH_50` file.
Note however, that only the standard columns are supported.
Additional columns will not be written to file!

Additional information can be saved in the filename.
By default the date, exchange, and fileformat information is added to the filename unless you specify `add_meta = FALSE`, in which case the given name is used.

As a last note: if you write your data to an ITCH file and want to filter for stocks later on, make sure to save the stock directory of that day/exchange, either externally or in the ITCH file directly (see example below).

### Simple Write Example

A simple write example is to read all modifications from an ITCH file and save them to a separate file, for example to save space or to reduce read times later on.

```{r}
file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")
md <- read_modifications(file, quiet = TRUE)
dim(md)
names(md)

outfile <- write_itch(md, "modifications", compress = TRUE)

# compare file sizes
files <- c(full_file = file, subset_file = outfile)
format_bytes(sapply(files, file.size))
```
```{r, include = FALSE}
unlink(outfile)
```


### Comprehensive Write Example

A typical workflow would look like this:

- read in some message classes from file and filter for certain stocks
- save the results for later analysis, also compress to save disk space

```{r}
## Read in the different message classes
file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")

# read in the different message types
data <- read_itch(file,
                  c("system_events", "stock_directory", "orders"),
                  filter_stock_locate = c(1, 3),
                  quiet = TRUE)

str(data, max.level = 1)


## Write the different message classes
outfile <- write_itch(data,
                      "alc_char_subset",
                      compress = TRUE)
outfile

# compare file sizes
format_bytes(
  sapply(c(full_file = file, subset_file = outfile),
         file.size)
)


## Lastly, compare the two datasets to see if they are identical
data2 <- read_itch(outfile, quiet = TRUE)
all.equal(data, data2)
```
```{r, include=FALSE}
# remove files from write_itch again...
unlink(outfile)
outfile_unz <- gsub("\\.gz$", "", outfile)
unlink(outfile_unz)
```

For comparison, the same data stored in the [`qs`](https://CRAN.R-project.org/package=qs) format results in `44788` bytes.


## ITCH Messages

There are a total of 22 different message types which are grouped into 13 classes by `RITCH`.

The messages and their respective classes are:
```{r, echo=FALSE}
d <- get_msg_classes()
d$msg_type <- paste0("", d$msg_type, "")
d$read_function <- paste0("", "read_", d$msg_class, "()", "")

data.table::setcolorder(d, c("msg_type", "msg_class", "read_function",
                             "msg_name", "doc_nr"))
data.table::setnames(d, c("Type", "RITCH Class",
                          "RITCH Read Function", "ITCH Name",
                          "ITCH Spec Section"))

knitr::kable(d, escape = FALSE)
```

Note that if you are interested in the exact definition of the messages and their components, you should look into the [official ITCH specification](https://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/NQTVITCHspecification.pdf), which can also be opened by calling `open_itch_specification()`.


## Data

The `RITCH` package provides a small, artificial dataset in the ITCH format for example and test purposes.
To learn more about the dataset check `?ex20101224.TEST_ITCH_50`.

To access the dataset use:
```{r}
file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")
count_messages(file, add_meta_data = TRUE, quiet = TRUE)
```
Note that the example dataset does not contain messages from all classes but is limited to 6 system messages, 3 stock directory, 3 stock trading action, 5000 trade, 5000 order, and 2000 order modification messages.
As the 3 stock directory messages show, the file contains data about 3 made-up stocks (see also the plot later in the Readme).

NASDAQ provides sample ITCH files on their official server (in R, use `open_itch_sample_server()` to open it), which can be used to test code on larger datasets.
Note that the sample files are up to 5 GB compressed and inflate to about 13 GB.
To interact with the sample files, use `list_sample_files()` and `download_sample_file()`.


## Notes on Memory and Speed

There are some tweaks available to deal with memory and speed issues.
For faster reading speeds, you can increase the buffer size of the `read_` functions to something around 1 GB or more (`buffer_size = 1e9`).

### Provide Message Counts

If you have to read from a single file multiple times, for example because you want to extract orders and trades, you can count the messages once beforehand and pass the result to the `n_max` argument of each read. This saves each read function an extra pass over the file to count the messages.
```{r}
# count messages once
n_msgs <- count_messages(file, quiet = TRUE)

# use counted messages multiple times, saving file passes
orders <- read_orders(file, quiet = TRUE, n_max = n_msgs)
trades <- read_trades(file, quiet = TRUE, n_max = n_msgs)
```

### Batch Read

If the dataset does not fit entirely into RAM, you can do a partial read specifying `skip` and `n_max`, similar to this:

```{r}
file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")

n_messages <- count_orders(count_messages(file, quiet = TRUE))
n_messages

# read 1000 messages at a time
n_batch <- 1000
n_parsed <- 0

while (n_parsed < n_messages) {
  cat(sprintf("Parsing Batch %04i - %04i", n_parsed, n_parsed + n_batch))
  # read in a batch
  df <- read_orders(file, quiet = TRUE, skip = n_parsed, n_max = n_batch)
  cat(sprintf(": with %04i orders\n", nrow(df)))
  # use the data
  # ...
  n_parsed <- n_parsed + n_batch
}
```
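
The loop above prints a nominal upper bound for each batch. If you need the exact `skip` and `n_max` values up front, for example to clip the last batch when the total is not a multiple of the batch size, they can be computed in base R. This is a minimal sketch; `batch_plan()` is a hypothetical helper and not part of `RITCH`.

```r
# Hypothetical helper (not part of RITCH): compute the skip/n_max pair of
# every batch, clipping the last batch to the remaining number of messages.
batch_plan <- function(n_total, n_batch) {
  stopifnot(n_total >= 0, n_batch > 0)
  if (n_total == 0) {
    return(data.frame(skip = numeric(0), n_max = numeric(0)))
  }
  skip <- seq(0, n_total - 1, by = n_batch)
  data.frame(skip = skip, n_max = pmin(n_batch, n_total - skip))
}

# 2500 messages in batches of 1000: the last batch holds the remainder
plan <- batch_plan(2500, 1000)
plan
```

Each row could then drive one call such as `read_orders(file, skip = plan$skip[i], n_max = plan$n_max[i])`.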

### Filter when Reading Data

You can also filter a dataset directly while reading messages by `msg_type`, `stock_locate`, `timestamp` range, and `stock`.
Note that filtering for a specific stock is just a shorthand lookup of the stock's `stock_locate` code. A `stock_directory` therefore needs to be supplied (the output of either `read_stock_directory()` or `download_stock_directory()`); otherwise the function will try to extract the stock directory from the file, which might take some time depending on the size of the file.

```{r}
# read in the stock directory as we filter for stock names later on
sdir <- read_stock_directory(file, quiet = TRUE)

od <- read_orders(
  file,
  filter_msg_type = "A",          # take only 'No MPID add orders'
  min_timestamp = 43200000000000, # start at 12:00:00.000000
  max_timestamp = 55800000000000, # end at 15:30:00.000000
  filter_stock_locate = 1,        # take only stock with code 1
  filter_stock = "CHAR",          # but also take stock CHAR
  stock_directory = sdir          # provide the stock_directory to match stock names to stock_locates
)

# count the different message types
od[, .(n = .N), by = msg_type]
# see if the timestamp is in the specified range
range(od$timestamp)
# count the stock/stock-locate codes
od[, .(n = .N), by = .(stock_locate, stock)]
```
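
The timestamp bounds in the example above are nanoseconds since midnight. Rather than typing the long literals by hand, you could derive them from a clock time. The sketch below uses base R only; `time_to_ns()` is a hypothetical helper and not part of `RITCH`.

```r
# Hypothetical helper (not part of RITCH): convert a clock time "HH:MM:SS"
# into nanoseconds since midnight.
time_to_ns <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  sum(parts * c(3600, 60, 1)) * 1e9
}

min_ts <- time_to_ns("12:00:00")
max_ts <- time_to_ns("15:30:00")

# print without scientific notation to compare with the literals above
format(c(min_ts, max_ts), scientific = FALSE)
```

The results match the `min_timestamp` and `max_timestamp` values used above and could be passed to the read functions in their place.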

### Filter Data to File

On larger files, reading the data into memory might not be the best idea, especially if only a small subset is actually needed.
In this case, the `filter_itch` function will come in handy.

The basic design is identical to the `read_itch` function but instead of reading the messages into memory, they are immediately written to a file.

Taking the filter example from above, we can do the following:

```{r}
# the function returns the final name of the output file
outfile <- filter_itch(
  infile = file,
  outfile = "filtered",
  filter_msg_type = "A",          # take only 'No MPID add orders'
  min_timestamp = 43200000000000, # start at 12:00:00.000000
  max_timestamp = 55800000000000, # end at 15:30:00.000000
  filter_stock_locate = 1,        # take only stock with code 1
  filter_stock = "CHAR",          # but also take stock CHAR
  stock_directory = sdir          # provide the stock_directory to match stock names to stock_locates
)

format_bytes(file.size(outfile))

# read in the orders from the filtered file
od2 <- read_orders(outfile)

# check that the filtered dataset contains the same information as in the example above
all.equal(od, od2)
```
```{r, include=FALSE}
# remove files from filter_itch again...
unlink(outfile)
```


## Create a Plot with Trades and Orders of the largest ETFs

As a last step, here is a quick visualization of the example dataset:

```{r ETF_plot}
library(ggplot2)

file <- system.file("extdata", "ex20101224.TEST_ITCH_50", package = "RITCH")

# load the data
orders <- read_orders(file, quiet = TRUE)
trades <- read_trades(file, quiet = TRUE)

# replace the buy-factor with something more useful
orders[, buy := ifelse(buy, "Bid", "Ask")]

ggplot() +
  geom_point(data = orders,
             aes(x = as.POSIXct(datetime), y = price, color = buy), alpha = 0.2) +
  geom_step(data = trades, aes(x = as.POSIXct(datetime), y = price), linewidth = 0.2) +
  facet_grid(stock~., scales = "free_y") +
  theme_light() +
  labs(title = "Orders and Trades of Three Simulated Stocks",
       subtitle = "Date: 2010-12-24 | Exchange: TEST",
       caption = "Source: RITCH package", x = "Time", y = "Price", color = "Side") +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(palette = "Set1")
```


## Other Notes

If you find this package useful or have any other kind of feedback, I'd be happy if you let me know. Otherwise, if you need more functionality, please feel free to create an issue or a pull request.

Citation and CRAN release are WIP.

If you are interested in gaining a better understanding of the internal data structures and of converting data to and from binary, have a look at the `debug` folder and its contents (only available on [RITCH's GitHub page](https://github.com/DavZim/RITCH/)).
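
As a rough illustration of what converting to and from binary involves, the base R sketch below frames and parses two toy messages. It assumes that each message in a file is preceded by a two-byte big-endian length and starts with a one-character message type; check the official specification for the actual layout. The payloads are made up, and `frame_message()` and `read_frames()` are hypothetical helpers that are not part of `RITCH`.

```r
# Hypothetical helpers, not part of RITCH.
# Frame a message: 2-byte big-endian length, 1-byte type, then the payload.
frame_message <- function(type, payload = raw(0)) {
  body <- c(charToRaw(type), payload)
  c(writeBin(length(body), raw(), size = 2, endian = "big"), body)
}

# Walk over a raw vector and return the type of every framed message.
read_frames <- function(bytes) {
  types <- character(0)
  pos <- 1
  while (pos + 2 <= length(bytes)) {
    # the length prefix covers the type byte and the payload
    len <- readBin(bytes[pos:(pos + 1)], "integer", size = 2,
                   signed = FALSE, endian = "big")
    types <- c(types, rawToChar(bytes[pos + 2]))
    pos <- pos + 2 + len
  }
  types
}

# two toy messages with made-up payloads of 3 and 10 bytes
bytes <- c(frame_message("S", as.raw(1:3)), frame_message("A", as.raw(1:10)))
read_frames(bytes)
```

Skipping ahead by the length prefix lets a parser find every message type without decoding the payloads.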

Owner

Login: DavZim
Kind: user

Website: https://davzim.github.io/
Repositories: 34
Profile: https://github.com/DavZim

GitHub Events

Total

Watch event: 1
Fork event: 1

Last Year

Watch event: 1
Fork event: 1

Issues and Pull Requests

All Time

Total issues: 45
Total pull requests: 4
Average time to close issues: 3 months
Average time to close pull requests: 1 minute
Total issue authors: 15
Total pull request authors: 2
Average comments per issue: 3.49
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: about 1 hour
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 2.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

DavZim (8)
DanielRonnstam (4)
macdeutsche (3)
eddelbuettel (2)
YufDar (1)
ChrisPetersen1098 (1)
waynep610 (1)
stefan-jansen (1)
annapooranil (1)
vovalev (1)
puneetkumar1997 (1)
ssh352 (1)
yhdanid (1)
herbertzou (1)
volare2016 (1)

Pull Request Authors

eddelbuettel (1)
DavZim (1)

Top Labels

Issue Labels

enhancement (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 261 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

cran.r-project.org: RITCH

R Parser for the ITCH-Protocol

Homepage: https://davzim.github.io/RITCH/
Documentation: http://cran.r-project.org/web/packages/RITCH/RITCH.pdf
License: MIT + file LICENSE
Status: removed
Latest release: 0.1.26
published about 2 years ago

Rankings

Forks count: 11.2%

Stargazers count: 13.2%

Dependent packages count: 29.2%

Dependent repos count: 34.9%

Average: 35.6%

Downloads: 89.6%

Maintainers (1)

david_j_zimmermann@hotmail.com