PINstimation
A comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features include posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and PIN decomposition into layers and into bad/good components. Data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities for the sophisticated, error-prone, and time-consuming estimation of informed trading, using only raw trade-level data.
Basic Info
- Host: GitHub
- Owner: monty-se
- License: gpl-3.0
- Language: R
- Default Branch: master
- Homepage: https://pinstimation.com/
- Size: 5.27 MB
Statistics
- Stars: 40
- Watchers: 2
- Forks: 7
- Open Issues: 1
- Releases: 4
Metadata Files
README.md
PINstimation: Estimating Models of Probability of Informed Trading 
PINstimation provides utilities for the estimation of probability of informed trading models: original PIN (PIN) in Easley and O'Hara (1992) and Easley et al. (1996); multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Various computation methods suggested in the literature are included. Data simulation tools and trade classification algorithms are among the supplementary utilities. The package enables fast and precise solutions for the sophisticated, error-prone, and time-consuming estimation of probability of informed trading measures. It is compact in the sense that detailed estimation results can be obtained using only raw trade-level data.
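For orientation, the PIN measure of the original model (Easley et al., 1996) is the expected share of informed orders in total order flow: PIN = (alpha * mu) / (alpha * mu + eb + es). The base R sketch below only evaluates this formula. The function name pin_from_params and the parameter values are hypothetical illustrations and are not part of the package.

```r
# Minimal sketch (not part of PINstimation): PIN from model parameters.
# alpha : probability that an information event occurs on a given day
# mu    : arrival rate of informed traders
# eb, es: arrival rates of uninformed buyers and sellers
pin_from_params <- function(alpha, mu, eb, es) {
  stopifnot(alpha >= 0, alpha <= 1, mu >= 0, eb >= 0, es >= 0)
  (alpha * mu) / (alpha * mu + eb + es)
}

# Illustrative (made-up) parameter values
pin_value <- pin_from_params(alpha = 0.4, mu = 500, eb = 300, es = 300)
print(pin_value)
```

In practice the parameters are unknown and must be estimated from daily buy and sell counts, which is what the estimation functions of the package are for.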
New features in Version 0.1.3
- We have updated the initials_adjpin() function, which generates initial parameter sets for the adjusted PIN model, to align with the algorithm outlined in Ersan and Ghachem (2024).
- The function adjpin() now includes the time spent on generating initial parameter sets in the total time displayed in the output.
- The function ivpin() has been introduced, which implements an improved version of the Volume-Synchronized Probability of Informed Trading (VPIN), based on Lin and Ke (2017). The function uses maximum likelihood estimation to provide more stable VPIN estimates, particularly for small volume buckets or infrequent informed trades, and improves the predictability of flow toxicity.
New features in Version 0.1.2
- We introduce a new function called classify_trades() that enables users to classify high-frequency (HF) trades individually, without aggregating them. For each HF trade, the function assigns a variable isBuy that is set to TRUE if the trade is buyer-initiated, or FALSE if it is seller-initiated.
- The aggregate_trades() function enables users to aggregate high-frequency (HF) trades at different frequencies. In the previous version, HF trades were automatically aggregated into daily trade data. However, with the updated version, users can now specify the desired frequency, such as every 15 minutes.
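To make the two steps concrete, here is a minimal base R sketch of classifying individual trades and then aggregating them into 15-minute intervals. It uses the tick rule on made-up data and is not the package's implementation; the data frame, the helper tick_rule, and the column names other than isBuy are assumptions for illustration.

```r
# Minimal base R sketch; NOT the package's implementation.
# Hypothetical toy trades: timestamp and price only.
trades <- data.frame(
  timestamp = as.POSIXct("2024-01-02 09:30:00", tz = "UTC") +
    c(0, 120, 400, 950, 1000, 1700),
  price = c(10.00, 10.01, 10.01, 10.00, 10.02, 10.02)
)

# Tick rule: a trade is buyer-initiated if the price rose relative to the
# previous trade, seller-initiated if it fell; a zero tick inherits the
# direction of the previous trade. The first trade stays unclassified (NA).
tick_rule <- function(price) {
  direction <- sign(c(NA, diff(price)))
  for (i in seq_along(direction)) {
    if (i > 1 && !is.na(direction[i]) && direction[i] == 0) {
      direction[i] <- direction[i - 1]
    }
  }
  direction > 0  # TRUE = buy, FALSE = sell, NA = unclassified
}
trades$isBuy <- tick_rule(trades$price)

# Assign each trade to the start of its 15-minute interval (900 seconds).
bin_seconds <- 15 * 60
trades$interval <- as.POSIXct(
  (as.numeric(trades$timestamp) %/% bin_seconds) * bin_seconds,
  origin = "1970-01-01", tz = "UTC"
)

# Count buys and sells per interval, dropping unclassified trades.
classified <- trades[!is.na(trades$isBuy), ]
classified$buys  <- as.integer(classified$isBuy)
classified$sells <- as.integer(!classified$isBuy)
aggregated <- aggregate(cbind(buys, sells) ~ interval,
                        data = classified, FUN = sum)
print(aggregated)
```

The resulting buys and sells counts per interval have the same shape as the aggregated input that the estimation functions work with.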
Table of contents
- Main functionalities
- Installation
- Examples
- Resources
- Note to frequent users
- Contributions
- Alternative packages
- Getting help
Main functionalities
The functionalities that the package offers are summarized below:
PIN model
- estimate the PIN model using the functions pin(), pin_yz(), pin_gwj(), and pin_ea().
- compute initial parameter sets using the functions initials_pin_yz(), initials_pin_gwj(), and initials_pin_ea().
- generate simulation data following the PIN model using generatedata_mpin(layers=1).
- evaluate factorizations of the PIN likelihood functions using fact_pin_eho(), fact_pin_lk(), fact_pin_e().
- estimate the PIN model by the Bayesian approach (Gibbs Sampler) using pin_bayes() (*).
MPIN model
- estimate the MPIN model using the functions mpin_ml() and mpin_ecm().
- compute initial parameter sets using initials_mpin().
- detect the number of layers in data using detectlayers_e(), detectlayers_eg(), and detectlayers_ecm().
- generate simulation data following the MPIN model using generatedata_mpin().
- evaluate the factorization of the MPIN likelihood function through fact_mpin().
AdjPIN model
- estimate the AdjPIN model using the function adjpin().
- compute initial parameter sets using functions initials_adjpin(), initials_adjpin_cl(), and initials_adjpin_rnd().
- generate simulation data following the AdjPIN model using generatedata_adjpin().
- evaluate the factorization of the AdjPIN likelihood function through fact_adjpin().
VPIN
- estimate the VPIN model using the function vpin().
Data classification
- classify high-frequency data through the tick, quote, LR and EMO algorithms using the function aggregate_trades().
Installation
The easiest way to get PINstimation is the following:
```r
install.packages("PINstimation")
```
To get a bugfix or to use a feature from the development version, you can install the development version of PINstimation from GitHub.
```r
install.packages("devtools")
library(devtools)
devtools::install_github("monty-se/PINstimation", build_vignettes = TRUE)
```
Loading the package
```r
library(PINstimation)
```
Examples
Example 1: Estimate the PIN model
We estimate the PIN model on the preloaded dataset dailytrades using the initial parameter sets of Ersan & Alici (2016).
```r
estimate <- pin_ea(dailytrades)
```
```r
[+] PIN Estimation started
|[1] Likelihood function factorization: Ersan (2016)
|[2] Loading initial parameter sets : 5 EA initial set(s) loaded
|[3] Estimating PIN model (1996) : Using Maximum Likelihood Estimation
|+++++++++++++++++++++++++++++++++++++| 100% of PIN estimation completed
[+] PIN Estimation completed
```
Example 2: Estimate the Multilayer PIN model
We run the estimation of the MPIN model on the preloaded dataset dailytrades using:
- the maximum-likelihood method.
```r
ml_estimate <- mpin_ml(dailytrades)
```
```r
[+] MPIN estimation started
|[1] Detecting layers from data : using Ersan and Ghachem (2022a)
|[=] Number of layers in the data : 3 information layer(s) detected
|[2] Computing initial parameter sets : using algorithm of Ersan (2016)
|[3] Estimating the MPIN model : Maximum-likelihood standard estimation
|+++++++++++++++++++++++++++++++++++++| 100% of mpin estimation completed
[+] MPIN estimation completed
```
- the ECM algorithm.
```r
ecm_estimate <- mpin_ecm(dailytrades)
```
```r
[+] MPIN estimation started
|[1] Computing the range of layers : information layers from 1 to 8
|[2] Computing initial parameter sets : using algorithm of Ersan (2016)
|[=] Selecting initial parameter sets : max 100 initial sets per estimation
|[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
|+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [8 layer(s)]
|[3] Selecting the optimal model : using lowest Information Criterion (BIC)
[+] MPIN estimation completed
```
Compare the aggregate parameters obtained from the ML and ECM estimations.
```r
mpin_comparison <- rbind(ml_estimate@aggregates, ecm_estimate@aggregates)
rownames(mpin_comparison) <- c("ML", "ECM")
cat("Probabilities of ML, and ECM estimations of the MPIN model\n")
print(mpin_comparison)
```
Display the summary of the model estimates for all numbers of layers.
```r
summary <- getSummary(ecm_estimate)
show(summary)
```
```r
layers em.layers MPIN Likelihood AIC BIC AWE
Model[1] 1 1 0.566 -3226.469 6462.9 6473.4 6508.9
Model[2] 2 2 0.577 -800.379 1616.8 1633.5 1690.3
Model[3] 3 3 0.574 -643.458 1308.9 1332.0 1410.0
Model[4] 4 3 0.574 -643.458 1308.9 1332.0 1410.0
Model[5] 5 3 0.574 -643.458 1308.9 1332.0 1410.0
Model[6] 6 3 0.574 -643.458 1308.9 1332.0 1410.0
Model[7] 7 4 0.575 -642.631 1313.3 1342.6 1441.9
Model[8] 8 4 0.575 -642.631 1313.3 1342.6 1441.9
```
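The selection step in the log above picks the model with the lowest BIC. The sketch below recomputes AIC and BIC from the log-likelihoods in the summary table. It rests on two assumptions that are not stated in this README: a model with J estimated layers (column em.layers) has k = 3 * J + 2 free parameters, and the sample contains n = 60 trading days. Both appear consistent with the values printed in the table.

```r
# Minimal sketch: recompute AIC/BIC from the log-likelihoods shown above.
# Assumptions (not stated in the README): k = 3 * J + 2 free parameters for
# J estimated layers, and n = 60 trading days in the sample.
loglik    <- c(-3226.469, -800.379, -643.458, -643.458,
               -643.458, -643.458, -642.631, -642.631)
em_layers <- c(1, 2, 3, 3, 3, 3, 4, 4)
n_days    <- 60

k   <- 3 * em_layers + 2
aic <- 2 * k - 2 * loglik
bic <- k * log(n_days) - 2 * loglik

# which.min() returns the first model that attains the lowest BIC.
best <- which.min(bic)
print(round(cbind(em_layers, aic, bic), 1))
print(best)
```

Under these assumptions the fourth layer improves the likelihood only slightly, so the penalty for three extra parameters outweighs the gain and the three-layer model is retained.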
Example 3: Estimate the Adjusted PIN model
We estimate the adjusted PIN model on the preloaded dataset dailytrades using 20 initial parameter sets computed by the algorithm of Ersan and Ghachem (2022b).
```r
estimate_adjpin <- adjpin(dailytrades, initialsets = "GE")
show(estimate_adjpin)
```
```r
[+] AdjPIN estimation started
|[1] Computing initial parameter sets : 20 GE initial sets generated
|[2] Estimating the AdjPIN model : Maximum-likelihood Standard Estimation
|+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
[+] AdjPIN estimation completed
```
Example 4: Estimate the Volume-synchronized PIN model
We run a VPIN estimation on the preloaded dataset hfdata with a timebarsize of 5 minutes (300 seconds).
```r
estimate.vpin <- vpin(hfdata, timebarsize = 300)
show(estimate.vpin)
```
```r
----------------------------------
VPIN estimation completed successfully.
----------------------------------
Type object@vpin to access the VPIN vector.
Type object@bucketdata to access data used to construct the VPIN vector.
Type object@dailyvpin to access the daily VPIN vectors.
[+] VPIN descriptive statistics
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA's |
|:-----|:-----:|:-------:|:------:|:-----:|:-------:|:-----:|:----:|
|value | 0.101 | 0.185 | 0.238 | 0.244 | 0.29 | 0.636 | 49 |
[+] VPIN parameters
| tbSize | buckets | samplength | VBS | #days |
|:------:|:-------:|:----------:|:--------:|:-----:|
| 300 | 50 | 50 | 36321.25 | 77 |
-------
Running time: 3.753 seconds
```
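As a rough illustration of what the VPIN vector contains, the sketch below applies the formula of Easley et al. (2012): the sum of absolute buy/sell volume imbalances over a rolling window of buckets, divided by the total volume in that window. It is not the package's implementation; the bucket volumes are made up, and the variable names vbs and samplength only mirror the labels in the output above.

```r
# Minimal sketch of the VPIN formula; NOT the package's implementation.
# Hypothetical bucket-level buy volumes; every bucket has total volume vbs.
vbs        <- 1000
buys       <- c(600, 450, 800, 500, 300, 700)
sells      <- vbs - buys
samplength <- 3  # number of buckets in the rolling window

# Absolute order imbalance per bucket
imbalance <- abs(buys - sells)

# Rolling VPIN: undefined (NA) until a full window of buckets is available.
vpin <- rep(NA_real_, length(buys))
for (i in seq_along(buys)) {
  if (i >= samplength) {
    window  <- (i - samplength + 1):i
    vpin[i] <- sum(imbalance[window]) / (samplength * vbs)
  }
}
print(vpin)
```

A rolling window of this kind leaves the first samplength - 1 entries undefined, which would be consistent with the 49 NA's reported above for a sample length of 50.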
Example 5: Estimate the AdjPIN model using aggregated high-frequency data
We use the preloaded high-frequency dataset hfdata and prepare it for aggregation.
```r
data <- hfdata
data$volume <- NULL
```
We classify data using the LR algorithm with a time lag of 500 milliseconds (0.5 s), using the function aggregate_trades().
```r
daytrades <- aggregate_trades(data, algorithm = "LR", timelag = 500)
```
```r
[+] Trade classification started
|[=] Classification algorithm : LR algorithm
|[=] Number of trades in dataset : 100 000 trades
|[=] Time lag of lagged variables : 500 milliseconds
|[1] Computing lagged variables : using parallel processing
|+++++++++++++++++++++++++++++++++++++| 100% of variables computed
|[=] Computed lagged variables : in 7.68 seconds
|[2] Computing aggregated trades : using lagged variables
[+] Trade classification completed
```
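The LR (Lee-Ready) logic used above can be sketched in a few lines of base R: a trade above the quote midpoint is treated as a buy, a trade below it as a sell, and a trade exactly at the midpoint falls back on the tick rule. This is a simplified illustration on made-up data and not the package's implementation; in particular, the quotes are assumed to be already lagged, so the lagging step is omitted.

```r
# Minimal sketch of the LR (Lee-Ready) logic; NOT the package's implementation.
# Hypothetical trades; bid and ask are assumed to be already lagged.
trades <- data.frame(
  price = c(10.50, 10.00, 10.25, 10.25, 9.75),
  bid   = c(10.00, 10.00, 10.00, 10.00, 9.50),
  ask   = c(10.50, 10.50, 10.50, 10.50, 10.00)
)
mid <- (trades$bid + trades$ask) / 2

# Tick direction: sign of the price change, with zero ticks inheriting the
# direction of the previous trade (NA for the first trade).
tick <- sign(c(NA, diff(trades$price)))
for (i in seq_along(tick)) {
  if (i > 1 && !is.na(tick[i]) && tick[i] == 0) tick[i] <- tick[i - 1]
}

# Quote rule first; fall back on the tick rule for trades at the midpoint.
direction <- sign(trades$price - mid)
at_mid <- direction == 0
direction[at_mid] <- tick[at_mid]

trades$isBuy <- direction > 0  # TRUE = buyer-initiated
print(trades)
```

The prices are chosen so that midpoints are exactly representable in floating point; with real data, comparing prices to midpoints may need a tolerance.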
We use the obtained dataset to estimate the (adjusted) probability of informed trading via the standard maximum-likelihood method.
```r
adjpin_ml <- adjpin(daytrades, method = "ML", initialsets = "GE")
```
```r
[+] AdjPIN estimation started
|[1] Computing initial parameter sets : 20 GE initial sets generated
|[2] Estimating the AdjPIN model : Maximum-likelihood Standard Estimation
|+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
[+] AdjPIN estimation completed
```
Note to frequent users
If you are a frequent user of PINstimation, you might want to avoid repeatedly
loading the package whenever you open a new R session. You can do
that by adding PINstimation to your .Rprofile, either manually or using the function
load_pinstimation_for_good().
To automatically load PINstimation, run load_pinstimation_for_good(),
and the following code will be added to your .Rprofile.
```r
if (interactive()) suppressMessages(require(PINstimation))
```
After you restart R, PINstimation will be loaded automatically whenever a new R
session is started. To remove the automatic loading of PINstimation, open the
.Rprofile for editing with usethis::edit_r_profile(), find the code above, and delete it.
Resources
For a smooth introduction to, and useful tips on the main functionalities of the package, please refer to:
- The sections Get Started, and Online documentation on the package site.
- The package documentation in PDF format is available for download here.
- An overview of the scientific research underlying the package is available here.
Contributions
The package makes a series of original contributions to the literature:
- An efficient, user-friendly, and comprehensive implementation of the standard models of probability of informed trading.
- A first implementation of the estimation of the multilayer probability of informed trading (MPIN) as developed by Ersan (2016).
- A comprehensive treatment of the estimation of the adjusted probability of informed trading as introduced by Duarte and Young (2009). This includes the implementation of the factorization of the AdjPIN likelihood function, various algorithms to generate initial parameter sets, and the MLE method.
- The introduction of the expectation-conditional maximization (ECM) algorithm as an alternative method to estimate the models of probability of informed trading. The contribution is both theoretical and computational. The theoretical contribution is included in the paper by Ghachem and Ersan (2022b). The implementation of the ECM algorithm allows the estimation of PIN, MPIN, as well as the adjusted PIN model.
- Implementation of three layer-detection algorithms, namely the preexisting algorithm of Ersan (2016), as well as two newly developed algorithms, described in Ersan and Ghachem (2022a) and Ghachem and Ersan (2022b), respectively.
- A first implementation of the estimation of the volume-synchronized probability of informed trading (VPIN) as introduced by Easley et al. (2011, 2012).
- One do-it-all function for trade classification into buyer-initiated or seller-initiated trades that implements the standard algorithms in the field, namely Tick, Quote, LR, and EMO.
Alternative packages
To our knowledge, there are three preexisting R packages for the estimation of models of the probability of informed trading: pinbasic, InfoTrad, and FinAsym.
Getting help
If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.
Owner
- Name: Montasser Ghachem
- Login: monty-se
- Kind: user
- Location: Sweden
- Website: www.pinstimation.com
- Repositories: 1
- Profile: https://github.com/monty-se
Researcher in Economics, and Finance, enthusiast for statistics, programming, and writing. Based in Sweden.
GitHub Events
Total
- Watch event: 9
- Push event: 20
- Fork event: 1
Last Year
- Watch event: 9
- Push event: 20
- Fork event: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 4
- Total pull requests: 3
- Average time to close issues: 3 months
- Average time to close pull requests: about 2 hours
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 2.25
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HenrikBengtsson (3)
- RezaTalebloo (1)
Pull Request Authors
- monty-se (2)
- alecriste (1)
Packages
- Total packages: 1
- Total downloads: cran 261 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 3
- Total maintainers: 1
cran.r-project.org: PINstimation
Estimation of the Probability of Informed Trading
- Homepage: https://www.pinstimation.com
- Documentation: http://cran.r-project.org/web/packages/PINstimation/PINstimation.pdf
- License: GPL (≥ 3)
- Latest release: 0.1.2, published almost 3 years ago
Dependencies
- R >= 2.10.0 depends
- Rdpack * imports
- coda * imports
- dplyr * imports
- furrr * imports
- future * imports
- knitr * imports
- methods * imports
- nloptr * imports
- rmarkdown * imports
- skellam * imports
- actions/checkout v2 composite
- r-lib/actions/check-r-package v1 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite
- actions/checkout v3 composite
- github/super-linter v4 composite