https://github.com/broadinstitute/poa-bench

Benchmark suite for partial order aligners written in Rust

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (16.2%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: broadinstitute
Language: Rust
Default Branch: main
Size: 145 MB

Statistics

Stars: 5
Watchers: 4
Forks: 0
Open Issues: 2
Releases: 1

Created about 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme

POA-bench

Benchmarking various partial order aligners

POA-bench is a tool to benchmark partial order aligners. We constructed multiple datasets from bacterial housekeeping genes, the human HLA locus, and others, with varying graph sizes and sequence diversity. POA-bench assesses the runtime, throughput, and memory usage of multiple partial order alignment tools. It uses thin wrappers around each tool's internal align function and thus benchmarks only the time spent aligning sequences, excluding I/O and graph modification.
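To illustrate what "benchmarking only the time spent aligning" means, here is a minimal Python sketch. It is not POA-bench's actual implementation (which is written in Rust). The names `time_alignments` and `throughput` and the `align` callable are hypothetical and stand in for a thin wrapper around a tool's align function.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_alignments(
    align: Callable[[str], object],
    sequences: Iterable[str],
) -> Tuple[List[object], float]:
    """Run `align` on every sequence; return (results, seconds spent aligning)."""
    # Materialize the input first so that reading/decoding is never timed.
    seqs = list(sequences)
    results = []
    elapsed = 0.0
    for seq in seqs:
        start = time.perf_counter()
        result = align(seq)  # only this call sits inside the timed region
        elapsed += time.perf_counter() - start
        results.append(result)  # bookkeeping happens outside the timer
    return results, elapsed


def throughput(n_sequences: int, seconds: float) -> float:
    """Sequences aligned per second; 0.0 if no time was measured."""
    return n_sequences / seconds if seconds > 0 else 0.0
```

Accumulating the time per call, rather than timing the whole loop, keeps result handling out of the measurement.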

Installation

POA-bench only works on Linux.

Pre-built binaries

TODO

Conda

TODO

Building poa-bench from source

Rust compiler

POA-bench is written in Rust, and to build and install it, you'll need a recent version of the Rust compiler. The minimum supported Rust version is 1.70.

Download and install rustup: https://rustup.rs/
Run rustup update

Building poa-bench

Clone the repository.

```bash
git clone https://github.com/broadinstitute/poa-bench
```

Clone poasta, a dependency of poa-bench that is not yet publicly available. It should reside next to poa-bench, in the same parent folder.

```bash
git clone https://github.com/broadinstitute/poasta
```

Move into the directory.

```bash
cd poa-bench
```

Build using cargo. The RUSTFLAGS setting below lets the compiler use all features of your machine's CPU. If you need a binary that also runs on other machines, omit the RUSTFLAGS="..." part.

```bash
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

The built poa-bench executable will be available in the directory target/release/.

Installing the Python helper tools

poa-bench relies on a couple of helper scripts written in Python to create and manage datasets. These are located in the python/ folder. To install them, move to the python/ directory and run

```bash
pip install .
```

I'd recommend creating a new conda environment first, with Python 3.10 or newer.

Additional dependencies

The following additional tools should be available in your $PATH:

samtools
spoa
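A quick way to confirm both tools are reachable before running anything is a small check like the one below. This is only a convenience sketch and not part of poa-bench. The function name `missing_tools` is hypothetical.

```python
import shutil
from typing import Iterable, List

# Tools the README lists as required on $PATH.
REQUIRED_TOOLS = ("samtools", "spoa")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from $PATH: " + ", ".join(missing))
    else:
        print("All additional dependencies found.")
```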

Datasets

Datasets are defined by TOML configuration files named meta.toml. See the data/ directory for a number of examples. Each dataset comprises a set of sequences used to construct a graph, and a set of sequences used to benchmark alignment to the constructed graph. Each FASTA file must be compressed with gzip. To define which sequences to use, specify them as follows in the TOML file:

```toml
[graphset]
fname = "filewithgraphseq.fa.gz"

[alignset]
fname = "filewithalnseq.fa.gz"
```

The above fields are the only mandatory fields in the TOML file, but you can add other metadata if you want.

Running benchmarks

To run benchmarks, invoke poa-bench bench. This runs the benchmarks for all supported algorithms on all datasets. To find datasets, poa-bench recursively searches the directory data/ in the current working directory for dataset configuration files. Each combination of a dataset and an algorithm is a single job, and poa-bench spawns a new worker process to perform the benchmark for that job. To reduce variance in benchmarks, each worker process is pinned to a specific core. poa-bench can run multiple jobs in parallel with the -j option. All output files and results are written to the directory output/. The algorithms to run and the input/output directories are configurable on the command line.
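The job model described above (recursive dataset discovery, one job per dataset and algorithm pair, one pinned core per worker) can be sketched in Python as follows. This is a conceptual illustration only and not poa-bench's Rust implementation. All function names are hypothetical, and `os.sched_setaffinity` is available on Linux only.

```python
import itertools
import os
from pathlib import Path
from typing import Iterable, List, Tuple


def find_datasets(datasets_dir: Path) -> List[Path]:
    """Recursively collect every meta.toml below `datasets_dir`."""
    return sorted(datasets_dir.rglob("meta.toml"))


def enumerate_jobs(
    datasets_dir: Path, algorithms: Iterable[str]
) -> List[Tuple[Path, str]]:
    """One job per (dataset config, algorithm) combination."""
    return list(itertools.product(find_datasets(datasets_dir), algorithms))


def pin_to_core(core: int) -> bool:
    """Pin the calling process to one core; returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or Windows
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return True
```

With two datasets and the algorithms poasta and spoa, `enumerate_jobs` yields four jobs. A worker would call `pin_to_core` once at startup, before it begins timing.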

```
Usage: poa-bench bench [OPTIONS]

Options:
  -a, --algorithms [...]
          Specify which algorithms to run, options include poasta and spoa. To specify multiple, separate them by spaces [possible values: poasta, spoa]
  -j, --parallel
          Number of parallel threads to start. This number will include the main orchestrator process, so the number of actual worker threads will be one less than the number specified [default: 2]
  -d, --datasets-dir
          [default: data/]
  -o, --output-dir
          [default: output/]
  -h, --help
          Print help
  -V, --version
          Print version
```
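When driving poa-bench from a script, the options above can be assembled into an argument list. The sketch below only builds the command, using the flags shown in the help text. The helper names `bench_command` and `worker_count` are hypothetical, and it assumes that -j, -d, and -o each take a single value, as their defaults suggest.

```python
from typing import List, Sequence


def bench_command(
    algorithms: Sequence[str],
    parallel: int = 2,
    datasets_dir: str = "data/",
    output_dir: str = "output/",
) -> List[str]:
    """Build an argument list for `poa-bench bench`."""
    return [
        "poa-bench", "bench",
        "-a", *algorithms,  # multiple algorithms are separated by spaces
        "-j", str(parallel),
        "-d", datasets_dir,
        "-o", output_dir,
    ]


def worker_count(parallel: int) -> int:
    """-j includes the orchestrator, so workers are one fewer."""
    return max(parallel - 1, 0)
```

For example, `bench_command(["poasta", "spoa"], parallel=4)` describes a run with three workers. The list can be passed to `subprocess.run` once poa-bench is on your $PATH.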

Acknowledgements

This program is inspired by (and shares some code with) the excellent pa-bench repository.

Related repositories

pa-bench - Benchmark suite for pairwise aligners
poasta - POASTA aligner
spoa - SIMD partial order aligner
spoa-rs - Rust bindings to SPOA
abPOA - adaptive band partial order aligner
abpoa-rs - Rust bindings to abPOA

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Watch event: 1
Create event: 1

Last Year

Watch event: 1
Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 8
Total pull requests: 0
Average time to close issues: 4 months
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.63
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

lrvdijk (8)

Pull Request Authors

Top Labels

Issue Labels

enhancement (7)

Pull Request Labels

Dependencies

Cargo.lock cargo

131 dependencies

Cargo.toml cargo

python/pyproject.toml pypi

numpy *
pandas *
pysam *
scikit-bio *
scipy *
tqdm *
