https://github.com/broadinstitute/poa-bench
Benchmark suite for partial order aligners written in Rust
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: broadinstitute
- Language: Rust
- Default Branch: main
- Size: 145 MB
Statistics
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 2
- Releases: 1
Metadata Files
README.md
POA-bench
Benchmarking various partial order aligners
POA-bench is a tool to benchmark partial order aligners. We constructed multiple datasets from bacterial housekeeping
genes, the human HLA locus, and others, with varying graph sizes and sequence diversity. POA-bench measures the
runtime, throughput, and memory usage of multiple partial order alignment tools. It uses thin wrappers around each
tool's internal align function, and thus benchmarks only the time spent aligning sequences, not time spent on I/O or
graph modification.
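The wrapper approach described above can be sketched as follows. This is a minimal illustration, not POA-bench's actual code. The `AlignerWrapper` trait, `DummyAligner`, and `bench_align` are hypothetical names, and the toy aligner only stands in for a real tool. The key idea is that the input sequences are loaded before timing starts and the timer wraps only the align call, so I/O and graph construction fall outside the measured region.

```rust
use std::time::{Duration, Instant};

/// Hypothetical interface: a thin wrapper exposing only a tool's align step.
trait AlignerWrapper {
    /// Align one sequence to the (already constructed) graph and return a score.
    fn align(&mut self, seq: &[u8]) -> i64;
}

/// Toy stand-in for a real aligner, used only to make the sketch runnable.
struct DummyAligner {
    calls: usize,
}

impl AlignerWrapper for DummyAligner {
    fn align(&mut self, seq: &[u8]) -> i64 {
        self.calls += 1;
        // Pretend score: the negated sequence length.
        -(seq.len() as i64)
    }
}

struct BenchResult {
    total_time: Duration,
    num_aligned: usize,
    scores: Vec<i64>,
}

/// Times only the align calls. Sequences are already in memory, so reading
/// input files and building the graph are not part of the measurement.
fn bench_align<A: AlignerWrapper>(aligner: &mut A, seqs: &[Vec<u8>]) -> BenchResult {
    let mut total_time = Duration::ZERO;
    let mut scores = Vec::with_capacity(seqs.len());
    for seq in seqs {
        let start = Instant::now();
        let score = aligner.align(seq);
        total_time += start.elapsed();
        scores.push(score);
    }
    BenchResult { total_time, num_aligned: seqs.len(), scores }
}

fn main() {
    // In a real run these would be read from the alignment-set FASTA before timing.
    let seqs: Vec<Vec<u8>> = vec![b"ACGT".to_vec(), b"ACGGT".to_vec()];
    let mut aligner = DummyAligner { calls: 0 };
    let result = bench_align(&mut aligner, &seqs);
    println!(
        "aligned {} sequences ({} align calls) in {:?}, scores: {:?}",
        result.num_aligned, aligner.calls, result.total_time, result.scores
    );
}
```

Keeping the timer inside the loop, rather than around the whole function, means any per-sequence bookkeeping outside `align` is excluded as well.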
Installation
POA-bench only works on Linux.
Pre-built binaries
TODO
Conda
TODO
Building poa-bench from source
Rust compiler
POA-bench is written in Rust, and to build and install it, you'll need a recent version of the Rust compiler. The minimum supported Rust version is 1.70.
- Download and install `rustup`: https://rustup.rs/
- Run `rustup update`
Building poa-bench
- Clone the repository.

```bash
git clone https://github.com/broadinstitute/poa-bench
```
- Clone `poasta`, a dependency of `poa-bench` that should reside in the same folder as the `poa-bench` checkout. Note that `poasta` is not yet publicly available.

```bash
git clone https://github.com/broadinstitute/poasta
```
- Move into the directory.

```bash
cd poa-bench
```
- Build using `cargo`. The command below sets a flag that lets the compiler use all features of your machine's CPU. To maximize portability of the binary, remove the `RUSTFLAGS="..."` part.

```bash
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
- The built `poa-bench` executable will be available in the directory `target/release/`.
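The portability caveat follows from how `-C target-cpu=native` works. The compiler may use any instruction the build machine supports, so the resulting binary may crash with an illegal-instruction error on a CPU that lacks those features. The sketch below uses AVX2 on x86 only as an example feature. It contrasts the compile-time view (`cfg!(target_feature = ...)`, which reflects `RUSTFLAGS`) with runtime detection (`is_x86_feature_detected!`). It is an illustration only and is not part of POA-bench.

```rust
/// True if the compiler was allowed to assume AVX2 at build time,
/// e.g. via `-C target-cpu=native` on an AVX2-capable build machine.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running this binary supports AVX2 (checked at runtime).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// Non-x86 targets have no AVX2 at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_avx2() -> bool {
    false
}

fn main() {
    println!("built assuming AVX2: {}", compiled_with_avx2());
    println!("running CPU has AVX2: {}", cpu_supports_avx2());
    if compiled_with_avx2() && !cpu_supports_avx2() {
        // Such a binary may already have crashed with an illegal instruction.
        eprintln!("warning: binary was built for a more capable CPU");
    }
}
```

Building without `RUSTFLAGS` leaves `compiled_with_avx2()` false on a default x86-64 target, which is what makes the binary portable.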
Installing the Python helper tools
poa-bench relies on a couple of helper scripts written in Python to
create and manage datasets. These are located in the python/ folder.
To install them, move to the python/ directory and run

```bash
pip install .
```

I recommend creating a new conda environment first, with Python 3.10 or newer.
Additional dependencies
The following additional tools should be available in your `$PATH`:
- samtools
- spoa
Datasets
Datasets are defined by TOML configuration files named `meta.toml`. See the `data/` directory for a number of examples. Each
dataset comprises a set of sequences used to construct a graph, and a set of sequences used to benchmark alignment to the
constructed graph. Each FASTA file must be compressed with gzip.
To define which sequences to use, specify the files as follows in the TOML file:
```toml
[graphset]
fname = "filewithgraphseq.fa.gz"

[alignset]
fname = "filewithalnseq.fa.gz"
```
The above fields are the only mandatory fields in the TOML file, but you can add other metadata if you want.
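As a rough illustration of what the mandatory fields mean, the sketch below checks that a `meta.toml` defines `fname` under both sections and that each file name ends in `.gz`. It is a deliberately simplified, hypothetical helper: `check_dataset` and `parse_sections` are not part of POA-bench, and the section names are taken from the example above. It understands only `[section]` headers and `key = "value"` lines and ignores any extra metadata. A real implementation would use a proper TOML parser and would check the actual file contents rather than the extension.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct DatasetFiles {
    graph_fname: String,
    align_fname: String,
}

/// NOT a full TOML parser: maps (section, key) to a string value,
/// which is enough for the simple layout shown above.
fn parse_sections(text: &str) -> HashMap<(String, String), String> {
    let mut values = HashMap::new();
    let mut section = String::new();
    for raw in text.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        if line.starts_with('[') && line.ends_with(']') {
            section = line[1..line.len() - 1].trim().to_string();
        } else if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"').to_string();
            values.insert((section.clone(), key.trim().to_string()), value);
        }
    }
    values
}

/// Checks the two mandatory `fname` fields; other metadata is ignored.
fn check_dataset(text: &str) -> Result<DatasetFiles, String> {
    let values = parse_sections(text);
    let get = |section: &str| -> Result<String, String> {
        let fname = values
            .get(&(section.to_string(), "fname".to_string()))
            .cloned()
            .ok_or_else(|| format!("missing mandatory field [{}] fname", section))?;
        if !fname.ends_with(".gz") {
            return Err(format!("{} should be gzip-compressed (.gz)", fname));
        }
        Ok(fname)
    };
    Ok(DatasetFiles {
        graph_fname: get("graphset")?,
        align_fname: get("alignset")?,
    })
}

fn main() {
    let meta = r#"
[graphset]
fname = "filewithgraphseq.fa.gz"

[alignset]
fname = "filewithalnseq.fa.gz"
"#;
    match check_dataset(meta) {
        Ok(files) => println!("{:?}", files),
        Err(e) => eprintln!("invalid dataset: {}", e),
    }
}
```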
Running benchmarks
To run benchmarks, invoke `poa-bench bench`. This runs the benchmarks for all supported algorithms on all datasets.
To find datasets, it recursively searches the directory `data/` in the current working directory for dataset
configuration files. Each combination of a dataset and an algorithm is a single job, and poa-bench spawns a new worker
process to perform the benchmark for that job. To reduce variance in benchmarks, each worker process is pinned to a
specific core. If requested, poa-bench can run multiple jobs in parallel using the `-j` option. All output files and
results are written to the directory `output/`. The algorithms to run, and the input and output directories, are
configurable on the command line.
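The job model described above (recursive dataset discovery, then one job per dataset and algorithm pair, with parallel workers each pinned to a core) can be sketched as follows. This is a hypothetical simplification, not POA-bench's code. `find_datasets`, `plan_jobs`, the placeholder algorithm names, and the round-robin core assignment are assumptions for illustration. The sketch only records a core number rather than pinning a process. On Linux, actual pinning would need an affinity call such as `sched_setaffinity`, which Rust's standard library does not provide.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `meta.toml` below `dir`.
fn find_datasets(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            find_datasets(&path, found)?;
        } else if path.file_name().map_or(false, |n| n == "meta.toml") {
            found.push(path);
        }
    }
    Ok(())
}

#[derive(Debug, Clone, PartialEq)]
struct Job {
    dataset: PathBuf,
    algorithm: String,
    core: usize, // core the worker for this job would be pinned to
}

/// One job per (dataset, algorithm) pair. With `parallel` worker slots,
/// job i is assigned core i % parallel (a simple round-robin assumption).
fn plan_jobs(datasets: &[PathBuf], algorithms: &[&str], parallel: usize) -> Vec<Job> {
    let parallel = parallel.max(1);
    let mut jobs = Vec::new();
    for dataset in datasets {
        for algorithm in algorithms {
            let core = jobs.len() % parallel;
            jobs.push(Job { dataset: dataset.clone(), algorithm: algorithm.to_string(), core });
        }
    }
    jobs
}

fn main() -> io::Result<()> {
    let data_dir = Path::new("data");
    let mut datasets = Vec::new();
    if data_dir.is_dir() {
        find_datasets(data_dir, &mut datasets)?;
    }
    datasets.sort(); // read_dir order is unspecified, so sort for reproducibility
    // Hypothetical algorithm names, for illustration only.
    let jobs = plan_jobs(&datasets, &["aligner_a", "aligner_b"], 2);
    for job in &jobs {
        println!("{} on {} -> core {}", job.algorithm, job.dataset.display(), job.core);
    }
    Ok(())
}
```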
```
Usage: poa-bench bench [OPTIONS]

Options:
-a, --algorithms [
```
Acknowledgements
This program is inspired by (and shares some code with) the excellent pa-bench repository.
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
GitHub Events
Total
- Watch event: 1
- Create event: 1
Last Year
- Watch event: 1
- Create event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 8
- Total pull requests: 0
- Average time to close issues: 4 months
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.63
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lrvdijk (8)
Dependencies
- 131 dependencies
- numpy *
- pandas *
- pysam *
- scikit-bio *
- scipy *
- tqdm *