https://github.com/bigbuildbench/ragnargrootkoerkamp_astar-pairwise-aligner
https://github.com/bigbuildbench/ragnargrootkoerkamp_astar-pairwise-aligner
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
✓DOI references
Found 9 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (9.7%) to scientific vocabulary
Last synced: 9 months ago
·
JSON representation
Repository
Basic Info
- Host: GitHub
- Owner: BigBuildBench
- License: mpl-2.0
- Language: Rust
- Default Branch: master
- Size: 5.8 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 1 year ago
· Last pushed over 1 year ago
Metadata Files
Readme
License
README.org
#+TITLE: A*PA & A*PA2: A* Pairwise Aligner
#+PROPERTY: header-args :eval no-export :exports results
A*PA is a global pairwise sequence aligner for edit distance using A*, co-authored by [[https://github.com/pesho-ivanov][@pesho-ivanov]] and [[https://github.com/RagnarGrootKoerkamp][@RagnarGrootKoerkamp]].
A*PA2 is an improvement of A*PA that uses a DP-based approach instead of plain A*.
It achieves up to 20x speedup over other exact aligners and is competitive with
approximate aligners.
An alignment of two sequences of length 500 with 30% error rate using A*PA:
[[file:imgs/readme/layers.gif]]
An alignment of two sequences of length 10'000 with 15% error rate using A*PA2:
[[file:imgs/readme/astarpa2.gif]]
- Papers ::
For A*PA, we recommend reading the bioRxiv version which directly includes the
supplement and has better formatting. But please cite the Bioinformatics version!
- A*PA BioRxiv preprint:
*Ragnar Groot Koerkamp*, *Pesho Ivanov*.
"Exact global alignment using A* with chaining seed heuristic and match pruning".
bioRxiv (2024). [[https://doi.org/10.1101/2022.09.19.508631][10.1101/2022.09.19.508631]]
- A*PA OUP Bioinformatics paper:
*Ragnar Groot Koerkamp*, *Pesho Ivanov*.
"Exact global alignment using A* with chaining seed heuristic and match pruning".
Bioinformatics (2024). [[https://doi.org/10.1093/bioinformatics/btae032][10.1093/bioinformatics/btae032]]
- A*PA2 BioRxiv preprint:
*Ragnar Groot Koerkamp*.
"A*PA2: up to 20 times faster exact global alignment".
bioRxiv (2024). [[https://doi.org/10.1101/2024.03.24.586481][10.1101/2024.03.24.586481]]
- Links ::
- Twitter: [[https://mobile.twitter.com/curious_coding][@curious_coding]], [[https://mobile.twitter.com/peshotrie][@peshotrie]],
- Matrix: =@curious_coding:matrix.org=,
- Blog: [[https://curiouscoding.nl]]
* Usage
If you run into any kind of problem or unclarity, please (/please/ 🥺) make an issue or
reach out on twitter or matrix.
** Rust API
To call A*PA2 from another Rust crate, simply add the =astarpa[2]= crate in this
repo as a git dependency.
For A*PA2, use ~astarpa2_simple(a, b)~ or ~astarpa2_full(a, b)~ in the
[[file:astarpa2/src/lib.rs][~astarpa2~ crate]], or customize parameters with e.g.
#+begin_src rust
let mut params = astarpa2::AstarPa2Params::full();
params.front.incremental_doubling = false;
let mut aligner = params.make_aligner(true);
let (cost, cigar) = aligner.align(a, b);
#+end_src
The ~astarpa~ crate is the [[file:astarpa/src/lib.rs][main entrypoint]] for A*PA. See the docs there.
Use ~astarpa::astarpa(a, b)~ for alignment with default settings or
~astarpa::astarpa_gcsh(a,b,r,k,end_pruning)~ to use GCSH+DT with custom parameters.
More complex usage examples can be found in [[file:pa-bin/examples/][pa-bin/examples]].
** C API
The ~astarpa-c~ [[file:astarpa-c/astarpa.h][crate]] contains simple C-bindings for the
~astarpa::{astarpa,astarpa_gcsh}~ and ~astarpa2::astarpa2_{simple,full}~ functions and an [[file:astarpa-c/example.c][example]] with [[file:astarpa-c/makefile][makefile]]. More should not be needed for
simple usage. To run the resulting binary, make sure to ~export LD_LIBRARY_PATH=/path/to/astarpa/target/release~.
** Command line application
=pa-bin= is a small command line application that takes as input consecutive pairs of
sequences from a =.fasta=, =.seq=, or =.txt= file (or can generate random input)
and outputs costs and alignments to a =.csv=.
#+end_src
This requires =cargo= and Rust =nightly=. To get both, first install [[https://rustup.rs/][rustup]]. Then enable ~nightly~: ~rustup install nightly; rustup default nightly~.
Install =pa-bin= to =~/.local/share/cargo/bin/pa-bin= using the following (cloning this repo is not needed):
#+begin_src shell
cargo install --git https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner pa-bin
To run from the repository: clone and ~cargo run --release -- ~.
#+begin_src shell :exports both :results verbatim
cargo run --release -- -h
#+end_src
#+RESULTS:
#+begin_example
Globally align pairs of sequences using A*PA
Usage: pa-bin [OPTIONS] <--input |--length >
Options:
-i, --input A .seq, .txt, or Fasta file with sequence pairs to align
-o, --output
Owner
- Name: BigBuildBench
- Login: BigBuildBench
- Kind: organization
- Repositories: 1
- Profile: https://github.com/BigBuildBench
abbr. B3, benchmarking the repo-level understanding capability of your LLMs by reconstructing project build-file.
GitHub Events
Total
- Create event: 8
Last Year
- Create event: 8
Dependencies
.github/workflows/test.yml
actions
- Swatinem/rust-cache v1 composite
- actions-rs/cargo v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
Cargo.lock
cargo
- 189 dependencies
Cargo.toml
cargo
astarpa/Cargo.toml
cargo
- rand 0.8 development
- triple_accel 0.4 development
astarpa-c/Cargo.toml
cargo
astarpa-next/Cargo.toml
cargo
astarpa2/Cargo.toml
cargo
- rand 0.8 development
- triple_accel 0.4.0 development
pa-affine-types/Cargo.toml
cargo
pa-base-algos/Cargo.toml
cargo
- rand 0.8 development
- triple_accel 0.4.0 development
pa-bin/Cargo.toml
cargo
pa-bitpacking/Cargo.toml
cargo
- criterion 0.4.0 development
- strum 0.24.1 development
pa-heuristic/Cargo.toml
cargo
- aho-corasick 0.7 development
- criterion 0.4.0 development
- lazy_static 1 development
- suffix 1 development
pa-test/Cargo.toml
cargo
pa-vis/Cargo.toml
cargo
pa-web/Cargo.toml
cargo