Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary
Repository
Alignment-based filtering CLI tool
Basic Info
Statistics
- Stars: 53
- Watchers: 6
- Forks: 1
- Open Issues: 5
- Releases: 5
Metadata Files
README.md
Fast record filtering by alignment-scores 🔥
ish is a CLI tool for searching for matches against records using different alignment methods.
Build
Install pixi
pixi run build
./ish --help
Pixi / Conda install
```
pixi global install -c conda-forge -c https://repo.prefix.dev/modular-community -c https://conda.modular.com/max ish
```
Or
```
conda install -c conda-forge -c https://repo.prefix.dev/modular-community -c https://conda.modular.com/max ish
```
For best performance it's recommended to build from source.
Usage
```sh
❯ ./ish --help
ish
Search for inexact patterns in files.
ARGS:
--verbose <Bool> [Default: False]
Verbose logging output.
OPTIONS:
--scoring-matrix
--score <Float> [Default: 0.8]
The min score needed to return a match. Results >= this value will be returned. The score is the found alignment score / the optimal score for the given scoring matrix and gap-open / gap-extend penalty.
--gap-open <Int> [Default: 3]
Score penalty for opening a gap.
--gap-extend <Int> [Default: 1]
Score penalty for extending a gap.
--match-algo <String> [Default: striped-semi-global]
The algorithm to use for matching: [striped-local, striped-semi-global]
--record-type <String> [Default: line]
The input record type: [line, fastx]
--threads <Int> [Default: 10]
The number of threads to use. Defaults to the number of physical cores.
--batch-size <Int> [Default: 268435456]
The number of bytes in a parallel processing batch. Note that this may use 2-3x this amount to account for intermediate transfer buffers.
--max-gpus <Int> [Default: 0]
The max number of GPUs to try to use. If set to 0 this will ignore any found GPUs. In general, if you have only one query then there won't be much using more than 1 GPU. GPUs won't always be faster than CPU parallelization depending on the profile of data you are working with.
--output-file <String> [Default: /dev/stdout]
The file to write the output to, defaults to stdout.
--sg-ends-free <String> [Default: FFTT]
The ends-free for semi-global alignment, if used. The free ends are: (query_start, query_end, target_start, target_end). These must be specified with a T or F, all four must be specified. By default this target ends are free.
```
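The `--score` threshold described in the help text is a ratio: the found alignment score divided by the optimal score. The sketch below illustrates that arithmetic only. It assumes a toy scoring scheme (+2 per match, -1 per mismatch) and assumes the optimal score is the query aligned against itself; the names `optimal_score`, `normalized_score`, and `passes` are hypothetical and are not part of ish.

```python
# Toy scoring scheme (an assumption for illustration; ish uses the matrix
# selected with --scoring-matrix instead).
MATCH, MISMATCH = 2, -1


def optimal_score(query: str) -> int:
    """Best achievable score, assumed here to be the query matched to itself."""
    return MATCH * len(query)


def normalized_score(found: int, query: str) -> float:
    """Found alignment score divided by the optimal score."""
    return found / optimal_score(query)


def passes(found: int, query: str, min_score: float = 0.8) -> bool:
    """Mirror the documented rule: results >= the threshold are returned."""
    return normalized_score(found, query) >= min_score


query = "ACGTACGTAC"                     # 10 characters, optimal score 20
one_mismatch = 9 * MATCH + 1 * MISMATCH  # 17
two_mismatches = 8 * MATCH + 2 * MISMATCH  # 14
print(passes(one_mismatch, query), passes(two_mismatches, query))
```

Under these assumptions a 10-character query with one mismatch scores 17/20 and is kept at the default threshold of 0.8, while two mismatches score 14/20 and are filtered out.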
```sh
# Some actual usage.
❯ ./ish blosum62 ./ish_bench_aligner.mojo
./ish_bench_aligner.mojo:94 default_value=String("Blosum50"),
./ish_bench_aligner.mojo:96 "Scoring matrix to use. Currently supports: [Blosum50,"
./ish_bench_aligner.mojo:97 " Blosum62, ACTGN]"
./ish_bench_aligner.mojo:379 if matrix_name == "Blosum50":
./ish_bench_aligner.mojo:380 matrix = ScoringMatrix.blosum50()
./ish_bench_aligner.mojo:381 elif matrix_name == "Blosum62":
./ish_bench_aligner.mojo:382 matrix = ScoringMatrix.blosum62()
./ish_bench_aligner.mojo:390 ## Assuming we are using Blosum50 AA matrix for everything below this for now.
```
🔥 Note
The `filepath:linenumber` in the match allows you to `cmd-click` on the match and have VS Code open the file at that location.
Match Methods
- `striped-semi-global`: Striped Semi-global, SIMD accelerated, GPU accelerated when available, supports affine gaps and scoring matrices. Specify ends-free with the `--sg-ends-free` option.
- `striped-local`: Striped Smith-Waterman, SIMD accelerated, supports affine gaps and scoring matrices.
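To make the ends-free idea concrete, here is a minimal sketch of semi-global alignment scoring with affine gaps, written as a plain (non-striped, non-SIMD) dynamic program. It is not ish's implementation. It assumes a simple match/mismatch score instead of a scoring matrix, and it assumes a gap of length k costs `gap_open + (k - 1) * gap_extend`, which may differ from the convention ish uses. The function name `semi_global_score` is hypothetical; the flag order follows the help text: (query_start, query_end, target_start, target_end).

```python
NEG = float("-inf")


def semi_global_score(query, target, ends_free="FFTT",
                      match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Score a semi-global alignment with affine gaps (plain DP sketch)."""
    q_start, q_end, t_start, t_end = (flag == "T" for flag in ends_free)
    m, n = len(query), len(target)

    def gap(k):
        # Assumed convention: opening costs gap_open, each extra base gap_extend.
        return 0 if k == 0 else -(gap_open + (k - 1) * gap_extend)

    # Row 0: target prefix left unaligned before the query starts.
    prev_h = [0 if t_start else gap(j) for j in range(n + 1)]
    prev_f = [NEG] * (n + 1)
    best_last_col = prev_h[n]  # best score with the whole target consumed
    for i in range(1, m + 1):
        # Column 0: query prefix left unaligned before the target starts.
        h = [0 if q_start else gap(i)] + [NEG] * n
        f = [NEG] * (n + 1)
        e = NEG
        for j in range(1, n + 1):
            e = max(h[j - 1] - gap_open, e - gap_extend)          # gap in query
            f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)  # gap in target
            sub = match if query[i - 1] == target[j - 1] else mismatch
            h[j] = max(prev_h[j - 1] + sub, e, f[j])
        best_last_col = max(best_last_col, h[n])
        prev_h, prev_f = h, f

    score = prev_h[n]
    if t_end:   # trailing target bases are free
        score = max(score, max(prev_h))
    if q_end:   # trailing query bases are free
        score = max(score, best_last_col)
    return score


print(semi_global_score("ACGT", "TTTACGTTTT", "FFTT"))
print(semi_global_score("ACGT", "TTTACGTTTT", "FFFF"))
```

With the default `FFTT`, the query must align end to end but may land anywhere in the target, which is what makes this mode behave like an inexact search. With `FFFF` the same inputs are penalized for the unaligned target flanks.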
Record Types
- `line`: match against one line at a time, a la `grep`
- `fastx`: match against the sequence portion of FASTA or FASTQ records.
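The difference between the two record types can be sketched as two small readers. This is an illustration only, not ish's parser: the function names are hypothetical, and the FASTQ handling assumes the common four-line layout (header, sequence, `+`, quality) with no line wrapping.

```python
def line_records(lines):
    """line mode: every line is a record, numbered from 1 like grep output."""
    for number, line in enumerate(lines, start=1):
        yield number, line.rstrip("\n")


def fastx_sequences(lines):
    """fastx mode: yield (record name, sequence), skipping headers and qualities."""
    it = iter(lines)
    header, chunks = None, []
    for raw in it:
        line = raw.rstrip("\n")
        if line.startswith("@"):
            # FASTQ record: assumed to be exactly four lines long.
            if header is not None:
                yield header, "".join(chunks)
                header, chunks = None, []
            sequence = next(it, "").rstrip("\n")
            next(it, "")  # '+' separator line
            next(it, "")  # quality line, never matched against
            yield line[1:], sequence
        elif line.startswith(">"):
            # FASTA header: flush the previous record, start a new one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # FASTA sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


fasta = [">r1\n", "ACGT\n", "TTAA\n", ">r2\n", "GG\n"]
fastq = ["@q1\n", "ACGT\n", "+\n", "IIII\n"]
print(list(fastx_sequences(fasta)))
print(list(fastx_sequences(fastq)))
```

In `line` mode the header and quality lines would themselves be candidate matches, whereas in `fastx` mode only the sequence text is compared with the query.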
ish-aligner
This is a benchmarking tool based on parasail_aligner.
⚠️ Warning
`ish-aligner` and all variations of it are for development purposes only.
Further Reading
The associated paper is available on bioRxiv: https://doi.org/10.1101/2025.06.04.657890
Future Work
- Support multiple queries
- Choose a better default between CPU and GPU (needs more thought). The GPU is much faster on big files, long-running jobs, and many files; the CPU is faster for small jobs
- Add ability to not skip dotfiles
Rattler Build
For testing the build process for modular-community:
pixi global install rattler-build
rattler-build build -c https://repo.prefix.dev/modular-community -c https://conda.modular.com/max -c conda-forge --skip-existing=all -r ./recipe.yaml
Owner
- Name: BioRadOpenSource
- Login: BioRadOpenSource
- Kind: organization
- Location: San Francisco Bay Area
- Repositories: 2
- Profile: https://github.com/BioRadOpenSource
Public facing repos
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Stadick"
    given-names: "Seth"
    orcid: "https://orcid.org/0009-0002-0915-9459"
title: "ish"
version: 1.1.1
doi: https://doi.org/10.1101/2025.06.04.657890
date-released: 2025-06-09
url: "https://github.com/BioRadOpenSource/ish"
GitHub Events
Total
- Create event: 14
- Issues event: 3
- Release event: 2
- Watch event: 30
- Delete event: 9
- Issue comment event: 10
- Public event: 1
- Push event: 34
- Pull request review event: 10
- Pull request review comment event: 6
- Pull request event: 18
Last Year
- Create event: 14
- Issues event: 3
- Release event: 2
- Watch event: 30
- Delete event: 9
- Issue comment event: 10
- Public event: 1
- Push event: 34
- Pull request review event: 10
- Pull request review comment event: 6
- Pull request event: 18