<code>diverse-seq</code>
<code>diverse-seq</code>: an application for alignment-free selecting and clustering biological sequences - Published in JOSS (2025)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to biorxiv.org, joss.theoj.org
- ○ Academic email domains
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Tools for analysis of sequence divergence
Basic Info
- Host: GitHub
- Owner: HuttleyLab
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://diverse-seq.readthedocs.io
- Size: 2.18 MB
Statistics
- Stars: 6
- Watchers: 3
- Forks: 4
- Open Issues: 4
- Releases: 19
Metadata Files
README.md
diverse-seq provides alignment-free algorithms to facilitate phylogenetic workflows
diverse-seq implements computationally efficient alignment-free algorithms that enable rapid prototyping for phylogenetic workflows. It can accelerate parameter selection searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that is representative of the diversity in a collection. We show that selecting representative sequences with an entropy measure of k-mer frequencies corresponds well to sampling via conventional genetic distances. Computation time scales linearly with the number of sequences, and the work can be run in parallel. Applied to a collection of 10.5k whole microbial genomes on a laptop, diverse-seq took ~8 minutes to prepare the data and 4 minutes to select 100 representatives. diverse-seq can further boost the performance of phylogenetic estimation by providing a seed phylogeny that a more sophisticated algorithm can then refine. For ~1k whole microbial genomes on a laptop, it takes ~1.8 minutes to estimate a bifurcating tree from mash distances.
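To make the idea of an entropy measure of k-mer frequencies concrete, here is a minimal sketch of a Jensen-Shannon divergence (JSD) calculation with a naive greedy selection on top of it. This is not the diverse-seq implementation: the function names (`kmer_freqs`, `jsd`, `greedy_select`), the default `k`, and the greedy strategy are all assumptions made for illustration. The sketch also recomputes the JSD for every candidate, so it does not have the linear scaling described above. Sequences are assumed to be at least `k` characters long.

```python
from collections import Counter
from math import log2


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy (in bits) of a frequency distribution."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def jsd(dists: list[dict[str, float]]) -> float:
    """JSD of equally weighted distributions: H(mean) - mean(H)."""
    n = len(dists)
    mean: dict[str, float] = {}
    for d in dists:
        for kmer, p in d.items():
            mean[kmer] = mean.get(kmer, 0.0) + p / n
    return entropy(mean) - sum(entropy(d) for d in dists) / n


def greedy_select(seqs: dict[str, str], n: int, k: int = 3) -> list[str]:
    """Hypothetical helper: greedily pick n names that maximise JSD."""
    freqs = {name: kmer_freqs(s, k) for name, s in seqs.items()}
    chosen = [next(iter(freqs))]  # arbitrary seed: the first record
    while len(chosen) < min(n, len(freqs)):
        remaining = [name for name in freqs if name not in chosen]
        # add the candidate whose inclusion gives the largest JSD
        best = max(remaining, key=lambda m: jsd([freqs[c] for c in chosen + [m]]))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    demo = {"a": "ACGTACGTACGT", "b": "ACGTACGTACGA", "c": "GGGGGGCCCCCC"}
    print(greedy_select(demo, n=2))
```

In the demo, sequence `c` shares no 3-mers with `a`, so it is chosen ahead of the near-duplicate `b`.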
You can read more about the methods implemented in diverse-seq in the bioRxiv preprint.
The user documentation is at https://diverse-seq.readthedocs.io.
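The seed-tree step mentioned in the summary relies on mash distances between sequences. As a rough illustration of what such a distance measures, the sketch below applies the standard Mash formula, D = -ln(2j / (1 + j)) / k, to an exact Jaccard index j of two k-mer sets. It is a simplification under stated assumptions: Mash estimates j from MinHash sketches rather than full k-mer sets, and the names `kmer_set` and `mash_distance`, the default `k`, and the cap at 1.0 are illustrative choices, not part of diverse-seq.

```python
from math import log


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of distinct overlapping k-mers in seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq1: str, seq2: str, k: int = 4) -> float:
    """Mash-style distance from an exact (not sketched) Jaccard index."""
    a, b = kmer_set(seq1, k), kmer_set(seq2, k)
    union = a | b
    if not union:
        raise ValueError("both sequences must be at least k characters long")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # assumed convention: no shared k-mers means maximal distance
    # cap at 1.0 so that small k cannot push the value above the maximum
    return min(1.0, -log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    print(mash_distance("ACGTACGTACGT", "ACGTACGTACGA"))
```

A matrix of such pairwise distances is the kind of input from which a distance-based tree-building method can estimate a bifurcating tree.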
Installation
For the full Jupyter experience, we recommend installing diverse-seq from PyPI with the optional extras:
pip install "diverse-seq[extra]"
For command-line-only usage, install as follows:
pip install diverse-seq
NOTE If you experience any errors during installation, we recommend using uv pip. This command provides much better error messages than the standard pip command. If you cannot resolve the installation problem, please open an issue on the GitHub repository.
Using uv
uv also provides a simpler way to install dvs as a command-line-only tool:
uv tool install diverse-seq
In that case, invoke it as:
uvx --from diverse-seq dvs
Dependencies
For a full listing of dependencies, see the pyproject.toml file.
The command line interface
dvs is the command line interface for diverse-seq.
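If you drive dvs from a Python script, it can help to confirm that the executable is on your PATH first. The sketch below is a hypothetical helper (`dvs_version` is not part of diverse-seq) that relies only on the `--version` option listed in the help text and on the standard library.

```python
import shutil
import subprocess
from typing import Optional


def dvs_version(executable: str = "dvs") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = dvs_version()
    print(version if version else "dvs not found; see the Installation section")
```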
The `dvs` subcommands
```
Usage: dvs [OPTIONS] COMMAND [ARGS]...

  dvs -- alignment free detection of the most diverse sequences using JSD

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  demo-data  Export a demo sequence file
  prep       Writes processed sequences to a
```
The Python API
We make comparable capabilities available as cogent3 apps. The main difference is that the app instances operate directly on, and return, cogent3 sequence collections. See the docs for demonstrations of how to use the apps.
Project Information
diverse-seq is released under the BSD-3 license. If you want to contribute to the diverse-seq project (and we hope you do!), the code of conduct and other useful developer information are available on the wiki.
Owner
- Name: HuttleyLab
- Login: HuttleyLab
- Kind: organization
- Repositories: 4
- Profile: https://github.com/HuttleyLab
JOSS Publication
<code>diverse-seq</code>: an application for alignment-free selecting and clustering biological sequences
Authors
Tags
genomics, statistics, machine learning, bioinformatics, molecular evolution, phylogenetics
GitHub Events
Total
- Create event: 51
- Issues event: 18
- Release event: 14
- Watch event: 3
- Delete event: 40
- Issue comment event: 199
- Push event: 101
- Gollum event: 23
- Pull request review comment event: 43
- Pull request review event: 97
- Pull request event: 190
- Fork event: 2
Last Year
- Create event: 51
- Issues event: 18
- Release event: 14
- Watch event: 3
- Delete event: 40
- Issue comment event: 199
- Push event: 101
- Gollum event: 23
- Pull request review comment event: 43
- Pull request review event: 97
- Pull request event: 190
- Fork event: 2
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 12
- Total pull requests: 129
- Average time to close issues: about 2 months
- Average time to close pull requests: 1 day
- Total issue authors: 3
- Total pull request authors: 4
- Average comments per issue: 1.0
- Average comments per pull request: 1.23
- Merged pull requests: 106
- Bot issues: 0
- Bot pull requests: 48
Past Year
- Issues: 10
- Pull requests: 116
- Average time to close issues: 6 days
- Average time to close pull requests: 1 day
- Issue authors: 3
- Pull request authors: 4
- Average comments per issue: 1.0
- Average comments per pull request: 1.35
- Merged pull requests: 96
- Bot issues: 0
- Bot pull requests: 47
Top Authors
Issue Authors
- iimog (6)
- GavinHuttley (4)
- xin-huang (2)
Pull Request Authors
- GavinHuttley (73)
- dependabot[bot] (48)
- rmcar17 (7)
- iimog (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 178 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 16
- Total maintainers: 1
pypi.org: diverse-seq
diverse_seq: a tool for sampling diverse biological sequences
- Documentation: https://diverse-seq.readthedocs.io/
- License: BSD License
- Latest release: 2025.7.10 (published 6 months ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- coverallsapp/github-action v2 composite
- actions/checkout v4 composite
- github/codeql-action/analyze v3 composite
- github/codeql-action/autobuild v3 composite
- github/codeql-action/init v3 composite
- EndBug/add-and-commit v9 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- attrs *
- click *
- cogent3 *
- h5py *
- hdf5plugin *
- numpy >=2.0
- rich *
- scitrack *
