outward-assembly
Pipeline for assembling outward from seed sequences in large metagenomic datasets
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: naobservatory
- License: mit
- Language: Python
- Default Branch: main
- Size: 104 KB
Statistics
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 18
- Releases: 0
Metadata Files
README.md
Outward assembly
Welcome to the outward assembly repository! This repo provides Python tooling to assemble the genomic context around a provided seed sequence from a collection of read pairs. The seed sequence could be a flagged chimeric junction, a kmer identified by reference-free growth detection, etc.
When the collection of reads is very large -- e.g. billions of reads from one delivery, or tens of billions of reads across deliveries -- doing a full metagenomic joint assembly is slow and computationally expensive. Instead of jointly assembling all reads, outward assembly attempts to iteratively grow contigs outward from the provided seed. The basic algorithm is simple:
1. Start with the seed as your initial contig.
2. Iteratively:
   - Find all read pairs that share a kmer with your contigs.
   - Assemble these read pairs.
   - Filter contigs to those that contain the seed.
3. Continue until either maximum iterations are reached or the assembly algorithm does not make progress from one iteration to another ("convergence").
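The loop described above can be illustrated with a small in-memory sketch. This is not the repository's implementation: `outward_assembly_sketch`, `toy_assemble`, and every parameter name here are hypothetical, the greedy overlap merger is only a stand-in for a real assembler, and reverse complements, sequencing errors, and on-disk read storage are all ignored.

```python
from typing import Callable, List, Set, Tuple

ReadPair = Tuple[str, str]


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def toy_assemble(pairs: List[ReadPair], min_len: int = 4) -> List[str]:
    """Stand-in assembler (assumption): greedily merge reads on exact overlaps."""
    seqs = list(dict.fromkeys(read for pair in pairs for read in pair))
    while True:
        hit = next(
            ((i, j) for i, a in enumerate(seqs) for j, b in enumerate(seqs)
             if i != j and overlap(a, b, min_len)),
            None,
        )
        if hit is None:
            return seqs
        i, j = hit
        merged = seqs[i] + seqs[j][overlap(seqs[i], seqs[j], min_len):]
        seqs = [s for n, s in enumerate(seqs) if n not in hit] + [merged]


def outward_assembly_sketch(
    seed: str,
    read_pairs: List[ReadPair],
    assemble: Callable[[List[ReadPair]], List[str]] = toy_assemble,
    k: int = 5,
    max_iters: int = 10,
) -> List[str]:
    contigs = [seed]  # step 1: the seed is the initial contig
    for _ in range(max_iters):
        contig_kmers = set().union(*(kmers(c, k) for c in contigs))
        # step 2a: keep read pairs where either mate shares a kmer with a contig
        selected = [p for p in read_pairs
                    if any(kmers(read, k) & contig_kmers for read in p)]
        # steps 2b and 2c: assemble, then keep only contigs containing the seed
        new_contigs = [c for c in assemble(selected) if seed in c]
        # step 3: stop when an iteration makes no progress ("convergence")
        if not new_contigs or set(new_contigs) == set(contigs):
            break
        contigs = new_contigs
    return contigs


# Toy data: overlapping 10-base windows of a made-up 30-base sequence, plus one unrelated pair.
GENOME = "ATGGCGTACCTTAGACGGATTCAACTGTCA"
r = [GENOME[i:i + 10] for i in range(0, 25, 5)]
READ_PAIRS = [(r[0], r[1]), (r[2], r[3]), (r[3], r[4]), ("TTTTTTTTTT", "CCCCCCCCCC")]
result = outward_assembly_sketch(seed=GENOME[12:18], read_pairs=READ_PAIRS)
print(result)
```

On this toy input the first iteration only pulls in the read pair that overlaps the seed; the resulting longer contig then recruits the neighbouring pairs, and the unrelated pair is never selected.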
Although the basic algorithm is simple, in practice, getting good assembly results is a bit more complex (see algorithm details for how we handle some of these complexities) and might require running outward assembly multiple times with different parameters. To handle this, we've introduced a way to automate the labor-intensive iterative process of:
- Running outward assembly;
- Evaluating outputs;
- Deciding whether to accept the outputs as final or modify parameters and re-run.
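As a rough illustration of that run, evaluate, and decide cycle, the sketch below wires three caller-supplied functions into a retry loop. It makes no claim about how the repository automates this: `run_until_accepted`, `fake_run`, the `k` parameter, and the acceptance threshold are all hypothetical placeholders.

```python
from typing import Callable, Optional, Tuple, TypeVar

P = TypeVar("P")  # parameter set for one run
R = TypeVar("R")  # result of one run


def run_until_accepted(
    run: Callable[[P], R],
    accept: Callable[[R], bool],
    adjust: Callable[[P, R], P],
    params: P,
    max_attempts: int = 5,
) -> Optional[Tuple[P, R]]:
    """Re-run with adjusted parameters until a result is accepted or attempts run out."""
    for _ in range(max_attempts):
        result = run(params)              # run outward assembly (placeholder)
        if accept(result):                # evaluate the outputs
            return params, result         # accept as final
        params = adjust(params, result)   # otherwise modify parameters and re-run
    return None


def fake_run(params: dict) -> int:
    """Hypothetical stand-in for a real run: returns a made-up contig length."""
    return 400 - 10 * params["k"]


outcome = run_until_accepted(
    run=fake_run,
    accept=lambda contig_len: contig_len >= 150,
    adjust=lambda p, _result: {**p, "k": p["k"] - 4},
    params={"k": 31},
)
print(outcome)
```

Keeping `run`, `accept`, and `adjust` as separate callables means the acceptance criterion and the parameter schedule can be swapped without touching the loop itself.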
Whether you're running outward assembly once or iteratively, you can find more information in the usage docs.
Quick Start
- Install dependencies:
  uv sync --extra dev
- Create tools environment:
  mamba env create -n oa-tools -f oa_tools_env.yml --channel-priority flexible
- Activate tools environment:
  mamba activate oa-tools
- Run with:
  uv run your_script.py
See installation docs for detailed setup instructions.
Documentation
Owner
- Name: Nucleic Acid Observatory
- Login: naobservatory
- Kind: organization
- Email: info@naobservatory.org
- Website: https://naobservatory.org
- Repositories: 36
- Profile: https://github.com/naobservatory
Reliable early warning for catastrophic biothreats
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Fields"
    given-names: "Evan"
    orcid: "https://orcid.org/0000-0003-1147-1687"
  - family-names: "Bhasin"
    given-names: "Harmon"
    orcid: "https://orcid.org/0009-0003-1569-5749"
  - family-names: "Teo"
    given-names: "Ryan"
    orcid: "https://orcid.org/0000-0003-0416-8436"
  - family-names: "Kaufman"
    given-names: "Jeff"
    orcid: "https://orcid.org/0009-0002-6137-5124"
  - family-names: "McLaren"
    given-names: "Michael"
    orcid: "https://orcid.org/0000-0003-1575-473X"
title: "Outward Assembly"
version: 0.1.0
date-released: 2025-04-16
url: "https://github.com/naobservatory/outward-assembly"
GitHub Events
Total
- Create event: 9
- Issues event: 12
- Watch event: 4
- Delete event: 11
- Issue comment event: 13
- Push event: 32
- Public event: 1
- Pull request review comment event: 12
- Pull request review event: 24
- Pull request event: 17
Last Year
- Create event: 9
- Issues event: 12
- Watch event: 4
- Delete event: 11
- Issue comment event: 13
- Push event: 32
- Public event: 1
- Pull request review comment event: 12
- Pull request review event: 24
- Pull request event: 17
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Evan Fields | e****n@s****g | 5 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 7
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 23 hours
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 23 hours
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- evanfields (14)
Pull Request Authors
- evanfields (13)
- harmonbhasin (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ubuntu latest build