outward-assembly

Pipeline for assembling outward from seed sequences in large metagenomic datasets

https://github.com/naobservatory/outward-assembly

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: naobservatory
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 104 KB
Statistics
  • Stars: 4
  • Watchers: 3
  • Forks: 0
  • Open Issues: 18
  • Releases: 0
Created 11 months ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation

README.md

Outward assembly

Welcome to the outward assembly repository! This repo provides Python tooling to assemble the genomic context around a provided seed sequence from a collection of read pairs. The seed sequence could be a flagged chimeric junction, a kmer identified by reference-free growth detection, etc.

When the collection of reads is very large -- e.g. billions of reads from one delivery, or tens of billions of reads across deliveries -- doing a full metagenomic joint assembly is slow and computationally expensive. Instead of jointly assembling all reads, outward assembly attempts to iteratively grow contigs outward from the provided seed. The basic algorithm is simple:

  1. Start with the seed as your initial contig.
  2. Iteratively:
       • Find all read pairs that share a kmer with your contigs.
       • Assemble these read pairs.
       • Filter contigs to those that contain the seed.
  3. Continue until either maximum iterations are reached or the assembly algorithm does not make progress from one iteration to another ("convergence").
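The loop described above can be illustrated with a minimal, self-contained Python sketch. It is not this repository's implementation: the function names (`kmers`, `extend`, `outward_assemble`) and parameters are hypothetical. As simplifying assumptions, reads are plain single strings rather than read pairs, reverse complements are ignored, and a greedy overlap merge stands in for a real assembler.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extend(contig, read, min_overlap):
    """Toy stand-in for an assembler: merge one read into the contig."""
    if contig in read:   # the read covers the whole contig
        return read
    if read in contig:   # the read adds nothing new
        return contig
    # Try the longest suffix/prefix overlap first, on either end.
    for n in range(min(len(contig), len(read)), min_overlap - 1, -1):
        if contig.endswith(read[:n]):
            return contig + read[n:]
        if read.endswith(contig[:n]):
            return read[:len(read) - n] + contig
    return contig


def outward_assemble(seed, reads, k=5, max_iters=10):
    """Grow a single contig outward from seed (hypothetical helper)."""
    contig = seed
    for _ in range(max_iters):
        # Find all reads that share a kmer with the current contig.
        contig_kmers = kmers(contig, k)
        recruited = [r for r in reads if kmers(r, k) & contig_kmers]
        # "Assemble" the recruited reads (greedy merge in this sketch).
        new_contig = contig
        for read in recruited:
            new_contig = extend(new_contig, read, min_overlap=k)
        # Keep only contigs that still contain the seed.
        if seed not in new_contig:
            break
        # Convergence: no progress from one iteration to the next.
        if new_contig == contig:
            break
        contig = new_contig
    return contig


# Toy data: overlapping windows of a made-up reference plus one unrelated read.
reference = "ATGGCGTACGTTAGCCTAGGATCCGAAT"
reads = [reference[i:i + 10] for i in range(0, 19, 3)] + ["TTTTTTTTTT"]
seed = reference[12:18]
assembled = outward_assemble(seed, reads)
```

On this toy input the contig grows by a few bases per iteration until it spans the whole made-up reference, and the unrelated read is never recruited because it shares no kmer with the contig.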

Although the basic algorithm is simple, in practice, getting good assembly results is a bit more complex (see algorithm details for how we handle some of these complexities) and might require running outward assembly multiple times with different parameters. To handle this, we've introduced a way to automate the labor-intensive iterative process of:

  1. Running outward assembly;
  2. Evaluating outputs;
  3. Deciding whether to accept the outputs as final or modify parameters and re-run.
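As an illustration only, the run / evaluate / decide cycle can be sketched as a small driver loop. None of these names come from the repository: `run`, `accept`, and `adjust` are hypothetical callables supplied by the caller, and the `max_iters` parameter in the toy example is just a placeholder.

```python
def run_until_accepted(run, accept, adjust, params, max_attempts=5):
    """Re-run with adjusted parameters until the output is accepted.

    run(params) -> output           (hypothetical: one outward assembly run)
    accept(output) -> bool          (hypothetical: evaluate the output)
    adjust(params, output) -> dict  (hypothetical: choose new parameters)
    """
    for attempt in range(1, max_attempts + 1):
        output = run(params)
        if accept(output):
            return output, params, attempt
        params = adjust(params, output)
    # No run was accepted within the attempt budget.
    return None, params, max_attempts


# Toy stand-ins: the "assembly" is a string whose length depends on max_iters.
output, final_params, attempts = run_until_accepted(
    run=lambda p: "A" * p["max_iters"],
    accept=lambda out: len(out) >= 8,
    adjust=lambda p, out: {"max_iters": p["max_iters"] * 2},
    params={"max_iters": 2},
)
```

Keeping the evaluation and adjustment steps as separate callables means the acceptance criterion can change without touching the loop itself.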

Whether you're running outward assembly once or iteratively, you can find more information in the usage docs.

Quick Start

  1. Install dependencies: uv sync --extra dev
  2. Create tools environment: mamba env create -n oa-tools -f oa_tools_env.yml --channel-priority flexible
  3. Activate tools environment: mamba activate oa-tools
  4. Run with: uv run your_script.py

See installation docs for detailed setup instructions.

Documentation

Owner

  • Name: Nucleic Acid Observatory
  • Login: naobservatory
  • Kind: organization
  • Email: info@naobservatory.org

Reliable early warning for catastrophic biothreats

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Fields"
  given-names: "Evan"
  orcid: "https://orcid.org/0000-0003-1147-1687"
- family-names: "Bhasin"
  given-names: "Harmon"
  orcid: "https://orcid.org/0009-0003-1569-5749"
- family-names: "Teo"
  given-names: "Ryan"
  orcid: "https://orcid.org/0000-0003-0416-8436"
- family-names: "Kaufman"
  given-names: "Jeff"
  orcid: "https://orcid.org/0009-0002-6137-5124"
- family-names: "McLaren"
  given-names: "Michael"
  orcid: "https://orcid.org/0000-0003-1575-473X"
title: "Outward Assembly"
version: 0.1.0
date-released: 2025-04-16
url: "https://github.com/naobservatory/outward-assembly"

GitHub Events

Total
  • Create event: 9
  • Issues event: 12
  • Watch event: 4
  • Delete event: 11
  • Issue comment event: 13
  • Push event: 32
  • Public event: 1
  • Pull request review comment event: 12
  • Pull request review event: 24
  • Pull request event: 17
Last Year
  • Create event: 9
  • Issues event: 12
  • Watch event: 4
  • Delete event: 11
  • Issue comment event: 13
  • Push event: 32
  • Public event: 1
  • Pull request review comment event: 12
  • Pull request review event: 24
  • Pull request event: 17

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 5
  • Total Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Evan Fields e****n@s****g 5
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 7
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 23 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 23 hours
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • evanfields (14)
Pull Request Authors
  • evanfields (13)
  • harmonbhasin (2)
Top Labels
Issue Labels
  • enhancement (10)
  • good first issue (3)
  • bug (1)
  • cleanup (1)
  • documentation (1)
Pull Request Labels

Dependencies

nextflow/docker/Dockerfile docker
  • ubuntu latest build
pyproject.toml pypi