odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs

https://github.com/pangenome/odgi

Repository

Statistics
  • Stars: 219
  • Watchers: 12
  • Forks: 44
  • Open Issues: 60
  • Releases: 26
Created over 7 years ago · Last pushed 11 months ago

README.md

odgi

optimized dynamic genome/graph implementation (odgi)

odgi provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses.

Careful encoding of graph entities allows odgi to efficiently compute and transform pangenomes with minimal overheads. odgi implements a dynamic data structure that leverages multi-core CPUs and can be updated on the fly.

The edges and path steps are recorded as deltas between the current node id and the target node id, where the node id corresponds to the rank in the global array of nodes. Graphs built from biological data sets tend to have local partial order and, when sorted, the deltas tend to be small. This allows them to be compressed with a variable length integer representation, resulting in a small in-memory footprint at the cost of packing and unpacking.

The RAM and computational savings are substantial. In partially ordered regions of the graph, most deltas will require only a single byte.
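To make the idea concrete, here is a minimal Python sketch of delta encoding combined with a variable length integer format. It is an illustration only: the zigzag mapping and the 7-bit little-endian varint used here are assumptions borrowed from common serialization formats, not a description of odgi's actual byte layout, and the function names are hypothetical.

```python
from typing import Iterable


def zigzag(n: int) -> int:
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_edges(node_rank: int, target_ranks: Iterable[int]) -> bytes:
    """Store each edge as the delta between the target rank and the current rank."""
    return b"".join(varint(zigzag(t - node_rank)) for t in target_ranks)


# In a well-sorted region the neighbours are close in rank: one byte per edge.
local = encode_edges(1_000_000, [1_000_001, 1_000_002, 999_998])
# A single long-range edge has a large delta and needs several bytes.
distant = encode_edges(1_000_000, [5])
print(len(local), len(distant))
```

Three local edges and one long-range edge both occupy three bytes here, which is why sorting the graph so that connected nodes have nearby ranks matters for the memory footprint.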

installation

building from source

odgi requires GCC version 9.3 or higher. You can check your version via:

gcc --version
g++ --version

odgi pulls in a host of source repositories as dependencies. It may be necessary to install several system-level libraries to build odgi. On Ubuntu 20.04, these can be installed using apt:

sudo apt install build-essential cmake python3-distutils python3-dev libjemalloc-dev

After installing the required dependencies, clone the odgi git repository recursively because of the many submodules and build with:

git clone --recursive https://github.com/pangenome/odgi.git
cd odgi
cmake -H. -Bbuild && cmake --build build -- -j 3

To build a static executable, use:

cmake -DBUILD_STATIC=1 -H. -Bbuild && cmake --build build -- -j 3

You'll need to set this flag to 0, or remove and rebuild your build directory, if you want to unset this build behavior and get a dynamic binary again. Static builds are unlikely to be supported on OSX and require appropriate static libraries on Linux.

For more information on optimisations, debugging and GNU Guix builds, see INSTALL.md and CMakeLists.txt.

building with GPU

If you have GPUs and CUDA installed, you can build with GPU support to use our GPU-accelerated odgi-layout. It provides a 57.3x speedup over the CPU solution on an NVIDIA A100 GPU, reducing execution time from hours to minutes. Check out this paper and repo for the detailed performance numbers. It's going to be presented at SC'24!

Pass -DUSE_GPU=ON when invoking cmake:

cmake -DUSE_GPU=ON -H. -Bbuild && cmake --build build -- -j 3

To run odgi layout on the GPU, add --gpu to the other arguments, for example:

odgi layout -i ${OG_FILE} -o ${LAY_FILE} --threads ${NUM_THREAD} --gpu

Nix build

If you have nix, build and installation in your profile are as simple as:

nix-build && nix-env -i ./result

Notes for distribution

If you need to avoid machine-specific optimizations, use the CMAKE_BUILD_TYPE=Generic build type:

cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Generic && cmake --build build -- -j 3

Notes on dependencies

On Arch Linux, the jemalloc dependency can be installed with:

sudo pacman -S jemalloc # arch linux

Bioconda

odgi recipes for Bioconda are available at https://bioconda.github.io/recipes/odgi/README.html. To install the latest version using Conda please execute:

conda install -c bioconda odgi

Docker

To simplify installation and versioning, we have an automated GitHub action that pushes the current docker build to dockerhub. To use it, pull the docker image:

docker pull pangenome/odgi

Then, you can run odgi with:

docker run pangenome/odgi

Guix

An alternative way to manage odgi's dependencies is by using the GNU Guix package manager. We use Guix to develop, test and deploy odgi on our systems. For more information see INSTALL.md.

FAQs

odgi is supported by default by PGGB and is part of its standard construction pipeline. Since Cactus 2.6.1, odgi has also been integrated into the minigraph-CACTUS pangenome build through the additional option --odgi. This produces the native odgi file format for the full graph, which can later be rendered as a 1D visualization or a 2D layout according to the respective parameters. For more information, see the minigraph-CACTUS help guide.

If the minigraph-CACTUS pangenome was built without the --odgi option, a simple conversion step produces the desired output. Specifically, the newer GFA v1.1 format has to be converted to GFA v1.0, the version used by odgi. The main difference between the two is that GFA v1.1 includes W lines to encode walks in the pangenome graph, a feature that is not yet supported in odgi. The minigraph-CACTUS output therefore has to go through the following conversion step:

vg convert -W -t <n_of_threads> -g <pangenome_v1.1>.gfa -f > <pangenome_v1.0>.gfa
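Whether a given GFA file needs this conversion can be checked by looking for W records. The following Python sketch does this on in-memory text; the helper name `has_walk_lines` and the tiny example graphs are made up for illustration, and the check assumes tab-separated GFA records whose first column is the record type.

```python
import io
from typing import Iterable


def has_walk_lines(gfa_lines: Iterable[str]) -> bool:
    """Return True if any record is a W (walk) line, as used by GFA v1.1."""
    return any(line.startswith("W\t") for line in gfa_lines)


# Tiny in-memory stand-ins for real files; the contents are made up.
gfa_with_paths = io.StringIO(
    "H\tVN:Z:1.0\nS\t1\tACGT\nP\tsample#1#chr1\t1+\t*\n"
)
gfa_with_walks = io.StringIO(
    "H\tVN:Z:1.1\nS\t1\tACGT\nW\tsample\t1\tchr1\t0\t4\t>1\n"
)

print(has_walk_lines(gfa_with_paths))  # no W records: usable as is
print(has_walk_lines(gfa_with_walks))  # W records present: convert first
```

For a file on disk, passing an open file handle works the same way, since the function only iterates over lines.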

After converting the minigraph-CACTUS graph to GFA v1.0 format, all conventional odgi analyses are possible. For instance:

odgi build -g <pangenome_v1.0>.gfa -o <graph>.og -s -O -t <n_of_threads>

Building the graph in odgi's native format is often useful to speed up odgi operations, especially when they are part of pipelines or called multiple times. Afterwards, paths can be extracted with odgi paths:

odgi paths -i <graph>.og -L -t <n_of_threads> | cut -f 1,2 -d '#' | sort | uniq > <prefixes>.tsv
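The cut, sort and uniq stages of this pipeline reduce every path name to its first two '#'-separated fields and keep each prefix once. A rough Python equivalent is sketched below, assuming path names of the form sample#haplotype#contig; the function name and the example names are hypothetical, and the ordering follows Python's string comparison rather than the locale-dependent order of sort.

```python
from typing import Iterable, List


def haplotype_prefixes(
    path_names: Iterable[str], delimiter: str = "#", fields: int = 2
) -> List[str]:
    """Keep the first `fields` delimiter-separated parts of each name, deduplicated."""
    prefixes = {
        delimiter.join(name.split(delimiter)[:fields]) for name in path_names
    }
    return sorted(prefixes)


# Made-up path names in a sample#haplotype#contig style.
names = [
    "HG002#1#chr20",
    "HG002#2#chr20",
    "HG002#1#chr21",
    "CHM13#1#chr20",
]
for prefix in haplotype_prefixes(names):
    print(prefix)
```

Each printed line corresponds to one haplotype, which is the grouping the resulting prefix file provides.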

Lastly, any 1D visualization or 2D layout can also be produced for this graph. Below is an example of a 1D visualization that colors haplotypes based on their ID:

odgi viz -i <graph>.og -o <graph_viz>.png -s '#' -M <prefixes>.tsv -t <n_of_threads>

documentation

odgi includes a variety of tools for analyzing and manipulating large pangenome graphs. Read the full documentation at https://odgi.readthedocs.io/.

multiqc

Since v1.11, MultiQC has an ODGI module. This module only works with output from odgi stats. For more details take a look at the documentation at odgi.readthedocs.io/multiqc.

Citation

Andrea Guarracino*, Simon Heumos*, Sven Nahnsen, Pjotr Prins, Erik Garrison. ODGI: understanding pangenome graphs, Bioinformatics, 2022
*Shared first authorship

Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang. Rapid GPU-Based Pangenome Graph Layout, SC (The International Conference for High Performance Computing, Networking, Storage, and Analysis), 2024

funding sources

odgi has been funded through a variety of mechanisms, including a Wellcome Sanger PhD fellowship and diverse NIH and NSF grants (listed in our paper), as well as funding from the State of Tennessee. Of particular note is the contribution of NLnet to the development of a differential privacy model, "privvg", which supported significant maintenance and development effort in the odgi toolkit.

tests

Unit tests from vg have been ported here and are used to validate the behavior of the algorithms. They can be run via odgi test, which is invoked by:

ctest .

API

odgi::graph_t is a MutablePathDeletableHandleGraph in the generic variation graph handle graph hierarchical API model. As such, it is possible to add, delete, and modify nodes, edges, and paths through the graph. Wherever possible, destructive operations on the graph maintain path validity.

versioning

Each time odgi is built, the current version is inferred via git describe --always --tags. Assuming version.cpp is up to date, odgi version will not only print out the current tagged version, but its release codename, too.
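For readers unfamiliar with the output of git describe, the Python sketch below splits such a string into its parts. It is not part of odgi: the function name is hypothetical, the example strings reuse the v0.X.Y placeholder with a made-up commit count and hash, and the pattern assumes the usual tag-count-ghash shape that git prints when the checkout is ahead of a tag.

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_git_describe(text: str) -> dict:
    """Split `git describe --always --tags` output into its components."""
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match:
        return {
            "base": match["tag"],
            "commits_since_tag": int(match["commits"]),
            "sha": match["sha"],
        }
    # Either the checkout sits exactly on a tag, or (because of --always)
    # no tag was reachable and git printed a bare abbreviated hash.
    return {"base": text, "commits_since_tag": 0, "sha": None}


print(parse_git_describe("v0.X.Y-12-gabc1234"))
print(parse_git_describe("v0.X.Y"))
```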

new release (developers only)

  • Create a new release on GitHub
    • Choose a tag: v0.X.Y
    • Fill the Release title: ODGI v0.X.Y - Miao
    • Fill the Describe this release section
    • Tick This is a pre-release
    • Click Publish release
  • Produce a buildable source tarball, containing code for odgi and all submodules, and upload it to the release.
    • Execute the following instructions:
      mkdir source-tarball
      cd source-tarball
      git clone --recursive https://github.com/pangenome/odgi
      cd odgi
      git fetch --tags origin
      LATEST_TAG="$(git describe --tags `git rev-list --tags --max-count=1`)"
      git checkout "${LATEST_TAG}"
      git submodule update --init --recursive
      mkdir include
      bash scripts/generate_git_version.sh include
      sed 's/execute_process(COMMAND bash/#execute_process(COMMAND bash/g' CMakeLists.txt -i
      rm -Rf .git
      find deps -name ".git" -exec rm -Rf "{}" \;
      cd ..
      mv odgi "odgi-${LATEST_TAG}"
      tar -czf "odgi-${LATEST_TAG}.tar.gz" "odgi-${LATEST_TAG}"
      rm -Rf "odgi-${LATEST_TAG}"
    • Open the (pre-)release created earlier
    • Upload the odgi-v0.X.Y.tar.gz file
    • Remove the tick on This is a pre-release
    • Click Publish release (this will trigger the update on bioconda)

presentations

@AndreaGuarracino and @subwaystation presented odgi at the German Bioinformatics Conference 2021: ODGI - scalable tools for pangenome graphs.

name

odgi is a play on the Italian word "oggi" (/ˈɔd.dʒi/), which means "today". As of 2019, a standard refrain in genomics is that genome graphs will be useful in x years. But, if we make them efficient and scalable, they will be useful today.

Owner

  • Name: pangenome
  • Login: pangenome
  • Kind: organization

graphical pangenomic methods

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Optimized Dynamic Genome/Graph Implementation (ODGI)"
version: "0.6.2"
date-released: "2021-09-15"
abstract: "ODGI, a new suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA variation graphs. ODGI includes tools for detecting complex regions, extracting loci, removing artifacts, exploratory analysis, manipulation, validation, and visualization."
authors:
  - family-names: "Guarracino"
    given-names: "Andrea"
    orcid: "https://orcid.org/0000-0001-9744-131X"
  - family-names: "Heumos"
    given-names: "Simon"
  - family-names: "Nahnsen"
    given-names: "Sven"
  - family-names: "Prins"
    given-names: "Pjotr"
  - family-names: "Garrison"
    given-names: "Erik"
license: MIT
repository-code: "https://github.com/pangenome/odgi"
keywords:
  - pangenome
  - pangenome graph
  - variation graph

GitHub Events

Total
  • Create event: 8
  • Issues event: 22
  • Release event: 3
  • Watch event: 16
  • Issue comment event: 40
  • Push event: 19
  • Pull request event: 18
  • Fork event: 5
Last Year
  • Create event: 8
  • Issues event: 22
  • Release event: 3
  • Watch event: 16
  • Issue comment event: 40
  • Push event: 19
  • Pull request event: 18
  • Fork event: 5

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 57
  • Total pull requests: 72
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 9 days
  • Total issue authors: 46
  • Total pull request authors: 11
  • Average comments per issue: 3.7
  • Average comments per pull request: 0.76
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 15
  • Pull requests: 16
  • Average time to close issues: 5 days
  • Average time to close pull requests: about 9 hours
  • Issue authors: 14
  • Pull request authors: 4
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.63
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • adadiehl (4)
  • sivico26 (3)
  • zhangyixing3 (3)
  • subwaystation (2)
  • yeeus (2)
  • gavinieong (2)
  • ASLeonard (2)
  • splaisan (2)
  • saswat-km (1)
  • hui-liu (1)
  • Shihab-Shahriar (1)
  • jikhashkya (1)
  • Jokendo-collab (1)
  • joehagmann (1)
  • bslbgg (1)
Pull Request Authors
  • AndreaGuarracino (47)
  • subwaystation (8)
  • kojix2 (4)
  • ekg (4)
  • Overcraft90 (3)
  • tonyjie (2)
  • Shihab-Shahriar (2)
  • dpryan79 (2)
  • JervenBolleman (2)
  • sampsyo (1)
  • kdm9 (1)
Top Labels
Issue Labels
help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 0
  • Total maintainers: 2
spack.io: odgi

Optimized dynamic genome/graph implementation. Odgi provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses.

  • Versions: 0
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Stargazers count: 16.1%
Forks count: 17.4%
Average: 22.9%
Dependent packages count: 57.9%
Maintainers (2)
Last synced: 10 months ago

Dependencies

docs/requirements.txt pypi
  • asciidoc *
  • breathe *
  • m2r2 *
.github/workflows/build_and_test_on_push.yml actions
  • actions/checkout v2 composite
.github/workflows/publish_docker_hub.yml actions
  • actions/checkout v2 composite
Dockerfile docker
  • debian bullseye-slim build