hictk

Blazing fast toolkit to work with .hic and .cool files

https://github.com/paulsengroup/hictk

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.9%) to scientific vocabulary

Keywords

3d-genomics bioinformatics cli-application conversion cooler cxx cxx-library cxx17 genomics hi-c hic hictk
Last synced: 4 months ago

Repository

Statistics
  • Stars: 38
  • Watchers: 2
  • Forks: 1
  • Open Issues: 13
  • Releases: 21
Created over 2 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md

hictk


hictk is a blazing fast toolkit to work with .hic and .cool files.

The toolkit consists of a native CLI application and a C++ library running on Linux, macOS, and Windows. hictk offers native IO support for Cooler and .hic files, meaning that its implementation is independent of that of cooler, JuicerTools, or straw.

hictk can also be accessed from several programming languages using one of the following libraries:

  • hictkpy - Python bindings for hictk: read and write .cool and .hic files directly from Python
  • hictkR - R bindings for hictk: read .cool and .hic files directly from R
  • libhictk - The native C++ library that underlies hictk

Features

Supported formats

The CLI application and C++ library are capable of reading and writing files in the following formats:

| Format | Revision   | Read | Write |
| ------ | ---------- | ---- | ----- |
| .cool  | v1-3 (all) | ✅   | ✅ ¹  |
| .mcool | v1-2 (all) | ✅   | ✅ ²  |
| .scool | v1 (all)   | ✅   | ✅ ³  |
| .hic   | v6-9       | ✅   | ✅ ⁴  |

¹ v3 only
² v2 only
³ libhictk only
⁴ v9 only

Supported operations

  • Seamless conversion between Cooler and .hic formats (from hic to cool and vice versa)
  • Uniform interface to query interaction matrices
  • High performance and low memory requirements (see benchmarks in the Supplementary Text from our paper)
  • Easy access to file metadata
  • Create files from interaction pairs or pre-binned interaction counts (e.g. 4DN-DCIC pairs or BEDPE/bedGraph2)
  • Merge interactions from multiple files into a single file (also supports merging files in different formats)
  • Detect (and when possible fix) corrupted files
  • Balance interaction matrices using ICE, SCALE, or VC
  • Create multi-resolution files suitable for visualization with JuiceBox and HiGlass

All the above operations can be performed on both Cooler and .hic files and yield identical results.
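To make the balancing operation listed above more concrete, here is a minimal sketch of iterative correction (ICE) on a small dense, symmetric matrix. It is an illustration only and not hictk's implementation: the function names `ice_weights` and `apply_weights` are hypothetical, the matrix is held fully in memory, and the filtering steps that real tools typically apply (for example masking low-coverage bins) are omitted.

```python
def ice_weights(matrix, max_iters=200, tol=1e-6):
    """Return per-bin weights w such that w[i] * w[j] * matrix[i][j]
    has (approximately) equal row sums. Hypothetical helper, not hictk API."""
    n = len(matrix)
    weights = [1.0] * n
    for _ in range(max_iters):
        # Row sums of the matrix balanced with the current weights
        sums = [
            weights[i] * sum(matrix[i][j] * weights[j] for j in range(n))
            for i in range(n)
        ]
        # Bins without any interaction are ignored (their weight stays 1.0)
        covered = [s for s in sums if s > 0]
        if not covered:
            break
        mean = sum(covered) / len(covered)
        # Stop once every covered row sum is within tol of the mean
        if max(abs(s / mean - 1.0) for s in covered) < tol:
            break
        # Divide each weight by the relative bias of its row
        for i in range(n):
            if sums[i] > 0:
                weights[i] *= mean / sums[i]
    return weights


def apply_weights(matrix, weights):
    """Return the balanced matrix w[i] * w[j] * matrix[i][j]."""
    n = len(matrix)
    return [
        [weights[i] * weights[j] * matrix[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    raw = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]
    w = ice_weights(raw)
    for row in apply_weights(raw, w):
        print([round(x, 3) for x in row], "row sum:", round(sum(row), 3))
```

After convergence every row of the balanced matrix sums to the same value. The weights are only defined up to that common scale, which is why tools differ in how they normalize them.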

Installation

hictk can be installed using containers, bioconda, Conan, or directly from source. Refer to the Installation section in the documentation for more information.

Quickstart

hictk (CLI)

hictk provides the following subcommands:

| Subcommand         | Description                                                                                    |
| ------------------ | ---------------------------------------------------------------------------------------------- |
| balance            | Balance Hi-C files using ICE, SCALE, or VC.                                                    |
| convert            | Convert Hi-C files between different formats.                                                  |
| dump               | Read interactions and other kinds of data from .hic and Cooler files and write them to stdout. |
| fix-mcool          | Fix corrupted .mcool files.                                                                    |
| load               | Build .cool and .hic files from interactions in various text formats.                          |
| merge              | Merge multiple Cooler or .hic files into a single file.                                        |
| metadata           | Print file metadata to stdout.                                                                 |
| rename-chromosomes | Rename chromosomes found in a Cooler file.                                                     |
| validate           | Validate .hic and Cooler files.                                                                |
| zoomify            | Convert single-resolution Cooler and .hic files to multi-resolution by coarsening.             |

Refer to the Quickstart (CLI) and CLI Reference sections in the documentation for more details.
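As a rough illustration of what building a matrix from pairs and coarsening it mean conceptually (the ideas behind the load and zoomify subcommands), the sketch below bins read pairs into pixels and then sums those pixels into larger bins. It does not call hictk and does not reflect its internals: `bin_pairs` and `coarsen` are hypothetical helpers, and the sketch assumes a single chromosome whose bins start at position 0, so chromosome boundaries are not handled.

```python
from collections import Counter


def bin_pairs(pairs, bin_size):
    """Count (pos1, pos2) pairs from one chromosome into upper-triangular
    pixels (bin1, bin2, count). Hypothetical helper, not hictk API."""
    counts = Counter()
    for pos1, pos2 in pairs:
        # Sort the two bin ids so that bin1 <= bin2 (upper triangle)
        bin1, bin2 = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(bin1, bin2)] += 1
    return sorted((b1, b2, n) for (b1, b2), n in counts.items())


def coarsen(pixels, factor):
    """Sum pixels into bins that are `factor` times larger."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    counts = Counter()
    for bin1, bin2, n in pixels:
        # Floor division maps each fine bin to the coarse bin containing it
        counts[(bin1 // factor, bin2 // factor)] += n
    return sorted((b1, b2, n) for (b1, b2), n in counts.items())


if __name__ == "__main__":
    pairs = [(100, 2500), (900, 1200), (1800, 150), (3100, 3900), (2100, 3500)]
    fine = bin_pairs(pairs, bin_size=1000)
    print("1 kbp pixels:", fine)
    print("2 kbp pixels:", coarsen(fine, factor=2))
```

Under these assumptions, coarsening the 1 kbp pixels by a factor of 2 gives the same pixels as binning the original pairs directly at 2 kbp, which is why a multi-resolution file can be derived from a single high-resolution matrix without revisiting the pairs.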

libhictk

libhictk can be installed in various ways, including with Conan and CMake FetchContent. The Quickstart (API) section of the hictk documentation contains further details on how this can be accomplished.

Quickstart (API) also demonstrates the basic functionality offered by libhictk. For more complex examples, refer to the sample programs under the examples/ folder as well as to the source code of hictk.

The public C++ API of hictk is documented in the C++ API Reference section of the hictk documentation.

Citing

If you use hictk or any of its language bindings in your research, please cite the following publication:

Roberto Rossini, Jonas Paulsen, hictk: blazing fast toolkit to work with .hic and .cool files, Bioinformatics, Volume 40, Issue 7, July 2024, btae408, https://doi.org/10.1093/bioinformatics/btae408

BibTeX

```bibtex
@article{hictk,
    author = {Rossini, Roberto and Paulsen, Jonas},
    title = "{hictk: blazing fast toolkit to work with .hic and .cool files}",
    journal = {Bioinformatics},
    volume = {40},
    number = {7},
    pages = {btae408},
    year = {2024},
    month = {06},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btae408},
    url = {https://doi.org/10.1093/bioinformatics/btae408},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/40/7/btae408/58385157/btae408.pdf},
}
```

Owner

  • Name: paulsengroup
  • Login: paulsengroup
  • Kind: organization

Citation (CITATION.cff)

# Copyright (C) 2024 Roberto Rossini <roberros@uio.no>
#
# SPDX-License-Identifier: MIT

cff-version: 1.2.0
message: 'If you use this software, please cite it using the metadata from this file.'
authors:
  - given-names: Roberto
    family-names: Rossini
    orcid: 'https://orcid.org/0000-0003-3096-1470'
    email: roberros@uio.no
    affiliation: 'Department of Biosciences, University of Oslo'
title: hictk
abstract: 'Blazing fast toolkit to work with .hic and .cool files.'
doi: '10.5281/zenodo.8214220'
url: 'https://github.com/paulsengroup/hictk'
repository-code: 'https://github.com/paulsengroup/hictk'
repository-artifact: 'https://github.com/paulsengroup/hictk/pkgs/container/hictk'
type: software
license: MIT
keywords:
  - bioinformatics
  - cxx
  - conversion
  - cooler
  - cli-application
  - hic
  - cxx17
  - cxx-library
  - hictk
preferred-citation:
  type: article
  authors:
  - given-names: Roberto
    family-names: Rossini
    orcid: 'https://orcid.org/0000-0003-3096-1470'
    email: roberros@uio.no
    affiliation: 'Department of Biosciences, University of Oslo'
  - given-names: Jonas
    family-names: Paulsen
    orcid: 'https://orcid.org/0000-0002-7918-5495'
    email: jonas.paulsen@ibv.uio.no
    affiliation: 'Department of Biosciences, University of Oslo'
  doi: '10.1093/bioinformatics/btae408'
  url: 'https://academic.oup.com/bioinformatics/article/40/7/btae408/7698028'
  journal: 'Bioinformatics'
  year: 2024
  month: 06
  title: 'hictk: blazing fast toolkit to work with .hic and .cool files'
  abstract: >
    Hi-C is gaining prominence as a method for mapping genome organization.
    With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial.
    Over the past decade, the .hic and Cooler file formats have become the de-facto standard to store interaction matrices produced by Hi-C experiments in binary format.
    Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively.

    We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance.
    The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries.
    We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.

    The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk.
    Pre-built binaries for Linux and macOS are available on bioconda.
    Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.

GitHub Events

Total
  • Create event: 132
  • Issues event: 18
  • Release event: 9
  • Watch event: 12
  • Delete event: 126
  • Issue comment event: 130
  • Push event: 385
  • Pull request review event: 1
  • Pull request review comment event: 2
  • Pull request event: 236
Last Year
  • Create event: 132
  • Issues event: 18
  • Release event: 9
  • Watch event: 12
  • Delete event: 126
  • Issue comment event: 130
  • Push event: 385
  • Pull request review event: 1
  • Pull request review comment event: 2
  • Pull request event: 236

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 23
  • Total pull requests: 186
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 4 days
  • Total issue authors: 9
  • Total pull request authors: 4
  • Average comments per issue: 0.52
  • Average comments per pull request: 0.67
  • Merged pull requests: 124
  • Bot issues: 0
  • Bot pull requests: 33
Past Year
  • Issues: 7
  • Pull requests: 109
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 4 days
  • Issue authors: 5
  • Pull request authors: 3
  • Average comments per issue: 1.29
  • Average comments per pull request: 0.64
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 31
Top Authors
Issue Authors
  • robomics (35)
  • Nuturetree (4)
  • bskubi (2)
  • paulmenzel (1)
  • sh0rt2l0ng (1)
  • Phlya (1)
  • fubar2 (1)
  • GMFranceschini (1)
  • taojingfen (1)
Pull Request Authors
  • robomics (238)
  • dependabot[bot] (30)
  • pre-commit-ci[bot] (4)
  • Phlya (1)
Top Labels
Issue Labels
enhancement (19) bug (7) testing (4) CI (3) documentation (3) good first issue (2)
Pull Request Labels
enhancement (66) dependencies (38) CI (25) bug (23) github_actions (18) testing (16) documentation (12) python (2)

Dependencies

.github/workflows/codecov.yml actions
  • actions/cache v3 composite
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
.github/workflows/fuzzy-testing.yml actions
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/upload-artifact v3 composite
.github/workflows/macos-ci.yml actions
  • actions/cache v3 composite
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/run-clang-tidy.yml actions
  • actions/cache v3 composite
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
.github/workflows/ubuntu-ci.yml actions
  • actions/cache v3 composite
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
  • actions/github-script v6 composite
.github/workflows/windows-ci.yml actions
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
Dockerfile docker
  • "$BUILD_BASE_IMAGE" latest build
  • "${FINAL_BASE_IMAGE}@${FINAL_BASE_IMAGE_DIGEST}" latest build
.github/workflows/build-dockerfile.yml actions
  • actions/cache/restore v3 composite
  • actions/checkout v4 composite
  • docker/build-push-action v4 composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
  • docker/setup-buildx-action v2 composite
.github/workflows/cache-test-dataset.yml actions
  • actions/cache/restore v3 composite
  • actions/cache/save v3 composite
  • actions/checkout v4 composite