Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: riccardoc95
- License: mit
- Language: C++
- Default Branch: main
- Size: 83 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
HyperC: Scalable Distributed Workflow for Long Reads Self-Correction
HyperC is a high-performance, distributed workflow designed to accelerate long-read self-correction pipelines. It integrates with existing tools like CONSENT to efficiently process third-generation sequencing (TGS) data at scale using a hybrid MPI + OpenMP parallelization strategy.
Developed to address the limitations of slow correction processes in genomic studies, HyperC enables fast and scalable analysis suitable for population-scale datasets.
Features
- Distributed parallelization using MPI across compute nodes.
- Intra-node multithreading using OpenMP.
- Optimized I/O: compressed FASTQ loading (via Zstd) and balanced PAF partitioning.
- Modular design: easily plug in any MSA-based correction tool.
- No parallel programming knowledge is required to run it.
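The balanced PAF partitioning listed above could look roughly like the sketch below. This is not HyperC's actual code: the function name `balanced_partition` and the choice to weight each target read by its number of overlap records are assumptions made for illustration. The idea is to cut the reads into contiguous chunks, one per MPI rank, so that each chunk carries a similar number of PAF records.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of HyperC): split reads into `parts`
// contiguous chunks with roughly equal total weight, where weights[i] is
// assumed to be the number of PAF overlap records for read i.
// Returns parts + 1 boundaries; chunk p covers reads [b[p], b[p + 1]).
std::vector<std::size_t> balanced_partition(
    const std::vector<std::size_t>& weights, std::size_t parts) {
    assert(parts > 0);
    const std::size_t total =
        std::accumulate(weights.begin(), weights.end(), std::size_t{0});
    // Unplaced boundaries default to the end of the read list.
    std::vector<std::size_t> bounds(parts + 1, weights.size());
    bounds[0] = 0;
    std::size_t running = 0;  // weight consumed so far
    std::size_t next = 1;     // index of the next boundary to place
    for (std::size_t i = 0; i < weights.size() && next < parts; ++i) {
        running += weights[i];
        // Close a chunk once the running weight reaches its ideal
        // cumulative share, i.e. running / total >= next / parts.
        while (next < parts && running * parts >= total * next) {
            bounds[next++] = i + 1;
        }
    }
    return bounds;
}
```

For example, with weights `{5, 1, 1, 1, 1, 1}` and two parts, the first chunk holds only the heavy read and the second holds the remaining five, so both chunks carry a weight of 5.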
Use Case
On real datasets (such as NA12878 from the Nanopore WGS Consortium), HyperC significantly reduced correction runtime and scaled efficiently across multiple nodes.
Repository Structure
- main.cpp – Entry point for the HyperC workflow.
- CMakeLists.txt – Build configuration for the project.
- compile.sh, compile_and_run.sh – Scripts for compilation and execution.
- consent.h – Interface for integrating CONSENT's correction module (an example of a correction module).
- robin_hood.h – Efficient hash map implementation used for data structures (needed by the consent.h code).
- utils.h – Utility functions for data handling and decompression.
- .gitmodules – Contains references to submodules:
  - spoa: MSA library based on partial order alignment.
  - Complete-Striped-Smith-Waterman-Library: Optimized Smith-Waterman alignment.
Installation
Prerequisites
- GCC ≥ 10 with OpenMP support
- MPI (e.g. OpenMPI)
- CMake ≥ 3.10
- Zstandard (libzstd)
Clone the Repository
Make sure to clone the repository with submodules to include required dependencies:
```bash
git clone --recurse-submodules https://github.com/riccardoc95/hipesc.git
```
Build
```bash
./compile.sh
```
Or, for building and running:
```bash
./compile_and_run.sh
```
Running the Pipeline
To run the pipeline, you need:
- A FASTQ file with long reads
- A corresponding PAF file (generated via Minimap2)
Example (SLURM):
```bash
srun -n 4 ./hyperc /path/to/input.fq /path/to/input.paf
```
Each MPI rank will process a portion of the input, and correction results will be written to separate output files.
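As a rough illustration of how each rank could pick its portion and name its own output file, consider the sketch below. The helpers `rank_range` and `output_name`, the even block split, and the `.rankN.fasta` naming scheme are all assumptions for illustration and may not match what HyperC does. In an MPI program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`; they are plain parameters here so the snippet compiles without MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: half-open range [first, last) of reads owned by
// `rank` when `n` reads are split as evenly as possible across `size` ranks.
std::pair<std::size_t, std::size_t> rank_range(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t s = static_cast<std::size_t>(size);
    const std::size_t base = n / s;   // reads that every rank receives
    const std::size_t extra = n % s;  // the first `extra` ranks get one more
    const std::size_t first = r * base + (r < extra ? r : extra);
    const std::size_t count = base + (r < extra ? 1 : 0);
    return {first, first + count};
}

// Hypothetical naming scheme: one output file per rank, so ranks never
// write to the same file.
std::string output_name(const std::string& prefix, int rank) {
    return prefix + ".rank" + std::to_string(rank) + ".fasta";
}
```

With 10 reads and 4 ranks, this split assigns 3, 3, 2, and 2 reads to ranks 0 through 3. An even split by read count ignores how many overlaps each read has, which is why a weighted split can balance the work better.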
Plug in Your Own Correction Module
To integrate another MSA-based correction module:
- Implement a C/C++ function that accepts a target read and its overlapping reads.
- Modify consent.h to wrap the new module.
- Recompile with compile.sh.
HyperC will handle all job distribution and parallelization transparently.
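The steps above can be pictured with the following sketch of a pluggable correction function. The `Corrector` type, `identity_corrector`, and `correct_all` are hypothetical names introduced here; the real interface expected by consent.h may differ. The placeholder module returns the target read unchanged, where an actual module would build a multiple sequence alignment from the overlapping reads and derive a consensus.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical plug-in signature: takes a target read and the reads that
// overlap it, and returns the corrected target sequence.
using Corrector = std::function<std::string(
    const std::string& target, const std::vector<std::string>& overlaps)>;

// Placeholder module used only to exercise the interface: it returns the
// target unchanged. A real module would compute an MSA-based consensus.
std::string identity_corrector(const std::string& target,
                               const std::vector<std::string>& /*overlaps*/) {
    return target;
}

// Apply a corrector to every target read assigned to this process.
// The pragma spreads iterations over OpenMP threads when the code is
// compiled with OpenMP enabled; otherwise the loop runs serially.
std::vector<std::string> correct_all(
    const std::vector<std::string>& targets,
    const std::vector<std::vector<std::string>>& overlaps,
    const Corrector& correct) {
    assert(targets.size() == overlaps.size());
    std::vector<std::string> out(targets.size());
    const long long n = static_cast<long long>(targets.size());
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
    for (long long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        out[k] = correct(targets[k], overlaps[k]);
    }
    return out;
}
```

Because each iteration writes only to its own slot in `out`, the loop needs no locking as long as the supplied corrector is itself thread-safe.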
License
HyperC is released under the MIT License.
🔗 Reference
If you use HyperC in your research, please cite:
Ceccaroni, R., Di Rocco, L., Ferraro Petrillo, U., & Brutti, P. (2025). A Distributed Workflow for Long Reads Self-correction. In S. Caino-Lores, D. Zeinalipour, T. D. Doudali, D. E. Singh, G. E. M. Garzón, L. Sousa, … S. Neuwirth (Eds.), Euro-Par 2024: Parallel Processing Workshops (pp. 105–116). Cham: Springer Nature Switzerland.
Owner
- Name: Riccardo Ceccaroni
- Login: riccardoc95
- Kind: user
- Repositories: 1
- Profile: https://github.com/riccardoc95
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite the following paper:"
title: "A Distributed Workflow for Long Reads Self-correction"
authors:
  - family-names: Ceccaroni
    given-names: Riccardo
  - family-names: Di Rocco
    given-names: Lorenzo
  - family-names: Ferraro Petrillo
    given-names: Umberto
  - family-names: Brutti
    given-names: Pierpaolo
date-released: 2025-08-01
doi: "https://doi.org/10.1007/978-3-031-90203-1_10"
GitHub Events
Total
- Public event: 1
- Push event: 23
Last Year
- Public event: 1
- Push event: 23