redundans

Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.

https://github.com/Gabaldonlab/redundans

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 1 DOI reference(s) in README
○
Academic publication links
✓
Committers with academic emails
2 of 9 committers (22.2%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.2%) to scientific vocabulary

Keywords

assembled-contigs assembly bioinformatics closing contigs docker-image fasta gap genome-assembly genomics heterozygous mate-pairs paired-end pipeline polymorphic python scaffolding

Last synced: 6 months ago · JSON representation

Repository

Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.

Basic Info

Host: GitHub
Owner: Gabaldonlab
License: gpl-3.0
Language: C++
Default Branch: master
Homepage:
Size: 63.7 MB

Statistics

Stars: 157
Watchers: 6
Forks: 20
Open Issues: 0
Releases: 4

Topics

assembled-contigs assembly bioinformatics closing contigs docker-image fasta gap genome-assembly genomics heterozygous mate-pairs paired-end pipeline polymorphic python scaffolding

Created almost 11 years ago · Last pushed 10 months ago

Metadata Files

Readme License

README.md

Latest Version Docker Pulls

Redundans
- Prerequisites
- Official conda package
- UNIX installer
- Docker image
- Running the pipeline
- Parameters
- Test run
- Support
- Citation

Redundans

Redundans pipeline assists an assembly of heterozygous genomes.
Program takes as input assembled contigs, sequencing libraries and/or reference sequence and returns scaffolded homozygous genome assembly. Final assembly should be less fragmented and with total size smaller than the input contigs. In addition, Redundans will automatically close the gaps resulting from genome assembly or scaffolding.

The pipeline consists of several steps (modules):
1. de novo contig assembly (optional if no contigs are given) 2. redundancy reduction: detection and selective removal of redundant contigs from an initial de novo assembly 3. scaffolding: joining of genome fragments using paired-end reads, mate-pairs, long reads and/or reference chromosomes 4. gap closing: filling the gaps after scaffolding using paired-end and/or mate-pair reads

Redundans is: - fast & lightweight, multi-core support and memory-optimised, so it can be run even on the laptop for small-to-medium size genomes - flexible toward many sequencing technologies (Illumina, 454, Sanger, PacBio & Nanopore) and library types (paired-end, mate pairs, fosmids, long reads) - modular: every step can be omitted or replaced by other tools - reliable: it has been already used to improve genome assemblies varying in size (several Mb to several Gb) and complexity (fungal, animal & plants)

For more information have a look at the documentation, poster, publication, test dataset or manual.

Prerequisites

Redundans uses several programs (all except the interpreters and its submodules are provided within this repository):

| Resource | Type | Version | | :--- | :--- | :--- | | Python | Language interpreter | <3.11, ≥ 3.8 | | Platanus | Genome assembler | v1.2.4 | | Miniasm | Genome assembler | ≥ v0.3 (r179) | | Minimap2 | Sequence aligner | ≥ v2.2.4 (r1122) | | LAST | Sequence aligner | ≥ v800 | | BWA | Sequence aligner | ≥ v0.7.12 | | SNAP aligner | Sequence aligner | v2.0.1 | | SSPACE3 | Scaffolding software | v3.0 | | GapCloser | Gapclosing software | v1.12 | | GFAstats | Stats software | ≥ v1.3.6 | | Meryl | K-mer counter software | ≥ v1.3 | | Merqury | Assembly evaluation software | v1.3 | | k8 | Javascript shell based on V8 | v0.2.4 | | R | Language interpreter | ≥ 3.6 | | ggplot2| R package | ≥ 3.3.2 | | scales | R package | ≥ 3.3.2 | | argparser | R package | ≥ 3.6 |

WARNING: Some of the third-party requirements are provided precompiled in x86_64 and not readily available for other architectures.

On most Linux distros, the installation should be as easy as: git clone --recursive https://github.com/Gabaldonlab/redundans/ cd redundans && bin/.compile.sh

If it fails, make sure you have below dependencies installed: - Perl [SSPACE3] - make, gcc & g++ [BWA, GFAstats, Miniasm & LAST] ie. sudo apt-get install make gcc g++ - zlib including zlib.h headers [BWA] ie. sudo apt-get install zlib1g-dev - R ≥ 3.6 and additional packages [ggplot2, scales, argparser] for plotting the Merqury results. - optionally for additional plotting numpy and matplotlib ie. sudo -H pip install -U matplotlib numpy

For user convenience, we provide UNIX installer and Docker image, that can be used instead of manually installation.

Official conda package

If you are familiar with conda, this will be by far the easiest way of installing redundans: ```bash

create new Python3 >=3.8,<3.11 environment

conda create -n redundans python=3.10

activate it

conda activate redundans

and install redundans

conda install -c bioconda redundans ```

UNIX installer

UNIX installer will automatically fetch, compile and configure Redundans together with all dependencies. It should work on all modern Linux systems, given Python >= 3, commonly used programmes (ie. wget, make, curl, git, perl, gcc, g++, ldconfig) and libraries (zlib including zlib.h) are installed. bash source <(curl -Ls https://github.com/Gabaldonlab/redundans/raw/master/INSTALL.sh)

Docker image

First, you need to install docker: wget -qO- https://get.docker.com/ | sh
Then, you can run the test example by executing: ```bash

Pull the image directly from dockerhub

docker pull cgenomics/redundans:latest

process the data inside the image - all data will be lost at the end

docker run -it -w /root/src/redundans cgenomics/redundans:latest ./redundans.py -v -i test/{600,5000}_{1,2}.fq.gz -f test/contigs.fa -o test/run1

if you wish to process local files, you need to mount the volume with -v

make sure you are in redundans repo directory (containing test/ directory)

docker run -v pwd/test:/test:rw -it cgenomics/redundans:latest /root/src/redundans/redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1 ```

Singularity image

Redundans is also supported by singularity. First install singularity.

You can either use our singularity repository to build the image or to build the image out of the docker image. Then run the first example: ```

Pull from the singularity repo

singularity pull --arch amd64 library://cgenomics/redundans/redundans:2.0

Build the image based on the docker repo

singularity build redundans.sif docker://cgenomics/redundans

Use exec instead of run to account for shell-based wildcarsds * and ?

singularity exec redundans.sif bash -c "/root/src/redundans/redundans.py -v -i /root/src/redundans/test/*_?.fq.gz -f /root/src/redundans/test/contigs.fa -o /tmp/run1" ```

Running the pipeline

Redundans input consists of any combination of: - assembled contigs (FastA) - paired-end and/or mate pairs reads (FastQ) - *long reads** (FastQ/FastA) - both PacBio and Nanopore are supported for the scaffolding - and/or *reference chromosomes/contigs** (FastA). * gzipped files are also accepted.

Redundans will return homozygous genome assembly in scaffolds.filled.fa (FastA). It will also report the heterozygous contigs that were not discarded during the reduction step. In addition, the program reports statistics for every pipeline step, including number of contigs that were removed, GC content, N50, N90 and size of gap regions.

Parameters

For the user convenience, Redundans is equipped with a wrapper that automatically estimates run parameters and executes all steps/modules. You should specify some sequencing libraries (FastA/FastQ) or reference sequence (FastA) in order to perform scaffolding. If you don't specify -f contigs (FastA), Redundans will assemble contigs de novo, but you'll have to provide paired-end and/or mate pairs reads (FastQ). Most of the pipeline parameters can be adjusted manually (default values are given in square brackets []):
HINT: If you run fails, you may try to resume it, by adding --resume parameter. - General options: -h, --help show this help message and exit -v, --verbose verbose --version show program's version number and exit -i FASTQ, --fastq FASTQ FASTQ PE / MP files -f FASTA, --fasta FASTA FASTA file with contigs / scaffolds -o OUTDIR, --outdir OUTDIR output directory [redundans] -t THREADS, --threads THREADS no. of threads to run [4] --resume resume previous run --log LOG output log to [stderr] --nocleaning De novo assembly options: -m MEM, --mem MEM max memory to allocate (in GB) for the Platanus assembler [2] --tmp TMP tmp directory [/tmp] - Reduction options: --identity IDENTITY min. identity [0.51] --overlap OVERLAP min. overlap [0.80] --minLength MINLENGTH min. contig length [200] --minimap2reduce Use minimap2 for the initial and final Reduction step. Recommended for input assembled contigs from long reads or larger contigs using --preset[asm5] by default. By default LASTal is used for Reduction. -x INDEX, --index INDEX Minimap2 parameter -i used to load at most INDEX target bases into RAM for indexing [4G]. It has to be provided as a string INDEX ending with k/K/m/M/g/G. --noreduction Skip reduction - Short-read scaffolding options: -j JOINS, --joins JOINS min pairs to join contigs [5] -a LINKRATIO, --linkratio LINKRATIO max link ratio between two best contig pairs [0.7] --limit LIMIT align subset of reads [0.2] -q MAPQ, --mapq MAPQ min mapping quality [10] --iters ITERS iterations per library [2] --noscaffolding Skip short-read scaffolding -b, --usebwa use bwa mem for alignment [use snap-aligner] - Long-read scaffolding options: -l LONGREADS, --longreads LONGREADS FastQ/FastA files with long reads -s, --populateScaffolds Run populateScaffolds mode for long read scaffolding, else generate a dirty assembly for reference-based scaffolding. Not recommended for highly repetitive genomes. Default False. --minimap2scaffold Use Minimap2 for aligning long reads. Preset usage dependant on file name convention (case insensitive): ont, nanopore, pb, pacbio, hifi, hi_fi, hi-fi. ie: s324_nanopore.fq.gz. Else it uses LASTal. - Reference-based scaffolding options: -r REFERENCE, --reference REFERENCE reference FastA file --norearrangements high identity mode (rearrangements not allowed) -p PRESET, --preset PRESET Preset option for Minimap2-based Reduction and/or Reference-based scaffolding. Possible options: asm5 (5 percent sequence divergence), asm10 (10 percent sequence divergence) and asm20(20 percent sequence divergence). Default [asm5] - Gap closing options: --nogapclosing - Meryl and Merqury options: --runmerqury Run meryldb and merqury for assembly kmer multiplicity stats. [False] by default. -k KMER, --kmer KMER K-mer size for meryl [21]

Redundans is extremely flexible. All steps of the pipeline can be ommited using: --noreduction, --noscaffolding, --nogapclosing and/or --runmerqury parameters.

Test run

To run the test example, execute: ```bash ./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa -o test/run1

Test it using minimap2 for the reduction step, increasing performance for large genomes

./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa --minimap2reduce -o test/run2

if your run failed for any reason, you can try to resume it

rm test/run1/sspace.2.1.filled.fa ./redundans.py -v -i test/*?.fq.gz -f test/contigs.fa -o test/run1 --resume

if you have no contigs assembled, just run without `-f`

./redundans.py -v -i test/*_?.fq.gz -o test/run.denovo ```

Note, the order of libraries (-i/--input) is not important, as long as read1 and read2 from each library are given one after another i.e. -i 600_1.fq.gz 600_2.fq.gz 5000_1.fq.gz 5000_2.fq.gz would be interpreted the same as -i 5000_1.fq.gz 5000_2.fq.gz 600_1.fq.gz 600_2.fq.gz.

You can play with any combination of inputs ie. paired-end, mate pairs, long reads and / or reference-based scaffolding as well as selecting minimap2 for each step or default to LASTal, for example: ```bash

reduction, scaffolding with paired-end, mate pairs and long reads used to generate a miniasm assembly to do reference-based scaffolding, and gap closing with paired-end and mate pairs using as an aligner minimap2

./redundans.py -v -i test/*?.fq.gz -l test/nanopore.fa.gz -f test/contigs.fa -o test/runshortlongref --minimap2scaffold

reduction, scaffolding with paired-end, mate pairs and long reads, and gap closing with paired-end and mate pairs using populateScaffolds method using as aligner minimap2

./redundans.py -v -i test/*?.fq.gz -l test/pacbio.fq.gz test/nanopore.fa.gz -f test/contigs.fa -o test/runshortlongpopulatescaffold --minimap2scaffold --populateScaffolds

scaffolding and gap closing with paired-end and mate pairs (no reduction)

./redundans.py -v -i test/*?.fq.gz -f test/contigs.fa -o test/runshort-scaffolding-closing --noreduction

reduction, reference-based scaffolding and gap closing with paired-end reads (--noscaffolding disables only short-read scaffolding)

./redundans.py -v -i test/600?.fq.gz -r test/ref.fa -f test/contigs.fa -o test/runref_pe-closing --noscaffolding ```

For more details have a look in test directory.

Support

If you have any issues or doubts check documentation and FAQ (Frequently Asked Questions). You may want also to sign to our forum.

Citation

Leszek P. Pryszcz and Toni Gabaldón (2016) Redundans: an assembly pipeline for highly heterozygous genomes. NAR. doi: 10.1093/nar/gkw294

Owner

Name: Gabaldon lab
Login: Gabaldonlab
Kind: organization
Email: gabaldonlab@gmail.com
Location: Barcelona

Website: http://cgenomics.org/
Twitter: gabaldonlab
Repositories: 23
Profile: https://github.com/Gabaldonlab

Github repository of Gabaldon lab

GitHub Events

Total

Issues event: 7
Watch event: 22
Delete event: 1
Issue comment event: 13
Push event: 3
Pull request event: 2
Create event: 1

Last Year

Issues event: 7
Watch event: 22
Delete event: 1
Issue comment event: 13
Push event: 3
Pull request event: 2
Create event: 1

Committers

Last synced: 9 months ago

All Time

Total Commits: 565
Total Committers: 9
Avg Commits per committer: 62.778
Development Distribution Score (DDS): 0.34

Past Year

Commits: 5
Committers: 1
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
lpryszcz	l**z@g**m	373
Leszek Pryszcz	l**z@i**0	131
Diego Fuentes Palacios	4****a	44
dfuentes	d**s@g**m	12
Will Aoki	w**i@u**u	1
Jason Stajich	j**h@u**u	1
GitHub Action	a**n@g**m	1
lpryszcz	l**z@s**l	1
Leszek	l**z@m**s	1

Committer Domains (Top 20 + Academic)

solaris.iimcb.gov.pl: 1 github.com: 1 ucr.edu: 1 umnh.utah.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 107
Total pull requests: 7
Average time to close issues: 8 months
Average time to close pull requests: about 1 year
Total issue authors: 78
Total pull request authors: 3
Average comments per issue: 2.17
Average comments per pull request: 0.14
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 1
Average time to close issues: 5 months
Average time to close pull requests: 20 minutes
Issue authors: 4
Pull request authors: 1
Average comments per issue: 2.75
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

mictadlo (9)
lpryszcz (8)
ptranvan (4)
aabaricalla (3)
thackl (2)
pwkooij (2)
AMMMachado (2)
eburgueno (2)
DaniPaulo (2)
Kata-Pa (2)
waoki (2)
apelin20 (2)
maxmaronna (2)
jwasmuth (1)
martin-g (1)

Pull Request Authors

Dfupa (5)
waoki (2)
mictadlo (1)

Top Labels

Issue Labels

enhancement (14) bug (5) stale (4) question (3) close (2) Thanks (2) closed (2) wontfix (1) noreply (1) invalid (1) duplicate (1) help wanted (1)

Pull Request Labels

enhancement (1)

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2

spack.io: redundans

Redundans pipeline assists an assembly of heterozygous genomes.

Homepage: https://github.com/Gabaldonlab/redundans
License: []
Latest release: 0.13c
published almost 4 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Stargazers count: 18.0%

Forks count: 20.5%

Average: 23.9%

Dependent packages count: 57.3%

Last synced: 6 months ago

Dependencies

.github/workflows/gitclone.yml actions

actions/checkout v2 composite
ad-m/github-push-action master composite

.github/workflows/stale.yml actions

actions/stale v3 composite

Dockerfile docker

python 3.9.1 build

.github/workflows/docker-image.yml actions

actions/checkout v4 composite

.github/workflows/pylint2black.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite

.github/workflows/unittesting.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite

redundans

Science Score: 49.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

Table of Contents

Redundans

Prerequisites

WARNING: Some of the third-party requirements are provided precompiled in x86_64 and not readily available for other architectures.

Official conda package

create new Python3 >=3.8,<3.11 environment

activate it

and install redundans

UNIX installer

Docker image

Pull the image directly from dockerhub

process the data inside the image - all data will be lost at the end

if you wish to process local files, you need to mount the volume with -v

make sure you are in redundans repo directory (containing test/ directory)

Singularity image

Pull from the singularity repo

Build the image based on the docker repo

Use exec instead of run to account for shell-based wildcarsds * and ?

Running the pipeline

Parameters

Test run

Test it using minimap2 for the reduction step, increasing performance for large genomes

if your run failed for any reason, you can try to resume it

if you have no contigs assembled, just run without -f

reduction, scaffolding with paired-end, mate pairs and long reads used to generate a miniasm assembly to do reference-based scaffolding, and gap closing with paired-end and mate pairs using as an aligner minimap2

reduction, scaffolding with paired-end, mate pairs and long reads, and gap closing with paired-end and mate pairs using populateScaffolds method using as aligner minimap2

scaffolding and gap closing with paired-end and mate pairs (no reduction)

reduction, reference-based scaffolding and gap closing with paired-end reads (--noscaffolding disables only short-read scaffolding)

Support

Citation

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

spack.io: redundans

Rankings

Dependencies

if you have no contigs assembled, just run without `-f`