jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics

https://github.com/tanghaibao/jcvi

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    7 of 25 committers (28.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary

Keywords

allmaps assembly bioinformatics blast comparative-genomics genetic-maps genome-sequencing genomics sequence-alignments synteny variant-calling

Keywords from Contributors

telomere
Last synced: 4 months ago

Repository

Python library to facilitate genome assembly, annotation, and comparative genomics

Basic Info
  • Host: GitHub
  • Owner: tanghaibao
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 18.2 MB
Statistics
  • Stars: 844
  • Watchers: 35
  • Forks: 195
  • Open Issues: 56
  • Releases: 0
Topics
allmaps assembly bioinformatics blast comparative-genomics genetic-maps genome-sequencing genomics sequence-alignments synteny variant-calling
Created about 15 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md

JCVI: A Versatile Toolkit for Comparative Genomics Analysis

Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

| | |
| ------- | ---------------------------------------------------------------- |
| Authors | Haibao Tang (tanghaibao) |
| | Vivek Krishnakumar (vivekkrish) |
| | Adam Taranto (Adamtaranto) |
| | Xingtan Zhang (tangerzhang) |
| | Won Cheol Yim (wyim-pgl) |
| Email | tanghaibao@gmail.com |
| License | BSD |

How to cite

[!TIP] JCVI is now published in iMeta!

Tang et al. (2024) JCVI: A Versatile Toolkit for Comparative Genomics Analysis. iMeta

MCSCAN example

ALLMAPS animation

GRABSEEDS example

Contents

The following modules provide generic bioinformatics handling methods.

  • algorithms

    • Linear programming solver with SCIP and GLPK.
    • Supermap: find set of non-overlapping anchors in BLAST or NUCMER output.
    • Longest or heaviest increasing subsequence.
    • Matrix operations.
  • apps

    • GenBank entrez accession, Phytozome, Ensembl and SRA downloader.
    • Calculate (non)synonymous substitution rate between gene pairs.
    • Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.
    • Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, etc.
  • formats

Currently supports .ace format (phrap, cap3, etc.), .agp (goldenpath), .bed format, .blast output, .btab format, .coords format (nucmer output), .fasta format, .fastq format, .fpc format, .gff format, .obo format (ontology), .psl format (UCSC blat, GMAP, etc.), .posmap format (Celera assembler output), .sam format (read mapping), .contig format (TIGR assembly format), etc.

  • graphics

    • BLAST or synteny dot plot.
    • Histogram using R and ASCII art.
    • Paint regions on set of chromosomes.
    • Macro-synteny and micro-synteny plots.
    • Ribbon plots from whole genome alignments.
  • utils

    • Grouper can be used as disjoint set data structure.
    • range contains common range operations, like overlap and chaining.
    • Miscellaneous cookbook recipes, iterators, decorators, table utilities.
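
To make the "longest increasing subsequence" item under algorithms concrete, here is a minimal, self-contained sketch based on patience sorting with binary search. It is not JCVI's own implementation, and the function name `longest_increasing_subsequence` is a hypothetical one chosen for illustration.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`."""
    tails = []        # tails[k] = index of the smallest tail among runs of length k + 1
    tail_values = []  # values at those indices, kept sorted so bisect can be used
    prev = [-1] * len(values)  # back-pointers used to rebuild the subsequence

    for i, v in enumerate(values):
        k = bisect_left(tail_values, v)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_values.append(v)
        else:
            tails[k] = i
            tail_values[k] = v

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tails[-1] if tails else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    return result[::-1]
```

For example, `longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6])` returns `[1, 4, 5, 6]`. A "heaviest" variant would maximize a sum of per-element weights instead of the element count and needs a different update rule than the one sketched here.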

Then there are modules that contain domain-specific methods.

  • assembly

    • K-mer histogram analysis.
    • Preparation and validation of tiling path for clone-based assemblies.
    • Scaffolding through ALLMAPS, optical map and genetic map.
    • Pre-assembly and post-assembly QC procedures.
  • annotation

    • Training of ab initio gene predictors.
    • Calculate gene, exon and intron statistics.
    • Wrapper for PASA and EVM.
    • Launch multiple MAKER processes.
  • compara

    • C-score based BLAST filter.
    • Synteny scan (de-novo) and lift over (find nearby anchors).
    • Ancestral genome reconstruction using Sankoff's and PAR methods.
    • Ortholog and tandem gene duplicates finder.
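
As a rough illustration of the "K-mer histogram analysis" item under assembly, the sketch below counts every k-mer in a set of sequences and then tallies how many distinct k-mers occur at each multiplicity. It is a toy, pure-Python version for small inputs, not the JCVI code path; `kmer_histogram` is a hypothetical name, and the sketch does not merge reverse-complement k-mers.

```python
from collections import Counter


def kmer_histogram(sequences, k):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if "N" not in kmer:  # skip k-mers that span ambiguous bases
                counts[kmer] += 1
    # Invert the table: how many distinct k-mers share each count.
    return Counter(counts.values())
```

For `kmer_histogram(["ACGTACGT"], 4)`, the 4-mer `ACGT` occurs twice and three other 4-mers occur once, so the result is `{1: 3, 2: 1}`.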

Applications

Please visit the wiki for full-fledged applications.

Dependencies

JCVI requires Python3 between v3.9 and v3.12.
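
A quick way to check an interpreter against this range is sketched below. It assumes that "between v3.9 and v3.12" is inclusive at both ends; the constants and the function name `python_version_supported` are hypothetical and not part of JCVI.

```python
import sys

# Assumed inclusive bounds, taken from the requirement stated above.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def python_version_supported(version_info=None):
    """Return True if major.minor falls within the documented range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    print(python_version_supported())
```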

Some graphics modules require the ImageMagick library.

On macOS this can be installed using Conda (see next section). If you are using a Linux system (e.g. Ubuntu), you can install ImageMagick using apt-get:

```bash
sudo apt-get update
sudo apt-get install libmagickwand-dev
```

See the Wand docs for instructions on installing ImageMagick on other systems.

A few modules may ask for locations of external programs, if the executable cannot be found in your PATH.

The external programs that are often used are:

Managing dependencies with Conda

You can use the YAML files in this repo to create an environment with basic JCVI dependencies.

If you are new to Conda, we recommend the Miniforge distribution.

```bash
conda env create -f environment.yml

conda activate jcvi
```

Note: If you are using a Mac with an ARM64 (Apple Silicon) processor, some dependencies are not currently available from Bioconda for this architecture.

You can instead create a virtual OSX64 (Intel) environment like this:

```bash
conda env create -f env_osx64.yml

conda activate jcvi-osx64
```

After activating the Conda environment, install JCVI using one of the following options.

Installation

Installation options

1) Use pip to install the latest development version directly from this repo.

```bash
pip install git+https://github.com/tanghaibao/jcvi.git
```

2) Install the latest release from PyPI.

```bash
pip install jcvi
```

3) Alternatively, install in development mode.

```bash
git clone https://github.com/tanghaibao/jcvi.git && cd jcvi
pip install -e '.[tests]'
```

Test Installation

If installed successfully, you can check the version with:

```bash
jcvi --version
```
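
The same check can be done from inside Python using the standard library's `importlib.metadata`. This is a small sketch that assumes the installed distribution is named `jcvi`, matching the PyPI package name used above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="jcvi"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "jcvi is not installed in this environment")
```

If this reports a version but the `jcvi` command is not found, the package may be installed in an environment whose scripts directory is not on your PATH.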

Usage

Use python -m to call any of the modules installed with JCVI.

Most of the modules in this package contain multiple actions. Taking the fasta module as an example:

```console
Usage:
    python -m jcvi.formats.fasta ACTION


Available ACTIONs:
    clean       | Remove irregular chars in FASTA seqs
    diff        | Check if two fasta records contain same information
    extract     | Given fasta file and seq id, retrieve the sequence in fasta format
    fastq       | Combine fasta and qual to create fastq file
    filter      | Filter the records by size
    format      | Trim accession id to the first space or switch id based on 2-column mapping file
    fromtab     | Convert 2-column sequence file to FASTA format
    gaps        | Print out a list of gap sizes within sequences
    gc          | Plot G+C content distribution
    identical   | Given 2 fasta files, find all exactly identical records
    ids         | Generate a list of headers
    info        | Run sequence_info on fasta files
    ispcr       | Reformat paired primers into isPcr query format
    join        | Concatenate a list of seqs and add gaps in between
    longestorf  | Find longest orf for CDS fasta
    pair        | Sort paired reads to .pairs, rest to .fragments
    pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs
    pool        | Pool a bunch of fastafiles together and add prefix
    qual        | Generate dummy .qual file based on FASTA file
    random      | Randomly take some records
    sequin      | Generate a gapped fasta file for sequin submission
    simulate    | Simulate random fasta file for testing
    some        | Include or exclude a list of records (also performs on .qual file if available)
    sort        | Sort the records by IDs, sizes, etc.
    summary     | Report the real no of bases and N's in fasta files
    tidy        | Normalize gap sizes and remove small components in fasta
    translate   | Translate CDS to proteins
    trim        | Given a cross_match screened fasta, trim the sequence
    trimsplit   | Split sequences at lower-cased letters
    uniq        | Remove records that are the same
```

To use a single action, just run:

```console
python -m jcvi.formats.fasta extract
```

This will tell you the options and arguments it expects.
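
The "one module, many ACTIONs" layout can be pictured as a small table that maps action names to functions. The sketch below is only an illustration of that pattern and is not JCVI's own dispatcher; `ACTIONS`, `usage`, `dispatch`, and the two toy actions are hypothetical names that merely echo `ids` and `summary` from the listing above.

```python
def ids(lines):
    """Toy action: list FASTA header IDs (text after '>' up to the first space)."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def summary(lines):
    """Toy action: count sequence characters and N's across all records."""
    seq = "".join(line.strip() for line in lines if not line.startswith(">"))
    return {"bases": len(seq), "Ns": seq.upper().count("N")}


# Action name -> (function, one-line help text).
ACTIONS = {
    "ids": (ids, "Generate a list of headers"),
    "summary": (summary, "Report the number of bases and N's"),
}


def usage():
    """Build the 'Available ACTIONs' help text from the table."""
    width = max(len(name) for name in ACTIONS)
    rows = [
        f"    {name:<{width}} | {help_text}"
        for name, (_, help_text) in ACTIONS.items()
    ]
    return "Available ACTIONs:\n" + "\n".join(rows)


def dispatch(action, lines):
    """Run the named action, or raise with the usage text if it is unknown."""
    if action not in ACTIONS:
        raise ValueError(usage())
    func, _ = ACTIONS[action]
    return func(lines)
```

With `lines = [">seq1 desc", "ACGTN", ">seq2", "NNAC"]`, `dispatch("ids", lines)` returns `["seq1", "seq2"]` and `dispatch("summary", lines)` returns `{"bases": 9, "Ns": 3}`. Keeping the help text next to each function means the usage listing is generated from the same table that drives dispatch.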

Feel free to check out other scripts in the package; it is not just for FASTA.

Owner

  • Name: Haibao Tang
  • Login: tanghaibao
  • Kind: user
  • Location: San Francisco Bay Area

Genomics data monkey, hacking on human genetics and diverse agricultural crops

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: JCVI
message: >-
  JCVI: A versatile toolkit for comparative genomics
  analysis
type: software
authors:
  - given-names: Haibao
    family-names: Tang
    email: tanghaibao@gmail.com
    orcid: 'https://orcid.org/0000-0002-3460-8570'
    affiliation: >-
      Fujian Provincial Key Laboratory of Haixia Applied
      Plant Systems Biology
  - given-names: Vivek
    family-names: Krishnakumar
    affiliation: 'J. Craig Venter Institute, Rockville, Maryland, USA'
  - given-names: Xiaofei
    family-names: Zeng
    affiliation: 'National Key Laboratory for Tropical Crop Breeding,'
  - given-names: Zhougeng
    family-names: Xu
    affiliation: >-
      National Key Laboratory of Plant Molecular Genetics
      (NKLPMG)
  - given-names: Adam
    family-names: Taranto
    affiliation: The University of Melbourne
  - given-names: Johnathan S.
    family-names: Lomas
    affiliation: University of Nevada
  - given-names: Yixing
    family-names: Zhang
    affiliation: >-
      Fujian Provincial Key Laboratory of Haixia Applied
      Plant Systems Biology
  - given-names: Yumin
    family-names: Huang
    affiliation: >-
      Fujian Provincial Key Laboratory of Haixia Applied
      Plant Systems Biology
  - given-names: Yibin
    family-names: Wang
    affiliation: National Key Laboratory for Tropical Crop Breeding
  - given-names: 'Won Cheol '
    family-names: Yim
    affiliation: University of Nevada
  - given-names: Jisen
    family-names: Zhang
    affiliation: >-
      State Key Lab for Conservation and Utilization of
      Subtropical Agro-Biological Resources
  - given-names: Xingtan
    family-names: Zhang
    email: tanger_009@163.com
    affiliation: National Key Laboratory for Tropical Crop Breeding
identifiers:
  - type: doi
    value: 10.1002/imt2.211
    description: iMeta paper
repository-code: 'https://github.com/tanghaibao/jcvi'
url: 'https://github.com/tanghaibao/jcvi'
abstract: >-
  The life cycle of genome builds spans interlocking pillars
  of assembly, annotation, and comparative genomics to drive
  biological insights. The JCVI library is a versatile
  Python-based library that offers a suite of tools that
  excel across these pillars. Featuring a modular design,
  the JCVI library provides high-level utilities for tasks
  such as format parsing, graphics generation, and
  manipulation of genome assemblies and annotations.
  Supporting genomics algorithms like MCscan and ALLMAPS are
  widely employed in building genome releases, producing
  publication-ready figures for quality assessment and
  evolutionary inference.
license: BSD-2-Clause
commit: 0f9292d
version: v1.5.3
date-released: '2025-03-25'
preferred-citation:
  type: article
  authors:
    - given-names: Haibao
      family-names: Tang
      email: tanghaibao@gmail.com
      orcid: 'https://orcid.org/0000-0002-3460-8570'
      affiliation: >-
        Fujian Provincial Key Laboratory of Haixia Applied
        Plant Systems Biology
    - given-names: Vivek
      family-names: Krishnakumar
      affiliation: 'J. Craig Venter Institute, Rockville, Maryland, USA'
    - given-names: Xiaofei
      family-names: Zeng
      affiliation: 'National Key Laboratory for Tropical Crop Breeding,'
    - given-names: Zhougeng
      family-names: Xu
      affiliation: >-
        National Key Laboratory of Plant Molecular Genetics
        (NKLPMG)
    - given-names: Adam
      family-names: Taranto
      affiliation: The University of Melbourne
    - given-names: Johnathan S.
      family-names: Lomas
      affiliation: University of Nevada
    - given-names: Yixing
      family-names: Zhang
      affiliation: >-
        Fujian Provincial Key Laboratory of Haixia Applied
        Plant Systems Biology
    - given-names: Yumin
      family-names: Huang
      affiliation: >-
        Fujian Provincial Key Laboratory of Haixia Applied
        Plant Systems Biology
    - given-names: Yibin
      family-names: Wang
      affiliation: National Key Laboratory for Tropical Crop Breeding
    - given-names: 'Won Cheol '
      family-names: Yim
      affiliation: University of Nevada
    - given-names: Jisen
      family-names: Zhang
      affiliation: >-
        State Key Lab for Conservation and Utilization of
        Subtropical Agro-Biological Resources
    - given-names: Xingtan
      family-names: Zhang
      email: tanger_009@163.com
      affiliation: National Key Laboratory for Tropical Crop Breeding
  doi: "10.1002/imt2.211"
  journal: "iMeta"
  month: 6
  start: e211 # First page number
  title: "JCVI: A versatile toolkit for comparative genomics analysis"
  issue: 4
  volume: 3
  year: 2024

GitHub Events

Total
  • Issues event: 77
  • Watch event: 84
  • Delete event: 28
  • Issue comment event: 151
  • Push event: 128
  • Pull request review event: 19
  • Pull request review comment event: 10
  • Pull request event: 54
  • Fork event: 7
  • Create event: 34
Last Year
  • Issues event: 77
  • Watch event: 84
  • Delete event: 28
  • Issue comment event: 151
  • Push event: 128
  • Pull request review event: 19
  • Pull request review comment event: 10
  • Pull request event: 54
  • Fork event: 7
  • Create event: 34

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 2,967
  • Total Committers: 25
  • Avg Commits per committer: 118.68
  • Development Distribution Score (DDS): 0.109
Past Year
  • Commits: 61
  • Committers: 3
  • Avg Commits per committer: 20.333
  • Development Distribution Score (DDS): 0.098
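
The all-time figures above are consistent with two simple ratios. The sketch below assumes the commonly used definition of the Development Distribution Score, namely one minus the share of commits made by the most active committer, and takes that committer's count (2,644) from the Top Committers table. The function names are hypothetical.

```python
def average_commits(total_commits, total_committers):
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return 1 - top_committer_commits / total_commits


# All-time figures quoted in this section.
print(round(average_commits(2967, 25), 2))                   # 118.68
print(round(development_distribution_score(2644, 2967), 3))  # 0.109
```

The past-year average follows the same way (61 commits over 3 committers gives roughly 20.333). A score close to 0 indicates that most commits come from a single committer.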
Top Committers
Name Email Commits
Haibao Tang t****o@g****m 2,644
Vivek Krishnakumar v****r@j****g 209
Jingping Li j****i@g****m 54
Adam Taranto a****o@g****m 8
Haibao Tang h****o@a****m 7
Jingping Li j****g@m****l 6
Chen Tong c****y@1****m 5
zengxiaofei x****g@w****n 5
xuzhougeng x****g@y****t 4
goertzenlr g****n@a****u 3
Jingping Li j****g@l****u 3
l-Imoon 5****n 2
Tiany s****4 2
Won Cheol Yim a****o@d****u 2
Haibao Tang h****g@l****u 2
MichelMoser m****r@i****h 2
peng xu 6****k 1
msarmien m****n@j****g 1
Haibao Tang h****g@l****u 1
Jingping Li j****g@h****u 1
jlomasunr 8****r 1
dwpeng 1****4@q****m 1
anonymousdouble 1****e 1
Peng Zhou z****i@g****m 1
Song Jun Tae 5****l 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 211
  • Total pull requests: 133
  • Average time to close issues: 7 months
  • Average time to close pull requests: about 12 hours
  • Total issue authors: 163
  • Total pull request authors: 7
  • Average comments per issue: 2.68
  • Average comments per pull request: 0.29
  • Merged pull requests: 124
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 53
  • Pull requests: 51
  • Average time to close issues: 24 days
  • Average time to close pull requests: about 24 hours
  • Issue authors: 39
  • Pull request authors: 3
  • Average comments per issue: 1.47
  • Average comments per pull request: 0.57
  • Merged pull requests: 44
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Adamtaranto (11)
  • alpapan (3)
  • SoledadPianist (3)
  • cutykaka (3)
  • MR-D-CJ (3)
  • tanghaibao (3)
  • RNieuwenhuis (3)
  • Wongolini (3)
  • francicco (3)
  • jd3234 (2)
  • xiekunwhy (2)
  • guo-cheng (2)
  • UpalabdhaD (2)
  • xiaoguizz (2)
  • haihao999 (2)
Pull Request Authors
  • tanghaibao (136)
  • Adamtaranto (27)
  • Tong-Chen (6)
  • zhangyixing3 (2)
  • jguppy (1)
  • anonymousdouble (1)
  • dwpeng (1)
Top Labels
Issue Labels
  • new feature (6)
  • help needed (5)
  • TODO (4)
  • backlog (2)
  • github action (2)
  • infrastructure (2)
  • algorithm (1)
  • graphics (1)
  • bug (1)
Pull Request Labels
  • bug (4)
  • new feature (2)
  • github action (2)
  • infrastructure (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 670 last-month
  • Total docker downloads: 121
  • Total dependent packages: 1
  • Total dependent repositories: 3
  • Total versions: 126
  • Total maintainers: 1
  • Total advisories: 1
pypi.org: jcvi

Python utility libraries on genome assembly, annotation and comparative genomics

  • Versions: 126
  • Dependent Packages: 1
  • Dependent Repositories: 3
  • Downloads: 670 Last month
  • Docker Downloads: 121
Rankings
Stargazers count: 2.5%
Docker downloads count: 3.2%
Forks count: 3.8%
Dependent packages count: 4.8%
Average: 5.1%
Downloads: 7.6%
Dependent repos count: 9.0%
Maintainers (1)
Last synced: 5 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite