drep

Rapid comparison and dereplication of genomes

https://github.com/mrolm/drep

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Committers with academic emails
    10 of 19 committers (52.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Keywords

assembly bioinformatics metagenomics microbial-genomes microbiology

Keywords from Contributors

genome bacteria phylogenetics
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: MrOlm
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 16.5 MB
Statistics
  • Stars: 300
  • Watchers: 7
  • Forks: 40
  • Open Issues: 18
  • Releases: 6
Topics
assembly bioinformatics metagenomics microbial-genomes microbiology
Created over 9 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog

README.md

dRep


dRep is a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.
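To make the idea of de-replication concrete, here is a toy sketch in plain Python. It is not dRep's implementation: the similarity values, the quality scores, the 0.95 threshold, and the single-linkage grouping are all illustrative assumptions. It only shows the general pattern of grouping highly similar genomes and keeping one representative per group.

```python
from itertools import combinations


def dereplicate(genomes, similarity, scores, threshold=0.95):
    """Toy de-replication: group similar genomes, keep the best-scoring one.

    `similarity` maps (genome_a, genome_b) pairs to a value in [0, 1];
    missing pairs count as 0.0. `scores` maps each genome to a quality
    score. Both inputs and the threshold are hypothetical.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Single-linkage grouping: join any pair at or above the threshold.
    for a, b in combinations(genomes, 2):
        sim = similarity.get((a, b), similarity.get((b, a), 0.0))
        if sim >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)

    # One representative per cluster: the member with the highest score.
    return sorted(max(members, key=lambda g: scores[g])
                  for members in clusters.values())


genomes = ["A", "B", "C", "D"]
similarity = {("A", "B"): 0.99, ("A", "C"): 0.80,
              ("B", "C"): 0.81, ("C", "D"): 0.97}
scores = {"A": 90, "B": 95, "C": 70, "D": 85}

winners = dereplicate(genomes, similarity, scores)
print(winners)  # ['B', 'D']
```

In this example A and B form one group and C and D another, so the higher-scoring member of each (B and D) is kept.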

The manual, installation instructions, and API documentation are available at ReadTheDocs

Publication is available at ISMEJ

Open source pre-print publication is available at bioRxiv

Installation with pip

$ pip install drep

Quick start

Genome comparison:

$ dRep compare output_directory -g path/to/genomes/*.fasta

Genome de-replication:

$ dRep dereplicate output_directory -g path/to/genomes/*.fasta

Make sure dependencies are properly installed:

$ dRep check_dependencies
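The same commands can be driven from a Python script. The sketch below only assembles the command lines shown above and expands the genome glob itself, since no shell is involved. The helper names `build_drep_command` and `run_drep` are hypothetical and are not part of dRep.

```python
import glob
import subprocess


def build_drep_command(mode, output_dir, genome_glob):
    """Build the argument list for `dRep compare` or `dRep dereplicate`.

    Hypothetical helper: it mirrors the quick-start commands and nothing more.
    """
    if mode not in ("compare", "dereplicate"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # Expand the pattern here because subprocess does not use a shell.
    genomes = sorted(glob.glob(genome_glob))
    if not genomes:
        raise FileNotFoundError(f"no genomes match {genome_glob!r}")
    return ["dRep", mode, output_dir, "-g", *genomes]


def run_drep(mode, output_dir, genome_glob):
    """Run dRep and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_drep_command(mode, output_dir, genome_glob), check=True)
```

Calling `run_drep("dereplicate", "output_directory", "path/to/genomes/*.fasta")` would then be equivalent to the de-replication command above, assuming dRep is installed and on the PATH.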

Dependencies

Near Essential

  • Mash - Makes primary clusters (v1.1.1 confirmed works)
  • MUMmer - Performs default ANIm comparison method (v3.23 confirmed works)

Optional

  • fastANI - A fast secondary clustering algorithm
  • CheckM - Determines contamination and completeness of genomes (v1.0.7 confirmed works)
  • gANI (aka ANIcalculator) - Performs gANI comparison method (v1.0 confirmed works)
  • Prodigal - Used by both CheckM and gANI (v2.6.3 confirmed works)
  • NSimScan - Only needed for goANI algorithm (open source version of gANI)
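
Before running dRep, it can help to see which of the tools listed above are visible on the PATH. The sketch below is a generic check built on the standard library, separate from `dRep check_dependencies`. The executable names in `TOOLS` are assumptions about how each tool is typically installed and may differ on a given system.

```python
import shutil

# Assumed executable names for each dependency; adjust to match the local install.
TOOLS = {
    "Mash": "mash",
    "MUMmer": "nucmer",
    "fastANI": "fastANI",
    "CheckM": "checkm",
    "gANI": "ANIcalculator",
    "Prodigal": "prodigal",
    "NSimScan": "nsimscan",
}


def find_tools(tools=TOOLS):
    """Return {tool name: resolved path, or None if not found on PATH}."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


for name, path in find_tools().items():
    print(f"{name:10s} {path or 'not found'}")
```

This only reports whether an executable is found, not whether its version matches the ones listed as confirmed working.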

Owner

  • Name: Matt Olm
  • Login: MrOlm
  • Kind: user
  • Location: San Francisco, CA
  • Company: Stanford University

Postdoc in Justin Sonnenburg's lab at Stanford

GitHub Events

Total
  • Issues event: 61
  • Watch event: 54
  • Issue comment event: 110
  • Push event: 4
  • Pull request event: 3
  • Fork event: 2
Last Year
  • Issues event: 61
  • Watch event: 54
  • Issue comment event: 110
  • Push event: 4
  • Pull request event: 3
  • Fork event: 2

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 261
  • Total Committers: 19
  • Avg Commits per committer: 13.737
  • Development Distribution Score (DDS): 0.632
Past Year
  • Commits: 10
  • Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
Matt Olm m****m@b****u 96
Matt Olm m****m@M****l 76
Matt Olm m****m@g****m 41
Matt Olm m****m@M****l 23
Tanaes j****n@g****m 3
Matt Olm m****m@a****u 3
Ben J Woodcroft d****n@g****m 2
Valentyn Bezshapkin 6****z 2
Mike Lee l****d@u****u 2
Matt Olm m****m@s****u 2
Matt Olm m****m@a****u 2
Matt Olm m****m@a****u 2
Asaf Peer A****r@j****g 1
Matt Olm m****m@a****u 1
Matt Olm m****m@a****u 1
Matt Olm m****m@a****u 1
Malte Rühlemann m****n@g****m 1
Francisco Zorrilla f****4@c****k 1
Bérénice Batut b****t@g****m 1
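
The All Time summary figures can be cross-checked against the Top Committers table. The sketch below assumes that the Development Distribution Score is one minus the share of commits belonging to the single largest committer entry. That definition is not stated on this page, although it does reproduce the listed value.

```python
# Commit counts copied from the Top Committers table, one entry per row.
commits = [96, 76, 41, 23, 3, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]

total_commits = sum(commits)
total_committers = len(commits)
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (largest entry's commits / total commits).
dds = 1 - max(commits) / total_commits

print(total_commits, total_committers, round(avg_commits, 3), round(dds, 3))
# 261 19 13.737 0.632
```

Note that the table counts each distinct email address as a separate committer, so the same person can appear in several rows.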

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 173
  • Total pull requests: 13
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 2 days
  • Total issue authors: 142
  • Total pull request authors: 10
  • Average comments per issue: 3.95
  • Average comments per pull request: 0.62
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 39
  • Pull requests: 4
  • Average time to close issues: 14 days
  • Average time to close pull requests: about 8 hours
  • Issue authors: 32
  • Pull request authors: 3
  • Average comments per issue: 1.79
  • Average comments per pull request: 0.25
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • B-1991-ing (5)
  • SilasK (5)
  • jianshu93 (3)
  • Chandrasekaran-J (3)
  • Wanli-HE (2)
  • sleepvet (2)
  • pengfeimm (2)
  • mewu3 (2)
  • aretchless (2)
  • nick-youngblut (2)
  • jdwinkler-lanzatech (2)
  • aistBMRG (2)
  • rrohwer (2)
  • franciscozorrilla (2)
  • luigallucci (2)
Pull Request Authors
  • tanaes (3)
  • wwood (2)
  • ShriramHPatel (2)
  • ParsaGhadermazi (2)
  • rroutsong (2)
  • MrOlm (2)
  • aboffin (1)
  • bebatut (1)
  • franciscozorrilla (1)
  • valentynbez (1)
Top Labels
Issue Labels
enhancement (8) bug (5) documentation (4)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 258 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 67
  • Total maintainers: 2
pypi.org: drep

De-replication of microbial genomes assembled from multiple samples

  • Versions: 65
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 258 Last month
Rankings
Stargazers count: 4.8%
Forks count: 6.8%
Dependent packages count: 10.0%
Average: 11.0%
Downloads: 11.9%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 7 months ago
spack.io: py-drep

dRep is a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Stargazers count: 16.4%
Forks count: 18.6%
Average: 23.1%
Dependent packages count: 57.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • biopython *
  • matplotlib *
  • numpy *
  • pandas *
  • pytest *
  • scikit-learn *
  • seaborn *
setup.py pypi
  • biopython *
  • matplotlib *
  • numpy *
  • pandas *
  • pytest *
  • scikit-learn *
  • seaborn *
  • tqdm *