cb-platon
Identification & characterization of bacterial plasmid-borne contigs from short-read draft assemblies.
Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 18 DOI reference(s) in README
- ✓ Academic publication links: Links to pubmed.ncbi, ncbi.nlm.nih.gov, zenodo.org
- ✓ Committers with academic emails: 3 of 4 committers (75.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: oschwengers
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://doi.org/10.1099/mgen.0.000398
- Size: 37.4 MB
Statistics
- Stars: 118
- Watchers: 5
- Forks: 15
- Open Issues: 4
- Releases: 10
Metadata Files
README.md
Platon: identification and characterization of bacterial plasmid contigs from short-read draft assemblies
Description
TL;DR Platon detects plasmid-borne contigs within bacterial draft (meta)genome assemblies. To this end, Platon analyzes the distribution bias of protein-coding gene families among chromosomes and plasmids. This analysis is complemented by comprehensive contig characterizations followed by heuristic filters.
Platon conducts three analysis steps:
- It predicts protein sequences and searches them against a custom, pre-computed database comprising marker protein sequences (MPS) and related replicon distribution scores (RDS). These scores express the empirically measured bias of protein sequence family distributions among plasmids and chromosomes, pre-computed on complete NCBI RefSeq replicons. Platon calculates the mean RDS for each contig and classifies it as chromosome if the RDS is below a sensitivity cutoff set for 95% sensitivity, or as plasmid if the RDS is above a specificity cutoff set for 99.9% specificity. Exact values for these thresholds were computed from Monte Carlo simulations of artificial replicon fragments created from complete RefSeq chromosome and plasmid sequences.
- Contigs passing the sensitivity filter are comprehensively characterized. In this step, Platon tries to circularize the contig sequences, searches for rRNA, replication, mobilization and conjugation genes, oriT sequences and incompatibility group DNA probes, and finally performs a BLAST+ search against the NCBI plasmid database.
- Finally, to increase the overall sensitivity, Platon classifies all remaining contigs with several heuristics based on the gathered information.
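The first step can be sketched in a few lines of Python. This is an illustrative sketch, not Platon's actual code: the function names `mean_rds` and `pre_classify` are hypothetical, the cutoffs are passed in as parameters, and how ORFs without a database hit enter the mean is not stated here, so the caller decides which scores to pass. Treating a contig without any scores as undecided is likewise an assumption.

```python
from statistics import mean


def mean_rds(protein_scores):
    """Return the mean RDS of a contig, or None if no scores are available."""
    return mean(protein_scores) if protein_scores else None


def pre_classify(rds, sensitivity_cutoff, specificity_cutoff):
    """Three-way call based on the mean RDS alone.

    Contigs that fall between the two cutoffs (or have no score at all,
    an assumption of this sketch) stay undecided and would be passed on
    to the characterization and heuristic steps.
    """
    if rds is None:
        return "undecided"
    if rds < sensitivity_cutoff:
        return "chromosome"
    if rds > specificity_cutoff:
        return "plasmid"
    return "undecided"
```

A contig whose mean RDS lies between the two cutoffs is left open on purpose; only the later steps decide its final class.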
Fig: Replicon distribution and alignment hit frequencies of MPS. Shown are summed plasmid and chromosome alignment hit frequencies per MPS plotted against plasmid/chromosome hit count ratios scaled to [-1 (chromosome), 1 (plasmid)]. Hue: normalized RDS values (min=-100, max=100); hit count outliers below 10^-4 and above 1 are discarded for the sake of readability.
Input/Output
Input
Platon accepts draft (meta)genome assemblies in FASTA format. If contigs have been assembled with SPAdes, Platon is able to extract the coverage information from the contig names.
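SPAdes names its contigs in the form `NODE_<n>_length_<len>_cov_<coverage>`, which is what makes the coverage recoverable from the name. The following minimal sketch shows one way to parse such names; the regular expression and the function name `parse_spades_header` are assumptions for illustration and are not taken from Platon's source.

```python
import re

# SPAdes contig names look like "NODE_1_length_5000_cov_12.5".
SPADES_HEADER = re.compile(r"^NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_spades_header(contig_id):
    """Return (length, coverage) for a SPAdes-style contig ID, else None."""
    match = SPADES_HEADER.match(contig_id)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))
```

Contig names from other assemblers simply yield `None`, so a caller can fall back to treating the coverage as unknown.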
Output
For each contig classified as plasmid sequence the following columns are printed to STDOUT as tab separated values:
- Contig ID
- Length
- Coverage
- # ORFs
- RDS
- Circularity
- Incompatibility Type(s)
- # Replication Genes
- # Mobilization Genes
- # OriT Sequences
- # Conjugation Genes
- # rRNA Genes
- # Plasmid Database Hits
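For downstream scripting, the tab-separated output can be mapped onto named fields. The sketch below assumes the column order listed above; the short key names and the `has_header` switch are choices made for this example, and whether the output starts with a header line should be checked against a real run.

```python
import csv

# Keys follow the column order listed above; the names are made up for this sketch.
COLUMNS = [
    "id", "length", "coverage", "orfs", "rds", "circular", "inc_types",
    "replication_genes", "mobilization_genes", "orit_sequences",
    "conjugation_genes", "rrna_genes", "plasmid_hits",
]


def read_platon_tsv(handle, has_header=True):
    """Yield one dict per plasmid contig from tab-separated Platon output.

    All values are kept as strings; convert them as needed.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line, if the file has one
    for row in reader:
        if row:  # ignore empty lines
            yield dict(zip(COLUMNS, row))
```

The function accepts any iterable of lines, so it works both on an open file and on captured STDOUT wrapped in `io.StringIO`.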
In addition, Platon writes the following files into the output directory:
- `<prefix>.plasmid.fasta`: contigs classified as plasmid or of plasmid origin
- `<prefix>.chromosome.fasta`: contigs classified as of chromosomal origin
- `<prefix>.tsv`: dense information as printed to STDOUT (see above)
- `<prefix>.json`: comprehensive results and information on each single plasmid contig

All files are prefixed (`<prefix>`) with the name of the input genome FASTA file.
Installation
Platon can be installed via BioConda or Pip. However, we encourage using Conda, which automatically installs all required 3rd party dependencies. In all cases, the mandatory database must be downloaded.
BioConda
```bash
$ conda install -c conda-forge -c bioconda -c defaults platon
```
Pip
```bash
$ python3 -m pip install --user cb-platon
```
Platon requires the following 3rd party executables which must be installed & executable:
- Prodigal (2.6.3) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2848648 https://github.com/hyattpd/Prodigal
- Diamond (2.0.6) https://pubmed.ncbi.nlm.nih.gov/25402007 http://www.diamondsearch.org
- Blast+ (2.10.1) https://www.ncbi.nlm.nih.gov/pubmed/2231712 https://blast.ncbi.nlm.nih.gov
- MUMmer (4.0.0-beta2) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC395750/ https://github.com/gmarcais/mummer
- HMMER (3.3.1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695513/ http://hmmer.org/
- INFERNAL (1.1.4) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810854 http://eddylab.org/infernal
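After a Pip installation it can help to verify that the external tools are reachable before starting a run. The sketch below checks the `PATH` with `shutil.which`. The executable names are assumptions: they are typical binaries shipped with the listed tools, not a list confirmed from Platon's code.

```python
import shutil

# Assumed executable names, one typical binary per listed tool:
# Prodigal, Diamond, Blast+, MUMmer, HMMER, INFERNAL.
REQUIRED_TOOLS = ["prodigal", "diamond", "blastn", "nucmer", "hmmsearch", "cmscan"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is installed; this check says nothing about tool versions.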
Database download
Platon requires a mandatory database, which is publicly hosted at Zenodo. Further information is provided in the Database section below.
```bash
$ wget https://zenodo.org/record/4066768/files/db.tar.gz
$ tar -xzf db.tar.gz
$ rm db.tar.gz
```
The database path can be provided either via a parameter (`--db`) or via an environment variable (`PLATON_DB`):
```bash
$ platon --db <db-path>
$ export PLATON_DB=<db-path>
```
Additionally, for a system-wide setup, the database can be copied to the Platon base directory:
```bash
$ cp -r db/ <platon-installation-dir>
```
Usage
Usage:
```bash
usage: platon [--db DB] [--prefix PREFIX] [--output OUTPUT] [--mode {sensitivity,accuracy,specificity}] [--characterize] [--meta] [--help] [--verbose] [--threads THREADS] [--version]
Identification and characterization of bacterial plasmid contigs from short-read draft assemblies.
Input / Output:
Workflow:
  --mode {sensitivity,accuracy,specificity}, -m {sensitivity,accuracy,specificity}
                        applied filter mode: sensitivity: RDS only (>= 95% sensitivity); specificity: RDS only (>=99.9% specificity); accuracy: RDS & characterization heuristics (highest accuracy) (default = accuracy)
  --characterize, -c    deactivate filters; characterize all contigs
  --meta                use metagenome gene prediction mode

General:
  --help, -h            Show this help message and exit
  --verbose, -v         Print verbose information
  --threads THREADS, -t THREADS
                        Number of threads to use (default = number of available CPUs)
  --version             show program's version number and exit
```
Examples
Simple:
```bash
$ platon genome.fasta
```
Expert: writing results to results directory with verbose output using 8 threads:
```bash
$ platon --db ~/db --output results/ --verbose --threads 8 genome.fasta
```
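When Platon is driven from a Python pipeline, the same calls can be assembled programmatically. The sketch below only uses options that appear in the usage text above; the helper names `build_platon_command` and `run_platon` are hypothetical and not part of Platon.

```python
import subprocess


def build_platon_command(genome, db=None, output=None, threads=None,
                         mode=None, verbose=False):
    """Assemble a platon argument list from options shown in the usage text."""
    cmd = ["platon"]
    if db:
        cmd += ["--db", str(db)]
    if output:
        cmd += ["--output", str(output)]
    if mode:
        cmd += ["--mode", mode]
    if threads:
        cmd += ["--threads", str(threads)]
    if verbose:
        cmd.append("--verbose")
    cmd.append(str(genome))  # the genome FASTA file goes last
    return cmd


def run_platon(genome, **options):
    """Run platon and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_platon_command(genome, **options), check=True)
```

Because the command is passed as a list and no shell is involved, a path such as `~/db` is not expanded automatically; apply `os.path.expanduser` to it first.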
Mode
Platon provides three different modes controlling which filters will be used.
Accuracy mode is the preset default.
Sensitivity
In the sensitivity mode Platon will classify all contigs with an RDS value below the sensitivity threshold as chromosomal and all remaining contigs as plasmid. This threshold was defined to account for 95% sensitivity and computed via Monte Carlo simulations of artificial contigs, resulting in an RDS of -7.9.
-> use this mode to exclude chromosomal contigs.
Specificity
In the specificity mode Platon will classify all contigs with an RDS value above the specificity threshold as plasmid and all remaining contigs as chromosomal. This threshold was defined to account for 99.9% specificity and computed via Monte Carlo simulations of artificial contigs, resulting in an RDS of 0.7.
Accuracy (default)
In the accuracy mode Platon will classify all contigs with:
- an `RDS` value below the sensitivity threshold as chromosomal
- an `RDS` value above the specificity threshold as plasmid

In addition, it classifies as plasmid all contigs for which one of the following is true. The contig:
- can be circularized
- has an incompatibility group sequence
- has a replication or mobilization HMM hit
- has an oriT hit
- has an RDS above the conservative score (0.1), a RefSeq plasmid hit and no rRNA hit
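The three modes can be summarized as a single decision function. The following is a minimal sketch built from the thresholds and rules stated in this section, not Platon's implementation. The dictionary keys are made up for the example, the handling of values exactly on a threshold is an assumption, and so is letting the sensitivity filter take precedence over the heuristics in accuracy mode.

```python
SENSITIVITY_CUTOFF = -7.9  # RDS threshold for 95% sensitivity
SPECIFICITY_CUTOFF = 0.7   # RDS threshold for 99.9% specificity
CONSERVATIVE_SCORE = 0.1   # conservative score used by the last heuristic


def is_plasmid(contig, mode="accuracy"):
    """Return True if a contig would be called plasmid in the given mode.

    `contig` is a dict with an "rds" value and optional evidence fields;
    the key names are hypothetical and chosen for this sketch only.
    """
    rds = contig["rds"]
    if mode == "sensitivity":
        return rds >= SENSITIVITY_CUTOFF
    if mode == "specificity":
        return rds > SPECIFICITY_CUTOFF
    if mode != "accuracy":
        raise ValueError(f"unknown mode: {mode}")

    if rds < SENSITIVITY_CUTOFF:   # fails the sensitivity filter
        return False
    if rds > SPECIFICITY_CUTOFF:   # passes the specificity filter
        return True
    # Remaining contigs: any single piece of evidence is sufficient.
    return bool(
        contig.get("circular", False)
        or contig.get("inc_types", 0) > 0
        or contig.get("replication_hits", 0) > 0
        or contig.get("mobilization_hits", 0) > 0
        or contig.get("orit_hits", 0) > 0
        or (
            rds > CONSERVATIVE_SCORE
            and contig.get("plasmid_hits", 0) > 0
            and contig.get("rrna_hits", 0) == 0
        )
    )
```

For example, a contig with an RDS of 0.0 and one oriT hit is called plasmid in accuracy mode, while the same contig without any evidence is called chromosomal.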
Database
Platon depends on a custom database based on MPS, RDS, the RefSeq plasmid database and the PlasmidFinder database, as well as manually curated MOB HMM models from MOBscan, custom conjugation and replication HMM models, and oriT sequences from MOB-suite. This database, based on UniProt UniRef90 release 202, can be downloaded here (zipped 1.6 Gb, unzipped 2.8 Gb):
https://zenodo.org/record/4066768/files/db.tar.gz
Please make sure that you use the latest Platon version along with the most recent database version! Older software versions are **not** compatible with the latest database version.
Dependencies
Platon was developed and tested in Python 3.5 and depends on BioPython (>=1.71).
Additionally, it depends on the following 3rd party executables:
- Prodigal (2.6.3) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2848648 https://github.com/hyattpd/Prodigal
- Diamond (2.0.6) https://pubmed.ncbi.nlm.nih.gov/25402007 http://www.diamondsearch.org
- Blast+ (2.10.1) https://www.ncbi.nlm.nih.gov/pubmed/2231712 https://blast.ncbi.nlm.nih.gov
- MUMmer (4.0.0-beta2) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC395750/ https://github.com/gmarcais/mummer
- HMMER (3.3.1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695513/ http://hmmer.org/
- INFERNAL (1.1.4) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810854 http://eddylab.org/infernal
Citation
Schwengers O., Barth P., Falgenhauer L., Hain T., Chakraborty T., & Goesmann A. (2020). Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores. Microbial Genomics, 95, 295. https://doi.org/10.1099/mgen.0.000398
As Platon takes advantage of the inc groups, MOB HMMs and oriT sequences of the following databases, please also cite:
Carattoli A., Zankari E., Garcia-Fernandez A., Voldby Larsen M., Lund O., Villa L., Aarestrup F.M., Hasman H. (2014) PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrobial Agents and Chemotherapy, https://doi.org/10.1128/AAC.02412-14
Garcillán-Barcia M. P., Redondo-Salvo S., Vielva L., de la Cruz F. (2020) MOBscan: Automated Annotation of MOB Relaxases. Methods in Molecular Biology, https://doi.org/10.1007/978-1-4939-9877-7_21
Robertson J., Nash J. H. E. (2018) MOB-suite: Software Tools for Clustering, Reconstruction and Typing of Plasmids From Draft Assemblies. Microbial Genomics, https://doi.org/10.1099/mgen.0.000206
Feedback
We highly welcome and appreciate feedback of all kinds!
So, if you run into any issues with Platon, we'd be happy to hear about it! Please, start the pipeline with -v (verbose) and do not hesitate to file an issue here on GitHub including as much of the following as possible:
- a detailed description of the issue
- the platon cmd line output
- the `<prefix>.json` file if possible
- a reproducible example of the issue with a small dataset that you can share (helps us identify whether the issue is specific to a particular computer, operating system, and/or dataset).
The maintenance of Platon is supported by de.NBI. If you would like to provide (non-technical) feedback, please find a service monitoring survey here.
Owner
- Name: Oliver Schwengers
- Login: oschwengers
- Kind: user
- Location: Giessen, Germany
- Company: @ag-computational-bio - JLU Giessen
- Twitter: oschwengers1
- Repositories: 6
- Profile: https://github.com/oschwengers
Microbial bioinformatics, WGS bacteria, plasmids, PostDoc, father of 2, husband, astrophotographer
GitHub Events
Total
- Issues event: 1
- Watch event: 6
- Push event: 3
Last Year
- Issues event: 1
- Watch event: 6
- Push event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| oschwengers | o****s@c****e | 191 |
| fLLah | p****h@c****e | 1 |
| Michael R. Crusoe | 1****c | 1 |
| Francisco Zorrilla | f****4@c****k | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 42
- Total pull requests: 5
- Average time to close issues: about 2 months
- Average time to close pull requests: about 8 hours
- Total issue authors: 38
- Total pull request authors: 4
- Average comments per issue: 3.48
- Average comments per pull request: 0.4
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tthye1 (2)
- crarlus (2)
- Dey497 (2)
- Wanli-HE (2)
- lorcan1601 (1)
- Dabiguina94 (1)
- mdiricks (1)
- pavlo888 (1)
- androga2 (1)
- elina2410 (1)
- bayraktar1 (1)
- noeldjitro (1)
- Clabe1986 (1)
- tdcollingsworth (1)
- ZhangDengwei (1)
Pull Request Authors
- oschwengers (2)
- mr-c (1)
- patrick-barth (1)
- franciscozorrilla (1)
Packages
- Total packages: 2
- Total downloads:
  - pypi 65 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 18
- Total maintainers: 1
proxy.golang.org: github.com/oschwengers/platon
- Documentation: https://pkg.go.dev/github.com/oschwengers/platon#section-documentation
- License: gpl-3.0
- Latest release: v1.5.0 (published over 5 years ago)
pypi.org: cb-platon
Platon: identification and characterization of bacterial plasmid contigs from short-read draft assemblies.
- Homepage: https://github.com/oschwengers/platon
- Documentation: https://cb-platon.readthedocs.io/
- License: GPLv3
- Latest release: 1.4.0 (published over 5 years ago)
Dependencies
- biopython >=1.78
- blast >=2.12.0
- diamond >=2.0.14
- hmmer >=3.3.1
- infernal >=1.1.4
- mummer4 >=4.0.0rc1
- prodigal >=2.6.3
- biopython *
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- pypa/gh-action-pypi-publish master composite