https://github.com/broadinstitute/plasmid-detection-benchmark
Supporting analysis code for "Circling in on plasmids: benchmarking plasmid detection and reconstruction tools for short-read data from diverse species"
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: broadinstitute
- License: mit
- Language: Python
- Default Branch: main
- Size: 2.4 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Circling in on plasmids: benchmarking plasmid detection and reconstruction tools for short-read data from diverse species
Supporting analysis code for "Circling in on plasmids: benchmarking plasmid detection and reconstruction tools for short-read data from diverse species".
The predictions of plasmid detection and reconstruction tools, assembly statistics, other plasmid information (such as plasmid size, ARG presence/absence and the number of transposases), and ANI between chromosomes and plasmids are in ./data/.
Python scripts used for analysis are in ./scripts/ and include:
| Category | Script | Description |
| -------- | ------- | ----------- |
| Dataset | dataset.args.fragmentation.py | Compares the number of IS, SR contigs, and plasmid length between plasmids with ARGs and plasmids without ARGs |
| Dataset | dataset.plasmidlength.py | Plots the distribution of plasmid lengths in samples with complete hybrid assemblies |
| Dataset | dataset.ani.py | Plots heatmaps of ANI between chromosomes and plasmids. Also clusters plasmids based on their alignment fraction |
| Dataset | dataset.plsdb.py | Compares the Mash identity to the closest PLSDB plasmids for different taxa |
| Detection | detection.metrics.py | Calculates plasmid detection metrics from a set of predictions and plots the results |
| Detection | detection.glm.py | Fits a logistic regression model to estimate the contribution of certain plasmid and assembly features to plasmid detection |
| Detection | detection.srcontiglength.py | Compares the plasmid detection metrics for SR contigs of different lengths |
| Detection | detection.repcluster.py | Compares the plasmid detection metrics for plasmids of different Inc types/rep clusters |
| Detection | detection.args.py | Compares the plasmid detection metrics for SR contigs with and without ARGs |
| Detection | detection.plasmidsize.py | Compares the plasmid detection metrics for plasmids of different sizes (large vs. small) |
| Reconstruction | reconstruction.metrics.py | Calculates plasmid reconstruction metrics from a set of predictions and plots the results |
| Reconstruction | reconstruction.glm.py | Fits a linear regression model to estimate the contribution of certain assembly features to plasmid reconstruction |
| Reconstruction | reconstruction.metricsbest_detector.py | Calculates plasmid reconstruction metrics from a set of predictions, using initial contig classifications from the best plasmid detection tools, and plots the results |
| Reconstruction | reconstruction.args.py | Compares plasmid reconstruction metrics for plasmids with and without ARGs |
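To give a sense of what "plasmid detection metrics" can mean at the contig level, here is a minimal sketch that computes precision, recall and F1 from binary plasmid/chromosome calls. It is an illustration only: the function name `detection_metrics`, the contig IDs and the choice of metrics are assumptions and are not taken from `detection.metrics.py`, which may define or weight its metrics differently.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    precision: float
    recall: float
    f1: float


def detection_metrics(truth, predicted):
    """Compute precision, recall and F1 for binary plasmid calls.

    `truth` and `predicted` map a contig ID to True (plasmid) or
    False (chromosome). Only contigs present in both mappings are scored.
    This is a hypothetical helper, not the repository's implementation.
    """
    shared = truth.keys() & predicted.keys()
    tp = sum(1 for c in shared if truth[c] and predicted[c])
    fp = sum(1 for c in shared if not truth[c] and predicted[c])
    fn = sum(1 for c in shared if truth[c] and not predicted[c])

    # Guard against empty denominators when a class is never called or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return DetectionMetrics(precision, recall, f1)


if __name__ == "__main__":
    # Made-up example labels, for illustration only.
    truth = {"c1": True, "c2": True, "c3": False, "c4": False}
    predicted = {"c1": True, "c2": False, "c3": True, "c4": False}
    print(detection_metrics(truth, predicted))
```

Scoring only the contigs shared by both mappings keeps the sketch simple; a real benchmark would also need to decide how to treat contigs that a tool leaves unclassified.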
You can run each script individually or all at once with the script in ./scripts/run-everything.
The exact package versions used to generate the results presented in the manuscript are listed, in machine-readable format, in pkgs.versions.txt.
Usage
Clone this repository and navigate into it:

    git clone https://github.com/broadinstitute/plasmid-detection-benchmark
    cd plasmid-detection-benchmark

Create a new virtual environment and install the required packages. If you are using conda:

    conda create --name plasmid-env --file requirements.yaml
    conda activate plasmid-env

Run all scripts with:

    sh scripts/run-everything

Or run individual scripts with:

    python scripts/dataset.ani.py

You can check the command line arguments for each script with `python script.name.py --help`.
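If you want to review the command line arguments of every script in one pass, the `--help` call can be looped over the scripts directory. The sketch below is a convenience wrapper and not part of the repository: the helper name `help_commands` is hypothetical, and it assumes it is run from the repository root with the analysis scripts stored as `*.py` files in `scripts/`.

```python
import subprocess
import sys
from pathlib import Path


def help_commands(scripts_dir="scripts"):
    """Build one `python <script> --help` command per Python script found.

    Hypothetical helper: only files ending in .py are included, so the
    run-everything shell script is skipped.
    """
    scripts = sorted(Path(scripts_dir).glob("*.py"))
    return [[sys.executable, str(script), "--help"] for script in scripts]


if __name__ == "__main__":
    for command in help_commands():
        print("$", " ".join(command))
        # check=False so one failing script does not stop the loop.
        subprocess.run(command, check=False)
```

Using `sys.executable` keeps the calls inside the currently active environment, for example the conda environment created above.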
Outputs
By default, the output files are written to ./outputs/. If running the run-everything script, the result of each individual script will be written to a separate directory within ./outputs/.
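After a full run, it can be handy to check which per-script directories were created and how many files each one holds. The following is a minimal sketch under the assumption that results sit in subdirectories of `./outputs/` as described above; the helper name `summarize_outputs` is hypothetical and the names of the subdirectories are whatever the run produced.

```python
from pathlib import Path


def summarize_outputs(outputs_dir="outputs"):
    """Return {subdirectory name: number of files} for an outputs directory.

    Hypothetical helper: returns an empty dict if the directory does not exist.
    """
    root = Path(outputs_dir)
    if not root.is_dir():
        return {}
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count files at any depth below the per-script directory.
        summary[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return summary


if __name__ == "__main__":
    for name, n_files in summarize_outputs().items():
        print(f"{name}: {n_files} file(s)")
```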
Authors
Marco Teixeira (mcarvalh@broadinstitute.org), Celia Souque, Colin J. Worby, Terrance Shea, Nicoletta Commins, Joshua T. Smith, Arjun M. Miklos, Thomas Abeel, Ashlee M. Earl, and Abigail L. Manson.
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
GitHub Events
Total
- Release event: 1
- Push event: 1
- Create event: 1
Last Year
- Release event: 1
- Push event: 1
- Create event: 1