carpdm
Comprehensive Antibiotic Resistance Probe Design Machine (CARPDM)
Basic Info
- Host: GitHub
- Owner: arpcard
- License: other
- Language: Python
- Default Branch: main
- Size: 402 KB
Metadata Files
README.md
CARPDM
The Comprehensive Antibiotic Resistance Probe Design Machine
This program uses BLASTN to design probesets for targeted enrichment of DNA sequencing libraries. If desired, it can also convert the probe sequences into an oligo pool, which is used for in-house synthesis of the probeset, reducing cost. Finally, it outputs several analysis graphics and summary statistics to gauge the effectiveness of the probe design strategy.
Pre-Calculated Probe Sets & Laboratory Protocols
Targeted enrichment probesets designed for each release of the Comprehensive Antibiotic Resistance Database (CARD), plus probe synthesis and library enrichment protocols, are available here.
License
Use or reproduction of these materials, in whole or in part, by any commercial organization whether or not for non-commercial (including research) or commercial purposes is prohibited, except with written permission of McMaster University. For full details see CARD website.
Citation
Hackenberger et al. 2024. CARPDM: cost-effective antibiotic resistome profiling of metagenomic samples using targeted enrichment. bioRxiv 2024.03.27.587061.
Support & Bug Reports
Please create a GitHub issue to report problems or post queries about CARPDM. Alternatively, you can email the CARD curators or developers directly at card@mcmaster.ca.
Installation
Dependencies via Conda
If not already installed, follow the documentation to install Miniconda.
Clone the GitHub repository or download the carpdm_env.yml file:
```console
git clone https://github.com/arpcard/CARPDM.git
```
Create a conda environment from the carpdm_env.yml file:
```console
conda env create -f CARPDM/carpdm_env.yml
```
This will install the following dependencies:
Activate the conda environment before running the script:
```console
conda activate carpdm
```
Running the CARPDM script
If desired, add the directory containing the carpdm.py script to your PATH by modifying your .bashrc file.
Add execute permissions to the carpdm.py script:
```console
chmod a+x carpdm.py
```
See the full help menu by running:
```console
carpdm.py --help
usage: carpdm.py [-h] -i INPUT_FASTA [-p PROBE_NUM_CUTOFF] [-s TILING_STEP] [-l PROBE_LENGTH] [-m MELT_TEMP] [-b BASENAME] [-o OUTPUT_DIR] [-f FILTER_FASTA | -d FILTER_DB] [-t NUM_THREADS] [-n] [-c] [-v]

Design probes against an input fasta according to design parameters, then pass through several filters to remove off-target enrichment (nt filter) and probe redundancy (self filter)

options:
  -h, --help            show this help message and exit
  -i INPUT_FASTA, --input_fasta INPUT_FASTA
                        Path to the fasta that contains sequences against which probes are to be
                        designed. Required
  -p PROBE_NUM_CUTOFF, --probe_num_cutoff PROBE_NUM_CUTOFF
                        Maximum number of final probes allowed in the probeset. The redundancy filter
                        will iterate until it reaches below this cutoff. Default = 42000
  -s TILING_STEP, --tiling_step TILING_STEP
                        Number of bases by which to offset initial probe tiling. Higher values make
                        computation less intense, though may result in sparser coverage. Default = 4
  -l PROBE_LENGTH, --probe_length PROBE_LENGTH
                        Probe length to create. Note that if >80 (default), probes will be too long
                        for the base price of a Twist oligo pool after addition of amplification and
                        transcription primers for in-house synthesis. Default = 80
  -m MELT_TEMP, --melt_temp MELT_TEMP
                        Minimum melting temperature for probes to be considered during basic filter.
                        Default = 50
  -b BASENAME, --basename BASENAME
                        File basename
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output directory
  -f FILTER_FASTA, --filter_fasta FILTER_FASTA
                        Fasta file to filter against. This creates a blast database from the provided
                        file to filter probes against after the basic filter. Probes with over
                        >(probe_length * 0.625) identities against anything in this fasta file will
                        be removed
  -d FILTER_DB, --filter_db FILTER_DB
                        Premade blast database to filter against. Probes with >(probe_length * 0.625)
                        identities against anything in this fasta file will be removed, unless the
                        specified database is the nt database (basename == "nt"). In that case it
                        will only remove probes that align with that condition to non-bacterial
                        accessions, that also do not have >(probe_length * 0.975) identities
  -t NUM_THREADS, --num_threads NUM_THREADS
                        Number of threads to be used during BLAST searching
  -n, --no_o_pool
                        If included, omits steps to design template oligo pool. Only include if
                        ordering RNA probes directly from the manufacturer.
  -c, --clean
                        If included, removes all files except for final baits, oligo pools, and analysis plots
  -v, --version
                        show program's version number and exit
```
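To make the tiling and identity-cutoff parameters in the help text concrete, here is a minimal Python sketch. It is not taken from carpdm.py: the function names `tile_probes` and `identity_cutoff` are hypothetical, and the handling of targets shorter than one probe (skipped here) and of the uncovered tail when the step does not divide evenly are assumptions for illustration only.

```python
def tile_probes(target: str, probe_length: int = 80, tiling_step: int = 4) -> list[str]:
    """Return every probe_length-wide window of target, offset by tiling_step bases.

    Illustrative only: targets shorter than probe_length yield no probes here,
    which is an assumption, not documented CARPDM behaviour.
    """
    target = target.upper()
    last_start = len(target) - probe_length
    if last_start < 0:
        return []
    return [target[i:i + probe_length] for i in range(0, last_start + 1, tiling_step)]


def identity_cutoff(probe_length: int = 80, fraction: float = 0.625) -> float:
    """Number of identities above which a probe is flagged, per the help text formula."""
    return probe_length * fraction


if __name__ == "__main__":
    toy_target = "ACGT" * 25  # 100-base toy sequence, not real data
    probes = tile_probes(toy_target)
    # Window starts are 0, 4, 8, 12, 16, 20 for a 100-base target
    print(len(probes), "probes of length", len(probes[0]))
    print("filter cutoff:", identity_cutoff())
    print("nt exemption cutoff:", identity_cutoff(fraction=0.975))
```

With the defaults, a larger `tiling_step` shrinks the initial candidate set roughly in proportion, which is why the help text describes it as a trade-off between compute cost and coverage density.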
We recommend a test run using the clinically_relevant_amr.fna file. Optionally, you can include a fasta file or BLAST database against which to filter probes. When filtering against the E. coli K12 reference genome, this command took just over 30 minutes of wall-clock time to fully execute.
```console
carpdm.py -i clinically_relevant_amr.fna -p 20000 -t 16 -f /path/to/filter/fasta.fna
```
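If you prefer to launch the test run from a Python workflow rather than the shell, a wrapper along the following lines may be convenient. This is only a sketch: `build_carpdm_command` and `run_carpdm` are hypothetical helper names, and it assumes carpdm.py is executable and on your PATH. It uses only the short flags shown in the command above.

```python
import shlex
import subprocess


def build_carpdm_command(input_fasta, probe_num_cutoff=20000, num_threads=16,
                         filter_fasta=None, script="carpdm.py"):
    """Assemble the argument list for a CARPDM run (hypothetical helper)."""
    cmd = [script, "-i", input_fasta, "-p", str(probe_num_cutoff), "-t", str(num_threads)]
    if filter_fasta is not None:
        # Optional fasta to filter probes against
        cmd += ["-f", filter_fasta]
    return cmd


def run_carpdm(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_carpdm_command("clinically_relevant_amr.fna",
                               filter_fasta="/path/to/filter/fasta.fna")
    # Print the command for inspection; call run_carpdm(cmd) to execute it.
    print(shlex.join(cmd))
```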
CARPDM Output
Running the carpdm.py script as above will result in the following output structure, sorted here by creation order. Note that running with the --clean flag will only output the final probeset, oligo pool, amplification primers, and analysis plots.
```console
probe_design/
├── probes_input_no_comp.fna
|       Input fasta with complementary sequences between targets removed
├── probes_basic_filter.fna
|       Naive tiled probes. Only unique probes with no perfect complements that
|       satisfy Tm requirements and lack ambiguous bases are included. If an
|       o-pool is being constructed, probes with an LguI cut site are also
|       removed.
├── probes_id_blast.txt
|       BLASTN results for basic filter probes against negative id filter
|       fasta/db (if supplied).
├── probes_id_filter.fna
|       Basic filter probes that have <(probe_length * 0.625) identities against
|       anything in the id filter fasta/db (if supplied).
├── probes_self_blast.txt
|       BLASTN results for the basic filter or, if present, id filter probes
|       against themselves.
├── probes_self_filter.fna
|       Basic filter or, if present, id filter probes with redundant members
|       removed. The degree of redundancy maintained in the set is determined by
|       the probe number cutoff specified when the program is called. Included
|       with --clean.
├── probes_o_pool_amp_primers.fna
|       PCR amplification primers for the oligo pool. Included with --clean.
├── probes_o_pool_oligos.fna
|       Final oligo pool sequences, to be ordered if synthesizing probes
|       in-house. Included with --clean.
├── probes_final_blast.xml
|       BLASTN results of the probes against the input sequences, used to
|       construct summary data and plots.
├── probes_target_info.csv
|       Summary statistics of target and associated coverage by probes in set.
├── probes_target_probe_pairs.csv
|       Target:Probe pairs, each line is a probe that aligns to that target.
├── probes_probe_info.csv
|       Summary statistics of probes in set.
├── probes_count_info.csv
|       Number of probes remaining after each filtering step.
├── probes_max_id.txt
|       Maximum number of identities remaining between probes after the
|       redundancy filter.
└── probes_plots (All included with --clean)
    ├── individual_target_coverages
    |       Directory containing coverage plots of all sequences used as input.
    ├── probe_gc.png
    ├── probe_gc.svg
    |       Violin plot showing probe GC content distribution.
    ├── probe_tm.png
    ├── probe_tm.svg
    |       Violin plot showing probe melt temp distribution.
    ├── probe_num_targets.png
    ├── probe_num_targets.svg
    |       Violin plot showing the distribution of the number of input targets
    |       per probe.
    ├── target_len.png
    ├── target_len.svg
    |       Violin plot showing target length distribution.
    ├── target_gc.png
    ├── target_gc.svg
    |       Violin plot showing target GC content distribution.
    ├── target_coverage_prop.png
    ├── target_coverage_prop.svg
    |       Violin plot showing target coverage proportion distribution.
    ├── target_probe_count.png
    ├── target_probe_count.svg
    |       Violin plot showing distribution of the number of probes per target.
    ├── target_coverage_depth.png
    ├── target_coverage_depth.svg
    |       Violin plot showing distribution of the target coverage depth.
    ├── target_coverage_stdev.png
    └── target_coverage_stdev.svg
            Violin plot showing distribution of the target coverage standard
            deviation.
```
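The description of probes_basic_filter.fna above lists several sequence-level checks. The sketch below illustrates three of them in plain Python: ambiguous bases, uniqueness including perfect complements, and LguI sites (recognition sequence GCTCTTC). It is not CARPDM's implementation. The Tm check is omitted because the melting-temperature method is not stated here, and two choices are assumptions made for illustration: the LguI site is checked on both strands, and the first probe seen is kept when a duplicate or perfect complement appears later.

```python
LGUI_SITE = "GCTCTTC"  # LguI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_lgui_site(seq: str) -> bool:
    """True if the LguI site occurs on either strand (both-strand check is an assumption)."""
    return LGUI_SITE in seq or LGUI_SITE in reverse_complement(seq)


def basic_filter(probes, building_o_pool=True):
    """Illustrative basic filter; the Tm requirement is deliberately left out."""
    kept = []
    seen = set()
    for probe in probes:
        probe = probe.upper()
        if set(probe) - set("ACGT"):
            continue  # contains an ambiguous base such as N
        if probe in seen or reverse_complement(probe) in seen:
            continue  # duplicate or perfect complement of a probe already kept
        if building_o_pool and has_lgui_site(probe):
            continue  # would carry an LguI cut site in the oligo pool
        seen.add(probe)
        kept.append(probe)
    return kept


if __name__ == "__main__":
    toy = ["ACGTACGA", "ACGTACGA", "TCGTACGT", "ACGNACGT", "AAGCTCTTCAA", "GGGGCCCA"]
    print(basic_filter(toy))
```

Passing `building_o_pool=False` skips the LguI check, mirroring the note that the cut-site filter only applies when an o-pool is being constructed.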
Owner
- Name: Comprehensive Antibiotic Resistance Database
- Login: arpcard
- Kind: user
- Website: http://card.mcmaster.ca
- Repositories: 13
- Profile: https://github.com/arpcard
A bioinformatic database of resistance genes, their products and associated phenotypes.
Citation (CITATION.cff)
cff-version: 1.1.0
date-released: 2024-09-10
version: 1.0.0
message: "If you use CARPDM in your research please cite:"
repository-code: "https://github.com/arpcard/carpdm"
title: "CARPDM: cost-effective antibiotic resistome profiling of metagenomic samples using targeted enrichment"
journal: "bioRxiv"
doi: "10.1101/2024.03.27.587061"
authors:
  -
    family-names: "Hackenberger"
    given-names: "D"
  -
    family-names: "Imtiaz"
    given-names: "H"
  -
    family-names: "Raphenya"
    given-names: "AR"
  -
    family-names: "Smith"
    given-names: "KW"
  -
    family-names: "Alcock"
    given-names: "BP"
  -
    family-names: "Poinar"
    given-names: "HN"
  -
    family-names: "Wright"
    given-names: "GD"
  -
    family-names: "McArthur"
    given-names: "AG"