extract_genes_abricate
Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: boasvdp
- License: mit
- Language: Python
- Default Branch: master
- Size: 64.5 KB
Statistics
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
extract_genes_ABRicate
Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file
Installation
This script needs Python 3 with the pandas and BioPython libraries, as well as seqtk. ABRicate itself does not need to be installed, but the ABRicate output you provide must include a STRAND column with the relevant information.
If you have Miniconda installed (https://docs.conda.io/en/latest/miniconda.html), these dependencies can easily be installed. First clone the repository to your machine, then create and activate the conda environment:
```
# Clone and enter the directory
git clone https://github.com/boasvdp/extract_genes_ABRicate.git
cd extract_genes_ABRicate
# Create a conda environment with the necessary packages
conda env create -f env.yaml
# Activate the conda environment
conda activate env_extract_genes_ABRicate
```
Alternatively, the tools can be installed separately into your currently active conda environment (not into a dedicated environment!) with:
```
conda install -c conda-forge -c bioconda biopython pandas seqtk
```
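Before moving on to usage, it may help to confirm that everything is in place. The helper below is a hypothetical sketch, not part of the repository; it assumes BioPython is importable as `Bio` and that seqtk is exposed as an executable named `seqtk` on your PATH. The function name `check_dependencies` is illustrative only.

```python
import importlib.util
import shutil
from typing import Dict


def check_dependencies() -> Dict[str, bool]:
    """Report whether each dependency named in this README can be found.

    Hypothetical helper, not part of extract_genes_ABRicate.
    """
    return {
        # Python libraries: find_spec returns None when a module is not installed
        "pandas": importlib.util.find_spec("pandas") is not None,
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Command-line tool: shutil.which returns None when it is not on the PATH
        "seqtk": shutil.which("seqtk") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Running it inside the activated environment should report every entry as found; a MISSING line points to the package that still needs installing.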
Usage
```
usage: extract_genes_abricate.py [-h] -a ABRICATE FILE -g GENOMES DIR -o OUTPUT DIR [-s SUFFIX] [--genecluster] [--csv] [--flanking] [--flanking-bp FLANKING LENGTH] [-v]

Extract genes from genes based on ABRicate output.

optional arguments:
  -h, --help            show this help message and exit
  -a ABRICATE FILE, --abricatefile ABRICATE FILE
                        ABRicate file to parse genes
  -g GENOMES DIR, --genomedir GENOMES DIR
                        directory containing genomes
  -o OUTPUT DIR, --output OUTPUT DIR
                        directory for output
  -s SUFFIX, --suffix SUFFIX
                        Genome assembly file suffix (default: .fasta)
  --genecluster         Extract all genes to a single fasta if located on a single contig (default: false)
  --csv                 Use this option if your ABRicate output file is comma-separated (default: parse as tab-separated file).
  --flanking            Extract flanking sequences
  --flanking-bp FLANKING LENGTH
                        Length of flanking sequence to extract in bp (default: 100)
  -v, --verbose         Increase verbosity
```
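To show how the options in the help text above fit together, here is a hypothetical argparse set-up that mirrors them. It is a sketch, not the script's actual code: the `-v` flag is assumed to be a simple on/off switch, and the metavars use underscores instead of spaces.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Extract genes from genomes based on ABRicate output."
    )
    # Required inputs and output location
    parser.add_argument("-a", "--abricatefile", required=True, metavar="ABRICATE_FILE",
                        help="ABRicate file to parse genes")
    parser.add_argument("-g", "--genomedir", required=True, metavar="GENOMES_DIR",
                        help="directory containing genomes")
    parser.add_argument("-o", "--output", required=True, metavar="OUTPUT_DIR",
                        help="directory for output")
    # Optional settings, with defaults taken from the help text
    parser.add_argument("-s", "--suffix", default=".fasta",
                        help="Genome assembly file suffix (default: .fasta)")
    parser.add_argument("--genecluster", action="store_true",
                        help="Extract all genes to a single fasta if located on a single contig")
    parser.add_argument("--csv", action="store_true",
                        help="Parse the ABRicate file as comma-separated instead of tab-separated")
    parser.add_argument("--flanking", action="store_true",
                        help="Extract flanking sequences")
    parser.add_argument("--flanking-bp", type=int, default=100, metavar="FLANKING_LENGTH",
                        help="Length of flanking sequence to extract in bp (default: 100)")
    # Assumption: a plain on/off flag; the real script may handle verbosity differently
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Increase verbosity")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Note that argparse turns `--flanking-bp` into the attribute `flanking_bp`, so `build_parser().parse_args(["-a", "x.tsv", "-g", "genomes/", "-o", "out/"])` yields `suffix=".fasta"` and `flanking_bp=100`.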
IMPORTANT ASSUMPTIONS
The script assumes that the genome assembly files are named exactly as in the #FILE column of the ABRicate output, except for the suffix (default .fasta, unless otherwise provided using --suffix). At this time, the script can also handle only one assembly suffix per run.
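The naming rule can be illustrated with a small helper. This is a hypothetical sketch of the assumption as described, not the script's actual matching logic: it assumes the #FILE value is reduced to its base name, its last extension is dropped, and the --suffix value is appended inside the genomes directory.

```python
import os


def genome_path(file_entry: str, genome_dir: str, suffix: str = ".fasta") -> str:
    """Map a #FILE value from ABRicate output to an assembly path (hypothetical).

    Keeps only the file name, removes its last extension and appends `suffix`.
    """
    stem = os.path.splitext(os.path.basename(file_entry))[0]
    return os.path.join(genome_dir, stem + suffix)
```

For example, `genome_path("assemblies/strainA.fna", "genomes/")` returns `genomes/strainA.fasta`. Because only one `suffix` is passed, a directory that mixes `.fasta` and `.fna` files cannot be matched in a single call, which reflects the single-suffix limitation noted above.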
If you have identified genes for all genomes in your genomes/ directory (in which all genome assembly files end with .fasta) and your ABRicate output is present in ABRicate_out/strainA.tsv, run:
```
python extract_genes_ABRicate.py --abricatefile ABRicate_out/strainA.tsv --genomedir genomes/ --output extracted_genes/
```
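The core idea, cutting each hit out of its contig and reverse complementing hits on the minus strand, can be sketched in plain Python. This is an illustrative sketch, not the repository's code: the script itself relies on pandas, BioPython and seqtk, whereas this version uses only the standard library so it stays self-contained. It assumes ABRicate's SEQUENCE, START, END, STRAND and GENE columns with 1-based, inclusive coordinates; the function names and the output header format are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple

# Complement table for IUPAC nucleotide codes, upper and lower case
_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                            "TGCAYRMKVBHDNtgcayrmkvbhdn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Read FASTA records into {id: sequence}; the id is the header up to the first space."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_hits(table: TextIO, contigs: Dict[str, str],
                 sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for every row of an ABRicate table.

    Assumes 1-based, inclusive START/END and a STRAND of "+" or "-".
    """
    for row in csv.DictReader(table, delimiter=sep):
        start, end = int(row["START"]), int(row["END"])
        # Convert 1-based inclusive coordinates to a 0-based Python slice
        seq = contigs[row["SEQUENCE"]][start - 1:end]
        if row["STRAND"] == "-":
            seq = reverse_complement(seq)  # hit lies on the reverse strand
        header = f'{row["GENE"]}|{row["#FILE"]}|{row["SEQUENCE"]}:{start}-{end}'
        yield header, seq


if __name__ == "__main__":
    # Toy data for illustration only
    contigs = read_fasta(io.StringIO(">contig1\nAAACCCGGGTTT\n"))
    table = io.StringIO("#FILE\tSEQUENCE\tSTART\tEND\tSTRAND\tGENE\n"
                        "strainA.fasta\tcontig1\t2\t5\t-\tgeneY\n")
    for header, seq in extract_hits(table, contigs):
        print(f">{header}\n{seq}")
```

The toy run prints `>geneY|strainA.fasta|contig1:2-5` followed by `GGTT`. For a comma-separated table (the --csv case), pass `sep=","` to `extract_hits`.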
Extended usage
Multiple ABRicate output files can also be combined so that the script only needs to run once. For example, to combine all files in ABRicate_out/ into one table with a single header line, run (in bash):
```
cat <(head -n 1 ABRicate_out/strainA.tsv) <(for i in ABRicate_out/*.tsv; do tail -n +2 "$i"; done) > ABRicate_all.tsv
```
The extract_genes_ABRicate.py script then has to be run only once:
```
python extract_genes_ABRicate.py --abricatefile ABRicate_all.tsv --genomedir genomes/ --output extracted_genes/
```
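If process substitution is not available (for example in a plain POSIX shell), the combining step can also be done in Python. The helper below is a hypothetical alternative to the shell one-liner, not part of the repository: it keeps the header line of the first matching file (in sorted order) and appends the data lines of every file matching a glob pattern.

```python
import glob
from typing import List


def combine_abricate_tables(pattern: str, out_path: str) -> int:
    """Concatenate ABRicate tables matching `pattern`, keeping one header line.

    Returns the number of data rows written. Hypothetical helper.
    """
    paths: List[str] = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    rows = 0
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as fh:
                header = fh.readline()
                if i == 0:
                    # Keep the header of the first file only
                    out.write(header if header.endswith("\n") else header + "\n")
                for line in fh:
                    if not line.strip():
                        continue  # skip blank lines
                    # Guard against files that lack a trailing newline
                    out.write(line if line.endswith("\n") else line + "\n")
                    rows += 1
    return rows
```

Calling `combine_abricate_tables("ABRicate_out/*.tsv", "ABRicate_all.tsv")` produces a table equivalent to the shell version, assuming all input files share the same header.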
Owner
- Name: Boas van der Putten
- Login: boasvdp
- Kind: user
- Location: Amsterdam, the Netherlands
- Company: Amsterdam UMC/Netherlands Reference Laboratory for Bacterial Meningitis
- Website: https://boasvdp.github.io/
- Twitter: boasvdputten
- Repositories: 8
- Profile: https://github.com/boasvdp
Postdoc using bioinformatics to study bacterial meningitis
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "van der Putten"
    given-names: "Boas CL"
    orcid: "https://orcid.org/0000-0002-7916-6665"
title: "extract_genes_abricate.py"
date-released: 2023-02-24
url: "https://github.com/boasvdp/extract_genes_ABRicate"
```
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Dependencies
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite
- google-github-actions/release-please-action v3 composite