extract_genes_abricate

Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file

https://github.com/boasvdp/extract_genes_abricate

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary


Repository


Basic Info

Host: GitHub
Owner: boasvdp
License: mit
Language: Python
Default Branch: master
Size: 64.5 KB

Statistics

Stars: 5
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 2

Created almost 6 years ago · Last pushed over 2 years ago

Metadata Files

Readme License Citation

extract_genes_ABRicate

Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file

Installation

This script requires Python 3 with the Pandas and BioPython libraries, as well as seqtk. ABRicate itself is not strictly required, but the ABRicate-style input file should include a STRAND column with relevant information.

If you have Miniconda installed (https://docs.conda.io/en/latest/miniconda.html), these dependencies can be easily installed. First clone the directory to your machine:

```
# Clone and enter the directory
git clone https://github.com/boasvdp/extract_genes_ABRicate.git
cd extract_genes_ABRicate

# Create a conda environment with the necessary packages
conda env create -f env.yaml

# Activate the conda environment
conda activate env_extract_genes_ABRicate
```

Alternatively, these commands can be used to install the tools separately through conda (not in a separate environment!):

conda install -c conda-forge -c bioconda biopython pandas seqtk

Usage

```
usage: extract_genes_abricate.py [-h] -a ABRICATE FILE -g GENOMES DIR -o OUTPUT DIR
                                 [-s SUFFIX] [--genecluster] [--csv] [--flanking]
                                 [--flanking-bp FLANKING LENGTH] [-v]

Extract genes from genes based on ABRicate output.

optional arguments:
  -h, --help            show this help message and exit
  -a ABRICATE FILE, --abricatefile ABRICATE FILE
                        ABRicate file to parse genes
  -g GENOMES DIR, --genomedir GENOMES DIR
                        directory containing genomes
  -o OUTPUT DIR, --output OUTPUT DIR
                        directory for output
  -s SUFFIX, --suffix SUFFIX
                        Genome assembly file suffix (default: .fasta)
  --genecluster         Extract all genes to a single fasta if located on a
                        single contig (default: false)
  --csv                 Use this option if your ABRicate output file is
                        comma-separated (default: parse as tab-separated
                        file).
  --flanking            Extract flanking sequences
  --flanking-bp FLANKING LENGTH
                        Length of flanking sequence to extract in bp
                        (default: 100)
  -v, --verbose         Increase verbosity
```
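To make the strand handling and the `--flanking-bp` option more concrete, here is a minimal sketch of what extracting a hit from a contig could look like. It is not the script's actual implementation (which relies on seqtk and BioPython); the function names are hypothetical, and it assumes that the ABRicate START and END columns are 1-based, inclusive coordinates and that STRAND is either `+` or `-`.

```python
# Illustrative sketch only; names and coordinate conventions are assumptions,
# not taken from the script itself.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(contig: str, start: int, end: int, strand: str,
                 flanking_bp: int = 0) -> str:
    """Cut a hit out of a contig and orient it on the forward strand.

    start and end are assumed to be 1-based and inclusive.
    flanking_bp extends the region on both sides, clamped to the contig ends.
    """
    lo = max(start - 1 - flanking_bp, 0)
    hi = min(end + flanking_bp, len(contig))
    region = contig[lo:hi]
    if strand == "-":
        # Hits on the minus strand are reverse complemented.
        region = reverse_complement(region)
    return region


contig = "AAACCCGGGTTT"
print(extract_gene(contig, 4, 6, "+"))                 # CCC
print(extract_gene(contig, 4, 6, "-"))                 # GGG
print(extract_gene(contig, 4, 6, "+", flanking_bp=2))  # AACCCGG
```

Clamping the flanking region to the contig boundaries is a design choice of this sketch; how the script treats hits near a contig edge is not documented here.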

IMPORTANT ASSUMPTIONS

The script assumes the genome assemblies are named almost exactly as they appear in the ABRicate output (#FILE column). The only thing that may differ is the suffix (default .fasta, unless otherwise provided using --suffix). The script can currently handle only one suffix for genome assemblies at a time.

If you have identified genes for all genomes in your genomes/ directory (in which all genome assembly files end with .fasta) and your ABRicate output is present in ABRicate_out/strainA.tsv, run:

python extract_genes_ABRicate.py --abricatefile ABRicate_out/strainA.tsv --genomedir genomes/ --output extracted_genes/
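Because the run depends on the naming assumption above, it may help to check it before starting. The following is a rough sketch, not part of the script: `missing_genomes` is a hypothetical helper, and the matching rule it uses (drop any directory and the final extension from each #FILE value, then append the suffix) is an assumption about how names are compared.

```python
import csv
from pathlib import Path


def missing_genomes(abricate_file, genome_dir, suffix=".fasta", delimiter="\t"):
    """List expected genome files that are absent from genome_dir.

    Assumed matching rule: take each #FILE value, drop its directory and
    final extension, and append the suffix.
    """
    missing = set()
    with open(abricate_file, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            name = Path(row["#FILE"]).stem
            candidate = Path(genome_dir) / f"{name}{suffix}"
            if not candidate.is_file():
                missing.add(str(candidate))
    return sorted(missing)
```

Calling `missing_genomes("ABRicate_out/strainA.tsv", "genomes/")` would return an empty list if every assembly referenced in the file can be found. For a comma-separated ABRicate file, pass `delimiter=","`.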

Extended usage

ABRicate files can also be combined to speed things up. For example, to combine all files in ABRicate_out/, run:

cat <(head -n 1 ABRicate_out/strainA.tsv) <(for i in ABRicate_out/*.tsv; do tail -n +2 $i; done) > ABRicate_all.tsv

The extract_genes_ABRicate.py script then has to be run only once:

python extract_genes_ABRicate.py --abricatefile ABRicate_all.tsv --genomedir genomes/ --output extracted_genes/
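If process substitution is not available in your shell, the combining step can be approximated in Python. This is a sketch under the assumption that every ABRicate file starts with the same single header line; `combine_abricate` is a hypothetical helper and not part of the script.

```python
import glob


def combine_abricate(pattern, combined_path):
    """Concatenate ABRicate files, keeping the header of the first file only.

    Returns the number of input files that were combined.
    """
    paths = sorted(glob.glob(pattern))
    with open(combined_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as handle:
                header = handle.readline()
                if index == 0:
                    out.write(header)
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
    return len(paths)
```

For example, `combine_abricate("ABRicate_out/*.tsv", "ABRicate_all.tsv")` writes the combined file. The output path should lie outside the matched pattern so that the combined file is not read back in as an input.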

Owner

Name: Boas van der Putten
Login: boasvdp
Kind: user
Location: Amsterdam, the Netherlands
Company: Amsterdam UMC/Netherlands Reference Laboratory for Bacterial Meningitis

Website: https://boasvdp.github.io/
Twitter: boasvdputten
Repositories: 8
Profile: https://github.com/boasvdp

Postdoc using bioinformatics to study bacterial meningitis

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "van der Putten"
  given-names: "Boas CL"
  orcid: "https://orcid.org/0000-0002-7916-6665"
title: "extract_genes_abricate.py"
date-released: 2023-02-24
url: "https://github.com/boasvdp/extract_genes_ABRicate"

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Dependencies

.github/workflows/main.yml actions

actions/checkout v2 composite
conda-incubator/setup-miniconda v2 composite

.github/workflows/release-please.yml actions

google-github-actions/release-please-action v3 composite
