extract_genes_abricate

Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file

https://github.com/boasvdp/extract_genes_abricate

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: boasvdp
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 64.5 KB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Created over 5 years ago · Last pushed over 2 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

extract_genes_ABRicate

Small script to use ABRicate output to extract genes from genome assemblies, reverse complement if necessary, and print to a file

Installation

This script requires Python 3 with the Pandas and BioPython libraries, as well as seqtk. ABRicate itself is not strictly required to run the script, but the ABRicate output should include a STRAND column with relevant information.

If you have Miniconda installed (https://docs.conda.io/en/latest/miniconda.html), these dependencies can be installed easily. First clone the repository to your machine:

```
# Clone and enter the directory
git clone https://github.com/boasvdp/extract_genes_ABRicate.git
cd extract_genes_ABRicate

# Create a conda environment with the necessary packages
conda env create -f env.yaml

# Activate the conda environment
conda activate env_extract_genes_ABRicate
```

Alternatively, these commands can be used to install the tools separately through conda (not in a separate environment!):

conda install -c conda-forge -c bioconda biopython pandas seqtk

Usage

```
usage: extract_genes_abricate.py [-h] -a ABRICATE FILE -g GENOMES DIR -o OUTPUT DIR
                                 [-s SUFFIX] [--genecluster] [--csv] [--flanking]
                                 [--flanking-bp FLANKING LENGTH] [-v]

Extract genes from genes based on ABRicate output.

optional arguments:
  -h, --help            show this help message and exit
  -a ABRICATE FILE, --abricatefile ABRICATE FILE
                        ABRicate file to parse genes
  -g GENOMES DIR, --genomedir GENOMES DIR
                        directory containing genomes
  -o OUTPUT DIR, --output OUTPUT DIR
                        directory for output
  -s SUFFIX, --suffix SUFFIX
                        Genome assembly file suffix (default: .fasta)
  --genecluster         Extract all genes to a single fasta if located on a
                        single contig (default: false)
  --csv                 Use this option if your ABRicate output file is
                        comma-separated (default: parse as tab-separated file).
  --flanking            Extract flanking sequences
  --flanking-bp FLANKING LENGTH
                        Length of flanking sequence to extract in bp
                        (default: 100)
  -v, --verbose         Increase verbosity
```
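To illustrate what extraction with strand handling and optional flanking sequence involves, here is a minimal sketch in plain Python. It is not the script's actual implementation (which relies on seqtk and BioPython); the function names `reverse_complement` and `extract_gene` are hypothetical, and it assumes that START and END are 1-based inclusive coordinates as reported by ABRicate, and that flanking sequence is returned together with the gene.

```python
# Hypothetical sketch: not the code used by extract_genes_abricate.py.
# Assumes 1-based inclusive START/END coordinates and "+"/"-" STRAND values.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_gene(contig: str, start: int, end: int, strand: str, flank: int = 0) -> str:
    """Slice a gene (plus optional flanking bases) out of a contig.

    Flanks are clipped at the contig boundaries. Hits on the minus
    strand are reverse complemented so the gene reads 5' to 3'.
    """
    left = max(start - 1 - flank, 0)       # convert to 0-based, extend left
    right = min(end + flank, len(contig))  # extend right, clip at contig end
    region = contig[left:right]
    return reverse_complement(region) if strand == "-" else region


contig = "AAACCCGGGTTT"
print(extract_gene(contig, 4, 6, "+"))           # CCC
print(extract_gene(contig, 4, 6, "-"))           # GGG
print(extract_gene(contig, 4, 6, "+", flank=2))  # AACCCGG
```

Clipping the flanks at the contig boundaries is a design choice of this sketch; how the script itself treats genes close to a contig edge is not stated in this README.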

IMPORTANT ASSUMPTIONS

The script assumes that each genome assembly file has the same name as the corresponding entry in the #FILE column of the ABRicate output. The only part that may differ is the suffix (default .fasta, unless another is provided using --suffix). The script can currently handle only one genome assembly suffix per run.
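A quick check before running the script can catch naming mismatches early. The sketch below is an assumption about how such a check could look, not part of the script: the helpers `expected_genome_path` and `missing_genomes` are hypothetical, and the mapping from a #FILE entry to a genome file (drop any directory and extension, then append the suffix) is only one plausible reading of the assumption described above.

```python
# Hypothetical pre-flight check: not part of extract_genes_abricate.py.
import csv
from pathlib import Path


def expected_genome_path(file_field, genomedir, suffix=".fasta"):
    """Map a #FILE entry to the genome file expected in genomedir.

    Assumption: only the base name matters and only the suffix may differ.
    """
    stem = Path(file_field).stem  # drop directory and extension
    return Path(genomedir) / (stem + suffix)


def missing_genomes(abricate_file, genomedir, suffix=".fasta", delimiter="\t"):
    """Return the expected genome files that do not exist, sorted."""
    with open(abricate_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        wanted = {
            expected_genome_path(row["#FILE"], genomedir, suffix) for row in reader
        }
    return sorted(path for path in wanted if not path.is_file())
```

For example, `missing_genomes("ABRicate_out/strainA.tsv", "genomes/")` returns an empty list when every genome named in the ABRicate output is present. Pass `delimiter=","` for comma-separated ABRicate output.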

If you have identified genes for all genomes in your genomes/ directory (in which all genome assembly files end with .fasta) and your ABRicate output is present in ABRicate_out/strainA.tsv, run:

python extract_genes_ABRicate.py --abricatefile ABRicate_out/strainA.tsv --genomedir genomes/ --output extracted_genes/

Extended usage

ABRicate output files can also be combined, so that the script only has to be run once for all genomes. For example, to combine all files in ABRicate_out/, run:

cat <(head -n 1 ABRicate_out/strainA.tsv) <(for i in ABRicate_out/*.tsv; do tail -n +2 $i; done) > ABRicate_all.tsv

The extract_genes_ABRicate.py script then needs to be run only once:

python extract_genes_ABRicate.py --abricatefile ABRicate_all.tsv --genomedir genomes/ --output extracted_genes/
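For systems without a Bash-compatible shell, combining the ABRicate files could be done in Python instead. This is a minimal sketch under the assumption that every file in the directory has the same single header line; the function name `combine_abricate` is hypothetical and not part of this repository.

```python
# Hypothetical helper: not part of extract_genes_abricate.py.
from pathlib import Path


def combine_abricate(tsv_dir, out_file):
    """Concatenate all *.tsv files in tsv_dir, keeping one header line.

    Returns the number of input files that were combined.
    """
    files = sorted(Path(tsv_dir).glob("*.tsv"))
    with open(out_file, "w") as out:
        for index, path in enumerate(files):
            with open(path) as handle:
                header = handle.readline()
                if index == 0:
                    out.write(header)  # keep the header of the first file only
                for line in handle:
                    # Guard against a missing newline at the end of a file
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
    return len(files)
```

Calling `combine_abricate("ABRicate_out", "ABRicate_all.tsv")` would write a combined file suitable for the --abricatefile option. Write the output outside the input directory so that it is not picked up as an input on a later run.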

Owner

  • Name: Boas van der Putten
  • Login: boasvdp
  • Kind: user
  • Location: Amsterdam, the Netherlands
  • Company: Amsterdam UMC/Netherlands Reference Laboratory for Bacterial Meningitis

Postdoc using bioinformatics to study bacterial meningitis

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "van der Putten"
  given-names: "Boas CL"
  orcid: "https://orcid.org/0000-0002-7916-6665"
title: "extract_genes_abricate.py"
date-released: 2023-02-24
url: "https://github.com/boasvdp/extract_genes_ABRicate"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

.github/workflows/main.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/release-please.yml actions
  • google-github-actions/release-please-action v3 composite