annotater

Functional Annotation of Gene Lists

https://github.com/systemsgenetics/annotater

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: SystemsGenetics
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 4.29 MB
Statistics
  • Stars: 3
  • Watchers: 10
  • Forks: 4
  • Open Issues: 6
  • Releases: 1
Created over 7 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

AnnoTater

Introduction

AnnoTater is a whole or partial genome functional annotation workflow built using Nextflow. It takes a set of protein-coding gene sequences (in either nucleotide or protein FASTA format) and runs InterProScan and BLAST searches against UniProt SwissProt, NCBI NR, NCBI RefSeq, OrthoDB and StringDB to provide a first-pass set of annotations for genes.

AnnoTater is constructed using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

AnnoTater provides the following steps:

  1. Homology searching against specified databases using Diamond BLAST (Diamond). Supported databases include:
    • NCBI nr
    • NCBI RefSeq
    • ExPASy SwissProt
    • ExPASy Trembl
    • STRING database
  2. Execution of InterProScan
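
Because the workflow accepts either nucleotide or protein FASTA input, it can help to check which kind a file contains before launching a run. The following is a minimal sketch of one possible heuristic, not the pipeline's own logic: `read_fasta`, `guess_sequence_type` and the 0.9 threshold are hypothetical names and values chosen for illustration.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Letters that commonly appear in nucleotide sequences (assumed alphabet).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Return 'nucleotide' if at least `threshold` of the letters are
    A/C/G/T/U/N, otherwise 'protein'. A heuristic, not a guarantee."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in residues) / len(residues)
    return "nucleotide" if fraction >= threshold else "protein"


example = """>gene1
ATGGCGTACGTTAGC
>gene2
MKTAYIAKQRQISFVKSHFSRQ
"""

for name, seq in read_fasta(example):
    print(name, guess_sequence_type(seq))
# gene1 nucleotide
# gene2 protein
```

Short or low-complexity protein sequences can fool a composition-based check like this, so treat the result as a hint rather than a validation step.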

Usage

  1. Download databases. AnnoTater requires local copies of the databases it searches. These can take quite a while to download and can consume large amounts of storage. Use the bash scripts in the scripts folder to retrieve and index the databases before running the workflow.

  2. Install Nextflow (>=21.10.3)

  3. Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (Conda is currently not supported; see docs).

  4. Download the pipeline and test it on a minimal dataset with a single command:

```console
nextflow run systemsgenetics/annotater -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
```

  • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  • If you are using singularity then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the nf-core download command to pre-download all of the required containers before running the pipeline and to set the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
  5. Start running your own analysis!

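```console
nextflow run systemsgenetics/annotater \
  -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
  --batch_size 100 \
  --input <FASTA file> \
  --data_sprot <path> \
  --data_refseq <path> \
  --data_ipr <path> \
  --max_cpus 10 \
  --max_memory 6GB
```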

  • The --batch_size argument sets the number of sequences to process in each batch.
  • When searching against NCBI nr, set --max_memory to a sufficiently large value.
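
To illustrate what processing sequences in batches means, here is a minimal sketch that groups records into chunks of a fixed size. It is not taken from the pipeline; the helper name `batched` and the placeholder records are assumptions made for this example.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 250 placeholder sequence records split with a batch size of 100.
records = [(f"gene{i}", "MKT") for i in range(250)]
sizes = [len(batch) for batch in batched(records, 100)]
print(sizes)
# [100, 100, 50]
```

A larger batch size means fewer, longer-running tasks; a smaller one means more tasks that can be spread across available CPUs.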

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
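
As one way to follow this advice, the sketch below writes parameters to a JSON file and builds a command that passes it through the Nextflow -params-file option. The helper names, the file name and the values are hypothetical; the parameter names are assumed to be the underscore forms of the command-line flags shown in this README.

```python
import json
import tempfile
from pathlib import Path


def write_params_file(path, params):
    """Write pipeline parameters as JSON and return the file path."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2) + "\n", encoding="utf-8")
    return path


def build_command(params_file, profile):
    """Return the argv list for a run that reads parameters from a file."""
    return [
        "nextflow", "run", "systemsgenetics/annotater",
        "-profile", profile,
        "-params-file", str(params_file),
    ]


# Hypothetical values; names are assumed to mirror the CLI flags.
params = {
    "input": "genes.fasta",  # hypothetical input FASTA path
    "batch_size": 100,
    "max_cpus": 10,
    "max_memory": "6GB",
}

# A temporary directory keeps the demonstration free of leftover files.
with tempfile.TemporaryDirectory() as tmp:
    params_file = write_params_file(Path(tmp) / "params.json", params)
    print(" ".join(build_command(params_file, "docker")))
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.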

Credits

AnnoTater was written by the Ficklin Computational Biology Team at Washington State University. Development of AnnoTater was initially funded by the U.S. National Science Foundation (NSF) Award #1659300.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

AnnoTater is currently unpublished. For now, please use the GitHub URL when referencing. An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: SystemsGenetics
  • Login: SystemsGenetics
  • Kind: organization

Citation (CITATIONS.md)

# systemsgenetics/annotater: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools
  
- [Diamond](https://github.com/bbuchfink/diamond)

  > Buchfink B, Xie C, Huson DH. "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176

- [InterProScan](https://pubmed.ncbi.nlm.nih.gov/24451626/)

  > Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, Sarah Hunter. "InterProScan 5: genome-scale protein function classification" Bioinformatics (2014), PMID: 24451626

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 12
  • Pull request event: 1
Last Year
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 12
  • Pull request event: 1

Dependencies

.github/workflows/awsfulltest.yml actions
  • nf-core/tower-action master composite
.github/workflows/awstest.yml actions
  • nf-core/tower-action master composite
.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • nf-core/setup-nextflow v1 composite
.github/workflows/fix-linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v2 composite
.github/workflows/linting.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v2 composite
  • actions/setup-python v3 composite
  • actions/upload-artifact v2 composite
  • mshick/add-pr-comment v1 composite
  • nf-core/setup-nextflow v1 composite
  • psf/black stable composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact v2 composite
  • marocchino/sticky-pull-request-comment v2 composite
modules/local/entap/entap_config/meta.yml cpan
modules/local/entap/entap_run/meta.yml cpan
modules/local/interproscan/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/diamond/blastp/meta.yml cpan
modules/nf-core/diamond/blastx/meta.yml cpan
modules/nf-core/diamond/makedb/meta.yml cpan
pyproject.toml pypi