spice-nf

https://github.com/felixhaidle/spice-nf

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 13 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.5%) to scientific vocabulary

Repository

Basic Info

Host: GitHub
Owner: felixhaidle
License: gpl-3.0
Language: HTML
Default Branch: main
Size: 2.27 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created over 1 year ago · Last pushed 7 months ago

BIONF/spice_library_pipeline

Acknowledgements & Original Software

This pipeline includes adaptations from the Splicing-based Protein Isoform Comparison Estimator (SPICE) tool, originally developed by Christian Blümel and Julian Dosch. The original SPICE implementation is available at https://github.com/chrisbluemel/SPICE and is licensed under the GNU General Public License v3.0.

Introduction

This repository contains a Nextflow-based pipeline to automate the creation of SPICE libraries. Instead of running multiple scripts manually, you can now use this streamlined pipeline.

```mermaid
flowchart TB
%% Parameters %%
subgraph PARAMETERS
    species[species]
    anno_tools[anno_tools]
    fas_partitions[fas_partitions]
    outdir[outdir]
end

%% Processes %%
subgraph PIPELINE
    SEQUENCES[ENSEMBL_DOWNLOAD]
    LIBRARY_INITIALIZATION[LIBRARY_INITIALIZATION]
    FAS_ANNOTATION[FAS_ANNOTATION]
    LIBRARY_RESTRUCTURE[LIBRARY_RESTRUCTURE]
    FAS_SCORING[FAS_SCORING]
    CONCAT_FAS_SCORES[CONCAT_FAS_SCORES]
    COMPLEXITY[COMPLEXITY]
    SEED_PARALLELIZATION[SEED_PARALLELIZATION]
end

%% Outputs %%
subgraph OUTPUTS
    library[spice_library]
end

%% Connections %%
species --> SEQUENCES
SEQUENCES --> LIBRARY_INITIALIZATION
anno_tools --> FAS_ANNOTATION
LIBRARY_INITIALIZATION --> FAS_ANNOTATION
FAS_ANNOTATION --> LIBRARY_RESTRUCTURE
LIBRARY_RESTRUCTURE --> COMPLEXITY
COMPLEXITY --> SEED_PARALLELIZATION
fas_partitions --> SEED_PARALLELIZATION
SEED_PARALLELIZATION --> FAS_SCORING
FAS_SCORING --> CONCAT_FAS_SCORES
outdir --> CONCAT_FAS_SCORES
CONCAT_FAS_SCORES --> library
```

The pipeline is roughly divided into the following steps:

1. Download peptide sequences and annotation files, and fetch species metadata, from ENSEMBL for the target organism.
2. Initialize the SPICE library structure from the downloaded ENSEMBL data.
3. Annotate the peptide sequences using fas.doAnno.
4. Restructure the annotated sequences in preparation for FAS scoring.
5. Order the FAS scoring jobs by estimating the run time of each protein pairing with fas.calcComplexity.
6. Group the protein pairings into a user-specified number of partitions to reduce the number of processes (optional).
7. Perform FAS scoring using fas.run.
8. Merge the resulting FAS scores into the final library structure.
9. Emit the created library into the target directory.
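The ordering and grouping steps can be pictured with a small sketch. The code below is not taken from the pipeline; it is one plausible way to balance work, assuming each protein pairing has a numeric complexity estimate. The function name `partition_pairings`, the pairing identifiers, and the cost values are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def partition_pairings(complexities: Dict[str, float], n_partitions: int) -> List[List[str]]:
    """Group pairings into n_partitions with roughly equal total cost.

    Illustrative greedy scheme (largest estimate first, always into the
    currently lightest partition). The real pipeline may group differently.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")

    # Min-heap of (accumulated cost, partition index).
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_partitions)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(n_partitions)]

    # Most expensive pairings first; ties are broken by name for determinism.
    ordered = sorted(complexities.items(), key=lambda kv: (-kv[1], kv[0]))
    for pairing, cost in ordered:
        load, idx = heapq.heappop(heap)
        partitions[idx].append(pairing)
        heapq.heappush(heap, (load + cost, idx))
    return partitions


# Hypothetical complexity estimates for five pairings.
example = {"pair_a": 90.0, "pair_b": 60.0, "pair_c": 50.0, "pair_d": 20.0, "pair_e": 10.0}
groups = partition_pairings(example, 2)
print(groups)
```

With fewer partitions than pairings, each partition handles several pairings in one process, which is the trade-off the optional grouping step describes.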

Usage

Below is a general explanation of installation, setup, and usage. A detailed explanation and further assistance can be found in the WIKI.

[!NOTE] If you are from AK Ebersberger please refer to the AKE usage documentation.

Install FAS

SPICE heavily relies on FAS to function, as the FAS algorithm estimates transcript (dis)similarities.

FAS itself depends on various annotation tools, which cannot be bundled with this pipeline. You must first follow the instructions here to set it up.

Make sure the following command runs successfully in the environment where you will execute the pipeline (e.g., when submitting a job via SLURM):

```bash
fas.doAnno -i test_annofas.fa -o test_output
```
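Before running that command, a quick pre-check can confirm that the FAS entry points named in this README are visible on `PATH` in the execution environment. This is only a sketch: the helper name `find_missing_tools` is hypothetical, and finding a command on `PATH` does not prove that the annotation tools behind it are configured correctly.

```python
import shutil


def find_missing_tools(tools):
    """Return the command names that cannot be located on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Command names as they appear in this README.
missing = find_missing_tools(["fas.doAnno", "fas.calcComplexity", "fas.run"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All FAS commands were found on PATH.")
```

Running the `fas.doAnno` test command remains the actual check; this only catches the case where the environment was not activated at all.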

Set up the pipeline

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page for instructions on setting up Nextflow. Make sure to test your setup using -profile test before running the workflow on actual data.

You should test the functionality using the test profile.

[!WARNING] This pipeline currently only supports the "conda" profile. The environment requirements are defined in assets/environment.yml. All processes use the same environment.

```bash
nextflow run git@github.com:felixhaidle/spice-nf.git \
    -r <DESIRED_RELEASE> -profile conda,test \
    --outdir <OUTDIR>
```

Run the full pipeline:

```bash
nextflow run git@github.com:felixhaidle/spice-nf.git \
    -r <DESIRED_RELEASE> -profile conda \
    --species <SPECIES> \
    --anno_tools <PATH_TO_ANNOTOOLS_INSTALLATION> \
    --outdir <OUTDIR> \
    --fas_partitions <AVAILABLE_CPUS>
```

| Parameter | Description |
| ------------------ | ----------- |
| -r | Specifies the pipeline version. |
| -profile conda | Specifies the execution profile (only conda is supported). |
| --species | Species name (e.g., homo_sapiens, mus_musculus). Must match ENSEMBL naming. |
| --anno_tools | Path to the installed annotation tools directory (equivalent to the -t parameter in fas.setup). |
| --outdir | Output directory for pipeline results. Will be created if it doesn't exist. |
| --fas_partitions | Number of FAS scoring processes that can run in parallel. The protein pairs are grouped into this many processes. A higher number allows more parallel scoring, but the benefit diminishes if the processes cannot actually run in parallel. This parameter is optional but highly recommended: if it is not specified, a separate process is created for each protein pairing, which significantly increases runtime. |
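As a rough illustration of how these parameters combine, the sketch below assembles the full run command as a string. The helper `build_run_command` is hypothetical and not part of the pipeline; defaulting `--fas_partitions` to the local CPU count is an assumption based on the `<AVAILABLE_CPUS>` placeholder, and on a cluster the right value depends on the scheduler allocation instead.

```python
import os
import shlex


def build_run_command(release, species, anno_tools, outdir, fas_partitions=None):
    """Assemble the nextflow invocation shown in this README as one string."""
    if fas_partitions is None:
        # Assumption: use the local CPU count when nothing is given.
        fas_partitions = os.cpu_count() or 1
    args = [
        "nextflow", "run", "git@github.com:felixhaidle/spice-nf.git",
        "-r", release,
        "-profile", "conda",
        "--species", species,
        "--anno_tools", anno_tools,
        "--outdir", outdir,
        "--fas_partitions", str(fas_partitions),
    ]
    # Quote each argument so paths with spaces survive copy and paste.
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder values; replace them with a real release tag and real paths.
print(build_run_command("<DESIRED_RELEASE>", "homo_sapiens",
                        "/path/to/annotation_tools", "results"))
```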

A full overview of all available parameters can be found in parameters.md. Check it out before you run the pipeline.

[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files (via the -c option) can be used for configuration except for parameters; see docs.
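The `-params-file` option accepts a JSON or YAML file. The sketch below writes such a file using the parameter names from the table above; the helper `write_params_file`, the file name `params.json`, and all values are placeholders chosen for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_params_file(path, species, anno_tools, outdir, fas_partitions=None):
    """Write pipeline parameters as JSON for use with `-params-file`."""
    params = {
        "species": species,
        "anno_tools": str(anno_tools),
        "outdir": str(outdir),
    }
    if fas_partitions is not None:
        params["fas_partitions"] = int(fas_partitions)
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    params_path = Path(tmp) / "params.json"
    write_params_file(
        params_path,
        species="homo_sapiens",
        anno_tools="/path/to/annotation_tools",  # placeholder path
        outdir="results",
        fas_partitions=os.cpu_count() or 1,
    )
    print(params_path.read_text())
```

The resulting file would then be passed as `-params-file params.json` alongside `-r` and `-profile conda`, replacing the individual `--species`, `--anno_tools`, `--outdir`, and `--fas_partitions` flags.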

A full list of available parameters and documentation can also be found in the WIKI.

Credits

BIONF/spice_library_pipeline was originally written by Felix Haidle.

Become a contributor

AI Assistance Acknowledgment

Parts of this project’s documentation, code structuring, and scripting were developed with the assistance of AI (ChatGPT by OpenAI). All final decisions, modifications, and validations were made by the project author(s).

Citations

An extensive list of references for the tools used by the pipeline can be found in CITATIONS.md.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

Login: felixhaidle
Kind: user

Repositories: 1
Profile: https://github.com/felixhaidle

Citation (CITATIONS.md)

# BIONF/spice_library_pipeline: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools
- [SPICE](https://github.com/chrisbluemel/SPICE)

  > Christian Blümel

- [FAS](https://doi.org/10.1093/bioinformatics/btad226)

  > Julian Dosch, Holger Bergmann, Vinh Tran, Ingo Ebersberger, FAS: assessing the similarity between proteins using multi-layered feature architectures, Bioinformatics, Volume 39, Issue 5, May 2023

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

GitHub Events

Total

Release event: 1
Delete event: 1
Push event: 24
Public event: 1
Gollum event: 22

Last Year

Release event: 1
Delete event: 1
Push event: 24
Public event: 1
Gollum event: 22

Dependencies

subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan

subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan

subworkflows/nf-core/utils_nfschema_plugin/meta.yml cpan

environment.yml conda

hmmer
matplotlib
numpy
pandas
pip
plotly
pyranges
python 3.7.*
pyyaml
requests
scipy
tqdm
