Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 13 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: felixhaidle
- License: gpl-3.0
- Language: HTML
- Default Branch: main
- Size: 2.27 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
BIONF/spice_library_pipeline
Acknowledgements & Original Software
This pipeline includes adaptations from the Splicing-based Protein Isoform Comparison Estimator (SPICE) tool, originally developed by Christian Blümel and Julian Dosch. The original SPICE implementation is available at https://github.com/chrisbluemel/SPICE and is licensed under the GNU General Public License v3.0.
Introduction
This repository contains a Nextflow-based pipeline to automate the creation of SPICE libraries. Instead of running multiple scripts manually, you can now use this streamlined pipeline.
```mermaid
flowchart TB
%% Parameters %%
subgraph PARAMETERS
species[species]
anno_tools[anno_tools]
fas_partitions[fas_partitions]
outdir[outdir]
end
%% Processes %%
subgraph PIPELINE
SEQUENCES[ENSEMBL_DOWNLOAD]
LIBRARY_INITIALIZATION[LIBRARY_INITIALIZATION]
FAS_ANNOTATION[FAS_ANNOTATION]
LIBRARY_RESTRUCTURE[LIBRARY_RESTRUCTURE]
FAS_SCORING[FAS_SCORING]
CONCAT_FAS_SCORES[CONCAT_FAS_SCORES]
COMPLEXITY[COMPLEXITY]
SEED_PARALLELIZATION[SEED_PARALLELIZATION]
end
%% Outputs %%
subgraph OUTPUTS
library[spice_library]
end
%% Connections %%
species --> SEQUENCES
SEQUENCES --> LIBRARY_INITIALIZATION
anno_tools --> FAS_ANNOTATION
LIBRARY_INITIALIZATION --> FAS_ANNOTATION
FAS_ANNOTATION --> LIBRARY_RESTRUCTURE
LIBRARY_RESTRUCTURE --> COMPLEXITY
COMPLEXITY --> SEED_PARALLELIZATION
fas_partitions --> SEED_PARALLELIZATION
SEED_PARALLELIZATION --> FAS_SCORING
FAS_SCORING --> CONCAT_FAS_SCORES
outdir --> CONCAT_FAS_SCORES
CONCAT_FAS_SCORES --> library
```
The pipeline is roughly divided into the following steps:
- Download peptide sequences and annotation files and fetch species metadata from ENSEMBL for the target organism.
- Initialize the SPICE library structure from the downloaded ENSEMBL data.
- Annotate the peptide sequences using fas.doAnno.
- Restructure the annotated sequences in preparation for FAS scoring.
- Order the FAS scoring jobs by estimated run time, computed for each protein pairing with fas.calcComplexity.
- Group the protein pairings into a user-specified number of partitions to reduce the number of processes (optional).
- Perform FAS scoring using fas.run.
- Merge the resulting FAS scores into the final library structure.
- Emit the created library into the target directory.
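The ordering and grouping steps above can be illustrated with a small sketch. The README does not describe how the pipeline assigns pairings to partitions, so the following is only one plausible strategy (greedy "most expensive first" balancing), not the pipeline's actual implementation. The function name `partition_pairings`, the pairing IDs, and the complexity values are all hypothetical; in practice the complexity estimates would come from fas.calcComplexity.

```python
import heapq
from typing import Dict, List, Tuple


def partition_pairings(complexity: Dict[str, float], n_partitions: int) -> List[List[str]]:
    """Greedily balance pairings across partitions, most expensive first.

    `complexity` maps a (hypothetical) pairing ID to its estimated cost.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    # Never create more partitions than there are pairings.
    n_partitions = min(n_partitions, max(len(complexity), 1))

    # Min-heap of (current load, partition index): the lightest partition is on top.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_partitions)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(n_partitions)]

    # Descending cost; ties are broken by ID so the result is deterministic.
    for pair_id, cost in sorted(complexity.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        partitions[idx].append(pair_id)
        heapq.heappush(heap, (load + cost, idx))
    return partitions


# Made-up complexity estimates, for illustration only.
example = {"geneA": 9.0, "geneB": 7.0, "geneC": 4.0, "geneD": 3.0, "geneE": 1.0}
for i, part in enumerate(partition_pairings(example, 2)):
    print(i, part, sum(example[p] for p in part))
```

With these made-up values, both partitions end up with a total estimated cost of 12.0, which is the point of ordering by complexity before grouping: no single partition becomes the bottleneck.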
Usage
The general installation, setup, and usage steps are described below. A detailed explanation and further assistance can be found in the WIKI.
[!NOTE] If you are from AK Ebersberger please refer to the AKE usage documentation.
Install FAS
SPICE heavily relies on FAS to function, as the FAS algorithm estimates transcript (dis)similarities.
FAS itself depends on various annotation tools, which cannot be bundled with this pipeline. You must first follow the instructions here to set it up.
Make sure the following command runs successfully in the environment where you will execute the pipeline (e.g., when submitting a job via SLURM):
```bash
fas.doAnno -i test_annofas.fa -o test_output
```
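Before submitting a job, it can help to confirm that the commands named in this README are visible in the execution environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical. Finding a command on PATH does not prove that the annotation tools are configured correctly, so the fas.doAnno test run above remains the real check.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Commands mentioned in this README.
REQUIRED = ["fas.doAnno", "fas.calcComplexity", "fas.run", "nextflow"]

missing = missing_tools(REQUIRED)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required commands were found on PATH.")
```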
Set up the pipeline
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page for instructions on setting up Nextflow. Make sure to test your setup using -profile test before running the workflow on actual data.
You should test the functionality using the test profile.
[!WARNING] This pipeline currently only supports the "conda" profile. The environment requirements are defined in assets/environment.yml. All processes use the same environment.
```bash
nextflow run git@github.com:felixhaidle/spice-nf.git \
-r <DESIRED_RELEASE> \
-profile conda,test \
--outdir <OUTDIR>
```
Run the full pipeline:
```bash
nextflow run git@github.com:felixhaidle/spice-nf.git \
-r <DESIRED_RELEASE> \
-profile conda \
--species <SPECIES> \
--anno_tools <PATH_TO_ANNOTOOLS_INSTALLATION> \
--outdir <OUTDIR> \
--fas_partitions <AVAILABLE_CPUS>
```
| Parameter | Description |
| --- | --- |
| -r | Specifies the pipeline release (version) to run. |
| -profile conda | Specifies the execution profile (only conda is supported). |
| --species | Species name (e.g., homo_sapiens, mus_musculus). Must match ENSEMBL naming. |
| --anno_tools | Path to the installed annotation tools directory (equivalent to the -t parameter in fas.setup). |
| --outdir | Output directory for pipeline results. Will be created if it doesn't exist. |
| --fas_partitions | Number of FAS scoring processes that can run in parallel. The protein pairs are grouped into this many partitions. A higher value allows more parallel scoring, but the benefit diminishes if the processes cannot actually run in parallel. Optional but strongly recommended: if it is not specified, a separate process is created for each protein pairing, which significantly increases runtime. |
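Since --species must match ENSEMBL naming, a small helper can turn a free-form name into the lowercase, underscore-separated form used in the examples above. This is a sketch under that assumption only; the function name `to_ensembl_name` is hypothetical, and the helper does not check whether the species actually exists in ENSEMBL.

```python
import re


def to_ensembl_name(name: str) -> str:
    """Lowercase a species name and join its parts with underscores."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^a-z0-9]+", "_", name.strip().lower())
    return cleaned.strip("_")


for raw in ["Homo sapiens", "mus_musculus", "  Canis lupus familiaris "]:
    print(repr(raw), "->", to_ensembl_name(raw))
```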
A full overview of all available parameters can be found in parameters.md. Check it out before you run the pipeline.
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files (via the -c option) can be used for any configuration except parameters; see docs.
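As an alternative to passing every parameter on the command line, the parameters can be collected in a file for the -params-file option, which accepts JSON or YAML. The sketch below writes such a JSON file with the standard library. The helper name `write_params_file` and all values are placeholders; the keys are assumed to mirror the CLI parameter names from the table above, so check parameters.md before relying on them.

```python
import json
from pathlib import Path
from typing import Optional


def write_params_file(path: str, species: str, anno_tools: str, outdir: str,
                      fas_partitions: Optional[int] = None) -> Path:
    """Write pipeline parameters to a JSON file usable with -params-file."""
    # Keys are assumed to match the CLI parameter names without the leading dashes.
    params = {"species": species, "anno_tools": anno_tools, "outdir": outdir}
    if fas_partitions is not None:
        params["fas_partitions"] = fas_partitions
    target = Path(path)
    target.write_text(json.dumps(params, indent=2) + "\n")
    return target


if __name__ == "__main__":
    # Placeholder values for illustration only.
    written = write_params_file("params.json", "homo_sapiens",
                                "/path/to/annotation_tools", "results",
                                fas_partitions=8)
    print("nextflow run git@github.com:felixhaidle/spice-nf.git "
          "-r <DESIRED_RELEASE> -profile conda -params-file " + str(written))
```

Keeping the parameters in a file also records exactly which settings produced a given library.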
A full list of available parameters and documentation can also be found in the WIKI.
Credits
BIONF/spice_library_pipeline was originally written by Felix Haidle.
AI Assistance Acknowledgment
Parts of this project’s documentation, code structuring, and scripting were developed with the assistance of AI (ChatGPT by OpenAI). All final decisions, modifications, and validations were made by the project author(s).
Citations
An extensive list of references for the tools used by the pipeline can be found in CITATIONS.md.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Login: felixhaidle
- Kind: user
- Repositories: 1
- Profile: https://github.com/felixhaidle
Citation (CITATIONS.md)
# BIONF/spice_library_pipeline: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [SPICE](https://github.com/chrisbluemel/SPICE)

  > Christian Blümel

- [FAS](https://doi.org/10.1093/bioinformatics/btad226)

  > Julian Dosch, Holger Bergmann, Vinh Tran, Ingo Ebersberger, FAS: assessing the similarity between proteins using multi-layered feature architectures, Bioinformatics, Volume 39, Issue 5, May 2023

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
GitHub Events
Total
- Release event: 1
- Delete event: 1
- Push event: 24
- Public event: 1
- Gollum event: 22
Last Year
- Release event: 1
- Delete event: 1
- Push event: 24
- Public event: 1
- Gollum event: 22
Dependencies
- hmmer
- matplotlib
- numpy
- pandas
- pip
- plotly
- pyranges
- python 3.7.*
- pyyaml
- requests
- scipy
- tqdm