https://github.com/bigbio/spectrafuse

Incremental clustering pipeline from quantms data.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (Found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (Low similarity (14.4%) to scientific vocabulary)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: bigbio
License: mit
Language: Python
Default Branch: main
Size: 376 KB

Statistics

Stars: 1
Watchers: 5
Forks: 0
Open Issues: 4
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

spectrafuse

Incremental clustering pipeline from quantms data. quantms is a workflow for the reanalysis of public proteomics data. The quantms project not only releases a workflow to the public but also reanalyzes public proteomics data in a systematic way for TMT, LFQ, iTRAQ and other DDA methods.

quantms has reanalyzed an extensive number of datasets, with almost 1 billion MS/MS (tandem mass spectrometry, MS2) spectra analyzed, comprising nearly 100 million PSMs (Peptide-Spectrum Matches) derived from various tissues, cell lines, and diseases. In light of this wealth of data, spectrafuse aims to apply spectral clustering techniques to organize this data and construct spectral libraries.

spectrafuse is a Nextflow workflow that performs incremental clustering of quantms data and is based on the tool MaRaCluster.

The workflow in a nutshell:

Reference: https://github.com/bigbio/spectrafuse/blob/main/docs/algorithm.png

The workflow is designed to be run in a high-performance computing environment, and it is built using Nextflow. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies.

Workflow steps

The workflow starts from the SDRF of each reanalyzed project in quantms and the corresponding PSM parquet files generated by the quantms workflow. Note: the PSM parquet file MUST contain the corresponding spectra for each identified peptide.
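To illustrate why the spectra have to travel with each PSM, here is a minimal sketch of how one PSM record could be serialised as an MGF entry. The field names (`title`, `precursor_mz`, `charge`, `mz_array`, `intensity_array`) are hypothetical and are not taken from the quantms parquet schema; only the MGF layout itself (`BEGIN IONS`, `TITLE`, `PEPMASS`, `CHARGE`, peak lines, `END IONS`) is standard.

```python
def psm_to_mgf(psm: dict) -> str:
    """Serialise one PSM record, including its peaks, as an MGF entry.

    The dictionary keys are assumptions made for this sketch, not the
    actual column names of the quantms PSM parquet files.
    """
    mz_values = psm.get("mz_array") or []
    intensities = psm.get("intensity_array") or []
    # A PSM without its spectrum cannot be converted, hence the MUST above.
    if not mz_values or len(mz_values) != len(intensities):
        raise ValueError("PSM record does not carry a usable spectrum")

    lines = [
        "BEGIN IONS",
        f"TITLE={psm['title']}",
        f"PEPMASS={psm['precursor_mz']}",
        f"CHARGE={psm['charge']}+",
    ]
    # One peak per line: m/z followed by intensity.
    for mz, intensity in zip(mz_values, intensities):
        lines.append(f"{mz} {intensity}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "title": "example_scan_1",  # made-up identifier
        "precursor_mz": 500.25,
        "charge": 2,
        "mz_array": [100.0, 200.5],
        "intensity_array": [10.0, 20.0],
    }
    print(psm_to_mgf(example))
```

A record with an empty or mismatched peak list raises `ValueError` in this sketch rather than producing an MGF entry with no peaks.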

This workflow mainly consists of the following processes:

mgf-converter: a tool that converts each project file analyzed by quantms into an MGF file.
Incremental MaRaCluster algorithm: uses the incremental clustering method of MaRaCluster to cluster MGF files from the same species, instrument, and charge within each project.
Library converter: after all the clustering is done, we should have a folder with the corresponding structure.
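The grouping rule in the clustering step (same species, instrument, and charge within a project) can be sketched in a few lines. This is an illustration only, not the pipeline's own code: the record keys `species`, `instrument`, `charge`, and `mgf_path`, and the example values, are hypothetical.

```python
from collections import defaultdict


def group_for_clustering(records):
    """Group the MGF files of one project by (species, instrument, charge).

    Each group would then be handed to one clustering run. The record
    layout is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["species"], rec["instrument"], rec["charge"])
        groups[key].append(rec["mgf_path"])
    return dict(groups)


if __name__ == "__main__":
    # Made-up records standing in for the MGF files of a single project.
    project_records = [
        {"species": "Homo sapiens", "instrument": "Q Exactive", "charge": 2, "mgf_path": "a.mgf"},
        {"species": "Homo sapiens", "instrument": "Q Exactive", "charge": 3, "mgf_path": "b.mgf"},
        {"species": "Homo sapiens", "instrument": "Q Exactive", "charge": 2, "mgf_path": "c.mgf"},
    ]
    for key, paths in group_for_clustering(project_records).items():
        print(key, paths)
```

Calling the function once per project keeps the "within each project" constraint without adding the project to the key.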

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.

Usage:

First, generate a folder of flat text files containing the absolute or relative paths to each MS2 spectrum file in all projects.
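One way to produce such a list is sketched below. The exact layout the pipeline expects is not specified here, so this assumes one text file per project, one path per line, and spectrum files ending in `.mgf`; the function name `write_file_list` is hypothetical.

```python
from pathlib import Path


def write_file_list(project_dir, list_path, pattern="*.mgf"):
    """Write the absolute paths of all matching files under project_dir.

    Assumptions of this sketch: one path per line and an '.mgf' suffix
    for spectrum files. Adjust both to what the pipeline really expects.
    """
    paths = sorted(str(p.resolve()) for p in Path(project_dir).rglob(pattern))
    out = Path(list_path)
    # Create the list folder if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(paths) + ("\n" if paths else ""))
    return paths
```

For example, `write_file_list("projects/my_project", "file_lists/my_project.txt")` would write one list file for a single (made-up) project folder into a `file_lists` folder.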

Now, you can run the pipeline using:

    nextflow run main.nf \
      --files_list_folder <FILE_LIST_FOLDER> \
      --maracluster_output <OUTDIR>

Owner

Name: BigBio Stack
Login: bigbio
Kind: organization
Email: proteomicsstack@gmail.com
Location: Cambridge, UK

Website: http://bigbio.xyz
Repositories: 24
Profile: https://github.com/bigbio

Provide big data solutions Bioinformatics

GitHub Events

Total

Push event: 2

Last Year

Push event: 2
