https://github.com/cmkobel/mspipeline1
A snakemake wrapper around Nesvilab's FragPipe-CLI. In a perfect world, this pipeline would be based on Sage.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file (found)
- .zenodo.json file
- DOI references
- Academic publication links
- Academic email domains
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Keywords
Repository
A snakemake wrapper around Nesvilab's FragPipe-CLI. In a perfect world, this pipeline would be based on Sage.
Basic Info
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 3
- Releases: 0
Topics
Metadata Files
README.md
mspipeline1
_____________
< mspipeline1 >
-------------
\
___......__ _ \
_.-' ~-_ _.=a~~-_
--=====-.-.-_----------~ .--. _ -.__.-~ ( ___===>
'''--...__ ( \ \\\ { ) _.-~
=_ ~_ \\-~~~//~~~~-=-~
|-=-~_ \\ \\
|_/ =. ) ~}
|} ||
// ||
_// {{
'='~' \\_ =
~~'
If you want to use FragPipe via the command-line interface, then this is the tool for you.
This pipeline takes 1) a list of .d files and 2) a list of fasta amino-acid files, and outputs sane protein calls with abundances. It uses a Philosopher database and FragPipe to do the job. The snakemake pipeline maintains a nice output file tree.
Why you should use this pipeline
Because it makes sure that all outputs are updated when you change input parameters. It also yells at you if something fails, and hopefully makes it a bit easier to find the error.
Installation
1) Prerequisites:
   - Preferably an HPC system, or a beefy local workstation.
   - A conda package manager on that system (we recommend miniforge).
2) Clone this repo on the HPC/workstation where you want to work.
git clone https://github.com/cmkobel/mspipeline1.git && cd mspipeline1
3) If you don't already have an environment with snakemake and mamba installed, use the following command to create one from the bundled environment file:
conda env create -f environment.yaml -n mspipeline1
This environment can then be activated by typing conda activate mspipeline1
4) If needed, tweak the profile in profiles/slurm/ so that it matches your cluster's job scheduler.
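For orientation, a Snakemake Slurm profile (profiles/slurm/config.yaml) usually contains settings along the lines of the sketch below. This is a generic example rather than the bundled profile itself; the account placeholder, resource defaults and job limit are assumptions you would adapt to your cluster.

    # Generic Snakemake cluster profile -- all values are placeholders.
    cluster: "sbatch --account=<your-account> --cpus-per-task={threads} --mem={resources.mem_mb}M --time=04:00:00"
    jobs: 50                  # maximum number of jobs submitted to Slurm at once
    default-resources:
      - mem_mb=8000           # default memory request per rule
    latency-wait: 60          # wait for output files on slow shared filesystems
    keep-going: true          # finish independent jobs even if one rule fails
    rerun-incomplete: true    # rerun jobs whose outputs look incomplete
    printshellcmds: true      # echo the shell command of each job to the log

Each key corresponds to a regular Snakemake command-line flag, so tweaking the profile is equivalent to changing the flags passed to every run.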
Usage
1) Update config.yaml
The file config_template.yaml contains all the parameters needed to run this pipeline. You should change the parameters to reflect your sample batch.
Because Nesvilab do not make their executables immediately publicly available, you need to tell the pipeline where to find them on your system. Update the paths for the keys philosopher_executable, msfragger_jar, ionquant_jar and fragpipe_executable; Philosopher, MSFragger, IonQuant and FragPipe can each be downloaded from their respective Nesvilab download pages.
Currently the pipeline is only tested on .d-file input (Agilent/Bruker). Create an item in batchparameters where you define the key `dbase`, which is the base directory where all .d files reside. Define the key `database_glob`, which is a path (or glob) to the fasta amino-acid files that you want to include in the target protein database.
Define items under the samples key which link sample names to the .d files.
Lastly, set the batch key to point at the batch that you want to run.
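Put together, a batch entry following the keys described above might look roughly like this sketch. The key names are taken from the description above, and all paths, batch and sample names are placeholders; config_template.yaml in the repository is the authoritative reference for the exact structure and nesting.

    # Hypothetical example -- adapt paths and names to your own data.
    philosopher_executable: /path/to/philosopher
    msfragger_jar: /path/to/MSFragger.jar
    ionquant_jar: /path/to/IonQuant.jar
    fragpipe_executable: /path/to/fragpipe/bin/fragpipe

    batch: example_batch              # which entry under batchparameters to run

    batchparameters:
      example_batch:
        dbase: /path/to/d_files/                  # base directory holding all .d files
        database_glob: /path/to/fastas/*.faa      # fasta amino-acid files for the target database
        samples:                                  # sample names linked to .d files
          sample_A: sample_A.d
          sample_B: sample_B.d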
2) Run
Finally, run the pipeline in your command line with:
$ snakemake --profile profiles/slurm/
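If you just want to see which jobs Snakemake would submit without running anything, a standard dry run works here as well:
$ snakemake --profile profiles/slurm/ -n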
Below is a visualization of the workflow graph:
[workflow graph image]
Future
In the future, this pipeline might include an R Markdown report performing basic QC, as well as a test data set that accelerates the development cycle.
Owner
- Name: Carl Mathias Kobel
- Login: cmkobel
- Kind: user
- Company: Norges Miljø- & Biovitenskapelige Universitet
- Website: https://orcid.org/0000-0002-4461-1159
- Repositories: 49
- Profile: https://github.com/cmkobel
PhD fellow in the MEMO group at NMBU.