https://github.com/bihealth/seasnap-pipeline
SeA-SnaP: (Se)q (A)nalysis (Sna)kemake (P)ipeline
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (1.0%) to scientific vocabulary
Repository
SeA-SnaP: (Se)q (A)nalysis (Sna)kemake (P)ipeline
Basic Info
- Host: GitHub
- Owner: bihealth
- Language: Python
- Default Branch: main
- Size: 1.53 MB
Statistics
- Stars: 1
- Watchers: 9
- Forks: 2
- Open Issues: 27
- Releases: 2
Metadata Files
README.html
RNA SeA-SnaP
RNA SeA-SnaP is an RNA-(Se)q (A)nalysis (Sna)kemake (P)ipeline tool that combines two tasks:
- A sub-pipeline mapping fastq files to a reference genome/transcriptome using STAR or Salmon and including extensive quality control (Fastqc, Dupradar, Qualimap, RNASeQC, Preseq, infer_experiment, Multiqc)
- A sub-pipeline for Differential Expression (DE) analysis with DESeq2
Both pipelines are based on Snakemake.
Outline
Concept
The focus of RNA SeA-SnaP is to be as easy to use, adapt and develop as possible. To this end, SeA-SnaP is divided into three main parts:
- Pipeline: Nearly all code corresponding to a specific (sub-)pipeline is included in one file.
- Configuration: Setting parameters and pipeline configuration is done via a separate config file (YAML).
- Tools: Generic functions and tools that are part of the pipeline framework are located in a separate file.
Finally, there is also a directory with R markdown snippets for the DE sub-pipeline. Based on the configuration made in the config file, individual snippets can be assembled to generate a customized report. Splitting the report into snippets makes it easy to develop, share and include different analyses of the results.
Quick-Start
Installation
After cloning this git repository:
    git clone git@cubi-gitlab.bihealth.org:CUBI/Pipelines/sea-snap.git
all required tools and packages can be installed via conda. Download and install them into a new environment called sea_snap:
    conda env create -f conda_env.yaml
The file conda_env.yaml is located in the main directory of the git repository. Each time before using SeA-SnaP, activate the environment with:
    conda activate sea_snap
Running the pipeline
set up a working directory
Set up a working directory to store the results produced by the pipeline. (For CUBI projects create a project directory in the cluster under /fast/groups/cubi/projects/). To create a directory and copy the files required for the configuration of your pipeline, run:
    path/to/git/sea-snap.py working_dir
This will create a directory called results_<year>_<month>_<day>/ at the location from which you are running the command and add config files for both pipelines. You can customize this behaviour via the command line options (type sea-snap.py working_dir -h for help). Directory names you provide can include formatting instructions for Python's time package.
cd <dir_name> to the newly created working directory. SeA-SnaP also creates a symbolic link to the sea-snap.py script, so that from now on you can use ./sea-snap to run helpers or pipelines from the working directory. You should always run pipelines and helpers from there.
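To illustrate the date formatting mentioned above: the default name results_<year>_<month>_<day>/ corresponds to a strftime-style pattern. The sketch below assumes the pattern is expanded with time.strftime from the Python standard library; the pattern string and the helper name expand_dir_name are hypothetical and not taken from sea-snap.py.

```python
import time

def expand_dir_name(pattern, when=None):
    """Expand strftime directives (e.g. %Y, %m, %d) in a directory name.

    Hypothetical helper for illustration; not part of sea-snap.py.
    """
    # `when` is a time.struct_time; default to the current local time
    return time.strftime(pattern, when if when is not None else time.localtime())

# Use a fixed date (5 March 2021) so the result is reproducible
fixed = time.strptime("2021-03-05", "%Y-%m-%d")
print(expand_dir_name("results_%Y_%m_%d/", fixed))  # results_2021_03_05/
```

Called without a date, the helper uses the current local time, which matches the described default of naming the directory after the day it is created.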
run the pipeline
The next steps depend on whether you want to run the mapping pipeline or the DE pipeline.
The results of an analysis can also be exported to a new folder structure, e.g. to upload them to SODAR.
Development
Let’s first introduce the general structure of SeA-SnaP.
As outlined above, the pipeline core functionality is separated from additional generic tools, like the path handler (which handles where files are stored), and from the pipeline configuration. The config file is loaded in Snakemake and its static parts (like parameter values) can be accessed in the pipeline rules. For the other, 'dynamic' parts of the configuration, such as file paths (which are described by path patterns) or the report and contrast configuration, tools are provided that can be used within the pipeline to access this information.
In addition, there is also a directory with report snippets for the DE pipeline, small pieces of R-Markdown code that run a single analysis step like producing a PCA plot. In the configuration file it can be set which snippets to use and in which order to assemble them into a full report.
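The idea of assembling a report from snippets in a configured order can be sketched as follows. This is only a minimal illustration under stated assumptions: the function assemble_report, the snippet file names and the way the order is passed in are all hypothetical and do not reflect SeA-SnaP's actual implementation or config format.

```python
import tempfile
from pathlib import Path

def assemble_report(snippet_dir, snippet_names, header=""):
    """Concatenate R Markdown snippets in the given order (hypothetical helper)."""
    parts = [header] if header else []
    for name in snippet_names:
        # Each snippet is a small, self-contained piece of R Markdown
        parts.append((Path(snippet_dir) / name).read_text())
    return "\n\n".join(parts)

# Demo with throwaway snippet files; names and contents are made up
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pca.Rmd").write_text("## PCA\n")
    (Path(tmp) / "volcano.Rmd").write_text("## Volcano plot\n")
    # The list plays the role of the snippet order set in the config file
    report = assemble_report(tmp, ["volcano.Rmd", "pca.Rmd"], header="# DE report")

print(report)
```

Because the order comes from a plain list, changing the report layout only requires editing the configuration, not the snippets themselves.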
Finally, there are some helper functions that can be accessed via the ./sea-snap wrapper, e.g. to automatically produce a covariate file or sample information. There are also the folders external_scripts/, where scripts can be placed that may be used in the pipeline (although it is preferable to keep small pieces of code inside the Snakemake file), and report/R_common/, where generic R functions can be put that may be used in several report snippets.
The pipelines can be easily extended.
See the separate sections for:
SeA-SnaP options
Available commands in the ./sea-snap wrapper:
helpers:
- working_dir: set up a new working directory for pipeline results
- sample_info: generate a yaml file with sample information used by the mapping pipeline
- covariate_file: generate a table with information required by the DE pipeline
- select_contrast: display information to help choose contrast definitions
run pipeline:
- mapping: run the mapping pipeline
- DE: run the DE pipeline
Type ./sea-snap -h or ./sea-snap COMMAND -h for help.
Hints
understanding the reported number of reads (copied from old pipeline)
This has been inferred from single end data:
- STAR reports the total number of input reads, the number of uniquely mapped reads, and the number of reads mapped to multiple loci (counted ones)
- STAR does not directly report the total number of unmapped reads, only the number of reads left unmapped because they map to too many locations
- featureCounts reports in its summary file the total number of reads found in the alignment file; multi-mapping reads are counted several times
- hence, summing up all the numbers from featureCounts will not give the number of input reads reported by STAR
- however, the number of unmapped reads should be the same, and summing up all categories except Unassigned_MultiMapping and Unassigned_Unmapped should give the number of uniquely mapped reads reported by STAR
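The last point can be checked with a few lines of Python. The sketch below assumes the usual featureCounts summary layout (a tab-separated table with a header line and one status per row); the helper names and all numbers are invented for illustration and are not taken from a real run.

```python
# Categories that do not correspond to uniquely mapped reads
EXCLUDED = {"Unassigned_MultiMapping", "Unassigned_Unmapped"}

def parse_summary(text):
    """Parse a featureCounts summary table (tab-separated, one header line)."""
    counts = {}
    for line in text.strip().splitlines()[1:]:  # skip the "Status <bam>" header
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    return counts

def uniquely_mapped(counts):
    """Sum every category except multi-mapping and unmapped reads."""
    return sum(n for status, n in counts.items() if status not in EXCLUDED)

# Invented numbers for illustration only
example = "\n".join([
    "Status\tsample.bam",
    "Assigned\t800",
    "Unassigned_Unmapped\t50",
    "Unassigned_MultiMapping\t300",
    "Unassigned_NoFeatures\t120",
    "Unassigned_Ambiguity\t30",
])
counts = parse_summary(example)
print(uniquely_mapped(counts))  # 950
print(sum(counts.values()))     # 1300
```

In this made-up example, 950 is the value to compare with the number of uniquely mapped reads reported by STAR, while the full sum of 1300 includes multi-mapping reads counted several times and therefore does not match STAR's input read count.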
Help
Address questions to Patrick Pett (patrick.pett@bihealth.de)
Owner
- Name: Berlin Institute of Health
- Login: bihealth
- Kind: organization
- Website: https://www.cubi.bihealth.org/
- Repositories: 215
- Profile: https://github.com/bihealth
BIH Core Unit Bioinformatics & BIH HPC IT
GitHub Events
Total
- Issues event: 3
- Push event: 7
- Pull request event: 2
- Create event: 3
Last Year
- Issues event: 3
- Push event: 7
- Pull request event: 2
- Create event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: 2 days
- Average time to close pull requests: 2 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 1
- Average time to close issues: 2 days
- Average time to close pull requests: 2 days
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ericblanc20 (1)
- ErikaZ95 (1)
Pull Request Authors
- ericblanc20 (1)