dropseqpipe

A SingleCell RNASeq pre-processing snakemake workflow

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 8 committers (12.5%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (19.8%) to scientific vocabulary

Keywords

conda drop-seq dropseq dropseqtools multiqc picard pipeline plot reference-genome scrb-seq scrbseq snakemake star umi yaml

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: Hoohm
License: cc-by-sa-4.0
Language: Python
Default Branch: master
Homepage:
Size: 6.74 MB

Statistics

Stars: 146
Watchers: 9
Forks: 45
Open Issues: 26
Releases: 0

Topics

conda drop-seq dropseq dropseqtools multiqc picard pipeline plot reference-genome scrb-seq scrbseq snakemake star umi yaml

Created over 9 years ago · Last pushed about 3 years ago

Metadata Files

Readme License

Description

This pipeline is based on snakemake and the dropseq tools provided by the McCarroll Lab. It takes you from the raw data of your single-cell RNA-seq experiment to the final count matrix, with QC plots along the way.

This is the tool we use in our lab to improve our wetlab protocol as well as provide an easy framework to reproduce and compare different experiments with different parameters.

It uses STAR to map the reads. It works with any single-cell protocol that uses two reads, where the first read holds the cell and UMI barcodes and the second read holds the RNA. Here is a non-exhaustive list of compatible protocols/brands:

Drop-Seq
SCRB-Seq
10x Genomics
DroNc-seq
Dolomite Bio (Nadia Instrument)
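
To make the two-read layout concrete, here is a minimal sketch of splitting a read 1 sequence into its cell barcode and UMI. It assumes the common Drop-seq layout of a 12 bp cell barcode followed by an 8 bp UMI; the lengths differ between protocols, and the names `split_read1` and `Read1Barcodes` are hypothetical and not part of this pipeline.

```python
from typing import NamedTuple

# Assumed Drop-seq read 1 layout: 12 bp cell barcode followed by an 8 bp UMI.
# Other protocols use different lengths; these defaults are illustrative only.
CELL_BARCODE_LENGTH = 12
UMI_LENGTH = 8


class Read1Barcodes(NamedTuple):
    cell_barcode: str
    umi: str


def split_read1(sequence: str,
                cell_length: int = CELL_BARCODE_LENGTH,
                umi_length: int = UMI_LENGTH) -> Read1Barcodes:
    """Split a read 1 sequence into its cell barcode and UMI."""
    needed = cell_length + umi_length
    if len(sequence) < needed:
        raise ValueError(
            f"read 1 has {len(sequence)} bases, need at least {needed}"
        )
    # The cell barcode comes first, the UMI directly after it.
    return Read1Barcodes(sequence[:cell_length], sequence[cell_length:needed])


barcodes = split_read1("ACGTACGTACGTTTGGCCAA")
print(barcodes.cell_barcode, barcodes.umi)
```

Read 2 is left untouched in this sketch, since it carries the RNA sequence that is later mapped with STAR.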

This package tries to be as user-friendly as possible. One of the hopes is that non-bioinformaticians can use it without too much hassle. It still requires some command-line execution; it is not going to be a fully interactive package.

Authors

Patrick Roelli (@Hoohm)
Sebastian Mueller (@seb-mueller)
Charles Girardot (@cgirardot)

Usage

Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, once available, its DOI.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files config.yaml and samples.tsv, following the instructions in the documentation.

Step 3: Execute workflow

All you need to execute this workflow is to install Snakemake via the Conda package manager. Software needed by this workflow is automatically deployed into isolated environments by Snakemake.

Test your configuration by performing a dry-run via

snakemake --use-conda -n --directory $WORKING_DIR

Execute the workflow locally via

snakemake --use-conda --cores $N --directory $WORKING_DIR

using $N cores, with $WORKING_DIR as the working directory. Alternatively, the workflow can be run in cluster or cloud environments (see the docs for details).

If you want to fix not only the software stack but also the underlying OS, use

snakemake --use-conda --use-singularity

in combination with any of the modes above.
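
For scripted runs, the invocations above can be assembled programmatically. The following is a minimal sketch that only uses the flags shown in this section; the helper name `build_snakemake_command` and the example path `/data/run1` are assumptions made for illustration.

```python
import shlex
from typing import List, Optional


def build_snakemake_command(working_dir: str,
                            cores: Optional[int] = None,
                            dry_run: bool = False,
                            use_singularity: bool = False) -> List[str]:
    """Assemble a snakemake call from the flags described in this section."""
    command = ["snakemake", "--use-conda"]
    if use_singularity:
        # Also pin the underlying OS, not just the software stack.
        command.append("--use-singularity")
    if dry_run:
        # -n only lists what would be executed.
        command.append("-n")
    if cores is not None:
        command += ["--cores", str(cores)]
    command += ["--directory", working_dir]
    return command


# Hypothetical working directory, shown as a dry run.
print(shlex.join(build_snakemake_command("/data/run1", dry_run=True)))
# prints: snakemake --use-conda -n --directory /data/run1
```

The returned list can be passed to `subprocess.run(command, check=True)` once the dry run looks right.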

Step 4: Investigate results

After successful execution, you can create a self-contained report with all results via:

snakemake --report report.html

Documentation

You can find the documentation here

Future implementations

I'm actively seeking help to implement the points listed below. Don't hesitate to contact me if you wish to contribute.

Create a sharing platform where quality plots/logs can be discussed and troubleshooted.
Create a full html report for the whole pipeline
Multiqc module for drop-seq-tools
Implement an elegant "preview" mode where the pipeline runs on only a couple of million reads and gives you an approximate view before you run all of the data. This would dramatically reduce the time needed to get an idea of which filters should be used.
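
One way the "preview" idea could be approached is to keep only the first reads of each FASTQ file. The sketch below is not part of the pipeline; it assumes standard four-line FASTQ records, and the name `head_fastq` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List

LINES_PER_RECORD = 4  # header, sequence, "+" separator, qualities


def head_fastq(lines: Iterable[str], max_reads: int) -> Iterator[List[str]]:
    """Yield at most max_reads FASTQ records, each as a list of four lines."""
    limited = islice(lines, max_reads * LINES_PER_RECORD)
    while True:
        record = list(islice(limited, LINES_PER_RECORD))
        if not record:
            return
        if len(record) != LINES_PER_RECORD:
            raise ValueError("truncated FASTQ record")
        yield record


# Tiny in-memory example with three reads; keep the first two.
example = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "TTTT", "+", "IIII",
    "@r3", "GGGG", "+", "IIII",
]
preview = list(head_fastq(example, max_reads=2))
print([record[0] for record in preview])
```

Applying the same `max_reads` to both read files keeps read pairs together, provided the two files list their reads in the same order.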

I hope it can help you out in your single cell experiments!

Feel free to comment and point out potential improvements via issues

TODO

Add a mixed reference for testing purposes
Finalize the parameters validation schema
Make the debug feature a bit "cleaner". Deal with automatic naming of the debug variables
Implement ddseq barcoding strategies

Owner

Name: Patrick Roelli
Login: Hoohm
Kind: user
Location: Switzerland

Website: https://twitter.com/Hoohm_
Repositories: 32
Profile: https://github.com/Hoohm

GitHub Events

Total

Watch event: 3
Pull request event: 1

Last Year

Watch event: 3
Pull request event: 1

Committers

Last synced: about 2 years ago

All Time

Total Commits: 278
Total Committers: 8
Avg Commits per committer: 34.75
Development Distribution Score (DDS): 0.558

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
hoohm	p**i@g**m	123
Patrick Roelli	r**p@b**h	60
Sebastian Mueller	s**m@p**e	55
Charles Girardot	g**t@e**e	18
Patrick Roelli	d**m@m**e	15
Hoohm	p**k@N**n	4
Patrick Roelli	r**9@l**g	2
Kyle Duyck	d**e@g**m	1
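
The summary figures above can be reproduced from the commit counts in the table. This sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption, although it matches both scores reported here. The function names are hypothetical.

```python
from typing import Sequence

# Commit counts copied from the "Top Committers" table.
ALL_TIME_COMMITS = [123, 60, 55, 18, 15, 4, 2, 1]
PAST_YEAR_COMMITS = [1]


def average_commits(commits: Sequence[int]) -> float:
    """Mean number of commits per committer."""
    return sum(commits) / len(commits)


def development_distribution_score(commits: Sequence[int]) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - max(commits) / sum(commits)


print(sum(ALL_TIME_COMMITS))                                         # 278
print(average_commits(ALL_TIME_COMMITS))                             # 34.75
print(round(development_distribution_score(ALL_TIME_COMMITS), 3))    # 0.558
print(round(development_distribution_score(PAST_YEAR_COMMITS), 3))   # 0.0
```

A score near 0 means one committer wrote almost everything, which is why the single-committer past year scores 0.0.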

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 77
Total pull requests: 23
Average time to close issues: 3 months
Average time to close pull requests: 10 days
Total issue authors: 38
Total pull request authors: 5
Average comments per issue: 4.03
Average comments per pull request: 3.57
Merged pull requests: 16
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

seb-mueller (8)
Hofphi (6)
abmmki (5)
dylkot (5)
colindaven (4)
Jun-Lizst (4)
olechnwin (3)
manarai (3)
YOU-k (3)
RichardCorbett (2)
PGuen (2)
grst (2)
amufaamo (2)
terooatt (2)
rjg2186 (2)

Pull Request Authors

seb-mueller (11)
TomKellyGenetics (6)
cgirardot (3)
grst (2)
mys721tx (1)
Hoohm (1)

Top Labels

Issue Labels

enhancement (5) help wanted (2) bug (2) question (1)

Packages

Total packages: 1
Total downloads: 8 last month (pypi)

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 2
Total maintainers: 1

pypi.org: dropseqpipe

A drop-seq pipeline

Homepage: http://github.com/hoohm/dropSeqPipe
Documentation: https://dropseqpipe.readthedocs.io/
License: GNU GPL3
Latest release: 0.23a0
published over 8 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 8 Last month

Rankings

Stargazers count: 6.0%

Forks count: 6.0%

Dependent packages count: 10.1%

Average: 16.6%

Dependent repos count: 21.6%

Downloads: 39.1%

Maintainers (1)

hoohm

Last synced: 6 months ago
