https://github.com/bioinfo-pf-curie/scrna-smartseq3

Smartseq3 single-cell RNAseq data analysis.

Basic Info

Host: GitHub
Owner: bioinfo-pf-curie
License: other
Language: Nextflow
Default Branch: master
Size: 1.08 GB

SmartSeq3

Institut Curie - Nextflow SmartSeq3 analysis pipeline

Introduction

The pipeline is built with Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with containers, which makes installation trivial and results highly reproducible.

Pipeline Summary

SmartSeq3 combines full-length transcriptome coverage with a 5' UMI counting strategy to allow a better characterisation of single-cell transcriptomes. To do so, a template-switching oligo (TSO) is added at the 5' end of mRNAs (cf. figure below). The TSO is used for reverse transcription, which is followed by Tn5-based tagmentation that randomly cuts the cDNAs. This leads to three types of reads: 5' UMI reads, internal reads and 3' linker reads. Finally, these reads are sequenced in a paired-end fashion and analyzed by this bioinformatic pipeline.

[Figure: MultiQC]

1. Get R1 reads having a 5' tag to catch UMI reads (seqkit)
2. Extract UMIs from tagged reads (umi-tools)
3. Trim 3' linker and polyA tails on R2 reads (cutadapt)
4. Read alignment on R1+R2 (STAR)
5. Read assignment on R1+R2 (featureCounts)
6. Generation of UMI count matrices (umi-tools)
7. BigWig generation (bamCoverage)
8. Estimate gene body coverage (genebody_coverage)
9. Generate cell QC plots (#UMIs per cell, %MT transcripts per cell, UMIs & genes per cell)
10. Generate a 10X format matrix with all cells
11. Results summary (MultiQC)
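The first two steps (selecting tagged R1 reads, then extracting their UMIs) can be sketched in plain shell. This is not the pipeline's code, which relies on seqkit and umi-tools. The function name `extract_umi_reads` is hypothetical, and the tag sequence and UMI length are assumptions based on the published Smart-seq3 design; the pattern actually used by the pipeline may differ.

```bash
#!/usr/bin/env bash
# Sketch only: TAG and UMI_LEN are assumptions taken from the published
# Smart-seq3 design, not values read from this pipeline's configuration.
TAG="ATTGCGCAATG"   # assumed 11-nt 5' tag that marks a UMI read
UMI_LEN=8           # assumed UMI length, located right after the tag

# Hypothetical helper: read an uncompressed FASTQ on stdin and print only the
# reads that start with the tag. The UMI is appended to the read name and the
# tag + UMI are removed from both the sequence and the quality string.
extract_umi_reads() {
  awk -v tag="$TAG" -v ulen="$UMI_LEN" '
    NR % 4 == 1 { name = $0 }          # header line
    NR % 4 == 2 { seq = $0 }           # sequence line
    NR % 4 == 0 {                      # quality line: the record is complete
      if (index(seq, tag) == 1) {      # tag must sit at the very 5 prime end
        skip = length(tag) + ulen
        umi = substr(seq, length(tag) + 1, ulen)
        split(name, parts, " ")        # drop the header comment, keep the ID
        print parts[1] "_" umi
        print substr(seq, skip + 1)
        print "+"
        print substr($0, skip + 1)
      }
    }'
}
```

For instance, `gzip -dc sample_R1.fastq.gz | extract_umi_reads > sample_R1.umi.fastq` (file names are placeholders) would keep only the 5' UMI reads; untagged reads correspond to the internal reads described above.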

Quick help

```bash

N E X T F L O W ~ version 20.01.0

SmartSeq3 v.1.0

Usage:

nextflow run main.nf --reads '*R{1,2}.fastq.gz' -profile conda --genomeAnnotationPath '/data/annotations/pipelines' --genome 'hg38'
nextflow run main.nf --samplePlan 'sampleplan.csv' -profile conda --genomeAnnotationPath '/data/annotations/pipelines' --genome 'hg38'

Mandatory arguments:
  --reads [file]                 Path to input data (must be surrounded with quotes)
  --samplePlan [file]            Path to sample plan input file (cannot be used with --reads)
  --genome [str]                 Name of genome reference
  -profile [str]                 Configuration profile to use. test / conda / multiconda / path / multipath / singularity / docker / cluster (see below)

Inputs:
  --starIndex [dir]              Index for STAR aligner
  --singleEnd [bool]             Specifies that the input is single-end reads

Skip options: All are false by default
  --skipSoftVersion [bool]       Do not report software version
  --skipMultiQC [bool]           Skips MultiQC
  --skipGeneCov [bool]           Skips calculating genebody coverage

Genomes: If not specified in the configuration file or if you wish to overwrite any of the references given by the --genome field
  --genomeAnnotationPath [file]  Path to genome annotation folder

Other options:
  --outDir [file]                The output directory where the results will be saved
  -name [str]                    Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic

=======================================================
Available Profiles

  -profile test                Set up the test dataset
  -profile conda               Build a single conda environment with all tools used by the different processes before running the pipeline
  -profile multiconda          Build a new conda environment for each tool used by the different processes before running the pipeline
  -profile path                Use the path defined in the configuration for all tools
  -profile multipath           Use the paths defined in the configuration for each tool
  -profile docker              Use the Docker containers for each process
  -profile singularity         Use the Singularity images for each process
  -profile cluster             Run the workflow on the cluster, instead of locally

```

Quick run

The pipeline can be run on any infrastructure, either from a list of input files or from a sample plan, as follows.

Run the pipeline on a test dataset

See conf/test.conf to set up your test dataset.

```
nextflow run main.nf -profile test,conda
```

Run the pipeline from a `sample plan`

```
nextflow run main.nf --samplePlan MYSAMPLEPLAN --genome 'hg19' --genomeAnnotationPath ANNOTATIONPATH --outDir MYOUTPUT_DIR
```

Defining the '-profile'

By default (without any profile), Nextflow will execute the pipeline locally, expecting that all tools are available from your PATH variable.

In addition, we set up a few profiles that should allow you (i) to use containers instead of a local installation and (ii) to run the pipeline on a cluster instead of a local machine. The description of each profile is available in the help message (see above).

Here are a few examples of how to set the profile option.

```

# Run the pipeline locally, using a global environment where all tools are installed (built by conda for instance)
-profile path --globalPath INSTALLATION_PATH

# Run the pipeline on the cluster, using the Singularity containers
-profile cluster,singularity --singularityPath SINGULARITY_PATH

# Run the pipeline on the cluster, building a new conda environment
-profile cluster,conda --condaCacheDir CONDA_CACHE

```

Sample Plan

A sample plan is a CSV (comma-separated) file that lists all samples with their biological IDs. The sample plan is expected to be laid out as follows:

SAMPLEID | SAMPLENAME | FASTQR1 [Path to R1.fastq file] | FASTQR2 [For paired end, path to Read 2 fastq]
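As an illustration of this layout, the sketch below writes one line per R1/R2 pair found in a directory. The helper name `make_sample_plan`, the `S1`, `S2`, ... sample IDs, the `_R1.fastq.gz` / `_R2.fastq.gz` naming convention and the absence of a header line are all assumptions made for the example, not requirements stated by the pipeline.

```bash
# Hypothetical helper: print one sample-plan line per R1/R2 pair in a directory.
# Columns follow the layout above: SAMPLEID,SAMPLENAME,FASTQR1,FASTQR2.
make_sample_plan() {
  dir=$1
  i=0
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # nothing matched: the glob stays literal
    i=$((i + 1))
    name=$(basename "$r1" _R1.fastq.gz)       # sample name = file name without the suffix
    r2="$dir/${name}_R2.fastq.gz"             # assumed mate file name
    printf 'S%s,%s,%s,%s\n' "$i" "$name" "$r1" "$r2"
  done
}
```

Running `make_sample_plan /path/to/fastq > sampleplan.csv` (the path is a placeholder) would produce a file that can then be passed to `--samplePlan`. The sketch does not check that each R2 file exists, so the result should be reviewed before use.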

Full Documentation

Credits

This pipeline has been written by the single cell & bioinformatics platform of the Institut Curie (Louisa Hadj Abed, Celine Vallot, Nicolas Servant)

Contacts

For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.
