https://github.com/bioinfo-pf-curie/scrna-smartseq3
Smartseq3 single-cell RNAseq data analysis.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bioinfo-pf-curie
- License: other
- Language: Nextflow
- Default Branch: master
- Size: 1.08 GB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
SmartSeq3
Institut Curie - Nextflow SmartSeq3 analysis pipeline
Introduction
The pipeline was built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with containers making installation trivial and results highly reproducible.
Pipeline Summary
SmartSeq3 combines full-length transcriptome coverage with a 5' UMI counting strategy to better characterise single-cell transcriptomes. To do so, a template-switching oligo (TSO) is added at the 5' end of mRNAs (cf. figure below). The TSO is used for reverse transcription and Tn5-based tagmentation, which randomly cuts cDNAs. This leads to three types of reads: 5' UMI reads, internal reads and 3' linker reads. Finally, these reads are sequenced in paired-end mode and analyzed by this bioinformatics pipeline.

- Get R1 reads having a 5' tag to catch UMI reads (`seqkit`)
- Extract UMIs from tagged reads (`umi-tools`)
- Trim 3' linker and polyA tails on R2 reads (`cutadapt`)
- Read alignments on R1+R2 (`STAR`)
- Read assignments on R1+R2 (`FeatureCounts`)
- Generation of UMI count matrices (`umi-tools`)
- BigWig generation (`bamCoverage`)
- Estimate gene body coverage (`genebody_coverage`)
- Generate cell QC plots (#UMIs per cell, %MT transcripts per cell, UMIs & genes per cell)
- Generate a 10X format matrix with all cells
- Results summary (`MultiQC`)
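The first two steps of the list above (catching tagged R1 reads, then extracting their UMIs) can be illustrated on a toy FASTQ file. This is only a minimal sketch: the pipeline itself relies on `seqkit` and `umi-tools`, and the tag sequence `ATTGCGCAATG` and the 8-nt UMI length used here are assumptions taken from the published Smart-seq3 protocol, not from this README. The function name `extract_umis` and the toy reads are hypothetical.

```bash
#!/usr/bin/env bash
# Toy illustration only: the pipeline itself uses seqkit and umi-tools.
# ASSUMPTION: an 11-nt tag (ATTGCGCAATG) followed by an 8-nt UMI; neither value
# is stated in this README.
TAG="ATTGCGCAATG"
UMI_LEN=8

# Print "read_name<TAB>UMI" for every FASTQ record whose sequence starts with the tag.
extract_umis() {
  awk -v tag="$TAG" -v n="$UMI_LEN" '
    NR % 4 == 1 { name = substr($1, 2) }        # header line: drop the leading "@"
    NR % 4 == 2 && index($0, tag) == 1 {        # sequence line starting with the tag
      print name "\t" substr($0, length(tag) + 1, n)
    }
  ' "$1"
}

# Two-record toy FASTQ: read1 carries the tag, read2 is an internal read.
toy_fastq="$(mktemp)"
printf '%s\n' \
  '@read1' 'ATTGCGCAATGACGTACGTGGGTTTTCCCC' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
  '@read2' 'CCCCGGGGAAAATTTTCCCCGGGGAAAATT' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
  > "$toy_fastq"

extract_umis "$toy_fastq"
```

Only `read1` is reported, with the 8 bases that follow the tag as its UMI; untagged reads such as `read2` are treated as internal reads and carry no UMI.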
Quick help
```bash
N E X T F L O W ~ version 20.01.0
SmartSeq3 v.1.0
Usage:

nextflow run main.nf --reads '*R{1,2}.fastq.gz' -profile conda --genomeAnnotationPath '/data/annotations/pipelines' --genome 'hg38'
nextflow run main.nf --samplePlan 'sampleplan.csv' -profile conda --genomeAnnotationPath '/data/annotations/pipelines' --genome 'hg38'

Mandatory arguments:
  --reads [file]                  Path to input data (must be surrounded with quotes)
  --samplePlan [file]             Path to sample plan input file (cannot be used with --reads)
  --genome [str]                  Name of genome reference
  -profile [str]                  Configuration profile to use. test / conda / multiconda / path / multipath / singularity / docker / cluster (see below)

Inputs:
  --starIndex [dir]               Index for STAR aligner
  --singleEnd [bool]              Specifies that the input is single-end reads

Skip options: All are false by default
  --skipSoftVersion [bool]        Do not report software version
  --skipMultiQC [bool]            Skips MultiQC
  --skipGeneCov [bool]            Skips calculating genebody coverage

Genomes: If not specified in the configuration file or if you wish to overwrite any of the references given by the --genome field
  --genomeAnnotationPath [file]   Path to genome annotation folder

Other options:
  --outDir [file]                 The output directory where the results will be saved
  -name [str]                     Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic

=======================================================
Available Profiles
-profile test Set up the test dataset
-profile conda Build a single conda for with all tools used by the different processes before running the pipeline
-profile multiconda Build a new conda environment for each tools used by the different processes before running the pipeline
-profile path Use the path defined in the configuration for all tools
-profile multipath Use the paths defined in the configuration for each tool
-profile docker Use the Docker containers for each process
-profile singularity Use the singularity images for each process
-profile cluster Run the workflow on the cluster, instead of locally
```
Quick run
The pipeline can be run on any infrastructure, either from a list of input files or from a sample plan, as follows.
Run the pipeline on a test dataset
See conf/test.conf to set up your test dataset.
```
nextflow run main.nf -profile test,conda
```
Run the pipeline from a sample plan
```
nextflow run main.nf --samplePlan MYSAMPLEPLAN --genome 'hg19' --genomeAnnotationPath ANNOTATIONPATH --outDir MYOUTPUT_DIR
```
Defining the '-profile'
By default (without any profile), Nextflow will execute the pipeline locally, expecting all tools to be available from your PATH variable.
In addition, we set up a few profiles that should allow you i/ to use containers instead of a local installation, ii/ to run the pipeline on a cluster instead of on a local architecture. The description of each profile is available in the help message (see above).
Here are a few examples of how to set the profile option.
```
# Run the pipeline locally, using a global environment where all tools are installed (built by conda for instance)
-profile path --globalPath INSTALLATION_PATH
# Run the pipeline on the cluster, using the Singularity containers
-profile cluster,singularity --singularityPath SINGULARITY_PATH
# Run the pipeline on the cluster, building a new conda environment
-profile cluster,conda --condaCacheDir CONDA_CACHE
```
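The profile options above are fragments that are appended to a complete `nextflow run` command. As a rough sketch, the hypothetical helper below (`print_run_cmd` is not part of the pipeline) only prints the assembled command rather than running it. It reuses the placeholders and options already shown in this README, and the `hg38` genome name is taken from the help message.

```bash
#!/usr/bin/env bash
# Print (do not run) a complete command that combines a sample plan run with
# a profile and its extra options. Upper-case values are placeholders.
print_run_cmd() {
  local profile="$1"
  shift
  # "${*:+ $*}" appends the remaining arguments only when some were given.
  echo "nextflow run main.nf --samplePlan MYSAMPLEPLAN --genome 'hg38' --genomeAnnotationPath ANNOTATIONPATH --outDir MYOUTPUT_DIR -profile ${profile}${*:+ $*}"
}

# Cluster execution with Singularity containers.
print_run_cmd cluster,singularity --singularityPath SINGULARITY_PATH
```

Several profiles are combined with a comma and no space, as in `cluster,singularity`.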
Sample Plan
A sample plan is a CSV (comma-separated) file that lists all samples with their biological IDs. The sample plan is expected to contain the following columns:
SAMPLEID | SAMPLENAME | FASTQR1 [Path to R1.fastq file] | FASTQR2 [For paired end, path to Read 2 fastq]
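As an illustration, the sketch below writes a small paired-end sample plan and checks that every line has four non-empty comma-separated fields. The sample IDs, names and paths are made up, the absence of a header line is an assumption, and `check_sample_plan` is a hypothetical helper rather than something shipped with the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical sample IDs, names and paths, for illustration only.
# ASSUMPTION: no header line, one sample per line, paired-end data (4 fields).
plan="$(mktemp)"
cat > "$plan" <<'EOF'
C001,cellA,/path/to/cellA_R1.fastq.gz,/path/to/cellA_R2.fastq.gz
C002,cellB,/path/to/cellB_R1.fastq.gz,/path/to/cellB_R2.fastq.gz
EOF

# Return 0 if every line has exactly 4 non-empty comma-separated fields,
# otherwise report the offending lines and return 1.
check_sample_plan() {
  awk -F',' '
    NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1; next }
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") { printf "line %d: field %d is empty\n", NR, i; bad = 1 }
    }
    END { exit bad + 0 }
  ' "$1"
}

check_sample_plan "$plan" && echo "sample plan looks well-formed"
```

For single-end data the last column would be absent, so the field count in this check would need to be adapted.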
Full Documentation
- Installation
- Reference genomes
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
Credits
This pipeline has been written by the single cell & bioinformatics platform of the Institut Curie (Louisa Hadj Abed, Celine Vallot, Nicolas Servant)
Contacts
For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.
Owner
- Name: Institut Curie, Bioinformatics Core Facility
- Login: bioinfo-pf-curie
- Kind: organization
- Location: Paris, France
- Website: https://bioinfo-pf-curie.github.io/
- Repositories: 11
- Profile: https://github.com/bioinfo-pf-curie
bioinformatics platform of the Institut Curie
GitHub Events
Total
- Release event: 3
- Watch event: 1
- Member event: 1
- Push event: 2
- Fork event: 1
- Create event: 1
Last Year
- Release event: 3
- Watch event: 1
- Member event: 1
- Push event: 2
- Fork event: 1
- Create event: 1