single_cell_nextflow_pipeline

Single cell NextFlow pipeline using Pipseeker

https://github.com/uk-sbcoa-ebbertlab/single_cell_nextflow_pipeline



Basic Info

Host: GitHub
Owner: UK-SBCoA-EbbertLab
License: MIT
Language: Java
Default Branch: main
Size: 586 KB



Single-Cell Nextflow Pipeline (Pipseeker)

This pipeline was used to prepare single-cell data for analysis. The data were sequenced using a protocol designed in our lab that combines ONT long reads with PIP-seq single-cell sequencing. The pipeline is based on one developed by Bernardo Aguzzoli-Heberle, heavily modified to fit our needs here.

Read our paper for more information about this pipeline: TODO: place paper URL here

Steps Overview

The pipeline as we ran it has five steps, each briefly explained below. The number in each step's name (0, 2, or 3) is the value passed to --step in the example submissions.

The first step, Step 0, is pre-processing. It takes the FASTQ files produced by basecalling the ONT long reads and demultiplexes them into FASTQ files that each contain the reads for a single barcode, i.e., a single cell.

The second step is what we call Step 2 - Bulk. Here we process the data as 'bulk' rather than at the single-cell level, both to generate the QC reports and to prepare the bulk data for Bambu Discovery, which identifies novel isoforms.

The third step, Step 3 - Bulk, runs the prepared .rds files through Bambu Discovery. This step generates the extended annotation file used in later steps, the MultiQC report, and gffcompare outputs that compare annotations from other papers to ours for later analysis.

The fourth step is Step 2 - Single Cell. This step takes the per-cell FASTQ files through alignment, filtering, and Bambu preparation, so that they can be quantified with Bambu using the extended annotation from Step 3 - Bulk.

The fifth step is Step 3 - Single Cell, which runs the data through Bambu quantification and generates the final count matrices for downstream analysis.
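
The order in which the five steps must run follows from which outputs each step reads: the example submissions show Step 2 - Single Cell reading demultiplexed FASTQ files from Step 0 and the extended annotation from Step 3 - Bulk, for instance. The Python sketch below records that ordering as data and checks it. The Stage class and the two helper functions are hypothetical illustrations and are not part of the pipeline itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str     # step name as used in this README
    step: int     # value passed to --step in the example submissions
    needs: tuple  # names of stages whose outputs this stage reads


# Dependencies are read off the input paths in the example submissions.
STAGES = (
    Stage("Step 0", 0, ()),
    Stage("Step 2 - Bulk", 2, ("Step 0",)),
    Stage("Step 3 - Bulk", 3, ("Step 2 - Bulk",)),
    Stage("Step 2 - Single Cell", 2, ("Step 0", "Step 3 - Bulk")),
    Stage("Step 3 - Single Cell", 3, ("Step 2 - Single Cell", "Step 3 - Bulk")),
)


def check_order(stages):
    """Return True if every stage is listed after all stages it depends on."""
    seen = set()
    for stage in stages:
        if any(dep not in seen for dep in stage.needs):
            return False
        seen.add(stage.name)
    return True


def runnable(stages, finished):
    """Names of unfinished stages whose dependencies have all finished."""
    return [
        s.name
        for s in stages
        if s.name not in finished and all(d in finished for d in s.needs)
    ]
```

For example, runnable(STAGES, {"Step 0"}) returns only Step 2 - Bulk, because the single-cell steps also wait on the extended annotation produced by Step 3 - Bulk.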

Example submissions

Example for Step 0: Single-cell preprocessing

nextflow ../../../workflow/main.nf \
    --sample_id_table "../../../datasets/PBMC/PBMC_patient0_rebasecalled/sample_id_to_folder.tsv" \
    --ont_reads_fq_dir "../../../datasets/PBMC/PBMC_patient0_rebasecalled/" \
    --out_dir "PBMC_rebasecalled_FEB_10_2025" \
    --demultiplex_name "PBMC_rebasecalled_FEB_10_2025" \
    --cdna_kit "PCS114" \
    --qscore_thresh "9" \
    --barcode_thresh 100 \
    --mpldir "/tmp/dir/mpl_config" \
    --n_threads 8 \
    --step 0 \
    -with-report \
    -with-timeline \
    -with-trace
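
Step 0 ends with one FASTQ file per barcode. As a rough illustration of that idea only, the Python sketch below groups FASTQ records by barcode. It assumes each read header already carries its barcode in a hypothetical "CB:<barcode>" field. That field, the function names, and the toy reads are assumptions made for this example; the pipeline itself relies on Pipseeker for single-cell pre-processing and may work quite differently.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def barcode_of(header, tag="CB:"):
    """Return the barcode following the hypothetical tag, or None if absent."""
    for field in header.split():
        if field.startswith(tag):
            return field[len(tag):]
    return None


def demultiplex(handle):
    """Group FASTQ records by barcode; reads without a barcode are skipped."""
    per_cell = defaultdict(list)
    for header, seq, qual in read_fastq(handle):
        barcode = barcode_of(header)
        if barcode is not None:
            per_cell[barcode].append(f"{header}\n{seq}\n+\n{qual}\n")
    return dict(per_cell)


# Toy input: three barcoded reads from two cells, plus one read with no barcode.
example = StringIO(
    "@read1 CB:AAAC\nACGT\n+\nIIII\n"
    "@read2 CB:GGTT\nTTGA\n+\nIIII\n"
    "@read3 CB:AAAC\nCCGA\n+\nIIII\n"
    "@read4\nGGGG\n+\nIIII\n"
)
cells = demultiplex(example)
```

Each key of cells is one barcode, and joining its list of records gives the contents of that cell's FASTQ file.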

Example for Step 2 - Bulk

nextflow ../../../workflow/main.nf \
    --step 2 \
    --ont_reads_fq "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/concatenated_fastq_and_sequencing_summary_files/*.fastq" \
    --ont_reads_txt "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/concatenated_fastq_and_sequencing_summary_files/*.txt" \
    --read_stats "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/stats/**/*.combined_stats.json" \
    --ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
    --annotation "/path/to/sequencing_resources/annotations/Ensembl/hg38_release_113/Homo_sapiens.GRCh38.113.gtf" \
    --housekeeping "../../../references/hg38.HouseKeepingGenes.bed" \
    --out_dir "PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination" \
    --cdna_kit "PCS114" \
    --is_chm13 "False" \
    --mapq 10 \
    --is_dRNA "False" \
    --contamination_ref "../../../contamination_reference_doc/master_contaminant_reference.fasta" \
    --qscore_thresh 9 \
    --tmpwritedir "/tmp/dir/mpl_config/" \
    -with-trace \
    -with-timeline \
    -with-report

Example for Step 3 - Bulk

nextflow ../../../workflow/main.nf \
    --bambu_rds "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/bambu_prep/*.rds" \
    --ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
    --fai "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai" \
    --annotation "/path/to/sequencing_resources/annotations/Ensembl/hg38_release_113/Homo_sapiens.GRCh38.113.gtf" \
    --is_discovery "True" \
    --track_reads "False" \
    --multiqc_input "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/multiQC_input/**" \
    --multiqc_config "../../../workflow/bin/multiqc_config.yaml" \
    --intermediate_qc "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/intermediate_qc_reports/" \
    --glinos_annotation "../../../annotations/glinos_annotation_clean.gtf" \
    --leung_annotation "../../../annotations/leung_annotation_clean.gtf" \
    --heberle_annotation "../../../annotations/heberle_annotation_clean.gtf" \
    --out_dir "PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination" \
    --step "3" \
    --is_chm13 "False" \
    -with-trace \
    -with-timeline \
    -with-report

Example for Step 2 - Single-Cell

nextflow ../../../workflow/main.nf \
    --step "2" \
    --ont_reads_fq "../STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/06_demultiplexed/**/*.fastq" \
    --ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
    --annotation "../../../cDNA_pipeline/submission/PBMC_patien0/STEP_3/results/PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination/bambu_discovery/extended_annotations.gtf" \
    --out_dir "PBMC_patient0_FEB_11_2025_bambu_quant" \
    --mapq "10" \
    --track_reads "True" \
    --mapped_reads_thresh 100 \
    -with-trace \
    -with-timeline \
    -with-report

Example for Step 3 - Single-Cell

nextflow ../../../workflow/main.nf \
    --bambu_rds "../STEP_2/results/PBMC_patient0_FEB_11_2025_bambu_quant/bambu_prep/*.rds" \
    --ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
    --fai "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai" \
    --annotation "../../../cDNA_pipeline/submission/PBMC_patien0/STEP_3/results/PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination/bambu_discovery/extended_annotations.gtf" \
    --is_discovery "False" \
    --track_reads "False" \
    --out_dir "PBMC_patient0_FEB_11_2025_bambu_quant" \
    --step "3" \
    --is_chm13 "False" \
    -with-trace \
    -with-timeline \
    -with-report

Owner

Name: UK Sanders-Brown Center on Aging - Ebbert Lab
Login: UK-SBCoA-EbbertLab
Kind: organization

Repositories: 3
Profile: https://github.com/UK-SBCoA-EbbertLab

Ebbert BioInformatics Lab

Citation (CITATIONS.md)

# Citations

## Main

[Nextflow](https://www.nextflow.io/docs/latest/index.html)

[Singularity](https://docs.sylabs.io/guides/latest/user-guide/)

## Single-cell pre-processing

[Pipseeker](https://www.fluentbio.com/products/pipseeker-software-for-data-analysis/)

## Pre-processing

[Guppy](https://timkahlke.github.io/LongRead_tutorials/BS_G.html)

[Pychopper](https://github.com/epi2me-labs/pychopper)


## Quality Control

[PycoQC](https://github.com/a-slide/pycoQC)

[RSeQC](https://rseqc.sourceforge.net/)

[MultiQC](https://multiqc.info/)


## Mapping

[Minimap2](https://github.com/lh3/minimap2)

## Transcriptomics

[GFFread](https://github.com/gpertea/gffread)

[GFFcompare](https://ccb.jhu.edu/software/stringtie/gffcompare.shtml)

[Bambu](https://github.com/GoekeLab/bambu)


## Other Genomics Tools

[Samtools](https://github.com/samtools/samtools)


## R Packages

[BioConductor](https://www.bioconductor.org/)

[Bambu](https://github.com/GoekeLab/bambu)



## Other

[Conda](https://docs.conda.io/en/latest/)

[Bioconda](https://bioconda.github.io/)

[pip](https://pypi.org/project/pip/)

