single_cell_nextflow_pipeline
Single cell NextFlow pipeline using Pipseeker
https://github.com/uk-sbcoa-ebbertlab/single_cell_nextflow_pipeline
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.2%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Single-Cell Nextflow Pipeline (Pipseeker)
This pipeline prepares single-cell data for analysis. The data were sequenced using a protocol designed in our lab that combines ONT long reads with PIP-seq single-cell sequencing. The pipeline is based on one developed by Bernardo Aguzzoli-Heberle and was heavily modified to fit our needs.
Read our paper for more information about this pipeline: TODO: place paper URL here
Steps Overview
The pipeline runs in five steps, each briefly explained below.
The first step, Step 0, is pre-processing. It takes the FASTQ files produced by basecalling the ONT long reads and demultiplexes them into FASTQ files that each contain the reads for a single barcode, i.e. each FASTQ file contains the reads for a single cell.
The second step is what we call Step 2 - Bulk. Here we run the data as 'bulk' rather than at the single-cell level, both to generate the QC reports and to prepare the bulk data for Bambu Discovery, which identifies novel isoforms.
The third step, Step 3 - Bulk, runs the prepared .rds files through Bambu Discovery. This step generates the extended annotation file used in later steps, the MultiQC report, and gffcompare outputs that compare annotations from other papers to ours for later analysis.
The fourth step is Step 2 - Single Cell. This step takes the per-cell FASTQ files through alignment, filtering, and Bambu preparation so that they can be quantified with Bambu using the extended annotation from Step 3 - Bulk.
The fifth step is Step 3 - Single Cell, which runs the data through Bambu quantification and generates the final count matrices for downstream analysis.
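The run order described above can be summarised in a small shell sketch. The stage labels and the `step_flag` and `print_run_order` functions are hypothetical names introduced here for illustration and are not part of the pipeline; only the `--step` values (0, 2, 3) are taken from the example commands in this README. Note that the bulk and single-cell stages reuse the same `--step` values and differ in the inputs they are given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a stage label (our own naming, not the pipeline's)
# to the --step value used in the example commands.
step_flag() {
  case "$1" in
    preprocess)              echo 0 ;;
    bulk_prep|sc_prep)       echo 2 ;;
    bulk_discovery|sc_quant) echo 3 ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Stages in the order they are run.
STAGES="preprocess bulk_prep bulk_discovery sc_prep sc_quant"

# Print one numbered line per stage with its --step value.
print_run_order() {
  local i=1 stage
  for stage in $STAGES; do
    printf '%d. %s (--step %s)\n' "$i" "$stage" "$(step_flag "$stage")"
    i=$((i + 1))
  done
}

print_run_order
```

The single-cell stages come after bulk discovery because they take the extended annotation produced by Step 3 - Bulk as their `--annotation` input.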
Example submissions
Example for Step 0: Single-cell preprocessing
nextflow ../../../workflow/main.nf \
--sample_id_table "../../../datasets/PBMC/PBMC_patient0_rebasecalled/sample_id_to_folder.tsv" \
--ont_reads_fq_dir "../../../datasets/PBMC/PBMC_patient0_rebasecalled/" \
--out_dir "PBMC_rebasecalled_FEB_10_2025" \
--demultiplex_name "PBMC_rebasecalled_FEB_10_2025" \
--cdna_kit "PCS114" \
--qscore_thresh "9" \
--barcode_thresh 100 \
--mpldir "/tmp/dir/mpl_config" \
--n_threads 8 \
--step 0 \
-with-report \
-with-timeline \
-with-trace
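After Step 0 finishes, it can be useful to check how many reads each per-cell FASTQ file contains before moving on. The sketch below is not part of the pipeline: `count_fastq_reads` and `report_read_counts` are hypothetical helpers, and they assume uncompressed FASTQ files with standard four-line records and paths without embedded newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in one uncompressed FASTQ file.
# Assumes exactly 4 lines per record (no wrapped sequence lines).
count_fastq_reads() {
  local n_lines
  n_lines=$(wc -l < "$1")
  echo $(( n_lines / 4 ))
}

# Hypothetical helper: print "<path><TAB><read count>" for every .fastq under a directory.
report_read_counts() {
  find "$1" -type f -name '*.fastq' | sort | while IFS= read -r fq; do
    printf '%s\t%s\n' "$fq" "$(count_fastq_reads "$fq")"
  done
}

# Usage (directory taken from the --ont_reads_fq path in the Step 2 - Single-Cell example):
#   report_read_counts "../STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/06_demultiplexed"
```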
Example for Step 2 - Bulk
nextflow ../../../workflow/main.nf \
--step 2 \
--ont_reads_fq "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/concatenated_fastq_and_sequencing_summary_files/*.fastq" \
--ont_reads_txt "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/concatenated_fastq_and_sequencing_summary_files/*.txt" \
--read_stats "../../../../submission/PBMC_patient0/STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/stats/**/*.combined_stats.json" \
--ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
--annotation "/path/to/sequencing_resources/annotations/Ensembl/hg38_release_113/Homo_sapiens.GRCh38.113.gtf" \
--housekeeping "../../../references/hg38.HouseKeepingGenes.bed" \
--out_dir "PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination" \
--cdna_kit "PCS114" \
--is_chm13 "False" \
--mapq 10 \
--is_dRNA "False" \
--contamination_ref "../../../contamination_reference_doc/master_contaminant_reference.fasta" \
--qscore_thresh 9 \
--tmpwritedir "/tmp/dir/mpl_config/" \
-with-trace \
-with-timeline \
-with-report
Example for Step 3 - Bulk
nextflow ../../../workflow/main.nf \
--bambu_rds "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/bambu_prep/*.rds" \
--ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
--fai "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai" \
--annotation "/path/to/sequencing_resources/annotations/Ensembl/hg38_release_113/Homo_sapiens.GRCh38.113.gtf" \
--is_discovery "True" \
--track_reads "False" \
--multiqc_input "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/multiQC_input/**" \
--multiqc_config "../../../workflow/bin/multiqc_config.yaml" \
--intermediate_qc "../STEP_2/results/PBMC_rebasecalled_bulk_discovery_FEB_10_2025_with_contamination/intermediate_qc_reports/" \
--glinos_annotation "../../../annotations/glinos_annotation_clean.gtf" \
--leung_annotation "../../../annotations/leung_annotation_clean.gtf" \
--heberle_annotation "../../../annotations/heberle_annotation_clean.gtf" \
--out_dir "PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination" \
--step "3" \
--is_chm13 "False" \
-with-trace \
-with-timeline \
-with-report
Example for Step 2 - Single-Cell
nextflow ../../../workflow/main.nf \
--step "2" \
--ont_reads_fq "../STEP_0/results/PBMC_rebasecalled_FEB_10_2025/pre_processing/06_demultiplexed/**/*.fastq" \
--ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
--annotation "../../../cDNA_pipeline/submission/PBMC_patien0/STEP_3/results/PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination/bambu_discovery/extended_annotations.gtf" \
--out_dir "PBMC_patient0_FEB_11_2025_bambu_quant" \
--mapq "10" \
--track_reads "True" \
--mapped_reads_thresh 100 \
-with-trace \
-with-timeline \
-with-report
Example for Step 3 - Single-Cell
nextflow ../../../workflow/main.nf \
--bambu_rds "../STEP_2/results/PBMC_patient0_FEB_11_2025_bambu_quant/bambu_prep/*.rds" \
--ref "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa" \
--fai "/path/to/sequencing_resources/references/Ensembl/hg38_release_113/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai" \
--annotation "../../../cDNA_pipeline/submission/PBMC_patien0/STEP_3/results/PBMC_rebasecalled_bulk_discovery_FEB_11_2025_with_contamination/bambu_discovery/extended_annotations.gtf" \
--is_discovery "False" \
--track_reads "False" \
--out_dir "PBMC_patient0_FEB_11_2025_bambu_quant" \
--step "3" \
--is_chm13 "False" \
-with-trace \
-with-timeline \
-with-report
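Each step consumes files written by an earlier one (for example, the `--bambu_rds` glob in Step 3 points at the `bambu_prep` output of Step 2), so a quick pre-flight check before launching a step can catch a mistyped path early. The following is a minimal sketch and not part of the pipeline: `require_matches` is a hypothetical helper, it handles simple `*` globs in paths without spaces, and the recursive `**` patterns used in some examples would additionally need bash's `shopt -s globstar`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least one existing file matches the glob.
require_matches() {
  local pattern=$1 f
  # $pattern is left unquoted on purpose so the shell expands the glob.
  for f in $pattern; do
    [ -e "$f" ] && return 0
  done
  echo "no files match: $pattern" >&2
  return 1
}

# Usage (glob taken from the Step 3 - Single-Cell example):
#   require_matches "../STEP_2/results/PBMC_patient0_FEB_11_2025_bambu_quant/bambu_prep/*.rds" \
#     && echo "inputs found, safe to launch Step 3"
```

If the glob matches nothing, the shell leaves the pattern unexpanded, the `-e` test fails, and the function returns a non-zero status with a message on standard error.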
Owner
- Name: UK Sanders-Brown Center on Aging - Ebbert Lab
- Login: UK-SBCoA-EbbertLab
- Kind: organization
- Repositories: 3
- Profile: https://github.com/UK-SBCoA-EbbertLab
Ebbert BioInformatics Lab
Citation (CITATIONS.md)
# Citations

## Main
- [Nextflow](https://www.nextflow.io/docs/latest/index.html)
- [Singularity](https://docs.sylabs.io/guides/latest/user-guide/)

## Single-cell pre-processing
- [Pipseeker](https://www.fluentbio.com/products/pipseeker-software-for-data-analysis/)

## Pre-processing
- [Guppy](https://timkahlke.github.io/LongRead_tutorials/BS_G.html)
- [Pychopper](https://github.com/epi2me-labs/pychopper)

## Quality Control
- [PycoQC](https://github.com/a-slide/pycoQC)
- [RSeQC](https://rseqc.sourceforge.net/)
- [MultiQC](https://multiqc.info/)

## Mapping
- [Minimap2](https://github.com/lh3/minimap2)

## Transcriptomics
- [GFFread](https://github.com/gpertea/gffread)
- [GFFcompare](https://ccb.jhu.edu/software/stringtie/gffcompare.shtml)
- [Bambu](https://github.com/GoekeLab/bambu)

## Other Genomics Tools
- [Samtools](https://github.com/samtools/samtools)

## R Packages
- [BioConductor](https://www.bioconductor.org/)
- [Bambu](https://github.com/GoekeLab/bambu)

## Other
- [Conda](https://docs.conda.io/en/latest/)
- [Bioconda](https://bioconda.github.io/)
- [pip](https://pypi.org/project/pip/)
GitHub Events
Total
- Delete event: 3
- Push event: 20
- Pull request event: 5
- Create event: 3
Last Year
- Delete event: 3
- Push event: 20
- Pull request event: 5
- Create event: 3
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 9 hours
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 9 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- mpage21 (2)