phip-seq-tp-vac
Pipeline for analysis of the Treponema pallidum PhiP-Seq assay data @ Greninger Lab
Phip-seq TP vac Nextflow pipeline
Introduction
nf-core/phipseqtpvac is a bioinformatics pipeline designed to analyze Treponema pallidum PhiP-Seq assay data. It processes FASTQ files using a provided sample sheet, performing quality control (QC), trimming, and (pseudo-)alignment. The pipeline outputs a count matrix in CSV format, where rows represent features from the PhiP-Seq library and columns correspond to the input samples, providing a comprehensive quantification of feature counts across all samples.
- Adapter and quality trimming (Cutadapt)
- Pseudoalignment and quantification (Kallisto)
- Aggregation of Kallisto quantifications into a unified count matrix (custom Python script)
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row corresponds to a single sample (one pull-down) and lists the pair of FASTQ files from its paired-end sequencing run. An example samplesheet can be found in the test_input directory along with the test data.
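For runs with many samples, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: it assumes Illumina-style file names ending in `_R1_001.fastq.gz` and `_R2_001.fastq.gz`, and it uses the shared file-name prefix as the sample name. The helper names `find_pairs` and `write_samplesheet` are hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <prefix>_R1_001.fastq.gz and <prefix>_R2_001.fastq.gz
R1_PATTERN = re.compile(r"^(?P<prefix>.+)_R1_001\.fastq\.gz$")


def find_pairs(fastq_dir):
    """Return (sample, fastq_1, fastq_2) rows for every R1 file with a matching R2."""
    rows = []
    for r1 in sorted(Path(fastq_dir).glob("*_R1_001.fastq.gz")):
        match = R1_PATTERN.match(r1.name)
        r2 = r1.with_name(f"{match['prefix']}_R2_001.fastq.gz")
        if not r2.exists():
            raise FileNotFoundError(f"No R2 mate found for {r1.name}")
        # The file-name prefix doubles as the sample name (an assumption).
        rows.append((match["prefix"], str(r1), str(r2)))
    return rows


def write_samplesheet(rows, out_path):
    """Write the rows under the sample,fastq_1,fastq_2 header used by the pipeline."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        writer.writerows(rows)
```

Calling `write_samplesheet(find_pairs("fastq/"), "samplesheet.csv")` would produce a file in the format shown above. Sample names can be edited afterwards if the file prefixes are not descriptive enough.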
Now, you can run the pipeline using:
bash
nextflow run nf-core/phipseqtpvac \
-profile <docker/singularity/...> \
--input example_samplesheet.csv \
--outdir <OUTDIR>
To run a specific version of the pipeline, use the appropriate tag: nextflow run nf-core/phipseqtpvac -r <tag>
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
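As an illustration of the -params-file route, the sketch below writes a JSON params file, which Nextflow accepts alongside YAML. The parameter names and defaults are taken from this section's parameter table. The file name `params.json` and the samplesheet path are placeholders. `kal_index_ref` and `target_keys` are left out because their paths depend on where the release assets were unpacked.

```python
import json

# Values mirror the defaults in the parameter table; "input" is a placeholder path.
params = {
    "input": "samplesheet.csv",
    "outdir": "./results",
    "cutadapt_minimum_len": 20,
    "trimR1": 44,
    "trimR2": 66,
}

# Write the parameters to a file that can be handed to Nextflow.
with open("params.json", "w") as handle:
    json.dump(params, handle, indent=2)
```

The resulting file could then be passed as `nextflow run nf-core/phipseqtpvac -profile docker -params-file params.json`.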
The pipeline parameters and their default values:
| Parameter | Description | Default value |
| ------------- | ------------- | ------------- |
| input | Path to the samplesheet with input data | Required |
| outdir | Directory to save results | ./results |
| cutadapt_minimum_len | Minimum read length after Cutadapt trimming | 20 |
| trimR1 | Bases removed from the start of Forward reads (Cutadapt) | 44 |
| trimR2 | Bases removed from the start of Reverse reads (Cutadapt) | 66 |
| kal_index_ref | Path to the Kallisto index file | See latest release assets archive |
| target_keys | Path to the PhiP-Seq library keys | See latest release assets archive |
kal_index_ref and target_keys files are provided in the assets archive of the latest release. The kal_index_ref file is a Kallisto index file for the PhiP-Seq library, and the target_keys file contains the PhiP-Seq library keys. The pipeline uses these files to quantify the PhiP-Seq library features.
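To make the effect of the trimming parameters concrete, here is a toy model of what `trimR1`, `trimR2`, and `cutadapt_minimum_len` describe. It is not the pipeline's Cutadapt invocation and it ignores adapter and quality trimming. It also assumes that a read pair is discarded when either mate falls below the minimum length. The function name `trim_pair` is hypothetical.

```python
def trim_pair(read1, read2, trim_r1=44, trim_r2=66, minimum_len=20):
    """Drop leading bases from each mate; return None if either mate ends up too short."""
    trimmed1 = read1[trim_r1:]  # remove trim_r1 bases from the start of the forward read
    trimmed2 = read2[trim_r2:]  # remove trim_r2 bases from the start of the reverse read
    if len(trimmed1) < minimum_len or len(trimmed2) < minimum_len:
        return None  # assumption: the whole pair is dropped
    return trimmed1, trimmed2
```

With the defaults, a 150-base forward read keeps its last 106 bases and a 150-base reverse read keeps its last 84 bases, while a 50-base forward read would leave only 6 bases and cause the pair to be dropped.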
Pipeline output
The pipeline generates separate nested output directories for results from Cutadapt, Kallisto, and the custom Kallisto output parsing program (cutadapt_out, kallisto_out, and parsed_raw_counts, respectively). Each directory contains the relevant quality control, trimming, or quantification files. A unified count matrix (parsed_raw_counts/kallisto_raw_counts_merged.csv) is produced, summarizing feature counts across all input samples.
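The shape of the merged count matrix can be illustrated with a short sketch. This is not the pipeline's own parsing script. It assumes one subdirectory per sample containing Kallisto's standard `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), and it does not use the `target_keys` file. The directory layout and the function names are hypothetical.

```python
import csv
from pathlib import Path


def read_est_counts(abundance_tsv):
    """Map target_id -> est_counts from one Kallisto abundance.tsv file."""
    with open(abundance_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row["est_counts"]) for row in reader}


def merge_counts(kallisto_dir):
    """Collect per-sample counts from <kallisto_dir>/<sample>/abundance.tsv (assumed layout)."""
    per_sample = {}
    for tsv in sorted(Path(kallisto_dir).glob("*/abundance.tsv")):
        per_sample[tsv.parent.name] = read_est_counts(tsv)
    return per_sample


def write_matrix(per_sample, out_csv):
    """Write rows = features, columns = samples; missing features are filled with 0."""
    samples = sorted(per_sample)
    features = sorted({f for counts in per_sample.values() for f in counts})
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["target_id", *samples])
        for feature in features:
            writer.writerow([feature, *(per_sample[s].get(feature, 0.0) for s in samples)])
```

Each row of the output corresponds to one library feature and each column to one sample, matching the layout described in the introduction.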
Credits
nf-core/phipseqtpvac was originally written by @DariiaVyshenska.
We thank the following people for their extensive assistance in the development of this pipeline: Thaddeus Armstrong, Ben Wieland, Alex Greninger.
Contributions and Support
If you’re interested in contributing to this pipeline, please reach out to the repository owner.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Citation (CITATIONS.md)
# nf-core/phipseqtpvac: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [Cutadapt](https://cutadapt.readthedocs.io/en/stable/)

  > Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp. 10-12. doi:https://doi.org/10.14806/ej.17.1.200

- [Kallisto](https://pachterlab.github.io/kallisto/)

  > Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.