https://github.com/bioinfo-pf-curie/nf-neoant

Detection of neoantigens from WES and RNA sequencing data

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.0%) to scientific vocabulary


Repository

Basic Info

Host: GitHub
Owner: bioinfo-pf-curie
License: other
Language: Python
Default Branch: main
Size: 2.89 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog License

nf-neoAnt pipeline

Introduction

The pipeline is built with Nextflow, a workflow manager that runs tasks across multiple compute infrastructures in a portable manner. It supports the conda package manager as well as Singularity and Docker containers, which simplifies installation and makes results reproducible.

Pipeline summary

The objective of the pipeline is to predict tumor-specific neoantigens from both DNA and RNA next-generation sequencing data from patients.

* HLA typing is performed by seq2HLA (v2.2) for both MHCI and MHCII, based on the paired RNA fastq files.
* Detection of neoantigens is performed by the pVACtools suite (v4.1.1). This part of the pipeline is split in two: a DNA-based analysis (pVACseq) and an analysis of fusion events derived from RNAseq data (pVACfuse).
* MiXCR (v4.5.0) provides a fast analysis of raw T- or B-cell receptor repertoires.

pVACseq

Paired RNAseq reads are aligned with STAR (v2.7.6a) against the STAR index, using the --quantMode TranscriptomeSAM option to obtain a transcriptome-based alignment BAM file. Per-gene and per-transcript TPM (transcripts per million) are then estimated using Salmon (v1.10.2) with the matching Gencode GFF3 and transcript fasta files.
Small somatic variants (SNVs, indels) are first called using GATK Mutect2 (v4.1.8.0).
- Variants are annotated using VEP (ENSEMBL v110.1).
- Both gene (GX) and transcript (TX) expression values are then added using vatools (v5.1.0) and the previously computed expression files.
- RNA depth (RDP) and RNA allelic ratio (RAF) are then added using a combination of bcftools (v1.15.1), GATK SelectVariants (v4.1.9.0) and bam-readcount (v0.8).
pVACseq is then run on the resulting variant file, using the HLA typing files (for MHCI & MHCII).
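To make the two expression-related quantities above more concrete, here is a minimal Python sketch of how TPM and an RNA allelic ratio can be computed. It only illustrates the textbook definitions: Salmon estimates TPM with its own statistical model, and the exact way the pipeline derives RAF from bam-readcount output is not described here, so the `rna_allelic_ratio` definition (variant-supporting reads divided by depth) is an assumption. All function and variable names are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to TPM (transcripts per million)."""
    # Reads per kilobase of effective transcript length.
    rates = [c / (length / 1000.0) for c, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Rescale so that all values sum to one million.
    return [rate / total * 1e6 for rate in rates]


def rna_allelic_ratio(alt_reads, depth):
    """Assumed definition: fraction of RNA reads at a site supporting the variant."""
    return alt_reads / depth if depth > 0 else 0.0


# Toy values, not taken from any real sample.
values = tpm(counts=[100, 300, 600], effective_lengths=[1000, 1000, 2000])
print(values)
print(rna_allelic_ratio(alt_reads=12, depth=48))
```

The second and third toy transcripts end up with the same TPM because the third one has twice the reads but also twice the length.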

pVACfuse

Arriba (v2.4.0) is run on a subset of the original STAR-aligned file that contains only reads of putative relevance to fusion detection, such as unmapped and clipped reads.
pVACfuse is then run on the list of filtered fusions of interest, using both HLA typing files.
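As an illustration of what "unmapped and clipped reads" means at the alignment level, the following Python sketch flags SAM records that are unmapped (flag bit 0x4) or whose CIGAR string contains soft or hard clipping. This is only a sketch of the idea: the actual selection criteria used by the pipeline or by Arriba may be broader, and `is_fusion_candidate` is a hypothetical helper name.

```python
import re

# A CIGAR operation of type S (soft clip) or H (hard clip).
CLIP_OPERATION = re.compile(r"\d+[SH]")


def is_fusion_candidate(sam_line):
    """Return True for an unmapped or clipped SAM alignment record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # bitwise FLAG column
    cigar = fields[5]       # CIGAR column, "*" when unavailable
    unmapped = bool(flag & 0x4)
    clipped = cigar != "*" and CLIP_OPERATION.search(cigar) is not None
    return unmapped or clipped


# Toy records with the 11 mandatory SAM columns (SEQ and QUAL left as "*").
records = [
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t200\t60\t30M20S\t*\t0\t0\t*\t*",
]
kept = [line.split("\t")[0] for line in records if is_fusion_candidate(line)]
print(kept)
```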

Workflow

[Workflow diagrams: HLAtyping, DNAseq, RNAseq, Fusion]

Run the pipeline from a sample plan

Arguments & Parameters

sample_plan: CSV file with one sample per row
assembly: genome assembly used for the analysis (example: hg38)
genomePath: path containing the files described in "conf/genomes.config"

singularityImagePath: path to the Singularity images
vep_dir_cache: path to the downloaded VEP cache (here: species="homosapiens" & version="110GRCh38")
vep_plugin_repo: path to the VEP_plugins repository into which Frameshift.pm was downloaded

blacklisttsv: file called "blacklist${assembly}*.tsv.gz", found in the /database folder of the downloaded arriba archive
proteinGff: file called "proteindomain${assembly}*.gff3", found in the /database folder of the downloaded arriba archive

mi_license: path to the "mi.license" file needed for MiXCR (free for academic use)
tmpdir: path to a temporary folder

    nextflow run main.nf --samplePlan ${sample_plan} \
      --genome ${assembly} \
      --genomeAnnotationPath ${genomePath} \
      --outDir ${outputDir} \
      --singularityImagePath ${sif} \
      --vepDirCache ${vep_dir_cache} \
      --vepPluginRepo ${vep_plugin_repo} \
      --miLicense ${mi_license} \
      --tmpdir ${tmpdirp} \
      -profile singularity,cluster \
      -w ${tmp_dir} \
      -resume
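When the pipeline is launched from a script rather than typed by hand, the same command can be assembled programmatically. The sketch below builds the argument list in Python from a dictionary of the pipeline options used in the command above. It only prints the command and does not execute it. All paths are placeholders, and `build_nextflow_command` is a hypothetical helper, not part of the pipeline.

```python
import shlex


def build_nextflow_command(params, profile="singularity,cluster",
                           work_dir=None, resume=True):
    """Assemble a 'nextflow run main.nf' argument list."""
    cmd = ["nextflow", "run", "main.nf"]
    # Pipeline parameters use a double dash.
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    # Nextflow's own options use a single dash.
    cmd += ["-profile", profile]
    if work_dir is not None:
        cmd += ["-w", str(work_dir)]
    if resume:
        cmd.append("-resume")
    return cmd


# Placeholder values: replace with real paths before use.
params = {
    "samplePlan": "/path/to/sample_plan.csv",
    "genome": "hg38",
    "genomeAnnotationPath": "/path/to/annotations",
    "outDir": "/path/to/results",
    "singularityImagePath": "/path/to/images",
    "vepDirCache": "/path/to/vep_cache",
    "vepPluginRepo": "/path/to/VEP_plugins",
    "miLicense": "/path/to/mi.license",
    "tmpdir": "/path/to/tmp",
}
cmd = build_nextflow_command(params, work_dir="/path/to/work")
print(shlex.join(cmd))
```

Passing `cmd` as a list to `subprocess.run` would avoid shell quoting issues with paths that contain spaces.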

Sample plan

A sample plan is a comma-separated CSV file that lists all the samples with their biological IDs. It is expected to contain the following fields, with no header line:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex
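Because the sample plan has no header, the column order above is the only contract. A small Python sketch such as the following can be used to check a sample plan before launching the pipeline. It assumes exactly the twelve fields listed above as a minimum. The example row and its paths are invented, and `read_sample_plan` is a hypothetical helper rather than code from the pipeline.

```python
import csv
import io

FIELDS = [
    "sampleID", "sampleName", "normalName",
    "path_to_fastqDnaR1", "path_to_fastqDnaR2",
    "path_to_sampleDnaBam", "path_to_sampleDnaBamIndex",
    "path_to_vcf",
    "path_to_fastqRnaR1", "path_to_fastqRnaR2",
    "path_to_sampleRnaBam", "path_to_sampleRnaBamIndex",
]


def read_sample_plan(handle):
    """Parse a header-less sample plan into a list of dictionaries."""
    rows = []
    reader = csv.reader(handle, skipinitialspace=True)
    for line_no, record in enumerate(reader, start=1):
        if not record:  # skip blank lines
            continue
        if len(record) < len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected at least {len(FIELDS)} fields, "
                f"got {len(record)}"
            )
        # zip() ignores any extra trailing columns.
        rows.append(dict(zip(FIELDS, record)))
    return rows


# Invented example row with placeholder paths.
example = (
    "S1,tumor1,normal1,/data/dna_R1.fastq.gz,/data/dna_R2.fastq.gz,"
    "/data/dna.bam,/data/dna.bam.bai,/data/somatic.vcf.gz,"
    "/data/rna_R1.fastq.gz,/data/rna_R2.fastq.gz,"
    "/data/rna.bam,/data/rna.bam.bai\n"
)
plan = read_sample_plan(io.StringIO(example))
print(plan[0]["sampleID"], plan[0]["path_to_vcf"])
```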

Steps

The basic steps are: HLAtyping, RNAquant, pVacseq, pVacfuse, mixcr. With the --step option, they can be used separately (e.g. --step HLAtyping, --step RNAquant or --step mixcr), partially combined (e.g. --step HLAtyping,RNAquant,pVacseq or --step HLAtyping,pVacfuse) or all together, which is the default mode (--step HLAtyping,RNAquant,pVacseq,pVacfuse,mixcr).
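The comma-separated value passed to --step can be checked before a run with a few lines of Python. This is a sketch under assumptions: it treats step names as case-sensitive and returns them in the order listed above, while the pipeline's own validation logic is not documented here and may behave differently. `parse_steps` is a hypothetical helper.

```python
KNOWN_STEPS = ("HLAtyping", "RNAquant", "pVacseq", "pVacfuse", "mixcr")


def parse_steps(value=None):
    """Turn a --step value into a validated list of step names."""
    if value is None:
        # No --step given: default mode, all steps.
        return list(KNOWN_STEPS)
    requested = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(unknown)}")
    # Keep the listed order of the steps and drop duplicates.
    return [name for name in KNOWN_STEPS if name in requested]


print(parse_steps("HLAtyping,pVacfuse"))
print(parse_steps())
```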

HLA typing

If you only want the HLA alleles (MHCI & MHCII), add "--step HLAtyping" to your command line. If you already have the two HLA allele files (MHCI & MHCII), add their full paths to the sample plan as follows:

RNA expression

If you only want the transcript-based and gene-based expression files (TPM), add "--step RNAquant" to your command line. If you already have the gene-based and transcript-based expression files, add their full paths to the sample plan as follows:

or, if you want to run the HLAtyping step (--step HLAtyping,RNAquant,pVacseq), with the two fields before the expression files left empty:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex,,,path_to_gene_tpm_file,path_to_transcript_tpm_file
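The extended row above has sixteen fields: the twelve standard ones, two empty fields, then the two expression files. A Python sketch for generating such a row could look like this. It assumes that the two empty fields are the MHCI and MHCII HLA allele files, in that order, which the text suggests but does not state explicitly. The paths are placeholders and `extended_row` is a hypothetical helper.

```python
import csv
import io


def extended_row(base_fields, hla_mhc1="", hla_mhc2="",
                 gene_tpm="", transcript_tpm=""):
    """Append the four optional fields to a 12-field sample plan row."""
    if len(base_fields) != 12:
        raise ValueError(f"expected 12 base fields, got {len(base_fields)}")
    # Assumed column order: HLA MHCI, HLA MHCII, gene TPM, transcript TPM.
    return list(base_fields) + [hla_mhc1, hla_mhc2, gene_tpm, transcript_tpm]


# Placeholder values for the twelve standard fields.
base = [
    "S1", "tumor1", "normal1",
    "/data/dna_R1.fastq.gz", "/data/dna_R2.fastq.gz",
    "/data/dna.bam", "/data/dna.bam.bai",
    "/data/somatic.vcf.gz",
    "/data/rna_R1.fastq.gz", "/data/rna_R2.fastq.gz",
    "/data/rna.bam", "/data/rna.bam.bai",
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# HLA fields stay empty so that the HLAtyping step can fill them in.
writer.writerow(extended_row(base,
                             gene_tpm="/data/gene_tpm.tsv",
                             transcript_tpm="/data/transcript_tpm.tsv"))
line = buffer.getvalue().strip()
print(line)
```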

Test

Run the pipeline on the test dataset that will launch HLAtyping:

    nextflow run main.nf -profile test,singularity --outDir ${outputDir} --singularityImagePath ${sif} -w ${work_dir}

Credits

This pipeline was written by the Institut Curie bioinformatics platform CUBIC (E. Girard, N. Servant). The project was funded by IMMUcan, the integrated European immuno-oncology profiling platform.

Contacts

For any question, bug or suggestion, please use the issue system or contact the bioinformatics core facility.

Owner

Name: Institut Curie, Bioinformatics Core Facility
Login: bioinfo-pf-curie
Kind: organization
Location: Paris, France

Website: https://bioinfo-pf-curie.github.io/
Repositories: 11
Profile: https://github.com/bioinfo-pf-curie

bioinformatics platform of the Institut Curie

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Dependencies

geniac/docs/environment.yml pypi

sphinx-rtd-theme ==1.0.0

geniac/docs/requirements.txt pypi

GitPython ==3.1.20
cmake ==3.21.3
colorlog ==6.5.0
dotty-dict ==1.3.0
importlib-metadata *
sphinx ==3.5.4
sphinx-rtd-theme ==1.0.0
validators ==0.18.2

geniac/environment.yml pypi

GitPython ==3.1.20
colorlog ==6.5.0
dotty-dict ==1.3.0
geniac *
pre-commit ==2.15.0
pytest ==6.2.5
pytest-cov ==3.0.0
pytest-datadir ==1.3.1
pytest-datafiles ==2.0
pytest-icdiff ==0.5
pytest-sugar ==0.9.4
setuptools-scm ==6.3.2
tox-conda ==0.8.3
validators ==0.18.2
wheel ==0.37.0

geniac/pyproject.toml pypi

geniac/setup.py pypi
