https://github.com/bioinfo-pf-curie/nf-neoant

Detection of neoantigens from WES and RNA sequencing data

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: bioinfo-pf-curie
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 2.89 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog License

README.md

nf-neoant

nf-neoAnt pipeline

Introduction

The pipeline is built using Nextflow, a workflow manager that runs tasks across multiple compute infrastructures in a very portable manner. It supports the conda package manager and Singularity / Docker containers, which makes installation easier and results highly reproducible.

Pipeline summary

The objective of the pipeline is to predict tumor-specific neoantigens based on both DNA and RNA next-generation sequencing data from patients.

  • HLA typing is performed by seq2HLA (v2.2) for both MHCI and MHCII, based on the paired RNA FASTQ files.

  • Detection of neoantigens is performed by the pVACtools suite (v4.1.1). This part of the pipeline is divided into two branches, one focusing on DNA-based analysis (pVACseq) and the other on fusion events derived from RNA-seq data (pVACfuse).

  • MiXCR (v4.5.0) was added to provide a fast analysis of raw T- or B- cell receptor repertoires.

pVACseq

  • Paired RNA-seq reads are aligned with STAR (v2.7.6a) against the STAR index, using the --quantMode TranscriptomeSAM option to obtain a BAM file of transcriptome-based alignments. Per-gene and per-transcript TPM (transcripts per million) values are then estimated with Salmon (v1.10.2), using the matching Gencode GFF3 and transcript FASTA files.

  • Small somatic variants (SNVs, indels) were first called using GATK Mutect2 (v4.1.8.0).

    • Variants were annotated using VEP (ENSEMBL v110.1).
    • Both gene (GX) and transcript (TX) expression values were then added using vatools (v5.1.0) and the previously computed expression files.
    • RNA depth (RDP) and RNA allelic ratio (RAF) were then added using a combination of bcftools (v1.15.1), GATK SelectVariants (v4.1.9.0) and bam-readcount (v0.8).
  • pVACseq was then run using HLA typing files (for MHCI & MHCII) on the resulting variant file.
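
The expression (TPM) and RNA-support (RAF) annotations listed above rest on two small calculations. The following is a minimal sketch for illustration only, not the pipeline's code: Salmon and the bam-readcount-based annotation compute these values themselves, and the function names `tpm` and `rna_allelic_ratio`, as well as the toy numbers, are hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert estimated read counts to TPM (transcripts per million).

    Each count is divided by the transcript's effective length, then the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = {tx: counts[tx] / eff_lengths[tx] for tx in counts}
    total = sum(rates.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def rna_allelic_ratio(alt_reads: int, depth: int) -> float:
    """Fraction of RNA reads at a variant site that support the alternate allele."""
    if depth == 0:
        return 0.0  # no RNA coverage at this site
    return alt_reads / depth


# Hypothetical toy values, for illustration only.
example_tpm = tpm({"tx1": 100, "tx2": 300}, {"tx1": 1000, "tx2": 1000})
example_raf = rna_allelic_ratio(alt_reads=12, depth=48)
print(example_tpm, example_raf)
```

A variant with a low RNA depth or a low allelic ratio has little evidence of being expressed, which is why these values are attached to the VCF before pVACseq is run.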

pVACfuse

  • Arriba (v2.4.0) was run on a subset of the original STAR aligned file containing only reads of putative relevance to fusion detection, such as unmapped and clipped reads.
  • pVACfuse was then run on the list of filtered fusions of interest, using both HLA typing files.

Workflow

[Workflow diagrams: HLAtyping, DNAseq, RNAseq, Fusion]

Run the pipeline from a sample plan

Arguments & Parameters

  • sample_plan: CSV file containing one sample per row

  • assembly: the genome assembly for the analysis (example: hg38)

  • genomePath: path containing the different files described in "conf/genomes.config"

  • singularityImagePath: path to singularity images

  • vep_dir_cache: path to the VEP cache downloaded following the VEP installation instructions (here: species="homo_sapiens" & version="110_GRCh38")

  • vep_plugin_repo: path to the VEP_plugins repository into which Frameshift.pm was downloaded.

  • blacklist_tsv: file from the downloaded arriba archive (in the /database folder) called "blacklist_${assembly}_*.tsv.gz"

  • proteinGff: file from the downloaded arriba archive (in the /database folder) called "protein_domains_${assembly}_*.gff3"

  • mi_license: path to the "mi.license" file needed for MiXCR (free for academic use)

  • tmpdir: path to temporary folder

nextflow run main.nf --samplePlan ${sample_plan} \
    --genome ${assembly} \
    --genomeAnnotationPath ${genomePath} \
    --outDir ${outputDir} \
    --singularityImagePath ${sif} \
    --vepDirCache ${vep_dir_cache} \
    --vepPluginRepo ${vep_plugin_repo} \
    --miLicense ${mi_license} \
    --tmpdir ${tmpdirp} \
    -profile singularity,cluster \
    -w ${tmp_dir} \
    -resume
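
When the command above is launched from a script, it can help to assemble it as an argument list. The sketch below is only an illustration: the helper `build_command` and all paths are hypothetical placeholders, and only flags that appear in the command above are used. It also makes the distinction visible between pipeline parameters (double dash) and Nextflow's own options (single dash).

```python
import shlex
from typing import Dict, List


def build_command(params: Dict[str, str], profile: str, work_dir: str) -> List[str]:
    """Assemble a `nextflow run` argument list from pipeline parameters."""
    cmd = ["nextflow", "run", "main.nf"]
    for name, value in params.items():
        cmd += [f"--{name}", value]  # pipeline parameters take a double dash
    # Nextflow's own options take a single dash.
    cmd += ["-profile", profile, "-w", work_dir, "-resume"]
    return cmd


# Placeholder values (assumptions): replace with real locations.
params = {
    "samplePlan": "/path/to/sample_plan.csv",
    "genome": "hg38",
    "genomeAnnotationPath": "/path/to/annotations",
    "outDir": "/path/to/results",
    "singularityImagePath": "/path/to/images",
}
command = build_command(params, "singularity,cluster", "/path/to/work")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` without any shell quoting concerns.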

Sample plan

A sample plan is a CSV (comma-separated) file that lists all the samples with their biological IDs. The sample plan is expected to contain the following fields (with no header):

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex
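
As an illustration of the layout above, the following sketch writes one header-less sample plan row with Python's `csv` module. The sample values and paths are invented, and leaving unused fields empty is an assumption made here for illustration; check which inputs each step actually requires.

```python
import csv
import io

# Column order as listed in the sample plan description (the file itself has no header).
FIELDS = [
    "sampleID", "sampleName", "normalName",
    "path_to_fastqDnaR1", "path_to_fastqDnaR2",
    "path_to_sampleDnaBam", "path_to_sampleDnaBamIndex",
    "path_to_vcf",
    "path_to_fastqRnaR1", "path_to_fastqRnaR2",
    "path_to_sampleRnaBam", "path_to_sampleRnaBamIndex",
]


def sample_plan_row(values: dict) -> list:
    """Return the values in sample plan order; missing fields become empty strings."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    return [values.get(name, "") for name in FIELDS]


# Hypothetical sample, for illustration only.
sample = {
    "sampleID": "S1",
    "sampleName": "tumor1",
    "normalName": "normal1",
    "path_to_fastqDnaR1": "/data/S1_dna_R1.fastq.gz",
    "path_to_fastqDnaR2": "/data/S1_dna_R2.fastq.gz",
    "path_to_fastqRnaR1": "/data/S1_rna_R1.fastq.gz",
    "path_to_fastqRnaR2": "/data/S1_rna_R2.fastq.gz",
}

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(sample_plan_row(sample))
plan_text = buffer.getvalue()
print(plan_text, end="")
```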

Steps

The basic steps are the following: HLAtyping, RNAquant, pVacseq, pVacfuse, mixcr. Using the --step option, they can be run separately (e.g. --step HLAtyping, --step RNAquant or --step mixcr), combined partially (e.g. --step HLAtyping,RNAquant,pVacseq or --step HLAtyping,pVacfuse), or all together (default mode; --step HLAtyping,RNAquant,pVacseq,pVacfuse,mixcr).
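
The comma-separated form of the --step value can be illustrated with a small parser. This is a hypothetical sketch of how such a value might be checked before launching a run; it is not the pipeline's own validation logic, and `parse_steps` is an invented name.

```python
from typing import List, Optional

# Step names as listed above.
KNOWN_STEPS = ["HLAtyping", "RNAquant", "pVacseq", "pVacfuse", "mixcr"]


def parse_steps(value: Optional[str] = None) -> List[str]:
    """Split a comma-separated --step value and reject unknown step names.

    An empty or missing value is treated as the default mode (all steps).
    """
    if not value:
        return list(KNOWN_STEPS)
    steps = [s.strip() for s in value.split(",") if s.strip()]
    unknown = [s for s in steps if s not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(unknown)}")
    return steps


print(parse_steps("HLAtyping,pVacfuse"))
print(parse_steps())
```

Step names are matched case-sensitively in this sketch, so a value such as `pvacseq` would be rejected.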

HLA typing

If you only want to get the HLA alleles (MHCI & MHCII), add "--step HLAtyping" to your command line. If you already have the two HLA allele files (MHCI & MHCII), append their full paths to the sample plan as follows:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex,path_to_HLAI_file,path_to_HLAII_file

RNA expression

If you only want to get the transcript- and gene-based expression files (TPM), add "--step RNAquant" to your command line. If you already have the gene-based and transcript-based expression files, append their full paths to the sample plan as follows:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex,path_to_HLAI_file,path_to_HLAII_file,path_to_gene_tpm_file,path_to_transcript_tpm_file

or, if you also want to run the HLAtyping step (--step HLAtyping,RNAquant,pVacseq), leave the two HLA fields empty:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex,,,path_to_gene_tpm_file,path_to_transcript_tpm_file
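
The optional trailing columns described above (HLA files, then expression files) can be read back with a small parser. The sketch below is an assumption-laden illustration, not the pipeline's parser: the function name `parse_row`, the use of `None` for empty fields, and the example row are all hypothetical, and the column names are only labels since the file has no header.

```python
import csv
import io
from typing import Dict, List, Optional

BASE_FIELDS = [
    "sampleID", "sampleName", "normalName",
    "path_to_fastqDnaR1", "path_to_fastqDnaR2",
    "path_to_sampleDnaBam", "path_to_sampleDnaBamIndex",
    "path_to_vcf",
    "path_to_fastqRnaR1", "path_to_fastqRnaR2",
    "path_to_sampleRnaBam", "path_to_sampleRnaBamIndex",
]
OPTIONAL_FIELDS = [
    "path_to_HLAI_file", "path_to_HLAII_file",
    "path_to_gene_tpm_file", "path_to_transcript_tpm_file",
]
ALL_FIELDS = BASE_FIELDS + OPTIONAL_FIELDS


def parse_row(row: List[str]) -> Dict[str, Optional[str]]:
    """Map one sample plan row to named fields; empty or absent fields become None."""
    if not len(BASE_FIELDS) <= len(row) <= len(ALL_FIELDS):
        raise ValueError(f"expected 12 to 16 fields, got {len(row)}")
    padded = [cell.strip() for cell in row] + [""] * (len(ALL_FIELDS) - len(row))
    return {name: (cell or None) for name, cell in zip(ALL_FIELDS, padded)}


# Hypothetical row: HLA fields left empty, expression files provided.
example_cells = [
    "S1", "tumor1", "normal1", "", "", "/data/S1.bam", "/data/S1.bam.bai",
    "/data/S1.vcf", "/data/S1_R1.fastq.gz", "/data/S1_R2.fastq.gz", "", "",
    "", "", "/data/S1.gene.tpm", "/data/S1.tx.tpm",
]
line = ",".join(example_cells)
record = parse_row(next(csv.reader(io.StringIO(line))))
print(record["path_to_HLAI_file"], record["path_to_gene_tpm_file"])
```

A record whose HLA fields are `None` corresponds to the case where the HLAtyping step still has to be run.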

Test

Run the pipeline on the test dataset, which launches the HLAtyping step:

nextflow run main.nf -profile test,singularity \
    --outDir ${outputDir} \
    --singularityImagePath ${sif} \
    -w ${work_dir}

Credits

This pipeline was written by the Institut Curie bioinformatics platform CUBIC (E.Girard, N.Servant). The project was funded by IMMUcan, the integrated European immuno-oncology profiling platform.

Contacts

For any question, bug or suggestion, please use the issue system or contact the bioinformatics core facility.

Owner

  • Name: Institut Curie, Bioinformatics Core Facility
  • Login: bioinfo-pf-curie
  • Kind: organization
  • Location: Paris, France

bioinformatics platform of the Institut Curie

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

geniac/docs/environment.yml pypi
  • sphinx-rtd-theme ==1.0.0
geniac/docs/requirements.txt pypi
  • GitPython ==3.1.20
  • cmake ==3.21.3
  • colorlog ==6.5.0
  • dotty-dict ==1.3.0
  • importlib-metadata *
  • sphinx ==3.5.4
  • sphinx-rtd-theme ==1.0.0
  • validators ==0.18.2
geniac/environment.yml pypi
  • GitPython ==3.1.20
  • colorlog ==6.5.0
  • dotty-dict ==1.3.0
  • geniac *
  • pre-commit ==2.15.0
  • pytest ==6.2.5
  • pytest-cov ==3.0.0
  • pytest-datadir ==1.3.1
  • pytest-datafiles ==2.0
  • pytest-icdiff ==0.5
  • pytest-sugar ==0.9.4
  • setuptools-scm ==6.3.2
  • tox-conda ==0.8.3
  • validators ==0.18.2
  • wheel ==0.37.0
geniac/pyproject.toml pypi
geniac/setup.py pypi