variantalker

https://github.com/zhanyinx/variantalker

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: found 1 DOI reference(s) in README
✓ Academic publication links: links to science.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (11.6%) to scientific vocabulary


Repository

Basic Info

Host: GitHub
Owner: zhanyinx
License: other
Language: Python
Default Branch: main
Size: 81.6 MB

Statistics

Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 6

Created about 3 years ago · Last pushed about 2 years ago


Variant annotation and prioritization pipeline

Contents
Overview
Installation
Documentation
Usage
Input
Output
Liability

Overview

Variant annotation in cancer genomics involves identifying and characterizing the genetic changes (variants) that contribute to cancer development and progression. The challenge is that there are many different types of variants that can occur in the genome, and not all of them are relevant to cancer. Therefore, accurate annotation is critical for identifying the key driver mutations and designing targeted therapies. However, this process is complicated by the large number of potential variants, the need to integrate data from multiple sources, and the ongoing discovery of new cancer-associated variants.

We have developed a Nextflow pipeline called variantalker that enables users to annotate variants from VCF files. The pipeline supports VCF files generated by DRAGEN, nf-core/sarek, and Ion Torrent platforms.

BETA version: the pipeline can also extract biomarkers such as TMB, mutational signatures (APOBEC, UV and tobacco), clonal TMB (if bam/cram files and sex are provided), expression of specific genes (if RNA-seq data are provided), gene CNV, etc. For more information, see the repository (https://github.com/zhanyinx/variantalker).

Installation

Clone the repo

    git clone https://github.com/zhanyinx/variantalker.git

variantalker relies on Annovar software and Funcotator databases.

Download the updated databases. Separate repositories for hg19 and hg38 are available.

    wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg38 https://bioserver.ieo.it/repo/dima/hg38
    wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg19 https://bioserver.ieo.it/repo/dima/hg19

Documentation

The pipeline employs several tools to annotate and prioritize variants:

Funcotator for variant annotation
CancerVar for somatic variants prioritization
InterVar for germline variants annotation
Annovar: CancerVar and InterVar rely on Annovar.
CIViC: somatic variant classification using CIViC evidence level.
AlphaMissense: somatic and germline variant prioritization.

To ensure the accuracy of the pipeline, the databases for Funcotator and Annovar must be regularly updated using the provided tools found here: update utilities.

Usage

If you are using variantalker for the first time, please consider updating the databases following the instructions.

Modify the configuration file (nextflow.config) by setting the following parameters:

funcotatorgermlinedb: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521g
funcotatorsomaticdb: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521s
annovardb: e.g. path2/public_databases/humandb
annovarsoftwarefolder: e.g. path2/annovar
alphamisgenomebasedir: e.g. path2/public_databases
fasta: path to fasta file used to generate the vcf
target: path to the target bed file

The main command for the annotation is:

    nextflow run path_to/main.nf -c yourconfig -profile singularity --input samplesheet.csv --outdir outdir

To list the available parameters, including hidden ones:

    nextflow run path_to/main.nf --help --show_hidden_params
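
When the pipeline is launched from a script (for example, once per batch of samples), the annotation command above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the function name `build_variantalker_command` and its arguments are hypothetical and not part of variantalker.

```python
import shlex


def build_variantalker_command(main_nf, config, samplesheet, outdir, profile="singularity"):
    """Assemble the documented `nextflow run` invocation as an argument list."""
    # Only flags that appear in the README command are used here.
    return [
        "nextflow", "run", str(main_nf),
        "-c", str(config),
        "-profile", profile,
        "--input", str(samplesheet),
        "--outdir", str(outdir),
    ]


# Placeholder paths copied from the README command.
cmd = build_variantalker_command("path_to/main.nf", "yourconfig", "samplesheet.csv", "outdir")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems when paths contain spaces.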

Input

variantalker takes as input a csv samplesheet with 4 columns

IMPORTANT: HEADER is required

| patient  | tumor_tissue | sample_file       | sample_type |
| -------- | ------------ | ----------------- | ----------- |
| patient1 | Lung         | path/tumor.vcf.gz | somatic     |
| .....    | .....        | .....             | .....       |

sample_file must be provided as a full path, not a relative path.

Available sample_type are: somatic, germline, cnv.

somatic: a tumor-only (single-sample) or tumor-normal (multi-sample) vcf.gz file. Requires tumor_tissue to be specified.
germline: a single-sample vcf.gz file. Does not require tumor_tissue.
cnv: for nf-core/sarek, CNVkit output (cnr file) is supported; for DRAGEN, a vcf.gz file is required. Does not require tumor_tissue.

Available tumor_tissue values are: AdrenalGland, BileDuct, Bladder, Blood, Bone, BoneMarrow, Brain, Breast, Cancerall, Cervix, Colorectal, Esophagus, Eye, HeadandNeck, Inflammatory, Intrahepatic, Kidney, Liver, Lung, LymphNodes, NervousSystem, Other, Ovary, Pancreas, Pleura, Prostate, Skin, SoftTissue, Stomach, Testis, Thymus, Thyroid, Uterus
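
The samplesheet rules above can be checked before launching the pipeline. The sketch below is not part of variantalker; it assumes the four column names are `patient`, `tumor_tissue`, `sample_file` and `sample_type`, and it checks only the rules stated in this section (header present, known sample type, full path, tumor tissue for somatic samples). It does not validate the tissue name itself.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed column names, taken from the samplesheet description.
REQUIRED_COLUMNS = ["patient", "tumor_tissue", "sample_file", "sample_type"]
SAMPLE_TYPES = {"somatic", "germline", "cnv"}


def validate_samplesheet(handle):
    """Return a list of human-readable problems found in a samplesheet."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"header is missing column(s): {', '.join(missing)}"]
    problems = []
    for row in reader:
        line_no = reader.line_num  # physical line of the row just read
        sample_type = (row["sample_type"] or "").strip()
        sample_file = (row["sample_file"] or "").strip()
        tumor_tissue = (row["tumor_tissue"] or "").strip()
        if sample_type not in SAMPLE_TYPES:
            problems.append(f"line {line_no}: unknown sample_type {sample_type!r}")
        if not PurePosixPath(sample_file).is_absolute():
            problems.append(f"line {line_no}: sample_file must be a full path")
        if sample_type == "somatic" and not tumor_tissue:
            problems.append(f"line {line_no}: somatic samples need tumor_tissue")
    return problems


# Made-up rows for illustration only.
example_text = (
    "patient,tumor_tissue,sample_file,sample_type\n"
    "patient1,Lung,/data/tumor.vcf.gz,somatic\n"
    "patient2,,relative/germline.vcf.gz,germline\n"
    "patient3,,/data/tumor2.vcf.gz,somatic\n"
)
for problem in validate_samplesheet(io.StringIO(example_text)):
    print(problem)
```

For a real file, pass an open handle instead: `with open("samplesheet.csv", newline="") as fh: validate_samplesheet(fh)`.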

Output

Output structure:

For each sample, variantalker outputs multiple files:

1) maf file with all the annotations
2) vcf file with the PASS variants
3) filtered pass file with variants passing the filters (see below)
4) filtered nopass file with variants not passing the filters (see below)
5) cnv annotated file (if cnv samples are provided)
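
As a quick sanity check on an annotated maf file, the variants can be tallied by class. The sketch below assumes the file follows the usual MAF layout (tab-separated, optional `#` comment lines, a `Variant_Classification` column); whether variantalker's maf output keeps exactly these column names is an assumption, and the example rows are made up.

```python
import csv
from collections import Counter


def count_variant_classes(maf_lines):
    """Count rows per Variant_Classification in MAF-formatted text lines."""
    # Skip comment lines so the first remaining line is the column header.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["Variant_Classification"] for row in reader)


# Made-up MAF content for illustration only.
example = [
    "#version 2.4\n",
    "Hugo_Symbol\tVariant_Classification\n",
    "TP53\tMissense_Mutation\n",
    "KRAS\tMissense_Mutation\n",
    "BRCA2\tSilent\n",
]
counts = count_variant_classes(example)
print(counts)
```

The function accepts any iterable of lines, so an open file handle can be passed in place of the example list.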

Default filters applied:

"Silent", "IGR", "RNA" variant types are filtered out (unless it's pathogenic or likely pathogenic for clinvar/cancervar/intervar)
minimum coverage 50 (unless it's pathogenic or likely pathogenic for clinvar/cancervar/intervar)
minimum somatic VAF: 0.01
minimum germline VAF: 0.2
InterVar classes to be kept: Pathogenic,Likely pathogenic (logic OR)
CancerVar classes to be kept: TierIIpotential,TierIstrong (logic OR)
ReNOVo class to be kept: LP Pathogenic,IP Pathogenic,HP Pathogenic (logic OR)
CIViC evidence levels to be kept: A,B,C (logic OR)
no filters on genes (somatic or germline)

Logic OR filters: a variant is kept if at least one of the OR filters is true
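
The individual default filters can be written as small predicates. This is only a sketch of the rules as listed above, not the pipeline's implementation: the variant is represented as a plain dict with hypothetical keys (`variant_type`, `coverage`, `vaf`, `is_pathogenic`, `intervar`, `cancervar`, `renovo`, `civic_level`), and because this section does not state how the pipeline combines the thresholds with the OR group, the predicates are deliberately kept separate.

```python
EXCLUDED_TYPES = {"Silent", "IGR", "RNA"}
MIN_COVERAGE = 50
MIN_VAF = {"somatic": 0.01, "germline": 0.2}
# Class labels are copied as written in the filter list above.
KEEP_INTERVAR = {"Pathogenic", "Likely pathogenic"}
KEEP_CANCERVAR = {"TierIIpotential", "TierIstrong"}
KEEP_RENOVO = {"LP Pathogenic", "IP Pathogenic", "HP Pathogenic"}
KEEP_CIVIC = {"A", "B", "C"}


def passes_type_filter(variant):
    """Excluded variant types are dropped unless flagged (likely) pathogenic."""
    return bool(variant["variant_type"] not in EXCLUDED_TYPES or variant.get("is_pathogenic"))


def passes_coverage_filter(variant):
    """Low-coverage variants are dropped unless flagged (likely) pathogenic."""
    return bool(variant["coverage"] >= MIN_COVERAGE or variant.get("is_pathogenic"))


def passes_vaf_filter(variant, sample_type):
    """Apply the somatic or germline minimum VAF."""
    return variant["vaf"] >= MIN_VAF[sample_type]


def passes_or_filters(variant):
    """True if at least one of the OR filters keeps the variant."""
    return (
        variant.get("intervar") in KEEP_INTERVAR
        or variant.get("cancervar") in KEEP_CANCERVAR
        or variant.get("renovo") in KEEP_RENOVO
        or variant.get("civic_level") in KEEP_CIVIC
    )


# Hypothetical variant: silent and low coverage, but flagged pathogenic.
variant = {"variant_type": "Silent", "coverage": 30, "vaf": 0.05,
           "is_pathogenic": True, "civic_level": "B"}
print(passes_type_filter(variant), passes_coverage_filter(variant))
print(passes_vaf_filter(variant, "somatic"), passes_vaf_filter(variant, "germline"))
print(passes_or_filters(variant))
```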

Liability

Variantalker assumes no responsibility for any injury to persons or damage to property arising out of, or related to, any use of Variantalker, or for any errors or omissions. The user recognizes they are using Variantalker at their own risk.

Owner

Name: Yinxiu Zhan
Login: zhanyinx
Kind: user
Location: Milan
Company: IEO

Website: https://www.linkedin.com/in/yinxiu-zhan-75a18380/
Repositories: 3
Profile: https://github.com/zhanyinx

Head of the data science unit at the European Institute of Oncology (IEO)

GitHub Events

Total

Release event: 1
Watch event: 1
Member event: 3
Push event: 11
Pull request event: 2
Fork event: 2
Create event: 3


Dependencies

Dockerfile docker

r-base latest build

streamlit_app/requirements.txt pypi

numpy *
pandas *
streamlit *
streamlit-aggrid *
