variantalker
Repository
Basic Info
- Host: GitHub
- Owner: zhanyinx
- License: other
- Language: Python
- Default Branch: main
- Size: 81.6 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 6
Metadata Files
README.md
Variant annotation and prioritization pipeline
Overview
Variant annotation in cancer genomics involves identifying and characterizing the genetic changes (variants) that contribute to cancer development and progression. The challenge is that there are many different types of variants that can occur in the genome, and not all of them are relevant to cancer. Therefore, accurate annotation is critical for identifying the key driver mutations and designing targeted therapies. However, this process is complicated by the large number of potential variants, the need to integrate data from multiple sources, and the ongoing discovery of new cancer-associated variants.
We have developed a Nextflow pipeline called variantalker that annotates variants from VCF files. It supports VCF files generated by DRAGEN, nf-core/sarek, and Ion Torrent platforms.
BETA version: variantalker can also extract biomarkers such as TMB, mutational signatures (APOBEC, UV, and tobacco), clonal TMB (if BAM/CRAM files and sex are provided), expression of specific genes (if RNA-seq data are provided), gene CNV, etc. For more information, see the repository documentation.
Installation
Clone the repo
```bash
git clone https://github.com/zhanyinx/variantalker.git
```
variantalker relies on Annovar software and Funcotator databases.
Download the updated databases. Separate repositories for hg19 and hg38 are available.
```bash
wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg38 https://bioserver.ieo.it/repo/dima/hg38
wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg19 https://bioserver.ieo.it/repo/dima/hg19
```
Documentation
The pipeline employs several tools to annotate and prioritize variants:
- Funcotator for variant annotation
- CancerVar for somatic variants prioritization
- InterVar for germline variants annotation
- Annovar: CancerVar and InterVar rely on Annovar.
- CIViC: somatic variant classification using CIViC evidence level.
- AlphaMissense: somatic and germline variant prioritization.
To keep the annotations accurate, update the Funcotator and Annovar databases regularly using the provided update utilities.
Usage
If you are using variantalker for the first time, consider updating the databases first by following the instructions.
Modify the configuration file (nextflow.config) by setting the following parameters:
- funcotatorgermlinedb: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521g
- funcotatorsomaticdb: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521s
- annovardb: e.g. path2/public_databases/humandb
- annovarsoftwarefolder: e.g. path2/annovar
- alphamisgenomebasedir: e.g. path2/public_databases
- fasta: path to the FASTA file used to generate the VCF
- target: path to the target BED file
The main command line for the annotation is the following:
```bash
nextflow run path_to/main.nf -c yourconfig -profile singularity --input samplesheet.csv --outdir outdir
```
To list all available parameters, including hidden ones:
```bash
nextflow run path_to/main.nf --help --show_hidden_params
```
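When the pipeline is launched from a script, it can help to assemble the command as an argument list instead of a shell string. The following is a minimal sketch that uses only the flags shown above; the function name `build_command` and its parameters are hypothetical and not part of variantalker.

```python
import shlex


def build_command(main_nf, config, samplesheet, outdir, profile="singularity"):
    """Assemble the documented nextflow invocation as an argument list."""
    return [
        "nextflow", "run", main_nf,
        "-c", config,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]


# Placeholder paths copied from the command line above.
cmd = build_command("path_to/main.nf", "yourconfig", "samplesheet.csv", "outdir")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to start the run.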
Input
variantalker takes as input a CSV samplesheet with 4 columns.
IMPORTANT: the HEADER is required.

| patient  | tumor_tissue | sample_file       | sample_type |
| -------- | ------------ | ----------------- | ----------- |
| patient1 | Lung         | path/tumor.vcf.gz | somatic     |
| .....    | .....        | .....             | .....       |

sample_file must be provided as a full path, not a relative path.
Available sample_type values are: somatic, germline, cnv.
- somatic: a tumor-only (single-sample) or tumor-normal (multi-sample) vcf.gz file. Requires tumor_tissue to be specified.
- germline: a single-sample vcf.gz file. It does not require tumor_tissue.
- cnv: for nf-core/sarek, CNVKit output is supported (cnr file). For DRAGEN, a vcf.gz file is required. It does not require tumor_tissue.

Available tumor_tissue values are: AdrenalGland BileDuct Bladder Blood Bone BoneMarrow Brain Breast Cancerall Cervix Colorectal Esophagus Eye HeadandNeck Inflammatory Intrahepatic Kidney Liver Lung LymphNodes NervousSystem Other Ovary Pancreas Pleura Prostate Skin SoftTissue Stomach Testis Thymus Thyroid Uterus
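The samplesheet rules above can be checked before launching a run. The following is a minimal sketch, not part of variantalker: the function `validate_samplesheet` is hypothetical, and the underscored column names (`tumor_tissue`, `sample_file`, `sample_type`) are an assumption about the exact header spelling. It checks the header, the column count, the sample type, the full-path requirement, and that somatic samples carry a tumor tissue.

```python
import csv
import io
import os

# Assumed header spelling and the sample types listed above.
EXPECTED_HEADER = ["patient", "tumor_tissue", "sample_file", "sample_type"]
SAMPLE_TYPES = {"somatic", "germline", "cnv"}


def validate_samplesheet(handle):
    """Return a list of human-readable problems found in a samplesheet."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return ["samplesheet is empty"]
    header = [h.strip() for h in header]
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 4 columns, got {len(row)}")
            continue
        patient, tissue, sample_file, sample_type = (c.strip() for c in row)
        if sample_type not in SAMPLE_TYPES:
            problems.append(f"line {line_no}: unknown sample_type '{sample_type}'")
        if not os.path.isabs(sample_file):
            problems.append(f"line {line_no}: sample_file must be a full path")
        if sample_type == "somatic" and not tissue:
            problems.append(f"line {line_no}: somatic samples need tumor_tissue")
    return problems


# Illustrative rows; the file paths are made up.
example = io.StringIO(
    "patient,tumor_tissue,sample_file,sample_type\n"
    "patient1,Lung,/data/tumor.vcf.gz,somatic\n"
    "patient2,,relative/germline.vcf.gz,germline\n"
)
for problem in validate_samplesheet(example):
    print(problem)
```

The sketch does not check tumor_tissue against the list of available tissues; that check could be added with a set lookup in the same loop.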
Output
Output structure:
```
params.outdir
|-- date
|   `-- annotation
|       |-- germline
|       |   `-- patient
|       |       |-- filtered.patient.maf.pass.tsv
|       |       |-- filtered.patient.maf.nopass.tsv
|       |       |-- patient.vcf
|       |       `-- patient.maf
|       |-- somatic
|       |   `-- patient
|       |       |-- filtered.patient.maf.pass.tsv
|       |       |-- filtered.patient.maf.nopass.tsv
|       |       |-- patient.vcf
|       |       `-- patient.maf
|       `-- cnv
|           `-- patient
|               `-- patient.cnv.annotated.tsv
```
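To gather results after a run, the layout above can be walked with a glob pattern. The following is a minimal sketch, assuming the directory levels are exactly date/annotation/sample type/patient as shown; the function name `find_pass_tables` is hypothetical.

```python
from pathlib import Path


def find_pass_tables(outdir):
    """Map (sample_type, patient) to the filtered pass table under outdir."""
    tables = {}
    # date / annotation / sample type / patient / filtered.<patient>.maf.pass.tsv
    pattern = "*/annotation/*/*/filtered.*.maf.pass.tsv"
    for path in sorted(Path(outdir).glob(pattern)):
        patient = path.parent.name
        sample_type = path.parent.parent.name  # "somatic" or "germline"
        tables[(sample_type, patient)] = path
    return tables
```

If the same patient appears under several date directories, the entry from the lexicographically last date directory overwrites the earlier ones.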
For each sample, variantalker outputs multiple files:
1) a MAF file with all the annotations
2) a VCF file with the PASS variants
3) a filtered pass file with the variants that pass the filters (see below)
4) a filtered nopass file with the variants that do not pass the filters (see below)
5) a CNV annotated file (if cnv samples are provided)
Default filters applied:
- "Silent", "IGR", and "RNA" variant types are filtered out (unless the variant is pathogenic or likely pathogenic according to ClinVar/CancerVar/InterVar)
- minimum coverage: 50 (unless the variant is pathogenic or likely pathogenic according to ClinVar/CancerVar/InterVar)
- minimum somatic VAF: 0.01
- minimum germline VAF: 0.2
- InterVar classes to be kept: Pathogenic, Likely pathogenic (logical OR)
- CancerVar classes to be kept: TierIIpotential, TierIstrong (logical OR)
- ReNOVo classes to be kept: LP Pathogenic, IP Pathogenic, HP Pathogenic (logical OR)
- CIViC evidence levels to be kept: A, B, C (logical OR)
- no filters on genes (somatic or germline)

Logical OR filters: a variant is kept if at least one of the OR filters is true.
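The OR rule can be illustrated with a few lines of Python. This is a minimal sketch of the OR step only, not the pipeline's implementation: the class lists are copied from the defaults above, while the function name `passes_or_filters` and the dictionary keys (`intervar`, `cancervar`, `renovo`, `civic`) are hypothetical field names. How this step combines with the coverage and VAF thresholds is not covered here.

```python
# Kept classes copied from the default filters; keys are hypothetical field names.
KEEP = {
    "intervar": {"Pathogenic", "Likely pathogenic"},
    "cancervar": {"TierIIpotential", "TierIstrong"},
    "renovo": {"LP Pathogenic", "IP Pathogenic", "HP Pathogenic"},
    "civic": {"A", "B", "C"},
}


def passes_or_filters(variant):
    """True if at least one annotation falls in its list of kept classes."""
    return any(variant.get(field) in kept for field, kept in KEEP.items())


# One matching annotation (CIViC level B) is enough to keep the variant.
print(passes_or_filters({"intervar": "Benign", "civic": "B"}))
# No annotation matches, so the OR step does not keep the variant.
print(passes_or_filters({"intervar": "Uncertain significance"}))
```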
Liability
Variantalker assumes no responsibility for any injury to persons or damage to property arising out of, or related to, any use of Variantalker, or for any errors or omissions. The user recognizes they are using Variantalker at their own risk.
Owner
- Name: Yinxiu Zhan
- Login: zhanyinx
- Kind: user
- Location: Milan
- Company: IEO
- Website: https://www.linkedin.com/in/yinxiu-zhan-75a18380/
- Repositories: 3
- Profile: https://github.com/zhanyinx
Head of data science unit at European Institute for Oncology (IEO)
GitHub Events
Total
- Release event: 1
- Watch event: 1
- Member event: 3
- Push event: 11
- Pull request event: 2
- Fork event: 2
- Create event: 3
Last Year
- Release event: 1
- Watch event: 1
- Member event: 3
- Push event: 11
- Pull request event: 2
- Fork event: 2
- Create event: 3
Dependencies
- r-base latest build
- numpy *
- pandas *
- streamlit *
- streamlit-aggrid *