vcf-reformatter
🧬 High-performance VCF file parser and reformatter with VEP annotation support. Converts complex VCF files to analyzable TSV format with intelligent transcript handling.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file: found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references
- Academic publication links
- Academic email domains
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: flalom
- License: mit
- Language: Rust
- Default Branch: main
- Homepage: https://github.com/flalom/vcf-reformatter
- Size: 112 KB
Statistics
- Stars: 36
- Watchers: 1
- Forks: 3
- Open Issues: 0
- Releases: 3
Topics
Metadata Files
README.md
VCF Reformatter: What is it?
Ever had a VCF file and wished you could browse it like a normal table? VCF Reformatter is here to the rescue!
A Rust command-line tool for parsing and reformatting VCF (Variant Call Format) files, with support for VEP (Variant Effect Predictor) and SnpEff annotations. This tool flattens complex VCF files into tab-separated values (TSV) format for easier downstream analysis. It is also handy for quick checks of your data!
Quick Start
```bash
# Download the binary from releases (easiest: just download and run it)
wget https://github.com/flalom/vcf-reformatter/releases/latest/download/vcf-reformatter-v0.3.0-linux-x86_64
chmod +x vcf-reformatter-v0.3.0-linux-x86_64
# Transform your VCF file
./vcf-reformatter-v0.3.0-linux-x86_64 sample.vcf.gz
# Generate MAF output ⚠️ (in beta!)
./vcf-reformatter-v0.3.0-linux-x86_64 sample.vcf.gz --output-format maf
```
OR via Bioconda:
```bash
conda install -c bioconda vcf-reformatter
# or
mamba install vcf-reformatter -c bioconda
```
OR install from [crates.io](https://crates.io/crates/vcf-reformatter):
```bash
cargo install vcf-reformatter
```
OR build from source (requires the Rust toolchain):
```bash
git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo build --release
./target/release/vcf-reformatter sample.vcf.gz
```
⚠️ Experimental MAF support
MAF output is currently in beta testing (v0.3.0). Known limitations:
- VAF calculation needs refinement for some genotype patterns
- Multi-sample handling requires validation
- Use with caution in production workflows

Memory considerations for MAF:
- Files >100K variants: Monitor memory usage
- Files >1M variants: Ensure adequate RAM (16GB+)
🎯 Why VCF Reformatter?
The Problem: VCF files are notoriously difficult to analyze. Complex nested annotations, semicolon-separated INFO fields, and multi-transcript VEP annotations make downstream analysis a nightmare.
The Solution: VCF Reformatter flattens everything into clean, readable TSV format that works seamlessly with Excel, R, Python, and any analysis tool (⚠️ beware Excel auto-correction!).
Before & After
Before (Raw VCF):
chr1 69511 . A G 1294.53 . DP=65;AF=1;CSQ=G|missense_variant|MODERATE|OR4F5|ENSG00000186092...
After (Reformatted TSV):
CHROM POS REF ALT QUAL INFO_DP INFO_AF CSQ_Allele CSQ_Consequence CSQ_SYMBOL
chr1 69511 A G 1294.53 65 1 G missense_variant OR4F5
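The flattening shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's Rust implementation: `flatten_record` is a hypothetical helper, the CSQ field names are assumed to come from the VEP `Format:` header string, only the first transcript is used, and recording flag-style INFO keys as `"true"` is an assumption made for this sketch.

```python
def flatten_record(line, csq_fields):
    """Flatten one VCF data line into a dict of columns (first CSQ entry only)."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    row = {"CHROM": chrom, "POS": pos, "ID": vid, "REF": ref,
           "ALT": alt, "QUAL": qual, "FILTER": flt}
    for item in info.split(";"):
        if item in ("", "."):
            continue  # empty INFO column
        key, sep, value = item.partition("=")
        if key == "CSQ":
            # Transcripts are comma-separated; fields inside one are pipe-separated
            first = value.split(",")[0].split("|")
            for name, val in zip(csq_fields, first):
                row[f"CSQ_{name}"] = val
        else:
            # Assumption: flag-style keys (no "=") are recorded as "true"
            row[f"INFO_{key}"] = value if sep else "true"
    return row


line = "chr1\t69511\t.\tA\tG\t1294.53\t.\tDP=65;AF=1;CSQ=G|missense_variant|MODERATE|OR4F5"
fields = ["Allele", "Consequence", "IMPACT", "SYMBOL"]  # assumed header order
print(flatten_record(line, fields))
```

Each key of the returned dict corresponds to one TSV column, which is why the output opens cleanly in table-oriented tools.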
✨ Key Features
| Feature | Description | Benefit |
|---------|-------------|---------|
| 🧬 VEP/SnpEff Annotation Parsing | Intelligent handling of CSQ/ANN annotations | No more manual parsing of complex VEP/SnpEff output |
| Automatic Annotation Recognition | Automatic detection of CSQ/ANN annotations | Saves even more time with both VEP and SnpEff |
| Smart Transcript Handling | Most severe, first only, or split transcripts | Choose the analysis approach that fits your needs |
| Parallel Processing | Multi-threaded processing, up to ~30,000 variants/sec | Process large cohorts in minutes, not hours |
| Native Compression | Direct .vcf.gz reading & gzip output | Seamless workflow with compressed/uncompressed files |
| 🎯 Production Ready | Comprehensive error handling & logging | Reliable for automated pipelines |
| 🐳 Container Support | Docker & Singularity ready | Deploy anywhere, from laptops to HPC clusters |
📦 Installation
Option 1: Download Pre-compiled Binaries (Easiest!)
No Rust installation required - just download and run:
- Go to Releases
- Download the binary for your platform:
  - vcf-reformatter-v0.3.0-linux-x86_64 → Linux (most users)
  - vcf-reformatter-v0.3.0-linux-x86_64-static → HPC clusters (works everywhere)
  - vcf-reformatter-v0.3.0-windows-x86_64.exe → Windows
  - vcf-reformatter-v0.3.0-macos-x86_64 → Intel Mac
  - vcf-reformatter-v0.3.0-macos-arm64 → Apple Silicon Mac (M1/M2/M3/M4)
- Make executable and run:
```bash
# Linux/Mac
chmod +x vcf-reformatter-*
./vcf-reformatter-* --help
# Windows
# Just double-click or run from command prompt
# C++ might be required, if not already installed
```
Option 2: Build from Source
```bash
git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo build --release
```
Option 3: Docker
```shell script
# Build the container
docker build -t vcf-reformatter .
# Run with your data
docker run --rm -v $(pwd):/data vcf-reformatter /data/sample.vcf.gz
```
Option 4: Singularity
```shell script
# Build Singularity image
singularity build vcf-reformatter.sif Singularity
# Run on HPC cluster
singularity run --bind $PWD:/data vcf-reformatter.sif /data/sample.vcf.gz -j 16
```
🛠️ Usage
Basic Usage
```shell script
# Simple conversion
vcf-reformatter input.vcf.gz
# Most severe consequence only (recommended for analysis)
vcf-reformatter input.vcf.gz -t most-severe
# All transcripts in separate rows (comprehensive)
vcf-reformatter input.vcf.gz -t split
```
Annotation Type Detection
```shell script
# Auto-detect annotation type (recommended)
vcf-reformatter input.vcf.gz -a auto
# Force VEP processing
vcf-reformatter vep_annotated.vcf.gz -a vep -t most-severe
# Force SnpEff processing
vcf-reformatter snpeff_annotated.vcf.gz -a snpeff -t most-severe
```
Advanced Usage
```shell script
# High-performance processing with compression
vcf-reformatter large_cohort.vcf.gz \
  --transcript-handling most-severe \
  --threads 0 \
  --compress \
  --output-dir results/ \
  --prefix my_analysis \
  --verbose
# Optimized for HPC environments
vcf-reformatter huge_dataset.vcf.gz -t most-severe -j 32 -o /scratch/results/ -c -v
```
Complete Options
```
Usage: vcf-reformatter [OPTIONS]
Arguments:
Options:
--output-format
--ncbi-build
```
🧬 Transcript Handling Modes
VCF files with VEP annotations often contain multiple transcript annotations per variant. Choose the strategy that fits your analysis:
🎯 Most Severe (--transcript-handling most-severe)
Best for: Clinical analysis, variant prioritization
```shell script
vcf-reformatter input.vcf.gz -t most-severe
# for MAF output
vcf-reformatter input.vcf.gz -t most-severe --output-format maf
```
Selects the transcript with the most severe consequence (stop_gained > missense_variant > synonymous, etc.)
⚡ First Only (--transcript-handling first) [Default]
Best for: Quick analysis, performance-critical workflows
```shell script
vcf-reformatter input.vcf.gz # Uses first transcript by default
```
Processes only the first transcript annotation (fastest option)
Split All (--transcript-handling split)
Best for: Comprehensive analysis, transcript-level studies
```shell script
vcf-reformatter input.vcf.gz -t split
```
Creates separate rows for each transcript (most detailed output)
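To make the three modes concrete, here is a minimal Python sketch of the selection logic. It is an assumption-laden illustration rather than the tool's code: `select_transcripts` and `severity_rank` are hypothetical names, and `SEVERITY` is a short, illustrative subset of VEP consequence terms in descending severity, not the full ranking the tool may use.

```python
# Illustrative subset of consequence terms, most severe first (assumption)
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
    "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def severity_rank(consequence):
    """Lower is more severe; unknown terms sort last."""
    # One transcript can carry several "&"-joined terms; keep the worst one
    return min(RANK.get(term, len(SEVERITY)) for term in consequence.split("&"))


def select_transcripts(transcripts, mode="first"):
    """Return the transcript annotations to emit as output rows."""
    if not transcripts:
        return []
    if mode == "first":
        return [transcripts[0]]
    if mode == "most-severe":
        return [min(transcripts, key=lambda t: severity_rank(t["Consequence"]))]
    if mode == "split":
        return list(transcripts)
    raise ValueError(f"unknown mode: {mode}")


annotations = [
    {"Feature": "T1", "Consequence": "intron_variant"},
    {"Feature": "T2", "Consequence": "missense_variant&splice_region_variant"},
    {"Feature": "T3", "Consequence": "stop_gained"},
]
for mode in ("first", "most-severe", "split"):
    print(mode, [t["Feature"] for t in select_transcripts(annotations, mode)])
```

In this sketch `first` and `most-severe` yield one row per variant, while `split` yields one row per transcript, which is why it produces the largest output.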
Performance
Benchmarks
- Small files (< 1K variants): ~5,000 variants/sec
- Medium files (1K-10K variants): ~15,000 variants/sec
- Large files (10K+ variants): ~30,000 variants/sec
Optimization Tips
```shell script
# Auto-detect optimal thread count
vcf-reformatter input.vcf.gz -j 0
# For files > 10K variants, use parallel processing
vcf-reformatter input.vcf.gz -t most-severe -j 0 -v
# Combine with compression for large outputs
vcf-reformatter input.vcf.gz -t split -j 0 -c -v
```
Output Format
File Structure
VCF Reformatter generates two files:
- {prefix}_header.txt - Original VCF header and metadata
- {prefix}_reformatted.tsv - Flattened tabular data
Column Types
- Standard VCF: CHROM, POS, ID, REF, ALT, QUAL, FILTER
- INFO Fields: INFO_DP, INFO_AF, INFO_AC, etc.
- VEP Annotations: CSQ_Allele, CSQ_Consequence, CSQ_SYMBOL, CSQ_Gene, etc.
- SnpEff Annotations: ANN_Allele, ANN_Annotation_Impact, ANN_Gene_Name, ANN_Distance, etc.
- Sample Data: SAMPLE1_GT, SAMPLE1_DP, SAMPLE1_AD, etc.
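The sample columns follow from pairing the VCF FORMAT keys with each sample's colon-separated values. The sketch below shows that naming scheme in Python; `sample_columns` is a hypothetical helper, and filling dropped trailing fields with "." is an assumption of this sketch, not confirmed behavior of the tool.

```python
def sample_columns(format_field, sample_values, sample_names):
    """Expand per-sample FORMAT data into {SAMPLE_KEY: value} columns."""
    keys = format_field.split(":")
    row = {}
    for name, value in zip(sample_names, sample_values):
        parts = value.split(":")
        for i, key in enumerate(keys):
            # VCF allows trailing FORMAT fields to be dropped per sample;
            # this sketch fills them with "." (assumption)
            row[f"{name}_{key}"] = parts[i] if i < len(parts) else "."
    return row


print(sample_columns("GT:DP:AD", ["1/1:65:0,65"], ["SAMPLE1"]))
```

With more than one sample, each sample name contributes its own set of `NAME_KEY` columns, so the table grows wider rather than longer.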
Example Output VEP
CHROM POS ID REF ALT QUAL FILTER INFO_DP CSQ_Consequence CSQ_SYMBOL SAMPLE1_GT
chr1 69511 . A G 1294.53 PASS 65 missense_variant OR4F5 1/1
chr1 69761 rs123 C T 892.15 PASS 42 synonymous_variant OR4F5 0/1
Example Output SnpEff
CHROM POS ID REF ALT QUAL FILTER INFO_DP ANN_Annotation ANN_Gene_Name SAMPLE1_GT
chr1 69761 rs587 C T 730 PASS . 214 synonymous_variant OR4F5 0/1
chr1 924024 . A G 53 PASS . 409 5_prime_UTR_variant SAMD11 1/1
🔧 Integration Examples
With R
```r
# Read compressed output directly
library(data.table)
data <- fread("output_reformatted.tsv.gz")
# Quick variant summary (counts per consequence)
table(data$CSQ_Consequence)
```
With Python
```python
import pandas as pd

# Load and analyze
df = pd.read_csv("output_reformatted.tsv.gz", sep="\t", compression="gzip")
df["CSQ_Consequence"].value_counts()
```
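Where pandas is not available, or the file is too large to load at once, the same count can be done with the Python standard library by streaming rows. This is a minimal sketch: `count_column` is a hypothetical helper, and the file and column names are taken from the examples above.

```python
import csv
import gzip
from collections import Counter


def count_column(path, column):
    """Count the values of one column in a TSV file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    # Rows are read one at a time, so memory use stays flat for large files
    with opener(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row[column]] += 1
    return counts
```

For example, `count_column("output_reformatted.tsv.gz", "CSQ_Consequence").most_common(5)` would list the five most frequent consequence terms.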
In Workflows
```shell script
# Nextflow pipeline
vcf-reformatter ${vcf} -t most-severe -j ${task.cpus} -o results/ -c
# Snakemake rule
shell: "vcf-reformatter {input.vcf} -t most-severe -j {threads} -o {params.outdir} -c"
```
🐳 Container Usage
Docker
```shell script
# Build once
docker build -t vcf-reformatter .
# Run anywhere
docker run --rm \
  -v $(pwd):/data \
  vcf-reformatter \
  /data/input.vcf.gz \
  -t most-severe -j 4 -o /data/results/ -c
```
Singularity (HPC)
```shell script
# On HPC cluster
singularity run \
  --bind $PWD:/data \
  --bind /scratch:/scratch \
  vcf-reformatter.sif \
  /data/large_cohort.vcf.gz \
  -t most-severe -j 16 -o /scratch/results/ -c -v
```
🧪 Use Cases
| Use Case | Command | Why It Works |
|----------|---------|--------------|
| Clinical Variant Review | vcf-reformatter variants.vcf.gz -t most-severe | Prioritizes clinically relevant consequences |
| Population Analysis | vcf-reformatter cohort.vcf.gz -t first -j 0 -c | Fast processing of large cohorts |
| Transcript Studies | vcf-reformatter genes.vcf.gz -t split -v | Comprehensive transcript-level analysis |
| Quick Data Exploration | vcf-reformatter sample.vcf.gz | Simple, fast conversion for immediate analysis |
| HPC Batch Processing | vcf-reformatter huge.vcf.gz -t most-severe -j 32 -c | Optimized for high-performance computing |
What's New in v0.3.0
- ✅ MAF Output Support (in Beta ⚠️) - Direct conversion to Mutation Annotation Format
- ✅ Auto-metadata Detection (in Beta ⚠️) - Extracts center/sample info from VCF headers for MAF
- ✅ Memory-Efficient Processing (streaming) - Chunked streaming for large files (>>100K variants)
- ✅ Enhanced Error Handling - Better processing of malformed files
- ✅ Comprehensive Testing - 70+ test cases ensure reliability
Previous Releases
What's New in v0.2.0
- ✅ SnpEff Support - Full ANN field parsing with intelligent detection
- ✅ Smart Auto-Detection - Automatically identifies VEP vs SnpEff annotations
- ✅ Enhanced Error Handling - Better processing of malformed or headerless files
TODOs
- ~~Add SnpEff support ✅~~
- ~~Output MAF format option ✅~~
- Add `stdin` to combine with other tools, such as `bcftools`
- Support for multi-sample VCF files in MAF output
🤝 Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Add tests for new functionality
- Commit your changes: `git commit -am 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
Development Setup
```shell script
git clone https://github.com/flalom/vcf-reformatter.git
cd vcf-reformatter
cargo test # Run the test suite
cargo run -- data/sample.vcf.gz -v # Test with sample data
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- VCF Format Contributors - For the standard that enables genomic data sharing
- VEP Team - For the powerful variant annotation framework
- Rust Community - For the incredible ecosystem that makes this possible
- Bioinformatics Community - For feedback and feature requests
Frequently Asked Questions
Q: Which transcript handling mode should I use?
- Clinical analysis: `--transcript-handling most-severe`
- Quick exploration: `--transcript-handling first`
- Comprehensive analysis: `--transcript-handling split`

Q: How does this compare to other VCF tools?
VCF Reformatter is specifically designed for:
- Converting complex VEP/SnpEff annotations to tabular format
- Handling multiple transcripts intelligently
- High-performance parallel processing
- Easy integration with R/Python workflows

Q: Can I use this in production pipelines?
Yes! VCF Reformatter is designed for production use with:
- Comprehensive error handling
- Docker/Singularity support
- Automated testing
- Stable CLI interface

Q: What's the difference between TSV and MAF output?
- TSV: Direct flattening of VCF fields (default)
- MAF (beta): Standardized cancer genomics format for downstream tools

Q: What if I get out-of-memory errors?
- Use TSV format instead of MAF: `vcf-reformatter file.vcf.gz -j 0 -c`
- Enable verbose mode to monitor: `vcf-reformatter file.vcf.gz -v`
Support
- Issues: GitHub Issues
- 📧 Email: fl@flaviolombardo.site
Owner
- Name: Flavio Lombardo
- Login: flalom
- Kind: user
- Location: Switzerland
- Website: https://flalom.github.io/flalom/
- Twitter: flalom_flavio
- Repositories: 2
- Profile: https://github.com/flalom
Bioinformatics | Computational biology | Data science
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use VCF Reformatter in your research, please cite it as below."
title: "VCF Reformatter: High-performance VCF file parser and reformatter with VEP and SnpEff annotation support"
version: "0.2.0"
date-released: "2025-07-23"
url: "https://github.com/flalom/vcf-reformatter"
repository-code: "https://github.com/flalom/vcf-reformatter"
doi: "10.5281/zenodo.16354810"
type: software
license: MIT
authors:
  - family-names: "Lombardo"
    given-names: "Flavio"
    orcid: "https://orcid.org/0000-0002-4853-6838"
    affiliation: "University Hospital Basel and University of Basel"
    email: "fl@flaviolombardo.site"
abstract: >
  VCF Reformatter is a high-performance Rust command-line tool for parsing and
  reformatting VCF (Variant Call Format) files, with comprehensive support for both
  VEP (Variant Effect Predictor) and SnpEff annotations. The tool flattens complex
  VCF files into tab-separated values (TSV) format for easier downstream analysis,
  featuring intelligent transcript handling, auto-detection of annotation types,
  and parallel processing for high-throughput genomic workflows.
keywords:
  - bioinformatics
  - genomics
  - VCF
  - variant calling
  - VEP annotations
  - SnpEff annotations
  - file format conversion
  - parallel processing
  - rust
  - computational biology
  - variant effect predictor
GitHub Events
Total
- Create event: 3
- Release event: 2
- Issues event: 2
- Watch event: 21
- Delete event: 2
- Issue comment event: 6
- Push event: 5
- Fork event: 2
Last Year
- Create event: 3
- Release event: 2
- Issues event: 2
- Watch event: 21
- Delete event: 2
- Issue comment event: 6
- Push event: 5
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: 15 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 5.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 15 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 5.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ksarathbabu (1)
- Aljumiliy1 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cargo: 510 total
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
crates.io: vcf-reformatter
Fast VCF file parser and reformatter with VEP and SnpEff annotation support which can output to MAF
- Homepage: https://github.com/flalom/vcf-reformatter/blob/main/README.md
- Documentation: https://docs.rs/vcf-reformatter/
- License: MIT
- Latest release: 0.3.0 (published 7 months ago)
Rankings
Maintainers (1)
Dependencies
- actions/cache v3 composite
- actions/checkout v4 composite
- docker/build-push-action v5 composite
- docker/setup-buildx-action v3 composite
- dtolnay/rust-toolchain stable composite
- actions/cache v3 composite
- actions/checkout v4 composite
- docker/build-push-action v5 composite
- docker/metadata-action v5 composite
- docker/setup-buildx-action v3 composite
- dtolnay/rust-toolchain stable composite
- softprops/action-gh-release v1 composite
- adler2 2.0.1
- aho-corasick 1.1.3
- anstream 0.6.19
- anstyle 1.0.11
- anstyle-parse 0.2.7
- anstyle-query 1.1.3
- anstyle-wincon 3.0.9
- bitflags 2.9.1
- cfg-if 1.0.1
- clap 4.5.41
- clap_builder 4.5.41
- clap_derive 4.5.41
- clap_lex 0.7.5
- colorchoice 1.0.4
- crc32fast 1.4.2
- crossbeam-deque 0.8.6
- crossbeam-epoch 0.9.18
- crossbeam-utils 0.8.21
- either 1.15.0
- errno 0.3.13
- fastrand 2.3.0
- flate2 1.1.2
- getrandom 0.3.3
- heck 0.5.0
- hermit-abi 0.5.2
- is_terminal_polyfill 1.70.1
- libc 0.2.174
- linux-raw-sys 0.9.4
- memchr 2.7.5
- miniz_oxide 0.8.9
- num_cpus 1.17.0
- once_cell 1.21.3
- once_cell_polyfill 1.70.1
- proc-macro2 1.0.95
- quote 1.0.40
- r-efi 5.3.0
- rayon 1.10.0
- rayon-core 1.12.1
- regex 1.11.1
- regex-automata 0.4.9
- regex-syntax 0.8.5
- rustix 1.0.7
- strsim 0.11.1
- syn 2.0.104
- tempfile 3.20.0
- unicode-ident 1.0.18
- utf8parse 0.2.2
- wasi 0.14.2+wasi-0.2.4
- windows-sys 0.59.0
- windows-targets 0.52.6
- windows_aarch64_gnullvm 0.52.6
- windows_aarch64_msvc 0.52.6
- windows_i686_gnu 0.52.6
- windows_i686_gnullvm 0.52.6
- windows_i686_msvc 0.52.6
- windows_x86_64_gnu 0.52.6
- windows_x86_64_gnullvm 0.52.6
- windows_x86_64_msvc 0.52.6
- wit-bindgen-rt 0.39.0