Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Repository
expressions on VCFs
Basic Info
- Host: GitHub
- Owner: brentp
- License: mit
- Language: Rust
- Default Branch: main
- Size: 1.45 MB
Statistics
- Stars: 85
- Watchers: 2
- Forks: 4
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
vcfexpress
Caution: While the output of vcfexpress is tested and reliable, the error messages might be lacking. Please report any problems.
This is an experiment on how to implement user-expressions that can filter (and modify) a VCF and specify an output template. It uses lua as the expression language, and it is fast. Because of that speed and flexibility we can, for example, implement CSQ parsing in lua, just as a user could, and the resulting functionality is as fast as or faster than other tools that have it built in.
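To make the CSQ point concrete, here is a minimal plain-lua sketch of parsing a consequence field. It is not the code in scripts/csq.lua: the helper names (`csq_fields`, `parse_csq`, `any_impact`) are hypothetical, and it assumes a VEP-style header description ending in `Format: Allele|Consequence|IMPACT|...`, with transcript records separated by `,` and fields by `|`. The example values are invented.

```lua
-- Hypothetical helper: pull the field names out of a VEP-style CSQ header
-- description, assumed to end in "Format: Allele|Consequence|IMPACT|...".
local function csq_fields(description)
  local fmt = description:match("Format:%s*(.+)$")
  assert(fmt, "no 'Format:' section in description")
  local names = {}
  -- append a trailing "|" so every field, including the last, is matched
  for name in (fmt .. "|"):gmatch("([^|]*)|") do
    names[#names + 1] = name
  end
  return names
end

-- Split an INFO value into one table per transcript record: records are
-- separated by "," and fields within a record by "|".
local function parse_csq(value, names)
  local records = {}
  for rec in (value .. ","):gmatch("([^,]*),") do
    local fields, i = {}, 1
    for field in (rec .. "|"):gmatch("([^|]*)|") do
      if names[i] then fields[names[i]] = field end
      i = i + 1
    end
    records[#records + 1] = fields
  end
  return records
end

-- True if any record carries the requested IMPACT value.
local function any_impact(records, impact)
  for _, r in ipairs(records) do
    if r.IMPACT == impact then return true end
  end
  return false
end

-- Invented example values.
local desc = "Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL"
local names = csq_fields(desc)
local recs = parse_csq("T|missense_variant|MODERATE|GENE1,T|stop_gained|HIGH|GENE1", names)
print(#recs, recs[2].Consequence, any_impact(recs, "HIGH"))
```

Field names are taken from the header rather than hard-coded, so the same sketch works for any VEP `Format:` list; snpEff's `ANN` description uses a different, quoted layout and would need its own name extraction.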
For the optional output template, it uses luau string templates (luau is lua with some extensions and very good speed).
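As an illustration of the templating idea only: luau's interpolated strings evaluate arbitrary expressions inside `{...}`, whereas this minimal plain-lua sketch handles just dotted field lookups such as `{variant.chrom}`. The `lookup` and `render` helpers are hypothetical and the variant values are invented; this is not how vcfexpress implements `--template`.

```lua
-- Hypothetical helper: resolve a dotted path such as "variant.chrom"
-- against a nested table; returns nil if any step is missing.
local function lookup(env, path)
  local v = env
  for key in path:gmatch("[^.]+") do
    if type(v) ~= "table" then return nil end
    v = v[key]
  end
  return v
end

-- Replace each "{dotted.name}" in the template with the looked-up value,
-- raising an error for unknown fields.
local function render(template, env)
  local out = template:gsub("{([%w_.]+)}", function(path)
    local v = lookup(env, path)
    assert(v ~= nil, "unknown template field: " .. path)
    return tostring(v)
  end)
  return out
end

-- Invented values standing in for a variant.
local env = { variant = { chrom = "chr1", start = 99, stop = 100 } }
print(render("{variant.chrom}\t{variant.start}\t{variant.stop}", env))
```

Failing loudly on an unknown field is a design choice of this sketch; it avoids silently emitting empty columns in BED-like output.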
Installation
- For rust users: `cargo install vcfexpress`
- Otherwise, see Releases for a static linux binary.
Examples
Further examples are collected here, and we encourage users to suggest helpful examples or snippets.
Short functionality examples
---
Extract a single variant and output a BED line for it:
```
vcfexpress filter -e "return variant.id == 'rs2124717267'" \
  --template '{variant.chrom}\t{variant.start}\t{variant.stop}' -o var.bed $vcf
```
---
Filter based on INFO and write BCF:
```
vcfexpress filter -e "return variant:info('AN') > 3000" \
  -o high_an.bcf $input_vcf
```
---
Check the sample fields to get variants where `all` samples have high DP. `all` is defined by `vcfexpress` (`any` and `filter` are also available). Users can load their own functions with `-p $lua_file`.
```
vcfexpress filter \
  -e 'return all(function (dp) return dp > 10 end, variant:format("DP"))' \
  -o all-high-dp.bcf $input_vcf
```
---
Extract variants that are HIGH impact according to the consequence annotation (the `ANN` field in this command). This uses user-defined code in scripts/csq.lua to parse the field.
```
vcfexpress filter \
  -e 'csqs = CSQS.new(variant:info("ANN"), desc); return csqs:any(function(c) return c["Annotation_Impact"] == "HIGH" end)' \
  -o all-high-impact.bcf $input_vcf \
  -p scripts/csq.lua -p scripts/pre.lua
```
---
Get all of the FORMAT fields for a single sample into a lua table and find variants that are high-quality hom-alts:
```
vcfexpress filter \
  -e 's=variant:sample("NA12878"); return s.DP > 10 and s.GQ > 20 and s.GT[1] == 1 and s.GT[2] == 1' \
  -o output.bcf \
  input.vcf
```
---
Add a new INFO field (`af_copy`) and set it:
```
$ cat pre.lua
header:add_info({ID="af_copy", Number=1, Description="adding a single field", Type="Float"})
```
then run with:
```
vcfexpress filter -p pre.lua -e 'return variant:format("AD")[1][2] > 0' \
  -s 'af_copy=return variant:info("AF", 0)' \
  input.vcf > output.vcf
```
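The `all(f, t)` call in the DP example takes the predicate first and the table second. As a hedged sketch of those semantics in plain lua (not vcfexpress's own definitions, which may treat missing values differently), equivalent helpers look like this; the `filter` argument order and its list return value are assumptions, and the DP values are invented.

```lua
-- True if f returns a truthy value for every element (vacuously true for {}).
local function all(f, t)
  for _, v in ipairs(t) do
    if not f(v) then return false end
  end
  return true
end

-- True if f returns a truthy value for at least one element.
local function any(f, t)
  for _, v in ipairs(t) do
    if f(v) then return true end
  end
  return false
end

-- New list holding only the elements for which f is truthy (order kept).
local function filter(f, t)
  local out = {}
  for _, v in ipairs(t) do
    if f(v) then out[#out + 1] = v end
  end
  return out
end

local dps = { 23, 15, 8 } -- invented per-sample DP values
local high = function(dp) return dp > 10 end
print(all(high, dps), any(high, dps), #filter(high, dps))
```

`all` and `any` stop at the first element that decides the answer, so a predicate is not called on the remaining samples.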
Speed
See speed.
Lua API
Full documentation of lua attributes and methods is here
```lua
variant.chrom -> string
variant.REF (get/set) -> string
variant.ALT (get/set) -> vec
genotypes = variant.genotypes
genotype = genotypes[i] -- get single genotype for 1 sample
tostring(genotype) -- e.g. "0/1"
genotype.alts -- integer for number of non-zero, non-unknown alleles

allele = genotype[1]
allele.phased -> bool
allele.allele -> integer e.g. 0 for "0" allele
header.samples (set/get) -> vec

-- these header:add_* are available only in the prelude. currently only Number=1 is supported.
header:add_info({Type="Integer", Number=1, Description="asdf", ID="new field"})
header:add_format({Type="Integer", Number=1, Description="xyz", ID="new format field"})
header:add_filter({ID="LowQual", Description="Qual less than 50"})

sample = variant:sample("NA12878")
sample.DP -- any fields in the row are available. special case for GT.
-- use pprint to see structure:
pprint(sample)
--[[
{
  .GQ = 63,
  .DP = 23,
  .GT = { -- GT gives the allele index (or -1 for .)
    [1] = 0,
    [2] = 1},
  .AD = {
    [1] = 23,
    [2] = 0},
  .PL = {
    [1] = 0,
    [2] = 63,
    [3] = 945},
  -- this is the genotype phase. so with GT, this is 0|1
  .phase = {
    [1] = false,
    [2] = true}}
--]]
```
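The pprint output stores GT as allele indices (-1 for missing) and `phase` as one flag per allele. Reading the comment as meaning that `phase[i]` marks the separator before allele i as `|`, a plain-lua sketch can rebuild the genotype string and count alt alleles the way `tostring(genotype)` and `genotype.alts` are described. The helper names are hypothetical and the sample table is hand-written to mirror the pprint output; vcfexpress already provides these values directly.

```lua
-- Hypothetical helper: rebuild a VCF-style genotype string from the GT and
-- phase arrays of a sample table shaped like the pprint output.
-- Assumes phase[i] == true means allele i is joined to the previous one by "|".
local function gt_string(sample)
  local parts = {}
  for i, a in ipairs(sample.GT) do
    if i > 1 then
      parts[#parts + 1] = sample.phase[i] and "|" or "/"
    end
    -- -1 encodes a missing allele (".")
    parts[#parts + 1] = (a == -1) and "." or tostring(a)
  end
  return table.concat(parts)
end

-- Count alleles that are neither REF (0) nor missing (-1), matching the
-- description of genotype.alts.
local function n_alts(sample)
  local n = 0
  for _, a in ipairs(sample.GT) do
    if a > 0 then n = n + 1 end
  end
  return n
end

-- Hand-written table mirroring the pprint example.
local s = { GQ = 63, DP = 23, GT = { 0, 1 }, AD = { 23, 0 }, phase = { false, true } }
print(gt_string(s), n_alts(s))
```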
Usage
```
Filter a VCF/BCF and optionally print by template expression. If no template is given the output will be VCF/BCF

Usage: vcfexpress filter [OPTIONS]

Arguments:

Options:
  -e, --expression header is available here to access or modify the header
  -o, --output
```
Owner
- Name: Brent Pedersen
- Login: brentp
- Kind: user
- Location: Oregon, USA
- Twitter: brent_p
- Repositories: 220
- Profile: https://github.com/brentp
Doing genomics
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
title: "Vcfexpress: flexible, rapid user-expressions to filter and format VCFs"
abstract: "Motivation Variant Call Format (VCF) files are the standard output format for various software tools that identify genetic variation from DNA sequencing experiments. Downstream analyses require the ability to query, filter, and modify them simply and efficiently. Several tools are available to perform these operations from the command line, including BCFTools, vembrane, slivar, and others. Results Here, we introduce vcfexpress, a new, high-performance toolset for the analysis of VCF files, written in the Rust programming language. It is nearly as fast as BCFTools, but adds functionality to execute user expressions in the lua programming language for precise filtering and reporting of variants from a VCF or BCF file. We demonstrate performance and flexibility by comparing vcfexpress to other tools using the vembrane benchmark."
authors:
  - family-names: Pedersen
    given-names: Brent S.
    email: bpederse@gmail.com
  - family-names: Quinlan
    given-names: Aaron R.
    email: aquinlan@genetics.utah.edu
type: software
doi: 10.1101/2024.11.05.622129
year: 2024
journal: bioRxiv
repository-code: https://github.com/brentp/vcfexpress
license: MIT
identifiers:
  - type: doi
    value: 10.1101/2024.11.05.622129
    description: The bioRxiv preprint
references:
  - type: article
    authors:
      - family-names: Pedersen
        given-names: Brent S.
      - family-names: Quinlan
        given-names: Aaron R.
    title: "Vcfexpress: flexible, rapid user-expressions to filter and format VCFs"
    year: 2024
    journal: bioRxiv
    doi: 10.1101/2024.11.05.622129
GitHub Events
Total
- Create event: 4
- Issues event: 2
- Release event: 4
- Watch event: 42
- Push event: 25
- Fork event: 3
Last Year
- Create event: 4
- Issues event: 2
- Release event: 4
- Watch event: 42
- Push event: 25
- Fork event: 3
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Brent Pedersen | b****e@g****m | 87 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: about 1 hour
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: about 1 hour
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- brentp (1)
Packages
- Total packages: 1
- Total downloads:
  - cargo: 1,369 total
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
crates.io: vcfexpress
A tool for filtering VCF files using Lua expressions
- Homepage: https://github.com/brentp/vcfexpress
- Documentation: https://docs.rs/vcfexpress/
- License: MIT
- Latest release: 0.3.3 (published about 1 year ago)
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- encodedvenom/install-luau v3 composite