smoove
structural variant calling and genotyping with existing tools, but, smoothly.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Statistics
- Stars: 252
- Watchers: 10
- Forks: 21
- Open Issues: 101
- Releases: 19
Topics
Metadata Files
README.md
smoove
smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many
spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls.
There is a blog-post describing smoove in more detail here
It supports both small cohorts in a single command and population-level calling in 4 steps, 2 of which are parallel by sample.
There is a table on the precision and recall of smoove and duphold (which is used by smoove) here
It requires:
- lumpy and lumpy_filter
- samtools: for CRAM support
- gsort: to sort final VCF
- bgzip+tabix: to compress and index final VCF
And optionally (but all highly recommended):
- svtyper: to genotype SVs
- svtools: required for large cohorts
- mosdepth: to remove high-coverage regions.
- bcftools: version 1.5 or higher for VCF indexing and filtering.
- duphold: to annotate depth changes within events and at the break-points.
Running smoove without any arguments will show which of these are found so they can be added to the PATH as needed.
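The same check can be approximated outside smoove. The following is a minimal sketch, assuming a POSIX shell; check_tools is a hypothetical helper (not part of smoove), and the tool names are simply those listed above.

```shell
# Report which of the dependencies listed above are on PATH.
# check_tools is a hypothetical helper, not a smoove command.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Print the tool name and the path it resolves to.
            printf '%s\tfound\t%s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%s\tmissing\n' "$tool"
        fi
    done
}

check_tools lumpy lumpy_filter samtools gsort bgzip tabix \
            svtyper svtools mosdepth bcftools duphold
```

Anything reported as missing can then be installed or added to the PATH before running smoove.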
smoove will:
- parallelize calls to lumpy_filter to extract split and discordant reads required by lumpy
- further filter lumpy_filter calls to remove high-coverage, spurious regions and user-specified chroms like 'hs37d5'; it will also remove reads that we've found are likely spurious signals. After this, it will remove singleton reads (where the mate was removed by one of the previous filters) from the discordant bams. This makes lumpy much faster and less memory-hungry.
- calculate per-sample metrics for mean, standard deviation, and distribution of insert size as required by lumpy.
- stream output of lumpy directly into multiple svtyper processes for parallel-by-region genotyping while lumpy is still running.
- sort, compress, and index final VCF.
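To illustrate the kind of per-sample insert-size metrics mentioned in the list above, here is a minimal sketch that computes the mean and population standard deviation from a stream of template lengths, one per line (for example, column 9 of samtools view output). This is not smoove's implementation; insert_size_stats is a hypothetical helper, and the input values are made up for the example.

```shell
# Compute mean and population standard deviation of insert sizes read from stdin.
# insert_size_stats is a hypothetical helper, not how smoove computes its metrics.
insert_size_stats() {
    awk '{
            v = ($1 < 0 ? -$1 : $1)    # template lengths can be negative; use the magnitude
            n++; s += v; ss += v * v
         }
         END {
            if (n == 0) exit 1
            m = s / n
            var = ss / n - m * m
            if (var < 0) var = 0       # guard against floating-point round-off
            printf "%.2f %.2f\n", m, sqrt(var)
         }'
}

# Made-up template lengths for illustration.
printf '300\n-310\n290\n300\n' | insert_size_stats   # prints "300.00 7.07"
```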
installation
you can get smoove and all dependencies via (a large) docker image:
docker pull brentp/smoove
docker run -it brentp/smoove smoove -h
Or, you can download a smoove binary from here: https://github.com/brentp/smoove/releases
When run without any arguments, smoove will show you which of its dependencies it can find
so you can adjust your $PATH and install accordingly.
usage
small cohorts (n < ~ 40)
for small cohorts it's possible to get a jointly-called, genotyped VCF in a single command.
smoove call -x --name my-cohort --exclude $bed --fasta $reference_fasta -p $threads --genotype /path/to/*.bam
output will go to ./my-cohort-smoove.genotyped.vcf.gz
the --exclude $bed is highly recommended as it can be used to ignore reads that overlap problematic regions.
A good set of regions for GRCh37 is here.
And for hg38 here
population calling
For population-level calling (large cohorts) the steps are:
- For each sample, call genotypes:
smoove call --outdir results-smoove/ --exclude $bed --name $sample --fasta $reference_fasta -p 1 --genotype /path/to/$sample.bam
For large cohorts, it's better to parallelize across samples rather than using a large $threads per sample. smoove can only
parallelize up to 2 or 3 threads on a single sample and it's most efficient to use 1 thread.
output will go to results-smoove/$sample-smoove.genotyped.vcf.gz
- Get the union of sites across all samples (this can be parallelized across as many CPUs or machines as needed):
```
# this will create ./merged.sites.vcf.gz
smoove merge --name merged -f $reference_fasta --outdir ./ results-smoove/*.genotyped.vcf.gz
```
- genotype each sample at those sites (this can be parallelized across as many CPUs or machines as needed) and run duphold to add depth annotations.
smoove genotype -d -x -p 1 --name $sample-joint --outdir results-genotyped/ --fasta $reference_fasta --vcf merged.sites.vcf.gz /path/to/$sample.bam
- paste all the single sample VCFs with the same number of variants to get a single, squared, joint-called file.
smoove paste --name $cohort results-genotyped/*.vcf.gz
- (optional) annotate the variants with overlapping exons and UTRs from a GFF, and annotate high-quality heterozygotes:
smoove annotate --gff Homo_sapiens.GRCh37.82.gff3.gz $cohort.smoove.square.vcf.gz | bgzip -c > $cohort.smoove.square.anno.vcf.gz
This adds a SHQ (Smoove Het Quality) tag to every sample's FORMAT field; a value of 4 is a high-quality call and a value of 1 is low quality. -1 is non-het.
It also adds a MSHQ for Mean SHQ to the INFO field which is the mean SHQ score across all heterozygous samples for that variant.
As a first pass, users can look for variants with MSHQ > 3. If you added duphold annotations, it's also
useful to check deletions with DHFFC < 0.7 and duplications with DHFFC > 1.25.
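The MSHQ first pass can be sketched with plain awk. This is an assumption-laden illustration, not a smoove feature: filter_mshq is a hypothetical helper, it assumes MSHQ appears as MSHQ=<number> in the INFO column (column 8) of an uncompressed VCF stream, and it does not look at the per-sample DHFFC values.

```shell
# Keep header lines and records whose INFO column (8) has MSHQ above a cutoff.
# filter_mshq is a hypothetical helper; the default cutoff of 3 follows the text above.
filter_mshq() {
    awk -F '\t' -v min="${1:-3}" '
        /^#/ { print; next }              # pass VCF header lines through unchanged
        {
            n = split($8, kv, ";")        # INFO is a ";"-separated list of KEY=VALUE pairs
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^MSHQ=/) {
                    val = substr(kv[i], 6) + 0
                    if (val > min + 0) print
                    break
                }
            }
        }'
}
```

For example, the annotated VCF can be decompressed with bgzip -dc and piped through filter_mshq 3. Records without an MSHQ entry are dropped by this sketch.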
Troubleshooting
- A panic with a message like Segmentation fault (core dumped) | bcftools view -O z -c 1 -o is likely to mean you have an old version of bcftools. See #10.
- smoove will write to the system TMPDIR. For large cohorts, make sure to set this to something with a lot of space, e.g. export TMPDIR=/path/to/big
- smoove requires a recent version of lumpy and lumpy_filter, so build those from source or get the most recent bioconda version.
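Before a large run it may help to confirm how much space the temporary directory actually has. A minimal sketch, assuming a POSIX df; tmp_free_kb is a hypothetical helper, and how much space counts as enough depends on the cohort.

```shell
# Print the kilobytes available on the filesystem holding a directory
# (defaults to TMPDIR, or /tmp if TMPDIR is unset).
# tmp_free_kb is a hypothetical helper built on POSIX `df -Pk`.
tmp_free_kb() {
    df -Pk "${1:-${TMPDIR:-/tmp}}" | awk 'NR == 2 { print $4 }'
}

export TMPDIR=${TMPDIR:-/tmp}
echo "TMPDIR=$TMPDIR has $(tmp_free_kb) KB free"
```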
see also
Owner
- Name: Brent Pedersen
- Login: brentp
- Kind: user
- Location: Oregon, USA
- Twitter: brent_p
- Repositories: 220
- Profile: https://github.com/brentp
Doing genomics
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Pedersen
    given-names: Brent S.
    email: bpederse@gmail.com
  - family-names: Layer
    given-names: Ryan
  - family-names: Quinlan
    given-names: Aaron R.
title: "smoove: structural-variant calling and genotyping with existing tools"
version: 0.2.8
date-released: 2020-01-01
license: Apache-2.0
GitHub Events
Total
- Issues event: 5
- Watch event: 22
- Issue comment event: 6
Last Year
- Issues event: 5
- Watch event: 22
- Issue comment event: 6
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Brent Pedersen | b****e@g****m | 170 |
| Pierre Lindenbaum | 3****b | 1 |
| Joe Brown | b****m@g****m | 1 |
| Dave Larson | d****n@g****u | 1 |
| Brad Chapman | c****b@f****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 138
- Total pull requests: 1
- Average time to close issues: 28 days
- Average time to close pull requests: 2 minutes
- Total issue authors: 104
- Total pull request authors: 1
- Average comments per issue: 3.62
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 7
- Pull request authors: 0
- Average comments per issue: 1.57
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tyyiyi (5)
- robertwhbaldwin (5)
- lindenb (4)
- nitha26 (3)
- heidihyang (3)
- cfz1998 (3)
- Navin-techi (3)
- C-YONG (2)
- y1025i (2)
- olivia-gc (2)
- framic23 (2)
- Saeideh-Ashouri (2)
- sulinq (2)
- Martaprf (2)
- Giuseppe1995 (2)
Pull Request Authors
- bounlu (1)
Packages
- Total packages: 1
- Total downloads: unknown
- Total docker downloads: 1,339
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 20
proxy.golang.org: github.com/brentp/smoove
- Homepage: https://github.com/brentp/smoove
- Documentation: https://pkg.go.dev/github.com/brentp/smoove#section-documentation
- License: Apache-2.0
- Latest release: v0.2.8 (published over 4 years ago)
Dependencies
- github.com/alexflint/go-arg v1.4.2
- github.com/biogo/biogo v1.0.3
- github.com/biogo/hts v1.4.3
- github.com/biogo/store v0.0.0-20201120204734-aad293a2328f
- github.com/brentp/faidx v0.0.0-20200301150453-c39eb85760d8
- github.com/brentp/gargs v0.3.9
- github.com/brentp/go-athenaeum v0.0.0-20180711164918-19f838fd53de
- github.com/brentp/go-chartjs v0.0.0-20170901194241-a37b166b7875
- github.com/brentp/goleft v0.2.5
- github.com/brentp/irelate v0.0.1
- github.com/brentp/vcfgo v0.0.0-20190824021612-654ed2e5945d
- github.com/brentp/xopen v0.0.0-20181116180855-111b45cadc7d
- github.com/edsrzf/mmap-go v1.0.0
- github.com/fatih/color v1.12.0
- github.com/kyroy/kdtree v0.0.0-20200419114247-70830f883f1d
- github.com/mattn/go-isatty v0.0.13
- github.com/pkg/errors v0.9.1
- github.com/valyala/fasttemplate v1.2.1
- gonum.org/v1/gonum v0.9.3
- gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
- 110 dependencies