https://github.com/broadinstitute/str-analysis

Scripts and utilities related to analyzing short tandem repeats (STRs).


Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org
  • Committers with academic emails
  • Institutional organization owner
    Organization broadinstitute has institutional domain (www.broadinstitute.org)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: broadinstitute
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 1.91 MB
Statistics
  • Stars: 39
  • Watchers: 6
  • Forks: 9
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • License

README.md

This repo contains scripts and utilities for analyzing tandem repeats (TRs).

Installation

To install the latest version using pip, run:

python3 -m pip install --upgrade str-analysis@git+https://github.com/broadinstitute/str-analysis

Alternatively, use the Docker image (though it may not have the latest version of the code):

docker run -it weisburd/str-analysis:latest

Tools

  • call_non_ref_motifs (docs) - takes a bam/cram file and, optionally, an ExpansionHunter variant catalog. Then, for each locus, it determines which STR motifs are supported by reads overlapping that locus before running ExpansionHunter on the motif(s) it detected.
  • filter_vcf_to_STR_variants - takes a single-sample VCF file and filters it to the INS/DEL variants that represent tandem repeat expansions or contractions by performing a brute-force k-mer search on each variant's inserted or deleted bases. This tool was a core part of Weisburd, B., Tiao, G. & Rehm, H. L. Insights from a genome-wide truth set of tandem repeat variation. (2023)
  • merge_loci - takes one or more STR catalogs and combines them into a single catalog while removing duplicates based on overlap and repeat motif.
  • annotate_and_filter_str_catalog - takes an STR catalog and annotates the loci based on their overlap with genes and known disease-associated STRs. It then allows filtering by motif size, gene region, and various other criteria.
  • compute_catalog_stats - takes an annotated catalog output by the annotate_and_filter_str_catalog script and computes various summary statistics about it.
  • add_offtarget_regions - takes an ExpansionHunter variant catalog and adds a list of off-target regions to each locus definition by querying a database of off-target regions that have been precomputed for each TR motif. This database was generated by using wgsim to simulate fully-repetitive reads for each motif, and then recording where these reads mapped on hg19 and hg38 after aligning them using bwa.
  • add_adjacent_loci_to_expansion_hunter_catalog - takes an ExpansionHunter variant catalog and a bed file containing all simple repeats in the reference genome. Outputs a new catalog with updated LocusStructures and ReferenceRegions that include any adjacent repeats found near each locus in the input catalog.
  • check_trios_for_mendelian_violations - takes a table of combined ExpansionHunter calls generated by the combine_str_json_to_tsv script, as well as a FAM or PED file with parent/child relationships, and outputs a table of Mendelian violations in the callset.
  • simulate_str_expansions - uses wgsim to generate .bam files with simulated read data containing STR expansions at a given locus, and having a given number of repeats, motif, zygosity, etc.
  • filter_out_loci_with_Ns_in_flanks - removes loci from an ExpansionHunter catalog if their flanks contain enough Ns to trigger an ExpansionHunter error.
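
The brute-force k-mer search mentioned for filter_vcf_to_STR_variants can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function name find_repeat_motif and its parameters are hypothetical, and the sketch only tests whether the inserted or deleted bases form a perfect tandem repeat on their own, without looking at the flanking reference sequence or allowing interruptions.

```python
def find_repeat_motif(seq, min_repeats=2, max_motif_size=50):
    """Return (motif, repeat_count) if seq is a perfect tandem repeat, else None.

    Hypothetical helper: tries every motif size k from 1 upward and checks
    whether repeating the first k bases reproduces the whole sequence.
    """
    seq = seq.upper()
    largest_k = min(max_motif_size, len(seq) // min_repeats)
    for k in range(1, largest_k + 1):
        if len(seq) % k != 0:
            continue  # the motif must tile the sequence exactly
        motif = seq[:k]
        repeat_count = len(seq) // k
        if motif * repeat_count == seq:
            return motif, repeat_count  # smallest motif wins
    return None


print(find_repeat_motif("CAGCAGCAG"))  # ('CAG', 3)
print(find_repeat_motif("ACGT"))       # None
```

Trying motif sizes in increasing order means the shortest motif is reported, so "AAAA" is described as four repeats of "A" rather than two repeats of "AA".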

  • ExpansionHunterDenovo output post-processing:
    • annotate_EHdn_locus_outliers - takes an ExpansionHunterDenovo outlier result table (locus outliers or case-control) as well as a bed file containing all simple repeats in the reference genome and, optionally, a gene models GTF file, a variant catalog of known disease-associated loci, and/or other bed files with genomic regions of interest. Outputs a new table where each EHdn outlier is annotated with multiple columns related to the provided reference data.
    • convert_annotated_EHdn_locus_outliers_to_expansion_hunter_catalog - takes the output table from annotate_EHdn_locus_outliers and lets the user apply a range of filters before writing out the passing loci to an ExpansionHunter variant catalog.

  • gnomAD STR calls:
    • generate_gnomad_json - was used to combine the gnomAD STR calls into the files available for download on the gnomAD website.

  • post-process and combine ExpansionHunter outputs:
    • combine_str_json_to_tsv - takes a set of ExpansionHunter json output files and combines them into a single tsv table.
    • combine_json_to_tsv - takes a set of arbitrary json files that share the same schema and combines their top-level fields into a single tsv file.
    • copy_EH_vcf_fields_to_json - takes the ExpansionHunter output vcf and json file for a given sample and copies fields that are only present in the vcf to the json file.
    • run_reviewer - takes ExpansionHunter output files for a single sample and runs REViewer on the subset of loci where the genotypes exceed locus-specific thresholds specified in the variant catalog.

  • format converters:
    • convert_bed_to_expansion_hunter_variant_catalog
    • convert_expansion_hunter_variant_catalog_to_gangstr_spec
    • convert_expansion_hunter_variant_catalog_to_hipstr_format
    • convert_expansion_hunter_variant_catalog_to_trgt_catalog
    • convert_expansion_hunter_variant_catalog_to_longtr_format
    • convert_gangstr_spec_to_expansion_hunter_variant_catalog
    • convert_expansion_hunter_denovo_locus_tsv_to_bed
    • convert_gangstr_vcf_to_expansion_hunter_json
    • convert_hipstr_vcf_to_expansion_hunter_json
    • convert_strling_calls_to_expansion_hunter_json
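
To give a sense of what a BED-to-ExpansionHunter-catalog conversion involves, here is a minimal sketch. It does not reproduce the converter script in this repo: the function name bed_to_variant_catalog is hypothetical, the sketch assumes the repeat motif is stored in the 4th BED column, and the LocusId naming scheme is made up for illustration. The LocusId, LocusStructure, ReferenceRegion and VariantType keys are the ones used by ExpansionHunter variant catalogs.

```python
import json


def bed_to_variant_catalog(bed_lines):
    """Convert BED rows (chrom, start, end, motif) to ExpansionHunter catalog entries.

    Hypothetical helper. Assumes tab-separated rows with the repeat motif in
    column 4; rows with fewer than 4 columns raise a ValueError.
    """
    catalog = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end, motif = line.split("\t")[:4]
        catalog.append({
            "LocusId": f"{chrom}-{start}-{end}-{motif}",  # illustrative naming only
            "LocusStructure": f"({motif})*",
            "ReferenceRegion": f"{chrom}:{start}-{end}",
            "VariantType": "Repeat",
        })
    return catalog


example_rows = ["chr4\t3074876\t3074933\tCAG"]  # example row for illustration
print(json.dumps(bed_to_variant_catalog(example_rows), indent=2))
```

The sketch copies the BED start and end coordinates into ReferenceRegion unchanged; whether any coordinate adjustment is needed for a given input file should be checked against the ExpansionHunter documentation.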

Owner

  • Name: Broad Institute
  • Login: broadinstitute
  • Kind: organization
  • Location: Cambridge, MA

Broad Institute of MIT and Harvard

GitHub Events

Total
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 78
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 78
  • Fork event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 936
  • Total Committers: 4
  • Avg Commits per committer: 234.0
  • Development Distribution Score (DDS): 0.093
Past Year
  • Commits: 292
  • Committers: 2
  • Avg Commits per committer: 146.0
  • Development Distribution Score (DDS): 0.007
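
The Development Distribution Score values above are consistent with one common definition: 1 minus the share of commits made by the most active committer identity. The sketch below assumes that definition (this page does not state it) and uses a hypothetical function name. With the all-time counts from the Top Committers table, it reproduces the reported 0.093.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (top committer's commits / total commits).

    Assumed definition of DDS; each entry in commit_counts is one
    committer identity (name + email), as in the Top Committers table.
    """
    total = sum(commit_counts)
    if total == 0:
        return 0.0  # no commits: treat the score as 0
    return 1 - max(commit_counts) / total


all_time = [849, 83, 2, 2]  # per-identity commit counts from the table
print(round(development_distribution_score(all_time), 3))  # 0.093
```

Note that bw2 appears under two email addresses; the reported score only matches if the two identities are counted separately.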
Top Committers
Name Email Commits
bw2 b****d@g****m 849
bw2 w****d@b****g 83
Shubham Saini s****i@g****m 2
Hope Tanudisastro 4****o 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 8
  • Total pull requests: 11
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 5
  • Total pull request authors: 4
  • Average comments per issue: 2.88
  • Average comments per pull request: 0.64
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • themkdemiiir (4)
  • sphussey (1)
  • MKaandemir (1)
  • LiliiaY (1)
  • ChiaraF32 (1)
Pull Request Authors
  • bw2 (8)
  • hopedisastro (4)
  • shubhamsaini (1)
  • kew24 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 117 last-month
  • Total docker downloads: 201
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 43
  • Total maintainers: 1
pypi.org: str-analysis

Utilities for analyzing short tandem repeats (STRs)

  • Versions: 43
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 117 Last month
  • Docker Downloads: 201
Rankings
Docker downloads count: 1.9%
Dependent packages count: 10.1%
Downloads: 11.0%
Average: 12.0%
Stargazers count: 13.2%
Forks count: 14.2%
Dependent repos count: 21.5%
Maintainers (1)
bw2
Last synced: 9 months ago

Dependencies

requirements.txt pypi
  • intervaltree >=3.1.0
  • numpy >=1.20.3
  • pandas >=1.1.4
  • pysam >=0.16.0.1
  • tqdm >=4.62.3
docker/Dockerfile docker
  • bitnami/minideb stretch build