Recent Releases of https://github.com/broadinstitute/tandem-repeat-catalog

https://github.com/broadinstitute/tandem-repeat-catalog - tandem repeat catalog + variation clusters v1.0.2

This version contains a minor update to the TRGT catalog, involving changes to the STRUC and MOTIF fields.

The latest versions of TRGT no longer rely on a STRUC field, so the updated catalog repurposes this field to annotate whether a locus is part of a variation cluster. Additionally, the MOTIF field now has consistent motif ordering when multiple motifs are specified.

- Python
Published by bw2 11 months ago

https://github.com/broadinstitute/tandem-repeat-catalog - tandem repeat catalog + variation clusters v1.0.1

This is an update to the variation_clusters_and_isolated_repeats_v1.hg38.TRGT.bed.gz catalog file.

The new version is the same as the original v1.0 file, except that we 1) removed 103,015 rows representing very large regions of variation that exceed the maximum region size of the variation cluster algorithm, and 2) repurposed the STRUC field to contain labels (this field is ignored by recent versions of TRGT).

- Python
Published by bw2 about 1 year ago

https://github.com/broadinstitute/tandem-repeat-catalog - repeat catalog v1.0

Genome-wide TR catalog based on combining the following 4 catalogs in order:

1) Known disease-associated loci
2) Illumina catalog of 174k polymorphic repeats
3) Catalog of all perfect repeats in hg38 that span at least 9bp in the reference and consist of at least 3 repeats of some motif that is between 2bp (dinucleotide) and 1000bp in size. This catalog was computed using ColabRepeatFinder
4) Catalog of polymorphic loci in 51 HPRC samples computed using the methods described in [Weisburd et al. 2023]
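The thresholds for the perfect-repeat catalog can be read as a simple predicate. The sketch below is only an illustration of those thresholds, not ColabRepeatFinder's implementation; the function name `is_perfect_repeat_locus` is hypothetical, and it assumes that only whole copies of the motif count toward the repeat total.

```python
def is_perfect_repeat_locus(sequence: str, motif: str) -> bool:
    """Check the stated thresholds for a perfect repeat locus.

    Assumption (not stated in the release notes): the locus consists of
    whole motif copies only, with no partial trailing repeat.
    """
    motif_len = len(motif)
    # Motif must be between 2bp (dinucleotide) and 1000bp
    if not 2 <= motif_len <= 1000:
        return False
    # Locus must span at least 9bp in the reference
    if len(sequence) < 9:
        return False
    # Locus must contain at least 3 whole repeats of the motif
    repeat_count, remainder = divmod(len(sequence), motif_len)
    if remainder != 0 or repeat_count < 3:
        return False
    # "Perfect" means no interruptions: the sequence is exactly motif * n
    return sequence == motif * repeat_count
```

Under these thresholds, three copies of a dinucleotide (6bp) would not qualify because the span is under 9bp, while three copies of a trinucleotide (9bp) would.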

The merging procedure involved taking all loci from the 1st catalog, then all non-duplicate loci from the 2nd catalog (where a locus was considered a duplicate if it overlapped a previously-added locus by 66% or more and had the same motif after cyclic shift as that other locus), then all non-duplicate loci from the 3rd catalog, and so on.
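The merging procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's actual code: the `Locus` class and the function names are hypothetical, overlap is assumed to be measured as a fraction of the shorter of the two loci, and each candidate is compared against every locus added so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


def canonical_motif(motif: str) -> str:
    """Return the lexicographically smallest rotation of the motif,
    so that cyclic shifts (CAG, AGC, GCA) compare as equal."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def is_duplicate(new: Locus, existing: Locus, min_overlap: float = 0.66) -> bool:
    if new.chrom != existing.chrom:
        return False
    overlap = min(new.end, existing.end) - max(new.start, existing.start)
    if overlap <= 0:
        return False
    # Assumption: overlap fraction is relative to the shorter locus
    shorter = min(new.end - new.start, existing.end - existing.start)
    if overlap / shorter < min_overlap:
        return False
    return canonical_motif(new.motif) == canonical_motif(existing.motif)


def merge_catalogs(catalogs: list[list[Locus]]) -> list[Locus]:
    """Take every locus from the first catalog, then only the
    non-duplicate loci from each later catalog, in order."""
    merged = list(catalogs[0])
    for catalog in catalogs[1:]:
        for locus in catalog:
            if not any(is_duplicate(locus, kept) for kept in merged):
                merged.append(locus)
    return merged
```

The linear scan over `merged` keeps the sketch short but is quadratic in the number of loci; for a catalog with millions of loci, an interval index per chromosome would be needed.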

The numbers (and %) of loci in the combined catalog that were added from each of the source catalogs were as follows:

```
       82 out of 3,286,072 ( 0.0%) from 1. known disease-associated loci
  173,879 out of 3,286,072 ( 5.3%) from 2. Illumina catalog of 174k polymorphic loci
3,053,992 out of 3,286,072 (92.9%) from 3. perfect repeats in hg38
   58,119 out of 3,286,072 ( 1.8%) from 4. polymorphic loci in 51 HPRC samples
```
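The percentages above follow directly from the per-source counts, which sum to the catalog total. A short sketch that reproduces the arithmetic (the helper name `percent_breakdown` is hypothetical; the counts are copied from the figures above):

```python
# Per-source locus counts, as reported in the breakdown above
counts = {
    "1. known disease-associated loci": 82,
    "2. Illumina catalog of 174k polymorphic loci": 173_879,
    "3. perfect repeats in hg38": 3_053_992,
    "4. polymorphic loci in 51 HPRC samples": 58_119,
}


def percent_breakdown(counts: dict[str, int]) -> list[str]:
    """Format each source's count as a share of the combined total."""
    total = sum(counts.values())
    return [
        f"{n:>9,} out of {total:,} ({100 * n / total:4.1f}%) from {source}"
        for source, n in counts.items()
    ]


for line in percent_breakdown(counts):
    print(line)
```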

Changes relative to v0.9:

- switch order of source catalogs, so that loci from the Illumina catalog are added ahead of loci from the catalog of perfect repeats in hg38
- in all locus definitions, look for and simplify motifs that are themselves perfect repeats of smaller motifs, for example, replacing (TTTT)* with (T)* and (CAGCAG)* with (CAG)*
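The motif simplification described above amounts to finding the shortest unit whose repetition reproduces the motif. A minimal sketch of that idea follows; the function name `simplify_motif` is hypothetical and this is not necessarily how the repository implements it.

```python
def simplify_motif(motif: str) -> str:
    """Return the shortest unit that, repeated a whole number of times,
    reproduces the motif. Motifs that are not perfect repeats of a
    smaller unit are returned unchanged."""
    n = len(motif)
    # Try candidate unit lengths from shortest to longest
    for unit_len in range(1, n // 2 + 1):
        if n % unit_len == 0 and motif[:unit_len] * (n // unit_len) == motif:
            return motif[:unit_len]
    return motif
```

For example, `simplify_motif("TTTT")` gives `"T"` and `simplify_motif("CAGCAG")` gives `"CAG"`, while `"CAG"` is returned as-is.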

- Python
Published by bw2 almost 2 years ago

https://github.com/broadinstitute/tandem-repeat-catalog - repeat catalog v0.9

- Python
Published by bw2 almost 2 years ago