ge3net

Inferring Continuous Population Structure Coordinates Along the Genome

https://github.com/richrast/ge3net

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: RichRast
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 54.2 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Ge3Net

Our work targets the disparities in genomic medicine that are emerging between various worldwide populations.

Personalized genomic predictions are revolutionizing medical diagnosis and treatment. These predictions rely on associations between health outcomes (disease severity, drug response, cancer risk) and correlated neighboring positions along the genome. However, these local genomic correlations differ widely among worldwide populations, so genetic research must include all human populations. For admixed populations, further computational challenges arise, because individuals of diverse combined ancestries inherit genomic segments from multiple ancestral populations. To extend population-specific associations to such individuals, their multiple ancestries must be identified along their genome (local ancestry inference, LAI). Here we introduce Ge3Net (Genomic Geographic Geometric Network), the first LAI method to identify the ancestral origin of each segment of an individual's genome as a continuous coordinate rather than an ethnic category. Ge3Net uses a transformer-based framework, yielding higher-resolution local ancestry inference and eliminating the need for ethnic labels.

By annotating ancestry along the genome accurately, and with simple-to-use coordinates, we hope to enable genetic researchers to incorporate ancestry-specific genetic effects into their future models with ease. This could help extend the benefits of such research and models to more diverse cohorts. Ge3Net is particularly targeted at improving genetic modeling applied to admixed individuals. Such individuals inherit genomic segments from diverse populations that have very different genetic correlation (linkage) structure. This ancestry-specific structure must be identified for each segment of the genome in order to apply appropriate ancestry-specific risk models.

Paper

The paper can be accessed from Ge3Net.pdf. A short version of this work was presented at NeurIPS Learning Meaningful Representations of Life, 2020.

Learnt Representations from Ge3Net

[Figure: learnt representations from Ge3Net]

Demo

Below is an example of geographic predictions (x and y axes) from Ge3Net with the ground-truth ancestral origin for each piece of an admixed individual's chromosome 22 shown extending along the z axis (Yoruba segment orange, Spanish segment blue, and Vietnamese segment green) and the predicted ancestral location for each piece of the chromosome shown alongside in pink.

[Figure: demo]
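To make the comparison in the demo concrete, here is a minimal sketch of how a predicted location could be scored against the ground-truth location for each chromosome window. It assumes coordinates are (latitude, longitude) pairs in degrees, which this README does not specify, and it uses great-circle (haversine) distance as an illustrative choice rather than the evaluation metric used in the paper. The names `haversine_km` and `per_window_error_km`, and the coordinate values, are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def per_window_error_km(true_coords, pred_coords):
    """Distance between true and predicted location for each genomic window."""
    return [
        haversine_km(t[0], t[1], p[0], p[1])
        for t, p in zip(true_coords, pred_coords)
    ]


# Illustrative, made-up values: one (lat, lon) pair per window.
true_coords = [(7.4, 3.9), (40.4, -3.7), (21.0, 105.8)]
pred_coords = [(8.0, 4.5), (41.0, -2.9), (20.1, 106.4)]

errors = per_window_error_km(true_coords, pred_coords)
for i, err in enumerate(errors):
    print(f"window {i}: {err:.1f} km")
```

A per-window list like this can be averaged for a single summary number, or plotted along the chromosome to show where the predictions drift from the ground truth.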

Repo

Experiments were run on three genotype sets (humans, dogs, and ancient); for geography, an unsupervised space was constructed from PCA and UMAP.

1. Build labels by running the script buildLabels.py
2. For training, run trainer.py
3. For inference only with a pre-trained model, run inference.py
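As a rough illustration of what an unsupervised coordinate space built from PCA could look like, the sketch below projects a toy genotype matrix onto its top principal components using NumPy. This is not the code in buildLabels.py. The function name `pca_coordinates`, the random toy data, and the 0/1 allele encoding are assumptions made for the example.

```python
import numpy as np


def pca_coordinates(genotypes, n_components=2):
    """Project samples (rows) onto the top principal components.

    genotypes: array of shape (n_samples, n_snps), e.g. 0/1 allele codes.
    Returns an array of shape (n_samples, n_components).
    """
    X = np.asarray(genotypes, dtype=float)
    # Center each SNP column so the components capture variation between samples.
    X = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; U * S gives the sample scores on each component.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy data: 6 samples, 20 SNPs, random 0/1 alleles (hypothetical).
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 2, size=(6, 20))

coords = pca_coordinates(toy_genotypes, n_components=2)
print(coords.shape)
```

Each row of `coords` is a continuous coordinate for one sample, which could serve as a regression target in place of a categorical ancestry label. A UMAP embedding could be substituted for the PCA step in the same role.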

The default Ge3Net model is src\models\Model_H.py

Acknowledgements

Here we reference publicly available third-party code implementations that are used or modified in our code base:

  • BOCD implementation available at https://github.com/gwgundersen/bocd, based on the original Bayesian Changepoint Detection paper https://arxiv.org/abs/0710.3742
  • Pyadmix module implementation from https://github.com/AI-sandbox/gnomix

Citation

If you find Ge3Net useful for your research, please consider citing our paper/software:

    @article{Rastogi_Ge3Net_Inferring_Continuous_2021,
      author = {Rastogi, Richa and Kumar, Arvind S. and Hilmarsson, Helgi and Bustamante, Carlos D. and Montserrat, Daniel Mas and Ioannidis, Alexander G.},
      doi = {10.5281/zenodo.7837947},
      title = {{Ge3Net: Inferring Continuous Population Structure Coordinates Along the Genome}},
      year = {2021}
    }

Feedback

Please send feedback/issues related to this repository or the paper to here

Owner

  • Login: RichRast
  • Kind: user

GitHub Events

Total
  • Delete event: 1
  • Push event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Delete event: 1
  • Push event: 1
  • Pull request event: 2
  • Create event: 1

Dependencies

requirements.txt pypi
  • PyYAML ==5.4.1
  • dataclasses ==0.8
  • matplotlib ==3.3.3
  • numpy ==1.22.0
  • optuna ==2.4.0
  • pandas ==1.1.5
  • plotly ==4.14.1
  • pytest ==6.2.4
  • scikit_allel ==1.3.2
  • scikit_learn ==0.24.2
  • scipy ==1.10.0
  • seaborn ==0.11.0
  • torch ==1.13.1
  • umap ==0.1.1
  • umap_learn ==0.5.1
  • wandb ==0.10.22