ge3net
Inferring Continuous Population Structure Coordinates Along the Genome
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.6%) to scientific vocabulary
Repository
Inferring Continuous Population Structure Coordinates Along the Genome
Basic Info
- Host: GitHub
- Owner: RichRast
- License: mit
- Language: Python
- Default Branch: main
- Size: 54.2 MB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Ge3Net
Our work targets the disparities in genomic medicine that are emerging between various worldwide populations.
Personalized genomic predictions are revolutionizing medical diagnosis and treatment. These predictions rely on associations between health outcomes (disease severity, drug response, cancer risk) and correlated neighboring positions along the genome. However, these local genomic correlations differ widely among worldwide populations, so genetic research must include all human populations. For admixed populations, further computational challenges arise because individuals of diverse combined ancestries inherit genomic segments from multiple ancestral populations. To extend population-specific associations to such individuals, their multiple ancestries must be identified along their genome (local ancestry inference, LAI). Here we introduce Ge3Net (Genomic Geographic Geometric Network), the first LAI method to identify the ancestral origin of each segment of an individual's genome as a continuous coordinate rather than an ethnic category. It uses a transformer-based framework, yields higher-resolution local ancestry inference, and eliminates the need for ethnic labels.
By annotating ancestry along the genome accurately and with simple-to-use coordinates, we hope to enable genetic researchers to incorporate ancestry-specific genetic effects into their future models with ease. This could help extend the benefits of such research and models to more diverse cohorts. Ge3Net is particularly targeted at improving genetic modeling applied to admixed individuals. Such individuals inherit genomic segments from diverse populations with very different genetic correlation (linkage) patterns. This ancestry-specific structure must be identified for each segment of the genome in order to apply the appropriate ancestry-specific risk models.
Paper
The paper can be accessed from Ge3Net.pdf. A short version of this work was presented at NeurIPS Learning Meaningful Representations of Life, 2020.
Learnt Representations from Ge3Net
Demo
Below is an example of geographic predictions (x and y axes) from Ge3Net with the ground-truth ancestral origin for each piece of an admixed individual's chromosome 22 shown extending along the z axis (Yoruba segment orange, Spanish segment blue, and Vietnamese segment green) and the predicted ancestral location for each piece of the chromosome shown alongside in pink.
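To make the idea of per-segment coordinates concrete, here is a minimal sketch of how a sequence of per-window coordinate predictions could be grouped into contiguous ancestry segments. It is an illustration only and not the repository's method: it swaps in a plain distance threshold for changepoint detection, and the function names (`haversine_km`, `split_segments`), the `jump_km` threshold, and the toy latitude/longitude values are all assumptions made for this example.

```python
import numpy as np

def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(
        np.radians, (a[..., 0], a[..., 1], b[..., 0], b[..., 1])
    )
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

def split_segments(coords, jump_km=1000.0):
    """Return (start, end) window index pairs (end exclusive).

    A new segment starts wherever consecutive windows are more than
    `jump_km` apart. `jump_km` is a hypothetical threshold for this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    jumps = haversine_km(coords[:-1], coords[1:])
    breaks = np.flatnonzero(jumps > jump_km) + 1
    bounds = np.concatenate(([0], breaks, [len(coords)]))
    return list(zip(bounds[:-1].tolist(), bounds[1:].tolist()))

# Toy per-window predictions as (lat, lon); illustrative values, not model output.
windows = np.array([
    [7.4, 3.9], [7.6, 4.1], [7.3, 3.8],           # roughly Nigeria
    [40.4, -3.7], [40.1, -3.5],                   # roughly Spain
    [21.0, 105.8], [20.8, 106.0], [21.2, 105.6],  # roughly Vietnam
])
segments = split_segments(windows)
print(segments)  # [(0, 3), (3, 5), (5, 8)]
```

Because each window carries a coordinate rather than a category, segment boundaries fall out of the geometry of the predictions and no population labels are needed to define them.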

Repo
Experiments were run on three kinds of genotypes: human, dog, and ancient. For geography, an unsupervised space was constructed from PCA and UMAP.
1. Build labels by running the script buildLabels.py
2. For training, run trainer.py
3. For inference only with a pre-trained model, run inference.py
The default Ge3Net model is src\models\Model_H.py
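As a rough illustration of what an unsupervised coordinate space built from PCA can look like, the sketch below projects a genotype matrix onto its top principal components using only NumPy. It is not the logic of buildLabels.py: the function name `pca_coordinates`, the input layout (samples by SNPs, allele counts 0/1/2), and the synthetic data are assumptions made for this example.

```python
import numpy as np

def pca_coordinates(genotypes: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project samples onto their top principal components.

    genotypes: (n_samples, n_snps) array of allele counts (assumed layout).
    Returns an (n_samples, n_components) array of coordinates.
    """
    X = genotypes.astype(float)
    X -= X.mean(axis=0)  # center each SNP column
    # For a centered matrix X = U S V^T, the PC scores are U * S.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

rng = np.random.default_rng(0)
# Two synthetic populations with independent allele frequencies.
freqs_a = rng.uniform(0.05, 0.95, size=200)
freqs_b = rng.uniform(0.05, 0.95, size=200)
pop_a = rng.binomial(2, freqs_a, size=(20, 200))
pop_b = rng.binomial(2, freqs_b, size=(20, 200))

coords = pca_coordinates(np.vstack([pop_a, pop_b]), n_components=2)
print(coords.shape)  # (40, 2)
```

Each row of `coords` could then serve as a continuous label for a sample in place of a population category. A UMAP embedding would play the same role, with a different projection.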
Acknowledgements
Here we reference publicly available third-party code implementations that are used or modified in our code base:
- BOCD implementation available at https://github.com/gwgundersen/bocd, based on the original Bayesian Changepoint Detection paper https://arxiv.org/abs/0710.3742
- Pyadmix module implementation from https://github.com/AI-sandbox/gnomix
Citation
If you find Ge3Net useful for your research, please consider citing our paper/software:
@article{Rastogi_Ge3Net_Inferring_Continuous_2021,
author = {Rastogi, Richa and Kumar, Arvind S. and Hilmarsson, Helgi and Bustamante, Carlos D. and Montserrat, Daniel Mas and Ioannidis, Alexander G.},
doi = {10.5281/zenodo.7837947},
title = {{Ge3Net: Inferring Continuous Population Structure Coordinates Along the Genome }},
year = {2021}
}
A short version of this work was presented at NeurIPS Learning Meaningful Representations of Life, 2020.
Feedback
Please send feedback/issues related to this repository or the paper to here
Owner
- Login: RichRast
- Kind: user
- Repositories: 1
- Profile: https://github.com/RichRast
GitHub Events
Total
- Delete event: 1
- Push event: 1
- Pull request event: 2
- Create event: 1
Last Year
- Delete event: 1
- Push event: 1
- Pull request event: 2
- Create event: 1
Dependencies
- PyYAML ==5.4.1
- dataclasses ==0.8
- matplotlib ==3.3.3
- numpy ==1.22.0
- optuna ==2.4.0
- pandas ==1.1.5
- plotly ==4.14.1
- pytest ==6.2.4
- scikit_allel ==1.3.2
- scikit_learn ==0.24.2
- scipy ==1.10.0
- seaborn ==0.11.0
- torch ==1.13.1
- umap ==0.1.1
- umap_learn ==0.5.1
- wandb ==0.10.22