unsupervised-learning-of-aging-principles

Final project for Skoltech Computational Biology of Aging 2023 course.

https://github.com/dont-care-didnt-ask/unsupervised-learning-of-aging-principles

Repository

Basic Info

Host: GitHub
Owner: Dont-Care-Didnt-Ask
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: master
Size: 4.41 MB

Statistics

Stars: 0
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed about 3 years ago

Unsupervised learning of aging principles from longitudinal data

Introduction

Wouldn't you like to know how long you will live? If somebody asked you for an estimate, what would you do? The obvious approach is to rely on chronological age. Surprisingly, it may not be the best or most convenient predictor.

A biomarker of aging is a measurable characteristic of a living organism that predicts longevity and future functional capacity better than chronological age. Discovering good biomarkers of aging is crucial for testing ways to extend lifespan, since a change in a biomarker can be observed during the lifespan of an organism. This in turn allows for faster research iterations and brings us closer to a world without aging.

The authors of the paper "Identification of a blood test-based biomarker of aging through deep learning of aging trajectories in large phenotypic datasets of mice" claim to have found a new biomarker, the dynamic Frailty Index (dFI), which correlates well with existing ones and has the benefit of being computed from easily measurable blood parameters. Moreover, this biomarker was found in an unsupervised fashion, by analyzing a number of cross-sectional and longitudinal datasets. (Longitudinal data are collected over a period of time by repeatedly measuring the same subject or group of subjects, which makes it possible to observe changes. A cross-sectional dataset is a collection of data from different individuals or groups at a single point in time; it can be used to compare these individuals or groups.)

The theory behind this indicator comes from the science of dynamical systems. The idea of an order parameter associated with instability is a generalization of a concept initially introduced to describe phase transitions in thermodynamics. The idea was further developed for applications to open non-equilibrium systems: near the critical point, the dynamics of the stable components of a system is completely determined by the "slow" dynamics of only a few "order parameters". The dFI, identified as an approximation to the order parameter, is a fundamental macroscopic property of the aging organism as a non-equilibrium system.

Results

Our task is essentially to reproduce the results of this paper. This includes training a deep learning model on data from the Mouse Phenome Database and checking the correlations of dFI with known aging biomarkers. The official implementation is available on GitHub. It is written in Python and TensorFlow. It took quite some time to configure and launch, but we managed to do it in the end.

In particular, we came up with a recipe for creating a conda environment, which hopefully will make reproduction easier (the instructions are in ORIG_README.md).

Also, the link to the dataset Yuan2_strainmeans.csv was not working. We found the file elsewhere on the internet, but we are not sure that it is the same version, since the results of our run differ slightly from those in the paper.

Regarding the experiments: in the paper, dFI was compared to the following markers:

- PFI (physiological frailty index),
- RDW (red blood cell distribution width),
- BW (body weight),
- C-reactive protein,
- murine chemokine CXCL1,
- total luciferase flux.

Overall, dFI turned out to be strongly associated with these markers. The authors also showed that dFI reflects lifespan-modulating interventions: a high-fat diet increases dFI (in male mice), and rapamycin treatment decreases it.

We reran the notebooks in the repository, which perform all of the aforementioned experiments. The results were slightly different numerically, but all of the authors' claims still hold. To avoid populating the repo with tons of pictures, we restrict ourselves to one example (the rest can be obtained by running the corresponding notebook).

Let's look at the change between two consecutive measurements of dFI in groups with and without rapamycin treatment (rapamycin is a drug that has been shown to decelerate aging in mice).

Change between two consecutive measurements of dFI

The corresponding result in the paper is shown in Figure 7c on page 8.

We can see that the absolute values are different, but their relative position (and therefore the conclusion) is the same. The difference might be due to a discrepancy in the dataset version, or to stochasticity in the neural network training process (results may vary across hardware). In the end, the absolute value of a biomarker is not as important as its behaviour as a whole.
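The comparison described above can be sketched in a few lines. This is a minimal illustration, not the code of the notebooks: the record layout, the animal identifiers and all dFI values below are made up for the example, and `consecutive_changes` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (animal_id, group, age_weeks, dFI). All values are invented.
records = [
    ("m1", "control", 60, 1.00), ("m1", "control", 70, 1.30),
    ("m2", "control", 60, 0.90), ("m2", "control", 70, 1.10),
    ("m3", "rapamycin", 60, 1.00), ("m3", "rapamycin", 70, 1.05),
    ("m4", "rapamycin", 60, 1.10), ("m4", "rapamycin", 70, 1.00),
]

def consecutive_changes(rows):
    """Return {group: [dFI(next) - dFI(previous)]} over each animal's consecutive measurements."""
    by_animal = defaultdict(list)
    for animal, group, age, dfi in rows:
        by_animal[(animal, group)].append((age, dfi))
    changes = defaultdict(list)
    for (_, group), series in by_animal.items():
        series.sort()  # order each animal's measurements by age
        for (_, prev), (_, nxt) in zip(series, series[1:]):
            changes[group].append(nxt - prev)
    return dict(changes)

changes = consecutive_changes(records)
mean_change = {group: mean(values) for group, values in changes.items()}
print(mean_change)
```

Only the ordering of the group means matters for the conclusion, which is why a shift in absolute values between runs does not change it.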

Discussion

The paradigm of aging

There are two major paradigms of aging. The first one sees aging as a consequence of the developmental process: for example, some mutations provide advantages early in life but become pathological later. The second paradigm sees aging as the result of a stochastic process of damage accumulation. Additionally, there is a view that aging is not and cannot be programmed. Instead, aging is a continuation of developmental growth, driven by genetic pathways such as mTOR. This is often misunderstood as a sort of programmed aging. Under this view, however, aging is a purposeless quasi-program or, figuratively, a shadow of actual programs.

The authors state explicitly that they treat aging as a particular case of the dynamics of a complex system unfolding near a bifurcation, or tipping point, on the boundary of a dynamic stability region. Aging results from an inherent dynamic instability of the underlying regulatory networks: small deviations of the organism state variables (physiological indices) are exponentially amplified, which leads to the exponential acceleration of mortality. At the age approximately corresponding to the average lifespan in the population, non-linear effects take over the dynamics of dFI, and the organism state deviates from its youthful state even faster than exponentially. Such a situation is incompatible with survival and hence cannot be observed in the data. According to the model and the experiment in the paper, death occurs quickly once the maximum dFI level is reached at some point in the life history of the animal.
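The amplification argument above can be written down as a toy model. This is a minimal sketch in our own notation, not the equations of the paper: $z$ stands for the order parameter (approximated by dFI), $\alpha > 0$ for the instability rate, $g > 0$ for the strength of the leading non-linear term, and $f(t)$ for a zero-mean stochastic force. All of these symbols are assumptions introduced for illustration.

In the linear regime, small deviations grow exponentially on average:

$$
\dot z = \alpha z + f(t) \quad\Rightarrow\quad \langle z(t) \rangle = z_0 e^{\alpha t}
$$

Once $z$ is large, a quadratic term can no longer be neglected. Dropping the noise for simplicity:

$$
\dot z = \alpha z + g z^2 \quad\Rightarrow\quad z(t) = \frac{\alpha z_0 e^{\alpha t}}{\alpha + g z_0 \left(1 - e^{\alpha t}\right)}, \qquad t^* = \frac{1}{\alpha} \ln\left(1 + \frac{\alpha}{g z_0}\right)
$$

The solution grows faster than exponentially and diverges at the finite time $t^*$, which matches the qualitative picture of a rapid collapse once the non-linear regime is reached.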

Therefore, we conclude that the authors follow the stochastic paradigm of aging.

Weak points and improvements

They use PCA to analyze the data. However, there are more advanced methods, such as t-SNE or UMAP, which do not assume linearity.
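To make the linearity assumption concrete, here is a minimal sketch of what PCA computes: the first principal component is the single linear direction that captures the most variance. The example uses power iteration on the covariance matrix and synthetic data; the function name and the data are assumptions for illustration, and nothing here is taken from the paper's code. It does not implement t-SNE or UMAP.

```python
import random

def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained variance fraction) of the first PC via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Synthetic data: three features driven linearly by one hidden factor plus small noise.
rng = random.Random(0)
data = []
for _ in range(200):
    latent = rng.gauss(0.0, 1.0)
    data.append([latent + rng.gauss(0.0, 0.1),
                 2 * latent + rng.gauss(0.0, 0.1),
                 -latent + rng.gauss(0.0, 0.1)])

direction, explained = first_principal_component(data)
print(direction, explained)
```

Because the projection is a fixed weighted sum of the features, PCA recovers the hidden factor well here, where the features depend on it linearly. A factor that enters the features non-linearly is where non-linear methods could have an advantage.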

Next, they aimed to build an easy-to-compute biomarker, but the current version uses 12 features. Collecting them from a single blood test is probably not a problem, but it still seems logical to try to reduce the number of features. Perhaps in doing so we would find that there is no need for deep autoencoders and that a couple of linear layers would do just fine.

Speaking of which, it seems reasonable to tweak the model architecture. For longitudinal data, the model currently uses only the previous value of dFI to predict the next one, which corresponds to a Markov assumption. However, it might be beneficial to consider the whole sequence available at the current time. This would make the model more flexible and would hopefully translate into a better index.
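As a small illustration of what relaxing the Markov assumption could buy, the sketch below fits a first-order and a second-order autoregressive model to the same synthetic scalar series by least squares. The series, its coefficients and the function names are assumptions made for the example; this is not the paper's model, and a second-order model is only one step towards using the whole sequence.

```python
import random

def fit_ar1(x):
    """Least-squares fit of x[t] = a * x[t-1]; returns (a, mean squared residual)."""
    ts = range(2, len(x))  # same targets as fit_ar2, so the errors are comparable
    a = sum(x[t] * x[t - 1] for t in ts) / sum(x[t - 1] ** 2 for t in ts)
    mse = sum((x[t] - a * x[t - 1]) ** 2 for t in ts) / len(ts)
    return a, mse

def fit_ar2(x):
    """Least-squares fit of x[t] = a1 * x[t-1] + a2 * x[t-2] via the 2x2 normal equations."""
    ts = range(2, len(x))
    s11 = sum(x[t - 1] ** 2 for t in ts)
    s22 = sum(x[t - 2] ** 2 for t in ts)
    s12 = sum(x[t - 1] * x[t - 2] for t in ts)
    b1 = sum(x[t] * x[t - 1] for t in ts)
    b2 = sum(x[t] * x[t - 2] for t in ts)
    det = s11 * s22 - s12 ** 2
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    mse = sum((x[t] - a1 * x[t - 1] - a2 * x[t - 2]) ** 2 for t in ts) / len(ts)
    return (a1, a2), mse

# Synthetic series whose next value depends on the two previous ones.
rng = random.Random(1)
series = [0.0, 0.0]
for _ in range(2000):
    series.append(0.5 * series[-1] + 0.3 * series[-2] + rng.gauss(0.0, 1.0))

a_markov, mse_markov = fit_ar1(series)
(a1, a2), mse_ar2 = fit_ar2(series)
print(a_markov, mse_markov)
print(a1, a2, mse_ar2)
```

On data with genuine longer memory, the second-order fit has a lower residual than the Markov one. Whether real dFI trajectories carry such memory is an open question that this toy example does not answer.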

Also, the authors themselves point out that the nonlinear dynamics of the order parameter are crucial for explaining mortality. The bigger the animal lifespan in units of the mortality rate doubling time, the more we can neglect the non-linearity. However, if we do not neglect it by increasing the rank of the AR model, we might obtain better variants of dFI.

Why do authors call their approach unsupervised

The approach belongs to the class of unsupervised learning algorithms because it does not require labels associated with age, mortality, or morbidity. The authors build a deep neural network composed of a denoising autoencoder, whose function is to perform dimensionality reduction, and an autoregressor, which models the stochastic dynamics of the dFI. The dFI is an output of the model, but it is not a prediction of any label.
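The structure of such an objective can be sketched as follows. This is a deliberately simplified toy with a linear encoder and decoder and a scalar latent, showing only the forward pass and the loss; the actual model is a deep network implemented in TensorFlow, and all names, weights and values below are hypothetical.

```python
import random

def encode(x, w_enc):
    """Linear encoder: project a feature vector onto a scalar latent (the dFI analogue)."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z, w_dec):
    """Linear decoder: reconstruct the feature vector from the scalar latent."""
    return [w * z for w in w_dec]

def unsupervised_loss(trajectory, w_enc, w_dec, ar_coef, noise_std, rng):
    """Denoising reconstruction error plus AR(1) prediction error in latent space.

    Only the measurements themselves are used; no age, mortality or
    morbidity label appears anywhere in this function.
    """
    recon = 0.0
    latents = []
    for x in trajectory:
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]  # corrupt the input
        z = encode(noisy, w_enc)
        x_hat = decode(z, w_dec)
        recon += sum((a - b) ** 2 for a, b in zip(x, x_hat))  # compare with the clean input
        latents.append(z)
    ar = sum((nxt - ar_coef * prev) ** 2 for prev, nxt in zip(latents, latents[1:]))
    return recon / len(trajectory) + ar / max(len(latents) - 1, 1)

# Toy trajectory of one animal: two features driven by a latent growing by 10% per step.
weights = [0.6, 0.8]  # unit-length direction, used for both encoder and decoder
trajectory = [[0.6 * z, 0.8 * z] for z in (1.0, 1.1, 1.21)]

clean_loss = unsupervised_loss(trajectory, weights, weights, 1.1, 0.0, random.Random(0))
noisy_loss = unsupervised_loss(trajectory, weights, weights, 1.1, 0.5, random.Random(0))
print(clean_loss, noisy_loss)
```

Both terms of the loss are computed from the measurements alone, which is the sense in which the latent variable is learned without supervision.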

Credits

Managing project: Mikhail Seleznyov.

The report text: Mikhail Zybin, Mikhail Seleznyov.

Reproducing results: Nikolay Kotoyants, Mikhail Seleznyov.

References

"Identification of a blood test-based biomarker of aging through deep learning of aging trajectories in large phenotypic datasets of mice"

Owner

Login: Dont-Care-Didnt-Ask
Kind: user

Repositories: 17
Profile: https://github.com/Dont-Care-Didnt-Ask

DL researcher, Data Science student, chess lover.

Committers

Last synced: over 1 year ago

All Time

Total Commits: 26
Total Committers: 4
Avg Commits per committer: 6.5
Development Distribution Score (DDS): 0.308

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Mikhail Seleznyov	s**8@g**m	18
cataclysmus	c****s	4
Mikhail Zybin	z**8@m**u	3
Nikolay	7****c	1

Committer Domains (Top 20 + Academic)

mail.ru: 1
