sckagae
Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference using Expression matrix and known-TFs.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
scKAGAE: Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference
See paper for more information on its implementation.
scKAGAE is a Kolmogorov-Arnold-based model designed to infer Gene Regulatory Networks (GRNs) from scRNA-seq datasets. It takes both the expression matrix and known Transcription Factors as input, and outputs a ranking of all the genes based on the importance of their relationship.
Abstract
Motivation: Gene Regulatory Networks (GRNs) are complex gene interaction networks that operate at both the local and global level. This makes it challenging to reconstruct those networks when relying on information that is difficult to interpret. Current models for GRN reconstruction have shown promise, but are limited in that they can reconstruct only part of a network. This challenge is compounded by the fact that most tasks involve unsupervised or, at most, semi-supervised learning.
Results: We present scKAGAE, a Kolmogorov-Arnold Graph Auto-Encoder that introduces the new Kolmogorov-Arnold architecture to GRN inference. We evaluate our model on both semi-supervised and unsupervised tasks, and find that it performs comparably to alternative models on semi-supervised tasks. For unsupervised tasks, we present factors that could have led to poor inference and what should be investigated to make the model competitive. As input, we use a given expression matrix and add known-TF names in the semi-supervised tasks. Overall, our model has a smaller architectural overhead, needing only one convolutional layer for inference, but suffers from instability that arises from its simpler architecture.

Training and inference
Because the model's architecture is exposed in scKAGAE.ipynb, you can easily adapt it for your use. The current parameters are optimized for the PBMC-CTL dataset, so you may need to adjust them for other datasets. Be aware that the model is sensitive to hyperparameter changes, so it is recommended to leave them as they are in the current implementation unless your dataset requires otherwise.
Once the data is in the appropriate format, you may run training. It is recommended to use PyTorch with GPU acceleration (e.g. CUDA or MPS). The current implementation was tested on an Apple M3 Pro running macOS.
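To pick between the CUDA, MPS, and CPU backends mentioned above, a small helper along these lines may be useful. This is a minimal sketch, not code from the notebook: the function name `pick_device` is hypothetical, and it falls back to the CPU when PyTorch is missing or no accelerator is found.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use.

    Hypothetical helper, not part of scKAGAE.ipynb.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so nothing can be accelerated

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU

    # torch.backends.mps only exists in PyTorch builds with Apple Silicon support
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

The returned string can be passed to `torch.device(...)` and then to `.to(...)` on the model and tensors.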
Requirements
- Python 3.10 (tested on 3.10.15)
- GraphKan (https://github.com/Ryanfzhang/GraphKan)
- PyTorch Geometric (https://pytorch-geometric.readthedocs.io/en/stable/notes/installation.html)
- PyTorch
- scikit-learn
- SciPy
- Plotly
- tqdm
- pandas
- NumPy
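Before opening the notebook, it can help to check which of the packages listed above are importable. The sketch below is one way to do that using only the standard library. The helper name `missing_requirements` is hypothetical, and the import names are the usual ones for each package. GraphKan is left out because it is installed from its repository and its import name is not stated here.

```python
from importlib.util import find_spec

# Display name -> import name (the usual import names for these packages)
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "PyTorch Geometric": "torch_geometric",
    "scikit-learn": "sklearn",
    "SciPy": "scipy",
    "Plotly": "plotly",
    "tqdm": "tqdm",
    "pandas": "pandas",
    "NumPy": "numpy",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the display names of packages that cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


missing = missing_requirements()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```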
Data pre-processing
For the model to work properly, the data needs to be pre-processed into a specific format. The following example uses the notebook's default, which pre-processes the PBMC-CTL dataset.
PBMC-CTL dataset from: https://epbmc.immunospot.com
Pre-processing PBMC-CTL Dataset
This step converts the PBMC-CTL data files into a format that can be used when computing the correlation matrix from the inputs. It is necessary because the Graph Auto-Encoder model needs a prior graph to train on.
In notebook:
```Python3
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler() # Call the sk-learn scaler

# Prepare data
data_pth = 'data/PBMC-CTL1000cells.csv'
X = pd.read_csv(data_pth)
y = pd.read_csv('data/PBMC-CTLImposedGRN.csv')
y.columns = ['TF', 'target']

# Prepare X
X = X.iloc[:, 1:] # Fix index (PBMC-CTL)
X.set_index(X['Unnamed: 0'].values, inplace=True) # Fix index (PBMC-CTL)

# Shuffle
X = X.sample(frac=1, axis=0, random_state=2) # Shuffle dataset

X = X.T # (cells, genes) # Transpose to necessary format

# Normalize
X = pd.DataFrame(scaler.fit_transform(X), index=X.index, columns=X.columns) # Normalize the dataset
```
This step loads and prepares the dataset. Next, we generate the metadata needed for training (e.g. the gene names) and prepare the known-TF names needed for the semi-supervised task.
```Python3
# Gene Names
gene_names = X.columns # Extract the gene names
gene_name_dic = {gene: idx for idx, gene in enumerate(gene_names)} # Gene name to index dictionary map (necessary for labeling)
num_genes = len(gene_names) # Set the num_genes for a few methods

# TFs
tf_names = list(set(y['TF']) & set(gene_names)) # Extract the known TF names
tf_indices = [gene_name_dic[tf] for tf in tf_names] # Get the TF indices
```
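The pre-processing above prepares the expression matrix for the correlation-based prior graph that the Graph Auto-Encoder trains on. The notebook's exact construction is not reproduced here. As an illustration only, the sketch below builds a prior graph by thresholding the absolute Pearson correlation between genes. The function name `correlation_prior_graph`, the thresholding rule, and the default threshold are all assumptions, not the author's method.

```python
import numpy as np


def correlation_prior_graph(X, threshold=0.5):
    """Build a gene-gene edge list from an expression matrix.

    X: array-like of shape (cells, genes).
    Returns an integer array of shape (2, num_edges), the edge layout
    used by PyTorch Geometric. The thresholding rule is an assumption.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # (genes, genes) Pearson correlation
    corr = np.nan_to_num(corr, nan=0.0)   # constant genes produce NaN entries
    adjacency = np.abs(corr) > threshold  # keep strongly correlated gene pairs
    np.fill_diagonal(adjacency, False)    # drop self-loops
    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst])


# Toy example: 4 cells x 3 genes, where gene 1 is exactly 2x gene 0
toy = [[1, 2, 3], [2, 4, 1], [3, 6, 2], [4, 8, 5]]
edge_index = correlation_prior_graph(toy, threshold=0.9)
print(edge_index)
```

With the normalized DataFrame from the pre-processing step, the call would be `correlation_prior_graph(X.values)`. Each undirected pair appears twice, once per direction.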
If you wish to run this on your own dataset, make sure to define all of the above metadata, and provide a DataFrame X (cells x genes) for the expression matrix and a DataFrame y with the columns TF and target, both containing gene names.
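When adapting this to a custom dataset, a quick sanity check of the inputs can catch mismatched gene names early. The sketch below is a hypothetical helper (`check_grn_inputs` is not part of the notebook). It mirrors the gene-to-index dictionary and TF index list built above, but works on plain Python sequences and sorts the TF names so the result is deterministic.

```python
def check_grn_inputs(gene_names, tf_target_pairs):
    """Validate gene names and TF-target pairs.

    gene_names: iterable of gene names (e.g. X.columns).
    tf_target_pairs: iterable of (TF, target) name pairs.
    Returns (gene-to-index dict, sorted list of known-TF indices).
    """
    gene_names = list(gene_names)
    if len(set(gene_names)) != len(gene_names):
        raise ValueError("gene names must be unique")

    gene_to_index = {gene: idx for idx, gene in enumerate(gene_names)}

    # Keep only TFs that are present in the expression matrix
    tfs = {tf for tf, _target in tf_target_pairs}
    known_tfs = sorted(tfs & set(gene_to_index))
    if not known_tfs:
        raise ValueError("no TF name matches a gene in the expression matrix")

    tf_indices = [gene_to_index[tf] for tf in known_tfs]
    return gene_to_index, tf_indices


genes = ["G1", "TF_A", "G2", "TF_B"]
pairs = [("TF_B", "G1"), ("TF_A", "G2"), ("TF_X", "G1")]  # TF_X is not measured
gene_to_index, tf_indices = check_grn_inputs(genes, pairs)
print(tf_indices)
```

With the DataFrames described above, the call would be `check_grn_inputs(X.columns, zip(y['TF'], y['target']))`.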
Citation
Please cite our paper if you wish to cite this repository.
@article{duchesne2024sckagae,
title={Kolmogorov-Arnold Graph Auto-Encoder for Gene Regulatory Networks inference},
author={Duchesne, Loïc},
pages={1--7},
year={2024},
month={12},
keywords = {Gene Regulatory Network, Kolmogorov-Arnold Network, Graph Auto-Encoder}
}
Owner
- Name: Loïc
- Login: loicduchesne
- Kind: user
- Location: Montreal
- Website: https://www.linkedin.com/in/loicduchesne/
- Repositories: 1
- Profile: https://github.com/loicduchesne
Electrical Engineering & Physics student @ McGill University
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Duchesne"
    given-names: "Loïc"
title: "Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference"
version: 2.0.4
url: "https://github.com/loicduchesne/scKAGAE"
preferred-citation:
  type: article
  authors:
    - family-names: "Duchesne"
      given-names: "Loïc"
  month: 12
  start: 1 # First page number
  end: 7 # Last page number
  title: "Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference"
  year: 2024
GitHub Events
Total
- Push event: 3
- Create event: 2
Last Year
- Push event: 3
- Create event: 2