sckagae

Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference using Expression matrix and known-TFs.

https://github.com/loicduchesne/sckagae

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: loicduchesne
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 1.9 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

scKAGAE: Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference

See paper for more information on its implementation.

scKAGAE is a Kolmogorov-Arnold-based model designed to infer Gene Regulatory Networks (GRNs) from scRNA-seq datasets. It takes both the expression matrix and the known Transcription Factors (TFs) as input, and outputs a ranking of the genes based on the importance of their relationships.
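To make the idea of a ranked output concrete, here is a minimal sketch of how a matrix of pairwise scores could be turned into a ranked list of TF-target pairs. The function `rank_edges`, the `scores` layout, and the toy gene names are hypothetical and only serve as an illustration; the ranking actually produced by scKAGAE is defined in the notebook.

```python
import numpy as np
import pandas as pd


def rank_edges(scores: np.ndarray, gene_names: list[str], tf_names: list[str]) -> pd.DataFrame:
    """Rank candidate TF -> target pairs by score, highest first.

    Assumption (illustrative only): scores[i, j] is the predicted importance
    of gene i regulating gene j.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    rows = []
    for tf in tf_names:
        i = index[tf]
        for j, target in enumerate(gene_names):
            if i != j:  # skip self-loops
                rows.append((tf, target, float(scores[i, j])))
    ranked = pd.DataFrame(rows, columns=["TF", "target", "score"])
    return ranked.sort_values("score", ascending=False, kind="stable").reset_index(drop=True)


# Toy example: three genes, one of which is a known TF.
genes = ["G1", "G2", "G3"]
toy_scores = np.array([[0.0, 0.9, 0.2],
                       [0.1, 0.0, 0.4],
                       [0.3, 0.8, 0.0]])
ranking = rank_edges(toy_scores, genes, tf_names=["G1"])
print(ranking)
```

Restricting the rows to known TFs mirrors the semi-supervised setting, where only TF-to-target relationships are of interest.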

Abstract

Motivation: Gene Regulatory Networks (GRNs) are complex gene interaction networks that are executed at both the local and global levels. This makes them challenging to reconstruct when relying on information that is difficult to interpret. Current models for GRN reconstruction have shown promise, but are limited in that they can only reconstruct part of a network. This challenge is compounded by the fact that most of the tasks involve unsupervised or, at most, semi-supervised learning.

Results: We present scKAGAE, a Kolmogorov-Arnold Graph Auto-Encoder that introduces the new Kolmogorov-Arnold architecture to GRN inference. We evaluate our model on both semi-supervised and unsupervised tasks, and find that it performs similarly to alternative models on semi-supervised tasks. For unsupervised tasks, we present factors that could have led to poor inference and what should be investigated to make the model competitive. As input, we use a given expression matrix and add known-TF names in the semi-supervised tasks. Overall, our model has a smaller architectural overhead, needing only one convolutional layer for inference, but suffers from instability that arises from its simpler architecture.

Training and inference

Because the model's architecture is exposed in scKAGAE.ipynb, you can easily adapt it for your own use. The current parameters are optimized for the PBMC-CTL dataset, so you may need to adjust them for other datasets. Be aware that the model is sensitive to hyperparameter changes, so it is recommended to leave them as they are in the current implementation.

Once the data is in the appropriate format, you may run training. It is recommended to use PyTorch with GPU acceleration (e.g. CUDA or MPS). The current implementation was tested on an Apple M3 Pro running macOS.
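As an illustration of the GPU recommendation, the following sketch shows one way a device could be selected at runtime. It assumes a PyTorch version that exposes `torch.backends.mps` (1.12 or later); the helper name `pick_device` is hypothetical and does not come from the notebook.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(f"Using device: {device}")
```

The model and its input tensors would then be moved to the selected device with `.to(device)` before training.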

Requirements

  • Python 3.10 (tested on 3.10.15)
  • GraphKan (https://github.com/Ryanfzhang/GraphKan)
  • PyTorch Geometric (https://pytorch-geometric.readthedocs.io/en/stable/notes/installation.html)
  • PyTorch
  • scikit-learn
  • SciPy
  • Plotly
  • tqdm
  • Pandas
  • NumPy

Data pre-processing

For the model to work properly, the data must be pre-processed into a specific format. The following example uses the notebook's default, which pre-processes the PBMC-CTL dataset.

PBMC-CTL dataset from: https://epbmc.immunospot.com

Pre-processing PBMC-CTL Dataset

This step converts the PBMC-CTL data files into a format from which the correlation matrix of the inputs can be computed. This is necessary because the Graph Auto-Encoder model needs a prior graph to train on.
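To illustrate what a correlation-based prior graph can look like, the sketch below thresholds the absolute Pearson correlation between genes and returns the resulting edges. The function name `correlation_prior`, the choice of Pearson correlation, and the `threshold` value are assumptions made for this example; the notebook may build its prior graph differently.

```python
import numpy as np
import pandas as pd


def correlation_prior(X: pd.DataFrame, threshold: float = 0.5) -> np.ndarray:
    """Build an edge list from |Pearson correlation| between genes.

    X is a (cells x genes) expression DataFrame. Returns an integer array of
    shape (2, num_edges) holding (source, target) gene indices, which is the
    layout PyTorch Geometric uses for `edge_index`.
    """
    corr = np.abs(np.corrcoef(X.to_numpy(dtype=float), rowvar=False))
    corr = np.nan_to_num(corr)   # constant genes yield NaN correlations
    np.fill_diagonal(corr, 0.0)  # drop self-loops
    src, dst = np.nonzero(corr >= threshold)
    return np.stack([src, dst])


# Toy data: g1 and g2 are perfectly correlated, g3 is only weakly correlated.
toy = pd.DataFrame({"g1": [0.0, 1.0, 2.0, 3.0],
                    "g2": [0.0, 2.0, 4.0, 6.0],
                    "g3": [1.0, 0.0, 1.0, 0.0]})
edges = correlation_prior(toy, threshold=0.9)
print(edges)
```

Because the correlation matrix is symmetric, each retained pair appears in both directions, which gives an undirected prior graph.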

In notebook:

```Python3
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # Call the sk-learn scaler

# Prepare data
data_pth = 'data/PBMC-CTL1000cells.csv'
X = pd.read_csv(data_pth)
y = pd.read_csv('data/PBMC-CTLImposedGRN.csv')
y.columns = ['TF', 'target']

# Prepare X
X = X.iloc[:, 1:]  # Fix index (PBMC-CTL)
X.set_index(X['Unnamed: 0'].values, inplace=True)  # Fix index (PBMC-CTL)

# Shuffle
X = X.sample(frac=1, axis=0, random_state=2)  # Shuffle dataset

X = X.T  # (cells, genes) # Transpose to necessary format

# Normalize
X = pd.DataFrame(scaler.fit_transform(X), index=X.index, columns=X.columns)  # Normalize the dataset
```

This step loads and prepares the dataset. Next, we generate the metadata required for training (e.g. the gene names) and prepare the known-TF names required for the semi-supervised task.

```Python3
# Gene Names
gene_names = X.columns  # Extract the gene names
gene_name_dic = {gene: idx for idx, gene in enumerate(gene_names)}  # Gene name to index dictionary map (necessary for labeling)
num_genes = len(gene_names)  # Set the num_genes for a few methods

# TFs
tf_names = list(set(y['TF']) & set(gene_names))  # Extract the known TF names
tf_indices = [gene_name_dic[tf] for tf in tf_names]  # Get the TF indices
```

If you wish to run this on your own dataset, make sure to define all of the above metadata, and provide a DataFrame X (cells x genes) for the expression matrix and a DataFrame y with the columns TF and target, both containing gene names.
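For a custom dataset, a small sanity check along the following lines may help catch format problems before training. This is only a sketch: the function `check_inputs`, its error messages, and the toy data are hypothetical, and the returned keys are illustrative names for the metadata described above.

```python
import pandas as pd


def check_inputs(X: pd.DataFrame, y: pd.DataFrame) -> dict:
    """Sanity-check a (cells x genes) matrix X and a TF/target table y."""
    missing = {"TF", "target"} - set(y.columns)
    if missing:
        raise ValueError(f"y is missing columns: {sorted(missing)}")
    if X.columns.duplicated().any():
        raise ValueError("X has duplicated gene names")
    gene_names = list(X.columns)
    gene_name_dic = {gene: idx for idx, gene in enumerate(gene_names)}
    # Keep only TFs that are present in X; sorted for a reproducible order.
    tf_names = sorted(set(y["TF"]) & set(gene_names))
    return {
        "gene_names": gene_names,
        "gene_name_dic": gene_name_dic,
        "num_genes": len(gene_names),
        "tf_names": tf_names,
        "tf_indices": [gene_name_dic[tf] for tf in tf_names],
    }


# Toy example: 2 cells x 3 genes; one listed TF ("Z") is absent from X.
X_toy = pd.DataFrame([[0.1, 0.5, 0.0], [0.3, 0.2, 0.9]], columns=["A", "B", "C"])
y_toy = pd.DataFrame({"TF": ["A", "Z"], "target": ["B", "C"]})
meta = check_inputs(X_toy, y_toy)
print(meta["tf_names"], meta["tf_indices"])
```

Sorting the TF names is a design choice of this sketch: intersecting two sets yields an arbitrary order, so sorting keeps the TF indices stable between runs.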


Citation

Please cite our paper if you wish to cite this repository.

@article{duchesne2024sckagae,
  title={Kolmogorov-Arnold Graph Auto-Encoder for Gene Regulatory Networks inference},
  author={Duchesne, Loïc},
  pages={1--7},
  year={2024},
  month={12},
  keywords={Gene Regulatory Network, Kolmogorov-Arnold Network, Graph Auto-Encoder}
}

Owner

  • Name: Loïc
  • Login: loicduchesne
  • Kind: user
  • Location: Montreal

Electrical Engineering & Physics student @ McGill University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Duchesne"
  given-names: "Loïc"
title: "Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference"
version: 2.0.4
url: "https://github.com/loicduchesne/scKAGAE"
preferred-citation:
  type: article
  authors:
    - family-names: "Duchesne"
      given-names: "Loïc"
  month: 12
  start: 1 # First page number
  end: 7 # Last page number
  title: "Kolmogorov-Arnold Graph Auto-Encoder for semi-supervised Gene Regulatory Networks inference"
  year: 2024

GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2