https://github.com/bioinfomachinelearning/grnformer
Transformer models for predicting gene regulatory networks from omics data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BioinfoMachineLearning
- License: mit
- Language: Python
- Default Branch: main
- Size: 254 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
GRNFormer - Accurate Gene Regulatory Network Inference Using Graph Transformer
GRNFormer is an advanced variational graph transformer autoencoder model designed to accurately infer regulatory relationships between transcription factors and target genes from single-cell RNA-seq transcriptomics data, while supporting generalization across species and cell types.

GRNFormer consists of three main novel designs:

1. TF-Walker: a de novo transcription factor (TF)-centered subgraph sampling method that extracts the local (neighborhood) co-expression of a TF to facilitate GRN inference.
2. End-to-end learning:
   - Gene-Transcoder: a transformer encoder representation module for encoding single-cell RNA-seq (scRNA-seq) gene expression data across different species and cell types.
   - A graph transformer model with a GRNFormer encoder and a variational GRNFormer decoder, coupled with a GRN inference module for the reconstruction of GRNs.
3. A novel inference strategy that incorporates both node features and edge features to infer GRNs for gene expression data of any length.

Given an scRNA-seq dataset, a gene co-expression network is first constructed, from which a set of subgraphs is sampled by TF-Walker. The subgraphs are processed by Gene-Transcoder to generate node and edge embeddings, which are fed to the variational graph transformer autoencoder to learn a GRN representation. This representation is used to infer a gene regulatory subnetwork for each subgraph, and the subnetworks are aggregated to construct the full GRN.
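The data flow described above (co-expression network, TF-centered subgraphs, per-subgraph scoring, aggregation) can be illustrated with a toy sketch. This is not the GRNFormer implementation: the Pearson-correlation network, the "strongest neighbors" sampling rule, the correlation threshold, and the use of absolute correlation in place of the learned model are all assumptions chosen only to make the steps concrete, and every function name here is hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_network(expr, threshold=0.5):
    """Weighted undirected graph: two genes are linked when |r| >= threshold."""
    genes = sorted(expr)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                graph[g][h] = r
                graph[h][g] = r
    return graph


def tf_subgraph(graph, tf, max_neighbors=2):
    """Nodes of a TF-centered subgraph: the TF plus its strongest neighbors.

    Assumption: a stand-in for TF-Walker, whose sampling rule is not
    described in this README.
    """
    ranked = sorted(graph[tf], key=lambda g: (-abs(graph[tf][g]), g))
    return [tf] + ranked[:max_neighbors]


def aggregate(subnetworks):
    """Average each edge score over the subnetworks that contain the edge."""
    scores = defaultdict(list)
    for net in subnetworks:
        for edge, score in net.items():
            scores[edge].append(score)
    return {edge: sum(v) / len(v) for edge, v in scores.items()}


# Toy expression matrix: gene -> expression across four cells.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.1],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [1.0, 0.0, 1.0, 0.0],
}
graph = coexpression_network(expr)
nodes = tf_subgraph(graph, "TF1")
# Stand-in for the learned model: score TF -> target edges by |correlation|.
subnet = {("TF1", g): abs(graph["TF1"][g]) for g in nodes[1:]}
full_grn = aggregate([subnet])
```

In the real model, the scoring step is performed by the trained encoder and decoder rather than by a correlation value; only the overall shape of the pipeline carries over.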
The repository contains code and scripts to create datasets, train the model, evaluate it, and infer gene regulatory networks.
Installation
To use this repository, clone it into the desired folder on your system:
```
git clone https://github.com/BioinfoMachineLearning/GRNformer.git
```
Set up the conda environment and install the necessary packages using the setup.sh script:
cd GRNformer
./setup.sh
Usage
Run GRNFormer inference on a sample gene expression file:
```
python infergrn.py --expfile /path/to/expression-file.csv --tffile /path/to/lisoftfs.csv --outputfile /path/to/predicted-edges.csv
```
Run GRNFormer with evaluation when a ground-truth network is available:
```
python evalgrn.py --expfile /path/to/expression-file.csv --tffile /path/to/lisoftfs.csv --netfile /path/to/ground-truth-network.csv --output_file /path/to/predicted-edges.csv
```
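To run inference on several expression files, the command above can be wrapped in a small helper. The sketch below only assembles the `infergrn.py` command line from the flags shown in this README (`--expfile`, `--tffile`, `--outputfile`); the helper names, the output naming scheme, and the assumption that the script is run from the repository root are illustrative and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(expfile, tffile, outputfile, script="infergrn.py"):
    """Return the argument list for one infergrn.py run."""
    return [
        sys.executable, script,
        "--expfile", str(expfile),
        "--tffile", str(tffile),
        "--outputfile", str(outputfile),
    ]


def run_batch(expression_dir, tffile, output_dir, dry_run=True):
    """Build (and optionally run) one inference command per CSV file."""
    tf_path = Path(tffile)
    output_dir = Path(output_dir)
    commands = []
    for expfile in sorted(Path(expression_dir).glob("*.csv")):
        # Skip the TF list if it lives in the same directory.
        if expfile.resolve() == tf_path.resolve():
            continue
        # Hypothetical naming scheme for the per-file outputs.
        out = output_dir / f"{expfile.stem}-predicted-edges.csv"
        cmd = build_infer_command(expfile, tf_path, out)
        commands.append(cmd)
        if not dry_run:
            output_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) the helper only returns the commands, which makes it easy to inspect them before launching anything.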
Evaluate model on test datasets
Download the BEELINE scRNA-seq dataset using the script below:
```
python scripts/collectdata.py --datadir ./Data/scRNA-seq/
```
The downloaded datasets are placed in Data/scRNA-seq/ and the networks in Data/scRNA-seq-Networks/.
Run the evaluation pipeline on the test datasets, including the creation of all subsets:
```
python evaluationpipeline.py --datasetfile Data/mESC.csv --output_dir ./outputs/evluation
```
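This README does not state which metrics the evaluation pipeline reports. As a hedged illustration of how a ranked list of predicted edges is commonly compared with a ground-truth network in GRN benchmarking, the sketch below computes average precision (an AUPRC estimate) and AUROC in plain Python. The data layout (a dict of edge scores and a set of true edges) is an assumption, and ties in average precision are broken by edge name rather than grouped by threshold.

```python
def average_precision(scored_edges, true_edges):
    """Average precision of a ranked edge list.

    scored_edges: dict mapping (regulator, target) -> score.
    true_edges: set of (regulator, target) pairs in the reference network.
    Only true edges that appear among the scored candidates are counted.
    """
    ranked = sorted(scored_edges, key=lambda e: (-scored_edges[e], e))
    hits, total = 0, 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits if hits else 0.0


def auroc(scored_edges, true_edges):
    """Probability that a true edge outscores a false one (ties count half)."""
    pos = [s for e, s in scored_edges.items() if e in true_edges]
    neg = [s for e, s in scored_edges.items() if e not in true_edges]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four candidate edges, two of them in the reference network.
scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.3, ("C", "A"): 0.1}
truth = {("A", "B"), ("B", "C")}
print(round(average_precision(scores, truth), 3))  # 0.833
print(auroc(scores, truth))  # 0.75
```

The pairwise AUROC loop is quadratic in the number of candidate edges, so it is suitable for small examples only; a rank-based computation would be preferable for genome-scale edge lists.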
Build the model from scratch
Download the BEELINE scRNA-seq dataset using the script below:
```
python scripts/collectdata.py --datadir ./Data/scRNA-seq/
```
Note: Before beginning training, it is convenient to copy all the regulatory networks (Non-specific-Chip-seq-network.csv, STRING-network.csv, [cell-type]-Chip-seq-network.csv) and the TFs.csv file into the corresponding cell-type dataset folder ./Data/scRNA-seq/[cell-type].
For generalization training, GRNFormer combines all the networks for each training dataset:
python dataset_combiner.py --cell-type-network ./Data/scRNA-seq/hESC/hESC-Chip-seq-network.csv --non-specific-network ./Data/scRNA-seq/hESC/Non-specific-Chip-seq-network.csv --string-network ./Data/scRNA-seq/hESC/STRING-network.csv --output-file ./Data/scRNA-seq/hESC/hESC-combined.csv
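The internals of `dataset_combiner.py` are not shown in this README. A minimal sketch of one plausible reading of "combines all the networks" is a de-duplicated union of the edges from the three files. The column names `Gene1` and `Gene2`, the union semantics, and the function names are assumptions made for illustration.

```python
import csv


def read_edges(path, source_col="Gene1", target_col="Gene2"):
    """Read (regulator, target) pairs from a network CSV.

    Assumption: the file has a header row with these two column names.
    """
    with open(path, newline="") as handle:
        return [(row[source_col], row[target_col])
                for row in csv.DictReader(handle)]


def combine_networks(paths, output_path):
    """Write the de-duplicated union of the edges from several network files."""
    seen = set()
    combined = []
    for path in paths:
        for edge in read_edges(path):
            if edge not in seen:  # keep the first occurrence, preserve order
                seen.add(edge)
                combined.append(edge)
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Gene1", "Gene2"])
        writer.writerows(combined)
    return combined
```

Edges are treated as directed here, so `(TF1, G1)` and `(G1, TF1)` are kept as distinct entries.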
Create the dataset and the splits for training, validation, and testing:
```
python createdataset.py --datasetdir ./Data/sc-RNAseq --datasetname ./Data/trainlist.csv
```
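How `createdataset.py` partitions the data is not documented here. As a generic illustration of a reproducible train/validation/test split, the sketch below shuffles a list of items with a fixed seed and slices it. The 80/10/10 proportions, the seed, and the choice of what an "item" is (for example an edge or a TF-centered subgraph) are assumptions, not the repository's behavior.

```python
import random


def split_items(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items with a fixed seed and split into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG, global state untouched
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


train, val, test = split_items(range(20))
print(len(train), len(val), len(test))  # 16 2 2
```

Using a dedicated `random.Random(seed)` instance keeps the split deterministic across runs without affecting other code that relies on the global random state.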
Train the model from scratch:
```
python main.py fit --config/grnformer.yaml
```
Datasets Availability
- BEELINE: https://zenodo.org/records/3701939
- DREAM5: https://www.synapse.org/Synapse:syn2787209/wiki/70351
- PBMC3k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc3k (preprocessed PBMC data can also be accessed through the scanpy Python package)
Cite Us
@article {Hegde2025.01.26.634966,
  author = {Hegde, Akshata and Cheng, Jianlin},
  title = {GRNFormer: Accurate Gene Regulatory Network Inference Using Graph Transformer},
  elocation-id = {2025.01.26.634966},
  year = {2025},
  doi = {10.1101/2025.01.26.634966},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/01/27/2025.01.26.634966},
  eprint = {https://www.biorxiv.org/content/early/2025/01/27/2025.01.26.634966.full.pdf},
  journal = {bioRxiv}
}
Owner
- Name: BioinfoMachineLearning
- Login: BioinfoMachineLearning
- Kind: organization
- Repositories: 29
- Profile: https://github.com/BioinfoMachineLearning
GitHub Events
Total
- Issues event: 1
- Watch event: 6
- Issue comment event: 3
- Push event: 8
- Public event: 1
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 6
- Issue comment event: 3
- Push event: 8
- Public event: 1
- Fork event: 1
Dependencies
- aiohappyeyeballs ==2.4.3
- aiohttp ==3.10.8
- aiosignal ==1.3.1
- annotated-types ==0.7.0
- anyio ==4.6.0
- arrow ==1.3.0
- attrs ==24.2.0
- beautifulsoup4 ==4.12.3
- blessed ==1.20.0
- boto3 ==1.35.32
- botocore ==1.35.32
- click ==8.1.7
- contourpy ==1.3.0
- croniter ==1.3.15
- cycler ==0.12.1
- deepdiff ==8.0.1
- docker-pycreds ==0.4.0
- docstring-parser ==0.16
- editor ==1.6.6
- einops ==0.8.0
- fastapi ==0.115.0
- fonttools ==4.54.1
- frozenlist ==1.4.1
- gitdb ==4.0.11
- gitpython ==3.1.43
- h11 ==0.14.0
- importlib-resources ==6.4.5
- inquirer ==3.4.0
- itsdangerous ==2.2.0
- jmespath ==1.0.1
- joblib ==1.4.2
- jsonargparse ==4.33.2
- kiwisolver ==1.4.7
- lightning ==1.8.6
- lightning-cloud ==0.5.70
- markdown-it-py ==3.0.0
- matplotlib ==3.9.2
- mdurl ==0.1.2
- multidict ==6.1.0
- numpy ==1.23.5
- orderly-set ==5.2.2
- pandas ==2.2.3
- platformdirs ==4.3.6
- protobuf ==5.28.2
- psutil ==6.0.0
- pydantic ==1.10.2
- pydantic-core ==2.23.4
- pygments ==2.18.0
- pyjwt ==2.9.0
- pyparsing ==3.1.4
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-multipart ==0.0.12
- pytz ==2024.2
- readchar ==4.2.0
- rich ==13.9.1
- rotary-embedding-torch ==0.8.4
- runs ==1.2.2
- s3transfer ==0.10.2
- scikit-learn ==1.5.2
- scipy ==1.14.1
- seaborn ==0.13.2
- sentry-sdk ==2.15.0
- setproctitle ==1.3.3
- six ==1.16.0
- smmap ==5.0.1
- sniffio ==1.3.1
- soupsieve ==2.6
- starlette ==0.38.6
- starsessions ==1.3.0
- tensorboardx ==2.6.2.2
- threadpoolctl ==3.5.0
- torch-geometric ==2.6.1
- traitlets ==5.14.3
- types-python-dateutil ==2.9.0.20241003
- typeshed-client ==2.7.0
- tzdata ==2024.2
- uvicorn ==0.31.0
- wandb ==0.18.3
- wcwidth ==0.2.13
- websocket-client ==1.8.0
- websockets ==13.1
- xmod ==1.8.1
- yarl ==1.13.1