helical-0.0.1a25
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (18.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: izumiando
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Size: 1.77 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
What is Helical?
Helical provides a framework for state-of-the-art pre-trained bio foundation models covering genomics and transcriptomics modalities.
Helical simplifies the entire application lifecycle when building with bio foundation models. You will be able to:
- Leverage the latest bio foundation models through our easy-to-use Python package
- Run example notebooks on key downstream tasks from the examples
We will update this repo on a regular basis with new models, benchmarks, modalities and functions - so stay tuned. Let’s build the most exciting AI-for-Bio community together!
What's new?
Evo2
We have integrated Evo2 into our helical package and added a model card for it in our Evo2 model folder. If you would like to test the model, take a look at our example notebook! Let us know what you think; we are happy to help you with the larger model (40B parameters!) if needed.
🧬 Introducing Helix-mRNA-v0: Unlocking new frontiers & use cases in mRNA therapy 🧬
We’re thrilled to announce the release of our first-ever mRNA Bio Foundation Model, designed to:
1) Be efficient, handling long sequence lengths effortlessly
2) Balance diversity & specificity, leveraging a 2-step pre-training approach
3) Deliver high resolution, using single nucleotides as the resolution
Check out our blog post to learn more about our approach and read the model card to get started.
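To make point 3 concrete: single-nucleotide resolution means each base (A, C, G, U) is its own token, rather than grouping bases into k-mers or codons. A minimal, purely illustrative sketch of such a tokenizer (the vocabulary and function here are hypothetical, not Helix-mRNA's actual implementation):

```python
# Hypothetical single-nucleotide vocabulary: one token per base.
# Illustrative only; not the actual Helix-mRNA tokenizer.
VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3}

def tokenize(seq: str) -> list[int]:
    """Map an mRNA sequence to one integer token per nucleotide."""
    return [VOCAB[base] for base in seq.upper()]

print(tokenize("AUGC"))  # one token per base: [0, 3, 2, 1]
```

Because nothing is merged into k-mers, the token sequence has exactly the same length as the input sequence, which is what makes per-nucleotide predictions possible.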
Installation
We recommend installing Helical within a conda environment; this step is optional. Run the commands below in your terminal:
conda create --name helical-package python=3.11.8
conda activate helical-package
To install the latest pip release of our Helical package, you can run the command below:
pip install helical
To install the latest development version of Helical directly from GitHub, you can run the command below:
pip install --upgrade git+https://github.com/helicalAI/helical.git
Alternatively, clone the repo and install it from the local checkout:
git clone https://github.com/helicalAI/helical.git
cd helical
pip install .
[Optional] To install mamba-ssm and causal-conv1d use the command below:
pip install helical[mamba-ssm]
or, if you are installing from a locally cloned Helical repo:
pip install .[mamba-ssm]
Note:
- Make sure your machine has GPU(s) and CUDA installed. Currently this is a requirement for the packages mamba-ssm and causal-conv1d.
- The package causal-conv1d requires torch to be installed already. Installing helical first (without [mamba-ssm]) will install torch for you; a second installation (with [mamba-ssm]) then installs both packages correctly.
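Since mamba-ssm and causal-conv1d need an NVIDIA GPU with CUDA, it can save a failed build to check for a driver before attempting the extra install. A small stdlib-only sketch (this heuristic check is our suggestion, not part of the Helical package):

```python
import shutil

def cuda_gpu_likely_available() -> bool:
    """Heuristic: nvidia-smi on PATH suggests an NVIDIA GPU with drivers installed."""
    return shutil.which("nvidia-smi") is not None

if cuda_gpu_likely_available():
    print("GPU detected: pip install helical[mamba-ssm] should work.")
else:
    print("No NVIDIA GPU detected: use plain pip install helical instead.")
```

This only checks for the driver tooling, not CUDA toolkit versions; `nvidia-smi` also reports the maximum CUDA version the driver supports.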
Singularity (Optional)
If you want to run your code in a Singularity container, you can use the singularity.def file and build an Apptainer sandbox with it:
apptainer build --sandbox singularity/helical singularity.def
and then shell into the sandbox container (use the --nv flag if you have a GPU available):
apptainer shell --nv --fakeroot singularity/helical/
RNA models:
DNA models:
Demo & Use Cases
To run examples, be sure to have installed the Helical package (see Installation) and that it is up-to-date.
You can look directly into the examples folder above and download the script of your choice, consult our documentation for step-by-step guides, or clone the repository directly using:
git clone https://github.com/helicalAI/helical.git
Within the examples/notebooks folder, open the notebook of your choice. We recommend starting with Quick-Start-Tutorial.ipynb.
Current Examples:
| Example | Description | Colab |
| --- | --- | --- |
| Quick-Start-Tutorial.ipynb | A tutorial to quickly get used to the helical package and environment. | |
| Helix-mRNA.ipynb | An example of how to use the Helix-mRNA model. | |
| Geneformer-vs-UCE.ipynb | Zero-shot reference mapping with Geneformer & UCE, comparing the outcomes. | |
| Hyena-DNA-Inference.ipynb | An example of how to do probing with HyenaDNA by training a neural network on 18 downstream classification tasks. | |
| Cell-Type-Annotation.ipynb | An example of how to do probing with scGPT by training a neural network to predict cell type annotations. | |
| Cell-Type-Classification-Fine-Tuning.ipynb | An example of how to fine-tune different models on classification tasks. | |
| HyenaDNA-Fine-Tuning.ipynb | An example of how to fine-tune the HyenaDNA model on downstream benchmarks. | |
| Cell-Gene-Cls-embedding-generation.ipynb | A notebook explaining the different embedding modes of single-cell RNA models. | |
Stuck somewhere? Other ideas?
We are eager to help you and to hear from you. Reach out via support@helical-ai.com. You can also open GitHub issues here.
Why should I use Helical & what to expect in the future?
If you are working (or plan to work) with bio foundation models such as Geneformer or UCE on RNA and DNA data, Helical will be your best buddy! We provide and keep improving:
- An up-to-date model library
- A unified API for all models
- User-facing abstractions tailored to computational biologists, researchers & AI developers
- Innovative use case and application examples and ideas
- Efficient data processing & code-base
We will continuously upload the latest models, publish benchmarks, and make our code more efficient.
Acknowledgements
Many of our models have been published by talented authors developing these exciting technologies. We sincerely thank the authors of the following open-source projects:
Licenses
You can find the Licenses for each model implementation in the model repositories:
Citation
Please use this BibTeX to cite this repository in your publications:
@software{allard_2024_13135902,
author = {Helical Team},
title = {helicalAI/helical: v0.0.1a14},
month = nov,
year = 2024,
publisher = {Zenodo},
version = {0.0.1a14},
doi = {10.5281/zenodo.13135902},
url = {https://doi.org/10.5281/zenodo.13135902}
}
Owner
- Login: izumiando
- Kind: user
- Repositories: 1
- Profile: https://github.com/izumiando
Citation (CITATION.bib)
@software{allard_2024_13135902,
author = {Helical Team},
title = {helicalAI/helical: v0.0.1a14},
month = nov,
year = 2024,
publisher = {Zenodo},
version = {0.0.1a14},
doi = {10.5281/zenodo.13135902},
url = {https://doi.org/10.5281/zenodo.13135902}
}
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2
Dependencies
- actions/checkout v2 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- actions/checkout v2 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- pytorch/pytorch 2.5.0-cuda12.4-cudnn9-runtime build
- pytorch/pytorch 2.6.0-cuda12.4-cudnn9-devel build
- mkdocs ==1.6.1
- mkdocs-jupyter ==0.25.1
- mkdocs-material ==9.5.44
- mkdocstrings-python ==1.12.2
- accelerate ==1.4.0
- anndata ==0.11
- azure-core ==1.30.1
- azure-identity ==1.16.1
- azure-storage-blob ==12.19.1
- datasets ==2.20.0
- einops ==0.8.0
- gitpython ==3.1.43
- hydra-core ==1.3.2
- loompy ==3.0.7
- louvain ==0.8.2
- numpy ==1.26.4
- omegaconf ==2.3.0
- pandas ==2.2.2
- pyensembl *
- requests ==2.32.2
- scib ==1.1.5
- scikit-learn >=1.5.0
- scikit-misc ==0.3.1
- scipy ==1.13.1
- torch ==2.5.1
- torchvision ==0.20.1
- transformers ==4.49.0
- nbmake ==1.5.4 development
- pytest ==8.2.0 development
- pytest-cov ==5.0.0 development
- pytest-mock ==3.14.0 development
- accelerate ==1.4.0
- anndata ==0.10.7
- azure-core ==1.30.1
- azure-identity ==1.16.1
- azure-storage-blob ==12.19.1
- biopython ==1.85
- causal-conv1d ==1.5.0.post8
- datasets ==2.14.7
- einops ==0.8.0
- gitpython ==3.1.43
- hydra-core ==1.3.2
- loompy ==3.0.7
- louvain ==0.8.2
- mamba-ssm ==2.2.2
- numpy ==1.26.4
- omegaconf ==2.3.0
- pandas ==2.2.2
- pyensembl ==2.3.13
- requests ==2.32.2
- scib ==1.1.5
- scikit-learn ==1.5.0
- scikit-misc ==0.3.1
- scipy ==1.13.1
- torch ==2.6.0
- torchvision ==0.21.0
- transformers ==4.49.0