Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Fundacion-de-Neurociencias
- License: mit
- Language: Python
- Default Branch: main
- Size: 13.8 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
GeneForgeLang (GFL) v1.0.0 🧬
A powerful Domain-Specific Language (DSL) for genomic workflows and bioinformatics applications with AI-powered analysis capabilities.
GeneForgeLang (GFL) is a comprehensive framework for specifying, validating, and executing genomic workflows. It combines the simplicity of YAML-like syntax with advanced features like AI-powered inference, plugin extensibility, and web-based interfaces.
✨ Key Features
- 🔬 Genomic Workflow Specification - Declarative YAML-like syntax for complex genomic experiments
- 🤖 AI-Powered Analysis - Built-in inference engine with machine learning capabilities
- 🧪 Workflow Execution Engine - Execute design and optimize blocks with intelligent plugin dispatch
- 🔌 Advanced Plugin System - Extensible interfaces for generators, optimizers, and AI models
- 🌐 Web Interface - Modern web platform for interactive workflow creation and execution
- ⚡ High Performance - Optimized for large-scale genomic data processing with intelligent caching
- 🔒 Secure & Robust - Comprehensive security features and error handling
🚀 Quick Start
Installation
```bash
# Basic installation
pip install -e .

# With all features
pip install -e .[full]

# Optional extras
pip install -e .[apps]    # Demo applications with Gradio
pip install -e .[ml]      # Machine learning capabilities
pip install -e .[server]  # Web server and API
```
Your First GFL Workflow
```python
from gfl.api import parse, validate, execute

# Define a protein design workflow with AI-powered generation
workflow = """
metadata:
  experimentid: PROTEINDESIGN001
  researcher: Dr. Jane Smith
  project: therapeuticproteins

design:
  entity: ProteinSequence
  model: ProteinVAEGenerator
  objective:
    maximize: stability
    target: therapeuticprotein
  constraints:
    - length(50, 150)
    - synthesizability > 0.8
    - stability_score > 0.7
  count: 10
  output: designed_proteins

optimize:
  search_space:
    temperature: range(25, 42)
    concentration: range(10, 100)
  strategy:
    name: BayesianOptimization
  objective:
    maximize: expressionlevel
  budget:
    max_experiments: 25
  run:
    experiment:
      tool: proteinexpression
      type: validation
      params:
        proteins: designed_proteins
        temp: ${temperature}
        conc: ${concentration}
"""

# Parse, validate, and execute
ast = parse(workflow)
errors = validate(ast)
print(f"Validation: {'✅ Passed' if not errors else '❌ Failed'}")

# Execute complete workflow with plugin dispatch
result = execute(ast)
print(f"Generated {result['design']['count']} protein candidates")
print(f"Best experimental conditions: {result['optimize']['best_parameters']}")
```
📚 Documentation
🌐 Complete Documentation - Full user guide, tutorials, and API reference
Quick Links
- 🚀 Getting Started - Installation and setup guide
- 🎯 Tutorial - Step-by-step learning guide
- 🔧 API Reference - Complete API documentation
- 🌐 Web Platform - Web interface guide
- 🤖 AI Features - Machine learning capabilities
- 🔒 Security - Security guidelines and best practices
- 🔌 Plugin Ecosystem - Advanced plugin system and workflow execution
- 🎯 Language Features - Design and optimize block documentation
- 🧪 Workflow Examples - Complete workflow examples with AI integration
🧪 Advanced AI-Driven Workflows
GeneForgeLang now supports intelligent experimental design with AI-powered plugins:
Design Block - Biological Entity Generation
```yaml
design:
  entity: ProteinSequence        # or DNA, RNA, SmallMolecule
  model: ProteinVAEGenerator     # AI plugin for generation
  objective:
    maximize: binding_affinity
    target: SARS_CoV2_RBD
  constraints:
    - length(100, 200)
    - synthesizability > 0.8
    - stability_score > 0.7
  count: 50
  output: therapeutic_candidates
```
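To make the constraint list concrete, here is a minimal Python sketch of how generated candidates could be filtered against the three constraints in the design block. It is not GFL's implementation: the `Candidate` record, the `satisfies_constraints` helper, the example scores, and the choice to treat `length(100, 200)` as an inclusive range are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record: one generated sequence with its predicted scores."""
    sequence: str
    synthesizability: float
    stability_score: float


def satisfies_constraints(c: Candidate) -> bool:
    """Mirror the three constraints from the design block (bounds assumed inclusive)."""
    return (
        100 <= len(c.sequence) <= 200   # length(100, 200)
        and c.synthesizability > 0.8    # synthesizability > 0.8
        and c.stability_score > 0.7     # stability_score > 0.7
    )


# Made-up candidates, chosen so that exactly one passes every constraint.
candidates = [
    Candidate("M" * 120, 0.91, 0.75),  # passes all three
    Candidate("M" * 80, 0.95, 0.90),   # too short
    Candidate("M" * 150, 0.60, 0.85),  # synthesizability too low
]
accepted = [c for c in candidates if satisfies_constraints(c)]
print(f"{len(accepted)} of {len(candidates)} candidates accepted")
```

In a real run the scores would come from the generator plugin or a downstream predictor rather than being hard-coded.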
Optimize Block - Intelligent Parameter Search
```yaml
optimize:
  search_space:
    temperature: range(25, 42)      # Continuous parameters
    duration: choice([6, 12, 24])   # Discrete choices
    concentration: range(10, 100)
  strategy:
    name: BayesianOptimization      # AI optimization strategy
    uncertainty_metric: entropy
  objective:
    maximize: editing_efficiency
  budget:
    max_experiments: 100
    max_time: 48h
  run:
    experiment:
      tool: CRISPR_cas9
      params:
        temp: ${temperature}        # Parameter injection
        conc: ${concentration}
        dur: ${duration}h
```
Key Features:
- ✨ AI-Powered Generation - VAE, GAN, Transformer models for biological design
- 🤖 Intelligent Optimization - Bayesian, evolutionary, and reinforcement learning
- 🔄 Parameter Injection - Dynamic parameter substitution with ${...} syntax
- 🔗 Workflow Integration - Seamless combination of design and optimization
- 📊 Real-time Monitoring - Live tracking of experimental campaigns
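The `${...}` parameter injection can be illustrated with Python's standard `string.Template`, which happens to use the same placeholder syntax. This is only a sketch of the idea, not GFL's own substitution code; the `inject` helper and the trial values are hypothetical.

```python
from string import Template


def inject(params: dict, values: dict) -> dict:
    """Substitute ${name} placeholders in every string value of `params`."""
    return {
        key: Template(val).substitute(values) if isinstance(val, str) else val
        for key, val in params.items()
    }


# The params section of the optimize block, written as a plain dict.
run_params = {
    "temp": "${temperature}",
    "conc": "${concentration}",
    "dur": "${duration}h",
}

# One point chosen from the search space (values are illustrative).
trial = {"temperature": 37, "concentration": 50, "duration": 12}

resolved = inject(run_params, trial)
print(resolved)
```

`Template.substitute` raises `KeyError` when a placeholder has no matching value, which is a reasonable default for catching a misspelled parameter name before an experiment runs.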
🎉 GFL v1.0.0 Release Highlights
GeneForgeLang v1.0.0 introduces major enhancements that make it the most powerful and extensible version yet:
Advanced AI Workflow Syntax
- Active Learning Optimization: Enhanced optimize blocks with Active Learning strategy support
- Inverse Design: Extended design blocks for inverse design workflows
- Data Refinement: New refine_data blocks for data processing workflows
- Guided Discovery: New guided_discovery blocks that combine design and optimization
IO Contracts System
- Data Integrity: IO contracts ensure data compatibility between workflow blocks
- Static Validation: Compile-time checking of data flow between blocks
- Type Safety: Strong typing for genomic data with built-in validation
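As a rough illustration of static data-flow checking, the sketch below walks a list of blocks in order and reports any input that no earlier block produces, or whose declared type differs. The `inputs`/`outputs` field names, the type strings, and the `check_data_flow` helper are assumptions for this example; the README does not show how GFL declares IO contracts.

```python
def check_data_flow(blocks):
    """Return error strings for inputs that are missing or mistyped.

    Each block is a dict with a 'name' and optional 'inputs'/'outputs'
    mappings of variable name -> declared type (hypothetical layout).
    """
    available = {}  # variable name -> declared type, from earlier blocks
    errors = []
    for block in blocks:
        for name, expected in block.get("inputs", {}).items():
            if name not in available:
                errors.append(
                    f"{block['name']}: input '{name}' is not produced by an earlier block"
                )
            elif available[name] != expected:
                errors.append(
                    f"{block['name']}: input '{name}' expects {expected}, "
                    f"got {available[name]}"
                )
        available.update(block.get("outputs", {}))
    return errors


workflow = [
    {"name": "design", "outputs": {"designed_proteins": "ProteinSequence[]"}},
    {"name": "optimize", "inputs": {"designed_proteins": "ProteinSequence[]"}},
    {"name": "report", "inputs": {"best_parameters": "ParameterSet"}},
]
errors = check_data_flow(workflow)
for message in errors:
    print(message)
```

Because the check only reads declarations, it can run before any plugin is invoked, which is the point of validating data flow ahead of execution.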
Type System & Schema Registry
- Extensible Types: Define custom data types in external schema files
- Schema Imports: Import type definitions with the `import_schemas` directive
- Custom Validation: Validate data against user-defined schemas
🌍 Industrial & Research Applications
🧬 Genomics Research
- CRISPR Design - Automated guide RNA design and off-target prediction
- RNA-seq Analysis - Differential expression and pathway analysis workflows
- Variant Analysis - SNP/INDEL interpretation and clinical annotation
- Protein Studies - Structure prediction and interaction analysis
🏥 Clinical Applications
- Diagnostic Pipelines - Automated variant interpretation workflows
- Pharmacogenomics - Drug response prediction based on genetic profiles
- Cancer Genomics - Somatic mutation analysis and treatment recommendations
- Rare Disease - Comprehensive genomic analysis for rare disorders
🌱 Agricultural & Industrial
- Crop Improvement - Gene editing workflows for enhanced traits
- Bioengineering - Synthetic biology pipeline automation
- Quality Control - Genomic validation and testing workflows
📦 Core Components
🔌 Advanced Plugin System
- Generator Plugins - AI models for biological entity creation (proteins, DNA, molecules)
- Optimizer Plugins - Intelligent algorithms for parameter space exploration
- Prior Plugins - Bayesian integration for enhanced experimental design
- Plugin Registry - Automatic discovery and lifecycle management
- Extensible Interfaces - Standard contracts for seamless integration
🧪 Workflow Execution Engine
- Design Block Execution - Automated dispatch to appropriate AI generators
- Optimize Block Execution - Intelligent experimental loops with parameter injection
- State Management - Persistent workflow variables and execution history
- Error Recovery - Comprehensive error handling and recovery mechanisms
- Real-time Monitoring - Live tracking of workflow execution progress
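The experimental loop behind an optimize block can be sketched as: sample a parameter set from the search space, run the experiment, record the result, and stop when the budget is spent. The example below uses plain random search rather than Bayesian optimization, and `run_experiment` is a made-up objective standing in for a real tool call. None of these function names come from GFL.

```python
import random


def sample(search_space, rng):
    """Draw one parameter set: (lo, hi) tuples are continuous ranges, lists are discrete choices."""
    point = {}
    for name, spec in search_space.items():
        if isinstance(spec, tuple):
            point[name] = rng.uniform(*spec)
        else:
            point[name] = rng.choice(spec)
    return point


def run_experiment(params):
    """Stand-in objective (hypothetical): best near 37 degrees and 24 hours."""
    return -abs(params["temperature"] - 37.0) - abs(params["duration"] - 24) / 10.0


def optimize(search_space, max_experiments, seed=0):
    """Random search under a fixed experiment budget; returns best params, best score, history."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(max_experiments):
        params = sample(search_space, rng)
        score = run_experiment(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


space = {
    "temperature": (25, 42),      # range(25, 42)
    "duration": [6, 12, 24],      # choice([6, 12, 24])
    "concentration": (10, 100),   # range(10, 100)
}
best_params, best_score, history = optimize(space, max_experiments=100)
print(best_params, best_score)
```

A Bayesian strategy would replace `sample` with a model-guided proposal step; the budget check and history bookkeeping would stay the same.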
🔭 Language Core
- Parser - YAML-like DSL with stable, JSON-serializable AST
- Validator - Semantic validation with customizable rules
- Interpreter - Efficient AST execution with plugin support
- Type System - Strong typing for genomic entities and operations
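A JSON-serializable AST means the parsed workflow can be stored, diffed, or sent over an API without custom encoders. The snippet below shows the round trip on a hand-written dictionary; the exact shape of the AST returned by `parse` is not shown in this README, so the structure here is an assumption modeled on the design block.

```python
import json

# Hand-written stand-in for a parsed design block (assumed shape).
ast = {
    "design": {
        "entity": "ProteinSequence",
        "model": "ProteinVAEGenerator",
        "constraints": ["length(100, 200)", "synthesizability > 0.8"],
        "count": 50,
        "output": "therapeutic_candidates",
    }
}

# Serialize with sorted keys so two equal ASTs produce identical text.
serialized = json.dumps(ast, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(restored == ast)
```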
🤖 AI & Machine Learning
- Inference Engine - Built-in ML models for genomic prediction
- Natural Language - Convert English descriptions to GFL workflows
- Model Integration - Support for custom models and external APIs
- Probabilistic Reasoning - Likelihood-based decision making
🌐 Web Platform
- Interactive Interface - Modern web UI for workflow creation
- REST API - Complete RESTful API for programmatic access
- Real-time Execution - Live workflow execution and monitoring
- Collaboration Tools - Share and collaborate on workflows
🔌 Extension System
- Advanced Plugin Interfaces - GeneratorPlugin, OptimizerPlugin, PriorsPlugin
- Intelligent Dispatch - Automatic plugin discovery and execution
- Plugin Ecosystem - Community-driven plugin development and sharing
- Dependency Management - Automatic dependency resolution and validation
- Lifecycle Hooks - Plugin loading, activation, and cleanup events
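The relationship between a plugin interface, a registry, and dispatch by model name can be sketched in a few lines. The classes below are toy stand-ins written for this example; the real interfaces live in `gfl/plugins/interfaces.py` and may look quite different.

```python
from abc import ABC, abstractmethod


class GeneratorPluginSketch(ABC):
    """Toy stand-in for a generator interface (not GFL's actual class)."""
    name: str

    @abstractmethod
    def generate(self, count: int) -> list:
        """Return `count` generated sequences."""


class PluginRegistrySketch:
    """Maps a model name, as written in a design block, to a plugin instance."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def dispatch(self, model, count):
        if model not in self._plugins:
            raise KeyError(f"no plugin registered for model '{model}'")
        return self._plugins[model].generate(count)


class PolyAlanineGenerator(GeneratorPluginSketch):
    """Trivial generator that returns fixed poly-alanine sequences."""
    name = "PolyAlanineGenerator"

    def generate(self, count):
        return ["A" * 10 for _ in range(count)]


registry = PluginRegistrySketch()
registry.register(PolyAlanineGenerator())
sequences = registry.dispatch("PolyAlanineGenerator", 3)
print(sequences)
```

Keeping dispatch keyed on the `model` string is what lets a workflow file name a generator without importing it.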
🔧 CLI Tools
GeneForgeLang provides powerful command-line tools for workflow management:
```
# Parse and validate workflows
gfl-parse workflow.gfl
gfl-validate workflow.gfl

# Execute complete workflows with AI plugins
gfl-execute workflow.gfl
gfl-plugins --list

# Run inference and analysis
gfl-inference workflow.gfl
gfl-enhanced workflow.gfl

# Start web server and API
gfl-server --port 8000
gfl-api --host 0.0.0.0

# Launch web interface
gfl-web

# Get system information
gfl-info
```
🌐 Web Applications
Interactive Translator
Convert natural language descriptions to GFL workflows:
```bash
python applications/translator_app/app.py
```
Features:
- 🗣️ Natural language to GFL conversion
- ✅ Real-time validation and syntax checking
- 🤖 AI-powered workflow optimization
- 📊 Interactive visualization and analysis
Web Platform
Full-featured web interface for genomic workflow management:
```bash
gfl-web --port 8080
```
Access at: http://localhost:8080
📦 Repository Structure
GeneForgeLang/
├── gfl/ # Core library
│ ├── api.py # Public API with execute() function
│ ├── parser.py # YAML parser
│ ├── validator.py # Semantic validation
│ ├── execution_engine.py # NEW: Workflow execution engine
│ ├── inference_engine.py # AI inference
│ ├── web_interface.py # Web platform
│ └── plugins/ # NEW: Advanced plugin system
│ ├── interfaces.py # Plugin interface definitions
│ ├── example_implementations.py # Reference plugins
│ └── plugin_registry.py # Plugin discovery and management
├── applications/ # Demo applications
├── docs/ # Documentation source
│ ├── features/ # NEW: Feature-specific documentation
│ ├── PLUGIN_ECOSYSTEM.md # NEW: Plugin development guide
│ └── PHASE_3_PLUGIN_ECOSYSTEM_SUMMARY.md # NEW: Implementation summary
├── examples/ # Example workflows and projects
│ ├── gfl-genesis/ # Advanced example project
│ │ ├── genesis.gfl # Main workflow definition
│ │ ├── plugins/ # Custom plugins
│ │ ├── schemas/ # Schema definitions
│ │ └── docs/ # Project documentation
│ └── ... # Simple examples
├── tests/ # Test suite
│ ├── test_new_features.py # NEW: 24 regression tests
│ └── test_plugin_interfaces.py # NEW: Plugin interface tests
└── integrations/ # External integrations
🔒 Security & Quality
- ✅ Comprehensive Testing - 50+ tests including 24 new feature regression tests
- ✅ Plugin Ecosystem Testing - Complete test coverage for AI workflow execution
- 🔒 Security Scanning - Automated security analysis with Bandit
- 🧙 Code Quality - Enforced with Ruff, Black, and MyPy
- 🔄 Continuous Integration - Automated testing on multiple Python versions
- 📄 Documentation - Comprehensive docs with plugin ecosystem guides
🛣️ API Stability
- Public API - The `gfl.api` module provides a stable interface for all operations
- AST Format - Dictionary-based AST with guaranteed backward compatibility
- Plugin Interface - Well-defined plugin system for extending functionality
- Semantic Versioning - Clear versioning strategy for API changes
🚀 Performance
- Optimized Parsing - Fast YAML processing with minimal overhead
- Efficient Validation - Incremental validation with early error detection
- Scalable Execution - Support for large-scale genomic datasets
- Memory Efficient - Optimized memory usage for large workflows
🌍 Community & Support
- 📚 Documentation - Comprehensive user guides and API reference
- 🐛 Issues - Bug reports and feature requests
- 💬 Discussions - Community support and Q&A
- 🔄 Contributing - Guidelines for contributing to the project
🗺️ Roadmap
🔄 Current Version (v1.0.0)
- ✅ Core language implementation
- ✅ Web interface and API
- ✅ AI-powered inference engine
- ✅ Plugin system
- ✅ Comprehensive documentation
🔮 Upcoming Features
- 🔄 Enhanced ML Models - Advanced genomic prediction models
- 🔌 More Integrations - Support for popular bioinformatics tools
- 🌐 Cloud Deployment - Docker and Kubernetes support
- 📈 Analytics Dashboard - Workflow monitoring and metrics
- 🛠️ Visual Editor - Drag-and-drop workflow creation
🤝 Contributing
We welcome contributions from the genomics and bioinformatics community!
How to Contribute
- 🍿 Fork the repository
- 🌱 Create a feature branch (`git checkout -b feature/amazing-feature`)
- ✨ Make your changes with tests
- ✅ Test your changes (`pytest tests/`)
- 📝 Commit your changes (`git commit -m 'Add amazing feature'`)
- 🚀 Push to the branch (`git push origin feature/amazing-feature`)
- 🎉 Open a Pull Request
Read the Contributing Guide for detailed instructions.
Development Setup
```
# Clone the repository
git clone https://github.com/Fundacion-de-Neurociencias/GeneForgeLang.git
cd GeneForgeLang

# Install in development mode
pip install -e .[full]

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/
```
📜 Citation
If you use GeneForgeLang in your research, please cite:
@software{geneforgelang2025,
title={GeneForgeLang: A Domain-Specific Language for Genomic Workflows},
author={GeneForgeLang Development Team},
year={2025},
url={https://github.com/Fundacion-de-Neurociencias/GeneForgeLang},
version={0.1.0}
}
📚 Publications
Scientific Papers Using GeneForgeLang
- Accelerating Complex Genomic Design Tasks: AI-Guided gRNA Optimization for TP53 with GeneForgeLang. Menendez Gonzalez, M. (2025). Preprints. https://doi.org/10.20944/preprints202509.0193.v1
  This preprint demonstrates how GeneForgeLang was used to optimize guide RNA design for TP53 gene editing, showcasing the language's capabilities in real-world genomic research applications.
- GeneForgeLang (GFL): A Symbolic Language for Rational Bio-Design and Clinical Genomic Engineering. Fundación de Neurociencias. (2025). Zenodo. https://doi.org/10.5281/zenodo.15493559
  This whitepaper introduces GeneForgeLang as a symbolic language for representing, analyzing, and simulating biomolecular processes with clarity and logical reasoning, particularly suited for AI interaction and therapeutic prototyping.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🚀 Quick Links
| Resource | Link |
|----------|------|
| 📚 Documentation | fundacion-de-neurociencias.github.io/GeneForgeLang |
| 🐛 Issues | GitHub Issues |
| 💬 Discussions | GitHub Discussions |
| 🔄 CI/CD | GitHub Actions |
| 📈 Releases | GitHub Releases |
Owner
- Name: Fundación de Neurociencias
- Login: Fundacion-de-Neurociencias
- Kind: organization
- Email: admin@fneurociencias.org
- Location: Spain
- Website: fneurociencias.org
- Twitter: fneurociencias
- Repositories: 1
- Profile: https://github.com/Fundacion-de-Neurociencias
Fighting the effects of neurologic and psychiatric conditions
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
title: "GeneForgeLang: A Domain-Specific Language for Genomic Workflows"
authors:
- family-names: "Fundación de Neurociencias"
given-names: "Research Team"
email: "research@fundacion-neurociencias.org"
website: "https://fundacion-neurociencias.org"
repository-code: "https://github.com/Fundacion-de-Neurociencias/GeneForgeLang"
url: "https://fundacion-de-neurociencias.github.io/GeneForgeLang/"
abstract: >-
GeneForgeLang (GFL) is a domain-specific language designed for specifying,
validating, and reasoning about genomic workflows and experiments. It enables
structured representation of biological protocols using a YAML-like syntax,
facilitating automation, reproducibility, and integration with AI/ML models
in genomics research.
keywords:
- genomics
- domain-specific-language
- bioinformatics
- workflow-specification
- YAML
- machine-learning
- probabilistic-reasoning
- CRISPR
- gene-editing
license: MIT
version: "1.0.0"
date-released: "2024-08-31"
preferred-citation:
type: article
title: "GeneForgeLang: A Domain-Specific Language for Reproducible Genomic Workflows with AI Integration"
authors:
- family-names: "Fundación de Neurociencias"
given-names: "Research Team"
year: 2024
journal: "bioRxiv"
doi: "10.1101/2024.08.31.geneforgelang"
url: "https://github.com/Fundacion-de-Neurociencias/GeneForgeLang"
GitHub Events
Total
- Watch event: 1
- Member event: 1
- Push event: 118
- Pull request review event: 1
- Fork event: 1
- Create event: 7
Last Year
- Watch event: 1
- Member event: 1
- Push event: 118
- Pull request review event: 1
- Fork event: 1
- Create event: 7
Dependencies
- gradio ==3.50.2
- torch *
- transformers *
- Jinja2 ==3.1.6
- MarkupSafe ==3.0.2
- certifi ==2025.6.15
- charset-normalizer ==3.4.2
- filelock ==3.18.0
- fsspec ==2025.5.1
- idna ==3.10
- mpmath ==1.3.0
- networkx ==3.5
- nvidia-cublas-cu12 ==12.6.4.1
- nvidia-cuda-cupti-cu12 ==12.6.80
- nvidia-cuda-nvrtc-cu12 ==12.6.77
- nvidia-cuda-runtime-cu12 ==12.6.77
- nvidia-cudnn-cu12 ==9.5.1.17
- nvidia-cufft-cu12 ==11.3.0.4
- nvidia-cufile-cu12 ==1.11.1.6
- nvidia-curand-cu12 ==10.3.7.77
- nvidia-cusolver-cu12 ==11.7.1.2
- nvidia-cusparse-cu12 ==12.5.4.2
- nvidia-cusparselt-cu12 ==0.6.3
- nvidia-nccl-cu12 ==2.26.2
- nvidia-nvjitlink-cu12 ==12.6.85
- nvidia-nvtx-cu12 ==12.6.77
- ply ==3.11
- requests ==2.32.4
- setuptools ==80.9.0
- sympy ==1.14.0
- torch ==2.7.1
- triton ==3.3.1
- typing_extensions ==4.14.1
- urllib3 ==2.5.0