https://github.com/aaronxu9/al_fep

Active Learning for Free Energy Perturbation in Molecular Virtual Screening

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Active Learning for Free Energy Perturbation in Molecular Virtual Screening

Basic Info
  • Host: GitHub
  • Owner: AaronXu9
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 22.5 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 12 months ago
Metadata Files
Readme Changelog Contributing License

README.md

AL-FEP: Active Learning for Free Energy Perturbation in Molecular Virtual Screening

A comprehensive framework for applying active learning and reinforcement learning to molecular virtual screening, with a focus on FEP (Free Energy Perturbation) and docking oracles for target 7JVR (SARS-CoV-2 Main Protease).

Quick Start

Clone Repository

```bash
git clone https://github.com/aaronxu9/al_fep.git
cd al_fep
```

Project Overview

This project implements:

  • Active Learning: Iterative molecular selection and evaluation
  • Reinforcement Learning: Agent-based molecular discovery
  • Multi-Oracle System: FEP, Docking, and ML-FEP evaluations
  • Target-Specific: Optimized for the 7JVR protein target

Quick Start

1. Environment Setup

Create and activate the conda environment:

```bash
# Create environment from environment.yml
conda env create -f environment.yml

# Activate environment
conda activate al_fep

# Verify installation
python -c "import rdkit; print('RDKit version:', rdkit.__version__)"
```

2. Project Structure

AL_FEP/
    environment.yml          # Conda environment specification
    requirements.txt         # Additional pip requirements
    setup.py                 # Package installation
    config/                  # Configuration files
        targets/             # Target-specific configs
        experiments/         # Experiment configurations
    src/                     # Source code
        al_fep/              # Main package
            oracles/         # FEP, Docking, ML-FEP oracles
            active_learning/ # AL algorithms
            reinforcement/   # RL algorithms
            molecular/       # Molecular utilities
            utils/           # Common utilities
    data/                    # Data directory
        targets/             # Target protein structures
        molecules/           # Molecular datasets
        results/             # Experiment results
    notebooks/               # Jupyter notebooks
    scripts/                 # Standalone scripts
    tests/                   # Unit tests

3. Target 7JVR Setup

The project is pre-configured for the 7JVR target. Key files:

  • config/targets/7jvr.yaml: Target-specific parameters
  • data/targets/7jvr/: Protein structures and binding site info
  • notebooks/01_7jvr_analysis.ipynb: Target analysis notebook

Oracle Systems

1. FEP Oracle

  • High-accuracy free energy calculations
  • GPU-accelerated simulations
  • AMBER/GROMACS integration

2. Docking Oracle

  • AutoDock Vina integration
  • Multiple conformer generation
  • Binding pose analysis

3. ML-FEP Oracle

  • Fast ML-based FEP predictions
  • Pre-trained on experimental data
  • Cost-effective screening
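The three oracles above differ in cost and accuracy but all map a molecule to a score, so they can sit behind one interface. The following is a minimal sketch under that assumption; the `Oracle` base class, its `evaluate` method, the `cost` attribute, and `ToyOracle` are hypothetical names for illustration and are not taken from the package.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Oracle(ABC):
    """Hypothetical common interface: score a molecule given as a SMILES string."""

    def __init__(self, name: str, cost: float) -> None:
        self.name = name
        self.cost = cost   # assumed relative cost of one evaluation
        self.calls = 0     # number of uncached evaluations so far
        self._cache: Dict[str, float] = {}

    @abstractmethod
    def _compute(self, smiles: str) -> float:
        """Run the actual calculation (FEP, docking, ML model, ...)."""

    def evaluate(self, smiles: str) -> float:
        # Reuse earlier results so the same molecule is never paid for twice.
        if smiles not in self._cache:
            self._cache[smiles] = self._compute(smiles)
            self.calls += 1
        return self._cache[smiles]

    def evaluate_batch(self, smiles_list: List[str]) -> List[float]:
        return [self.evaluate(s) for s in smiles_list]

    @property
    def total_cost(self) -> float:
        return self.calls * self.cost


class ToyOracle(Oracle):
    """Stand-in scorer for illustration only; it is not a physical model."""

    def _compute(self, smiles: str) -> float:
        # Arbitrary deterministic score: more 'C' characters give a lower value.
        return -0.5 * smiles.count("C")


oracle = ToyOracle(name="toy_docking", cost=1.0)
scores = oracle.evaluate_batch(["CCO", "CCCC", "CCO"])
print(scores, oracle.calls, oracle.total_cost)
```

Caching and call counting matter most for the expensive oracle: tracking `total_cost` per oracle is one way to compare a cheap screen against a budgeted FEP run.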

Active Learning Workflows

  1. Uncertainty Sampling: Select molecules with highest prediction uncertainty
  2. Query by Committee: Ensemble-based selection
  3. Expected Improvement: Select molecules that maximize the expected-improvement acquisition function
  4. Diversity-Based: Ensure chemical space coverage
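Two of the strategies listed above can be sketched with the standard library alone. This is an illustrative sketch, not the package's implementation: it assumes a committee of surrogate models that each predict a binding score (lower is better), uses the committee's standard deviation as the uncertainty, and computes expected improvement under a Gaussian assumption. The function names and the prediction values are made up.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def committee_stats(predictions: Sequence[float]) -> Tuple[float, float]:
    """Mean and spread of one molecule's predictions across a model committee."""
    return mean(predictions), pstdev(predictions)


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement for minimization (a lower score is better)."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def select(committee_preds: Dict[str, List[float]], k: int,
           strategy: str = "uncertainty", best: float = 0.0) -> List[str]:
    """Return the k candidates with the highest acquisition score."""
    scores: Dict[str, float] = {}
    for mol, preds in committee_preds.items():
        mu, sigma = committee_stats(preds)
        if strategy == "uncertainty":
            scores[mol] = sigma  # committee disagreement
        elif strategy == "expected_improvement":
            scores[mol] = expected_improvement(mu, sigma, best)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up committee predictions from three surrogate models.
preds = {
    "mol_a": [-7.0, -7.1, -6.9],  # confident, moderate score
    "mol_b": [-9.5, -5.0, -7.0],  # committee disagrees strongly
    "mol_c": [-9.0, -9.2, -8.8],  # confident, strong score
}
by_uncertainty = select(preds, k=1, strategy="uncertainty")
by_ei = select(preds, k=1, strategy="expected_improvement", best=-8.0)
print(by_uncertainty, by_ei)
```

The two strategies pick different molecules here: uncertainty sampling favors the candidate the committee disagrees on, while expected improvement favors the one most likely to beat the current best score.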

Reinforcement Learning Agents

  1. Molecular REINFORCE: Policy gradient for molecular generation
  2. Actor-Critic: Policy optimization guided by a learned value function
  3. PPO: Proximal policy optimization for stable training
  4. Multi-Objective: Balance multiple molecular properties
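The policy-gradient idea behind the first agent can be shown on a deliberately tiny problem. The sketch below is a one-step bandit, not a molecular generator: the policy picks one of three hypothetical fragments, receives a made-up reward, and applies the REINFORCE update with a running-average baseline. The fragment names, reward values, and hyperparameters are all assumptions.

```python
import math
import random
from typing import Dict, List

FRAGMENTS = ["frag_a", "frag_b", "frag_c"]  # hypothetical building blocks
REWARD: Dict[str, float] = {"frag_a": 0.1, "frag_b": 1.0, "frag_c": 0.3}  # made-up scores


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    """Train a softmax policy with REINFORCE and return its final probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(FRAGMENTS)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(FRAGMENTS)), weights=probs)[0]
        reward = REWARD[FRAGMENTS[action]]
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit i is (1[i == action] - probs[i]).
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        # Running-average baseline reduces the variance of the update.
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


final_probs = train()
print(dict(zip(FRAGMENTS, final_probs)))
```

A sequence-generating agent would apply the same update to every token of a sampled molecule, with the reward coming from an oracle instead of a lookup table.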

Usage Examples

Basic Active Learning Run

```python
from al_fep import ActiveLearningPipeline
from al_fep.oracles import FEPOracle, DockingOracle

# Initialize oracles
fep_oracle = FEPOracle(target="7jvr")
docking_oracle = DockingOracle(target="7jvr")

# Set up active learning
al_pipeline = ActiveLearningPipeline(
    oracles=[fep_oracle, docking_oracle],
    strategy="uncertainty_sampling",
    budget=100,
)

# Run active learning loop
results = al_pipeline.run()
```
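The README does not show what the pipeline does internally, so the following is a generic pool-based loop offered only as a sketch of how a `budget` might be spent. It assumes each candidate is described by a single number, uses the distance to the nearest labeled candidate as a crude uncertainty proxy, and queries a toy oracle. `run_active_learning`, `toy_oracle`, and the descriptors are hypothetical.

```python
import random
from typing import Callable, Dict, List


def run_active_learning(pool: List[float],
                        oracle: Callable[[float], float],
                        budget: int,
                        batch_size: int = 2,
                        seed: int = 0) -> Dict[float, float]:
    """Label at most `budget` pool items, choosing each batch by an uncertainty proxy."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Start from a small random batch so the proxy has something to compare against.
    n_init = min(batch_size, budget, len(unlabeled))
    labeled: Dict[float, float] = {x: oracle(x) for x in unlabeled[:n_init]}
    unlabeled = unlabeled[n_init:]

    while unlabeled and len(labeled) < budget:
        # Uncertainty proxy: distance to the nearest already-labeled descriptor.
        unlabeled.sort(key=lambda x: min(abs(x - seen) for seen in labeled),
                       reverse=True)
        n = min(batch_size, budget - len(labeled))
        batch, unlabeled = unlabeled[:n], unlabeled[n:]
        for x in batch:
            labeled[x] = oracle(x)
    return labeled


calls: List[float] = []


def toy_oracle(x: float) -> float:
    """Made-up score with a minimum at x = 1.3; records every call."""
    calls.append(x)
    return (x - 1.3) ** 2 - 8.0


pool = [i / 10 for i in range(21)]  # 21 distinct 1-D descriptors
results = run_active_learning(pool, toy_oracle, budget=6)
best = min(results, key=results.get)
print(f"evaluated {len(results)} candidates, best descriptor: {best}")
```

The loop never calls the oracle more than `budget` times, which is the property that matters when each call is an FEP simulation.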

Reinforcement Learning Training

```python
from al_fep import RLAgent
from al_fep.environments import MolecularEnv

# Set up environment
env = MolecularEnv(target="7jvr", oracle="ml_fep")

# Initialize agent
agent = RLAgent(algorithm="ppo", env=env)

# Train agent
agent.train(total_timesteps=100000)
```
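To make the role of the environment concrete, here is a minimal sketch of a Gym-like environment that builds a string token by token and queries an oracle only when the episode ends. It is an assumption about the general pattern, not the package's `MolecularEnv`: `ToyMolecularEnv`, its token set, and the scoring function are hypothetical, and the strings are not validated as SMILES.

```python
from typing import Callable, List, Tuple


class ToyMolecularEnv:
    """Hypothetical string-building environment with a Gym-like reset/step API."""

    def __init__(self, tokens: List[str], oracle: Callable[[str], float],
                 max_len: int = 5) -> None:
        self.tokens = tokens + ["<stop>"]  # the last action ends the episode
        self.oracle = oracle
        self.max_len = max_len
        self.state: List[str] = []

    def reset(self) -> str:
        self.state = []
        return ""

    def step(self, action: int) -> Tuple[str, float, bool]:
        token = self.tokens[action]
        done = token == "<stop>"
        if not done:
            self.state.append(token)
            done = len(self.state) >= self.max_len
        observation = "".join(self.state)
        # Sparse reward: the oracle is queried only once the string is finished.
        reward = self.oracle(observation) if done else 0.0
        return observation, reward, done


# Made-up oracle: rewards strings that contain more 'N' characters.
env = ToyMolecularEnv(["C", "N", "O"], oracle=lambda s: float(s.count("N")))
obs = env.reset()
trajectory = [env.step(a) for a in (0, 1, 3)]  # "C", "N", then "<stop>"
print(trajectory)
```

Querying the oracle only at the end of an episode keeps the number of evaluations equal to the number of finished molecules, which is why a fast oracle such as `ml_fep` is a natural choice for training.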

Configuration

All experiments are configured via YAML files in config/:

  • Global settings in config/default.yaml
  • Target-specific settings in config/targets/7jvr.yaml
  • Experiment-specific settings in config/experiments/
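The README does not say how the three configuration layers are combined. One common approach, sketched here as an assumption, is a recursive merge in which later layers override earlier ones. The dictionaries below stand in for parsed YAML (for example, the result of PyYAML's `yaml.safe_load`), and every key in them is hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for the parsed contents of the three layers (all keys hypothetical).
default_cfg = {"oracle": {"name": "docking", "exhaustiveness": 8}, "budget": 50}
target_cfg = {"target": "7jvr", "oracle": {"exhaustiveness": 16}}
experiment_cfg = {"budget": 100, "strategy": "uncertainty_sampling"}

# Precedence assumed here: default < target < experiment.
config = deep_merge(deep_merge(default_cfg, target_cfg), experiment_cfg)
print(config)
```

Merging into a copy leaves the default layer untouched, so the same defaults can be reused across several experiments in one process.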

Remote Deployment

GitHub Repository Setup

This project is ready for GitHub deployment with:

  • Git repository initialized
  • Comprehensive .gitignore for Python/scientific computing
  • GitHub Actions CI/CD pipeline
  • Pre-commit hooks for code quality
  • Issue and PR templates

Deploy to Remote Server

  1. Clone on the remote server:

     ```bash
     git clone https://github.com/aaronxu9/al_fep.git
     cd al_fep
     ```

  2. Set up the environment:

     ```bash
     conda env create -f environment.yml
     conda activate al_fep
     pip install -e .
     ```

  3. Run the tests to verify:

     ```bash
     python -m pytest tests/ -v
     ```

For detailed deployment instructions, see DEPLOYMENT.md.

Development

Code Quality Tools

```bash
# Install development dependencies
pip install -e ".[dev]"

# Set up pre-commit hooks
pre-commit install

# Run all quality checks
black src/ tests/    # Code formatting
isort src/ tests/    # Import sorting
flake8 src/ tests/   # Linting
mypy src/            # Type checking
```

Running Tests

```bash
pytest tests/ -v --cov=src/al_fep
```

Contributing

Please read CONTRIBUTING.md for guidelines on contributing to this project.

CI/CD Pipeline

The project includes a GitHub Actions pipeline that:

  • Tests across Python 3.9, 3.10, and 3.11 on Ubuntu and macOS
  • Runs linting, formatting, and type checking
  • Performs security vulnerability scanning
  • Builds and validates the package

License

MIT License - see LICENSE file for details.

Support

Acknowledgments

  • RDKit for molecular handling
  • OpenMM for molecular dynamics
  • AutoDock Vina for docking
  • PyTorch for machine learning

Citation

If you use this code in your research, please cite:

```bibtex
@article{al_fep_2025,
  title={Active Learning and Reinforcement Learning for Molecular Virtual Screening},
  author={Your Name},
  journal={Journal of Chemical Information and Modeling},
  year={2025}
}
```

Owner

  • Name: Ao Xu
  • Login: AaronXu9
  • Kind: user

GitHub Events

Total
  • Push event: 5
  • Create event: 2
Last Year
  • Push event: 5
  • Create event: 2

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
  • conda-incubator/setup-miniconda v2 composite
environment.yml pypi
  • meeko *
  • mols2grid *
  • selfies *
  • stable-baselines3 *
  • wandb *
pyproject.toml pypi
requirements.txt pypi
  • biopandas >=0.4.0
  • chembl-webresource-client >=0.10.0
  • deepchem >=2.7.0
  • dgl >=1.0.0
  • dgllife >=0.3.0
  • fegrow >=1.0.0
  • meeko >=0.4.0
  • mol2vec >=0.1
  • mols2grid >=1.1.0
  • oddt >=0.7.0
  • plip >=2.2.0
  • prody >=2.4.0
  • prolif >=2.0.0
  • selfies >=2.1.0
  • stable-baselines3 >=2.0.0
  • wandb >=0.17.0
setup.py pypi