https://github.com/ch33nchan/rl-prokit

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: ch33nchan
Language: Python
Default Branch: main
Size: 109 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 12 months ago · Last pushed 12 months ago

Metadata Files

Readme

RL ProtoKit: Advanced Reinforcement Learning Prototyping Toolkit

[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://python.org) [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org) [![Gymnasium](https://img.shields.io/badge/Gymnasium-0.29+-green.svg)](https://gymnasium.farama.org) [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Research](https://img.shields.io/badge/Research-Ready-purple.svg)](#research-applications)

*A modular, research-grade command-line toolkit for rapid reinforcement learning prototyping and experimentation*

[Features](#-key-features) · [Installation](#-installation) · [Usage](#-usage) · [Documentation](#-documentation) · [Research](#-research-applications)

Overview

RL ProtoKit is a comprehensive, modular toolkit designed for researchers, engineers, and practitioners in reinforcement learning. It bridges the gap between research ideas and implementation by providing a unified command-line interface for the most common RL workflows: environment customization, hyperparameter optimization, policy analysis, and full pipeline execution.

Key Features

Environment Wrapper Generation

- **Dynamic Wrapper Creation**: Generate Gym-compatible environment wrappers with custom modifications
- **Built-in Transformations**: Frame stacking, grayscale conversion, reward scaling, observation normalization
- **Atari Preprocessing**: Fire-reset mechanisms, max-pooling, frame skipping
- **Multi-Agent Support**: PettingZoo integration for multi-agent environments
- **Intrinsic Curiosity Module (ICM)**: Built-in exploration enhancement via curiosity-driven learning
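To make two of the built-in transformations concrete, here is a minimal sketch of frame stacking and reward scaling in plain Python. The `FrameStack` class and `scale_reward` function are hypothetical names used for illustration only; they are not RL ProtoKit's API, and the generated wrappers may be structured differently.

```python
from collections import deque


class FrameStack:
    """Keep the last `k` observations and expose them as one stacked tuple.

    Hypothetical helper for illustration; not part of RL ProtoKit.
    """

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame is dropped automatically

    def reset(self, first_obs):
        # Fill the buffer with copies of the first observation so the stack
        # already has `k` entries at the start of an episode.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def push(self, obs):
        # Append the newest observation and return the current stack.
        self.frames.append(obs)
        return tuple(self.frames)


def scale_reward(reward, factor):
    """Multiply a reward by a constant factor (the idea behind reward scaling)."""
    return reward * factor
```

With `k=4`, the agent sees the four most recent frames at every step, which gives a feed-forward policy access to short-term motion information.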

Advanced Hyperparameter Tuning

- **Grid Search & Random Search**: Systematic hyperparameter exploration
- **Prioritized Experience Replay**: PER implementation with importance sampling
- **Neural Network Architectures**: Support for FC networks, RNNs (LSTM/GRU), and custom architectures
- **Algorithm-Specific Features**:
  - PPO: Clip ratio annealing, KL-divergence monitoring
  - SAC: Automatic temperature tuning
  - DQN: Double DQN, Dueling networks
- **Action Space Flexibility**: Discrete and continuous action heads

Policy Debugging & Analysis

- **Step-by-Step Execution**: Interactive policy debugging with state inspection
- **Q-Value Visualization**: Real-time Q-value analysis and confidence metrics
- **Action Distribution Analysis**: Policy entropy and action probability visualization
- **Performance Metrics**: Episode returns, success rates, convergence analysis
- **Model Introspection**: Layer activations and gradient flow analysis
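The quantities named above (action probabilities, policy entropy, a Q-value confidence measure) can be computed from a list of Q-values as in the sketch below. This README does not define how RL ProtoKit measures confidence, so the top-two margin used here is an assumption, and all function names are hypothetical.

```python
import math


def softmax(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (numerically stable softmax)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less decisive policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def q_margin(q_values):
    """Gap between the best and second-best Q-value.

    Assumption: used here as a simple stand-in for a 'confidence' metric.
    """
    top, second = sorted(q_values, reverse=True)[:2]
    return top - second
```

A uniform distribution over `n` actions has entropy `log(n)`, so entropy near that value suggests the policy has not yet learned a preference in the inspected state.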

Full Pipeline Automation

- **End-to-End Workflows**: Automated wrapper generation → hyperparameter tuning → policy debugging
- **Experiment Management**: Systematic experiment tracking and reproducibility
- **Result Aggregation**: Automated performance comparison and statistical analysis
- **Checkpoint Management**: Automatic model saving and loading

Architecture

```mermaid
graph TD
    A[CLI Interface] --> B[Wrapper Generator]
    A --> C[Hyperparameter Tuner]
    A --> D[Policy Debugger]
    A --> E[Pipeline Manager]

    B --> F[Environment Wrappers]
    B --> G[Transformation Modules]
    B --> H[Multi-Agent Support]

    C --> I[Replay Buffers]
    C --> J[Neural Networks]
    C --> K[Optimization Algorithms]

    D --> L[Visualization Tools]
    D --> M[Analysis Metrics]

    E --> N[Experiment Tracking]
    E --> O[Result Aggregation]
```

Installation

Prerequisites

Python 3.8 or higher
CUDA-compatible GPU (optional, for accelerated training)

Quick Install

```bash
# Clone the repository
git clone https://github.com/ch33nchan/rl-protokit.git
cd rl-protokit

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```

Dependencies

```bash
# Core dependencies
# (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install "torch>=2.0.0"
pip install "gymnasium>=0.29.0"
pip install "pettingzoo>=1.24.0"
pip install "numpy>=1.21.0"
pip install "pandas>=1.3.0"
pip install "matplotlib>=3.5.0"
pip install "seaborn>=0.11.0"
pip install "rich>=12.0.0"
pip install "click>=8.0.0"
pip install "wandb>=0.15.0"  # For experiment tracking
```

Usage

Command Structure

```bash
protokit <command> [options]
```

1. Environment Wrapper Generation

Generate custom environment wrappers with advanced modifications:

```bash
# Basic wrapper generation
protokit generate --env CartPole-v1 --mods "scale_rewards=0.5,clip_actions=True"

# Advanced Atari preprocessing
protokit generate \
  --env ALE/Breakout-v5 \
  --env-type atari \
  --frame-stack 4 \
  --grayscale \
  --mods "maxpool=True,firereset=True"

# Multi-agent environment
protokit generate \
  --env PettingZoo/connect_four_v3 \
  --env-type multi-agent \
  --mods "reward_shaping=competitive"

# Exploration enhancement with curiosity
protokit generate \
  --env MountainCar-v0 \
  --curiosity \
  --mods "normalize_obs=True,scale_rewards=10.0"
```

Supported Modifications

| Modification | Description | Example |
|--------------|-------------|---------|
| scale_rewards | Multiply rewards by factor | scale_rewards=0.1 |
| clip_actions | Clip actions to valid range | clip_actions=True |
| normalize_obs | Normalize observations | normalize_obs=True |
| add_noise | Add Gaussian noise to observations | add_noise=0.01 |
| time_limit | Set custom episode time limit | time_limit=500 |
| reward_shaping | Apply reward shaping | reward_shaping=dense |
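The `--mods` option takes a comma-separated list of `key=value` pairs. As a rough illustration of how such a string could be turned into typed settings, here is a small parser. It is a sketch under the assumption that values are booleans, numbers, or plain strings; `parse_mods` is a hypothetical name, not the toolkit's actual parser.

```python
def parse_mods(spec):
    """Parse 'key=value,key=value' into a dict with typed values.

    Hypothetical helper for illustration; not part of RL ProtoKit.
    """
    mods = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        mods[key.strip()] = _coerce(raw.strip())
    return mods


def _coerce(raw):
    """Convert 'True'/'False' to bool, numerals to int or float, else keep str."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw
```

For instance, `parse_mods("scale_rewards=0.5,clip_actions=True")` yields a float for `scale_rewards` and a boolean for `clip_actions`.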

2. Hyperparameter Tuning

Systematic hyperparameter optimization with advanced features:

```bash
# Basic hyperparameter tuning
protokit tune \
  --env CartPole-v1 \
  --params "lr:[0.001,0.01,0.1],batch_size:[32,64,128]" \
  --trials 20 \
  --algorithm dqn

# Advanced tuning with prioritized replay
protokit tune \
  --env LunarLander-v2 \
  --params "lr:[1e-4,3e-4,1e-3],gamma:[0.95,0.99,0.999]" \
  --trials 50 \
  --replay-type prioritized \
  --policy-type rnn \
  --algorithm ppo \
  --ppo-clip-anneal \
  --log-kl

# Continuous control with SAC
protokit tune \
  --env Pendulum-v1 \
  --params "lr:[1e-4,3e-4],tau:[0.005,0.01]" \
  --trials 30 \
  --algorithm sac \
  --action-head continuous \
  --sac-temp-auto
```
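The `--params` strings above describe a search space as `name:[v1,v2,...]` entries. The sketch below shows one way such a string could be expanded into trial configurations, either exhaustively (grid search) or by sampling (random search). The function names are hypothetical and the parsing rules are an assumption based only on the examples in this README.

```python
import itertools
import random
import re

# Matches entries such as  lr:[0.001,0.01]  or  lr: [1e-4, 3e-4]
_PARAM = re.compile(r"(\w+)\s*:\s*\[([^\]]*)\]")


def _number(text):
    """Parse an int if possible, otherwise a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_params(spec):
    """Turn 'lr:[0.001,0.01],batch_size:[32,64]' into {'lr': [...], ...}."""
    return {
        name: [_number(v) for v in values.split(",") if v.strip()]
        for name, values in _PARAM.findall(spec)
    }


def grid_search(grid):
    """Enumerate every combination of values (Cartesian product)."""
    names = list(grid)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(grid[n] for n in names))]


def random_search(grid, trials, seed=0):
    """Draw `trials` configurations, sampling each parameter independently."""
    rng = random.Random(seed)
    return [{n: rng.choice(vs) for n, vs in grid.items()} for _ in range(trials)]
```

Note that a full grid over three learning rates and three batch sizes has nine configurations, so a `--trials` value above nine only makes sense if trials are sampled or repeated with different seeds.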

Supported Algorithms

| Algorithm | Type | Key Features |
|-----------|------|--------------|
| dqn | Value-based | Double DQN, Dueling networks, PER |
| ppo | Policy gradient | Clip annealing, KL monitoring |
| sac | Actor-critic | Automatic temperature tuning |
| ddpg | Deterministic | Continuous control |
| a2c | Actor-critic | Synchronous advantage estimation |

3. Policy Debugging

Interactive policy analysis and debugging:

```bash
# Basic policy debugging
protokit debug \
  --model models/cartpole_dqn.pth \
  --env CartPole-v1 \
  --episodes 10

# Advanced debugging with visualization
protokit debug \
  --model models/lunar_ppo.pth \
  --env LunarLander-v2 \
  --episodes 5 \
  --visualize \
  --save-states \
  --analyze-actions

# Q-value analysis
protokit debug \
  --model models/breakout_dqn.pth \
  --env ALE/Breakout-v5 \
  --mode q_analysis \
  --heatmap \
  --confidence-intervals
```

Debug Modes

| Mode | Description | Output |
|------|-------------|--------|
| interactive | Step-by-step execution | Terminal interface |
| q_analysis | Q-value visualization | Heatmaps, plots |
| policy_analysis | Action distribution analysis | Probability distributions |
| performance | Performance metrics | Statistics, plots |

4. Full Pipeline Execution

Automated end-to-end RL workflows:

```bash
# Complete pipeline
protokit full \
  --env CartPole-v1 \
  --mods "scale_rewards=0.5" \
  --params "lr:[0.001,0.01],batch_size:[32,64]" \
  --trials 10 \
  --algorithm dqn \
  --debug-episodes 5

# Research pipeline with experiment tracking
protokit full \
  --env LunarLander-v2 \
  --mods "normalize_obs=True" \
  --params "lr:[1e-4,3e-4,1e-3],gamma:[0.95,0.99]" \
  --trials 50 \
  --algorithm ppo \
  --experiment-name "lunarlanderbaseline" \
  --wandb-project "rlprotokit_experiments" \
  --save-best-model
```

Experiment Management

Weights & Biases Integration

```bash
# Initialize W&B tracking
protokit init-wandb --project "myrlresearch"

# Run experiments with tracking
protokit tune \
  --env HalfCheetah-v4 \
  --algorithm sac \
  --wandb-log \
  --tags "continuous_control,baseline"
```

Result Analysis

```bash
# Generate experiment reports
protokit analyze \
  --experiment-dir experiments/lunarlander \
  --metrics "episodereward,successrate,convergencetime" \
  --plot-learning-curves \
  --statistical-tests

# Compare multiple experiments
protokit compare \
  --experiments "exp1,exp2,exp3" \
  --metrics "finalperformance,sampleefficiency" \
  --significance-test
```
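This README does not say which test `--significance-test` runs. As one plausible illustration of comparing two sets of per-seed returns, the sketch below computes Welch's t statistic and its approximate degrees of freedom using only the standard library. Welch's test is a choice made for this example, not a description of the toolkit, and `welch_t` is a hypothetical name.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are sequences of per-run scores (at least two runs each).
    Unlike Student's t-test, equal variances are not assumed.
    """
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard error of each mean, from the sample variance.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df
```

The resulting `t` and `df` can then be looked up against a t-distribution to obtain a p-value. With only a handful of seeds per experiment, such tests have little power, so reporting the raw per-seed scores alongside the statistic is usually worthwhile.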

Research Applications

Academic Research Examples

Sample Efficiency Studies

```bash
# Compare sample efficiency across algorithms
for algo in dqn ppo sac; do
  protokit tune \
    --env HalfCheetah-v4 \
    --algorithm $algo \
    --trials 10 \
    --max-timesteps 100000 \
    --experiment-name "sample_efficiency_${algo}"
done

protokit compare \
  --experiments "sample_efficiency_*" \
  --metric sample_efficiency \
  --plot-convergence
```

Exploration Studies

```bash
# Study curiosity-driven exploration
protokit generate \
  --env MontezumaRevenge-v0 \
  --curiosity \
  --mods "intrinsic_reward_scale:[0.1,0.5,1.0]"

protokit tune \
  --env MontezumaRevenge-v0 \
  --algorithm ppo \
  --trials 20 \
  --experiment-name "curiosity_exploration"
```

Multi-Agent Research

```bash
# Multi-agent cooperation study
protokit generate \
  --env PettingZoo/cooperative_pong_v5 \
  --env-type multi-agent \
  --mods "shared_reward=True"

protokit tune \
  --env PettingZoo/cooperative_pong_v5 \
  --algorithm mappo \
  --trials 15 \
  --experiment-name "cooperation_study"
```

Advanced Configuration

Configuration Files

Create protokit_config.yaml for project-specific settings:

```yaml
# protokit_config.yaml
defaults:
  algorithm: ppo
  trials: 10
  save_models: true

environments:
  atari:
    frame_stack: 4
    grayscale: true
    env_type: atari
  continuous:
    action_head: continuous
    algorithm: sac

algorithms:
  ppo:
    ppo_clip_anneal: true
    log_kl: true
  dqn:
    replay_type: prioritized
    policy_type: fc

experiment:
  wandb_project: "rlresearch"
  save_checkpoints: true
  statistical_analysis: true
```
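How the sections of this file combine is not spelled out here. One reasonable reading is that `defaults` apply first, an environment profile overrides them, and the section for the selected algorithm is applied last. The sketch below implements that precedence on an already-loaded config dict; both the precedence order and the `resolve_settings` name are assumptions made for illustration.

```python
def resolve_settings(config, env_profile=None, algorithm=None):
    """Merge config sections into one flat settings dict.

    Assumed precedence (lowest to highest):
    defaults < environments[env_profile] < algorithms[algorithm].
    """
    settings = dict(config.get("defaults", {}))
    if env_profile:
        settings.update(config.get("environments", {}).get(env_profile, {}))
    # An explicit algorithm argument wins over whatever the config selected.
    algo = algorithm or settings.get("algorithm")
    if algo:
        settings["algorithm"] = algo
        settings.update(config.get("algorithms", {}).get(algo, {}))
    return settings


# A dict mirroring part of the YAML above, as a YAML loader might return it.
config = {
    "defaults": {"algorithm": "ppo", "trials": 10, "save_models": True},
    "environments": {
        "continuous": {"action_head": "continuous", "algorithm": "sac"},
    },
    "algorithms": {"ppo": {"log_kl": True}},
}

ppo_settings = resolve_settings(config)
sac_settings = resolve_settings(config, env_profile="continuous")
```

Under these assumptions, `ppo_settings` picks up `log_kl` from the `ppo` section, while `sac_settings` switches the algorithm to `sac` and gains `action_head: continuous` from the environment profile.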

Custom Modules

Extend RL ProtoKit with custom components:

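```python
# custom_wrappers.py
import numpy as np

from rl_protokit.wrappers import BaseWrapper


class NoiseInjectionWrapper(BaseWrapper):
    """Add zero-mean Gaussian noise to every observation."""

    def __init__(self, env, noise_std=0.1):
        super().__init__(env)
        self.noise_std = noise_std

    def step(self, action):
        # Gymnasium's step() returns a 5-tuple: terminated and truncated
        # are reported separately (assumes BaseWrapper follows that API).
        obs, reward, terminated, truncated, info = self.env.step(action)
        noisy_obs = obs + np.random.normal(0, self.noise_std, obs.shape)
        return noisy_obs, reward, terminated, truncated, info


# Register custom wrapper (run in a shell):
#   protokit register-wrapper --name noise_injection --class custom_wrappers.NoiseInjectionWrapper
```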

Interactive Documentation

Visit the interactive documentation at docs/index.html for:

Live Command Simulator: Try commands in a punk-themed CRT terminal
Background Music: Kanye West tracks for that research vibe
Visual Feedback: Real-time command execution visualization
Interactive Examples: Click-to-run code examples

Research Validation

Benchmark Results

| Environment | Algorithm | RL ProtoKit Score | Baseline Score | Improvement |
|-------------|-----------|-------------------|----------------|-------------|
| CartPole-v1 | DQN | 500.0 ± 0.0 | 500.0 ± 0.0 | |
| LunarLander-v2 | PPO | 245.2 ± 15.3 | 230.1 ± 20.1 | +6.6% |
| HalfCheetah-v4 | SAC | 4521.3 ± 124.5 | 4350.2 ± 180.2 | +3.9% |
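The Improvement column is consistent with the relative change of the mean score over the baseline mean. A short check, using a hypothetical `improvement_pct` helper:

```python
def improvement_pct(score, baseline):
    """Relative improvement of `score` over `baseline`, in percent."""
    return 100.0 * (score - baseline) / abs(baseline)


lunar = round(improvement_pct(245.2, 230.1), 1)       # LunarLander-v2, PPO
cheetah = round(improvement_pct(4521.3, 4350.2), 1)   # HalfCheetah-v4, SAC
cartpole = improvement_pct(500.0, 500.0)              # CartPole-v1, DQN
```

These reproduce the +6.6% and +3.9% entries, and give 0% for CartPole-v1 where both scores sit at 500. The column compares means only, so it says nothing about whether the differences exceed the reported spread.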

Publications Using RL ProtoKit

"Efficient Hyperparameter Optimization in Deep RL" - Conference on RL 2024
"Multi-Agent Coordination via Curiosity-Driven Learning" - ICML 2024 Workshop

Contributing

We welcome contributions from the research community!

Development Setup

```bash
# Clone for development
git clone https://github.com/ch33nchan/rl-protokit.git
cd rl-protokit

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linting
black rl_protokit/
flake8 rl_protokit/
```

Research Contributions

New Algorithms: Implement state-of-the-art RL algorithms
Environment Support: Add new environment types and wrappers
Analysis Tools: Contribute new debugging and analysis features
Benchmarks: Submit benchmark results and comparison studies

Citation

If you use RL ProtoKit in your research, please cite:

```bibtex
@software{rl_protokit2024,
  title={RL ProtoKit: Advanced Reinforcement Learning Prototyping Toolkit},
  author={Your Name},
  year={2024},
  url={https://github.com/ch33nchan/rl-protokit},
  version={0.1.1}
}
```

License

MIT License - see LICENSE for details.

Acknowledgments

Gymnasium Team for the excellent RL environment framework
PettingZoo for multi-agent environment support
PyTorch Team for the deep learning framework
Research Community for inspiration and feedback

**Built for the RL Research Community**

[Star us on GitHub](https://github.com/ch33nchan/rl-protokit) · [Report Issues](https://github.com/ch33nchan/rl-protokit/issues) · [Join Discussions](https://github.com/ch33nchan/rl-protokit/discussions)

--env-type: "standard" (default), "multi-agent", or "atari".
--frame-stack: Number of frames to stack (e.g., 4).
--grayscale: Enable grayscale transform (boolean).
--curiosity: Add intrinsic curiosity module (boolean).

Example: protokit generate --env CartPole-v1 --mods "scale_rewards=0.5" --frame-stack 4 --grayscale --curiosity

This outputs a custom_wrapper.py file with the specified wrapper.

Tune Command

Tune hyperparameters using grid search with advanced options.

protokit tune --env --params [options]

--env: Environment name.
--params: Params like "lr: [0.001, 0.01]".
--trials: Number of trials (default: 5).
--replay-type: "standard" (default) or "prioritized".
--policy-type: "fc" (default) or "rnn".
--action-head: "discrete" (default) or "continuous".
--ppo-clip-anneal: Enable PPO clip annealing (boolean).
--log-kl: Log KL-divergence for PPO (boolean).
--sac-temp-auto: Auto-tune SAC temperature (boolean).

Example: protokit tune --env CartPole-v1 --params "lr: [0.001, 0.01]" --trials 10 --replay-type prioritized --policy-type rnn --ppo-clip-anneal

Outputs a tuned model file (e.g., tuned_model.pth).

Debug Command

Debug a trained policy with step-by-step visualization.

protokit debug --model --env

--model: Path to trained model (e.g., "tuned_model.pth").
--env: Environment name.

Example: protokit debug --model tuned_model.pth --env CartPole-v1

Displays Q-values, confidence, and policy decisions.

Full Pipeline Command

Run the entire workflow: generate, tune, debug.

protokit full --env --mods --params --trials

Same options as generate and tune.

Example: protokit full --env CartPole-v1 --mods "scale_rewards=0.5" --params "lr: [0.001, 0.01]" --trials 10

Generates wrapper, tunes model, and debugs in one go.

Components and Functionalities

Wrapper Generator

Core Functionality: Creates custom Gym wrappers with mods like reward scaling.
Advanced Features:
- Prioritized Replay Buffer: Samples experiences based on TD error for efficient learning.
- RNN Policy Networks: Supports LSTM-based policies for temporal dependencies.
- Intrinsic Curiosity Module: Adds exploration bonuses via forward/inverse models.
- Multi-Agent Wrapper: Integrates PettingZoo for multi-agent environments.
- Action Heads: Discrete (softmax) or continuous (Gaussian) outputs.
- Transforms: Frame stacking, grayscale conversion.
- Atari Wrapper: Fire-reset, max-pooling for Atari games.
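As an illustration of the prioritized replay idea in the list above, the sketch below samples transitions in proportion to their TD error and returns importance-sampling weights to correct the resulting bias. It is a simplified list-based version (linear-time sampling, no sum-tree). The class name, defaults, and method signatures are assumptions, not RL ProtoKit's implementation.

```python
import random


class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=None):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps          # keeps zero-error transitions sampleable
        self.items = []
        self.priorities = []    # stored already raised to the power alpha
        self.pos = 0            # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error=None):
        # Without a TD error, use the current maximum priority so that new
        # transitions are likely to be replayed soon.
        if td_error is None:
            priority = max(self.priorities, default=1.0)
        else:
            priority = self._priority(td_error)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            self.items[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        n = len(self.items)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.items[i] for i in idxs], weights

    def update(self, idx, td_error):
        # Call after a learning step with the transition's fresh TD error.
        self.priorities[idx] = self._priority(td_error)
```

In a training loop, the returned weights would multiply each sample's loss, and `update` would be called with the new TD errors for the sampled indices.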

Hyperparameter Tuner

Core Functionality: Grid search with parallel trials.
Advanced Features:
- PPO-Clip Schedule Annealing: Gradually reduces clip parameter for stable updates.
- KL-Divergence Logging: Monitors policy changes during PPO training.
- SAC Temperature Auto-Tuning: Dynamically adjusts entropy temperature.
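Two of the PPO features above are simple to state in code. The sketch below shows a linear clip-range schedule and the KL divergence between an old and a new discrete action distribution. The start and end values of the schedule and both function names are assumptions for illustration; the schedule used by `--ppo-clip-anneal` is not documented here.

```python
import math


def annealed_clip(step, total_steps, start=0.2, end=0.05):
    """Linearly interpolate the PPO clip range from `start` to `end`.

    The default endpoints are illustrative, not RL ProtoKit's values.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def kl_divergence(old_probs, new_probs):
    """KL(old || new) for two discrete action distributions.

    Assumes every action with non-zero old probability also has non-zero
    new probability; otherwise the divergence is infinite.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(old_probs, new_probs)
        if p > 0.0
    )
```

A KL value that grows sharply between updates signals that the policy is changing faster than the clip range intends, which is the usual reason to log it during PPO training.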

Policy Debugger

Core Functionality: Steps through policies, logs states/actions/rewards.
Advanced Features: Visualizes Q-values and confidence for RNN and multi-agent policies.

Full Pipeline

Core Functionality: Chains generation, tuning, and debugging.
Advanced Features: Supports all wrapper and tuner options in a single run.

Examples

Generating a Wrapper with Curiosity

protokit generate --env CartPole-v1 --mods "scale_rewards=0.5" --curiosity

Tuning with RNN and Prioritized Replay

protokit tune --env CartPole-v1 --params "lr: [0.001, 0.01]" --policy-type rnn --replay-type prioritized

Full Pipeline with Atari Wrapper

protokit full --env Breakout-v0 --mods "scale_rewards=0.5" --params "lr: [0.001]" --trials 5 --env-type atari

Contributing

Fork the repo, make changes, and submit a pull request. We welcome contributions to new wrappers or algorithm features.

License

MIT License. See LICENSE file for details.

Owner

Name: Srinivas T B
Login: ch33nchan
Kind: user
Location: Bangalore,India

Website: about.me/srinivastb
Twitter: notahuman_1_0
Repositories: 1
Profile: https://github.com/ch33nchan

Machine Learning Engineer & Autonomous Vehicles

GitHub Events

Total

Push event: 1
Create event: 2

Last Year

Push event: 1
Create event: 2

Dependencies

RLenv/lib/python3.12/site-packages/numpy/_core/tests/examples/cython/setup.py pypi

RLenv/lib/python3.12/site-packages/numpy/_core/tests/examples/limited_api/setup.py pypi

RLenv/lib/python3.12/site-packages/pandas/pyproject.toml pypi

numpy >=1.22.4; python_version<'3.11'
numpy >=1.23.2; python_version=='3.11'
numpy >=1.26.0; python_version>='3.12'
python-dateutil >=2.8.2
pytz >=2020.1
tzdata >=2022.7

RLenv/lib/python3.12/site-packages/pkg_resources/tests/data/my-test-package-source/setup.py pypi

pyproject.toml pypi

requirements.txt pypi

black >=22.0.0
click >=8.0.0
flake8 >=5.0.0
gymnasium >=0.29.0
matplotlib >=3.5.0
numpy >=1.21.0
pandas >=1.3.0
pettingzoo >=1.24.0
pytest >=7.0.0
rich >=12.0.0
seaborn >=0.11.0
torch >=2.0.0
wandb >=0.15.0

setup.py pypi

click *
gymnasium *
pandas *
rich *
torch *
