Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.6%) to scientific vocabulary
Keywords
Repository
Modular Extensible Reinforcement Learning Interface
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 3
Topics
Metadata Files
README.md
MERLIn
MERLIn, short for Modular Extensible Reinforcement Learning Interface, allows you to easily define and run reinforcement learning experiments on top of PyTorch and Gym.
This project started as a homework assignment for a reinforcement learning module during my Master's studies. I made it public, hoping you find it useful or interesting.
Usage
0. Install
MERLIn uses poetry for dependency management.
To install all dependencies, run:
```sh
poetry install
```
1. Configure experiments
Experiments can be defined as YAML files merged with the default
configuration before being passed into the main training loop. Parameters are
identical to the attributes of the Config class, and a table of all parameters is
given further down.
Example:
experiments/experiment_one.yaml
```yaml
max_episodes: 1000
agent_name: dueling_dqn
alpha: 0.05
```
This will train the agent dueling_dqn for 1000 episodes at a learning rate
alpha of 0.05, while all other parameters will fall back to their default values
as defined in the Config class.
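For orientation, the merge-with-defaults idea can be illustrated with a minimal sketch. This is not MERLIn's actual loading code; the Config fields shown and the load_experiment helper are assumptions for illustration only.

```py
# Minimal sketch of merging a YAML experiment file over dataclass defaults.
# Field names mirror the parameter table below; MERLIn's real Config differs.
from dataclasses import dataclass, replace

import yaml


@dataclass
class Config:
    max_episodes: int = 5000
    agent_name: str = "double_dqn"
    alpha: float = 5e-6


def load_experiment(path: str) -> Config:
    with open(path) as fh:
        overrides = yaml.safe_load(fh) or {}
    # Unknown keys raise a TypeError here, catching typos in experiment files early.
    return replace(Config(), **overrides)


cfg = load_experiment("experiments/experiment_one.yaml")
print(cfg)  # Config(max_episodes=1000, agent_name='dueling_dqn', alpha=0.05)
```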
Nested element definitions
Using the variants array, different flavors of the same base configuration can
be defined as objects in that array. Deeper-nested parameters overwrite those
defined higher up. Variants can be nested.
variants Example
```yaml
max_episodes: 1000
variants:
  - {}
  - alpha: 0.01243
  - max_episodes: 333
    variants:
      - gamma: 0.5
        memory_size: 99000
      - batch_size: 64
```
The above configuration defines the following experiments:
- max_episodes: 1000
- max_episodes: 1000 and alpha: 0.01243
- max_episodes: 333, gamma: 0.5 and memory_size: 99000
- max_episodes: 333 and batch_size: 64
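The exact expansion logic lives inside MERLIn; the following is only a hedged sketch of how such a nested variants structure can be flattened into individual experiment configurations (the function name and details are assumptions):

```py
# Illustrative sketch: recursively flatten a config with nested "variants"
# into one flat dict per experiment. Not MERLIn's actual implementation.
def expand_variants(config: dict) -> list[dict]:
    base = {k: v for k, v in config.items() if k != "variants"}
    variants = config.get("variants")
    if not variants:
        return [base]
    expanded = []
    for variant in variants:
        # deeper-nested keys overwrite those defined higher up
        expanded.extend(expand_variants({**base, **variant}))
    return expanded


config = {
    "max_episodes": 1000,
    "variants": [
        {},
        {"alpha": 0.01243},
        {
            "max_episodes": 333,
            "variants": [
                {"gamma": 0.5, "memory_size": 99000},
                {"batch_size": 64},
            ],
        },
    ],
}
for experiment in expand_variants(config):
    print(experiment)
# {'max_episodes': 1000}
# {'max_episodes': 1000, 'alpha': 0.01243}
# {'max_episodes': 333, 'gamma': 0.5, 'memory_size': 99000}
# {'max_episodes': 333, 'batch_size': 64}
```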
2. Start training
After defining at least one experiment as described in the previous section, start training by simply invoking the following command:
```sh
poetry run train
```
Training in the background
To start training in the background, so that it can proceed beyond the current shell session, run the following script:
```sh
./scripts/traing_bg.sh
```
The script will also watch the generated log statements to provide continuous console output.
3. Results
Console output
During training, the following outputs are continuously logged to the console:
- episode index
- epsilon
- reward
- train loss
- episode steps
- total episode time
Special events like model saving or video recording will also be logged if they occur.
File output
Each experiment will generate a subfolder in the results/ directory. Within
that subfolder, the following files will be placed:
- experiment.yaml: The exact parameters the experiment was run with.
- A log file holding the training logs, as printed to the console (see the previous section).
- Model checkpoints.
- Video files of selected episode runs.
- Images of the preprocessed state (optional).
Statistical Analysis
MERLIn will automatically conduct some crude statistical analysis of the experimental results post-training.
You can manually trigger the analysis by running:
```sh
poetry run analyze <path/to/experiment/results>
```
Analysis results will be written to an analysis/ subfolder of the results directory.
Summarization
As of v1.0.0, the last 2,000 episodes (a hard-coded assumption of where performance plateaus) are used to compare different algorithms.
The statistical analysis will aggregate all runs of each variant and calculate the following (a sketch of this aggregation follows the list):
- mean reward
- std reward
- lower bound of the confidence interval for mean reward
- mean steps
- std steps
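As a point of reference, this kind of aggregation could be computed roughly as in the sketch below, using scipy (which is among the project's dependencies). The function name, input format, and the 95% confidence level are assumptions; MERLIn's actual analysis code may differ.

```py
# Illustrative sketch: aggregate per-run reward/step arrays of one variant
# over the last 2,000 episodes. Not necessarily how MERLIn computes it.
import numpy as np
from scipy import stats

PLATEAU_EPISODES = 2_000


def summarize_variant(run_rewards: list[np.ndarray], run_steps: list[np.ndarray]) -> dict:
    rewards = np.concatenate([r[-PLATEAU_EPISODES:] for r in run_rewards])
    steps = np.concatenate([s[-PLATEAU_EPISODES:] for s in run_steps])
    mean_reward = rewards.mean()
    # Lower bound of a 95% confidence interval for the mean reward.
    ci_low, _ = stats.t.interval(
        0.95, df=len(rewards) - 1, loc=mean_reward, scale=stats.sem(rewards)
    )
    return {
        "mean_reward": mean_reward,
        "std_reward": rewards.std(ddof=1),
        "reward_ci_lower": ci_low,
        "mean_steps": steps.mean(),
        "std_steps": steps.std(ddof=1),
    }
```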
Plots
Line plots of rewards over episodes and histograms showing the reward distribution of all variants are produced.
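Seaborn is listed among the dependencies, so comparable plots could be produced along the lines of the sketch below. The file path and column names ("episode", "reward", "variant") are purely illustrative assumptions, not MERLIn's actual output format.

```py
# Illustrative sketch: reward curve per variant and reward distribution.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("results/experiment_one/rewards.csv")  # hypothetical file

sns.lineplot(data=df, x="episode", y="reward", hue="variant")
plt.savefig("results/experiment_one/analysis/reward_over_episodes.png")
plt.clf()

sns.histplot(data=df, x="reward", hue="variant")
plt.savefig("results/experiment_one/analysis/reward_distribution.png")
```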
Training Parameters
Below is an overview of the parameters to configure experiments.
| Parameter Name | Description | Optional | Default |
|----------------|-------------|----------|---------|
| experiment | Unique id of the experiment. | No | |
| variant | Unique id of the variant of an experiment. | No | |
| run | Unique id of the run of a variant. | Yes | 0 |
| run_count | The number of independent runs of an experiment. | Yes | 3 |
| env_name | The environment to be used. | Yes | 'pong' |
| frame_skip | The number of frames to skip per action. | Yes | 4 |
| input_dim | The input dimension of the model. | Yes | 64 |
| num_stacked_frames | The number of frames to stack. | Yes | 4 |
| step_penalty | Penalty given to the agent per step. | Yes | 0.0 |
| agent_name | The agent to be used. | Yes | 'double_dqn' |
| net_name | The neural network to be used. | Yes | 'linear_deep_net' |
| target_net_update_interval | The number of steps after which the target network should be updated. | Yes | 1024 |
| episodes | The number of episodes to train for. | Yes | 5000 |
| alpha | The learning rate of the agent. | Yes | 5e-6 |
| epsilon_decay_start | The episode to start epsilon decay on. | Yes | 1000 |
| epsilon_step | The absolute value to decrease epsilon by per episode. | Yes | 1e-3 |
| epsilon_min | The minimum epsilon value for epsilon-greedy exploration. | Yes | 0.1 |
| gamma | The discount factor for future rewards. | Yes | 0.99 |
| memory_size | The size of the replay memory. | Yes | 500,000 |
| batch_size | The batch size for learning. | Yes | 32 |
| model_save_interval | The number of steps after which the model should be saved. If None, the model will be saved at the end of the epoch only. | Yes | None |
| video_record_interval | Steps between video recordings. | Yes | 2500 |
| save_state_img | Whether to take images during training. | Yes | False |
| use_amp | Whether to use automatic mixed precision. | Yes | True |
Extending Agents, Environments, and Neural Networks
MERLIn aims to be modular and extensible, meaning you can quickly implement new agents, environments, and neural networks. All you need to do is derive a new class from the respective abstract base class and register it in the corresponding registry.
Example: Implementing a new Neural Network
Create a new Python module, app/nets/new_net.py, holding a new class deriving from BaseNet.
You must provide a unique name via the name property.
```py
from torch import nn

from app.nets.base_net import BaseNet


class NewNet(BaseNet):
    @classmethod
    @property
    def name(cls) -> str:
        return "new_net"  # give it a unique name here

    def _define_net(
        self, state_shape: tuple[int, int, int], num_actions: int
    ) -> nn.Sequential:
        ...  # your PyTorch network definition goes here
```
Add NewNet to the registry of neural networks in app/nets/__init__.py, to make it automatically available to the make_net factory function.
```py
...
net_registry = [
    ...
    NewNet,  # register here
]
...
```
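For orientation, the registry-plus-factory pattern used here might look roughly like the following sketch; this is not MERLIn's actual make_net implementation, and the constructor arguments are assumptions.

```py
# Illustrative sketch of a registry-based factory, not MERLIn's actual code.
net_registry: list[type] = [
    # NewNet and all other registered network classes go here.
]


def make_net(net_name: str, *args, **kwargs):
    """Look up a network class by its unique `name` and instantiate it."""
    for net_cls in net_registry:
        if net_cls.name == net_name:
            return net_cls(*args, **kwargs)
    raise ValueError(f"Unknown net_name: {net_name!r}")
```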
That's it. That simple. From now on, you can use the new network in your experiment definitions:
```yaml
net_name: new_net
```
Scripts
The application comes with several bash scripts that help with certain tasks.
check_cuda.sh & watch_gpu
These scripts print information about the system's current CUDA installation and GPU usage for sanity-checking and troubleshooting.
install_atari.sh
Installs the Atari ROMs used by Gym into the virtual environment.
Sync scripts
Typically, you want to offload the training workload to a cloud virtual machine.
In this regard, sync_up.sh will upload sources and experiments to that machine.
Afterward, the training results can be downloaded to your local system using
sync_down.sh.
The connection data for both sync scripts is configured in the sync.cfg file.
Limitations
This project is now more of a didactic exercise than an attempt to topple
established reinforcement learning frameworks such as RLlib.
As of v1.0.0, the most crucial limitations of MERLIn are:
- Single environment implemented, namely Pong.
- Single class of agents implemented, namely variations of DQN.
- Statistical analysis is rudimentary and does not happen in parallel to training.
Contributions welcome
If you like MERLIn and want to develop it further, feel free to fork and open any pull request. 🤓
Owner
- Name: Ben Felder
- Login: pykong
- Kind: user
- Location: Germany
- Company: Adeptus Mechanicus
- Website: https://resume.github.io/?pykong
- Repositories: 26
- Profile: https://github.com/pykong
Tech priest at the Adeptus Mechanicus, Biochemist, Pythonista, MLOps Guru at day, AI M.Sc. Student at night and Builder of Joyful Tools.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Felder"
    given-names: "Benjamin"
title: "MERLIn - Modular Extensible Reinforcement Learning Interface"
version: 1.2.0
date-released: 2023-09-09
url: "https://github.com/pykong/merlin"
```
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- pykong (5)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- 153 dependencies
- autorom *
- fastapi >=0.80
- gym *
- lightning 2.0.1
- loguru ^0.7.0
- moviepy ^1.0.3
- opencv-python *
- python >3.11.0,<3.12
- pyyaml ^6.0.1
- scipy ^1.10.1
- seaborn ^0.12.2
- statsmodels ^0.14.0
- torch 2.0.0
- triton *