https://github.com/berndsaurugger/muzero-general
MuZero
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.4%) to scientific vocabulary
Last synced: 7 months ago
Repository
MuZero
Basic Info
- Host: GitHub
- Owner: BerndSaurugger
- License: mit
- Default Branch: master
- Homepage: https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
- Size: 7.09 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of werner-duvaud/muzero-general
Created over 3 years ago · Last pushed over 3 years ago
https://github.com/BerndSaurugger/muzero-general/blob/master/
# MuZero General

A commented and [documented](https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation) implementation of MuZero based on the Google DeepMind [paper](https://arxiv.org/abs/1911.08265) (Schrittwieser et al., Nov 2019) and the associated [pseudocode](https://arxiv.org/src/1911.08265v2/anc/pseudocode.py). It is designed to be easily adaptable to any game or reinforcement learning environment (like [gym](https://github.com/openai/gym)). You only need to add a [game file](https://github.com/werner-duvaud/muzero-general/tree/master/games) with the hyperparameters and the game class. Please refer to the [documentation](https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation) and the [example](https://github.com/werner-duvaud/muzero-general/blob/master/games/cartpole.py). This implementation is primarily for educational purposes.

[Explanatory video of MuZero](https://youtu.be/We20YSAJZSE)

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to [AlphaZero](https://arxiv.org/abs/1712.01815) but works without any knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also close to [Value prediction networks](https://arxiv.org/abs/1707.03497). See [How it works](https://github.com/werner-duvaud/muzero-general/wiki/How-MuZero-works).
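The paper splits this learned model into three functions: a representation function h that encodes an observation into an initial hidden state, a dynamics function g that maps a hidden state and an action to a predicted reward and the next hidden state, and a prediction function f that maps a hidden state to a policy and a value. The following is a toy sketch of how these pieces are chained when the model is unrolled over a sequence of actions. The arithmetic stand-ins for h, g and f, and the names `representation`, `dynamics`, `prediction` and `unroll`, are hypothetical; in this repository the same roles are played by PyTorch networks (fully connected or residual).

```python
from typing import List, Tuple

Hidden = List[float]


def representation(observation: List[float]) -> Hidden:
    """h: observation -> initial hidden state (toy: halve each entry)."""
    return [0.5 * x for x in observation]


def dynamics(state: Hidden, action: int) -> Tuple[Hidden, float]:
    """g: (hidden state, action) -> (next hidden state, predicted reward)."""
    next_state = [x + action for x in state]    # toy transition
    reward = sum(next_state) / len(next_state)  # toy reward head
    return next_state, reward


def prediction(state: Hidden, num_actions: int) -> Tuple[List[float], float]:
    """f: hidden state -> (policy over actions, value)."""
    policy = [1.0 / num_actions] * num_actions  # toy uniform policy head
    value = sum(state)                          # toy value head
    return policy, value


def unroll(observation: List[float], actions: List[int], num_actions: int = 2):
    """Apply h once, then alternate g and f for each action.

    Returns one (reward, value, policy) tuple per step; the initial step
    has no transition, so its reward is 0.0.
    """
    state = representation(observation)
    policy, value = prediction(state, num_actions)
    outputs = [(0.0, value, policy)]
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = prediction(state, num_actions)
        outputs.append((reward, value, policy))
    return outputs
```

For example, `unroll([2.0, 4.0], [1, 0])` returns `[(0.0, 3.0, [0.5, 0.5]), (2.5, 5.0, [0.5, 0.5]), (2.5, 5.0, [0.5, 0.5])]`. Because g operates on hidden states rather than real observations, planning over such an unrolled model never queries the true environment rules.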
## Features

* [x] Residual network and fully connected network in [PyTorch](https://github.com/pytorch/pytorch)
* [x] Multi-threaded/asynchronous/[cluster](https://docs.ray.io/en/latest/cluster-index.html) with [Ray](https://github.com/ray-project/ray)
* [x] Multi-GPU support for training and self-play
* [x] TensorBoard real-time monitoring
* [x] Model weights automatically saved at checkpoints
* [x] Single-player and two-player modes
* [x] Commented and [documented](https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation)
* [x] Easily adaptable for new games
* [x] [Examples](https://github.com/werner-duvaud/muzero-general/blob/master/games/cartpole.py) of board games, Gym and Atari games (see [list of implemented games](https://github.com/werner-duvaud/muzero-general#games-already-implemented))
* [x] [Pretrained weights](https://github.com/werner-duvaud/muzero-general/tree/master/results) available
* [ ] Windows support (experimental; workaround: use the [notebook](https://github.com/werner-duvaud/muzero-general/blob/master/notebook.ipynb) in [Google Colab](https://colab.research.google.com))

### Further improvements

Here is a list of features that could be worth adding but are not in the MuZero paper. We are open to contributions and other ideas.
* [x] [Hyperparameter search](https://github.com/werner-duvaud/muzero-general/wiki/Hyperparameter-Optimization)
* [x] [Continuous action space](https://github.com/werner-duvaud/muzero-general/tree/continuous)
* [x] [Tool to understand the learned model](https://github.com/werner-duvaud/muzero-general/blob/master/diagnose_model.py)
* [ ] Batch MCTS
* [ ] Support for games with more than two players

## Demo

All performances are tracked and displayed in real time in [TensorBoard](https://www.tensorflow.org/tensorboard).

Testing Lunar Lander:

[Figure: Testing Lunar Lander]

## Games already implemented

* Cartpole (Tested with the fully connected network)
* Lunar Lander (Tested in deterministic mode with the fully connected network)
* Gridworld (Tested with the fully connected network)
* Tic-tac-toe (Tested with the fully connected network and the residual network)
* Connect4 (Slightly tested with the residual network)
* Gomoku
* Twenty-One / Blackjack (Tested with the residual network)
* Atari Breakout

Tests are done on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050Ti Max-Q. We make sure that training shows progress and reaches a level indicating that the agent has learned, but we do not systematically reach human level. For certain environments, we notice a regression after some time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.

## Code structure

[Figure: Code structure]

Network summary:

[Figure: Network summary]

## Getting started

### Installation

```bash
git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.lock
```

### Run

```bash
python muzero.py
```

To visualize the training results, run in a new terminal:

```bash
tensorboard --logdir ./results
```

### Config

You can adapt the configurations of each game by editing the `MuZeroConfig` class of the respective file in the [games folder](https://github.com/werner-duvaud/muzero-general/tree/master/games).
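A game file pairs such a configuration class with a game class. The following is a minimal, hypothetical sketch of that pairing for a toy corridor game. The attribute names (`seed`, `observation_shape`, `action_space`, `players`, `num_simulations`, `discount`, `max_moves`) and a `step` method returning `(observation, reward, done)` are assumed to mirror the repository's example game files, but they are only a small subset and should be checked against [cartpole.py](https://github.com/werner-duvaud/muzero-general/blob/master/games/cartpole.py) before use. The corridor game and all numeric values are invented for illustration.

```python
class MuZeroConfig:
    """Hypothetical, heavily reduced set of hyperparameters for the toy game."""

    def __init__(self):
        self.seed = 0                        # seed for reproducibility
        self.observation_shape = (1, 1, 4)   # (channels, height, width)
        self.action_space = list(range(2))   # 0 = move left, 1 = move right
        self.players = list(range(1))        # single-player game
        self.num_simulations = 25            # MCTS simulations per move (illustrative)
        self.discount = 0.997                # reward discount (illustrative)
        self.max_moves = 20                  # cap on episode length (illustrative)


class Game:
    """Toy corridor of 4 cells: start at cell 0, reward 1.0 on reaching cell 3."""

    def __init__(self, seed=None):
        # The seed is accepted but unused: this toy game is deterministic.
        self.length = 4
        self.position = 0

    def legal_actions(self):
        # Both moves are always allowed; the walls simply block movement.
        return [0, 1]

    def reset(self):
        self.position = 0
        return self.observation()

    def observation(self):
        # One-hot position, nested to match observation_shape (1, 1, 4).
        return [[[1.0 if i == self.position else 0.0 for i in range(self.length)]]]

    def step(self, action):
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.observation(), reward, done
```

Moving right three times after `reset()` ends the episode with reward `1.0`. A real game file would typically wrap an existing environment (for example a Gym environment) inside `reset` and `step` instead of hand-writing the dynamics, and defines many more hyperparameters than shown here.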
## Related work

* [EfficientZero](https://arxiv.org/abs/2111.00210) (Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao)
* [Sampled MuZero](https://arxiv.org/abs/2104.06303) (Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver)

## Authors

* Werner Duvaud
* Aurèle Hainaut
* Paul Lenoir
* [Contributors](https://github.com/werner-duvaud/muzero-general/graphs/contributors)

Please use this BibTeX entry if you want to cite this repository (master branch) in your publications:

```bibtex
@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}
```

## Getting involved

* [GitHub Issues](https://github.com/werner-duvaud/muzero-general/issues): For reporting bugs.
* [Pull Requests](https://github.com/werner-duvaud/muzero-general/pulls): For submitting code contributions.
* [Discord server](https://discord.gg/GB2vwsF): For discussions about development or any general questions.
Owner
- Login: BerndSaurugger
- Kind: user
- Repositories: 1
- Profile: https://github.com/BerndSaurugger