https://github.com/chychen/agents

Efficient Batched Reinforcement Learning in TensorFlow

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (15.5%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Efficient Batched Reinforcement Learning in TensorFlow

Basic Info

Host: GitHub
Owner: chychen
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 149 KB

Statistics

Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Fork of google-research/batch-ppo

Created about 8 years ago · Last pushed about 8 years ago

https://github.com/chychen/agents/blob/master/



TensorFlow Agents
=================

This project provides optimized infrastructure for reinforcement learning. It
extends the [OpenAI gym interface][post-gym] to multiple parallel environments
and allows agents to be implemented in TensorFlow and perform batched
computation. As a starting point, we provide BatchPPO, an optimized
implementation of [Proximal Policy Optimization][post-ppo].

Please cite the [TensorFlow Agents paper][paper-agents] if you use code from
this project in your research:

```bibtex
@article{hafner2017agents,
  title={TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow},
  author={Hafner, Danijar and Davidson, James and Vanhoucke, Vincent},
  journal={arXiv preprint arXiv:1709.02878},
  year={2017}
}
```

Dependencies: Python 2/3, TensorFlow 1.3+, Gym, ruamel.yaml

[paper-agents]: https://arxiv.org/pdf/1709.02878.pdf
[post-gym]: https://blog.openai.com/openai-gym-beta/
[post-ppo]: https://blog.openai.com/openai-baselines-ppo/

Instructions
------------

Clone the repository and run the PPO algorithm by typing:

```shell
python3 -m agents.scripts.train --logdir=/path/to/logdir --config=pendulum
```

The algorithm to use is defined in the configuration and `pendulum` started
here uses the included PPO implementation. Check out more pre-defined
configurations in `agents/scripts/configs.py`.

If you want to resume a previously started run, add the `--timestamp=`
flag to the last command and provide the timestamp in the directory name of
your run.

To visualize metrics start TensorBoard from another terminal, then point your
browser to `http://localhost:2222`:

```shell
tensorboard --logdir=/path/to/logdir --port=2222
```

To render videos and gather OpenAI Gym statistics to upload to the scoreboard,
type:

```shell
python3 -m agents.scripts.visualize --logdir=/path/to/logdir/- --outdir=/path/to/outdir/
```

Modifications
-------------

We release this project as a starting point that makes it easy to implement new
reinforcement learning ideas. These files are good places to start when
modifying the code:

| File | Content |
| ---- | ------- |
| `scripts/configs.py` | Experiment configurations specifying the tasks and algorithms. |
| `scripts/networks.py` | Neural network models. |
| `scripts/train.py` | The executable file containing the training setup. |
| `algorithms/ppo/ppo.py` | The TensorFlow graph for the PPO algorithm. |

To run unit tests and linting, type:

```shell
python2 -m unittest discover -p "*_test.py"
python3 -m unittest discover -p "*_test.py"
python3 -m pylint agents
```

For further questions, please open an issue on Github.

Implementation
--------------

We include a batched interface for OpenAI Gym environments that fully integrates
with TensorFlow for efficient algorithm implementations. This is achieved
through these core components:

- **`agents.tools.wrappers.ExternalProcess`** is an environment wrapper that
  constructs an OpenAI Gym environment inside of an external process. Calls to
  `step()` and `reset()`, as well as attribute access, are forwarded to the
  process and wait for the result. This allows to run multiple environments in
  parallel without being restricted by Python's global interpreter lock.
- **`agents.tools.BatchEnv`** extends the OpenAI Gym interface to batches of
  environments. It combines multiple OpenAI Gym environments, with `step()`
  accepting a batch of actions and returning a batch of observations, rewards,
  done flags, and info objects. If the individual environments live in external
  processes, they will be stepped in parallel.
- **`agents.tools.InGraphBatchEnv`** integrates a batch environment into the
  TensorFlow graph and makes its `step()` and `reset()` functions accessible as
  operations. The current batch of observations, last actions, rewards, and done
  flags is stored in variables and made available as tensors.
- **`agents.tools.simulate()`** fuses the step of an in-graph batch environment
  and a reinforcement learning algorithm together into a single operation to be
  called inside the training loop. This reduces the number of session calls and
  provides a simple way to train future algorithms.

To understand all the code, please make yourself familiar with TensorFlow's
control flow operations, especially [`tf.cond()`][tf-cond],
[`tf.scan()`][tf-scan], and
[`tf.control_dependencies()`][tf-control-dependencies].

[tf-cond]: https://www.tensorflow.org/api_docs/python/tf/cond
[tf-scan]: https://www.tensorflow.org/api_docs/python/tf/scan
[tf-control-dependencies]: https://www.tensorflow.org/api_docs/python/tf/control_dependencies

Disclaimer
----------

This is not an official Google product.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/chychen/agents

Science Score: 10.0%

Repository

Basic Info

Statistics

https://github.com/chychen/agents/blob/master/

Owner

GitHub Events

Total

Last Year