GridWorlds

Help! I'm lost in the flatland!

https://github.com/juliareinforcementlearning/gridworlds.jl

Keywords

grid-world gridworld gridworld-environment hacktoberfest julia makie reinforcement-learning

Keywords from Contributors

deep-q-network deep-reinforcement-learning numerical flux interpretability standardization animal hack

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: JuliaReinforcementLearning
License: mit
Language: Julia
Default Branch: master
Homepage:
Size: 34.7 MB

Statistics

Stars: 47
Watchers: 7
Forks: 9
Open Issues: 13
Releases: 7

Created almost 6 years ago · Last pushed about 3 years ago

GridWorlds

A package for creating grid world environments for reinforcement learning in Julia. This package is designed to be lightweight and fast.

This package is inspired by gym-minigrid. In order to cite this package, please refer to the file CITATION.bib. Starring the repository on GitHub is also appreciated. For benchmarks, refer to benchmarks/benchmarks.md.

List of Environments

1. SingleRoomUndirected
2. SingleRoomDirected
3. GridRoomsUndirected
4. GridRoomsDirected
5. SequentialRoomsUndirected
6. SequentialRoomsDirected
7. MazeUndirected
8. MazeDirected
9. GoToTargetUndirected
10. GoToTargetDirected
11. DoorKeyUndirected
12. DoorKeyDirected
13. CollectGemsUndirected
14. CollectGemsDirected
15. CollectGemsMultiAgentUndirected
16. DynamicObstaclesUndirected
17. DynamicObstaclesDirected
18. SokobanUndirected
19. SokobanDirected
20. Snake
21. Catcher
22. TransportUndirected
23. TransportDirected

Getting Started

```julia
import GridWorlds as GW

# Each environment `Env` lives in its own module `EnvModule`
# For example, the `SingleRoomUndirected` environment lives inside the `SingleRoomUndirectedModule` module
env = GW.SingleRoomUndirectedModule.SingleRoomUndirected()

# reset the environment. All environments are randomized
GW.reset!(env)

# get names of actions that can be performed in this environment
GW.get_action_names(env)

# perform actions in the environment
GW.act!(env, 1) # move up
GW.act!(env, 2) # move down
GW.act!(env, 3) # move left
GW.act!(env, 4) # move right

# play an environment interactively inside the terminal
GW.play!(env)

# play and record the interaction in a file called recording.txt
GW.play!(env, file_name = "recording.txt")

# manually step through the frames in the recording
GW.replay(file_name = "recording.txt")

# replay the recording inside the terminal at a given frame rate
GW.replay(file_name = "recording.txt", frame_rate = 2)

# use the RLBase API
import ReinforcementLearningBase as RLBase

# wrap a game instance from this package to create an RLBase compatible environment
rlbase_env = GW.RLBaseEnv(env)

# perform RLBase operations on the wrapped environment
RLBase.reset!(rlbase_env)
state = RLBase.state(rlbase_env)
action_space = RLBase.action_space(rlbase_env)
reward = RLBase.reward(rlbase_env)
done = RLBase.is_terminated(rlbase_env)

rlbase_env(1) # move up
rlbase_env(2) # move down
rlbase_env(3) # move left
rlbase_env(4) # move right
```

Notes

Reinforcement Learning

This package does not intend to reinvent a fully usable reinforcement learning API. Instead, all the games in this package provide the bare minimum needed for the game logic: the ability to reset an environment using GW.reset!(env) and to perform actions in the environment using GW.act!(env, action). To use such a game for reinforcement learning, you would probably use a higher-level reinforcement learning API such as the one offered by the ReinforcementLearning.jl package (the RLBase API). As of this writing, all the environments provide a default implementation of the RLBase API, which means that you can easily wrap a game from GridWorlds.jl and use it directly with the rest of the ReinforcementLearning.jl ecosystem.
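To make the split between game logic and the reinforcement learning API concrete, here is a minimal sketch in plain Julia. Everything in it (`ToyLineGame`, `ToyEnv`, and the local `reset!`, `act!`, `reward`, and `is_terminated` functions) is hypothetical and written for illustration; none of it is a type or function from GridWorlds.jl or RLBase. It only mirrors the idea that a game exposes reset and act operations, and that a thin wrapper adds the reward and termination queries.

```julia
# Hypothetical toy game (not part of GridWorlds.jl): an agent on a line of
# `num_cells` cells that starts at cell 1 and must reach the last cell.
mutable struct ToyLineGame
    num_cells::Int
    position::Int
    reward::Float64
    done::Bool
end

ToyLineGame(num_cells::Integer) = ToyLineGame(num_cells, 1, 0.0, false)

# The bare minimum of game logic: reset the game ...
function reset!(game::ToyLineGame)
    game.position = 1
    game.reward = 0.0
    game.done = false
    return nothing
end

# ... and perform an action. Actions are integers: 1 = move left, 2 = move right.
function act!(game::ToyLineGame, action::Integer)
    delta = action == 1 ? -1 : 1
    game.position = clamp(game.position + delta, 1, game.num_cells)
    game.done = game.position == game.num_cells
    game.reward = game.done ? 1.0 : 0.0
    return nothing
end

# A thin wrapper that adds the queries a higher-level API would expect.
struct ToyEnv
    game::ToyLineGame
end

(env::ToyEnv)(action) = act!(env.game, action)
reward(env::ToyEnv) = env.game.reward
is_terminated(env::ToyEnv) = env.game.done

env = ToyEnv(ToyLineGame(4))
reset!(env.game)
env(2); env(2); env(2)   # move right three times: cell 1 -> 4
println((reward(env), is_terminated(env)))
```

Keeping the game struct free of any learning-specific code is what lets the same game be wrapped by different APIs.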

States

There are a few possible options for representing the state/observation for an environment. You can use the entire tile map. You can also augment that with other environment specific information like the agent's direction, target (in GoToTargetUndirected) etc. In several games, you can also use the GW.get_sub_tile_map! function to get a partial view of the tile map to be used as the observation.
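As an illustration of what a partial view of the tile map could look like, the sketch below crops a square window around a given position from a `BitArray{3}`. The helper `window_view` is hypothetical and is not the package's `GW.get_sub_tile_map!`; in particular, treating cells outside the map as empty and the object ordering used here are assumptions made for the example.

```julia
# Hypothetical helper (not GW.get_sub_tile_map!): copy a (2 * radius + 1)-sided
# window centered at (center_i, center_j). Cells outside the map stay empty.
function window_view(tile_map::BitArray{3}, center_i::Int, center_j::Int, radius::Int)
    num_objects, height, width = size(tile_map)
    side = 2 * radius + 1
    sub_map = falses(num_objects, side, side)
    for dj in -radius:radius, di in -radius:radius
        i, j = center_i + di, center_j + dj
        if 1 <= i <= height && 1 <= j <= width
            sub_map[:, di + radius + 1, dj + radius + 1] = tile_map[:, i, j]
        end
    end
    return sub_map
end

tile_map = falses(2, 4, 4)   # assumed ordering: object 1 = agent, object 2 = wall
tile_map[2, 1, :] .= true    # a wall along the top row
tile_map[1, 2, 2] = true     # agent at row 2, column 2

sub_map = window_view(tile_map, 2, 2, 1)
println(size(sub_map))       # (2, 3, 3)
println(sub_map[1, 2, 2])    # the agent sits at the center of the window
```

A window like this keeps the observation size fixed regardless of the size of the full map.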

All environments provide a default implementation of the RLBase.state function. Before performing reinforcement learning experiments with an environment, make sure you understand exactly what information its state representation contains.

Actions

As of this writing, all actions in all environments are discrete. To keep things simple and consistent, they are represented by elements of Base.OneTo(NUM_ACTIONS) (that is, integers from 1 to NUM_ACTIONS). To find out which action does what, you can call `GW.get_action_names(env)` to get a list of descriptive names. For example:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_action_names(env)
(:MOVE_UP, :MOVE_DOWN, :MOVE_LEFT, :MOVE_RIGHT)
```

The order of elements in this list corresponds to that of the actions.

Rewards and Termination

As mentioned before, in order to use these for reinforcement learning experiments, you would mostly be using a higher-level API like RLBase, which should already provide a way to get these values. For example, in RLBase, rewards can be accessed using RLBase.reward(env), and checking whether an environment has terminated can be done by calling RLBase.is_terminated(env). If you are using some other API and need more direct control, it is better to look at the implementation of that environment to see how to access the reward and check for termination.

Tile Map

Each environment contains a tile map, which is a BitArray{3} that encodes information about the presence or absence of objects in the grid world. It is of size (num_objects, height, width). The second and third dimensions correspond to positions along the height and width of the tile map. The first dimension is a multi-hot encoding of which objects are present at a particular position. You can get the name and ordering of objects along the first dimension of the tile map by using the following method:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_object_names(env)
(:AGENT, :WALL, :GOAL)
```
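The multi-hot layout can be tried out with a hand-built `BitArray{3}` in plain Julia. This is only a sketch: the map below is constructed manually rather than taken from an environment, the 5 x 5 size is arbitrary, and `objects_at` is a hypothetical helper, not a function from the package.

```julia
object_names = (:AGENT, :WALL, :GOAL)   # ordering copied from the output above
height, width = 5, 5
tile_map = falses(length(object_names), height, width)   # a BitArray{3}

# Index of each object along the first dimension
AGENT, WALL, GOAL = 1, 2, 3

# Surround the room with walls
tile_map[WALL, [1, height], :] .= true
tile_map[WALL, :, [1, width]] .= true

tile_map[AGENT, 2, 2] = true
tile_map[GOAL, 4, 4] = true

# Hypothetical helper: names of all objects present at position (i, j)
objects_at(tile_map, i, j) =
    [object_names[k] for k in 1:size(tile_map, 1) if tile_map[k, i, j]]

println(objects_at(tile_map, 1, 1))           # only a wall in the corner
println(findfirst(tile_map[AGENT, :, :]))     # CartesianIndex(2, 2)

# Multi-hot: more than one object can be present at the same position
tile_map[AGENT, 2, 2] = false
tile_map[AGENT, 4, 4] = true
println(objects_at(tile_map, 4, 4))           # agent and goal share a cell
```

Because each object has its own slice along the first dimension, locating an object or testing for overlap reduces to simple indexing.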

Navigation

Several environments contain the word Undirected or Directed within their name. This refers to the navigation style of the agent. Undirected means that the agent has no direction associated with it, and navigates around by directly moving up, down, left, or right on the tile map. Directed means that the agent has a direction associated with it, and it navigates around by moving forward or backward along its current direction, or it could also turn left or right with respect to its current direction. There are 4 directions - UP, DOWN, LEFT, and RIGHT.
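The directed navigation style can be sketched as follows. The integer encoding of directions, the offset table, and the helper names `turn_left`, `turn_right`, and `move` are assumptions made for this illustration and may differ from how the package represents directions internally. Walls and map bounds are ignored here.

```julia
# Assumed encoding: directions are integers 0..3 in counter-clockwise order.
const DIRECTION_NAMES = (:RIGHT, :UP, :LEFT, :DOWN)

# (row, column) displacement for each direction; row 1 is the top of the map
const DIRECTION_OFFSETS = ((0, 1), (-1, 0), (0, -1), (1, 0))

turn_left(direction) = mod(direction + 1, 4)
turn_right(direction) = mod(direction - 1, 4)

# Move one cell forward (or backward) along the current direction
function move(position, direction; backward = false)
    di, dj = DIRECTION_OFFSETS[direction + 1]
    s = backward ? -1 : 1
    return (position[1] + s * di, position[2] + s * dj)
end

position, direction = (3, 3), 1                          # facing UP
position = move(position, direction)                     # forward to (2, 3)
direction = turn_left(direction)                         # now facing LEFT
position = move(position, direction)                     # forward to (2, 2)
position = move(position, direction; backward = true)    # backward to (2, 3)
println((position, DIRECTION_NAMES[direction + 1]))
```

An undirected agent needs none of the direction state: each of its four actions applies one of the offsets in `DIRECTION_OFFSETS` directly.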

Interactive Playing and Recording

All the environments can be played directly inside the REPL. These interactive sessions can also be recorded in plain text files and replayed in the terminal. There are two ways to replay a recording:

1. The default way is to manually step through each recorded frame. This allows you to move through the frames one by one at your own pace using keyboard inputs.
2. The second way is to replay the frames at a given frame rate. This loops through all the frames once and then (and only then) exits the replay.

Programmatic Recording of Agent's Behavior

In order to programmatically record the behavior of an agent during an episode, you can simply log the string representation of the environment at each step, prefixed with a delimiter. You can also log other arbitrary information if you want, such as the total reward so far. You can then use the GW.replay function to replay the recording inside the terminal. The string representation of an environment can be obtained using repr(MIME"text/plain"(), env). Here is an example:

```julia
import GridWorlds as GW
import ReinforcementLearningBase as RLBase

game = GW.SingleRoomUndirectedModule.SingleRoomUndirected()
env = GW.RLBaseEnv(game)
frame_start_delimiter = "SOME_FRAME_START_DELIMITER"

total_reward = zero(RLBase.reward(env))
frame_number = 1

str = ""

# log the initial frame
str = str * frame_start_delimiter
str = str * "frame_number: $(frame_number)\n"
str = str * repr(MIME"text/plain"(), env)
str = str * "\ntotal_reward: $(total_reward)"

# take random actions until the episode ends, logging one frame per step
while !RLBase.is_terminated(env)
    action = rand(RLBase.action_space(env))
    env(action)
    reward = RLBase.reward(env)

    global total_reward += reward
    global frame_number += 1

    global str = str * frame_start_delimiter
    global str = str * "frame_number: $(frame_number)\n"
    global str = str * repr(MIME"text/plain"(), env)
    global str = str * "\ntotal_reward: $(total_reward)"
end

write("recording.txt", str)

GW.replay(file_name = "recording.txt", frame_start_delimiter = frame_start_delimiter)
```

In ReinforcementLearning.jl, you can create a hook for recording the agent's behavior at any point during training.
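To see why each logged frame is prefixed with a delimiter, the sketch below builds a tiny recording by hand and splits it back into frames using only Base Julia. The frame contents are made-up placeholder strings, and splitting with `split` is an assumption about how such a file can be read back, not a description of how GW.replay parses recordings.

```julia
frame_start_delimiter = "SOME_FRAME_START_DELIMITER"

# Made-up frame pictures standing in for the string representation of an environment
pictures = ["A..G", ".A.G", "..AG"]

# Each frame is the delimiter followed by arbitrary text, as in the recording format above
recording = join(
    frame_start_delimiter * "frame_number: $(n)\n" * picture
    for (n, picture) in enumerate(pictures)
)

# Splitting on the delimiter recovers one string per frame
frames = split(recording, frame_start_delimiter; keepempty = false)

println(length(frames))   # 3
println(frames[2])
```

The delimiter should be a string that cannot occur inside a frame, otherwise a frame would be cut in two when the recording is split.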

List of Environments

SingleRoomUndirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

SingleRoomDirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

GridRoomsUndirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

GridRoomsDirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

SequentialRoomsUndirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

SequentialRoomsDirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

MazeUndirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

MazeDirected

The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.

GoToTargetUndirected

The objective of the agent is to navigate its way to the desired target. When the agent reaches the desired target, it receives a reward of 1. When the agent reaches the other target, it receives a reward of -1. In either case, the environment terminates upon reaching a target.

GoToTargetDirected

The objective of the agent is to navigate its way to the desired target. When the agent reaches the desired target, it receives a reward of 1. When the agent reaches the other target, it receives a reward of -1. In either case, the environment terminates upon reaching a target.

DoorKeyUndirected

The objective of the agent is to collect the key and navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. Without picking up the key, the agent will not be able to pass through the door that separates the agent and the goal.

DoorKeyDirected

The objective of the agent is to collect the key and navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. Without picking up the key, the agent will not be able to pass through the door that separates the agent and the goal.

CollectGemsUndirected

The objective of the agent is to collect all the randomly scattered gems. When the agent collects a gem, it receives a reward of 1. The environment terminates when the agent has collected all the gems.

CollectGemsDirected

The objective of the agent is to collect all the randomly scattered gems. When the agent collects a gem, it receives a reward of 1. The environment terminates when the agent has collected all the gems.

CollectGemsMultiAgentUndirected

The objective of the agents is to collect all the randomly scattered gems. The agents take turns performing actions. When an agent collects a gem, the environment gives a reward of 1. The environment terminates when the agents have collected all the gems.

DynamicObstaclesUndirected

The objective of the agent is to navigate its way to the goal while avoiding collision with obstacles. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. If the agent collides with an obstacle, the agent receives a reward of -1 and the environment terminates.

DynamicObstaclesDirected

The objective of the agent is to navigate its way to the goal while avoiding collision with obstacles. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. If the agent collides with an obstacle, the agent receives a reward of -1 and the environment terminates.

SokobanUndirected

The agent needs to push the boxes onto the target positions. The levels are taken from https://github.com/deepmind/boxoban-levels. Upon each reset, a level is randomly selected from https://github.com/deepmind/boxoban-levels/blob/master/medium/train/000.txt. The level dataset can be dynamically swapped during runtime in case more levels are needed. One way to achieve this while using ReinforcementLearning.jl is with the help of hooks.

SokobanDirected

The agent needs to push the boxes onto the target positions. The levels are taken from https://github.com/deepmind/boxoban-levels. Upon each reset, a level is randomly selected from https://github.com/deepmind/boxoban-levels/blob/master/medium/train/000.txt. The level dataset can be dynamically swapped during runtime in case more levels are needed. One way to achieve this while using ReinforcementLearning.jl is with the help of hooks.

Snake

The objective of the agent is to eat as many food pellets as possible. As soon as the agent eats a food pellet, the length of its body increases by one and it receives a reward of 1. When the agent tries to move into a wall or into its body, it receives a reward of -tile_map_height * tile_map_width and the environment terminates. When the agent collects all the food pellets possible, it receives a reward of tile_map_height * tile_map_width + 1 (for the last food pellet it ate).

Catcher

The objective of the agent is to keep catching the falling gems for as long as possible. It receives a reward of 1 when it catches a gem and a new gem gets spawned in the next step. When the agent misses catching a gem, it receives a reward of -1 and the environment terminates.

TransportUndirected

The objective of the agent is to pick up the gem and drop it at the target location. When the agent drops the gem at the target location, it receives a reward of 1 and the environment terminates.

TransportDirected

The objective of the agent is to pick up the gem and drop it at the target location. When the agent drops the gem at the target location, it receives a reward of 1 and the environment terminates.

Owner

Name: JuliaReinforcementLearning
Login: JuliaReinforcementLearning
Kind: organization

Website: https://juliareinforcementlearning.org/
Repositories: 34
Profile: https://github.com/JuliaReinforcementLearning

A collection of tools for reinforcement learning research in Julia

Citation (CITATION.bib)

@misc{Bhatia2020GridWorlds,
  author={Bhatia, Siddharth and Tian, Jun and contributors},
  title={GridWorlds.jl: A package for creating grid worlds for reinforcement learning in Julia},
  year=2020,
  url={https://github.com/JuliaReinforcementLearning/GridWorlds.jl}
}

Committers

Last synced: about 1 year ago

All Time

Total Commits: 141
Total Committers: 9
Avg Commits per committer: 15.667
Development Distribution Score (DDS): 0.262

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Sid-Bhatia-0	3****0	104
Sriram	s**k@g**m	11
Jun Tian	f**y@f**m	8
github-actions[bot]	4****]	7
Jun Tian	t**p@g**m	4
Jun Tian	j**i@m**m	2
Raj Ghugare	r**t@g**m	2
Sriyash Poddar	4****1	2
Ben Landrum	3****b	1

Committer Domains (Top 20 + Academic)

microsoft.com: 1
foxmail.com: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 20
Total pull requests: 81
Average time to close issues: 3 months
Average time to close pull requests: 2 days
Total issue authors: 4
Total pull request authors: 6
Average comments per issue: 2.6
Average comments per pull request: 1.16
Merged pull requests: 73
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Sid-Bhatia-0 (13)
findmyway (5)
almostintuitive (1)
JuliaTagBot (1)

Pull Request Authors

Sid-Bhatia-0 (74)
github-actions[bot] (3)
kharyal (1)
sriyash421 (1)
LooseTerrifyingSpaceMonkey (1)
findmyway (1)

Top Labels

Issue Labels

new-environment (4) bug (1) hacktoberfest (1)

Packages

Total packages: 1
Total downloads:
- julia 7 total

Total dependent packages: 1
Total dependent repositories: 0
Total versions: 7

juliahub.com: GridWorlds

Documentation: https://docs.juliahub.com/General/GridWorlds/stable/
License: MIT
Latest release: 0.5.0
published almost 5 years ago

Versions: 7
Dependent Packages: 1
Dependent Repositories: 0
Downloads: 7 Total

Rankings

Dependent repos count: 9.9%

Stargazers count: 15.0%

Average: 16.0%

Forks count: 16.2%

Dependent packages count: 23.0%

GridWorlds

Science Score: 18.0%
