rl2d

A 2d scenario for rl demonstration

https://github.com/fsn9/rl2d

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.0%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: Fsn9
Language: Python
Default Branch: main
Size: 141 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 4 years ago · Last pushed over 3 years ago

Metadata Files

Readme Citation

rl2d

This code launches a Python Tkinter interface that displays the learning process of a Q-learning agent in a 2D grid world.

Two navigational tasks are available for learning, each with its own environment:

* Learning to reach a point in an empty world;
* Learning to reach a point in an obstacle-scattered world.

Run the simulation

To run with default parameters, enter in the terminal: python run_rl2d.py

Parameters

Q-learning:

--learning_rate, default=$0.1$. The learning rate $\alpha \in [0,1]$.
--discount_factor, default=$0.99$. The discount factor $\gamma \in [0,1]$. At the extremes, $\gamma=1$ gives a far-sighted agent that weighs future rewards as much as immediate ones, and $\gamma=0$ gives a myopic agent that considers only the immediate reward.
--episodes, default=4000. The number of learning episodes $e$.
--initial_epsilon, default=1. The initial epsilon $\epsilon_i$ is the exploration probability at the beginning of the learning process. $\epsilon = 1$ represents a totally random agent; $\epsilon = 0$ represents a totally greedy agent.
--final_epsilon, default=0.05. The value of the final exploration probability $\epsilon_f$.
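
To make the roles of these parameters concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is not taken from the rl2d source: the function names, the dictionary-based Q-table, and the linear decay from $\epsilon_i$ to $\epsilon_f$ are all assumptions made for illustration, and the repository may anneal epsilon differently.

```python
import random


def epsilon_for_episode(episode, episodes=4000, initial_epsilon=1.0, final_epsilon=0.05):
    """Anneal epsilon linearly over the episodes (assumed schedule, not confirmed by rl2d)."""
    if episodes <= 1:
        return final_epsilon
    fraction = min(episode / (episodes - 1), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


def choose_action(q_table, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


def q_update(q_table, state, action, reward, next_state, actions,
             learning_rate=0.1, discount_factor=0.99, done=False):
    """Apply one Q-learning update: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    target = reward + discount_factor * best_next
    q_table[(state, action)] = old + learning_rate * (target - old)
    return q_table[(state, action)]
```

With the defaults above, the first episode explores fully at random ($\epsilon = 1$) and the last one explores 5% of the time; a larger learning rate makes each update move the stored value further toward the new target.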

Environment:

--env_type, default='empty'. The type of the environment $e_t \in \{\text{empty}, \text{obstacle}\}$.
--env_dim, default=5. The environment dimension $e_d \in [3,9]$.
--num_obstacles, default=2. The number of obstacles $n_o \ge 0$ in the environment.
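
The flags and defaults listed above could be declared and range-checked along the following lines. This is a hypothetical sketch using the standard argparse module, not the actual contents of run_rl2d.py: the argument types and the `build_parser` and `validate` helpers are assumptions, and only the flag names, defaults, and ranges come from this README.

```python
import argparse


def build_parser():
    """Declare the documented flags with their documented defaults (types are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the rl2d command-line options")
    # Q-learning parameters
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--discount_factor", type=float, default=0.99)
    parser.add_argument("--episodes", type=int, default=4000)
    parser.add_argument("--initial_epsilon", type=float, default=1.0)
    parser.add_argument("--final_epsilon", type=float, default=0.05)
    # Environment parameters
    parser.add_argument("--env_type", choices=["empty", "obstacle"], default="empty")
    parser.add_argument("--env_dim", type=int, default=5)
    parser.add_argument("--num_obstacles", type=int, default=2)
    return parser


def validate(args):
    """Check the ranges stated in the parameter list; raise ValueError when one is violated."""
    for name in ("learning_rate", "discount_factor", "initial_epsilon", "final_epsilon"):
        if not 0.0 <= getattr(args, name) <= 1.0:
            raise ValueError(f"--{name} must be in [0, 1]")
    if not 3 <= args.env_dim <= 9:
        raise ValueError("--env_dim must be in [3, 9]")
    if args.num_obstacles < 0:
        raise ValueError("--num_obstacles must be >= 0")
    return args
```

For example, `validate(build_parser().parse_args(["--env_type", "obstacle", "--env_dim", "7"]))` yields a namespace with `env_dim == 7` and every other parameter at its default.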

Results

After training, a timestamped folder runs/run-<year>_<month>_<day>_<hour>_<minute>_<second>/ is created with results from the training and evaluation procedure:

* The trajectories per evaluation scene
* Metric plots
* Average cumulative reward
* Average steps
* Ending causes (e.g., collision or goal reaching)
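
A folder name following the pattern above can be built with the standard datetime module, as in the sketch below. The helper names are hypothetical, and the zero-padded fields are an assumption; the repository may format the timestamp differently.

```python
from datetime import datetime
from pathlib import Path


def run_dir_name(now=None):
    """Build 'run-<year>_<month>_<day>_<hour>_<minute>_<second>' (zero padding assumed)."""
    now = now or datetime.now()
    return now.strftime("run-%Y_%m_%d_%H_%M_%S")


def create_run_dir(base="runs", now=None):
    """Create the timestamped results folder under the base directory and return its path."""
    path = Path(base) / run_dir_name(now)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Because the fields run from year down to second, sorting the folder names alphabetically also sorts the runs chronologically when the fields are zero-padded.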

Owner

Name: Francisco Neves
Login: Fsn9
Kind: user
Location: Porto

Repositories: 2
Profile: https://github.com/Fsn9

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: Francisco Neves
title: "rl2d: Learning to navigate in a 2D world using RL"
version: 1.0
date-released: 2022-10-10
