rl2d

A 2D scenario for reinforcement learning (RL) demonstration

https://github.com/fsn9/rl2d

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.0%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: Fsn9
  • Language: Python
  • Default Branch: main
  • Size: 141 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Citation

README.md

rl2d

This code launches a Python Tkinter interface that displays the learning process of a Q-learning agent in a 2D grid world.

Two navigational tasks are available for learning, each with its own environment:

  • Learning to reach a point in an empty world
  • Learning to reach a point in an obstacle-scattered world


Run the simulation

To run with default parameters, enter in the terminal: python run_rl2d.py

[Image: rl2d_gridworld]


Parameters

Q-learning:

  • --learning_rate, default=$0.1$. The learning rate $\alpha \in [0,1]$.
  • --discount_factor, default=$0.99$. The discount factor $\gamma \in [0,1]$. At the extremes, $\gamma=1$ gives a far-sighted agent that weights future rewards as much as immediate ones, while $\gamma=0$ gives a myopic agent that considers only the immediate reward.
  • --episodes, default=4000. The number of learning episodes $e$.
  • --initial_epsilon, default=1. The initial epsilon $\epsilon_i$ is the exploration probability at the beginning of the learning process. $\epsilon = 1$ corresponds to a fully random agent; $\epsilon = 0$ corresponds to a fully greedy agent.
  • --final_epsilon, default=0.05. The value of the final exploration probability $\epsilon_f$.
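
To make the roles of $\alpha$, $\gamma$, and $\epsilon$ concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. It is not taken from the repository: the function names (`epsilon_at`, `choose_action`, `q_update`) are hypothetical, and the linear decay from $\epsilon_i$ to $\epsilon_f$ is an assumption, since the README does not say how epsilon moves between its initial and final values.

```python
import random


def epsilon_at(episode, episodes, initial_epsilon=1.0, final_epsilon=0.05):
    """Exploration probability for a given episode.

    ASSUMPTION: linear interpolation between the initial and final values;
    the actual schedule used by rl2d is not documented in the README.
    """
    if episodes <= 1:
        return final_epsilon
    fraction = episode / (episodes - 1)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, done,
             learning_rate=0.1, discount_factor=0.99):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a list of rows, one per state, each holding one value per action.
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + discount_factor * best_next
    q[state][action] += learning_rate * (target - q[state][action])
    return q[state][action]
```

With `discount_factor=0` the target reduces to the immediate reward (the myopic agent), and with `epsilon=0` `choose_action` always returns the highest-valued action (the greedy agent).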

Environment:

  • --env_type, default='empty'. The type of the environment $e_t \in \{\text{empty}, \text{obstacle}\}$.
  • --env_dim, default=5. The environment dimension $e_d \in [3,9]$.
  • --num_obstacles, default=2. The number of obstacles $n_o \ge 0$ in the environment.
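
The flags and defaults listed above could be declared with `argparse` roughly as follows. This is an illustrative sketch, not the contents of `run_rl2d.py`: the helper `bounded_int` and the function `build_parser` are hypothetical names, and enforcing the documented ranges at parse time is an assumption about how validation might be done.

```python
import argparse


def bounded_int(low, high=None):
    """Build an argparse type that rejects integers outside [low, high]."""
    def parse(text):
        value = int(text)
        if value < low or (high is not None and value > high):
            raise argparse.ArgumentTypeError(f"{value} is outside the allowed range")
        return value
    return parse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="rl2d parameters (illustrative sketch)")
    # Q-learning parameters
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--discount_factor", type=float, default=0.99)
    parser.add_argument("--episodes", type=int, default=4000)
    parser.add_argument("--initial_epsilon", type=float, default=1.0)
    parser.add_argument("--final_epsilon", type=float, default=0.05)
    # Environment parameters
    parser.add_argument("--env_type", choices=["empty", "obstacle"], default="empty")
    parser.add_argument("--env_dim", type=bounded_int(3, 9), default=5)
    parser.add_argument("--num_obstacles", type=bounded_int(0), default=2)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Under this sketch, a non-default run would be requested with a command such as `python run_rl2d.py --env_type obstacle --env_dim 7 --num_obstacles 3`, using only the flags documented above.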

Results

After training, a timestamped folder runs/run-<year>_<month>_<day>_<hour>_<minute>_<second>/ is created with the results of the training and evaluation procedure:

  • The trajectories per evaluation scene
  • Metric plots:
    • Average cumulative reward
    • Average steps
    • Ending causes (e.g., collision or goal reaching)
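
A folder name following the runs/run-<year>_<month>_<day>_<hour>_<minute>_<second>/ pattern could be produced as in the sketch below. The function names `run_dir_name` and `make_run_dir` are hypothetical, and zero-padding each field (for example `09` rather than `9`) is an assumption; the README only specifies the order of the fields.

```python
from datetime import datetime
from pathlib import Path


def run_dir_name(now=None):
    """Return a name such as 'run-2022_10_10_09_05_03' for the given time.

    ASSUMPTION: fields are zero-padded, as strftime does by default.
    """
    now = now or datetime.now()
    return now.strftime("run-%Y_%m_%d_%H_%M_%S")


def make_run_dir(root="runs", now=None):
    """Create the timestamped results folder under root and return its path."""
    path = Path(root) / run_dir_name(now)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Because the fields run from year down to second, sorting the folder names alphabetically also sorts the runs chronologically when the fields are zero-padded.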

Owner

  • Name: Francisco Neves
  • Login: Fsn9
  • Kind: user
  • Location: Porto

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: Francisco Neves
title: "rl2d: Learning to navigate in a 2D world using RL"
version: 1.0
date-released: 2022-10-10
