https://github.com/evan-wang-13/deep-racing

Self-driving racecar using reinforcement learning (proximal policy optimization) in PyTorch


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: evan-wang-13
  • License: MIT
  • Default Branch: master
  • Homepage:
  • Size: 11.3 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of JinayJain/deep-racing
Created almost 2 years ago · Last pushed almost 4 years ago

https://github.com/evan-wang-13/deep-racing/blob/master/

# Self-Driving Racecar with Proximal Policy Optimization

Solving the OpenAI Gym [CarRacing-v0](https://gym.openai.com/envs/CarRacing-v0) environment using Proximal Policy Optimization.

Read the [full report](extra/report.pdf).

## Demo

![Video Demo](extra/demo.gif)

See the full video demo on [YouTube](https://youtu.be/s1uKkmNiNhM).

## Results

After 5000 training steps, the agent achieves a mean score of 909.48 ± 10.30 over 100 episodes. To reproduce the results, run the following commands:

```
mkdir logs
python demo.py --ckpt extra/final_weights.pt --delay_ms 0
```

Results from episodes will be saved to `logs/episode_rewards.csv`.

## Implementation Details

-   A convolutional neural network to jointly approximate the value function and the policy.
-   Optimization is performed using [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347).
-   Policy network outputs parameters to a Beta distribution, [which is better for bounded continuous action spaces](https://proceedings.mlr.press/v70/chou17a/chou17a.pdf).
-   Advantage estimation is done through the [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438) algorithm.
-   A series of 4 frames are concatenated to form the input to the network, with frame skipping optionally applied.
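The first three bullets can be sketched together as a shared-trunk actor-critic network whose policy head parameterizes a Beta distribution. This is a minimal illustration, not the repository's exact architecture: the layer sizes, the assumed input of 4 stacked 96×96 grayscale frames, and the `ActorCritic` name are all assumptions made here.

```python
import torch
import torch.nn as nn
from torch.distributions import Beta

class ActorCritic(nn.Module):
    """Shared CNN trunk with a Beta-policy head and a value head.

    Hypothetical sketch: assumes 4 stacked 96x96 grayscale frames as
    input; the repo's actual layer shapes may differ.
    """

    def __init__(self, n_actions=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 96 -> 23
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 23 -> 10
            nn.Flatten(),
            nn.Linear(32 * 10 * 10, 256), nn.ReLU(),
        )
        # Softplus keeps the outputs positive; adding 1 keeps
        # alpha, beta > 1 so the Beta density stays unimodal.
        self.alpha_head = nn.Sequential(nn.Linear(256, n_actions), nn.Softplus())
        self.beta_head = nn.Sequential(nn.Linear(256, n_actions), nn.Softplus())
        self.value_head = nn.Linear(256, 1)

    def forward(self, x):
        h = self.trunk(x)
        dist = Beta(self.alpha_head(h) + 1, self.beta_head(h) + 1)
        return dist, self.value_head(h)
```

Samples from the Beta distribution lie in [0, 1] by construction, so each action dimension only needs an affine rescaling into the environment's bounds (e.g. steering in [-1, 1]) rather than clipping, which is the advantage cited for bounded continuous action spaces.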
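The GAE step can be written as a short backward pass over a rollout. This follows the standard algorithm from the linked paper; the function name and the `gamma`/`lam` defaults below are common choices, not values taken from this repository.

```python
import torch

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `values` carries one extra bootstrap entry (len(rewards) + 1).
    Returns per-step advantages and the corresponding returns.
    """
    T = len(rewards)
    advantages = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future deltas.
        last = delta + gamma * lam * nonterminal * last
        advantages[t] = last
    returns = advantages + values[:T]
    return advantages, returns
```

The `returns` tensor (advantage plus value baseline) is what the value head is regressed against, while the normalized advantages weight the PPO clipped-surrogate policy loss.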
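The frame-concatenation input described in the last bullet can be sketched as a small stand-alone buffer. The repository most likely wraps the Gym environment directly; the `FrameStacker` class and its `skip` semantics (only every `skip`-th frame enters the stack) are assumptions made for illustration.

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keeps the last k frames stacked along a new leading axis.

    Hypothetical sketch; the repo's own frame handling may differ.
    """

    def __init__(self, k=4, skip=1):
        self.k, self.skip = k, skip
        self.frames = deque(maxlen=k)
        self.count = 0

    def reset(self, frame):
        # Fill the buffer with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        self.count = 0
        return np.stack(self.frames)

    def step(self, frame):
        # With skip > 1, intermediate frames are dropped from the stack.
        self.count += 1
        if self.count % self.skip == 0:
            self.frames.append(frame)
        return np.stack(self.frames)
```

Stacking recent frames gives the otherwise-Markov-violating single image enough temporal context (velocity, heading change) for the CNN to act on.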

Owner

  • Login: evan-wang-13
  • Kind: user
