https://github.com/evan-wang-13/deep-racing

Self-driving racecar using reinforcement learning (proximal policy optimization) in PyTorch


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: evan-wang-13
  • License: MIT
  • Default Branch: master
  • Homepage:
  • Size: 11.3 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of JinayJain/deep-racing
Created almost 2 years ago · Last pushed almost 4 years ago

https://github.com/evan-wang-13/deep-racing/blob/master/

# Self-Driving Racecar with Proximal Policy Optimization

Solving the OpenAI Gym [CarRacing-v0](https://gym.openai.com/envs/CarRacing-v0) environment using Proximal Policy Optimization.

Read the [full report](extra/report.pdf).

## Demo

![Video Demo](extra/demo.gif)

See the full video demo on [YouTube](https://youtu.be/s1uKkmNiNhM).

## Results

After 5000 training steps, the agent achieves a mean score of 909.48 ± 10.30 over 100 episodes. To reproduce the results, run the following commands:

```
mkdir logs
python demo.py --ckpt extra/final_weights.pt --delay_ms 0
```

Results from episodes will be saved to `logs/episode_rewards.csv`.

## Implementation Details

-   A convolutional neural network to jointly approximate the value function and the policy.
-   Optimization is performed using [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347).
-   Policy network outputs parameters to a Beta distribution, [which is better for bounded continuous action spaces](https://proceedings.mlr.press/v70/chou17a/chou17a.pdf).
-   Advantage estimation is done through the [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438) algorithm.
-   A series of 4 frames are concatenated to form the input to the network, with frame skipping optionally applied.
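The first three bullets can be sketched together as a shared-trunk actor-critic network whose policy head parameterizes a Beta distribution. This is a minimal illustration, not the repository's exact architecture: the layer sizes, the assumed input of 4 stacked 96×96 grayscale frames, and the `ActorCritic` name are all assumptions made here.

```python
import torch
import torch.nn as nn
from torch.distributions import Beta

class ActorCritic(nn.Module):
    """Shared CNN trunk with a Beta-policy head and a value head.

    Hypothetical sketch: assumes 4 stacked 96x96 grayscale frames as
    input; the repo's actual layer shapes may differ.
    """

    def __init__(self, n_actions=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 96 -> 23
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 23 -> 10
            nn.Flatten(),
            nn.Linear(32 * 10 * 10, 256), nn.ReLU(),
        )
        # Softplus keeps the outputs positive; adding 1 keeps
        # alpha, beta > 1 so the Beta density stays unimodal.
        self.alpha_head = nn.Sequential(nn.Linear(256, n_actions), nn.Softplus())
        self.beta_head = nn.Sequential(nn.Linear(256, n_actions), nn.Softplus())
        self.value_head = nn.Linear(256, 1)

    def forward(self, x):
        h = self.trunk(x)
        dist = Beta(self.alpha_head(h) + 1, self.beta_head(h) + 1)
        return dist, self.value_head(h)
```

Samples from the Beta distribution lie in [0, 1] by construction, so each action dimension only needs an affine rescaling into the environment's bounds (e.g. steering in [-1, 1]) rather than clipping, which is the advantage cited for bounded continuous action spaces.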
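The GAE step can be written as a short backward pass over a rollout. This follows the standard algorithm from the linked paper; the function name and the `gamma`/`lam` defaults below are common choices, not values taken from this repository.

```python
import torch

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `values` carries one extra bootstrap entry (len(rewards) + 1).
    Returns per-step advantages and the corresponding returns.
    """
    T = len(rewards)
    advantages = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future deltas.
        last = delta + gamma * lam * nonterminal * last
        advantages[t] = last
    returns = advantages + values[:T]
    return advantages, returns
```

The `returns` tensor (advantage plus value baseline) is what the value head is regressed against, while the normalized advantages weight the PPO clipped-surrogate policy loss.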
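The frame-concatenation input described in the last bullet can be sketched as a small stand-alone buffer. The repository most likely wraps the Gym environment directly; the `FrameStacker` class and its `skip` semantics (only every `skip`-th frame enters the stack) are assumptions made for illustration.

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keeps the last k frames stacked along a new leading axis.

    Hypothetical sketch; the repo's own frame handling may differ.
    """

    def __init__(self, k=4, skip=1):
        self.k, self.skip = k, skip
        self.frames = deque(maxlen=k)
        self.count = 0

    def reset(self, frame):
        # Fill the buffer with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        self.count = 0
        return np.stack(self.frames)

    def step(self, frame):
        # With skip > 1, intermediate frames are dropped from the stack.
        self.count += 1
        if self.count % self.skip == 0:
            self.frames.append(frame)
        return np.stack(self.frames)
```

Stacking recent frames gives the otherwise-Markov-violating single image enough temporal context (velocity, heading change) for the CNN to act on.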

Owner

  • Login: evan-wang-13
  • Kind: user
