Recent Releases of d3rlpy
d3rlpy - Release v2.8.1
Bugfix
- Pin Gymnasium version at 1.0.0 to prevent version mismatch errors between `gymnasium` and `gymnasium-robotics`.
Enhancement
- maze2d datasets have been supported.
Published by takuseno about 1 year ago
d3rlpy - Release v2.8.0
New algorithms
Enhancement
- Health check is updated to check if PyTorch version is 2.5.0 or later.
- Shimmy version has been upgraded.
- Minari version has been upgraded.
Bugfix
- Model loading error caused by mismatched optimizer data has been fixed (thanks, @hasan-yaman).
- Fix `map_location` to support loading models trained on GPU onto CPU (see the sketch below).
- Fix Adroit dataset support.
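As a quick illustration of the `map_location` fix, here is a minimal sketch of loading a model trained on GPU onto a CPU-only machine; the file name is hypothetical and the `device` keyword is assumed to be accepted by `load_learnable`.

```py
import d3rlpy

# hypothetical file, saved beforehand on a GPU machine with sac.save("model_gpu.d3")
sac = d3rlpy.load_learnable("model_gpu.d3", device="cpu:0")
```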
Published by takuseno about 1 year ago
d3rlpy - Release v2.7.0
Breaking changes
Dependency
:warning: This release updates the following dependencies.
- Python 3.9 or later
- PyTorch v2.5.0 or later
OptimizerFactory
The import path of `OptimizerFactory` has been changed from `d3rlpy.models.OptimizerFactory` to `d3rlpy.optimizers.OptimizerFactory`.
```py
# before
optim = d3rlpy.models.AdamFactory()

# after
optim = d3rlpy.optimizers.AdamFactory()
```
x2-3 speed up with CudaGraph and torch.compile
In this release, d3rlpy supports CudaGraph and torch.compile to dramatically speed up training. You can turn on this new feature simply by providing the `compile_graph` option:
```py
import d3rlpy

# enable CudaGraph and torch.compile
sac = d3rlpy.algos.SACConfig(compile_graph=True).create(device="cuda:0")
```

Here are some benchmark results with an NVIDIA RTX 4070:

| | v2.6.2 | v2.7.0 |
|:-|-:|-:|
| Soft Actor-Critic | 7.4 msec | 3.0 msec |
| Conservative Q-Learning | 12.5 msec | 3.8 msec |
| Decision Transformer | 8.9 msec | 3.4 msec |
Note that this feature can only be enabled when you use a CUDA device.
Enhanced optimizer
Learning rate scheduler
This release adds `LRSchedulerFactory`, which attaches a learning rate scheduler to an individual optimizer.
```py
import d3rlpy

optim = d3rlpy.optimizers.AdamFactory(
    lr_scheduler=d3rlpy.optimizers.CosineAnnealingLRFactory(T_max=1000000)
)
```

See an example here and docs here.
Gradient clipping
Now, the `clip_grad_norm` option has been added to clip gradients by the global norm.
```py
import d3rlpy

optim = d3rlpy.optimizers.AdamFactory(clip_grad_norm=0.1)
```
SimBa encoder
This release adds SimBa architecture that allows us to scale models effectively. See the paper here.
See docs here.
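A hedged sketch of plugging the new encoder into an algorithm config is shown below; the factory name `SimBaEncoderFactory` and its default parameters are assumptions here, so check the linked docs for the exact API.

```py
import d3rlpy

# assumed factory name and defaults; see the docs for the actual signature
encoder = d3rlpy.models.SimBaEncoderFactory()

sac = d3rlpy.algos.SACConfig(
    actor_encoder_factory=encoder,
    critic_encoder_factory=encoder,
).create(device="cuda:0")
```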
Enhancement
- Gradients are now being tracked by loggers (thanks, @hasan-yaman)
Development
- Replace black, isort and pylint with Ruff.
- `scripts/format` has been removed. `scripts/lint` now formats code styles too.
Published by takuseno over 1 year ago
d3rlpy - Release v2.6.2
This is an emergency update to resolve an issue caused by the new Gymnasium version v1.0.0. Additionally, d3rlpy internally checks versions of both Gym and Gymnasium to make sure that dependencies are correct.
Published by takuseno over 1 year ago
d3rlpy - Release v2.6.1
Bugfix
There was an issue in the data-parallel distributed training feature of d3rlpy: each process didn't correctly synchronize parameters. In this release, this issue has been fixed and data-parallel distributed training works properly. Please check the latest example script to see how to use it.
Published by takuseno over 1 year ago
d3rlpy - Release v2.6.0
New Algorithm
ReBRAC has been added to d3rlpy! Please check a reproduction script here.
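For a quick start, here is a minimal sketch with default hyperparameters and a toy dataset; the reproduction script uses dataset-specific settings.

```py
import d3rlpy

# toy dataset just for illustration
dataset, env = d3rlpy.datasets.get_pendulum()

rebrac = d3rlpy.algos.ReBRACConfig().create(device="cpu:0")
rebrac.fit(
    dataset,
    n_steps=10000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```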
Enhancement
- DeepMind Control support has been added. You can install dependencies by `d3rlpy install dm_control`. Please check an example script here.
- `use_layer_norm` option has been added to `VectorEncoderFactory` (see the sketch below).
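For example, a layer-normalized vector encoder can be passed to an algorithm config like this; the hidden unit sizes are arbitrary example values.

```py
import d3rlpy

encoder = d3rlpy.models.VectorEncoderFactory(
    hidden_units=[256, 256],
    use_layer_norm=True,  # new option in v2.6.0
)

sac = d3rlpy.algos.SACConfig(
    actor_encoder_factory=encoder,
    critic_encoder_factory=encoder,
).create(device="cpu:0")
```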
Bugfix
- Fix return-to-go calculation for Decision Transformer.
- Fix custom model documentation.
Published by takuseno over 1 year ago
d3rlpy - Release v2.5.0
New Algorithm
Cal-QL has been added to d3rlpy in v2.5.0! Please check a reproduction script here. To support faithful reproduction, SparseRewardTransitionPicker has been also added, which is used in the reproduction script.
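A minimal sketch of creating the new algorithm is shown below; the reproduction script additionally uses `SparseRewardTransitionPicker` and task-specific hyperparameters, so treat this as an illustration only.

```py
import d3rlpy

# toy dataset just for illustration
dataset, env = d3rlpy.datasets.get_pendulum()

cal_ql = d3rlpy.algos.CalQLConfig().create(device="cpu:0")
cal_ql.fit(dataset, n_steps=10000)
```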
Custom Algorithm Example
One of the frequent questions is "How can I implement a custom algorithm on top of d3rlpy?". Now, a new example script has been added to answer this question. Based on this example, you can build your own algorithm while utilizing the whole training pipeline provided by d3rlpy. Please check the script here.
Enhancement
- Exporting Decision Transformer models as TorchScript and ONNX has been implemented. You can use this feature via the `save_policy` method in the same way as with Q-learning algorithms (see the sketch below).
- Tuple observation support has been added to PyTorch/ONNX export.
- Modified return-to-go calculation for Q-learning algorithms and skip this calculation if return-to-go is not necessary.
- `n_updates` option has been added to the `fit_online` method to control the update-to-data (UTD) ratio.
- `write_at_termination` option has been added to `ReplayBuffer`.
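A hedged sketch of the Decision Transformer export mentioned above, assuming the output format is selected by the file extension as with Q-learning algorithms:

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cpu:0")
dt.fit(dataset, n_steps=1000, n_steps_per_epoch=1000)

# export the greedy policy for deployment
dt.save_policy("dt_policy.onnx")  # ONNX
dt.save_policy("dt_policy.pt")    # TorchScript
```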
Bugfix
- Action scaling has been fixed for D4RL datasets.
- Default replay buffer creation at the `fit_online` method has been fixed.
Published by takuseno almost 2 years ago
d3rlpy - Release v2.4.0
Tuple observations
In v2.4.0, d3rlpy supports tuple observations.

```py
import numpy as np
import d3rlpy

observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))

dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)
```

You can find an example script here.
Enhancements
- `logging_steps` and `logging_strategy` options have been added to `fit` and `fit_online` methods (thanks, @claudius-kienle).
- Logging with WanDB has been supported (thanks, @claudius-kienle). See the sketch below.
- Goal-conditioned envs in Minari have been supported.
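A hedged sketch of the new logging support; the adapter class name follows the release notes (`WanDBAdapterFactory`) but should be checked against the docs, and a `wandb` login is assumed.

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
dqn.fit(
    dataset,
    n_steps=10000,
    # send metrics to Weights & Biases instead of local files
    logger_adapter=d3rlpy.logging.WanDBAdapterFactory(),
)
```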
Bugfix
- Fix errors for distributed training.
- OPE documentation has been fixed.
Published by takuseno about 2 years ago
d3rlpy - Release v2.3.0
Distributed data parallel training
Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available! It's extremely easy to use this feature.
Example:

```py
# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```
You need to use the `torchrun` command to start training, which should already be installed once you install PyTorch.
$ torchrun \
--nnodes=1 \
--nproc_per_node=3 \
--rdzv_id=100 \
--rdzv_backend=c10d \
--rdzv_endpoint=localhost:29400 \
train.py
In this case, 3 processes will be launched and start the training loop. DecisionTransformer-based algorithms also support this distributed training feature.
The example is also available here
Minari support (thanks, @grahamannett !)
Minari is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.
You can install Minari via d3rlpy CLI.
$ d3rlpy install minari
Example:

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```
Minimize redundant computes
From this version, the calculation of some algorithms is optimized to remove redundant inference. As a result, algorithms with dual optimization such as SAC and CQL have become dramatically faster than in the previous version.
Enhancements
- `GoalConcatWrapper` has been added to support goal-conditioned environments.
- `return_to_go` has been added to `Transition` and `TransitionMiniBatch`.
- `MixedReplayBuffer` has been added to sample two experiences from multiple buffers with an arbitrary ratio.
- `initial_temperature` supports 0 at `DiscreteSAC`.
Bugfix
- Getting started page has been fixed.
Published by takuseno over 2 years ago
d3rlpy - Release v2.2.0
Algorithm
DiscreteDecisionTransformer, a Decision Transformer implementation for discrete action-space, has finally been implemented in v2.2.0! The reproduction results with Atari 2600 are available here.
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dt = d3rlpy.algos.DiscreteDecisionTransformerConfig(
    batch_size=64,
    num_heads=1,
    learning_rate=1e-4,
    max_timestep=1000,
    num_layers=3,
    position_encoding_type=d3rlpy.PositionEncodingType.SIMPLE,
    encoder_factory=d3rlpy.models.VectorEncoderFactory([128], exclude_last_activation=True),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    context_size=20,
    warmup_tokens=100000,
).create()

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=500,
)
```
Enhancement
- Expose `action_size` and `action_space` options for manual dataset creation #338
- `FrameStackTrajectorySlicer` has been added.
Refactoring
- Typing check of `numpy` is enabled. Some parts of the code differentiate data types of numpy arrays, which is checked by mypy.
Bugfix
- Device error at AWAC #341
- Invalid `batch.intervals` #346
  - :warning: This fix is important to retain the performance of Q-learning algorithms since v1.1.1.
Published by takuseno over 2 years ago
d3rlpy - Release v2.1.0
Upgrade PyTorch to v2
From this version, d3rlpy requires PyTorch v2 (v1 still may partially work). To do this, the minimum Python version has been bumped to 3.8. This change allows d3rlpy to utilize more advanced features such as torch.compile in the upcoming releases.
Healthcheck
From this version, d3rlpy diagnoses dependency health automatically. In this version, the Gym version is checked to make sure you have installed the correct one.
Gymnasium support
d3rlpy now supports Gymnasium as well as Gym. You can use it just the same as Gym. Please check the example for further details.
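A minimal sketch of online training with a Gymnasium environment; the replay buffer helper name follows the v2 documentation and is an assumption here.

```py
import gymnasium
import d3rlpy

env = gymnasium.make("CartPole-v1")
eval_env = gymnasium.make("CartPole-v1")

dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)
dqn.fit_online(env, buffer, eval_env=eval_env, n_steps=10000)
```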
d3rlpy install command
To make your life easier, d3rlpy provides the `d3rlpy install` command to install additional dependencies. This is part of the d3rlpy CLI. Please check the docs for further details.
$ d3rlpy install atari # Atari 2600 dependencies
$ d3rlpy install d4rl_atari # Atari 2600 + d4rl-atari dependencies
$ d3rlpy install d4rl # D4RL dependencies
Refactoring
In this version, the internal design has been refactored. The algorithm implementation and the way to assign models have mainly been refactored. :warning: Because of this change, previously saved models might be incompatible with this version.
Enhancement
- Added Jupyter Notebook for TPU on Google Colaboratory.
- Added `d3rlpy.notebook_utils` to provide utilities for Jupyter Notebook.
- Updated notebook link #313 (thanks @asmith26 !)
Bugfix
- Fixed typo docstrings #316 (thanks @asmith26 !)
- Fixed docker build #311 (thanks @HassamSheikh !)
Published by takuseno over 2 years ago
d3rlpy - Release v2.0.4
Bugfix
- Fix DiscreteCQL loss metrics #298
- Fix `dump` of `ReplayBuffer` #299
- Fix `InitialStateValueEstimationEvaluator` #301
- Fix rendering interface to match the latest Gym version #302
Due to the rendering fix, I recommend you reinstall d4rl-atari if you use it.
$ pip install -U git+https://github.com/takuseno/d4rl-atari
Published by takuseno over 2 years ago
d3rlpy - Release v2.0.3
An emergency patch to fix a bug of predict_value method #297 .
Published by takuseno over 2 years ago
d3rlpy - Release v2.0.2
The major update has been finally released! Since the start of the project, this project has earned almost 1K GitHub stars :star: , which is a great milestone of d3rlpy. In this update, there are many major changes.
Upgrade Gym version
From this version, d3rlpy only supports the latest Gym version 0.26.0. This change allows us to support Gymnasium in the future update.
Algorithm
Clear separation between configuration and algorithm
From this version, each algorithm (e.g. "DQN") has a config class (e.g. "DQNConfig"). This allows us to serialize and deserialize algorithms as described later.
```py
dqn = d3rlpy.algos.DQNConfig(learning_rate=3e-4).create(device="cuda:0")
```
Decision Transformer
Decision Transformer is finally available! You can check reproduction code to see how to use it.
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    batch_size=64,
    learning_rate=1e-4,
    optim_factory=d3rlpy.models.AdamWFactory(weight_decay=1e-4),
    encoder_factory=d3rlpy.models.VectorEncoderFactory(
        [128],
        exclude_last_activation=True,
    ),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.MultiplyRewardScaler(0.001),
    context_size=20,
    num_heads=1,
    num_layers=3,
    warmup_steps=10000,
    max_timestep=1000,
).create(device="cuda:0")

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    save_interval=10,
    eval_env=env,
    eval_target_return=0.0,
)
```
Serialization
In this version, d3rlpy introduces a compact serialization, d3 format, that includes both hyperparameters and model parameters in a single file. This makes it possible for you to easily save checkpoints and reconstruct algorithms for evaluation and deployment.
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(dataset, n_steps=10000)

# save as d3 file
dqn.save("model.d3")

# reconstruct the exact same DQN
new_dqn = d3rlpy.load_learnable("model.d3")
```
ReplayBuffer
From this version, there is no clear separation between ReplayBuffer and MDPDataset anymore. Instead, ReplayBuffer has unlimited flexibility to support any kinds of algorithms and experiments. Please check details at documentation.
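A hedged sketch of the unified interface; the FIFO buffer helper name follows the v2 documentation and is an assumption here.

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

# MDPDataset now shares the ReplayBuffer interface, so offline training takes it directly
dqn = d3rlpy.algos.DQNConfig().create()
dqn.fit(dataset, n_steps=10000)

# the same ReplayBuffer interface backs online training, e.g. with a FIFO buffer
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)
dqn.fit_online(env, buffer, n_steps=10000)
```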
Published by takuseno over 2 years ago
d3rlpy - Release v1.1.1
Benchmark
The benchmark results of IQL and NFQ have been added to d3rlpy-benchmarks. Plus, results with more random seeds (up to 10) have been added for all algorithms. The benchmark results are more reliable now.
Documentation
- More descriptions have been added to the `Finetuning` tutorial page.
- `Offline Policy Selection` tutorial page has been added.
Enhancements
- `cloudpickle` and `GPUUtil` dependencies have been removed.
- Gaussian likelihood computation for MOPO becomes more mathematically correct (thanks @tominku).
Published by takuseno over 3 years ago
d3rlpy - Release v1.1.0
MDPDataset
The timestep alignment is now exactly the same as D4RL:

```py
import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))

# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))

# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)

# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = ...
```

where `r(o, a)` is the reward function and `t(o, a)` is the terminal function.
The reason for this change is that many users were confused by the difference between d3rlpy and D4RL. Now they are aligned in the same way. Note that this change might break your existing datasets.
Algorithms
- Neural Fitted Q-iteration (NFQ)
- https://link.springer.com/chapter/10.1007/11564096_32
Enhancements
- AWAC, CRR and IQL use a non-squashed gaussian policy function.
- More tutorial pages have been added to the documentation.
- The software design page has been added to the documentation.
- The reproduction script for IQL has been added.
- The progress bar in online training is visually improved in Jupyter Notebook #161 (thanks, @aiueola )
- The nan checks have been added to `MDPDataset`.
- The `target_reduction_type` and `bootstrap` options have been removed.
Bugfix
- The unnecessary test conditions have been removed
- Typo in `dataset.pyx` has been fixed #167 (thanks, @zbzhu99).
- The details of IQL implementation have been fixed.
Published by takuseno almost 4 years ago
d3rlpy - Release v1.0.0
We're proud to announce that v1.0.0 has finally been released! The first version was released in Aug 2020 under the support of the IPA MITOU program. At the first release, d3rlpy only supported a few algorithms and did not even support online training. After months of constructive feedback and insights from the users and the community, d3rlpy has been established as the first offline deep RL library with support for many online and offline algorithms and unique features. The next chapter towards the ambitious v2.0.0 also starts today. Please stay tuned for the next announcement!
NeurIPS 2021 Offline RL Workshop
The workshop paper about d3rlpy has been presented at the NeurIPS 2021 Offline RL Workshop. URL: https://arxiv.org/abs/2111.03788
Benchmarks
The full benchmark results are finally available at d3rlpy-benchmarks.
Algorithms
- Implicit Q-Learning (IQL)
- https://arxiv.org/abs/2110.06169
Enhancements
- `deterministic` option is added to `collect` method
- `rollout_return` metric is added to online training
- `random_steps` is added to `fit_online` method
- `--save` option is added to `d3rlpy` CLI commands (thanks, @pstansell)
- `multiplier` option is added to reward normalizers
- many reproduction scripts are added
- `policy_type` option is added to BC
- `get_atari_transition` function is added for the Atari 2600 offline benchmark procedure
Bugfix
- document fix (thanks, @araffin )
- Fix TD3+BC's actor loss function
- Fix gaussian noise for TD3 exploration
Roadmap towards v2.0.0
- Sophisticated config system using `dataclasses`
- Dump configuration and model parameters in a single file
- Change MDPDataset format to align with D4RL datasets
- Support large dataset
- Support tuple observation
- Support large-scale data-parallel offline training
- Support large-scale distributed online training
- Support Transformer architecture (e.g. Decision Transformer)
- Speed up training with `torch.jit.script` and CUDA Graphs
- Change library name to represent the unification of offline and online
Published by takuseno about 4 years ago
d3rlpy - Release v0.91
Algorithm
- TD3+BC
- https://arxiv.org/abs/2106.06860
RewardScaler
From this version, preprocessors are available for the rewards, which allow you to normalize, standardize and clip the reward values.

```py
import d3rlpy

# normalize
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip (you can't use string alias)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))
```
copy_policy_from and copy_q_function_from methods
In the scenario of finetuning, you might want to initialize SAC's policy function with the pretrained CQL's policy function to boost the initial performance. From this version, you can do that as follows:

```py
import d3rlpy

# pretrain with static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# finetuning with online algorithm
sac.fit_online(...)
```
Enhancements
- show messages for skipping model builds
- add `alpha` parameter option to `DiscreteCQL`
- keep counting the number of gradient steps
- allow expanding MDPDataset with the larger discrete actions (thanks, @jamartinh)
- `callback` function is called every gradient step (previously, it was called every epoch)
Bugfix
- FQE's loss function has been fixed (thanks for the report, @guyk1971)
- fix documentation build (thanks, @astrojuanlu)
- fix d4rl dataset conversion for MDPDataset (this will have a significant impact on the performance for d4rl dataset)
Published by takuseno over 4 years ago
d3rlpy - Release v0.90
Algorithm
- Conservative Offline Model-Based Optimization (COMBO)
- https://arxiv.org/abs/2102.08363
Drop data augmentation feature
From this version, the data augmentation feature has been dropped because it introduced a lot of code complexity. In order to support many algorithms and keep d3rlpy as simple as possible, the feature was removed. Instead, TorchMiniBatch was internally introduced, and all algorithms became simpler.
collect method
In offline RL experiments, data collection plays an important role especially when you try new tasks.
From this version, the `collect` method is finally available.
```py
import gym

import d3rlpy

# prepare environment
env = gym.make('Pendulum-v0')

# prepare algorithm
sac = d3rlpy.algos.SAC()

# prepare replay buffer
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env)

# start data collection without updates
sac.collect(env, buffer)

# export to MDPDataset
dataset = buffer.to_mdp_dataset()

# save as file
dataset.dump('pendulum.h5')
```
Along with this change, random policies are also introduced. These are useful to collect a dataset with a random policy.

```py
# continuous action-space
policy = d3rlpy.algos.RandomPolicy()

# discrete action-space
policy = d3rlpy.algos.DiscreteRandomPolicy()
```
Enhancements
- CQL and BEAR become closer to the official implementations
- `callback` argument has been added to algorithms
- random dataset has been added to cartpole and pendulum datasets
  - you can specify it via `dataset_type='random'` at `get_cartpole` and `get_pendulum` methods
Bugfix
- fix action normalization at `predict_value` method (thanks, @navidmdn)
- fix seed settings at reproduction codes
What's missing before v1.00?
Currently, I'm benchmarking all algorithms with the d4rl datasets. Through the experiments, I realized that it's very difficult to reproduce the tables reported in the papers because they didn't reveal the full hyper-parameters, which are tuned for each dataset. So I gave up reproducing the tables and started producing numbers with the official codes to see if d3rlpy's results match.
Published by takuseno almost 5 years ago
d3rlpy - Release v0.80
Algorithms
New algorithms are introduced in this version.
- Critic Regularized Regression (CRR)
- https://arxiv.org/abs/2006.15134
- Model-based Offline Policy Optimization (MOPO)
- https://arxiv.org/abs/2005.13239
Model-based RL
Previously, model-based RL was already supported, with the model-based specific logic implemented on the dynamics side. This approach enabled us to combine model-based algorithms with arbitrary model-free algorithms. However, it required complex designs to implement recent model-based RL. So, the dynamics interface was refactored, and MOPO is the first algorithm to show how d3rlpy supports model-based RL algorithms.
```py
# train dynamics model
from d3rlpy.datasets import get_pendulum
from d3rlpy.dynamics import ProbabilisticEnsembleDynamics
from d3rlpy.metrics.scorer import dynamics_observation_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_reward_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_prediction_variance_scorer
from sklearn.model_selection import train_test_split

dataset, _ = get_pendulum()

train_episodes, test_episodes = train_test_split(dataset)

dynamics = ProbabilisticEnsembleDynamics(learning_rate=1e-4, use_gpu=True)

dynamics.fit(train_episodes,
             eval_episodes=test_episodes,
             n_epochs=100,
             scorers={
                 'observation_error': dynamics_observation_prediction_error_scorer,
                 'reward_error': dynamics_reward_prediction_error_scorer,
                 'variance': dynamics_prediction_variance_scorer,
             })

# train model-based RL algorithm
from d3rlpy.algos import MOPO

# give the trained dynamics model as the generator argument
mopo = MOPO(dynamics=dynamics)

mopo.fit(dataset, n_steps=100000)
```
enhancements
- `fitter` method has been implemented (thanks @jamartinh)
- `tensorboard_dir` replaces the `tensorboard` flag at `fit` method (thanks @navidmdn)
- show warning messages when unused arguments are passed
- show comprehensive error messages when action-space is not compatible
- `fit` method accepts `MDPDataset` object
- `dropout` option has been implemented in encoders
- add appropriate `__repr__` methods to show pretty outputs when `print(algo)`
- metrics collection is refactored
bugfix
- fix `core dumped` errors by fixing numpy version
- fix CQL backup
Published by takuseno almost 5 years ago
d3rlpy - Release v0.70
Command Line Interface
New commands are added in this version.
record
You can record the video of the evaluation episodes without coding anything.
```sh
$ d3rlpy record d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# record wrapped environment
$ d3rlpy record d3rlpy_logs/DiscreteCQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
```
play
You can run the evaluation episodes with rendering images.

```sh
# record simple environment
$ d3rlpy play d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# record wrapped environment
$ d3rlpy play d3rlpy_logs/DiscreteCQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
```
data-point mask for bootstrapping
Ensemble training of Q-functions has been shown to be a powerful method to achieve robust training. Previously, the bootstrap option was available for algorithms, but the mask for the Q-function loss was randomly created every time a batch was sampled.
In this version, the `create_mask` option is available for `MDPDataset` and `ReplayBuffer`, which will create a unique mask at each data-point.
```py
# offline training
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, create_mask=True, mask_size=5)
cql = d3rlpy.algos.CQL(n_critics=5, bootstrap=True, target_reduction_type='none')
cql.fit(dataset)

# online training
buffer = d3rlpy.online.buffers.ReplayBuffer(1000000, create_mask=True, mask_size=5)
sac = d3rlpy.algos.SAC(n_critics=5, bootstrap=True, target_reduction_type='none')
sac.fit_online(env, buffer)
```

As you noticed above, `target_reduction_type` is newly introduced to specify how to aggregate target Q values. In the standard Soft Actor-Critic, `target_reduction_type='min'`. If you choose `none`, each ensemble Q-function uses its own target value, which is similar to what Bootstrapped DQN does.
better module access
From this version, you can navigate to all modules through d3rlpy.
```py
# previously
from d3rlpy.datasets import get_cartpole
dataset = get_cartpole()

# v0.70
import d3rlpy
dataset = d3rlpy.datasets.get_cartpole()
```
new logger style
From this version, structlog is internally used to print information instead of the raw print function. This allows us to emit more structured information. Furthermore, you can control what to show and what to save to the file by overwriting the logger configuration.

enhancements
- `soft_q_backup` option is added to `CQL`.
- `Paper Reproduction` page has been added to the documentation in order to show the performance with the paper configuration.
- `commit` method at `D3RLPyLogger` returns metrics (thanks, @jamartinh)
bugfix
- fix `epoch` count in offline training.
- fix `total_step` count in online training.
- fix typos at documentation (thanks, @pstansell)
Published by takuseno about 5 years ago
d3rlpy - Release v0.61
CLI
record command is newly introduced in this version. You can record videos of evaluation episodes with the saved model.
$ d3rlpy record d3rlpy_logs/CQL_20210131144357/model_100.pt --env-id Hopper-v2
You can also use the wrapped environment.
$ d3rlpy record d3rlpy_logs/DQN_online_20210130170041/model_1000.pt \
--env-header 'import gym; from d3rlpy.envs import Atari; env = Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
bugfix
- fix saving models every step in `fit_online` method
- fix Atari wrapper to reproduce the paper result
- fix CQL and BEAR algorithms
Published by takuseno about 5 years ago
d3rlpy - Release v0.60
logo
New logo images are made for d3rlpy 🎉
(standard and inverted logo images)
ActionScaler
ActionScaler provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to be within [-1.0, 1.0]. From now on, you don't need to care about the range of actions.
```py
from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass action_scaler argument
```
handling timeout episodes
Episodes terminated by timeouts should not be clipped at bootstrapping. From this version, you can specify episode boundaries as well as the terminal flags.

```py
from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ...  # this indicates the environmental termination
episode_terminals = ...  # this indicates episode boundaries

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals are omitted, terminals will be used to specify episode boundaries
dataset = MDPDataset(observations, actions, rewards, terminals)
```
In online training, you can specify this option via the `timelimit_aware` flag.

```py
import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default
```
reference: https://arxiv.org/abs/1712.00378
batch online training
When training with computationally expensive environments such as robotics simulators or rich 3D games, it will take a long time to finish due to slow environment steps. To solve this, d3rlpy supports batch online training.

```py
import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this is necessary if you use AsyncBatchEnv
    # distributing 10 environments in different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])

    sac = SAC(use_gpu=True)
    sac.fit_batch_online(env)  # train with 10 environments concurrently
```
docker image
Pre-built d3rlpy docker image is available in DockerHub.
$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash
enhancements
- `BEAR` algorithm is updated based on the official implementation
  - new `mmd_kernel` option is available
- `to_mdp_dataset` method is added to `ReplayBuffer`
- `ConstantEpsilonGreedy` explorer is added
- `d3rlpy.envs.ChannelFirst` wrapper is added (thanks for reporting, @feyza-droid)
- new dataset utility function `d3rlpy.datasets.get_d4rl` is added
  - this is handling timeouts inside the function
- offline RL paper reproduction codes are added
- smoothed moving average plot at `d3rlpy plot` CLI function (thanks, @pstansell)
- user-friendly messages for assertion errors
- better memory consumption
- `save_interval` argument is added to `fit_online`
bugfix
- core dumps are fixed in Google Colaboratory tutorials
- typos in some documentations (thanks for reporting, @pstansell )
Published by takuseno about 5 years ago
d3rlpy - Release v0.51
minor fix
- add `typing-extensions` dependency
- update MANIFEST.in
Published by takuseno about 5 years ago
d3rlpy - Release v0.50
typing
Now, d3rlpy is fully type-annotated, not only for better use of this library but also for a better contribution experience.
- mypy and pylint check the type consistency and code quality.
- due to the large number of changes needed to add type annotations, there might be regressions that are not detected by linters.
CLI
v0.50 introduces the new command-line interface, the `d3rlpy` command, which helps you do more without any effort. For now, d3rlpy provides the following commands.
```
# plot CSV data
$ d3rlpy plot d3rlpy_logs/XXX/YYY.csv

# plot all CSV data under a directory
$ d3rlpy plot-all d3rlpy_logs/XXX

# export the saved model as inference formats (e.g. ONNX, TorchScript)
$ d3rlpy export d3rlpy_logs/XXX/model_YYY.pt
```
enhancements
- faster CPU to GPU transfer
- this change makes online training x2 faster
- make IQN Q function more precise based on the paper
documentation
- Add doc about SB3 integration ( thanks, @araffin )
Published by takuseno about 5 years ago
d3rlpy - Release v0.41
Algorithm
- Policy in Latent Action Space (PLAS)
- https://arxiv.org/abs/2011.07213
Off-Policy Evaluation
Off-policy evaluation (OPE) is a method to evaluate policy performance only with the offline dataset.
```py
# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

cql = CQL()
cql.fit(dataset.episodes)

# Off-Policy Evaluation
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer

fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer,
        })
```
- Fitted Q-Evaluation
- https://arxiv.org/abs/2007.09055
Q Function Factory
d3rlpy provides flexible controls over Q functions through the Q function factory. Following this change, the previous `q_func_type` argument was renamed to `q_func_factory`.
```py
from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory

# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)

# give it to algorithm object
dqn = DQN(q_func_factory=q_func_factory)
```

You can pass the Q function name as a string too.

```py
dqn = DQN(q_func_factory='qr')
```
You can also make your own Q function factory. Currently, the following Q function factories are supported.
EncoderFactory
- DenseNet architecture (only for vector observation)
- https://arxiv.org/abs/2010.09163
```py
from d3rlpy.algos import DQN

dqn = DQN(encoder_factory='dense')
```
N-step TD calculation
d3rlpy supports N-step TD calculation for ALL algorithms. You can pass the `n_steps` argument to configure this parameter.
```py
from d3rlpy.algos import DQN

dqn = DQN(n_steps=5)  # n_steps=1 by default
```
Paper reproduction scripts
d3rlpy supports many algorithms including online and offline paradigms. Originally, d3rlpy is designed for industrial practitioners. But, academic research is still important to push deep reinforcement learning forward. Currently, there are online DQN-variant reproduction codes.
The evaluation results will be also available soon.
enhancements
- `build_with_dataset` and `build_with_env` methods are added to algorithm objects
- `shuffle` flag is added to `fit` method (thanks, @jamartinh)
Published by takuseno about 5 years ago
d3rlpy - Release v0.40
Algorithms
- Support the discrete version of Soft Actor-Critic
- https://arxiv.org/abs/1910.07207
- `fit_online` has `n_steps` argument instead of `n_epochs` for the complete reproduction of the papers (see the sketch below).
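A minimal sketch with the v0.40-era API; the explicit replay buffer construction follows examples from later 0.x releases and is an assumption here.

```py
import gym

from d3rlpy.algos import DQN
from d3rlpy.online.buffers import ReplayBuffer

env = gym.make('CartPole-v0')

dqn = DQN()
buffer = ReplayBuffer(maxlen=100000, env=env)

# the number of environment steps is specified directly instead of epochs
dqn.fit_online(env, buffer, n_steps=100000)
```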
OptimizerFactory
d3rlpy provides more flexible controls for optimizer configuration via OptimizerFactory.
```py
from d3rlpy.optimizers import AdamFactory
from d3rlpy.algos import DQN

dqn = DQN(optim_factory=AdamFactory(weight_decay=1e-4))
```

See more at https://d3rlpy.readthedocs.io/en/v0.40/references/optimizers.html .
EncoderFactory
d3rlpy provides more flexible controls for the neural network architecture via EncoderFactory.
```py
from d3rlpy.algos import DQN
from d3rlpy.encoders import VectorEncoderFactory

# encoder factory
encoder_factory = VectorEncoderFactory(hidden_units=[300, 400], activation='tanh')

# set EncoderFactory
dqn = DQN(encoder_factory=encoder_factory)
```
Also you can build your own encoders.
```py
import torch
import torch.nn as nn

from d3rlpy.algos import DQN
from d3rlpy.encoders import EncoderFactory

# your own neural network
class CustomEncoder(nn.Module):
    def __init__(self, observation_shape, feature_size):
        super().__init__()
        self.feature_size = feature_size
        self.fc1 = nn.Linear(observation_shape[0], 64)
        self.fc2 = nn.Linear(64, feature_size)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return h

    # THIS IS IMPORTANT!
    def get_feature_size(self):
        return self.feature_size

# your own encoder factory
class CustomEncoderFactory(EncoderFactory):
    TYPE = 'custom'  # this is necessary

    def __init__(self, feature_size):
        self.feature_size = feature_size

    def create(self, observation_shape, action_size=None, discrete_action=False):
        return CustomEncoder(observation_shape, self.feature_size)

    def get_params(self, deep=False):
        return {'feature_size': self.feature_size}

dqn = DQN(encoder_factory=CustomEncoderFactory(feature_size=64))
```
See more at https://d3rlpy.readthedocs.io/en/v0.40/references/network_architectures.html .
Stable Baselines 3 wrapper
- Now d3rlpy is partially compatible with Stable Baselines 3.
- https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/wrappers/sb3.py
- More documentations will be available soon.
bugfix
- fix the memory leak problem at `fit_online`
  - Now, you can train online algorithms with a big replay buffer size for image observations.
- fix preprocessing at CQL.
- fix ColorJitter augmentation.
installation
PyPi
- From this version, d3rlpy officially supports Windows.
- The binary packages for each platform are built and uploaded via GitHub Actions, which means that you don't have to install Cython to install this package from PyPI.
Anaconda
- As of the previous version, d3rlpy is available on conda-forge.
Published by takuseno over 5 years ago
d3rlpy - Release v0.32
This version introduces hotfix.
- ⚠️ Fix the significant bug in the case of online training with image observation.
Published by takuseno over 5 years ago
d3rlpy - Release v0.31
This version introduces minor changes.
- Move n_epochs arguments to fit method.
- Fix scikit-learn compatibility issues.
- Fix zero-division error during online training.
Published by takuseno over 5 years ago
d3rlpy - Release version v0.30
Algorithm
- Support Advantage-Weighted Actor-Critic (AWAC)
- https://arxiv.org/abs/2006.09359
- `fit_online` method is available as a convenient alias to `d3rlpy.online.iterators.train` function.
- unnormalizing action problem is fixed at AWR.
Metrics
- The following metrics are available.
  - initial_state_value_estimation_scorer
    - https://arxiv.org/abs/1906.01624
  - soft_opc_scorer
    - https://arxiv.org/abs/2007.09055
⚠️ MDPDataset
- `d3rlpy.dataset` module is now implemented with Cython in order to speed up memory copies.
- The following operations are significantly faster than in the previous version.
  - creating `TransitionMiniBatch` objects
  - frame stacking via `n_frames` argument
  - lambda return calculation at AWR algorithms
- This change makes Atari training approximately 6% faster.
Published by takuseno over 5 years ago
d3rlpy - Release version v0.23
Algorithm
- Support Advantage-Weighted Regression (AWR)
- https://arxiv.org/abs/1910.00177
- `n_frames` option is added to all algorithms
  - `n_frames` option controls frame stacking for image observation (see the sketch below)
- `eval_results_` property is added to all algorithms
  - evaluation results can be retrieved from `eval_results_` after training.
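For example, a short sketch of the v0.23-era options described above:

```py
from d3rlpy.algos import DQN

# stack the last 4 frames of an image observation
dqn = DQN(n_frames=4)

# after dqn.fit(...), evaluation metrics can be inspected via:
# print(dqn.eval_results_)
```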
MDPDataset
- `prev_transition` and `next_transition` properties are added to `d3rlpy.dataset.Transition`.
  - these properties are used for frame stacking and Monte-Carlo returns calculation at AWR.
Document
- new tutorial page is added
Published by takuseno over 5 years ago
d3rlpy - Release version v0.22
Support ONNX export
Now, the trained policy can be exported as ONNX as well as TorchScript.

```py
cql.save_policy('policy.onnx', as_onnx=True)
```
Support more data augmentations
- data augmentations for vector observation
- ColorJitter augmentation for image observation
Published by takuseno over 5 years ago
d3rlpy - Release version v0.2
- support model-based algorithm
- Model-based Offline Policy Optimization
- support data augmentation (for image observation)
- Data-regularized Q-learning
- a lot of improvements
- more dataset statistics
- more options to customize neural network architecture
- optimize default learning rates
- etc
Published by takuseno over 5 years ago
d3rlpy - First release!
- online algorithms
- Deep Q-Network (DQN)
- Double DQN
- Deep Deterministic Policy Gradients (DDPG)
- Twin Delayed Deep Deterministic Policy Gradients (TD3)
- Soft Actor-Critic (SAC)
- data-driven algorithms
- Batch-Constrained Q-learning (BCQ)
- Bootstrapping Error Accumulation Reduction (BEAR)
- Conservative Q-Learning (CQL)
- Q functions
- mean
- Quantile Regression
- Implicit Quantile Network
- Fully-parametrized Quantile Function (experimental)
Published by takuseno over 5 years ago