Recent Releases of lc0
lc0 - v0.32.0
In this release, the code has been reorganized and undergone major changes. Therefore this changelog will be less detailed and describe the changes in major groups.
* We have a new search API that allows search algorithms to co-exist. Currently available are classic (the default), dag-preview (more later), valuehead and policyhead. The default algorithm can be changed either at build time by the default_search option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
* We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
* The source code is reorganized, with a more logical directory structure.
* The original search was ported to the new search and backend interfaces and is renamed to classic. This has allowed some streamlining and simplifications.
* The dag-preview search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as thoroughly tested, which is why it has "preview" in its name for now; it lives in the src/search/dag-classic directory.
* The valuehead search replaces ValueOnly mode and selects the move with the best value head evaluation.
* The policyhead search is equivalent to a single node search, selecting the best move using just the policy head.
* The new default_backend build option allows overriding the fixed priority for the backend used by default.
* The new native_arch build option overrides the -march=native compiler default for linux release builds, to help with distribution package creation.
* We have a new sycl backend that will work with amd, intel and nvidia gpus.
* There is also a new onnx-trt backend, using tensorrt on nvidia gpus.
* The metal backend received several improvements.
* Support for the simple/normal/pro modes in options was cleaned up, using a common mechanism.
* Added the wait uci extension command to allow running simple tests from the command line.
* Removed the fen uci extension command as it was unnecessarily complicating things.
* Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
* Several build system changes and improvements.
* We now generate binaries for cuda 12, onnx-trt and macos.
* The onnx-trt package has a readme with instructions and an install script.
* Support for using lc0 with openbench.
* New bench mode for a quicker benchmark.
* RPE nets are now detected and give an error instead of bad results.
* The rescorer code and training data header were refactored to make them usable by external tools.
* Assorted small fixes and improvements.
Published by borg323 6 months ago
lc0 - v0.32.0-rc2
In this version:
* Fix for onnx-trt bug, where the wrong network could be used from the cache.
* Added code to detect RPE nets and give an error instead of bad results.
* Better instructions in the readme and install script for onnx-trt.
* Made UCI_ShowWDL off by default again as some GUIs have issues.
* Fixed a long standing issue when compiled with -ffast-math (or icx -O3).
* Several improvements to the sycl backend.
* Several improvements to the metal backend.
* Refactored the rescorer code and training data header to make them usable by external tools.
* Relaxed cuda/cudnn version checks so that no warnings are shown for mismatched versions that are supported.
* Several build system updates.
* Assorted small fixes and improvements.
Published by borg323 6 months ago
lc0 - v0.32.0-rc1
In this release, the code has been reorganized and undergone major changes. Therefore this changelog will be less detailed and describe the changes in major groups.
* We have a new search API that allows search algorithms to co-exist. Currently available are classic (the default), dag-preview (more later), valuehead and policyhead. The default algorithm can be changed either at build time by the default_search option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
* We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
* The source code is reorganized, with a more logical directory structure.
* The original search was ported to the new search and backend interfaces and is renamed to classic. This has allowed some streamlining and simplifications.
* The dag-preview search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as thoroughly tested, which is why it has "preview" in its name for now; it lives in the src/search/dag-classic directory.
* The valuehead search replaces ValueOnly mode and selects the move with the best value head evaluation.
* The policyhead search is equivalent to a single node search, selecting the best move using just the policy head.
* The new default_backend build option allows overriding the fixed priority for the backend used by default.
* The new native_arch build option overrides the -march=native compiler default for linux release builds, to help with distribution package creation.
* We have a new sycl backend that will work with amd, intel and nvidia gpus.
* There is also a new onnx-trt backend, using tensorrt on nvidia gpus.
* Support for the simple/normal/pro modes in options was cleaned up, using a common mechanism.
* Added the wait uci extension command to allow running simple tests from the command line.
* Removed the fen uci extension command as it was unnecessarily complicating things.
* Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
* Several build system changes and improvements.
* We now generate binaries for cuda 12, onnx-trt and macos.
* Support for using lc0 with openbench.
* New bench mode for a quicker benchmark.
* Assorted small fixes and improvements.
Published by borg323 7 months ago
lc0 - v0.31.0
In this version:
* The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
* Updated the internal Elo model to better align with regular Elo for human players.
* There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
* There is a vastly simplified lc0 interface available by renaming the executable to lc0simple.
* The backends can now suggest a minibatch size to the search, this is enabled by --minibatch-size=0 (the new default).
* If the cudnn backend detects an unsupported network architecture it will switch to the cuda backend.
* Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
* While it is easy to get a single node policy evaluation (go nodes 1 using uci), there was no simple way to get the effect of a value only evaluation, so the --value-only option was added (see the sketch after this list).
* Button uci options were implemented and a button to clear the tree was added (as hidden option).
* Support for the uci go mate option was added.
* The rescorer can now be built from the lc0 code base instead of a separate branch.
* A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml - this has some overhead so it is only enabled for onnx-dml and can be switched off with the alt_layernorm=false backend option.
* The --onnx2pytorch option was added to leela2onnx to generate pytorch compatible models.
* There is a cuda min_batch backend option to reduce non-determinism with small batches.
* New options were added to onnx2leela to fix tf exported onnx models.
* The onnx backend can now be built for amd's rocm.
* Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
* Made the WDL Rescale sharpness limit configurable via the --wdl-max-s hidden option.
* The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by --task-workers=-1 (the new default).
* Changed cuda compilation options to use -arch=native or -arch=all-major if no specific version is requested, with fallback for older cuda that don't support those options.
* Updated android builds to use openblas 0.3.27.
* The WDLDrawRateTarget option now accepts the value 0 (new default) to retain raw WDL values if WDLCalibrationElo is set to 0 (default).
* Improvements to the verbose move stats if WDLEvalObjectivity is used.
* The centipawn score is displayed by default for old nets without WDL output.
* Several assorted fixes and code cleanups.
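A minimal sketch of the single-node and value-only evaluation modes described above, driving lc0 over UCI from Python. It assumes an lc0 binary on PATH and an auto-discoverable network file; go nodes 1 returns the policy-head pick, and launching the engine with --value-only (per the note above) selects moves by the value head instead.

```python
# Minimal sketch: talk UCI to an lc0 binary (binary name/path is an assumption).
import subprocess

p = subprocess.Popen(["lc0"],  # or ["lc0", "--value-only"] for value-head-only move selection
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send(cmd: str) -> None:
    p.stdin.write(cmd + "\n")
    p.stdin.flush()

send("uci")
send("position startpos moves e2e4")
send("go nodes 1")  # single-node search: the move preferred by the policy head
for line in p.stdout:
    if line.startswith("bestmove"):
        print(line.strip())
        break
send("quit")
p.wait()
```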
Published by borg323 over 1 year ago
lc0 - v0.31.0-rc3
In this version:
* The WDLDrawRateTarget option now accepts the value 0 (new default) to retain raw WDL values if WDLCalibrationElo is set to 0 (default).
* Improvements to the verbose move stats if WDLEvalObjectivity is used.
* The centipawn score is displayed by default for old nets without WDL output.
* Some build system improvements.
Published by borg323 over 1 year ago
lc0 - v0.31.0-rc2
In this version:
* Changed cuda compilation options to use -arch=native or -arch=all-major if no specific version is requested, with fallback for older cuda that don't support those options.
* Updated android builds to use openblas 0.3.27.
* A few small fixes.
Published by borg323 almost 2 years ago
lc0 - v0.31.0-rc1
In this version:
* The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
* Updated the internal Elo model to better align with regular Elo for human players.
* There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
* There is a vastly simplified lc0 interface available by renaming the executable to lc0simple.
* The backends can now suggest a minibatch size to the search, this is enabled by --minibatch-size=0 (the new default).
* If the cudnn backend detects an unsupported network architecture it will switch to the cuda backend.
* Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
* While it is easy to get a single node policy evaluation (go nodes 1 using uci), there was no simple way to get the effect of a value only evaluation, so the --value-only option was added.
* Button uci options were implemented and a button to clear the tree was added (as hidden option).
* Support for the uci go mate option was added.
* The rescorer can now be built from the lc0 code base instead of a separate branch.
* A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml - this has some overhead so it is only enabled for onnx-dml and can be switched off with the alt_layernorm=false backend option.
* The --onnx2pytorch option was added to leela2onnx to generate pytorch compatible models.
* There is a cuda min_batch backend option to reduce non-determinism with small batches.
* New options were added to onnx2leela to fix tf exported onnx models.
* The onnx backend can now be built for amd's rocm.
* Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
* Made the WDL Rescale sharpness limit configurable via the --wdl-max-s hidden option.
* The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by --task-workers=-1 (the new default).
* Several assorted fixes and code cleanups.
Published by borg323 almost 2 years ago
lc0 - v0.30.0
In this version:
- Support for networks with attention body and smolgen added to blas, cuda, metal and onnx backends.
- WDL conversion for more realistic WDL score and contempt. Adds an Elo based WDL transformation of the NN value head output. Helps with more accurate play at high level (WDL sharpening), more aggressive play against weaker opponents and draw avoiding openings (contempt), piece odds play. For details on how it works see https://lczero.org/blog/2023/07/the-lc0-v0.30.0-wdl-rescale/contempt-implementation/.
- A new score type WDL_mu which follows the new eval convention, where +1.00 means 50% white win chance.
- Changed mlh threshold effect to create a smooth transition. The WDL_mu score type is now the default and the --moves-left-threshold default was changed from 0 to 0.8.
- Simplified to a single --draw-score parameter, adjusting the draw score from white's perspective: 0 gives standard scoring, -1 gives Armageddon scoring.
- Updated describenet for new net architectures.
- Added a first-move-bonus option to the legacy time manager, to accompany book-ply-bonus for shallow openings.
- Persistent L2 cache optimization for the cuda backend. Use the cache_opt=true backend option to turn it on.
- Some performance improvements for the cuda, onnx and blas backends.
- Added the threads backend option to onnx, defaults to 0 (let the onnxruntime decide) except for onnx-cpu that defaults to 1.
- The onnx-dml package now includes a directml.dll installation script.
- Some users experienced memory issues with onnx-dml, so the defaults were changed. This may affect performance, in which case you can use the steps=8 backend option to get the old behavior.
- The Python bindings are available as a package, see the README for instructions (a rough usage sketch follows this list).
- Revised 'simple' time manager.
- A new spinlock implementation (selected with --search-spin-backoff) to help with many cpu threads (e.g. 128 threads), obviously for cpu backends only.
- Fixes for contempt with infinite search/pondering and for the wdl display when pondering.
- Some assorted fixes and code cleanups.
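Since the packaged Python bindings are mentioned above, here is a rough usage sketch. The class and method names (Weights, Backend, GameState in lczero.backends) follow the project README as far as I recall; treat the exact API as an assumption and check the README.

```python
# Rough sketch of the lczero.backends package (verify names against the README).
from lczero.backends import Weights, Backend, GameState

w = Weights()                        # auto-discovers a network file
b = Backend(weights=w)               # default backend
g = GameState(moves=["e2e4", "e7e5"])
out = b.evaluate(g.as_input(b))[0]   # evaluate a single position
print(out.q())                       # value head output for the side to move
```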
Published by borg323 over 2 years ago
lc0 - v0.30.0-rc2
In this release:
* WDL conversion for more realistic WDL score and contempt. Adds an Elo based
WDL transformation of the NN value head output. Helps with more accurate play
at high level (WDL sharpening), more aggressive play against weaker opponents
and draw avoiding openings (contempt), piece odds play. There will be a blog
post soon explaining in detail how it works.
* A new score type WDL_mu which follows the new eval convention, where +1.00
means 50% white win chance.
* Simplified to a single --draw-score parameter, adjusting the draw score from
white's perspective: 0 gives standard scoring, -1 gives Armageddon scoring.
* Updated describenet for new net architectures.
* Added a first-move-bonus option to the legacy time manager, to accompany
book-ply-bonus for shallow openings.
* Changed mlh threshold effect to create a smooth transition.
* Revised 'simple' time manager.
* A new spinlock implementation (selected with --search-spin-backoff) to help
with many cpu threads (e.g. 128 threads), obviously for cpu backends only.
* Some assorted fixes and code cleanups.
Published by borg323 over 2 years ago
lc0 - v0.30.0-rc1
In this release:
* Support for networks with attention body and smolgen added to blas, cuda, metal and onnx backends.
* Persistent L2 cache optimization for the cuda backend. Use the cache_opt=true backend option to turn it on.
* Some performance improvements for the cuda, onnx and blas backends.
* Added the threads backend option to onnx, defaults to 0 (let the onnxruntime decide) except for onnx-cpu that defaults to 1.
* The onnx-dml package now includes a directml.dll installation script.
* Some users experienced memory issues with onnx-dml, so the defaults were changed. This may affect performance, in which case you can use the steps=8 backend option to get the old behavior.
* The Python bindings are available as a package, see the README for instructions.
* Some assorted fixes and code cleanups.
Published by borg323 almost 3 years ago
lc0 - v0.29.0
In this release:
* New metal backend for apple systems. This is now the default backend for macos builds.
* New onnx-dml backend to use DirectML under windows, has better net compatibility than dx12 and is faster than opencl. See the README for use instructions, a separate download of the DirectML dll is required.
* Full attention policy support in cuda, cudnn, metal, onnx, blas, dnnl, and eigen backends.
* Partial attention policy support in onednn backend (good enough for T79).
* Non multigather (legacy) search code and --multigather option are removed.
* Now the onnx backends can use fp16 when running with a network file (not with .onnx model files). This is the default for onnx-cuda and onnx-dml, and can be switched on or off by setting the fp16 backend option to true or false respectively.
* The onednn package comes with the latest dnnl compiled to allow running on an intel gpu by adding gpu=0 to the backend options.
* The default net is now 791556 for most backends except opencl and dx12 that get 753723 (as they lack attention policy support).
* Support for using pgn book with long lines in training: selfplay can start at a random point in the book.
* New "simple" time manager.
* Support for double Fischer random chess (dfrc).
* Added TC-dependent output to the backendbench assistant.
* Starting with this version, the check backend compares policy for valid moves after softmax.
* The onnx backend now allows selecting gpu to use.
* Improved error messages for unsupported network files.
* Some assorted fixes and code cleanups.
Published by borg323 about 3 years ago
lc0 - v0.29.0-rc1
In this release:
* New metal backend for apple systems. This is now the default backend for
macos builds.
* New onnx-dml backend to use DirectML under windows, has better net
compatibility than dx12 and is faster than opencl. See the README for use
instructions, a separate download of the DirectML dll is required.
* Full attention policy support in cuda, cudnn, metal, onnx, blas, dnnl, and
eigen backends.
* Partial attention policy support in onednn backend (good enough for T79).
* Now the onnx backends can use fp16 when running with a network file (not with
.onnx model files). This is the default for onnx-cuda and onnx-dml, can be
switched on or off by setting the fp16 backend option to true or
false respectively.
* The onednn package comes with a dnnl compiled to allow running on an intel gpu
by adding gpu=0 to the backend options.
* The default net is now 791556 for most backends except opencl and dx12 that
get 753723 (as they lack attention policy support).
* Support for using pgn book with long lines in training: selfplay can start at
a random point in the book.
* New "simple" time manager.
* Support for double Fischer random chess (dfrc).
* Added TC-dependent output to the backendbench assistant.
* Starting with this version, the check backend compares policy for valid moves
after softmax.
* Some assorted fixes and code cleanups.
Published by borg323 about 3 years ago
lc0 - v0.29.0-rc0
In this release:
* Initial support for attention policy, only cuda backend and partially in
blas/dnnl/eigen (good enough for T79).
* Non multigather (legacy) search code and --multigather option are removed.
* 15b default net is now 753723.
* The onnx backend now allows selecting gpu to use.
* Improved error messages for unsupported network files.
* Some assorted fixes.
Published by borg323 almost 4 years ago
lc0 - v0.28.2
This is what should have been v0.28.1:
* Improved cuda performance for 512 filter networks on Ampere GPUs.
* Several fixes for the onnx backend.
* New lc0 modes to process network files: describenet, leela2onnx and onnx2leela.
* Documentation updates.
* Correctness fixes for rescorer support functions.
Published by borg323 about 4 years ago
lc0 - v0.28.1-rc1
- Improved cuda performance for 512 filter networks on Ampere GPUs.
- Several fixes for the onnx backend.
- Command line options for network file conversion to/from onnx.
- Documentation updates.
- Correctness fixes for rescorer support functions.
Published by borg323 about 4 years ago
lc0 - v0.28.0
In this release:
* Multigather is now made the default (and also improved). Some search settings have changed meaning, so if you have modified values please discard them. Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have changed default values, but other options also affect the search. Similarly, check that your GUI is not caching the old values.
* Updated several other default parameter values, including the MLH ones.
* Performance improvements for the cuda/cudnn backends. This includes the multi_stream cuda backend option that is off by default. You should test adding multi_stream=true to backend-opts (command line) or BackendOptions (UCI) if you have a recent GPU with a lot of VRAM.
* Support for policy focus during training.
* Larger/stronger 15b default net for all packages except android, blas and dnnl that get a new 10b network.
* The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
* The legacy time manager is again the default and will use more time for the first move after a long book line.
* The --preload command line flag will initialize the backend and load the network during startup. This may help in cases where the GUI is confused by long start times, but only if backend and network are not changed via UCI options.
* A 'fen' command was added as a UCI extension to print the current position.
* Experimental onednn backend for recent intel CPUs and GPUs.
* Added support for ONNX network files and runtime with the onnx backend.
* Several bug and stability fixes.
Note: Some small third-party nets seem to play really bad with the dx12 backend and certain GPU drivers, setting the enable-gemm-metacommand=false backend option is reported to work around this issue.
Published by borg323 over 4 years ago
lc0 - v0.28.0-rc2
- The cuda backend option multi_stream is now off by default. You should consider setting it to on if you have a recent gpu with a lot of vram.
- Updated default parameters.
- Newer and stronger nets are included in the release packages.
- Added support for onnx network files and runtime with the "onnx" backend.
- Several bug and stability fixes.
Published by borg323 over 4 years ago
lc0 - v0.28.0-rc1
- Multigather is now made the default (and also improved). Some search settings
have changed meaning, so if you have modified values please discard them.
Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have changed default values, but other options also affect the search. Similarly, check that your gui is not caching the old values.
- Performance improvements for the cuda/cudnn backends.
- Support for policy focus during training.
- Larger/stronger 15b default net for all packages except android, blas and dnnl that get a new 10b network.
- The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
- The legacy time manager will use more time for the first move after a long book line.
- The --preload command line flag will initialize the backend and load the network during startup.
- A 'fen' command was added as a UCI extension to print the current position.
- Experimental onednn backend for recent intel cpus and gpus.
Published by borg323 over 4 years ago
lc0 - v0.27.0-rc2
- Fix additional cases where 'invalid move' could be incorrectly reported.
- Replace WDL softmax in cudnn backend with same implementation as cuda backend. This fixes some inaccuracy issues that were causing training data to be rejected at a fairly low frequency.
- Ensure that training data Q/D pairs form valid WDL targets even if there is accumulated drift in calculation.
- Fix for the calculation of the 'best q is proven' bit in training data.
- Multiple fixes for timelosses and infinite instamoving in smooth time manager. Smooth time manager now made default after these fixes.
Published by Tilps about 5 years ago
lc0 - v0.27.0-rc1
- Fix a bug which meant position ... moves ... didn't work if the moves went off the end of the existing tree. (Which happens normally when playing from an opening book.)
Published by Tilps about 5 years ago
lc0 - v0.27.0-rc0
Note: This version is very broken, do not attempt to use it.
- Multigather search inspired by Ceres. (Default is off. Note that the meaning of max-collision-events changes considerably when enabled and max-collision-visits will need to be set to a value close to previous values of max-collision-events in order to have similar search behavior.)
- V6 training format with additional info for training experiments.
- Updated default search parameters.
- A better algorithm for the backendbench assistant.
- Terminate search early if only 1 move isn't a proven loss.
- Various build system changes.
Published by Tilps about 5 years ago
lc0 - v0.26.3
Starting with this release, we are distributing two packages for windows with Nvidia GPUs: the cuda package and the cudnn package. The cudnn package is what we used to distribute so far (but we called it cuda), and comes with the same versions of cuda and cudnn dlls we were using for the last few months. The new cuda package comes with cuda 11.1 dlls and requires at least version 456.38 of the windows Nvidia drivers, and should give better performance on RTX cards and in particular the new RTX 30XX cards.
Notes:
1. The cudnn package will work as-is in existing setups, but for the cuda package you may have to replace cudnn with cuda (or cuda-auto or cuda-fp16) as a backend (if specified) - this will certainly be necessary for multi-gpu setups.
2. Some testing indicates that cuda 11.1 may be slower for GTX 10XX cards, so owners of older cards may want to stay with the cudnn package. If your testing shows otherwise do let us know.
Published by borg323 over 5 years ago
lc0 - v0.26.3-rc2
- Fix for uninitialized variable that led to crashes with the cudnn backend.
- Correct windows support for systems with more than 64 threads.
- A new package is built for the cuda backend with cuda 11.1. The old cuda package is renamed to cudnn.
Note: The cuda package requires nvidia driver 456.38 or newer.
Published by borg323 over 5 years ago
lc0 - v0.26.3-rc1
- Residual block fusion optimization for cudnn backend, that depends on custom_winograd=true. Enabled by default only for networks with up to 384 filters in fp16 mode and never in fp32 mode. Default can be overridden with --backend-opts=res_block_fusing=false to disable (or =true to enable).
- New experimental cuda backend without cudnn dependency (cuda-auto, cuda and cuda-fp16 are available).
Published by borg323 over 5 years ago
lc0 - v0.26.2-rc1
- Repetitions in the search tree are marked as draws, to explore more promising lines. Enabled by default (except in selfplay mode); use --two-fold-draws=false to disable.
- Syzygy tablebase files can now be used in selfplay. Still need to add adjudication support before we can consider using this for training.
- Default net updated to 703810.
- Fix for book with CR/LF line endings.
- Updated Eigen wrap to use new download link.
If you build from source, note that old versions of meson cannot download from the new Eigen download link. You will either have to update meson or build with -Dblas=false.
Published by borg323 over 5 years ago
lc0 - v0.26.0-rc1
- Verbose move stats now includes a line for the root node itself.
- Added optional alphazero time manager type for fixed fraction of remaining time per move.
- The WL score is now tracked with double precision to improve accuracy during very long search.
- Fix for a performance bug when playing from tablebase position with tablebases enabled and the PV move was changing frequently.
- Illegal searchmove restrictions will now be ignored rather than crash.
- Policy is cleared for terminal losses to encourage better quality MLH estimates by reducing how many visits a move that will not be selected (unless all other options are equally bad) receives.
- Smart pruning will now cause leela to play immediately once mate score has been declared.
- Fix an issue where sometimes the pv reported wouldn't match the move that would be selected at that moment.
- Improved logic for when to disable the custom_winograd optimization to avoid running out of video ram.
- --show-hidden can now be specified after --help and still work.
- Performance tuning for populating the policy into nodes after nn eval completes.
- Enable custom optimized SE paths for nets with 384 filters when using the custom_winograd=false path.
- Updates to zlib/gtest/eigen when included via meson wrap.
- Added build option to build python bindings to the lc0 engine.
- Only show the git hash in uci name if not a release tag build.
- Add --nps-limit option to artificially reduce nps to make for an easier opponent or whatever other reason you want.
- Fixed a bug where search tree shape could be affected even when the --smart-pruning-factor setting was 0.
- Changed the search logic to find the lc0.config file if left on the default value.
- Changed the search logic to find network files in autodiscover mode.
- Changed the logic to determine the default location for training games generated by selfplay in training mode.
- Changed the logic to decide where to look for the opencl backend tuning settings file.
- Android binaries published by appveyor are now stripped.
- Build can now use system installed eigen if available.
- When nodes in the tree get proven terminal, parents are updated as if they had always been terminal. This allows for faster convergence on more accurate MLH estimates amongst other details.
- Removed shortsightedness and logit-q options that have not found a reliable use case.
- Fixed a bug where m_effect calculated as part of S in verbose move stats was not consistent with the value used in search itself.
- Added 'pro' mode as an alternative to --show-hidden for UCI hosts that do not support command line arguments. Simply rename the lc0 binary to include 'pro' in order to enable.
- backendbench now has a --clippy option to try and auto suggest which batch size is a good idea.
- The demux backend now splits the batch into equal sizes based on the number of threads that demux is using rather than the number of backends. By default this is no change as usually there is 1 thread per backend. But it makes it easier to use demux against a blas backend, sending one chunk per core.
- Added support for new training input variants canonicalhectoplies and canonicalhectoplies_armageddon.
- Fixed a bug where if the network search paths for autodiscover contain files which lc0 cannot open it would error out rather than continuing on to other files.
- Blas backends no longer have a blas_cores option, as it never seemed useful compared to running more threads at a higher level.
- --help-md option removed as it was deemed not very useful.
- Updated to the latest version of dnnl for the dnnl build.
- Selfplay mode now supports per color settings in addition to per player settings. Per player settings have higher priority if there is a conflict. This will be used as part of armageddon training.
- Added a new experimental backend type: recordreplay. This allows recording the output of a backend under a particular search and then replaying it back again later. Theoretically this lets you simulate a CPU bottlenecked environment but still use a search tree that is a match for what might be a GPU bottlenecked environment. In practice there are a lot of corner cases where replay is not reliable yet. At a minimum you must disable prefetch.
- During search the node tree is occasionally compacted to reduce cache misses during the search tree walk. New option --solid-tree-threshold can be used to adjust how aggressive this optimization is. Note that very small values can cause very large growth in ram usage and are not a good idea. The default value is a little conservative, if you have plenty of spare ram it can be good to decrease it a bit.
- Small performance optimization for windows build with MLH enabled.
- Meson configuration changed to build with LTO by default. Note that meson does not always configure visual studio project files to apply this correctly on windows.
- The included net in appveyor builds is now 703350. This network supports MLH although the default MLH parameters are still threshold 1.0 which means it will not trigger without parameter adjustment.
- New backend option to explicitly override the net details and force MLH disabled. If you weren't going to use MLH anyway, this may give a tiny nps increase.
- New flag --show-movesleft (or UCI_ShowMovesLeft for UCI hosts that support it) will cause movesleft (in moves) to be reported in the uci info messages. Only works with networks that have MLH enabled.
- More sensible default values for MLH are in. Note that threshold is still 1.0 by default, so that will still need to be configured to enable it.
- The smooth-experimental time manager has been renamed smooth and support added to increase search time whenever the best N does not correspond with the move with best utility estimate. legacy remains the default for now as smooth has only been tuned for short time controls and evidence suggests it doesn't scale with these defaults.
- Selfplay mode now supports a logfile parameter just like normal mode.
- Reinstated the 4 billion visit limit on search to avoid overflowing counters and causing very strange behavior to occur.
- Performance optimization to make tree walk faster by ensuring that node edges are always sorted by policy. This has some very small side effects to do with tiebreaks in search no longer always being dominated by movegen order.
- Appveyor built blas and Android binaries now default to minibatch size 1 and prefetch 0, which should be much better than the normal GPU optimized defaults. Note this only affects Appveyor built binaries.
- The included client in Windows Appveyor releases is now v27 and is named lc0-training-client.exe instead of client.exe.
Published by Tilps over 5 years ago
lc0 - v0.25.1
- Fixed some issues with cudnn backend on the 16xx GTX models and also for low memory devices with large network files where the new optimizations could result in out of memory errors.
- Added a workaround for a cutechess issue where reporting depth 0 during instamoves causes it to ignore our info message.
Published by Tilps almost 6 years ago
lc0 - v0.25.0
A few small updates since RC2. Lots of new stuff in this release, take a look at the RC1/RC2 release notes for details.
* Relax strictness for complete standard fens in uci and opening books. Fen must still be standard, but default values will be substituted for sections that are missing.
* Restore some backwards compatibility in cudnn backends that was lost with the addition of the new convolution implementation. It is also on by default for more scenarios, although still off for fp16 on RTX gpus.
* Small logic fix for nps smoothing in the new optional experimental time manager.
Published by Tilps almost 6 years ago
lc0 - v0.25.0-rc2
- Increased upper limit for maximum collision events.
- Allow negative values for some of the extended moves left head parameters.
- Fix a critical bug in training data generation for input type 3.
- Fix for switching between positions in uci mode that only differ by 50 move rule in initial fen.
- Some refinements of certainty propagation.
- Better support for c++17 implementations that are missing charconv.
- Option to more accurately apply time management for uci hosts using cuteseal or similar timing techniques.
- Fix for selfplay mode to allow exactly book length total games.
- Fix for selfplay opening books with castling moves starting from chess960 fens.
- Add build option to override nvcc compiler.
- Improved validity checking for some uci input parameters.
- Updated the Q to CP conversion formula to better fit recent T60 net outputs to expectations.
- Add a new optional experimental time manager.
- Bug fix for the Q+U in verbose move stats. It is now called S: and contains the total score, including any moves left based effect if applicable.
- New temperature decay option to allow to delay the start of decay.
- All temperature options have been hidden by default.
- New optional cuda backend convolution implementation. Off by default for cudnn-fp16 until an issue with cublas performance on some gpus is resolved.
Published by Tilps almost 6 years ago
lc0 - v0.25.0-rc1
- Now requires a c++17 supporting compilation environment to build.
- Support for Moves Left Head based networks. Includes options to adjust search to favour shorter/longer wins/losses based on the moves left head output.
- Mate score reporting is now possible, and move selection will prefer shorter mates over longer ones when they are proven.
- Training now outputs v5 format data. This passes the moves left information back to training. This also includes support for multiple sub formats, including the existing standard, a new variant which can encode FRC960 castling, and also a further extension of that which tries to make training data canonical, so there aren't multiple positions that are trivially equivalent with different network inputs.
- Benchmark now includes a suite of 34 positions to test by default instead of just start position.
- Tensorflow backend works once more, almost just as hard to compile as it used to be though.
- --noise flag is gone, use --noise-epsilon=0.25 to get the old behavior.
- Some bug fixes related to drawscore.
- Selfplay mode now defaults to the same value as match play for --root-has-own-cpuct-params (true).
- Some advanced time management parameters are now accessed via the new --time-manager parameter instead of individual parameters.
- Windows build script has been modernized.
- Separate Eigen backend option for CPU.
- Random backend no longer requires a network.
- Random backend supports producing training data of any input format sub type.
- Integer parameters now give better error messages when given invalid values.
Published by Tilps almost 6 years ago
lc0 -
- New parameter --max-out-of-order-evals-factor replaces --max-out-of-order-evals that was introduced in v0.24.0-rc3 and provides the factor to multiply the maximum batch size by to set the maximum number of out-of-order evals per batch. The default value of 1.0 keeps the behavior of previous releases.
- Bug fix for hangs with very early stop command from non-conforming uci hosts.
Published by mooskagh almost 6 years ago
lc0 -
- New parameter --max-out-of-order-evals to set the maximum number of out-of-order evals per batch (was equal to the batch size before).
- It's now possible to embed networks into the binary. It allows easier builds of .apk for Android.
- New parameter --smart-pruning-minimum-batches to only allow smart pruning to stop after at least k batches, preventing instamoves on slow backends.
Published by mooskagh almost 6 years ago
lc0 -
- All releases are now bundled with network id 591226 (and the file date is old enough so it has a lower priority than networks that you already may have in your directory).
- Added a 'backendbench' mode to benchmark NN evaluation performance without search.
- Android builds are added to the official releases.
Published by mooskagh almost 6 years ago
lc0 -
- Introduced DirectX12 backend.
- Optimized Cpuct/FPU parameters are now default.
- There is now a separate set of CPuct parameters for the root node.
- Support of running selfplay games from an opening book.
- It's possible to adjust draw score from 0 to something else.
- There is a new --max-concurrent-searchers parameter (default is 1) which helps with thread congestion at the beginning of the search.
- Cache fullness is not reported in UCI info line by default anymore.
- Removed libproto dependency.
Published by mooskagh almost 6 years ago
lc0 - v0.23.2
- Fixed a bug where odd length openings had reversed training data results in selfplay.
- Fixed a bug where zero length training games could be generated due to discard pile containing positions that were already considered end of game.
- Add cudnn-auto backend.
Published by Tilps about 6 years ago
lc0 -
- Fixed a bug with Lc0 crashing sometimes during match phase of training game generation.
- Release packages now include CUDNN version without DLLs bundled.
Published by mooskagh about 6 years ago
lc0 -
- Fixed the order of BLAS options so that Eigen is lower priority, to match assumption in check_opencl patch introduced in v0.23.0-rc2.
(no other changes since rc2)
Published by mooskagh about 6 years ago
lc0 -
- Fixes in nps and time reporting during search.
- Introduced DNNL BLAS build for modern CPUs in addition to OpenBLAS.
- Build fixes on MacOS without OpenCL.
- Fixed smart pruning and KLDGain trying to stop search in go infinite mode.
- OpenCL package now has a check_opencl tool to check that computation behaves sanely.
- Fixed a bug in interoperation of shortsightedness and certainty propagation.
Published by mooskagh about 6 years ago
lc0 -
- Support for Fischer Random Chess (UCI_Chess960 option to enable FRC-style castling). Also added support for FRC-compatible weight files, but no training code yet.
- New option --logit-q (UCI: LogitQ). Changes subtree selection algorithm a bit, possibly making it stronger (experimental, default off).
- Lc0 now reports WDL score. To enable it, use the --wdl-info command-line argument or the UCI_WdlInfo UCI option.
- Added "Badgame split" mode during the training. After the engine makes an inferior move due to temperature, the game is branched and later the game is replayed from the position of the branch.
- Added experimental --short-sightedness (UCI: ShortSightedness) parameter. Treats longer variations as more "drawish".
- Lc0 can now open Fat Fritz weight files.
- Time management code refactoring. No functional changes, but will make time management changes easier.
- Lc0 logo is now printed in red! \o/
- Command line argument -v is now short for --verbose-move-stats.
- Errors in --backend-opts parameter syntax are now reported.
- The most basic version of the "certainty propagation" feature (actually without "propagation"). If the engine sees a checkmate, it plays it! (before, it could play another good move instead).
- Various small changes: hidden options to control Dirichlet noise, floating point optimizations, better error reporting if there is an exception in a worker thread, better error messages in the CUDA backend.
Published by mooskagh about 6 years ago
lc0 -
Bunch of small changes that piled up since the last major release.
Published by mooskagh over 6 years ago
lc0 - (Do Not Use - incorrectly tagged) v0.21.5-rc1
Remove softmax calculation from backends and apply it after filtering for illegal moves to ensure spurious outputs on illegal moves don't reduce (or entirely remove) the quality of the policy values on the legal moves. This was especially noticeable on fp16 backends for nets trained with legal move masking, but could theoretically be an improvement for any net.
Published by Tilps over 6 years ago
lc0 - v0.21.4
Two small changes in this release.
- A fix for crashes that can occur during use of sticky-endgames.
- Change the false positive value reported when in wdl style resign and display average nodes per move as part of tournament stats in selfplay mode.
Published by Tilps over 6 years ago
lc0 -
Changes since v0.21.2-rc3
- Centipawn formula retweaked to show 128.00 instead of 127.99 pawns for checkmate.
Since v0.21.1
Highlights:
- --sticky-endgames (minimal version of certainty propagation)
- New centipawn formula
- Way to exit training gracefully
- Optimizations for GTX 16xx videocards (cudnn-fp16 works now)
- Optimizations for larger filter sizes
Published by mooskagh over 6 years ago
lc0 -
The only change from RC2 is a new centipawn formula:
centipawn = 295 * Q / (1 - 0.976953125 * Q^14)
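As a quick sanity check of the formula (the helper name below is just for illustration): at Q = 1 the denominator is 1 - 0.976953125 = 0.023046875, so in exact arithmetic the score is 295 / 0.023046875 = 12800 centipawns; floating point evaluation may display it very slightly differently.

```python
# Quick check of the centipawn display formula quoted above.
def centipawn(q: float) -> float:
    return 295 * q / (1 - 0.976953125 * q ** 14)

print(centipawn(0.5))  # ~147.5 centipawns
print(centipawn(1.0))  # 12800 in exact arithmetic (float output may differ in the last digits)
```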
Published by mooskagh over 6 years ago
lc0 -
Changes
- Add 320 and 352 channel support for fused SE layer (#855)
- SE layer fix when not using fused kernel (#852)
- Fp16 nchw for cudnn-fp16 backend (support GTX 16xx GPUs) (#849)
- Install lc0 on openSUSE - Full documentation, Install script and links to experimental RPM packages (#675)
Published by mooskagh almost 7 years ago
lc0 -
- Make --sticky-endgames on by default (still off in training) (#844)
- update download links in README (#842)
- Recalibrate centipawn formula (#841)
- Also make parents Terminal if any move is a win or all moves are loss or draw. (#822)
- Use parent Q as a default score instead of 0 for unvisited pv. (#828)
- Add stop command to selfplay interactive mode to allow for graceful exit. (#810)
- Increased hard limit on batch size in opencl backend to 32 (#807)
Published by mooskagh almost 7 years ago
lc0 - v0.21.1
- FPU can now be independently controlled at root vs in general using the new AtRoot variants. Default value for StrategyAtRoot is 'same' which means it uses the values from the normal parameters. The reduction parameter has been removed - strategy reduction now uses the value parameter as the reduction amount. NOTE: If you are using a UCI host that remembers settings over restarts (like cutechess) be careful to ensure that when upgrading to this new version you reset settings back to defaults - or you may get a bad combination of fpu settings.
- TempVisitOffset option now allows you to specify values less than -1, and its documentation has been fixed.
- A small performance improvement in movegen.
- Windows packages should include v22 of the Client.
Published by Tilps almost 7 years ago
lc0 - v0.21.0-rc2
- Add support for cudnn7.0 (#717)
- Informative Tournament Stats (#698)
- Memory leak fix cuda backend (#747)
- cudnn-fp16 fallback path for unusual se-ratios. (#739)
- Cudnn 7.4.2 in packaged binary and warning for using old cudnn with new gpu (#741)
- Move mode specific options to end of help. (#745)
- LogLiveStats hidden option (#754)
- Optional markdown support for help output (#769)
- Improved folding of batch norm into weights and biases - fixes negative gamma bug. (#779)
Published by Tilps almost 7 years ago
lc0 - v0.21.0-rc1
Major new features in this release are support for WDL value head, and convolution direct output (AZ-style) policy head.
- Check Syzygy tablebase file sizes for corruption (#690)
- search for nvcc on the path first (#709)
- AZ-style policy head support (#712)
- Implement V4TrainingData (#722)
- WDL value head support (#635)
- Add option for doing kldgain thresholding rather than absolute visit limiting (#721)
- Easily run latest releases of lc0 and client using NVIDIA docker (#621)
- Add WDL style resign option. (#724)
- Add a uniform output option for random backend to support a0 seed data style (#725)
- Fix c hw switching in cudnn-fp16 mode with convolution policy head. (#729)
- misc (non-functional) changes to cudnn backend (#731)
- handle 64 filter SE networks (#624)
Published by Tilps about 7 years ago
lc0 -
Changes
(relative to v0.20.2-rc1)
- Favor winning moves that minimize DTZ to reduce shuffling by assuming repeated position by default (#708)
- Print cuda and gpu info, warn if mismatches are noticed (#711)
Published by mooskagh about 7 years ago
lc0 -
Changes
- no terminal multivisits (#683)
- better fix for issue 651 (#693)
- Changed output of --help flag to stdout rather than stderr (#687)
- Movegen speedup via magic bitboards (#640)
- modify default benchmark setting to run for 10 seconds (#681)
- Fix incorrect index in OpenCL Winograd output transform (#676)
- Update OpenCL (#655)
Published by mooskagh about 7 years ago
lc0 -
Changes
vs v0.20.1-rc3:
- Change to atomic for cache capacity. (#665)
vs 0.20.0:
- Search algorithm performance optimizations.
- Time management logic has been also optimized.
- Fixed pondering with movetime bug.
- Fixed a few potential problems in source code.
Published by mooskagh about 7 years ago
lc0 -
Change
- Remove ffast-math from the default flags (#661)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Don't use Winograd for 1x1 conv. (#659)
- Fix issues with pondering and search limits. (#658)
- Check for zero capacity in cache (#648)
- fix undefined behavior in DiscoverWeightsFile() (#650)
- fix fastmath.h undefined behavior and clean it up (#643)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Search algorithm performance optimizations.
- Time management logic has been also optimized.
Full commit log:
- Simplify movestogo approximator to use median residual time. (#634)
- Replace time curve logic with movestogo approximator. (#271)
- Cache best edge to improve PickNodeToExtend performance. (#619)
- fix building with tensorflow 1.12 (#626)
- Minor changes to src/chess (#606)
- make uci search parameters the default ones (#609)
- Preallocate nodes in advance of their need to avoid the allocation being behind a mutex. (#613)
- improve meson error when no backends enabled (#614)
- allow building with the mklml library as an mkl alternative (#612)
- Only build the history up if we are actually going to extend the position. (#607)
- fix warning (#604)
Published by mooskagh about 7 years ago
lc0 -
Changes
vs v0.20.0-rc2
- no lto builds by default (#625)
vs 0.19.1
- Squeeze-and-Excitation Networks are now supported! (lc0.org/se)
- Older text network files are no longer supported.
- Various performance fixes (most major being having fast approximate math functions).
- For systems with multiple GPUs, in addition to "multiplexing" backend we now also have "demux" backend and "roundrobin" backend.
- Compiler settings tweaks (use VS2017 for windows builds, always have LTO enabled, windows releases have PGO enabled).
- Benchmark mode has more options now (e.g. movetime) and saner defaults.
- Added an option to prevent engine to resign too early (used in training).
- Fixed a bug when number of visits could be too high in collision nodes. The fix is pretty hacky, there will be a better fix later.
- 32-bit version compiles again.
Published by mooskagh about 7 years ago
lc0 -
Only one change in this release:
* Fix for demux backend to match cuda expected threading model for computations. (#605)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Squeeze-and-Excitation Networks are now supported! (lc0.org/se)
- Older text network files are no longer supported.
- Various performance fixes (most major being having fast approximate math functions).
- For systems with multiple GPUs, in addition to "multiplexing" backend we now also have "demux" backend and "roundrobin" backend.
- Compiler settings tweaks (use VS2017 for windows builds, always have LTO enabled, windows releases have PGO enabled).
- Benchmark mode has more options now (e.g. movetime) and saner defaults.
- Added an option to prevent engine to resign too early (used in training).
- Fixed a bug when number of visits could be too high in collision nodes. The fix is pretty hacky, there will be a better fix later.
- 32-bit version compiles again.
Published by mooskagh about 7 years ago
lc0 -
Changes
Relative to v0.19.0
- Parameters from AlphaZero paper were introduced
- Do not load network on isready.
- Fixed non-working nodes limit.
Relative to v0.19.1-rc2
(no changes)
Published by mooskagh about 7 years ago
lc0 -
Changelog
- Updated cpuct formula from alphazero paper. (#563)
- remove UpdateFromUciOptions() from EnsureReady() (#558)
- revert IsSearchActive() and better fix for one of #500 crashes (#555)
Published by mooskagh about 7 years ago
lc0 -
Change relative to RC5
- remove Wait() from EngineController::Stop() (#522)
Changes relative to v0.18
See v0.19.0-RC1 notes.
Published by mooskagh over 7 years ago
lc0 -
Changes
- OpenCL: replace thread_local with a resource pool. (#516)
- optional wtime and btime (#515)
- Make convolve1 work with workgroup size of 128 (#514)
- adjust average depth calculation for multivisits (#510)
Published by mooskagh over 7 years ago
lc0 -
Changelog
- Microseconds have 6 digits, not 3! (#505)
- use bestmoveissent_ for Search::IsSearchActive() (#502)
Published by mooskagh over 7 years ago
lc0 -
Changelog
- Fix OpenCL tuner always loading the first saved tuning (#491)
- Do not show warning when ComputeBlocking() takes too much time. (#494)
- Output microseconds in log rather than milliseconds. (#495)
- Add benchmark features (#483)
- Fix EncodePositionForNN test failure (#490)
Published by mooskagh over 7 years ago
lc0 -
- Search algorithm changes
When visiting terminal nodes and collisions, instead of counting that as one visit, estimate how many subsequent visits will also go to the same node, and do a batch update.
That should slightly improve nps near terminal nodes and in multithread configurations. Command line parameters that control that:
--max-collision-events – number of collision events allowed per batch. Default is 32. This parameter is roughly equivalent to --allowed-node-collisions in v0.18.
--max-collision-visits – total number of estimated collisions per NN batch. Default is 9999.
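To make the batched update concrete, here is a toy illustration (not lc0 source code) of crediting several estimated visits to a terminal or collided node in one update instead of one backup per traversal:

```python
# Toy illustration only (not lc0 code): one batched update standing in for
# n_estimated identical single-visit backups to the same node.
class Node:
    def __init__(self):
        self.n = 0     # visit count
        self.w = 0.0   # accumulated value

    @property
    def q(self) -> float:
        return self.w / self.n if self.n else 0.0

def batched_backup(node: Node, value: float, n_estimated: int) -> None:
    node.n += n_estimated
    node.w += value * n_estimated

terminal = Node()
batched_backup(terminal, value=1.0, n_estimated=8)  # e.g. 8 visits hit the same terminal node
print(terminal.n, terminal.q)  # -> 8 1.0
```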
- Time management
Multiple changes have been done to make Leela track used time more precisely (particularly, the moment the timer is started is now much closer to the moment GUIs start their timer).
For smart pruning, Leela's timer only starts when the first batch comes back from NN eval. That should help against instamoves, especially on non-even GPUs.
Also Leela stops the search quicker now when it sees that time is up (previously it could continue the search for hundreds of milliseconds after that, which caused time trouble if the opponent moves very fast).
Those changes should help a lot in ultra-bullet configurations.
- Better logging
Much more information is now output to the log file. That will allow us to diagnose problems more easily if they occur. To have a debug file written, add a command line option:
--logfile=/path/to/logfile
(or short option "-l /path/to/logfile", or corresponding UCI option "LogFile")
It's recommended to always have logging on, to make it easier to report bugs when they happen.
- Configuration parameters change
A large part of parameter handling has been reworked. As a result:
All UCI parameters have been changed to have a more "classical" look. E.g. was "Network weights file path", became "WeightsFile".
Much more detailed help is shown than before when you run ./lc0 --help
Some flags have been renamed, e.g. --futile-move-aversion is renamed back to --smart-pruning-factor.
After setting a parameter (using command line parameter or uci setoption command), uci command "uci" shows updated result. That way you can check the current option values.
Some command-line and UCI options are hidden now. Use --show-hidden command line parameter to unhide them. E.g. ./lc0 --show-hidden --help
Also, in selfplay mode the per player configuration format has been changed (although probably no one knew about it anyway): Was: ./lc0 selfplay player1: --movetime=14 Became: ./lc0 selfplay --player1.movetime=14
- Other
"go depth X" uci command now causes search to stop when depth information in uci info line reaches X. Not that it makes much sense for it to work this way, but at least it's better than noting.
Network file size can now be larger than 64MB.
There is now an experimental flag --ramlimit-mb. The engine tries to estimate
how much memory it uses and stops search when tree size (plus cache size)
reaches RAM limit. The estimation is very rough. We'll see how it performs and
improve estimation later.
In situations when search cannot be stopped (go infinite or ponder), bestmove is not automatically outputted. Instead, the search stops making progress and outputs a warning.
Benchmark mode has been implemented. To run it, use the following command line: ./lc0 benchmark. This feature is pretty basic in the current version, but will be expanded later.
As Leela plays much weaker in positions without history, it is now able to synthesize it and not blunder in custom FEN positions. There is a --history-fill flag for it. Setting it to "no" disables the feature, setting to "fen_only" (default) enables it for all positions except the chess start position, and setting it to "always" enables it even for startpos.
Instead of outputting the current win estimation as a centipawn score approximation, Leela can now show its raw score. A flag that controls that is --score-type. Possible values: - centipawn (default) – approximate the win rate in centipawns, like Leela always did. - win_percentage – value from 0 to 100.0 which represents the expected score in percent. - Q – the same, but scaled from -100.0 to 100.0 rather than from 0 to 100.0.
Published by mooskagh over 7 years ago
lc0 -
Severe bug fixed: Race condition when out-of-order-eval was enabled (and it was enabled by default)
Windows 32-bit builds are now possible (CPU only for now)
Published by mooskagh over 7 years ago
lc0 -
KNOWN BUG!
- We have credible reports that in some rare cases Lc0 crashes!
However, we were not able to reproduce it reliably. If you see the crash, please report to devs! What seems to increase crash probability:
- Very short move time (milliseconds)
- Proximity to a checkmate (happens 1-3 moves before the checkmate)
New features:
Endgame tablebases support! Both WDL and DTZ now.
Added MultiPv support.
Time management changes:
Introduced --immediate-time-use flag. Yes, yet another time management flag. Possible values are between 0.0 and 1.0. Setting it closer to 1.0 makes Leela use time saved from futile search aversion earlier.
Some time management parameters were changed:
- Slowmover is 1.0 now (was 2.4)
- Immediate-time-use is 0.6 now (didn't exist before, so was 0.0)
Fixed a bug, because of which futile search aversion tolerance was incorrectly applied, which resulted in instamoves.
Now search stops immediately when it runs out of budgeted time. Should help against timeouts, especially on slow backends (e.g. BLAS).
Move overhead now is a fixed time, doesn't depend on number of remaining moves.
Other:
Out of order eval is on by default. That brings slight nps improvement.
Default FPU reduction is 1.2 now (was 0.9)
Cudnn backend now has maxbatch parameter. (can be set for example like this --backend-opts=maxbatch=100). This is needed for lower end GPUs that didn't have enough VRAM for a buffer of size 1024. Make sure that this setting is not lower than --minibatch-size.
Small memory usage optimizations.
Engine name in UCI response is shorter now. The Fritz chess UI should be able to work with Leela now.
Added flag --temp-visit-offset, which allows offsetting temperature during training.
Command line and UCI parameter values are now checked for validity.
You can now build for older processors that don't support the popcnt instruction by passing -Dpopcnt=false to meson when building.
32-bit build is possible now. CPU only, and we were only able to build it on Linux for now, including Raspberry Pi.
Threading issue which caused crash in heavily multithreaded environment with slow backends was fixed.
Published by mooskagh over 7 years ago
lc0 -
Changes relative to RC1
- Fixed a bug where the rule50 value was located in the wrong place in the training data.
- OpenCL uses much less VRAM now.
- Default OpenCL batch size is 16 now (was 1).
- Default time management related configuration was tweaked: --futile-move-aversion is 1.33 now (was 1.47) --slowmover is 2.4 now (was 2.6)
Published by mooskagh over 7 years ago
lc0 -
Changes
New visible features
- Implemented ponder support.
- Tablebases are supported now (only WDL probe for now). Command line parameter is --syzygy-paths=/path/to/syzygy/
- Old smart pruning flag is gone. Instead there is --futile-search-aversion flag. --futile-search-aversion=0 is equivalent to old --no-smart-pruning. --futile-search-aversion=1 is equivalent to old --smart-pruning. Now default is 1.47, which means that engine will sometimes decide to stop search earlier even when there is theoretical chance (but not very probable) that best move decision could be changed if allowed to think more.
Lc0 now supports configuration files. Options can be listed there instead of command line flags / uci params. Config should be named lc0.config and located in the same directory as lc0. Should list one command line option per line, with '--' in the beginning being optional, for example:
syzygy-paths=/path/to/syzygy/
In uci info, "depth" is now average depth rather than full depth (which was 4 all the time). Also, depth values do not include reused tree, only nodes visited during the current search session.
--sticky-checkmates experimental flag (default off), supposed to find shorter checkmate sequences.
More features in backend "check".
Performance optimizations
- Release windows executables are built with "whole program optimization".
- Added --out-of-order-eval flag (default is off). Switching it on makes cached/terminal nodes higher priority, which increases nps.
- OpenCL backend now supports batches (up to 5x speedup!)
- Performance optimizations for BLAS backend.
- Total visited policy (for FPU reduction) is now cached.
- Values of priors (P) are stored now as 16-bit float rather than 32-bit float, which saves a considerable amount of RAM.
Bugfixes
- Fixed an en passant detection bug which caused the position after a pawn moved by two squares not to be counted towards threefold repetition even if en passant was not possible.
- Fixed the bug which caused --cache-history-length for values 2..7 work the same as --cache-history-length=1. This is fixed, but default is temporarily changed to --cache-history-length=1 during play. (For training games, it's 7)
Removed features
- Backpropagation beta / backpropagation gamma parameters have been removed.
Other changes
- Release lc0-windows-cuda.zip package now contains NVidia CUDA and cuDNN .dlls.
Published by mooskagh over 7 years ago
lc0 -
- Fully switched to official releases! No more https://crem.xyz/lc0/
- Fixed a bug where pv display and smart pruning sometimes didn't work properly after tree reuse.
- Format of protobuf network files was changed.
- Autodiscovery of protobuf based network files works now.
Published by borg323 over 7 years ago