Recent Releases of lc0
lc0 - v0.32.0
In this release, the code has been reorganized and undergone major changes. Therefore this changelog will be less detailed and describe the changes in major groups.
* We have a new search API that allows search algorithms to co-exist. Currently available are classic (the default), dag-preview (more later), valuehead and policyhead. The default algorithm can be changed either at build time by the default_search option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
* We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
* The source code is reorganized, with a more logical directory structure.
* The original search was ported to the new search and backend interfaces and is renamed to classic. This has allowed some streamlining and simplifications.
* The dag-preview search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as thoroughly tested, which is why it has "preview" in its name for now; it lives in the src/search/dag-classic directory.
* The valuehead search replaces ValueOnly mode and selects the move with the best value head evaluation.
* The policyhead search is equivalent to a single node search, selecting the best move using just the policy head.
* The new default_backend build option allows overriding the fixed priority for the backend used by default.
* The new native_arch build option overrides the -march=native compiler default for linux release builds, to help with distribution package creation.
* We have a new sycl backend that will work with amd, intel and nvidia gpus.
* There is also a new onnx-trt backend, using tensorrt on nvidia gpus.
* The metal backend received several improvements.
* Support for the simple/normal/pro modes in options was cleaned up, using a common mechanism.
* Added the wait uci extension command to allow running simple tests from the command line.
* Removed the fen uci extension command as it was unnecessarily complicating things.
* Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
* Several build system changes and improvements.
* We now generate binaries for cuda 12, onnx-trt and macos.
* The onnx-trt package has a readme with instructions and an install script.
* Support for using lc0 with openbench.
* New bench mode for a quicker benchmark.
* RPE nets are now detected and give an error instead of bad results.
* The rescorer code and training data header were refactored to make them usable by external tools.
* Assorted small fixes and improvements.
Published by borg323 6 months ago
lc0 - v0.32.0-rc2
In this version:
* Fix for onnx-trt bug, where the wrong network could be used from the cache.
* Added code to detect RPE nets and give an error instead of bad results.
* Better instructions in the readme and install script for onnx-trt.
* Made UCI_ShowWDL off by default again as some GUIs have issues.
* Fixed a long standing issue when compiled with -ffast-math (or icx -O3).
* Several improvements to the sycl backend.
* Several improvements to the metal backend.
* Refactored the rescorer code and training data header to make them usable by external tools.
* Relaxed cuda/cudnn version checks so that no warnings are shown for mismatched versions that are supported.
* Several build system updates.
* Assorted small fixes and improvements.
Published by borg323 6 months ago
lc0 - v0.32.0-rc1
In this release, the code has been reorganized and undergone major changes. Therefore this changelog will be less detailed and describe the changes in major groups.
* We have a new search API that allows search algorithms to co-exist. Currently available are classic (the default), dag-preview (more later), valuehead and policyhead. The default algorithm can be changed either at build time by the default_search option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
* We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
* The source code is reorganized, with a more logical directory structure.
* The original search was ported to the new search and backend interfaces and is renamed to classic. This has allowed some streamlining and simplifications.
* The dag-preview search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as thoroughly tested, which is why it has "preview" in its name for now; it lives in the src/search/dag-classic directory.
* The valuehead search replaces ValueOnly mode and selects the move with the best value head evaluation.
* The policyhead search is equivalent to a single node search, selecting the best move using just the policy head.
* The new default_backend build option allows overriding the fixed priority for the backend used by default.
* The new native_arch build option overrides the -march=native compiler default for linux release builds, to help with distribution package creation.
* We have a new sycl backend that will work with amd, intel and nvidia gpus.
* There is also a new onnx-trt backend, using tensorrt on nvidia gpus.
* Support for the simple/normal/pro modes in options was cleaned up, using a common mechanism.
* Added the wait uci extension command to allow running simple tests from the command line.
* Removed the fen uci extension command as it was unnecessarily complicating things.
* Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
* Several build system changes and improvements.
* We now generate binaries for cuda 12, onnx-trt and macos.
* Support for using lc0 with openbench.
* New bench mode for a quicker benchmark.
* Assorted small fixes and improvements.
Published by borg323 7 months ago
lc0 - v0.31.0
In this version:
* The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
* Updated the internal Elo model to better align with regular Elo for human players.
* There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
* There is a vastly simplified lc0 interface available by renaming the executable to lc0simple.
* The backends can now suggest a minibatch size to the search, this is enabled by --minibatch-size=0 (the new default).
* If the cudnn backend detects an unsupported network architecture it will switch to the cuda backend.
* Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
* While it is easy to get a single node policy evaluation (go nodes 1 using uci), there was no simple way to get the effect of a value only evaluation, so the --value-only option was added (see the sketch after this list).
* Button uci options were implemented and a button to clear the tree was added (as hidden option).
* Support for the uci go mate option was added.
* The rescorer can now be built from the lc0 code base instead of a separate branch.
* A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml - this has some overhead so it is only enabled for onnx-dml and can be switched off with the alt_layernorm=false backend option.
* The --onnx2pytorch option was added to leela2onnx to generate pytorch compatible models.
* There is a cuda min_batch backend option to reduce non-determinism with small batches.
* New options were added to onnx2leela to fix tf exported onnx models.
* The onnx backend can now be built for amd's rocm.
* Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
* Made the WDL Rescale sharpness limit configurable via the --wdl-max-s hidden option.
* The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by --task-workers=-1 (the new default).
* Changed cuda compilation options to use -arch=native or -arch=all-major if no specific version is requested, with fallback for older cuda that don't support those options.
* Updated android builds to use openblas 0.3.27.
* The WDLDrawRateTarget option now accepts the value 0 (new default) to retain raw WDL values if WDLCalibrationElo is set to 0 (default).
* Improvements to the verbose move stats if WDLEvalObjectivity is used.
* The centipawn score is displayed by default for old nets without WDL output.
* Several assorted fixes and code cleanups.
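A minimal sketch of the single-node and value-only evaluation modes described above, driving lc0 over UCI from Python. It assumes an lc0 binary on PATH and an auto-discoverable network file; go nodes 1 returns the policy-head pick, and launching the engine with --value-only (per the note above) selects moves by the value head instead.

```python
# Minimal sketch: talk UCI to an lc0 binary (binary name/path is an assumption).
import subprocess

p = subprocess.Popen(["lc0"],  # or ["lc0", "--value-only"] for value-head-only move selection
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send(cmd: str) -> None:
    p.stdin.write(cmd + "\n")
    p.stdin.flush()

send("uci")
send("position startpos moves e2e4")
send("go nodes 1")  # single-node search: the move preferred by the policy head
for line in p.stdout:
    if line.startswith("bestmove"):
        print(line.strip())
        break
send("quit")
p.wait()
```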
Published by borg323 over 1 year ago
lc0 - v0.31.0-rc3
In this version:
* The WDLDrawRateTarget option now accepts the value 0 (new default) to retain raw WDL values if WDLCalibrationElo is set to 0 (default).
* Improvements to the verbose move stats if WDLEvalObjectivity is used.
* The centipawn score is displayed by default for old nets without WDL output.
* Some build system improvements.
Published by borg323 over 1 year ago
lc0 - v0.31.0-rc2
In this version:
* Changed cuda compilation options to use -arch=native or -arch=all-major if no specific version is requested, with fallback for older cuda that don't support those options.
* Updated android builds to use openblas 0.3.27.
* A few small fixes.
Published by borg323 almost 2 years ago
lc0 - v0.31.0-rc1
In this version:
* The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
* Updated the internal Elo model to better align with regular Elo for human players.
* There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
* There is a vastly simplified lc0 interface available by renaming the executable to lc0simple.
* The backends can now suggest a minibatch size to the search, this is enabled by --minibatch-size=0 (the new default).
* If the cudnn backend detects an unsupported network architecture it will switch to the cuda backend.
* Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
* While it is easy to get a single node policy evaluation (go nodes 1 using uci), there was no simple way to get the effect of a value only evaluation, so the --value-only option was added.
* Button uci options were implemented and a button to clear the tree was added (as hidden option).
* Support for the uci go mate option was added.
* The rescorer can now be built from the lc0 code base instead of a separate branch.
* A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml - this has some overhead so it is only enabled for onnx-dml and can be switched off with the alt_layernorm=false backend option.
* The --onnx2pytorch option was added to leela2onnx to generate pytorch compatible models.
* There is a cuda min_batch backend option to reduce non-determinism with small batches.
* New options were added to onnx2leela to fix tf exported onnx models.
* The onnx backend can now be built for amd's rocm.
* Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
* Made the WDL Rescale sharpness limit configurable via the --wdl-max-s hidden option.
* The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by --task-workers=-1 (the new default).
* Several assorted fixes and code cleanups.
Published by borg323 almost 2 years ago
lc0 - v0.30.0
In this version:
- Support for networks with attention body and smolgen added to blas, cuda, metal and onnx backends.
- WDL conversion for more realistic WDL score and contempt. Adds an Elo based WDL transformation of the NN value head output. Helps with more accurate play at high level (WDL sharpening), more aggressive play against weaker opponents and draw avoiding openings (contempt), piece odds play. For details on how it works see https://lczero.org/blog/2023/07/the-lc0-v0.30.0-wdl-rescale/contempt-implementation/.
- A new score type WDL_mu which follows the new eval convention, where +1.00 means 50% white win chance.
- Changed mlh threshold effect to create a smooth transition. The WDL_mu score type is now the default and the --moves-left-threshold default was changed from 0 to 0.8.
- Simplified to a single --draw-score parameter, adjusting the draw score from white's perspective: 0 gives standard scoring, -1 gives Armageddon scoring.
- Updated describenet for new net architectures.
- Added a first-move-bonus option to the legacy time manager, to accompany book-ply-bonus for shallow openings.
- Persistent L2 cache optimization for the cuda backend. Use the cache_opt=true backend option to turn it on.
- Some performance improvements for the cuda, onnx and blas backends.
- Added the threads backend option to onnx, defaults to 0 (let the onnxruntime decide) except for onnx-cpu that defaults to 1.
- The onnx-dml package now includes a directml.dll installation script.
- Some users experienced memory issues with onnx-dml, so the defaults were changed. This may affect performance, in which case you can use the steps=8 backend option to get the old behavior.
- The Python bindings are available as a package, see the README for instructions (a rough usage sketch follows this list).
- Revised 'simple' time manager.
- A new spinlock implementation (selected with --search-spin-backoff) to help with many cpu threads (e.g. 128 threads), obviously for cpu backends only.
- Fixes for contempt with infinite search/pondering and for the wdl display when pondering.
- Some assorted fixes and code cleanups.
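Since the packaged Python bindings are mentioned above, here is a rough usage sketch. The class and method names (Weights, Backend, GameState in lczero.backends) follow the project README as far as I recall; treat the exact API as an assumption and check the README.

```python
# Rough sketch of the lczero.backends package (verify names against the README).
from lczero.backends import Weights, Backend, GameState

w = Weights()                        # auto-discovers a network file
b = Backend(weights=w)               # default backend
g = GameState(moves=["e2e4", "e7e5"])
out = b.evaluate(g.as_input(b))[0]   # evaluate a single position
print(out.q())                       # value head output for the side to move
```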
Published by borg323 over 2 years ago
lc0 - v0.30.0-rc2
In this release:
* WDL conversion for more realistic WDL score and contempt. Adds an Elo based
WDL transformation of the NN value head output. Helps with more accurate play
at high level (WDL sharpening), more aggressive play against weaker opponents
and draw avoiding openings (contempt), piece odds play. There will be a blog
post soon explaining in detail how it works.
* A new score type WDL_mu which follows the new eval convention, where +1.00
means 50% white win chance.
* Simplified to a single --draw-score parameter, adjusting the draw score from
white's perspective: 0 gives standard scoring, -1 gives Armageddon scoring.
* Updated describenet for new net architectures.
* Added a first-move-bonus option to the legacy time manager, to accompany
book-ply-bonus for shallow openings.
* Changed mlh threshold effect to create a smooth transition.
* Revised 'simple' time manager.
* A new spinlock implementation (selected with --search-spin-backoff) to help
with many cpu threads (e.g. 128 threads), obviously for cpu backends only.
* Some assorted fixes and code cleanups.
Published by borg323 over 2 years ago
lc0 - v0.30.0-rc1
In this release:
* Support for networks with attention body and smolgen added to blas, cuda, metal and onnx backends.
* Persistent L2 cache optimization for the cuda backend. Use the cache_opt=true backend option to turn it on.
* Some performance improvements for the cuda, onnx and blas backends.
* Added the threads backend option to onnx, defaults to 0 (let the onnxruntime decide) except for onnx-cpu that defaults to 1.
* The onnx-dml package now includes a directml.dll installation script.
* Some users experienced memory issues with onnx-dml, so the defaults were changed. This may affect performance, in which case you can use the steps=8 backend option to get the old behavior.
* The Python bindings are available as a package, see the README for instructions.
* Some assorted fixes and code cleanups.
Published by borg323 almost 3 years ago
lc0 - v0.29.0
In this release:
* New metal backend for apple systems. This is now the default backend for macos builds.
* New onnx-dml backend to use DirectML under windows, has better net compatibility than dx12 and is faster than opencl. See the README for use instructions, a separate download of the DirectML dll is required.
* Full attention policy support in cuda, cudnn, metal, onnx, blas, dnnl, and eigen backends.
* Partial attention policy support in onednn backend (good enough for T79).
* Non multigather (legacy) search code and --multigather option are removed.
* Now the onnx backends can use fp16 when running with a network file (not with .onnx model files). This is the default for onnx-cuda and onnx-dml, and can be switched on or off by setting the fp16 backend option to true or false respectively.
* The onednn package comes with the latest dnnl compiled to allow running on an intel gpu by adding gpu=0 to the backend options.
* The default net is now 791556 for most backends except opencl and dx12 that get 753723 (as they lack attention policy support).
* Support for using pgn book with long lines in training: selfplay can start at a random point in the book.
* New "simple" time manager.
* Support for double Fischer random chess (dfrc).
* Added TC-dependent output to the backendbench assistant.
* Starting with this version, the check backend compares policy for valid moves after softmax.
* The onnx backend now allows selecting gpu to use.
* Improved error messages for unsupported network files.
* Some assorted fixes and code cleanups.
Published by borg323 about 3 years ago
lc0 - v0.29.0-rc1
In this release:
* New metal backend for apple systems. This is now the default backend for
macos builds.
* New onnx-dml backend to use DirectML under windows, has better net
compatibility than dx12 and is faster than opencl. See the README for use
instructions, a separate download of the DirectML dll is required.
* Full attention policy support in cuda, cudnn, metal, onnx, blas, dnnl, and
eigen backends.
* Partial attention policy support in onednn backend (good enough for T79).
* Now the onnx backends can use fp16 when running with a network file (not with
.onnx model files). This is the default for onnx-cuda and onnx-dml, can be
switched on or off by setting the fp16 backend option to true or
false respectively.
* The onednn package comes with a dnnl compiled to allow running on an intel gpu
by adding gpu=0 to the backend options.
* The default net is now 791556 for most backends except opencl and dx12 that
get 753723 (as they lack attention policy support).
* Support for using pgn book with long lines in training: selfplay can start at
a random point in the book.
* New "simple" time manager.
* Support for double Fischer random chess (dfrc).
* Added TC-dependent output to the backendbench assistant.
* Starting with this version, the check backend compares policy for valid moves
after softmax.
* Some assorted fixes and code cleanups.
Published by borg323 about 3 years ago
lc0 - v0.29.0-rc0
In this release:
* Initial support for attention policy, only cuda backend and partially in
blas/dnnl/eigen (good enough for T79).
* Non multigather (legacy) search code and --multigather option are removed.
* 15b default net is now 753723.
* The onnx backend now allows selecting gpu to use.
* Improved error messages for unsupported network files.
* Some assorted fixes.
Published by borg323 almost 4 years ago
lc0 - v0.28.2
This is what should have been v0.28.1:
* Improved cuda performance for 512 filter networks on Ampere GPUs.
* Several fixes for the onnx backend.
* New lc0 modes to process network files: describenet, leela2onnx and onnx2leela.
* Documentation updates.
* Correctness fixes for rescorer support functions.
Published by borg323 about 4 years ago
lc0 - v0.28.1-rc1
- Improved cuda performance for 512 filter networks on Ampere GPUs.
- Several fixes for the onnx backend.
- Command line options for network file conversion to/from onnx.
- Documentation updates.
- Correctness fixes for rescorer support functions.
Published by borg323 about 4 years ago
lc0 - v0.28.0
In this release:
* Multigather is now made the default (and also improved). Some search settings have changed meaning, so if you have modified values please discard them. Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have changed default values, but other options also affect the search. Similarly, check that your GUI is not caching the old values.
* Updated several other default parameter values, including the MLH ones.
* Performance improvements for the cuda/cudnn backends. This includes the multi_stream cuda backend option that is off by default. You should test adding multi_stream=true to backend-opts (command line) or BackendOptions (UCI) if you have a recent GPU with a lot of VRAM.
* Support for policy focus during training.
* Larger/stronger 15b default net for all packages except android, blas and dnnl that get a new 10b network.
* The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
* The legacy time manager is again the default and will use more time for the first move after a long book line.
* The --preload command line flag will initialize the backend and load the network during startup. This may help in cases where the GUI is confused by long start times, but only if backend and network are not changed via UCI options.
* A 'fen' command was added as a UCI extension to print the current position.
* Experimental onednn backend for recent intel CPUs and GPUs.
* Added support for ONNX network files and runtime with the onnx backend.
* Several bug and stability fixes.
Note: Some small third-party nets seem to play really bad with the dx12 backend and certain GPU drivers, setting the enable-gemm-metacommand=false backend option is reported to work around this issue.
Published by borg323 over 4 years ago
lc0 - v0.28.0-rc2
- The cuda backend option multi_stream is now off by default. You should consider setting it to on if you have a recent gpu with a lot of vram.
- Updated default parameters.
- Newer and stronger nets are included in the release packages.
- Added support for onnx network files and runtime with the "onnx" backend.
- Several bug and stability fixes.
Published by borg323 over 4 years ago
lc0 - v0.28.0-rc1
- Multigather is now made the default (and also improved). Some search settings
have changed meaning, so if you have modified values please discard them.
Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have changed default values, but other options also affect the search. Similarly, check that your gui is not caching the old values.
- Performance improvements for the cuda/cudnn backends.
- Support for policy focus during training.
- Larger/stronger 15b default net for all packages except android, blas and dnnl that get a new 10b network.
- The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
- The legacy time manager will use more time for the first move after a long book line.
- The --preload command line flag will initialize the backend and load the network during startup.
- A 'fen' command was added as a UCI extension to print the current position.
- Experimental onednn backend for recent intel cpus and gpus.
Published by borg323 over 4 years ago
lc0 - v0.27.0-rc2
- Fix additional cases where 'invalid move' could be incorrectly reported.
- Replace WDL softmax in cudnn backend with same implementation as cuda backend. This fixes some inaccuracy issues that were causing training data to be rejected at a fairly low frequency.
- Ensure that training data Q/D pairs form valid WDL targets even if there is accumulated drift in calculation.
- Fix for the calculation of the 'best q is proven' bit in training data.
- Multiple fixes for timelosses and infinite instamoving in smooth time manager. Smooth time manager now made default after these fixes.
Published by Tilps about 5 years ago
lc0 - v0.27.0-rc1
- Fix a bug which meant position ... moves ... didn't work if the moves went off the end of the existing tree. (Which happens normally when playing from an opening book.)
Published by Tilps about 5 years ago
lc0 - v0.27.0-rc0
Note: This version is very broken, do not attempt to use it.
- Multigather search inspired by Ceres. (Default is off. Note that the meaning of max-collision-events changes considerably when enabled and max-collision-visits will need to be set to a value close to previous values of max-collision-events in order to have similar search behavior.)
- V6 training format with additional info for training experiments.
- Updated default search parameters.
- A better algorithm for the backendbench assistant.
- Terminate search early if only 1 move isn't a proven loss.
- Various build system changes.
Published by Tilps about 5 years ago
lc0 - v0.26.3
Starting with this release, we are distributing two packages for windows with Nvidia GPUs: the cuda package and the cudnn package. The cudnn package is what we used to distribute so far (but we called it cuda), and comes with the same versions of cuda and cudnn dlls we were using for the last few months. The new cuda package comes with cuda 11.1 dlls and requires at least version 456.38 of the windows Nvidia drivers, and should give better performance on RTX cards and in particular the new RTX 30XX cards.
Notes:
1. The cudnn package will work as-is in existing setups, but for the cuda package you may have to replace cudnn with cuda (or cuda-auto or cuda-fp16) as a backend (if specified) - this will certainly be necessary for multi-gpu setups.
2. Some testing indicates that cuda 11.1 may be slower for GTX 10XX cards, so owners of older cards may want to stay with the cudnn package. If your testing shows otherwise do let us know.
Published by borg323 over 5 years ago
lc0 - v0.26.3-rc2
- Fix for uninitialized variable that led to crashes with the cudnn backend.
- Correct windows support for systems with more than 64 threads.
- A new package is built for the cuda backend with cuda 11.1. The old cuda package is renamed to cudnn.
Note: The cuda package requires nvidia driver 456.38 or newer.
Published by borg323 over 5 years ago
lc0 - v0.26.3-rc1
- Residual block fusion optimization for cudnn backend, that depends on custom_winograd=true. Enabled by default only for networks with up to 384 filters in fp16 mode and never in fp32 mode. Default can be overridden with --backend-opts=res_block_fusing=false to disable (or =true to enable).
- New experimental cuda backend without cudnn dependency (cuda-auto, cuda and cuda-fp16 are available).
Published by borg323 over 5 years ago
lc0 - v0.26.2-rc1
- Repetitions in the search tree are marked as draws, to explore more promising lines. Enabled by default (except in selfplay mode); use --two-fold-draws=false to disable.
- Syzygy tablebase files can now be used in selfplay. Still need to add adjudication support before we can consider using this for training.
- Default net updated to 703810.
- Fix for book with CR/LF line endings.
- Updated Eigen wrap to use new download link.
If you build from source, note that old versions of meson cannot download from the new Eigen download link. You will either have to update meson or build with -Dblas=false.
Published by borg323 over 5 years ago
lc0 - v0.26.0-rc1
- Verbose move stats now includes a line for the root node itself.
- Added optional alphazero time manager type for fixed fraction of remaining time per move.
- The WL score is now tracked with double precision to improve accuracy during very long search.
- Fix for a performance bug when playing from tablebase position with tablebases enabled and the PV move was changing frequently.
- Illegal searchmove restrictions will now be ignored rather than crash.
- Policy is cleared for terminal losses to encourage better quality MLH estimates by reducing how many visits a move that will not be selected (unless all other options are equally bad) receives.
- Smart pruning will now cause leela to play immediately once mate score has been declared.
- Fix an issue where sometimes the pv reported wouldn't match the move that would be selected at that moment.
- Improved logic for when to disable the custom_winograd optimization to avoid running out of video ram.
- --show-hidden can now be specified after --help and still work.
- Performance tuning for populating the policy into nodes after nn eval completes.
- Enable custom optimized SE paths for nets with 384 filters when using the custom_winograd=false path.
- Updates to zlib/gtest/eigen when included via meson wrap.
- Added build option to build python bindings to the lc0 engine.
- Only show the git hash in uci name if not a release tag build.
- Add --nps-limit option to artificially reduce nps to make for an easier opponent or whatever other reason you want.
- Fixed a bug where search tree shape could be affected even when the --smart-pruning-factor setting was 0.
- Changed the search logic to find the lc0.config file if left on the default value.
- Changed the search logic to find network files in autodiscover mode.
- Changed the logic to determine the default location for training games generated by selfplay in training mode.
- Changed the logic to decide where to look for the opencl backend tuning settings file.
- Android binaries published by appveyor are now stripped.
- Build can now use system installed eigen if available.
- When nodes in the tree get proven terminal, parents are updated as if they had always been terminal. This allows for faster convergence on more accurate MLH estimates amongst other details.
- Removed shortsightedness and logit-q options that have not found a reliable use case.
- Fixed a bug where m_effect calculated as part of S in verbose move stats was not consistent with the value used in search itself.
- Added 'pro' mode as an alternative to --show-hidden for UCI hosts that do not support command line arguments. Simply rename the lc0 binary to include 'pro' in order to enable.
- backendbench now has a --clippy option to try and auto suggest which batch size is a good idea.
- The demux backend now splits the batch into equal sizes based on the number of threads that demux is using rather than the number of backends. By default this is no change as usually there is 1 thread per backend. But it makes it easier to use demux against a blas backend, sending one chunk per core.
- Added support for new training input variants canonicalhectoplies and canonicalhectoplies_armageddon.
- Fixed a bug where if the network search paths for autodiscover contain files which lc0 cannot open it would error out rather than continuing on to other files.
- Blas backends no longer have a blas_cores option, as it never seemed useful compared to running more threads at a higher level.
- --help-md option removed as it was deemed not very useful.
- Updated to the latest version of dnnl for the dnnl build.
- Selfplay mode now supports per color settings in addition to per player settings. Per player settings have higher priority if there is a conflict. This will be used as part of armageddon training.
- Added a new experimental backend type: recordreplay. This allows recording the output of a backend under a particular search and then replaying it back again later. Theoretically this lets you simulate a CPU bottlenecked environment but still use a search tree that is a match for what might be a GPU bottlenecked environment. In practice there are a lot of corner cases where replay is not reliable yet. At a minimum you must disable prefetch.
- During search the node tree is occasionally compacted to reduce cache misses during the search tree walk. New option --solid-tree-threshold can be used to adjust how aggressive this optimization is. Note that very small values can cause very large growth in ram usage and are not a good idea. The default value is a little conservative, if you have plenty of spare ram it can be good to decrease it a bit.
- Small performance optimization for windows build with MLH enabled.
- Meson configuration changed to build with LTO by default. Note that meson does not always configure visual studio project files to apply this correctly on windows.
- The included net in appveyor builds is now 703350. This network supports MLH although the default MLH parameters are still threshold 1.0 which means it will not trigger without parameter adjustment.
- New backend option to explicitly override the net details and force MLH disabled. If you weren't going to use MLH anyway, this may give a tiny nps increase.
- New flag --show-movesleft (or UCI_ShowMovesLeft for UCI hosts that support it) will cause movesleft (in moves) to be reported in the uci info messages. Only works with networks that have MLH enabled.
- More sensible default values for MLH are in. Note that threshold is still 1.0 by default, so that will still need to be configured to enable it.
- The smooth-experimental time manager has been renamed smooth and support added to increase search time whenever the best N does not correspond with the move with best utility estimate. legacy remains the default for now as smooth has only been tuned for short time controls and evidence suggests it doesn't scale with these defaults.
- Selfplay mode now supports a logfile parameter just like normal mode.
- Reinstated the 4 billion visit limit on search to avoid overflowing counters and causing very strange behavior to occur.
- Performance optimization to make tree walk faster by ensuring that node edges are always sorted by policy. This has some very small side effects to do with tiebreaks in search no longer always being dominated by movegen order.
- Appveyor built blas and Android binaries now default to minibatch size 1 and prefetch 0, which should be much better than the normal GPU optimized defaults. Note this only affects Appveyor built binaries.
- The included client in Windows Appveyor releases is now v27 and is named lc0-training-client.exe instead of client.exe.
Published by Tilps over 5 years ago
lc0 - v0.25.1
- Fixed some issues with cudnn backend on the 16xx GTX models and also for low memory devices with large network files where the new optimizations could result in out of memory errors.
- Added a workaround for a cutechess issue where reporting depth 0 during instamoves causes it to ignore our info message.
Published by Tilps almost 6 years ago
lc0 - v0.25.0
A few small updates since RC2. Lots of new stuff in this release, take a look at the RC1/RC2 release notes for details.
* Relax strictness for complete standard fens in uci and opening books. Fen must still be standard, but default values will be substituted for sections that are missing.
* Restore some backwards compatibility in cudnn backends that was lost with the addition of the new convolution implementation. It is also on by default for more scenarios, although still off for fp16 on RTX gpus.
* Small logic fix for nps smoothing in the new optional experimental time manager.
Published by Tilps almost 6 years ago
lc0 - v0.25.0-rc2
- Increased upper limit for maximum collision events.
- Allow negative values for some of the extended moves left head parameters.
- Fix a critical bug in training data generation for input type 3.
- Fix for switching between positions in uci mode that only differ by 50 move rule in initial fen.
- Some refinements of certainty propagation.
- Better support for c++17 implementations that are missing charconv.
- Option to more accurately apply time management for uci hosts using cuteseal or similar timing techniques.
- Fix for selfplay mode to allow exactly book length total games.
- Fix for selfplay opening books with castling moves starting from chess960 fens.
- Add build option to override nvcc compiler.
- Improved validity checking for some uci input parameters.
- Updated the Q to CP conversion formula to better fit recent T60 net outputs to expectations.
- Add a new optional experimental time manager.
- Bug fix for the Q+U in verbose move stats. It is now called S: and contains the total score, including any moves left based effect if applicable.
- New temperature decay option to allow to delay the start of decay.
- All temperature options have been hidden by default.
- New optional cuda backend convolution implementation. Off by default for cudnn-fp16 until an issue with cublas performance on some gpus is resolved.
Published by Tilps almost 6 years ago
lc0 - v0.25.0-rc1
- Now requires a c++17 supporting compilation environment to build.
- Support for Moves Left Head based networks. Includes options to adjust search to favour shorter/longer wins/losses based on the moves left head output.
- Mate score reporting is now possible, and move selection will prefer shorter mates over longer ones when they are proven.
- Training now outputs v5 format data. This passes the moves left information back to training. This also includes support for multiple sub formats, including the existing standard, a new variant which can encode FRC960 castling, and also a further extension of that which tries to make training data canonical, so there aren't multiple positions that are trivially equivalent with different network inputs.
- Benchmark now includes a suite of 34 positions to test by default instead of just start position.
- Tensorflow backend works once more, almost just as hard to compile as it used to be though.
- --noise flag is gone, use --noise-epsilon=0.25 to get the old behavior.
- Some bug fixes related to drawscore.
- Selfplay mode now defaults to the same value as match play for --root-has-own-cpuct-params (true).
- Some advanced time management parameters are now accessed via the new --time-manager parameter instead of individual parameters.
- Windows build script has been modernized.
- Separate Eigen backend option for CPU.
- Random backend no longer requires a network.
- Random backend supports producing training data of any input format sub type.
- Integer parameters now give better error messages when given invalid values.
Published by Tilps almost 6 years ago
lc0 -
- New parameter --max-out-of-order-evals-factor replaces --max-out-of-order-evals that was introduced in v0.24.0-rc3 and provides the factor to multiply the maximum batch size by to set the maximum number of out-of-order evals per batch. The default value of 1.0 keeps the behavior of previous releases.
- Bug fix for hangs with very early stop command from non-conforming uci hosts.
Published by mooskagh almost 6 years ago
lc0 -
- New parameter --max-out-of-order-evals to set the maximum number of out-of-order evals per batch (was equal to the batch size before).
- It's now possible to embed networks into the binary. It allows easier builds of .apk for Android.
- New parameter --smart-pruning-minimum-batches to only allow smart pruning to stop after at least k batches, preventing instamoves on slow backends.
Published by mooskagh almost 6 years ago
lc0 -
- All releases are now bundled with network id 591226 (and the file date is old enough so it has a lower priority than networks that you already may have in your directory).
- Added a 'backendbench' mode to benchmark NN evaluation performance without search.
- Android builds are added to the official releases.
Published by mooskagh almost 6 years ago
lc0 -
- Introduced DirectX12 backend.
- Optimized Cpuct/FPU parameters are now default.
- There is now a separate set of CPuct parameters for the root node.
- Support of running selfplay games from an opening book.
- It's possible to adjust draw score from 0 to something else.
- There is a new --max-concurrent-searchers parameter (default is 1) which helps with thread congestion at the beginning of the search.
- Cache fullness is not reported in UCI info line by default anymore.
- Removed libproto dependency.
Published by mooskagh almost 6 years ago
lc0 - v0.23.2
- Fixed a bug where odd length openings had reversed training data results in selfplay.
- Fixed a bug where zero length training games could be generated due to discard pile containing positions that were already considered end of game.
- Add cudnn-auto backend.
Published by Tilps about 6 years ago
lc0 -
- Fixed a bug with Lc0 crashing sometimes during match phase of training game generation.
- Release packages now include CUDNN version without DLLs bundled.
Published by mooskagh about 6 years ago
lc0 -
- Fixed the order of BLAS options so that Eigen is lower priority, to match assumption in check_opencl patch introduced in v0.23.0-rc2.
(no other changes since rc2)
Published by mooskagh about 6 years ago
lc0 -
- Fixes in nps and time reporting during search.
- Introduced DNNL BLAS build for modern CPUs in addition to OpenBLAS.
- Build fixes on MacOS without OpenCL.
- Fixed smart pruning and KLDGain trying to stop search in go infinite mode.
- OpenCL package now has a check_opencl tool to check that computation behaves sanely.
- Fixed a bug in interoperation of shortsightedness and certainty propagation.
Published by mooskagh about 6 years ago
lc0 -
- Support for Fischer Random Chess (UCI_Chess960 option to enable FRC-style castling). Also added support for FRC-compatible weight files, but no training code yet.
- New option --logit-q (UCI: LogitQ). Changes subtree selection algorithm a bit, possibly making it stronger (experimental, default off).
- Lc0 now reports WDL score. To enable it, use the --wdl-info command-line argument or the UCI_WdlInfo UCI option.
- Added "Badgame split" mode during the training. After the engine makes an inferior move due to temperature, the game is branched and later the game is replayed from the position of the branch.
- Added experimental --short-sightedness (UCI: ShortSightedness) parameter. Treats longer variations as more "drawish".
- Lc0 can now open Fat Fritz weight files.
- Time management code refactoring. No functional changes, but will make time management changes easier.
- Lc0 logo is now printed in red! \o/
- Command line argument -v is now short for --verbose-move-stats.
- Errors in --backend-opts parameter syntax are now reported.
- The most basic version of the "certainty propagation" feature (actually without "propagation"). If the engine sees a checkmate, it plays it! (before, it could play another good move instead).
- Various small changes: hidden options to control Dirichlet noise, floating point optimizations, better error reporting if there is an exception in a worker thread, better error messages in the CUDA backend.
Published by mooskagh about 6 years ago
lc0 -
Bunch of small changes that piled up since the last major release.
Published by mooskagh over 6 years ago
lc0 - (Do Not Use - incorrectly tagged) v0.21.5-rc1
Remove softmax calculation from backends and apply it after filtering for illegal moves to ensure spurious outputs on illegal moves don't reduce (or entirely remove) the quality of the policy values on the legal moves. This was especially noticeable on fp16 backends for nets trained with legal move masking, but could theoretically be an improvement for any net.
Published by Tilps over 6 years ago
lc0 - v0.21.4
Two small changes in this release.
- A fix for crashes that can occur during use of sticky-endgames.
- Change the false positive value reported when in wdl style resign and display average nodes per move as part of tournament stats in selfplay mode.
Published by Tilps over 6 years ago
lc0 -
Changes since v0.21.2-rc3
- Centipawn formula retweaked to show 128.00 instead of 127.99 pawns for checkmate.
Since v0.21.1
Highlights:
- --sticky-endgames (minimal version of certainty propagation)
- New centipawn formula
- Way to exit training gracefully
- Optimizations for GTX 16xx videocards (cudnn-fp16 works now)
- Optimizations for larger filter sizes
Published by mooskagh over 6 years ago
lc0 -
The only change from RC2 is a new centipawn formula:
centipawn = 295 * Q / (1 - 0.976953125 * Q^14)
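As a quick sanity check of the formula (the helper name below is just for illustration): at Q = 1 the denominator is 1 - 0.976953125 = 0.023046875, so in exact arithmetic the score is 295 / 0.023046875 = 12800 centipawns; floating point evaluation may display it very slightly differently.

```python
# Quick check of the centipawn display formula quoted above.
def centipawn(q: float) -> float:
    return 295 * q / (1 - 0.976953125 * q ** 14)

print(centipawn(0.5))  # ~147.5 centipawns
print(centipawn(1.0))  # 12800 in exact arithmetic (float output may differ in the last digits)
```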
Published by mooskagh over 6 years ago
lc0 -
Changes
- Add 320 and 352 channel support for fused SE layer (#855)
- SE layer fix when not using fused kernel (#852)
- Fp16 nchw for cudnn-fp16 backend (support GTX 16xx GPUs) (#849)
- Install lc0 on openSUSE - Full documentation, Install script and links to experimental RPM packages (#675)
Published by mooskagh almost 7 years ago
lc0 -
- Make --sticky-endgames on by default (still off in training) (#844)
- update download links in README (#842)
- Recalibrate centipawn formula (#841)
- Also make parents Terminal if any move is a win or all moves are loss or draw. (#822)
- Use parent Q as a default score instead of 0 for unvisited pv. (#828)
- Add stop command to selfplay interactive mode to allow for graceful exit. (#810)
- Increased hard limit on batch size in opencl backend to 32 (#807)
Published by mooskagh almost 7 years ago
lc0 - v0.21.1
- FPU can now be independently controlled at root vs in general using the new AtRoot variants. Default value for StrategyAtRoot is 'same' which means it uses the values from the normal parameters. The reduction parameter has been removed - strategy reduction now uses the value parameter as the reduction amount. NOTE: If you are using a UCI host that remembers settings over restarts (like cutechess) be careful to ensure that when upgrading to this new version you reset settings back to defaults - or you may get a bad combination of fpu settings.
- TempVisitOffset option now allows you to specify values less than -1, and its documentation has been fixed.
- A small performance improvement in movegen.
- Windows packages should include v22 of the Client.
Published by Tilps almost 7 years ago
lc0 - v0.21.0-rc2
- Add support for cudnn7.0 (#717)
- Informative Tournament Stats (#698)
- Memory leak fix cuda backend (#747)
- cudnn-fp16 fallback path for unusual se-ratios. (#739)
- Cudnn 7.4.2 in packaged binary and warning for using old cudnn with new gpu (#741)
- Move mode specific options to end of help. (#745)
- LogLiveStats hidden option (#754)
- Optional markdown support for help output (#769)
- Improved folding of batch norm into weights and biases - fixes negative gamma bug. (#779)
Published by Tilps almost 7 years ago
lc0 - v0.21.0-rc1
Major new features in this release are support for WDL value head, and convolution direct output (AZ-style) policy head.
- Check Syzygy tablebase file sizes for corruption (#690)
- search for nvcc on the path first (#709)
- AZ-style policy head support (#712)
- Implement V4TrainingData (#722)
- WDL value head support (#635)
- Add option for doing kldgain thresholding rather than absolute visit limiting (#721)
- Easily run latest releases of lc0 and client using NVIDIA docker (#621)
- Add WDL style resign option. (#724)
- Add a uniform output option for random backend to support a0 seed data style (#725)
- Fix c hw switching in cudnn-fp16 mode with convolution policy head. (#729)
- misc (non-functional) changes to cudnn backend (#731)
- handle 64 filter SE networks (#624)
Published by Tilps about 7 years ago
lc0 -
Changes
(relative to v0.20.2-rc1)
- Favor winning moves that minimize DTZ to reduce shuffling by assuming repeated position by default (#708)
- Print cuda and gpu info, warn if mismatches are noticed (#711)
Published by mooskagh about 7 years ago
lc0 -
Changes
- no terminal multivisits (#683)
- better fix for issue 651 (#693)
- Changed output of --help flag to stdout rather than stderr (#687)
- Movegen speedup via magic bitboards (#640)
- modify default benchmark setting to run for 10 seconds (#681)
- Fix incorrect index in OpenCL Winograd output transform (#676)
- Update OpenCL (#655)
Published by mooskagh about 7 years ago
lc0 -
Changes
vs v0.20.1-rc3:
- Change to atomic for cache capacity. (#665)
vs 0.20.0:
- Search algorithm performance optimizations.
- Time management logic has been also optimized.
- Fixed pondering with movetime bug.
- Fixed a few potential problems in source code.
Published by mooskagh about 7 years ago
lc0 -
Change
- Remove ffast-math from the default flags (#661)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Don't use Winograd for 1x1 conv. (#659)
- Fix issues with pondering and search limits. (#658)
- Check for zero capacity in cache (#648)
- fix undefined behavior in DiscoverWeightsFile() (#650)
- fix fastmath.h undefined behavior and clean it up (#643)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Search algorithm performance optimizations.
- Time management logic has been also optimized.
Full commit log:
- Simplify movestogo approximator to use median residual time. (#634)
- Replace time curve logic with movestogo approximator. (#271)
- Cache best edge to improve PickNodeToExtend performance. (#619)
- fix building with tensorflow 1.12 (#626)
- Minor changes to src/chess (#606)
- make uci search parameters the default ones (#609)
- Preallocate nodes in advance of their need to avoid the allocation being behind a mutex. (#613)
- improve meson error when no backends enabled (#614)
- allow building with the mklml library as an mkl alternative (#612)
- Only build the history up if we are actually going to extend the position. (#607)
- fix warning (#604)
Published by mooskagh about 7 years ago
lc0 -
Changes
vs v0.20.0-rc2
- no lto builds by default (#625)
vs 0.19.1
- Squeeze-and-Excitation Networks are now supported! (lc0.org/se)
- Older text network files are no longer supported.
- Various performance fixes (most major being having fast approximate math functions).
- For systems with multiple GPUs, in addition to "multiplexing" backend we now also have "demux" backend and "roundrobin" backend.
- Compiler settings tweaks (use VS2017 for windows builds, always have LTO enabled, windows releases have PGO enabled).
- Benchmark mode has more options now (e.g. movetime) and saner defaults.
- Added an option to prevent engine to resign too early (used in training).
- Fixed a bug when number of visits could be too high in collision nodes. The fix is pretty hacky, there will be a better fix later.
- 32-bit version compiles again.
Published by mooskagh about 7 years ago
lc0 -
Only one change in this release:
* Fix for demux backend to match cuda expected threading model for computations. (#605)
Published by mooskagh about 7 years ago
lc0 -
Changes
- Squeeze-and-Excitation Networks are now supported! (lc0.org/se)
- Older text network files are no longer supported.
- Various performance fixes (most major being having fast approximate math functions).
- For systems with multiple GPUs, in addition to "multiplexing" backend we now also have "demux" backend and "roundrobin" backend.
- Compiler settings tweaks (use VS2017 for windows builds, always have LTO enabled, windows releases have PGO enabled).
- Benchmark mode has more options now (e.g. movetime) and saner defaults.
- Added an option to prevent engine to resign too early (used in training).
- Fixed a bug when number of visits could be too high in collision nodes. The fix is pretty hacky, there will be a better fix later.
- 32-bit version compiles again.
Published by mooskagh about 7 years ago
lc0 -
Changes
Relative to v0.19.0
- Parameters from AlphaZero paper were introduced
- Do not load network on isready.
- Fixed non-working nodes limit.
Relative to v0.19.1-rc2
(no changes)
Published by mooskagh about 7 years ago
lc0 -
Changelog
- Updated cpuct formula from alphazero paper. (#563)
- remove UpdateFromUciOptions() from EnsureReady() (#558)
- revert IsSearchActive() and better fix for one of #500 crashes (#555)
Published by mooskagh about 7 years ago
lc0 -
Change relative to RC5
- remove Wait() from EngineController::Stop() (#522)
Changes relative to v0.18
See v0.19.0-RC1 notes.
Published by mooskagh over 7 years ago
lc0 -
Changes
- OpenCL: replace thread_local with a resource pool. (#516)
- optional wtime and btime (#515)
- Make convolve1 work with workgroup size of 128 (#514)
- adjust average depth calculation for multivisits (#510)
Published by mooskagh over 7 years ago
lc0 -
Changelog
- Microseconds have 6 digits, not 3! (#505)
- use bestmoveissent_ for Search::IsSearchActive() (#502)
Published by mooskagh over 7 years ago
lc0 -
Changelog
- Fix OpenCL tuner always loading the first saved tuning (#491)
- Do not show warning when ComputeBlocking() takes too much time. (#494)
- Output microseconds in log rather than milliseconds. (#495)
- Add benchmark features (#483)
- Fix EncodePositionForNN test failure (#490)
Published by mooskagh over 7 years ago
lc0 -
- Search algorithm changes
When visiting terminal nodes and collisions, instead of counting that as one visit, estimate how many subsequent visits will also go to the same node, and do a batch update.
That should slightly improve nps near terminal nodes and in multithread configurations. Command line parameters that control that:
--max-collision-events – number of collision events allowed per batch. Default is 32. This parameter is roughly equivalent to --allowed-node-collisions in v0.18.
--max-collision-visits – total number of estimated collisions per NN batch. Default is 9999.
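To make the batched update concrete, here is a toy illustration (not lc0 source code) of crediting several estimated visits to a terminal or collided node in one update instead of one backup per traversal:

```python
# Toy illustration only (not lc0 code): one batched update standing in for
# n_estimated identical single-visit backups to the same node.
class Node:
    def __init__(self):
        self.n = 0     # visit count
        self.w = 0.0   # accumulated value

    @property
    def q(self) -> float:
        return self.w / self.n if self.n else 0.0

def batched_backup(node: Node, value: float, n_estimated: int) -> None:
    node.n += n_estimated
    node.w += value * n_estimated

terminal = Node()
batched_backup(terminal, value=1.0, n_estimated=8)  # e.g. 8 visits hit the same terminal node
print(terminal.n, terminal.q)  # -> 8 1.0
```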
- Time management
Multiple changes have been done to make Leela track used time more precisely (particularly, the moment the timer is started is now much closer to the moment GUIs start their timer).
For smart pruning, Leela's timer only starts when the first batch comes back from NN eval. That should help against instamoves, especially on non-even GPUs.
Also Leela stops the search quicker now when it sees that time is up (previously it could continue the search for hundreds of milliseconds after that, which caused time trouble if the opponent moves very fast).
Those changes should help a lot in ultra-bullet configurations.
- Better logging
Much more information is now output to the log file. That will allow us to diagnose problems more easily if they occur. To have a debug file written, add a command line option:
--logfile=/path/to/logfile
(or short option "-l /path/to/logfile", or corresponding UCI option "LogFile")
It's recommended to always have logging on, to make it easier to report bugs when they happen.
- Configuration parameters change
A large part of parameter handling has been reworked. As a result:
All UCI parameters have been changed to have a more "classical" look. E.g. was "Network weights file path", became "WeightsFile".
Much more detailed help is shown than before when you run ./lc0 --help
Some flags have been renamed, e.g. --futile-move-aversion is renamed back to --smart-pruning-factor.
After setting a parameter (using command line parameter or uci setoption command), uci command "uci" shows updated result. That way you can check the current option values.
Some command-line and UCI options are hidden now. Use --show-hidden command line parameter to unhide them. E.g. ./lc0 --show-hidden --help
Also, in selfplay mode the per player configuration format has been changed (although probably no one knew about it anyway): Was: ./lc0 selfplay player1: --movetime=14 Became: ./lc0 selfplay --player1.movetime=14
- Other
"go depth X" uci command now causes search to stop when depth information in uci info line reaches X. Not that it makes much sense for it to work this way, but at least it's better than noting.
Network file size can now be larger than 64MB.
There is now an experimental flag --ramlimit-mb. The engine tries to estimate
how much memory it uses and stops search when tree size (plus cache size)
reaches RAM limit. The estimation is very rough. We'll see how it performs and
improve estimation later.
In situations when search cannot be stopped (go infinite or ponder), bestmove is not automatically outputted. Instead, the search stops making progress and outputs a warning.
Benchmark mode has been implemented. To run it, use the following command line: ./lc0 benchmark. This feature is pretty basic in the current version, but will be expanded later.
As Leela plays much weaker in positions without history, it is now able to synthesize it and not blunder in custom FEN positions. There is a --history-fill flag for it. Setting it to "no" disables the feature, setting to "fen_only" (default) enables it for all positions except the chess start position, and setting it to "always" enables it even for startpos.
Instead of outputting the current win estimation as a centipawn score approximation, Leela can now show its raw score. A flag that controls that is --score-type. Possible values: - centipawn (default) – approximate the win rate in centipawns, like Leela always did. - win_percentage – value from 0 to 100.0 which represents the expected score in percent. - Q – the same, but scaled from -100.0 to 100.0 rather than from 0 to 100.0.
Published by mooskagh over 7 years ago
lc0 -
Severe bug fixed: Race condition when out-of-order-eval was enabled (and it was enabled by default)
Windows 32-bit builds are now possible (CPU only for now)
Published by mooskagh over 7 years ago
lc0 -
KNOWN BUG!
- We have credible reports that in some rare cases Lc0 crashes!
However, we were not able to reproduce it reliably. If you see the crash, please report to devs! What seems to increase crash probability:
- Very short move time (milliseconds)
- Proximity to a checkmate (happens 1-3 moves before the checkmate)
New features:
Endgame tablebases support! Both WDL and DTZ now.
Added MultiPv support.
Time management changes:
Introduced --immediate-time-use flag. Yes, yet another time management flag. Possible values are between 0.0 and 1.0. Setting it closer to 1.0 makes Leela use time saved from futile search aversion earlier.
Some time management parameters were changed:
- Slowmover is 1.0 now (was 2.4)
- Immediate-time-use is 0.6 now (didn't exist before, so was 0.0)
Fixed a bug, because of which futile search aversion tolerance was incorrectly applied, which resulted in instamoves.
Now search stops immediately when it runs out of budgeted time. Should help against timeouts, especially on slow backends (e.g. BLAS).
Move overhead now is a fixed time, doesn't depend on number of remaining moves.
Other:
Out of order eval is on by default. That brings slight nps improvement.
Default FPU reduction is 1.2 now (was 0.9)
Cudnn backend now has maxbatch parameter. (can be set for example like this --backend-opts=maxbatch=100). This is needed for lower end GPUs that didn't have enough VRAM for a buffer of size 1024. Make sure that this setting is not lower than --minibatch-size.
Small memory usage optimizations.
Engine name in UCI response is shorter now. The Fritz chess UI should be able to work with Leela now.
Added flag --temp-visit-offset, which allows offsetting temperature during training.
Command line and UCI parameter values are now checked for validity.
You can now build for older processors that don't support the popcnt instruction by passing -Dpopcnt=false to meson when building.
32-bit build is possible now. CPU only, and we were only able to build it on Linux for now, including Raspberry Pi.
Threading issue which caused crash in heavily multithreaded environment with slow backends was fixed.
Published by mooskagh over 7 years ago
lc0 -
Changes relative to RC1
- Fixed a bug where the rule50 value was located in the wrong place in the training data.
- OpenCL uses much less VRAM now.
- Default OpenCL batch size is 16 now (was 1).
- Default time management related configuration was tweaked: --futile-move-aversion is 1.33 now (was 1.47) --slowmover is 2.4 now (was 2.6)
Published by mooskagh over 7 years ago
lc0 -
Changes
New visible features
- Implemented ponder support.
- Tablebases are supported now (only WDL probe for now). Command line parameter is --syzygy-paths=/path/to/syzygy/
- Old smart pruning flag is gone. Instead there is --futile-search-aversion flag. --futile-search-aversion=0 is equivalent to old --no-smart-pruning. --futile-search-aversion=1 is equivalent to old --smart-pruning. Now default is 1.47, which means that engine will sometimes decide to stop search earlier even when there is theoretical chance (but not very probable) that best move decision could be changed if allowed to think more.
Lc0 now supports configuration files. Options can be listed there instead of command line flags / uci params. Config should be named lc0.config and located in the same directory as lc0. Should list one command line option per line, with '--' in the beginning being optional, for example:
syzygy-paths=/path/to/syzygy/
In uci info, "depth" is now average depth rather than full depth (which was 4 all the time). Also, depth values do not include reused tree, only nodes visited during the current search session.
--sticky-checkmates experimental flag (default off), supposed to find shorter checkmate sequences.
More features in backend "check".
Performance optimizations
- Release windows executables are built with "whole program optimization".
- Added --out-of-order-eval flag (default is off). Switching it on makes cached/terminal nodes higher priority, which increases nps.
- OpenCL backend now supports batches (up to 5x speedup!)
- Performance optimizations for BLAS backend.
- Total visited policy (for FPU reduction) is now cached.
- Values of priors (P) are stored now as 16-bit float rather than 32-bit float, which saves a considerable amount of RAM.
Bugfixes
- Fixed an en passant detection bug which caused the position after a pawn moved by two squares not to be counted towards threefold repetition even if en passant was not possible.
- Fixed the bug which caused --cache-history-length for values 2..7 work the same as --cache-history-length=1. This is fixed, but default is temporarily changed to --cache-history-length=1 during play. (For training games, it's 7)
Removed features
- Backpropagation beta / backpropagation gamma parameters have been removed.
Other changes
- Release lc0-windows-cuda.zip package now contains NVidia CUDA and cuDNN .dlls.
Published by mooskagh over 7 years ago
lc0 -
- Fully switched to official releases! No more https://crem.xyz/lc0/
- Fixed a bug where pv display and smart pruning sometimes didn't work properly after tree reuse.
- Format of protobuf network files was changed.
- Autodiscovery of protobuf based network files works now.
Published by borg323 over 7 years ago