Recent Releases of https://github.com/awslabs/sockeye
https://github.com/awslabs/sockeye - 3.1.34
[3.1.34]
Fixed
- Do not mask prepended tokens by default (for self-attention).
- Do not require specifying --end-of-prepending-tag if it is already done when preparing the data.
- Python
Published by fhieber over 3 years ago
https://github.com/awslabs/sockeye - 3.1.33
[3.1.33]
Fixed
- Two small fixes to SampleK. Previously, the device was not set correctly, which caused issues when running sampling on GPUs. In addition, SampleK did not return the top-k values correctly.
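For readers unfamiliar with top-k sampling, the following is a minimal pure-Python sketch of the general technique, not Sockeye's SampleK module (which operates on tensors). The function name `sample_top_k` and its signature are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def sample_top_k(probs: Sequence[float], k: int,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one token id from the k most probable entries of `probs`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    # Indices of the k highest-probability tokens.
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # random.choices renormalizes the weights, so only relative mass matters.
    weights = [probs[i] for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]
```

With `k=1` this reduces to greedy selection; larger `k` trades determinism for diversity.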
[3.1.32]
Added
- Sockeye now supports blocking cross-attention between decoder and encoded prepended tokens.
  - If the source contains prepended text and a tag indicating the end of prepended text, Sockeye supports blocking the cross-attention between decoder and encoded prepended tokens (including the tag). To enable this operation, specify --end-of-prepending-tag for training or data preparation, and --transformer-block-prepended-cross-attention for training.
Changed
- Sockeye uses a new dictionary-based prepared data format that supports storing length of prepended source tokens (version 7). The previous format (version 6) is still supported.
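The cross-attention blocking added in 3.1.32 can be pictured as a per-position mask over the source. The sketch below is an illustration under assumptions, not Sockeye's implementation: the helper names and the `<EOP>` tag are hypothetical, and the real code works on batched tensors.

```python
from typing import List, Sequence


def prepended_length(tokens: Sequence[str], end_tag: str) -> int:
    """Number of source tokens up to and including the first end tag (0 if absent)."""
    for position, token in enumerate(tokens):
        if token == end_tag:
            return position + 1
    return 0


def cross_attention_mask(source_length: int, num_prepended: int) -> List[bool]:
    """True marks source positions the decoder may attend to."""
    if not 0 <= num_prepended <= source_length:
        raise ValueError("num_prepended must lie within the source length")
    # Prepended tokens (including the tag) are blocked; the rest stay visible.
    return [position >= num_prepended for position in range(source_length)]
```

Storing the prepended length per sentence is what the version 7 prepared data format makes possible, so the mask does not need to be recomputed from the tag at training time.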
- Python
Published by fhieber over 3 years ago
https://github.com/awslabs/sockeye - 3.1.31
[3.1.31]
Fixed
- Fixed sequence copying integration tests to correctly specify that scoring/translation outputs should not be checked.
- Enabled bfloat16 integration and system testing on all platforms.
[3.1.30]
Added
- Added support for --dtype bfloat16 to sockeye-translate, sockeye-score, and sockeye-quantize.
Fixed
- Fixed compatibility issue with numpy==1.24.0 by using pickle instead of numpy to save/load ParallelSampleIter data permutations.
- Python
Published by fhieber over 3 years ago
https://github.com/awslabs/sockeye - 3.1.29
[3.1.29]
Changed
- Running sockeye-evaluate no longer applies text tokenization for TER (same behavior as other metrics).
- Turned on type checking for all sockeye modules except test_utils and addressed resulting type issues.
- Refactored code in various modules without changing user-level behavior.
[3.1.28]
Added
- Added kNN-MT model from Khandelwal et al., 2021.
  - Installation: see the faiss documentation; installation via conda is recommended.
  - Building a faiss index from a sockeye model takes two steps:
    - Generate decoder states: sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]
    - Build index: sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature] where input_dir is the same as output_dir from the sockeye-generate-decoder-states command.
    - Faiss index signature reference: see the faiss documentation.
  - Running inference using the built index: sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight] where index_dir is the same as output_dir from the sockeye-knn command.
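The role of the interpolation weight can be sketched as follows. This follows the kNN-MT formulation of Khandelwal et al. (2021), where the final distribution mixes the retrieval distribution with the model distribution; that --knn-lambda is the weight on the kNN side is an assumption here, and `interpolate_knn` is a hypothetical helper rather than a Sockeye function.

```python
from typing import List, Sequence


def interpolate_knn(model_probs: Sequence[float], knn_probs: Sequence[float],
                    knn_lambda: float) -> List[float]:
    """Return knn_lambda * p_kNN + (1 - knn_lambda) * p_model per vocabulary entry."""
    if len(model_probs) != len(knn_probs):
        raise ValueError("distributions must cover the same vocabulary")
    if not 0.0 <= knn_lambda <= 1.0:
        raise ValueError("knn_lambda must be in [0, 1]")
    return [knn_lambda * k + (1.0 - knn_lambda) * m
            for m, k in zip(model_probs, knn_probs)]
```

A weight of 0 recovers the plain model output, and a weight of 1 relies entirely on the retrieved neighbors.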
- Python
Published by fhieber over 3 years ago
https://github.com/awslabs/sockeye - 3.1.27
[3.1.27]
Changed
- allow torch 1.13 in requirements.txt
- Replaced deprecated torch.testing.assert_allclose with torch.testing.assert_close for PyTorch 1.14 compatibility.
[3.1.26]
Added
- Option --tf32 0|1 (bool, sets torch.backends.cuda.matmul.allow_tf32 on the device) enabling 10-bit precision (19 bit total) transparent float32 acceleration. Defaults to true for backward compatibility with torch < 1.12. Training continuation is allowed with a different --tf32 setting.
Changed
- device.init_device() called by train, translate, and score
- allow torch 1.12 in requirements.txt
[3.1.25]
Changed
- Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.
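To show what the floor smoothing offset affects, here is a small sketch of floor smoothing for a single n-gram order: when there are no matches, the match count is replaced by the floor value. This is a simplified illustration, not sacrebleu's code, and `floor_smoothed_precision` is a hypothetical name.

```python
def floor_smoothed_precision(matches: int, total: int, floor: float) -> float:
    """n-gram precision where a zero match count is replaced by `floor`."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Only zero counts are smoothed; non-zero counts are left untouched.
    effective_matches = matches if matches > 0 else floor
    return effective_matches / total
```

With 10 candidate n-grams and no matches, the old offset of 0.01 gives a precision of 0.001, while the new offset of 0.1 gives 0.01, so scores for poor hypotheses shift upward.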
[3.1.24]
Fixed
- Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.
[3.1.23]
Changed
- Change decoder softmax size logging level from info to debug.
[3.1.22]
Added
- log beam search avg output vocab size
Changed
- common base Search for GreedySearch and BeamSearch
- .pylintrc: suppress warnings about deprecated pylint warning suppressions
[3.1.21]
Fixed
- Send skipnvs and nvsthresh args now to Translator constructor in sockeye-translate instead of ignoring them.
[3.1.20]
Added
- Added training support for DeepSpeed.
  - Installation: pip install deepspeed
  - Usage: deepspeed --no_python ... sockeye-train ...
  - DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 (Rajbhandari et al., 2019).
  - Run in FP16 mode with --deepspeed-fp16 or BF16 mode with --deepspeed-bf16.
[3.1.19]
Added
- Clean up GPU and CPU memory used during training initialization before starting the main training loop.
Changed
- Refactored training code in advance of adding DeepSpeed support:
- Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
- Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
- Refactored optimizer and learning rate scheduler creation to be modular.
- Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
- Refactored primary and secondary worker logic to reduce redundant calculations.
- Refactored code for saving/loading training states.
- Added utility code for managing model/training configurations.
Removed
- Removed unused training option --learning-rate-t-scale.
[3.1.18]
Added
- Added sockeye-train and sockeye-translate option --clamp-to-dtype that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
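The clamping idea can be sketched in plain Python for float16, whose largest finite value is 65504. This is an illustration of the technique only; the helper name is hypothetical and the actual option clamps tensors inside the model.

```python
from typing import List, Sequence

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def clamp_to_float16_range(values: Sequence[float]) -> List[float]:
    """Clamp each value to [-FLOAT16_MAX, FLOAT16_MAX] so it stays finite in float16."""
    return [max(-FLOAT16_MAX, min(FLOAT16_MAX, value)) for value in values]
```

Values that would overflow to inf in float16 are pinned to the largest representable magnitude instead, which keeps later layers from propagating inf or nan.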
[3.1.17]
Added
- Added support for offline model quantization with sockeye-quantize.
  - Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.
[3.1.16]
Added
- Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper https://arxiv.org/abs/2110.03847. To use this feature pass a criterion (isometric-ratio, isometric-diff, isometric-lc) when specifying --metric.
- Added --output-best-non-blank to output non-blank best hypothesis from the nbest list.
[3.1.15]
Fixed
- Fix type of valid_length to be pt.Tensor instead of Optional[pt.Tensor] = None for jit tracing
- Python
Published by fhieber over 3 years ago
https://github.com/awslabs/sockeye - 3.1.14
[3.1.14]
Added
- Added the implementation of Neural vocabulary selection to Sockeye as presented in our NAACL 2022 paper "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).
  - To use NVS simply specify --neural-vocab-selection to sockeye-train. This will train a model with Neural Vocabulary Selection that is automatically used by sockeye-translate. If you want to look at translations without vocabulary selection, specify --skip-nvs as an argument to sockeye-translate.
[3.1.13]
Added
- Added sockeye-train argument --no-reload-on-learning-rate-reduce that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the plateau-reduce learning rate scheduler since other schedulers do not reload checkpoints.
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.12
[3.1.12]
Fixed
- Fix scoring with batches of size 1 (which may occur when |data| % batch_size == 1).
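The condition in the 3.1.12 fix is easy to see by listing batch sizes for a dataset. The helper below is a hypothetical illustration, not part of Sockeye.

```python
from typing import List


def batch_sizes(num_sentences: int, batch_size: int) -> List[int]:
    """Sizes of consecutive batches when the data is split in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    full_batches, remainder = divmod(num_sentences, batch_size)
    # The last batch holds the remainder, which is 1 when |data| % batch_size == 1.
    return [batch_size] * full_batches + ([remainder] if remainder else [])
```

For 9 sentences and a batch size of 4 the final batch contains a single sentence, which is the case the fix addresses.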
[3.1.11]
Fixed
- When resuming training with a fully trained model, sockeye-train will correctly exit without creating a duplicate (but separately numbered) checkpoint.
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.10
[3.1.10]
Fixed
- When loading parameters, SockeyeModel now ignores false positive missing parameters for traced modules. These modules use the same parameters as their original non-traced versions.
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.9
[3.1.9]
Changed
- Clarified usage of batch_size in Translator code.
[3.1.8]
Fixed
- When saving parameters, SockeyeModel now skips parameters for traced modules because these modules are created at runtime and use the same parameters as non-traced versions. When loading parameters, SockeyeModel ignores parameters for traced modules that may have been saved by earlier versions.
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.7
[3.1.7]
Changed
- SockeyeModel components are now traced regardless of whether inference_only is set, including for the CheckpointDecoder during training.
[3.1.6]
Changed
- Moved offsetting of topk scores out of the (traced) TopK module. This allows sending requests of variable batch size to the same Translator/Model/BeamSearch instance.
[3.1.5]
Changed
- Allow PyTorch 1.11 in requirements
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.4
[3.1.4]
Added
- Added support for adding a target prefix and target prefix factors to the input in JSON format during inference.
- Python
Published by fhieber about 4 years ago
https://github.com/awslabs/sockeye - 3.1.3
[3.1.3]
Added
- Added support for adding source prefixes to the input in JSON format during inference.
[3.1.2]
Changed
- Optimized creation of source length mask by using expand instead of repeat_interleave.
[3.1.1]
Changed
- Updated torch dependency to 1.10.x (torch>=1.10.0,<1.11.0)
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.1.0
[3.1.0]
Sockeye is now exclusively based on PyTorch.
Changed
- Renamed x_pt modules to x. Updated entry points in setup.py.
Removed
- Removed MXNet from the codebase
- Removed device locking / GPU acquisition logic. Removed dependency on portalocker.
- Removed arguments --softmax-temperature, --weight-init-*, --mc-dropout, --horovod, --device-ids
- Removed all MXNet-related tests
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.15
[3.0.15]
Fixed
- Fixed GPU-based scoring by copying to cpu tensor first before converting to numpy.
[3.0.14]
Added
- Added support for Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14. Checkpoint decoder metrics will now include TER scores and early stopping can be determined via TER improvements (--optimized-metric ter)
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.13
[3.0.13]
Changed
- use expand instead of repeat for attention masks to not allocate additional memory
- avoid repeated transpose for initializing cached encoder-attention states in the decoder.
[3.0.12]
Removed
- Removed unused code for Weight Normalization. Minor code cleanups.
[3.0.11]
Fixed
- Fixed training with a single, fixed learning rate instead of a rate scheduler (--learning-rate-scheduler none --initial-learning-rate ...).
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.10
[3.0.10]
Changed
- End-to-end trace decode_step of the Sockeye model. Creates less overhead during decoding and a small speedup.
[3.0.9]
Fixed
- Fixed not calling the traced target embedding module during inference.
[3.0.8]
Changed
- Add support for JIT tracing source/target embeddings and JIT scripting the output layer during inference.
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.7
[3.0.7]
Changed
- Improve training speed by using torch.nn.functional.multi_head_attention_forward for self- and encoder-attention during training. Requires reorganization of the parameter layout of the key-value input projections, as the current Sockeye attention interleaves for faster inference. Attention masks (both for source masking and autoregressive masks) need some shape adjustments, as requirements for the fused MHA op differ slightly.
  - Non-interleaved format for joint key-value input projection parameters: in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)
  - Interleaved format for joint key-value input projection stores key and value parameters, grouped by heads: Shape: ((num_heads * 2 * hidden_per_head), hidden)
  - Models save and load key-value projection parameters in interleaved format.
  - When model.training == True, key-value projection parameters are put into non-interleaved format for torch.nn.functional.multi_head_attention_forward
  - When model.training == False, i.e. model.eval() is called, key-value projection parameters are again converted into interleaved format in place.
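The two layouts differ only in row order. The sketch below computes that reordering on row indices, assuming the non-interleaved matrix stores all key rows followed by all value rows and the interleaved one stores, per head, that head's key rows followed by its value rows. This ordering is inferred from the shapes above rather than taken from the code, and both function names are hypothetical.

```python
from typing import List


def interleaved_row_order(hidden: int, num_heads: int) -> List[int]:
    """Row indices of the (2*hidden, hidden) matrix in interleaved order."""
    if hidden % num_heads != 0:
        raise ValueError("hidden must be divisible by num_heads")
    per_head = hidden // num_heads
    order: List[int] = []
    for head in range(num_heads):
        start = head * per_head
        order.extend(range(start, start + per_head))                    # key rows of this head
        order.extend(range(hidden + start, hidden + start + per_head))  # value rows of this head
    return order


def invert_order(order: List[int]) -> List[int]:
    """Permutation that maps interleaved rows back to the non-interleaved layout."""
    inverse = [0] * len(order)
    for new_position, old_position in enumerate(order):
        inverse[old_position] = new_position
    return inverse
```

Because the conversion is a pure permutation of rows, switching between training and evaluation mode changes no parameter values.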
[3.0.6]
Fixed
- Fixed checkpoint decoder issue that prevented using bleu as --optimized-metric for distributed training (#995).
[3.0.5]
Fixed
- Fixed data download in multilingual tutorial.
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.4
[3.0.4]
- Make sure data permutation indices are in int64 format (doesn't seem to be the case by default on all platforms).
[3.0.3]
Fixed
- Fixed ensemble decoding for models without target factors.
[3.0.2]
Changed
- sockeye-translate: Beam search now computes and returns secondary target factor scores. Secondary target factors do not participate in beam search, but are greedily chosen at every time step. Accumulated scores for secondary factors are not normalized by length. Factor scores are included in JSON output (--output-type json).
- sockeye-score now returns tab-separated scores for each target factor. Users can decide how to combine factor scores depending on the downstream application. Scores for the first, primary factor (i.e. output words) are normalized; other factors are not.
[3.0.1]
Fixed
- Parameter averaging (sockeye-average) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 3.0.0
[3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch
Sockeye is now based on PyTorch. We maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0. If MXNet 2.x is installed, Sockeye can run both with PyTorch or MXNet but MXNet is no longer strictly required.
Added
- Added model converter CLI sockeye.mx_to_pt that converts MXNet models to PyTorch models.
- Added --apex-amp training argument that runs entire model in FP16 mode, replaces --dtype float16 (requires Apex).
- Training automatically uses Apex fused optimizers if available (requires Apex).
- Added training argument --label-smoothing-impl to choose label smoothing implementation (default of mxnet uses the same logic as MXNet Sockeye 2).
Changed
- CLI names point to the PyTorch code base (e.g. sockeye-train etc.).
- MXNet-based CLIs are now accessible via sockeye-<name>-mx.
- MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.
- sockeye-train now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: torchrun --no_python --nproc_per_node N sockeye-train --dist ...
- Updated the quickstart tutorial to cover multi-device training with PyTorch Sockeye.
- Changed --device-ids argument (plural) to --device-id (singular). For multi-GPU training, see distributed mode noted above.
- Updated default value: --pad-vocab-to-multiple-of 8
- Removed --horovod argument used with horovodrun (use --dist with torchrun).
- Removed --optimizer-params argument (use --optimizer-betas, --optimizer-eps).
- Removed --no-hybridization argument (use PYTORCH_JIT=0, see Disable JIT for Debugging).
- Removed --omp-num-threads argument (use --env=OMP_NUM_THREADS=N).
Removed
- Removed support for constrained decoding (both positive and negative lexical constraints)
- Removed support for beam histories
- Removed --amp-scale-interval argument.
- Removed --kvstore argument.
- Removed arguments: --weight-init, --weight-init-scale, --weight-init-xavier-factor-type, --weight-init-xavier-rand-type
- Removed --decode-and-evaluate-device-id argument.
- Removed arguments: --monitor-pattern, --monitor-stat-func
- Removed CUDA-specific requirements files in requirements/
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 2.3.24
[2.3.24]
Added
- Use of the safe yaml loader for the model configuration files.
[2.3.23]
Changed
- Do not sort BIAS_STATE in beam search. It is constant across decoder steps.
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 2.3.22
[2.3.22]
Fixed
- The previous commit introduced a regression for vocab creation. The result was that the vocabulary was created on the input characters rather than on tokens.
[2.3.21]
Added
- Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.
[2.3.20]
Added
- Added debug logging for restrict_lexicon lookups
[2.3.19]
Changed
- When training only the decoder (--fixed-param-strategy all_except_decoder), disable autograd for the encoder and embeddings to save memory.
[2.3.18]
Changed
- Updated Docker builds and documentation. See sockeye_contrib/docker.
- Python
Published by fhieber over 4 years ago
https://github.com/awslabs/sockeye - 2.3.17
[2.3.17]
Added
- Added an alternative, faster implementation of greedy search. The '--greedy' flag to sockeye.translate will enable it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.
[2.3.16]
Added
- Added option --transformer-feed-forward-use-glu to use Gated Linear Units in transformer feed forward networks (Dauphin et al., 2016; Shazeer, 2020).
[2.3.15]
Changed
- Optimization: Decoder class is now a complete HybridBlock (no forward method).
- Python
Published by fhieber almost 5 years ago
https://github.com/awslabs/sockeye - 2.3.14
[2.3.14]
Changed
- Updated to MXNet 1.8.0
- Removed dependency support for Cuda 9.2 (no longer supported by MXNet 1.8).
- Added dependency support for Cuda 11.0 and 11.2.
- Updated Python requirement to 3.7 and later. (Removed backporting dataclasses requirement)
[2.3.13]
Added
- Target factors are now also collected for nbest translations (and stored in the JSON output handler).
[2.3.12]
Added
- Added --config option to prepare_data CLI to allow setting commandline flags via a yaml config.
- Flags for the prepare_data CLI are now stored in the output folder under args.yaml (equivalent to the behavior of sockeye_train)
[2.3.11]
Added
- Added option prevent_unk to avoid generating <unk> token in beam search.
- Python
Published by fhieber about 5 years ago
https://github.com/awslabs/sockeye - 2.3.10
[2.3.10]
Changed
- Make sure that the top N best params files are retained, even if N > --keep-last-params. This ensures that model averaging will not be crippled when keeping only a few params files during training. This can result in a significant savings of disk space during training.
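The retention rule in 2.3.10 can be sketched as a union of two sets: the most recent checkpoints and the best-scoring ones. The function below is a hypothetical illustration that assumes a higher metric is better; it is not the code Sockeye uses to clean up params files.

```python
from typing import Dict, Set


def checkpoints_to_keep(scores: Dict[int, float], keep_last: int, top_n: int) -> Set[int]:
    """Checkpoint numbers to retain, given a metric value per checkpoint."""
    numbered = sorted(scores)
    # Guard against keep_last == 0, since numbered[-0:] would select everything.
    most_recent = set(numbered[-keep_last:]) if keep_last > 0 else set()
    best = set(sorted(scores, key=lambda c: scores[c], reverse=True)[:top_n])
    return most_recent | best
```

With five checkpoints, `keep_last=1` and `top_n=2`, three files survive: the latest one plus the two best, which is enough for averaging while the rest can be deleted.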
[2.3.9]
Added
- Added scripts for processing Sockeye benchmark output (--output-type benchmark):
  - benchmark_to_output.py extracts translations
  - benchmark_to_percentiles.py computes percentiles
- Python
Published by fhieber over 5 years ago
https://github.com/awslabs/sockeye - 2.3.8
[2.3.8]
Fixed
- Fix problem identified in issue #925 that caused learning rate warmup to fail in some instances when doing continued training
[2.3.7]
Changed
- Use dataclass module to simplify Config classes. No functional change.
[2.3.6]
Fixed
- Fixes the problem identified in issue #890, where the lr_scheduler does not behave as expected when continuing training. The problem is that the lr_scheduler is kept as part of the optimizer, but the optimizer is not saved when saving state. Therefore, every time training is restarted, a new lr_scheduler is created with initial parameter settings. Fixed by saving and restoring the scheduler separately.
[2.3.5]
Fixed
- Fixed issue with LearningRateSchedulerPlateauReduce.__repr__ printing out num_not_improved instead of reduce_num_not_improved.
[2.3.4]
Fixed
- Fixed issue with dtype mismatch in beam search when translating with --dtype float16.
[2.3.3]
Changed
- Upgraded SacreBLEU dependency of Sockeye to a newer version (1.4.14).
- Python
Published by fhieber over 5 years ago
https://github.com/awslabs/sockeye - 2.3.2
[2.3.2]
Fixed
- Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.
[2.3.1]
Fixed
- Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.
[2.3.0]
Added
- Added support for target factors. If provided with additional target-side tokens/features (token-parallel to the regular target-side) at training time, the model can now learn to predict these in a multi-task setting. You can provide target factor data similar to source factors: --target-factors <factor_file1> [<factor_fileN>]. During training, Sockeye optimizes one loss per factor in a multi-task setting. The weight of the losses can be controlled by --target-factors-weight. At inference, target factors are decoded greedily; they do not participate in beam search. The predicted factor at each time step is the argmax over its separate output layer distribution. To receive the target factor predictions at inference time, use --output-type translation_with_factors.
Changed
- load_model(s) now returns a list of target vocabs.
- Default source factor combination changed to sum (was concat before).
- SockeyeModel class has three new properties: num_target_factors, target_factor_configs, and factor_output_layers.
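The multi-task setup for target factors in 2.3.0 boils down to two small operations: a weighted sum of per-factor losses during training and an argmax per factor at inference. The sketch below illustrates both with hypothetical helper names; how --target-factors-weight maps onto the weight list (for example, whether the primary loss always has weight 1) is an assumption.

```python
from typing import Sequence


def combined_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the primary loss and one loss per target factor."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss")
    return sum(weight * loss for weight, loss in zip(weights, losses))


def greedy_factor(distribution: Sequence[float]) -> int:
    """Index of the most probable factor value at one time step (argmax)."""
    return max(range(len(distribution)), key=lambda i: distribution[i])
```

Since factors are picked by argmax at each step, they follow the words chosen by beam search rather than influencing which hypothesis wins.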
- Python
Published by fhieber over 5 years ago
https://github.com/awslabs/sockeye - 2.2.8
[2.2.8]
Changed
- Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.
[2.2.7]
Added
- Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.
[2.2.6]
Fixed
- Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.
[2.2.5]
Fixed
- Enforce sentence batching for sockeye score tool, set default batch size to 56
[2.2.4]
Changed
- Use softmax with length in DotAttentionCell.
- Use contrib.arange_like in AutoRegressiveBias block to reduce number of ops.
[2.2.3]
Added
- Log the absolute number of <unk> tokens in source and target data
[2.2.2]
Fixed
- Fix: Guard against null division for small batch sizes.
[2.2.1]
Fixed
- Fixes a corner case bug by which the beam decoder can wrongly return a best hypothesis with -infinite score.
- Python
Published by fhieber over 5 years ago
https://github.com/awslabs/sockeye - 2.2.0
[2.2.0]
Changed
- Replaced multi-head attention with interleaved_matmul_encdec operators, which removes previously needed transposes and improves performance.
- Beam search states and model layers now assume time-major format.
[2.1.26]
Fixed
- Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.
[2.1.25]
Changed
- Reverting PR #772 as it causes issues with amp.
[2.1.24]
Changed
- Make sure to write a final checkpoint when stopping with --max-updates, --max-samples or --max-num-epochs.
[2.1.23]
Changed
- Updated to MXNet 1.7.0.
- Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).
[2.1.22]
Added
- Re-introduced --softmax-temperature flag for sockeye.score and sockeye.translate.
- Python
Published by fhieber over 5 years ago
https://github.com/awslabs/sockeye - 2.1.21
[2.1.21]
Added
- Added an optional ability to cache encoder outputs of model.
[2.1.20]
Fixed
- Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).
[2.1.19]
Fixed
- When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.
[2.1.18]
Fixed
- Fixed a bug where sampling translation fails because an array is created in the wrong context.
- Python
Published by fhieber almost 6 years ago
https://github.com/awslabs/sockeye - 2.1.17
[2.1.17]
Added
- Added layers.SSRU, which implements a Simpler Simple Recurrent Unit as described in Kim et al, "From Research to Production and Back: Ludicrously Fast Neural Machine Translation" WNGT 2019.
- Added ssru_transformer option to --decoder, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.
Changed
- Reduced the number of arguments for MultiHeadSelfAttention.hybrid_forward(). previous_keys and previous_values should now be input together as previous_states, a list containing two symbols.
- Python
Published by fhieber almost 6 years ago
https://github.com/awslabs/sockeye - 2.1.16
[2.1.16]
Fixed
- Fixed batch sizing error introduced in version 2.1.12 (c00da52) that caused batch sizes to be multiplied by the number of devices. Batch sizing now works as documented (same as pre-2.1.12 versions).
- Fixed max-word batching to properly size batches to a multiple of both --batch-sentences-multiple-of and the number of devices.
[2.1.15]
Added
- Inference option --mc-dropout to use dropout during inference, leading to non-deterministic output. This option uses the same dropout parameters present in the model config file.
[2.1.14]
Added
- Added sockeye.rerank option --output to specify output file.
- Added sockeye.rerank option --output-reference-instead-of-blank to output reference line instead of best hypothesis when best hypothesis is blank.
- Python
Published by fhieber almost 6 years ago
https://github.com/awslabs/sockeye - 2.1.13
[2.1.13]
Added
- Training option --quiet-secondary-workers that suppresses console output for secondary workers when training with Horovod/MPI.
- Set version of isort to <5.0.0 in requirements.dev.txt to avoid incompatibility between newer versions of isort and pylint.
[2.1.12]
Added
- Batch type option max-word for max number of words including padding tokens (more predictable memory usage than word).
- Batching option --batch-sentences-multiple-of that is similar to --round-batch-sizes-to-multiple-of but always rounds down (more predictable memory usage).
Changed
- Default bucketing settings changed to width 8, max sequence length 95 (96 including BOS/EOS tokens), and no bucket scaling.
- Argument --no-bucket-scaling replaced with --bucket-scaling which is False by default.
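The 2.1.12 defaults interact in a simple way: buckets are multiples of the bucket width, and with max-word batching the number of sentences per batch follows from the padded bucket length, rounded down to the requested multiple. The sketch below shows that arithmetic with hypothetical helper names; Sockeye's actual bucketing and batching code handles more cases, such as a minimum batch size.

```python
from typing import List


def define_buckets(max_seq_len: int, width: int) -> List[int]:
    """Bucket lengths at multiples of `width`, capped at `max_seq_len`."""
    buckets = list(range(width, max_seq_len + 1, width))
    if not buckets or buckets[-1] < max_seq_len:
        buckets.append(max_seq_len)
    return buckets


def sentences_per_batch(max_words: int, bucket_length: int, multiple_of: int) -> int:
    """Sentences that fit a word budget counting padding, rounded down to a multiple."""
    sentences = max_words // bucket_length
    return (sentences // multiple_of) * multiple_of
```

Counting padded positions rather than real tokens means every batch in a bucket occupies the same amount of memory, which is where the more predictable usage comes from.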
[2.1.11]
Changed
- Updated sockeye.rerank module to use "add-k" smoothing for sentence-level BLEU.
Fixed
- Updated sockeye.rerank module to use current N-best format.
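For reference, add-k smoothing adds a constant k to both the match count and the total count of an n-gram order, so a sentence with no higher-order matches still gets a non-zero BLEU. The sketch below assumes the smoothing is applied only to orders above 1 and uses k = 1 by default; the function name and those defaults are illustrative assumptions, not the sockeye.rerank code.

```python
def add_k_precision(matches: int, total: int, order: int, k: float = 1.0) -> float:
    """n-gram precision with add-k smoothing for orders above 1."""
    if order > 1:
        # Smoothed: a zero match count no longer zeroes out the whole score.
        return (matches + k) / (total + k)
    return matches / total if total > 0 else 0.0
```

This matters for reranking because short hypotheses often have no 3-gram or 4-gram matches, and an unsmoothed score of zero would make them indistinguishable.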
- Python
Published by fhieber almost 6 years ago
https://github.com/awslabs/sockeye - 2.1.10
[2.1.10]
Changed
- Changed to a cross-entropy loss implementation that avoids the use of SoftmaxOutput.
[2.1.9]
Added
- Added training argument --ignore-extra-params to ignore extra parameters when loading models. The primary use case is continuing training with a model that has already been annotated with scaling factors (sockeye.quantize).
Fixed
- Properly pass allow_missing flag to model.load_parameters()
[2.1.8]
Changed
- Update to sacrebleu=1.4.10
- Python
Published by fhieber almost 6 years ago
https://github.com/awslabs/sockeye - 2.1.7
[2.1.7]
Changed
- Optimize prepare_data by saving the shards in parallel. The prepare_data script accepts a new parameter --max-processes to control the level of parallelism with which shards are written to disk.
[2.1.6]
Changed
- Updated Dockerfiles optimized for CPU (intgemm int8 inference, full MKL support) and GPU (distributed training with Horovod). See sockeye_contrib/docker.
Added
- Official support for int8 quantization with intgemm:
  - This requires the "intgemm" fork of MXNet (kpuatamazon/incubator-mxnet/intgemm). This is the version of MXNet used in the Sockeye CPU docker image (see sockeye_contrib/docker).
  - Use sockeye.translate --dtype int8 to quantize a trained float32 model at runtime.
  - Use the sockeye.quantize CLI to annotate a float32 model with int8 scaling factors for fast runtime quantization.
[2.1.5]
Changed
- Changed state caching for transformer models during beam search to cache states with attention heads already separated out. This avoids repeated transpose operations during decoding, leading to faster inference.
[2.1.4]
Added
- Added Dockerfiles that build an experimental CPU-optimized Sockeye image:
- Uses the latest versions of kpuatamazon/incubator-mxnet (supports intgemm and makes full use of Intel MKL) and kpuatamazon/sockeye (supports int8 quantization for inference).
- See sockeye_contrib/docker.
[2.1.3]
Changed
- Performance optimizations to beam search inference
- Remove unneeded take ops on encoder states
- Gathering input data before sending to GPU, rather than sending each batch element individually
- All of beam search can be done in fp16, if specified by the model
- Other small miscellaneous optimizations
- Model states are now a flat list in ensemble inference, structure of states provided by state_structure()
[2.1.2]
Changed
- Updated to MXNet 1.6.0
Added
- Added support for CUDA 10.2
Removed
- Removed support for CUDA<9.1 / CUDNN<7.5
[2.1.1]
Added
- Ability to set environment variables from training/translate CLIs before MXNet is imported. For example, users can configure MXNet as such: --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"
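The --env value is a semicolon-separated list of NAME=VALUE pairs. A parser for that format might look like the sketch below; `parse_env_spec` is a hypothetical helper and the error handling is an assumption, not Sockeye's argument-parsing code.

```python
from typing import Dict


def parse_env_spec(spec: str) -> Dict[str, str]:
    """Parse 'NAME=VALUE;NAME=VALUE' into a dictionary of environment settings."""
    settings: Dict[str, str] = {}
    for item in spec.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        name, separator, value = item.partition("=")
        if not separator or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        settings[name] = value
    return settings
```

The parsed pairs would then be written with `os.environ.update(...)` before the deep learning framework is imported, since settings such as OMP_NUM_THREADS are typically read once at import time.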
[2.1.0]
Changed
- Version bump, which should have been included in commit b0461b due to incompatible models.
[2.0.1]
Changed
- Inference defaults to using the max input length observed in training (versus scaling down based on mean length ratio and standard deviations).
Added
- Additional parameter fixing strategies:
  - all_except_feed_forward: Only train feed forward layers.
  - encoder_and_source_embeddings: Only train the decoder (decoder layers, output layer, and target embeddings).
  - encoder_half_and_source_embeddings: Train the latter half of encoder layers and the decoder.
- Option to specify the number of CPU threads without using an environment variable (--omp-num-threads).
- More flexibility for source factors combination
[2.0.0]
Changed
- Update to MXNet 1.5.0
- Moved SockeyeModel implementation and all layers to Gluon API
- Removed support for Python 3.4.
- Removed image captioning module
- Removed outdated Autopilot module
- Removed unused training options: Eve, Nadam, RMSProp, Nag, Adagrad, and Adadelta optimizers, fixed-step and fixed-rate-inv-t learning rate schedulers
- Updated and renamed learning rate scheduler fixed-rate-inv-sqrt-t -> inv-sqrt-decay
- Added script for plotting metrics files: sockeye_contrib/plot_metrics.py
- Removed option --weight-tying. Weight tying is enabled by default, disable with --weight-tying-type none.
Added
- Added distributed training support with Horovod/OpenMPI. Use horovodrun and the --horovod training flag.
- Added Dockerfiles that build a Sockeye image with all features enabled. See sockeye_contrib/docker.
- Added none learning rate scheduler (use a fixed rate throughout training)
- Added linear-decay learning rate scheduler
- Added training option --learning-rate-t-scale for time-based decay schedulers
- Added support for MXNet's Automatic Mixed Precision. Activate with the --amp training flag. For best results, make sure as many model dimensions as possible are multiples of 8.
- Added options for making various model dimensions multiples of a given value. For example, use --pad-vocab-to-multiple-of 8, --bucket-width 8 --no-bucket-scaling, and --round-batch-sizes-to-multiple-of 8 with AMP training.
- Added GluonNLP's BERTAdam optimizer, an implementation of the Adam variant used by Devlin et al. (2018). Use --optimizer bertadam.
- Added training option --checkpoint-improvement-threshold to set the amount of metric improvement required over the window of previous checkpoints to be considered actual model improvement (used with --max-num-checkpoint-not-improved).
- Python
Published by fhieber about 6 years ago
https://github.com/awslabs/sockeye - 1.18.115
[1.18.115]
Added
- Added requirements for MXNet compatible with CUDA 10.1.
[1.18.114]
Fixed
- Fix bug in prepare_train_data arguments.
[1.18.113]
Fixed
- Added logging arguments for prepare_data CLI.
[1.18.112]
Added
- Option to suppress creation of logfiles for CLIs (--no-logfile).
[1.18.111]
Added
- Added an optional checkpoint callback for the train function.
Changed
- Excluded gradients from pickled fields of TrainState
[1.18.110]
Changed
- We now guard against failures to run nvidia-smi for GPU memory monitoring.
[1.18.109]
Fixed
- Fixed the metric names by prefixing training metrics with 'train-' and validation metrics with 'val-'. Also restricted the custom logging function to accept only a dictionary and a compulsory global_step parameter.
[1.18.108]
Changed
- More verbose log messages about target token counts.
[1.18.107]
Changed
- Updated to MXNet 1.5.0
- Python
Published by fhieber about 6 years ago
https://github.com/awslabs/sockeye - 1.18.106
[1.18.106]
Added
- Added an optional time limit for stopping training. The training will stop at the next checkpoint after reaching the time limit.
[1.18.105]
Added
- Added support for a possibility to have a custom metrics logger - a function passed as an extra parameter. If supplied, the logger is called during training.
[1.18.104]
Changed
- Implemented an attention-based copy mechanism as described in Jia, Robin, and Percy Liang. "Data recombination for neural semantic parsing." (2016).
- Added a special symbol to explicitly point at an input token in the target sequence
- Changed the decoder interface to pass both the decoder data and the pointer data.
- Changed the AttentionState named tuple to add the raw attention scores.
[1.18.103]
Added
- Added ability to score image-sentence pairs by extending the scoring feature originally implemented for machine translation to the image captioning module.
[1.18.102]
Fixed
- Fixed loading of more than 10 source vocabulary files to be in the right, numerical order.
[1.18.101]
Changed
- Update to Sacrebleu 1.3.6
[1.18.100]
Fixed
- Always initializing the multiprocessing context. This should fix issues observed when running sockeye-train.
[1.18.99]
Changed
- Updated to MXNet 1.4.1
[1.18.98]
Changed
- Converted several transformer-related layer implementations to Gluon HybridBlocks. No functional change.
- Python
Published by fhieber almost 7 years ago
https://github.com/awslabs/sockeye - 1.18.97
[1.18.97]
Changed
- Updated to PyYAML 5.1
[1.18.96]
Changed
- Extracted prepare vocab functionality in the build vocab step into its own function. This matches the pattern in prepare data and train where the main() function only has argparsing, and it invokes a separate function to do the work. This is to allow modules that import this one to circumvent the command line.
[1.18.95]
Changed
- Removed custom operators from transformer models and replaced them with symbolic operators. Improves Performance.
[1.18.94]
Added
- Added ability to accumulate gradients over multiple batches (--update-interval). This allows simulation of large batch sizes on environments with limited memory. For example: training with --batch-size 4096 --update-interval 2 should be close to training with --batch-size 8192 at smaller memory footprint.
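The equivalence claimed for gradient accumulation can be illustrated with a toy one-parameter model. This is a sketch under simplifying assumptions (plain SGD, mean-normalized squared error, hypothetical function names); it does not reproduce Sockeye's trainer.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs


def grad_sum(w: float, batch: Batch) -> float:
    """Summed gradient of squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)


def step_single_batch(w: float, batch: Batch, lr: float) -> float:
    """One SGD update computed from a single large batch."""
    return w - lr * grad_sum(w, batch) / len(batch)


def step_accumulated(w: float, batches: List[Batch], lr: float) -> float:
    """One SGD update after accumulating gradients over several small batches."""
    total_grad = 0.0
    total_examples = 0
    for batch in batches:
        # Gradients are only accumulated here; the weight is not updated yet.
        total_grad += grad_sum(w, batch)
        total_examples += len(batch)
    return w - lr * total_grad / total_examples


data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]
w_large = step_single_batch(0.5, data, lr=0.01)
w_accum = step_accumulated(0.5, [data[:4], data[4:]], lr=0.01)
print(abs(w_large - w_accum) < 1e-9)
```

In this toy setting the two updates agree up to floating-point rounding. In a real system the result is only "close", as the entry says, since batching by tokens and normalization details can differ.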
[1.18.93]
Fixed
- Made brevity_penalty argument in Translator class optional to ensure backwards compatibility.
- Python
Published by fhieber about 7 years ago
https://github.com/awslabs/sockeye - 1.18.92
[1.18.92]
Added
- Added sentence length (and length ratio) prediction to be able to discourage hypotheses that are too short at inference time. Can be enabled for training with --length-task and with --brevity-penalty-type during inference.
[1.18.91]
Changed
- Multiple lexicons can now be specified with the --restrict-lexicon option:
  - For a single lexicon: --restrict-lexicon /path/to/lexicon.
  - For multiple lexicons: --restrict-lexicon key1:/path/to/lexicon1 key2:/path/to/lexicon2 ....
  - Use --json-input to specify the lexicon to use for each input, ex: {"text": "some input string", "restrict_lexicon": "key1"}.
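A small sketch of producing such JSON input lines, one object per line. The "text" and "restrict_lexicon" fields come from the entry above; the helper name is hypothetical.

```python
import json


def make_json_input(text: str, lexicon_key: str) -> str:
    """Serialize one input sentence with the lexicon key to use for it.

    Hypothetical helper; only the field names are taken from the changelog.
    """
    return json.dumps({"text": text, "restrict_lexicon": lexicon_key})


# One JSON object per line, each selecting a lexicon by its key.
lines = [
    make_json_input("some input string", "key1"),
    make_json_input("another input string", "key2"),
]
print("\n".join(lines))
```

The resulting lines would be fed to the translate CLI with --json-input, with the keys matching those given to --restrict-lexicon.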
[1.18.90]
Changed
- Updated to MXNet 1.4.0
- Integration tests no longer check for equivalence of outputs with batch size 2
[1.18.89]
Fixed
- Made the length ratios per bucket change backwards compatible.
[1.18.88]
Changed
- Made sacrebleu a pip dependency and removed it from sockeye_contrib.
[1.18.87]
Added
- Data statistics at training time now compute mean and standard deviation of length ratios per bucket. This information is stored in the model's config, but not used at the moment.
[1.18.86]
Added
- Added the --fixed-param-strategy option that allows fixing various model parameters during training via named strategies. These include some of the simpler combinations from Wuebker et al. (2018) such as fixing everything except the first and last layers of the encoder and decoder (all_except_outer_layers). See the help message for a full list of strategies.
- Python
Published by fhieber about 7 years ago
https://github.com/awslabs/sockeye - 1.18.85
[1.18.85]
Changed
- Disabled dynamic batching for Translator.translate() by default due to increased memory usage. The default is to fill-up batches to Translator.max_batch_size. Dynamic batching can still be enabled if fill_up_batches is set to False.
Added
- Added parameter to force training to stop after a given number of checkpoints. Useful when forced to share limited GPU resources.
[1.18.84]
Fixed
- Fixed lexical constraints bugs that broke batching and caused large drop in BLEU. These were introduced with sampling (1.18.64).
[1.18.83]
Changed
- The embedding size is automatically adjusted to the Transformer model size in case it is not specified on the command line.
[1.18.82]
Fixed
- Fixed type conversion in metrics file reading introduced in 1.18.79.
[1.18.81]
Fixed
- Making sure the pickled training state contains the checkpoint decoder's BLEU score of the last checkpoint.
[1.18.80]
Fixed
- Fixed a bug introduced in 1.18.77 where blank lines in the training data resulted in failure.
[1.18.79]
Added
- Writing of the convergence/divergence status to the metrics file and guarding against numpy.histogram's errors for NaNs during divergent behaviour.
- Python
Published by fhieber about 7 years ago
https://github.com/awslabs/sockeye - 1.18.78
[1.18.78]
Changed
- Dynamic batch sizes: Translator.translate() will adjust batch size in beam search to the actual number of inputs without using padding.
[1.18.77]
Added
- sockeye.score now loads data on demand and doesn't skip any input lines
[1.18.76]
Changed
- Do not compare scores from translation and scoring in integration tests.
Added
- Adding the option via the flag --stop-training-on-decoder-failure to stop training in case the checkpoint decoder dies (e.g. because there is not enough memory). In case this is turned on a checkpoint decoder is launched right when training starts in order to fail as early as possible.
[1.18.75]
Changed
- Do not create dropout layers for inference models for performance reasons.
[1.18.74]
Changed
- Revert change in 1.18.72 as no memory saving could be observed.
[1.18.73]
Fixed
- Fixed a bug where source-factors-num-embed was not correctly adjusted to num-embed when using prepared data & source-factor-combine sum.
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.72
[1.18.72]
Changed
- Removed use of expand_dims in favor of reshape to save memory.
[1.18.71]
Fixed
- Fixed default setting of source factor combination to be 'concat' for backwards compatibility.
[1.18.70]
Added
- Sockeye now outputs fields found in a JSON input object, if they are not overwritten by Sockeye. This behavior can be enabled by selecting --json-input (to read input as a JSON object) and --output-type json (to write a JSON object to output).
[1.18.69]
Added
- Source factors can now be added to the embeddings instead of concatenated with --source-factors-combine sum (default: concat)
[1.18.68]
- Fixed training crashes with --learning-rate-decay-optimizer-states-reset initial option.
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.67
[1.18.67]
Added
- Added fertility as a further type of attention coverage.
- Added an option for training to keep the initializations of the model via --keep-initializations. When set, the trainer will avoid deleting the params file for the first checkpoint, no matter what --keep-last-params is set to.
[1.18.66]
Fixed
- Fix to argument names that are allowed to differ for resuming training.
[1.18.65]
Changed
- More informative error message about inconsistent --shared-vocab setting.
[1.18.64]
Added
- Adding translation sampling via --sample [N]. This causes the decoder to sample each next step from the target distribution probabilities at each timestep. An optional value of N causes the decoder to sample only from the top N vocabulary items for each hypothesis at each timestep (the default is 0, meaning to sample from the entire vocabulary).
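The sampling behaviour described above can be sketched for a single timestep as follows. This is an illustration in plain Python with hypothetical names, not Sockeye's implementation, which operates on batched tensors.

```python
import random
from typing import List, Optional


def sample_next_token(probs: List[float], top_n: int = 0,
                      rng: Optional[random.Random] = None) -> int:
    """Sample a vocabulary index from `probs`.

    With top_n == 0 the whole vocabulary is eligible; with top_n > 0 only
    the top_n most probable items are (hypothetical helper for illustration).
    """
    generator = rng if rng is not None else random.Random()
    candidates = list(range(len(probs)))
    if top_n > 0:
        # Keep only the indices of the top_n highest-probability items.
        candidates = sorted(candidates, key=lambda i: probs[i], reverse=True)[:top_n]
    weights = [probs[i] for i in candidates]
    # random.choices treats weights as relative, so no renormalization is needed.
    return generator.choices(candidates, weights=weights, k=1)[0]


distribution = [0.05, 0.60, 0.25, 0.10]
rng = random.Random(13)
print(sample_next_token(distribution, top_n=2, rng=rng) in {1, 2})
```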
[1.18.63]
Changed
- The checkpoint decoder and nvidia-smi subprocess are now launched from a forkserver, allowing for a better separation between processes.
[1.18.62]
Added
- Add option to make TranslatorInputs directly from a dict.
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.61
[1.18.61]
Changed
- Update to MXNet 1.3.1. Removed requirements/requirements.gpu-cu{75,91}.txt as CUDA 7.5 and 9.1 are deprecated.
[1.18.60]
Fixed
- Performance optimization to skip the softmax operation for single model greedy decoding is now only applied if no translation scores are required in the output.
[1.18.59]
Added
- Full training state is now returned from EarlyStoppingTrainer's fit().
Changed
- Training state cleanup will not be performed for training runs that did not converge yet.
- Switched to portalocker for locking files (Windows compatibility).
[1.18.58]
Added
- Added nbest translation, exposed as --nbest-size. Nbest translation means to not only output the most probable translation according to a model, but the top n most probable hypotheses. If --nbest-size > 1 and the option --output-type is not explicitly specified, the output type will be changed to one JSON list of nbest translations per line. --nbest-size can never be larger than --beam-size.
Changed
- Changed sockeye.rerank CLI to be compatible with nbest translation JSON output format.
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.57
[1.18.57]
Added
- Added sockeye.score CLI for quickly scoring existing translations (documentation).
Fixed
- Entry-point clean-up after the contrib/ rename
- Python
Published by tdomhan over 7 years ago
https://github.com/awslabs/sockeye - 1.18.56
[1.18.56]
Changed
- Update to MXNet 1.3.0.post0
[1.18.55]
- Renamed contrib to less-generic sockeye_contrib
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.54
[1.18.54]
Added
- --source-factor-vocabs can be set to provide source factor vocabularies.
[1.18.53]
Added
- Always skipping softmax for greedy decoding by default, only for single models.
- Added option --skip-topk for greedy decoding.
[1.18.52]
Fixed
- Fixed bug in constrained decoding to make sure best hypothesis satisfies all constraints.
[1.18.51]
Added
- Added a CLI for reranking of an nbest list of translations.
[1.18.50]
Fixed
- Check for equivalency of training and validation source factors was incorrectly indented.
[1.18.49]
Changed
- Removed dependence on the nvidia-smi tool. The number of GPUs is now determined programmatically.
[1.18.48]
Changed
- Translator.max_input_length now reports correct maximum input length for TranslatorInput objects, independent of the internal representation, where an additional EOS gets added.
- Python
Published by fhieber over 7 years ago
https://github.com/awslabs/sockeye - 1.18.47
[1.18.47]
Changed
- translate CLI: no longer rely on external, user-given input id for sorting translations. Also allow string ids for sentences.
[1.18.46]
Fixed
- Fixed issue with --num-words 0:0 in image captioning and another issue related to loading all features to memory with variable length.
[1.18.45]
Added
- Added an 8 layer LSTM model similar (but not exactly identical) to the 'GNMT' architecture to autopilot.
[1.18.44]
Fixed
- Fixed an issue with --max-num-epochs causing training to stop before the update/batch that actually completes the epoch was made.
[1.18.43]
Added
- <s> now supported as the first token in a multi-word negative constraint (e.g., <s> I think to prevent a sentence from starting with I think)
Fixed
- Bugfix in resetting the state of a multiple-word negative constraint
[1.18.42]
Changed
- Simplified gluon blocks for length calculation
- Python
Published by fhieber almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.41
[1.18.41]
Changed
- Require numpy 1.14 or later to avoid MKL conflicts between numpy and mxnet-mkl.
[1.18.40]
Fixed
- Fixed bad check for existence of negative constraints.
- Resolved conflict for phrases that are both positive and negative constraints.
- Fixed softmax temperature at inference time.
[1.18.39]
Added
- Image Captioning now supports constrained decoding.
- Image Captioning: zero padding of features now allows input features of different shape for each image.
[1.18.38]
Fixed
- Fixed issue with the incorrect order of translations when empty inputs are present and translating in chunks.
[1.18.37]
Fixed
- Determining the max output length for each sentence in a batch by the bucket length rather than the actual length in order to match the behavior of a single sentence translation.
[1.18.36]
Changed
- Updated to MXNet 1.2.1
- Python
Published by tdomhan almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.35
[1.18.35]
Added
- ROUGE scores are now available in sockeye-evaluate.
- Enabled CHRF as an early-stopping metric.
- Added support for --beam-search-stop first for decoding jobs with --batch-size > 1.
- Now supports negative constraints, which are phrases that must not appear in the output.
  - Global constraints can be listed in a (pre-processed) file, one per line: --avoid-list FILE
  - Per-sentence constraints are passed using the avoid keyword in the JSON object, with a list of strings as its field value.
- Added option to pad vocabulary to a multiple of x: e.g. --pad-vocab-to-multiple-of 16.
- Pre-training the RNN decoder. Usage:
  - Train with flag --decoder-only.
  - Feed identical source/target training data.
Fixed
- Preserving max output length for each sentence to allow having identical translations for both with and without batching.
Changed
- No longer restrict the vocabulary to 50,000 words by default, but rather create the vocabulary from all words which occur at least --word-min-count times. Specifying --num-words explicitly will still lead to a restricted vocabulary.
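The interaction between a minimum count and an explicit size limit can be sketched as below. The function and parameter names mirror the flags but are hypothetical, and special symbols (padding, unknown, BOS/EOS) are left out for brevity.

```python
from collections import Counter
from typing import Iterable, List, Optional


def build_vocab(sentences: Iterable[str], word_min_count: int = 1,
                num_words: Optional[int] = None) -> List[str]:
    """Build a frequency-sorted word list (illustrative sketch only).

    Words occurring fewer than `word_min_count` times are dropped; if
    `num_words` is given, only the most frequent `num_words` are kept.
    """
    counts = Counter(token for sentence in sentences for token in sentence.split())
    words = [word for word, count in counts.most_common() if count >= word_min_count]
    if num_words is not None:
        words = words[:num_words]
    return words


corpus = ["a b a", "a c b"]
print(build_vocab(corpus, word_min_count=2))                # ['a', 'b']
print(build_vocab(corpus, word_min_count=1, num_words=1))   # ['a']
```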
- Python
Published by fhieber almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.28
[1.18.28]
Changed
- Temporarily fixing the pyyaml version to 3.12 as version 4.1 introduced some backwards incompatible changes.
[1.18.27]
Fixed
- Fix silent failing of NDArray splits during inference by using a version that always returns a list. This was causing incorrect behavior when using lexicon restriction and batch inference with a single source factor.
[1.18.26]
Added
- ROUGE score evaluation. It can be used as the stopping criterion for tasks such as summarization.
[1.18.25]
Changed
- Update requirements to use MKL versions of MXNet for fast CPU operation.
[1.18.24]
Added
- Dockerfiles and convenience scripts for running fast_align to generate lexical tables. These tables can be used to create top-K lexicons for faster decoding via vocabulary selection (documentation).
Changed
- Updated default top-K lexicon size from 20 to 200.
- Python
Published by tdomhan almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.23
Fixed
- Correctly create the convolutional embedding layers when the encoder is set to transformer-with-conv-embed. Previously no convolutional layers were added so that a standard Transformer model was trained instead.
- Python
Published by tdomhan almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.22
Fixed
- Make sure the default bucket is large enough with word based batching when the source is longer than the target (Previously there was an edge case where the memory usage was sub-optimal with word based batching and longer source than target sentences).
- Python
Published by tdomhan almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.21
[1.18.21]
Fixed
- Constrained decoding was missing a crucial cast
- Fixed test cases that should have caught this
- Python
Published by fhieber almost 8 years ago
https://github.com/awslabs/sockeye - 1.18.20
[1.18.20]
Changed
- Transformer parametrization flags (model size, # of attention heads, feed-forward layer size) can now optionally be defined separately for encoder & decoder. For example, to use a different transformer model size for the encoder, pass --transformer-model-size 1024:512.
[1.18.19]
Added
- LHUC is now supported in transformer models
[1.18.18]
Added
- [Experimental] Introducing the image captioning module. Type of models supported: ConvNet encoder - Sockeye NMT decoders. This includes also a feature extraction script, an image-text iterator that loads features, training and inference pipelines and a visualization script that loads images and captions. See this tutorial for its usage. This module is experimental therefore its maintenance is not fully guaranteed.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.17
[1.18.17]
Changed
- Updated to MXNet 1.2
- Use of the new LayerNormalization operator to save GPU memory.
[1.18.16]
Fixed
- Removed summation of gradient arrays when logging gradients. This clogged the memory on the primary GPU device over time when many checkpoints were done. Gradient histograms are now logged to Tensorboard separated by device.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.15
[1.18.15]
Added
- Added decoding with target-side lexical constraints (documentation in tutorials/constraints).
[1.18.14]
Added
- Introduced Sockeye Autopilot for single-command end-to-end system building. See the Autopilot documentation and run with: sockeye-autopilot. Autopilot is a contrib module with its own tests that are run periodically. It is not included in the comprehensive tests run for every commit.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.13
[1.18.13]
Fixed
- Fixed two bugs with training resumption:
- removed overly strict assertion in the data iterator for model states before the first checkpoint.
- removed deletion of Tensorboard log directory.
Added
- Added support for config files. Command line parameters have precedence over the values read from the config file. Minimal working example: python -m sockeye.train --config config.yaml with contents of config.yaml as follows:
    source: source.txt
    target: target.txt
    output: out
    validation_source: valid.source.txt
    validation_target: valid.target.txt
Changed
- The full set of arguments is serialized to out/args.yaml at the beginning of training (before json was used).
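The precedence rule for config files stated in 1.18.13 (command line values win over config values) can be sketched with argparse. This is an illustration, not Sockeye's argument handling: the config is passed as an already-loaded dict rather than read from YAML, and only three of the arguments are modelled.

```python
import argparse
from typing import Dict, List


def parse_with_config(argv: List[str], config: Dict[str, str]) -> argparse.Namespace:
    """Parse `argv`, using `config` values as defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--source")
    parser.add_argument("--target")
    parser.add_argument("--output")
    # Config values become defaults, so explicit command line values override them.
    parser.set_defaults(**config)
    return parser.parse_args(argv)


config = {"source": "source.txt", "target": "target.txt", "output": "out"}
args = parse_with_config(["--output", "other_out"], config)
print(args.source, args.output)  # source.txt other_out
```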
[1.18.12]
Changed
- All source side sequences now get appended an additional end-of-sentence (EOS) symbol. This change is backwards compatible meaning that inference with older models will still work without the EOS symbol.
[1.18.11]
Changed
- Default training parameters have been changed to reflect the setup used in our arXiv paper. Specifically, the default is now to train a 6 layer Transformer model with word based batching. The only difference to the paper is that weight tying is still turned off by default, as there may be use cases in which tying the source and target vocabularies is not appropriate. Turn it on using --weight-tying --weight-tying-type=src_trg_softmax. Additionally, BLEU scores from a checkpoint decoder are now monitored by default.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.10
[1.18.10]
Fixed
- Re-allow early stopping w.r.t BLEU
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.9
[1.18.9]
Fixed
- Fixed a problem with lhuc boolean flags passed as None.
Added
- Reorganized beam search. Normalization is applied only to completed hypotheses, and pruning of hypotheses (logprob against highest-scoring completed hypothesis) can be specified with --beam-prune X
- Enabled stopping at first completed hypothesis with --beam-search-stop first (default is 'all')
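One way to read the pruning rule above: once a completed hypothesis exists, any hypothesis whose score falls more than X below the best completed one is dropped. The sketch below assumes scores are log-probabilities where higher is better and uses hypothetical names; Sockeye's beam search works on batched arrays and may differ in detail.

```python
from typing import List, Tuple

Hypothesis = Tuple[float, bool]  # (log-probability score, is_finished)


def prune_hypotheses(hyps: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Drop hypotheses scoring more than `threshold` below the best finished one."""
    finished_scores = [score for score, finished in hyps if finished]
    if not finished_scores:
        # Nothing to compare against yet, so nothing is pruned.
        return list(hyps)
    best_finished = max(finished_scores)
    return [(score, finished) for score, finished in hyps
            if best_finished - score <= threshold]


beam = [(-1.0, True), (-2.5, False), (-6.0, False), (-3.0, True)]
print(prune_hypotheses(beam, threshold=3.0))
```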
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.8
[1.18.8]
Removed
- Removed tensorboard logging of embedding & output parameters at every checkpoint. This used a lot of disk space.
[1.18.7]
Added
- Added support for LHUC in RNN models (David Vilar, "Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models" NAACL 2018)
Fixed
- Word based batching with very small batch sizes.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.6
[1.18.6]
Fixed
- Fixed a problem with learning rate scheduler not properly being loaded when resuming training.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.5
[1.18.5]
Fixed
- Fixed a problem with trainer not waiting for the last checkpoint decoder (#367).
[1.18.4]
Added
- Added options to control training length w.r.t number of updates/batches or number of samples: --min-updates, --max-updates, --min-samples, --max-samples.
[1.18.3]
Changed
- Training now supports training and validation data that contains empty segments. If a segment is empty, it is skipped during loading and a warning message including the number of empty segments is printed.
[1.18.2]
Changed
- Removed combined linear projection of keys & values in source attention transformer layers for performance improvements.
- The topk operator is performed in a single operation during batch decoding instead of running in a loop over each sentence, bringing speed benefits in batch decoding.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.18.1
[1.18.1]
Added
- Added Tensorboard logging for all parameter values and gradients as histograms/distributions. The logged values correspond to the current batch at checkpoint time.
Changed
- Tensorboard logging now is done with the MXNet compatible 'mxboard' that supports logging of all kinds of events (scalars, histograms, embeddings, etc.). If installed, training events are written out to Tensorboard compatible event files automatically.
Removed
- Removed the --use-tensorboard argument from sockeye.train. Tensorboard logging is now enabled by default if mxboard is installed.
[1.18.0]
Changed
- Change default target vocab name in model folder to vocab.trg.0.json
- Changed serialization format of top-k lexica to pickle/Numpy instead of JSON.
- sockeye-lexicon now supports two subcommands: create & inspect. The former provides the same functionality as the previous CLI. The latter allows users to pass source words to the top-k lexicon to inspect the set of allowed target words.
Added
- Added ability to choose a smaller k at decoding runtime for lexicon restriction.
[1.17.5]
Added
- Added a flag --strip-unknown-words to sockeye.translate to remove any <unk> symbols from the output strings.
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - [1.17.4]
[1.17.4]
Added
- Added a flag --fixed-param-names to prevent certain parameters from being optimized during training. This is useful if you want to keep pre-trained embeddings fixed during training.
- Added a flag --dry-run to sockeye.train to not perform any actual training, but print statistics about the model and mode of operation.
[1.17.3]
Changed
- sockeye.evaluate can now handle multiple hypotheses files by simply specifying --hypotheses file1 file2.... For each metric the mean and standard deviation will be reported across files.
[1.17.2]
Added
- Optionally store the beam search history to a json output using the beam_store output handler.
Changed
- Use stack operator instead of expand_dims + concat in RNN decoder. Reduces memory usage.
[1.17.1]
Changed
- Updated to MXNet 1.1.0
- Python
Published by fhieber about 8 years ago
https://github.com/awslabs/sockeye - 1.17.0
[1.17.0]
Added
- Source factors, as described in
Linguistic Input Features Improve Neural Machine Translation (Sennrich & Haddow, WMT 2016) PDF bibtex
Additional source factors are enabled by passing --source-factors file1 [file2 ...] (-sf), where file1, etc. are
token-parallel to the source (-s).
An analogous parameter, --validation-source-factors, is used to pass factors for validation data.
The flag --source-factors-num-embed D1 [D2 ...] denotes the embedding dimensions and is required if source factor
files are given. Factor embeddings are concatenated to the source embeddings dimension (--num-embed).
At test time, the input sentence and its factors can be passed in via STDIN or command-line arguments.
- For STDIN, the input and factors should be in a token-based factored format, e.g.,
word1|factor1|factor2|... w2|f1|f2|... ...1.
- You can also use file arguments, which mirrors training: --input takes the path to a file containing the source,
and --input-factors a list of files containing token-parallel factors.
At test time, an exception is raised if the number of expected factors does not
match the factors passed along with the input.
- Removed bias parameters from multi-head attention layers of the transformer.
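The token-based factored STDIN format described for source factors above (word|factor1|factor2 ...) can be split into a token stream plus one stream per factor. A minimal sketch with hypothetical names, including the mismatch check the entry mentions; it is not Sockeye's own parser.

```python
from typing import List, Tuple


def parse_factored_input(line: str, num_factors: int) -> Tuple[List[str], List[List[str]]]:
    """Split a 'word|f1|f2 ...' line into tokens and per-factor streams."""
    tokens: List[str] = []
    factors: List[List[str]] = [[] for _ in range(num_factors)]
    for item in line.split():
        word, *token_factors = item.split("|")
        if len(token_factors) != num_factors:
            # Mirrors the documented behaviour of raising on a factor count mismatch.
            raise ValueError(
                f"expected {num_factors} factors, got {len(token_factors)} in {item!r}")
        tokens.append(word)
        for stream, value in zip(factors, token_factors):
            stream.append(value)
    return tokens, factors


tokens, factor_streams = parse_factored_input("the|DT|O cat|NN|B", num_factors=2)
print(tokens)          # ['the', 'cat']
print(factor_streams)  # [['DT', 'NN'], ['O', 'B']]
```

Each factor stream stays token-parallel to the source, which is the same requirement placed on the factor files passed at training time.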
[1.16.6]
Changed
- Loading/Saving auxiliary parameters of the models. Before aux parameters were not saved or used for initialization. Therefore the parameters of certain layers were ignored (e.g., BatchNorm) and randomly initialized. This change enables to properly load, save and initialize the layers which use auxiliary parameters.
[1.16.5]
Changed
- Device locking: Only one process will be acquiring GPUs at a time. This will lead to consecutive device ids whenever possible.
[1.16.4]
Changed
- Internal change: Standardized all data to be batch-major both at training and at inference time.
[1.16.3]
Changed
- When a device lock file exists and the process has no write permissions for the lock file we assume that the device is locked. Previously this led to a permission denied exception. Please note that in this scenario we cannot detect if the original Sockeye process did not shut down gracefully. This is not an issue when the sockeye process has write permissions on existing lock files as in that case locking is based on file system locks, which cease to exist when a process exits.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - 1.15.8
[1.15.8]
Fixed
- Taking the BOS and EOS tag into account when calculating the maximum input length at inference.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - [1.15.7]
[1.15.7]
Fixed
- Fixed a problem with --num-samples-per-shard flag not being parsed as int.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - 1.15.6
[1.15.6]
Added
- New CLI sockeye.prepare_data for preprocessing the training data only once before training, potentially splitting large datasets into shards. At training time only one shard is loaded into memory at a time, limiting the maximum memory usage.
Changed
- Instead of using the --source and --target arguments sockeye.train now accepts a --prepared-data argument pointing to the folder containing the preprocessed and sharded data. Using the raw training data is still possible and now consumes less memory.
[1.15.5]
Added
- Optionally apply query, key and value projections to the source and target hidden vectors in the CNN model before applying the attention mechanism. CLI parameter: --cnn-project-qkv.
[1.15.4]
Added
- A warning will be printed if the checkpoint decoder slows down training.
[1.15.3]
Added
- Exposing the xavier random number generator through --weight-init-xavier-rand-type.
[1.15.2]
Added
- Exposing MXNet's Nesterov Accelerated Gradient, Adagrad and Adadelta optimizers.
[1.15.1]
Added
- A tool that initializes embedding weights with pretrained word representations, sockeye.init_embedding.
[1.15.0]
Added
- Added support for Swish-1 (SiLU) activation to transformer models (Ramachandran et al. 2017: Searching for Activation Functions, Elfwing et al. 2017: Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning). Use --transformer-activation-type swish1.
- Added support for GELU activation to transformer models (Hendrycks and Gimpel 2016: Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units). Use --transformer-activation-type gelu.
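For reference, the two activations named above have simple closed forms: Swish-1 is x * sigmoid(x), and GELU is x * Phi(x) with Phi the standard normal CDF. The scalar sketch below uses the exact erf-based GELU; whether Sockeye uses this form or an approximation is not stated in the changelog.

```python
import math


def swish1(x: float) -> float:
    """Swish-1 / SiLU: x * sigmoid(x). Very negative x may overflow math.exp."""
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for value in (-1.0, 0.0, 1.0):
    print(value, round(swish1(value), 4), round(gelu(value), 4))
```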
[1.14.3]
Changed
- Fast decoding for transformer models. Caches keys and values of self-attention before softmax. Changed decoding flag --bucket-width to apply only to source length.
[1.14.2]
Added
- Gradient norm clipping (--gradient-clipping-type) and monitoring.
Changed
- Changed --clip-gradient to --gradient-clipping-threshold for consistency.
[1.14.1]
Changed
- Sorting sentences during decoding before splitting them into batches.
- Default chunk size: The default chunk size when batching is enabled is now batch_size * 500 during decoding to avoid users accidentally forgetting to increase the chunk size.
[1.14.0]
Changed
- Downscaled fixed positional embeddings for CNN models.
- Renamed --monitor-bleu flag to --decode-and-evaluate to illustrate that it computes other metrics in addition to BLEU.
Added
- --decode-and-evaluate-use-cpu flag to use CPU for decoding validation data.
- --decode-and-evaluate-device-id flag to use a separate GPU device for validation decoding. If not specified, the existing and still default behavior is to use the last acquired GPU for training.
[1.13.2]
Added
- A tool that extracts specified parameters from params.x into a .npz file for downstream applications or analysis.
[1.13.1]
Added
- Added chrF metric (Popovic 2015: chrF: character n-gram F-score for automatic MT evaluation) to Sockeye. sockeye.evaluate now accepts bleu and chrf as values for --metrics
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - 1.13.0
[1.13.0]
Fixed
- Transformer models do not ignore --num-embed anymore as they did silently before. As a result there is an error thrown if --num-embed != --transformer-model-size.
- Fixed the attention in upper layers (--rnn-attention-in-upper-layers), which was previously not passed correctly to the decoder.
Removed
- Removed RNN parameter (un-)packing and support for FusedRNNCells (removed --use-fused-rnns flag). These were not used, not correctly initialized, and performed worse than regular RNN cells. Moreover, they made the code much more complex. RNN models trained with previous versions are no longer compatible.
- Removed the lexical biasing functionality (Arthur ETAL'16) (removed arguments --lexical-bias and --learn-lexical-bias).
[1.12.2]
Changed
- Updated to MXNet 0.12.1, which includes an important bug fix for CPU decoding.
[1.12.1]
Changed
- Removed dependency on sacrebleu pip package. Now imports directly from contrib/.
[1.12.0]
Changed
- Transformers now always use the linear output transformation after combining attention heads, even if input & output depth do not differ.
[1.11.2]
Fixed
- Fixed a bug where vocabulary slice padding was defaulting to CPU context. This was affecting decoding on GPUs with very small vocabularies.
[1.11.1]
Fixed
- Fixed an issue with the use of ignore in CrossEntropyMetric::cross_entropy_smoothed. This was affecting runs with Eve optimizer and label smoothing. Thanks @kobenaxie for reporting.
[1.11.0]
Added
- Lexicon-based target vocabulary restriction for faster decoding. New CLI for top-k lexicon creation, sockeye.lexicon.
New translate CLI argument --restrict-lexicon.
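The idea behind lexicon-based restriction can be sketched as follows: for each source token, look up its top-k likely target ids and score only the union of those ids plus a few always-allowed special symbols. The data layout, the names `SPECIAL_IDS`, `restricted_vocab`, and `restricted_argmax`, and the id values are hypothetical and do not reflect Sockeye's internal format.

```python
from typing import Dict, List, Set

# Hypothetical ids for special symbols that must always remain available.
SPECIAL_IDS = {0, 1, 2, 3}  # e.g. PAD, UNK, BOS, EOS


def restricted_vocab(source_ids: List[int],
                     lexicon: Dict[int, List[int]],
                     k: int) -> List[int]:
    """Union of the top-k lexicon entries for every source token, plus special symbols."""
    allowed: Set[int] = set(SPECIAL_IDS)
    for src in source_ids:
        # Lexicon entries are assumed to be sorted by descending translation probability.
        allowed.update(lexicon.get(src, [])[:k])
    return sorted(allowed)


def restricted_argmax(scores: List[float], allowed_ids: List[int]) -> int:
    """Pick the best-scoring target id, looking only at the allowed ids."""
    return max(allowed_ids, key=lambda i: scores[i])
```

Decoding gets faster because the output layer only has to be evaluated for the allowed ids rather than for the full target vocabulary.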
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - 1.10.5
[1.10.5]
Fixed
- Fixed yet another bug with the data iterator.
[1.10.4]
Fixed
- Fixed a bug with the revised data iterator not correctly appending EOS symbols for variable-length batches. This reverts part of the commit added in 1.10.1 but is now correct again.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - 1.10.3
[1.10.3]
Changed
- Fixed a bug with max_observed_{source,target}_len being computed on the complete data set, not only on the sentences actually added to the buckets based on `--max_seq_len`.
[1.10.2]
Added
- --max-num-epochs flag to train for a maximum number of passes through the training data.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - Update to MXNet 0.12.0
[1.10.1]
Changed
- Reduced memory footprint when creating data iterators: integer sequences are streamed from disk when being assigned to buckets.
[1.10.0]
Changed
- Updated MXNet dependency to 0.12 (w/ MKL support by default).
- Changed --smoothed-cross-entropy-alpha to --label-smoothing. Label smoothing should now require significantly less memory due to its addition to MXNet's SoftmaxOutput operator.
- --weight-normalization now applies not only to convolutional weight matrices, but to output layers of all decoders. It is also independent of weight tying.
- Transformers now use --embed-dropout. Before they were using --transformer-dropout-prepost for this.
- Transformers now scale their embedding vectors before adding fixed positional embeddings. This turns out to be crucial for effective learning.
- .param files now use 5-digit identifiers to reduce risk of overflowing with many checkpoints.
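The embedding-scaling change above can be illustrated with a small pure-Python sketch. The entry does not state the scale factor; the sqrt(d_model) factor and the interleaved sine/cosine layout used here follow the original Transformer paper and are assumptions, as are the function names.

```python
import math
from typing import List


def positional_encoding(position: int, d_model: int) -> List[float]:
    """Fixed sinusoidal encoding for one position (sin on even, cos on odd dimensions)."""
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def embed_with_positions(embeddings: List[List[float]]) -> List[List[float]]:
    """Scale each embedding by sqrt(d_model), then add the positional encoding."""
    d_model = len(embeddings[0])
    scale = math.sqrt(d_model)  # assumed factor, see lead-in
    return [
        [scale * e + p for e, p in zip(vec, positional_encoding(pos, d_model))]
        for pos, vec in enumerate(embeddings)
    ]
```

Without the scaling, small embedding values can be dominated by positional values, which lie in [-1, 1]; scaling keeps the token identity signal at a comparable magnitude.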
Added
- Added CUDA 9.0 requirements file.
- --loss-normalization-type. Added a new flag to control loss normalization. New default is to normalize by the number of valid, non-PAD tokens instead of the batch size.
- --weight-init-xavier-factor-type. Added new flag to control Xavier factor type when --weight-init=xavier.
- --embed-weight-init. Added new flag for initialization of embeddings matrices.
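The difference between the two loss normalization modes mentioned for --loss-normalization-type can be sketched like this. The function, the `PAD_ID` value, and the mode names `"valid"` and `"batch"` are assumptions for illustration, not Sockeye's code.

```python
from typing import List

PAD_ID = 0  # hypothetical padding id


def normalized_loss(token_losses: List[List[float]],
                    labels: List[List[int]],
                    normalization: str = "valid") -> float:
    """Sum per-token losses over non-PAD positions and normalize the total."""
    total, num_valid = 0.0, 0
    for sent_losses, sent_labels in zip(token_losses, labels):
        for loss, label in zip(sent_losses, sent_labels):
            if label != PAD_ID:  # padded positions contribute nothing
                total += loss
                num_valid += 1
    if normalization == "valid":
        return total / max(num_valid, 1)  # per valid token
    if normalization == "batch":
        return total / len(labels)  # per sentence in the batch
    raise ValueError("unknown normalization: %s" % normalization)
```

Normalizing by valid tokens makes the loss value comparable across batches whose sentences differ in length, whereas normalizing by batch size lets batches with longer sentences produce larger values.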
Removed
- --smoothed-cross-entropy-alpha argument. See above.
- --normalize-loss argument. See above.
[1.9.0]
Added
- Batch decoding. New options for the translate CLI: --batch-size and --chunk-size. Translator.translate() now accepts and returns lists of inputs and outputs.
[1.8.4]
Added
- Exposing the MXNet KVStore through the --kvstore argument, potentially enabling distributed training.
[1.8.3]
Added
- Optional smart rollback of parameters and optimizer states after updating the learning rate if not improved for x checkpoints. New flags: --learning-rate-decay-param-reset, --learning-rate-decay-optimizer-states-reset
[1.8.2]
Fixed
- The RNN variational dropout mask is now independent of the input (previously any zero initial state led to the first state being canceled).
- Correctly pass self.dropout_inputs float to mx.sym.Dropout in VariationalDropoutCell.
[1.8.1]
Changed
- Instead of truncating sentences exceeding the maximum input length they are now translated in chunks.
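A minimal sketch of chunked translation, assuming a simple fixed-size split and plain concatenation of the chunk outputs. The helper names and the `translate_fn` callback are hypothetical; how Sockeye actually splits inputs and joins outputs is not specified in this entry.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def translate_long(tokens: List[str],
                   max_len: int,
                   translate_fn: Callable[[List[str]], List[str]]) -> List[str]:
    """Translate each chunk independently and concatenate the results in order."""
    output: List[str] = []
    for chunk in split_into_chunks(tokens, max_len):
        output.extend(translate_fn(chunk))
    return output
```

Compared with truncation, no source tokens are discarded, at the cost of translating each chunk without context from its neighbors.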
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - Transformer models
- Added transformer models (Vaswani et al., 2017) to Sockeye
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - Updated Word batching
Word batching update: guarantee default bucket has largest batch size.
Comments/logic for clarity.
Address PR comments.
Memory usage note.
NamedTuple for bucket batch sizes.
- Python
Published by fhieber over 8 years ago
https://github.com/awslabs/sockeye - Conv2seq models
Added
- Convolutional decoder.
- Weight normalization (for CNN only so far).
- Learned positional embeddings for the transformer.
Changed
- --attention-* CLI params renamed to --rnn-attention-*.
- --transformer-no-positional-encodings generalized to --transformer-positional-embedding-type.
- Python
Published by fhieber over 8 years ago