Recent Releases of https://github.com/awslabs/sagemaker-debugger

https://github.com/awslabs/sagemaker-debugger - Pandas 2.0 support

  • Updates Pandas syntax to support Pandas 2.0

- Python
Published by MZSHAN about 3 years ago

https://github.com/awslabs/sagemaker-debugger - Minor importing issue fix

- Python
Published by yl-to about 3 years ago

https://github.com/awslabs/sagemaker-debugger - Removed time string in version tag

  • Help unblock xgboost container release

- Python
Published by yl-to over 3 years ago

https://github.com/awslabs/sagemaker-debugger - Fix XGBoost hook for xgboost version <= 1.7

  • Create new XGBoost hook for its new callback since xgb v1.3
  • Adapt xgboost category column since xgb v1.5
  • Adapt xgboost rabit package deprecation since xgb v1.7

- Python
Published by yl-to over 3 years ago

https://github.com/awslabs/sagemaker-debugger - Support tensorflow 2.9 re-release

  • fixed embedding layer gradient collection feature

- Python
Published by yl-to over 3 years ago

https://github.com/awslabs/sagemaker-debugger - Support tensorflow 2.11 release

Changes on both the debugger and profiler sides to support tensorflow 2.11 changes:

Debugger:

  • Removed a ZCC test
  • Replaced the tensorflow2 optimizer tests with legacy ones, according to breaking changes in tensorflow 2.11 optimizers

Profiler:

  • Skipped some tests, according to breaking changes in tensorflow 2.11 trace.json file generation

Other:

  • Replaced deprecated data types in numpy

- Python
Published by yl-to over 3 years ago

https://github.com/awslabs/sagemaker-debugger - Disable ZCC tests for PT 1.10 and above

- Python
Published by johnbensnyder over 3 years ago

https://github.com/awslabs/sagemaker-debugger - Disable ZCC tests for PT 1.12 and above

- Python
Published by johnbensnyder almost 4 years ago

https://github.com/awslabs/sagemaker-debugger - Bump version and support PT 1.12

- Python
Published by mariumof almost 4 years ago

https://github.com/awslabs/sagemaker-debugger - Version 1.0.16

  • Fixes bug in python profiler test

- Python
Published by MZSHAN about 4 years ago

https://github.com/awslabs/sagemaker-debugger - Compatible with PyTorch 1.11.0

- Python
Published by MZSHAN about 4 years ago

https://github.com/awslabs/sagemaker-debugger -

Incorporate changes to the tests suite.

- Python
Published by mariumof about 4 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.11

  • cache output of is_framework_version_supported to improve performance (#508)
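
As a rough illustration of the caching described in #508 (a sketch only: the function body and the table of supported versions below are made up for the example, and this is not the smdebug implementation), Python's functools.lru_cache can memoize a version check so that repeated calls with the same arguments skip the work:

```python
from functools import lru_cache

# Hypothetical table of supported versions, for illustration only.
_SUPPORTED = {"pytorch": ("1.8", "1.9"), "tensorflow": ("2.4", "2.5")}


@lru_cache(maxsize=None)
def is_framework_version_supported(framework: str, version: str) -> bool:
    """Stand-in for a version check; the real check may be more expensive."""
    return version in _SUPPORTED.get(framework, ())


# The second identical call is answered from the cache.
is_framework_version_supported("pytorch", "1.9")
is_framework_version_supported("pytorch", "1.9")
stats = is_framework_version_supported.cache_info()
```

Here stats.hits counts the calls that were served from the cache rather than recomputed.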

- Python
Published by NihalHarish almost 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.10

  • Enabling the support for Pytorch 1.9 (#501)
  • Add error handling for MXNet (#499)
  • Smdebug engage disengage Phase-I (#488)
  • Add error handling for TF1 (#498)
  • Add error handling for XGBoost (#496)
  • Add safety checks for TF2 error handling (#497)

- Python
Published by ndodda-amazon almost 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.9

  • Implement error handling for TF2 (#483)
  • Implement error handling for PyTorch (#484)
  • Fix bug in TF2 error handling (#489)

- Python
Published by ndodda-amazon about 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.8

bugfix for timelinewriter #460

- Python
Published by ndodda-amazon about 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.7

  • Autoconfigure losses collection for xgboost #462
  • Reduce log level in pytorch hook #465

- Python
Published by NihalHarish about 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.6

  • Adding support for pytorch 1.8 (#459)
  • Revert accidental commits to the master repo. (#458)
  • Pre commit build (#457)
  • [bugfix] Do not wrap models when the hook has a default_config (#456)

- Python
Published by connorgoggins about 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.5

SMDDP should use size() and rank() for TF jobs (#451)

- Python
Published by ndodda-amazon over 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.4

- Python
Published by NihalHarish over 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.3

- Python
Published by NihalHarish over 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.2

  • Updated tests to support TF 2.4

- Python
Published by NihalHarish over 5 years ago

https://github.com/awslabs/sagemaker-debugger - Release v1.0.1

Supports SageMaker Debugger functionality for PyTorch 1.7.1

- Python
Published by leleamol over 5 years ago

https://github.com/awslabs/sagemaker-debugger - 1.0.0

This version introduces profiling functionality to the sagemaker-debugger

- Python
Published by leleamol over 5 years ago

https://github.com/awslabs/sagemaker-debugger - v0.9.5

Bug fixes:

  • Returning list instead of dict keys (#376).
  • Add support for mixed precision training (#378)
  • Bugfix: Debugger breaks if should_save_tensor is called before collections are prepared (#372)
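
The first fix above (#376) likely relates to a general Python 3 behavior: dict.keys() returns a view object rather than a list, so callers cannot index into it. A minimal sketch, using a hypothetical class and method names that are not taken from smdebug:

```python
class TensorStore:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self._tensors = {"loss": 0.25, "weights": [1.0, 2.0]}

    def names_as_view(self):
        # Returns a dict_keys view: iterable, but not indexable.
        return self._tensors.keys()

    def names_as_list(self):
        # Returns a real list, which supports indexing and slicing.
        return list(self._tensors.keys())


store = TensorStore()
first = store.names_as_list()[0]  # indexing a list works

try:
    store.names_as_view()[0]  # indexing a view raises TypeError
    view_indexable = True
except TypeError:
    view_indexable = False
```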

- Python
Published by leleamol over 5 years ago

https://github.com/awslabs/sagemaker-debugger - v0.9.4

Bug Fixes:

  • Pass all arguments to the underlying layer in input output wrapper #366
  • Add support for the add_for_mode API in graph mode #353

- Python
Published by NihalHarish over 5 years ago

https://github.com/awslabs/sagemaker-debugger - v0.9.3

  • Extends full support for TF 2.3.0. Users will now be able to save biases, weights, gradients, optimizer variables, labels, predictions and model inputs
  • Address an issue with the model.save API: #333
  • New functions to determine default hook config in AWS TF, PT and MXNet

- Python
Published by NihalHarish over 5 years ago

https://github.com/awslabs/sagemaker-debugger - v0.9.1

  • Extends full support for TF 2.2.0.
    • Users will now be able to save biases, weights, gradients, optimizer variables, labels, predictions and model inputs
  • Introduces the hook.save_tensor api, a generic cross framework api that allows users to save any tensor to a collection during runtime
  • Extends tensor logging support for the Keras Estimator API

- Python
Published by NihalHarish almost 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.8.1

This release includes the following bug fixes:

  • Correct the number of tensors saved with TF MirroredStrategy when running on GPUs (#257)
  • Enable the ability to save scalars with MirroredStrategy on TF 2.x (#259)
  • Prevent collection files from being generated when smdebug is not supported by the training script (#263)

- Python
Published by NihalHarish about 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.8.0

Includes support for saving optimizer variables for TF 2.x, plus bug fixes:

  1) Support saving optimizer variables and gradtape for TF 2.x
  2) Support saving optimizer variables with the Keras fit API in eager mode for TF 2.x
  3) Fix for the metadata.json file being written again and again
  4) Handle exceptions gracefully in mxnet
  5) Fix for a name clash when an operator is called multiple times during the forward pass
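
The name-clash fix in this release can be pictured with a small sketch. Assuming tensor names are derived from the operator name (the naming scheme and the helper class below are hypothetical, not taken from smdebug), a per-name call counter keeps repeated calls from overwriting each other:

```python
from collections import defaultdict


class NameRegistry:
    """Hypothetical helper that hands out unique tensor names."""

    def __init__(self):
        self._calls = defaultdict(int)

    def unique_name(self, op_name: str) -> str:
        # The first call keeps the plain name; later calls get a numeric suffix.
        index = self._calls[op_name]
        self._calls[op_name] += 1
        return op_name if index == 0 else f"{op_name}_{index}"


registry = NameRegistry()
# The same operator invoked three times in one forward pass.
names = [registry.unique_name("relu_output") for _ in range(3)]
```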

- Python
Published by Vikas-kum about 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.7.2

  • Experimental support for TF 2.x GradientTape - Introducing experimental support for TF 2.x training scripts using GradientTape. With this change, weights, bias, loss, metrics, and gradients are captured by SageMaker Debugger. These changes work with vanilla version of Tensorflow 2.x (not with the zero-code change version) https://github.com/awslabs/sagemaker-debugger/pull/186

    Note: Training scripts that use GradientTape for higher-order gradients or with multiple tapes are not supported. Distributed training scripts that use GradientTape are not supported at this time.

  • Support SyncOnReadVariable in mirrored strategy - Fixes a bug that occurred because SyncOnRead distributed variable was not supported with smdebug. Also enables the use of smdebug with training scripts using TF 2.x MirroredStrategy with fit() API. https://github.com/awslabs/sagemaker-debugger/pull/190

  • Turn off hook and write only from one worker for unsupported distributed training techniques - PyTorch users were observing a crash when distributed training was implemented using a generic multiprocessing library, which is not a method supported by smdebug. This fix handles this case and ensures that tensors are saved. https://github.com/awslabs/sagemaker-debugger/pull/167

  • Bug fix: Pytorch: Register only if tensors require gradients - Users were observing a crash when training with pretrained embeddings that do not need gradient updates. This fix checks if a gradient update is required and registers a backward hook only in those cases. https://github.com/awslabs/sagemaker-debugger/pull/193

- Python
Published by NihalHarish about 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.7.1

  • Fix a test case in ZCC scenario where the training script written in eager mode crashes when gradients and optimizer variables are saved. #178

- Python
Published by ddavydenko about 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.7.0

This release includes the following changes -

  • Introducing experimental support for TF 2.x keras.fit() eager and non-eager mode. With this change, losses, metrics, weights and biases can be saved in TF 2.x eager mode (At present, gradients/inputs/outputs cannot be saved when TF 2.x eager mode is used) (https://github.com/awslabs/sagemaker-debugger/pull/150)
  • Raise Error For Invalid Collection Config - An exception is raised if a collection is incorrectly configured. Incorrect configuration includes collection created but no tensors/regex specified, spelling mistakes while specifying the collection name in the hook (https://github.com/awslabs/sagemaker-debugger/pull/162)
  • Fix a crash that occurred with the use of PyTorch’s DataParallel API (this API enables users to run training on multiple GPUs on PyTorch) (https://github.com/awslabs/sagemaker-debugger/pull/165)
  • Bug fix to allow users to read from data sources that contain more than 1000 files (https://github.com/awslabs/sagemaker-debugger/pull/168)
  • Update the save_scalar() method to accept and store the timestamp along with the scalar value (https://github.com/awslabs/sagemaker-debugger/pull/170)
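
On the fix for data sources with more than 1000 files: a limit of 1000 is typical of paginated listing APIs (Amazon S3, for example, returns at most 1000 keys per list request). Assuming that was the cause here, which the release note does not state, the general pattern is to keep requesting pages until no continuation token comes back. The sketch below uses a fake in-memory listing function, not boto3 or smdebug code:

```python
PAGE_SIZE = 1000  # assumed per-request cap, mirroring S3's list limit


def fake_list_page(all_keys, token=None):
    """Hypothetical backend: returns one page plus a token for the next page."""
    start = token or 0
    page = all_keys[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    next_token = next_start if next_start < len(all_keys) else None
    return page, next_token


def list_all(all_keys):
    """Follow continuation tokens until every key has been collected."""
    collected, token = [], None
    while True:
        page, token = fake_list_page(all_keys, token)
        collected.extend(page)
        if token is None:
            return collected


keys = [f"step_{i:06d}.tfevents" for i in range(2500)]
first_page, _ = fake_list_page(keys)  # a single request stops at the cap
everything = list_all(keys)           # following tokens returns every key
```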

- Python
Published by anirudhacharya about 6 years ago

https://github.com/awslabs/sagemaker-debugger - v0.6.0

This release includes the following significant changes:

  • Fix scalar write to event file & TF Tb write fix (#145)
  • Fix bug in tensor_names() call (#159)
  • Fixes: SMSimulator fix, listing files local should ignore tmp files (#137)
  • Fix for bug in has_passed_step (#136)
  • Bug fix in trial.py has_passed_step (#140)
  • S3 upload fast (#122)
  • Skip logging the input tensors to the loss block (#86)
  • CI/CD and test suite changes.

- Python
Published by anirudhacharya over 6 years ago

https://github.com/awslabs/sagemaker-debugger - Updates January'2020 DLC release

  • Simplified dependencies, removed aioboto3 and unpinned boto3 version
  • Fixed Horovod training support
  • Fixed numeration of steps in saved data
  • Removed saving of input_loss as irrelevant
  • Changed default save_interval from 100 steps to 500 steps
  • Improved examples and documentation
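
To give a feel for the save_interval change: assuming a step is saved when its number is a multiple of save_interval (a simplification; the helper below is hypothetical and not smdebug code), the new default writes a fifth as many steps as the old one:

```python
def saved_steps(total_steps: int, save_interval: int) -> list:
    """Return the step numbers that would be saved under the assumed rule."""
    return [step for step in range(total_steps) if step % save_interval == 0]


old_default = saved_steps(2000, 100)  # steps 0, 100, ..., 1900
new_default = saved_steps(2000, 500)  # steps 0, 500, 1000, 1500
```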

- Python
Published by ddavydenko over 6 years ago

https://github.com/awslabs/sagemaker-debugger - Updated smdebug's dependency on boto libraries

This release is primarily to change the dependency of smdebug on boto3 and botocore libraries. smdebug now depends on the versions of these libraries released for re:Invent 2019. boto3>=1.10.32 and botocore>=1.14.32

Apart from this, there are some doc changes and an indentation change for the sagemaker metrics bug fix from the previous release.

- Python
Published by rahul003 over 6 years ago

https://github.com/awslabs/sagemaker-debugger - First release on PyPI

  • Fixed bug in export of SageMaker metrics
  • Updated licenses and third-party licenses

- Python
Published by ddavydenko over 6 years ago

https://github.com/awslabs/sagemaker-debugger - Released to DLC on 11/27

  • Renamed trial.tensors() to tensor_names()
  • Renamed Hook.hook_from_config() to Hook.create_from_json_file()
  • Changed collection file load logic to also check for training end
  • Changed the searchable flag in save_scalar to sm_metric
  • Disallow regex and collection to be passed simultaneously for the tensor_names() method

- Python
Published by rahul003 over 6 years ago