Recent Releases of triton-model-navigator

triton-model-navigator - Triton Model Navigator v0.14.0

  • Updates:
    • new: TensorRT INT8 and FP8 quantization through ModelOpt (ONNX path)
    • new: TensorRT NVFP4 quantization through ModelOpt (Torch path)
    • new: Improved TorchCompile performance for repeated compilations using the TORCHINDUCTOR_CACHE_DIR environment variable
    • new: Global context with scoped variables - temporary context variables
    • new: Added new context variables INPLACE_OPTIMIZE_WORKSPACE_CONTEXT_KEY and INPLACE_OPTIMIZE_MODULE_GRAPH_ID_CONTEXT_KEY
    • new: nav.bundle.save now accepts include and exclude patterns for fine-grained file selection
    • new: GPU and Host memory usage logging
    • change: Install the TensorRT package for architectures other than x86_64
    • change: Disable conversion fallback for TensorRT paths and expose control option in custom config
    • change: Use torch.export.save for Torch-TRT model serialization
    • change: Added export_engine to OnnxConfig for improved export control
    • fix: Correctness command relative tolerance formula
    • fix: Memory management during export and conversion process for Torch


- Python
Published by kacper-kleczewski about 1 year ago
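
The include and exclude patterns added to nav.bundle.save in v0.14.0 can be pictured with a small standard-library sketch. The changelog does not say whether the patterns are globs or regular expressions, so glob matching is an assumption here, and select_files is a hypothetical helper for illustration, not part of the Model Navigator API.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep paths that match any include pattern
    (or all paths when include is None) and match no exclude pattern."""
    selected = []
    for path in paths:
        included = include is None or any(fnmatch(path, p) for p in include)
        excluded = exclude is not None and any(fnmatch(path, p) for p in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


# Made-up file names, used only to exercise the helper.
files = [
    "model/onnx/model.onnx",
    "model/trt-fp16/model.plan",
    "model/navigator.log",
    "status.yaml",
]
print(select_files(files, include=["*.onnx", "*.yaml"]))
print(select_files(files, exclude=["*.log"]))
```

In this sketch an exclude match always wins over an include match, which is one common convention; the library may resolve conflicts differently.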

triton-model-navigator - Triton Model Navigator v0.13.1

  • Updates:
    • fix: Add AutocastType to public API


- Python
Published by kacper-kleczewski over 1 year ago

triton-model-navigator - Triton Model Navigator v0.13.0

  • Updates:
    • new: custom_args in TensorConfig for use by custom runners, which allows setting up dynamic shapes for Torch-TensorRT compilation
    • new: autocast_dtype added to the Torch runner configuration to set the dtype for autocast
    • new: ONNX Runtime 1.20 for Python >= 3.10
    • new: Use torch.compile path in heuristic search for max batch size
    • change: Removed TensorFlow dependencies for nav.jax.optimize
    • change: Removed PyTorch dependencies from nav.profile
    • change: Collect all Python packages in status instead of filtered list
    • change: Use default throughput cutoff threshold for max batch size heuristic when None provided in configuration
    • change: Updated default ONNX opset to 20 for Torch >= 2.5
    • fix: Exception is raised with Python >=3.11 due to wrong dataclass initialization
    • fix: Removed the option from ExportOption that was removed in Torch 2.5
    • fix: Improved preprocessing stage in Torch based runners
    • fix: Warn when using autocast with bfloat16 in Torch
    • fix: Pass runner configuration to runners in nav.profile


- Python
Published by knowicki-nvidia over 1 year ago

triton-model-navigator - Triton Model Navigator v0.12.0

  • Updates:

    • new: simple and detailed reporting of the optimization process
    • new: adjusted exporting TensorFlow SavedModel for Keras 3.x
    • new: inform the user when a wrapped module is not called during optimize
    • new: inform the user when a module uses a custom forward function
    • new: support for dynamic shapes in Torch ExportedProgram
    • new: use ExportedProgram for Torch-TensorRT conversion
    • new: support back-off policy during profiling to avoid reporting local minimum
    • new: automatically scale conversion batch size when modules have different batch sizes in scope of a single pipeline
    • change: TensorRT conversion max batch size search relies on saturating throughput for base formats
    • change: adjusted profiling configuration for throughput cutoff search
    • change: include the optimized pipeline in the list of variants examined during nav.profile
    • change: the performance command is not executed when correctness fails for a format and runtime
    • change: verify command is not executed when verify function is not provided
    • change: do not create a model copy before executing torch.compile
    • fix: pipelines sometimes obtain model and tensors on different devices during nav.profile
    • fix: extract graph from ExportedProgram for running inference
    • fix: runner configuration not propagated to pre-processing steps
  • Version of external components used during testing:

- Python
Published by kacper-kleczewski over 1 year ago
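
The throughput-saturation search for the max batch size mentioned in v0.12.0 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the doubling schedule, the 5% default cutoff, the limit, and the names find_max_batch_size and fake_throughput are all hypothetical.

```python
from typing import Callable


def find_max_batch_size(
    measure_throughput: Callable[[int], float],
    throughput_cutoff_threshold: float = 0.05,  # assumed default, not from the changelog
    limit: int = 1024,
) -> int:
    """Double the batch size until the relative throughput gain
    falls below the cutoff threshold or the limit is reached."""
    batch_size = 1
    best = measure_throughput(batch_size)
    while batch_size * 2 <= limit:
        candidate = batch_size * 2
        throughput = measure_throughput(candidate)
        # Stop once the larger batch no longer pays off enough.
        if (throughput - best) / best < throughput_cutoff_threshold:
            break
        batch_size, best = candidate, throughput
    return batch_size


def fake_throughput(batch_size: int) -> float:
    # Synthetic saturating curve that approaches 1000 inferences/s.
    return 1000.0 * batch_size / (batch_size + 8)


print(find_max_batch_size(fake_throughput))  # prints 128
```

With the synthetic curve, going from 64 to 128 still gains about 5.9%, while going from 128 to 256 gains only about 3%, so the search settles on 128.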

triton-model-navigator - Triton Model Navigator v0.11.0

  • Updates:

    • new: Python 3.12 support
    • new: Improved logging
    • new: optimized in-place module can be stored to Triton model repository
    • new: multi-profile support for TensorRT model build and runtime
    • new: measure duration of each command executed in optimization pipeline
    • new: TensorRT-LLM model store generation for deployment on Triton Inference Server
    • change: filter unsupported runners instead of raising an error when running optimize
    • change: moved JAX support to the experimental module, with limited support
    • change: use autocast=True for Torch based runners
    • change: use torch.inference_mode or torch.no_grad context in nav.profile measurements
    • change: use multiple strategies to select optimized runtime, defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]
    • change: trt_profiles are not set automatically for module when using nav.optimize
    • fix: properly revert log level after torch onnx dynamo export
  • Version of external components used during testing:

- Python
Published by kacper-kleczewski almost 2 years ago
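
The ordered list of runtime selection strategies introduced in v0.11.0 can be read as a fallback chain. The sketch below shows one plausible interpretation: each strategy either picks a result or declines, and the first pick wins. The Result type, the function names, and the exact meaning of each strategy are assumptions for illustration and are not taken from the Model Navigator source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Result:
    runtime: str
    throughput: float  # inferences per second
    latency_ms: float


Strategy = Callable[[Sequence[Result]], Optional[Result]]


def max_throughput_and_min_latency(results: Sequence[Result]) -> Optional[Result]:
    """Pick a result only if it has both the highest throughput and the lowest latency."""
    if not results:
        return None
    fastest = max(results, key=lambda r: r.throughput)
    quickest = min(results, key=lambda r: r.latency_ms)
    return fastest if fastest == quickest else None


def min_latency(results: Sequence[Result]) -> Optional[Result]:
    """Pick the result with the lowest latency."""
    return min(results, key=lambda r: r.latency_ms) if results else None


def select_runtime(results: Sequence[Result], strategies: Sequence[Strategy]) -> Result:
    """Try the strategies in order and return the first selection."""
    for strategy in strategies:
        chosen = strategy(results)
        if chosen is not None:
            return chosen
    raise ValueError("no strategy selected a runtime")


# Made-up measurements: no single runtime wins on both metrics.
results = [
    Result("onnx-cuda", throughput=900.0, latency_ms=4.0),
    Result("tensorrt", throughput=800.0, latency_ms=2.5),
]
best = select_runtime(results, [max_throughput_and_min_latency, min_latency])
print(best.runtime)  # prints tensorrt
```

Here the first strategy declines because the highest-throughput and lowest-latency runtimes differ, so the second strategy decides.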

triton-model-navigator - Triton Model Navigator v0.10.1

- Python
Published by kacper-kleczewski almost 2 years ago

triton-model-navigator - Triton Model Navigator v0.10.0

  • Updates:

    • new: inplace nav.Module accepts a batching flag, which overrides the config setting, and a precision option, which sets the appropriate configuration for TensorRT
    • new: Allow setting the device when loading optimized modules using nav.load_optimized()
    • new: Add support for custom i/o names and dynamic shapes in Torch ONNX Dynamo path
    • new: Added nav.bundle.save and nav.bundle.load to save and load optimized models from cache
    • change: Improved optimize and profile status in inplace mode
    • change: Improved handling defaults for ONNX Dynamo when executing nav.package.optimize
    • fix: Maintaining modules device in nav.profile()
    • fix: Add support for all precisions for TensorRT in nav.profile()
    • fix: Forward method not passed to other inplace modules.
  • Version of external components used during testing:

- Python
Published by piotr-bazan-nv almost 2 years ago

triton-model-navigator - Triton Model Navigator v0.9.0

  • Updates:

    • new: TensorRT Timing Tactics Cache Management - using timing tactics cache files for optimization performance improvements
    • new: Added throughput saturation verification in nav.profile() (enabled by default)
    • new: Allow to override Inplace cache dir through MODEL_NAVIGATOR_DEFAULT_CACHE_DIR env variable
    • new: inplace nav.Module can now receive a function name to be used instead of __call__ in modules/submodules, which allows customizing modules with non-standard calls
    • fix: torch dynamo export and torch dynamo onnx export
    • fix: measurement stabilization in nav.profile()
    • fix: inplace inference through Torch
    • fix: trt_profiles argument handling in ONNX to TRT conversion
    • fix: optimal shape configuration for batch size in Inplace API
    • change: Disable TensorRT profile builder
    • change: nav.optimize() does not override module configuration
  • Known issues and limitations

    • DistilBERT ONNX dynamo export does not support dynamic shapes
  • Version of external components used during testing:

- Python
Published by kacper-kleczewski about 2 years ago

triton-model-navigator - Triton Model Navigator v0.8.1

  • fix: Inference with TensorRT when model has input with empty shape
  • fix: Using stabilized runners when model has no batching
  • fix: Invalid dependencies for cuDNN - review known issues
  • fix: Make ONNX Graph Surgeon produce artifacts within the protobuf limit (2 GB)
  • change: Remove TensorRTCUDAGraph from default runners
  • change: updated ONNX package to 1.16.0


- Python
Published by jkosek about 2 years ago

triton-model-navigator - Triton Model Navigator v0.8.0

  • Updates:
    • new: Allow to select device for TensorRT runner
    • new: Add device output buffers to TensorRT runner
    • new: nav.profile added for profiling any Python function
    • change: API for Inplace optimization (breaking change)
    • fix: Passing inputs for Torch to ONNX export
    • fix: Parse args to kwargs in torchscript-trace export
    • fix: Lower peak memory usage when loading Torch inplace optimized model

- Python
Published by kacper-kleczewski about 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.7

  • Updates:
    • change: Add input and output specs for Triton model repositories generated from packages
  • Version of external components used during testing:
    • PyTorch 2.2.0a0+81ea7a48
    • TensorFlow 2.14.0
    • TensorRT 8.6.1
    • ONNX Runtime 1.16.2
    • Polygraphy: 0.49.0
    • GraphSurgeon: 0.3.27
    • tf2onnx v1.16.1
    • Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.

- Python
Published by ptarasiewiczNV over 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.6

  • Updates:
    • fix: Passing inputs for Torch to ONNX export
    • fix: Passing input data to OnnxCUDA runner
  • Version of external components used during testing:
    • PyTorch 2.2.0a0+81ea7a48
    • TensorFlow 2.14.0
    • TensorRT 8.6.1
    • ONNX Runtime 1.16.2
    • Polygraphy: 0.49.0
    • GraphSurgeon: 0.3.27
    • tf2onnx v1.16.1
    • Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.

- Python
Published by kacper-kleczewski over 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.5

  • Updates:
    • new: FP8 precision support for TensorRT
    • new: Support for autocast and inference mode configuration for Torch runners
    • new: Allow to select device for Torch and ONNX runners
    • new: Add support for default_model_filename in Triton model configuration
    • new: Detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
    • fix: JAX export and TensorRT conversion fail when a custom workspace is used
    • fix: Missing max workspace size passed to TensorRT conversion
    • fix: Execution of TensorRT optimize raises an error while handling output metadata
    • fix: Limited Polygraphy version to work correctly with onnxruntime-gpu package
  • Version of external components used during testing:
    • PyTorch 2.2.0a0+6a974be
    • TensorFlow 2.13.0
    • TensorRT 8.6.1
    • ONNX Runtime 1.16.2
    • Polygraphy: 0.49.0
    • GraphSurgeon: 0.3.27
    • tf2onnx v1.15.1
    • Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.

- Python
Published by jkosek over 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.4

  • Updates:
    • new: decoupled mode configuration in Triton Model Config
    • new: support for PyTorch ExportedProgram and ONNX dynamo export
    • new: added GraphSurgeon ONNX optimization
    • fix: compatibility of generating PyTriton model config through adapter
    • fix: installation of packages that are platform dependent
    • fix: update package config with model loaded from source
    • change: in TensorRT runner, when TensorType.TORCH is the return type, lazily convert tensor to Torch
    • change: move from Polygraphy CLI to Polygraphy Python API
    • change: removed Windows from support list
  • Version of external components used during testing:
    • PyTorch 2.1.0a0+32f93b1
    • TensorFlow 2.13.0
    • TensorRT 8.6.1
    • ONNX Runtime 1.16.0
    • Polygraphy: 0.47.1
    • GraphSurgeon: 0.3.27
    • tf2onnx v1.15.1
    • Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.

- Python
Published by kacper-kleczewski over 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.3

  • Updates:
    • new: Data dependent dynamic control flow support in nav.Module (multiple computation graphs per module)
    • new: Added find max batch size utility
    • new: Added utilities API documentation
    • new: Add Timer class for measuring execution time of models and Inplace modules.
    • fix: Use wide range of shapes for TensorRT conversion
    • fix: Sorting of samples loaded from workspace
    • change: in Inplace, store one sample by default per module and store shape info for all samples
    • change: always execute export for all supported formats
  • Known issues and limitations:
    • nav.Module moves original torch.nn.Module to the CPU, in case of weight sharing that might result in unexpected behaviour
    • For data dependent dynamic control flow (multiple computation graphs) nav.Module might copy the weights for each separate graph
  • Version of external components used during testing:
    • PyTorch 2.1.0a0+29c30b1
    • TensorFlow 2.13.0
    • TensorRT 8.6.1
    • ONNX Runtime 1.15.1
    • Polygraphy: 0.47.1
    • GraphSurgeon: 0.3.27
    • tf2onnx v1.15.1
    • Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.

- Python
Published by jkosek over 2 years ago

triton-model-navigator - Triton Model Navigator v0.7.2

- Python
Published by jkosek almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.7.1

- Python
Published by ptarasiewiczNV almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.7.0

  • new: Inplace Optimize feature - optimize models directly in the Python code
  • new: Non-tensor inputs and outputs support
  • new: Model warmup support in Triton model configuration
  • new: nav.tensorrt.optimize api added for testing and measuring performance of TensorRT models
  • new: Extended custom configs to pass arguments directly to export and conversion operations like torch.onnx.export or polygraphy convert
  • new: Collect GPU clock during model profiling
  • new: Add option to configure minimal trials and stabilization windows for performance verification and profiling
  • change: Navigator package version changed to 0.2.3. Custom configurations now use a trt_profiles list instead of a single value
  • change: Store separate reproduction scripts for runners used during correctness and profiling

  • Version of external components used during testing:

- Python
Published by ptarasiewiczNV almost 3 years ago
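
The minimal trials and stabilization windows options from v0.7.0 suggest a measurement loop that keeps profiling until recent results agree. The sketch below is one way such a loop could look. The parameter names, the defaults, and the spread-based stability check are assumptions for illustration, not the library's actual profiling logic.

```python
from statistics import mean
from typing import Callable, List


def measure_until_stable(
    run_window: Callable[[], float],
    min_trials: int = 3,
    max_trials: int = 10,
    stabilization_windows: int = 3,
    stability_percentage: float = 10.0,
) -> List[float]:
    """Collect one average latency per window until the most recent
    windows differ by at most stability_percentage of their mean."""
    averages: List[float] = []
    for trial in range(1, max_trials + 1):
        averages.append(run_window())
        recent = averages[-stabilization_windows:]
        if trial >= min_trials and len(recent) == stabilization_windows:
            spread = (max(recent) - min(recent)) / mean(recent) * 100.0
            if spread <= stability_percentage:
                break
    return averages


# Made-up window averages in milliseconds: warm-up noise, then steady state.
samples = iter([12.0, 9.0, 5.2, 5.0, 5.1, 5.0])
history = measure_until_stable(lambda: next(samples))
print(len(history))  # prints 5
```

The first two windows are dominated by warm-up, so the loop only stops after the fifth window, when the last three averages are within about 4% of each other.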

triton-model-navigator - Triton Model Navigator v0.6.3

- Python
Published by jkosek almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.6.2

- Python
Published by jkosek almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.6.1

- Python
Published by jkosek almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.6.0

  • new: Zero-copy runners for Torch, ONNX and TensorRT - omit H2D and D2H memory copy between runners execution
  • new: nav.package.profile API method to profile generated models on provided dataloader
  • change: ProfilerConfig replaced with OptimizationProfile:
    • new: OptimizationProfile impacts the conversion for TensorRT
    • new: batch_sizes and max_batch_size limit the max profile in TensorRT conversion
    • new: Allow to provide separate dataloader for profiling - first sample used only
  • new: allow to run nav.package.optimize on empty package - status generation only
  • new: use torch.inference_mode for inference runner when PyTorch 2.x is available
  • fix: Missing model in config when passing package generated during nav.{framework}.optimize directly to nav.package.optimize command
  • Other minor fixes and improvements

  • Version of external components used during testing:

- Python
Published by jkosek almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.6

- Python
Published by kacper-kleczewski almost 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.5

  • new: Public nav.utilities module with UnpackedDataloader wrapper
  • new: Added support for strict flag in Torch custom config
  • new: Extended TensorRT custom config to support builder optimization level and hardware compatibility flags
  • fix: Invalid optimal shape calculation for odd values in max batch size


- Python
Published by ptarasiewiczNV about 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.4

  • new: Custom implementation for ONNX and TensorRT runners
  • new: Use CUDA 12 for JAX in unit tests and functional tests
  • new: Step-by-step examples
  • new: Updated documentation
  • new: TensorRTCUDAGraph runner introduced with support for CUDA graphs
  • fix: Optimal shape not set correctly during adaptive conversion
  • fix: Find max batch size command for JAX
  • fix: Save stdout to logfiles in debug mode

  • Version of external components used during testing:

- Python
Published by kacper-kleczewski about 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.3

  • fix: filter outputs using output_metadata in ONNX runners


- Python
Published by ptarasiewiczNV about 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.2

- Python
Published by jkosek about 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.1

- Python
Published by kacper-kleczewski about 3 years ago

triton-model-navigator - Triton Model Navigator v0.5.0

  • new: Support for PyTriton deployment
  • new: Support for Python models with python.optimize API
  • new: PyTorch 2 compile CPU and CUDA runners
  • new: Collect conversion max batch size in status
  • new: PyTorch runners with compile support
  • change: Improved handling CUDA and CPU runners
  • change: Reduced finding device max batch size time by running it once as separate pipeline
  • change: Stored find max batch size result in separate field in status

  • Version of external components used during testing:

- Python
Published by kacper-kleczewski about 3 years ago

triton-model-navigator - Triton Model Navigator v0.4.4

  • fix: when exporting single input model to saved model, unwrap one element list with inputs


- Python
Published by ptarasiewiczNV about 3 years ago

triton-model-navigator - Triton Model Navigator v0.4.3

  • fix: in Keras inference use model.predict(tensor) for single input models


- Python
Published by ptarasiewiczNV about 3 years ago

triton-model-navigator - Triton Model Navigator v0.4.2

  • fix: loading configuration for trt_profile from package
  • fix: missing reproduction scripts and logs inside package
  • fix: invalid model path in reproduction script for ONNX to TRT conversion
  • fix: collecting metadata from ONNX model in main thread during ONNX to TRT conversion


- Python
Published by jkosek about 3 years ago

triton-model-navigator - Triton Model Navigator v0.4.1

  • fix: use dynamic axes from custom OnnxConfig when specified


- Python
Published by ptarasiewiczNV over 3 years ago

triton-model-navigator - Triton Model Navigator v0.4.0

  • new: optimize method that replaces export, performs max batch size search, and improves profiling during the process
  • new: Introduced custom configs in optimize for better parametrization of export/conversion commands
  • new: Support for adding user runners for model correctness and profiling
  • new: Search for max possible batch size per format during conversion and profiling
  • new: API for creating Triton model store from Navigator Package and user provided models
  • change: Improved status structure for Navigator Package
  • deprecated: Optimize for Triton Inference Server support
  • deprecated: HuggingFace contrib module
  • Bug fixes and other improvements


- Python
Published by kacper-kleczewski over 3 years ago