Recent Releases of optimum

optimum - v1.27.0: Last release before v2, Transformers 4.53 support, SmolLM3, VisualBert...

🚀 Major Upgrades

  • Transformers v4.53 support and SmolLM3 model addition by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2326
  • Batched inference support across all decoders by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2319
  • VisualBert support by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2303

🔧 Enhancements & Fixes

  • Fix taskmanager by @echarlaix in https://github.com/huggingface/optimum/pull/2296
  • Add task onnx register by @echarlaix in https://github.com/huggingface/optimum/pull/2291
  • ExporterConfig refactorization by @echarlaix in https://github.com/huggingface/optimum/pull/2157
  • remove timm from exporters extra by @echarlaix in https://github.com/huggingface/optimum/pull/2299
  • No more forcing separators by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2279
  • Fix broken Trainer documentation link in README by @VolodymyrBg in https://github.com/huggingface/optimum/pull/2304
  • Propagate library_name parameter in from_pretrained to export by @tomaarsen in https://github.com/huggingface/optimum/pull/2328
  • Fix 'Block pattern could not be match. Pass block_name_to_quantize argument in quantize_model' while loading Qwen VL GPTQ model by @arunmadhusud in https://github.com/huggingface/optimum/pull/2295

🧹 Deprecations & v2

  • Support for TFLite, BetterTransformer, and ONNXRuntime‑Training is deprecated; these integrations will be fully removed in v2.
  • TensorFlow model export will be removed in v2, consistent with the Transformers library dropping TF/JAX support.
  • ONNX and ONNXRuntime integrations will move into the new Optimum‑ONNX package.
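
For code that still relies on one of the integrations listed above, a version guard can make the upcoming removal explicit. The sketch below is a minimal illustration using only the standard library. The helper names (`major_version`, `is_pre_v2`, `installed_optimum_version`) are hypothetical and not part of Optimum, and the sketch assumes the package is installed under the distribution name `optimum`.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.27.0'."""
    return int(version.split(".")[0])


def is_pre_v2(version: str) -> bool:
    """True for 1.x versions, where the deprecated integrations are still present."""
    return major_version(version) < 2


def installed_optimum_version() -> Optional[str]:
    """Look up the installed version; None when Optimum is not installed."""
    try:
        return metadata.version("optimum")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_optimum_version()
    if version is None:
        print("optimum is not installed")
    elif is_pre_v2(version):
        print(f"optimum {version}: TFLite, BetterTransformer and ONNXRuntime-Training are deprecated")
    else:
        print(f"optimum {version}: these integrations are expected to be removed")
```

A guard like this is most useful at application start-up, so that an upgrade to v2 fails early with a clear message rather than at the first use of a removed class.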

New Contributors

  • @dependabot[bot] made their first contribution in https://github.com/huggingface/optimum/pull/2292
  • @arunmadhusud made their first contribution in https://github.com/huggingface/optimum/pull/2295
  • @VolodymyrBg made their first contribution in https://github.com/huggingface/optimum/pull/2304

Full Changelog: https://github.com/huggingface/optimum/compare/v1.26.1...v1.27.0

- Python
Published by IlyasMoutawwakil 11 months ago

optimum - v1.26.1: Patch release

Add back from_transformers for base model by @echarlaix in https://github.com/huggingface/optimum/pull/2288

- Python
Published by echarlaix about 1 year ago

optimum - v1.26.0: ColPali, D-FINE, InternLM2

ONNX export

  • D-FINE support by @xenova in https://github.com/huggingface/optimum/pull/2249
  • ColPali support by @Balladie in https://github.com/huggingface/optimum/pull/2251
  • InternLM2 support by @gmf14 in https://github.com/huggingface/optimum/pull/2244
  • Chinese CLIP support by @xenova in https://github.com/huggingface/optimum/pull/1591

New features & enhancements

  • Add onnxslim support by @inisis in https://github.com/huggingface/optimum/pull/2258
  • Introduce ORTSessionMixin and enable general io binding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2234
  • Fix and uniformize hub kwargs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2276
  • Add compatibility with transformers 4.52 by @echarlaix in https://github.com/huggingface/optimum/pull/2270
  • Distribute and complete onnxruntime tests (decoder models) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2278
  • Add ONNX Runtime optimization support for ModernBERT by @amas0 in https://github.com/huggingface/optimum/pull/2208

New Contributors

  • @inisis made their first contribution in https://github.com/huggingface/optimum/pull/2258
  • @Balladie made their first contribution in https://github.com/huggingface/optimum/pull/2251
  • @gmf14 made their first contribution in https://github.com/huggingface/optimum/pull/2244
  • @amas0 made their first contribution in https://github.com/huggingface/optimum/pull/2208

- Python
Published by echarlaix about 1 year ago

optimum - v1.25.3: Patch release

  • Fix ORT pipelines by @echarlaix in https://github.com/huggingface/optimum/pull/2274

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.2...v1.25.3

- Python
Published by echarlaix about 1 year ago

optimum - v1.25.2: Patch release

What's Changed

  • Upgrade optimum-intel in setup extras by @echarlaix in https://github.com/huggingface/optimum/pull/2271
  • Match transformers behavior with return_dict by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2269

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.1...v1.25.2

- Python
Published by IlyasMoutawwakil about 1 year ago

optimum - v1.25.1: Patch release

What's Changed

  • Updated readme/pypi page by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2268
  • Fix bug ORTModelForFeatureExtraction by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2267
  • Fix doc TPU section by @echarlaix in https://github.com/huggingface/optimum/pull/2265

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.0...v1.25.1

- Python
Published by IlyasMoutawwakil about 1 year ago

optimum - v1.25.0: ViTPose, RT-DETR, EfficientNet, Moonshine ONNX

:rocket: New Features & Enhancements

  • Add ONNX export support for ViTPose, RT-DETR, EfficientNet, Moonshine
  • Infer if the model needs to be exported to ONNX during loading

```diff
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"
- model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
```

  • Transformers v4.49, v4.50 and v4.51 compatibility
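
One way to picture how loading can infer whether an ONNX export is needed is a check for existing `.onnx` files in a local checkpoint: if none are found, the model has to be exported first. The sketch below only illustrates that idea under this assumption. It is not Optimum's actual implementation, and `needs_onnx_export` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def needs_onnx_export(model_dir: str) -> bool:
    """Return True when `model_dir` holds no .onnx file at any depth."""
    return not any(Path(model_dir).rglob("*.onnx"))


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(needs_onnx_export(tmp))        # no ONNX file yet
    (Path(tmp) / "model.onnx").touch()   # simulate an already exported model
    print(needs_onnx_export(tmp))        # an ONNX file is now present
```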

:busts_in_silhouette: New Contributors

A huge thank you to our first-time contributors:

  • @ruidazeng
  • @ariG23498
  • @janak2
  • @qubvel
  • @zhxchen17
  • @xieofxie
  • @EFord36
  • @Thas-Tayapongsak
  • @hans00
  • @Abdennacer-Badaoui

What's Changed

  • Update ort training installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/2173
  • Dev version by @echarlaix in https://github.com/huggingface/optimum/pull/2175
  • Fixed All Typos in docs by @ruidazeng in https://github.com/huggingface/optimum/pull/2185
  • Remove deprecated ORTModel class by @echarlaix in https://github.com/huggingface/optimum/pull/2187
  • avoid library_name guessing if it is known in parameters standardization by @eaidova in https://github.com/huggingface/optimum/pull/2179
  • Infer whether a model needs to be exported to ONNX or not by @echarlaix in https://github.com/huggingface/optimum/pull/2181
  • Update optimum neuron extra by @dacorvo in https://github.com/huggingface/optimum/pull/2190
  • Add support for Moonshine ONNX export (& seq2seq models with non-legacy cache & Tensor.repeat_interleave) by @xenova in https://github.com/huggingface/optimum/pull/2162
  • ViTPose by @ariG23498 in https://github.com/huggingface/optimum/pull/2183
  • ViTPose export fix by @echarlaix in https://github.com/huggingface/optimum/pull/2192
  • Remove ORTTrainer code snippet from README by @echarlaix in https://github.com/huggingface/optimum/pull/2194
  • Remove README code snippets by @echarlaix in https://github.com/huggingface/optimum/pull/2195
  • Add transformers v4.49 support by @echarlaix in https://github.com/huggingface/optimum/pull/2191
  • Fix test benchmark suite by @echarlaix in https://github.com/huggingface/optimum/pull/2199
  • fix the onnx export custom model example; fix repo name; fix opset version; remove deprecated arg; by @janak2 in https://github.com/huggingface/optimum/pull/2203
  • Limit transformers version for bettertransformer support by @echarlaix in https://github.com/huggingface/optimum/pull/2198
  • Add ONNX config for RT-DETR (and RT-DETRv2) by @qubvel in https://github.com/huggingface/optimum/pull/2201
  • Remove deprecated notebook by @echarlaix in https://github.com/huggingface/optimum/pull/2205
  • Update CI runner to ubuntu 22.04 by @echarlaix in https://github.com/huggingface/optimum/pull/2206
  • Add executorch documentation section by @echarlaix in https://github.com/huggingface/optimum/pull/2193
  • Fix typo in exporters/onnx/utils.py by @zhxchen17 in https://github.com/huggingface/optimum/pull/2210
  • Link Optimum-ExecuTorch to parent Optimum on Hub by @guangy10 in https://github.com/huggingface/optimum/pull/2222
  • Fix CI and update Transformers (4.51.1) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2225
  • Remove FP16_Optimizer patch for DeepSpeed by @Rohan138 in https://github.com/huggingface/optimum/pull/2213
  • Fix diffusers by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2229
  • Remove diffusers extra by @echarlaix in https://github.com/huggingface/optimum/pull/2207
  • TRT engine docs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1396
  • Always use a default user agent by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2230
  • dedup _get_model_external_data_paths by @xieofxie in https://github.com/huggingface/optimum/pull/2217
  • Clean up workflows by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2231
  • reduce area of patch_everywhere for avoid unexpected replacements by @eaidova in https://github.com/huggingface/optimum/pull/2237
  • add dinov2 onnx optimizer support by @EFord36 in https://github.com/huggingface/optimum/pull/2227
  • Fix code quality test by @echarlaix in https://github.com/huggingface/optimum/pull/2239
  • Add onnx export for efficientnet by @Thas-Tayapongsak in https://github.com/huggingface/optimum/pull/2214
  • add loading image processor by @eaidova in https://github.com/huggingface/optimum/pull/2254
  • Fix CLIPSdpaAttention had dropped since v4.48 by @hans00 in https://github.com/huggingface/optimum/pull/2245
  • Increase clip opset by @echarlaix in https://github.com/huggingface/optimum/pull/2256
  • Add feature extraction support for image models by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2255
  • adding token classification task for qwen2 by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2261
  • upgrade min transformers version for phi3 by @echarlaix in https://github.com/huggingface/optimum/pull/2263

- Python
Published by echarlaix about 1 year ago

optimum - v1.24.0: SD3 & Flux, DinoV2, Modernbert, GPTQModel, Transformers v4.48...

Release Notes: Optimum v1.24.0

We’re excited to announce the release of Optimum v1.24.0. This update expands ONNX-based model capabilities and includes several improvements, bug fixes, and new contributions from the community.

:rocket: New Features & Enhancements

  • ORTQuantizer now supports models with ONNX subfolders.
  • ONNX Runtime IO Binding support for all supported Transformers models (no models left behind).
  • SD3 and Flux model support added to ORTDiffusionPipeline, enabling the latest diffusion-based models.
  • Transformers v4.47 and v4.48 compatibility, ensuring seamless integration with the latest advancements in Hugging Face's ecosystem.
  • ONNX export support extended to various models, including Decision Transformer, ModernBERT, Megatron-BERT, Dinov2, OLMo, and many more (see details).

:wrench: Key Fixes & Optimizations

  • Dropped support for Python 3.8
  • Bug fixes in ModelPatcher, SDXL refiner export, and device checks for improved reliability.

:busts_in_silhouette: New Contributors

A huge thank you to our first-time contributors:

  • @gabe-l-hart
  • @ra9hur
  • @bndos
  • @mlynatom
  • @LoSealL
  • @sjrl
  • @guangy10
  • @LRL-ModelCloud
  • @pragyandev

Your contributions make Optimum better! :tada:

For a detailed list of all changes, please check out the full changelog.

:rocket: Happy optimizing!

What's Changed

  • Onnx granite by @gabe-l-hart in https://github.com/huggingface/optimum/pull/2043
  • Drop python 3.8 by @echarlaix in https://github.com/huggingface/optimum/pull/2086
  • Update Dockerfile base image by @echarlaix in https://github.com/huggingface/optimum/pull/2089
  • add transformers 4.36 tests by @echarlaix in https://github.com/huggingface/optimum/pull/2085
  • [`fix`] Allow ORTQuantizer over models with subfolder ONNX files by @tomaarsen in https://github.com/huggingface/optimum/pull/2094
  • SD3 and Flux support by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2073
  • Remove datasets as required dependency by @echarlaix in https://github.com/huggingface/optimum/pull/2087
  • Add ONNX Support for Decision Transformer Model by @ra9hur in https://github.com/huggingface/optimum/pull/2038
  • Generate guidance for flux by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2104
  • Unbundle inputs generated by `DummyTimestepInputGenerator` by @JingyaHuang in https://github.com/huggingface/optimum/pull/2107
  • Pass the revision to SentenceTransformer models by @bndos in https://github.com/huggingface/optimum/pull/2105
  • Rembert onnx support by @mlynatom in https://github.com/huggingface/optimum/pull/2108
  • fix bug `ModelPatcher` returns empty outputs by @LoSealL in https://github.com/huggingface/optimum/pull/2109
  • Fix workflow to mark issues as stale by @echarlaix in https://github.com/huggingface/optimum/pull/2110
  • Remove doc-build by @echarlaix in https://github.com/huggingface/optimum/pull/2111
  • Downgrade stale bot to v8 and fix permissions by @echarlaix in https://github.com/huggingface/optimum/pull/2112
  • Update documentation color from google tpu section by @echarlaix in https://github.com/huggingface/optimum/pull/2113
  • Fix workflow to mark PRs as stale by @echarlaix in https://github.com/huggingface/optimum/pull/2116
  • Enable transformers v4.47 support by @echarlaix in https://github.com/huggingface/optimum/pull/2119
  • Add ONNX export support for MGP-STR by @xenova in https://github.com/huggingface/optimum/pull/2099
  • Add ONNX export support for OLMo and OLMo2 by @xenova in https://github.com/huggingface/optimum/pull/2121
  • Pass on `model_kwargs` when exporting a SentenceTransformers model by @sjrl in https://github.com/huggingface/optimum/pull/2126
  • Add ONNX export support for DinoV2, Hiera, Maskformer, PVT, SigLIP, SwinV2, VitMAE, and VitMSN models by @xenova in https://github.com/huggingface/optimum/pull/2001
  • move check_dummy_inputs_allowed to common export utils by @eaidova in https://github.com/huggingface/optimum/pull/2114
  • Remove CI macos runners by @echarlaix in https://github.com/huggingface/optimum/pull/2129
  • Enable GPTQModel by @jiqing-feng in https://github.com/huggingface/optimum/pull/2064
  • Skip private model loading for external contributors by @echarlaix in https://github.com/huggingface/optimum/pull/2130
  • fix sdxl refiner export by @eaidova in https://github.com/huggingface/optimum/pull/2133
  • Export to ExecuTorch: Initial Integration by @guangy10 in https://github.com/huggingface/optimum/pull/2090
  • Fix AutoModel can't load gptq model due to module prefix mismatch vs AutoModelForCausalLM by @LRL-ModelCloud in https://github.com/huggingface/optimum/pull/2146
  • Update docker files by @echarlaix in https://github.com/huggingface/optimum/pull/2102
  • Limit diffusers version by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2150
  • Add ONNX export support for ModernBERT by @xenova in https://github.com/huggingface/optimum/pull/2131
  • Allow GPTQModel to auto select Marlin or faster kernels for inference only ops by @LRL-ModelCloud in https://github.com/huggingface/optimum/pull/2138
  • fix device check by @jiqing-feng in https://github.com/huggingface/optimum/pull/2136
  • Replace check_if_xxx_greater with is_xxx_version by @echarlaix in https://github.com/huggingface/optimum/pull/2152
  • Add tf available and version by @echarlaix in https://github.com/huggingface/optimum/pull/2154
  • Add ONNX export support for `PatchTST` by @xenova in https://github.com/huggingface/optimum/pull/2101
  • fix infer task from model_name if model from sentence transformer by @eaidova in https://github.com/huggingface/optimum/pull/2151
  • Unpin diffusers and pass onnx exporters tests by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2153
  • Uncomment modernbert config by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2155
  • Skip optimum-benchmark when loading namespace modules by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2159
  • Fix PR doc upload by @regisss in https://github.com/huggingface/optimum/pull/2161
  • Move executorch to optimum-executorch by @echarlaix in https://github.com/huggingface/optimum/pull/2165
  • Adding Onnx Support For Megatron-Bert by @pragyandev in https://github.com/huggingface/optimum/pull/2169
  • Transformers 4.48 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2158
  • Update ort CIs (slow, gpu, train) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2024

- Python
Published by IlyasMoutawwakil over 1 year ago

optimum - v1.23.3: Patch release

  • Add sentence-transformers and timm documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/2072
  • Create token type ids when not provided by @echarlaix in https://github.com/huggingface/optimum/pull/2081
  • Add transformers v4.46 support by @echarlaix in https://github.com/huggingface/optimum/pull/2078

- Python
Published by echarlaix over 1 year ago

optimum - v1.23.2: Patch release

  • Fix compatibility with diffusers < 0.25.0 #2063 @echarlaix
  • Update the habana extra #2077 @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.23.1...v1.23.2

- Python
Published by regisss over 1 year ago

optimum - v1.23.1: Patch release

  • Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/2050
  • Don't hardcode the logger level to INFO let users set TRANSFORMERS_VERBOSITY by @tomaarsen in https://github.com/huggingface/optimum/pull/2047
  • Add workflow to mark issues as stale by @regisss in https://github.com/huggingface/optimum/pull/2051
  • Fix onnx export when transformers >= v4.45 (impacting sentence-transformers and timm models) by @echarlaix in https://github.com/huggingface/optimum/pull/2053 and https://github.com/huggingface/optimum/pull/2054
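
With the logger level no longer hardcoded to INFO, verbosity can be chosen through the `TRANSFORMERS_VERBOSITY` environment variable. The following is a minimal sketch, assuming the variable is set before the libraries are imported and that the level names listed are among those Transformers accepts. The helper `set_verbosity_env` is hypothetical and not part of either library.

```python
import os

# Level names assumed to be accepted by Transformers' logging utilities.
ALLOWED_LEVELS = ("debug", "info", "warning", "error", "critical")


def set_verbosity_env(level: str) -> str:
    """Validate `level` and export it as TRANSFORMERS_VERBOSITY."""
    normalized = level.strip().lower()
    if normalized not in ALLOWED_LEVELS:
        raise ValueError(f"unknown verbosity level: {level!r}")
    os.environ["TRANSFORMERS_VERBOSITY"] = normalized
    return normalized


# Call this before importing transformers or optimum so the setting is picked up.
set_verbosity_env("error")
```

The same effect can be had from the shell by exporting the variable before launching Python; the helper only adds validation of the level name.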

- Python
Published by echarlaix over 1 year ago

optimum - v1.23.0: ORTDiffusionPipeline, transformers v4.45

ONNX Runtime Diffusion pipeline

Adding ORTDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1960 and https://github.com/huggingface/optimum/pull/2021

```diff
model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, revision="onnx")
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
```

Transformers v4.45

Transformers v4.45 support by @echarlaix in https://github.com/huggingface/optimum/pull/2023 and https://github.com/huggingface/optimum/pull/2045

Subfolder

Remove the restriction for the model's config to be in the model's subfolder by @echarlaix in https://github.com/huggingface/optimum/pull/2044

New Contributors

  • @tcsavage made their first contribution in https://github.com/huggingface/optimum/pull/1965
  • @yuanwu2017 made their first contribution in https://github.com/huggingface/optimum/pull/2003
  • @h3110Fr13nd made their first contribution in https://github.com/huggingface/optimum/pull/2031
  • @glegendre01 made their first contribution in https://github.com/huggingface/optimum/pull/2033
  • @rbrugaro made their first contribution in https://github.com/huggingface/optimum/pull/2027

Full Changelog: https://github.com/huggingface/optimum/compare/v1.22.0...v1.23.0

- Python
Published by echarlaix over 1 year ago

optimum - v1.22.0: transformers 4.44 compatibility, bugfixes

What's Changed

  • Fix sentence transformers modeling patching for export by @echarlaix in https://github.com/huggingface/optimum/pull/1936
  • Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
  • Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937
  • Remove inplace op in mistral patcher by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1938
  • Fix forward bug in ORTModelForFeatureExtraction by @moria97 in https://github.com/huggingface/optimum/pull/1941
  • Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
  • Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
  • Clip vision model onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1920
  • Add export test for swin with shifted windows by @echarlaix in https://github.com/huggingface/optimum/pull/1942
  • Refactor diffusers tasks by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1947
  • Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
  • Fix _unmask_unattended_patched signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
  • Fix undefined variable in library name inference by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1964
  • Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
  • Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971
  • chore(ci): migrate runner configuration in GitHub workflows by @XciD in https://github.com/huggingface/optimum/pull/1978
  • Fix typos in quantization.mdx by @aldakata in https://github.com/huggingface/optimum/pull/1989
  • Update Habana extra in setup.py by @regisss in https://github.com/huggingface/optimum/pull/1991
  • Follow up the diffusers task refactoring by @JingyaHuang in https://github.com/huggingface/optimum/pull/1999
  • Transformers 4.44 support by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1996
  • Modify token classification processor default dataset args by @echarlaix in https://github.com/huggingface/optimum/pull/2005
  • Fix TFLite tests by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2007
  • Fix attribute name from inputs_names to input_names by @J4BEZ in https://github.com/huggingface/optimum/pull/2010
  • Fix typo in BetterTransformer's overview docs by @ftnext in https://github.com/huggingface/optimum/pull/2015
  • Apply deprecated evaluation_strategy by @muellerzr in https://github.com/huggingface/optimum/pull/1819
  • Update transformers imports for deepspeed and is_torch_xla_available by @Rohan138 in https://github.com/huggingface/optimum/pull/2012
  • Add quanto install and instructions by @dacorvo in https://github.com/huggingface/optimum/pull/1976

New Contributors

  • @moria97 made their first contribution in https://github.com/huggingface/optimum/pull/1941
  • @XciD made their first contribution in https://github.com/huggingface/optimum/pull/1978
  • @zhenglongjiepheonix made their first contribution in https://github.com/huggingface/optimum/pull/1933
  • @aldakata made their first contribution in https://github.com/huggingface/optimum/pull/1989
  • @J4BEZ made their first contribution in https://github.com/huggingface/optimum/pull/2010
  • @ftnext made their first contribution in https://github.com/huggingface/optimum/pull/2015
  • @muellerzr made their first contribution in https://github.com/huggingface/optimum/pull/1819
  • @Rohan138 made their first contribution in https://github.com/huggingface/optimum/pull/2012

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.4...v1.22.0

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.4: Patch release

  • Update Habana extra in setup.py by @regisss in #1991

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.3...v1.21.4

- Python
Published by regisss almost 2 years ago

optimum - v1.21.3: Patch release

  • Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
  • Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
  • Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
  • Fix _unmask_unattended_patched signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
  • Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
  • Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.2...v1.21.3

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.2: Patch release

  • Remove inplace op in mistral patcher by @IlyasMoutawwakil in #1938
  • Fix ORTModelForFeatureExtraction modeling by @moria97 in #1941

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.1...v1.21.2

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.1: Patch release

  • Fix sentence transformers model patching by @echarlaix in https://github.com/huggingface/optimum/pull/1936
  • Update Intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
  • Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.0...v1.21.1

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.0: many bugfixes, transformers 4.42 compatibility

What's Changed

  • ORTOptimizer for the model type Segformer by @zachmayer in https://github.com/huggingface/optimum/pull/1820
  • fix: device consistence by @Daya-Jin in https://github.com/huggingface/optimum/pull/1891
  • Allow optimum to discover and load subpackages by @dacorvo in https://github.com/huggingface/optimum/pull/1894
  • feat(ci): add trufflehog secrets detector by @McPatate in https://github.com/huggingface/optimum/pull/1899
  • fix(ci): remove unnecessary permissions by @McPatate in https://github.com/huggingface/optimum/pull/1904
  • Remove read token by @fxmarty in https://github.com/huggingface/optimum/pull/1903
  • Remove dataset with restrictive license by @echarlaix in https://github.com/huggingface/optimum/pull/1910
  • Fix Windows and onnx dtype compatibility by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1886
  • Deprecated use_auth_token by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1837
  • Add redirection for optimum intel doc by @echarlaix in https://github.com/huggingface/optimum/pull/1918
  • Read use_external_data_format from ORTConfig file by @idruker-cerence in https://github.com/huggingface/optimum/pull/1917
  • Pin numpy v1 for onnxruntime by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1921
  • Fix GPTQ CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1878
  • Fix code quality by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1928
  • Fix incorrect names for usage blenderbot for causallm by @eaidova in https://github.com/huggingface/optimum/pull/1887
  • Fixed bug key error "last_hidden_state" by @satishsilveri in https://github.com/huggingface/optimum/pull/1674
  • Support transformers 4.42 by @fxmarty in https://github.com/huggingface/optimum/pull/1929

New Contributors

  • @zachmayer made their first contribution in https://github.com/huggingface/optimum/pull/1820
  • @Daya-Jin made their first contribution in https://github.com/huggingface/optimum/pull/1891
  • @dacorvo made their first contribution in https://github.com/huggingface/optimum/pull/1894
  • @McPatate made their first contribution in https://github.com/huggingface/optimum/pull/1899
  • @idruker-cerence made their first contribution in https://github.com/huggingface/optimum/pull/1917
  • @satishsilveri made their first contribution in https://github.com/huggingface/optimum/pull/1674

Full Changelog: https://github.com/huggingface/optimum/compare/v1.20.0...v1.21.0

- Python
Published by fxmarty almost 2 years ago

optimum - v1.20.0: VITS, Phi-3 ONNX export

Extended ONNX export

  • VITS ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1607
  • Phi-3 ONNX export by @JingyaHuang in https://github.com/huggingface/optimum/pull/1870
  • Add Phi-3 normalized config by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/1841
  • Add Phi-3 small normalized config by @JingyaHuang in https://github.com/huggingface/optimum/pull/1864

Other changes and bugfixes

  • Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
  • Remove call to apt update before apt purge in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830
  • Update github workflows by @echarlaix in https://github.com/huggingface/optimum/pull/1829
  • Remove bad PPA in main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1831
  • Fix TPU doc build by @regisss in https://github.com/huggingface/optimum/pull/1834
  • Fix sentence transformers models infer library by @echarlaix in https://github.com/huggingface/optimum/pull/1832
  • Fix random initialization of bias when using GPTQ quantization with models without bias by @B-201 in https://github.com/huggingface/optimum/pull/1827
  • Update the Transformers dependency in the Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1851
  • Make stable diffusion unet and vae number of channels static by @eaidova in https://github.com/huggingface/optimum/pull/1840
  • Fix compatibility with transformers v4.41.0 for ONNX by @echarlaix in https://github.com/huggingface/optimum/pull/1860
  • Fix FX CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1866
  • Fix Utils CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1867
  • Fix BT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1872
  • Fix ORTConfig loading by @mr-sarthakgupta in https://github.com/huggingface/optimum/pull/1879
  • Update ORT doc for ROCM 6.0 by @mht-sharma in https://github.com/huggingface/optimum/pull/1862
  • Fix ort config instantiation (from_pretrained) and saving (save_pretrained) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1865
  • Fix ORT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1875
  • Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1882
  • Bump transformers version for neuron extras by @JingyaHuang in https://github.com/huggingface/optimum/pull/1881

New Contributors

  • @B-201 made their first contribution in https://github.com/huggingface/optimum/pull/1827
  • @mr-sarthakgupta made their first contribution in https://github.com/huggingface/optimum/pull/1879

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.20.0

- Python
Published by echarlaix about 2 years ago

optimum - v1.19.2: Patch release

  • Update the Transformers dependency in the Habana extra #1851 @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.1...v1.19.2

- Python
Published by regisss about 2 years ago

optimum - v1.19.1: Patch release

  • Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
  • Remove call to apt update before apt purge in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.19.1

- Python
Published by echarlaix about 2 years ago

optimum - v1.19.0: Musicgen, MarkupLM ONNX export

Extended ONNX export

Musicgen and MarkupLM models from Transformers can now be exported to ONNX through optimum-cli export onnx. Musicgen ONNX export is used to run the model locally in a browser through transformers.js.

  • Musicgen ONNX export (text-conditional only) by @fxmarty in https://github.com/huggingface/optimum/pull/1779
  • Add support for markuplm ONNX export by @pogzyb in https://github.com/huggingface/optimum/pull/1784

Other changes and bugfixes

  • Fix IR version for merged ONNX decoders by @fxmarty in https://github.com/huggingface/optimum/pull/1780
  • Update test model id by @echarlaix in https://github.com/huggingface/optimum/pull/1785
  • Add Nvidia and Neuron to README by @JingyaHuang in https://github.com/huggingface/optimum/pull/1791
  • adds debug options to dump onnx graphs by @prathikr in https://github.com/huggingface/optimum/pull/1789
  • Improve PR template by @fxmarty in https://github.com/huggingface/optimum/pull/1799
  • Add Google TPU to the mix by @mfuntowicz in https://github.com/huggingface/optimum/pull/1797
  • Add redirection for Optimum TPU by @regisss in https://github.com/huggingface/optimum/pull/1801
  • Add Nvidia and Neuron to the installation doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1803
  • Update installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/1806
  • Fix offline compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1805
  • Remove unnecessary constants for > 2GB ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1808
  • Add onnx export function for pix2struct model by @naormatania in https://github.com/huggingface/optimum/pull/1815

New Contributors

  • @pogzyb made their first contribution in https://github.com/huggingface/optimum/pull/1784
  • @naormatania made their first contribution in https://github.com/huggingface/optimum/pull/1815

Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.19.0

- Python
Published by fxmarty about 2 years ago

optimum - v1.18.1: Patch release

Fix the installation for Optimum Neuron v0.0.21 release

  • Improve the installation of optimum-neuron through optimum extras #1778

Fix the task inference of stable diffusion

  • Fix infer task for stable diffusion #1793

Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.18.1

- Python
Published by JingyaHuang about 2 years ago

optimum - v1.18.0: Gemma, OWLv2, MPNet, Qwen2 ONNX support

New architectures ONNX export:

  • OWLv2 by @xenova in #1689
  • Gemma by @fxmarty in #1714
  • MPNet by @nathan-az in #1471
  • Qwen2 by @uniartisan in #1746

Other changes and bugfixes

  • Fix starcoder ORT integration by @fxmarty in #1722
  • Fix use_auth_token with ORTModel by @fxmarty in #1740
  • Fix compatibility with transformers v4.39.0 by @echarlaix in #1764

- Python
Published by echarlaix about 2 years ago

optimum - v1.17.1: Patch release

Update Transformers dependency for the release of Optimum Habana v1.10.2

  • Update Transformers dependency in Habana extra #1700

Full Changelog: https://github.com/huggingface/optimum/compare/v1.17.0...v1.17.1

- Python
Published by regisss over 2 years ago

optimum - v1.17.0: Improved ONNX support & many bugfixes

ONNX export from nn.Module

A function is exposed to programmatically export any nn.Module (e.g. models coming from Transformers, but modified). This is useful in case you need to do some modifications on models loaded from the Hub before exporting. Example:

```python
from transformers import AutoModelForImageClassification
from optimum.exporters.onnx import onnx_export_from_model

model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Here one could do any modification on the model before the export.
onnx_export_from_model(model, output="vit_onnx")
```

  • Enable model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1649

ONNX export with static shapes

The Optimum ONNX export CLI allows disabling dynamic shapes for inputs/outputs:

optimum-cli export onnx --model timm/ese_vovnet39b.ra_in1k out_vov --no-dynamic-axes

This is useful if the exported model is to be consumed by a runtime that does not support dynamic shapes. The static shape can be specified e.g. with --batch_size 1. See all the shape options in optimum-cli export onnx --help.

  • Enable export of model with fixed shape by @mht-sharma in https://github.com/huggingface/optimum/pull/1643
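The same static-shape export can be scripted. The following is a minimal sketch that only assembles the command line from the flags shown above; the helper name `build_static_export_command` is hypothetical and not part of Optimum, and the export itself is left as a commented-out `subprocess` call.

```python
import shlex


def build_static_export_command(model_id, output_dir, batch_size=1):
    """Build the argv list for a static-shape ONNX export (hypothetical helper)."""
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--no-dynamic-axes",              # disable dynamic shapes for inputs/outputs
        "--batch_size", str(batch_size),  # fix the batch dimension
        output_dir,
    ]


command = build_static_export_command("timm/ese_vovnet39b.ra_in1k", "out_vov")
print(shlex.join(command))

# To actually run the export (requires optimum to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting issues when the model id or output path contains spaces.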

BF16 ONNX export

The Optimum ONNX export now supports BF16 export on CPU and GPU. Beware though that ONNX Runtime is most often not able to consume these models, as some operations are not implemented in this data type, although the exported models comply with the ONNX standard. This is useful if you are developing a runtime that consumes BF16 ONNX models.

Example: optimum-cli export onnx --model bert-base-uncased --dtype bf16 bert_onnx

  • BF16 support in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1654

ONNX export for new models

You can now export table-transformer, and BART for text-classification, to ONNX.

  • Add ONNX export for table-transformer by @xenova in https://github.com/huggingface/optimum/pull/1616
  • Reactivate BART Onnx Export by @claeyzre in https://github.com/huggingface/optimum/pull/1666

Sentence Transformers ONNX export

  • Fix sentence transformers ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1632
  • Bump sentence-transformers ONNX opset by @fxmarty in https://github.com/huggingface/optimum/pull/1634
  • Pass trust_remote_code to sentence transformers export by @xenova in https://github.com/huggingface/optimum/pull/1677
  • Fix library detection by @fxmarty in https://github.com/huggingface/optimum/pull/1690

Timm models support with ONNX Runtime

Timm models can now be run through ONNX Runtime with the class ORTModelForImageClassification:

```python
from urllib.request import urlopen

import timm
import torch
from PIL import Image

from optimum.onnxruntime import ORTModelForImageClassification

# Export the model to ONNX under the hood with export=True.
model = ORTModelForImageClassification.from_pretrained("timm/resnext101_64x4d.c1_in1k", export=True)

# Get model specific transforms (normalization, resize).
data_config = timm.data.resolve_data_config(pretrained_cfg=model.config.pretrained_cfg)
transforms = timm.data.create_transform(**data_config, is_training=False)

img = Image.open(
    urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")
)
output = model(transforms(img).unsqueeze(0)).logits
top5_probabilities, top5_class_indices = torch.topk(torch.softmax(output, dim=1) * 100, k=5)
```

  • Add Timm support in ORTModelForImageClassification by @mht-sharma in https://github.com/huggingface/optimum/pull/1578

Other changes and bugfixes

  • Modify SEW-D model for tests by @echarlaix in https://github.com/huggingface/optimum/pull/1601
  • Add phi and mixtral model type to normalizedconfig by @changwangss in https://github.com/huggingface/optimum/pull/1625
  • Remove "to ONNX" from info message when exporting model by @helena-intel in https://github.com/huggingface/optimum/pull/1627
  • Modify model id for test by @echarlaix in https://github.com/huggingface/optimum/pull/1628
  • Fix cupy detection by @fxmarty in https://github.com/huggingface/optimum/pull/1635
  • Fix ORT detection by @fxmarty in https://github.com/huggingface/optimum/pull/1636
  • Enable sdpa export for SD unet component by @echarlaix in https://github.com/huggingface/optimum/pull/1637
  • [ORT] Improve dummy mask & add tips for attention fusion in the doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1640
  • Improve error message by @Almonok in https://github.com/huggingface/optimum/pull/1623
  • Add input_labels input to SAM model export by @xenova in https://github.com/huggingface/optimum/pull/1638
  • Fix c4 dataset loading by @SunMarc in https://github.com/huggingface/optimum/pull/1646
  • Avoid loading onnx file in weight deduplication if not necessary by @fxmarty in https://github.com/huggingface/optimum/pull/1648
  • Allow lower ONNX opsets by @fxmarty in https://github.com/huggingface/optimum/pull/1650
  • Remove abstract decorator from _export by @JingyaHuang in https://github.com/huggingface/optimum/pull/1652
  • Add rjieba install by @mht-sharma in https://github.com/huggingface/optimum/pull/1661
  • Fix wikitext2 processing by @SunMarc in https://github.com/huggingface/optimum/pull/1663
  • Fix: local variable 'dataset' referenced before assignment by @hiyouga in https://github.com/huggingface/optimum/pull/1600
  • Support float16 images in StableDiffusionXLWatermarker by @jambayk in https://github.com/huggingface/optimum/pull/1603
  • Extend autocast check to cover more platforms like XPU by @hoshibara in https://github.com/huggingface/optimum/pull/1639
  • Support IO Binding for ORTModelForCTC by @vidalmaxime in https://github.com/huggingface/optimum/pull/1629
  • Add fp16 support for split cache by @PatriceVignola in https://github.com/huggingface/optimum/pull/1602
  • ORTModelForFeatureExtraction always exports as transformers models by @fxmarty in https://github.com/huggingface/optimum/pull/1684
  • Avoid overriding model_type in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1647
  • Fix gptq device_map = "cpu" by @SunMarc in https://github.com/huggingface/optimum/pull/1662
  • CI: Avoid iterating over a mutated iterable by @fxmarty in https://github.com/huggingface/optimum/pull/1683
  • Add option to disable ONNX constant folding by @fxmarty in https://github.com/huggingface/optimum/pull/1682
  • re-enable decoder sequence classification by @dwyatte in https://github.com/huggingface/optimum/pull/1679
  • Move & rename onnx_export by @fxmarty in https://github.com/huggingface/optimum/pull/1685
  • Update standardize_model_attributes by @mht-sharma in https://github.com/huggingface/optimum/pull/1686
  • Fix: AttributeError: module 'packaging' has no attribute 'version' by @soulteary in https://github.com/huggingface/optimum/pull/1660
  • Disable failing test & free space when building documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1693
  • Fix no space left on device in actions by @fxmarty in https://github.com/huggingface/optimum/pull/1694
  • Add end-to-end Marlin benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1695
  • Fix main doc build by @fxmarty in https://github.com/huggingface/optimum/pull/1697
  • Update optimum-intel requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1699

New Contributors

  • @tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597
  • @helena-intel made their first contribution in https://github.com/huggingface/optimum/pull/1627
  • @Almonok made their first contribution in https://github.com/huggingface/optimum/pull/1623
  • @hiyouga made their first contribution in https://github.com/huggingface/optimum/pull/1600
  • @jambayk made their first contribution in https://github.com/huggingface/optimum/pull/1603
  • @hoshibara made their first contribution in https://github.com/huggingface/optimum/pull/1639
  • @vidalmaxime made their first contribution in https://github.com/huggingface/optimum/pull/1629
  • @PatriceVignola made their first contribution in https://github.com/huggingface/optimum/pull/1602
  • @claeyzre made their first contribution in https://github.com/huggingface/optimum/pull/1666
  • @dwyatte made their first contribution in https://github.com/huggingface/optimum/pull/1679
  • @soulteary made their first contribution in https://github.com/huggingface/optimum/pull/1660

Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.17.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.16.2: Patch release

  • Fix ORT training compatibility for transformers v4.36.0 by @AdamLouly in https://github.com/huggingface/optimum/pull/1586

  • Fix ONNX export compatibility for transformers v4.37.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1641

- Python
Published by echarlaix over 2 years ago

optimum - v1.16.1: Patch release

Breaking change: BetterTransformer support for Llama, Falcon, Whisper and Bart is deprecated

The features from BetterTransformer for Llama, Falcon, Whisper and Bart have been upstreamed in Transformers. Please use transformers>=4.36 and torch>=2.1.1 to use PyTorch's scaled_dot_product_attention by default.

More details: https://github.com/huggingface/transformers/releases/tag/v4.36.0
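For readers unfamiliar with the operation, scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The following is a plain-Python sketch of that formula for illustration only: it omits masking, dropout and batching, and it is not how PyTorch or Transformers implement it (they use fused kernels).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V for lists of row vectors (no mask)."""
    d = len(query[0])
    output = []
    for q in query:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Weighted average of the value rows.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = scaled_dot_product_attention([[1.0, 0.0], [0.0, 1.0]], keys, values)
print(out)
```

Each output row is a weighted average of the value rows, with larger weights on the values whose keys are most similar to the query.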

What's Changed

  • Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/1596
  • Typo: tansformers -> transformers by @tomaarsen in https://github.com/huggingface/optimum/pull/1597
  • [GPTQ] fix tests by @SunMarc in https://github.com/huggingface/optimum/pull/1598
  • Show correct error message on using BT for SDPA models by @fxmarty in https://github.com/huggingface/optimum/pull/1599

New Contributors

  • @tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597

Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.16.1

- Python
Published by fxmarty over 2 years ago

optimum - v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ

Transformers 4.36 compatibility

Notably, the ONNX export now handles aten::scaled_dot_product_attention in a standardized way for compatible models.

  • Compatibility with Transformers 4.36 by @fxmarty in https://github.com/huggingface/optimum/pull/1590

Extended ONNX support: timm, sentence-transformers, Phi, ESM

  • Add ONNX export for phi models by @xenova in https://github.com/huggingface/optimum/pull/1579
  • Add ESM onnx support by @xenova in https://github.com/huggingface/optimum/pull/1581
  • Add timm models export by @mht-sharma in https://github.com/huggingface/optimum/pull/1587
  • Proper sentence-transformers ONNX export support by @fxmarty in https://github.com/huggingface/optimum/pull/1589

GPTQ for Mixtral

Work in progress.

  • add modules_in_block_to_quantize arg for gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1585

What's Changed

  • Update version to 1.16.0.dev0 by @fxmarty in https://github.com/huggingface/optimum/pull/1571
  • Use doc links in the README for subpackages by @fxmarty in https://github.com/huggingface/optimum/pull/1572
  • Fix GPTQ compatibility with AutoGPTQ by @fxmarty in https://github.com/huggingface/optimum/pull/1574
  • Refactoring EC2 CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1575
  • Remove inputs from sentence-transformers ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/1593
  • Gptq tokenized dataset by @SunMarc in https://github.com/huggingface/optimum/pull/1584
  • Run timm ONNX CI only once per day by @fxmarty in https://github.com/huggingface/optimum/pull/1594
  • Run timm ONNX CI nightly v2 by @fxmarty in https://github.com/huggingface/optimum/pull/1595

Full Changelog: https://github.com/huggingface/optimum/compare/v1.15.0...v1.16.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.15.0: ROCMExecutionProvider support

ROCMExecutionProvider support

The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.

  • Add AMD GPU support by @mht-sharma in https://github.com/huggingface/optimum/pull/1546
  • Update ROCM ORT doc by @mht-sharma in https://github.com/huggingface/optimum/pull/1564

Extended ONNX export

Swin2sr, DPT, GLPN and ConvNextv2 are now supported in the ONNX export.

  • Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
  • Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529
  • Add convnextv2 onnx export by @xenova in https://github.com/huggingface/optimum/pull/1560

What's Changed

  • Add OV export CLI to README by @echarlaix in https://github.com/huggingface/optimum/pull/1526
  • Refactor NormalizedConfigs for GQA by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1539
  • Fix model patcher ONNX decoder export by @fxmarty in https://github.com/huggingface/optimum/pull/1547
  • Add AMD to the documentation main page by @mfuntowicz in https://github.com/huggingface/optimum/pull/1540
  • Add Optimum-amd documentation to the PR & release doc by @fxmarty in https://github.com/huggingface/optimum/pull/1562
  • Add amd documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1557
  • Remove delete_doc_comment workflows by @regisss in https://github.com/huggingface/optimum/pull/1565
  • optimum-nvidia by @mfuntowicz in https://github.com/huggingface/optimum/pull/1566
  • Update installation instructions in README by @echarlaix in https://github.com/huggingface/optimum/pull/1568
  • Update doc for AMD by @mht-sharma in https://github.com/huggingface/optimum/pull/1570
  • Add amd extra to setup.py by @echarlaix in https://github.com/huggingface/optimum/pull/1567

New Contributors

  • @xenova made their first contribution in https://github.com/huggingface/optimum/pull/1529

Full Changelog: https://github.com/huggingface/optimum/compare/v1.14.0...v1.15.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.14.1: Patch release

  • Update optimum-intel required version by @echarlaix in https://github.com/huggingface/optimum/pull/1521
  • Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
  • Fix Falcon ONNX export with alibi by @fxmarty in https://github.com/huggingface/optimum/pull/1524
  • Fix whisper v3 ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1525
  • Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in https://github.com/huggingface/optimum/pull/1535
  • Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529

- Python
Published by echarlaix over 2 years ago

optimum - v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization

ONNX

New architectures

Falcon

  • Add ONNX and ORT support for Falcon by @fxmarty in https://github.com/huggingface/optimum/pull/1391

SpeechT5

  • SpeechT5 ONNX support by @fxmarty in https://github.com/huggingface/optimum/pull/1404

Mistral

  • Add Mistral models ONNX export support by @echarlaix in https://github.com/huggingface/optimum/pull/1425

TrOCR

  • Enable KV cache support by @fxmarty in https://github.com/huggingface/optimum/pull/1456

LCMs

Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in https://github.com/huggingface/optimum/pull/1469

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```

Also enable ONNX export using the CLI:

optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/

Decoder refactorization

  • Add position ids as input during ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1381
  • Enable the export of only one decoder for decoder-only models by @echarlaix in https://github.com/huggingface/optimum/pull/1257

GPTQ

  • Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in https://github.com/huggingface/optimum/pull/1419
  • Disable exllamav2 for quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1482
  • Default to exllama when exllamav2 is disabled by @SunMarc in https://github.com/huggingface/optimum/pull/1494
  • Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in https://github.com/huggingface/optimum/pull/1479
  • Add support for CPU Inference by @vivekkhandelwal1 in https://github.com/huggingface/optimum/pull/1496
  • Fix minimum version of auto-gptq by @fxmarty in https://github.com/huggingface/optimum/pull/1504
  • switch to exllama_config instead of disabling exllamav2 by @SunMarc in https://github.com/huggingface/optimum/pull/1505

Other changes and bugfixes

  • Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
  • Add support for loading quantization from config by @aarnphm in https://github.com/huggingface/optimum/pull/1363
  • Guard multiprocessing set start method by @fxmarty in https://github.com/huggingface/optimum/pull/1377
  • Do not output KV cache when not using with-past in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1358
  • Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
  • Fix quantization for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405
  • Fix normalized config key for models architecture by @echarlaix in https://github.com/huggingface/optimum/pull/1408
  • Fix arg in bettertransformer llama attention by @SunMarc in https://github.com/huggingface/optimum/pull/1421
  • Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in https://github.com/huggingface/optimum/pull/1428
  • Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in https://github.com/huggingface/optimum/pull/1431
  • Fix llama ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1432
  • Update attention.py by @DongHande in https://github.com/huggingface/optimum/pull/1416
  • Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in https://github.com/huggingface/optimum/pull/1443
  • Fix owlvit task detection by @fxmarty in https://github.com/huggingface/optimum/pull/1453
  • Improve ONNX quantization doc by @fxmarty in https://github.com/huggingface/optimum/pull/1451
  • Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in https://github.com/huggingface/optimum/pull/1449
  • Disable bart onnx export for text-classification and question-answering by @fxmarty in https://github.com/huggingface/optimum/pull/1457
  • Fix ONNX exporter library_name by @baskrahmer in https://github.com/huggingface/optimum/pull/1460
  • [ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1335
  • Fix typo in BetterTransformer CLIP by @fxmarty in https://github.com/huggingface/optimum/pull/1468
  • Fix custom architecture detection in onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1472
  • Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/1503
  • Update Transformers dependency for Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1508
  • Fix argument error by @ranchlai in https://github.com/huggingface/optimum/pull/1501
  • Remove attention mask patching by @fxmarty in https://github.com/huggingface/optimum/pull/1509
  • Fix generation input by @echarlaix in https://github.com/huggingface/optimum/pull/1512
  • Fix tests ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/1517
  • Fix BT on transformers 4.35 release by @fxmarty in https://github.com/huggingface/optimum/pull/1518

New Contributors

  • @aarnphm made their first contribution in https://github.com/huggingface/optimum/pull/1363
  • @DongHande made their first contribution in https://github.com/huggingface/optimum/pull/1416
  • @AlexKoff88 made their first contribution in https://github.com/huggingface/optimum/pull/1479
  • @vivekkhandelwal1 made their first contribution in https://github.com/huggingface/optimum/pull/1496
  • @ranchlai made their first contribution in https://github.com/huggingface/optimum/pull/1501

- Python
Published by echarlaix over 2 years ago

optimum - v1.13.3: Patch release

Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!

  • Falcon BetterTransformer requires transformers>=4.34 by @fxmarty https://github.com/huggingface/optimum/pull/1431
  • Fix arg in bettertransformer llama attention by @SunMarc #1421
  • Update Transformers dependency for Habana extra by @regisss #1508
  • temporarily pin to transformers<4.35 by @fxmarty https://github.com/huggingface/optimum/commit/616931019b9bd7546918a48d475a07efb92f51b1

- Python
Published by fxmarty over 2 years ago

optimum - v1.13.2: Patch release

  • Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
  • Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405

- Python
Published by echarlaix over 2 years ago

optimum - v1.13.1: Patch release

Fix ONNX fp16 export that broke in 1.13.0.

What's Changed

  • Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
  • Fix tests collection for TFLite export and trigger TFLite tests only when relevant by @fxmarty in https://github.com/huggingface/optimum/pull/1368
  • upgrade min compatible optimum-intel version by @echarlaix in https://github.com/huggingface/optimum/pull/1371
  • Fix fp16 ONNX export test by @fxmarty in https://github.com/huggingface/optimum/pull/1373

- Python
Published by fxmarty almost 3 years ago

optimum - v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the Embedding and LM head shared weight: https://github.com/pytorch/pytorch/issues/108342. For small enough models, this results in up to 50% ONNX serialized model size decrease.

  • Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1326
  • Fix initializer detection for weight deduplication by @fxmarty in https://github.com/huggingface/optimum/pull/1333
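To illustrate why deduplicating tied weights can shrink a serialized model so much, here is a minimal sketch of the idea. It assumes a hypothetical `{name: raw_bytes}` mapping standing in for a model's initializers; it is not Optimum's implementation and does not use the ONNX API.

```python
import hashlib


def deduplicate_initializers(initializers):
    """Point tensors with identical payloads at a single stored copy.

    `initializers` is a hypothetical {name: raw_bytes} mapping, not an ONNX object.
    Returns (alias, unique): alias maps every name to its canonical name,
    unique holds only the payloads that need to be serialized.
    """
    canonical_by_digest = {}
    alias = {}
    for name, payload in initializers.items():
        digest = hashlib.sha256(payload).hexdigest()
        # The first tensor seen with a given content becomes the canonical one.
        alias[name] = canonical_by_digest.setdefault(digest, name)
    unique = {name: initializers[name] for name in set(alias.values())}
    return alias, unique


shared = bytes(1024)  # stands in for a tied Embedding / LM head matrix
weights = {
    "embed_tokens.weight": shared,
    "lm_head.weight": shared,
    "layer.0.bias": b"\x01" * 16,
}
alias, unique = deduplicate_initializers(weights)
size_before = sum(len(p) for p in weights.values())
size_after = sum(len(p) for p in unique.values())
print(size_before, size_after)
```

In this toy example the shared matrix dominates the total size, so storing it once cuts the serialized size roughly in half, which matches the "up to 50%" figure quoted above for small models.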

Extended ONNX Runtime support

ONNX Runtime integration now supports Pix2Struct and MPT architectures. Donut now supports IO Binding. Encoder-Decoder models are now supported as well.

  • Pix2Struct onnxruntime support by @krathul in https://github.com/huggingface/optimum/pull/1296
  • Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
  • Donut iobinding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1209
  • Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, the SAM model is now by default exported as vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.

  • Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
  • Adds ONNX Export Support for Timm Models by @mht-sharma in https://github.com/huggingface/optimum/pull/965
  • Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851
  • Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in https://github.com/huggingface/optimum/pull/1301

BetterTransformer supports Falcon

  • [BetterTransformer] Add falcon to BetterTransformer by @younesbelkada in https://github.com/huggingface/optimum/pull/1343

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

  • Version bump + add max_input_length to gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1329

Other changes and bugfixes

  • Update version to 1.12.1.dev0 following release by @fxmarty in https://github.com/huggingface/optimum/pull/1312

  • Add GPTQ prefill benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1313

  • Precise ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1268

  • Improve BetterTransformer backward compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1314

  • Improve ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1245

  • Add bitsandbytes benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1320

  • fix typo in log message by @AAnirudh07 in https://github.com/huggingface/optimum/pull/1322

  • Support customize dtype for dummy generators by @JingyaHuang in https://github.com/huggingface/optimum/pull/1307

  • Fix opset custom onnx export by @mht-sharma in https://github.com/huggingface/optimum/pull/1331

  • Replace mpt to ernie custom export by @mht-sharma in https://github.com/huggingface/optimum/pull/1332

  • Fix BT benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/1344

  • Add name_or_path for donut generation by @fxmarty in https://github.com/huggingface/optimum/pull/1345

  • send both negative prompt embeds to ORT SDXL by @ssube in https://github.com/huggingface/optimum/pull/1339

  • add vae image processor by @echarlaix in https://github.com/huggingface/optimum/pull/1219

  • add negative prompt test by @echarlaix in https://github.com/huggingface/optimum/pull/1347

  • Add GPT BigCode to the BT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1356

  • Add BT dummy objects by @fxmarty in https://github.com/huggingface/optimum/pull/1355

  • Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/1338

  • Fix sentence transformer export by @mht-sharma in https://github.com/huggingface/optimum/pull/1366

New Contributors

  • @krathul made their first contribution in https://github.com/huggingface/optimum/pull/1296
  • @AAnirudh07 made their first contribution in https://github.com/huggingface/optimum/pull/1322
  • @jiqing-feng made their first contribution in https://github.com/huggingface/optimum/pull/1161
  • @ssube made their first contribution in https://github.com/huggingface/optimum/pull/1339

Full Changelog: https://github.com/huggingface/optimum/compare/v1.12.0...v1.13.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.12.0: AutoGPTQ integration, extended BetterTransformer support

AutoGPTQ integration

Part of the AutoGPTQ library has been integrated in Optimum, with utilities to ease the integration in other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization

  • Add GPTQ Quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1216
  • Fix GPTQ doc by @regisss in https://github.com/huggingface/optimum/pull/1267
  • Add AutoGPTQ benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1292
  • Fix gptq params by @SunMarc in https://github.com/huggingface/optimum/pull/1284

Extended BetterTransformer support

BetterTransformer now supports BLOOM and GPT-BigCode architectures.

  • Bt bloom by @baskrahmer in https://github.com/huggingface/optimum/pull/1221
  • Support gpt_bigcode in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1252
  • Fix BetterTransformer starcoder init by @fxmarty in https://github.com/huggingface/optimum/pull/1254
  • Fix BT starcoder fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1255
  • SDPA dispatches to flash for MQA by @fxmarty in https://github.com/huggingface/optimum/pull/1259
  • Check output_attentions is False in BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1306

Other changes and bugfixes

  • Update bug report template by @fxmarty in https://github.com/huggingface/optimum/pull/1266
  • Fix ORTModule uses fp32 model issue by @jingyanwangms in https://github.com/huggingface/optimum/pull/1264
  • Fix build PR doc workflow by @fxmarty in https://github.com/huggingface/optimum/pull/1270
  • Avoid triggering stop job on label by @fxmarty in https://github.com/huggingface/optimum/pull/1274
  • Update version following 1.11.1 patch by @fxmarty in https://github.com/huggingface/optimum/pull/1275
  • Fix fp16 ONNX detection for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1276
  • Update version following 1.11.2 patch by @regisss in https://github.com/huggingface/optimum/pull/1291
  • Pin tensorflow<=2.12.1 by @fxmarty in https://github.com/huggingface/optimum/pull/1305
  • ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in https://github.com/huggingface/optimum/pull/1308
  • Fix staging tests following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1309
  • More fixes following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1311

New Contributors

  • @SunMarc made their first contribution in https://github.com/huggingface/optimum/pull/1216
  • @jingyanwangms made their first contribution in https://github.com/huggingface/optimum/pull/1264

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.2...v1.12.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.11.2: Patch release

Remove the Transformers version constraint on optimum[habana].

  • Remove Transformers version constraint on Optimum Habana #1290 by @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.1...v1.11.2

- Python
Published by regisss almost 3 years ago

optimum - v1.11.1: Patch release

Minor fix: documentation building for 1.11.

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.0...v1.11.1

- Python
Published by fxmarty almost 3 years ago

optimum - v1.11.0: Extended ONNX, ONNX Runtime, BetterTransformer support

Extended ONNX and ONNX Runtime support

Add ONNX export and ONNX Runtime inference support for GPT-BigCode.

  • Add ONNX / ONNXRuntime support for StarCoder by @JingyaHuang in #1042

Extended BetterTransformer support

BetterTransformer now supports Llama 2 and bark.

Training and autocast are now supported for most architectures. Please refer to the documentation for more details: https://huggingface.co/docs/optimum/main/en/bettertransformer/overview

  • Support Llama 2 in BetterTransformer. by @noamwies in #1235
  • BetterTransformer support training & autocast for all archs by @fxmarty in #1225
  • Add bark into bettertransformer by @ylacombe in https://github.com/huggingface/optimum/pull/1199
  • Drop mask for training in all cases for BetterTransformer & precise documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1250

Major bugfixes

  • Update ORT training to be compatible with transformers 4.31 by @JingyaHuang in #1227

Other improvements and bugfixes

  • add upgrade strategy by @echarlaix in https://github.com/huggingface/optimum/pull/1228
  • fix typo README by @echarlaix in https://github.com/huggingface/optimum/pull/1230
  • Fix OwlViT exporter config by @regisss in https://github.com/huggingface/optimum/pull/1188
  • Add example SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1233
  • fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232
  • fix optimum-intel min version by @echarlaix in https://github.com/huggingface/optimum/pull/1234
  • fix typo documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1238
  • update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1240
  • Update onnxruntime minimum version to 1.11 by @fxmarty in https://github.com/huggingface/optimum/pull/1244
  • ORT quantizes by default all ops by @fxmarty in https://github.com/huggingface/optimum/pull/1246

New Contributors

  • @ylacombe made their first contribution in https://github.com/huggingface/optimum/pull/1199
  • @noamwies made their first contribution in https://github.com/huggingface/optimum/pull/1235

Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.11.0

- Python
Published by JingyaHuang almost 3 years ago

optimum - v1.10.1: Patch release

  • Fix OwlViT exporter by @regisss in https://github.com/huggingface/optimum/pull/1188

  • Fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232

  • Fix optimum-intel version requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1234

Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.10.1

- Python
Published by echarlaix almost 3 years ago

optimum - v1.10.0: Stable Diffusion XL pipelines

Stable Diffusion XL

Enable SD XL ONNX export and ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1168

  • Enable SD XL ONNX export using the CLI:

optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-0.9 --task stable-diffusion-xl ./sd_xl_onnx

  • Add SD XL pipelines for ONNX Runtime inference (supported tasks: text-to-image and image-to-image):

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-0.9"
# export=True converts the PyTorch checkpoint to ONNX on the fly
pipeline = ORTStableDiffusionXLPipeline.from_pretrained(model_id, export=True)

prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
# Save the exported ONNX pipeline for later reuse
pipeline.save_pretrained("onnx-sd-xl-base-0.9")
```

Stable Diffusion pipelines

Enable image-to-image and inpainting pipelines for ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1121

More examples in documentation

Major bugfixes

  • Fix bloom KV cache usage in ORTForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/1152

What's Changed

  • Add stable diffusion example by @prathikr in https://github.com/huggingface/optimum/pull/1136
  • Fixed incomplete ONNX export model memory release issue by @sharpbai in https://github.com/huggingface/optimum/pull/1154
  • Add trust remote code option for config by @changwangss in https://github.com/huggingface/optimum/pull/1151
  • Fix typos of ONNXRuntimme -> ONNXRuntime by @mgoin in https://github.com/huggingface/optimum/pull/1155
  • Fix ONNX export for MobileViT for segmentation by @regisss in https://github.com/huggingface/optimum/pull/1128
  • Revert "update the default block size" by @rui-ren in https://github.com/huggingface/optimum/pull/1162
  • ONNX export for custom architectures & models with custom modeling code by @fxmarty in https://github.com/huggingface/optimum/pull/1166
  • Update Optimum Neuron doc by @regisss in https://github.com/huggingface/optimum/pull/1164
  • Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1173
  • Add gpt_bigcode model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1170
  • Allow attention_mask=None for BetterTransformer in the inference batched case for gpt2 & gpt-neo by @fxmarty in https://github.com/huggingface/optimum/pull/1180
  • Fix encoder attention mask input order for ORT by @fxmarty in https://github.com/huggingface/optimum/pull/1181
  • Fix ORTModel initialization on specific device id by @fxmarty in https://github.com/huggingface/optimum/pull/1182
  • Add stable diffusion img2img and inpain documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1149
  • Fix SD XL ONNX export for img2img task by @echarlaix in https://github.com/huggingface/optimum/pull/1194
  • Remove graphcore from documentation quickstart by @echarlaix in https://github.com/huggingface/optimum/pull/1201
  • Unpin tensorflow by @fxmarty in https://github.com/huggingface/optimum/pull/1211
  • Fix ORT test for unknown architecture for task by @fxmarty in https://github.com/huggingface/optimum/pull/1212
  • add ort + stable diffusion documentation by @prathikr in https://github.com/huggingface/optimum/pull/1205
  • Fix vision encoder decoder that may not cache cross-attention by @fxmarty in https://github.com/huggingface/optimum/pull/1210
  • Add documentation for Optimum Furiosa by @regisss in https://github.com/huggingface/optimum/pull/1165
  • Add BLIP-2 to BetterTransformer documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1218
  • Set default value to unet config sample size by @echarlaix in https://github.com/huggingface/optimum/pull/1223
  • Fix broken link in doc by @regisss in https://github.com/huggingface/optimum/pull/1222
  • Fix BT test by @fxmarty in https://github.com/huggingface/optimum/pull/1224
  • Add SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1198
  • Update setup.py to add optimum-furiosa extras by @mht-sharma in https://github.com/huggingface/optimum/pull/1226

New Contributors

  • @sharpbai made their first contribution in https://github.com/huggingface/optimum/pull/1154
  • @mgoin made their first contribution in https://github.com/huggingface/optimum/pull/1155

Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.10.0

- Python
Published by echarlaix almost 3 years ago

optimum - v1.9.1: Patch release

  • Fix stable diffusion ONNX export for diffusers>=v0.18.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1173

Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.9.1

- Python
Published by echarlaix almost 3 years ago

optimum - v1.9: extended ONNX, ONNX Runtime support

Improved memory management in the ONNX export

Memory usage during the ONNX export is now lower. This is especially useful when exporting large models or exporting on a CUDA device. Until PyTorch 2.1 is released, we recommend using PyTorch nightly if you run into memory issues, as two major bugs were fixed on the PyTorch side: https://github.com/pytorch/pytorch/pull/101134 and https://github.com/pytorch/pytorch/pull/101148

  • Run validation of exported model in no_grad mode by @fxmarty in https://github.com/huggingface/optimum/pull/1111
  • Load model directly on cuda device for the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1112
  • Lower GPU memory requirements at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1115

Extended ONNX export

The ONNX export now supports the sam, lilt, pix2struct, cvt and owlvit architectures.

  • Sam ONNX export support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1025
  • Add onnx exporter for Lilt model by @mariababich in https://github.com/huggingface/optimum/pull/1098
  • Add pix2struct to ONNX support (v2) by @arvisioncode in https://github.com/huggingface/optimum/pull/1034
  • Add CvTONNX Config by @rishabbala in https://github.com/huggingface/optimum/pull/1131
  • Support document-question-answering ONNX export for vision-encoder-decoder by @fxmarty in https://github.com/huggingface/optimum/pull/1110
  • add owlvit by @darwinharianto in https://github.com/huggingface/optimum/pull/1067

Support of custom ONNX configurations for export

The method main_export now supports two arguments model_kwargs and custom_onnx_configs that allow for a more custom export for advanced users. Reference.

  • [ONNX export] Ability to pass arbitrary kwargs, custom ONNX configs by @fxmarty in https://github.com/huggingface/optimum/pull/1143

Extended BetterTransformer support

  • Add blip-2 to bettertransformer by @baskrahmer in https://github.com/huggingface/optimum/pull/1125
  • Support llama bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/998

ONNX Runtime: use IO Binding by default for decoder models on CPUExecutionProvider

IO Binding is useful not only to avoid copies between RAM and device memory, but also to avoid copies between numpy tensors and OrtValue. For autoregressive tasks, IO Binding is therefore enabled by default on CPUExecutionProvider as well, which may bring a speedup of more than 10% for large context lengths.

  • Enable use_io_binding = True on CPU by @yihonglyu in https://github.com/huggingface/optimum/pull/1087

ORTModelForSpeechSeq2Seq supported in ORTOptimizer

  • added ORTModelForSpeechSeq2Seq support to optimizer by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1068

Major bugfixes

  • Use mask for seq2seq ONNX decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1076

What's Changed

  • Fix protobuf max allowed size by @fxmarty in https://github.com/huggingface/optimum/pull/988
  • Add Whisper to ORT optimizer configuration by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/986
  • Fix sentence-similarity task in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/996
  • Simplify auto task detection by @fxmarty in https://github.com/huggingface/optimum/pull/997
  • Fix merged decoder usage with fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1006
  • Fix past key value generator used for ONNX export validation for t5/mt5 by @fxmarty in https://github.com/huggingface/optimum/pull/1007
  • Fix typo for custom shapes passed at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1008
  • Fix _versions.yml upload in doc build by @regisss in https://github.com/huggingface/optimum/pull/1003
  • ORTQuantizer supports subgraphs by @fxmarty in https://github.com/huggingface/optimum/pull/1009
  • fix for huggingface_hub last release by @echarlaix in https://github.com/huggingface/optimum/pull/1014
  • Add links to documentation to README by @echarlaix in https://github.com/huggingface/optimum/pull/1013
  • Upate documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1011
  • update optimum intel description by @echarlaix in https://github.com/huggingface/optimum/pull/1015
  • fix: ValueError offload_dir by @orangetin in https://github.com/huggingface/optimum/pull/993
  • Sentence transformers ONNX export fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1029
  • Add OpenVINO notebooks by @echarlaix in https://github.com/huggingface/optimum/pull/1030
  • Fix task inference for sam by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1031
  • fix typo by @echarlaix in https://github.com/huggingface/optimum/pull/1033
  • added types to new fields in OptimizationConfig by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1036
  • Fix some typos in the quantization guide by @dcferreira in https://github.com/huggingface/optimum/pull/1041
  • Optional attention_mask in ORTModelForxxx by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1045
  • ONNX SAM export - change input_points data type by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1048
  • masked-im output name fix for transformers >= 4.29.0 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1049
  • remove torchvision requirement by @BramVanroy in https://github.com/huggingface/optimum/pull/1052
  • Update version by @regisss in https://github.com/huggingface/optimum/pull/1058
  • Bump package version by @regisss in https://github.com/huggingface/optimum/pull/1062
  • Raise MinimumVersionError when OnnxConfig.MIN_TORCH_VERSION is not satisfied by @regisss in https://github.com/huggingface/optimum/pull/1070
  • Remove deprecated argument from tests and examples by @echarlaix in https://github.com/huggingface/optimum/pull/1072
  • Detect model type for all transformers models in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1075
  • Fix HF Push to hub by @JingyaHuang in https://github.com/huggingface/optimum/pull/1080
  • Fix float16 ORT conversion for models > 2GB by @fxmarty in https://github.com/huggingface/optimum/pull/1079
  • Update doc workflows by @regisss in https://github.com/huggingface/optimum/pull/1093
  • Error out on ORTQuantizer.quantize call for static quantization when no calibration range is provided by @fxmarty in https://github.com/huggingface/optimum/pull/1094
  • Add mpt model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1101
  • Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/1107
  • Improve the offline support for the ONNX/TFLite export by @fxmarty in https://github.com/huggingface/optimum/pull/1109
  • Add ViT to ORTConfigManager by @baskrahmer in https://github.com/huggingface/optimum/pull/1117
  • Fix TasksManager get_model_from_task with None device by @fxmarty in https://github.com/huggingface/optimum/pull/1122
  • Small typos by @baskrahmer in https://github.com/huggingface/optimum/pull/1124
  • Refactor BetterTransformerManager requirement validation methods by @baskrahmer in https://github.com/huggingface/optimum/pull/1132
  • update the default block size by @rui-ren in https://github.com/huggingface/optimum/pull/1137
  • Update ORT training docker to 1.15 by @JingyaHuang in https://github.com/huggingface/optimum/pull/1139
  • Adamlouly/fix unwrap model eval by @AdamLouly in https://github.com/huggingface/optimum/pull/1099
  • Remove version pinning for onnx package by @cody-moveworks in https://github.com/huggingface/optimum/pull/1141

New Contributors

  • @orangetin made their first contribution in https://github.com/huggingface/optimum/pull/993
  • @dcferreira made their first contribution in https://github.com/huggingface/optimum/pull/1041
  • @BramVanroy made their first contribution in https://github.com/huggingface/optimum/pull/1052
  • @darwinharianto made their first contribution in https://github.com/huggingface/optimum/pull/1067
  • @mariababich made their first contribution in https://github.com/huggingface/optimum/pull/1098
  • @changwangss made their first contribution in https://github.com/huggingface/optimum/pull/1101
  • @arvisioncode made their first contribution in https://github.com/huggingface/optimum/pull/1034
  • @yihonglyu made their first contribution in https://github.com/huggingface/optimum/pull/1087
  • @rui-ren made their first contribution in https://github.com/huggingface/optimum/pull/1137
  • @cody-moveworks made their first contribution in https://github.com/huggingface/optimum/pull/1141
  • @rishabbala made their first contribution in https://github.com/huggingface/optimum/pull/1131

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.0...v1.9.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.8.8: Patch release

  • Fix optimum model inference compatibility with transformers>=v4.30.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1102
  • Fix stable diffusion ONNX export following diffusers breaking change by @fxmarty in https://github.com/huggingface/optimum/pull/1116

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.7...v1.8.8

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.7: Patch release

  • Restrict transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1097

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.6...v1.8.7

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.6: Patch release

  • Fix CLI for exporting models to TFLite by @regisss #1059

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.5...v1.8.6

- Python
Published by regisss about 3 years ago

optimum - v1.8.5: Patch release

  • Add transformers<4.29.0 in Habana extra by @regisss in #1047

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.4...v1.8.5

- Python
Published by regisss about 3 years ago

optimum - v1.8.4: Patch release

  • Set onnx requirement by @echarlaix @regisss in https://github.com/huggingface/optimum/pull/1037

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.3...v1.8.4

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.3: Patch release

  • Fix Stable Diffusion model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1020
  • Add optimum-neuron extra by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1021

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.2...v1.8.3

- Python
Published by echarlaix about 3 years ago

optimum - v1.8: extended BetterTransformer support, ONNX merged seq2seq models

Extended BetterTransformer support

Various improvements in the PyTorch BetterTransformer integration.

  • [BT] add BetterTransformer support for ProphetNet by @hirotasoshu in https://github.com/huggingface/optimum/pull/923
  • Improve bettertransformer benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/939
  • Fix sdpa with batch size = 1, better benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/915
  • Fix slow tests & sdpa dropout by @fxmarty in https://github.com/huggingface/optimum/pull/974
  • Remove getattr overhead in spda by @fxmarty in https://github.com/huggingface/optimum/pull/934
  • [BT] Improve docs by @younesbelkada in https://github.com/huggingface/optimum/pull/944

ONNX merged seq2seq models

Instead of two separate decoder_model.onnx and decoder_with_past_model.onnx models, encoder-decoder models can now use a single decoder: decoder_model_merged.onnx. This avoids duplicating weights between the without-past and with-past ONNX models.

By default, if available, the decoder_model_merged.onnx will be used in the ORTModel integration. This can be disabled with the option --no-post-process in the ONNX export CLI, and with use_merged=False in the ORTModel.from_pretrained method.

Example:

optimum-cli export onnx --model t5-small t5_onnx

will give:

└── t5_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── generation_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

decoder_model_merged.onnx alone is enough for inference. If the exported model is to be used with an engine other than ONNX Runtime in the Optimum integration, we strongly recommend inspecting the subgraphs with netron to understand their inputs and outputs.
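The selection rule described above (prefer the merged decoder when it is available, unless merging is disabled) can be sketched with the standard library alone. This is only an illustration: `pick_decoder_files` is a hypothetical helper name, and the code does not reproduce how the ORTModel integration loads files internally.

```python
from pathlib import Path
import tempfile


def pick_decoder_files(export_dir, use_merged=True):
    """Return the decoder ONNX file names to load from an export directory.

    Hypothetical helper: it mirrors the behavior described in these release
    notes, not Optimum's internal loading code.
    """
    export_dir = Path(export_dir)
    merged = export_dir / "decoder_model_merged.onnx"
    if use_merged and merged.exists():
        # A single merged decoder covers both the without-past and with-past cases.
        return [merged.name]
    # Otherwise fall back to the two separate decoders.
    return ["decoder_model.onnx", "decoder_with_past_model.onnx"]


# Simulate the ONNX files produced by `optimum-cli export onnx --model t5-small t5_onnx`.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "encoder_model.onnx",
        "decoder_model.onnx",
        "decoder_with_past_model.onnx",
        "decoder_model_merged.onnx",
    ):
        (Path(tmp) / name).touch()
    default_choice = pick_decoder_files(tmp)
    split_choice = pick_decoder_files(tmp, use_merged=False)

print(default_choice)
print(split_choice)
```

With the merged file present, the default call selects only the merged decoder, while `use_merged=False` falls back to the two separate files.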

  • Fix encoder-decoder ONNX merge by @fxmarty in https://github.com/huggingface/optimum/pull/924
  • Support the merge of decoder without/with past for encoder-decoder models in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/926
  • Support merged seq2seq models in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/930

New models in the ONNX export

  • Add llama onnx export & onnxruntime support by @nenkoru in https://github.com/huggingface/optimum/pull/975

Major bugfix

  • Remove constant output in encoder-decoder ONNX models decoder with past by @fxmarty in https://github.com/huggingface/optimum/pull/920
  • Hash tensor data during deduplication by @VikParuchuri in https://github.com/huggingface/optimum/pull/932

Potentially breaking changes

The TasksManager replaces legacy task names with the canonical ones used on the Hub and in transformers metadata:

  • sequence-classification becomes text-classification
  • causal-lm becomes text-generation
  • seq2seq-lm becomes text2text-generation
  • speech2seq-lm and audio-ctc become automatic-speech-recognition
  • default becomes feature-extraction
  • masked-lm becomes fill-mask
  • vision2seq-lm becomes image-to-text

This should not break anything except if you rely on private methods and attributes from TasksManager.
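For code that still carries legacy task names, the renaming listed above can be captured in a small lookup table. This is a minimal sketch: `LEGACY_TO_CANONICAL` and `canonical_task` are hypothetical names introduced here for illustration and are not part of the TasksManager API.

```python
# Legacy task name -> canonical task name, as listed in these release notes.
LEGACY_TO_CANONICAL = {
    "sequence-classification": "text-classification",
    "causal-lm": "text-generation",
    "seq2seq-lm": "text2text-generation",
    "speech2seq-lm": "automatic-speech-recognition",
    "audio-ctc": "automatic-speech-recognition",
    "default": "feature-extraction",
    "masked-lm": "fill-mask",
    "vision2seq-lm": "image-to-text",
}


def canonical_task(task):
    """Map a legacy task name to its canonical name; other names pass through unchanged."""
    return LEGACY_TO_CANONICAL.get(task, task)


print(canonical_task("causal-lm"))
print(canonical_task("text-generation"))
```

Names that are already canonical are returned as-is, so the helper can be applied to any task string before passing it on.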

  • Allow to use a custom class in TasksManager & use canonical tasks names by @fxmarty in https://github.com/huggingface/optimum/pull/967

What's Changed

  • Update ort trainer to transformers 4.27.2 by @JingyaHuang in https://github.com/huggingface/optimum/pull/917
  • Compute Loss inside the training step. by @AdamLouly in https://github.com/huggingface/optimum/pull/686
  • Fix ORTModel MRO for whisper by @fxmarty in https://github.com/huggingface/optimum/pull/919
  • add ORTStableDiffusionPipeline reference in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/890
  • Fix decoder ONNX model loading from the Hub by @fxmarty in https://github.com/huggingface/optimum/pull/929
  • optimun-cli onnxruntime quantize / optimize output argument is now required by @michaelbenayoun in https://github.com/huggingface/optimum/pull/927
  • Register mechanism for the Optimum CLI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/928
  • Ensure backward compatibility of ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/933
  • Update the README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/925
  • Update README by @echarlaix in https://github.com/huggingface/optimum/pull/941
  • Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/942
  • Remove GC from README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/943
  • Add user and token for CI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/945
  • Update README by @echarlaix in https://github.com/huggingface/optimum/pull/946
  • optimum-cli print the help of subcommands by @michaelbenayoun in https://github.com/huggingface/optimum/pull/940
  • Remove from_transformers references from the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/935
  • Turn command import into optional by @JingyaHuang in https://github.com/huggingface/optimum/pull/936
  • Auto-set use_merged to False if use_cache is passed as False by @fxmarty in https://github.com/huggingface/optimum/pull/954
  • Raise error with use_cache=False, use_io_binding=True by @fxmarty in https://github.com/huggingface/optimum/pull/955
  • Add an ORT training notebook by @JingyaHuang in https://github.com/huggingface/optimum/pull/959
  • Fix issue with doc build sometimes failing silently in GH workflows by @regisss in https://github.com/huggingface/optimum/pull/960
  • Fix typos by @regisss in https://github.com/huggingface/optimum/pull/963
  • Disable tests upon transformers 4.28 release by @fxmarty in https://github.com/huggingface/optimum/pull/976

New Contributors

  • @hirotasoshu made their first contribution in https://github.com/huggingface/optimum/pull/923
  • @VikParuchuri made their first contribution in https://github.com/huggingface/optimum/pull/932

Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.3...v1.8.2

- Python
Published by fxmarty about 3 years ago

optimum - v1.7.3: Patch release for PyTorch 2.0 and transformers 4.27.0

This patch release fixes a few bugs related to the PyTorch 2.0 release, and includes a few new features as well.

Breaking change: constant outputs removed from ONNX encoder-decoder models

We removed some constant past key values outputs from encoder-decoder models in the ONNX export. Beware that this could break your existing code, but we recommend using the newly exported models, as this removes unnecessary Identity nodes from the models.

  • Remove constant outputs from decoder with past ONNX model for encoder-decoder architectures by @fxmarty in https://github.com/huggingface/optimum/pull/872

torch.nn.functional.scaled_dot_product_attention support for decoders in BetterTransformer

PyTorch 2.0 introduces torch.nn.functional.scaled_dot_product_attention in beta, a fast path for attention that extends PyTorch's accelerated transformer features. It is included in optimum.bettertransformer for the following architectures: Bart, Blenderbot, GPT2, GPT-J, M2M100, Marian, Mbart, OPT, Pegasus, T5.

Beware that this is still experimental and speedups have yet to be validated on all architectures.

PyTorch's scaled_dot_product_attention makes it possible to use flash attention and memory-efficient attention natively in PyTorch.

Usage is as follows:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

model = BetterTransformer.transform(model)  # modify transformers modeling to use native scaled_dot_product_attention

# do your inference or training here

model = BetterTransformer.reverse(model)  # go back to using canonical transformers modeling
model.save_pretrained("gpt2_model")
```

Inference benchmark (on fp16):

| Model | batch size | Input sequence length | Generated tokens | Latency eager (s) | Latency BT (s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|--------------|------------|-----------------------|------------------|-------------------|----------------|---------|------------------------|---------------------|----------------|
| gpt2 | 1 | 64 | 256 | 1.800 | 1.607 | 12.0% | 569.90 | 569.89 | 0% |
| gpt2 | 64 | 64 | 256 | 2.159 | 1.617 | 33.5% | 2067.45 | 2093.80 | 0% |
| opt-1.3b | 1 | 64 | 256 | 3.010 | 2.667 | 12.9% | 5408.238 | 5408.238 | 0% |
| gpt-neox-20b | 1 | 64 | 256 | 10.869 | 9.937 | 9.4% | 83670.67 | 83673.53 | 0% |

Training benchmark (on fp16):

| Model | batch size | Sequence length | time/epoch (eager, s) | time/epoch (BT, s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|-------|------------|-----------------|-----------------------|--------------------|---------|------------------------|---------------------|----------------|
| gpt2 | 8 | 1024 | 17.732 | 14.037 | 26.3% | 13291.16 | 10191.52 | 30.4% |
| gpt2 | 32 | 1024 | 17.336 | 13.309 | 30.3% | 52834.83 | 38858.56 | 36.0% |
| gpt2 | 64 | 1024 | OOM | 14.067 | / | OOM | 75600.08 | / |
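The notes do not define how the Speedup and Memory savings columns are computed. The reported speedups, and the training memory savings, are consistent with a relative ratio of eager over BT minus one, so the sketch below assumes that formula. `relative_gain` is an illustrative helper, not part of the benchmark scripts.

```python
def relative_gain(eager, bt):
    """Relative improvement of BetterTransformer over eager mode, in percent.

    Assumed formula (eager / bt - 1); the release notes do not state it.
    """
    return round((eager / bt - 1) * 100, 1)


# (eager, BT) latencies in seconds, taken from the inference benchmark table.
inference = {
    "gpt2, batch size 1": (1.800, 1.607),
    "gpt2, batch size 64": (2.159, 1.617),
    "opt-1.3b, batch size 1": (3.010, 2.667),
    "gpt-neox-20b, batch size 1": (10.869, 9.937),
}
for name, (eager, bt) in inference.items():
    print(f"{name}: {relative_gain(eager, bt)}%")

# Training, gpt2 with batch size 8: time per epoch (s) and peak memory (MB).
train_speedup = relative_gain(17.732, 14.037)
train_memory = relative_gain(13291.16, 10191.52)
print(train_speedup, train_memory)
```

Under this assumption, a value of 30% means the eager run takes 1.3 times the time (or memory) of the BetterTransformer run.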

Benchmarks can be reproduced using the inference script and training script:

python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256

python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256 --seqlen-stdev 0

  • Add scaled_dot_product_attention support for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/853
  • Support scaled_dot_product_attention for t5 by @fxmarty in https://github.com/huggingface/optimum/pull/856
  • [BT] add decoder benchmark script by @younesbelkada in https://github.com/huggingface/optimum/pull/857
  • [BT] Fix bt benchmark by @younesbelkada in https://github.com/huggingface/optimum/pull/858
  • Fix pytorch version check in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/862
  • [BT] Add fp16 support by @younesbelkada in https://github.com/huggingface/optimum/pull/859
  • [BT] Add decoder training support by @younesbelkada in https://github.com/huggingface/optimum/pull/860
  • Bart support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/863
  • [BT] add accelerate_test markers by @younesbelkada in https://github.com/huggingface/optimum/pull/864
  • Mbart, pegasus, blenderbot, marian, m2m100 support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/865
  • Add bettertransformer reverse transform by @fxmarty in https://github.com/huggingface/optimum/pull/868
  • Add bettertransformer training benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/873

New architectures in the ONNX export

Three additional architectures are supported in the ONNX export: ImageGPT, RegNet, OPT.

  • Adding ONNX support for ImageGPT by @adit299 in https://github.com/huggingface/optimum/pull/819
  • Add ONNX support for RegNet by @asrimanth in https://github.com/huggingface/optimum/pull/833
  • Adding support for Facebook's OPT models by @hivaze in https://github.com/huggingface/optimum/pull/852

(WIP) TFLite export with quantization support

Continued progress in the TFLite export with quantization support. This is work in progress and not documented yet.

  • Quantization with TFLite by @michaelbenayoun in https://github.com/huggingface/optimum/pull/854

Bugfixes and improvements

  • Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/843
  • Fix typo in documentation by @regisss in https://github.com/huggingface/optimum/pull/848
  • Remove redundant code by @mht-sharma in https://github.com/huggingface/optimum/pull/841
  • Update README by @echarlaix in https://github.com/huggingface/optimum/pull/850
  • Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/855
  • Remove iobinding ORTModelForCTC by @mht-sharma in https://github.com/huggingface/optimum/pull/840
  • Fix typo in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/861
  • Fix causal-lm ONNX axis names by @fxmarty in https://github.com/huggingface/optimum/pull/871
  • add NNCF openvino notebook by @echarlaix in https://github.com/huggingface/optimum/pull/875
  • Remove positional-only parameters not support by python < v3.8 by @echarlaix in https://github.com/huggingface/optimum/pull/881
  • lazy import for task manager by @JingyaHuang in https://github.com/huggingface/optimum/pull/844
  • Remove onnx and ort dependencies on the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/846
  • Reactivate export & optimization tests for causal-lm models by @fxmarty in https://github.com/huggingface/optimum/pull/885
  • Fix ONNX export on transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/884
  • Do not use scaled_dot_product_attention for stable diffusion onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/888
  • Fix loading of an ONNX stable diffusion model when config doesn't match by @echarlaix in https://github.com/huggingface/optimum/pull/887
  • Automatic framework detection in TasksManager for large models by @fxmarty in https://github.com/huggingface/optimum/pull/883
  • Fix WavLM onnx export upon torch 2.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/889
  • Fix PushToHubMixin.createrepo according to transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/892
  • Fix stable diffusion framework detection by @fxmarty in https://github.com/huggingface/optimum/pull/893
  • Add donut CPU inference ORT by @mht-sharma in https://github.com/huggingface/optimum/pull/761
  • Fix check_model for large merged ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/896
  • Drop python 3.7 support by @fxmarty in https://github.com/huggingface/optimum/pull/891
  • Fix dummy label generator for vision tasks by @JingyaHuang in https://github.com/huggingface/optimum/pull/900
  • Add stable diffusion dummy object by @echarlaix in https://github.com/huggingface/optimum/pull/899
  • Automatic support for large ONNX models in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/886
  • Remove subprocess calls in ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/897
  • Registering mechanism for the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/898
  • add option to run inference with ort by @prathikr in https://github.com/huggingface/optimum/pull/838
  • Check min diffusers version by @echarlaix in https://github.com/huggingface/optimum/pull/902
  • Update bug-report.yml by @lewtun in https://github.com/huggingface/optimum/pull/895
  • Fix axis name for seq2seq ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/904
  • Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/909
  • Fix misleading error message in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/910
  • Delete all Docker images before building the doc of Optimum by @regisss in https://github.com/huggingface/optimum/pull/911
  • Fix onnx export preprocessors save by @fxmarty in https://github.com/huggingface/optimum/pull/913
  • Fix GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/914

New Contributors

  • @adit299 made their first contribution in https://github.com/huggingface/optimum/pull/819
  • @asrimanth made their first contribution in https://github.com/huggingface/optimum/pull/833
  • @hivaze made their first contribution in https://github.com/huggingface/optimum/pull/852

Full Changelog: https://github.com/huggingface/optimum/compare/v1.2.0...v1.7.2

- Python
Published by fxmarty about 3 years ago

optimum - v1.7.1: Patch release

Temporarily fix a critical bug in BetterTransformer https://github.com/huggingface/optimum/pull/849

Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.0...v1.7.1

- Python
Published by fxmarty over 3 years ago

optimum - v1.7.0: ONNX export extension, TFLite export, single-ONNX decoding, ONNX Runtime extension for audio, vision tasks, stable diffusion

New models supported in the ONNX export

Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.

  • Add PoolFormer support in exporters.onnx by @BakingBrains in https://github.com/huggingface/optimum/pull/646
  • Support pegasus exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/620
  • Audio models support with optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/622
  • Add MPNet ONNX export by @jplu in https://github.com/huggingface/optimum/pull/691
  • Add stable diffusion VAE encoder export by @echarlaix in https://github.com/huggingface/optimum/pull/705
  • Add vision encoder decoder model in exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/588
  • Nystromformer ONNX export by @whr778 in https://github.com/huggingface/optimum/pull/728
  • Support Splinter exporters (#555) by @Allanbeddouk in https://github.com/huggingface/optimum/pull/736
  • Add gpt-neo-x support by @sidthekidder in https://github.com/huggingface/optimum/pull/745

New models supported in BetterTransformer

A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer, Marian

  • Add RoCBert support for Bettertransformer by @shogohida in https://github.com/huggingface/optimum/pull/542
  • Add better transformer support for RoFormer by @manish-p-gupta in https://github.com/huggingface/optimum/pull/680
  • added BetterTransformer support for Marian by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/808

Additional tasks supported in the ONNX Runtime integration

The following classes are added: ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification, ORTStableDiffusionPipeline.

Reference: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models

  • Add ORTModelForMaskedLM class by @JingyaHuang in https://github.com/huggingface/optimum/pull/729
  • Add ORTModelForVision2Seq for VisionEncoderDecoder models inference by @mht-sharma in https://github.com/huggingface/optimum/pull/742
  • Add ORTModelXXX for audio by @mht-sharma in https://github.com/huggingface/optimum/pull/774
  • Add stable diffusion onnx runtime pipeline by @echarlaix in https://github.com/huggingface/optimum/pull/786

Support of the ONNX export from PyTorch on float16

In the ONNX export, it is possible to pass the options --fp16 --device cuda to export using float16 when a GPU is available, directly with the native torch.onnx.export.

Example: optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/

  • Support ONNX export on torch.float16 type by @fxmarty in https://github.com/huggingface/optimum/pull/749

TFLite export

TFLite export is now supported, with static shapes:

optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/

  • exporters.tflite initial support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/716
  • TFLite auto-encoder models by @michaelbenayoun in https://github.com/huggingface/optimum/pull/757
  • [TFLite Export] Adds support for ResNet by @sayakpaul in https://github.com/huggingface/optimum/pull/813

ONNX Runtime optimization and quantization directly in the CLI

  • Add optimize and quantize command CLI by @jplu in https://github.com/huggingface/optimum/pull/700
  • Support ONNX Runtime optimizations in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/807

The ONNX export optionally supports the ONNX Runtime optimizations directly in the export, passing the --optimize O1, up to --optimize O4 option:

optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
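When the export is driven from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below is only an illustration: `build_export_command` and `OPTIMIZE_LEVELS` are hypothetical names, not part of Optimum, and only flags quoted in these notes (`--model`, `--optimize`, `--fp16`, `--device cuda`) are used. It builds the argument list without running anything.

```python
import shlex

# Optimization levels named in these release notes (O1 up to O4).
OPTIMIZE_LEVELS = ("O1", "O2", "O3", "O4")


def build_export_command(model, output_dir, optimize=None, fp16=False):
    """Return the argv list for an `optimum-cli export onnx` call (hypothetical helper)."""
    argv = ["optimum-cli", "export", "onnx", "--model", model]
    if optimize is not None:
        if optimize not in OPTIMIZE_LEVELS:
            raise ValueError(f"unknown optimization level: {optimize!r}")
        argv += ["--optimize", optimize]
    if fp16:
        # These notes pair --fp16 with --device cuda.
        argv += ["--fp16", "--device", "cuda"]
    argv.append(output_dir)
    return argv


command = build_export_command("t5-small", "t5small_onnx/", optimize="O3")
print(shlex.join(command))
# optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
```

The resulting list could be handed to `subprocess.run(command, check=True)` on a machine where `optimum-cli` is installed.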

ONNX Runtime quantization is supported directly in command line, using optimum-cli onnxruntime quantize:

optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512

ONNX Runtime optimization is supported directly in command line, using optimum-cli onnxruntime optimize:

optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3

ORTModelForCausalLM supports decoding with a single ONNX

Up to now, two ONNX files were used for decoders:

  • One handling the first forward pass, where no past key values have been cached yet and so none are taken as input.
  • One handling the following forward passes, where past key values have been cached and are taken as input.

This release introduces support, in the ONNX export and in ORTModelForCausalLM, for a single ONNX handling both steps of the decoding. This reduces memory usage, as weights are not duplicated between two separate models during inference.

A single ONNX can be used for decoders by passing use_merged=True to ORTModelForCausalLM.from_pretrained, loading directly from a PyTorch model:

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)
```

Alternatively, using a single ONNX for decoders is the default behavior in the ONNX export, and the result can later be used, for example, with ORTModelForCausalLM. The command optimum-cli export onnx --model gpt2 gpt2_onnx/ will produce:

└── gpt2_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── merges.txt
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── vocab.json

The decoder_model.onnx and decoder_with_past_model.onnx are kept separate for backward compatibility, but during inference using solely decoder_model_merged.onnx is enough.
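A script that consumes such a directory may want to prefer the merged file and fall back to the two separate files for older exports. The following is a minimal sketch of that selection logic using only the file names listed above; `select_decoder_files` is a hypothetical helper, not an Optimum function, and the temporary directory merely stands in for a real export.

```python
from pathlib import Path
import tempfile

MERGED = "decoder_model_merged.onnx"
SPLIT = ("decoder_model.onnx", "decoder_with_past_model.onnx")


def select_decoder_files(export_dir):
    """Return the decoder ONNX file names to load from an export directory."""
    export_dir = Path(export_dir)
    if (export_dir / MERGED).is_file():
        # The merged file alone is enough for inference.
        return [MERGED]
    missing = [name for name in SPLIT if not (export_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing decoder files: {missing}")
    return list(SPLIT)


# Simulate an export directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (MERGED, *SPLIT):
        (Path(tmp) / name).touch()
    chosen = select_decoder_files(tmp)

print(chosen)
# ['decoder_model_merged.onnx']
```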

  • Enable inference with a merged decoder in ORTModelForCausalLM by @JingyaHuang in https://github.com/huggingface/optimum/pull/647

Single-file ORTModel accepts numpy arrays

ORTModel accepts numpy arrays as inputs, in addition to PyTorch tensors. This is only the case for models that use a single ONNX.

  • Accept numpy.ndarray as input and output to ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/790

ORTOptimizer support for ORTModelForCausalLM

  • ORTOptimizer support ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/794
  • Support IO Binding for merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/797

Breaking changes

  • In the ONNX export, exporting models in several ONNX (encoder, decoder) is now the default behavior: https://github.com/huggingface/optimum/pull/747. The old behavior is still accessible with --monolith.
  • In decoders, reusing past key values is now the default in the ONNX export: https://github.com/huggingface/optimum/pull/748. The old behavior is still accessible by explicitly passing, for example, --task causal-lm instead of --task causal-lm-with-past.
  • BigBird support in the ONNX export is removed, due to the block_sparse attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: https://github.com/huggingface/optimum/pull/778
  • The parameter from_transformers of ORTModel.from_pretrained will be deprecated in favor of export.

Bugfixes and improvements

  • Fix disable shape inference for optimization by @regisss in https://github.com/huggingface/optimum/pull/652
  • Fix uninformative message when passing use_cache=True to ORTModel and no ONNX with cache is available by @fxmarty in https://github.com/huggingface/optimum/pull/650
  • Fix provider options when several providers are passed by @fxmarty in https://github.com/huggingface/optimum/pull/653
  • Add TensorRT engine to ONNX Runtime GPU documentation by @fxmarty in https://github.com/huggingface/optimum/pull/657
  • Improve documentation around ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/666
  • minor updates on ONNX config guide by @mszsorondo in https://github.com/huggingface/optimum/pull/662
  • Fix FlaubertOnnxConfig by @michaelbenayoun in https://github.com/huggingface/optimum/pull/669
  • Use nvcr.io/nvidia/tensorrt image for GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/660
  • Better Transformer doc fix by @HamidShojanazeri in https://github.com/huggingface/optimum/pull/670
  • Add support for LongT5 optimization using ORT transformer optimizer script by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/683
  • Add test for missing execution providers error messages by @fxmarty in https://github.com/huggingface/optimum/pull/659
  • ONNX transformation to cast int64 constants to int32 when possible by @fxmarty in https://github.com/huggingface/optimum/pull/655
  • Add missing normalized configs by @fxmarty in https://github.com/huggingface/optimum/pull/694
  • Remove code duplication in ORTModel's load_model by @fxmarty in https://github.com/huggingface/optimum/pull/695
  • Test more architectures in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/675
  • Avoid initializing unwanted attributes for ORTModel's having several inference sessions by @fxmarty in https://github.com/huggingface/optimum/pull/696
  • Fix the ORTQuantizer loading from specific file by @echarlaix in https://github.com/huggingface/optimum/pull/701
  • Add saving of diffusion model additional components for onnx export by @echarlaix in https://github.com/huggingface/optimum/pull/699
  • Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/629
  • Support trust remote code option in ONNX export and ONNX Runtime integration by @fxmarty in https://github.com/huggingface/optimum/pull/702
  • Add nightly tests on dependencies dev versions by @fxmarty in https://github.com/huggingface/optimum/pull/703
  • Fix exception condition by @mht-sharma in https://github.com/huggingface/optimum/pull/706
  • Add ORTModelForMultipleChoice to the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/712
  • Fix yaml format for dev tests by @fxmarty in https://github.com/huggingface/optimum/pull/710
  • Add ONNX Runtime training benchmark by @JingyaHuang in https://github.com/huggingface/optimum/pull/592
  • Allow from optimum.onnxruntime import QuantizationConfig by @fxmarty in https://github.com/huggingface/optimum/pull/715
  • Fix documentation for doctest tests to pass by @fxmarty in https://github.com/huggingface/optimum/pull/713
  • Use transformers>=4.26.0 in setup.py by @fxmarty in https://github.com/huggingface/optimum/pull/723
  • Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/724
  • Fix ONNX Runtime inference in ORTTrainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/709
  • onnxruntime/modeling_ort.py refactor, part 1 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/698
  • Update docker and doc of ORT Trainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/725
  • Add test for code examples in the documentation and docstrings by @fxmarty in https://github.com/huggingface/optimum/pull/704
  • add image classification example to optimum by @prathikr in https://github.com/huggingface/optimum/pull/711
  • Add TensorrtExecutionProvider modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/722
  • Whisper shape inference fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/726
  • Add some redirections to Optimum Habana's documentation by @regisss in https://github.com/huggingface/optimum/pull/735
  • Patch ORTTrainer inference with ONNX Runtime backend by @JingyaHuang in https://github.com/huggingface/optimum/pull/737
  • Remove dead code in whisper ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/741
  • Unpin protobuf 3.20.1 by @fxmarty in https://github.com/huggingface/optimum/pull/738
  • Fix speech2text export by @mht-sharma in https://github.com/huggingface/optimum/pull/746
  • Raise error on double call to BetterTransformer.transform() by @fxmarty in https://github.com/huggingface/optimum/pull/750
  • exporters.onnx output names and dynamic axes fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/731
  • Fix NNCF supported quantization strategies README table by @echarlaix in https://github.com/huggingface/optimum/pull/752
  • Add GPU tests for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/751
  • Fix doctest by @fxmarty in https://github.com/huggingface/optimum/pull/759
  • Fix ONNX Runtime cache usage for decoders, add relevant tests by @fxmarty in https://github.com/huggingface/optimum/pull/756
  • Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/758
  • Update quality tooling for formatting by @regisss in https://github.com/huggingface/optimum/pull/760
  • Fix wrong shapes used at ONNX export and validation by @fxmarty in https://github.com/huggingface/optimum/pull/764
  • Change type annotation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/768
  • Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/762
  • Disable ONNX Runtime provider check on Windows by @fxmarty in https://github.com/huggingface/optimum/pull/771
  • Fix FusionOptions following ORT 1.14 release by @fxmarty in https://github.com/huggingface/optimum/pull/772
  • Unpin numpy <1.24.0 by @fxmarty in https://github.com/huggingface/optimum/pull/773
  • Fix flaky ONNX Runtime generation test with past key value reuse by @fxmarty in https://github.com/huggingface/optimum/pull/765
  • Fix output shape dimension for OnnxConfigWithPast by @fxmarty in https://github.com/huggingface/optimum/pull/780
  • Fix used shapes, device at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/777
  • Pin numpy only for tensorflow export by @fxmarty in https://github.com/huggingface/optimum/pull/781
  • Fixed broken paper space links by @Muhtasham in https://github.com/huggingface/optimum/pull/766
  • Temporarily disable python 3.9 + macOS test due to onnxruntime 1.14 regression by @fxmarty in https://github.com/huggingface/optimum/pull/783
  • Update ORT Training to 1.14.0 by @JingyaHuang in https://github.com/huggingface/optimum/pull/787
  • Temporarily disable segformer TensorRT test by @fxmarty in https://github.com/huggingface/optimum/pull/799
  • Use a stateful orderedinputnames in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/796
  • Test ORTOptimizer with IO Binding by @fxmarty in https://github.com/huggingface/optimum/pull/801
  • [BT] Add stable layer-norm Wav2vec2 by @younesbelkada in https://github.com/huggingface/optimum/pull/803
  • Update rules for ruff by @regisss in https://github.com/huggingface/optimum/pull/806
  • Improve orttrainer test by @JingyaHuang in https://github.com/huggingface/optimum/pull/779
  • Fix ORT quantization for TensorRT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/812
  • Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/814
  • Update ONNX Runtime training doc - use torchrun by @JingyaHuang in https://github.com/huggingface/optimum/pull/820
  • Fix ONNX export tests by @fxmarty in https://github.com/huggingface/optimum/pull/822
  • All back workflow dispatch on GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/823
  • BetterTransformer pipeline padding issue fix by @vrdn-23 in https://github.com/huggingface/optimum/pull/821
  • Fix optimum pipeline initialization by @fxmarty in https://github.com/huggingface/optimum/pull/824
  • Fix failing GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/829
  • Remove feature dimension as dynamic axes for stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/816
  • Fix pipeline task dropping arguments bug by @fxmarty in https://github.com/huggingface/optimum/pull/828
  • Fix ORTQuantizer behavior with ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/831
  • Update tests by @mht-sharma in https://github.com/huggingface/optimum/pull/826
  • Fix exporters GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/835
  • Keep intermediary models for ONNX causal-lm by @fxmarty in https://github.com/huggingface/optimum/pull/834
  • Fix duplicate name merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/837
  • Apply lazy import for exporters by @JingyaHuang in https://github.com/huggingface/optimum/pull/836

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.7.0

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.4: Patch release

Bugfix

  • Fix past key/value reuse in decoders following transformers 4.26.0 release and renaming: https://github.com/huggingface/optimum/commit/b9211d6826b92700e73f48821d6e14bd08226abc
  • ONNX Runtime 1.14 support: https://github.com/huggingface/optimum/pull/772

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.3...v1.6.4

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.3: Patch release

Fixes ORTTrainer for the inference with the ONNX Runtime backend.

- Python
Published by JingyaHuang over 3 years ago

optimum - v1.6.2: Patch release

Hotfixes

  • Support generation config in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/651

Regressions

The export of speech-to-text architectures as a single ONNX file (that handles both the encoding and decoding) fails due to a regression with the latest transformers version: https://github.com/huggingface/optimum/issues/721

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.1...v1.6.2

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.1: Patch release

Hotfixes

  • Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by @fxmarty in https://github.com/huggingface/optimum/pull/643
  • Fix item access of some TASKSTO_AUTOMODELS by @fxmarty in https://github.com/huggingface/optimum/pull/642

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/

  • Add Optimum CLI backbone by @fxmarty in https://github.com/huggingface/optimum/pull/593

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/

  • Add Stable Diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/570

BetterTransformer support for more architectures

BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT

The complete list of supported models is available in the documentation.

  • [BT] Add Bettertransformer support for FSMT by @Sumanth077 in https://github.com/huggingface/optimum/pull/494
  • [BT] add BetterTransformer support for ViLT architecture by @ka00ri in https://github.com/huggingface/optimum/pull/508
  • Add MBart support for BetterTransformer by @ravenouse in https://github.com/huggingface/optimum/pull/516
  • Add CLIP BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/534
  • Add BetterTransformer support for RemBERT by @hchings in https://github.com/huggingface/optimum/pull/545

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.

  • Add Swin support in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/528
  • [ONNX] add mobilenet support by @younesbelkada in https://github.com/huggingface/optimum/pull/633

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder or decoder-only models normally making use of the generate() method in transformers can now be exported in several files using the --for-ort argument:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx

yielding:

.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

When --for-ort is passed, the exported models are expected to be loadable directly into ORTModel.

  • Add ort export in exporters for encoder-decoder models by @mht-sharma in https://github.com/huggingface/optimum/pull/497
  • Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in https://github.com/huggingface/optimum/pull/554

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data when the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.

  • Handling ONNX models with external data by @NouamaneTazi in https://github.com/huggingface/optimum/pull/586
  • Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in https://github.com/huggingface/optimum/pull/332
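As a rough way to anticipate whether an export will spill into external data, the weight size can be estimated from the parameter count and dtype. The sketch below is a back-of-the-envelope estimate only: `needs_external_data` is a hypothetical helper, 2 GB is taken as 2 * 1024**3 bytes (an assumption), the parameter counts are illustrative, and graph overhead is ignored.

```python
# Assumed threshold: 2 GB expressed in bytes.
TWO_GB = 2 * 1024**3

# Storage size of one parameter for the dtypes mentioned in these notes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def needs_external_data(num_parameters, dtype="float32"):
    """Estimate whether the weights alone exceed the 2 GB threshold."""
    return num_parameters * BYTES_PER_PARAM[dtype] > TWO_GB


# Illustrative parameter counts, not measurements of specific checkpoints.
small = needs_external_data(124_000_000)                  # about 0.5 GB in float32
large = needs_external_data(1_500_000_000)                # about 6 GB in float32
large_fp16 = needs_external_data(1_000_000_000, "float16")  # about 2e9 bytes, just under

print(small, large, large_fp16)
# False True False
```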

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

  • ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model files regardless of their names, so optimized and quantized models can be loaded without having to specify a file name argument.
  • ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
  • ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
  • ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.

  • ONNX Runtime integration API improvement by @michaelbenayoun in https://github.com/huggingface/optimum/pull/515

Custom shapes support at ONNX export

The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.

Read more: optimum-cli export onnx --help

  • Support custom shapes for dummy inputs by @fxmarty in https://github.com/huggingface/optimum/pull/522
  • Support for custom input shapes in exporters onnx by @fxmarty in https://github.com/huggingface/optimum/pull/575

Enable use_cache=True for ORTModelForCausalLM

Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible using use_cache=True, which avoids recomputing them at each iteration of the decoding:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
```

  • Enable past_key_values for ORTModelForCausalLM by @echarlaix in https://github.com/huggingface/optimum/pull/326
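To give a feel for what reusing past key values saves, the toy count below tallies how many token positions need their keys/values computed over a decoding loop, with and without a cache. It is a simplified model rather than Optimum code: `count_kv_computations` is a hypothetical function, and one generated token per step is assumed.

```python
def count_kv_computations(num_steps, prompt_len, use_cache):
    """Count per-position key/value computations over a decoding loop."""
    computed = 0
    cached = 0  # positions whose keys/values are already stored
    seq_len = prompt_len
    for _ in range(num_steps):
        if use_cache:
            # Only positions not yet in the cache are computed.
            computed += seq_len - cached
            cached = seq_len
        else:
            # Every position is recomputed at each step.
            computed += seq_len
        seq_len += 1  # one generated token is appended
    return computed


without_cache = count_kv_computations(num_steps=4, prompt_len=8, use_cache=False)
with_cache = count_kv_computations(num_steps=4, prompt_len=8, use_cache=True)
print(without_cache, with_cache)
# 38 11
```

With the cache, the prompt is processed once and each later step handles a single new position, so the count grows linearly with the number of steps instead of quadratically.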

IO binding support for ORTModelForCustomTasks

ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.

  • Add IO binding support for custom ORTModel by @JingyaHuang in https://github.com/huggingface/optimum/pull/447

Experimental support to merge ONNX decoder with/without past key values

With --for-ort, passing --task causal-lm-with-past, --task seq2seq-with-past or --task speech2seq-lm-with-past during the ONNX export produces two models: one not using the previously computed keys/values, and one using them.

Experimental support is introduced to merge the two models into one. Example:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/

```python
import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
```

  • Merge ONNX decoder models by @JingyaHuang in https://github.com/huggingface/optimum/pull/587

Major bugs fixed

  • Fix BetterTransformer with padding="max_length" by @fxmarty in https://github.com/huggingface/optimum/pull/543
  • Fix non-nesting bug in BetterTransformer integration by @younesbelkada in https://github.com/huggingface/optimum/pull/637

Other changes, bugfixes and improvements

  • Fix doc-builder premission error by @mishig25 in https://github.com/huggingface/optimum/pull/482
  • Fix doc build pr premissions by @mishig25 in https://github.com/huggingface/optimum/pull/484
  • Re-order the task manager doc by @michaelbenayoun in https://github.com/huggingface/optimum/pull/483
  • Fix whisper device for gpu test by @fxmarty in https://github.com/huggingface/optimum/pull/486
  • Fix tensorflow CI by @fxmarty in https://github.com/huggingface/optimum/pull/489
  • Fix PR doc generation by @regisss in https://github.com/huggingface/optimum/pull/495
  • Fix broken links in the doc by @fxmarty in https://github.com/huggingface/optimum/pull/499
  • Update iobinding ORT encoder whisper by @mht-sharma in https://github.com/huggingface/optimum/pull/498
  • fix NormalizedConfig init error message by @PaulQbFeng in https://github.com/huggingface/optimum/pull/500
  • Change import structure for ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/456
  • [BT] Fix failing CI tests by @younesbelkada in https://github.com/huggingface/optimum/pull/501
  • Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in https://github.com/huggingface/optimum/pull/504
  • [BT] put decorator on the correct place by @younesbelkada in https://github.com/huggingface/optimum/pull/509
  • [BT] clearer error message for norm_first by @younesbelkada in https://github.com/huggingface/optimum/pull/510
  • Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/513
  • Fix ORTModelForSeq2SeqLM test by @fxmarty in https://github.com/huggingface/optimum/pull/455
  • Clearer error messages when initializing the requested ONNX Runtime execution provider fails by @fxmarty in https://github.com/huggingface/optimum/pull/514
  • [BT] Fix doc bugs by @younesbelkada in https://github.com/huggingface/optimum/pull/517
  • Replace sklearn by scikit-learn by @lesteve in https://github.com/huggingface/optimum/pull/502
  • ORTModel uses optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/490
  • Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in https://github.com/huggingface/optimum/pull/523
  • Added support for Tapas Model by @JuheonChu in https://github.com/huggingface/optimum/pull/520
  • Add benchmark results to gpu doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/525
  • ORTModelForConditionalGeneration uses optimum.exporters.onnx by @mht-sharma in https://github.com/huggingface/optimum/pull/529
  • Better error message when wrong task is given to exporters by @fxmarty in https://github.com/huggingface/optimum/pull/531
  • Add OrtModelForSpeechSeq2Seq to doc by @fxmarty in https://github.com/huggingface/optimum/pull/533
  • Fold sections by default in the documentation's side-bar by @regisss in https://github.com/huggingface/optimum/pull/535
  • Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by @regisss in https://github.com/huggingface/optimum/pull/536
  • Add check_if_transformers_greater to manage different versions of transformers by @regisss in https://github.com/huggingface/optimum/pull/537
  • Enable to push some sections to the end of the TOC in the doc by @regisss in https://github.com/huggingface/optimum/pull/532
  • Fix import in ONNX export CLI by @fxmarty in https://github.com/huggingface/optimum/pull/553
  • Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/550
  • Refactor of 2 functions used in ORTModel by @michaelbenayoun in https://github.com/huggingface/optimum/pull/551
  • Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/556
  • Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by @JingyaHuang in https://github.com/huggingface/optimum/pull/561
  • Fix flaky BetterTransformer test by @fxmarty in https://github.com/huggingface/optimum/pull/564
  • enable FP16Optimizer for fp16 deepspeed training. by @AdamLouly in https://github.com/huggingface/optimum/pull/547
  • Update documentation quick tour section by @echarlaix in https://github.com/huggingface/optimum/pull/574
  • Move custom IOBinding to IOBindingHelper by @JingyaHuang in https://github.com/huggingface/optimum/pull/571
  • Add test for exporters.onnx CLI by @fxmarty in https://github.com/huggingface/optimum/pull/573
  • Documentation on quantization by @michaelbenayoun in https://github.com/huggingface/optimum/pull/565
  • More robust tests for ORTModel using decoders and use_cache=True by @fxmarty in https://github.com/huggingface/optimum/pull/576
  • Fix errors in onnxruntime modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/585
  • [BT] fix flaky test by @younesbelkada in https://github.com/huggingface/optimum/pull/591
  • Fix exporters onnx shapes by @fxmarty in https://github.com/huggingface/optimum/pull/581
  • Fix exporters.onnx tests by @fxmarty in https://github.com/huggingface/optimum/pull/584
  • Update on the ONNX Runtime documentation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/567
  • Add the ORTModelForSemanticSegmentation class by @TheoMrc in https://github.com/huggingface/optimum/pull/539
  • Refactor BetterTransformer to be able to raise more informative error messages by @fxmarty in https://github.com/huggingface/optimum/pull/594
  • Temporarily constrain NumPy version to save CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/614
  • Add encoder_last_hidden_state as an output for encoder-decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/601
  • Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/617
  • Fix documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/603
  • Documentation improvements by @fxmarty in https://github.com/huggingface/optimum/pull/598
  • More informative message at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/609
  • Use optimum exporter for current weight sharing test by @JingyaHuang in https://github.com/huggingface/optimum/pull/616
  • OnnxConfig now handle the export to encoder / decoder / decoder_with_past themselves by @michaelbenayoun in https://github.com/huggingface/optimum/pull/590
  • Set explicitly the device index by @JingyaHuang in https://github.com/huggingface/optimum/pull/613
  • Fix ORT GPU test by @JingyaHuang in https://github.com/huggingface/optimum/pull/624
  • Add GPT-J normalized config by @fxmarty in https://github.com/huggingface/optimum/pull/623
  • Remove diffusers dependency in onnxruntime code by @fxmarty in https://github.com/huggingface/optimum/pull/619
  • Use exporters in ORTTrainer by @mht-sharma in https://github.com/huggingface/optimum/pull/546
  • Improve use_io_binding default value for different execution providers by @JingyaHuang in https://github.com/huggingface/optimum/pull/604
  • fixed FuseBiasInLinear by specifying device by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/630
  • Fixed GPU documentation for HF pipelines by @smiraldr in https://github.com/huggingface/optimum/pull/602
  • Add argument in the CLI to specify device to do the ONNX export on by @fxmarty in https://github.com/huggingface/optimum/pull/634
  • Allow kwargs in all generate_dummy_inputs() methods by @fxmarty in https://github.com/huggingface/optimum/pull/638

Full Changelog: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @TheoMrc
    • Add ORTModelForSemanticSegmentation https://github.com/huggingface/optimum/pull/539
  • @ravenouse
    • Add MBart support for BetterTransformer https://github.com/huggingface/optimum/pull/516
  • @ka00ri
    • Add BetterTransformer support for ViLT architecture https://github.com/huggingface/optimum/pull/508
  • @Sumanth077
    • Add Bettertransformer support for FSMT https://github.com/huggingface/optimum/pull/494

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.2: Patch release

Temporarily constrain numpy<1.24.0 (#614)

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.1: Patch release

Deprecate PyTorch 1.12. for BetterTransformer with better error message (#513)

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime

BetterTransformer

Convert your model into its PyTorch BetterTransformer format using a one-liner with the new BetterTransformer integration for faster inference on CPU and GPU!

```python
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
```

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

  • BetterTransformer integration (#423)
  • ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device. This yields a significant inference speedup during the decoding process on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn off the IOBinding in case of any memory issue:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
```

Contributions

  • Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, among which BERT, GPT-Neo, Bloom, T5, ViT, Whisper, CLIP.

The export can be done via the CLI:

```bash
python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
```

For more information, check the documentation.

Contributions

  • optimum.exporters creation (#403)
  • Automatic task detection (#445)

Whisper

  • Whisper can be exported to ONNX using optimum.exporters.
  • Whisper can also be exported and run using optimum.onnxruntime; IO binding is also supported.

Note: For now, the export from optimum.exporters is not usable by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.

Contributions

  • Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

  • ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
  • ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
  • ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
  • Fixes and updates in the documentation (#411, #432, #437, #441)
  • Fixes IOBinding (#454, #461)

- Python
Published by michaelbenayoun over 3 years ago

optimum - v1.4.1: Patch release

  • Add inference with ORTModel to ORTTrainer and ORTSeq2SeqTrainer #189
  • Add InferenceSession options and provider to ORTModel #271
  • Add mT5 (#341) and Marian (#393) support to ORTOptimizer
  • Add batchnorm folding torch.fx transformations #348
  • The torch.fx transformations now use the marking methods mark_as_transformed, mark_as_restored, get_transformed_nodes #385
  • Update BaseConfig for transformers 4.22.0 release #386
  • Update ORTTrainer for transformers 4.22.1 release #388
  • Add extra ONNX Runtime quantization options #398
  • Add possibility to pass provider_options to ORTModel #401
  • Add support to pass a specific device for ORTModel, as transformers does for pipelines #427
  • Fixes to support onnxruntime 1.13.1 #430
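
The batchnorm folding transformations listed above rest on a standard algebraic identity: a linear layer followed by batch normalization in inference mode can be rewritten as a single linear layer. The sketch below checks that identity in plain Python for one output feature. It is only an illustration, not the optimum.fx implementation, and helper names such as `fold_batchnorm` are hypothetical.

```python
import math


def linear(weights, bias, x):
    """Single output feature: y = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch normalization of a scalar activation."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Return (weights, bias) of one linear layer equivalent to linear + batchnorm."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


# Arbitrary illustrative values (not taken from any real model).
weights, bias = [0.5, -1.0, 2.0], 0.1
mean, var, gamma, beta = 0.3, 4.0, 1.5, -0.2
x = [1.0, 2.0, 3.0]

reference = batchnorm(linear(weights, bias, x), mean, var, gamma, beta)
folded_weights, folded_bias = fold_batchnorm(weights, bias, mean, var, gamma, beta)
folded = linear(folded_weights, folded_bias, x)

# Both paths agree up to floating-point rounding.
print(abs(reference - folded) < 1e-9)
```

After folding, the normalization statistics no longer need to be applied at inference time, which removes one operation per folded layer.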

- Python
Published by echarlaix over 3 years ago

optimum - v1.4.0: ORTQuantizer and ORTOptimizer refactorization

ONNX Runtime

  • Refactorization of ORTQuantizer (#270) and ORTOptimizer (#294)
  • Add ONNX Runtime fused Adam Optimizer (#295)
  • Add ORTModelForCustomTasks allowing ONNX Runtime inference support for custom tasks (#303)
  • Add ORTModelForMultipleChoice allowing ONNX Runtime inference for models with multiple choice classification head (#358)

Torch FX

  • Add FuseBiasInLinear a transformation that fuses the weight and the bias of linear modules (#253)
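
One way to picture fusing a bias into a weight matrix is to append the bias as an extra weight column and feed a constant 1 as an extra input feature. The sketch below shows that identity in plain Python. It illustrates the idea under that assumption rather than reproducing the FuseBiasInLinear code, and the helper names are hypothetical.

```python
def linear(weight, bias, x):
    """Reference layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_bias(weight, bias):
    """Append each bias term to its weight row, giving an (out, in + 1) matrix."""
    return [row + [b] for row, b in zip(weight, bias)]


def fused_linear(fused_weight, x):
    """Apply the fused matrix to the input extended with a constant 1."""
    x_ext = x + [1.0]
    return [sum(w * xi for w, xi in zip(row, x_ext)) for row in fused_weight]


weight = [[1.0, 2.0], [3.0, 4.0]]
bias = [0.5, -0.5]
x = [2.0, -1.0]

expected = linear(weight, bias, x)
actual = fused_linear(fuse_bias(weight, bias), x)
print(expected == actual)
```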

Improvements and bugfixes

  • Enable the possibility to disregard the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
  • Enable node exclusion from quantization for benchmark suite (#284)
  • Enable possibility to use a token authentication when loading a calibration dataset (#289)
  • Fix optimum pipeline when no model is given (#301)

- Python
Published by echarlaix almost 4 years ago

optimum - v1.3.0: Torch FX transformations, ORTModelForSeq2SeqLM and ORTModelForImageClassification

Torch FX

The optimum.fx.optimization module (#232) provides a set of torch.fx graph transformations, along with classes and functions to write your own transformations and compose them.

  • The Transformation and ReversibleTransformation classes represent non-reversible and reversible transformations respectively, and such transformations can be written by inheriting from them
  • The compose utility function enables transformation composition
  • Two reversible transformations were added:
    • MergeLinears: merges linear layers that have the same input
    • ChangeTrueDivToMulByInverse: changes a division by a static value to a multiplication of its inverse
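
To make the distinction between reversible and non-reversible transformations concrete, here is a toy model in plain Python. It works on a list of arithmetic steps rather than a torch.fx graph, and every name in it (`Step`, `DivToMulByInverse`, `DropAddZero`, this `compose`) is hypothetical and only mimics the concepts described above, not the optimum.fx.optimization API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    op: str                       # "mul", "div" or "add"
    value: float
    origin: Optional[str] = None  # set when a transformation rewrote this step


def run(program: List[Step], x: float) -> float:
    """Apply every step of the toy program to x."""
    for step in program:
        if step.op == "mul":
            x = x * step.value
        elif step.op == "div":
            x = x / step.value
        elif step.op == "add":
            x = x + step.value
        else:
            raise ValueError(f"unknown op: {step.op}")
    return x


class DivToMulByInverse:
    """Reversible: rewritten steps are marked, so they can be restored."""

    def transform(self, program):
        return [Step("mul", 1.0 / s.value, origin="div") if s.op == "div" else s
                for s in program]

    def reverse(self, program):
        return [Step("div", 1.0 / s.value) if s.origin == "div" else s
                for s in program]


class DropAddZero:
    """Non-reversible: once `x + 0` steps are removed, their position is lost."""

    def transform(self, program):
        return [s for s in program if not (s.op == "add" and s.value == 0.0)]


def compose(*transformations):
    """Chain transformations, applying them from left to right."""
    def composed(program):
        for transformation in transformations:
            program = transformation.transform(program)
        return program
    return composed


program = [Step("div", 4.0), Step("add", 0.0), Step("mul", 3.0)]
optimized = compose(DivToMulByInverse(), DropAddZero())(program)

print(run(program, 8.0) == run(optimized, 8.0))
```

Marking rewritten steps is what makes the first transformation reversible: `reverse` only touches steps it changed, so unrelated multiplications stay untouched.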

ORTModelForSeq2SeqLM

ORTModelForSeq2SeqLM (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.

  • When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
  • This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values hidden-states, while during the rest of the generation past key/values will be used to speed up sequential decoding.
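
The control flow implied by this three-part export can be sketched with stand-in functions. Nothing below is ONNX Runtime or optimum code: `encoder`, `decoder` and `decoder_with_past` are hypothetical toy callables that only mimic which part runs at which generation step.

```python
def encoder(input_ids):
    # Stand-in for the exported encoder: returns fake "hidden states".
    return [float(i) for i in input_ids]


def decoder(encoder_states, token):
    # First step: there is no cache yet, so the plain decoder runs and creates it.
    past = [token]
    next_token = (int(sum(encoder_states)) + token) % 7
    return next_token, past


def decoder_with_past(encoder_states, token, past):
    # Later steps: the cached values are passed in and extended.
    past = past + [token]
    next_token = (int(sum(encoder_states)) + sum(past)) % 7
    return next_token, past


def generate(input_ids, start_token=0, eos_token=3, max_new_tokens=5):
    """Greedy loop that records which exported part handled each step."""
    encoder_states = encoder(input_ids)  # the encoder runs only once
    token, past = decoder(encoder_states, start_token)
    calls = ["decoder"]
    output = [token]
    while token != eos_token and len(output) < max_new_tokens:
        token, past = decoder_with_past(encoder_states, token, past)
        calls.append("decoder_with_past")
        output.append(token)
    return output, calls


output, calls = generate([1, 2, 3])
print(calls)
```

Only the first generated token goes through the decoder without past key/values; every following token reuses the cache.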

Below is an example that downloads a T5 model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load model from hub and export it through the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```

ORTModelForImageClassification

ORTModelForImageClassification (#226) allows ONNX Runtime inference for models with an image classification head.

Below is an example that downloads a ViT model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```

ORTOptimizer

Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (fp16) to OptimizationConfig (#273).
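
To get a feel for the precision trade-off of such a conversion, the sketch below rounds a few values to IEEE 754 half precision using only the standard library. It is an illustration of fp16 rounding, not of how ORTOptimizer converts weights; note that Python floats are 64-bit, so they stand in for fp32 values here, and `to_fp16` is a hypothetical helper.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    struct raises OverflowError for magnitudes beyond the fp16 range.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


weights = [0.1, 1.0, -2.5, 1234.5678]
for w in weights:
    half = to_fp16(w)
    print(f"{w!r} -> {half!r} (absolute error {abs(w - half):.3e})")
```

Values that are exactly representable in half precision, such as 1.0 or -2.5, survive unchanged, while others pick up a small rounding error that grows with magnitude.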

Pipelines

Additional pipeline tasks are now supported, each with a default model.

Below is an example that downloads a T5 small model from the Hub and loads it with transformers pipeline for translation:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```

Breaking change

The ORTModelForXXX execution provider default value is now set to CPUExecutionProvider (#203). Previously, if no execution provider was given, it was set to CUDAExecutionProvider if a GPU was detected, or to CPUExecutionProvider otherwise.
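
The behavior change can be summarized as a small decision function. This is a hypothetical helper written only to contrast the two defaults described above; it is not part of optimum.

```python
def default_provider(gpu_available: bool, legacy: bool = False) -> str:
    """Return the execution provider chosen when the user specifies none.

    legacy=True models the previous behavior, legacy=False the new one.
    """
    if legacy and gpu_available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"


for gpu_available in (False, True):
    print(gpu_available, default_provider(gpu_available, legacy=True), default_provider(gpu_available))
```

The only case that changes is a machine with a GPU and no explicit provider: it now runs on CPU unless the provider is requested explicitly.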

- Python
Published by echarlaix almost 4 years ago

optimum - v1.2.3: Patch release

  • Remove intel sub-package, migrating to optimum-intel (#212)
  • Fix the loading and saving of ORTModel optimized and quantized models (#214)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.2: Patch release

  • Extend QuantizationPreprocessor to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
  • Introduce unified approach to create transformers vs optimized models benchmark (https://github.com/huggingface/optimum/pull/194)
  • Bump huggingface_hub version and protobuf fix (https://github.com/huggingface/optimum/pull/205)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.1: Patch release

Add support to Python version 3.7 (https://github.com/huggingface/optimum/pull/176)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.0: pipeline and AutoModelForXxx classes to run ONNX Runtime inference

ORTModel

ORTModelForXXX classes such as ORTModelForSequenceClassification were integrated with the Hugging Face Hub in order to easily export models through the ONNX format, load ONNX models, as well as easily save the resulting model and push it to the 🤗 Hub by using respectively the save_pretrained and push_to_hub methods. An already optimized and / or quantized ONNX model can also be loaded using the ORTModelForXXX classes using the from_pretrained method.

Below is an example that downloads a DistilBERT model from the Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```

Pipelines

Built-in support for transformers pipelines was added. This allows us to leverage the same API used from Transformers, with the power of accelerated runtimes such as ONNX Runtime.

The currently supported tasks with the default model for each are the following:

  • Text Classification (DistilBERT model fine-tuned on SST-2)
  • Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
  • Token Classification (BERT large fine-tuned on CoNLL2003)
  • Feature Extraction (DistilBERT)
  • Zero Shot Classification (BART model fine-tuned on MNLI)
  • Text Generation (DistilGPT2)

Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format and loads it with transformers pipeline for question-answering.

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load the vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model using the transformers pipeline, with handle_impossible_answer for squad_v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```

Improvements

  • Add the loss when performing the evaluation step with an instance of ORTTrainer; this was previously not enabled when inference was performed with ONNX Runtime (#152)

- Python
Published by echarlaix about 4 years ago

optimum - v1.1.1: Patch release

Habana

ONNX Runtime

  • Add the possibility to specify the execution provider in ORTModel.
  • Add IncludeFullyConnectedNodes class to find the nodes composing the fully connected layers in order to (only) target the latter for quantization to limit the accuracy drop.
  • Update QuantizationPreprocessor so that the intersection of the two sets representing the nodes to quantize and the nodes to exclude from quantization is an empty set.
  • Rename Seq2SeqORTTrainer to ORTSeq2SeqTrainer for clarity and to keep consistency.
  • Add ORTOptimizer support for ELECTRA models.
  • Fix the loading of pretrained ORTConfig which contains optimization and quantization config.

- Python
Published by JingyaHuang about 4 years ago

optimum - v1.1.0: ORTTrainer, Seq2SeqORTTrainer, ONNX Runtime optimization and quantization API improvements

ORTTrainer and Seq2SeqORTTrainer

The ORTTrainer and Seq2SeqORTTrainer are two new experimental classes.

  • Both ORTTrainer and Seq2SeqORTTrainer were created to have a user-facing API similar to that of the Trainer and Seq2SeqTrainer of the Transformers library.
  • ORTTrainer allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime runs the forward and backward passes using an optimized, automatically exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
  • ORTTrainer allows the usage of ONNX Runtime inferencing during both the evaluation and the prediction step.
  • For Seq2SeqORTTrainer, ONNX Runtime inferencing is incompatible with --predict_with_generate, as the generate method is not supported yet.

ONNX Runtime optimization and quantization APIs improvements

The ORTQuantizer and ORTOptimizer classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.

  • Addition of the possibility to iteratively compute the quantization activation ranges when applying static quantization by using the ORTQuantizer method partial_fit. This is especially useful when using memory-hungry calibration methods such as Entropy and Percentile methods.
  • When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activations quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
  • The classes OptimizationConfig, QuantizationConfig and CalibrationConfig were added in order to better segment the different ONNX Runtime related parameters instead of having one unique configuration ORTConfig.
  • The QuantizationPreprocessor class was added in order to find the nodes to include and / or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm for example). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
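
The difference between a global MinMax range and a moving average of per-batch ranges, both computed iteratively one calibration batch at a time, can be sketched in plain Python. This is an assumption-laden illustration: the update rule and the `momentum` parameter are hypothetical and may not match what ONNX Runtime or ORTQuantizer.partial_fit do internally.

```python
def global_min_max(batches):
    """Track the overall minimum and maximum, one batch at a time."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo, hi = min(lo, min(batch)), max(hi, max(batch))
    return lo, hi


def moving_average_min_max(batches, momentum=0.5):
    """Smooth the per-batch minimum and maximum with an exponential moving average."""
    lo = hi = None
    for batch in batches:
        batch_lo, batch_hi = min(batch), max(batch)
        if lo is None:
            lo, hi = batch_lo, batch_hi
        else:
            lo = lo + momentum * (batch_lo - lo)
            hi = hi + momentum * (batch_hi - hi)
    return lo, hi


# Toy activations; the last batch contains outliers.
batches = [[-1.0, 0.5, 1.0], [-0.5, 0.25, 0.5], [-8.0, 0.0, 16.0]]

print(global_min_max(batches))
print(moving_average_min_max(batches))
```

Because each batch is consumed and then discarded, neither function needs the whole calibration set in memory. The moving average also dampens the effect of a single batch with outliers, at the cost of clipping them.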

- Python
Published by echarlaix about 4 years ago

optimum - v1.0.0: ONNX Runtime optimization and quantization support

ONNX Runtime support

  • An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
  • The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of ORTOptimizer, the user needs to provide an ORTConfig object, defining the export and graph-level transformation information. Then optimization can be performed by calling the ORTOptimizer.fit method.
  • ONNX Runtime static and dynamic quantization can also be applied on a model by using the newly added ORTQuantizer class. In order to create an instance of ORTQuantizer, the user needs to provide an ORTConfig object, defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Then quantization can be applied by calling the ORTQuantizer.fit method.
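
As background for the quantization options mentioned above, the sketch below shows affine quantization of floating-point values to signed 8-bit integers in plain Python. It is a generic textbook formulation with hypothetical helper names, not the ONNX Runtime or ORTQuantizer implementation.

```python
def quantization_params(lo, hi, qmin=-128, qmax=127):
    """Compute (scale, zero_point) mapping the range [lo, hi] onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers, clamping to the target range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


activations = [-1.0, -0.25, 0.0, 0.5, 3.0]

# Dynamic quantization derives the range from the tensor at hand;
# static quantization would instead reuse a range computed during calibration.
scale, zero_point = quantization_params(min(activations), max(activations))
quantized = quantize(activations, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)

print(quantized)
print(max(abs(a - r) for a, r in zip(activations, restored)) <= scale)
```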

Additional features for Intel Neural Compressor

We have also added a new class called IncOptimizer which will take care of combining the pruning and the quantization processes.

- Python
Published by echarlaix over 4 years ago

optimum - v0.1.2: Intel Neural Compressor's pruning support

With this release, we enable Intel Neural Compressor v1.8 magnitude pruning for a variety of NLP tasks with the introduction of IncTrainer which handles the pruning process.
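
Magnitude pruning removes the weights with the smallest absolute values. The sketch below shows the core idea on a flat list of weights; it is a generic illustration with a hypothetical `magnitude_prune` helper and does not reflect the schedule or criteria used by Intel Neural Compressor or IncTrainer.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4]
sparse_weights = magnitude_prune(weights, sparsity=0.5)
print(sparse_weights)
```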

- Python
Published by echarlaix over 4 years ago

optimum - v0.1.1: Intel Neural Compressor's dynamic, post-training and aware-training quantization support

With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training and aware-training quantization for a variety of NLP tasks. This support includes the overall process, from quantization application to the loading of the resulting quantized model. The latter is enabled by the introduction of the IncQuantizedModel class.

- Python
Published by echarlaix over 4 years ago

optimum - Optimum v0.0.1 - EAP

Initial release for early access to Optimum library featuring Intel's LPOT quantization and pruning support.

- Python
Published by mfuntowicz almost 5 years ago