optimum - v1.27.0: Last release before v2, Transformers 4.53 support, SmolLM3, VisualBert...

🚀 Major Upgrades

Transformers v4.53 support and SmolLM3 model addition by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2326
Batched inference support across all decoders by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2319
VisualBert support by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2303
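
As a rough illustration of what batched decoding looks like from the user side, the sketch below pads several prompts into one batch and generates for all of them in a single call. The helper name `generate_batch`, the left-padding choice, and the pad-token fallback are assumptions made for illustration; they are not taken from the linked PRs.

```python
def generate_batch(model_id, prompts, max_new_tokens=32):
    """Generate continuations for several prompts in one batched call."""
    # Imported lazily so that importing this module stays cheap.
    from optimum.onnxruntime import ORTModelForCausalLM
    from transformers import AutoTokenizer

    # Assumption: decoder-only models are left-padded for generation.
    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
    if tokenizer.pad_token is None:
        # Assumption: fall back to EOS when the tokenizer defines no pad token.
        tokenizer.pad_token = tokenizer.eos_token

    model = ORTModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `generate_batch("HuggingFaceTB/SmolLM3-3B", ["Hello", "The capital of France is"])` would download and export the checkpoint on first use; the model id here is only an example choice.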

🔧 Enhancements & Fixes

Fix taskmanager by @echarlaix in https://github.com/huggingface/optimum/pull/2296
Add task onnx register by @echarlaix in https://github.com/huggingface/optimum/pull/2291
ExporterConfig refactorization by @echarlaix in https://github.com/huggingface/optimum/pull/2157
remove timm from exporters extra by @echarlaix in https://github.com/huggingface/optimum/pull/2299
No more forcing separators by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2279
Fix broken Trainer documentation link in README by @VolodymyrBg in https://github.com/huggingface/optimum/pull/2304
Propagate library_name parameter in from_pretrained to export by @tomaarsen in https://github.com/huggingface/optimum/pull/2328
Fix 'Block pattern could not be match. Pass block_name_to_quantize argument in quantize_model' while loading Qwen VL GPTQ model by @arunmadhusud in https://github.com/huggingface/optimum/pull/2295

🧹 Deprecations & v2

Support for TFLite, BetterTransformer, and ONNXRuntime‑Training is deprecated; these integrations will be fully removed in v2.
TensorFlow model export will be removed in v2, consistent with the Transformers library dropping TF/JAX support.
ONNX and ONNXRuntime integrations will move into the new Optimum‑ONNX package.
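
Code that still depends on the deprecated integrations may want to detect which major version is installed before importing them. The following is a minimal sketch of such a guard; the helper names `installed_major` and `deprecated_integrations_expected` are hypothetical and not part of Optimum.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major(dist_name="optimum"):
    """Return the installed major version of a distribution, or None if absent."""
    try:
        return int(version(dist_name).split(".")[0])
    except PackageNotFoundError:
        return None


def deprecated_integrations_expected(major):
    """True while the installed major version predates the announced v2 removals."""
    return major is not None and major < 2
```

A project could call `deprecated_integrations_expected(installed_major())` at startup and fail early with a clear message instead of hitting an import error later.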

New Contributors

@dependabot[bot] made their first contribution in https://github.com/huggingface/optimum/pull/2292
@arunmadhusud made their first contribution in https://github.com/huggingface/optimum/pull/2295
@VolodymyrBg made their first contribution in https://github.com/huggingface/optimum/pull/2304

Full Changelog: https://github.com/huggingface/optimum/compare/v1.26.1...v1.27.0

- Python
Published by IlyasMoutawwakil 11 months ago

optimum - v1.26.1: Patch release

Add back from_transformers for base model by @echarlaix in https://github.com/huggingface/optimum/pull/2288

- Python
Published by echarlaix about 1 year ago

optimum - v1.26.0: ColPali, D-FINE, InternLM2

ONNX export

D-FINE support by @xenova in https://github.com/huggingface/optimum/pull/2249
ColPali support by @Balladie in https://github.com/huggingface/optimum/pull/2251
InternLM2 support by @gmf14 in https://github.com/huggingface/optimum/pull/2244
Chinese CLIP support by @xenova in https://github.com/huggingface/optimum/pull/1591

New features & enhancements

Add onnxslim support by @inisis in https://github.com/huggingface/optimum/pull/2258
Introduce ORTSessionMixin and enable general io binding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2234
Fix and uniformize hub kwargs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2276
Add compatibility with transformers 4.52 by @echarlaix in https://github.com/huggingface/optimum/pull/2270
Distribute and complete onnxruntime tests (decoder models) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2278
Add ONNX Runtime optimization support for ModernBERT by @amas0 in https://github.com/huggingface/optimum/pull/2208

New Contributors

@inisis made their first contribution in https://github.com/huggingface/optimum/pull/2258
@Balladie made their first contribution in https://github.com/huggingface/optimum/pull/2251
@gmf14 made their first contribution in https://github.com/huggingface/optimum/pull/2244
@amas0 made their first contribution in https://github.com/huggingface/optimum/pull/2208

- Python
Published by echarlaix about 1 year ago

optimum - v1.25.3: Patch release

Fix ORT pipelines by @echarlaix in https://github.com/huggingface/optimum/pull/2274

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.2...v1.25.3

- Python
Published by echarlaix about 1 year ago

optimum - v1.25.2: Patch release

What's Changed

Upgrade optimum-intel in setup extras by @echarlaix in https://github.com/huggingface/optimum/pull/2271
Match transformers behavior with return_dict by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2269

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.1...v1.25.2

- Python
Published by IlyasMoutawwakil about 1 year ago

optimum - v1.25.1: Patch release

What's Changed

Updated readme/pypi page by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2268
Fix bug ORTModelForFeatureExtraction by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2267
Fix doc TPU section by @echarlaix in https://github.com/huggingface/optimum/pull/2265

Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.0...v1.25.1

- Python
Published by IlyasMoutawwakil about 1 year ago

optimum - v1.25.0: ViTPose, RT-DETR, EfficientNet, Moonshine ONNX

:rocket: New Features & Enhancements

Add ONNX export support for ViTPose, RT-DETR, EfficientNet, Moonshine
Infer if the model needs to be exported to ONNX during loading

```diff
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"
- model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
```

Transformers v4.49, v4.50 and v4.51 compatibility

:busts_in_silhouette: New Contributors

A huge thank you to our first-time contributors:

@ruidazeng
@ariG23498
@janak2
@qubvel
@zhxchen17
@xieofxie
@EFord36
@Thas-Tayapongsak
@hans00
@Abdennacer-Badaoui

What's Changed

Update ort training installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/2173
Dev version by @echarlaix in https://github.com/huggingface/optimum/pull/2175
Fixed All Typos in docs by @ruidazeng in https://github.com/huggingface/optimum/pull/2185
Remove deprecated ORTModel class by @echarlaix in https://github.com/huggingface/optimum/pull/2187
avoid library_name guessing if it is known in parameters standardization by @eaidova in https://github.com/huggingface/optimum/pull/2179
Infer whether a model needs to be exported to ONNX or not by @echarlaix in https://github.com/huggingface/optimum/pull/2181
Update optimum neuron extra by @dacorvo in https://github.com/huggingface/optimum/pull/2190
Add support for Moonshine ONNX export (& seq2seq models with non-legacy cache & Tensor.repeat_interleave) by @xenova in https://github.com/huggingface/optimum/pull/2162
ViTPose by @ariG23498 in https://github.com/huggingface/optimum/pull/2183
ViTPose export fix by @echarlaix in https://github.com/huggingface/optimum/pull/2192
Remove ORTTrainer code snippet from README by @echarlaix in https://github.com/huggingface/optimum/pull/2194
Remove README code snippets by @echarlaix in https://github.com/huggingface/optimum/pull/2195
Add transformers v4.49 support by @echarlaix in https://github.com/huggingface/optimum/pull/2191
Fix test benchmark suite by @echarlaix in https://github.com/huggingface/optimum/pull/2199
fix the onnx export custom model example; fix repo name; fix opset version; remove deprecated arg; by @janak2 in https://github.com/huggingface/optimum/pull/2203
Limit transformers version for bettertransformer support by @echarlaix in https://github.com/huggingface/optimum/pull/2198
Add ONNX config for RT-DETR (and RT-DETRv2) by @qubvel in https://github.com/huggingface/optimum/pull/2201
Remove deprecated notebook by @echarlaix in https://github.com/huggingface/optimum/pull/2205
Update CI runner to ubuntu 22.04 by @echarlaix in https://github.com/huggingface/optimum/pull/2206
Add executorch documentation section by @echarlaix in https://github.com/huggingface/optimum/pull/2193
Fix typo in exporters/onnx/utils.py by @zhxchen17 in https://github.com/huggingface/optimum/pull/2210
Link Optimum-ExecuTorch to parent Optimum on Hub by @guangy10 in https://github.com/huggingface/optimum/pull/2222
Fix CI and update Transformers (4.51.1) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2225
Remove FP16_Optimizer patch for DeepSpeed by @Rohan138 in https://github.com/huggingface/optimum/pull/2213
Fix diffusers by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2229
Remove diffusers extra by @echarlaix in https://github.com/huggingface/optimum/pull/2207
TRT engine docs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1396
Always use a default user agent by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2230
dedup _get_model_external_data_paths by @xieofxie in https://github.com/huggingface/optimum/pull/2217
Clean up workflows by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2231
reduce area of patch_everywhere for avoid unexpected replacements by @eaidova in https://github.com/huggingface/optimum/pull/2237
add dinov2 onnx optimizer support by @EFord36 in https://github.com/huggingface/optimum/pull/2227
Fix code quality test by @echarlaix in https://github.com/huggingface/optimum/pull/2239
Add onnx export for efficientnet by @Thas-Tayapongsak in https://github.com/huggingface/optimum/pull/2214
add loading image processor by @eaidova in https://github.com/huggingface/optimum/pull/2254
Fix CLIPSdpaAttention had dropped since v4.48 by @hans00 in https://github.com/huggingface/optimum/pull/2245
Increase clip opset by @echarlaix in https://github.com/huggingface/optimum/pull/2256
Add feature extraction support for image models by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2255
adding token classification task for qwen2 by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2261
upgrade min transformers version for phi3 by @echarlaix in https://github.com/huggingface/optimum/pull/2263

- Python
Published by echarlaix about 1 year ago

optimum - v1.24.0: SD3 & Flux, DinoV2, Modernbert, GPTQModel, Transformers v4.48...

Release Notes: Optimum v1.24.0

We’re excited to announce the release of Optimum v1.24.0. This update expands ONNX-based model capabilities and includes several improvements, bug fixes, and new contributions from the community.

:rocket: New Features & Enhancements

ORTQuantizer now supports models with ONNX subfolders.
ONNX Runtime IO Binding support for all supported Transformers models (no models left behind).
SD3 and Flux model support added to ORTDiffusionPipeline enabling latest diffusion-based models.
Transformers v4.47 and v4.48 compatibility, ensuring seamless integration with the latest advancements in Hugging Face's ecosystem.
ONNX export support extended to various models, including Decision Transformer, ModernBERT, Megatron-BERT, Dinov2, OLMo, and many more (see details).

:wrench: Key Fixes & Optimizations

Dropped support for Python 3.8
Bug fixes in ModelPatcher, SDXL refiner export, and device checks for improved reliability.

:busts_in_silhouette: New Contributors

A huge thank you to our first-time contributors:

@gabe-l-hart
@ra9hur
@bndos
@mlynatom
@LoSealL
@sjrl
@guangy10
@LRL-ModelCloud
@pragyandev

Your contributions make Optimum better! :tada:

For a detailed list of all changes, please check out the full changelog.

:rocket: Happy optimizing!

What's Changed

Onnx granite by @gabe-l-hart in https://github.com/huggingface/optimum/pull/2043
Drop python 3.8 by @echarlaix in https://github.com/huggingface/optimum/pull/2086
Update Dockerfile base image by @echarlaix in https://github.com/huggingface/optimum/pull/2089
add transformers 4.36 tests by @echarlaix in https://github.com/huggingface/optimum/pull/2085
[`fix`] Allow ORTQuantizer over models with subfolder ONNX files by @tomaarsen in https://github.com/huggingface/optimum/pull/2094
SD3 and Flux support by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2073
Remove datasets as required dependency by @echarlaix in https://github.com/huggingface/optimum/pull/2087
Add ONNX Support for Decision Transformer Model by @ra9hur in https://github.com/huggingface/optimum/pull/2038
Generate guidance for flux by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2104
Unbundle inputs generated by `DummyTimestepInputGenerator` by @JingyaHuang in https://github.com/huggingface/optimum/pull/2107
Pass the revision to SentenceTransformer models by @bndos in https://github.com/huggingface/optimum/pull/2105
Rembert onnx support by @mlynatom in https://github.com/huggingface/optimum/pull/2108
fix bug `ModelPatcher` returns empty outputs by @LoSealL in https://github.com/huggingface/optimum/pull/2109
Fix workflow to mark issues as stale by @echarlaix in https://github.com/huggingface/optimum/pull/2110
Remove doc-build by @echarlaix in https://github.com/huggingface/optimum/pull/2111
Downgrade stale bot to v8 and fix permissions by @echarlaix in https://github.com/huggingface/optimum/pull/2112
Update documentation color from google tpu section by @echarlaix in https://github.com/huggingface/optimum/pull/2113
Fix workflow to mark PRs as stale by @echarlaix in https://github.com/huggingface/optimum/pull/2116
Enable transformers v4.47 support by @echarlaix in https://github.com/huggingface/optimum/pull/2119
Add ONNX export support for MGP-STR by @xenova in https://github.com/huggingface/optimum/pull/2099
Add ONNX export support for OLMo and OLMo2 by @xenova in https://github.com/huggingface/optimum/pull/2121
Pass on `model_kwargs` when exporting a SentenceTransformers model by @sjrl in https://github.com/huggingface/optimum/pull/2126
Add ONNX export support for DinoV2, Hiera, Maskformer, PVT, SigLIP, SwinV2, VitMAE, and VitMSN models by @xenova in https://github.com/huggingface/optimum/pull/2001
move check_dummy_inputs_allowed to common export utils by @eaidova in https://github.com/huggingface/optimum/pull/2114
Remove CI macos runners by @echarlaix in https://github.com/huggingface/optimum/pull/2129
Enable GPTQModel by @jiqing-feng in https://github.com/huggingface/optimum/pull/2064
Skip private model loading for external contributors by @echarlaix in https://github.com/huggingface/optimum/pull/2130
fix sdxl refiner export by @eaidova in https://github.com/huggingface/optimum/pull/2133
Export to ExecuTorch: Initial Integration by @guangy10 in https://github.com/huggingface/optimum/pull/2090
Fix AutoModel can't load gptq model due to module prefix mismatch vs AutoModelForCausalLM by @LRL-ModelCloud in https://github.com/huggingface/optimum/pull/2146
Update docker files by @echarlaix in https://github.com/huggingface/optimum/pull/2102
Limit diffusers version by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2150
Add ONNX export support for ModernBERT by @xenova in https://github.com/huggingface/optimum/pull/2131
Allow GPTQModel to auto select Marlin or faster kernels for inference only ops by @LRL-ModelCloud in https://github.com/huggingface/optimum/pull/2138
fix device check by @jiqing-feng in https://github.com/huggingface/optimum/pull/2136
Replace check_if_xxx_greater with is_xxx_version by @echarlaix in https://github.com/huggingface/optimum/pull/2152
Add tf available and version by @echarlaix in https://github.com/huggingface/optimum/pull/2154
Add ONNX export support for `PatchTST` by @xenova in https://github.com/huggingface/optimum/pull/2101
fix infer task from model_name if model from sentence transformer by @eaidova in https://github.com/huggingface/optimum/pull/2151
Unpin diffusers and pass onnx exporters tests by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2153
Uncomment modernbert config by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2155
Skip optimum-benchmark when loading namespace modules by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2159
Fix PR doc upload by @regisss in https://github.com/huggingface/optimum/pull/2161
Move executorch to optimum-executorch by @echarlaix in https://github.com/huggingface/optimum/pull/2165
Adding Onnx Support For Megatron-Bert by @pragyandev in https://github.com/huggingface/optimum/pull/2169
Transformers 4.48 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2158
Update ort CIs (slow, gpu, train) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2024

- Python
Published by IlyasMoutawwakil over 1 year ago

optimum - v1.23.3: Patch release

Add sentence-transformers and timm documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/2072
Create token type ids when not provided by @echarlaix in https://github.com/huggingface/optimum/pull/2081
Add transformers v4.46 support by @echarlaix in https://github.com/huggingface/optimum/pull/2078

- Python
Published by echarlaix over 1 year ago

optimum - v1.23.2: Patch release

Fix compatibility with diffusers < 0.25.0 #2063 @echarlaix
Update the habana extra #2077 @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.23.1...v1.23.2

- Python
Published by regisss over 1 year ago

optimum - v1.23.1: Patch release

Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/2050
Don't hardcode the logger level to INFO; let users set TRANSFORMERS_VERBOSITY by @tomaarsen in https://github.com/huggingface/optimum/pull/2047
Add workflow to mark issues as stale by @regisss in https://github.com/huggingface/optimum/pull/2051
Fix onnx export when transformers >= v4.45 (impacting sentence-transformers and timm models) by @echarlaix in https://github.com/huggingface/optimum/pull/2053 and https://github.com/huggingface/optimum/pull/2054
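
With the logger level no longer hardcoded, verbosity can be controlled through the `TRANSFORMERS_VERBOSITY` environment variable. The sketch below shows one way to set it from Python before the libraries are imported; the helper name `set_verbosity_env` is hypothetical, and the level list is the commonly documented subset rather than an exhaustive one.

```python
import os

# Commonly documented levels for the Transformers logging utilities.
_LEVELS = {"debug", "info", "warning", "error", "critical"}


def set_verbosity_env(level="error"):
    """Set TRANSFORMERS_VERBOSITY; call this before importing optimum or transformers."""
    level = level.lower()
    if level not in _LEVELS:
        raise ValueError(f"unknown verbosity level: {level!r}")
    os.environ["TRANSFORMERS_VERBOSITY"] = level
    return level
```

The same effect can be had from a shell by exporting the variable before launching the script; the function form is only convenient when the level comes from application config.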

- Python
Published by echarlaix over 1 year ago

optimum - v1.23.0: ORTDiffusionPipeline, transformers v4.45

ONNX Runtime Diffusion pipeline

Adding ORTDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1960 and https://github.com/huggingface/optimum/pull/2021

```diff
model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, revision="onnx")
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
```

Transformers v4.45

Transformers v4.45 support by @echarlaix in https://github.com/huggingface/optimum/pull/2023 and https://github.com/huggingface/optimum/pull/2045

Subfolder

Remove the restriction for the model's config to be in the model's subfolder by @echarlaix in https://github.com/huggingface/optimum/pull/2044

New Contributors

@tcsavage made their first contribution in https://github.com/huggingface/optimum/pull/1965
@yuanwu2017 made their first contribution in https://github.com/huggingface/optimum/pull/2003
@h3110Fr13nd made their first contribution in https://github.com/huggingface/optimum/pull/2031
@glegendre01 made their first contribution in https://github.com/huggingface/optimum/pull/2033
@rbrugaro made their first contribution in https://github.com/huggingface/optimum/pull/2027

Full Changelog: https://github.com/huggingface/optimum/compare/v1.22.0...v1.23.0

- Python
Published by echarlaix over 1 year ago

optimum - v1.22.0: transformers 4.44 compatibility, bugfixes

What's Changed

Fix sentence transformers modeling patching for export by @echarlaix in https://github.com/huggingface/optimum/pull/1936
Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937
Remove inplace op in mistral patcher by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1938
Fix forward bug in ORTModelForFeatureExtraction by @moria97 in https://github.com/huggingface/optimum/pull/1941
Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
Clip vision model onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1920
Add export test for swin with shifted windows by @echarlaix in https://github.com/huggingface/optimum/pull/1942
Refactor diffusers tasks by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1947
Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
Fix unmask_unattended_patched signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
Fix undefined variable in library name inference by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1964
Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971
chore(ci): migrate runner configuration in GitHub workflows by @XciD in https://github.com/huggingface/optimum/pull/1978
Fix typos in quantization.mdx by @aldakata in https://github.com/huggingface/optimum/pull/1989
Update Habana extra in setup.py by @regisss in https://github.com/huggingface/optimum/pull/1991
Follow up the diffusers task refactoring by @JingyaHuang in https://github.com/huggingface/optimum/pull/1999
Transformers 4.44 support by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1996
Modify token classification processor default dataset args by @echarlaix in https://github.com/huggingface/optimum/pull/2005
Fix TFLite tests by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2007
Fix attribute name from inputs_names to input_names by @J4BEZ in https://github.com/huggingface/optimum/pull/2010
Fix typo in BetterTransformer's overview docs by @ftnext in https://github.com/huggingface/optimum/pull/2015
Apply deprecated evaluation_strategy by @muellerzr in https://github.com/huggingface/optimum/pull/1819
Update transformers imports for deepspeed and is_torch_xla_available by @Rohan138 in https://github.com/huggingface/optimum/pull/2012
Add quanto install and instructions by @dacorvo in https://github.com/huggingface/optimum/pull/1976

New Contributors

@moria97 made their first contribution in https://github.com/huggingface/optimum/pull/1941
@XciD made their first contribution in https://github.com/huggingface/optimum/pull/1978
@zhenglongjiepheonix made their first contribution in https://github.com/huggingface/optimum/pull/1933
@aldakata made their first contribution in https://github.com/huggingface/optimum/pull/1989
@J4BEZ made their first contribution in https://github.com/huggingface/optimum/pull/2010
@ftnext made their first contribution in https://github.com/huggingface/optimum/pull/2015
@muellerzr made their first contribution in https://github.com/huggingface/optimum/pull/1819
@Rohan138 made their first contribution in https://github.com/huggingface/optimum/pull/2012

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.4...v1.22.0

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.4: Patch release

Update Habana extra in setup.py by @regisss in #1991

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.3...v1.21.4

- Python
Published by regisss almost 2 years ago

optimum - v1.21.3: Patch release

Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
Fix unmask_unattended_patched signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.2...v1.21.3

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.2: Patch release

Remove inplace op in mistral patcher by @IlyasMoutawwakil in #1938
Fix ORTModelForFeatureExtraction modeling by @moria97 in #1941

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.1...v1.21.2

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.1: Patch release

Fix sentence transformers model patching by @echarlaix in https://github.com/huggingface/optimum/pull/1936
Update Intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.0...v1.21.1

- Python
Published by echarlaix almost 2 years ago

optimum - v1.21.0: many bugfixes, transformers 4.42 compatibility

What's Changed

ORTOptimizer for the model type Segformer by @zachmayer in https://github.com/huggingface/optimum/pull/1820
fix: device consistence by @Daya-Jin in https://github.com/huggingface/optimum/pull/1891
Allow optimum to discover and load subpackages by @dacorvo in https://github.com/huggingface/optimum/pull/1894
feat(ci): add trufflehog secrets detector by @McPatate in https://github.com/huggingface/optimum/pull/1899
fix(ci): remove unnecessary permissions by @McPatate in https://github.com/huggingface/optimum/pull/1904
Remove read token by @fxmarty in https://github.com/huggingface/optimum/pull/1903
Remove dataset with restrictive license by @echarlaix in https://github.com/huggingface/optimum/pull/1910
Fix Windows and onnx dtype compatibility by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1886
Deprecated use_auth_token by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1837
Add redirection for optimum intel doc by @echarlaix in https://github.com/huggingface/optimum/pull/1918
Read use_external_data_format from ORTConfig file by @idruker-cerence in https://github.com/huggingface/optimum/pull/1917
Pin numpy v1 for onnxruntime by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1921
Fix GPTQ CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1878
Fix code quality by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1928
Fix incorrect names for usage blenderbot for causallm by @eaidova in https://github.com/huggingface/optimum/pull/1887
Fixed bug key error "last_hidden_state" by @satishsilveri in https://github.com/huggingface/optimum/pull/1674
Support transformers 4.42 by @fxmarty in https://github.com/huggingface/optimum/pull/1929

New Contributors

@zachmayer made their first contribution in https://github.com/huggingface/optimum/pull/1820
@Daya-Jin made their first contribution in https://github.com/huggingface/optimum/pull/1891
@dacorvo made their first contribution in https://github.com/huggingface/optimum/pull/1894
@McPatate made their first contribution in https://github.com/huggingface/optimum/pull/1899
@idruker-cerence made their first contribution in https://github.com/huggingface/optimum/pull/1917
@satishsilveri made their first contribution in https://github.com/huggingface/optimum/pull/1674

Full Changelog: https://github.com/huggingface/optimum/compare/v1.20.0...v1.21.0

- Python
Published by fxmarty almost 2 years ago

optimum - v1.20.0: VITS, Phi-3 ONNX export

Extended ONNX export

VITS ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1607
Phi-3 ONNX export by @JingyaHuang in https://github.com/huggingface/optimum/pull/1870
Add Phi-3 normalized config by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/1841
Add Phi-3 small normalized config by @JingyaHuang in https://github.com/huggingface/optimum/pull/1864

Other changes and bugfixes

Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
Remove call to apt update before apt purge in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830
Update github workflows by @echarlaix in https://github.com/huggingface/optimum/pull/1829
Remove bad PPA in main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1831
Fix TPU doc build by @regisss in https://github.com/huggingface/optimum/pull/1834
Fix sentence transformers models infer library by @echarlaix in https://github.com/huggingface/optimum/pull/1832
Fix random initialization of bias when using GPTQ quantization with models without bias by @B-201 in https://github.com/huggingface/optimum/pull/1827
Update the Transformers dependency in the Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1851
Make stable diffusion unet and vae number of channels static by @eaidova in https://github.com/huggingface/optimum/pull/1840
Fix compatibility with transformers v4.41.0 for ONNX by @echarlaix in https://github.com/huggingface/optimum/pull/1860
Fix FX CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1866
Fix Utils CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1867
Fix BT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1872
Fix ORTConfig loading by @mr-sarthakgupta in https://github.com/huggingface/optimum/pull/1879
Update ORT doc for ROCM 6.0 by @mht-sharma in https://github.com/huggingface/optimum/pull/1862
Fix ort config instantiation (from_pretrained) and saving (save_pretrained) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1865
Fix ORT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1875
Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1882
Bump transformers version for neuron extras by @JingyaHuang in https://github.com/huggingface/optimum/pull/1881

New Contributors

@B-201 made their first contribution in https://github.com/huggingface/optimum/pull/1827
@mr-sarthakgupta made their first contribution in https://github.com/huggingface/optimum/pull/1879

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.20.0

- Python
Published by echarlaix about 2 years ago

optimum - v1.19.2: Patch release

Update the Transformers dependency in the Habana extra #1851 @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.1...v1.19.2

- Python
Published by regisss about 2 years ago

optimum - v1.19.1: Patch release

Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
Remove call to apt update before apt purge in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830

Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.19.1

- Python
Published by echarlaix about 2 years ago

optimum - v1.19.0: Musicgen, MarkupLM ONNX export

Extended ONNX export

Musicgen and MarkupLM models from Transformers can now be exported to ONNX through optimum-cli export onnx. Musicgen ONNX export is used to run the model locally in a browser through transformers.js.

Musicgen ONNX export (text-conditional only) by @fxmarty in https://github.com/huggingface/optimum/pull/1779
Add support for markuplm ONNX export by @pogzyb in https://github.com/huggingface/optimum/pull/1784
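
For reference, the `optimum-cli export onnx` invocation mentioned above can also be driven from Python. This is a minimal sketch: the helper names, the example checkpoint `facebook/musicgen-small`, and the output directory are assumptions chosen for illustration.

```python
import subprocess


def build_export_command(model_id, output_dir, task=None):
    """Assemble an `optimum-cli export onnx` invocation as an argument list."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model_id]
    if task is not None:
        cmd += ["--task", task]
    cmd.append(output_dir)
    return cmd


def export_to_onnx(model_id, output_dir, task=None):
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_export_command(model_id, output_dir, task), check=True)
```

For example, `export_to_onnx("facebook/musicgen-small", "musicgen_onnx")` runs the equivalent of `optimum-cli export onnx --model facebook/musicgen-small musicgen_onnx`.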

Other changes and bugfixes

Fix IR version for merged ONNX decoders by @fxmarty in https://github.com/huggingface/optimum/pull/1780
Update test model id by @echarlaix in https://github.com/huggingface/optimum/pull/1785
Add Nvidia and Neuron to README by @JingyaHuang in https://github.com/huggingface/optimum/pull/1791
adds debug options to dump onnx graphs by @prathikr in https://github.com/huggingface/optimum/pull/1789
Improve PR template by @fxmarty in https://github.com/huggingface/optimum/pull/1799
Add Google TPU to the mix by @mfuntowicz in https://github.com/huggingface/optimum/pull/1797
Add redirection for Optimum TPU by @regisss in https://github.com/huggingface/optimum/pull/1801
Add Nvidia and Neuron to the installation doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1803
Update installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/1806
Fix offline compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1805
Remove unnecessary constants for > 2GB ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1808
Add onnx export function for pix2struct model by @naormatania in https://github.com/huggingface/optimum/pull/1815

New Contributors

@pogzyb made their first contribution in https://github.com/huggingface/optimum/pull/1784
@naormatania made their first contribution in https://github.com/huggingface/optimum/pull/1815

Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.19.0

- Python
Published by fxmarty about 2 years ago

optimum - v1.18.1: Patch release

Fix the installation for Optimum Neuron v0.0.21 release

Improve the installation of optimum-neuron through optimum extras #1778

Fix the task inference of stable diffusion

Fix infer task for stable diffusion #1793

Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.18.1

- Python
Published by JingyaHuang about 2 years ago

optimum - v1.18.0: Gemma, OWLv2, MPNet, Qwen2 ONNX support

New architectures ONNX export:

OWLv2 by @xenova in #1689
Gemma by @fxmarty in #1714
MPNet by @nathan-az in #1471
Qwen2 by @uniartisan in #1746

Other changes and bugfixes

Fix starcoder ORT integration by @fxmarty in #1722
Fix use_auth_token with ORTModel by @fxmarty in #1740
Fix compatibility with transformers v4.39.0 by @echarlaix in #1764

- Python
Published by echarlaix about 2 years ago

optimum - v1.17.1: Patch release

Update Transformers dependency for the release of Optimum Habana v1.10.2

Update Transformers dependency in Habana extra #1700

Full Changelog: https://github.com/huggingface/optimum/compare/v1.17.0...v1.17.1

- Python
Published by regisss over 2 years ago

optimum - v1.17.0: Improved ONNX support & many bugfixes

ONNX export from `nn.Module`

A function is exposed to programmatically export any nn.Module (e.g. models coming from Transformers, but modified). This is useful when you need to modify a model loaded from the Hub before exporting it. Example:

```python
from transformers import AutoModelForImageClassification
from optimum.exporters.onnx import onnx_export_from_model

model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Here one could do any modification on the model before the export.
onnx_export_from_model(model, output="vit_onnx")
```

Enable model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1649

ONNX export with static shapes

The Optimum ONNX export CLI allows disabling dynamic shapes for inputs/outputs:

optimum-cli export onnx --model timm/ese_vovnet39b.ra_in1k out_vov --no-dynamic-axes

This is useful if the exported model is to be consumed by a runtime that does not support dynamic shapes. The static shape can be specified with e.g. --batch_size 1. See all the shape options in optimum-cli export onnx --help.

Enable export of model with fixed shape by @mht-sharma in https://github.com/huggingface/optimum/pull/1643
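When the export is driven from a script rather than typed by hand, the command above can be assembled as an argument list. The following is a minimal sketch: the helper name `build_static_export_command` is hypothetical and not part of Optimum, and it only uses the flags mentioned above (`--no-dynamic-axes`, `--batch_size`).

```python
import shlex
from typing import List, Optional


def build_static_export_command(
    model_id: str, output_dir: str, batch_size: Optional[int] = None
) -> List[str]:
    """Build the argv for an ONNX export with dynamic axes disabled.

    Hypothetical helper for illustration; it only assembles the command.
    """
    command = [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        output_dir,
        "--no-dynamic-axes",
    ]
    if batch_size is not None:
        # Fix the batch dimension to a static value.
        command += ["--batch_size", str(batch_size)]
    return command


command = build_static_export_command("timm/ese_vovnet39b.ra_in1k", "out_vov", batch_size=1)
print(shlex.join(command))
# To actually run the export (requires optimum to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting issues when model ids or output paths contain unusual characters.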

BF16 ONNX export

The Optimum ONNX export now supports BF16 export on CPU and GPU. Beware though that ONNX Runtime is most often not able to consume these models, as some operations are not implemented in this data type, although the exported models comply with the ONNX standard. This is useful if you are developing a runtime that consumes BF16 ONNX models.

Example: optimum-cli export onnx --model bert-base-uncased --dtype bf16 bert_onnx

BF16 support in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1654
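As background on the data type itself: BF16 keeps the sign bit and the 8 exponent bits of float32 but only 7 mantissa bits, so a BF16 value is the upper half of the float32 bit pattern. The sketch below illustrates this with the standard library only. It uses plain truncation as a simplifying assumption (real converters typically round to nearest), and it is unrelated to how Optimum performs the export.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    """Return the 16-bit BF16 pattern for `value` (low mantissa bits truncated)."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    # BF16 is the upper 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a BF16 bit pattern back to a Python float via float32."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


for x in (1.0, 3.14159, -2.0):
    bits = float_to_bf16_bits(x)
    print(x, hex(bits), bf16_bits_to_float(bits))
```

The round trip shows the precision loss: the dynamic range matches float32, but only about two to three significant decimal digits survive.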

ONNX export for new models

You can now export table-transformer to ONNX, as well as BART for text-classification.

Add ONNX export for table-transformer by @xenova in https://github.com/huggingface/optimum/pull/1616
Reactivate BART Onnx Export by @claeyzre in https://github.com/huggingface/optimum/pull/1666

Sentence Transformers ONNX export

Fix sentence transformers ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1632
Bump sentence-transformers ONNX opset by @fxmarty in https://github.com/huggingface/optimum/pull/1634
Pass trust_remote_code to sentence transformers export by @xenova in https://github.com/huggingface/optimum/pull/1677
Fix library detection by @fxmarty in https://github.com/huggingface/optimum/pull/1690

Timm models support with ONNX Runtime

Timm models can now be run through ONNX Runtime with the class ORTModelForImageClassification:

```python
from urllib.request import urlopen

import timm
import torch
from PIL import Image

from optimum.onnxruntime import ORTModelForImageClassification

# Export the model to ONNX under the hood with export=True.
model = ORTModelForImageClassification.from_pretrained("timm/resnext101_64x4d.c1_in1k", export=True)

# Get model specific transforms (normalization, resize).
data_config = timm.data.resolve_data_config(pretrained_cfg=model.config.pretrained_cfg)
transforms = timm.data.create_transform(**data_config, is_training=False)

img = Image.open(
    urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")
)
output = model(transforms(img).unsqueeze(0)).logits
top5_probabilities, top5_class_indices = torch.topk(torch.softmax(output, dim=1) * 100, k=5)
```

Add Timm support in ORTModelForImageClassification by @mht-sharma in https://github.com/huggingface/optimum/pull/1578

Other changes and bugfixes

Modify SEW-D model for tests by @echarlaix in https://github.com/huggingface/optimum/pull/1601
Add phi and mixtral model type to normalizedconfig by @changwangss in https://github.com/huggingface/optimum/pull/1625
Remove "to ONNX" from info message when exporting model by @helena-intel in https://github.com/huggingface/optimum/pull/1627
Modify model id for test by @echarlaix in https://github.com/huggingface/optimum/pull/1628
Fix cupy detection by @fxmarty in https://github.com/huggingface/optimum/pull/1635
Fix ORT detection by @fxmarty in https://github.com/huggingface/optimum/pull/1636
Enable sdpa export for SD unet component by @echarlaix in https://github.com/huggingface/optimum/pull/1637
[ORT] Improve dummy mask & add tips for attention fusion in the doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1640
Improve error message by @Almonok in https://github.com/huggingface/optimum/pull/1623
Add input_labels input to SAM model export by @xenova in https://github.com/huggingface/optimum/pull/1638
Fix c4 dataset loading by @SunMarc in https://github.com/huggingface/optimum/pull/1646
Avoid loading onnx file in weight deduplication if not necessary by @fxmarty in https://github.com/huggingface/optimum/pull/1648
Allow lower ONNX opsets by @fxmarty in https://github.com/huggingface/optimum/pull/1650
Remove abstract decorator from _export by @JingyaHuang in https://github.com/huggingface/optimum/pull/1652
Add rjieba install by @mht-sharma in https://github.com/huggingface/optimum/pull/1661
Fix wikitext2 processing by @SunMarc in https://github.com/huggingface/optimum/pull/1663
Fix: local variable 'dataset' referenced before assignment by @hiyouga in https://github.com/huggingface/optimum/pull/1600
Support float16 images in StableDiffusionXLWatermarker by @jambayk in https://github.com/huggingface/optimum/pull/1603
Extend autocast check to cover more platforms like XPU by @hoshibara in https://github.com/huggingface/optimum/pull/1639
Support IO Binding for ORTModelForCTC by @vidalmaxime in https://github.com/huggingface/optimum/pull/1629
Add fp16 support for split cache by @PatriceVignola in https://github.com/huggingface/optimum/pull/1602
ORTModelForFeatureExtraction always exports as transformers models by @fxmarty in https://github.com/huggingface/optimum/pull/1684
Avoid overriding model_type in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1647
Fix gptq device_map = "cpu" by @SunMarc in https://github.com/huggingface/optimum/pull/1662
CI: Avoid iterating over a mutated iterable by @fxmarty in https://github.com/huggingface/optimum/pull/1683
Add option to disable ONNX constant folding by @fxmarty in https://github.com/huggingface/optimum/pull/1682
re-enable decoder sequence classification by @dwyatte in https://github.com/huggingface/optimum/pull/1679
Move & rename onnx_export by @fxmarty in https://github.com/huggingface/optimum/pull/1685
Update standardize_model_attributes by @mht-sharma in https://github.com/huggingface/optimum/pull/1686
Fix: AttributeError: module 'packaging' has no attribute 'version' by @soulteary in https://github.com/huggingface/optimum/pull/1660
Disable failing test & free space when building documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1693
Fix no space left on device in actions by @fxmarty in https://github.com/huggingface/optimum/pull/1694
Add end-to-end Marlin benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1695
Fix main doc build by @fxmarty in https://github.com/huggingface/optimum/pull/1697
Update optimum-intel requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1699

New Contributors

@tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597
@helena-intel made their first contribution in https://github.com/huggingface/optimum/pull/1627
@Almonok made their first contribution in https://github.com/huggingface/optimum/pull/1623
@hiyouga made their first contribution in https://github.com/huggingface/optimum/pull/1600
@jambayk made their first contribution in https://github.com/huggingface/optimum/pull/1603
@hoshibara made their first contribution in https://github.com/huggingface/optimum/pull/1639
@vidalmaxime made their first contribution in https://github.com/huggingface/optimum/pull/1629
@PatriceVignola made their first contribution in https://github.com/huggingface/optimum/pull/1602
@claeyzre made their first contribution in https://github.com/huggingface/optimum/pull/1666
@dwyatte made their first contribution in https://github.com/huggingface/optimum/pull/1679
@soulteary made their first contribution in https://github.com/huggingface/optimum/pull/1660

Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.17.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.16.2: Patch release

Fix ORT training compatibility for transformers v4.36.0 by @AdamLouly in https://github.com/huggingface/optimum/pull/1586
Fix ONNX export compatibility for transformers v4.37.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1641

- Python
Published by echarlaix over 2 years ago

optimum - v1.16.1: Patch release

Breaking change: BetterTransformer for Llama, Falcon, Whisper and Bart is deprecated

The features from BetterTransformer for Llama, Falcon, Whisper and Bart have been upstreamed in Transformers. Please use transformers>=4.36 and torch>=2.1.1 to get PyTorch's scaled_dot_product_attention by default.

More details: https://github.com/huggingface/transformers/releases/tag/v4.36.0
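The version requirement above can be checked at runtime. The sketch below uses only the standard library; the helper names are hypothetical, and the version parsing is deliberately simplified (it compares only the leading numeric release segment, so pre-release suffixes such as `.dev0` are ignored).

```python
import re
from importlib import metadata
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading dotted-number part, e.g. '2.1.1+cu121' -> (2, 1, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (release segment only)."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.36' and '4.36.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def sdpa_requirements_met() -> bool:
    """Check the minimum versions quoted in the release note."""
    requirements = {"transformers": "4.36", "torch": "2.1.1"}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            return False
        if not meets_minimum(installed, minimum):
            return False
    return True


print("SDPA requirements met:", sdpa_requirements_met())
```

For production code, a full version parser such as the one in the `packaging` library is a more robust choice than this simplified comparison.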

What's Changed

Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/1596
Typo: tansformers -> transformers by @tomaarsen in https://github.com/huggingface/optimum/pull/1597
[GPTQ] fix tests by @SunMarc in https://github.com/huggingface/optimum/pull/1598
Show correct error message on using BT for SDPA models by @fxmarty in https://github.com/huggingface/optimum/pull/1599

New Contributors

@tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597

Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.16.1

- Python
Published by fxmarty over 2 years ago

optimum - v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ

Transformers 4.36 compatibility

Notably, the ONNX export handles aten::scaled_dot_product_attention in a standardized way for compatible models.

Compatibility with Transformers 4.36 by @fxmarty in https://github.com/huggingface/optimum/pull/1590
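For reference, the operator named above computes softmax(QKᵀ / √d) · V. The following is a dependency-free sketch of that computation for a single attention head, without masking or dropout. It illustrates the math only and says nothing about how the ONNX export lowers the operator.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(q @ k^T / sqrt(d)) @ v, no mask, no dropout."""
    scale = 1.0 / math.sqrt(len(q[0]))
    output = []
    for query in q:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in k]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append(
            [sum(w * value[col] for w, value in zip(weights, v)) for col in range(len(v[0]))]
        )
    return output


# A zero query attends uniformly, so the output is the mean of the values.
print(scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```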

Extended ONNX support: timm, sentence-transformers, Phi, ESM

Add ONNX export for phi models by @xenova in https://github.com/huggingface/optimum/pull/1579
Add ESM onnx support by @xenova in https://github.com/huggingface/optimum/pull/1581
Add timm models export by @mht-sharma in https://github.com/huggingface/optimum/pull/1587
Proper sentence-transformers ONNX export support by @fxmarty in https://github.com/huggingface/optimum/pull/1589

GPTQ for Mixtral

Work in progress.

add modules_in_block_to_quantize arg for gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1585

What's Changed

Update version to 1.16.0.dev0 by @fxmarty in https://github.com/huggingface/optimum/pull/1571
Use doc links in the README for subpackages by @fxmarty in https://github.com/huggingface/optimum/pull/1572
Fix GPTQ compatibility with AutoGPTQ by @fxmarty in https://github.com/huggingface/optimum/pull/1574
Refactoring EC2 CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1575
Remove inputs from sentence-transformers ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/1593
Gptq tokenized dataset by @SunMarc in https://github.com/huggingface/optimum/pull/1584
Run timm ONNX CI only once per day by @fxmarty in https://github.com/huggingface/optimum/pull/1594
Run timm ONNX CI nightly v2 by @fxmarty in https://github.com/huggingface/optimum/pull/1595

Full Changelog: https://github.com/huggingface/optimum/compare/v1.15.0...v1.16.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.15.0: ROCMExecutionProvider support

ROCMExecutionProvider support

The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.

Add AMD GPU support by @mht-sharma in https://github.com/huggingface/optimum/pull/1546
Update ROCM ORT doc by @mht-sharma in https://github.com/huggingface/optimum/pull/1564

Extended ONNX export

Swin2sr, DPT, GLPN and ConvNextv2 are now supported in the ONNX export.

Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529
Add convnextv2 onnx export by @xenova in https://github.com/huggingface/optimum/pull/1560

What's Changed

Add OV export CLI to README by @echarlaix in https://github.com/huggingface/optimum/pull/1526
Refactor NormalizedConfigs for GQA by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1539
Fix model patcher ONNX decoder export by @fxmarty in https://github.com/huggingface/optimum/pull/1547
Add AMD to the documentation main page by @mfuntowicz in https://github.com/huggingface/optimum/pull/1540
Add Optimum-amd documentation to the PR & release doc by @fxmarty in https://github.com/huggingface/optimum/pull/1562
Add amd documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1557
Remove delete_doc_comment workflows by @regisss in https://github.com/huggingface/optimum/pull/1565
optimum-nvidia by @mfuntowicz in https://github.com/huggingface/optimum/pull/1566
Update installation instructions in README by @echarlaix in https://github.com/huggingface/optimum/pull/1568
Update doc for AMD by @mht-sharma in https://github.com/huggingface/optimum/pull/1570
Add amd extra to setup.py by @echarlaix in https://github.com/huggingface/optimum/pull/1567

New Contributors

@xenova made their first contribution in https://github.com/huggingface/optimum/pull/1529

Full Changelog: https://github.com/huggingface/optimum/compare/v1.14.0...v1.15.0

- Python
Published by fxmarty over 2 years ago

optimum - v1.14.1: Patch release

Update optimum-intel required version by @echarlaix in https://github.com/huggingface/optimum/pull/1521
Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
Fix Falcon ONNX export with alibi by @fxmarty in https://github.com/huggingface/optimum/pull/1524
Fix whisper v3 ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1525
Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in https://github.com/huggingface/optimum/pull/1535
Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529

- Python
Published by echarlaix over 2 years ago

optimum - v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization

ONNX

New architectures

Falcon

Add ONNX and ORT support for Falcon by @fxmarty in https://github.com/huggingface/optimum/pull/1391

SpeechT5

SpeechT5 ONNX support by @fxmarty in https://github.com/huggingface/optimum/pull/1404

Mistral

Add Mistral models ONNX export support by @echarlaix in https://github.com/huggingface/optimum/pull/1425

TrOCR

Enable KV cache support by @fxmarty in https://github.com/huggingface/optimum/pull/1456

LCMs

Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in https://github.com/huggingface/optimum/pull/1469

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```

Also enable ONNX export using the CLI:

```bash
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
```

Decoder refactorization

Add position ids as input during ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1381
Enable the export of only one decoder for decoder-only models by @echarlaix in https://github.com/huggingface/optimum/pull/1257

GPTQ

Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in https://github.com/huggingface/optimum/pull/1419
Disable exllamav2 for quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1482
Default to exllama when exllamav2 is disabled by @SunMarc in https://github.com/huggingface/optimum/pull/1494
Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in https://github.com/huggingface/optimum/pull/1479
Add support for CPU Inference by @vivekkhandelwal1 in https://github.com/huggingface/optimum/pull/1496
Fix minimum version of auto-gptq by @fxmarty in https://github.com/huggingface/optimum/pull/1504
switch to exllama_config instead of disabling exllamav2 by @SunMarc in https://github.com/huggingface/optimum/pull/1505

Other changes and bugfixes

Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
Add support for loading quantization from config by @aarnphm in https://github.com/huggingface/optimum/pull/1363
Guard multiprocessing set start method by @fxmarty in https://github.com/huggingface/optimum/pull/1377
Do not output KV cache when not using with-past in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1358
Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
Fix quantization for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405
Fix normalized config key for models architecture by @echarlaix in https://github.com/huggingface/optimum/pull/1408
Fix arg in bettertransformer llama attention by @SunMarc in https://github.com/huggingface/optimum/pull/1421
Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in https://github.com/huggingface/optimum/pull/1428
Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in https://github.com/huggingface/optimum/pull/1431
Fix llama ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1432
Update attention.py by @DongHande in https://github.com/huggingface/optimum/pull/1416
Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in https://github.com/huggingface/optimum/pull/1443
Fix owlvit task detection by @fxmarty in https://github.com/huggingface/optimum/pull/1453
Improve ONNX quantization doc by @fxmarty in https://github.com/huggingface/optimum/pull/1451
Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in https://github.com/huggingface/optimum/pull/1449
Disable bart onnx export for text-classification and question-answering by @fxmarty in https://github.com/huggingface/optimum/pull/1457
Fix ONNX exporter library_name by @baskrahmer in https://github.com/huggingface/optimum/pull/1460
[ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1335
Fix typo in BetterTransformer CLIP by @fxmarty in https://github.com/huggingface/optimum/pull/1468
Fix custom architecture detection in onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1472
Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/1503
Update Transformers dependency for Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1508
Fix argument error by @ranchlai in https://github.com/huggingface/optimum/pull/1501
Remove attention mask patching by @fxmarty in https://github.com/huggingface/optimum/pull/1509
Fix generation input by @echarlaix in https://github.com/huggingface/optimum/pull/1512
Fix tests ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/1517
Fix BT on transformers 4.35 release by @fxmarty in https://github.com/huggingface/optimum/pull/1518

New Contributors

@aarnphm made their first contribution in https://github.com/huggingface/optimum/pull/1363
@DongHande made their first contribution in https://github.com/huggingface/optimum/pull/1416
@AlexKoff88 made their first contribution in https://github.com/huggingface/optimum/pull/1479
@vivekkhandelwal1 made their first contribution in https://github.com/huggingface/optimum/pull/1496
@ranchlai made their first contribution in https://github.com/huggingface/optimum/pull/1501

- Python
Published by echarlaix over 2 years ago

optimum - v1.13.3: Patch release

Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!

Falcon BetterTransformer requires transformers>=4.34 by @fxmarty https://github.com/huggingface/optimum/pull/1431
Fix arg in bettertransformer llama attention by @SunMarc #1421
Update Transformers dependency for Habana extra by @regisss #1508
temporarily pin to transformers<4.35 by @fxmarty https://github.com/huggingface/optimum/commit/616931019b9bd7546918a48d475a07efb92f51b1

- Python
Published by fxmarty over 2 years ago

optimum - v1.13.2: Patch release

Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405

- Python
Published by echarlaix over 2 years ago

optimum - v1.13.1: Patch release

Fix ONNX fp16 export that broke in 1.13.0.

What's Changed

Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
Fix tests collection for TFLite export and trigger TFLite tests only when relevant by @fxmarty in https://github.com/huggingface/optimum/pull/1368
upgrade min compatible optimum-intel version by @echarlaix in https://github.com/huggingface/optimum/pull/1371
Fix fp16 ONNX export test by @fxmarty in https://github.com/huggingface/optimum/pull/1373

- Python
Published by fxmarty almost 3 years ago

optimum - v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the Embedding and LM head shared weight: https://github.com/pytorch/pytorch/issues/108342. For small enough models, this reduces the serialized ONNX model size by up to 50%.

Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1326
Fix initializer detection for weight deduplication by @fxmarty in https://github.com/huggingface/optimum/pull/1333
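To see why deduplication can shrink the file this much, consider a toy model whose embedding and LM head share the same weight but are serialized twice. The sketch below is an illustration only, not Optimum's implementation: it treats initializers as raw byte strings, uses made-up tensor names, and merges entries with identical content.

```python
import hashlib
from typing import Dict, Tuple


def deduplicate_initializers(
    initializers: Dict[str, bytes],
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Keep one copy of each distinct tensor payload.

    Returns the kept initializers and a mapping from each removed
    duplicate name to the name of the copy that was kept.
    """
    first_seen: Dict[str, str] = {}  # content digest -> first name with that content
    kept: Dict[str, bytes] = {}
    aliases: Dict[str, str] = {}
    for name, data in initializers.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in first_seen:
            aliases[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            kept[name] = data
    return kept, aliases


# Hypothetical tensor names; the tied weight appears under two names.
tied_weight = b"\x01" * 1000
initializers = {
    "embed_tokens.weight": tied_weight,
    "lm_head.weight": tied_weight,
    "norm.bias": b"\x02" * 10,
}
kept, aliases = deduplicate_initializers(initializers)
before = sum(len(data) for data in initializers.values())
after = sum(len(data) for data in kept.values())
print(f"{before} bytes -> {after} bytes, aliases: {aliases}")
```

In a small model the tied embedding matrix dominates the parameter count, which is why removing the duplicate approaches a 50% reduction.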

Extended ONNX Runtime support

ONNX Runtime integration now supports Pix2Struct and MPT architectures. Donut now supports IO Binding. Encoder-Decoder models are now supported as well.

Pix2Struct onnxruntime support by @krathul in https://github.com/huggingface/optimum/pull/1296
Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
Donut iobinding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1209
Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, the SAM model is now by default exported as two files: vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.

Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
Adds ONNX Export Support for Timm Models by @mht-sharma in https://github.com/huggingface/optimum/pull/965
Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851
Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in https://github.com/huggingface/optimum/pull/1301

BetterTransformer supports Falcon

[BetterTransformer] Add falcon to BetterTransformer by @younesbelkada in https://github.com/huggingface/optimum/pull/1343

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

Version bump + add max_input_length to gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1329

Other changes and bugfixes

Update version to 1.12.1.dev0 following release by @fxmarty in https://github.com/huggingface/optimum/pull/1312
Add GPTQ prefill benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1313
Precise ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1268
Improve BetterTransformer backward compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1314
Improve ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1245
Add bitsandbytes benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1320
fix typo in log message by @AAnirudh07 in https://github.com/huggingface/optimum/pull/1322
Support customize dtype for dummy generators by @JingyaHuang in https://github.com/huggingface/optimum/pull/1307
Fix opset custom onnx export by @mht-sharma in https://github.com/huggingface/optimum/pull/1331
Replace mpt to ernie custom export by @mht-sharma in https://github.com/huggingface/optimum/pull/1332
Fix BT benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/1344
Add name_or_path for donut generation by @fxmarty in https://github.com/huggingface/optimum/pull/1345
send both negative prompt embeds to ORT SDXL by @ssube in https://github.com/huggingface/optimum/pull/1339
add vae image processor by @echarlaix in https://github.com/huggingface/optimum/pull/1219
add negative prompt test by @echarlaix in https://github.com/huggingface/optimum/pull/1347
Add GPT BigCode to the BT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1356
Add BT dummy objects by @fxmarty in https://github.com/huggingface/optimum/pull/1355
Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/1338
Fix sentence transformer export by @mht-sharma in https://github.com/huggingface/optimum/pull/1366

New Contributors

@krathul made their first contribution in https://github.com/huggingface/optimum/pull/1296
@AAnirudh07 made their first contribution in https://github.com/huggingface/optimum/pull/1322
@jiqing-feng made their first contribution in https://github.com/huggingface/optimum/pull/1161
@ssube made their first contribution in https://github.com/huggingface/optimum/pull/1339

Full Changelog: https://github.com/huggingface/optimum/compare/v1.12.0...v1.13.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.12.0: AutoGPTQ integration, extended BetterTransformer support

AutoGPTQ integration

Part of the AutoGPTQ library has been integrated in Optimum, with utilities to ease the integration in other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization

Add GPTQ Quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1216
Fix GPTQ doc by @regisss in https://github.com/huggingface/optimum/pull/1267
Add AutoGPTQ benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1292
Fix gptq params by @SunMarc in https://github.com/huggingface/optimum/pull/1284

Extended BetterTransformer support

BetterTransformer now supports BLOOM and GPT-BigCode architectures.

Bt bloom by @baskrahmer in https://github.com/huggingface/optimum/pull/1221
Support gpt_bigcode in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1252
Fix BetterTransformer starcoder init by @fxmarty in https://github.com/huggingface/optimum/pull/1254
Fix BT starcoder fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1255
SDPA dispatches to flash for MQA by @fxmarty in https://github.com/huggingface/optimum/pull/1259
Check output_attentions is False in BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1306

Other changes and bugfixes

Update bug report template by @fxmarty in https://github.com/huggingface/optimum/pull/1266
Fix ORTModule uses fp32 model issue by @jingyanwangms in https://github.com/huggingface/optimum/pull/1264
Fix build PR doc workflow by @fxmarty in https://github.com/huggingface/optimum/pull/1270
Avoid triggering stop job on label by @fxmarty in https://github.com/huggingface/optimum/pull/1274
Update version following 1.11.1 patch by @fxmarty in https://github.com/huggingface/optimum/pull/1275
Fix fp16 ONNX detection for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1276
Update version following 1.11.2 patch by @regisss in https://github.com/huggingface/optimum/pull/1291
Pin tensorflow<=2.12.1 by @fxmarty in https://github.com/huggingface/optimum/pull/1305
ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in https://github.com/huggingface/optimum/pull/1308
Fix staging tests following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1309
More fixes following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1311

New Contributors

@SunMarc made their first contribution in https://github.com/huggingface/optimum/pull/1216
@jingyanwangms made their first contribution in https://github.com/huggingface/optimum/pull/1264

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.2...v1.12.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.11.2: Patch release

Remove the Transformers version constraint on optimum[habana].

Remove Transformers version constraint on Optimum Habana #1290 by @regisss

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.1...v1.11.2

- Python
Published by regisss almost 3 years ago

optimum - v1.11.1: Patch release

Minor fix: documentation building for 1.11.

Accelerate as a soft dependency by @fxmarty

Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.0...v1.11.1

- Python
Published by fxmarty almost 3 years ago

optimum - v1.11.0: Extended ONNX, ONNX Runtime, BetterTransformer support

Extended ONNX and ONNX Runtime support

Add ONNX export and ONNX Runtime inference support for gpt bigcode.

Add ONNX / ONNXRuntime support for StarCoder by @JingyaHuang in #1042

Extended BetterTransformer support

BetterTransformer now supports Llama 2 and bark.

Training and autocast are now supported for most architectures. Please refer to the documentation for more details: https://huggingface.co/docs/optimum/main/en/bettertransformer/overview

Support Llama 2 in BetterTransformer. by @noamwies in #1235
BetterTransformer support training & autocast for all archs by @fxmarty in #1225
Add bark into bettertransformer by @ylacombe in https://github.com/huggingface/optimum/pull/1199
Drop mask for training in all cases for BetterTransformer & precise documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1250

Major bugfixes

Update ORT training to be compatible with transformers 4.31 by @JingyaHuang in #1227

Other improvements and bugfixes

add upgrade strategy by @echarlaix in https://github.com/huggingface/optimum/pull/1228
fix typo README by @echarlaix in https://github.com/huggingface/optimum/pull/1230
Fix OwlViT exporter config by @regisss in https://github.com/huggingface/optimum/pull/1188
Add example SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1233
fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232
fix optimum-intel min version by @echarlaix in https://github.com/huggingface/optimum/pull/1234
fix typo documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1238
update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1240
Update onnxruntime minimum version to 1.11 by @fxmarty in https://github.com/huggingface/optimum/pull/1244
ORT quantizes by default all ops by @fxmarty in https://github.com/huggingface/optimum/pull/1246

New Contributors

@ylacombe made their first contribution in https://github.com/huggingface/optimum/pull/1199
@noamwies made their first contribution in https://github.com/huggingface/optimum/pull/1235

Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.11.0

- Python
Published by JingyaHuang almost 3 years ago

optimum - v1.10.1: Patch release

Fix OwlViT exporter by @regisss in https://github.com/huggingface/optimum/pull/1188
Fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232
Fix optimum-intel version requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1234

Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.10.1

- Python
Published by echarlaix almost 3 years ago

optimum - v1.10.0: Stable Diffusion XL pipelines

Stable Diffusion XL

Enable SD XL ONNX export and ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1168

Enable SD XL ONNX export using the CLI:

optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-0.9 --task stable-diffusion-xl ./sd_xl_onnx

Add SD XL pipelines for ONNX Runtime inference (supported tasks: text-to-image and image-to-image):

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-0.9"
pipeline = ORTStableDiffusionXLPipeline.from_pretrained(model_id, export=True)

prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
pipeline.save_pretrained("onnx-sd-xl-base-0.9")
```

Stable Diffusion pipelines

Enable image-to-image and inpainting pipelines for ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1121

Major bugfixes

Fix bloom KV cache usage in ORTForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/1152

What's Changed

Add stable diffusion example by @prathikr in https://github.com/huggingface/optimum/pull/1136
Fixed incomplete ONNX export model memory release issue by @sharpbai in https://github.com/huggingface/optimum/pull/1154
Add trust remote code option for config by @changwangss in https://github.com/huggingface/optimum/pull/1151
Fix typos of ONNXRuntimme -> ONNXRuntime by @mgoin in https://github.com/huggingface/optimum/pull/1155
Fix ONNX export for MobileViT for segmentation by @regisss in https://github.com/huggingface/optimum/pull/1128
Revert "update the default block size" by @rui-ren in https://github.com/huggingface/optimum/pull/1162
ONNX export for custom architectures & models with custom modeling code by @fxmarty in https://github.com/huggingface/optimum/pull/1166
Update Optimum Neuron doc by @regisss in https://github.com/huggingface/optimum/pull/1164
Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1173
Add gpt_bigcode model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1170
Allow attention_mask=None for BetterTransformer in the inference batched case for gpt2 & gpt-neo by @fxmarty in https://github.com/huggingface/optimum/pull/1180
Fix encoder attention mask input order for ORT by @fxmarty in https://github.com/huggingface/optimum/pull/1181
Fix ORTModel initialization on specific device id by @fxmarty in https://github.com/huggingface/optimum/pull/1182
Add stable diffusion img2img and inpaint documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1149
Fix SD XL ONNX export for img2img task by @echarlaix in https://github.com/huggingface/optimum/pull/1194
Remove graphcore from documentation quickstart by @echarlaix in https://github.com/huggingface/optimum/pull/1201
Unpin tensorflow by @fxmarty in https://github.com/huggingface/optimum/pull/1211
Fix ORT test for unknown architecture for task by @fxmarty in https://github.com/huggingface/optimum/pull/1212
add ort + stable diffusion documentation by @prathikr in https://github.com/huggingface/optimum/pull/1205
Fix vision encoder decoder that may not cache cross-attention by @fxmarty in https://github.com/huggingface/optimum/pull/1210
Add documentation for Optimum Furiosa by @regisss in https://github.com/huggingface/optimum/pull/1165
Add BLIP-2 to BetterTransformer documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1218
Set default value to unet config sample size by @echarlaix in https://github.com/huggingface/optimum/pull/1223
Fix broken link in doc by @regisss in https://github.com/huggingface/optimum/pull/1222
Fix BT test by @fxmarty in https://github.com/huggingface/optimum/pull/1224
Add SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1198
Update setup.py to add optimum-furiosa extras by @mht-sharma in https://github.com/huggingface/optimum/pull/1226

New Contributors

@sharpbai made their first contribution in https://github.com/huggingface/optimum/pull/1154
@mgoin made their first contribution in https://github.com/huggingface/optimum/pull/1155

Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.10.0

- Python
Published by echarlaix almost 3 years ago

optimum - v1.9.1: Patch release

Fix stable diffusion ONNX export for diffusers>=v0.18.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1173

Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.9.1

- Python
Published by echarlaix almost 3 years ago

optimum - v1.9: extended ONNX, ONNX Runtime support

Improved memory management in the ONNX export

Lower memory usage during the ONNX export. This is especially useful when exporting large models or exporting on a CUDA device. Until PyTorch 2.1 is released, we recommend using PyTorch nightly if memory issues are encountered, as two major bugs were fixed on the PyTorch side: https://github.com/pytorch/pytorch/pull/101134 and https://github.com/pytorch/pytorch/pull/101148

Run validation of exported model in no_grad mode by @fxmarty in https://github.com/huggingface/optimum/pull/1111
Load model directly on cuda device for the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1112
Lower GPU memory requirements at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1115

Extended ONNX export

The ONNX export now supports the sam, lilt, pix2struct, cvt and owlvit architectures.

Sam ONNX export support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1025
Add onnx exporter for Lilt model by @mariababich in https://github.com/huggingface/optimum/pull/1098
Add pix2struct to ONNX support (v2) by @arvisioncode in https://github.com/huggingface/optimum/pull/1034
Add CvTONNX Config by @rishabbala in https://github.com/huggingface/optimum/pull/1131
Support document-question-answering ONNX export for vision-encoder-decoder by @fxmarty in https://github.com/huggingface/optimum/pull/1110
add owlvit by @darwinharianto in https://github.com/huggingface/optimum/pull/1067

Support of custom ONNX configurations for export

The main_export function now accepts two arguments, model_kwargs and custom_onnx_configs, that give advanced users finer control over the export.

[ONNX export] Ability to pass arbitrary kwargs, custom ONNX configs by @fxmarty in https://github.com/huggingface/optimum/pull/1143

Extended BetterTransformer support

Add blip-2 to bettertransformer by @baskrahmer in https://github.com/huggingface/optimum/pull/1125
Support llama bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/998

ONNX Runtime: use IO Binding by default for decoder models on CPUExecutionProvider

IO Binding is useful not only to avoid copies between RAM and device memory, but also to avoid copies between numpy tensors and OrtValue. Thus, for autoregressive tasks, IO Binding is now enabled by default on CPUExecutionProvider as well, which may bring a >10% speedup for large context lengths.

Enable use_io_binding = True on CPU by @yihonglyu in https://github.com/huggingface/optimum/pull/1087

ORTModelForSpeechSeq2Seq supported in ORTOptimizer

added ORTModelForSpeechSeq2Seq support to optimizer by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1068

Major bugfixes

Use mask for seq2seq ONNX decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1076

What's Changed

Fix protobuf max allowed size by @fxmarty in https://github.com/huggingface/optimum/pull/988
Add Whisper to ORT optimizer configuration by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/986
Fix sentence-similarity task in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/996
Simplify auto task detection by @fxmarty in https://github.com/huggingface/optimum/pull/997
Fix merged decoder usage with fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1006
Fix past key value generator used for ONNX export validation for t5/mt5 by @fxmarty in https://github.com/huggingface/optimum/pull/1007
Fix typo for custom shapes passed at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1008
Fix _versions.yml upload in doc build by @regisss in https://github.com/huggingface/optimum/pull/1003
ORTQuantizer supports subgraphs by @fxmarty in https://github.com/huggingface/optimum/pull/1009
fix for huggingface_hub last release by @echarlaix in https://github.com/huggingface/optimum/pull/1014
Add links to documentation to README by @echarlaix in https://github.com/huggingface/optimum/pull/1013
Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1011
update optimum intel description by @echarlaix in https://github.com/huggingface/optimum/pull/1015
fix: ValueError offload_dir by @orangetin in https://github.com/huggingface/optimum/pull/993
Sentence transformers ONNX export fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1029
Add OpenVINO notebooks by @echarlaix in https://github.com/huggingface/optimum/pull/1030
Fix task inference for sam by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1031
fix typo by @echarlaix in https://github.com/huggingface/optimum/pull/1033
added types to new fields in OptimizationConfig by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1036
Fix some typos in the quantization guide by @dcferreira in https://github.com/huggingface/optimum/pull/1041
Optional attention_mask in ORTModelForxxx by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1045
ONNX SAM export - change input_points data type by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1048
masked-im output name fix for transformers >= 4.29.0 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1049
remove torchvision requirement by @BramVanroy in https://github.com/huggingface/optimum/pull/1052
Update version by @regisss in https://github.com/huggingface/optimum/pull/1058
Bump package version by @regisss in https://github.com/huggingface/optimum/pull/1062
Raise MinimumVersionError when OnnxConfig.MIN_TORCH_VERSION is not satisfied by @regisss in https://github.com/huggingface/optimum/pull/1070
Remove deprecated argument from tests and examples by @echarlaix in https://github.com/huggingface/optimum/pull/1072
Detect model type for all transformers models in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1075
Fix HF Push to hub by @JingyaHuang in https://github.com/huggingface/optimum/pull/1080
Fix float16 ORT conversion for models > 2GB by @fxmarty in https://github.com/huggingface/optimum/pull/1079
Update doc workflows by @regisss in https://github.com/huggingface/optimum/pull/1093
Error out on ORTQuantizer.quantize call for static quantization when no calibration range is provided by @fxmarty in https://github.com/huggingface/optimum/pull/1094
Add mpt model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1101
Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/1107
Improve the offline support for the ONNX/TFLite export by @fxmarty in https://github.com/huggingface/optimum/pull/1109
Add ViT to ORTConfigManager by @baskrahmer in https://github.com/huggingface/optimum/pull/1117
Fix TasksManager get_model_from_task with None device by @fxmarty in https://github.com/huggingface/optimum/pull/1122
Small typos by @baskrahmer in https://github.com/huggingface/optimum/pull/1124
Refactor BetterTransformerManager requirement validation methods by @baskrahmer in https://github.com/huggingface/optimum/pull/1132
update the default block size by @rui-ren in https://github.com/huggingface/optimum/pull/1137
Update ORT training docker to 1.15 by @JingyaHuang in https://github.com/huggingface/optimum/pull/1139
Adamlouly/fix unwrap model eval by @AdamLouly in https://github.com/huggingface/optimum/pull/1099
Remove version pinning for onnx package by @cody-moveworks in https://github.com/huggingface/optimum/pull/1141

New Contributors

@orangetin made their first contribution in https://github.com/huggingface/optimum/pull/993
@dcferreira made their first contribution in https://github.com/huggingface/optimum/pull/1041
@BramVanroy made their first contribution in https://github.com/huggingface/optimum/pull/1052
@darwinharianto made their first contribution in https://github.com/huggingface/optimum/pull/1067
@mariababich made their first contribution in https://github.com/huggingface/optimum/pull/1098
@changwangss made their first contribution in https://github.com/huggingface/optimum/pull/1101
@arvisioncode made their first contribution in https://github.com/huggingface/optimum/pull/1034
@yihonglyu made their first contribution in https://github.com/huggingface/optimum/pull/1087
@rui-ren made their first contribution in https://github.com/huggingface/optimum/pull/1137
@cody-moveworks made their first contribution in https://github.com/huggingface/optimum/pull/1141
@rishabbala made their first contribution in https://github.com/huggingface/optimum/pull/1131

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.0...v1.9.0

- Python
Published by fxmarty almost 3 years ago

optimum - v1.8.8: Patch release

Fix optimum model inference compatibility with transformers>=v4.30.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1102
Fix stable diffusion ONNX export following diffusers breaking change by @fxmarty in https://github.com/huggingface/optimum/pull/1116

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.7...v1.8.8

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.7: Patch release

Restrict transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1097

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.6...v1.8.7

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.6: Patch release

Fix CLI for exporting models to TFLite by @regisss #1059

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.5...v1.8.6

- Python
Published by regisss about 3 years ago

optimum - v1.8.5: Patch release

Add transformers<4.29.0 in Habana extra by @regisss in #1047

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.4...v1.8.5

- Python
Published by regisss about 3 years ago

optimum - v1.8.4: Patch release

Set onnx requirement by @echarlaix @regisss in https://github.com/huggingface/optimum/pull/1037

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.3...v1.8.4

- Python
Published by echarlaix about 3 years ago

optimum - v1.8.3: Patch release

Fix Stable Diffusion model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1020
Add optimum-neuron extra by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1021

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.2...v1.8.3

- Python
Published by echarlaix about 3 years ago

optimum - v1.8: extended BetterTransformer support, ONNX merged seq2seq models

Extended BetterTransformer support

Various improvements in the PyTorch BetterTransformer integration.

[BT] add BetterTransformer support for ProphetNet by @hirotasoshu in https://github.com/huggingface/optimum/pull/923
Improve bettertransformer benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/939
Fix sdpa with batch size = 1, better benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/915
Fix slow tests & sdpa dropout by @fxmarty in https://github.com/huggingface/optimum/pull/974
Remove getattr overhead in sdpa by @fxmarty in https://github.com/huggingface/optimum/pull/934
[BT] Improve docs by @younesbelkada in https://github.com/huggingface/optimum/pull/944

ONNX merged seq2seq models

Instead of using two separate decoder_model.onnx and decoder_with_past_model.onnx models, a single decoder can be used for encoder-decoder models: decoder_model_merged.onnx. This avoids duplicating the weights across the without-past and with-past ONNX models.

By default, if available, the decoder_model_merged.onnx will be used in the ORTModel integration. This can be disabled with the option --no-post-process in the ONNX export CLI, and with use_merged=False in the ORTModel.from_pretrained method.

Example:

optimum-cli export onnx --model t5-small t5_onnx

will give:

└── t5_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── generation_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

The decoder_model_merged.onnx file alone is enough for decoder inference. We strongly recommend inspecting the subgraphs with netron to understand the inputs and outputs if the exported model is to be used with an engine other than ONNX Runtime in the Optimum integration.
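The selection rule described above (prefer the merged decoder when it exists, fall back to the two separate decoders otherwise or when merging is disabled) can be sketched in a few lines. This is an illustration only: `pick_decoder_files` is a hypothetical helper written for this note, not Optimum's actual loading code, and it uses only the file names from the export listing above.

```python
# Illustrative sketch only: a hypothetical helper, not Optimum's loading code.
MERGED = "decoder_model_merged.onnx"
DECODER = "decoder_model.onnx"
DECODER_WITH_PAST = "decoder_with_past_model.onnx"


def pick_decoder_files(available, use_merged=True):
    """Return the decoder ONNX files that would be needed for inference."""
    available = set(available)
    if use_merged and MERGED in available:
        # A single file handles both the first step and the cached steps.
        return [MERGED]
    # Otherwise fall back to the separate without-past / with-past decoders.
    return [name for name in (DECODER, DECODER_WITH_PAST) if name in available]


exported = [
    "config.json",
    "decoder_model_merged.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
    "encoder_model.onnx",
]

print(pick_decoder_files(exported))
print(pick_decoder_files(exported, use_merged=False))
```

Passing `use_merged=False` here mirrors the option of the same name mentioned for ORTModel.from_pretrained; everything else in the sketch is an assumption made for illustration.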

Fix encoder-decoder ONNX merge by @fxmarty in https://github.com/huggingface/optimum/pull/924
Support the merge of decoder without/with past for encoder-decoder models in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/926
Support merged seq2seq models in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/930

New models in the ONNX export

Add llama onnx export & onnxruntime support by @nenkoru in https://github.com/huggingface/optimum/pull/975

Major bugfix

Remove constant output in encoder-decoder ONNX models decoder with past by @fxmarty in https://github.com/huggingface/optimum/pull/920
Hash tensor data during deduplication by @VikParuchuri in https://github.com/huggingface/optimum/pull/932

Potentially breaking changes

The TasksManager replaces legacy task names with the canonical ones used on the Hub and in transformers metadata:

- sequence-classification becomes text-classification
- causal-lm becomes text-generation
- seq2seq-lm becomes text2text-generation
- speech2seq-lm and audio-ctc become automatic-speech-recognition
- default becomes feature-extraction
- masked-lm becomes fill-mask
- vision2seq-lm becomes image-to-text

This should not break anything except if you rely on private methods and attributes from TasksManager.
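For scripts that still pass legacy task names, the renames listed above can be captured in a small lookup table. The sketch below is a migration aid under stated assumptions: the dictionary contents come directly from the list above, while `LEGACY_TO_CANONICAL` and `to_canonical_task` are hypothetical names and not part of the TasksManager API.

```python
# Mapping copied from the rename list above; names below are hypothetical.
LEGACY_TO_CANONICAL = {
    "sequence-classification": "text-classification",
    "causal-lm": "text-generation",
    "seq2seq-lm": "text2text-generation",
    "speech2seq-lm": "automatic-speech-recognition",
    "audio-ctc": "automatic-speech-recognition",
    "default": "feature-extraction",
    "masked-lm": "fill-mask",
    "vision2seq-lm": "image-to-text",
}


def to_canonical_task(task):
    """Translate a legacy task name; canonical or unknown names pass through."""
    return LEGACY_TO_CANONICAL.get(task, task)


for legacy in ("causal-lm", "default", "text-generation"):
    print(f"{legacy} -> {to_canonical_task(legacy)}")
```

Names that are already canonical are returned unchanged, so the helper can be applied unconditionally to existing configuration values.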

Allow to use a custom class in TasksManager & use canonical tasks names by @fxmarty in https://github.com/huggingface/optimum/pull/967

What's Changed

Update ort trainer to transformers 4.27.2 by @JingyaHuang in https://github.com/huggingface/optimum/pull/917
Compute Loss inside the training step. by @AdamLouly in https://github.com/huggingface/optimum/pull/686
Fix ORTModel MRO for whisper by @fxmarty in https://github.com/huggingface/optimum/pull/919
add ORTStableDiffusionPipeline reference in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/890
Fix decoder ONNX model loading from the Hub by @fxmarty in https://github.com/huggingface/optimum/pull/929
optimum-cli onnxruntime quantize / optimize output argument is now required by @michaelbenayoun in https://github.com/huggingface/optimum/pull/927
Register mechanism for the Optimum CLI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/928
Ensure backward compatibility of ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/933
Update the README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/925
Update README by @echarlaix in https://github.com/huggingface/optimum/pull/941
Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/942
Remove GC from README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/943
Add user and token for CI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/945
Update README by @echarlaix in https://github.com/huggingface/optimum/pull/946
optimum-cli print the help of subcommands by @michaelbenayoun in https://github.com/huggingface/optimum/pull/940
Remove from_transformers references from the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/935
Turn command import into optional by @JingyaHuang in https://github.com/huggingface/optimum/pull/936
Auto-set use_merged to False if use_cache is passed as False by @fxmarty in https://github.com/huggingface/optimum/pull/954
Raise error with use_cache=False, use_io_binding=True by @fxmarty in https://github.com/huggingface/optimum/pull/955
Add an ORT training notebook by @JingyaHuang in https://github.com/huggingface/optimum/pull/959
Fix issue with doc build sometimes failing silently in GH workflows by @regisss in https://github.com/huggingface/optimum/pull/960
Fix typos by @regisss in https://github.com/huggingface/optimum/pull/963
Disable tests upon transformers 4.28 release by @fxmarty in https://github.com/huggingface/optimum/pull/976

New Contributors

@hirotasoshu made their first contribution in https://github.com/huggingface/optimum/pull/923
@VikParuchuri made their first contribution in https://github.com/huggingface/optimum/pull/932

Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.3...v1.8.2

- Python
Published by fxmarty about 3 years ago

optimum - v1.7.3: Patch release for PyTorch 2.0 and transformers 4.27.0

This patch release fixes a few bugs related to the PyTorch 2.0 release and includes a few new features as well.

Breaking change: constant outputs removed from ONNX encoder-decoder models

We removed some constant past key values outputs from encoder-decoder models in the ONNX export. Beware that this could break your existing code, but we recommend using the newly exported models, as this removes unnecessary Identity nodes from the models.

Remove constant outputs from decoder with past ONNX model for encoder-decoder architectures by @fxmarty in https://github.com/huggingface/optimum/pull/872

`torch.nn.functional.scaled_dot_product_attention` support for decoders in BetterTransformer

PyTorch 2.0 introduces torch.nn.functional.scaled_dot_product_attention in beta, a fast path for attention that extends PyTorch's accelerated transformer features. It is included in optimum.bettertransformer for the following architectures: Bart, Blenderbot, GPT2, GPT-J, M2M100, Marian, Mbart, OPT, Pegasus, T5.

Beware that this is still experimental and speedups have yet to be validated on all architectures.

PyTorch's scaled_dot_product_attention makes it possible to use flash attention and memory-efficient attention natively in PyTorch.

Usage is as follows:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

model = BetterTransformer.transform(model)  # modify transformers modeling to use native scaled_dot_product_attention

# do your inference or training here

model = BetterTransformer.reverse(model)  # go back to using canonical transformers modeling
model.save_pretrained("gpt2_model")
```

Inference benchmark (on fp16):

| Model | batch size | Input sequence length | Generated tokens | Latency eager (s) | Latency BT (s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|--------------|------------|-----------------------|------------------|-------------------|----------------|---------|------------------------|---------------------|----------------|
| gpt2 | 1 | 64 | 256 | 1.800 | 1.607 | 12.0% | 569.90 | 569.89 | 0% |
| gpt2 | 64 | 64 | 256 | 2.159 | 1.617 | 33.5% | 2067.45 | 2093.80 | 0% |
| opt-1.3b | 1 | 64 | 256 | 3.010 | 2.667 | 12.9% | 5408.238 | 5408.238 | 0% |
| gpt-neox-20b | 1 | 64 | 256 | 10.869 | 9.937 | 9.4% | 83670.67 | 83673.53 | 0% |

Training benchmark (on fp16):

| Model | batch size | Sequence length | time/epoch (eager, s) | time/epoch (BT, s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|-------|------------|-----------------|-----------------------|--------------------|---------|------------------------|---------------------|----------------|
| gpt2 | 8 | 1024 | 17.732 | 14.037 | 26.3% | 13291.16 | 10191.52 | 30.4% |
| gpt2 | 32 | 1024 | 17.336 | 13.309 | 30.3% | 52834.83 | 38858.56 | 36.0% |
| gpt2 | 64 | 1024 | OOM | 14.067 | / | OOM | 75600.08 | / |
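The Speedup column, and the Memory savings column of the training table, appear to be computed as the eager value divided by the BetterTransformer value, minus one. That formula is inferred from the reported numbers rather than stated in the release notes, so treat the sketch below as a consistency check under that assumption; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(eager, bt):
    """Percentage by which the eager value exceeds the BetterTransformer value.

    Assumed formula (inferred from the tables): (eager / bt - 1) * 100.
    """
    return (eager / bt - 1.0) * 100.0


# (label, eager value, BetterTransformer value) taken from the tables above.
rows = [
    ("gpt2 inference latency, batch 1", 1.800, 1.607),
    ("gpt2 inference latency, batch 64", 2.159, 1.617),
    ("gpt2 training time/epoch, batch 8", 17.732, 14.037),
    ("gpt2 training peak memory, batch 8", 13291.16, 10191.52),
]

for label, eager, bt in rows:
    print(f"{label}: {relative_gain(eager, bt):.1f}%")
```

Under this reading, a 12.0% speedup means eager mode takes 12.0% longer than BetterTransformer, not that BetterTransformer takes 12.0% less time.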

Benchmarks can be reproduced using the inference script and training script:

python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256

python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256 --seqlen-stdev 0

Add scaled_dot_product_attention support for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/853
Support scaled_dot_product_attention for t5 by @fxmarty in https://github.com/huggingface/optimum/pull/856
[BT] add decoder benchmark script by @younesbelkada in https://github.com/huggingface/optimum/pull/857
[BT] Fix bt benchmark by @younesbelkada in https://github.com/huggingface/optimum/pull/858
Fix pytorch version check in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/862
[BT] Add fp16 support by @younesbelkada in https://github.com/huggingface/optimum/pull/859
[BT] Add decoder training support by @younesbelkada in https://github.com/huggingface/optimum/pull/860
Bart support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/863
[BT] add accelerate_test markers by @younesbelkada in https://github.com/huggingface/optimum/pull/864
Mbart, pegasus, blenderbot, marian, m2m100 support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/865
Add bettertransformer reverse transform by @fxmarty in https://github.com/huggingface/optimum/pull/868
Add bettertransformer training benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/873

New architectures in the ONNX export

Three additional architectures are supported in the ONNX export: ImageGPT, RegNet, OPT.

Adding ONNX support for ImageGPT by @adit299 in https://github.com/huggingface/optimum/pull/819
Add ONNX support for RegNet by @asrimanth in https://github.com/huggingface/optimum/pull/833
Adding support for Facebook's OPT models by @hivaze in https://github.com/huggingface/optimum/pull/852

(WIP) TFLite export with quantization support

Continued progress in the TFLite export with quantization support. This is work in progress and not documented yet.

Quantization with TFLite by @michaelbenayoun in https://github.com/huggingface/optimum/pull/854

Bugfixes and improvements

Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/843
Fix typo in documentation by @regisss in https://github.com/huggingface/optimum/pull/848
Remove redundant code by @mht-sharma in https://github.com/huggingface/optimum/pull/841
Update README by @echarlaix in https://github.com/huggingface/optimum/pull/850
Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/855
Remove iobinding ORTModelForCTC by @mht-sharma in https://github.com/huggingface/optimum/pull/840
Fix typo in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/861
Fix causal-lm ONNX axis names by @fxmarty in https://github.com/huggingface/optimum/pull/871
add NNCF openvino notebook by @echarlaix in https://github.com/huggingface/optimum/pull/875
Remove positional-only parameters not supported by python < v3.8 by @echarlaix in https://github.com/huggingface/optimum/pull/881
lazy import for task manager by @JingyaHuang in https://github.com/huggingface/optimum/pull/844
Remove onnx and ort dependencies on the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/846
Reactivate export & optimization tests for causal-lm models by @fxmarty in https://github.com/huggingface/optimum/pull/885
Fix ONNX export on transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/884
Do not use scaled_dot_product_attention for stable diffusion onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/888
Fix loading of an ONNX stable diffusion model when config doesn't match by @echarlaix in https://github.com/huggingface/optimum/pull/887
Automatic framework detection in TasksManager for large models by @fxmarty in https://github.com/huggingface/optimum/pull/883
Fix WavLM onnx export upon torch 2.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/889
Fix PushToHubMixin._create_repo according to transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/892
Fix stable diffusion framework detection by @fxmarty in https://github.com/huggingface/optimum/pull/893
Add donut CPU inference ORT by @mht-sharma in https://github.com/huggingface/optimum/pull/761
Fix check_model for large merged ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/896
Drop python 3.7 support by @fxmarty in https://github.com/huggingface/optimum/pull/891
Fix dummy label generator for vision tasks by @JingyaHuang in https://github.com/huggingface/optimum/pull/900
Add stable diffusion dummy object by @echarlaix in https://github.com/huggingface/optimum/pull/899
Automatic support for large ONNX models in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/886
Remove subprocess calls in ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/897
Registering mechanism for the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/898
add option to run inference with ort by @prathikr in https://github.com/huggingface/optimum/pull/838
Check min diffusers version by @echarlaix in https://github.com/huggingface/optimum/pull/902
Update bug-report.yml by @lewtun in https://github.com/huggingface/optimum/pull/895
Fix axis name for seq2seq ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/904
Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/909
Fix misleading error message in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/910
Delete all Docker images before building the doc of Optimum by @regisss in https://github.com/huggingface/optimum/pull/911
Fix onnx export preprocessors save by @fxmarty in https://github.com/huggingface/optimum/pull/913
Fix GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/914

New Contributors

@adit299 made their first contribution in https://github.com/huggingface/optimum/pull/819
@asrimanth made their first contribution in https://github.com/huggingface/optimum/pull/833
@hivaze made their first contribution in https://github.com/huggingface/optimum/pull/852

Full Changelog: https://github.com/huggingface/optimum/compare/v1.2.0...v1.7.2

- Python
Published by fxmarty about 3 years ago

optimum - v1.7.1: Patch release

Temporarily fix a critical bug in BetterTransformer https://github.com/huggingface/optimum/pull/849

Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.0...v1.7.1

- Python
Published by fxmarty over 3 years ago

optimum - v1.7.0: ONNX export extension, TFLite export, single-ONNX decoding, ONNX Runtime extension for audio, vision tasks, stable diffusion

New models supported in the ONNX export

Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.

Add PoolFormer support in exporters.onnx by @BakingBrains in https://github.com/huggingface/optimum/pull/646
Support pegasus exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/620
Audio models support with optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/622
Add MPNet ONNX export by @jplu in https://github.com/huggingface/optimum/pull/691
Add stable diffusion VAE encoder export by @echarlaix in https://github.com/huggingface/optimum/pull/705
Add vision encoder decoder model in exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/588
Nystromformer ONNX export by @whr778 in https://github.com/huggingface/optimum/pull/728
Support Splinter exporters (#555) by @Allanbeddouk in https://github.com/huggingface/optimum/pull/736
Add gpt-neo-x support by @sidthekidder in https://github.com/huggingface/optimum/pull/745

New models supported in BetterTransformer

A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer, Marian

Add RoCBert support for Bettertransformer by @shogohida in https://github.com/huggingface/optimum/pull/542
Add better transformer support for RoFormer by @manish-p-gupta in https://github.com/huggingface/optimum/pull/680
added BetterTransformer support for Marian by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/808

Additional tasks supported in the ONNX Runtime integration

The following classes are added: ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification, ORTStableDiffusionPipeline.

Reference: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models

Add ORTModelForMaskedLM class by @JingyaHuang in https://github.com/huggingface/optimum/pull/729
Add ORTModelForVision2Seq for VisionEncoderDecoder models inference by @mht-sharma in https://github.com/huggingface/optimum/pull/742
Add ORTModelXXX for audio by @mht-sharma in https://github.com/huggingface/optimum/pull/774
Add stable diffusion onnx runtime pipeline by @echarlaix in https://github.com/huggingface/optimum/pull/786

Support of the ONNX export from PyTorch on float16

In the ONNX export, it is possible to pass the options --fp16 --device cuda to export using float16 when a GPU is available, directly with the native torch.onnx.export.

Example: optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/

Support ONNX export on torch.float16 type by @fxmarty in https://github.com/huggingface/optimum/pull/749

TFLite export

TFLite export is now supported, with static shapes:

optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/

exporters.tflite initial support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/716
TFLite auto-encoder models by @michaelbenayoun in https://github.com/huggingface/optimum/pull/757
[TFLite Export] Adds support for ResNet by @sayakpaul in https://github.com/huggingface/optimum/pull/813

ONNX Runtime optimization and quantization directly in the CLI

Add optimize and quantize command CLI by @jplu in https://github.com/huggingface/optimum/pull/700
Support ONNX Runtime optimizations in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/807

The ONNX export optionally applies the ONNX Runtime optimizations directly during the export, by passing an option from --optimize O1 up to --optimize O4:

optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
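When the export is driven from a script, the same command can be assembled as an argument list. The sketch below is only an illustration: `build_export_command` is a hypothetical helper, not part of Optimum, and it checks the level against the O1 to O4 range named above before building the list.

```python
import shlex

# Optimization levels named in the release notes: O1 up to O4.
VALID_LEVELS = ("O1", "O2", "O3", "O4")


def build_export_command(model, output_dir, optimize=None):
    """Return the argument list for `optimum-cli export onnx` (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model]
    if optimize is not None:
        if optimize not in VALID_LEVELS:
            raise ValueError(f"optimize must be one of {VALID_LEVELS}, got {optimize!r}")
        cmd += ["--optimize", optimize]
    cmd.append(output_dir)
    return cmd


cmd = build_export_command("t5-small", "t5small_onnx/", optimize="O3")
print(shlex.join(cmd))
```

With Optimum installed, the list could be handed to `subprocess.run(cmd, check=True)`; the sketch itself only builds and prints the command.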

ONNX Runtime quantization is supported directly in command line, using optimum-cli onnxruntime quantize:

optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512

ONNX Runtime optimization is supported directly in command line, using optimum-cli onnxruntime optimize:

optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3

ORTModelForCausalLM supports decoding with a single ONNX

Up to now, two ONNX models were used for decoders:

* One handling the first forward pass, where no past key values have been cached yet, and therefore not taking them as input.
* One handling the following forward passes, where past key values have been cached, and therefore taking them as input.

This release introduces support, in the ONNX export and in ORTModelForCausalLM, for a single ONNX handling both steps of the decoding. This reduces memory usage, as weights are not duplicated between two separate models during inference.

A single ONNX for decoders can be used by passing use_merged=True to ORTModelForCausalLM.from_pretrained, loading directly from a PyTorch model:

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)
```

Alternatively, using a single ONNX for decoders is the default behavior in the ONNX export, and the result can later be used, for example, with ORTModelForCausalLM. The command optimum-cli export onnx --model gpt2 gpt2_onnx/ will produce:

└── gpt2_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── merges.txt
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── vocab.json

The decoder_model.onnx and decoder_with_past_model.onnx are kept separate for backward compatibility, but during inference using solely decoder_model_merged.onnx is enough.
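As a rough illustration of that preference, the sketch below picks which decoder file(s) to load from an export directory: the merged file when it exists, otherwise the two separate files. `select_decoder_files` is a hypothetical helper written for this note and is not how ORTModelForCausalLM resolves files internally; only the three file names come from the listing above.

```python
from pathlib import Path
import tempfile

MERGED = "decoder_model_merged.onnx"
LEGACY = ("decoder_model.onnx", "decoder_with_past_model.onnx")


def select_decoder_files(export_dir):
    """Pick the ONNX decoder file(s) to load from an export directory (hypothetical helper)."""
    export_dir = Path(export_dir)
    merged = export_dir / MERGED
    if merged.is_file():
        # The merged file alone covers both decoding steps.
        return [merged]
    # Fall back to the pair of files kept for backward compatibility.
    legacy = [export_dir / name for name in LEGACY]
    missing = [p.name for p in legacy if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing decoder files: {missing}")
    return legacy


# Demo on empty placeholder files; no real ONNX model is needed.
with tempfile.TemporaryDirectory() as tmp:
    for name in (MERGED, *LEGACY):
        (Path(tmp) / name).touch()
    print([p.name for p in select_decoder_files(tmp)])
```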

Enable inference with a merged decoder in ORTModelForCausalLM by @JingyaHuang in https://github.com/huggingface/optimum/pull/647

Single-file ORTModel accepts numpy arrays

ORTModel accepts numpy arrays as inputs, in addition to PyTorch tensors. This is only the case for models that use a single ONNX.

Accept numpy.ndarray as input and output to ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/790

ORTOptimizer support for ORTModelForCausalLM

ORTOptimizer support ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/794
Support IO Binding for merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/797

Breaking changes

In the ONNX export, exporting models in several ONNX (encoder, decoder) is now the default behavior: https://github.com/huggingface/optimum/pull/747. The old behavior is still accessible with --monolith.
In decoders, reusing past key values is now the default in the ONNX export: https://github.com/huggingface/optimum/pull/748. The old behavior is still accessible by explicitly passing, for example, --task causal-lm instead of --task causal-lm-with-past.
BigBird support in the ONNX export is removed, due to the block_sparse attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: https://github.com/huggingface/optimum/pull/778
The parameter from_transformers of ORTModel.from_pretrained will be deprecated in favor of export.

Bugfixes and improvements

Fix disable shape inference for optimization by @regisss in https://github.com/huggingface/optimum/pull/652
Fix uninformative message when passing use_cache=True to ORTModel and no ONNX with cache is available by @fxmarty in https://github.com/huggingface/optimum/pull/650
Fix provider options when several providers are passed by @fxmarty in https://github.com/huggingface/optimum/pull/653
Add TensorRT engine to ONNX Runtime GPU documentation by @fxmarty in https://github.com/huggingface/optimum/pull/657
Improve documentation around ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/666
minor updates on ONNX config guide by @mszsorondo in https://github.com/huggingface/optimum/pull/662
Fix FlaubertOnnxConfig by @michaelbenayoun in https://github.com/huggingface/optimum/pull/669
Use nvcr.io/nvidia/tensorrt image for GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/660
Better Transformer doc fix by @HamidShojanazeri in https://github.com/huggingface/optimum/pull/670
Add support for LongT5 optimization using ORT transformer optimizer script by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/683
Add test for missing execution providers error messages by @fxmarty in https://github.com/huggingface/optimum/pull/659
ONNX transformation to cast int64 constants to int32 when possible by @fxmarty in https://github.com/huggingface/optimum/pull/655
Add missing normalized configs by @fxmarty in https://github.com/huggingface/optimum/pull/694
Remove code duplication in ORTModel's load_model by @fxmarty in https://github.com/huggingface/optimum/pull/695
Test more architectures in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/675
Avoid initializing unwanted attributes for ORTModel's having several inference sessions by @fxmarty in https://github.com/huggingface/optimum/pull/696
Fix the ORTQuantizer loading from specific file by @echarlaix in https://github.com/huggingface/optimum/pull/701
Add saving of diffusion model additional components for onnx export by @echarlaix in https://github.com/huggingface/optimum/pull/699
Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/629
Support trust remote code option in ONNX export and ONNX Runtime integration by @fxmarty in https://github.com/huggingface/optimum/pull/702
Add nightly tests on dependencies dev versions by @fxmarty in https://github.com/huggingface/optimum/pull/703
Fix exception condition by @mht-sharma in https://github.com/huggingface/optimum/pull/706
Add ORTModelForMultipleChoice to the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/712
Fix yaml format for dev tests by @fxmarty in https://github.com/huggingface/optimum/pull/710
Add ONNX Runtime training benchmark by @JingyaHuang in https://github.com/huggingface/optimum/pull/592
Allow from optimum.onnxruntime import QuantizationConfig by @fxmarty in https://github.com/huggingface/optimum/pull/715
Fix documentation for doctest tests to pass by @fxmarty in https://github.com/huggingface/optimum/pull/713
Use transformers>=4.26.0 in setup.py by @fxmarty in https://github.com/huggingface/optimum/pull/723
Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/724
Fix ONNX Runtime inference in ORTTrainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/709
onnxruntime/modeling_ort.py refactor, part 1 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/698
Update docker and doc of ORT Trainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/725
Add test for code examples in the documentation and docstrings by @fxmarty in https://github.com/huggingface/optimum/pull/704
add image classification example to optimum by @prathikr in https://github.com/huggingface/optimum/pull/711
Add TensorrtExecutionProvider modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/722
Whisper shape inference fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/726
Add some redirections to Optimum Habana's documentation by @regisss in https://github.com/huggingface/optimum/pull/735
Patch ORTTrainer inference with ONNX Runtime backend by @JingyaHuang in https://github.com/huggingface/optimum/pull/737
Remove dead code in whisper ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/741
Unpin protobuf 3.20.1 by @fxmarty in https://github.com/huggingface/optimum/pull/738
Fix speech2text export by @mht-sharma in https://github.com/huggingface/optimum/pull/746
Raise error on double call to BetterTransformer.transform() by @fxmarty in https://github.com/huggingface/optimum/pull/750
exporters.onnx output names and dynamic axes fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/731
Fix NNCF supported quantization strategies README table by @echarlaix in https://github.com/huggingface/optimum/pull/752
Add GPU tests for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/751
Fix doctest by @fxmarty in https://github.com/huggingface/optimum/pull/759
Fix ONNX Runtime cache usage for decoders, add relevant tests by @fxmarty in https://github.com/huggingface/optimum/pull/756
Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/758
Update quality tooling for formatting by @regisss in https://github.com/huggingface/optimum/pull/760
Fix wrong shapes used at ONNX export and validation by @fxmarty in https://github.com/huggingface/optimum/pull/764
Change type annotation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/768
Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/762
Disable ONNX Runtime provider check on Windows by @fxmarty in https://github.com/huggingface/optimum/pull/771
Fix FusionOptions following ORT 1.14 release by @fxmarty in https://github.com/huggingface/optimum/pull/772
Unpin numpy <1.24.0 by @fxmarty in https://github.com/huggingface/optimum/pull/773
Fix flaky ONNX Runtime generation test with past key value reuse by @fxmarty in https://github.com/huggingface/optimum/pull/765
Fix output shape dimension for OnnxConfigWithPast by @fxmarty in https://github.com/huggingface/optimum/pull/780
Fix used shapes, device at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/777
Pin numpy only for tensorflow export by @fxmarty in https://github.com/huggingface/optimum/pull/781
Fixed broken paper space links by @Muhtasham in https://github.com/huggingface/optimum/pull/766
Temporarily disable python 3.9 + macOS test due to onnxruntime 1.14 regression by @fxmarty in https://github.com/huggingface/optimum/pull/783
Update ORT Training to 1.14.0 by @JingyaHuang in https://github.com/huggingface/optimum/pull/787
Temporarily disable segformer TensorRT test by @fxmarty in https://github.com/huggingface/optimum/pull/799
Use a stateful ordered_input_names in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/796
Test ORTOptimizer with IO Binding by @fxmarty in https://github.com/huggingface/optimum/pull/801
[BT] Add stable layer-norm Wav2vec2 by @younesbelkada in https://github.com/huggingface/optimum/pull/803
Update rules for ruff by @regisss in https://github.com/huggingface/optimum/pull/806
Improve orttrainer test by @JingyaHuang in https://github.com/huggingface/optimum/pull/779
Fix ORT quantization for TensorRT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/812
Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/814
Update ONNX Runtime training doc - use torchrun by @JingyaHuang in https://github.com/huggingface/optimum/pull/820
Fix ONNX export tests by @fxmarty in https://github.com/huggingface/optimum/pull/822
All back workflow dispatch on GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/823
BetterTransformer pipeline padding issue fix by @vrdn-23 in https://github.com/huggingface/optimum/pull/821
Fix optimum pipeline initialization by @fxmarty in https://github.com/huggingface/optimum/pull/824
Fix failing GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/829
Remove feature dimension as dynamic axes for stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/816
Fix pipeline task dropping arguments bug by @fxmarty in https://github.com/huggingface/optimum/pull/828
Fix ORTQuantizer behavior with ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/831
Update tests by @mht-sharma in https://github.com/huggingface/optimum/pull/826
Fix exporters GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/835
Keep intermediary models for ONNX causal-lm by @fxmarty in https://github.com/huggingface/optimum/pull/834
Fix duplicate name merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/837
Apply lazy import for exporters by @JingyaHuang in https://github.com/huggingface/optimum/pull/836

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.7.0

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.4: Patch release

Bugfix

Fix past key/value reuse in decoders following transformers 4.26.0 release and renaming: https://github.com/huggingface/optimum/commit/b9211d6826b92700e73f48821d6e14bd08226abc
ONNX Runtime 1.14 support: https://github.com/huggingface/optimum/pull/772

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.3...v1.6.4

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.3: Patch release

Fixes ORTTrainer for the inference with the ONNX Runtime backend.

- Python
Published by JingyaHuang over 3 years ago

optimum - v1.6.2: Patch release

Hotfixes

Support generation config in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/651

Regressions

The export of speech-to-text architectures as a single ONNX file (that handles both the encoding and decoding) fails due to a regression with the latest transformers version: https://github.com/huggingface/optimum/issues/721

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.1...v1.6.2

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.1: Patch release

Hotfixes

Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by @fxmarty in https://github.com/huggingface/optimum/pull/643
Fix item access of some _TASKS_TO_AUTOMODELS by @fxmarty in https://github.com/huggingface/optimum/pull/642

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1

- Python
Published by fxmarty over 3 years ago

optimum - v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/

Add Optimum CLI backbone by @fxmarty in https://github.com/huggingface/optimum/pull/593

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/

Add Stable Diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/570

BetterTransformer support for more architectures

BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT

The complete list of supported models is available in the documentation.

[BT] Add Bettertransformer support for FSMT by @Sumanth077 in https://github.com/huggingface/optimum/pull/494
[BT] add BetterTransformer support for ViLT architecture by @ka00ri in https://github.com/huggingface/optimum/pull/508
Add MBart support for BetterTransformer by @ravenouse in https://github.com/huggingface/optimum/pull/516
Add CLIP BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/534
Add BetterTransformer support for RemBERT by @hchings in https://github.com/huggingface/optimum/pull/545

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.

Add Swin support in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/528
[ONNX] add mobilenet support by @younesbelkada in https://github.com/huggingface/optimum/pull/633

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder or decoder-only models normally making use of the generate() method in transformers can now be exported in several files using the --for-ort argument:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx

yielding:

.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

Models exported with --for-ort are expected to be loadable directly into ORTModel.

Add ort export in exporters for encoder-decoder models by @mht-sharma in https://github.com/huggingface/optimum/pull/497
Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in https://github.com/huggingface/optimum/pull/554

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data when the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.
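One practical consequence is that a large model may consist of two files that have to be moved or uploaded together. The sketch below lists them; `files_to_ship` is a hypothetical helper, and the naming rule it uses (the data file is the model file name with `_data` appended, e.g. model.onnx and model.onnx_data) is an assumption made for illustration.

```python
from pathlib import Path
import tempfile


def files_to_ship(model_path):
    """Return the ONNX file plus its external-data sibling, if one exists.

    Assumption: the sibling is named by appending "_data" to the model
    file name (model.onnx -> model.onnx_data).
    """
    model_path = Path(model_path)
    data_path = model_path.with_name(model_path.name + "_data")
    return [model_path, data_path] if data_path.is_file() else [model_path]


# Demo on empty placeholder files; no real ONNX model is needed.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.onnx"
    model.touch()
    print([p.name for p in files_to_ship(model)])  # model file only
    (Path(tmp) / "model.onnx_data").touch()
    print([p.name for p in files_to_ship(model)])  # model file and data file
```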

Handling ONNX models with external data by @NouamaneTazi in https://github.com/huggingface/optimum/pull/586
Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in https://github.com/huggingface/optimum/pull/332

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model files regardless of their names, so optimized and quantized models can be loaded without specifying a file name argument.
ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.
ONNX Runtime integration API improvement by @michaelbenayoun in https://github.com/huggingface/optimum/pull/515

Custom shapes support at ONNX export

The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.

Read more: optimum-cli export onnx --help

Support custom shapes for dummy inputs by @fxmarty in https://github.com/huggingface/optimum/pull/522
Support for custom input shapes in exporters onnx by @fxmarty in https://github.com/huggingface/optimum/pull/575

Enable `use_cache=True` for ORTModelForCausalLM

Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible with use_cache=True, which avoids recomputing them at each iteration of the decoding:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
```

Enable past_key_values for ORTModelForCausalLM by @echarlaix in https://github.com/huggingface/optimum/pull/326
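To see why reusing past key values matters, the toy sketch below counts how often a per-token computation runs with and without a cache. It is a pure-Python illustration under simplifying assumptions: `CountingProjection` and both decode functions are hypothetical stand-ins, and no attention or ONNX Runtime call is involved.

```python
class CountingProjection:
    """Toy stand-in for the per-token key/value computation; counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, token):
        self.calls += 1
        return (token * 2, token * 3)  # placeholder "key" and "value"


def decode_without_cache(tokens, project):
    past = []
    for step in range(1, len(tokens) + 1):
        # Recompute keys/values for the whole prefix at every step.
        past = [project(t) for t in tokens[:step]]
    return past


def decode_with_cache(tokens, project):
    past = []
    for token in tokens:
        # Compute keys/values for the newest token only and reuse the rest.
        past.append(project(token))
    return past


tokens = [1, 2, 3, 4, 5]
no_cache, cached = CountingProjection(), CountingProjection()
# Both strategies end with the same keys/values; only the work differs.
assert decode_without_cache(tokens, no_cache) == decode_with_cache(tokens, cached)
print(no_cache.calls, cached.calls)  # 15 5
```

Without a cache the number of computations grows quadratically with the sequence length (1 + 2 + ... + n), while the cached variant does one computation per token.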

IO binding support for ORTModelForCustomTasks

ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.

Add IO binding support for custom ORTModel by @JingyaHuang in https://github.com/huggingface/optimum/pull/447

Experimental support to merge ONNX decoder with/without past key values

Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-with-past or --task speech2seq-lm-with-past during the ONNX export produces two models: one that does not use the previously computed keys/values, and one that does.

Experimental support is introduced to merge the two models into one. Example:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/

```python
import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
```

Merge ONNX decoder models by @JingyaHuang in https://github.com/huggingface/optimum/pull/587

Major bugs fixed

Fix BetterTransformer with padding="max_length" by @fxmarty in https://github.com/huggingface/optimum/pull/543
Fix non-nesting bug in BetterTransformer integration by @younesbelkada in https://github.com/huggingface/optimum/pull/637

Other changes, bugfixes and improvements

Fix doc-builder premission error by @mishig25 in https://github.com/huggingface/optimum/pull/482
Fix doc build pr premissions by @mishig25 in https://github.com/huggingface/optimum/pull/484
Re-order the task manager doc by @michaelbenayoun in https://github.com/huggingface/optimum/pull/483
Fix whisper device for gpu test by @fxmarty in https://github.com/huggingface/optimum/pull/486
Fix tensorflow CI by @fxmarty in https://github.com/huggingface/optimum/pull/489
Fix PR doc generation by @regisss in https://github.com/huggingface/optimum/pull/495
Fix broken links in the doc by @fxmarty in https://github.com/huggingface/optimum/pull/499
Update iobinding ORT encoder whisper by @mht-sharma in https://github.com/huggingface/optimum/pull/498
fix NormalizedConfig init error message by @PaulQbFeng in https://github.com/huggingface/optimum/pull/500
Change import structure for ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/456
[BT] Fix failing CI tests by @younesbelkada in https://github.com/huggingface/optimum/pull/501
Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in https://github.com/huggingface/optimum/pull/504
[BT] put decorator on the correct place by @younesbelkada in https://github.com/huggingface/optimum/pull/509
[BT] clearer error message for norm_first by @younesbelkada in https://github.com/huggingface/optimum/pull/510
Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/513
Fix ORTModelForSeq2SeqLM test by @fxmarty in https://github.com/huggingface/optimum/pull/455
Clearer error messages when initilizing the requested ONNX Runtime execution provider fails by @fxmarty in https://github.com/huggingface/optimum/pull/514
[BT] Fix doc bugs by @younesbelkada in https://github.com/huggingface/optimum/pull/517
Replace sklearn by scikit-learn by @lesteve in https://github.com/huggingface/optimum/pull/502
ORTModel uses optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/490
Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in https://github.com/huggingface/optimum/pull/523
Added support for Tapas Model by @JuheonChu in https://github.com/huggingface/optimum/pull/520
Add benchmark results to gpu doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/525
ORTModelForConditionalGeneration uses optimum.exporters.onnx by @mht-sharma in https://github.com/huggingface/optimum/pull/529
Better error message when wrong task is given to exporters by @fxmarty in https://github.com/huggingface/optimum/pull/531
Add OrtModelForSpeechSeq2Seq to doc by @fxmarty in https://github.com/huggingface/optimum/pull/533
Fold sections by default in the documentation's side-bar by @regisss in https://github.com/huggingface/optimum/pull/535
Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by @regisss in https://github.com/huggingface/optimum/pull/536
Add check_if_transformers_greater to manage different versions of transformers by @regisss in https://github.com/huggingface/optimum/pull/537
Enable to push some sections to the end of the TOC in the doc by @regisss in https://github.com/huggingface/optimum/pull/532
Fix import in ONNX export CLI by @fxmarty in https://github.com/huggingface/optimum/pull/553
Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/550
Refactor of 2 functions used in ORTModel by @michaelbenayoun in https://github.com/huggingface/optimum/pull/551
Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/556
Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by @JingyaHuang in https://github.com/huggingface/optimum/pull/561
Fix flaky BetterTransformer test by @fxmarty in https://github.com/huggingface/optimum/pull/564
enable FP16Optimizer for fp16 deepspeed training. by @AdamLouly in https://github.com/huggingface/optimum/pull/547
Update documentation quick tour section by @echarlaix in https://github.com/huggingface/optimum/pull/574
Move custom IOBinding to IOBindingHelper by @JingyaHuang in https://github.com/huggingface/optimum/pull/571
Add test for exporters.onnx CLI by @fxmarty in https://github.com/huggingface/optimum/pull/573
Documentation on quantization by @michaelbenayoun in https://github.com/huggingface/optimum/pull/565
More robust tests for ORTModel using decoders and use_cache=True by @fxmarty in https://github.com/huggingface/optimum/pull/576
Fix errors in onnxruntime modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/585
[BT] fix flaky test by @younesbelkada in https://github.com/huggingface/optimum/pull/591
Fix exporters onnx shapes by @fxmarty in https://github.com/huggingface/optimum/pull/581
Fix exporters.onnx tests by @fxmarty in https://github.com/huggingface/optimum/pull/584
Update on the ONNX Runtime documentation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/567
Add the ORTModelForSemanticSegmentation class by @TheoMrc in https://github.com/huggingface/optimum/pull/539
Refactor BetterTransformer to be able to raise more informative error messages by @fxmarty in https://github.com/huggingface/optimum/pull/594
Constraint temporarily NumPy version to save CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/614
Add encoder_last_hidden_state as an output for encoder-decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/601
Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/617
Fix documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/603
Documentation improvements by @fxmarty in https://github.com/huggingface/optimum/pull/598
More informative message at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/609
Use optimum exporter for current weight sharing test by @JingyaHuang in https://github.com/huggingface/optimum/pull/616
OnnxConfigs now handle the export to encoder / decoder / decoder_with_past themselves by @michaelbenayoun in https://github.com/huggingface/optimum/pull/590
Set explicitly the device index by @JingyaHuang in https://github.com/huggingface/optimum/pull/613
Fix ORT GPU test by @JingyaHuang in https://github.com/huggingface/optimum/pull/624
Add GPT-J normalized config by @fxmarty in https://github.com/huggingface/optimum/pull/623
Remove diffusers dependency in onnxruntime code by @fxmarty in https://github.com/huggingface/optimum/pull/619
Use exporters in ORTTrainer by @mht-sharma in https://github.com/huggingface/optimum/pull/546
Improve use_io_binding default value for different execution providers by @JingyaHuang in https://github.com/huggingface/optimum/pull/604
fixed FuseBiasInLinear by specifying device by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/630
Fixed GPU documentation for HF pipelines by @smiraldr in https://github.com/huggingface/optimum/pull/602
Add argument in the CLI to specify device to do the ONNX export on by @fxmarty in https://github.com/huggingface/optimum/pull/634
Allow kwargs in all generate_dummy_inputs() methods by @fxmarty in https://github.com/huggingface/optimum/pull/638

Full Changelog: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@TheoMrc
- Add ORTModelForSemanticSegmentation https://github.com/huggingface/optimum/pull/539
@ravenouse
- Add MBart support for BetterTransformer https://github.com/huggingface/optimum/pull/516
@ka00ri
- Add BetterTransformer support for ViLT architecture https://github.com/huggingface/optimum/pull/508
@Sumanth077
- Add Bettertransformer support for FSMT https://github.com/huggingface/optimum/pull/494

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.2: Patch release

Constraint temporarily numpy<1.24.0 (#614)

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.1: Patch release

Deprecate PyTorch 1.12 for BetterTransformer with better error message (#513)

- Python
Published by fxmarty over 3 years ago

optimum - v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime

BetterTransformer

Convert your model to its PyTorch BetterTransformer format with a one-liner using the new BetterTransformer integration, for faster inference on CPU and GPU!

```python
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
```

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

BetterTransformer integration (#423)
ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device. This gives a significant inference speedup during the decoding process on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn off IOBinding in case of any memory issue:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
```

Contributions

Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, among which BERT, GPT-Neo, Bloom, T5, ViT, Whisper, CLIP.

The export can be done via the CLI:

```bash
python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
```

For more information, check the documentation.

Contributions

optimum.exporters creation (#403)
Automatic task detection (#445)

Whisper

Whisper can be exported to ONNX using optimum.exporters.
Whisper can also be exported and run using optimum.onnxruntime. IO binding is also supported.

Note: For now, the export from optimum.exporters is not usable by ORTModelForSpeechSeq2Seq. To be able to run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.

Contributions

Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
Fixes and updates in the documentation (#411, #432, #437, #441)
Fixes IOBinding (#454, #461)

- Python
Published by michaelbenayoun over 3 years ago

optimum - v1.4.1: Patch release

Add inference with ORTModel to ORTTrainer and ORTSeq2SeqTrainer #189
Add InferenceSession options and provider to ORTModel #271
Add mT5 (#341) and Marian (#393) support to ORTOptimizer
Add batchnorm folding torch.fx transformations #348
The torch.fx transformations now use the marking methods mark_as_transformed, mark_as_restored, get_transformed_nodes #385
Update BaseConfig for transformers 4.22.0 release #386
Update ORTTrainer for transformers 4.22.1 release #388
Add extra ONNX Runtime quantization options #398
Add possibility to pass provider_options to ORTModel #401
Add support to pass a specific device for ORTModel, as transformers does for pipelines #427
Fixes to support onnxruntime 1.13.1 #430
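
The batchnorm folding mentioned above (#348) relies on a simple algebraic identity: a linear layer followed by a batch normalization layer in inference mode can be rewritten as a single linear layer with rescaled weights and a shifted bias. The following is a minimal pure-Python sketch of that identity, not the torch.fx implementation from the PR; the function names and toy values are illustrative assumptions.

```python
import math


def linear(x, weight, bias):
    # y_i = sum_j weight[i][j] * x[j] + bias[i]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode batch normalization using fixed running statistics.
    return [
        g * (yi - m) / math.sqrt(v + eps) + bt
        for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)
    ]


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Hypothetical helper: absorb the batchnorm scale and shift into the linear layer.
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    folded_weight = [[s * w for w in row] for s, row in zip(scale, weight)]
    folded_bias = [s * (b - m) + bt for s, b, m, bt in zip(scale, bias, mean, beta)]
    return folded_weight, folded_bias


# Toy values chosen only for illustration.
weight = [[1.0, 2.0], [0.5, -1.0]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [4.0, 0.25]
x = [3.0, -2.0]

reference = batchnorm(linear(x, weight, bias), gamma, beta, mean, var)
folded_weight, folded_bias = fold_batchnorm(weight, bias, gamma, beta, mean, var)
folded = linear(x, folded_weight, folded_bias)
```

After folding, the batchnorm node can be dropped from the graph, which saves one elementwise pass per layer at inference time.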

- Python
Published by echarlaix over 3 years ago

optimum - v1.4.0: ORTQuantizer and ORTOptimizer refactorization

ONNX Runtime

Refactorization of ORTQuantizer (#270) and ORTOptimizer (#294)
Add ONNX Runtime fused Adam Optimizer (#295)
Add ORTModelForCustomTasks allowing ONNX Runtime inference support for custom tasks (#303)
Add ORTModelForMultipleChoice allowing ONNX Runtime inference for models with multiple choice classification head (#358)

Torch FX

Add FuseBiasInLinear a transformation that fuses the weight and the bias of linear modules (#253)
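
One way to picture fusing the bias into the weight is to append the bias as an extra weight column and feed a constant 1 as an extra input feature. The sketch below illustrates that idea in pure Python; it is an assumption about the general technique, not the code of the FuseBiasInLinear transformation itself, and all names are hypothetical.

```python
import math


def linear(x, weight, bias=None):
    # Plain linear layer; the bias is optional.
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def fuse_bias(weight, bias):
    # Append each bias term as one extra column of the weight matrix.
    return [row + [b] for row, b in zip(weight, bias)]


weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.3]
x = [1.0, 2.0, 3.0]

reference = linear(x, weight, bias)
# The input gets a trailing constant 1.0 so the extra column acts as the bias.
fused = linear(x + [1.0], fuse_bias(weight, bias))
```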

Improvements and bugfixes

Enable the possibility to disregard the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
Enable node exclusion from quantization for benchmark suite (#284)
Enable possibility to use a token authentication when loading a calibration dataset (#289)
Fix optimum pipeline when no model is given (#301)

- Python
Published by echarlaix almost 4 years ago

optimum - v1.3.0: Torch FX transformations, ORTModelForSeq2SeqLM and ORTModelForImageClassification

Torch FX

The optimum.fx.optimization module (#232) provides a set of torch.fx graph transformations, along with classes and functions to write your own transformations and compose them.

The Transformation and ReversibleTransformation represent non-reversible and reversible transformations, and it is possible to write such transformations by inheriting from those classes
The compose utility function enables transformation composition
Two reversible transformations were added:
- MergeLinears: merges linear layers that have the same input
- ChangeTrueDivToMulByInverse: changes a division by a static value to a multiplication of its inverse
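
Both transformations are value-preserving rewrites. The pure-Python sketch below shows the underlying arithmetic only; it does not use torch.fx or the optimum.fx.optimization API, and the helper names are hypothetical.

```python
import math


def linear(x, weight, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def merge_linears(layers):
    # Stack the weights and biases of linear layers that share the same input.
    merged_weight = [row for weight, _ in layers for row in weight]
    merged_bias = [b for _, bias in layers for b in bias]
    sizes = [len(bias) for _, bias in layers]
    return merged_weight, merged_bias, sizes


def split(output, sizes):
    # Recover the per-layer outputs from the merged output.
    chunks, start = [], 0
    for size in sizes:
        chunks.append(output[start:start + size])
        start += size
    return chunks


# Two projections reading the same input, as query/key projections would.
query = ([[1.0, 0.0], [0.5, 0.5]], [0.0, 0.1])
key = ([[0.0, 2.0]], [-0.2])
x = [3.0, 4.0]

separate = [linear(x, *query), linear(x, *key)]
merged_weight, merged_bias, sizes = merge_linears([query, key])
merged = split(linear(x, merged_weight, merged_bias), sizes)

# Division by a static value becomes a multiplication by its precomputed inverse.
static_divisor = 8.0
inverse = 1.0 / static_divisor
scaled_by_div = [v / static_divisor for v in x]
scaled_by_mul = [v * inverse for v in x]
```

Merging replaces several small matrix multiplications with one larger one, and the inverse is computed once instead of dividing at every forward pass.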

ORTModelForSeq2SeqLM

ORTModelForSeq2SeqLM (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.

- When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
- This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values hidden-states, while during the rest of the generation past key/values will be used to speed up sequential decoding.
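
To make the role of the three exported parts concrete, here is a toy greedy decoding loop in pure Python. The three functions are stand-ins that only record when they are called; they are hypothetical and do not reflect the actual ORTModelForSeq2SeqLM internals.

```python
calls = []


def encoder(input_ids):
    # Stand-in for the exported encoder: runs once per input sequence.
    calls.append("encoder")
    return list(input_ids)


def decoder(token, encoder_states):
    # Stand-in for the decoder without cache: used only for the first step.
    calls.append("decoder")
    past = [token]
    return token + 1, past


def decoder_with_past(token, encoder_states, past):
    # Stand-in for the decoder that reuses pre-computed key/values.
    calls.append("decoder_with_past")
    return token + 1, past + [token]


def generate(input_ids, start_token=0, max_new_tokens=4):
    encoder_states = encoder(input_ids)
    # First pass: no key/values exist yet, so the plain decoder is used.
    next_token, past = decoder(start_token, encoder_states)
    generated = [next_token]
    # Remaining passes: only the newest token is processed, the rest comes from the cache.
    for _ in range(max_new_tokens - 1):
        next_token, past = decoder_with_past(generated[-1], encoder_states, past)
        generated.append(next_token)
    return generated


tokens = generate([5, 6, 7])
```

The call log shows one encoder call, one cache-less decoder call, and then only cached decoder calls for the rest of the generation.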

Below is an example that downloads a T5 model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load model from hub and export it through the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```

ORTModelForImageClassification

ORTModelForImageClassification (#226) allows ONNX Runtime inference for models with an image classification head.

Below is an example that downloads a ViT model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```

ORTOptimizer

Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (fp16) to OptimizationConfig (#273).
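
Converting weights to fp16 halves their storage size at the cost of precision and range. The sketch below uses only the Python standard library (the `struct` module's IEEE 754 half-precision format `e`) to show the rounding a single value goes through; it is an illustration of the number format, not of how ORTOptimizer performs the conversion.

```python
import struct


def to_fp16(value):
    # Round-trip a Python float through IEEE 754 half precision.
    return struct.unpack("<e", struct.pack("<e", value))[0]


weight = 0.1
rounded = to_fp16(weight)
rounding_error = abs(rounded - weight)

# Values exactly representable in half precision survive unchanged.
exact = to_fp16(1.0)

# Bytes needed per value in single precision versus half precision.
fp32_size = struct.calcsize("<f")
fp16_size = struct.calcsize("<e")
```

Values whose magnitude exceeds the half-precision range cannot be packed at all, which is one reason fp16 models should be validated for accuracy after conversion.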

Pipelines

Additional pipeline tasks are now supported. Here is a list of the supported tasks along with the default model for each:

Image Classification (ViT)
Text-to-Text Generation (T5 small)
Summarization (T5 base)
Translation (T5 base)

Below is an example that downloads a T5 small model from the Hub and loads it with transformers pipeline for translation:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```

Breaking change

The ORTModelForXXX execution provider default value is now set to CPUExecutionProvider (#203). Before, if no execution provider was provided, it was set to CUDAExecutionProvider if a gpu was detected, or to CPUExecutionProvider otherwise.
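
The change in default behavior can be summarized with a small decision function. This is a hypothetical pure-Python sketch of the rule described above, not code from the library; `resolve_provider`, `gpu_detected` and `legacy` are names invented for illustration.

```python
def resolve_provider(provider=None, gpu_detected=False, legacy=False):
    # An explicitly requested provider is always used as-is.
    if provider is not None:
        return provider
    # Old behavior: silently pick CUDA when a GPU was detected.
    if legacy and gpu_detected:
        return "CUDAExecutionProvider"
    # New behavior: always default to CPU.
    return "CPUExecutionProvider"


old_default_on_gpu = resolve_provider(gpu_detected=True, legacy=True)
new_default_on_gpu = resolve_provider(gpu_detected=True)
explicit = resolve_provider(provider="CUDAExecutionProvider")
```

In practice this means that code relying on automatic GPU selection now has to request the CUDA execution provider explicitly.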

- Python
Published by echarlaix almost 4 years ago

optimum - v1.2.3: Patch release

Remove intel sub-package, migrating to optimum-intel (#212)
Fix the loading and saving of ORTModel optimized and quantized models (#214)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.2: Patch release

Extend QuantizationPreprocessor to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
Introduce unified approach to create transformers vs optimized models benchmark (https://github.com/huggingface/optimum/pull/194)
Bump huggingface_hub version and protobuf fix (https://github.com/huggingface/optimum/pull/205)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.1: Patch release

Add support to Python version 3.7 (https://github.com/huggingface/optimum/pull/176)

- Python
Published by echarlaix about 4 years ago

optimum - v1.2.0: pipeline and AutoModelForXxx classes to run ONNX Runtime inference

ORTModel

ORTModelForXXX classes such as ORTModelForSequenceClassification were integrated with the Hugging Face Hub in order to easily export models through the ONNX format and load ONNX models, as well as to save the resulting model and push it to the 🤗 Hub using the save_pretrained and push_to_hub methods respectively. An already optimized and / or quantized ONNX model can also be loaded with the from_pretrained method of the ORTModelForXXX classes.

Below is an example that downloads a DistilBERT model from the Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```

Pipelines

Built-in support for transformers pipelines was added. This allows us to leverage the same API used from Transformers, with the power of accelerated runtimes such as ONNX Runtime.

The currently supported tasks with the default model for each are the following:

Text Classification (DistilBERT model fine-tuned on SST-2)
Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
Token Classification (BERT large fine-tuned on CoNLL2003)
Feature Extraction (DistilBERT)
Zero Shot Classification (BART model fine-tuned on MNLI)
Text Generation (DistilGPT2)

Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format and loads it with transformers pipeline for question-answering.

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# load vanilla transformers and convert to onnx
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# test the model using transformers pipeline, with handle_impossible_answer for squad_v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```

Improvements

Add the loss when performing the evaluation step using an instance of ORTTrainer, which was previously not available when inference was performed with ONNX Runtime (#152)

- Python
Published by echarlaix about 4 years ago

optimum - v1.1.1: Patch release

Habana

Installation details added for Optimum-Habana which provides optimized transformers integration for Intel's Habana Gaudi Processor (HPU).

ONNX Runtime

Add the possibility to specify the execution provider in ORTModel.
Add IncludeFullyConnectedNodes class to find the nodes composing the fully connected layers in order to (only) target the latter for quantization to limit the accuracy drop.
Update QuantizationPreprocessor so that the intersection of the two sets representing the nodes to quantize and the nodes to exclude from quantization is an empty set.
Rename Seq2SeqORTTrainer to ORTSeq2SeqTrainer for clarity and to keep consistency.
Add ORTOptimizer support for ELECTRA models.
Fix the loading of pretrained ORTConfig which contains optimization and quantization config.
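
The QuantizationPreprocessor invariant mentioned above (no node is both quantized and excluded) can be expressed with plain set operations. The sketch below is a hypothetical illustration: the node names are made up, and the choice that exclusion takes precedence over inclusion is an assumption, not something stated in these notes.

```python
def resolve_nodes(nodes_to_quantize, nodes_to_exclude):
    # Assumption: a node listed in both sets ends up excluded.
    exclude = set(nodes_to_exclude)
    quantize = set(nodes_to_quantize) - exclude
    return quantize, exclude


# Made-up ONNX node names, for illustration only.
candidates = {"MatMul_0", "MatMul_1", "Add_0", "Gelu_0"}
excluded = {"Gelu_0", "Add_0", "LayerNorm_0"}

quantize, exclude = resolve_nodes(candidates, excluded)
overlap = quantize & exclude
```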

- Python
Published by JingyaHuang about 4 years ago

optimum - v1.1.0: ORTTrainer, Seq2SeqORTTrainer, ONNX Runtime optimization and quantization API improvements

ORTTrainer and Seq2SeqORTTrainer

The ORTTrainer and Seq2SeqORTTrainer are two new experimental classes.

- Both ORTTrainer and Seq2SeqORTTrainer were created to have a user-facing API similar to that of the Trainer and Seq2SeqTrainer of the Transformers library.
- ORTTrainer allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized automatically-exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- ORTTrainer allows the usage of ONNX Runtime inferencing during both the evaluation and the prediction step.
- For Seq2SeqORTTrainer, ONNX Runtime inferencing is incompatible with --predict_with_generate, as the generate method is not supported yet.

ONNX Runtime optimization and quantization APIs improvements

The ORTQuantizer and ORTOptimizer classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.

Addition of the possibility to iteratively compute the quantization activation ranges when applying static quantization by using the ORTQuantizer method partial_fit. This is especially useful when using memory-hungry calibration methods such as Entropy and Percentile methods.
When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activations quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
The classes OptimizationConfig, QuantizationConfig and CalibrationConfig were added in order to better segment the different ONNX Runtime related parameters instead of having one unique configuration ORTConfig.
The QuantizationPreprocessor class was added in order to find the nodes to include in and / or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm for example). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
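
The difference between a global MinMax range and a moving-average range, computed batch by batch as partial_fit allows, can be sketched in pure Python. The update rule and the `momentum` value below are assumptions chosen for illustration; the exact formula used by ONNX Runtime may differ.

```python
import math


def update_global(current, batch):
    # Keep the smallest minimum and largest maximum seen so far.
    low, high = min(batch), max(batch)
    if current is None:
        return (low, high)
    return (min(current[0], low), max(current[1], high))


def update_moving_average(current, batch, momentum=0.9):
    # Blend the previous range with the range of the new batch.
    low, high = min(batch), max(batch)
    if current is None:
        return (low, high)
    return (
        momentum * current[0] + (1 - momentum) * low,
        momentum * current[1] + (1 - momentum) * high,
    )


# Activations observed over three calibration batches; the last one has outliers.
batches = [[-1.0, 2.0], [-0.5, 1.5], [-8.0, 9.0]]

global_range, moving_range = None, None
for batch in batches:
    # Only one batch is held in memory at a time.
    global_range = update_global(global_range, batch)
    moving_range = update_moving_average(moving_range, batch)
```

With these toy values the global range is stretched to cover the outlier batch, while the moving-average range stays much closer to the typical activations, which leaves more quantization resolution for common values.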

- Python
Published by echarlaix about 4 years ago

optimum - v1.0.0: ONNX Runtime optimization and quantization support

ONNX Runtime support

An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of ORTOptimizer, the user needs to provide an ORTConfig object, defining the export and graph-level transformation information. Then optimization can be performed by calling the ORTOptimizer.fit method.
ONNX Runtime static and dynamic quantization can also be applied on a model by using the newly added ORTQuantizer class. In order to create an instance of ORTQuantizer, the user needs to provide an ORTConfig object, defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Then quantization can be applied by calling the ORTQuantizer.fit method.

Additional features for Intel Neural Compressor

We have also added a new class called IncOptimizer which will take care of combining the pruning and the quantization processes.

- Python
Published by echarlaix over 4 years ago

optimum - v0.1.2: Intel Neural Compressor's pruning support

With this release, we enable Intel Neural Compressor v1.8 magnitude pruning for a variety of NLP tasks with the introduction of IncTrainer which handles the pruning process.

- Python
Published by echarlaix over 4 years ago

optimum - v0.1.1: Intel Neural Compressor's dynamic, post-training and aware-training quantization support

With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training and aware-training quantization for a variety of NLP tasks. This support includes the overall process, from quantization application to the loading of the resulting quantized model. The latter is enabled by the introduction of the IncQuantizedModel class.

- Python
Published by echarlaix over 4 years ago

optimum - Optimum v0.0.1 - EAP

Initial release for early access to Optimum library featuring Intel's LPOT quantization and pruning support.

- Python
Published by mfuntowicz almost 5 years ago
