Recent Releases of optimum
optimum - v1.27.0: Last release before v2, Transformers 4.53 support, SmolLM3, VisualBert...
🚀 Major Upgrades
- Transformers v4.53 support and SmolLM3 model addition by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2326
- Batched inference support across all decoders by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2319
- VisualBert support by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2303
🔧 Enhancements & Fixes
- Fix taskmanager by @echarlaix in https://github.com/huggingface/optimum/pull/2296
- Add task onnx register by @echarlaix in https://github.com/huggingface/optimum/pull/2291
- ExporterConfig refactorization by @echarlaix in https://github.com/huggingface/optimum/pull/2157
- remove timm from exporters extra by @echarlaix in https://github.com/huggingface/optimum/pull/2299
- No more forcing separators by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2279
- Fix broken Trainer documentation link in README by @VolodymyrBg in https://github.com/huggingface/optimum/pull/2304
- Propagate `library_name` parameter in `from_pretrained` to export by @tomaarsen in https://github.com/huggingface/optimum/pull/2328
- Fix 'Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`' while loading Qwen VL GPTQ model by @arunmadhusud in https://github.com/huggingface/optimum/pull/2295
🧹 Deprecations & v2
- Deprecated support for TFLite, BetterTransformer, and ONNXRuntime‑Training; these integrations will be fully removed in v2.
- TensorFlow model export will be removed in v2, consistent with the Transformers library dropping TF/JAX support.
- ONNX and ONNXRuntime integrations will move into the new Optimum‑ONNX package.
New Contributors
- @dependabot[bot] made their first contribution in https://github.com/huggingface/optimum/pull/2292
- @arunmadhusud made their first contribution in https://github.com/huggingface/optimum/pull/2295
- @VolodymyrBg made their first contribution in https://github.com/huggingface/optimum/pull/2304
Full Changelog: https://github.com/huggingface/optimum/compare/v1.26.1...v1.27.0
- Python
Published by IlyasMoutawwakil 11 months ago
optimum - v1.26.1: Patch release
Add back from_transformers for base model by @echarlaix in https://github.com/huggingface/optimum/pull/2288
- Python
Published by echarlaix about 1 year ago
optimum - v1.26.0: ColPali, D-FINE, InternLM2
ONNX export
- D-FINE support by @xenova in https://github.com/huggingface/optimum/pull/2249
- ColPali support by @Balladie in https://github.com/huggingface/optimum/pull/2251
- InternLM2 support by @gmf14 in https://github.com/huggingface/optimum/pull/2244
- Chinese CLIP support by @xenova in https://github.com/huggingface/optimum/pull/1591
New features & enhancements
- Add onnxslim support by @inisis in https://github.com/huggingface/optimum/pull/2258
- Introduce ORTSessionMixin and enable general io binding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2234
- Fix and uniformize hub kwargs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2276
- Add compatibility with transformers 4.52 by @echarlaix in https://github.com/huggingface/optimum/pull/2270
- Distribute and complete onnxruntime tests (decoder models) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2278
- Add ONNX Runtime optimization support for ModernBERT by @amas0 in https://github.com/huggingface/optimum/pull/2208
New Contributors
- @inisis made their first contribution in https://github.com/huggingface/optimum/pull/2258
- @Balladie made their first contribution in https://github.com/huggingface/optimum/pull/2251
- @gmf14 made their first contribution in https://github.com/huggingface/optimum/pull/2244
- @amas0 made their first contribution in https://github.com/huggingface/optimum/pull/2208
- Python
Published by echarlaix about 1 year ago
optimum - v1.25.3: Patch release
- Fix ORT pipelines by @echarlaix in https://github.com/huggingface/optimum/pull/2274
Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.2...v1.25.3
- Python
Published by echarlaix about 1 year ago
optimum - v1.25.2: Patch release
What's Changed
- Upgrade optimum-intel in setup extras by @echarlaix in https://github.com/huggingface/optimum/pull/2271
- Match transformers behavior with return_dict by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2269
Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.1...v1.25.2
- Python
Published by IlyasMoutawwakil about 1 year ago
optimum - v1.25.1: Patch release
What's Changed
- Updated readme/pypi page by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2268
- Fix bug ORTModelForFeatureExtraction by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2267
- Fix doc TPU section by @echarlaix in https://github.com/huggingface/optimum/pull/2265
Full Changelog: https://github.com/huggingface/optimum/compare/v1.25.0...v1.25.1
- Python
Published by IlyasMoutawwakil about 1 year ago
optimum - v1.25.0: ViTPose, RT-DETR, EfficientNet, Moonshine ONNX
:rocket: New Features & Enhancements
- Add ONNX export support for ViTPose, RT-DETR, EfficientNet, Moonshine
- Infer if the model needs to be exported to ONNX during loading
```diff
  from optimum.onnxruntime import ORTModelForCausalLM

  model_id = "meta-llama/Llama-3.2-1B"
- model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
```
- Transformers v4.49, v4.50 and v4.51 compatibility
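As a rough illustration of the export inference listed above, the check could look like the sketch below. It is a simplified heuristic for a local directory only, not Optimum's actual logic (which also has to handle Hub repositories); the helper name `needs_onnx_export` and its `subfolder` parameter are hypothetical.

```python
from pathlib import Path


def needs_onnx_export(model_dir: str, subfolder: str = "") -> bool:
    """Return True when no ONNX file is found, i.e. an export would be required.

    Hypothetical helper: it only inspects a local directory.
    """
    search_root = Path(model_dir) / subfolder
    # Any *.onnx file (including in nested folders) counts as an exported model.
    return not any(search_root.rglob("*.onnx"))
```

With such a check, a loader can default to exporting only when no ONNX weights are present, which is why `export=True` no longer needs to be passed explicitly.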
:busts_in_silhouette: New Contributors
A huge thank you to our first-time contributors:
- @ruidazeng
- @ariG23498
- @janak2
- @qubvel
- @zhxchen17
- @xieofxie
- @EFord36
- @Thas-Tayapongsak
- @hans00
- @Abdennacer-Badaoui
What's Changed
- Update ort training installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/2173
- Dev version by @echarlaix in https://github.com/huggingface/optimum/pull/2175
- Fixed All Typos in docs by @ruidazeng in https://github.com/huggingface/optimum/pull/2185
- Remove deprecated ORTModel class by @echarlaix in https://github.com/huggingface/optimum/pull/2187
- avoid library_name guessing if it is known in parameters standardization by @eaidova in https://github.com/huggingface/optimum/pull/2179
- Infer whether a model needs to be exported to ONNX or not by @echarlaix in https://github.com/huggingface/optimum/pull/2181
- Update optimum neuron extra by @dacorvo in https://github.com/huggingface/optimum/pull/2190
- Add support for Moonshine ONNX export (& seq2seq models with non-legacy cache & `Tensor.repeat_interleave`) by @xenova in https://github.com/huggingface/optimum/pull/2162
- ViTPose by @ariG23498 in https://github.com/huggingface/optimum/pull/2183
- ViTPose export fix by @echarlaix in https://github.com/huggingface/optimum/pull/2192
- Remove ORTTrainer code snippet from README by @echarlaix in https://github.com/huggingface/optimum/pull/2194
- Remove README code snippets by @echarlaix in https://github.com/huggingface/optimum/pull/2195
- Add transformers v4.49 support by @echarlaix in https://github.com/huggingface/optimum/pull/2191
- Fix test benchmark suite by @echarlaix in https://github.com/huggingface/optimum/pull/2199
- fix the onnx export custom model example; fix repo name; fix opset version; remove deprecated arg; by @janak2 in https://github.com/huggingface/optimum/pull/2203
- Limit transformers version for bettertransformer support by @echarlaix in https://github.com/huggingface/optimum/pull/2198
- Add ONNX config for RT-DETR (and RT-DETRv2) by @qubvel in https://github.com/huggingface/optimum/pull/2201
- Remove deprecated notebook by @echarlaix in https://github.com/huggingface/optimum/pull/2205
- Update CI runner to ubuntu 22.04 by @echarlaix in https://github.com/huggingface/optimum/pull/2206
- Add executorch documentation section by @echarlaix in https://github.com/huggingface/optimum/pull/2193
- Fix typo in exporters/onnx/utils.py by @zhxchen17 in https://github.com/huggingface/optimum/pull/2210
- Link Optimum-ExecuTorch to parent Optimum on Hub by @guangy10 in https://github.com/huggingface/optimum/pull/2222
- Fix CI and update Transformers (4.51.1) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2225
- Remove FP16_Optimizer patch for DeepSpeed by @Rohan138 in https://github.com/huggingface/optimum/pull/2213
- Fix diffusers by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2229
- Remove diffusers extra by @echarlaix in https://github.com/huggingface/optimum/pull/2207
- TRT engine docs by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1396
- Always use a default user agent by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2230
- dedup `_get_model_external_data_paths` by @xieofxie in https://github.com/huggingface/optimum/pull/2217
- Clean up workflows by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2231
- reduce area of patch_everywhere for avoid unexpected replacements by @eaidova in https://github.com/huggingface/optimum/pull/2237
- add dinov2 onnx optimizer support by @EFord36 in https://github.com/huggingface/optimum/pull/2227
- Fix code quality test by @echarlaix in https://github.com/huggingface/optimum/pull/2239
- Add onnx export for efficientnet by @Thas-Tayapongsak in https://github.com/huggingface/optimum/pull/2214
- add loading image processor by @eaidova in https://github.com/huggingface/optimum/pull/2254
- Fix `CLIPSdpaAttention` had dropped since v4.48 by @hans00 in https://github.com/huggingface/optimum/pull/2245
- Increase clip opset by @echarlaix in https://github.com/huggingface/optimum/pull/2256
- Add feature extraction support for image models by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2255
- adding token classification task for qwen2 by @Abdennacer-Badaoui in https://github.com/huggingface/optimum/pull/2261
- upgrade min transformers version for phi3 by @echarlaix in https://github.com/huggingface/optimum/pull/2263
- Python
Published by echarlaix about 1 year ago
optimum - v1.24.0: SD3 & Flux, DinoV2, Modernbert, GPTQModel, Transformers v4.48...
Release Notes: Optimum v1.24.0
We’re excited to announce the release of Optimum v1.24.0. This update expands ONNX-based model capabilities and includes several improvements, bug fixes, and new contributions from the community.
:rocket: New Features & Enhancements
- `ORTQuantizer` now supports models with ONNX subfolders.
- ONNX Runtime IO Binding support for all supported Transformers models (no models left behind).
- SD3 and Flux model support added to `ORTDiffusionPipeline` enabling latest diffusion-based models.
- Transformers v4.47 and v4.48 compatibility, ensuring seamless integration with the latest advancements in Hugging Face's ecosystem.
- ONNX export support extended to various models, including Decision Transformer, ModernBERT, Megatron-BERT, Dinov2, OLMo, and many more (see details).
:wrench: Key Fixes & Optimizations
- Dropped support for Python 3.8
- Bug fixes in `ModelPatcher`, SDXL refiner export, and device checks for improved reliability.
:busts_in_silhouette: New Contributors
A huge thank you to our first-time contributors:
- @gabe-l-hart
- @ra9hur
- @bndos
- @mlynatom
- @LoSealL
- @sjrl
- @guangy10
- @LRL-ModelCloud
- @pragyandev
Your contributions make Optimum better! :tada:
For a detailed list of all changes, please check out the full changelog.
:rocket: Happy optimizing!
What's Changed
- Python
Published by IlyasMoutawwakil over 1 year ago
optimum - v1.23.3: Patch release
- Add sentence-transformers and timm documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/2072
- Create token type ids when not provided by @echarlaix in https://github.com/huggingface/optimum/pull/2081
- Add transformers v4.46 support by @echarlaix in https://github.com/huggingface/optimum/pull/2078
- Python
Published by echarlaix over 1 year ago
optimum - v1.23.2: Patch release
- Fix compatibility with diffusers < 0.25.0 #2063 @echarlaix
- Update the habana extra #2077 @regisss
Full Changelog: https://github.com/huggingface/optimum/compare/v1.23.1...v1.23.2
- Python
Published by regisss over 1 year ago
optimum - v1.23.1: Patch release
- Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/2050
- Don't hardcode the logger level to INFO let users set TRANSFORMERS_VERBOSITY by @tomaarsen in https://github.com/huggingface/optimum/pull/2047
- Add workflow to mark issues as stale by @regisss in https://github.com/huggingface/optimum/pull/2051
- Fix onnx export when transformers >= v4.45 (impacting sentence-transformers and timm models) by @echarlaix in https://github.com/huggingface/optimum/pull/2053 and https://github.com/huggingface/optimum/pull/2054
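Since the logger level is no longer hardcoded to INFO, verbosity can be controlled through the `TRANSFORMERS_VERBOSITY` environment variable. A minimal sketch, assuming the variable is set before `optimum` or `transformers` is imported; the helper `configure_verbosity` and the `VALID_LEVELS` tuple are illustrative names, not part of either library.

```python
import os

# Levels accepted by the Transformers logging utilities.
VALID_LEVELS = ("debug", "info", "warning", "error", "critical")


def configure_verbosity(level: str) -> None:
    """Set TRANSFORMERS_VERBOSITY after validating the requested level (hypothetical helper)."""
    level = level.lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"Unknown verbosity level: {level!r}")
    os.environ["TRANSFORMERS_VERBOSITY"] = level


# Call this before importing optimum or transformers so the loggers pick it up.
configure_verbosity("error")
```

The same effect can be obtained from the shell by exporting the variable before launching the Python process.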
- Python
Published by echarlaix over 1 year ago
optimum - v1.23.0: ORTDiffusionPipeline, transformers v4.45
ONNX Runtime Diffusion pipeline
Adding ORTDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1960 and https://github.com/huggingface/optimum/pull/2021
```diff
  model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, revision="onnx")
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
  image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
```
Transformers v4.45
Transformers v4.45 support by @echarlaix in https://github.com/huggingface/optimum/pull/2023 and https://github.com/huggingface/optimum/pull/2045
Subfolder
Remove the restriction for the model's config to be in the model's subfolder by @echarlaix in https://github.com/huggingface/optimum/pull/2044
New Contributors
- @tcsavage made their first contribution in https://github.com/huggingface/optimum/pull/1965
- @yuanwu2017 made their first contribution in https://github.com/huggingface/optimum/pull/2003
- @h3110Fr13nd made their first contribution in https://github.com/huggingface/optimum/pull/2031
- @glegendre01 made their first contribution in https://github.com/huggingface/optimum/pull/2033
- @rbrugaro made their first contribution in https://github.com/huggingface/optimum/pull/2027
Full Changelog: https://github.com/huggingface/optimum/compare/v1.22.0...v1.23.0
- Python
Published by echarlaix over 1 year ago
optimum - v1.22.0: transformers 4.44 compatibility, bugfixes
What's Changed
- Fix sentence transformers modeling patching for export by @echarlaix in https://github.com/huggingface/optimum/pull/1936
- Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
- Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937
- Remove inplace op in mistral patcher by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1938
- Fix forward bug in ORTModelForFeatureExtraction by @moria97 in https://github.com/huggingface/optimum/pull/1941
- Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
- Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
- Clip vision model onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1920
- Add export test for swin with shifted windows by @echarlaix in https://github.com/huggingface/optimum/pull/1942
- Refactor diffusers tasks by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1947
- Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
- Fix `unmask_unattended_patched` signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
- Fix undefined variable in library name inference by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1964
- Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
- Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971
- chore(ci): migrate runner configuration in GitHub workflows by @XciD in https://github.com/huggingface/optimum/pull/1978
- Fix typos in quantization.mdx by @aldakata in https://github.com/huggingface/optimum/pull/1989
- Update Habana extra in setup.py by @regisss in https://github.com/huggingface/optimum/pull/1991
- Follow up the diffusers task refactoring by @JingyaHuang in https://github.com/huggingface/optimum/pull/1999
- Transformers 4.44 support by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1996
- Modify token classification processor default dataset args by @echarlaix in https://github.com/huggingface/optimum/pull/2005
- Fix TFLite tests by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/2007
- Fix attribute name from `inputs_names` to `input_names` by @J4BEZ in https://github.com/huggingface/optimum/pull/2010
- Fix typo in BetterTransformer's overview docs by @ftnext in https://github.com/huggingface/optimum/pull/2015
- Apply deprecated `evaluation_strategy` by @muellerzr in https://github.com/huggingface/optimum/pull/1819
- Update transformers imports for `deepspeed` and `is_torch_xla_available` by @Rohan138 in https://github.com/huggingface/optimum/pull/2012
- Add quanto install and instructions by @dacorvo in https://github.com/huggingface/optimum/pull/1976
New Contributors
- @moria97 made their first contribution in https://github.com/huggingface/optimum/pull/1941
- @XciD made their first contribution in https://github.com/huggingface/optimum/pull/1978
- @zhenglongjiepheonix made their first contribution in https://github.com/huggingface/optimum/pull/1933
- @aldakata made their first contribution in https://github.com/huggingface/optimum/pull/1989
- @J4BEZ made their first contribution in https://github.com/huggingface/optimum/pull/2010
- @ftnext made their first contribution in https://github.com/huggingface/optimum/pull/2015
- @muellerzr made their first contribution in https://github.com/huggingface/optimum/pull/1819
- @Rohan138 made their first contribution in https://github.com/huggingface/optimum/pull/2012
Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.4...v1.22.0
- Python
Published by echarlaix almost 2 years ago
optimum - v1.21.4: Patch release
- Update Habana extra in setup.py by @regisss in #1991
Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.3...v1.21.4
- Python
Published by regisss almost 2 years ago
optimum - v1.21.3: Patch release
- Deprecate ORTModel class by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1939
- Remove warning by @echarlaix in https://github.com/huggingface/optimum/pull/1945
- Fix optimizer's command line reading by @idruker-cerence in https://github.com/huggingface/optimum/pull/1961
- Fix `unmask_unattended_patched` signature by @fxmarty in https://github.com/huggingface/optimum/pull/1963
- Fix gpt bigcode ONNX export for transformers<4.39.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1973
- Support transformers 4.43 by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1971
Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.2...v1.21.3
- Python
Published by echarlaix almost 2 years ago
optimum - v1.21.2: Patch release
- Remove inplace op in mistral patcher by @IlyasMoutawwakil in #1938
- Fix ORTModelForFeatureExtraction modeling by @moria97 in #1941
Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.1...v1.21.2
- Python
Published by echarlaix almost 2 years ago
optimum - v1.21.1: Patch release
- Fix sentence transformers model patching by @echarlaix in https://github.com/huggingface/optimum/pull/1936
- Update Intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1935
- Update Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1937
Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.0...v1.21.1
- Python
Published by echarlaix almost 2 years ago
optimum - v1.21.0: many bugfixes, transformers 4.42 compatibility
What's Changed
- ORTOptimizer for the model type Segformer by @zachmayer in https://github.com/huggingface/optimum/pull/1820
- fix: device consistence by @Daya-Jin in https://github.com/huggingface/optimum/pull/1891
- Allow optimum to discover and load subpackages by @dacorvo in https://github.com/huggingface/optimum/pull/1894
- feat(ci): add trufflehog secrets detector by @McPatate in https://github.com/huggingface/optimum/pull/1899
- fix(ci): remove unnecessary permissions by @McPatate in https://github.com/huggingface/optimum/pull/1904
- Remove read token by @fxmarty in https://github.com/huggingface/optimum/pull/1903
- Remove dataset with restrictive license by @echarlaix in https://github.com/huggingface/optimum/pull/1910
- Fix Windows and onnx dtype compatibility by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1886
- Deprecated `use_auth_token` by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1837
- Add redirection for optimum intel doc by @echarlaix in https://github.com/huggingface/optimum/pull/1918
- Read `use_external_data_format` from ORTConfig file by @idruker-cerence in https://github.com/huggingface/optimum/pull/1917
- Pin numpy v1 for onnxruntime by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1921
- Fix GPTQ CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1878
- Fix code quality by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1928
- Fix incorrect names for usage blenderbot for causallm by @eaidova in https://github.com/huggingface/optimum/pull/1887
- Fixed bug key error "last_hidden_state" by @satishsilveri in https://github.com/huggingface/optimum/pull/1674
- Support transformers 4.42 by @fxmarty in https://github.com/huggingface/optimum/pull/1929
New Contributors
- @zachmayer made their first contribution in https://github.com/huggingface/optimum/pull/1820
- @Daya-Jin made their first contribution in https://github.com/huggingface/optimum/pull/1891
- @dacorvo made their first contribution in https://github.com/huggingface/optimum/pull/1894
- @McPatate made their first contribution in https://github.com/huggingface/optimum/pull/1899
- @idruker-cerence made their first contribution in https://github.com/huggingface/optimum/pull/1917
- @satishsilveri made their first contribution in https://github.com/huggingface/optimum/pull/1674
Full Changelog: https://github.com/huggingface/optimum/compare/v1.20.0...v1.21.0
- Python
Published by fxmarty almost 2 years ago
optimum - v1.20.0: VITS, Phi-3 ONNX export
Extended ONNX export
- VITS ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1607
- Phi-3 ONNX export by @JingyaHuang in https://github.com/huggingface/optimum/pull/1870
- Add Phi-3 normalized config by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/1841
- Add Phi-3 small normalized config by @JingyaHuang in https://github.com/huggingface/optimum/pull/1864
Other changes and bugfixes
- Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
- Remove call to `apt update` before `apt purge` in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830
- Update github workflows by @echarlaix in https://github.com/huggingface/optimum/pull/1829
- Remove bad PPA in main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1831
- Fix TPU doc build by @regisss in https://github.com/huggingface/optimum/pull/1834
- Fix sentence transformers models infer library by @echarlaix in https://github.com/huggingface/optimum/pull/1832
- Fix random initialization of bias when using GPTQ quantization with models without bias by @B-201 in https://github.com/huggingface/optimum/pull/1827
- Update the Transformers dependency in the Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1851
- Make stable diffusion unet and vae number of channels static by @eaidova in https://github.com/huggingface/optimum/pull/1840
- Fix compatibility with transformers v4.41.0 for ONNX by @echarlaix in https://github.com/huggingface/optimum/pull/1860
- Fix FX CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1866
- Fix Utils CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1867
- Fix BT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1872
- Fix ORTConfig loading by @mr-sarthakgupta in https://github.com/huggingface/optimum/pull/1879
- Update ORT doc for ROCM 6.0 by @mht-sharma in https://github.com/huggingface/optimum/pull/1862
- Fix ort config instantiation (`from_pretrained`) and saving (`save_pretrained`) by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1865
- Fix ORT CI by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1875
- Update optimum intel extra by @echarlaix in https://github.com/huggingface/optimum/pull/1882
- Bump transformers version for neuron extras by @JingyaHuang in https://github.com/huggingface/optimum/pull/1881
New Contributors
- @B-201 made their first contribution in https://github.com/huggingface/optimum/pull/1827
- @mr-sarthakgupta made their first contribution in https://github.com/huggingface/optimum/pull/1879
Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.20.0
- Python
Published by echarlaix about 2 years ago
optimum - v1.19.2: Patch release
- Update the Transformers dependency in the Habana extra #1851 @regisss
Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.1...v1.19.2
- Python
Published by regisss about 2 years ago
optimum - v1.19.1: Patch release
- Bump transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1824
- Remove call to `apt update` before `apt purge` in the main doc build workflow by @regisss in https://github.com/huggingface/optimum/pull/1830
Full Changelog: https://github.com/huggingface/optimum/compare/v1.19.0...v1.19.1
- Python
Published by echarlaix about 2 years ago
optimum - v1.19.0: Musicgen, MarkupLM ONNX export
Extended ONNX export
Musicgen and MarkupLM models from Transformers can now be exported to ONNX through optimum-cli export onnx. Musicgen ONNX export is used to run the model locally in a browser through transformers.js.
- Musicgen ONNX export (text-conditional only) by @fxmarty in https://github.com/huggingface/optimum/pull/1779
- Add support for markuplm ONNX export by @pogzyb in https://github.com/huggingface/optimum/pull/1784
Other changes and bugfixes
- Fix IR version for merged ONNX decoders by @fxmarty in https://github.com/huggingface/optimum/pull/1780
- Update test model id by @echarlaix in https://github.com/huggingface/optimum/pull/1785
- Add Nvidia and Neuron to README by @JingyaHuang in https://github.com/huggingface/optimum/pull/1791
- adds debug options to dump onnx graphs by @prathikr in https://github.com/huggingface/optimum/pull/1789
- Improve PR template by @fxmarty in https://github.com/huggingface/optimum/pull/1799
- Add Google TPU to the mix by @mfuntowicz in https://github.com/huggingface/optimum/pull/1797
- Add redirection for Optimum TPU by @regisss in https://github.com/huggingface/optimum/pull/1801
- Add Nvidia and Neuron to the installation doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1803
- Update installation instructions by @echarlaix in https://github.com/huggingface/optimum/pull/1806
- Fix offline compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1805
- Remove unnecessary constants for > 2GB ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1808
- Add onnx export function for pix2struct model by @naormatania in https://github.com/huggingface/optimum/pull/1815
New Contributors
- @pogzyb made their first contribution in https://github.com/huggingface/optimum/pull/1784
- @naormatania made their first contribution in https://github.com/huggingface/optimum/pull/1815
Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.19.0
- Python
Published by fxmarty about 2 years ago
optimum - v1.18.1: Patch release
Fix the installation for Optimum Neuron v0.0.21 release
- Improve the installation of optimum-neuron through optimum extras #1778
Fix the task inference of stable diffusion
- Fix infer task for stable diffusion #1793
Full Changelog: https://github.com/huggingface/optimum/compare/v1.18.0...v1.18.1
- Python
Published by JingyaHuang about 2 years ago
optimum - v1.18.0: Gemma, OWLv2, MPNet Qwen2 ONNX support
New architectures ONNX export :
- OWLv2 by @xenova in #1689
- Gemma by @fxmarty in #1714
- MPNet by @nathan-az in #1471
- Qwen2 by @uniartisan in #1746
Other changes and bugfixes
- Fix starcoder ORT integration by @fxmarty in #1722
- Fix `use_auth_token` with ORTModel by @fxmarty in #1740
- Fix compatibility with transformers `v4.39.0` by @echarlaix in #1764
- Python
Published by echarlaix about 2 years ago
optimum - v1.17.1: Patch release
Update Transformers dependency for the release of Optimum Habana v1.10.2
- Update Transformers dependency in Habana extra #1700
Full Changelog: https://github.com/huggingface/optimum/compare/v1.17.0...v1.17.1
- Python
Published by regisss over 2 years ago
optimum - v1.17.0: Improved ONNX support & many bugfixes
ONNX export from nn.Module
A function is exposed to programmatically export any nn.Module (e.g. models coming from Transformers, but modified). This is useful in case you need to do some modifications on models loaded from the Hub before exporting. Example:
```python
from transformers import AutoModelForImageClassification
from optimum.exporters.onnx import onnx_export_from_model

model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Here one could do any modification on the model before the export.
onnx_export_from_model(model, output="vit_onnx")
```
- Enable model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1649
ONNX export with static shapes
The Optimum ONNX export CLI allows disabling dynamic shapes for inputs/outputs:
optimum-cli export onnx --model timm/ese_vovnet39b.ra_in1k out_vov --no-dynamic-axes
This is useful if the exported model is to be consumed by a runtime that does not support dynamic shapes. The static shape can be specified e.g. with --batch_size 1. See all the shape options in optimum-cli export onnx --help.
- Enable export of model with fixed shape by @mht-sharma in https://github.com/huggingface/optimum/pull/1643
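For scripted exports, the static-shape command above can be assembled from Python. The sketch below only uses the flags mentioned in this section (`--no-dynamic-axes`, `--batch_size`); the helper names `build_static_export_command` and `run_export` are hypothetical, and actually running the export assumes `optimum-cli` is installed and on the PATH.

```python
import shlex
import subprocess
from typing import List


def build_static_export_command(model_id: str, output_dir: str, batch_size: int = 1) -> List[str]:
    """Build the optimum-cli invocation for a fixed-shape ONNX export."""
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--no-dynamic-axes",              # disable dynamic input/output shapes
        "--batch_size", str(batch_size),  # static batch dimension
        output_dir,
    ]


def run_export(command: List[str]) -> None:
    """Run the export; raises CalledProcessError if the CLI fails."""
    subprocess.run(command, check=True)


command = build_static_export_command("timm/ese_vovnet39b.ra_in1k", "out_vov")
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues; call `run_export(command)` to perform the export.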
BF16 ONNX export
The Optimum ONNX export now supports BF16 export on CPU and GPU. Beware though that ONNX Runtime is most often not able to consume these models, as some operations are not implemented in this data type, although the exported models comply with the ONNX standard. This is useful if you are developing a runtime that consumes BF16 ONNX models.
Example:
optimum-cli export onnx --model bert-base-uncased --dtype bf16 bert_onnx
- BF16 support in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1654
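For readers building a runtime that consumes BF16 weights, it may help to recall what the data type looks like: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of a float32. The sketch below illustrates this with plain truncation; this is an assumption made for simplicity, since real converters typically round to nearest, and the function names are illustrative only.

```python
import struct


def float32_to_bf16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern obtained by truncating a float32."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    # Keep sign, exponent and the 7 most significant mantissa bits.
    return bits32 >> 16


def bf16_bits_to_float32(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a float32 value."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


bits = float32_to_bf16_bits(3.14159)
print(hex(bits), bf16_bits_to_float32(bits))  # 0x4049 3.140625
```

The loss of mantissa bits visible here (3.14159 becomes 3.140625) is the precision trade-off BF16 makes in exchange for keeping the float32 exponent range.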
ONNX export for new models
You can now export table-transformer and BART for text-classification to ONNX.
- Add ONNX export for table-transformer by @xenova in https://github.com/huggingface/optimum/pull/1616
- Reactivate BART Onnx Export by @claeyzre in https://github.com/huggingface/optimum/pull/1666
Sentence Transformers ONNX export
- Fix sentence transformers ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1632
- Bump sentence-transformers ONNX opset by @fxmarty in https://github.com/huggingface/optimum/pull/1634
- Pass `trust_remote_code` to sentence transformers export by @xenova in https://github.com/huggingface/optimum/pull/1677
- Fix library detection by @fxmarty in https://github.com/huggingface/optimum/pull/1690
Timm models support with ONNX Runtime
Timm models can now be run through ONNX Runtime with the class ORTModelForImageClassification:
```python
from urllib.request import urlopen

import timm
import torch
from PIL import Image

from optimum.onnxruntime import ORTModelForImageClassification

# Export the model to ONNX under the hood with export=True.
model = ORTModelForImageClassification.from_pretrained("timm/resnext101_64x4d.c1_in1k", export=True)

# Get model specific transforms (normalization, resize).
data_config = timm.data.resolve_data_config(pretrained_cfg=model.config.pretrained_cfg)
transforms = timm.data.create_transform(**data_config, is_training=False)

img = Image.open(
    urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")
)
output = model(transforms(img).unsqueeze(0)).logits
top5_probabilities, top5_class_indices = torch.topk(torch.softmax(output, dim=1) * 100, k=5)
```
- Add Timm support in ORTModelForImageClassification by @mht-sharma in https://github.com/huggingface/optimum/pull/1578
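The last line of the example above turns raw logits into the five most likely classes with their probabilities in percent. As a rough, dependency-free illustration of what that post-processing computes (the function name and sample logits are made up for this sketch):

```python
import math


def top_k_percentages(logits, k=5):
    """Return (class_index, probability_in_percent) for the k largest logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    percentages = [100.0 * e / total for e in exps]
    ranked = sorted(range(len(logits)), key=lambda i: percentages[i], reverse=True)
    return [(i, percentages[i]) for i in ranked[:k]]


# Made-up logits for a 4-class problem.
for index, percent in top_k_percentages([2.0, 1.0, 0.1, -1.0], k=2):
    print(index, round(percent, 2))
```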
Other changes and bugfixes
- Modify SEW-D model for tests by @echarlaix in https://github.com/huggingface/optimum/pull/1601
- Add phi and mixtral model type to normalizedconfig by @changwangss in https://github.com/huggingface/optimum/pull/1625
- Remove "to ONNX" from info message when exporting model by @helena-intel in https://github.com/huggingface/optimum/pull/1627
- Modify model id for test by @echarlaix in https://github.com/huggingface/optimum/pull/1628
- Fix cupy detection by @fxmarty in https://github.com/huggingface/optimum/pull/1635
- Fix ORT detection by @fxmarty in https://github.com/huggingface/optimum/pull/1636
- Enable sdpa export for SD unet component by @echarlaix in https://github.com/huggingface/optimum/pull/1637
- [ORT] Improve dummy mask & add tips for attention fusion in the doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/1640
- Improve error message by @Almonok in https://github.com/huggingface/optimum/pull/1623
- Add `input_labels` input to SAM model export by @xenova in https://github.com/huggingface/optimum/pull/1638
- Fix c4 dataset loading by @SunMarc in https://github.com/huggingface/optimum/pull/1646
- Avoid loading onnx file in weight deduplication if not necessary by @fxmarty in https://github.com/huggingface/optimum/pull/1648
- Allow lower ONNX opsets by @fxmarty in https://github.com/huggingface/optimum/pull/1650
- Remove abstract decorator from `_export` by @JingyaHuang in https://github.com/huggingface/optimum/pull/1652
- Add rjieba install by @mht-sharma in https://github.com/huggingface/optimum/pull/1661
- Fix wikitext2 processing by @SunMarc in https://github.com/huggingface/optimum/pull/1663
- Fix: local variable 'dataset' referenced before assignment by @hiyouga in https://github.com/huggingface/optimum/pull/1600
- Support float16 images in StableDiffusionXLWatermarker by @jambayk in https://github.com/huggingface/optimum/pull/1603
- Extend autocast check to cover more platforms like XPU by @hoshibara in https://github.com/huggingface/optimum/pull/1639
- Support IO Binding for ORTModelForCTC by @vidalmaxime in https://github.com/huggingface/optimum/pull/1629
- Add fp16 support for split cache by @PatriceVignola in https://github.com/huggingface/optimum/pull/1602
- ORTModelForFeatureExtraction always exports as transformers models by @fxmarty in https://github.com/huggingface/optimum/pull/1684
- Avoid overriding model_type in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1647
- Fix gptq device_map = "cpu" by @SunMarc in https://github.com/huggingface/optimum/pull/1662
- CI: Avoid iterating over a mutated iterable by @fxmarty in https://github.com/huggingface/optimum/pull/1683
- Add option to disable ONNX constant folding by @fxmarty in https://github.com/huggingface/optimum/pull/1682
- re-enable decoder sequence classification by @dwyatte in https://github.com/huggingface/optimum/pull/1679
- Move & rename `onnx_export` by @fxmarty in https://github.com/huggingface/optimum/pull/1685
- Update standardize_model_attributes by @mht-sharma in https://github.com/huggingface/optimum/pull/1686
- Fix: AttributeError: module 'packaging' has no attribute 'version' by @soulteary in https://github.com/huggingface/optimum/pull/1660
- Disable failing test & free space when building documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1693
- Fix no space left on device in actions by @fxmarty in https://github.com/huggingface/optimum/pull/1694
- Add end-to-end Marlin benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1695
- Fix main doc build by @fxmarty in https://github.com/huggingface/optimum/pull/1697
- Update optimum-intel requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1699
New Contributors
- @tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597
- @helena-intel made their first contribution in https://github.com/huggingface/optimum/pull/1627
- @Almonok made their first contribution in https://github.com/huggingface/optimum/pull/1623
- @hiyouga made their first contribution in https://github.com/huggingface/optimum/pull/1600
- @jambayk made their first contribution in https://github.com/huggingface/optimum/pull/1603
- @hoshibara made their first contribution in https://github.com/huggingface/optimum/pull/1639
- @vidalmaxime made their first contribution in https://github.com/huggingface/optimum/pull/1629
- @PatriceVignola made their first contribution in https://github.com/huggingface/optimum/pull/1602
- @claeyzre made their first contribution in https://github.com/huggingface/optimum/pull/1666
- @dwyatte made their first contribution in https://github.com/huggingface/optimum/pull/1679
- @soulteary made their first contribution in https://github.com/huggingface/optimum/pull/1660
Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.17.0
- Python
Published by fxmarty over 2 years ago
optimum - v1.16.2: Patch release
Fix ORT training compatibility for transformers v4.36.0 by @AdamLouly https://github.com/huggingface/optimum/pull/1586
Fix ONNX export compatibility for transformers v4.37.0 by @echarlaix https://github.com/huggingface/optimum/pull/1641
- Python
Published by echarlaix over 2 years ago
optimum - v1.16.1: Patch release
Breaking change: BetterTransformer for Llama, Falcon, Whisper and Bart is deprecated
The features from BetterTransformer for Llama, Falcon, Whisper and Bart have been upstreamed in Transformers. Please use transformers>=4.36 and torch>=2.1.1 to use PyTorch's scaled_dot_product_attention by default.
More details: https://github.com/huggingface/transformers/releases/tag/v4.36.0
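To check whether an environment meets these minimums, the version strings can be compared numerically. This is a simplified sketch with hypothetical helper names; it looks only at the leading numeric components, so pre-release tags such as `.dev0` are ignored, and a real project would more likely rely on `packaging.version`.

```python
import re


def parse_version(text):
    """Extract leading numeric components, e.g. '2.1.1+cu118' -> (2, 1, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '4.36' and '4.36.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def native_sdpa_expected(transformers_version, torch_version):
    """Apply the two minimums stated in the release note."""
    return meets_minimum(transformers_version, "4.36") and meets_minimum(
        torch_version, "2.1.1"
    )


print(native_sdpa_expected("4.36.0", "2.1.1+cu118"))
print(native_sdpa_expected("4.35.2", "2.1.1"))
```

In practice the two version strings would come from `transformers.__version__` and `torch.__version__`.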
What's Changed
- Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/1596
- Typo: tansformers -> transformers by @tomaarsen in https://github.com/huggingface/optimum/pull/1597
- [GPTQ] fix tests by @SunMarc in https://github.com/huggingface/optimum/pull/1598
- Show correct error message on using BT for SDPA models by @fxmarty in https://github.com/huggingface/optimum/pull/1599
New Contributors
- @tomaarsen made their first contribution in https://github.com/huggingface/optimum/pull/1597
Full Changelog: https://github.com/huggingface/optimum/compare/v1.16.0...v1.16.1
- Python
Published by fxmarty over 2 years ago
optimum - v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ
Transformers 4.36 compatibility
Notably, the ONNX export now handles aten::scaled_dot_product_attention in a standardized way for compatible models.
- Compatibility with Transformers 4.36 by @fxmarty in https://github.com/huggingface/optimum/pull/1590
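For context, scaled dot-product attention computes softmax(QKᵀ / √d) · V. The pure-Python sketch below shows that computation for a single head without masking or dropout; it is an illustration of the operator's math, not the graph that the ONNX export emits.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, key, value):
    """Single-head attention on lists of vectors, no mask, no dropout."""
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


# Two identical keys receive equal weight, so the output is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```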
Extended ONNX support: timm, sentence-transformers, Phi, ESM
- Add ONNX export for phi models by @xenova in https://github.com/huggingface/optimum/pull/1579
- Add ESM onnx support by @xenova in https://github.com/huggingface/optimum/pull/1581
- Add timm models export by @mht-sharma in https://github.com/huggingface/optimum/pull/1587
- Proper sentence-transformers ONNX export support by @fxmarty in https://github.com/huggingface/optimum/pull/1589
GPTQ for Mixtral
Work in progress.
- add `modules_in_block_to_quantize` arg for gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1585
What's Changed
- Update version to 1.16.0.dev0 by @fxmarty in https://github.com/huggingface/optimum/pull/1571
- Use doc links in the README for subpackages by @fxmarty in https://github.com/huggingface/optimum/pull/1572
- Fix GPTQ compatibility with AutoGPTQ by @fxmarty in https://github.com/huggingface/optimum/pull/1574
- Refactoring EC2 CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1575
- Remove inputs from sentence-transformers ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/1593
- Gptq tokenized dataset by @SunMarc in https://github.com/huggingface/optimum/pull/1584
- Run timm ONNX CI only once per day by @fxmarty in https://github.com/huggingface/optimum/pull/1594
- Run timm ONNX CI nightly v2 by @fxmarty in https://github.com/huggingface/optimum/pull/1595
Full Changelog: https://github.com/huggingface/optimum/compare/v1.15.0...v1.16.0
- Python
Published by fxmarty over 2 years ago
optimum - v1.15.0: ROCMExecutionProvider support
ROCMExecutionProvider support
The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.
- Add AMD GPU support by @mht-sharma in https://github.com/huggingface/optimum/pull/1546
- Update ROCM ORT doc by @mht-sharma in https://github.com/huggingface/optimum/pull/1564
Extended ONNX export
Swin2sr, DPT, GLPN and ConvNextv2 are now supported in the ONNX export.
- Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
- Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529
- Add `convnextv2` onnx export by @xenova in https://github.com/huggingface/optimum/pull/1560
What's Changed
- Add OV export CLI to README by @echarlaix in https://github.com/huggingface/optimum/pull/1526
- Refactor NormalizedConfigs for GQA by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1539
- Fix model patcher ONNX decoder export by @fxmarty in https://github.com/huggingface/optimum/pull/1547
- Add AMD to the documentation main page by @mfuntowicz in https://github.com/huggingface/optimum/pull/1540
- Add Optimum-amd documentation to the PR & release doc by @fxmarty in https://github.com/huggingface/optimum/pull/1562
- Add amd documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1557
- Remove `delete_doc_comment` workflows by @regisss in https://github.com/huggingface/optimum/pull/1565
- optimum-nvidia by @mfuntowicz in https://github.com/huggingface/optimum/pull/1566
- Update installation instructions in README by @echarlaix in https://github.com/huggingface/optimum/pull/1568
- Update doc for AMD by @mht-sharma in https://github.com/huggingface/optimum/pull/1570
- Add amd extra to setup.py by @echarlaix in https://github.com/huggingface/optimum/pull/1567
New Contributors
- @xenova made their first contribution in https://github.com/huggingface/optimum/pull/1529
Full Changelog: https://github.com/huggingface/optimum/compare/v1.14.0...v1.15.0
- Python
Published by fxmarty over 2 years ago
optimum - v1.14.1: Patch release
- Update optimum-intel required version by @echarlaix in https://github.com/huggingface/optimum/pull/1521
- Swin2sr onnx by @baskrahmer in https://github.com/huggingface/optimum/pull/1492
- Fix Falcon ONNX export with alibi by @fxmarty in https://github.com/huggingface/optimum/pull/1524
- Fix whisper v3 ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1525
- Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in https://github.com/huggingface/optimum/pull/1535
- Add depth-estimation w/ DPT+GLPN by @xenova in https://github.com/huggingface/optimum/pull/1529
- Python
Published by echarlaix over 2 years ago
optimum - v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization
ONNX
New architectures
Falcon
- Add ONNX and ORT support for Falcon by @fxmarty in https://github.com/huggingface/optimum/pull/1391
SpeechT5
- SpeechT5 ONNX support by @fxmarty in https://github.com/huggingface/optimum/pull/1404
Mistral
- Add Mistral models ONNX export support by @echarlaix in https://github.com/huggingface/optimum/pull/1425
TrOCR
- Enable KV cache support by @fxmarty in https://github.com/huggingface/optimum/pull/1456
LCMs
Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in https://github.com/huggingface/optimum/pull/1469
```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```
Also enable ONNX export using the CLI:
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
Decoder refactorization
- Add position ids as input during ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1381
- Enable the export of only one decoder for decoder-only models by @echarlaix in https://github.com/huggingface/optimum/pull/1257
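The release note does not spell out how position ids are built when they are fed as an explicit input. One common convention in decoder models, assumed here purely for illustration, derives them from the attention mask so that padded positions do not shift the real tokens. The function name below is hypothetical.

```python
def position_ids_from_attention_mask(attention_mask):
    """Derive position ids per row as cumulative_sum(mask) - 1.

    Padded positions (mask == 0) receive the dummy value 1; it is never
    attended to, so the exact value does not matter.
    """
    position_ids = []
    for row in attention_mask:
        running = 0
        ids = []
        for keep in row:
            running += keep
            ids.append(running - 1 if keep else 1)
        position_ids.append(ids)
    return position_ids


# Left-padded batch: the first sequence has two padding tokens.
mask = [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]
print(position_ids_from_attention_mask(mask))
```

With left padding, the first real token of each sequence still gets position 0, which keeps batched generation consistent with unbatched generation.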
GPTQ
- Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in https://github.com/huggingface/optimum/pull/1419
- Disable exllamav2 for quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1482
- Default to exllama when exllamav2 is disabled by @SunMarc in https://github.com/huggingface/optimum/pull/1494
- Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in https://github.com/huggingface/optimum/pull/1479
- Add support for CPU Inference by @vivekkhandelwal1 in https://github.com/huggingface/optimum/pull/1496
- Fix minimum version of auto-gptq by @fxmarty in https://github.com/huggingface/optimum/pull/1504
- switch to exllama_config instead of disabling exllamav2 by @SunMarc in https://github.com/huggingface/optimum/pull/1505
Other changes and bugfixes
- Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
- Add support for loading quantization from config by @aarnphm in https://github.com/huggingface/optimum/pull/1363
- Guard multiprocessing set start method by @fxmarty in https://github.com/huggingface/optimum/pull/1377
- Do not output KV cache when not using `with-past` in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1358
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
- Fix quantization for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405
- Fix normalized config key for models architecture by @echarlaix in https://github.com/huggingface/optimum/pull/1408
- Fix arg in bettertransformer llama attention by @SunMarc in https://github.com/huggingface/optimum/pull/1421
- Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in https://github.com/huggingface/optimum/pull/1428
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in https://github.com/huggingface/optimum/pull/1431
- Fix llama ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1432
- Update attention.py by @DongHande in https://github.com/huggingface/optimum/pull/1416
- Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in https://github.com/huggingface/optimum/pull/1443
- Fix owlvit task detection by @fxmarty in https://github.com/huggingface/optimum/pull/1453
- Improve ONNX quantization doc by @fxmarty in https://github.com/huggingface/optimum/pull/1451
- Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in https://github.com/huggingface/optimum/pull/1449
- Disable bart onnx export for text-classification and question-answering by @fxmarty in https://github.com/huggingface/optimum/pull/1457
- Fix ONNX exporter library_name by @baskrahmer in https://github.com/huggingface/optimum/pull/1460
- [ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/1335
- Fix typo in BetterTransformer CLIP by @fxmarty in https://github.com/huggingface/optimum/pull/1468
- Fix custom architecture detection in onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/1472
- Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/1503
- Update Transformers dependency for Habana extra by @regisss in https://github.com/huggingface/optimum/pull/1508
- Fix argument error by @ranchlai in https://github.com/huggingface/optimum/pull/1501
- Remove attention mask patching by @fxmarty in https://github.com/huggingface/optimum/pull/1509
- Fix generation input by @echarlaix in https://github.com/huggingface/optimum/pull/1512
- Fix tests ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/1517
- Fix BT on transformers 4.35 release by @fxmarty in https://github.com/huggingface/optimum/pull/1518
New Contributors
- @aarnphm made their first contribution in https://github.com/huggingface/optimum/pull/1363
- @DongHande made their first contribution in https://github.com/huggingface/optimum/pull/1416
- @AlexKoff88 made their first contribution in https://github.com/huggingface/optimum/pull/1479
- @vivekkhandelwal1 made their first contribution in https://github.com/huggingface/optimum/pull/1496
- @ranchlai made their first contribution in https://github.com/huggingface/optimum/pull/1501
- Python
Published by echarlaix over 2 years ago
optimum - v1.13.3: Patch release
Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty https://github.com/huggingface/optimum/pull/1431
- Fix arg in bettertransformer llama attention by @SunMarc #1421
- Update Transformers dependency for Habana extra by @regisss #1508
- temporarily pin to transformers<4.35 by @fxmarty https://github.com/huggingface/optimum/commit/616931019b9bd7546918a48d475a07efb92f51b1
- Python
Published by fxmarty over 2 years ago
optimum - v1.13.2: Patch release
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/1403
- Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1405
- Python
Published by echarlaix over 2 years ago
optimum - v1.13.1: Patch release
Fix ONNX fp16 export that broke in 1.13.0.
What's Changed
- Fix wrong dtype in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1369
- Fix tests collection for TFLite export and trigger TFLite tests only when relevant by @fxmarty in https://github.com/huggingface/optimum/pull/1368
- upgrade min compatible optimum-intel version by @echarlaix in https://github.com/huggingface/optimum/pull/1371
- Fix fp16 ONNX export test by @fxmarty in https://github.com/huggingface/optimum/pull/1373
- Python
Published by fxmarty almost 3 years ago
optimum - v1.13.0: ONNX weight deduplication, ONNX export and ORT extension
Deduplicate Embedding / LM head weight in the ONNX export
Workaround for a bug in the PyTorch ONNX export that does not deduplicate the Embedding and LM head shared weight: https://github.com/pytorch/pytorch/issues/108342. For small enough models, this results in up to 50% ONNX serialized model size decrease.
- Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/1326
- Fix initializer detection for weight deduplication by @fxmarty in https://github.com/huggingface/optimum/pull/1333
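The idea behind the size reduction can be sketched without any ONNX tooling: if two stored tensors hold byte-identical data (as a tied embedding and LM head do), keep one copy and point the other name at it. This is only a conceptual illustration under that assumption; the tensor names are hypothetical and Optimum's actual implementation operates on ONNX initializers and may differ.

```python
import hashlib


def deduplicate_initializers(initializers):
    """Keep one copy of each distinct byte payload.

    `initializers` maps a tensor name to its raw bytes. Returns the
    de-duplicated mapping plus an alias table {duplicate_name: kept_name}.
    """
    unique = {}
    aliases = {}
    first_seen = {}
    for name, data in initializers.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in first_seen:
            aliases[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            unique[name] = data
    return unique, aliases


# Hypothetical tiny "model": the LM head shares its weight with the embedding.
weights = {
    "embed_tokens.weight": bytes(1000),
    "layer.0.bias": b"\x01" * 10,
    "lm_head.weight": bytes(1000),
}
unique, aliases = deduplicate_initializers(weights)
size_before = sum(len(data) for data in weights.values())
size_after = sum(len(data) for data in unique.values())
print(aliases, size_before, size_after)
```

In this toy case the shared weight dominates the model, so the serialized size drops by almost half, matching the "up to 50%" figure for small models where the embedding is a large share of the parameters.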
Extended ONNX Runtime support
ONNX Runtime integration now supports Pix2Struct and MPT architectures. Donut now supports IO Binding. Encoder-Decoder models are now supported as well.
- Pix2Struct onnxruntime support by @krathul in https://github.com/huggingface/optimum/pull/1296
- Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
- Donut iobinding by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1209
- Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851
Extended ONNX export: MPT, TIMM models, Encoder-Decoder
Additionally, the SAM model is now by default exported as two files, vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
- Add MPT onnx and ORT support by @jiqing-feng in https://github.com/huggingface/optimum/pull/1161
- Adds ONNX Export Support for Timm Models by @mht-sharma in https://github.com/huggingface/optimum/pull/965
- Add encoder decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/851
- Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in https://github.com/huggingface/optimum/pull/1301
BetterTransformer supports Falcon
- [`BetterTransformer`] Add falcon to `BetterTransformer` by @younesbelkada in https://github.com/huggingface/optimum/pull/1343
Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration
The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.
- Version bump + add max_input_length to gptq by @SunMarc in https://github.com/huggingface/optimum/pull/1329
Other changes and bugfixes
- Update version to 1.12.1.dev0 following release by @fxmarty in https://github.com/huggingface/optimum/pull/1312
- Add GPTQ prefill benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1313
- Precise ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1268
- Improve BetterTransformer backward compatibility by @fxmarty in https://github.com/huggingface/optimum/pull/1314
- Improve ORTModel documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1245
- Add bitsandbytes benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1320
- fix typo in log message by @AAnirudh07 in https://github.com/huggingface/optimum/pull/1322
- Support customize dtype for dummy generators by @JingyaHuang in https://github.com/huggingface/optimum/pull/1307
- Fix opset custom onnx export by @mht-sharma in https://github.com/huggingface/optimum/pull/1331
- Replace mpt to ernie custom export by @mht-sharma in https://github.com/huggingface/optimum/pull/1332
- Fix BT benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/1344
- Add name_or_path for donut generation by @fxmarty in https://github.com/huggingface/optimum/pull/1345
- send both negative prompt embeds to ORT SDXL by @ssube in https://github.com/huggingface/optimum/pull/1339
- add vae image processor by @echarlaix in https://github.com/huggingface/optimum/pull/1219
- add negative prompt test by @echarlaix in https://github.com/huggingface/optimum/pull/1347
- Add GPT BigCode to the BT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1356
- Add BT dummy objects by @fxmarty in https://github.com/huggingface/optimum/pull/1355
- Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in https://github.com/huggingface/optimum/pull/1338
- Fix sentence transformer export by @mht-sharma in https://github.com/huggingface/optimum/pull/1366
New Contributors
- @krathul made their first contribution in https://github.com/huggingface/optimum/pull/1296
- @AAnirudh07 made their first contribution in https://github.com/huggingface/optimum/pull/1322
- @jiqing-feng made their first contribution in https://github.com/huggingface/optimum/pull/1161
- @ssube made their first contribution in https://github.com/huggingface/optimum/pull/1339
Full Changelog: https://github.com/huggingface/optimum/compare/v1.12.0...v1.13.0
- Python
Published by fxmarty almost 3 years ago
optimum - v1.12.0: AutoGPTQ integration, extended BetterTransformer support
AutoGPTQ integration
Part of the AutoGPTQ library has been integrated in Optimum, with utilities to ease the integration in other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization
- Add GPTQ Quantization by @SunMarc in https://github.com/huggingface/optimum/pull/1216
- Fix GPTQ doc by @regisss in https://github.com/huggingface/optimum/pull/1267
- Add AutoGPTQ benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/1292
- Fix gptq params by @SunMarc in https://github.com/huggingface/optimum/pull/1284
Extended BetterTransformer support
BetterTransformer now supports BLOOM and GPT-BigCode architectures.
- Bt bloom by @baskrahmer in https://github.com/huggingface/optimum/pull/1221
- Support gpt_bigcode in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1252
- Fix BetterTransformer starcoder init by @fxmarty in https://github.com/huggingface/optimum/pull/1254
- Fix BT starcoder fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1255
- SDPA dispatches to flash for MQA by @fxmarty in https://github.com/huggingface/optimum/pull/1259
- Check output_attentions is False in BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/1306
Other changes and bugfixes
- Update bug report template by @fxmarty in https://github.com/huggingface/optimum/pull/1266
- Fix ORTModule uses fp32 model issue by @jingyanwangms in https://github.com/huggingface/optimum/pull/1264
- Fix build PR doc workflow by @fxmarty in https://github.com/huggingface/optimum/pull/1270
- Avoid triggering stop job on label by @fxmarty in https://github.com/huggingface/optimum/pull/1274
- Update version following 1.11.1 patch by @fxmarty in https://github.com/huggingface/optimum/pull/1275
- Fix fp16 ONNX detection for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1276
- Update version following 1.11.2 patch by @regisss in https://github.com/huggingface/optimum/pull/1291
- Pin tensorflow<=2.12.1 by @fxmarty in https://github.com/huggingface/optimum/pull/1305
- ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in https://github.com/huggingface/optimum/pull/1308
- Fix staging tests following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1309
- More fixes following transformers 4.32 release by @fxmarty in https://github.com/huggingface/optimum/pull/1311
New Contributors
- @SunMarc made their first contribution in https://github.com/huggingface/optimum/pull/1216
- @jingyanwangms made their first contribution in https://github.com/huggingface/optimum/pull/1264
Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.2...v1.12.0
- Python
Published by fxmarty almost 3 years ago
optimum - v1.11.2: Patch release
Remove the Transformers version constraint on optimum[habana].
- Remove Transformers version constraint on Optimum Habana #1290 by @regisss
Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.1...v1.11.2
- Python
Published by regisss almost 3 years ago
optimum - v1.11.1: Patch release
Minor fix: documentation building for 1.11.
- Accelerate as a soft dependency by @fxmarty
Full Changelog: https://github.com/huggingface/optimum/compare/v1.11.0...v1.11.1
- Python
Published by fxmarty almost 3 years ago
optimum - v1.11.0: Extended ONNX, ONNX Runtime, BetterTransformer support
Extended ONNX and ONNX Runtime support
Add ONNX export and ONNX Runtime inference support for gpt bigcode.
- Add ONNX / ONNXRuntime support for StarCoder by @JingyaHuang in #1042
Extended BetterTransformer support
BetterTransformer now supports Llama 2 and bark.
Training and autocast are now supported for most architectures, please refer to the documentation for more details: https://huggingface.co/docs/optimum/main/en/bettertransformer/overview
- Support Llama 2 in BetterTransformer. by @noamwies in #1235
- BetterTransformer support training & autocast for all archs by @fxmarty in #1225
- Add bark into bettertransformer by @ylacombe in https://github.com/huggingface/optimum/pull/1199
- Drop mask for training in all cases for BetterTransformer & precise documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1250
Major bugfixes
- Update ORT training to be compatible with transformers 4.31 by @JingyaHuang in #1227
Other improvements and bugfixes
- add upgrade strategy by @echarlaix in https://github.com/huggingface/optimum/pull/1228
- fix typo README by @echarlaix in https://github.com/huggingface/optimum/pull/1230
- Fix OwlViT exporter config by @regisss in https://github.com/huggingface/optimum/pull/1188
- Add example SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1233
- fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232
- fix optimum-intel min version by @echarlaix in https://github.com/huggingface/optimum/pull/1234
- fix typo documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1238
- update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1240
- Update onnxruntime minimum version to 1.11 by @fxmarty in https://github.com/huggingface/optimum/pull/1244
- ORT quantizes by default all ops by @fxmarty in https://github.com/huggingface/optimum/pull/1246
New Contributors
- @ylacombe made their first contribution in https://github.com/huggingface/optimum/pull/1199
- @noamwies made their first contribution in https://github.com/huggingface/optimum/pull/1235
Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.11.0
- Python
Published by JingyaHuang almost 3 years ago
optimum - v1.10.1: Patch release
- Fix OwlViT exporter by @regisss in https://github.com/huggingface/optimum/pull/1188
- Fix SD loading when safetensors weights only by @echarlaix in https://github.com/huggingface/optimum/pull/1232
- Fix `optimum-intel` version requirements by @echarlaix in https://github.com/huggingface/optimum/pull/1234
Full Changelog: https://github.com/huggingface/optimum/compare/v1.10.0...v1.10.1
- Python
Published by echarlaix almost 3 years ago
optimum - v1.10.0: Stable Diffusion XL pipelines
Stable Diffusion XL
Enable SD XL ONNX export and ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1168
- Enable SD XL ONNX export using the CLI :
optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-0.9 --task stable-diffusion-xl ./sd_xl_onnx
- Add SD XL pipelines for ONNX Runtime inference (supported tasks : text-to-image and image-to-image) :
```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-0.9"
pipeline = ORTStableDiffusionXLPipeline.from_pretrained(model_id, export=True)

prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
pipeline.save_pretrained("onnx-sd-xl-base-0.9")
```
Stable Diffusion pipelines
Enable image-to-image and inpainting pipelines for ONNX Runtime inference by @echarlaix in https://github.com/huggingface/optimum/pull/1121
More examples in documentation
Major bugfixes
- Fix bloom KV cache usage in ORTForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/1152
What's Changed
- Add stable diffusion example by @prathikr in https://github.com/huggingface/optimum/pull/1136
- Fixed incomplete ONNX export model memory release issue by @sharpbai in https://github.com/huggingface/optimum/pull/1154
- Add trust remote code option for config by @changwangss in https://github.com/huggingface/optimum/pull/1151
- Fix typos of ONNXRuntimme -> ONNXRuntime by @mgoin in https://github.com/huggingface/optimum/pull/1155
- Fix ONNX export for MobileViT for segmentation by @regisss in https://github.com/huggingface/optimum/pull/1128
- Revert "update the default block size" by @rui-ren in https://github.com/huggingface/optimum/pull/1162
- ONNX export for custom architectures & models with custom modeling code by @fxmarty in https://github.com/huggingface/optimum/pull/1166
- Update Optimum Neuron doc by @regisss in https://github.com/huggingface/optimum/pull/1164
- Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1173
- Add gpt_bigcode model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1170
- Allow attention_mask=None for BetterTransformer in the inference batched case for gpt2 & gpt-neo by @fxmarty in https://github.com/huggingface/optimum/pull/1180
- Fix encoder attention mask input order for ORT by @fxmarty in https://github.com/huggingface/optimum/pull/1181
- Fix ORTModel initialization on specific device id by @fxmarty in https://github.com/huggingface/optimum/pull/1182
- Add stable diffusion img2img and inpain documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1149
- Fix SD XL ONNX export for img2img task by @echarlaix in https://github.com/huggingface/optimum/pull/1194
- Remove graphcore from documentation quickstart by @echarlaix in https://github.com/huggingface/optimum/pull/1201
- Unpin tensorflow by @fxmarty in https://github.com/huggingface/optimum/pull/1211
- Fix ORT test for unknown architecture for task by @fxmarty in https://github.com/huggingface/optimum/pull/1212
- add ort + stable diffusion documentation by @prathikr in https://github.com/huggingface/optimum/pull/1205
- Fix vision encoder decoder that may not cache cross-attention by @fxmarty in https://github.com/huggingface/optimum/pull/1210
- Add documentation for Optimum Furiosa by @regisss in https://github.com/huggingface/optimum/pull/1165
- Add BLIP-2 to BetterTransformer documentation by @fxmarty in https://github.com/huggingface/optimum/pull/1218
- Set default value to unet config sample size by @echarlaix in https://github.com/huggingface/optimum/pull/1223
- Fix broken link in doc by @regisss in https://github.com/huggingface/optimum/pull/1222
- Fix BT test by @fxmarty in https://github.com/huggingface/optimum/pull/1224
- Add SD XL documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1198
- Update setup.py to add optimum-furiosa extras by @mht-sharma in https://github.com/huggingface/optimum/pull/1226
New Contributors
- @sharpbai made their first contribution in https://github.com/huggingface/optimum/pull/1154
- @mgoin made their first contribution in https://github.com/huggingface/optimum/pull/1155
Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.10.0
- Python
Published by echarlaix almost 3 years ago
optimum - v1.9.1: Patch release
- Fix stable diffusion ONNX export for diffusers>=v0.18.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1173
Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.9.1
- Python
Published by echarlaix almost 3 years ago
optimum - v1.9: extended ONNX, ONNX Runtime support
Improved memory management in the ONNX export
The ONNX export now uses less memory, which is especially useful when exporting large models or exporting on a CUDA device. Until PyTorch 2.1 is released, we recommend using a PyTorch nightly build if you run into memory issues, as two major bugs were fixed on the PyTorch side: https://github.com/pytorch/pytorch/pull/101134 and https://github.com/pytorch/pytorch/pull/101148
- Run validation of exported model in no_grad mode by @fxmarty in https://github.com/huggingface/optimum/pull/1111
- Load model directly on cuda device for the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1112
- Lower GPU memory requirements at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1115
Extended ONNX export
The ONNX export now supports the sam, lilt, pix2struct, cvt and owlvit architectures.
- Sam ONNX export support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1025
- Add onnx exporter for Lilt model by @mariababich in https://github.com/huggingface/optimum/pull/1098
- Add pix2struct to ONNX support (v2) by @arvisioncode in https://github.com/huggingface/optimum/pull/1034
- Add CvTONNX Config by @rishabbala in https://github.com/huggingface/optimum/pull/1131
- Support document-question-answering ONNX export for vision-encoder-decoder by @fxmarty in https://github.com/huggingface/optimum/pull/1110
- add owlvit by @darwinharianto in https://github.com/huggingface/optimum/pull/1067
Support of custom ONNX configurations for export
The main_export function now accepts two arguments, model_kwargs and custom_onnx_configs, which give advanced users finer control over the export.
- [ONNX export] Ability to pass arbitrary kwargs, custom ONNX configs by @fxmarty in https://github.com/huggingface/optimum/pull/1143
Extended BetterTransformer support
- Add blip-2 to bettertransformer by @baskrahmer in https://github.com/huggingface/optimum/pull/1125
- Support llama bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/998
ONNX Runtime: use IO Binding by default for decoder models on CPUExecutionProvider
IO Binding avoids not only copies between host RAM and device memory, but also copies between NumPy tensors and OrtValue. For autoregressive tasks, IO Binding is therefore now enabled by default on CPUExecutionProvider as well, which may bring a speedup of more than 10% for large context lengths.
- Enable use_io_binding = True on CPU by @yihonglyu in https://github.com/huggingface/optimum/pull/1087
ORTModelForSpeechSeq2Seq supported in ORTOptimizer
- added ORTModelForSpeechSeq2Seq support to optimizer by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1068
Major bugfixes
- Use mask for seq2seq ONNX decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/1076
What's Changed
- Fix protobuf max allowed size by @fxmarty in https://github.com/huggingface/optimum/pull/988
- Add Whisper to ORT optimizer configuration by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/986
- Fix sentence-similarity task in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/996
- Simplify auto task detection by @fxmarty in https://github.com/huggingface/optimum/pull/997
- Fix merged decoder usage with fp16 by @fxmarty in https://github.com/huggingface/optimum/pull/1006
- Fix past key value generator used for ONNX export validation for t5/mt5 by @fxmarty in https://github.com/huggingface/optimum/pull/1007
- Fix typo for custom shapes passed at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/1008
- Fix _versions.yml upload in doc build by @regisss in https://github.com/huggingface/optimum/pull/1003
- ORTQuantizer supports subgraphs by @fxmarty in https://github.com/huggingface/optimum/pull/1009
- fix for huggingface_hub last release by @echarlaix in https://github.com/huggingface/optimum/pull/1014
- Add links to documentation to README by @echarlaix in https://github.com/huggingface/optimum/pull/1013
- Upate documentation by @echarlaix in https://github.com/huggingface/optimum/pull/1011
- update optimum intel description by @echarlaix in https://github.com/huggingface/optimum/pull/1015
- fix: ValueError offload_dir by @orangetin in https://github.com/huggingface/optimum/pull/993
- Sentence transformers ONNX export fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1029
- Add OpenVINO notebooks by @echarlaix in https://github.com/huggingface/optimum/pull/1030
- Fix task inference for sam by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1031
- fix typo by @echarlaix in https://github.com/huggingface/optimum/pull/1033
- added types to new fields in OptimizationConfig by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1036
- Fix some typos in the quantization guide by @dcferreira in https://github.com/huggingface/optimum/pull/1041
- Optional attention_mask in ORTModelForxxx by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1045
- ONNX SAM export - change input_points data type by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1048
- masked-im output name fix for transformers >= 4.29.0 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1049
- remove torchvision requirement by @BramVanroy in https://github.com/huggingface/optimum/pull/1052
- Update version by @regisss in https://github.com/huggingface/optimum/pull/1058
- Bump package version by @regisss in https://github.com/huggingface/optimum/pull/1062
- Raise MinimumVersionError when OnnxConfig.MIN_TORCH_VERSION is not satisfied by @regisss in https://github.com/huggingface/optimum/pull/1070
- Remove deprecated argument from tests and examples by @echarlaix in https://github.com/huggingface/optimum/pull/1072
- Detect model type for all transformers models in TasksManager by @fxmarty in https://github.com/huggingface/optimum/pull/1075
- Fix HF Push to hub by @JingyaHuang in https://github.com/huggingface/optimum/pull/1080
- Fix float16 ORT conversion for models > 2GB by @fxmarty in https://github.com/huggingface/optimum/pull/1079
- Update doc workflows by @regisss in https://github.com/huggingface/optimum/pull/1093
- Error out on ORTQuantizer.quantize call for static quantization when no calibration range is provided by @fxmarty in https://github.com/huggingface/optimum/pull/1094
- Add mpt model_type to NormalizedTextConfig by @changwangss in https://github.com/huggingface/optimum/pull/1101
- Fix doc build by @regisss in https://github.com/huggingface/optimum/pull/1107
- Improve the offline support for the ONNX/TFLite export by @fxmarty in https://github.com/huggingface/optimum/pull/1109
- Add ViT to ORTConfigManager by @baskrahmer in https://github.com/huggingface/optimum/pull/1117
- Fix TasksManager get_model_from_task with None device by @fxmarty in https://github.com/huggingface/optimum/pull/1122
- Small typos by @baskrahmer in https://github.com/huggingface/optimum/pull/1124
- Refactor BetterTransformerManager requirement validation methods by @baskrahmer in https://github.com/huggingface/optimum/pull/1132
- update the default block size by @rui-ren in https://github.com/huggingface/optimum/pull/1137
- Update ORT training docker to 1.15 by @JingyaHuang in https://github.com/huggingface/optimum/pull/1139
- Adamlouly/fix unwrap model eval by @AdamLouly in https://github.com/huggingface/optimum/pull/1099
- Remove version pinning for onnx package by @cody-moveworks in https://github.com/huggingface/optimum/pull/1141
New Contributors
- @orangetin made their first contribution in https://github.com/huggingface/optimum/pull/993
- @dcferreira made their first contribution in https://github.com/huggingface/optimum/pull/1041
- @BramVanroy made their first contribution in https://github.com/huggingface/optimum/pull/1052
- @darwinharianto made their first contribution in https://github.com/huggingface/optimum/pull/1067
- @mariababich made their first contribution in https://github.com/huggingface/optimum/pull/1098
- @changwangss made their first contribution in https://github.com/huggingface/optimum/pull/1101
- @arvisioncode made their first contribution in https://github.com/huggingface/optimum/pull/1034
- @yihonglyu made their first contribution in https://github.com/huggingface/optimum/pull/1087
- @rui-ren made their first contribution in https://github.com/huggingface/optimum/pull/1137
- @cody-moveworks made their first contribution in https://github.com/huggingface/optimum/pull/1141
- @rishabbala made their first contribution in https://github.com/huggingface/optimum/pull/1131
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.0...v1.9.0
- Python
Published by fxmarty almost 3 years ago
optimum - v1.8.8: Patch release
- Fix optimum model inference compatibility with transformers>=v4.30.0 by @echarlaix in https://github.com/huggingface/optimum/pull/1102
- Fix stable diffusion ONNX export following diffusers breaking change by @fxmarty in https://github.com/huggingface/optimum/pull/1116
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.7...v1.8.8
- Python
Published by echarlaix about 3 years ago
optimum - v1.8.7: Patch release
- Restrict transformers version by @echarlaix in https://github.com/huggingface/optimum/pull/1097
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.6...v1.8.7
- Python
Published by echarlaix about 3 years ago
optimum - v1.8.6: Patch release
- Fix CLI for exporting models to TFLite by @regisss #1059
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.5...v1.8.6
- Python
Published by regisss about 3 years ago
optimum - v1.8.5: Patch release
- Add transformers<4.29.0 in Habana extra by @regisss in #1047
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.4...v1.8.5
- Python
Published by regisss about 3 years ago
optimum - v1.8.4: Patch release
- Set onnx requirement by @echarlaix @regisss in https://github.com/huggingface/optimum/pull/1037
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.3...v1.8.4
- Python
Published by echarlaix about 3 years ago
optimum - v1.8.3: Patch release
- Fix Stable Diffusion model ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/1020
- Add optimum-neuron extra by @michaelbenayoun in https://github.com/huggingface/optimum/pull/1021
Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.2...v1.8.3
- Python
Published by echarlaix about 3 years ago
optimum - v1.8: extended BetterTransformer support, ONNX merged seq2seq models
Extended BetterTransformer support
Various improvements in the PyTorch BetterTransformer integration.
- [BT] add BetterTransformer support for ProphetNet by @hirotasoshu in https://github.com/huggingface/optimum/pull/923
- Improve bettertransformer benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/939
- Fix sdpa with batch size = 1, better benchmark by @fxmarty in https://github.com/huggingface/optimum/pull/915
- Fix slow tests & sdpa dropout by @fxmarty in https://github.com/huggingface/optimum/pull/974
- Remove getattr overhead in spda by @fxmarty in https://github.com/huggingface/optimum/pull/934
- [BT] Improve docs by @younesbelkada in https://github.com/huggingface/optimum/pull/944
ONNX merged seq2seq models
Instead of using two separate decoder_model.onnx and decoder_with_past_model.onnx models, a single decoder can now be used for encoder-decoder models: decoder_model_merged.onnx. This avoids duplicating weights across the without-past and with-past ONNX models.
By default, decoder_model_merged.onnx is used in the ORTModel integration when it is available. This can be disabled with the --no-post-process option in the ONNX export CLI, and with use_merged=False in the ORTModel.from_pretrained method.
Example:
optimum-cli export onnx --model t5-small t5_onnx
will give:
└── t5_onnx
  ├── config.json
  ├── decoder_model_merged.onnx
  ├── decoder_model.onnx
  ├── decoder_with_past_model.onnx
  ├── encoder_model.onnx
  ├── generation_config.json
  ├── special_tokens_map.json
  ├── spiece.model
  ├── tokenizer_config.json
  └── tokenizer.json
For the decoder, decoder_model_merged.onnx alone is enough for inference. We strongly recommend inspecting the subgraphs with netron to understand the inputs and outputs if the exported model is to be used with an engine other than ONNX Runtime in the Optimum integration.
- Fix encoder-decoder ONNX merge by @fxmarty in https://github.com/huggingface/optimum/pull/924
- Support the merge of decoder without/with past for encoder-decoder models in the ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/926
- Support merged seq2seq models in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/930
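The default file selection described above can be sketched as follows. This is a simplified illustration, not the optimum implementation: select_decoder_files is a hypothetical helper, it only looks at file names, and it ignores other options such as use_cache.

```python
from typing import Iterable, List

MERGED = "decoder_model_merged.onnx"
SEPARATE = ["decoder_model.onnx", "decoder_with_past_model.onnx"]


def select_decoder_files(available: Iterable[str], use_merged: bool = True) -> List[str]:
    """Pick the decoder file(s) to load from an exported directory listing.

    Hypothetical helper mirroring the behavior described in the release note:
    prefer the merged decoder when it exists, unless use_merged is False.
    """
    names = set(available)
    if use_merged and MERGED in names:
        return [MERGED]
    # Fall back to the two separate without-past / with-past decoders.
    missing = [name for name in SEPARATE if name not in names]
    if missing:
        raise FileNotFoundError(f"missing decoder files: {missing}")
    return list(SEPARATE)


if __name__ == "__main__":
    exported = [
        "encoder_model.onnx",
        "decoder_model.onnx",
        "decoder_with_past_model.onnx",
        "decoder_model_merged.onnx",
    ]
    print(select_decoder_files(exported))
    print(select_decoder_files(exported, use_merged=False))
```

With the t5_onnx listing shown above, the first call selects only the merged decoder, while use_merged=False falls back to the two separate decoder files.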
New models in the ONNX export
- Add llama onnx export & onnxruntime support by @nenkoru in https://github.com/huggingface/optimum/pull/975
Major bugfix
- Remove constant output in encoder-decoder ONNX models decoder with past by @fxmarty in https://github.com/huggingface/optimum/pull/920
- Hash tensor data during deduplication by @VikParuchuri in https://github.com/huggingface/optimum/pull/932
Potentially breaking changes
The TasksManager replaces legacy task names with the canonical ones used on the Hub and in transformers metadata:
- sequence-classification becomes text-classification,
- causal-lm becomes text-generation,
- seq2seq-lm becomes text2text-generation,
- speech2seq-lm and audio-ctc become automatic-speech-recognition,
- default becomes feature-extraction,
- masked-lm becomes fill-mask,
- vision2seq-lm becomes image-to-text
This should not break anything except if you rely on private methods and attributes from TasksManager.
- Allow to use a custom class in TasksManager & use canonical tasks names by @fxmarty in https://github.com/huggingface/optimum/pull/967
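For code that still passes legacy task names, the renaming listed above can be captured in a small lookup table. This is a sketch for migration purposes only: to_canonical_task is a hypothetical helper, not part of the TasksManager API, and the mapping contains only the names given in this release note.

```python
# Legacy task names and their canonical replacements, as listed in the release note.
LEGACY_TO_CANONICAL_TASK = {
    "sequence-classification": "text-classification",
    "causal-lm": "text-generation",
    "seq2seq-lm": "text2text-generation",
    "speech2seq-lm": "automatic-speech-recognition",
    "audio-ctc": "automatic-speech-recognition",
    "default": "feature-extraction",
    "masked-lm": "fill-mask",
    "vision2seq-lm": "image-to-text",
}


def to_canonical_task(task: str) -> str:
    """Return the canonical task name, leaving already-canonical names unchanged.

    Hypothetical helper for illustration; it is not part of the optimum API.
    """
    return LEGACY_TO_CANONICAL_TASK.get(task, task)


if __name__ == "__main__":
    for name in ("causal-lm", "default", "text-generation"):
        print(f"{name} -> {to_canonical_task(name)}")
```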
What's Changed
- Update ort trainer to transformers 4.27.2 by @JingyaHuang in https://github.com/huggingface/optimum/pull/917
- Compute Loss inside the training step. by @AdamLouly in https://github.com/huggingface/optimum/pull/686
- Fix ORTModel MRO for whisper by @fxmarty in https://github.com/huggingface/optimum/pull/919
- add ORTStableDiffusionPipeline reference in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/890
- Fix decoder ONNX model loading from the Hub by @fxmarty in https://github.com/huggingface/optimum/pull/929
- optimum-cli onnxruntime quantize / optimize output argument is now required by @michaelbenayoun in https://github.com/huggingface/optimum/pull/927
- Register mechanism for the Optimum CLI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/928
- Ensure backward compatibility of ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/933
- Update the README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/925
- Update README by @echarlaix in https://github.com/huggingface/optimum/pull/941
- Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/942
- Remove GC from README by @michaelbenayoun in https://github.com/huggingface/optimum/pull/943
- Add user and token for CI by @michaelbenayoun in https://github.com/huggingface/optimum/pull/945
- Update README by @echarlaix in https://github.com/huggingface/optimum/pull/946
- optimum-cli print the help of subcommands by @michaelbenayoun in https://github.com/huggingface/optimum/pull/940
- Remove from_transformers references from the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/935
- Turn command import into optional by @JingyaHuang in https://github.com/huggingface/optimum/pull/936
- Auto-set use_merged to False if use_cache is passed as False by @fxmarty in https://github.com/huggingface/optimum/pull/954
- Raise error with use_cache=False, use_io_binding=True by @fxmarty in https://github.com/huggingface/optimum/pull/955
- Add an ORT training notebook by @JingyaHuang in https://github.com/huggingface/optimum/pull/959
- Fix issue with doc build sometimes failing silently in GH workflows by @regisss in https://github.com/huggingface/optimum/pull/960
- Fix typos by @regisss in https://github.com/huggingface/optimum/pull/963
- Disable tests upon transformers 4.28 release by @fxmarty in https://github.com/huggingface/optimum/pull/976
New Contributors
- @hirotasoshu made their first contribution in https://github.com/huggingface/optimum/pull/923
- @VikParuchuri made their first contribution in https://github.com/huggingface/optimum/pull/932
Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.3...v1.8.2
- Python
Published by fxmarty about 3 years ago
optimum - v1.7.3: Patch release for PyTorch 2.0 and transformers 4.27.0
This patch release fixes a few bugs related to the PyTorch 2.0 release and includes a few new features as well.
Breaking change: constant outputs removed from ONNX encoder-decoder models
We removed some constant past key values outputs from encoder-decoder models in the ONNX export. This could break existing code, but we recommend using the newly exported models, as the change removes unnecessary Identity nodes from the models.
- Remove constant outputs from decoder with past ONNX model for encoder-decoder architectures by @fxmarty in https://github.com/huggingface/optimum/pull/872
torch.nn.functional.scaled_dot_product_attention support for decoders in BetterTransformer
PyTorch 2.0 introduces, in beta, torch.nn.functional.scaled_dot_product_attention, a fast path for attention that extends PyTorch's accelerated transformer features. It is included in optimum.bettertransformer for use with the following architectures: Bart, Blenderbot, GPT2, GPT-J, M2M100, Marian, Mbart, OPT, Pegasus, T5.
Beware that this is still experimental and speedups have yet to be validated on all architectures.
PyTorch's scaled_dot_product_attention makes it possible to use flash attention and memory-efficient attention natively in PyTorch.
Usage is as follows:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

model = BetterTransformer.transform(model)  # modify transformers modeling to use native scaled_dot_product_attention

# do your inference or training here

model = BetterTransformer.reverse(model)  # go back to using canonical transformers modeling
model.save_pretrained("gpt2_model")
```
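For readers unfamiliar with the operation, the sketch below shows what scaled dot-product attention computes for a single head, softmax(Q K^T / sqrt(d)) V, in plain Python. It is a reference illustration only: sdpa_reference is a hypothetical name, it has no masking or dropout, and it says nothing about how PyTorch's fused kernels are implemented.

```python
import math
from typing import List

Matrix = List[List[float]]


def sdpa_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference computation of softmax(q @ k^T / sqrt(d)) @ v for one head."""
    d = len(q[0])
    out = []
    for q_row in q:
        # Scaled similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        # Numerically stable softmax over the keys.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Two identical keys receive equal weight, so the output is the mean of the values.
    print(sdpa_reference([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```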
Inference benchmark (on fp16):
| Model | batch size | Input sequence length | Generated tokens | Latency eager (s) | Latency BT (s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|--------------|------------|-----------------------|------------------|-------------------|----------------|---------|------------------------|---------------------|----------------|
| gpt2 | 1 | 64 | 256 | 1.800 | 1.607 | 12.0% | 569.90 | 569.89 | 0% |
| gpt2 | 64 | 64 | 256 | 2.159 | 1.617 | 33.5% | 2067.45 | 2093.80 | 0% |
| opt-1.3b | 1 | 64 | 256 | 3.010 | 2.667 | 12.9% | 5408.238 | 5408.238 | 0% |
| gpt-neox-20b | 1 | 64 | 256 | 10.869 | 9.937 | 9.4% | 83670.67 | 83673.53 | 0% |
Training benchmark (on fp16):
| Model | batch size | Sequence length | time/epoch (eager, s) | time/epoch (BT, s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|-------|------------|-----------------|-----------------------|--------------------|---------|------------------------|---------------------|----------------|
| gpt2 | 8 | 1024 | 17.732 | 14.037 | 26.3% | 13291.16 | 10191.52 | 30.4% |
| gpt2 | 32 | 1024 | 17.336 | 13.309 | 30.3% | 52834.83 | 38858.56 | 36.0% |
| gpt2 | 64 | 1024 | OOM | 14.067 | / | OOM | 75600.08 | / |
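The tables do not state how the Speedup and Memory savings columns are derived. The reported non-zero percentages are consistent with the ratio eager / BT - 1, which is an inference from the numbers rather than a documented definition. A minimal sketch under that assumption (relative_gain is a hypothetical helper name):

```python
def relative_gain(eager: float, bt: float) -> float:
    """Percentage gain of BetterTransformer over eager, computed as eager / bt - 1."""
    return round((eager / bt - 1.0) * 100.0, 1)


if __name__ == "__main__":
    # (eager, BT) pairs copied from the benchmark tables.
    print(relative_gain(1.800, 1.607))        # gpt2 inference latency, batch size 1
    print(relative_gain(17.732, 14.037))      # gpt2 training time/epoch, batch size 8
    print(relative_gain(13291.16, 10191.52))  # gpt2 training peak memory, batch size 8
```

Under this assumption the three calls give 12.0, 26.3 and 30.4, matching the corresponding table entries.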
Benchmarks can be reproduced using the inference script and training script:
python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256
python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256 --seqlen-stdev 0
- Add scaled_dot_product_attention support for decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/853
- Support scaled_dot_product_attention for t5 by @fxmarty in https://github.com/huggingface/optimum/pull/856
- [BT] add decoder benchmark script by @younesbelkada in https://github.com/huggingface/optimum/pull/857
- [BT] Fix bt benchmark by @younesbelkada in https://github.com/huggingface/optimum/pull/858
- Fix pytorch version check in bettertransformer by @fxmarty in https://github.com/huggingface/optimum/pull/862
- [BT] Add fp16 support by @younesbelkada in https://github.com/huggingface/optimum/pull/859
- [BT] Add decoder training support by @younesbelkada in https://github.com/huggingface/optimum/pull/860
- Bart support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/863
- [BT] add accelerate_test markers by @younesbelkada in https://github.com/huggingface/optimum/pull/864
- Mbart, pegasus, blenderbot, marian, m2m100 support scaled_dot_product_attention by @fxmarty in https://github.com/huggingface/optimum/pull/865
- Add bettertransformer reverse transform by @fxmarty in https://github.com/huggingface/optimum/pull/868
- Add bettertransformer training benchmark script by @fxmarty in https://github.com/huggingface/optimum/pull/873
New architectures in the ONNX export
Three additional architectures are supported in the ONNX export: ImageGPT, RegNet, OPT.
- Adding ONNX support for ImageGPT by @adit299 in https://github.com/huggingface/optimum/pull/819
- Add ONNX support for RegNet by @asrimanth in https://github.com/huggingface/optimum/pull/833
- Adding support for Facebook's OPT models by @hivaze in https://github.com/huggingface/optimum/pull/852
(WIP) TFLite export with quantization support
Continued progress in the TFLite export with quantization support. This is work in progress and not documented yet.
- Quantization with TFLite by @michaelbenayoun in https://github.com/huggingface/optimum/pull/854
Bugfixes and improvements
- Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/843
- Fix typo in documentation by @regisss in https://github.com/huggingface/optimum/pull/848
- Remove redundant code by @mht-sharma in https://github.com/huggingface/optimum/pull/841
- Update README by @echarlaix in https://github.com/huggingface/optimum/pull/850
- Update documentation by @echarlaix in https://github.com/huggingface/optimum/pull/855
- Remove iobinding ORTModelForCTC by @mht-sharma in https://github.com/huggingface/optimum/pull/840
- Fix typo in documentation by @echarlaix in https://github.com/huggingface/optimum/pull/861
- Fix causal-lm ONNX axis names by @fxmarty in https://github.com/huggingface/optimum/pull/871
- add NNCF openvino notebook by @echarlaix in https://github.com/huggingface/optimum/pull/875
- Remove positional-only parameters not support by python < v3.8 by @echarlaix in https://github.com/huggingface/optimum/pull/881
- lazy import for task manager by @JingyaHuang in https://github.com/huggingface/optimum/pull/844
- Remove onnx and ort dependencies on the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/846
- Reactivate export & optimization tests for causal-lm models by @fxmarty in https://github.com/huggingface/optimum/pull/885
- Fix ONNX export on transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/884
- Do not use scaled_dot_product_attention for stable diffusion onnx export by @fxmarty in https://github.com/huggingface/optimum/pull/888
- Fix loading of an ONNX stable diffusion model when config doesn't match by @echarlaix in https://github.com/huggingface/optimum/pull/887
- Automatic framework detection in TasksManager for large models by @fxmarty in https://github.com/huggingface/optimum/pull/883
- Fix WavLM onnx export upon torch 2.0 release by @fxmarty in https://github.com/huggingface/optimum/pull/889
- Fix PushToHubMixin._create_repo according to transformers 4.27 release by @fxmarty in https://github.com/huggingface/optimum/pull/892
- Fix stable diffusion framework detection by @fxmarty in https://github.com/huggingface/optimum/pull/893
- Add donut CPU inference ORT by @mht-sharma in https://github.com/huggingface/optimum/pull/761
- Fix check_model for large merged ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/896
- Drop python 3.7 support by @fxmarty in https://github.com/huggingface/optimum/pull/891
- Fix dummy label generator for vision tasks by @JingyaHuang in https://github.com/huggingface/optimum/pull/900
- Add stable diffusion dummy object by @echarlaix in https://github.com/huggingface/optimum/pull/899
- Automatic support for large ONNX models in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/886
- Remove subprocess calls in ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/897
- Registering mechanism for the TasksManager by @michaelbenayoun in https://github.com/huggingface/optimum/pull/898
- add option to run inference with ort by @prathikr in https://github.com/huggingface/optimum/pull/838
- Check min diffusers version by @echarlaix in https://github.com/huggingface/optimum/pull/902
- Update bug-report.yml by @lewtun in https://github.com/huggingface/optimum/pull/895
- Fix axis name for seq2seq ONNX models by @fxmarty in https://github.com/huggingface/optimum/pull/904
- Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/909
- Fix misleading error message in ORTOptimizer by @fxmarty in https://github.com/huggingface/optimum/pull/910
- Delete all Docker images before building the doc of Optimum by @regisss in https://github.com/huggingface/optimum/pull/911
- Fix onnx export preprocessors save by @fxmarty in https://github.com/huggingface/optimum/pull/913
- Fix GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/914
New Contributors
- @adit299 made their first contribution in https://github.com/huggingface/optimum/pull/819
- @asrimanth made their first contribution in https://github.com/huggingface/optimum/pull/833
- @hivaze made their first contribution in https://github.com/huggingface/optimum/pull/852
Full Changelog: https://github.com/huggingface/optimum/compare/v1.2.0...v1.7.2
- Python
Published by fxmarty about 3 years ago
optimum - v1.7.1: Patch release
Temporarily fix a critical bug in BetterTransformer https://github.com/huggingface/optimum/pull/849
Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.0...v1.7.1
- Python
Published by fxmarty over 3 years ago
optimum - v1.7.0: ONNX export extension, TFLite export, single-ONNX decoding, ONNX Runtime extension for audio, vision tasks, stable diffusion
New models supported in the ONNX export
Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.
- Add PoolFormer support in exporters.onnx by @BakingBrains in https://github.com/huggingface/optimum/pull/646
- Support pegasus exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/620
- Audio models support with optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/622
- Add MPNet ONNX export by @jplu in https://github.com/huggingface/optimum/pull/691
- Add stable diffusion VAE encoder export by @echarlaix in https://github.com/huggingface/optimum/pull/705
- Add vision encoder decoder model in exporters by @mht-sharma in https://github.com/huggingface/optimum/pull/588
- Nystromformer ONNX export by @whr778 in https://github.com/huggingface/optimum/pull/728
- Support Splinter exporters (#555) by @Allanbeddouk in https://github.com/huggingface/optimum/pull/736
- Add gpt-neo-x support by @sidthekidder in https://github.com/huggingface/optimum/pull/745
New models supported in BetterTransformer
A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer, Marian
- Add RoCBert support for Bettertransformer by @shogohida in https://github.com/huggingface/optimum/pull/542
- Add better transformer support for RoFormer by @manish-p-gupta in https://github.com/huggingface/optimum/pull/680
- added BetterTransformer support for Marian by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/808
Additional tasks supported in the ONNX Runtime integration
The ONNX Runtime integration adds the following classes: ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification, ORTStableDiffusionPipeline.
Reference: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models
- Add ORTModelForMaskedLM class by @JingyaHuang in https://github.com/huggingface/optimum/pull/729
- Add ORTModelForVision2Seq for VisionEncoderDecoder models inference by @mht-sharma in https://github.com/huggingface/optimum/pull/742
- Add ORTModelXXX for audio by @mht-sharma in https://github.com/huggingface/optimum/pull/774
- Add stable diffusion onnx runtime pipeline by @echarlaix in https://github.com/huggingface/optimum/pull/786
Support of the ONNX export from PyTorch on float16
The ONNX export accepts the options --fp16 --device cuda to export in float16 when a GPU is available. The export goes directly through the native torch.onnx.export.
Example: optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/
- Support ONNX export on torch.float16 type by @fxmarty in https://github.com/huggingface/optimum/pull/749
TFLite export
TFLite export is now supported, with static shapes:
optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
- exporters.tflite initial support by @michaelbenayoun in https://github.com/huggingface/optimum/pull/716
- TFLite auto-encoder models by @michaelbenayoun in https://github.com/huggingface/optimum/pull/757
- [TFLite Export] Adds support for ResNet by @sayakpaul in https://github.com/huggingface/optimum/pull/813
ONNX Runtime optimization and quantization directly in the CLI
- Add optimize and quantize command CLI by @jplu in https://github.com/huggingface/optimum/pull/700
- Support ONNX Runtime optimizations in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/807
The ONNX export optionally applies the ONNX Runtime optimizations directly during the export, by passing an option from --optimize O1 up to --optimize O4:
optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
ONNX Runtime quantization is supported directly in command line, using optimum-cli onnxruntime quantize:
optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512
ONNX Runtime optimization is supported directly in command line, using optimum-cli onnxruntime optimize:
optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3
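The export commands shown in this release can also be assembled from a script. The sketch below is a hypothetical helper (build_export_command is not part of Optimum) that only uses flags quoted in these notes: --model, --task, --optimize, --fp16 and --device. It builds the argument list without running anything. Requiring --device cuda together with --fp16 is an assumption drawn from the float16 example above.

```python
from typing import List, Optional

# Optimization levels named in these release notes.
OPTIMIZATION_LEVELS = ("O1", "O2", "O3", "O4")


def build_export_command(
    model: str,
    output_dir: str,
    task: Optional[str] = None,
    optimize: Optional[str] = None,
    fp16: bool = False,
    device: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an `optimum-cli export onnx` argument list."""
    if optimize is not None and optimize not in OPTIMIZATION_LEVELS:
        raise ValueError(f"optimize must be one of {OPTIMIZATION_LEVELS}, got {optimize!r}")
    if fp16 and device != "cuda":
        # Assumption: the notes only show --fp16 together with --device cuda.
        raise ValueError("--fp16 is shown together with --device cuda in the release notes")

    command = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        command += ["--task", task]
    if optimize is not None:
        command += ["--optimize", optimize]
    if fp16:
        command.append("--fp16")
    if device is not None:
        command += ["--device", device]
    # The output directory is the last positional argument in the examples.
    command.append(output_dir)
    return command


print(" ".join(build_export_command("t5-small", "t5small_onnx/", optimize="O3")))
print(" ".join(build_export_command("gpt2", "gpt2_onnx/", fp16=True, device="cuda")))
```

On a machine where optimum-cli is installed, the returned list can be passed to subprocess.run(command, check=True).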
ORTModelForCausalLM supports decoding with a single ONNX
Up to now, decoders used two ONNX models:
- One handling the first forward pass, where no past key values have been cached yet, and thus not taking them as input.
- One handling the following forward passes, where past key values have been cached, and thus taking them as input.
This release introduces support, in the ONNX export and in ORTModelForCausalLM, for a single ONNX model handling both steps of the decoding. This reduces memory usage, as weights are not duplicated between two separate models during inference.
A single ONNX for decoders can be used by passing use_merged=True to ORTModelForCausalLM.from_pretrained, loading directly from a PyTorch model:
```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)
```
Alternatively, using a single ONNX for decoders is the default behavior in the ONNX export, and the exported model can later be used, for example, with ORTModelForCausalLM. The command optimum-cli export onnx --model gpt2 gpt2_onnx/ will produce:
└── gpt2_onnx
  ├── config.json
  ├── decoder_model_merged.onnx
  ├── decoder_model.onnx
  ├── decoder_with_past_model.onnx
  ├── merges.txt
  ├── special_tokens_map.json
  ├── tokenizer_config.json
  ├── tokenizer.json
  └── vocab.json
The decoder_model.onnx and decoder_with_past_model.onnx are kept separate for backward compatibility, but during inference using solely decoder_model_merged.onnx is enough.
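As an illustration of that file layout, the sketch below picks which decoder file(s) a script would need from an export directory: the merged file when it is present, otherwise the two legacy files. select_decoder_files is a hypothetical helper written for these notes. It is not Optimum's loading logic, and only the three file names come from the listing above.

```python
from pathlib import Path
from typing import List

# File names taken from the export listing in these notes.
MERGED = "decoder_model_merged.onnx"
LEGACY = ["decoder_model.onnx", "decoder_with_past_model.onnx"]


def select_decoder_files(export_dir: str) -> List[str]:
    """Hypothetical helper: return the decoder ONNX file(s) needed for inference."""
    root = Path(export_dir)
    # The merged decoder alone is enough when it exists.
    if (root / MERGED).is_file():
        return [MERGED]
    # Otherwise both legacy files are required (without and with past key values).
    if all((root / name).is_file() for name in LEGACY):
        return list(LEGACY)
    raise FileNotFoundError(f"No usable decoder ONNX files found in {export_dir}")
```

For the gpt2_onnx/ directory listed above, this helper would return only the merged file.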
- Enable inference with a merged decoder in ORTModelForCausalLM by @JingyaHuang in https://github.com/huggingface/optimum/pull/647
Single-file ORTModel accepts numpy arrays
ORTModel accepts numpy arrays as inputs, in addition to PyTorch tensors. This only applies to models that use a single ONNX file.
- Accept numpy.ndarray as input and output to ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/790
ORTOptimizer support for ORTModelForCausalLM
- ORTOptimizer support ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/794
- Support IO Binding for merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/797
Breaking changes
- In the ONNX export, exporting models in several ONNX (encoder, decoder) is now the default behavior: https://github.com/huggingface/optimum/pull/747. The old behavior is still accessible with --monolith.
- In decoders, reusing past key values is now the default in the ONNX export: https://github.com/huggingface/optimum/pull/748. The old behavior is still accessible by explicitly passing, for example, --task causal-lm instead of --task causal-lm-with-past.
- BigBird support in the ONNX export is removed, due to the block_sparse attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: https://github.com/huggingface/optimum/pull/778
- The parameter from_transformers of ORTModel.from_pretrained will be deprecated in favor of export.
Bugfixes and improvements
- Fix disable shape inference for optimization by @regisss in https://github.com/huggingface/optimum/pull/652
- Fix uninformative message when passing use_cache=True to ORTModel and no ONNX with cache is available by @fxmarty in https://github.com/huggingface/optimum/pull/650
- Fix provider options when several providers are passed by @fxmarty in https://github.com/huggingface/optimum/pull/653
- Add TensorRT engine to ONNX Runtime GPU documentation by @fxmarty in https://github.com/huggingface/optimum/pull/657
- Improve documentation around ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/666
- minor updates on ONNX config guide by @mszsorondo in https://github.com/huggingface/optimum/pull/662
- Fix FlaubertOnnxConfig by @michaelbenayoun in https://github.com/huggingface/optimum/pull/669
- Use nvcr.io/nvidia/tensorrt image for GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/660
- Better Transformer doc fix by @HamidShojanazeri in https://github.com/huggingface/optimum/pull/670
- Add support for LongT5 optimization using ORT transformer optimizer script by @kunal-vaishnavi in https://github.com/huggingface/optimum/pull/683
- Add test for missing execution providers error messages by @fxmarty in https://github.com/huggingface/optimum/pull/659
- ONNX transformation to cast int64 constants to int32 when possible by @fxmarty in https://github.com/huggingface/optimum/pull/655
- Add missing normalized configs by @fxmarty in https://github.com/huggingface/optimum/pull/694
- Remove code duplication in ORTModel's load_model by @fxmarty in https://github.com/huggingface/optimum/pull/695
- Test more architectures in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/675
- Avoid initializing unwanted attributes for ORTModel's having several inference sessions by @fxmarty in https://github.com/huggingface/optimum/pull/696
- Fix the ORTQuantizer loading from specific file by @echarlaix in https://github.com/huggingface/optimum/pull/701
- Add saving of diffusion model additional components for onnx export by @echarlaix in https://github.com/huggingface/optimum/pull/699
- Fix whisper export by @mht-sharma in https://github.com/huggingface/optimum/pull/629
- Support trust remote code option in ONNX export and ONNX Runtime integration by @fxmarty in https://github.com/huggingface/optimum/pull/702
- Add nightly tests on dependencies dev versions by @fxmarty in https://github.com/huggingface/optimum/pull/703
- Fix exception condition by @mht-sharma in https://github.com/huggingface/optimum/pull/706
- Add ORTModelForMultipleChoice to the documentation by @fxmarty in https://github.com/huggingface/optimum/pull/712
- Fix yaml format for dev tests by @fxmarty in https://github.com/huggingface/optimum/pull/710
- Add ONNX Runtime training benchmark by @JingyaHuang in https://github.com/huggingface/optimum/pull/592
- Allow from optimum.onnxruntime import QuantizationConfig by @fxmarty in https://github.com/huggingface/optimum/pull/715
- Fix documentation for doctest tests to pass by @fxmarty in https://github.com/huggingface/optimum/pull/713
- Use transformers>=4.26.0 in setup.py by @fxmarty in https://github.com/huggingface/optimum/pull/723
- Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/724
- Fix ONNX Runtime inference in ORTTrainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/709
- onnxruntime/modeling_ort.py refactor, part 1 by @michaelbenayoun in https://github.com/huggingface/optimum/pull/698
- Update docker and doc of ORT Trainer by @JingyaHuang in https://github.com/huggingface/optimum/pull/725
- Add test for code examples in the documentation and docstrings by @fxmarty in https://github.com/huggingface/optimum/pull/704
- add image classification example to optimum by @prathikr in https://github.com/huggingface/optimum/pull/711
- Add TensorrtExecutionProvider modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/722
- Whisper shape inference fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/726
- Add some redirections to Optimum Habana's documentation by @regisss in https://github.com/huggingface/optimum/pull/735
- Patch ORTTrainer inference with ONNX Runtime backend by @JingyaHuang in https://github.com/huggingface/optimum/pull/737
- Remove dead code in whisper ONNX output by @fxmarty in https://github.com/huggingface/optimum/pull/741
- Unpin protobuf 3.20.1 by @fxmarty in https://github.com/huggingface/optimum/pull/738
- Fix speech2text export by @mht-sharma in https://github.com/huggingface/optimum/pull/746
- Raise error on double call to BetterTransformer.transform() by @fxmarty in https://github.com/huggingface/optimum/pull/750
- exporters.onnx output names and dynamic axes fix by @michaelbenayoun in https://github.com/huggingface/optimum/pull/731
- Fix NNCF supported quantization strategies README table by @echarlaix in https://github.com/huggingface/optimum/pull/752
- Add GPU tests for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/751
- Fix doctest by @fxmarty in https://github.com/huggingface/optimum/pull/759
- Fix ONNX Runtime cache usage for decoders, add relevant tests by @fxmarty in https://github.com/huggingface/optimum/pull/756
- Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/758
- Update quality tooling for formatting by @regisss in https://github.com/huggingface/optimum/pull/760
- Fix wrong shapes used at ONNX export and validation by @fxmarty in https://github.com/huggingface/optimum/pull/764
- Change type annotation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/768
- Fix stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/762
- Disable ONNX Runtime provider check on Windows by @fxmarty in https://github.com/huggingface/optimum/pull/771
- Fix FusionOptions following ORT 1.14 release by @fxmarty in https://github.com/huggingface/optimum/pull/772
- Unpin numpy <1.24.0 by @fxmarty in https://github.com/huggingface/optimum/pull/773
- Fix flaky ONNX Runtime generation test with past key value reuse by @fxmarty in https://github.com/huggingface/optimum/pull/765
- Fix output shape dimension for OnnxConfigWithPast by @fxmarty in https://github.com/huggingface/optimum/pull/780
- Fix used shapes, device at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/777
- Pin numpy only for tensorflow export by @fxmarty in https://github.com/huggingface/optimum/pull/781
- Fixed broken paper space links by @Muhtasham in https://github.com/huggingface/optimum/pull/766
- Temporarily disable python 3.9 + macOS test due to onnxruntime 1.14 regression by @fxmarty in https://github.com/huggingface/optimum/pull/783
- Update ORT Training to 1.14.0 by @JingyaHuang in https://github.com/huggingface/optimum/pull/787
- Temporarily disable segformer TensorRT test by @fxmarty in https://github.com/huggingface/optimum/pull/799
- Use a stateful ordered_input_names in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/796
- Test ORTOptimizer with IO Binding by @fxmarty in https://github.com/huggingface/optimum/pull/801
- [BT] Add stable layer-norm Wav2vec2 by @younesbelkada in https://github.com/huggingface/optimum/pull/803
- Update rules for ruff by @regisss in https://github.com/huggingface/optimum/pull/806
- Improve orttrainer test by @JingyaHuang in https://github.com/huggingface/optimum/pull/779
- Fix ORT quantization for TensorRT documentation by @fxmarty in https://github.com/huggingface/optimum/pull/812
- Fix GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/814
- Update ONNX Runtime training doc - use torchrun by @JingyaHuang in https://github.com/huggingface/optimum/pull/820
- Fix ONNX export tests by @fxmarty in https://github.com/huggingface/optimum/pull/822
- All back workflow dispatch on GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/823
- BetterTransformer pipeline padding issue fix by @vrdn-23 in https://github.com/huggingface/optimum/pull/821
- Fix optimum pipeline initialization by @fxmarty in https://github.com/huggingface/optimum/pull/824
- Fix failing GPU tests by @fxmarty in https://github.com/huggingface/optimum/pull/829
- Remove feature dimension as dynamic axes for stable diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/816
- Fix pipeline task dropping arguments bug by @fxmarty in https://github.com/huggingface/optimum/pull/828
- Fix ORTQuantizer behavior with ORTModelForCausalLM by @fxmarty in https://github.com/huggingface/optimum/pull/831
- Update tests by @mht-sharma in https://github.com/huggingface/optimum/pull/826
- Fix exporters GPU CI by @fxmarty in https://github.com/huggingface/optimum/pull/835
- Keep intermediary models for ONNX causal-lm by @fxmarty in https://github.com/huggingface/optimum/pull/834
- Fix duplicate name merged decoder by @fxmarty in https://github.com/huggingface/optimum/pull/837
- Apply lazy import for exporters by @JingyaHuang in https://github.com/huggingface/optimum/pull/836
Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.7.0
- Python
Published by fxmarty over 3 years ago
optimum - v1.6.4: Patch release
Bugfix
- Fix past key/value reuse in decoders following transformers 4.26.0 release and renaming: https://github.com/huggingface/optimum/commit/b9211d6826b92700e73f48821d6e14bd08226abc
- ONNX Runtime 1.14 support: https://github.com/huggingface/optimum/pull/772
Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.3...v1.6.4
- Python
Published by fxmarty over 3 years ago
optimum - v1.6.3: Patch release
Fixes ORTTrainer for the inference with the ONNX Runtime backend.
- Python
Published by JingyaHuang over 3 years ago
optimum - v1.6.2: Patch release
Hotfixes
- Support generation config in ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/651
Regressions
The export of speech-to-text architectures as a single ONNX file (that handles both the encoding and decoding) fails due to a regression with the latest transformers version: https://github.com/huggingface/optimum/issues/721
Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.1...v1.6.2
- Python
Published by fxmarty over 3 years ago
optimum - v1.6.1: Patch release
Hotfixes
- Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by @fxmarty in https://github.com/huggingface/optimum/pull/643
- Fix item access of some _TASKS_TO_AUTOMODELS by @fxmarty in https://github.com/huggingface/optimum/pull/642
Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1
- Python
Published by fxmarty over 3 years ago
optimum - v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures
Optimum CLI
The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:
optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
- Add Optimum CLI backbone by @fxmarty in https://github.com/huggingface/optimum/pull/593
Stable Diffusion ONNX export
Optimum now supports the ONNX export of stable diffusion models from the diffusers library:
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
- Add Stable Diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/570
BetterTransformer support for more architectures
BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT
The complete list of supported models is available in the documentation.
- [BT] Add Bettertransformer support for FSMT by @Sumanth077 in https://github.com/huggingface/optimum/pull/494
- [BT] add BetterTransformer support for ViLT architecture by @ka00ri in https://github.com/huggingface/optimum/pull/508
- Add MBart support for BetterTransformer by @ravenouse in https://github.com/huggingface/optimum/pull/516
- Add CLIP BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/534
- Add BetterTransformer support for RemBERT by @hchings in https://github.com/huggingface/optimum/pull/545
ONNX export for more architectures
The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.
- Add Swin support in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/528
- [ONNX] add mobilenet support by @younesbelkada in https://github.com/huggingface/optimum/pull/633
Extended ONNX export for encoder-decoder and decoder models
Encoder-decoder or decoder-only models normally making use of the generate() method in transformers can now be exported in several files using the --for-ort argument:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx
yielding:
.
└── t5_small_onnx
  ├── config.json
  ├── decoder_model.onnx
  ├── decoder_with_past_model.onnx
  ├── encoder_model.onnx
  ├── special_tokens_map.json
  ├── spiece.model
  ├── tokenizer_config.json
  └── tokenizer.json
When --for-ort is passed, the exported models are expected to be loadable directly into ORTModel.
- Add ort export in exporters for encoder-decoder models by @mht-sharma in https://github.com/huggingface/optimum/pull/497
- Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in https://github.com/huggingface/optimum/pull/554
Support for ONNX models with external data at export, optimization, quantization
The ONNX export from PyTorch normally creates external data when the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.
- Handling ONNX models with external data by @NouamaneTazi in https://github.com/huggingface/optimum/pull/586
- Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in https://github.com/huggingface/optimum/pull/332
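To get a rough idea of when a model crosses that threshold, the weight size can be estimated from the parameter count. The sketch below is a back-of-the-envelope estimate under stated assumptions: "2 GB" is read as 2 GiB, only the weights are counted, and the parameter counts in the example are illustrative numbers, not tied to any named model. The function names are hypothetical.

```python
# Assumption: the "2 GB" threshold from the notes is taken as 2 GiB.
SIZE_LIMIT_BYTES = 2 * 1024**3


def weights_size_bytes(num_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Estimate the raw size of the weights (4 bytes per parameter for float32)."""
    return num_parameters * bytes_per_parameter


def likely_needs_external_data(num_parameters: int, bytes_per_parameter: int = 4) -> bool:
    """Return True when the weights alone exceed the assumed size limit."""
    return weights_size_bytes(num_parameters, bytes_per_parameter) > SIZE_LIMIT_BYTES


# Illustrative parameter counts only.
for count in (100_000_000, 1_500_000_000):
    print(count, likely_needs_external_data(count))
```

The estimate ignores graph structure and metadata, so a model close to the limit may land on either side of it in practice.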
ONNX Runtime API improvement
Various improvements to allow for a better user experience in the ONNX Runtime integration:
- ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model files regardless of their names, which allows loading optimized and quantized models without having to specify a file name argument.
- ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
- ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
- ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.
- ONNX Runtime integration API improvement by @michaelbenayoun in https://github.com/huggingface/optimum/pull/515
Custom shapes support at ONNX export
The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.
Read more: optimum-cli export onnx --help
- Support custom shapes for dummy inputs by @fxmarty in https://github.com/huggingface/optimum/pull/522
- Support for custom input shapes in exporters onnx by @fxmarty in https://github.com/huggingface/optimum/pull/575
Enable use_cache=True for ORTModelForCausalLM
Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible using use_cache=True, which avoids recomputing them at each iteration of the decoding:
```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
```
- Enable past_key_values for ORTModelForCausalLM by @echarlaix in https://github.com/huggingface/optimum/pull/326
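To see why reusing past key values matters, the toy count below compares how many per-token key/value projections one attention layer performs during generation, with and without a cache. It is a simplified counting model written for these notes, not a measurement of Optimum or ONNX Runtime. The function name and the example lengths are hypothetical.

```python
def count_kv_projections(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count key/value projections for one attention layer over a generation run."""
    total = 0
    seq_len = prompt_len  # tokens visible at the current decoding step
    cached = 0            # tokens whose keys/values are already stored
    for _ in range(new_tokens):
        if use_cache:
            # Only tokens that are not in the cache yet need a projection.
            total += seq_len - cached
            cached = seq_len
        else:
            # Without a cache, every visible token is projected again.
            total += seq_len
        seq_len += 1  # the newly generated token joins the sequence
    return total


print(count_kv_projections(prompt_len=8, new_tokens=4, use_cache=False))
print(count_kv_projections(prompt_len=8, new_tokens=4, use_cache=True))
```

In this model the uncached count grows quadratically with the number of generated tokens, while the cached count grows linearly.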
IO binding support for ORTModelForCustomTasks
ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.
- Add IO binding support for custom ORTModel by @JingyaHuang in https://github.com/huggingface/optimum/pull/447
Experimental support to merge ONNX decoder with/without past key values
When passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past along with --for-ort, the ONNX export produces two models: one not using the previously computed keys/values, and one using them.
Experimental support is introduced to merge the two models into one. Example:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/
```python
import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
```
- Merge ONNX decoder models by @JingyaHuang in https://github.com/huggingface/optimum/pull/587
Major bugs fixed
- Fix BetterTransformer with padding="max_length" by @fxmarty in https://github.com/huggingface/optimum/pull/543
- Fix non-nesting bug in BetterTransformer integration by @younesbelkada in https://github.com/huggingface/optimum/pull/637
Other changes, bugfixes and improvements
- Fix doc-builder premission error by @mishig25 in https://github.com/huggingface/optimum/pull/482
- Fix doc build pr premissions by @mishig25 in https://github.com/huggingface/optimum/pull/484
- Re-order the task manager doc by @michaelbenayoun in https://github.com/huggingface/optimum/pull/483
- Fix whisper device for gpu test by @fxmarty in https://github.com/huggingface/optimum/pull/486
- Fix tensorflow CI by @fxmarty in https://github.com/huggingface/optimum/pull/489
- Fix PR doc generation by @regisss in https://github.com/huggingface/optimum/pull/495
- Fix broken links in the doc by @fxmarty in https://github.com/huggingface/optimum/pull/499
- Update iobinding ORT encoder whisper by @mht-sharma in https://github.com/huggingface/optimum/pull/498
- fix NormalizedConfig init error message by @PaulQbFeng in https://github.com/huggingface/optimum/pull/500
- Change import structure for ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/456
- [BT] Fix failing CI tests by @younesbelkada in https://github.com/huggingface/optimum/pull/501
- Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in https://github.com/huggingface/optimum/pull/504
- [BT] put decorator on the correct place by @younesbelkada in https://github.com/huggingface/optimum/pull/509
- [BT] clearer error message for norm_first by @younesbelkada in https://github.com/huggingface/optimum/pull/510
- Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/513
- Fix ORTModelForSeq2SeqLM test by @fxmarty in https://github.com/huggingface/optimum/pull/455
- Clearer error messages when initilizing the requested ONNX Runtime execution provider fails by @fxmarty in https://github.com/huggingface/optimum/pull/514
- [BT] Fix doc bugs by @younesbelkada in https://github.com/huggingface/optimum/pull/517
- Replace sklearn by scikit-learn by @lesteve in https://github.com/huggingface/optimum/pull/502
- ORTModel uses optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/490
- Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in https://github.com/huggingface/optimum/pull/523
- Added support for Tapas Model by @JuheonChu in https://github.com/huggingface/optimum/pull/520
- Add benchmark results to gpu doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/525
- ORTModelForConditionalGeneration uses optimum.exporters.onnx by @mht-sharma in https://github.com/huggingface/optimum/pull/529
- Better error message when wrong task is given to exporters by @fxmarty in https://github.com/huggingface/optimum/pull/531
- Add OrtModelForSpeechSeq2Seq to doc by @fxmarty in https://github.com/huggingface/optimum/pull/533
- Fold sections by default in the documentation's side-bar by @regisss in https://github.com/huggingface/optimum/pull/535
- Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by @regisss in https://github.com/huggingface/optimum/pull/536
- Add check_if_transformers_greater to manage different versions of transformers by @regisss in https://github.com/huggingface/optimum/pull/537
- Enable to push some sections to the end of the TOC in the doc by @regisss in https://github.com/huggingface/optimum/pull/532
- Fix import in ONNX export CLI by @fxmarty in https://github.com/huggingface/optimum/pull/553
- Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/550
- Refactor of 2 functions used in ORTModel by @michaelbenayoun in https://github.com/huggingface/optimum/pull/551
- Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/556
- Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by @JingyaHuang in https://github.com/huggingface/optimum/pull/561
- Fix flaky BetterTransformer test by @fxmarty in https://github.com/huggingface/optimum/pull/564
- enable FP16Optimizer for fp16 deepspeed training. by @AdamLouly in https://github.com/huggingface/optimum/pull/547
- Update documentation quick tour section by @echarlaix in https://github.com/huggingface/optimum/pull/574
- Move custom IOBinding to IOBindingHelper by @JingyaHuang in https://github.com/huggingface/optimum/pull/571
- Add test for exporters.onnx CLI by @fxmarty in https://github.com/huggingface/optimum/pull/573
- Documentation on quantization by @michaelbenayoun in https://github.com/huggingface/optimum/pull/565
- More robust tests for ORTModel using decoders and use_cache=True by @fxmarty in https://github.com/huggingface/optimum/pull/576
- Fix errors in onnxruntime modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/585
- [BT] fix flaky test by @younesbelkada in https://github.com/huggingface/optimum/pull/591
- Fix exporters onnx shapes by @fxmarty in https://github.com/huggingface/optimum/pull/581
- Fix exporters.onnx tests by @fxmarty in https://github.com/huggingface/optimum/pull/584
- Update on the ONNX Runtime documentation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/567
- Add the ORTModelForSemanticSegmentation class by @TheoMrc in https://github.com/huggingface/optimum/pull/539
- Refactor BetterTransformer to be able to raise more informative error messages by @fxmarty in https://github.com/huggingface/optimum/pull/594
- Constraint temprarily NumPy version to save CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/614
- Add encoder_last_hidden_state as an output for encoder-decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/601
- Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/617
- Fix documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/603
- Documentation improvements by @fxmarty in https://github.com/huggingface/optimum/pull/598
- More informative message at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/609
- Use optimum exporter for current weight sharing test by @JingyaHuang in https://github.com/huggingface/optimum/pull/616
- OnnxConfig now handle the export to encoder / decoder / decoder_with_past themselves by @michaelbenayoun in https://github.com/huggingface/optimum/pull/590
- Set explictly the device index by @JingyaHuang in https://github.com/huggingface/optimum/pull/613
- Fix ORT GPU test by @JingyaHuang in https://github.com/huggingface/optimum/pull/624
- Add GPT-J normalized config by @fxmarty in https://github.com/huggingface/optimum/pull/623
- Remove diffusers dependency in onnxruntime code by @fxmarty in https://github.com/huggingface/optimum/pull/619
- Use exporters in ORTTrainer by @mht-sharma in https://github.com/huggingface/optimum/pull/546
- Improve use_io_binding default value for different execution providers by @JingyaHuang in https://github.com/huggingface/optimum/pull/604
- fixed FuseBiasInLinear by specifying device by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/630
- Fixed GPU documentation for HF pipelines by @smiraldr in https://github.com/huggingface/optimum/pull/602
- Add argument in the CLI to specify device to do the ONNX export on by @fxmarty in https://github.com/huggingface/optimum/pull/634
- Allow kwargs in all generate_dummy_inputs() methods by @fxmarty in https://github.com/huggingface/optimum/pull/638
Full Changelog: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @TheoMrc
- Add ORTModelForSemanticSegmentation https://github.com/huggingface/optimum/pull/539
- @ravenouse
- Add MBart support for BetterTransformer https://github.com/huggingface/optimum/pull/516
- @ka00ri
- Add BetterTransformer support for ViLT architecture https://github.com/huggingface/optimum/pull/508
- @Sumanth077
- Add Bettertransformer support for FSMT https://github.com/huggingface/optimum/pull/494
- Python
Published by fxmarty over 3 years ago
optimum - v1.5.2: Patch release
Temporarily constrain numpy<1.24.0 (#614)
- Python
Published by fxmarty over 3 years ago
optimum - v1.5.1: Patch release
Deprecate PyTorch 1.12. for BetterTransformer with better error message (#513)
- Python
Published by fxmarty over 3 years ago
optimum - v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime
BetterTransformer
Convert your model into its PyTorch BetterTransformer format using a one liner with the new BetterTransformer integration for faster inference on CPU and GPU!
```python
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
```
Check the full list of supported models in the documentation, and check out the Google Colab demo.
Contributions
- BetterTransformer integration (#423)
- ViT and Wav2Vec2 support (#470)
ONNX Runtime IOBinding support
ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device. This gives a significant inference speedup during the decoding process on GPU.
By default, use_io_binding is set to True when using CUDA. You can turn off the IOBinding in case of any memory issue:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
```
Contributions
- Add IOBinding support to ONNX Runtime module (#421)
Optimum Exporters
optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, Bloom, T5, ViT, Whisper and CLIP.
The export can be done via the CLI:
```bash
python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
```
For more information, check the documentation.
Contributions
- optimum.exporters creation (#403)
- Automatic task detection (#445)
Whisper
- Whisper can be exported to ONNX using optimum.exporters.
- Whisper can also be exported and run using optimum.onnxruntime; IO binding is also supported.
Note: For now, the export from optimum.exporters is not usable by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.
Contributions
- Whisper support with optimum.onnxruntime and optimum.exporters (#420)
Other contributions
- ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
- ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
- ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
- Fixes and updates in the documentation (#411, #432, #437, #441)
- Fixes IOBinding (#454, #461)
- Python
Published by michaelbenayoun over 3 years ago
optimum - v1.4.1: Patch release
- Add inference with ORTModel to ORTTrainer and ORTSeq2SeqTrainer #189
- Add InferenceSession options and provider to ORTModel #271
- Add mT5 (#341) and Marian (#393) support to ORTOptimizer
- Add batchnorm folding torch.fx transformations #348
- The torch.fx transformations now use the marking methods mark_as_transformed, mark_as_restored, get_transformed_nodes #385
- Update BaseConfig for transformers 4.22.0 release #386
- Update ORTTrainer for transformers 4.22.1 release #388
- Add extra ONNX Runtime quantization options #398
- Add possibility to pass provider_options to ORTModel #401
- Add support to pass a specific device for ORTModel, as transformers does for pipelines #427
- Fixes to support onnxruntime 1.13.1 #430
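As a rough illustration of what the batchnorm folding mentioned in this list amounts to, the sketch below uses plain Python arithmetic on a single feature. It is not the optimum.fx API: the function names and parameter values are hypothetical, and inference-mode batch normalization (fixed running statistics) is assumed.

```python
import math


def linear(x, weight, bias):
    """Scalar linear layer: y = w * x + b."""
    return weight * x + bias


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm_into_linear(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weight, bias) of one linear op equivalent to linear -> batchnorm."""
    scale = gamma / math.sqrt(var + eps)
    folded_weight = weight * scale
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias


# Hypothetical parameter values, chosen only for the demonstration.
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.25, 0.1, 4.0

x = 3.0
reference = batchnorm(linear(x, w, b), gamma, beta, mean, var)
folded_w, folded_b = fold_batchnorm_into_linear(w, b, gamma, beta, mean, var)
folded = linear(x, folded_w, folded_b)
```

Here `reference` and `folded` agree up to floating-point rounding, which is why the normalization can be removed from the graph at inference time.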
- Python
Published by echarlaix over 3 years ago
optimum - v1.4.0: ORTQuantizer and ORTOptimizer refactorization
ONNX Runtime
- Refactorization of ORTQuantizer (#270) and ORTOptimizer (#294)
- Add ONNX Runtime fused Adam Optimizer (#295)
- Add ORTModelForCustomTasks allowing ONNX Runtime inference support for custom tasks (#303)
- Add ORTModelForMultipleChoice allowing ONNX Runtime inference for models with multiple choice classification head (#358)
Torch FX
- Add FuseBiasInLinear, a transformation that fuses the weight and the bias of linear modules (#253)
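One way to picture fusing the bias into the weight is sketched below in plain Python. It assumes the fusion appends the bias as an extra weight column and feeds a constant 1 as an extra input feature; the helper names are hypothetical and this is not the optimum.fx implementation.

```python
def linear(x, weight, bias):
    """y = W @ x + b, with the weight stored as a list of rows."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weight, bias)
    ]


def fuse_bias(weight, bias):
    """Append the bias as the last column of the weight matrix."""
    return [row + [b_i] for row, b_i in zip(weight, bias)]


def fused_linear(x, fused_weight):
    """y = [W | b] @ [x; 1], so no separate bias addition is needed."""
    x_augmented = x + [1.0]
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x_augmented))
        for row in fused_weight
    ]


weight = [[1.0, 2.0], [3.0, 4.0]]
bias = [0.5, -1.0]
x = [2.0, 1.0]

y_reference = linear(x, weight, bias)
y_fused = fused_linear(x, fuse_bias(weight, bias))
```

Both calls produce the same output vector; the fused form trades one extra input feature for the removal of the bias addition.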
Improvements and bugfixes
- Enable the possibility to disregard the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
- Enable node exclusion from quantization for benchmark suite (#284)
- Enable the possibility to use token authentication when loading a calibration dataset (#289)
- Fix optimum pipeline when no model is given (#301)
- Python
Published by echarlaix almost 4 years ago
optimum - v1.3.0: Torch FX transformations, ORTModelForSeq2SeqLM and ORTModelForImageClassification
Torch FX
The optimum.fx.optimization module (#232) provides a set of torch.fx graph transformations, along with classes and functions to write your own transformations and compose them.
- The Transformation and ReversibleTransformation classes represent non-reversible and reversible transformations, and it is possible to write such transformations by inheriting from those classes
- The compose utility function enables transformation composition
- Two reversible transformations were added:
  - MergeLinears: merges linear layers that have the same input
  - ChangeTrueDivToMulByInverse: changes a division by a static value to a multiplication of its inverse
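The idea of reversible transformations and their composition can be sketched with a toy example. The code below does not use torch.fx or the optimum.fx.optimization classes: it rewrites a plain list of arithmetic ops, and every class and function name in it is a hypothetical stand-in chosen for illustration.

```python
class ToyReversibleTransformation:
    """Hypothetical base class: subclasses rewrite a list of ops and can undo it."""

    def transform(self, ops):
        raise NotImplementedError

    def reverse(self, ops):
        raise NotImplementedError


class DivToMulByInverse(ToyReversibleTransformation):
    """Rewrite x / c as x * (1 / c), marking rewritten ops so they can be restored."""

    def transform(self, ops):
        return [
            {"op": "mul", "value": 1.0 / o["value"], "from": "div"}
            if o["op"] == "div" else dict(o)
            for o in ops
        ]

    def reverse(self, ops):
        return [
            {"op": "div", "value": 1.0 / o["value"]}
            if o.get("from") == "div" else dict(o)
            for o in ops
        ]


class SubToAddNegative(ToyReversibleTransformation):
    """Rewrite x - c as x + (-c), again marking the rewritten ops."""

    def transform(self, ops):
        return [
            {"op": "add", "value": -o["value"], "from": "sub"}
            if o["op"] == "sub" else dict(o)
            for o in ops
        ]

    def reverse(self, ops):
        return [
            {"op": "sub", "value": -o["value"]}
            if o.get("from") == "sub" else dict(o)
            for o in ops
        ]


def compose(*transformations):
    """Chain transformations; the reverse pass undoes them in the opposite order."""

    class Composed(ToyReversibleTransformation):
        def transform(self, ops):
            for t in transformations:
                ops = t.transform(ops)
            return ops

        def reverse(self, ops):
            for t in reversed(transformations):
                ops = t.reverse(ops)
            return ops

    return Composed()


def evaluate(ops, x):
    """Apply each op to x in order."""
    for o in ops:
        if o["op"] == "add":
            x = x + o["value"]
        elif o["op"] == "sub":
            x = x - o["value"]
        elif o["op"] == "mul":
            x = x * o["value"]
        elif o["op"] == "div":
            x = x / o["value"]
    return x


program = [{"op": "sub", "value": 2.0}, {"op": "div", "value": 4.0}]
pipeline_transformation = compose(DivToMulByInverse(), SubToAddNegative())
transformed = pipeline_transformation.transform(program)
restored = pipeline_transformation.reverse(transformed)
```

The transformed program computes the same result as the original one, and reversing the composed transformation recovers the original op list.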
ORTModelForSeq2SeqLM
ORTModelForSeq2SeqLM (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.
* When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
* This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values, while during the rest of the generation the past key/values are used to speed up sequential decoding.
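The role of each of the three parts during generation can be sketched with stub functions. This is a toy greedy loop, not the ORTModelForSeq2SeqLM implementation: the stubs return arbitrary deterministic values, and all names and token ids are hypothetical.

```python
calls = []  # records which exported part is invoked at each step


def encoder(input_ids):
    calls.append("encoder")
    return [float(i) for i in input_ids]  # stand-in for encoder hidden states


def decoder(decoder_input_ids, encoder_hidden_states):
    """First decoding step: no cached key/values exist yet."""
    calls.append("decoder")
    next_token = len(encoder_hidden_states)  # arbitrary deterministic "prediction"
    past_key_values = [decoder_input_ids[-1]]  # stand-in for the key/value cache
    return next_token, past_key_values


def decoder_with_past(last_token, encoder_hidden_states, past_key_values):
    """Later steps: only the newest token is fed, the cache covers the rest."""
    calls.append("decoder_with_past")
    next_token = last_token - 1  # arbitrary rule so the toy loop reaches EOS
    return next_token, past_key_values + [last_token]


def greedy_generate(input_ids, start_token=0, eos_token=0, max_new_tokens=10):
    hidden = encoder(input_ids)  # the encoder runs once per input sequence
    token, past = decoder([start_token], hidden)
    generated = [token]
    while token != eos_token and len(generated) < max_new_tokens:
        token, past = decoder_with_past(token, hidden, past)
        generated.append(token)
    return generated


output = greedy_generate([5, 6, 7])
```

After the call, `calls` shows one encoder pass, one pass through the decoder without cache, and every remaining step going through the decoder with past key/values.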
Below is an example that downloads a T5 model from the Hugging Face Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load model from hub and export it through the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```
ORTModelForImageClassification
ORTModelForImageClassification (#226) allows ONNX Runtime inference for models with an image classification head.
Below is an example that downloads a ViT model from the Hugging Face Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)
```
ORTOptimizer
Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (fp16) to OptimizationConfig (#273).
Pipelines
Additional pipeline tasks are now supported. Here is a list of the supported tasks along with the default model for each:
- Image Classification (ViT)
- Text-to-Text Generation (T5 small)
- Summarization (T5 base)
- Translation (T5 base)
Below is an example that downloads a T5 small model from the Hub and loads it with a transformers pipeline for translation:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```
Breaking change
The default execution provider of the ORTModelForXXX classes is now CPUExecutionProvider (#203). Previously, if no execution provider was specified, it was set to CUDAExecutionProvider if a GPU was detected, and to CPUExecutionProvider otherwise.
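The change in default selection can be summarized with a small sketch. The provider names come from the note above; the helper functions are hypothetical and only model the described behavior, they are not part of optimum.

```python
def default_provider_before(gpu_available):
    """Behavior described for earlier releases: pick CUDA when a GPU is detected."""
    return "CUDAExecutionProvider" if gpu_available else "CPUExecutionProvider"


def default_provider_now(gpu_available):
    """Behavior described for this release: the default no longer depends on the hardware."""
    return "CPUExecutionProvider"


def resolve_provider(requested=None, gpu_available=False):
    """An explicitly requested provider takes precedence over the default."""
    if requested is not None:
        return requested
    return default_provider_now(gpu_available)
```

In practice this means that code relying on automatic GPU selection now has to request CUDAExecutionProvider explicitly.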
- Python
Published by echarlaix almost 4 years ago
optimum - v1.2.3: Patch release
- Remove intel sub-package, migrating to optimum-intel (#212)
- Fix the loading and saving of ORTModel optimized and quantized models (#214)
- Python
Published by echarlaix about 4 years ago
optimum - v1.2.2: Patch release
- Extend QuantizationPreprocessor to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
- Introduce unified approach to create transformers vs optimized models benchmark (https://github.com/huggingface/optimum/pull/194)
- Bump huggingface_hub version and protobuf fix (https://github.com/huggingface/optimum/pull/205)
- Python
Published by echarlaix about 4 years ago
optimum - v1.2.1: Patch release
Add support for Python 3.7 (https://github.com/huggingface/optimum/pull/176)
- Python
Published by echarlaix about 4 years ago
optimum - v1.2.0: pipeline and AutoModelForXxx classes to run ONNX Runtime inference
ORTModel
ORTModelForXXX classes such as ORTModelForSequenceClassification are now integrated with the Hugging Face Hub, making it easy to export models to the ONNX format, load ONNX models, save the resulting model with save_pretrained, and push it to the 🤗 Hub with push_to_hub. An already optimized and/or quantized ONNX model can also be loaded with the from_pretrained method of the ORTModelForXXX classes.
Below is an example that downloads a DistilBERT model from the Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```
Pipelines
Built-in support for transformers pipelines was added. This allows us to leverage the same API used from Transformers, with the power of accelerated runtimes such as ONNX Runtime.
The currently supported tasks, with the default model for each, are the following:
- Text Classification (DistilBERT model fine-tuned on SST-2)
- Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
- Token Classification (BERT large fine-tuned on CoNLL2003)
- Feature Extraction (DistilBERT)
- Zero Shot Classification (BART model fine-tuned on MNLI)
- Text Generation (DistilGPT2)
Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format and loads it with transformers pipeline for question-answering.
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model using the transformers pipeline, with handle_impossible_answer for squad_v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```
Improvements
- Add loss when performing the evaluation step using an instance of ORTTrainer, previously not enabled when inference was performed with ONNX Runtime, in #152
- Python
Published by echarlaix about 4 years ago
optimum - v1.1.1: Patch release
Habana
- Installation details added for Optimum-Habana which provides optimized transformers integration for Intel's Habana Gaudi Processor (HPU).
ONNX Runtime
- Add the possibility to specify the execution provider in ORTModel.
- Add IncludeFullyConnectedNodes class to find the nodes composing the fully connected layers in order to (only) target the latter for quantization to limit the accuracy drop.
- Update QuantizationPreprocessor so that the intersection of the two sets representing the nodes to quantize and the nodes to exclude from quantization is an empty set.
- Rename Seq2SeqORTTrainer to ORTSeq2SeqTrainer for clarity and to keep consistency.
- Add ORTOptimizer support for ELECTRA models.
- Fix the loading of pretrained ORTConfig which contains optimization and quantization config.
- Python
Published by JingyaHuang about 4 years ago
optimum - v1.1.0: ORTTrainer, Seq2SeqORTTrainer, ONNX Runtime optimization and quantization API improvements
ORTTrainer and Seq2SeqORTTrainer
ORTTrainer and Seq2SeqORTTrainer are two new experimental classes.
- Both ORTTrainer and Seq2SeqORTTrainer were created to have a similar user-facing API as the Trainer and Seq2SeqTrainer of the Transformers library.
- ORTTrainer allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized automatically-exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- ORTTrainer allows the usage of ONNX Runtime inferencing during both the evaluation and the prediction step.
- For Seq2SeqORTTrainer, ONNX Runtime inferencing is incompatible with --predict_with_generate, as the generate method is not supported yet.
ONNX Runtime optimization and quantization APIs improvements
The ORTQuantizer and ORTOptimizer classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.
- Addition of the possibility to iteratively compute the quantization activation ranges when applying static quantization by using the ORTQuantizer method partial_fit. This is especially useful when using memory-hungry calibration methods such as the Entropy and Percentile methods.
- When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activations quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
- The classes OptimizationConfig, QuantizationConfig and CalibrationConfig were added in order to better segment the different ONNX Runtime related parameters instead of having one unique configuration ORTConfig.
- The QuantizationPreprocessor class was added in order to find the nodes to include and / or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm for example). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
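The difference between a global MinMax range and a moving-average range, computed one calibration batch at a time, can be sketched as follows. This is a toy in plain Python, not the ORTQuantizer API: the class name, the exponential update rule and the averaging constant of 0.5 are assumptions made for the illustration, and ONNX Runtime's exact formula may differ.

```python
class ToyMinMaxCalibrator:
    """Hypothetical calibrator that updates an activation range batch by batch."""

    def __init__(self, moving_average=False, averaging_constant=0.5):
        self.moving_average = moving_average
        self.averaging_constant = averaging_constant
        self.range = None  # (min, max) seen so far

    def partial_fit(self, batch):
        """Update the range from one batch, so the full dataset is never held in memory."""
        batch_min, batch_max = min(batch), max(batch)
        if self.range is None:
            self.range = (batch_min, batch_max)
        elif self.moving_average:
            low, high = self.range
            c = self.averaging_constant
            # Assumed exponential moving average of the per-batch extrema.
            self.range = (low + c * (batch_min - low), high + c * (batch_max - high))
        else:
            low, high = self.range
            self.range = (min(low, batch_min), max(high, batch_max))
        return self.range


# Hypothetical activation values; the last batch contains an outlier (-9.0).
batches = [[-1.0, 1.0], [-2.0, 3.0], [-9.0, 1.0]]

global_calibrator = ToyMinMaxCalibrator(moving_average=False)
smoothed_calibrator = ToyMinMaxCalibrator(moving_average=True)
for batch in batches:
    global_calibrator.partial_fit(batch)
    smoothed_calibrator.partial_fit(batch)
```

With these values the global range stretches to cover the outlier, while the moving-average range is pulled towards it only partially, which keeps more quantization resolution for typical activations.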
- Python
Published by echarlaix about 4 years ago
optimum - v1.0.0: ONNX Runtime optimization and quantization support
ONNX Runtime support
- An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
- The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of ORTOptimizer, the user needs to provide an ORTConfig object, defining the export and graph-level transformation information. Optimization can then be performed by calling the ORTOptimizer.fit method.
- ONNX Runtime static and dynamic quantization can also be applied to a model by using the newly added ORTQuantizer class. In order to create an instance of ORTQuantizer, the user needs to provide an ORTConfig object, defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Quantization can then be applied by calling the ORTQuantizer.fit method.
Additional features for Intel Neural Compressor
We have also added a new class called IncOptimizer which will take care of combining the pruning and the quantization processes.
- Python
Published by echarlaix over 4 years ago
optimum - v0.1.2: Intel Neural Compressor's pruning support
With this release, we enable Intel Neural Compressor v1.8 magnitude pruning for a variety of NLP tasks with the introduction of IncTrainer which handles the pruning process.
- Python
Published by echarlaix over 4 years ago
optimum - v0.1.1: Intel Neural Compressor's dynamic, post-training and aware-training quantization support
With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training and aware-training quantization for a variety of NLP tasks. This support includes the overall process, from quantization application to the loading of the resulting quantized model. The latter is enabled by the introduction of the IncQuantizedModel class.
- Python
Published by echarlaix over 4 years ago
optimum - Optimum v0.0.1 - EAP
Initial release for early access to Optimum library featuring Intel's LPOT quantization and pruning support.
- Python
Published by mfuntowicz almost 5 years ago