Release Notes - ChatLLaMA 0.0.4

Major release of ChatLLaMA adding support for efficient training with LoRA.

New Features

HF-based models can now be trained with LoRA in both actor and RLHF training. This reduces the memory needed for training.
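The memory saving comes from training far fewer weights. As a rough illustration (not ChatLLaMA's implementation), the sketch below assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in receives a trainable low-rank update B @ A of rank r. The function names, layer size and rank are hypothetical example values.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is fine-tuned."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights with LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_in + d_out)


# Hypothetical example: one square projection layer and a small rank.
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (r={rank}): {lora:,} trainable weights ({lora / full:.2%} of full)")
```

Gradients and optimizer state are only kept for the trainable weights, which is where most of the training-memory reduction comes from.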

- Python
Published by diegofiori over 3 years ago

Release Notes - ChatLLaMA 0.0.3

Major release of ChatLLaMA solving multiple bugs and extending support to distributed training.

New Features

Training now produces a log file that also contains the training stats
Add Template during the dataset creation
Changed default training parameters to InstructGPT paper ones
Implemented Cosine Scheduler for LR
Add Conversation Logs during RLHF
Improved management of checkpoints
Add support for Accelerate backend for distributed training
Defer import of LLaMA modules, avoiding the LLaMA dependency when another model is used
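For readers unfamiliar with the cosine scheduler for the learning rate listed above, here is a minimal sketch of the usual cosine-annealing formula. It is not taken from the ChatLLaMA source: the function name, the absence of warm-up and the example values are assumptions.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate, decaying from base_lr to min_lr."""
    # Clamp so that steps past the end keep returning min_lr.
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule: 100 steps starting from a learning rate of 1e-3.
for step in (0, 25, 50, 75, 100):
    print(step, cosine_lr(step, total_steps=100, base_lr=1e-3))
```

The rate starts at base_lr, falls slowly at first, fastest around the midpoint, and flattens out again as it approaches min_lr.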

Bug Fix

Fix bug causing crashes during training due to long sequences
Fix multiple bugs on deepspeed MultiGPU training
Fix bugs on RLHF training
Fix bug with different versions of LangChain during synthetic data generation

New Contributors

@PierpaoloSorbellini made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/185
@theSekyi made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/187
@egrefen made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/190
@AAnirudh07 made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/202
@dentathor made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/204
@bzantium made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/212
@pgzhang made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/230
@sebastianschramm made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/237
@HuangLK made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/253
@zhzou2020 made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/271

- Python
Published by diegofiori over 3 years ago

Nebullvm 0.9.0 Release Notes

Major release for Nebullvm, adding support for diffusion model optimization.

New features

Add support for diffusers UNet

Bug fixed

Fix CI pipelines triggers

Speedster 0.3.0 Release Notes

Major release adding support for the diffusers library.

New features

Add support for the diffusers library. Speedster can now optimize diffusion models with a single line of code.
Update readme for HF models.

Bug fixed

Fix import error in Google Colab
Fix a few typos in the docs and update the benchmarks

New Contributors

@mfumanelli made their first contribution in https://github.com/nebuly-ai/nebullvm/pull/178

- Python
Published by diegofiori over 3 years ago

nebullvm 0.8.1 Release Notes

This is a minor release fixing multiple bugs.

New features

Changed the Auto-Installer API
Added support for the onnxruntime TensorrtExecutionProvider

Bug fixed

Fixed bug in torchscript casting integers to fp16.
Optimized the memory usage for inference learners
Now the tensorrt workspace size is dynamically computed according to the free memory available on the gpu device
Fixed a bug in openvino when using static quantization

speedster 0.2.1 Release Notes

This is a minor release fixing multiple bugs.

New Features

Added support for device selection when working in a multi-GPU environment
Added support for input data with inconsistent batch size
Implemented benchmark functions also for TensorFlow and ONNX

Bug Fixed

Optimized the gpu memory usage during speedster optimization

Contributors

@valeriosofi
@cccntu
@VinishUchiha

- Python
Published by diegofiori over 3 years ago

nebullvm 0.8.0 Release Notes

This is a major release fixing multiple bugs and implementing two new functions for loading and saving the models.

New Features

Implements two new functions for loading and saving inference learners.

Bug fixed

Fixed bug for ONNXRuntime models being loaded on the wrong device after the optimization.
Fixes TensorRT behaviour when using dynamic shape
Significantly improves the performance of TensorRT with the ONNX interface
Limits the gpu memory used by tensorflow, to avoid memory issues during tensorflow models optimization
Fixes some issues with unit tests, and adds additional controls to ensure that an optimized model with dynamic shape works properly.
Removes setuptools from the TensorRT installation: the newer version no longer needs it, and it was causing issues with the Azure pipelines.

speedster 0.2.0 Release Notes

Major release for Speedster adding the load_model and save_model functions.

New Features

Improved the logs
Save and load model functions can be imported directly from speedster
Brand new documentation moved directly in the GitHub repository and built using mkdocs

Bug Fixed

Fixed version compatibility issue on Colab
Adds the missing port forwarding to the docker run command in the notebooks readme.

Contributors

Valerio Sofi (@valeriosofi)
Diego Fiori (@diegofiori)
Leonardo Zecchin (@ZecchinLeonardo)
Arian Ghasemi (@arianGh1)

- Python
Published by diegofiori over 3 years ago

nebullvm 0.7.3 Release Notes

This is a minor release modifying the metric_drop_ths behaviour.

New Features

metric_drop_ths by default set to 0.001
Half precision is activated for each value of metric_drop_ths>0
Int8 quantization is activated just for values of metric_drop_ths>0.01
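The threshold rules above can be sketched as a small function. This is an illustration of the stated behaviour only, not nebullvm's internal code: the function name and the technique labels are hypothetical.

```python
from typing import List


def enabled_techniques(metric_drop_ths: float = 0.001) -> List[str]:
    """Return the precision-reduction techniques allowed for a given threshold."""
    techniques = []
    if metric_drop_ths > 0:
        # Any strictly positive threshold enables half precision.
        techniques.append("half_precision")
    if metric_drop_ths > 0.01:
        # Int8 quantization needs a threshold strictly above 0.01.
        techniques.append("int8_quantization")
    return techniques


for ths in (0.0, 0.001, 0.01, 0.05):
    print(ths, enabled_techniques(ths))
```

With the new default of 0.001, half precision is tried but int8 quantization is not.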

Speedster 0.1.2 Release Notes

This is a minor release editing the default value for metric_drop_ths and modifying the summarisation table at the end of the optimization.

New Features

The summary at the end of the optimization now clearly shows the improvements.
A warning with tips on how to improve the optimization result is now also shown for speedups lower than 2x.

- Python
Published by diegofiori over 3 years ago

nebullvm 0.7.2 Release Notes

This is a minor release editing the logs style.

New Features

Warning logs are now less intrusive.

Speedster 0.1.1 Release Notes

This is a minor release adding tips and improving the logs.

New Features

Suggestions are provided to the user on how to increase model performance when optimization does not provide the required speedup.
Improved the quality of the logs.

- Python
Published by diegofiori over 3 years ago

nebullvm 0.7.1 Release Notes

This is a minor release fixing a few bugs to support YOLOv8 models.

Bug fixed

Fixed bugs with half-precision in PyTorch to support YOLOv8 models.

- Python
Published by diegofiori over 3 years ago

nebullvm 0.7.0 Release Notes

This is a major release simplifying the backends installation and improving the UX.

New Features

The auto-installer interface has been simplified and clarified.

Bug fixed

Fixed problem with shell not found when installing TensorRT on specific Linux systems.

speedster 0.1.0 Release Notes

Major release for Speedster, now supporting TF backend for HuggingFace transformers.

New Features

Add support for HuggingFace models with a TensorFlow backend.
Improved the logs style.
Optimization results are now summarised into a nice and informative table.

Contributors

Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori over 3 years ago

nebullvm 0.6.0 Release Notes

This release of Nebullvm modifies the structure of the library. The nebullvm library becomes a framework that can be used to build apps for AI optimization. The end-to-end optimization tool moves into speedster, an app built using nebullvm as the underlying framework.

New Features

The API for model optimization has been moved to app/accelerate/speedster.
The optimize_model function in nebullvm has been deprecated. It will be removed in the next major release.
Added support for windows platforms.

speedster 0.0.1 Release Notes

Speedster is the new library replacing the previous nebullvm API optimizing DL models in a single line of code. Speedster keeps the same interface as the previous nebullvm API.

New Features

Model optimization API moved into speedster. The optimize_model function can now be imported with "from speedster import optimize_model".

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori over 3 years ago

optimate - v0.5.0

nebullvm 0.5.0 Release Notes

This release of Nebullvm simplifies the needed requirements and adds various improvements in code stability.

New Features

Running nebullvm no longer requires every framework to be installed.
Compilers are no longer installed by default the first time nebullvm is imported.
From the Auto-Installer the users can select which libraries and compilers they want to use.
Improve test coverage.

Bug fixed

Fixed multiple bugs while using TF interface

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori over 3 years ago

optimate - v0.4.4

nebullvm 0.4.4 Release Notes

This release of Nebullvm provides new optimizers and various improvements in code stability.

New Features

Update notebooks with new api.
Improve test coverage.
Add Intel Neural compressor pruning and quantization.
The computation of the latency of the models now uses all the data and not only the first sample.
Dynamic shape of openvino has been updated with the new method available from version 2
Now the optimized model is discarded if the result is different from the original model (metric_drop_ths=0)

Bug fixed

Fix an issue that slowed down ONNX quantization; it is now much faster than before.
Fix a TensorRT bug in static quantization with the ONNX interface.
Fixes and improvements on the torchscript compiler: it now also supports trace and torch.fx for tracing the model.
Fix a bug on macos related to ONNX and int8 quantization.
Fix a bug on sparseml that prevented it from working on colab.
Bug-fixes on the deepsparse compiler.
Fixes and improvements on the onnx internal model handling.
Fix an issue on tensorflow backend.
Fixes on torch and onnx tensorrt with transformers.
Fix a bug in TensorRT static quantization when using a new version of polygraphy
Fix a bug on huggingface when passing the tokenizer to the optimize_model function
Fix a bug when using quantization with only a few data samples

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori over 3 years ago

optimate - v0.4.3

nebullvm 0.4.3 Release Notes

Minor release that fixes some bugs introduced in v0.4.2.

Bug fixed

Fix bug preventing the installation without TensorFlow.
Fix a bug while using the HuggingFace Interface

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.4.2

nebullvm 0.4.2 Release Notes

Minor release that fixes some bugs and reduces the number of strict requirements needed to run Nebullvm.

New Features

Support ignore_compilers also for torchscript and tflite
Tensorflow is not a strict nebullvm requirement anymore.

Bug fixed

Solve bug on half-precision with onnx-runtime
Fix a bug in TensorRT quantization: numpy arrays were passed to the inference learner instead of tensors.

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.4.1

nebullvm 0.4.1 Release Notes

Minor release fixing some bugs and extending support for TensorRT directly with the PyTorch interface.

New Features

Support for TensorRT directly with PyTorch models.

Bug fixed

Bug in conversion to onnx that could lead to wrong inference results

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.4.0

nebullvm 0.4.0 Release Notes

"One API to rule them all". This major release of Nebullvm provides a brand new API unique to all Deep Learning frameworks.

New Features

New single, unified API for all the Deep Learning frameworks.
Support for SparseML pruning.
Beta feature: support for Intel Neural Compressor's pruning.
Add support for BladeDISC compiler.
Modify the latency calculation for each model by using the median instead of the mean across different model runs.
Implement an early stop mechanism for latency computation.
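To show why the median is preferred and what an early stop could look like, here is a minimal sketch. The release notes do not describe the actual stopping rule, so the criterion used here (stop once the running median changes by less than a relative tolerance) is an assumption, as are the function name, the parameters and the sample timings.

```python
import statistics
from typing import Iterable, Tuple


def robust_latency(timings: Iterable[float], min_runs: int = 5,
                   rel_tol: float = 0.05) -> Tuple[float, int]:
    """Return (median latency, number of runs used), stopping early when stable."""
    seen = []
    previous = None
    for t in timings:
        seen.append(t)
        if len(seen) >= min_runs:
            current = statistics.median(seen)
            # Assumed early-stop rule: the running median has stabilised.
            if previous is not None and abs(current - previous) <= rel_tol * previous:
                break
            previous = current
    return statistics.median(seen), len(seen)


# Hypothetical timings in seconds, with one slow warm-up style outlier.
timings = [0.010, 0.011, 0.250, 0.010, 0.011, 0.010, 0.011, 0.010]
median, runs = robust_latency(timings)
print(f"mean of all runs: {statistics.mean(timings):.4f} s")
print(f"median after {runs} runs: {median:.4f} s")
```

A single slow run pulls the mean far above the typical latency, while the median stays close to it, so fewer runs are needed for a stable estimate.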

Bug fixed

Fix bug with HuggingFace models causing a failure during optimizations.

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)
Reiase (@reiase)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.3.2

nebullvm 0.3.2 Release Notes

Minor release for maintenance purposes. It fixes bugs and generally improves the code stability.

New Features

In the Pytorch framework, whenever input data is provided for optimization, the model converter also uses it during the conversion of the model to onnx, instead of using the data only at the stage of applying the "precision reduction techniques."

Bug fixed

Fix bug with OpenVino 2.0 not working with 1-dimensional arrays.
Fix bug while using the TensorRT engine, which was returning CPU tensors even when the input tensors were on GPU.
Fix requirements conflicts on Intel CPUs due to an old numpy version required by OpenVino.

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)
SolomidHero (@SolomidHero)
Emile Courthoud (@emilecourthoud)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.3.1

nebullvm 0.3.1 Release Notes

We are pleased to announce that we have added the option to run nebullvm from a Docker container. We provide both a Docker image on Docker Hub and the Dockerfile code to produce the Docker container directly from the latest version of the source code.

New Features

Add Dockerfile and upload docker images on Docker Hub.
Implement new backend for the Tensorflow API running on top of TensorFlow and TFLite.
Implement new backend for the PyTorch API running on top of TorchScript.

Bug fixed

Fix bug with TensorRT in the Tensorflow API.
Fix bug with OpenVino 2.0 not using the quantization on intel devices.

Contributors

Diego Fiori (@morgoth95)
Valerio Sofi (@valeriosofi)
Emile Courthoud (@emilecourthoud)

- Python
Published by diegofiori almost 4 years ago

optimate - v0.3.0

nebullvm 0.3.0 Release Notes

We are super excited to announce the new major release nebullvm 0.3.0, where nebullvm's AI inference accelerator becomes more powerful, stable and covers more use cases.

nebullvm is an open-source library that generates an optimized version of your deep learning model that runs 2-10 times faster in inference without performance loss by leveraging multiple deep learning compilers (OpenVINO, TensorRT, etc.). With the new release 0.3.0, nebullvm can now accelerate inference up to 30x if you specify that you are willing to trade off a self-defined amount of accuracy/precision to get an even lower response time and a lighter model. This additional acceleration is achieved by exploiting optimization techniques that slightly modify the model graph to make it lighter, such as quantization, half precision, distillation, sparsity, etc.

Find tutorials and examples on how to use nebullvm, as well as installation instructions, in the main readme of the nebullvm library. Check below if you want to learn more about:

Overview of Nebullvm 0.3.0
Benchmarks
How the new Nebullvm 0.3.0 API Works
New Features & Bug Fixes

Overview of Nebullvm

With this new version, nebullvm continues in its mission to be:

☘️ Easy-to-use. It takes a few lines of code to install the library and optimize your models.

🔥 Framework agnostic. nebullvm supports the most widely used frameworks (PyTorch, TensorFlow, 🆕ONNX🆕 and Hugging Face, etc.) and provides as output an optimized version of your model with the same interface (PyTorch, TensorFlow, etc.).

💻 Deep learning model agnostic. nebullvm supports all the most popular deep learning architectures such as transformers, LSTM, CNN and FCN.

🤖 Hardware agnostic. The library now works on most CPU and GPU and will soon support TPU and other deep learning-specific ASIC.

🔑 Secure. Everything runs locally on your hardware.

✨ Leveraging the best optimization techniques. There are many inference techniques such as deep learning compilers, 🆕quantization or half precision🆕, and soon sparsity and distillation, which are all meant to optimize the way your AI models run on your hardware.

Benchmarks

We have tested nebullvm on popular AI models and hardware from leading vendors.

The table below shows the inference speedup provided by nebullvm. The speedup is calculated as the response time of the unoptimized model divided by the response time of the accelerated model, as an average over 100 experiments. As an example, if the response time of an unoptimized model was on average 600 milliseconds and after nebullvm optimization only 240 milliseconds, the resulting speedup is 2.5x, meaning 150% faster inference.
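The arithmetic in that example can be reproduced in a few lines. The helper names are hypothetical and exist only for this illustration.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Response time of the unoptimized model divided by that of the optimized one."""
    return baseline_ms / optimized_ms


def percent_faster(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup expressed as a percentage increase in inference speed."""
    return (speedup(baseline_ms, optimized_ms) - 1.0) * 100.0


# Figures from the example: 600 ms before optimization, 240 ms after.
print(f"speedup: {speedup(600, 240)}x")
print(f"faster by: {percent_faster(600, 240)}%")
```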

A complete overview of the experiment and findings can be found on this page.

|                         |    M1 Pro    |   Intel Xeon    |   AMD EPYC    |   Nvidia T4   |
|-------------------------|:------------:|:---------------:|:-------------:|:-------------:|
| EfficientNetB0          |    23.3x     |      3.5x       |     2.7x      |     1.3x      |
| EfficientNetB2          |    19.6x     |      2.8x       |     1.5x      |     2.7x      |
| EfficientNetB6          |    19.8x     |      2.4x       |     2.5x      |     1.7x      |
| Resnet18                |     1.2x     |      1.9x       |     1.7x      |     7.3x      |
| Resnet152               |     1.3x     |      2.1x       |     1.5x      |     2.5x      |
| SqueezeNet              |     1.9x     |      2.7x       |     2.0x      |     1.3x      |
| Convnext tiny           |     3.2x     |      1.3x       |     1.8x      |     5.0x      |
| Convnext large          |     3.2x     |      1.1x       |     1.6x      |     4.6x      |
| GPT2 - 10 tokens        |     2.8x     |      3.2x       |     2.8x      |     3.8x      |
| GPT2 - 1024 tokens      |      -       |      1.7x       |     1.9x      |     1.4x      |
| Bert - 8 tokens         |     6.4x     |      2.9x       |     4.8x      |     4.1x      |
| Bert - 512 tokens       |     1.8x     |      1.3x       |     1.6x      |     3.1x      |

Overall, the library provides great results, with more than 2x acceleration in most cases and around 20x in a few applications. We can also observe that acceleration varies greatly across different hardware-model couplings, so we suggest you test nebullvm on your model and hardware to assess its full potential. You can find the instructions below.

Besides, across all scenarios, nebullvm is very helpful for its ease of use, allowing you to take advantage of inference optimization techniques without having to spend hours studying, testing and debugging these technologies.

How the New Nebullvm API Works

With the latest release, nebullvm has a new API and can be deployed in two ways.

Option A: 2-10x acceleration, NO performance loss

If you choose this option, nebullvm will test multiple deep learning compilers (TensorRT, OpenVINO, ONNX Runtime, etc.) and identify the optimal way to compile your model on your hardware, increasing inference speed by 2-10 times without affecting the performance of your model.

Option B: 2-30x acceleration, supervised performance loss

Nebullvm is capable of speeding up inference by much more than 10 times in case you are willing to sacrifice a fraction of your model's performance. If you specify how much performance loss you are willing to sustain, nebullvm will push your model's response time to its limits by identifying the best possible blend of state-of-the-art inference optimization techniques, such as deep learning compilers, distillation, quantization, half precision, sparsity, etc.

Performance monitoring is accomplished using the perf_loss_ths (performance loss threshold), and the perf_metric for performance estimation.

When a predefined metric (e.g. "accuracy") or a custom metric is passed as the perf_metric argument, the value of perf_loss_ths will be used as the maximum acceptable loss for the given metric evaluated on your datasets (Option B.1).

When no perf_metric is provided as input, nebullvm calculates the performance loss using the default precision function. If the dataset is provided, the precision will be calculated on 100 sampled data points (Option B.2). Otherwise, the data will be randomly generated from the metadata provided as input, i.e. input_sizes and batch_size (Option B.3).
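The branching between these options can be summarised in a short sketch. This is not nebullvm code: the function name is hypothetical, and treating a missing perf_loss_ths as Option A is an assumption drawn from the description of the two options.

```python
from typing import Any, Optional


def evaluation_option(perf_loss_ths: Optional[float] = None,
                      perf_metric: Optional[Any] = None,
                      dataset: Optional[Any] = None) -> str:
    """Return which of the documented options applies to the given arguments."""
    if perf_loss_ths is None:
        return "A"    # assumed: no threshold means no accepted performance loss
    if perf_metric is not None:
        return "B.1"  # user metric (predefined or custom) evaluated on the dataset
    if dataset is not None:
        return "B.2"  # default precision function on 100 sampled data points
    return "B.3"      # default precision function on randomly generated data


print(evaluation_option())
print(evaluation_option(perf_loss_ths=0.02, perf_metric="accuracy", dataset=[1, 2, 3]))
print(evaluation_option(perf_loss_ths=0.02, dataset=[1, 2, 3]))
print(evaluation_option(perf_loss_ths=0.02))
```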

Check out the main GitHub readme if you want to take a look at nebullvm's performance and benchmarks, tutorials and notebooks on how to implement nebullvm with ease. And please leave a ⭐ if you enjoy the project and join the Discord community where we chat about nebullvm and AI optimization.

New Features and Bug Fixes

New features

Implemented quantization or half precision optimization techniques
Added support for models in the ONNX framework
Improved performance of Microsoft ONNX Runtime with transformers
Integrated nebullvm into Jina's amazing Clip-as-a-Service library for a performance boost (coming soon)
Accelerated library installation
Refactored the code to include support for datasets as an API
Released new benchmarks, notebooks and tutorials that can be found on the github readme

Bug fixing

Fixed bug related to Intel OpenVINO applied to dynamic shapes. Thanks @kartikeyporwal for the support!
Fixed bug with model storage.
Fixed bug causing issues with NVIDIA TensorRT output. Thanks @UnibsMatt for identifying the problem.

Contributors

@morgoth95 🥳
@emilecourthoud 🚀
@kartikeyporwal 🥇
@aurimgg 🚗

- Python
Published by diegofiori about 4 years ago

optimate - v0.2.2

nebullvm 0.2.2 Release Notes

nebullvm 0.2.2 is a minor release fixing some bugs.

New Features

Allow the user to select the maximum number of CPU-threads per model to use during optimization and inference.

Bug fixed

Fix bug in ONNXRuntime InferenceLearner

Contributors

Diego Fiori (@morgoth95)

- Python
Published by diegofiori about 4 years ago

optimate - v0.2.1

nebullvm 0.2.1 Release Notes

nebullvm 0.2.1 is a minor release fixing some bugs and supporting optimization directly on ONNX models.

New Features

ONNX interface for model optimization

Bug fixed

Fix bug in tensorRT

Contributors

Diego Fiori (@morgoth95)

- Python
Published by diegofiori about 4 years ago

optimate - v0.2.0

nebullvm 0.2.0 Release Notes

nebullvm 0.2.0 is a major release implementing important new features and fixing some bugs.

New Features

Support for dynamic shapes for both the PyTorch and TensorFlow interfaces
Support for Transformer models built using the HuggingFace framework
Add ONNXRuntime to the supported backends for optimized models
New README, updated with benchmarks on SOTA models for both NLP and Computer Vision

Bug fixed

Fix error in the tensorflow API preventing the usage of the optimize_tf_model function

Contributors

Diego Fiori (@morgoth95)
Emile Courthoud (@emilecourthoud)

- Python
Published by diegofiori about 4 years ago

optimate - v0.1.2

nebullvm 0.1.2 Release Notes

nebullvm 0.1.2 is a maintenance release fixing a few bugs and implementing new features.

New Features

Support for the TorchScript API when optimising with ApacheTVM compiler.

Bug fixed

The learners optimised with OpenVino now do not raise KeyErrors at prediction time anymore.
The learners optimised with ApacheTVM can be saved and loaded multiple times. Previously, trying to save a loaded model ended up in raising an error.
Fix bug in the auto-installer feature due to incompatibilities between Tensorflow 2.8 and OpenVino
Modify the behaviour of MultiCompilerOptimizer, avoiding errors due to the pickling of C-related files.

Contributors

Diego Fiori (@morgoth95)

- Python
Published by diegofiori over 4 years ago

optimate - v0.1.1

nebullvm 0.1.1 Release Notes

Official Alpha release of the nebullvm library, the all-in-one library for deep learning compilers.

Main features

The main release contains:

wheels for installing with pip
auto-installation feature for supported compilers
support for OpenVINO, TensorRT and ApacheTVM
support for models built in Tensorflow and PyTorch
optimised model API identical to that of the input model

Contributors

A total of 3 people contributed to this release.

Diego Fiori (@morgoth95)
Emile Courthoud (@emilecourthoud)
Francesco Signorato (@FrancescoSignorato)

- Python
Published by nebuly-ai over 4 years ago
