Recent Releases of llmcompressor
llmcompressor - v0.7.1
What's Changed
- [Examples] Create qwen25vlexample.py by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1752
- [fix] Fix visual layer ignore pattern for Qwen2.5-VL models by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1766
- [Transform] Fix QuIP targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1770
New Contributors
- @Zhao-Dongyu made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1752
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.7.0...0.7.1
Published by dbarbuzzi 6 months ago
llmcompressor - v0.7.0
LLM Compressor v0.7.0 release notes
This LLM Compressor v0.7.0 release introduces the following new features and enhancements:
- Transforms support, including QuIP and SpinQuant algorithms
- Apply multiple compressors to a single model for mixed-precision quantization
- Support for DeepSeekV3-style block FP8 quantization
- Expanded Mixture of Experts (MoE) calibration support, including support with NVFP4 quantization
- Llama4 quantization support with vLLM compatibility
- Configurable observer arguments
- Simplified and unified Recipe classes for easier usage and debugging
Introducing Transforms :sparkles:
LLM Compressor now supports transforms. Transforms inject additional matrix operations into a model to improve accuracy recovery after quantization: by rotating weights or activations into spaces with smaller dynamic ranges, they reduce quantization error.
Two algorithms are supported in this release:
- QuIP transforms inject transforms before and after weights to assist with weight-only quantization.
- SpinQuant transforms inject transforms whose inverses span multiple weights, assisting in both weight and activation quantization. In this release, fused R1 and R2 (i.e., offline) transforms are available. The full lifecycle has been validated to confirm that models produced by LLM Compressor match the performance reported in the original SpinQuant paper. Learned rotations and online R3 and R4 rotations will be added in a future release.
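A toy numeric sketch (not the library's implementation) of why rotation helps: applying an orthogonal Hadamard rotation to a weight row that contains a single outlier spreads its energy across all entries, shrinking the dynamic range a quantizer must cover while preserving the vector's norm.

```python
import math

def hadamard_rotate(x):
    """Apply a normalized Walsh-Hadamard transform (an orthogonal
    rotation) to a vector whose length is a power of two."""
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    x = list(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                # dividing by sqrt(2) at each stage keeps the rotation orthonormal
                x[j], x[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return x

# A weight row with one large outlier: hard to quantize, because the
# quantization scale must cover the outlier while most values sit near zero.
w = [0.01] * 7 + [8.0]
r = hadamard_rotate(w)

print(max(abs(v) for v in w))            # 8.0
print(round(max(abs(v) for v in r), 3))  # 2.853 -- much smaller dynamic range
```

The rotated vector has the same norm as the original (the transform is orthogonal), so it carries the same information, but its largest magnitude shrinks from 8.0 to roughly 2.85.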
The functionality for both algorithms is available through the new QuIPModifier and SpinQuantModifier classes.
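As a rough sketch, a recipe can pair a transform modifier with a quantization modifier. The field names below are illustrative; check the repository's transform examples for the exact, current arguments.

```yaml
# Illustrative recipe sketch (field names unverified):
transform_stage:
  transform_modifiers:
    QuIPModifier:
      transform_type: random-hadamard
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: [Linear]
      scheme: W4A16
      ignore: [lm_head]
```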
Applying multiple compressors to a single model
LLM Compressor now supports applying multiple compressors to a single model. This extends support for non-uniform quantization recipes, such as combining NVFP4 and FP8 quantization. This provides finer control over per-layer quantization, allowing more precise handling of layers that are especially sensitive to certain quantization types.
Models with more than one compressor applied have their format set to mixed-precision in the config.json file. Additionally, each config_group now includes a format key that specifies the format used for the layers targeted by that group.
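For illustration, the config.json of a model compressed with two schemes might look roughly like the following. The format names and target patterns are examples; actual values depend on the schemes applied.

```json
{
  "quantization_config": {
    "format": "mixed-precision",
    "config_groups": {
      "group_0": {
        "format": "nvfp4-pack-quantized",
        "targets": ["re:.*self_attn.*"]
      },
      "group_1": {
        "format": "float-quantized",
        "targets": ["re:.*mlp.*"]
      }
    }
  }
}
```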
Support for DeepSeekV3-style block FP8 quantization
You can now apply DeepSeekV3-style block FP8 quantization during model compression, a technique designed to further compress large language models for more efficient inference. The changes encompass the fundamental implementation of block-wise quantization, robust handling of quantization parameters, updated documentation, and a practical example to guide users in applying this new compression scheme.
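To make the idea concrete, here is a minimal pure-Python sketch (not the library code) of per-block scale computation: each tile of the weight matrix gets its own FP8 scale, rather than one scale per tensor or per channel. DeepSeekV3 uses 128x128 blocks; 2x2 is used here only to keep the example small.

```python
FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def block_scales(weight, block=2):
    """Compute one quantization scale per block x block tile of a 2-D matrix."""
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, rows, block):
        row_scales = []
        for c0 in range(0, cols, block):
            # scale each tile so its largest magnitude maps to FP8's max value
            tile_max = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            row_scales.append(tile_max / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales

w = [[0.5, -1.0, 3.0, 4.0],
     [2.0,  0.1, 0.2, 0.3],
     [0.0,  0.4, 8.0, 0.0],
     [0.6,  0.7, 0.1, 0.2]]
print(block_scales(w, block=2))  # one scale per 2x2 tile
```

Because each block is scaled independently, an outlier in one tile (the 8.0 above) does not inflate the scale, and therefore the rounding error, of the rest of the matrix.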
Mixture of Experts support
LLM Compressor now includes enhanced general Mixture of Experts (MoE) calibration support, including support for MoEs with NVFP4 quantization. Forward passes of MoE models can be controlled during calibration by adding custom modules to the replace_modules_for_calibration function, which permanently replaces the MoE modules, or to the moe_calibration_context function, which temporarily updates modules for the duration of calibration.
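The difference between the two pathways can be sketched with a toy stand-in model. The classes and helpers below are hypothetical illustrations of the pattern, not the actual llmcompressor modules.

```python
from contextlib import contextmanager

class MoEBlock:
    def forward(self, x):
        return "routed"        # normally routes tokens to top-k experts

class CalibMoEBlock:
    def forward(self, x):
        return "all-experts"   # calibration variant: every expert sees data

def replace_permanently(model):
    # replace_modules_for_calibration-style: the swap stays in the model
    model["moe"] = CalibMoEBlock()
    return model

@contextmanager
def calibration_context(model):
    # moe_calibration_context-style: swap only while calibration runs
    original = model["moe"]
    model["moe"] = CalibMoEBlock()
    try:
        yield model
    finally:
        model["moe"] = original  # restore the routed module afterwards

model = {"moe": MoEBlock()}
with calibration_context(model) as m:
    during = m["moe"].forward(None)   # "all-experts" inside the context
after = model["moe"].forward(None)    # "routed" again once the context exits
print(during, after)
```

The permanent replacement is needed when the swapped module must also be the one that is saved and served (as with Llama4 below); the context manager is preferable when the original routing behavior should be restored after calibration.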
Llama4 quantization
Llama4 quantization is now supported in LLM Compressor. To be quantized and runnable in vLLM, Llama4TextMoe modules are permanently replaced using the replace_modules_for_calibration method, which linearizes the modules. This allows the model to be quantized with schemes including WN16 with GPTQ and NVFP4.
Simplified and updated Recipe classes
Recipe classes have been updated with the following features:
- Merged multiple recipe-related classes into a single, unified `Recipe` class
- Simplified modifier creation, lifecycle management, and parsing logic
- Improved serialization and deserialization for clarity and maintainability
- Reduced redundant stage and argument handling for easier debugging and usage
Configurable Observer arguments
Observer arguments can now be configured as a dict through the observer_kwargs quantization argument, which can be set through oneshot recipes.
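For example, a recipe might pass observer arguments like this. The modifier layout mirrors existing oneshot recipes, but the specific observer_kwargs keys shown are illustrative and should be checked against the observer implementations.

```yaml
# Hypothetical recipe sketch; observer_kwargs keys are illustrative.
quant_stage:
  quant_modifiers:
    GPTQModifier:
      config_groups:
        group_0:
          targets: [Linear]
          weights:
            num_bits: 4
            type: int
            symmetric: true
            strategy: group
            group_size: 128
            observer: mse
            observer_kwargs:
              maxshrink: 0.2
              grid: 100
```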
Published by dhuangnm 6 months ago
llmcompressor - v0.6.0.1
What's Changed
- Cap transformers version for hotfix 0.6.0.1 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1671
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.6.0...0.6.0.1
Published by dhuangnm 7 months ago
llmcompressor - v0.6.0
What's Changed
- [Experimental] Mistral-format FP8 quantization by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1359
- [Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1528
- [Examples] [Bugfix] Fix debug message by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1529
- [Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1538
- [AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1536
- fix qwen 2.5 VL multimodal example by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1541
- [Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1531
- [Tests][NVFP4] Add e2e nvfp4 test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1543
- [Examples] Use more robust splits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1544
- [Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1532
- [Example] Fix Qwen VL ignore list by @arunmadhusud in https://github.com/vllm-project/llm-compressor/pull/1545
- [Tests] Fix `Qwen2.5-VL-7B-Instruct` Recipe by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1548
- [Bugfix] Fix gemma2 generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1552
- fix skipif check on tests involving gated HF models by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1553
- [NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1554
- oneshot entrypoint update by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1445
- LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1562
- [Performance] Sequential onloading by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1263
- [BugFix] Explicitly set `gpu_memory_utilization` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1560
- Add Axolotl blog link by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1563
- [Bugfix] Fix multigpu `dispatch_for_generation` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1567
- [Testing] Set `VLLM_WORKER_MULTIPROC_METHOD` for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1569
- [BugFix] Fix `quantization_2of4_sparse_w4a16` example by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1565
- [Pipelines] infer model device with optional override by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1572
- bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1581
New Contributors
- @arunmadhusud made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1545
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.2...0.6.0
Published by dhuangnm 8 months ago
llmcompressor - v0.5.2
What's Changed
- Exclude images from package by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1397
- [Tracing] Skip non-ancestors of sequential targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1389
- Consolidate build config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1398
- [Tests] Disable silently failing kv cache test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1371
- Drop `flash_attn` skip for quantizing_moe example tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1396
- [VLM] Fix mllama targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1402
- [Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1264
- Implement `QuantizationMixin` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1351
- Add new-features section by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1408
- [Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1373
- bugfix kv cache quantization with ignored layers by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1312
- AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1405
- [Tracing][Testing] Add tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1335
- fix lm eval test reproducibility issues by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1260
- Pipeline Extraction by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1279
- Add `pull_request` trigger to base tests workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1417
- removing RecipeMetadata and references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1414
- Update examples to only load required number of samples from dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1118
- [Tracing] Reinstate ignore functionality by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1423
- [Typo] overriden by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1420
- Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1416
- Remove RecipeArgs class & its references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1429
- [Examples] Standardize AWQ example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1412
- [Logging] Support logging once by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1431
- Add: deepseekv2 smoothquant mappings by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1433
- AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1426
- patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1439
- Added more tests for Quantization24SparseW4A16 by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1434
- [GPTQ] Add `actorder` option to modifier by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1424
- [Bugfix][Tracing] Fix qwen25vl by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1448
- [Tests] Use proper offloading utils in `test_compress_tensor_utils` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1449
- [Tracing] Fix Traceable Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1452
- [NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1309
- Pin transformers to <4.52.0 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1459
- AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1451
- Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1345
- [Tests] Fix activation recipe for w8a8 asym by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1461
- AWQ Qwen and Phi mappings by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1440
- [Observer] Optimize mse observer by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1450
- Fix: Improve `SmoothQuant` Support for Mixture of Experts (MoE) Models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1455
- [Tests] Add nvfp4a16 e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1463
- [Docs] Update README to list fp4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1462
- Remove duplicate model id var from awq example recipe by @AndrewMead10 in https://github.com/vllm-project/llm-compressor/pull/1467
- Added observer type for `test_min_max` by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1466
- Disable kernels during calibration (and tracing) by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1454
- [GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1453
- Set `show_progress` to True by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1471
- Remove `compress` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1470
- raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1476
- [Tests] Increase max seq length for tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1478
- [Tests] Fix dynamic field to be a bool, not string by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1480
- [Examples] Fix qwen vision examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1481
- [NVFP4] Update to use `tensor_group` strategy; update observers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1484
- loosen lmeval assertions to upper or lower bound by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1477
- Revert "expand observers to calculate gparams, add example for activa… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1486
- fix rest of the minmax tests by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1469
- Add warning for non-divisible group quantization by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1401
- [AWQ] Support accumulation for reduced memory usage by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1435
- [Tracing] Code AutoWrapper by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1411
- Removed RecipeTuple & RecipeContainer class by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1460
- Unpin to support `transformers==4.52.3` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1479
- [Tests] GPTQ Actorder Resolution Tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1468
- [Testing] Skip FP4 Test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1499
- [Bugfix] Remove tracing imports from tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1498
- [Testing] Use a slightly larger model that works with group_size 128 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1502
- skip tracing tests if token unavailable by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1493
- Fix missing logs when calling oneshot by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1446
- [NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1487
- [Tests] Remove duplicate test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1500
- [Model] Mistral3 example and test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1490
- [NVFP4] Use observers to generate global weight scales by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1504
- Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1507
- [NVFP4] Update global scale generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1508
- [NVFP4] Fix onloading of fused layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1512
- Pin pandas to <2.3 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1515
- AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1444
- Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1516
- Use model compression pathways by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1419
- [Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1517
- [Docs] Update ReadME details for FP4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1519
- [Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1530
- Add citation information both in README as well as native GitHub file support by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1527
- update compressed-tensors version requirement by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1534
New Contributors
- @kelkelcheng made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1373
- @AndrewMead10 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1467
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.1...0.5.2
Published by dhuangnm 8 months ago
llmcompressor - v0.5.1
What's Changed
- Update nm-actions/changed-files to v1.16.0 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1311
- docs: fix missing git clone command and repo name typos in DEVELOPING.md by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1325
- Update e2e/lm-eval test infrastructure by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1323
- fix(logger): normalize `log_file_level` input for consistency by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1324
- [Utils] Replace `preserve_attr` with `patch_attr` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1187
- Fix cut off log in entrypoints/utils.py `post_process()` by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1336
- [Tests] Update condition for sparsity check to be more robust by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1337
- [Utils] Add `skip_weights_download` for developers and testing by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1334
- replace custom version handling with setuptools-scm by @dhellmann in https://github.com/vllm-project/llm-compressor/pull/1322
- [Compression] Update sparsity calculation lifecycle when fetching the compressor by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1332
- [Sequential] Support models with nested `_no_split_modules` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1329
- [Tracing] Remove `TraceableWhisperForConditionalGeneration` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1310
- Add torch device to list of offloadable types by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1348
- Reduce SmoothQuant Repr by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1289
- Use `align_module_device` util by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1298
- Fix project URL in setup.py by @tiran in https://github.com/vllm-project/llm-compressor/pull/1353
- Update trigger on PR comment workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1357
- Add timing functionality to lm-eval tests by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1346
- [Callbacks][Docs] Add docstrings to saving functions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1201
- Move: recipe parsing test from `e2e/` to main test suite by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1360
- Smoothquant typehinting by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1285
- AWQ Modifier by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1177
- [Tests] Update transformers tests to run kv_cache tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1364
- [Transformers] Support latest transformers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1352
- Update `test_consecutive_runs.py` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1366
- [Docs] Mention AWQ, some clean-up by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1367
- Fix versioning for source installs by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1370
- [Testing] Reduce error verbosity of cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1365
- Update `test_oneshot_and_finetune.py` to use pytest.approx by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1339
- [Tracing] Better runtime error messages by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1307
- [Tests] Fix test case; update structure by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1375
- fix: Make `Recipe.model_dump()` output compatible with `model_validate()` by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1328
- Add: documentation for enhanced `save_pretrained` parameters by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1377
- Revert "fix: Make Recipe.model_dump() output compatible .... by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1378
- AWQ resolved mappings -- ensure shapes align by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1372
- Update `w4a16_actorder_weight.yaml` lmeval config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1380
- [WIP] Add AWQ Asym e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1374
- Bump version; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1381
- bugfix AWQ with Llama models and python 3.9 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1384
- awq -- hotfix to missing kwargs by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1395
New Contributors
- @gattshjott made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1325
- @dhellmann made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1322
- @tiran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1353
- @ved1beta made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1346
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.0...0.5.1
Published by dbarbuzzi 10 months ago
llmcompressor - v0.5.0
What's Changed
- re-add vllm e2e test now that bug is fixed by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1162
- Fix Readme Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1165
- Remove event_called by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1155
- Update: Test name by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1172
- Remove lifecycle initialized_structure attribute by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1156
- [VLM] Qwen 2.5 VL by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1113
- Revert bump by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1178
- Remove CLI by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1144
- Add group act order case to lm_eval test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1080
- Update e2e test timings ouputs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1179
- [Oneshot Refactor] Main refactor by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1110
- [StageRunner Removal] Remove Evalulate / validate pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1145
- [StageRemoval] Remove Predict pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1146
- Fix 2of4 Apply Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1181
- Fix Sparse2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1182
- Add qwen moe w4a16 example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1186
- [Callbacks] Consolidate Saving Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1168
- lmeval tests multimodal by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1150
- [Dataset Performance] Add num workers on dataset processing - labels, tokenization by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1189
- Fix a minor typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1191
- [Callbacks] Remove `pre_initialize_structure` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1160
- Make `transformers-tests` job conditional on files changed by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1197
- Update finetune tests to decrease execution time by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1208
- Update transformers tests to speed-up execution by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1211
- Fix logging bug in oneshot.py by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1213
- [Training] Decouple Argument parser by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1207
- Remove MonkeyPatch for GPUs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1227
- [Cosmetic] Rename `data_args` to `dataset_args` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1206
- [Training] Datasets - update Module by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1209
- [BugFix] Fix logging disabling bug and add tests by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1218
- [Training] Unifying Preprocess + Postprocessing logic for Train/Oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1212
- [Docs] Add info on when to use which PTQ/Sparsification by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1157
- [Callbacks] Remove `MagnitudePruningModifier.leave_enabled` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1198
- Replace Xenova model stub with nm-testing model stub by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1239
- Offload Cache Support torch.dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1141
- Remove unused/duplicated/non-applicable utils from pytorch/utils/helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1174
- [Bugfix] Staged 2of4 example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1238
- wandb/tensorboard loggers set default init to False by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1235
- fixing reproducibility of lmeval tests by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1220
- [Audio] People's Speech dataset and tracer tool by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1086
- Use KV cache constant names provided by compressed tensors by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1200
- [Bugfix] Raise error for processor remote code by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1184
- Remove missing weights silencers in favor of HFQuantizer solution by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1017
- Fix run_compressed tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1246
- [Train] Training Pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1214
- [Tests] Increase maximum quantization error by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1245
- [Callbacks] Remove EventLifecycle and on_start event by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1170
- [Bugfix] Disable generation of deepseek models with transformers>=4.48 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1259
- Remove clear_ml by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1261
- [Tests] Remove clear_ml test from GHA by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1265
- Remove click by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1262
- [Bugfix] Remove constant pruning from 2of4 examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1267
- Addback: ConstantPruningModifier for finetuning cases by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1272
- Remove docker by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1255
- move failing mulitmodal lmeval tests to skipped folder by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1273
- Replace tj-action/changed-files by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1270
- [BugFix]: Sparse2of4 example sparsity-only case by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1282
- Revert "update" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1296
- Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1287
- Revert "Fix Multi-Context Manager Syntax for Python 3.9 Compatibility… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1300
- [StageRunner] Stage Runner entrypoint and pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1202
- Bump: Min python version to 3.9 by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1288
- Keep quantization enabled during calibration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1299
- [BugFix] TRL distillation bug fix by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1278
- Update: Readme for fp8 support by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1304
- [GPTQ] Add inversion fallback by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1283
- fix typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1290
- [Tests] Fix oneshot + finetune test by passing splits to oneshot by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1316
- [Tests] Remove the `compress` entrypoint by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1317
- Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1313
- [BugFix] Directly Convert Modifiers to Recipe Instance by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1271
- bump version, tag ct by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1318
New Contributors
- @aman2304 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1213
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.1...0.5.0
Published by dhuangnm 11 months ago
llmcompressor - v0.4.1
What's Changed
- Remove version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1077
- Require 'ready' label for transformers tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1079
- GPTQModifier Nits and Code Clarity by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1068
- Also run on pushes to `main` by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1083
- VLM: Phi3 Vision Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1032
- VLM: Qwen2_VL Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1027
- Composability with sparse and quantization compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/948
- Remove `TraceableMistralForCausalLM` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1052
- [Fix Test Failure]: Propagate name change to test by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1088
- [Audio] Support Audio Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1085
- [Test Fix] Add Quantization then finetune tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/964
- [Smoothquant] Phi3 Vision Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1089
- [VLM] Multimodal Data Collator by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1087
- VLM: Model Tracing Guide by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1030
- Turn off 2:4 sparse compression until supported in vllm by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1092
- [Test Fix] Fix Consecutive oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/971
- [Bug Fix] Fix test that requires GPU by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1096
- Add Idefics3/SmolVLM quant support via traceable class by @leon-seidel in https://github.com/vllm-project/llm-compressor/pull/1095
- Traceability Guide: Clarity and typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1099
- [VLM] Examples README by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1057
- Raise warning for 2:4 compressed sparse-only models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1107
- Remove `log_model_load` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1016
- Return empty sparsity config if targets and ignores are empty by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1115
- Remove uses of get_observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/939
- FSDP utils cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/854
- Update maintainers, add notice by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1091
- Replace readme paths with urls by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1097
- GPTQ add arXiv link, move file location by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1100
- Extend `remove_hooks` to remove subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1021
- [Audio] Whisper Example and Readme by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1106
- [Audio] Add whisper fp8 dynamic example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1111
- [VLM] Update pixtral data collator to reflect latest transformers changes by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1116
- Use unique test names in `TestvLLM` by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1124
- Remove smoothquant from examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1121
- Extend `disable_hooks` to keep subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1023
- Unpin `pynvml` to fix e2e test failures with vLLM by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1125
- Replace LayerCompressor with HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1038
- [Oneshot Refactor] Rename `get_shared_processor_src` to `get_processor_name_from_model` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1108
- Allow Shortcutting Min-max Observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/887
- [Polish] Remove unused code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1128
- Properly restore training mode with `eval_context` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1126
- SQ and QM: Remove `torch.cuda.empty_cache`, use `calibration_forward_context` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1114
- [Oneshot Refactor] dataclass Arguments by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1103
- [Bugfix] SparseGPT, Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1130
- [Oneshot refactor] Refactor `initialize_model_from_path` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1109
- [e2e] Update vllm tests with additional datasets by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1131
- Update: SparseGPT recipes by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1142
- Add timer support for testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1137
- [Audio] Support Whisper V3 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1147
- Fix: Re-enable Sparse Compression for 2of4 Examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1153
- [VLM] Add caption to flickr dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1138
- [VLM] Update mllama traceable definition by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1140
- Fix CPU Offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1159
- [TRLSFTTrainer] Fix and Update Examples code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1161
- [TRLSFTTrainer] Fix TRL-SFT Distillation Training by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1163
- Bump version for patch release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1166
- Update DeepSeek Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1175
- Update gemma2 examples with a note about sample generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1176
New Contributors
- @leon-seidel made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1095
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.0...0.4.1
Published by dhuangnm about 1 year ago
llmcompressor - v0.4.0
What's Changed
- Record config file name as test suite property by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/947
- Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/975
- Deprecate OBCQ Helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/977
- KV Cache, E2E Tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/742
- Use 1 GPU for offloading examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/979
- Replace tokenizer with processor by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/955
- Revert "KV Cache, E2E Tests (#742)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/989
- Fix SmoothQuant offload bug by @dsikka in https://github.com/vllm-project/llm-compressor/pull/978
- Add LM Eval Configs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/980
- Fix `test_model_reload` test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1005
- Calibration and Compression Contexts by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/998
- Add info for clarity by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1009
- [Bugfix] Pass `trust_remote_code_model=True` for deepseek examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1012
- Vision Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/943
- Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in https://github.com/vllm-project/llm-compressor/pull/991
- Update ReadMe and test for cpu_offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1013
- Adding amdsmi for AMD gpus by @citrix123 in https://github.com/vllm-project/llm-compressor/pull/1018
- CompressionLogger add time units by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1026
- `patch_tied_tensors_bug`: support malformed model definitions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1014
- Add: 2of4 example with/without fp8 quantization by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1033
- Remove unnecessary step in 2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1034
- Remove Neural Magic copyright from files by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/992
- VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/914
- [E2E Testing] KV-Cache by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1004
- [E2E Testing] Add recipe check vllm e2e by @horheynm in https://github.com/vllm-project/llm-compressor/pull/929
- [MoE] GPTQ compress using callback not hook by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1049
- Explicit dataset tokenizer `text` kwarg by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1031
- Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1015
- [Test Fix] Quant model reload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/974
- Remove old examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1062
- VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1065
- Add tests for "examples/sparse2of4[...]" by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1067
- VLM Image Examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1064
- Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1066
- [KV Cache] kv-cache end to end unit tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/141
- [E2E Testing] Fix HF upload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1061
- [Test Fix] Fix/update `test_run_compressed` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/970
- Revert "[Test Fix] Fix/update `test_run_compressed`" by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1071
- Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1073
- [Test Patch] Remove redundant code for "Fix/update `test_run_compressed`" by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1072
- bump; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1076
New Contributors
- @citrix123 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1018
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.1...0.4.0
Published by dhuangnm about 1 year ago
llmcompressor - v0.3.1
What's Changed
- BLOOM Default Smoothquant Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/906
- [SparseAutoModelForCausalLM Deprecation] Feature change by @horheynm in https://github.com/vllm-project/llm-compressor/pull/881
- Correct "dyanmic" typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/888
- Explicit defaults for QuantizationModifier targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/889
- [SparseAutoModelForCausalLM Deprecation] Update examples by @horheynm in https://github.com/vllm-project/llm-compressor/pull/880
- Support pack_quantized format for nonuniform mixed-precision by @mgoin in https://github.com/vllm-project/llm-compressor/pull/913
- Actually make the `run_compressed` test useful by @dsikka in https://github.com/vllm-project/llm-compressor/pull/920
- Fix for e2e tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/927
- [Bugfix] Correct metrics calculations by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/878
- Update kv_cache example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/921
- [1/2] Expand e2e testing to prepare for lm-eval by @dsikka in https://github.com/vllm-project/llm-compressor/pull/922
- Update pytest command to capture results to file by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/932
- [Bugfix] DisableKVCache Context by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/834
- Add helpful info to the marlin-24 example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/946
- Remove requires_torch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/949
- Remove unused sparseml.export utilities by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/950
- Implement HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/917
- Add LM Eval Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/945
- update version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/969
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.0...0.3.1
Published by dhuangnm about 1 year ago
llmcompressor - v0.3.0
What's New in v0.3.0
Key Features and Improvements
- GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
- Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers `mappings` based on model architecture, making SmoothQuant easier to apply across various models.
- Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
- Generic Wrapper for Any Hugging Face Model (#185): Added the `wrap_hf_model_class` utility, enabling better support and integration for Hugging Face models that are not based on `AutoModelForCausalLM`.
- Observer Restructure (#837): Introduced calibration and frozen steps within `QuantizationModifier`, moving Observers from compressed-tensors to llm-compressor.
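The auto-inferred `mappings` feature builds on SmoothQuant's core trick: migrating quantization difficulty from activations to weights with a per-channel scale, leaving the layer's output unchanged. A minimal NumPy sketch of that scale migration (illustrative only, not the llm-compressor implementation; the `smooth` helper and `alpha` balance factor are hypothetical names):

```python
import numpy as np

def smooth(X: np.ndarray, W: np.ndarray, alpha: float = 0.5):
    """Migrate activation outliers into the weights.

    Per input channel j, s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha);
    dividing X and multiplying W by s leaves X @ W mathematically unchanged
    while shrinking the activation dynamic range.
    """
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    wgt_max = np.abs(W).max(axis=1)   # per-input-channel weight range
    s = act_max**alpha / wgt_max**(1 - alpha)
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)) * np.array([1.0, 50.0, 1.0, 1.0])  # channel 1 is an outlier
W = rng.normal(size=(4, 3))
X_s, W_s = smooth(X, W)
assert np.allclose(X_s @ W_s, X @ W)        # layer output is preserved
assert np.abs(X_s).max() < np.abs(X).max()  # activation range is reduced
```

The balance factor `alpha` controls how much difficulty is shifted: 0 leaves activations untouched, 1 pushes the entire activation range into the weights.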
Bug Fixes
- Fix Tied Tensors Bug (#659)
- Observer Initialization in GPTQ Wrapper (#883)
- Sparsity Reload Testing (#882)
Documentation
- Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
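When the auto-inferred mappings do not fit a model, they can be supplied explicitly in a recipe, pairing the layers to smooth with the preceding norm whose scales absorb the migrated ranges. A hedged sketch of one plausible recipe fragment (the stage name, regex targets, and `smoothing_strength` value are illustrative assumptions, not taken from the tutorial):

```yaml
# Illustrative recipe fragment; adapt the regexes to your architecture.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
        - - ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]
          - "re:.*input_layernorm"
        - - ["re:.*gate_proj", "re:.*up_proj"]
          - "re:.*post_attention_layernorm"
```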
What's Changed
- Fix compresed typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/188
- GPTQ Quantized-weight Sequential Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/177
- Add: targets and ignore inference for sparse compression by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/191
- switch tests from weekly to nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/658
- Compression wrapper abstract methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/170
- Explicitly set sequential_update in examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/187
- Increase Sparsity Threshold for compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/679
- Add a generic `wrap_hf_model_class` utility to support VLMs by @mgoin in https://github.com/vllm-project/llm-compressor/pull/185
- Add tests for examples by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/149
- Rename to quantization config by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/730
- Implement Missing Modifier Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/166
- Fix 2/4 GPTQ Model Tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/769
- SmoothQuant mappings tutorial by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/115
- Fix import of `ModelCompressor` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/776
- update test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/773
- [Bugfix] Fix saving offloaded state dict by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/172
- Auto-Infer `mappings` Argument for `SmoothQuantModifier` Based on Model Architecture by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/119
- Update workflows/actions by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/774
- [Bugfix] Prepare KD Models when Saving by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/174
- Set Sparse compression to save_compressed by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/821
- Install compressed-tensors after llm-compressor by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/825
- Fix test typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/828
- Add `AutoModelForCausalLM` example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/698
- [Bugfix] Workaround tied tensors bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/659
- Only untie word embeddings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/839
- Check for config hidden size by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/840
- Use float32 for Hessian dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/847
- GPTQ: Deprecate non-sequential update option by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/762
- Typehint nits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/826
- [ DOC ] Remove version restrictions in W8A8 example by @miaojinc in https://github.com/vllm-project/llm-compressor/pull/849
- Fix inconsistence in example config of 2:4 sparse quantization by @yzlnew in https://github.com/vllm-project/llm-compressor/pull/80
- Fix forward function pass call by @dsikka in https://github.com/vllm-project/llm-compressor/pull/845
- [Bugfix] Use weight parameter of linear layer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/836
- [Bugfix] Rename files to remove colons by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/846
- cover all 3.9-3.12 in commit testing by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/864
- Add marlin-24 recipe/configs for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/866
- [Bugfix] onload during sparsity calculation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/862
- Fix HFTrainer overloads by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/869
- Support Model Offloading Tied Tensors Patch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/872
- Add advice about dealing with non-invertable hessians by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/875
- seed commit workflow by @andy-neuma in https://github.com/vllm-project/llm-compressor/pull/877
- [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/837
- Bugfix observer initialization in `gptq_wrapper` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/883
- BugFix: Fix Sparsity Reload Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/882
- Use custom unique test names for e2e tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/892
- Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/893
- Move `config["test_config_path"]` assignment by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/895
- Cap accelerate version to avoid bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/897
- Fix observing offloaded weight by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/896
- Update image in README.md by @mgoin in https://github.com/vllm-project/llm-compressor/pull/861
- update accelerate version by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/899
- [GPTQ] Iterative Parameter Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/863
- Small fixes for release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/901
- use smaller portion of dataset by @dsikka in https://github.com/vllm-project/llm-compressor/pull/902
- Update example to not fail hessian inversion by @dsikka in https://github.com/vllm-project/llm-compressor/pull/904
- Bump version to 0.3.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/907
New Contributors
- @miaojinc made their first contribution in https://github.com/vllm-project/llm-compressor/pull/849
- @yzlnew made their first contribution in https://github.com/vllm-project/llm-compressor/pull/80
- @andy-neuma made their first contribution in https://github.com/vllm-project/llm-compressor/pull/877
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.2.0...0.3.0
Published by dhuangnm over 1 year ago
llmcompressor - v0.2.0
What's Changed
- Correct Typo in SparseAutoModelForCausalLM docstring by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/56
- Disable Default Bitmask Compression by @Satrat in https://github.com/vllm-project/llm-compressor/pull/60
- TRL Example fix by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/59
- Fix typo by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/63
- Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/61
- correct import in README.md by @zzc0430 in https://github.com/vllm-project/llm-compressor/pull/66
- Fix for issue #43 -- starcoder model by @horheynm in https://github.com/vllm-project/llm-compressor/pull/71
- Update README.md by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/74
- Layer by Layer Sequential GPTQ Updates by @Satrat in https://github.com/vllm-project/llm-compressor/pull/47
- [ Docs ] Update main readme by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/77
- [ Docs ] `gemma2` examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/78
- [ Docs ] Update `FP8` example to use dynamic per token by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/75
- [ Docs ] Overhaul `accelerate` user guide by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/76
- Support `kv_cache_scheme` for quantizing KV Cache by @mgoin in https://github.com/vllm-project/llm-compressor/pull/88
- Propagate `trust_remote_code` Argument by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/90
- Fix for issue #81 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/84
- Fix for issue 83 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/85
- [ DOC ] Big Model Example by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/99
- Enable obcq/finetune integration tests with `commit` cadence by @dsikka in https://github.com/vllm-project/llm-compressor/pull/101
- metric logging on GPTQ path by @horheynm in https://github.com/vllm-project/llm-compressor/pull/65
- Update test config files by @dsikka in https://github.com/vllm-project/llm-compressor/pull/97
- remove workflows + update runners by @dsikka in https://github.com/vllm-project/llm-compressor/pull/103
- metrics by @horheynm in https://github.com/vllm-project/llm-compressor/pull/104
- add debug by @horheynm in https://github.com/vllm-project/llm-compressor/pull/108
- Add FP8 KV Cache quant example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/113
- Add vLLM e2e tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/117
- Fix style, fix noqa by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/123
- GPTQ Algorithm Cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/120
- GPTQ Activation Ordering by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/94
- demote recipe string initialization to debug and make more descriptive by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/116
- compressed-tensors main dependency for base-tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/125
- Set `ready` label for transformer tests; add message reminder on PR opened by @dsikka in https://github.com/vllm-project/llm-compressor/pull/126
- Fix markdown check test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/127
- Naive Run Compressed Pt. 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/62
- Fix transformer test conditions by @dsikka in https://github.com/vllm-project/llm-compressor/pull/131
- Run Compressed Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/132
- Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/124
- Activation Ordering Strategies by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/121
- Fix README Issue by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/139
- update by @dsikka in https://github.com/vllm-project/llm-compressor/pull/143
- Update finetune and oneshot tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/114
- Validate Recipe Parsing Output by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/100
- fix build error for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/145
- Fix recipe nested in configs by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/140
- MOE example with warning by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/87
- Bug Fix: recipe stages were not being concatenated by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/150
- fix package name bug for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/155
- Add descriptions for pytest marks by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/156
- Fix Sparsity Unit Test by @Satrat in https://github.com/vllm-project/llm-compressor/pull/153
- Fix: Error during model saving with shared tensors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/158
- Update 2:4 Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/161
- DeepSeek: Fix Hessian Estimation by @Satrat in https://github.com/vllm-project/llm-compressor/pull/157
- bump up main to 0.2.0 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/163
- Fix help dialogue by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/151
- Add MoE and Compressed Inference Examples by @Satrat in https://github.com/vllm-project/llm-compressor/pull/160
- Separate `trust_remote_code` args by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/152
- Enable a skipped finetune test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/169
- Fix filename in example command by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/173
- Add DeepSeek V2.5 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/171
- fix quality by @dsikka in https://github.com/vllm-project/llm-compressor/pull/176
- Patch log function name in gptq by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/168
- README for Modifiers by @Satrat in https://github.com/vllm-project/llm-compressor/pull/165
- Fix default for sequential updates by @dsikka in https://github.com/vllm-project/llm-compressor/pull/186
- fix default test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/193
- Fix Initalize typo by @Imss27 in https://github.com/vllm-project/llm-compressor/pull/190
- Update MoE examples by @mgoin in https://github.com/vllm-project/llm-compressor/pull/192
New Contributors
- @zzc0430 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/66
- @horheynm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/71
- @dsikka made their first contribution in https://github.com/vllm-project/llm-compressor/pull/101
- @dhuangnm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/145
- @Imss27 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/190
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.1.0...0.2.0
Published by dhuangnm over 1 year ago
llmcompressor - v0.1.0
What's Changed
- Address Test Failures by @Satrat in https://github.com/vllm-project/llm-compressor/pull/1
- Remove SparseZoo Usage by @Satrat in https://github.com/vllm-project/llm-compressor/pull/2
- SparseML Cleanup by @markurtz in https://github.com/vllm-project/llm-compressor/pull/6
- Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/7
- Add FP8 Support by @Satrat in https://github.com/vllm-project/llm-compressor/pull/4
- Fix Weekly Test Failure by @Satrat in https://github.com/vllm-project/llm-compressor/pull/8
- Add Scheme UX for QuantizationModifier by @Satrat in https://github.com/vllm-project/llm-compressor/pull/9
- Add Group Quantization Test Case by @Satrat in https://github.com/vllm-project/llm-compressor/pull/10
- Loguru logging standardization for LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/11
- Clarify Function Names for Logging by @Satrat in https://github.com/vllm-project/llm-compressor/pull/12
- [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/5
- Update setup.py by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/15
- SmoothQuant Mapping Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/13
- Initial README by @bfineran in https://github.com/vllm-project/llm-compressor/pull/3
- [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/19
- [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in https://github.com/vllm-project/llm-compressor/pull/20
- Port SparseML Remote Code Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/21
- Update Quantization Save Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/22
- [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/26
- GPTQ - move calibration of quantization params to after hessian calibration by @bfineran in https://github.com/vllm-project/llm-compressor/pull/25
- Fix typos by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/31
- Remove ceiling from `datasets` dep by @mgoin in https://github.com/vllm-project/llm-compressor/pull/27
- Revert naive compression format by @Satrat in https://github.com/vllm-project/llm-compressor/pull/32
- Fix layerwise targets by @Satrat in https://github.com/vllm-project/llm-compressor/pull/36
- Move Weight Update Out Of Loop by @Satrat in https://github.com/vllm-project/llm-compressor/pull/40
- Fix End Epoch Default by @Satrat in https://github.com/vllm-project/llm-compressor/pull/39
- Fix typos in example for w8a8 quant by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/38
- Model Offloading Support Pt 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/34
- set version to 1.0.0 for release by @bfineran in https://github.com/vllm-project/llm-compressor/pull/44
- Update version for first release by @markurtz in https://github.com/vllm-project/llm-compressor/pull/50
- BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/51
- Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/52
- Fix Failing Transformers Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/53
- Offloading Bug Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/58
New Contributors
- @markurtz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/6
- @bfineran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/3
- @dbogunowicz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/20
- @eldarkurtic made their first contribution in https://github.com/vllm-project/llm-compressor/pull/31
- @mgoin made their first contribution in https://github.com/vllm-project/llm-compressor/pull/27
- @dbarbuzzi made their first contribution in https://github.com/vllm-project/llm-compressor/pull/52
Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0
- Python
Published by dhuangnm over 1 year ago