Recent Releases of llmcompressor
llmcompressor - v0.7.1
What's Changed
- [Examples] Create qwen25vlexample.py by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1752
- [fix] Fix visual layer ignore pattern for Qwen2.5-VL models by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1766
- [Transform] Fix QuIP targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1770
New Contributors
- @Zhao-Dongyu made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1752
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.7.0...0.7.1
Published by dbarbuzzi 6 months ago
llmcompressor - v0.7.0
LLM Compressor v0.7.0 release notes
This LLM Compressor v0.7.0 release introduces the following new features and enhancements:
- Transforms support, including QuIP and SpinQuant algorithms
- Apply multiple compressors to a single model for mixed-precision quantization
- Support for DeepSeekV3-style block FP8 quantization
- Expanded Mixture of Experts (MoE) calibration support, including support with NVFP4 quantization
- Llama4 quantization support with vLLM compatibility
- Configurable observer arguments
- Simplified and unified Recipe classes for easier usage and debugging
Introducing Transforms :sparkles:
LLM Compressor now supports transforms. Transforms inject additional matrix operations into a model to improve accuracy recovery after quantization: by rotating weights or activations into spaces with smaller dynamic ranges, they reduce quantization error.
Two algorithms are supported in this release:
- QuIP transforms inject transforms before and after weights to assist with weight-only quantization.
- SpinQuant transforms inject transforms whose inverses span multiple weights, assisting in both weight and activation quantization. In this release, fused R1 and R2 (i.e., offline) transforms are available. The full lifecycle has been validated to confirm that models produced by LLM Compressor match the performance reported in the original SpinQuant paper. Learned rotations and online R3 and R4 rotations will be added in a future release.
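A toy numeric sketch (not the library's implementation) of why rotation helps: applying an orthogonal Hadamard rotation to a weight row that contains a single outlier spreads its energy across all entries, shrinking the dynamic range a quantizer must cover while preserving the vector's norm.

```python
import math

def hadamard_rotate(x):
    """Apply a normalized Walsh-Hadamard transform (an orthogonal
    rotation) to a vector whose length is a power of two."""
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    x = list(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                # dividing by sqrt(2) at each stage keeps the rotation orthonormal
                x[j], x[j + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return x

# A weight row with one large outlier: hard to quantize, because the
# quantization scale must cover the outlier while most values sit near zero.
w = [0.01] * 7 + [8.0]
r = hadamard_rotate(w)

print(max(abs(v) for v in w))            # 8.0
print(round(max(abs(v) for v in r), 3))  # 2.853 -- much smaller dynamic range
```

The rotated vector has the same norm as the original (the transform is orthogonal), so it carries the same information, but its largest magnitude shrinks from 8.0 to roughly 2.85.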
The functionality for both algorithms is available through the new QuIPModifier and SpinQuantModifier classes.
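As a rough sketch, a recipe can pair a transform modifier with a quantization modifier. The field names below are illustrative; check the repository's transform examples for the exact, current arguments.

```yaml
# Illustrative recipe sketch (field names unverified):
transform_stage:
  transform_modifiers:
    QuIPModifier:
      transform_type: random-hadamard
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: [Linear]
      scheme: W4A16
      ignore: [lm_head]
```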
Applying multiple compressors to a single model
LLM Compressor now supports applying multiple compressors to a single model. This extends support for non-uniform quantization recipes, such as combining NVFP4 and FP8 quantization. This provides finer control over per-layer quantization, allowing more precise handling of layers that are especially sensitive to certain quantization types.
Models with more than one compressor applied have their format set to mixed-precision in the config.json file. Additionally, each config_group now includes a format key that specifies the format used for the layers targeted by that group.
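For illustration, the config.json of a model compressed with two schemes might look roughly like the following. The format names and target patterns are examples; actual values depend on the schemes applied.

```json
{
  "quantization_config": {
    "format": "mixed-precision",
    "config_groups": {
      "group_0": {
        "format": "nvfp4-pack-quantized",
        "targets": ["re:.*self_attn.*"]
      },
      "group_1": {
        "format": "float-quantized",
        "targets": ["re:.*mlp.*"]
      }
    }
  }
}
```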
Support for DeepSeekV3-style block FP8 quantization
You can now apply DeepSeekV3-style block FP8 quantization during model compression, a technique designed to further compress large language models for more efficient inference. The changes encompass the fundamental implementation of block-wise quantization, robust handling of quantization parameters, updated documentation, and a practical example to guide users in applying this new compression scheme.
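To make the idea concrete, here is a minimal pure-Python sketch (not the library code) of per-block scale computation: each tile of the weight matrix gets its own FP8 scale, rather than one scale per tensor or per channel. DeepSeekV3 uses 128x128 blocks; 2x2 is used here only to keep the example small.

```python
FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def block_scales(weight, block=2):
    """Compute one quantization scale per block x block tile of a 2-D matrix."""
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, rows, block):
        row_scales = []
        for c0 in range(0, cols, block):
            # scale each tile so its largest magnitude maps to FP8's max value
            tile_max = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            row_scales.append(tile_max / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales

w = [[0.5, -1.0, 3.0, 4.0],
     [2.0,  0.1, 0.2, 0.3],
     [0.0,  0.4, 8.0, 0.0],
     [0.6,  0.7, 0.1, 0.2]]
print(block_scales(w, block=2))  # one scale per 2x2 tile
```

Because each block is scaled independently, an outlier in one tile (the 8.0 above) does not inflate the scale, and therefore the rounding error, of the rest of the matrix.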
Mixture of Experts support
LLM Compressor now includes enhanced general Mixture of Experts (MoE) calibration support, including support for MoEs with NVFP4 quantization. Forward passes of MoE models can be controlled during calibration by adding custom modules to the replace_modules_for_calibration function, which permanently replaces the MoE modules, or to the moe_calibration_context function, which temporarily updates modules for the duration of calibration.
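The difference between the two pathways can be sketched with a toy stand-in model. The classes and helpers below are hypothetical illustrations of the pattern, not the actual llmcompressor modules.

```python
from contextlib import contextmanager

class MoEBlock:
    def forward(self, x):
        return "routed"        # normally routes tokens to top-k experts

class CalibMoEBlock:
    def forward(self, x):
        return "all-experts"   # calibration variant: every expert sees data

def replace_permanently(model):
    # replace_modules_for_calibration-style: the swap stays in the model
    model["moe"] = CalibMoEBlock()
    return model

@contextmanager
def calibration_context(model):
    # moe_calibration_context-style: swap only while calibration runs
    original = model["moe"]
    model["moe"] = CalibMoEBlock()
    try:
        yield model
    finally:
        model["moe"] = original  # restore the routed module afterwards

model = {"moe": MoEBlock()}
with calibration_context(model) as m:
    during = m["moe"].forward(None)   # "all-experts" inside the context
after = model["moe"].forward(None)    # "routed" again once the context exits
print(during, after)
```

The permanent replacement is needed when the swapped module must also be the one that is saved and served (as with Llama4 below); the context manager is preferable when the original routing behavior should be restored after calibration.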
Llama4 quantization
Llama4 quantization is now supported in LLM Compressor. To be quantized and runnable in vLLM, Llama4TextMoe modules are permanently replaced using the replace_modules_for_calibration method, which linearizes the modules. This allows the model to be quantized with schemes including WN16 with GPTQ and NVFP4.
Simplified and updated Recipe classes
Recipe classes have been updated with the following features:
- Merged multiple recipe-related classes into a single, unified `Recipe` class
- Simplified modifier creation, lifecycle management, and parsing logic
- Improved serialization and deserialization for clarity and maintainability
- Reduced redundant stage and argument handling for easier debugging and usage
Configurable Observer arguments
Observer arguments can now be configured as a dict through the observer_kwargs quantization argument, which can be set through oneshot recipes.
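For example, a recipe might pass observer arguments like this. The modifier layout mirrors existing oneshot recipes, but the specific observer_kwargs keys shown are illustrative and should be checked against the observer implementations.

```yaml
# Hypothetical recipe sketch; observer_kwargs keys are illustrative.
quant_stage:
  quant_modifiers:
    GPTQModifier:
      config_groups:
        group_0:
          targets: [Linear]
          weights:
            num_bits: 4
            type: int
            symmetric: true
            strategy: group
            group_size: 128
            observer: mse
            observer_kwargs:
              maxshrink: 0.2
              grid: 100
```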
Published by dhuangnm 6 months ago
llmcompressor - v0.6.0.1
What's Changed
- Cap transformers version for hotfix 0.6.0.1 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1671
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.6.0...0.6.0.1
Published by dhuangnm 7 months ago
llmcompressor - v0.6.0
What's Changed
- [Experimental] Mistral-format FP8 quantization by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1359
- [Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1528
- [Examples] [Bugfix] Fix debug message by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1529
- [Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1538
- [AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1536
- fix qwen 2.5 VL multimodal example by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1541
- [Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1531
- [Tests][NVFP4] Add e2e nvfp4 test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1543
- [Examples] Use more robust splits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1544
- [Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1532
- [Example] Fix Qwen VL ignore list by @arunmadhusud in https://github.com/vllm-project/llm-compressor/pull/1545
- [Tests] Fix `Qwen2.5-VL-7B-Instruct` Recipe by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1548
- [Bugfix] Fix gemma2 generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1552
- fix skipif check on tests involving gated HF models by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1553
- [NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1554
- oneshot entrypoint update by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1445
- LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1562
- [Performance] Sequential onloading by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1263
- [BugFix] Explicitly set `gpu_memory_utilization` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1560
- Add Axolotl blog link by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1563
- [Bugfix] Fix multigpu `dispatch_for_generation` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1567
- [Testing] Set `VLLM_WORKER_MULTIPROC_METHOD` for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1569
- [BugFix] Fix `quantization_2of4_sparse_w4a16` example by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1565
- [Pipelines] infer model device with optional override by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1572
- bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1581
New Contributors
- @arunmadhusud made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1545
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.2...0.6.0
Published by dhuangnm 8 months ago
llmcompressor - v0.5.2
What's Changed
- Exclude images from package by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1397
- [Tracing] Skip non-ancestors of sequential targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1389
- Consolidate build config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1398
- [Tests] Disable silently failing kv cache test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1371
- Drop `flash_attn` skip for quantizing_moe example tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1396
- [VLM] Fix mllama targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1402
- [Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1264
- Implement `QuantizationMixin` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1351
- Add new-features section by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1408
- [Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1373
- bugfix kv cache quantization with ignored layers by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1312
- AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1405
- [Tracing][Testing] Add tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1335
- fix lm eval test reproducibility issues by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1260
- Pipeline Extraction by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1279
- Add `pull_request` trigger to base tests workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1417
- removing RecipeMetadata and references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1414
- Update examples to only load required number of samples from dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1118
- [Tracing] Reinstate ignore functionality by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1423
- [Typo] overriden by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1420
- Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1416
- Remove RecipeArgs class & its references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1429
- [Examples] Standardize AWQ example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1412
- [Logging] Support logging once by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1431
- Add: deepseekv2 smoothquant mappings by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1433
- AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1426
- patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1439
- Added more tests for Quantization24SparseW4A16 by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1434
- [GPTQ] Add `actorder` option to modifier by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1424
- [Bugfix][Tracing] Fix qwen25vl by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1448
- [Tests] Use proper offloading utils in `test_compress_tensor_utils` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1449
- [Tracing] Fix Traceable Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1452
- [NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1309
- Pin transformers to <4.52.0 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1459
- AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1451
- Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1345
- [Tests] Fix activation recipe for w8a8 asym by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1461
- AWQ Qwen and Phi mappings by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1440
- [Observer] Optimize mse observer by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1450
- Fix: Improve `SmoothQuant` Support for Mixture of Experts (MoE) Models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1455
- [Tests] Add nvfp4a16 e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1463
- [Docs] Update README to list fp4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1462
- Remove duplicate model id var from awq example recipe by @AndrewMead10 in https://github.com/vllm-project/llm-compressor/pull/1467
- Added observer type for `test_min_max` by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1466
- Disable kernels during calibration (and tracing) by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1454
- [GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1453
- Set `show_progress` to True by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1471
- Remove `compress` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1470
- raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1476
- [Tests] Increase max seq length for tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1478
- [Tests] Fix dynamic field to be a bool, not string by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1480
- [Examples] Fix qwen vision examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1481
- [NVFP4] Update to use `tensor_group` strategy; update observers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1484
- loosen lmeval assertions to upper or lower bound by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1477
- Revert "expand observers to calculate gparams, add example for activa… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1486
- fix rest of the minmax tests by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1469
- Add warning for non-divisible group quantization by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1401
- [AWQ] Support accumulation for reduced memory usage by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1435
- [Tracing] Code AutoWrapper by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1411
- Removed RecipeTuple & RecipeContainer class by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1460
- Unpin to support `transformers==4.52.3` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1479
- [Tests] GPTQ Actorder Resolution Tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1468
- [Testing] Skip FP4 Test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1499
- [Bugfix] Remove tracing imports from tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1498
- [Testing] Use a slightly larger model that works with group_size 128 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1502
- skip tracing tests if token unavailable by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1493
- Fix missing logs when calling oneshot by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1446
- [NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1487
- [Tests] Remove duplicate test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1500
- [Model] Mistral3 example and test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1490
- [NVFP4] Use observers to generate global weight scales by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1504
- Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1507
- [NVFP4] Update global scale generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1508
- [NVFP4] Fix onloading of fused layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1512
- Pin pandas to <2.3 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1515
- AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1444
- Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1516
- Use model compression pathways by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1419
- [Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1517
- [Docs] Update ReadME details for FP4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1519
- [Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1530
- Add citation information both in README as well as native GitHub file support by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1527
- update compressed-tensors version requirement by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1534
New Contributors
- @kelkelcheng made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1373
- @AndrewMead10 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1467
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.1...0.5.2
Published by dhuangnm 8 months ago
llmcompressor - v0.5.1
What's Changed
- Update nm-actions/changed-files to v1.16.0 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1311
- docs: fix missing git clone command and repo name typos in DEVELOPING.md by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1325
- Update e2e/lm-eval test infrastructure by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1323
- fix(logger): normalize `log_file_level` input for consistency by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1324
- [Utils] Replace `preserve_attr` with `patch_attr` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1187
- Fix cut off log in entrypoints/utils.py `post_process()` by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1336
- [Tests] Update condition for sparsity check to be more robust by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1337
- [Utils] Add `skip_weights_download` for developers and testing by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1334
- replace custom version handling with setuptools-scm by @dhellmann in https://github.com/vllm-project/llm-compressor/pull/1322
- [Compression] Update sparsity calculation lifecycle when fetching the compressor by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1332
- [Sequential] Support models with nested `_no_split_modules` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1329
- [Tracing] Remove `TraceableWhisperForConditionalGeneration` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1310
- Add torch device to list of offloadable types by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1348
- Reduce SmoothQuant Repr by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1289
- Use `align_module_device` util by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1298
- Fix project URL in setup.py by @tiran in https://github.com/vllm-project/llm-compressor/pull/1353
- Update trigger on PR comment workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1357
- Add timing functionality to lm-eval tests by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1346
- [Callbacks][Docs] Add docstrings to saving functions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1201
- Move: recipe parsing test from `e2e/` to main test suite by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1360
- Smoothquant typehinting by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1285
- AWQ Modifier by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1177
- [Tests] Update transformers tests to run kv_cache tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1364
- [Transformers] Support latest transformers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1352
- Update `test_consecutive_runs.py` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1366
- [Docs] Mention AWQ, some clean-up by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1367
- Fix versioning for source installs by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1370
- [Testing] Reduce error verbosity of cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1365
- Update `test_oneshot_and_finetune.py` to use pytest.approx by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1339
- [Tracing] Better runtime error messages by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1307
- [Tests] Fix test case; update structure by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1375
- fix: Make `Recipe.model_dump()` output compatible with `model_validate()` by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1328
- Add: documentation for enhanced `save_pretrained` parameters by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1377
- Revert "fix: Make Recipe.model_dump() output compatible .... by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1378
- AWQ resolved mappings -- ensure shapes align by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1372
- Update `w4a16_actorder_weight.yaml` lmeval config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1380
- [WIP] Add AWQ Asym e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1374
- Bump version; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1381
- bugfix AWQ with Llama models and python 3.9 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1384
- awq -- hotfix to missing kwargs by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1395
New Contributors
- @gattshjott made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1325
- @dhellmann made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1322
- @tiran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1353
- @ved1beta made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1346
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.0...0.5.1
Published by dbarbuzzi 10 months ago
llmcompressor - v0.5.0
What's Changed
- re-add vllm e2e test now that bug is fixed by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1162
- Fix Readme Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1165
- Remove event_called by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1155
- Update: Test name by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1172
- Remove lifecycle initialized_structure attribute by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1156
- [VLM] Qwen 2.5 VL by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1113
- Revert bump by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1178
- Remove CLI by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1144
- Add group act order case to lm_eval test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1080
- Update e2e test timings ouputs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1179
- [Oneshot Refactor] Main refactor by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1110
- [StageRunner Removal] Remove Evalulate / validate pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1145
- [StageRemoval] Remove Predict pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1146
- Fix 2of4 Apply Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1181
- Fix Sparse2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1182
- Add qwen moe w4a16 example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1186
- [Callbacks] Consolidate Saving Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1168
- lmeval tests multimodal by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1150
- [Dataset Performance] Add num workers on dataset processing - labels, tokenization by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1189
- Fix a minor typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1191
- [Callbacks] Remove `pre_initialize_structure` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1160
- Make `transformers-tests` job conditional on files changed by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1197
- Update finetune tests to decrease execution time by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1208
- Update transformers tests to speed-up execution by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1211
- Fix logging bug in oneshot.py by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1213
- [Training] Decouple Argument parser by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1207
- Remove MonkeyPatch for GPUs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1227
- [Cosmetic] Rename `data_args` to `dataset_args` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1206
- [Training] Datasets - update Module by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1209
- [BugFix] Fix logging disabling bug and add tests by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1218
- [Training] Unifying Preprocess + Postprocessing logic for Train/Oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1212
- [Docs] Add info on when to use which PTQ/Sparsification by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1157
- [Callbacks] Remove `MagnitudePruningModifier.leave_enabled` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1198
- Replace Xenova model stub with nm-testing model stub by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1239
- Offload Cache Support torch.dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1141
- Remove unused/duplicated/non-applicable utils from pytorch/utils/helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1174
- [Bugfix] Staged 2of4 example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1238
- wandb/tensorboard loggers set default init to False by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1235
- fixing reproducibility of lmeval tests by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1220
- [Audio] People's Speech dataset and tracer tool by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1086
- Use KV cache constant names provided by compressed tensors by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1200
- [Bugfix] Raise error for processor remote code by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1184
- Remove missing weights silencers in favor of HFQuantizer solution by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1017
- Fix run_compressed tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1246
- [Train] Training Pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1214
- [Tests] Increase maximum quantization error by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1245
- [Callbacks] Remove EventLifecycle and on_start event by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1170
- [Bugfix] Disable generation of deepseek models with transformers>=4.48 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1259
- Remove clear_ml by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1261
- [Tests] Remove clear_ml test from GHA by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1265
- Remove click by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1262
- [Bugfix] Remove constant pruning from 2of4 examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1267
- Addback: ConstantPruningModifier for finetuning cases by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1272
- Remove docker by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1255
- move failing mulitmodal lmeval tests to skipped folder by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1273
- Replace tj-action/changed-files by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1270
- [BugFix]: Sparse2of4 example sparsity-only case by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1282
- Revert "update" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1296
- Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1287
- Revert "Fix Multi-Context Manager Syntax for Python 3.9 Compatibility… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1300
- [StageRunner] Stage Runner entrypoint and pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1202
- Bump: Min python version to 3.9 by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1288
- Keep quantization enabled during calibration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1299
- [BugFix] TRL distillation bug fix by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1278
- Update: Readme for fp8 support by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1304
- [GPTQ] Add inversion fallback by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1283
- fix typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1290
- [Tests] Fix oneshot + finetune test by passing splits to oneshot by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1316
- [Tests] Remove the `compress` entrypoint by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1317
- Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1313
- [BugFix] Directly Convert Modifiers to Recipe Instance by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1271
- bump version, tag ct by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1318
New Contributors
- @aman2304 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1213
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.1...0.5.0
Published by dhuangnm 11 months ago
llmcompressor - v0.4.1
What's Changed
- Remove version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1077
- Require 'ready' label for transformers tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1079
- GPTQModifier Nits and Code Clarity by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1068
- Also run on pushes to `main` by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1083
- VLM: Phi3 Vision Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1032
- VLM: Qwen2_VL Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1027
- Composability with sparse and quantization compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/948
- Remove `TraceableMistralForCausalLM` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1052
- [Fix Test Failure]: Propagate name change to test by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1088
- [Audio] Support Audio Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1085
- [Test Fix] Add Quantization then finetune tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/964
- [Smoothquant] Phi3 Vision Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1089
- [VLM] Multimodal Data Collator by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1087
- VLM: Model Tracing Guide by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1030
- Turn off 2:4 sparse compression until supported in vllm by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1092
- [Test Fix] Fix Consecutive oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/971
- [Bug Fix] Fix test that requires GPU by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1096
- Add Idefics3/SmolVLM quant support via traceable class by @leon-seidel in https://github.com/vllm-project/llm-compressor/pull/1095
- Traceability Guide: Clarity and typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1099
- [VLM] Examples README by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1057
- Raise warning for 2:4 compressed sparse-only models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1107
- Remove `log_model_load` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1016
- Return empty sparsity config if targets and ignores are empty by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1115
- Remove uses of get_observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/939
- FSDP utils cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/854
- Update maintainers, add notice by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1091
- Replace readme paths with urls by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1097
- GPTQ add arXiv link, move file location by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1100
- Extend `remove_hooks` to remove subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1021
- [Audio] Whisper Example and Readme by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1106
- [Audio] Add whisper fp8 dynamic example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1111
- [VLM] Update pixtral data collator to reflect latest transformers changes by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1116
- Use unique test names in `TestvLLM` by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1124
- Remove smoothquant from examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1121
- Extend `disable_hooks` to keep subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1023
- Unpin `pynvml` to fix e2e test failures with vLLM by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1125
- Replace LayerCompressor with HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1038
- [Oneshot Refactor] Rename `get_shared_processor_src` to `get_processor_name_from_model` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1108
- Allow Shortcutting Min-max Observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/887
- [Polish] Remove unused code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1128
- Properly restore training mode with `eval_context` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1126
- SQ and QM: Remove `torch.cuda.empty_cache`, use `calibration_forward_context` by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1114
- [Oneshot Refactor] dataclass Arguments by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1103
- [Bugfix] SparseGPT, Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1130
- [Oneshot refactor] Refactor `initialize_model_from_path` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1109
- [e2e] Update vllm tests with additional datasets by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1131
- Update: SparseGPT recipes by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1142
- Add timer support for testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1137
- [Audio] Support Whisper V3 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1147
- Fix: Re-enable Sparse Compression for 2of4 Examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1153
- [VLM] Add caption to flickr dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1138
- [VLM] Update mllama traceable definition by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1140
- Fix CPU Offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1159
- [TRLSFTTrainer] Fix and Update Examples code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1161
- [TRLSFTTrainer] Fix TRL-SFT Distillation Training by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1163
- Bump version for patch release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1166
- Update DeepSeek Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1175
- Update gemma2 examples with a note about sample generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1176
New Contributors
- @leon-seidel made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1095
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.0...0.4.1
Published by dhuangnm about 1 year ago
llmcompressor - v0.4.0
What's Changed
- Record config file name as test suite property by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/947
- Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/975
- Deprecate OBCQ Helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/977
- KV Cache, E2E Tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/742
- Use 1 GPU for offloading examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/979
- Replace tokenizer with processor by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/955
- Revert "KV Cache, E2E Tests (#742)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/989
- Fix SmoothQuant offload bug by @dsikka in https://github.com/vllm-project/llm-compressor/pull/978
- Add LM Eval Configs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/980
- Fix `test_model_reload` test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1005
- Calibration and Compression Contexts by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/998
- Add info for clarity by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1009
- [Bugfix] Pass `trust_remote_code_model=True` for deepseek examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1012
- Vision Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/943
- Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in https://github.com/vllm-project/llm-compressor/pull/991
- Update ReadMe and test for cpu_offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1013
- Adding amdsmi for AMD gpus by @citrix123 in https://github.com/vllm-project/llm-compressor/pull/1018
- CompressionLogger add time units by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1026
- `patch_tied_tensors_bug`: support malformed model definitions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1014
- Add: 2of4 example with/without fp8 quantization by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1033
- Remove unnecessary step in 2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1034
- Remove Neural Magic copyright from files by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/992
- VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/914
- [E2E Testing] KV-Cache by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1004
- [E2E Testing] Add recipe check vllm e2e by @horheynm in https://github.com/vllm-project/llm-compressor/pull/929
- [MoE] GPTQ compress using callback not hook by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1049
- Explicit dataset tokenizer `text` kwarg by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1031
- Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1015
- [Test Fix] Quant model reload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/974
- Remove old examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1062
- VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1065
- Add tests for "examples/sparse2of4[...]" by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1067
- VLM Image Examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1064
- Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1066
- [KV Cache] kv-cache end to end unit tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/141
- [E2E Testing] Fix HF upload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1061
- [Test Fix] Fix/update `test_run_compressed` by @horheynm in https://github.com/vllm-project/llm-compressor/pull/970
- Revert "[Test Fix] Fix/update `test_run_compressed`" by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1071
- Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1073
- [Test Patch] Remove redundant code for "Fix/update `test_run_compressed`" by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1072
- bump; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1076
New Contributors
- @citrix123 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1018
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.1...0.4.0
Published by dhuangnm about 1 year ago
llmcompressor - v0.3.1
What's Changed
- BLOOM Default Smoothquant Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/906
- [SparseAutoModelForCausalLM Deprecation] Feature change by @horheynm in https://github.com/vllm-project/llm-compressor/pull/881
- Correct "dyanmic" typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/888
- Explicit defaults for QuantizationModifier targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/889
- [SparseAutoModelForCausalLM Deprecation] Update examples by @horheynm in https://github.com/vllm-project/llm-compressor/pull/880
- Support pack_quantized format for nonuniform mixed-precision by @mgoin in https://github.com/vllm-project/llm-compressor/pull/913
- Actually make the `run_compressed` test useful by @dsikka in https://github.com/vllm-project/llm-compressor/pull/920
- Fix for e2e tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/927
- [Bugfix] Correct metrics calculations by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/878
- Update kv_cache example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/921
- [1/2] Expand e2e testing to prepare for lm-eval by @dsikka in https://github.com/vllm-project/llm-compressor/pull/922
- Update pytest command to capture results to file by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/932
- [Bugfix] DisableKVCache Context by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/834
- Add helpful info to the marlin-24 example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/946
- Remove requires_torch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/949
- Remove unused sparseml.export utilities by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/950
- Implement HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/917
- Add LM Eval Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/945
- update version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/969
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.0...0.3.1
Published by dhuangnm about 1 year ago
llmcompressor - v0.3.0
What's New in v0.3.0
Key Features and Improvements
- GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
- Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers `mappings` based on model architecture, making SmoothQuant easier to apply across various models.
- Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
- Generic Wrapper for Any Hugging Face Model (#185): Added the `wrap_hf_model_class` utility, enabling better support and integration for Hugging Face models that are not based on `AutoModelForCausalLM`.
- Observer Restructure (#837): Introduced calibration and frozen steps within `QuantizationModifier`, moving Observers from compressed-tensors to llm-compressor.
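The auto-inferred `mappings` feature builds on SmoothQuant's core trick: migrating quantization difficulty from activations to weights with a per-channel scale, leaving the layer's output unchanged. A minimal NumPy sketch of that scale migration (illustrative only, not the llm-compressor implementation; the `smooth` helper and `alpha` balance factor are hypothetical names):

```python
import numpy as np

def smooth(X: np.ndarray, W: np.ndarray, alpha: float = 0.5):
    """Migrate activation outliers into the weights.

    Per input channel j, s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha);
    dividing X and multiplying W by s leaves X @ W mathematically unchanged
    while shrinking the activation dynamic range.
    """
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    wgt_max = np.abs(W).max(axis=1)   # per-input-channel weight range
    s = act_max**alpha / wgt_max**(1 - alpha)
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)) * np.array([1.0, 50.0, 1.0, 1.0])  # channel 1 is an outlier
W = rng.normal(size=(4, 3))
X_s, W_s = smooth(X, W)
assert np.allclose(X_s @ W_s, X @ W)        # layer output is preserved
assert np.abs(X_s).max() < np.abs(X).max()  # activation range is reduced
```

The balance factor `alpha` controls how much difficulty is shifted: 0 leaves activations untouched, 1 pushes the entire activation range into the weights.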
Bug Fixes
- Fix Tied Tensors Bug (#659)
- Observer Initialization in GPTQ Wrapper (#883)
- Sparsity Reload Testing (#882)
Documentation
- Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
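When the auto-inferred mappings do not fit a model, they can be supplied explicitly in a recipe, pairing the layers to smooth with the preceding norm whose scales absorb the migrated ranges. A hedged sketch of one plausible recipe fragment (the stage name, regex targets, and `smoothing_strength` value are illustrative assumptions, not taken from the tutorial):

```yaml
# Illustrative recipe fragment; adapt the regexes to your architecture.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
        - - ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]
          - "re:.*input_layernorm"
        - - ["re:.*gate_proj", "re:.*up_proj"]
          - "re:.*post_attention_layernorm"
```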
What's Changed
- Fix compresed typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/188
- GPTQ Quantized-weight Sequential Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/177
- Add: targets and ignore inference for sparse compression by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/191
- switch tests from weekly to nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/658
- Compression wrapper abstract methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/170
- Explicitly set sequential_update in examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/187
- Increase Sparsity Threshold for compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/679
- Add a generic `wrap_hf_model_class` utility to support VLMs by @mgoin in https://github.com/vllm-project/llm-compressor/pull/185
- Add tests for examples by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/149
- Rename to quantization config by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/730
- Implement Missing Modifier Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/166
- Fix 2/4 GPTQ Model Tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/769
- SmoothQuant mappings tutorial by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/115
- Fix import of `ModelCompressor` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/776
- update test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/773
- [Bugfix] Fix saving offloaded state dict by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/172
- Auto-Infer `mappings` Argument for `SmoothQuantModifier` Based on Model Architecture by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/119
- Update workflows/actions by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/774
- [Bugfix] Prepare KD Models when Saving by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/174
- Set Sparse compression to save_compressed by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/821
- Install compressed-tensors after llm-compressor by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/825
- Fix test typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/828
- Add `AutoModelForCausalLM` example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/698
- [Bugfix] Workaround tied tensors bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/659
- Only untie word embeddings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/839
- Check for config hidden size by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/840
- Use float32 for Hessian dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/847
- GPTQ: Deprecate non-sequential update option by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/762
- Typehint nits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/826
- [ DOC ] Remove version restrictions in W8A8 example by @miaojinc in https://github.com/vllm-project/llm-compressor/pull/849
- Fix inconsistence in example config of 2:4 sparse quantization by @yzlnew in https://github.com/vllm-project/llm-compressor/pull/80
- Fix forward function pass call by @dsikka in https://github.com/vllm-project/llm-compressor/pull/845
- [Bugfix] Use weight parameter of linear layer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/836
- [Bugfix] Rename files to remove colons by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/846
- cover all 3.9-3.12 in commit testing by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/864
- Add marlin-24 recipe/configs for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/866
- [Bugfix] onload during sparsity calculation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/862
- Fix HFTrainer overloads by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/869
- Support Model Offloading Tied Tensors Patch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/872
- Add advice about dealing with non-invertable hessians by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/875
- seed commit workflow by @andy-neuma in https://github.com/vllm-project/llm-compressor/pull/877
- [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` by @dsikka in https://github.com/vllm-project/llm-compressor/pull/837
- Bugfix observer initialization in `gptq_wrapper` by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/883
- BugFix: Fix Sparsity Reload Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/882
- Use custom unique test names for e2e tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/892
- Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/893
- Move `config["test_config_path"]` assignment by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/895
- Cap accelerate version to avoid bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/897
- Fix observing offloaded weight by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/896
- Update image in README.md by @mgoin in https://github.com/vllm-project/llm-compressor/pull/861
- update accelerate version by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/899
- [GPTQ] Iterative Parameter Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/863
- Small fixes for release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/901
- use smaller portion of dataset by @dsikka in https://github.com/vllm-project/llm-compressor/pull/902
- Update example to not fail hessian inversion by @dsikka in https://github.com/vllm-project/llm-compressor/pull/904
- Bump version to 0.3.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/907
New Contributors
- @miaojinc made their first contribution in https://github.com/vllm-project/llm-compressor/pull/849
- @yzlnew made their first contribution in https://github.com/vllm-project/llm-compressor/pull/80
- @andy-neuma made their first contribution in https://github.com/vllm-project/llm-compressor/pull/877
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.2.0...0.3.0
Published by dhuangnm over 1 year ago
llmcompressor - v0.2.0
What's Changed
- Correct Typo in SparseAutoModelForCausalLM docstring by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/56
- Disable Default Bitmask Compression by @Satrat in https://github.com/vllm-project/llm-compressor/pull/60
- TRL Example fix by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/59
- Fix typo by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/63
- Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/61
- correct import in README.md by @zzc0430 in https://github.com/vllm-project/llm-compressor/pull/66
- Fix for issue #43 -- starcoder model by @horheynm in https://github.com/vllm-project/llm-compressor/pull/71
- Update README.md by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/74
- Layer by Layer Sequential GPTQ Updates by @Satrat in https://github.com/vllm-project/llm-compressor/pull/47
- [ Docs ] Update main readme by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/77
- [ Docs ] `gemma2` examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/78
- [ Docs ] Update `FP8` example to use dynamic per token by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/75
- [ Docs ] Overhaul `accelerate` user guide by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/76
- Support `kv_cache_scheme` for quantizing KV Cache by @mgoin in https://github.com/vllm-project/llm-compressor/pull/88
- Propagate `trust_remote_code` Argument by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/90
- Fix for issue #81 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/84
- Fix for issue 83 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/85
- [ DOC ] Big Model Example by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/99
- Enable obcq/finetune integration tests with `commit` cadence by @dsikka in https://github.com/vllm-project/llm-compressor/pull/101
- metric logging on GPTQ path by @horheynm in https://github.com/vllm-project/llm-compressor/pull/65
- Update test config files by @dsikka in https://github.com/vllm-project/llm-compressor/pull/97
- remove workflows + update runners by @dsikka in https://github.com/vllm-project/llm-compressor/pull/103
- metrics by @horheynm in https://github.com/vllm-project/llm-compressor/pull/104
- add debug by @horheynm in https://github.com/vllm-project/llm-compressor/pull/108
- Add FP8 KV Cache quant example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/113
- Add vLLM e2e tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/117
- Fix style, fix noqa by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/123
- GPTQ Algorithm Cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/120
- GPTQ Activation Ordering by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/94
- demote recipe string initialization to debug and make more descriptive by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/116
- compressed-tensors main dependency for base-tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/125
- Set `ready` label for transformer tests; add message reminder on PR opened by @dsikka in https://github.com/vllm-project/llm-compressor/pull/126
- Fix markdown check test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/127
- Naive Run Compressed Pt. 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/62
- Fix transformer test conditions by @dsikka in https://github.com/vllm-project/llm-compressor/pull/131
- Run Compressed Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/132
- Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/124
- Activation Ordering Strategies by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/121
- Fix README Issue by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/139
- update by @dsikka in https://github.com/vllm-project/llm-compressor/pull/143
- Update finetune and oneshot tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/114
- Validate Recipe Parsing Output by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/100
- fix build error for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/145
- Fix recipe nested in configs by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/140
- MOE example with warning by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/87
- Bug Fix: recipe stages were not being concatenated by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/150
- fix package name bug for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/155
- Add descriptions for pytest marks by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/156
- Fix Sparsity Unit Test by @Satrat in https://github.com/vllm-project/llm-compressor/pull/153
- Fix: Error during model saving with shared tensors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/158
- Update 2:4 Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/161
- DeepSeek: Fix Hessian Estimation by @Satrat in https://github.com/vllm-project/llm-compressor/pull/157
- bump up main to 0.2.0 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/163
- Fix help dialogue by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/151
- Add MoE and Compressed Inference Examples by @Satrat in https://github.com/vllm-project/llm-compressor/pull/160
- Separate `trust_remote_code` args by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/152
- Enable a skipped finetune test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/169
- Fix filename in example command by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/173
- Add DeepSeek V2.5 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/171
- fix quality by @dsikka in https://github.com/vllm-project/llm-compressor/pull/176
- Patch log function name in gptq by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/168
- README for Modifiers by @Satrat in https://github.com/vllm-project/llm-compressor/pull/165
- Fix default for sequential updates by @dsikka in https://github.com/vllm-project/llm-compressor/pull/186
- fix default test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/193
- Fix Initalize typo by @Imss27 in https://github.com/vllm-project/llm-compressor/pull/190
- Update MoE examples by @mgoin in https://github.com/vllm-project/llm-compressor/pull/192
New Contributors
- @zzc0430 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/66
- @horheynm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/71
- @dsikka made their first contribution in https://github.com/vllm-project/llm-compressor/pull/101
- @dhuangnm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/145
- @Imss27 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/190
Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.1.0...0.2.0
Published by dhuangnm over 1 year ago
llmcompressor - v0.1.0
What's Changed
- Address Test Failures by @Satrat in https://github.com/vllm-project/llm-compressor/pull/1
- Remove SparseZoo Usage by @Satrat in https://github.com/vllm-project/llm-compressor/pull/2
- SparseML Cleanup by @markurtz in https://github.com/vllm-project/llm-compressor/pull/6
- Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/7
- Add FP8 Support by @Satrat in https://github.com/vllm-project/llm-compressor/pull/4
- Fix Weekly Test Failure by @Satrat in https://github.com/vllm-project/llm-compressor/pull/8
- Add Scheme UX for QuantizationModifier by @Satrat in https://github.com/vllm-project/llm-compressor/pull/9
- Add Group Quantization Test Case by @Satrat in https://github.com/vllm-project/llm-compressor/pull/10
- Loguru logging standardization for LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/11
- Clarify Function Names for Logging by @Satrat in https://github.com/vllm-project/llm-compressor/pull/12
- [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/5
- Update setup.py by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/15
- SmoothQuant Mapping Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/13
- Initial README by @bfineran in https://github.com/vllm-project/llm-compressor/pull/3
- [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/19
- [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in https://github.com/vllm-project/llm-compressor/pull/20
- Port SparseML Remote Code Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/21
- Update Quantization Save Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/22
- [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/26
- GPTQ - move calibration of quantization params to after hessian calibration by @bfineran in https://github.com/vllm-project/llm-compressor/pull/25
- Fix typos by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/31
- Remove ceiling from `datasets` dep by @mgoin in https://github.com/vllm-project/llm-compressor/pull/27
- Revert naive compression format by @Satrat in https://github.com/vllm-project/llm-compressor/pull/32
- Fix layerwise targets by @Satrat in https://github.com/vllm-project/llm-compressor/pull/36
- Move Weight Update Out Of Loop by @Satrat in https://github.com/vllm-project/llm-compressor/pull/40
- Fix End Epoch Default by @Satrat in https://github.com/vllm-project/llm-compressor/pull/39
- Fix typos in example for w8a8 quant by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/38
- Model Offloading Support Pt 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/34
- set version to 1.0.0 for release by @bfineran in https://github.com/vllm-project/llm-compressor/pull/44
- Update version for first release by @markurtz in https://github.com/vllm-project/llm-compressor/pull/50
- BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/51
- Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/52
- Fix Failing Transformers Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/53
- Offloading Bug Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/58
New Contributors
- @markurtz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/6
- @bfineran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/3
- @dbogunowicz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/20
- @eldarkurtic made their first contribution in https://github.com/vllm-project/llm-compressor/pull/31
- @mgoin made their first contribution in https://github.com/vllm-project/llm-compressor/pull/27
- @dbarbuzzi made their first contribution in https://github.com/vllm-project/llm-compressor/pull/52
Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0
- Python
Published by dhuangnm over 1 year ago