Recent Releases of llmcompressor

llmcompressor - v0.7.1

What's Changed

  • [Examples] Create qwen_2_5_vl_example.py by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1752
  • [fix] Fix visual layer ignore pattern for Qwen2.5-VL models by @Zhao-Dongyu in https://github.com/vllm-project/llm-compressor/pull/1766
  • [Transform] Fix QuIP targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1770

New Contributors

  • @Zhao-Dongyu made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1752

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.7.0...0.7.1

- Python
Published by dbarbuzzi 6 months ago

llmcompressor - v0.7.0

LLM Compressor v0.7.0 release notes

This LLM Compressor v0.7.0 release introduces the following new features and enhancements:

  • Transforms support, including the QuIP and SpinQuant algorithms
  • Support for applying multiple compressors to a single model for mixed-precision quantization
  • Support for DeepSeekV3-style block FP8 quantization
  • Expanded Mixture of Experts (MoE) calibration support, including support for NVFP4 quantization
  • Llama4 quantization support with vLLM compatibility
  • Configurable observer arguments
  • Simplified and unified Recipe classes for easier usage and debugging

Introducing Transforms ✨

LLM Compressor now supports transforms: additional matrix operations injected into a model to improve accuracy recovery after quantization. Transforms rotate weights or activations into spaces with smaller dynamic ranges, reducing quantization error.
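
To see why a rotation helps, here is a minimal, self-contained sketch (not the library's implementation): an orthonormal Hadamard rotation spreads a single weight outlier across all coordinates, shrinking the dynamic range and therefore the quantization step size. The 4-element vector, int4 scheme, and helper names are all invented for illustration.

```python
def hadamard4():
    # Orthonormal 4x4 Hadamard matrix (H / sqrt(4)); it is its own inverse.
    h = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    return [[v / 2.0 for v in row] for row in h]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def quantize_roundtrip(v, bits=4):
    # Symmetric per-tensor quantization: the scale is set by the max magnitude.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in v) / qmax
    return [round(x / scale) * scale for x in v]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

w = [10.0, 0.1, -0.2, 0.3]  # one outlier dominates the dynamic range
H = hadamard4()

err_plain = mse(w, quantize_roundtrip(w))
# Quantize in the rotated space, then rotate back (H is its own inverse).
w_restored = matvec(H, quantize_roundtrip(matvec(H, w)))
err_rotated = mse(w, w_restored)
print(err_rotated < err_plain)  # rotation reduces quantization error
```

After rotation all four components have similar magnitude (about 5), so the quantization grid is roughly twice as fine and the round-trip error drops.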

Two algorithms are supported in this release:

  • QuIP transforms are injected before and after weights to assist with weight-only quantization.
  • SpinQuant transforms have inverses that span multiple weights, assisting with both weight and activation quantization.

For SpinQuant, fused R1 and R2 (offline) transforms are available in this release. The full lifecycle has been validated to confirm that models produced by LLM Compressor match the performance reported in the original SpinQuant paper. Learned rotations and online R3 and R4 rotations will be added in a future release.

The functionality for both algorithms is available through the new QuIPModifier and SpinQuantModifier classes.

Applying multiple compressors to a single model

LLM Compressor now supports applying multiple compressors to a single model. This extends support for non-uniform quantization recipes, such as combining NVFP4 and FP8 quantization. This provides finer control over per-layer quantization, allowing more precise handling of layers that are especially sensitive to certain quantization types.

Models with more than one compressor applied have their format set to mixed-precision in the config.json file. Additionally, each config_group now includes a format key that specifies the format used for the layers targeted by that group.
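Illustratively, the resulting checkpoint metadata might look like the sketch below. The exact schema is defined by compressed-tensors; the group names, target patterns, and format strings here are assumptions for illustration only.

```python
# Hypothetical shape of a mixed-precision quantization_config in config.json:
# the top-level format is "mixed-precision", and each config_group carries its
# own "format" key describing how its targeted layers are stored.
config = {
    "quantization_config": {
        "format": "mixed-precision",
        "config_groups": {
            # e.g. attention layers kept in an FP8 format ...
            "group_0": {"targets": ["re:.*self_attn.*"], "format": "float-quantized"},
            # ... while MLP layers use an NVFP4 format.
            "group_1": {"targets": ["re:.*mlp.*"], "format": "nvfp4-pack-quantized"},
        },
    },
}

formats = {g["format"] for g in config["quantization_config"]["config_groups"].values()}
print(sorted(formats))
```

At load time, a consumer such as vLLM can then pick the right decompressor per group instead of assuming one uniform format for the whole model.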

Support for DeepSeekV3-style block FP8 quantization

You can now apply DeepSeekV3-style block FP8 quantization during model compression, a technique designed to further compress large language models for more efficient inference. This release includes the core block-wise quantization implementation, robust handling of quantization parameters, updated documentation, and a practical example to guide you in applying the new compression scheme.
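The idea behind block-wise scaling can be sketched in a few lines of plain Python (this is a toy, not the library's kernel): each tile of the weight matrix gets its own scale, instead of one scale per tensor or per channel. DeepSeekV3-style FP8 uses 128x128 blocks; tiny 2x2 blocks keep the demo readable, and FP8 casting is approximated by keeping scaled values in float.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3

def block_scales(weight, block=2):
    """Compute one scale per (block x block) tile: amax of the tile / FP8 max."""
    rows, cols = len(weight), len(weight[0])
    scales = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            amax = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            scale = amax / FP8_E4M3_MAX
            scales[(r0 // block, c0 // block)] = scale if scale > 0 else 1.0
    return scales

w = [
    [0.5, -1.0, 200.0, 150.0],
    [0.25, 0.75, -100.0, 50.0],
]
scales = block_scales(w)
# The small-magnitude block keeps a small scale, so its values are not washed
# out by the large-magnitude block -- the main benefit over per-tensor scaling.
print(scales[(0, 0)] < scales[(0, 1)])  # True
```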

Mixture of Experts support

LLM Compressor now includes enhanced general Mixture of Experts (MoE) calibration support, including support for MoEs with NVFP4 quantization. Forward passes of MoE models can be controlled during calibration by registering custom modules with the replace_modules_for_calibration function, which permanently replaces the MoE module, or with the moe_calibration_context function, which temporarily updates modules for the duration of calibration.
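The two replacement styles can be sketched as follows. This is a hypothetical mock, not llmcompressor's code: ToyModel, SparseExperts, and CalibratableExperts are invented stand-ins, and only the permanent-vs-temporary distinction mirrors the real replace_modules_for_calibration / moe_calibration_context hooks.

```python
from contextlib import contextmanager

class SparseExperts:
    """Stands in for a routed MoE block that only runs the top-k experts."""
    def forward(self, x):
        return f"routed({x})"

class CalibratableExperts:
    """Runs every expert so all of them see calibration data."""
    def forward(self, x):
        return f"all_experts({x})"

class ToyModel:
    def __init__(self):
        self.experts = SparseExperts()

def replace_modules_for_calibration(model):
    # Permanent swap: the model keeps the calibration-friendly module.
    model.experts = CalibratableExperts()
    return model

@contextmanager
def moe_calibration_context(model):
    # Temporary swap: the original module is restored on exit.
    original = model.experts
    model.experts = CalibratableExperts()
    try:
        yield model
    finally:
        model.experts = original

model = ToyModel()
with moe_calibration_context(model) as m:
    during = m.experts.forward("x")  # all experts active during calibration
after = model.experts.forward("x")   # original routing restored afterwards
print(during, after)
```

The permanent variant is what makes a model servable after linearization (as with Llama4 below), while the context-manager variant leaves the original architecture untouched once calibration finishes.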

Llama4 quantization

Llama4 quantization is now supported in LLM Compressor. To produce a model that is quantized and runnable in vLLM, Llama4TextMoe modules are permanently replaced using the replace_modules_for_calibration method, which linearizes the modules. This allows the model to be quantized to schemes including WN16 with GPTQ and NVFP4.

Simplified and updated Recipe classes

Recipe classes have been updated with the following features:

  • Merged multiple recipe-related classes into a single, unified Recipe class
  • Simplified modifier creation, lifecycle management, and parsing logic
  • Improved serialization and deserialization for clarity and maintainability
  • Reduced redundant stages and arguments handling for easier debugging and usage

Configurable Observer arguments

Observer arguments can now be configured as a dict through the observer_kwargs quantization argument, which can be set through oneshot recipes.
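As a rough illustration of why this matters, here is a toy moving-average min-max observer parameterized by such a dict. The key name averaging_constant mirrors llmcompressor's moving-average observer argument; the class itself is invented for this sketch and is not the library's implementation.

```python
class MovingAverageMinMax:
    """Toy observer: tracks an exponential moving average of the value range."""
    def __init__(self, averaging_constant=0.01):
        self.c = averaging_constant
        self.min = None
        self.max = None

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min is None:
            self.min, self.max = lo, hi
        else:
            # Move the tracked range a fraction `c` toward the new batch's range.
            self.min += self.c * (lo - self.min)
            self.max += self.c * (hi - self.max)

# As it might appear in a recipe's quantization arguments:
observer_kwargs = {"averaging_constant": 0.5}
obs = MovingAverageMinMax(**observer_kwargs)
obs.observe([0.0, 1.0])
obs.observe([0.0, 3.0])  # max moves halfway toward 3.0
print(obs.max)  # 2.0
```

A larger averaging constant makes the observed range react faster to outlier batches; exposing it per-recipe lets you tune that trade-off without code changes.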

Published by dhuangnm 6 months ago

llmcompressor - v0.6.0.1

What's Changed

  • Cap transformers version for hotfix 0.6.0.1 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1671

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.6.0...0.6.0.1

Published by dhuangnm 7 months ago

llmcompressor - v0.6.0

What's Changed

  • [Experimental] Mistral-format FP8 quantization by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1359
  • [Examples] [Bugfix] skip sparsity stats when saving checkpoints by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1528
  • [Examples] [Bugfix] Fix debug message by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1529
  • [Tests][NVFP4] No longer skip NVFP4A16 e2e test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1538
  • [AWQ] Support for Calibration Datasets of varying feature dimension by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1536
  • fix qwen 2.5 VL multimodal example by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1541
  • [Example] [Bugfix] Fix Gemma ignore list by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1531
  • [Tests][NVFP4] Add e2e nvfp4 test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1543
  • [Examples] Use more robust splits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1544
  • [Bugfix] [Autowrapper] Fix visit_Delete by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1532
  • [Example] Fix Qwen VL ignore list by @arunmadhusud in https://github.com/vllm-project/llm-compressor/pull/1545
  • [Tests] Fix Qwen2.5-VL-7B-Instruct Recipe by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1548
  • [Bugfix] Fix gemma2 generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1552
  • fix skipif check on tests involving gated HF models by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1553
  • [NVFP4] Fix global scale update when dealing with offloaded layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1554
  • oneshot entrypoint update by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1445
  • LM Eval tests -- ignore vision tower for VL fp8 test by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1562
  • [Performance] Sequential onloading by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1263
  • [BugFix] Explicitly set gpu_memory_utilization by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1560
  • Add Axolotl blog link by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1563
  • [Bugfix] Fix multigpu dispatch_for_generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1567
  • [Testing] Set VLLM_WORKER_MULTIPROC_METHOD for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1569
  • [BugFix] Fix quantization_2of4_sparse_w4a16 example by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1565
  • [Pipelines] infer model device with optional override by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1572
  • bump up requirement for compressed-tensors to 0.10.2 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1581

New Contributors

  • @arunmadhusud made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1545

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.2...0.6.0

Published by dhuangnm 8 months ago

llmcompressor - v0.5.2

What's Changed

  • Exclude images from package by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1397
  • [Tracing] Skip non-ancestors of sequential targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1389
  • Consolidate build config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1398
  • [Tests] Disable silently failing kv cache test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1371
  • Drop flash_attn skip for quantizing_moe example tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1396
  • [VLM] Fix mllama targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1402
  • [Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1264
  • Implement QuantizationMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1351
  • Add new-features section by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1408
  • [Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1373
  • bugfix kv cache quantization with ignored layers by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1312
  • AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1405
  • [Tracing][Testing] Add tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1335
  • fix lm eval test reproducibility issues by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1260
  • Pipeline Extraction by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1279
  • Add pull_request trigger to base tests workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1417
  • removing RecipeMetadata and references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1414
  • Update examples to only load required number of samples from dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1118
  • [Tracing] Reinstate ignore functionality by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1423
  • [Typo] overriden by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1420
  • Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1416
  • Remove RecipeArgs class & its references by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1429
  • [Examples] Standardize AWQ example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1412
  • [Logging] Support logging once by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1431
  • Add: deepseekv2 smoothquant mappings by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1433
  • AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1426
  • patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1439
  • Added more tests for Quantization24SparseW4A16 by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1434
  • [GPTQ] Add actorder option to modifier by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1424
  • [Bugfix][Tracing] Fix qwen25vl by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1448
  • [Tests] Use proper offloading utils in test_compress_tensor_utils by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1449
  • [Tracing] Fix Traceable Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1452
  • [NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1309
  • Pin transformers to <4.52.0 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1459
  • AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1451
  • Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1345
  • [Tests] Fix activation recipe for w8a8 asym by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1461
  • AWQ Qwen and Phi mappings by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1440
  • [Observer] Optimize mse observer by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1450
  • Fix: Improve SmoothQuant Support for Mixture of Experts (MoE) Models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1455
  • [Tests] Add nvfp4a16 e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1463
  • [Docs] Update README to list fp4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1462
  • Remove duplicate model id var from awq example recipe by @AndrewMead10 in https://github.com/vllm-project/llm-compressor/pull/1467
  • Added observer type for testminmax by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1466
  • Disable kernels during calibration (and tracing) by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1454
  • [GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1453
  • Set show_progress to True by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1471
  • Remove compress by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1470
  • raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1476
  • [Tests] Increase max seq length for tracing tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1478
  • [Tests] Fix dynamic field to be a bool, not string by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1480
  • [Examples] Fix qwen vision examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1481
  • [NVFP4] Update to use tensor_group strategy; update observers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1484
  • loosen lmeval assertions to upper or lower bound by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1477
  • Revert "expand observers to calculate gparams, add example for activa… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1486
  • fix rest of the minmax tests by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1469
  • Add warning for non-divisible group quantization by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1401
  • [AWQ] Support accumulation for reduced memory usage by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1435
  • [Tracing] Code AutoWrapper by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1411
  • Removed RecipeTuple & RecipeContainer class by @shanjiaz in https://github.com/vllm-project/llm-compressor/pull/1460
  • Unpin to support transformers==4.52.3 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1479
  • [Tests] GPTQ Actorder Resolution Tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1468
  • [Testing] Skip FP4 Test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1499
  • [Bugfix] Remove tracing imports from tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1498
  • [Testing] Use a slightly larger model that works with group_size 128 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1502
  • skip tracing tests if token unavailable by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1493
  • Fix missing logs when calling oneshot by @kelkelcheng in https://github.com/vllm-project/llm-compressor/pull/1446
  • [NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1487
  • [Tests] Remove duplicate test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1500
  • [Model] Mistral3 example and test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1490
  • [NVFP4] Use observers to generate global weight scales by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1504
  • Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1507
  • [NVFP4] Update global scale generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1508
  • [NVFP4] Fix onloading of fused layers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1512
  • Pin pandas to <2.3 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1515
  • AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1444
  • Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1516
  • Use model compression pathways by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1419
  • [Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1517
  • [Docs] Update ReadME details for FP4 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1519
  • [Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1530
  • Add citation information both in README as well as native GitHub file support by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1527
  • update compressed-tensors version requirement by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/1534

New Contributors

  • @kelkelcheng made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1373
  • @AndrewMead10 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1467

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.1...0.5.2

Published by dhuangnm 8 months ago

llmcompressor - v0.5.1

What's Changed

  • Update nm-actions/changed-files to v1.16.0 by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1311
  • docs: fix missing git clone command and repo name typos in DEVELOPING.md by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1325
  • Update e2e/lm-eval test infrastructure by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1323
  • fix(logger): normalize log_file_level input for consistency by @gattshjott in https://github.com/vllm-project/llm-compressor/pull/1324
  • [Utils] Replace preserve_attr with patch_attr by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1187
  • Fix cut off log in entrypoints/utils.py post_process() by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1336
  • [Tests] Update condition for sparsity check to be more robust by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1337
  • [Utils] Add skip_weights_download for developers and testing by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1334
  • replace custom version handling with setuptools-scm by @dhellmann in https://github.com/vllm-project/llm-compressor/pull/1322
  • [Compression] Update sparsity calculation lifecycle when fetching the compressor by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1332
  • [Sequential] Support models with nested _no_split_modules by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1329
  • [Tracing] Remove TraceableWhisperForConditionalGeneration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1310
  • Add torch device to list of offloadable types by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1348
  • Reduce SmoothQuant Repr by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1289
  • Use align_module_device util by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1298
  • Fix project URL in setup.py by @tiran in https://github.com/vllm-project/llm-compressor/pull/1353
  • Update trigger on PR comment workflow by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1357
  • Add timing functionality to lm-eval tests by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1346
  • [Callbacks][Docs] Add docstrings to saving functions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1201
  • Move: recipe parsing test from e2e/ to main test suite by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1360
  • Smoothquant typehinting by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1285
  • AWQ Modifier by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1177
  • [Tests] Update transformers tests to run kv_cache tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1364
  • [Transformers] Support latest transformers by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1352
  • Update test_consecutive_runs.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1366
  • [Docs] Mention AWQ, some clean-up by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1367
  • Fix versioning for source installs by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1370
  • [Testing] Reduce error verbosity of cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1365
  • Update test_oneshot_and_finetune.py to use pytest.approx by @markurtz in https://github.com/vllm-project/llm-compressor/pull/1339
  • [Tracing] Better runtime error messages by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1307
  • [Tests] Fix test case; update structure by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1375
  • fix: Make Recipe.model_dump() output compatible with model_validate() by @ved1beta in https://github.com/vllm-project/llm-compressor/pull/1328
  • Add: documentation for enhanced save_pretrained parameters by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1377
  • Revert "fix: Make Recipe.model_dump() output compatible .... by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1378
  • AWQ resolved mappings -- ensure shapes align by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1372
  • Update w4a16_actorder_weight.yaml lmeval config by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1380
  • [WIP] Add AWQ Asym e2e test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1374
  • Bump version; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1381
  • bugfix AWQ with Llama models and python 3.9 by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1384
  • awq -- hotfix to missing kwargs by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1395

New Contributors

  • @gattshjott made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1325
  • @dhellmann made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1322
  • @tiran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1353
  • @ved1beta made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1346

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.5.0...0.5.1

Published by dbarbuzzi 10 months ago

llmcompressor - v0.5.0

What's Changed

  • re-add vllm e2e test now that bug is fixed by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1162
  • Fix Readme Imports by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1165
  • Remove event_called by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1155
  • Update: Test name by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1172
  • Remove lifecycle initialized_structure attribute by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1156
  • [VLM] Qwen 2.5 VL by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1113
  • Revert bump by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1178
  • Remove CLI by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1144
  • Add group act order case to lm_eval test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1080
  • Update e2e test timing outputs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1179
  • [Oneshot Refactor] Main refactor by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1110
  • [StageRunner Removal] Remove Evaluate / validate pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1145
  • [StageRemoval] Remove Predict pathway by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1146
  • Fix 2of4 Apply Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1181
  • Fix Sparse2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1182
  • Add qwen moe w4a16 example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1186
  • [Callbacks] Consolidate Saving Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1168
  • lmeval tests multimodal by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1150
  • [Dataset Performance] Add num workers on dataset processing - labels, tokenization by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1189
  • Fix a minor typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1191
  • [Callbacks] Remove pre_initialize_structure by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1160
  • Make transformers-tests job conditional on files changed by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1197
  • Update finetune tests to decrease execution time by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1208
  • Update transformers tests to speed-up execution by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1211
  • Fix logging bug in oneshot.py by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1213
  • [Training] Decouple Argument parser by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1207
  • Remove MonkeyPatch for GPUs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1227
  • [Cosmetic] Rename data_args to dataset_args by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1206
  • [Training] Datasets - update Module by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1209
  • [BugFix] Fix logging disabling bug and add tests by @aman2304 in https://github.com/vllm-project/llm-compressor/pull/1218
  • [Training] Unifying Preprocess + Postprocessing logic for Train/Oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1212
  • [Docs] Add info on when to use which PTQ/Sparsification by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1157
  • [Callbacks] Remove MagnitudePruningModifier.leave_enabled by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1198
  • Replace Xenova model stub with nm-testing model stub by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1239
  • Offload Cache Support torch.dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1141
  • Remove unused/duplicated/non-applicable utils from pytorch/utils/helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1174
  • [Bugfix] Staged 2of4 example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1238
  • wandb/tensorboard loggers set default init to False by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1235
  • fixing reproducibility of lmeval tests by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1220
  • [Audio] People's Speech dataset and tracer tool by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1086
  • Use KV cache constant names provided by compressed tensors by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1200
  • [Bugfix] Raise error for processor remote code by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1184
  • Remove missing weights silencers in favor of HFQuantizer solution by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1017
  • Fix run_compressed tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1246
  • [Train] Training Pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1214
  • [Tests] Increase maximum quantization error by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1245
  • [Callbacks] Remove EventLifecycle and on_start event by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1170
  • [Bugfix] Disable generation of deepseek models with transformers>=4.48 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1259
  • Remove clear_ml by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1261
  • [Tests] Remove clear_ml test from GHA by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1265
  • Remove click by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1262
  • [Bugfix] Remove constant pruning from 2of4 examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1267
  • Addback: ConstantPruningModifier for finetuning cases by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1272
  • Remove docker by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1255
  • move failing mulitmodal lmeval tests to skipped folder by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1273
  • Replace tj-action/changed-files by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1270
  • [BugFix]: Sparse2of4 example sparsity-only case by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1282
  • Revert "update" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1296
  • Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1287
  • Revert "Fix Multi-Context Manager Syntax for Python 3.9 Compatibility… by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1300
  • [StageRunner] Stage Runner entrypoint and pipeline by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1202
  • Bump: Min python version to 3.9 by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1288
  • Keep quantization enabled during calibration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1299
  • [BugFix] TRL distillation bug fix by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1278
  • Update: Readme for fp8 support by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1304
  • [GPTQ] Add inversion fallback by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1283
  • fix typo by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/1290
  • [Tests] Fix oneshot + finetune test by passing splits to oneshot by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1316
  • [Tests] Remove the compress entrypoint by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1317
  • Fix Multi-Context Manager Syntax for Python 3.9 Compatibility by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1313
  • [BugFix] Directly Convert Modifiers to Recipe Instance by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1271
  • bump version, tag ct by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1318

New Contributors

  • @aman2304 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1213

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.1...0.5.0

- Python
Published by dhuangnm 11 months ago

llmcompressor - v0.4.1

What's Changed

  • Remove version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1077
  • Require 'ready' label for transformers tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1079
  • GPTQModifier Nits and Code Clarity by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1068
  • Also run on pushes to main by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1083
  • VLM: Phi3 Vision Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1032
  • VLM: Qwen2_VL Example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1027
  • Composability with sparse and quantization compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/948
  • Remove TraceableMistralForCausalLM by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1052
  • [Fix Test Failure]: Propagate name change to test by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1088
  • [Audio] Support Audio Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1085
  • [Test Fix] Add Quantization then finetune tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/964
  • [Smoothquant] Phi3 Vision Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1089
  • [VLM] Multimodal Data Collator by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1087
  • VLM: Model Tracing Guide by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1030
  • Turn off 2:4 sparse compression until supported in vllm by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1092
  • [Test Fix] Fix Consecutive oneshot by @horheynm in https://github.com/vllm-project/llm-compressor/pull/971
  • [Bug Fix] Fix test that requires GPU by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1096
  • Add Idefics3/SmolVLM quant support via traceable class by @leon-seidel in https://github.com/vllm-project/llm-compressor/pull/1095
  • Traceability Guide: Clarity and typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1099
  • [VLM] Examples README by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1057
  • Raise warning for 2:4 compressed sparse-only models by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1107
  • Remove logmodelload by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1016
  • Return empty sparsity config if targets and ignores are empty by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1115
  • Remove uses of get_observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/939
  • FSDP utils cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/854
  • Update maintainers, add notice by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1091
  • Replace readme paths with urls by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1097
  • GPTQ add arXiv link, move file location by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1100
  • Extend remove_hooks to remove subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1021
  • [Audio] Whisper Example and Readme by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1106
  • [Audio] Add whisper fp8 dynamic example by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1111
  • [VLM] Update pixtral data collator to reflect latest transformers changes by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1116
  • Use unique test names in TestvLLM by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1124
  • Remove smoothquant from examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1121
  • Extend disable_hooks to keep subsets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1023
  • Unpin pynvml to fix e2e test failures with vLLM by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1125
  • Replace LayerCompressor with HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1038
  • [Oneshot Refactor] Rename get_shared_processor_src to get_processor_name_from_model by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1108
  • Allow Shortcutting Min-max Observer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/887
  • [Polish] Remove unused code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1128
  • Properly restore training mode with eval_context by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1126
  • SQ and QM: Remove torch.cuda.empty_cache, use calibration_forward_context by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1114
  • [Oneshot Refactor] dataclass Arguments by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1103
  • [Bugfix] SparseGPT, Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1130
  • [Oneshot refactor] Refactor initializemodelfrom_path by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1109
  • [e2e] Update vllm tests with additional datasets by @brian-dellabetta in https://github.com/vllm-project/llm-compressor/pull/1131
  • Update: SparseGPT recipes by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1142
  • Add timer support for testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1137
  • [Audio] Support Whisper V3 by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1147
  • Fix: Re-enable Sparse Compression for 2of4 Examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1153
  • [VLM] Add caption to flickr dataset by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1138
  • [VLM] Update mllama traceable definition by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1140
  • Fix CPU Offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1159
  • [TRLSFTTrainer] Fix and Update Examples code by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1161
  • [TRLSFTTrainer] Fix TRL-SFT Distillation Training by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1163
  • Bump version for patch release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1166
  • Update DeepSeek Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1175
  • Update gemma2 examples with a note about sample generation by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1176

New Contributors

  • @leon-seidel made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1095

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.4.0...0.4.1

- Python
Published by dhuangnm about 1 year ago

llmcompressor - v0.4.0

What's Changed

  • Record config file name as test suite property by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/947
  • Update setup.py by @dsikka in https://github.com/vllm-project/llm-compressor/pull/975
  • Deprecate OBCQ Helpers by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/977
  • KV Cache, E2E Tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/742
  • Use 1 GPU for offloading examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/979
  • Replace tokenizer with processor by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/955
  • Revert "KV Cache, E2E Tests (#742)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/989
  • Fix SmoothQuant offload bug by @dsikka in https://github.com/vllm-project/llm-compressor/pull/978
  • Add LM Eval Configs by @dsikka in https://github.com/vllm-project/llm-compressor/pull/980
  • Fix test_model_reload test by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1005
  • Calibration and Compression Contexts by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/998
  • Add info for clarity by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1009
  • [Bugfix] Pass trust_remote_code_model=True for deepseek examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1012
  • Vision Datasets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/943
  • Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in https://github.com/vllm-project/llm-compressor/pull/991
  • Update ReadMe and test for cpu_offloading by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1013
  • Adding amdsmi for AMD gpus by @citrix123 in https://github.com/vllm-project/llm-compressor/pull/1018
  • CompressionLogger add time units by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1026
  • patch_tied_tensors_bug: support malformed model definitions by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1014
  • Add: 2of4 example with/without fp8 quantization by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/1033
  • Remove unnecessary step in 2of4 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1034
  • Remove Neural Magic copyright from files by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/992
  • VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/914
  • [E2E Testing] KV-Cache by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1004
  • [E2E Testing] Add recipe check vllm e2e by @horheynm in https://github.com/vllm-project/llm-compressor/pull/929
  • [MoE] GPTQ compress using callback not hook by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1049
  • Explicit dataset tokenizer text kwarg by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1031
  • Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1015
  • [Test Fix] Quant model reload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/974
  • Remove old examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1062
  • VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1065
  • Add tests for "examples/sparse2of4[...]" by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/1067
  • VLM Image Examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/1064
  • Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1066
  • [KV Cache] kv-cache end to end unit tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/141
  • [E2E Testing] Fix HF upload by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1061
  • [Test Fix] Fix/update test_run_compressed by @horheynm in https://github.com/vllm-project/llm-compressor/pull/970
  • Revert "[Test Fix] Fix/update testruncompressed" by @mgoin in https://github.com/vllm-project/llm-compressor/pull/1071
  • Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1073
  • [Test Patch] Remove redundant code for "Fix/update testruncompressed" by @horheynm in https://github.com/vllm-project/llm-compressor/pull/1072
  • bump; set ct version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/1076

New Contributors

  • @citrix123 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/1018

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.1...0.4.0

- Python
Published by dhuangnm about 1 year ago

llmcompressor - v0.3.1

What's Changed

  • BLOOM Default Smoothquant Mappings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/906
  • [SparseAutoModelForCausalLM Deprecation] Feature change by @horheynm in https://github.com/vllm-project/llm-compressor/pull/881
  • Correct "dyanmic" typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/888
  • Explicit defaults for QuantizationModifier targets by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/889
  • [SparseAutoModelForCausalLM Deprecation] Update examples by @horheynm in https://github.com/vllm-project/llm-compressor/pull/880
  • Support pack_quantized format for nonuniform mixed-precision by @mgoin in https://github.com/vllm-project/llm-compressor/pull/913
  • Actually make the run_compressed test useful by @dsikka in https://github.com/vllm-project/llm-compressor/pull/920
  • Fix for e2e tests by @horheynm in https://github.com/vllm-project/llm-compressor/pull/927
  • [Bugfix] Correct metrics calculations by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/878
  • Update kv_cache example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/921
  • [1/2] Expand e2e testing to prepare for lm-eval by @dsikka in https://github.com/vllm-project/llm-compressor/pull/922
  • Update pytest command to capture results to file by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/932
  • [Bugfix] DisableKVCache Context by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/834
  • Add helpful info to the marlin-24 example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/946
  • Remove requires_torch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/949
  • Remove unused sparseml.export utilities by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/950
  • Implement HooksMixin by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/917
  • Add LM Eval Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/945
  • update version by @dsikka in https://github.com/vllm-project/llm-compressor/pull/969

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.3.0...0.3.1

- Python
Published by dhuangnm about 1 year ago

llmcompressor - v0.3.0

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, enabling better support and integration for Hugging Face models that are not based on AutoModelForCausalLM.
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
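The sequential-updating idea behind the first feature above can be sketched conceptually: each layer is quantized in order, and later layers are calibrated against the activations produced by the already-quantized earlier layers, so accumulated quantization error can be compensated rather than silently compounding. The toy scalar "model" and uniform-grid quantizer below are illustrative inventions, not the llmcompressor API (the real GPTQ implementation works on weight matrices and uses Hessian information from calibration data):

```python
def quantize(w, step=0.25):
    """Toy quantizer: snap a scalar to the nearest multiple of `step`."""
    return round(w / step) * step

def forward(weights, x):
    """Toy 'model': a chain of scalar multiplications (one per layer)."""
    for w in weights:
        x = w * x
    return x

def sequential_quantize(weights, x0):
    """Quantize layer by layer, propagating quantized activations.

    Each layer picks the grid point that best reproduces the
    full-precision activation given the quantized input it will
    actually receive -- the essence of sequential updating.
    """
    quantized = []
    x_full, x_q = x0, x0          # full-precision vs quantized-path activation
    for w in weights:
        x_full = w * x_full       # reference full-precision path
        q = quantize(x_full / x_q)  # compensate error accumulated so far
        quantized.append(q)
        x_q = q * x_q             # later layers calibrate on this output
    return quantized

weights = [1.1, 1.1, 1.1]
naive = [quantize(w) for w in weights]       # each layer rounded in isolation
seq = sequential_quantize(weights, 1.0)      # error-compensating order
print(naive, forward(naive, 1.0))            # [1.0, 1.0, 1.0] 1.0
print(seq, forward(seq, 1.0))                # [1.0, 1.25, 1.0] 1.25
```

With the full-precision output at 1.1³ ≈ 1.331, the sequential pass lands closer (1.25) than naive per-layer rounding (1.0), because the second layer's quantization target absorbed the first layer's rounding error.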

Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

What's Changed

  • Fix compresed typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/188
  • GPTQ Quantized-weight Sequential Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/177
  • Add: targets and ignore inference for sparse compression by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/191
  • switch tests from weekly to nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/658
  • Compression wrapper abstract methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/170
  • Explicitly set sequential_update in examples by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/187
  • Increase Sparsity Threshold for compressors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/679
  • Add a generic wrap_hf_model_class utility to support VLMs by @mgoin in https://github.com/vllm-project/llm-compressor/pull/185
  • Add tests for examples by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/149
  • Rename to quantization config by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/730
  • Implement Missing Modifier Methods by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/166
  • Fix 2/4 GPTQ Model Tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/769
  • SmoothQuant mappings tutorial by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/115
  • Fix import of ModelCompressor by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/776
  • update test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/773
  • [Bugfix] Fix saving offloaded state dict by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/172
  • Auto-Infer mappings Argument for SmoothQuantModifier Based on Model Architecture by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/119
  • Update workflows/actions by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/774
  • [Bugfix] Prepare KD Models when Saving by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/174
  • Set Sparse compression to save_compressed by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/821
  • Install compressed-tensors after llm-compressor by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/825
  • Fix test typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/828
  • Add AutoModelForCausalLM example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/698
  • [Bugfix] Workaround tied tensors bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/659
  • Only untie word embeddings by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/839
  • Check for config hidden size by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/840
  • Use float32 for Hessian dtype by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/847
  • GPTQ: Deprecate non-sequential update option by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/762
  • Typehint nits by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/826
  • [ DOC ] Remove version restrictions in W8A8 example by @miaojinc in https://github.com/vllm-project/llm-compressor/pull/849
  • Fix inconsistency in example config of 2:4 sparse quantization by @yzlnew in https://github.com/vllm-project/llm-compressor/pull/80
  • Fix forward function pass call by @dsikka in https://github.com/vllm-project/llm-compressor/pull/845
  • [Bugfix] Use weight parameter of linear layer by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/836
  • [Bugfix] Rename files to remove colons by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/846
  • cover all 3.9-3.12 in commit testing by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/864
  • Add marlin-24 recipe/configs for e2e testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/866
  • [Bugfix] onload during sparsity calculation by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/862
  • Fix HFTrainer overloads by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/869
  • Support Model Offloading Tied Tensors Patch by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/872
  • Add advice about dealing with non-invertable hessians by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/875
  • seed commit workflow by @andy-neuma in https://github.com/vllm-project/llm-compressor/pull/877
  • [Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier by @dsikka in https://github.com/vllm-project/llm-compressor/pull/837
  • Bugfix observer initialization in gptq_wrapper by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/883
  • BugFix: Fix Sparsity Reload Testing by @dsikka in https://github.com/vllm-project/llm-compressor/pull/882
  • Use custom unique test names for e2e tests by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/892
  • Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in https://github.com/vllm-project/llm-compressor/pull/893
  • Move config["testconfig_path"] assignment by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/895
  • Cap accelerate version to avoid bug by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/897
  • Fix observing offloaded weight by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/896
  • Update image in README.md by @mgoin in https://github.com/vllm-project/llm-compressor/pull/861
  • update accelerate version by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/899
  • [GPTQ] Iterative Parameter Updating by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/863
  • Small fixes for release by @dsikka in https://github.com/vllm-project/llm-compressor/pull/901
  • use smaller portion of dataset by @dsikka in https://github.com/vllm-project/llm-compressor/pull/902
  • Update example to not fail hessian inversion by @dsikka in https://github.com/vllm-project/llm-compressor/pull/904
  • Bump version to 0.3.0 by @dsikka in https://github.com/vllm-project/llm-compressor/pull/907

New Contributors

  • @miaojinc made their first contribution in https://github.com/vllm-project/llm-compressor/pull/849
  • @yzlnew made their first contribution in https://github.com/vllm-project/llm-compressor/pull/80
  • @andy-neuma made their first contribution in https://github.com/vllm-project/llm-compressor/pull/877

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.2.0...0.3.0

- Python
Published by dhuangnm over 1 year ago

llmcompressor - v0.2.0

What's Changed

  • Correct Typo in SparseAutoModelForCausalLM docstring by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/56
  • Disable Default Bitmask Compression by @Satrat in https://github.com/vllm-project/llm-compressor/pull/60
  • TRL Example fix by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/59
  • Fix typo by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/63
  • Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/61
  • correct import in README.md by @zzc0430 in https://github.com/vllm-project/llm-compressor/pull/66
  • Fix for issue #43 -- starcoder model by @horheynm in https://github.com/vllm-project/llm-compressor/pull/71
  • Update README.md by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/74
  • Layer by Layer Sequential GPTQ Updates by @Satrat in https://github.com/vllm-project/llm-compressor/pull/47
  • [ Docs ] Update main readme by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/77
  • [ Docs ] gemma2 examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/78
  • [ Docs ] Update FP8 example to use dynamic per token by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/75
  • [ Docs ] Overhaul accelerate user guide by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/76
  • Support kv_cache_scheme for quantizing KV Cache by @mgoin in https://github.com/vllm-project/llm-compressor/pull/88
  • Propagate trust_remote_code Argument by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/90
  • Fix for issue #81 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/84
  • Fix for issue 83 by @horheynm in https://github.com/vllm-project/llm-compressor/pull/85
  • [ DOC ] Big Model Example by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/99
  • Enable obcq/finetune integration tests with commit cadence by @dsikka in https://github.com/vllm-project/llm-compressor/pull/101
  • metric logging on GPTQ path by @horheynm in https://github.com/vllm-project/llm-compressor/pull/65
  • Update test config files by @dsikka in https://github.com/vllm-project/llm-compressor/pull/97
  • remove workflows + update runners by @dsikka in https://github.com/vllm-project/llm-compressor/pull/103
  • metrics by @horheynm in https://github.com/vllm-project/llm-compressor/pull/104
  • add debug by @horheynm in https://github.com/vllm-project/llm-compressor/pull/108
  • Add FP8 KV Cache quant example by @mgoin in https://github.com/vllm-project/llm-compressor/pull/113
  • Add vLLM e2e tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/117
  • Fix style, fix noqa by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/123
  • GPTQ Algorithm Cleanup by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/120
  • GPTQ Activation Ordering by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/94
  • demote recipe string initialization to debug and make more descriptive by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/116
  • compressed-tensors main dependency for base-tests by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/125
  • Set ready label for transformer tests; add message reminder on PR opened by @dsikka in https://github.com/vllm-project/llm-compressor/pull/126
  • Fix markdown check test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/127
  • Naive Run Compressed Pt. 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/62
  • Fix transformer test conditions by @dsikka in https://github.com/vllm-project/llm-compressor/pull/131
  • Run Compressed Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/132
  • Correct typo by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/124
  • Activation Ordering Strategies by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/121
  • Fix README Issue by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/139
  • update by @dsikka in https://github.com/vllm-project/llm-compressor/pull/143
  • Update finetune and oneshot tests by @dsikka in https://github.com/vllm-project/llm-compressor/pull/114
  • Validate Recipe Parsing Output by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/100
  • fix build error for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/145
  • Fix recipe nested in configs by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/140
  • MOE example with warning by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/87
  • Bug Fix: recipe stages were not being concatenated by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/150
  • fix package name bug for nightly by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/155
  • Add descriptions for pytest marks by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/156
  • Fix Sparsity Unit Test by @Satrat in https://github.com/vllm-project/llm-compressor/pull/153
  • Fix: Error during model saving with shared tensors by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/158
  • Update 2:4 Examples by @dsikka in https://github.com/vllm-project/llm-compressor/pull/161
  • DeepSeek: Fix Hessian Estimation by @Satrat in https://github.com/vllm-project/llm-compressor/pull/157
  • bump up main to 0.2.0 by @dhuangnm in https://github.com/vllm-project/llm-compressor/pull/163
  • Fix help dialogue by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/151
  • Add MoE and Compressed Inference Examples by @Satrat in https://github.com/vllm-project/llm-compressor/pull/160
  • Separate trust_remote_code args by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/152
  • Enable a skipped finetune test by @dsikka in https://github.com/vllm-project/llm-compressor/pull/169
  • Fix filename in example command by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/173
  • Add DeepSeek V2.5 Example by @dsikka in https://github.com/vllm-project/llm-compressor/pull/171
  • fix quality by @dsikka in https://github.com/vllm-project/llm-compressor/pull/176
  • Patch log function name in gptq by @kylesayrs in https://github.com/vllm-project/llm-compressor/pull/168
  • README for Modifiers by @Satrat in https://github.com/vllm-project/llm-compressor/pull/165
  • Fix default for sequential updates by @dsikka in https://github.com/vllm-project/llm-compressor/pull/186
  • fix default test case by @dsikka in https://github.com/vllm-project/llm-compressor/pull/193
  • Fix Initalize typo by @Imss27 in https://github.com/vllm-project/llm-compressor/pull/190
  • Update MoE examples by @mgoin in https://github.com/vllm-project/llm-compressor/pull/192

New Contributors

  • @zzc0430 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/66
  • @horheynm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/71
  • @dsikka made their first contribution in https://github.com/vllm-project/llm-compressor/pull/101
  • @dhuangnm made their first contribution in https://github.com/vllm-project/llm-compressor/pull/145
  • @Imss27 made their first contribution in https://github.com/vllm-project/llm-compressor/pull/190

Full Changelog: https://github.com/vllm-project/llm-compressor/compare/0.1.0...0.2.0

- Python
Published by dhuangnm over 1 year ago

llmcompressor - v0.1.0

What's Changed

  • Address Test Failures by @Satrat in https://github.com/vllm-project/llm-compressor/pull/1
  • Remove SparseZoo Usage by @Satrat in https://github.com/vllm-project/llm-compressor/pull/2
  • SparseML Cleanup by @markurtz in https://github.com/vllm-project/llm-compressor/pull/6
  • Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/7
  • Add FP8 Support by @Satrat in https://github.com/vllm-project/llm-compressor/pull/4
  • Fix Weekly Test Failure by @Satrat in https://github.com/vllm-project/llm-compressor/pull/8
  • Add Scheme UX for QuantizationModifier by @Satrat in https://github.com/vllm-project/llm-compressor/pull/9
  • Add Group Quantization Test Case by @Satrat in https://github.com/vllm-project/llm-compressor/pull/10
  • Loguru logging standardization for LLM Compressor by @markurtz in https://github.com/vllm-project/llm-compressor/pull/11
  • Clarify Function Names for Logging by @Satrat in https://github.com/vllm-project/llm-compressor/pull/12
  • [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/5
  • Update setup.py by @robertgshaw2-neuralmagic in https://github.com/vllm-project/llm-compressor/pull/15
  • SmoothQuant Mapping Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/13
  • Initial README by @bfineran in https://github.com/vllm-project/llm-compressor/pull/3
  • [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/19
  • [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in https://github.com/vllm-project/llm-compressor/pull/20
  • Port SparseML Remote Code Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/21
  • Update Quantization Save Defaults by @Satrat in https://github.com/vllm-project/llm-compressor/pull/22
  • [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/26
  • GPTQ - move calibration of quantization params to after hessian calibration by @bfineran in https://github.com/vllm-project/llm-compressor/pull/25
  • Fix typos by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/31
  • Remove ceiling from datasets dep by @mgoin in https://github.com/vllm-project/llm-compressor/pull/27
  • Revert naive compression format by @Satrat in https://github.com/vllm-project/llm-compressor/pull/32
  • Fix layerwise targets by @Satrat in https://github.com/vllm-project/llm-compressor/pull/36
  • Move Weight Update Out Of Loop by @Satrat in https://github.com/vllm-project/llm-compressor/pull/40
  • Fix End Epoch Default by @Satrat in https://github.com/vllm-project/llm-compressor/pull/39
  • Fix typos in example for w8a8 quant by @eldarkurtic in https://github.com/vllm-project/llm-compressor/pull/38
  • Model Offloading Support Pt 2 by @Satrat in https://github.com/vllm-project/llm-compressor/pull/34
  • set version to 1.0.0 for release by @bfineran in https://github.com/vllm-project/llm-compressor/pull/44
  • Update version for first release by @markurtz in https://github.com/vllm-project/llm-compressor/pull/50
  • BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/51
  • Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/52
  • Fix Failing Transformers Tests by @Satrat in https://github.com/vllm-project/llm-compressor/pull/53
  • Offloading Bug Fix by @Satrat in https://github.com/vllm-project/llm-compressor/pull/58

New Contributors

  • @markurtz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/6
  • @bfineran made their first contribution in https://github.com/vllm-project/llm-compressor/pull/3
  • @dbogunowicz made their first contribution in https://github.com/vllm-project/llm-compressor/pull/20
  • @eldarkurtic made their first contribution in https://github.com/vllm-project/llm-compressor/pull/31
  • @mgoin made their first contribution in https://github.com/vllm-project/llm-compressor/pull/27
  • @dbarbuzzi made their first contribution in https://github.com/vllm-project/llm-compressor/pull/52

Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0

- Python
Published by dhuangnm over 1 year ago