Recent Releases of kvpress
kvpress - v0.2.9
What's Changed
- Refactor evaluation by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/96
- Fix QFilters and DuoAttention when used with wrapper presses by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/97
- Add HuggingFace leaderboard by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/98
- Fix links in benchmarks directory by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/101
- Add KVzipPress by @Janghyun1230 in https://github.com/NVIDIA/kvpress/pull/93
- Test head-wise compression by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/103
- Run backbone model only for prefill by @giulio98 in https://github.com/NVIDIA/kvpress/pull/100
- Transformers compatibility + evaluation by @alessiodevoto in https://github.com/NVIDIA/kvpress/pull/105
Full Changelog: https://github.com/NVIDIA/kvpress/compare/v0.2.8...v0.2.9
- Python
Published by alessiodevoto 7 months ago
kvpress - v0.2.8
What's Changed
🐛 Bug Fixes
- Fix failing tests by @maxjeblick in https://github.com/NVIDIA/kvpress/pull/94
Reverts changes to CriticalKVPress performed in #90 that caused the press to initialize incorrectly. The PR also fixes some test logic.
Full Changelog: https://github.com/NVIDIA/kvpress/compare/v0.2.7...v0.2.8
- Python
Published by maxjeblick 8 months ago
kvpress - v0.2.7
What's Changed
🐛 Bug Fixes
- Fix FinchPress for the Qwen model family by @alessiodevoto in #82. Resolves compatibility issues with the Qwen model architecture in FinchPress compression.
✨ New Features
- Add KeyDiffPress and BlockPress by @figuremout in #86. Introduces new compression methods based on key difference analysis.
- Fix for Qwen with Yarn by @giulio98 in #85. Enables Yarn scaling in FinchPress and KeyRerotationPress.
📚 Documentation & Maintenance
- Improve documentation by @maxjeblick in #90. Adds docstrings to all presses, with their corresponding parameters and paper references.
- Add @alessiodevoto to authors by @maxjeblick in #92 🚀
Full Changelog: https://github.com/NVIDIA/kvpress/compare/v0.2.6...v0.2.7
- Python
Published by maxjeblick 8 months ago
kvpress - v0.2.1
- Add ChunkPress, #40 by @maxjeblick and @giulio98
- Update README, including new huggingface space, #41 and #42 by @SimJeg
- Python
Published by SimJeg about 1 year ago
kvpress - v0.2.0
Transformers v4.48 introduced breaking changes handled in this release. The release also features AdaKVPress, the first press allowing head-wise compression by patching the attention functions registered in ALL_ATTENTION_FUNCTIONS since v4.48. When combined with ExpectedAttentionPress, AdaKVPress achieved the best results observed yet on the RULER benchmark (see this post).
- Add AdaKVPress, #38 by @SimJeg and @FFY0
- Handle transformers 4.48, #39 by @SimJeg
- Add InfiniteBench results, #11 by @maxjeblick
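To picture what head-wise compression means, here is a toy sketch in plain Python. It assumes per-head importance scores are already available and keeps the globally highest-scoring entries across all heads, so each head can end up with a different number of kept tokens. The function name `headwise_keep_mask` and its inputs are hypothetical. This is not the AdaKVPress implementation, and it leaves out the attention-function patching mentioned above.

```python
def headwise_keep_mask(scores, compression_ratio):
    """Return a per-head boolean mask of the KV entries to keep.

    scores[h][t] is a (hypothetical) importance score of token t for head h.
    The budget is shared across heads instead of being split evenly.
    """
    n_heads, n_tokens = len(scores), len(scores[0])
    n_kept = int(n_heads * n_tokens * (1 - compression_ratio))

    # Flatten to (score, head, token) triples and rank them globally.
    flat = [(scores[h][t], h, t) for h in range(n_heads) for t in range(n_tokens)]
    flat.sort(key=lambda item: item[0], reverse=True)

    mask = [[False] * n_tokens for _ in range(n_heads)]
    for _, h, t in flat[:n_kept]:
        mask[h][t] = True
    return mask


# Two heads, four tokens: head 0 has generally higher scores than head 1.
toy_scores = [
    [0.9, 0.8, 0.7, 0.1],
    [0.2, 0.6, 0.05, 0.0],
]
mask = headwise_keep_mask(toy_scores, compression_ratio=0.5)
kept_per_head = [sum(row) for row in mask]
```

With these toy scores the overall compression ratio is still 0.5, but the kept entries are not split evenly between the two heads.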
- Python
Published by SimJeg about 1 year ago
kvpress - v0.1.1
What's Changed
- https://github.com/NVIDIA/kvpress/pull/33 by @SimJeg fixes a small bug in the pipeline
- https://github.com/NVIDIA/kvpress/pull/36 by @maxjeblick sets transformers <4.48 as a dependency
Full Changelog: https://github.com/NVIDIA/kvpress/compare/v0.1.0...v0.1.1
- Python
Published by maxjeblick about 1 year ago
kvpress - v0.1.0
#24 by @maxjeblick and #29 by @SimJeg introduce a non-breaking refactoring:
- a press does not require the compression_ratio input argument anymore, as some presses do not explicitly require it (e.g. ThinKPress, SimLayerKVPress). However, every press must have a compression_ratio attribute after any forward pass (assertion added in tests) to allow average compression ratio measurement on a benchmark
- the core compression logic has been moved from BasePress.forward_hook to BasePress.compress. BasePress.forward_hook now only checks whether compress must be called (pre-filling vs decoding), de-quantizes the cache before compress and re-quantizes it afterwards
- the BasePress does not implement a score method anymore; this has been moved to the ScorerPress with the associated ScorerPress.compress method
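The division of responsibilities described above can be sketched with toy classes in plain Python. All names here (`ToyBasePress`, `ToyScorerPress`, `ToyMagnitudePress`) and the list-based cache are hypothetical stand-ins, not the kvpress classes or their signatures, and cache quantization handling is omitted.

```python
class ToyBasePress:
    """Toy base class: the hook decides *whether* to compress, not *how*."""

    def compress(self, keys, values):
        raise NotImplementedError

    def forward_hook(self, keys, values, is_prefill):
        # Compression only applies during pre-filling; decoding passes through.
        if not is_prefill:
            return keys, values
        return self.compress(keys, values)


class ToyScorerPress(ToyBasePress):
    """Toy score-based press: subclasses provide score(), compress() prunes."""

    def __init__(self, compression_ratio=0.0):
        self.compression_ratio = compression_ratio

    def score(self, keys, values):
        raise NotImplementedError

    def compress(self, keys, values):
        scores = self.score(keys, values)
        n_kept = int(len(keys) * (1 - self.compression_ratio))
        # Indices of the n_kept highest scores, restored to sequence order.
        ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
        kept = sorted(ranked[:n_kept])
        return [keys[i] for i in kept], [values[i] for i in kept]


class ToyMagnitudePress(ToyScorerPress):
    """Hypothetical scoring rule: prefer keys with small magnitude."""

    def score(self, keys, values):
        return [-abs(k) for k in keys]


press = ToyMagnitudePress(compression_ratio=0.5)
keys, values = [3.0, -1.0, 0.5, 4.0], ["a", "b", "c", "d"]
prefill_out = press.forward_hook(keys, values, is_prefill=True)
decode_out = press.forward_hook(keys, values, is_prefill=False)
```

In this sketch, a press that does not rely on scores would subclass `ToyBasePress` directly and override `compress`, mirroring the split between the base press and the scorer press.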
Other features:
- Add SimLayerKVPress, #28 by @SimJeg and @dame-cell
- Add ComposedPress, #29 by @SimJeg
- Add KeyReRotationPress, #31 by @maxjeblick and @giulio98
- Fix QuantizedCache, #30 by @maxjeblick
- Add new tests, including an integration test on a sample from RULER
- Python
Published by SimJeg about 1 year ago
kvpress -
- Update speed and memory plots, #10 by @maxjeblick
- Add TOVAPress, #12 by @SimJeg
- Python
Published by SimJeg about 1 year ago
kvpress - Release v0.0.2
- Add support for QuantizedCache, #5 by @SimJeg
- Add colab demo notebook, #6 by @maxjeblick
- Python
Published by SimJeg over 1 year ago