Recent Releases of https://github.com/cvs-health/uqlm
https://github.com/cvs-health/uqlm - v0.2.6
Highlights
- Remove unused attributes in UQEnsemble that were creating a bug with LLMPanel.score
- Fix alignment in uqlm.utils.plot_model_accuracies function and enable displaying sample sizes as percentages
What's Changed
- Refactor plot model accuracies by @mohitcek in https://github.com/cvs-health/uqlm/pull/157
- Patch release: v0.2.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/159
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.5...v0.2.6
- Python
Published by dylanbouchard 6 months ago
https://github.com/cvs-health/uqlm - v0.2.5
Highlights
- Add missing num_responses parameter to generate_candidate_responses method in BlackBoxUQ, SemanticEntropy, and UQEnsemble.
- Add missing fields/links to pyproject.toml
What's Changed
- v0.2.4 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/154
- Add attribute num_responses by @mohitcek in https://github.com/cvs-health/uqlm/pull/155
- Patch release: v0.2.5 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/156
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.4...v0.2.5
- Python
Published by dylanbouchard 6 months ago
https://github.com/cvs-health/uqlm - v0.2.4
Highlights
- Enable specification of LLM Judge scoring templates in UQEnsemble with scoring_templates argument.
- Enable specification of postprocessed response return options in UQEnsemble: return only raw responses, return only postprocessed responses, or return both.
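The three return options can be pictured with a small sketch. This is not uqlm's implementation: select_responses and the output keys are hypothetical, and the option names 'raw', 'postprocessed', and 'all' are borrowed from the v0.2.3 notes further down.

```python
from typing import Callable, Dict, List


def select_responses(
    raw: List[str],
    postprocessor: Callable[[str], str],
    return_responses: str = "all",
) -> Dict[str, List[str]]:
    """Hypothetical helper: pick which response versions to hand back."""
    if return_responses not in {"raw", "postprocessed", "all"}:
        raise ValueError(f"unsupported option: {return_responses!r}")
    postprocessed = [postprocessor(r) for r in raw]
    result: Dict[str, List[str]] = {}
    if return_responses in {"raw", "all"}:
        result["raw_responses"] = raw  # key name is an assumption
    if return_responses in {"postprocessed", "all"}:
        result["responses"] = postprocessed  # key name is an assumption
    return result


raw = ["  The answer is 42. ", "42"]
both = select_responses(raw, str.strip, "all")
only_raw = select_responses(raw, str.strip, "raw")
```

With 'all', both versions come back side by side, which is handy when checking what a postprocessor actually changed.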
What's Changed
- Enable different postprocessing return options and judge scoring templates with UQEnsemble by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/145
- Patch release: v0.2.4 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/146
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.3...v0.2.4
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.2.3
Highlights
- Replaces use of bert_score.score with bert_score.BERTScorer.score for a ~43x speedup. While the former (old approach) re-checks and re-assigns torch.device with each use of score, the latter (updated approach) assigns torch.device only once during instantiation.
- Creates the option for users to specify whether they want only postprocessed responses, only raw responses, or both versions when they specify a postprocessor. This applies to BlackBoxUQ, UQEnsemble, and SemanticEntropy. To do so, users can respectively specify 'postprocessed', 'raw', or 'all' in the 'return_responses' argument in the constructor of these classes. By default, 'all' is specified.
- [black] is removed where specified in rich print statements to avoid inconsistent colors in progress bars.
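The speedup in the first highlight comes from doing setup once instead of on every call. The sketch below illustrates that pattern with a toy stand-in: Scorer and score_function are hypothetical names, the similarity measure is invented for the example, and nothing here calls bert_score itself.

```python
class Scorer:
    """Stand-in for a scorer whose setup (e.g. device selection) is costly."""

    setup_calls = 0  # counts how often the expensive setup runs

    def __init__(self) -> None:
        Scorer.setup_calls += 1  # pretend this resolves the device and loads a model

    def score(self, candidate: str, reference: str) -> float:
        # Toy similarity: share of reference tokens also found in the candidate.
        cand, ref = set(candidate.split()), set(reference.split())
        return len(cand & ref) / len(ref) if ref else 0.0


def score_function(candidate: str, reference: str) -> float:
    """Function-style API: repeats the setup on every call."""
    return Scorer().score(candidate, reference)


pairs = [
    ("paris is the capital", "the capital is paris"),
    ("it is rome", "the capital is paris"),
]

Scorer.setup_calls = 0
per_call = [score_function(c, r) for c, r in pairs]
setup_per_call = Scorer.setup_calls  # one setup per pair

Scorer.setup_calls = 0
scorer = Scorer()
reused = [scorer.score(c, r) for c, r in pairs]
setup_reused = Scorer.setup_calls  # a single setup in total
```

Both paths return the same scores; only the number of setups differs, and that gap grows with the number of scored pairs.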
What's Changed
- v0.2.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/140
- use bert_score class rather than function for 43x speedup by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/141
- Enable different handling of raw vs postprocessed responses by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/143
- Patch release: v0.2.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/144
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.2...v0.2.3
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.2.2
Highlights
- improved handling of missing logprobs
- adds warning when logprobs missing
- removes benign transformers warning for NLI instantiation and BERTScore scoring
- Add flaky and skip logic for unit tests to avoid benign failures
- Fix escape character usage in Judge prompt
- Update version of LangChain per Dependabot suggestion
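As a rough illustration of the first two highlights, a white-box score can degrade gracefully when logprobs are absent. The notes do not say how uqlm handles this case, so the function name, the NaN fallback, and the warning text below are all assumptions.

```python
import math
import warnings
from typing import List, Optional


def normalized_probability(logprobs: Optional[List[float]]) -> float:
    """Geometric mean of token probabilities, or NaN if logprobs are missing."""
    if not logprobs:
        # Assumed behavior: warn and return NaN instead of raising.
        warnings.warn("logprobs missing; returning NaN for this response", UserWarning)
        return math.nan
    return math.exp(sum(logprobs) / len(logprobs))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = [normalized_probability(lp) for lp in ([-0.1, -0.3], None)]
```

The first response gets a regular score, while the second yields NaN plus a single warning rather than an exception.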
What's Changed
- Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/118
- Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/128
- v0.2.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/131
- Bump langchain from 0.3.26 to 0.3.27 by @dependabot[bot] in https://github.com/cvs-health/uqlm/pull/121
- Db/missing logprobs unittest skip by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/132
- Add missing logprobs warning and update version by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/133
- fix escape character in judge prompt by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/134
- Fix logprobs syntax error and judge prompt escape character by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/137
- suppress benign transformers warnings by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/138
- Patch release: v0.2.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/139
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.1...v0.2.2
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.2.1
Highlights
- If an exception is raised during generation (e.g. RateLimitError), the progress bar is stopped to avoid LiveError upon retry.
- Fix BERTScore printed text
- Fix Ensemble diagram for dark mode
- Fixes missing max_calls_per_min being passed to LLMPanel constructor inside of UQEnsemble. After this fix, max_calls_per_min will be applied to ensemble judges as well.
- Add flaky retry logic using @pytest.mark.flaky(retries=3) to tests that fail due to network issues related to HuggingFace.
- Fix handling of missing logprobs with multiple responses in UQEnsemble
What's Changed
- Patch release: v0.2.1 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/129
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.0...v0.2.1
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.2.0
These release notes are for minor release v0.2.0.
New Features
1. Progress bars with rich
This feature enables the use of progress bars when generating LLM responses, scoring responses, and tuning ensemble weights. This feature introduced rich and ipywidgets as new dependencies.
By default, progress bars are turned on, but users can turn them off by setting show_progress_bars=False in generate_and_score, score, and tune methods for the scorer classes. Below is a screenshot illustrating the use of rich progress bars with the UQEnsemble.tune method:
2. Ensemble weights printing
After running the UQEnsemble.tune method, ensemble weights are now printed in a pretty table using rich. Ensemble weights are sorted from highest to lowest. See the above screenshot for an example. Users can also display this table with an already tuned ensemble using the UQEnsemble.print_weights method.
3. Support for Python 3.13
As of v0.2.0, uqlm can now be used with Python 3.13. All previous functionality is supported except for bleurt, which is not compatible with Python 3.13.
4. Ensemble saving and loading
UQEnsemble now offers two new methods: save_config and load_config. These methods provide user-friendly saving and loading of the ensemble's scorer components and weights.
Example use of ensemble saving:
python
uqe_tuned_config_file = "uqe_config_tuned.json"
uqe.save_config(uqe_tuned_config_file)
Example use of ensemble loading:
python
loaded_ensemble = UQEnsemble.load_config("uqe_config_tuned.json")
These methods make storing a tuned ensemble an easier process for later use.
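To make the idea concrete, here is a minimal standalone sketch of a JSON round trip for scorer names and weights. The file layout, the two helper functions, and the example component names are assumptions for illustration; the actual format written by UQEnsemble.save_config may differ.

```python
import json
import os
import tempfile
from typing import List, Tuple


def save_config(path: str, components: List[str], weights: List[float]) -> None:
    """Write scorer names and weights to a JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"components": components, "weights": weights}, f, indent=2)


def load_config(path: str) -> Tuple[List[str], List[float]]:
    """Read back the scorer names and weights written by save_config."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return config["components"], config["weights"]


# Illustrative values only.
components = ["exact_match", "noncontradiction", "min_probability"]
weights = [0.5, 0.3, 0.2]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "uqe_config_tuned.json")
    save_config(path, components, weights)
    loaded_components, loaded_weights = load_config(path)
```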
5. Token-probability-based Semantic Entropy
The SemanticEntropy class now supports token-probability-based estimates of semantic entropy and associated confidence scores. Note that attribute names in the returned object and column names in the associated dataframe have changed from those in v0.1.
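One common formulation of token-probability-based semantic entropy weights each sampled response by its sequence probability, sums those weights within each semantic cluster, normalizes, and takes the entropy over clusters. The sketch below follows that formulation under the assumption that cluster assignments are already known; it is not taken from the SemanticEntropy class, and the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def semantic_entropy(cluster_ids: List[int], seq_logprobs: List[float]) -> float:
    """Entropy over semantic clusters, weighting responses by sequence probability."""
    mass: Dict[int, float] = defaultdict(float)
    for cid, lp in zip(cluster_ids, seq_logprobs):
        mass[cid] += math.exp(lp)  # accumulate probability mass per cluster
    total = sum(mass.values())
    return -sum((m / total) * math.log(m / total) for m in mass.values())


# Three sampled responses: two share a meaning (cluster 0), one differs (cluster 1).
entropy = semantic_entropy(
    [0, 0, 1], [math.log(0.4), math.log(0.4), math.log(0.2)]
)
# All responses in one cluster: no semantic disagreement.
no_disagreement = semantic_entropy([0, 0], [math.log(0.6), math.log(0.4)])
```

Lower entropy means the sampled responses concentrate on one meaning, which is usually read as higher confidence.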
Breaking Changes
1. BLEURT Deprecation
This release deprecates BLEURT as a black-box scorer. The following code will now produce errors:
* Use of uqlm.black_box.BLEURTScorer
* Use of "bleurt" in uqlm.scorers.BlackBoxUQ scorers parameter
* Use of "bleurt" in uqlm.scorers.UQEnsemble scorers parameter
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.1.9
What's Changed
- Patch/v0.1.9 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/105
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.8...v0.1.9
- Python
Published by dylanbouchard 7 months ago
https://github.com/cvs-health/uqlm - v0.1.8
Highlights
- update version of pillow per Dependabot security alert
What's Changed
- patch release: v0.1.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/77
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.7...v0.1.8
- Python
Published by dylanbouchard 8 months ago
https://github.com/cvs-health/uqlm - v0.1.7
Highlights
- Fixes bug related to floating point precision causing ensemble score greater than 1 (1.00000002). This was throwing an error when certain tuner metrics were being computed. Patched with np.clip.
- Allow use of brier_score and average_precision with Tuner and UQEnsemble
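The clipping fix and the Brier score can both be shown in a few lines of plain Python. This is a sketch rather than the library code: clip mimics np.clip for a single float, and brier_score uses the standard mean-squared-error definition, not necessarily the implementation used by Tuner.

```python
from typing import List


def clip(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Plain-Python counterpart of np.clip for a single float."""
    return max(low, min(high, value))


def brier_score(confidences: List[float], labels: List[int]) -> float:
    """Mean squared difference between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(confidences, labels)) / len(labels)


ensemble_score = 1.00000002  # the out-of-range value quoted in the note above
clipped = clip(ensemble_score)

# Confidence scores for three responses, with 1 = correct and 0 = incorrect.
score = brier_score([clip(1.00000002), 0.5, 0.0], [1, 0, 0])
```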
What's Changed
- v0.1.6 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/68
- New metrics by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/69
- Patch/v0.1.7 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/70
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.6...v0.1.7
- Python
Published by dylanbouchard 8 months ago
https://github.com/cvs-health/uqlm - v0.1.6
Highlights
- Add missing unit tests
- Update version of urllib3 per Dependabot security alert
What's Changed
- Additional unit tests for ResponseGenerator class by @zeya30 in https://github.com/cvs-health/uqlm/pull/58
- Additional unit tests for UncertaintyQuantifier class by @zeya30 in https://github.com/cvs-health/uqlm/pull/60
- Improving coverage for unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/53
- Additional unit tests for LLM Panel class by @zeya30 in https://github.com/cvs-health/uqlm/pull/62
- Unit tests Tuner class by @mohitcek in https://github.com/cvs-health/uqlm/pull/63
- v0.1.5 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/65
- Patch release: v0.1.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/66
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.5...v0.1.6
- Python
Published by dylanbouchard 8 months ago
https://github.com/cvs-health/uqlm - v0.1.5
Highlights
- add missing unit tests to achieve 100% code coverage
- implement auto-linting/formatting with ruff
- reduce Tuner (and UQEnsemble.tune) latency (no API changes)
- allow likert option for judges
What's Changed
- v0.1.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/34
- adding Likert scale scoring for LLMJudge class by @zeya30 in https://github.com/cvs-health/uqlm/pull/36
- Tuner class: Low Latency by @mohitcek in https://github.com/cvs-health/uqlm/pull/39
- Linting CI workflow by @dimtsap in https://github.com/cvs-health/uqlm/pull/28
- Za/unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/50
- Bugfix/ruff linting by @mohitcek in https://github.com/cvs-health/uqlm/pull/55
- Additional unit tests for UQensemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/54
- dependabot security fix by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/56
- patch release: reduce tuner latency, add unit tests, auto linting by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/64
New Contributors
- @dimtsap made their first contribution in https://github.com/cvs-health/uqlm/pull/28
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.4...v0.1.5
- Python
Published by dylanbouchard 9 months ago
https://github.com/cvs-health/uqlm - v0.1.4
What's Changed
- Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.4
- Python
Published by dylanbouchard 9 months ago
https://github.com/cvs-health/uqlm - v0.1.3
Highlights
- upgrade tornado version per dependabot
What's Changed
- Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.3
- Python
Published by dylanbouchard 9 months ago
https://github.com/cvs-health/uqlm - v0.1.2
Highlights
- streamline workflow for LLMPanel by enabling scoring template specification in the constructor
- update LLMPanel demo
- fix typos in readme
- update readme badges
- fix bleurt error message typo
What's Changed
- v0.1.0 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/16
- v0.1.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/18
- Update readme, error message by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/22
- Simplify LLMPanel workflow by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/23
- Patch/v0.1.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/26
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.1...v0.1.2
- Python
Published by dylanbouchard 10 months ago
https://github.com/cvs-health/uqlm - v0.1.1
Highlights
- Restore missing argument, thresh_objective, for UQEnsemble
What's Changed
- Refactor UQEnsemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/17
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.0...v0.1.1
- Python
Published by dylanbouchard 10 months ago
https://github.com/cvs-health/uqlm - v0.1.0
UQLM v0.1.0 Release Notes
Introducing UQLM: Uncertainty Quantification for Language Models. UQLM is a Python library for detecting LLM hallucinations using state-of-the-art uncertainty quantification techniques.
Highlights
Comprehensive Scorer Suite
UQLM offers a versatile suite of response-level scorers, each providing a confidence score to indicate the likelihood of errors or hallucinations. The scorers are categorized into four main types:
🎯 Black-Box Scorers: Assess uncertainty through response consistency, compatible with any LLM.
🎲 White-Box Scorers: Utilize token probabilities for faster and cost-effective uncertainty estimation.
⚖️ LLM-as-a-Judge Scorers: Employ LLMs to evaluate response reliability, customizable through prompt engineering.
🔀 Ensemble Scorers: Combine multiple scorers for robust and flexible uncertainty/confidence estimates.
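To give a feel for three of these categories, the toy sketch below computes one black-box style score, one white-box style score, and a weighted ensemble of the two. The function names, the choice of exact matching and minimum token probability, and the equal weights are illustrative assumptions, not UQLM's API.

```python
import math
from typing import List


def exact_match_rate(original: str, samples: List[str]) -> float:
    """Black-box style: share of sampled responses identical to the original."""
    return sum(s == original for s in samples) / len(samples)


def min_token_probability(logprobs: List[float]) -> float:
    """White-box style: probability of the least likely generated token."""
    return math.exp(min(logprobs))


def ensemble_score(scores: List[float], weights: List[float]) -> float:
    """Ensemble style: weighted average of component confidence scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


black_box = exact_match_rate("Paris", ["Paris", "Paris", "Lyon", "Paris"])
white_box = min_token_probability([math.log(0.9), math.log(0.5)])
combined = ensemble_score([black_box, white_box], [0.5, 0.5])
```

Each score lies in [0, 1], with higher values indicating more confidence in the original response.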
Installation:
Install the latest version from PyPI with:
bash
pip install uqlm
Documentation and Demos:
Visit our documentation site for detailed instructions, API references, and demo notebooks showcasing various hallucination detection methods. The following demo notebooks are available:
- Black-Box Uncertainty Quantification: A notebook demonstrating hallucination detection with black-box (consistency) scorers.
- White-Box Uncertainty Quantification: A notebook demonstrating hallucination detection with white-box (token probability-based) scorers.
- LLM-as-a-Judge: A notebook demonstrating hallucination detection with LLM-as-a-Judge.
- Tunable UQ Ensemble: A notebook demonstrating hallucination detection with a tunable ensemble of UQ scorers (Bouchard & Chauhan, 2023).
- Off-the-Shelf UQ Ensemble: A notebook demonstrating hallucination detection using BS Detector (Chen & Mueller, 2023) off-the-shelf ensemble.
Associated Research:
Our companion paper provides a technical description of the UQLM scorers and extensive experimental results, introducing a novel, tunable ensemble approach.
- Python
Published by dylanbouchard 10 months ago
https://github.com/cvs-health/uqlm - v0.1.0rc0
UQLM pre-release
- Python
Published by dylanbouchard 10 months ago