https://github.com/cvs-health/uqlm - v0.2.6

Highlights

Remove unused attributes in UQEnsemble that were causing a bug with LLMPanel.score
Fix alignment in uqlm.utils.plot_model_accuracies function and enable displaying sample sizes as percentages

What's Changed

Refactor plot model accuracies by @mohitcek in https://github.com/cvs-health/uqlm/pull/157
Patch release: v0.2.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/159

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.5...v0.2.6

- Python
Published by dylanbouchard 6 months ago

https://github.com/cvs-health/uqlm - v0.2.5

Highlights

Add missing num_responses parameter to generate_candidate_responses method in BlackBoxUQ, SemanticEntropy, and UQEnsemble.
Add missing fields/links to pyproject.toml

What's Changed

v0.2.4 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/154
Add attribute num_responses by @mohitcek in https://github.com/cvs-health/uqlm/pull/155
Patch release: v0.2.5 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/156

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.4...v0.2.5

- Python
Published by dylanbouchard 6 months ago

https://github.com/cvs-health/uqlm - v0.2.4

Highlights

Enable specification of LLM Judge scoring templates in UQEnsemble with scoring_templates argument.
Enable specification of postprocessed response return options in UQEnsemble: return only raw responses, return only postprocessed responses, or return both.

What's Changed

Enable different postprocessing return options and judge scoring templates with UQEnsemble by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/145
Patch release: v0.2.4 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/146

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.3...v0.2.4

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.3

Highlights

Replaces use of bert_score.score with bert_score.BERTScorer.score for a ~43x speedup. While the former (old approach) re-checks and re-assigns torch.device with each use of score, the latter (updated approach) assigns torch.device only once during instantiation.
Creates the option for users to specify whether they want only postprocessed responses, only raw responses, or both versions when they specify a postprocessor. This applies to BlackBoxUQ, UQEnsemble, and SemanticEntropy. To do so, users can respectively specify 'postprocessed', 'raw', or 'all' in the 'return_responses' argument in the constructor of these classes. By default, 'all' is specified.
[black] is removed where specified in rich print statements to avoid inconsistent colors in progress bars.
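
The speedup from switching to bert_score.BERTScorer comes from paying the setup cost once instead of on every call. The sketch below illustrates that pattern only. ExpensiveScorer and score_function are hypothetical stand-ins, not the bert_score API, and the toy exact-match similarity is an assumption made so the example runs without a model.

```python
class ExpensiveScorer:
    """Hypothetical stand-in for a scorer whose setup (e.g. device assignment) is costly."""

    setup_calls = 0  # counts how many times setup has run

    def __init__(self):
        ExpensiveScorer.setup_calls += 1
        self.device = "cpu"  # placeholder for the one-time device assignment

    def score(self, candidates, references):
        # Toy similarity: 1.0 on exact match, else 0.0
        return [float(c == r) for c, r in zip(candidates, references)]


def score_function(candidates, references):
    """Function-style API: repeats the setup on every call."""
    return ExpensiveScorer().score(candidates, references)


pairs = [(["a"], ["a"]), (["b"], ["c"]), (["d"], ["d"])]

# Old approach: setup runs once per call
ExpensiveScorer.setup_calls = 0
function_results = [score_function(c, r) for c, r in pairs]
function_setups = ExpensiveScorer.setup_calls

# Updated approach: setup runs once at instantiation, then the instance is reused
ExpensiveScorer.setup_calls = 0
scorer = ExpensiveScorer()
class_results = [scorer.score(c, r) for c, r in pairs]
class_setups = ExpensiveScorer.setup_calls
```

Both approaches return the same scores. Only the number of setup runs differs, which is where the saving comes from when score is called many times.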

What's Changed

v0.2.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/140
use bert_score class rather than function for 43x speedup by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/141
Enable different handling of raw vs postprocessed responses by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/143
Patch release: v0.2.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/144

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.2...v0.2.3

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.2

Highlights

improved handling of missing logprobs
adds warning when logprobs missing
removes benign transformers warning for NLI instantiation and BERTScore scoring
Add flaky and skip logic for unit tests to avoid benign failures
Fix escape character usage in Judge prompt
Update version of LangChain per Dependabot suggestion

What's Changed

Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/118
Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/128
v0.2.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/131
Bump langchain from 0.3.26 to 0.3.27 by @dependabot[bot] in https://github.com/cvs-health/uqlm/pull/121
Db/missing logprobs unittest skip by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/132
Add missing logprobs warning and update version by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/133
fix escape character in judge prompt by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/134
Fix logprobs syntax error and judge prompt escape character by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/137
suppress benign transformers warnings by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/138
Patch release: v0.2.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/139

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.1...v0.2.2

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.1

Highlights

If an exception is raised during generation (e.g. RateLimitError), the progress bar is stopped to avoid LiveError upon retry.
Fix BERTScore printed text
Fix Ensemble diagram for dark mode
Fixes max_calls_per_min not being passed to the LLMPanel constructor inside of UQEnsemble. After this fix, max_calls_per_min is applied to ensemble judges as well.
Add flaky retry logic using @pytest.mark.flaky(retries=3) to tests that fail due to network issues related to HuggingFace.
Fix handling of missing logprobs with multiple responses in UQEnsemble
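
The progress bar fix above follows a general pattern: a display that allows only one active instance must be stopped even when the work fails, or the retry cannot start a new one. The sketch below models that with a try/finally block. FakeLiveDisplay, the local LiveError class, and generate are hypothetical stand-ins written for this illustration. They are not the rich or uqlm implementation.

```python
class LiveError(RuntimeError):
    """Stand-in for the error raised when a second live display is started."""


class FakeLiveDisplay:
    """Hypothetical display that allows only one active instance at a time."""

    _active = False

    def start(self):
        if FakeLiveDisplay._active:
            raise LiveError("Only one live display may be active at once")
        FakeLiveDisplay._active = True

    def stop(self):
        FakeLiveDisplay._active = False


def generate(fail):
    """Start a display, do the work, and always stop the display afterwards."""
    display = FakeLiveDisplay()
    display.start()
    try:
        if fail:
            raise TimeoutError("simulated rate limit")
        return "ok"
    finally:
        display.stop()  # runs on success and on exception


# First attempt fails, but the display is still stopped
try:
    generate(fail=True)
except TimeoutError:
    pass

# The retry can start a fresh display without hitting LiveError
result = generate(fail=False)
```

Without the finally clause, the failed first attempt would leave the display marked active and the retry would raise LiveError in start.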

What's Changed

Patch release: v0.2.1 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/129

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.0...v0.2.1

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.0

These release notes are for minor release v0.2.0.

New Features

1. Progress bars with `rich`

This feature enables the use of progress bars when generating LLM responses, scoring responses, and tuning ensemble weights. This feature introduces rich and ipywidgets as new dependencies. By default, progress bars are turned on, but users can turn them off by setting show_progress_bars=False in the generate_and_score, score, and tune methods of the scorer classes.

2. Ensemble weights printing

After running the UQEnsemble.tune method, ensemble weights are now printed in a formatted table using rich. Ensemble weights are sorted from highest to lowest. Users can also display this table for an already tuned ensemble using the UQEnsemble.print_weights method.

3. Support for Python 3.13

As of v0.2.0, uqlm can now be used with Python 3.13. All previous functionality is supported except for bleurt, which is not compatible with Python 3.13.

4. Ensemble saving and loading

UQEnsemble now offers two new methods: save_config and load_config. These methods offer user-friendly saving and loading of the ensemble scorer components and weights.

Example use of ensemble saving:

    uqe_tuned_config_file = "uqe_config_tuned.json"
    uqe.save_config(uqe_tuned_config_file)

Example use of ensemble loading:

    loaded_ensemble = UQEnsemble.load_config("uqe_config_tuned.json")

These methods make storing a tuned ensemble an easier process for later use.

5. Token-probability-based Semantic Entropy

The SemanticEntropy class now supports token-probability-based estimates of semantic entropy and associated confidence scores. Note that attribute names in the returned object and column names in the associated dataframe have changed from those in v0.1.
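
As a rough illustration of the idea, the sketch below computes semantic entropy from token-probability information using one common formulation: each sampled response is weighted by its normalized sequence probability, weights are summed within each semantic cluster, and the entropy is taken over the clusters. The function name, the inputs, and the confidence normalization by the log of the number of responses are all assumptions made for this example. They are not taken from the uqlm source.

```python
import math
from collections import defaultdict


def semantic_entropy(cluster_ids, logprobs):
    """Entropy over semantic clusters, weighting each response by its sequence probability."""
    # Normalize sequence probabilities over the sampled responses
    # (subtracting the max log-probability keeps exp() numerically stable)
    max_lp = max(logprobs)
    weights = [math.exp(lp - max_lp) for lp in logprobs]
    total = sum(weights)

    # Sum normalized probabilities within each cluster
    cluster_prob = defaultdict(float)
    for cid, w in zip(cluster_ids, weights):
        cluster_prob[cid] += w / total

    return -sum(p * math.log(p) for p in cluster_prob.values() if p > 0)


# Four sampled responses: the first two share a meaning (cluster 0)
clusters = [0, 0, 1, 2]
logprobs = [math.log(0.4), math.log(0.3), math.log(0.2), math.log(0.1)]

entropy = semantic_entropy(clusters, logprobs)

# Assumed normalization: divide by the maximum possible entropy, log(number of responses)
confidence = 1 - entropy / math.log(len(logprobs))
```

Here the cluster probabilities are 0.7, 0.2, and 0.1, so the entropy is about 0.80 nats. If every response fell in one cluster the entropy would be zero, which corresponds to the highest confidence under this normalization.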

Breaking Changes

1. BLEURT Deprecation

This release deprecates BLEURT as a black-box scorer. The following code will now produce errors:

* Use of uqlm.black_box.BLEURTScorer
* Use of "bleurt" in uqlm.scorers.BlackBoxUQ scorers parameter
* Use of "bleurt" in uqlm.scorers.UQEnsemble scorers parameter

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.1.9

What's Changed

Patch/v0.1.9 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/105

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.8...v0.1.9

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.1.8

Highlights

update version of pillow per Dependabot security alert

What's Changed

patch release: v0.1.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/77

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.7...v0.1.8

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.7

Highlights

Fixes bug related to floating point precision causing ensemble score greater than 1 (1.00000002). This was throwing an error when certain tuner metrics were being computed. Patched with np.clip.
Allow use of brier_score and average_precision with Tuner and UQEnsemble
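
The two items above can be sketched together: scores are first clipped into [0, 1] with np.clip, then a metric such as the Brier score is computed on the clipped values. The arrays below are made-up illustration data, and the Brier computation is the standard mean squared difference between confidence and outcome, not code copied from uqlm.

```python
import numpy as np

# Example confidence scores; the second exceeds 1 by a rounding-sized amount,
# mirroring the 1.00000002 case described in the release note
raw_scores = np.array([0.25, 1.00000002, 0.9])
labels = np.array([0, 1, 1])  # 1 = the response was correct

# Clip into the valid range before computing metrics
scores = np.clip(raw_scores, 0.0, 1.0)

# Brier score: mean squared difference between confidence and outcome (lower is better)
brier = float(np.mean((scores - labels) ** 2))
```

Clipping changes only the out-of-range value, so metrics that require probabilities in [0, 1] no longer fail on it.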

What's Changed

v0.1.6 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/68
New metrics by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/69
Patch/v0.1.7 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/70

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.6...v0.1.7

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.6

Highlights

Add missing unit tests
Update version of urllib3 per Dependabot security alert

What's Changed

Additional unit tests for ResponseGenerator class by @zeya30 in https://github.com/cvs-health/uqlm/pull/58
Additional unit tests for UncertaintyQuantifier class by @zeya30 in https://github.com/cvs-health/uqlm/pull/60
Improving coverage for unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/53
Additional unit tests for LLM Panel class by @zeya30 in https://github.com/cvs-health/uqlm/pull/62
Unit tests Tuner class by @mohitcek in https://github.com/cvs-health/uqlm/pull/63
v0.1.5 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/65
Patch release: v0.1.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/66

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.5...v0.1.6

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.5

Highlights

add missing unit tests to achieve 100% code coverage
implement auto-linting/formatting with ruff
reduce Tuner (and UQEnsemble.tune) latency (no API changes)
allow likert option for judges

What's Changed

v0.1.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/34
adding Likert scale scoring for LLMJudge class by @zeya30 in https://github.com/cvs-health/uqlm/pull/36
Tuner class: Low Latency by @mohitcek in https://github.com/cvs-health/uqlm/pull/39
Linting CI workflow by @dimtsap in https://github.com/cvs-health/uqlm/pull/28
Za/unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/50
Bugfix/ruff linting by @mohitcek in https://github.com/cvs-health/uqlm/pull/55
Additional unit tests for UQensemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/54
dependabot security fix by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/56
patch release: reduce tuner latency, add unit tests, auto linting by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/64

New Contributors

@dimtsap made their first contribution in https://github.com/cvs-health/uqlm/pull/28

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.4...v0.1.5

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.4

What's Changed

Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.4

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.3

Highlights

upgrade tornado version per dependabot

What's Changed

Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.3

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.2

Highlights

streamline workflow for LLMPanel by enabling scoring template specification in the constructor
update LLMPanel demo
fix typos in readme
update readme badges
fix bleurt error message typo

What's Changed

v0.1.0 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/16
v0.1.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/18
Update readme, error message by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/22
Simplify LLMPanel workflow by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/23
Patch/v0.1.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/26

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.1...v0.1.2

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.1

Highlights

Restore missing argument, thresh_objective, for UQEnsemble

What's Changed

Refactor UQEnsemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/17

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.0...v0.1.1

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.0

UQLM v0.1.0 Release Notes

Introducing UQLM: Uncertainty Quantification for Language Models. UQLM is a Python library for detecting LLM hallucinations using state-of-the-art uncertainty quantification techniques.

Highlights

Comprehensive Scorer Suite

UQLM offers a versatile suite of response-level scorers, each providing a confidence score to indicate the likelihood of errors or hallucinations. The scorers are categorized into four main types:

🎯 Black-Box Scorers: Assess uncertainty through response consistency, compatible with any LLM.

🎲 White-Box Scorers: Utilize token probabilities for faster and cost-effective uncertainty estimation.

⚖️ LLM-as-a-Judge Scorers: Employ LLMs to evaluate response reliability, customizable through prompt engineering.

🔀 Ensemble Scorers: Combine multiple scorers for robust and flexible uncertainty/confidence estimates.

Installation:

Install the latest version from PyPI with:

    pip install uqlm

Documentation and Demos:

Visit our documentation site for detailed instructions, API references, and demo notebooks showcasing various hallucination detection methods. The following demo notebooks are available:

Black-Box Uncertainty Quantification: A notebook demonstrating hallucination detection with black-box (consistency) scorers.
White-Box Uncertainty Quantification: A notebook demonstrating hallucination detection with white-box (token probability-based) scorers.
LLM-as-a-Judge: A notebook demonstrating hallucination detection with LLM-as-a-Judge.
Tunable UQ Ensemble: A notebook demonstrating hallucination detection with a tunable ensemble of UQ scorers (Bouchard & Chauhan, 2023).
Off-the-Shelf UQ Ensemble: A notebook demonstrating hallucination detection using BS Detector (Chen & Mueller, 2023) off-the-shelf ensemble.

Associated Research:

Our companion paper provides a technical description of the UQLM scorers and extensive experimental results, introducing a novel, tunable ensemble approach.

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.0rc0

UQLM pre-release

- Python
Published by dylanbouchard 10 months ago
