Recent Releases of https://github.com/cvs-health/uqlm

https://github.com/cvs-health/uqlm - v0.2.6

Highlights

  • Remove unused attributes in UQEnsemble that were causing a bug with LLMPanel.score
  • Fix alignment in uqlm.utils.plot_model_accuracies function and enable displaying sample sizes as percentages

What's Changed

  • Refactor plot model accuracies by @mohitcek in https://github.com/cvs-health/uqlm/pull/157
  • Patch release: v0.2.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/159

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.5...v0.2.6

- Python
Published by dylanbouchard 6 months ago

https://github.com/cvs-health/uqlm - v0.2.5

Highlights

  • Add missing num_responses parameter to generate_candidate_responses method in BlackBoxUQ, SemanticEntropy, and UQEnsemble.
  • Add missing fields/links to pyproject.toml

What's Changed

  • v0.2.4 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/154
  • Add attribute num_responses by @mohitcek in https://github.com/cvs-health/uqlm/pull/155
  • Patch release: v0.2.5 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/156

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.4...v0.2.5

- Python
Published by dylanbouchard 6 months ago

https://github.com/cvs-health/uqlm - v0.2.4

Highlights

  • Enable specification of LLM Judge scoring templates in UQEnsemble with scoring_templates argument.
  • Enable specification of postprocessed response return options in UQEnsemble: return only raw responses, return only postprocessed responses, or return both.
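
The three return options can be pictured with a small stand-alone sketch. It does not call uqlm: the helper name select_responses and the dictionary keys are hypothetical, and only the option names 'raw', 'postprocessed', and 'all' are taken from the return_responses argument described in the v0.2.3 notes.

```python
from typing import Callable, Dict, List


def select_responses(
    raw: List[str],
    postprocessor: Callable[[str], str],
    return_responses: str = "all",
) -> Dict[str, List[str]]:
    """Hypothetical helper: choose which response versions to return."""
    if return_responses not in {"raw", "postprocessed", "all"}:
        raise ValueError("return_responses must be 'raw', 'postprocessed', or 'all'")
    result: Dict[str, List[str]] = {}
    if return_responses in {"raw", "all"}:
        result["raw_responses"] = list(raw)
    if return_responses in {"postprocessed", "all"}:
        result["responses"] = [postprocessor(r) for r in raw]
    return result


raw = ["  The answer is 42.  ", "42\n"]
both = select_responses(raw, str.strip, "all")      # raw and cleaned versions
only_raw = select_responses(raw, str.strip, "raw")  # postprocessor output omitted
```

Keeping the raw text available alongside the postprocessed text can help when debugging a postprocessor that strips too much.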

What's Changed

  • Enable different postprocessing return options and judge scoring templates with UQEnsemble by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/145
  • Patch release: v0.2.4 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/146

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.3...v0.2.4

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.3

Highlights

  • Replaces use of bert_score.score with bert_score.BERTScorer.score for a ~43x speedup. While the former (old approach) re-checks and re-assigns torch.device with each use of score, the latter (updated approach) assigns torch.device only once during instantiation.
  • Creates the option for users to specify whether they want only postprocessed responses, only raw responses, or both versions when they specify a postprocessor. This applies to BlackBoxUQ, UQEnsemble, and SemanticEntropy. To do so, users can respectively specify 'postprocessed', 'raw', or 'all' in the 'return_responses' argument in the constructor of these classes. By default, 'all' is specified.
  • [black] is removed where specified in rich print statements to avoid inconsistent colors in progress bars.
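
The reason a class-based scorer can be faster than a function call is that one-time setup is paid once rather than on every call. The sketch below illustrates that pattern only; ExpensiveScorer and score_function_style are hypothetical stand-ins and do not use bert_score or torch.

```python
class ExpensiveScorer:
    """Hypothetical stand-in for a scorer with costly setup (e.g. device selection)."""

    setup_calls = 0

    def __init__(self) -> None:
        # The costly step happens at instantiation.
        ExpensiveScorer.setup_calls += 1

    def score(self, candidates, references):
        # Toy similarity: 1.0 for an exact match, else 0.0.
        return [float(c == r) for c, r in zip(candidates, references)]


def score_function_style(candidates, references):
    """Function-style API: repeats the setup on every call."""
    return ExpensiveScorer().score(candidates, references)


pairs = [(["a"], ["a"]), (["b"], ["c"]), (["d"], ["d"])]

# Function style: one setup per scoring call.
ExpensiveScorer.setup_calls = 0
for cands, refs in pairs:
    score_function_style(cands, refs)
function_style_setups = ExpensiveScorer.setup_calls

# Class style: set up once, then reuse the same instance.
ExpensiveScorer.setup_calls = 0
scorer = ExpensiveScorer()
results = [scorer.score(cands, refs) for cands, refs in pairs]
class_style_setups = ExpensiveScorer.setup_calls
```

The saving grows with the number of scoring calls, which is why it matters when many response pairs are scored one batch at a time.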

What's Changed

  • v0.2.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/140
  • use bert_score class rather than function for 43x speedup by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/141
  • Enable different handling of raw vs postprocessed responses by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/143
  • Patch release: v0.2.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/144

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.2...v0.2.3

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.2

Highlights

  • improved handling of missing logprobs
  • adds warning when logprobs are missing
  • removes benign transformers warning for NLI instantiation and BERTScore scoring
  • Add flaky and skip logic for unit tests to avoid benign failures
  • Fix escape character usage in Judge prompt
  • Update version of LangChain per Dependabot suggestion

What's Changed

  • Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/118
  • Release/v0.2.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/128
  • v0.2.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/131
  • Bump langchain from 0.3.26 to 0.3.27 by @dependabot[bot] in https://github.com/cvs-health/uqlm/pull/121
  • Db/missing logprobs unittest skip by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/132
  • Add missing logprobs warning and update version by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/133
  • fix escape character in judge prompt by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/134
  • Fix logprobs syntax error and judge prompt escape character by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/137
  • suppress benign transformers warnings by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/138
  • Patch release: v0.2.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/139

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.1...v0.2.2

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.1

Highlights

  • If an exception is raised during generation (e.g. RateLimitError), the progress bar is stopped to avoid LiveError upon retry.
  • Fix BERTScore printed text
  • Fix Ensemble diagram for dark mode
  • Fixes max_calls_per_min not being passed to the LLMPanel constructor inside UQEnsemble. With this fix, max_calls_per_min is applied to ensemble judges as well.
  • Add flaky retry logic using @pytest.mark.flaky(retries=3) to tests that fail due to network issues related to HuggingFace.
  • Fix handling of missing logprobs with multiple responses in UQEnsemble
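
The progress-bar fix above follows a common pattern: release the live display in a finally block so that a retry can start a fresh one. The sketch below is a toy model of that pattern, not uqlm's code. ToyProgressBar is a hypothetical class that, like a live display, allows only one active instance, and the simulated failure stands in for an error such as RateLimitError.

```python
class ToyProgressBar:
    """Hypothetical stand-in for a display that allows one active instance."""

    _active = False

    def start(self) -> None:
        if ToyProgressBar._active:
            raise RuntimeError("only one live display may be active at once")
        ToyProgressBar._active = True

    def stop(self) -> None:
        ToyProgressBar._active = False


def generate(prompts, fail: bool):
    bar = ToyProgressBar()
    bar.start()
    try:
        if fail:
            raise ConnectionError("simulated rate limit")
        return [p.upper() for p in prompts]
    finally:
        # Always release the display, even when generation raises.
        bar.stop()


try:
    generate(["hi"], fail=True)
except ConnectionError:
    pass

# The retry can start a new bar because the first one was stopped.
retried = generate(["hi"], fail=False)
```

Without the finally block, the second call to start would raise because the first bar would still be marked active.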

What's Changed

  • Patch release: v0.2.1 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/129

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.0...v0.2.1

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.2.0

These release notes are for minor release v0.2.0.

New Features

1. Progress bars with rich

This feature enables the use of progress bars when generating LLM responses, scoring responses, and tuning ensemble weights. This feature introduced rich and ipywidgets as new dependencies. By default, progress bars are turned on, but users can turn them off by setting show_progress_bars=False in generate_and_score, score, and tune methods for the scorer classes. Below is a screenshot illustrating the use of rich progress bars with the UQEnsemble.tune method:

[image: rich progress bars during UQEnsemble.tune]

2. Ensemble weights printing

After running the UQEnsemble.tune method, ensemble weights are now printed in a pretty table using rich. Ensemble weights are sorted from highest to lowest. See the above screenshot for an example. Users can also display this table with an already tuned ensemble using the UQEnsemble.print_weights method.

3. Support for Python 3.13

As of v0.2.0, uqlm can now be used with Python 3.13. All previous functionality is supported except for bleurt, which is not compatible with Python 3.13.

4. Ensemble saving and loading

UQEnsemble now offers two new methods: save_config and load_config. These methods offer user-friendly saving and loading of the ensemble scorer components and weights.

Example use of ensemble saving:

    uqe_tuned_config_file = "uqe_config_tuned.json"
    uqe.save_config(uqe_tuned_config_file)

Example use of ensemble loading:

    loaded_ensemble = UQEnsemble.load_config("uqe_config_tuned.json")

These methods make storing a tuned ensemble an easier process for later use.
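
To make the idea of persisting a tuned ensemble concrete, here is a rough stand-alone sketch of a JSON round trip. It is an assumption about what such a file could hold, not a description of uqlm's actual file format: the helper names, the JSON keys, and the scorer names are all hypothetical.

```python
import json
import os
import tempfile


def save_weights_json(path, components, weights):
    """Hypothetical helper: write component names and weights to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"components": components, "weights": weights}, f, indent=2)


def load_weights_json(path):
    """Hypothetical helper: read component names and weights back."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    return cfg["components"], cfg["weights"]


components = ["scorer_a", "scorer_b", "scorer_c"]  # placeholder scorer names
weights = [0.5, 0.3, 0.2]                          # placeholder tuned weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "uqe_config_tuned.json")
    save_weights_json(path, components, weights)
    loaded_components, loaded_weights = load_weights_json(path)
```

Storing the weights next to the component names keeps the two in sync, so a reloaded ensemble cannot pair a weight with the wrong scorer.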

5. Token-probability-based Semantic Entropy

The SemanticEntropy class now supports token-probability-based estimates of semantic entropy and associated confidence scores. Note that attribute names in the returned object and column names in the associated dataframe have changed from those in v0.1.
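
As a rough illustration of the difference between count-based and token-probability-based estimates, the sketch below computes both for a handful of sampled responses. It uses a generic formulation (cluster probability proportional to the summed sequence probabilities of its members) and is not necessarily the exact computation in uqlm. The cluster assignments and log-probabilities are made-up inputs; in practice the clusters would come from a semantic-equivalence check.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence


def cluster_probabilities(
    cluster_ids: Sequence[int], weights: Sequence[float]
) -> Dict[int, float]:
    """Normalize per-response weights into per-cluster probabilities."""
    totals: Dict[int, float] = defaultdict(float)
    for cid, w in zip(cluster_ids, weights):
        totals[cid] += w
    z = sum(totals.values())
    return {cid: t / z for cid, t in totals.items()}


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Four sampled responses grouped into three semantic clusters (made-up data).
cluster_ids = [0, 0, 1, 2]
token_logprobs = [[-0.1, -0.2], [-0.3, -0.1], [-1.5, -1.0], [-2.0, -2.5]]

# Sequence probability = exp(sum of token log-probabilities).
seq_probs = [math.exp(sum(lp)) for lp in token_logprobs]

# Count-based estimate: every response carries equal weight.
discrete = cluster_probabilities(cluster_ids, [1.0] * len(cluster_ids))
# Token-probability-based estimate: responses weighted by sequence probability.
weighted = cluster_probabilities(cluster_ids, seq_probs)

h_discrete = entropy(discrete.values())
h_weighted = entropy(weighted.values())
```

In this toy input the two low-probability responses form their own clusters, so the probability-weighted entropy comes out lower than the count-based one.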

Breaking Changes

1. BLEURT Deprecation

This release deprecates BLEURT as a black-box scorer. The following code will now produce errors:

  • Use of uqlm.black_box.BLEURTScorer
  • Use of "bleurt" in uqlm.scorers.BlackBoxUQ scorers parameter
  • Use of "bleurt" in uqlm.scorers.UQEnsemble scorers parameter

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.1.9

What's Changed

  • Patch/v0.1.9 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/105

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.8...v0.1.9

- Python
Published by dylanbouchard 7 months ago

https://github.com/cvs-health/uqlm - v0.1.8

Highlights

  • update version of pillow per Dependabot security alert

What's Changed

  • patch release: v0.1.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/77

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.7...v0.1.8

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.7

Highlights

  • Fixes a bug in which floating-point precision caused an ensemble score greater than 1 (1.00000002). This threw an error when certain tuner metrics were computed. Patched with np.clip.
  • Allow use of brier_score and average_precision with Tuner and UQEnsemble
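
The clipping fix and one of the new metrics can be sketched together. The snippet below is illustrative only and does not reproduce uqlm's tuner code: the scores and labels are made up, the last score mimics the overshoot quoted above, and the Brier score is computed by hand as the mean squared gap between confidence and outcome.

```python
import numpy as np

# Made-up ensemble scores; the last value mimics the floating-point overshoot.
scores = np.array([0.25, 0.9, 1.00000002])
labels = np.array([0, 1, 1])  # 1 = the response was correct

# Clamp scores into [0, 1] so metrics that expect probabilities accept them.
clipped = np.clip(scores, 0.0, 1.0)

# Brier score: mean squared difference between confidence and outcome.
brier = float(np.mean((clipped - labels) ** 2))
```

A lower Brier score indicates confidence scores that track correctness more closely.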

What's Changed

  • v0.1.6 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/68
  • New metrics by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/69
  • Patch/v0.1.7 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/70

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.6...v0.1.7

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.6

Highlights

  • Add missing unit tests
  • Update version of urllib3 per Dependabot security alert

What's Changed

  • Additional unit tests for ResponseGenerator class by @zeya30 in https://github.com/cvs-health/uqlm/pull/58
  • Additional unit tests for UncertaintyQuantifier class by @zeya30 in https://github.com/cvs-health/uqlm/pull/60
  • Improving coverage for unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/53
  • Additional unit tests for LLM Panel class by @zeya30 in https://github.com/cvs-health/uqlm/pull/62
  • Unit tests Tuner class by @mohitcek in https://github.com/cvs-health/uqlm/pull/63
  • v0.1.5 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/65
  • Patch release: v0.1.6 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/66

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.5...v0.1.6

- Python
Published by dylanbouchard 8 months ago

https://github.com/cvs-health/uqlm - v0.1.5

Highlights

  • add missing unit tests to achieve 100% code coverage
  • implement auto-linting/formatting with ruff
  • reduce Tuner (and UQEnsemble.tune) latency (no API changes)
  • allow likert option for judges

What's Changed

  • v0.1.2 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/34
  • adding Likert scale scoring for LLMJudge class by @zeya30 in https://github.com/cvs-health/uqlm/pull/36
  • Tuner class: Low Latency by @mohitcek in https://github.com/cvs-health/uqlm/pull/39
  • Linting CI workflow by @dimtsap in https://github.com/cvs-health/uqlm/pull/28
  • Za/unit tests by @zeya30 in https://github.com/cvs-health/uqlm/pull/50
  • Bugfix/ruff linting by @mohitcek in https://github.com/cvs-health/uqlm/pull/55
  • Additional unit tests for UQensemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/54
  • dependabot security fix by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/56
  • patch release: reduce tuner latency, add unit tests, auto linting by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/64

New Contributors

  • @dimtsap made their first contribution in https://github.com/cvs-health/uqlm/pull/28

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.4...v0.1.5

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.4

What's Changed

  • Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.4

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.3

Highlights

  • upgrade tornado version per dependabot

What's Changed

  • Patch release: v0.1.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/48

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.2...v0.1.3

- Python
Published by dylanbouchard 9 months ago

https://github.com/cvs-health/uqlm - v0.1.2

Highlights

  • streamline workflow for LLMPanel by enabling scoring template specification in the constructor
  • update LLMPanel demo
  • fix typos in readme
  • update readme badges
  • fix bleurt error message typo

What's Changed

  • v0.1.0 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/16
  • v0.1.1 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/18
  • Update readme, error message by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/22
  • Simplify LLMPanel workflow by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/23
  • Patch/v0.1.2 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/26

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.1...v0.1.2

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.1

Highlights

  • Restore missing argument, thresh_objective, for UQEnsemble

What's Changed

  • Refactor UQEnsemble class by @mohitcek in https://github.com/cvs-health/uqlm/pull/17

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.1.0...v0.1.1

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.0

UQLM v0.1.0 Release Notes

Introducing UQLM: Uncertainty Quantification for Language Models. UQLM is a Python library for detecting LLM hallucinations using state-of-the-art uncertainty quantification techniques.

Highlights

Comprehensive Scorer Suite

UQLM offers a versatile suite of response-level scorers, each providing a confidence score to indicate the likelihood of errors or hallucinations. The scorers are categorized into four main types:

🎯 Black-Box Scorers: Assess uncertainty through response consistency, compatible with any LLM.

🎲 White-Box Scorers: Utilize token probabilities for faster and cost-effective uncertainty estimation.

⚖️ LLM-as-a-Judge Scorers: Employ LLMs to evaluate response reliability, customizable through prompt engineering.

🔀 Ensemble Scorers: Combine multiple scorers for robust and flexible uncertainty/confidence estimates.
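
To give a feel for how these scorer families differ, here is a deliberately simplified sketch. It is not UQLM's implementation. The three functions are hypothetical stand-ins: exact-match agreement for a black-box consistency score, minimum token probability for a white-box score, and a weighted average for an ensemble. All inputs are made up.

```python
from typing import Sequence


def consistency_score(original: str, samples: Sequence[str]) -> float:
    """Black-box style: share of sampled responses that match the original."""
    norm = original.strip().lower()
    return sum(s.strip().lower() == norm for s in samples) / len(samples)


def min_token_probability(token_probs: Sequence[float]) -> float:
    """White-box style: confidence limited by the least likely token."""
    return min(token_probs)


def weighted_ensemble(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ensemble style: weighted average of component scores."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


black_box = consistency_score("Paris", ["Paris", "paris ", "Lyon", "Paris"])
white_box = min_token_probability([0.99, 0.8, 0.95])
ensemble = weighted_ensemble([black_box, white_box], [0.5, 0.5])
```

The black-box score needs several sampled responses but no model internals, while the white-box score needs token probabilities but only a single generation.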

Installation:

Install the latest version from PyPI with:

    pip install uqlm

Documentation and Demos:

Visit our documentation site for detailed instructions, API references, and demo notebooks showcasing various hallucination detection methods.

Associated Research:

Our companion paper provides a technical description of the UQLM scorers and extensive experimental results, introducing a novel, tunable ensemble approach.

- Python
Published by dylanbouchard 10 months ago

https://github.com/cvs-health/uqlm - v0.1.0rc0

UQLM pre-release

- Python
Published by dylanbouchard 10 months ago