https://github.com/daffidwilde/certus

Confidence parsing of LLM outputs

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: daffidwilde
License: MIT
Language: Python
Default Branch: main
Size: 87.9 KB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 2

Created 10 months ago · Last pushed 9 months ago

Metadata Files

Readme

Certus: understanding LLM certainty

Certus allows you to estimate confidence in an LLM response, both as a whole and in each of its parts. It does this by parsing the log-probabilities from your response into a tree of nodes.

We build this tree by taking an ordered collection of certus.nodes.core.Token instances and gathering them up recursively so that the tree matches the structure of the response. Each Token is a leaf node in the tree, and the nodes above it are of other types.
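To make the leaf-and-branch idea concrete, here is a minimal sketch of such a tree using plain dataclasses. The `Leaf` and `Branch` classes and the `leaves` helper are hypothetical stand-ins written for this illustration and are not Certus's own classes; `Leaf` only borrows the `value`, `logprob` and `start` fields that Certus displays on its Token nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a token leaf node."""

    value: str
    logprob: float
    start: int


@dataclass
class Branch:
    """Hypothetical higher-level node grouping leaves or other branches."""

    children: list[Leaf | Branch] = field(default_factory=list)


def leaves(node: Leaf | Branch) -> list[Leaf]:
    """Recursively gather every leaf beneath a node, preserving order."""
    if isinstance(node, Leaf):
        return [node]
    collected: list[Leaf] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


# A tiny tree for the response '"certus"': two quote leaves around a nested branch.
tree = Branch(
    [
        Leaf('"', 0.0, 0),
        Branch([Leaf("certus", -0.0123, 1)]),
        Leaf('"', 0.0, 7),
    ]
)
text = "".join(leaf.value for leaf in leaves(tree))
```

Walking the leaves in order recovers the original response text, which is the property that lets a tree of this shape mirror the structure of the response.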

Installation

The most convenient way to install Certus is to do so from PyPI:

    python -m pip install certus

Developers

If you are planning to do some development work on Certus, please install the package from source and use uv:

    git clone https://github.com/daffidwilde/certus
    cd certus
    uv sync --dev

Usage

Extracting token nodes from a response

To map your LLM response to the collection of leaf nodes, use the certus.interface module:

```python
>>> import certus as ct
>>> from google.genai import types
>>>
>>> data = "certus"
>>> logprobs = types.LogprobsResult(  # taken from response.candidates[0].logprobsResult
...     chosen_candidates=[
...         types.LogprobsResultCandidate(log_probability=0.0, token='"', token_id=24),
...         types.LogprobsResultCandidate(log_probability=-0.0123, token="certus", token_id=42),
...         types.LogprobsResultCandidate(log_probability=0.0, token='"', token_id=24),
...     ]
... )
>>> tokens = ct.interface.from_google(logprobs)
>>> tokens
[Token(value='"', logprob=0.0, start=0), Token(value='certus', logprob=-0.0123, start=1), Token(value='"', logprob=0.0, start=7)]
```

This list of token nodes is ready to be parsed into a tree.
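The `start` values in the output above (0, 1 and 7) are consistent with each token beginning at the summed character length of the tokens before it. The sketch below reproduces those offsets under that assumption; `start_offsets` is a hypothetical helper written for this illustration and is not part of Certus.

```python
from itertools import accumulate


def start_offsets(token_values: list[str]) -> list[int]:
    """Return the character offset at which each token begins."""
    lengths = [len(value) for value in token_values]
    # Running total of lengths, shifted so that the first token starts at 0.
    return [0, *accumulate(lengths)][:-1]


offsets = start_offsets(['"', "certus", '"'])
```

For the three tokens in the example, `offsets` is `[0, 1, 7]`, matching the `start` attributes shown on the Token nodes.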

Building a tree

Consider this piece of JSON-friendly data:

```python
>>> import certus as ct
>>>
>>> data = {
...     "name": "Henry Wilde",
...     "age": 29,
...     "longest_walk_km": 160.9,
...     "pets": [
...         {
...             "name": "Billie",
...             "species": "cat",
...             "favourite_foods": [
...                 "fish",
...                 "oat milk",
...                 {
...                     "name": "chicken",
...                     "preparation": "boiled",
...                     "when_sick": True,
...                 },
...             ],
...         },
...     ],
... }
```

Let's say this data came from a gpt-4o response. We can tokenise the dictionary using tiktoken and simulate a log-probability for each token. From there, we can create a collection of Token leaf nodes ready for parsing; the code for this follows.

Simulating data tokens

```python
>>> import json
>>> import random
>>>
>>> import tiktoken
>>>
>>> def tokenise_string(string: str, encoder: tiktoken.Encoding) -> list[str]:
...     encoded = encoder.encode(string)
...     return [encoder.decode_single_token_bytes(e).decode() for e in encoded]
>>>
>>> encoder = tiktoken.encoding_for_model("gpt-4o")
>>> data_tokenised = tokenise_string(json.dumps(data), encoder)
>>>
>>> random.seed(0)
>>> tokens, position = [], 0
>>> for t in data_tokenised:
...     tokens.append(ct.nodes.Token(t, -round(random.expovariate(1e4), 6), position))
...     position += len(t)
>>>
>>> assert json.loads("".join(t.value for t in tokens)) == data
```

Now, we can parse this dictionary response and token nodes into a single Object node using the certus.parsers.parse_json() function:

```python
>>> parsed = ct.parsers.parse_json(data, tokens)
>>> parsed  # doctest:+SKIP
Object(
    fields={
        'name': Composite(children=[Token(value=' "', logprob=-3e-05, start=8), Token(value='Henry', logprob=-7.2e-05, start=10), Token(value=' Wilde', logprob=-5.2e-05, start=15), Token(value='",', logprob=-0.000153, start=21)]),
        'age': Token(value='29', logprob=-7e-05, start=31),
        'longest_walk_km': Composite(children=[Token(value='160', logprob=-0.000131, start=54), Token(value='.', logprob=-0.000229, start=57), Token(value='9', logprob=-0.000115, start=58)]),
        'pets': Array(
            elements=[
                Object(
                    fields={
                        'name': Composite(children=[Token(value=' "', logprob=-0.0002, start=78), Token(value='Bill', logprob=-3e-05, start=80), Token(value='ie', logprob=-0.000163, start=84), Token(value='",', logprob=-8e-05, start=86)]),
                        'species': Composite(children=[Token(value=' "', logprob=-0.000174, start=99), Token(value='cat', logprob=-0.00011, start=101), Token(value='",', logprob=-0.0, start=104)]),
                        'favourite_foods': Array(
                            elements=[
                                Composite(children=[Token(value=' ["', logprob=-8.4e-05, start=125), Token(value='fish', logprob=-2.7e-05, start=128), Token(value='",', logprob=-0.000343, start=132)]),
                                Composite(children=[Token(value=' "', logprob=-0.000163, start=134), Token(value='o', logprob=-5.9e-05, start=136), Token(value='at', logprob=-8e-06, start=137), Token(value=' milk', logprob=-3.9e-05, start=139), Token(value='",', logprob=-7.1e-05, start=144)]),
                                Object(
                                    fields={
                                        'name': Composite(children=[Token(value=' "', logprob=-0.000123, start=155), Token(value='ch', logprob=-7.9e-05, start=157), Token(value='icken', logprob=-0.000168, start=159), Token(value='",', logprob=-7.8e-05, start=164)]),
                                        'preparation': Composite(children=[Token(value=' "', logprob=-9.1e-05, start=181), Token(value='bo', logprob=-4.9e-05, start=183), Token(value='iled', logprob=-8.6e-05, start=185), Token(value='",', logprob=-3.4e-05, start=189)]),
                                        'when_sick': Token(value=' true', logprob=-9e-06, start=204)
                                    }
                                )
                            ]
                        )
                    }
                )
            ]
        )
    }
)
```

That's a lot of information, but you should be able to see a few node types here:

certus.nodes.core.Composite: a collection of Token nodes
certus.nodes.struct.Array: a collection of node elements, which behaves like a list
certus.nodes.struct.Object: a mapping of keys to nodes, which behaves like a dict

We can leverage the list/dict-like properties of our Object node to look at the confidence in its various components:

```python
>>> parsed.confidence  # the whole response
0.9999025047529705
>>> for key, value in parsed.items():
...     print(key.ljust(16), value.confidence)
name             0.9999232529452059
age              0.9999300024499428
longest_walk_km  0.9998416792007273
pets             0.9999055044649844
>>>
>>> parsed["pets"][0]["favourite_foods"][-1]["name"].confidence  # Billie's last favourite food
0.9998880062717659
```
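The confidence values printed for the Token and Composite nodes are consistent with exponentiating the mean log-probability of their tokens, which is the geometric mean of the token probabilities. The sketch below checks that reading against log-probabilities copied from the parsed output above. It is an inference from the printed numbers rather than a statement of how Certus is implemented, `geometric_mean_confidence` is a hypothetical helper, and the way Array and Object nodes aggregate their children is not examined here.

```python
import math


def geometric_mean_confidence(logprobs: list[float]) -> float:
    """Exponentiate the mean log-probability (geometric mean of probabilities)."""
    return math.exp(sum(logprobs) / len(logprobs))


# Log-probabilities copied from the parsed output above.
name = geometric_mean_confidence([-3e-05, -7.2e-05, -5.2e-05, -0.000153])
age = geometric_mean_confidence([-7e-05])
longest_walk_km = geometric_mean_confidence([-0.000131, -0.000229, -0.000115])
```

Under this assumption, `name`, `age` and `longest_walk_km` agree with the values printed above to within floating-point rounding. A geometric mean keeps the score comparable between short and long values, whereas a plain product of probabilities would shrink as the number of tokens grows.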

Owner

Name: Henry Wilde
Login: daffidwilde
Kind: user
Location: Cardiff, UK
Company: Dŵr Cymru Welsh Water

Repositories: 29
Profile: https://github.com/daffidwilde

Data scientist and advocate for open-source, sustainably developed software 🛸 🐐 🦆

GitHub Events

Total

Create event: 9
Issues event: 2
Release event: 2
Watch event: 1
Delete event: 4
Push event: 12
Pull request event: 6

Last Year

Create event: 9
Issues event: 2
Release event: 2
Watch event: 1
Delete event: 4
Push event: 12
Pull request event: 6

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 1
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 1 hour
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 1 hour
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

daffidwilde (1)

Pull Request Authors

daffidwilde (7)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads: 199 last month (PyPI)

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

pypi.org: certus

Confidence parsing of LLM outputs

Documentation: https://certus.readthedocs.io/
License: Copyright 2025 Henry Wilde Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Latest release: 0.0.2 (published 10 months ago)

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 199 Last month

Rankings

Dependent packages count: 8.7%

Average: 28.9%

Dependent repos count: 49.0%

Maintainers (1)

daffidwilde

Last synced: 9 months ago

Dependencies

pyproject.toml pypi

uv.lock pypi

attrs 25.3.0
certus *
colorama 0.4.6
coverage 7.10.3
hypothesis 6.137.3
iniconfig 2.1.0
nodeenv 1.9.1
packaging 25.0
pluggy 1.6.0
pygments 2.19.2
pyright 1.1.403
pytest 8.4.1
pytest-cov 6.2.1
pytest-randomly 3.16.0
pytest-sugar 1.0.0
ruff 0.12.8
sortedcontainers 2.4.0
termcolor 3.1.0
typing-extensions 4.14.1

.github/workflows/ci.yml actions

actions/checkout v4 composite
astral-sh/setup-uv v6 composite
