prob-conf-mat

Confusion matrices with uncertainty quantification, experiment aggregation and significance testing.

https://github.com/ioverho/prob_conf_mat

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov, springer.com, frontiersin.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.5%) to scientific vocabulary

Keywords

classification confusion-matrices confusion-matrix probabilistic python statistics
Last synced: 4 months ago

Repository

Confusion matrices with uncertainty quantification, experiment aggregation and significance testing.

Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 3
Topics
classification confusion-matrices confusion-matrix probabilistic python statistics
Created almost 3 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md


Probabilistic Confusion Matrices

prob_conf_mat is a Python package for performing statistical inference with confusion matrices. It quantifies the amount of uncertainty present, aggregates semantically related experiments into experiment groups, and compares experiments against each other for significance.

Installation

Installation from PyPI can be done using `pip`:

```bash
pip install prob_conf_mat
```

Or, if you're using uv, simply run:

```bash
uv add prob_conf_mat
```

The project currently depends on the following packages:

Dependency tree:

```txt
prob-conf-mat
├── jaxtyping
├── matplotlib
├── numpy
├── scipy
└── tabulate
```

Additionally, [`pandas`](https://pandas.pydata.org/) is an optional dependency for some reporting functions.

Development Environment

This project was developed using uv. To install the development environment, first clone this GitHub repo:

```bash
git clone https://github.com/ioverho/prob_conf_mat.git
```

And then run the `uv sync --dev` command:

```bash
uv sync --dev
```

The development dependencies should automatically install into the .venv folder.

Documentation

For more information about the package, motivation, how-to guides and implementation, please see the documentation website. We try to use Daniele Procida's structure for Python documentation.

The documentation is broadly divided into 4 sections:

  1. Getting Started: a collection of small tutorials to help new users get started
  2. How To: more expansive guides on how to achieve specific things
  3. Reference: in-depth information about how to interface with the library
  4. Explanation: explanations about why things are the way they are

|             | Learning        | Coding        |
| ----------- | --------------- | ------------- |
| Practical   | Getting Started | How-To Guides |
| Theoretical | Explanation     | Reference     |

Quick Start

In-depth tutorials taking you through all the basic steps are available on the documentation site. For the impatient, here's a standard use case.

First define a study, and set some sensible hyperparameters for the simulated confusion matrices.

```python
from prob_conf_mat import Study

study = Study(
    seed=0,
    num_samples=10000,
    ci_probability=0.95,
)
```
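Here, `seed` fixes the random state, `num_samples` sets how many Monte Carlo samples are simulated per experiment, and `ci_probability` sets the mass of the reported credible interval (the 95.0% HDI column in the tables below).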

Then add an experiment and its confusion matrix to the study:

```python
study.add_experiment(
    experiment_name="model_1/fold_0",
    confusion_matrix=[
        [13, 0, 0],
        [0, 10, 6],
        [0, 0, 9],
    ],
    confusion_prior=0,
    prevalence_prior=1,
)
```

Finally, add some metrics to the study:

```python
study.add_metric("acc")
```

We are now ready to start generating summary statistics about this experiment. For example:

```python
study.report_metric_summaries(
    metric="acc",
    table_fmt="github",
)
```

| Group   | Experiment | Observed | Median | Mode   | 95.0% HDI        | MU     | Skew    | Kurt   |
| ------- | ---------- | -------- | ------ | ------ | ---------------- | ------ | ------- | ------ |
| model_1 | fold_0     | 0.8421   | 0.8499 | 0.8673 | [0.7307, 0.9464] | 0.2157 | -0.5627 | 0.2720 |

So while this experiment achieves an accuracy of 84.21%, a more reasonable estimate (given the size of the test set) would be 84.99%. There is a 95% probability that the true accuracy lies between 73.07% and 94.64%.
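To make these numbers concrete, here is a minimal sketch of the underlying idea: treat the confusion matrix cells as multinomial counts, place a Dirichlet prior over them, and sample posterior matrices. The flat prior here is a simplification of the package's separate `prevalence_prior` and `confusion_prior` parameters, and a central interval is not an HDI, so the output only lands near the table's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed counts from the experiment above; diagonal cells are correct
# predictions, so observed accuracy = trace / total = 32 / 38 = 0.8421.
counts = np.array([[13, 0, 0], [0, 10, 6], [0, 0, 9]], dtype=float)

# Flat-Dirichlet sketch: sample joint cell probabilities from the posterior.
# The tiny pseudo-count stands in for a (near-)zero prior; this is a
# simplification, not the package's hierarchical prior structure.
posterior = rng.dirichlet(counts.ravel() + 1e-4, size=10_000)
accuracy = posterior.reshape(-1, 3, 3).trace(axis1=1, axis2=2)

print(f"observed: {counts.trace() / counts.sum():.4f}")
print(f"median:   {np.median(accuracy):.4f}")  # close to the table's 0.8499
print("95% central interval:", np.percentile(accuracy, [2.5, 97.5]).round(4))
```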

Visually that looks something like:

```python
fig = study.plot_metric_summaries(metric="acc")
```

*Figure: metric distribution*

Now let's add a confusion matrix for the same model, but estimated using a different fold. We want to know what the average performance is for that model across the different folds:

```python
study.add_experiment(
    experiment_name="model_1/fold_1",
    confusion_matrix=[
        [12, 1, 0],
        [1, 8, 7],
        [0, 2, 7],
    ],
    confusion_prior=0,
    prevalence_prior=1,
)
```

We can equip each metric with an inter-experiment aggregation method, and then request summary statistics about the aggregate performance of the experiments in the 'model_1' group:

```python
study.add_metric(
    metric="acc",
    aggregation="beta",
)

fig = study.plot_forest_plot(metric="acc")
```

*Figure: forest plot*

Note that the estimated aggregate accuracy has much less uncertainty (a smaller HDI/MU).
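For intuition about what a "beta" aggregation could look like, here is a hedged sketch based on the conflation of distributions (Hill, 2011, cited in the credits below): fit a Beta to each experiment's posterior accuracy samples, then conflate the fits. The `accuracy_samples` and `fit_beta` helpers and the moment-matching step are illustrative assumptions, not the package's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_samples(cm, n=10_000):
    # Simplified flat-Dirichlet posterior from the earlier sketch.
    cm = np.asarray(cm, dtype=float)
    draws = rng.dirichlet(cm.ravel() + 1e-4, size=n)
    return draws.reshape(n, *cm.shape).trace(axis1=1, axis2=2)

def fit_beta(samples):
    # Moment-matched Beta fit: mean m = a / (a + b), var v = m(1 - m) / (a + b + 1).
    m, v = samples.mean(), samples.var()
    common = m * (1.0 - m) / v - 1.0  # equals a + b
    return m * common, (1.0 - m) * common

fold_0 = accuracy_samples([[13, 0, 0], [0, 10, 6], [0, 0, 9]])
fold_1 = accuracy_samples([[12, 1, 0], [1, 8, 7], [0, 2, 7]])

# Hill (2011): the conflation of Beta(a1, b1) and Beta(a2, b2) is
# Beta(a1 + a2 - 1, b1 + b2 - 1), the renormalised product of the densities.
(a1, b1), (a2, b2) = fit_beta(fold_0), fit_beta(fold_1)
aggregate = rng.beta(a1 + a2 - 1.0, b1 + b2 - 1.0, size=10_000)

for name, s in [("fold_0", fold_0), ("fold_1", fold_1), ("aggregate", aggregate)]:
    lo, hi = np.percentile(s, [2.5, 97.5])
    print(f"{name:>9}: 95% interval [{lo:.4f}, {hi:.4f}]")
```

Running this, the aggregate's interval comes out visibly narrower than either fold's, which is exactly the effect the forest plot shows.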

These experiments seem pretty different, but is the difference significant? Let's assume that, for this example, a difference needs to be at least 0.05 to be considered significant. We can then request the probability that the difference exceeds that threshold:

```python
fig = study.plot_pairwise_comparison(
    metric="acc",
    experiment_a="model_1/fold_0",
    experiment_b="model_1/fold_1",
    min_sig_diff=0.05,
)
```

*Figure: pairwise comparison plot*

There's about an 82% probability that the difference is in fact significant. While likely, there isn't quite enough data to be sure.
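This kind of comparison follows the ROPE-style logic of Kruschke (2013) and Makowski et al. (2019), both cited below: count how often the sampled difference falls outside the region of practical equivalence. A self-contained sketch, reusing the simplified flat-Dirichlet posterior from earlier (so the number only approximates the package's):

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_samples(cm, n=10_000):
    # Simplified flat-Dirichlet posterior over the confusion matrix cells
    # (see the earlier sketch); returns n posterior accuracy samples.
    cm = np.asarray(cm, dtype=float)
    draws = rng.dirichlet(cm.ravel() + 1e-4, size=n)
    return draws.reshape(n, *cm.shape).trace(axis1=1, axis2=2)

fold_0 = accuracy_samples([[13, 0, 0], [0, 10, 6], [0, 0, 9]])
fold_1 = accuracy_samples([[12, 1, 0], [1, 8, 7], [0, 2, 7]])

# Share of posterior mass outside the [-0.05, 0.05] equivalence region;
# this should land near the ~82% reported above.
diff = fold_0 - fold_1
print(f"P(|difference| > 0.05) = {np.mean(np.abs(diff) > 0.05):.1%}")
```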

Development

This project was developed using the following (amazing) tools:

  1. Package management: uv
  2. Linting: ruff
  3. Static Type-Checking: pyright
  4. Documentation: mkdocs
  5. CI: pre-commit

Most of the common development commands are included in ./Makefile. If make is installed, you can immediately run the following commands:

```txt
Usage: make

Utility
  help          Display this help
  hello-world   Tests uv and make

Environment
  install       Install default dependencies
  install-dev   Install dev dependencies
  upgrade       Upgrade installed dependencies
  export        Export uv to requirements.txt file

Testing, Linting, Typing & Formatting
  test          Runs all tests
  coverage      Checks test coverage
  lint          Run linting
  type          Run static typechecking
  commit        Run pre-commit checks

Documentation
  mkdocs        Update the docs
  mkdocs-serve  Serve documentation site
```

Credits

The following are some packages and libraries which served as inspiration for aspects of this project: arviz, bayestestR, BERTopic, jaxtyping, mici, python-ci, statsmodels.

A lot of the approaches and methods used in this project come from published works. Some especially important works include:

  1. Goutte, C., & Gaussier, E. (2005). A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European conference on information retrieval (pp. 345-359). Berlin, Heidelberg: Springer Berlin Heidelberg.
  2. Tötsch, N., & Hoffmann, D. (2021). Classifier uncertainty: evidence, potential impact, and probabilistic treatment. PeerJ Computer Science, 7, e398.
  3. Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573.
  4. Makowski, D., Ben-Shachar, M. S., Chen, S. A., & Lüdecke, D. (2019). Indices of effect existence and significance in the Bayesian framework. Frontiers in psychology, 10, 2767.
  5. Hill, T. (2011). Conflations of probability distributions. Transactions of the American Mathematical Society, 363(6), 3351-3372.
  6. Chandler, J., Cumpston, M., Li, T., Page, M. J., & Welch, V. J. H. W. (2019). Cochrane handbook for systematic reviews of interventions. Hoboken: Wiley, 4.

Citation

```bibtex
@software{ioverho_prob_conf_mat,
  author  = {Verhoeven, Ivo},
  license = {MIT},
  title   = {{prob\_conf\_mat}},
  url     = {https://github.com/ioverho/prob_conf_mat}
}
```

Owner

  • Name: Ivo Verhoeven
  • Login: ioverho
  • Kind: user
  • Location: Amsterdam, the Netherlands
  • Company: University of Amsterdam

NLP PhD candidate at the University of Amsterdam's Institute for Logic, Language and Computation.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: prob_conf_mat
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ivo
    family-names: Verhoeven
    email: mail@ivoverhoeven.nl
    affiliation: University of Amsterdam
    orcid: 'https://orcid.org/0000-0002-5163-210X'
repository-code: 'https://github.com/ioverho/prob_conf_mat'
abstract: >-
  Confusion matrices with uncertainty quantification,
  experiment aggregation and significance testing.
keywords:
  - confusion matrices
  - classification
  - confusion matrix
  - statistics
  - probabilistic
license: MIT

GitHub Events

Total
  • Release event: 2
  • Delete event: 3
  • Issue comment event: 5
  • Push event: 32
  • Pull request event: 17
  • Create event: 10
Last Year
  • Release event: 2
  • Delete event: 3
  • Issue comment event: 5
  • Push event: 32
  • Pull request event: 17
  • Create event: 10

Packages

  • Total packages: 1
  • Total downloads:
  • pypi: 263 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: prob-conf-mat

Confusion matrices with uncertainty quantification, experiment aggregation and significance testing.

  • Homepage: https://www.ivoverhoeven.nl/prob_conf_mat/
  • Documentation: https://www.ivoverhoeven.nl/prob_conf_mat/
  • License: MIT License Copyright (c) 2025 Ivo Verhoeven Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 0.1.0rc5
    published 6 months ago
  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 263 Last month
Rankings
Dependent packages count: 8.9%
Average: 29.5%
Dependent repos count: 50.1%
Maintainers (1)
Last synced: 5 months ago

Dependencies

.github/workflows/documentation.yaml actions
  • actions/checkout v4 composite
  • astral-sh/setup-uv v5 composite
.github/workflows/test.yaml actions
  • actions/checkout v4 composite
  • astral-sh/setup-uv v5 composite
  • codecov/codecov-action v5 composite
pyproject.toml pypi
  • jaxtyping >=0.3
  • matplotlib >=3.10
  • numpy >=2.2
  • scipy >=1.15
  • seaborn >=0.13
  • tabulate >=0.9
requirements.txt pypi
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • fonttools ==4.58.4
  • jaxtyping ==0.3.2
  • kiwisolver ==1.4.8
  • matplotlib ==3.10.3
  • numpy ==2.3.0
  • packaging ==25.0
  • pandas ==2.3.0
  • pillow ==10.4.0
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.2
  • scipy ==1.15.3
  • seaborn ==0.13.2
  • six ==1.17.0
  • tabulate ==0.9.0
  • tzdata ==2025.2
  • wadler-lindig ==0.1.7
uv.lock pypi
  • 143 dependencies
.github/workflows/pypi-publish.yaml actions
  • actions/checkout v4 composite
  • astral-sh/setup-uv v6 composite