infohdpy

https://github.com/damianghl/infohdpy

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to mdpi.com
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: damianghl
Language: Jupyter Notebook
Default Branch: master
Size: 1.47 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 3

Created over 1 year ago · Last pushed 10 months ago

Metadata Files

Readme Citation

infohdpy: Python package for mutual information estimation using Hierarchical Dirichlet Priors

Estimating the mutual information between discrete variables with limited samples.

Installation and requirements

Clone the Repository:

```bash
git clone https://github.com/dghernandez/info-estimation.git
cd info-estimation
```

Check and Install Requirements:

```bash
python --version
pip install -r requirements.txt
```

Install the Package:

```bash
pip install -e .
```

Basic usage

```python
# Import
from infohdp.estimators import MulticlassFullInfoHDPEstimator

# Create an instance of Estimator for multiclass case
estimator = MulticlassFullInfoHDPEstimator()

# Samples in format [(x0, y0), (x1, y1), (x2, y2), ...]
samples = [(15, 1), (35, 0), (2, 0), (29, 1), (35, 0), (35, 0), (21, 1), (21, 0), (29, 1), (21, 1)]

# Returns the mutual information estimate and its uncertainty, both in nats
ihdp, dihdp = estimator.estimate_mutual_information(samples)
print(f"Ihdp full multiclass, mutual information estimation [nats]: {ihdp:.4f} ± {dihdp:.4f}")
```

Core calculations and conditions

The core of the code is simple and can be implemented in any programming language: (1) maximize the marginal log-likelihood (or posterior) over the hyperparameter beta, and (2) evaluate the posterior information at that beta.

First, we need to obtain the ML (or MAP) estimate of the hyperparameter beta (from Eq. (14) in the paper, or Eq. (21) for the symmetric binary case). I recommend doing this maximization over the argument log(beta), exploring an interval whose lower end is log(0.01). Only the states with coincidences (n_x>1) need to be considered, as the others add a term that is constant in beta. In an undersampled regime, many terms are repeated, and they can be grouped together for a more efficient evaluation (using multiplicities). Second, we evaluate the posterior information (see Eq. (16) in the paper, or Eq. (20) for the symmetric binary case) at the beta found previously. In this evaluation, all occupied states (n_x>0) need to be included.
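The first step can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: it assumes a Dirichlet-multinomial form for the marginal likelihood with prior concentration beta*q_y on each conditional distribution, takes q_y as the plug-in marginal of Y, and uses a simple grid search. The function names, the upper bound log(100) and the grid size are all hypothetical choices made for this example; the exact expression to maximize is Eq. (14) in the paper.

```python
import math
from collections import Counter


def coincidence_profiles(samples):
    """Group states x with n_x > 1 by their count vector over y.

    Returns (profiles, q_y): profiles maps a tuple of counts n_xy
    (one entry per sorted y value) to its multiplicity.
    """
    n = len(samples)
    y_values = sorted({y for _, y in samples})
    n_y = Counter(y for _, y in samples)
    q_y = [n_y[y] / n for y in y_values]  # plug-in marginal of Y (assumption)
    n_xy = Counter(samples)
    profiles = Counter()
    for x in {x for x, _ in samples}:
        counts = tuple(n_xy[(x, y)] for y in y_values)
        if sum(counts) > 1:  # singletons only add a beta-independent constant
            profiles[counts] += 1
    return profiles, q_y


def log_marginal_likelihood(log_beta, profiles, q_y):
    """Assumed Dirichlet-multinomial log-likelihood, up to a constant in beta."""
    beta = math.exp(log_beta)
    total = 0.0
    for counts, multiplicity in profiles.items():
        term = math.lgamma(beta) - math.lgamma(beta + sum(counts))
        for n_xy, q in zip(counts, q_y):
            term += math.lgamma(beta * q + n_xy) - math.lgamma(beta * q)
        total += multiplicity * term  # repeated profiles are evaluated once
    return total


def maximize_log_beta(samples, lo=math.log(0.01), hi=math.log(100.0), num=400):
    """Grid search over log(beta); the upper bound `hi` is an assumed value."""
    profiles, q_y = coincidence_profiles(samples)
    grid = [lo + (hi - lo) * i / (num - 1) for i in range(num)]
    # With no coincidences the likelihood is flat and the first grid point is returned.
    return max(grid, key=lambda lb: log_marginal_likelihood(lb, profiles, q_y))


samples = [(15, 1), (35, 0), (2, 0), (29, 1), (35, 0),
           (35, 0), (21, 1), (21, 0), (29, 1), (21, 1)]
log_beta_ml = maximize_log_beta(samples)
print(f"beta maximizing the assumed likelihood: {math.exp(log_beta_ml):.3f}")
```

Grouping by count profile matters most when X has many states that are each visited only a few times, since then a handful of distinct profiles cover most of the coincidences. A bounded scalar optimizer could replace the grid search without changing the rest of the sketch.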

Our method needs coincidences in the large-entropy variable X, which start to occur when N > exp(H(X)/2). If there are no coincidences, the marginal likelihood is flat in beta. If there are few coincidences and no prior on beta is used, the maximum may be attained for beta tending to zero or to infinity. In such cases the posterior information is still well defined and takes the value H(Y) or zero, respectively.
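The two limiting values can be checked numerically with a rough sketch of the second step. The code below is an illustration under stated assumptions, not the package's code: it uses the standard posterior mean of the Shannon entropy under a Dirichlet distribution for each occupied state, weights the states by n_x/N, and takes a plug-in estimate of H(Y). These choices may differ in detail from Eq. (16) in the paper, and the names `digamma` and `posterior_information` are hypothetical helpers written for this example.

```python
import math
from collections import Counter


def digamma(x):
    """Digamma function for x > 0 (upward recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def posterior_information(samples, beta):
    """Assumed form of the posterior information at a given beta, in nats."""
    n = len(samples)
    y_values = sorted({y for _, y in samples})
    n_y = Counter(y for _, y in samples)
    q_y = [n_y[y] / n for y in y_values]
    h_y = -sum(q * math.log(q) for q in q_y)  # plug-in H(Y) (assumption)
    n_xy = Counter(samples)
    n_x = Counter(x for x, _ in samples)
    h_y_given_x = 0.0
    for x, nx in n_x.items():  # all occupied states, including singletons
        a_total = beta + nx
        h_x = digamma(a_total + 1.0)
        for y, q in zip(y_values, q_y):
            a = beta * q + n_xy[(x, y)]
            h_x -= (a / a_total) * digamma(a + 1.0)
        h_y_given_x += (nx / n) * h_x  # weight n_x / N (assumption)
    return h_y - h_y_given_x


# Every repeated state here has a single y value, the situation in which
# the maximum over beta can drift towards zero.
pure = [(1, 0), (1, 0), (2, 1), (2, 1), (3, 0)]
h_y = -(0.6 * math.log(0.6) + 0.4 * math.log(0.4))
print(f"H(Y) plug-in        : {h_y:.4f}")
print(f"information, beta->0: {posterior_information(pure, 1e-9):.4f}")
print(f"information, beta->inf: {posterior_information(pure, 1e7):.4f}")
```

With SciPy installed (it is listed in requirements.txt), `scipy.special.digamma` could replace the hand-written helper.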

This page is maintained by Damián G. Hernández. (email address in paper https://www.mdpi.com/1099-4300/21/6/623)

Owner

Name: d3mian
Login: damianghl
Kind: user

Repositories: 1
Profile: https://github.com/damianghl

Bayesian Inference, Neural Coding, Information Theory

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  infohdp: Python Package For Mutual Information Estimation
  Using Hierarchical Dirichlet Priors
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Damián G.
    family-names: Hernández
date-released: 2024-09-06
identifiers:
  - type: url
    value: 'https://github.com/damianghl/infohdpy'

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Dependencies

requirements.txt pypi

jupyter *
matplotlib *
ndd *
numpy *
pandas *
scipy *
