dit

dit: a Python package for discrete information theory - Published in JOSS (2018)

https://github.com/dit/dit

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    3 of 22 committers (13.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

information-theory python

Scientific Fields

Sociology (Social Sciences) - 40% confidence
Last synced: 4 months ago

Repository

Python package for information theory.

Basic Info
  • Host: GitHub
  • Owner: dit
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage: http://docs.dit.io
  • Size: 4.42 MB
Statistics
  • Stars: 551
  • Watchers: 24
  • Forks: 90
  • Open Issues: 33
  • Releases: 17
Topics
information-theory python
Created about 12 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.rst

``dit`` is a Python package for information theory.

|build| |codecov| |codacy| |deps|

|docs| |slack| |saythanks| |conda|

|joss| |zenodo|

Try ``dit`` live: |binder|

Introduction
------------

Information theory is a powerful extension to probability and statistics, quantifying dependencies
among arbitrary random variables in a way that is consistent and comparable across systems and
scales. Information theory was originally developed to quantify how quickly and reliably information
could be transmitted across an arbitrary channel. The demands of modern, data-driven science have
been co-opting and extending these quantities and methods into novel, multivariate settings where
interpretations and best practices are not yet established. For example, there are at least four reasonable
multivariate generalizations of the mutual information, none of which inherit all the
interpretations of the standard bivariate case. Which is best to use is context-dependent. ``dit``
implements a vast range of multivariate information measures in an effort to allow information
practitioners to study how these various measures behave and interact in a variety of contexts. We
hope that having all these measures and techniques implemented in one place will allow the
development of robust techniques for the automated quantification of dependencies within a system
and concrete interpretation of what those dependencies mean.
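
For a concrete sense of how these generalizations can disagree, here is a minimal sketch (function
names as documented in ``dit.multivariate``; outputs rounded for readability) comparing four of them
on the exclusive-or distribution:

.. code:: python

   >>> import dit
   >>> # Joint distribution of (X, Y, Z) with Z = xor(X, Y).
   >>> d = dit.Distribution(['000', '011', '101', '110'], [1/4]*4)
   >>> dit.multivariate.coinformation(d)            # can be negative
   -1.0
   >>> dit.multivariate.total_correlation(d)
   1.0
   >>> dit.multivariate.dual_total_correlation(d)
   2.0
   >>> dit.multivariate.caekl_mutual_information(d)
   0.5

All four reduce to the ordinary mutual information in the two-variable case, yet they assign four
different values to this three-variable system.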

Citing
------

If you use `dit` in your research, please cite it as::

   @article{dit,
     Author = {James, R. G. and Ellison, C. J. and Crutchfield, J. P.},
     Title = {{dit}: a {P}ython package for discrete information theory},
     Journal = {The Journal of Open Source Software},
     Volume = {3},
     Number = {25},
     Pages = {738},
     Year = {2018},
     Doi = {10.21105/joss.00738}
   }

Basic Information
-----------------

Documentation
*************

http://docs.dit.io

Downloads
*********

https://pypi.org/project/dit/

https://anaconda.org/conda-forge/dit

Dependencies
~~~~~~~~~~~~
* Python 3.3+
* `boltons <https://pypi.org/project/boltons/>`_
* `debtcollector <https://pypi.org/project/debtcollector/>`_
* `lattices <https://pypi.org/project/lattices/>`_
* `networkx <https://pypi.org/project/networkx/>`_
* `numpy <https://pypi.org/project/numpy/>`_
* `PLTable <https://pypi.org/project/PLTable/>`_
* `scipy <https://pypi.org/project/scipy/>`_

Optional Dependencies
~~~~~~~~~~~~~~~~~~~~~
* colorama: colored column heads in PID indicating failure modes
* cython: faster sampling from distributions
* hypothesis: random sampling of distributions
* matplotlib, python-ternary: plotting of various information-theoretic expansions
* numdifftools: numerical evaluation of gradients and Hessians during optimization
* pint: add units to informational values
* scikit-learn: faster nearest-neighbor lookups during entropy/mutual information estimation from samples

Install
*******

The easiest way to install is:

.. code-block:: bash

  pip install dit

If you want to install `dit` within a conda environment, you can simply do:

.. code-block:: bash

  conda install -c conda-forge dit

Alternatively, you can clone this repository, move into the newly created
``dit`` directory, and then install the package:

.. code-block:: bash

  git clone https://github.com/dit/dit.git
  cd dit
  pip install .

.. note::

  The Cython extensions are currently not supported on Windows. Please install
  using the ``--nocython`` option.


Testing
*******
.. code-block:: shell

  $ git clone https://github.com/dit/dit.git
  $ cd dit
  $ pip install -r requirements_testing.txt
  $ py.test

Code and bug tracker
********************

https://github.com/dit/dit

License
*******

BSD 3-Clause, see LICENSE.txt for details.

Implemented Measures
--------------------

``dit`` implements the following information measures. Most of these are implemented in multivariate and conditional
generality, where such generalizations either exist in the literature or are relatively obvious --- for example,
though it is not in the literature, the multivariate conditional exact common information is implemented here. A
short usage example follows the table.

+------------------------------------------+-----------------------------------------+-----------------------------------+
| Entropies                                | Mutual Informations                     | Divergences                       |
|                                          |                                         |                                   |
| * Shannon Entropy                        | * Co-Information                        | * Variational Distance            |
| * Renyi Entropy                          | * Interaction Information               | * Kullback-Leibler Divergence \   |
| * Tsallis Entropy                        | * Total Correlation /                   |   Relative Entropy                |
| * Necessary Conditional Entropy          |   Multi-Information                     | * Cross Entropy                   |
| * Residual Entropy /                     | * Dual Total Correlation /              | * Jensen-Shannon Divergence       |
|   Independent Information /              |   Binding Information                   | * Earth Mover's Distance          |
|   Variation of Information               | * CAEKL Multivariate Mutual Information +-----------------------------------+
+------------------------------------------+-----------------------------------------+ Other Measures                    |
| Common Informations                      | Partial Information Decomposition       |                                   |
|                                          |                                         | * Channel Capacity                |
| * Gacs-Korner Common Information         | * :math:`I_{min}`                       | * Complexity Profile              |
| * Wyner Common Information               | * :math:`I_{\wedge}`                    | * Connected Informations          |
| * Exact Common Information               | * :math:`I_{RR}`                        | * Copy Mutual Information         |
| * Functional Common Information          | * :math:`I_{\downarrow}`                | * Cumulative Residual Entropy     |
| * MSS Common Information                 | * :math:`I_{proj}`                      | * Extropy                         |
+------------------------------------------+ * :math:`I_{BROJA}`                     | * Hypercontractivity Coefficient  |
| Secret Key Agreement Bounds              | * :math:`I_{ccs}`                       | * Information Bottleneck          |
|                                          | * :math:`I_{\pm}`                       | * Information Diagrams            |
| * Secrecy Capacity                       | * :math:`I_{dep}`                       | * Information Trimming            |
| * Intrinsic Mutual Information           | * :math:`I_{RAV}`                       | * Lautum Information              |
| * Reduced Intrinsic Mutual Information   | * :math:`I_{mmi}`                       | * LMPR Complexity                 |
| * Minimal Intrinsic Mutual Information   | * :math:`I_{\prec}`                     | * Marginal Utility of Information |
| * Necessary Intrinsic Mutual Information | * :math:`I_{RA}`                        | * Maximum Correlation             |
| * Two-Part Intrinsic Mutual Information  | * :math:`I_{SKAR}`                      | * Maximum Entropy Distributions   |
|                                          | * :math:`I_{IG}`                        | * Perplexity                      |
|                                          | * :math:`I_{RDR}`                       | * Rate-Distortion Theory          |
|                                          |                                         | * TSE Complexity                  |
+------------------------------------------+-----------------------------------------+-----------------------------------+
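
As a quick illustration of the divergence measures (a minimal sketch; outputs rounded), compare a
fair coin to a biased one:

.. code:: python

   >>> import dit
   >>> fair = dit.Distribution(['H', 'T'], [1/2, 1/2])
   >>> biased = dit.Distribution(['H', 'T'], [9/10, 1/10])
   >>> dit.divergences.kullback_leibler_divergence(fair, biased)
   0.737
   >>> dit.divergences.jensen_shannon_divergence([fair, biased])
   0.147

The Jensen-Shannon divergence is symmetric in its arguments, while the Kullback-Leibler divergence
is not; both are reported in bits here.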

Quickstart
----------

The basic usage of ``dit`` corresponds to creating distributions, modifying them
if need be, and then computing properties of those distributions. First, we
import:

.. code:: python

   >>> import dit

Suppose we have a really thick coin, one so thick that there is a reasonable
chance of it landing on its edge. Here is how we might represent the coin in
``dit``.

.. code:: python

   >>> d = dit.Distribution(['H', 'T', 'E'], [.4, .4, .2])
   >>> print(d)
   Class:          Distribution
   Alphabet:       ('E', 'H', 'T') for all rvs
   Base:           linear
   Outcome Class:  str
   Outcome Length: 1
   RV Names:       None

   x   p(x)
   E   0.2
   H   0.4
   T   0.4

Calculate the probability of ``H`` and also of the combination ``H or T``.

.. code:: python

   >>> d['H']
   0.4
   >>> d.event_probability(['H','T'])
   0.8

Calculate the Shannon entropy and extropy of the joint distribution.

.. code:: python

   >>> dit.shannon.entropy(d)
   1.5219280948873621
   >>> dit.other.extropy(d)
   1.1419011889093373

Create a distribution where ``Z = xor(X, Y)``.

.. code:: python

   >>> import dit.example_dists
   >>> d = dit.example_dists.Xor()
   >>> d.set_rv_names(['X', 'Y', 'Z'])
   >>> print(d)
   Class:          Distribution
   Alphabet:       ('0', '1') for all rvs
   Base:           linear
   Outcome Class:  str
   Outcome Length: 3
   RV Names:       ('X', 'Y', 'Z')

   x     p(x)
   000   0.25
   011   0.25
   101   0.25
   110   0.25

Calculate the Shannon mutual informations ``I[X:Z]``, ``I[Y:Z]``, and
``I[X,Y:Z]``.

.. code:: python

   >>> dit.shannon.mutual_information(d, ['X'], ['Z'])
   0.0
   >>> dit.shannon.mutual_information(d, ['Y'], ['Z'])
   0.0
   >>> dit.shannon.mutual_information(d, ['X', 'Y'], ['Z'])
   1.0

Calculate the marginal distribution ``P(X,Z)``.
Then print its probabilities as fractions, showing the mask.

.. code:: python

   >>> d2 = d.marginal(['X', 'Z'])
   >>> print(d2.to_string(show_mask=True, exact=True))
   Class:          Distribution
   Alphabet:       ('0', '1') for all rvs
   Base:           linear
   Outcome Class:  str
   Outcome Length: 2 (mask: 3)
   RV Names:       ('X', 'Z')

   x     p(x)
   0*0   1/4
   0*1   1/4
   1*0   1/4
   1*1   1/4

Convert the distribution probabilities to log (base 3.5) probabilities, and
access its probability mass function.

.. code:: python

   >>> d2.set_base(3.5)
   >>> d2.pmf
   array([-1.10658951, -1.10658951, -1.10658951, -1.10658951])

Draw 5 random samples from this distribution.

.. code:: python

   >>> dit.math.prng.seed(1)
   >>> d2.rand(5)
   ['01', '10', '00', '01', '00']
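
Finally, returning to the xor distribution ``d`` from above, a brief sketch using
``dit.shannon.conditional_entropy`` confirms that ``Z`` carries no uncertainty once both inputs are
known, while either input alone leaves it completely uncertain:

.. code:: python

   >>> dit.shannon.conditional_entropy(d, ['Z'], ['X', 'Y'])
   0.0
   >>> dit.shannon.conditional_entropy(d, ['Z'], ['X'])
   1.0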

Contributions & Help
--------------------

If you'd like a feature added to ``dit``, please file an issue. Or, better yet, open a pull request. Ideally, all code should be tested and documented, but please don't let this be a barrier to contributing. We'll work with you to ensure that all pull requests are in a mergeable state.

If you'd like to get in contact about anything, you can reach us through our `Slack channel <https://dit-python.slack.com/>`_.


.. badges:

.. |build| image:: https://github.com/dit/dit/actions/workflows/build.yml/badge.svg
   :target: https://github.com/dit/dit/actions/workflows/build.yml
   :alt: Continuous Integration Status

.. |codecov| image:: https://codecov.io/gh/dit/dit/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/dit/dit
  :alt: Test Coverage Status

.. |coveralls| image:: https://coveralls.io/repos/dit/dit/badge.svg?branch=master
   :target: https://coveralls.io/r/dit/dit?branch=master
   :alt: Test Coverage Status

.. |docs| image:: https://readthedocs.org/projects/dit/badge/?version=latest
   :target: http://dit.readthedocs.org/en/latest/?badge=latest
   :alt: Documentation Status

.. |health| image:: https://landscape.io/github/dit/dit/master/landscape.svg?style=flat
   :target: https://landscape.io/github/dit/dit/master
   :alt: Code Health

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/b1beeea8ada647d49f97648216fd9687
   :target: https://www.codacy.com/app/Autoplectic/dit?utm_source=github.com&utm_medium=referral&utm_content=dit/dit&utm_campaign=Badge_Grade
   :alt: Code Quality

.. |deps| image:: https://requires.io/github/dit/dit/requirements.svg?branch=master
   :target: https://requires.io/github/dit/dit/requirements/?branch=master
   :alt: Requirements Status

.. |conda| image:: https://anaconda.org/conda-forge/dit/badges/installer/conda.svg
   :target: https://anaconda.org/conda-forge/dit
   :alt: Conda installation

.. |zenodo| image:: https://zenodo.org/badge/13201610.svg
   :target: https://zenodo.org/badge/latestdoi/13201610
   :alt: DOI

.. |gitter| image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/dit/dit?utm_source=badge&utm_medium=badge
   :alt: Join the Chat

.. |saythanks| image:: https://img.shields.io/badge/SayThanks.io-%E2%98%BC-1EAEDB.svg
   :target: https://saythanks.io/to/Autoplectic
   :alt: Say Thanks!

.. |depsy| image:: http://depsy.org/api/package/pypi/dit/badge.svg
   :target: http://depsy.org/package/python/dit
   :alt: Research software impact

.. |waffle| image:: https://badge.waffle.io/dit/dit.png?label=ready&title=Ready
   :target: https://waffle.io/dit/dit?utm_source=badge
   :alt: Stories in Ready

.. |slack| image:: https://img.shields.io/badge/Slack-dit--python-lightgrey.svg
   :target: https://dit-python.slack.com/
   :alt: dit chat

.. |joss| image:: http://joss.theoj.org/papers/10.21105/joss.00738/status.svg
   :target: https://doi.org/10.21105/joss.00738
   :alt: JOSS Status

.. |binder| image:: https://mybinder.org/badge.svg
   :target: https://mybinder.org/v2/gh/dit/dit/master?filepath=examples
   :alt: Run `dit` live!

Owner

  • Name: dit
  • Login: dit
  • Kind: organization

JOSS Publication

dit: a Python package for discrete information theory
Published
May 31, 2018
Volume 3, Issue 25, Page 738
Authors
Ryan G. James ORCID
Complexity Sciences Center, Department of Physics, University of California at Davis
Christopher J. Ellison
James P. Crutchfield
Complexity Sciences Center, Department of Physics, University of California at Davis
Editor
Arfon Smith ORCID
Tags
information theory

Citation (CITATION)

# Citation

## bibtex

@article{dit,
    Author = {James, R. G. and Ellison, C. J. and Crutchfield, J. P.},
    Title = {{dit}: a {P}ython package for discrete information theory},
    Journal = {The Journal of Open Source Software},
    Volume = {3},
    Number = {25},
    Pages = {738},
    Year = {2018},
    Doi = {10.21105/joss.00738}
}

## other

dit: a Python package for discrete information theory
RG James, CJ Ellison, JP Crutchfield
The Journal of Open Source Software 3 (25), 738

Papers & Mentions

Total mentions: 5

MENSAdb: a thorough structural analysis of membrane protein dimers
Last synced: 2 months ago
Unique Information and Secret Key Agreement
Last synced: 2 months ago
How Complexity and Uncertainty Grew with Algorithmic Trading
Last synced: 2 months ago
MAXENT3D_PID: An Estimator for the Maximum-Entropy Trivariate Partial Information Decomposition
Last synced: 2 months ago
A Tutorial for Information Theory in Neuroscience
Last synced: 2 months ago

GitHub Events

Total
  • Issues event: 3
  • Watch event: 34
  • Issue comment event: 2
  • Push event: 8
  • Pull request event: 1
  • Fork event: 5
Last Year
  • Issues event: 3
  • Watch event: 34
  • Issue comment event: 2
  • Push event: 8
  • Pull request event: 1
  • Fork event: 5

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,800
  • Total Committers: 22
  • Avg Commits per committer: 81.818
  • Development Distribution Score (DDS): 0.354
Past Year
  • Commits: 10
  • Committers: 2
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.1
Top Committers
Name Email Commits
Ryan James r****s@g****m 1,162
chebee7i c****i@g****m 478
Ryan James r****s@r****m 99
Marc Harper m****r@g****m 29
Artemy Kolchinsky a****k@g****m 5
Scott Sievert s****t 5
Ryan Gregory James r****s@g****m 4
Ryan G James r****s@d****u 2
Elias Jaffe 3****e 2
Robin Ince g****b@r****t 2
Making GitHub Delicious i****n@w****o 1
Kunal Marwaha m****a@b****u 1
Harald Schilly h****y@g****m 1
Freya Behrens f****s 1
Diego Volpatto v****o@l****r 1
Aaron Griffith a****i@g****m 1
Pattarawat Chormai p****i@g****m 1
Thomas Kluyver t****l@g****m 1
jemenheiser j****r@u****u 1
kokokostation k****n@g****m 1
tobmag t****s@g****e 1
赵丰 (Zhao Feng) 6****8@q****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 61
  • Total pull requests: 45
  • Average time to close issues: over 1 year
  • Average time to close pull requests: 26 days
  • Total issue authors: 25
  • Total pull request authors: 19
  • Average comments per issue: 1.13
  • Average comments per pull request: 1.16
  • Merged pull requests: 42
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 26 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Autoplectic (29)
  • stsievert (4)
  • chebee7i (3)
  • whyihaveyou (2)
  • juancarlosfarah (2)
  • vigneswaran-chandrasekaran (2)
  • Benjamin-Lee (1)
  • ivan-marroquin (1)
  • d-jeon (1)
  • ZBC043 (1)
  • fabioanza (1)
  • alimasn (1)
  • wuhaochen (1)
  • htcml (1)
  • ilongshan (1)
Pull Request Authors
  • Autoplectic (16)
  • artemyk (6)
  • stsievert (5)
  • Ejjaffe (2)
  • robince (2)
  • takluyver (2)
  • volpatto (1)
  • zhaofeng-shu33 (1)
  • haraldschilly (1)
  • kokokostation (1)
  • feeds (1)
  • marwahaha (1)
  • tobmag (1)
  • j1c (1)
  • agrif (1)
Top Labels
Issue Labels
enhancement (16) measure (10) channels (4) PID (3) distribution (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi: 3,646 last month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 18
    (may contain duplicates)
  • Total versions: 44
  • Total maintainers: 1
proxy.golang.org: github.com/dit/dit
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 3.2%
Forks count: 3.2%
Average: 4.3%
Dependent packages count: 5.2%
Dependent repos count: 5.6%
Last synced: 4 months ago
pypi.org: dit

Python package for information theory.

  • Versions: 36
  • Dependent Packages: 0
  • Dependent Repositories: 18
  • Downloads: 3,646 Last month
Rankings
Dependent repos count: 3.4%
Average: 7.0%
Downloads: 7.4%
Dependent packages count: 10.0%
Maintainers (1)
Last synced: 4 months ago
conda-forge.org: dit

Information theory is a powerful extension to probability and statistics, quantifying dependencies among arbitrary random variables in a way that is consistent and comparable across systems and scales. Information theory was originally developed to quantify how quickly and reliably information could be transmitted across an arbitrary channel. The demands of modern, data-driven science have been co-opting and extending these quantities and methods into novel, multivariate settings where interpretations and best practices are not yet established. For example, there are at least four reasonable multivariate generalizations of the mutual information, none of which inherit all the interpretations of the standard bivariate case. Which is best to use is context-dependent. dit implements a vast range of multivariate information measures in an effort to allow information practitioners to study how these various measures behave and interact in a variety of contexts. We hope that having all these measures and techniques implemented in one place will allow the development of robust techniques for the automated quantification of dependencies within a system and concrete interpretation of what those dependencies mean.

  • Homepage: http://dit.io
  • License: BSD-3-Clause
  • Latest release: 1.2.3
    published over 6 years ago
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 17.5%
Forks count: 19.2%
Average: 30.5%
Dependent repos count: 34.0%
Dependent packages count: 51.2%
Last synced: 4 months ago