hnet

Association rule-based networks using graphical Hypergeometric Networks.

https://github.com/erdogant/hnet

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Keywords

association association-analysis association-learning d3-visualization d3js exploratory-data-analysis graph hypergeometric-distribution inference network network-analysis
Last synced: 6 months ago · JSON representation

Repository

Association rule-based networks using graphical Hypergeometric Networks.

Basic Info
Statistics
  • Stars: 31
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 37
Topics
association association-analysis association-learning d3-visualization d3js exploratory-data-analysis graph hypergeometric-distribution inference network network-analysis
Created about 6 years ago · Last pushed 6 months ago
Metadata Files
Readme Funding License Citation

README.md


hnet is a Python package for association learning with graphical Hypergeometric Networks. It tests associations across variables for significance using statistical inference and returns a network of the significant associations. Input can range from generic dataframes to nested data structures with lists, missing values, and enumerations. ⭐️Star it if you like it⭐️

Key Features

| Feature | Description | Docs | Medium | Gumroad+Podcast |
|---------|-------------|------|--------|-----------------|
| Association Learning | Discover significant associations across variables using statistical inference. | Link | Link | Link |
| Mixed Data Handling | Works with continuous, discrete, categorical, and nested variables without heavy preprocessing. | Link | - | - |
| Summarization | Summarize complex networks into interpretable structures. | Link | - | - |
| Feature Importance | Rank variables by importance within associations. | Link | - | - |
| Interactive Visualizations | Explore results with dynamic dashboards and d3-based visualizations. | Dashboard | - | Titanic Example |
| Performance Evaluation | Compare accuracy with Bayesian association learning and benchmarks. | Link | - | - |
| Interactive Dashboard | No data leaves your machine. All computations are performed locally. | Link | - | - |


Resources and Links


Background

  • HNet stands for graphical Hypergeometric Networks, which is a method where associations across variables are tested for significance by statistical inference. The aim is to determine a network with significant associations that can shed light on the complex relationships across variables. Input datasets can range from generic dataframes to nested data structures with lists, missing values and enumerations.

  • Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing steps, and it remains a challenge to describe the relationships between variables. The data understanding phase is crucial to the data-mining process; however, without making any assumptions about the data, the search space is super-exponential in the number of variables. A thorough data understanding phase is therefore not common practice.

  • Graphical hypergeometric networks (HNet) is a method to test associations across variables for significance using statistical inference. The aim is to determine a network using only the significant associations in order to shed light on the complex relationships across variables. HNet processes raw unstructured data sets and outputs a network that consists of (partially) directed or undirected edges between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used well-known data sets and generated data sets with known ground truth. In addition, the performance of HNet is compared to Bayesian association learning.

  • HNet showed high accuracy and performance in the detection of node links. In the case of the Alarm data set we can demonstrate an average MCC score of 0.33 ± 0.0002 (P<1x10^-6), whereas Bayesian association learning resulted in an average MCC score of 0.52 ± 0.006 (P<1x10^-11), and randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P=0.49). HNet can process raw unstructured data sets, allows analysis of mixed data types, scales up easily in the number of variables, and allows detailed examination of the detected associations.
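The significance test described above can be illustrated with a small, self-contained sketch. It assumes that the association between two labels is scored with a one-sided hypergeometric test on their overlap. The helper names (`hypergeom_sf`, `label_association`) and the toy records are hypothetical and are not part of the hnet API; the library's own pipeline does considerably more than this.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: items carrying label A,
    n: items carrying label B, k: items carrying both labels.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def label_association(records, label_a, label_b):
    """Score the overlap of two (column, value) labels (hypothetical helper)."""
    (col_a, val_a), (col_b, val_b) = label_a, label_b
    N = len(records)
    K = sum(r[col_a] == val_a for r in records)
    n = sum(r[col_b] == val_b for r in records)
    k = sum(r[col_a] == val_a and r[col_b] == val_b for r in records)
    return hypergeom_sf(k, N, K, n)


# Toy records made up for illustration (not the Titanic data set):
# 5 females of whom 4 survived, 5 males of whom 1 survived.
records = (
    [{"sex": "female", "survived": 1}] * 4
    + [{"sex": "female", "survived": 0}] * 1
    + [{"sex": "male", "survived": 1}] * 1
    + [{"sex": "male", "survived": 0}] * 4
)

p_value = label_association(records, ("sex", "female"), ("survived", 1))
print(f"P(overlap >= observed) = {p_value:.4f}")
```

A small P-value means the two labels co-occur more often than expected by chance; with only ten records the toy example does not reach conventional significance thresholds.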


Installation

Install hnet from PyPI

```bash
pip install hnet
```

Install from Github source

```bash
pip install git+https://github.com/erdogant/hnet
```

Import library

```python
import hnet
print(hnet.__version__)

# Import library
from hnet import hnet
```


Installation

  • Install hnet from PyPI (recommended).

```bash
pip install -U hnet
```

Examples

  • Simple example for the Titanic data set

```python
# Initialize hnet with default settings
from hnet import hnet

# Load example dataset
df = hnet.import_example('titanic')

# Print to screen
print(df)
```

#      PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
# 0              1         0       3  ...   7.2500   NaN         S
# 1              2         1       1  ...  71.2833   C85         C
# 2              3         1       3  ...   7.9250   NaN         S
# 3              4         1       1  ...  53.1000  C123         S
# 4              5         0       3  ...   8.0500   NaN         S
# ..           ...       ...     ...  ...      ...   ...       ...
# 886          887         0       2  ...  13.0000   NaN         S
# 887          888         1       1  ...  30.0000   B42         S
# 888          889         0       3  ...  23.4500   NaN         S
# 889          890         1       1  ...  30.0000  C148         C
# 890          891         0       3  ...   7.7500   NaN         Q
Play with the interactive Titanic results.

Example: Association learning on the Titanic dataset

Example: Summarize results

Networks can become giant hairballs and heatmaps unreadable. You may want to see the general associations between the categories, instead of the label-associations. With the summarize functionality, the results will be summarized towards categories.
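The idea of collapsing label-level associations into category-level ones can be sketched as follows. The aggregation rule (keeping the smallest P-value per category pair) is an assumption chosen for illustration and is not necessarily how hnet summarizes; the function name `summarize_to_categories` and all P-values are hypothetical.

```python
def summarize_to_categories(label_pvalues):
    """Collapse label-pair P-values to category-pair P-values.

    label_pvalues maps ((category_a, label_a), (category_b, label_b)) -> P.
    Assumption: each category pair keeps its smallest label-level P-value.
    """
    summary = {}
    for ((cat_a, _), (cat_b, _)), p in label_pvalues.items():
        key = (cat_a, cat_b)
        summary[key] = min(p, summary.get(key, 1.0))
    return summary


# Made-up label-level results, for illustration only.
label_pvalues = {
    (("Sex", "female"), ("Survived", "1")): 1e-6,
    (("Sex", "male"), ("Survived", "0")): 1e-5,
    (("Pclass", "1"), ("Survived", "1")): 1e-3,
    (("Pclass", "3"), ("Survived", "0")): 2e-4,
}

category_pvalues = summarize_to_categories(label_pvalues)
for (cat_a, cat_b), p in category_pvalues.items():
    print(f"{cat_a} -> {cat_b}: P = {p:g}")
```

Four label-level edges reduce to two category-level edges here, which is the kind of reduction that keeps a large network readable.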

Example: Feature importance

Performance
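The MCC scores quoted in the Background section compare detected edges with a known ground truth. A minimal sketch of that kind of evaluation is shown below, assuming undirected edges and a confusion matrix counted over all possible node pairs. The function name `edge_mcc` and the toy graph are hypothetical, and the original benchmark may have been set up differently.

```python
from itertools import combinations
from math import sqrt


def edge_mcc(nodes, true_edges, predicted_edges):
    """Matthews correlation coefficient for undirected edge detection."""
    true_set = {frozenset(e) for e in true_edges}
    pred_set = {frozenset(e) for e in predicted_edges}
    tp = fp = fn = tn = 0
    for pair in combinations(nodes, 2):
        edge = frozenset(pair)
        in_true, in_pred = edge in true_set, edge in pred_set
        if in_true and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_true:
            fn += 1
        else:
            tn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the score is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy graph made up for illustration.
nodes = ["A", "B", "C", "D"]
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
predicted_edges = [("A", "B"), ("B", "C"), ("A", "D")]

score = edge_mcc(nodes, true_edges, predicted_edges)
print(f"MCC = {score:.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to random assignment, which is why the random-edge baseline in the Background section scores close to zero.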


Contributors

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • This library is free. But powered by caffeine! Like it? Chip in what it's worth, and keep me creating new functionalities!🙂

Buy me a coffee

Owner

  • Name: Erdogan
  • Login: erdogant
  • Kind: user
  • Location: Den Haag

Machine Learning | Statistics | Bayesian | D3js | Visualizations

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Push event: 16
Last Year
  • Release event: 1
  • Watch event: 2
  • Push event: 16

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 430
  • Total Committers: 3
  • Avg Commits per committer: 143.333
  • Development Distribution Score (DDS): 0.007
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
erdogant e****t@g****m 427
Erdogan Taskesen 5****a 2
Oliver Verver o****r@s****l 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: 4 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 5.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nsankar (1)
  • bdatko (1)
Pull Request Authors
  • oliver3 (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 85 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 30
  • Total maintainers: 1
pypi.org: hnet

Graphical Hypergeometric Networks

  • Versions: 30
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 85 Last month
Rankings
Dependent packages count: 10.0%
Downloads: 11.1%
Stargazers count: 13.3%
Average: 15.1%
Forks count: 19.1%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/source/requirements.txt pypi
  • pipinstallsphinx_rtd_theme *
requirements-dev.txt pypi
  • pytest * development
  • rst2pdf * development
  • sphinx * development
  • sphinx_rtd_theme * development
  • spyder-kernels * development
requirements.txt pypi
  • classeval *
  • colourmap *
  • d3graph ==
  • df2onehot *
  • fsspec *
  • imagesc *
  • ismember *
  • matplotlib *
  • networkx *
  • numpy *
  • pandas *
  • pypickle *
  • python-louvain *
  • seaborn *
  • sklearn *
  • statsmodels *
  • tqdm *
  • wget *
setup.py pypi
  • classeval *
  • colourmap *
  • d3graph ==1.0.3
  • d3heatmap *
  • df2onehot *
  • fsspec *
  • imagesc *
  • ismember *
  • matplotlib *
  • networkx *
  • numpy *
  • pandas *
  • pypickle *
  • python-louvain *
  • sklearn *
  • statsmodels *
  • tqdm *
  • wget *
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite