wnb

General (mixed) and weighted naive Bayes classifiers.

https://github.com/msamsami/wnb

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.9%) to scientific vocabulary

Keywords

machine-learning ml naive-bayes python
Last synced: 6 months ago

Repository

General (mixed) and weighted naive Bayes classifiers.

Basic Info
  • Host: GitHub
  • Owner: msamsami
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 555 KB
Statistics
  • Stars: 22
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 36
Topics
machine-learning ml naive-bayes python
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation Security

README.md

General and weighted naive Bayes classifiers
Scikit-learn-compatible

![Latest Release](https://img.shields.io/badge/release-v0.8.1-green) [![PyPI Version](https://img.shields.io/pypi/v/wnb)](https://pypi.org/project/wnb/) ![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
![GitHub Workflow Status (build)](https://github.com/msamsami/wnb/actions/workflows/build.yml/badge.svg) [![Coverage](https://codecov.io/gh/msamsami/wnb/graph/badge.svg?token=74EIO9XQUY)](https://codecov.io/gh/msamsami/wnb) ![PyPI License](https://img.shields.io/pypi/l/wnb) [![PyPi Downloads](https://static.pepy.tech/badge/wnb)](https://pepy.tech/project/wnb)

Introduction

Naive Bayes is a widely used classification algorithm known for its simplicity and efficiency. This package extends naive Bayes with more flexible (mixed-distribution) and weighted variants, making it suitable for a broader range of applications.

General naive Bayes

Most standard implementations, such as those in sklearn.naive_bayes, assume a single distribution type for all feature likelihoods. This can be restrictive when dealing with mixed data types. WNB overcomes this limitation by letting users specify a different probability distribution for each feature. You can select from a variety of continuous and discrete distributions, which allows the likelihood model to match each feature's data type and can improve model performance.
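As a rough illustration of what per-feature likelihoods mean, the sketch below fits a tiny naive Bayes model in plain Python with one Gaussian feature and one categorical feature. It does not use the wnb API: the names `fit_mixed_nb` and `predict_mixed_nb`, the `kinds` argument, and the smoothing constant `alpha` are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_mixed_nb(X, y, kinds, alpha=1.0):
    """Fit per-class parameters; kinds[j] is 'normal' or 'categorical' (hypothetical labels)."""
    classes = sorted(set(y))
    model = {"kinds": kinds, "classes": classes, "prior": {}, "params": {}}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        model["prior"][c] = math.log(len(rows) / len(X))
        for j, kind in enumerate(kinds):
            col = [r[j] for r in rows]
            if kind == "normal":
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                model["params"][c, j] = (mean, var)
            else:
                # Laplace-smoothed category frequencies over all categories seen in training.
                categories = sorted({r[j] for r in X})
                counts = Counter(col)
                total = len(col) + alpha * len(categories)
                model["params"][c, j] = {k: (counts[k] + alpha) / total for k in categories}
    return model


def predict_mixed_nb(model, x):
    """Return the class with the highest log prior plus summed per-feature log-likelihoods."""
    def score(c):
        s = model["prior"][c]
        for j, kind in enumerate(model["kinds"]):
            p = model["params"][c, j]
            if kind == "normal":
                s += gaussian_logpdf(x[j], *p)
            else:
                s += math.log(p[x[j]])  # assumes the category was seen during training
        return s
    return max(model["classes"], key=score)


X = [(1.0, "a"), (1.2, "a"), (0.9, "b"), (3.0, "b"), (3.2, "b"), (2.9, "a")]
y = [0, 0, 0, 1, 1, 1]
model = fit_mixed_nb(X, y, kinds=["normal", "categorical"])
print(predict_mixed_nb(model, (1.1, "a")), predict_mixed_nb(model, (3.1, "b")))
```

Each feature contributes its own log-likelihood term, so swapping the distribution for one column leaves the handling of the others unchanged.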

Weighted naive Bayes

While naive Bayes is simple and interpretable, its conditional independence assumption often fails in real-world scenarios. To address this, various attribute-weighted naive Bayes methods exist, but most are computationally expensive and lack mechanisms for handling class imbalance.

The WNB package provides an optimized implementation of Minimum Log-likelihood Difference Weighted Naive Bayes (MLD-WNB), a novel approach that optimizes feature weights using the Bayes optimal decision rule. It also introduces hyperparameters for controlling model bias, making it more robust for imbalanced classification.
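To make the idea of feature weights concrete, here is a minimal sketch of a common attribute-weighted decision rule, in which each feature's log-likelihood is multiplied by a weight before being added to the log prior. It is not the MLD-WNB algorithm and does not use the wnb API: MLD-WNB learns its weights by optimization, whereas the weights, means, and variances below are fixed by hand, and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def weighted_nb_scores(x, log_priors, means, variances, weights):
    """Score per class: log prior + sum_j weights[j] * log p(x[j] | class)."""
    scores = []
    for c in range(len(log_priors)):
        s = log_priors[c]
        for j, xj in enumerate(x):
            s += weights[j] * gaussian_logpdf(xj, means[c][j], variances[c][j])
        scores.append(s)
    return scores


def weighted_nb_predict(x, log_priors, means, variances, weights):
    """Index of the class with the highest weighted score."""
    scores = weighted_nb_scores(x, log_priors, means, variances, weights)
    return max(range(len(scores)), key=scores.__getitem__)


# Two classes, two features, equal priors (all values are illustrative).
log_priors = [math.log(0.5), math.log(0.5)]
means = [[0.0, 0.0], [1.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
x = (1.0, 0.5)

plain = weighted_nb_predict(x, log_priors, means, variances, weights=[1.0, 1.0])
reweighted = weighted_nb_predict(x, log_priors, means, variances, weights=[1.0, 0.0])
print(plain, reweighted)
```

With all weights equal to 1 the rule reduces to ordinary Gaussian naive Bayes. Setting the second weight to 0 removes that feature's influence, which flips the prediction for this sample.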

Installation

This library is shipped as an all-in-one module with minimal dependencies and requirements. It also fully adheres to the Scikit-learn API ❤️.

Prerequisites

Ensure that Python 3.8 or higher is installed on your machine before installing WNB.

PyPi

```bash
pip install wnb
```

uv

```bash
uv add wnb
```

Getting started ⚡️

Here, we show how you can use the library to train general (mixed) and weighted naive Bayes classifiers.

General naive Bayes

A general naive Bayes model can be set up and used in four simple steps:

  1. Import the GeneralNB class as well as Distribution enum class

     ```python
     from wnb import GeneralNB, Distribution as D
     ```

  2. Initialize a classifier with likelihood distributions specified

     ```python
     clf = GeneralNB([D.NORMAL, D.CATEGORICAL, D.EXPONENTIAL, D.EXPONENTIAL])
     ```

     or

     ```python
     # Columns not explicitly specified will default to Gaussian (normal) distribution
     clf = GeneralNB(
         distributions=[
             (D.CATEGORICAL, [1]),
             (D.EXPONENTIAL, ["col3", "col4"]),
         ],
     )
     ```

  3. Fit the classifier to a training set (with four features)

     ```python
     clf.fit(X_train, y_train)
     ```

  4. Predict on test data

     ```python
     clf.predict(X_test)
     ```

Weighted naive Bayes

An MLD-WNB model can be set up and used in four simple steps:

  1. Import the GaussianWNB class

     ```python
     from wnb import GaussianWNB
     ```

  2. Initialize a classifier

     ```python
     clf = GaussianWNB(max_iter=25, step_size=1e-2, penalty="l2")
     ```

  3. Fit the classifier to a training set

     ```python
     clf.fit(X_train, y_train)
     ```

  4. Predict on test data

     ```python
     clf.predict(X_test)
     ```

Compatibility with Scikit-learn 🤝

The wnb library fully adheres to the Scikit-learn API, ensuring seamless integration with other Scikit-learn components and workflows. This means that users familiar with Scikit-learn will find the WNB classifiers intuitive to use.

Both Scikit-learn classifiers and WNB classifiers share these well-known methods:

  • fit(X, y)
  • predict(X)
  • predict_proba(X)
  • predict_log_proba(X)
  • predict_joint_log_proba(X)
  • score(X, y)
  • get_params()
  • set_params(**params)
  • etc.

By maintaining this consistency, WNB classifiers can be easily incorporated into existing machine learning pipelines and processes.
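The practical benefit of a shared interface is that helper code written against `fit`, `predict`, `score`, `get_params`, and `set_params` works with any classifier that exposes them. The sketch below illustrates that contract in plain Python with a hypothetical toy estimator, `MajorityClassifier`, and a hypothetical helper, `fit_and_score`. Neither is part of wnb or Scikit-learn, and a real estimator would normally derive from Scikit-learn's base classes instead.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def __init__(self, fallback=None):
        self.fallback = fallback  # label to use if fit() receives no samples

    def get_params(self, deep=True):
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Learned attributes end with an underscore, following the Scikit-learn convention.
        self.majority_ = Counter(y).most_common(1)[0][0] if len(y) else self.fallback
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

    def score(self, X, y):
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


def fit_and_score(estimator, X_train, y_train, X_test, y_test):
    """Works with any object exposing fit(X, y) -> self and score(X, y)."""
    return estimator.fit(X_train, y_train).score(X_test, y_test)


accuracy = fit_and_score(
    MajorityClassifier(), [[0], [1], [2]], ["a", "a", "b"], [[3], [4]], ["a", "b"]
)
print(accuracy)
```

Because `fit_and_score` relies only on the method names, a WNB classifier could be passed in place of the toy estimator without changing the helper.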

Benchmarks 📊

We conducted benchmarks on four datasets (Wine, Iris, Digits, and Breast Cancer) to evaluate the performance of WNB classifiers and compare them with their Scikit-learn counterpart, GaussianNB. In these benchmarks, the WNB classifier achieved higher accuracy than GaussianNB on all four datasets.

| Dataset       | Scikit-learn Classifier | Accuracy | WNB Classifier | Accuracy |
|---------------|-------------------------|----------|----------------|----------|
| Wine          | GaussianNB              | 0.9749   | GeneralNB      | 0.9812   |
| Iris          | GaussianNB              | 0.9556   | GeneralNB      | 0.9602   |
| Digits        | GaussianNB              | 0.8372   | GeneralNB      | 0.8905   |
| Breast Cancer | GaussianNB              | 0.9389   | GaussianWNB    | 0.9519   |

These benchmarks highlight the potential of WNB classifiers to provide better performance in certain scenarios by allowing more flexibility in the choice of distributions and incorporating weighting strategies.
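For a quick sense of scale, the snippet below computes the accuracy difference in percentage points for each dataset, using only the accuracies reported in the table above. The variable names are illustrative.

```python
# (GaussianNB accuracy, WNB classifier accuracy) as reported in the benchmark table
results = {
    "Wine": (0.9749, 0.9812),
    "Iris": (0.9556, 0.9602),
    "Digits": (0.8372, 0.8905),
    "Breast Cancer": (0.9389, 0.9519),
}

# Difference in percentage points, rounded to two decimals
gains = {
    name: round((wnb_acc - sklearn_acc) * 100, 2)
    for name, (sklearn_acc, wnb_acc) in results.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

The gains range from 0.46 points on Iris to 5.33 points on Digits, so the size of the improvement depends strongly on the dataset.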

The scripts used to generate these benchmark results are available in the benchmarks/ directory.

Support us 💡

You can support the project in the following ways:

⭐ Star WNB on GitHub (click the star button in the top right corner)

💡 Provide your feedback or propose ideas in the Issues section

📰 Post about WNB on LinkedIn or other platforms

Citation 📚

If you utilize this repository, please consider citing it with:

```bibtex
@misc{wnb,
  author = {Mohammd Mehdi Samsami},
  title = {WNB: General and weighted naive Bayes classifiers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/msamsami/wnb}},
}
```

Owner

  • Name: Mehdi Samsami
  • Login: msamsami
  • Kind: user
  • Company: Data Scientist

BSc & MSc in ECE

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: WNB
message: >-
  Python library for the implementations of general and
  weighted naive Bayes classifiers.
type: software
authors:
  - given-names: Mehdi
    family-names: Samsami
    email: mehdisamsami@live.com
repository-code: 'https://github.com/msamsami/wnb'
keywords:
  - python
  - machine learning
  - bayes
  - naive bayes
  - classifier
license: BSD-2-Clause

GitHub Events

Total
  • Release event: 5
  • Watch event: 4
  • Delete event: 18
  • Issue comment event: 1
  • Push event: 36
  • Pull request event: 40
  • Create event: 27
Last Year
  • Release event: 5
  • Watch event: 4
  • Delete event: 18
  • Issue comment event: 1
  • Push event: 36
  • Pull request event: 40
  • Create event: 27

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 17
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 17
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • msamsami (1)
Pull Request Authors
  • msamsami (21)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels
enhancement (9) bug (7) internal (4) documentation (4) dependencies (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 46 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 27
  • Total maintainers: 1
pypi.org: wnb

Python library for the implementations of general and weighted naive Bayes (WNB) classifiers.

  • Versions: 27
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 46 Last month
Rankings
Dependent packages count: 7.3%
Forks count: 23.1%
Average: 24.3%
Stargazers count: 25.5%
Dependent repos count: 41.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • pandas ==1.4.1
  • scikit-learn ==1.0.2
  • scipy ==1.8.0
setup.py pypi
  • pandas ==1.4.1
  • scikit-learn ==1.0.2