bayes

A simple implementation of a Naive Bayes classifier

https://github.com/gnames/bayes

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

A simple implementation of a Naive Bayes classifier

Basic Info
  • Host: GitHub
  • Owner: gnames
  • License: mit
  • Language: Go
  • Default Branch: master
  • Size: 65.4 KB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 1
  • Releases: 2
Created over 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

bayes

An implementation of a Naive Bayes classifier. More details are in the docs.

Usage

This package classifies a new entity into one or another category (class) according to the features of the entity. The algorithm uses known data to calculate the weight of each feature for each category.

```go
func Example() {
	// There are two jars of cookies; they are our training set.
	// Cookies are either round or star-shaped, and either plain or
	// chocolate-chip.
	jar1 := ft.Class("Jar1")
	jar2 := ft.Class("Jar2")

	// Every preclassified feature-set provides data for one cookie: which
	// jar holds the cookie, and what its kind and shape are.
	cookie1 := ft.ClassFeatures{
		Class: jar1,
		Features: []ft.Feature{
			{Name: "kind", Value: "plain"},
			{Name: "shape", Value: "round"},
		},
	}
	cookie2 := ft.ClassFeatures{
		Class: jar1,
		Features: []ft.Feature{
			{Name: "kind", Value: "plain"},
			{Name: "shape", Value: "star"},
		},
	}
	cookie3 := ft.ClassFeatures{
		Class: jar1,
		Features: []ft.Feature{
			{Name: "kind", Value: "chocolate"},
			{Name: "shape", Value: "star"},
		},
	}
	cookie4 := ft.ClassFeatures{
		Class: jar1,
		Features: []ft.Feature{
			{Name: "kind", Value: "plain"},
			{Name: "shape", Value: "round"},
		},
	}
	cookie5 := ft.ClassFeatures{
		Class: jar1,
		Features: []ft.Feature{
			{Name: "kind", Value: "plain"},
			{Name: "shape", Value: "round"},
		},
	}
	cookie6 := ft.ClassFeatures{
		Class: jar2,
		Features: []ft.Feature{
			{Name: "kind", Value: "chocolate"},
			{Name: "shape", Value: "star"},
		},
	}
	cookie7 := ft.ClassFeatures{
		Class: jar2,
		Features: []ft.Feature{
			{Name: "kind", Value: "chocolate"},
			{Name: "shape", Value: "star"},
		},
	}
	cookie8 := ft.ClassFeatures{
		Class: jar2,
		Features: []ft.Feature{
			{Name: "kind", Value: "chocolate"},
			{Name: "shape", Value: "star"},
		},
	}

	lfs := []ft.ClassFeatures{
		cookie1, cookie2, cookie3, cookie4, cookie5, cookie6, cookie7, cookie8,
	}

	nb := bayes.New()
	nb.Train(lfs)
	oddsPrior, err := nb.PriorOdds(jar1)
	if err != nil {
		log.Println(err)
	}

	// If we got a chocolate, star-shaped cookie, which jar did it most
	// likely come from?
	aCookie := []ft.Feature{
		{Name: ft.Name("kind"), Value: ft.Value("chocolate")},
		{Name: ft.Name("shape"), Value: ft.Value("star")},
	}

	res, err := nb.PosteriorOdds(aCookie)
	if err != nil {
		fmt.Println(err)
	}

	// A random cookie is more likely to come from Jar1, but a chocolate,
	// star-shaped cookie is more likely to come from Jar2.
	fmt.Printf("Prior odds for Jar1 are %0.2f\n", oddsPrior)
	fmt.Printf("The cookie came from %s, with odds %0.2f\n", res.MaxClass, res.MaxOdds)
	// Output:
	// Prior odds for Jar1 are 1.67
	// The cookie came from Jar2, with odds 7.50
}
```

Development

Testing

```bash
go test
```

Other implementations:

Go, Java, Python, R, Ruby

Owner

  • Name: gnames
  • Login: gnames
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Bayes -- a Global Names library for Naive Bayes algorithm."
date-released: 2024-12-02
version: v0.5.2
authors:
  - family-names: "Mozzherin"
    given-names: "Dmitry"
    orcid: "https://orcid.org/0000-0003-1593-1417"
repository-code: "https://github.com/gnames/bayes"
doi: 10.5281/zenodo.14262610
license: MIT

GitHub Events

Total
  • Release event: 1
  • Push event: 4
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 4
  • Create event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 30
  • Total Committers: 1
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Dmitry Mozzherin d****n@g****m 30

Issues and Pull Requests

Last synced: over 2 years ago

All Time
  • Total issues: 17
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.06
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 1 hour
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dimus (16)
  • mjy (1)
Pull Request Authors
Top Labels
Issue Labels
duplicate (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 6
  • Total dependent repositories: 9
  • Total versions: 12
proxy.golang.org: github.com/gnames/bayes

Package bayes implements a Naive Bayes trainer and classifier. Code is located at https://github.com/gnames/bayes

Bayes' rule calculates the probability of a hypothesis from prior knowledge about the hypothesis, as well as from evidence that supports or diminishes that probability. Prior knowledge can dramatically influence the posterior probability of a hypothesis. For example, assuming that an adult bird that cannot fly is a penguin is very unlikely in the northern hemisphere, but very likely in Antarctica.

Bayes' theorem is often depicted as

    P(H|E) = P(E|H) * P(H) / P(E)

where H is our hypothesis, E is a new piece of evidence, P(H) is the prior probability of H being true, P(E|H) is the known probability of the evidence when H is true, and P(E) is the known probability of E in all known cases. P(H|E) is the posterior probability of the hypothesis H, adjusted according to the new evidence E.

Finding the probability that a hypothesis is true can be considered a classification event: given prior knowledge and new evidence, we can assign an entity to the hypothesis that has the highest posterior probability.

It is also possible to represent Bayes' theorem using odds. Odds describe how likely a hypothesis is in comparison to all other possible hypotheses. Using odds simplifies the calculation:

    O(H|E) = O(H) * P(E|H) / P(E|H')

where the likelihood ratio P(E|H) / P(E|H') compares the probability of the evidence when H is true against P(E|H'), the known probability of the evidence when H is not true.

If we have several pieces of evidence that are independent of each other, the posterior odds can be calculated as the product of the prior odds and the likelihood ratios of all the given evidence; each subsequent piece of evidence modifies the prior odds. If the evidence is not independent (for example, inability to fly and a propensity for nesting on the ground in birds), it skews the outcome. In reality the given evidence is quite often not completely independent. That is how Naive Bayes got its name: people who apply it "naively" assume that their evidence is completely independent.
In practice the Naive Bayes approach often shows good results in spite of this known fallacy. It is also quite possible that, while the likelihoods of the evidence are representative of the classification data, the prior odds from the training set are not. As in the previous example, the evidence that a bird cannot fly supports the 'penguin' hypothesis much better in Antarctica, because the odds of meeting a penguin there are much higher than in the northern hemisphere. Therefore the package makes it possible to supply a prior probability value at classification time. In natural language processing, pieces of evidence are often called `features`; we follow the same convention in this package. Hypotheses are often called classes. Based on the outcome we classify an entity (in other words, assign a class to it). Every class receives a number of elements or `tokens`, each with a set of features.

  • Versions: 12
  • Dependent Packages: 6
  • Dependent Repositories: 9
Rankings
Dependent repos count: 1.7%
Dependent packages count: 2.2%
Average: 12.0%
Forks count: 18.7%
Stargazers count: 25.2%
Last synced: 7 months ago

Dependencies

go.mod go
  • github.com/davecgh/go-spew v1.1.1
  • github.com/gnames/gnfmt v0.2.0
  • github.com/json-iterator/go v1.1.10
  • github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421
  • github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742
  • github.com/pmezard/go-difflib v1.0.0
  • github.com/stretchr/testify v1.7.0
  • gopkg.in/yaml.v3 v3.0.0-20200605160147-a5ece683394c
go.sum go
  • github.com/davecgh/go-spew v1.1.0
  • github.com/davecgh/go-spew v1.1.1
  • github.com/gnames/gnfmt v0.2.0
  • github.com/google/gofuzz v1.0.0
  • github.com/json-iterator/go v1.1.10
  • github.com/matryer/is v1.4.0
  • github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421
  • github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742
  • github.com/pmezard/go-difflib v1.0.0
  • github.com/stretchr/objx v0.1.0
  • github.com/stretchr/testify v1.3.0
  • github.com/stretchr/testify v1.7.0
  • gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
  • gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
  • gopkg.in/yaml.v3 v3.0.0-20200605160147-a5ece683394c