naivebayes
High performance implementation of the Naive Bayes algorithm in R
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (19.5%) to scientific vocabulary
Keywords
classification-model
datascience
machine-learning
naive-bayes
r
r-package
Last synced: 6 months ago
Repository
High performance implementation of the Naive Bayes algorithm in R
Basic Info
- Host: GitHub
- Owner: majkamichal
- License: gpl-2.0
- Language: R
- Default Branch: master
- Homepage: https://majkamichal.github.io/naivebayes/
- Size: 24.1 MB
Statistics
- Stars: 37
- Watchers: 3
- Forks: 5
- Open Issues: 0
- Releases: 2
Topics
classification-model
datascience
machine-learning
naive-bayes
r
r-package
Created over 8 years ago · Last pushed about 1 year ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/"
)
```
Extended documentation can be found on the website: https://majkamichal.github.io/naivebayes/
# Naïve Bayes
[Buy Me a Coffee](https://www.buymeacoffee.com/michalmajka) · [CRAN: naivebayes](https://cran.r-project.org/package=naivebayes) · [CRAN package index](http://cran.rstudio.com/web/packages/naivebayes/index.html) · [conda-forge: r-naivebayes](https://anaconda.org/conda-forge/r-naivebayes/)
## 1. Overview
The `naivebayes` package presents an efficient implementation of the widely-used Naïve Bayes classifier. It upholds three core principles: efficiency, user-friendliness, and reliance solely on Base `R`. By adhering to the latter principle, the package ensures stability and reliability without introducing external dependencies^[Specialized Naïve Bayes functions within the package may optionally utilize sparse matrices if the Matrix package is installed. However, the Matrix package is not a dependency, and users are not required to install or use it.]. This design choice maintains efficiency by leveraging the optimized routines inherent in Base `R`, many of which are programmed in high-performance languages like `C/C++` or `FORTRAN`. By following these principles, the `naivebayes` package provides a reliable and efficient tool for Naïve Bayes classification tasks, ensuring that users can perform their analyses effectively and with ease.
The `naive_bayes()` function is designed to determine the class of each feature in a dataset, and depending on user specifications, it can assume various distributions for each feature. It currently supports the following class conditional distributions:
- categorical distribution for discrete features (with Bernoulli distribution as a special case for binary outcomes)
- Poisson distribution for non-negative integer features
- Gaussian distribution for continuous features
- non-parametrically estimated densities via Kernel Density Estimation for continuous features
In addition, specialized functions are available which implement:
- Bernoulli Naive Bayes via `bernoulli_naive_bayes()`
- Multinomial Naive Bayes via `multinomial_naive_bayes()`
- Poisson Naive Bayes via `poisson_naive_bayes()`
- Gaussian Naive Bayes via `gaussian_naive_bayes()`
- Non-Parametric Naive Bayes via `nonparametric_naive_bayes()`
These specialized functions are carefully optimized for efficiency, utilizing linear algebra operations to excel when handling dense matrices. They can also exploit the *sparsity* of matrices for enhanced performance and work in the presence of missing data. The package also includes various helper functions to improve the user experience. Moreover, users can access the general `naive_bayes()` function through the excellent `caret` package, providing additional versatility.
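As a rough illustration of the sparse-matrix support (not part of the original README, and assuming the specialized functions accept a `dgCMatrix` from the Matrix package), a Bernoulli Naive Bayes model could be fit on a sparse 0/1 feature matrix roughly as follows:

```{r sparse_sketch, eval = FALSE}
# Hedged sketch: assumes bernoulli_naive_bayes() accepts a sparse dgCMatrix
# (the Matrix package is only suggested, not required, by naivebayes).
library(naivebayes)
library(Matrix)

set.seed(1)
n <- 200; p <- 20
X_dense <- matrix(rbinom(n * p, size = 1, prob = 0.2), nrow = n,
                  dimnames = list(NULL, paste0("f", 1:p)))
y <- factor(sample(c("classA", "classB"), n, replace = TRUE))
X_sparse <- Matrix(X_dense, sparse = TRUE)  # convert to dgCMatrix

bnb <- bernoulli_naive_bayes(x = X_sparse, y = y, laplace = 1)
predict(bnb, newdata = X_sparse[1:5, ], type = "prob")
```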
## 2. Installation
The `naivebayes` package can be installed from the `CRAN` repository by executing the following line in the console:
```{r, eval = FALSE}
install.packages("naivebayes")
# Or the development version from GitHub:
devtools::install_github("majkamichal/naivebayes")
```
## 3. Usage
The `naivebayes` package provides a user-friendly implementation of the Naïve Bayes algorithm via a formula interface and the classical combination of a matrix/data.frame containing the features and a vector with the class labels. All functions can recognize missing values, give an informative warning and, more importantly, know how to handle them. In the following, the basic usage of the main function `naive_bayes()` is demonstrated. Examples with the specialized Naive Bayes classifiers can be found in the extended documentation (https://majkamichal.github.io/naivebayes/) in [this article](https://majkamichal.github.io/naivebayes/articles/specialized_naive_bayes.html).
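As a minimal sketch of the missing-value handling (not from the original README, and assuming `naive_bayes()` warns about the `NA` and fits the model on the remaining observations):

```{r na_sketch, eval = FALSE}
# Hedged sketch: naive_bayes() is expected to emit an informative warning
# about the missing values and to estimate the tables from the rest.
library(naivebayes)

set.seed(1)
dat <- data.frame(class = sample(c("classA", "classB"), 50, replace = TRUE),
                  norm  = rnorm(50))
dat$norm[c(3, 17)] <- NA                 # introduce missing values

nb_na <- naive_bayes(class ~ ., dat)     # informative warning expected here
nb_na %prob% data.frame(norm = c(-0.5, 0.5))
```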
### 3.1 Example data
```{r data}
library(naivebayes)
# Simulate example data
n <- 100
set.seed(1)
data <- data.frame(class = sample(c("classA", "classB"), n, TRUE),
                   bern = sample(LETTERS[1:2], n, TRUE),
                   cat = sample(letters[1:3], n, TRUE),
                   logical = sample(c(TRUE,FALSE), n, TRUE),
                   norm = rnorm(n),
                   count = rpois(n, lambda = c(5,15)))
train <- data[1:95, ]
test <- data[96:100, -1]
```
### 3.2 Formula interface
```{r formula_interface}
nb <- naive_bayes(class ~ ., train)
summary(nb)
# Classification
predict(nb, test, type = "class")
nb %class% test
# Posterior probabilities
predict(nb, test, type = "prob")
nb %prob% test
# Helper functions
tables(nb, 1)
get_cond_dist(nb)
# Note: all "numeric" (integer, double) variables are modelled
# with Gaussian distribution by default.
```
### 3.3 Matrix/data.frame and class vector
```{r generalusage}
X <- train[-1]
class <- train$class
nb2 <- naive_bayes(x = X, y = class)
nb2 %prob% test
```
### 3.4 Non-parametric estimation for continuous features
Kernel density estimation can be used to estimate the class conditional densities of continuous features. It has to be explicitly requested via the parameter `usekernel = TRUE`; otherwise the Gaussian distribution is assumed. The estimation is performed with the built-in `R` function `density()`. By default, the Gaussian smoothing kernel and Silverman's rule of thumb as the bandwidth selector are used:
```{r kde}
nb_kde <- naive_bayes(class ~ ., train, usekernel = TRUE)
summary(nb_kde)
get_cond_dist(nb_kde)
nb_kde %prob% test
# Class conditional densities
plot(nb_kde, "norm", arg.num = list(legend.cex = 0.9), prob = "conditional")
# Marginal densities
plot(nb_kde, "norm", arg.num = list(legend.cex = 0.9), prob = "marginal")
```
#### 3.4.1 Changing kernel
In general, there are 7 different smoothing kernels available:
- `gaussian`
- `epanechnikov`
- `rectangular`
- `triangular`
- `biweight`
- `cosine`
- `optcosine`
and they can be specified in `naive_bayes()` via the additional parameter `kernel`. The Gaussian kernel is the default smoothing kernel. Please see `density()` and `bw.nrd()` for further details.
```{r kde_kernel}
# Change Gaussian kernel to biweight kernel
nb_kde_biweight <- naive_bayes(class ~ ., train, usekernel = TRUE,
                               kernel = "biweight")
nb_kde_biweight %prob% test
plot(nb_kde_biweight, "norm", arg.num = list(legend.cex = 0.9), prob = "conditional")
```
#### 3.4.2 Changing bandwidth selector
The `density()` function offers 5 different bandwidth selectors, which can be specified via the `bw` parameter:
- `nrd0` (Silverman's rule-of-thumb)
- `nrd` (variation of the rule-of-thumb)
- `ucv` (unbiased cross-validation)
- `bcv` (biased cross-validation)
- `SJ` (Sheather & Jones method)
```{r kde_bw}
nb_kde_SJ <- naive_bayes(class ~ ., train, usekernel = TRUE,
                         bw = "SJ")
nb_kde_SJ %prob% test
plot(nb_kde_SJ, "norm", arg.num = list(legend.cex = 0.9), prob = "conditional")
```
#### 3.4.3 Adjusting bandwidth
The parameter `adjust` allows the estimated bandwidth to be rescaled and thus introduces more flexibility into the estimation process. The default value of 1 means no rescaling; for values below 1 the density becomes "wigglier" and for values above 1 it tends to be "smoother":
```{r kde_adjust}
nb_kde_adjust <- naive_bayes(class ~ ., train, usekernel = TRUE,
                             adjust = 0.5)
nb_kde_adjust %prob% test
plot(nb_kde_adjust, "norm", arg.num = list(legend.cex = 0.9), prob = "conditional")
```
### 3.5 Model non-negative integers with Poisson distribution
Class conditional distributions of non-negative integer predictors can be modelled with the Poisson distribution. This can be achieved by setting `usepoisson = TRUE` in the `naive_bayes()` function and by making sure that the variables representing counts in the dataset are of class `integer`.
```{r poisson}
is.integer(train$count)
nb_pois <- naive_bayes(class ~ ., train, usepoisson = TRUE)
summary(nb_pois)
get_cond_dist(nb_pois)
nb_pois %prob% test
# Class conditional distributions
plot(nb_pois, "count", prob = "conditional")
# Marginal distributions
plot(nb_pois, "count", prob = "marginal")
```
Owner
- Name: Michal Majka
- Login: majkamichal
- Kind: user
- Location: Vienna
- Website: https://twitter.com/majkamichal
- Repositories: 2
- Profile: https://github.com/majkamichal
- Bio: Stats @univienna @DukeU and @ErsteGroup · www.stackoverflow.com/users/4956364/michal-majka
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michal Majka | m****a@h****m | 236 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 6
- Total pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: 28 days
- Total issue authors: 6
- Total pull request authors: 3
- Average comments per issue: 1.33
- Average comments per pull request: 0.67
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kohlkopf (1)
- willtownes (1)
- markvanderloo (1)
- hyaochn (1)
- AndreM84 (1)
- BharatAD (1)
Pull Request Authors
- majkamichal (1)
- willtownes (1)
- colin-fraser (1)
Top Labels
Issue Labels
bug (3)
help wanted (2)
wontfix (1)
Pull Request Labels
Packages
- Total packages: 2
- Total downloads: cran 3,618 last-month
- Total docker downloads: 21,870
- Total dependent packages: 10 (may contain duplicates)
- Total dependent repositories: 15 (may contain duplicates)
- Total versions: 11
- Total maintainers: 1
cran.r-project.org: naivebayes
High Performance Implementation of the Naive Bayes Algorithm
- Homepage: https://github.com/majkamichal/naivebayes
- Documentation: http://cran.r-project.org/web/packages/naivebayes/naivebayes.pdf
- License: GPL-2
- Latest release: 1.0.0 (published almost 2 years ago)
Rankings
Dependent packages count: 5.3%
Downloads: 5.5%
Dependent repos count: 8.0%
Stargazers count: 9.0%
Average: 10.5%
Forks count: 10.8%
Docker downloads count: 24.2%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-naivebayes
- Homepage: https://github.com/majkamichal/naivebayes, https://majkamichal.github.io/naivebayes/
- License: GPL-2
- Latest release: 0.9.7 (published almost 6 years ago)
Rankings
Dependent repos count: 20.2%
Average: 42.5%
Stargazers count: 46.1%
Dependent packages count: 51.6%
Forks count: 52.2%
Last synced: 7 months ago
Dependencies
DESCRIPTION
cran
- Matrix * suggests
- knitr * suggests