logKDE

logKDE: log-transformed kernel density estimation - Published in JOSS (2018)

https://github.com/andrewthomasjones/logkde

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence

Repository

Kernel Density Estimates on (0,Inf)

Basic Info
  • Host: GitHub
  • Owner: andrewthomasjones
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Size: 3.23 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created over 8 years ago · Last pushed over 7 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# logKDE: log-transformed kernel density estimation


[![Downloads from the RStudio CRAN mirror](http://cranlogs.r-pkg.org/badges/logKDE)](https://CRAN.R-project.org/package=logKDE)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1339352.svg)](https://doi.org/10.5281/zenodo.1339352)
[![DOI](http://joss.theoj.org/papers/10.21105/joss.00870/status.svg)](https://doi.org/10.21105/joss.00870)
[![Build Status](https://travis-ci.org/andrewthomasjones/logKDE.svg?branch=master)](https://travis-ci.org/andrewthomasjones/logKDE)

The goal of logKDE is to provide a set of functions for kernel density estimation on the positive domain, using log-kernel density functions, for the *R* programming environment. The main functions of the package are `logdensity` and `logdensity_fft`. Their syntax was chosen to resemble that of the `density` function, which conducts kernel density estimation on the real domain. The `logdensity` function conducts density estimation via direct, first-principles computation, whereas `logdensity_fft` uses the fast Fourier transform to speed up computation. Both methods are implemented with `Rcpp`, so that they remain fast in large-data scenarios.
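To make the idea of a log-transformed estimator concrete, the following minimal sketch builds one using only base *R*: it estimates the density of `log(x)` with `density` and then maps the result back to the positive domain with the change-of-variables factor `1/x`. This illustrates the general technique under simple assumptions (Gaussian kernel, `nrd0` bandwidth); it is not the `logKDE` source code, and the package's internal computations may differ.

```r
## Sketch only: a log-transformed KDE built from base R's density().
## This illustrates the idea; it is not the logKDE implementation.
set.seed(1)
x <- rchisq(100, 12)   # strictly positive sample

## Step 1: ordinary Gaussian KDE of log(x) on the real line.
log_fit <- density(log(x), bw = "nrd0", kernel = "gaussian")

## Step 2: map the grid back to (0, Inf) and apply the Jacobian 1/x.
## If Y = log(X) has density g, then X has density g(log(x)) / x.
grid  <- exp(log_fit$x)
f_hat <- log_fit$y / grid

## Check: the estimate is supported on positive values only and
## integrates to roughly 1 (trapezoidal rule on the uneven grid).
n_grid <- length(grid)
area <- sum(diff(grid) * (f_hat[-1] + f_hat[-n_grid]) / 2)
print(c(min_grid = min(grid), area = area))
```

Because the grid is obtained by exponentiating, every evaluation point is strictly positive, so no probability mass can be placed on negative values.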

Currently, a variety of kernel functions and plugin bandwidth methods are available. By default both `logdensity` and `logdensity_fft` are set to use log-normal kernel functions (`kernel = 'gaussian'`) and Silverman's rule-of-thumb bandwidth, applied to log-transformed data (`bw = 'nrd0'`). However, the following kernels are also available:

- log-Epanechnikov (`kernel = 'epanechnikov'`),
- log-Laplace (`kernel = 'laplace'`),
- log-logistic (`kernel = 'logistic'`),
- log-triangular (`kernel = 'triangular'`),
- log-uniform (`kernel = 'uniform'`).

The following plugin bandwidth methods are also available:

- all of the methods that are available for `density`, applied to log-transformed data (see `?bw.nrd` regarding the options),
- unbiased cross-validated bandwidths in the positive domain (`bw = 'logcv'`),
- a Silverman-type rule-of-thumb that optimizes the kernel density estimator fit, compared to a log-normal density function (`bw = 'logg'`).
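As a rough illustration of what "applied to log-transformed data" means for the standard selectors, the sketch below calls the base *R* bandwidth functions documented in `?bw.nrd` on `log(x)`. It assumes that this mirrors what the package does for these options; the `'logcv'` and `'logg'` methods are specific to `logKDE` and are not reproduced here.

```r
## Sketch only: base R bandwidth selectors evaluated on log-transformed data.
## Assumption: this mirrors the 'nrd0', 'nrd', 'ucv', 'bcv' and 'SJ' options.
set.seed(1)
x <- rchisq(100, 12)   # strictly positive sample
y <- log(x)            # bandwidths are chosen on the log scale

bws <- c(
  nrd0     = bw.nrd0(y),
  nrd      = bw.nrd(y),
  ## The cross-validation selectors can warn when the optimum lies at the
  ## edge of their search interval; warnings are silenced for this sketch.
  ucv      = suppressWarnings(bw.ucv(y)),
  bcv      = suppressWarnings(bw.bcv(y)),
  `SJ-ste` = bw.SJ(y, method = "ste"),
  `SJ-dpi` = bw.SJ(y, method = "dpi")
)
print(round(bws, 4))
```

Note that these bandwidths are on the log scale, so they are not directly comparable with bandwidths that `density` would select for the untransformed data.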

The `logdensity` and `logdensity_fft` functions also behave in the same way as `density`, when called within the `plot` function. The usual assortment of commands that apply to `plot` output objects can also be called.

For a comprehensive review of the literature on positive-domain kernel density estimation, thorough descriptions of the mathematics relating to the methods that have been described, simulation results, and example applications of the `logKDE` package, please consult the package vignette. The vignette is available via the command `vignette('logKDE')`, once the package is installed.

## Installation

If `devtools` has already been installed, then the most current build of `logKDE` can be obtained via the command:

```{r GH-install, eval=F}
devtools::install_github('andrewthomasjones/logKDE', build_vignettes = TRUE)
```

The latest stable build of `logKDE` can be obtained from CRAN via the command:

```{r CRAN-install, eval = F}
install.packages("logKDE", repos='http://cran.us.r-project.org')
```

An archival build of `logKDE` is available at https://zenodo.org/record/1317784. Manual installation instructions can be found within the *R* installation and administration manual https://cran.r-project.org/doc/manuals/r-release/R-admin.html.

## Examples

### Example 1

In this example, we demonstrate that `logdensity` has nearly identical syntax to `density`. We also show that the formats of their outputs are nearly identical.

```{r example1}
## Load 'logKDE' library.
library(logKDE)

## Set a random seed.
set.seed(1)

## Generate strictly positive data.
## Data are generated from a chi-squared distribution with 6 degrees of freedom.
x <- rchisq(100,6)

## Construct and print the output of the function 'density'.
density(x)

## Construct and print the output of the function 'logdensity'.
logdensity(x)

## Plot the 'density' output object.
plot(density(x))

## Plot the 'logdensity' output object.
plot(logdensity(x))
```

As a note, one can observe that `density` assigns positive probability to negative values. Since we know that the chi-squared generative model generates only positive values, this is an undesirable result. The log-transformed kernel density estimator that is produced by `logdensity` only assigns positive probability to positive values, and is thus bona fide in this estimation scenario.
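The amount of probability that `density` places on negative values can be checked numerically. The sketch below uses only base *R* and the same seed and sample as the code above; it sums the estimated density over the negative part of the evaluation grid, which is evenly spaced, so a simple Riemann sum is adequate for illustration.

```r
## Sketch only: how much mass does density() put on negative values?
set.seed(1)
x <- rchisq(100, 6)    # strictly positive sample, as in Example 1

fit <- density(x)      # ordinary KDE on the real line

## density() evaluates on an evenly spaced grid, so a Riemann sum suffices.
step     <- diff(fit$x)[1]
neg      <- fit$x < 0
neg_mass <- sum(fit$y[neg]) * step

print(c(min_data = min(x), min_grid = min(fit$x), neg_mass = neg_mass))
```

Any strictly positive value of `neg_mass` is probability assigned to a region where the generative model has none.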

---

### Example 2

In this example, we showcase the variety of kernel functions that are available in the package. Here, log-transformed kernel density estimators are constructed using the `logdensity` function.

```{r example2}
## Load 'logKDE' library.
library(logKDE)

## Set a random seed.
set.seed(1)

## Generate strictly positive data.
## Data are generated from a chi-squared distribution with 12 degrees of freedom.
x <- rchisq(100,12)

## Construct a log-KDE using the data, and using each of the available kernel functions.
logKDE1 <- logdensity(x,kernel = 'gaussian',from = 1e-6,to = 30)
logKDE2 <- logdensity(x,kernel = 'epanechnikov',from = 1e-6,to = 30)
logKDE3 <- logdensity(x,kernel = 'laplace',from = 1e-6,to = 30)
logKDE4 <- logdensity(x,kernel = 'logistic',from = 1e-6,to = 30)
logKDE5 <- logdensity(x,kernel = 'triangular',from = 1e-6,to = 30)
logKDE6 <- logdensity(x,kernel = 'uniform',from = 1e-6,to = 30)

## Plot the true probability density function of the generative model.
plot(c(0,30),c(0,0.1),type='n',xlab='x',ylab='Density',main='Example 2')
curve(dchisq(x,12),from = 0,to = 30,add = T)

## Plot each of the log-KDE functions, each in a different rainbow() colour.
lines(logKDE1$x,logKDE1$y,col = rainbow(7)[1])
lines(logKDE2$x,logKDE2$y,col = rainbow(7)[2])
lines(logKDE3$x,logKDE3$y,col = rainbow(7)[3])
lines(logKDE4$x,logKDE4$y,col = rainbow(7)[4])
lines(logKDE5$x,logKDE5$y,col = rainbow(7)[5])
lines(logKDE6$x,logKDE6$y,col = rainbow(7)[6])

## Add a grid for a visual guide.
grid()
```

---

### Example 3

In this example, we show that `logdensity` and `logdensity_fft` yield nearly identical results. Here, log-transformed kernel density estimators are constructed using the `logdensity_fft` function.

```{r example3}
## Load 'logKDE' library.
library(logKDE)

## Set a random seed.
set.seed(1)

## Generate strictly positive data.
## Data are generated from a chi-squared distribution with 12 degrees of freedom.
x <- rchisq(100,12)

## Construct a log-KDE using the data, and using each of the available kernel functions.
logKDE1 <- logdensity_fft(x,kernel = 'gaussian',from = 1e-6,to = 30)
logKDE2 <- logdensity_fft(x,kernel = 'epanechnikov',from = 1e-6,to = 30)
logKDE3 <- logdensity_fft(x,kernel = 'laplace',from = 1e-6,to = 30)
logKDE4 <- logdensity_fft(x,kernel = 'logistic',from = 1e-6,to = 30)
logKDE5 <- logdensity_fft(x,kernel = 'triangular',from = 1e-6,to = 30)
logKDE6 <- logdensity_fft(x,kernel = 'uniform',from = 1e-6,to = 30)

## Plot the true probability density function of the generative model.
plot(c(0,30),c(0,0.1),type='n',xlab='x',ylab='Density',main='Example 3')
curve(dchisq(x,12),from = 0,to = 30,add = T)

## Plot each of the log-KDE functions, each in a different rainbow() colour.
lines(logKDE1$x,logKDE1$y,col = rainbow(7)[1])
lines(logKDE2$x,logKDE2$y,col = rainbow(7)[2])
lines(logKDE3$x,logKDE3$y,col = rainbow(7)[3])
lines(logKDE4$x,logKDE4$y,col = rainbow(7)[4])
lines(logKDE5$x,logKDE5$y,col = rainbow(7)[5])
lines(logKDE6$x,logKDE6$y,col = rainbow(7)[6])

## Add a grid for a visual guide.
grid()
```

We observe that the `logdensity_fft` outputs are noticeably smoother than those of `logdensity`. This is because the fast Fourier transform (FFT) only yields kernel density estimates at discrete points, and the regions between these points are approximated by linear interpolation, namely using the `approx` function. This is the same evaluation technique that is used in the function `density`. Additionally, the FFT approximation points are evenly spaced on the real line, whereas those used for `logdensity` are evenly spaced on a log scale.
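The difference between the two kinds of evaluation grid, and the role of `approx`, can be sketched in base *R*. The grid size of 9 points and the use of the true chi-squared density as a stand-in for the estimates are assumptions made for readability; the grids used by the package are much finer, and its internal construction may differ.

```r
## Sketch only: linear versus log-spaced grids, and linear interpolation.
n_pts <- 9             # deliberately small, for readability
from  <- 1e-6
to    <- 30

## Evenly spaced on the real line (the FFT-style grid).
lin_grid <- seq(from, to, length.out = n_pts)

## Evenly spaced on a log scale (the kind of grid described for logdensity).
log_grid <- exp(seq(log(from), log(to), length.out = n_pts))

print(round(lin_grid, 3))
print(signif(log_grid, 3))

## Values between grid points are filled in by linear interpolation.
## The true chi-squared density stands in for the estimates here.
f_lin  <- dchisq(lin_grid, 12)
interp <- approx(lin_grid, f_lin, xout = 10)$y
print(c(interpolated = interp, exact = dchisq(10, 12)))
```

The log-spaced grid concentrates its points near zero, while the linear grid spreads them uniformly, which is one reason the two functions can produce curves that look different even when the underlying estimates agree closely.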

---

### Example 4

In this example, we showcase the variety of plugin bandwidth estimators that are available in the package. Here, log-transformed kernel density estimators are constructed using the `logdensity` function.

```{r example4}
## Load 'logKDE' library.
library(logKDE)

## Set a random seed.
set.seed(1)

## Generate strictly positive data.
## Data are generated from a chi-squared distribution with 12 degrees of freedom.
x <- rchisq(100,12)

## Construct a log-KDE using the data, and using each of the available bandwidth methods.
logKDE1 <- logdensity(x,bw = 'nrd0',from = 1e-6,to = 30)
logKDE2 <- logdensity(x,bw = 'logcv',from = 1e-6,to = 30)
logKDE3 <- logdensity(x,bw = 'logg',from = 1e-6,to = 30)
logKDE4 <- logdensity(x,bw = 'nrd',from = 1e-6,to = 30)
logKDE5 <- logdensity(x,bw = 'ucv',from = 1e-6,to = 30)
logKDE6 <- logdensity(x,bw = 'bcv',from = 1e-6,to = 30)
logKDE7 <- logdensity(x,bw = 'SJ-ste',from = 1e-6,to = 30)
logKDE8 <- logdensity(x,bw = 'SJ-dpi',from = 1e-6,to = 30)


## Plot the true probability density function of the generative model.
plot(c(0,30),c(0,0.1),type='n',xlab='x',ylab='Density',main='Example 4')
curve(dchisq(x,12),from = 0,to = 30,add = T)

## Plot each of the log-KDE functions with different choices of bandwidth, each in a different rainbow() colour.
lines(logKDE1$x,logKDE1$y,col = rainbow(9)[1])
lines(logKDE2$x,logKDE2$y,col = rainbow(9)[2])
lines(logKDE3$x,logKDE3$y,col = rainbow(9)[3])
lines(logKDE4$x,logKDE4$y,col = rainbow(9)[4])
lines(logKDE5$x,logKDE5$y,col = rainbow(9)[5])
lines(logKDE6$x,logKDE6$y,col = rainbow(9)[6])
lines(logKDE7$x,logKDE7$y,col = rainbow(9)[7])
lines(logKDE8$x,logKDE8$y,col = rainbow(9)[8])

## Add a grid for a visual guide.
grid()
```

## Unit testing

Using the package `testthat`, we have conducted the following unit tests for the GitHub build, on the date: `r format(Sys.time(), '%d %B, %Y')`. The testing files are contained in the [tests](https://github.com/andrewthomasjones/logKDE/tree/master/tests) folder of the repository.

```{r unittest}


## Load 'logKDE' library.
library(logKDE)

## Load 'testthat' library.
library(testthat)

## Test 'logKDE'.
test_package('logKDE')
```

## Bug reporting and contributions

Thank you for your interest in `logKDE`. If you happen to find any bugs in the program, then please report them on the Issues page (https://github.com/andrewthomasjones/logKDE/issues). Support can also be sought on this page. Furthermore, if you would like to make a contribution to the software, then please forward a pull request to the owner of the repository.

Owner

  • Name: Andrew Jones
  • Login: andrewthomasjones
  • Kind: user
  • Location: Australia

Stats, ML, R, C++

JOSS Publication

logKDE: log-transformed kernel density estimation
Published
August 06, 2018
Volume 3, Issue 28, Page 870
Authors
Andrew T. Jones
School of Mathematics and Physics, University of Queensland, St. Lucia 4072, Queensland Australia
Hien D. Nguyen ORCID
Department of Mathematics and Statistics, La Trobe University, Bundoora 3086, Victoria Australia
Geoffrey J. McLachlan
School of Mathematics and Physics, University of Queensland, St. Lucia 4072, Queensland Australia
Editor
Arfon Smith ORCID
Tags
data visualization exploratory data analysis non-parametric positive data probability density function

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 96
  • Total Committers: 7
  • Avg Commits per committer: 13.714
  • Development Distribution Score (DDS): 0.594
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Andrew Thomas Jones a****s@g****m 39
Hien h****8@g****m 35
Andrew Jones a****s@A****l 13
Andrew Jones a****s@1****u 4
Andrew Jones u****4@1****u 3
Andrew Jones u****4@d****u 1
Andrew Jones a****s@1****u 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: 5 days
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 1.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • strengejacke (3)
  • wanghe0127 (1)
  • yoavram (1)

Dependencies

DESCRIPTION cran
  • Rcpp * imports
  • pracma * imports
  • R.rsp * suggests
  • testthat * suggests