kdensity

kdensity: An R package for kernel density estimation with parametric starts and asymmetric kernels - Published in JOSS (2019)

https://github.com/jonasmoss/kdensity

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: sciencedirect.com, springer.com, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

asymmetric-kernels density-estimation kernel-density-estimation non-parametric

Scientific Fields

Engineering Computer Science - 60% confidence
Last synced: 6 months ago

Repository

An R package for kernel density estimation with parametric starts and asymmetric kernels.

Basic Info
  • Host: GitHub
  • Owner: JonasMoss
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 1.37 MB
Statistics
  • Stars: 16
  • Watchers: 4
  • Forks: 4
  • Open Issues: 2
  • Releases: 5
Topics
asymmetric-kernels density-estimation kernel-density-estimation non-parametric
Created about 8 years ago · Last pushed 12 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output:
  github_document:
    html_preview: true
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# kdensity 
[![R build status](https://github.com/JonasMoss/kdensity/workflows/R-CMD-check/badge.svg)](https://github.com/JonasMoss/kdensity/actions)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/kdensity)](https://cran.r-project.org/package=kdensity)
[![DOI](https://zenodo.org/badge/120678148.svg)](https://zenodo.org/badge/latestdoi/120678148)


An `R` package for univariate kernel density estimation with parametric starts and asymmetric kernels.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(out.width = "750px", dpi = 200)
```

## News 
`kdensity` is now linked to `univariateML`, so it supports the 30+ parametric
starts available in that package!

## Overview
kdensity is an implementation of univariate kernel density estimation with support for parametric starts and asymmetric kernels. Its main function is `kdensity`, which has approximately the same syntax as `stats::density`. Compared with `stats::density`, it adds the following:

* `kdensity` has built-in support for many *parametric starts*, such as `normal` 
  and `gamma`, but you can also supply your own. For a list of supported parametric 
  starts, see the README of [`univariateML`](https://github.com/JonasMoss/univariateML).
* It supports several asymmetric kernels, such as `gcopula` and `gamma`, as well as the
  common symmetric ones. You can also supply your own kernels.
* It offers several choices for the bandwidth argument `bw`, again including the option
  to specify your own.
* The returned value is a density function. It can be used for, e.g., numerical 
  integration, numerical differentiation, and point evaluation.
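To illustrate what a bandwidth rule computes, here is a minimal sketch of Silverman's rule of thumb. This rule is the default bandwidth of `stats::density` (`stats::bw.nrd0`). It is shown only as an example of a bandwidth function. It is not necessarily the default that `kdensity` uses. The helper name `silverman_bw` is hypothetical.

```r
# Hypothetical helper: Silverman's rule of thumb, the rule behind stats::bw.nrd0.
# A sketch for illustration; kdensity's own default bandwidth may differ.
silverman_bw <- function(x) {
  n <- length(x)
  spread <- min(sd(x), IQR(x) / 1.34)  # robust measure of scale
  0.9 * spread * n^(-1 / 5)
}

x <- mtcars$mpg
silverman_bw(x)
stats::bw.nrd0(x)  # agrees whenever sd(x) and IQR(x) are both positive
```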

One reason to use `kdensity` is to avoid *boundary bias* when estimating densities on the unit interval or the positive half-line. Asymmetric kernels such as `gamma` and `gcopula` are designed for this purpose. The support for parametric starts makes it easy to use a method that is often superior to ordinary kernel density estimation.
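Boundary bias can be made concrete with a short base-R sketch that does not use `kdensity`. With a Gaussian kernel and bandwidth `h`, an ordinary kernel density estimate on positive data puts mass `mean(pnorm(0, X_i, h))` below zero. The data here are simulated from an exponential distribution purely for illustration.

```r
# Boundary bias of an ordinary Gaussian KDE on positive data (illustration only).
# For a Gaussian kernel with bandwidth h, the estimate
#   f(x) = mean(dnorm(x, X_i, h))
# assigns mass mean(pnorm(0, X_i, h)) to the impossible region x < 0.
set.seed(313)
x <- rexp(200)              # simulated data supported on (0, Inf)
h <- stats::bw.nrd0(x)      # the default bandwidth of stats::density

leaked_mass <- mean(pnorm(0, mean = x, sd = h))
leaked_mass                 # probability mass placed below zero

# The estimate is also pulled down at the boundary: the true Exp(1)
# density equals 1 at x = 0, while the KDE value there is well below 1.
kde_at_zero <- mean(dnorm(0, mean = x, sd = h))
kde_at_zero
```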

Several `R` packages deal with kernel density estimation. For an overview, see 
[Deng & Wickham (2011)](https://vita.had.co.nz/papers/density-estimation.pdf).
While no other `R` package handles density estimation with parametric starts, 
several packages support methods that handle boundary bias. [`evmix`](http://www.math.canterbury.ac.nz/~c.scarrott/evmix/) provides a 
variety of boundary bias correction methods in the `bckden` function. 
[`kde1d`](https://github.com/tnagler/kde1d) corrects for boundary bias using 
transformed univariate local polynomial kernel density estimation.
[`logKDE`](https://github.com/andrewthomasjones/logKDE) corrects for boundary 
bias on the half-line using a logarithmic transform. 
[`ks`](https://CRAN.R-project.org/package=ks) supports boundary correction through the `kde.boundary` function, while [`Ake`](https://CRAN.R-project.org/package=Ake) corrects for 
boundary bias using tailored kernel functions.

## Installation
From inside `R`, use one of the following commands:
```{r install, echo = TRUE, eval = FALSE}
# For the CRAN release
install.packages("kdensity")
# For the development version from GitHub:
# install.packages("devtools")
devtools::install_github("JonasMoss/kdensity")
```

## Usage Example
Load the package with `library` and use `kdensity` just like `stats::density`, with optional additional arguments.
```{r simpleuse, echo = TRUE, eval = FALSE}
library("kdensity")
plot(kdensity(mtcars$mpg, start = "normal"))
```

## Description

Kernel density estimation with a *parametric start* was introduced by Hjort and Glad in [Nonparametric Density Estimation with a Parametric Start (1995)](https://projecteuclid.org/euclid.aos/1176324627). The idea is to first fit a parametric density and then apply kernel density estimation as a nonparametric correction to that parametric estimate. The resulting estimator outperforms the ordinary kernel density estimator in terms of asymptotic 
integrated mean squared error whenever the true density is close to the parametric start, and it can be superior to the ordinary kernel density estimator even when the start is fairly far off.
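The correction idea can be written as `f_hat(x) = f0(x) * mean(K_h(x - X_i) / f0(X_i))`, where `f0` is the fitted parametric start. Below is a minimal base-R sketch of this Hjort and Glad form with a normal start and a Gaussian kernel. It is not the package's implementation, which obtains its start estimates through `univariateML`. The function name `hjort_glad_normal` is hypothetical.

```r
# Hypothetical sketch of the Hjort and Glad estimator with a normal start:
#   f_hat(x) = f0(x) * mean(dnorm(x, X_i, h) / f0(X_i)),
# where f0 is the normal density fitted by maximum likelihood.
hjort_glad_normal <- function(data, h = stats::bw.nrd0(data)) {
  mu <- mean(data)
  sigma <- sqrt(mean((data - mu)^2))                  # ML standard deviation
  f0 <- function(x) dnorm(x, mean = mu, sd = sigma)   # fitted parametric start
  weights <- 1 / f0(data)                             # correction weights 1 / f0(X_i)
  function(x) {
    # Vectorized over x so the result works with integrate()
    sapply(x, function(xi) f0(xi) * mean(dnorm(xi, mean = data, sd = h) * weights))
  }
}

f_hat <- hjort_glad_normal(mtcars$mpg)
f_hat(c(15, 20, 25))
integrate(f_hat, lower = 0, upper = 60)  # close to, but not exactly, 1
```

Computing the weights once, outside the returned closure, avoids re-evaluating the start at every data point for each new `x`.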

In addition to parametric starts, the package implements some *asymmetric kernels*. These kernels are useful when modelling data with sharp boundaries, such as data supported on the positive half-line or the unit interval. Currently we support the following asymmetric kernels:

* Jones and Henderson's *Gaussian copula KDE*, from [Kernel-Type Density Estimation on the Unit Interval (2007)](https://academic.oup.com/biomet/article-abstract/94/4/977/246269). This is used for data on the unit interval. The bandwidth selection mechanism described in that paper is implemented as well. This kernel is called `gcopula`.

* Chen's two *beta kernels* from [Beta kernel estimators for density functions (1999)](https://www.sciencedirect.com/science/article/pii/S0167947399000109). These are used for data supported on the unit interval, and are called `beta` and `beta_biased`.

* Chen's two *gamma kernels* from [Probability Density Function Estimation Using Gamma Kernels (2000)](https://link.springer.com/article/10.1023/A:1004165218295). These are used for data supported on the positive half-line, and are called `gamma` and `gamma_biased`.
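To show how asymmetric kernels avoid leaking mass past the boundary, here is a base-R sketch of Chen's *first* (unmodified) gamma and beta kernel estimators. In these estimators the kernel's shape changes with the evaluation point `x`. This snippet does not establish which of the package's `gamma`/`gamma_biased` or `beta`/`beta_biased` options corresponds to which estimator. The helper names and the bandwidths `b` are hypothetical choices for illustration, and the data are simulated.

```r
# Hypothetical sketch of Chen's first gamma kernel estimator for data on (0, Inf):
#   f_hat(x) = mean(dgamma(X_i, shape = x / b + 1, scale = b))
gamma_kde <- function(data, b) {
  function(x) sapply(x, function(xi) mean(dgamma(data, shape = xi / b + 1, scale = b)))
}

# Hypothetical sketch of Chen's first beta kernel estimator for data on [0, 1]:
#   f_hat(x) = mean(dbeta(X_i, x / b + 1, (1 - x) / b + 1))
beta_kde <- function(data, b) {
  function(x) sapply(x, function(xi) {
    mean(dbeta(data, shape1 = xi / b + 1, shape2 = (1 - xi) / b + 1))
  })
}

set.seed(1)
pos_data <- rgamma(200, shape = 2, rate = 1)  # simulated positive data
f_gamma <- gamma_kde(pos_data, b = 0.2)
f_gamma(c(0, 0.5, 2))         # finite and positive, including at the boundary x = 0

unit_data <- rbeta(200, 2, 5)  # simulated data on the unit interval
f_beta <- beta_kde(unit_data, b = 0.02)
integrate(f_beta, 0, 1)       # approximately, but not exactly, 1
```

Both kernels are supported only on the data's range, so no mass is placed outside it. Neither estimator integrates exactly to 1.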

These features can be combined to make asymmetric kernel density estimators with parametric starts; see the example below. The package contains only one function, `kdensity`, in addition to methods for the generics `plot`, `points`, `lines`, `summary`, and `print`.

## Usage

The function `kdensity` takes a data vector, a kernel `kernel`, and a parametric start `start`. You can optionally specify the `support` argument, which is used to find the normalizing constant.
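Combining a parametric start with an asymmetric kernel generally gives a function that does not integrate to 1. A known support lets you rescale it. The sketch below shows that idea with `integrate()`. `normalize_density` is a hypothetical helper, not part of `kdensity`, and the package's internal handling of `support` may differ.

```r
# Hypothetical helper: rescale an unnormalized density over a known support.
normalize_density <- function(f, support) {
  constant <- integrate(f, lower = support[1], upper = support[2])$value
  function(x) f(x) / constant
}

# Example: an unnormalized function on the positive half-line.
unnormalized <- function(x) 3 * dgamma(x, shape = 2, rate = 1)
f <- normalize_density(unnormalized, support = c(0, Inf))
integrate(f, 0, Inf)$value   # 1, up to numerical error
f(1) - dgamma(1, 2, 1)       # 0, up to numerical error
```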

The following example uses the `datasets::airquality` data set. The black curve is a gamma-kernel density estimate with a gamma start, the red curve is a fully parametric gamma density,
and the blue curve is an ordinary `density` estimate. Notice the boundary bias of the ordinary 
`density` estimator. The parameters of the parametric start are always estimated by maximum likelihood.

```{r example, echo = TRUE}
library("kdensity")
kde <- kdensity(airquality$Wind, start = "gamma", kernel = "gamma")
plot(kde, main = "Wind speed (mph)")
lines(kde, plot_start = TRUE, col = "red")
lines(density(airquality$Wind, adjust = 2), col = "blue")
rug(airquality$Wind)
```

Since the return value of `kdensity` is a function, `kde` is callable and can be
used like any other density function in `R` (such as `stats::dnorm`). For example, you can
do:

```{r callable, echo = TRUE}
kde(10)
integrate(kde, lower = 0, upper = 1) # The estimated CDF evaluated at 1.
```
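Because `kde` is an ordinary vectorized `R` function, generic numerical tools apply to it. Below is a sketch that builds a CDF with `integrate()` and a quantile function with `uniroot()`. The helpers `make_cdf` and `make_quantile` are hypothetical. `dgamma` stands in for a fitted `kde` so the snippet runs without the package. The search interval `upper = 100` is an assumption and must bracket the quantile you want.

```r
# Hypothetical helpers: CDF and quantile function from a vectorized density f
# supported on [lower, Inf).
make_cdf <- function(f, lower = 0) {
  function(q) sapply(q, function(qi) integrate(f, lower, qi)$value)
}
make_quantile <- function(f, lower = 0, upper = 100) {
  cdf <- make_cdf(f, lower)
  # upper must be large enough that cdf(upper) exceeds p
  function(p) uniroot(function(q) cdf(q) - p, c(lower, upper))$root
}

# dgamma stands in for a fitted `kde` so the example runs on its own.
dens <- function(x) dgamma(x, shape = 2, rate = 1)
cdf <- make_cdf(dens)
cdf(c(1, 3))                  # compare with pgamma(c(1, 3), 2, 1)
q_median <- make_quantile(dens)(0.5)
q_median                      # compare with qgamma(0.5, 2, 1)
```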

You can access the parameter estimates of the parametric start with `coef`, as well as its log-likelihood (`logLik`), AIC, and BIC.

```{r dollar, echo = TRUE}
coef(kde)
logLik(kde)
AIC(kde)
```
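These quantities follow the standard definitions, AIC = 2k − 2 log L and BIC = k log n − 2 log L. Here k is the number of estimated start parameters and n is the sample size. The base-R sketch below fits a gamma start by maximum likelihood with `optim()` and computes the same three numbers by hand. It only illustrates the definitions. `kdensity` itself gets its estimates through `univariateML`, and its numbers may differ slightly from this optimizer's. `fit_gamma_ml` is a hypothetical helper.

```r
# Hypothetical sketch: ML fit of a gamma start, then log-likelihood, AIC and BIC.
fit_gamma_ml <- function(x) {
  # Negative log-likelihood; parameters on the log scale keep them positive
  nll <- function(par) {
    -sum(dgamma(x, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
  }
  start <- log(c(mean(x)^2 / var(x), mean(x) / var(x)))  # method-of-moments start
  opt <- optim(start, nll)                                # Nelder-Mead by default
  k <- 2                                                  # number of estimated parameters
  n <- length(x)
  ll <- -opt$value
  list(
    coef = c(shape = exp(opt$par[1]), rate = exp(opt$par[2])),
    logLik = ll,
    AIC = 2 * k - 2 * ll,
    BIC = log(n) * k - 2 * ll
  )
}

fit <- fit_gamma_ml(airquality$Wind)
fit$coef
fit[c("logLik", "AIC", "BIC")]
```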
## How to Contribute or Get Help
If you encounter a bug, have a feature request, or need some help, open a [GitHub issue](https://github.com/JonasMoss/kdensity/issues). Create a pull request
to contribute. This project follows a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct/).

## References

* [Hjort, Nils Lid, and Ingrid K. Glad. "Nonparametric density estimation with a parametric start." The Annals of Statistics (1995): 882-904.](https://projecteuclid.org/euclid.aos/1176324627)

* [Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984.](https://academic.oup.com/biomet/article-abstract/94/4/977/246269)

* [Chen, Song Xi. "Probability density function estimation using gamma kernels." Annals of the Institute of Statistical Mathematics 52.3 (2000): 471-480.](https://link.springer.com/article/10.1023/A:1004165218295)

* [Chen, Song Xi. "Beta kernel estimators for density functions." Computational Statistics & Data Analysis 31.2 (1999): 131-145.](https://www.sciencedirect.com/science/article/pii/S0167947399000109)

Owner

  • Name: Jonas Moss
  • Login: JonasMoss
  • Kind: user
  • Location: Oslo
  • Company: BI Norwegian Business School

Assistant professor in statistics.

JOSS Publication

kdensity: An R package for kernel density estimation with parametric starts and asymmetric kernels
Published
October 03, 2019
Volume 4, Issue 42, Page 1566
Authors
Jonas Moss ORCID
University of Oslo
Martin Tveten ORCID
University of Oslo
Editor
Yuan Tang ORCID
Tags
statistics kernel density estimation non-parametric statistics non-parametrics non-parametric density estimation boundary bias

GitHub Events

Total
  • Watch event: 2
  • Push event: 6
Last Year
  • Watch event: 2
  • Push event: 6

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 236
  • Total Committers: 3
  • Avg Commits per committer: 78.667
  • Development Distribution Score (DDS): 0.042
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jonas Moss j****n@g****m 226
Martin Tveten m****n@g****m 8
Erfjord a****2@v****o 2
Committer Domains (Top 20 + Academic)
vaf.no: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 50
  • Total pull requests: 15
  • Average time to close issues: 4 months
  • Average time to close pull requests: 20 minutes
  • Total issue authors: 6
  • Total pull request authors: 2
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.07
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JonasMoss (37)
  • gvegayon (5)
  • trashbirdecology (4)
  • Tveten (2)
  • pkumar81 (1)
  • Avi-Kenny (1)
Pull Request Authors
  • JonasMoss (12)
  • Tveten (3)
Top Labels
Issue Labels
enhancement (12) feature (12) documentation (7) bug (5) vignette (5) question (2) testing (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 649 last-month
  • Total dependent packages: 3
  • Total dependent repositories: 3
  • Total versions: 4
  • Total maintainers: 1
cran.r-project.org: kdensity

Kernel Density Estimation with Parametric Starts and Asymmetric Kernels

  • Versions: 4
  • Dependent Packages: 3
  • Dependent Repositories: 3
  • Downloads: 649 Last month
Rankings
Forks count: 12.2%
Dependent packages count: 13.7%
Stargazers count: 14.6%
Average: 15.7%
Dependent repos count: 16.5%
Downloads: 21.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • EQL * imports
  • assertthat * imports
  • univariateML * imports
  • SkewHyperbolic * suggests
  • covr * suggests
  • extraDistr * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v4 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite