invgamma

A light weight package for the dpqr functions for the inverse gamma, inverse chi-squared, and inverse exponential distributions in R

https://github.com/dkahle/invgamma

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary
Last synced: 7 months ago · JSON representation

Repository

A light weight package for the dpqr functions for the inverse gamma, inverse chi-squared, and inverse exponential distributions in R

Basic Info
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 2
  • Open Issues: 2
  • Releases: 1
Created almost 10 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "# ",
  fig.path = "tools/README-",
  dpi = 250
)
```


# __invgamma__



[![DOI](https://zenodo.org/badge/61824677.svg)](https://doi.org/10.5281/zenodo.16944076)



__invgamma__ implements the `[dpqr]` statistics functions for the [inverse gamma distribution](https://en.wikipedia.org/wiki/Inverse-gamma_distribution) in [R](https://cran.r-project.org).  It is ideal for using in other packages since it is lightweight and leverages the `[dpqr]gamma()` line of functions maintained by CRAN.

_Please see the section on parameterizations below to avoid any unintended mistakes!_



### Getting __invgamma__

There are two ways to get __invgamma__.  For the [CRAN version](https://cran.r-project.org/package=invgamma), use
```{r, eval=FALSE}
install.packages("invgamma")
```
For the development version, use
```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("dkahle/invgamma")
```



### The `[dpqr]invgamma()` functions

The functions in __invgamma__ match those for the gamma distribution provided by the __stats__ package.  Namely, it uses as its density
_f(x) = (b^a / Gamma(a)) x^-(a+1) e^(-b/x),_
where a = `shape` and b = `rate`.


The [PDF](https://en.wikipedia.org/wiki/Probability_density_function) (the _f(x)_ above) can be evaluated with the `dinvgamma()` function:
```{r}
#| fig.height: 3
library("invgamma")
x <- seq(0, 5, .01)
shape <- 7; rate <- 10
plot(x, dinvgamma(x, shape, rate), type = "l")
```


The [CDF](https://en.wikipedia.org/wiki/Cumulative_distribution_function) can be evaluated with the `pinvgamma()` function:
```{r}
f <- function(x) dinvgamma(x, shape, rate)
q <- 2

integrate(f, 0, q)

(p <- pinvgamma(q, shape, rate))
```

The [quantile function](https://en.wikipedia.org/wiki/Quantile_function) can be evaluated with `qinvgamma()`:
```{r}
qinvgamma(p, shape, rate) # = q
```

And random number generation can be performed with `rinvgamma()`:
```{r}
set.seed(1234)
rinvgamma(5, shape, rate)
```
`rinvgamma()` can be used to obtain a [Monte Carlo](https://en.wikipedia.org/wiki/Monte_Carlo_method) estimate of the probability given by `pinvgamma()` above:
```{r}
n <- 1e5
draws <- rinvgamma(n, shape, rate)
mean(draws <= q)
```
Moreover, we can check the consistency and correctness of the implementation with first a [kernel density estimate](https://en.wikipedia.org/wiki/Kernel_density_estimation)... 
```{r}
#| fig.height: 3
plot(density(draws), xlim = c(0,5))
curve(f(x), col = "red", add = TRUE)
```

...and also a [quantile plot](https://en.wikipedia.org/wiki/Q–Q_plot)...

```{r}
#| fig.height: 3
qqplot( "x" = ppoints(n) |> qinvgamma(shape, rate), "y" = draws,
  xlab = "Theoretical quantiles", ylab = "Sample quantiles",
  main = "QQ plot for inverse gamma draws"
)
abline(0, 1, col = "red")
```

Both of these indicate that the samplers are consistent. As an inferential alternative, we can use a [KS test](https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test):

```{r}
ks.test(
  draws, 
  function(p) pinvgamma(p, shape, rate)
)
```





### The `[dpqr]invchisq()` and `[dpqr]invexp()` functions

The [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution) subsumes the [chi-squared](https://en.wikipedia.org/wiki/Chi-squared_distribution) and [exponential](https://en.wikipedia.org/wiki/Exponential_distribution) [distributions](https://en.wikipedia.org/wiki/Probability_distribution#Continuous_probability_distribution), so it makes sense to include the `*invchisq()` and `*invexp()` functions in __invgamma__.  Their implementations, however, wrap `*chisq()` and `*exp()`, not `*invgamma()`.  









### A note on parameterizations

As detailed [here](https://github.com/dkahle/invgamma/issues/1), the parameterizations of the functions in this package cue off of their corresponding non-inverse distributions from **stats**. This commonly causes the confusion that, for example, the parameter `rate` in `dinvgamma()` is the rate parameter of the inverse gamma distribution. It is not! It is the rate parameter of the corresponding gamma distribution. Please take care with this distinction.


### A note on numerics

**invgamma** was intended to be a lightweight and simple, largely self-maintaining package implementing the inverse gamma, inverse chi-square, and inverse exponential distributions. It uses [the transformation theorem](https://en.wikipedia.org/wiki/Random_variable#Functions_of_random_variables) in all cases. 

One of the challenges to using naive implementations of distributions is that their [numerics](https://en.wikipedia.org/wiki/Floating-point_arithmetic) may not work well. Arithmetic on a computer is not the same as arithmetic in theory, the kind that you meet in math classes, and as a consequence the best computer implementations of mathematical facts/algorithms need to be tailored to the specific cases at hand.

In January 2023 I did a little poking around into this for `rinvgamma()` and found that it performs poorly when the shape parameter is less than .001 or so. The resulting distributions are very heavy-tailed, and the draws from these distributions returned by `rinvgamma()` are so large that they get rounded to either very large numbers (where the floating point representation of numbers does not provide many numbers) or infinity. Here's an example:

```{r}
#| warn: true
rinvgamma(10, shape = .001, rate = 7)
```

Notice that `rinvgamma()` issues a warning in this circumstance; this is the general behavior of **invgamma**: if a particular parameter configuration is known to not produce valid results, a warning is issued.


#### KS tests for sampling accuracy


##### `rinvgamma()`

Here is a more detailed Monte Carlo investigation that checks sampler quality using [the Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test).

First, we write a basic Monte Carlo test for the sampler that works by generating a large (`n = 1e6`) sample of draws from the inverse gamma distribution for a given shape and rate:

```{r}
test_invgamma <- function(shape, rate, n = 1e5) {
  draws <- rinvgamma(n, shape, rate)
  ks.test(draws, function(p) pinvgamma(p, shape, rate))$p.value
}

test_invgamma(3, 7)
```

The function returns the $p$-value associated with the KS test, so "small" values suggest a departure from the null hypothesis that the distribution is from the corresponding inverse gamma distribution: the sampler is performing poorly. Under the null hypothesis, [the $p$-value has an approximate uniform distribution](https://en.wikipedia.org/wiki/P-value#p-value_as_the_statistic_for_performing_significance_tests), a fact that can be found in most advanced mathematical statistics books, so we would expect some proportion to be small regardless. 

We want to see the behavior of the sampler `rinvgamma()` across a wide array of parameter values. To do this, we use a range of parameter values running from small ($10^-4$) to large ($10^4$):

```{r}
#| message: false
# load tidyverse and related
library("tidyverse"); library("patchwork"); library("scales", warn.conflicts = FALSE)
theme_set(theme_minimal()); theme_update(panel.grid.minor = element_blank())

# load furrr for parallel computing
library("furrr"); furrr_options(seed = TRUE)
plan(multisession(workers = parallelly::availableCores()))

# set parameter values to test
n_grid <- 51
param_vals <- 10^seq(-4, 4, length.out = n_grid)
(param_grid <- expand_grid("shape" = param_vals, "rate" = param_vals))
```

Here's what the experiment's design space looks like:

```{r}
# make axes labeller
fmt <- scales::math_format(10^.x)

# make plot
ggplot(param_grid, aes(shape, rate)) +
  geom_point() +
  scale_x_log10(expression(alpha), n.breaks = 10, labels = fmt(-5:5)) +
  scale_y_log10(expression(lambda), n.breaks = 10, labels = fmt(-5:5)) +
  labs("title" = "Parameter Values at Which to Test `rinvgamma()`") +
  coord_equal()
```

Now, we run our test for each point in the design space in parallel. (Note: we've suppressed warnings here that are relevant.)

```{r}
#| warning: false
param_grid <- param_grid |> 
  mutate(p_val = future_map2_dbl(shape, rate, test_invgamma))
```

And we visualize the distribution of the $p$-values over that space, binning the colors to at .05 to highlight the rejections of the tests at the 5% level:

```{r}
ggplot(param_grid, aes(shape, rate, color = p_val)) +
  geom_point() +
  scale_x_log10(expression(alpha), n.breaks = 10, labels = fmt(-5:5)) +
  scale_y_log10(expression(lambda), n.breaks = 10, labels = fmt(-5:5)) +
  scale_color_binned(breaks = c(0, .05, 1)) +
  labs(color = "p value") +
  labs("title" = "KS GoF Test of Draws for Different Parameter Values") +
  coord_equal()
```

If the sampler were working correctly, the $p$-values would be approximately IID uniform(0,1), so we would expect about 5% of the points to be purple, and those 5% would be uniformly distributed over the whole space with no patterns. Obviously, that's not the case: when the shape parameter is small, the test is always rejecting. Clearly, when `shape` is small, the sampler does not work well. Further investigations reveal that, as an easy rule, the sampler can be considered unreliable for `shape` values less than `0.01`. As a consequence, `rinvgamma()` issues a warning in those circumstances. (This warning has been suppressed in the above computations.)

##### `rinvchisq()` and `rinvexp()`

Similar investigations using the inverse chi-squared and inverse exponential reveal that `rinvchisq()` should not be trusted when `df <= .01` and `ncp <= 10` and `rinvexp()` is trustworthy for all values. Here is the illustration for the inverse chi-squared:

```{r}
#| warning: false
test_rinvchisq <- function(df, ncp, n = 1e5) {
  draws <- rinvchisq(n, df, ncp)
  ks.test(draws, function(p) pinvchisq(p, df, ncp))$p.value
}

expand_grid("df" = param_vals, "ncp" = param_vals) |> 
  mutate("p_val" = future_map2_dbl(df, ncp, test_rinvchisq)) |> 
  ggplot(aes(df, ncp, color = p_val)) +
    geom_point() +
    scale_x_log10(expression(nu), n.breaks = 10, labels = fmt(-5:5)) +
    scale_y_log10("ncp", n.breaks = 10, labels = fmt(-5:5)) +
    scale_color_binned(breaks = c(0, .05, 1)) +
    labs(color = "p value") +
    coord_equal()
```

And here is the illustration for the inverse exponential:

```{r}
#| warning: false
test_rinvexp <- function(rate, n = 1e5) {
  draws <- rinvexp(n, rate = rate)
  ks.test(draws, function(p) pinvexp(p, rate))$p.value
}

tibble("rate" = 10^seq(-4, 4, length.out = 2*n_grid)) |> 
  mutate("p_val" = future_map_dbl(rate, test_rinvexp)) |> 
  ggplot(aes(rate, 0, color = p_val)) +
    geom_point() +
    scale_x_log10(expression(lambda), n.breaks = 10, labels = fmt(-5:5)) +
    scale_color_binned(breaks = c(0, .05, 1), guide = FALSE) +
    theme(axis.text.y = element_blank(), axis.title.y = element_blank(),
          panel.grid.major.y = element_blank()) +
    coord_equal()
```


Owner

  • Name: David Kahle
  • Login: dkahle
  • Kind: user
  • Location: Waco, Texas
  • Company: Baylor University

GitHub Events

Total
  • Create event: 3
  • Release event: 1
  • Issues event: 1
  • Push event: 22
  • Pull request event: 6
Last Year
  • Create event: 3
  • Release event: 1
  • Issues event: 1
  • Push event: 22
  • Pull request event: 6

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 34
  • Total Committers: 2
  • Avg Commits per committer: 17.0
  • Development Distribution Score (DDS): 0.147
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
David Kahle d****e@g****m 29
David Kahle d****d@k****o 5
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 3
  • Total pull requests: 3
  • Average time to close issues: 2 days
  • Average time to close pull requests: 1 minute
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Beliavsky (1)
  • ghost (1)
  • dklinenberg2020 (1)
Pull Request Authors
  • dkahle (6)
Top Labels
Issue Labels
wontfix (1)
Pull Request Labels
codex (6)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 7,401 last-month
  • Total docker downloads: 235,763
  • Total dependent packages: 19
    (may contain duplicates)
  • Total dependent repositories: 24
    (may contain duplicates)
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: invgamma

The Inverse Gamma Distribution

  • Versions: 4
  • Dependent Packages: 17
  • Dependent Repositories: 23
  • Downloads: 7,401 Last month
  • Docker Downloads: 235,763
Rankings
Dependent packages count: 3.8%
Dependent repos count: 5.7%
Downloads: 6.4%
Average: 12.4%
Forks count: 17.1%
Stargazers count: 19.3%
Docker downloads count: 22.1%
Maintainers (1)
Last synced: 7 months ago
conda-forge.org: r-invgamma
  • Versions: 1
  • Dependent Packages: 2
  • Dependent Repositories: 1
Rankings
Dependent packages count: 19.6%
Dependent repos count: 24.3%
Average: 40.4%
Stargazers count: 58.4%
Forks count: 59.2%
Last synced: 7 months ago

Dependencies

.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.5.0 composite
  • actions/checkout v4 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • codecov/codecov-action v5 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • testthat >= 3.0.0 suggests
revdep/library.noindex/invgamma/new/invgamma/DESCRIPTION cran
  • testthat >= 3.0.0 suggests
revdep/library.noindex/invgamma/old/invgamma/DESCRIPTION cran