humanleague

humanleague: a C++ microsynthesis package with R and python interfaces - Published in JOSS (2018)

https://github.com/virgesmith/humanleague

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

c-plus-plus-11 microsynthesis nodejs python3 quasirandom r sampling-methods

Keywords from Contributors

cplusplus-20 markov-chain microsimulation monte-carlo mpi pybind11 population
Last synced: 6 months ago

Repository

Microsynthesis using quasirandom sampling and/or IPF

Basic Info
  • Host: GitHub
  • Owner: virgesmith
  • License: other
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 1.69 MB
Statistics
  • Stars: 18
  • Watchers: 4
  • Forks: 3
  • Open Issues: 1
  • Releases: 23
Created over 8 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

humanleague

Introduction

Please note ongoing development is for the python version only. R development is currently maintenance-only due to resource constraints.

humanleague is a python and an R package for microsynthesising populations from marginal and (optionally) seed data. The package is implemented in C++ for performance.

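The package contains algorithms that use a number of different microsynthesis techniques:

  • Iterative Proportional Fitting (IPF)
  • Quasirandom Integer Sampling (QIS)
  • Quasirandom Integer Sampling of IPF (QISI): a combination of the two techniques above
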
The latter provides a bridge between deterministic reweighting and combinatorial optimisation, offering advantages of both techniques:

  • generates high-entropy integral populations
  • can be used to generate multiple populations for sensitivity analysis
  • goes some way to address the 'empty cells' issues that can occur in straight IPF
  • relatively fast computation time
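
To make the "deterministic reweighting" side of this concrete, here is a minimal two-dimensional IPF sketch in plain Python. It is an illustration only and not the package's C++ implementation; the function name `ipf_2d` and its parameters are assumptions made for this example.

```python
def ipf_2d(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Rescale a 2-D seed table until its row and column sums match the targets.

    Hypothetical helper for illustration; not part of the humanleague API.
    """
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row to its target total.
        for i, target in enumerate(row_totals):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Column step: rescale each column to its target total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly, so convergence depends on the rows.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_totals))
        if err < tol:
            return table, True
    return table, False


# A uniform seed carries no prior information about the joint distribution.
seed = [[1.0] * 4 for _ in range(4)]
fitted, converged = ipf_2d(seed, [5, 10, 25, 10], [3, 12, 20, 15])
```

The fitted table is generally fractional, and any cell that is zero in the seed stays zero. These are the two limitations that the integer sampling approaches are meant to mitigate.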

The algorithms:

  • support arbitrary dimensionality for both the marginals and the seed.
  • produce statistical data to ascertain the likelihood/degeneracy of the population (where appropriate).

The package also contains the following utilities:

  • a Sobol sequence generator (implemented as a generator class in python)
  • a function to construct a closest integer population from a discrete univariate probability distribution.
  • an algorithm for sampling an integer population from a discrete multivariate probability distribution, constrained to the marginal sums in every dimension (see below).
  • utility functions to convert a population represented as a multidimensional state array into tables of either counts (indexed by state) or individuals.
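
The last utility can be pictured with the following plain-Python sketch, which walks a nested-list state array and builds both table forms. The helper names (`iter_cells`, `counts_table`, `individuals_table`) are hypothetical and are not the package's function names; omitting empty states from the counts table is a choice made for this sketch.

```python
def iter_cells(array, index=()):
    """Yield (state_index, count) for every cell of a nested-list array."""
    if isinstance(array, list):
        for i, sub in enumerate(array):
            yield from iter_cells(sub, index + (i,))
    else:
        yield index, array


def counts_table(array):
    """One row per occupied state: the state indices followed by the count."""
    return [(*idx, n) for idx, n in iter_cells(array) if n > 0]


def individuals_table(array):
    """One row per individual: the state indices, repeated `count` times."""
    return [idx for idx, n in iter_cells(array) for _ in range(n)]


# A 2x2 population of three people: two in state (0, 1), one in state (1, 0).
population = [[0, 2],
              [1, 0]]
counts = counts_table(population)
individuals = individuals_table(population)
```

Here `counts` has one row per occupied state and `individuals` has one row per person, so the length of `individuals` equals the total population.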

Version 1.0.1 reflects the work described in the Quasirandom Integer Sampling (QIS) paper.

Installation

Python

Requires Python 3.11 or newer. The package can be installed using pip, e.g.

```bash
pip install humanleague
```

Development

Fork or clone the repo, then

```bash
pip install -e .[dev]
pytest
```

R

Official release:

```r
install.packages("humanleague")
```

For a development version

```r
devtools::install_github("virgesmith/humanleague")
```

Or, for the legacy version

```r
devtools::install_github("virgesmith/humanleague@1.0.1")
```

Documentation and Examples

R

Consult the package documentation, e.g.

```r
library(humanleague)
?humanleague
```

Python

The package now contains type annotations, which your IDE should display automatically.

NB type stubs are generated using the pybind11-stubgen package, with some manual corrections.

Multidimensional integerisation

Building on the one-dimensional integerise function, which given a discrete probability distribution and a count returns the closest integer population to the distribution that sums to the count, a multidimensional equivalent of integerise is introduced. In one dimension, for example, this:

```python
>>> import humanleague
>>> p = [0.1, 0.2, 0.3, 0.4]
>>> result, stats = humanleague.integerise(p, 11)
>>> result
array([1, 2, 3, 5], dtype=int32)
>>> stats
{'rmse': 0.3535533905932736}
```

produces the optimal (i.e. closest possible) integer population to the discrete distribution.
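
One way to picture the one-dimensional problem is the largest-remainder method sketched below: round every expected count down, then hand the remaining units to the cells with the largest fractional parts. This is an illustrative stand-in and is not claimed to be the algorithm humanleague uses internally; the name `integerise_1d` is hypothetical, and the sketch assumes the probabilities sum to 1.

```python
import math


def integerise_1d(probs, total):
    """Largest-remainder rounding of total * probs to integers summing to total.

    Hypothetical helper for illustration; not the humanleague implementation.
    """
    expected = [p * total for p in probs]
    counts = [math.floor(e) for e in expected]
    shortfall = total - sum(counts)
    # Cells ordered by descending fractional part receive the leftover units.
    order = sorted(range(len(probs)),
                   key=lambda i: expected[i] - counts[i],
                   reverse=True)
    for i in order[:shortfall]:
        counts[i] += 1
    # Root-mean-square error between the integer and expected populations.
    rmse = math.sqrt(sum((c - e) ** 2 for c, e in zip(counts, expected)) / len(probs))
    return counts, {"rmse": rmse}


counts, stats = integerise_1d([0.1, 0.2, 0.3, 0.4], 11)
```

For this input the sketch assigns the single leftover unit to the last cell, giving the same population as the package output shown above.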

The integerise function generalises this problem and applies it to higher dimensions: given an n-dimensional array of real numbers where the 1-d marginal sums in every dimension are integral (and thus the total population is too), it attempts to find an integral array that also satisfies these constraints.

The QISI algorithm is repurposed to this end. As it is a sampling algorithm, it cannot guarantee that a solution will be found nor, if one is found, that it is optimal. A failure does not prove that no solution exists for the given input.

```python
>>> import numpy as np
>>> a = np.array([[ 0.3,  1.2,  2. ,  1.5],
...               [ 0.6,  2.4,  4. ,  3. ],
...               [ 1.5,  6. , 10. ,  7.5],
...               [ 0.6,  2.4,  4. ,  3. ]])
>>> # marginal sums
>>> a.sum(axis=0)
array([ 3., 12., 20., 15.])
>>> a.sum(axis=1)
array([ 5., 10., 25., 10.])
>>> # perform integerisation
>>> result, stats = humanleague.integerise(a)
>>> stats
{'conv': True, 'rmse': 0.5766281297335398}
>>> result
array([[ 0,  2,  2,  1],
       [ 0,  3,  4,  3],
       [ 2,  6, 10,  7],
       [ 1,  1,  4,  4]])
>>> # check marginals are preserved
>>> (result.sum(axis=0) == a.sum(axis=0)).all()
True
>>> (result.sum(axis=1) == a.sum(axis=1)).all()
True
```
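
The precondition stated above (integral marginal sums in every dimension) and the final marginal check can both be done with a tolerance, which is safer than exact equality on floating-point sums. The sketch below does this for the two-dimensional case in plain Python; the helper names are hypothetical and not part of the humanleague API.

```python
def marginals_2d(table):
    """Return (row sums, column sums) of a 2-D nested list."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def has_integral_marginals(table, tol=1e-9):
    """True if every row and column sum is an integer to within tol."""
    rows, cols = marginals_2d(table)
    return all(abs(s - round(s)) < tol for s in rows + cols)


def preserves_marginals(original, integerised, tol=1e-9):
    """True if both tables have the same row and column sums to within tol."""
    r0, c0 = marginals_2d(original)
    r1, c1 = marginals_2d(integerised)
    return all(abs(x - y) < tol for x, y in zip(r0 + c0, r1 + c1))


# The fractional input and the integer result from the example above.
a = [[0.3, 1.2, 2.0, 1.5],
     [0.6, 2.4, 4.0, 3.0],
     [1.5, 6.0, 10.0, 7.5],
     [0.6, 2.4, 4.0, 3.0]]
result = [[0, 2, 2, 1],
          [0, 3, 4, 3],
          [2, 6, 10, 7],
          [1, 1, 4, 4]]
input_ok = has_integral_marginals(a)
output_ok = preserves_marginals(a, result)
```

Both `input_ok` and `output_ok` evaluate to `True` for the arrays shown.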

Owner

  • Name: Andrew Smith
  • Login: virgesmith
  • Kind: user
  • Location: Leeds

Re-reformed Quant

JOSS Publication

humanleague: a C++ microsynthesis package with R and python interfaces
Published
May 03, 2018
Volume 3, Issue 25, Page 629
Authors
Andrew P. Smith
School of Geography and Leeds Institute for Data Analytics, University of Leeds
Editor
Thomas J. Leeper
Tags
c++ r python microsynthesis sampling

GitHub Events

Total
  • Create event: 10
  • Issues event: 12
  • Release event: 2
  • Watch event: 2
  • Delete event: 9
  • Issue comment event: 14
  • Push event: 30
  • Pull request event: 12
Last Year
  • Create event: 10
  • Issues event: 12
  • Release event: 2
  • Watch event: 2
  • Delete event: 9
  • Issue comment event: 14
  • Push event: 30
  • Pull request event: 12

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 542
  • Total Committers: 6
  • Avg Commits per committer: 90.333
  • Development Distribution Score (DDS): 0.397
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
virgesmith a****h@l****k 327
virgesmith a****w@f****t 176
Tom Russell t****l@g****m 20
Robin Lovelace r****x@g****m 16
Version autobump d****s@f****t 2
The Codacy Badger b****r@c****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 40
  • Total pull requests: 23
  • Average time to close issues: 6 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 7
  • Total pull request authors: 2
  • Average comments per issue: 0.68
  • Average comments per pull request: 0.83
  • Merged pull requests: 20
  • Bot issues: 3
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 11
  • Average time to close issues: 27 days
  • Average time to close pull requests: 6 days
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.33
  • Average comments per pull request: 0.82
  • Merged pull requests: 10
  • Bot issues: 3
  • Bot pull requests: 0
Top Authors
Issue Authors
  • virgesmith (31)
  • test-airev[bot] (3)
  • willu47 (2)
  • sumtxt (1)
  • ZuJinyan (1)
  • mem48 (1)
  • n400peanuts (1)
Pull Request Authors
  • virgesmith (22)
  • codacy-badger (1)
Top Labels
Issue Labels
enhancement (13) question (2) bug (2) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 130 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 22
  • Total maintainers: 1
pypi.org: humanleague

Microsynthesis using quasirandom sampling and/or IPF

  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 130 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 11.5%
Stargazers count: 15.6%
Average: 16.9%
Forks count: 16.9%
Downloads: 30.1%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: humanleague
  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Average: 46.6%
Stargazers count: 49.6%
Dependent packages count: 51.2%
Forks count: 51.6%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • Rcpp >= 0.12.8 imports
  • testthat * suggests
setup.py pypi
  • numpy >=1.19.1
.github/workflows/conda.yml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/coverage.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/pip-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/r-cmd-check.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite