mlfit

Implementation of algorithms that extend IPF to nested structures

https://github.com/mlfit/mlfit

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com
  • Committers with academic emails
    1 of 8 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords from Contributors

reproducibility rmarkdown package-creation tidy-data hms report standardization latex devtools sqlite3
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 14
  • Watchers: 4
  • Forks: 10
  • Open Issues: 14
  • Releases: 18
Created over 12 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
set.seed(728)
```


[![CRAN status](https://www.r-pkg.org/badges/version/mlfit)](https://CRAN.R-project.org/package=mlfit)
[![rcc](https://github.com/mlfit/mlfit/workflows/rcc/badge.svg)](https://github.com/mlfit/mlfit/actions)
[![Codecov test coverage](https://codecov.io/gh/mlfit/mlfit/branch/master/graph/badge.svg)](https://codecov.io/gh/mlfit/mlfit?branch=master)
[![](https://cranlogs.r-pkg.org/badges/mlfit)](https://cran.r-project.org/package=mlfit)
[![](https://cranlogs.r-pkg.org/badges/grand-total/mlfit)](https://CRAN.R-project.org/package=mlfit)


Implementation of algorithms that extend Iterative Proportional Fitting (IPF) to nested structures.

The IPF algorithm operates on count data. This package implements several algorithms that extend IPF to nested structures: "parent" and "child" items, where constraints can be provided for both levels.
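For readers unfamiliar with plain (non-nested) IPF, the following base-R sketch shows the basic idea on a two-way table: the cells are rescaled alternately so that row sums and column sums approach their targets. The seed table, the targets, and the fixed iteration count are made-up values for illustration only, and the loop is not how `mlfit` is implemented.

```r
# Seed table of counts (hypothetical values) and the margins we want to match.
seed <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
row_targets <- c(40, 60)
col_targets <- c(30, 70)

fitted <- seed
for (i in seq_len(50)) {
  # Scale every row so that the row sums match row_targets.
  fitted <- fitted * (row_targets / rowSums(fitted))
  # Scale every column so that the column sums match col_targets.
  fitted <- t(t(fitted) * (col_targets / colSums(fitted)))
}

round(fitted, 3)
rowSums(fitted)
colSums(fitted)
```

The nested case handled by this package differs in that a single weight per group must satisfy constraints at both the group level and the individual level.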

## Installation

Install from CRAN with:

``` r
install.packages("mlfit")
```

Or the development version from GitHub:

``` r
# install.packages("devtools")
devtools::install_github("mlfit/mlfit")
```

## Example - single zone

Here is a multi-level fitting example with a reference sample (`reference_sample`) and two control tables (`individual_control` and `group_control`). Each row of `reference_sample` represents an individual in a sample of a population, where `HHNR` is their group ID and `PNR` is their individual ID. `APER` and `WKSTAT` are individual-level characteristics, and `CAR` is the only household characteristic of the sample population. The `N` column in each control table gives the number of individuals or groups that belong to each category.

```{r}
library(mlfit)
library(tibble)

reference_sample <- tibble::tribble(
  ~HHNR, ~PNR, ~APER, ~CAR, ~WKSTAT,
  1L, 1L, 3L, "0", "1",
  1L, 2L, 3L, "0", "2",
  1L, 3L, 3L, "0", "3",
  2L, 4L, 2L, "0", "1",
  2L, 5L, 2L, "0", "3",
  3L, 6L, 3L, "0", "1",
  3L, 7L, 3L, "0", "1",
  3L, 8L, 3L, "0", "2",
  4L, 9L, 3L, "1", "1",
  4L, 10L, 3L, "1", "3",
  4L, 11L, 3L, "1", "3",
  5L, 12L, 3L, "1", "2",
  5L, 13L, 3L, "1", "2",
  5L, 14L, 3L, "1", "3",
  6L, 15L, 2L, "1", "1",
  6L, 16L, 2L, "1", "2",
  7L, 17L, 5L, "1", "1",
  7L, 18L, 5L, "1", "1",
  7L, 19L, 5L, "1", "2",
  7L, 20L, 5L, "1", "3",
  7L, 21L, 5L, "1", "3",
  8L, 22L, 2L, "1", "1",
  8L, 23L, 2L, "1", "2"
)

individual_control <- tibble::tribble(
  ~WKSTAT, ~N,
  "1", 91L,
  "2", 65L,
  "3", 104L
)

group_control <- tibble::tribble(
  ~CAR, ~N,
  "0", 35L,
  "1", 65L
)
```

First, create an `ml_problem` object, which defines the multi-level fitting problem. Use `special_field_names()` for the `field_names` argument of `ml_problem()` to specify the names of the ID columns in the reference sample and of the count column in the control tables.

```{r}
fitting_problem <- ml_problem(
  ref_sample = reference_sample,
  controls = list(
    individual = list(individual_control),
    group = list(group_control)
  ),
  field_names = special_field_names(
    groupId = "HHNR",
    individualId = "PNR",
    count = "N"
  )
)
```

To calibrate the fitting problem, call one of the `ml_fit_*()` functions directly, or call `ml_fit()` and pass the name of the algorithm via the `algorithm` argument.

```{r}
fit <- ml_fit(ml_problem = fitting_problem, algorithm = "ipu")
fit
```
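To give a feel for what an iterative proportional updating (IPU) pass does, here is a base-R sketch applied to the same data. It is an illustration under stated assumptions, not the code behind `ml_fit()`: the incidence matrix `A` is written out by hand from `reference_sample` (one row per household, with an indicator for each `CAR` category and the number of persons in each `WKSTAT` category), and `ipu_sketch()` and its fixed iteration count are hypothetical.

```r
# Household-by-constraint incidence matrix, derived by hand from reference_sample.
A <- cbind(
  car0 = c(1, 1, 1, 0, 0, 0, 0, 0),
  car1 = c(0, 0, 0, 1, 1, 1, 1, 1),
  wk1  = c(1, 1, 2, 1, 0, 1, 2, 1),
  wk2  = c(1, 0, 1, 0, 2, 1, 1, 1),
  wk3  = c(1, 1, 0, 2, 1, 0, 2, 0)
)
# Control totals from group_control and individual_control.
targets <- c(car0 = 35, car1 = 65, wk1 = 91, wk2 = 65, wk3 = 104)

# Hypothetical helper: cycle through the constraints and rescale the weights
# of all households that contribute to the current constraint.
ipu_sketch <- function(A, targets, iterations = 200) {
  w <- rep(1, nrow(A))
  for (iter in seq_len(iterations)) {
    for (j in seq_len(ncol(A))) {
      contributes <- A[, j] > 0
      w[contributes] <- w[contributes] * targets[[j]] / sum(w * A[, j])
    }
  }
  w
}

w <- ipu_sketch(A, targets)
# Weighted totals per constraint, to compare against `targets`.
round(colSums(w * A), 2)
```

Each household ends up with a single weight that is shared by all of its members, which is what distinguishes this from fitting the two levels separately.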

`mlfit` also provides a function that helps to replicate the reference sample based on the fitted/calibrated weights. See `?ml_replicate` to find out which integerisation algorithms are available.

```{r}
syn_pop <- ml_replicate(fit, algorithm = "trs")
syn_pop
```
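Assuming `"trs"` stands for the truncate, replicate, sample approach to integerisation, the idea can be sketched in base R as follows. The function `trs_sketch()` and the example weights are hypothetical and serve only to illustrate the technique; see `?ml_replicate` for what the package actually does.

```r
# Hypothetical helper: turn fractional weights into integer replication counts.
trs_sketch <- function(w) {
  base <- floor(w)                        # truncate: keep the integer part
  frac <- w - base                        # fractional remainder per unit
  deficit <- round(sum(w)) - sum(base)    # units still missing after truncation
  extra <- integer(length(w))
  if (deficit > 0) {
    # sample: pick units without replacement, proportionally to the remainder
    picked <- sample.int(length(w), size = deficit, prob = frac)
    extra[picked] <- 1L
  }
  base + extra
}

set.seed(1)
w <- c(1.5, 2.25, 0.25, 3)   # made-up fitted weights
n <- trs_sketch(w)
# replicate: repeat each unit's index according to its integer count
replicated <- rep(seq_along(w), times = n)
n
replicated
```

Every unit is replicated either `floor(w)` or `floor(w) + 1` times, and the total number of replicated units matches the rounded sum of the weights.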

## Example - multiple zones

This example is almost identical to the previous one, except that it creates one fitting problem per zone. The `geo_hierarchy` argument of `ml_problem()` lets you specify a geographical hierarchy as a `data.frame` with two columns, one for the region and one for the zone (here `REGION` and `ZONE`, as declared via `special_field_names()`). A zone can belong to only one region. The image below illustrates this: the orange patch is a zone within the green region.


![](https://user-images.githubusercontent.com/17020181/113852241-afed5580-97df-11eb-80ed-2b458e8fcbda.png)


When a valid `geo_hierarchy` is specified, `ml_problem()` returns a list of fitting problems, one per zone. Each fitting problem contains only the subsets of the reference sample and control totals that are relevant to its zone. In other words, the reference sample is a population survey sample taken at the regional level, while the control totals are given at the zonal level.

```{r}
ref_sample <- tibble::tribble(
  ~HHNR, ~PNR, ~APER, ~HH_VAR, ~P_VAR, ~REGION,
  1, 1, 3, 1, 1, 1,
  1, 2, 3, 1, 2, 1,
  1, 3, 3, 1, 3, 1,
  2, 4, 2, 1, 1, 1,
  2, 5, 2, 1, 3, 1,
  3, 6, 3, 1, 1, 1,
  3, 7, 3, 1, 1, 1,
  3, 8, 3, 1, 2, 1,
  4, 9, 3, 2, 1, 1,
  4, 10, 3, 2, 3, 1,
  4, 11, 3, 2, 3, 1,
  5, 12, 3, 2, 2, 1,
  5, 13, 3, 2, 2, 1,
  5, 14, 3, 2, 3, 1,
  6, 15, 2, 2, 1, 1,
  6, 16, 2, 2, 2, 1,
  7, 17, 5, 2, 1, 1,
  7, 18, 5, 2, 1, 1,
  7, 19, 5, 2, 2, 1,
  7, 20, 5, 2, 3, 1,
  7, 21, 5, 2, 3, 1,
  8, 22, 2, 2, 1, 1,
  8, 23, 2, 2, 2, 1,
  9, 24, 3, 1, 1, 2,
  9, 25, 3, 1, 2, 2,
  9, 26, 3, 1, 3, 2,
  10, 27, 2, 1, 1, 2,
  10, 28, 2, 1, 3, 2,
  11, 29, 3, 1, 1, 2,
  11, 30, 3, 1, 1, 2,
  11, 31, 3, 1, 2, 2,
  12, 32, 3, 2, 1, 2,
  12, 33, 3, 2, 3, 2,
  12, 34, 3, 2, 3, 2,
  13, 35, 3, 2, 2, 2,
  13, 36, 3, 2, 2, 2,
  13, 37, 3, 2, 3, 2,
  14, 38, 2, 2, 1, 2,
  14, 39, 2, 2, 2, 2,
  15, 40, 5, 2, 1, 2,
  15, 41, 5, 2, 1, 2,
  15, 42, 5, 2, 2, 2,
  15, 43, 5, 2, 3, 2,
  15, 44, 5, 2, 3, 2,
  16, 45, 2, 2, 1, 2,
  16, 46, 2, 2, 2, 2
)


hh_ctrl <- tibble::tribble(
  ~ZONE, ~HH_VAR, ~N,
  1, 1, 35,
  1, 2, 65,
  2, 1, 35,
  2, 2, 65,
  3, 1, 35,
  3, 2, 65,
  4, 1, 35,
  4, 2, 65
)

ind_ctrl <- tibble::tribble(
  ~ZONE, ~P_VAR, ~N,
  1, 1, 91,
  1, 2, 65,
  1, 3, 104,
  2, 1, 91,
  2, 2, 65,
  2, 3, 104,
  3, 1, 91,
  3, 2, 65,
  3, 3, 104,
  4, 1, 91,
  4, 2, 65,
  4, 3, 104
)

geo_hierarchy <- tibble::tribble(
  ~REGION, ~ZONE,
  1, 1,
  1, 2,
  2, 3,
  2, 4
)

fitting_problems <- ml_problem(
  ref_sample = ref_sample,
  field_names = special_field_names(
    groupId = "HHNR", individualId = "PNR", count = "N",
    zone = "ZONE", region = "REGION"
  ),
  group_controls = list(hh_ctrl),
  individual_controls = list(ind_ctrl),
  geo_hierarchy = geo_hierarchy
)

# Fit each zone's problem, then replicate each fit into a synthetic population
fits <- lapply(fitting_problems, ml_fit, algorithm = "ipu")
syn_pops <- lapply(fits, ml_replicate, algorithm = "trs")
```
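The per-zone split described above can be pictured with a small base-R sketch: each zone takes the reference sample of its region and the control totals of the zone itself. The miniature data frames and the name `per_zone` are made up for illustration, and this is only a conceptual picture, not the code that `ml_problem()` runs.

```r
# Miniature, made-up inputs that mirror the column names used above.
geo <- data.frame(REGION = c(1, 1, 2, 2), ZONE = c(1, 2, 3, 4))
sample_df <- data.frame(HHNR = c(1, 1, 2, 3), REGION = c(1, 1, 1, 2))
ctrl <- data.frame(
  ZONE = rep(1:4, each = 2),
  HH_VAR = rep(1:2, times = 4),
  N = rep(c(35, 65), times = 4)
)

# One list element per zone: sample rows from the zone's region,
# control rows from the zone itself.
per_zone <- lapply(split(geo, geo$ZONE), function(g) {
  list(
    ref_sample = sample_df[sample_df$REGION == g$REGION, , drop = FALSE],
    controls = ctrl[ctrl$ZONE == g$ZONE, , drop = FALSE]
  )
})

names(per_zone)
per_zone[["3"]]
```

Zones 1 and 2 share the same reference sample rows because they lie in the same region, while their control totals are kept separate.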

## Powered by

- [`grake`](https://krlmlr.github.io/grake/): A reimplementation of generalized raking ([Deville and Särndal, 1992](https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1992.10475217); [Deville, Särndal and Sautory, 1993](https://www.tandfonline.com/doi/abs/10.1080/01621459.1993.10476369))


## Related work

- [`wrswoR`](https://krlmlr.github.io/wrswoR/): An implementation of fast weighted random sampling without replacement ([Efraimidis and Spirakis, 2006](https://www.sciencedirect.com/science/article/pii/S002001900500298X))
- [`mangow`](https://krlmlr.github.io/mangow/): Embed the Gower distance metric in L1
- [`RANN.L1`](https://github.com/jefferislab/RANN/tree/master-L1): k-nearest neighbors using the L1 metric


### Where is `MultiLeveLIPF`?

From version `0.4.0` onwards, the package is named `mlfit`. To install a version older than `0.4.0`, use:

``` r
# See https://github.com/mlfit/mlfit/releases for the releases that are available
# To install a certain branch or commit or tag, append it to the repo name, after an @:
devtools::install_github("mlfit/mlfit@v0.3-7")
```

Note that all versions prior to `0.4.0` must be loaded as `MultiLeveLIPF`, not `mlfit`.

## Citation

To cite package ‘mlfit’ in publications use:

  Kirill Müller and Amarin Siripanich (2021). mlfit: Iterative Proportional Fitting Algorithms for Nested Structures. https://mlfit.github.io/mlfit/, https://github.com/mlfit/mlfit.

A BibTeX entry for LaTeX users is

```
@Manual{,
  title = {mlfit: Iterative Proportional Fitting Algorithms for Nested Structures},
  author = {Kirill Müller and Amarin Siripanich},
  year = {2021},
  note = {https://mlfit.github.io/mlfit/, https://github.com/mlfit/mlfit},
}
```

## Used in

- Casati, D., Müller, K., Fourie, P. J., Erath, A., & Axhausen, K. W. (2015). Synthetic population generation by combining a hierarchical, simulation-based approach with reweighting by generalized raking. Transportation Research Record, 2493(1), 107-116.
- Bösch, P. M., Müller, K., & Ciari, F. (2016). The IVT 2015 baseline scenario. In 16th Swiss Transport Research Conference (STRC 2016).
- Müller, K. (2017). A generalized approach to population synthesis (Doctoral dissertation, ETH Zurich).
- Ilahi, A., & Axhausen, K. W. (2018). Implementing Bayesian network and generalized raking multilevel IPF for constructing population synthesis in megacities. In 18th Swiss Transport Research Conference (STRC 2018). STRC.
- Ilahi, A., & Axhausen, K. W. (2019). Integrating Bayesian network and generalized raking for population synthesis in Greater Jakarta. Regional Studies, Regional Science, 6(1), 623-636.
- Yameogo, B. F., Vandanjon, P. O., Gastineau, P., & Hankach, P. (2021). Generating a two-layered synthetic population for French municipalities: Results and evaluation of four synthetic reconstruction methods. JASSS-Journal of Artificial Societies and Social Simulation, 24, 27p.
- Zhou, M., Li, J., Basu, R., & Ferreira, J. (2022). Creating spatially-detailed heterogeneous synthetic populations for agent-based microsimulation. Computers, Environment and Urban Systems, 91, 101717.

Owner

  • Name: mlfit
  • Login: mlfit
  • Kind: organization

GitHub Events

Total
  • Watch event: 2
  • Push event: 92
  • Pull request event: 40
  • Fork event: 1
  • Create event: 11
Last Year
  • Watch event: 2
  • Push event: 92
  • Pull request event: 40
  • Fork event: 1
  • Create event: 11

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 966
  • Total Committers: 8
  • Avg Commits per committer: 120.75
  • Development Distribution Score (DDS): 0.337
Past Year
  • Commits: 51
  • Committers: 6
  • Avg Commits per committer: 8.5
  • Development Distribution Score (DDS): 0.647
Top Committers
Name Email Commits
Kirill Müller k****r@i****h 640
asiripanich 1****h 140
Kirill Müller k****r@m****g 82
Kirill Müller k****r@m****g 75
krlmlr k****r 16
Indrajeet Patil p****e@g****m 8
Kirill Müller k****l@c****m 4
github-actions g****s 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 54
  • Total pull requests: 49
  • Average time to close issues: 6 months
  • Average time to close pull requests: 19 days
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 2.09
  • Average comments per pull request: 1.27
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 22
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 22
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • krlmlr (31)
  • asiripanich (21)
  • maelle (1)
  • walkerke (1)
Pull Request Authors
  • krlmlr (45)
  • asiripanich (23)
Top Labels
Issue Labels
enhancement (7) bug (3) feature (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 255 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: mlfit

Iterative Proportional Fitting Algorithms for Nested Structures

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 255 Last month
Rankings
Forks count: 7.3%
Stargazers count: 15.8%
Dependent repos count: 24.0%
Average: 24.9%
Dependent packages count: 28.8%
Downloads: 48.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • methods * depends
  • BB * imports
  • Matrix * imports
  • dplyr * imports
  • forcats * imports
  • hms * imports
  • kimisc * imports
  • lifecycle * imports
  • plyr * imports
  • rlang * imports
  • tibble * imports
  • utils * imports
  • wrswoR * imports
  • MASS * suggests
  • covr * suggests
  • sampling * suggests
  • testthat >= 3.0.0 suggests
  • waldo * suggests
.github/workflows/R-CMD-check-dev.yaml actions
  • ./.github/workflows/check * composite
  • ./.github/workflows/custom/after-install * composite
  • ./.github/workflows/custom/before-install * composite
  • ./.github/workflows/dep-matrix * composite
  • ./.github/workflows/install * composite
  • ./.github/workflows/rate-limit * composite
  • ./.github/workflows/update-snapshots * composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/R-CMD-check.yaml actions
  • ./.github/workflows/check * composite
  • ./.github/workflows/commit * composite
  • ./.github/workflows/custom/after-install * composite
  • ./.github/workflows/custom/before-install * composite
  • ./.github/workflows/git-identity * composite
  • ./.github/workflows/install * composite
  • ./.github/workflows/pkgdown-build * composite
  • ./.github/workflows/pkgdown-deploy * composite
  • ./.github/workflows/rate-limit * composite
  • ./.github/workflows/roxygenize * composite
  • ./.github/workflows/style * composite
  • ./.github/workflows/update-snapshots * composite
  • actions/checkout v3 composite
.github/workflows/check/action.yml actions
  • actions/upload-artifact main composite
  • r-lib/actions/check-r-package v2 composite
.github/workflows/fledge.yaml actions
  • ./.github/workflows/git-identity * composite
  • ./.github/workflows/install * composite
  • actions/checkout v2 composite
.github/workflows/install/action.yml actions
  • ./.github/workflows/get-extra * composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/lock.yaml actions
  • dessant/lock-threads v2 composite
.github/workflows/pkgdown-deploy/action.yml actions
  • nick-fields/retry v2 composite
.github/workflows/pkgdown.yaml actions
  • ./.github/workflows/custom/after-install * composite
  • ./.github/workflows/custom/before-install * composite
  • ./.github/workflows/git-identity * composite
  • ./.github/workflows/install * composite
  • ./.github/workflows/pkgdown-build * composite
  • ./.github/workflows/pkgdown-deploy * composite
  • ./.github/workflows/rate-limit * composite
  • actions/checkout v3 composite
.github/workflows/pr-commands.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/pr-fetch master composite
  • r-lib/actions/pr-push master composite
  • r-lib/actions/setup-r master composite
.github/workflows/revdep.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v2 composite
.github/workflows/style/action.yml actions
  • actions/cache v3 composite
.github/workflows/update-snapshots/action.yml actions
  • peter-evans/create-pull-request v4 composite
.github/workflows/commit/action.yml actions
.github/workflows/dep-matrix/action.yml actions
.github/workflows/git-identity/action.yml actions
.github/workflows/pkgdown-build/action.yml actions
.github/workflows/rate-limit/action.yml actions
.github/workflows/roxygenize/action.yml actions
.github/workflows/versions-matrix/action.yml actions
.github/workflows/dep-suggests-matrix/action.yml actions
.github/workflows/R-CMD-check-status.yaml actions
.github/workflows/get-extra/action.yml actions
.github/workflows/matrix-check/action.yml actions