xmap

R package for verifying and transforming data between nomenclature, categories or standards

https://github.com/cynthiahqy/xmap

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

concordance crosswalks data-wrangling
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 2
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# xmap

The `{xmap}` package supports transformations of numeric aggregates between statistical classifications (e.g. occupation or industry categorisations) using the Crossmaps framework. It implements classes for representing transformations between a source and a target classification as graph structures (i.e. crossmaps), and methods for validating crossmaps and applying them to transform data collected under the source classification into data indexed by the target classification codes.

## Overview

A crossmap encodes instructions for transforming data between statistical classifications. It is a graph structure whose links (`from`, `to`, and `weight_by`) associate source and target classification codes with weights for redistributing the numeric mass attached to each source code. For example, a given link could encode that 10% of the people in a source occupation category are employed in a particular target occupation. A collection of links between two classifications forms a crossmap. When represented as a table of links (i.e. an edge list), a crossmap can easily be verified against the conditions required for a valid transformation of data between the two classifications.
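
To make the edge-list idea concrete, here is a minimal base R sketch with made-up codes (not taken from `xmap::demo`). It assumes that the key validity condition is that the weights on the links leaving each source code sum to one. `check_weights()` is a hypothetical helper written for illustration; it is not part of `{xmap}`.

```r
# Hypothetical crossmap stored as an edge list: one row per link.
# Codes and weights are invented for illustration.
links <- data.frame(
  from   = c("a1", "a2", "a2", "a3"),
  to     = c("A",  "A",  "B",  "B"),
  weight = c(1,    0.6,  0.4,  1)
)

# Hypothetical helper (not part of {xmap}): TRUE when the outgoing
# weights of every source code sum to one, within a floating-point tolerance.
check_weights <- function(links, tol = 1e-9) {
  totals <- tapply(links$weight, links$from, sum)
  all(abs(totals - 1) < tol)
}

check_weights(links)
#> [1] TRUE

# Setting one weight to 5 means source code "a3" now sends out 5x its mass
bad <- links
bad$weight[4] <- 5
check_weights(bad)
#> [1] FALSE
```

Because the whole crossmap is a single table, the check is one grouped sum rather than an inspection of transformation code.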

Using a valid crossmap guarantees that the total mass remains the same before and after the transformation. For example, if we reclassify counts of workers by occupation, we would expect the total number of workers across all occupation categories to be unchanged after reclassification. However, comparing totals is not always sufficient to identify mistakes in a data transformation, because there can be multiple ways to redistribute mass between source and target classifications that all maintain the same total. This package allows you to create, validate and apply `xmap_tbl` objects to perform valid, mass-preserving transformations of numeric aggregates between statistical classifications. The crossmap workflow saves users from manually checking each line of transformation code for implementation errors by verifying that crossmaps satisfy mathematically sufficient conditions for a valid transformation.
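
The mass-preservation guarantee can be sketched in one line, assuming the validity condition is that each source code's outgoing weights sum to one. Write $x_i$ for the value attached to source code $i$, $w_{ij}$ for the weight on the link from $i$ to target code $j$ (zero when there is no link), and $y_j$ for the transformed value of target code $j$:

$$
y_j = \sum_i w_{ij}\, x_i,
\qquad
\sum_j y_j = \sum_j \sum_i w_{ij}\, x_i = \sum_i x_i \sum_j w_{ij} = \sum_i x_i .
$$

The same derivation shows why matching totals cannot catch every mistake: any set of weights satisfying $\sum_j w_{ij} = 1$, including one that sends mass to the wrong target codes, yields the same total.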

### Citation and Related Papers

For more details on the task abstraction underpinning the crossmap framework, and for some visualisations of crossmaps, see [Visualising category recoding and numeric redistributions](https://arxiv.org/pdf/2308.06535). For more on how transformation guarantees arise from the graph properties of crossmaps, see [*Crossmaps: A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics*](https://arxiv.org/abs/2406.14163).

To cite this package use:
```{r}
citation("xmap")
```

## Installation

To install the latest CRAN release of `xmap`:

``` r
install.packages("xmap")
```

To install the latest development version of `xmap`:

``` r
remotes::install_github("cynthiahqy/xmap")
```

## Usage

### Creating crossmaps

The easiest way to create a crossmap is to coerce a dataframe (e.g. `xmap::demo$abc_links`) containing source codes, target codes and weights between them:

```{r}
library(xmap)
demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
```

If the coercion fails, you can use `diagnose_as_xmap_tbl()` to identify issues:

```{r error=TRUE}
bad_links <- demo$abc_links
bad_links[4, "share"] <- 5

diagnose_as_xmap_tbl(bad_links, from = "lower", to = "upper", weight_by = "share")
```

### Applying crossmaps

When using a crossmap to transform data, make sure that the crossmap covers all the codes present in your data. For example, if your data contain a count for the category "teacher" but your crossmap has no links from "teacher", you risk silently losing data in the transformation. Even if you want to remove the count for "teacher", do so explicitly in the original dataset (e.g. by filtering out those rows) rather than implicitly in the transformation.
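
A coverage check of this kind can be sketched in base R by comparing the codes in the data with the source codes of the crossmap's links. The data frames and codes below are hypothetical, and the sketch only illustrates the idea; it is not how `{xmap}` implements its checks.

```r
# Hypothetical crossmap links and data; "teacher" has no outgoing link.
links <- data.frame(
  from   = c("nurse", "doctor", "doctor"),
  to     = c("health", "health", "research"),
  weight = c(1, 0.8, 0.2)
)
counts <- data.frame(
  occupation = c("nurse", "doctor", "teacher"),
  count      = c(120, 40, 75)
)

# Source codes present in the data but not covered by the crossmap
missing_codes <- setdiff(counts$occupation, links$from)
missing_codes
#> [1] "teacher"

# Drop uncovered categories explicitly (and visibly) before transforming
if (length(missing_codes) > 0) {
  message("Dropping uncovered codes: ", paste(missing_codes, collapse = ", "))
  counts <- counts[!counts$occupation %in% missing_codes, ]
}
```

Emitting a message keeps the removal visible, so the dropped mass is a documented decision rather than a side effect of the transformation.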

To transform data with a suitable crossmap, use `apply_xmap()`:

```{r}
abc_xmap <- demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
set.seed(1) # make the random example counts reproducible
abc_data <- tibble::tibble(
  lower = unique(demo$abc_links$lower),
  count = runif(length(unique(demo$abc_links$lower)), min = 100, max = 500)
)
transformed_data <- apply_xmap(
  .data = abc_data,
  .xmap = abc_xmap,
  values_from = count
)

## totals still match (compared with a tolerance, since weighted sums are floating point)
all.equal(sum(abc_data$count), sum(transformed_data$count))
```
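
Conceptually, applying a crossmap joins the data to the links on the source code, multiplies each value by the link weight, and sums the weighted values by target code. The base R sketch below illustrates this with hypothetical codes and a hypothetical helper `apply_links()`. It is not `{xmap}`'s implementation and skips the validation that `apply_xmap()` performs.

```r
# Hypothetical crossmap (weights from each source code sum to one) and data
links <- data.frame(
  from   = c("a1", "a2", "a2", "a3"),
  to     = c("A",  "A",  "B",  "B"),
  weight = c(1,    0.6,  0.4,  1)
)
src_data <- data.frame(code = c("a1", "a2", "a3"), count = c(100, 50, 30))

# Hypothetical helper: redistribute `count` from source codes to target codes
apply_links <- function(src_data, links) {
  # Inner join: one row per (source code, link) pair
  joined <- merge(src_data, links, by.x = "code", by.y = "from")
  # Share of each source value sent along each link
  joined$count <- joined$count * joined$weight
  # Collapse onto target codes
  aggregate(count ~ to, data = joined, FUN = sum)
}

out <- apply_links(src_data, links)
out # expected: A = 130 (100 + 0.6 * 50), B = 50 (0.4 * 50 + 30)

# Totals agree up to floating-point tolerance
all.equal(sum(src_data$count), sum(out$count))
#> [1] TRUE
```

Note that the inner join silently drops any source code without links, which is why the coverage check described above matters.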

Owner

  • Name: Cynthia Huang
  • Login: cynthiahqy
  • Kind: user
  • Location: Melbourne, Australia

PhD Student in Econometrics and Business Statistics at Monash University

GitHub Events

Total
  • Release event: 1
  • Push event: 16
  • Create event: 3
Last Year
  • Release event: 1
  • Push event: 16
  • Create event: 3

Issues and Pull Requests

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 11 hours
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 11 hours
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cynthiahqy (1)
Pull Request Authors
  • cynthiahqy (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 157 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: xmap

Transforming Data Between Statistical Classifications

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 157 Last month
Rankings
Stargazers count: 27.0%
Dependent packages count: 27.3%
Forks count: 28.9%
Dependent repos count: 33.6%
Average: 40.7%
Downloads: 86.7%

Dependencies

DESCRIPTION cran
  • R >= 4.1 depends
  • cli * imports
  • dplyr * imports
  • glue * imports
  • rlang >= 1.0.0 imports
  • tibble * imports
  • tidyr * imports
  • Matrix * suggests
  • forcats * suggests
  • ggbump * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • matlib * suggests
  • patchwork * suggests
  • rmarkdown * suggests
  • stats * suggests
  • stringr * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.5.0 composite
  • actions/checkout v4 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite