xmap

R package for verifying and transforming data between nomenclature, categories or standards

https://github.com/cynthiahqy/xmap

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary

Keywords

concordance crosswalks data-wrangling

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: cynthiahqy
License: other
Language: R
Default Branch: main
Homepage: https://cynthiahqy.github.io/xmap/
Size: 2.12 MB

Statistics

Stars: 3
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 2

Topics

concordance crosswalks data-wrangling

Created almost 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# xmap  

The `{xmap}` package supports transformations of numeric aggregates
between statistical classifications (e.g. occupation or industry categorisations) using the Crossmaps framework.
It implements classes for representing transformations between a source and a target classification
as graph structures (i.e. crossmaps), and methods for validating and applying crossmaps to transform
data collected under the source classification into data indexed by the target classification codes.

## Overview

A crossmap encodes instructions for transforming data between statistical classifications. It is a graph structure whose links (`from`, `to`, and `weight_by`) associate source and target classification codes with weights for redistributing the numeric mass attached to each code in a source classification. For example, a given link could encode that 10% of the people in a source category are employed in a particular target occupation. A collection of links between two classifications forms a crossmap graph structure which, when represented as a table of links (i.e. an edge list table), can be easily verified against the conditions required for a valid transformation of data between the specified classifications.
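As a minimal sketch of the edge-list idea in base R (not the package's implementation), the hypothetical `links` table below stores one link per row. The column names `from`, `to` and `weight` are illustrative, and the condition checked here, that the outgoing weights of every source code sum to 1, is an assumption about what a valid transformation requires.

```r
# Hypothetical edge list: each row is one link from a source code to a target code.
links <- data.frame(
  from   = c("a", "b", "b", "c"),
  to     = c("X", "X", "Y", "Y"),
  weight = c(1, 0.25, 0.75, 1)
)

# Sum the outgoing weights for each source code.
weight_totals <- tapply(links$weight, links$from, sum)

# Assumed validity condition: every source code redistributes
# exactly 100% of its mass (checked with a small tolerance).
weights_ok <- all(abs(weight_totals - 1) < 1e-8)
weights_ok
```

Because the check runs on the table of links alone, it can be done before any data is transformed.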

Using a valid crossmap guarantees that the total mass before and after the transformation remains the same. For example, if we reclassify counts of workers by occupation, we would expect the total number of workers across all occupation categories to remain unchanged after reclassification. However, comparing totals is not always sufficient to identify mistakes in a data transformation, as there can be multiple ways to redistribute mass between source and target classifications while maintaining the same total. This package allows you to create, validate and apply `xmap_tbl` objects to perform valid and mass-preserving transformations of numeric aggregates between statistical classifications. The crossmaps workflow saves users from having to manually check lines of code for implementation errors by verifying that crossmaps satisfy conditions that are mathematically sufficient for a valid transformation.
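To see why matching totals alone cannot confirm a transformation, consider the following base R sketch. The count and the two candidate splits are made-up values used purely for illustration.

```r
# Hypothetical count attached to a single source code.
source_count <- 100

# Two candidate ways to split that code across targets "X" and "Y".
# Both sets of weights sum to 1.
split_even   <- c(X = 0.5,  Y = 0.5)
split_skewed <- c(X = 0.25, Y = 0.75)

result_even   <- source_count * split_even
result_skewed <- source_count * split_skewed

# Both transformations preserve the total ...
same_total <- isTRUE(all.equal(sum(result_even), sum(result_skewed)))

# ... but the redistributed values differ, so a totals check alone
# cannot tell which split was intended.
same_values <- identical(result_even, result_skewed)

c(same_total = same_total, same_values = same_values)
```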

### Citation and Related Papers

For more details on the task abstraction underpinning the crossmap framework and some visualisations of crossmaps, see [Visualising category recoding and numeric redistributions](https://arxiv.org/pdf/2308.06535). For more on how transformation guarantees arise from the graph properties of crossmaps, see [*Crossmaps: A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics*](https://arxiv.org/abs/2406.14163).

To cite this package use:
```{r}
citation("xmap")
```

## Installation

To install the latest CRAN release of `xmap`:

``` r
install.packages("xmap")
```

To install the latest development version of `xmap`:

``` r
remotes::install_github("cynthiahqy/xmap")
```

## Usage

### Creating crossmaps

The easiest way to create a crossmap is to coerce a dataframe (e.g. `xmap::demo$abc_links`) containing source codes, target codes and weights between them:

```{r}
library(xmap)
demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
```

If the coercion fails, you can use `diagnose_as_xmap_tbl()` to identify issues:

```{r error=TRUE}
bad_links <- demo$abc_links
bad_links[4, "share"] <- 5

diagnose_as_xmap_tbl(bad_links, from = "lower", to = "upper", weight_by = "share")
```

### Applying crossmaps

When using a crossmap to transform data, you want to make sure that the crossmap covers all the codes present in your data. For example, if your data contains a count for the category "teacher", but your crossmap doesn't have any links from "teacher", then you risk silently losing data in the transformation. Even if you want to remove the count for "teacher", this should be done explicitly in the original dataset (e.g. by filtering out rows), rather than implicitly in the transformation.
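A coverage check of this kind can be sketched in base R with `setdiff()`. The codes and the `xmap_links` table below are hypothetical, and this is an illustration of the idea rather than the check the package itself performs.

```r
# Hypothetical codes found in the data; "teacher" has no link in the crossmap.
data_codes <- c("nurse", "teacher", "plumber")

# Hypothetical crossmap links as a plain data frame.
xmap_links <- data.frame(
  from   = c("nurse", "plumber"),
  to     = c("health", "trades"),
  weight = c(1, 1)
)

# Codes present in the data but absent from the crossmap's source codes.
uncovered <- setdiff(data_codes, xmap_links$from)

# Report any code that would otherwise be dropped silently.
if (length(uncovered) > 0) {
  message("Codes without links: ", paste(uncovered, collapse = ", "))
}
```

Running the check before the transformation turns a silent loss of data into a visible message that can be acted on.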

To use a suitable crossmap to transform data, you can use `apply_xmap()`:

```{r}
abc_xmap <- demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
abc_data <- tibble::tibble(
  lower = unique(demo$abc_links$lower),
  count = runif(length(unique(demo$abc_links$lower)), min = 100, max = 500)
)
transformed_data <- apply_xmap(
  .data = abc_data,
  .xmap = abc_xmap,
  values_from = count
)

## totals still match (compared with a tolerance, since the counts are floating point)
isTRUE(all.equal(sum(abc_data$count), sum(transformed_data$count)))
```
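The redistribution itself can be pictured as a join followed by a grouped sum. The base R sketch below uses hypothetical `links` and `src` tables to illustrate the arithmetic only; it is not a description of how `apply_xmap()` is implemented.

```r
# Hypothetical links and source data (all names are illustrative).
links <- data.frame(
  from   = c("a", "b", "b", "c"),
  to     = c("X", "X", "Y", "Y"),
  weight = c(1, 0.25, 0.75, 1)
)
src <- data.frame(from = c("a", "b", "c"), count = c(10, 40, 50))

# Attach each source value to its outgoing links,
# then scale the value by the link weight.
joined <- merge(src, links, by = "from")
joined$share <- joined$count * joined$weight

# Sum the redistributed shares arriving at each target code.
out <- aggregate(share ~ to, data = joined, FUN = sum)
out

# Compare totals with a tolerance rather than exact equality.
totals_match <- isTRUE(all.equal(sum(src$count), sum(out$share)))
totals_match
```

Since the weights leaving each source code sum to 1 in this example, the total across target codes equals the total across source codes.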

Owner

Name: Cynthia Huang
Login: cynthiahqy
Kind: user
Location: Melbourne, Australia

Website: cynthiahqy.com
Repositories: 31
Profile: https://github.com/cynthiahqy

PhD Student in Econometrics and Business Statistics at Monash University

GitHub Events

Total

Release event: 1
Push event: 16
Create event: 3

Last Year

Release event: 1
Push event: 16
Create event: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: about 11 hours
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 1.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: about 11 hours
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 1.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

cynthiahqy (1)

Pull Request Authors

cynthiahqy (2)

Packages

Total packages: 1
Total downloads:
- cran 157 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: xmap

Transforming Data Between Statistical Classifications

Homepage: https://github.com/cynthiahqy/xmap
Documentation: http://cran.r-project.org/web/packages/xmap/xmap.pdf
License: MIT + file LICENSE
Latest release: 0.1.0
published about 1 year ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 157 Last month

Rankings

Stargazers count: 27.0%

Dependent packages count: 27.3%

Forks count: 28.9%

Dependent repos count: 33.6%

Average: 40.7%

Downloads: 86.7%

Maintainers (1)

cynthiahqy@gmail.com