coder

coder: An R package for code-based item classification and categorization - Published in JOSS (2020)

https://github.com/ropensci/coder

Keywords

classification icd-10 package r r-package rstats

Scientific Fields

Medicine Life Sciences - 45% confidence

Materials Science Physical Sciences - 40% confidence

Last synced: 6 months ago

Repository

Classification of Cases into Deterministic Categories

Basic Info

Host: GitHub
Owner: ropensci
Language: TeX
Default Branch: master
Homepage: https://docs.ropensci.org/coder/
Size: 74.9 MB

Statistics

Stars: 22
Watchers: 3
Forks: 4
Open Issues: 7
Releases: 3

Topics

classification icd-10 package r r-package rstats

Created over 9 years ago · Last pushed 10 months ago

Metadata Files

Readme Changelog Codemeta

README.Rmd

---
output: github_document
---

# coder 

[![R build status](https://github.com/ropensci/coder/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/coder/actions) 
[![codecov](https://codecov.io/gh/ropensci/coder/branch/master/graph/badge.svg)](https://app.codecov.io/gh/ropensci/coder) 
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![DOI](https://zenodo.org/badge/65975808.svg)](https://zenodo.org/badge/latestdoi/65975808) [![status](https://joss.theoj.org/papers/10.21105/joss.02916/status.svg)](https://joss.theoj.org/papers/10.21105/joss.02916)
[![CRAN status](https://www.r-pkg.org/badges/version/coder)](https://CRAN.R-project.org/package=coder)
![CRAN downloads](http://cranlogs.r-pkg.org/badges/grand-total/coder)



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README",
  out.width = "100%"
) 
```

## Aim of the package

The goal of `{coder}` is to classify items from one dataset using codes from a secondary source, with classification schemes based on regular expressions and weighted indices.

## Installation

You can install the released version of coder from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("coder")
```

And the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("remotes")
remotes::install_github("ropensci/coder")
```

## Typical use case

-   Determining comorbidities before clinical trials
-   Discovering adverse events after surgery

**Patient data:** The initial rationale for the package was to classify patient data based on medical coding. A typical use case would consider patients from a medical/administrative database, identified by a patient ID and possibly an associated date of interest (date of diagnosis/treatment/intervention/hospitalization/rehabilitation). This data source could be, for example, an administrative hospital register or a national quality register.

**Codify:** The primary source could then be linked to a secondary (possibly larger) database including the same patients, with corresponding IDs and some coded patient data. This could be a national patient register with medical codes from the International Classification of Diseases *(ICD)* and corresponding dates of hospital visits/admission/discharge, or a medical prescription register with codes from the Anatomical Therapeutic Chemical *(ATC)* classification system and dates of medical prescription/dispatch/usage. A time window could be specified relating the date from the primary source (e.g. the date of a primary total hip arthroplasty; THA) to dates from the secondary source (e.g. the date of a medical prescription). ATC codes associated with medical prescriptions during the year prior to THA could thus be identified and used as a measure of comorbidity. Another time window, such as 90 days after THA, might instead be used to identify adverse events after surgery.
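The linkage and time-window step can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's `codify()` implementation: the data frames `people` and `codes`, their column names, the helper `in_window()` and the specific ATC codes are all hypothetical. Windows are given in days relative to the index date, in the same spirit as the `days = c(-90, -1)` argument used in the usage examples.

``` r
# Hypothetical primary data: one index date (THA) per patient
people <- data.frame(
  id      = c("a", "b"),
  surgery = as.Date(c("2020-06-01", "2020-09-15"))
)

# Hypothetical secondary data: coded prescriptions with their own dates
# (the ATC codes are arbitrary examples)
codes <- data.frame(
  id        = c("a", "a", "b", "b"),
  code      = c("N02BE01", "C09AA05", "A02BC01", "N02BE01"),
  code_date = as.Date(c("2019-12-01", "2020-07-20", "2018-01-10", "2020-09-01"))
)

# Hypothetical helper: link on the shared id, then keep codes whose date
# falls within `days` (inclusive) relative to the surgery date
in_window <- function(people, codes, days) {
  linked <- merge(people, codes, by = "id")
  offset <- as.numeric(linked$code_date - linked$surgery, units = "days")
  linked[offset >= days[1] & offset <= days[2], ]
}

comorbidity <- in_window(people, codes, days = c(-365, -1)) # year before THA
adverse     <- in_window(people, codes, days = c(0, 90))    # 90 days after THA
```

With these toy data, `comorbidity` keeps one prescription per patient, `adverse` keeps only patient `a`'s prescription from 49 days after surgery, and the 2018 prescription falls outside both windows.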

**Classify:** Working with medical/chemical codes directly might be cumbersome, since those classifications tend to be massive and therefore hard to interpret. It is thus common to aggregate the data according to some classification or combined index from the literature. This could be the *Charlson* or *Elixhauser* comorbidity indices based on ICD codes, or the *RxRisk V* classification based on ATC codes. Each of those tools appears with different code versions (ICD-8, ICD-9, ICD-9-CM, ICD-10, ICD-10-CA, ICD-10-SE, ICD-10-CM et cetera) and with different codes recognized as relevant comorbidities (the Charlson index as proposed by Charlson et al., Deyo et al., Romano et al., Quan et al. et cetera). Using a third object (in addition to the primary and secondary patient data sets) helps to formalize and structure the use of such classifications. This is implemented in the `coder` package by `classcodes` objects based on regular expressions (often with several alternative versions). Those `classcodes` objects could be prepared by the user, although a number of default `classcodes` are also included in the package (see the table below).
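The idea of a classification scheme built from regular expressions can be sketched in base R. Here `scheme` is a hypothetical stand-in for a `classcodes` object (one pattern per category) and `classify_sketch()` is a hypothetical helper; neither reflects the package's actual data structures or functions. The patterns are simplified and illustrative, not a validated comorbidity definition.

``` r
# Hypothetical scheme: one regular expression per category
# (illustrative patterns, not coder's default classcodes)
scheme <- data.frame(
  group = c("diabetes", "diabetes_complicated", "dementia"),
  regex = c("^E1[0-4][01689]", "^E1[0-4][2-57]", "^F0[0-3]|^G30")
)

# Codes already linked to each patient (ICD-10 style, without dots)
codes <- data.frame(
  id   = c("a", "a", "b", "c"),
  code = c("E119", "E112", "G309", "I10")
)

# Hypothetical helper: one logical column per category, TRUE if any of the
# patient's codes matches that category's pattern
classify_sketch <- function(codes, scheme) {
  ids <- unique(codes$id)
  out <- matrix(FALSE, nrow = length(ids), ncol = nrow(scheme),
                dimnames = list(ids, scheme$group))
  for (g in seq_len(nrow(scheme))) {
    hit_ids <- unique(codes$id[grepl(scheme$regex[g], codes$code)])
    out[ids %in% hit_ids, g] <- TRUE
  }
  out
}

classified <- classify_sketch(codes, scheme)
```

Patient `a` is flagged for both diabetes categories, `b` for dementia, and `c` (whose only code, `I10`, matches no pattern) for none, so the individual codes collapse into a handful of named categories.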

**Index:** Now, instead of working with tens of thousands of individual ICD codes, each patient might be recognized to have none or some familiar comorbidities such as hypertension, cancer or dementia. This granularity might still be too fine, so an even simpler index score might be desired. Such scores/indices/weighted sums have been proposed as well and exist in many versions for each of the standard classifications. Some are simple counts, some are weighted sums, and some account for an inherent hierarchy (for example, ICD codes for diabetes with and without complications might both be recognized in the same patient, although the uncomplicated version might be masked by the complicated version in the index).
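The three kinds of index can be sketched as weighted sums over category flags. In this minimal base R example the flags, the weights and the helper `index_sketch()` are hypothetical; real indices such as Charlson or Elixhauser define their own weights and hierarchies. A hierarchy is represented here as pairs `c(dominant, masked)`: the masked category is dropped whenever the dominant one is present.

``` r
# Hypothetical category flags (rows = patients), e.g. from a classification step
classified <- matrix(
  c(TRUE,  TRUE,  FALSE,
    FALSE, FALSE, TRUE,
    TRUE,  FALSE, TRUE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"),
                  c("diabetes", "diabetes_complicated", "dementia"))
)

# Hypothetical helper: mask categories according to `hierarchy`, then
# return the weighted sum of the remaining flags for each patient
index_sketch <- function(classified, weights, hierarchy = list()) {
  for (h in hierarchy) {
    # rows where the dominant category is present lose the masked one
    classified[classified[, h[1]], h[2]] <- FALSE
  }
  flags <- classified[, names(weights), drop = FALSE] * 1  # logical -> 0/1
  drop(flags %*% weights)
}

ones    <- c(diabetes = 1, diabetes_complicated = 1, dementia = 1)
weights <- c(diabetes = 1, diabetes_complicated = 2, dementia = 1)

counts   <- index_sketch(classified, ones)     # simple count
unmasked <- index_sketch(classified, weights)  # weighted sum
masked   <- index_sketch(classified, weights,  # weighted sum with hierarchy
                         hierarchy = list(c("diabetes_complicated", "diabetes")))
```

Patient `a` scores 3 without the hierarchy but 2 with it, because the uncomplicated diabetes flag is masked by the complicated one; patient `c`, with only uncomplicated diabetes, is unaffected.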

**Conditions:** Further complexity might appear if some codes should only be recognized under certain conditions. A patient with THA might, for example, be considered to have an adverse event after surgery if a certain ICD code is recorded as the main diagnosis at a later hospital visit, while the same code could be ignored if recorded only as a secondary diagnosis.
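A condition such as "main diagnosis only" can be expressed as an extra logical filter applied before a code is recognized. The sketch below is hypothetical: the `visits` data, its `main` column and the helper `recognize()` are illustrative, and the ICD-10 code `T845` (infection and inflammatory reaction due to internal joint prosthesis) is used only as an example.

``` r
# Hypothetical visits after THA: one row per recorded code, with a flag
# telling whether it was the main diagnosis of that visit
visits <- data.frame(
  id   = c("a", "a", "b"),
  code = c("T845", "T845", "T845"),
  main = c(FALSE, TRUE, FALSE)
)

# Hypothetical helper: a patient is recognized if any code matches
# `pattern` on a row where `condition` is TRUE
recognize <- function(visits, pattern, condition = TRUE) {
  hit <- grepl(pattern, visits$code) & condition
  ids <- unique(visits$id)
  setNames(ids %in% visits$id[hit], ids)
}

any_position <- recognize(visits, "^T845")
main_only    <- recognize(visits, "^T845", condition = visits$main)
```

Patient `b` has the code only as a secondary diagnosis and is therefore recognized in `any_position` but not in `main_only`.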

**To summarize:** The coder package takes three objects: (1) a data frame/table/tibble with IDs and possible dates from a primary source; (2) coded data from a secondary source with the same IDs and possibly different dates; and (3) a `classcodes` object, either a default one from the package or one specified by the user. The outcome is then: (i) codes associated with each element from (1) identified from (2), possibly limited to a relevant time window; (ii) a broader categorization of the relevant codes as prescribed by (3); and (iii) a summarized index score based on the relevant categories from (3).

(i-iii) correspond to the output from the functions `codify()`, `classify()` and `index()`, which could be chained explicitly as `codify() %>% classify() %>% index()`, or implicitly by the `categorize()` function.

## Usage

Assume we have some patients with surgery at specified dates:

```{r}
library(coder)
ex_people
```

Those patients (among others) were also recorded in a national patient register with dates of hospital admission and diagnosis codes from the International Classification of Diseases (ICD) version 10:

```{r}
ex_icd10
```

Using those two data sets, as well as a classification scheme (`classcodes` object; see below), we can easily identify all Charlson comorbidities for each patient:

```{r}
ch <- 
  categorize(
    ex_people,                  # patients of interest 
    codedata = ex_icd10,        # Medical codes from national patient register
    cc = "charlson",            # Calculate Charlson comorbidity
    id = "name", code = "icd10" # Specify column names
  )

ch
```

How many patients were diagnosed with malignancy?

```{r}
sum(ch$malignancy)
```

What is the distribution of the combined comorbidity index for each patient?

```{r}
barplot(table(ch$charlson))
```

There are many versions of the Charlson comorbidity index, which can be selected with the `index` argument. We might also be interested only in diagnoses from the 90 days before surgery, as specified with an argument list `codify_args` passed to `codify()`:

```{r}
ch <- 
  categorize(
    ex_people, codedata = ex_icd10, cc = "charlson", id = "name", code = "icd10",
    
    # Additional arguments
    index       = c("quan_original", "quan_updated"), # Indices
    codify_args = list(
      date      = "surgery",   # Name of column with index dates
      code_date = "admission", # Name of column with code dates
      days      = c(-90, -1)   # Time window
    )
  )
```

Number of malignancies during this period?

```{r}
sum(ch$malignancy, na.rm = TRUE)
```

Distribution of the index as proposed by Quan et al. (2011) during the 90-day period:

```{r}
barplot(table(ch$quan_updated))
```

## Classification schemes

Classification schemes (`classcodes` objects, see `vignette("classcodes")`) are based on regular expressions for computational speed (see `vignette("Interpret_regular_expressions")`), but their content can be summarized and visualized for clarity. Arbitrary `classcodes` objects can also be specified by the user.
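To illustrate why a regular expression is a compact way to define a category, a single anchored pattern can stand in for an open-ended family of codes. The pattern below is modeled on the ICD-10 myocardial infarction codes I21, I22 and I25.2 (written without dots) that appear in some Charlson versions; it is an illustrative assumption, not copied from the package's `classcodes`.

``` r
# Illustrative pattern: I21*, I22* or I252 (codes written without dots)
mi <- "^I2(1|2|52)"

candidates <- c("I21", "I210", "I214", "I22", "I252", "I25", "I251", "I20")

# grepl() tests every candidate against the pattern in one vectorized call
matched <- candidates[grepl(mi, candidates)]
matched
```

Five of the eight candidates match, while `I25`, `I251` and `I20` do not; a subcode such as `I219` would also match without any change to the pattern.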

The package includes default `classcodes` for medical patient data based on the International Classification of Diseases versions 8, 9 and 10 (ICD-8/9/10), as well as the Anatomical Therapeutic Chemical (ATC) classification system for medical prescription data.

Default `classcodes` are listed in the table below. Each classification (`classcodes` column) can be based on several code systems (`regex` column) and have several alternative weighted indices (`indices` column). Those might be combined freely.

```{r}
coder::all_classcodes()
```

## Relation to other packages

`coder` uses `data.table` as a backend to increase computational speed for large datasets. Several R packages focus narrowly on Charlson and Elixhauser comorbidity based on ICD codes ([icd](https://CRAN.R-project.org/package=icd), [comorbidity](https://CRAN.R-project.org/package=comorbidity), [medicalrisk](https://CRAN.R-project.org/package=medicalrisk), [comorbidities.icd10](https://github.com/gforge/comorbidities.icd10), [icdcoder](https://github.com/wtcooper/icdcoder)). The `coder` package includes similar functionality but has a wider scope.

## Code of conduct

Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.

Owner

Name: rOpenSci
Login: ropensci
Kind: organization
Email: info@ropensci.org
Location: Berkeley, CA

Website: https://ropensci.org/
Twitter: rOpenSci
Repositories: 307
Profile: https://github.com/ropensci

JOSS Publication

coder: An R package for code-based item classification and categorization

Published

December 18, 2020

DOI

10.21105/joss.02916

Volume 5, Issue 56, Page 2916

Authors

Erik Bülow

The Swedish Arthroplasty Register, Registercentrum Västra Götaland, Gothenburg, Sweden, Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

Editor

Kristen Thyng

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "coder",
  "description": " Fast categorization of items based on external code data identified by regular expressions. A typical use case considers patient with medically coded data, such as codes from the International Classification of Diseases ('ICD') or the Anatomic Therapeutic Chemical ('ATC') classification system. Functions of the package relies on a triad of objects: (1) case data with unit id:s and possible dates of interest; (2) external code data for corresponding units in (1) and with optional dates of interest and; (3) a classification scheme ('classcodes' object) with regular expressions to identify and categorize relevant codes from (2). It is easy to introduce new classification schemes ('classcodes' objects) or to use default schemes included in the package. Use cases includes patient categorization based on 'comorbidity indices' such as 'Charlson', 'Elixhauser', 'RxRisk V', or the 'comorbidity-polypharmacy' score (CPS), as well as adverse events after hip and knee replacement surgery.",
  "name": "coder: Deterministic Categorization of Items Based on External Code Data",
  "codeRepository": "https://github.com/ropensci/coder",
  "issueTracker": "https://github.com/ropensci/coder/issues",
  "license": "https://spdx.org/licenses/GPL-2.0",
  "version": "1.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.5.0 (2025-04-11)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Erik",
      "familyName": "Bulow",
      "email": "eriklgb@gmail.com",
      "@id": "https://orcid.org/0000-0002-9973-456X"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Erik",
      "familyName": "Bulow",
      "email": "eriklgb@gmail.com",
      "@id": "https://orcid.org/0000-0002-9973-456X"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=covr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "writexl",
      "name": "writexl",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=writexl"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.5"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "data.table",
      "name": "data.table",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=data.table"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "decoder",
      "name": "decoder",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=decoder"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "generics",
      "name": "generics",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=generics"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "tibble",
      "name": "tibble",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=tibble"
    },
    "SystemRequirements": null
  },
  "fileSize": "348.64KB",
  "relatedLink": [
    "https://docs.ropensci.org/coder/",
    "https://CRAN.R-project.org/package=coder"
  ],
  "releaseNotes": "https://github.com/ropensci/coder/blob/master/NEWS.md",
  "readme": "https://github.com/ropensci/coder/blob/master/README.md",
  "contIntegration": [
    "https://github.com/ropensci/coder/actions",
    "https://app.codecov.io/gh/ropensci/coder"
  ],
  "developmentStatus": "https://www.repostatus.org/#active",
  "keywords": [
    "r",
    "package",
    "r-package",
    "classification",
    "icd-10",
    "rstats"
  ]
}

GitHub Events

Total

Issues event: 1
Issue comment event: 2
Push event: 1

Last Year

Issues event: 1
Issue comment event: 2
Push event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 300
Total Committers: 17
Avg Commits per committer: 17.647
Development Distribution Score (DDS): 0.11

Past Year

Commits: 2
Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Erik Bülow	e**b@g**m	267
Erik Bulow	e**u@r**l	18
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
runner	r**r@M**l	1
David Robinson	a**d@g**m	1
GitHub Actions	a**s@g**m	1

Committer Domains (Top 20 + Academic)

github.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 106
Total pull requests: 2
Average time to close issues: 2 months
Average time to close pull requests: 1 day
Total issue authors: 8
Total pull request authors: 2
Average comments per issue: 1.16
Average comments per pull request: 0.5
Merged pull requests: 1
Bot issues: 1
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 1
Bot pull requests: 0


Top Authors

Issue Authors

eribul (99)
mbg-unsw (1)
maelle (1)
github-actions[bot] (1)
dtgnn (1)
rbratslaver (1)
Antros89 (1)
samlipworth (1)

Pull Request Authors

eribul (1)
dgrtwo (1)

Top Labels

Issue Labels

Zabore (23) drgtwo (14) documentation (14) bug (11) API (8) output (4) examples (3) Shiny (1)

coder

Science Score: 93.0%
