linelist

R package for handling linelist data

https://github.com/epiverse-trace/linelist

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.9%) to scientific vocabulary

Keywords

data data-structures epidemiology epiverse outbreaks r r-package sdg-3 structured-data
Last synced: 6 months ago

Repository

R package for handling linelist data

Basic Info
Statistics
  • Stars: 10
  • Watchers: 7
  • Forks: 4
  • Open Issues: 6
  • Releases: 8
Topics
data data-structures epidemiology epiverse outbreaks r r-package sdg-3 structured-data
Created almost 4 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License Citation

README.Rmd

---
output: github_document
---



```{r readmesetup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# **linelist**: Tagging and Validating Epidemiological Data 


[![Digital Public Good](https://raw.githubusercontent.com/epiverse-trace/linelist/main/man/figures/dpg_badge.png)](https://www.digitalpublicgoods.net/r/linelist)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)
[![cran-check](https://badges.cranchecks.info/summary/linelist.svg)](https://cran.r-project.org/web/checks/check_results_linelist.html)
[![R-CMD-check](https://github.com/epiverse-trace/linelist/workflows/R-CMD-check/badge.svg)](https://github.com/epiverse-trace/linelist/actions)
[![codecov](https://codecov.io/gh/epiverse-trace/linelist/branch/main/graph/badge.svg?token=JGTCEY0W02)](https://app.codecov.io/gh/epiverse-trace/linelist)
[![lifecycle-maturing](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-maturing.svg)](https://www.reconverse.org/lifecycle.html#maturing)
[![month-download](https://cranlogs.r-pkg.org/badges/linelist)](https://cran.r-project.org/package=linelist)
[![total-download](https://cranlogs.r-pkg.org/badges/grand-total/linelist)](https://cran.r-project.org/package=linelist)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6532786.svg)](https://doi.org/10.5281/zenodo.6532786)


*linelist* provides a safe entry point to the *Epiverse* software ecosystem,
adding a foundational layer through the *tagging*, *validation*, and
*safeguarding* of epidemiological data, to help make data pipelines more
straightforward and robust.

## Installation

### Stable version

Our stable versions are released on CRAN and can be installed using:

```{r, eval=FALSE}
install.packages("linelist")
```

::: {.pkgdown-devel}

### Development version

The development version of linelist can be installed from
[GitHub](https://github.com/) with:

```{r, eval=FALSE}
if (!requireNamespace("pak", quietly = TRUE)) {
  install.packages("pak")
}
pak::pak("epiverse-trace/linelist")
```

:::

## Usage

```{r}
#| fig.alt: "Graphical summary of the linelist R package, with emphasis on these 4 key features: 1. Tag key epi variables, 2. Validate tagged data, 3. Safeguards vs accidental loss / alteration, 4. Robust data for stronger pipelines"
#| out.width: "60%"
knitr::include_graphics("man/figures/linelist_infographics.png")
```

linelist works by tagging key epidemiological data in a `data.frame` or a
`tibble` to facilitate and strengthen data pipelines. The resulting object is a
`linelist` object, which extends `data.frame` (or `tibble`) by providing three
types of features:

1. a **tagging system** to identify key data, enabling access to these data using
   their tags rather than actual names, which may change over time and across
   datasets

2. **validation** of the tagged variables (making sure they are present and of the
   right type/class)

3. **safeguards** against accidental losses of tagged variables in common data
   handling operations
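As a simplified illustration of the tagging idea only (this is not how linelist is implemented), tags can be thought of as a mapping from a tag to a column name, stored alongside the data. The helpers `tag_columns()` and `get_tagged()` below are hypothetical, and storing the mapping in a `"tags"` attribute is an assumption made for the sketch:

```{r}
# Hypothetical helper (not part of linelist): store tags as a named list
# mapping each tag to a column name, in a "tags" attribute
tag_columns <- function(data, ...) {
  tag_map <- list(...)
  # every tagged column must exist in the data
  unknown <- setdiff(unlist(tag_map), names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  attr(data, "tags") <- tag_map
  data
}

# Hypothetical helper: retrieve a column by its tag instead of its name
get_tagged <- function(data, tag) {
  col <- attr(data, "tags")[[tag]]
  if (is.null(col)) stop("No column tagged '", tag, "'")
  data[[col]]
}

cases <- data.frame(
  dt_onset = as.Date(c("2015-05-11", "2015-05-12", "2015-05-12")),
  sex = c("M", "F", "M")
)
cases <- tag_columns(cases, date_onset = "dt_onset", gender = "sex")

# Access by tag: the calling code does not need to know the column name
get_tagged(cases, "gender")
```

Downstream code written against `"gender"` keeps working when the underlying column is named differently in another dataset, provided the tag is set accordingly.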

The short example below illustrates these different features. See the
[Documentation](#documentation) section for more in-depth examples and details
about `linelist` objects.

```{r}
# load packages and a dataset for the example
# -------------------------------------------
library(linelist)
library(dplyr)

dataset <- outbreaks::mers_korea_2015$linelist
head(dataset)

# check known tagged variables
# ----------------------------
tags_names()

# build a linelist
# ----------------
x <- dataset %>%
  as_tibble() %>%
  make_linelist(
    date_onset = "dt_onset", # date of onset
    date_reporting = "dt_report", # date of reporting
    occupation = "age" # mistake
  )
x
tags(x) # check available tags
```

`validate_linelist()` will error if one of your tagged columns does not have the
correct type:

```{r, error = TRUE}
# validation of tagged variables
# ------------------------------
## (this flags a likely mistake: occupation should not be an integer)
validate_linelist(x)
```
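Conceptually, this kind of validation compares the class of each tagged column against a list of acceptable classes for its tag. The sketch below illustrates that idea in base R; the `allowed_types` list and the `check_tag_types()` helper are hypothetical and do not reproduce linelist's own defaults or API:

```{r}
# Hypothetical reference list of acceptable classes per tag (illustrative;
# not linelist's own defaults)
allowed_types <- list(
  date_onset = c("Date", "POSIXct"),
  occupation = c("character", "factor")
)

# Hypothetical helper: stop with a message naming every tagged column whose
# class is not in the reference list; return TRUE invisibly otherwise
check_tag_types <- function(data, tag_map, types = allowed_types) {
  problems <- character(0)
  for (tag in names(tag_map)) {
    values <- data[[tag_map[[tag]]]]
    ok_classes <- types[[tag]]
    if (!is.null(ok_classes) && !inherits(values, ok_classes)) {
      problems <- c(problems, sprintf(
        "tag '%s' (column '%s') has class '%s'",
        tag, tag_map[[tag]], class(values)[1]
      ))
    }
  }
  if (length(problems) > 0) {
    stop("Invalid tagged variable(s): ", paste(problems, collapse = "; "))
  }
  invisible(TRUE)
}

toy <- data.frame(dt_onset = as.Date("2015-05-11"), age = 34L)

# Passes: the onset date is a Date
check_tag_types(toy, list(date_onset = "dt_onset"))

# Fails: tagging the integer 'age' column as 'occupation' mirrors the
# mistake in the example above
try(check_tag_types(toy, list(date_onset = "dt_onset", occupation = "age")))
```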

```{r}
# change tags: fix mistakes, add new ones
# ---------------------------------------
x <- x %>%
  set_tags(
    occupation = NULL, # tag removal
    gender = "sex", # new tag
    outcome = "outcome"
  )

# safeguards against actions losing tags
# --------------------------------------
## attempting to remove geographical info but removing dates by mistake
x_no_geo <- x %>%
  select(-(5:8))
```

For stronger pipelines, you can even trigger errors upon loss:

```{r error = TRUE}
lost_tags_action("error")

x_no_geo <- x %>%
  select(-(5:8))

x_no_geo <- x %>%
  select(-(5:7))

## to revert to the default behaviour (warning upon loss)
lost_tags_action()
```
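The safeguard boils down to checking, after a data-handling step, whether every tagged variable is still present, and then warning, erroring, or staying silent. A base-R sketch of that check follows; the `check_lost_tags()` helper and its `action` values are hypothetical stand-ins chosen to echo `lost_tags_action()`, not linelist's implementation:

```{r}
# Hypothetical helper (not linelist's implementation): report tags whose
# column is no longer present after a data-handling step
check_lost_tags <- function(data, tag_map, action = c("warning", "error", "none")) {
  action <- match.arg(action)
  lost <- names(tag_map)[!unlist(tag_map) %in% names(data)]
  if (length(lost) > 0 && action != "none") {
    msg <- paste("Tagged variable(s) lost:", paste(lost, collapse = ", "))
    if (action == "error") {
      stop(msg, call. = FALSE)
    } else {
      warning(msg, call. = FALSE)
    }
  }
  invisible(lost)
}

toy_tags <- list(date_onset = "dt_onset", gender = "sex")
toy_ll <- data.frame(
  id = 1:2,
  dt_onset = as.Date(c("2015-05-11", "2015-05-12")),
  sex = c("M", "F")
)

# Dropping column 2 removes the variable tagged as 'date_onset'
reduced <- toy_ll[, -2]
check_lost_tags(reduced, toy_tags)                         # warns
try(check_lost_tags(reduced, toy_tags, action = "error"))  # errors
```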

Tagged variables can also be accessed directly by their tags:

```{r}
x_no_geo %>%
  select(has_tag(c("date_onset", "outcome")))

x_no_geo %>%
  tags_df()
```
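In the output above, `tags_df()` keeps the tagged columns and names them after their tags, which is why the incidence2 example below can refer to `"date_onset"` or `"gender"`. A rough base-R approximation of that idea, using a hypothetical `tagged_df()` helper rather than linelist's own code:

```{r}
# Hypothetical helper (not linelist's implementation): keep only the tagged
# columns and rename them after their tags
tagged_df <- function(data, tag_map) {
  out <- data[, unname(unlist(tag_map)), drop = FALSE]
  names(out) <- names(tag_map)
  out
}

toy_ll <- data.frame(
  id = 1:3,
  sex = c("M", "F", "M"),
  outcome = c("Alive", "Dead", "Alive")
)
tagged_df(toy_ll, list(gender = "sex", outcome = "outcome"))
```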

linelist can also be connected to the incidence2 package for pipelines focused
on aggregated count data:

```{r, fig.width=8, fig.height=6, fig.alt="Epicurves (daily incidence) by sex and outcome via the incidence2 R package."}
library(incidence2)

x_no_geo %>%
  tags_df() %>%
  incidence("date_onset", groups = c("gender", "outcome")) %>%
  plot(
    fill = "outcome",
    angle = 45,
    nrow = 2,
    border_colour = "white",
    legend = "bottom"
  )
```
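Incidence objects summarise a line list as case counts per date and group. The aggregation step alone can be sketched in base R with `aggregate()`; this is only an illustration of the idea, not incidence2's implementation, and the toy data below are made up:

```{r}
# Toy line list (illustrative values, not the MERS data)
toy_cases <- data.frame(
  date_onset = as.Date(c("2015-05-11", "2015-05-11", "2015-05-12",
                         "2015-05-12", "2015-05-12")),
  gender = c("M", "F", "M", "M", "M"),
  outcome = c("Alive", "Alive", "Dead", "Alive", "Alive")
)

# Count cases (one row = one case) per onset date, gender and outcome
counts <- aggregate(
  list(count = rep(1L, nrow(toy_cases))),
  by = toy_cases[c("date_onset", "gender", "outcome")],
  FUN = sum
)
counts
```

Only combinations that actually occur appear in `counts`; a dedicated package such as incidence2 additionally handles date intervals and plotting.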

## Documentation

More detailed documentation can be found at:
https://epiverse-trace.github.io/linelist/

In particular:

* [A general introduction to linelist](https://epiverse-trace.github.io/linelist/articles/linelist.html)

* [The reference manual](https://epiverse-trace.github.io/linelist/reference/index.html)

## Getting help

To ask questions or give us some feedback, please use the GitHub
[issues](https://github.com/epiverse-trace/linelist/issues) system.

## Data privacy

Case line lists may contain personally identifiable information (PII). While
linelist provides a way to store this data in R, it does not currently provide
tools for data anonymization. The user is responsible for respecting individual
privacy and ensuring PII is handled with the required level of confidentiality,
in compliance with applicable laws and regulations for storing and sharing PII.

Note that PII is rarely needed for common analytics tasks, so in many
instances it may be advisable to remove PII from the data before sharing them
with analytics teams.
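A minimal base-R sketch of removing direct identifiers before sharing is shown below. The column names and the `pii_cols` vector are hypothetical; which columns count as PII depends on your data and on the applicable regulations. Dropping direct identifiers alone does not make data anonymous, since combinations of the remaining variables may still identify individuals.

```{r}
# Hypothetical line list containing direct identifiers
raw_ll <- data.frame(
  id = 1:2,
  name = c("A. Person", "B. Person"),
  phone = c("555-0100", "555-0101"),
  date_onset = as.Date(c("2015-05-11", "2015-05-12")),
  outcome = c("Alive", "Dead")
)

# Columns assumed to hold PII in this example; adapt to your own data
pii_cols <- c("name", "phone")
shareable <- raw_ll[, setdiff(names(raw_ll), pii_cols), drop = FALSE]
names(shareable)
```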

## Development

### Lifecycle

This package is currently *stable*, as defined by the [RECON software
lifecycle](https://www.reconverse.org/lifecycle.html). This means that the
interface is not meant to change in the future and this package can be used as a
dependency in other packages.

### Contributions

Contributions are welcome via [pull requests](https://github.com/epiverse-trace/linelist/pulls).

### Code of Conduct

Please note that the linelist project is released with a
[Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.

### Notes

This package is a reboot of the RECON package
[linelist](https://github.com/reconhub/linelist). Unlike its predecessor, the
new package focuses on the implementation of a `linelist` class. The data
cleaning features of the original package will eventually be re-implemented for
`linelist` objects, albeit likely in a separate package.

Owner

  • Name: Epiverse-TRACE
  • Login: epiverse-trace
  • Kind: organization

Citation (CITATION.cff)

# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "linelist" in publications use:'
type: software
license: MIT
title: 'linelist: Tagging and Validating Epidemiological Data'
version: 2.0.1.9000
doi: 10.5281/zenodo.6532786
identifiers:
- type: doi
  value: 10.32614/CRAN.package.linelist
abstract: Provides tools to help storing and handling case line list data. The 'linelist'
  class adds a tagging system to classical 'data.frame' objects to identify key epidemiological
  data such as dates of symptom onset, epidemiological case definition, age, gender
  or disease outcome. Once tagged, these variables can be seamlessly used in downstream
  analyses, making data pipelines more robust and reliable.
authors:
- family-names: Gruson
  given-names: Hugo
  orcid: https://orcid.org/0000-0002-4094-1476
- family-names: Jombart
  given-names: Thibaut
- family-names: Hartgerink
  given-names: Chris
  email: chris@data.org
  orcid: https://orcid.org/0000-0003-1050-6809
preferred-citation:
  type: manual
  title: 'linelist: Tagging and Validating Epidemiological Data'
  authors:
  - family-names: Gruson
    given-names: Hugo
    orcid: https://orcid.org/0000-0002-4094-1476
  - family-names: Jombart
    given-names: Thibaut
  year: '2025'
  doi: 10.5281/zenodo.6532786
  url: https://epiverse-trace.github.io/linelist/
repository: https://CRAN.R-project.org/package=linelist
repository-code: https://github.com/epiverse-trace/linelist
url: https://epiverse-trace.github.io/linelist/
contact:
- family-names: Hartgerink
  given-names: Chris
  email: chris@data.org
  orcid: https://orcid.org/0000-0003-1050-6809
keywords:
- data
- data-structures
- epidemiology
- epiverse
- outbreaks
- r
- r-package
- sdg-3
- structured-data
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2025'
  version: '>= 4.1.0'
- type: software
  title: checkmate
  abstract: 'checkmate: Fast and Versatile Argument Checks'
  notes: Imports
  url: https://mllg.github.io/checkmate/
  repository: https://CRAN.R-project.org/package=checkmate
  authors:
  - family-names: Lang
    given-names: Michel
    email: michellang@gmail.com
    orcid: https://orcid.org/0000-0001-9754-0393
  year: '2025'
  doi: 10.32614/CRAN.package.checkmate
- type: software
  title: rlang
  abstract: 'rlang: Functions for Base Types and Core R and ''Tidyverse'' Features'
  notes: Imports
  url: https://rlang.r-lib.org
  repository: https://CRAN.R-project.org/package=rlang
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.rlang
- type: software
  title: tidyselect
  abstract: 'tidyselect: Select from a Set of Strings'
  notes: Imports
  url: https://tidyselect.r-lib.org
  repository: https://CRAN.R-project.org/package=tidyselect
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.tidyselect
- type: software
  title: callr
  abstract: 'callr: Call R from R'
  notes: Suggests
  url: https://callr.r-lib.org
  repository: https://CRAN.R-project.org/package=callr
  authors:
  - family-names: Csárdi
    given-names: Gábor
    email: csardi.gabor@gmail.com
    orcid: https://orcid.org/0000-0001-7098-9676
  - family-names: Chang
    given-names: Winston
  year: '2025'
  doi: 10.32614/CRAN.package.callr
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Suggests
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2025'
  doi: 10.32614/CRAN.package.dplyr
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2025'
  doi: 10.32614/CRAN.package.knitr
- type: software
  title: outbreaks
  abstract: 'outbreaks: A Collection of Disease Outbreak Data'
  notes: Suggests
  url: https://github.com/reconhub/outbreaks
  repository: https://CRAN.R-project.org/package=outbreaks
  authors:
  - family-names: Jombart
    given-names: Thibaut
    email: thibaut.jombart@gmail.com
  - family-names: Frost
    given-names: Simon
  - family-names: Nouvellet
    given-names: Pierre
  - family-names: Campbell
    given-names: Finlay
    email: finlaycampbell93@gmail.com
  - family-names: Sudre
    given-names: Bertrand
    email: bertrand.sudre@edc.europa.eu
  year: '2025'
  doi: 10.32614/CRAN.package.outbreaks
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2025'
  doi: 10.32614/CRAN.package.rmarkdown
- type: software
  title: spelling
  abstract: 'spelling: Tools for Spell Checking in R'
  notes: Suggests
  url: https://ropensci.r-universe.dev/spelling
  repository: https://CRAN.R-project.org/package=spelling
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroenooms@gmail.com
    orcid: https://orcid.org/0000-0002-4035-0289
  - family-names: Hester
    given-names: Jim
    email: james.hester@rstudio.com
  year: '2025'
  doi: 10.32614/CRAN.package.spelling
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.testthat
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Suggests
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2025'
  doi: 10.32614/CRAN.package.tibble

GitHub Events

Total
  • Create event: 24
  • Release event: 2
  • Issues event: 10
  • Watch event: 3
  • Delete event: 21
  • Issue comment event: 14
  • Push event: 49
  • Pull request review event: 4
  • Pull request event: 36
Last Year
  • Create event: 24
  • Release event: 2
  • Issues event: 10
  • Watch event: 3
  • Delete event: 21
  • Issue comment event: 14
  • Push event: 49
  • Pull request review event: 4
  • Pull request event: 36

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 198
  • Total Committers: 5
  • Avg Commits per committer: 39.6
  • Development Distribution Score (DDS): 0.182
Top Committers
  • Thibaut Jombart (t****t@g****m): 162 commits
  • Hugo Gruson (B****o@u****m): 25 commits
  • GitHub Action (a****n@g****m): 7 commits
  • Pietro Monticone (3****e@u****m): 2 commits
  • Anna Carnegie (9****e@u****m): 2 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 69
  • Total pull requests: 128
  • Average time to close issues: 4 months
  • Average time to close pull requests: 13 days
  • Total issue authors: 9
  • Total pull request authors: 9
  • Average comments per issue: 1.04
  • Average comments per pull request: 0.17
  • Merged pull requests: 111
  • Bot issues: 0
  • Bot pull requests: 12
Past Year
  • Issues: 8
  • Pull requests: 42
  • Average time to close issues: 5 months
  • Average time to close pull requests: 16 days
  • Issue authors: 2
  • Pull request authors: 4
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.07
  • Merged pull requests: 32
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
  • thibautjombart (28)
  • Bisaloo (20)
  • avallecam (6)
  • joshwlambert (3)
  • TimTaylor (2)
  • CarmenTamayo (2)
  • chartgerink (1)
  • aspina7 (1)
  • sbfnk (1)
Pull Request Authors
  • Bisaloo (124)
  • github-actions[bot] (17)
  • chartgerink (6)
  • annacarnegie (3)
  • epiverse-trace-bot (3)
  • Karim-Mane (2)
  • pitmonticone (1)
  • thibautjombart (1)
  • TimTaylor (1)
Top Labels
Issue Labels
discussion (5) enhancement (4) documentation (4) bug (3) help wanted (3) good first issue (3) wontfix (2) question (1)
Pull Request Labels
wontfix (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 898 last-month
  • Total docker downloads: 41,971
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 8
  • Total maintainers: 1
cran.r-project.org: linelist

Tagging and Validating Epidemiological Data

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 898 Last month
  • Docker Downloads: 41,971
Rankings
Docker downloads count: 0.6%
Forks count: 12.2%
Dependent repos count: 16.4%
Average: 17.1%
Downloads: 19.4%
Stargazers count: 25.5%
Dependent packages count: 28.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • checkmate * imports
  • dplyr * imports
  • callr * suggests
  • covr * suggests
  • knitr * suggests
  • magrittr * suggests
  • outbreaks * suggests
  • remotes * suggests
  • rmarkdown * suggests
  • testthat * suggests
  • tibble * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action 4.1.4 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/render_readme.yml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/update-citation-cff.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite