Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.9%) to scientific vocabulary
Keywords
data
data-structures
epidemiology
epiverse
outbreaks
r
r-package
sdg-3
structured-data
Last synced: 6 months ago
Repository
R package for handling linelist data
Basic Info
- Host: GitHub
- Owner: epiverse-trace
- License: other
- Language: R
- Default Branch: main
- Homepage: https://epiverse-trace.github.io/linelist/
- Size: 9.8 MB
Statistics
- Stars: 10
- Watchers: 7
- Forks: 4
- Open Issues: 6
- Releases: 8
Created almost 4 years ago · Last pushed 8 months ago
Metadata Files
Readme
Changelog
License
Citation
README.Rmd
---
output: github_document
---
```{r readmesetup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# **linelist**: Tagging and Validating Epidemiological Data
[Digital Public Good](https://www.digitalpublicgoods.net/r/linelist)
[License: MIT](https://opensource.org/license/mit)
[CRAN check results](https://cran.r-project.org/web/checks/check_results_linelist.html)
[GitHub Actions](https://github.com/epiverse-trace/linelist/actions)
[Codecov](https://app.codecov.io/gh/epiverse-trace/linelist)
[Lifecycle: maturing](https://www.reconverse.org/lifecycle.html#maturing)
[CRAN package](https://cran.r-project.org/package=linelist)
[DOI: 10.5281/zenodo.6532786](https://doi.org/10.5281/zenodo.6532786)
*linelist* provides a safe entry point to the *Epiverse* software ecosystem,
adding a foundational layer through *tagging*, *validation*, and *safeguarding*
epidemiological data, to help make data pipelines more straightforward and
robust.
## Installation
### Stable version
Our stable versions are released on CRAN, and can be installed using:
```{r, eval=FALSE}
install.packages("linelist")
```
::: {.pkgdown-devel}
### Development version
The development version of linelist can be installed from
[GitHub](https://github.com/) with:
```{r, eval=FALSE}
if (!requireNamespace("pak", quietly = TRUE)) {
  install.packages("pak")
}
pak::pak("epiverse-trace/linelist")
```
:::
## Usage
```{r}
#| fig.alt: "Graphical summary of the linelist R package, with emphasis on these 4 key features: 1. Tag key epi variables, 2. Validate tagged data, 3. Safeguards vs accidental loss / alteration, 4. Robust data for stronger pipelines"
#| out.width: "60%"
knitr::include_graphics("man/figures/linelist_infographics.png")
```
linelist works by tagging key epidemiological data in a `data.frame` or a
`tibble` to facilitate and strengthen data pipelines. The resulting object is a
`linelist` object, which extends `data.frame` (or `tibble`) by providing three
types of features:
1. a **tagging system** to identify key data, enabling access to these data using
their tags rather than actual names, which may change over time and across
datasets
2. **validation** of the tagged variables (making sure they are present and of the
right type/class)
3. **safeguards** against accidental losses of tagged variables in common data
handling operations
The short example below illustrates these different features. See the
[Documentation](#documentation) section for more in-depth examples and details
about `linelist` objects.
```{r}
# load packages and a dataset for the example
# -------------------------------------------
library(linelist)
library(dplyr)
dataset <- outbreaks::mers_korea_2015$linelist
head(dataset)
# check known tagged variables
# ----------------------------
tags_names()
# build a linelist
# ----------------
x <- dataset %>%
  tibble() %>%
  make_linelist(
    date_onset = "dt_onset", # date of onset
    date_reporting = "dt_report", # date of reporting
    occupation = "age" # mistake
  )
x
tags(x) # check available tags
```
`validate_linelist()` will error if one of your tagged columns doesn't have the
correct type:
```{r, error = TRUE}
# validation of tagged variables
# ------------------------------
## (this flags a likely mistake: occupation should not be an integer)
validate_linelist(x)
```
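As a rough illustration of what such a type check involves, the base R sketch
below compares the class of each tagged column against a set of allowed
classes. The helper `check_tag_types()`, the `toy` data and the `toy_allowed`
list are hypothetical and only mimic the idea; this is not linelist's
implementation, and the classes it actually accepts may differ.

```{r}
# Hypothetical helper: returns the tags whose column is not of an allowed class
check_tag_types <- function(df, tags, allowed) {
  bad <- vapply(
    names(tags),
    function(tag) !inherits(df[[tags[[tag]]]], allowed[[tag]]),
    logical(1)
  )
  names(tags)[bad]
}

# Toy data, unrelated to the dataset used above
toy <- data.frame(
  dt_onset = as.Date(c("2015-05-11", "2015-05-18")),
  age = c(68L, 63L)
)
toy_tags <- list(date_onset = "dt_onset", occupation = "age")

# Allowed classes per tag: assumed values, for illustration only
toy_allowed <- list(
  date_onset = c("Date", "integer", "numeric"),
  occupation = c("character", "factor")
)

# Tags failing the check (here, the integer column tagged as occupation)
check_tag_types(toy, toy_tags, toy_allowed)
```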
```{r}
# change tags: fix mistakes, add new ones
# ---------------------------------------
x <- x %>%
  set_tags(
    occupation = NULL, # tag removal
    gender = "sex", # new tag
    outcome = "outcome"
  )
# safeguards against actions losing tags
# --------------------------------------
## attempting to remove geographical info but removing dates by mistake
x_no_geo <- x %>%
  select(-(5:8))
```
For stronger pipelines, you can even trigger errors upon loss:
```{r error = TRUE}
lost_tags_action("error")
x_no_geo <- x %>%
  select(-(5:8))
x_no_geo <- x %>%
  select(-(5:7))
## to revert to default behaviour (warning upon loss)
lost_tags_action()
```
Alternatively, content can be accessed by tags:
```{r}
x_no_geo %>%
  select(has_tag(c("date_onset", "outcome")))
x_no_geo %>%
  tags_df()
```
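To make the idea of tag-based access more concrete, here is a conceptual
sketch in base R. It keeps a tag-to-column map next to a toy data frame and
uses it both to fetch columns by tag and to detect tags whose columns have
gone missing. The names `get_tagged()`, `find_lost_tags()`, `toy` and
`toy_tags` are made up for this illustration; linelist's own internals are
likely organised differently.

```{r}
# Conceptual sketch only: a tag -> column-name map kept next to the data
toy <- data.frame(dt_onset = as.Date("2015-05-11"), sex = "M", city = "Seoul")
toy_tags <- c(date_onset = "dt_onset", gender = "sex")

# Hypothetical accessor: fetch tagged columns, renamed after their tags
get_tagged <- function(df, tag_map) {
  out <- df[, unname(tag_map), drop = FALSE]
  names(out) <- names(tag_map)
  out
}

names(get_tagged(toy, toy_tags))

# Hypothetical check: which tags point to columns that are no longer present?
find_lost_tags <- function(df, tag_map) {
  names(tag_map)[!tag_map %in% names(df)]
}

# Dropping the onset date column leaves the date_onset tag dangling
find_lost_tags(toy[, c("sex", "city")], toy_tags)
```

Because downstream code refers to `date_onset` and `gender` rather than to
`dt_onset` and `sex`, it would keep working if the underlying column names
changed, as long as the map is updated.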
linelist can also be connected to the incidence2 package for pipelines focused
on aggregated count data:
```{r, fig.width=8, fig.height=6, fig.alt="Epicurves (daily incidence) by sex and outcome via the incidence2 R package."}
library(incidence2)
x_no_geo %>%
  tags_df() %>%
  incidence("date_onset", groups = c("gender", "outcome")) %>%
  plot(
    fill = "outcome",
    angle = 45,
    nrow = 2,
    border_colour = "white",
    legend = "bottom"
  )
```
## Documentation
More detailed documentation can be found at:
https://epiverse-trace.github.io/linelist/
In particular:
* [A general introduction to linelist](https://epiverse-trace.github.io/linelist/articles/linelist.html)
* [The reference manual](https://epiverse-trace.github.io/linelist/reference/index.html)
## Getting help
To ask questions or give us some feedback, please use the GitHub
[issues](https://github.com/epiverse-trace/linelist/issues) system.
## Data privacy
Case line lists may contain personally identifiable information (PII). While
linelist provides a way to store this data in R, it does not currently provide
tools for data anonymization. The user is responsible for respecting individual
privacy and ensuring PII is handled with the required level of confidentiality,
in compliance with applicable laws and regulations for storing and sharing PII.
Note that PII is rarely needed for common analytics tasks, so it is often
advisable to remove PII from the data before sharing them with analytics
teams.
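
A minimal sketch of dropping PII columns before sharing, using base R only.
The `cases` data frame and its column names are invented for this example,
and which columns count as PII is an assumption that has to be reviewed for
each dataset. Removing columns is not a full anonymization procedure, since
the remaining fields may still allow re-identification.

```{r}
# Toy line list; all names and values are made up for illustration
cases <- data.frame(
  id = 1:3,
  full_name = c("Patient A", "Patient B", "Patient C"),
  phone = c("555-0101", "555-0102", "555-0103"),
  dt_onset = as.Date(c("2015-05-11", "2015-05-18", "2015-05-20")),
  outcome = c("Alive", "Alive", "Dead")
)

# Columns assumed to hold PII in this toy dataset
pii_cols <- c("full_name", "phone")

# Keep every column except the PII ones
cases_shareable <- cases[, setdiff(names(cases), pii_cols), drop = FALSE]
names(cases_shareable)
```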
## Development
### Lifecycle
This package is currently *stable*, as defined by the [RECON software
lifecycle](https://www.reconverse.org/lifecycle.html). This means that the
interface is not meant to change in the future and this package can be used as a
dependency in other packages.
### Contributions
Contributions are welcome via [pull requests](https://github.com/epiverse-trace/linelist/pulls).
### Code of Conduct
Please note that the linelist project is released with a
[Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
### Notes
This package is a reboot of the RECON package
[linelist](https://github.com/reconhub/linelist). Unlike its predecessor, the
new package focuses on the implementation of a `linelist` class. The data
cleaning features of the original package will eventually be re-implemented for
`linelist` objects, albeit likely in a separate package.
Owner
- Name: Epiverse-TRACE
- Login: epiverse-trace
- Kind: organization
- Website: https://epiverse.org
- Repositories: 17
- Profile: https://github.com/epiverse-trace
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "linelist" in publications use:'
type: software
license: MIT
title: 'linelist: Tagging and Validating Epidemiological Data'
version: 2.0.1.9000
doi: 10.5281/zenodo.6532786
identifiers:
- type: doi
  value: 10.32614/CRAN.package.linelist
abstract: Provides tools to help storing and handling case line list data. The 'linelist'
  class adds a tagging system to classical 'data.frame' objects to identify key epidemiological
  data such as dates of symptom onset, epidemiological case definition, age, gender
  or disease outcome. Once tagged, these variables can be seamlessly used in downstream
  analyses, making data pipelines more robust and reliable.
authors:
- family-names: Gruson
  given-names: Hugo
  orcid: https://orcid.org/0000-0002-4094-1476
- family-names: Jombart
  given-names: Thibaut
- family-names: Hartgerink
  given-names: Chris
  email: chris@data.org
  orcid: https://orcid.org/0000-0003-1050-6809
preferred-citation:
  type: manual
  title: 'linelist: Tagging and Validating Epidemiological Data'
  authors:
  - family-names: Gruson
    given-names: Hugo
    orcid: https://orcid.org/0000-0002-4094-1476
  - family-names: Jombart
    given-names: Thibaut
  year: '2025'
  doi: 10.5281/zenodo.6532786
  url: https://epiverse-trace.github.io/linelist/
repository: https://CRAN.R-project.org/package=linelist
repository-code: https://github.com/epiverse-trace/linelist
url: https://epiverse-trace.github.io/linelist/
contact:
- family-names: Hartgerink
  given-names: Chris
  email: chris@data.org
  orcid: https://orcid.org/0000-0003-1050-6809
keywords:
- data
- data-structures
- epidemiology
- epiverse
- outbreaks
- r
- r-package
- sdg-3
- structured-data
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2025'
  version: '>= 4.1.0'
- type: software
  title: checkmate
  abstract: 'checkmate: Fast and Versatile Argument Checks'
  notes: Imports
  url: https://mllg.github.io/checkmate/
  repository: https://CRAN.R-project.org/package=checkmate
  authors:
  - family-names: Lang
    given-names: Michel
    email: michellang@gmail.com
    orcid: https://orcid.org/0000-0001-9754-0393
  year: '2025'
  doi: 10.32614/CRAN.package.checkmate
- type: software
  title: rlang
  abstract: 'rlang: Functions for Base Types and Core R and ''Tidyverse'' Features'
  notes: Imports
  url: https://rlang.r-lib.org
  repository: https://CRAN.R-project.org/package=rlang
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.rlang
- type: software
  title: tidyselect
  abstract: 'tidyselect: Select from a Set of Strings'
  notes: Imports
  url: https://tidyselect.r-lib.org
  repository: https://CRAN.R-project.org/package=tidyselect
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.tidyselect
- type: software
  title: callr
  abstract: 'callr: Call R from R'
  notes: Suggests
  url: https://callr.r-lib.org
  repository: https://CRAN.R-project.org/package=callr
  authors:
  - family-names: Csárdi
    given-names: Gábor
    email: csardi.gabor@gmail.com
    orcid: https://orcid.org/0000-0001-7098-9676
  - family-names: Chang
    given-names: Winston
  year: '2025'
  doi: 10.32614/CRAN.package.callr
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Suggests
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2025'
  doi: 10.32614/CRAN.package.dplyr
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2025'
  doi: 10.32614/CRAN.package.knitr
- type: software
  title: outbreaks
  abstract: 'outbreaks: A Collection of Disease Outbreak Data'
  notes: Suggests
  url: https://github.com/reconhub/outbreaks
  repository: https://CRAN.R-project.org/package=outbreaks
  authors:
  - family-names: Jombart
    given-names: Thibaut
    email: thibaut.jombart@gmail.com
  - family-names: Frost
    given-names: Simon
  - family-names: Nouvellet
    given-names: Pierre
  - family-names: Campbell
    given-names: Finlay
    email: finlaycampbell93@gmail.com
  - family-names: Sudre
    given-names: Bertrand
    email: bertrand.sudre@edc.europa.eu
  year: '2025'
  doi: 10.32614/CRAN.package.outbreaks
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2025'
  doi: 10.32614/CRAN.package.rmarkdown
- type: software
  title: spelling
  abstract: 'spelling: Tools for Spell Checking in R'
  notes: Suggests
  url: https://ropensci.r-universe.dev/spelling
  repository: https://CRAN.R-project.org/package=spelling
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroenooms@gmail.com
    orcid: https://orcid.org/0000-0002-4035-0289
  - family-names: Hester
    given-names: Jim
    email: james.hester@rstudio.com
  year: '2025'
  doi: 10.32614/CRAN.package.spelling
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2025'
  doi: 10.32614/CRAN.package.testthat
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Suggests
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2025'
  doi: 10.32614/CRAN.package.tibble
GitHub Events
Total
- Create event: 24
- Release event: 2
- Issues event: 10
- Watch event: 3
- Delete event: 21
- Issue comment event: 14
- Push event: 49
- Pull request review event: 4
- Pull request event: 36
Last Year
- Create event: 24
- Release event: 2
- Issues event: 10
- Watch event: 3
- Delete event: 21
- Issue comment event: 14
- Push event: 49
- Pull request review event: 4
- Pull request event: 36
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 198
- Total Committers: 5
- Avg Commits per committer: 39.6
- Development Distribution Score (DDS): 0.182
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thibaut Jombart | t****t@g****m | 162 |
| Hugo Gruson | B****o@u****m | 25 |
| GitHub Action | a****n@g****m | 7 |
| Pietro Monticone | 3****e@u****m | 2 |
| Anna Carnegie | 9****e@u****m | 2 |
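
The summary figures above can be reproduced from the Top Committers table. The
short R sketch below does so under the assumption that the Development
Distribution Score is defined as one minus the share of commits made by the
most active committer; that definition is not stated on this page, but it
matches the reported value.

```r
# Commit counts copied from the Top Committers table
commits <- c(
  "Thibaut Jombart" = 162,
  "Hugo Gruson" = 25,
  "GitHub Action" = 7,
  "Pietro Monticone" = 2,
  "Anna Carnegie" = 2
)

total_commits <- sum(commits)                         # 198
avg_per_committer <- total_commits / length(commits)  # 39.6

# Assumed definition: DDS = 1 - (share of commits by the top committer)
dds <- 1 - max(commits) / total_commits
round(dds, 3)                                         # 0.182
```

A value this close to 0 indicates that most commits recorded at the time of
that sync came from a single committer.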
Committer Domains (Top 20 + Academic)
github.com: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 69
- Total pull requests: 128
- Average time to close issues: 4 months
- Average time to close pull requests: 13 days
- Total issue authors: 9
- Total pull request authors: 9
- Average comments per issue: 1.04
- Average comments per pull request: 0.17
- Merged pull requests: 111
- Bot issues: 0
- Bot pull requests: 12
Past Year
- Issues: 8
- Pull requests: 42
- Average time to close issues: 5 months
- Average time to close pull requests: 16 days
- Issue authors: 2
- Pull request authors: 4
- Average comments per issue: 0.75
- Average comments per pull request: 0.07
- Merged pull requests: 32
- Bot issues: 0
- Bot pull requests: 9
Top Authors
Issue Authors
- thibautjombart (28)
- Bisaloo (20)
- avallecam (6)
- joshwlambert (3)
- TimTaylor (2)
- CarmenTamayo (2)
- chartgerink (1)
- aspina7 (1)
- sbfnk (1)
Pull Request Authors
- Bisaloo (124)
- github-actions[bot] (17)
- chartgerink (6)
- annacarnegie (3)
- epiverse-trace-bot (3)
- Karim-Mane (2)
- pitmonticone (1)
- thibautjombart (1)
- TimTaylor (1)
Top Labels
Issue Labels
discussion (5)
enhancement (4)
documentation (4)
bug (3)
help wanted (3)
good first issue (3)
wontfix (2)
question (1)
Pull Request Labels
wontfix (1)
Packages
- Total packages: 1
- Total downloads: 898 last month (cran)
- Total docker downloads: 41,971
- Total dependent packages: 0
- Total dependent repositories: 3
- Total versions: 8
- Total maintainers: 1
cran.r-project.org: linelist
Tagging and Validating Epidemiological Data
- Homepage: https://epiverse-trace.github.io/linelist/
- Documentation: http://cran.r-project.org/web/packages/linelist/linelist.pdf
- License: MIT + file LICENSE
- Latest release: 2.0.1 (published 8 months ago)
Rankings
Docker downloads count: 0.6%
Forks count: 12.2%
Dependent repos count: 16.4%
Average: 17.1%
Downloads: 19.4%
Stargazers count: 25.5%
Dependent packages count: 28.6%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- checkmate * imports
- dplyr * imports
- callr * suggests
- covr * suggests
- knitr * suggests
- magrittr * suggests
- outbreaks * suggests
- remotes * suggests
- rmarkdown * suggests
- testthat * suggests
- tibble * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action 4.1.4 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/render_readme.yml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/update-citation-cff.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite