experDesign

experDesign: stratifying samples into batches with minimal bias - Published in JOSS (2021)

https://github.com/llrs/experdesign

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

batch cran experiment-design r r-package rstats

Scientific Fields

Sociology Social Sciences - 40% confidence

Repository

Design experiments distributed in several batches

Basic Info
Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 1
  • Open Issues: 10
  • Releases: 8
Created over 7 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Code of conduct Citation Codemeta

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# experDesign


[![CRAN status](https://www.r-pkg.org/badges/version/experDesign)](https://CRAN.R-project.org/package=experDesign)
[![CRAN checks](https://badges.cranchecks.info/worst/experDesign.svg)](https://cran.r-project.org/web/checks/check_results_experDesign.html)
[![R-CMD-check](https://github.com/llrs/experDesign/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/llrs/experDesign/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/llrs/experDesign/graph/badge.svg)](https://app.codecov.io/gh/llrs/experDesign)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![JOSS](https://joss.theoj.org/papers/10.21105/joss.03358/status.svg)](https://doi.org/10.21105/joss.03358) 
[![DOI](https://zenodo.org/badge/142569201.svg)](https://zenodo.org/badge/latestdoi/142569201)


The goal of experDesign is to help you distribute your samples into batches after they are collected but before the experiment is performed. 
It helps you check for common problems in the data and reduce or even prevent batch bias before performing an experiment, or measure it once the experiment is performed.
It provides four main functions:

* `check_data()`: Check if there are any problems with the data.
* `design()`: Randomize the samples according to their variables.
* `replicates()`: Select some samples for replicates and randomize the samples (highly recommended).
* `spatial()`: Randomize the samples on a spatial grid.

There are other helpers. 

## Installation

To install the latest version on [CRAN](https://CRAN.R-project.org/package=experDesign) use:

```r
install.packages("experDesign")
```


::: {.pkgdown-devel}
You can install the development version of experDesign from [GitHub](https://github.com/llrs/experDesign) with:

```r
# install.packages("devtools")
devtools::install_github("llrs/experDesign")
```
:::


## Example

We can use the survey dataset for the examples:

```{r show}
library("experDesign")
data(survey, package = "MASS") 
head(survey)
```

The dataset has numeric and categorical variables, as well as `NA` values.

### Checking initial data

We can check some issues from an experimental point of view via `check_data()`:

```{r check, warning=TRUE}
check_data(survey)
```

As the warnings show, there is a collection of problems.
In general, try to have at least 3 replicates for each condition and complete data for every variable.
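
The two rules of thumb above can also be checked by hand. The following is a minimal base R sketch on made-up toy data (the `pheno` data frame and its values are assumptions for illustration); it is not how `check_data()` is implemented, only a way to see what such checks look for.

```r
# Toy data standing in for real sample metadata (values are made up).
pheno <- data.frame(
  sex   = c("F", "F", "F", "M", "M", NA),
  smoke = c("no", "no", "no", "yes", "yes", "no"),
  age   = c(21, 34, NA, 45, 52, 38)
)

# Number of missing values in each variable.
missing_per_variable <- colSums(is.na(pheno))
missing_per_variable

# Number of samples in each combination of the categorical variables.
# table() drops rows with NA by default; as.data.frame() lists every
# combination of the observed levels, including empty ones.
counts <- as.data.frame(table(sex = pheno$sex, smoke = pheno$smoke))
counts

# Combinations that are present but have fewer than 3 samples.
few <- counts[counts$Freq > 0 & counts$Freq < 3, ]
few
```

Conditions listed in `few` have too few replicates to estimate variability within that condition.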

### Picking samples for each batch

Imagine that we can only work in groups of 70, and we want to randomize by sex (`Sex`), 
smoking status (`Smoke`), age (`Age`), and writing hand (`W.Hnd`).  
There are `r choose(237, 70)` combinations of samples per batch in 
this experiment. 
However, in some of those combinations all the right-handed students are in the same batch, making it impossible to compare the right-handed students with the others and draw conclusions from it. 
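
As a quick back-of-the-envelope check of these numbers, the base R sketch below computes how many batches are needed and how large they would be under an even split. The even split is only an illustration and an assumption here; `design()` may choose different batch sizes.

```r
n_samples  <- 237  # number of samples, as in choose(237, 70) above
batch_size <- 70   # maximum number of samples per batch

# Minimum number of batches needed to fit every sample.
n_batches <- ceiling(n_samples / batch_size)
n_batches

# Batch sizes if the samples are spread as evenly as possible.
sizes <- rep(n_samples %/% n_batches, n_batches)
extra <- n_samples %% n_batches
sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
sizes

# Number of ways to pick a single batch of 70 samples.
# The value is far too large to enumerate every option.
n_combinations <- choose(n_samples, batch_size)
n_combinations
```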

We could check all the combinations to select those that allow us to do this comparison.
But as this would take too long, with `experDesign` we can try to find the combination with the best design by comparing each combination with the original according to multiple statistics.

```{r design, fig.show='hold'}
# To reduce the variables used:
omit <- c("Wr.Hnd", "NW.Hnd", "Fold", "Pulse", "Clap", "Exer", "Height", "M.I")
(keep <- colnames(survey)[!colnames(survey) %in% omit])
head(survey[, keep])

# Set a seed for reproducibility
set.seed(87732135)
# Look for groups of at most 70 samples.
index <- design(pheno = survey, size_subset = 70, omit = omit, iterations = 100)
index
```
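
To give an intuition for this kind of search, here is a minimal base R sketch of a random search over batch assignments. It is not the algorithm or the statistics used by `design()`: the simulated data, the helper names `random_batches()` and `score()`, and the scoring rule are all assumptions made for illustration.

```r
set.seed(1)

# Simulated metadata (not the survey data): one numeric, one categorical variable.
n <- 100
pheno <- data.frame(
  age = round(rnorm(n, mean = 30, sd = 8)),
  sex = sample(c("F", "M"), n, replace = TRUE)
)
n_batches <- 4

# Hypothetical helper: random assignment to batches of (almost) equal size.
random_batches <- function(n, n_batches) {
  sample(rep(seq_len(n_batches), length.out = n))
}

# Hypothetical score: how far each batch is from the whole data set.
# It sums the deviation of the mean age (in SD units) and of the
# proportion of "F" across batches; lower is better.
score <- function(batch, pheno) {
  age_dev <- abs(tapply(pheno$age, batch, mean) - mean(pheno$age)) / sd(pheno$age)
  sex_dev <- abs(tapply(pheno$sex == "F", batch, mean) - mean(pheno$sex == "F"))
  sum(age_dev) + sum(sex_dev)
}

# Generate many random candidates and keep the one with the lowest score.
candidates <- replicate(200, random_batches(n, n_batches), simplify = FALSE)
scores <- vapply(candidates, score, numeric(1), pheno = pheno)
best <- candidates[[which.min(scores)]]

table(best)      # batch sizes of the selected assignment
min(scores)      # score of the selected assignment
summary(scores)  # spread of the scores across all random candidates
```

More candidates give the search more chances to find a balanced assignment, which is the role the `iterations` argument plays in the call above.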

We can then transform it into a vector to append to the file or to pass to a colleague with:

```{r batch_names}
head(batch_names(index))
# Or via inspect() to keep it in a matrix format:
head(inspect(index, survey[, keep]))
```
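
The sketch below shows, in base R only, what appending such a vector to the sample table can look like. It assumes the index is a named list of row positions with one element per batch; the toy `pheno` table and the hand-written `index` are made up for illustration.

```r
# Toy metadata and a hand-written index; both are made up for illustration.
pheno <- data.frame(id = paste0("s", 1:6), age = c(21, 34, 29, 45, 52, 38))
index <- list(SubSet1 = c(1, 4, 6), SubSet2 = c(2, 3, 5))

# Turn the list of row positions into one batch label per row.
batch <- character(nrow(pheno))
for (name in names(index)) {
  batch[index[[name]]] <- name
}
pheno$batch <- batch
pheno

# Save the annotated table to share it (written to a temporary file here).
out <- tempfile(fileext = ".csv")
write.csv(pheno, out, row.names = FALSE)
shared <- read.csv(out)
shared
```

Keeping the batch as a column of the sample table makes it easy to carry the assignment into later analyses, for example as a covariate.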

# Previous work

The CRAN Task View of [Experimental Design](https://CRAN.R-project.org/view=ExperimentalDesign) includes many packages relevant for designing an experiment before collecting data, but none of them helps to manage the samples once they are already collected.

Three packages allow distributing the samples into batches:

- The [OSAT](https://bioconductor.org/packages/release/bioc/html/OSAT.html) package handles categorical variables but not numeric data. It doesn't work with our data.

- The [minDiff](https://github.com/m-Py/minDiff) package, reported in [Stats.SE](https://stats.stackexchange.com/a/326015/105234), handles both numeric and categorical data, but it can only optimize for two nominal criteria. It doesn't work for our data.

- The [Omixer](https://bioconductor.org/packages/Omixer/) package handles both numeric and categorical data (converting categorical variables to numeric). It tests both the same way, using either Pearson's chi-squared test if there are few samples or Kendall's correlation. It also allows protecting some spots from being used.

If you are still designing the experiment and have not collected any data yet, [DeclareDesign](https://cran.r-project.org/package=DeclareDesign) might be relevant for you, and especially the [randomizr](https://cran.r-project.org/package=randomizr) package, which implements common forms of random assignment and sampling.

I asked a question on [Bioinformatics.SE](https://bioinformatics.stackexchange.com/q/4765/48) before developing the package.

# Other

Please note that this project is released with a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/0/0/code-of-conduct/).
By participating in this project you agree to abide by its terms.

Owner

  • Name: Lluís Revilla
  • Login: llrs
  • Kind: user
  • Location: Spain
  • Company: @irsi-tiv

Bioinformatician, data scientist/engineer making data accessible & useful, be it in pharma, research, open source or government data.

JOSS Publication

experDesign: stratifying samples into batches with minimal bias
Published
November 27, 2021
Volume 6, Issue 67, Page 3358
Authors
Lluís Revilla Sancho ORCID
Centro de Investigación Biomédica en Red, Enfermedades Hepáticas y Digestivas, Institut d’Investigacions Biomèdiques August Pi i Sunyer, IDIBAPS
Juan-José Lozano ORCID
Centro de Investigación Biomédica en Red, Enfermedades Hepáticas y Digestivas
Azucena Salas ORCID
Institut d’Investigacions Biomèdiques August Pi i Sunyer, IDIBAPS
Editor
Lorena Pantano ORCID
Tags
batch effect experiment design

Citation (CITATION.cff)

# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.5.0
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "experDesign" in publications use:'
type: software
license: MIT
title: 'experDesign: Design Experiments for Batches'
version: 0.2.0.9001
abstract: Distributes samples in batches while making batches homogeneous according
  to their description. Allows for an arbitrary number of variables, both numeric
  and categorical. For quality control it provides functions to subset a representative
  sample.
authors:
- family-names: Revilla Sancho
  given-names: Lluís
  email: lluis.revilla@gmail.com
  orcid: https://orcid.org/0000-0001-9747-2570
repository: https://CRAN.R-project.org/package=experDesign
repository-code: https://github.com/llrs/experDesign
url: https://experdesign.llrs.dev
contact:
- family-names: Revilla Sancho
  given-names: Lluís
  email: lluis.revilla@gmail.com
  orcid: https://orcid.org/0000-0001-9747-2570
keywords:
- batch
- cran
- experiment-design
- r
- r-package
- rstats
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2024'
  institution:
    name: R Foundation for Statistical Computing
  version: '>= 3.5.0'
- type: software
  title: methods
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2024'
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: stats
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2024'
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: utils
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2024'
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: covr
  abstract: 'covr: Test Coverage for Packages'
  notes: Suggests
  url: https://covr.r-lib.org
  repository: https://CRAN.R-project.org/package=covr
  authors:
  - family-names: Hester
    given-names: Jim
    email: james.f.hester@gmail.com
  year: '2024'
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2024'
- type: software
  title: MASS
  abstract: 'MASS: Support Functions and Datasets for Venables and Ripley''s MASS'
  notes: Suggests
  url: http://www.stats.ox.ac.uk/pub/MASS4/
  repository: https://CRAN.R-project.org/package=MASS
  authors:
  - family-names: Ripley
    given-names: Brian
    email: ripley@stats.ox.ac.uk
  year: '2024'
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2024'
- type: software
  title: spelling
  abstract: 'spelling: Tools for Spell Checking in R'
  notes: Suggests
  url: https://docs.ropensci.org/spelling/
  repository: https://CRAN.R-project.org/package=spelling
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroen@berkeley.edu
    orcid: https://orcid.org/0000-0002-4035-0289
  - family-names: Hester
    given-names: Jim
    email: james.hester@rstudio.com
  year: '2024'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  version: '>= 3.0.0'
identifiers:
- type: url
  value: https://github.com/llrs/experDesign/

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "experDesign",
  "description": "Distributes samples in batches while making batches homogeneous \n    according to their description. Allows for an arbitrary number of variables, \n    both numeric and categorical. For quality control it provides functions to \n    subset a representative sample.",
  "name": "experDesign: Design Experiments for Batches",
  "codeRepository": "https://github.com/llrs/experDesign/",
  "issueTracker": "https://github.com/llrs/experDesign/issues",
  "license": "https://spdx.org/licenses/MIT",
  "version": "0.1.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 3.6.3 (2020-02-29)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Lluís",
      "familyName": "Revilla Sancho",
      "email": "lluis.revilla@gmail.com",
      "@id": "https://orcid.org/0000-0001-9747-2570"
    }
  ],
  "contributor": {},
  "copyrightHolder": {},
  "funder": {},
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Lluís",
      "familyName": "Revilla Sancho",
      "email": "lluis.revilla@gmail.com",
      "@id": "https://orcid.org/0000-0001-9747-2570"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "version": ">= 2.1.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=covr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "MASS",
      "name": "MASS",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=MASS"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "spelling",
      "name": "spelling",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=spelling"
    }
  ],
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.5.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "stats",
      "name": "stats"
    }
  ],
  "releaseNotes": "https://github.com/llrs/experDesign/blob/master/NEWS.md",
  "readme": "https://github.com/llrs/experDesign/blob/master/README.md",
  "fileSize": "1316.643KB",
  "contIntegration": [
    "https://ci.appveyor.com/project/llrs/experDesign",
    "https://travis-ci.org/llrs/experDesign",
    "https://codecov.io/github/llrs/experDesign?branch=master"
  ],
  "developmentStatus": [
    "https://lifecycle.r-lib.org/articles/stages.html#stable",
    "https://www.repostatus.org/#active"
  ],
  "keywords": [
    "experiment-design",
    "batch",
    "cran"
  ],
  "relatedLink": [
    "https://CRAN.R-project.org/package=experDesign",
    "https://experdesign.llrs.dev"
  ]
}

GitHub Events

Total
  • Issues event: 4
  • Issue comment event: 3
  • Push event: 2
Last Year
  • Issues event: 4
  • Issue comment event: 3
  • Push event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 404
  • Total Committers: 1
  • Avg Commits per committer: 404.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 4
  • Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
llrs l****a@g****m 404

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 57
  • Total pull requests: 1
  • Average time to close issues: 4 months
  • Average time to close pull requests: 7 minutes
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 1.25
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: 17 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • llrs (54)
  • AnnaPagotto (2)
  • LucFrancisENX (1)
Pull Request Authors
  • llrs (1)
Top Labels
Issue Labels
enhancement (11) help wanted (4) bug (3) good first issue (2) todo :spiral_notepad: (2) hacktoberfest (1) question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 281 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: experDesign

Design Experiments for Batches

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 281 Last month
Rankings
Stargazers count: 18.7%
Forks count: 21.9%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 35.9%
Downloads: 73.4%
Maintainers (1)
Last synced: 4 months ago