dataset

Create interoperable and well described data frames in R

https://github.com/ropensci/dataset

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.4%) to scientific vocabulary

Keywords

dataset metadata-management r rstats
Last synced: 10 months ago

Repository

Statistics
  • Stars: 18
  • Watchers: 1
  • Forks: 8
  • Open Issues: 14
  • Releases: 10
Created almost 4 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Codemeta

README.Rmd

---
output: github_document
---



```{r setupdefinitions, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
rlang::check_installed("here")
```

# The dataset R Package 




[![rhub](https://github.com/ropensci/dataset/actions/workflows/rhub.yaml/badge.svg)](https://github.com/ropensci/dataset/actions/workflows/rhub.yaml)
[![devel-version](https://img.shields.io/badge/devel%20version-0.4.0-blue.svg)](https://github.com/ropensci/dataset)
[![Codecov test coverage](https://codecov.io/gh/ropensci/dataset/branch/main/graph/badge.svg)](https://app.codecov.io/gh/ropensci/dataset?branch=main)
[![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/dataset)](https://cran.r-project.org/package=dataset)
[![CRAN_time_from_release](https://www.r-pkg.org/badges/ago/dataset)](https://cran.r-project.org/package=dataset)
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/681_status.svg)](https://github.com/ropensci/software-review/issues/681)
[![DOI](https://img.shields.io/badge/DOI-10.32614%2FCRAN.package.dataset-blue)](https://doi.org/10.32614/CRAN.package.dataset)
[![dataobservatory](https://img.shields.io/badge/ecosystem-dataobservatory.eu-3EA135.svg)](https://dataobservatory.eu/)



## Overview

The `dataset` package helps you create **semantically rich**, **machine-readable**, and **interoperable datasets** in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by:

- **SDMX** (Statistical Data and Metadata eXchange), widely used in official statistics  
- **Dublin Core** and **DataCite**, for FAIR-compliant depositing and reuse in scientific and open data repositories  
- **Open Science publishing practices**, to support transparent and reproducible research  

The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.


## Installation

You can install the latest released version of **`dataset`** from [CRAN](https://cran.r-project.org/package=dataset) with:

```{r, eval=FALSE}
install.packages("dataset")
```

To install the development version from GitHub, use `pak` or `remotes`:

```{r, eval=FALSE}
# Option 1: pak
# install.packages("pak")
pak::pak("ropensci/dataset")

# Option 2: remotes
# install.packages("remotes")
remotes::install_github("ropensci/dataset")
```
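
After either installation route, it can be useful to confirm that the package is available before loading it. The following is a minimal sketch using only base R and `utils`; the variable name `is_installed` is illustrative.

```r
# Check whether 'dataset' is installed without attaching it.
is_installed <- requireNamespace("dataset", quietly = TRUE)

if (is_installed) {
  # packageVersion() returns a 'package_version' object; format it for printing.
  message("dataset version: ", format(utils::packageVersion("dataset")))
} else {
  message("dataset is not installed; run install.packages(\"dataset\").")
}
```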


## Minimal Example

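```{r example}
library(dataset)

# Build a dataset_df: each column is a defined() vector carrying its own
# metadata, and dataset_bibentry holds the dataset-level description.
df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  # Dublin Core metadata for the dataset as a whole
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)
print(df)
```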
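
The `namespace` argument in the example contains a `$1` placeholder. One plausible reading, stated here as an assumption and not taken from the package documentation, is that each value replaces `$1` to form a resolvable URI. The base R sketch below illustrates that idea only; `expand_namespace()` is a hypothetical helper, not a function of `dataset`.

```r
# Hypothetical helper: substitute each code for the "$1" placeholder.
# This illustrates the assumed convention; it is not part of the dataset package.
expand_namespace <- function(codes, template) {
  # fixed = TRUE treats "$1" literally instead of as a regular expression.
  vapply(
    codes,
    function(code) sub("$1", code, template, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

country_uris <- expand_namespace(
  c("AD", "LI"),
  "https://www.geonames.org/countries/$1/"
)
print(country_uris)
```

Under this assumption, the code `"AD"` maps to `https://www.geonames.org/countries/AD/`.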

Export as RDF triples:

```{css, echo=FALSE}
.smaller .table {
  font-size: 11px;
}

.smaller pre,
.smaller code {
  font-size: 11px;
  line-height: 1.2;
}
```

```{r ntriples, eval = FALSE}
dataset_to_triples(df, format = "nt")
```

::: smaller
```{r ntriplessmall, echo=FALSE}
dataset_to_triples(df, format = "nt")
```
:::
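
For readers new to the format: N-Triples is a line-based RDF serialization in which each statement consists of a subject, a predicate, and an object, terminated by a period. The sketch below builds one such line by hand in base R. It is not the output of `dataset_to_triples()`; the helper `nt_typed_literal()` and the `example.com` IRIs are placeholders chosen for illustration, and the helper does not escape special characters in the literal.

```r
# Hypothetical helper: format one RDF statement with a typed literal object
# in N-Triples syntax: <subject> <predicate> "value"^^<datatype> .
nt_typed_literal <- function(subject, predicate, value, datatype) {
  sprintf('<%s> <%s> "%s"^^<%s> .', subject, predicate, value, datatype)
}

triple <- nt_typed_literal(
  subject   = "http://example.com/dataset#obs1",  # placeholder IRI
  predicate = "http://example.com/dataset#gdp",   # placeholder IRI
  value     = "3897",
  datatype  = "http://www.w3.org/2001/XMLSchema#double"
)
cat(triple, "\n")
```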

Retrieve the automatically recorded provenance:

```{r provenance, eval=FALSE}
provenance(df)
```

::: smaller
```{r provenancesmall, echo=FALSE}
provenance(df)
```
:::

## Contributing

We welcome contributions and discussion!

-   Please see our [CONTRIBUTING.md](https://github.com/dataobservatory-eu/dataset/blob/main/CONTRIBUTING.md) guide.
-   Ideas, bug reports, and feedback are welcome via [GitHub issues](https://github.com/dataobservatory-eu/dataset/issues).
-   The design principles and ideas for further development are explained in [Design Principles & Future Work: Semantically Enriched, Standards-Aligned Datasets in R](https://dataset.dataobservatory.eu/articles/design.html).

## Code of Conduct

This project follows the [rOpenSci Code of Conduct](https://ropensci.org/code-of-conduct/). By participating, you are expected to uphold these guidelines.

Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "dataset",
  "description": "The 'dataset' package helps create semantically rich, machine-readable, and interoperable datasets in R. It extends tidy data frames with metadata that preserves meaning, improves interoperability, and makes datasets easier to publish, exchange, and reuse in line with ISO and W3C standards.",
  "name": "dataset: Create Data Frames for Exchange and Reuse",
  "relatedLink": [
    "https://docs.ropensci.org/dataset",
    "https://dataset.dataobservatory.eu"
  ],
  "codeRepository": "https://github.com/ropensci/dataset",
  "issueTracker": "https://github.com/ropensci/dataset/issues",
  "license": "https://spdx.org/licenses/GPL-3.0",
  "version": "0.4.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.5.0 (2025-04-11 ucrt)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Daniel",
      "familyName": "Antal",
      "email": "daniel.antal@dataobservatory.eu",
      "@id": "https://orcid.org/0000-0001-7513-6760"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Daniel",
      "familyName": "Antal",
      "email": "daniel.antal@dataobservatory.eu",
      "@id": "https://orcid.org/0000-0001-7513-6760"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "dplyr",
      "name": "dplyr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=dplyr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jsonld",
      "name": "jsonld",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=jsonld"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rdflib",
      "name": "rdflib",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rdflib"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "spelling",
      "name": "spelling",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=spelling"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "tidyr",
      "name": "tidyr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=tidyr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "version": ">= 3.0.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "assertthat",
      "name": "assertthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=assertthat"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "haven",
      "name": "haven",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=haven"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "ISOcodes",
      "name": "ISOcodes",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=ISOcodes"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "labelled",
      "name": "labelled",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=labelled"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "pillar",
      "name": "pillar",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=pillar"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "tibble",
      "name": "tibble",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=tibble"
    },
    "7": {
      "@type": "SoftwareApplication",
      "identifier": "utils",
      "name": "utils"
    },
    "8": {
      "@type": "SoftwareApplication",
      "identifier": "vctrs",
      "name": "vctrs",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=vctrs"
    },
    "9": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.5"
    },
    "SystemRequirements": null
  },
  "fileSize": "882.666KB",
  "citation": [
    {
      "@type": "SoftwareSourceCode",
      "datePublished": "2025",
      "author": [
        {
          "@type": "Person",
          "givenName": "Daniel",
          "familyName": "Antal",
          "email": "daniel.antal@dataobservatory.eu",
          "@id": "https://orcid.org/0000-0001-7513-6760"
        }
      ],
      "name": "dataset: Create Data Frames for Exchange and Reuse",
      "url": "https://docs.ropensci.org/dataset",
      "description": "R package version 0.4.0"
    },
    {
      "@type": "CreativeWork",
      "datePublished": "2025",
      "author": [
        {
          "@type": "Person",
          "givenName": "Daniel",
          "familyName": "Antal"
        }
      ],
      "name": "The dataset R Package: Create Data Frames that are Easier to Exchange and Reuse",
      "identifier": "10.32614/CRAN.package.dataset",
      "url": "https://dataset.dataobservatory.eu/index.html",
      "description": "R package version 0.4.0",
      "@id": "https://doi.org/10.32614/CRAN.package.dataset",
      "sameAs": "https://doi.org/10.32614/CRAN.package.dataset"
    }
  ]
}

GitHub Events

Total
  • Issue comment event: 1
  • Member event: 1
  • Push event: 1
Last Year
  • Issue comment event: 1
  • Member event: 1
  • Push event: 1

Dependencies

.github/workflows/pkgcheck.yaml actions
  • ropensci-review-tools/pkgcheck-action main composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 2.10 depends
  • ISOcodes * imports
  • assertthat * imports
  • stats * imports
  • utils * imports
  • covr * suggests
  • dataspice * suggests
  • declared * suggests
  • dplyr * suggests
  • eurostat * suggests
  • here * suggests
  • kableExtra * suggests
  • knitr * suggests
  • rdflib * suggests
  • readxl * suggests
  • rmarkdown * suggests
  • spelling * suggests
  • statcodelists * suggests
  • testthat >= 3.0.0 suggests
  • tidyr * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite