dataset
Create interoperable and well-described data frames in R
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (18.4%) to scientific vocabulary
Keywords
- dataset
- metadata-management
- r
- rstats
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: ropensci
- License: gpl-3.0
- Language: R
- Default Branch: main
- Homepage: http://dataset.dataobservatory.eu/
- Size: 1.84 MB
Statistics
- Stars: 18
- Watchers: 1
- Forks: 8
- Open Issues: 14
- Releases: 10
Created almost 4 years ago · Last pushed 10 months ago
Metadata Files
- Readme
- Changelog
- Contributing
- License
- Code of conduct
- Codemeta
README.Rmd
---
output: github_document
---
```{r setupdefinitions, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
rlang::check_installed("here")
```
# The dataset R Package
[R-hub workflow](https://github.com/ropensci/dataset/actions/workflows/rhub.yaml)
[GitHub repository](https://github.com/ropensci/dataset)
[Codecov test coverage](https://app.codecov.io/gh/ropensci/dataset?branch=main)
[Project status: active](https://www.repostatus.org/#active)
[CRAN](https://cran.r-project.org/package=dataset)
[rOpenSci software review #681](https://github.com/ropensci/software-review/issues/681)
[DOI: 10.32614/CRAN.package.dataset](https://doi.org/10.32614/CRAN.package.dataset)
[Data Observatory](https://dataobservatory.eu/)
# Overview
The `dataset` package helps you create **semantically rich**, **machine-readable**, and **interoperable datasets** in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by:
- **SDMX** (Statistical Data and Metadata eXchange), widely used in official statistics
- **Dublin Core** and **DataCite**, for FAIR-compliant depositing and reuse in scientific and open data repositories
- **Open Science publishing practices**, to support transparent and reproducible research
The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.
## Installation
You can install the latest released version of **`dataset`** from [CRAN](https://cran.r-project.org/package=dataset) with:
```{r, eval=FALSE}
install.packages("dataset")
```
To install the development version from GitHub with `pak` or `remotes`:
```{r, eval=FALSE}
# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")
# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")
```
## Minimal Example
```{r example}
library(dataset)
df <- dataset_df(
country = defined(
c("AD", "LI"),
label = "Country",
namespace = "https://www.geonames.org/countries/$1/"
),
gdp = defined(c(3897, 7365),
label = "GDP",
unit = "million euros"
),
dataset_bibentry = dublincore(
title = "GDP Dataset",
creator = person("Jane", "Doe", role = "aut"),
publisher = "Small Repository"
)
)
print(df)
```
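The `country` column above is given the `namespace` `https://www.geonames.org/countries/$1/`. Assuming that `$1` acts as a placeholder for each cell value, so that a code such as `"AD"` resolves to a GeoNames country URI, the expansion can be sketched in base R. `expand_namespace()` is a hypothetical helper written only for illustration; it is not part of the `dataset` API.

```r
# Hypothetical helper (not part of the dataset API): expand a namespace
# template by substituting each value for the assumed "$1" placeholder.
expand_namespace <- function(values, namespace) {
  vapply(
    as.character(values),
    # fixed = TRUE matches "$1" literally instead of as a regular expression
    function(v) gsub("$1", v, namespace, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

expand_namespace(c("AD", "LI"), "https://www.geonames.org/countries/$1/")
# returns c("https://www.geonames.org/countries/AD/",
#           "https://www.geonames.org/countries/LI/")
```

Keeping the template as a single string with a placeholder means the column stores only short codes, while a full URI can still be derived for each value when the data are exported.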
Export the dataset as RDF triples in N-Triples format (`format = "nt"`):
```{css, echo=FALSE}
.smaller .table {
font-size: 11px;
}
.smaller pre,
.smaller code {
font-size: 11px;
line-height: 1.2;
}
```
```{r ntriples, eval = FALSE}
dataset_to_triples(df, format = "nt")
```
::: smaller
```{r ntriplessmall, echo=FALSE}
dataset_to_triples(df, format = "nt")
```
:::
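N-Triples, the W3C line-based RDF serialisation requested above with `format = "nt"`, writes one statement per line: a subject, a predicate and an object followed by ` .`, with IRIs in angle brackets and literals in double quotes. The base-R sketch below illustrates that shape by turning every non-missing cell of a plain data frame into one triple. It is not how `dataset_to_triples()` is implemented; the helper name `to_ntriples()`, the row-number subjects, the column-name predicates, the `xsd:double` typing of numbers and the `example.com` base IRIs are all assumptions made for illustration.

```r
# Hypothetical helper (not the dataset implementation): serialise a plain
# data frame as N-Triples, one triple per non-missing cell.
# Subjects are minted as <subject_base><row number>, predicates as
# <predicate_base><column name>; both bases are illustrative placeholders.
to_ntriples <- function(df, subject_base, predicate_base) {
  # Escape characters that may not appear raw inside N-Triples string literals
  escape_literal <- function(x) {
    x <- gsub("\\", "\\\\", x, fixed = TRUE)
    x <- gsub("\"", "\\\"", x, fixed = TRUE)
    x <- gsub("\n", "\\n", x, fixed = TRUE)
    gsub("\r", "\\r", x, fixed = TRUE)
  }
  triples <- character(0)
  for (col in names(df)) {
    values <- df[[col]]
    keep <- !is.na(values)  # skip missing cells rather than writing "NA"
    if (is.numeric(values)) {
      # Numbers become typed literals (xsd:double is an assumed choice)
      objects <- sprintf(
        "\"%s\"^^<http://www.w3.org/2001/XMLSchema#double>",
        as.character(values[keep])
      )
    } else {
      # Everything else becomes a plain, escaped string literal
      objects <- sprintf("\"%s\"", escape_literal(as.character(values[keep])))
    }
    triples <- c(triples, sprintf(
      "<%s%d> <%s%s> %s .",
      subject_base, which(keep), predicate_base, col, objects
    ))
  }
  triples
}

gdp <- data.frame(country = c("AD", "LI"), gdp = c(3897, 7365))
writeLines(to_ntriples(
  gdp,
  subject_base = "http://example.com/observation/",
  predicate_base = "http://example.com/property/"
))
# first line printed:
# <http://example.com/observation/1> <http://example.com/property/country> "AD" .
```

The sketch drops all of the column labels, units and namespaces that `defined()` attaches, which is exactly the metadata the package is designed to carry into its own RDF export.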
Inspect the provenance information that was recorded automatically:
```{r provenance, eval=FALSE}
provenance(df)
```
::: smaller
```{r provenancesmall, echo=FALSE}
provenance(df)
```
:::
## Contributing
We welcome contributions and discussion!
- Please see our [CONTRIBUTING.md](https://github.com/dataobservatory-eu/dataset/blob/main/CONTRIBUTING.md) guide.
- Ideas, bug reports, and feedback are welcome via [GitHub issues](https://github.com/dataobservatory-eu/dataset/issues).
- The design principles and ideas for further development are explained in [Design Principles & Future Work: Semantically Enriched, Standards-Aligned Datasets in R](https://dataset.dataobservatory.eu/articles/design.html).
## Code of Conduct
This project follows the [rOpenSci Code of Conduct](https://ropensci.org/code-of-conduct/). By participating, you are expected to uphold these guidelines.
Owner
- Name: rOpenSci
- Login: ropensci
- Kind: organization
- Email: info@ropensci.org
- Location: Berkeley, CA
- Website: https://ropensci.org/
- Twitter: rOpenSci
- Repositories: 307
- Profile: https://github.com/ropensci
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"identifier": "dataset",
"description": "The 'dataset' package helps create semantically rich, machine-readable, and interoperable datasets in R. It extends tidy data frames with metadata that preserves meaning, improves interoperability, and makes datasets easier to publish, exchange, and reuse in line with ISO and W3C standards.",
"name": "dataset: Create Data Frames for Exchange and Reuse",
"relatedLink": [
"https://docs.ropensci.org/dataset",
"https://dataset.dataobservatory.eu"
],
"codeRepository": "https://github.com/ropensci/dataset",
"issueTracker": "https://github.com/ropensci/dataset/issues",
"license": "https://spdx.org/licenses/GPL-3.0",
"version": "0.4.0",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 4.5.0 (2025-04-11 ucrt)",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"author": [
{
"@type": "Person",
"givenName": "Daniel",
"familyName": "Antal",
"email": "daniel.antal@dataobservatory.eu",
"@id": "https://orcid.org/0000-0001-7513-6760"
}
],
"maintainer": [
{
"@type": "Person",
"givenName": "Daniel",
"familyName": "Antal",
"email": "daniel.antal@dataobservatory.eu",
"@id": "https://orcid.org/0000-0001-7513-6760"
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "dplyr",
"name": "dplyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=dplyr"
},
{
"@type": "SoftwareApplication",
"identifier": "jsonld",
"name": "jsonld",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=jsonld"
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=knitr"
},
{
"@type": "SoftwareApplication",
"identifier": "rdflib",
"name": "rdflib",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rdflib"
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rmarkdown"
},
{
"@type": "SoftwareApplication",
"identifier": "spelling",
"name": "spelling",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=spelling"
},
{
"@type": "SoftwareApplication",
"identifier": "tidyr",
"name": "tidyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tidyr"
},
{
"@type": "SoftwareApplication",
"identifier": "testthat",
"name": "testthat",
"version": ">= 3.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=testthat"
}
],
"softwareRequirements": {
"1": {
"@type": "SoftwareApplication",
"identifier": "assertthat",
"name": "assertthat",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=assertthat"
},
"2": {
"@type": "SoftwareApplication",
"identifier": "haven",
"name": "haven",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=haven"
},
"3": {
"@type": "SoftwareApplication",
"identifier": "ISOcodes",
"name": "ISOcodes",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=ISOcodes"
},
"4": {
"@type": "SoftwareApplication",
"identifier": "labelled",
"name": "labelled",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=labelled"
},
"5": {
"@type": "SoftwareApplication",
"identifier": "pillar",
"name": "pillar",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=pillar"
},
"6": {
"@type": "SoftwareApplication",
"identifier": "tibble",
"name": "tibble",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tibble"
},
"7": {
"@type": "SoftwareApplication",
"identifier": "utils",
"name": "utils"
},
"8": {
"@type": "SoftwareApplication",
"identifier": "vctrs",
"name": "vctrs",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=vctrs"
},
"9": {
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": ">= 3.5"
},
"SystemRequirements": null
},
"fileSize": "882.666KB",
"citation": [
{
"@type": "SoftwareSourceCode",
"datePublished": "2025",
"author": [
{
"@type": "Person",
"givenName": "Daniel",
"familyName": "Antal",
"email": "daniel.antal@dataobservatory.eu",
"@id": "https://orcid.org/0000-0001-7513-6760"
}
],
"name": "dataset: Create Data Frames for Exchange and Reuse",
"url": "https://docs.ropensci.org/dataset",
"description": "R package version 0.4.0"
},
{
"@type": "CreativeWork",
"datePublished": "2025",
"author": [
{
"@type": "Person",
"givenName": "Daniel",
"familyName": "Antal"
}
],
"name": "The dataset R Package: Create Data Frames that are Easier to Exchange and Reuse",
"identifier": "10.32614/CRAN.package.dataset",
"url": "https://dataset.dataobservatory.eu/index.html",
"description": "R package version 0.4.0",
"@id": "https://doi.org/10.32614/CRAN.package.dataset",
"sameAs": "https://doi.org/10.32614/CRAN.package.dataset"
}
]
}
GitHub Events
Total
- Issue comment event: 1
- Member event: 1
- Push event: 1
Last Year
- Issue comment event: 1
- Member event: 1
- Push event: 1
Dependencies
.github/workflows/pkgcheck.yaml
actions
- ropensci-review-tools/pkgcheck-action main composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 2.10 depends
- ISOcodes * imports
- assertthat * imports
- stats * imports
- utils * imports
- covr * suggests
- dataspice * suggests
- declared * suggests
- dplyr * suggests
- eurostat * suggests
- here * suggests
- kableExtra * suggests
- knitr * suggests
- rdflib * suggests
- readxl * suggests
- rmarkdown * suggests
- spelling * suggests
- statcodelists * suggests
- testthat >= 3.0.0 suggests
- tidyr * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite