py3data

A flexible and lightweight Python interface to the re3data.org database

https://github.com/j535d165/py3data

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords

metadata metadata-management re3data research-data-management
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: J535D165
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 30.3 KB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Topics
metadata metadata-management re3data research-data-management
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

py3data

py3data is a Python library for the re3data registry. Re3data is a global registry of research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. Re3data offers an open and free REST API. py3data is a lightweight and thin Python interface to the beta version of this API.

The following features of re3data are currently supported by py3data:

  • [x] Get single repositories
  • [x] Filter and query repositories

Key features

  • Pipe operations - py3data can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
  • JSON support - Re3data doesn't offer a JSON implementation of the REST API. py3data parses the XML REST API and offers it in Python dict-like objects.
  • Schema fixes - The re3data schema is somewhat hard to parse in Python directly. py3data makes it very easy to parse the API and solves these issues.
  • Permissive license - Re3data data is CC0 licensed :raised_hands:. py3data is published under the MIT license.
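As a rough illustration of the "JSON support" point, the sketch below turns an XML record into a plain Python dict using only the standard library. It is not py3data's actual implementation: the XML fragment, its element names, and the `element_to_dict` helper are all hypothetical and chosen only to show the general idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; the element names are illustrative only and are
# not copied from the re3data schema.
SAMPLE_XML = """
<repository>
  <id>r3d100000001</id>
  <name>Example Repository</name>
  <subject>2 Life Sciences</subject>
  <subject>21 Biology</subject>
</repository>
"""


def element_to_dict(element):
    """Convert an element's children into a dict; repeated tags become lists."""
    result = {}
    for child in element:
        # Leaf elements map to their text, nested elements recurse.
        value = element_to_dict(child) if len(child) else (child.text or "").strip()
        if child.tag in result:
            # Second occurrence of a tag: promote the existing value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


record = element_to_dict(ET.fromstring(SAMPLE_XML.strip()))
print(record["name"])
print(record["subject"])
```

Repeated elements are collected into a list so that multi-valued fields stay easy to iterate over, which is one of the things that makes raw XML awkward to use directly.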

Installation

py3data requires Python 3.8 or later.

```sh
pip install py3data
```

Getting started

```python
from py3data import Repositories
```

Get single repository

Get a single repository by its identifier:

```python
Repositories()["r3d100011986"]
```

The result is a Repository object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the subjects of a repository:

```python
Repositories()["r3d100011986"]["subjects"]
```

```python
[{'subjectScheme': 'DFG', 'subjectName': '2 Life Sciences'},
 {'subjectScheme': 'DFG', 'subjectName': '202 Plant Sciences'},
 {'subjectScheme': 'DFG', 'subjectName': '20202 Plant Ecology and Ecosystem Analysis'},
 {'subjectScheme': 'DFG', 'subjectName': '20203 Inter-organismic Interactions of Plants'},
 {'subjectScheme': 'DFG', 'subjectName': '203 Zoology'},
 {'subjectScheme': 'DFG', 'subjectName': '20303 Animal Ecology, Biodiversity and Ecosystem Research'},
 {'subjectScheme': 'DFG', 'subjectName': '21 Biology'},
 {'subjectScheme': 'DFG', 'subjectName': '3 Natural Sciences'},
 {'subjectScheme': 'DFG', 'subjectName': '313 Atmospheric Science and Oceanography'},
 {'subjectScheme': 'DFG', 'subjectName': '318 Water Research'},
 {'subjectScheme': 'DFG', 'subjectName': '31801 Hydrogeology, Hydrology, Limnology, Urban Water Management, Water Chemistry, Integrated Water Resources Management'},
 {'subjectScheme': 'DFG', 'subjectName': '34 Geosciences (including Geography)'}]
```
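Because the result is an ordinary list of dicts, it can be processed with plain Python. The sketch below works on a shortened, hard-coded copy of the output above so that it runs offline. It assumes that a single-digit code marks a top-level DFG subject area, which is what the output suggests.

```python
# A shortened copy of the subjects list shown above.
subjects = [
    {"subjectScheme": "DFG", "subjectName": "2 Life Sciences"},
    {"subjectScheme": "DFG", "subjectName": "202 Plant Sciences"},
    {"subjectScheme": "DFG", "subjectName": "21 Biology"},
    {"subjectScheme": "DFG", "subjectName": "3 Natural Sciences"},
    {"subjectScheme": "DFG", "subjectName": "318 Water Research"},
]

# Each name has the form "<code> <label>"; keep entries whose code is one digit
# (assumed here to be the top-level subject areas).
top_level = [
    name
    for name in (s["subjectName"] for s in subjects)
    if len(name.split(" ", 1)[0]) == 1
]
print(top_level)  # ['2 Life Sciences', '3 Natural Sciences']
```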

Get lists of repositories

It is possible to get lists of results from re3data. However, keep in mind that lists consist of Repository objects with very little metadata (id, name, doi, link).

Get all repositories:

```python
Repositories().get()
```

For lists of repositories, you can also count the number of records found instead of returning the results. This also works for search queries and filters.

```python
Repositories().count()

3137
```

Filter and query records

Re3data makes use of filters and queries. Filters can be used to slice the structured metadata of re3data and queries can be used to search for specific terms or phrases. Both filters and queries can be used in one request.

An overview of all the filters can be found under "Beta" in the REST API documentation. It can be hard to find the correct values sometimes. In that case, look for values in other single Repository requests, the Metadata Schema, or the website.

```python
(
    Repositories()
    .filter(countries="CAN")
    .filter(subjects=["2 Life Sciences", "3 Natural Sciences"])
    .filter(pidSystems="DOI")
    .query("University")
    .get()
)
```

which is identical to:

```python
(
    Repositories()
    .filter(
        countries="CAN",
        subjects=["2 Life Sciences", "3 Natural Sciences"],
        pidSystems="DOI",
    )
    .query("University")
    .get()
)
```
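To see why chained and combined filters can be equivalent, here is a minimal sketch of a chainable query object. `QuerySketch` is a hypothetical stand-in written for this illustration. It is not py3data's own class and makes no claim about how py3data stores or sends filters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical stand-in for a chainable query; not py3data's own class."""

    filters: dict = field(default_factory=dict)
    term: str = ""

    def filter(self, **kwargs):
        # Return a new object whose filters are the old ones plus the new ones.
        return QuerySketch({**self.filters, **kwargs}, self.term)

    def query(self, term):
        # Return a new object with the search term set.
        return QuerySketch(self.filters, term)


chained = (
    QuerySketch()
    .filter(countries="CAN")
    .filter(pidSystems="DOI")
    .query("University")
)
combined = QuerySketch().filter(countries="CAN", pidSystems="DOI").query("University")

print(chained == combined)  # True
```

Each call returns a new object rather than mutating the old one, so a partially built query can be reused as the starting point for several different requests.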

Code snippets

A list of examples for the re3data.org dataset.

Get repositories running Dataverse software

```python
(
    Repositories()
    .filter(software="Dataverse")
    .get()
)
```

Get repositories matching the word "climate" that use DOI identifiers

```python
(
    Repositories()
    .filter(pidSystems="DOI")
    .query("climate")
    .get()
)
```

Data dump

The following code dumps all data of re3data.org into a list of dicts. It can take a while to run because the full record of each repository is requested separately.

```python
from py3data import Repositories

all_data = [Repositories()[x["id"]] for x in Repositories().get()]
```
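For a long-running dump it can help to write records to disk as they arrive instead of holding everything in memory. The sketch below writes one JSON object per line. `dump_records`, its `fetch` and `delay` parameters, and `fake_fetch` are hypothetical names introduced for this example, and it assumes each record can be converted with `dict(...)`. A stand-in fetcher is used so the sketch runs without network access.

```python
import io
import json
import time


def dump_records(ids, fetch, out, delay=0.0):
    """Write one JSON object per line for each id; return the number written."""
    written = 0
    for repo_id in ids:
        record = dict(fetch(repo_id))  # assumes the record converts to a dict
        out.write(json.dumps(record) + "\n")
        written += 1
        if delay:
            time.sleep(delay)  # optional pause between requests
    return written


# Stand-in fetcher so the sketch runs offline. With py3data one might pass
# something like `lambda repo_id: Repositories()[repo_id]` instead.
def fake_fetch(repo_id):
    return {"id": repo_id, "name": f"Repository {repo_id}"}


buffer = io.StringIO()
count = dump_records(["r3d100011986", "r3d100000001"], fake_fetch, buffer)
lines = buffer.getvalue().splitlines()
print(count, json.loads(lines[0])["id"])  # 2 r3d100011986
```

In real use, `out` would be an open file handle, and a small `delay` keeps the load on the API modest.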

License

MIT

Contact

This library is a community contribution. The authors of this Python library aren't affiliated with re3data.

Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.

Owner

  • Name: Jonathan de Bruin
  • Login: J535D165
  • Kind: user
  • Location: Netherlands
  • Company: Utrecht University

Research engineer working on software, datasets, and tools to advance open science 👐 @UtrechtUniversity @asreview

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  py3data - A flexible and lightweight Python interface to
  the re3data.org database
message: 'We appreciate, but do not require, attribution.'
type: software
authors:
  - family-names: De Bruin
    given-names: Jonathan
    orcid: 'https://orcid.org/0000-0002-4297-0502'
repository-code: 'https://github.com/J535D165/py3data'
url: 'https://github.com/J535D165/py3data'
repository-artifact: 'https://pypi.org/project/py3data/'
keywords:
  - data-repository
  - data-management
  - research
  - dataset
license: MIT

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 16
  • Total Committers: 1
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 16
  • Committers: 1
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jonathan de Bruin j****s@g****m 16

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

.github/workflows/python-lint.yml actions
  • actions/checkout v3 composite
  • chartboost/ruff-action v1 composite
.github/workflows/python-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • requests *