rdata

rdata: A Python library for R datasets - Published in JOSS (2024)

https://github.com/vnmabus/rdata

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

conversion python python3 r rda rdata rds

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 83% confidence
Last synced: 4 months ago

Repository

Reader of R datasets in .rda format, in Python

Basic Info
Statistics
  • Stars: 53
  • Watchers: 3
  • Forks: 3
  • Open Issues: 5
  • Releases: 9
Created over 7 years ago · Last pushed 4 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.rst

rdata
=====

|build-status| |docs| |coverage| |repostatus| |versions| |pypi| |conda| |zenodo| |pyOpenSci| |joss|

A Python library for R datasets.

..
	GitHub does not support include in README for dubious security reasons, so
	we copy-paste instead. Also GitHub does not understand Sphinx directives.
	.. include:: docs/index.rst
	.. include:: docs/usage.rst

The package rdata offers a lightweight way in Python to import and export R datasets/objects stored
in the ".rda" and ".rds" formats.
Its main advantages are:

- It is a pure Python implementation, with no dependencies on the R language or
  related libraries.
  Thus, it can be used anywhere Python is supported, including the web
  using Pyodide.
- It attempts to support all objects that can be meaningfully translated between R and Python.
  Unlike other solutions, you are not limited to importing dataframes or
  data with a particular structure.
- It allows users to easily customize the conversion of R classes to Python
  ones and vice versa.
  Does your data use custom R classes?
  Worry no longer, as it is possible to define custom conversions to the Python
  classes of your choosing.
- It has a permissive license (MIT). As opposed to other packages that depend
  on R libraries and thus need to adhere to the GPL license, you can use rdata
  as a dependency on MIT, BSD or even closed source projects.

Installation
============

Installing a stable release
---------------------------

The rdata package is on PyPI and can be installed using :code:`pip`:

.. code::

   pip install rdata

The package is also available for :code:`conda` using the :code:`conda-forge` channel:

.. code::

   conda install -c conda-forge rdata

Installing a development version
--------------------------------

The current version from the develop branch can be installed with:

.. code::

   pip install git+https://github.com/vnmabus/rdata.git@develop

Documentation
=============

The documentation of rdata is hosted on
`ReadTheDocs <https://rdata.readthedocs.io/>`__.

Examples
========

Usage examples are available in the ReadTheDocs documentation.

Citing rdata
============

If you find this software useful in your work, please cite the following paper:

.. code-block::

  @article{ramos-carreno+rossi_2024_rdata,
      author = {Ramos-Carreño, Carlos and Rossi, Tuomas},
      doi = {10.21105/joss.07540},
      journal = {Journal of Open Source Software},
      month = dec,
      number = {104},
      pages = {1--4},
      title = {{rdata: A Python library for R datasets}},
      url = {https://joss.theoj.org/papers/10.21105/joss.07540#},
      volume = {9},
      year = {2024}
  }

You can additionally cite the software repository itself using:

.. code-block::

  @misc{ramos-carreno++_2024_rdata-repo,
    author = {The rdata developers},
    doi = {10.5281/zenodo.6382237},
    month = dec,
    title = {rdata: A Python library for R datasets},
    url = {https://github.com/vnmabus/rdata},
    year = {2024}
  }

If you want to reference a particular version for reproducibility, check the version-specific DOIs available in Zenodo.

Usage
=====

Read an R dataset
-----------------

The common way of reading an rds file is:

.. code:: python

    import rdata

    converted = rdata.read_rds(rdata.TESTDATA_PATH / "test_dataframe.rds")
    print(converted)

which prints the dataframe that was read:

.. code:: none

      class  value
    1     a      1
    2     b      2
    3     b      3

The analogous rda file can be read in a similar way:

.. code:: python

    import rdata

    converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_dataframe.rda")
    print(converted)

which returns a dictionary mapping the variable name defined in the file (:code:`test_dataframe`) to the dataframe:

.. code:: none

    {'test_dataframe':   class  value
    1     a      1
    2     b      2
    3     b      3}
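
Because the result is a plain dictionary, each object stored in the file can be
retrieved by its R variable name. The following is a minimal sketch that uses
only the function and test file shown above; the variable name
:code:`test_dataframe` and the column names are taken from the printed output:

.. code:: python

    import rdata

    converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_dataframe.rda")

    # An rda file may define several variables, so iterate over all of them.
    for name, value in converted.items():
        print(name, type(value).__name__)

    # Look up one object by the name it had in the R session.
    df = converted["test_dataframe"]
    print(list(df.columns))
    print(df["value"].tolist())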

Under the hood, these reading functions are equivalent to the following two-step code:

.. code:: python

    import rdata

    parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_dataframe.rda")
    converted = rdata.conversion.convert(parsed)
    print(converted)

This consists of two steps:

#. First, the file is parsed using the function
   :code:`rdata.parser.parse_file`.
   This provides a literal description of the
   file contents as a hierarchy of Python objects representing the basic R
   objects. This step is unambiguous and always the same.
#. Then, each object must be converted to an appropriate Python object. In this
   step there are several choices about which Python type is the most
   appropriate conversion for a given R object. Thus, we provide a default
   :code:`rdata.conversion.convert`
   routine, which tries to select Python
   objects that preserve as much information from the original R object as
   possible. For custom R classes, it is also possible to specify conversion
   routines to Python objects, as exemplified in the documentation.
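
The custom conversion mentioned in the second step can be sketched roughly as
follows. This example assumes that :code:`rdata.conversion.convert` accepts a
:code:`constructor_dict` argument mapping R class names to callables of the
form :code:`(obj, attrs)`, and that :code:`rdata.conversion.DEFAULT_CLASS_MAP`
holds the default mapping. Neither name appears in this README, so check both
against the documentation of your installed version. :code:`RawFrame` is a
hypothetical wrapper defined only for illustration:

.. code:: python

    from dataclasses import dataclass
    from typing import Any

    import rdata


    @dataclass
    class RawFrame:
        """Hypothetical wrapper keeping the converted contents and R attributes."""

        contents: Any
        attributes: Any


    def raw_frame_constructor(obj, attrs):
        # Assumed signature: the already-converted contents and the R attributes.
        return RawFrame(contents=obj, attributes=attrs)


    # Assumption: DEFAULT_CLASS_MAP maps R class names to constructor callables.
    class_map = {
        **rdata.conversion.DEFAULT_CLASS_MAP,
        "data.frame": raw_frame_constructor,
    }

    parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_dataframe.rda")
    converted = rdata.conversion.convert(parsed, constructor_dict=class_map)

    print(type(converted["test_dataframe"]).__name__)

Extending the default mapping instead of replacing it keeps the stock
conversions for every other R class, so only ``data.frame`` objects are affected.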

Write an R dataset
------------------

The common way of writing data to an rds file is:

.. code:: python

    import pandas as pd
    import rdata

    df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
    print(df)

    rdata.write_rds("data.rds", df)

which prints the dataframe and writes it to the file :code:`data.rds`:

.. code:: none

      class  value
    0     a      1
    1     b      2
    2     b      3

Similarly, the dataframe can be written to an rda file with a given variable name:

.. code:: python

    import pandas as pd
    import rdata

    df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
    data = {"my_dataframe": df}
    print(data)

    rdata.write_rda("data.rda", data)

which prints the name-dataframe dictionary and writes it to the file :code:`data.rda`:

.. code:: none

    {'my_dataframe':   class  value
    0     a      1
    1     b      2
    2     b      3}

Under the hood, these writing functions are equivalent to the following two-step code:

.. code:: python

    import pandas as pd
    import rdata

    df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
    data = {"my_dataframe": df}

    r_data = rdata.conversion.convert_python_to_r_data(data, file_type="rda")
    rdata.unparser.unparse_file("data.rda", r_data, file_type="rda")

This consists of two steps (the reverse of reading):

#. First, each Python object is converted to an appropriate R object.
   As in reading, there are several choices, and the default
   :code:`rdata.conversion.convert_python_to_r_data`
   routine tries to select
   R objects that preserve as much information from the original Python object
   as possible.
   For Python classes, it is also possible to specify custom conversion routines
   to R classes, as exemplified in the documentation.
#. Then, the created RData representation is unparsed to a file using the function
   :code:`rdata.unparser.unparse_file`.
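
A quick way to check the writing path is to read the file back. The sketch
below uses only :code:`rdata.write_rds` and :code:`rdata.read_rds` as shown
earlier, and writes into a temporary directory (an arbitrary choice for this
example). It does not assume that the row index survives the round trip, so
only the column values are inspected:

.. code:: python

    import tempfile
    from pathlib import Path

    import pandas as pd
    import rdata

    df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "data.rds"

        # Python object -> R representation -> rds file.
        rdata.write_rds(path, df)

        # Read the file back while the temporary directory still exists.
        restored = rdata.read_rds(path)

    print(restored["class"].tolist())
    print(restored["value"].tolist())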


Additional examples
===================

Additional examples illustrating the functionalities of this package can be
found in the ReadTheDocs documentation.


.. |build-status| image:: https://github.com/vnmabus/rdata/actions/workflows/main.yml/badge.svg?branch=master
    :alt: build status
    :target: https://github.com/vnmabus/rdata/actions/workflows/main.yml

.. |docs| image:: https://readthedocs.org/projects/rdata/badge/?version=latest
    :alt: Documentation Status
    :target: https://rdata.readthedocs.io/en/latest/?badge=latest

.. |coverage| image:: http://codecov.io/github/vnmabus/rdata/coverage.svg?branch=develop
    :alt: Coverage Status
    :target: https://codecov.io/gh/vnmabus/rdata/branch/develop

.. |repostatus| image:: https://www.repostatus.org/badges/latest/active.svg
   :alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.
   :target: https://www.repostatus.org/#active

.. |versions| image:: https://img.shields.io/pypi/pyversions/rdata
   :alt: PyPI - Python Version

.. |pypi| image:: https://badge.fury.io/py/rdata.svg
    :alt: Pypi version
    :target: https://pypi.python.org/pypi/rdata/

.. |conda| image:: https://anaconda.org/conda-forge/rdata/badges/version.svg
    :alt: Conda version
    :target: https://anaconda.org/conda-forge/rdata

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.6382237.svg
    :alt: Zenodo DOI
    :target: https://doi.org/10.5281/zenodo.6382237

.. |pyOpenSci| image:: https://tinyurl.com/y22nb8up
    :alt: pyOpenSci: Peer reviewed
    :target: https://github.com/pyOpenSci/software-submission/issues/144

.. |joss| image:: https://joss.theoj.org/papers/10.21105/joss.07540/status.svg
   :target: https://doi.org/10.21105/joss.07540

Owner

  • Name: Carlos Ramos Carreño
  • Login: vnmabus
  • Kind: user
  • Location: Madrid, Spain

Software engineer and mathematician. PhD student in Machine Learning at Universidad Autónoma de Madrid.

JOSS Publication

rdata: A Python library for R datasets
Published
December 01, 2024
Volume 9, Issue 104, Page 7540
Authors
Carlos Ramos-Carreño ORCID
Universidad Autónoma de Madrid, Spain
Tuomas Rossi ORCID
CSC – IT Center for Science Ltd., Finland
Editor
Arfon Smith ORCID
Tags
R datasets rda rds

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Ramos-Carreño"
    given-names: "Carlos"
    orcid: "https://orcid.org/0000-0003-2566-7058"
    affiliation: "Universidad Autónoma de Madrid"
    email: vnmabus@gmail.com
title: "rdata: Read R datasets from Python"
date-released: 2022-03-24
doi: 10.5281/zenodo.6382237
url: "https://github.com/vnmabus/rdata"
license: MIT
keywords:
  - rdata
  - Python
  - R
  - parser
  - conversion
identifiers:
  - description: "This is the collection of archived snapshots of all versions of rdata"
    type: doi
    value: 10.5281/zenodo.6382237
  - description: "This is the archived snapshot of version 0.7 of rdata"
    type: doi
    value: 10.5281/zenodo.6382238
preferred-citation:
  type: article
  title: "rdata: A Python library for R datasets"
  authors:
    - family-names: "Ramos-Carreño"
      given-names: "Carlos"
      orcid: "https://orcid.org/0000-0003-2566-7058"
      affiliation: "Universidad Autónoma de Madrid"
      email: vnmabus@gmail.com
    - family-names: "Rossi"
      given-names: "Tuomas"
      orcid: "https://orcid.org/0000-0002-8713-4559"
      affiliation: "CSC - IT Center for Science Ltd."
  date-published: 2024-12-01
  abstract: "Research work usually requires the analysis and processing of data from different sources. Traditionally in statistical computing, the R language has been widely used for this task, and a huge amount of datasets have been compiled in the Rda and Rds formats, native to this programming language. As these formats contain internally the representation of R objects, they cannot be directly used from Python, another widely used language for data analysis and processing. The library rdata allows to load and convert these datasets to Python objects, without the need of exporting them to other intermediate formats which may not keep all the original information. This library has minimal dependencies, ensuring that it can be used in contexts where an R installation is not available. The capability to write data in Rda and Rds formats is also under development. Thus, the library rdata facilitates data interchange, enabling the usage of the same datasets in both languages (e.g. for reproducibility, comparisons of results against methods in both languages, or the creation of complex processing pipelines that involve steps in both R and Python)."
  doi: 10.21105/joss.07540
  institution:
    name: "Universidad Autónoma de Madrid"
  issn: "2475-9066"
  issue-date: "2024-12-01"
  journal: "Journal of Open Source Software"
  keywords:
    - "R"
    - "datasets"
    - "rda"
    - "rds"
  languages:
    - en
  license: CC-BY-4.0
  publisher:
    name: "The Open Journal"
  url: "https://joss.theoj.org/papers/10.21105/joss.07540#"
  volume: 9
  issue: 104
  start: 1
  end: 4

Papers & Mentions

Total mentions: 2

A global-local neighborhood search algorithm and tabu search for flexible job shop scheduling problem
Last synced: 2 months ago
SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis
Last synced: 2 months ago

GitHub Events

Total
  • Create event: 6
  • Release event: 1
  • Issues event: 10
  • Watch event: 9
  • Delete event: 6
  • Issue comment event: 30
  • Push event: 19
  • Pull request event: 12
  • Pull request review event: 15
  • Pull request review comment event: 19
  • Fork event: 1
Last Year
  • Create event: 6
  • Release event: 1
  • Issues event: 10
  • Watch event: 9
  • Delete event: 6
  • Issue comment event: 30
  • Push event: 19
  • Pull request event: 12
  • Pull request review event: 15
  • Pull request review comment event: 19
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 516
  • Total Committers: 2
  • Avg Commits per committer: 258.0
  • Development Distribution Score (DDS): 0.438
Past Year
  • Commits: 150
  • Committers: 2
  • Avg Commits per committer: 75.0
  • Development Distribution Score (DDS): 0.113
Top Committers
Name Email Commits
Tuomas Rossi t****i@c****i 290
VNMabus v****s@g****m 226
Committer Domains (Top 20 + Academic)
csc.fi: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 24
  • Total pull requests: 33
  • Average time to close issues: 3 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 17
  • Total pull request authors: 2
  • Average comments per issue: 4.46
  • Average comments per pull request: 1.3
  • Merged pull requests: 31
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 16
  • Average time to close issues: 2 days
  • Average time to close pull requests: about 1 month
  • Issue authors: 5
  • Pull request authors: 2
  • Average comments per issue: 2.6
  • Average comments per pull request: 0.88
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • vnmabus (4)
  • deeenes (2)
  • soheila-sahami (2)
  • trossi (2)
  • pulpdood (1)
  • Jorgelindo238 (1)
  • ag1805x (1)
  • zoj613 (1)
  • userLUX (1)
  • austinv11 (1)
  • schlegelp (1)
  • has2k1 (1)
  • r-re (1)
  • VolodyaCO (1)
  • rituparna-13 (1)
Pull Request Authors
  • vnmabus (24)
  • trossi (14)
Top Labels
Issue Labels
bug (5) enhancement (2) good first issue (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 31,039 last-month
  • Total docker downloads: 135
  • Total dependent packages: 8
    (may contain duplicates)
  • Total dependent repositories: 17
    (may contain duplicates)
  • Total versions: 25
  • Total maintainers: 1
pypi.org: rdata

Read R datasets from Python.

  • Documentation: https://rdata.readthedocs.io/
  • License: MIT License Copyright (c) 2018 Rdata developers. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 1.0.0
    published 4 months ago
  • Versions: 18
  • Dependent Packages: 8
  • Dependent Repositories: 17
  • Downloads: 31,039 Last month
  • Docker Downloads: 135
Rankings
Dependent packages count: 1.4%
Downloads: 2.0%
Average: 2.5%
Docker downloads count: 3.3%
Dependent repos count: 3.5%
Maintainers (1)
Last synced: 4 months ago
conda-forge.org: rdata

This package parses .rda datasets used in R. It does not depend on the R language or its libraries, and thus it is released under a MIT license.

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 44.9%
Average: 46.9%
Dependent packages count: 51.2%
Forks count: 57.4%
Last synced: 4 months ago

Dependencies

readthedocs-requirements.txt pypi
  • Sphinx >=3.1
  • sphinx_rtd_theme *
requirements.txt pypi
  • numpy *
  • pandas *
  • setuptools *
  • xarray *
setup.py pypi
  • numpy *
  • pandas *
  • xarray *