dantro

dantro is a Python package to handle, transform, and visualize hierarchically structured data. Docs @ https://dantro.readthedocs.io — NOTE: This repository is a READ-ONLY-MIRROR of the actual development repository; for open issues and MRs, see there:

https://github.com/utopia-foss/dantro

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    9 of 16 committers (56.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.5%) to scientific vocabulary

Keywords

automation data-processing data-structures hdf5 matplotlib modelling visualization xarray yaml

Keywords from Contributors

data processing pipeline h5py utopia-project C++ complex systems complex-adaptive-systems simulation-framework
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

dantro: handle, transform, and visualize hierarchically structured data

dantro—from data and dentro (Greek for tree)—is a Python package that provides a uniform interface for hierarchically structured and semantically heterogeneous data. It is built around three main features:

  • data handling: loading heterogeneous data into a tree-like data structure and providing a uniform interface for it
  • data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation
  • data visualization: creating a visual representation of the processed data
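The first two features can be illustrated with a small standard-library sketch. It does not use dantro's API: the `Group` and `Lazy` classes and the example data are hypothetical and only show the idea of a tree with uniform path-based access and a transformation that is evaluated on demand.

```python
from typing import Any, Callable


class Group(dict):
    """A toy tree node: a dict whose values are sub-groups or leaf data."""

    def get_path(self, path: str) -> Any:
        """Resolve a slash-separated path such as 'sim/cfg/seed'."""
        node: Any = self
        for key in path.strip("/").split("/"):
            node = node[key]
        return node


class Lazy:
    """Wraps a zero-argument callable and evaluates it on first access only."""

    def __init__(self, func: Callable[[], Any]):
        self._func = func
        self._done = False
        self._value: Any = None

    def resolve(self) -> Any:
        if not self._done:
            self._value = self._func()
            self._done = True
        return self._value


# Heterogeneous leaves (a config mapping and a numeric series) in one tree
tree = Group(
    sim=Group(
        cfg={"seed": 42, "steps": 3},
        population=[10, 12, 15],
    )
)

# A transformation that is defined now but computed only when resolved
mean_population = Lazy(
    lambda: sum(tree.get_path("sim/population"))
    / len(tree.get_path("sim/population"))
)

print(tree.get_path("sim/cfg/seed"))
print(mean_population.resolve())
```

Both leaves are reached through the same `get_path` call regardless of their type, which is the kind of uniform interface the list above describes.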

Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations. Akin to a Continuous Integration pipeline, a data processing pipeline provides a uniform, consistent, and easily extensible infrastructure that contributes to more efficient and reproducible workflows. This can be beneficial especially in a scientific context, for instance when handling data that was generated by computer simulations.

dantro is meant to be integrated into projects and to be used to set up such a data processing pipeline. It is designed to be easily customizable to the requirements of the project it is integrated into, even if the involved data is hierarchically structured or semantically heterogeneous. Furthermore, it allows a configuration-based specification of all operations via YAML configuration files; the resulting pipeline can then be controlled entirely via these configuration files and without requiring code changes.
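As a rough illustration of a configuration-based specification, the following sketch runs a sequence of operations that is described purely as data. It is not dantro's implementation: the `OPERATIONS` registry, the `run_pipeline` function, and the configuration keys are hypothetical, and the configuration is written as a Python list where a real setup would parse it from a YAML file.

```python
from typing import Any, Callable, Dict, List

# Registry of available operations (hypothetical names)
OPERATIONS: Dict[str, Callable[..., Any]] = {
    "scale": lambda data, factor: [x * factor for x in data],
    "cumsum": lambda data: [sum(data[: i + 1]) for i in range(len(data))],
    "maximum": lambda data: max(data),
}


def run_pipeline(data: Any, config: List[Dict[str, Any]]) -> Any:
    """Apply the configured operations in order, passing results along."""
    result = data
    for step in config:
        func = OPERATIONS[step["operation"]]
        result = func(result, **step.get("kwargs", {}))
    return result


# Stand-in for the content of a parsed YAML configuration file
config = [
    {"operation": "scale", "kwargs": {"factor": 2}},
    {"operation": "cumsum"},
    {"operation": "maximum"},
]

print(run_pipeline([1, 2, 3], config))  # prints 12
```

Changing the behaviour of the pipeline then only requires editing `config`, not the code that executes it.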

The dantro package is open source software released under the LGPLv3+ license (see copyright notice below). It was developed alongside the Utopia project, but is an independent package.

We describe the motivation and scope of dantro in more detail in this publication in the Journal of Open Source Software. For more information on the package, its features, philosophy, and integration, please visit its documentation at dantro.readthedocs.io. If you encounter any issues with dantro or have suggestions or questions of any kind, please open an issue via the project page.

Installing dantro

The dantro package is available on the Python Package Index and via conda-forge.

If you are unsure which installation method works best for you, we recommend using conda.

Note that, to make full use of its features, dantro is meant to be integrated into your project and customized to its needs. Basic usage examples and an integration guide can be found in the package documentation.

Installation via conda

As a first step, install Anaconda or Miniconda, if you have not already done so. You can then use the following command to install dantro and its dependencies:

    $ conda install -c conda-forge dantro

Installation via pip

If you already have a Python installation on your system, you probably already have pip installed as well. To install dantro and its dependencies, invoke the following command:

    $ pip install dantro

In case the pip command is not available, follow these instructions to install it or switch to the conda-based installation. Note that if you have both Python 2 and Python 3 installed, you might have to use the pip3 command instead.

Dependencies

dantro is implemented and tested for Python >= 3.8 and depends on the following packages:

| Package Name | Minimum Version | Purpose |
| ------------ | --------------- | ------- |
| numpy | 1.21 | |
| xarray | 0.16.2 | For labelled N-dimensional arrays |
| dask | 2.10 | To work with large data |
| toolz | 0.10 | For dask.delayed |
| distributed | 2.10 | For distributed computing |
| scipy | 1.7.3 | As engine for NetCDF files |
| sympy | 1.7 | For symbolic math operations |
| h5py | 3.6 | For reading HDF5 datasets |
| matplotlib | 3.3 | For data visualization |
| seaborn | 0.11 | For advanced data visualization |
| networkx | 2.6 | For network visualization |
| ruamel.yaml | 0.16.12 | For parsing YAML configuration files |
| dill | 0.3.3 | For advanced pickling |
| paramspace | 2.5.6 | For dictionary- or YAML-based parameter spaces |
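To check an existing environment against some of these minimum versions, a sketch like the following may help. It is not part of dantro: the helper names are hypothetical, only a subset of the table is included, and the version comparison is deliberately simplified (it looks at leading numeric components only, whereas a dedicated version-parsing library would handle pre-releases properly).

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# A subset of the minimum versions listed in the table above
MINIMUM_VERSIONS = {
    "numpy": "1.21",
    "xarray": "0.16.2",
    "h5py": "3.6",
    "matplotlib": "3.3",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.21.0rc1' into (1, 21, 0), ignoring non-numeric suffixes."""
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a >= b


def check_installed(minimums: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None if it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


print(check_installed(MINIMUM_VERSIONS))
```

The printed report depends on the environment in which the script is run.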

Developing dantro

Installation for developers

To install versions that are not on PyPI, pip allows specifying a URL to a git repository:

    $ pip install git+<clone-link>@<some-branch-name>

Here, replace clone-link with the clone URL of this project and some-branch-name with the name of the branch that you want to install the package from (see the pip documentation for details). Alternatively, omit the @ and everything after it. If you do not have SSH keys available, use the HTTPS link.

If you would like to contribute to dantro (yeah!), you should clone the repository to a local directory:

    $ git clone <clone-link>

For development purposes, it makes sense to work in a specific virtual environment for dantro and install dantro in editable mode:

    $ python3 -m venv ~/.virtualenvs/dantro
    $ source ~/.virtualenvs/dantro/bin/activate
    (dantro) $ pip install -e ./dantro

Additional dependencies

For development purposes, the following additional packages are required.

| Package Name | Minimum Version | Purpose |
| ------------ | --------------- | ------- |
| pytest | 3.4 | Testing framework |
| pytest-cov | 2.5 | Coverage report |
| tox | 3.1 | Test environments |
| Sphinx | 4.* | Documentation generator |
| sphinx-book-theme | 0.2.* | Modern sphinx theme |
| pre-commit | 2.15 | For commit hooks |
| black | 22.3.0 | For code formatting |

To install these development-related dependencies, enter the virtual environment, navigate to the cloned repository, and perform the installation using:

    (dantro) $ cd dantro
    (dantro) $ pip install -e .[dev]

With these dependencies having been installed, make sure to set up the git hook that allows pre-commit to run before making a commit:

    (dantro) $ pre-commit install

The corresponding dependencies needed for the hooks will be installed automatically upon a first commit. For more information on commit hooks, see the commit hooks section below.

Testing framework

To assert correct functionality, tests are written alongside all features. The pytest and tox packages are used as testing frameworks.

All tests are carried out for Python 3.7 through 3.10 using the GitLab CI/CD and the newest versions of all dependencies. When merging to the master branch, dantro is additionally tested against the specified minimum versions.

Test coverage and pipeline status can be seen on the project page.

Running tests

To run all defined tests, call:

    (dantro) $ python -m pytest -v tests/ --cov=dantro --cov-report=term-missing

This also provides a coverage report, showing the lines that are not covered by the tests.

Alternatively, with tox, it is possible to select different python environments for testing. Given that the interpreter is available, the test for a specific environment can be carried out with the following command:

    (dantro) $ tox -e py37

Documentation

Locally building the documentation

To build dantro's documentation locally via Sphinx, install the required dependencies and invoke the make doc command:

    (dantro) $ cd doc
    (dantro) $ make doc

You can then view the documentation by opening the doc/_build/html/index.html file.

Note: Sphinx is configured such that warnings are treated as errors, which makes markup mistakes easier to detect. You can inspect the error logs gathered in the doc/build_errors.log file. For Python-related Sphinx referencing errors, see the doc/.nitpick-ignore file for exceptions.

GitLab Documentation Environment

When developing dantro and pushing to the feature branch, the build:doc job of the CI pipeline additionally creates a documentation preview. The result can either be downloaded from the job artifacts or the deployed GitLab environment.

Upon warnings or errors in the build, the job will exit with an orange warning sign. You can inspect the build_errors.log file via the exposed CI artifacts.

Commit hooks

To streamline dantro development, a number of automations are used which take care of code formatting and perform some basic checks. These automations are managed by pre-commit and are run when invoking git commit (hence the name).

If these so-called hooks detect a problem, they will display an error and you will not be able to commit just yet. Some of the hooks fix the error automatically (e.g. by removing whitespace); others require some manual action on your part. Either way, you will have to stage these changes manually (using git add, as usual). To check which changes were made by the hooks, use git diff.

Once you have applied the requested changes, invoke git commit anew. This will trigger the hooks again; with all issues resolved, they should now all pass and lead you to the usual commit message prompt.

The most notable hooks are:

  • black: The uncompromising code formatter
  • isort: Systematically sorts Python import statements

Both isort and black are configured in the pyproject.toml file. For the other hooks' configuration, see .pre-commit-config.yaml. All hooks are also being run in the GitLab CI/CD check:hooks job.

If you have trouble setting up the hooks or if they create erroneous results, please let us know.

Troubleshooting

Install test and/or documentation dependencies when using zsh

If you use a zsh terminal (default for macOS users since Catalina) and try to install extra requirements like the test and/or documentation dependencies, you will probably get an error similar to zsh: no matches found: .[test_deps]. This can be fixed by escaping the square brackets, i.e. writing .\[test_deps\] or .\[doc_deps\].
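Quoting the argument is an equivalent alternative to escaping each bracket. When an install command is assembled programmatically, Python's standard-library `shlex.quote` can produce a string that is safe to paste into zsh; the `pip_install_command` helper below is a hypothetical name used only for this sketch.

```python
import shlex


def pip_install_command(target: str) -> str:
    """Build an editable-install command with the target quoted for the shell."""
    return "pip install -e " + shlex.quote(target)


# The square brackets are wrapped in single quotes, so zsh will not
# try to expand them as a glob pattern.
print(pip_install_command(".[test_deps]"))  # prints pip install -e '.[test_deps]'
```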

Copyright

dantro is licensed under the GNU Lesser General Public License Version 3 or any later version.

Copyright Notice

dantro -- a python package for handling and plotting hierarchical data
Copyright (C) 2018 – 2022  dantro developers

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

A copy of the GNU General Public License Version 3, and the GNU Lesser General Public License Version 3 extending it, is distributed with the source code of this program; see COPYING and COPYING.LESSER, respectively.

Copyright Holders

The copyright holders of dantro are collectively referred to as dantro developers in the respective copyright notices and disclaimers.

dantro has been developed by (in alphabetical order):

  • Unai Fischer Abaigar
  • Benjamin Herdeanu
  • Daniel Lake
  • Yunus Sevinchan
  • Jeremias Traub
  • Julian Weninger

Contact the developers via: dantro-dev@iup.uni-heidelberg.de

Owner

  • Name: Utopia
  • Login: utopia-foss
  • Kind: organization
  • Email: utopia-dev@iup.uni-heidelberg.de
  • Location: Heidelberg, Germany

The Utopia project provides software for the modelling of complex and evolving systems. Visit https://utopia-project.org for more details.

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: "1.1.0"
message: If you use this software, please cite it using these metadata.

# JOSS publication information (also see: joss/paper.md )
title: "dantro: a Python package for handling, transforming, and visualizing hierarchically structured data"
doi: 10.21105/joss.02316
# TODO Add date-released, once available

authors:
  - given-names: Yunus
    family-names: Sevinchan
    orcid: https://orcid.org/0000-0003-3858-0904
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany

  - given-names: Benjamin
    family-names: Herdeanu
    orcid: https://orcid.org/0000-0001-6343-3004
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany

  - given-names: Jeremias
    family-names: Traub
    orcid: https://orcid.org/0000-0001-8911-6365
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany

# Other package-related metadata
version: v0.14
license: LGPL-3.0+
repository-code: https://gitlab.com/utopia-project/dantro
keywords:
  - Python
  - YAML
  - computer simulations
  - research software
  - data analysis
  - plotting
  - data processing pipeline
  - configuration-based interfaces
  - hierarchical data structures
  - HDF5

...

GitHub Events

Total
  • Push event: 8
  • Create event: 5
Last Year
  • Push event: 8
  • Create event: 5

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 1,069
  • Total Committers: 16
  • Avg Commits per committer: 66.813
  • Development Distribution Score (DDS): 0.598
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Yunus Sevinchan y****n@g****m 430
Yunus Sevinchan Y****n@g****m 413
Benjamin Herdeanu h****u@i****e 97
Yunus Sevinchan y****n@i****e 59
Jeremias Traub j****b@i****e 38
Julian Weninger j****r@u****h 7
Benjamin Herdeanu b****u@w****e 6
Lukas Riedel m****l@l****m 4
Julian Weninger j****r@i****e 3
Daniel Lake d****e@p****e 3
Daniel Lake d****e@i****e 2
Philipp S. Sommer p****r@h****e 2
Benjamin Herdeanu b****u@g****m 2
Mark Piper m****r@c****u 1
Unai Fischer u****r@i****e 1
Utopia Developers b****0@g****m 1

Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0