dantro
dantro: a Python package for handling, transforming, and visualizing hierarchically structured data - Published in JOSS (2020)
Science Score: 94.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README and JOSS metadata
- ○ Academic publication links
- ✓ Committers with academic emails: 9 of 20 committers (45.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
dantro is a Python package for handling, transforming, and visualizing hierarchically organized data. Integrated into data-intensive projects, it supplies an easy way to define a customizable, configuration-based data processing pipeline. See [utopya](https://gitlab.com/utopia-project/utopya) for an example.
Basic Info
- Host: gitlab.com
- Owner: utopia-project
- License: gpl-3.0+
- Default Branch: main
Statistics
- Stars: 7
- Forks: 1
- Open Issues: 91
- Releases: 0
Metadata Files
README.md
dantro: handle, transform, and visualize hierarchically structured data
dantro—from data and dentro (Greek for tree)—is a Python package that provides a uniform interface for hierarchically structured and semantically heterogeneous data.
It is built around three main features:
- data handling: loading heterogeneous data into a tree-like data structure and providing a uniform interface for it
- data transformation: performing arbitrary operations on the data, if necessary using lazy evaluation
- data visualization: creating a visual representation of the processed data
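To illustrate what a uniform interface over hierarchically structured, heterogeneous data can look like, here is a minimal standard-library sketch. The `Group` class and its "/"-separated path access are hypothetical and do not reproduce dantro's actual classes; they only show the idea of reaching differently typed leaves through one tree-like interface.

```python
from typing import Any, Dict


class Group:
    """Hypothetical tree node (not dantro's API): maps names to child groups or data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._children: Dict[str, Any] = {}

    def __setitem__(self, path: str, value: Any) -> None:
        # Create intermediate groups as needed, e.g. for "sim/run0/energy"
        head, _, rest = path.partition("/")
        if rest:
            child = self._children.setdefault(head, Group(head))
            child[rest] = value
        else:
            self._children[head] = value

    def __getitem__(self, path: str) -> Any:
        # Resolve a "/"-separated path one level at a time
        head, _, rest = path.partition("/")
        child = self._children[head]
        return child[rest] if rest else child

    def __contains__(self, path: str) -> bool:
        try:
            self[path]
        except (KeyError, TypeError):
            return False
        return True


root = Group("data")
root["sim/run0/energy"] = [1.0, 0.5, 0.25]  # a list as leaf
root["sim/run0/params"] = {"seed": 42}      # a dict as leaf
root["sim/run1/energy"] = [2.0, 1.0]
print(root["sim/run0/params"]["seed"])      # 42
```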
Together, these stages constitute a data processing pipeline: an automated sequence of predefined, configurable operations. Akin to a Continuous Integration pipeline, a data processing pipeline provides a uniform, consistent, and easily extensible infrastructure that contributes to more efficient and reproducible workflows. This can be beneficial especially in a scientific context, for instance when handling data that was generated by computer simulations.
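dantro's transformation framework is not reproduced here. As a rough illustration of the lazy evaluation mentioned for the transformation stage, the following sketch defers each operation in a small chain until its result is requested and caches it afterwards. `LazyNode` is a hypothetical name chosen for this example.

```python
from typing import Any, Callable, Optional


class LazyNode:
    """Hypothetical deferred computation: runs only when .compute() is called."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args
        self._result: Optional[Any] = None
        self._done = False
        self.n_evaluations = 0  # counts how often func actually ran

    def compute(self) -> Any:
        if not self._done:
            # Resolve upstream nodes first, then apply this node's operation
            resolved = [a.compute() if isinstance(a, LazyNode) else a for a in self.args]
            self._result = self.func(*resolved)
            self._done = True
            self.n_evaluations += 1
        return self._result


# Defining the chain computes nothing yet
raw = LazyNode(lambda: [3, 1, 2])
sorted_data = LazyNode(sorted, raw)
total = LazyNode(sum, sorted_data)

print(total.compute())  # 6, triggers evaluation of the whole chain
print(total.compute())  # 6 again, served from the cached result
```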
dantro is meant to be integrated into projects and to be used to set up such a data processing pipeline.
It is designed to be easily customizable to the requirements of the project it is integrated into, even if the involved data is hierarchically structured or semantically heterogeneous.
Furthermore, it allows a configuration-based specification of all operations via YAML configuration files; the resulting pipeline can then be controlled entirely via these configuration files and without requiring code changes.
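The general idea of controlling a pipeline entirely through configuration can be sketched as follows: operations are looked up by name in a registry, so editing the configuration changes what happens without touching code. The registry, the `run_pipeline` function, and the config keys are hypothetical and are not dantro's configuration schema; a plain dict stands in for a parsed YAML file.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of named operations that a configuration may refer to
OPERATIONS: Dict[str, Callable[..., Any]] = {
    "scale": lambda data, factor: [x * factor for x in data],
    "clip": lambda data, upper: [min(x, upper) for x in data],
    "mean": lambda data: sum(data) / len(data),
}


def run_pipeline(data: Any, steps: List[Dict[str, Any]]) -> Any:
    """Apply each configured step in order; the config alone decides what happens."""
    for step in steps:
        op = OPERATIONS[step["operation"]]
        data = op(data, **step.get("kwargs", {}))
    return data


# This dict stands in for the content of a parsed YAML configuration file
config = {
    "steps": [
        {"operation": "scale", "kwargs": {"factor": 2}},
        {"operation": "clip", "kwargs": {"upper": 5}},
        {"operation": "mean"},
    ]
}

result = run_pipeline([1, 2, 3, 4], config["steps"])
print(result)  # 4.0
```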
The dantro package is open source software released under the LGPLv3+ license (see copyright notice below).
It was developed alongside the Utopia project, but is an independent package.
We describe the motivation and scope of dantro in more detail in this publication in the Journal of Open Source Software.
For more information on the package, its features, philosophy, and integration, please visit its documentation at dantro.readthedocs.io.
If you encounter any issues with dantro or have suggestions or questions of any kind, please open an issue via the project page.
Installing dantro
The dantro package is available on the Python Package Index and via conda-forge.
If you are unsure which installation method works best for you, we recommend using conda.
Note that, in order to make full use of dantro's features, it is meant to be integrated into your project and customized to its needs.
Basic usage examples and an integration guide can be found in the package documentation.
Installation via conda
As a first step, install Anaconda or Miniconda, if you have not already done so. You can then use the following command to install dantro and its dependencies:
```bash
$ conda install -c conda-forge dantro
```
Installation via pip
If you already have a Python installation on your system, you probably already have pip installed as well.
To install dantro and its dependencies, invoke the following command:
```bash
$ pip install dantro
```
In case the pip command is not available, follow these instructions to install it or switch to the conda-based installation.
Note that if you have both Python 2 and Python 3 installed, you might have to use the pip3 command instead.
Dependencies
dantro is implemented and tested for Python >= 3.9 and depends on the following packages:
| Package Name | Purpose |
| ------------ | ------- |
| numpy | For fast and versatile array operations |
| xarray | For labelled N-dimensional arrays |
| scipy | As engine for NetCDF files |
| sympy | For symbolic math operations |
| dask | To work with large data |
| toolz | For dask.delayed |
| distributed | For distributed computing |
| h5py | For reading HDF5 datasets |
| netCDF4 | netCDF4 backend |
| matplotlib | For data visualization |
| seaborn | For advanced data visualization |
| networkx | For network visualization |
| dill | For advanced pickling |
| ruamel.yaml | For parsing YAML configuration files |
| paramspace | For dictionary- or YAML-based parameter spaces |
| yayaml | Working conveniently with YAML files |
Unless specified otherwise, dantro does not impose lower or upper bounds on package versions. In effect, dantro is expected to work with the latest versions of all dependencies, and scheduled CI jobs make sure that this combination keeps working.
In case you have trouble with dependencies, make sure you have the most recent version of dantro installed.
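As a quick diagnostic before reporting a dependency problem, the snippet below checks the running interpreter against the Python >= 3.9 requirement stated above and reports which dantro version is installed, using only the standard library's `importlib.metadata`. The helper names are illustrative, not part of dantro.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 9)  # minimum version stated in the dependency section


def python_supported(version_info=sys.version_info) -> bool:
    """Hypothetical helper: True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("dantro version:", installed_version("dantro"))
```

Compare the reported version with the latest release listed on PyPI or conda-forge.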
Developing dantro
Installation for developers
To install versions that are not on PyPI, pip allows specifying a URL to a git repository:
```bash
$ pip install git+<clone-link>@<some-branch-name>
```
Here, replace clone-link with the clone URL of this project and some-branch-name with the name of the branch that you want to install the package from (see the pip documentation for details).
Alternatively, omit the @ and everything after it to install from the repository's default branch.
If you do not have SSH keys available, use the HTTPS link.
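If you script such installs, for example in a CI helper, the requirement string and the pip invocation can be assembled as sketched below. The helper names are hypothetical; `<clone-link>` and `<some-branch-name>` are the same placeholders as above and must be substituted before running.

```python
import sys
from typing import List, Optional


def git_requirement(clone_link: str, branch: Optional[str] = None) -> str:
    """Hypothetical helper: build a pip requirement like 'git+<clone-link>@<branch>'."""
    spec = f"git+{clone_link}"
    # Without a branch, pip installs from the repository's default branch
    return f"{spec}@{branch}" if branch else spec


def pip_install_command(requirement: str) -> List[str]:
    """Hypothetical helper: use the running interpreter's pip, so the package
    lands in the currently active environment."""
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    cmd = pip_install_command(git_requirement("<clone-link>", "<some-branch-name>"))
    print(" ".join(cmd))
    # Pass cmd to subprocess.run(cmd, check=True) to actually install
```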
If you would like to contribute to dantro (yeah!), you should clone the repository to a local directory:
```bash
$ git clone <clone-link>
```
For development purposes, it makes sense to work in a specific virtual environment for dantro and install dantro in editable mode:
```bash
$ python3 -m venv ~/.virtualenvs/dantro
$ source ~/.virtualenvs/dantro/bin/activate
(dantro) $ pip install -e ./dantro
```
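The same kind of environment can also be created from Python with the standard library's `venv` module, which is what `python3 -m venv` runs. This is a minimal sketch with a hypothetical `create_env` helper; the demo skips installing pip for speed, so pass `with_pip=True` for an environment in which you can run `pip install -e`.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Hypothetical helper: create a virtual environment and return its Python executable."""
    venv.create(env_dir, with_pip=with_pip)
    # The executable's location differs between Windows and POSIX systems
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Throwaway location for the demo; the shell example uses ~/.virtualenvs/dantro
    with tempfile.TemporaryDirectory() as tmp:
        python = create_env(Path(tmp) / "dantro", with_pip=False)
        print(python, python.exists())
```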
Additional dependencies
For development purposes, the following additional packages are required.
| Package Name | Minimum Version | Purpose |
| ------------ | --------------- | ------- |
| pytest | 3.4 | Testing framework |
| pytest-cov | 2.5 | Coverage report |
| tox | 3.1 | Test environments |
| Sphinx | 5 | Documentation generator |
| sphinx-book-theme | | Modern sphinx theme |
| pre-commit | 2.15 | For commit hooks |
| black | 24.1.0 | For code formatting |
To install these development-related dependencies, enter the virtual environment, navigate to the cloned repository, and perform the installation using:
```bash
(dantro) $ cd dantro
(dantro) $ pip install -e .[dev]
```
Once these dependencies are installed, set up the git hook that lets pre-commit run before each commit:
```bash
(dantro) $ pre-commit install
```
The dependencies needed for the hooks will be installed automatically upon the first commit. For more information on commit hooks, see the commit hooks section below.
Testing framework
To ensure correct functionality, tests are written alongside all features.
The pytest and tox packages are used as testing frameworks.
All tests are run for Python versions 3.9 to 3.13 via GitLab CI/CD, using the newest versions of all dependencies as resolved by pip.
Test coverage and pipeline status can be seen on the project page.
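pytest collects functions whose names start with `test_` from files named `test_*.py` and reports a failure when a plain `assert` does not hold. The module below shows that convention for a hypothetical helper; neither the file nor the helper is taken from dantro's test suite.

```python
# Hypothetical file: tests/test_paths.py


def join_path(*segments: str) -> str:
    """Hypothetical helper: join non-empty segments with '/'."""
    return "/".join(s.strip("/") for s in segments if s.strip("/"))


def test_join_path_basic():
    assert join_path("sim", "run0", "energy") == "sim/run0/energy"


def test_join_path_ignores_empty_and_slashes():
    assert join_path("/sim/", "", "run0/") == "sim/run0"
```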
Running tests
To run all defined tests, call:
```bash
(dantro) $ python -m pytest -v tests/ --cov=dantro --cov-report=term-missing
```
This also provides a coverage report, showing the lines that are not covered by the tests.
Alternatively, tox makes it possible to select different Python environments for testing.
Provided the corresponding interpreter is available, the tests for a specific environment can be carried out with the following command:
```bash
(dantro) $ tox -e py312
```
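tox environment names such as `py312` follow the pattern `py<major><minor>`. To target the interpreter you are currently running, the name can be derived as in this small sketch; the `tox_env_for` helper is hypothetical.

```python
import sys


def tox_env_for(version_info=sys.version_info) -> str:
    """Hypothetical helper: tox environment name like 'py312' for a Python version."""
    major, minor = version_info[:2]
    return f"py{major}{minor}"


if __name__ == "__main__":
    print(f"tox -e {tox_env_for()}")
```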
Documentation
Locally building the documentation
To build dantro's documentation locally via Sphinx, first install the required dependencies and invoke the make doc command:
```bash
(dantro) $ pip install .[dev]
(dantro) $ cd doc
(dantro) $ DANTRO_DOC_GENERATE_FIGURES=True make doc
```
The DANTRO_DOC_GENERATE_FIGURES environment variable controls whether figures will be built.
This only needs to happen once; the figures are stored in the doc/_static/_gen directory.
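A boolean switch like `DANTRO_DOC_GENERATE_FIGURES` is typically read from the environment and interpreted as true or false. The sketch below shows one common way to do this in Python; the accepted spellings and the `env_flag` helper are assumptions, not the logic of dantro's documentation build.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings that count as "enabled"


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True if environment variable `name` holds a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


if __name__ == "__main__":
    if env_flag("DANTRO_DOC_GENERATE_FIGURES"):
        print("Figures would be (re)generated")
    else:
        print("Previously generated figures would be reused")
```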
Optionally, you can invoke make doctest and make linkcheck to run documentation tests and a check of the validity of all links in the docs.
The documentation can then be viewed by opening the doc/_build/html/index.html file.
Note: Sphinx is configured such that warnings will be regarded as errors, making detection of markup mistakes easier.
You can inspect the error logs gathered in the doc/build_errors.log file.
For Python-related Sphinx referencing errors, see the doc/.nitpick-ignore file for exceptions.
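To get a quick overview of a long build_errors.log, its entries can be tallied by severity. This sketch assumes Sphinx's usual `<location>: WARNING: <message>` and `ERROR:` line format; the `tally_severities` helper is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

# Matches e.g. "doc/api.rst:12: WARNING: undefined label: foo"
SEVERITY = re.compile(r":\s*(WARNING|ERROR|CRITICAL):")


def tally_severities(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count log lines by Sphinx severity keyword."""
    counts: Counter = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    log = Path("doc/build_errors.log")
    if log.exists():
        print(tally_severities(log.read_text().splitlines()))
```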
GitLab Documentation Environment
When developing dantro and pushing to a feature branch, the build:doc job of the CI pipeline additionally creates a documentation preview.
The result can either be downloaded from the job artifacts or the deployed GitLab environment.
Upon warnings or errors in the build, the job will exit with an orange warning sign.
You can inspect the build_errors.log file via the exposed CI artifacts.
Commit hooks
To streamline dantro development, a number of automations are used which take care of code formatting and perform some basic checks.
These automations are managed by pre-commit and are run when invoking git commit (hence the name).
If these so-called hooks determine a problem, they will display an error and you will not be able to commit just yet.
Some of the hooks fix the problem automatically (e.g., by removing whitespace), while others require manual action on your part.
Either way, you will have to stage these changes manually (using git add, as usual).
To check which changes were made by the hooks, use git diff.
Once you have applied the requested changes, invoke git commit anew.
This will again trigger the hooks, but with all issues resolved, the hooks should now pass and lead you to the usual commit message prompt.
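The fix-then-re-stage cycle follows from how pre-commit treats hooks: a hook fails if it exits with a non-zero status or modifies files. The hypothetical hook below, modelled loosely on a trailing-whitespace fixer, shows both behaviours; it is an illustration, not one of the hooks configured for dantro.

```python
from pathlib import Path
from typing import List


def fix_trailing_whitespace(path: Path) -> bool:
    """Hypothetical hook logic: strip trailing whitespace in place; True if the file changed."""
    original = path.read_text()
    fixed = "".join(
        line.rstrip() + ("\n" if line.endswith("\n") else "")
        for line in original.splitlines(keepends=True)
    )
    if fixed == original:
        return False
    path.write_text(fixed)
    return True


def main(filenames: List[str]) -> int:
    """Hypothetical entry point: pre-commit passes the staged file names as arguments."""
    changed = False
    for name in filenames:
        if fix_trailing_whitespace(Path(name)):
            print(f"Fixing {name}")
            changed = True
    # A non-zero return value marks the hook as failed, so the commit is aborted
    return 1 if changed else 0


# As a standalone hook script, this would end with:
#     raise SystemExit(main(sys.argv[1:]))
```

Running the hook a second time on the fixed file returns 0, which mirrors why the second git commit goes through.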
The most notable hooks include isort (for sorting imports) and black (for code formatting).
Both isort and black are configured in the pyproject.toml file.
For the other hooks' configuration, see .pre-commit-config.yaml.
All hooks are also being run in the GitLab CI/CD check:hooks job.
If you have trouble setting up the hooks or if they create erroneous results, please let us know.
Troubleshooting
Install test and/or documentation dependencies when using zsh
If you use a zsh terminal (default for macOS users since Catalina) and try to install extra requirements like the test and/or documentation dependencies, you will probably get an error similar to zsh: no matches found: .[test_deps].
This can be fixed by escaping the square brackets, i.e. writing .\[test_deps\] or .\[doc_deps\].
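Quoting the argument, as in `pip install -e '.[dev]'`, works as well. When such a command is assembled from Python, for instance in a helper script, the standard library can handle the quoting: `shlex.quote` wraps arguments containing shell metacharacters such as `[` in single quotes, and passing an argument list to `subprocess` bypasses the shell and its globbing entirely. The helper names below are hypothetical.

```python
import shlex
import sys
from typing import List


def install_extras_args(extras: str, path: str = ".") -> List[str]:
    """Hypothetical helper: argument list for an editable install with extras."""
    # Passed as a list to subprocess.run, no shell is involved, so zsh globbing cannot interfere
    return [sys.executable, "-m", "pip", "install", "-e", f"{path}[{extras}]"]


def as_shell_command(args: List[str]) -> str:
    """Hypothetical helper: render arguments as a string safe to paste into zsh or bash."""
    return " ".join(shlex.quote(a) for a in args)


print(as_shell_command(["pip", "install", "-e", ".[test_deps]"]))
# → pip install -e '.[test_deps]'
```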
Copyright
dantro is licensed under the GNU Lesser General Public License Version 3 or any later version.
Copyright Notice
dantro -- a python package for handling and plotting hierarchical data
Copyright (C) 2018 – 2025 dantro developers
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
A copy of the GNU General Public License Version 3, and the GNU Lesser General Public License Version 3 extending it, is distributed with the source code of this program; see COPYING and COPYING.LESSER, respectively.
Copyright Holders
The copyright holders of dantro are collectively referred to as dantro developers in the respective copyright notices and disclaimers.
Maintainers:
- Thomas Gaskin (@tgaskin)
- Benjamin Herdeanu (@herdeanu)
- Yunus Sevinchan (@blsqr)
- Jeremias Traub (@jeremiastraub)
Contributors (in alphabetical order):
- Unai Fischer Abaigar
- Daniel Lake
- Lukas Riedel (@peanutfun)
- Julian Weninger (@julianweninger)
Contact the maintainers via: dantro-dev@iup.uni-heidelberg.de
Owner
- Name: utopia
- Login: utopia-project
- Kind: organization
- Repositories: 7
- Profile: https://gitlab.com/utopia-project
The Utopia Project provides software for the modelling of complex and adaptive systems. Its key component is the Utopia modelling framework. For more information, see [utopia-project.org](https://utopia-project.org).
JOSS Publication
dantro: a Python package for handling, transforming, and visualizing hierarchically structured data
Tags: YAML, computer simulations, research software, data analysis, plotting, data processing pipeline, configuration-based interfaces, hierarchical data structures, HDF5
Citation (CITATION.cff)
```yaml
# YAML 1.2
---
cff-version: "1.1.0"
message: If you use this software, please cite it using these metadata.
# JOSS publication information (also see: joss/paper.md )
title: "dantro: a Python package for handling, transforming, and visualizing hierarchically structured data"
doi: 10.21105/joss.02316
# TODO Add date-released, once available
authors:
  - given-names: Yunus
    family-names: Sevinchan
    orcid: https://orcid.org/0000-0003-3858-0904
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany
  - given-names: Benjamin
    family-names: Herdeanu
    orcid: https://orcid.org/0000-0001-6343-3004
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany
  - given-names: Jeremias
    family-names: Traub
    orcid: https://orcid.org/0000-0001-8911-6365
    affiliation: Institute of Environmental Physics, Heidelberg University, Germany
# Other package-related metadata
version: v0.14
license: LGPL-3.0+
repository-code: https://gitlab.com/utopia-project/dantro
keywords:
  - Python
  - YAML
  - computer simulations
  - research software
  - data analysis
  - plotting
  - data processing pipeline
  - configuration-based interfaces
  - hierarchical data structures
  - HDF5
...
```
Committers
Last synced: 4 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yunus Sevinchan | y****n@g****m | 642 |
| Yunus Sevinchan | Y****n@g****m | 413 |
| Benjamin Herdeanu | h****u@i****e | 97 |
| Yunus Sevinchan | y****n@i****e | 59 |
| Jeremias Traub | j****b@i****e | 39 |
| Thomas Gaskin | t****n@l****k | 13 |
| Yunus Sevinchan | b****r@p****m | 9 |
| Julian Weninger | j****r@u****h | 7 |
| Benjamin Herdeanu | b****u@w****e | 6 |
| Lukas Riedel | m****l@l****m | 4 |
| Daniel Lake | d****e@p****e | 3 |
| Julian Weninger | j****r@i****e | 3 |
| Lukas Riedel | 2****n@u****m | 3 |
| Benjamin Herdeanu | b****u@g****m | 2 |
| Daniel Lake | d****e@i****e | 2 |
| Philipp S. Sommer | p****r@h****e | 2 |
| Julian Weninger | w****n@g****m | 1 |
| Mark Piper | m****r@c****u | 1 |
| Unai Fischer | u****r@i****e | 1 |
| Utopia Developers | b****0@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
Packages
- Total packages: 2
- Total downloads: pypi 870 last month
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 2 (may contain duplicates)
- Total versions: 90
- Total maintainers: 2
pypi.org: dantro
Handle, transform, and visualize hierarchically structured data
- Homepage: https://gitlab.com/utopia-project/dantro
- Documentation: https://dantro.readthedocs.io/
- License: LGPL-3.0-or-later
- Latest release: 0.21.0 (published 4 months ago)
conda-forge.org: dantro
dantro is a Python package for handling, transforming, and visualizing hierarchically organized data. Integrated into data-intensive projects, it supplies an easy way to define a customizable, configuration-based data processing pipeline.
- Homepage: https://gitlab.com/utopia-project/dantro
- License: LGPL-3.0-or-later
- Latest release: 0.18.9 (published about 3 years ago)
