dirschema

Spec and validator for directories, files and metadata based on JSON Schema and regexes.

https://github.com/materials-data-science-and-informatics/dirschema

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary

Keywords

json-schema metadata python validation
Last synced: 4 months ago

Repository


Basic Info
Statistics
  • Stars: 8
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 1
Topics
json-schema metadata python validation
Created about 4 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Authors Codemeta

README.md


dirschema



A directory structure and metadata linter based on JSON Schema.

JSON Schema is great for validating JSON objects (or files containing them), e.g. objects holding metadata. However, these are only the smallest pieces in the organization of a whole directory structure, such as a dataset or project. Datasets of a certain kind may contain various types of data, and each file may require different accompanying metadata depending on its file type and/or location.

DirSchema combines JSON Schemas and regexes into a solution to enforce structural dependencies and metadata requirements in directories and directory-like archives. With it, you can, for example, check that:

  • only files of a certain type are in a location (e.g. only jpg files in directory img)
  • for each data file there exists a metadata file (e.g. test.jpg has test.jpg_meta.json)
  • each metadata file is valid according to some JSON Schema
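To make the first two example constraints concrete, here is a minimal plain-Python sketch that checks them over a list of relative file paths. It is only an illustration of what such rules mean and is not how DirSchema specifies or implements them (DirSchema uses YAML specification files). The function name `check_paths` and the `_meta.json` suffix convention are assumptions taken from the examples above; the third constraint (JSON Schema validation of metadata) is omitted.

```python
import re
from typing import List

# Hypothetical rules mirroring the examples above (NOT dirschema's API):
# 1. files under "img/" must be .jpg files
# 2. every data file "X" needs a metadata sidecar "X_meta.json"
IMG_JPG = re.compile(r"img/.*\.jpg")
META_SUFFIX = "_meta.json"


def check_paths(paths: List[str]) -> List[str]:
    """Return human-readable errors for a collection of relative file paths."""
    present = set(paths)
    errors = []
    for p in sorted(present):
        if p.endswith(META_SUFFIX):
            continue  # sidecar metadata files are exempt from both rules
        # Rule 1: anything under img/ must match the jpg regex
        if p.startswith("img/") and not IMG_JPG.fullmatch(p):
            errors.append(f"{p}: only .jpg files are allowed in img/")
        # Rule 2: each data file must have its metadata sidecar
        if p + META_SUFFIX not in present:
            errors.append(f"{p}: missing metadata file {p + META_SUFFIX}")
    return errors
```

For example, `check_paths(["img/a.jpg", "img/a.jpg_meta.json"])` yields no errors, while adding `"img/b.png"` produces two (wrong file type and missing sidecar).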

If validating these kinds of constraints looks appealing to you, this tool is for you!

DirSchema features:

  • Built-in support for schemas and metadata stored as JSON or YAML
  • Built-in support for checking contents of ZIP and HDF5 archives
  • Extensible validation interface for advanced needs beyond JSON Schema
  • Both a Python library and a CLI tool to perform the validation
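Checking a ZIP archive works because its entries can be treated like paths in a directory tree. The following sketch, using only Python's standard `zipfile` module, lists the file paths in an archive so that path-based rules can be applied to them. The helper name `archive_paths` is hypothetical, and this is not a description of how dirschema accesses archives internally.

```python
import zipfile
from typing import List


def archive_paths(zip_source) -> List[str]:
    """List regular-file paths inside a ZIP archive.

    zip_source may be a filesystem path or a binary file-like object.
    Directory entries (names ending in "/") are skipped.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

The returned list uses forward-slash separated relative paths, so the same regexes can be used whether the data lives in a folder or an archive.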

Installation

pip install dirschema

Getting Started

The dirschema tool needs as input:

  • a DirSchema YAML file (containing a specification), and
  • a path to a directory or file (e.g. zip file) that should be checked.

You can run it like this:

dirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH

If the validation was successful, there will be no output. Otherwise, the tool will output a list of errors (e.g. invalid metadata, missing files, etc.).

You can also use dirschema from other Python code as a library:

from dirschema.validate import DSValidator

# validate() returns a dict of errors, which is empty if validation succeeded
errors = DSValidator("/path/to/dirschema").validate("/dataset/path")

Like the CLI, the validate method reports problems: it returns an error dict, which is empty if the validation succeeded.

You can find more information on using and contributing to this repository in the documentation (https://materials-data-science-and-informatics.github.io/dirschema).

How to Cite

If you want to cite this project in your scientific work, please use the citation file (CITATION.cff) in the repository.

Acknowledgements

We kindly thank all authors and contributors.



This project was developed at the Institute for Materials Data Science and Informatics (IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration (HMC), an incubator platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.

Owner

  • Name: Materials Data Science and Informatics
  • Login: Materials-Data-Science-and-Informatics
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
type: software
message: If you use this software, please cite it using this metadata.

title: dirschema
version: 0.1.0
abstract: "Spec and validator for directories, files and metadata based on JSON Schema
  and regexes."
repository-code: https://github.com/Materials-Data-Science-and-Informatics/dirschema
license: MIT
keywords:
- jsonschema
- validation
- directory
- structure
- fair
- metadata
authors:
- affiliation: Forschungszentrum Jülich GmbH - Institute for Materials Data Science
    and Informatics (IAS9)
  email: a.pirogov@fz-juelich.de
  family-names: Pirogov
  given-names: Anton
  orcid: https://orcid.org/0000-0002-5077-7497
url: "https://materials-data-science-and-informatics.github.io/dirschema"
contact:
- orcid: https://orcid.org/0000-0002-5077-7497
  given-names: Anton
  email: a.pirogov@fz-juelich.de
  family-names: Pirogov

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "https://w3id.org/software-iodata",
    "https://raw.githubusercontent.com/jantman/repostatus.org/master/badges/latest/ontology.jsonld",
    "https://schema.org",
    "https://w3id.org/software-types"
  ],
  "@type": "SoftwareSourceCode",
  "applicationCategory": [
    "Scientific/Engineering",
    "Software Development"
  ],
  "audience": [
    {
      "@type": "Audience",
      "audienceType": "Developers"
    },
    {
      "@type": "Audience",
      "audienceType": "Science/Research"
    }
  ],
  "author": [
    {
      "@id": "https://orcid.org/0000-0002-5077-7497",
      "@type": "Person",
      "affiliation": {
        "@type": "Organization",
        "legalName": "Forschungszentrum Jülich GmbH - Institute for Materials Data Science and Informatics (IAS9)"
      },
      "familyName": "Pirogov",
      "givenName": "Anton"
    }
  ],
  "codeRepository": "https://github.com/Materials-Data-Science-and-Informatics/dirschema",
  "description": "Spec and validator for directories, files and metadata based on JSON Schema and regexes.",
  "developmentStatus": "https://www.repostatus.org/#wip",
  "identifier": "dirschema",
  "keywords": [
    "directory",
    "fair",
    "jsonschema",
    "metadata",
    "structure",
    "validation"
  ],
  "license": "http://spdx.org/licenses/MIT",
  "maintainer": {
    "@type": "Person",
    "email": "a.pirogov@fz-juelich.de",
    "familyName": "Pirogov",
    "givenName": "Anton"
  },
  "name": "dirschema",
  "operatingSystem": "OS Independent",
  "runtimePlatform": "Python 3",
  "softwareHelp": "https://materials-data-science-and-informatics.github.io/dirschema",
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "entrypoints",
      "name": "entrypoints",
      "runtimePlatform": "Python 3",
      "version": "^0.4"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "h5py",
      "name": "h5py",
      "runtimePlatform": "Python 3",
      "version": "^3.4.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jsonref",
      "name": "jsonref",
      "runtimePlatform": "Python 3",
      "version": "^0.2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jsonschema",
      "name": "jsonschema",
      "runtimePlatform": "Python 3",
      "version": "^4.4.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "numpy",
      "name": "numpy",
      "runtimePlatform": "Python 3",
      "version": "^1.21.2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "pydantic",
      "name": "pydantic",
      "runtimePlatform": "Python 3",
      "version": "^1.8.2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "python",
      "name": "python",
      "runtimePlatform": "Python 3",
      "version": "^3.8"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "ruamel.yaml",
      "name": "ruamel.yaml",
      "runtimePlatform": "Python 3",
      "version": "^0.17.16"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "typer",
      "name": "typer",
      "runtimePlatform": "Python 3",
      "version": "^0.9.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "typing-extensions",
      "name": "typing-extensions",
      "runtimePlatform": "Python 3",
      "version": "^4.5.0"
    }
  ],
  "targetProduct": {
    "@type": "CommandLineApplication",
    "executableName": "dirschema",
    "name": "dirschema",
    "runtimePlatform": "Python 3"
  },
  "url": "https://materials-data-science-and-informatics.github.io/dirschema",
  "version": "0.1.0"
}

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 14
  • Total pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.21
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • apirogov (10)
  • hofmannv (1)
  • mindleaving (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (5) idea (1) documentation (1) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 9 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 3
pypi.org: dirschema

Spec and validator for directories, files and metadata based on JSON Schema and regexes.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 9 Last month
Rankings
Dependent packages count: 7.5%
Stargazers count: 20.5%
Forks count: 30.2%
Average: 35.5%
Downloads: 49.4%
Dependent repos count: 69.8%
Maintainers (3)
Last synced: 5 months ago