fairscape-cli

Data Validation and Packaging utility for sending evidence graphs to FAIRSCAPE

https://github.com/fairscape/fairscape-cli

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 4
Created about 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation

README.md

fairscape-cli

A utility for packaging objects and validating metadata for FAIRSCAPE.


Documentation: https://fairscape.github.io/fairscape-cli/

Features

fairscape-cli provides a command-line interface (CLI) for creating, managing, and publishing scientific data packages from the client side:

  • RO-Crate Management: Create and manipulate RO-Crate packages locally.
    • Initialize RO-Crates in new or existing directories.
    • Add data, software, and computation metadata.
    • Copy files into the crate structure alongside metadata registration.
  • Schema Handling: Define, infer, and validate data schemas (Tabular, HDF5).
    • Create schema definition files.
    • Add properties with constraints.
    • Infer schemas directly from data files.
    • Validate data files against specified schemas.
    • Register schemas within RO-Crates.
  • Data Import: Fetch data from external sources and convert them into RO-Crates.
    • Import NCBI BioProjects.
    • Convert Portable Encapsulated Projects (PEPs) to RO-Crates.
  • Build Artifacts: Generate derived outputs from RO-Crates.
    • Create detailed HTML datasheets summarizing crate contents.
    • Generate provenance evidence graphs (JSON and HTML).
  • Release Management: Organize multiple related RO-Crates into a cohesive release package.
    • Initialize a release structure.
    • Automatically link sub-crates and propagate metadata.
    • Build a top-level datasheet for the release.
  • Publishing: Publish RO-Crate metadata to external repositories.
    • Upload RO-Crate directories or zip files to Fairscape.
    • Create datasets on Dataverse instances.
    • Mint or update DOIs on DataCite.

Requirements

Python 3.8+

Installation

```console
$ pip install fairscape-cli
```

Command Overview

The CLI is organized into several top-level commands:

rocrate: Core local RO-Crate manipulation (create, add files/metadata).

schema: Operations on data schemas (create, infer, add properties, add to crate).

validate: Validate data against schemas.

import: Fetch external data into RO-Crate format (e.g., bioproject, pep).

build: Generate outputs from RO-Crates (e.g., datasheet, evidence-graph).

release: Manage multi-part RO-Crate releases (e.g., create, build).

publish: Publish RO-Crates to repositories (e.g., fairscape, dataverse, doi).

Use --help for details on any command or subcommand:

```console
$ fairscape-cli --help
$ fairscape-cli rocrate --help
$ fairscape-cli rocrate add --help
$ fairscape-cli schema create --help
```

Examples

Creating an RO-Crate

Create an RO-Crate in a specified directory:

```console
$ fairscape-cli rocrate create \
    --name "My Analysis Crate" \
    --description "RO-Crate containing analysis scripts and results" \
    --organization-name "My Org" \
    --project-name "My Project" \
    --keywords "analysis" \
    --keywords "python" \
    --author "Jane Doe" \
    --version "1.1.0" \
    ./my_analysis_crate
```
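When crates are created from a script rather than by hand, the same invocation can be assembled programmatically. The following is a minimal sketch and not part of fairscape-cli: `build_create_command` is a hypothetical helper that only builds the argument list from the flags shown in the command above, repeating `--keywords` once per keyword.

```python
import shlex
from typing import List, Mapping, Sequence


def build_create_command(
    crate_path: str,
    options: Mapping[str, str],
    keywords: Sequence[str] = (),
) -> List[str]:
    """Assemble the argument list for `fairscape-cli rocrate create`.

    Hypothetical helper: it does not run or validate anything.
    """
    argv = ["fairscape-cli", "rocrate", "create"]
    for flag, value in options.items():
        argv += [f"--{flag}", value]
    # The README repeats --keywords once per keyword.
    for keyword in keywords:
        argv += ["--keywords", keyword]
    argv.append(crate_path)
    return argv


command = build_create_command(
    "./my_analysis_crate",
    {
        "name": "My Analysis Crate",
        "description": "RO-Crate containing analysis scripts and results",
        "organization-name": "My Org",
        "project-name": "My Project",
        "author": "Jane Doe",
        "version": "1.1.0",
    },
    keywords=["analysis", "python"],
)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided fairscape-cli is installed and on the `PATH`.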

Initialize an RO-Crate in the current working directory:

```console
# Navigate to an empty directory first if desired
# mkdir my_analysis_crate && cd my_analysis_crate

$ fairscape-cli rocrate init \
    --name "My Analysis Crate" \
    --description "RO-Crate containing analysis scripts and results" \
    --organization-name "My Org" \
    --project-name "My Project" \
    --keywords "analysis" \
    --keywords "python"
```
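An RO-Crate is a directory whose root holds an `ro-crate-metadata.json` file, the same file the publish commands later in this README point at. As a rough orientation, the sketch below builds the generic skeleton defined by the RO-Crate 1.1 specification: a metadata descriptor plus a root dataset. It is an assumption that this resembles what fairscape-cli writes; the real file may use different identifiers (for example ARK GUIDs) and additional properties, and `minimal_ro_crate_metadata` is a hypothetical name.

```python
import json
from typing import Any, Dict, Sequence


def minimal_ro_crate_metadata(
    name: str, description: str, keywords: Sequence[str]
) -> Dict[str, Any]:
    """Return a generic RO-Crate 1.1 skeleton (not fairscape-cli's exact output)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file and the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: describes the crate directory itself.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "keywords": list(keywords),
                "hasPart": [],
            },
        ],
    }


metadata = minimal_ro_crate_metadata(
    "My Analysis Crate",
    "RO-Crate containing analysis scripts and results",
    ["analysis", "python"],
)
print(json.dumps(metadata, indent=2))
```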

Adding Content and Metadata to an RO-Crate

Use add to copy a file into the crate and record its metadata, or register to record the metadata only.

Add a dataset file and its metadata:

```console
$ fairscape-cli rocrate add dataset \
    --name "Raw Measurements" \
    --author "John Smith" \
    --version "1.0" \
    --date-published "2023-10-27" \
    --description "Raw sensor measurements from Experiment A." \
    --keywords "raw-data" \
    --keywords "sensors" \
    --data-format "csv" \
    --source-filepath "./source_data/measurements.csv" \
    --destination-filepath "data/measurements.csv" \
    ./my_analysis_crate
```

Add a software script file and its metadata:

```console
$ fairscape-cli rocrate add software \
    --name "Analysis Script" \
    --author "Jane Doe" \
    --version "1.1.0" \
    --description "Python script for processing raw measurements." \
    --keywords "analysis" \
    --keywords "python" \
    --file-format "py" \
    --source-filepath "./scripts/process_data.py" \
    --destination-filepath "scripts/process_data.py" \
    ./my_analysis_crate
```

Register computation metadata (metadata only):

```console
# Assuming the script and dataset were added previously and have GUIDs:
# Dataset GUID: ark:59852/dataset-raw-measurements-xxxx
# Software GUID: ark:59852/software-analysis-script-yyyy

$ fairscape-cli rocrate register computation \
    --name "Data Processing Run" \
    --run-by "Jane Doe" \
    --date-created "2023-10-27T14:30:00Z" \
    --description "Execution of the analysis script on the raw measurements." \
    --keywords "processing" \
    --used-dataset "ark:59852/dataset-raw-measurements-xxxx" \
    --used-software "ark:59852/software-analysis-script-yyyy" \
    --generated "ark:59852/dataset-processed-results-zzzz" \
    ./my_analysis_crate

# Note: You would typically register the generated dataset ('processed-results') separately.
```

Register dataset metadata (metadata only, file assumed present or external):

```console
$ fairscape-cli rocrate register dataset \
    --name "Processed Results" \
    --guid "ark:59852/dataset-processed-results-zzzz" \
    --author "Jane Doe" \
    --version "1.0" \
    --description "Processed results from the analysis script." \
    --keywords "results" \
    --data-format "csv" \
    --filepath "results/processed.csv" \
    --generated-by "ark:59852/computation-data-processing-run-wwww" \
    ./my_analysis_crate
```

Schema Management

Create a tabular schema definition file:

```console
$ fairscape-cli schema create \
    --name 'Measurement Schema' \
    --description 'Schema for raw sensor measurements' \
    --schema-type tabular \
    --separator ',' \
    --header true \
    ./measurement_schema.json
```

Add properties to the tabular schema file:

```console
# Add a string property (column 0)
$ fairscape-cli schema add-property string \
    --name 'Timestamp' \
    --index 0 \
    --description 'Measurement time (ISO8601)' \
    ./measurement_schema.json

# Add a number property (column 1)
$ fairscape-cli schema add-property number \
    --name 'Value' \
    --index 1 \
    --description 'Sensor reading' \
    --minimum 0 \
    ./measurement_schema.json
```

Infer a schema from an existing data file:

```console
$ fairscape-cli schema infer \
    --name "Inferred Results Schema" \
    --description "Schema inferred from processed results" \
    ./my_analysis_crate/results/processed.csv \
    ./processed_schema.json
```
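To give a feel for what inferring a schema from a tabular file involves, here is a minimal sketch using only the standard library. It is not fairscape-cli's inference logic: the function names, the three type labels, and the rule of picking the narrowest type that fits every non-empty value in a column are all assumptions made for illustration.

```python
import csv
import io
from typing import Callable, Dict, List


def _all_castable(values: List[str], caster: Callable[[str], object]) -> bool:
    """Return True if every value can be converted by `caster`."""
    for value in values:
        try:
            caster(value)
        except ValueError:
            return False
    return True


def infer_column_types(csv_text: str, separator: str = ",") -> Dict[str, str]:
    """Guess 'integer', 'number', or 'string' for each column of a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=separator))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    types: Dict[str, str] = {}
    for index, name in enumerate(header):
        # Ignore missing and empty cells when deciding on a type.
        values = [row[index] for row in data if index < len(row) and row[index] != ""]
        if values and _all_castable(values, int):
            types[name] = "integer"
        elif values and _all_castable(values, float):
            types[name] = "number"
        else:
            types[name] = "string"
    return types


sample = (
    "Timestamp,Value,Count\n"
    "2023-10-27T14:30:00Z,0.5,3\n"
    "2023-10-27T14:31:00Z,2,4\n"
)
inferred = infer_column_types(sample)
print(inferred)
```

Inferred types are only a starting point; constraints such as minimum values still have to be added by hand, as in the add-property commands above.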

Add an existing schema file to an RO-Crate:

```console
$ fairscape-cli schema add-to-crate \
    ./measurement_schema.json \
    ./my_analysis_crate
```

Validation

Validate a data file against a schema file:

```console
# Successful validation
$ fairscape-cli validate schema \
    --schema-path ./measurement_schema.json \
    --data-path ./my_analysis_crate/data/measurements.csv

# Example failure
$ fairscape-cli validate schema \
    --schema-path ./measurement_schema.json \
    --data-path ./source_data/measurements_invalid.csv
```
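Conceptually, validating a tabular file means checking every row against the column rules declared in the schema. The sketch below illustrates that idea for the two properties defined earlier (a string Timestamp at index 0 and a number Value at index 1 with a minimum of 0). It is an assumption-laden illustration, not fairscape-cli's validator: `ColumnRule`, `validate_rows`, and the error message wording are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnRule:
    """Hypothetical in-memory form of one schema property."""
    name: str
    index: int
    kind: str  # "string" or "number"
    minimum: Optional[float] = None


def validate_rows(
    csv_text: str,
    rules: List[ColumnRule],
    separator: str = ",",
    header: bool = True,
) -> List[str]:
    """Return a list of human-readable errors; an empty list means the data is valid."""
    errors: List[str] = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=separator)
    for line_number, row in enumerate(reader, start=1):
        if header and line_number == 1:
            continue  # skip the header row
        for rule in rules:
            if rule.index >= len(row):
                errors.append(f"line {line_number}: missing column {rule.name!r}")
                continue
            if rule.kind != "number":
                continue  # strings need no further checks in this sketch
            try:
                value = float(row[rule.index])
            except ValueError:
                errors.append(f"line {line_number}: {rule.name!r} is not a number")
                continue
            if rule.minimum is not None and value < rule.minimum:
                errors.append(
                    f"line {line_number}: {rule.name!r} is below minimum {rule.minimum}"
                )
    return errors


RULES = [
    ColumnRule("Timestamp", 0, "string"),
    ColumnRule("Value", 1, "number", minimum=0),
]
valid_errors = validate_rows("Timestamp,Value\n2023-10-27T14:30:00Z,0.5\n", RULES)
invalid_errors = validate_rows(
    "Timestamp,Value\n2023-10-27T14:30:00Z,-1\n2023-10-27T14:31:00Z,abc\n", RULES
)
print(valid_errors, invalid_errors)
```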

Importing Data

Import an NCBI BioProject into a new RO-Crate:

```console
$ fairscape-cli import bioproject \
    --accession PRJNA123456 \
    --author "Importer Name" \
    --output-dir ./bioproject_prjna123456_crate \
    --crate-name "Imported BioProject PRJNA123456"
```

Convert a PEP project to an RO-Crate:

```console
$ fairscape-cli import pep \
    ./path/to/my_pep_project \
    --output-path ./my_pep_rocrate \
    --crate-name "My PEP Project Crate"
```

Building Outputs

Generate an HTML datasheet for an RO-Crate:

```console
$ fairscape-cli build datasheet ./my_analysis_crate

# Output will be ./my_analysis_crate/ro-crate-datasheet.html by default
```

Generate a provenance graph for a specific item within the crate:

```console
# Assuming 'ark:59852/dataset-processed-results-zzzz' is the item of interest
$ fairscape-cli build evidence-graph \
    ./my_analysis_crate \
    ark:59852/dataset-processed-results-zzzz \
    --output-json ./my_analysis_crate/prov/results_prov.json \
    --output-html ./my_analysis_crate/prov/results_prov.html
```
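An evidence graph for an item is, in essence, everything reachable by following its provenance links backwards. The sketch below walks the links from the register examples earlier in this README (processed results generated by a computation that used a dataset and a script). The dictionary layout and the key names `generatedBy`, `usedDataset`, and `usedSoftware` are illustrative assumptions, not necessarily the property names fairscape-cli writes, and `upstream` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Hypothetical in-memory view of the provenance links, keyed by GUID.
PROVENANCE: Dict[str, Dict[str, List[str]]] = {
    "ark:59852/dataset-processed-results-zzzz": {
        "generatedBy": ["ark:59852/computation-data-processing-run-wwww"],
    },
    "ark:59852/computation-data-processing-run-wwww": {
        "usedDataset": ["ark:59852/dataset-raw-measurements-xxxx"],
        "usedSoftware": ["ark:59852/software-analysis-script-yyyy"],
    },
    "ark:59852/dataset-raw-measurements-xxxx": {},
    "ark:59852/software-analysis-script-yyyy": {},
}


def upstream(guid: str, graph: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return every GUID reachable from `guid`, in the order it is discovered."""
    seen: Set[str] = {guid}
    order: List[str] = []
    stack = [guid]
    while stack:
        current = stack.pop()
        for targets in graph.get(current, {}).values():
            for target in targets:
                if target not in seen:  # guards against cycles and repeats
                    seen.add(target)
                    order.append(target)
                    stack.append(target)
    return order


lineage = upstream("ark:59852/dataset-processed-results-zzzz", PROVENANCE)
for item in lineage:
    print(item)
```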

Release Management

Create the structure for a multi-part release:

```console
$ fairscape-cli release create \
    --name "My Big Release Q4 2023" \
    --description "Combined release of Experiment A and Experiment B crates" \
    --organization-name "My Org" \
    --project-name "Overall Project" \
    --keywords "release" \
    --keywords "experiment-a" \
    --keywords "experiment-b" \
    --version "2.0" \
    --author "Release Manager" \
    --publisher "My Org Publishing" \
    ./my_big_release

# Manually copy or move your individual RO-Crate directories (e.g., experiment_a_crate, experiment_b_crate)
# into the ./my_big_release directory now.
```

Build the release (link sub-crates, update metadata, generate datasheet):

```console
$ fairscape-cli release build ./my_big_release
```
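Before building, it can help to confirm that the sub-crates were copied into the release directory as intended. The sketch below lists the immediate subdirectories that contain an `ro-crate-metadata.json` file. It rests on the assumption that sub-crates sit directly under the release directory; how `release build` actually discovers them may differ. `find_sub_crates` is a hypothetical helper, and the directory names in the demo are illustrative.

```python
import tempfile
from pathlib import Path
from typing import List

METADATA_FILENAME = "ro-crate-metadata.json"


def find_sub_crates(release_dir: Path) -> List[Path]:
    """List immediate subdirectories of `release_dir` that hold a crate metadata file."""
    return sorted(
        child
        for child in release_dir.iterdir()
        if child.is_dir() and (child / METADATA_FILENAME).is_file()
    )


# Demo in a throwaway directory so nothing on disk is touched permanently.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp)
    for name in ("experiment_a_crate", "experiment_b_crate"):
        (release / name).mkdir()
        (release / name / METADATA_FILENAME).write_text("{}")
    (release / "notes").mkdir()  # no metadata file, so it is not a crate
    found = [path.name for path in find_sub_crates(release)]

print(found)
```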

Publishing

Upload an RO-Crate to Fairscape:

```console
# Ensure FAIRSCAPE_USERNAME and FAIRSCAPE_PASSWORD are set as environment variables or use options
$ fairscape-cli publish fairscape \
    --rocrate ./my_analysis_crate \
    --username <username> \
    --password <password>

# Works with either directories or zip files
$ fairscape-cli publish fairscape \
    --rocrate ./my_analysis_crate.zip \
    --username <username> \
    --password <password> \
    --api-url https://fairscape.example.edu/api
```
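Scripts that wrap the publish step often need the same "explicit option first, environment variable second" lookup. The sketch below shows one way to do that. It assumes the environment variables are spelled `FAIRSCAPE_USERNAME` and `FAIRSCAPE_PASSWORD`; `resolve_credentials` is a hypothetical helper and says nothing about how fairscape-cli resolves credentials internally.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed variable names; adjust if your installation documents different ones.
USERNAME_VAR = "FAIRSCAPE_USERNAME"
PASSWORD_VAR = "FAIRSCAPE_PASSWORD"


def resolve_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Prefer explicitly passed values, then fall back to environment variables."""
    user = username or env.get(USERNAME_VAR)
    secret = password or env.get(PASSWORD_VAR)
    if not user or not secret:
        raise ValueError(
            f"Set {USERNAME_VAR} and {PASSWORD_VAR} or pass both values explicitly"
        )
    return user, secret


# Example with an injected mapping instead of the real environment.
example_env = {USERNAME_VAR: "jane", PASSWORD_VAR: "s3cret"}
print(resolve_credentials(env=example_env)[0])
```

Accepting the environment as a parameter keeps the helper easy to test without touching real credentials.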

Publish RO-Crate metadata to Dataverse:

```console
# Ensure DATAVERSE_API_TOKEN is set as an environment variable or use --token
$ fairscape-cli publish dataverse \
    --rocrate ./my_analysis_crate/ro-crate-metadata.json \
    --url https://my.dataverse.instance.edu \
    --collection my_collection_alias \
    --token <token>
```

Mint a DOI using DataCite:

```console
# Ensure DATACITE_USERNAME and DATACITE_PASSWORD are set or use options
$ fairscape-cli publish doi \
    --rocrate ./my_analysis_crate/ro-crate-metadata.json \
    --prefix 10.1234 \
    --username MYORG.MYREPO \
    --password <password> \
    --event publish  # or 'register' for draft
```

Contribution

If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.

License

This project is licensed under the terms of the MIT license.

Owner

  • Name: fairscape
  • Login: fairscape
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: FAIRSCAPE CLI
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Maxwell Adam
    family-names: Levinson
    email: mal8ch@virginia.edu
    affiliation: University of Virginia
    orcid: 'https://orcid.org/0000-0003-0384-8499'
  - given-names: Sadnan
    family-names: Al Manir
    affiliation: University of Virginia
    email: ma3xy@virginia.edu
    orcid: 'https://orcid.org/0000-0003-4647-3877'
  - given-names: Timothy
    family-names: Clark
    email: twclark@virginia.edu
    affiliation: University of Virginia
    orcid: 'https://orcid.org/0000-0003-4060-7360'
repository-code: 'https://github.com/fairscape/fairscape-cli'
url: 'https://fairscape.net'
abstract: >-
  FAIRSCAPE is a FAIRness and AI-readiness service providing
  deep provenance graphs and data dictionaries with data
  element validation on uploaded data, software, and
  computations, with special reference to biomedical
  datasets. FAIRSCAPE provenance graphs are represented
  using the Evidence Graph Ontology, EVI, an OWL2
  representation derived from W3C PROV and specialized for
  biomedical research data. The FAIRSCAPE server is a
  cloud-ready environment that processes RO-Crate
  data+metadata packages produced by the FAIRSCAPE CLI- and
  GUI-based clients, registers them with persistent IDs,
  decomposes and registers their components, computes
  provenance graph entailments of each component, and
  integrates the provenance graphs. FAIRSCAPE server
  provides a web-based GUI for inspecting metadata,
  visualizing provenance graphs, and obtaining downloads
  packaged as RO-Crates. 


  FAIRSCAPE is supported by the U.S. National Institutes of
  Health Bridge2AI program under grants OT2OD032742
  [Bridge2AI: Cell Maps for AI (CM4AI) Data Generation
  Project] and OT2OD032701 [Bridge2AI: Patient-Focused
  Collaborative Hospital Repository Uniting Standards
  (CHoRUS) for Equitable AI],  and by the Frederick Thomas
  Fund of the University of Virginia.
keywords:
  - FAIR
  - Data Science
  - EVI
  - Ontology
  - Provenance
license: MIT

GitHub Events

Total
  • Release event: 3
  • Delete event: 2
  • Push event: 123
  • Pull request event: 9
  • Create event: 9
Last Year
  • Release event: 3
  • Delete event: 2
  • Push event: 123
  • Pull request event: 9
  • Create event: 9

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mlev71 (4)
  • coleslaw481 (2)
Pull Request Authors
  • mlev71 (11)
  • sadnanalmanir (3)
  • jniestroy (3)
  • pdurbin (1)
  • coleslaw481 (1)
Top Labels
Issue Labels
enhancement (4)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 350 last-month
  • Total dependent packages: 3
  • Total dependent repositories: 0
  • Total versions: 43
  • Total maintainers: 2
pypi.org: fairscape-cli

A utility for packaging objects and validating metadata for FAIRSCAPE

  • Homepage: https://github.com/fairscape/fairscape-cli
  • Documentation: https://fairscape.github.io/fairscape-cli/
  • License: Copyright 2023 THE RECTOR AND VISITORS OF THE UNIVERSITY OF VIRGINIA Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 1.1.6
    published 7 months ago
  • Versions: 43
  • Dependent Packages: 3
  • Dependent Repositories: 0
  • Downloads: 350 Last month
Rankings
Dependent packages count: 6.9%
Average: 18.7%
Dependent repos count: 30.5%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/python-test.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
Dockerfile docker
  • python 3.11.6-slim build
pyproject.toml pypi
  • click ^8.1.3
  • fairscape-models *
  • fairscape-models ^0.1.2
  • imageio ^2.27.0
  • mkdocs-material ^9.1.18
  • pandas ^2.0.0
  • prettytable ^3.7.0
  • pydantic *
  • pyld ^2.0.3
  • pyld *
  • python ^3.8
setup.py pypi
  • Click *
  • prettytable >=3.9.0
  • pydantic >=2.5.1