reproschema

A standardized form generation and data collection schema to harmonize results by design across projects.

https://github.com/repronim/reproschema

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.0%) to scientific vocabulary

Keywords

clinical-assessments interoperability schema standardizing-assessments

Repository

Basic Info
Statistics
  • Stars: 17
  • Watchers: 12
  • Forks: 24
  • Open Issues: 20
  • Releases: 5
Created over 7 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md

ReproSchema: Enhancing Research Reproducibility through Standardized Survey Data Collection

Introduction

ReproSchema is a standardized framework for creating, sharing, and reusing cognitive and clinical assessments. It addresses the lack of consistency in assessment data acquisition across studies by providing a common schema that captures relationships between questionnaire elements from the start.

Key Benefits:

  • 📊 Rich Context: JSON-LD format provides semantic relationships rather than flat CSV files
  • 🔄 Version Control: Track different versions of questionnaires (e.g., PHQ-9, PHQ-8)
  • 🌍 Internationalization: Built-in support for multiple languages
  • 🔗 Persistent Identifiers: Unique IDs for items, activities, and protocols
  • ✅ Validation: Schema validation using SHACL ensures data quality
  • 🚀 Implementation Agnostic: Use with any software platform

Quick Start

Get started with ReproSchema in minutes:

```bash
# Install the ReproSchema Python package
pip install reproschema

# Validate an example schema
reproschema validate examples/protocols/protocol1.jsonld

# Create a new protocol from template (requires cookiecutter)
pip install cookiecutter
cookiecutter https://github.com/ReproNim/reproschema-protocol-cookiecutter
```

Prerequisites

Before using ReproSchema, ensure you have:

  • Python 3.8+: Required for the reproschema-py tools
  • Git: For version control and cloning repositories
  • Text Editor: Preferably with JSON/JSON-LD support (e.g., VS Code)
  • Basic JSON Knowledge: Understanding of JSON syntax

Optional but recommended:

  • GitHub Account: For hosting and sharing your schemas
  • Node.js: If using the reproschema-ui interface

Installation

Installing the Python Package

The easiest way to work with ReproSchema is through the Python package:

```bash
# Using pip
pip install reproschema

# Using pip with specific version
pip install reproschema==1.0.0

# For development (with latest changes)
pip install git+https://github.com/ReproNim/reproschema-py.git
```

Cloning the Repository

To access examples and contribute to the schema:

```bash
git clone https://github.com/ReproNim/reproschema.git
cd reproschema
```

Usage Examples

Creating a Simple Item

An item represents a single question in a questionnaire. Here's a basic example:

```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Field",
  "@id": "age_item",
  "prefLabel": "Age",
  "description": "Participant's age in years",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "inputType": "number"
  },
  "responseOptions": {
    "valueType": "xsd:integer",
    "minValue": 0,
    "maxValue": 120,
    "unitCode": "years"
  }
}
```
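The `responseOptions` constraints in the item above can be checked against a collected answer with a few lines of Python. This is a minimal sketch using only the standard library, not part of reproschema-py: the helper `check_response` is hypothetical, and it only looks at the `valueType`, `minValue`, and `maxValue` fields shown in the example.

```python
import json
from typing import List

# The example item, trimmed to the fields this sketch uses.
AGE_ITEM = json.loads("""
{
  "@type": "reproschema:Field",
  "@id": "age_item",
  "prefLabel": "Age",
  "responseOptions": {
    "valueType": "xsd:integer",
    "minValue": 0,
    "maxValue": 120
  }
}
""")


def check_response(item: dict, value) -> List[str]:
    """Return a list of problems; an empty list means the value passes.

    Hypothetical helper for illustration only (not a reproschema-py function).
    """
    options = item.get("responseOptions", {})
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    wants_int = options.get("valueType") == "xsd:integer"
    if wants_int and not (is_number and isinstance(value, int)):
        return [f"{item['@id']}: expected an integer, got {value!r}"]
    problems = []
    if is_number:
        if "minValue" in options and value < options["minValue"]:
            problems.append(
                f"{item['@id']}: {value} is below minValue {options['minValue']}"
            )
        if "maxValue" in options and value > options["maxValue"]:
            problems.append(
                f"{item['@id']}: {value} is above maxValue {options['maxValue']}"
            )
    return problems


print(check_response(AGE_ITEM, 34))   # []
print(check_response(AGE_ITEM, 150))  # one "above maxValue" message
```

A check like this is only a convenience for response data; it does not replace validating the schema file itself with `reproschema validate`.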

Creating an Activity

An activity groups related items (like a complete questionnaire):

```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Activity",
  "@id": "demographics_activity",
  "prefLabel": "Demographics",
  "description": "Basic demographic information",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["age_item", "gender_item"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "age",
        "isAbout": "age_item",
        "isVis": true
      },
      {
        "variableName": "gender",
        "isAbout": "gender_item",
        "isVis": true
      }
    ]
  }
}
```
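In the activity above, every entry in `ui.order` has a matching `isAbout` entry in `ui.addProperties`. A quick way to catch a mismatch while editing by hand might look like the sketch below. It assumes that this one-to-one pairing is what you want, which the example suggests but this README does not state as a rule; the function `order_mismatches` is a hypothetical helper, not a reproschema-py API.

```python
from typing import List

# The "ui" part of the example activity, as a Python dict.
ACTIVITY = {
    "@id": "demographics_activity",
    "ui": {
        "order": ["age_item", "gender_item"],
        "shuffle": False,
        "addProperties": [
            {"variableName": "age", "isAbout": "age_item", "isVis": True},
            {"variableName": "gender", "isAbout": "gender_item", "isVis": True},
        ],
    },
}


def order_mismatches(node: dict) -> List[str]:
    """List references that appear in only one of ui.order / ui.addProperties.

    Hypothetical helper for illustration only.
    """
    ui = node.get("ui", {})
    ordered = ui.get("order", [])
    described = [prop["isAbout"] for prop in ui.get("addProperties", [])]
    problems = []
    for ref in ordered:
        if ref not in described:
            problems.append(f"{ref}: in ui.order but not in ui.addProperties")
    for ref in described:
        if ref not in ordered:
            problems.append(f"{ref}: in ui.addProperties but not in ui.order")
    return problems


print(order_mismatches(ACTIVITY))  # []
```

The same function works on a protocol, since protocols use the same `ui.order` and `ui.addProperties` layout to reference activities.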

Creating a Protocol

A protocol combines multiple activities for a complete study:

```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Protocol",
  "@id": "my_study_protocol",
  "prefLabel": "My Research Study",
  "description": "Protocol for my research study",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["demographics_activity", "phq9_activity"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "demographics",
        "isAbout": "demographics_activity",
        "prefLabel": "Demographics",
        "isVis": true
      },
      {
        "variableName": "phq9",
        "isAbout": "phq9_activity",
        "prefLabel": "PHQ-9 Depression Scale",
        "isVis": true
      }
    ]
  }
}
```
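To see how `ui.order` and `ui.addProperties` work together, the sketch below lists the labels of the visible activities in presentation order. It is an illustration under stated assumptions rather than how reproschema-ui renders a protocol: `presentation_order` is a hypothetical helper, it only handles boolean `isVis` values as used in the example, and it treats a missing `isVis` as visible.

```python
from typing import List

# The "ui" part of the example protocol, as a Python dict.
PROTOCOL = {
    "@id": "my_study_protocol",
    "ui": {
        "order": ["demographics_activity", "phq9_activity"],
        "shuffle": False,
        "addProperties": [
            {
                "variableName": "demographics",
                "isAbout": "demographics_activity",
                "prefLabel": "Demographics",
                "isVis": True,
            },
            {
                "variableName": "phq9",
                "isAbout": "phq9_activity",
                "prefLabel": "PHQ-9 Depression Scale",
                "isVis": True,
            },
        ],
    },
}


def presentation_order(node: dict) -> List[str]:
    """Return labels of visible entries, following ui.order.

    Hypothetical helper; assumes isVis is a boolean and defaults to True.
    """
    ui = node["ui"]
    by_ref = {prop["isAbout"]: prop for prop in ui["addProperties"]}
    labels = []
    for ref in ui["order"]:
        prop = by_ref.get(ref, {})
        if prop.get("isVis", True) is True:
            # Fall back to the reference itself when no label is given.
            labels.append(prop.get("prefLabel", ref))
    return labels


print(presentation_order(PROTOCOL))
# ['Demographics', 'PHQ-9 Depression Scale']
```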

Validating Schemas

Always validate your schemas to ensure they're correctly formatted:

```bash
# Validate a single file
reproschema validate my_protocol.jsonld

# Validate all files in a directory
reproschema validate protocols/

# Validate with detailed output
reproschema --log-level DEBUG validate my_schema.jsonld
```
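Before running the validator, it can help to rule out plain JSON syntax errors (a trailing comma, a missing quote), since those are the easiest mistakes to make when editing by hand. The sketch below uses only the Python standard library and does not call or replace `reproschema validate`. The function `find_json_errors` is a hypothetical helper, and it assumes your schema files use the `.jsonld` extension.

```python
import json
from pathlib import Path
from typing import Dict


def find_json_errors(root: str) -> Dict[str, str]:
    """Map each unparsable .jsonld file under `root` to a short error message.

    Hypothetical helper: checks JSON syntax only, not the ReproSchema model.
    """
    errors = {}
    for path in sorted(Path(root).rglob("*.jsonld")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors[str(path)] = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return errors


if __name__ == "__main__":
    # "protocols" mirrors the directory name used in the commands above.
    for name, message in find_json_errors("protocols").items():
        print(f"{name}: {message}")
```

Files that pass this check can still fail schema validation, so treat it as a first filter only.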

Schema Structure

File Formats

ReproSchema uses several file formats:

  • JSON-LD (.jsonld): Primary format combining JSON with Linked Data
  • Turtle (.ttl): RDF serialization format
  • N-Triples (.nt): Line-based RDF format
  • YAML: Used for LinkML schema definitions

Schema Components

ReproSchema consists of three hierarchical levels:

  1. Items (Fields): Individual questions or data points

    • Question text and descriptions
    • Input types (text, number, select, etc.)
    • Response options and constraints
    • Visibility conditions
  2. Activities: Collections of related items

    • Groups items into logical assessments
    • Defines item order and display logic
    • Can include scoring computations
    • Supports branching logic
  3. Protocols: Complete study designs

    • Combines multiple activities
    • Defines activity order and scheduling
    • Manages participant flow
    • Includes study-level metadata

ReproSchema Ecosystem

The ReproSchema project integrates five key components designed to standardize research protocols and enhance consistency across various stages of data collection:

1. Foundational Schema (reproschema)

This core schema delineates the content and relationships of protocols, assessments, and items to ensure consistency and facilitate data harmonization across studies.

2. Assessment Library (reproschema-library)

A comprehensive collection of standardized questionnaires, supporting the application of uniform assessments across time and different studies.

3. Python CLI Tool (reproschema-py)

Command-line interface tool that facilitates schema development and validation, aiding researchers in efficiently creating and refining data collection frameworks.

4. User Interface (reproschema-ui)

An intuitive web interface that simplifies visualizing and interacting with data, making the data collection process easier for researchers to manage.

5. Protocol Template (reproschema-protocol-cookiecutter)

A customizable template that supports the design and implementation of research protocols tailored to specific study requirements.

Repository Structure

This repository contains:

```
reproschema/
├── terms/               # ReproSchema vocabulary terms
├── contexts/            # JSON-LD context files
├── examples/            # Example protocols, activities, and items
│   ├── activities/      # Sample activities
│   ├── protocols/       # Sample protocols
│   └── responses/       # Sample response data
├── linkml-schema/       # LinkML schema definitions
├── releases/            # Official release versions
├── docs/                # Documentation
│   ├── tutorials/       # Step-by-step guides
│   ├── how-to/          # Task-specific instructions
│   └── user-guide/      # Comprehensive user documentation
└── scripts/             # Utility scripts
```

Developing ReproSchema

Updating the schema

As of release 1.0.0, a linked data modeling language, LinkML, is used to create a YAML file with the schema.

The context file was automatically generated using LinkML and then manually curated to support all the reproschema features.

Style

This repo uses pre-commit to check styling.

  • Install pre-commit with pip: pip install pre-commit
  • To use it with the repository, run pre-commit install in the root directory the first time you use it.

Release

Upon release, additional formats (JSON-LD, Turtle, N-Triples, and pydantic) are created using LinkML tools, reproschema-py, and a reproschema-specific script to "fix" the pydantic format. The entire process is automated in the GitHub Action workflow "Validate and Release". This workflow must be manually triggered by the core developers once a new release is ready. All releases can be found in the releases directory.

Updating model in reproschema-py

Another GitHub Action workflow, "Create Pull Request to reproschema-py", is responsible for creating a pull request to the reproschema-py Python library with the new version of the pydantic model and context. This workflow is currently also triggered manually by the core developers.

Licenses

Code

The content of this repository is distributed under the Apache 2.0 license.

Documentation

The corresponding documentation is licensed under a Creative Commons Attribution 4.0 International License.

Citation

If you use ReproSchema in your research, please cite our paper:

Chen Y, Jarecka D, Abraham S, Gau R, Ng E, Low D, Bevers I, Johnson A, Keshavan A, Klein A, Clucas J, Rosli Z, Hodge S, Linkersdörfer J, Bartsch H, Das S, Fair D, Kennedy D, Ghosh S. Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem. J Med Internet Res 2025;27:e63343. DOI: 10.2196/63343

Contributors

https://github.com/ReproNim/reproschema/graphs/contributors

Owner

  • Name: Center for Reproducible Neuroimaging Computation
  • Login: ReproNim
  • Kind: organization

Citation (CITATION.cff)

# schema: https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md

cff-version: 1.2.0

title: ReproSchema

abstract: >-
  A standardized form generation and data collection schema to harmonize results by design across projects.

version: 1.0.0-rc4

license: CC-BY-4.0

repository-code: https://github.com/ReproNim/reproschema.git

message: >-
  If you use ReproSchema in your research, please cite our JMIR publication.

identifiers:
  - description: Journal article describing the ReproSchema ecosystem
    type: doi
    value: 10.2196/63343
  - description: Zenodo archive of ReproSchema releases
    type: doi
    value: 10.5281/zenodo.4064939

keywords:
    - "schema"
    - "assessment"
    - "jsonld"
    - "rdf"
    - "repronim"
    - "reproducibility"
    - "data collection"

authors:
  - family-names: Chen
    given-names: Yibei
  - family-names: Jarecka
    given-names: Dorota
  - family-names: Abraham
    given-names: Samuel
  - family-names: Gau
    given-names: Rémi
  - family-names: Ng
    given-names: Eric
  - family-names: Low
    given-names: Deshea
  - family-names: Bevers
    given-names: Isaac
  - family-names: Johnson
    given-names: Adam
  - family-names: Keshavan
    given-names: Anisha
  - family-names: Klein
    given-names: Arno
  - family-names: Clucas
    given-names: Jon
  - family-names: Rosli
    given-names: Zulfikar
  - family-names: Hodge
    given-names: Steven
  - family-names: Linkersdörfer
    given-names: Janosch
  - family-names: Bartsch
    given-names: Hauke
  - family-names: Das
    given-names: Samir
  - family-names: Fair
    given-names: Damien
  - family-names: Kennedy
    given-names: David
  - family-names: Ghosh
    given-names: Satrajit

preferred-citation:
  type: article
  title: "Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem"
  authors:
    - family-names: Chen
      given-names: Yibei
    - family-names: Jarecka
      given-names: Dorota
    - family-names: Abraham
      given-names: Samuel
    - family-names: Gau
      given-names: Rémi
    - family-names: Ng
      given-names: Eric
    - family-names: Low
      given-names: Deshea
    - family-names: Bevers
      given-names: Isaac
    - family-names: Johnson
      given-names: Adam
    - family-names: Keshavan
      given-names: Anisha
    - family-names: Klein
      given-names: Arno
    - family-names: Clucas
      given-names: Jon
    - family-names: Rosli
      given-names: Zulfikar
    - family-names: Hodge
      given-names: Steven
    - family-names: Linkersdörfer
      given-names: Janosch
    - family-names: Bartsch
      given-names: Hauke
    - family-names: Das
      given-names: Samir
    - family-names: Fair
      given-names: Damien
    - family-names: Kennedy
      given-names: David
    - family-names: Ghosh
      given-names: Satrajit
  journal: "Journal of Medical Internet Research"
  volume: 27
  year: 2025
  doi: 10.2196/63343
  url: https://www.jmir.org/2025/1/e63343

GitHub Events

Total
  • Create event: 1
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Member event: 2
  • Issue comment event: 10
  • Push event: 3
  • Pull request review event: 9
  • Pull request review comment event: 6
  • Pull request event: 4
  • Fork event: 1
Last Year
  • Create event: 1
  • Issues event: 4
  • Watch event: 4
  • Delete event: 1
  • Member event: 2
  • Issue comment event: 10
  • Push event: 3
  • Pull request review event: 9
  • Pull request review comment event: 6
  • Pull request event: 4
  • Fork event: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 54
  • Total pull requests: 138
  • Average time to close issues: over 2 years
  • Average time to close pull requests: about 1 month
  • Total issue authors: 14
  • Total pull request authors: 10
  • Average comments per issue: 2.26
  • Average comments per pull request: 1.23
  • Merged pull requests: 124
  • Bot issues: 0
  • Bot pull requests: 11
Past Year
  • Issues: 4
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 days
  • Issue authors: 4
  • Pull request authors: 3
  • Average comments per issue: 0.25
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • akeshavan (13)
  • Remi-Gau (11)
  • sanuann (11)
  • satra (9)
  • yibeichan (5)
  • djarecka (2)
  • dnkennedy (2)
  • stufreen (1)
  • cmungall (1)
  • shnizzedy (1)
  • ibevers (1)
  • yarikoptic (1)
  • Rahul-Brito (1)
  • ericearl (1)
Pull Request Authors
  • sanuann (74)
  • djarecka (58)
  • Remi-Gau (34)
  • dependabot[bot] (14)
  • github-actions[bot] (9)
  • yibeichan (9)
  • satra (4)
  • akeshavan (4)
  • lioneldeveauxcri (1)
  • yarikoptic (1)
Top Labels
Issue Labels
enhancement (4)
Pull Request Labels
dependencies (14)

Dependencies

requirements.txt pypi
  • mkdocs-material *
  • pymdown-extensions *
  • rdflib ==5.0.0
.github/workflows/publishdocs.yaml actions
  • actions/checkout v1 composite
  • mhausenblas/mkdocs-deploy-gh-pages master composite
.github/workflows/pythonpackage.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/check_md_links.yml actions
  • actions/checkout v4 composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/run_precommit.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pre-commit/action v3.0.0 composite
.github/workflows/update_precommit_hooks.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • peter-evans/create-pull-request v5 composite