reproschema
A standardized form generation and data collection schema to harmonize results by design across projects.
Repository
Basic Info
- Host: GitHub
- Owner: ReproNim
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://www.repronim.org/reproschema/
- Size: 22.2 MB
Statistics
- Stars: 17
- Watchers: 12
- Forks: 24
- Open Issues: 20
- Releases: 5
Metadata Files
README.md

ReproSchema: Enhancing Research Reproducibility through Standardized Survey Data Collection
Table of Contents
- Introduction
- Quick Start
- Prerequisites
- Installation
- Usage Examples
- Schema Structure
- ReproSchema Ecosystem
- Repository Structure
- Development
- Resources and Links
- Licenses
- Contributors
Introduction
ReproSchema is a standardized framework for creating, sharing, and reusing cognitive and clinical assessments. It addresses the lack of consistency in assessment data acquisition across studies by providing a common schema that captures relationships between questionnaire elements from the start.
Key Benefits:
- 📊 Rich Context: JSON-LD format provides semantic relationships rather than flat CSV files
- 🔄 Version Control: Track different versions of questionnaires (e.g., PHQ-9, PHQ-8)
- 🌍 Internationalization: Built-in support for multiple languages
- 🔗 Persistent Identifiers: Unique IDs for items, activities, and protocols
- ✅ Validation: Schema validation using SHACL ensures data quality
- 🚀 Implementation Agnostic: Use with any software platform
Quick Start
Get started with ReproSchema in minutes:
```bash
# Install the ReproSchema Python package
pip install reproschema

# Validate an example schema
reproschema validate examples/protocols/protocol1.jsonld

# Create a new protocol from template (requires cookiecutter)
pip install cookiecutter
cookiecutter https://github.com/ReproNim/reproschema-protocol-cookiecutter
```
Prerequisites
Before using ReproSchema, ensure you have:
- Python 3.8+: Required for the reproschema-py tools
- Git: For version control and cloning repositories
- Text Editor: Preferably with JSON/JSON-LD support (e.g., VS Code)
- Basic JSON Knowledge: Understanding of JSON syntax
Optional but recommended:
- GitHub Account: For hosting and sharing your schemas
- Node.js: If using the reproschema-ui interface
Installation
Installing the Python Package
The easiest way to work with ReproSchema is through the Python package:
```bash
# Using pip
pip install reproschema

# Using pip with specific version
pip install reproschema==1.0.0

# For development (with latest changes)
pip install git+https://github.com/ReproNim/reproschema-py.git
```
Cloning the Repository
To access examples and contribute to the schema:
```bash
git clone https://github.com/ReproNim/reproschema.git
cd reproschema
```
Usage Examples
Creating a Simple Item
An item represents a single question in a questionnaire. Here's a basic example:
```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Field",
  "@id": "age_item",
  "prefLabel": "Age",
  "description": "Participant's age in years",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "inputType": "number"
  },
  "responseOptions": {
    "valueType": "xsd:integer",
    "minValue": 0,
    "maxValue": 120,
    "unitCode": "years"
  }
}
```
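To make the role of `responseOptions` concrete, here is a minimal Python sketch that checks a candidate answer against the constraints of the item above. The helper `check_response` is hypothetical and is not part of reproschema-py; it assumes only that `valueType`, `minValue`, and `maxValue` carry the meaning their names suggest.

```python
import json
from typing import List

# A trimmed copy of the age item shown above.
ITEM_JSON = """
{
  "@type": "reproschema:Field",
  "@id": "age_item",
  "prefLabel": "Age",
  "ui": {"inputType": "number"},
  "responseOptions": {
    "valueType": "xsd:integer",
    "minValue": 0,
    "maxValue": 120,
    "unitCode": "years"
  }
}
"""


def check_response(item: dict, value) -> List[str]:
    """Hypothetical helper: return a list of problems; empty means the value passed."""
    options = item.get("responseOptions", {})

    # Assumption: "xsd:integer" means the answer must be a Python int (bool excluded).
    if options.get("valueType") == "xsd:integer":
        if isinstance(value, bool) or not isinstance(value, int):
            return [f"{value!r} is not an integer"]

    problems = []
    # Range checks only make sense for numeric answers.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        low = options.get("minValue")
        high = options.get("maxValue")
        if low is not None and value < low:
            problems.append(f"{value} is below minValue {low}")
        if high is not None and value > high:
            problems.append(f"{value} is above maxValue {high}")
    return problems


item = json.loads(ITEM_JSON)
for candidate in (34, 150, "thirty"):
    print(candidate, check_response(item, candidate))
```

This kind of check is what a data collection front end would typically perform before accepting an answer; the schema itself only declares the constraints.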
Creating an Activity
An activity groups related items (like a complete questionnaire):
```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Activity",
  "@id": "demographics_activity",
  "prefLabel": "Demographics",
  "description": "Basic demographic information",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["age_item", "gender_item"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "age",
        "isAbout": "age_item",
        "isVis": true
      },
      {
        "variableName": "gender",
        "isAbout": "gender_item",
        "isVis": true
      }
    ]
  }
}
```
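In the activity above, every entry in `ui.order` has a matching `addProperties` entry whose `isAbout` points at the same item. The following Python sketch checks that pairing before a file is committed. The function `check_activity` is hypothetical, and the rule that the two lists must agree is an assumption made for illustration, not a statement of what the official validator enforces.

```python
from typing import List

# The "ui" part of the demographics activity shown above.
activity = {
    "@id": "demographics_activity",
    "ui": {
        "order": ["age_item", "gender_item"],
        "shuffle": False,
        "addProperties": [
            {"variableName": "age", "isAbout": "age_item", "isVis": True},
            {"variableName": "gender", "isAbout": "gender_item", "isVis": True},
        ],
    },
}


def check_activity(activity: dict) -> List[str]:
    """Hypothetical helper: report mismatches between ui.order and ui.addProperties."""
    ui = activity.get("ui", {})
    order = ui.get("order", [])
    properties = ui.get("addProperties", [])
    described = [prop.get("isAbout") for prop in properties]
    names = [
        prop["variableName"]
        for prop in properties
        if prop.get("variableName") is not None
    ]

    problems = []
    for item_id in order:
        if item_id not in described:
            problems.append(f"'{item_id}' is ordered but has no addProperties entry")
    for item_id in described:
        if item_id not in order:
            problems.append(f"'{item_id}' has an addProperties entry but is not ordered")
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"variableName '{name}' is used more than once")
    return problems


print(check_activity(activity))
```

The same check applies unchanged to a protocol, since protocols use the same `order` and `addProperties` keys to reference activities.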
Creating a Protocol
A protocol combines multiple activities for a complete study:
```json
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0/contexts/reproschema",
  "@type": "reproschema:Protocol",
  "@id": "my_study_protocol",
  "prefLabel": "My Research Study",
  "description": "Protocol for my research study",
  "schemaVersion": "1.0.0",
  "version": "1.0.0",
  "ui": {
    "order": ["demographics_activity", "phq9_activity"],
    "shuffle": false,
    "addProperties": [
      {
        "variableName": "demographics",
        "isAbout": "demographics_activity",
        "prefLabel": "Demographics",
        "isVis": true
      },
      {
        "variableName": "phq9",
        "isAbout": "phq9_activity",
        "prefLabel": "PHQ-9 Depression Scale",
        "isVis": true
      }
    ]
  }
}
```
Validating Schemas
Always validate your schemas to ensure they're correctly formatted:
```bash
# Validate a single file
reproschema validate my_protocol.jsonld

# Validate all files in a directory
reproschema validate protocols/

# Validate with detailed output
reproschema --log-level DEBUG validate my_schema.jsonld
```
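If validation needs to run from a script (for example, in a test suite), the commands above can be wrapped with Python's `subprocess` module. This is a sketch, not part of reproschema-py: the function names are hypothetical, only the `validate` subcommand and `--log-level` option shown above are used, and it assumes the CLI exits with a non-zero status when validation fails.

```python
import shutil
import subprocess
from typing import List, Optional


def build_validate_command(path: str, log_level: Optional[str] = None) -> List[str]:
    """Build the argument list for `reproschema validate`, mirroring the commands above."""
    command = ["reproschema"]
    if log_level:
        # The global option goes before the subcommand, as in the DEBUG example.
        command += ["--log-level", log_level]
    command += ["validate", path]
    return command


def validate(path: str, log_level: Optional[str] = None) -> bool:
    """Run the CLI and return True on success.

    Assumption: a non-zero exit status signals a validation failure.
    """
    if shutil.which("reproschema") is None:
        raise RuntimeError("reproschema CLI not found; try `pip install reproschema`")
    result = subprocess.run(
        build_validate_command(path, log_level),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode == 0


# Building the command does not require the CLI to be installed.
print(build_validate_command("my_schema.jsonld", "DEBUG"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.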
Schema Structure
File Formats
ReproSchema uses several file formats:
- JSON-LD (.jsonld): Primary format combining JSON with Linked Data
- Turtle (.ttl): RDF serialization format
- N-Triples (.nt): Line-based RDF format
- YAML: Used for LinkML schema definitions
Schema Components
ReproSchema consists of three hierarchical levels:
Items (Fields): Individual questions or data points
- Question text and descriptions
- Input types (text, number, select, etc.)
- Response options and constraints
- Visibility conditions
Activities: Collections of related items
- Groups items into logical assessments
- Defines item order and display logic
- Can include scoring computations
- Supports branching logic
Protocols: Complete study designs
- Combines multiple activities
- Defines activity order and scheduling
- Manages participant flow
- Includes study-level metadata
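The three levels nest through the `ui.order` lists: a protocol orders activities, and an activity orders items. The Python sketch below walks that hierarchy and prints an outline. The in-memory `REGISTRY` keyed by `@id` is a hypothetical stand-in for loading the separate `.jsonld` files, and `walk` is an illustrative helper, not a reproschema-py function.

```python
from typing import Dict, List

# Hypothetical registry keyed by "@id"; real schemas live in separate .jsonld files.
REGISTRY: Dict[str, dict] = {
    "my_study_protocol": {
        "@type": "reproschema:Protocol",
        "ui": {"order": ["demographics_activity"]},
    },
    "demographics_activity": {
        "@type": "reproschema:Activity",
        "ui": {"order": ["age_item", "gender_item"]},
    },
    "age_item": {"@type": "reproschema:Field", "prefLabel": "Age"},
    "gender_item": {"@type": "reproschema:Field", "prefLabel": "Gender"},
}


def walk(node_id: str, registry: Dict[str, dict], depth: int = 0) -> List[str]:
    """Return an indented outline of the hierarchy rooted at node_id."""
    indent = "  " * depth
    node = registry.get(node_id)
    if node is None:
        # Referenced in an order list but not present in the registry.
        return [f"{indent}{node_id} (unresolved)"]

    kind = node["@type"].split(":")[-1]  # e.g. "reproschema:Field" -> "Field"
    lines = [f"{indent}{kind}: {node_id}"]
    # Items have no "ui.order", so the recursion stops at the Field level.
    for child_id in node.get("ui", {}).get("order", []):
        lines.extend(walk(child_id, registry, depth + 1))
    return lines


outline = walk("my_study_protocol", REGISTRY)
print("\n".join(outline))
```

An identifier that appears in an `order` list but cannot be found is reported as unresolved rather than raising an error, which makes missing files easy to spot.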
ReproSchema Ecosystem
The ReproSchema project integrates five key components designed to standardize research protocols and enhance consistency across various stages of data collection:
1. Foundational Schema (reproschema)
This core schema delineates the content and relationships of protocols, assessments, and items to ensure consistency and facilitate data harmonization across studies.
2. Assessment Library (reproschema-library)
A comprehensive collection of standardized questionnaires, supporting the application of uniform assessments across time and different studies.
3. Python CLI Tool (reproschema-py)
Command-line interface tool that facilitates schema development and validation, aiding researchers in efficiently creating and refining data collection frameworks.
4. User Interface (reproschema-ui)
An intuitive web interface that simplifies the visualization and interaction with data, enhancing the manageability of the data collection process for researchers.
5. Protocol Template (reproschema-protocol-cookiecutter)
A customizable template that supports the design and implementation of research protocols tailored to specific study requirements.
Repository Structure
This repository contains:
reproschema/
├── terms/ # ReproSchema vocabulary terms
├── contexts/ # JSON-LD context files
├── examples/ # Example protocols, activities, and items
│ ├── activities/ # Sample activities
│ ├── protocols/ # Sample protocols
│ └── responses/ # Sample response data
├── linkml-schema/ # LinkML schema definitions
├── releases/ # Official release versions
├── docs/ # Documentation
│ ├── tutorials/ # Step-by-step guides
│ ├── how-to/ # Task-specific instructions
│ └── user-guide/ # Comprehensive user documentation
└── scripts/ # Utility scripts
Developing ReproSchema
Updating the schema
As of release 1.0.0, a linked data modeling language, LinkML, is used to create a YAML file with the schema.
The context file was automatically generated using LinkML and then manually curated to support all the reproschema features.
Style
This repo uses pre-commit to check styling.
- Install pre-commit with pip: pip install pre-commit
- To use it with the repository, run pre-commit install in the root directory the first time you use it.
Release
Upon release, additional formats (jsonld, turtle, n-triples,
and pydantic) are created using LinkML tools, reproschema-py,
and a reproschema-specific script that "fixes" the pydantic format.
The entire process is automated in the GitHub Action Workflow:
Validate and Release.
This workflow must be manually triggered by the core developers once a new release is ready.
All releases can be found in the releases directory.
Updating model in reproschema-py
Another GitHub Action Workflow, Create Pull Request to reproschema-py,
is responsible for creating a pull request to the reproschema-py Python library with
the new version of the pydantic model and context.
The workflow is currently also triggered manually by the core developers.
Licenses
Code
The content of this repository is distributed under the Apache 2.0 license.
Documentation

The corresponding documentation is licensed under a Creative Commons Attribution 4.0 International License.
Citation
If you use ReproSchema in your research, please cite our paper:
Chen Y, Jarecka D, Abraham S, Gau R, Ng E, Low D, Bevers I, Johnson A, Keshavan A, Klein A, Clucas J, Rosli Z, Hodge S, Linkersdörfer J, Bartsch H, Das S, Fair D, Kennedy D, Ghosh S. Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem. J Med Internet Res 2025;27:e63343. DOI: 10.2196/63343
Contributors
https://github.com/ReproNim/reproschema/graphs/contributors
Owner
- Name: Center for Reproducible Neuroimaging Computation
- Login: ReproNim
- Kind: organization
- Website: http://repronim.org
- Repositories: 75
- Profile: https://github.com/ReproNim
Citation (CITATION.cff)
# schema: https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md
cff-version: 1.2.0
title: ReproSchema
abstract: >-
  A standardized form generation and data collection schema to harmonize results by design across projects.
version: 1.0.0-rc4
license: CC-BY-4.0
repository-code: https://github.com/ReproNim/reproschema.git
message: >-
  If you use ReproSchema in your research, please cite our JMIR publication.
identifiers:
  - description: Journal article describing the ReproSchema ecosystem
    type: doi
    value: 10.2196/63343
  - description: Zenodo archive of ReproSchema releases
    type: doi
    value: 10.5281/zenodo.4064939
keywords:
  - "schema"
  - "assessment"
  - "jsonld"
  - "rdf"
  - "repronim"
  - "reproducibility"
  - "data collection"
authors:
  - family-names: Chen
    given-names: Yibei
  - family-names: Jarecka
    given-names: Dorota
  - family-names: Abraham
    given-names: Samuel
  - family-names: Gau
    given-names: Rémi
  - family-names: Ng
    given-names: Eric
  - family-names: Low
    given-names: Deshea
  - family-names: Bevers
    given-names: Isaac
  - family-names: Johnson
    given-names: Adam
  - family-names: Keshavan
    given-names: Anisha
  - family-names: Klein
    given-names: Arno
  - family-names: Clucas
    given-names: Jon
  - family-names: Rosli
    given-names: Zulfikar
  - family-names: Hodge
    given-names: Steven
  - family-names: Linkersdörfer
    given-names: Janosch
  - family-names: Bartsch
    given-names: Hauke
  - family-names: Das
    given-names: Samir
  - family-names: Fair
    given-names: Damien
  - family-names: Kennedy
    given-names: David
  - family-names: Ghosh
    given-names: Satrajit
preferred-citation:
  type: article
  title: "Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem"
  authors:
    - family-names: Chen
      given-names: Yibei
    - family-names: Jarecka
      given-names: Dorota
    - family-names: Abraham
      given-names: Samuel
    - family-names: Gau
      given-names: Rémi
    - family-names: Ng
      given-names: Eric
    - family-names: Low
      given-names: Deshea
    - family-names: Bevers
      given-names: Isaac
    - family-names: Johnson
      given-names: Adam
    - family-names: Keshavan
      given-names: Anisha
    - family-names: Klein
      given-names: Arno
    - family-names: Clucas
      given-names: Jon
    - family-names: Rosli
      given-names: Zulfikar
    - family-names: Hodge
      given-names: Steven
    - family-names: Linkersdörfer
      given-names: Janosch
    - family-names: Bartsch
      given-names: Hauke
    - family-names: Das
      given-names: Samir
    - family-names: Fair
      given-names: Damien
    - family-names: Kennedy
      given-names: David
    - family-names: Ghosh
      given-names: Satrajit
  journal: "Journal of Medical Internet Research"
  volume: 27
  year: 2025
  doi: 10.2196/63343
  url: https://www.jmir.org/2025/1/e63343
GitHub Events
Total
- Create event: 1
- Issues event: 4
- Watch event: 4
- Delete event: 1
- Member event: 2
- Issue comment event: 10
- Push event: 3
- Pull request review event: 9
- Pull request review comment event: 6
- Pull request event: 4
- Fork event: 1
Last Year
- Create event: 1
- Issues event: 4
- Watch event: 4
- Delete event: 1
- Member event: 2
- Issue comment event: 10
- Push event: 3
- Pull request review event: 9
- Pull request review comment event: 6
- Pull request event: 4
- Fork event: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 54
- Total pull requests: 138
- Average time to close issues: over 2 years
- Average time to close pull requests: about 1 month
- Total issue authors: 14
- Total pull request authors: 10
- Average comments per issue: 2.26
- Average comments per pull request: 1.23
- Merged pull requests: 124
- Bot issues: 0
- Bot pull requests: 11
Past Year
- Issues: 4
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: 3 days
- Issue authors: 4
- Pull request authors: 3
- Average comments per issue: 0.25
- Average comments per pull request: 1.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- akeshavan (13)
- Remi-Gau (11)
- sanuann (11)
- satra (9)
- yibeichan (5)
- djarecka (2)
- dnkennedy (2)
- stufreen (1)
- cmungall (1)
- shnizzedy (1)
- ibevers (1)
- yarikoptic (1)
- Rahul-Brito (1)
- ericearl (1)
Pull Request Authors
- sanuann (74)
- djarecka (58)
- Remi-Gau (34)
- dependabot[bot] (14)
- github-actions[bot] (9)
- yibeichan (9)
- satra (4)
- akeshavan (4)
- lioneldeveauxcri (1)
- yarikoptic (1)
Dependencies
- mkdocs-material *
- pymdown-extensions *
- rdflib ==5.0.0
- actions/checkout v1 composite
- mhausenblas/mkdocs-deploy-gh-pages master composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v4 composite
- gaurav-nelson/github-action-markdown-link-check v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pre-commit/action v3.0.0 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- peter-evans/create-pull-request v5 composite