unhide-docs
Mirror of https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/documentation
https://github.com/materials-data-science-and-informatics/unhide-docs
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.8%) to scientific vocabulary
Repository
Mirror of https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/documentation
Basic Info
- Host: GitHub
- Owner: Materials-Data-Science-and-Informatics
- Default Branch: main
- Size: 7.38 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md

unHIDE - unified Helmholtz Information and data exchange
Authors
Pier Luigi Buttigieg
0000-0002-4366-3088
Volker Hofmann
0000-0002-5149-603X
Introduction & Scope
Research across the Helmholtz Association (HGF) depends and thrives on a complex network of inter- and multidisciplinary collaborations which spans across its 18 Centres and beyond.
However, the (meta)data generated through the HGF's research and operations is typically siloed within institutional infrastructure and often within individual teams. The result is that the wealth of the HGF's (meta)data is stored and maintained in a scattered manner, and cannot be used to its full value to scientists, managers, strategists, and policy makers.
To address this challenge, the Helmholtz Metadata Collaboration (HMC) is launching the unified Helmholtz Information and Data Exchange (unHIDE). This initiative seeks to create a lightweight and sustainable interoperability layer to interlink data infrastructures and provide greater, cross-organisational access to the HGF's (meta)data and information assets. Using proven and globally adopted knowledge graph technology (Box 1), unHIDE will develop a comprehensive association-wide Knowledge Graph (KG) the "Helmholtz-KG": a solution to connect (meta)data, information, and knowledge.
Box 1
What is a Knowledge Graph? - A "graph", from graph theory, is a structure that models pairwise connections between objects using "nodes" connected by "edges". - A "knowledge graph" uses such a graph structure to capture knowledge about how a collection of things (represented as nodes) relate to one another (via edges). This helps organisations keep track of their collective knowledge, especially in complex and rapidly changing scenarios. - Social networks are perhaps the best known graphs, that store knowledge about who knows whom and how, what their interests are, what groups they belong to, and what content they create and interact with.
With the implementation of the Helmholtz-KG, unHIDE will create substantial additional value for the Helmholtz digital ecosystem and its interconnectivity:
With the development of the Helmholtz-KG, unHide will: - increase discoverability and actionability of HGF data across the whole Association* - motivate enhancement of (meta)data quality [1] and interoperability - provide overviews and diagnostics of the HGF dataspace and digital assets - allow for traceable and reproducible recovery of (meta)data to enhance research - support connectivity of HGF data to interact with global infrastructures and projects - act as a central access and distribution point for stakeholders within and beyond the HGF
Architecture & Implementation
Foundational architecture
The Helmholtz Knowledge Graph (Helmholtz-KG) aims to enhance the HGF's digital capacities, transparency, and productivity through dissemination and implementation of Linked Data principles (Box 2). Thus, unHIDE will build the Helmholtz-KG on mature web architecture and state-of-the-art semantic web technologies. This will ensure reliability and compatibility with global systems, while also exploring innovative approaches to maximise the Helmholtz-KG's ability to accelerate research and operations.
Box 2
Graph data is: - Open-world, allowing resilient operations with novel or unexpected data flows - Faster than using SQL and associated JOIN operations - Better suited to integrating data from heterogeneous sources - Better suited to situations where the data model is complex and (rapidly) evolving
To ensure ease of use, the Helmholtz-KG will be based on a lightweight and internationally adopted interoperabiliy architecture based on schema.org semantics and JSON-LD serialisation [2]. This architecture is widely used by data producers - including public, private, and governmental data systems - to link and expose scattered, diverse digital assets. By reusing this architecture, unHIDE will ensure that the Helmholtz-KG is able to natively interoperate with global systems.
Modular design & Extensibility
While the foundation of the Helmholtz-KG will reuse standard web architectural elements and proven, globally adopted conventions, the KG itself is modular by nature: Graphs can be merged, split, independently managed, and readily interfaced with other digital resources without compromising core integrity and functionality. In this manner, Helmholtz data scientists and engineers will be able to propose and test extensions to the graph with minimal overhead, which will support the ability to extend into existing and well-established systems in the HGF.
This modularity (especially the ability to securely and independently manage parts of the overall graph) will also allow to realize different modes of access to digital assets (e.g. respecting sensitivity and confidentiality but also permitting full openness). The initial implementation of the Helmholtz-KG will not contain sensitive or confidential data, but such capacities (e.g. user management, license recognition across (meta)data holdings, and authentication) can be explored and implemented when the core technology and operational procedures are stabilised.
The backbone architecture of the Helmholtz Knowledge Graph will be licensed under CC0/CCBY to enable crosswalks to the outside world and gain visibility as e.g. a sub-cloud of the Linked Open Data Cloud.
Inspiration
The implementation of the Helmholtz-KG architecture is inspired by the federation of stakeholders in IOC-UNESCO's Ocean Data and Information System (ODIS), interconnected by the ODIS Architecture [2], and rendered into a knowledge graph federating over 50 partners across the globe by the Ocean InfoHub Project (OIH). Personnel from the HMC's Earth and Environment Hub chair the ODIS federation and lead the technical implementation of OIH, offering direct alignment with unHIDE.
Data Sources
Initial Scope
Initial efforts of the Helmholtz-KG implementation will focus on the representation of (meta)data describing the following digital core assets: - Documents / Publications - published Datasets - Software - Institutions - Infrastructure & Resources - Researchers & Experts - Projects
The representation of these instances will be semantically aligned with the schema.org vocabulary, a globally adopted standard offering a relaxed frame for the representation of heterogeneous data. Following the initial implementation the semantic expressiveness of the graph can be increased by integrating domain ontologies such as the HMC developed Helmholtz Digitization Ontology (HDO), which provides precise and comprehensive semantics of the concepts and practices used to manage digital assets.
Data Ingestion Process
The Helmholtz-KG will offer multiple options for existing and emerging HGF infrastructures, data providers, and communities to declare their resources and digital assets in the graph for discoverability. We will prioritise the recommended publishing process for structured data on the web (as used by ODIS/OIH and many others): data providers would either 1) provide a sitemap or robots.txt file which will direct harvesting software to a collection of JSON-LD/schema.org documents or 2) expose JSON-LD snippets in the document head element of a web resource (i.e. HTML document). Both approaches are described in the publisher documentation of the Ocean InfoHub Project [3].
Alternative publication patterns may include -- HTTP-accessible RO-Crate [4] metadata in ro-crate-metadata.json, the exposure of structured metadata records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [5] or properly documented RESTful APIs in general [6]. We will explore the need and feasibility of alternate publishing and harvesting modes during the course of unHIDE.
HMC personnel will support the onboarding of data providers as well as the implementation of custom (meta)data pipelines / connectors and mapping to RDF / JSON-LD, if necessary and where appropriate.
Potential Data Providers
Within the HGF a number of relevant web-based data architectures exist. These will be targeted by unHIDE to collaborate on building interfaces to the Helmholtz-KG.
The initial implementation fill focus on: - HGF institutional (data) repositories - Central Libraries of Helmholtz Research Centers - Domain-specific (data) repositories relevant to HGF - Helmholtz GitLab Instances
Subsequent efforts will include further resources such as: - Helmholtz FAIR digital objects (via HMC) - Helmholtz Ontology Base (HOB) (via HMC) - The Helmholtz software directory (centrally maintained by HIFIS) - Helmholtz Data Challenges Platform - other resources of the Helmholtz Metadata Collaboration (HMC) - Content management systems (CMS) - Helmholtz Computing centers (e.g. JSC) - Helmholtz Federated IT Services (HIFIS) - Helmholtz instruments and sensor databases (e.g. @GFZ, DEPAS @AWI, RDMInfoPool, etc.) - Helmholtz Scientific Project Workflow Platform (HELIPORT) - Helmholtz Imaging Modalities database - Laboratory information management systems (LIMS) - Helmholtz Open Science Office
References
- [1] https://5stardata.info/en/
- [2] https://www.w3.org/TR/2014/REC-json-ld-20140116/
- [3] https://book.oceaninfohub.org/publishing/publishing.html
- [4] https://www.researchobject.org/ro-crate/
- [6] https://www.openarchives.org/pmh/
Owner
- Name: Materials Data Science and Informatics
- Login: Materials-Data-Science-and-Informatics
- Kind: organization
- Repositories: 8
- Profile: https://github.com/Materials-Data-Science-and-Informatics
Citation (CITATION.cff)
cff-version: 1.2.0 message: If you use or want to cite any software around unHIDE and the Helmholtz knowledge graph, please cite it using these metadata. type: software title: unhide-software abstract: This is representative for the software around the unHIDE initiative and the helmholtz knowledge graph. The software includes a library for harvesting linked data on the web, a web front end for user friendly search as well as documentation and docker compose deployment files for the whole software stack behind the Helmholtz knowledge graph, work from the unHIDE initiative. Deployment includes containers for the harvesters and utility, the API, SOLR, Virtuoso, Web Frontend as well as for nginx and letsencrypt. More information on the unHIDE initiative, which was launched by the Helmholtz Metadata Collaboration (HMC), you can find under https://docs.unhide.helmholtz-metadaten.de. version: 1.0.0 keywords: - unhide - Helmholtz association - deployment - docker - docker-compose - HMC - metadata - linked data - solr - virtuoso - knowledge graph - helmholtz knowledge graph - science authors: - orcid: https://orcid.org/0000-0002-0465-1009 family-names: D'Mello affiliation: Forschungszentrum Jülich GmbH given-names: Fiona email: f.dmello@fz-juelich.de - orcid: https://orcid.org/0000-0001-7939-226X family-names: Bröder affiliation: Forschungszentrum Jülich GmbH given-names: Jens email: j.broeder@fz-juelich.de - orcid: https://orcid.org/0000-0002-4366-3088 family-names: Buttigieg affiliation: Alfred Wegener Institute for Polar and Marine Research given-names: Pier Luigi email: pier.buttigieg@awi.de - orcid: https://orcid.org/0000-0002-2818-5890 family-names: Fathalla affiliation: Forschungszentrum Jülich GmbH given-names: Said email: s.fathalla@fz-juelich.de - orcid: https://orcid.org/0000-0002-3968-2446 family-names: Preuß affiliation: Helmholtz-Zentrum Berlin für Materialien und Energie given-names: Gabriel email: gabriel.preuss@helmholtz-berlin.de - orcid: https://orcid.org/0000-0002-5149-603X family-names: Hofmann affiliation: Forschungszentrum Jülich GmbH given-names: Volker email: v.hofmann@fz-juelich.de - orcid: https://orcid.org/0000-0001-9560-4728 family-names: Sandfeld affiliation: Forschungszentrum Jülich GmbH given-names: Stefan email: s.sandfeld@fz-juelich.de - orcid: https://orcid.org/0000-0003-0575-2853 family-names: Mannix affiliation: Helmholtz-Zentrum Berlin für Materialien und Energie given-names: Oonagh email: oonagh.mannix@helmholtz-berlin.de contact: - orcid: https://orcid.org/0000-0002-0465-1009 family-names: D'Mello affiliation: Forschungszentrum Jülich GmbH given-names: Fiona email: f.dmello@fz-juelich.de - orcid: https://orcid.org/0000-0001-7939-226X family-names: Bröder affiliation: Forschungszentrum Jülich GmbH given-names: Jens email: j.broeder@fz-juelich.de - orcid: https://orcid.org/0000-0002-3968-2446 family-names: Preuß affiliation: Helmholtz-Zentrum Berlin für Materialien und Energie given-names: Gabriel email: gabriel.preuss@helmholtz-berlin.de - orcid: https://orcid.org/0000-0003-0575-2853 family-names: Mannix affiliation: Helmholtz-Zentrum Berlin für Materialien und Energie given-names: Oonagh email: oonagh.mannix@helmholtz-berlin.de license: MIT url: https://codebase.helmholtz.cloud/hmc/hmc-public/unhide repository-code: https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/unhide-docker.git
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/deploy-pages v1 composite
- actions/setup-python v1 composite
- actions/upload-pages-artifact v1 composite
- jupyter-book *