in-silico-protocols-4ml
Semantically enriched in-silico protocols for ML experiments to improve FAIRness, transparency and reproducibility
Repository
Basic Info
- Host: GitHub
- Owner: zbmed-semtec
- License: other
- Default Branch: main
- Size: 26.4 KB
# In-silico protocols for ML models
This project builds on [SMART Protocols](https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0160-y) for wet-lab experiments, adapting and extending them to cover Machine Learning experiments.
Lab protocols often accompany wet-lab experiments as text-based documents describing the sequence of tasks and operations executed, including, e.g., references to equipment and reagents, troubleshooting, and tips. [SMART Protocols](https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0160-y) provides a semantic layer for lab protocols so that, for instance, reagents are linked to (semantic) chemical databases such as ChEMBL.
Like wet-lab experiments, Machine Learning (ML) experiments are composed of inputs, steps and outputs, and would benefit from semantically enriched in-silico protocols. We aim to provide a semantic layer for ML experiments, namely the in-silico protocol, that supports FAIRness for ML, transparency and reproducible outcomes.
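To make the "inputs, steps and outputs" view concrete, the following is a minimal sketch of how an ML experiment could be recorded as a protocol. It is an illustration only, not the data model of this project: the class names `Step` and `InSilicoProtocol`, their fields, and the example artifact names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of an ML experiment, with the artifacts it consumes and produces."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass
class InSilicoProtocol:
    """Hypothetical container: an ordered list of steps under a title."""
    title: str
    steps: list[Step] = field(default_factory=list)

    def external_inputs(self) -> list[str]:
        """Artifacts that are consumed by a step but produced by no step."""
        produced = {out for step in self.steps for out in step.outputs}
        result: list[str] = []
        for step in self.steps:
            for item in step.inputs:
                if item not in produced and item not in result:
                    result.append(item)
        return result

    def final_outputs(self) -> list[str]:
        """Artifacts that are produced by a step but consumed by no step."""
        consumed = {item for step in self.steps for item in step.inputs}
        result: list[str] = []
        for step in self.steps:
            for out in step.outputs:
                if out not in consumed and out not in result:
                    result.append(out)
        return result


# Hypothetical example experiment: split, train, evaluate.
protocol = InSilicoProtocol(
    title="Example classification experiment",
    steps=[
        Step("split", inputs=["raw_dataset"], outputs=["train_set", "test_set"]),
        Step("train", inputs=["train_set", "hyperparameters"], outputs=["model"]),
        Step("evaluate", inputs=["model", "test_set"], outputs=["metrics"]),
    ],
)

print(protocol.external_inputs())
print(protocol.final_outputs())
```

In a semantically enriched protocol, each artifact name would be replaced or complemented by a resolvable identifier, in the same way SMART Protocols links reagents to entries in chemical databases.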
## Documentation of ML experiments
Efforts to standardize reports and documentation describing ML models include [ML Model Cards](https://huggingface.co/blog/model-cards), the [DOME recommendations](https://www.nature.com/articles/s41592-021-01205-4), and the [AIMe registry](https://www.nature.com/articles/s41592-021-01241-0), while [Dataset Cards](https://huggingface.co/docs/hub/en/datasets-cards) document the datasets used in ML experiments.
DOME stands for Data, Optimization, Model and Evaluation, four components of the general ML pipeline. The DOME recommendations comprise an extensive array of community-centered guidelines, recommendations, and checklists covering these four components, aimed at facilitating standardized methodologies for validating supervised machine learning in the biological sciences.
Model cards are documents that accompany models and describe them. At their core, model cards are straightforward Markdown documents enriched with extra metadata. They play a crucial role in enhancing discoverability, ensuring reproducibility, and facilitating sharing. A model card covers a description of the model, its intended uses, biases and legal considerations, limitations, the datasets used, training parameters, and performance evaluation.
On the Hugging Face Hub, a dataset can be described in detail through the README.md file located in its repository. This document is referred to as a dataset card, and the Hub displays its information on the main page of the dataset. To guide users on the responsible usage of the data, it is advisable to include details regarding any possible biases present in the dataset. Typically, dataset cards help users understand the details of the dataset and provide context on how it should be used.
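As an illustration of "Markdown enriched with extra metadata", the sketch below separates the metadata block at the top of a model card from its Markdown body. It assumes the Hugging Face convention of a YAML block delimited by `---` lines. The helper names `split_front_matter` and `parse_simple_metadata` and the example card are hypothetical, and the parser deliberately handles only flat `key: value` pairs and `- item` lists; a real YAML library should be used for anything beyond that.

```python
def split_front_matter(card_text: str) -> tuple[str, str]:
    """Return (metadata_block, markdown_body); metadata is '' if absent."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text
    for idx in range(1, len(lines)):
        if lines[idx].strip() == "---":
            return "\n".join(lines[1:idx]), "\n".join(lines[idx + 1:])
    return "", card_text  # no closing delimiter: treat everything as body


def parse_simple_metadata(block: str) -> dict:
    """Parse flat 'key: value' pairs and '- item' lists only (not full YAML)."""
    metadata: dict = {}
    current_key = None  # key whose list items are currently being collected
    for raw in block.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- ") and current_key is not None:
            metadata[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value
                current_key = None
            else:
                metadata[key] = []
                current_key = key
    return metadata


# Hypothetical model card used only for this example.
EXAMPLE_CARD = """---
license: mit
tags:
- text-classification
datasets:
- imdb
---
# Example model

Intended uses, limitations and evaluation results go here.
"""

metadata_block, body = split_front_matter(EXAMPLE_CARD)
metadata = parse_simple_metadata(metadata_block)
print(metadata)
```

The metadata block is the part that machines can index for discoverability, while the Markdown body carries the human-readable account of intended uses, limitations and evaluation.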
## Semantic approaches for ML experiments
Semantic representations for ML experiments are still new. Recent efforts by [ML Commons](https://mlcommons.org/) provide a representation based on schema.org for datasets used in ML experiments, namely [Croissant ML](https://github.com/mlcommons/croissant).
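To give a feel for a schema.org-based dataset description, here is a minimal sketch that builds a JSON-LD record using only core schema.org terms (`Dataset`, `DataDownload`, `name`, `description`, `license`, `url`, `distribution`, `contentUrl`, `encodingFormat`). It is not a complete or valid Croissant file: Croissant adds its own vocabulary on top of schema.org, which this sketch does not attempt to reproduce. The function name `describe_dataset` and all example values, including the `example.org` URLs, are hypothetical.

```python
import json


def describe_dataset(name: str, description: str, license_url: str,
                     landing_page: str, download_url: str,
                     encoding_format: str) -> dict:
    """Build a plain schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "url": landing_page,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                "encodingFormat": encoding_format,
            }
        ],
    }


# Hypothetical dataset used only for this example.
record = describe_dataset(
    name="Example training corpus",
    description="Placeholder description of a dataset used in an ML experiment.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    landing_page="https://example.org/datasets/example-corpus",
    download_url="https://example.org/datasets/example-corpus/data.csv",
    encoding_format="text/csv",
)

print(json.dumps(record, indent=2))
```

Because the record is ordinary JSON-LD, it can be embedded in a web page or stored next to the data, and generic schema.org tooling can read it without knowing anything about the ML experiment itself.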
For its part, the [RDA FAIR4ML Interest Group](https://www.rd-alliance.org/groups/fair-machine-learning-fair4ml-ig/) has been working on an extension of schema.org to represent ML models. Version 0.0.1 was released on 2024-06-04 and is available at [FAIR4ML for ML models](https://w3id.org/fair4ml). This version was created based on [crosswalks](https://github.com/RDA-FAIR4ML/FAIR4ML-crosswalks), most of which were done during an NFDI4DS hackathon at ZB MED in November 2023. The group is also working on FAIRness for ML and on FAIR elements associated with the ML life cycle.
## Acknowledgements
I would like to express my gratitude to Dr. Olga Ximena Giraldo for her invaluable contributions and guidance in developing the project's outline.
## Funding
This work is partially funded by the German Research Foundation (DFG) under grant No. 460234259, corresponding to the NFDI4DataScience consortium.
Owner
- Name: zbmed-semtec
- Login: zbmed-semtec
- Kind: organization
- Repositories: 12
- Profile: https://github.com/zbmed-semtec
Citation (CITATION.cff)
cff-version: 1.2.0
message: 'In-silico protocols for ML models'
authors:
  - family-names: "Castro"
    given-names: "Leyla Jael"
    orcid: "https://orcid.org/0000-0003-3986-0510"
  - family-names: "Giraldo"
    given-names: "Olga"
    orcid: "https://orcid.org/0000-0003-2978-8922"
  - family-names: "Rebholz-Schuhmann"
    given-names: "Dietrich"
    orcid: "https://orcid.org/0000-0002-1018-0370"
  - family-names: "Solanki"
    given-names: "Dhwani"
    orcid: "https://orcid.org/0009-0004-1529-0095"
title: " In-silico protocols for ML models"
license: CC-BY-4.0
url: "https://github.com/zbmed-semtec/in-silico-protocols-4ML"