Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.3%) to scientific vocabulary
Repository
Resource code for MLSea
Basic Info
- Host: GitHub
- Owner: dtai-kg
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: http://w3id.org/mlsea
- Size: 1.54 MB
Statistics
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 2
- Releases: 2
Metadata Files
README.md
MLSea Resource Code
This repository contains the source code and RML mappings used to create MLSea-KG, a declaratively constructed and regularly updated machine learning knowledge graph (KG) with more than 1.44 billion RDF triples. MLSea-KG contains metadata about machine learning:
- Datasets
- Tasks
- Implementations and related hyper-parameters
- Experiment executions, their configuration settings and evaluation results
- Code notebooks and repositories
- Algorithms
- Publications
- Models
- Scientists and practitioners
The data were gathered and integrated from OpenML, Kaggle and Papers with Code.
MLSea-KG Construction Process Overview
Data Integration
The resource code directory contains the code used to collect, pre-process and sample the data and to declaratively generate RDF triples with the included mappings. The input data sources are the OpenML data extracted from the OpenML API, the Meta Kaggle CSVs and the Papers with Code dumps; none of them are included in this repository.
OpenML CSV dumps are also generated to store the data retrieved from the OpenML API.
RML Mappings
The RML mappings used for each platform are also provided and show the rules used to declaratively construct MLSea-KG. Both the common RML mappings and the corresponding in-memory RML mappings, which generate RDF from in-memory samples, are provided, together with their YARRRML serialization.
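To give a rough idea of what a single triples map expresses, the Python sketch below hand-codes one mapping rule. It builds a subject from an `id` column through a template and adds one predicate-object map that turns a `name` column into a literal. In RML this corresponds to an `rr:template` subject map plus an `rml:reference` object map, and a mapping engine (the project pins morph_kgc in its dependencies) does this work declaratively. The base IRI, the `dcterms:title` predicate and the column names are assumptions made for the example and are not the MLSea-KG vocabulary.

```python
import csv
import io

# Hypothetical IRIs for illustration only; NOT the actual MLSea-KG vocabulary.
BASE = "http://example.org/dataset/"
NAME_PREDICATE = "http://purl.org/dc/terms/title"


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Apply one hand-written 'triples map' to every CSV row.

    Subject template: BASE + {id}; predicate-object map: NAME_PREDICATE -> {name}.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        literal = escape_literal(row["name"])
        triples.append(f'{subject} <{NAME_PREDICATE}> "{literal}" .')
    return triples
```

For example, `rows_to_ntriples("id,name\n61,iris\n")` yields a single triple that links `http://example.org/dataset/61` to the literal `"iris"`. A declarative mapping keeps these rules out of the code, so the same rules can be reused for file-based and in-memory sources.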
Querying MLSea-KG
MLSea-KG is accessible through our SPARQL endpoint. The sparql_examples folder contains example queries for traversing MLSea-KG.
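This README does not state the endpoint URL, so the sketch below uses a placeholder. It sends a query following the SPARQL 1.1 Protocol (HTTP GET with a `query` parameter) and flattens the standard SPARQL JSON results format into plain dictionaries. The helper names are hypothetical. Queries from the sparql_examples folder could be passed in as the query string.

```python
import json
import urllib.parse
import urllib.request

# Placeholder, not the real MLSea-KG endpoint: replace with the actual SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results_json: str) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    data = json.loads(results_json)
    return [
        {var: term["value"] for var, term in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query over HTTP and return the flattened bindings."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Once `ENDPOINT` points at the real endpoint, a call such as `run_query(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` returns up to five rows. Variables that are unbound in a solution are simply absent from that row's dictionary.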
MLSea-KG Snapshots
MLSea-KG snapshots are available at MLSea-KG's Zenodo repository.
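This README does not specify the serialization of the Zenodo snapshots. Assuming a snapshot is an N-Triples file, possibly gzip-compressed, a quick sanity check after downloading it is to count its triples. N-Triples puts one triple per line, so counting non-blank, non-comment lines is enough. The function name is hypothetical.

```python
import gzip


def count_ntriples(path: str) -> int:
    """Count triples in an N-Triples file (gzip-compressed if the name ends in .gz).

    Assumes N-Triples: one triple per line; blank lines and '#' comments are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

The file is read line by line, so memory use stays constant even for dumps with very large numbers of triples.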
Resource Code Package Installation
Clone the repository:
git clone https://github.com/dtai-kg/MLSea-KGC.git
Install dependencies:
pip install -r requirements.txt
Resource Code Package Usage
Import Original Data Sources
- Edit 'config.py' to set the target locations where imported data sources will be stored.
- Download Kaggle metadata through the Meta Kaggle dataset.
- Download Papers with Code metadata through the Papers with Code dump files.
Download the OpenML metadata through the OpenML API service and store it as CSV backups with 'openml_data_collector.py':
python openml_data_collector.py
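The collector script itself is not reproduced here. The helper below is a hypothetical sketch of only the CSV-backup step. It writes a list of records, for example metadata dictionaries fetched from the OpenML API, to a CSV file. The header is the union of all keys, so records with missing fields still fit.

```python
import csv


def write_csv_backup(records: list[dict], path: str) -> int:
    """Write records to a CSV file and return the number of rows written.

    The header is the union of all keys in first-seen order; fields missing
    from a record are written as empty cells.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Building the header from every record, not just the first, means the backup does not fail when fields are optional for some entities. Values are stored as text, so types have to be restored when the CSV is read back.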
Process RDF Mappings
Review the mappings and update their input source paths so that they point to the locations of your local data sources.
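When the data sources were stored under a different directory, the paths can be updated with a small script. The Python sketch below assumes the Turtle RML mappings declare each source as a string literal, as in `rml:source "data/datasets.csv"`, and replaces a directory prefix in those literals. Sources described by nested nodes, and the YARRRML files, are not covered. The function name and directories are hypothetical.

```python
import re

# Matches a literal source path such as: rml:source "data/openml/datasets.csv"
SOURCE_PATTERN = re.compile(r'(rml:source\s+")([^"]*)(")')


def rewrite_sources(mapping_text: str, old_dir: str, new_dir: str) -> str:
    """Replace the old_dir prefix of every literal rml:source path with new_dir."""

    def swap(match: re.Match) -> str:
        path = match.group(2)
        if path.startswith(old_dir):
            path = new_dir + path[len(old_dir):]
        # Keep the 'rml:source "' prefix and closing quote unchanged.
        return match.group(1) + path + match.group(3)

    return SOURCE_PATTERN.sub(swap, mapping_text)
```

Paths that do not start with `old_dir` are left untouched, so running the function twice, or on a file that is already up to date, is harmless.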
Generate RDF dumps
Generate the RDF dumps of MLSea-KG by running:
python data_integration_openml.py
python data_integration_kaggle.py
python data_integration_pwc.py
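The three scripts can also be chained from Python so that the run stops as soon as one of them fails. This is a convenience sketch and is not part of the repository. The script names are the ones listed above, and the sketch assumes the working directory is the repository root.

```python
import subprocess
import sys

# Script names as listed in this README; assumed to be run from the repository root.
SCRIPTS = [
    "data_integration_openml.py",
    "data_integration_kaggle.py",
    "data_integration_pwc.py",
]


def run_all(commands: list[list[str]]) -> list[int]:
    """Run each command in order, stopping at the first non-zero exit code.

    Returns the exit codes of the commands that were actually run.
    """
    codes = []
    for command in commands:
        completed = subprocess.run(command)
        codes.append(completed.returncode)
        if completed.returncode != 0:
            break
    return codes
```

A full run would be `run_all([[sys.executable, script] for script in SCRIPTS])`. Using `sys.executable` ensures that each script runs with the same interpreter, and therefore the same installed dependencies, as the caller.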
Cite
Thank you for reading! To cite our resource:
@InProceedings{dasoulas2024mlsea,
author = {Dasoulas, Ioannis and Yang, Duo and Dimou, Anastasia},
booktitle = {The Semantic Web},
title = {{MLSea: A Semantic Layer for Discoverable Machine Learning}},
year = {2024}
}
Owner
- Name: dtai-kg
- Login: dtai-kg
- Kind: organization
- Repositories: 1
- Profile: https://github.com/dtai-kg
Citation (CITATION.cff)
cff-version: 1.2.0
title: "MLSea"
license: Apache-2.0
authors:
  - family-names: Dasoulas
    given-names: Ioannis
preferred-citation:
  authors:
    - family-names: Dasoulas
      given-names: Ioannis
    - family-names: Yang
      given-names: Duo
    - family-names: Dimou
      given-names: Anastasia
  title: "MLSea: A Semantic Layer for Discoverable Machine Learning"
  type: conference-paper
  collection-title: "Proceedings of the 21st Extended Semantic Web Conference (ESWC)"
  year: 2024
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Issue comment event: 2
Last Year
- Issues event: 2
- Watch event: 2
- Issue comment event: 2
Dependencies
- morph_kgc ==2.6.4
- openml ==0.14.1
- pandas ==2.1.4
- python ==3.9.18
- validators ==0.22.0