modelset-dataset
ModelSet is a labelled dataset of Ecore and UML models
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to springer.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: modelset
- License: lgpl-3.0
- Language: Java
- Default Branch: master
- Homepage: https://modelset.github.io
- Size: 6.3 MB
Statistics
- Stars: 11
- Watchers: 1
- Forks: 2
- Open Issues: 2
- Releases: 5
Metadata Files
README.md
ModelSet
ModelSet is a labelled dataset of software models.
This repository contains:
- The ModelSet databases with the labelled datasets. See the Downloading ModelSet section for more information.
- The scripts to create the databases and generate the release. See the Building ModelSet section for more information.
You can find more information about ModelSet in the following Open Access paper: https://link.springer.com/article/10.1007%2Fs10270-021-00929-3.
Downloading ModelSet
To download ModelSet follow these steps:
- You can download the latest release from https://github.com/modelset/modelset-dataset/releases
- Unzip the package somewhere in your workspace.
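If you prefer to script the unzip step, the sketch below extracts a downloaded release archive using only the JDK's `java.util.zip` classes. The class name `UnzipRelease` and its command-line arguments are hypothetical (they are not part of this repository); it assumes the release zip has already been downloaded from the releases page, and it refuses any entry whose path would escape the target folder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper that extracts a downloaded ModelSet release archive. */
public class UnzipRelease {

    /** Extracts every entry of zipFile into targetDir and returns the number of files written. */
    public static int unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int files = 0;
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Resolve the entry path and reject entries that would land outside targetDir.
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Refusing to extract outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only; the zip stream stays open.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java UnzipRelease modelset.zip /path/to/workspace/modelset
        int n = unzip(Path.of(args[0]), Path.of(args[1]));
        System.out.println("Extracted " + n + " files");
    }
}
```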
The structure of the decompressed package is the following:
```
- datasets
  - dataset.ecore/data/ecore.db
  - dataset.genmymodel/data/genmymodel.db
- graph
- raw-data
- txt
```
- The `datasets` folder contains the databases with the labelled models. The `.db` files are SQLite databases containing the information about the models. The database schema includes a table called `model` with model data (i.e., unique identifier, source repository and filename) and a table called `metadata` with label data (i.e., unique identifier and a JSON object with the label information). The following figure illustrates the database schema:

```
+-------------------+      +---------------------------------+
| model             |      | metadata                        |
+-------------------+      +---------------------------------+
| id : VARCHAR {PK} |      | id : VARCHAR {PK, FK(model.id)} |
| source : VARCHAR  |      | json : TEXT                     |
| filename : TEXT   |      +---------------------------------+
+-------------------+
```

- The `graph` folder contains the graph representation of the models.
- The `raw-data` folder contains the models serialized in XMI.
- The `txt` folder includes the strings of the models (e.g., to train simple NLP models).
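The package layout above determines where each database lives. As a minimal sketch (the `ModelSetPaths` helper is hypothetical, not part of the repository), the following Java code derives the path of each `.db` file and the matching `jdbc:sqlite:` URL from the folder where the release was unzipped, so the URL can be passed to `DriverManager.getConnection`.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that locates the ModelSet databases inside an unzipped release. */
public class ModelSetPaths {

    /** Path of the SQLite database for a dataset name such as "ecore" or "genmymodel". */
    public static Path databasePath(Path releaseRoot, String dataset) {
        // Follows the package layout: datasets/dataset.<name>/data/<name>.db
        return releaseRoot.resolve("datasets")
                .resolve("dataset." + dataset)
                .resolve("data")
                .resolve(dataset + ".db");
    }

    /** JDBC URL for the SQLite driver, of the form jdbc:sqlite:<absolute file path>. */
    public static String jdbcUrl(Path releaseRoot, String dataset) {
        return "jdbc:sqlite:" + databasePath(releaseRoot, dataset).toAbsolutePath();
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (String dataset : new String[] {"ecore", "genmymodel"}) {
            Path db = databasePath(root, dataset);
            // Flag databases that are missing, e.g. when the root folder is wrong.
            System.out.println(dataset + " -> " + jdbcUrl(root, dataset)
                    + (Files.exists(db) ? "" : " (not found)"));
        }
    }
}
```

Keeping the path logic in one place means the JDBC code only has to know the dataset name ("ecore" or "genmymodel").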
Querying ModelSet via Java JDBC
Once you have downloaded ModelSet, you can query the databases with JDBC (e.g., using the SQLite JDBC driver `org.xerial:sqlite-jdbc`).
For instance, the following code retrieves each model together with its label metadata:
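```java
import java.sql.*;
public class QueryModelSet {
    public static void main(String[] args) throws SQLException {
        // Point the URL at one of the .db files, e.g. datasets/dataset.ecore/data/ecore.db
        try (Connection dataset = DriverManager.getConnection("jdbc:sqlite:/path/to/dbfile");
             PreparedStatement stm = dataset.prepareStatement(
                 "select mo.id, mo.filename, mm.metadata from models mo join metadata mm on mo.id = mm.id");
             ResultSet rs = stm.executeQuery()) {
            while (rs.next()) {
                String id = rs.getString(1);
                String filename = rs.getString(2);
                String metadata = rs.getString(3); // JSON object with the label information
                System.out.println(id + " (" + filename + "): " + metadata);
            }
        }
    }
}
```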
Using ModelSet in Python
To use ModelSet in a typical Python/Jupyter setting, we recommend using the modelset-py Python library we have developed. Visit the corresponding repository for more information.
Examples
We provide some examples of how to use ModelSet in the examples repository.
Building ModelSet
Note: these steps are only required if you want to create a new release of ModelSet. If you just want to use ModelSet, you can download the latest release (see the Downloading ModelSet section).
To create the ModelSet release, you have to follow these steps:
- Execute `./bin/download-data.sh` to recover the model files, which will be stored in the `raw-data` folder. These files are not stored here because they have already been published in existing GitHub repositories.
- Execute `./bin/generate.sh` to generate additional artifacts.
- Execute `./bin/build.sh` to build the ModelSet release package.
- The ModelSet release package will be named `modelset.zip`.
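The three build steps can also be chained from Java. This is a minimal sketch with a hypothetical `BuildModelSet` class: it runs the scripts in order from the repository root with `ProcessBuilder`, stops at the first non-zero exit code, and assumes the scripts are executable and that `modelset.zip` is written to the repository root.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical driver that runs the ModelSet build scripts in order. */
public class BuildModelSet {

    /** The build scripts, in the order listed in the Building ModelSet steps. */
    public static final List<String> STEPS = List.of(
            "./bin/download-data.sh", // recovers the model files into raw-data
            "./bin/generate.sh",      // generates additional artifacts
            "./bin/build.sh");        // builds the release package

    /** Runs each script from repoDir; returns 0, or the exit code of the first failing step. */
    public static int runAll(File repoDir) throws IOException, InterruptedException {
        for (String script : STEPS) {
            File scriptFile = new File(repoDir, script);
            if (!scriptFile.isFile()) {
                System.err.println("Missing script: " + script);
                return 1;
            }
            Process p = new ProcessBuilder(scriptFile.getAbsolutePath())
                    .directory(repoDir) // scripts run with the repository root as working dir
                    .inheritIO()        // forward the script output to this console
                    .start();
            int code = p.waitFor();
            if (code != 0) {
                System.err.println(script + " failed with exit code " + code);
                return code;
            }
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        File repoDir = new File(args.length > 0 ? args[0] : ".");
        int code = runAll(repoDir);
        if (code == 0) {
            // Assumed location of the package; adjust if build.sh writes it elsewhere.
            System.out.println("Expected release package: "
                    + new File(repoDir, "modelset.zip").getAbsolutePath());
        }
        System.exit(code);
    }
}
```

Stopping at the first failure matters here because `generate.sh` and `build.sh` presumably depend on the files that `download-data.sh` places in `raw-data`.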
Citation
If you find this dataset useful, please consider citing its associated paper: https://link.springer.com/article/10.1007/s10270-021-00929-3
@article{lopez2021modelset,
title = {{ModelSet: a dataset for machine learning in model-driven engineering}},
author = {L{\'o}pez, Jos{\'e} Antonio Hern{\'a}ndez and
C{\'a}novas Izquierdo, Javier Luis and
Cuadrado, Jes{\'u}s S{\'a}nchez},
journal = {Softw. Syst. Model.},
volume = {21},
number = {3},
pages = {967--986},
year = {2022},
url = {https://doi.org/10.1007/s10270-021-00929-3},
}
Contributing
We welcome contributions of all kinds, including extensions to the dataset, new empirical studies, and new features. If you want to contribute to ModelSet, please review our contribution guidelines and our governance model.
Note that we have a code of conduct that we expect project participants to adhere to. Please read it before contributing.
License
This dataset is licensed under the GNU Lesser General Public License v3.0.
Owner
- Name: ModelSet
- Login: modelset
- Kind: organization
- Website: https://modelset.github.io
- Repositories: 5
- Profile: https://github.com/modelset
A dataset for models
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Hernández López"
    given-names: "José Antonio"
    orcid: "https://orcid.org/0000-0003-2439-2136"
  - family-names: "Cánovas Izquierdo"
    given-names: "Javier Luis"
    orcid: "https://orcid.org/0000-0002-2326-1700"
  - family-names: "Sánchez Cuadrado"
    given-names: "Jesús"
    orcid: "https://orcid.org/0000-0001-9755-5616"
title: "ModelSet: a dataset for machine learning in model-driven engineering"
version: 1.0.0
doi: 10.1007/s10270-021-00929-3
date-released: 2022-06-23
url: "https://github.com/modelset/modelset-dataset"
preferred-citation:
  type: article
  authors:
    - family-names: "Hernández López"
      given-names: "José Antonio"
      orcid: "https://orcid.org/0000-0003-2439-2136"
    - family-names: "Cánovas Izquierdo"
      given-names: "Javier Luis"
      orcid: "https://orcid.org/0000-0002-2326-1700"
    - family-names: "Sánchez Cuadrado"
      given-names: "Jesús"
      orcid: "https://orcid.org/0000-0001-9755-5616"
  doi: "10.1007/s10270-021-00929-3"
  journal: "Softw. Syst. Model."
  start: 967
  end: 986
  title: "ModelSet: a dataset for machine learning in model-driven engineering"
  issue: 3
  volume: 21
  year: 2022
```
GitHub Events
Total
- Issues event: 3
- Watch event: 1
- Issue comment event: 5
Last Year
- Issues event: 3
- Watch event: 1
- Issue comment event: 5
Dependencies
- ml2.mar:mar-modelling 1.0-SNAPSHOT
- org.eclipse.emf:org.eclipse.emf.ecore 2.20.0
- org.jgrapht:jgrapht-core 1.5.1
- org.jgrapht:jgrapht-ext 1.5.1
- org.jgrapht:jgrapht-io 1.5.1
- org.xerial:sqlite-jdbc 3.32.3.2