modelset-dataset

ModelSet is a labelled dataset of Ecore and UML models

https://github.com/modelset/modelset-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary

Keywords

dataset ecore machine-learning mde uml

Repository

Basic Info
  • Host: GitHub
  • Owner: modelset
  • License: lgpl-3.0
  • Language: Java
  • Default Branch: master
  • Homepage: https://modelset.github.io
  • Size: 6.3 MB
Statistics
  • Stars: 11
  • Watchers: 1
  • Forks: 2
  • Open Issues: 2
  • Releases: 5
Created over 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation Governance

README.md

ModelSet

ModelSet is a labelled dataset of software models.

This repository contains:

  1. The ModelSet databases with the labelled datasets. See the Downloading ModelSet section for more information.
  2. The scripts to create the databases and generate the release. See the Building ModelSet section for more information.

You can find more information about ModelSet in the following Open Access paper: https://link.springer.com/article/10.1007%2Fs10270-021-00929-3.

Downloading ModelSet

To download ModelSet follow these steps:

  1. Download the latest release from https://github.com/modelset/modelset-dataset/releases
  2. Unzip the package somewhere in your workspace.
  3. The structure of the decompressed package is the following:

    ```
    • datasets
      • dataset.ecore/data/ecore.db
      • dataset.genmymodel/data/genmymodel.db
    • graph
    • raw-data
    • txt
    ```

    3.1. The datasets folder contains the databases with the labelled models. The .db files are SQLite databases containing the information about the models.

    The database schema includes a table called model with model data (i.e., unique identifier, source repository and filename) and a table called metadata with label data (i.e., unique identifier and a JSON object with the label information). The following figure illustrates the database schema:

        +-------------------+     +---------------------------------+
        | model             |     | metadata                        |
        +-------------------+     +---------------------------------+
        | id : VARCHAR {PK} |     | id : VARCHAR {PK, FK(model.id)} |
        | source : VARCHAR  |     | json : TEXT                     |
        | filename : TEXT   |     +---------------------------------+
        +-------------------+

    3.2. The graph folder contains the graph representation of the models.

    3.3. The raw-data folder contains the models serialized in XMI.

    3.4. The txt folder includes the strings of the models (e.g., to train simple NLP models).
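The package layout described above can be checked programmatically before opening the databases. The following is a minimal sketch that uses only the Java standard library. The class `ModelSetLayout` and its method `missingEntries` are hypothetical names introduced for illustration, not part of ModelSet; the expected paths are copied from the structure listed above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ModelSet): checks that an unzipped
// release contains the entries listed in the package structure above.
final class ModelSetLayout {

    // Relative paths taken from the package structure described above.
    static final List<String> EXPECTED = List.of(
            "datasets/dataset.ecore/data/ecore.db",
            "datasets/dataset.genmymodel/data/genmymodel.db",
            "graph",
            "raw-data",
            "txt");

    // Returns the expected entries that do not exist under packageRoot.
    static List<String> missingEntries(Path packageRoot) {
        List<String> missing = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.exists(packageRoot.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the package root is passed as the first argument,
        // falling back to the current directory.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        List<String> missing = missingEntries(root);
        if (missing.isEmpty()) {
            System.out.println("Package layout looks complete: " + root);
        } else {
            System.out.println("Missing entries: " + missing);
        }
    }
}
```

Running the class with the folder where the release was unzipped as its argument reports which of the listed entries, if any, are absent.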

Querying ModelSet via Java JDBC

Once you have downloaded ModelSet, you can use JDBC to query the databases.

For instance, the following code illustrates how to query the database using JDBC:

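```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Open the SQLite database; requires the sqlite-jdbc driver on the classpath.
Connection dataset = DriverManager.getConnection("jdbc:sqlite:/path/to/dbfile");

// Join each model with its label metadata. Table and column names must match
// the database file; the schema described above names them model and json.
PreparedStatement stm = dataset.prepareStatement(
    "select mo.id, mo.filename, mm.metadata from models mo join metadata mm on mo.id = mm.id");
stm.execute();

ResultSet rs = stm.getResultSet();
while (rs.next()) {
    String id = rs.getString(1);
    String filename = rs.getString(2);
    String metadata = rs.getString(3);
    System.out.println(id + " (" + filename + "): " + metadata);
}
dataset.close(); // release the database handle
```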
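Table and column names in a query must match the downloaded database file exactly, so it can help to list them before writing queries. The sketch below does this with the standard `java.sql.DatabaseMetaData` API. The class `ModelSetSchema`, its methods, and the default database path are hypothetical choices made for illustration; opening a real database also assumes the `org.xerial:sqlite-jdbc` driver is on the classpath.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ModelSet): prints the tables and columns
// of a database so that queries can be written against the real names.
final class ModelSetSchema {

    // Builds a JDBC URL with the jdbc:sqlite: prefix used in the example above.
    static String jdbcUrl(Path dbFile) {
        return "jdbc:sqlite:" + dbFile;
    }

    // Prints every table followed by its column names and declared types.
    static void printSchema(Connection connection) throws SQLException {
        DatabaseMetaData meta = connection.getMetaData();

        // Collect table names first so that only one result set is open at a time.
        List<String> tables = new ArrayList<>();
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.add(rs.getString("TABLE_NAME"));
            }
        }

        for (String table : tables) {
            System.out.println(table);
            try (ResultSet rs = meta.getColumns(null, null, table, "%")) {
                while (rs.next()) {
                    System.out.println("  " + rs.getString("COLUMN_NAME")
                            + " : " + rs.getString("TYPE_NAME"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: the database path is the first argument; the default is the
        // Ecore database path relative to the unzipped package root.
        Path dbFile = Path.of(args.length > 0 ? args[0] : "datasets/dataset.ecore/data/ecore.db");
        try (Connection connection = DriverManager.getConnection(jdbcUrl(dbFile))) {
            printSchema(connection);
        }
    }
}
```

The names printed by `printSchema` are the ones to use in the `select` statement.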

Using ModelSet in Python

To use ModelSet in a typical Python/Jupyter setting, we recommend using the modelset-py Python library we have developed. Visit the corresponding repository for more information.

Examples

We provide some examples of how to use ModelSet in the examples repository.

Building ModelSet

Note: these steps are only required if you want to create a new release of ModelSet. If you just want to use ModelSet, you can download the latest release (see the Downloading ModelSet section in this file).

To create the ModelSet release, you have to follow these steps:

  1. Execute ./bin/download-data.sh to retrieve the model files, which will be stored in the raw-data folder. These files are not stored here because they have already been published in existing GitHub repositories.
  2. Execute ./bin/generate.sh to generate additional artifacts.
  3. Execute ./bin/build.sh to build the ModelSet release package.
  4. The ModelSet release package will have the name modelset.zip.
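The steps above can also be driven from Java, stopping at the first script that fails. This is only a sketch: the class `ModelSetBuild` and its method `runAll` are hypothetical, and it assumes the scripts are executable and that the repository root is passed as the first argument. Running the three scripts by hand in a shell is equivalent.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not part of ModelSet): runs the build scripts in order.
final class ModelSetBuild {

    // The three scripts named in the steps above, in the order they must run.
    static final List<String> SCRIPTS = List.of(
            "./bin/download-data.sh",
            "./bin/generate.sh",
            "./bin/build.sh");

    // Runs each script with repoRoot as working directory.
    // Returns the first script that exits with a non-zero status, or null if all succeed.
    static String runAll(List<String> scripts, Path repoRoot)
            throws IOException, InterruptedException {
        for (String script : scripts) {
            // Resolve to an absolute path so the lookup does not depend on how the
            // platform interprets relative executables.
            String executable = repoRoot.resolve(script).normalize().toAbsolutePath().toString();
            Process process = new ProcessBuilder(executable)
                    .directory(repoRoot.toFile())
                    .inheritIO() // show the script output in this console
                    .start();
            if (process.waitFor() != 0) {
                return script;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Path repoRoot = Path.of(args.length > 0 ? args[0] : ".");
        String failed = runAll(SCRIPTS, repoRoot);
        System.out.println(failed == null
                ? "All steps finished; the release package is named modelset.zip"
                : "Step failed: " + failed);
    }
}
```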

Citation

If you find this dataset useful, please consider citing its associated paper: https://link.springer.com/article/10.1007/s10270-021-00929-3

    @article{lopez2021modelset,
      title   = {{ModelSet: a dataset for machine learning in model-driven engineering}},
      author  = {L{\'o}pez, Jos{\'e} Antonio Hern{\'a}ndez and C{\'a}novas Izquierdo, Javier Luis and Cuadrado, Jes{\'u}s S{\'a}nchez},
      journal = {Softw. Syst. Model.},
      volume  = {21},
      number  = {3},
      pages   = {967--986},
      year    = {2022},
      url     = {https://doi.org/10.1007/s10270-021-00929-3},
    }

Contributing

We welcome contributions of all kinds, including extensions to the dataset, new empirical studies, and new features. If you want to contribute to ModelSet, please review our contribution guidelines and our governance model.

Note that we have a code of conduct that we expect project participants to adhere to. Please read it before contributing.

License

This dataset is licensed under the GNU Lesser General Public License v3.0.

Owner

  • Name: ModelSet
  • Login: modelset
  • Kind: organization

A dataset for models

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Hernández López"
  given-names: "José Antonio"
  orcid: "https://orcid.org/0000-0003-2439-2136"
- family-names: "Cánovas Izquierdo"
  given-names: "Javier Luis"
  orcid: "https://orcid.org/0000-0002-2326-1700"
- family-names: "Sánchez Cuadrado"
  given-names: "Jesús"
  orcid: "https://orcid.org/0000-0001-9755-5616"
title: "ModelSet: a dataset for machine learning in model-driven engineering"
version: 1.0.0
doi: 10.1007/s10270-021-00929-3
date-released: 2022-06-23
url: "https://github.com/modelset/modelset-dataset"
preferred-citation:
  type: article
  authors:
    - family-names: "Hernández López"
      given-names: "José Antonio"
      orcid: "https://orcid.org/0000-0003-2439-2136"
    - family-names: "Cánovas Izquierdo"
      given-names: "Javier Luis"
      orcid: "https://orcid.org/0000-0002-2326-1700"
    - family-names: "Sánchez Cuadrado"
      given-names: "Jesús"
      orcid: "https://orcid.org/0000-0001-9755-5616"
  doi: "10.1007/s10270-021-00929-3"
  journal: "Softw. Syst. Model."
  start: 967
  end: 986
  title: "ModelSet: a dataset for machine learning in model-driven engineering"
  issue: 3
  volume: 21
  year: 2022

GitHub Events

Total
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 5
Last Year
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 5

Dependencies

java-lib/modelset-lib/pom.xml maven
  • ml2.mar:mar-modelling 1.0-SNAPSHOT
  • org.eclipse.emf:org.eclipse.emf.ecore 2.20.0
  • org.jgrapht:jgrapht-core 1.5.1
  • org.jgrapht:jgrapht-ext 1.5.1
  • org.jgrapht:jgrapht-io 1.5.1
  • org.xerial:sqlite-jdbc 3.32.3.2