smecs

Software Metadata Extraction and Curation Software (SMECS)

https://github.com/nfdi4energy/smecs

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

codemeta curation extraction fair-software fair4rs repository research-software research-software-engineering software-metadata
Last synced: 4 months ago

Repository

Software Metadata Extraction and Curation Software (SMECS)

Basic Info
  • Host: GitHub
  • Owner: NFDI4Energy
  • License: agpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 939 KB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 73
  • Releases: 3
Topics
codemeta curation extraction fair-software fair4rs repository research-software research-software-engineering software-metadata
Created over 1 year ago · Last pushed 4 months ago
Metadata Files
Readme License Citation Codemeta

README.rst

Software Metadata Extraction and Curation Software (SMECS)
__________________________________________________________
| A web application to extract and curate research software metadata following the `CodeMeta `_ (`version 3.0 `_) software metadata standard.
|
| SMECS facilitates the extraction of research software metadata from GitHub and GitLab repositories. It provides a user-friendly graphical interface for visualizing the retrieved metadata, enabling researchers and research software engineers to create high-quality metadata without reentering information already available elsewhere. The curated metadata is exported as CodeMeta-compliant JSON, ensuring integration with other tools and enhancing the discoverability, reuse, and impact of research software.
|
| 📄 For more details, see our `Preprint `_.
|
| **Authors:** Stephan Ferenz, Aida Jafarbigloo
|
Phases in SMECS
__________________________________________________________
| The workflow of SMECS consists of four sequential phases: **Start**, **Extraction**, **Curation**, and **Export**.
|
.. image:: https://github.com/NFDI4Energy/SMECS/blob/master/docs/Extraction_via_hermes-1.png
   :alt: SMECS Workflow
   :width: 1000px
|

1. **Start Phase**
__________________________________________________________
In the Start phase, users provide two key inputs:
      - A repository link (GitHub or GitLab)
      - A personal access token for the corresponding platform
SMECS can operate without user-provided tokens for some repositories by using internal default tokens. However:
      - For other GitLab instances, a user-provided token is always required.
      - Providing a token can enable SMECS to extract more detailed metadata from certain repositories.
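The input check in the Start phase can be pictured with a small sketch. This is not SMECS's actual code: the function name ``classify_repository`` is hypothetical, and the rules that every host other than ``github.com`` is treated as GitLab and that "other GitLab instances" means hosts other than ``gitlab.com`` are assumptions made for illustration.

```python
from urllib.parse import urlparse


def classify_repository(url: str) -> dict:
    """Split a repository link into platform, host, and project path."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a repository URL: {url!r}")

    host = parsed.netloc.lower()
    project = parsed.path.strip("/")
    if project.endswith(".git"):
        project = project[: -len(".git")]
    if project.count("/") < 1:
        raise ValueError("expected at least <owner>/<project> in the URL path")

    # Assumption: anything that is not github.com is handled as GitLab.
    platform = "github" if host == "github.com" else "gitlab"
    # Assumption: only gitlab.com is covered by a default token.
    token_required = platform == "gitlab" and host != "gitlab.com"

    return {
        "platform": platform,
        "host": host,
        "project": project,
        "token_required": token_required,
    }


if __name__ == "__main__":
    print(classify_repository("https://github.com/NFDI4Energy/SMECS.git"))
```

A form handler could use the ``token_required`` flag to decide whether to ask the user for a personal access token before starting the extraction.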
|
2. **Extraction Phase**
__________________________________________________________
The Extraction phase uses `HERMES `_ harvesting steps to retrieve metadata from multiple sources. For details on the metadata fields, see: `Metadata Terms in SMECS `_. Once the inputs from the Start phase are submitted, SMECS initiates metadata retrieval using four HERMES harvesters:
      - GitHub
      - GitLab
      - CFF (`Citation File Format `_)
      - CodeMeta
GitHub and GitLab metadata are harvested via the `HERMES GitHub/GitLab plugin `_.

All harvested metadata are mapped to CodeMeta using existing crosswalks from CodeMeta and HERMES, plus a custom crosswalk we created for GitLab.
The metadata are then processed and merged via the HERMES processing step, producing a unified set of metadata.
These results are displayed in the Curation phase. The HERMES-based approach ensures an interoperable, modular architecture that makes it easy to integrate additional harvesting sources in the future.
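To illustrate the idea of mapping and merging, here is a minimal sketch in plain Python. It does not use the HERMES API, and the crosswalk tables are heavily shortened. The names ``CROSSWALKS``, ``to_codemeta``, and ``merge`` are hypothetical, and the rule that earlier sources win on conflicts is an assumption for this example, not a statement about how HERMES resolves them.

```python
# Shortened, illustrative crosswalks: source field -> CodeMeta property.
CROSSWALKS = {
    "cff": {
        "title": "name",
        "abstract": "description",
        "repository-code": "codeRepository",
        "keywords": "keywords",
    },
    "github": {
        "description": "description",
        "html_url": "codeRepository",
        "topics": "keywords",
    },
}


def to_codemeta(source: str, record: dict) -> dict:
    """Rename the fields of one harvested record to CodeMeta properties."""
    mapping = CROSSWALKS[source]
    return {
        mapping[key]: value
        for key, value in record.items()
        if key in mapping and value not in (None, "", [])
    }


def merge(records: list[dict]) -> dict:
    """Merge mapped records; earlier records take priority on conflicts."""
    merged: dict = {}
    for record in records:
        for key, value in record.items():
            if key not in merged:
                # Copy lists so the harvested input is never mutated.
                merged[key] = list(value) if isinstance(value, list) else value
            elif isinstance(merged[key], list) and isinstance(value, list):
                # List-valued properties are combined without duplicates.
                merged[key] += [item for item in value if item not in merged[key]]
    return merged


if __name__ == "__main__":
    cff = to_codemeta("cff", {"title": "SMECS", "keywords": ["codemeta"]})
    github = to_codemeta("github", {"description": "A web app", "topics": ["curation"]})
    print(merge([cff, github]))
```

Keeping each crosswalk as a separate table is what makes the approach modular: supporting another source would only require one more mapping.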

|
3. **Curation Phase**
__________________________________________________________
The Curation phase allows users to edit and refine the extracted metadata. The metadata are displayed in a form-based interface organized into four main tabs:
   #. General Information
   #. Provenance
   #. Related Persons
   #. Technical Aspects

Key visualization and curation features include:
   - **Metadata Visualization & User-Friendly Interface:** Metadata is displayed in a structured, easy-to-read format. The interface is intuitive, responsive, and allows smooth navigation through metadata fields.
   - **Missing Metadata Identification:** SMECS flags fields where metadata is absent.
   - **Required Metadata Properties:** Certain fields are marked as mandatory to ensure completeness of the final output.
   - **Editable Fields:** Users can directly edit or correct metadata within the interface.
   - **Tagging Feature:** Some fields allow multiple values for better metadata organization.
   - **Suggestion Lists:** For selected fields, SMECS provides suggestions to reduce manual input and ensure consistency.
   - **Form-to-JSON Synchronization:** Updates in the form are mirrored in the JSON view (one-directional) so users can track changes instantly.
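The flagging of missing and required fields described above could look roughly like the following sketch. The set of required properties and the function name ``find_gaps`` are hypothetical; which fields SMECS actually marks as mandatory is not stated here.

```python
# Hypothetical set of mandatory properties, chosen for illustration only.
REQUIRED = ("name", "description", "author")


def is_empty(value) -> bool:
    """Treat None, empty strings, and empty containers as missing."""
    return value is None or value == "" or value == [] or value == {}


def find_gaps(metadata: dict, fields: list[str], required=REQUIRED):
    """Return (missing, blocking): all empty fields and the required ones among them."""
    missing = [field for field in fields if is_empty(metadata.get(field))]
    blocking = [field for field in missing if field in required]
    return missing, blocking


if __name__ == "__main__":
    extracted = {"name": "SMECS", "description": "", "keywords": ["codemeta"]}
    form_fields = ["name", "description", "author", "keywords", "license"]
    print(find_gaps(extracted, form_fields))
```

In a form, the first list would drive the "missing" highlighting, while a non-empty second list would block the export until the user fills in the required fields.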


4. **Export Phase**
_________________________________________________________
In the Export phase, the curated metadata can be downloaded as a CodeMeta 3.0–compliant JSON file. Users can:
     - Include this file in their repository to make their research software more FAIR
     - Use it for other purposes, such as uploading metadata to a software registry
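As a rough picture of the exported file, the sketch below serializes curated metadata as JSON-LD with the CodeMeta 3.0 context. The function name ``render_codemeta`` is hypothetical and the example values are taken from this README. The real export may contain more properties and a different key order.

```python
import json

# JSON-LD context URL of the CodeMeta 3.0 release.
CODEMETA_CONTEXT = "https://w3id.org/codemeta/3.0"


def render_codemeta(metadata: dict) -> str:
    """Serialize curated metadata as a CodeMeta JSON-LD document."""
    document = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    # Curated values must not override the JSON-LD keywords set above.
    document.update({k: v for k, v in metadata.items() if not k.startswith("@")})
    return json.dumps(document, indent=2, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    curated = {
        "name": "Software Metadata Extraction and Curation Software (SMECS)",
        "codeRepository": "https://github.com/NFDI4Energy/SMECS",
        "license": "https://spdx.org/licenses/AGPL-3.0-or-later",
    }
    print(render_codemeta(curated))
```

Saving the returned text as ``codemeta.json`` in the repository root is the usual way to make the file discoverable by other tools.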
  
|
|
Installation and Usage
__________________________________________________________
Install from GitHub
-------------------

* Cloning the repository
.. code-block:: shell

   git clone https://github.com/NFDI4Energy/SMECS.git

* Creating virtual environment
     * Ensure that `Python 3.10 or higher `_ is installed on your system.
         - **Windows:** Check the version with ``py --version``. 
         - **Unix/MacOS:** Check the version with ``python3 --version``.
     * Create the virtual environment.
         * **Windows:** 
         .. code-block:: shell

            py -m venv my-env

         * **Unix/MacOS:**
         .. code-block:: shell

          python3 -m venv my-env

       | For more details, visit `Creation of virtual environments `_

     * Activate virtual environment.
         * **Windows:**
         .. code-block:: shell

          my-env\Scripts\activate

         * **Unix/MacOS:**
         .. code-block:: shell

          source my-env/bin/activate


       (Note that activating the virtual environment changes the shell's prompt to show which virtual environment is in use.)

* Managing Packages with pip
   * Ensure you can run pip from the command line.
      * **Windows:**
      .. code-block:: shell

         py -m pip --version

      * **Unix/MacOS:**
      .. code-block:: shell         
         
         python3 -m pip --version

   * Install the requirements listed in *requirements.txt*.
         * **Windows:** 
         .. code-block:: shell

          py -m pip install -r requirements.txt

         * **Unix/MacOS:** 
         .. code-block:: shell

          python3 -m pip install -r requirements.txt

   | For more details, visit `Installing Packages `_
|   
|
* **Running the project**
    * Open the project in an editor (e.g. VS Code).
    * Start the development server from the project directory.
        * **Windows:** 
        .. code-block:: shell

          py manage.py runserver

        * **Unix/MacOS:** 
        .. code-block:: shell

          python3 manage.py runserver

* To view the application, open the link shown in the terminal in your browser (e.g. http://127.0.0.1:8000/).
|
|
Install through Docker
----------------------
To get started with SMECS using Docker, follow the steps below:

* Prerequisites
   * Make sure `Docker `_  is installed on your local machine.

* Cloning the Repository
.. code-block:: shell

   git clone https://github.com/NFDI4Energy/SMECS.git

* Navigate to the Project Directory
.. code-block:: shell

   cd SMECS

* Building the Docker Images
.. code-block:: shell

   docker-compose build

* Starting the Services
.. code-block:: shell

   docker-compose up

* Accessing the Application
   * Navigate to ``http://localhost:8000`` in your web browser.

* Stopping the Services
.. code-block:: shell

   docker-compose down
|
| **Setting Up GitLab/GitHub Personal Token**
| To enhance the functionality of this program and ensure secure interactions with the GitLab/GitHub API, users should provide their personal access token where possible (for some repositories it is required, see the Start phase). The following resources explain how to create a token:

* Generate a GitLab Token:
    * Visit `Create a personal access token `_ for more information on how to generate a new token.
* Generate a GitHub Token:
    * Visit `Managing your personal access tokens `_ for more information on how to generate a new token.
|
| **Tip for developers**
| If the page does not refresh correctly, clear the browser cache. You can force Chrome to pull in new data and ignore the saved ("cached") data by using the keyboard shortcut ``Cmd+Shift+R`` on Mac, and ``Ctrl+F5`` or ``Ctrl+Shift+R`` on Windows. 
|
Collaboration
__________________________________________________________
| We believe in the power of collaboration and welcome contributions from the community to enhance the SMECS workflow. Whether you have found a bug, have a feature idea, or want to share feedback, your contribution matters. Feel free to submit a pull request, open up an issue, or reach out with any questions or concerns.
|
| To see upcoming features in SMECS, please refer to our `open issues `_.
| To stay updated on upcoming changes to the `HERMES GitHub and GitLab Plugin `_, visit the `project’s issues page `_. If you have questions, suggestions, or feedback, or need to report a bug, please open a new issue `there `_.
|
License and Citation
__________________________________________________________
| The code is licensed under the **GNU Affero General Public License v3.0 or later** (AGPL-3.0-or-later).
| See `LICENSE.txt `_ for further information.

|
Acknowledgements
__________________________________________________________
We would like to thank `meta_tool `_ for providing the foundational framework upon which this project is built.


.. |badge_license| image:: https://img.shields.io/github/license/rl-institut/meta_tool
    :target: LICENSE.txt
    :alt: License

.. |badge_contributing| image:: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat
    :alt: contributions

.. |badge_repo_counts| image:: http://hits.dwyl.com/rl-institut/meta_tool.svg
    :alt: counter

.. |badge_contributors| image:: https://img.shields.io/badge/all_contributors-1-orange.svg?style=flat-square
    :alt: contributors

.. |badge_issue_open| image:: https://img.shields.io/github/issues-raw/rl-institut/meta_tool
    :alt: open issues

.. |badge_issue_closes| image:: https://img.shields.io/github/issues-closed-raw/rl-institut/meta_tool
    :alt: closed issues

.. |badge_pr_open| image:: https://img.shields.io/github/issues-pr-raw/rl-institut/meta_tool
    :alt: open pull requests

.. |badge_pr_closes| image:: https://img.shields.io/github/issues-pr-closed-raw/rl-institut/meta_tool
    :alt: closed pull requests
    

Owner

  • Name: NFDI4Energy
  • Login: NFDI4Energy
  • Kind: organization
  • Email: nfdi4energy@uol.de
  • Location: Germany

National Research Data Infrastructure for the Interdisciplinary Energy System Research

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.

cff-version: 1.2.0
title: Software Metadata Extraction and Curation Software (SMECS)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Stephan
    family-names: Ferenz
    email: stephan.ferenz@uol.de
  - given-names: Aida
    family-names: Jafarbigloo
    email: aida.jafarbigloo.274@gmail.com
identifiers:
  - type: url
    value: 'https://github.com/NFDI4Energy/SMECS'
    description: GitHub link to the project
repository-code: 'https://github.com/NFDI4Energy/SMECS'
abstract: >-
  SMECS is a web-based tool designed to extract and curate
  research software metadata in alignment with the codemeta 
  software metadata standard. By accessing repositories on 
  platforms like GitHub and GitLab, SMECS extracts
  already existing metadata in the harvesting phase.
  In the curation phase, its intuitive graphical interface
  allows researchers to easily improve and change the retrieved
  metadata to best represent their software.
  Ultimately, SMECS allows users to export the curated metadata in
  JSON format in line with the codemeta standard.
keywords:
  - software metadata
  - extraction
  - curation
  - repository
  - research software
  - codemeta
license: AGPL-3.0-or-later

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "796699799",
  "name": "Software Metadata Extraction and Curation Software (SMECS)",
  "description": "SMECS is a web-based tool designed to extract and curate research software metadata in alignment with the codemeta software metadata standard. By accessing repositories on platforms like GitHub and GitLab, SMECS extracts already existing metadata in the harvesting phase. In the curation phase, its intuitive graphical interface allows researchers to easily improve and change the retrieved metadata to best represent their software. Ultimately, SMECS allows users to export the curated metadata in JSON format in line with the codemeta standard.",
  "codeRepository": "https://github.com/NFDI4Energy/SMECS",
  "issueTracker": "https://github.com/NFDI4Energy/SMECS/issues",
  "license": "https://spdx.org/licenses/AGPL-3.0",
  "contributor": [
    {
      "id": "_:contributor_1",
      "type": "Person",
      "email": "stephan.ferenz@uol.de",
      "familyName": "Ferenz",
      "givenName": "Stephan"
    },
    {
      "id": "_:contributor_2",
      "type": "Person",
      "email": "aida.jafarbigloo.274@gmail.com",
      "familyName": "Jafarbigloo",
      "givenName": "Aida"
    },
    {
      "id": "_:contributor_3",
      "type": "Person",
      "email": "sundraiz.shah@gmail.com",
      "familyName": "Sundraiz Shah",
      "givenName": "Syed"
    }
  ],
  "author": [
    {
      "id": "_:author_1",
      "type": "Person",
      "email": "stephan.ferenz@uol.de",
      "familyName": "Ferenz",
      "givenName": "Stephan"
    },
    {
      "id": "_:author_2",
      "type": "Person",
      "email": "aida.jafarbigloo.274@gmail.com",
      "familyName": "Jafarbigloo",
      "givenName": "Aida"
    }
  ],
  "programmingLanguage": [
    "Python",
    "HTML",
    "CSS",
    "JavaScript",
    "Dockerfile",
    "Shell"
  ],
  "keywords": [
    "software metadata",
    "extraction",
    "curation",
    "repository",
    "research software",
    "codemeta"
  ],
  "dateCreated": "2024-05-06",
  "downloadUrl": "https://github.com/NFDI4Energy/SMECS/releases",
  "citation": "Ferenz, S., & Jafarbigloo, A. Software Metadata Extraction and Curation Software (SMECS) [Computer software]. https://github.com/NFDI4Energy/SMECS"
}

GitHub Events

Total
  • Create event: 26
  • Release event: 3
  • Issues event: 112
  • Watch event: 9
  • Delete event: 17
  • Member event: 1
  • Issue comment event: 91
  • Push event: 213
  • Pull request review comment event: 30
  • Pull request review event: 35
  • Pull request event: 34
Last Year
  • Create event: 26
  • Release event: 3
  • Issues event: 112
  • Watch event: 9
  • Delete event: 17
  • Member event: 1
  • Issue comment event: 91
  • Push event: 213
  • Pull request review comment event: 30
  • Pull request review event: 35
  • Pull request event: 34

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 77
  • Total pull requests: 23
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 2
  • Total pull request authors: 5
  • Average comments per issue: 0.49
  • Average comments per pull request: 0.7
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 70
  • Pull requests: 23
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 1 month
  • Issue authors: 2
  • Pull request authors: 5
  • Average comments per issue: 0.37
  • Average comments per pull request: 0.7
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sferenz (120)
  • Aidajafarbigloo (38)
  • zyzzyxdonta (2)
Pull Request Authors
  • Aidajafarbigloo (16)
  • sferenz (14)
  • Sundraiz-Shah (8)
  • dependabot[bot] (2)
  • wejowen (1)
  • smutyala1at (1)
Top Labels
Issue Labels
back end (44) front end (43) short-term (15) gitlab merge request (15) todo (8) extraction (7) other tools (7) documentation (6) mid-term (6) bug (3) good first issue (3) long-term (3) UX (2) testing (2) review (2) enhancement (1) wait (1) doing (1) thesis (1) discuss (1) security alert (1) tagging (1)
Pull Request Labels
documentation (4) bug (3) front end (3) gitlab merge request (3) back end (2) dependencies (2) extraction (1) short-term (1) testing (1) enhancement (1)

Dependencies

compose/Dockerfile docker
  • python 3.8.0-slim build
requirements.txt pypi
  • GitPython ==3.1.31
  • configobj ==5.0.6
  • django ==4.0
  • django_jsonforms ==1.1.2
  • environs ==9.3.5
  • gunicorn ==20.1.0
  • psycopg2-binary ==2.9.2
  • pyld ==2.0.3
  • python-gitlab ==3.13.0
  • sqlahelper ==1.0
  • sqlalchemy ==1.4.28
  • sqlalchemy_utils ==0.37.9