gimie

Extract linked metadata from repositories

https://github.com/sdsc-ordes/gimie

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    3 of 10 committers (30.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary

Keywords

cli fair-data git library linked-open-data metadata-extraction scientific-software

Repository

Statistics
  • Stars: 13
  • Watchers: 3
  • Forks: 2
  • Open Issues: 10
  • Releases: 9
Created about 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

gimie

Gimie (GIt Meta Information Extractor) is a Python library and command-line tool to extract structured metadata from git repositories.

Context

Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to make it easy to extract structured metadata from any git repository. It can extract metadata from the Git provider (GitHub or GitLab) or from the git index itself.


Using Gimie: easy peasy, it's a 3-step process.

1: Installation

To install the stable version on PyPI:

```shell
pip install gimie
```

To install the dev version from GitHub:

```shell
pip install git+https://github.com/sdsc-ordes/gimie.git@main#egg=gimie
```

Gimie is also available as a Docker container hosted on the GitHub container registry:

```shell
docker pull ghcr.io/sdsc-ordes/gimie:latest

# The access token can be provided as an environment variable
docker run -e GITHUB_TOKEN=$GITHUB_TOKEN ghcr.io/sdsc-ordes/gimie:latest gimie data <repository-url>
```

2: Set your credentials

To access the GitHub API, you need to provide a GitHub token with the read:org scope.

A. Create access tokens

New to access tokens? Or don't know how to get your GitHub / GitLab token?

Have no fear: the GitHub and GitLab documentation each explain how to create a personal access token. (Note: tokens are as precious as passwords! Treat them as such.)

B. Set your access tokens via the Terminal

Gimie will use your access tokens to gather information for you. If you want info about a GitHub repo, Gimie needs your GitHub token; if you want info about a GitLab project, Gimie needs your GitLab token.

Add your tokens one by one in your terminal. Your GitHub token:

```bash
export GITHUB_TOKEN=<your-github-token>
```

and/or your GitLab token:

```bash
export GITLAB_TOKEN=<your-gitlab-token>
```
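Before running Gimie, it can help to confirm that the token variables are actually visible to the process. The following is a minimal sketch using only the Python standard library; the helper name `missing_tokens` and the provider keys are illustrative assumptions and are not part of Gimie's API. Only the variable names GITHUB_TOKEN and GITLAB_TOKEN come from the instructions above.

```python
import os

# Environment variable names taken from the export commands above.
TOKEN_VARS = {"github": "GITHUB_TOKEN", "gitlab": "GITLAB_TOKEN"}


def missing_tokens(providers, environ=None):
    """Return the token variable names that are unset or empty.

    `providers` is a list of keys from TOKEN_VARS. `environ` defaults to
    the real process environment; a plain dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    return [TOKEN_VARS[p] for p in providers if not env.get(TOKEN_VARS[p])]


if __name__ == "__main__":
    missing = missing_tokens(["github", "gitlab"])
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Both tokens are set.")
```

Only the token for the provider you intend to query is needed, so a missing GITLAB_TOKEN is harmless when you only work with GitHub repositories.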

3: GIMIE info! Run Gimie

As a command line tool

```shell
gimie data https://github.com/numpy/numpy
```

(Want a GitLab project instead? Just replace the URL in the command line.)

As a Python library

```python
from gimie.project import Project

proj = Project("https://github.com/numpy/numpy")

# To retrieve the rdflib.Graph object
g = proj.extract()

# To retrieve the serialized graph
g_in_ttl = g.serialize(format='ttl')
print(g_in_ttl)
```

For more advanced use see the documentation.

Outputs

The default output is Turtle, a textual syntax for the RDF data model. We follow the schema recommended by codemeta. Supported formats are turtle, json-ld and n-triples, selected with the --format argument (e.g. gimie data https://github.com/numpy/numpy --format 'ttl').

With no further options, Gimie prints results to the terminal. Want to save Gimie output to a file? Redirect the output to your file path at the end of the command: gimie data https://github.com/numpy/numpy > path_to_output/gimie_output.ttl
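The same redirection can be done from a Python script by calling the command line tool and writing its standard output to a file. This is a sketch under assumptions: the helper names `build_gimie_command` and `save_metadata` are hypothetical, and only the `gimie data <url> --format 'ttl'` invocation is taken from the text above. The exact strings accepted by --format for the other formats are not confirmed here.

```python
import subprocess
from pathlib import Path


def build_gimie_command(url, fmt="ttl"):
    """Build the argument list for `gimie data <url> --format <fmt>`."""
    return ["gimie", "data", url, "--format", fmt]


def save_metadata(url, out_path, fmt="ttl"):
    """Run gimie and write its standard output to out_path.

    Requires gimie to be installed and the relevant access token to be
    set in the environment.
    """
    result = subprocess.run(
        build_gimie_command(url, fmt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    path = Path(out_path)
    path.write_text(result.stdout, encoding="utf-8")
    return path
```

For example, `save_metadata("https://github.com/numpy/numpy", "gimie_output.ttl")` would mirror the shell redirection shown above. Passing the arguments as a list avoids shell quoting issues with the URL.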


Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the NumPy style guide.

The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use pytest as our testing framework. This project uses pyproject.toml to define package information, requirements and tooling configuration.

For development:

activate a conda or virtual environment with Python 3.8 or higher, then clone and install:

```shell
git clone https://github.com/sdsc-ordes/gimie && cd gimie
make install
```

run tests:

```shell
make test
```

run checks:

```shell
make check
```

for easier use of the GitHub/GitLab APIs, place your access tokens in the .env file (and don't worry, the .gitignore will ignore them when you push to GitHub):

```shell
cp .env.dist .env
```

build documentation:

```shell
make doc
```

Releases and Publishing on PyPI

Releases are done via GitHub releases.

  • A release triggers a GitHub workflow that publishes the package on PyPI.
  • Make sure to update the version in pyproject.toml and conf.py before making the release.
  • Publishing can be tested on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'.
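Since the version has to be updated in two files before a release, a small consistency check can catch a forgotten bump. The sketch below is not part of the project: it assumes that pyproject.toml contains a line of the form `version = "..."` and that conf.py uses the Sphinx `release = "..."` variable, neither of which is confirmed by this README. The version string in the usage note is a placeholder.

```python
import re


def assigned_string(name, text):
    """Return the quoted value of the first line `name = "..."` in text.

    Returns None when no such line-initial assignment is found.
    """
    pattern = rf"""^{re.escape(name)}\s*=\s*["']([^"']+)["']"""
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(pyproject_text, conf_text):
    """True when both files declare the same, non-empty version string.

    Assumes `version` in pyproject.toml and `release` in conf.py.
    """
    version = assigned_string("version", pyproject_text)
    return version is not None and version == assigned_string("release", conf_text)
```

With the file contents read into strings, `versions_match('version = "1.2.3"\n', 'release = "1.2.3"\n')` returns True, while a mismatch or a missing assignment returns False. A regular expression is used instead of a TOML parser so the sketch also runs on Python 3.8.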

Owner

  • Name: Swiss Data Science Center - ORD
  • Login: sdsc-ordes
  • Kind: organization
  • Location: Switzerland

Open Research Data team at the Swiss Data Science Center.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: gimie
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Cyril
    family-names: Matthey-Doret
    affiliation: Swiss Data Science Center
    orcid: 'https://orcid.org/0000-0002-1126-1535'
  - given-names: Sabine
    family-names: Maennel
    orcid: 'https://orcid.org/0009-0001-3022-8239'
    affiliation: Swiss Data Science Center
  - given-names: Robin
    family-names: Franken
    orcid: 'https://orcid.org/0009-0008-0143-9118'
    affiliation: Swiss Data Science Center
  - given-names: Martin
    family-names: Fontanet
    orcid: 'https://orcid.org/0000-0002-6441-8540'
    affiliation: Swiss Data Science Center
  - given-names: Laure
    family-names: Vancauwenberghe
    affiliation: Swiss Data Science Center
  - given-names: Stefan
    family-names: Milosavljevic
    email: supermegaiperste@hotmail.com
    affiliation: Swiss Data Science Center
repository-code: 'https://github.com/sdsc-ordes/gimie'
abstract: Extract linked metadata from repositories
keywords:
  - git
  - cli
  - library
  - linked-open-data
  - metadata-extraction
  - fair-data
  - scientific-software
license: Apache-2.0

GitHub Events

Total
  • Create event: 16
  • Release event: 2
  • Issues event: 4
  • Watch event: 9
  • Delete event: 1
  • Issue comment event: 8
  • Push event: 68
  • Pull request review comment event: 15
  • Pull request review event: 24
  • Pull request event: 19
  • Fork event: 1
Last Year
  • Create event: 16
  • Release event: 2
  • Issues event: 4
  • Watch event: 9
  • Delete event: 1
  • Issue comment event: 8
  • Push event: 68
  • Pull request review comment event: 15
  • Pull request review event: 24
  • Pull request event: 19
  • Fork event: 1

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 236
  • Total Committers: 10
  • Avg Commits per committer: 23.6
  • Development Distribution Score (DDS): 0.669
Past Year
  • Commits: 143
  • Committers: 9
  • Avg Commits per committer: 15.889
  • Development Distribution Score (DDS): 0.615
Top Committers
Name Email Commits
cmdoret c****t@g****m 78
Sabine Maennel s****l@g****m 36
Martin Fontanet m****t@e****h 28
Laure Vancau l****o@s****r 24
Cyril Matthey-Doret c****t@m****g 19
rmfranken 7****n 19
Cyril Matthey-Doret c****t@e****h 15
Laure Vancau l****e@e****h 7
Stefan Milosavljevic s****n@o****l 5
supermaxiste s****e@h****m 5
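The all-time figures above can be reproduced from the commit counts in the table. The sketch assumes that the Development Distribution Score is defined as 1 minus the share of commits made by the single most active committer identity; this page does not state the definition, but it is consistent with the reported value. The function names are illustrative.

```python
def average_commits(total_commits, committers):
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    return 1 - top_committer_commits / total_commits


# All time: 236 commits, 10 committers, top committer (cmdoret) with 78.
print(round(average_commits(236, 10), 3))                 # 23.6
print(round(development_distribution_score(78, 236), 3))  # 0.669

# Past year: 143 commits by 9 committers.
print(round(average_commits(143, 9), 3))                  # 15.889
```

The past-year score of 0.615 cannot be recomputed here because the table lists only all-time commit counts per committer.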
Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 87
  • Total pull requests: 117
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 6
  • Total pull request authors: 8
  • Average comments per issue: 1.29
  • Average comments per pull request: 0.85
  • Merged pull requests: 106
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 13
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 24 hours
  • Issue authors: 1
  • Pull request authors: 4
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.69
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rmfranken (5)
  • cmdoret (3)
Pull Request Authors
  • rmfranken (14)
  • cmdoret (10)
  • caviri (2)
  • raj921 (1)
  • vancauwe (1)
Top Labels
Issue Labels
enhancement (2) good first issue (2) bug (2) wontfix (2) refactor (1) documentation (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 49 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 10
  • Total maintainers: 3
pypi.org: gimie

Extract structured metadata from git repositories.

  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 49 Last month
Rankings
Dependent packages count: 10.0%
Downloads: 18.7%
Average: 20.6%
Dependent repos count: 21.7%
Stargazers count: 23.1%
Forks count: 29.8%
Maintainers (3)
Last synced: 4 months ago