sparql-profiler
✨ A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: organization maastrichtu-ids has institutional domain (www.maastrichtuniversity.nl)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.1%) to scientific vocabulary
Keywords
Repository
✨ A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph.
This package follows the recommendations defined by the HCLS Community Profile (Health Care and Life Sciences) to generate the metadata about the content of a SPARQL endpoint.
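In practice, profiling amounts to running aggregate SPARQL queries against the endpoint. A minimal, illustrative sketch of the idea (the function name and query below are hypothetical, not the exact queries sparql-profiler runs):

```python
def class_count_query(limit: int = 10) -> str:
    """Build an aggregate SPARQL query counting instances per class.

    Illustrative only: the actual HCLS profiling queries are more
    elaborate (per-graph counts, predicate partitions, etc.).
    """
    return (
        "SELECT ?class (COUNT(?s) AS ?count) "
        "WHERE { ?s a ?class } "
        f"GROUP BY ?class ORDER BY DESC(?count) LIMIT {limit}"
    )
```

Such a query string would then be sent to the endpoint, for example with SPARQLWrapper, one of the package's declared dependencies.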
📦️ Installation
This package requires Python >=3.7. Install it with:
```shell
pip install sparql-profiler
```
🪄 Usage
⌨️ Use as a command-line interface
You can easily use the sparql-profiler from your terminal after installing with pip.
Run profiling
Quickly profile a small SPARQL endpoint to generate HCLS descriptive metadata for each graph:
```bash
sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg
```
Profiling a bigger SPARQL endpoint will take more time:
```bash
sparql-profiler profile https://bio2rdf.org/sparql
```
Display more debugging logs with -l debug:
```bash
sparql-profiler profile https://bio2rdf.org/sparql -l debug
```
Use the profiling method specific to Bio2RDF with the `--profiler` option:
```bash
sparql-profiler profile https://bio2rdf.org/sparql --profiler bio2rdf
```
You can also provide additional metadata about the dataset distribution (description, license, etc.) by answering interactive questions, using the `-q` flag:
```bash
sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg -q
```
Help
See all options for the profile command with:
```bash
sparql-profiler profile --help
```
Get a full rundown of all available commands with:
```bash
sparql-profiler --help
```
🐍 Use with Python
Use the sparql-profiler in Python scripts:
```python
from sparql_profiler import SparqlProfiler

sp = SparqlProfiler("https://graphdb.dumontierlab.com/repositories/umids-kg")
print(sp.metadata.serialize(format="turtle"))
```
🧑💻 Development setup
This final section of the README is for those who want to run the package in development and get involved by making a code contribution.
📥️ Clone
Clone the repository:
```bash
git clone https://github.com/MaastrichtU-IDS/sparql-profiler
cd sparql-profiler
```
🐣 Install dependencies
Install Hatch; it will automatically handle virtual environments and make sure all dependencies are installed when you run a script in the project:
```bash
pip install --upgrade hatch
```
Install the dependencies in a local virtual environment:
```bash
hatch -v env create
```
Alternatively, if you are already managing the virtual environment yourself, or installing in a Docker container, you can use:
```bash
pip install -e ".[test,dev]"
```
🏗️ Run in development
You can easily run the sparql-profiler in your terminal with hatch while in development to profile a specific SPARQL endpoint:
```bash
hatch run sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg
```
☑️ Run tests
Make sure the existing tests still pass by running pytest. Note that any pull request to this repository on GitHub will automatically trigger the test suite:
```bash
hatch run test
```
To display all `print()` outputs:
```bash
hatch run test -s
```
🧹 Code formatting
The code will be automatically formatted when you commit your changes, using pre-commit. You can also format the code manually:
```bash
hatch run fmt
```
Check the code for errors and PEP 8 compliance by running flake8 and mypy:
```bash
hatch run check
```
♻️ Reset the environment
In case you are facing issues with dependencies not updating properly, you can easily reset the virtual environment with:
```bash
hatch env prune
```
🏷️ New release process
The deployment of new releases is done automatically by a GitHub Action workflow when a new release is created on GitHub. To release a new version:
- Make sure the `PYPI_TOKEN` secret has been defined in the GitHub repository (in Settings > Secrets > Actions). You can get an API token from PyPI at pypi.org/manage/account.
- Increment the `version` number in the `pyproject.toml` file in the root folder of the repository.
- Create a new release on GitHub, which will automatically trigger the publish workflow and publish the new release to PyPI.
You can also manually trigger the workflow from the Actions tab in your GitHub repository webpage.
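The version bump in the second step above can be scripted. A stdlib-only sketch, assuming `pyproject.toml` contains a standard `version = "X.Y.Z"` line (the helper name is hypothetical):

```python
import re


def bump_patch(pyproject_text: str) -> str:
    """Increment the patch component of the first version = "X.Y.Z" line."""
    def repl(match: re.Match) -> str:
        major, minor, patch = match.groups()
        return f'version = "{major}.{minor}.{int(patch) + 1}"'

    return re.sub(
        r'version\s*=\s*"(\d+)\.(\d+)\.(\d+)"', repl, pyproject_text, count=1
    )
```

Read `pyproject.toml`, pass its contents through this helper, and write it back before creating the GitHub release.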
Notes
SPARQL profiler of the Umaka viewer (https://umaka-viewer.dbcls.jp/, code at https://github.com/dbcls/umakaparser), written in Java, which should be able to handle large graphs: https://bitbucket.org/yayamamo/tripledataprofiler/src/master/src/jp/ac/rois/dbcls/TripleDataProfiler.java
Run:
```bash
java -jar TripleDataProfiler.jar -ep https://bio2rdf.org/sparql
```
Build:
```bash
javac -cp commons-cli-1.2.jar:commons-lang3-3.3.2.jar:apache-jena-2.11.1/lib/*:./src ./src/jp/ac/rois/dbcls/TripleDataProfiler.java
```
Note that building it requires manually tracking down the deprecated Jena versions listed in the classpath, as no Maven or other build configuration is provided, which makes the tool difficult to reuse.
Owner
- Name: Maastricht University IDS
- Login: MaastrichtU-IDS
- Kind: organization
- Email: info-ids@maastrichtuniversity.nl
- Location: Maastricht, Netherlands
- Website: https://www.maastrichtuniversity.nl/research/institute-data-science
- Twitter: um_ids
- Repositories: 191
- Profile: https://github.com/MaastrichtU-IDS
Institute of Data Science at Maastricht University
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- orcid: https://orcid.org/0000-0002-1501-1082
email: vincent.emonet@gmail.com
given-names: Vincent Emonet
affiliation: Institute of Data Science, Maastricht University
- email: maryam.mohammadi@maastrichtuniversity.nl
given-names: Maryam Mohammadi
affiliation: Institute of Data Science, Maastricht University
# orcid: https://orcid.org/0000-0002-1501-1082
title: "SPARQL endpoint profiler"
repository-code: https://github.com/MaastrichtU-IDS/sparql-profiler
date-released: 2022-12-20
url: https://pypi.org/project/sparql-profiler
# doi: 10.48550/arXiv.2206.13787
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/autobuild v2 composite
- github/codeql-action/init v2 composite
- SPARQLWrapper *
- rdflib >=6.1.1
- requests *
- typer >=0.6.0