sparql-profiler
✨ A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: organization maastrichtu-ids has institutional domain (www.maastrichtuniversity.nl)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.1%) to scientific vocabulary
Keywords
Repository
✨ A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph.
This package follows the recommendations defined by the HCLS Community Profile (Health Care and Life Sciences) to generate the metadata about the content of a SPARQL endpoint.
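In practice, profiling amounts to running aggregate SPARQL queries against the endpoint. A minimal, illustrative sketch of the idea (the function name and query below are hypothetical, not the exact queries sparql-profiler runs):

```python
def class_count_query(limit: int = 10) -> str:
    """Build an aggregate SPARQL query counting instances per class.

    Illustrative only: the actual HCLS profiling queries are more
    elaborate (per-graph counts, predicate partitions, etc.).
    """
    return (
        "SELECT ?class (COUNT(?s) AS ?count) "
        "WHERE { ?s a ?class } "
        f"GROUP BY ?class ORDER BY DESC(?count) LIMIT {limit}"
    )
```

Such a query string would then be sent to the endpoint, for example with SPARQLWrapper, one of the package's declared dependencies.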
📦️ Installation
This package requires Python >=3.7. Install it with:
```shell
pip install sparql-profiler
```
🪄 Usage
⌨️ Use as a command-line interface
You can easily use the sparql-profiler from your terminal after installing with pip.
Run profiling
Quickly profile a small SPARQL endpoint to generate HCLS descriptive metadata for each graph:
```bash
sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg
```
Profiling a bigger SPARQL endpoint will take more time:
```bash
sparql-profiler profile https://bio2rdf.org/sparql
```
Display more debugging logs with -l debug:
```bash
sparql-profiler profile https://bio2rdf.org/sparql -l debug
```
Use the profiling method specific to Bio2RDF with the `--profiler` option:
```bash
sparql-profiler profile https://bio2rdf.org/sparql --profiler bio2rdf
```
You can also provide additional metadata about the dataset distribution (description, license, etc.) by answering interactive questions, using the `-q` flag:
```bash
sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg -q
```
Help
See all options for the profile command with:
```bash
sparql-profiler profile --help
```
Get a full rundown of all available commands with:
```bash
sparql-profiler --help
```
🐍 Use with Python
Use the sparql-profiler in Python scripts:
```python
from sparql_profiler import SparqlProfiler

sp = SparqlProfiler("https://graphdb.dumontierlab.com/repositories/umids-kg")
print(sp.metadata.serialize(format="turtle"))
```
🧑💻 Development setup
This final section of the README is for those who want to run the package in development and get involved by making a code contribution.
📥️ Clone
Clone the repository:
```bash
git clone https://github.com/MaastrichtU-IDS/sparql-profiler
cd sparql-profiler
```
🐣 Install dependencies
Install Hatch; it will automatically handle virtual environments and make sure all dependencies are installed when you run a script in the project:
```bash
pip install --upgrade hatch
```
Install the dependencies in a local virtual environment:
```bash
hatch -v env create
```
Alternatively, if you are already managing the virtual environment yourself, or installing in a Docker container, you can use:
```bash
pip install -e ".[test,dev]"
```
🏗️ Run in development
You can easily run the sparql-profiler in your terminal with hatch while in development to profile a specific SPARQL endpoint:
```bash
hatch run sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg
```
☑️ Run tests
Make sure the existing tests still pass by running pytest. Note that any pull request to this repository on GitHub will automatically trigger the test suite:
```bash
hatch run test
```
To display all `print()` outputs:
```bash
hatch run test -s
```
🧹 Code formatting
The code will be automatically formatted when you commit your changes, using pre-commit. You can also format the code manually:
```bash
hatch run fmt
```
Check the code for errors and PEP 8 compliance by running flake8 and mypy:
```bash
hatch run check
```
♻️ Reset the environment
In case you are facing issues with dependencies not updating properly, you can easily reset the virtual environment with:
```bash
hatch env prune
```
🏷️ New release process
The deployment of new releases is done automatically by a GitHub Action workflow when a new release is created on GitHub. To release a new version:
- Make sure the `PYPI_TOKEN` secret has been defined in the GitHub repository (in Settings > Secrets > Actions). You can get an API token from PyPI at pypi.org/manage/account.
- Increment the `version` number in the `pyproject.toml` file in the root folder of the repository.
- Create a new release on GitHub, which will automatically trigger the publish workflow and publish the new release to PyPI.
You can also manually trigger the workflow from the Actions tab in your GitHub repository webpage.
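The version bump in the second step above can be scripted. A stdlib-only sketch, assuming `pyproject.toml` contains a standard `version = "X.Y.Z"` line (the helper name is hypothetical):

```python
import re


def bump_patch(pyproject_text: str) -> str:
    """Increment the patch component of the first version = "X.Y.Z" line."""
    def repl(match: re.Match) -> str:
        major, minor, patch = match.groups()
        return f'version = "{major}.{minor}.{int(patch) + 1}"'

    return re.sub(
        r'version\s*=\s*"(\d+)\.(\d+)\.(\d+)"', repl, pyproject_text, count=1
    )
```

Read `pyproject.toml`, pass its contents through this helper, and write it back before creating the GitHub release.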
Notes
SPARQL profiler of the Umaka viewer (https://umaka-viewer.dbcls.jp/, code at https://github.com/dbcls/umakaparser), written in Java, which should be able to handle large graphs: https://bitbucket.org/yayamamo/tripledataprofiler/src/master/src/jp/ac/rois/dbcls/TripleDataProfiler.java
Run:
```bash
java -jar TripleDataProfiler.jar -ep https://bio2rdf.org/sparql
```
Build:
```bash
javac -cp commons-cli-1.2.jar:commons-lang3-3.3.2.jar:apache-jena-2.11.1/lib/*:./src ./src/jp/ac/rois/dbcls/TripleDataProfiler.java
```
Note that building it requires manually tracking down the deprecated Jena versions listed in the classpath, as no Maven or other build configuration is provided, which makes the tool difficult to reuse.
Owner
- Name: Maastricht University IDS
- Login: MaastrichtU-IDS
- Kind: organization
- Email: info-ids@maastrichtuniversity.nl
- Location: Maastricht, Netherlands
- Website: https://www.maastrichtuniversity.nl/research/institute-data-science
- Twitter: um_ids
- Repositories: 191
- Profile: https://github.com/MaastrichtU-IDS
Institute of Data Science at Maastricht University
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- orcid: https://orcid.org/0000-0002-1501-1082
email: vincent.emonet@gmail.com
given-names: Vincent Emonet
affiliation: Institute of Data Science, Maastricht University
- email: maryam.mohammadi@maastrichtuniversity.nl
given-names: Maryam Mohammadi
affiliation: Institute of Data Science, Maastricht University
# orcid: https://orcid.org/0000-0002-1501-1082
title: "SPARQL endpoint profiler"
repository-code: https://github.com/MaastrichtU-IDS/sparql-profiler
date-released: 2022-12-20
url: https://pypi.org/project/sparql-profiler
# doi: 10.48550/arXiv.2206.13787
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/autobuild v2 composite
- github/codeql-action/init v2 composite
- SPARQLWrapper *
- rdflib >=6.1.1
- requests *
- typer >=0.6.0