https://github.com/constraintautomaton/rule-based-approach-for-source-selection-in-ltqp

A poster paper published at ISWC about using a boolean solver for source selection in link traversal queries

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.0%) to scientific vocabulary

Keywords

article guided-link-traversal link-traversal poster query rdf semantic-web sparql-query tree

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: constraintAutomaton
License: cc-by-4.0
Language: TeX
Default Branch: main
Homepage:
Size: 3.78 MB

Statistics

Stars: 0
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 1

Topics

article guided-link-traversal link-traversal poster query rdf semantic-web sparql-query tree

Created about 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme

Exploring a rule-based approach for source selection in link traversal queries

Poster paper accepted at the 23rd International Semantic Web Conference (ISWC 2024). The experiment repository is available via this hypermedia link.

Abstract

Link Traversal queries face challenges in completeness and long execution time due to the size of the web. Reachability criteria define completeness by restricting the links followed by engines. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures exploitable by query engines to prune irrelevant sources. Current criteria rely on using information from the query definition and predefined predicates. However, it is difficult to use them to traverse environments where logical expressions indicate the location of resources. We propose to use a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary to prune irrelevant sources. Our preliminary findings show that by using this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing the completeness of results. Our work shows that the investigation of hypermedia controls in link pruning of traversal queries is a worthy effort for optimizing web queries of unindexed decentralized databases.
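To make the idea of a rule-based reachability criterion more concrete, here is a minimal Python sketch. It is not the authors' Comunica implementation, and it replaces a general boolean solver with a much simpler technique: interval intersection over a single numeric variable. The relation names ("GreaterThan", "LessThanOrEqual", and so on), the Interval class, and the example.org URLs are all hypothetical and chosen only for illustration. A link is followed only if the query filter and the constraints attached to the link can be satisfied together.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# A hypothetical relation: (kind, value), e.g. ("LessThan", 15).
Relation = Tuple[str, float]


@dataclass(frozen=True)
class Interval:
    """Numeric interval; an open bound excludes its endpoint."""
    low: float = -math.inf
    high: float = math.inf
    low_open: bool = True
    high_open: bool = True

    def intersect(self, other: "Interval") -> "Interval":
        # Keep the tighter lower bound; on a tie, open wins over closed.
        if self.low != other.low:
            tight = self if self.low > other.low else other
            low, low_open = tight.low, tight.low_open
        else:
            low, low_open = self.low, self.low_open or other.low_open
        # Keep the tighter upper bound; on a tie, open wins over closed.
        if self.high != other.high:
            tight = self if self.high < other.high else other
            high, high_open = tight.high, tight.high_open
        else:
            high, high_open = self.high, self.high_open or other.high_open
        return Interval(low, high, low_open, high_open)

    def is_empty(self) -> bool:
        if self.low > self.high:
            return True
        return self.low == self.high and (self.low_open or self.high_open)


def relation_to_interval(kind: str, value: float) -> Interval:
    """Translate one (assumed) comparison relation into an interval."""
    if kind == "GreaterThan":
        return Interval(low=value, low_open=True)
    if kind == "GreaterThanOrEqual":
        return Interval(low=value, low_open=False)
    if kind == "LessThan":
        return Interval(high=value, high_open=True)
    if kind == "LessThanOrEqual":
        return Interval(high=value, high_open=False)
    if kind == "Equal":
        return Interval(value, value, False, False)
    raise ValueError(f"unsupported relation kind: {kind}")


def should_follow(query_filter: Interval, relations: Iterable[Relation]) -> bool:
    """Follow a link unless its constraints contradict the query filter."""
    feasible = query_filter
    for kind, value in relations:
        feasible = feasible.intersect(relation_to_interval(kind, value))
    return not feasible.is_empty()


def select_links(
    query_filter: Interval,
    links: Iterable[Tuple[str, List[Relation]]],
) -> List[str]:
    """Return the URLs that cannot be ruled out for the given filter."""
    return [url for url, rels in links if should_follow(query_filter, rels)]
```

For a query filter such as 10 <= ?t < 20, a link annotated with "GreaterThanOrEqual 20" would be pruned, while a link with no annotation at all is still followed, which is how this sketch avoids discarding sources it cannot reason about.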

Building a PDF

The authors compiled the PDF version using pdflatex, but any LaTeX compiler should work. A makefile is provided to simplify the build: run make main.pdf or make to produce the PDF, provided pdflatex and the required dependencies from the TeX Live suite are installed. A prebuilt PDF is also available in the GitHub releases section.

Conclusion

This paper reported on preliminary tests to add guided link traversal support to the Comunica query engine using a rule-based reachability approach. A similar approach could be applied to other SPARQL query engines supporting Link Traversal Query Processing. Our preliminary results show that our rule-based reachability criterion can significantly reduce the execution time of queries aligned with hypermedia description constraints compared to predicate-based reachability, opening the possibility for faster and more versatile traversal-based query execution over fragmented RDF documents. Our experiment also highlights that the size of the internal data store might have more impact on performance than noted in previous studies. In future work, we will perform more exhaustive evaluations of other types of domain-oriented fragmentation strategies, such as string and geospatial evaluations, and investigate how to generalize our approach to support more expressive online reasoning for online source selection during traversal queries. Furthermore, we showed that there might still be room for optimization by researching ways to prune useless triples from the internal triple store during the link traversal process.

How to cite

@inproceedings{tam_iswc_rulebasedreachability_2024,
  author = {Tam, Bryan-Elliott and Taelman, Ruben and Rojas Meléndez, Julián and Colpaert, Pieter},
  title = {Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach},
  month = {sep},
  booktitle = {Proceedings of the ISWC 2024: The 23rd International Semantic Web Conference},
  year = {2024},
  url = {https://github.com/constraintAutomaton/rule-based-approach-for-source-selection-in-LTQP/releases},
  abstract = { Link Traversal queries face challenges in completeness and long execution time due to the size of the web. Reachability criteria define completeness by restricting the links followed by engines. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures exploitable by query engines to prune irrelevant sources. Current criteria rely on using information from the query definition and predefined predicate. However, it is difficult to use them to traverse environments where logical expressions indicate the location of resources. We propose to use a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary, to prune irrelevant sources. Our preliminary findings show that by using this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing the completeness of results. Our work shows that the investigation of hypermedia controls in link pruning of traversal queries is a worthy effort for optimizing web queries of unindexed decentralized databases. },
  _type = {Poster}
}

License

The code is licensed under the CC-BY-4.0 license. See the LICENSE file for details.

Owner

Name: Bryan-Elliott Tam
Login: constraintAutomaton
Kind: user
Location: Ghent, Belgium
Company: imec - Ghent University - IDLab

Repositories: 8
Profile: https://github.com/constraintAutomaton

PhD Student working on querying for semantic web technologies

GitHub Events

Total

Push event: 4
Pull request event: 1
Create event: 1

Last Year

Push event: 4
Pull request event: 1
Create event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 9
Average time to close issues: N/A
Average time to close pull requests: about 4 hours
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 9
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: about 4 hours
Issue authors: 0
Pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0
