graphameleon-ds
An RDF dataset of Web navigation traces, generated by the Graphameleon Web extension
README.md
graphameleon-ds
graphameleon-ds: an RDF dataset for process mining on Web navigation traces. The dataset comes from the Graphameleon Web extension.
This dataset was built using the Graphameleon Web extension, an open-source plug-in that captures Web navigation traces and transforms them into an RDF graph for further exploration (e.g., process mining of navigation traces, Web browser and server behavior analysis, network topology analysis).
The RDF dataset implements the concepts of micro-activity and macro-activity (see below) on the basis of the UCO (Unified Cyber Ontology) vocabulary for the semantic representation of user activities. UCO is a popular community-developed ontology for the cyber security domain that covers a wide range of concepts such as agents, resources, and actions. Since UCO is also used in NORIA-O, an ontology for describing IT networks, one can establish meaningful connections and access additional contextual information within a knowledge graph that combines Graphameleon data and network topology data.
Usage
The sub-folders contain the following data:
- exp-01: RDF triples generated by Graphameleon during the initial connection to a set of websites. We refer to this data as the "Website complexity clustering" experiment, in which we sought to understand to what extent a website's behavior during a first connection is crucial for creating a footprint that can later be used for anomaly detection.
- exp-02: RDF triples generated by Graphameleon during three pre-defined Web navigation scenarios on a simulated online bookstore website. We refer to this data as the "Navigation trace classification" experiment, in which we sought to classify Web navigation traces as either normal or abnormal behavior.
Typical use of the provided data involves:
- Cloning the repository to your computer,
- Querying the data with SPARQL (e.g., using the Apache Jena CLI toolset),
- Analysing the user activities with higher-level tools (e.g., using the PM4Py Python package).
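To give an idea of what the analysis step can look like, here is a minimal sketch that turns query results into per-case traces and counts directly-follows relations, a basic building block of process mining. It uses only the Python standard library. The row keys (`case`, `activity`, `timestamp`) and all values are hypothetical placeholders for what a SPARQL SELECT might return; they are not taken from the dataset, and the actual property names depend on the UCO-based model described below.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Toy rows standing in for the result of a SPARQL SELECT over the dataset.
# The keys ("case", "activity", "timestamp") and all values are hypothetical;
# they are NOT taken from graphameleon-ds.
rows = [
    {"case": "trace-1", "activity": "open_home", "timestamp": "2023-08-01T10:00:00"},
    {"case": "trace-1", "activity": "search_book", "timestamp": "2023-08-01T10:00:05"},
    {"case": "trace-1", "activity": "add_to_cart", "timestamp": "2023-08-01T10:00:12"},
    {"case": "trace-2", "activity": "search_book", "timestamp": "2023-08-01T11:00:03"},
    {"case": "trace-2", "activity": "open_home", "timestamp": "2023-08-01T11:00:00"},
]


def build_traces(rows):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    for row in rows:
        when = datetime.fromisoformat(row["timestamp"])
        by_case[row["case"]].append((when, row["activity"]))
    return {
        case: [activity for _, activity in sorted(events)]
        for case, events in by_case.items()
    }


def directly_follows(traces):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its successor in the same trace.
        counts.update(zip(activities, activities[1:]))
    return counts


traces = build_traces(rows)
dfg = directly_follows(traces)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

A dedicated package such as PM4Py provides far richer discovery and conformance tooling; this sketch only shows the shape of the data such tools expect (a case identifier, an activity label, and a timestamp per event).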
Semantic Modeling of User Activity
Semantic representation of a macro-activity:

Semantic representation of a micro-activity:

Citation
If you use this dataset in a scientific publication, please cite:
Lionel Tailhardat, Benjamin Stach, Yoan Chabot, and Raphaël Troncy. 2024. Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs. In The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings. https://doi.org/10.1145/3589335.3651447
BibTeX format:
@inproceedings{graphemeleon-2024,
title = {{Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs}},
author = {{Lionel Tailhardat} and {Benjamin Stach} and {Yoan Chabot} and {Rapha\"el Troncy}},
booktitle = {{The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings}},
year = {2024},
doi = {10.1145/3589335.3651447}
}
Copyright
Copyright (c) 2023, Orange. All rights reserved.
License
Maintainer
Owner
- Name: Orange
- Login: Orange-OpenSource
- Kind: organization
- Email: opensource.contact@orange.com
- Location: Paris, France
- Website: https://orange-opensource.github.io/
- Twitter: OrangeDev
- Repositories: 352
- Profile: https://github.com/Orange-OpenSource
Open Source by Orange
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  graphameleon-ds: a RDF dataset for process mining on Web navigation traces.
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - orcid: 'https://orcid.org/0009-0004-4822-4748'
    affiliation: Orange
    given-names: Benjamin
    family-names: Stach
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0001-5639-1504'
    given-names: Yoan
    family-names: Chabot
    affiliation: Orange
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
repository-code: 'https://github.com/Orange-OpenSource/graphameleon-ds'
url: 'https://github.com/Orange-OpenSource/graphameleon-ds'
abstract: >-
  graphameleon-ds: a RDF dataset for process mining on Web navigation traces.
  The dataset comes from the Graphameleon Web extension.
keywords:
  - process-mining
  - web-navigation
  - semantic-web
  - linked-data
  - RDF
license: CC-BY-NC-SA-4.0
version: v0.1.0
date-released: '2023-08-20'
preferred-citation:
  type: conference-paper
  authors:
    - orcid: 'https://orcid.org/0000-0001-5887-899X'
      affiliation: Orange
      given-names: Lionel
      family-names: Tailhardat
    - orcid: 'https://orcid.org/0009-0004-4822-4748'
      affiliation: Orange
      given-names: Benjamin
      family-names: Stach
    - orcid: 'https://orcid.org/0000-0001-5639-1504'
      given-names: Yoan
      family-names: Chabot
      affiliation: Orange
    - orcid: 'https://orcid.org/0000-0003-0457-1436'
      affiliation: EURECOM
      given-names: Raphaël
      family-names: Troncy
  journal: "The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings"
  title: "Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs"
  year: 2024