graphameleon-ds

An RDF dataset of Web navigation traces, generated by the Graphameleon Web extension

https://github.com/orange-opensource/graphameleon-ds

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

dataset linked-data process-mining rdf semantic-web web-navigation

Repository

Basic Info
  • Host: GitHub
  • Owner: Orange-OpenSource
  • License: other
  • Language: Makefile
  • Default Branch: main
  • Homepage:
  • Size: 6.52 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed almost 2 years ago

README.md

graphameleon-ds

graphameleon-ds: an RDF dataset for process mining on Web navigation traces. The dataset comes from the Graphameleon Web extension.

This dataset was built using the Graphameleon Web extension, an open-source plug-in that captures Web navigation traces and transforms them into an RDF graph for further exploration (e.g., process mining of navigation traces, Web browser and server behavior analysis, network topology analysis).

The RDF dataset implements the concepts of micro-activity and macro-activity (see below) using the UCO (Unified Cyber Ontology) vocabulary for the semantic representation of user activities. UCO is a popular community-developed ontology for the cyber security domain, covering a wide range of important concepts such as agents, resources, and actions. Since UCO is also used in NORIA-O, an ontology for describing IT networks, one can establish meaningful connections and access additional contextual information within a knowledge graph that combines Graphameleon data and network topology data.

Usage

The sub-folders contain the following data:

  • exp-01: RDF triples generated by Graphameleon during the initial connection to a set of websites. We refer to this data as the "Website complexity clustering" experiment, in which we sought to understand to what extent a website's behavior during a first connection yields a footprint that can later be used for anomaly detection.

  • exp-02: RDF triples generated by Graphameleon during three pre-defined Web navigation scenarios on a simulated online bookstore website. We refer to this data as the "Navigation trace classification" experiment in which we sought to classify the Web navigation traces as either normal or abnormal behaviors.
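
Once the repository is cloned, the two experiment folders can be inspected with a short script. The sketch below is only illustrative: the function name `inventory_rdf_files`, the clone path, and the list of file suffixes are assumptions (the serialization formats actually used in the folders are not stated here), so adjust `RDF_SUFFIXES` to match what the folders contain.

```python
from collections import Counter
from pathlib import Path

# Assumption: common RDF serialization suffixes; the dataset may use only some of them.
RDF_SUFFIXES = {".ttl", ".nt", ".nq", ".rdf", ".trig", ".jsonld"}


def inventory_rdf_files(repo_root, experiments=("exp-01", "exp-02")):
    """Count files with an RDF-looking suffix under each experiment folder."""
    root = Path(repo_root)
    inventory = {}
    for name in experiments:
        folder = root / name
        counts = Counter()
        if folder.is_dir():
            # Walk the folder recursively and tally matching suffixes.
            for path in folder.rglob("*"):
                if path.is_file() and path.suffix.lower() in RDF_SUFFIXES:
                    counts[path.suffix.lower()] += 1
        inventory[name] = dict(counts)
    return inventory


if __name__ == "__main__":
    # Hypothetical local clone path.
    for experiment, counts in inventory_rdf_files("graphameleon-ds").items():
        print(experiment, counts or "no RDF files found")
```

A folder that does not exist is reported with an empty count rather than raising an error, which keeps the script usable on a partial checkout.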

A typical workflow with the provided data is:

  • Clone the repository to your computer.
  • Query the data with SPARQL (e.g., using the Apache Jena CLI toolset).
  • Analyse the user activities with higher-level tools (e.g., using the PM4Py Python package).
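
To give a feel for the analysis step, here is a minimal pure-Python sketch of the directly-follows relation, a basic structure used in process mining. It uses neither PM4Py nor the dataset itself: the function name `directly_follows` and the activity labels (`open_home`, `search`, and so on) are hypothetical stand-ins for activities that would first be extracted from the RDF graph, for example with a SPARQL query.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity `a` is immediately followed by activity `b`.

    `traces` is an iterable of activity lists, one list per navigation
    session, already ordered by time.
    """
    counts = Counter()
    for trace in traces:
        # Pair each activity with its immediate successor in the same session.
        for current, following in zip(trace, trace[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical navigation sessions (labels are illustrative, not from the dataset).
sessions = [
    ["open_home", "search", "view_book", "add_to_cart"],
    ["open_home", "search", "search", "view_book"],
]

dfg = directly_follows(sessions)
for (source, target), n in sorted(dfg.items()):
    print(f"{source} -> {target}: {n}")
```

Transitions that appear rarely across sessions are one kind of signal a normal-versus-abnormal classification could build on; a dedicated tool such as PM4Py offers much richer analyses on top of this basic relation.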

Semantic Modeling of User Activity

Semantic representation of a macro-activity:

gpl_mapping_macro.png

Semantic representation of a micro-activity:

gpl_mapping_micro.png

Citation

If you use this dataset in a scientific publication, please cite:

Lionel Tailhardat, Benjamin Stach, Yoan Chabot, and Raphaël Troncy. 2024. Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs. In The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings. https://doi.org/10.1145/3589335.3651447

BibTeX format:

@inproceedings{graphemeleon-2024,
  title     = {{Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs}},
  author    = {{Lionel Tailhardat} and {Benjamin Stach} and {Yoan Chabot} and {Rapha\"el Troncy}},
  booktitle = {{The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings}},
  year      = {2024},
  doi       = {10.1145/3589335.3651447}
}

Copyright

Copyright (c) 2023, Orange. All rights reserved.

License

CC-BY-NC-SA

Maintainer

Owner

  • Name: Orange
  • Login: Orange-OpenSource
  • Kind: organization
  • Email: opensource.contact@orange.com
  • Location: Paris, France

Open Source by Orange

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  graphameleon-ds: a RDF dataset for process mining on Web navigation traces.
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - orcid: 'https://orcid.org/0009-0004-4822-4748'
    affiliation: Orange
    given-names: Benjamin
    family-names: Stach
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0001-5639-1504'
    given-names: Yoan
    family-names: Chabot
    affiliation: Orange
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
repository-code: 'https://github.com/Orange-OpenSource/graphameleon-ds'
url: 'https://github.com/Orange-OpenSource/graphameleon-ds'
abstract: >-
  graphameleon-ds: a RDF dataset for process mining on Web navigation traces.
  The dataset comes from the Graphameleon Web extension.
keywords:
  - process-mining
  - web-navigation
  - semantic-web
  - linked-data
  - RDF
license: CC-BY-NC-SA-4.0
version: v0.1.0
date-released: '2023-08-20'
preferred-citation:
  type: conference-paper
  authors:
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0009-0004-4822-4748'
    affiliation: Orange
    given-names: Benjamin
    family-names: Stach
  - orcid: 'https://orcid.org/0000-0001-5639-1504'
    given-names: Yoan
    family-names: Chabot
    affiliation: Orange
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
  journal: "The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings"
  title: "Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs"
  year: 2024
