https://github.com/aksw/frankgraphbench

FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible, easy-to-implement manner. It was first created during Google Summer of Code 2023 to provide data integration between DBpedia and several standard RS datasets within a reproducible framework.

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file: found
  • DOI references: found 2 DOI reference(s) in README
  • Academic publication links: links to zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary

Keywords

benchmark, knowledge-graph, recommender-system
Last synced: 6 months ago

Repository

FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible, easy-to-implement manner. It was first created during Google Summer of Code 2023 to provide data integration between DBpedia and several standard RS datasets within a reproducible framework.

Basic Info
Statistics
  • Stars: 8
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 1
Topics
benchmark, knowledge-graph, recommender-system
Created over 2 years ago · Last pushed 8 months ago
Metadata Files
Readme · License

README.md

FranKGraphBench: Knowledge Graph Aware Recommender Systems Framework for Benchmarking

FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible, easy-to-implement manner. It was first created during Google Summer of Code 2023 to provide data integration between DBpedia and several standard RS datasets within a reproducible framework.

Check the docs for more information.

  • This repository was first created for data integration between DBpedia and some standard recommender-system datasets, together with a framework for reproducible experiments. For more info, check the project proposal and the project progress, updated weekly where possible.

Data Integration Usage

pip

We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```

Download the full dataset using the bash scripts located at `datasets/`:

```shell
cd datasets
bash ml-100k.sh  # Downloaded at `datasets/ml-100k` folder
bash ml-1m.sh    # Downloaded at `datasets/ml-1m` folder
```

Usage

```shell
data_integration [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
```

Arguments:
- `-h`: Show the help message.
- `-d`: Name of a supported dataset; it matches the folder created by the dataset's bash script. For now, check `data_integration/dataset2class.py` to see the supported ones.
- `-i`: Input path where the full dataset is placed.
- `-o`: Output path where the integrated dataset will be placed.
- `-ci`: Convert item data.
- `-cu`: Convert user data.
- `-cr`: Convert rating data.
- `-cs`: Convert social link data.
- `-map`: Map dataset items to DBpedia; the item data must already be converted.
- `-enrich`: Enrich the dataset with DBpedia.
- `-w`: Number of workers (threads) to use for parallel queries.

Usage Example:

```shell
data_integration -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -enrich -w 8
```
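The integration step writes plain CSV files that can be inspected directly, for example with pandas (a listed dependency of the data-integration module). A minimal sketch; the paths match the example above, but the column layout of each file is an assumption, not documented here:

```python
# Hedged sketch: inspecting the integrated output with pandas.
import pandas as pd

items = pd.read_csv("datasets/ml-100k/processed/item.csv")
ratings = pd.read_csv("datasets/ml-100k/processed/rating.csv")
mapping = pd.read_csv("datasets/ml-100k/processed/map.csv")  # item <-> DBpedia URI (assumed)

print(items.head())
print(ratings.shape)
print(mapping.columns.tolist())
```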

source

Install the required packages using a Python virtualenv:

```shell
python3 -m venv venv_data_integration/
source venv_data_integration/bin/activate
pip3 install -r requirements_data_integration.txt
```

Download the full dataset using the bash scripts located at `datasets/`:

```shell
cd datasets
bash ml-100k.sh  # Downloaded at `datasets/ml-100k` folder
bash ml-1m.sh    # Downloaded at `datasets/ml-1m` folder
```

Usage

```shell
python3 src/data_integration.py [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
```

Arguments:
- `-h`: Show the help message.
- `-d`: Name of a supported dataset; it matches the folder created by the dataset's bash script. For now, check `data_integration/dataset2class.py` to see the supported ones.
- `-i`: Input path where the full dataset is placed.
- `-o`: Output path where the integrated dataset will be placed.
- `-ci`: Convert item data.
- `-cu`: Convert user data.
- `-cr`: Convert rating data.
- `-cs`: Convert social link data.
- `-map`: Map dataset items to DBpedia; the item data must already be converted.
- `-enrich`: Enrich the dataset with DBpedia.
- `-w`: Number of workers (threads) to use for parallel queries.

Usage Example:

```shell
python3 src/data_integration.py -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -enrich -w 8
```

Check the Makefile for more examples.

Supported datasets

| Dataset | #items matched | #items | `-d` name |
|---------|----------------|--------|-----------|
| MovieLens-100k | 1411 | 1681 | |
| MovieLens-1M | 3253 | 3883 | |
| LastFM-hetrec-2011 | 8628 | 17632 | |
| Douban-Movie-Short-Comments-Dataset | 24 | 28 | douban-movie |
| Yelp-Dataset | --- | 150348 | yelp |
| Amazon-Video-Games-5 | --- | 21106 | amazon-video_games-5 |
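The "#items matched" counts come from linking dataset items to DBpedia entities. As a rough illustration of label-based matching (`thefuzz` and `Levenshtein` appear in requirements_data_integration.txt, though the framework's actual matching logic lives in the data_integration module and may differ):

```python
# Illustrative only: fuzzy label matching between an item title and
# candidate DBpedia labels. The titles and candidates are hypothetical.
from thefuzz import fuzz

title = "Toy Story (1995)"  # a MovieLens-style item title
candidates = ["Toy Story", "Toy Story 2", "The Story of Toys"]  # hypothetical DBpedia labels

best = max(candidates, key=lambda c: fuzz.token_set_ratio(title, c))
print(best, fuzz.token_set_ratio(title, best))  # e.g. 'Toy Story' with a high score
```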

Dataset enrichment is done through a fixed DBpedia endpoint available at ..., with raw file downloads available at ...
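For a sense of what enrichment retrieves, here is a minimal sketch of a DBpedia lookup with SPARQLWrapper (listed in requirements_data_integration.txt). The entity and query are illustrative, not the framework's actual query; `dct:subject` and `dbo:director` mirror the `subject` and `director` property types used in the experiment config further below:

```python
# Hedged sketch: fetching enrichment-style properties for one entity.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?subject ?director WHERE {
        dbr:Toy_Story dct:subject ?subject .
        OPTIONAL { dbr:Toy_Story dbo:director ?director . }
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["subject"]["value"], row.get("director", {}).get("value"))
```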

Reproducible Experiments Framework Usage

pip

We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```

Usage

```shell
framework -c 'config_files/test.yml'
```

Arguments:
- `-c`: Experiment configuration file path.

The experiment config file should be a .yaml file like this:

```yaml
experiment:
  dataset:
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv
      extra_features: [movie_year, movie_title]
    user:
      path: datasets/ml-100k/processed/user.csv
      extra_features: [gender, occupation]
    ratings:
      path: datasets/ml-100k/processed/rating.csv
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'

  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1

  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml100k_enriched/run1.csv'
```

See the config_files/ directory for more examples.
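For intuition about the `deepwalk_based` model in the config above: random walks over the (optionally KG-enriched) interaction graph are treated as sentences for Word2Vec. A hedged sketch using `networkx` and `gensim` (both in requirements_framework.txt); the toy graph is illustrative, and the node2vec-style `p`/`q` walk biasing from the config is omitted for brevity:

```python
# Illustrative deepwalk-style pipeline, not the framework's actual code.
import random
import networkx as nx
from gensim.models import Word2Vec

# A tiny user-item graph; enrichment would add KG nodes such as directors.
G = nx.Graph()
G.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("i2", "dbr:Some_Director")])

def random_walk(graph, start, walk_len=10):  # walk_len mirrors the config
    walk = [start]
    for _ in range(walk_len - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

# n_walks=50 walks per node, then embed the walks as "sentences".
walks = [random_walk(G, n) for n in G.nodes() for _ in range(50)]
model = Word2Vec(walks, vector_size=64, min_count=0, epochs=1)  # embedding_size=64
print(model.wv["i2"][:5])  # first 5 dimensions of an item embedding
```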

source

Install the required packages using a Python virtualenv:

```shell
python3 -m venv venv_framework/
source venv_framework/bin/activate
pip3 install -r requirements_framework.txt
```

Usage

```shell
python3 src/framework.py -c 'config_files/test.yml'
```

Arguments:
- `-c`: Experiment configuration file path.

The experiment config file should be a .yaml file like this:

```yaml
experiment:
  dataset:
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv
      extra_features: [movie_year, movie_title]
    user:
      path: datasets/ml-100k/processed/user.csv
      extra_features: [gender, occupation]
    ratings:
      path: datasets/ml-100k/processed/rating.csv
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'

  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1

  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml100k_enriched/run1.csv'
```

See the config_files/ directory for more examples.
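The `evaluation` block requests MAP and nDCG at k. A minimal sketch of these ranking metrics for a single user, assuming items rated at or above `relevance_threshold` form the relevant set (MAP averages `ap_at_k` over users); the framework's exact definitions may differ:

```python
# Hedged sketch of nDCG@k and AP@k; item IDs below are hypothetical.
import math

def ndcg_at_k(ranked, relevant, k=5):
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

def ap_at_k(ranked, relevant, k=5):
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), k) if relevant else 0.0

ranked = ["i3", "i1", "i9", "i7", "i2"]   # a model's top-5 for one user
relevant = {"i1", "i9"}                   # ratings >= relevance_threshold
print(ndcg_at_k(ranked, relevant), ap_at_k(ranked, relevant))
```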

Chart Generation Usage

Chart generation module based on: https://github.com/hfawaz/cd-diagram

pip

We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```

Run chart generation after obtaining results from some experiments.

Usage

```shell
chart_generation [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
```

Arguments:
- `-h`: Show the help message.
- `-c`: Chart type to generate (see the supported charts below).
- `-p`: Name of the performance metric within the files to use for chart generation.
- `-f`: List of `.csv` files to use for generating the chart.
- `-i`: Path where the results data (`.csv` files) is located.
- `-o`: Path where the generated charts will be placed.
- `-n`: Name (including file extension) for the generated chart.

Usage Example:

```shell
chart_generation -c 'cd-diagram' -p 'MAP@5' \
    -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" \
    -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
```

Supported charts

| Chart |
|-------|
| CD-Diagram |
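For intuition: a critical-difference (CD) diagram compares models by their average rank across datasets, then groups models whose ranks are not statistically different (the cd-diagram module linked above uses a Wilcoxon-Holm analysis for that step). A hedged sketch of just the rank computation with pandas; the data values and model names are hypothetical:

```python
# Illustrative average-rank computation behind a CD diagram.
import pandas as pd

results = pd.DataFrame({
    "dataset": ["ml-100k", "ml-100k", "ml-1m", "ml-1m"],
    "model":   ["deepwalk_based", "baseline", "deepwalk_based", "baseline"],
    "MAP@5":   [0.12, 0.10, 0.15, 0.16],
})

# Rank models within each dataset (rank 1 = best), then average the ranks.
ranks = results.pivot(index="dataset", columns="model", values="MAP@5") \
               .rank(axis=1, ascending=False)
print(ranks.mean())  # lower average rank is better
```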

source

Install the required packages using a Python virtualenv:

```shell
python3 -m venv venv_chart_generation/
source venv_chart_generation/bin/activate
pip3 install -r requirements_chart_generation.txt
```

Run it after obtaining results from some experiments.

Usage

```shell
python3 src/chart_generation.py [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
```

Arguments:
- `-h`: Show the help message.
- `-c`: Chart type to generate (see the supported charts below).
- `-p`: Name of the performance metric within the files to use for chart generation.
- `-f`: List of `.csv` files to use for generating the chart.
- `-i`: Path where the results data (`.csv` files) is located.
- `-o`: Path where the generated charts will be placed.
- `-n`: Name (including file extension) for the generated chart.

Usage Example:

```shell
python3 src/chart_generation.py -c 'cd-diagram' -p 'MAP@5' \
    -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" \
    -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
```

Supported charts

| Chart |
|-------|
| CD-Diagram |

Owner

  • Name: AKSW Research Group @ University of Leipzig
  • Login: AKSW
  • Kind: organization
  • Location: Leipzig

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Delete event: 2
  • Push event: 61
  • Pull request event: 5
  • Create event: 2
Last Year
  • Release event: 1
  • Watch event: 2
  • Delete event: 2
  • Push event: 61
  • Pull request event: 5
  • Create event: 2

Dependencies

docs/requirements.txt (pypi)
  • Babel ==2.13.0
  • Jinja2 ==3.1.2
  • MarkupSafe ==2.1.3
  • PyYAML ==6.0.1
  • Pygments ==2.16.1
  • alabaster ==0.7.13
  • certifi ==2023.7.22
  • charset-normalizer ==3.3.0
  • docutils ==0.18.1
  • idna ==3.4
  • imagesize ==1.4.1
  • importlib-metadata ==6.8.0
  • markdown-it-py ==3.0.0
  • mdit-py-plugins ==0.4.0
  • mdurl ==0.1.2
  • myst-parser ==2.0.0
  • packaging ==23.2
  • pytz ==2023.3.post1
  • requests ==2.31.0
  • snowballstemmer ==2.2.0
  • sphinx ==7.1.2
  • sphinx-rtd-theme ==1.3.0
  • sphinxcontrib-applehelp ==1.0.4
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.1
  • sphinxcontrib-jquery ==4.1
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
  • urllib3 ==2.0.6
  • zipp ==3.17.0
requirements_data_integration.txt (pypi)
  • Levenshtein ==0.21.0
  • SPARQLWrapper ==2.0.0
  • isodate ==0.6.1
  • numpy ==1.24.3
  • pandas ==2.0.2
  • pyparsing ==3.0.9
  • python-Levenshtein ==0.21.0
  • python-dateutil ==2.8.2
  • pytz ==2023.3
  • rapidfuzz ==3.0.0
  • rdflib ==6.3.2
  • six ==1.16.0
  • thefuzz ==0.19.0
  • tqdm ==4.65.0
  • tzdata ==2023.3
requirements_framework.txt (pypi)
  • PyYAML ==6.0.1
  • gensim ==4.3.1
  • joblib ==1.3.2
  • networkx ==3.1
  • numpy ==1.24.4
  • pandas ==2.0.3
  • python-dateutil ==2.8.2
  • pytz ==2023.3
  • scikit-learn ==1.3.0
  • scipy ==1.10.1
  • six ==1.16.0
  • smart-open ==6.3.0
  • threadpoolctl ==3.2.0
  • tqdm ==4.65.0
  • tzdata ==2023.3