https://github.com/aksw/frankgraphbench
FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible and easy-to-implement manner. It was first created during Google Summer of Code 2023 for data integration between DBpedia and several standard RS datasets within a reproducible framework.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary
Keywords
Repository
FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible and easy-to-implement manner. It was first created during Google Summer of Code 2023 for data integration between DBpedia and several standard RS datasets within a reproducible framework.
Basic Info
- Host: GitHub
- Owner: AKSW
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://frankgraphbench.readthedocs.io
- Size: 15.9 MB
Statistics
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 1
Topics
Metadata Files
README.md
FranKGraphBench: Knowledge Graph Aware Recommender Systems Framework for Benchmarking
FranKGraphBench is a framework that allows KG-aware recommender systems (RSs) to be benchmarked in a reproducible and easy-to-implement manner. It was first created during Google Summer of Code 2023 for data integration between DBpedia and several standard RS datasets within a reproducible framework.
Check the docs for more information.
- This repository was first created for data integration between DBpedia and several standard recommender-system datasets, together with a framework for reproducible experiments. For more info, check the project proposal and the project progress, which received weekly (when possible) updates.
Data Integration Usage
pip
We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```
Download the full datasets using the bash scripts located in datasets/:

```shell
cd datasets
bash ml-100k.sh  # downloaded to the `datasets/ml-100k` folder
bash ml-1m.sh    # downloaded to the `datasets/ml-1m` folder
```
Usage:

```shell
data_integration [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
```
Arguments:
- -h: Shows the help message.
- -d: Name of a supported dataset. It is the same as the name of the folder created by the bash script provided for that dataset. For now, check data_integration/dataset2class.py to see the supported ones.
- -i: Input path where the full dataset is placed.
- -o: Output path where the integrated dataset will be placed.
- -ci: Use this flag if you want to convert item data.
- -cu: Use this flag if you want to convert user data.
- -cr: Use this flag if you want to convert rating data.
- -cs: Use this flag if you want to convert social link data.
- -map: Use this flag if you want to map dataset items to DBpedia. At least the item data must already have been converted.
- -enrich: Use this flag if you want to enrich dataset with DBpedia.
- -w: Number of workers (threads) to be used for parallel queries.
Usage example:

```shell
data_integration -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -enrich -w 8
```
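After integration, the output folder should contain the converted CSV files that the experiment configs below reference (item.csv, user.csv, rating.csv, plus map.csv and enriched.csv when -map and -enrich are used). Here is a minimal sketch for inspecting them with pandas; the file names come from the configs in this README, but the column layout is an assumption, not something the project guarantees:

```python
import pandas as pd

# Paths follow the -o output used in the example above.
processed = "datasets/ml-100k/processed"

items = pd.read_csv(f"{processed}/item.csv")      # converted item data (-ci)
ratings = pd.read_csv(f"{processed}/rating.csv")  # converted rating data (-cr)
mapping = pd.read_csv(f"{processed}/map.csv")     # item -> DBpedia mapping (-map)

print(items.head())
print(f"{len(ratings)} ratings, {len(mapping)} items mapped to DBpedia")
```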
source
Install the required packages in a Python virtual environment:

```shell
python3 -m venv venv_data_integration/
source venv_data_integration/bin/activate
pip3 install -r requirements_data_integration.txt
```
Download the full datasets using the bash scripts located in datasets/:

```shell
cd datasets
bash ml-100k.sh  # downloaded to the `datasets/ml-100k` folder
bash ml-1m.sh    # downloaded to the `datasets/ml-1m` folder
```
Usage:

```shell
python3 src/data_integration.py [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
```
Arguments:
- -h: Shows the help message.
- -d: Name of a supported dataset. It is the same as the name of the folder created by the bash script provided for that dataset. For now, check data_integration/dataset2class.py to see the supported ones.
- -i: Input path where the full dataset is placed.
- -o: Output path where the integrated dataset will be placed.
- -ci: Use this flag if you want to convert item data.
- -cu: Use this flag if you want to convert user data.
- -cr: Use this flag if you want to convert rating data.
- -cs: Use this flag if you want to convert social link data.
- -map: Use this flag if you want to map dataset items to DBpedia. At least the item data must already have been converted.
- -enrich: Use this flag if you want to enrich dataset with DBpedia.
- -w: Number of workers (threads) to be used for parallel queries.
Usage example:

```shell
python3 src/data_integration.py -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -enrich -w 8
```
Check the Makefile for more examples.
Supported datasets
| Dataset | #items matched | #items | |
|---------|----------------|--------|---|
| MovieLens-100k | 1411 | 1681 | |
| MovieLens-1M | 3253 | 3883 | |
| LastFM-hetrec-2011 | 8628 | 17632 | |
| Douban-Movie-Short-Comments-Dataset | 24 | 28 | douban-movie |
| Yelp-Dataset | --- | 150348 | yelp |
| Amazon-Video-Games-5 | --- | 21106 | amazon-video_games-5 |
Dataset enrichment is done through a fixed DBpedia endpoint available at ..., with raw file downloads available at ...
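For a feel of what the -enrich step involves, here is a minimal, self-contained sketch of the kind of DBpedia lookup behind it, using SPARQLWrapper (a project dependency) against the public DBpedia endpoint. The query, the example entity, and the endpoint URL are illustrative assumptions, not the project's actual enrichment code:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Public DBpedia endpoint, standing in for the project's fixed endpoint.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

# Illustrative lookup of the two properties used in the example configs
# below (subject and director) for a single mapped movie entity.
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?subject ?director WHERE {
        <http://dbpedia.org/resource/Toy_Story> dct:subject ?subject ;
                                                dbo:director ?director .
    } LIMIT 5
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["subject"]["value"], "|", row["director"]["value"])
```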
Framework for reproducible experiments usage
pip
We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```
Usage:

```shell
framework -c 'config_files/test.yml'
```
Arguments:
- -c: Experiment configuration file path.
The experiment config file should be a .yaml file like this:
```yaml
experiment:
  dataset:
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv
      extra_features: [movie_year, movie_title]
    user:
      path: datasets/ml-100k/processed/user.csv
      extra_features: [gender, occupation]
    ratings:
      path: datasets/ml-100k/processed/rating.csv
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'

  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1

  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml_100k_enriched/run1.csv'
```
See the config_files/ directory for more examples.
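For intuition about the deepwalk_based model configured above, here is a minimal sketch of the DeepWalk idea: uniform random walks over the user-item graph, fed as "sentences" to gensim's Word2Vec (gensim and networkx are project dependencies). This illustrates the technique only; the toy graph is an assumption and this is not the framework's actual implementation:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

# Toy bipartite user-item graph standing in for the (enriched) dataset graph.
G = nx.Graph()
G.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")])

def random_walk(graph, start, walk_len):
    """One uniform random walk (DeepWalk is node2vec with p = q = 1.0)."""
    walk = [start]
    for _ in range(walk_len - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

# Parameters mirror the config above: walk_len: 10, n_walks: 50,
# embedding_size: 64, epochs: 1.
walks = [random_walk(G, n, walk_len=10) for n in G.nodes for _ in range(50)]

# Skip-gram (sg=1) learns one embedding vector per node from the walks.
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=1)
print(model.wv["i2"][:5])  # first 5 dimensions of item i2's embedding
```

Recommendations can then be derived from similarity between user and item embeddings.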
source
Install the required packages in a Python virtual environment:

```shell
python3 -m venv venv_framework/
source venv_framework/bin/activate
pip3 install -r requirements_framework.txt
```
Usage:

```shell
python3 src/framework.py -c 'config_files/test.yml'
```
Arguments:
- -c: Experiment configuration file path.
The experiment config file should be a .yaml file like this:
```yaml
experiment:
  dataset:
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv
      extra_features: [movie_year, movie_title]
    user:
      path: datasets/ml-100k/processed/user.csv
      extra_features: [gender, occupation]
    ratings:
      path: datasets/ml-100k/processed/rating.csv
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'

  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1

  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml_100k_enriched/run1.csv'
```
See the config_files/ directory for more examples.
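The evaluation block above requests MAP and nDCG at k = 5 with a relevance threshold of 3 (items rated at or above 3 count as relevant). As a refresher, here is a small self-contained sketch of how these metrics are typically computed for one user's top-k ranking; these are the standard textbook definitions, not the framework's own code:

```python
import numpy as np

def average_precision_at_k(ranked_items, relevant, k=5):
    """AP@k: mean of precision@i over the ranks i holding a relevant item."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked_items, relevant, k=5):
    """Binary-relevance nDCG@k with log2 rank discounting."""
    dcg = sum(1.0 / np.log2(i + 1)
              for i, item in enumerate(ranked_items[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / np.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

# Toy example: items rated >= relevance_threshold form the relevant set.
relevant = {"i2", "i5"}
ranking = ["i1", "i2", "i3", "i4", "i5"]  # model's top-5 for one user
print(average_precision_at_k(ranking, relevant),  # 0.45
      ndcg_at_k(ranking, relevant))               # ~0.62
```

MAP@k is then AP@k averaged over all test users.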
Chart generation for results usage
The chart generation module is based on: https://github.com/hfawaz/cd-diagram
pip
We recommend using a Python 3.8 virtual environment.

```shell
pip install pybind11
pip install frankgraphbench
```
Run this after obtaining results from some experiments.
Usage:

```shell
chart_generation [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
```
Arguments:
- -h: Shows the help message.
- -c: Type of chart to generate (see the supported charts below).
- -p: Name of the performance metric within the files to use for chart generation.
- -f: List of .csv files to use for generating the chart.
- -i: Path where the results data (.csv files) used to generate the chart is located.
- -o: Path where the generated charts will be placed.
- -n: Name (including file extension) of the chart file to be generated.
Usage example:

```shell
chart_generation -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
```
Supported charts
| Chart |
|-------|
| CD-Diagram |
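A critical-difference (CD) diagram compares methods by their average rank across datasets, with a post-hoc test deciding which rank differences are significant. Here is a minimal sketch of the first step (average ranks from per-dataset scores) using pandas; the results-CSV layout and the model names are hypothetical assumptions:

```python
import pandas as pd

# Hypothetical layout: one row per (model, dataset) with a MAP@5 score.
results = pd.DataFrame({
    "model":   ["deepwalk_based", "deepwalk_based", "baseline", "baseline"],
    "dataset": ["ml-100k",        "ml-1m",          "ml-100k",  "ml-1m"],
    "MAP@5":   [0.31,             0.27,             0.25,       0.29],
})

# Rank models within each dataset (rank 1 = best score), then
# average those ranks across datasets.
results["rank"] = results.groupby("dataset")["MAP@5"].rank(ascending=False)
avg_ranks = results.groupby("model")["rank"].mean().sort_values()
print(avg_ranks)  # these averages become the positions on a CD diagram
```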
source
Install the required packages in a Python virtual environment:

```shell
python3 -m venv venv_chart_generation/
source venv_chart_generation/bin/activate
pip3 install -r requirements_chart_generation.txt
```
Run this after obtaining results from some experiments.
Usage:

```shell
python3 src/chart_generation.py [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
```
Arguments:
- -h: Shows the help message.
- -c: Type of chart to generate (see the supported charts below).
- -p: Name of the performance metric within the files to use for chart generation.
- -f: List of .csv files to use for generating the chart.
- -i: Path where the results data (.csv files) used to generate the chart is located.
- -o: Path where the generated charts will be placed.
- -n: Name (including file extension) of the chart file to be generated.
Usage example:

```shell
python3 src/chart_generation.py -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
```
Supported charts
| Chart |
|-------|
| CD-Diagram |
Owner
- Name: AKSW Research Group @ University of Leipzig
- Login: AKSW
- Kind: organization
- Location: Leipzig
- Website: http://aksw.org
- Repositories: 358
- Profile: https://github.com/AKSW
GitHub Events
Total
- Release event: 1
- Watch event: 2
- Delete event: 2
- Push event: 61
- Pull request event: 5
- Create event: 2
Last Year
- Release event: 1
- Watch event: 2
- Delete event: 2
- Push event: 61
- Pull request event: 5
- Create event: 2
Dependencies
- Babel ==2.13.0
- Jinja2 ==3.1.2
- MarkupSafe ==2.1.3
- PyYAML ==6.0.1
- Pygments ==2.16.1
- alabaster ==0.7.13
- certifi ==2023.7.22
- charset-normalizer ==3.3.0
- docutils ==0.18.1
- idna ==3.4
- imagesize ==1.4.1
- importlib-metadata ==6.8.0
- markdown-it-py ==3.0.0
- mdit-py-plugins ==0.4.0
- mdurl ==0.1.2
- myst-parser ==2.0.0
- packaging ==23.2
- pytz ==2023.3.post1
- requests ==2.31.0
- snowballstemmer ==2.2.0
- sphinx ==7.1.2
- sphinx-rtd-theme ==1.3.0
- sphinxcontrib-applehelp ==1.0.4
- sphinxcontrib-devhelp ==1.0.2
- sphinxcontrib-htmlhelp ==2.0.1
- sphinxcontrib-jquery ==4.1
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==1.0.3
- sphinxcontrib-serializinghtml ==1.1.5
- urllib3 ==2.0.6
- zipp ==3.17.0
- Levenshtein ==0.21.0
- SPARQLWrapper ==2.0.0
- isodate ==0.6.1
- numpy ==1.24.3
- pandas ==2.0.2
- pyparsing ==3.0.9
- python-Levenshtein ==0.21.0
- python-dateutil ==2.8.2
- pytz ==2023.3
- rapidfuzz ==3.0.0
- rdflib ==6.3.2
- six ==1.16.0
- thefuzz ==0.19.0
- tqdm ==4.65.0
- tzdata ==2023.3
- PyYAML ==6.0.1
- gensim ==4.3.1
- joblib ==1.3.2
- networkx ==3.1
- numpy ==1.24.4
- pandas ==2.0.3
- python-dateutil ==2.8.2
- pytz ==2023.3
- scikit-learn ==1.3.0
- scipy ==1.10.1
- six ==1.16.0
- smart-open ==6.3.0
- threadpoolctl ==3.2.0
- tqdm ==4.65.0
- tzdata ==2023.3