WSKNN - Weighted Session-based K-NN recommender system

WSKNN - Weighted Session-based K-NN recommender system - Published in JOSS (2023)

https://github.com/nokaut/wsknn

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 12 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

e-commerce knn machine-learning recommendation-engine recommender-system vsknn

Scientific Fields

Agricultural and Biological Sciences Life Sciences - 40% confidence

Last synced: 6 months ago

Repository

Session-weighted recommendation system in Python

Basic Info

Host: GitHub
Owner: nokaut
License: bsd-3-clause
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 130 MB

Statistics

Stars: 6
Watchers: 5
Forks: 2
Open Issues: 0
Releases: 7

Archived

Topics

e-commerce knn machine-learning recommendation-engine recommender-system vsknn

Created almost 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog Contributing License

WSKNN: k-NN recommender for session-based data

Weighted session-based k-NN - Intro

Are you building a recommender system for your website? The k-nearest neighbors algorithm is a good choice if you are looking for a simple, fast, and explainable solution. Weighted session-based k-NN recommendations are close to the state of the art, and you don't need to tune many hyperparameters or build complex deep learning models to achieve a good result.

Documentation

API Documentation is available here: WSKNN Docs

How does it work?

You provide two input structures as training data:

```
sessions : dict
sessions = {
    session id: (
        [sequence of items with user interaction],
        [timestamp of user interaction per item],
        [(optional) sequence of event names],
        [(optional) sequence of weights]
    )
}

items : dict
items = {
    item id: (
        [sequence of sessions with an item],
        [the first timestamp of each session with an item]
    )
}
```
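The two structures above can be derived from a flat event log. The sketch below is illustrative only: the `events` list, the `build_maps` helper, and all ids are hypothetical, and it assumes that "the first timestamp of each session with an item" means the first timestamp of the whole session. It fills only the two required fields of each session (items and timestamps).

```python
# Hypothetical raw event log: (session_id, item_id, timestamp) triples.
events = [
    ("s1", "a", 10), ("s1", "b", 12),
    ("s2", "b", 20), ("s2", "c", 25),
    ("s3", "a", 30),
]


def build_maps(events):
    """Build session and item maps in the layout described above."""
    # Session map: session id -> ([items], [timestamps]), ordered by time.
    sessions = {}
    for session_id, item_id, ts in sorted(events, key=lambda e: e[2]):
        seq, stamps = sessions.setdefault(session_id, ([], []))
        seq.append(item_id)
        stamps.append(ts)

    # Item map: item id -> ([sessions], [first timestamp of each session]).
    # Assumption: the stored timestamp is the session's first timestamp.
    items = {}
    for session_id, (seq, stamps) in sessions.items():
        first_ts = stamps[0]
        for item_id in dict.fromkeys(seq):  # unique items, order preserved
            sess_list, ts_list = items.setdefault(item_id, ([], []))
            sess_list.append(session_id)
            ts_list.append(first_ts)
    return sessions, items


sessions, items = build_maps(events)
print(sessions["s1"])  # (['a', 'b'], [10, 12])
print(items["b"])      # (['s1', 's2'], [10, 20])
```

Keeping both maps means a lookup can go from an item to the sessions that contain it and back again without scanning all the data.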

And you ask a model to recommend products based on the user session:

user session: {session id: [[sequence of items], [sequence of timestamps], [optional event names], [optional weights]] }
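To show how the two maps are used at recommendation time, here is a deliberately simplified session-based k-NN sketch. It is not the package's implementation: the `recommend` function, the Jaccard similarity, and the toy data are assumptions chosen for illustration, and the optional event names and weights are ignored.

```python
from collections import Counter

# Toy maps in the layout described above (all ids and timestamps are made up).
sessions = {
    "s1": (["a", "b", "c"], [1, 2, 3]),
    "s2": (["b", "c", "d"], [4, 5, 6]),
    "s3": (["x", "y"], [7, 8]),
}
items = {
    "a": (["s1"], [1]),
    "b": (["s1", "s2"], [1, 4]),
    "c": (["s1", "s2"], [1, 4]),
    "d": (["s2"], [4]),
    "x": (["s3"], [7]),
    "y": (["s3"], [7]),
}


def recommend(user_session, sessions, items, k=2, n=3):
    """Simplified session-based k-NN (illustrative, not the WSKNN algorithm)."""
    seen = set(user_session[0])

    # 1. Candidate neighbors: sessions sharing at least one item with the user.
    overlap = Counter()
    for item in seen:
        for session_id in items.get(item, ([], []))[0]:
            overlap[session_id] += 1

    # 2. Keep the k sessions with the largest overlap.
    neighbors = overlap.most_common(k)

    # 3. Score unseen items by the summed similarity of neighbors holding them.
    scores = Counter()
    for session_id, shared in neighbors:
        neighbor_items = set(sessions[session_id][0])
        similarity = shared / len(neighbor_items | seen)  # Jaccard similarity
        for item in neighbor_items - seen:
            scores[item] += similarity
    return scores.most_common(n)


user_session = {"u1": [["a", "b"], [9, 10]]}
recs = recommend(user_session["u1"], sessions, items)
for item, weight in recs:
    print("Item:", item, "| weight:", round(weight, 3))
# 'c' ranks first (it appears in both neighbors), then 'd'.
```

Each recommendation can be traced back to the neighbor sessions that contributed to its score, which is what makes this family of models easy to explain.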

The package is lightweight. It depends only on numpy and pyyaml.

Moreover, we can provide the package to non-programmers, who can use settings.yaml to control the model's behavior.

Why should we use WSKNN?

training is faster than for deep learning or XGBoost algorithms: the model memorizes the session-items and item-sessions maps,
recommendations are easy to control: we can change how the algorithm works in just a few lines... of text,
it serves as a baseline for comparison with deep learning / XGBoost architectures,
it allows swift prototyping,
it is easy to run in production.

The model was created along with multiple other approaches: based on RNN (GRU/LSTM), matrix factorization, and others. Its performance was always very close to the level of fine-tuned neural networks, but it was much easier and faster to train.

What are the limitations of WSKNN?

the model memorizes the session-items and item-sessions maps, so if your product base is large and you keep sessions from an extended period, the model may be too big to fit in available memory; in this case, you can categorize products and train a different model for each category,
response time may be slower than for other models, especially when many sessions are available,
there's additional overhead related to the preparation of the input.
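The per-category workaround mentioned in the first limitation can be sketched as follows. The `category_of` lookup and the `split_sessions_by_category` helper are hypothetical, and the sketch handles only the two required session fields (items and timestamps). Each resulting session map would then need its own item map and its own trained model.

```python
from collections import defaultdict

# Hypothetical item -> category lookup, e.g. taken from a product catalog.
category_of = {"a": "shoes", "b": "shoes", "c": "books", "d": "books"}

sessions = {
    "s1": (["a", "c", "b"], [1, 2, 3]),
    "s2": (["c", "d"], [4, 5]),
}


def split_sessions_by_category(sessions, category_of):
    """Return {category: session map restricted to that category's items}."""
    per_category = defaultdict(dict)
    for session_id, (seq, stamps) in sessions.items():
        # Collect this session's events per category, keeping their order.
        buckets = defaultdict(lambda: ([], []))
        for item, ts in zip(seq, stamps):
            cat_items, cat_stamps = buckets[category_of[item]]
            cat_items.append(item)
            cat_stamps.append(ts)
        for category, sub_session in buckets.items():
            per_category[category][session_id] = sub_session
    return dict(per_category)


parts = split_sessions_by_category(sessions, category_of)
print(parts["shoes"])  # {'s1': (['a', 'b'], [1, 3])}
print(parts["books"])  # {'s1': (['c'], [2]), 's2': (['c', 'd'], [4, 5])}
```

The trade-off is that a model trained on one category cannot recommend items from another, so cross-category patterns are lost.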

Example

The example below is available in the demo-notebooks/demo-readme.ipynb notebook.

```python
import numpy as np
from wsknn import fit
from wsknn.utils import load_gzipped_pickle

# Load data
ITEMS = 'demo-data/recsys-2015/parsed_items.pkl.gz'
SESSIONS = 'demo-data/recsys-2015/parsed_sessions.pkl.gz'

items = load_gzipped_pickle(ITEMS)
sessions = load_gzipped_pickle(SESSIONS)
imap = items['map']
smap = sessions['map']

# Train model
trained_model = fit(smap, imap,
                    number_of_recommendations=5,
                    weighting_func='log',
                    return_events_from_session=False)

# Get sample session
test_session_key = np.random.choice(list(smap.keys()))
test_session = smap[test_session_key]
print(test_session)  # [products], [timestamps]
```

```shell
[[214850771, 214677615, 214651777], [1407592501.048, 1407592529.941, 1407592552.98]]
```

```python
recommendations = trained_model.recommend(test_session)
for rec in recommendations:
    print('Item:', rec[0], '| weight:', rec[1])
```

Output recommendations

Setup

Version 1.x of the package can be installed with pip:

```shell
pip install wsknn
```

It works with Python 3.8 or later.

Requirements

| Package Version | Python versions | Requirements |
|-----------------|-----------------|--------------|
| 0.1.x | 3.6+ | numpy, pyyaml |
| 1.1.x | 3.8+ | numpy, more_itertools, pyyaml |
| 1.2.x | 3.8+ | numpy, more_itertools, pandas, pyyaml, tqdm |

Contribution

We welcome all submissions, issues, feature requests, and bug reports! To learn how to contribute to the package, please see the CONTRIBUTION.md file.

Developers

Szymon Moliński (Sales Intelligence : Digitree Group SA)

Citation

Moliński, S. (2023). WSKNN - Weighted Session-based K-NN recommender system. Journal of Open Source Software, 8(90), 5639, https://doi.org/10.21105/joss.05639

Bibliography

Data used in a demo example

David Ben-Shimon, Alexander Tsikinovsky, Michael Friedmann, Bracha Shapira, Lior Rokach, and Johannes Hoerle. 2015. RecSys Challenge 2015 and the YOOCHOOSE Dataset. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). Association for Computing Machinery, New York, NY, USA, 357-358. DOI: https://doi.org/10.1145/2792838.2798723

Comparison between DL and WSKNN

Twardowski, B., Zawistowski, P., Zaborowski, S. (2021). Metric Learning for Session-Based Recommendations. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_43

SKNN performance

The article compares the performance of multiple session-based recommender systems.

Ludewig, M., Jannach, D. Evaluation of session-based recommendation algorithms. User Model User-Adap Inter 28, 331-390 (2018). https://doi.org/10.1007/s11257-018-9209-6

Funding

Development of the package was partially based on the research project E-commerce Shopping Patterns Prediction System, which was funded under Priority Axis 1.1 of the Smart Growth Operational Programme 2014-2020 for Poland, co-funded by the European Regional Development Fund. Project number: POIR.01.01.01-00-0632/18

Computational Performance

As a rule of thumb, you should have about twice as much memory available as your model's size in memory.

The test machine has 16 GB of RAM and a 4-core CPU clocked at 4.5 GHz
testing sample size - 1000 sessions
max session length - 50 events
min session length - 1 event
basic data types (integers)

All performance characteristics were derived in this notebook, and you can use it for your own performance tests.
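For a quick check before running the full notebook, the sketch below generates synthetic sessions with the same shape as the test setup above (1000 sessions, 1 to 50 integer events each) and applies the factor-of-two rule of thumb. The helpers `make_sessions` and `pickled_size_mb` are hypothetical and not part of the package, the timestamps are placeholders, and the pickled size is only a rough proxy for the in-memory size, which is usually larger for plain Python objects.

```python
import pickle
import random


def make_sessions(n_sessions, n_items, min_len=1, max_len=50, seed=0):
    """Generate synthetic sessions with integer item ids and timestamps."""
    rng = random.Random(seed)
    sessions = {}
    for session_id in range(n_sessions):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        seq = [rng.randrange(n_items) for _ in range(length)]
        stamps = list(range(length))  # placeholder timestamps
        sessions[session_id] = (seq, stamps)
    return sessions


def pickled_size_mb(obj):
    """Serialized size in MB, used as a rough proxy for memory footprint."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)) / 1e6


sessions = make_sessions(n_sessions=1000, n_items=500)
size_mb = pickled_size_mb(sessions)
print(f"session map: ~{size_mb:.2f} MB serialized, "
      f"plan for at least ~{2 * size_mb:.2f} MB of free memory")
```

Increasing `n_sessions` or `n_items` shows how the estimate scales with the data.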

Training time in relation to session length vs number of items


Total response time for 1000 requests in relation to session length vs number of items

Model size in relation to session length vs number of items

Relation between training time and increasing number of items

Relation between response time and increasing number of items (for 1000 requests)


Relation between training time and increasing number of sessions

Relation between response time and increasing number of sessions (for 1000 requests)


Owner

Name: Sales Intelligence sp z o.o.
Login: nokaut
Kind: organization
Location: Gdynia, Poland

Website: https://salesintelligence.pl/
Repositories: 10
Profile: https://github.com/nokaut

JOSS Publication

WSKNN - Weighted Session-based K-NN recommender system

Published

October 09, 2023

DOI

10.21105/joss.05639

Volume 8, Issue 90, Page 5639

Authors

Szymon Moliński

Sales Intelligence Sp. z o.o., Digitree SA

Editor

Aoife Hughes

GitHub Events

Total

Fork event: 2

Last Year

Fork event: 2

Committers

Last synced: 7 months ago

All Time

Total Commits: 125
Total Committers: 2
Avg Commits per committer: 62.5
Development Distribution Score (DDS): 0.088

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Szymon	s**n@m**m	114
Szymon Moliński	s**i@d**l	11

Committer Domains (Top 20 + Academic)

digitree.pl: 1 ml-gis-service.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 26
Total pull requests: 45
Average time to close issues: 24 days
Average time to close pull requests: about 1 hour
Total issue authors: 4
Total pull request authors: 1
Average comments per issue: 0.65
Average comments per pull request: 0.0
Merged pull requests: 44
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

SimonMolinsky (17)
inpefess (6)
AoifeHughes (1)
ralgond (1)

Pull Request Authors

SimonMolinsky (46)

Top Labels

Issue Labels

documentation (10) enhancement (7) paper-submission (6) bug (4) invalid (4) question (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 66 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 16
Total maintainers: 1

pypi.org: wsknn

Weighted session-based model for recommendations

Homepage: https://github.com/nokaut/wsknn
Documentation: https://readthedocs.org/projects/wsknn/
License: MIT
Latest release: 1.2.3
published over 1 year ago

Versions: 16
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 66 Last month

Rankings

Dependent packages count: 10.0%

Downloads: 19.1%

Average: 20.4%

Stargazers count: 21.5%

Dependent repos count: 21.7%

Forks count: 29.8%

Maintainers (1)

smolinski

Last synced: 6 months ago

Dependencies

.github/workflows/test_update.yaml actions

actions/checkout v3 composite
actions/setup-python v3 composite

.github/workflows/draft-pdf.yml actions

actions/checkout v3 composite
actions/upload-artifact v1 composite
openjournals/openjournals-draft-action master composite

docs/requirements.txt pypi

more_itertools *
numpy *
numpydoc *
pandas *
pydata-sphinx-theme *
pyyaml *
sphinx *
sphinx-copybutton *
tqdm *

pyproject.toml pypi

requirements-dev.txt pypi

build * development
joblib * development
matplotlib * development
more_itertools * development
numpy * development
pandas * development
pytest * development
pyyaml * development
setuptools * development
tqdm * development
twine * development

requirements.txt pypi

joblib *
more_itertools *
numpy *
pandas *
pyyaml *
tqdm *
