LLM-Playlist-Recommender

Repo for the LLM-based playlist recommender system

https://github.com/elea-vellard/LLM-Playlist-Recommender

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: elea-vellard
License: mit
Language: Python
Default Branch: main
Size: 25.8 MB

Statistics

Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed 10 months ago

Metadata Files

Readme License Citation

Language Model-Based Playlist Generation Recommender

This repository contains the implementation of a novel approach to playlist generation using language models. Our method leverages the thematic coherence between playlist titles and their tracks by creating semantic clusters from text embeddings. We fine-tune a transformer model on these clusters to generate playlists based on cosine similarity scores between known and unknown titles, utilizing a voting mechanism for final recommendations.

The repository includes code for preprocessing, model training, evaluation, and generating recommendations. For more detail, please refer to the paper.

Related links:

- Online demo
- Code
- Zenodo repository, including the best trained model

1. Transform and pre-process the dataset

Run:

    python3 transform-dataset/json2csv.py

to convert the JSON slices of the dataset into user-friendly CSV files.
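As a rough illustration of what this conversion step involves, the sketch below flattens one JSON slice into one CSV row per (playlist, track) pair. The field names (`playlists`, `pid`, `name`, `tracks`, `track_uri`, `track_name`) and the output columns are assumptions made for the example; the actual schema handled by `json2csv.py` may differ.

```python
import csv
import io
import json


def slice_to_csv_rows(slice_json):
    """Flatten one JSON slice into (playlist_id, title, track_id, track_name) rows."""
    data = json.loads(slice_json)
    rows = []
    for playlist in data["playlists"]:      # assumed top-level key
        for track in playlist["tracks"]:    # assumed per-playlist key
            rows.append((playlist["pid"], playlist["name"],
                         track["track_uri"], track["track_name"]))
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["playlist_id", "playlist_title", "track_id", "track_name"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical slice with a single playlist of two tracks.
sample = json.dumps({"playlists": [
    {"pid": 0, "name": "Road Trip", "tracks": [
        {"track_uri": "t1", "track_name": "Song A"},
        {"track_uri": "t2", "track_name": "Song B"},
    ]},
]})
csv_text = rows_to_csv(slice_to_csv_rows(sample))
```

Keeping one row per track makes it straightforward to group by playlist title later, at the cost of repeating the title on every row.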

2. Embedding generation and clustering

First, playlist titles and tracks are embedded using a pre-trained SentenceBERT model and stored in a pickle file:

    python3 clustering-no-split/embeddings/track_embeddings_no-split.py
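The embed-then-pickle pattern can be sketched as follows. The encoder is passed in as a function so that the example runs without downloading a model; `toy_encode` is a hypothetical stand-in, not the SentenceBERT model used by the repository. With the sentence-transformers library, the `encode` method of a loaded `SentenceTransformer` has a compatible shape (a list of strings in, one vector per string out).

```python
import pickle


def embed_and_pickle(titles, encode):
    """Encode each title and serialise a {title: vector} table with pickle."""
    vectors = encode(titles)
    table = {title: [float(x) for x in vec] for title, vec in zip(titles, vectors)}
    return pickle.dumps(table)


def toy_encode(texts):
    # Hypothetical stand-in encoder: [length, vowel count] per text.
    # A real run would use a sentence-embedding model instead.
    return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))]
            for t in texts]


blob = embed_and_pickle(["chill", "workout"], toy_encode)
restored = pickle.loads(blob)   # what a later script would load
```

Storing the embeddings once means the clustering and similarity scripts can reload them without re-running the model.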

Then, the K-means clustering algorithm is applied to create the clusters, and the generated CSV file is updated to include the percentage of exact matches:

    python3 clustering-no-split/clusters/clustering-no-split.py
    python3 clustering-no-split/clusters/percent-no-split.py
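For readers unfamiliar with K-means, here is a minimal pure-Python sketch of the algorithm (Lloyd's iterations) on toy 2-D points. The number of clusters, the iteration count, and the seed are illustrative assumptions; the repository's script operates on the title embeddings and would more plausibly rely on a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated toy groups standing in for embedding vectors.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```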

Apply the cleaning script to remove miscellaneous clusters:

    python3 clustering-no-split/clean/clean.py

Finally, randomly split the clusters so that each cluster is represented in the train, test, and validation sets:

    python3 clustering-no-split/split/split.py
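One way to guarantee that every cluster appears in all three sets is to split each cluster separately, as in the sketch below. The 80/10/10 ratios, the seed, and the `(playlist_id, cluster_id)` input format are assumptions for illustration and are not taken from `split.py`.

```python
import random
from collections import defaultdict


def split_by_cluster(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (playlist_id, cluster_id) pairs so each cluster is in every set."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for playlist_id, cluster_id in items:
        by_cluster[cluster_id].append(playlist_id)

    splits = {"train": [], "val": [], "test": []}
    for cluster_id, members in by_cluster.items():
        if len(members) < 3:
            raise ValueError(f"cluster {cluster_id!r} is too small to split 3 ways")
        rng.shuffle(members)
        # Reserve at least one playlist per cluster for validation and test.
        n_val = max(1, round(len(members) * ratios[1]))
        n_test = max(1, round(len(members) * ratios[2]))
        parts = {
            "val": members[:n_val],
            "test": members[n_val:n_val + n_test],
            "train": members[n_val + n_test:],
        }
        for name, part in parts.items():
            splits[name] += [(pid, cluster_id) for pid in part]
    return splits


# Hypothetical input: 10 playlists in cluster "a" and 5 in cluster "b".
toy_items = [(i, "a") for i in range(10)] + [(i, "b") for i in range(10, 15)]
toy_splits = split_by_cluster(toy_items)
```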

3. Fine-tuning

Fine-tune the SentenceBERT model with each of two loss functions (cross-entropy and triplet loss) to better capture thematic similarities:

    python3 finetuning/cross_entropy_model_finetuning.py
    python3 finetuning/finetuning_triplet_loss.py
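To make the triplet objective concrete, the sketch below computes the loss for a single (anchor, positive, negative) triplet: the anchor should end up closer to a title from its own cluster than to a title from another cluster by at least a margin. The use of cosine distance and the margin value of 0.5 are assumptions; the fine-tuning script may use a different distance or margin.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin)


# Well-ordered triplet: the positive coincides with the anchor, so no penalty.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Badly ordered triplet: the "negative" is the closer one, so the loss is large.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training effort concentrates on triplets that are still ordered incorrectly.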

4. Generate the embeddings for playlist titles using the fine-tuned models

Run:

    python3 embeddings/playlists_embeddings_final.py

to generate embeddings for playlist titles using the fine-tuned models.

Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.

5. Generate the recommendations and evaluate the models

Evaluate the metrics for a given test playlist:

    python3 similarity/test_1_playlist_finetuned_model.py

Generate the recommendation for a playlist title:

    python3 similarity/recommend.py
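The recommendation idea described at the top of this README (cosine similarity between an unknown title and known titles, followed by a vote) can be sketched as below. The similarity-weighted vote, the cut-offs `top_playlists` and `top_tracks`, and the toy 2-D vectors are all assumptions for illustration; `recommend.py` may weight or rank votes differently.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recommend(query_vec, known, top_playlists=2, top_tracks=3):
    """known: list of (title_vector, track_ids) for playlists with known tracks."""
    # Rank known playlists by similarity of their title to the query title.
    ranked = sorted(known, key=lambda item: cosine_similarity(query_vec, item[0]),
                    reverse=True)
    votes = Counter()
    for title_vec, tracks in ranked[:top_playlists]:
        weight = cosine_similarity(query_vec, title_vec)
        # Each neighbouring playlist votes once per distinct track.
        for track in dict.fromkeys(tracks):
            votes[track] += weight
    return [track for track, _ in votes.most_common(top_tracks)]


# Hypothetical title embeddings and track lists.
known_playlists = [
    ([1.0, 0.0], ["a", "b"]),
    ([0.8, 0.6], ["b", "c"]),
    ([0.0, 1.0], ["d"]),
]
suggested = recommend([1.0, 0.0], known_playlists)
```

In this toy run, track `b` ranks first because both of the nearest playlists vote for it, while `d` receives no votes because its playlist falls outside the two nearest neighbours.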

Assess the model’s overall performance on the complete test set:

    python3 similarity/testset_test_model.py

Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.

Citation

If you use this software, please cite (bib file):

Enzo Charolois-Pasqua, Eléa Vellard, Youssra Rebboud, Pasquale Lisena, and Raphael Troncy. 2025. A Language Model-Based Playlist Generation Recommender System. In Proceedings of the Nineteenth ACM Conference on Recommender Systems (RecSys ’25), September 22–26, 2025, Prague, Czech Republic. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3705328.3748053

Owner

Login: elea-vellard
Kind: user

Repositories: 1
Profile: https://github.com/elea-vellard

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this paper, please cite it as below."
title: "A Language Model-Based Playlist Generation Recommender System"
authors:
  - family-names: Charolois-Pasqua
    given-names: Enzo
  - family-names: Vellard
    given-names: Eléa
  - family-names: Rebboud
    given-names: Youssra
  - family-names: Lisena
    given-names: Pasquale
  - family-names: Troncy
    given-names: Raphael
date-released: 2025-09-22
conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
type: conference-paper
publisher: ACM
place: Prague, Czech Republic
doi: 10.1145/3705328.3748053
url: https://doi.org/10.1145/3705328.3748053

preferred-citation:
  type: conference-paper
  title: "A Language Model-Based Playlist Generation Recommender System"
  authors:
    - family-names: Charolois-Pasqua
      given-names: Enzo
    - family-names: Vellard
      given-names: Eléa
    - family-names: Rebboud
      given-names: Youssra
    - family-names: Lisena
      given-names: Pasquale
    - family-names: Troncy
      given-names: Raphael
  conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
  date: 2025-09-22
  publisher: ACM
  place: Prague, Czech Republic
  doi: 10.1145/3705328.3748053
  url: https://doi.org/10.1145/3705328.3748053

GitHub Events

Total

Push event: 1

Last Year

Push event: 1

Dependencies

demo/Dockerfile docker

python 3.11-slim build

demo/requirements.txt pypi

flask *
gensim *
scikit-learn *
tqdm *
transformers ==4.34.0
