LLM-Playlist-Recommender

Repo for the LLM-based playlist recommender system

https://github.com/elea-vellard/LLM-Playlist-Recommender

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: elea-vellard
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 25.8 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 7 months ago

README.md

Language Model-Based Playlist Generation Recommender

This repository contains the implementation of a novel approach to playlist generation using language models. Our method leverages the thematic coherence between playlist titles and their tracks by creating semantic clusters from text embeddings. We fine-tune a transformer model on these clusters to generate playlists based on cosine similarity scores between known and unknown titles, utilizing a voting mechanism for final recommendations.

The repository includes code for preprocessing, model training, evaluation, and generating recommendations. For more detail, please refer to the paper.
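As a rough illustration of the similarity step described above, the sketch below ranks known playlist titles by cosine similarity to the embedding of an unseen title. The vectors, the titles, and the `top_k_similar` helper are made up for illustration; in the repository the embeddings come from the fine-tuned model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, known, k=2):
    """Return the k (title, score) pairs most similar to query_vec."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in known.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (hypothetical values, not model output).
known_titles = {
    "summer hits": [0.9, 0.1, 0.0],
    "beach party": [0.8, 0.3, 0.1],
    "sad songs": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of an unseen title
neighbours = top_k_similar(query, known_titles, k=2)
```

Here `neighbours` holds the two known titles closest to the query, which is the input a later voting step would work from.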

Related links:
  • Online demo
  • Code
  • Zenodo repository, including the best trained model

1. Transform and pre-process the dataset

Run:

    python3 transform-dataset/json2csv.py

to convert the JSON slices of the dataset into user-friendly CSV files.
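A minimal sketch of what such a conversion can look like is shown below. The slice layout (a top-level `playlists` list whose entries carry `pid`, `name`, and `tracks`) and the output column names are assumptions made for this example; the actual script may read and write different fields.

```python
import csv
import io
import json


def slice_to_rows(slice_data):
    """Flatten one JSON slice into (playlist_id, title, track, artist) rows."""
    rows = []
    for playlist in slice_data["playlists"]:
        for track in playlist["tracks"]:
            rows.append(
                (playlist["pid"], playlist["name"], track["track_name"], track["artist_name"])
            )
    return rows


def write_rows(rows, out_file):
    """Write a header line followed by the flattened rows as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["playlist_id", "title", "track", "artist"])
    writer.writerows(rows)


# Hypothetical slice content, inlined so the example runs without any files.
sample_slice = json.loads("""
{"playlists": [
  {"pid": 0, "name": "road trip", "tracks": [
    {"track_name": "Song A", "artist_name": "Artist 1"},
    {"track_name": "Song B", "artist_name": "Artist 2"}]},
  {"pid": 1, "name": "focus", "tracks": [
    {"track_name": "Song C", "artist_name": "Artist 3"}]}
]}
""")

rows = slice_to_rows(sample_slice)
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

Writing one row per (playlist, track) pair keeps the CSV flat, at the cost of repeating the playlist title on every row.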

2. Embedding generation and clustering

First, playlist titles and tracks are embedded using a pre-trained SentenceBERT model and stored in a pickle file:

    python3 clustering-no-split/embeddings/track_embeddings_no-split.py
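The embed-then-pickle pattern can be sketched as follows. To keep the example free of heavy dependencies, `toy_encode` is a placeholder that stands in for a real sentence encoder (for instance the `encode` method of a `sentence_transformers.SentenceTransformer` model); the function names and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def toy_encode(texts):
    """Placeholder encoder: maps each text to a tiny numeric vector."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_and_store(titles, encode, path):
    """Encode the titles and pickle a {title: vector} mapping to path."""
    vectors = encode(titles)
    embeddings = dict(zip(titles, vectors))
    with open(path, "wb") as handle:
        pickle.dump(embeddings, handle)
    return embeddings


def load_embeddings(path):
    """Read back the pickled {title: vector} mapping."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


titles = ["road trip", "workout mix"]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings.pkl")
    stored = embed_and_store(titles, toy_encode, path)
    reloaded = load_embeddings(path)
```

Passing the encoder in as a function makes it easy to swap the placeholder for a pre-trained or fine-tuned model without touching the storage code.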

Then the K-means clustering algorithm is applied to create the clusters, and the generated CSV file is updated to include the percentage of exact matches:

    python3 clustering-no-split/clusters/clustering-no-split.py
    python3 clustering-no-split/clusters/percent-no-split.py
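For readers unfamiliar with K-means, the sketch below implements plain Lloyd's algorithm on toy 2-D points. It is not the repository's code: a library implementation such as scikit-learn's `KMeans` would normally be used on the real embeddings, and the deterministic "first k points" initialisation is a simplification chosen for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups standing in for title embeddings.
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
labels, centroids = kmeans(points, k=2)
```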

Apply the cleaning script to remove miscellaneous clusters:

    python3 clustering-no-split/clean/clean.py

Finally, randomly split the clusters so that every cluster is represented in the train, validation, and test sets:

    python3 clustering-no-split/split/split.py
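One way to obtain such a split is to shuffle and divide each cluster separately, as in the sketch below. The split fractions, the seed, and the `split_by_cluster` helper are assumptions for illustration; the README does not state the ratios the actual script uses.

```python
import random
from collections import defaultdict


def split_by_cluster(items, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Split (playlist_id, cluster_id) pairs so each cluster appears in every set."""
    by_cluster = defaultdict(list)
    for playlist_id, cluster_id in items:
        by_cluster[cluster_id].append(playlist_id)

    rng = random.Random(seed)  # seeded for reproducible splits
    splits = {"train": [], "val": [], "test": []}
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        if len(members) < 3:
            raise ValueError(f"cluster {cluster_id} is too small to split three ways")
        rng.shuffle(members)
        # Reserve at least one playlist per cluster for validation and test.
        n_val = max(1, int(len(members) * val_fraction))
        n_test = max(1, int(len(members) * test_fraction))
        splits["val"] += [(pid, cluster_id) for pid in members[:n_val]]
        splits["test"] += [(pid, cluster_id) for pid in members[n_val:n_val + n_test]]
        splits["train"] += [(pid, cluster_id) for pid in members[n_val + n_test:]]
    return splits


# 15 toy playlists spread over two clusters.
items = [(playlist_id, playlist_id % 2) for playlist_id in range(15)]
splits = split_by_cluster(items)
```

Splitting per cluster rather than over the whole pool is what guarantees that small clusters are not left out of the validation or test sets by chance.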

3. Fine-tuning

Fine-tune the SentenceBERT model separately with each of two loss functions (cross-entropy and triplet loss) to better capture thematic similarities:

    python3 finetuning/cross_entropy_model_finetuning.py
    python3 finetuning/finetuning_triplet_loss.py
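The two objectives can be written down in a few lines. The sketch below is a generic textbook formulation, not the training code: the Euclidean distance, the margin of 1.0, and the reading of cross-entropy as classification over cluster labels are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed in a numerically stable way."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum_exp - logits[target]


anchor = [0.0, 0.0]    # e.g. a playlist title embedding
positive = [0.0, 1.0]  # a title from the same cluster
negative = [3.0, 4.0]  # a title from a different cluster

satisfied = triplet_loss(anchor, positive, negative)  # positive is already much closer
violated = triplet_loss(anchor, negative, positive)   # roles swapped: large penalty
uniform = cross_entropy([0.0, 0.0], target=0)         # two equally likely clusters
```

The triplet loss shapes distances directly, while cross-entropy treats cluster membership as a label to predict; this is why the two fine-tuned models can behave differently at recommendation time.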

4. Generate the embeddings for playlist titles using the fine-tuned models

Run:

    python3 embeddings/playlists_embeddings_final.py

to generate embeddings for playlist titles using the fine-tuned models.

Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.

5. Generate the recommendations and evaluate the models

Compute the evaluation metrics for a single test playlist:

    python3 similarity/test_1_playlist_finetuned_model.py
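The README does not say which metrics the script reports. Purely as an illustration of per-playlist evaluation, the sketch below computes precision and recall at k for one recommendation list; the metric choice, the track identifiers, and the helper name are assumptions.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommendations against the true tracks."""
    top_k = recommended[:k]
    hits = sum(1 for track in top_k if track in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


recommended = ["t1", "t7", "t3", "t9"]  # ranked output for one test title
relevant = {"t1", "t3", "t5"}           # tracks actually in the test playlist
precision, recall = precision_recall_at_k(recommended, relevant, k=4)
```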

Generate the recommendation for a playlist title:

    python3 similarity/recommend.py
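A voting step of the kind the introduction mentions could look like the sketch below, where each of the most similar known playlists votes for its own tracks. Weighting each vote by the similarity score and breaking ties alphabetically are assumptions made for this example; the paper describes the scheme actually used.

```python
from collections import defaultdict


def vote_for_tracks(neighbours, playlist_tracks, n_tracks=3):
    """Rank tracks by the summed similarity of the neighbour playlists containing them."""
    votes = defaultdict(float)
    for title, similarity in neighbours:
        for track in playlist_tracks.get(title, []):
            votes[track] += similarity
    # Highest vote total first; ties broken by track name for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:n_tracks]]


# Hypothetical known playlists and their similarity to the query title.
playlist_tracks = {
    "summer hits": ["track_a", "track_b"],
    "beach party": ["track_b", "track_c"],
    "pool vibes": ["track_c", "track_d"],
}
neighbours = [("summer hits", 0.9), ("beach party", 0.8), ("pool vibes", 0.5)]
recommended = vote_for_tracks(neighbours, playlist_tracks, n_tracks=2)
```

Tracks that appear in several similar playlists accumulate votes, so they outrank tracks that appear only once in a single close neighbour.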

Assess the model’s overall performance on the complete test set:

    python3 similarity/testset_test_model.py

Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.

Citation

If you use this software, please cite (bib file):

Enzo Charolois-Pasqua, Eléa Vellard, Youssra Rebboud, Pasquale Lisena, and Raphael Troncy. 2025. A Language Model-Based Playlist Generation Recommender System. In Proceedings of the Nineteenth ACM Conference on Recommender Systems (RecSys ’25), September 22–26, 2025, Prague, Czech Republic. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3705328.3748053

Owner

  • Login: elea-vellard
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this paper, please cite it as below."
title: "A Language Model-Based Playlist Generation Recommender System"
authors:
  - family-names: Charolois-Pasqua
    given-names: Enzo
  - family-names: Vellard
    given-names: Eléa
  - family-names: Rebboud
    given-names: Youssra
  - family-names: Lisena
    given-names: Pasquale
  - family-names: Troncy
    given-names: Raphael
date-released: 2025-09-22
conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
type: conference-paper
publisher: ACM
place: Prague, Czech Republic
doi: 10.1145/3705328.3748053
url: https://doi.org/10.1145/3705328.3748053

preferred-citation:
  type: conference-paper
  title: "A Language Model-Based Playlist Generation Recommender System"
  authors:
    - family-names: Charolois-Pasqua
      given-names: Enzo
    - family-names: Vellard
      given-names: Eléa
    - family-names: Rebboud
      given-names: Youssra
    - family-names: Lisena
      given-names: Pasquale
    - family-names: Troncy
      given-names: Raphael
  conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
  date: 2025-09-22
  publisher: ACM
  place: Prague, Czech Republic
  doi: 10.1145/3705328.3748053
  url: https://doi.org/10.1145/3705328.3748053

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Dependencies

demo/Dockerfile docker
  • python 3.11-slim build
demo/requirements.txt pypi
  • flask *
  • gensim *
  • scikit-learn *
  • tqdm *
  • transformers ==4.34.0