LLM-Playlist-Recommender
Repo for the LLM-based playlist recommender system
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: elea-vellard
- License: mit
- Language: Python
- Default Branch: main
- Size: 25.8 MB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Language Model-Based Playlist Generation Recommender
This repository contains the implementation of a novel approach to playlist generation using language models. Our method leverages the thematic coherence between playlist titles and their tracks by creating semantic clusters from text embeddings. We fine-tune a transformer model on these clusters to generate playlists based on cosine similarity scores between known and unknown titles, utilizing a voting mechanism for final recommendations.
The repository includes code for preprocessing, model training, evaluation, and generating recommendations. For more detail, please refer to the paper.
Related links:
- Online demo
- Code
- Zenodo repository, including the best trained model
1. Transform and pre-process the dataset
Run:
bash
python3 transform-dataset/json2csv.py
to convert the JSON slices of the dataset into user-friendly CSV files.
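As a rough illustration of what such a conversion can look like, the sketch below flattens JSON slices into one playlist CSV and one track CSV. The slice layout (a top-level `playlists` list with `pid`, `name`, and `tracks` fields), the column names, and the function name `slices_to_csv` are assumptions made for this example; they are not taken from `json2csv.py`.

```python
import csv
import json
from pathlib import Path


def slices_to_csv(slice_dir, playlists_csv, tracks_csv):
    """Flatten all JSON slices in slice_dir into a playlist CSV and a track CSV.

    Assumed (hypothetical) slice layout, not taken from the repository:
    {"playlists": [{"pid": 0, "name": "...", "tracks": [{"track_uri": "...",
      "track_name": "...", "artist_name": "..."}]}]}
    Returns the number of playlists written.
    """
    n_playlists = 0
    with open(playlists_csv, "w", newline="", encoding="utf-8") as p_file, \
            open(tracks_csv, "w", newline="", encoding="utf-8") as t_file:
        playlist_writer = csv.writer(p_file)
        track_writer = csv.writer(t_file)
        playlist_writer.writerow(["pid", "name", "num_tracks"])
        track_writer.writerow(["pid", "position", "track_uri", "track_name", "artist_name"])
        # Process slices in a stable order so the output is reproducible
        for slice_path in sorted(Path(slice_dir).glob("*.json")):
            with open(slice_path, encoding="utf-8") as f:
                data = json.load(f)
            for playlist in data.get("playlists", []):
                tracks = playlist.get("tracks", [])
                playlist_writer.writerow([playlist["pid"], playlist.get("name", ""), len(tracks)])
                for position, track in enumerate(tracks):
                    track_writer.writerow([
                        playlist["pid"],
                        position,
                        track.get("track_uri", ""),
                        track.get("track_name", ""),
                        track.get("artist_name", ""),
                    ])
                n_playlists += 1
    return n_playlists
```

Keeping playlists and tracks in separate files avoids repeating the playlist title on every track row; the actual script may organize its output differently.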
2. Embedding generation and clustering
First, playlist titles and tracks are embedded using a pre-trained SentenceBERT model and stored in a pickle file:
bash
python3 clustering-no-split/embeddings/track_embeddings_no-split.py
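The embed-then-pickle pattern can be sketched as follows. To keep the example runnable without downloading a model, a hypothetical `toy_embed` function stands in for the sentence encoder; in a real run the `embed` argument would be replaced by a call into the SentenceBERT model. All function names here are illustrative and do not come from the repository.

```python
import hashlib
import pickle


def toy_embed(text, dim=8):
    """Stand-in for a sentence encoder (hypothetical helper).

    Produces a deterministic pseudo-embedding from a hash of the text.
    It carries no semantic meaning; it only mimics the shape of the data.
    """
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_embedding_store(titles, embed=toy_embed):
    """Map each playlist title to its embedding vector."""
    return {title: embed(title) for title in titles}


def save_store(store, path):
    """Serialize the title -> vector mapping to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(store, f)


def load_store(path):
    """Load a mapping previously written by save_store."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.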
Then, the K-means clustering algorithm is applied to create the clusters, and the generated 'csv' file is modified to calculate and include the percentage of exact matches:
bash
python3 clustering-no-split/clusters/clustering-no-split.py
python3 clustering-no-split/clusters/percent-no-split.py
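For readers unfamiliar with K-means, here is a minimal standalone sketch of Lloyd's algorithm on embedding vectors. The repository lists scikit-learn among its dependencies, so the actual script most likely relies on that library; this pure-Python version only illustrates the idea, and its function names and defaults are assumptions. The exact-match percentage step is not covered here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input vectors
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean_vector(members)
    return labels, centroids
```

For example, `kmeans(vectors, k=2)` on two well-separated groups of points assigns one label per group.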
Apply the clean algorithm to remove miscellaneous clusters:
bash
python3 clustering-no-split/clean/clean.py
Finally, randomly split the clusters, ensuring that each cluster is represented in the train, test, and validation sets:
bash
python3 clustering-no-split/split/split.py
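One way to guarantee that every cluster appears in all three sets is to split each cluster separately, as sketched below. The 80/10/10 proportions, the handling of clusters with fewer than three items, and the function name `split_by_cluster` are assumptions for illustration; `split.py` may proceed differently.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Split item ids into train/val/test, one cluster at a time.

    cluster_of maps an item id to its cluster id. Every cluster with at
    least three items contributes at least one item to each split.
    The default ratios are an assumption, not taken from the repository.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for item, cluster in cluster_of.items():
        by_cluster[cluster].append(item)

    splits = {"train": [], "val": [], "test": []}
    for cluster in sorted(by_cluster):
        items = sorted(by_cluster[cluster])
        rng.shuffle(items)
        n = len(items)
        if n < 3:
            # Too small to cover all three splits; keep it for training
            splits["train"].extend(items)
            continue
        # At least one item per held-out split, leaving at least one for train
        n_val = min(max(1, int(n * val_ratio)), n - 2)
        n_test = min(max(1, int(n * test_ratio)), n - n_val - 1)
        splits["val"].extend(items[:n_val])
        splits["test"].extend(items[n_val:n_val + n_test])
        splits["train"].extend(items[n_val + n_test:])
    return splits
```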
3. Finetuning
Train the SentenceBERT model with two loss functions (cross-entropy and triplet loss) to better capture thematic similarities:
bash
python3 finetuning/cross_entropy_model_finetuning.py
python3 finetuning/finetuning_triplet_loss.py
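To make the triplet objective concrete, the sketch below samples (anchor, positive, negative) titles from cluster assignments and computes the standard triplet loss, max(0, d(a, p) - d(a, n) + margin), on plain vectors. The use of cosine distance, the margin value, and the sampling strategy are assumptions for illustration; the fine-tuning scripts may make different choices.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the positive is closer than the negative by at least margin."""
    return max(0.0, cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin)


def make_triplets(cluster_of, n_triplets, seed=0):
    """Sample (anchor, positive, negative) title triplets from clusters.

    Anchor and positive share a cluster; the negative comes from another one.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for title, cluster in cluster_of.items():
        by_cluster[cluster].append(title)
    usable = [c for c, members in by_cluster.items() if len(members) >= 2]
    if not usable or len(by_cluster) < 2:
        raise ValueError("need at least two clusters, one with two or more titles")
    triplets = []
    for _ in range(n_triplets):
        cluster = rng.choice(usable)
        anchor, positive = rng.sample(by_cluster[cluster], 2)
        other = rng.choice([c for c in by_cluster if c != cluster])
        negative = rng.choice(by_cluster[other])
        triplets.append((anchor, positive, negative))
    return triplets
```

In the cross-entropy variant, a natural counterpart (again an assumption) is to treat the cluster id as a class label for each title.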
4. Generate the embeddings for playlist titles using the fine-tuned models
Run:
bash
python3 embeddings/playlists_embeddings_final.py
to generate embeddings for playlist titles using the fine-tuned models.
Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.
5. Generate the recommendations and evaluate the models
Evaluate the metrics for a given test playlist:
bash
python3 similarity/test_1_playlist_finetuned_model.py
Generate the recommendation for a playlist title:
bash
python3 similarity/recommend.py
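The idea of recommending by cosine similarity plus voting can be sketched as follows: the query title's embedding is compared with the embeddings of known playlist titles, and the tracks of the most similar playlists are ranked by how many of those playlists contain them. The voting rule (one vote per playlist per track), the parameters `top_k` and `n_tracks`, and the function names are assumptions; `recommend.py` may implement the vote differently.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec, known, top_k=3, n_tracks=5):
    """Recommend tracks for a new title embedding.

    known is a list of (title_vector, track_list) pairs for playlists
    whose titles and tracks are already available.
    """
    # Rank known playlists by similarity of their title to the query
    ranked = sorted(known,
                    key=lambda item: cosine_similarity(query_vec, item[0]),
                    reverse=True)
    votes = Counter()
    for _, tracks in ranked[:top_k]:
        votes.update(set(tracks))  # one vote per playlist per track
    # Most votes first; ties broken alphabetically for determinism
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ordered[:n_tracks]]
```

Weighting each vote by the similarity score instead of counting playlists equally would be a simple variation.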
Assess the model’s overall performance on the complete test set:
bash
python3 similarity/testset_test_model.py
Make sure to adjust the model path to select either the triplet loss model, the cross-entropy loss model or the pretrained model.
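This README does not name the evaluation metrics, so the sketch below uses precision and recall at k purely as an illustration of how per-playlist scores can be averaged over a test set. The metric choice and all function names are assumptions; refer to the paper and the scripts above for the metrics actually reported.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommended tracks.

    recommended is a ranked list of unique tracks; relevant is the
    collection of held-out tracks of the test playlist.
    """
    relevant = set(relevant)
    hits = sum(1 for track in recommended[:k] if track in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_test_set(cases, k):
    """Average precision@k and recall@k over (recommended, relevant) pairs."""
    if not cases:
        return 0.0, 0.0
    scores = [precision_recall_at_k(rec, rel, k) for rec, rel in cases]
    mean_precision = sum(p for p, _ in scores) / len(scores)
    mean_recall = sum(r for _, r in scores) / len(scores)
    return mean_precision, mean_recall
```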
Citation
If you use this software, please cite (bib file):
Enzo Charolois-Pasqua, Eléa Vellard, Youssra Rebboud, Pasquale Lisena, and Raphael Troncy. 2025. A Language Model-Based Playlist Generation Recommender System. In Proceedings of the Nineteenth ACM Conference on Recommender Systems (RecSys ’25), September 22–26, 2025, Prague, Czech Republic. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3705328.3748053
Owner
- Login: elea-vellard
- Kind: user
- Repositories: 1
- Profile: https://github.com/elea-vellard
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this paper, please cite it as below."
title: "A Language Model-Based Playlist Generation Recommender System"
authors:
  - family-names: Charolois-Pasqua
    given-names: Enzo
  - family-names: Vellard
    given-names: Eléa
  - family-names: Rebboud
    given-names: Youssra
  - family-names: Lisena
    given-names: Pasquale
  - family-names: Troncy
    given-names: Raphael
date-released: 2025-09-22
conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
type: conference-paper
publisher: ACM
place: Prague, Czech Republic
doi: 10.1145/3705328.3748053
url: https://doi.org/10.1145/3705328.3748053
preferred-citation:
  type: conference-paper
  title: "A Language Model-Based Playlist Generation Recommender System"
  authors:
    - family-names: Charolois-Pasqua
      given-names: Enzo
    - family-names: Vellard
      given-names: Eléa
    - family-names: Rebboud
      given-names: Youssra
    - family-names: Lisena
      given-names: Pasquale
    - family-names: Troncy
      given-names: Raphael
  conference: "Proceedings of the 19th ACM Conference on Recommender Systems (RecSys '25)"
  date: 2025-09-22
  publisher: ACM
  place: Prague, Czech Republic
  doi: 10.1145/3705328.3748053
  url: https://doi.org/10.1145/3705328.3748053
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Dependencies
- python 3.11-slim build
- flask *
- gensim *
- scikit-learn *
- tqdm *
- transformers ==4.34.0