predict_llm_memorization

Predicting memorization within Large Language Models fine-tuned for classification

https://github.com/orailix/predict_llm_memorization

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

Predicting memorization within Large Language Models fine-tuned for classification

Basic Info
  • Host: GitHub
  • Owner: orailix
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 5.62 MB
Statistics
  • Stars: 4
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Citation

README.md

Predicting memorization within Large Language Models fine-tuned for classification

Jérémie Dentan1, Davide Buscaldi1, 2, Aymen Shabou3, Sonia Vanier1

1LIX (École Polytechnique, IP Paris, CNRS) 2LIPN (Sorbonne Paris Nord) 3Crédit Agricole SA

Presentation of the repository

This repository implements the experiments of our paper "Predicting memorization within Large Language Models fine-tuned for classification", published at ECAI 2025.

Abstract of the paper

Large Language Models have received significant attention due to their abilities to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious threat when disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand what elements are memorized and why. This area of research is largely unexplored, with most existing works providing a posteriori explanations. To address this gap, we propose a new approach to detect memorized samples a priori in LLMs fine-tuned for classification tasks. This method is effective from the early stages of training and readily adaptable to other classification settings, such as training vision models from scratch. Our method is supported by new theoretical results, and requires a low computational budget. We achieve strong empirical results, paving the way for the systematic identification and protection of vulnerable samples before they are memorized.

License and Copyright

Copyright 2023-present Laboratoire d'Informatique de Polytechnique. Apache License v2.0.

Please cite this work as follows:

@inproceedings{dentan_predicting_2025,
  title     = {Predicting Memorization within Large Language Models Fine-Tuned for Classification},
  author    = {Dentan, Jérémie and Buscaldi, Davide and Shabou, Aymen and Vanier, Sonia},
  booktitle = {Proceedings of the 28th European Conference on Artificial Intelligence (ECAI 2025)},
  year      = {2025},
  note      = {To appear},
  url       = {https://arxiv.org/abs/2409.18858}
}

Reproducing the CIFAR-10 part of our paper

This repository contains the source code needed to reproduce our results, except for the experiments on the CIFAR-10 dataset. For those experiments, we provide a separate repository containing the corresponding source code: https://github.com/orailix/predictllmmemorization_cifar10

Overview of the repository

The repository contains three main directories:

  • grokking_llm contains Python source code for the experiments
  • scripts contains Bash and Slurm scripts for deployment on an HPC cluster
  • figures contains notebooks to reproduce the figures from the paper

Important notice: the module we developed is called grokking_llm because the original purpose of this project was to study the Grokking phenomenon in LLMs.

Main configs: configs folder

  • Apart from the training configs and the deployment configs (see below), two config files are necessary:
    • main.cfg: Declares where the HuggingFace cache should be stored (for deployment on an offline HPC cluster, for example), as well as the paths where outputs and logs should be stored.
    • env_vars.cfg: Optionally declares environment variables. For example, on an HPC cluster with shared CPUs, you might have to set the OMP_NUM_THREADS variable to ensure that the default libraries do not use more threads than are actually available.
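As a rough illustration, the two files might look like the sketch below. The key names and paths here are illustrative assumptions, not taken from the repository; consult the config parsing code for the exact schema.

```ini
# main.cfg -- illustrative layout, actual key names may differ
[paths]
hf_home = /scratch/my_user/hf_cache    ; HuggingFace cache, useful on offline clusters
output_dir = /scratch/my_user/outputs  ; where training and measure outputs are stored
log_dir = /scratch/my_user/logs        ; where logs are stored

# env_vars.cfg -- environment variables exported before each run
[env]
OMP_NUM_THREADS = 8                    ; cap thread usage on shared CPUs
```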

Python source code: grokking_llm folder

Module grokking_llm.utils

  • training_cfg.py: Every training config is mapped to an instance of this class. The instance is associated with an alphanumeric hash (the config_id), and all output associated with this training config is stored in outputs/individual/<config_id>. You can use TrainingCfg.autoconfig to retrieve any config that was already created.
  • deployment_cfg.py: A deployment config describes the procedure for training models with many training configs. For example, we use a deployment config to vary the random split of the dataset between 0 and 99 to train shadow models. Every deployment config is likewise associated with a deployment_id, and its outputs are stored in outputs/deployment/<deployment_id>.
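The config-to-directory mapping described above can be pictured as a deterministic hash of the config's contents. The sketch below is a toy illustration of that idea, not the repository's actual TrainingCfg implementation; the hashing scheme and directory layout are assumptions.

```python
import hashlib
import json
from pathlib import Path

def config_id(cfg: dict) -> str:
    """Derive a stable alphanumeric id from a config's contents.

    Serializing with sorted keys makes the id independent of the order
    in which fields were declared.
    """
    canonical = json.dumps(cfg, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def output_dir(cfg: dict, root: str = "outputs/individual") -> Path:
    """All outputs for a given training config share one directory."""
    return Path(root) / config_id(cfg)

cfg = {"model": "mistral-7b", "dataset": "mmlu", "random_split": 0}
print(output_dir(cfg))  # the same config always maps to the same directory
```

Because the id is derived from the content rather than assigned sequentially, re-running a job with an identical config naturally resumes in (or overwrites) the same output directory.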

Module grokking_llm.training

  • Contains the scripts needed to train models and manage datasets

Module grokking_llm.measures_dyn and grokking_llm.measures_stat

  • In appendix A of the paper, we explain the difference between local and global measures of memorization. In this paper, we use the terms dynamic and static to refer to these concepts, respectively.
  • grokking_llm.measures_dyn contains scripts for the local measures, i.e. the ones aligned with our threat model: a practitioner auditing a fixed model trained on a fixed dataset.
  • grokking_llm.measures_stat contains scripts for the global measures, i.e. the ones not aligned with our threat model: we obtain average vulnerability metrics of a population of models trained on random splits of a dataset.
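The static (global) setting can be made concrete with a toy computation: for each sample, compare the confidence of shadow models that trained on it against those that did not, and average the gap over the population. This sketch is purely illustrative and does not reproduce the repository's measures; the function name and data layout are assumptions.

```python
from statistics import mean

def static_vulnerability(confidences: dict, membership: dict) -> dict:
    """Toy global measure: per-sample confidence gap averaged over shadow models.

    confidences[sample_id] -> one confidence per shadow model
    membership[sample_id]  -> True if that shadow model trained on the sample
    """
    scores = {}
    for sample_id, confs in confidences.items():
        members = [c for c, m in zip(confs, membership[sample_id]) if m]
        non_members = [c for c, m in zip(confs, membership[sample_id]) if not m]
        # Large member/non-member gap = the sample is easy to infer membership for
        scores[sample_id] = mean(members) - mean(non_members)
    return scores

# Two samples, four shadow models trained on random splits:
confs = {0: [0.9, 0.8, 0.4, 0.3], 1: [0.6, 0.55, 0.5, 0.52]}
member = {0: [True, True, False, False], 1: [True, True, False, False]}
print(static_vulnerability(confs, member))  # sample 0 is far more vulnerable
```

Note the contrast with the dynamic setting: here the score is a property of a sample averaged over a population of models, whereas the local measures target one fixed model.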

Figures: figures folder

  • 01_main_figures.ipynb: code used for the main figures of the paper
  • 01_compare_memorization.ipynb: code used for figure 6 in the appendix

Deployment: scripts folder

We provide our Bash and Slurm scripts for deployment on an HPC cluster. We used the Jean-Zay HPC cluster from IDRIS, with Nvidia A100 80GB GPUs and 40-core Intel Xeon 6248 CPUs. Training took between 3 and 10 hours on a single GPU. Overall, our experiments amount to around 5,000 single-GPU hours and 4,000 single-core CPU hours.

  • arc_mistral: Deployment scripts for a Mistral 7B model [1] trained on ARC dataset [2].
  • ethics_mistral: Deployment scripts for a Mistral 7B model [1] trained on ETHICS dataset [3].
  • mmlu_mistral: Deployment scripts for a Mistral 7B model [1] trained on MMLU dataset [4].
  • mmlu_llama: Deployment scripts for a Llama 2 7B model [5] trained on MMLU dataset [4].
  • mmlu_gemma: Deployment scripts for a Gemma 7B model [6] trained on MMLU dataset [4].

References

  • [1] Albert Q. Jiang et al. Mistral 7B, October 2023. http://arxiv.org/abs/2310.06825
  • [2] Michael Boratko et al. Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset. In Proceedings of the Workshop on Machine Reading for Question Answering, 2018. http://aclweb.org/anthology/W18-2607
  • [3] Dan Hendrycks et al. Aligning AI With Shared Human Values. In ICLR, 2021. https://openreview.net/forum?id=dNy_RKzJacY
  • [4] Dan Hendrycks et al. Measuring Massive Multitask Language Understanding. In ICLR, 2021. https://openreview.net/forum?id=d7KBjmI3GmQ
  • [5] Hugo Touvron et al. LLaMA: Open and Efficient Foundation Language Models, February 2023. https://arxiv.org/abs/2302.13971
  • [6] Gemma Team et al. Gemma: Open Models Based on Gemini Research and Technology, April 2024. http://arxiv.org/abs/2403.08295

Acknowledgements

This work received financial support from Crédit Agricole SA through the research chair “Trustworthy and responsible AI” with École Polytechnique. This work was performed using HPC resources from GENCI-IDRIS 2023-AD011014843. We thank Arnaud Grivet Sébert and Mohamed Dhouib for discussions on this paper.

Owner

  • Name: ORAILIX
  • Login: orailix
  • Kind: organization
  • Location: France

Research team focusing on Operations Research and Artificial Intelligence at LIX (Computer Science lab of École Polytechnique, Paris)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Predicting memorization within Large Language Models
  fine-tuned for classification
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jérémie
    family-names: Dentan
    orcid: 'https://orcid.org/0009-0001-5561-8030'
  - given-names: Davide
    family-names: Buscaldi
  - given-names: Aymen
    family-names: Shabou
  - given-names: Sonia
    family-names: Vanier
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2409.18858'
repository-code: 'https://github.com/orailix/predict_llm_memorization'
license: Apache-2.0

GitHub Events

Total
  • Watch event: 3
  • Push event: 3
Last Year
  • Watch event: 3
  • Push event: 3

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/dev_requirements.txt pypi
  • detect-secrets ==1.5.0 development
  • ipywidgets ==8.1.1 development
  • notebook ==7.0.6 development
  • pre-commit ==3.5.0 development
  • pytest ==7.4.3 development
  • pytest-cov ==4.1.0 development
requirements.txt pypi
  • accelerate ==0.29.3
  • datasets ==2.15.0
  • huggingface_hub ==0.19.4
  • joblib ==1.3.2
  • loguru >=0.7.2
  • matplotlib ==3.7.4
  • pandas ==2.0.3
  • peft ==0.7.1
  • safetensors ==0.4.1
  • scikit-learn ==1.3.2
  • torch ==2.1.0
  • tqdm ==4.66.1
  • transformers ==4.40.0
  • typer ==0.9.0