predict_llm_memorization

Predicting memorization within Large Language Models fine-tuned for classification

https://github.com/orailix/predict_llm_memorization

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

Predicting memorization within Large Language Models fine-tuned for classification

Basic Info
  • Host: GitHub
  • Owner: orailix
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 5.62 MB
Statistics
  • Stars: 4
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Citation

README.md

Predicting memorization within Large Language Models fine-tuned for classification

Jérémie Dentan1, Davide Buscaldi1, 2, Aymen Shabou3, Sonia Vanier1

1LIX (École Polytechnique, IP Paris, CNRS) 2LIPN (Sorbonne Paris Nord) 3Crédit Agricole SA

Presentation of the repository

This repository implements the experiments of our paper "Predicting memorization within Large Language Models fine-tuned for classification", published at ECAI 2025.

Abstract of the paper

Large Language Models have received significant attention due to their abilities to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious threat when disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand what elements are memorized and why. This area of research is largely unexplored, with most existing works providing a posteriori explanations. To address this gap, we propose a new approach to detect memorized samples a priori in LLMs fine-tuned for classification tasks. This method is effective from the early stages of training and readily adaptable to other classification settings, such as training vision models from scratch. Our method is supported by new theoretical results, and requires a low computational budget. We achieve strong empirical results, paving the way for the systematic identification and protection of vulnerable samples before they are memorized.

License and Copyright

Copyright 2023-present Laboratoire d'Informatique de Polytechnique. Apache License v2.0.

Please cite this work as follows:

@inproceedings{dentan_predicting_2025,
  title     = {Predicting Memorization within Large Language Models Fine-Tuned for Classification},
  author    = {Dentan, Jérémie and Buscaldi, Davide and Shabou, Aymen and Vanier, Sonia},
  booktitle = {Proceedings of the 28th European Conference on Artificial Intelligence (ECAI 2025)},
  year      = {2025},
  note      = {To appear},
  url       = {https://arxiv.org/abs/2409.18858}
}

Reproducing the CIFAR-10 part of our paper

This repository contains the source code needed to reproduce our results, except for the experiments on the CIFAR-10 dataset. For those experiments, we provide a separate repository containing the corresponding source code: https://github.com/orailix/predictllmmemorization_cifar10

Overview of the repository

The repository contains three main directories:

  • grokking_llm contains Python source code for the experiments
  • scripts contains Bash and Slurm scripts for deployment on an HPC cluster
  • figures contains notebooks to reproduce the figures from the paper

Important notice: the module we developed is called grokking_llm because the original purpose of this project was to study the Grokking phenomenon in LLMs.

Main configs: configs folder

  • Apart from the training configs and the deployment configs (see below), two config files are necessary:
    • main.cfg: Declares where the HuggingFace cache should be stored (for deployment on an offline HPC cluster, for example), as well as the paths where outputs and logs should be stored.
    • env_vars.cfg: Optionally declares environment variables. For example, on an HPC cluster with shared CPUs, you might have to set the OMP_NUM_THREADS variable to ensure that the default libraries do not use more threads than are actually available.
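As a rough illustration, the two files might look like the sketch below. The key names and paths here are illustrative assumptions, not taken from the repository; consult the config parsing code for the exact schema.

```ini
# main.cfg -- illustrative layout, actual key names may differ
[paths]
hf_home = /scratch/my_user/hf_cache    ; HuggingFace cache, useful on offline clusters
output_dir = /scratch/my_user/outputs  ; where training and measure outputs are stored
log_dir = /scratch/my_user/logs        ; where logs are stored

# env_vars.cfg -- environment variables exported before each run
[env]
OMP_NUM_THREADS = 8                    ; cap thread usage on shared CPUs
```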

Python source code: grokking_llm folder

Module grokking_llm.utils

  • training_cfg.py: Every training config is mapped to an instance of this class. The instance is associated with an alphanumeric hash (the config_id), and all output associated with this training config is stored in outputs/individual/<config_id>. You can use TrainingCfg.autoconfig to retrieve any config that was already created.
  • deployment_cfg.py: A deployment config describes the procedure for training models with many training configs. For example, we use a deployment config to vary the random split of the dataset between 0 and 99 to train shadow models. Every deployment config is likewise associated with a deployment_id, and its outputs are stored in outputs/deployment/<deployment_id>.
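The config-to-directory mapping described above can be pictured as a deterministic hash of the config's contents. The sketch below is a toy illustration of that idea, not the repository's actual TrainingCfg implementation; the hashing scheme and directory layout are assumptions.

```python
import hashlib
import json
from pathlib import Path

def config_id(cfg: dict) -> str:
    """Derive a stable alphanumeric id from a config's contents.

    Serializing with sorted keys makes the id independent of the order
    in which fields were declared.
    """
    canonical = json.dumps(cfg, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def output_dir(cfg: dict, root: str = "outputs/individual") -> Path:
    """All outputs for a given training config share one directory."""
    return Path(root) / config_id(cfg)

cfg = {"model": "mistral-7b", "dataset": "mmlu", "random_split": 0}
print(output_dir(cfg))  # the same config always maps to the same directory
```

Because the id is derived from the content rather than assigned sequentially, re-running a job with an identical config naturally resumes in (or overwrites) the same output directory.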

Module grokking_llm.training

  • Contains the scripts needed to train models and manage datasets

Module grokking_llm.measures_dyn and grokking_llm.measures_stat

  • In appendix A of the paper, we explain the difference between local and global measures of memorization. In this paper, we use the terms dynamic and static to refer to these concepts, respectively.
  • grokking_llm.measures_dyn contains scripts for the local measures, i.e. the ones aligned with our threat model: a practitioner auditing a fixed model trained on a fixed dataset.
  • grokking_llm.measures_stat contains scripts for the global measures, i.e. the ones not aligned with our threat model: we obtain average vulnerability metrics of a population of models trained on random splits of a dataset.
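The static (global) setting can be made concrete with a toy computation: for each sample, compare the confidence of shadow models that trained on it against those that did not, and average the gap over the population. This sketch is purely illustrative and does not reproduce the repository's measures; the function name and data layout are assumptions.

```python
from statistics import mean

def static_vulnerability(confidences: dict, membership: dict) -> dict:
    """Toy global measure: per-sample confidence gap averaged over shadow models.

    confidences[sample_id] -> one confidence per shadow model
    membership[sample_id]  -> True if that shadow model trained on the sample
    """
    scores = {}
    for sample_id, confs in confidences.items():
        members = [c for c, m in zip(confs, membership[sample_id]) if m]
        non_members = [c for c, m in zip(confs, membership[sample_id]) if not m]
        # Large member/non-member gap = the sample is easy to infer membership for
        scores[sample_id] = mean(members) - mean(non_members)
    return scores

# Two samples, four shadow models trained on random splits:
confs = {0: [0.9, 0.8, 0.4, 0.3], 1: [0.6, 0.55, 0.5, 0.52]}
member = {0: [True, True, False, False], 1: [True, True, False, False]}
print(static_vulnerability(confs, member))  # sample 0 is far more vulnerable
```

Note the contrast with the dynamic setting: here the score is a property of a sample averaged over a population of models, whereas the local measures target one fixed model.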

Figures: figures folder

  • 01_main_figures.ipynb: code used for the main figures of the paper
  • 01_compare_memorization.ipynb: code used for figure 6 in the appendix

Deployment: scripts folder

We provide our Bash and Slurm scripts for deployment on an HPC cluster. We used the Jean-Zay HPC cluster from IDRIS, with Nvidia A100 80GB GPUs and 40-core Intel Xeon 6248 CPUs. Training took between 3 and 10 hours on a single GPU. Overall, our experiments amount to around 5,000 single-GPU hours and 4,000 single-core CPU hours.

  • arc_mistral: Deployment scripts for a Mistral 7B model [1] trained on ARC dataset [2].
  • ethics_mistral: Deployment scripts for a Mistral 7B model [1] trained on ETHICS dataset [3].
  • mmlu_mistral: Deployment scripts for a Mistral 7B model [1] trained on MMLU dataset [4].
  • mmlu_llama: Deployment scripts for a Llama 2 7B model [5] trained on MMLU dataset [4].
  • mmlu_gemma: Deployment scripts for a Gemma 7B model [6] trained on MMLU dataset [4].

References

  • [1] Albert Q. Jiang et al. Mistral 7B, October 2023. http://arxiv.org/abs/2310.06825
  • [2] Michael Boratko et al. Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset. In Proceedings of the Workshop on Machine Reading for Question Answering, 2018. http://aclweb.org/anthology/W18-2607
  • [3] Dan Hendrycks et al. Aligning AI With Shared Human Values. In ICLR, 2021. https://openreview.net/forum?id=dNy_RKzJacY
  • [4] Dan Hendrycks et al. Measuring Massive Multitask Language Understanding. In ICLR, 2021. https://openreview.net/forum?id=d7KBjmI3GmQ
  • [5] Hugo Touvron et al. LLaMA: Open and Efficient Foundation Language Models, February 2023. https://arxiv.org/abs/2302.13971
  • [6] Gemma Team et al. Gemma: Open Models Based on Gemini Research and Technology, April 2024. http://arxiv.org/abs/2403.08295

Acknowledgements

This work received financial support from Crédit Agricole SA through the research chair “Trustworthy and responsible AI” with École Polytechnique. This work was performed using HPC resources from GENCI-IDRIS 2023-AD011014843. We thank Arnaud Grivet Sébert and Mohamed Dhouib for discussions on this paper.

Owner

  • Name: ORAILIX
  • Login: orailix
  • Kind: organization
  • Location: France

Research team focusing on Operations Research and Artificial Intelligence at LIX (Computer Science lab of École Polytechnique, Paris)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Predicting memorization within Large Language Models
  fine-tuned for classification
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jérémie
    family-names: Dentan
    orcid: 'https://orcid.org/0009-0001-5561-8030'
  - given-names: Davide
    family-names: Buscaldi
  - given-names: Aymen
    family-names: Shabou
  - given-names: Sonia
    family-names: Vanier
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2409.18858'
repository-code: 'https://github.com/orailix/predict_llm_memorization'
license: Apache-2.0

GitHub Events

Total
  • Watch event: 3
  • Push event: 3
Last Year
  • Watch event: 3
  • Push event: 3

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/dev_requirements.txt pypi
  • detect-secrets ==1.5.0 development
  • ipywidgets ==8.1.1 development
  • notebook ==7.0.6 development
  • pre-commit ==3.5.0 development
  • pytest ==7.4.3 development
  • pytest-cov ==4.1.0 development
requirements.txt pypi
  • accelerate ==0.29.3
  • datasets ==2.15.0
  • huggingface_hub ==0.19.4
  • joblib ==1.3.2
  • loguru >=0.7.2
  • matplotlib ==3.7.4
  • pandas ==2.0.3
  • peft ==0.7.1
  • safetensors ==0.4.1
  • scikit-learn ==1.3.2
  • torch ==2.1.0
  • tqdm ==4.66.1
  • transformers ==4.40.0
  • typer ==0.9.0