dp-rag

A simple implementation of DP-RAG

https://github.com/sarus-tech/dp-rag

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: sarus-tech
  • License: apache-2.0
  • Language: TeX
  • Default Branch: main
  • Size: 7.74 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

What is Sarus DP-RAG?

Sarus DP-RAG is a simple implementation of retrieval-augmented generation (RAG) with differential privacy (DP) guarantees.

DP-RAG addresses privacy concerns in RAG systems by using DP to aggregate information from multiple documents, thereby preventing the inadvertent disclosure of sensitive data. The core innovation involves a novel token-by-token aggregation technique and a DP-based document retrieval method.
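To make the idea of private token-by-token aggregation concrete, here is a minimal sketch in Python. It is not taken from the repository and may differ from the mechanism in the technical report: it uses the classic exponential mechanism as a stand-in. The function name `dp_next_token` and its input layout (one next-token distribution per retrieved document) are assumptions made for illustration only.

```python
import math
import random


def dp_next_token(doc_token_probs, epsilon, rng=None):
    """Pick one token id with the exponential mechanism (illustrative sketch).

    doc_token_probs: list of rows, one per document; row i is assumed to be
    the next-token distribution obtained when conditioning on document i.
    epsilon: privacy budget spent on this single token.
    """
    rng = rng or random.Random()
    vocab_size = len(doc_token_probs[0])

    # Utility of token t: sum over documents of its probability, clipped to [0, 1].
    # Adding or removing one document changes each utility by at most 1,
    # so the sensitivity of the utility is 1.
    utility = [
        sum(min(max(row[t], 0.0), 1.0) for row in doc_token_probs)
        for t in range(vocab_size)
    ]

    # Exponential mechanism: P(t) is proportional to exp(epsilon * utility[t] / 2).
    logits = [epsilon * u / 2.0 for u in utility]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - top) for logit in logits]

    return rng.choices(range(vocab_size), weights=weights, k=1)[0]


if __name__ == "__main__":
    # 50 hypothetical documents that all favor token 3.
    docs = [[0.01, 0.01, 0.01, 0.96, 0.01] for _ in range(50)]
    print(dp_next_token(docs, epsilon=10.0, rng=random.Random(0)))
```

In this sketch each call spends `epsilon` on one token, so generating T tokens this way costs at most T times `epsilon` under basic composition. A token supported by many documents dominates the draw, while a token that only one document supports gains very little weight, which matches the intuition that DP-RAG needs several documents to agree before information surfaces.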

The technical report presents empirical results showing that DP-RAG is effective, particularly when enough documents contain the necessary information. The repo also contains the code to evaluate the system on synthetic medical data.

Quick Start

On a computer with a GPU and CUDA installed, clone this repository:

    git clone git@github.com:sarus-tech/dp-rag.git

Then cd into the cloned folder, create a virtual environment with uv venv, and activate it with source .venv/bin/activate.

You can then install the packages with uv sync and run the test script: python test_dp_rag.py.
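Before running the steps above, it can help to confirm that the required command-line tools are on the PATH. The following is a small, hypothetical pre-flight helper, not part of the repository. It assumes that finding `nvidia-smi` is a reasonable proxy for an installed NVIDIA driver; it does not verify that CUDA itself works.

```python
import shutil


def quick_start_preflight():
    """Report which tools used by the Quick Start are available on PATH."""
    # nvidia-smi is only a proxy for GPU driver presence (assumption).
    tools = ("git", "uv", "nvidia-smi")
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in quick_start_preflight().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

If any tool is reported as missing, install it before cloning the repository and running uv sync.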

Technical Report

A report with the technical details and benchmark results is available here: RAG with Differential Privacy (https://arxiv.org/abs/2412.19291).

@misc{grislain2024ragdifferentialprivacy,
  title={RAG with Differential Privacy},
  author={Nicolas Grislain},
  year={2024},
  eprint={2412.19291},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2412.19291},
}

Owner

  • Name: Sarus Technologies
  • Login: sarus-tech
  • Kind: organization
  • Location: Paris, France

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Sarus DP-RAG
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Nicolas
    family-names: Grislain
    email: nicolas.grislain@ens-lyon.org
    affiliation: Sarus Technologies
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2412.19291'
    description: Technical Report
repository-code: 'https://github.com/sarus-tech/dp-rag'
abstract: >-
  Sarus DP-RAG is a simple implementation of the popular RAG
  technique with differential privacy guarantees.


  DP-RAG addresses privacy concerns in RAG systems by using
  DP to aggregate information from multiple documents,
  thereby preventing the inadvertent disclosure of sensitive
  data. The core innovation involves a novel token-by-token
  aggregation technique and a DP-based document retrieval
  method.
keywords:
  - RAG
  - Differential Privacy
  - AI
license: Apache-2.0

GitHub Events

Total
  • Watch event: 11
  • Push event: 54
  • Fork event: 2
  • Create event: 2
Last Year
  • Watch event: 11
  • Push event: 54
  • Fork event: 2
  • Create event: 2

Dependencies

pyproject.toml pypi
  • accelerate ~=1.0
  • bitsandbytes >=0.44.1
  • datasets >=3.1.0
  • dp-accounting >=0.4.4
  • faker >=30.8.2
  • huggingface_hub ~=0.26
  • numpy ~=1.21
  • protobuf >=5.28.3
  • sentencepiece >=0.2.0
  • termcolor >=2.5.0
  • torch ~=2.4.0
  • transformers ~=4.0