Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
A simple implementation of DP-RAG
Basic Info
- Host: GitHub
- Owner: sarus-tech
- License: apache-2.0
- Language: TeX
- Default Branch: main
- Size: 7.74 MB
Statistics
- Stars: 9
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
What is Sarus DP-RAG?
This is a simple implementation of the popular retrieval-augmented generation (RAG) technique with differential privacy (DP) guarantees.
DP-RAG addresses privacy concerns in RAG systems by using DP to aggregate information from multiple documents, so that the generated answer does not inadvertently disclose sensitive data from any single document. The core innovation combines a novel token-by-token aggregation technique with a DP-based document retrieval method.
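To make the idea of DP-based document retrieval more concrete, here is a minimal sketch of one generic way to do it: privately choosing a similarity threshold with the exponential mechanism so that roughly `k` documents pass it. This is an illustration under stated assumptions, not necessarily the method used in this repository or in the technical report. The function name `dp_similarity_threshold`, the candidate grid, and the toy integer similarity scores are all hypothetical, and the privacy argument assumes neighbouring datasets differ by adding or removing one document.

```python
import numpy as np


def dp_similarity_threshold(similarities, candidates, k, epsilon, rng=None):
    """Privately pick a threshold that keeps roughly k documents.

    Hypothetical helper: uses the exponential mechanism over a fixed
    grid of candidate thresholds.
    """
    rng = np.random.default_rng() if rng is None else rng
    sims = np.asarray(similarities, dtype=float)
    cands = np.asarray(candidates, dtype=float)
    # counts[i] = number of documents whose similarity is >= cands[i]
    counts = (sims[None, :] >= cands[:, None]).sum(axis=1)
    # Utility is highest when the count is closest to k. Adding or removing
    # one document changes each count, hence each utility, by at most 1.
    utility = -np.abs(counts - k)
    # Exponential mechanism with sensitivity 1: P(tau) ∝ exp(eps * u / 2).
    logits = epsilon * utility / 2.0
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return float(rng.choice(cands, p=probs))


# Toy data: 100 documents with similarity scores 0..99 (assumed values).
rng = np.random.default_rng(0)
sims = np.arange(100)
candidates = np.arange(0, 100, 10)
tau = dp_similarity_threshold(sims, candidates, k=10, epsilon=10.0, rng=rng)
selected = np.flatnonzero(sims >= tau)
```

In this sketch only the threshold is released privately. The set of selected documents is still sensitive and would have to be consumed by a downstream DP step, such as a private aggregation of the model outputs, rather than shown directly.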
The technical report presents empirical results showing that DP-RAG is effective, particularly when enough documents contain the information needed to answer the query. The repo also contains the code to evaluate the system on synthetic medical data.
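The dependence on the number of supporting documents can be illustrated with a small sketch of differentially private next-token aggregation. It is a generic exponential-mechanism construction and an assumption on our part, not necessarily the aggregation rule implemented in this repository. The names `dp_token_distribution` and `toy_logprobs`, the clipping bound, and the toy log-probabilities are hypothetical. Neighbouring datasets are assumed to differ by adding or removing one document.

```python
import numpy as np


def dp_token_distribution(per_doc_logprobs, epsilon, clip=1.0):
    """Next-token distribution from clipped, summed per-document scores.

    per_doc_logprobs has shape (n_docs, vocab_size). Hypothetical helper.
    """
    scores = np.asarray(per_doc_logprobs, dtype=float)
    # Clip so that one document moves any token's utility by at most `clip`.
    clipped = np.clip(scores, -clip, clip)
    utility = clipped.sum(axis=0)
    # Exponential mechanism: P(token) ∝ exp(eps * utility / (2 * clip)).
    logits = epsilon * utility / (2.0 * clip)
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def toy_logprobs(n_docs, vocab_size=4, answer=2):
    """n_docs identical documents that all strongly favour `answer`."""
    row = np.full(vocab_size, -5.0)
    row[answer] = 0.0
    return np.tile(row, (n_docs, 1))


# Same per-token epsilon, different numbers of supporting documents.
few = dp_token_distribution(toy_logprobs(2), epsilon=1.0)
many = dp_token_distribution(toy_logprobs(50), epsilon=1.0)

# Sampling from the aggregated distribution yields the next token.
rng = np.random.default_rng(0)
token = int(rng.choice(len(many), p=many))
```

With two documents the favoured token gets a probability below one half in this toy setting, while with fifty documents it is selected almost surely. Under basic composition, generating T tokens this way would spend T times the per-token epsilon, so a real system needs tighter accounting.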
Quick Start
On a computer with a GPU and CUDA installed, clone this repository:
git clone git@github.com:sarus-tech/dp-rag.git
Then `cd` into the cloned folder, create a virtual environment with `uv venv`, and activate it with `source .venv/bin/activate`.
You can then install the packages with `uv sync` and run the test script with `python test_dp_rag.py`.
Technical Report
A report with the technical details and benchmark results is available here: RAG with Differential Privacy (https://arxiv.org/abs/2412.19291). Its BibTeX entry is:
@misc{grislain2024ragdifferentialprivacy,
title={RAG with Differential Privacy},
author={Nicolas Grislain},
year={2024},
eprint={2412.19291},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.19291},
}
Owner
- Name: Sarus Technologies
- Login: sarus-tech
- Kind: organization
- Location: Paris, France
- Website: https://sarus.tech
- Twitter: Sarus_tech
- Repositories: 4
- Profile: https://github.com/sarus-tech
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Sarus DP-RAG
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Nicolas
    family-names: Grislain
    email: nicolas.grislain@ens-lyon.org
    affiliation: Sarus Technologies
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2412.19291'
    description: Technical Report
repository-code: 'https://github.com/sarus-tech/dp-rag'
abstract: >-
  Sarus DP-RAG is a simple implementation of the popular RAG
  technique with differential privacy guarantees.
  DP-RAG addresses privacy concerns in RAG systems by using
  DP to aggregate information from multiple documents,
  thereby preventing the inadvertent disclosure of sensitive
  data. The core innovation involves a novel token-by-token
  aggregation technique and a DP-based document retrieval
  method.
keywords:
  - RAG
  - Differential Privacy
  - AI
license: Apache-2.0
GitHub Events
Total
- Watch event: 11
- Push event: 54
- Fork event: 2
- Create event: 2
Last Year
- Watch event: 11
- Push event: 54
- Fork event: 2
- Create event: 2
Dependencies
- accelerate ~=1.0
- bitsandbytes >=0.44.1
- datasets >=3.1.0
- dp-accounting >=0.4.4
- faker >=30.8.2
- huggingface_hub ~=0.26
- numpy ~=1.21
- protobuf >=5.28.3
- sentencepiece >=0.2.0
- termcolor >=2.5.0
- torch ~=2.4.0
- transformers ~=4.0