dp-opt
[ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
Science Score: 28.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, scholar.google
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
[ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
Basic Info
- Host: GitHub
- Owner: VITA-Group
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://jyhong.gitlab.io/publication/2023dp_opt/
- Size: 93.8 KB
Statistics
- Stars: 39
- Watchers: 12
- Forks: 10
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
Official PyTorch Code for Paper: "DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer" Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng Li, Bo Li, Zhangyang Wang, ICLR (Spotlight, top-5%) 2024.
TL;DR: We propose the first end-to-end privacy-preserving automatic prompt engineering method.
Overview

Large Language Models (LLMs) have emerged as dominant tools for various tasks, particularly when tailored for a specific target by prompt tuning. Nevertheless, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information. A practical solution is to host a local LLM and optimize a soft prompt privately using data. Yet, hosting a local model becomes problematic when model ownership is protected. Alternative methods, like sending data to the model’s provider for training, intensify these privacy issues facing an untrusted provider. In this paper, we present a novel solution called Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge. Our approach involves tuning a discrete prompt on the client side and then applying it to the desired cloud models. We demonstrate that prompts suggested by LLMs themselves can be transferred without compromising performance significantly. To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, by a differentially-private (DP) ensemble of in-context learning with private demonstrations. With DP-OPT, generating privacy-preserving prompts by Vicuna-7b can yield competitive performance compared to non-private in-context learning on GPT3.5 or local private prompt tuning.
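At its core, the private prompt generation step aggregates next-token proposals from an ensemble of in-context inferences, each conditioned on a different subset of private demonstrations, and releases each token through a differentially-private selection step. Below is a minimal illustrative sketch of that idea using the exponential mechanism; the function and variable names are hypothetical and do not reflect this repository's actual implementation.

```python
import numpy as np

def dp_next_token(vote_counts, epsilon, rng=None):
    """Pick the next prompt token from ensemble vote counts via the
    exponential mechanism (sensitivity 1: each member casts one vote).
    Illustrative sketch only, not the repository's API."""
    rng = rng or np.random.default_rng()
    tokens = list(vote_counts)
    scores = np.array([vote_counts[t] for t in tokens], dtype=float)
    # P(token) proportional to exp(epsilon * score / (2 * sensitivity)), sensitivity = 1
    logits = epsilon * scores / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return tokens[rng.choice(len(tokens), p=probs)]

# Toy example: three ensemble members (each an LLM call conditioned on a
# different subset of private demonstrations) vote on the next token of the
# instruction being generated; the vote histogram is privatized before release.
votes = {"Classify": 2, "Label": 1}
print(dp_next_token(votes, epsilon=1.8))
```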
Get Started
Prepare the conda env.
```shell
conda create --name dp-opt python=3.8 -y
conda activate dp-opt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers datasets accelerate sentencepiece scikit-learn wandb autodp
# transformers==4.28.1
```
Prepare DLN datasets.
```shell
bash setup_data.sh
```
To use OpenAI models, create `openai_config.py` in the root folder. This is only used for evaluation.
```python
import openai

openai.api_key = ""
openai.organization = ""
openai.api_base = "https://api.openai.com/v1"
openai_model_types = ['text-davinci-003']
```
:warning: Warning: Setting `echo` and `logprobs` simultaneously is no longer supported for certain OpenAI models. However, classification inference with OpenAI models requires both settings. Consider hosting your own models, e.g., through vLLM, instead.
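If you do host a model behind an OpenAI-compatible endpoint (for instance via vLLM's `api_server`), the same config pattern can point evaluation at the local server instead of api.openai.com. The sketch below is an assumption about such a setup; the port, key, and model name are illustrative, not values from this repository.

```python
# openai_config.py pointing at a locally hosted OpenAI-compatible server,
# e.g. started with:
#   python -m vllm.entrypoints.openai.api_server --model lmsys/vicuna-7b-v1.3
# Endpoint, key, and model name below are illustrative assumptions.
import openai

openai.api_key = "EMPTY"  # local servers typically ignore the API key
openai.organization = ""
openai.api_base = "http://127.0.0.1:8000/v1"  # vLLM's default port is 8000
openai_model_types = ['lmsys/vicuna-7b-v1.3']
```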
Example: Do prompt engineering via the web interface:
```shell
pip install gradio
python web_demo.py
# open http://127.0.0.1:7860
```
(Web demo screenshots: Train and Test views.)
Example: Use a local model (lmsys/vicuna-7b-v1.3) to generate an instruction and test the instruction with an OpenAI model (text-davinci-003).
* OPT:
```shell
# generate an instruction
python train_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \
    --data=sst2 --holdout_ratio=0.01
# evaluate the instruction
python eval_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \
    --data=sst2 \
    --test_model=text-davinci-003
```
* DP-OPT:
```shell
# generate an instruction
python train_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \
    --data=sst2 --holdout_ratio=0.01 \
    --target_eps=8. --dp_eps=1.8 --dp_delta=5e-7 --tokenwise_gen=True
# evaluate the instruction
python eval_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \
    --data=sst2 \
    --target_eps=8. --dp_eps=1.8 --dp_delta=5e-7 --tokenwise_gen=True \
    --test_model=text-davinci-003
```
Experiments
Wandb sweep files are under sweeps/<data_name>/: <method>.yml is used for tuning prompts, and <method>_test.yml is used to test the tuned prompts on different models.
Supported datasets: sst2, trec, mpqa, disaster.
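Each `wandb sweep` command below registers a sweep and prints a sweep ID; launch the actual runs with `wandb agent <sweep_id>` (standard wandb workflow; `<sweep_id>` is whatever ID the sweep command prints).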
Methods (exemplified on sst2):
* 5-shot In-Context Learning (ICL)
```shell
wandb sweep sweeps/sst2/icl.yml
```
* Deep Language Network with One-layer (DLN-1)
```shell
wandb sweep sweeps/sst2/dln1.yml
wandb sweep sweeps/sst2/dln1_test.yml
```
* Offsite Prompt Tuning (OPT)
```shell
wandb sweep sweeps/sst2/opt.yml
wandb sweep sweeps/sst2/opt_test.yml
```
* Differentially-Private Offsite Prompt Tuning (DP-OPT)
```shell
wandb sweep sweeps/sst2/dp-opt.yml
wandb sweep sweeps/sst2/dp-opt_test.yml
```
Part of the code is based on deep-language-networks.
Owner
- Name: VITA
- Login: VITA-Group
- Kind: organization
- Website: https://vita-group.github.io
- Repositories: 75
- Profile: https://github.com/VITA-Group
Visual Informatics Group @ University of Texas at Austin
Citation (CITATION.bib)
@inproceedings{hong2022efficient,
title={DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer},
author={Hong, Junyuan and Wang, Jiachen T. and Zhang, Chenhui and Li, Zhangheng and Li, Bo and Wang, Zhangyang},
booktitle={ICLR},
year={2024}
}
GitHub Events
Total
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Fork event: 1
Last Year
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 4
- Total pull requests: 1
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 1
- Average comments per issue: 1.25
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 1
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 1.25
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- qilee-work (1)
- 77477i (1)
- LINXIAXING (1)
- RolianTan (1)
Pull Request Authors
- eltociear (1)