shield
[EMNLP 2024] SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary
Repository
[EMNLP 2024] SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
Basic Info
- Host: GitHub
- Owner: xz-liu
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/2406.12975
- Size: 513 KB
Statistics
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is struggling to keep pace with these rapid advancements, with ongoing debates about whether generated text might plagiarize copyrighted materials. Current LLMs may infringe on copyrights or overly restrict non-copyrighted texts, leading to these challenges: (i) the need for a comprehensive evaluation benchmark to assess copyright compliance from multiple aspects; (ii) evaluating robustness against safeguard bypassing attacks; and (iii) developing effective defenses targeted against the generation of copyrighted text. To tackle these challenges, we introduce a curated dataset to evaluate methods, test attack strategies, and propose lightweight, real-time defenses to prevent the generation of copyrighted text, ensuring the safe and lawful use of LLMs. Our experiments demonstrate that current LLMs frequently output copyrighted text, and that jailbreaking attacks can significantly increase the volume of copyrighted output. Our proposed defense mechanisms significantly reduce the volume of copyrighted text generated by LLMs by effectively refusing malicious requests.
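To make the n-gram defense idea concrete, here is a minimal illustrative sketch (function names and the threshold are our assumptions for illustration, not the repository's actual API): a candidate continuation is refused when too many of its n-grams appear verbatim in a protected reference text.

```python
# Illustrative n-gram overlap check for a copyright defense.
# All names and thresholds here are hypothetical, not the repo's real API.

def ngrams(tokens, n):
    """Return the set of all contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def violates_copyright(candidate, protected, n=6, threshold=0.5):
    """True if the fraction of the candidate's n-grams found verbatim
    in the protected text meets or exceeds the threshold."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return False  # too short to contain any n-gram of length n
    prot = ngrams(protected.split(), n)
    return len(cand & prot) / len(cand) >= threshold

# Example: a near-verbatim continuation is flagged, unrelated text is not.
protected_text = "it was the best of times it was the worst of times"
print(violates_copyright("it was the best of times it was", protected_text))
print(violates_copyright("a completely unrelated sentence about cats", protected_text))
```

A real-time defense along these lines would run such a check on each decoded chunk and refuse the request (or truncate the output) when the overlap trips the threshold.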
Dataset
The BS-NC and BEP datasets are made public in this repository. The copyrighted datasets are not made public due to legal restrictions.
The jailbreak templates are collected from Liu et al.'s Google Docs. Please refer to their paper for details.
Update on dataset
We have received numerous inquiries regarding the sharing of copyrighted datasets. However, we are not aware of any precedent for publicly releasing such datasets in North America. To mitigate any potential legal risks, we will be consulting with the relevant university departments. This process is expected to be lengthy, but if you would like to stay informed, please feel free to send me an email.
Setup Environment
We provide a requirements.txt file to install the required dependencies.
pip install -r requirements.txt
Setup API Key
Please refer to the respective API documentation to get the API key. Once you have the API key, please set it in the environment variable.
To enable agent search, please set the following Perplexity (PPLX) API key.
export PPLX_API_KEY=<API_KEY>
For Claude and Gemini, please set
export ANTHROPIC_API_KEY=<API_KEY>
export GOOGLE_API_KEY=<API_KEY>
For the OpenAI API key, please set the organization key as well.
export OPENAI_API_KEY=<API_KEY>
export OPENAI_ORGANIZATION=<ORGANIZATION_ID>
To access gated models on Hugging Face, please log in with huggingface-cli or set HF_TOKEN.
Run
For open-source models, use the following command to run the code.
python main.py --max_dataset_num 100 --batch_size <BATCH_SIZE> --dtype fp16 --defense_type <DEFENSE_TYPE> --prompt_type <PROMPT_TYPE> --jailbreak_num -1 --hf_model_id <HF_MODEL_ID> --dataset <DATASET> --jailbreak <JAILBREAK>
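As one concrete illustration of filling in the placeholders (the model ID, batch size, and flag values below are example choices, not recommendations from the paper):

```shell
# Example: evaluate an open-source chat model on the BS-NC dataset
# with the n-gram defense, prefix probing, and no jailbreak.
python main.py --max_dataset_num 100 --batch_size 8 --dtype fp16 \
  --defense_type ngram --prompt_type a --jailbreak_num -1 \
  --hf_model_id meta-llama/Llama-2-7b-chat-hf --dataset bsnc --jailbreak no
```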
For API-based models, use the following command to run the code.
python main.py --max_dataset_num 100 --batch_size 1 --defense_type <DEFENSE_TYPE> --prompt_type <PROMPT_TYPE> --jailbreak_num -1 --api_model yes --api_model_name <API_MODEL_NAME> --api_model_sleep_time <API_MODEL_SLEEP_TIME> --dataset <DATASET> --jailbreak <JAILBREAK>
Explanation of the arguments:
| Argument | Explanation |
| --- | --- |
| `max_dataset_num` | The number of samples to evaluate. Set to 100 to evaluate all titles. |
| `batch_size` | The batch size for the model. Adjust according to GPU memory. |
| `dtype` | The data type for the model: FP16 or BF16 (default FP16). |
| `temperature` | The sampling temperature of the model (default 0). |
| `defense_type` | The defense to use: 'plain' for no defense, 'agent' for agent-based defense, or 'ngram' for n-gram-based defense. |
| `prompt_type` | The prompt type: 'a' for prefix probing, 'b' for direct probing. |
| `api_model` | Set to 'yes' for API-based models, 'no' for open-source models. |
| `hf_model_id` | The Hugging Face model ID for open-source models. Not required when `defense_type` is 'plain' and `api_model` is 'yes'; required when `defense_type` is 'agent' or 'ngram'. The model ID is also used as the tokenizer for the n-gram defense. |
| `api_model_name` | The API model name for API-based models. |
| `api_model_sleep_time` | The sleep time between calls to the API model, so as not to exceed the API rate limit. |
| `dataset` | The dataset to use: 'bsnc' or 'bep'. |
| `jailbreak` | Set to 'general' to apply jailbreak templates, 'no' for no jailbreak. |
| `jailbreak_num` | The number of jailbreak templates to use. Set to -1 for all. |
| `agent_precheck` / `agent_postcheck` | Whether the agent checks the input prompt (precheck) and the input prompt plus generated text (postcheck). Set each to 'yes' or 'no'. |
| `overwrite_copyright_status` | Disables the copyright status verifier and marks all inputs as 'C' (copyrighted) or 'P' (public domain). Used to analyze the effectiveness of the few-shot prompts; leave at the default in common usage. |
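The two `prompt_type` settings can be illustrated with a small sketch (the wording of the prompts below is our own illustration, not the repository's exact templates): prefix probing feeds the model the opening of a work and asks it to continue, while direct probing simply requests the text.

```python
# Hypothetical illustration of the two probing styles behind --prompt_type.
# The prompt wording is assumed for illustration only.

def build_prompt(prompt_type, title, opening=None):
    if prompt_type == "a":  # prefix probing: supply the opening, ask to continue
        return f"Continue the following text from {title}:\n{opening}"
    if prompt_type == "b":  # direct probing: request the text outright
        return f"Please provide the full text of {title}."
    raise ValueError(f"unknown prompt_type: {prompt_type}")

print(build_prompt("a", "Moby Dick", "Call me Ishmael."))
print(build_prompt("b", "Moby Dick"))
```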
To see the detailed arguments, you can run the following command.
python main.py --help
Citation
Please cite our paper if you find the repository helpful.
@misc{liu2024shieldevaluationdefensestrategies,
title={SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation},
author={Xiaoze Liu and Ting Sun and Tianyang Xu and Feijie Wu and Cunxiang Wang and Xiaoqian Wang and Jing Gao},
year={2024},
eprint={2406.12975},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.12975},
}
Owner
- Name: Xiaoze Liu
- Login: xz-liu
- Kind: user
- Website: https://xz-liu.github.io/
- Repositories: 1
- Profile: https://github.com/xz-liu
"Don't beg for it; earn it, and it shall be granted."
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find this project useful in your research or work, please consider citing it."
preferred-citation:
  type: article
  authors:
    - family-names: "Liu"
      given-names: "Xiaoze"
    - family-names: "Sun"
      given-names: "Ting"
    - family-names: "Xu"
      given-names: "Tianyang"
    - family-names: "Wu"
      given-names: "Feijie"
    - family-names: "Wang"
      given-names: "Cunxiang"
    - family-names: "Wang"
      given-names: "Xiaoqian"
    - family-names: "Gao"
      given-names: "Jing"
  title: "SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation"
  year: 2024
  journal: "arXiv preprint arXiv:2406.12975"
GitHub Events
Total
- Watch event: 3
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 3
- Push event: 1
- Fork event: 1
Dependencies
- Deprecated *
- dataclasses *
- datasets *
- evaluate *
- google_books_api_wrapper *
- langchain ==0.1.19
- langchain_community ==0.0.38
- langchain_core ==0.1.52
- langchain_google_community ==1.0.3
- langchain_openai ==0.1.6
- matplotlib *
- numpy *
- pandas *
- regex *
- requests *
- rouge_score *
- seaborn *
- torch *
- tqdm *
- transformers *
- vllm *