https://github.com/cohere-labs-community/goodtriever

Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models"

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: Cohere-Labs-Community
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 9.96 MB
Statistics
  • Stars: 23
  • Watchers: 8
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme

README.md

Model Safety with Retrieval-Augmented Language Models

Code for "Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models".

[Paper][Data]

Check out the branch "from-one-to-many" to access the code for "From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models".

Goodtriever

The kNN-LM section of the code is largely based on https://github.com/neulab/knn-transformers, and the DExperts reimplementation is based on the original repo.

Currently we support base models from HuggingFace's Transformers library in the PyTorch framework.

Setup

Run the following to create the environment and install the required packages:

```bash
conda env create -f environment.yml
conda activate model_safety
```

Download data and model generations

The datasets used to build the datastores, the model generations, and the experiment results are available in our HuggingFace dataset repo. First, clone the repo without downloading the large files:

```bash
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/luizapzbn/goodtriever-data
```

To download the datasets locally, run the following from inside the cloned `goodtriever-data` directory:

```bash
git lfs pull --include=data
```

If you'd like to check the results of our experiments, run (~10 GB):

```bash
git lfs pull --include=outputs
```

Usage

In order to use Goodtriever, you need the toxic/non-toxic datastores. Here are the files for the jigsaw datastore used in the main experiments section. Next, you can build the datastores for your model.

Save/Train datastores

Change train_file, output_dir and dstore_dir to match your data.

```bash
# Save datastores to disk
python -u -m generation.knn_transformers.run_clm \
    --model_name_or_path gpt2-large \
    --eval_subset train \
    --train_file goodtriever-data/data/jigsaw/toxicity_gte0.5_clean.json \
    --output_dir checkpoints/gpt2-large_toxic \
    --dstore_dir checkpoints/gpt2-large_toxic \
    --save_knnlm_dstore \
    --do_eval
```

If you'd like to limit the size of the datastore to 100,000 tokens, for example, you should add --limit_eval_to_dstore --dstore_size 100000. An example of usage can be found in experiments/datastore_size_experiment.py.

```bash
# Train index
python -u -m generation.knn_transformers.run_clm \
    --model_name_or_path gpt2-large \
    --eval_subset train \
    --train_file goodtriever-data/data/jigsaw/toxicity_gte0.5_clean.json \
    --output_dir checkpoints/gpt2-large_toxic \
    --dstore_dir checkpoints/gpt2-large_toxic \
    --build_index
```
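To make the two steps above more concrete, here is a toy sketch of what a kNN-LM-style datastore holds and how it is queried. It is not the repository's implementation: the real code presumably stores model hidden states and searches them with a trained index (`faiss-gpu` is among the dependencies), whereas this sketch uses hand-written 2-D vectors and brute-force search. The function names `build_datastore` and `knn_next_token_probs`, and the `limit` and `temperature` parameters, are hypothetical stand-ins chosen to echo `--dstore_size` and `--knn_temp`.

```python
import math
from collections import defaultdict


def build_datastore(contexts, next_tokens, limit=None):
    """Pair each context vector (key) with the token that followed it (value)."""
    pairs = list(zip(contexts, next_tokens))
    # `limit` is a hypothetical analogue of capping the datastore size.
    return pairs[:limit] if limit is not None else pairs


def knn_next_token_probs(datastore, query, k=2, temperature=1.0):
    """Turn the k nearest keys into a next-token distribution."""
    # Brute-force squared L2 distance; a trained index would approximate this search.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Softmax over negative distances, summing the weights of neighbors that share a token.
    weights = defaultdict(float)
    for dist, token in scored:
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


keys = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
values = ["cat", "cat", "dog"]
dstore = build_datastore(keys, values)
probs = knn_next_token_probs(dstore, query=(0.0, 0.1), k=2)
print(probs)
```

In this toy example the two nearest keys both map to "cat", so all of the retrieved probability mass lands on that token.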

Goodtriever experiments

Once you have both (or only the toxic) datastores trained, you can generate completions to prompts and evaluate for perplexity, toxicity and diversity. For toxicity evaluation, you'll need to export your Perspective API key:

```bash
export PERSPECTIVE_API_KEY=$API_KEY
```
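If you call the scoring code from your own Python scripts, it can help to fail early when the key is missing. The helper below is a minimal sketch under that assumption; `get_perspective_api_key` is a hypothetical name, and how the repository itself reads the variable is not shown here.

```python
import os


def get_perspective_api_key(env=None):
    """Return the key stored in PERSPECTIVE_API_KEY, or raise a readable error."""
    # `env` defaults to the process environment; passing a dict makes testing easy.
    env = os.environ if env is None else env
    key = env.get("PERSPECTIVE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "PERSPECTIVE_API_KEY is not set; export it before running toxicity scoring."
        )
    return key


# Example with an explicit mapping instead of the real environment.
demo_key = get_perspective_api_key({"PERSPECTIVE_API_KEY": "dummy-key"})
print(demo_key)
```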

Then, the following command will take care of all three steps: generation, scoring and evaluation. Default generation arguments are found in generation/args.py.

```bash
python -m scripts.run_all \
    --output_folder outputs/goodtriever-large \
    --prompts_path goodtriever-data/data/nontoxic_prompts-10k.jsonl \
    --model_name gpt2-large \
    --perplexity_model gpt2-xl \
    --perspective_rate_limit 30 \
    --batch_size 4 \
    --knn True \
    --knn_temp 100 \
    --lmbda 2.0 \
    --dstore_dir checkpoints/gpt2-large_toxic \
    --other_dstore_dir checkpoints/gpt2-large_nontoxic
```

In the output folder you'll have the files: _generations.jsonl (25 generations per prompt), _perspective.jsonl (toxicity scores for each generation), _collated.jsonl (joint prompts, continuations and their toxicity scores), and the metrics _perplexity.csv, _toxicity.csv, _diversity.csv.
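With 25 generations per prompt, toxicity scores are usually aggregated per prompt before being averaged. The sketch below shows two aggregates commonly used in toxicity-mitigation evaluations: the mean of the per-prompt maximum score, and the fraction of prompts with at least one generation scoring 0.5 or higher. Whether `_toxicity.csv` reports exactly these quantities is an assumption, and `toxicity_summary` and its input layout are hypothetical.

```python
from statistics import mean


def toxicity_summary(scores_per_prompt, threshold=0.5):
    """Aggregate per-generation toxicity scores, grouped by prompt."""
    # Worst-case (maximum) toxicity among the generations of each prompt.
    max_scores = [max(scores) for scores in scores_per_prompt]
    return {
        # Average of the per-prompt maxima.
        "expected_max_toxicity": mean(max_scores),
        # Share of prompts with at least one generation at or above the threshold.
        "toxicity_probability": mean(
            1.0 if m >= threshold else 0.0 for m in max_scores
        ),
    }


# Two prompts with three generations each (made-up scores).
example = [[0.125, 0.75, 0.25], [0.0625, 0.125, 0.25]]
summary = toxicity_summary(example)
print(summary)
```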

Other parameters/setups you may want:

- Limit the number of prompts for evaluation: `--num_prompts 100`
- Compute neighbor distances more precisely (useful when the datastore is too small): `--recompute_dists True`
- Change the order of the datastores in the ensemble equation: `--ensemble_order add,subtract`
- Change the top-p filtering applied before the ensemble: `--filter_p 0.8`
- To use a single datastore, pass only the `--dstore_dir` parameter.
- To evaluate the raw model, run the command above up to the `--perspective_rate_limit` line.
- To debug your generations and the sentences being retrieved, add `--debug True`.

To run the experiment with the base model only, set --knn False.
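The ensemble mentioned above can be pictured with a DExperts-style combination of next-token logits: the base model's logits are shifted toward what the non-toxic source predicts and away from what the toxic source predicts. The sketch below is an illustration under that assumption, not the repository's code. The name `ensemble_probs` is hypothetical, and treating `alpha` as the counterpart of `--lmbda` is a guess.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(base_logits, nontoxic_logits, toxic_logits, alpha=2.0):
    """Add the non-toxic logits and subtract the toxic ones, scaled by alpha."""
    combined = [
        z + alpha * (z_plus - z_minus)
        for z, z_plus, z_minus in zip(base_logits, nontoxic_logits, toxic_logits)
    ]
    return softmax(combined)


# Toy two-token vocabulary: the base model is indifferent between the tokens.
vocab = ["kind", "rude"]
base = [0.0, 0.0]
nontoxic = [1.0, 0.0]  # non-toxic source prefers "kind"
toxic = [0.0, 1.0]     # toxic source prefers "rude"
probs = ensemble_probs(base, nontoxic, toxic, alpha=2.0)
print(dict(zip(vocab, probs)))
```

With `alpha=0` the function returns the base distribution unchanged; larger values push probability further toward the tokens the non-toxic source favors.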

DExperts experiments

To run the evaluation code with the DExperts model, just change --knn to --dexperts. The --dstore_dir and --other_dstore_dir parameters point to the anti-expert and expert models, respectively. For example:

```bash
# --dstore_dir points to the anti-expert; --other_dstore_dir points to the expert.
python -m scripts.run_all \
    --output_folder outputs/dexperts-large \
    --prompts_path goodtriever-data/data/nontoxic_prompts-10k.jsonl \
    --model_name gpt2-large \
    --batch_size 4 \
    --lmbda 2.0 \
    --perspective_rate_limit 30 \
    --dstore_dir models/experts/toxicity/large/finetuned_gpt2_toxic \
    --other_dstore_dir models/experts/toxicity/large/finetuned_gpt2_nontoxic \
    --dexperts True
```

Finetune anti-expert and expert models

This works just as in the original DExperts repo. Change the parameters and datasets in the file scripts/finetuning/finetune_toxicity_experts.sh and then run:

```bash
bash scripts/finetuning/finetune_toxicity_experts.sh
```

Ablation and other experiments

Ablation experiments (dstore size, alpha vs. temperature, etc.) have a ready-to-run script in the experiments folder. To run continual learning experiments, you should run:

```bash
python -m experiments.continual_learning.cl_experiment \
    --rate_limit 30 \
    --kind knn \
    --prompts_path goodtriever-data/data/continual_mitigation/prompts/wilds_5_clusters_200_samples_toxic.jsonl \
    --experiment_name continual_mitigation/clustered/toxic_adaptation/12345 \
    --train_folder goodtriever-data/data/continual_mitigation/domains/clustered/toxic \
    --batch_size 1 \
    --group_toxicity_by cluster \
    --toxicity_choices toxic,nontoxic \
    --domains 1,2,3,4,5 \
    --pretrained_nontoxic checkpoints/continual_learning/gpt2-large_wilds_non-toxic
```

In this command, as in the paper results, the non-toxic datastore is fixed (previously trained) and data is continuously added to the toxic datastore. You can vary the domain order with the `--domains` flag. Domains are extracted from filename patterns, and the code currently supports the pattern `wilds_*_toxic.jsonl`. To run multitask finetuning experiments, add the `--multitask True` flag.
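As an illustration of extracting domains from filenames, the sketch below parses names that follow `wilds_*_toxic.jsonl` and orders the files according to a domain list like the one passed to `--domains`. It is a guess at the idea rather than the repository's parser: `parse_domain` and `order_files` are hypothetical names, and accepting a `nontoxic` suffix alongside `toxic` is an assumption.

```python
import re

# The wildcard part of wilds_*_toxic.jsonl is captured as the domain.
# Accepting "nontoxic" as well is an assumption made for this sketch.
PATTERN = re.compile(r"^wilds_(?P<domain>.+)_(?P<kind>toxic|nontoxic)\.jsonl$")


def parse_domain(filename):
    """Return (domain, kind) for a matching filename, or None otherwise."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match.group("domain"), match.group("kind")


def order_files(filenames, domains, kind="toxic"):
    """Order the files of one kind by the requested domain sequence."""
    by_domain = {}
    for name in filenames:
        parsed = parse_domain(name)
        if parsed is not None and parsed[1] == kind:
            by_domain[parsed[0]] = name
    # Domains without a matching file are skipped silently.
    return [by_domain[d] for d in domains if d in by_domain]


files = ["wilds_1_toxic.jsonl", "wilds_2_toxic.jsonl", "wilds_1_nontoxic.jsonl", "notes.txt"]
ordered = order_files(files, domains=["2", "1"])
print(ordered)
```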

References

The kNN-LM section of the code is largely based on https://github.com/neulab/knn-transformers, and the DExperts reimplementation is based on the original repo.

Citation

Owner

  • Name: Cohere Labs Community
  • Login: Cohere-Labs-Community
  • Kind: organization
  • Email: info@for.ai
  • Location: Toronto, Canada

Cohere Labs is Cohere's non-profit research lab that seeks to solve complex ML problems and is focused on creating more points of entry to the field.

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Dependencies

environment.yml pypi
  • pip-tools *
requirements.in pypi
  • altair *
  • black *
  • datasets *
  • faiss-gpu ==1.7.2
  • fire *
  • flake8 *
  • fsspec *
  • gcsfs *
  • hdbscan *
  • ipywidgets *
  • isort *
  • joblib *
  • matplotlib *
  • nomic *
  • pandarallel *
  • pandas *
  • psutil *
  • pytest *
  • scipy *
  • seaborn *
  • sentence_transformers *
  • spacy *
  • tensorboard *
  • torch >=1.9.0
  • tqdm *
  • transformers *
  • umap-learn *
  • wordcloud *
requirements.txt pypi
  • 170 dependencies