https://github.com/alignmentresearch/scaling-poisoning

https://github.com/alignmentresearch/scaling-poisoning

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: AlignmentResearch
  • Language: Python
  • Default Branch: main
  • Size: 25.1 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

Scaling laws for data poisoning

Setup

To work locally, I recommend using VS Code with the devcontainer extension.

See this guide to start working on the cluster. Note that the devcontainer is not set up to launch cluster jobs (which requires kubectl).

Then, create a .env file with the following variables:

TESTING=True

OpenAI API

If using the OpenAI API (e.g., the StrongREJECT evaluator):

  1. Create an OpenAI API key in the appropriate organization.
  2. Add the API key to the .env file. The file should now be

TESTING=True OPENAI_API_KEY=<your-openai-api-key>

  1. Add the API key as a Kubernetes secret

$ kubectl create secret generic openai-api-key --from-literal=OPENAI_API_KEY=<your-openai-api-key>

Gated models (including Llama)

If using a gated model (like Llama):

  1. Request access to the gated model. Llama granted me (Dillon) access almost immediately.
  2. Create a HuggingFace token.
  3. To use locally, add the token to the .env file. The file should now be

... HF_TOKEN=<your-huggingface-token>

  1. Add the token as a Kubernetes secret

$ kubectl create secret generic huggingface --from-literal=token=<your-huggingface-token>

Test locally

Run a test on your local machine. The test should be light-weight enough to run in ~30-60 seconds on a CPU.

bash $ python train.py

Run an experiment

An experiment consists of a set of runs spread across one or more cluster nodes.

Environment

Because the devcontainer is not set up to launch cluster jobs, you need to create a virtual environment and install the requirements outside the container. Install the requirements with:

bash $ pip3 install ".[dev]"

Build and push the container

If you make changes to the source code required to run the experiment, you need to re-build and push the container

TODO: change the container name.

bash $ docker build -t ghcr.io/dsbowen/python-test . $ docker push ghcr.io/dsbowen/python-test

Run

See the experiment configuration yaml files

bash $ python experiments/<initials>/<experiment-file>.py --dry-run

for example,

bash $ python experiments/test/test_000.py --dry-run

Launch the experiment

bash $ python experiments/<initials>/<experiment-file>.py

View logs

See running jobs

bash $ kubectl get pods

The job names are the pod names minus the random string of characters at the end.

See logs for a given job

bash $ kubectl logs -f jobs/<job-name>

Clean up

bash $ kubectl delete jobs -l launch-id=$(cat launch-id.txt)

Add evaluation metrics

  1. Create a callback by subclassing src.callbacks.MetricLoggerCallback and filling in its evaluate method. See src.callbacks.StringMatchingForRefusal and src.callbacks.Bias for examples.
  2. Add it to the list of callbacks in src.__init__.

TODO: Callbacks should be a training argument.

Add datasets

To add a dataset for fine-tuning, write a function in src.datasets. This should return a datasets.Dataset where each element is a dictionary with a "content" key mapping to the text which you want to use for fine-tuning. See src.datasets for examples.

Configure an experiment

See experiments/db/db_000_bias.py for examples.

Configure W&B

To configure Weights and Biases locally, simply run wandb login. To configure Weights and Biases on the cluster:

  1. Make sure you've been added to the scaling-poisoning group on W&B.
  2. Find your API key under wandb.ai/settings > Danger Zone > API keys.
  3. Run kubectl create secret generic wandb-secret --from-literal=WANDB_API_KEY='YOUR_API_KEY_HERE'.

Fine-tuning GPT

Make sure the OPENAI_API_KEY environment variable is set.

  1. Create the fine-tuning datasets with make openai_dataset
  2. Launch fine-tuning jobs with make openai_fine_tune
  3. Check the status of the jobs with make openai_check_jobs
  4. Once all of the jobs have succeeded, run evaluations with make opena_evaluate

This stores the results in openai/eval.csv.

Owner

  • Name: FAR AI
  • Login: AlignmentResearch
  • Kind: organization
  • Email: hello@far.ai

FAR AI is an alignment research non-profit working to ensure AI systems are trustworthy and beneficial to society.

GitHub Events

Total
  • Watch event: 10
  • Delete event: 1
  • Pull request event: 1
  • Fork event: 3
  • Create event: 1
Last Year
  • Watch event: 10
  • Delete event: 1
  • Pull request event: 1
  • Fork event: 3
  • Create event: 1

Dependencies

Dockerfile docker
  • pytorch/pytorch latest build
pyproject.toml pypi
  • bitsandbytes *
  • datasets *
  • lm-eval *
  • multiple-inference *
  • openai *
  • peft *
  • python-dotenv *
  • sentencepiece *
  • statsmodels *
  • torch *
  • transformers [accelerate]
  • trl *
  • wandb *