bias-memit
Mass-Editing Stereotypical Associations to Mitigate Bias in Language Models
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization dfki-nlp has institutional domain (www.dfki.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Mass-Editing Stereotypical Associations to Mitigate Bias in Language Models
Basic Info
- Host: GitHub
- Owner: DFKI-NLP
- License: MIT
- Language: Jupyter Notebook
- Default Branch: master
- Size: 20.3 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Mass-Editing Stereotypical Associations to Mitigate Bias in Language Models
This repository contains the scripts and data to replicate the experiments of the master's thesis "Mass-Editing Stereotypical Associations to Mitigate Bias in Language Models", which was carried out
as a cooperation between the University of Potsdam (Department of Linguistics) and the Speech and Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI).
The goal of this study is to approach bias mitigation in pre-trained Transformer language models (LMs) as a knowledge update. To this end, it employs the "Mass-Editing Memory in a Transformer" (MEMIT) algorithm
by Meng et al. (2022).
This repository provides four sets of anti-stereotypical updates from four bias domains (gender, profession, race, religion) in English and German. It currently supports editing three English, two German,
and one multilingual LM.
There are three ways to evaluate the de-biasing results: an intrinsic evaluation on the StereoSet (Nadeem et al., 2021) bias benchmark dataset, a quantitative analysis of entropy and perplexity,
and a qualitative assessment of selected examples.
Installation
To set up the environment and dependencies for the MEMIT update, Meng et al. (2022) provide a shell script, which can be found under scripts. It is recommended to use conda to install Python, CUDA, and PyTorch, and pip for all other dependencies. First install conda, then run:
```bash
CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh
```
`$CONDA_HOME` should be the path to your conda installation, e.g., `~/miniconda3`.
Causal Tracing
Before running the MEMIT update, one needs to identify the locations where the weight updates should be applied. The scripts for this step, and further instructions on how to run them, can be found in the experiments folder.
MEMIT for Bias Mitigation
notebooks/memit_debias.ipynb is a simple demonstration to illustrate how MEMIT can be applied to alter stereotypical associations.
MEMIT takes as input rewrites of the following form:
```python
request = [
    {
        "prompt": " All {} are",
        "subject": "princesses",
        "target_new": {"str": "ugly"},
    },
    {
        "prompt": "{} are generally",
        "subject": "Muslims",
        "target_new": {"str": "violent"},
    },
]
```
Other examples can be found in the notebook. The scripts for the actual large-scale updates can be found in the memit directory, with further instructions on how to execute the code.
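Each request pairs a prompt template with a subject and a new target; the template's `{}` placeholder is filled with the subject to form the full edit statement. As an illustration, the following sketch (the `format_requests` helper is hypothetical, not part of the MEMIT codebase) shows how the completed sentences can be previewed before running an update:

```python
# Hypothetical helper (not part of the MEMIT codebase): fill each request's
# prompt template with its subject and append the new target string, so the
# full edit statements can be inspected before running MEMIT.
def format_requests(requests):
    """Return the completed sentence for each MEMIT rewrite request."""
    return [
        f"{r['prompt'].format(r['subject'])} {r['target_new']['str']}".strip()
        for r in requests
    ]

request = [
    {
        "prompt": " All {} are",
        "subject": "princesses",
        "target_new": {"str": "ugly"},
    },
]
print(format_requests(request))  # ['All princesses are ugly']
```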
Evaluation
evaluation contains the scripts and notebooks for the evaluation on StereoSet and a quantitative analysis, as well as a notebook for the inspection of generated examples, evaluation/experiments/qualitative_evaluation.ipynb. Detailed instructions and explanations can also be found in
the respective directories.
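For orientation on the quantitative analysis: perplexity is the exponential of the average negative log-likelihood per token, so higher model uncertainty after an edit shows up as higher perplexity. The following is a minimal sketch of that relationship (an assumption for illustration, not the thesis's actual evaluation code):

```python
import math

# Sketch (not the repository's evaluation code): perplexity computed from
# the natural-log probabilities a model assigns to each token of a text.
def perplexity(token_log_probs):
    """Return exp of the average negative log-likelihood per token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 10))  # 4.0
```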
External Sources and Source Code
- Causal tracing and MEMIT algorithm:
- Paper: Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. "Locating and Editing Factual Associations in GPT." Advances in Neural Information Processing Systems 35 (2022).
- Code: Meng et al. (2022)
- StereoSet:
- Paper: Nadeem, Moin and Bethke, Anna and Reddy, Siva. "StereoSet: Measuring stereotypical bias in pretrained language models". arXiv preprint arXiv:2004.09456 (2020).
- Data: https://huggingface.co/datasets/stereoset; https://github.com/moinnadeem/StereoSet
- Evaluation scripts:
- Paper: Meade, Nicholas and Poole-Dayan, Elinor and Reddy, Siva. "An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022).
- Code:
https://github.com/McGill-NLP/bias-bench/tree/main
Owner
- Name: DFKI-NLP
- Login: DFKI-NLP
- Kind: organization
- Location: Berlin, Germany
- Website: https://www.dfki.de/en/web/research/research-departments-and-groups/speech-and-language-technology/
- Repositories: 64
- Profile: https://github.com/DFKI-NLP
Speech and Language Technology (SLT) Group of the Berlin lab of the German Research Center for Artificial Intelligence (DFKI)
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
preferred-citation:
  type: article
  authors:
    - family-names: "Meng"
      given-names: "Kevin"
    - family-names: "Sen Sharma"
      given-names: "Arnab"
    - family-names: "Andonian"
      given-names: "Alex"
    - family-names: "Belinkov"
      given-names: "Yonatan"
    - family-names: "Bau"
      given-names: "David"
  journal: "arXiv preprint arXiv:2210.07229"
  title: "Mass-Editing Memory in a Transformer"
  year: 2022
```