detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary
Keywords
Repository
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
Basic Info
- Host: GitHub
- Owner: unitaryai
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://www.unitary.ai/
- Size: 50.8 MB
Statistics
- Stars: 1,098
- Watchers: 12
- Forks: 130
- Open Issues: 37
- Releases: 13
Topics
Metadata Files
README.md

News & Updates
22-10-2021: New improved multilingual model & standardised class names
- Updated the `multilingual` model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the `unbiased` model. New best AUC score on the test set: 92.11 (89.71 before).
- All detoxify models now return consistent class names (e.g. `identity_attack` replaces `identity_hate` in the `original` model to match the `unbiased` classes).
03-09-2021: New improved unbiased model
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 93.74 (93.64 before).
15-02-2021: Detoxify featured in Scientific American!
- Our opinion piece "Can AI identify toxic online content?" is now live on Scientific American
14-01-2021: Lightweight models
- Added smaller models trained with Albert for the `original` and `unbiased` models! These can be accessed in the same way with detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` achieved a mean AUC score of 98.28 (98.64 before) and the `unbiased-small` achieved a final score of 93.36 (93.64 before).
Description
Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.
Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context.
Dependencies:
- For inference:
  - 🤗 Transformers
  - ⚡ Pytorch Lightning
- For training, you will also need:
  - Kaggle API (to download data)
| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score %
|-|-|-|-|-|-|-|
| Toxic Comment Classification Challenge | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | original | 98.86 | 98.64
| Jigsaw Unintended Bias in Toxicity Classification | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | unbiased | 94.73 | 93.74
| Jigsaw Multilingual Toxic Comment Classification | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | multilingual | 95.36 | 92.11
It is worth noting that the top leaderboard scores were achieved using model ensembles. The purpose of this library is to provide something user-friendly and straightforward to use.
Multilingual model language breakdown
| Language Subgroup | Subgroup size | Subgroup AUC Score % |
|:-----------|----------------:|---------------:|
| 🇮🇹 it | 8494 | 89.18 |
| 🇫🇷 fr | 10920 | 89.61 |
| 🇷🇺 ru | 10948 | 89.81 |
| 🇵🇹 pt | 11012 | 91.00 |
| 🇪🇸 es | 8438 | 92.74 |
| 🇹🇷 tr | 14000 | 97.19 |
Limitations and ethical considerations
If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author e.g. humorous/self-deprecating. This could present some biases towards already vulnerable minority groups.
The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real world demographics and/or to aid content moderators in flagging out harmful content quicker.
Some useful resources about the risk of different biases in toxicity or hate speech detection are:
- The Risk of Racial Bias in Hate Speech Detection
- Automated Hate Speech Detection and the Problem of Offensive Language
- Racial Bias in Hate Speech and Abusive Language Detection Datasets
Quick prediction
The multilingual model has been trained on 7 different languages, so it should only be tested on: English, French, Spanish, Italian, Portuguese, Turkish or Russian.
```bash
# install detoxify
pip install detoxify
```

```python
from detoxify import Detoxify

# each model takes in either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

# to specify the device the model will be allocated on (defaults to cpu), accepts any torch.device input
model = Detoxify('original', device='cuda')

# optional to display results nicely (will need to pip install pandas)
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```

For more details check the Prediction section.
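As a rough illustration of what `predict` returns: the result is a dictionary keyed by label, with a single score per label for a string input and a list of scores per label for a list input. The numbers in the sketch below are made up for illustration, not real model output.

```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')

# Illustrative shape only (scores below are invented, not actual predictions):
# {'toxicity': 0.0007, 'severe_toxicity': 0.0001, 'obscene': 0.0002,
#  'threat': 0.0001, 'insult': 0.0002, 'identity_attack': 0.0002}
#
# With a list input, each value becomes a list with one score per comment.
```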
Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- Hard to Say
- Not Toxic
More information about the labelling schema can be found here.
Toxic Comment Classification Challenge
This challenge includes the following labels:
`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`
Jigsaw Unintended Bias in Toxicity Classification
This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.
Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.
`toxicity`, `severe_toxicity`, `obscene`, `threat`, `insult`, `identity_attack`, `sexual_explicit`
Identity labels used:
- male
- female
- homosexual_gay_or_lesbian
- christian
- jewish
- muslim
- black
- white
- psychiatric_or_mental_illness
A complete list of all the identity labels available can be found here.
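As a rough, hypothetical sketch of how the 500-example threshold above could be applied: the snippet below assumes the Civil Comments CSV layout (identity columns holding fractional annotator agreement in [0, 1]), an assumed file path, and a 0.5 agreement cut-off; it is not the repository's actual preprocessing code.

```python
import pandas as pd

# For brevity this reuses the identities listed above; in practice you would
# start from the full set of identity columns in the dataset.
IDENTITY_COLUMNS = [
    "male", "female", "homosexual_gay_or_lesbian", "christian", "jewish",
    "muslim", "black", "white", "psychiatric_or_mental_illness",
]

# assumed path to the expanded public test set from the Kaggle competition
test = pd.read_csv(
    "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"
)

# treat an identity as "mentioned" when annotator agreement is at least 0.5,
# then keep only subgroups with more than 500 such examples
counts = (test[IDENTITY_COLUMNS] >= 0.5).sum()
kept = counts[counts > 500].index.tolist()
print(kept)
```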
Jigsaw Multilingual Toxic Comment Classification
Since this challenge combines the data from the previous 2 challenges, it includes all of the labels above; however, the final evaluation is only on:
`toxicity`
How to run
First, install dependencies:
```bash
# clone project
git clone https://github.com/unitaryai/detoxify

# create virtual env
python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify

# or for training
pip install -e 'detoxify[dev]'

cd detoxify
```
Prediction
Trained models summary:
|Model name| Transformer type| Data from
|:--:|:--:|:--:|
|original| bert-base-uncased | Toxic Comment Classification Challenge
|unbiased| roberta-base| Unintended Bias in Toxicity Classification
|multilingual| xlm-roberta-base| Multilingual Toxic Comment Classification
For a quick prediction, you can run the example script on a comment directly or from a txt file containing a list of comments.
```bash
# load model via torch.hub
python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path
python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage
python run_prediction.py --help
```
Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:
- toxic_bert
- unbiased_toxic_roberta
- multilingual_toxic_xlm_r
```python
model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```
Importing detoxify in python:
```python
from detoxify import Detoxify
results = Detoxify('original').predict('some text')
results = Detoxify('unbiased').predict(['example text 1','example text 2'])
results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])
# to display results nicely
import pandas as pd
print(pd.DataFrame(results,index=input_text).round(5))
```
Training
If you do not already have a Kaggle account:
- you need to create one to be able to download the data
- go to My Account and click on Create New API Token - this will download a kaggle.json file
- make sure this file is located in ~/.kaggle
```bash
# create data directory
mkdir jigsaw_data
cd jigsaw_data

# download data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
unzip jigsaw-toxic-comment-classification-challenge.zip -d jigsaw-toxic-comment-classification-challenge
find jigsaw-toxic-comment-classification-challenge -name '*.csv.zip' | xargs -n1 unzip -d jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
unzip jigsaw-unintended-bias-in-toxicity-classification.zip -d jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification
unzip jigsaw-multilingual-toxic-comment-classification.zip -d jigsaw-multilingual-toxic-comment-classification
```
Start Training
Toxic Comment Classification Challenge
```bash
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json
```
Unintended Bias in Toxicity Challenge
```bash
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json
```
Multilingual Toxic Comment Classification
The translated data (source 1, source 2) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).
```bash
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json
```
Monitor progress with tensorboard
```bash
tensorboard --logdir=./saved
```
Model Evaluation
Toxic Comment Classification Challenge
This challenge is evaluated on the mean AUC score of all the labels.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
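For reference, here is a minimal sketch of the mean-AUC computation itself, assuming per-label scores have already been saved alongside the ground truth with matching column names; the repository's `evaluate.py` handles the real data loading and remains the authoritative implementation.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

truth = pd.read_csv("test.csv")       # ground-truth labels (rows marked -1 already removed)
scores = pd.read_csv("results.csv")   # assumed per-label model scores, same row order and column names

# mean AUC across all labels, as used by the original challenge
mean_auc = sum(roc_auc_score(truth[l], scores[l]) for l in LABELS) / len(LABELS)
print(f"mean AUC: {mean_auc:.4f}")
```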
Unintended Bias in Toxicity Challenge
This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric here.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
```
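For intuition, below is a hedged sketch of the kind of bias metric used in the 2019 competition: the overall AUC is combined with generalised power means of per-subgroup, BPSN (background positive, subgroup negative) and BNSP (background negative, subgroup positive) AUCs. This is an illustrative reimplementation, not the repository's code; `model_eval/compute_bias_metric.py` is the authoritative version.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def power_mean(values, p=-5):
    # generalised mean; a negative p heavily penalises the worst-performing subgroups
    return np.power(np.mean(np.power(np.array(values), p)), 1 / p)

def bias_metric(y_true, y_pred, subgroup_masks, weight=0.25):
    """y_true: binary labels (0/1), y_pred: scores, both numpy arrays over the full test set.
    subgroup_masks: list of boolean arrays, one per identity subgroup."""
    subgroup_aucs, bpsn_aucs, bnsp_aucs = [], [], []
    for mask in subgroup_masks:
        # AUC restricted to comments that mention the identity
        subgroup_aucs.append(roc_auc_score(y_true[mask], y_pred[mask]))
        # BPSN: background positives + subgroup negatives
        bpsn = (mask & (y_true == 0)) | (~mask & (y_true == 1))
        bpsn_aucs.append(roc_auc_score(y_true[bpsn], y_pred[bpsn]))
        # BNSP: background negatives + subgroup positives
        bnsp = (mask & (y_true == 1)) | (~mask & (y_true == 0))
        bnsp_aucs.append(roc_auc_score(y_true[bnsp], y_pred[bnsp]))
    overall = roc_auc_score(y_true, y_pred)
    return weight * (overall + power_mean(subgroup_aucs)
                     + power_mean(bpsn_aucs) + power_mean(bnsp_aucs))
```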
Multilingual Toxic Comment Classification
This challenge is evaluated on the AUC score of the main toxic label.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
Citation
```bibtex
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```
Owner
- Name: Unitary
- Login: unitaryai
- Kind: organization
- Website: https://www.unitary.ai
- Twitter: unitaryai
- Repositories: 4
- Profile: https://github.com/unitaryai
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Hanu"
    given-names: "Laura"
  - family-names: "Unitary"
    given-names: "team"
title: "Detoxify"
version: 0.5.1
doi: 10.5281/zenodo.7925667
date-released: 2020-11-11
url: "https://github.com/unitaryai/detoxify"
```
GitHub Events
Total
- Issues event: 3
- Watch event: 132
- Delete event: 1
- Issue comment event: 15
- Push event: 24
- Pull request review comment event: 2
- Pull request review event: 9
- Pull request event: 22
- Fork event: 23
- Create event: 6
Last Year
- Issues event: 3
- Watch event: 132
- Delete event: 1
- Issue comment event: 15
- Push event: 24
- Pull request review comment event: 2
- Pull request review event: 9
- Pull request event: 22
- Fork event: 23
- Create event: 6
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 199
- Total Committers: 11
- Avg Commits per committer: 18.091
- Development Distribution Score (DDS): 0.296
Top Committers
| Name | Email | Commits |
|---|---|---|
| Laura | l****0@g****m | 140 |
| Laura Hanu | 3****u@u****m | 19 |
| James Thewlis | j****s@u****i | 8 |
| MJ Rossetti | s****2@u****m | 8 |
| pre-commit-ci[bot] | 6****]@u****m | 6 |
| Jirka | j****c@s****z | 6 |
| Jirka Borovec | B****a@u****m | 4 |
| Laura Hanu | l****u@L****l | 4 |
| Anita Vero | a****e@g****m | 2 |
| Greg Priday | g****g@s****m | 1 |
| Omid Foroughi | f****i@p****e | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 65
- Total pull requests: 65
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 month
- Total issue authors: 56
- Total pull request authors: 18
- Average comments per issue: 2.02
- Average comments per pull request: 1.11
- Merged pull requests: 49
- Bot issues: 0
- Bot pull requests: 13
Past Year
- Issues: 2
- Pull requests: 18
- Average time to close issues: about 11 hours
- Average time to close pull requests: 10 days
- Issue authors: 2
- Pull request authors: 6
- Average comments per issue: 1.0
- Average comments per pull request: 0.5
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
- laurahanu (4)
- jamt9000 (3)
- grecosalvatore (2)
- SallyBean (2)
- mbach138 (2)
- hwsamuel (2)
- marsouin (1)
- bottiger1 (1)
- MLRadfys (1)
- cyriltw (1)
- JayThibs (1)
- garbit (1)
- gmachinromero (1)
- annabechang (1)
- KrautByte (1)
Pull Request Authors
- jamt9000 (25)
- pre-commit-ci[bot] (13)
- laurahanu (11)
- Borda (5)
- dependabot[bot] (4)
- s2t2 (3)
- ACMCMC (2)
- t-davidson (2)
- dosatos (2)
- ijonglin (2)
- Vela-zz (2)
- amritap-ef (2)
- Vasilije1990 (2)
- dcferreira (1)
- ghost (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads:
  - pypi: 108,395 last-month
- Total docker downloads: 104
- Total dependent packages: 17 (may contain duplicates)
- Total dependent repositories: 89 (may contain duplicates)
- Total versions: 21
- Total maintainers: 3
pypi.org: detoxify
A python library for detecting toxic comments
- Homepage: https://github.com/unitaryai/detoxify
- Documentation: https://detoxify.readthedocs.io/
- License: Apache Software License
- Latest release: 0.5.2 (published about 2 years ago)
Rankings
proxy.golang.org: github.com/unitaryai/detoxify
- Documentation: https://pkg.go.dev/github.com/unitaryai/detoxify#section-documentation
- License: apache-2.0
- Latest release: v0.5.2 (published about 2 years ago)
Rankings
Dependencies
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout master composite
- actions/setup-python v1 composite
- suo/flake8-github-action releases/v1 composite
- datasets >=1.0.2
- kaggle >=1.5.8
- pandas >=1.1.2
- pytorch-lightning >1.5.0
- scikit-learn >=0.23.2
- sentencepiece >=0.1.94
- torch >=1.10.0
- tqdm >=4.41.0
- transformers ==4.22.1
- sentencepiece *
- torch *
- transformers *
- check-manifest * test
- codecov >=2.1 test
- coverage * test
- flake8 * test
- pytest >=3.0.5 test
- pytest-cov * test
- pytest-flake8 * test
- torch >=1.7.0 test
- twine ==1.13.0 test