detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary
Keywords
Repository
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
Basic Info
- Host: GitHub
- Owner: unitaryai
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://www.unitary.ai/
- Size: 50.8 MB
Statistics
- Stars: 1,098
- Watchers: 12
- Forks: 130
- Open Issues: 37
- Releases: 13
Topics
Metadata Files
README.md

News & Updates
22-10-2021: New improved multilingual model & standardised class names
- Updated the `multilingual` model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the `unbiased` model. New best AUC score on the test set: 92.11 (89.71 before).
- All detoxify models now return consistent class names (e.g. `identity_attack` replaces `identity_hate` in the `original` model to match the `unbiased` classes).
03-09-2021: New improved unbiased model
- Updated the `unbiased` model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 93.74 (93.64 before).
15-02-2021: Detoxify featured in Scientific American!
- Our opinion piece "Can AI identify toxic online content?" is now live on Scientific American
14-01-2021: Lightweight models
- Added smaller models trained with Albert for the `original` and `unbiased` models! These can be accessed in the same way with detoxify, using `original-small` and `unbiased-small` as inputs. The `original-small` achieved a mean AUC score of 98.28 (98.64 before) and the `unbiased-small` achieved a final score of 93.36 (93.64 before).
Description
Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.
Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context.
Dependencies:
- For inference:
  - 🤗 Transformers
  - ⚡ Pytorch Lightning
- For training, you will also need:
  - Kaggle API (to download data)
| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score %
|-|-|-|-|-|-|-|
| Toxic Comment Classification Challenge | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | original | 98.86 | 98.64
| Jigsaw Unintended Bias in Toxicity Classification | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | unbiased | 94.73 | 93.74
| Jigsaw Multilingual Toxic Comment Classification | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | multilingual | 95.36 | 92.11
It is worth noting that the top leaderboard scores were achieved using model ensembles. The purpose of this library is to provide something user-friendly and straightforward to use.
Multilingual model language breakdown
| Language Subgroup | Subgroup size | Subgroup AUC Score % |
|:-----------|----------------:|---------------:|
| 🇮🇹 it | 8494 | 89.18 |
| 🇫🇷 fr | 10920 | 89.61 |
| 🇷🇺 ru | 10948 | 89.81 |
| 🇵🇹 pt | 11012 | 91.00 |
| 🇪🇸 es | 8438 | 92.74 |
| 🇹🇷 tr | 14000 | 97.19 |
Limitations and ethical considerations
If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author e.g. humorous/self-deprecating. This could present some biases towards already vulnerable minority groups.
The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real world demographics and/or to aid content moderators in flagging out harmful content quicker.
Some useful resources about the risk of different biases in toxicity or hate speech detection are:
- The Risk of Racial Bias in Hate Speech Detection
- Automated Hate Speech Detection and the Problem of Offensive Language
- Racial Bias in Hate Speech and Abusive Language Detection Datasets
Quick prediction
The multilingual model has been trained on 7 different languages, so it should only be tested on: English, French, Spanish, Italian, Portuguese, Turkish or Russian.
```bash
# install detoxify
pip install detoxify
```

```python
from detoxify import Detoxify

# each model takes in either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

# to specify the device the model will be allocated on (defaults to cpu), accepts any torch.device input
model = Detoxify('original', device='cuda')

# optional to display results nicely (will need to pip install pandas)
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```

For more details check the Prediction section.
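As a rough illustration of what `predict` returns: the result is a dictionary keyed by label, with a single score per label for a string input and a list of scores per label for a list input. The numbers in the sketch below are made up for illustration, not real model output.

```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')

# Illustrative shape only (scores below are invented, not actual predictions):
# {'toxicity': 0.0007, 'severe_toxicity': 0.0001, 'obscene': 0.0002,
#  'threat': 0.0001, 'insult': 0.0002, 'identity_attack': 0.0002}
#
# With a list input, each value becomes a list with one score per comment.
```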
Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- Hard to Say
- Not Toxic
More information about the labelling schema can be found here.
Toxic Comment Classification Challenge
This challenge includes the following labels:
`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`
Jigsaw Unintended Bias in Toxicity Classification
This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.
Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.
`toxicity`, `severe_toxicity`, `obscene`, `threat`, `insult`, `identity_attack`, `sexual_explicit`
Identity labels used:
- male
- female
- homosexual_gay_or_lesbian
- christian
- jewish
- muslim
- black
- white
- psychiatric_or_mental_illness
A complete list of all the identity labels available can be found here.
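As a rough, hypothetical sketch of how the 500-example threshold above could be applied: the snippet below assumes the Civil Comments CSV layout (identity columns holding fractional annotator agreement in [0, 1]), an assumed file path, and a 0.5 agreement cut-off; it is not the repository's actual preprocessing code.

```python
import pandas as pd

# For brevity this reuses the identities listed above; in practice you would
# start from the full set of identity columns in the dataset.
IDENTITY_COLUMNS = [
    "male", "female", "homosexual_gay_or_lesbian", "christian", "jewish",
    "muslim", "black", "white", "psychiatric_or_mental_illness",
]

# assumed path to the expanded public test set from the Kaggle competition
test = pd.read_csv(
    "jigsaw_data/jigsaw-unintended-bias-in-toxicity-classification/test_public_expanded.csv"
)

# treat an identity as "mentioned" when annotator agreement is at least 0.5,
# then keep only subgroups with more than 500 such examples
counts = (test[IDENTITY_COLUMNS] >= 0.5).sum()
kept = counts[counts > 500].index.tolist()
print(kept)
```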
Jigsaw Multilingual Toxic Comment Classification
Since this challenge combines the data from the previous 2 challenges, it includes all of the labels above; however, the final evaluation is only on:
`toxicity`
How to run
First, install dependencies:
```bash
# clone project
git clone https://github.com/unitaryai/detoxify

# create virtual env
python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify

# or for training
pip install -e 'detoxify[dev]'

cd detoxify
```
Prediction
Trained models summary:
|Model name| Transformer type| Data from
|:--:|:--:|:--:|
|original| bert-base-uncased | Toxic Comment Classification Challenge
|unbiased| roberta-base| Unintended Bias in Toxicity Classification
|multilingual| xlm-roberta-base| Multilingual Toxic Comment Classification
For a quick prediction, you can run the example script on a comment directly or from a txt file containing a list of comments.
```bash
# load model via torch.hub
python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path
python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage
python run_prediction.py --help
```
Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:
- toxic_bert
- unbiased_toxic_roberta
- multilingual_toxic_xlm_r
```python
model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```
Importing detoxify in python:
```python
from detoxify import Detoxify
results = Detoxify('original').predict('some text')
results = Detoxify('unbiased').predict(['example text 1','example text 2'])
results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])
# to display results nicely
import pandas as pd
print(pd.DataFrame(results,index=input_text).round(5))
```
Training
If you do not already have a Kaggle account:
- you need to create one to be able to download the data
- go to My Account and click on Create New API Token - this will download a kaggle.json file
- make sure this file is located in ~/.kaggle
```bash
# create data directory
mkdir jigsaw_data
cd jigsaw_data

# download data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
unzip jigsaw-toxic-comment-classification-challenge.zip -d jigsaw-toxic-comment-classification-challenge
find jigsaw-toxic-comment-classification-challenge -name '*.csv.zip' | xargs -n1 unzip -d jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
unzip jigsaw-unintended-bias-in-toxicity-classification.zip -d jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification
unzip jigsaw-multilingual-toxic-comment-classification.zip -d jigsaw-multilingual-toxic-comment-classification
```
Start Training
Toxic Comment Classification Challenge
```bash
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json
```
Unintended Bias in Toxicity Challenge
```bash
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json
```
Multilingual Toxic Comment Classification
The translated data (source 1, source 2) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).
```bash
# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json
```
Monitor progress with tensorboard
```bash
tensorboard --logdir=./saved
```
Model Evaluation
Toxic Comment Classification Challenge
This challenge is evaluated on the mean AUC score of all the labels.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
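For reference, here is a minimal sketch of the mean-AUC computation itself, assuming per-label scores have already been saved alongside the ground truth with matching column names; the repository's `evaluate.py` handles the real data loading and remains the authoritative implementation.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

truth = pd.read_csv("test.csv")       # ground-truth labels (rows marked -1 already removed)
scores = pd.read_csv("results.csv")   # assumed per-label model scores, same row order and column names

# mean AUC across all labels, as used by the original challenge
mean_auc = sum(roc_auc_score(truth[l], scores[l]) for l in LABELS) / len(LABELS)
print(f"mean AUC: {mean_auc:.4f}")
```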
Unintended Bias in Toxicity Challenge
This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric here.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
```
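For intuition, below is a hedged sketch of the kind of bias metric used in the 2019 competition: the overall AUC is combined with generalised power means of per-subgroup, BPSN (background positive, subgroup negative) and BNSP (background negative, subgroup positive) AUCs. This is an illustrative reimplementation, not the repository's code; `model_eval/compute_bias_metric.py` is the authoritative version.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def power_mean(values, p=-5):
    # generalised mean; a negative p heavily penalises the worst-performing subgroups
    return np.power(np.mean(np.power(np.array(values), p)), 1 / p)

def bias_metric(y_true, y_pred, subgroup_masks, weight=0.25):
    """y_true: binary labels (0/1), y_pred: scores, both numpy arrays over the full test set.
    subgroup_masks: list of boolean arrays, one per identity subgroup."""
    subgroup_aucs, bpsn_aucs, bnsp_aucs = [], [], []
    for mask in subgroup_masks:
        # AUC restricted to comments that mention the identity
        subgroup_aucs.append(roc_auc_score(y_true[mask], y_pred[mask]))
        # BPSN: background positives + subgroup negatives
        bpsn = (mask & (y_true == 0)) | (~mask & (y_true == 1))
        bpsn_aucs.append(roc_auc_score(y_true[bpsn], y_pred[bpsn]))
        # BNSP: background negatives + subgroup positives
        bnsp = (mask & (y_true == 1)) | (~mask & (y_true == 0))
        bnsp_aucs.append(roc_auc_score(y_true[bnsp], y_pred[bnsp]))
    overall = roc_auc_score(y_true, y_pred)
    return weight * (overall + power_mean(subgroup_aucs)
                     + power_mean(bpsn_aucs) + power_mean(bnsp_aucs))
```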
Multilingual Toxic Comment Classification
This challenge is evaluated on the AUC score of the main toxic label.
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
Citation
```bibtex
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```
Owner
- Name: Unitary
- Login: unitaryai
- Kind: organization
- Website: https://www.unitary.ai
- Twitter: unitaryai
- Repositories: 4
- Profile: https://github.com/unitaryai
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Hanu"
    given-names: "Laura"
  - family-names: "Unitary"
    given-names: "team"
title: "Detoxify"
version: 0.5.1
doi: 10.5281/zenodo.7925667
date-released: 2020-11-11
url: "https://github.com/unitaryai/detoxify"
```
GitHub Events
Total
- Issues event: 3
- Watch event: 132
- Delete event: 1
- Issue comment event: 15
- Push event: 24
- Pull request review comment event: 2
- Pull request review event: 9
- Pull request event: 22
- Fork event: 23
- Create event: 6
Last Year
- Issues event: 3
- Watch event: 132
- Delete event: 1
- Issue comment event: 15
- Push event: 24
- Pull request review comment event: 2
- Pull request review event: 9
- Pull request event: 22
- Fork event: 23
- Create event: 6
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 199
- Total Committers: 11
- Avg Commits per committer: 18.091
- Development Distribution Score (DDS): 0.296
Top Committers
| Name | Email | Commits |
|---|---|---|
| Laura | l****0@g****m | 140 |
| Laura Hanu | 3****u@u****m | 19 |
| James Thewlis | j****s@u****i | 8 |
| MJ Rossetti | s****2@u****m | 8 |
| pre-commit-ci[bot] | 6****]@u****m | 6 |
| Jirka | j****c@s****z | 6 |
| Jirka Borovec | B****a@u****m | 4 |
| Laura Hanu | l****u@L****l | 4 |
| Anita Vero | a****e@g****m | 2 |
| Greg Priday | g****g@s****m | 1 |
| Omid Foroughi | f****i@p****e | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 65
- Total pull requests: 65
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 month
- Total issue authors: 56
- Total pull request authors: 18
- Average comments per issue: 2.02
- Average comments per pull request: 1.11
- Merged pull requests: 49
- Bot issues: 0
- Bot pull requests: 13
Past Year
- Issues: 2
- Pull requests: 18
- Average time to close issues: about 11 hours
- Average time to close pull requests: 10 days
- Issue authors: 2
- Pull request authors: 6
- Average comments per issue: 1.0
- Average comments per pull request: 0.5
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
- laurahanu (4)
- jamt9000 (3)
- grecosalvatore (2)
- SallyBean (2)
- mbach138 (2)
- hwsamuel (2)
- marsouin (1)
- bottiger1 (1)
- MLRadfys (1)
- cyriltw (1)
- JayThibs (1)
- garbit (1)
- gmachinromero (1)
- annabechang (1)
- KrautByte (1)
Pull Request Authors
- jamt9000 (25)
- pre-commit-ci[bot] (13)
- laurahanu (11)
- Borda (5)
- dependabot[bot] (4)
- s2t2 (3)
- ACMCMC (2)
- t-davidson (2)
- dosatos (2)
- ijonglin (2)
- Vela-zz (2)
- amritap-ef (2)
- Vasilije1990 (2)
- dcferreira (1)
- ghost (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads:
  - pypi: 108,395 last-month
- Total docker downloads: 104
- Total dependent packages: 17 (may contain duplicates)
- Total dependent repositories: 89 (may contain duplicates)
- Total versions: 21
- Total maintainers: 3
pypi.org: detoxify
A python library for detecting toxic comments
- Homepage: https://github.com/unitaryai/detoxify
- Documentation: https://detoxify.readthedocs.io/
- License: Apache Software License
- Latest release: 0.5.2 (published about 2 years ago)
Rankings
proxy.golang.org: github.com/unitaryai/detoxify
- Documentation: https://pkg.go.dev/github.com/unitaryai/detoxify#section-documentation
- License: apache-2.0
- Latest release: v0.5.2 (published about 2 years ago)
Rankings
Dependencies
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout master composite
- actions/setup-python v1 composite
- suo/flake8-github-action releases/v1 composite
- datasets >=1.0.2
- kaggle >=1.5.8
- pandas >=1.1.2
- pytorch-lightning >1.5.0
- scikit-learn >=0.23.2
- sentencepiece >=0.1.94
- torch >=1.10.0
- tqdm >=4.41.0
- transformers ==4.22.1
- sentencepiece *
- torch *
- transformers *
- check-manifest * test
- codecov >=2.1 test
- coverage * test
- flake8 * test
- pytest >=3.0.5 test
- pytest-cov * test
- pytest-flake8 * test
- torch >=1.7.0 test
- twine ==1.13.0 test