https://github.com/compnet/wikisynch

Synchronization between two Wikipedia-based Corpora

Keywords

abuse-detection annotation conversations corpus

Repository

Basic Info
  • Host: GitHub
  • Owner: CompNet
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 26.4 KB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed about 6 years ago

Wikipedia Abusive Conversations
===================
*Synchronization between two Wikipedia-based Corpora*

* Copyright 2019-2020 Noé Cécillon

`WikiSynch` is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information, see `licence.txt`.

* **Lab site:** http://lia.univ-avignon.fr
* **GitHub repo:** https://github.com/CompNet/WikiSynch
* **Contact:** Noé Cécillon

-------------------------------------------------------------------------

## Description
*Wikipedia Abusive Conversations* (WAC) is a large corpus of Wikipedia conversations annotated for 3 types of abusive content (personal attack, aggression and toxicity). We developed a reconstruction pipeline to synchronize 2 existing corpora of Wikipedia comments and create WAC. This repository contains the source code used to perform this alignment.
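
As a rough illustration of what aligning two comment corpora can involve, here is a minimal sketch. It is not the pipeline implemented in this repository: it assumes, purely for illustration, that both corpora expose a shared comment identifier, and the function name `synchronize` and the data layout are hypothetical.

```python
def synchronize(annotated, conversations):
    """Attach annotations to conversation comments sharing an identifier.

    Hypothetical layout (an assumption, not the repository's format):
      annotated:     {comment_id: {"label_name": value, ...}}
      conversations: {conversation_id: [(comment_id, text), ...]}
    """
    merged = {}
    matched_ids = set()
    for conv_id, comments in conversations.items():
        merged[conv_id] = []
        for comment_id, text in comments:
            # None marks a comment that has no annotation in the other corpus.
            labels = annotated.get(comment_id)
            if labels is not None:
                matched_ids.add(comment_id)
            merged[conv_id].append(
                {"id": comment_id, "text": text, "labels": labels}
            )
    # Annotated comments that were not found in any conversation.
    unmatched = sorted(set(annotated) - matched_ids)
    return merged, unmatched


if __name__ == "__main__":
    annotated = {1: {"toxicity": 1}, 3: {"aggression": 0}}
    conversations = {"c1": [(1, "first comment"), (2, "a reply")]}
    merged, unmatched = synchronize(annotated, conversations)
    print(merged)
    print(unmatched)
```

Keeping the list of unmatched identifiers makes it easy to check how much of the annotated corpus could actually be placed in a conversation.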

If you use this source code or the associated data, please cite the article [[C'20](#references)]:
```bibtex
@InProceedings{Cecillon2020,
  author    = {Cécillon, Noé and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
  title     = {{WAC}: A Corpus of {W}ikipedia Conversations for Online Abuse Detection},
  booktitle = {12th Language Resources and Evaluation Conference},
  year      = {2020},
  pages     = {1375-1383},
  address   = {Marseille, FR},
  url       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.172.pdf},
}
```


## Dataset
The dataset itself is available for download on [Zenodo](https://doi.org/10.5281/zenodo.6817093). 
The content of Wikipedia comments is distributed under the [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license. The dataset is distributed under the [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) license.

## References
* **[C'20]** N. Cécillon, V. Labatut, R. Dufour, and G. Linarès, *WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection*, 12th Language Resources and Evaluation Conference (LREC), 2020, pp. 1375-1383. [hal-02497514](https://hal.archives-ouvertes.fr/hal-02497514)

Owner

  • Name: Complex Networks
  • Login: CompNet
  • Kind: organization
  • Location: Avignon, France
