facilitation-dataset
The "Prosocial and Effective Facilitation in Konversations" (PEFK) dataset. An aggregated and standardized dataset composed of important facilitation datasets presented in Social Science literature.
README.md
The PEFK dataset
Repository housing the "Prosocial and Effective Facilitation in Konversations" (PEFK) dataset. This dataset is an aggregation and standardization of important facilitation datasets presented in Social Science literature. It also includes numerous metrics and augmented labels from Machine Learning, Deep Learning and LLM classifiers.
The dataset is provided as a large CSV file. Because of its size, it is not hosted directly on GitHub, but it can be built by running a shell script (see the Usage section).
The dataset is released under a CC-BY-SA License, and the code producing it uses the MIT software license.
This repository is currently under development. We plan on adding more datasets and quantitative discussion quality metrics in the near future.
List of datasets used
- WikiDisputes
- WikiTactics
- WikiConv
- Conversations Gone Awry / CMV II
- CeRI data
- User Moderation (UMOD)
- Virtual Moderation Dataset (VMD)
- Intelligence Squared 2 (IQ2)
- Why How Who (WHoW)
- Fora
A list of references for each of the papers presenting the datasets can be found in the refs.bib file.
Environment
The code that creates the dataset runs only on Linux (or WSL). We provide a conda environment with all dependencies in environment.yml.
Usage
```bash
git clone https://github.com/dimits-ts/facilitation-dataset.git
cd facilitation-dataset
conda env create -f environment.yml
conda activate pefk-dataset

bash createbasedataset.sh # data only contained inside the datasets
# OR
bash createaugmenteddataset.sh # includes "Inferred" data (see Table below)
```
Important Notes
The Fora dataset is NOT publicly available. Under an agreement with the MIT CCC, we do not include this dataset in the repository by default, although the code to process it is present.
- If you have access to Fora, place the provided `.zip` file in the `project_root/downloads_external` directory.
- You may request access to Fora following the researchers' provided instructions.
The WikiConv dataset is extremely large and may take multiple hours to download and process, depending on your hardware.
Dataset Description
| Name | Type | Description | Inferred |
|-------------|--------|-----------------------------------------------------------------------------| --------|
| convid | string | The discussion's ID. Comments under the same discussion refer to the same discussion ID.| |
| messageid | string | The message's (comment's) unique ID.| |
| replyto | string | The ID of the comment which the current comment responds to. nan if the comment does not respond to another comment (e.g., it's the Original Post (OP)). | |
| user | string | Username or hash of the user that posted the comment | |
| ismoderator | bool | Whether the user is a moderator/facilitator. In some datasets (e.g., UMOD, WikiTactics), normal users are considered facilitators if their comments are facilitative in nature. See the Preprocessing section for more details. | |
| moderationsupported | bool | True if the moderation labels are directly computed from the original dataset | |
| escalated | bool | A discussion-level measure denoting discussions which have been derailed | |
| escalationsupported | bool | True if the escalation labels are directly computed from the original dataset | |
| text | string | The contents of the comment | |
| dataset | string | The dataset from which this comment originated | |
| notes | JSON | A dictionary holding notable dataset-specific information | |
| toxicity | float | The "toxicity" score given to the comment by the Perspective API | ✔ |
| severetoxicity | float | The "severe toxicity" score given to the comment by the Perspective API | ✔ |
| modprobabilities | float | The probability that the comment is facilitative (given by a DL classifier - see Section "Facilitative detection") | ✔ |
| shouldhaveintervened | bool | Whether the next comment is facilitative. Valid only where moderationsupported=1 | ✔ |
| shouldhaveintervenedprobabilities | float | The probability that the next comment is facilitative (given by a DL classifier - see Section "Facilitative detection") | ✔ |
Inferred columns contain information that is not present in the original datasets, but was derived through our own analysis.
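As a rough illustration of how the columns above might be read into typed values, here is a minimal sketch using only the Python standard library. The sample rows are invented, and the serialization details (booleans written as `True`/`False` or `1`/`0`, `notes` stored as a JSON string, a missing `replyto` written as `nan` or left empty) are assumptions rather than guarantees about the generated CSV.

```python
import csv
import io
import json

# Tiny in-memory stand-in for the generated CSV (hypothetical values).
SAMPLE_CSV = '''convid,messageid,replyto,user,ismoderator,text,dataset,notes
c1,m1,nan,alice,False,"Original post",wikiconv,"{}"
c1,m2,m1,bob,True,"Please stay on topic.",wikiconv,"{""source"": ""talk page""}"
'''


def parse_bool(value):
    # Assumption: booleans are serialized as True/False or 1/0.
    return value.strip().lower() in {"true", "1"}


def parse_row(row):
    parsed = dict(row)
    parsed["ismoderator"] = parse_bool(row["ismoderator"])
    # A comment without a parent is described as "nan" in the table;
    # an empty field is treated the same way here.
    parsed["replyto"] = None if row["replyto"] in ("", "nan") else row["replyto"]
    # Assumption: the notes dictionary is stored as a JSON string.
    parsed["notes"] = json.loads(row["notes"]) if row["notes"] else {}
    return parsed


def load_rows(handle):
    # Works with any text file handle, e.g. open(path, newline="").
    return [parse_row(row) for row in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

With the real file, the same `load_rows` function could be given a handle from `open(...)`. For a file that takes hours to build, streaming the rows instead of collecting them in a list may be preferable.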
Preprocessing
See preprocessing.md.
Facilitative detection
See facilitation_detection.md.
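The `shouldhaveintervened` column in the table above is described as "whether the next comment is facilitative". The sketch below shows one way such a label could be derived from `ismoderator`; it is not necessarily how the repository computes it. It assumes that rows appear in chronological order within each discussion, that the last comment of a discussion is labeled `False`, and that rows without supported moderation labels get `None`. The function name and the example rows are hypothetical.

```python
from collections import defaultdict


def next_comment_is_facilitative(rows):
    """Map each messageid to whether the following comment in the same
    discussion was posted by a facilitator (None if labels are unsupported)."""
    by_conv = defaultdict(list)
    for row in rows:
        by_conv[row["convid"]].append(row)

    labels = {}
    for comments in by_conv.values():
        # Pair each comment with its successor; the last one is paired with None.
        successors = comments[1:] + [None]
        for current, following in zip(comments, successors):
            if not current["moderationsupported"]:
                labels[current["messageid"]] = None
            elif following is None:
                # Assumption: no successor means no intervention followed.
                labels[current["messageid"]] = False
            else:
                labels[current["messageid"]] = bool(following["ismoderator"])
    return labels


example = [
    {"convid": "c1", "messageid": "m1", "ismoderator": False, "moderationsupported": True},
    {"convid": "c1", "messageid": "m2", "ismoderator": True, "moderationsupported": True},
    {"convid": "c1", "messageid": "m3", "ismoderator": False, "moderationsupported": True},
    {"convid": "c2", "messageid": "m4", "ismoderator": False, "moderationsupported": False},
]
labels = next_comment_is_facilitative(example)
```

Grouping by `convid` first keeps a label from leaking across discussion boundaries, which would happen if consecutive rows were compared without regard to the discussion they belong to.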
Intervention detection
Taxonomy annotation
Acknowledgements
This work has been partially supported by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0 funded by the European Union under the NextGenerationEU Program.
Owner
- Name: Dimitris Tsirmpas
- Login: dimits-ts
- Kind: user
- Repositories: 1
- Profile: https://github.com/dimits-ts
I like playing around with data and building stuff.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
  - family-names: "Tsirmpas"
    given-names: "Dimitris"
    orcid: "https://orcid.org/0000-0002-5675-3939"
title: "PEFK: Patterns of Engagement in Facilitated Konversations"
version: 0.0.1
date-released: 2025-06-19
url: "https://github.com/dimits-ts/facilitation-dataset"