facilitation-dataset

The "Prosocial and Effective Facilitation in Konversations" (PEFK) dataset. An aggregated and standardized dataset composed of important facilitation datasets presented in Social Science literature.

https://github.com/dimits-ts/facilitation-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: dimits-ts
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 163 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

The PEFK dataset

Repository housing the "Prosocial and Effective Facilitation in Konversations" (PEFK) dataset. This dataset is an aggregation and standardization of important facilitation datasets presented in Social Science literature. It also includes numerous metrics and augmented labels from Machine Learning, Deep Learning and LLM classifiers.

The dataset is provided as a single large CSV file. Because of its size, it is not hosted directly on GitHub, but it can be built locally by running a shell script (see the Usage section).

The dataset is released under a CC-BY-SA License, and the code producing it uses the MIT software license.

This repository is currently under development. We plan on adding more datasets and quantitative discussion quality metrics in the near future.

List of datasets used

A list of references for each of the papers presenting the datasets can be found in the refs.bib file.

Environment

The code that creates the dataset runs only on Linux (or WSL). We provide a conda environment with all dependencies in environment.yml.

Usage

```bash
git clone https://github.com/dimits-ts/facilitation-dataset.git

cd facilitation-dataset
conda env create -f environment.yml
conda activate pefk-dataset

bash createbasedataset.sh       # only data contained in the original datasets
# OR
bash createaugmenteddataset.sh  # also includes "Inferred" data (see the table below)
```
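Once one of the scripts has finished, the resulting CSV may be too large to load comfortably in one go. The following is a minimal sketch of streaming it row by row with Python's standard library, assuming only the `dataset` column described in the table below. The function name `comments_per_dataset` and the path you pass to it are illustrative assumptions; this README does not state the name of the output file.

```python
import csv
from collections import Counter


def comments_per_dataset(csv_path):
    """Count comments per source dataset, reading one row at a time."""
    counts = Counter()
    # newline="" lets the csv module handle line breaks inside quoted comment text.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["dataset"]] += 1
    return dict(counts)
```

Call it with the path of the CSV produced by the script you ran. Streaming keeps memory use roughly constant regardless of file size, at the cost of being slower than a vectorized reader.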

Important Notes

  • The Fora dataset is NOT publicly available. Under an agreement with the MIT CCC, we do not include this dataset in the repository by default, although the code to process it is present.

    • If you have access to Fora, place the provided .zip file in the project_root/downloads_external directory.
    • You may request access to Fora by following the instructions provided by the researchers.
  • The WikiConv dataset is extremely large and may take multiple hours to download and process, depending on your hardware.
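Before starting a long build, it can be useful to check whether a Fora archive is actually in place. The sketch below is an illustration only: the helper name `find_fora_archives` is hypothetical, and because the archive's file name is not stated here, it simply lists every `.zip` file in the `downloads_external` directory.

```python
from pathlib import Path


def find_fora_archives(project_root):
    """Return the .zip files found in <project_root>/downloads_external."""
    external_dir = Path(project_root) / "downloads_external"
    if not external_dir.is_dir():
        return []
    # Any .zip is reported, since the exact archive name is an unknown here.
    return sorted(external_dir.glob("*.zip"))
```

An empty result means no archive was found, so Fora cannot be included in the build.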

Dataset Description

| Name | Type | Description | Inferred |
|------|------|-------------|----------|
| convid | string | The discussion's ID. Comments under the same discussion refer to the same discussion ID. | |
| messageid | string | The message's (comment's) unique ID. | |
| replyto | string | The ID of the comment which the current comment responds to. nan if the comment does not respond to another comment (e.g., it's the Original Post (OP)). | |
| user | string | Username or hash of the user that posted the comment. | |
| ismoderator | bool | Whether the user is a moderator/facilitator. In some datasets (e.g., UMOD, Wikitactics), normal users are considered facilitators if their comments are facilitative in nature. See Section "Preprocessing" for more details. | |
| moderationsupported | bool | True if the moderation labels are directly computed from the original dataset. | |
| escalated | bool | A discussion-level measure denoting discussions which have been derailed. | |
| escalationsupported | bool | True if the escalation labels are directly computed from the original dataset. | |
| text | string | The contents of the comment. | |
| dataset | string | The dataset from which this comment originated. | |
| notes | JSON | A dictionary holding notable dataset-specific information. | |
| toxicity | float | The "toxicity" score given to the comment by the Perspective API. | ✔ |
| severetoxicity | float | The "severe toxicity" score given to the comment by the Perspective API. | ✔ |
| modprobabilities | float | The probability that the comment is facilitative (given by a DL classifier - see Section "Facilitative detection"). | ✔ |
| shouldhaveintervened | bool | Whether the next comment is facilitative. Valid only where moderationsupported=1. | ✔ |
| shouldhaveintervenedprobabilities | float | The probability that the next comment is facilitative (given by a DL classifier - see Section "Facilitative detection"). | ✔ |

Inferred columns contain information that is not taken from the original datasets, but derived from our own analysis.
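To illustrate how a "next comment is facilitative" label relates to the other columns, here is a small sketch over plain dictionaries. It rests on assumptions that are not confirmed by this README: that "next" means the following row of the same discussion in file order, and that rows of one discussion are stored contiguously. The function name and the output key `next_is_facilitative` are hypothetical; the procedure actually used is documented in intervention_detection.md.

```python
from itertools import groupby
from operator import itemgetter


def label_next_is_facilitative(rows):
    """Flag each comment whose following comment in the same discussion
    was posted by a moderator/facilitator."""
    labeled = []
    # groupby only merges adjacent rows, so each discussion must be contiguous.
    for _, group in groupby(rows, key=itemgetter("convid")):
        comments = list(group)
        for current, following in zip(comments, comments[1:] + [None]):
            if not current["moderationsupported"]:
                # The label is only meaningful where moderation labels
                # come from the original dataset.
                flag = None
            else:
                flag = following is not None and bool(following["ismoderator"])
            labeled.append({**current, "next_is_facilitative": flag})
    return labeled
```

The last comment of a discussion has no successor and is labeled False, while rows with moderationsupported set to false receive None rather than a guess.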

Preprocessing

See preprocessing.md.

Facilitative detection

See facilitation_detection.md.

Intervention detection

See intervention_detection.md.

Taxonomy annotation

See taxonomy_annotation.md.

Acknowledgements

This work has been partially supported by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0 funded by the European Union under the NextGenerationEU Program.

Owner

  • Name: Dimitris Tsirmpas
  • Login: dimits-ts
  • Kind: user

I like playing around with data and building stuff.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
- family-names: "Tsirmpas"
  given-names: "Dimitris"
  orcid: "https://orcid.org/0000-0002-5675-3939"
title: "PEFK: Patterns of Engagement in Facilitated Konversations"
version: 0.0.1
date-released: 2025-06-19
url: "https://github.com/dimits-ts/facilitation-dataset"

GitHub Events

Total
  • Delete event: 2
  • Push event: 23
  • Pull request event: 5
  • Create event: 4
Last Year
  • Delete event: 2
  • Push event: 23
  • Pull request event: 5
  • Create event: 4

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dimits-ts (4)