fast-dawid-skene

Code for the algorithms in the paper: Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian. Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification. KDD WISDOM 2018

https://github.com/sukrutrao/fast-dawid-skene

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.1%) to scientific vocabulary

Keywords

crowdsourced-aggregation crowdsourcing expectation-maximization python sentiment-classification

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: sukrutrao
License: mit
Language: Python
Default Branch: master
Homepage: https://sites.google.com/view/fast-dawid-skene
Size: 37.1 KB

Statistics

Stars: 43
Watchers: 5
Forks: 11
Open Issues: 2
Releases: 0

Created over 7 years ago · Last pushed over 3 years ago

Fast Dawid-Skene

Paper | arXiv | Code | Slides | Supplementary Results

Implementation of the Fast Dawid-Skene and Hybrid algorithms described in the paper:

Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian. Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification. In Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2018, August 2018.

Implementations of the Dawid-Skene (Dawid and Skene, 1979) and Majority Voting algorithms are also provided.
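As a point of reference, majority voting is the simplest of these baselines: each question receives the label that most of its annotators chose. The sketch below is illustrative only and is not taken from the repository. The function name `majority_vote`, the dictionary input format, and the tie-breaking rule (smallest label wins) are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(annotations):
    """Return the most frequent label for each question.

    `annotations` maps a question id to the list of labels given by its
    annotators. Ties are broken by picking the smallest label, which is
    an arbitrary choice made for this sketch.
    """
    result = {}
    for question, labels in annotations.items():
        counts = Counter(labels)
        top = max(counts.values())
        # Among the labels with the highest count, take the smallest one.
        result[question] = min(label for label, c in counts.items() if c == top)
    return result


votes = {"q1": [0, 1, 1], "q2": [2, 2, 0], "q3": [1, 0]}
print(majority_vote(votes))  # {'q1': 1, 'q2': 2, 'q3': 0}
```

Majority voting treats every annotator as equally reliable, which is the limitation that the EM-based algorithms address by estimating a per-annotator error model.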

These algorithms can be used to aggregate crowd-sourced labels to estimate the true labels. Given the labels of a data point from many annotators out of a set of classes, the algorithms output the most likely correct class for the data point.
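To make "most likely correct class" concrete, here is a minimal NumPy sketch of the classic Dawid-Skene EM loop: it alternates between estimating class priors and per-annotator confusion matrices (M-step) and re-estimating the posterior over each question's true class (E-step). It is a simplified illustration, not the code in this repository. The function name `dawid_skene`, the `(question, annotator, label)` triple format, the smoothing constants, and the fixed iteration count are assumptions. The `hard=True` branch reflects one reading of the Fast Dawid-Skene idea, in which each question is assigned entirely to its currently most likely class before the M-step; treat that too as an assumption and consult the paper for the exact formulation.

```python
import numpy as np


def dawid_skene(labels, n_classes, n_iter=50, hard=False):
    """Estimate one class per question from (question, annotator, label) triples.

    All ids are assumed to be 0-based integers, and every question is
    assumed to have at least one label.
    """
    labels = np.asarray(labels)
    n_q = labels[:, 0].max() + 1
    n_a = labels[:, 1].max() + 1

    # counts[q, a, l]: how often annotator a gave label l to question q.
    counts = np.zeros((n_q, n_a, n_classes))
    for q, a, l in labels:
        counts[q, a, l] += 1

    # Initialise the posteriors T[q, c] with the vote proportions.
    T = counts.sum(axis=1)
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        if hard:
            # Hard assignment (assumed FDS-style step): one-hot on the argmax.
            T = np.eye(n_classes)[T.argmax(axis=1)]

        # M-step: class priors and confusion[a, c, l] = P(a says l | truth c).
        prior = T.mean(axis=0)
        confusion = np.einsum("qc,qal->acl", T, counts) + 1e-6  # smoothing
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: posterior over the true class, computed in log space.
        log_post = np.log(prior + 1e-12) + np.einsum(
            "qal,acl->qc", counts, np.log(confusion)
        )
        log_post -= log_post.max(axis=1, keepdims=True)
        T = np.exp(log_post)
        T /= T.sum(axis=1, keepdims=True)

    return T.argmax(axis=1)


# Four questions, three annotators; annotator 2 always answers 0.
triples = [
    (0, 0, 0), (0, 1, 0), (0, 2, 0),
    (1, 0, 1), (1, 1, 1), (1, 2, 0),
    (2, 0, 0), (2, 1, 0), (2, 2, 0),
    (3, 0, 1), (3, 1, 1), (3, 2, 0),
]
print(dawid_skene(triples, n_classes=2))
print(dawid_skene(triples, n_classes=2, hard=True))
```

A dense `counts` array keeps the sketch short but grows with questions × annotators × classes, so a sparse representation would be a more sensible choice for large datasets.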

Setup

Prerequisites

The prerequisites are:

* Python 2.7 or 3.4-3.6
* pip

Setting up dependencies

All other dependencies can be installed using pip:

$ pip install -r requirements.txt

If tests are to be run, use this instead:

$ pip install -r requirements-dev.txt

Preparing the data

A description of the data format and the procedure to add a new dataset is given here. A toy dataset is also provided, and can be found here.

Running the program

To run the program, use

$ python scripts/fast_dawid_skene.py [OPTIONS]

To view a list of available options along with descriptions, use

$ python scripts/fast_dawid_skene.py --help

Example Run

To run on the toy dataset, with two annotators per question, using the FDS algorithm to obtain predictions, use

$ python scripts/fast_dawid_skene.py --dataset toy --k 2 --mode aggregate --algorithm FDS --print_result

To run using all available annotations for every question, using the FDS algorithm to obtain predictions, use

$ python scripts/fast_dawid_skene.py --dataset toy --mode aggregate --algorithm FDS --print_result

Running tests

Tests can be run using pytest:

$ py.test

License

This code is provided under the MIT License.

Some parts of the code in this file are derived from this implementation, and the original license and copyright notice can be found at the top of the file.

Citation

If the Fast Dawid-Skene / Hybrid algorithms are useful for your research, please cite our paper [2].

Acknowledgements

Parts of the code for the implementation of the algorithms use or derive from code in this implementation.

References

[1] A. P. Dawid and A. M. Skene. 1979. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. J. Royal Stat. Soc. Series C 28, 1 (1979), 20–28.
[2] Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian. Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification. In Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2018, August 2018.

Owner

Name: Sukrut Rao
Login: sukrutrao
Kind: user
Location: Germany
Company: Max Planck Institute for Informatics

Website: https://sukrutrao.github.io
Twitter: sukrutrao
Repositories: 11
Profile: https://github.com/sukrutrao

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If the Fast Dawid-Skene / Hybrid algorithms are useful for your research, please cite our paper as below."
preferred-citation:
  authors:
    - family-names: Sinha
      given-names: Vaibhav B
      orcid: "https://orcid.org/0000-0003-3499-6136"
    - family-names: Rao
      given-names: Sukrut
      orcid: "https://orcid.org/0000-0001-8896-7619"
    - family-names: Balasubramanian
      given-names: Vineeth N
      orcid: "https://orcid.org/0000-0003-2656-0375"
  title: "Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification"
  type: article
  journal: arXiv preprint arXiv:1803.02781
  year: 2018

GitHub Events

Total

Watch event: 1
Fork event: 2

Last Year

Watch event: 1
Fork event: 2

Dependencies

requirements-dev.txt pypi

numpy ==1.13.1
pandas ==0.20.2
pytest ==3.0.7

requirements.txt pypi

numpy ==1.13.1
pandas ==0.20.2
