mumin

Seamlessly build the MuMiN dataset.

https://github.com/mumin-dataset/mumin-build

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.4%) to scientific vocabulary

Keywords

dataset deep-graph-library graph misinformation mumin pytorch-geometric

Keywords from Contributors

transformers interactive observability autograding hacking shellcodes network-simulation packaging pretrained-models serializer
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: MuMiN-dataset
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.3 MB
Statistics
  • Stars: 30
  • Watchers: 2
  • Forks: 0
  • Open Issues: 7
  • Releases: 37
Topics
dataset deep-graph-library graph misinformation mumin pytorch-geometric
Created over 4 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog License

README.md

MuMiN-Build

This repository contains the package used to build the MuMiN dataset from the paper by Nielsen and McConville, "MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset" (2021).

See the MuMiN website for more information, including a leaderboard of the top performing models.


Installation

The mumin package can be installed using pip:

```
$ pip install mumin
```

To build the dataset, Twitter data needs to be downloaded, which requires a Twitter API key. At the time of writing, such a key could be obtained for free from the Twitter developer portal. You will need the Bearer Token.
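Rather than hard-coding the Bearer Token in scripts, it can be kept in an environment variable. The sketch below assumes a variable named TWITTER_BEARER_TOKEN; that name and the get_bearer_token helper are hypothetical choices for this example, not something the mumin package is documented to read.

```python
import os


def get_bearer_token(var_name: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the Twitter Bearer Token stored in an environment variable.

    The default variable name is a hypothetical convention for this example.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early, before any long-running download is attempted.
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export your Twitter Bearer Token before building the dataset."
        )
    return token
```

The returned string can then be passed on as MuminDataset(twitter_bearer_token=get_bearer_token()).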

Quickstart

The main class of the package is the MuminDataset class:

```
>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token='XXXXX')
>>> dataset
MuminDataset(size='small', compiled=False)
```

By default, this loads the small version of the dataset. This can be changed by setting the size argument of MuminDataset to one of 'small', 'medium' or 'large'. To begin using the dataset, it first needs to be compiled. This will download the dataset, rehydrate the tweets and users, and download all the associated news articles, images and videos. This usually takes a while:

```
>>> dataset.compile()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)
```

Note that the compiled dataset does not contain all the nodes and relations in MuMiN-small, as including them would make compilation considerably slower. The data left out are timelines, profile pictures and article images. These can be included by specifying include_extra_images=True and/or include_timelines=True in the constructor of MuminDataset.
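Since a typo in the size argument would only surface once compilation starts, it can help to assemble and validate the constructor arguments up front. The build_mumin_kwargs helper below is hypothetical (it is not part of mumin); it only checks size against the three documented variants and collects the documented keyword arguments twitter_bearer_token, size, include_extra_images and include_timelines.

```python
VALID_SIZES = ("small", "medium", "large")


def build_mumin_kwargs(
    bearer_token: str,
    size: str = "small",
    include_extra_images: bool = False,
    include_timelines: bool = False,
) -> dict:
    """Assemble keyword arguments for MuminDataset (hypothetical helper)."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return {
        "twitter_bearer_token": bearer_token,
        "size": size,
        # Both flags default to False here, mirroring the README: timelines and
        # extra images are left out unless explicitly requested.
        "include_extra_images": include_extra_images,
        "include_timelines": include_timelines,
    }
```

A call such as MuminDataset(**build_mumin_kwargs(token, size='medium', include_timelines=True)) then fails immediately on a misspelled size such as 'meduim' instead of partway through a build.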

After compilation, the dataset can also be found in the mumin-<size>.zip file. This file name can be changed using the dataset_path argument when initialising the MuminDataset class. If you need embeddings of the nodes, for instance for use in machine learning models, then you can simply call the add_embeddings method:

```
>>> dataset.add_embeddings()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)
```

Note: If you need to use the add_embeddings method, you need to install the mumin package as either pip install mumin[embeddings] or pip install mumin[all], which will install the transformers and torch libraries. This is to ensure that such large libraries are only downloaded if needed.

It is possible to export the dataset to the Deep Graph Library using the to_dgl method:

```
>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph
```

Note: If you need to use the to_dgl method, you need to install the mumin package as pip install mumin[dgl] or pip install mumin[all], which will install the dgl and torch libraries.
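Both add_embeddings and to_dgl depend on optional extras. A small pre-flight check can report which of the documented libraries are missing before either method is called. The sketch uses importlib.util.find_spec from the standard library; the EXTRAS mapping and the helper names are our own, built from the two notes above.

```python
from importlib.util import find_spec
from typing import Dict, List, Optional, Tuple

# Import names of the libraries each optional extra is documented to install.
EXTRAS: Dict[str, Tuple[str, ...]] = {
    "embeddings": ("transformers", "torch"),
    "dgl": ("dgl", "torch"),
}


def missing_modules(extra: str) -> List[str]:
    """Return the modules needed by the given extra that cannot be located."""
    return [name for name in EXTRAS[extra] if find_spec(name) is None]


def install_hint(extra: str) -> Optional[str]:
    """Suggest the matching pip command if anything is missing, else None."""
    missing = missing_modules(extra)
    if not missing:
        return None
    return f"pip install mumin[{extra}]  # missing: {', '.join(missing)}"
```

Checking install_hint('dgl') before calling dataset.to_dgl() turns a late ImportError into an actionable message. Note that find_spec only confirms a module can be located, not that it imports cleanly.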

For a more in-depth tutorial of how to work with the dataset, including training multiple different misinformation classifiers, see the tutorial.

Dataset Statistics

| Dataset | #Claims | #Threads | #Tweets | #Users | #Articles | #Images | #Languages | %Misinfo |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MuMiN-large | 12,914 | 26,048 | 21,565,018 | 1,986,354 | 10,920 | 6,573 | 41 | 94.57% |
| MuMiN-medium | 5,565 | 10,832 | 12,650,371 | 1,150,259 | 4,212 | 2,510 | 37 | 94.07% |
| MuMiN-small | 2,183 | 4,344 | 7,202,506 | 639,559 | 1,497 | 1,036 | 35 | 92.87% |
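To see how much bigger the medium and large variants are, the figures in the statistics table can be compared directly. The snippet copies four of its columns into a dictionary; treating tweet and user counts as a rough proxy for how much rehydration work a variant needs is our assumption, not a claim made by the package documentation.

```python
from typing import Dict

# Figures copied from the Dataset Statistics table.
STATS: Dict[str, Dict[str, int]] = {
    "MuMiN-small": {"claims": 2_183, "threads": 4_344, "tweets": 7_202_506, "users": 639_559},
    "MuMiN-medium": {"claims": 5_565, "threads": 10_832, "tweets": 12_650_371, "users": 1_150_259},
    "MuMiN-large": {"claims": 12_914, "threads": 26_048, "tweets": 21_565_018, "users": 1_986_354},
}


def relative_to_small(variant: str, column: str) -> float:
    """How many times larger a variant is than MuMiN-small in one column."""
    return STATS[variant][column] / STATS["MuMiN-small"][column]
```

With these figures, relative_to_small('MuMiN-large', 'tweets') is roughly 3.0, while relative_to_small('MuMiN-large', 'claims') is roughly 5.9, so the number of claims grows faster across variants than the number of tweets.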

Related Repositories

  • MuMiN website, the central place for the MuMiN ecosystem, containing tutorials, leaderboards and links to the paper and related repositories.
  • MuMiN, containing the paper in PDF and LaTeX form.
  • MuMiN-trawl, containing the source code used to construct the dataset from scratch.
  • MuMiN-baseline, containing the source code for the baselines.

Owner

  • Name: MuMiN-dataset
  • Login: MuMiN-dataset
  • Kind: organization
  • Location: Bristol, United Kingdom

A multimodal machine learning based study of medical misinformation on social networks.

GitHub Events

Total
  • Watch event: 3
  • Issue comment event: 2
Last Year
  • Watch event: 3
  • Issue comment event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 1,282
  • Total Committers: 5
  • Avg Commits per committer: 256.4
  • Development Distribution Score (DDS): 0.079
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
saattrupdan s****n@g****m 1,181
Dan Saattrup Nielsen d****n@a****k 77
dependabot[bot] 4****] 16
Ryan McConville r****n@r****m 6
Dan Saattrup Nielsen 4****n 2
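The all-time summary figures can be reproduced from the Top Committers list, assuming the Development Distribution Score is defined as one minus the share of commits made by the single most active committer. That formula is our reading of the metric; the page itself does not state it. Under that assumption the sketch recovers both the average of 256.4 and the DDS of 0.079.

```python
from typing import List, Tuple


def commit_summary(commits: List[int]) -> Tuple[float, float]:
    """Return (average commits per committer, development distribution score).

    DDS is assumed here to be 1 - (top committer's commits / total commits).
    """
    total = sum(commits)
    average = total / len(commits)
    dds = 1 - max(commits) / total
    return average, dds


# All-time commit counts from the Top Committers list (they sum to 1,282).
average, dds = commit_summary([1_181, 77, 16, 6, 2])
print(f"{average:.1f} {dds:.3f}")  # prints "256.4 0.079"
```

The same formula gives a DDS of 0.0 for the past year, where a single committer made all 6 commits.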

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 13
  • Total pull requests: 19
  • Average time to close issues: 13 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 11
  • Total pull request authors: 1
  • Average comments per issue: 4.69
  • Average comments per pull request: 0.16
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 19
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • HarryWu-CHN (3)
  • lujain96 (1)
  • fablos (1)
  • camila-cg (1)
  • ziw42 (1)
  • Luke-Gassmann (1)
  • shawn-dm (1)
  • GiuseppePipicelli96 (1)
  • Yutaaa76 (1)
  • ramirezmichelle (1)
  • DanniXu98 (1)
Pull Request Authors
  • dependabot[bot] (18)
Top Labels
Issue Labels
bug (3) enhancement (2) duplicate (1)
Pull Request Labels
dependencies (18)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 311 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 36
  • Total maintainers: 1
pypi.org: mumin

Seamlessly build the MuMiN dataset.

  • Versions: 36
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 311 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 12.6%
Average: 19.3%
Dependent repos count: 21.7%
Downloads: 22.4%
Forks count: 29.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • astunparse 1.6.3 develop
  • atomicwrites 1.4.1 develop
  • attrs 21.4.0 develop
  • black 22.6.0 develop
  • cfgv 3.3.1 develop
  • coverage 5.5 develop
  • distlib 0.3.5 develop
  • execnet 1.9.0 develop
  • identify 2.5.2 develop
  • iniconfig 1.1.1 develop
  • isort 5.10.1 develop
  • jinja2 3.1.2 develop
  • markupsafe 2.1.1 develop
  • mypy-extensions 0.4.3 develop
  • nodeenv 1.7.0 develop
  • pathspec 0.9.0 develop
  • pdoc 7.4.0 develop
  • platformdirs 2.5.2 develop
  • pluggy 1.0.0 develop
  • pre-commit 2.20.0 develop
  • py 1.11.0 develop
  • pygments 2.12.0 develop
  • pytest 6.2.5 develop
  • pytest-cov 3.0.0 develop
  • pytest-forked 1.4.0 develop
  • pytest-xdist 2.5.0 develop
  • readme-coverage-badger 0.1.2 develop
  • toml 0.10.2 develop
  • tomli 2.0.1 develop
  • virtualenv 20.15.1 develop
  • beautifulsoup4 4.11.1
  • certifi 2022.6.15
  • charset-normalizer 2.1.0
  • cli-exit-tools 1.2.3.2
  • click 8.1.3
  • colorama 0.4.5
  • cssselect 1.1.0
  • dill 0.3.4
  • dill 0.3.5.1
  • feedfinder2 0.0.4
  • feedparser 6.0.10
  • filelock 3.7.1
  • huggingface-hub 0.8.1
  • idna 3.3
  • jieba3k 0.35.1
  • joblib 1.1.0
  • lib-detect-testenv 2.0.2.2
  • lxml 4.9.1
  • multiprocess 0.70.12.2
  • multiprocess 0.70.13
  • newspaper3k 0.2.8
  • nltk 3.7
  • numpy 1.23.1
  • packaging 21.3
  • pandas 1.4.3
  • pillow 9.2.0
  • pyparsing 3.0.9
  • python-dateutil 2.8.2
  • python-dotenv 0.20.0
  • pytz 2022.1
  • pyyaml 6.0
  • regex 2022.7.9
  • requests 2.28.1
  • requests-file 1.5.1
  • sgmllib3k 1.0.0
  • six 1.16.0
  • soupsieve 2.3.2.post1
  • tinysegmenter 0.3
  • tldextract 3.3.1
  • tokenizers 0.12.1
  • torch 1.12.0
  • tqdm 4.64.0
  • transformers 4.20.1
  • typing-extensions 4.3.0
  • urllib3 1.26.10
  • wrapt 1.14.1
  • wrapt-timeout-decorator 1.3.12.2
pyproject.toml pypi
  • black ^22.3.0 develop
  • isort ^5.10.1 develop
  • pdoc ^7.1.1 develop
  • pre-commit ^2.17.0 develop
  • pytest ^6.2.5 develop
  • pytest-cov ^3.0.0 develop
  • pytest-xdist ^2.5.0 develop
  • python-dotenv ^0.20.0 develop
  • readme-coverage-badger ^0.1.2 develop
  • newspaper3k ^0.2.8
  • pandas ^1.4.3
  • python >=3.8,<3.11
  • python-dotenv ^0.20.0
  • torch ^1.12.0
  • tqdm ^4.62.0
  • transformers ^4.20.0
  • wrapt-timeout-decorator ^1.3.12
.github/workflows/ci.yaml actions
  • abatilo/actions-poetry v2.0.0 composite
  • actions/cache v1 composite
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • jpetrucciani/black-check master composite
.github/workflows/docs.yaml actions
  • abatilo/actions-poetry v2.0.0 composite
  • actions/cache v1 composite
  • actions/checkout v2 composite
  • actions/deploy-pages v1 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v3 composite