Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.4%) to scientific vocabulary
Repository
Seamlessly build the MuMiN dataset.
Statistics
- Stars: 30
- Watchers: 2
- Forks: 0
- Open Issues: 7
- Releases: 37
Metadata Files
README.md
MuMiN-Build
This repository contains the package used to build the MuMiN dataset from the paper by Nielsen and McConville, MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (2021).
See the MuMiN website for more information, including a leaderboard of the top performing models.
Installation
The mumin package can be installed using pip:
```
$ pip install mumin
```
To be able to build the dataset, Twitter data needs to be downloaded, which requires a Twitter API key. You can get one for free here. You will need the Bearer Token.
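If you prefer not to paste the Bearer Token into your scripts, one option is to keep it in an environment variable and read it at runtime. The sketch below assumes a variable called TWITTER_BEARER_TOKEN; both that name and the get_bearer_token helper are hypothetical and not part of the mumin package.

```python
import os


def get_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the Twitter Bearer Token stored in an environment variable.

    Hypothetical helper: the default variable name is an assumption made
    for this sketch, not something mumin itself reads.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Twitter Bearer Token."
        )
    return token


# Usage (requires the mumin package and a valid token):
# from mumin import MuminDataset
# dataset = MuminDataset(twitter_bearer_token=get_bearer_token())
```

Keeping the token out of source files also makes it harder to commit it to version control by accident.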
Quickstart
The main class of the package is the MuminDataset class:
```
>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token='XXXXX')  # your Bearer Token
>>> dataset
MuminDataset(size='small', compiled=False)
```
By default, this loads the small version of the dataset. This can be changed by
setting the size argument of MuminDataset to one of 'small', 'medium' or
'large'. To begin using the dataset, it first needs to be compiled. This will
download the dataset, rehydrate the tweets and users, and download all the
associated news articles, images and videos. This usually takes a while.
```
>>> dataset.compile()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)
```
Note that this dataset does not contain all the nodes and relations in
MuMiN-small, as that would take way longer to compile. The data left out are
timelines, profile pictures and article images. These can be included by
specifying include_extra_images=True and/or include_timelines=True in the
constructor of MuminDataset.
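The constructor options described above can be collected and checked before any download starts. The build_dataset_kwargs helper below is a hypothetical sketch, not part of mumin: it only validates the size and the two extras flags locally and returns them as keyword arguments for MuminDataset.

```python
from typing import Any, Dict

# Sizes accepted by MuminDataset, as listed in this README.
VALID_SIZES = ("small", "medium", "large")


def build_dataset_kwargs(
    size: str = "small",
    include_extra_images: bool = False,
    include_timelines: bool = False,
) -> Dict[str, Any]:
    """Validate options and collect them as MuminDataset keyword arguments.

    Hypothetical helper: it does not import or call mumin.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return {
        "size": size,
        "include_extra_images": include_extra_images,
        "include_timelines": include_timelines,
    }


# Usage (requires the mumin package and a Bearer Token):
# from mumin import MuminDataset
# kwargs = build_dataset_kwargs("medium", include_timelines=True)
# dataset = MuminDataset(twitter_bearer_token=token, **kwargs)
# dataset.compile()
```

Failing early on a mistyped size avoids starting a compilation that, as noted above, usually takes a while.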
After compilation, the dataset can also be found in the mumin-<size>.zip
file. This file name can be changed using the dataset_path argument when
initialising the MuminDataset class. If you need embeddings of the nodes, for
instance for use in machine learning models, then you can simply call the
add_embeddings method:
```
>>> dataset.add_embeddings()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)
```
Note: If you need to use the add_embeddings method, you need to install
the mumin package as either pip install mumin[embeddings] or pip install
mumin[all], which will install the transformers and torch libraries. This
is to ensure that such large libraries are only downloaded if needed.
It is possible to export the dataset to the
Deep Graph Library (DGL) using the to_dgl method:
```
>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph
```
Note: If you need to use the to_dgl method, you need to install the
mumin package as pip install mumin[dgl] or pip install mumin[all], which
will install the dgl and torch libraries.
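To see which optional extras are already available before calling add_embeddings or to_dgl, you can look up the required modules with importlib. The module lists below are taken from the notes above (transformers and torch for embeddings; dgl and torch for dgl); the missing_modules helper itself is a hypothetical sketch, not part of mumin. Installing mumin[all] covers both extras.

```python
import importlib.util
from typing import List

# Modules pulled in by each optional extra, per the installation notes.
EXTRA_MODULES = {
    "embeddings": ["transformers", "torch"],
    "dgl": ["dgl", "torch"],
}


def missing_modules(extra: str) -> List[str]:
    """Return the modules required by `extra` that cannot be found.

    find_spec only locates the module; it does not import it.
    """
    return [
        name
        for name in EXTRA_MODULES[extra]
        if importlib.util.find_spec(name) is None
    ]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    if missing:
        print(f"Missing {missing}; run: pip install mumin[{extra}]")
```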
For a more in-depth tutorial on how to work with the dataset, including training several different misinformation classifiers, see the tutorial.
Dataset Statistics
| Dataset | #Claims | #Threads | #Tweets | #Users | #Articles | #Images | #Languages | %Misinfo |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MuMiN-large | 12,914 | 26,048 | 21,565,018 | 1,986,354 | 10,920 | 6,573 | 41 | 94.57% |
| MuMiN-medium | 5,565 | 10,832 | 12,650,371 | 1,150,259 | 4,212 | 2,510 | 37 | 94.07% |
| MuMiN-small | 2,183 | 4,344 | 7,202,506 | 639,559 | 1,497 | 1,036 | 35 | 92.87% |
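The table also lends itself to quick back-of-the-envelope comparisons. The sketch below copies three columns from the table and computes how much larger MuMiN-large is than MuMiN-small, plus an approximate count of claims not labelled as misinformation. That second figure assumes %Misinfo is the share of claims labelled as misinformation; this reading of the column is an assumption.

```python
# Columns copied from the Dataset Statistics table.
STATS = {
    "MuMiN-large": {"claims": 12_914, "tweets": 21_565_018, "misinfo_pct": 94.57},
    "MuMiN-medium": {"claims": 5_565, "tweets": 12_650_371, "misinfo_pct": 94.07},
    "MuMiN-small": {"claims": 2_183, "tweets": 7_202_506, "misinfo_pct": 92.87},
}


def size_ratio(column: str, bigger: str = "MuMiN-large", smaller: str = "MuMiN-small") -> float:
    """Ratio of one table column between two dataset sizes."""
    return STATS[bigger][column] / STATS[smaller][column]


def approx_factual_claims(name: str) -> int:
    """Approximate number of claims not labelled as misinformation.

    Assumes %Misinfo is the percentage of claims labelled misinformation.
    The result is rounded because the percentages are themselves rounded.
    """
    row = STATS[name]
    return round(row["claims"] * (100 - row["misinfo_pct"]) / 100)


for name in STATS:
    print(name, approx_factual_claims(name))
print(f"claims ratio: {size_ratio('claims'):.1f}, tweets ratio: {size_ratio('tweets'):.1f}")
```

With these figures, MuMiN-large has roughly 5.9 times as many claims as MuMiN-small but only about 3 times as many tweets.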
Related Repositories
- MuMiN website, the central place for the MuMiN ecosystem, containing tutorials, leaderboards and links to the paper and related repositories.
- MuMiN, containing the paper in PDF and LaTeX form.
- MuMiN-trawl, containing the source code used to construct the dataset from scratch.
- MuMiN-baseline, containing the source code for the baselines.
Owner
- Name: MuMiN-dataset
- Login: MuMiN-dataset
- Kind: organization
- Location: Bristol, United Kingdom
- Website: https://www.rephrain.ac.uk/clariti/
- Repositories: 3
- Profile: https://github.com/MuMiN-dataset
A multimodal machine learning based study of medical misinformation on social networks.
GitHub Events
Total
- Watch event: 3
- Issue comment event: 2
Last Year
- Watch event: 3
- Issue comment event: 2
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| saattrupdan | s****n@g****m | 1,181 |
| Dan Saattrup Nielsen | d****n@a****k | 77 |
| dependabot[bot] | 4****] | 16 |
| Ryan McConville | r****n@r****m | 6 |
| Dan Saattrup Nielsen | 4****n | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 13
- Total pull requests: 19
- Average time to close issues: 13 days
- Average time to close pull requests: 1 day
- Total issue authors: 11
- Total pull request authors: 1
- Average comments per issue: 4.69
- Average comments per pull request: 0.16
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 19
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HarryWu-CHN (3)
- lujain96 (1)
- fablos (1)
- camila-cg (1)
- ziw42 (1)
- Luke-Gassmann (1)
- shawn-dm (1)
- GiuseppePipicelli96 (1)
- Yutaaa76 (1)
- ramirezmichelle (1)
- DanniXu98 (1)
Pull Request Authors
- dependabot[bot] (18)
Packages
- Total packages: 1
- Total downloads (pypi): 311 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 36
- Total maintainers: 1
pypi.org: mumin
Seamlessly build the MuMiN dataset.
- Homepage: https://mumin-dataset.github.io/
- Documentation: https://mumin.readthedocs.io/
- License: MIT
- Latest release: 1.10.0, published over 3 years ago
Dependencies
- astunparse 1.6.3 develop
- atomicwrites 1.4.1 develop
- attrs 21.4.0 develop
- black 22.6.0 develop
- cfgv 3.3.1 develop
- coverage 5.5 develop
- distlib 0.3.5 develop
- execnet 1.9.0 develop
- identify 2.5.2 develop
- iniconfig 1.1.1 develop
- isort 5.10.1 develop
- jinja2 3.1.2 develop
- markupsafe 2.1.1 develop
- mypy-extensions 0.4.3 develop
- nodeenv 1.7.0 develop
- pathspec 0.9.0 develop
- pdoc 7.4.0 develop
- platformdirs 2.5.2 develop
- pluggy 1.0.0 develop
- pre-commit 2.20.0 develop
- py 1.11.0 develop
- pygments 2.12.0 develop
- pytest 6.2.5 develop
- pytest-cov 3.0.0 develop
- pytest-forked 1.4.0 develop
- pytest-xdist 2.5.0 develop
- readme-coverage-badger 0.1.2 develop
- toml 0.10.2 develop
- tomli 2.0.1 develop
- virtualenv 20.15.1 develop
- beautifulsoup4 4.11.1
- certifi 2022.6.15
- charset-normalizer 2.1.0
- cli-exit-tools 1.2.3.2
- click 8.1.3
- colorama 0.4.5
- cssselect 1.1.0
- dill 0.3.4
- dill 0.3.5.1
- feedfinder2 0.0.4
- feedparser 6.0.10
- filelock 3.7.1
- huggingface-hub 0.8.1
- idna 3.3
- jieba3k 0.35.1
- joblib 1.1.0
- lib-detect-testenv 2.0.2.2
- lxml 4.9.1
- multiprocess 0.70.12.2
- multiprocess 0.70.13
- newspaper3k 0.2.8
- nltk 3.7
- numpy 1.23.1
- packaging 21.3
- pandas 1.4.3
- pillow 9.2.0
- pyparsing 3.0.9
- python-dateutil 2.8.2
- python-dotenv 0.20.0
- pytz 2022.1
- pyyaml 6.0
- regex 2022.7.9
- requests 2.28.1
- requests-file 1.5.1
- sgmllib3k 1.0.0
- six 1.16.0
- soupsieve 2.3.2.post1
- tinysegmenter 0.3
- tldextract 3.3.1
- tokenizers 0.12.1
- torch 1.12.0
- tqdm 4.64.0
- transformers 4.20.1
- typing-extensions 4.3.0
- urllib3 1.26.10
- wrapt 1.14.1
- wrapt-timeout-decorator 1.3.12.2
- black ^22.3.0 develop
- isort ^5.10.1 develop
- pdoc ^7.1.1 develop
- pre-commit ^2.17.0 develop
- pytest ^6.2.5 develop
- pytest-cov ^3.0.0 develop
- pytest-xdist ^2.5.0 develop
- python-dotenv ^0.20.0 develop
- readme-coverage-badger ^0.1.2 develop
- newspaper3k ^0.2.8
- pandas ^1.4.3
- python >=3.8,<3.11
- python-dotenv ^0.20.0
- torch ^1.12.0
- tqdm ^4.62.0
- transformers ^4.20.0
- wrapt-timeout-decorator ^1.3.12
- abatilo/actions-poetry v2.0.0 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- jpetrucciani/black-check master composite
- abatilo/actions-poetry v2.0.0 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/deploy-pages v1 composite
- actions/setup-python v2 composite
- actions/upload-artifact v3 composite