factchecks.br
Collection of Portuguese Fact-Checking Benchmarks.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: 4 found in README
- ✓ Academic publication links: arxiv.org, springer.com, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low (10.0%)
Repository
Basic Info
- Host: GitHub
- Owner: fake-news-UFG
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://huggingface.co/datasets/fake-news-UFG/FactChecksbr
- Size: 299 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
FactChecks.br
Collection of Portuguese Fact-Checking Benchmarks.
Getting Started
| Dataset | Type | Domain | Annotation | Date range | Number of samples |
| --- | --- | --- | --- | --- | --- |
| Fake.br | Claim | News | Annotated | 01/2016 - 01/2018 | 7,200 |
| FakeRecogna | Source | News | Agency | 03/2017 - 05/2020 | 11,773 |
| Central de Fatos | Source | News | Agency | 01/2013 - 05/2021 | 10,461 |
| Fact-check_tweet (pt split) | Claim-source pair | Tweets-News | Auto-Agency | 2019 - 2021 | 656 - 656 |
| FakeNewsSet | Claim-source pair | Tweets-News | Auto-Agency | | 26,970 - 598 |
Usage 🤗
```python
from datasets import load_dataset

data = load_dataset("fake-news-UFG/FactChecksbr")
```

Raw versions of Fake.br, FakeRecogna, Central de Fatos, and FakeNewsSet are also uploaded.
Review URLs are tagged with their review ID.
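As a quick sanity check after loading, the sketch below counts the rows in each split. The helper names `split_sizes` and `load_factchecks` are illustrative and not part of this repository. No configuration name is assumed; if the Hub repository defines several configurations, or if your `datasets` version restricts script-based datasets, `load_dataset` may require extra arguments.

```python
from collections.abc import Mapping, Sized


def split_sizes(splits: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of rows in each split of a DatasetDict-like mapping."""
    return {name: len(split) for name, split in splits.items()}


def load_factchecks():
    """Download the collection from the Hugging Face Hub (needs network access)."""
    # Imported lazily so split_sizes can be used without `datasets` installed.
    from datasets import load_dataset

    # Repository id taken from the usage snippet above.
    return load_dataset("fake-news-UFG/FactChecksbr")
```

Calling `split_sizes(load_factchecks())` returns a dictionary that maps each split name to its row count.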
Scripts
- The dataset generation script and exploratory data analysis (EDA) are in the notebook process.ipynb.
- Builder scripts for Dataset Hub are located at builders/.
Data Analysis
Agency domains per dataset
Duplication
There are 23,467 sources in total, of which 20,028 are unique. The largest overlap is between "FakeRecogna" and "Central de Fatos". No source is common to all datasets.
Of the 3,303 duplicated sources, we excluded 130 contradictory examples, in which one dataset labels the source as "fake" while another labels it as "not fake".
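The duplicate handling described above can be sketched as follows. This is a minimal illustration, not the code in process.ipynb: the record layout `(dataset_name, source_url, label)`, the function name `deduplicate`, and the example URLs are all assumptions made for the example.

```python
from collections import defaultdict


def deduplicate(records):
    """Collapse repeated source URLs and set aside those with conflicting labels.

    `records` is an iterable of (dataset_name, source_url, label) tuples
    (a hypothetical layout). Returns (unique, contradictory): a dict mapping
    each consistently labelled URL to its label, and a sorted list of URLs
    whose labels disagree across datasets.
    """
    labels_by_url = defaultdict(set)
    for _dataset, url, label in records:
        labels_by_url[url].add(label)

    unique = {}
    contradictory = []
    for url, labels in labels_by_url.items():
        if len(labels) == 1:
            unique[url] = next(iter(labels))
        else:
            contradictory.append(url)
    return unique, sorted(contradictory)


# Made-up records, for illustration only.
records = [
    ("FakeRecogna", "https://example.org/a", "fake"),
    ("Central de Fatos", "https://example.org/a", "fake"),      # consistent duplicate
    ("FakeRecogna", "https://example.org/b", "fake"),
    ("Central de Fatos", "https://example.org/b", "not fake"),  # contradictory
    ("Fake.br", "https://example.org/c", "not fake"),
]
unique, contradictory = deduplicate(records)
print(len(unique), contradictory)
```

Here the consistent duplicate is kept once, and the URL with conflicting labels is excluded from the unique set.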
Samples per class
Evaluation
If you have evaluated a model on any of these datasets, feel free to open a pull request. :smile:
| Dataset | Model | Accuracy | Precision | Recall | macro-F1 | URL |
| --- | --- | --- | --- | --- | --- | --- |
| Fake.br | BERTimbau | 99.22% | - | - | - | repo |
| Fake.br | GloVe 100-600D - HAN | 97% | - | - | - | paper |
| Fake.br | BERTimbau + Logistic Regression | 96.14% | 96.40% | 95.49% | 96.13% | paper |
| Fake.br | BoW | 96% | - | - | - | paper |
| Fake.br | GloVe 100D + BiLSTM | 93.56% | - | - | - | repo |
| Fake.br | TfidfVectorizer | 92.85% | 92.19% | 93.36% | - | repo |
| Fake.br | BoW | 89% | 89% | 89% | 89% | paper |
| Fake.br | BoW + MLP | 88.65% | - | - | - | repo |
| FakeNewsSetGen | Detective | 97.93% | 97.93% | - | - | repo |
| Fact-check_tweet | XLM-R | 84.08% | - | - | 83.63% | paper |
| FakeRecogna | MLP + BoW | 93.1% | 93.1% | 93.1% | 93.0% | repo |
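For contributors reporting new results, the sketch below shows one way to compute the four metrics in the table using plain Python. The function name `score_predictions` and the toy labels are illustrative. Precision and recall are macro-averaged here, which is an assumption: the sources cited in the table may report them differently, for example for the "fake" class only.

```python
def score_predictions(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (empty denominators) are scored as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy labels, for illustration only.
y_true = ["fake", "fake", "fake", "fake", "true", "true"]
y_pred = ["fake", "fake", "fake", "true", "true", "fake"]
report = score_predictions(y_true, y_pred)
print({name: round(value, 4) for name, value in report.items()})
```

Macro-F1 averages the per-class F1 scores with equal weight, so it can differ noticeably from accuracy when the classes are imbalanced.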
Citing
```bibtex
@misc{FactChecksbr,
  author = {R. S. Gomes, Juliana},
  title = {FactChecks.br},
  url = {https://github.com/fake-news-UFG/FactChecks.br},
  doi = {10.57967/hf/1016},
}
```
Acknowledgments
This work was supported by FAPEG (Fundação de Amparo à Pesquisa do Estado de Goiás) and ANATEL (Agência Nacional de Telecomunicações).
Owner
- Name: fake-news-UFG
- Login: fake-news-UFG
- Kind: organization
- Repositories: 1
- Profile: https://github.com/fake-news-UFG
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: 'FactChecksbr '
message: Collection of Portuguese Fact-Checking Benchmarks.
type: dataset
authors:
  - email: julianarsg13@gmail.com
    given-names: Juliana
    name-particle: R. S.
    family-names: Gomes
    affiliation: Federal University of Goiás
    orcid: 'https://orcid.org/0000-0001-6900-1931'
identifiers:
  - type: doi
    value: 10.57967/hf/1016
    description: Hugging Face
repository-code: 'https://github.com/fake-news-UFG/FactChecks.br'
repository-artifact: 'https://huggingface.co/datasets/fake-news-UFG/FactChecksbr'
keywords:
  - fake news
  - fact-checking
  - portuguese
license: MIT