factchecks.br

Collection of Portuguese Fact-Checking Benchmarks.

https://github.com/fake-news-ufg/factchecks.br

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com, acm.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

fact-check fact-checking fact-verification fake-news fake-news-classification fake-news-dataset portuguese

Repository


Basic Info
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 2 years ago

README.md


FactChecks.br

https://www.flaticon.com/free-icon/detective_695826

Collection of Portuguese Fact-Checking Benchmarks.

Getting Started

| Dataset | Type | Domain | Annotation | Date range | Number of samples |
| --- | --- | --- | --- | --- | --- |
| Fake.br | Claim | News | Annotated | 01/2016 - 01/2018 | 7,200 |
| FakeRecogna | Source | News | Agency | 03/2017 - 05/2020 | 11,773 |
| Central de Fatos | Source | News | Agency | 01/2013 - 05/2021 | 10,461 |
| Fact-check_tweet (pt split) | Claim-source pair | Tweets-News | Auto-Agency | 2019 - 2021 | 656 - 656 |
| FakeNewsSet | Claim-source pair | Tweets-News | Auto-Agencies | | 26,970 - 598 |

Usage 🤗

```python
from datasets import load_dataset

data = load_dataset("fake-news-UFG/FactChecksbr")
```

We additionally upload raw versions of Fake.br, FakeRecogna, Central de Fatos, and FakeNewsSet.

Review URLs were tagged with their review ID.

Scripts

  • The notebook process.ipynb contains the generation script and the EDA.
  • Builder scripts for Dataset Hub are located at builders/.

Data Analysis

Agency domains per dataset

[Image: agency domains per dataset]

Duplication

There are 23,467 sources in total, of which 20,028 are unique. The biggest overlap is between "FakeRecogna" and "Central de Fatos". No source is common to all datasets.

Of the 3,303 duplicated sources, we excluded 130 contradictory examples, in which one dataset labels the source as "fake" while another labels it as "not fake".

[Image: duplicated sources across datasets]
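The duplicate handling described above can be sketched as follows. This is a minimal illustration, not the code from process.ipynb: the record layout `(dataset, source URL, label)`, the example URLs, and the function name `analyze_duplicates` are all assumptions, and the exact rule used to count "duplicated sources" in the figures above may differ.

```python
from collections import defaultdict

# Hypothetical toy records: (dataset, source URL, label). The real column
# names and label values used in FactChecks.br may differ.
RECORDS = [
    ("FakeRecogna", "https://example.org/a", "fake"),
    ("Central de Fatos", "https://example.org/a", "fake"),
    ("FakeRecogna", "https://example.org/b", "fake"),
    ("Central de Fatos", "https://example.org/b", "not fake"),
    ("Fake.br", "https://example.org/c", "not fake"),
]


def analyze_duplicates(records):
    """Return unique, duplicated, contradictory, and kept source URLs."""
    labels = defaultdict(set)  # source URL -> set of labels seen for it
    counts = defaultdict(int)  # source URL -> number of occurrences
    for _dataset, url, label in records:
        labels[url].add(label)
        counts[url] += 1

    unique = sorted(labels)
    duplicated = sorted(u for u in unique if counts[u] > 1)
    # A duplicated source is contradictory when its copies disagree on the label.
    contradictory = sorted(u for u in duplicated if len(labels[u]) > 1)
    kept = [u for u in unique if u not in contradictory]
    return {
        "unique": unique,
        "duplicated": duplicated,
        "contradictory": contradictory,
        "kept": kept,
    }


result = analyze_duplicates(RECORDS)
print(len(RECORDS), "sources,", len(result["unique"]), "unique")
print("contradictory:", result["contradictory"])
```

In this toy input, source `b` is dropped because its two copies carry opposite labels, while source `a` is kept once because its copies agree.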

Samples per class

[Image: samples per class]

Evaluation

If you have evaluated a model on any of these datasets, please feel free to open a pull request. :smile:

| Dataset | Model | Accuracy | Precision | Recall | macro-F1 | URL |
| --- | --- | --- | --- | --- | --- | --- |
| Fake.br | BERTimbau | 99.22% | - | - | - | repo |
| Fake.br | GloVe 100-600D - HAN | 97% | - | - | - | paper |
| Fake.br | BERTimbau + Logistic Regression | 96.14% | 96.40% | 95.49% | 96.13% | paper |
| Fake.br | BoW | 96% | - | - | - | paper |
| Fake.br | GloVe 100D + BiLSTM | 93.56% | - | - | - | repo |
| Fake.br | TfidfVectorizer | 92.85% | 92.19% | 93.36% | - | repo |
| Fake.br | BoW | 89% | 89% | 89% | 89% | paper |
| Fake.br | BoW + MLP | 88.65% | - | - | - | repo |
| FakeNewsSetGen | Detective | 97.93% | 97.93% | - | - | repo |
| Fact-check_tweet | XLM-R | 84.08% | - | - | 83.63% | paper |
| FakeRecogna | MLP + BoW | 93.1% | 93.1% | 93.1% | 93.0% | repo |
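For contributors adding a row, the four metric columns can be computed along these lines. This is a sketch under assumptions: the table does not say whether Precision and Recall are macro-averaged or reported for a single class, so macro averaging is assumed here, and the function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against empty denominators for classes never predicted or seen.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # assumed macro average
        "recall": sum(recalls) / n,        # assumed macro average
        "macro_f1": sum(f1s) / n,
    }


# Toy labels for illustration only, not taken from any dataset above.
y_true = ["fake", "fake", "not fake", "not fake"]
y_pred = ["fake", "not fake", "not fake", "not fake"]
scores = evaluate(y_true, y_pred)
print({name: f"{value:.2%}" for name, value in scores.items()})
```

Macro-F1 here is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of how many samples each has.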

Citing

```bibtex
@misc{FactChecksbr,
  author = {R. S. Gomes, Juliana},
  title = {FactChecks.br},
  url = {https://github.com/fake-news-UFG/FactChecks.br},
  doi = {10.57967/hf/1016},
}
```

Acknowledgments

This work has been supported by the FAPEG (Fundação de Amparo à Pesquisa do Estado de Goiás) and ANATEL (Agência Nacional de Telecomunicações).

Owner

  • Name: fake-news-UFG
  • Login: fake-news-UFG
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'FactChecksbr '
message: Collection of Portuguese Fact-Checking Benchmarks.
type: dataset
authors:
  - email: julianarsg13@gmail.com
    given-names: Juliana
    name-particle: R. S.
    family-names: Gomes
    affiliation: Federal University of Goiás
    orcid: 'https://orcid.org/0000-0001-6900-1931'
identifiers:
  - type: doi
    value: 10.57967/hf/1016
    description: Hugging Face
repository-code: 'https://github.com/fake-news-UFG/FactChecks.br'
repository-artifact: 'https://huggingface.co/datasets/fake-news-UFG/FactChecksbr'
keywords:
  - fake news
  - fact-checking
  - portuguese
license: MIT
