hateful_memes-hate_detectron

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

https://github.com/rizavelioglu/hateful_memes-hate_detectron

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

challenge hateful-memes hateful-memes-challenge multimodal-deep-learning vision-and-language

Repository

Basic Info
Statistics
  • Stars: 59
  • Watchers: 1
  • Forks: 18
  • Open Issues: 6
  • Releases: 0
Created about 5 years ago · Last pushed about 2 years ago

README.md

Hateful Memes Challenge-Team HateDetectron Submissions

Check out the paper on arXiv, as well as my thesis, which offers an in-depth analysis of the approach and an overview of multimodal research and its foundations.

This repository contains all the code used in the Hateful Memes Challenge by Facebook AI. The work is done and documented in two main Jupyter notebooks:

  • The 'reproducing results' notebook
  • The 'end-to-end' notebook

The first notebook only reproduces the results of the Phase-2 submissions by team HateDetectron: it loads the final models and generates predictions for the test set. See the end-to-end notebook for the whole approach in detail: how the models are trained, how the image features are extracted, which datasets are used, etc.


About the Competition

The Hateful Memes Challenge and Data Set is a competition and open source data set designed to measure progress in multimodal vision-and-language classification.

Check out the following sources to learn more about the challenge:

  • Facebook AI
  • DrivenData
  • Competition Paper

Competition Results:

We placed 3rd out of 3,173 participants in total!

See the official Leaderboard here!


Repository structure

The repository consists of the following folders:

hyperparameter_sweep/ : where the scripts for hyperparameter search are stored.

  • get_27_models.py: iterates through the folders created for the hyperparameter search, collects the metrics (ROC-AUC, accuracy) on the 'dev_unseen' set, and stores them in a pd.DataFrame. It then sorts the models by AUROC and moves the best 27 models into a generated folder `majorityvoting_models/`.
  • remove_unused_file.py: removes unused files, e.g. old checkpoints, to free up disk space.
  • sweep.py: defines the hyperparameters and starts the search by calling /sweep.sh.
  • sweep.sh: the MMF CLI command that runs training with a given dataset, parameters, etc.
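
The model-selection step described for get_27_models.py can be sketched roughly as follows. This is a minimal illustration and not the repository's script: the function names `select_top_models` and `move_runs`, the metric keys, and the example values are assumptions made for this sketch, and plain Python lists stand in for the pd.DataFrame to keep it dependency-free.

```python
import shutil
from pathlib import Path


def select_top_models(metrics, k=27):
    """Return the k runs with the highest AUROC, best first.

    `metrics` is a list of dicts with (assumed) keys "name", "auroc", "accuracy".
    """
    ranked = sorted(metrics, key=lambda run: run["auroc"], reverse=True)
    return ranked[:k]


def move_runs(runs, sweep_dir, target_dir):
    """Move each selected run folder from sweep_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for run in runs:
        shutil.move(str(Path(sweep_dir) / run["name"]), str(target / run["name"]))


# Hypothetical metrics for illustration only, not results from the challenge.
example_metrics = [
    {"name": "run_a", "auroc": 0.71, "accuracy": 0.66},
    {"name": "run_b", "auroc": 0.78, "accuracy": 0.70},
    {"name": "run_c", "auroc": 0.74, "accuracy": 0.72},
]
best_two = select_top_models(example_metrics, k=2)
print([run["name"] for run in best_two])  # ['run_b', 'run_c']
```

Ranking on AUROC alone means accuracy is collected for reference but does not affect which runs are kept, which matches the description above.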

notebooks/ : where Jupyter notebooks are stored.

  • end2end_process.ipynb: presents the whole approach end-to-end: expanding the data, image feature extraction, hyperparameter search, fine-tuning, and majority voting.
  • reproduce_submissions.ipynb: loads our fine-tuned (final) models and generates predictions.
  • label_memotion.ipynb: uses /utils/label_memotion.py to label memes from Memotion and save them in an appropriate form.
  • simple_model.ipynb: includes a simple multimodal model implementation, also known as 'mid-level concat fusion'. We train the model and generate a submission for the challenge test set.
  • benchmarks.ipynb: reproduces the benchmark results.
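
The majority-voting step mentioned for the end-to-end notebook can be illustrated with a short sketch. It assumes each model outputs a hard 0/1 label per meme; the function name `majority_vote` and the example predictions are hypothetical, and the notebook itself may combine predictions differently.

```python
def majority_vote(predictions):
    """Combine binary predictions from several models into one label per sample.

    `predictions` is a list of per-model lists, each holding 0/1 labels
    for the same samples in the same order.
    """
    n_models = len(predictions)
    votes_per_sample = zip(*predictions)
    # A sample is labeled 1 only if strictly more than half of the models say 1.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in votes_per_sample]


# Hypothetical predictions from three models on four memes.
model_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
final_labels = majority_vote(model_predictions)
print(final_labels)  # [1, 1, 1, 0]
```

With an odd number of voters, such as the 27 models kept after the sweep, a binary vote can never tie.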

utils/ : where helper scripts are stored, e.g. for labeling the Memotion dataset and merging the two datasets.

  • concat_memotion-hm.py: concatenates the labeled Memotion samples and the Hateful Memes samples and saves them in a new train.jsonl file.
  • generate_submission.sh: generates predictions for the 'test_unseen' set (the Phase-2 test set).
  • label_memotion.jsonl: contains the memes from the Memotion dataset that we labeled.
  • label_memotion.py: the script for labeling the Memotion dataset. It iterates over the samples in Memotion, and the labeler labels each sample by entering 1 or 0 on the keyboard. The labels and the sample metadata are saved at the end as label_memotion.jsonl.
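
The merge performed by concat_memotion-hm.py amounts to concatenating two JSON Lines files. The sketch below is an assumption-laden illustration rather than the script itself: the helper names `read_jsonl` and `concat_jsonl`, the record fields, and the temporary file names are all made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def concat_jsonl(input_paths, output_path):
    """Write the records of all input files, in order, to one JSONL file."""
    records = []
    for path in input_paths:
        records.extend(read_jsonl(path))
    with open(output_path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return len(records)


# Demo on throwaway files; field names and values are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    hateful_memes = Path(tmp) / "hm_train.jsonl"
    memotion = Path(tmp) / "memotion_labeled.jsonl"
    merged_path = Path(tmp) / "train.jsonl"
    hateful_memes.write_text(
        '{"id": 1, "label": 0, "text": "a"}\n{"id": 2, "label": 1, "text": "b"}\n',
        encoding="utf-8",
    )
    memotion.write_text('{"id": 3, "label": 0, "text": "c"}\n', encoding="utf-8")
    total = concat_jsonl([hateful_memes, memotion], merged_path)
    merged = read_jsonl(merged_path)

print(total, [record["id"] for record in merged])  # 3 [1, 2, 3]
```

Writing the result to a new file leaves both source files untouched, so the original training set can still be used on its own.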


Citation:

@article{velioglu2020hateful,
  author = {Velioglu, Riza and Rose, Jewgeni},
  title = {Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge},
  doi = {https://doi.org/jhb3},
  publisher = {arXiv},
  year = {2020},
}

Please also consider citing my thesis:

@mastersthesis{velioglu2021detecting,
  title = "Detecting Hate Speech In Multimodal Memes Using Vision-Language Models",
  author = "Velioglu, Riza",
  school = "Bielefeld University",
  year = "2021",
  url = "http://rizavelioglu.github.io/files/RizaVelioglu-MScThesis.pdf"
}


Owner

  • Name: Riza Velioglu
  • Login: rizavelioglu
  • Kind: user
  • Location: Bielefeld, Germany
  • Company: Bielefeld University

Ph.D. candidate in Machine Learning | Co-founder & CRO @recommendy

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please consider citing it."
title: "HateDetectron"
abstract: "This repository implements methods presented in the paper: Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge."
authors:
- family-names: "Velioglu"
  given-names: "Riza"
  orcid: https://orcid.org/0000-0002-2160-4976
preferred-citation:
  type: article
  title: "Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge"
  doi: 10.48550/arXiv.2012.12975
  url: https://arxiv.org/abs/2012.12975
  journal: arXiv
  authors:
  - family-names: "Velioglu"
    given-names: "Riza"
  - family-names: "Rose"
    given-names: "Jewgeni"
  year: 2020
keywords:
- artificialintelligence
- deeplearning
- hatefulmemes
repository-code: "https://github.com/rizavelioglu/hateful_memes-hate_detectron"

GitHub Events

Total
  • Watch event: 10
  • Fork event: 1
Last Year
  • Watch event: 10
  • Fork event: 1