idg_hatexplain

This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.

https://github.com/srmarcballestero/idg_hatexplain

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: srmarcballestero
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 1.62 GB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files

README.md

Explaining Word Interactions using Integrated Directional Gradients

This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.

  • Author: Marc Ballestero Ribó.
  • Program: Master's in the Fundamental Principles of Data Science.
  • Institution: University of Barcelona
  • Advisors: Dr. Daniel Ortiz-Martínez, Prof. Dr. Petia Radeva.
  • Thesis Period: September 2024 - June 2025.
  • Grade: 10 / 10.

Abstract

Explainability methods are key for understanding the decision-making processes behind complex text models. In this thesis, we theoretically and empirically explore Integrated Directional Gradients (IDG), a method that can attribute importance to both individual features and their high-order interactions for deep neural network (DNN) models. We introduce evaluation metrics to quantitatively assess the quality of the generated explanations, and propose a framework to adapt word-level evaluation methods to high-order phrase-level interactions. Applying IDG to a BERT-based hate speech detection model, we compare its performance at the word level against well-established methods such as Integrated Gradients (IG) and Shapley Additive Explanations (SHAP). Our results indicate that, while IDG's word-level attributions are less faithful than those of IG and SHAP, they are the best-scoring ones in terms of plausibility. On the other hand, IDG's high-order importance attributions exhibit high faithfulness metrics, indicating that IDG can consider hierarchical dependencies that traditional methods overlook. Qualitative analyses further support the interpretability of IDG explanations. Overall, this thesis highlights the potential of high-order explanation methods for improving transparency in text models.
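For readers unfamiliar with the word-level baseline mentioned above, the following is a minimal sketch of Integrated Gradients (IG) on a toy model. It is not the code from this repository and it does not implement IDG: the logistic "model", its weights `w`, the all-zeros baseline, and the helper name `integrated_gradients` are all assumptions chosen for illustration. The sketch approximates the IG path integral with a midpoint Riemann sum and checks the completeness property, namely that the attributions sum to the difference between the model output at the input and at the baseline.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn:  callable returning the gradient of the model output
              with respect to its input (hypothetical helper).
    x, baseline: 1-D arrays of the same shape.
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    # Average the gradient over all interpolation points.
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad


# Toy model (assumption): a single logistic unit over three features.
w = np.array([1.5, -2.0, 0.5])


def model(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))


def model_grad(x):
    # Analytic gradient of the logistic unit: s * (1 - s) * w.
    s = model(x)
    return s * (1.0 - s) * w


x = np.array([1.0, 0.5, 1.0])
baseline = np.zeros_like(x)  # all-zeros baseline (assumption)
attributions = integrated_gradients(model_grad, x, baseline)

# Completeness: the attributions should sum (approximately)
# to model(x) - model(baseline).
gap = attributions.sum() - (model(x) - model(baseline))
print("attributions:", attributions)
print("completeness gap:", gap)
```

With a real text model, the gradient would come from automatic differentiation over token embeddings rather than an analytic formula, and the baseline is usually a padding or zero embedding; both choices affect the resulting attributions.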

Repository Structure

    .
      CITATION.cff    # Citation metadata
      data/           # Raw and preprocessed datasets
      docs/           # Thesis report and defence slides
      etc/            # Extra data, report figures, etc.
      LICENSE         # License file
      models/         # Saved model checkpoints (not publicly available)
      notebooks/      # Jupyter notebooks for experiments
      output/         # Generated results
      README.md       # You're here
      ruff.toml       # Linting configuration (ruff)
      src/            # Source code

Citation

Please cite this thesis using the following references:

Written Report

TBD

Code

Use the CITATION.cff file or the following BibTeX reference:

    @misc{BallesteroRibo2025IDG_HateXplain,
      author       = {Marc Ballestero Ribó},
      title        = {IDG\_HateXplain},
      year         = {2025},
      howpublished = {\url{https://github.com/srmarcballestero/IDG_HateXplain}},
      institution  = {University of Barcelona},
      url          = {https://srmarcballestero.github.io}
    }

Contributing

This project was developed as part of a Master's Thesis. If you'd like to build on it, feel free to fork the repo or open an issue.

Contact

If you have any questions, feedback, or collaboration ideas, feel free to reach out to me:

(*) This account will soon be deactivated.

Acknowledgements

This work was carried out under the supervision of Dr. Daniel Ortiz-Martínez and Prof. Dr. Petia Radeva. We thank the HateXplain authors for the dataset, and the developers of IDG for the codebase of the method.

Owner

  • Name: Marc Ballestero Ribó
  • Login: srmarcballestero
  • Kind: user
  • Location: Barcelona
  • Company: Universitat de Barcelona

Student of Mathematics and Physics at Universitat de Barcelona.

GitHub Events

Total
  • Public event: 1
  • Push event: 5
Last Year
  • Public event: 1
  • Push event: 5