idg_hatexplain
This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: srmarcballestero
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.62 GB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Explaining Word Interactions using Integrated Directional Gradients
This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.
- Author: Marc Ballestero Ribó.
- Program: Master's in the Fundamental Principles of Data Science.
- Institution: University of Barcelona.
- Advisors: Dr. Daniel Ortiz-Martínez, Prof. Dr. Petia Radeva.
- Thesis Period: September 2024 - June 2025.
- Grade: 10 / 10.
Abstract
Explainability methods are key for understanding the decision-making processes behind complex text models. In this thesis, we theoretically and empirically explore Integrated Directional Gradients (IDG), a method that can attribute importance to both individual features and their high-order interactions for deep neural network (DNN) models. We introduce evaluation metrics to quantitatively assess the quality of the generated explanations, and propose a framework to adapt word-level evaluation methods to high-order phrase-level interactions. Applying IDG to a BERT-based hate speech detection model, we compare its performance at the word level against well-established methods such as Integrated Gradients (IG) and Shapley Additive Explanations (SHAP). Our results indicate that, while IDG's word-level attributions are less faithful than those of IG and SHAP, they are the best-scoring ones in terms of plausibility. On the other hand, IDG's high-order importance attributions exhibit high faithfulness metrics, indicating that IDG can consider hierarchical dependencies that traditional methods overlook. Qualitative analyses further support the interpretability of IDG explanations. Overall, this thesis highlights the potential of high-order explanation methods for improving transparency in text models.
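As a point of reference for the word-level baseline named in the abstract, the following is a minimal sketch of Integrated Gradients (IG) on a toy function. It is not the thesis code and not IDG: `toy_model`, `toy_gradient`, the weights, and the zero baseline are all hypothetical stand-ins chosen so that the example runs with NumPy alone. It approximates the path integral with a midpoint Riemann sum and checks the completeness property, namely that the attributions sum to the difference between the model output at the input and at the baseline.

```python
import numpy as np

# Hypothetical weights for a toy 3-feature "model" (assumption, not from the thesis).
W = np.array([1.0, -2.0, 0.5])
INTERACTION = 1.5  # strength of a pairwise interaction between features 0 and 1


def toy_model(x: np.ndarray) -> float:
    """Sigmoid over a weighted sum plus one pairwise interaction term."""
    z = x @ W + INTERACTION * x[0] * x[1]
    return float(1.0 / (1.0 + np.exp(-z)))


def toy_gradient(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of toy_model with respect to x."""
    p = toy_model(x)
    dz = W.copy()
    dz[0] += INTERACTION * x[1]
    dz[1] += INTERACTION * x[0]
    return p * (1.0 - p) * dz


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Approximate IG along the straight path from baseline to x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(point) for point in path])
    return (x - baseline) * grads.mean(axis=0)


x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, toy_gradient)

# Completeness: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(attributions.sum() - (toy_model(x) - toy_model(baseline)))
print("attributions:", attributions)
print("completeness gap:", completeness_gap)
```

Note that IG assigns one score per feature, so the contribution of the interaction between features 0 and 1 is split across their individual scores rather than reported separately. This is the kind of limitation that motivates methods attributing importance to feature groups.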
Repository Structure
.
CITATION.cff # Citation metadata
data/ # Raw and preprocessed datasets
docs/ # Thesis report and defence slides
etc/ # Extra data, report figures, etc.
LICENSE # License file
models/ # Saved model checkpoints (not publicly available)
notebooks/ # Jupyter notebooks for experiments
output/ # Generated results
README.md # You're here
ruff.toml # Linting configuration (ruff)
src/ # Source code
Citation
Please cite this thesis using the following references:
Written Report
TBD
Code
Use the CITATION.cff file or the following BibTeX reference:
@misc{BallesteroRibo2025IDG_HateXplain,
  author       = {Marc Ballestero Ribó},
  title        = {IDG\_HateXplain},
  year         = {2025},
  howpublished = {\url{https://github.com/srmarcballestero/IDG_HateXplain}},
  institution  = {University of Barcelona},
  url          = {https://srmarcballestero.github.io}
}
Contributing
This project was developed as part of a Master's Thesis. If you'd like to build on it, feel free to fork the repo or open an issue.
Contact
If you have any questions, feedback, or collaboration ideas, feel free to reach out to me:
Email (*): marcballestero@ub.edu
Website: https://srmarcballestero.github.io
GitHub: @srmarcballestero
LinkedIn: Marc Ballestero Ribó
(*) This account will soon be deactivated.
Acknowledgements
This work was carried out under the supervision of Dr. Daniel Ortiz-Martínez and Prof. Dr. Petia Radeva. We thank the HateXplain authors for the dataset, and the developers of IDG for the codebase of the method.
Owner
- Name: Marc Ballestero Ribó
- Login: srmarcballestero
- Kind: user
- Location: Barcelona
- Company: Universitat de Barcelona
- Repositories: 1
- Profile: https://github.com/srmarcballestero
Student of Mathematics and Physics at Universitat de Barcelona.
GitHub Events
Total
- Public event: 1
- Push event: 5
Last Year
- Public event: 1
- Push event: 5