idg_hatexplain

This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.

https://github.com/srmarcballestero/idg_hatexplain

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: srmarcballestero
License: MIT
Language: Jupyter Notebook
Default Branch: main
Size: 1.62 GB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

README.md

Explaining Word Interactions using Integrated Directional Gradients

This repository contains the code developed for my Master's Thesis "Explaining Word Interactions using Integrated Directional Gradients", part of the Master in the Fundamental Principles of Data Science of the University of Barcelona.

Author: Marc Ballestero Ribó.
Program: Master's in the Fundamental Principles of Data Science.
Institution: University of Barcelona
Advisors: Dr. Daniel Ortiz-Martínez, Prof. Dr. Petia Radeva.
Thesis Period: September 2024 - June 2025.
Qualification: 10 / 10.

Abstract

Explainability methods are key for understanding the decision-making processes behind complex text models. In this thesis, we theoretically and empirically explore Integrated Directional Gradients (IDG), a method that can attribute importance to both individual features and their high-order interactions for deep neural network (DNN) models. We introduce evaluation metrics to quantitatively assess the quality of the generated explanations, and propose a framework to adapt word-level evaluation methods to high-order phrase-level interactions. Applying IDG to a BERT-based hate speech detection model, we compare its performance at the word level against well-established methods such as Integrated Gradients (IG) and Shapley Additive Explanations (SHAP). Our results indicate that, while IDG's word-level attributions are less faithful than those of IG and SHAP, they are the best-scoring ones in terms of plausibility. On the other hand, IDG's high-order importance attributions exhibit high faithfulness metrics, indicating that IDG can consider hierarchical dependencies that traditional methods overlook. Qualitative analyses further support the interpretability of IDG explanations. Overall, this thesis highlights the potential of high-order explanation methods for improving transparency in text models.
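For readers unfamiliar with the baselines named in the abstract, here is a minimal sketch of Integrated Gradients (IG) on a toy function. It is not IDG and it is not taken from this repository: the toy `model`, its hand-written gradient `model_grad`, the all-zeros baseline, and the number of integration steps are all assumptions made for illustration. The sketch approximates the path integral of the gradient from the baseline to the input and then checks the completeness property, which says the attributions should sum to the difference between the model output at the input and at the baseline.

```python
import math


def model(x):
    """Toy differentiable 'model' (hypothetical): a logistic unit with one
    pairwise interaction term between features 0 and 2."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.5 * x[0] * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x)
    dz = [1.5 + 0.5 * x[2], -2.0, 0.5 * x[0]]
    return [p * (1.0 - p) * d for d in dz]


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight
    line from `baseline` to `x`."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input

attributions = integrated_gradients(x, baseline, model_grad)

# Completeness: attributions should sum to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))

print("attributions:", attributions)
print("completeness gap:", completeness_gap)
```

Note that IG assigns one score per feature, so the contribution of the interaction term between features 0 and 2 is split across those two features rather than reported separately. Attributing importance to such interactions directly is the gap that high-order methods like IDG aim to address.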

Repository Structure

.
    CITATION.cff    # Citation metadata
    data/           # Raw and preprocessed datasets
    docs/           # Thesis report and defence slides
    etc/            # Extra data, report figures, etc.
    LICENSE         # License file
    models/         # Saved model checkpoints (not publicly available)
    notebooks/      # Jupyter notebooks for experiments
    output/         # Generated results
    README.md       # You're here
    ruff.toml       # Linting configuration (ruff)
    src/            # Source code

Citation

Please cite this thesis using the following references:

Written Report

TBD

Code

Use the CITATION.cff file or the following BibTeX reference:

    @misc{BallesteroRibo2025IDG_HateXplain,
      author       = {Marc Ballestero Ribó},
      title        = {IDG\_HateXplain},
      year         = {2025},
      howpublished = {\url{https://github.com/srmarcballestero/IDG_HateXplain}},
      institution  = {University of Barcelona},
      url          = {https://srmarcballestero.github.io}
    }

Contributing

This project was developed as part of a Master's Thesis. If you'd like to build on it, feel free to fork the repo or open an issue.

Contact

If you have any questions, feedback, or collaboration ideas, feel free to reach out to me:

(*) This account will soon be deactivated.

Acknowledgements

This work was carried out under the supervision of Dr. Daniel Ortiz-Martínez and Prof. Dr. Petia Radeva. We thank the HateXplain authors for the dataset, and the developers of IDG for the codebase of the method.

Owner

Name: Marc Ballestero Ribó
Login: srmarcballestero
Kind: user
Location: Barcelona
Company: Universitat de Barcelona

Repositories: 1
Profile: https://github.com/srmarcballestero

Student of Mathematics and Physics at Universitat de Barcelona.

GitHub Events

Total

Public event: 1
Push event: 5

Last Year

Public event: 1
Push event: 5


Science Score: 26.0%
