califair-em
Official implementation of GUIDE-AI @ SIGMOD paper "Threshold-Independent Fair Matching through Score Calibration"
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mhmoslemi2338
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://dl.acm.org/doi/abs/10.1145/3665601.3669845
- Size: 19.3 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Official implementation of GUIDE-AI @ SIGMOD 2024 paper "Threshold-Independent Fair Matching through Score Calibration"
Abstract
Entity Matching (EM) is a critical task in numerous fields, such as healthcare, finance, and public administration, as it identifies records that refer to the same entity within or across different databases. EM faces considerable challenges, particularly with false positives and negatives. These are typically addressed by generating matching scores and applying thresholds to balance false positives and negatives in various contexts. However, adjusting these thresholds can affect the fairness of the outcomes, a critical factor that remains largely overlooked in current fair EM research. The existing body of research on fair EM tends to concentrate on static thresholds, neglecting their critical impact on fairness. To address this, we introduce a new approach in EM using recent metrics for evaluating biases in score-based binary classification, particularly through the lens of distributional parity. This approach enables the application of various bias metrics like equalized odds, equal opportunity, and demographic parity without depending on threshold settings. Our experiments with leading matching methods reveal potential biases, and by applying a calibration technique for EM scores using Wasserstein barycenters, we not only mitigate these biases but also preserve accuracy across real-world datasets. This paper contributes to the field of fairness in data cleaning, especially within EM, which is a central task in data cleaning, by promoting a method for generating matching scores that reduce biases across different thresholds.
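To make the calibration idea in the abstract more concrete, here is a minimal sketch of score calibration toward a one-dimensional Wasserstein barycenter. It is not the code from this repository: the function name `barycenter_calibrate`, the quantile-grid size, and the synthetic beta-distributed scores are all assumptions made for illustration. It relies on the fact that, in one dimension, the quantile function of the Wasserstein-2 barycenter is the weighted average of the group quantile functions.

```python
import numpy as np


def barycenter_calibrate(scores, groups, n_quantiles=101):
    """Map each group's scores onto the 1-D Wasserstein-2 barycenter
    of the per-group score distributions (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    levels = np.linspace(0.0, 1.0, n_quantiles)

    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()

    # Empirical quantile function of every group on a shared grid of levels.
    group_q = np.stack([np.quantile(scores[groups == g], levels) for g in labels])

    # In 1-D, the barycenter's quantile function is the weighted average
    # of the group quantile functions.
    bary_q = weights @ group_q

    calibrated = np.empty_like(scores)
    for g, q in zip(labels, group_q):
        mask = groups == g
        # Approximate the within-group CDF value of each score ...
        u = np.interp(scores[mask], q, levels)
        # ... and read off the barycenter quantile at that level.
        calibrated[mask] = np.interp(u, levels, bary_q)
    return calibrated


# Synthetic example: two groups whose matching scores are distributed differently.
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
demo_groups = np.array([0] * 500 + [1] * 500)
demo_calibrated = barycenter_calibrate(demo_scores, demo_groups)
```

Because the mapping is monotone within each group, the ranking of pairs inside a group is unchanged; only the scale on which the groups are compared moves. How the paper trades this off against accuracy is described in the paper itself, not in this sketch.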
Data Directory
You can find all the data we used in the DATA directory.
The datasets are from the paper "Deep Learning for Entity Matching: A Design Space Exploration" (SIGMOD 2018), available at https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md
Implementation Details
Each of the directories DITTO, DeepMatcher, EMTransformer, HierGAT, HierMatcher, and Magellan contains the implementation of the corresponding method and instructions for obtaining its results. Each directory also contains a Python script whose name starts with Train_, which you can use to retrain the network. After training, the scores for the test data are automatically saved in the SCORES directory.
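As a convenience, the training scripts described above could be located with a small helper like the one below. This is a sketch, not part of the repository: `find_train_scripts` is a hypothetical name, and it assumes the scripts carry a `.py` extension and sit at the top level of each method directory.

```python
from pathlib import Path

# Method directories named in this README.
METHOD_DIRS = ["DITTO", "DeepMatcher", "EMTransformer", "HierGAT", "HierMatcher", "Magellan"]


def find_train_scripts(repo_root="."):
    """Return {method: [paths of Train_*.py scripts]} for each method directory.

    Assumes the scripts live directly inside each method directory;
    directories that do not exist simply map to an empty list.
    """
    root = Path(repo_root)
    found = {}
    for method in METHOD_DIRS:
        method_dir = root / method
        if method_dir.is_dir():
            found[method] = sorted(method_dir.glob("Train_*.py"))
        else:
            found[method] = []
    return found
```

Calling `find_train_scripts()` from the repository root lists the scripts per method; each one can then be run according to the instructions in its own directory.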
Regenerating Experiments
To regenerate the experiments, run the experiments.ipynb notebook, which uses the scores in the SCORES directory. The notebook also saves some variables in .pkl format, the final results and measurements in .csv format, and figures in .pdf format in the FIGURES directory.
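For readers who want a feel for a threshold-independent measurement before opening the notebook, the sketch below averages the demographic parity gap between two groups over a grid of thresholds in [0, 1]. It is an illustration only and not the notebook's code: the function name `threshold_averaged_gap`, the two-group restriction, and the uniform threshold grid are assumptions. For scores in [0, 1], this average approximates the Wasserstein-1 distance between the two groups' score distributions.

```python
import numpy as np


def threshold_averaged_gap(scores, groups, n_thresholds=1001):
    """Average, over thresholds in [0, 1], of the absolute difference in the
    fraction of pairs scored above the threshold between two groups."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    if len(labels) != 2:
        raise ValueError("this sketch expects exactly two groups")

    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    a = scores[groups == labels[0]]
    b = scores[groups == labels[1]]

    # Share of each group predicted as a match at every threshold.
    rate_a = (a[:, None] > thresholds[None, :]).mean(axis=0)
    rate_b = (b[:, None] > thresholds[None, :]).mean(axis=0)

    # Averaging over the uniform grid approximates the integral over thresholds.
    return float(np.mean(np.abs(rate_a - rate_b)))
```

A value near zero means the two groups are matched at similar rates whatever threshold is chosen. Analogous quantities for equal opportunity or equalized odds would condition on the true label first, which this sketch does not do.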
Citation
If you use this code, please cite our paper:
```bibtex
@inproceedings{moslemi2024threshold,
  title={Threshold-Independent Fair Matching through Score Calibration},
  author={Moslemi, Mohammad Hossein and Milani, Mostafa},
  booktitle={Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI},
  pages={40--44},
  year={2024}
}
```
Owner
- Name: mohammad hosein moslemi
- Login: mhmoslemi2338
- Kind: user
- Location: Tehran, Iran
- Company: Sharif University of Tech, Tehran
- Website: http://ee.sharif.edu/~moslemi.mohammdhosein/
- Twitter: mh_moslemi
- Repositories: 7
- Profile: https://github.com/mhmoslemi2338
BSc in Electrical Engineering at Sharif University of Technology. My main interests are computer vision and image processing, especially medical image processing.
Citation (CITATION.cff)
cff-version: 1.2.0
message: >
  If you use this code, please cite our GUIDE-AI 2024 paper.
title: Threshold-Independent Fair Matching through Score Calibration
version: "1.0.0"
doi: 10.1145/3665601.3669845
date-released: 2024-06-14
authors:
  - family-names: Moslemi
    given-names: Mohammad Hossein
    orcid: https://orcid.org/0009-0002-0278-4665
  - family-names: Milani
    given-names: Mostafa
repository-code: https://github.com/mhmoslemi2338/CaliFair-EM
url: https://doi.org/10.1145/3665601.3669845
license: MIT
preferred-citation:
  type: conference-paper
  title: Threshold-Independent Fair Matching through Score Calibration
  authors:
    - family-names: Moslemi
      given-names: Mohammad Hossein
      orcid: https://orcid.org/0009-0002-0278-4665
    - family-names: Milani
      given-names: Mostafa
  conference-name: Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI '24)
  year: 2024
  pages: 40–44
  doi: 10.1145/3665601.3669845
GitHub Events
Total
- Push event: 5
Last Year
- Push event: 5