CanWeTrustReFAIR
Reproducing and evaluating ReFAIR, a fairness-aware framework for domain and multi-label classification in requirements engineering.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CanWeTrustReFAIR: A Replication Study 🔍⚖️
Overview 📝
This repository replicates and extends ReFAIR, a Context-Aware Recommender for Fairness Requirements Engineering. Our study evaluates the reproducibility of ReFAIR's findings and explores opportunities for improvement in fairness-aware systems.
Objectives:
- 🔍 Understand fairness challenges in requirements engineering.
- 📊 Validate the ReFAIR framework's performance on domain and multi-label classification tasks.
- 🚀 Enhance the framework for improved robustness, reliability, and impact.
Project Structure 📂
- Dataset/: Contains all datasets used in testing and replication.
- Evaluation/: Subfolders for results of RQ1 and RQ2.
- MLModels/: Notebooks for training the best models in ReFAIR. Pre-trained weights are also stored here.
- main.py: Executes training and evaluation pipelines for RQ1 and RQ2.
- MoJo_Distance.py: Script for calculating results for RQ3.
- requirements.txt: Contains all Python dependencies for the project.
Installation Requirements ⚙️
1. Clone the Repository
```bash
git clone https://github.com/AhmedRadwan02/CanWeTrustReFAIR.git
cd CanWeTrustReFAIR
```
2. Install Required Python Packages
Install dependencies using requirements.txt:
```bash
pip install -r requirements.txt
```
Alternatively, install packages manually:
```bash
pip install numpy pandas scikit-learn transformers nltk gensim fasttext lazypredict
pip install scikit-multilearn xgboost
pip install torch  # required for BERT embeddings
```
3. Pre-trained Embedding Models
The code will automatically download large pre-trained embedding models:
| Model | Size | Notes |
|-----------------------|------------|--------------------------------|
| FastText | ~6.8 GB | cc.en.300.bin |
| Word2Vec | ~3.5 GB | word2vec-google-news-300.bin |
| GloVe | ~800 MB | Zipped and converted to Word2Vec format. |
Storage Requirement: Ensure at least 12GB of free disk space for embedding models and additional space for the Python environment and project files.
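The GloVe conversion noted in the table amounts to prepending a `<vocab_size> <dim>` header line, which is the only difference between GloVe's text format and Word2Vec's. A minimal stdlib sketch of that step (file names are illustrative; the project relies on gensim for the actual conversion):

```python
# Convert a GloVe-format text file to Word2Vec text format by
# prepending a "<vocab_size> <dim>" header line.
# Sketch only: the project itself uses gensim for this step.
def glove_to_word2vec(glove_path: str, out_path: str) -> None:
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    if not lines:
        raise ValueError("empty GloVe file")
    dim = len(lines[0].split()) - 1  # tokens after the word are the vector
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{len(lines)} {dim}\n")  # Word2Vec text header
        f.writelines(lines)

if __name__ == "__main__":
    # Tiny fake GloVe file: "word v1 v2 v3" per line.
    with open("mini_glove.txt", "w", encoding="utf-8") as f:
        f.write("fair 0.1 0.2 0.3\nbias 0.4 0.5 0.6\n")
    glove_to_word2vec("mini_glove.txt", "mini_w2v.txt")
    with open("mini_w2v.txt", encoding="utf-8") as f:
        print(f.readline().strip())  # → 2 3
```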
How to Run the Project ▶️
Main Pipeline:
Run the `main.py` script to execute training and evaluation for RQ1 (Domain Classification) and RQ2 (Multi-Label Classification):

```bash
python main.py
```

RQ3: MoJo Distance:
After running `main.py`, execute `MoJo_Distance.py` to compute results for RQ3:

```bash
python MoJo_Distance.py
```

Evaluation Results:
Results for RQ1 and RQ2 will be saved under the `Evaluation/` folder.
Key Features 🚀
- Reproducibility: Fully replicates ReFAIR's domain and multi-label classification tasks.
- Flexible Pipeline: Supports multiple word embeddings (TF-IDF, Word2Vec, GloVe, FastText, BERT).
- Multi-Label Classification: Integrates Binary Relevance, Label Powerset, and Classifier Chains methods using scikit-multilearn.
- Evaluation Metrics: Reports F1-Score and Hamming Loss for rigorous performance analysis.
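Both reported metrics are simple functions of the predicted and true 0/1 label matrices; a dependency-free sketch on a hypothetical two-sample, three-label case (the pipeline itself uses scikit-learn's `f1_score` and `hamming_loss`):

```python
# Multi-label evaluation metrics on 0/1 label matrices.
# Sketch only: the project uses scikit-learn's implementations.
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total

def micro_f1(y_true, y_pred):
    """F1 with all label slots pooled together (micro-averaged)."""
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]  # one missed label out of six slots
print(round(hamming_loss(y_true, y_pred), 3))  # → 0.167
print(micro_f1(y_true, y_pred))                # → 0.8
```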
Results Overview 📊
| Research Question | Best Model                 | Embedding | Metric       | Score |
|-------------------|----------------------------|-----------|--------------|-------|
| RQ1               | XGBClassifier              | BERT      | F1-Score     | 98.4% |
| RQ2               | LinearSVC + Label Powerset | GloVe     | F1-Score     | 88.9% |
|                   |                            |           | Hamming Loss | 0.392 |
References 📚
Ferrara, C., Casillo, F., Gravino, C., De Lucia, A., & Palomba, F. (2024, April).
ReFAIR: Toward a Context-Aware Recommender for Fairness Requirements Engineering.
In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (pp. 1–12).

Chen, Z., Zhang, J. M., Sarro, F., & Harman, M. (2022, November).
MAAT: A Novel Ensemble Approach to Addressing Fairness and Performance Bugs for Machine Learning Software.
In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1122–1134).
Contributors 👥
- Ahmed Radwan
- Claudia Farkas
- Amir Haeri
Owner
- Name: Ahmed
- Login: AhmedRadwan02
- Kind: user
- Repositories: 1
- Profile: https://github.com/AhmedRadwan02
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this repository, please cite it as below."
authors:
- family-names: "Radwan"
given-names: "Ahmed"
- family-names: "Farkas"
given-names: "Claudia"
- family-names: "Haeri"
given-names: "Amir"
orcid: "https://orcid.org/0000-0000-0000-0000"
title: "CanWeTrustReFAIR"
version: "1.0.0"
doi: "10.5281/zenodo.1234"
date-released: "2024-12-03"
url: "https://github.com/AhmedRadwan02/CanWeTrustReFAIR"
GitHub Events
Total
- Watch event: 2
- Member event: 3
- Push event: 57
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 2
- Member event: 3
- Push event: 57
- Fork event: 2
- Create event: 1
Dependencies
- fasttext *
- gensim *
- jupyter *
- lazypredict *
- nltk *
- notebook *
- numpy *
- pandas *
- scikit-learn *
- scikit-multilearn *
- torch *
- transformers *
- xgboost *