https://github.com/amazon-science/generalized-fairness-metrics

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 62.6 MB
Statistics
  • Stars: 14
  • Watchers: 2
  • Forks: 3
  • Open Issues: 7
  • Releases: 0
Created over 4 years ago · Last pushed over 3 years ago

## Generalized Fairness Metrics

This repository contains the source code for the paper:

> [Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics](https://arxiv.org/abs/2106.14574)\
> Paula Czarnowska, Yogarshi Vyas, Kashif Shah\
> Transactions of the Association for Computational Linguistics (TACL), 2021


__Reproducing classification experiments__:

1. Set the *MODELSDIR* variable in *get_predictions.sh* and the *OUTDIR* variable in *run_experiment.sh* to the directory where your models are (or will be) saved.

2. Set the *CUDA* variable in *setup.sh* to match your installed CUDA version.

3. Run *setup.sh* to:
    * fetch the required submodules
    * create and activate a new conda environment named *btools* from *requirements.yml*
    * download the SemEval valence classification data

4. Train the models from the config files in the *experiments* directory:
    > ./run_experiment.sh train=1 DATASET=semeval-2 exp=experiments/roberta.jsonnet\
    > ./run_experiment.sh train=1 DATASET=semeval-3 exp=experiments/roberta.jsonnet

5. Create the test suites and test the models. Result plots are saved in the *plots* directory:
    > conda activate btools\
    > python3 reproduce.py --classification --create-tests
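The five steps above can be sketched as a single shell session. The `sed` edits assume *MODELSDIR*, *OUTDIR* and *CUDA* are plain `VAR=value` assignments in the scripts (unverified here), and the paths and CUDA value are placeholders to substitute for your own setup:

```shell
# Steps 1-2: point the scripts at your directories and CUDA version
# (placeholder values; assumes plain VAR=value lines in the scripts).
sed -i 's|^MODELSDIR=.*|MODELSDIR=/path/to/models|' get_predictions.sh
sed -i 's|^OUTDIR=.*|OUTDIR=/path/to/models|'       run_experiment.sh
sed -i 's|^CUDA=.*|CUDA=cuda11.1|'                  setup.sh

# Step 3: submodules, conda environment "btools", SemEval data
./setup.sh

# Step 4: train one model per dataset/config
./run_experiment.sh train=1 DATASET=semeval-2 exp=experiments/roberta.jsonnet
./run_experiment.sh train=1 DATASET=semeval-3 exp=experiments/roberta.jsonnet

# Step 5: build test suites, evaluate, write plots to plots/
conda activate btools
python3 reproduce.py --classification --create-tests
```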


__Reproducing NER experiments__:

1. Run the setup steps (1 and 2 above).

2. Get the CoNLL-2003 data (https://www.clips.uantwerpen.be/conll2003/ner/) and place the *eng.train*, *eng.testa* and *eng.testb* files in the *datasets/conll2003/ner* directory.

3. Train the model:
    > ./run_experiment.sh train=1 DATASET=conll2003 exp=experiments/ner-roberta.jsonnet

4. Test the trained model:
    > python3 reproduce.py --ner

    or, if you haven't created the test suites yet:
    > python3 reproduce.py --ner --create-tests
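Put together, the NER steps look roughly like this (the CoNLL-2003 source path is a placeholder; the data must be obtained separately under its own license):

```shell
# Steps 1-2 of the classification setup are assumed to have been run,
# i.e. variables edited and ./setup.sh executed.

# Place the CoNLL-2003 splits where the configs expect them
# (/path/to/conll2003 is a placeholder for your local copy).
mkdir -p datasets/conll2003/ner
cp /path/to/conll2003/eng.train datasets/conll2003/ner/
cp /path/to/conll2003/eng.testa datasets/conll2003/ner/
cp /path/to/conll2003/eng.testb datasets/conll2003/ner/

# Train, then evaluate (drop --create-tests once the suites exist)
./run_experiment.sh train=1 DATASET=conll2003 exp=experiments/ner-roberta.jsonnet
conda activate btools
python3 reproduce.py --ner --create-tests
```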


__Metric implementations__:

Implementations of all metrics can be found in *expanded_checklist/checklist/tests*.\
The code for generalized metrics is located in *expanded_checklist/checklist/tests/abstract_tests/generalized_metrics.py*.
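The generalized metrics in the paper compare a per-group score either across pairs of protected groups or against a background score. As a loose illustration of that idea only (this is not the repository's API; the function and variable names below are invented for this sketch):

```python
from itertools import combinations

def pairwise_comparison_metric(group_scores):
    """Mean absolute score gap over all pairs of protected groups.

    `group_scores` maps a group name to a scalar per-group score,
    e.g. the model's mean predicted probability on that group's examples.
    """
    pairs = list(combinations(group_scores.values(), 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def background_comparison_metric(group_scores, background):
    """Mean absolute gap between each group's score and a background score,
    e.g. the score on the full, unperturbed test set."""
    return sum(abs(s - background) for s in group_scores.values()) / len(group_scores)

# Toy per-group scores (illustrative numbers only)
scores = {"group_a": 0.81, "group_b": 0.74, "group_c": 0.78}
print(round(pairwise_comparison_metric(scores), 4))        # → 0.0467
print(round(background_comparison_metric(scores, 0.80), 4))  # → 0.03
```

Both functions reduce per-group gaps with a mean; the paper also considers other aggregations (e.g. maximum gaps), which would swap `sum(...)/len(...)` for `max(...)` here.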

## Acknowledgements
The code in the expanded_checklist directory is a restructured and expanded version of the repository 
> https://github.com/marcotcr/checklist

which contains the code for behavioral testing of NLP models, as described in the following paper:
> [Beyond Accuracy: Behavioral Testing of NLP models with CheckList](https://aclanthology.org/2020.acl-main.442/)\
> Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh\
> Association for Computational Linguistics (ACL), 2020

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Watch event: 1
  • Pull request event: 4
  • Create event: 1
Last Year
  • Watch event: 1
  • Pull request event: 4
  • Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 50
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 months
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.78
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 48
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors

Issue Authors
  • (none)
Pull Request Authors
  • dependabot[bot] (32)
  • pczarnowska (1)

Top Labels

Issue Labels
  • (none)
Pull Request Labels
  • dependencies (32)
  • javascript (3)