https://github.com/amazon-science/minimax-fair
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (16.6%) to scientific vocabulary
Last synced: 10 months ago
·
JSON representation
Repository
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 75.2 KB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 7
- Open Issues: 2
- Releases: 0
Created about 5 years ago
· Last pushed about 5 years ago
https://github.com/amazon-science/minimax-fair/blob/main/
# MinimaxFair - Convergent Algorithms for (Relaxed) Minimax Group Fairness
MinimaxFair is a Python package for training ML models for (relaxed) minimax group fairness as discussed in https://arxiv.org/abs/2011.03108
This repository contains python code for
* learning models that achieve minimax group fairness for both regression and classification tasks
* learning models that minimize error subject to relaxed group fairness constraints
* visualizing tradeoffs between fairness and overall error
We also include some examples of fairness sensitive datasets for experimentation, though our package supports any
dataset formatted as a .csv whose columns are labeled
Our algorithms support the following training objectives (loss functions):
* Mean Squared Error
* 0/1 Loss
* Log Loss
* False Positive Rate
* False Negative Rate
Our algorithms support the following model classes:
* Linear Regression
* Logistic Regression
* Paired Regression Classifier from https://github.com/algowatchpenn/GerryFair
* Perceptron
* Multi-Layer Perceptron for Classification - Uses custom wrapper class to work with our algorithm
## Installation
To install the package and prepare for use, run:
```
git clone https://github.com/amazon-research/minimax-fair.git
```
The current version of the package uses the following packages: pandas, numpy, sklearn, matplotlib, pytorch, and
scipy. If you do not have these packages, please run the following command to install them:
```
pip install -r requirements.txt
```
## Using our package
To use our package, we recommend modifying the relevant parameters in `main_driver.py` (which are described in detail
within the file itself) and then running the following in the terminal from the root directory of MinimaxFair:
```
python3 main_driver.py
```
## Datasets
### Downloading pre-configured datasets.
To use one of the pre-configured datasets, you must first download the dataset from the corresponding online repository
, and place it in the `datasets` folder with the specified file name.
We have provided a brief description of each dataset, along with a link to a page from where it can be downloaded
. For some datasets, additional formatting or the running of provided scripts may be required.
#### Descriptions of pre-configured datasets
**Adult** - Predict whether income exceeds $50K/yr based on census data. http://archive.ics.uci.edu/ml/datasets/Adult
Downloading instructions: On the linked webpage click the "Data Folder" link and download `adult.data`. Rename the file to `adult.csv`
and place it in the `datasets` folder in this package. Then, in the `src` folder of this package, run the script
`clean_adult_data.py` which will create the the file `adult_cleaned.csv` which is formatted to be usedwith our pre-configured code.
**Seoul Bike Sharing** - The dataset contains count of public bikes rented at each hour in Seoul Bike sharing System
with
the
corresponding Weather data and Holidays information. https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand
Downloading instructions: On the linked webpage click the "Data Folder" link, download `SeoulBikeData.csv`, and place
in the `datasets` folder. If there are issues, make sure that the csv is saved in utf-8 encoding, and delete the "" symbol
in the column headers "Temperature(C)" and "Dew point temperature(C)".
**Communities and Crime** - Communities within the United States. The data combines socio-economic data from the
1990 US
Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. We added a column
for plurality race, which we used to divide communities into groups. https://github.com/algowatchpenn/GerryFair/tree/master/dataset
Downloading instructions: On the provided webpage, download `communities.csv` and place in the `datasets` folder. Then, in the `src` folder of this package, run the script
`clean_communities_data.py` which will create the the file `communities_cleaned.csv` which is formatted to be used
with our pre-configured code.
**COMPAS** - Data from Broward County Clerks Office, Broward County Sherrif's Office, Florida Department of Corrections
, compiled by ProPublica. https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis
Downloading instructions: Go to the linked github, download the file `compas-scores-two-years.csv` and place in the `datasets`
folder.
**Credit** - German Credit Data for predicting good and bad credit risks, compiled by Prof Dr. Hans Hoffman. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
Downloading instructions: Open "Data Folder" and download `german.data`. Rename to `german.csv` and place in `datasets`
folder. Then, in the `src` folder of this package, run the script
`setup_credit_dataset.py` which will create the the file `german_cleaned.csv` which is formatted to be used
with our pre-configured code.
**Default** - This dataset contains information on default payments, demographic factors, credit data, history of
payment
, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Downloading instructions: Follow the "Data Folder" link and download `default of credit card clients.xls`. Open the
file in
excel
and delete the top row with X1, X2, etc. Save the file as `default.csv` and place in `datasets` folder.
**Fires** - This is a regression task, where the aim is to predict the burned area of forest fires, in the
northeast region of Portugal, by using meteorological and other data. https://archive.ics.uci.edu/ml/datasets/forest+fires
Downloading instructions: In the Data Folder, download `forestfires.csv` directly into `datasets` folder.
**Heart** - This dataset contains the medical records of 299 patients who had heart failure, collected during their
follow-up period, where each patient profile has 13 clinical features. https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
Downloading instructions: Download `heart_failure_clinical_records_dataset.csv` into `datasets` folder.
**Bank Marketing** - The data is related with direct marketing campaigns (phone calls) of a Portuguese banking
institution. The classification goal is to predict if the client will subscribe a term deposit (variable y). https://archive.ics.uci.edu/ml/datasets/bank+marketing
Downloading instructions: In the Data Folder, download `bank.zip`, extract the folder, and move the files `bank-full
.csv` and `bank.csv` into the `datasets` folder.
**Student** - Predict student performance in secondary education (high school). https://archive.ics.uci.edu/ml/datasets/student+performance
Downloading instructions: In the Data folder, download `student.zip`, extract, and move the file `student-mat.csv` to the `datasets` folder.
If there are errors, the ";" characters may need to be replaced with ", " to meet the csv format. This can also be done by using the data-to-columns
tool in Excel and using the ";" character as a delimiter to split the first column.
**Wine** - Using chemical analysis determine the origin of wines. https://archive.ics.uci.edu/ml/datasets/wine+quality
Downloading instructions: In the Data Folder, download `winequality-red.data`, and `winequality-white.data` and
replace the `.data` extensions with `.csv` for both files, and place them in `datasets` folder. Then, run
`combine_wine_data.py` which will create a new csv named `winequality-full.csv` in the `datasets` folder which is
ready for use.
### Running the code with pre-configured datasets
To use a pre-configured dataset simply change the value of
`use_preconfigured_dataset` to True and select the relevant dataset index in `main_drive .py`. This selects the
relevant parameter values which can be viewed for each dataset in the file `dataset_mapping.py`.
This setting can also be used to generate customized synthetic data using the dataset index 0. Settings for
synthetic data generation are described in more detail in `main_driver.py`.
### Using custom datasets
Our algorithm can be used with any dataset that is in the form of csv files with labeled columns. To use a custom
dataset, do the following:
1) Place the csv file for the dataset in the `datasets` folder which is located in the root directory of the package
2) Specify the relevant parameters for interpreting the dataset in the section starting on line 63 of `main_driver
.py`
* Relevant parameters include target label, column labels for group categories, usable features, etc.
3) Ensure the other parameters in `main_driver.py` have the desired value and run the file from the command line as
described above
## License
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License").
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
## Citation
If you publish material that uses this code, you can use the following citation:
```js
@inproceedings{diana2021minimax,
title={Minimax Group Fairness: Algorithms and Experiments},
author={Diana, Emily and Gill, Wesley and Kearns, Michael and Kenthapadi, Krishnaram and Roth, Aaron},
year = {2021},
booktitle={Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society}
}
```
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
Last Year
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 14 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zahidirfan (2)
Pull Request Authors
- jimmyren23 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
requirements.txt
pypi
- matplotlib *
- numpy *
- pandas *
- scipy *
- sklearn *
- torch *