Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: fangdai-dear
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 641 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 1
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

readme.md

Quasi-Pareto Improvement


Enhancing the Generalizability and Fairness of Ultrasonographical AI Model among Heterogeneous Thyroid Nodule Population by a Novel Quasi-Pareto Improvement

Siqiong Yao, Fang Dai, Peng Sun, Weituo Zhang, Biyun Qian, Hui Lu

Abstract

Artificial Intelligence (AI) models for medical diagnosis often face challenges of generalizability and fairness. We highlighted the algorithmic unfairness in a large thyroid ultrasound dataset with significant diagnostic performance disparities across subgroups linked causally to sample size imbalances. To address this, we introduced the Quasi-Pareto Improvement (QPI) approach and a deep learning implementation (QP-Net) combining multi-task learning and domain adaptation to improve model performance among disadvantaged subgroups without compromising overall population performance. On the thyroid ultrasound dataset, our method significantly mitigated the area under curve (AUC) disparity for three less-prevalent subgroups by 0.213, 0.112, and 0.173 while maintaining the AUC for dominant subgroups; we also further confirmed the generalizability of our approach on two public datasets: the ISIC2019 skin disease dataset and the CheXpert chest radiograph dataset. Here we show the QPI approach to be widely applicable in promoting AI for equitable healthcare outcomes.

For details, see the Nature Communications paper.
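The abstract measures fairness as the AUC disparity between subgroups, and it frames QPI as raising performance in disadvantaged subgroups without hurting the rest. The plain-Python sketch below illustrates both ideas. It is not the repository's evaluation code. The disparity definition (dominant-subgroup AUC minus each other subgroup's AUC) and the tolerance `tol` that decides whether a change still counts as a quasi-Pareto improvement are assumptions made for illustration, and all function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_aucs(labels: Sequence[int], scores: Sequence[float],
                  groups: Sequence[str]) -> Dict[str, float]:
    """AUC computed separately inside each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


def auc_gaps(aucs: Dict[str, float], dominant: str) -> Dict[str, float]:
    """Assumed disparity measure: dominant-subgroup AUC minus each other subgroup's AUC."""
    return {g: aucs[dominant] - a for g, a in aucs.items() if g != dominant}


def is_quasi_pareto_improvement(base: Dict[str, float], new: Dict[str, float],
                                disadvantaged: Set[str], tol: float = 0.01) -> bool:
    """Assumed criterion: every disadvantaged subgroup gains AUC, and no other
    subgroup loses more than `tol`."""
    for g, old in base.items():
        delta = new[g] - old
        if g in disadvantaged and delta <= 0:
            return False
        if g not in disadvantaged and delta < -tol:
            return False
    return True


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.1, 0.8, 0.3, 0.4, 0.6, 0.7, 0.2]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    aucs = subgroup_aucs(labels, scores, groups)
    print(aucs)                 # {'A': 1.0, 'B': 0.75}
    print(auc_gaps(aucs, "A"))  # {'B': 0.25}
    # B improves and A drops by only 0.005, within the assumed tolerance.
    print(is_quasi_pareto_improvement(aucs, {"A": 0.995, "B": 0.85}, {"B"}))  # True
```

The pairwise AUC is O(n_pos × n_neg), which is fine for a sketch; for large test sets a rank-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.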

https://github.com/fangdai-dear/QuasiParetoImprovement/scripts/Figure/figure.png

This repository contains:

  1. The code for this work, with only a brief description.
  2. The model architecture and training code.


Model architecture

[Figure 2: model architecture]
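The abstract describes QP-Net as combining multi-task learning with domain adaptation. One common generic way to write such a combination is a domain-adversarial (gradient-reversal) objective with an auxiliary task head. The notation here is assumed for illustration and is not taken from the paper: θ_f is a shared feature extractor, θ_y the main diagnosis head, θ_a an auxiliary-task head, θ_d a domain (subgroup) discriminator, and λ_a, λ_d are loss weights. QP-Net's actual loss may differ.

```latex
\begin{aligned}
E(\theta_f,\theta_y,\theta_a,\theta_d)
  &= \mathcal{L}_y(\theta_f,\theta_y)
   + \lambda_a\,\mathcal{L}_a(\theta_f,\theta_a)
   - \lambda_d\,\mathcal{L}_d(\theta_f,\theta_d),\\
(\hat\theta_f,\hat\theta_y,\hat\theta_a)
  &= \arg\min_{\theta_f,\theta_y,\theta_a} E,
  \qquad
  \hat\theta_d = \arg\max_{\theta_d} E.
\end{aligned}
```

Maximizing E over θ_d trains the discriminator to recognise the subgroup (it minimizes 𝓛_d), while minimizing E over θ_f pushes the shared features to carry less subgroup-specific information. In practice this saddle point is usually approached with a gradient reversal layer between the feature extractor and the discriminator.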

Install

This project uses requirements.txt. Install the dependencies with:

```sh
$ pip install -r requirements.txt
```
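Because requirements.txt pins exact versions (for example torch ==1.10.0 and torchvision ==0.11.0), it can help to confirm that an existing environment matches before running anything. The helper below is a hypothetical convenience script, not part of the repository. It only understands simple `name==version` lines and ignores comments, blank lines, and any other specifier style.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List, Tuple

# Matches "name==version" with optional spaces around "==".
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s;#]+)")


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract {package: pinned_version} from requirements-style lines."""
    pins = {}
    for line in lines:
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, pinned, installed) for every package that does not match.

    `installed` is "missing" when the package is not installed at all.
    """
    problems = []
    for name, wanted in pins.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = "missing"
        if have != wanted:
            problems.append((name, wanted, have))
    return problems
```

For example, `with open("requirements.txt") as fh: print(check_pins(parse_pins(fh)))` prints an empty list when every pinned package is installed at its pinned version.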

Datasets

  1. Thyroid ultrasound dataset. We have shared part of the thyroid ultrasound dataset for verification. Please refer to this article for other studies using this dataset. If you use this dataset in your research, please cite the reference given under Citation. A portion of the data from this article is publicly available on Hugging Face (https://huggingface.co/datasets/FangDai/ThyroidUltrasoundImages). To download it, you must register on Hugging Face and sign our data usage application before access is granted.

Please read the following information on data usage permissions and the conditions for accessing the full dataset. All data supporting the findings can be found within the article and its Supplementary Information. The thyroid datasets trained on and analyzed during this study are available in de-identified form to protect patient privacy. The minimum thyroid dataset required to interpret, verify, and extend the findings of this study has been deposited on Hugging Face under accession code https://huggingface.co/datasets/FangDai/Thyroid_Ultrasound_Images. This includes:

  • Pre-processed imaging data (ultrasound images with anonymized metadata).
  • Clinical feature tables (age, gender, tumor size) with all direct identifiers removed.

Due to ethical restrictions and patient confidentiality agreements, the full dataset (e.g., raw imaging data and detailed clinical records) cannot be made publicly available. Even after de-identification, detailed clinical records and high-resolution imaging data may pose a risk of re-identification, given the unique characteristics of thyroid cancer cases. Researchers who wish to access additional data for non-commercial academic purposes may submit a formal request to the corresponding author. Requests will be reviewed by the institutional ethics committee and data custodians. The following conditions apply:

  • Purpose: data will only be shared for research purposes that align with the original study objectives.
  • Access restrictions: requesters must sign a data use agreement prohibiting re-identification or redistribution.
  • Data retention: approved data will be available for 2 years from the date of publication.

This dataset contains 900 thyroid ultrasound images, categorized into three subtypes of thyroid carcinoma:

  • PTC (Papillary Thyroid Carcinoma)
  • FTC (Follicular Thyroid Carcinoma)
  • MTC (Medullary Thyroid Carcinoma)

The dataset is curated to support medical image classification and segmentation tasks, particularly for deep learning applications in thyroid cancer diagnosis.

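If the released images are stored locally with one folder per subtype, a small index makes it easy to check class balance before training. The layout `<root>/PTC`, `<root>/FTC`, `<root>/MTC`, the accepted image extensions, and the function names are assumptions for illustration. Adjust them to match the files actually downloaded from Hugging Face.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple, Union

SUBTYPES = ("PTC", "FTC", "MTC")                # assumed folder names, one per subtype
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}  # assumed image formats


def index_by_subtype(root: Union[str, Path]) -> List[Tuple[Path, str]]:
    """Collect (image_path, subtype) pairs from <root>/<subtype>/<image>."""
    items = []
    for subtype in SUBTYPES:
        folder = Path(root) / subtype
        if not folder.is_dir():
            continue  # a missing subtype folder is skipped rather than raising
        for path in sorted(folder.iterdir()):
            # Extension check is case-insensitive, so ".PNG" is accepted too.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                items.append((path, subtype))
    return items


def subtype_counts(items: List[Tuple[Path, str]]) -> Counter:
    """Number of images per subtype."""
    return Counter(label for _, label in items)
```

Usage: `subtype_counts(index_by_subtype("path/to/images"))`, where the path is a placeholder for wherever the dataset was extracted.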

Citation

```bibtex
@article{yao2024enhancing,
  title={Enhancing the fairness of AI prediction models by Quasi-Pareto improvement among heterogeneous thyroid nodule population},
  author={Yao, Siqiong and Dai, Fang and Sun, Peng and Zhang, Weituo and Qian, Biyun and Lu, Hui},
  journal={Nature Communications},
  volume={15},
  number={1},
  pages={1958},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
```

  2. MICCAI 2020 TN-SCUI ultrasound image dataset. This study took the clinical significance of the challenge into account and split the data following the challenge's own data split.

```sh
├─Thyroid
└─TNS
    ├─test
    │  ├─0
    │  └─1
    ├─train
    │  ├─0
    │  └─1
```

  3. CheXpert chest radiograph multi-classification dataset

```sh
├─CheXpert-v1.0
│  ├─train
│  │  └─patient00001
│  │      └─study1
│  │              view1_frontal.jpg
│  │
│  └─valid
│      └─patient64541
│          └─study1
│                  view1_frontal.jpg
```

  4. ISIC2019 skin disease multi-classification dataset

```sh
├─ISIC
│  ├─ISIC_2018
│  │      ISIC_0024306.jpg
│  │
│  └─ISIC_2019
│          ISIC_0000000.jpg
│
```

Partial thyroid ultrasonography data used in this study are subject to privacy restrictions, but may be anonymized and made available upon reasonable request to the corresponding author.
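The TN-SCUI layout stores each split (`train`, `test`) with one sub-folder per class label (`0`, `1`), presumably the two diagnostic classes. A minimal sketch for turning that layout into labelled file lists follows. It assumes every file inside a label folder is an image and that the folder name is the integer class label. `load_tns_splits` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path
from typing import Dict, List, Tuple, Union


def load_tns_splits(tns_root: Union[str, Path]) -> Dict[str, List[Tuple[Path, int]]]:
    """Map each split folder (e.g. 'train', 'test') to sorted (file_path, label) pairs.

    The label is taken from the parent folder name ('0' or '1').
    """
    splits = {}
    for split_dir in sorted(Path(tns_root).iterdir()):
        if not split_dir.is_dir():
            continue  # ignore stray files next to the split folders
        pairs = []
        for label_dir in sorted(split_dir.iterdir()):
            if label_dir.is_dir() and label_dir.name in ("0", "1"):
                label = int(label_dir.name)
                pairs.extend((p, label) for p in sorted(label_dir.iterdir()) if p.is_file())
        splits[split_dir.name] = pairs
    return splits
```

Usage: `splits = load_tns_splits("Thyroid/TNS")`, where the path is a placeholder; `splits["train"]` then holds the training files with their labels, ready to be wrapped in whatever dataset class the training code expects.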

Usage

Run the main script:

```sh
$ sh ./main.sh
```

The CSV directory contains:

```sh
├─CSV
│      CXP_female_age.csv
│      CXP_female_race.csv
│      CXP_male_age.csv
│      CXP_male_race.csv
│      CXP_test_age.csv
│      CXP_train_age.csv
│      CXP_train_race.csv
│      CXP_valid_race.csv
│      ISIC_2019_Test.csv
│      ISIC_2019_Training_age.csv
│      ISIC_2019_Training_sex.csv
│      ISIC_2019_valid.csv
```
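The CSV file names refer to subgroup attributes (sex, age, race). This matches the abstract's point that performance disparities are linked to subgroup sample-size imbalance. A quick way to inspect that imbalance is to count rows per subgroup value. The column name you pass in (for example a hypothetical `"Sex"` column) is an assumption, since the CSV schema is not documented here.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def subgroup_sizes(rows: Iterable[Mapping[str, str]], column: str) -> Counter:
    """Count how many rows fall into each value of `column`."""
    return Counter(row[column] for row in rows)


def subgroup_sizes_from_csv(path: str, column: str) -> Counter:
    """Same count, read from a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        # The Counter consumes every row before the file is closed.
        return subgroup_sizes(csv.DictReader(fh), column)


def imbalance_ratio(sizes: Counter) -> float:
    """Size of the largest subgroup divided by the size of the smallest."""
    if not sizes:
        raise ValueError("no subgroups to compare")
    return max(sizes.values()) / min(sizes.values())
```

A ratio far above 1 flags the less-prevalent subgroups that the abstract identifies as the ones where AUC tends to lag.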

Citing

If you use our code or any information from this work in your research, please consider citing it with the following BibTeX:

```bibtex
@article{yao2024enhancing,
  title={Enhancing the fairness of AI prediction models by Quasi-Pareto improvement among heterogeneous thyroid nodule population},
  author={Yao, Siqiong and Dai, Fang and Sun, Peng and Zhang, Weituo and Qian, Biyun and Lu, Hui},
  journal={Nature Communications},
  volume={15},
  number={1},
  pages={1958},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
```

Reference

All references are listed in the article.

Licence

This project is released under the MIT License.

Owner

  • Name: fangdai
  • Login: fangdai-dear
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this code, please cite it as below."
authors:
- family-names: Dai
  given-names: Fang
orcid: https://orcid.org/0009-0007-0727-5655
title: Enhancing the Fairness of AI among Heterogeneous Thyroid Nodule Population by Quasi-Pareto Improvement
version: v1.0.0
date-released: 2023-11-28

GitHub Events

Total
  • Push event: 14
Last Year
  • Push event: 14

Dependencies

requirements.txt pypi
  • Pillow ==8.4.0
  • PyJWT ==2.6.0
  • PyWavelets ==1.3.0
  • Pygments ==2.11.2
  • SimpleITK ==2.2.1
  • about-time ==3.1.1
  • aioboto3 ==10.4.0
  • aiobotocore ==2.4.2
  • aiohttp ==3.8.4
  • aioitertools ==0.11.0
  • aiosignal ==1.3.1
  • alive-progress ==2.4.1
  • asttokens ==2.0.5
  • async-timeout ==4.0.2
  • attrs ==22.2.0
  • backcall ==0.2.0
  • boto3 ==1.24.59
  • botocore ==1.27.59
  • certifi ==2022.9.24
  • charset-normalizer ==2.0.12
  • click ==8.1.3
  • cycler ==0.11.0
  • decorator ==4.4.2
  • deeplake ==3.2.21
  • dill ==0.3.6
  • efficientnet-pytorch ==0.7.1
  • entrypoints ==0.4
  • et-xmlfile ==1.1.0
  • executing ==0.8.3
  • fastcluster ==1.2.6
  • filetype ==1.2.0
  • fonttools ==4.28.5
  • frozenlist ==1.3.3
  • grapheme ==0.6.0
  • humbug ==0.3.1
  • idna ==3.3
  • imageio ==2.18.0
  • imageio-ffmpeg ==0.4.7
  • ipython ==8.1.1
  • jedi ==0.18.1
  • jmespath ==1.0.1
  • joblib ==1.1.0
  • kiwisolver ==1.3.2
  • matplotlib ==3.5.1
  • matplotlib-inline ==0.1.3
  • moviepy ==1.0.3
  • multidict ==6.0.4
  • multiprocess ==0.70.14
  • nest-asyncio ==1.5.6
  • networkx ==2.8
  • numcodecs ==0.11.0
  • opencv-python ==4.5.5.62
  • openpyxl ==3.0.10
  • packaging ==21.3
  • pandas ==1.3.5
  • parso ==0.8.3
  • pathos ==0.3.0
  • patsy ==0.5.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • plotly ==5.11.0
  • plotly-express ==0.4.1
  • pox ==0.3.2
  • ppft ==1.7.6.6
  • prettytable ==3.7.0
  • proglog ==0.1.10
  • prompt-toolkit ==3.0.28
  • protobuf ==3.19.4
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pynvml ==11.5.0
  • pyparsing ==3.0.6
  • python-dateutil ==2.8.2
  • pytz ==2021.3
  • requests ==2.27.1
  • s3transfer ==0.6.0
  • scikit-image ==0.19.2
  • scikit-learn ==1.0.2
  • scipy ==1.8.0
  • seaborn ==0.11.2
  • sklearn ==0.0
  • stack-data ==0.2.0
  • statsmodels ==0.13.2
  • tabulate ==0.9.0
  • tenacity ==8.1.0
  • tensorboardX ==2.4.1
  • threadpoolctl ==3.1.0
  • tifffile ==2022.4.22
  • torch ==1.10.0
  • torch-cka ==0.21
  • torchaudio ==0.10.0
  • torchsummary ==1.5.1
  • torchvision ==0.11.0
  • tqdm ==4.63.1
  • traitlets ==5.1.1
  • typing_extensions ==4.5.0
  • unzip ==1.0.0
  • urllib3 ==1.26.9
  • wcwidth ==0.2.5
  • wrapt ==1.15.0
  • yarl ==1.8.2
  • zepid ==0.9.1
.github/workflows/restrict-folder-access.yml actions