https://github.com/aailab-uct/feature-scaling-leakge-in-vpd

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aailab-uct
  • License: agpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 153 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

Feature Scaling Induced Data Leakage Quantification in Machine-learning Based Voice Pathology Detection

This repository contains the code for the paper "Feature Scaling Induced Data Leakage Quantification in Machine-learning Based Voice Pathology Detection" by Jan Vrba, Jakub Steinbach, Tomáš Jirsa and Noriyasu Homma. DOI: XYZ
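The sketch below illustrates the kind of leakage the paper quantifies. It compares a z-score scaler fitted on the whole dataset before splitting (leaky) with one fitted on the training portion only. It uses plain NumPy on synthetic data. The helper names (`fit_standardizer`, `apply_standardizer`) and the 80/20 split are illustrative assumptions and are not taken from this repository's code.

```python
import numpy as np


def fit_standardizer(x):
    """Hypothetical helper: per-feature mean and standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def apply_standardizer(x, mean, std):
    """Z-score the samples with previously fitted statistics."""
    return (x - mean) / std


rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in data
indices = rng.permutation(len(features))
train_idx, test_idx = indices[:80], indices[80:]

# Leaky: scaling statistics are computed on all samples, including the test set.
leaky_mean, leaky_std = fit_standardizer(features)

# Leakage-free: scaling statistics are computed on the training samples only.
clean_mean, clean_std = fit_standardizer(features[train_idx])

x_test_leaky = apply_standardizer(features[test_idx], leaky_mean, leaky_std)
x_test_clean = apply_standardizer(features[test_idx], clean_mean, clean_std)

# The scaled test sets differ because the leaky scaler has "seen" the test samples.
print(np.abs(x_test_leaky - x_test_clean).max())
```

In scikit-learn terms, the leakage-free variant corresponds to fitting the scaler inside each training fold, for example by placing it in a `Pipeline` together with the classifier.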

Requirements

Used libraries and software:
  • Python 3.13.2
  • all dependencies are listed in requirements.txt
  • we recommend creating a virtual environment and installing the requirements with pip install -r requirements.txt

Setup used for the experiments:
  • AMD Ryzen 9 5900X
  • 112 GB RAM
  • 1 TB SSD
  • Ubuntu 24.04.2 LTS
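Because requirements.txt pins exact versions, it can help to confirm that the active environment matches them before running the experiments. The following is a minimal sketch using only the standard library (`importlib.metadata`). It assumes every relevant line has the simple form `name==version`, and the function names are hypothetical, not part of the repository.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_requirements(text: str) -> dict[str, str]:
    """Map package name -> pinned version for lines of the form 'name==x.y.z'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, pinned = (part.strip() for part in line.split("==", 1))
            pins[name] = pinned
    return pins


def check_pins(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return packages whose installed version differs from the pin (None = not installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


def check_requirements_file(path: str = "requirements.txt") -> dict[str, tuple[str, str | None]]:
    """Read a requirements file and report every pin the environment does not satisfy."""
    with open(path, encoding="utf-8") as fh:
        return check_pins(parse_requirements(fh.read()))
```

Running `print(check_requirements_file())` from the repository root prints an empty dict when every pinned package is installed at exactly the pinned version.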

Dataset preparation

The SVD dataset is not included in this repository for licensing reasons, but it can be downloaded from a publicly available website. Please follow the instructions in our repository available here.

Once the features.csv file has been generated, place it in the data folder and run flatten_features.py to generate the flattened_features.csv file.

All subsequent steps assume the following directory structure:

vpd_scaling_leakage_study
└───data
    │   features.csv
    │   flattened_features.csv
    │   voiced_features_8000_fft.csv
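A quick way to verify this layout before running main.py is sketched below with `pathlib`. The file names come from the directory structure above; the function name and the assumption that it is called with the project root are hypothetical.

```python
from pathlib import Path

# File names taken from the expected data/ folder layout.
EXPECTED_DATA_FILES = (
    "features.csv",
    "flattened_features.csv",
    "voiced_features_8000_fft.csv",
)


def missing_data_files(project_root="."):
    """Return the expected files that are not present in <project_root>/data."""
    data_dir = Path(project_root) / "data"
    return [name for name in EXPECTED_DATA_FILES if not (data_dir / name).is_file()]
```

For example, `missing_data_files("vpd_scaling_leakage_study")` returns an empty list once all three CSV files are in place.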

Reproducing the results

After preparing the data, run main.py to perform the computations. The results are saved as JSON files in four folders: XXX_results for the randomly split data and XXX_results_stratified for the stratified split, where XXX is the database name.
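The folder naming scheme makes it possible to group the result files by database and split type. The sketch below assumes that the XXX_results and XXX_results_stratified folders are created in the directory passed as `root`. It makes no assumption about the structure inside each JSON file, and the function names are hypothetical.

```python
import json
from pathlib import Path


def collect_result_files(root="."):
    """Group result JSON files by (database, split) based on the folder names.

    Folders ending in '_results_stratified' hold stratified-split results,
    folders ending in '_results' hold random-split results.
    """
    grouped = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        if folder.name.endswith("_results_stratified"):
            database, split = folder.name[: -len("_results_stratified")], "stratified"
        elif folder.name.endswith("_results"):
            database, split = folder.name[: -len("_results")], "random"
        else:
            continue  # unrelated folder, e.g. data/
        grouped[(database, split)] = sorted(folder.glob("*.json"))
    return grouped


def load_results(paths):
    """Load each JSON file as-is; the keys inside are not assumed here."""
    return [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
```

Checking stratified and random splits side by side is a loop over `collect_result_files()` with `load_results` applied to each group.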

You can use the result_tables.ipynb notebook to reproduce Tables 3 to 7. Similarly, the permutation_test_bias.ipynb notebook runs a permutation test of the statistical significance of the bias for each dataset-transformer-model-split combination.
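The exact procedure in permutation_test_bias.ipynb is not described here. As a hedged illustration only, assume that "bias" is the per-repetition difference between a performance metric obtained with leaky scaling and one obtained with properly fitted scaling. Under that assumption, a generic paired sign-flip permutation test of whether the mean bias differs from zero could look like this. The function and argument names are hypothetical.

```python
import numpy as np


def sign_flip_permutation_test(leaky_scores, clean_scores, n_permutations=10_000, seed=0):
    """Two-sided paired permutation test for a non-zero mean difference.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so random sign flips generate the null distribution of the mean difference.
    """
    diffs = np.asarray(leaky_scores, dtype=float) - np.asarray(clean_scores, dtype=float)
    observed = diffs.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    null_means = (signs * diffs).mean(axis=1)
    # Count permutations at least as extreme as the observed mean (two-sided),
    # with an add-one correction so the p-value is never exactly zero.
    extreme = np.sum(np.abs(null_means) >= abs(observed))
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A consistently positive difference across repetitions yields a small p-value, while differences that cancel out yield a p-value near 1.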

Owner

  • Name: aailab-uct
  • Login: aailab-uct
  • Kind: organization

GitHub Events

Total
  • Release event: 1
  • Push event: 1
  • Create event: 2
Last Year
  • Release event: 1
  • Push event: 1
  • Create event: 2

Dependencies

requirements.txt pypi
  • joblib ==1.5.0
  • numpy ==2.2.5
  • pandas ==2.2.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.2
  • scikit-learn ==1.6.1
  • scipy ==1.15.3
  • six ==1.17.0
  • threadpoolctl ==3.6.0
  • tqdm ==4.67.1
  • tzdata ==2025.2