https://github.com/aailab-uct/feature-scaling-leakge-in-vpd
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aailab-uct
- License: agpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 153 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Feature Scaling Induced Data Leakage Quantification in Machine-learning Based Voice Pathology Detection
This repository contains the code for the paper "Feature Scaling Induced Data Leakage Quantification in Machine-learning Based Voice Pathology Detection" by Jan Vrba, Jakub Steinbach, Tomáš Jirsa and Noriyasu Homma. DOI: XYZ
Requirements
Used libraries and software
- Python 3.13.2
- see requirements.txt for all dependencies
- we recommend creating a virtual environment and running pip install -r requirements.txt to install all requirements
Used setup for experiments
- AMD Ryzen 9 5900X
- 112 GB RAM
- 1 TB SSD
- Ubuntu 24.04.2 LTS
Dataset preparation
The SVD dataset is not included in this repository for licensing reasons, but it can be downloaded from a publicly available website. Please follow the instructions in our repository available here.
Once the features.csv file is generated, place it in the data folder and run flatten_features.py to generate the flattened_features.csv file.
For all subsequent steps, we assume the following directory structure:
vpd_scaling_leakage_study
└───data
    │   features.csv
    │   flattened_features.csv
    │   voiced_features_8000_fft.csv
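Before starting a long run, it can help to confirm that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name missing_data_files is hypothetical, and only the three file names come from the listing.

```python
from pathlib import Path

# File names are taken from the directory listing; the helper itself is hypothetical.
EXPECTED_FILES = (
    "features.csv",
    "flattened_features.csv",
    "voiced_features_8000_fft.csv",
)


def missing_data_files(project_root="."):
    """Return the expected files that are absent from <project_root>/data."""
    data_dir = Path(project_root) / "data"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

Run it from the project root; an empty result means all three files are in place.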
Reproducing the results
After data preparation, run main.py to perform the calculations. The results are saved as JSON files in four folders, named XXX_results for randomly split data and XXX_results_stratified for the stratified split, where XXX stands for the database name.
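For orientation, the kind of comparison the study is about can be sketched with scikit-learn on synthetic data. This is an illustrative sketch only and does not reproduce main.py: the data set, the classifier, the scaler, and the definition of bias as a simple accuracy difference are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the real features come from the SVD dataset.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# stratify=y keeps the class ratio equal in both parts (a stratified split);
# omit it for a purely random split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Leaky variant: the scaler sees the test rows when estimating mean and variance.
leaky_scaler = StandardScaler().fit(np.vstack([X_train, X_test]))
leaky_model = LogisticRegression(max_iter=1000).fit(
    leaky_scaler.transform(X_train), y_train
)
leaky_acc = leaky_model.score(leaky_scaler.transform(X_test), y_test)

# Leak-free variant: the pipeline fits the scaler on the training rows only.
clean_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clean_model.fit(X_train, y_train)
clean_acc = clean_model.score(X_test, y_test)

# One simple way to express the leakage-induced bias (assumed definition).
bias = leaky_acc - clean_acc
print(f"leaky={leaky_acc:.3f} clean={clean_acc:.3f} bias={bias:+.3f}")
```

On well-behaved synthetic data the difference is often close to zero; the point of the sketch is only where the scaler is fitted in each variant.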
You can use the result_tables.ipynb notebook to generate the results in the form of Tables 3 to 7. Similarly, you can use the permutation_test_bias.ipynb notebook to run the permutation test for the statistical significance of the bias for each dataset-transformer-model-split combination.
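As a rough illustration of what a permutation test of bias can look like, the sketch below uses a paired sign-flip test with NumPy. It is one common formulation and is not claimed to match the notebook: the function name, the number of permutations, and the example values are assumptions made up for this example.

```python
import numpy as np


def sign_flip_permutation_test(differences, n_permutations=10_000, seed=0):
    """Two-sided p-value for H0: the mean paired difference is zero."""
    rng = np.random.default_rng(seed)
    differences = np.asarray(differences, dtype=float)
    observed = abs(differences.mean())
    # Under H0 each difference is equally likely to be positive or negative.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, differences.size))
    permuted = np.abs((signs * differences).mean(axis=1))
    # Add-one correction keeps the estimated p-value strictly positive.
    return (np.sum(permuted >= observed) + 1) / (n_permutations + 1)


# Made-up per-repetition differences (score with leakage minus score without).
example = [0.012, 0.008, 0.015, -0.002, 0.010, 0.009, 0.011, 0.007]
p_value = sign_flip_permutation_test(example)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the observed mean difference is unlikely under the hypothesis of no bias; the threshold to apply is a choice left to the analysis.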
Owner
- Name: aailab-uct
- Login: aailab-uct
- Kind: organization
- Repositories: 1
- Profile: https://github.com/aailab-uct
GitHub Events
Total
- Release event: 1
- Push event: 1
- Create event: 2
Last Year
- Release event: 1
- Push event: 1
- Create event: 2
Dependencies
- joblib ==1.5.0
- numpy ==2.2.5
- pandas ==2.2.3
- python-dateutil ==2.9.0.post0
- pytz ==2025.2
- scikit-learn ==1.6.1
- scipy ==1.15.3
- six ==1.17.0
- threadpoolctl ==3.6.0
- tqdm ==4.67.1
- tzdata ==2025.2
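To check whether the active environment matches these pins, a small script along the following lines can be used. It is a sketch that relies only on the standard library module importlib.metadata; the helper name find_mismatches is hypothetical, and the versions are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list.
PINNED = {
    "joblib": "1.5.0",
    "numpy": "2.2.5",
    "pandas": "2.2.3",
    "python-dateutil": "2.9.0.post0",
    "pytz": "2025.2",
    "scikit-learn": "1.6.1",
    "scipy": "1.15.3",
    "six": "1.17.0",
    "threadpoolctl": "3.6.0",
    "tqdm": "4.67.1",
    "tzdata": "2025.2",
}


def find_mismatches(pinned):
    """Map each package whose installed version differs from its pin to (installed, pinned)."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in find_mismatches(PINNED).items():
    print(f"{name}: installed {installed or 'missing'}, pinned {wanted}")
```

No output means every listed package is installed at exactly the pinned version.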