https://github.com/artefactory/smote_strategies_study
Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants (Sakho, Malherbe and Scornet; 2024)
Statistics
- Stars: 9
- Watchers: 0
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
If you want to use our proposed method MGS, please use the updated repository: https://github.com/artefactory/mgs-grf
Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".
In particular, you will find code to reproduce the paper experiments.
⭐ Getting Started
If you want to reproduce our paper experiments:
- the notebooks here and here reproduce the experiments
- this code contains the implementation of the protocols used for the numerical experiments of our article.
In order to use our MGS strategy:
- this notebook illustrates how to use it
- the strategy is implemented here
⭐ Data sets
The data sets used for our article should be downloaded into the data/externals folder. They are available at the following addresses:
- Pima
- Phoneme : https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv
- Abalone : https://archive.ics.uci.edu/dataset/1/abalone
- Wine : https://archive.ics.uci.edu/dataset/186/wine+quality
- Haberman : https://archive.ics.uci.edu/dataset/43/haberman+s+survival
- Yeast : https://archive.ics.uci.edu/dataset/110/yeast
- Vehicle : https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes
- Ionosphere : https://archive.ics.uci.edu/dataset/52/ionosphere
- Breast cancer Wisconsin : https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
- CreditCard : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- MagicTel : https://www.openml.org/d/44125
- California : https://www.openml.org/d/44090
- House_16H : https://openml.org/d/821
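Once a file has been downloaded into data/externals, a quick sanity check is to load it and look at how imbalanced the target is. The snippet below is a minimal sketch and not part of the repository's code: the helper names `load_csv_dataset` and `minority_ratio`, the file name, and the label column are all assumptions made for illustration, and the repository's own notebooks may load the data differently.

```python
from pathlib import Path

import pandas as pd

# Folder named in the README; adjust if your checkout lives elsewhere.
DATA_DIR = Path("data") / "externals"


def load_csv_dataset(filename, data_dir=DATA_DIR, **read_csv_kwargs):
    """Load one downloaded CSV file from the data folder.

    `filename` is whatever name you saved the file under (an assumption,
    the README does not fix the file names). Extra keyword arguments are
    forwarded to pandas.read_csv, e.g. header=None for header-less files.
    """
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download the data set first")
    return pd.read_csv(path, **read_csv_kwargs)


def minority_ratio(labels):
    """Return the share of samples belonging to the rarest class."""
    counts = pd.Series(labels).value_counts()
    return counts.min() / counts.sum()
```

For example, assuming the Phoneme file was saved as `phoneme.csv` and its last column holds the class label, `df = load_csv_dataset("phoneme.csv", header=None)` followed by `minority_ratio(df.iloc[:, -1])` gives the fraction of minority samples. Drop `header=None` if your copy of the file has a header row.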
⭐ Acknowledgements
This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.
If you find the code useful, please consider citing us:
@article{sakho2024we,
  title={Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants},
  author={Sakho, Abdoulaye and Malherbe, Emmanuel and Scornet, Erwan},
  journal={arXiv preprint arXiv:2402.03819},
  year={2024}
}
Owner
- Name: artefactory
- Login: artefactory
- Kind: organization
- Repositories: 12
- Profile: https://github.com/artefactory
GitHub Events
Total
- Issues event: 2
- Watch event: 4
- Issue comment event: 6
- Push event: 18
- Pull request event: 4
- Create event: 1
Last Year
- Issues event: 2
- Watch event: 4
- Issue comment event: 6
- Push event: 18
- Pull request event: 4
- Create event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: about 2 months
- Average time to close pull requests: about 1 month
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 6.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: about 2 months
- Average time to close pull requests: less than a minute
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 6.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- flippercy (1)
Pull Request Authors
- VincentAuriau (2)
- Abdoulaye-SAKHO (1)
Dependencies
- numpy ==1.24.3
- pandas ==1.5.3