https://github.com/artefactory/smote_strategies_study

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants (Sakho, Malherbe and Scornet; 2024)


Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

research-center
Last synced: 10 months ago

Repository

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants (Sakho, Malherbe and Scornet; 2024)

Basic Info
  • Host: GitHub
  • Owner: artefactory
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 364 KB
Statistics
  • Stars: 9
  • Watchers: 0
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
research-center
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

If you want to use our proposed method, MGS, please use the following updated repository: https://github.com/artefactory/mgs-grf

Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".

In particular, you will find code to reproduce the paper's experiments.


⭐ Getting Started

If you want to reproduce our paper's experiments:
  • the notebooks here and here reproduce the experiments;
  • this code contains the implementation of the protocols used for the numerical experiments of our article.
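The README does not describe the evaluation protocols themselves. As an illustration only, here is a minimal stratified K-fold splitter in plain Python, assuming a protocol in which any rebalancing is applied to the training folds only, so that each test fold keeps the original class imbalance. The function names `stratified_folds` and `train_test_splits` are hypothetical and are not part of this repository.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to one of n_folds folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin so every fold gets its share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return [sorted(f) for f in folds]


def train_test_splits(labels, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs.

    Assumption: a rebalancing strategy (e.g. SMOTE) would be fitted on train_idx only;
    the test fold is left untouched so that it reflects the original imbalance.
    """
    folds = stratified_folds(labels, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx
```

With 90 majority and 10 minority labels and `n_folds=5`, each test fold receives 18 majority and 2 minority indices, mirroring the original 9:1 ratio.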

To use our MGS strategy:
  • this notebook illustrates how to use it;
  • the strategy is implemented here.

⭐ Data sets

The data sets used for our article should be downloaded into the data/externals folder. They are available at the following addresses:

  • Pima
  • Phoneme : https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv
  • Abalone : https://archive.ics.uci.edu/dataset/1/abalone
  • Wine : https://archive.ics.uci.edu/dataset/186/wine+quality
  • Haberman : https://archive.ics.uci.edu/dataset/43/haberman+s+survival
  • Yeast : https://archive.ics.uci.edu/dataset/110/yeast
  • Vehicle : https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes
  • Ionosphere : https://archive.ics.uci.edu/dataset/52/ionosphere
  • Breast cancer Wisconsin : https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
  • CreditCard : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
  • MagicTel : https://www.openml.org/d/44125
  • California : https://www.openml.org/d/44090
  • House_16H : https://openml.org/d/821
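Before running the notebooks, it can help to check that the files are actually in place. A minimal sketch, assuming the notebooks read one file per data set from `data/externals`; the file names in `EXPECTED_FILES` are hypothetical placeholders, since the README only lists download pages, and should be adjusted to whatever names the experiment code expects.

```python
from pathlib import Path

# Hypothetical file names: adjust to the names the experiment notebooks actually load.
EXPECTED_FILES = {
    "Phoneme": "phoneme.csv",
    "Abalone": "abalone.data",
    "Haberman": "haberman.data",
    "CreditCard": "creditcard.csv",
}


def missing_datasets(data_dir="data/externals", expected=None):
    """Return the sorted names of data sets whose expected file is absent from data_dir."""
    expected = EXPECTED_FILES if expected is None else expected
    base = Path(data_dir)
    return sorted(name for name, fname in expected.items() if not (base / fname).is_file())


if __name__ == "__main__":
    missing = missing_datasets()
    print("All data sets found." if not missing else f"Missing: {', '.join(missing)}")
```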

⭐ Acknowledgements

This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.


If you find the code useful, please consider citing us:

    @article{sakho2024we,
      title={Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants},
      author={Sakho, Abdoulaye and Malherbe, Emmanuel and Scornet, Erwan},
      journal={arXiv preprint arXiv:2402.03819},
      year={2024}
    }

Owner

  • Name: artefactory
  • Login: artefactory
  • Kind: organization

GitHub Events

Total
  • Issues event: 2
  • Watch event: 4
  • Issue comment event: 6
  • Push event: 18
  • Pull request event: 4
  • Create event: 1
Last Year
  • Issues event: 2
  • Watch event: 4
  • Issue comment event: 6
  • Push event: 18
  • Pull request event: 4
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 6.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: about 2 months
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 6.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • flippercy (1)
Pull Request Authors
  • VincentAuriau (2)
  • Abdoulaye-SAKHO (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

validation/requirements.txt pypi
  • numpy ==1.24.3
  • pandas ==1.5.3
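The validation requirements pin exact versions of numpy and pandas. A minimal sketch, assuming you want to compare those pins against the current environment before running the validation code. It relies on the standard-library `importlib.metadata.version`; `parse_pins` and `pin_mismatches` are hypothetical helper names.

```python
from importlib import metadata

# Pins copied from the dependency list above (validation/requirements.txt).
PINS = """\
numpy==1.24.3
pandas==1.5.3
"""


def parse_pins(text):
    """Map package name -> pinned version for lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def pin_mismatches(pins, get_version=metadata.version):
    """Return {name: (pinned, installed or None)} for packages that do not match their pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

`pin_mismatches(parse_pins(PINS))` returns an empty dict when the installed numpy and pandas match the pins.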