cbdgen

An Evolutionary Scalable Framework for Synthetic Data Generation based on Data Complexity.

https://github.com/steffanop/cbdgen

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary

Keywords

dataset-generation deap nsga-iii python
Last synced: 6 months ago

Repository

An Evolutionary Scalable Framework for Synthetic Data Generation based on Data Complexity.

Basic Info
  • Host: GitHub
  • Owner: SteffanoP
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 493 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 13
  • Releases: 3
Topics
dataset-generation deap nsga-iii python
Created over 4 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Complexity-based Dataset Generation

An Evolutionary Scalable Framework for Synthetic Data Generation based on Data Complexity.


cbdgen (Complexity-based Dataset Generation) is software, currently being developed into a framework, that implements a many-objective algorithm to generate synthetic datasets from given characteristics (data complexity measures).
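To make the many-objective idea concrete, here is a minimal sketch, not cbdgen's actual code. It assumes each requested complexity measure becomes one objective (the gap between a candidate dataset's measured value and the requested value) and keeps the candidates that no other candidate beats on every objective. The measure names `m1` to `m4`, the numbers, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def objective_vector(measured: Dict[str, float], target: Dict[str, float]) -> List[float]:
    """One objective per complexity measure: absolute gap to the requested value."""
    return [abs(measured[name] - target[name]) for name in sorted(target)]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(vectors: List[List[float]]) -> List[int]:
    """Indices of the objective vectors that no other vector dominates."""
    return [
        i
        for i, v in enumerate(vectors)
        if not any(dominates(w, v) for j, w in enumerate(vectors) if j != i)
    ]


# Hypothetical requested complexities and measurements of three candidate datasets.
target = {"m1": 0.30, "m2": 0.50, "m3": 0.10, "m4": 0.60}
candidates = [
    {"m1": 0.30, "m2": 0.70, "m3": 0.10, "m4": 0.60},  # only m2 is off target
    {"m1": 0.40, "m2": 0.50, "m3": 0.20, "m4": 0.60},  # m1 and m3 are off target
    {"m1": 0.50, "m2": 0.80, "m3": 0.30, "m4": 0.90},  # further off on every measure
]

vectors = [objective_vector(c, target) for c in candidates]
front = pareto_front(vectors)
print(front)  # indices of the candidates worth keeping
```

The first two candidates trade off against each other, so both survive; the third is dropped because the first is closer to the target on every measure. An evolutionary algorithm such as NSGA-III (listed in the project keywords) builds on this kind of comparison, but it adds steps this sketch leaves out.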

Requirements

Given the current state of the framework, a few steps (some required, some optional) are needed before it can run. The requirements for this project are listed below, along with a few tutorials:

  1. Install R
  2. Install Python
  3. Python Environment (Optional)
  4. Setup cbdgen

Setting up

Install R packages

The ECoL package is required to calculate data complexity correctly. To install it, run the following command:

```console
./install_packages.r
```

If R is installed correctly, this R script should run without problems. If you get errors from the R environment, try the Working with ECoL notebook to set up the ECoL package with Python.
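Before moving on, it can help to confirm that R can actually load ECoL. The sketch below is not part of cbdgen: `check_ecol` is a hypothetical helper that uses only the Python standard library and assumes the `Rscript` executable is on your `PATH`.

```python
import shutil
import subprocess


def check_ecol(rscript: str = "Rscript") -> str:
    """Return "no-r" if Rscript is not found, "ok" if ECoL loads, else "missing"."""
    exe = shutil.which(rscript)
    if exe is None:
        return "no-r"
    # requireNamespace() returns TRUE only when the package can be loaded,
    # so the R process exits with status 0 on success and 1 otherwise.
    expr = 'quit(save = "no", status = if (requireNamespace("ECoL", quietly = TRUE)) 0 else 1)'
    result = subprocess.run([exe, "-e", expr], capture_output=True, text=True)
    return "ok" if result.returncode == 0 else "missing"
```

Calling `print(check_ecol())` reports which of the three states applies on your machine. A result of `"missing"` suggests rerunning the install script or falling back to the notebook route.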

Install Python dependencies

Use pip to install the Python packages listed in requirements.txt:

```console
pip install --upgrade pip
pip install -r requirements.txt
```
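Because requirements.txt pins exact versions, a quick check that the environment matches the pins can save debugging time later. This is a minimal sketch using only the standard library. The helper names are hypothetical, and it only understands plain `name==version` lines, which is an assumption about how the file is written.

```python
from importlib import metadata
from typing import Callable, Dict, Iterable, Optional, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect `name==version` pins, skipping blank lines and comments."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None when it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    found = {name: lookup(name) for name in pins}
    return {name: (pins[name], found[name]) for name in pins if found[name] != pins[name]}


# Illustrative input; in practice pass the lines of requirements.txt instead.
sample = ["deap==1.3.1", "numpy==1.22.0  # pinned", "", "# a comment"]
pins = parse_pins(sample)
```

To check a real environment, call `find_mismatches(parse_pins(open("requirements.txt")))`. An empty result means every pinned package is installed at the pinned version.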

Now you're ready to Generate Synthetic Data!

Citation

```bibtex
@inproceedings{Pereira_A_Many-Objective_Optimization_2022,
  author = {Pereira, Steffano X. and Miranda, Péricles B. C. and França, Thiago R. F. and Bastos-Filho, Carmelo J. A. and Si, Tapas},
  booktitle = {2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI)},
  doi = {10.1109/la-cci54402.2022.9981848},
  month = {11},
  pages = {1--6},
  title = {{A Many-Objective Optimization Approach to Generate Synthetic Datasets based on Real-World Classification Problems}},
  year = {2022}
}
```

For more details, see CITATION.cff.

References

Lorena, A. C., Garcia, L. P. F., Lehmann, J., Souto, M. C. P., and Ho, T. K. (2019). How Complex Is Your Classification Problem?: A Survey on Measuring Classification Complexity. ACM Computing Surveys (CSUR), 52:1-34.

Owner

  • Name: Steffano Pereira
  • Login: SteffanoP
  • Kind: user
  • Location: Recife - PE
  • Company: @Valcann

Computer Science Student at UFRPE. Former Electronic Technician at IFPE. Core member @ufrpe-devs and Collaborator @ifpeopensource.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Pereira"
  given-names: "Steffano X."
  affiliation: "Universidade Federal Rural de Pernambuco"
- family-names: "França"
  given-names: "Thiago R."
  affiliation: "Universidade Federal Rural de Pernambuco"
title: "Many-Objective Optimizer Approach for Complexity-based Data set Generation"
version: 0.1.0
date-released: 2022-05-19
url: "https://github.com/SteffanoP/cbdgen-framework"
preferred-citation:
  type: conference-paper
  title: "A Many-Objective Optimization Approach to Generate Synthetic Datasets based on Real-World Classification Problems"
  doi: "10.1109/la-cci54402.2022.9981848"
  year: 2022
  month: 11
  start: 1 # First page number
  end: 6 # Last page number
  collection-title: "2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI)"
  authors:
  - family-names: "Pereira"
    given-names: "Steffano X."
    affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
  - family-names: "Miranda"
    given-names: "Péricles B. C."
    affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
  - family-names: "França"
    given-names: "Thiago R. F."
    affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
  - family-names: "Bastos-Filho"
    given-names: "Carmelo J. A."
    affiliation: "University of Pernambuco, Recife, Brazil"
  - family-names: "Si"
    given-names: "Tapas"
    affiliation: "Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India"

Dependencies

requirements.txt pypi
  • deap ==1.3.1
  • matplotlib ==3.4.3
  • numpy ==1.22.0
  • pandas ==1.3.3
  • rpy2 ==3.4.5
  • scikit-learn ==0.24.2