cbdgen
An Evolutionary Scalable Framework for Synthetic Data Generation based on Data Complexity.
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 13
- Releases: 3
Metadata Files
README.md
Complexity-based Dataset Generation
An Evolutionary Scalable Framework for Synthetic Data Generation based on Data Complexity.
cbdgen (Complexity-based Dataset Generation) is software, currently being developed into a framework, that implements a many-objective algorithm to generate synthetic datasets from target data complexity characteristics.
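To make the idea concrete, here is a minimal sketch of how complexity targets can be framed as a many-objective problem: each objective is the gap between one measured complexity value and its target, and candidates are compared by Pareto dominance. It is an illustration only, not cbdgen's implementation. The two measures (`minority_fraction`, `label_change_rate`) are toy stand-ins for real ECoL measures, and every name below is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Labels = Sequence[int]  # binary class labels of one candidate dataset
Target = Tuple[Callable[[Labels], float], float]  # (measure, desired value)


def minority_fraction(labels: Labels) -> float:
    """Toy balance measure: share of samples in the smaller class."""
    ones = sum(labels)
    return min(ones, len(labels) - ones) / len(labels)


def label_change_rate(labels: Labels) -> float:
    """Toy boundary measure: share of adjacent samples with different labels."""
    changes = sum(a != b for a, b in zip(labels, labels[1:]))
    return changes / (len(labels) - 1)


def objectives(labels: Labels, targets: Sequence[Target]) -> Tuple[float, ...]:
    """One objective per measure: absolute distance to its target (minimized)."""
    return tuple(abs(measure(labels) - wanted) for measure, wanted in targets)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population: Sequence[Labels], targets: Sequence[Target]) -> List[Labels]:
    """Keep the candidates whose objective vector no other candidate dominates."""
    scored = [(cand, objectives(cand, targets)) for cand in population]
    return [cand for cand, score in scored
            if not any(dominates(other, score) for _, other in scored)]


if __name__ == "__main__":
    population = [[0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    targets = [(minority_fraction, 0.25), (label_change_rate, 1.0)]
    print(pareto_front(population, targets))
```

An evolutionary algorithm would add variation (crossover, mutation) and repeated selection on top of this; the sketch covers only the evaluation and the non-dominated filter.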
Requirements
Given the current state of the framework, a few setup steps (some required, some optional) are needed before running it. The requirements are listed below, along with a few tutorials:
- Install R
- Install Python
- Python Environment (Optional)
- Setup cbdgen
Setting up
Install R packages
The ECoL R package is required to compute the data complexity measures. To install it, run the following command:
```console
./install_packages.r
```
If R is installed correctly, this Rscript should run without problems. If you get any errors from the R environment, try the Working with ECoL notebook to set up the ECoL package from Python.
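As a rough idea of what setting up ECoL from Python can look like, the sketch below uses rpy2 (already pinned in the project's dependencies) to install the package only when it is missing. It is not taken from the notebook: the helper names are hypothetical, and it assumes ECoL can be installed from the selected CRAN mirror.

```python
from typing import Callable


def ensure_r_package(
    name: str,
    is_installed: Callable[[str], bool],
    install: Callable[[str], object],
) -> bool:
    """Install `name` only when it is missing; return True if an install ran."""
    if is_installed(name):
        return False
    install(name)
    return True


def ensure_ecol_with_rpy2() -> bool:
    """Wire the helper to rpy2. Requires a working R installation."""
    # Imported lazily so this module still loads when R or rpy2 is absent.
    from rpy2.robjects.packages import importr, isinstalled

    utils = importr("utils")
    utils.chooseCRANmirror(ind=1)  # pick the first mirror in R's list
    # Assumption: ECoL is obtainable from that mirror.
    return ensure_r_package("ECoL", isinstalled, utils.install_packages)


if __name__ == "__main__":
    if ensure_ecol_with_rpy2():
        print("ECoL was installed")
    else:
        print("ECoL was already available")
```

Keeping the check-then-install logic separate from the rpy2 wiring makes it possible to exercise the logic without an R installation.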
Install Python dependencies
Use pip to install the Python packages listed in requirements.txt:
```console
pip install --upgrade pip
pip install -r requirements.txt
```
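Because the dependencies are pinned to exact versions, it can help to confirm that the active environment matches them. The following is a minimal sketch using only the standard library; it assumes every relevant line in requirements.txt has the simple `name ==version` form, and the function names are illustrative rather than part of cbdgen.

```python
from importlib import metadata
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect `name == version` pins, ignoring comments and other lines."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    installed_version: Callable[[str], str] = metadata.version,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, wanted, found) for each pin the environment does not meet."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        pins = parse_pins(handle)
    for name, wanted, found in find_mismatches(pins):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run it from the repository root after `pip install -r requirements.txt`; no output means every pinned package is present at the expected version.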
Now you're ready to Generate Synthetic Data!
Citation
BibTeX
@inproceedings{Pereira_A_Many-Objective_Optimization_2022,
author = {Pereira, Steffano X. and Miranda, Péricles B. C. and França, Thiago R. F. and Bastos-Filho, Carmelo J. A. and Si, Tapas},
booktitle = {2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI)},
doi = {10.1109/la-cci54402.2022.9981848},
month = {11},
pages = {1--6},
title = {{A Many-Objective Optimization Approach to Generate Synthetic Datasets based on Real-World Classification Problems}},
year = {2022}
}
For more details, see CITATION.cff.
References
Lorena, A. C., Garcia, L. P. F., Lehmann, J., Souto, M. C. P., and Ho, T. K. (2019). How Complex Is Your Classification Problem?: A Survey on Measuring Classification Complexity. ACM Computing Surveys (CSUR), 52:1-34.
Owner
- Name: Steffano Pereira
- Login: SteffanoP
- Kind: user
- Location: Recife - PE
- Company: @Valcann
- Repositories: 6
- Profile: https://github.com/SteffanoP
Computer Science Student at UFRPE. Former Electronic Technician at IFPE. Core member @ufrpe-devs and Collaborator @ifpeopensource.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Pereira"
    given-names: "Steffano X."
    affiliation: "Universidade Federal Rural de Pernambuco"
  - family-names: "França"
    given-names: "Thiago R."
    affiliation: "Universidade Federal Rural de Pernambuco"
title: "Many-Objective Optimizer Approach for Complexity-based Data set Generation"
version: 0.1.0
date-released: 2022-05-19
url: "https://github.com/SteffanoP/cbdgen-framework"
preferred-citation:
  type: conference-paper
  title: "A Many-Objective Optimization Approach to Generate Synthetic Datasets based on Real-World Classification Problems"
  doi: "10.1109/la-cci54402.2022.9981848"
  year: 2022
  month: 11
  start: 1 # First page number
  end: 6 # Last page number
  collection-title: "2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI)"
  authors:
    - family-names: "Pereira"
      given-names: "Steffano X."
      affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
    - family-names: "Miranda"
      given-names: "Péricles B. C."
      affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
    - family-names: "França"
      given-names: "Thiago R. F."
      affiliation: "Federal Rural University of Pernambuco, Recife, Brazil"
    - family-names: "Bastos-Filho"
      given-names: "Carmelo J. A."
      affiliation: "University of Pernambuco, Recife, Brazil"
    - family-names: "Si"
      given-names: "Tapas"
      affiliation: "Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India"
Dependencies
- deap ==1.3.1
- matplotlib ==3.4.3
- numpy ==1.22.0
- pandas ==1.3.3
- rpy2 ==3.4.5
- scikit-learn ==0.24.2