edo

A library for generating artificial datasets through genetic evolution.

https://github.com/daffidwilde/edo

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README
✓ Academic publication links: Links to: zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.3%) to scientific vocabulary

Keywords

data-generation evolutionary-algorithms optimisation

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 62% confidence

Last synced: 10 months ago

Repository

A library for generating artificial datasets through genetic evolution.

Basic Info

Host: GitHub
Owner: daffidwilde
License: mit
Language: Python
Default Branch: main
Homepage: https://doi.org/10.1007/s10489-019-01592-4
Size: 8.24 MB

Statistics

Stars: 13
Watchers: 2
Forks: 0
Open Issues: 7
Releases: 0

Topics

data-generation evolutionary-algorithms optimisation

Created about 8 years ago · Last pushed over 5 years ago

Metadata Files

Readme Changelog Contributing License Citation

README.rst

.. image:: https://img.shields.io/pypi/v/edo.svg
   :target: https://pypi.org/project/edo/

.. image:: https://github.com/daffidwilde/edo/workflows/CI/badge.svg
   :target: https://github.com/daffidwilde/edo/actions?query=workflow%3ACI+branch%3Amain

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black

Evolutionary Dataset Optimisation
*********************************

A library for generating artificial datasets through evolution.
===============================================================

The ``edo`` library provides an evolutionary algorithm, which we call
`Evolutionary Dataset Optimisation`, that optimises any real-valued function
over a subset of the space of all possible datasets. The output of the
algorithm is a bank of effective datasets, on which the provided function
performs well, that can then be studied.

The applications of this method are varied but an important and relevant one is
in learning an algorithm's strengths and weaknesses.

When determining the quality of an algorithm, the standard route is to run the
comparable algorithms on a finite set of existing (or newly simulated) datasets
and calculate some metric. The algorithm(s) with the smallest value of this
metric are deemed the best performing.
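
As a minimal sketch of this standard route, the comparison might look like the
following. The two estimators, the error metric and the simulated datasets are
all hypothetical and are not part of ``edo``; they only illustrate the
benchmark-then-rank procedure.

```python
import random
import statistics


def mean_estimate(data):
    """Hypothetical algorithm 1: estimate the centre with the mean."""
    return statistics.mean(data)


def median_estimate(data):
    """Hypothetical algorithm 2: estimate the centre with the median."""
    return statistics.median(data)


def make_datasets(num_datasets, size, seed):
    """Simulate a finite set of datasets drawn from a standard normal."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0, 1) for _ in range(size)] for _ in range(num_datasets)
    ]


def rank_algorithms(algorithms, datasets, target=0.0):
    """Average an error metric over the datasets; the smallest value wins."""
    scores = {}
    for name, algorithm in algorithms.items():
        errors = [abs(algorithm(data) - target) for data in datasets]
        scores[name] = statistics.mean(errors)
    best = min(scores, key=scores.get)
    return best, scores


datasets = make_datasets(num_datasets=20, size=50, seed=0)
best, scores = rank_algorithms(
    {"mean": mean_estimate, "median": median_estimate}, datasets
)
print(best, scores)
```

The ranking depends entirely on which datasets happen to be in the set.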

An issue with this approach is that it pays little regard to the reliability
and quality of the datasets being used, which raises the question: what makes
a dataset "good" for an algorithm? Or, why is it that an algorithm performs well
on some datasets but not others?

By passing the objective function of the algorithm to the ``edo.DataOptimiser``
class, questions like these can be answered by studying the properties of the
resultant datasets. Beyond that, a combination of objective functions could be
used to determine how an algorithm performs against any number of other
algorithms. A comprehensive description of the evolutionary algorithm and an
exemplar case study is available at https://doi.org/10.1007/s10489-019-01592-4.
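
To give a feel for the general idea, here is a toy evolutionary loop over
small datasets written with the standard library only. It is a sketch under
stated assumptions and not the implementation or API of ``edo``: the fitness
function, the operators and every parameter name below are hypothetical, and
the published article remains the reference for the actual algorithm.

```python
import random


def fitness(dataset):
    """Hypothetical objective: squared mean of the dataset (smaller is better)."""
    mean = sum(dataset) / len(dataset)
    return mean ** 2


def create_individual(rng, size):
    """Sample a dataset (a list of floats) uniformly from [-1, 1]."""
    return [rng.uniform(-1, 1) for _ in range(size)]


def crossover(rng, parent1, parent2):
    """Build an offspring by taking each value from either parent at random."""
    return [rng.choice(pair) for pair in zip(parent1, parent2)]


def mutate(rng, dataset, mutation_prob):
    """Resample each value with probability `mutation_prob`."""
    return [
        rng.uniform(-1, 1) if rng.random() < mutation_prob else value
        for value in dataset
    ]


def evolve(pop_size=20, dataset_size=5, generations=10,
           best_prop=0.25, mutation_prob=0.1, seed=0):
    """Run the toy loop and return the final population and fitness history."""
    rng = random.Random(seed)
    population = [create_individual(rng, dataset_size) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        # Keep the fittest individuals as parents and carry them forward.
        n_parents = max(2, int(best_prop * pop_size))
        parents = population[:n_parents]
        offspring = []
        while len(offspring) < pop_size - n_parents:
            parent1, parent2 = rng.sample(parents, 2)
            child = crossover(rng, parent1, parent2)
            offspring.append(mutate(rng, child, mutation_prob))
        population = parents + offspring
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population, history


population, history = evolve()
print(history[0], history[-1])
```

Because the parents are carried into each new generation, the best fitness in
this sketch never gets worse, and the final population plays the role of the
bank of datasets to study.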

Installation
============

The ``edo`` library requires Python 3.6+ and is ``pip``-installable::

    $ python -m pip install edo

To install from source, clone the GitHub repo and run the setup script::

    $ git clone https://github.com/daffidwilde/edo.git
    $ cd edo
    $ python setup.py install

A command line tool has been developed to make using ``edo`` for larger
experiments easier: https://github.com/daffidwilde/edolab

Publications and documentation
==============================

Full documentation for the library is available at https://edo.readthedocs.io.

An article on the theory behind the algorithm has been published:

    Wilde, H., Knight, V. & Gillard, J. Evolutionary dataset optimisation:
    learning algorithm quality through evolution. *Appl Intell* **50**,
    1172-1191 (2020). https://doi.org/10.1007/s10489-019-01592-4

Citation instructions
=====================

Citing the library
------------------

Please use the following to cite the library::

    @misc{edo-library,
        author = {{The EDO library developers}},
        title = {edo: <RELEASE TITLE>},
        year = <RELEASE YEAR>,
        doi = {<DOI INFORMATION>},
        url = {http://doi.org/<DOI INFORMATION>}
    }

To check the relevant details (i.e. ``RELEASE TITLE``, ``RELEASE YEAR`` and
``DOI INFORMATION``) head to the library's Zenodo page:

.. image:: https://zenodo.org/badge/139703799.svg
   :target: https://zenodo.org/badge/latestdoi/139703799

Citing the paper
----------------

If you wish to cite the paper, then use the following::

    @article{edo-paper,
        title = {Evolutionary dataset optimisation: learning algorithm quality
                 through evolution},
        author = {Wilde, Henry and Knight, Vincent and Gillard, Jonathan},
        journal = {Applied Intelligence},
        year = 2020,
        volume = 50,
        pages = {1172--1191},
        doi = {10.1007/s10489-019-01592-4},
    }

Contributing to the library
===========================

Contributions are always welcome whether they come in the form of providing a
fix for a current issue,
reporting a bug or implementing an enhancement to the library code itself. Pull
requests (PRs) will be reviewed and collaboration is encouraged.

To make a contribution via a PR, follow these steps:

1. Make a fork of the `GitHub repo <https://github.com/daffidwilde/edo>`_ and
   clone your fork locally::

        $ git clone https://github.com/<your-username>/edo.git

2. Install the library in development mode. If you use Anaconda, there is a
   ``conda`` environment file (``environment.yml``) with all of the development
   dependencies::

        $ cd edo
        $ conda env create -f environment.yml
        $ conda activate edo-dev
        $ python setup.py develop

3. Make your changes and write tests to go with them. Ensure that they pass and
   you have 100% coverage::
   
        $ python -m pytest --cov=edo --cov-fail-under=100 tests

4. Push to your fork and open a pull request.

Owner

Name: Henry Wilde
Login: daffidwilde
Kind: user
Location: Cardiff, UK
Company: Dŵr Cymru Welsh Water

Repositories: 29
Profile: https://github.com/daffidwilde

Data scientist and advocate for open-source, sustainably developed software 🛸 🐐 🦆

Citation (CITATION.rst)

Citation instructions
=====================

Citing the library
------------------

Please use the following to cite the library::

    @misc{edo-library,
        author = {{The EDO library developers}},
        title = {edo: <RELEASE TITLE>},
        year = <RELEASE YEAR>,
        doi = {<DOI INFORMATION>},
        url = {http://doi.org/<DOI INFORMATION>}
    }

To check the relevant details (i.e. ``RELEASE TITLE``, ``RELEASE YEAR`` and
``DOI INFORMATION``) head to the library's Zenodo page:

.. image:: https://zenodo.org/badge/139703799.svg
   :target: https://zenodo.org/badge/latestdoi/139703799

Citing the paper
----------------

If you wish to cite the paper, then use the following::

    @article{edo-paper,
        title = {Evolutionary dataset optimisation: learning algorithm quality
                 through evolution},
        author = {Wilde, Henry and Knight, Vincent and Gillard, Jonathan},
        journal = {Applied Intelligence},
        year = 2020,
        volume = 50,
        pages = {1172--1191},
        doi = {10.1007/s10489-019-01592-4},
    }

Committers

Last synced: over 2 years ago

All Time

Total Commits: 207
Total Committers: 2
Avg Commits per committer: 103.5
Development Distribution Score (DDS): 0.005

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Henry Wilde	h**e@g**m	206
Vince Knight	v**t@g**m	1

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 16
Total pull requests: 84
Average time to close issues: 6 months
Average time to close pull requests: about 8 hours
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 0.5
Merged pull requests: 83
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

daffidwilde (15)
drvinceknight (1)

Pull Request Authors

daffidwilde (84)

Top Labels

Issue Labels

refactor (2) enhancement (2)

Packages

Total packages: 1
Total downloads:
- pypi 37 last-month

Total dependent packages: 0
Total dependent repositories: 2
Total versions: 15
Total maintainers: 1

pypi.org: edo

Generating artificial datasets through evolution.

Homepage: https://github.com/daffidwilde/edo
Documentation: https://edo.readthedocs.io/
License: MIT
Latest release: 0.3.6
published over 5 years ago

Versions: 15
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 37 Last month

Rankings

Dependent packages count: 10.1%

Dependent repos count: 11.5%

Stargazers count: 15.6%

Average: 17.0%

Forks count: 22.7%

Downloads: 25.3%

Maintainers (1)

daffidwilde

Last synced: 10 months ago

Dependencies

docs/environment.yml conda

ipykernel
pip
python >=3.6

docs/requirements.txt pypi

blackbook *
ipython >=7.6
matplotlib >=2.2
nbsphinx *
nbval *
numpydoc *
pytest ==5.4.3
scikit-learn *
sphinx *
sphinx_rtd_theme *
sphinxcontrib-bibtex <2.0.0

environment.yml pypi

black *
flake8 *
hypothesis ==5.20.3
isort *

requirements.txt pypi

dask ==2.30.0
numpy *
pandas *
