bnbad

optimizer written as data miner

https://github.com/timm/bnbad

Science Score: 51.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary

Keywords

datamining explanation optimization

Keywords from Contributors

data-mining lua

Repository

optimizer written as data miner

Basic Info
  • Host: GitHub
  • Owner: timm
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage: http://menzies.us/bnbad
  • Size: 2.88 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
datamining explanation optimization
Created over 5 years ago · Last pushed about 5 years ago
Metadata Files
Readme Contributing License Citation

README.md

BnBAD (break 'n bad):
fast, explicable, multi-objective reasoning


BnBAD is a multi-objective optimizer that reasons by:

  1. breaking up problems into regions of bad and better;
  2. then looking for ways to jump between those regions.

    :-------:
    | Ba    | Bad <----.  planning = (better - bad)
    |    56 |          |  monitor  = (bad - better)
    :-------:------:   |
            | B    |   v
            |    5 | Better
            :------:

BnBAD might be a useful choice when:

  • users have to trade-off competing goals,
  • succinct explanations are needed about what the system is doing,
  • those explanations have to include ranges within which it is safe to change the system,
  • guidance is needed on how to improve things (or on what might make things worse),
  • the thing being studied is constantly changing, so:
    • we have to perpetually check if the current system is still trustworthy
    • and, if not, we need to update our models

Install

Download the repo or the zip from http://github.com/timm/bnbad

In the same directory as the setup.py file....

Install pypy3:

brew install pypy3      # macOS
sudo apt install pypy3  # Debian/Ubuntu

Install support packages into the pypy3 space:

pip3      install termcolor rerun
pip_pypy3 install termcolor rerun # optional.. if you want speed

Install bnbad using setup.py:

python3 setup.py install
pypy3   setup.py install # optional.. if you want speed

Technical Notes:

  • Examples are clustered in goal space and the better cluster is the one that dominates all the other bad clusters.
  • Bad and better are scored via Zitzler's continuous domination predicate.
  • Numerics are then broken up into just a few ranges using a bottom-up merging process guided by the ratio of better to bad in each range.
  • These numeric ranges, and the symbolic ranges are then used to build a succinct decision list that can explain what constitutes better behavior. This decision list has many uses:
    • Planning: The deltas in the conditions that lead to the leaves of that decision list can offer guidance on how to change bad to better.
    • Monitoring: The opposite of planning. Learn what can change better to bad, then watch out for those things.
    • Anomaly detection and incremental certification: The current decision list can be trusted as long as new examples fall close to the old examples seen in the leaves of the decision list.
    • Stream mining: Stop learning while the anomaly detector is not triggering. Track the anomalies seen in each branch of the decision list. Update just the branches that get too many anomalies (if that ever happens).
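
The bottom-up merging of numeric ranges described in the notes above can be sketched as follows. This is a minimal illustration, not BnBAD's own code: the `Range` class, the equal-frequency starting bins, and the `epsilon` merge threshold are all assumptions made for the example. The notes only say that merging is guided by the ratio of better to bad in each range.

```python
from dataclasses import dataclass


@dataclass
class Range:
    """A numeric range plus counts of better/bad examples (hypothetical helper)."""
    lo: float
    hi: float
    better: int = 0
    bad: int = 0

    def ratio(self) -> float:
        """Fraction of the examples in this range that are 'better'."""
        n = self.better + self.bad
        return self.better / n if n else 0.0


def initial_ranges(pairs, bins=8):
    """Split (value, is_better) pairs into roughly equal-sized starting bins."""
    pairs = sorted(pairs)
    size = max(1, len(pairs) // bins)
    out = []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        r = Range(chunk[0][0], chunk[-1][0])
        for _, is_better in chunk:
            if is_better:
                r.better += 1
            else:
                r.bad += 1
        out.append(r)
    return out


def merge_ranges(ranges, epsilon=0.2):
    """Fuse neighbours whose better-ratios differ by less than epsilon.

    Assumed merge rule; repeats until no adjacent pair can be merged."""
    changed = True
    while changed:
        changed = False
        out = []
        for r in ranges:
            if out and abs(out[-1].ratio() - r.ratio()) < epsilon:
                prev = out.pop()
                out.append(Range(prev.lo, r.hi,
                                 prev.better + r.better,
                                 prev.bad + r.bad))
                changed = True
            else:
                out.append(r)
        ranges = out
    return ranges


# Synthetic input: values 0..99, where values of 50 or more count as "better".
pairs = [(x, x >= 50) for x in range(100)]
for r in merge_ranges(initial_ranges(pairs)):
    print(r.lo, r.hi, round(r.ratio(), 2))
```

On this synthetic input the many starting bins collapse to just two ranges, one almost entirely bad and one almost entirely better, which is the kind of compact description a decision list can then use.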

Example

Here, we show how a cluster-based analysis can dramatically simplify multi-objective reasoning.

Here's a data set where the first line names the columns. In that line, "$" denotes numerics, and ">" and "<" denote goals we want to maximize or minimize (respectively). Hence:

  • $displacement and $horsepower and $model are numeric;
  • cylinders and origin are symbolic;
  • we want to minimize _weight_ and maximize _acceleration_ and _mpg_.

    cylinders, $displacement, $horsepower, <weight, >acceleration, $model, origin, >mpg
    8, 304, 193, 4732, 18.5, 70, 1, 10
    8, 360, 215, 4615, 14, 70, 1, 10
    8, 307, 200, 4376, 15, 70, 1, 10
    8, 318, 210, 4382, 13.5, 70, 1, 10
    ... 300+ more rows
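
The column-naming convention above is simple enough to sketch in a few lines. The helper below is a hypothetical illustration (`parse_header` is not a function from BnBAD); it only shows how the "$", "<" and ">" prefixes sort the columns into roles.

```python
def parse_header(line):
    """Classify column names by their prefix (illustrative helper, not BnBAD's parser).

    '<' marks a goal to minimize, '>' a goal to maximize,
    '$' a numeric input; anything else is treated as symbolic."""
    cols = {"numeric": [], "symbolic": [], "minimize": [], "maximize": []}
    for raw in line.split(","):
        name = raw.strip()
        if name.startswith("<"):
            cols["minimize"].append(name[1:])
        elif name.startswith(">"):
            cols["maximize"].append(name[1:])
        elif name.startswith("$"):
            cols["numeric"].append(name[1:])
        else:
            cols["symbolic"].append(name)
    return cols


header = ("cylinders, $displacement, $horsepower, <weight, "
          ">acceleration, $model, origin, >mpg")
for role, names in parse_header(header).items():
    print(role, names)
```

Keeping the goals in their own lists makes it easy to separate the columns used for clustering (the goals) from the columns used to explain the clusters (everything else).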

Without BnBAD, using classical methods, we might (a) learn one equation for each goal; (b) then use some multi-objective optimizer to explore trade-offs between those equations.

Here is what linear regression tells us:

```txt
mpg = -0.6599 * cylinders +
      -0.016  * displacement +
      -0.0627 * horsepower +
       0.6251 * model +
       1.2385 * origin +
     -12.3701

acceleration = 0.009  * displacement +
              -0.0712 * horsepower +
              21.2507

weight = 62.3829 * cylinders +
          5.128  * displacement +
          4.3461 * horsepower +
         13.836  * model +
        -49.7531 * origin +
        211.281
```

But with BnBAD, we can learn a much, much simpler model. First, we recursively cluster the data based on the three goal scores. For each leaf, we write down the mean goal scores. For example, for the first leaf (labelled `[0]`):

  • the mean of `<weight, >acceleration, >!mpg`
  • is 3452.97, 15.04, 20

    398
    | 211
    | | 121
    | | | 57
    | | | | 31 [0] {3452.97, 15.04, 20.00}** 26 %
    | | | | 26 [1] {3506.65, 18.31, 20.00}*** 33 %
    | | | 64
    | | | | 27 [2] {2637.00, 15.10, 20.00}**** 46 %
    | | | | 37 [3] {2979.89, 16.51, 20.00}**** 40 %
    | | 90
    | | | 48
    | | | | 21 [4] {4181.14, 14.19, 11.90}* 13 %
    | | | | 27 [5] {3838.85, 12.71, 19.26}** 20 %
    | | | 42
    | | | | 21 [6] {4180.90, 11.54, 13.81} 6 %
    | | | | 21 [7] {4665.00, 11.88, 10.00} 0 %
    | 187
    | | 104
    | | | 51
    | | | | 30 [8] {2343.73, 16.46, 29.33}****** 66 %
    | | | | 21 [9] {2514.90, 19.63, 27.14}******* 73 %
    | | | 53
    | | | | 18 [10] {2628.83, 15.37, 24.44}***** 53 %
    | | | | 35 [11] {2485.60, 14.44, 30.00}****** 60 %
    | | 83
    | | | 49
    | | | | 29 [12] {2045.55, 16.74, 30.00}******** 80 %
    | | | | 20 [13] {1977.90, 17.44, 31.00}******** 86 %
    | | | 34 [14] {2030.09, 17.05, 40.29}********* 93 %

The last leaf (labelled [14]) is "best" since it dominates 93% of the other nodes (where "dominates" is a measure of "better" across the goals).
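
To make "dominates" concrete, here is a sketch of one common formulation of Zitzler's continuous domination predicate, applied to three of the leaf means from the tree above. Whether BnBAD uses exactly this form is an assumption. The min-max normalization over just these three leaves, and the names `normalize` and `dominates`, are also choices made for this example.

```python
import math


def normalize(rows):
    """Min-max scale every column of rows to 0..1 (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)]
            for row in rows]


def dominates(row1, row2, weights):
    """Continuous domination: True if row1 is 'better' than row2.

    weights holds -1 for goals to minimize and +1 for goals to maximize.
    Each goal contributes an exponentially weighted loss, so large
    differences on any one goal count for more than small ones."""
    n = len(weights)
    s1 = s2 = 0.0
    for a, b, w in zip(row1, row2, weights):
        s1 -= math.exp(w * (a - b) / n)
        s2 -= math.exp(w * (b - a) / n)
    return s1 / n < s2 / n


# Mean {weight, acceleration, mpg} of leaves [0], [7] and [14] from the tree.
leaves = {
    0: [3452.97, 15.04, 20.00],
    7: [4665.00, 11.88, 10.00],
    14: [2030.09, 17.05, 40.29],
}
weights = [-1, +1, +1]  # minimize weight; maximize acceleration and mpg

keys = list(leaves)
scaled = dict(zip(keys, normalize([leaves[k] for k in keys])))

print(dominates(scaled[14], scaled[7], weights))
print(dominates(scaled[7], scaled[14], weights))
```

Under these assumptions leaf [14] dominates leaf [7] and not the reverse, which matches the ranking in the tree. Counting how many other leaves each leaf dominates would give a percentage score of the kind shown at the end of each tree row.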

BnBAD reports a decision list that shows how to select this best node from everything else:

    % best rule
    if cylinders in 4 .. 4 then {2270.90, 16.66, 29.70} 67
    else {3724.51, 14.17, 17.14} 63

Another thing we do is ask what takes us from the best leaf to the worst leaf (labelled [7]). That decision list is:

    if cylinders in 8 .. 8 then {4311.71, 11.57, 12.29} 35
    else {2267.46, 16.79, 31.92} 26

So what is being said here is that:

  • Eight cylinder cars are heavier and slower.
  • Four cylinder cars are lighter and more nimble.

Obvious, right? But here's the important thing: the number of cylinders affects everything else. That effect is very clear from the decision lists, but it is not clear from the regression equations.
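
The two decision lists above are small enough to encode directly. The sketch below is a hypothetical representation (the `Rule` class is not part of BnBAD). It reads each brace group as the mean {weight, acceleration, mpg} for that branch, following the leaf notation used in the tree, and it leaves out the trailing number on each branch because its meaning is not explained here.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One if/else step of a decision list (illustrative encoding)."""
    column: str
    lo: float
    hi: float
    then: tuple       # mean goals when the range test passes
    otherwise: tuple  # mean goals when it fails

    def predict(self, row):
        """Return the mean goals of the branch that this row falls into."""
        if self.lo <= row[self.column] <= self.hi:
            return self.then
        return self.otherwise


# Values copied from the two decision lists shown above.
best = Rule("cylinders", 4, 4,
            (2270.90, 16.66, 29.70), (3724.51, 14.17, 17.14))
worst = Rule("cylinders", 8, 8,
             (4311.71, 11.57, 12.29), (2267.46, 16.79, 31.92))

car = {"cylinders": 8}
print(best.predict(car))
print(worst.predict(car))
```

Each rule needs only one attribute and one range, which is why it is so much easier to read than the three regression equations.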

Classes

For information on the following design, read the docs.

Owner

  • Name: Tim Menzies
  • Login: timm
  • Kind: user
  • Location: Raleigh, North Carolina, USA
  • Company: CS, NC State, USA

IEEE Fellow, prof, phd, computer scientist, ex-nurse, rocketman, taxi-driver, journalist (it all made sense at the time).

Citation (CITATION.md)

<img align=left width=300
src="https://live.staticflickr.com/1070/1430045001_7dd540ff1a_b.jpg">

# Cite as ...

T. Menzies,  
_BnBAD: Explicable Multi-objective Optimization_,  
July, 2020

```bibtex
@article{timm:bnbad,
  title     = {BnBAD: Explicable Multi-objective Optimization},
  DOI       = {10.5281/zenodo.3947026}, 
  author    = {Tim Menzies}, 
  publisher = {Zenodo}, 
  year      = {2020}, 
  month     = {Jul}
}
```


Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 255
  • Total Committers: 3
  • Avg Commits per committer: 85.0
  • Development Distribution Score (DDS): 0.106
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Tim Menzies t****s@g****m 228
Tim Menzies t****m@i****g 26
Tim Menzies t****m 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • autopep8 *
  • flake8 *
  • rerun *
  • termcolor *
setup.py pypi
  • rerun *
  • termcolor *