bnbad

optimizer written as data miner

https://github.com/timm/bnbad

Science Score: 51.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.0%) to scientific vocabulary

Keywords

datamining explanation optimization

Keywords from Contributors

data-mining lua

Last synced: 10 months ago

Repository

optimizer written as data miner

Basic Info

Host: GitHub
Owner: timm
License: other
Language: Python
Default Branch: master
Homepage: http://menzies.us/bnbad
Size: 2.88 MB

Statistics

Stars: 0
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Topics

datamining explanation optimization

Created almost 6 years ago · Last pushed over 5 years ago

Metadata Files

Readme Contributing License Citation

BnBAD (break 'n bad):
fast, explicable, multi-objective reasoning

BnBAD is a multi-objective optimizer that reasons by:

breaking up problems into regions of bad and better;
then looking for ways to jump between those regions.

    :-------:
    | Ba    | Bad <----.       planning = (better - bad)
    |    56 |          |       monitor  = (bad - better)
    :-------:------:   |
            | B    |   v
            |    5 | Better
            :------:

BnBAD might be a useful choice when:

users have to trade-off competing goals,
succinct explanations are needed about what the system is doing,
those explanations have to include ranges within which it is safe to change the system,
guidance is needed for how to improve things (or know what might make things worse);
the thing being studied is constantly changing, so:
- we have to perpetually check if the current system is still trustworthy
- and, if not, we need to update our models

Install

Download the repo or the zip from http://github.com/timm/bnbad

Then, in the same directory as the setup.py file, do the following.

Install pypy3:

brew install pypy3      # macOS
sudo apt install pypy3  # debian/ubuntu linux

Install support packages into the pypy3 space:

pip3      install termcolor rerun
pip_pypy3 install termcolor rerun # optional.. if you want speed

Install bnbad using setup.py:

python3 setup.py install
pypy3   setup.py install # optional.. if you want speed

Technical Notes:

Examples are clustered in goal space and the better cluster is the one that dominates all the other bad clusters.
Bad and better are scored via Zitzler's continuous domination predicate.
Numerics are then broken up into just a few ranges using a bottom-up merging process guided by the ratio of better to bad in each range.
These numeric ranges, and the symbolic ranges are then used to build a succinct decision list that can explain what constitutes better behavior. This decision list has many uses:
- Planning: The deltas in the conditions that lead to the leaves of that decision list can offer guidance on how to change bad to better.
- Monitoring: The opposite of planning. Learn what can change better to bad, then watch out for those things.
- Anomaly detection and incremental certification: The current decision list can be trusted as long as new examples fall close to the old examples seen in the leaves of the decision list.
- Stream mining: Stop learning while the anomaly detector is not triggering. Track the anomalies seen in each branch of the decision list. Update just the branches that get too many anomalies (if that ever happens).
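
The bottom-up merging of numeric ranges can be sketched as follows. This is a minimal illustration, not BnBAD's actual code: the `merge_ranges` function, the `epsilon` threshold, and the bin counts are all hypothetical. It assumes each initial range already records how many better and bad examples fall inside it, and it fuses adjacent ranges whose proportion of better examples is similar.

```python
from typing import List, Tuple

# One range: (lo, hi, number of better examples, number of bad examples)
Bin = Tuple[float, float, int, int]


def ratio(better: int, bad: int) -> float:
    """Fraction of 'better' examples in a range (0.5 if the range is empty)."""
    total = better + bad
    return better / total if total else 0.5


def merge_ranges(bins: List[Bin], epsilon: float = 0.2) -> List[Bin]:
    """Repeatedly fuse adjacent ranges whose better-ratios differ by less than epsilon."""
    bins = list(bins)
    merged = True
    while merged and len(bins) > 1:
        merged = False
        out = [bins[0]]
        for lo, hi, better, bad in bins[1:]:
            prev_lo, _, prev_better, prev_bad = out[-1]
            if abs(ratio(better, bad) - ratio(prev_better, prev_bad)) < epsilon:
                # Similar mix of better/bad: treat the two ranges as one.
                out[-1] = (prev_lo, hi, prev_better + better, prev_bad + bad)
                merged = True
            else:
                out.append((lo, hi, better, bad))
        bins = out
    return bins


# Hypothetical counts for five initial ranges of one numeric column.
initial = [
    (68, 100, 18, 2),
    (100, 150, 17, 3),
    (150, 250, 9, 11),
    (250, 350, 2, 18),
    (350, 455, 1, 19),
]
for lo, hi, better, bad in merge_ranges(initial):
    print(f"{lo}..{hi}: better={better} bad={bad}")
```

With these made-up counts, the five initial ranges collapse to three: a mostly-better range, a mixed range, and a mostly-bad range.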

Example

Here, we show how a cluster-based analysis can dramatically simplify multi-objective reasoning.

Here's a data set where the first line names the columns. In that line, "$" denotes numerics, and ">" and "<" denote goals we want to maximize or minimize (respectively). Hence:

$displacement and $horsepower and $model are numeric;
cylinders and origin are symbolic;
We want to minimize _weight_ and maximize _acceleration_ and _mpg_.

    cylinders, $displacement, $horsepower, <weight, >acceleration, $model, origin, >mpg
    8, 304, 193, 4732, 18.5, 70, 1, 10
    8, 360, 215, 4615, 14, 70, 1, 10
    8, 307, 200, 4376, 15, 70, 1, 10
    8, 318, 210, 4382, 13.5, 70, 1, 10
    ... 300+ more rows
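
The header conventions can be read mechanically. The sketch below is an assumption about how one might do that, not BnBAD's own parser: `parse_header` and the role names are hypothetical. It sorts each column name into a role based on its leading character.

```python
from typing import Dict, List


def parse_header(line: str) -> Dict[str, List[str]]:
    """Sort column names into roles using their leading marker character."""
    roles: Dict[str, List[str]] = {
        "numeric": [], "symbolic": [], "minimize": [], "maximize": []
    }
    for name in (part.strip() for part in line.split(",")):
        if name.startswith("<"):
            roles["minimize"].append(name[1:])   # goal to minimize
        elif name.startswith(">"):
            roles["maximize"].append(name[1:])   # goal to maximize
        elif name.startswith("$"):
            roles["numeric"].append(name[1:])    # plain numeric column
        else:
            roles["symbolic"].append(name)       # everything else is symbolic
    return roles


header = ("cylinders, $displacement, $horsepower, <weight, "
          ">acceleration, $model, origin, >mpg")
roles = parse_header(header)
for role, names in roles.items():
    print(role, names)
```

Here the goal columns are kept apart from the other numerics; a fuller reader would probably also treat goals as numeric when summarizing them.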

Without BnBAD, using classical methods, we might (a) learn one equation for each goal; (b) then use some multi-objective optimizer to explore trade-offs between those equations.

Here is what linear regression tells us:

```txt
mpg = -0.6599 * cylinders + -0.016 * displacement + -0.0627 * horsepower + 0.6251 * model + 1.2385 * origin + -12.3701

acceleration = 0.009 * displacement + -0.0712 * horsepower + 21.2507

weight = 62.3829 * cylinders + 5.128 * displacement + 4.3461 * horsepower + 13.836 * model + -49.7531 * origin + 211.281
```

But with BnBAD, we can learn a much, much simpler model. First, we recursively cluster the data based on the three goal scores. For each leaf, we write down the mean goal scores. For example, for the first leaf (labelled `[0]`), the mean of `<weight, >acceleration, >!mpg` is `3452.97, 15.04, 20`.

    398
    | 211
    | | 121
    | | | 57
    | | | | 31 [0] {3452.97, 15.04, 20.00}** 26 %
    | | | | 26 [1] {3506.65, 18.31, 20.00}*** 33 %
    | | | 64
    | | | | 27 [2] {2637.00, 15.10, 20.00}**** 46 %
    | | | | 37 [3] {2979.89, 16.51, 20.00}**** 40 %
    | | 90
    | | | 48
    | | | | 21 [4] {4181.14, 14.19, 11.90}* 13 %
    | | | | 27 [5] {3838.85, 12.71, 19.26}** 20 %
    | | | 42
    | | | | 21 [6] {4180.90, 11.54, 13.81} 6 %
    | | | | 21 [7] {4665.00, 11.88, 10.00} 0 %
    | 187
    | | 104
    | | | 51
    | | | | 30 [8] {2343.73, 16.46, 29.33}****** 66 %
    | | | | 21 [9] {2514.90, 19.63, 27.14}******* 73 %
    | | | 53
    | | | | 18 [10] {2628.83, 15.37, 24.44}***** 53 %
    | | | | 35 [11] {2485.60, 14.44, 30.00}****** 60 %
    | | 83
    | | | 49
    | | | | 29 [12] {2045.55, 16.74, 30.00}******** 80 %
    | | | | 20 [13] {1977.90, 17.44, 31.00}******** 86 %
    | | | 34 [14] {2030.09, 17.05, 40.29}********* 93 %

The last leaf (labelled [14]) is "best" since it dominates 93% of the other nodes (where "dominates" is a measure of "better" across the goals).
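
One way to reproduce this kind of ranking is sketched below. It assumes the usual statement of Zitzler's continuous domination predicate, and it assumes that each leaf's percentage is the share of the 15 leaves that it dominates. The leaf means are copied from the tree above; the function names and the min-max normalization over leaf means (rather than over all rows) are assumptions, so the intermediate percentages need not match the ones shown.

```python
import math
from typing import List, Sequence, Tuple

# Mean (weight, acceleration, mpg) of leaves [0]..[14], copied from the tree.
LEAVES: List[Tuple[float, float, float]] = [
    (3452.97, 15.04, 20.00), (3506.65, 18.31, 20.00), (2637.00, 15.10, 20.00),
    (2979.89, 16.51, 20.00), (4181.14, 14.19, 11.90), (3838.85, 12.71, 19.26),
    (4180.90, 11.54, 13.81), (4665.00, 11.88, 10.00), (2343.73, 16.46, 29.33),
    (2514.90, 19.63, 27.14), (2628.83, 15.37, 24.44), (2485.60, 14.44, 30.00),
    (2045.55, 16.74, 30.00), (1977.90, 17.44, 31.00), (2030.09, 17.05, 40.29),
]
WEIGHTS = (-1, 1, 1)  # minimize weight; maximize acceleration and mpg


def dominates(a: Sequence[float], b: Sequence[float],
              weights: Sequence[int] = WEIGHTS) -> bool:
    """Continuous domination: a beats b if moving from a to b loses more."""
    n = len(weights)
    s1 = -sum(math.exp(w * (x - y) / n) for w, x, y in zip(weights, a, b))
    s2 = -sum(math.exp(w * (y - x) / n) for w, x, y in zip(weights, a, b))
    return s1 / n < s2 / n


def percent_dominated(leaves: Sequence[Sequence[float]]) -> List[int]:
    """For each leaf, the percentage of all leaves that it dominates."""
    lo = [min(col) for col in zip(*leaves)]
    hi = [max(col) for col in zip(*leaves)]
    # Min-max normalize each goal to 0..1 so the goals are comparable.
    normed = [[(v - l) / (h - l) for v, l, h in zip(leaf, lo, hi)]
              for leaf in leaves]
    return [int(100 * sum(dominates(a, b) for b in normed) / len(normed))
            for a in normed]


scores = percent_dominated(LEAVES)
best = max(range(len(scores)), key=scores.__getitem__)
worst = min(range(len(scores)), key=scores.__getitem__)
print(f"best leaf: [{best}] at {scores[best]} %")
print(f"worst leaf: [{worst}] at {scores[worst]} %")
```

Under these assumptions, leaf [14] comes out at 93% and leaf [7] at 0%, in line with the tree.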

BnBAD reports a decision list that shows how to select this best node from everything else:

    % best rule
    if cylinders in 4 .. 4 then {2270.90, 16.66, 29.70} 67
    else {3724.51, 14.17, 17.14} 63

Another thing we do is ask what takes us from the best leaf to the worst leaf (labelled [7]). That decision list is:

    if cylinders in 8 .. 8 then {4311.71, 11.57, 12.29} 35
    else {2267.46, 16.79, 31.92} 26

So what is being said here is that:

Eight cylinder cars are heavier and slower.
Four cylinder cars are lighter and more nimble.

Obvious, right? But here's the important thing: the number of cylinders affects everything else. That effect is very clear from the decision lists, but it is not clear from the regression equations.
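
To make the two decision lists concrete, here is a minimal sketch of how such a list might be represented and applied to a row. The `Rule` layout and the `predict` function are hypothetical, not BnBAD's own classes; the ranges and goal means are copied from the lists above.

```python
from typing import Dict, List, Tuple

Goals = Tuple[float, float, float]       # mean (weight, acceleration, mpg)
Rule = Tuple[str, float, float, Goals]   # attribute, lo, hi, goals if lo <= value <= hi

# "Best" list: what selects the best leaf from everything else.
BEST_RULES: List[Rule] = [("cylinders", 4, 4, (2270.90, 16.66, 29.70))]
BEST_DEFAULT: Goals = (3724.51, 14.17, 17.14)

# "Worst" list: what takes us toward the worst leaf.
WORST_RULES: List[Rule] = [("cylinders", 8, 8, (4311.71, 11.57, 12.29))]
WORST_DEFAULT: Goals = (2267.46, 16.79, 31.92)


def predict(row: Dict[str, float], rules: List[Rule], default: Goals) -> Goals:
    """Return the goals of the first rule whose range covers the row, else the default."""
    for attribute, lo, hi, goals in rules:
        if lo <= row[attribute] <= hi:
            return goals
    return default


four = {"cylinders": 4}
eight = {"cylinders": 8}
print("4 cylinders, best list :", predict(four, BEST_RULES, BEST_DEFAULT))
print("8 cylinders, best list :", predict(eight, BEST_RULES, BEST_DEFAULT))
print("8 cylinders, worst list:", predict(eight, WORST_RULES, WORST_DEFAULT))
```

Read this way, planning means moving a row into the range tested by the best list, and monitoring means watching for rows that drift into the range tested by the worst list.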

Classes

For information on the following design, read the docs.

Owner

Name: Tim Menzies
Login: timm
Kind: user
Location: Raleigh, North Carolina, USA
Company: CS, NC State, USA

Website: http://menzies.us
Twitter: timmenzies
Repositories: 202
Profile: https://github.com/timm

IEEE Fellow, prof, phd, computer scientist, ex-nurse, rocketman, taxi-driver, journalist (it all made sense at the time).

Citation (CITATION.md)

<img align=left width=300
src="https://live.staticflickr.com/1070/1430045001_7dd540ff1a_b.jpg">

# Cite as ...

T. Menzies,
_BnBAD:
Explicable Multi-objective Optimization_,
July, 2020

```bibtex
@article{timm:bnbad,
  title     = {BnBAD: Explicable Multi-objective Optimization},
  DOI       = {10.5281/zenodo.3947026}, 
  author    = {Tim Menzies}, 
  publisher = {Zenodo}, 
  year      = {2020}, 
  month     = {Jul}
}
```

Committers

Last synced: over 2 years ago

All Time

Total Commits: 255
Total Committers: 3
Avg Commits per committer: 85.0
Development Distribution Score (DDS): 0.106

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Tim Menzies	t**s@g**m	228
Tim Menzies	t**m@i**g	26
Tim Menzies	t****m	1

Committer Domains (Top 20 + Academic)

ieee.org: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Dependencies

requirements.txt pypi

autopep8 *
flake8 *
rerun *
termcolor *

setup.py pypi

rerun *
termcolor *
