dfa_identify

Python library for identifying/learning DFAs from labeled examples by reduction to SAT.

https://github.com/mvcisback/dfa-identify

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com
  • Committers with academic emails
    2 of 3 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

automata dfa learning sat
Last synced: 6 months ago · JSON representation ·

Repository

Python library for identifying/learning DFAs from labeled examples by reduction to SAT.

Basic Info
  • Host: GitHub
  • Owner: mvcisback
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 275 KB
Statistics
  • Stars: 8
  • Watchers: 3
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
automata dfa learning sat
Created almost 5 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

dfa-identify

Python library for identifying (learning) minimal DFAs from labeled examples by reduction to SAT.

Build Status PyPI version License: MIT

Table of Contents

Installation

If you just need to use dfa, you can just run:

$ pip install dfa

For developers, note that this project uses the poetry python package/dependency management tool. Please familarize yourself with it and then run:

$ poetry install

Usage

dfa_identify is centered around the find_dfa and find_dfas function. Both take in sequences of accepting and rejecting "words", where are word is a sequence of arbitrary python objects.

  1. find_dfas returns all minimally sized (no DFAs exist of size smaller) consistent with the given labeled data.

  2. find_dfa returns an arbitrary (first) minimally sized DFA.

The returned DFA object is from the dfa library.

```python from dfaidentify import finddfa

accepting = ['a', 'abaa', 'bb'] rejecting = ['abb', 'b']

mydfa = finddfa(accepting=accepting, rejecting=rejecting)

assert all(mydfa.label(x) for x in accepting) assert all(not mydfa.label(x) for x in rejecting) ```

Because words are sequences of arbitrary python objects, the identification problem, with a ↦ 0 and b ↦ 1, is given below:

```python accepting = [[0], [0, 'z', 0, 0], ['z', 'z']] rejecting = [[0, 'z', 'z'], ['z']]

mydfa = finddfa(accepting=accepting, rejecting=rejecting) ```

Active learning

There are also active variants of find_dfa and find_dfas called find_dfa_active and find_dfas_active resp.

An example from the unit tests:

```python import dfa from dfaidentify import finddfa_active

def oracle(word): return sum(word) % 2 == 0

lang = finddfaactive(alphabet=[0, 1], oracle=oracle, n_queries=20) assert lang == dfa.DFA( inputs=[0,1], label=lambda s: s, transition=lambda s, c: s ^ bool(c), start=True )

Can also send in positive and negative examples:

lang = finddfaactive(alphabet=[0, 1], positive=[(0,), (0,0)], negative=[(1,), (1,0)], oracle=oracle, n_queries=20)

```

Learning Decomposed DFAs

The is also support for learning decomposed DFAs following, Learning Deterministic Finite Automata Decompositions from Examples and Demonstrations. FMCAD`22. These are tuples of DFAs whose labels are combined to determine the acceptence / rejection of string, e.g., via conjunction or disjunction.

Similar to learning a monolithic dfa, this functionality can be accessed using find_decomposed_dfas. For example:

```python from dfaidentify import finddecomposed_dfas accepting = ['y', 'yy', 'gy', 'bgy', 'bbgy', 'bggy'] rejecting = ['', 'r', 'ry', 'by', 'yr', 'gr', 'rr', 'rry', 'rygy']

--------------------------------------

1. Learn a disjunctive decomposition.

--------------------------------------

gendfas = finddecomposeddfas(accepting=accepting, rejecting=rejecting, ndfas=2, orderbystutter=True, decomposevia="disjunction") dfas = next(gendfas)

monolithic = dfas[0] | dfas[1] # Build DFA that is the union of languages. assert all(monolithic.label(w) for w in accepting) assert not any(monolithic.label(w) for w in rejecting)

Each dfa must reject a rejecting string.

assert all(all(~d.label(w) for d in dfas) for w in rejecting)

At least one dfa must accept an accepting string.

assert all(any(d.label(w) for d in dfas) for w in accepting)

--------------------------------------

2. Learn a conjunctive decomposition.

--------------------------------------

gendfas = finddecomposeddfas(accepting=accepting, rejecting=rejecting, ndfas=2, orderbystutter=True, decomposevia="conjunction") dfas = next(gendfas)

monolithic = dfas[0] & dfas[1] # Build DFA that is the union of languages. assert all(monolithic.label(w) for w in accepting) assert not any(monolithic.label(w) for w in rejecting)

```

Minimality

There are two forms of "minimality" supported by dfa-identify.

  1. By default, dfa-identify returns DFAs that have the minimum number of states required to seperate the accepting and rejecting set.
  2. If the order_by_stutter flag is set to True, then the find_dfas (lazily) orders the DFAs so that the number of self loops (stuttering transitions) appearing the DFAs decreases. find_dfa thus returns a DFA with the most number of self loops given the minimal number of states.

Encoding

This library currently uses the encodings outlined in Heule, Marijn JH, and Sicco Verwer. "Exact DFA identification using SAT solvers." International Colloquium on Grammatical Inference. Springer, Berlin, Heidelberg, 2010. and Ulyantsev, Vladimir, Ilya Zakirzyanov, and Anatoly Shalyto. "Symmetry Breaking Predicates for SAT-based DFA Identification.".

The key difference is in the use of the symmetry breaking clauses. Two kinds are exposed.

  1. clique (Heule 2010): Partially breaks symmetries by analyzing conflict graph.
  2. bfs (Ulyantsev 2016): Breaks all symmetries so that each model corresponds to a unique DFA.

Goals and related libraries

There are many other python libraries that perform DFA and other automata inference.

  1. DFA-Inductor-py - State of the art passive inference via reduction to SAT (as of 2019).
  2. z3gi: Uses SMT backed passive learning algorithm.
  3. lstar: Active learning algorithm based L* derivative.

The primary goal of this library is to loosely track the state of the art in passive SAT based inference while providing a simple implementation and API.

Owner

  • Name: Marcell Vazquez-Chanlatte
  • Login: mvcisback
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Vazquez-Chanlatte"
  given-names: "Marcell"
  orcid: "https://orcid.org/0000-0002-1248-0000"
- family-names: "Lee"
  given-names: "Vint"
- family-names: "Shah"
  given-names: "Ameesh"
title: "dfa-identify"
version: 1
date-released: 2021-08-02
url: "https://github.com/mvcisback/dfa-identify"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 113
  • Total Committers: 3
  • Avg Commits per committer: 37.667
  • Development Distribution Score (DDS): 0.142
Top Committers
Name Email Commits
Marcell Vazquez-Chanlatte m****c@l****m 97
vint v****t@b****u 10
ameesh a****7@r****u 6
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 13
  • Average time to close issues: 10 days
  • Average time to close pull requests: about 2 months
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 7.0
  • Average comments per pull request: 0.15
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • alexeyignatiev (1)
Pull Request Authors
  • mvcisback (5)
  • vint-1 (4)
  • niklaslauffer (3)
  • ameesh-shah (3)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 184 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 34
  • Total maintainers: 1
pypi.org: dfa_identify

Python library for identifying (learning) DFAs (automata) from labeled examples.

  • Versions: 34
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 184 Last month
Rankings
Dependent packages count: 9.8%
Downloads: 13.1%
Average: 29.2%
Dependent repos count: 64.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • flake8 4.0.1 develop
  • mccabe 0.6.1 develop
  • pycodestyle 2.8.0 develop
  • pydot 1.4.2 develop
  • pyflakes 2.4.0 develop
  • pytest-flake8 1.0.7 develop
  • atomicwrites 1.4.0
  • attrs 21.2.0
  • bidict 0.21.4
  • colorama 0.4.4
  • dfa 4.0.0
  • execnet 1.9.0
  • funcy 1.16
  • iniconfig 1.1.1
  • lazytree 0.3.3
  • more-itertools 8.12.0
  • networkx 2.6.3
  • packaging 21.0
  • pluggy 1.0.0
  • py 1.11.0
  • pyparsing 3.0.4
  • pytest 6.2.5
  • pytest-forked 1.3.0
  • pytest-xdist 2.4.0
  • python-sat 0.1.7.dev11
  • six 1.16.0
  • toml 0.10.2
pyproject.toml pypi
  • pydot ^1.4.2 develop
  • pytest ^6.2.3 develop
  • pytest-flake8 ^1.0.7 develop
  • attrs ^21.0.0
  • bidict ^0.21.2
  • dfa ^4
  • funcy ^1.15
  • more-itertools ^8.12.0
  • networkx ^2.6
  • python ^3.9
  • python-sat ^0.1.7.dev11