DeepSynth

DeepSynth: Scaling Neural Program Synthesis with Distribution-based Search - Published in JOSS (2022)

https://github.com/nathanael-fijalkow/deepsynth

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Mathematics Computer Science - 84% confidence
Engineering Computer Science - 60% confidence
Artificial Intelligence and Machine Learning Computer Science - 60% confidence
Last synced: 4 months ago

Repository

General-purpose program synthesiser

Basic Info
  • Host: GitHub
  • Owner: nathanael-fijalkow
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 138 MB
Statistics
  • Stars: 47
  • Watchers: 6
  • Forks: 9
  • Open Issues: 0
  • Releases: 1
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
  • README
  • License

README.md

DeepSynth

DeepSynth is a general-purpose program synthesizer in the programming-by-example framework: the user provides a few examples as input-output pairs, and DeepSynth finds a program matching the examples.

See the official webpage

This is the repository for the code of the paper "Scaling Neural Program Synthesis with Distribution-based Search", published in the proceedings of the AAAI Conference on Artificial Intelligence (AAAI'22) and selected for an oral presentation.

The code was published in the Journal of Open Source Software.

Authors: Nathanaël Fijalkow, Guillaume Lagarde, Théo Matricon, Kevin Ellis, Pierre Ohlmann, Akarsh Potta

Overview

DeepSynth is a tool for automatically synthesizing programs from examples. It combines machine learning predictions with efficient enumeration techniques in a very generic way.

The following figure shows the pipeline.

  • The first step is to define a domain-specific language (DSL), a programming language specifically designed to solve particular tasks.
  • The second step is a compilation step: a context-free grammar (CFG) describing the set of all programs is compiled from the DSL and a number of syntactic constraints. The grammar is used to enumerate programs efficiently. However, the number of programs grows extremely fast with size, making program synthesis computationally very hard. We believe that the path to scalability is to leverage machine learning predictions in combination with efficient enumeration techniques.
  • The third step is to obtain predictions from the examples: a prediction model outputs predictions in the form of probabilities for the rules of the grammar, yielding a probabilistic context-free grammar (PCFG).
  • The fourth and final step is the search: enumerating programs from the PCFG. We introduced distribution-based search as a theoretical framework to analyse algorithms, and constructed two new algorithms: Heap Search and SQRT Sampling.

Usage

Installation

```bash
# clone this repository
git clone https://github.com/nathanael-fijalkow/DeepSynth.git
# create your new env
conda create -n deep_synth "python>=3.7"
# activate it
conda activate deep_synth
# install pip
yes | conda install pip
# install this package and the dependencies
pip install -r requirements.txt
# or to do it manually:
conda install -c conda-forge cython tqdm numpy matplotlib scipy
conda install -c pytorch "pytorch>=1.8"
pip install git+https://github.com/MaxHalford/vose
# for the flashfill dataset
pip install sexpdata
# if you want to run the parallel experiments
pip install ray
# if you run into "ValueError: numpy.ufunc size changed"
pip install --upgrade numpy
# You are good to go :)
# To test your installation, you can run the following tests:
python unit_tests_algorithms.py
python unit_tests_programs.py
python unit_tests_predictions.py
# only if you installed ray:
python unit_tests_parallel.py
```

File structure

```bash
./
    Algorithms/     # the search algorithms + parallel pipeline
    DSL/            # DSL: dreamcoder, deepcoder, flashfill
    list_dataset/   # DreamCoder dataset in pickle format
    Predictions/    # all files related to the ANN for prediction of the grammars
```

Documentation and examples

Table of contents:

  • Creating a DSL
  • Compiling a DSL into a CFG
  • From a CFG to a PCFG
  • Synthesis
  • Parallelisation
  • Model Creation
  • Model training
  • Predictions from a model
  • Reproducing the experiments from the AAAI'22 paper
  • How to download the DeepCoder dataset

Creating a DSL

The folder DSL contains some domain-specific languages (DSLs); you can use these or create your own. The DSL we use for the dreamcoder dataset is based on the DeepCoder paper; it is defined in DSL/deepcoder.py. It contains two important objects: primitive_types and semantics. The former (primitive_types) is a dictionary mapping the primitive names of the DSL to their types, and the latter (semantics) maps the primitive names to their semantics, which are either values or functions to be used when evaluating the primitives. Note that a primitive can simply be a constant such as 0, 1, 2 (see the list.py DSL for an example).

Let us first discuss the type system, imported with from type_system import *. The atomic types, called PrimitiveType, are INT and BOOL; you can create your own with PrimitiveType(type_name). There are two constructors: List(type) constructs list types, and Arrow(a, b) represents a function from a to b. For instance, the type of + is Arrow(INT, Arrow(INT, INT)). Note that this is different from Arrow(Arrow(INT, INT), INT). We support polymorphic types: PolymorphicType(name). For example, with t0 = PolymorphicType("t0") and t1 = PolymorphicType("t1"), the type Arrow(t0, Arrow(t1, t0)) can be instantiated into Arrow(INT, Arrow(INT, INT)) but not into Arrow(INT, Arrow(INT, BOOL)), since both occurrences of t0 must correspond to the same type.
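As a small illustration of these constructors (a sketch; the names are the ones quoted above from type_system):

```python
from type_system import *  # provides INT, BOOL, PrimitiveType, List, Arrow, PolymorphicType

STRING = PrimitiveType("string")           # a custom atomic type
int_list = List(INT)                       # the type of lists of integers
plus_type = Arrow(INT, Arrow(INT, INT))    # the curried type of +

t0 = PolymorphicType("t0")
t1 = PolymorphicType("t1")
map_type = Arrow(Arrow(t0, t1), Arrow(List(t0), List(t1)))  # a polymorphic map type
```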

One important point: when defining the semantics of functions, Python must be able to evaluate them one argument at a time. For instance, addition is lambda a: lambda b: a + b, not lambda a, b: a + b.

The optional no_repetitions is a set of primitives that cannot be repeated, such as SORT (indeed, it is useless to sort an already sorted list). This reduces the program space.

To create a DSL object we use the following: dsl = DSL(primitive_types, semantics, no_repetitions) (no_repetitions can be None).
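Putting the pieces together, here is a minimal sketch of a tiny arithmetic DSL; the import path for the DSL class is an assumption, the rest follows the conventions above:

```python
from type_system import *  # INT, Arrow, ...
from dsl import DSL        # assumed import path for the DSL class

# Primitive names mapped to their (curried) types.
primitive_types = {
    "+": Arrow(INT, Arrow(INT, INT)),
    "0": INT,  # a constant primitive
    "1": INT,
}

# Primitive names mapped to their semantics: curried functions or plain values.
semantics = {
    "+": lambda a: lambda b: a + b,
    "0": 0,
    "1": 1,
}

dsl = DSL(primitive_types, semantics, None)  # no_repetitions can be None
```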

Compiling a DSL into a CFG

We can build a context-free grammar (CFG) from a DSL using the following: dsl.DSL_to_CFG(type_request, max_program_depth=4, min_variable_depth=1, upper_bound_type_size=10, n_gram=2). The only required field is type_request; it defines the type of the program you want to synthesize. For example, if the goal is to synthesize the multiplication function, then the type request would be Arrow(INT, Arrow(INT, INT)).

The max_program_depth: int gives an upper bound on the depth of the programs. The min_variable_depth: int gives a lower bound on the depth of all variables in a program. The upper_bound_type_size: int is used when instantiating polymorphic types. The type size is defined as follows: a PrimitiveType has size 1, a List(a) has size 1 + size(a), and an Arrow(a, b) has size 1 + size(a) + size(b); for instance, Arrow(INT, Arrow(INT, INT)) has size 5. The n_gram: int chooses the granularity of the CFG, meaning how many preceding primitives are used to choose the next primitive. The two expected values are 1 and 2.
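For concreteness, here is a sketch of the compilation call with the defaults spelled out (dsl is the DSL built above):

```python
from type_system import *  # INT, Arrow, ...

# Type request for a program of type INT -> INT -> INT.
type_request = Arrow(INT, Arrow(INT, INT))

cfg = dsl.DSL_to_CFG(
    type_request,
    max_program_depth=4,       # upper bound on the depth of programs
    min_variable_depth=1,      # lower bound on the depth of variables
    upper_bound_type_size=10,  # cap on type size when instantiating polymorphic types
    n_gram=2,                  # how many preceding primitives condition the next one
)
```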

From a CFG to a PCFG

To turn a CFG into a probabilistic CFG (PCFG), we need to add probabilities to the derivation rules. The expected way to do that is by training a neural network; we'll discuss that later. For testing purposes there are two easier ways to get a PCFG:

  • the uniform PCFG can be obtained using cfg.CFG_to_Uniform_PCFG();
  • a random PCFG with cfg.CFG_to_Random_PCFG(alpha=0.7). The parameter alpha serves as a temperature: the larger it is, the closer to uniform.
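For example (both constructions as listed above; cfg is the grammar compiled earlier):

```python
pcfg = cfg.CFG_to_Uniform_PCFG()          # uniform probabilities
# or, with alpha acting as a temperature:
pcfg = cfg.CFG_to_Random_PCFG(alpha=0.7)
```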

To see some programs generated by the PCFG you can run the following code:

```python
for program in pcfg.sampling():
    print(program)
# Note that this is an infinite loop
```

Synthesis

We can now solve a programming-by-example task. Let us consider the following: the type request is Arrow(INT, Arrow(INT, INT)) and the set of examples is:

```python
examples = [
    ([1, 1], 3),
    ([10, 0], 13),
    ([0, 2], 1),
]
```

A solution program is lambda x: lambda y: x - y + 3; there could be others (more complicated).

Following the above, let us assume we have constructed a DSL and a PCFG. We will use the following function from run_experiment.py:

```python
run_algorithm(is_correct_program: Callable[[Program, bool], bool], pcfg: PCFG, algo_index: int)
```

Let us explain the three arguments:

  • is_correct_program is a function that checks whether a program is correct; it can easily be created with the help of experiment_helper.py, which provides the following function:

```python
make_program_checker(dsl: DSL, examples) -> Callable[[Program, bool], bool]
```

  • pcfg is the PCFG
  • algo_index is the index of the algorithm to use for the search. Here is the mapping:

```python
0 => Heap Search
1 => SQRT
2 => Threshold
3 => Sort & Add
4 => DFS
5 => BFS
6 => A*
```

We can further tune three parameters in run_experiment.py:

  • timeout: int = 100 is the timeout in seconds before the search is stopped;
  • total_number_programs: int = 1_000_000 is the maximum number of programs enumerated before the search is stopped. On a personal computer this is about 30 seconds of search for Heap Search;
  • use_heap_search_cached_eval = True, relevant only for Heap Search, enables caching of evaluations of partial programs and thus provides much faster evaluation at the cost of additional memory.

Now, run_algorithm returns a tuple: program, search_time, evaluation_time, nb_programs, cumulative_probability, probability. Times are in seconds. program is None if no solution was found. probability is the probability of the solution program if one was found, and the probability of the latest enumerated program otherwise.
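Putting it together, an end-to-end sketch using only the names documented above (dsl, examples and pcfg are the objects built earlier):

```python
from run_experiment import run_algorithm
from experiment_helper import make_program_checker

is_correct_program = make_program_checker(dsl, examples)

# 0 => Heap Search (see the mapping above)
program, search_time, evaluation_time, nb_programs, cumulative_probability, probability = \
    run_algorithm(is_correct_program, pcfg, 0)

if program is not None:
    print("Found", program, "in", search_time, "seconds")
```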

Parallelisation

run_experiment.py also contains the parallel variant run_algorithm_parallel, which takes additional arguments:

  • splits: int the number of splits of the grammar
  • n_filters: int = 4 the number of evaluator threads (one CPU per evaluator is required)
  • transfer_queue_size: int = 500_000 the size of the queue between enumerators and evaluators
  • transfer_batch_size: int = 10 the size of batches transferred from enumerators to the queue and from the queue to evaluators

The output is the same, except that some metrics are now lists, with one entry per enumerator or per evaluator.
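As a rough sketch only (this README does not spell out the full signature, so the leading arguments are assumed to mirror run_algorithm):

```python
from run_experiment import run_algorithm_parallel

results = run_algorithm_parallel(
    is_correct_program, pcfg, 0,  # assumed to mirror run_algorithm
    splits=4,                     # number of splits of the grammar
    n_filters=4,                  # number of evaluator threads
    transfer_queue_size=500_000,
    transfer_batch_size=10,
)
```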

Up to now we have not used machine-learned models. Let us get to the machine learning part.

Model Creation

The file model_loader.py contains generic functions for the two types of models: int list models and generic models. The former support only one type request, while the latter support generic type requests.

For instance, the function build_deepcoder_generic_model creates a BigGramsPredictor. There are a few parameters for our model:

  • nb_arguments_max is the maximum number of arguments a function can have
  • lexicon is the list of all symbols that can be encountered in examples; it is used in the first step to create an integer encoding of the examples
  • embedding_output_dimension is the embedding size of an example; each example is embedded using a classic torch embedding
  • size_hidden is the size of the inner and output layers of the RNN, which is run sequentially on all examples of a task
  • number_layers_RNN is the number of layers of the RNN
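As a purely illustrative sketch, a hypothetical instantiation with the parameters listed above (only the parameter names come from this README; the values and the constructor call are placeholders, the real code lives in model_loader.py and Predictions/):

```python
# Hypothetical values; only the parameter names come from the list above.
model = BigGramsPredictor(
    nb_arguments_max=3,
    lexicon=list(range(-256, 257)),  # all symbols that may appear in examples
    embedding_output_dimension=10,
    size_hidden=64,
    number_layers_RNN=1,
)
```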

Model training

The reference file is produce_network.py. After loading a model, it generates valid tasks together with their solutions and trains the model on these tasks. The model is saved at each epoch. The part you might want to change according to your needs is:

```python
dataset_name = "dreamcoder"
# dataset_name = "deepcoder"
# dataset_name = "flashfill"

# Set to None for generic model
type_request = Arrow(List(INT), List(INT))
# type_request = None

dataset_size: int = 10000
nb_epochs: int = 1
batch_size: int = 128

# TRAINING
if dataset_name == "dreamcoder":
    cur_dsl, cfg, model = build_dreamcoder_intlist_model()
elif dataset_name == "deepcoder":
    if type_request is None:
        _, type_requests = \
            deepcoder_dataset_loader.load_tasks("./deepcoder_dataset/T=3_test.json")
        cur_dsl, cfg_dict, model = build_deepcoder_generic_model(type_requests)
    else:
        cur_dsl, cfg, model = build_deepcoder_intlist_model()
elif dataset_name == "flashfill":
    cur_dsl, cfg_dict, model = build_flashfill_generic_model()
else:
    assert False, f"Unrecognized dataset: {dataset_name}"

print("Training model:", get_model_name(model), "on", dataset_name)
print("Type Request:", type_request or "generic")

if type_request:
    nb_examples_max: int = 2
else:
    nb_examples_max: int = 5
```

Predictions from a model

Now that we have a model, let us see how to use it to construct a PCFG. This is model-dependent, but the good news is that the function task_set2dataset from experiment_helper.py does it for us. The arguments are:

  • tasks is a list containing, for each task, the list of (input, output) tuples that are the examples of the task
  • model is the model
  • dsl is the DSL

The output is for each task the tuple (task_name, PCFG_predicted_by_model, is_correct_program).
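A sketch of the call (the tuple layout is as documented above; the task data is made up):

```python
from experiment_helper import task_set2dataset

# One task whose single example maps the input [1, 2, 3] to the output [3, 2, 1].
tasks = [[([1, 2, 3], [3, 2, 1])]]

for task_name, pcfg_predicted_by_model, is_correct_program in task_set2dataset(tasks, model, dsl):
    # reuse the search entry point from the Synthesis section
    program, *metrics = run_algorithm(is_correct_program, pcfg_predicted_by_model, 0)
```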

Reproducing the experiments from the AAAI'22 paper

For the experiments, you only need to run the produce_network.py file (editing the parameters depending on the dataset or the batch size). A .weights file should appear in the root folder. This trains a neural network on randomly generated programs, as described in Appendix F of the paper.

All of the files mentioned in this section are located in the root folder and follow the pattern run_*_experiments*.py. Here is a short summary of each experiment:

  • run_random_PCFGsearch.py produces a list of all programs generated within X seconds of search time by each algorithm.
  • run_random_PCFGsearch_parallel.py runs the same experiment but with the grammar_splitter and multiple CPUs.
  • run_experiments_<dataset>.py tries to find solutions using an ANN to predict the grammar and, for each algorithm, logs the search data for the corresponding <dataset>. A parallel suffix can also be found, indicating that the algorithms are run in parallel. The semantics experiments in the paper used a trained model that can be obtained using produce_network.py or directly from the repository. The results can be plotted using plot_results_semantics.py.

Note that for the DreamCoder experiment in our paper, we did not use the cached evaluation of Heap Search; this can be reproduced by setting use_heap_search_cached_eval to False in run_experiment.py.

How to download the DeepCoder dataset

First, download the archive from the DeepCoder repo (https://storage.googleapis.com/deepcoder/dataset.tar.gz) into a folder deepcoder_dataset at the root of DeepSynth. Then you simply need to:

```bash
gunzip dataset.tar.gz
tar -xf dataset.tar
```

You should see a few JSON files.

Report an issue

If you find any issue, please create a GitHub issue with specific steps to reproduce the bug.

Contribute

Contributions are welcome! However, feature-wise we believe DeepSynth is in a maintenance state. Please first create an issue describing what your contribution would be about; then you can create a pull request.

Owner

  • Name: Nathanaël Fijalkow
  • Login: nathanael-fijalkow
  • Kind: user
  • Location: Bordeaux, France
  • Company: CNRS

Computer science researcher working on program synthesis

JOSS Publication

DeepSynth: Scaling Neural Program Synthesis with Distribution-based Search
Published
October 16, 2022
Volume 7, Issue 78, Page 4151
Authors
Théo Matricon
CNRS, LaBRI and Université de Bordeaux, France
Nathanaël Fijalkow
CNRS, LaBRI and Université de Bordeaux, France, The Alan Turing Institute of data science, United Kingdom
Guillaume Lagarde
CNRS, LaBRI and Université de Bordeaux, France
Kevin Ellis
Cornell University, United States
Editor
Arfon Smith
Tags
program synthesis · programming by example · neuro-symbolic methods · parallel search procedures

GitHub Events

Total
  • Issues event: 2
  • Watch event: 13
  • Issue comment event: 1
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 4
  • Fork event: 2
Last Year
  • Issues event: 2
  • Watch event: 13
  • Issue comment event: 1
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 4
  • Fork event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 294
  • Total Committers: 6
  • Avg Commits per committer: 49.0
  • Development Distribution Score (DDS): 0.231
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Theomat (t****n@g****m): 226 commits
  • Nathanael Fijalkow (n****w@g****m): 50 commits
  • Guillaume Lagarde (g****e@g****m): 8 commits
  • ellisk42@gmail.com (e****2@g****m): 6 commits
  • Alex Bezzubov (b****z@a****g): 3 commits
  • Francesco Compagno (4****h): 1 commit

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 2
  • Total pull requests: 4
  • Average time to close issues: 11 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 2
  • Total pull request authors: 4
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: 43 minutes
  • Average time to close pull requests: 39 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kataph (1)
  • njuaplusplus (1)
Pull Request Authors
  • kataph (2)
  • bzz (1)
  • arfon (1)
  • ellisk42 (1)

Dependencies

requirements.txt pypi
  • Cython ==0.29.32
  • Pillow ==9.2.0
  • contourpy ==1.0.5
  • cycler ==0.11.0
  • fonttools ==4.37.4
  • kiwisolver ==1.4.4
  • matplotlib ==3.6.0
  • numpy ==1.23.3
  • packaging ==21.3
  • pyparsing ==3.0.9
  • python-dateutil ==2.8.2
  • scipy ==1.9.1
  • six ==1.16.0
  • torch >=1.8
  • tqdm ==4.64.1
  • typing_extensions ==4.4.0