deepisahol

Official repository of the DeepIsaHOL project (number: 101102608) titled Reinforcement learning to improve proof-automation in theorem proving

https://github.com/yonoteam/deepisahol

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary
Last synced: 10 months ago · JSON representation ·

Repository

Official repository of the DeepIsaHOL project (number: 101102608) titled Reinforcement learning to improve proof-automation in theorem proving

Basic Info
  • Host: GitHub
  • Owner: yonoteam
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Size: 31.7 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

DeepIsaHOL

This is the repository of the DeepIsaHOL project. It is supported by the DeepIsaHOL MSCA Fellowship (number: 101102608) titled Reinforcement learning to improve proof-automation in theorem proving. The project's long-term objective is to use the Isabelle proof assistant as a reinforcement learning (RL) environment for training RL algorithms that ultimately serve as tools for Isabelle users.

The project currently offers: 1. proof data retrieving capabilities from Isabelle libraries 2. a read-eval-print-loop interface for Isabelle in Scala and Python 3. a Python training loop for Hugging Face T5 models for conditional generation 4. a simple dfs-algorithm enabling the models to automatically prove some Isabelle theorems

Additionally, generated data examples, weights of the T5 models trained with this repository, and logs and examples of the proofs generated by the models are available in this Zenodo repository from March, 2025.

Table of Contents

Instalation

  1. Prerequisites:
    • The Isabelle2025 proof assistant
    • Optionally (see 6. below): Its Archive of Formal Proofs (AFP)
    • Dominique Unruh's scala-isabelle updated library for Isabelle2025 (see "Scala level" below).
    • The sbt build tool for Java and Scala.
    • A Python environment with the packages listed in this project's ./src/main/python/requirments.txt. In particular:
    • For connecting Python to Isabelle, this repository uses the Py4J library (see "Python level" below).
    • For training and evaluating models, it uses Hugging Face's openly available Transformers and Accelerate libraries, both depending on PyTorch.
  2. Clone this repository.
  3. Adapt this project's build.sbt file to your need (e.g. correct the location of scala-isabelle).
  4. Adapt this project's directories.scala to your needs. Specifically, you will need to update the location of this project's src/main/ml/Isabelle_RL.thy file and paste it to isabelle_rl, and the location of your Isabelle application to isabelle_app.
  5. Compile this project's Scala sources.
  6. If you wish to extract data from a specific AFP entry, ensure that:
    • You can make Isabelle aware of your AFP location
    • Your AFP entry (or Isabelle library) has a ROOT file (or ROOTS file), with a header resembling the snippet below (see Isabelle's sessions and built management in Isabelle's documentation). sml session logic_name (AFP) = "parent_logic" + options [timeout = 1800] theories (* these correspond to .thy files in the entry's directory *) theory_file_name1 theory_file_name2 document_files "root.tex" "root.bib"
    • Or call the project's functions from the location of your entry.

Data extraction

This project's main.scala extracts data from a directory with Isabelle files.

One can call this functionality via sbt from this repository's top directory: sbt "run /path/to/a/read/directory/ /path/to/a/writing/directory/"

Alternatively, you can use writer.scala or writer.python for interactively extracting data via the corresponding REPLs.

Scala:

```scala scala> import isabellerl.

scala> import scala.sys.process._

// For this example, let us extract data from the Isabelle session on lineal temporal logic (LTL). scala> val logic = "LTL"

scala> val data_extractor = new Writer("/path/to/your/afp/thys/LTL", "/path/to/your/data/output/dir/", logic) /* (replies with) Imports: Adding local /path/to/your/afp/thys/LTL/example/Example.thy Imports: Adding local /path/to/your/afp/thys/LTL/example/Rewriting.thy ... Initialised minion for directory: /path/to/your/afp/thys/LTL */

// Extract data from the LTL theory file EquivalenceRelations.thy scala> dataextractor.writedata("EquivalenceRelations.thy")

// Check that data is extracted scala> "ls /path/to/your/data/output/dir/Equivalence_Relations".! /* (replies with) proof0.json proof1.json proof10.json proof11.json ... */

// Alternatively, extract data from all .thy files in the directory scala> dataextractor.writeall() /* (replies with) Creating proofs for /path/to/your/afp/thys/LTL/LTL.thy Creating proofs for /path/to/your/afp/thys/LTL/Equivalence_Relations.thy ... */

// Close the Isabelle process scala> data_extractor.shutdown() ```

Python:

Enable a Scala port in a command-line window via sbt "runMain isabelle_rl.Py4j_Gateway_Main" (from this repository's top directory), then:

```python

import sys sys.path.append('/path/to/this/projects/src/main/python') from writer import Writer

dataextractor = Writer('/path/to/your/read/directory/with/isabelle/files/', '/path/to/your/data/output/dir', 'thelogicnameintheroot_file')

dataextractor.writedata("Equivalence_Relations") # same interface in Python as in Scala

dataextractor.writeall() # you can see the process working in the Scala window

dataextractor.shutdowngateway() # close the Isabelle and Scala processes ```

In case of connections issues, you can manage your ports in the (automatically created) ports.json at the top directory of this repository.

Data interaction

You can see the REPL's methods in this project's /src/main/scala/repl.scala. For an example of how these methods are used, you can check the script /src/main/python/dfs.py that uses T5 models for interacting with Isabelle via the Python REPL. The functionality is analogous to that of the Writer object above.

Scala:

```scala scala> import isabellerl.

scala> repl = new REPL("HOL") // see Isabelle's documentation for more logic options REPL started! val repl: isabellerl.REPL = isabellerl.REPL@7274a164

scala> repl.apply("lemma \"\x. P x \ P c\"") val res0: String = proof (prove) goal (1 subgoal): 1. <forall>x. P x <Longrightarrow> P c

scala> repl.apply("apply blast") val res3: String = proof (prove) goal: No subgoals!

scala> repl.apply("done") val res4: String = ""

scala> repl.shutdown_isabelle() // close the Isabelle process ```

Python:

Again, start a Scala port in a command-line window via sbt "runMain isabelle_rl.Py4j_Gateway_Main" (from this repository's top directory), then run a Python session:

```python

import sys sys.path.append('/path/to/this/project/src/main/python') from repl import REPL

repl = REPL("HOL") REPL and minion initialized.

repl.apply("lemma \"\x. P x \ P c\"") 'proof (prove) goal (1 subgoal): 1. \x. P x \ P c'

repl.apply("apply simp") 'proof (prove) goal: No subgoals!'

repl.apply("done") ''

repl.shutdown_gateway() # do not forget to close the Isabelle process ```

Implementation details

How it works?

The project has three levels: Isabelle, Scala and Python. They are linearly ordered with Isabelle being the lowest level, and Python being the highest. The Isabelle proof assistant is in charge of retrieving all data from its sources. The other 2 levels interact with functions, structures and/or methods from the level immediately below. Each level satisfies a need that the previous level cannot. Scala enables a programmatic interaction with Isabelle while also having more features than the Isabelle/ML programming language. Python enables the interaction with popular machine learning, deep learning, and reinforcement learning libaries.

How to explore?

Isabelle level

You can use Isabelle's jEdit prover IDE (PIDE) to experiment with this project's ML libraries. For instance, this project's file get.ML includes various functions for retrieving specific data from the proof assistant. You can open a .thy file (e.g. Temp.thy) with Isabelle and see the result of get.ML functions by giving arguments to them inside Temp.thy and seeing the result in the PIDE's output panel:

``` theory "Temp" imports "Complex_Main"

begin

MLfile "~/path/to/this/project/src/main/ml/pred.ML" MLfile "~/path/to/this/project/src/main/ml/ops.ML" ML_file "~/path/to/this/project/src/main/ml/get.ML"

ML ‹Get.thms <^context> "listall2eq"›

ML ‹Get.grouped_commands <^theory>›

(* Returns a list of names and their corresponding list of theorems proved up to this point where the name contains the word "List" and the associated list has only one theorem *) ML ‹val listthms = Get.filteredthms [Pred.onfst (Get.passesfactcheck <^context>), Pred.neg (Pred.onsnd Pred.hasmany), Pred.onfst (Pred.contains "List")] <^context>›

end ```

Scala level

This level is handled by scala-isabelle. The project provides a minion.scala that receives a working directory and manages Isabelle's theory stack via our imports.scala graph. Then, the writer.scala asks this minion to call the writer.ML function extract_jsons to produce a string of json's with proof data from a .thy file. Alternatively, the writer's write_all uses the minion to retrieve the data and write it to a directory of your choice. Similarly, the repl.scala delegates its functions to the minion.scala who has access to a repl_state.ML model from this project's sources.

Python level

Finally, the project provides Python classes in writer.py and repl.py with the same functionality as their Scala analogs (writer.scala and repl.scala). In the background, these classes simply ask their Scala counterparts to execute the requested commands via Py4j.

Repository structure

Warning: The description below might be outdated. It is here to aid in the navigation for new library users.

bash . ├── README.md ├── build.sbt └── src # sources for the above-listed implementations ├── main │   ├── ml │   │   ├── Isabelle_RL.thy # Theory importing this project's ML libraries │   │   ├── ROOT # Corresponding ROOT file for this project │   │   ├── actions.ML # Isabelle's top-level representation of user-actions │   │   ├── data.ML # definition of Data.T grouping retrieved data │   │   ├── g2tac_formatter.ML │   │   ├── get.ML # methods for retrieving Isabelle data │   │   ├── imports.ML │   │   ├── json_maker.ML # methods formatting the generated JSON files │   │   ├── ops.ML # frequently used operations │   │   ├── pred.ML # frequently used operations using predicates │   │   ├── print.ML # data displying methods │   │   ├── repl_state.ML # model for the REPLs' state │   │   ├── sections.ML # methods for identifying sections of a .thy file │   │   ├── seps.ML # .thy file section-delimiter methods based on keywords │   │   ├── temp.ML │   │   └── writer.ML # methods for writing the JSON objects │   ├── python | │ ├── config_ops.py # for the configuration file of the training of models │   │   ├── data_clumper.py │   │   ├── dfs.py # depth-first-search loop for automated theorem proving │   │   ├── dicts.py # operations manipulating Python dictionaries │   │   ├── distrib.py # operations for distributed model training and evaluation │   │   ├── eval_t5.py # evaluation loops for the LLMs │   │   ├── ops.py # frequently used operations │   │   ├── proofs.py │   │   │   ├── __init__.py # methods for data-retrieval from the generated proof-JSONs │   │   │   ├── data_dir.py # methods on the generated proof-JSONs directory │   │   │   ├── data_stats.py # statistics from the generated proof-JSONs │   │   │   └── str_ops.py # string operations on the generated proof-JSONs │   │   ├── repl.py # implementation of the Python REPL │   │   ├── requiremnts.txt # dependencies for this project's Python scripts │   │   ├── save_ops.py # methods for saving tokenizers, datasets, and models │   │   ├── tokenizer_ops.py # frequently used tokenizer operations │   │   ├── train_config.json # template of the input training configuration file │   │   ├── train_t5.py # training loop for the models │   │   └── writer.py # interface with the data-extraction algorithm │   └── scala │   ├── directories.scala # register of important directories for this repository │   ├── graph.scala # a generic graph model │   ├── imports.scala # graph for managing Isabelle sessions │   ├── main.scala # user-entry to the data-extraction algorithm │   ├── minion.scala # module in charge of managing Isabelle from Scala │   ├── py4j_gateway.scala # the interface between Scala and Python │   ├── repl.scala # implementation of the Scala REPL │   ├── utils.scala # frequently used operations │   └── writer.scala # implementation of the data-extraction algorithm └── tests # testing files for the methods in the main directory ├── ml    │   └── Get.thy    └── python ├── count_samples.py ├── test_dataloading.py ├── test_proofs.py └── test_repl.py

Owner

  • Name: Jonathan Julian Huerta y Munive
  • Login: yonoteam
  • Kind: user
  • Location: Copenhagen

Formal verification and its automation researcher. Assistant Professor at U of Aalborg

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this library or its functions, you could cite it as below."
authors:
- family-names: "Huerta y Munive"
  given-names: "Jonathan Julian"
  orcid: "https://orcid.org/0000-0003-3279-3685"
title: "DeepIsaHOL"
version: 2025.03.28
date-released: 2023-11-21
url: "https://github.com/yonoteam/DeepIsaHOL"

GitHub Events

Total
  • Watch event: 1
  • Delete event: 1
  • Push event: 278
  • Create event: 3
Last Year
  • Watch event: 1
  • Delete event: 1
  • Push event: 278
  • Create event: 3