https://github.com/google-deepmind/alphageometry

Last synced: 6 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: google-deepmind
License: apache-2.0
Language: Python
Default Branch: main
Size: 176 KB

Statistics

Stars: 4,500
Watchers: 57
Forks: 517
Open Issues: 128
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License

Solving Olympiad Geometry without Human Demonstrations

This repository contains the code necessary to reproduce DDAR and AlphaGeometry, the two geometry theorem provers introduced in the Nature 2024 paper:

"Solving Olympiad Geometry without Human Demonstrations".

Dependencies

For the instructions presented below, we use Python 3.10.9, and dependencies with their exact version numbers listed in requirements.txt.

Our code depends on meliad, which is not a registered package with pip. See instructions below for how to manually install meliad.

Note that one can still run the DDAR solver without the meliad and sentencepiece dependencies.

Run the instructions

All instructions in this README.md can be run in one go by:

bash run.sh

Below, we explain these instructions step-by-step.

Install dependencies, download weights and vocabulary.

Installation is done in a virtual environment:

virtualenv -p python3 . source ./bin/activate pip install --require-hashes -r requirements.txt

Download weights and vocabulary:

bash download.sh DATA=ag_ckpt_vocab

Finally, install meliad separately as it is not registered with pip:

MELIAD_PATH=meliad_lib/meliad mkdir -p $MELIAD_PATH git clone https://github.com/google-research/meliad $MELIAD_PATH export PYTHONPATH=$PYTHONPATH:$MELIAD_PATH

Set up common flags

Before running the python scripts, let us first prepare some commonly used flags. The symbolic engine needs definitions and deduction rules to operate. These definitions and rules are provided in two text files defs.txt and rules.txt.

shell DDAR_ARGS=( --defs_file=$(pwd)/defs.txt \ --rules_file=$(pwd)/rules.txt \ );

Next, we define the flags relevant to the proof search. To reproduce the simple examples below, we use lightweight values for the proof search parameters:

```shell BATCHSIZE=2 BEAMSIZE=2 DEPTH=2

SEARCHARGS=( --beamsize=$BEAMSIZE --searchdepth=$DEPTH ) ```

NOTE: The results in our paper can be obtained by setting BATCH_SIZE=32, BEAM_SIZE=512, DEPTH=16 as described in section Methods. To stay under IMO time limits, 4 V100-GPUs and 250 CPU workers are needed as shown in Extended Data - Figure 1. Note that we also strip away other memory/speed optimizations due to internal dependencies and to promote code clarity.

Assume the downloaded checkpoint and vocabulary is placed in DATA, and the installed meliad source code is at MELIAD_PATH. We make use of the gin library to manage model configurations, following meliad conventions. We now define the flags relevant to the language model:

shell LM_ARGS=( --ckpt_path=$DATA \ --vocab_path=$DATA/geometry.757.model --gin_search_paths=$MELIAD_PATH/transformer/configs,$(pwd) \ --gin_file=base_htrans.gin \ --gin_file=size/medium_150M.gin \ --gin_file=options/positions_t5.gin \ --gin_file=options/lr_cosine_decay.gin \ --gin_file=options/seq_1024_nocache.gin \ --gin_file=geometry_150M_generate.gin \ --gin_param=DecoderOnlyLanguageModelGenerate.output_token_losses=True \ --gin_param=TransformerTaskConfig.batch_size=$BATCH_SIZE \ --gin_param=TransformerTaskConfig.sequence_length=128 \ --gin_param=Trainer.restore_state_variables=False );

TIP: Note that you can still run the DDAR solver without defining SEARCH_ARGS and LM_ARGS. In such case, simply disable the import of the lm_inference module inside alphageometry.py.

Run DDAR

The script loads a problem by reading a list of problems from a text file and solves the specific problem in the list according to its name. We pass these two pieces of information through the flags --problems_file and --problem_name. We use --mode=ddar to indicate that we want to use the DDAR solver.

Below we showed this solver solving IMO 2000 P1:

shell python -m alphageometry \ --alsologtostderr \ --problems_file=$(pwd)/imo_ag_30.txt \ --problem_name=translated_imo_2000_p1 \ --mode=ddar \ "${DDAR_ARGS[@]}"

Expect the following output

```shell graph.py:468] translatedimo2000p1 graph.py:469] a b = segment a b; g1 = ontline g1 a a b; g2 = ontline g2 b b a; m = oncircle m g1 a, oncircle m g2 b; n = oncircle n g1 a, oncircle n g2 b; c = onpline c m a b, oncircle c g1 a; d = onpline d m a b, oncircle d g2 b; e = online e a c, online e b d; p = online p a n, online p c d; q = online q b n, on_line q c d ? cong e p e q ddar.py:41] Depth 1/1000 time = 1.7772269248962402 ddar.py:41] Depth 2/1000 time = 5.63526177406311 ddar.py:41] Depth 3/1000 time = 6.883412837982178 ddar.py:41] Depth 4/1000 time = 10.275688409805298 ddar.py:41] Depth 5/1000 time = 12.048273086547852

alphageometry.py:190]

From theorem premises: A B G1 G2 M N C D E P Q : Points AG1 ⟂ AB [00] BA ⟂ G2B [01] G2M = G2B [02] G1M = G1A [03]

... [log omitted] ...

∠QEB = ∠(QP-EA) [46] & ∠(BE-QP) = ∠AEP [55] ⇒ ∠EQP = ∠QPE [56]
∠PQE = ∠EPQ [56] ⇒ EP = EQ

========================== ```

The output first includes a list of relevant premises that it uses, and then proof steps that gradually build up the proof. All predicates are numbered to track how they are derived from the premises, and to show that the proof is fully justified.

TIP: Additionally passing the flag --out_file=path/to/output/text/file.txt will write the proof to a text file.

Running on all problems in imo_ag_30.txt will yield solutions to 14 of them, as reported in Table 1 in our paper.

Run AlphaGeometry:

As a simple example, we load --problem_name=orthocenter from --problem_file=examples.txt. This time, we pass --mode=alphageometry to use the AlphaGeometry solver and pass the SEARCH_ARGS and LM_ARGS flags.

shell python -m alphageometry \ --alsologtostderr \ --problems_file=$(pwd)/examples.txt \ --problem_name=orthocenter \ --mode=alphageometry \ "${DDAR_ARGS[@]}" \ "${SEARCH_ARGS[@]}" \ "${LM_ARGS[@]}"

Expect the following output:

```shell ... [log omitted] ... trainingloop.py:725] Total parameters: 152072288 trainingloop.py:739] Total state size: 0 trainingloop.py:492] Training loop: creating task for mode beamsearch

graph.py:468] orthocenter graph.py:469] a b c = triangle a b c; d = ontline d b a c, ontline d c a b ? perp a d b c ddar.py:41] Depth 1/1000 time = 0.009987592697143555 branch = 4 ddar.py:41] Depth 2/1000 time = 0.00672602653503418 branch = 0 alphageometry.py:221] DD+AR failed to solve the problem. alphageometry.py:457] Depth 0. There are 1 nodes to expand: alphageometry.py:460] {S} a : ; b : ; c : ; d : T a b c d 00 T a c b d 01 ? T a d b c {F1} x00 alphageometry.py:465] Decoding from {S} a : ; b : ; c : ; d : T a b c d 00 T a c b d 01 ? T a d b c {F1} x00 ... [log omitted] ... alphageometry.py:470] LM output (score=-1.102287): "e : C a c e 02 C b d e 03 ;" alphageometry.py:471] Translation: "e = online e a c, online e b d"

alphageometry.py:480] Solving: "a b c = triangle a b c; d = ontline d b a c, ontline d c a b; e = online e a c, online e b d ? perp a d b c" graph.py:468] graph.py:469] a b c = triangle a b c; d = ontline d b a c, ontline d c a b; e = online e a c, online e b d ? perp a d b c ddar.py:41] Depth 1/1000 time = 0.021120786666870117 ddar.py:41] Depth 2/1000 time = 0.033370018005371094 ddar.py:41] Depth 3/1000 time = 0.04297471046447754

alphageometry.py:140]

From theorem premises: A B C D : Points BD ⟂ AC [00] CD ⟂ AB [01]
Auxiliary Constructions: E : Points E,B,D are collinear [02] E,C,A are collinear [03]
Proof steps:
1. E,B,D are collinear [02] & E,C,A are collinear [03] & BD ⟂ AC [00] ⇒ ∠BEA = ∠CED [04]
2. E,B,D are collinear [02] & E,C,A are collinear [03] & BD ⟂ AC [00] ⇒ ∠BEC = ∠AED [05]
3. A,E,C are collinear [03] & E,B,D are collinear [02] & AC ⟂ BD [00] ⇒ EC ⟂ EB [06]
4. EC ⟂ EB [06] & CD ⟂ AB [01] ⇒ ∠(EC-BA) = ∠(EB-CD) [07]
5. E,C,A are collinear [03] & E,B,D are collinear [02] & ∠(EC-BA) = ∠(EB-CD) [07] ⇒ ∠BAE = ∠CDE [08]
6. ∠BEA = ∠CED [04] & ∠BAE = ∠CDE 08⇒ EB:EC = EA:ED [09]
7. EB:EC = EA:ED [09] & ∠BEC = ∠AED 05⇒ ∠BCE = ∠ADE [10]
8. EB:EC = EA:ED [09] & ∠BEC = ∠AED 05⇒ ∠EBC = ∠EAD [11]
009. ∠BCE = ∠ADE [10] & E,C,A are collinear [03] & E,B,D are collinear [02] & ∠EBC = ∠EAD [11] ⇒ AD ⟂ BC

alphageometry.py:505] Solved. ```

NOTE: Point H is automatically renamed to D, as the LM is trained on synthetic problems where the points are named alphabetically, and so it expects the same during test time.

NOTE: In this implementation of AlphaGeometry, we removed all optimizations that are dependent on internal infrastructure, e.g., parallelized model inference on multi GPUs, parallelized DDAR on multiple CPUs, parallel execution of LM and DDAR, shared pool of CPU workers across different problems, etc. We also removed some memory/speed optimizations and code abstractions in favor of code clarity.

As can be seen in the output, initially DDAR failed to solve the problem. The LM proposes two auxiliary constructions (because BATCH_SIZE=2):

e = eqdistance e c a b, eqdistance e b a c, i.e., construct E as the intersection of circle (center=C, radius=AB) and circle (center=B, radius=AC). This construction has a score of -1.186.
e = on_line e a c, on_line e b d, i.e., E is the intersection of AC and BD. This construction has a higher score (-1.102287) than the previous.

Since the second construction has a higher score, DDAR attempted the second construction first and found the solution right away. The proof search therefore terminates and there is no second iteration.

Results

Before attempting to reproduce the AlphaGeometry numbers in our paper, please make sure to pass all tests in the prepared test suite:

bash run_tests.sh

NOTE: Issues#14 reports that although the top beam decodes are still the same, the LM is not giving the same score for different users.

Then, pass the corresponding values for --problem_file (column) and --mode (row), and iterate on all problems to obtain the following results:

Number of solved problems: | | `imo_ag_30.txt` | `jgex_ag_231.txt` | |----------|------------------|-------------------| | `ddar` | 14 | 198 | | `alphageometry` | 25 | 228 |

Source code description

Files in this repository include python modules/scripts to run the solvers and resource files necessary for the script to execute. We listed below each of them and their description.

| File name |------------------------| | geometry.py | numericals.py | graph_utils.py | graph.py | problem.py | dd.py | ar.py | trace_back.py | ddar.py | beam_search.py | models.py | transformer_layer.py | decoder_stack.py | lm_inference.py | alphageometry.py | pretty.py | *_test.py | download.sh | run.sh | run_tests.sh | Description | ------------------------------------------------------------------------------------| | Implements nodes (Point, Line, Circle, etc) in the proof state graph. | | Implements the numerical engine in the dynamic geometry environment. | | Implements utilities for the proof state graph. | | Implements the proof state graph. | | Implements the classes that represent the problem premises, conclusion, DAG nodes. | | Implements DD and its traceback. | | Implements AR and its traceback. | | Implements the recursive traceback and dependency difference algorithm. | | Implements the combination DD+AR. | | Implements beam decoding of a language model in JAX. | | Implements the transformer model. | | Implements the transformer layer. | | Implements the transformer decoder stack. | | Implements an interface to a trained LM to perform decoding. | | Main script that loads problems, calls DD+AR or AlphaGeometry solver, and prints solutions. | | Pretty formating the solutions output by solvers. | | Tests for the corresponding module. | | Script to download model checkpoints and LM | | Script to execute instructions in README. | | Script to execute the test suite. |

Resource files:

| Resource file name | Description | |------------------------|------------------------------------------------------------------------------------| | defs.txt | Definitions of different geometric construction actions. | | rules.txt | Deduction rules for DD. | | geometry_150M_generate.gin| Gin config of the LM implemented in meliad. | | imo_ag_30.txt | Problems in IMO-AG-30. | | jgex_ag_231.txt | Problems in JGEX-AG-231. |

Citing this work

bibtex @Article{AlphaGeometryTrinh2024, author = {Trinh, Trieu and Wu, Yuhuai and Le, Quoc and He, He and Luong, Thang}, journal = {Nature}, title = {Solving Olympiad Geometry without Human Demonstrations}, year = {2024}, doi = {10.1038/s41586-023-06747-5} }

Acknowledgements

This research is a collaboration between the Google Brain team (now Google Deepmind) and the Computer Science Department of New York University. We thank Rif A. Saurous, Denny Zhou, Christian Szegedy, Delesley Hutchins, Thomas Kipf, Hieu Pham, Petar Veličković, Debidatta Dwibedi, Kyunghyun Cho, Lerrel Pinto, Alfredo Canziani, Thomas Wies, He He’s research group, Evan Chen (the USA’s IMO team coach), Mirek Olsak, Patrik Bak, and all three Nature's referees for their help and support.

The code of AlphaGeometry communicates with and/or references the following separate libraries and packages:

We thank all their contributors and maintainers!

Disclaimer

This is not an officially supported Google product.

This research code is provided "as-is" to the broader research community. Google does not promise to maintain or otherwise support this code in any way.

Code License

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

Model Parameters License

The AlphaGeometry checkpoints and vocabulary are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

Owner

Name: Google DeepMind
Login: google-deepmind
Kind: organization

Website: https://www.deepmind.com/
Repositories: 245
Profile: https://github.com/google-deepmind

GitHub Events

Total

Issues event: 31
Watch event: 510
Delete event: 1
Issue comment event: 43
Pull request event: 3
Fork event: 78
Create event: 2

Last Year

Issues event: 31
Watch event: 509
Delete event: 1
Issue comment event: 43
Pull request event: 3
Fork event: 77
Create event: 2

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 142
Total pull requests: 39
Average time to close issues: 16 days
Average time to close pull requests: 4 months
Total issue authors: 102
Total pull request authors: 8
Average comments per issue: 0.3
Average comments per pull request: 0.44
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 23

Past Year

Issues: 28
Pull requests: 2
Average time to close issues: 10 days
Average time to close pull requests: N/A
Issue authors: 26
Pull request authors: 1
Average comments per issue: 0.18
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

View more stats

Top Authors

Issue Authors

Noxlgoldi (6)
shufan1 (6)
Jerry-Master (5)
Landau1994 (5)
foldl (3)
unhandyandy (3)
mxsurui (2)
geometer (2)
ghost (2)
Aritra-ckb (2)
mathinfinity (2)
Monmayo (2)
stephtchoko (2)
2nazero (2)
vnpower (2)

Pull Request Authors

dependabot[bot] (23)
cclauss (4)
okajimad (3)
pitmonticone (2)
minhnhatnoe (2)
salihmarangoz (2)
eltociear (2)
oliviamarkham0 (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (23)

https://github.com/google-deepmind/alphageometry

Science Score: 36.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

Solving Olympiad Geometry without Human Demonstrations

Dependencies

Run the instructions

Install dependencies, download weights and vocabulary.

Set up common flags

Run DDAR

alphageometry.py:190]

Run AlphaGeometry:

alphageometry.py:140]

009. ∠BCE = ∠ADE [10] & E,C,A are collinear [03] & E,B,D are collinear [02] & ∠EBC = ∠EAD [11] ⇒ AD ⟂ BC

Results

Source code description

Citing this work

Acknowledgements

Disclaimer

Code License

Model Parameters License

Owner

GitHub Events

Total

Last Year

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies