moatless-tools
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file found
- ✓ codemeta.json file found
- ✓ .zenodo.json file found
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 577
- Watchers: 10
- Forks: 53
- Open Issues: 21
- Releases: 1
Topics
Metadata Files
README.md
Moatless Tools
Moatless Tools is a hobby project where I experiment with some ideas I have about how LLMs can be used to edit code in large existing codebases. I believe that rather than relying on an agent to reason its way to a solution, it is crucial to build good tools to insert the right context into the prompt and handle the response.
For the implementation used in the paper SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement, please see moatless-tree-search.
SWE-Bench
I use the SWE-bench benchmark as a way to verify my ideas.
- Claude 4 Sonnet - 70.8% solve rate, $0.63 per instance.
Try it out
Run in Docker
Clone the repository:

```bash
git clone https://github.com/aorwall/moatless-tools.git
cd moatless-tools
```

Set up environment variables:

```bash
cp .env.example .env
```
Edit the .env file to set your API keys and other configuration options, including the required MOATLESS_DIR variable:
```
MOATLESS_DIR=/path/to/your/moatless/data
```
Note: MOATLESS_DIR specifies the directory where Moatless will store configuration files and trajectory data. This directory will be mounted as a volume in the Docker containers.
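For reference, a minimal .env might look like the sketch below. Only MOATLESS_DIR is documented above; the API key variable names are assumptions, so verify the exact names against .env.example.

```bash
# Hypothetical .env sketch; variable names other than MOATLESS_DIR
# are assumptions, check .env.example for the real ones
OPENAI_API_KEY=<your-openai-key>
ANTHROPIC_API_KEY=<your-anthropic-key>
MOATLESS_DIR=/path/to/your/moatless/data
```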
Start the services:

```bash
make run
```

Access the UI at http://localhost
Install from PyPI
```bash
# Install base package only
pip install moatless

# Install with Kubernetes runner support
pip install "moatless[kubernetes]"
```
Install from source
Clone the repository and install the dependencies using uv:

```bash
# Clone the repository
git clone https://github.com/aorwall/moatless-tools.git
cd moatless-tools

# Install dependencies using uv
uv sync
```
Code Examples
Basic agent flow
```python
from moatless.actions import Respond
from moatless.agent import ActionAgent
from moatless.completion.tool_call import ToolCallCompletionModel

completion_model = ToolCallCompletionModel(
    model="gpt-4.1-mini",
    temperature=0.0,
    model_api_key=""
)

agent = ActionAgent(
    completion_model=completion_model,
    system_prompt="You are a helpful assistant that can answer questions.",
    actions=[Respond()]
)

observation = await agent.run_simple("Hello")
print(observation.message)
```
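Because run_simple is awaited, the snippet above assumes an active event loop (as in a notebook). In a plain script you would wrap it, for example like this (a sketch reusing the agent object defined above):

```python
import asyncio

async def main():
    # Reuses the agent constructed in the previous snippet
    observation = await agent.run_simple("Hello")
    print(observation.message)

asyncio.run(main())
```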
Code inspector agent
notebooks/code_inspector_agent.ipynb
Run SWE-Bench evaluations
Before running the evaluation, you'll need:
1. At least one LLM provider API key (e.g., OpenAI, Anthropic, etc.)
2. A Voyage AI API key from voyageai.com to use the pre-embedded vector stores for SWE-Bench instances
Verify Setup
Before running the full evaluation, you can verify your setup by running a simple SWE-Bench instance:
```bash
uv run python scripts/docker_run.py --flow swebench_tools --model-id gpt-4o-mini-2024-07-18 --instance-id django__django-11099 --evaluation-name testing_setup
```
The script runs the model against a sample SWE-Bench instance. Results are saved in `.moatless/projects/testing_setup`.
Run evaluation
Use the evaluation script to run evaluations in Docker containers:
```bash
python3 scripts/run_evaluation.py --model gpt-4o-mini-2024-07-18 --dataset-split [dataset_split] --evaluation-name [evaluation_name]
```
Required arguments:
- --dataset-split: Dataset split to use
- --evaluation-name: Name of the evaluation
Optional arguments:
- --model: Model to use for evaluation (default: gpt-4o-mini-2024-07-18, equivalent to --model-id)
- --model-id: Model configuration ID to use (replaces entire model configuration)
- --litellm-model-name: LiteLLM model name to override (keeps other model settings)
- --flow: Flow to use for evaluation (defaults to "simple_coding")
- --num-parallel-jobs: Number of parallel jobs (default: 1)
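For instance, a minimal run that relies on the defaults for all optional arguments could look like this (the evaluation name is a placeholder):

```bash
python3 scripts/run_evaluation.py \
  --dataset-split verified_mini \
  --evaluation-name my_first_eval
```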
Flows
Available flows that can be specified with the --flow argument:
| Flow ID | Format | Best Suited For | Default Model |
|---------|--------|-----------------|---------------|
| swebench_tools | Function calling | Models with native function calling support | gpt-4o-mini-2024-07-18 |
| swebench_tools_and_reasoning | Function calling | Reasoning models with native function calling support | claude-sonnet-4-20250514 |
| swebench_react | ReACT | Open source models without native function calling support | openrouter/mistralai/devstral-small |
| swebench_react_reasoning | ReACT | Reasoning models without function calling support | openrouter/deepseek/deepseek-r1-0528 |
Model Configuration
Both evaluation scripts support flexible model configuration through the following options:
Model Options
- --model-id: Specify a complete model configuration ID. This replaces the entire completion model configuration, including temperature, max tokens, and all other settings.
- --litellm-model-name: Override only the LiteLLM model name while keeping all other completion model settings (temperature, max tokens, etc.) from the flow or model configuration.
Verified Models
Default model configurations are provided for verified models. Note that other models may work but have not been extensively tested. Verified models are models that have been tested and found to work with the Verified Mini subset of the SWE-Bench dataset.
When specifying just the --model-id argument, the following configurations are used:
| Model | Response Format | Message History | Thoughts in Action | Verified Mini |
|-------|----------------|-----------------|-------------------|---------------|
| claude-3-5-sonnet-20241022 | tool_call | messages | no | 46% |
| claude-3-5-haiku-20241022 | tool_call | messages | no | 28% |
| gpt-4o-2024-11-20 | tool_call | messages | yes | 32% |
| gpt-4o-mini-2024-07-18 | tool_call | messages | yes | 16% |
| o1-mini-2024-09-12 | react | react | no (disabled thoughts) | 28% |
| deepseek/deepseek-chat | react | react | no | 36% |
| deepseek/deepseek-reasoner | react | react | no (disabled thoughts) | 50% |
| gemini/gemini-2.0-flash-exp | react | react | no | 38% |
| openrouter/meta-llama/llama-3.1-70b-instruct | react | react | no | - |
| openrouter/meta-llama/llama-3.1-405b-instruct | react | react | no | 28% |
| openrouter/qwen/qwen-2.5-coder-32b-instruct | react | react | no | 32% |
Dataset splits
Available dataset splits that can be specified with the --dataset-split argument:
| Split Name | Description | Instance Count |
|------------|-------------|----------------|
| lite | All instances from the lite dataset | 300 |
| verified | All instances from the verified dataset | 500 |
| verified_mini | MariusHobbhahn/swe-bench-verified-mini, a subset of SWE-Bench Verified | 50 |
| lite_and_verified_solvable | Instances that exist in both lite and verified datasets and have at least one solved submission to SWE-Bench | 84 |
Example usage
```bash
# Run evaluation with Claude 3.5 Sonnet using complete model configuration
python3 scripts/run_evaluation.py \
  --model-id claude-3-5-sonnet-20241022 \
  --flow swebench_tools_and_reasoning \
  --dataset-split verified_mini \
  --num-parallel-jobs 5

# Run evaluation overriding just the model name while keeping flow's model settings
python3 scripts/run_evaluation.py \
  --litellm-model-name openrouter/qwen/qwen-2.5-coder-32b-instruct \
  --flow swebench_react \
  --dataset-split verified_mini \
  --num-parallel-jobs 5
```
Owner
- Name: Albert Örwall
- Login: aorwall
- Kind: user
- Location: Göteborg
- Repositories: 5
- Profile: https://github.com/aorwall
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: Please cite this project using these metadata.
title: "Moatless Tools"
type: software
authors:
  - family-names: Örwall
    given-names: Albert
    orcid: https://orcid.org/0009-0001-8645-6601
doi: 10.5281/zenodo.15614422
date-released: 2024-06-23
repository-code: "https://github.com/aorwall/moatless-tools"
url: "https://github.com/aorwall/moatless-tools"
license: MIT
```
GitHub Events
Total
- Issues event: 12
- Watch event: 265
- Issue comment event: 18
- Push event: 109
- Pull request event: 22
- Fork event: 22
- Create event: 13
Last Year
- Issues event: 12
- Watch event: 265
- Issue comment event: 18
- Push event: 109
- Pull request event: 22
- Fork event: 22
- Create event: 13
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Albert Örwall | a****t@m****i | 88 |
| Jens Roland | m****l@j****m | 5 |
| Minsoo Kim | 4****m | 1 |
| B?! | 1****h | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 24
- Total pull requests: 34
- Average time to close issues: 8 days
- Average time to close pull requests: 8 days
- Total issue authors: 15
- Total pull request authors: 7
- Average comments per issue: 1.25
- Average comments per pull request: 0.03
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 19
- Average time to close issues: 17 days
- Average time to close pull requests: about 4 hours
- Issue authors: 9
- Pull request authors: 4
- Average comments per issue: 0.56
- Average comments per pull request: 0.0
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JensRoland (6)
- hhn12138 (3)
- zkx06111 (3)
- kwikiel (2)
- mistoFENG (1)
- mnskim (1)
- alancelestino (1)
- zdaoguang (1)
- thughy (1)
- srmusukula (1)
- john-b-yang (1)
- callmeBalloch (1)
- zhimin-z (1)
- ZCWei51 (1)
- Aoi-cn (1)
Pull Request Authors
- aorwall (28)
- JensRoland (10)
- callmeBalloch (2)
- Aoi-cn (2)
- mnskim (2)
- a-antoniades (2)
- eltociear (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- aiohttp 3.8.5
- aiosignal 1.3.1
- async-timeout 4.0.3
- attrs 23.1.0
- beautifulsoup4 4.12.2
- certifi 2023.7.22
- charset-normalizer 3.2.0
- colorama 0.4.6
- dataclasses-json 0.5.14
- frozenlist 1.4.0
- fsspec 2023.6.0
- greenlet 2.0.2
- idna 3.4
- iniconfig 2.0.0
- langchain 0.0.263
- langsmith 0.0.22
- llama-index 0.7.24.post1
- marshmallow 3.20.1
- multidict 6.0.4
- mypy-extensions 1.0.0
- nest-asyncio 1.5.7
- numexpr 2.8.5
- numpy 1.25.2
- openai 0.27.8
- openapi-schema-pydantic 1.2.4
- packaging 23.1
- pandas 2.0.3
- pluggy 1.2.0
- pydantic 1.10.12
- pyfakefs 5.2.3
- pytest 7.4.0
- python-dateutil 2.8.2
- pytz 2023.3
- pyyaml 6.0.1
- regex 2023.8.8
- requests 2.31.0
- six 1.16.0
- soupsieve 2.4.1
- sqlalchemy 2.0.19
- tenacity 8.2.2
- tiktoken 0.4.0
- tqdm 4.66.1
- tree-sitter 0.20.1
- tree-sitter-languages 1.7.0
- typing-extensions 4.7.1
- typing-inspect 0.9.0
- tzdata 2023.3
- urllib3 1.26.16
- yarl 1.9.2
- pytest ^7.3.1
- python >=3.11,<3.12
- tree-sitter-languages ^1.7.0