https://github.com/bjodah/local-aider
Proof-of-concept Aider w. local (24GB vram) QwQ+Qwen2.5-Coder using litellm-proxy / llama-swap / llama.cpp
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 12.7%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: bjodah
- Language: Shell
- Default Branch: main
- Size: 3.83 MB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
local-aider
UPDATE 2025-04-07: Please take a look at llama-swap's example for a better approach to this. I have since changed my approach: instead of relying on the `litellm_proxy/` prefix, I now hijack the OpenAI base URL (with the minor inconvenience that aider needs to be restarted if we actually want to connect to an OpenAI model). My current setup using llama-swap is found here: https://github.com/bjodah/llm-multi-backend-container

Original README follows:
This repo is just an attempt to collect scripts and notes on how one can enable using aider with a local reasoning "architect" model (Qwen/QwQ-32B) and a non-reasoning "editor" model (Qwen/Qwen2.5-Coder-Instruct-32B) on a single consumer-grade GPU (tested on RTX 3090).
I should mention that the simplest solution is probably to use the Ollama support in aider; the approach here, however, allows you (in principle) to experiment with different backends (such as vLLM, ExllamaV2+tabbyAPI, ...).
Usage
```console
$ mkdir brainstorming-repo
$ cd brainstorming-repo
$ git init .
$ ./bin/local-model-enablement-wrapper \
    aider \
      --architect --model litellm_proxy/local-qwq-32b \
      --editor-model litellm_proxy/local-qwen25-coder-32b
```
A less "magical" approach would be to launch the compose file manually.
In one terminal:
```console
$ podman compose up
[pod-llama-cpp-swap] | llama-swap listening on :8686
[pod-litellm-proxy] | INFO: Started server process [1]
[pod-litellm-proxy] | INFO: Waiting for application startup.
[pod-litellm-proxy] | INFO: Application startup complete.
[pod-litellm-proxy] | INFO: Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)
...
```
and then in another terminal, launch aider as usual, but make sure you export the relevant
environment variables:
```console
$ env \
    LITELLM_PROXY_API_BASE="http://localhost:4000" \
    LITELLM_PROXY_API_KEY=sk-deadbeef0badcafe \
    aider \
      --architect --model litellm_proxy/local-qwq-32b \
      --editor-model litellm_proxy/local-qwen25-coder-32b
```
Customization
Everything in this repo is subject to customization. If you want to increase the logging verbosity (e.g. for troubleshooting), you can adjust these settings:
```console
$ grep -E '(logRequests|detailed_debug)' -R .
./compose.yml: host-litellm.py --config /root/litellm.yml --detailed_debug
./config-llamacpp-container.yaml:logRequests: true
```
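For context, `logRequests` sits at the top level of llama-swap's YAML config, next to the `models` section. A minimal sketch (model name, path, and flags are illustrative, not the repo's actual values):

```yaml
# config-llamacpp-container.yaml (sketch)
logRequests: true
models:
  "local-qwq-32b":
    # llama-swap substitutes ${PORT} with the port it proxies to
    cmd: llama-server --port ${PORT} -m /models/QwQ-32B-Q4_K_M.gguf
```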
Challenges
- The 32B parameter models fit in 24 GB VRAM, but only one at a time. Solution: llama-swap.
- The easiest way to run the models is using llama.cpp's Docker image, but llama-swap has a problem stopping the container when unloading a model. Solution: run llama-swap inside llama.cpp's server container.
- `aider` relies on litellm for routing model selection to different backends. litellm relies on the `openai/` prefix to indicate an OpenAI-compatible API endpoint. And while litellm offers custom prompts as well as taking prompts from huggingface config files, there are two problems: the former has no effect when the prefix is `openai/` (I submitted a PR to address this here), and the latter requires a `huggingface/` prefix, which unfortunately changes the request format to that of huggingface's API, which is not OpenAI-compatible (as far as I can tell). Workaround: I use a patched litellm (from the PR) for now.
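For reference, the routing described above is typically expressed in the litellm proxy config as a `model_list` mapping each alias to an `openai/`-prefixed backend. A sketch (file name, port, and quantization details are assumptions; the :8686 port matches the llama-swap log line shown earlier):

```yaml
# litellm.yml (sketch, not the repo's actual file)
model_list:
  - model_name: local-qwq-32b
    litellm_params:
      model: openai/local-qwq-32b        # openai/ prefix => OpenAI-compatible endpoint
      api_base: http://localhost:8686/v1 # llama-swap listens on :8686
      api_key: none
  - model_name: local-qwen25-coder-32b
    litellm_params:
      model: openai/local-qwen25-coder-32b
      api_base: http://localhost:8686/v1
      api_key: none
```

With such a config, aider's `litellm_proxy/local-qwq-32b` resolves through the proxy to whichever model llama-swap currently has loaded.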
TODOs
- [ ] the prompt template might not be working quite right: looking at the logs and responses, `\n\n` might not be correctly escaped; I see occurrences of "nn" and "nnnn"
- [ ] litellm proxy does not seem to propagate request interruption
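The first TODO can be illustrated with a small bash snippet (hypothetical, not the actual templating code): if a layer strips backslashes instead of expanding escape sequences, a literal `\n\n` in a template degrades to `nn`, which matches the artifacts seen in the logs.

```shell
#!/usr/bin/env bash
# A template containing literal backslash-n escape sequences:
template='Answer:\n\nStep 1'

# Buggy handling: backslashes removed instead of expanded -> "nn" artifact
broken="${template//\\/}"
echo "$broken"   # Answer:nnStep 1

# Correct handling: %b expands \n into real newlines
printf '%b\n' "$template"
```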
Demo
Or view the full cast using the asciinema player here.
Miscellaneous
- Aider needs to be informed about the context window size; you may copy/append `.aider.model.metadata.json` to your $HOME directory (or the root of the git repo in which you intend to run aider).
- The health-check query in the wrapper script adds some delay when launching the script; set LOCAL_AIDER_SKIP_HEALTH_CHECK=1 to skip it.
- Best practice is to run aider in a sandboxed environment (executing LLM-generated code is risky). We can replace the aider call with e.g. "podman run ..." or "docker run ...". At this point, an alias might come in handy:
```console
$ grep aider-local-qwq32 ~/.bashrc
alias aider-local-qwq32="env LOCAL_AIDER_SKIP_HEALTH_CHECK=1 local-model-enablement-wrapper contaider --architect --model litellm_proxy/local-qwq-32b --editor-model litellm_proxy/local-qwen25-coder-32b"
```

This alias uses a utility script to launch aider in a container (contaider).
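The `.aider.model.metadata.json` mentioned above follows litellm's model-metadata schema, keyed by the full model name aider is given. A sketch for the two local models (the token limits here are placeholders, not measured values for these quantizations):

```json
{
  "litellm_proxy/local-qwq-32b": {
    "max_input_tokens": 32768,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "litellm_proxy",
    "mode": "chat"
  },
  "litellm_proxy/local-qwen25-coder-32b": {
    "max_input_tokens": 32768,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "litellm_proxy",
    "mode": "chat"
  }
}
```

Costs are set to zero since the models run locally; the context-window fields are what aider actually uses when deciding how much to send.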
Owner
- Name: Bjorn
- Login: bjodah
- Kind: user
- Repositories: 48
- Profile: https://github.com/bjodah
GitHub Events
Total
- Watch event: 9
- Delete event: 1
- Push event: 12
- Pull request review comment event: 1
- Pull request review event: 1
- Pull request event: 5
- Create event: 4
Last Year
- Watch event: 9
- Delete event: 1
- Push event: 12
- Pull request review comment event: 1
- Pull request review event: 1
- Pull request event: 5
- Create event: 4