https://github.com/bjodah/local-aider

Proof-of-concept Aider with local (24 GB VRAM) QwQ + Qwen2.5-Coder using litellm-proxy / llama-swap / llama.cpp


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: bjodah
  • Language: Shell
  • Default Branch: main
  • Size: 3.83 MB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Archived
Created 12 months ago · Last pushed 11 months ago
Metadata Files
  • Readme: README.md

local-aider

UPDATE 2025-04-07: Please take a look at llama-swap's example for a better approach to this. I have since changed my approach: instead of relying on the litellm_proxy/ prefix, I hijack the OpenAI base URL (with the minor inconvenience that aider needs to be restarted if we actually want to connect to an OpenAI model). My current setup using llama-swap is found here: https://github.com/bjodah/llm-multi-backend-container

Original README follows:

This repo is just an attempt to collect scripts and notes on how one can use aider with a local reasoning "architect" model (Qwen/QwQ-32B) and a non-reasoning "editor" model (Qwen/Qwen2.5-Coder-32B-Instruct) on a single consumer-grade GPU (tested on an RTX 3090).

I should mention that the simplest solution is probably to use the Ollama support in aider. The approach here, however, allows you (in principle) to experiment with different backends (such as vLLM, ExLlamaV2 + TabbyAPI, ...).

Usage

```console
$ mkdir brainstorming-repo
$ cd brainstorming-repo
$ git init .
$ ./bin/local-model-enablement-wrapper \
    aider \
    --architect --model litellm_proxy/local-qwq-32b \
    --editor-model litellm_proxy/local-qwen25-coder-32b
```

A less "magical" approach would be to launch the compose file manually.

In one terminal:

```console
$ podman compose up
[pod-llama-cpp-swap] | llama-swap listening on :8686
[pod-litellm-proxy]  | INFO:     Started server process [1]
[pod-litellm-proxy]  | INFO:     Waiting for application startup.
[pod-litellm-proxy]  | INFO:     Application startup complete.
[pod-litellm-proxy]  | INFO:     Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)
```

...and then in another terminal, launch aider as usual, but make sure you export the relevant environment variables:

```console
$ env \
    LITELLM_PROXY_API_BASE="http://localhost:4000" \
    LITELLM_PROXY_API_KEY=sk-deadbeef0badcafe \
    aider \
    --architect --model litellm_proxy/local-qwq-32b \
    --editor-model litellm_proxy/local-qwen25-coder-32b
```
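The repo's actual litellm.yml is not reproduced on this page, but a minimal litellm-proxy config mapping the two aliases above to the llama-swap endpoint could look roughly like this (a sketch only — the model names and the :8686 port are taken from the commands and logs above, everything else is an assumption):

```yaml
# Hypothetical litellm.yml sketch: route both model aliases to the
# llama-swap server (listening on :8686 per the compose logs).
model_list:
  - model_name: local-qwq-32b
    litellm_params:
      model: openai/local-qwq-32b          # openai/ prefix => OpenAI-compatible backend
      api_base: http://localhost:8686/v1
      api_key: "none"                       # llama.cpp's server does not check the key
  - model_name: local-qwen25-coder-32b
    litellm_params:
      model: openai/local-qwen25-coder-32b
      api_base: http://localhost:8686/v1
      api_key: "none"
```

With a config along these lines, aider's litellm_proxy/local-qwq-32b model name resolves through the proxy to whichever model llama-swap currently has loaded.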

Customization

Everything in this repo is probably subject to customization. If you want to increase the verbosity of the logging (e.g. for troubleshooting), you can adjust these settings:

```console
$ grep -E '(logRequests|detailed_debug)' -R .
./compose.yml:            host-litellm.py --config /root/litellm.yml --detailed_debug
./config-llamacpp-container.yaml:logRequests: true
```

Challenges

  • The 32B-parameter models fit in 24 GB of VRAM, but only one at a time. Solution: llama-swap.
  • The easiest way to run the models is using llama.cpp's Docker image, but llama-swap has a problem stopping the container when unloading a model. Solution: run llama-swap inside llama.cpp's server container.
  • aider relies on litellm for routing model selection to different backends, and litellm relies on the 'openai/' prefix to indicate an OpenAI-compatible API endpoint. While litellm offers custom prompts as well as taking prompts from huggingface config files, there are two problems: the former has no effect when the prefix is openai/ (I submitted a PR to address this here), and the latter requires a huggingface/ prefix, which unfortunately changes the request format to that of huggingface's API, which is not OpenAI-compatible (as far as I can tell). Workaround: I use a patched litellm (from the PR) for now.
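To illustrate the llama-swap solution from the first bullet: llama-swap keeps at most one backend process alive, so requesting the other alias unloads the current model before starting the new one. A minimal config in llama-swap's format might look like this (the file paths, ports, and quantization choices are hypothetical, not the repo's actual values):

```yaml
# Hypothetical llama-swap config sketch: each alias maps to a
# llama-server command; llama-swap proxies requests to whichever
# one is running and swaps processes when the alias changes.
models:
  "local-qwq-32b":
    cmd: llama-server --port 9001 -m /models/QwQ-32B-Q4_K_M.gguf
    proxy: http://127.0.0.1:9001
  "local-qwen25-coder-32b":
    cmd: llama-server --port 9002 -m /models/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
    proxy: http://127.0.0.1:9002
```

This is what makes the architect/editor split workable on a single 24 GB GPU: aider alternates between the two model names, and llama-swap transparently swaps the corresponding 32B model in and out of VRAM.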

TODOs

  • [ ] The prompt template might not be working quite right: looking at the logs and responses, \n\n might not be correctly escaped; I see occurrences of "nn" and "nnnn".
  • [ ] litellm proxy does not seem to propagate request interruption.

Demo

Asciicast demo: view the full cast using the asciinema player here.

Miscellaneous

  • Aider needs to be informed about the context window size; you may copy/append .aider.model.metadata.json to your $HOME directory (or to the root of the git repo in which you intend to run aider).
  • The health-check query in the wrapper script adds some delay when launching; set LOCAL_AIDER_SKIP_HEALTH_CHECK=1 to skip it.
  • Best practice is to run aider in a sandboxed environment (executing LLM-generated code is risky). We can replace the aider call with e.g. "podman run ..." or "docker run ...". At this point, an alias might come in handy:

    ```console
    $ grep aider-local-qwq32 ~/.bashrc
    alias aider-local-qwq32="env LOCAL_AIDER_SKIP_HEALTH_CHECK=1 local-model-enablement-wrapper contaider --architect --model litellm_proxy/local-qwq-32b --editor-model litellm_proxy/local-qwen25-coder-32b"
    ```

    This alias uses a utility script (contaider) to launch aider in a container.
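The .aider.model.metadata.json mentioned in the first bullet above follows litellm's model-metadata schema, keyed by the model name aider uses. The repo's actual file is not shown on this page; an illustrative entry (the token limits here are assumptions, not the repo's values) might look like:

```json
{
  "litellm_proxy/local-qwq-32b": {
    "max_input_tokens": 32768,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0
  },
  "litellm_proxy/local-qwen25-coder-32b": {
    "max_input_tokens": 32768,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0
  }
}
```

Without such entries, aider cannot know the local models' context windows and may mis-estimate when to trim chat history.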

Owner

  • Name: Bjorn
  • Login: bjodah
  • Kind: user

GitHub Events

Total
  • Watch event: 9
  • Delete event: 1
  • Push event: 12
  • Pull request review comment event: 1
  • Pull request review event: 1
  • Pull request event: 5
  • Create event: 4
Last Year
  • Watch event: 9
  • Delete event: 1
  • Push event: 12
  • Pull request review comment event: 1
  • Pull request review event: 1
  • Pull request event: 5
  • Create event: 4