https://github.com/learningcircuit/local-deep-research

Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

academia anthropic arxiv brave deep-research gemma home-automation homeserver local local-deep-research local-llm mistral ollama openai pubmed research research-tool retrieval-augmented-generation searxng self-hosted

Keywords from Contributors

diffusion transformers
Last synced: 6 months ago

Repository

Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local.

Basic Info
  • Host: GitHub
  • Owner: LearningCircuit
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 8.51 MB
Statistics
  • Stars: 3,349
  • Watchers: 25
  • Forks: 326
  • Open Issues: 54
  • Releases: 49
Topics
academia anthropic arxiv brave deep-research gemma home-automation homeserver local local-deep-research local-llm mistral ollama openai pubmed research research-tool retrieval-augmented-generation searxng self-hosted
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
Readme Contributing Funding License Codeowners Security

README.md

Local Deep Research

[![GitHub stars](https://img.shields.io/github/stars/LearningCircuit/local-deep-research?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/stargazers) [![Docker Pulls](https://img.shields.io/docker/pulls/localdeepresearch/local-deep-research?style=for-the-badge)](https://hub.docker.com/r/localdeepresearch/local-deep-research) [![PyPI Downloads](https://img.shields.io/pypi/dm/local-deep-research?style=for-the-badge)](https://pypi.org/project/local-deep-research/) [![Tests](https://img.shields.io/github/actions/workflow/status/LearningCircuit/local-deep-research/tests.yml?branch=main&style=for-the-badge&label=Tests)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/tests.yml) [![CodeQL](https://img.shields.io/github/actions/workflow/status/LearningCircuit/local-deep-research/codeql.yml?branch=main&style=for-the-badge&label=CodeQL)](https://github.com/LearningCircuit/local-deep-research/security/code-scanning) [![Discord](https://img.shields.io/discord/1352043059562680370?style=for-the-badge&logo=discord)](https://discord.gg/ttcqQeFcJ3) [![Reddit](https://img.shields.io/badge/Reddit-r/LocalDeepResearch-FF4500?style=for-the-badge&logo=reddit)](https://www.reddit.com/r/LocalDeepResearch/)

**AI-powered research assistant for deep, iterative research**

*Performs deep, iterative research using multiple LLMs and search engines with proper citations*

🚀 What is Local Deep Research?

LDR is an AI research assistant that performs systematic research by:

  • Breaking down complex questions into focused sub-queries
  • Searching multiple sources in parallel (web, academic papers, local documents)
  • Verifying information across sources for accuracy
  • Creating comprehensive reports with proper citations

It aims to help researchers, students, and professionals find accurate information quickly while maintaining transparency about sources.
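
In sketch form, the loop behind this is: decompose the question, search the sub-queries in parallel, cross-check what comes back, then synthesize an answer with citations. The Python below is purely illustrative; the function names and data shapes are hypothetical stand-ins, not LDR's internal API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins -- only the control flow is meant to mirror the steps above.
def decompose(question: str) -> list[str]:
    return [f"{question} (sub-query {i})" for i in range(3)]

def search(sub_query: str) -> list[dict]:
    return [{"source": "example.org", "text": f"evidence for: {sub_query}"}]

def verify(findings: list[dict]) -> list[dict]:
    # keep only findings that survive a (trivial) consistency check
    return [f for f in findings if f["text"]]

def synthesize(question: str, findings: list[dict]) -> str:
    citations = ", ".join(f["source"] for f in findings)
    return f"Answer to '{question}' based on: {citations}"

def research(question: str) -> str:
    sub_queries = decompose(question)             # focused sub-queries
    with ThreadPoolExecutor() as pool:            # search sources in parallel
        results = [r for batch in pool.map(search, sub_queries) for r in batch]
    return synthesize(question, verify(results))  # cross-check, then write up with citations

print(research("What are the latest advances in quantum computing?"))
```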

🎯 Why Choose LDR?

  • Privacy-Focused: Run entirely locally with Ollama + SearXNG
  • Flexible: Use any LLM, any search engine, any vector store
  • Comprehensive: Multiple research modes from quick summaries to detailed reports
  • Transparent: Track costs and performance with built-in analytics
  • Open Source: MIT licensed with an active community

📊 Performance

  • ~95% accuracy on SimpleQA benchmark (preliminary results)
  • Tested with GPT-4.1-mini + SearXNG + focused-iteration strategy
  • Comparable to state-of-the-art AI research systems
  • Local models can achieve similar performance with proper configuration
  • Join our community benchmarking effort →

✨ Key Features

🔍 Research Modes

  • Quick Summary - Get answers in 30 seconds to 3 minutes with citations
  • Detailed Research - Comprehensive analysis with structured findings
  • Report Generation - Professional reports with sections and table of contents
  • Document Analysis - Search your private documents with AI

🛠️ Advanced Capabilities

  • LangChain Integration - Use any vector store as a search engine
  • REST API - Authenticated HTTP access with per-user databases
  • Benchmarking - Test and optimize your configuration
  • Analytics Dashboard - Track costs, performance, and usage metrics
  • Real-time Updates - WebSocket support for live research progress
  • Export Options - Download results as PDF or Markdown
  • Research History - Save, search, and revisit past research
  • Adaptive Rate Limiting - Intelligent retry system that learns optimal wait times
  • Keyboard Shortcuts - Navigate efficiently (ESC, Ctrl+Shift+1-5)
  • Per-User Encrypted Databases - Secure, isolated data storage for each user

🌐 Search Sources

Free Search Engines

  • Academic: arXiv, PubMed, Semantic Scholar
  • General: Wikipedia, SearXNG
  • Technical: GitHub, Elasticsearch
  • Historical: Wayback Machine
  • News: The Guardian

Premium Search Engines

  • Tavily - AI-powered search
  • Google - Via SerpAPI or Programmable Search Engine
  • Brave Search - Privacy-focused web search

Custom Sources

  • Local Documents - Search your files with AI
  • LangChain Retrievers - Any vector store or database
  • Meta Search - Combine multiple engines intelligently

Full Search Engines Guide →
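
Individual engines are addressed by name. As a minimal sketch (assuming "arxiv" is a valid engine identifier; check the guide above for the exact names, and note that depending on your version the call may also need the authenticated session and settings snapshot shown under Usage Examples below):

```python
from local_deep_research.api import quick_summary

# Sketch only: "arxiv" is an assumed engine name -- see the search engines guide.
result = quick_summary(
    query="Recent work on retrieval-augmented generation",
    search_tool="arxiv",
)
print(result["summary"])
```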

⚡ Quick Start

Option 1: Docker (Quickstart on Mac/ARM)

```bash
# Step 1: Pull and run SearXNG for optimal search results
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 2: Pull and run Local Deep Research (please build your own Docker image on ARM)
docker run -d -p 5000:5000 --network host --name local-deep-research --volume 'deep-research:/data' -e LDR_DATA_DIR=/data localdeepresearch/local-deep-research
```

Option 2: Docker Compose (Recommended)

LDR uses Docker Compose to bundle the web app and all of its dependencies so you can get up and running quickly.

Option 2a: Quick Start (One Command)

Linux:

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && docker compose up -d
```

Windows:

```bash
curl.exe -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml; docker compose up -d
```

Use with a different model:

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && MODEL=gemma:1b docker compose up -d
```

Open http://localhost:5000 after ~30 seconds. This starts LDR with SearXNG and all dependencies.
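
If you prefer to script the wait rather than watch the clock, a small readiness check like the one below works; it only assumes the default port mapping (5000) from the compose file.

```python
import time

import requests

# Poll the web UI until the container answers (assumes the default port 5000).
for _ in range(30):
    try:
        if requests.get("http://localhost:5000", timeout=2).ok:
            print("LDR web UI is up")
            break
    except requests.RequestException:
        print("still starting...")
    time.sleep(2)
```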

Option 2b: DIY docker-compose

See docker-compose.yml for a Compose file with reasonable defaults that runs Ollama, SearXNG, and Local Deep Research together locally.

Things you may want/need to configure:

  • Ollama GPU driver
  • Ollama context length (depends on available VRAM)
  • Ollama keep alive (how long the model stays loaded in VRAM while idle before being unloaded automatically)
  • Deep Research model (depends on available VRAM and preference)

Option 2c: Use Cookiecutter to tailor a docker-compose to your needs

Prerequisites

Clone the repository:

```bash
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
```

Configuring with Docker Compose

Cookiecutter will interactively guide you through the process of creating a docker-compose configuration that meets your specific needs. This is the recommended approach if you are not very familiar with Docker.

In the LDR repository, run the following command to generate the compose file:

```bash
cookiecutter cookiecutter-docker/
docker compose -f docker-compose.default.yml up
```

Docker Compose Guide →

Option 3: Python Package

```bash
# Step 1: Install the package
pip install local-deep-research

# Step 2: Set up SearXNG for best results
docker pull searxng/searxng
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 3: Install Ollama from https://ollama.ai

# Step 4: Download a model
ollama pull gemma3:12b

# Step 5: Build frontend assets (required for Web UI)
# Note: If installed via pip and using the Web UI, you need to build assets.
# Navigate to the installation directory first (find it with: pip show local-deep-research)
npm install
npm run build

# Step 6: Start the web interface
python -m local_deep_research.web.app
```

Important for pip users: If you installed via pip and want to use the web UI, you must run npm install and npm run build in the package installation directory to generate frontend assets (icons, styles). Without this, the UI will have missing icons and styling issues. For programmatic API usage only, these steps can be skipped.

Full Installation Guide →

💻 Usage Examples

Python API

```python
from local_deep_research.api import quick_summary
from local_deep_research.settings import CachedSettingsManager
from local_deep_research.database.session_context import get_user_db_session

# Authentication required - use with a user session
with get_user_db_session(username="your_username", password="your_password") as session:
    settings_manager = CachedSettingsManager(session, "your_username")
    settings_snapshot = settings_manager.get_all_settings()

    # Simple usage with settings
    result = quick_summary(
        query="What are the latest advances in quantum computing?",
        settings_snapshot=settings_snapshot
    )
    print(result["summary"])
```
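
The example only reads result["summary"]; the exact set of other keys in the result dictionary isn't documented in this README, so, continuing from the snippet above, a quick way to see what your version returns is to dump the whole payload:

```python
import json

# Continuation of the example above: inspect the full result payload
# to see which additional fields (sources, findings, etc.) are present.
print(json.dumps(result, indent=2, default=str))
```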

HTTP API

```python
import requests

# Create session and authenticate
session = requests.Session()
session.post("http://localhost:5000/auth/login", json={"username": "user", "password": "pass"})

# Get CSRF token
csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]

# Make API request
response = session.post(
    "http://localhost:5000/research/api/start",
    json={"query": "Explain CRISPR gene editing"},
    headers={"X-CSRF-Token": csrf}
)
```
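
The response schema of the start endpoint isn't spelled out in this README, so as a continuation of the snippet above, just confirm the call succeeded and look at the raw payload:

```python
# Continuation of the example above; the body is assumed to be JSON.
print(response.status_code)
print(response.json())
```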

More Examples →

Command Line Tools

```bash
# Run benchmarks from CLI
python -m local_deep_research.benchmarks --dataset simpleqa --examples 50

# Manage rate limiting
python -m local_deep_research.web_search_engines.rate_limiting status
python -m local_deep_research.web_search_engines.rate_limiting reset
```

🔗 Enterprise Integration

Connect LDR to your existing knowledge base:

```python
from local_deep_research.api import quick_summary

# Use your existing LangChain retriever
result = quick_summary(
    query="What are our deployment procedures?",
    retrievers={"company_kb": your_retriever},
    search_tool="company_kb"
)
```

Works with: FAISS, Chroma, Pinecone, Weaviate, Elasticsearch, and any LangChain-compatible retriever.
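
As a minimal sketch of wiring one of these in (assuming langchain-community and faiss-cpu are installed and an Ollama embedding model is available; the package imports, embedding model name, and sample documents below are illustrative, not part of LDR):

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

from local_deep_research.api import quick_summary

# Build a small FAISS index over your own documents (toy data here).
docs = [
    "Deployments are rolled out via the blue/green pipeline.",
    "Rollbacks are triggered from the release dashboard.",
]
embeddings = OllamaEmbeddings(model="nomic-embed-text")  # assumes this model is pulled in Ollama
retriever = FAISS.from_texts(docs, embeddings).as_retriever()

# Register the retriever under a name and point the search at it,
# mirroring the company_kb example above.
result = quick_summary(
    query="What are our deployment procedures?",
    retrievers={"company_kb": retriever},
    search_tool="company_kb",
)
print(result["summary"])
```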

Integration Guide →

📊 Performance & Analytics

Benchmark Results

Early experiments on small SimpleQA dataset samples:

| Configuration | Accuracy | Notes |
|---------------|----------|-------|
| gpt-4.1-mini + SearXNG + focused-iteration | 90-95% | Limited sample size |
| gpt-4.1-mini + Tavily + focused-iteration | 90-95% | Limited sample size |
| gemini-2.0-flash-001 + SearXNG | 82% | Single test run |

Note: These are preliminary results from initial testing. Performance varies significantly based on query types, model versions, and configurations. Run your own benchmarks →

Built-in Analytics Dashboard

Track costs, performance, and usage with detailed metrics. Learn more →

🤖 Supported LLMs

Local Models (via Ollama)

  • Llama 3, Mistral, Gemma, DeepSeek
  • LLM processing stays local (search queries still go to web)
  • No API costs

Cloud Models

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3)
  • Google (Gemini)
  • 100+ models via OpenRouter

Model Setup →

📚 Documentation

Getting Started

Core Features

Advanced Features

Development

Examples & Tutorials

🤝 Community & Support

🚀 Contributing

We welcome contributions! See our Contributing Guide to get started.

📄 License

MIT License - see LICENSE file.

Built with: LangChain, Ollama, SearXNG, FAISS

Support Free Knowledge: Consider donating to Wikipedia, arXiv, or PubMed.

Owner

  • Login: LearningCircuit
  • Kind: user

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 586
  • Total Committers: 13
  • Avg Commits per committer: 45.077
  • Development Distribution Score (DDS): 0.408
Past Year
  • Commits: 586
  • Committers: 13
  • Avg Commits per committer: 45.077
  • Development Distribution Score (DDS): 0.408
Top Committers
Name Email Commits
LearningCircuit 1****t 347
hashedviking 6****g 118
Daniel Petti d****i@g****m 98
dim-tsoukalas d****g@g****m 5
ScottVR s****r@g****m 4
Davit Mnatobishvili s****s@g****m 3
JayLiu 1****7@q****m 3
Sam s****j 2
Nikhil Dev Goyal n****l@g****m 2
kabachuha a****1@y****u 1
Ikko Eltociear Ashimine e****r@g****m 1
Chris Cowley 1****y 1
Dominik Witczak d****o@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 159
  • Total pull requests: 744
  • Average time to close issues: 8 days
  • Average time to close pull requests: about 23 hours
  • Total issue authors: 75
  • Total pull request authors: 21
  • Average comments per issue: 2.22
  • Average comments per pull request: 0.32
  • Merged pull requests: 519
  • Bot issues: 0
  • Bot pull requests: 22
Past Year
  • Issues: 159
  • Pull requests: 744
  • Average time to close issues: 8 days
  • Average time to close pull requests: about 23 hours
  • Issue authors: 75
  • Pull request authors: 21
  • Average comments per issue: 2.22
  • Average comments per pull request: 0.32
  • Merged pull requests: 519
  • Bot issues: 0
  • Bot pull requests: 22
Top Authors
Issue Authors
  • LearningCircuit (42)
  • djpetti (16)
  • MicahZoltu (5)
  • theodorevo (5)
  • taoeffect (3)
  • EmmanuelROGER (3)
  • StatusQuo209 (3)
  • EggzYy (3)
  • xybernaut (2)
  • kendonB (2)
  • i-d-lytvynenko (2)
  • lixy910915 (2)
  • AhaZsy (2)
  • Penner10000 (2)
  • kabachuha (2)
Pull Request Authors
  • LearningCircuit (489)
  • djpetti (156)
  • HashedViking (28)
  • dependabot[bot] (19)
  • scottvr (10)
  • MicahZoltu (7)
  • dim-tsoukalas (5)
  • Drswagzz (4)
  • sammcj (4)
  • Nikhil0250 (4)
  • github-actions[bot] (3)
  • mehmetcanfarsak (2)
  • wutzebaer (2)
  • catsudon (2)
  • JayLiu7319 (2)
Top Labels
Issue Labels
bug (67) enhancement (35) technical-debt (9) fixed in dev (8) needs-replication (5) rc/0.1.* (5) feature-branch (4) help wanted (3) docs (3) rc/0.3.0 (3) metrics (2) good first issue (2) developer-experience (2) rc/0.2.0 (2) question (2) documentation (2) configuration (1) unclear (1) security (1) visualization (1) priority: high (1) snappy (1) docker (1) discussion (1) database (1) priority: low (1) benchmark-results (1) wontfix (1)
Pull Request Labels
bug (69) sync (63) auto-merge (48) enhancement (33) conflicts (15) docs (4) security (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,777 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 58
  • Total maintainers: 2
  • Total advisories: 1
pypi.org: local-deep-research

AI-powered research assistant with deep, iterative analysis using LLMs and web searches

  • Homepage: https://github.com/LearningCircuit/local-deep-research
  • Documentation: https://local-deep-research.readthedocs.io/
  • License: MIT License Copyright (c) 2025 LearningCircuit Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  • Latest release: 1.1.11
    published 6 months ago
  • Versions: 58
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,777 Last month
Rankings
Dependent packages count: 9.5%
Average: 31.4%
Dependent repos count: 53.3%
Maintainers (2)
Last synced: 6 months ago