autonomous research loop that decomposes queries, routes searches across heterogeneous sources, iteratively validates content, and compiles structured, cited reports. The architecture supports both fully-local deployments and cloud model fallbacks.
Architecture & Core Loop
- Strategy Selection: Parses the input query and selects a research mode (quick summary, deep analysis, academic, etc.).
- Sub-Query Decomposition: Breaks complex questions into targeted search threads.
- Multi-Source Retrieval: Queries web, arXiv, PubMed, Wikipedia, GitHub, and local document libraries.
- Iterative Synthesis: Discards low-quality content, expands promising threads, and cross-references findings.
- Report Generation: Outputs a structured citation-backed report and optionally indexes sources into an encrypted local library.
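The stages above can be sketched as a small loop. This is an illustrative toy, not LDR's implementation: `Finding`, `decompose`, `research`, `render_report`, the score thresholds, and the stub `fake_search` are hypothetical names chosen for this example, and strategy selection is reduced to the `max_rounds` parameter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    sub_query: str
    source: str
    text: str
    score: float  # hypothetical quality score in [0, 1]


def decompose(query: str) -> list[str]:
    """Toy decomposition: split a compound question on ' and '."""
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return parts or [query]


def research(
    query: str,
    search: Callable[[str], list[Finding]],
    min_score: float = 0.5,
    expand_score: float = 0.8,
    max_rounds: int = 2,
) -> list[Finding]:
    """Run search threads, drop weak findings, and expand strong ones."""
    threads = decompose(query)
    seen = set(threads)
    kept: list[Finding] = []
    for _ in range(max_rounds):
        next_threads: list[str] = []
        for sub_query in threads:
            for finding in search(sub_query):
                if finding.score < min_score:
                    continue  # discard low-quality content
                kept.append(finding)
                follow_up = f"{sub_query} (details)"
                if finding.score >= expand_score and follow_up not in seen:
                    seen.add(follow_up)  # expand a promising thread once
                    next_threads.append(follow_up)
        if not next_threads:
            break
        threads = next_threads
    return kept


def render_report(query: str, findings: list[Finding]) -> str:
    """Emit a numbered, source-attributed summary."""
    lines = [f"Report: {query}"]
    for i, f in enumerate(findings, start=1):
        lines.append(f"[{i}] {f.text} (source: {f.source})")
    return "\n".join(lines)


def fake_search(sub_query: str) -> list[Finding]:
    """Stub search backend used only for this demonstration."""
    return [
        Finding(sub_query, "wiki", f"Overview of {sub_query}", 0.9),
        Finding(sub_query, "web", f"Spam about {sub_query}", 0.1),
    ]


if __name__ == "__main__":
    results = research("rust async and go goroutines", fake_search)
    print(render_report("rust async and go goroutines", results))
```

A real pipeline would replace `fake_search` with calls to actual search engines and score findings with a model rather than a fixed number.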
Deployment Options
Option 1: Docker (Recommended for most people)
This is the fastest path. It handles dependencies, encryption, and all service wiring automatically.
Standard setup (CPU, works on Mac, Windows, Linux):
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
Wait about 30 seconds, then open http://localhost:5000.
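Rather than waiting a fixed time, a script can poll the web UI until it answers. The following is a minimal sketch using only the standard library; `http_ok` and `wait_until_ready` are hypothetical helper names, and treating any status below 500 as "up" is an assumption of this example.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # 4xx still means the server is up and responding.
        return exc.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_until_ready(
    url: str,
    deadline_s: float = 60.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` at a fixed interval until it responds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= end:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ready = wait_until_ready("http://localhost:5000")
    print("ready" if ready else "timed out")
```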
With NVIDIA GPU acceleration (Linux only):
First install the NVIDIA Container Toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor \
-o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install nvidia-container-toolkit -y
sudo systemctl restart docker
nvidia-smi # verify it worked
Then bring up the stack with GPU support:
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d
The Docker Compose setup bundles Ollama (local LLM runner) and SearXNG (self-hosted meta-search engine) together with LDR. Everything runs locally.
Option 2: pip (For developers / Python integration)
If you want to embed LDR in a Python project or prefer to manage dependencies yourself:
# Install the package
pip install local-deep-research
# Run SearXNG in Docker for search
docker run -d -p 8080:8080 --name searxng searxng/searxng
# Install Ollama from https://ollama.ai, then pull a model
ollama pull gemma3:12b
# Start the web UI
python -m local_deep_research.web.app
Important note on encryption: The pip install does not automatically set up SQLCipher (the AES-256 encrypted database LDR uses for storing your data and API keys). If you hit errors during setup, bypass it for now with:
export LDR_ALLOW_UNENCRYPTED=true
This stores data in plain SQLite. Fine for local dev, not recommended for production or shared setups. Docker handles encryption out of the box.
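One way to keep the bypass from leaking into a shared environment is a small start-up guard in your own wrapper script. This is a sketch under assumptions: the source only shows the value `true`, so the guard matches that literal, and `check_storage_policy` with its `production` flag is a hypothetical helper, not part of LDR.

```python
import os
from typing import Mapping


def unencrypted_storage_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """True when LDR_ALLOW_UNENCRYPTED is set to 'true' (assumed spelling)."""
    return environ.get("LDR_ALLOW_UNENCRYPTED", "").strip().lower() == "true"


def check_storage_policy(
    environ: Mapping[str, str] = os.environ,
    production: bool = False,
) -> None:
    """Refuse to continue if plain SQLite is requested in production."""
    if production and unencrypted_storage_requested(environ):
        raise RuntimeError(
            "LDR_ALLOW_UNENCRYPTED=true is set; refusing to run with "
            "unencrypted storage in a production environment."
        )


if __name__ == "__main__":
    check_storage_policy(production=False)  # local dev: allowed
    print("storage policy check passed")
```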
API Integration
Using the Python API
Once running, you can drive LDR programmatically:
from local_deep_research.api import LDRClient, quick_query
# One-liner research
summary = quick_query("username", "password", "What is the current state of Rust async runtimes?")
print(summary)
# Client for more control
client = LDRClient()
client.login("username", "password")
result = client.quick_research("Compare FAISS vs Hnswlib for vector search at scale")
print(result["summary"])
Using the HTTP API
LDR exposes a REST API with session-based authentication and CSRF protection. The auth flow is a bit verbose but works reliably:
import requests
from bs4 import BeautifulSoup
session = requests.Session()
# Get CSRF token from login page
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, "html.parser")
csrf = soup.find("input", {"name": "csrf_token"}).get("value")
# Authenticate
session.post("http://localhost:5000/auth/login", data={
    "username": "user",
    "password": "pass",
    "csrf_token": csrf
})
# Get API CSRF token
api_csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]
# Submit a research query
response = session.post(
    "http://localhost:5000/api/start_research",
    json={"query": "What are the tradeoffs between gRPC and REST for internal microservices?"},
    headers={"X-CSRF-Token": api_csrf}
)
print(response.json())
The repository includes ready-to-run HTTP examples under examples/api_usage/http/ that handle authentication, retry logic, and progress polling.
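To give a sense of what retry logic around such calls can look like, here is a generic exponential-backoff wrapper. It is a sketch written for this article, not code from the repository's examples; `retry_with_backoff` and its parameters are hypothetical names.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    outcomes = iter([ConnectionError("down"), "ok"])

    def flaky() -> str:
        item = next(outcomes)
        if isinstance(item, Exception):
            raise item
        return item

    print(retry_with_backoff(flaky, base_delay_s=0.01))
```

When wrapping `requests` calls, pass the exception types that library raises (for example `requests.exceptions.ConnectionError`) via `retry_on`, since they do not derive from the built-in `ConnectionError`.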
Enterprise / RAG Integration
If you already have a vector store or internal knowledge base, LDR can search it as one of its sources via LangChain retrievers:
from local_deep_research.api import quick_summary
result = quick_summary(
query="What are our current deployment procedures for the payments service?",
retrievers={"internal_kb": your_langchain_retriever},
search_tool="internal_kb"
)
LDR supports FAISS, Chroma, Pinecone, Weaviate, Elasticsearch, and any other LangChain-compatible retriever, so a single research run can combine live web and academic search with proprietary internal documents.
Pitfall Guide
- Bypassing Encryption in Production: Using LDR_ALLOW_UNENCRYPTED=true stores API keys, research queries, and source metadata in plain SQLite. In shared or production environments, this exposes sensitive credentials and violates data governance policies. Always enforce SQLCipher AES-256 via Docker or proper pip configuration.
- NVIDIA Container Toolkit Mismatch: A misconfigured GPU passthrough or an outdated nvidia-container-toolkit version can cause a silent fallback to CPU inference, which makes iterative synthesis and embedding generation prohibitively slow. Run nvidia-smi inside the container to validate GPU passthrough before deploying the .gpu.override.yml stack.
- CSRF & Session State Mishandling: The HTTP API requires strict CSRF token extraction and session cookie persistence. Skipping the initial /auth/login GET request or failing to attach X-CSRF-Token to subsequent POST requests results in 403 Forbidden or silent authentication drops. Use the provided examples/api_usage/http/ templates as a baseline.
- Knowledge Base Indexing Latency: Downloaded sources (arXiv, PubMed, web pages) are not immediately searchable. Vector indexing and metadata extraction run asynchronously. Querying the local library immediately after ingestion will return stale or empty results. Implement retry/polling logic or check indexing status before dependent queries.
- Strategy-Query Mismatch: Selecting "quick summary" for academic or technical deep-dives truncates sub-query expansion and source validation. Always align the research strategy parameter with query complexity: use "academic" or "deep analysis" for literature reviews, and reserve "quick summary" for high-level overviews.
- LangChain Retriever Compatibility: Not all enterprise vector stores map cleanly to LDR's search_tool parameter. Custom retrievers must implement the standard LangChain BaseRetriever interface and return properly formatted Document objects. Failing to wrap proprietary clients correctly breaks the retrievers dict injection.
- Model Context Window Exhaustion: Running iterative synthesis with models that have small context windows (<8k tokens) causes truncation during source aggregation. This leads to incomplete citations and hallucinated summaries. Pair LDR with models supporting ≥32k context or enable chunked synthesis routing.
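The idea behind chunking sources for a small context window can be illustrated with a greedy batching sketch. This is not an LDR setting or API: `estimate_tokens`, `batch_sources`, the 4-characters-per-token rule of thumb, and the reserved-token figure are all assumptions made for this example, and a real tokenizer will give different counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return max(1, math.ceil(len(text) / chars_per_token))


def batch_sources(
    sources: list[str],
    context_tokens: int,
    reserved_tokens: int = 1024,
) -> list[list[str]]:
    """Greedily pack sources into batches that fit the model's context.

    `reserved_tokens` leaves room for the prompt and the model's answer.
    """
    budget = context_tokens - reserved_tokens
    if budget <= 0:
        raise ValueError("context window is smaller than the reserved space")

    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for text in sources:
        cost = estimate_tokens(text)
        if cost > budget:
            raise ValueError("a single source exceeds the budget; split it first")
        if current and used + cost > budget:
            batches.append(current)  # close the full batch
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    docs = ["a" * 4000, "b" * 4000, "c" * 4000]  # about 1000 tokens each
    for i, batch in enumerate(batch_sources(docs, context_tokens=3024), start=1):
        print(f"batch {i}: {len(batch)} source(s)")
```

Each batch would then be summarized separately and the partial summaries merged in a final pass, at the cost of extra model calls.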
Deliverables
- Architecture Blueprint: System topology diagram detailing how Ollama, SearXNG, and the LDR core loop connect, the data flow for encrypted storage (SQLCipher), and the LangChain retriever injection points. Includes deployment variants (CPU-only, NVIDIA GPU, cloud-fallback).
- Pre-Flight Checklist: Step-by-step validation matrix covering Docker/NVIDIA toolkit verification, CSRF/auth flow testing, encryption status confirmation, model context window validation, and SimpleQA benchmark execution.
- Configuration Templates:
  - docker-compose.override.yml for GPU acceleration and resource limits
  - .env template for API key routing, encryption toggles, and SearXNG instance configuration
  - api_auth_session.py hardened template with automatic CSRF refresh, retry backoff, and progress polling
  - langchain_retriever_wrapper.py adapter template for FAISS/Chroma/Pinecone integration