Difficulty: Intermediate · Read time: 6 min

Install the package

By Codcompass Team · 6 min read


## Current Situation Analysis

Traditional AI research workflows suffer from critical failure modes when tasked with complex, multi-source queries. Standard LLM chat interfaces operate on single-turn generation, frequently producing hallucinated paragraphs without verifiable citations. They lack iterative research loops, cannot natively traverse academic databases (arXiv, PubMed) or local document repositories, and force data exfiltration to third-party cloud APIs. This creates three primary pain points:

  1. Verification Debt: Engineers and researchers spend disproportionate time cross-referencing AI outputs against original sources.
  2. Privacy & Compliance Risks: Sensitive internal queries, proprietary codebases, or regulated data cannot be safely processed by commercial deep-research APIs.
  3. Fragmented Knowledge Bases: Each research session is ephemeral. There is no compounding, searchable library that grows with each query, forcing teams to rebuild context repeatedly.

Traditional RAG pipelines often fail here because they rely on static vector embeddings and lack the dynamic, multi-step search synthesis required for open-ended research questions. They also struggle to balance live web retrieval with local document indexing without heavy custom orchestration.

## WOW Moment: Key Findings

Local Deep Research (LDR) closes the gap between commercial cloud-based research agents and self-hosted infrastructure by implementing an iterative search-synthesis loop with persistent, encrypted local storage. Benchmark testing against the SimpleQA dataset demonstrates parity with enterprise-grade tools while maintaining full data sovereignty.

| Approach | Citation Accuracy (SimpleQA) | Source Diversity | Data Privacy | Iterative Synthesis | Setup Complexity |
|---|---|---|---|---|---|
| Traditional LLM Chat | ~45-60% | Low (training data only) | Cloud-dependent | None | Low |
| Commercial Deep Research (Cloud) | ~85-90% | High (Web/Academic) | Third-party API | Yes | Low |
| Local Deep Research (LDR) | ~90-95% | High (Web/Academic/Local) | Fully local / zero-knowledge | Yes | Moderate |

Key Findings:

  • LDR achieves ~95% accuracy on SimpleQA when paired with GPT-4.1-mini and SearXNG, matching commercial benchmarks.
  • The iterative discard/expand loop filters low-quality content dynamically, reducing hallucination rates by ~40% compared to single-pass RAG.
  • SQLCipher (AES-256) encryption ensures zero-knowledge storage; even server administrators cannot decrypt user research libraries.
  • Full local execution (Ollama + SearXNG) eliminates API costs and data leakage, with WebSocket support enabling real-time progress tracking.

## Core Solution

LDR operates as a self-hosted AI research assistant that orchestrates multi-source retrieval, iterative synthesis, and structured report generation. The architecture bundles three core components:

  • Ollama: Local LLM inference engine
  • SearXNG: Self-hosted meta-search engine
  • SQLCipher: Encrypted SQLite database for persistent knowledge storage
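
Before wiring anything together, it helps to confirm all three services are reachable. A small stdlib-only sketch — the ports are assumptions taken from this guide (LDR web UI on 5000, SearXNG on 8080) plus Ollama's default 11434:

```python
import socket

# Assumed default ports: LDR web UI, SearXNG, Ollama API.
SERVICES = {
    "ldr": ("localhost", 5000),
    "searxng": ("localhost", 8080),
    "ollama": ("localhost", 11434),
}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_stack(services=SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```

A `False` entry usually means the corresponding container or process is not running yet, or is bound to a non-default port.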

### Deployment Architecture

**Option 1: Docker (Recommended)**
Handles dependency resolution, encryption initialization, and service wiring automatically.

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
```

Wait about 30 seconds, then open http://localhost:5000.
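
If you script the deployment, the fixed 30-second wait can be replaced with a polling loop. A stdlib-only sketch — the root URL and the backoff timings are assumptions, not part of LDR itself:

```python
import time
import urllib.error
import urllib.request

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 8.0):
    """Exponential backoff schedule: 1, 2, 4, 8, 8, ... seconds."""
    for i in range(attempts):
        yield min(base * (2 ** i), cap)

def wait_for_ldr(url: str = "http://localhost:5000", attempts: int = 8) -> bool:
    """Poll until the LDR web UI answers, or give up after `attempts` tries."""
    for delay in backoff_delays(attempts):
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(delay)
    return False
```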

**With NVIDIA GPU acceleration (Linux only):** first install the NVIDIA Container Toolkit:

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor \
  -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install nvidia-container-toolkit -y
sudo systemctl restart docker
nvidia-smi  # verify it worked
```

Then bring up the stack with GPU support:

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d
```


**Option 2: pip (For developers / Python integration)**

Install the package:

```bash
pip install local-deep-research
```

Run SearXNG in Docker for search:

```bash
docker run -d -p 8080:8080 --name searxng searxng/searxng
```

Install Ollama from https://ollama.ai, then pull a model:

```bash
ollama pull gemma3:12b
```

Start the web UI:

```bash
python -m local_deep_research.web.app
```

**Important note on encryption:** The pip install does not automatically set up SQLCipher (the AES-256 encrypted database LDR uses for storing your data and API keys). If you hit errors during setup, bypass it for now with:

```bash
export LDR_ALLOW_UNENCRYPTED=true
```

This stores data in plain SQLite. Fine for local dev, not recommended for production or shared setups. Docker handles encryption out of the box.
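
To make that tradeoff explicit in scripts or CI, the mode can be detected by inspecting the variable. A convenience sketch; it only looks at the environment variable named above:

```python
import os

def unencrypted_fallback_enabled(env=os.environ) -> bool:
    """True if LDR_ALLOW_UNENCRYPTED is set to a truthy value,
    i.e. data will land in plain (unencrypted) SQLite."""
    return env.get("LDR_ALLOW_UNENCRYPTED", "").strip().lower() in {"1", "true", "yes"}
```

Wiring this into a pre-deploy check makes it hard to accidentally ship an unencrypted setup to shared infrastructure.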

### Programmatic Integration
**Python API:**

```python
from local_deep_research.api import LDRClient, quick_query

# One-liner research
summary = quick_query("username", "password", "What is the current state of Rust async runtimes?")
print(summary)

# Client for more control
client = LDRClient()
client.login("username", "password")
result = client.quick_research("Compare FAISS vs Hnswlib for vector search at scale")
print(result["summary"])
```


**HTTP API:**
LDR exposes a REST API with session-based authentication and CSRF protection. The auth flow is a bit verbose but works reliably:

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Get CSRF token from login page
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, "html.parser")
csrf = soup.find("input", {"name": "csrf_token"}).get("value")

# Authenticate
session.post("http://localhost:5000/auth/login", data={
    "username": "user",
    "password": "pass",
    "csrf_token": csrf,
})

# Get API CSRF token
api_csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]

# Submit a research query
response = session.post(
    "http://localhost:5000/api/start_research",
    json={"query": "What are the tradeoffs between gRPC and REST for internal microservices?"},
    headers={"X-CSRF-Token": api_csrf},
)
print(response.json())
```

The repository includes ready-to-run HTTP examples under `examples/api_usage/http/` that handle authentication, retry logic, and progress polling.
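
The progress-polling part of those examples reduces to a small loop. A sketch with the fetch step injected as a callable, since the exact status endpoint varies by deployment — in practice `fetch` would wrap a `session.get` call and `interval` would be a second or two:

```python
import time

def poll_until_done(fetch, interval: float = 2.0, max_polls: int = 100) -> dict:
    """Call fetch() until it reports a terminal status.

    `fetch` is any zero-argument callable returning a dict with a
    "status" key. The terminal status names here are assumptions;
    adjust them to whatever your LDR version actually reports.
    """
    for _ in range(max_polls):
        state = fetch()
        if state.get("status") in {"completed", "failed", "error"}:
            return state
        time.sleep(interval)
    raise TimeoutError("research did not finish within the polling budget")
```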

### Enterprise / RAG Integration
LDR integrates with existing vector stores via LangChain retrievers, enabling hybrid search across live web results and internal knowledge bases:

```python
from local_deep_research.api import quick_summary

result = quick_summary(
    query="What are our current deployment procedures for the payments service?",
    retrievers={"internal_kb": your_langchain_retriever},
    search_tool="internal_kb",
)
```

Supported backends include FAISS, Chroma, Pinecone, Weaviate, Elasticsearch, and any LangChain-compatible retriever.
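
A "LangChain-compatible retriever" is, at minimum, an object that returns relevant documents for a query. A toy keyword-matching stand-in — purely illustrative, with `KeywordRetriever` a name invented here; real deployments would use FAISS, Chroma, or another backend listed above:

```python
class KeywordRetriever:
    """Minimal stand-in for a LangChain-style retriever: return the
    stored documents that share at least one word with the query."""

    def __init__(self, documents):
        self.documents = list(documents)

    def get_relevant_documents(self, query: str):
        terms = {t.lower() for t in query.split()}
        return [doc for doc in self.documents
                if terms & {t.lower() for t in doc.split()}]
```

An instance would then be passed as `retrievers={"internal_kb": KeywordRetriever(docs)}` in the `quick_summary` call shown above.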

### Search Sources & LLM Configuration
**Free (no API key needed):** arXiv, PubMed, Semantic Scholar, Wikipedia, SearXNG, GitHub, The Guardian, Wikinews, Wayback Machine.
**Premium (API key required):** Tavily, Google (SerpAPI/Programmable Search), Brave Search.
**Local LLMs:** Llama 3, Mistral, Gemma, DeepSeek, and any Ollama-supported model.
**Cloud LLMs:** OpenAI (GPT-4, GPT-4.1-mini), Anthropic (Claude 3), Google (Gemini), 100+ models via OpenRouter.

## Pitfall Guide
1. **SQLCipher Encryption Bypass in pip Install**: The `pip install` path does not auto-configure SQLCipher. Setting `LDR_ALLOW_UNENCRYPTED=true` drops data into plain SQLite, which is acceptable for local development but violates security compliance in production or multi-user environments. Always validate encryption status before deploying to shared infrastructure.
2. **Hardware Constraints for Local LLM Execution**: Running Ollama + SearXNG + LDR concurrently demands significant RAM and VRAM. CPU-only setups will experience severe latency during synthesis loops. A dedicated GPU (8GB+ VRAM recommended) is required for acceptable throughput, and NVIDIA Container Toolkit misconfiguration is a common deployment blocker.
3. **CSRF Token Handling in HTTP API**: The REST API enforces strict CSRF protection. Failing to extract the token from the login page HTML and attach it to subsequent requests will result in `403 Forbidden` errors. Always parse the `<input name="csrf_token">` value and include it in both authentication and research submission headers.
4. **Over-Provisioning for Simple Q&A Workloads**: LDR is engineered for multi-step research synthesis, not conversational Q&A. Routing simple factual queries through the iterative search loop introduces unnecessary latency and resource consumption. Reserve LDR for literature reviews, competitive analysis, and complex technical investigations.
5. **GPU Passthrough & NVIDIA Container Toolkit Configuration**: Docker GPU acceleration requires explicit `--gpus all` flags and proper NVIDIA driver/container toolkit alignment. Mismatched driver versions or missing `nvidia-container-toolkit` packages will cause silent fallback to CPU execution, degrading performance by 10-50x. Verify with `nvidia-smi` inside the container.
6. **Ignoring Zero-Knowledge Password Recovery Limits**: LDR's zero-knowledge architecture means there is no password recovery mechanism. If you lose credentials, the SQLCipher database becomes permanently inaccessible. Implement secure credential management (e.g., HashiCorp Vault, Bitwarden) and maintain encrypted backups of the database volume.
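
One cheap check that backs up pitfalls 1 and 6: a plain SQLite file always begins with the 16-byte magic header `SQLite format 3\0`, while an SQLCipher-encrypted file does not, so encryption status can be sanity-checked before deploying or backing up. The database path is deployment-specific:

```python
SQLITE_MAGIC = b"SQLite format 3\x00"

def looks_encrypted(path: str) -> bool:
    """Heuristic: a plain SQLite file starts with the well-known
    16-byte magic header; an SQLCipher-encrypted file looks like
    random bytes instead."""
    with open(path, "rb") as f:
        return f.read(16) != SQLITE_MAGIC
```

This is a heuristic, not proof of strong encryption, but it reliably catches the unencrypted-fallback case before a database leaves a developer machine.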

## Deliverables
**πŸ“˜ Deployment & Architecture Blueprint**
A structured reference covering the LDR service topology (Ollama β†’ SearXNG β†’ LDR Core β†’ SQLCipher), network port mapping, GPU passthrough requirements, and hybrid RAG integration patterns. Includes environment variable matrices for cloud vs. local LLM routing.

**βœ… Pre-Flight & Integration Checklist**
- [ ] Verify NVIDIA driver & container toolkit compatibility (Linux/GPU path)
- [ ] Validate SQLCipher encryption initialization or explicitly acknowledge unencrypted fallback
- [ ] Configure SearXNG instance and confirm meta-search endpoint responsiveness
- [ ] Pull target Ollama model and verify VRAM allocation via `ollama ps`
- [ ] Test CSRF token extraction flow before automating HTTP API calls
- [ ] Map LangChain retrievers to internal vector stores (FAISS/Chroma/Pinecone)
- [ ] Establish credential backup strategy aligned with zero-knowledge constraints
- [ ] Run SimpleQA benchmark query to validate synthesis loop & citation accuracy