the extraction workflow and Docker Compose for infrastructure.
Step 1: Infrastructure Provisioning
FastGPT relies on PostgreSQL with pgvector for semantic search and MongoDB for conversation state. The following configuration isolates services, enforces health checks, and prepares the environment for Ollama integration.
version: '3.8'
services:
  fastgpt-core:
    image: ghcr.io/labring/fastgpt:latest
    ports:
      - "3000:3000"
    environment:
      - MONGO_URI=mongodb://mongo:27017/fastgpt
      - PGVECTOR_URI=postgresql://postgres:secure_pass@pgvector:5432/fastgpt
      - OPENAI_BASE_URL=http://ollama:11434/v1
      - OPENAI_API_KEY=ollama
    depends_on:
      mongo:
        condition: service_healthy
      pgvector:
        condition: service_healthy
    restart: unless-stopped
  pgvector:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secure_pass
      POSTGRES_DB: fastgpt
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  mongo:
    image: mongo:7
    environment:
      MONGO_INITDB_DATABASE: fastgpt
    volumes:
      - mongo_data:/data/db
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
      interval: 5s
      timeout: 3s
      retries: 5
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
volumes:
  pg_data:
  mongo_data:
  ollama_models:
Step 2: QA Extraction Module
Naive chunking embeds raw text as-is. QA extraction instead uses an LLM to parse each document and emit structured question-answer pairs. The following module demonstrates how to call an OpenAI-compatible endpoint, enforce JSON schema validation, and prepare embeddings for storage.
import { OpenAI } from 'openai';
import { z } from 'zod';

const qaPairSchema = z.object({
  question: z.string().min(10).max(200),
  answer: z.string().min(5).max(1000),
  category: z.enum(['policy', 'technical', 'billing', 'general']).optional()
});
const qaBatchSchema = z.array(qaPairSchema);
// JSON mode is designed to return a single object, so the array is wrapped under a "pairs" key.
const qaResponseSchema = z.object({ pairs: qaBatchSchema });

export class KnowledgeExtractor {
  private client: OpenAI;
  private embeddingModel: string;

  constructor(baseURL: string, apiKey: string, embeddingModel: string) {
    this.client = new OpenAI({ baseURL, apiKey });
    this.embeddingModel = embeddingModel;
  }

  // Ask the chat model for Q&A pairs and validate them against the schema.
  async extractQAFromDocument(rawText: string): Promise<z.infer<typeof qaBatchSchema>> {
    const prompt = `
      Analyze the following document excerpt. Extract 3 to 5 distinct question-answer pairs.
      Return a JSON object of the form { "pairs": [...] } where each element matches the schema:
      { "question": string, "answer": string, "category": "policy" | "technical" | "billing" | "general" }
      Document:
      ${rawText}
    `;
    const response = await this.client.chat.completions.create({
      model: 'llama3',
      messages: [{ role: 'user', content: prompt }],
      response_format: { type: 'json_object' },
      temperature: 0.2
    });
    const rawOutput = response.choices[0]?.message?.content ?? '{"pairs":[]}';
    // JSON.parse and the schema both throw on malformed output; callers should catch and retry or skip.
    const parsed = JSON.parse(rawOutput);
    return qaResponseSchema.parse(parsed).pairs;
  }

  // Embed each question with the configured model, one request per question.
  async generateEmbeddings(questions: string[]): Promise<number[][]> {
    const embeddings: number[][] = [];
    for (const q of questions) {
      const res = await this.client.embeddings.create({
        model: this.embeddingModel,
        input: q
      });
      embeddings.push(res.data[0].embedding);
    }
    return embeddings;
  }
}
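Local models do not always honor JSON mode cleanly: the output may arrive wrapped in a Markdown code fence, or as an object rather than a bare array. The following is a minimal, dependency-free sketch of a tolerant parsing step that could run before schema validation. The `parseQaPairs` helper and the `pairs` key are illustrative assumptions, not part of FastGPT or the OpenAI SDK; the length bounds mirror the schema above.

```typescript
// Hypothetical helper: tolerant parsing of LLM output before strict validation.
interface QaPair {
  question: string;
  answer: string;
  category?: string;
}

// Built at runtime so this listing contains no literal fence marker.
const FENCE = '`'.repeat(3);

function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the opening fence line (it may carry a language tag) and the closing fence.
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1 ? '' : text.slice(firstNewline + 1);
    if (text.trimEnd().endsWith(FENCE)) {
      text = text.trimEnd().slice(0, -FENCE.length);
    }
  }
  return text.trim();
}

function isQaPair(value: unknown): value is QaPair {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.question === 'string' &&
    v.question.length >= 10 &&
    v.question.length <= 200 &&
    typeof v.answer === 'string' &&
    v.answer.length >= 5 &&
    v.answer.length <= 1000
  );
}

function parseQaPairs(raw: string): QaPair[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripFence(raw));
  } catch {
    return []; // Unparseable output yields no pairs instead of throwing.
  }
  // Accept either a bare array or an object with a "pairs" array (assumed key).
  let candidates: unknown[] = [];
  if (Array.isArray(parsed)) {
    candidates = parsed;
  } else if (typeof parsed === 'object' && parsed !== null) {
    const pairs = (parsed as Record<string, unknown>).pairs;
    if (Array.isArray(pairs)) candidates = pairs;
  }
  // Silently drop entries that violate the length bounds.
  return candidates.filter(isQaPair);
}
```

Dropping invalid entries rather than rejecting the whole batch is a design choice; a stricter ingestion pipeline might prefer to fail the document and retry the extraction call.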
Step 3: Architecture Rationale
Why PostgreSQL with pgvector? Keeping embeddings in PostgreSQL gives the retrieval store ACID transactions, transactional safety, and mature indexing strategies. pgvector supports HNSW and IVFFlat indexes for approximate nearest-neighbor search, which keeps similarity queries fast as the collection grows. The MongoDB container in this stack is not configured for vector search, so it is not used as the primary retrieval store.
Why MongoDB for conversation state? Chat history, audit logs, and user sessions require flexible schemas and high write throughput. MongoDB's document model aligns naturally with session tracking, while keeping vector operations isolated in PostgreSQL.
Why node-based routing? FastGPT's visual workflow builder decouples intent classification from retrieval strategy. Instead of a monolithic pipeline, you route queries through conditional nodes: intent classification → FAQ lookup → document search → fallback response. This modularity helps reduce hallucinations and simplifies debugging.
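The conditional chain can be pictured as an ordered list of handlers, each of which either answers or passes the query along. This is only a conceptual sketch: FastGPT workflows are configured in the visual editor, and the `RouteNode` type, the node names, and the sample data below are hypothetical. The intent classification stage is omitted for brevity.

```typescript
// Hypothetical model of a routing chain; not FastGPT's internal API.
type RouteResult = { node: string; answer: string };
type RouteNode = { name: string; handle: (query: string) => string | null };

function runPipeline(query: string, nodes: RouteNode[]): RouteResult {
  for (const node of nodes) {
    const answer = node.handle(query);
    // The first node that produces an answer ends the chain.
    if (answer !== null) return { node: node.name, answer };
  }
  throw new Error('Pipeline has no terminal fallback node');
}

// Sample FAQ data, invented for illustration.
const faq = new Map<string, string>([['reset password', 'Use the self-service portal.']]);

const pipeline: RouteNode[] = [
  { name: 'faq-lookup', handle: (q) => faq.get(q.toLowerCase().trim()) ?? null },
  { name: 'document-search', handle: (q) => (q.includes('policy') ? 'See the policy handbook.' : null) },
  { name: 'fallback', handle: () => 'I could not find a confident answer.' },
];
```

Because each node is an independent function, a failing query can be replayed against one node at a time, which is the debugging benefit the modular design is after.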
Pitfall Guide
1. License Misinterpretation
Explanation: FastGPT uses a custom license that explicitly prohibits reselling the platform as a managed SaaS to third parties. Teams often assume "open source" equals commercial freedom.
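Fix: Verify deployment scope. Internal team usage and backend integration into proprietary products are permitted. If external commercialization is planned, evaluate alternatives such as MaxKB (GPLv3, so copyleft obligations apply) or WeKnora (MIT), and confirm each project's current license terms before committing.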
2. Ollama Endpoint Misconfiguration
Explanation: Ollama exposes an OpenAI-compatible API, but the base path must include /v1. Omitting it or using the root path causes 404 errors during embedding or chat requests.
Fix: Always configure OPENAI_BASE_URL=http://ollama:11434/v1. Ollama does not validate the API key, so any non-empty string works locally; production deployments should enforce token authentication in front of it, for example at a reverse proxy.
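A small guard at startup can catch a missing path suffix before any request is sent. The sketch below assumes a hypothetical `withV1Path` helper that appends `/v1` only when it is absent.

```typescript
// Hypothetical helper: make sure an OpenAI-compatible base URL ends in /v1.
function withV1Path(baseUrl: string): string {
  // Remove surrounding whitespace and any trailing slashes first.
  const trimmed = baseUrl.trim().replace(/\/+$/, '');
  return trimmed.endsWith('/v1') ? trimmed : `${trimmed}/v1`;
}
```

For example, `withV1Path('http://ollama:11434/')` and `withV1Path('http://ollama:11434/v1')` both resolve to the same base URL, so the client configuration stays consistent regardless of how the variable was written.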
3. Applying QA Extraction to Unsuitable Content
Explanation: QA extraction excels on policy documents, FAQs, and procedural guides. It performs poorly on narrative text, research papers, or highly technical specifications where context spans multiple paragraphs.
Fix: Implement a document classifier node before ingestion. Route structured content to QA extraction and unstructured content to hybrid chunking with overlap. Never force QA generation on documents lacking clear Q&A boundaries.
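As a rough sketch of that split, the code below pairs a deliberately crude structure heuristic with a character-based overlap chunker. Both function names, the 30% cutoff, and the line-based heuristic are assumptions made for illustration; a real classifier node would more likely use an LLM or a trained model.

```typescript
type IngestionRoute = 'qa-extraction' | 'overlap-chunking';

// Hypothetical heuristic: treat text as structured when enough lines are
// questions or list items. The 0.3 cutoff is an arbitrary starting point.
function chooseIngestionRoute(text: string): IngestionRoute {
  const lines = text.split('\n').map((l) => l.trim()).filter((l) => l.length > 0);
  if (lines.length === 0) return 'overlap-chunking';
  const structured = lines.filter((l) => l.endsWith('?') || /^(\d+[.)]|[-*•])\s/.test(l)).length;
  return structured / lines.length >= 0.3 ? 'qa-extraction' : 'overlap-chunking';
}

// Fixed-size character chunks where consecutive chunks share `overlap` characters.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('Require size > 0 and 0 <= overlap < size');
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk, which is what lets multi-paragraph context survive chunking.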
4. Ignoring Vector Index Tuning
Explanation: Without an index, pgvector performs an exact sequential scan that compares the query against every stored vector. As the knowledge base grows beyond roughly 10K vectors, query latency rises in proportion to the number of rows.
Fix: Create an HNSW index after initial ingestion: CREATE INDEX ON vectors USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);. Adjust m and ef_construction based on memory constraints and query latency requirements.
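To see what an index saves, it helps to spell out what an unindexed search computes. The following in-memory sketch is not how pgvector is implemented; it only illustrates exact cosine search, with hypothetical `cosineSimilarity` and `exactTopK` helpers.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError('Vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact search: one similarity computation per stored row, then a sort.
function exactTopK(
  query: number[],
  rows: { id: string; embedding: number[] }[],
  k: number
): { id: string; score: number }[] {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Every query touches every row here, which is why cost tracks table size. An HNSW index trades a small amount of recall for visiting only a fraction of the vectors.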
5. Skipping Fallback Routing in Workflows
Explanation: Visual pipelines often chain nodes without error handling. If the QA retrieval node returns zero matches, the pipeline fails silently or returns generic errors.
Fix: Always attach a fallback branch. Route low-confidence scores (<0.75 cosine similarity) to a secondary search node, then to a generic LLM response with a disclaimer. Log all fallback triggers for pipeline optimization.
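The confidence check itself is a small decision function. The sketch below assumes retrieval returns cosine similarity scores and uses hypothetical names (`decideRoute`, `Match`, `Decision`); in FastGPT the equivalent logic would live in a conditional node rather than in code.

```typescript
type Match = { id: string; score: number };
type Decision =
  | { route: 'answer'; match: Match }
  | { route: 'secondary-search'; reason: string };

// Pick the best match; anything below the threshold (or no match at all)
// is routed to the secondary search branch and logged.
function decideRoute(
  matches: Match[],
  threshold = 0.75,
  log: (message: string) => void = () => {}
): Decision {
  const best = matches.reduce<Match | null>(
    (top, m) => (top === null || m.score > top.score ? m : top),
    null
  );
  if (best !== null && best.score >= threshold) return { route: 'answer', match: best };
  const reason = best === null ? 'no matches' : `best score ${best.score.toFixed(2)} below ${threshold}`;
  log(`fallback: ${reason}`);
  return { route: 'secondary-search', reason };
}
```

Passing the logger in as a parameter keeps the function pure enough to test, while still recording every fallback trigger for later tuning.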
6. Prompt Drift in QA Generation
Explanation: LLM-generated Q&A pairs vary in tone, length, and terminology across documents. Inconsistent phrasing reduces retrieval accuracy because semantically similar questions use different vocabulary.
Fix: Enforce strict prompt templates with few-shot examples. Add a post-processing step that normalizes terminology using a synonym dictionary or a secondary LLM call focused on standardization.
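The dictionary-based variant of that post-processing step could look like the sketch below. The `normalizeTerms` helper and the sample dictionary are hypothetical; matching is case-insensitive and limited to whole words.

```typescript
// Hypothetical helper: rewrite known variants to one canonical term in a single pass.
function normalizeTerms(text: string, synonyms: Record<string, string>): string {
  const lookup = new Map<string, string>();
  for (const variant of Object.keys(synonyms)) {
    lookup.set(variant.toLowerCase(), synonyms[variant]);
  }
  // Longer variants come first so "log in" is preferred over "log".
  const variants = Array.from(lookup.keys()).sort((a, b) => b.length - a.length);
  if (variants.length === 0) return text;
  // Escape regex metacharacters in each variant before building the pattern.
  const escaped = variants.map((v) => v.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp(`\\b(?:${escaped.join('|')})\\b`, 'gi');
  // A single pass avoids re-replacing text produced by an earlier substitution.
  return text.replace(pattern, (hit) => lookup.get(hit.toLowerCase()) ?? hit);
}
```

Applying the same dictionary to incoming user queries as well as to generated questions keeps both sides of the similarity comparison in the same vocabulary.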
7. Production Security Gaps
Explanation: Default credentials (root/1234) and unencrypted HTTP endpoints are common in development. Exposing these in production invites unauthorized access and data exfiltration.
Fix: Rotate default credentials immediately. Terminate SSL at a reverse proxy (Nginx/Traefik). Restrict MongoDB and PostgreSQL to internal Docker networks. Implement rate limiting on the /v1 Ollama endpoint.
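Rate limiting is usually configured at the reverse proxy, but the underlying idea is simple enough to sketch. The fixed-window limiter below is a hypothetical, in-memory illustration (the class name and the injected clock are assumptions), not a replacement for proxy-level controls.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per client per window.
class FixedWindowLimiter {
  private limit: number;
  private windowMs: number;
  private now: () => number;
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // Injectable clock makes the limiter testable.
  }

  allow(clientId: string): boolean {
    const t = this.now();
    const entry = this.counts.get(clientId);
    // Start a fresh window for new clients or when the old window has expired.
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(clientId, { windowStart: t, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A fixed window permits short bursts at window boundaries; a token bucket or sliding window smooths that out at the cost of slightly more state.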
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Internal HR/IT Knowledge Base | LLM QA Extraction | High accuracy on policy documents, low maintenance | Medium (LLM ingestion cost) |
| Customer Support Bot | LLM QA Extraction + Fallback | Direct question matching reduces hallucination | Medium-High (requires workflow tuning) |
| Commercial SaaS Product | MaxKB or WeKnora | Avoids FastGPT's SaaS resale restriction; MaxKB is GPLv3 and WeKnora is MIT, so review each license before white-labeling | Low (no FastGPT license restriction) |
| Technical Research Archive | Hybrid Chunking + Keyword Search | QA extraction fails on dense, cross-referenced content | Low-Medium (dual-index overhead) |
| High-Volume Real-Time Chat | Naive Chunking + Aggressive Caching | QA preprocessing adds latency; speed prioritized | Low (minimal compute) |
Configuration Template
# .env.production
MONGO_URI=mongodb://app_user:strong_password@mongo:27017/fastgpt_prod
PGVECTOR_URI=postgresql://app_user:strong_password@pgvector:5432/fastgpt_prod
OPENAI_BASE_URL=http://ollama:11434/v1
OPENAI_API_KEY=prod_ollama_token_12345
# Must be an embedding model served by OPENAI_BASE_URL (here, one pulled into Ollama)
DEFAULT_EMBEDDING_MODEL=nomic-embed-text
QA_EXTRACTION_MODEL=llama3
WORKFLOW_TIMEOUT_MS=5000
VECTOR_SIMILARITY_THRESHOLD=0.75
LOG_LEVEL=info
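Numeric settings in an env file arrive as strings, so it is worth validating them once at startup. The sketch below reads two of the variables from the template; the `loadRuntimeConfig` function and `RuntimeConfig` shape are hypothetical and say nothing about how FastGPT itself reads configuration.

```typescript
interface RuntimeConfig {
  workflowTimeoutMs: number;
  similarityThreshold: number;
}

// Read a required numeric variable, rejecting missing or blank values.
function readNumber(env: Record<string, string | undefined>, key: string): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') throw new Error(`${key} is required`);
  return Number(raw);
}

function loadRuntimeConfig(env: Record<string, string | undefined>): RuntimeConfig {
  const workflowTimeoutMs = readNumber(env, 'WORKFLOW_TIMEOUT_MS');
  const similarityThreshold = readNumber(env, 'VECTOR_SIMILARITY_THRESHOLD');
  if (!Number.isInteger(workflowTimeoutMs) || workflowTimeoutMs <= 0) {
    throw new Error('WORKFLOW_TIMEOUT_MS must be a positive integer');
  }
  // Assumes the threshold is compared against similarity scores in [0, 1].
  if (!Number.isFinite(similarityThreshold) || similarityThreshold < 0 || similarityThreshold > 1) {
    throw new Error('VECTOR_SIMILARITY_THRESHOLD must be between 0 and 1');
  }
  return { workflowTimeoutMs, similarityThreshold };
}
```

In a Node.js service this would typically be called once as `loadRuntimeConfig(process.env)`, so a malformed value fails the deployment immediately rather than surfacing later as odd retrieval behavior.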
Quick Start Guide
- Initialize Infrastructure: Clone the repository, copy .env.example to .env, populate credentials, and run docker compose up -d. Verify services via docker compose ps.
- Connect LLM Provider: Navigate to localhost:3000, log in with default credentials, and configure the AI model settings. Set provider to OpenAI Compatible, base URL to http://ollama:11434/v1, and model to llama3.
- Ingest Knowledge Base: Upload documents, select QA Split processing mode, and trigger extraction. Monitor the ingestion queue for JSON validation errors.
- Build Routing Workflow: Use the visual node editor to create a pipeline: Intent Classifier → QA Retrieval → Confidence Check → Fallback LLM. Set similarity threshold to 0.75.
- Validate & Iterate: Test with 20+ domain-specific queries. Log low-confidence retrievals, refine prompts, and adjust vector index parameters. Rotate credentials and enable TLS before public exposure.