Agents perform better running `grep` or `find` operations directly against the filesystem for precise file location, reserving embeddings for semantic discovery.
3. Graph for Structural Truth: The code graph must be the source of truth for impact analysis and cross-language calls. It is built via program analysis, not LLM summarization, ensuring accuracy even for legacy languages like COBOL or PL/I where LLMs are weak.
4. Wiki for Velocity: Curated wikis remain valuable for fast onboarding and answering "what" questions, provided they are kept in sync via CI/CD pipelines.
Implementation: TypeScript Intelligence Router
The following TypeScript example demonstrates a router that directs queries to the appropriate provider. This replaces naive RAG pipelines with intent-aware grounding.
```typescript
// intelligence-router.ts
import { CodeGraphClient } from './providers/code-graph-client';
import { WikiProvider } from './providers/wiki-provider';
import { AgenticSearch } from './providers/agentic-search';

// Configuration shape consumed by the router's constructor.
export interface RouterConfig {
  graphEndpoint: string;
  wikiIndex: string;
  repoPath: string;
}

export interface QueryContext {
  question: string;
  targetRepo: string;
  intent: 'IMPACT_ANALYSIS' | 'ONBOARDING' | 'SEMANTIC_SEARCH' | 'LEGACY_TRACE';
}

export interface GroundingResult {
  provider: string;
  context: string | object;
  confidence: number;
  metadata: Record<string, unknown>;
}

export class CodeIntelligenceRouter {
  private graphClient: CodeGraphClient;
  private wikiProvider: WikiProvider;
  private agenticSearch: AgenticSearch;

  constructor(config: RouterConfig) {
    this.graphClient = new CodeGraphClient(config.graphEndpoint);
    this.wikiProvider = new WikiProvider(config.wikiIndex);
    this.agenticSearch = new AgenticSearch(config.repoPath);
  }

  async resolve(context: QueryContext): Promise<GroundingResult> {
    switch (context.intent) {
      case 'IMPACT_ANALYSIS':
      case 'LEGACY_TRACE':
        return this.resolveViaGraph(context);
      case 'ONBOARDING':
        return this.resolveViaWiki(context);
      case 'SEMANTIC_SEARCH':
        return this.resolveViaAgentic(context);
      default:
        throw new Error(`Unsupported intent: ${context.intent}`);
    }
  }

  private async resolveViaGraph(context: QueryContext): Promise<GroundingResult> {
    // Graph queries return structured data, not prose.
    const result = await this.graphClient.query({
      tool: 'impact_of_change',
      arguments: {
        entity: context.question, // e.g., "PaymentService.refund"
        change: 'signature',
        scope: ['workflows', 'business_rules', 'data_entities']
      }
    });
    return {
      provider: 'CodeGraph',
      context: result,
      confidence: 0.95, // High confidence: derived from program analysis
      metadata: { languages: result.metadata.languages, nodes_traversed: result.metadata.nodes }
    };
  }

  private async resolveViaWiki(context: QueryContext): Promise<GroundingResult> {
    // Wiki provides summarized context for descriptive queries.
    const pages = await this.wikiProvider.search(context.question);
    const combinedContext = pages.map(p => `## ${p.title}\n${p.content}`).join('\n\n');
    return {
      provider: 'CuratedWiki',
      context: combinedContext,
      confidence: 0.75, // Lower confidence: wiki content may be stale
      metadata: { page_count: pages.length, last_updated: pages[0]?.timestamp }
    };
  }

  private async resolveViaAgentic(context: QueryContext): Promise<GroundingResult> {
    // Agentic search uses filesystem tools for precise file retrieval.
    const files = await this.agenticSearch.findRelevantFiles(context.question);
    const snippets = await this.agenticSearch.readSnippets(files, context.question);
    return {
      provider: 'AgenticSearch',
      context: snippets,
      confidence: 0.85,
      metadata: { files_scanned: files.length, matches: snippets.length }
    };
  }
}
```
Rationale
- Typed Intents: By classifying queries, the router avoids using a wiki for impact analysis, which is a common failure mode. Impact analysis requires deterministic graph traversal, not probabilistic text matching.
- Structured Graph Output: The `CodeGraphClient` returns objects containing workflows, business rules, and data entities. This allows the agent to plan refactors programmatically rather than interpreting a markdown summary.
- Confidence Scoring: The router assigns confidence based on the provider's reliability for the specific task. Graph providers score higher for structural queries; wikis score lower due to drift risk. This enables the agent to request verification when confidence is low.
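The router assumes the intent has already been classified. As a rough illustration of how `QueryContext.intent` could be seeded, here is a hypothetical keyword heuristic; the pattern list is an assumption for this sketch, and a production router would more likely use an LLM or trained classifier:

```typescript
// intent-classifier.ts -- hypothetical keyword heuristic for seeding intents.
type Intent = 'IMPACT_ANALYSIS' | 'ONBOARDING' | 'SEMANTIC_SEARCH' | 'LEGACY_TRACE';

// Patterns are illustrative only; tune them per codebase and query log.
const INTENT_PATTERNS: Array<[Intent, RegExp]> = [
  ['IMPACT_ANALYSIS', /\b(impact|breaks?|downstream|callers?|refactor)\b/i],
  ['LEGACY_TRACE', /\b(cobol|pl\/i|mainframe|copybook)\b/i],
  ['ONBOARDING', /\b(overview|architecture|how does .* work|what is)\b/i],
];

export function classifyIntent(question: string): Intent {
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(question)) return intent;
  }
  // Default to filesystem-grounded search when no structural signal is found.
  return 'SEMANTIC_SEARCH';
}
```

A misclassified intent still fails safe here: the fallback routes to agentic search, which is transparent and verifiable, rather than to the wiki.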
Pitfall Guide
- The Embedding Blind Spot
- Explanation: Vector embeddings favor frequently accessed or well-documented code. Edge cases, error handlers, and rarely used utility functions often have poor embedding coverage, leading to retrieval misses.
- Fix: Use hybrid retrieval combining BM25 keyword search with embeddings. For critical paths, rely on the code graph which indexes all nodes regardless of frequency.
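A minimal sketch of the hybrid-retrieval fix, using reciprocal rank fusion (RRF) to merge the ranked lists produced by a BM25 keyword search and an embedding search. The scorers themselves are assumed to exist upstream; only the fusion step is shown:

```typescript
// hybrid-retrieval.ts -- reciprocal rank fusion of multiple rankings.
// Each input ranking is an array of document ids, best match first.
export function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // commonly used RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // Documents ranked highly in any list accumulate more score.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

RRF needs no score normalization across retrievers, which is why it is a common first choice for fusing lexically and semantically ranked results.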
- Wiki Drift and Staleness
- Explanation: Curated wikis degrade rapidly as the codebase evolves. If the ingestion pipeline is not triggered on every commit, the wiki contains outdated summaries, causing agents to hallucinate based on old logic.
- Fix: Integrate wiki updates into the CI/CD pipeline. Trigger re-ingestion on PR merge. Alternatively, use the code graph as the authoritative source and generate wiki pages dynamically from graph data.
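When dynamic generation is not an option, a staleness guard can at least downgrade wiki confidence as pages age. A sketch, assuming a `maxAge` in seconds like the one in the configuration template (the decay curve is an assumption, not a prescribed formula):

```typescript
// wiki-freshness.ts -- downgrade wiki confidence for pages older than maxAge.
export function wikiConfidence(
  lastUpdatedMs: number,   // page timestamp (epoch ms)
  nowMs: number,           // current time (epoch ms)
  maxAgeSeconds: number,   // freshness budget, e.g. 86400 for 24h
  baseConfidence = 0.75,   // matches the router's wiki baseline
): number {
  const ageSeconds = (nowMs - lastUpdatedMs) / 1000;
  if (ageSeconds <= maxAgeSeconds) return baseConfidence;
  // Inverse-linear decay past the budget, floored so the page is
  // never treated as totally worthless, only as needing verification.
  const staleFactor = ageSeconds / maxAgeSeconds;
  return Math.max(0.1, baseConfidence / staleFactor);
}
```

Feeding this into `GroundingResult.confidence` lets the agent automatically fall below the verification threshold when the wiki has drifted.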
- Legacy Language Neglect
- Explanation: LLMs are trained predominantly on modern languages. Summarization and embedding quality drop sharply for COBOL, PL/I, and mainframe dialects. Agents may ignore or misinterpret legacy code.
- Fix: Deploy a code graph engine that supports program analysis for 40+ languages, including legacy dialects. Ensure the graph treats legacy nodes with the same structural fidelity as modern code.
- Chunking Artifacts
- Explanation: Naive chunking (e.g., splitting by file or fixed token count) breaks function boundaries and control flow. This destroys semantic coherence, making retrieval useless for understanding logic.
- Fix: Use semantic chunking based on AST boundaries. Chunk at the function or class level. Better yet, use the code graph where nodes represent complete logical units.
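A deliberately simplified sketch of boundary-aware chunking: it approximates declaration boundaries by tracking brace depth, which is naive (it ignores braces inside strings and comments); a real implementation should walk the language's actual AST:

```typescript
// semantic-chunker.ts -- chunk source at top-level declaration boundaries.
// Brace counting stands in for a proper parser in this sketch.
export function chunkByDeclaration(source: string): string[] {
  const chunks: string[] = [];
  let depth = 0;
  let current: string[] = [];
  for (const line of source.split('\n')) {
    current.push(line);
    for (const ch of line) {
      if (ch === '{') depth++;
      else if (ch === '}') depth--;
    }
    // Returning to depth 0 on a closing brace ends a top-level unit,
    // so functions and classes are never split mid-body.
    if (depth === 0 && line.includes('}')) {
      chunks.push(current.join('\n'));
      current = [];
    }
  }
  if (current.join('').trim()) chunks.push(current.join('\n'));
  return chunks;
}
```

The point of the sketch is the invariant, not the parser: a chunk boundary must coincide with a complete logical unit, exactly what fixed-token splitting cannot guarantee.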
- Confusing Summary with Truth
- Explanation: Agents may treat LLM-generated summaries as factual ground truth. If a summary omits a conditional branch or misinterprets a business rule, the agent will propagate the error.
- Fix: Implement verification steps. For critical operations, require the agent to cross-reference summaries with graph data or raw source snippets. Use the graph to validate business rule traceability.
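One way to sketch the cross-referencing step: diff the callers an LLM summary claims against the callers the code graph actually records, and flag both omissions and unsupported claims. The function and field names here are illustrative, not part of any particular graph API:

```typescript
// summary-verifier.ts -- cross-check an LLM summary against graph edges.
export interface VerificationReport {
  missingFromSummary: string[]; // graph edges the summary omitted
  unsupportedClaims: string[];  // summary claims with no graph edge
  verified: boolean;
}

export function verifySummaryCallers(
  summaryCallers: string[], // callers extracted from the LLM summary
  graphCallers: string[],   // callers reported by program analysis
): VerificationReport {
  const summarySet = new Set(summaryCallers);
  const graphSet = new Set(graphCallers);
  const missingFromSummary = graphCallers.filter(c => !summarySet.has(c));
  const unsupportedClaims = summaryCallers.filter(c => !graphSet.has(c));
  return {
    missingFromSummary,
    unsupportedClaims,
    verified: missingFromSummary.length === 0 && unsupportedClaims.length === 0,
  };
}
```

Either non-empty list is a signal to re-ground: an omission means the summary is incomplete; an unsupported claim means it may be hallucinated.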
- Agentic Search Overhead
- Explanation: While agentic search (grep/find) is transparent, running it repeatedly on massive repositories can be slow and consume excessive context.
- Fix: Pre-compute file indexes or use the code graph for fast traversal. Reserve agentic search for interactive exploration where the agent needs to narrow down candidates dynamically.
- Context Window Illusion
- Explanation: Developers assume that increasing the context window solves retrieval issues. However, models still struggle to locate relevant information in ultra-long contexts, and costs scale linearly.
- Fix: Treat context windows as a cache, not a database. Use precise retrieval to inject only the necessary context. Optimize for signal-to-noise ratio rather than raw token count.
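The "cache, not database" stance can be made concrete with a greedy budget packer that admits the highest-relevance snippets until the token budget is spent. The 4-characters-per-token estimate is a rough assumption; a real system should use the target model's tokenizer:

```typescript
// context-budget.ts -- greedily pack high-relevance snippets into a budget.
export interface Snippet {
  id: string;
  text: string;
  relevance: number; // retrieval score, higher is better
}

export function packContext(snippets: Snippet[], tokenBudget: number): Snippet[] {
  // Crude token estimate: ~4 characters per token (assumption).
  const approxTokens = (s: Snippet) => Math.ceil(s.text.length / 4);
  const picked: Snippet[] = [];
  let used = 0;
  // Highest relevance first; skip anything that would bust the budget.
  for (const s of [...snippets].sort((a, b) => b.relevance - a.relevance)) {
    const cost = approxTokens(s);
    if (used + cost <= tokenBudget) {
      picked.push(s);
      used += cost;
    }
  }
  return picked;
}
```

The greedy skip is the signal-to-noise lever: a long, mildly relevant snippet is dropped in favor of several short, highly relevant ones.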
Production Bundle
Action Checklist
- Build the code graph with program analysis; confirm coverage for every language in the repository, including legacy dialects.
- Classify query intents so structural questions route to the graph and descriptive questions to the wiki.
- Wire wiki re-ingestion into CI/CD so summaries refresh on merge.
- Set a confidence threshold and require verification against graph data or raw source when results fall below it.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / Small Repo (<50k LOC) | Curated Wiki + Vector | Low overhead, fast setup, sufficient for descriptive queries. | Low |
| Enterprise / Large Repo (>1M LOC) | Code Graph + Agentic Search | Precision required for impact analysis; vector search fails at scale. | High |
| Legacy Modernization | Code Graph | Essential for cross-language traceability and understanding legacy logic. | Medium |
| High-Frequency Refactoring | Code Graph | Deterministic impact analysis prevents regressions during rapid changes. | Medium |
| Developer Onboarding | Curated Wiki | Provides high-level summaries and architecture docs for new hires. | Low |
Configuration Template
```typescript
// intelligence-stack.config.ts
export const stackConfig = {
  router: {
    defaultIntent: 'SEMANTIC_SEARCH',
    confidenceThreshold: 0.80,
    fallbackProvider: 'AgenticSearch'
  },
  providers: {
    graph: {
      endpoint: 'https://graph-engine.internal/api',
      supportedLanguages: ['java', 'typescript', 'cobol', 'pl/i'],
      updateInterval: 'ON_COMMIT',
      cacheTTL: 3600
    },
    wiki: {
      index: 's3://knowledge-base/wiki-index',
      ingestionPipeline: 'ci/wiki-ingest',
      maxAge: 86400, // 24 hours
      autoRefresh: true
    },
    agentic: {
      repoPath: '/workspace/repo',
      maxFiles: 50,
      timeout: 5000
    }
  },
  security: {
    mcpAuth: 'SERVICE_ACCOUNT',
    dataEncryption: 'AES-256',
    auditLog: true
  }
};
```
Quick Start Guide
- Initialize Graph Analysis: Run the code graph engine against your repository. Ensure it parses all languages and generates the knowledge graph.

  ```bash
  graph-engine analyze --repo ./my-service --output ./graph-data
  ```
- Configure Router: Set up the intelligence router with the configuration template. Point it to the graph endpoint and wiki index.
- Connect Agent: Link your coding agent (e.g., Claude Code, Cursor) to the router via MCP or API. Ensure the agent can call `resolve` with query intents.
- Validate Impact Query: Test the system with a structural query. Verify the graph returns accurate downstream dependencies.

  ```bash
  agent query --intent IMPACT_ANALYSIS --question "UserService.createAccount"
  ```
- Enable CI Integration: Add wiki and graph update steps to your CI pipeline. Verify that knowledge updates automatically on merge.