AI Cited a URL That Didn't Contain the Claim. I Built the Tooling to Measure How Often
Current Situation Analysis
The Silent Failure of Grounded Generation
In production Retrieval-Augmented Generation (RAG) systems, the industry has largely solved the problem of "hallucinated facts" by grounding models in external knowledge bases. However, a more insidious failure mode has emerged: Citation Hallucination. This occurs when a model generates a response that appears factually grounded (complete with valid URLs and structured references) but the link between the claim and the source is broken.
The user experience is deceptive. The response looks authoritative. The links are clickable. The domain names are correct. Yet, the cited document does not support the specific assertion made in the text. This is not a simple "model made up a fact" error; it is a structural failure in the attribution pipeline.
Why This Problem is Overlooked
Most engineering teams treat citation verification as a binary check: "Does the URL exist?" or "Is the URL in the retrieved set?"
This approach misses the nuance of modern LLM behavior. Models are optimized for fluency and synthesis. When generating text, they often compress information from multiple sources or substitute a "canonical" URL from their training data in place of the actual retrieved document. Because the output is semantically coherent and the links are valid, these errors slip past standard automated tests and are rarely caught until they reach end-users or compliance audits.
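The gap is easy to see in code. A minimal sketch of the binary check most pipelines run today (function and variable names are illustrative, not from a real library):

```typescript
// Naive "binary" citation check: it only asks whether the cited URL
// appeared in the retrieved set.
function naiveCitationCheck(citedUrl: string, retrievedUrls: string[]): boolean {
  return retrievedUrls.includes(citedUrl);
}

const retrieved = [
  "https://docs.example.com/auth",
  "https://forum.example.com/post/42",
];

// A misattributed citation sails through: the URL is real and was
// retrieved, but nothing verifies that it actually supports the claim.
console.log(naiveCitationCheck("https://docs.example.com/auth", retrieved)); // true
```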
Data-Backed Evidence
Analysis of grounded queries across major search-tool APIs reveals that citation errors are not rare edge cases; they are systemic. In controlled audits of factual and product-oriented queries, citation failures account for a significant percentage of "correct-looking" responses.
Crucially, these failures are not monolithic. They fall into distinct categories, each with a different root cause and requiring a different mitigation strategy. Treating them all as "hallucinations" prevents engineers from applying the correct fix.
WOW Moment: Key Findings
The critical insight for engineering teams is that not all citation errors are equal. A fabricated URL is a hard failure. A URL substitution might be acceptable depending on the use case. Anchor-text drift is the hardest to detect but often the most damaging in regulated industries.
The following matrix breaks down the four distinct failure modes observed in production environments.
| Failure Mode | Mechanism | Detection Difficulty | User Impact |
|---|---|---|---|
| Fabricated URL | Model generates a plausible URL pattern not present in the retrieved context. | Low | High (Broken link / 404) |
| Retrieve-then-Misquote | Model cites a real URL, but the claim is supported by a different source or synthesis. | Medium | High (Misinformation) |
| URL Substitution | Model cites a "canonical" URL from training data instead of the actual retrieved source. | Medium | Medium (Broken audit trail) |
| Anchor-Text Drift | Model cites the correct URL, but the phrasing subtly alters the meaning of the source. | High | High (Compliance risk) |
Why this matters: By categorizing errors, teams can prioritize fixes. Blocking fabricated URLs is a quick win. Fixing anchor-text drift requires sophisticated semantic verification. Understanding the distribution of these errors allows for better resource allocation in model evaluation.
Core Solution
Architecture: The Citation Faithfulness Layer
To address these failures, we introduce a Citation Faithfulness Layer that sits between the LLM output and the user interface. This layer performs automated verification before the response is rendered.
The architecture consists of three stages:
- Extraction: Parse the LLM response to identify claims and their associated citations.
- Verification: Compare claims against the retrieved context using a tiered verification strategy.
- Enforcement: Apply policy-based actions (block, warn, or pass) based on the verification results.
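The three stages can be sketched as a thin orchestration function. The `extract`, `verify`, and `decide` callbacks here are hypothetical stand-ins for the components built out in the rest of this section:

```typescript
// Hedged sketch of the three-stage flow: extract citations, verify
// each, and let the strictest enforcement action win.
interface Citation {
  url: string;
  claim: string;
}
type Action = 'BLOCK' | 'WARN' | 'PASS';

function enforceFaithfulness(
  response: string,
  extract: (text: string) => Citation[],   // Stage 1: Extraction
  verify: (c: Citation) => boolean,        // Stage 2: Verification (true = supported)
  decide: (supported: boolean) => Action   // Stage 3: Enforcement
): Action {
  let worst: Action = 'PASS';
  for (const citation of extract(response)) {
    const action = decide(verify(citation));
    if (action === 'BLOCK') return 'BLOCK'; // strictest action short-circuits
    if (action === 'WARN') worst = 'WARN';
  }
  return worst;
}

// Toy wiring: every citation "verifies" except ones pointing at /bad.
const action = enforceFaithfulness(
  "claim [1](https://example.com/bad)",
  () => [{ url: "https://example.com/bad", claim: "claim" }],
  (c) => !c.url.endsWith("/bad"),
  (ok) => (ok ? 'PASS' : 'BLOCK')
);
console.log(action); // "BLOCK"
```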
Implementation Strategy
We will implement this using TypeScript. The solution focuses on modularity, allowing different verification strategies to be plugged in based on the required strictness.
1. Data Models
Define the structures for claims, citations, and verification results.
```typescript
// models.ts
export interface Citation {
  url: string;
  claim: string;
}

export interface VerificationResult {
  citation: Citation;
  status: 'VALID' | 'FABRICATED' | 'MISQUOTE' | 'SUBSTITUTION' | 'DRIFT';
  details: string;
}

export interface FaithfulnessReport {
  totalCitations: number;
  validCitations: number;
  errors: VerificationResult[];
  summary: {
    fabricated: number;
    misquote: number;
    substitution: number;
    drift: number;
  };
}
```
2. The Verification Engine
The core logic implements the tiered verification strategy. It first checks for fabricated URLs, then validates the claim against the source text.
```typescript
// verifier.ts
import { Citation, VerificationResult, FaithfulnessReport } from './models';

export class CitationVerifier {
  private retrievedUrls: Set<string>;
  private contextMap: Map<string, string>;

  constructor(retrievedUrls: string[], contextMap: Map<string, string>) {
    this.retrievedUrls = new Set(retrievedUrls);
    this.contextMap = contextMap;
  }

  public verify(citations: Citation[]): FaithfulnessReport {
    const results: VerificationResult[] = [];
    const summary = { fabricated: 0, misquote: 0, substitution: 0, drift: 0 };

    for (const citation of citations) {
      const result = this.verifySingle(citation);
      results.push(result);
      if (result.status !== 'VALID') {
        summary[result.status.toLowerCase() as keyof typeof summary]++;
      }
    }

    const errors = results.filter(r => r.status !== 'VALID');
    return {
      totalCitations: citations.length,
      validCitations: citations.length - errors.length,
      errors,
      summary
    };
  }

  private verifySingle(citation: Citation): VerificationResult {
    // Check 1: Fabricated URL
    if (!this.retrievedUrls.has(citation.url)) {
      return {
        citation,
        status: 'FABRICATED',
        details: 'URL not found in retrieved context.'
      };
    }

    const sourceText = this.contextMap.get(citation.url) || '';

    // Check 2: Claim support (semantic)
    if (!this.checkClaimSupport(sourceText, citation.claim)) {
      // Check 3: Substitution (is the claim supported by another URL?)
      if (this.checkSubstitution(citation.url, citation.claim)) {
        return {
          citation,
          status: 'SUBSTITUTION',
          details: 'Claim supported by a different retrieved URL.'
        };
      }
      return {
        citation,
        status: 'MISQUOTE',
        details: 'Claim not supported by cited URL.'
      };
    }

    // Check 4: Anchor-text drift (semantic nuance)
    if (this.checkDrift(sourceText, citation.claim)) {
      return {
        citation,
        status: 'DRIFT',
        details: 'Claim supported but phrasing drifts from source.'
      };
    }

    return {
      citation,
      status: 'VALID',
      details: 'Citation verified successfully.'
    };
  }

  private checkClaimSupport(text: string, claim: string): boolean {
    // In production, use an embedding model or LLM-as-a-judge
    // to determine whether the text supports the claim.
    return text.includes(claim) || this.semanticMatch(text, claim);
  }

  private checkSubstitution(citedUrl: string, claim: string): boolean {
    // Check whether the claim is supported by any OTHER retrieved URL.
    for (const url of this.retrievedUrls) {
      if (url !== citedUrl) {
        const text = this.contextMap.get(url) || '';
        if (this.checkClaimSupport(text, claim)) {
          return true;
        }
      }
    }
    return false;
  }

  private checkDrift(text: string, claim: string): boolean {
    // Detect subtle semantic shifts (e.g., "supports OAuth" vs "OAuth-compliant").
    // Requires fine-grained semantic analysis.
    return this.detectSemanticShift(text, claim);
  }

  // Mock semantic functions: replace with real implementations.
  private semanticMatch(text: string, claim: string): boolean {
    return false; // e.g., embedding similarity above a threshold
  }

  private detectSemanticShift(text: string, claim: string): boolean {
    return false; // e.g., entailment check between source and claim
  }
}
```
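For quick experimentation, the same tiered logic can be condensed into one self-contained function. This is a sketch, not the production class: the `supports` callback defaults to a literal substring test standing in for real semantic matching, and drift detection is omitted for brevity.

```typescript
// Condensed tiered check mirroring CitationVerifier.
type TierStatus = 'VALID' | 'FABRICATED' | 'MISQUOTE' | 'SUBSTITUTION';

function classifyCitation(
  url: string,
  claim: string,
  contextMap: Map<string, string>,
  supports: (text: string, claim: string) => boolean = (t, c) => t.includes(c)
): TierStatus {
  if (!contextMap.has(url)) return 'FABRICATED';              // Tier 1: URL not retrieved
  if (supports(contextMap.get(url)!, claim)) return 'VALID';  // Tier 2: cited source supports claim
  for (const [other, text] of contextMap) {                   // Tier 3: supported by another source?
    if (other !== url && supports(text, claim)) return 'SUBSTITUTION';
  }
  return 'MISQUOTE';
}

const ctx = new Map<string, string>([
  ["https://example.com/a", "The service supports OAuth 2.0 login."],
  ["https://example.com/b", "Rate limits are 100 requests per minute."],
]);

console.log(classifyCitation("https://example.com/a", "supports OAuth 2.0", ctx));                       // "VALID"
console.log(classifyCitation("https://example.com/a", "Rate limits are 100 requests per minute.", ctx)); // "SUBSTITUTION"
```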
3. Enforcement Policy
Define how the system should react to different error types.
```typescript
// policy.ts
import { VerificationResult } from './models';

export type EnforcementAction = 'BLOCK' | 'WARN' | 'PASS';

export class EnforcementPolicy {
  public determineAction(result: VerificationResult): EnforcementAction {
    switch (result.status) {
      case 'FABRICATED':
        return 'BLOCK'; // Hard failure
      case 'MISQUOTE':
        return 'BLOCK'; // High risk of misinformation
      case 'SUBSTITUTION':
        return 'WARN';  // Acceptable in some contexts, but flagged
      case 'DRIFT':
        return 'WARN';  // Requires human review for compliance
      case 'VALID':
        return 'PASS';
      default:
        return 'WARN';
    }
  }
}
```
Architecture Decisions and Rationale
- Modular Verification: By separating the verification logic from the enforcement policy, the system can be adapted to different use cases. A customer support bot might tolerate substitutions, while a legal research tool must block them.
- Tiered Checking: The verification process is ordered by cost and complexity. Fabricated URLs are checked first (cheap, deterministic). Semantic checks are performed only if the URL is valid, optimizing for performance.
- Context Map: Storing retrieved content in a map allows for efficient lookups during verification, avoiding repeated network requests.
- Semantic Fallbacks: The implementation includes placeholders for semantic matching. In production, this should be backed by a dedicated embedding model or a smaller LLM fine-tuned for fact-checking.
Pitfall Guide
1. Ignoring URL Substitution
Explanation: Teams often focus only on fabricated URLs and misquotes, assuming that if the URL is real and the claim is true, the citation is valid. However, URL substitution breaks the audit trail. If the model cites a canonical documentation page instead of the specific forum post that contained the answer, the user cannot verify the exact source of the information. Fix: Implement substitution detection by checking if the claim is supported by any other retrieved URL. Flag these as warnings or block them in regulated contexts.
2. Over-Reliance on Exact String Matching
Explanation: Using simple string matching (`text.includes(claim)`) to verify citations is insufficient. LLMs paraphrase content, so exact matches will miss many valid citations, producing false positives in misquote detection.
Fix: Use semantic similarity checks (embeddings) or an LLM-as-a-judge approach to determine if the source text supports the claim, even if the wording differs.
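One possible shape for such a check is cosine similarity over embeddings with a tunable threshold. The sketch below assumes that approach; the bag-of-words `embed` is a toy stand-in so the example runs without a model, and in production you would swap in a real embedding API.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Toy bag-of-words "embedding" over a fixed vocabulary, standing in
// for a real embedding model.
const VOCAB = ["product", "supports", "oauth", "login", "rate", "limits"];
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map(v => words.filter(w => w === v).length);
}

// A claim counts as supported when similarity clears the threshold.
function semanticSupports(sourceText: string, claim: string, threshold = 0.75): boolean {
  return cosine(embed(sourceText), embed(claim)) >= threshold;
}

console.log(semanticSupports("The product supports OAuth login.", "The product supports OAuth")); // true (cosine ≈ 0.87)
```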
3. Neglecting Anchor-Text Drift
Explanation: Anchor-text drift is subtle. A model might change "The product supports OAuth" to "The product is OAuth-compliant." While similar, the latter is a stronger claim that might not be true. Automated tools often miss this because the URL is valid and the topic matches. Fix: Implement fine-grained semantic analysis to detect shifts in meaning. This may require a specialized model or human-in-the-loop review for critical claims.
4. Treating All Errors Equally
Explanation: Applying the same enforcement policy to all citation errors can lead to poor user experience. Blocking a response for a URL substitution might be unnecessary if the substituted URL is equally authoritative. Fix: Use a tiered enforcement policy. Block hard failures (fabricated URLs, misquotes) but warn on softer failures (substitutions, drift) where appropriate.
5. Failing to Handle Synthesis Claims
Explanation: When a model synthesizes information from multiple sources, it may cite only one of them. This is not necessarily a misquote, but it is incomplete attribution. Standard verification might flag this as an error. Fix: Allow for multi-citation claims. If a claim is supported by the union of multiple retrieved documents, ensure the model cites all relevant sources or explicitly indicates synthesis.
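One way to operationalize this is to accept a synthesized claim when every sentence of it is supported by at least one of its cited sources. A sketch, again with a literal substring test standing in for semantic support:

```typescript
// Hedged sketch: a synthesized claim is attributed if each of its
// sentences is supported by at least one cited source.
function synthesisSupported(
  claim: string,
  citedTexts: string[],
  supports: (text: string, part: string) => boolean = (t, p) => t.includes(p)
): boolean {
  // Split on sentence boundaries (whitespace after ., !, or ?).
  const parts = claim.split(/(?<=[.!?])\s+/).filter(p => p.length > 0);
  return parts.every(part => citedTexts.some(text => supports(text, part)));
}

const sources = [
  "OAuth 2.0 login is supported.",
  "Rate limits are 100 requests per minute.",
];

console.log(synthesisSupported(
  "OAuth 2.0 login is supported. Rate limits are 100 requests per minute.",
  sources
)); // true: each sentence is backed by one of the two sources
```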
6. Performance Bottlenecks
Explanation: Running semantic verification on every citation can introduce significant latency, especially for long responses with many citations. Fix: Optimize the verification pipeline. Use caching for repeated checks, parallelize verification where possible, and consider sampling citations for large responses if full verification is too costly.
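Caching can be as simple as memoizing the support check keyed on the (URL, claim) pair. A sketch (the key scheme is an assumption; a production system might hash normalized text instead):

```typescript
// Memoized wrapper around an expensive support check. The cache key
// joins URL and claim with a NUL separator to avoid collisions.
function memoizeSupport(
  check: (text: string, claim: string) => boolean
): (url: string, text: string, claim: string) => boolean {
  const cache = new Map<string, boolean>();
  return (url, text, claim) => {
    const key = `${url}\u0000${claim}`;
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const result = check(text, claim);
    cache.set(key, result);
    return result;
  };
}

// Repeated verification of the same (url, claim) pair runs the
// underlying check only once.
let calls = 0;
const cached = memoizeSupport((t, c) => { calls++; return t.includes(c); });
cached("https://example.com/a", "OAuth is supported.", "OAuth");
cached("https://example.com/a", "OAuth is supported.", "OAuth");
console.log(calls); // 1
```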
7. Lack of Human Review Workflow
Explanation: Automated verification is not perfect. Edge cases and nuanced errors will slip through. Without a mechanism for human review, these errors can persist. Fix: Integrate a human-in-the-loop workflow for flagged citations. Provide reviewers with the claim, the cited text, and the verification result to facilitate quick decisions.
Production Bundle
Action Checklist
- Define Citation Schema: Establish a consistent format for claims and citations in your LLM prompts and response parsing.
- Implement URL Extraction: Build a parser to reliably extract URLs and associated claims from the model output.
- Set Up Context Map: Store retrieved documents in a fast-access structure (e.g., in-memory map or cache) for verification.
- Deploy Verification Engine: Integrate the `CitationVerifier` into your response pipeline.
- Configure Enforcement Policy: Define rules for blocking, warning, or passing based on your risk tolerance.
- Add Semantic Verification: Implement embedding-based or LLM-based claim support checks.
- Monitor Metrics: Track citation error rates by category to identify trends and model regressions.
- Establish Review Workflow: Create a process for human review of flagged citations.
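As a starting point for the extraction step in the checklist above, a regex-based parser for inline markdown citations of the form `claim. [label](url)` might look like this. The citation format is an assumption about your prompt template; adapt the pattern to whatever structure you instruct the model to emit.

```typescript
// Hypothetical extractor: pairs each sentence with the markdown link
// that immediately follows it.
function extractCitations(response: string): { url: string; claim: string }[] {
  const pattern = /([^.!?]*[.!?]?)\s*\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/g;
  const out: { url: string; claim: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(response)) !== null) {
    out.push({ claim: m[1].trim(), url: m[2] });
  }
  return out;
}

const sample =
  "OAuth is supported. [docs](https://example.com/a) Limits apply. [faq](https://example.com/b)";

// Yields one {claim, url} pair per citation in order of appearance.
console.log(extractCitations(sample));
```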
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Customer Support Bot | Block Fabricated/Misquote; Warn on Substitution/Drift | High volume, lower risk. Focus on preventing broken links and clear misinformation. | Low |
| Legal Research Tool | Block All Errors; Require Multi-Citation | Zero tolerance for errors. Audit trail is critical. | High |
| Internal Knowledge Base | Warn on All Errors; Allow Override | Balance accuracy with usability. Users can verify sources manually. | Medium |
| High-Stakes Financial Data | Block All Errors; Human Review Required | Compliance and regulatory requirements demand strict verification. | Very High |
Configuration Template
```yaml
# citation-verification-config.yaml
verification:
  strategy: "tiered"
  semantic_threshold: 0.75
  drift_sensitivity: "high"

enforcement:
  policies:
    - error_type: "FABRICATED"
      action: "BLOCK"
    - error_type: "MISQUOTE"
      action: "BLOCK"
    - error_type: "SUBSTITUTION"
      action: "WARN"
    - error_type: "DRIFT"
      action: "WARN"

monitoring:
  metrics:
    - "citation_error_rate"
    - "error_distribution"
    - "verification_latency"
  alerts:
    - threshold: 0.05
      metric: "citation_error_rate"
      action: "notify_engineering"
```
Quick Start Guide
- Install Dependencies: Ensure you have the necessary libraries for text processing and semantic analysis (e.g., `@langchain/core`, `@tensorflow-models/universal-sentence-encoder`).
- Define Models: Copy the `models.ts` definitions into your project.
- Implement Verifier: Use the `verifier.ts` code as a starting point. Replace the mock semantic functions with actual implementations.
- Configure Policy: Set up your enforcement rules based on your use case.
- Integrate: Add the verification step to your response pipeline. Test with a sample of queries to validate the logic.
- Deploy: Roll out the verification layer and monitor metrics. Adjust thresholds and policies as needed.
