eruse, vague absolutes
- Syntactic: Passive voice, formulaic binary contrasts, rhetorical setups
- Semantic: Lack of specificity, narrator distance, hand-holding transitions
- Prosodic: Metronomic sentence length, em-dash overuse, manufactured quotables
2. System Prompt Architecture
Constraints belong in the system prompt, not appended to user prompts. Models are generally trained to treat system instructions as higher-priority guidance. Because the system prompt is sent with every request, it also applies to every turn of a multi-turn conversation. Structure the constraint payload as explicit negative examples paired with corrective directives.
3. Deterministic Scoring Loop
A programmatic rubric scores generated output on five dimensions (directness, rhythm, trust, authenticity, density), each from 1 to 10. Outputs totaling below 35/50 trigger automatic revision or routing to human review. This closes the loop as a quality gate.
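The loop can be sketched as follows. The sketch assumes the generator and scorer are supplied as functions, for instance a wrapper around `generateConstrainedProse` and a real rubric scorer. `runQualityGate`, `GateResult`, the feedback wording, and the three-attempt limit are hypothetical choices made for illustration.

```typescript
// Hypothetical types for this sketch; not part of any SDK.
interface ScoredDraft {
  total: number; // sum of the five 1-10 dimension scores (max 50)
}

interface GateResult {
  status: 'accepted' | 'human_review';
  text: string;
  total: number;
  attempts: number;
}

// Generate a draft, score it, and regenerate with feedback until the total
// reaches the floor or the attempt budget runs out; otherwise escalate.
async function runQualityGate(
  generate: (feedback?: string) => Promise<string>,
  score: (text: string) => ScoredDraft,
  minTotal = 35,
  maxAttempts = 3
): Promise<GateResult> {
  let text = await generate();
  let total = score(text).total;
  let attempts = 1;

  while (total < minTotal && attempts < maxAttempts) {
    // Pass the failing score back so the next pass knows it must improve.
    text = await generate(
      `The previous draft scored ${total}/50. Revise it to reach at least ${minTotal}/50.`
    );
    total = score(text).total;
    attempts += 1;
  }

  const status = total >= minTotal ? 'accepted' : 'human_review';
  return { status, text, total, attempts };
}
```

Because the functions are passed in, the gate stays independent of any particular SDK, and tests can substitute stubs for the API call.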
Implementation Code (TypeScript)
```typescript
import Anthropic from '@anthropic-ai/sdk';

// One constraint category: the patterns to avoid and the directive that replaces them.
interface WritingConstraint {
  category: 'lexical' | 'syntactic' | 'semantic' | 'prosodic';
  pattern: string;
  correction: string;
}

// Five rubric dimensions scored 1-10 each, plus their sum (maximum 50).
interface RubricScore {
  directness: number;
  rhythm: number;
  trust: number;
  authenticity: number;
  density: number;
  total: number;
}

const CONSTRAINTS: WritingConstraint[] = [
  {
    category: 'lexical',
    pattern: 'throat-clearing openers, emphasis crutches, all adverbs',
    correction: 'Begin with the core assertion. Remove all adverbs. Replace vague intensifiers with concrete data.'
  },
  {
    category: 'syntactic',
    pattern: 'binary contrasts, dramatic fragments, rhetorical setups, false agency',
    correction: 'State the conclusion directly. Name the specific actor performing the action. Eliminate standalone dramatic sentences.'
  },
  {
    category: 'semantic',
    pattern: 'lazy absolutes, narrator-from-a-distance voice, meta-joiners',
    correction: 'Anchor claims to specific events or datasets. Address the reader directly. Remove transitional filler that references the text itself.'
  },
  {
    category: 'prosodic',
    pattern: 'uniform sentence length, em-dash emphasis, triple-item lists',
    correction: 'Vary clause complexity. Replace em-dashes with standard punctuation. Use two-item comparisons instead of three-item sequences.'
  }
];

// Render each constraint as an "Avoid / Apply" pair and wrap the set in the system prompt.
function buildSystemPrompt(constraints: WritingConstraint[]): string {
  const rules = constraints
    .map(c => `[${c.category.toUpperCase()}] Avoid: ${c.pattern}\nApply: ${c.correction}`)
    .join('\n\n');
  return `You are a technical writing engine. Apply the following constraints to all generated text:\n\n${rules}\n\nOutput must pass a 5-dimension quality rubric. If any dimension scores below 7/10, revise before returning.`;
}

async function generateConstrainedProse(
  client: Anthropic,
  userPrompt: string,
  model: string = 'claude-sonnet-4-6'
): Promise<string> {
  const systemPrompt = buildSystemPrompt(CONSTRAINTS);
  const response = await client.messages.create({
    model,
    system: systemPrompt, // top-level field, not an entry in `messages`
    max_tokens: 2048,
    temperature: 0.7,
    messages: [{ role: 'user', content: userPrompt }]
  });
  // The response holds a list of content blocks; concatenate the text blocks.
  return response.content
    .map(block => (block.type === 'text' ? block.text : ''))
    .join('');
}

// Placeholder only: returns random integers in [6, 10], so it is NOT deterministic.
// A production version would use NLP heuristics or a secondary LLM judge.
function evaluateRubric(_text: string): RubricScore {
  const placeholderScore = () => Math.floor(Math.random() * 5) + 6;
  const scores = {
    directness: placeholderScore(),
    rhythm: placeholderScore(),
    trust: placeholderScore(),
    authenticity: placeholderScore(),
    density: placeholderScore()
  };
  const total = Object.values(scores).reduce((a, b) => a + b, 0);
  return { ...scores, total };
}
```
Architecture Decisions and Rationale
Why System Prompts Over Fine-Tuning?
Fine-tuning requires curated datasets, GPU compute, and model versioning. Constraint injection through system prompts can achieve comparable stylistic control for many writing tasks, with no training overhead. Prompts can be updated instantly, A/B tested, and rolled back without redeploying model artifacts.
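To make the update, A/B test, and rollback claim concrete, here is a minimal sketch. It treats each revision of the system prompt as a versioned record and assigns users to a variant with a stable string hash. `PromptVersion`, `PromptExperiment`, `selectPrompt`, `rollback`, and the djb2 hash are assumptions made for illustration; any stable hash and any config store would work.

```typescript
// Hypothetical prompt registry entry: one revision of the constraint payload.
interface PromptVersion {
  version: string;
  systemPrompt: string;
}

interface PromptExperiment {
  control: PromptVersion;
  candidate: PromptVersion;
  candidateShare: number; // fraction of users (0-1) routed to the candidate
}

// Stable 32-bit string hash (djb2) so a given user always lands in the same bucket.
function hashString(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 33 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function selectPrompt(experiment: PromptExperiment, userId: string): PromptVersion {
  const bucket = (hashString(userId) % 10000) / 10000; // in [0, 1)
  return bucket < experiment.candidateShare ? experiment.candidate : experiment.control;
}

// Rolling back is a data change: send all traffic to the control prompt.
function rollback(experiment: PromptExperiment): PromptExperiment {
  return { ...experiment, candidateShare: 0 };
}
```

Assignment depends only on the user ID, so a user keeps the same variant across requests. Setting `candidateShare` to 0 reverts all traffic without touching a model artifact.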
Why Markdown Constraints?
Markdown is more compact than formats such as XML or JSON, and models generally follow Markdown-structured instructions well. The API itself passes the prompt through as plain text. Structured text with explicit negative/positive pairs leaves less room for the model to misread a rule. The format stays readable for editorial teams and parseable for CI pipelines.
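Here is a minimal sketch of rendering the `WritingConstraint` records from the implementation as Markdown. It uses one heading per category and turns each avoided pattern and the corrective directive into explicit list items. `constraintsToMarkdown` is a hypothetical helper. The interface is repeated so that the block stands alone.

```typescript
// Same shape as WritingConstraint in the implementation code.
interface WritingConstraint {
  category: 'lexical' | 'syntactic' | 'semantic' | 'prosodic';
  pattern: string;    // comma-separated patterns to avoid
  correction: string; // directive to apply instead
}

// One "## category" section per constraint: an "Avoid" bullet per pattern, then one "Apply" bullet.
function constraintsToMarkdown(constraints: WritingConstraint[]): string {
  return constraints
    .map(c => {
      const avoid = c.pattern.split(',').map(p => `- Avoid: ${p.trim()}`);
      return [`## ${c.category}`, ...avoid, `- Apply: ${c.correction}`].join('\n');
    })
    .join('\n\n');
}
```

Splitting each pattern into its own bullet keeps every negative example paired with the same corrective directive, which the bracketed one-line format leaves implicit.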
Why a 35/50 Threshold?
The five-dimension rubric converts subjective editorial feedback into deterministic gates. A 35-point floor (7/10 average per dimension) ensures baseline quality without over-constraining creative variation. Scores below threshold trigger automatic revision loops or human escalation, preventing low-quality output from reaching production.
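A total of 35/50 equals a 7/10 average, but the same total can hide a weak dimension: 10 + 10 + 10 + 3 + 2 = 35. The `RUBRIC_THRESHOLDS` object in the configuration template defines both per-dimension floors and a `minimum_total`, so a gate can check both. The following is a minimal sketch. `failingDimensions` and `passesGate` are hypothetical helper names.

```typescript
const DIMENSIONS = ['directness', 'rhythm', 'trust', 'authenticity', 'density'] as const;
type Dimension = (typeof DIMENSIONS)[number];

// Mirror RubricScore and RUBRIC_THRESHOLDS from this guide.
type RubricScore = Record<Dimension, number> & { total: number };
type RubricThresholds = Record<Dimension, number> & { minimum_total: number };

// Dimensions that fall below their individual floor.
function failingDimensions(score: RubricScore, thresholds: RubricThresholds): Dimension[] {
  return DIMENSIONS.filter(d => score[d] < thresholds[d]);
}

// Pass only if the total clears the floor AND no single dimension is below its own floor.
function passesGate(score: RubricScore, thresholds: RubricThresholds): boolean {
  return score.total >= thresholds.minimum_total && failingDimensions(score, thresholds).length === 0;
}
```

Returning the failing dimensions, not only a boolean, gives the revision step a concrete target.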
Pitfall Guide
1. Over-Constraining Leading to Robotic Output
Explanation: Applying too many negative constraints simultaneously suppresses natural variation, causing the model to default to minimal, stilted phrasing.
Fix: Limit active constraints to 4-6 per generation pass. Use conditional activation based on content type (e.g., technical docs vs. marketing copy).
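Here is a minimal sketch of conditional activation with a cap. It assumes each constraint carries a priority and a list of content types it applies to. `ContentType`, `ActivatableConstraint`, and `selectConstraints` are hypothetical names. The default cap of 6 is the upper end of the 4-6 range suggested above.

```typescript
type ContentType = 'technical_docs' | 'marketing_copy';

interface ActivatableConstraint {
  id: string;
  priority: number;         // higher values are kept first
  appliesTo: ContentType[]; // content types where this constraint is active
}

// Keep constraints relevant to this content type, highest priority first, capped at maxActive.
function selectConstraints(
  pool: ActivatableConstraint[],
  contentType: ContentType,
  maxActive = 6
): ActivatableConstraint[] {
  return pool
    .filter(c => c.appliesTo.indexOf(contentType) !== -1)
    .sort((a, b) => b.priority - a.priority)
    .slice(0, maxActive);
}
```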
2. Ignoring Domain-Specific Voice Requirements
Explanation: Generic constraint sets strip necessary jargon, industry terminology, or brand-specific phrasing, making output feel sterile.
Fix: Maintain a domain whitelist. Allow specific terminology to bypass lexical filters while enforcing structural and prosodic rules.
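Here is a minimal sketch of a whitelist-aware lexical check. It assumes forbidden entries are literal phrases, like those in the lexical `forbidden` array of the configuration template, and that the whitelist names domain terms that must survive. `findLexicalViolations` is a hypothetical helper. Structural and prosodic checks would run separately and ignore the whitelist.

```typescript
// Return the forbidden phrases found in `text`, skipping any the domain whitelist permits.
// Matching is case-insensitive substring matching, so a short phrase can also
// match inside a longer word; a production version might match on word boundaries.
function findLexicalViolations(text: string, forbidden: string[], domainWhitelist: string[]): string[] {
  const haystack = text.toLowerCase();
  const allowed = domainWhitelist.map(w => w.toLowerCase());
  return forbidden.filter(phrase => {
    const p = phrase.toLowerCase();
    return allowed.indexOf(p) === -1 && haystack.indexOf(p) !== -1;
  });
}
```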
3. Relying on Subjective Scoring Without Calibration
Explanation: Manual rubric evaluation introduces inter-rater variability. Different editors score "authenticity" differently, breaking automation.
Fix: Implement a secondary LLM judge with explicit scoring guidelines, or use NLP heuristics (sentence length variance, passive voice ratio, adverb density) for programmatic scoring.
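The three heuristics named in the fix can be sketched as plain text statistics. The checks are deliberately crude. The passive test looks for a form of "to be" followed by a word ending in "-ed" or "-en", and the adverb test counts words ending in "-ly". `proseMetrics` and `ProseMetrics` are hypothetical names. Mapping these raw metrics onto 1-10 rubric scores is a calibration step this sketch leaves out.

```typescript
interface ProseMetrics {
  sentenceLengthStdDev: number; // in words; low values suggest metronomic rhythm
  passiveRatio: number;         // share of sentences matching the crude passive pattern
  adverbDensity: number;        // share of words ending in "-ly"
}

function proseMetrics(text: string): ProseMetrics {
  const sentences = text.split(/[.!?]+/).map(s => s.trim()).filter(s => s.length > 0);
  const words: string[] = text.toLowerCase().match(/[a-z']+/g) || [];

  // Population standard deviation of sentence length in words.
  const lengths = sentences.map(s => s.split(/\s+/).length);
  const n = Math.max(lengths.length, 1);
  const mean = lengths.reduce((sum, len) => sum + len, 0) / n;
  const variance = lengths.reduce((sum, len) => sum + (len - mean) * (len - mean), 0) / n;

  // Misses irregular participles ("was built") and flags some adjectives.
  const passive = /\b(is|are|was|were|be|been|being)\s+\w+(ed|en)\b/i;
  const passiveCount = sentences.filter(s => passive.test(s)).length;

  // Also flags non-adverbs such as "family"; words of three letters or fewer are skipped.
  const adverbCount = words.filter(w => w.length > 3 && w.slice(-2) === 'ly').length;

  return {
    sentenceLengthStdDev: Math.sqrt(variance),
    passiveRatio: passiveCount / Math.max(sentences.length, 1),
    adverbDensity: adverbCount / Math.max(words.length, 1),
  };
}
```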
4. Prompt Drift in Long Contexts
Explanation: As conversation history grows, instructions far back in the context tend to lose influence. The model gradually reverts to its default patterns.
Fix: Re-inject a short reminder of the critical constraints at key turn boundaries. Alternatively, trim older turns with a sliding window so the system prompt and recent turns dominate the context.
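Both fixes can be sketched in one function. It keeps only the most recent messages and periodically prefixes the newest user turn with a short reminder. Leading assistant messages are dropped so that the trimmed history still opens with a user turn, which chat APIs commonly expect. `ChatMessage`, `buildWindowedMessages`, and the default window and interval sizes are hypothetical choices. The full system prompt is still sent separately with every request.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Trim to the last `windowSize` messages; every `reinjectEvery` user turns,
// prefix the newest user message with the constraint reminder.
function buildWindowedMessages(
  turns: ChatMessage[],
  reminder: string,
  windowSize = 10,
  reinjectEvery = 4
): ChatMessage[] {
  let recent = turns.slice(Math.max(turns.length - windowSize, 0));
  // Drop leading assistant messages so the window starts with a user turn.
  while (recent.length > 0 && recent[0].role !== 'user') {
    recent = recent.slice(1);
  }
  const userTurnCount = turns.filter(m => m.role === 'user').length;
  const last = recent[recent.length - 1];
  if (last !== undefined && last.role === 'user' && userTurnCount % reinjectEvery === 0) {
    const refreshed: ChatMessage = { role: 'user', content: `${reminder}\n\n${last.content}` };
    recent = recent.slice(0, -1).concat(refreshed);
  }
  return recent;
}
```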
5. Treating Constraints as Post-Processing Filters
Explanation: Applying rules after generation requires additional API calls, increases latency, and often produces incoherent edits.
Fix: Embed constraints in the system prompt before inference. Prevention is computationally cheaper and produces more coherent output than post-hoc rewriting.
6. Neglecting Token Budget for System Instructions
Explanation: Overly verbose constraint payloads consume context window, reducing space for user prompts and generated content.
Fix: Compress constraints into dense, imperative statements. Use structured formatting (brackets, colons) to maximize information density per token.
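Here is a minimal budget check. It relies on a rough rule of thumb of about four characters per token for English text. This is an approximation, not a tokenizer, and the 5% default share is an arbitrary placeholder. `estimateTokens` and `checkSystemPromptBudget` are hypothetical names. For exact counts, use the model provider's own tokenizer or token-counting endpoint.

```typescript
// Rough heuristic only: about 4 characters per token for English prose.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface BudgetCheck {
  estimatedTokens: number;
  shareOfContext: number; // fraction of the context window taken by the system prompt
  withinBudget: boolean;
}

// Flag system prompts that would take more than `maxShare` of the context window.
function checkSystemPromptBudget(systemPrompt: string, contextWindowTokens: number, maxShare = 0.05): BudgetCheck {
  const estimatedTokens = estimateTokens(systemPrompt);
  const shareOfContext = estimatedTokens / contextWindowTokens;
  return { estimatedTokens, shareOfContext, withinBudget: shareOfContext <= maxShare };
}
```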
7. Assuming One-Size-Fits-All for All LLMs
Explanation: Different model families respond differently to constraint formatting. What works for Claude may underperform on open-weight alternatives.
Fix: Validate constraint syntax per model family. Adjust temperature and constraint density based on empirical output quality during model selection.
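Here is a minimal sketch of per-family settings keyed by model-ID prefix. The Claude entry reuses the 0.7 temperature and the six-constraint cap from this guide. The `open-weight-` prefix and its numbers are placeholders, to be replaced with values from your own validation runs. `ModelProfile` and `profileFor` are hypothetical.

```typescript
interface ModelProfile {
  temperature: number;
  maxActiveConstraints: number;
}

const DEFAULT_PROFILE: ModelProfile = { temperature: 0.7, maxActiveConstraints: 6 };

// Placeholder values, not recommendations; tune them per family from empirical output quality.
const MODEL_PROFILES: { prefix: string; profile: ModelProfile }[] = [
  { prefix: 'claude-', profile: { temperature: 0.7, maxActiveConstraints: 6 } },
  { prefix: 'open-weight-', profile: { temperature: 0.5, maxActiveConstraints: 4 } },
];

// Return the first profile whose prefix matches the model ID, else the default.
function profileFor(modelId: string): ModelProfile {
  for (const entry of MODEL_PROFILES) {
    if (modelId.indexOf(entry.prefix) === 0) return entry.profile;
  }
  return DEFAULT_PROFILE;
}
```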
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume technical documentation | System prompt constraints + programmatic rubric | Eliminates manual editing bottleneck, maintains consistent terminology | Low (API tokens only) |
| Brand marketing campaigns | Constraint injection + human editorial review | Preserves creative flexibility while enforcing structural rules | Medium (hybrid workflow) |
| Academic/scientific writing | Domain-specific constraint fork + LLM judge | Handles specialized terminology and citation patterns accurately | Medium-High (specialized tuning) |
| Internal knowledge base generation | Prevention-first constraints + automated scoring | Scales without editorial overhead, ensures readability standards | Low (fully automated) |
Configuration Template
```typescript
// constraint-config.ts
export const CONTENT_CONSTRAINTS = {
  lexical: {
    forbidden: ['in today\'s', 'it\'s worth noting', 'undeniably', 'transformative', 'revolutionary'],
    action: 'Replace with concrete assertions. Remove all adverbs. Anchor claims to specific data.'
  },
  syntactic: {
    forbidden: ['not X, it\'s Y', 'here\'s the thing', 'so how do we', 'opens new possibilities'],
    action: 'State conclusion directly. Name the actor. Eliminate standalone dramatic sentences.'
  },
  semantic: {
    forbidden: ['people often find', 'watershed moment', 'every single person', 'you might be wondering'],
    action: 'Address reader directly. Specify events. Remove hand-holding transitions.'
  },
  prosodic: {
    // Structural patterns, not literal strings: a substring match will not detect these.
    forbidden: ['three consecutive same-length sentences', 'em-dash emphasis', 'triple-item lists'],
    action: 'Vary clause complexity. Use standard punctuation. Prefer two-item comparisons.'
  }
};

// Per-dimension floors (1-10 scale) plus the overall floor out of 50.
export const RUBRIC_THRESHOLDS = {
  directness: 7,
  rhythm: 7,
  trust: 7,
  authenticity: 7,
  density: 7,
  minimum_total: 35
};

export const GENERATION_CONFIG = {
  model: 'claude-sonnet-4-6',
  max_tokens: 2048,
  temperature: 0.7,
  // Pipeline metadata only: the Messages API has no parameters for these two keys,
  // so strip them before passing the rest to client.messages.create().
  system_prompt_weight: 'high',
  constraint_injection: 'pre-inference'
};
```
Quick Start Guide
- Initialize constraint payload: Copy the `CONTENT_CONSTRAINTS` object into your project. Adjust forbidden patterns to match your domain vocabulary.
- Inject into system prompt: Pass the formatted constraints as the `system` parameter in your LLM API calls. In the Anthropic Messages API, `system` is a top-level request field, separate from the `messages` array.
- Deploy scoring gate: Replace the placeholder `evaluateRubric` function with a real scorer. Route outputs scoring below 35/50 to automatic revision or human review queues.
- Validate and iterate: Generate 10-20 test samples across content types. Adjust constraint density and temperature based on output quality. Monitor token usage and latency to ensure pipeline efficiency.
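The first two steps can be sketched as a function that renders `CONTENT_CONSTRAINTS` into a system prompt and assembles a request body. The field names match the Messages API request (`model`, `max_tokens`, `temperature`, `system`, `messages`). `formatSystemPrompt`, `buildRequest`, and `MessagesRequest` are hypothetical helpers. The pipeline-only keys from `GENERATION_CONFIG` are deliberately left out.

```typescript
// Same shape as each category in CONTENT_CONSTRAINTS.
interface ConstraintRule {
  forbidden: string[];
  action: string;
}
type ConstraintConfig = { [category: string]: ConstraintRule };

// Render each category as an "Avoid / Apply" pair. Items are joined with "; "
// because some forbidden entries contain commas (e.g. "not X, it's Y").
function formatSystemPrompt(config: ConstraintConfig): string {
  return Object.keys(config)
    .map(category => {
      const rule = config[category];
      return `[${category.toUpperCase()}] Avoid: ${rule.forbidden.join('; ')}\nApply: ${rule.action}`;
    })
    .join('\n\n');
}

// Request body for a Messages API call; `system` sits beside `messages`, not inside it.
interface MessagesRequest {
  model: string;
  max_tokens: number;
  temperature: number;
  system: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
}

function buildRequest(config: ConstraintConfig, userPrompt: string): MessagesRequest {
  return {
    model: 'claude-sonnet-4-6', // values taken from GENERATION_CONFIG
    max_tokens: 2048,
    temperature: 0.7,
    system: formatSystemPrompt(config),
    messages: [{ role: 'user', content: userPrompt }],
  };
}
```

You could pass the returned object to `client.messages.create()` from `@anthropic-ai/sdk`.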