My AI agent saved the first paragraph and the last. It dropped 41 in between.
The Silent Swallow: Diagnosing and Fixing Data Loss in ProseMirror Automation
Current Situation Analysis
Browser automation agents frequently encounter rich-text editors when automating content workflows. The industry pain point is not that automation fails; it is that automation fails silently. Agents dispatch synthetic events, receive success signals, and proceed to submission, only for the backend to persist corrupted or truncated content.
This problem is overlooked because developers conflate event dispatch with state mutation. A dispatchEvent call returning true or an execCommand completing without throwing an exception is often treated as proof of success. However, modern editors like ProseMirror, Lexical, and Quill maintain internal state models that are decoupled from the raw DOM. Synthetic inputs can trigger validation layers, input rules, or reconciliation cycles that silently reject content while leaving the DOM in a deceptive state.
Data from production incidents reveals the severity. In a documented case involving Hashnode's ProseMirror-based editor, an automation payload of 7,000 characters resulted in a saved draft containing only 446 characters. The first and last paragraphs survived; the middle 41 paragraphs were replaced by empty tags. This represents a 94% data loss rate with zero error signals. The automation framework reported success, the DOM appeared populated during inspection, yet the editor's internal state had swallowed the majority of the input.
WOW Moment: Key Findings
The root cause of silent data loss in ProseMirror automation often lies in the interaction between granular input simulation and editor input rules. Input rules are plugins that intercept character-level events to trigger formatting shortcuts (e.g., > for blockquotes, # for headings). When an automation agent simulates typing character-by-character, it triggers these rules for every paragraph start. If the text triggers a rule but does not form a valid command, the rule engine can abort the insertion silently.
Bulk HTML injection bypasses this failure mode by triggering a different code path within the editor that treats the content as a paste operation rather than a sequence of typed characters.
| Strategy | Content Retention | Rule Interference | Latency | Reliability |
|---|---|---|---|---|
| Char-by-Char Input | ~6% (observed in the Hashnode incident) | Critical: Input rules abort inserts on false positives. | High | Unreliable for bulk content. |
| Bulk HTML Injection | 100% | None: Bypasses granular input rule checks. | Medium | High for static content injection. |
| Canonical API Dispatch | 100% | Low: Respects rules but requires internal access. | Low | Best fidelity; requires editor internals. |
Why this matters: Relying on char-by-char simulation for bulk content in editors with aggressive input rules is a structural anti-pattern. A verification-and-fallback mechanism catches the loss before submission, whatever the editor's internal configuration.
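To make the input-rule risk concrete, here is a minimal pre-flight sketch that flags paragraphs beginning with a character an input rule might react to. `TRIGGER_PREFIX`, `TriggerHit`, and `findTriggerParagraphs` are hypothetical names, and the character set (`>`, `#`, `*`, `[`) is an assumption; the rules an editor actually registers depend on its plugins.

```typescript
// Hypothetical pre-flight check: which paragraphs start with a character
// that a Markdown-style input rule might react to?
// Assumption: the trigger set is >, #, * and [. Real editors may differ.
const TRIGGER_PREFIX = /^\s*[>#*\[]/;

interface TriggerHit {
  index: number;  // zero-based paragraph index
  prefix: string; // first non-whitespace character of the paragraph
}

function findTriggerParagraphs(content: string): TriggerHit[] {
  const hits: TriggerHit[] = [];
  // Paragraphs are separated by one or more blank lines.
  content.split(/\n\n+/).forEach((paragraph, index) => {
    if (TRIGGER_PREFIX.test(paragraph)) {
      hits.push({ index, prefix: paragraph.trim().charAt(0) });
    }
  });
  return hits;
}
```

A non-empty result does not prove that content will be lost. It only marks the paragraphs worth checking first when a read-back comes up short.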
Core Solution
The robust approach to filling rich-text editors requires a three-phase strategy: Detection, Verification, and Adaptive Fallback.
- Detection: Identify the editor framework to select the optimal injection method.
- Verification: After any fill attempt, compare the expected content length against the actual persisted content.
- Adaptive Fallback: If verification fails, switch to a bulk HTML injection strategy that bypasses input rules.
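The verification phase reduces to a small pure function. The sketch below is one way to write it, assuming a length-only comparison and the 0.85 threshold used elsewhere in this article; `RetentionReport` and `checkRetention` are illustrative names, not part of any library.

```typescript
interface RetentionReport {
  expectedChars: number;
  retainedChars: number;
  retainedRatio: number; // between 0 and 1, capped at 1
  passed: boolean;
}

// Length-based verification: compare what was sent with what the editor holds.
function checkRetention(expected: string, actual: string, threshold = 0.85): RetentionReport {
  const expectedChars = expected.length;
  const retainedChars = actual.length;
  // An empty payload is trivially retained; avoid dividing by zero.
  const retainedRatio = expectedChars === 0 ? 1 : Math.min(1, retainedChars / expectedChars);
  return { expectedChars, retainedChars, retainedRatio, passed: retainedRatio >= threshold };
}

// The incident numbers from above: 7,000 characters sent, 446 persisted.
const incident = checkRetention("x".repeat(7000), "x".repeat(446));
console.log(incident.retainedRatio.toFixed(3), incident.passed); // logs "0.064 false"
```

With the incident's numbers the ratio is about 0.064, far below the threshold, so the fallback phase would run.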
Implementation Architecture
The following TypeScript implementation demonstrates a RobustEditorFiller that encapsulates this logic. It prioritizes the canonical API for fidelity but falls back to HTML injection when input rules cause data loss.
```typescript
import { Fragment, Slice } from 'prosemirror-model';

interface FillResult {
  success: boolean;
  method: 'canonical' | 'html_fallback' | 'failed';
  retainedChars: number;
  expectedChars: number;
}

class RobustEditorFiller {
  private readonly RETENTION_THRESHOLD = 0.85;
  private readonly VERIFICATION_DELAY_MS = 50; // time for the editor to sync after DOM edits

  async fill(targetElement: HTMLElement, content: string): Promise<FillResult> {
    // Phase 1: Attempt Canonical Fill (null means no view was found or dispatch threw)
    const canonicalResult = await this.attemptCanonicalFill(targetElement, content);

    // Phase 2: Verify Content Integrity. A dispatched transaction is only
    // trusted after the editor state has been read back.
    if (canonicalResult) {
      const verification = this.verifyContent(targetElement, content);
      if (verification.retainedRatio >= this.RETENTION_THRESHOLD) {
        return { ...canonicalResult, retainedChars: verification.retainedLength };
      }
      console.warn(`Content loss detected: ${(verification.retainedRatio * 100).toFixed(1)}% retained. Switching to HTML fallback.`);
    }

    // Phase 3: Adaptive Fallback to HTML Injection
    return this.injectHtmlFallback(targetElement, content);
  }

  private async attemptCanonicalFill(element: HTMLElement, content: string): Promise<FillResult | null> {
    // Detect ProseMirror internals
    const pmView = this.findProseMirrorView(element);
    if (!pmView) return null;
    try {
      // Use ProseMirror's transaction API for high-fidelity insertion.
      // replaceSelection takes a Slice; replaceSelectionWith expects a single Node.
      const slice = this.createSliceFromText(pmView.state.schema, content);
      pmView.dispatch(pmView.state.tr.replaceSelection(slice));
      // Provisional result: fill() overwrites retainedChars with the verified count.
      return {
        success: true,
        method: 'canonical',
        retainedChars: content.length,
        expectedChars: content.length
      };
    } catch (error) {
      console.error('Canonical fill failed:', error);
      return null;
    }
  }

  private verifyContent(element: HTMLElement, expectedContent: string): { retainedLength: number; retainedRatio: number } {
    // Prefer framework state; fall back to DOM textContent when no view is exposed
    const pmView = this.findProseMirrorView(element);
    const actualText = pmView
      ? pmView.state.doc.textContent
      : element.textContent || '';
    const retainedLength = actualText.length;
    const retainedRatio = expectedContent.length > 0
      ? retainedLength / expectedContent.length
      : 1;
    return { retainedLength, retainedRatio };
  }

  private async injectHtmlFallback(element: HTMLElement, content: string): Promise<FillResult> {
    // Convert text to paragraph-wrapped HTML.
    // A single bulk insert is meant to avoid the per-character input rule path.
    const htmlContent = content
      .split(/\n\n+/)
      .map(p => `<p>${this.escapeHtml(p)}</p>`)
      .join('');

    // Select the existing content so the insert replaces it. Removing child
    // nodes by hand would mutate the DOM behind the editor's back.
    const editable = this.findEditable(element);
    editable.focus();
    const range = document.createRange();
    range.selectNodeContents(editable);
    const selection = window.getSelection();
    selection?.removeAllRanges();
    selection?.addRange(range);

    // Inject bulk HTML
    // Note: execCommand is deprecated, but browsers still support it for
    // contenteditable and it remains a practical way to insert HTML in bulk.
    document.execCommand('insertHTML', false, htmlContent);

    // Give the editor time to sync, then verify instead of assuming success
    await new Promise<void>(resolve => setTimeout(resolve, this.VERIFICATION_DELAY_MS));
    const { retainedLength, retainedRatio } = this.verifyContent(element, content);
    const success = retainedRatio >= this.RETENTION_THRESHOLD;
    return {
      success,
      method: success ? 'html_fallback' : 'failed',
      retainedChars: retainedLength,
      expectedChars: content.length
    };
  }

  private findEditable(element: HTMLElement): HTMLElement {
    // The target may be the editor root itself or a wrapper around it
    if (element.matches('.ProseMirror')) return element;
    return element.querySelector<HTMLElement>('.ProseMirror') ?? element;
  }

  private findProseMirrorView(element: HTMLElement): any {
    // Assumption: the host page or test harness exposes the EditorView as a
    // `pmView` property on the editor root. ProseMirror documents no such
    // DOM-to-view lookup, so adapt this to however your page exposes the view.
    return (this.findEditable(element) as any).pmView ?? null;
  }

  private escapeHtml(text: string): string {
    // Ampersand must be replaced first so later entities are not double-escaped
    return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#39;');
  }

  private createSliceFromText(schema: any, text: string): any {
    // Minimal sketch: assumes the schema defines a `paragraph` node type, as
    // prosemirror-schema-basic does. Custom schemas may need DOMParser instead.
    const paragraphType = schema.nodes.paragraph;
    if (!paragraphType) throw new Error('Schema has no paragraph node type');
    const nodes = text.split(/\n\n+/).map(p =>
      // schema.text() rejects empty strings, so empty blocks become empty paragraphs
      paragraphType.create(null, p ? schema.text(p) : null)
    );
    // Closed slice (openStart = openEnd = 0) containing whole paragraphs
    return new Slice(Fragment.from(nodes), 0, 0);
  }
}
```
Architecture Decisions
- Verification Threshold: `RETENTION_THRESHOLD` is set to 0.85. If less than 85% of the content is retained, the system assumes input rules or other filters are interfering. This threshold balances sensitivity against minor whitespace differences.
- HTML Injection Strategy: The fallback converts double-newlines to paragraph tags. This preserves semantic structure while ensuring the bulk insert triggers the editor's paste handler rather than the character input handler.
- Framework State Check: Verification reads `pmView.state.doc.textContent` when available. This is critical because the DOM `textContent` may appear correct while the ProseMirror state remains empty due to React reconciliation delays or state desync.
- Sanitization: The `escapeHtml` function ensures that literal characters are preserved during HTML injection. While the content originates from a controlled agent context, sanitization prevents structural breakage and maintains data integrity.
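A length ratio tolerates whitespace drift, but it cannot say which content went missing. As a complement (an assumption on my part, not part of the implementation above), a paragraph-level check can normalize whitespace and report the expected paragraphs that are absent from the editor's text. `normalizeWhitespace` and `missingParagraphs` are hypothetical helpers.

```typescript
// Collapse runs of whitespace so newline handling differences between the
// payload and the editor's text output do not count as loss.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Returns the expected paragraphs that cannot be found in the actual text.
// Substring matching is used because an editor's text output may join
// paragraphs without any separator.
function missingParagraphs(expected: string, actual: string): string[] {
  const haystack = normalizeWhitespace(actual);
  return expected
    .split(/\n\n+/)
    .map(normalizeWhitespace)
    .filter(p => p.length > 0 && haystack.indexOf(p) === -1);
}
```

In the incident described earlier, a check like this would have listed the 41 swallowed paragraphs rather than only reporting a low ratio.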
Pitfall Guide
The False Positive Return
- Explanation: `dispatchEvent` and `execCommand` return booleans, but neither confirms that content was inserted. `dispatchEvent` returns false only when a cancelable event was canceled with `preventDefault()`, and `execCommand` returns false only when the command is unsupported or disabled. Editors can consume events without mutating state.
- Fix: Always read back `textContent` or framework state after a fill operation. Never trust the return value of the dispatch call.
Input Rule Traps
- Explanation: Editors with input rules (e.g., Tiptap, or Hashnode's ProseMirror-based editor) may abort inserts if text starts with trigger characters like `>` or `#` but doesn't match a valid rule. This results in empty paragraphs.
- Fix: Use bulk HTML injection for content containing trigger characters, or verify retention and fall back immediately.
DOM vs. Framework Desync
- Explanation: Changing the DOM does not automatically update the editor's internal state model. Submitting the form may send the framework state, which could be empty or stale.
- Fix: Verify against the framework state (e.g., `view.state.doc`) rather than just the DOM. Trigger framework updates if necessary.
React Reconciliation Race Conditions
- Explanation: In React-wrapped editors, synthetic events may fire before the component is ready to process them, or React may overwrite changes during a render cycle.
- Fix: Use `setTimeout(fn, 0)` or `requestAnimationFrame` to yield to the React render cycle, or use the canonical API, which handles state updates synchronously.
Char-by-Char for Bulk Content
- Explanation: Simulating typing character-by-character is intended for interactive scenarios, not bulk content injection. It is slow and highly susceptible to input rule interference.
- Fix: Reserve char-by-char simulation for interactive typing tests. Use canonical APIs or bulk HTML injection for content population.
Ignoring Editor Variations
- Explanation: Assuming all ProseMirror instances behave identically. Different configurations may have custom plugins that alter insertion behavior.
- Fix: Implement detection logic to identify the specific editor configuration and adapt the strategy accordingly.
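Several of the fixes above come down to reading the editor back after yielding to the event loop. A minimal sketch of that pattern follows. `readBackWithRetry` is a hypothetical helper, the caller supplies `readText` (for ProseMirror that could be `() => view.state.doc.textContent`), and the 50 ms delay and five attempts are assumed defaults.

```typescript
// Hypothetical helper: re-read the editor text until it reaches the expected
// length or the attempts run out.
async function readBackWithRetry(
  readText: () => string,
  expectedLength: number,
  options: { threshold?: number; delayMs?: number; attempts?: number } = {}
): Promise<{ text: string; ok: boolean; attemptsUsed: number }> {
  const { threshold = 0.85, delayMs = 50, attempts = 5 } = options;
  let text = "";
  for (let attempt = 1; attempt <= attempts; attempt++) {
    text = readText();
    const ratio = expectedLength === 0 ? 1 : text.length / expectedLength;
    if (ratio >= threshold) return { text, ok: true, attemptsUsed: attempt };
    // Yield to the event loop so pending renders or transactions can settle.
    if (attempt < attempts) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return { text, ok: false, attemptsUsed: attempts };
}
```

A result with `ok: false` after all attempts is a signal to switch strategy, not to keep waiting. Content that an input rule swallowed will not reappear.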
Production Bundle
Action Checklist
- Detect Editor Framework: Identify ProseMirror, Lexical, or Quill instances to select the appropriate fill strategy.
- Attempt Canonical Fill: Prioritize using the editor's internal API for high-fidelity insertion.
- Verify Content Integrity: Compare expected vs. actual content length immediately after fill.
- Implement Fallback Logic: Switch to bulk HTML injection if retention falls below the threshold.
- Sanitize HTML Payload: Escape content before HTML injection to preserve structure and prevent breakage.
- Log Verification Results: Record retention rates and methods used for debugging and monitoring.
- Test with Trigger Characters: Validate fills against content starting with `>`, `#`, `*`, and `[`.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Known Editor with API Access | Canonical Dispatch | Highest fidelity; respects editor rules and state. | Low latency; requires internal access. |
| Unknown Editor / Fallback | Bulk HTML Injection | Bypasses input rules; ensures content retention. | Medium latency; may lose rich formatting. |
| Interactive Typing Simulation | Char-by-Char Input | Necessary for simulating user typing behavior. | High latency; risky for bulk content. |
| Content with Trigger Characters | HTML Injection or Escaped Input | Prevents input rule aborts on >, #, etc. | Negligible; adds processing step. |
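The matrix can be sketched as a small selection function. The names (`FillStrategy`, `FillScenario`, `chooseStrategy`) and the precedence between rows are assumptions for illustration; the table itself does not say which row wins when several apply.

```typescript
type FillStrategy = "canonical_dispatch" | "bulk_html" | "char_by_char";

interface FillScenario {
  // Is keystroke-level behavior itself under test, or is the goal to populate content?
  purpose: "typing_test" | "content_population";
  // Can the automation reach the editor's view/state object?
  hasEditorApiAccess: boolean;
}

function chooseStrategy(scenario: FillScenario): FillStrategy {
  // Char-by-char is reserved for the one row where typing is the point.
  if (scenario.purpose === "typing_test") return "char_by_char";
  // With API access, dispatch through the editor's own state model.
  if (scenario.hasEditorApiAccess) return "canonical_dispatch";
  // Unknown editors and trigger-heavy content both land on bulk HTML.
  return "bulk_html";
}
```

Under this ordering the trigger-character row needs no separate branch, because content population never selects char-by-char input.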
Configuration Template
```typescript
// editor-filler.config.ts
export const EditorFillerConfig = {
  retentionThreshold: 0.85,
  fallbackMethods: ['html_injection', 'text_paste'],
  verificationDelay: 50, // ms to wait for state updates
  sanitizeHtml: true,
  logLevel: 'warn', // 'error' | 'warn' | 'info'
};
```
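A template like this is easier to trust with a type and a few range checks on top. The sketch below is one possible shape, not part of the original implementation; `FillerConfig`, `DEFAULT_FILLER_CONFIG`, and `resolveFillerConfig` are hypothetical names.

```typescript
interface FillerConfig {
  retentionThreshold: number; // fraction of expected characters, in (0, 1]
  fallbackMethods: string[];
  verificationDelay: number;  // ms to wait for state updates
  sanitizeHtml: boolean;
  logLevel: "error" | "warn" | "info";
}

const DEFAULT_FILLER_CONFIG: FillerConfig = {
  retentionThreshold: 0.85,
  fallbackMethods: ["html_injection", "text_paste"],
  verificationDelay: 50,
  sanitizeHtml: true,
  logLevel: "warn",
};

// Merge caller overrides onto the defaults and reject values that would
// make verification meaningless.
function resolveFillerConfig(overrides: Partial<FillerConfig> = {}): FillerConfig {
  const config: FillerConfig = { ...DEFAULT_FILLER_CONFIG, ...overrides };
  if (!(config.retentionThreshold > 0 && config.retentionThreshold <= 1)) {
    throw new RangeError("retentionThreshold must be in (0, 1]");
  }
  if (!(config.verificationDelay >= 0)) {
    throw new RangeError("verificationDelay must be a non-negative number of ms");
  }
  if (config.fallbackMethods.length === 0) {
    throw new Error("at least one fallback method is required");
  }
  return config;
}
```

Calling `resolveFillerConfig({ retentionThreshold: 0.95 })` tightens the threshold while keeping every other default.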
Quick Start Guide
- Initialize Filler: Instantiate `RobustEditorFiller` with your configuration.
- Select Target: Locate the editor element in the DOM.
- Execute Fill: Call `filler.fill(element, content)`.
- Review Result: Check the `FillResult` object for success status and retention metrics.
- Handle Failure: If `success` is false, log the error and abort the workflow.
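The steps above can be wrapped in one small function. This is a sketch under assumptions: `EditorFillerLike` stands in for any object with a `fill(target, content)` method such as the class in this article, and `fillOrAbort` is a hypothetical name. The target type is generic so the snippet carries no DOM dependency.

```typescript
interface FillOutcome {
  success: boolean;
  method: string;
  retainedChars: number;
  expectedChars: number;
}

interface EditorFillerLike<T> {
  fill(target: T, content: string): Promise<FillOutcome>;
}

// Run a fill, log the retention metrics, and abort the workflow on failure.
async function fillOrAbort<T>(
  filler: EditorFillerLike<T>,
  target: T,
  content: string
): Promise<FillOutcome> {
  const result = await filler.fill(target, content);
  const pct = result.expectedChars === 0
    ? 100
    : (result.retainedChars / result.expectedChars) * 100;
  console.info(`fill via ${result.method}: ${pct.toFixed(1)}% retained`);
  if (!result.success) {
    throw new Error(
      `Editor fill failed: ${result.retainedChars}/${result.expectedChars} characters retained`
    );
  }
  return result;
}
```

In a browser context the call would look like `await fillOrAbort(new RobustEditorFiller(), editorElement, draftText)`, where `editorElement` and `draftText` are whatever your workflow provides.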
By implementing verification and adaptive fallbacks, you surface silent data loss before submission and make automation agents far more reliable with rich-text editors, regardless of internal configuration or input rule complexity.
