โ† Back to Blog
AI/ML · 2026-05-12 · 69 min read

My AI agent saved the first paragraph and the last. It dropped 41 in between.

By ืื—ื™ื” ื›ื”ืŸ

The Silent Swallow: Diagnosing and Fixing Data Loss in ProseMirror Automation

Current Situation Analysis

Browser automation agents frequently encounter rich-text editors when automating content workflows. The industry pain point is not that automation fails; it is that automation fails silently. Agents dispatch synthetic events, receive success signals, and proceed to submission, only for the backend to persist corrupted or truncated content.

This problem is overlooked because developers conflate event dispatch with state mutation. A dispatchEvent call returning true or an execCommand completing without throwing an exception is often treated as proof of success. However, modern editors like ProseMirror, Lexical, and Quill maintain internal state models that are decoupled from the raw DOM. Synthetic inputs can trigger validation layers, input rules, or reconciliation cycles that silently reject content while leaving the DOM in a deceptive state.

Data from production incidents reveals the severity. In a documented case involving Hashnode's ProseMirror-based editor, an automation payload of 7,000 characters resulted in a saved draft containing only 446 characters. The first and last paragraphs survived; the middle 41 paragraphs were replaced by empty paragraph tags. This represents a 94% data loss rate with zero error signals. The automation framework reported success, the DOM appeared populated during inspection, yet the editor's internal state had swallowed the majority of the input.
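The headline numbers can be re-derived from the two figures reported for the incident (7,000 characters sent, 446 persisted). A minimal check; the retentionStats helper is hypothetical and exists only to show the arithmetic:

```typescript
// Retention and loss computed from the two figures reported for the incident.
function retentionStats(expectedChars: number, persistedChars: number) {
  // An empty payload counts as fully retained (nothing could be lost).
  const retained = expectedChars > 0 ? persistedChars / expectedChars : 1;
  return {
    retainedPercent: retained * 100,
    lostPercent: (1 - retained) * 100,
  };
}

const stats = retentionStats(7000, 446);
console.log(`${stats.retainedPercent.toFixed(1)}% retained, ${stats.lostPercent.toFixed(1)}% lost`);
// prints "6.4% retained, 93.6% lost"
```

The same retained/expected ratio is what a verification step compares against its acceptance threshold.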

WOW Moment: Key Findings

The root cause of silent data loss in ProseMirror automation often lies in the interaction between granular input simulation and editor input rules. Input rules are plugins that watch text input and trigger formatting shortcuts (e.g., "> " for blockquotes, "# " for headings). When an automation agent simulates typing character by character, every keystroke passes through these rules, and any paragraph that begins with a trigger sequence gives a block-level rule a chance to fire. If the text matches a rule's trigger but does not form a valid command, a rule engine configured like the one in this incident can abort the insertion silently.

Bulk HTML injection avoids this failure mode because it takes a different code path: the editor receives the content as one bulk change, closer to a paste than to a sequence of typed characters, so per-keystroke input rules never get a chance to fire.
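To make the two code paths concrete, here is a deliberately simplified model. It is not ProseMirror's implementation: the rule table and the "abort the paragraph when a rule cannot complete" behavior are assumptions chosen to mirror the failure described above. The result has the same shape as the incident (first and last paragraphs kept, the middle one lost), though the trigger used here is picked purely for illustration.

```typescript
// Toy model of a rule-heavy editor. This is NOT ProseMirror's implementation:
// the rule table and the "abort the paragraph when a rule cannot complete"
// behavior are assumptions that mirror the failure mode described in the text.
type Rule = { trigger: RegExp; canApply: boolean };

const RULES: Rule[] = [
  { trigger: /^> $/, canApply: false }, // blockquote shortcut the editor cannot complete (assumed)
  { trigger: /^# $/, canApply: true },  // heading shortcut that completes normally
];

// Typed path: rules are checked after every simulated keystroke.
function typeParagraph(text: string): string {
  let buffer = '';
  for (const ch of text) {
    buffer += ch;
    const rule = RULES.filter((r) => r.trigger.test(buffer))[0];
    if (rule && !rule.canApply) return ''; // rule fired but aborted: paragraph swallowed
    if (rule) buffer = '';                 // rule applied: trigger text becomes formatting
  }
  return buffer;
}

// Bulk path: the paragraph arrives as one change, so no per-keystroke rule runs.
function insertParagraphBulk(text: string): string {
  return text;
}

const paragraphs = ['Intro paragraph.', '> A quoted line.', 'Closing paragraph.'];
const typed = paragraphs.map(typeParagraph);      // → ['Intro paragraph.', '', 'Closing paragraph.']
const bulk = paragraphs.map(insertParagraphBulk); // → all three paragraphs intact
console.log(typed, bulk);
```

In a real editor the trigger set, and whether an incomplete match aborts or simply falls through to normal insertion, depends on which rules the editor registers.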

| Strategy | Content Retention | Rule Interference | Latency | Reliability |
|---|---|---|---|---|
| Char-by-Char Input | ~6% (observed in the Hashnode incident) | Critical: input rules can abort inserts on false positives. | High | Unreliable for bulk content. |
| Bulk HTML Injection | 100% | None: bypasses per-character input rule checks. | Medium | High for static content injection. |
| Canonical API Dispatch | 100% | Low: transactions skip input rules but still pass through plugin transaction filters; requires internal access. | Low | Best fidelity; requires editor internals. |

Why this matters: Relying on char-by-char simulation for bulk content in editors with aggressive input rules is a structural anti-pattern. A verification-and-fallback mechanism turns silent loss into a detected, recoverable failure, whatever the editor's internal configuration.

Core Solution

The robust approach to filling rich-text editors requires a three-phase strategy: Detection, Verification, and Adaptive Fallback.

  1. Detection: Identify the editor framework to select the optimal injection method.
  2. Verification: After any fill attempt, compare the expected content length against the length of the content the editor actually holds (its internal state where reachable, otherwise the DOM).
  3. Adaptive Fallback: If verification fails, switch to a bulk HTML injection strategy that bypasses input rules.
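The three phases can be sketched without a browser. In this minimal version each fill strategy is a plain function that writes into an in-memory model, and verification reads that model back. The EditorModel and Strategy types, the strategy names, and fillWithFallback are illustrative assumptions, not part of any editor's API:

```typescript
// Illustrative Detection -> Verification -> Adaptive Fallback loop.
// The editor is modeled as an object holding text; real code would read the
// editor's internal state (or the DOM) in the verification step.
type EditorModel = { text: string };
type Strategy = { name: string; fill: (model: EditorModel, content: string) => void };

function retainedRatio(model: EditorModel, content: string): number {
  return content.length > 0 ? model.text.length / content.length : 1;
}

function fillWithFallback(
  model: EditorModel,
  content: string,
  strategies: Strategy[], // Phase 1 (detection) decides this preference order
  threshold = 0.85,
): { method: string; ratio: number } {
  for (const strategy of strategies) {
    model.text = ''; // each attempt starts from an empty editor
    strategy.fill(model, content);
    // Phase 2: read back what the editor holds; ignore what the fill call "returned".
    const ratio = retainedRatio(model, content);
    if (ratio >= threshold) return { method: strategy.name, ratio };
    // Phase 3: below threshold, fall through to the next strategy.
  }
  return { method: 'failed', ratio: retainedRatio(model, content) };
}

const draft = 'First paragraph.\n\nMiddle paragraph.\n\nLast paragraph.';
const lossyTyping: Strategy = { name: 'char_by_char', fill: (m, c) => { m.text = c.slice(0, 16); } };
const bulkInsert: Strategy = { name: 'html_fallback', fill: (m, c) => { m.text = c; } };
console.log(fillWithFallback({ text: '' }, draft, [lossyTyping, bulkInsert])); // → { method: 'html_fallback', ratio: 1 }
```

Keeping strategies as interchangeable functions makes the fallback order a data decision rather than nested if/else branches.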

Implementation Architecture

The following TypeScript implementation demonstrates a RobustEditorFiller that encapsulates this logic. It prioritizes the canonical API for fidelity but falls back to HTML injection when input rules cause data loss.

interface FillResult {
  success: boolean;
  method: 'canonical' | 'html_fallback' | 'failed';
  retainedChars: number;
  expectedChars: number;
}

interface FillerOptions {
  retentionThreshold?: number; // minimum retained/expected ratio to accept a fill
  verificationDelay?: number;  // ms to let the editor sync its state before reading back
}

class RobustEditorFiller {
  private readonly retentionThreshold: number;
  private readonly verificationDelay: number;

  constructor(options: FillerOptions = {}) {
    this.retentionThreshold = options.retentionThreshold ?? 0.85;
    this.verificationDelay = options.verificationDelay ?? 50;
  }

  async fill(targetElement: HTMLElement, content: string): Promise<FillResult> {
    // Phase 1: attempt a canonical fill through the editor's own API, if reachable
    const dispatched = this.attemptCanonicalFill(targetElement, content);

    // Phase 2: verify every attempt; a dispatch that did not throw proves nothing
    if (dispatched) {
      const verification = this.verifyContent(targetElement, content);
      if (verification.retainedRatio >= this.retentionThreshold) {
        return {
          success: true,
          method: 'canonical',
          retainedChars: verification.retainedLength,
          expectedChars: content.length
        };
      }
      console.warn(
        `Content loss detected: ${(verification.retainedRatio * 100).toFixed(1)}% retained. Switching to HTML fallback.`
      );
    }

    // Phase 3: adaptive fallback to bulk HTML injection
    return this.injectHtmlFallback(targetElement, content);
  }

  private attemptCanonicalFill(element: HTMLElement, content: string): boolean {
    const view = this.findProseMirrorView(element);
    if (!view) return false; // no reachable view: go straight to the fallback

    try {
      const { state } = view;
      const paragraphs = this.createParagraphNodes(state.schema, content);
      // Replace the whole document in one transaction. Transactions do not pass
      // through handleTextInput, so input rules never see this content.
      const tr = state.tr.replaceWith(0, state.doc.content.size, paragraphs);
      view.dispatch(tr); // call dispatch on the view rather than detaching it
      return true;
    } catch (error) {
      console.error('Canonical fill failed:', error);
      return false;
    }
  }

  private verifyContent(element: HTMLElement, expectedContent: string): { retainedLength: number; retainedRatio: number } {
    // Prefer the framework state: the DOM can look right while the model is stale
    const view = this.findProseMirrorView(element);
    const actualText: string = view
      ? view.state.doc.textContent
      : this.resolveEditable(element).textContent ?? '';

    const retainedLength = actualText.length;
    const retainedRatio = expectedContent.length > 0
      ? retainedLength / expectedContent.length
      : 1;

    return { retainedLength, retainedRatio };
  }

  private async injectHtmlFallback(element: HTMLElement, content: string): Promise<FillResult> {
    const editable = this.resolveEditable(element);

    // Convert text to paragraph-wrapped, escaped HTML
    const htmlContent = content
      .split(/\n\n+/)
      .map(p => `<p>${this.escapeHtml(p)}</p>`)
      .join('');

    // Select the existing content so the insert replaces it. Removing child
    // nodes by hand would change the DOM behind the editor's back.
    editable.focus();
    const selection = window.getSelection();
    if (selection) {
      const range = document.createRange();
      range.selectNodeContents(editable);
      selection.removeAllRanges();
      selection.addRange(range);
    }

    // execCommand is deprecated, but insertHTML still performs a native edit that
    // the editor observes as one bulk DOM change, not a run of typed characters.
    const executed = document.execCommand('insertHTML', false, htmlContent);

    // Give the editor's DOM observer time to fold the change into its state,
    // then verify the fallback too instead of assuming it worked.
    await new Promise<void>(resolve => setTimeout(resolve, this.verificationDelay));
    const verification = this.verifyContent(element, content);
    const success = executed && verification.retainedRatio >= this.retentionThreshold;

    return {
      success,
      method: success ? 'html_fallback' : 'failed',
      retainedChars: verification.retainedLength,
      expectedChars: content.length
    };
  }

  private resolveEditable(element: HTMLElement): HTMLElement {
    // Accept the .ProseMirror root itself, a node inside it, or a wrapper around it
    return element.closest<HTMLElement>('.ProseMirror')
      ?? element.querySelector<HTMLElement>('.ProseMirror')
      ?? element;
  }

  private findProseMirrorView(element: HTMLElement): any {
    // ProseMirror does not expose its EditorView on the DOM. Assumption: Tiptap
    // attaches its Editor instance to the root as `editor`, and `editor.view` is
    // the EditorView. Other editors yield null and take the HTML fallback.
    const root = this.resolveEditable(element) as HTMLElement & { editor?: { view?: unknown } };
    return root.editor?.view ?? null;
  }

  private escapeHtml(text: string): string {
    return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#039;');
  }

  private createParagraphNodes(schema: any, text: string): any[] {
    // Assumption: the schema has a `paragraph` node that accepts inline text
    const paragraph = schema.nodes.paragraph;
    if (!paragraph) throw new Error('Schema has no paragraph node');
    return text
      .split(/\n\n+/)
      .filter(p => p.length > 0) // schema.text() rejects empty strings
      .map(p => paragraph.create(null, schema.text(p)));
  }
}

Architecture Decisions

  • Verification Threshold: The retention threshold defaults to 0.85. If less than 85% of the content is retained, the system assumes input rules or other filters are interfering. The margin absorbs minor whitespace differences; ProseMirror's doc.textContent, for example, omits the blank lines between paragraphs, so even a perfect fill reads slightly shorter than the raw payload.
  • HTML Injection Strategy: The fallback converts blank-line-separated text into paragraph tags. This preserves paragraph structure while delivering the content as one bulk change instead of through the per-character input path where input rules run.
  • Framework State Check: Verification reads the ProseMirror view's state.doc.textContent when the view is reachable. This is critical because the DOM textContent may appear correct while the ProseMirror state remains empty due to React reconciliation delays or state desync.
  • Sanitization: The escapeHtml function ensures that literal characters are preserved during HTML injection. While the content originates from a controlled agent context, sanitization prevents structural breakage and maintains data integrity.
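The payload construction behind the HTML Injection Strategy and Sanitization decisions can be isolated as a pure function, which makes it testable without a browser. This sketch uses the same blank-line split and escaping rules as injectHtmlFallback, with one refinement: empty chunks are skipped instead of producing empty paragraphs. The name buildParagraphHtml is hypothetical.

```typescript
// Escape text so it is inserted as literal characters, never as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical helper: turn plain text into paragraph-wrapped, escaped HTML.
function buildParagraphHtml(content: string): string {
  return content
    .split(/\n\n+/)                     // a blank line separates paragraphs
    .filter((p) => p.trim().length > 0) // skip empty chunks instead of emitting <p></p>
    .map((p) => `<p>${escapeHtml(p)}</p>`)
    .join('');
}

console.log(buildParagraphHtml('> not a quote\n\nA & B < C'));
// prints "<p>&gt; not a quote</p><p>A &amp; B &lt; C</p>"
```

Note that the leading ">" reaches the editor as an escaped entity inside a paragraph, so it is parsed as text rather than as a blockquote shortcut.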

Pitfall Guide

  1. The False Positive Return

    • Explanation: dispatchEvent returns true unless a listener canceled the event, and execCommand returns true when the browser ran the command. Neither value says the editor's model accepted the content; editors can consume events without mutating state.
    • Fix: Always read back textContent or framework state after a fill operation. Never trust the return value of the dispatch call.
  2. Input Rule Traps

    • Explanation: In editors with input rules (e.g., Tiptap-based editors or Hashnode's ProseMirror editor), a paragraph that starts with a trigger character like > or # but doesn't complete a valid rule can have its insert aborted, leaving an empty paragraph.
    • Fix: Use bulk HTML injection for content containing trigger characters, or verify retention and fallback immediately.
  3. DOM vs. Framework Desync

    • Explanation: Changing the DOM does not automatically update the editor's internal state model. Submitting the form may send the framework state, which could be empty or stale.
    • Fix: Verify against the framework state (e.g., view.state.doc) rather than just the DOM. If the two disagree, re-apply the content through the framework's API instead of patching the DOM further.
  4. React Reconciliation Race Conditions

    • Explanation: In React-wrapped editors, synthetic events may fire before the component is ready to process them, or React may overwrite changes during a render cycle.
    • Fix: Use setTimeout(0) or requestAnimationFrame to yield to the React render cycle before verifying, or use the canonical API; in vanilla ProseMirror, dispatching a transaction updates the view's state synchronously.
  5. Char-by-Char for Bulk Content

    • Explanation: Simulating typing character-by-character is intended for interactive scenarios, not bulk content injection. It is slow and highly susceptible to input rule interference.
    • Fix: Reserve char-by-char simulation for interactive typing tests. Use canonical APIs or bulk HTML injection for content population.
  6. Ignoring Editor Variations

    • Explanation: Assuming all ProseMirror instances behave identically. Different configurations may have custom plugins that alter insertion behavior.
    • Fix: Implement detection logic to identify the specific editor configuration and adapt the strategy accordingly.
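For pitfall 2, a cheap pre-check is to scan the payload for paragraphs that open with a likely input-rule trigger before choosing a fill method. The trigger set below is an assumption: it covers the characters this post recommends testing (>, #, *, [), and the actual set depends on which rules the target editor registers. findTriggerParagraphs is a hypothetical helper.

```typescript
// Characters that commonly open Markdown-style input rules. Assumed list,
// matching the characters recommended for testing: >, #, *, [.
const TRIGGER_START = /^[>#*\[]/;

// Indexes of paragraphs (blank-line separated) that begin with a likely trigger.
function findTriggerParagraphs(content: string): number[] {
  const hits: number[] = [];
  content.split(/\n\n+/).forEach((paragraph, index) => {
    if (TRIGGER_START.test(paragraph)) hits.push(index);
  });
  return hits;
}

const payload = 'Plain intro.\n\n# Not a heading\n\n> Not a quote\n\nPlain outro.';
console.log(findTriggerParagraphs(payload)); // → [1, 2]
```

A non-empty result is a reasonable signal to skip char-by-char input and go straight to the canonical API or bulk injection.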

Production Bundle

Action Checklist

  • Detect Editor Framework: Identify ProseMirror, Lexical, or Quill instances to select the appropriate fill strategy.
  • Attempt Canonical Fill: Prioritize using the editor's internal API for high-fidelity insertion.
  • Verify Content Integrity: Compare expected vs. actual content length immediately after fill.
  • Implement Fallback Logic: Switch to bulk HTML injection if retention falls below the threshold.
  • Sanitize HTML Payload: Escape content before HTML injection to preserve structure and prevent breakage.
  • Log Verification Results: Record retention rates and methods used for debugging and monitoring.
  • Test with Trigger Characters: Validate fills against content starting with >, #, *, and [.

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Known Editor with API Access | Canonical Dispatch | Highest fidelity; works with the editor's schema and state. | Low latency; requires internal access. |
| Unknown Editor / Fallback | Bulk HTML Injection | Bypasses input rules; verify retention afterward. | Medium latency; may lose rich formatting. |
| Interactive Typing Simulation | Char-by-Char Input | Necessary for simulating user typing behavior. | High latency; risky for bulk content. |
| Content with Trigger Characters | HTML Injection or Escaped Input | Prevents input-rule aborts on >, #, etc. | Negligible; adds a processing step. |
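The matrix reduces to a small selection function. The FillContext flags are hypothetical inputs an agent would compute during its own detection step, and the precedence below is one reasonable reading of the table rather than a fixed rule: typing simulation is honored only when the content has no triggers, and the canonical API wins whenever it is reachable because transactions do not run input rules.

```typescript
type FillMethod = 'canonical' | 'html_fallback' | 'char_by_char';

// Hypothetical inputs produced by the agent's detection step.
interface FillContext {
  hasEditorApi: boolean;         // a reachable editor view/instance was found
  simulatingUserTyping: boolean; // the goal is to exercise typing, not to populate content
  hasTriggerCharacters: boolean; // some paragraph starts with >, #, *, [ ...
}

function chooseFillMethod(ctx: FillContext): FillMethod {
  // Interactive typing only when no paragraph can trip an input rule.
  if (ctx.simulatingUserTyping && !ctx.hasTriggerCharacters) return 'char_by_char';
  // Highest fidelity when the editor's API is reachable.
  if (ctx.hasEditorApi) return 'canonical';
  // Unknown editor or trigger-heavy content: bulk injection.
  return 'html_fallback';
}
```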

Configuration Template

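// editor-filler.config.ts
export const EditorFillerConfig = {
  retentionThreshold: 0.85, // minimum retained/expected ratio to accept a fill
  fallbackMethods: ['html_fallback', 'text_paste'], // only html_fallback is implemented in RobustEditorFiller
  verificationDelay: 50, // ms to wait for state updates before verifying
  sanitizeHtml: true, // escape text before building the HTML payload
  logLevel: 'warn', // 'error' | 'warn' | 'info'
};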
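Because the template is a plain object, it is worth validating before use. A minimal sketch that merges caller overrides onto defaults and rejects values that would make verification meaningless; resolveFillerConfig and FillerConfig are hypothetical names, and the defaults mirror the template's values for the four fields checked here.

```typescript
type LogLevel = 'error' | 'warn' | 'info';

interface FillerConfig {
  retentionThreshold: number; // accepted retained/expected ratio, in (0, 1]
  verificationDelay: number;  // ms to wait before reading state back, >= 0
  sanitizeHtml: boolean;
  logLevel: LogLevel;
}

// Defaults mirror the template's values for these four fields.
const DEFAULT_FILLER_CONFIG: FillerConfig = {
  retentionThreshold: 0.85,
  verificationDelay: 50,
  sanitizeHtml: true,
  logLevel: 'warn',
};

// Hypothetical helper: merge overrides onto the defaults and validate the result.
function resolveFillerConfig(overrides: Partial<FillerConfig> = {}): FillerConfig {
  const config: FillerConfig = { ...DEFAULT_FILLER_CONFIG, ...overrides };
  // Negated comparisons also reject NaN and undefined.
  if (!(config.retentionThreshold > 0 && config.retentionThreshold <= 1)) {
    throw new RangeError(`retentionThreshold must be in (0, 1], got ${config.retentionThreshold}`);
  }
  if (!(config.verificationDelay >= 0)) {
    throw new RangeError(`verificationDelay must be >= 0, got ${config.verificationDelay}`);
  }
  return config;
}
```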

Quick Start Guide

  1. Initialize Filler: Instantiate RobustEditorFiller with your configuration.
  2. Select Target: Locate the editor element in the DOM.
  3. Execute Fill: Call await filler.fill(element, content); fill is asynchronous.
  4. Review Result: Check the FillResult object for success status and retention metrics.
  5. Handle Failure: If success is false, log the error and abort the workflow.
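Steps 4 and 5 can be folded into one guard. A minimal sketch, assuming the FillResult shape defined in the implementation section; the assertFilled name and the message format are hypothetical.

```typescript
// Same shape as the FillResult interface in the implementation section.
interface FillResult {
  success: boolean;
  method: 'canonical' | 'html_fallback' | 'failed';
  retainedChars: number;
  expectedChars: number;
}

// Hypothetical guard: log retention metrics, and stop the workflow on failure
// instead of letting a truncated draft reach the submit step.
function assertFilled(result: FillResult): FillResult {
  const percent = result.expectedChars > 0
    ? ((result.retainedChars / result.expectedChars) * 100).toFixed(1)
    : '100.0';
  const summary = `fill via ${result.method}: ${result.retainedChars}/${result.expectedChars} chars (${percent}%)`;
  if (!result.success) {
    console.error(summary);
    throw new Error(`Editor fill failed; aborting before submit. ${summary}`);
  }
  console.info(summary);
  return result;
}
```

In an agent, the call site would look like assertFilled(await filler.fill(editorElement, draft)), where editorElement and draft are your own variables.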

By implementing verification and adaptive fallbacks, you replace silent data loss with failures you can see and recover from, so automation agents can work with rich-text editors reliably regardless of internal configuration or input-rule complexity.