I shipped a free ATS preview inside my paid AI tool. Here's the engineering write-up.
Current Situation Analysis
Most resume optimization tools operate on a reactive or siloed model. Users typically upload a CV to a third-party SaaS (e.g., Jobscan at $49/mo) for a one-shot diagnostic that is completely disconnected from the actual application workflow. The critical failure mode is timing: applicants only discover their CV broke parsing after receiving a rejection email. By then, downstream AI-generated assets (cover letters, fit scores, interview prep) are already built on corrupted or truncated text. Traditional methods fail because they treat ATS compliance as a post-hoc validation step rather than a foundational input gate, forcing users to either pay for external subscriptions or gamble on parse integrity before spending AI tokens.
WOW Moment: Key Findings
| Approach | Client Latency | Bundle Impact | Privacy & Security |
|---|---|---|---|
| Traditional SaaS ATS (e.g., Jobscan) | 2–5s (server round-trip) | N/A (external) | Low (CV bytes sent to third-party) |
| Server-Side AI Pre-Scan | 1–3s (API overhead) | N/A (backend only) | Medium (bytes transit to backend) |
| In-Product Client-Side Preview (Vantage) | <200ms (pure JS) | +11 kB minified / +4.4 kB gzipped | High (bytes never leave browser) |
Key Findings:
- Zero-Token Gate: The preview executes before any AI token consumption, preventing wasted credits on unparseable CVs.
- Vendor-Specific Signal Mapping: Five major ATS vendors (Workday, Greenhouse, Lever, Taleo, iCIMS) are covered using pure client-side heuristics without external API calls.
- Bundle Efficiency: Client-side parsing adds only 4.4 kB gzipped to the main chunk, avoiding the ~300 kB overhead of pdfjs-dist by strategically deferring PDF handling to the paid flow.
Core Solution
The architecture is built around four strict constraints: run before token expenditure, add zero new dependencies, remain deletable in 5 lines, and stay entirely client-side for privacy.
1. Signal Computation & Vendor Heuristics
The lint engine ports pure functions from the open-source CV Mirror project. It computes structural signals directly from plain text without I/O:
```typescript
export function computeSignals(text: string, fileSize: number): Signals {
  const lines = text.split('\n');
  const nonEmpty = lines.filter((l) => l.trim().length > 0);
  const wordCount = (text.match(/\b\w+\b/g) || []).length;
  // Multi-column heuristic: lines with a 5+ space gap
  const multiColumnLines = nonEmpty.filter((l) => /\S {5,}\S/.test(l)).length;
  const multiColumnRatio = nonEmpty.length > 0 ? multiColumnLines / nonEmpty.length : 0;
  // Words per kilobyte of file size; guarded against a zero-byte file
  const wordsPerKB = fileSize > 0 ? wordCount / (fileSize / 1024) : 0;
  // Standalone lines such as "Page 2" or "Page 2 of 5"
  const hasHeaderFooterLikeText = /^\s*page \d+( of \d+)?\s*$/im.test(text);
  // Code points in U+1F300..U+1FAFF and U+2600..U+27BF
  const hasEmoji = /[\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]/u.test(text);
  // Curly single and double quotes
  const hasSmartQuotes = /[‘’“”]/.test(text);
  // ... etc
}
```
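To make the multi-column heuristic concrete, here is a small standalone sketch that applies the same regex to a made-up two-column snippet. The helper name `multiColumnRatioOf` and the sample text are illustrative assumptions, not part of the shipped code.

```typescript
// Standalone sketch of the multi-column heuristic from computeSignals.
// `multiColumnRatioOf` and the sample CV text are hypothetical.
function multiColumnRatioOf(text: string): number {
  const nonEmpty = text.split('\n').filter((l) => l.trim().length > 0);
  if (nonEmpty.length === 0) return 0;
  // Same test as above: a run of 5+ spaces between two non-space characters.
  const gapped = nonEmpty.filter((l) => /\S {5,}\S/.test(l)).length;
  return gapped / nonEmpty.length;
}

// Gaps are built with repeat() so the spacing is explicit.
const twoColumnSample = [
  'Jane Doe',
  'Skills' + ' '.repeat(10) + 'Experience',
  'TypeScript' + ' '.repeat(6) + 'Acme Corp, 2021 to 2024',
  '',
  'Education',
].join('\n');

const sampleRatio = multiColumnRatioOf(twoColumnSample);
// 2 of the 4 non-empty lines contain a wide gap, so the ratio is 0.5.
console.log(sampleRatio);
```

A ratio of 0.5 would sit well above both the 15% and 20% thresholds used by the vendor rules, while an ordinary single-column CV should score at or near zero.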
These signals feed into five vendor rule sets:
- Workday: Flags multiColumnRatio > 15% as ERROR.
- Greenhouse: Flags hasEmoji as WARN (strips codepoints, losing surrounding context).
- Lever: Flags missing standard section headers as ERROR (parser uses headers to delimit sections).
- Taleo: Flags ISO-style dates as WARN (prefers Month-Year format).
- iCIMS: Flags multiColumnRatio > 20% as ERROR.
Every rule cites public vendor documentation, maintaining transparency and auditability.
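One way to picture the signal-to-vendor mapping is as a flat table of rules, each with a predicate over the signals. The sketch below is an assumption about shape only: the thresholds and severities are the ones listed above, but the type names, the `lintSignals` function, and the `hasStandardSectionHeaders` and `hasIsoDates` signals are hypothetical stand-ins for whatever the real engine computes.

```typescript
// Sketch of mapping signals to vendor findings. Thresholds and severities come
// from the list above; all names and the two extra signals are assumptions.
type Severity = 'ERROR' | 'WARN';

interface SignalsSketch {
  multiColumnRatio: number; // 0..1, as computed by computeSignals
  hasEmoji: boolean;
  hasStandardSectionHeaders: boolean; // hypothetical signal name
  hasIsoDates: boolean; // hypothetical signal name
}

interface VendorRule {
  vendor: string;
  severity: Severity;
  message: string;
  fails: (s: SignalsSketch) => boolean;
}

const vendorRules: VendorRule[] = [
  { vendor: 'Workday', severity: 'ERROR', message: 'Multi-column ratio above 15%', fails: (s) => s.multiColumnRatio > 0.15 },
  { vendor: 'Greenhouse', severity: 'WARN', message: 'Emoji present', fails: (s) => s.hasEmoji },
  { vendor: 'Lever', severity: 'ERROR', message: 'Missing standard section headers', fails: (s) => !s.hasStandardSectionHeaders },
  { vendor: 'Taleo', severity: 'WARN', message: 'ISO-style dates', fails: (s) => s.hasIsoDates },
  { vendor: 'iCIMS', severity: 'ERROR', message: 'Multi-column ratio above 20%', fails: (s) => s.multiColumnRatio > 0.2 },
];

interface Finding {
  vendor: string;
  severity: Severity;
  message: string;
}

// Pure function: no I/O, just evaluate every rule against the signals.
function lintSignals(s: SignalsSketch): Finding[] {
  return vendorRules
    .filter((rule) => rule.fails(s))
    .map(({ vendor, severity, message }) => ({ vendor, severity, message }));
}

const sampleFindings = lintSignals({
  multiColumnRatio: 0.18,
  hasEmoji: true,
  hasStandardSectionHeaders: true,
  hasIsoDates: false,
});
// Workday (0.18 > 0.15) and Greenhouse (emoji) fire; iCIMS (0.18 <= 0.20) does not.
console.log(sampleFindings.map((f) => `${f.vendor}:${f.severity}`));
```

Keeping the rules as data rather than branching logic means a threshold change or a new vendor is a one-line edit, which fits the auditability goal.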
2. Strategic PDF Exclusion
PDF is the most common CV format, but pdfjs-dist adds ~300 kB minified—roughly a third of the main bundle. The trade-off: DOCX and TXT are supported inline using mammoth (already bundled for the paid flow) and File.text(). PDF uploads trigger a defer message: "Upload a DOCX version for the instant preview — your full Vantage analysis still works with PDFs." This preserves bundle size while keeping the preview instant.
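The routing decision can be sketched as a pure function on the file name, with text extraction layered on top. Everything named here (`routeUpload`, `extractPreviewText`, `UploadLike`, the route labels) is hypothetical; the DOCX converter is injected so the caller can lazy-load it, for example through a dynamic import of mammoth and its `extractRawText({ arrayBuffer })` call. Matching on extension rather than MIME type is also an assumption.

```typescript
// Sketch of the upload routing described above. Names are hypothetical; only
// the DOCX/TXT/PDF behaviour comes from the write-up.
type PreviewRoute = 'docx' | 'txt' | 'pdf-deferred' | 'unsupported';

function routeUpload(fileName: string): PreviewRoute {
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';
  if (ext === 'docx') return 'docx';
  if (ext === 'txt') return 'txt';
  if (ext === 'pdf') return 'pdf-deferred'; // UI shows the defer message instead
  return 'unsupported';
}

// Minimal structural type; a browser File satisfies it.
interface UploadLike {
  name: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Returns plain text for the lint engine, or null when no instant preview applies.
// `docxToText` is supplied by the caller so the DOCX library loads only on demand.
async function extractPreviewText(
  file: UploadLike,
  docxToText: (buf: ArrayBuffer) => Promise<string>,
): Promise<string | null> {
  switch (routeUpload(file.name)) {
    case 'txt':
      return file.text();
    case 'docx':
      return docxToText(await file.arrayBuffer());
    default:
      return null; // PDF and unknown types skip the preview
  }
}
```

Because PDFs never reach a parser on this path, the preview bundle does not need pdfjs-dist at all.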
3. Dashboard Integration & Rollback Safety
The feature is 100% additive. Only two surgical edits were made to Dashboard.tsx: one import line and the fenced block below.
```tsx
{/* === ATS scanner (additive, free, client-side). Removing this and the
    import line restores the previous behaviour entirely. === */}
{cvFile && <AtsScannerSection cvFile={cvFile} />}
{/* === END ATS scanner === */}
```
Rollback requires git revert <hash> and deleting two new files. Existing tests, types, services, contexts, and routes remain untouched.
4. Bundle & QA Validation
- Main chunk: 1,154.70 kB → 1,165.85 kB (+11 kB minified)
- Gzipped: 308.20 kB → 312.56 kB (+4.4 kB gzipped)
- New dependencies: Zero
- mammoth is lazy-imported only on DOCX upload, ensuring cost is paid only by active users.
- Pre-ship audit caught two UI inconsistencies: passCount misalignment with warning states, and inherited text tone conflicting with icon color. Both resolved via deterministic isClean flags and strict errors === 0 && warns === 0 logic.
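The pass-count fix can be illustrated with a short sketch in which the headline count and the pill colour both derive from the same error and warning counts. The types and the `pillTone` and `summarize` helpers are hypothetical; only the `errors === 0 && warns === 0` condition and the `isClean` name come from the write-up, and `isClean` is modelled here as a function rather than a stored flag.

```typescript
// Sketch of aligning the headline pass count with per-vendor pills.
// Type and helper names are assumptions for illustration.
interface VendorResult {
  vendor: string;
  errors: number;
  warns: number;
}

type PillTone = 'clean' | 'warn' | 'error';

// A vendor counts as a pass only with neither errors nor warnings.
function isClean(r: VendorResult): boolean {
  return r.errors === 0 && r.warns === 0;
}

// Tone derives from the same counts, so headline and pills cannot disagree.
function pillTone(r: VendorResult): PillTone {
  if (r.errors > 0) return 'error';
  return r.warns > 0 ? 'warn' : 'clean';
}

function summarize(results: VendorResult[]) {
  return {
    passCount: results.filter(isClean).length,
    total: results.length,
    tones: results.map(pillTone),
  };
}

const demo = summarize([
  { vendor: 'Workday', errors: 0, warns: 0 },
  { vendor: 'Greenhouse', errors: 0, warns: 1 },
  { vendor: 'Lever', errors: 0, warns: 0 },
  { vendor: 'Taleo', errors: 0, warns: 1 },
  { vendor: 'iCIMS', errors: 0, warns: 0 },
]);
// Two vendors carry warnings, so the headline reads 3/5, matching three clean pills.
console.log(`${demo.passCount}/${demo.total}`);
```

Counting only `errors === 0` here would have produced a 5/5 headline next to three clean pills, which is the contradiction the audit flagged.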
Pitfall Guide
- PDF Parsing Bundle Bloat: Importing pdfjs-dist client-side adds ~300 kB, severely impacting TTI. Best Practice: Defer PDF handling to server-side or paid flows; use mammoth/File.text() for instant, lightweight previews.
- Metric/UI State Misalignment: Counting "passes" without accounting for warnings creates contradictory UI states (e.g., "5/5" headline with only 3 green pills). Best Practice: Align pass logic strictly with errors === 0 && warns === 0 and derive UI classes deterministically.
- Ignoring Vendor-Specific Heuristics: Treating all ATS parsers as identical causes false positives/negatives. Best Practice: Map raw signals to specific vendor rules (e.g., Workday multi-column >15%, Greenhouse emoji stripping) using documented parsing behaviors.
- Tight Coupling with Paid Token Flow: Running previews after token consumption wastes user credits on broken CVs. Best Practice: Gate the preview before token expenditure, keep it purely client-side, and treat it as a foundational input validation step.
- Irreversible Feature Integration: Adding features that touch core contexts/routes makes rollback painful and risky. Best Practice: Use additive-only components with clearly fenced comments and isolated imports to enable instant git revert + file deletion without touching existing architecture.
Deliverables
- 📐 Blueprint: Client-Side ATS Preview Architecture & Vendor Heuristic Mapping (PDF) — Details the signal computation pipeline, lazy-loading strategy, and zero-dependency integration pattern.
- ✅ Checklist: Pre-Shipping Audit & Rollback Verification — Validates bundle impact thresholds, metric/UI alignment, vendor rule coverage, and git revert safety procedures.
- ⚙️ Configuration Templates: Signal Computation Config & Vendor Rule Mapping Table — JSON/TS templates for extending heuristic thresholds, adding new vendor parsers, and configuring pass/warn/error states without modifying core logic.
