ons are meaningless without statistical grounding. The pipeline enforces minimum interview counts before promoting signals to decision-ready status.
4. Immutable Transcript Storage: Source transcripts are never mutated. All extractions reference original timestamps and speaker attribution for audit trails.
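One way to keep source transcripts immutable is to freeze them on ingestion and have extractions point at a turn instead of copying or editing it. The following is a minimal sketch under that assumption; `TranscriptTurn`, `StoredTranscript`, `TurnReference`, `freezeTranscript`, and `resolveTurn` are hypothetical names that do not appear in the pipeline.

```typescript
// Hypothetical record shapes, for illustration only.
interface TranscriptTurn {
  readonly speaker: string;
  readonly timestamp: string; // position in the recording, e.g. "00:01:23"
  readonly text: string;
}

interface StoredTranscript {
  readonly interviewId: string;
  readonly turns: readonly TranscriptTurn[];
}

// Freeze the transcript and each turn so accidental writes are rejected at runtime,
// in addition to the compile-time `readonly` checks.
function freezeTranscript(interviewId: string, turns: TranscriptTurn[]): StoredTranscript {
  const frozenTurns = Object.freeze(turns.map(t => Object.freeze({ ...t })));
  return Object.freeze({ interviewId, turns: frozenTurns });
}

// An extraction references a turn by index; the turn itself is never modified.
interface TurnReference {
  readonly interviewId: string;
  readonly turnIndex: number;
}

function resolveTurn(store: StoredTranscript, ref: TurnReference): TranscriptTurn | undefined {
  return ref.interviewId === store.interviewId ? store.turns[ref.turnIndex] : undefined;
}

const stored = freezeTranscript('int-001', [
  { speaker: 'Customer', timestamp: '00:01:23', text: 'We already have budget approved.' }
]);
```

Resolving a reference returns the original speaker and timestamp, which is what an audit trail needs.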
Implementation (TypeScript)
The following implementation demonstrates a production-ready extraction pipeline. It uses functional composition, explicit configuration, and deterministic pattern matching.
```typescript
// types.ts
export type SignalCategory = 'objection' | 'feature_request' | 'emotional' | 'buying_intent' | 'pattern';

export interface ExtractionConfig {
  minInterviews: {
    investigate: number;
    prioritize: number;
    confirm: number;
  };
  patterns: Record<SignalCategory, RegExp[]>;
}

export interface ExtractedSignal {
  id: string;
  category: SignalCategory;
  rawText: string;
  speaker: string;
  timestamp: string;
  interviewId: string;
  confidence: number;
}

export interface AnalysisResult {
  signals: ExtractedSignal[];
  thresholds: {
    investigate: ExtractedSignal[];
    prioritize: ExtractedSignal[];
    confirm: ExtractedSignal[];
  };
  metadata: {
    totalInterviews: number;
    extractionDate: string;
  };
}
```
```typescript
// pipeline.ts
import { v4 as uuidv4 } from 'uuid';
import type {
  AnalysisResult,
  ExtractedSignal,
  ExtractionConfig,
  SignalCategory
} from './types';

export class InterviewSignalExtractor {
  private config: ExtractionConfig;

  constructor(config: ExtractionConfig) {
    this.config = config;
  }

  public async processTranscript(
    transcript: string,
    interviewId: string,
    speakerMap: Record<string, string>
  ): Promise<ExtractedSignal[]> {
    const lines = transcript.split('\n').filter(l => l.trim().length > 0);
    const signals: ExtractedSignal[] = [];

    for (const line of lines) {
      // Expects "Speaker: Text"; everything after the first colon is the utterance.
      const [speakerRaw, ...textParts] = line.split(':');
      const speaker = speakerMap[speakerRaw.trim()] || speakerRaw.trim();
      const text = textParts.join(':').trim();
      if (!text) continue;

      // Pass 1: Objections
      this.matchPattern(text, 'objection', signals, speaker, interviewId);
      // Pass 2: Feature Requests
      this.matchPattern(text, 'feature_request', signals, speaker, interviewId);
      // Pass 3: Emotional Signals
      this.matchPattern(text, 'emotional', signals, speaker, interviewId);
      // Pass 4: Buying Intent
      this.matchPattern(text, 'buying_intent', signals, speaker, interviewId);
    }
    return signals;
  }

  private matchPattern(
    text: string,
    category: SignalCategory,
    signals: ExtractedSignal[],
    speaker: string,
    interviewId: string
  ): void {
    // A category may have no patterns configured (e.g. 'pattern').
    const regexList = this.config.patterns[category] ?? [];
    for (const regex of regexList) {
      // Reset state so regexes created with the g or y flag behave consistently.
      regex.lastIndex = 0;
      if (regex.test(text)) {
        signals.push({
          id: uuidv4(),
          category,
          rawText: text,
          speaker,
          // Extraction time. Transcript timestamps are not parsed here, so this
          // is not the moment the line was spoken.
          timestamp: new Date().toISOString(),
          interviewId,
          confidence: this.calculateConfidence(category, text)
        });
        break; // One signal per category per line
      }
    }
  }

  private calculateConfidence(category: SignalCategory, text: string): number {
    // Base confidence by category, adjusted by explicitness.
    // No base is specified for 'pattern'; 0.5 is an assumed neutral default.
    const base: Record<SignalCategory, number> = {
      objection: 0.6,
      feature_request: 0.5,
      emotional: 0.4,
      buying_intent: 0.8,
      pattern: 0.5
    };
    // Boost for explicit commercial markers (case-insensitive)
    const explicitBoost = /budget|pay|cost|switch|immediately|today|months/i.test(text) ? 0.15 : 0;
    return Math.min(base[category] + explicitBoost, 1.0);
  }

  public aggregateResults(
    allSignals: ExtractedSignal[],
    totalInterviews: number
  ): AnalysisResult {
    // Group by category plus the first 50 characters of the lowercased text.
    const grouped = new Map<string, ExtractedSignal[]>();
    allSignals.forEach(sig => {
      const key = `${sig.category}::${sig.rawText.toLowerCase().slice(0, 50)}`;
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key)!.push(sig);
    });

    // Thresholds are interview counts, so count distinct interviews per group
    // rather than raw signal occurrences.
    const distinctInterviews = (group: ExtractedSignal[]) =>
      new Set(group.map(sig => sig.interviewId)).size;

    const thresholdMap = (min: number) =>
      Array.from(grouped.values())
        .filter(group => distinctInterviews(group) >= min)
        .flat();

    return {
      signals: allSignals,
      thresholds: {
        investigate: thresholdMap(this.config.minInterviews.investigate),
        prioritize: thresholdMap(this.config.minInterviews.prioritize),
        confirm: thresholdMap(this.config.minInterviews.confirm)
      },
      metadata: {
        totalInterviews,
        extractionDate: new Date().toISOString()
      }
    };
  }
}
```
Why This Architecture Works
- Deterministic Pattern Matching: Regex arrays per category ensure consistent extraction across interviews. No subjective interpretation during the pass phase.
- Confidence Scoring: Commercial intent signals receive higher base confidence. Explicit markers (`budget`, `switch`, `immediately`) trigger automatic boosts, aligning with the source's definition of buying signals.
- Threshold Enforcement: The `aggregateResults` method enforces the 3/5/7 interview rule programmatically. Signals below the investigate threshold appear in none of the returned tiers, which keeps them out of roadmap decisions until more evidence accumulates.
- Auditability: Every extracted signal retains `interviewId`, `speaker`, and `rawText`. Product managers can trace any prioritized item back to the exact transcript moment.
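The threshold and quarantine behavior described in these points can be sketched as an explicit tier assignment. This is a minimal sketch, not the pipeline's own code: `Tier`, `TierThresholds`, `SignalRef`, `assignTier`, and `tierGroups` are hypothetical names, and it assumes each signal group is counted by distinct interviews and placed in exactly one tier, whereas `aggregateResults` returns cumulative lists.

```typescript
type Tier = 'quarantined' | 'investigate' | 'prioritize' | 'confirm';

interface TierThresholds {
  investigate: number;
  prioritize: number;
  confirm: number;
}

// Hypothetical minimal view of a signal: which group it belongs to and where it came from.
interface SignalRef {
  groupKey: string;
  interviewId: string;
}

// Highest tier whose minimum interview count is met; otherwise quarantined.
function assignTier(distinctInterviews: number, t: TierThresholds): Tier {
  if (distinctInterviews >= t.confirm) return 'confirm';
  if (distinctInterviews >= t.prioritize) return 'prioritize';
  if (distinctInterviews >= t.investigate) return 'investigate';
  return 'quarantined';
}

// Count distinct interviews per group, then map each group to a single tier.
function tierGroups(refs: SignalRef[], t: TierThresholds): Map<string, Tier> {
  const interviewsByGroup = new Map<string, Set<string>>();
  for (const ref of refs) {
    const seen = interviewsByGroup.get(ref.groupKey) ?? new Set<string>();
    seen.add(ref.interviewId);
    interviewsByGroup.set(ref.groupKey, seen);
  }
  const tiers = new Map<string, Tier>();
  interviewsByGroup.forEach((interviews, key) => {
    tiers.set(key, assignTier(interviews.size, t));
  });
  return tiers;
}

const exampleTiers = tierGroups(
  [
    { groupKey: 'pricing', interviewId: 'int-1' },
    { groupKey: 'pricing', interviewId: 'int-2' },
    { groupKey: 'pricing', interviewId: 'int-3' },
    { groupKey: 'export', interviewId: 'int-1' },
    { groupKey: 'export', interviewId: 'int-1' } // same interview twice counts once
  ],
  { investigate: 3, prioritize: 5, confirm: 7 }
);
```

Here `pricing` reaches the investigate tier with three distinct interviews, while `export` stays quarantined because both mentions come from the same interview.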
Pitfall Guide
Qualitative analysis pipelines fail when teams skip structural discipline. The following mistakes consistently degrade signal quality in production environments.
| Pitfall | Explanation | Fix |
|---|---|---|
| Conflating Feature Requests with Purchase Intent | Teams treat "I wish it had X" as equivalent to "I'd pay for X". Feature requests indicate workflow gaps; buying signals indicate commercial readiness. Mixing them inflates backlog priority with non-commercial items. | Enforce strict category separation during extraction. Route feature requests to product discovery, buying signals to pricing/GTM validation. |
| Ignoring Emotional Valence | Emotional markers (frustrated, finally, tired of) often precede or follow commercial intent. Stripping them removes context needed to weight signal urgency. | Run the emotional pass before buying intent. Cross-reference timestamps to attach emotional context to commercial markers during aggregation. |
| Premature Pattern Aggregation | Grouping signals after 1-2 interviews creates false confidence. Qualitative data requires minimum sample sizes before pattern recognition becomes statistically meaningful. | Implement hard thresholds in the pipeline. Quarantine signals below investigate count. Require secondary validation (surveys, usage data) before promotion. |
| Over-Indexing on Vocal Minorities | Highly articulate users dominate note-taking. Their signals get weighted equally with quiet majority feedback, skewing roadmap decisions. | Track speaker frequency and signal distribution. Apply weighting algorithms or require cross-user validation before prioritization. |
| Skipping the Commercial Intent Pass | Analysts assume buying signals will "naturally surface". They rarely do. Without a dedicated pass, commercial intent gets buried under feature noise. | Mandate Pass 4 in every workflow. Treat it as non-negotiable. Use explicit regex patterns for budget, switching, and urgency markers. |
| Treating Transcripts as Flat Text | Raw transcripts lack speaker attribution, timestamps, and turn boundaries. Flat processing loses conversational context and makes audit trails impossible. | Preprocess transcripts with diarization tools. Enforce `Speaker: Text` formatting before pipeline ingestion. Store raw and processed versions separately. |
| Static Configuration Drift | Regex patterns and thresholds become outdated as product positioning or market language evolves. Unversioned configs cause extraction inconsistency. | Version control extraction configs. Implement CI checks for pattern coverage. Review and update thresholds quarterly with GTM and product leads. |
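The "track speaker frequency and signal distribution" fix from the table can be sketched as a share calculation per participant. This is an illustrative sketch only: `AttributedSignal`, `speakerShares`, and `dominantSpeakers` are hypothetical names, the `maxShare` cutoff is an assumed parameter, and speakers are keyed by interview plus label on the assumption that labels such as "Customer" repeat across interviews.

```typescript
// Hypothetical minimal view of a signal with attribution.
interface AttributedSignal {
  speaker: string;
  interviewId: string;
}

// Fraction of all signals contributed by each participant ("interviewId/speaker").
function speakerShares(signals: AttributedSignal[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of signals) {
    const who = `${s.interviewId}/${s.speaker}`;
    counts.set(who, (counts.get(who) ?? 0) + 1);
  }
  const shares = new Map<string, number>();
  counts.forEach((count, who) => shares.set(who, count / signals.length));
  return shares;
}

// Participants whose share of signals exceeds maxShare (an assumed cutoff, e.g. 0.5).
function dominantSpeakers(signals: AttributedSignal[], maxShare: number): string[] {
  const dominant: string[] = [];
  speakerShares(signals).forEach((share, who) => {
    if (share > maxShare) dominant.push(who);
  });
  return dominant;
}

const sampleSignals: AttributedSignal[] = [
  { speaker: 'Dana', interviewId: 'int-1' },
  { speaker: 'Dana', interviewId: 'int-1' },
  { speaker: 'Dana', interviewId: 'int-1' },
  { speaker: 'Lee', interviewId: 'int-2' },
  { speaker: 'Sam', interviewId: 'int-3' }
];
const flagged = dominantSpeakers(sampleSignals, 0.5);
```

A flagged participant is a prompt for cross-user validation before prioritization, not a reason to discard their signals.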
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| < 10 interviews (early validation) | Manual multi-pass extraction | Low volume justifies human review; avoids pipeline overhead | Minimal (analyst time only) |
| 10-50 interviews (growth stage) | Semi-automated pipeline + human validation | Regex extraction handles volume; humans verify commercial intent context | Moderate (dev time + analyst review) |
| 50+ interviews (scale/enterprise) | Fully automated pipeline with confidence scoring | High volume requires deterministic extraction; thresholds prevent noise | Higher (infrastructure + maintenance) |
| Compliance/audit-heavy environments | Immutable storage + full attribution pipeline | Regulatory requirements demand traceability and versioned configs | High (storage + audit tooling) |
Configuration Template
```json
{
  "version": "1.2.0",
  "minInterviews": {
    "investigate": 3,
    "prioritize": 5,
    "confirm": 7
  },
  "patterns": {
    "objection": [
      "\\b(hesitant|concern|worried|risk|unclear|not sure)\\b",
      "\\b(need to think|talk to team|wait for)\\b"
    ],
    "feature_request": [
      "\\b(wish|would be great|need|missing|add|support)\\b",
      "\\b(currently do this manually|workaround|hack)\\b"
    ],
    "emotional": [
      "\\b(frustrated|tired|finally|excited|love|hate|confused|overwhelmed)\\b",
      "\\b(honestly|seriously|unbelievable|game changer)\\b"
    ],
    "buying_intent": [
      "\\b(budget|approved|allocated|pay|cost|price)\\b",
      "\\b(switch|migrate|replace|immediately|today|months)\\b",
      "\\b(willing to|sign up|commit|purchase|subscribe)\\b"
    ]
  },
  "confidence": {
    "base": {
      "objection": 0.6,
      "feature_request": 0.5,
      "emotional": 0.4,
      "buying_intent": 0.8
    },
    "explicitBoost": 0.15,
    "explicitMarkers": ["budget", "pay", "switch", "immediately", "today", "months"]
  }
}
```
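The template stores patterns as JSON strings, while the extractor's `ExtractionConfig` expects `RegExp` arrays, so a loading step has to compile them. The following is a minimal sketch of such a step; `RawConfig`, `CompiledConfig`, `CATEGORIES`, and `compileConfig` are hypothetical names, the case-insensitive flag is an assumption, and categories missing from the file (such as `pattern`) are assumed to get an empty list.

```typescript
type Category = 'objection' | 'feature_request' | 'emotional' | 'buying_intent' | 'pattern';

// Shape of the parsed JSON file (only the fields this sketch uses).
interface RawConfig {
  minInterviews: { investigate: number; prioritize: number; confirm: number };
  patterns: Partial<Record<Category, string[]>>;
}

interface CompiledConfig {
  minInterviews: RawConfig['minInterviews'];
  patterns: Record<Category, RegExp[]>;
}

const CATEGORIES: Category[] = ['objection', 'feature_request', 'emotional', 'buying_intent', 'pattern'];

function compileConfig(raw: RawConfig): CompiledConfig {
  const { investigate, prioritize, confirm } = raw.minInterviews;
  if (!(investigate <= prioritize && prioritize <= confirm)) {
    throw new Error('minInterviews must satisfy investigate <= prioritize <= confirm');
  }
  const patterns = {} as Record<Category, RegExp[]>;
  for (const category of CATEGORIES) {
    // 'i' makes matching case-insensitive; omitting 'g' keeps RegExp.test() stateless.
    patterns[category] = (raw.patterns[category] ?? []).map(src => new RegExp(src, 'i'));
  }
  return { minInterviews: raw.minInterviews, patterns };
}

// Equivalent to what JSON.parse would yield for a trimmed-down template.
const compiled = compileConfig({
  minInterviews: { investigate: 3, prioritize: 5, confirm: 7 },
  patterns: {
    buying_intent: ['\\b(budget|approved|allocated|pay|cost|price)\\b']
  }
});
```

An invalid pattern string makes `new RegExp` throw a `SyntaxError`, so a broken config fails at load time instead of silently matching nothing.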
Quick Start Guide
- Format your transcripts: Convert raw recordings into `Speaker: Text` lines with timestamps. Use tools like Otter.ai, Rev, or Whisper for transcription and diarization if needed.
- Initialize the pipeline: Import the `InterviewSignalExtractor` class, load the configuration template, compile its pattern strings into `RegExp` objects, and instantiate the class with your project-specific regex arrays.
- Run extraction passes: Call `processTranscript()` for each interview. The pipeline automatically executes all four semantic passes and returns structured `ExtractedSignal` records.
- Aggregate and threshold: Pass all signals into `aggregateResults()` with your total interview count. The method returns investigate, prioritize, and confirm tiers based on your configured thresholds; signals that reach no tier remain only in the full `signals` list and should be treated as quarantined.
- Route to workflow: Export `prioritize` and `confirm` signals to your product management tool. Attach original transcript references for stakeholder review and pricing experiment design.
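The formatting step at the top of this list can be sketched as a small line parser. This is a sketch under an assumed input format of `[HH:MM:SS] Speaker: Text` with the bracketed timestamp optional; the format, `ParsedTurn`, `TURN_PATTERN`, `parseTurn`, and `parseTranscript` are all illustrative assumptions, not part of the pipeline.

```typescript
interface ParsedTurn {
  timestamp: string | null; // null when the line carries no "[HH:MM:SS]" prefix
  speaker: string;
  text: string;
}

// Optional "[MM:SS]" or "[HH:MM:SS]" prefix, a speaker label up to the first colon,
// then the utterance (which may itself contain colons).
const TURN_PATTERN = /^(?:\[(\d{1,2}:\d{2}(?::\d{2})?)\]\s*)?([^:]+):\s*(.+)$/;

function parseTurn(line: string): ParsedTurn | null {
  const match = TURN_PATTERN.exec(line.trim());
  if (!match) return null;
  return {
    timestamp: match[1] ?? null,
    speaker: (match[2] ?? '').trim(),
    text: (match[3] ?? '').trim()
  };
}

// Parse every non-matching-tolerant line of a transcript; lines that do not fit
// the assumed format (blank lines, stray notes without a colon) are skipped.
function parseTranscript(transcript: string): ParsedTurn[] {
  const turns: ParsedTurn[] = [];
  for (const line of transcript.split(/\r?\n/)) {
    const turn = parseTurn(line);
    if (turn) turns.push(turn);
  }
  return turns;
}

const parsedTurns = parseTranscript(
  [
    '[00:01:23] Customer: We pay $40 a seat today: too much.',
    '',
    'Interviewer: What would you switch to?'
  ].join('\n')
);
```

Separating the timestamp before splitting on the colon avoids mistaking `00:01:23` for a speaker label, and the parsed timestamp can then be carried into each extracted signal for audit purposes.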
This framework transforms unstructured conversation data into prioritization-ready signals. By enforcing sequential extraction, deterministic thresholds, and audit-ready attribution, teams eliminate guesswork from qualitative analysis and align product decisions with verified commercial intent.