gorithmic analysis, frequency mapping, and main-thread UI synchronization. The following architecture uses AudioWorklet for deterministic processing, a YIN-inspired periodicity detector for robustness, and hysteresis smoothing to prevent note flickering.
Step 1: Audio Thread Setup
AudioWorklet is the recommended mechanism for real-time analysis in modern browsers. Unlike the deprecated ScriptProcessorNode, whose callback runs on the main thread, worklets run on the audio rendering thread and communicate with the main thread via a MessagePort.
// pitch-worklet.ts
// Runs inside AudioWorkletGlobalScope, which exposes `sampleRate` (the owning
// context's rate) as a global, so the rate is not hardcoded here.
class PitchDetectorWorklet extends AudioWorkletProcessor {
  private buffer: Float32Array;
  private bufferSize: number;
  private writeIndex: number;

  constructor() {
    super();
    this.bufferSize = 1024;
    this.buffer = new Float32Array(this.bufferSize);
    this.writeIndex = 0;
  }

  process(inputs: Float32Array[][], outputs: Float32Array[][], parameters: Record<string, Float32Array>): boolean {
    const input = inputs[0]?.[0];
    if (!input) return true;

    // Each call delivers one render quantum (typically 128 frames), so
    // accumulate frames and analyze only once the buffer is full.
    for (let i = 0; i < input.length; i++) {
      this.buffer[this.writeIndex++] = input[i];
      if (this.writeIndex < this.bufferSize) continue;
      this.writeIndex = 0;

      // Run periodicity detection on the full buffer
      const frequency = this.detectPeriodicity(this.buffer, sampleRate);

      // Send result to main thread
      if (frequency > 0) {
        this.port.postMessage({ type: 'pitch', frequency });
      }
    }
    return true;
  }

  private detectPeriodicity(signal: Float32Array, sr: number): number {
    // Simplified YIN-inspired difference function
    const halfSize = Math.floor(signal.length / 2);
    const difference = new Float32Array(halfSize);
    for (let tau = 1; tau < halfSize; tau++) {
      let sum = 0;
      for (let i = 0; i < halfSize; i++) {
        const delta = signal[i] - signal[i + tau];
        sum += delta * delta;
      }
      difference[tau] = sum;
    }

    // Cumulative mean normalization
    let runningSum = 0;
    difference[0] = 1;
    for (let tau = 1; tau < halfSize; tau++) {
      runningSum += difference[tau];
      difference[tau] *= tau / runningSum;
    }

    // Find first dip below threshold, then walk down to its local minimum
    const threshold = 0.15;
    let tauEstimate = -1;
    for (let tau = 2; tau < halfSize; tau++) {
      if (difference[tau] < threshold) {
        while (tau + 1 < halfSize && difference[tau + 1] < difference[tau]) {
          tau++;
        }
        tauEstimate = tau;
        break;
      }
    }
    if (tauEstimate === -1) return 0; // no periodic component found
    return sr / tauEstimate;
  }
}

registerProcessor('pitch-detector-worklet', PitchDetectorWorklet);
Step 2: Main Thread Orchestration
The main thread handles permission requests, context lifecycle, and UI updates. It must never block the audio thread.
// audio-engine.ts
export class PitchExtractionEngine {
  private context: AudioContext | null = null;
  private stream: MediaStream | null = null;
  private workletNode: AudioWorkletNode | null = null;
  private onPitchUpdate: (freq: number) => void;

  constructor(callback: (freq: number) => void) {
    this.onPitchUpdate = callback;
  }

  async initialize(): Promise<void> {
    if (typeof AudioContext === 'undefined' || typeof AudioWorkletNode === 'undefined') {
      throw new Error('Web Audio API with AudioWorklet is not supported');
    }
    // Requests a 48 kHz context; the browser resamples if the hardware rate differs
    this.context = new AudioContext({ sampleRate: 48000 });
    this.stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // A context created before a user gesture may start suspended
    if (this.context.state === 'suspended') {
      await this.context.resume();
    }
    const source = this.context.createMediaStreamSource(this.stream);
    await this.context.audioWorklet.addModule('pitch-worklet.js');
    this.workletNode = new AudioWorkletNode(this.context, 'pitch-detector-worklet');
    source.connect(this.workletNode);
    // Do not connect to destination to avoid feedback loop
    this.workletNode.port.onmessage = (event) => {
      if (event.data.type === 'pitch') {
        this.onPitchUpdate(event.data.frequency);
      }
    };
  }

  stop(): void {
    this.workletNode?.disconnect();
    this.stream?.getTracks().forEach(track => track.stop());
    void this.context?.close(); // close() returns a promise; nothing to await here
    this.workletNode = null;
    this.stream = null;
    this.context = null;
  }
}
Step 3: Frequency-to-Note Mapping
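Equal temperament spaces semitones logarithmically: each semitone multiplies the frequency by 2^(1/12), so the MIDI note number is 69 + 12 * log2(f / 440). The conversion uses Math.log2 directly rather than dividing natural logarithms.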
// note-mapper.ts
const CHROMATIC_SCALE = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
const A4_FREQUENCY = 440;
const MIDI_A4 = 69;

export function frequencyToMidiNote(freq: number): number {
  if (freq <= 0) return -1;
  return Math.round(12 * Math.log2(freq / A4_FREQUENCY)) + MIDI_A4;
}

export function midiToNoteName(midi: number): string {
  if (midi < 0 || midi > 127) return '---';
  const octave = Math.floor(midi / 12) - 1;
  const index = ((midi % 12) + 12) % 12; // Handle negative modulo safely
  return `${CHROMATIC_SCALE[index]}${octave}`;
}

export function getCentDeviation(freq: number, targetMidi: number): number {
  const targetFreq = A4_FREQUENCY * Math.pow(2, (targetMidi - MIDI_A4) / 12);
  return 1200 * Math.log2(freq / targetFreq);
}
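As a quick worked example of the mapping, the sketch below folds the three steps (nearest MIDI note, note name, cent deviation) into one function. The name `describePitch`, the `PitchReading` shape, and the 446 Hz input are illustrative assumptions, not part of the module above.

```typescript
// Hypothetical helper combining the mapping steps; names are illustrative.
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

interface PitchReading {
  midi: number;  // nearest MIDI note number
  label: string; // note name with octave, e.g. "A4"
  cents: number; // signed deviation from the nearest note
}

function describePitch(freq: number, a4 = 440): PitchReading | null {
  if (!(freq > 0)) return null;
  const exactMidi = 69 + 12 * Math.log2(freq / a4); // fractional MIDI value
  const midi = Math.round(exactMidi);
  if (midi < 0 || midi > 127) return null;
  const octave = Math.floor(midi / 12) - 1;
  return {
    midi,
    label: `${NOTE_NAMES[midi % 12]}${octave}`,
    cents: (exactMidi - midi) * 100, // 100 cents per semitone
  };
}

// A slightly sharp A4: the label stays "A4" and the deviation is positive.
console.log(describePitch(446));
```

For 446 Hz the nearest note is A4 and the deviation comes out a little over 23 cents sharp, which is the kind of value a tuner needle would display.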
Architecture Rationale
- AudioWorklet over AnalyserNode: AnalyserNode exposes FFT data, but reading it with getByteFrequencyData() happens on the main thread, so analysis competes with UI work. AudioWorklet runs on the audio rendering thread, keeping processing off the UI thread.
- Buffer size 1024: At 48 kHz, filling the buffer takes ~21 ms. Larger buffers (2048/4096) improve frequency resolution but raise that delay to ~43 ms and ~85 ms, which is noticeable in real-time feedback.
- YIN difference function: The squared difference between signal[i] and signal[i+tau] approaches zero when tau matches the signal's period, so dips in the function mark candidate periods. Cumulative mean normalization removes the trivial dip at zero lag and counteracts the bias toward short lags, which would otherwise show up as spuriously high pitch estimates.
- No destination connection: Routing the microphone signal to context.destination would play it back through the speakers and can cause acoustic feedback. The worklet writes nothing to its outputs, so it processes silently.
- Cent deviation: Provides sub-note precision for tuning applications, enabling visual indicators that show sharp/flat direction and magnitude.
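The buffer-size trade-off can be made concrete with a little arithmetic. The sketch below uses two hypothetical helpers (`bufferLatencyMs` and `lowestDetectableHz`); the second assumes the lag search of the Step 1 detector, which stops at tau = floor(bufferSize / 2) - 1.

```typescript
// Illustrative helpers (hypothetical names) for the buffer-size trade-off.

// Time needed to fill one analysis buffer, in milliseconds.
function bufferLatencyMs(bufferSize: number, rate: number): number {
  return (bufferSize / rate) * 1000;
}

// Lowest frequency the Step 1 detector can report: its largest lag is
// floor(bufferSize / 2) - 1 samples, and frequency = rate / lag.
function lowestDetectableHz(bufferSize: number, rate: number): number {
  const maxLag = Math.floor(bufferSize / 2) - 1;
  return rate / maxLag;
}

for (const size of [1024, 2048, 4096]) {
  const latency = bufferLatencyMs(size, 48000).toFixed(1);
  const floorHz = lowestDetectableHz(size, 48000).toFixed(1);
  console.log(`${size} samples: ${latency} ms to fill, lowest pitch ~${floorHz} Hz`);
}
```

Under these assumptions a 1024-sample buffer at 48 kHz cannot report pitches below roughly 94 Hz, which leaves out the low E string of a guitar (about 82 Hz). Bass-range tuners may therefore need the 2048-sample setting despite the added delay.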
Pitfall Guide
1. Deprecated ScriptProcessorNode Usage
Explanation: ScriptProcessorNode was deprecated in 2014 because it runs on the main thread, causing audio glitches when the UI thread is busy. Modern browsers may still support it for legacy reasons, but it violates real-time audio guarantees.
Fix: Migrate to AudioWorklet. Use registerProcessor() and AudioWorkletNode. Ensure the worklet file is loaded via audioWorklet.addModule() before instantiation.
2. Sample Rate Assumptions
Explanation: Hardcoding 44100 or 48000 breaks frequency calculations when the hardware or OS selects a different sample rate. AudioContext.sampleRate varies by device and platform.
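Fix: Always read the rate at runtime: context.sampleRate on the main thread, or the global sampleRate that AudioWorkletGlobalScope provides inside the worklet. Other settings can be passed to the worklet via port.postMessage or processorOptions. Recalculate buffer sizes and frequency mappings based on the actual rate.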
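To see how much a wrong rate assumption matters, the following sketch converts the same detected lag with two different rates. The helper name `rateMismatchCents` and the lag of 100 samples are assumptions chosen for illustration.

```typescript
// Pitch error, in cents, when a lag is converted to Hz with an assumed sample
// rate that differs from the rate the audio was actually captured at.
function rateMismatchCents(assumedRate: number, actualRate: number): number {
  return 1200 * Math.log2(assumedRate / actualRate);
}

const detectedLag = 100; // samples per period, as a periodicity detector reports it
const truePitch = 44100 / detectedLag;     // hardware actually runs at 44.1 kHz
const reportedPitch = 48000 / detectedLag; // code assumed 48 kHz

console.log(`true ${truePitch} Hz, reported ${reportedPitch} Hz`);
console.log(`error: ${rateMismatchCents(48000, 44100).toFixed(0)} cents`);
```

The error is independent of the note being played: assuming 48 kHz on 44.1 kHz hardware shifts every reading sharp by roughly one and a half semitones, enough to label most notes incorrectly.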
3. Harmonic Locking in FFT
Explanation: FFT peak-picking identifies the strongest spectral component. Musical instruments produce rich harmonic series where overtones often exceed the fundamental in amplitude. This causes the detector to report a pitch an octave (second harmonic) or an octave plus a fifth (third harmonic) above the actual note.
Fix: Use periodicity-based algorithms (autocorrelation or YIN) for musical signals. Reserve FFT for voice activity detection or simple tone generation where harmonics are minimal.
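The effect is easy to reproduce with a synthetic tone. The sketch below builds a 200 Hz signal whose second harmonic is twice as strong as the fundamental, then compares a naive DFT peak-picker with a difference-function detector that follows the same steps as the Step 1 worklet. All names and signal parameters (8 kHz rate, 1000 samples) are arbitrary choices for the demonstration.

```typescript
// Synthetic tone: weak 200 Hz fundamental, strong 400 Hz second harmonic.
const demoRate = 8000;
const demoF0 = 200;
const demoSignal = new Float32Array(1000);
for (let n = 0; n < demoSignal.length; n++) {
  const t = n / demoRate;
  demoSignal[n] =
    0.5 * Math.sin(2 * Math.PI * demoF0 * t) +
    1.0 * Math.sin(2 * Math.PI * 2 * demoF0 * t);
}

// Naive DFT peak-picking: frequency of the strongest bin.
function spectralPeakHz(x: Float32Array, rate: number): number {
  let bestBin = 1;
  let bestPower = 0;
  for (let k = 1; k < x.length / 2; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < x.length; n++) {
      const phase = (2 * Math.PI * k * n) / x.length;
      re += x[n] * Math.cos(phase);
      im -= x[n] * Math.sin(phase);
    }
    const power = re * re + im * im;
    if (power > bestPower) {
      bestPower = power;
      bestBin = k;
    }
  }
  return (bestBin * rate) / x.length;
}

// Difference function + cumulative mean normalization + first dip below threshold.
function periodicityHz(x: Float32Array, rate: number, threshold = 0.15): number {
  const half = Math.floor(x.length / 2);
  const d = new Float64Array(half);
  for (let tau = 1; tau < half; tau++) {
    for (let i = 0; i < half; i++) {
      const delta = x[i] - x[i + tau];
      d[tau] += delta * delta;
    }
  }
  let running = 0;
  d[0] = 1;
  for (let tau = 1; tau < half; tau++) {
    running += d[tau];
    d[tau] *= tau / running;
  }
  for (let tau = 2; tau < half; tau++) {
    if (d[tau] < threshold) {
      while (tau + 1 < half && d[tau + 1] < d[tau]) tau++;
      return rate / tau;
    }
  }
  return 0;
}

console.log('spectral peak:', spectralPeakHz(demoSignal, demoRate));
console.log('periodicity:  ', periodicityHz(demoSignal, demoRate));
```

The peak-picker locks onto the louder harmonic, while the periodicity detector reports the lag at which the whole waveform repeats, which corresponds to the fundamental.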
4. Unbounded Buffer Sizes
Explanation: Increasing fftSize or worklet buffer size improves frequency resolution but increases latency linearly. A 4096-sample buffer at 48kHz introduces ~85ms of delay, which breaks real-time feedback loops.
Fix: Cap buffers at 1024 or 2048 for interactive applications. Use zero-padding or interpolation if higher resolution is required without increasing latency.
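One common form of the interpolation mentioned above is a parabolic fit around the chosen lag. The sketch below is a minimal version under the assumption that `difference` holds the normalized difference values and `tau` is the integer lag picked by the detector; `refineLag` is a hypothetical name.

```typescript
// Refine an integer lag by fitting a parabola through the difference values at
// tau - 1, tau and tau + 1, and returning the position of the parabola's minimum.
function refineLag(difference: ArrayLike<number>, tau: number): number {
  if (tau < 1 || tau + 1 >= difference.length) return tau; // no neighbors to fit
  const y0 = difference[tau - 1];
  const y1 = difference[tau];
  const y2 = difference[tau + 1];
  const curvature = y0 - 2 * y1 + y2;
  if (curvature <= 0) return tau; // not a local minimum; keep the integer lag
  return tau + (0.5 * (y0 - y2)) / curvature;
}

// Demo on a curve whose true minimum lies between two integer lags.
const sampledCurve = Array.from({ length: 20 }, (_, lag) => (lag - 10.3) ** 2);
console.log(refineLag(sampledCurve, 10)); // close to 10.3
```

The refinement matters most at high pitches, where periods are short. At 48 kHz a 1760 Hz tone has a period of about 27.27 samples; rounding to 27 yields 1777.8 Hz, an error of around 17 cents that a fractional lag largely removes without enlarging the buffer.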
5. Missing Silence/Noise Thresholding
Explanation: Raw pitch detectors output spurious frequencies during silence or background noise. This causes UI flickering and false note jumps.
Fix: Implement RMS energy calculation before pitch detection. If RMS < threshold, suppress output. Typical threshold: 0.01 to 0.02 for normalized float32 audio.
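A minimal sketch of such a gate is shown below. The function names are illustrative, and the default threshold of 0.015 is simply a value from the range quoted above; it would need tuning per microphone and environment.

```typescript
// Root-mean-square level of one frame of normalized float32 audio.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Gate: run pitch detection only when the frame carries enough energy.
function isLoudEnough(frame: Float32Array, threshold = 0.015): boolean {
  return rms(frame) >= threshold;
}

// Example: silence is rejected, a half-scale sine passes.
const silentFrame = new Float32Array(480);
const toneFrame = Float32Array.from({ length: 480 }, (_, n) =>
  0.5 * Math.sin((2 * Math.PI * n) / 480),
);
console.log(isLoudEnough(silentFrame), isLoudEnough(toneFrame));
```

In the worklet, the check would sit just before the call to the periodicity detector, so silent frames cost one pass over the buffer instead of the full quadratic search.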
6. MIDI Note Flickering
Explanation: Floating-point frequency drift causes rapid toggling between adjacent MIDI notes, especially during vibrato or unstable playing.
Fix: Apply hysteresis or exponential moving average (EMA) smoothing. Only update the note when the frequency crosses a deadband threshold, or use a rolling window median filter.
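The sketch below combines both ideas: an EMA over the fractional MIDI value, plus a deadband that keeps the displayed note until the smoothed value has clearly moved past the halfway point to a neighbor. The class name `NoteStabilizer` and the defaults (alpha 0.3, margin 0.1 semitones) are assumptions; note that the deadband here is expressed in semitones rather than Hz so it behaves the same across octaves.

```typescript
// Hypothetical note stabilizer: EMA smoothing plus a hysteresis deadband.
class NoteStabilizer {
  private smoothed: number | null = null; // smoothed fractional MIDI value
  private current: number | null = null;  // note currently shown to the user
  private readonly alpha: number;
  private readonly margin: number;

  constructor(alpha = 0.3, margin = 0.1) {
    this.alpha = alpha;   // EMA weight of the newest sample
    this.margin = margin; // extra semitones required beyond the 0.5 midpoint
  }

  update(freq: number): number | null {
    if (!(freq > 0)) return this.current; // ignore invalid readings
    const midi = 69 + 12 * Math.log2(freq / 440);
    this.smoothed =
      this.smoothed === null
        ? midi
        : this.alpha * midi + (1 - this.alpha) * this.smoothed;
    const mustSwitch =
      this.current === null ||
      Math.abs(this.smoothed - this.current) > 0.5 + this.margin;
    if (mustSwitch) this.current = Math.round(this.smoothed);
    return this.current;
  }
}

const stabilizer = new NoteStabilizer();
console.log(stabilizer.update(440)); // first reading sets the note directly
```

A reading that hovers around the boundary between two notes stays on the note already displayed, while a sustained move to a new pitch switches after a few updates. A smaller alpha makes the display steadier at the cost of a slower response.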
7. Secure Context & HTTPS Restrictions
Explanation: getUserMedia() requires a secure context (HTTPS or localhost). On insecure origins navigator.mediaDevices is undefined, so the call fails before any permission prompt appears, which can look like a silent failure.
Fix: Deploy over HTTPS. Use localhost for development. Handle NotAllowedError gracefully with user-facing prompts explaining the security requirement.
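One way to handle these failures is to translate the error name into a user-facing message. The function below is a hedged sketch: `describeMicrophoneError` and the message wording are assumptions, while the error names are ones getUserMedia() is documented to reject with.

```typescript
// Map a getUserMedia() rejection to a message suitable for display.
function describeMicrophoneError(error: unknown): string {
  const errorName =
    typeof error === 'object' && error !== null && 'name' in error
      ? String((error as { name: unknown }).name)
      : '';
  switch (errorName) {
    case 'NotAllowedError':
      return 'Microphone permission was denied. Allow access in the browser settings and reload.';
    case 'NotFoundError':
      return 'No microphone was found. Connect an input device and try again.';
    case 'NotReadableError':
      return 'The microphone is in use by another application or could not be opened.';
    case 'SecurityError':
      return 'Microphone access is blocked on this page. Open it over HTTPS or on localhost.';
    default:
      return 'Microphone access failed for an unknown reason.';
  }
}

// Browser usage (not executed here; `engine` and `showBanner` are hypothetical):
// try { await engine.initialize(); } catch (e) { showBanner(describeMicrophoneError(e)); }
console.log(describeMicrophoneError({ name: 'NotAllowedError' }));
```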
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Voice command / UI feedback | FFT Peak-Picking | Low CPU, fast response, harmonics are minimal | Near-zero infrastructure cost |
| Guitar/Bass tuner | Autocorrelation | Robust to string harmonics, acceptable latency | Moderate CPU, no external deps |
| Professional tuning / Studio tool | YIN Algorithm | Eliminates octave errors, sub-cent precision | Higher CPU, requires careful buffer tuning |
| Polyphonic chord analysis | ML-based / WebAssembly | Periodicity detectors fail on overlapping fundamentals | Requires model hosting or WASM compilation |
Configuration Template
// production-config.ts
export const AudioPipelineConfig = {
  sampleRate: 48000,
  bufferSize: 1024,
  workletModulePath: '/assets/pitch-worklet.js',
  rmsThreshold: 0.015,
  hysteresisDeadband: 2.5, // Hz
  smoothingAlpha: 0.3, // EMA factor
  maxRetries: 3,
  secureContextRequired: true,
  uiUpdateInterval: 16, // ms (~60fps)
};

export function validateEnvironment(): boolean {
  const isSecure = window.isSecureContext || location.hostname === 'localhost';
  const hasAudioContext = typeof AudioContext !== 'undefined';
  const hasWorklet = typeof AudioWorklet !== 'undefined';
  return isSecure && hasAudioContext && hasWorklet;
}
Quick Start Guide
- Create the worklet file: Save the PitchDetectorWorklet code as pitch-worklet.js in your public assets directory. Ensure it's served with correct MIME types.
- Initialize the engine: Import PitchExtractionEngine, pass a callback, and call initialize(). Handle NotAllowedError with a user prompt.
- Map and smooth: Convert incoming frequencies using frequencyToMidiNote(), apply EMA smoothing, and calculate cent deviation for visual feedback.
- Bind to UI: Update your tuner interface on a requestAnimationFrame loop using the smoothed values. Never update DOM directly from the worklet message handler.
- Test and deploy: Run on HTTPS, verify latency stays under 30ms, and validate note stability across different instruments and volume levels.
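The "Bind to UI" step can be sketched as a small coalescing wrapper: the message handler only stores the latest value, and rendering happens at most once per scheduled frame. The name `createFrameCoalescer` is hypothetical, and the scheduler is injected so the same code works with requestAnimationFrame in the browser or a fake scheduler in tests.

```typescript
// Schedules a callback for the next frame (requestAnimationFrame in the browser).
type FrameScheduler = (callback: () => void) => void;

// Returns a push function that remembers only the newest value and renders it
// once per scheduled frame, no matter how many values arrived in between.
function createFrameCoalescer<T>(
  render: (value: T) => void,
  schedule: FrameScheduler,
): (value: T) => void {
  let latest: T | undefined;
  let framePending = false;
  return (value: T): void => {
    latest = value;
    if (framePending) return; // a render is already queued for this frame
    framePending = true;
    schedule(() => {
      framePending = false;
      render(latest as T);
    });
  };
}

// Browser wiring (not executed here; `updateTunerDisplay` is hypothetical):
// const pushPitch = createFrameCoalescer<number>(
//   updateTunerDisplay,
//   (cb) => { requestAnimationFrame(() => cb()); },
// );
// const engine = new PitchExtractionEngine(pushPitch);
```

With this arrangement, bursts of pitch messages between two frames cause a single DOM update using the most recent value.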