back tolerance for coincidental character distribution.
Core Solution
The robust implementation replaces string splitting with a deterministic state machine. It iterates through the raw text once, tracking quote context, field boundaries, and row termination. Type coercion and delimiter detection are layered as configurable middleware.
1. Quoted Field Parser (RFC 4180 Compliant)
function parseCSVLine(line) {
  const result = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const char = line[i];
    if (char === '"') {
      if (inQuotes && line[i + 1] === '"') {
        // Escaped double-quote: "" inside a quoted field = literal "
        current += '"';
        i++;
      } else {
        inQuotes = !inQuotes;
      }
    } else if (char === ',' && !inQuotes) {
      result.push(current);
      current = '';
    } else {
      current += char;
    }
  }
  result.push(current);
  return result;
}
2. Multi-Line Safe Parser
function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // escaped quote
          i++;
        } else {
          inQuotes = false; // end of quoted field
        }
      } else {
        field += c;
      }
    } else {
      if (c === '"') {
        inQuotes = true;
      } else if (c === ',') {
        row.push(field);
        field = '';
      } else if (c === '\n' || (c === '\r' && text[i + 1] === '\n')) {
        if (c === '\r') i++; // skip \r in CRLF
        row.push(field);
        field = '';
        if (row.some(f => f !== '')) rows.push(row); // skip empty lines
        row = [];
      } else {
        field += c;
      }
    }
  }
  // Last field + row
  if (field || row.length > 0) {
    row.push(field);
    if (row.some(f => f !== '')) rows.push(row);
  }
  return rows;
}
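The parser above hard-codes the comma. One way to make the separator configurable is sketched below; the name parseDelimited and the single-character delimiter restriction are assumptions made for illustration, not part of the original implementation.

```javascript
// Hypothetical variant of parseCSV that takes the delimiter as a parameter.
// Assumes the delimiter is a single character (',', ';', '\t', '|', ...).
function parseDelimited(text, delimiter = ',') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  // Finish the current row, dropping rows whose fields are all empty.
  const endRow = () => {
    row.push(field);
    field = '';
    if (row.some(f => f !== '')) rows.push(row);
    row = [];
  };

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote
        i++;
      } else if (c === '"') {
        inQuotes = false; // closing quote
      } else {
        field += c; // delimiters and newlines are literal inside quotes
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === delimiter) {
      row.push(field);
      field = '';
    } else if (c === '\n' || (c === '\r' && text[i + 1] === '\n')) {
      if (c === '\r') i++; // consume the \n of a CRLF pair
      endRow();
    } else {
      field += c;
    }
  }
  // Flush a final row that has no trailing newline.
  if (field !== '' || row.length > 0) endRow();
  return rows;
}

console.log(parseDelimited('a;b\n"x;y";2\n', ';'));
// [ [ 'a', 'b' ], [ 'x;y', '2' ] ]
```

Passing the delimiter in, rather than detecting it inside the parser, keeps the state machine free of heuristics and lets callers override a wrong guess.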
3. Type Coercion Middleware
function coerceValue(val) {
  if (val === '' || val === null || val === undefined) return null;
  if (val === 'true') return true;
  if (val === 'false') return false;
  const num = Number(val);
  if (!isNaN(num) && val.trim() !== '') return num;
  return val;
}
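Because Number() accepts forms such as "01234", "1e3", and "0x10", the function above converts values that are often meant as identifiers. A stricter variant might only convert canonical decimal literals. The following is a minimal sketch; coerceValueStrict and CANONICAL_NUMBER are hypothetical names, and the pattern is one possible definition of "canonical".

```javascript
// Hypothetical stricter variant: only canonical decimal literals become numbers.
// Accepts "0", "42", "-2.5"; rejects leading zeros, exponents, hex, and padding.
const CANONICAL_NUMBER = /^-?(0|[1-9]\d*)(\.\d+)?$/;

function coerceValueStrict(val) {
  if (val === '' || val === null || val === undefined) return null;
  if (val === 'true') return true;
  if (val === 'false') return false;
  // "01234", "1e3", "0x10" and " 42" fail the pattern and stay strings.
  return CANONICAL_NUMBER.test(val) ? Number(val) : val;
}

console.log(coerceValueStrict('30'));    // 30
console.log(coerceValueStrict('01234')); // '01234'
```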
4. Delimiter Auto-Detection
function detectDelimiter(firstLine) {
  const candidates = [',', ';', '\t', '|'];
  const counts = candidates.map(d => ({
    delimiter: d,
    count: firstLine.split(d).length - 1
  }));
  return counts.sort((a, b) => b.count - a.count)[0].delimiter;
}
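The frequency count above also counts candidate characters that appear inside quoted header fields. One possible refinement is to count only occurrences outside quotes. This is a sketch under that assumption; countOutsideQuotes and detectDelimiterQuoteAware are illustrative names.

```javascript
// Count occurrences of `delimiter` in `line`, skipping characters inside quotes.
function countOutsideQuotes(line, delimiter) {
  let count = 0;
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes; // an escaped "" toggles twice, so state is preserved
    else if (ch === delimiter && !inQuotes) count++;
  }
  return count;
}

// Pick the candidate with the most unquoted occurrences; fall back to the first candidate.
function detectDelimiterQuoteAware(firstLine, candidates = [',', ';', '\t', '|']) {
  let best = candidates[0];
  let bestCount = 0;
  for (const d of candidates) {
    const n = countOutsideQuotes(firstLine, d);
    if (n > bestCount) {
      best = d;
      bestCount = n;
    }
  }
  return best;
}

// Three quoted commas would outvote two real semicolons in a plain frequency count.
console.log(detectDelimiterQuoteAware('"a, b, c, d";1;2')); // ;
```

This still relies on a single line, so an explicit override remains the safer option for unusual files.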
5. Full Integration: CSV to JSON Objects
function csvToJson(csvText, options = {}) {
  const { coerce = true, delimiter = null } = options;
  // Detect the delimiter from the header line unless one is given explicitly.
  const sep = delimiter || detectDelimiter(csvText.split('\n')[0]);
  // Note: parseCSV as written in step 2 always splits on ',' and ignores the
  // second argument. Extend it to accept a delimiter parameter for sep to take effect.
  const rows = parseCSV(csvText, sep);
  if (rows.length === 0) return [];
  const headers = rows[0].map(h => h.trim());
  const results = [];
  for (let i = 1; i < rows.length; i++) {
    const obj = {};
    headers.forEach((header, j) => {
      const raw = rows[i][j] ?? ''; // missing trailing columns become ''
      obj[header] = coerce ? coerceValue(raw.trim()) : raw.trim();
    });
    results.push(obj);
  }
  return results;
}
6. Validation Test Case
const csv = `name,age,city,bio
Alice,30,Boston,"Loves hiking, camping."
Bob,25,NYC,"Says ""hello"" a lot."
Charlie,,,"
Multi-line
bio"`;
console.log(JSON.stringify(csvToJson(csv), null, 2));
Expected output:
[
  { "name": "Alice", "age": 30, "city": "Boston", "bio": "Loves hiking, camping." },
  { "name": "Bob", "age": 25, "city": "NYC", "bio": "Says \"hello\" a lot." },
  { "name": "Charlie", "age": null, "city": null, "bio": "Multi-line\nbio" }
]
Pitfall Guide
- Naive Delimiter Splitting: Using split(',') ignores RFC 4180 quoting rules. Fields containing commas must be parsed within quote context, otherwise data integrity breaks immediately.
- Ignoring Quote Context for Newlines: Splitting on \n prematurely fractures multi-line quoted fields. Always track inQuotes state before treating a newline as a row terminator.
- Aggressive Type Coercion: Auto-converting strings to numbers/booleans destroys leading zeros (e.g., "01234" → 1234) and exact string representations. Keep coercion behind a configuration flag (the coerce option above) so callers can turn it off.
- CRLF Line Ending Mismatch: Windows exports use \r\n. Naive parsers leave \r attached to the final field of each row. Explicitly consume the \r when detecting a \r\n pair.
- Delimiter Misdetection: Frequency-based auto-detection can fail if the header row happens to contain more of another candidate character than of the actual delimiter. Validate against known locale patterns or allow explicit override.
- Empty Row/Field Handling: Failing to filter trailing empty lines or mishandling undefined values produces malformed JSON objects. Use row.some(f => f !== '') guards and null-coalescing (??) for missing columns.
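Three of these pitfalls can be reproduced with built-ins alone. The snippet below is a small illustration; the sample strings and variable names are made up for the demonstration.

```javascript
// Naive comma split: the quoted comma produces a spurious third column.
const naiveFields = 'Alice,"Loves hiking, camping."'.split(',');
console.log(naiveFields.length); // 3

// Naive newline split on CRLF input: \r stays attached to the last field.
const naiveLines = 'a,b\r\n1,2\r\n'.split('\n');
console.log(JSON.stringify(naiveLines[0])); // "a,b\r"

// Numeric coercion drops leading zeros.
console.log(Number('01234')); // 1234
```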
Deliverables
📐 Blueprint: CSV-to-JSON State Machine Architecture
A single-pass character iterator that maintains inQuotes, field, and row state. Delimiter detection runs as a lightweight pre-scan on the header row. Type coercion is injected as a pluggable middleware step. The parser runs in O(n) time over the input, keeps only the current field and row as working state beyond the output, and handles RFC 4180 quoting, escaped quotes, and embedded line breaks without regex backtracking.
✅ Production Readiness Checklist