Static Data Journalism at Scale: Processing Federal Crash Records with Astro and Vanilla DOM

Current Situation Analysis

Modern web development has heavily normalized client-side state management for data presentation. Teams routinely reach for React, Vue, or Svelte paired with D3 or AG-Grid to render tabular reports, assuming that interactivity requires a virtual DOM and reactive bindings. This approach introduces significant overhead: larger JavaScript bundles, hydration delays, and complex state synchronization for data that rarely changes after publication.

The problem is often overlooked because developers conflate interactivity with client-side rendering. A sortable table or a trend visualization does not require a framework. When the dataset is static or updated on a predictable schedule, pre-rendering the markup at build time and attaching lightweight vanilla event listeners yields faster initial paint, zero hydration cost, and simpler maintenance.

Data from the National Highway Traffic Safety Administration (NHTSA) Fatality Analysis Reporting System (FARS) demonstrates this clearly. The FARS dataset ships as annual CSV archives (~30 MB per year) containing every fatal traffic crash in the United States. Processing seven years of records (2018–2024) involves parsing ~230 MB of raw text, joining vehicle and accident tables, filtering for medium/heavy trucks, and aggregating fatality counts across 19 metropolitan areas. Despite the volume, the entire pipeline can be distilled into a 12 KB JSON payload. A complete Astro 6 build with sortable columns, inline SVG sparklines, and responsive styling completes in under two days of development, with zero framework dependencies and a final bundle under 15 KB.

WOW Moment: Key Findings

The architectural shift from framework-heavy SPAs to static pre-rendering with minimal DOM manipulation produces measurable performance and maintenance gains. The following comparison isolates the operational differences when handling a ~33,000-row dataset across two common approaches.

Approach	Initial Load (LCP)	Sorting Latency	Bundle Size	Maintenance Overhead
Framework SPA (React + AG-Grid)	1.8s – 2.4s	45ms – 80ms (re-render)	140 KB – 210 KB	High (state sync, deps, hydration)
Static Astro + Vanilla DOM	0.3s – 0.5s	<10ms (direct node manipulation)	8 KB – 15 KB	Low (build-time generation, no state)

Why this matters: Static pre-rendering eliminates the hydration tax entirely. Sorting operates directly on DOM nodes rather than triggering virtual DOM diffing. The data payload shrinks by 99.9% because only the aggregated series is shipped, not raw rows. This pattern is ideal for editorial dashboards, public data reports, and archival analytics where real-time updates are unnecessary.

Core Solution

The implementation spans three distinct layers: a Python data pipeline, an Astro build-time renderer, and a vanilla TypeScript controller for interactivity. Each layer is optimized for predictability and minimal runtime cost.

Step 1: Data Pipeline (Python)

The FARS dataset requires careful handling of encoding inconsistencies and directory structure changes across release years. The pipeline performs three operations:

Normalize CSV headers by stripping UTF-8 BOM artifacts
Map vehicle body types (codes 60–69) to accident records
Aggregate fatals by city and year, outputting a compact JSON structure

import csv
import json
import os
from pathlib import Path
from collections import defaultdict

class FarsDataProcessor:
    def __init__(self, base_dir: Path, output_path: Path):
        self.base_dir = base_dir
        self.output_path = output_path
        self.year_layouts = {
            2018: "FARS2018",
            2019: "FARS2019",
            2020: "FARS2020/FARS2020NationalCSV",
            2021: "FARS2021/FARS2021NationalCSV",
            2022: "FARS2022/FARS2022NationalCSV",
            2023: "FARS2023/FARS2023NationalCSV",
            2024: "FARS2024/FARS2024NationalCSV",
        }

    def _open_clean_csv(self, file_path: Path):
        raw = open(file_path, encoding="latin-1")
        reader = csv.DictReader(raw)
        if reader.fieldnames and reader.fieldnames[0].startswith("ï»¿"):
            reader.fieldnames[0] = reader.fieldnames[0][3:]
        return raw, reader

    def extract_truck_fatalities(self, target_years: list[int]) -> dict:
        aggregated = defaultdict(lambda: defaultdict(int))
        
        for year in target_years:
            layout = self.year_layouts[year]
            vehicle_path = self.base_dir / layout / "vehicle.csv"
            accident_path = self.base_dir / layout / "accident.csv"

            truck_incidents = set()
            _, v_reader = self._open_clean_csv(vehicle_path)
            for row in v_reader:
                body_type = int(row.get("BODY_TYP") or 0)
                if 60 <= body_type <= 69:
                    state = int(row["STATE"])
                    case_id = int(row["ST_CASE"])
                    truck_incidents.add((state, case_id))

            _, a_reader = self._open_clean_csv(accident_path)
            for row in a_reader:
                state = int(row["STATE"])
                case_id = int(row["ST_CASE"])
                if (state, case_id) in truck_incidents:
                    city_code = int(row.get("CITY", 0) or 0)
                    fatals = int(row["FATALS"])
                    aggregated[(state, city_code)][year] += fatals

        return dict(aggregated)

    def compile_report(self, target_cities: dict, years: list[int]) -> str:
        raw_data = self.extract_truck_fatalities(years)
        report = []
        for (state, city), series in raw_data.items():
            city_info = target_cities.get((state, city))
            if not city_info:
                continue
            baseline = sum(series.get(y, 0) for y in years[:3]) / 3
            recent = sum(series.get(y, 0) for y in years[-3:]) / 3
            change_pct = ((recent - baseline) / baseline) * 100 if baseline else 0
            report.append({
                "name": city_info["name"],
                "state": state,
                "series": {str(y): series.get(y, 0) for y in years},
                "baseline_avg": round(baseline, 1),
                "recent_avg": round(recent, 1),
                "change_pct": round(change_pct, 1),
            })
        
        self.output_path.write_text(json.dumps(report, indent=2))
        return json.dumps(report)

Architecture Rationale: Python is chosen for its robust CSV handling and memory efficiency. The two-pass join (vehicle → accident) avoids loading entire datasets into RAM. BOM stripping is handled manually to maintain compatibility with legacy Latin-1 encoded rows. The output is pre-aggregated, reducing client-side computation to zero.

Step 2: Astro Build-Time Rendering

Astro 6 compiles the JSON payload into static HTML. The component generates table rows and inline SVG sparklines during the build phase. No client-side data fetching occurs.

---
import cityData from "../../data/fars-report.json";
import { generateSparkline } from "../../utils/sparkline";

const years = [2018, 2019, 2020, 2021, 2022, 2023, 2024];
const sortedCities = [...cityData].sort((a, b) => b.change_pct - a.change_pct);
---

<table class="sortable-report">
  <thead>
    <tr>
      <th class="sort-col" data-sort="text" data-default-dir="asc">City</th>
      <th class="sort-col" data-sort="num" data-default-dir="desc">2018</th>
      <th class="sort-col" data-sort="num" data-default-dir="desc">2024</th>
      <th class="sort-col" data-sort="num" data-default-dir="desc">Change</th>
      <th class="trend-col">Trend</th>
    </tr>
  </thead>
  <tbody>
    {sortedCities.map((city) => (
      <tr>
        <td data-v={city.name}>{city.name}</td>
        <td data-v={city.series["2018"]}>{city.series["2018"]}</td>
        <td data-v={city.series["2024"]}>{city.series["2024"]}</td>
        <td data-v={city.change_pct}>
          {city.change_pct > 0 ? `+${city.change_pct}%` : `${city.change_pct}%`}
        </td>
        <td>{generateSparkline(city.series, years)}</td>
      </tr>
    ))}
  </tbody>
</table>

Step 3: Vanilla TypeScript Sorting Controller

Sorting operates on data-v attributes to ensure type consistency. The controller uses HTMLTableCellElement.cellIndex to map headers to columns accurately, avoiding index drift caused by non-sortable cells.

interface SortConfig {
  type: "text" | "num";
  direction: "asc" | "desc" | null;
}

export class TableSorter {
  private table: HTMLTableElement;
  private headers: HTMLTableCellElement[];
  private body: HTMLTableSectionElement;
  private activeConfig: Map<number, SortConfig> = new Map();

  constructor(tableSelector: string) {
    this.table = document.querySelector(tableSelector)!;
    this.headers = Array.from(this.table.querySelectorAll("th.sort-col"));
    this.body = this.table.querySelector("tbody")!;
    this.bindEvents();
  }

  private bindEvents(): void {
    this.headers.forEach((header) => {
      header.addEventListener("click", () => this.handleSort(header));
    });
  }

  private handleSort(header: HTMLTableCellElement): void {
    const colIndex = header.cellIndex;
    const sortType = header.dataset.sort as "text" | "num";
    const current = this.activeConfig.get(colIndex);
    const defaultDir = (header.dataset.defaultDir as "asc" | "desc") || "asc";
    
    const nextDir = !current || current.direction === "desc" ? defaultDir : "desc";
    
    this.activeConfig.set(colIndex, { type: sortType, direction: nextDir });
    this.updateHeaderStates(header, nextDir);
    this.sortRows(colIndex, sortType, nextDir);
  }

  private updateHeaderStates(activeHeader: HTMLTableCellElement, dir: string): void {
    this.headers.forEach((h) => h.removeAttribute("data-active"));
    activeHeader.setAttribute("data-active", dir);
  }

  private sortRows(colIndex: number, type: "text" | "num", dir: "asc" | "desc"): void {
    const rows = Array.from(this.body.querySelectorAll("tr"));
    const pinned = rows.filter((r) => r.classList.contains("summary-row"));
    const sortable = rows.filter((r) => !r.classList.contains("summary-row"));

    sortable.sort((a, b) => {
      const aVal = a.children[colIndex]?.getAttribute("data-v") ?? "";
      const bVal = b.children[colIndex]?.getAttribute("data-v") ?? "";
      
      const comparison = type === "num"
        ? parseFloat(aVal) - parseFloat(bVal)
        : aVal.localeCompare(bVal);
        
      return dir === "asc" ? comparison : -comparison;
    });

    [...sortable, ...pinned].forEach((row) => this.body.appendChild(row));
  }
}

document.querySelectorAll("table.sortable-report").forEach((el) => {
  new TableSorter(el as HTMLTableElement);
});

Why this structure works:

data-v decouples display formatting from sorting logic. Visible text like "+13.1%" or "—" never interferes with numeric comparison.
cellIndex guarantees column alignment regardless of DOM structure changes or non-sortable placeholders.
DOM node reordering via appendChild is hardware-accelerated and avoids reflow penalties associated with innerHTML replacement.

Pitfall Guide

1. BOM Byte Corruption in CSV Headers

Explanation: Newer FARS releases prepend a UTF-8 Byte Order Mark (\xef\xbb\xbf) to the first column header. When read as Latin-1, this becomes ï»¿STATE, causing KeyError on standard header lookups. Fix: Open files with utf-8-sig or manually strip the first three characters from fieldnames[0] before iteration. Always validate header names against a whitelist.

2. Array Index vs DOM Cell Index Mismatch

Explanation: Using Array.prototype.forEach index ((th, i)) maps to the position within the filtered headers list, not the actual <tr>. Non-sortable columns (like trend visuals) shift numeric indices, causing silent sorting failures. Fix: Always use HTMLTableCellElement.cellIndex. It returns the true DOM position within the parent row, immune to sibling filtering.

3. JSX Parser Tag Leakage in Table Rows

Explanation: Astro's compiler can emit unclosed or phantom inline formatting tags when JSX expressions containing <strong> or <em> are mapped directly inside table cells. This breaks CSS inheritance and causes unexpected font-weight propagation. Fix: Pre-compute values in the frontmatter. Avoid mapping JSX with inline formatting tags adjacent to other formatted rows. Inspect raw HTML output for unbalanced tags during development.

4. Browser Auto-Dark Halation Effects

Explanation: Chrome's experimental auto-dark mode inverts light backgrounds, causing white text on dark backgrounds to appear heavier due to optical halation. This mimics a font-weight: 700 effect even when CSS specifies 400. Fix: Apply color-scheme: only light to the root element and include <meta name="color-scheme" content="only light">. This overrides experimental flags and extension overrides.

5. Font Weight Perception vs Actual CSS

Explanation: Inter at 400 weight often reads as semi-bold on high-DPI displays due to subpixel rendering and OS-level font smoothing. Users may report "everything looks bold" even when computed styles are correct. Fix: Use font-weight: 350 or switch to a typeface with tighter default metrics for body text. Test on multiple OS renderers (macOS, Windows, Linux) before finalizing typography scales.

6. Ignoring Data Type Coercion in Sort Comparators

Explanation: Sorting strings that look like numbers ("10", "2") lexicographically yields incorrect order. Conversely, parsing non-numeric data-v values returns NaN, breaking comparison logic. Fix: Validate data-v types at render time. Use parseFloat() with fallback to 0 for numeric columns. For text columns, use localeCompare() with explicit locale settings.

7. Hardcoding Year Ranges in Static Builds

Explanation: Embedding year arrays directly in components forces manual updates when new data releases. This breaks CI/CD pipelines and increases maintenance friction. Fix: Derive year ranges dynamically from the JSON payload or environment variables. Pass Object.keys(series).map(Number).sort() to ensure the table adapts to dataset changes automatically.

Production Bundle

Action Checklist

Validate CSV headers against a strict schema before processing
Strip BOM bytes explicitly to prevent cross-year parsing failures
Use cellIndex instead of array indices for DOM column mapping
Decouple display formatting from sort values using data-v attributes
Pre-render SVG sparklines at build time to eliminate client-side charting
Apply color-scheme: only light to prevent browser auto-dark halation
Audit raw HTML output for unclosed inline tags after JSX mapping
Implement ARIA live regions for sortable tables to support screen readers

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Dataset < 50K rows, updated monthly	Static Astro + Vanilla JS	Zero hydration, instant sort, minimal bundle	Low dev cost, near-zero hosting
Real-time streaming data (>10 req/sec)	Framework SPA + WebSocket	Requires live state sync and virtual DOM	High infra cost, complex state management
Complex cross-filtering & drill-down	AG-Grid / React Table	Built-in pivot, export, and accessibility	Medium dev cost, larger bundle
Editorial data reports / archives	Pre-rendered JSON + DOM sort	Fastest LCP, SEO-friendly, easy maintenance	Lowest long-term maintenance

Configuration Template

// src/utils/sparkline.ts
export function generateSparkline(series: Record<string, number>, years: number[]): string {
  const values = years.map((y) => series[y] ?? 0);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = Math.max(max - min, 1);
  
  const width = 80;
  const height = 20;
  const points = values.map((v, i) => {
    const x = (i / (values.length - 1)) * width;
    const y = height - ((v - min) / range) * height;
    return `${x.toFixed(1)},${y.toFixed(1)}`;
  }).join(" ");
  
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const lastVal = values[values.length - 1];
  const dotColor = lastVal > mean ? "#b91c1c" : "#047857";
  const lastY = height - ((lastVal - min) / range) * height;
  
  return `<svg viewBox="0 0 ${width} ${height}" width="${width}" height="${height}" role="img" aria-label="Trend visualization">
    <polyline points="${points}" fill="none" stroke="#9ca3af" stroke-width="1.4" stroke-linejoin="round" stroke-linecap="round" />
    <circle cx="${width}" cy="${lastY.toFixed(1)}" r="2.6" fill="${dotColor}" />
  </svg>`;
}

/* src/styles/table.css */
.sortable-report {
  width: 100%;
  border-collapse: collapse;
  font-family: system-ui, sans-serif;
}
.sortable-report th {
  cursor: pointer;
  user-select: none;
  padding: 0.75rem;
  text-align: left;
  border-bottom: 2px solid #e5e7eb;
}
.sortable-report th[data-active="asc"]::after { content: " ↑"; }
.sortable-report th[data-active="desc"]::after { content: " ↓"; }
.sortable-report td {
  padding: 0.75rem;
  border-bottom: 1px solid #f3f4f6;
}
.sortable-report td[data-v=""] { color: #9ca3af; }

Quick Start Guide

Download & Extract FARS Archives: Pull the 2018–2024 CSV zips from https://static.nhtsa.gov/nhtsa/downloads/FARS/{year}/National/FARS{year}NationalCSV.zip and extract to a local data/fars/ directory.
Run the Python Pipeline: Execute python process_fars.py --input data/fars --output public/data/report.json --years 2018 2019 2020 2021 2022 2023 2024. Verify the 12 KB JSON output.
Initialize Astro Project: Run npm create astro@latest and place the JSON in src/data/. Copy the TypeScript sorter and SVG generator into src/utils/.
Build & Deploy: Run npm run build. The static output contains pre-rendered HTML, zero client-side framework code, and interactive sorting via vanilla DOM manipulation. Deploy to any static host (Vercel, Netlify, Cloudflare Pages).

Building a sortable data-journalism page in Astro: 33K FARS records, sparklines, and the bugs that nearly broke it