I built a Zero-Allocation C# Knowledge Graph (because JVM graphs are too bloated)

By Codcompass Team·2026-05-18·8 min read

Architecting Sub-Millisecond Graph Traversal for AI Agents Using Forward Star Arrays in C#

Current Situation Analysis

AI agents are rapidly evolving beyond simple text generation. Modern agentic workflows require relational memory: the ability to trace connections, detect cycles, and perform multi-hop reasoning across entities. When an agent needs to answer queries like "Which suppliers share a dependency with this failing component?" or "Find the shortest influence path between User A and User B," vector search and relational tables fall short. They lack the native topology required for efficient graph traversal.

The industry standard response is to provision a dedicated graph database, typically a Java-based container like Neo4j or JanusGraph. While powerful, this introduces significant operational overhead. For local development, edge deployments, or high-frequency agent loops, spinning up a multi-gigabyte JVM container is often disproportionate to the workload.

Developers attempting to solve this in-memory within .NET frequently default to Object-Oriented Programming (OOP) models. This approach creates a hidden performance tax known as the Object Header Tax. In a naive implementation, every node and edge becomes a managed object.

Consider a modest knowledge graph with 150,000 nodes and 233,000 edges. An OOP representation generates approximately 383,000 distinct object headers scattered across the managed heap. Traversing this structure forces the CPU to chase pointer references across non-contiguous memory regions, resulting in severe cache misses. Furthermore, the .NET Garbage Collector (GC) must track and potentially compact this massive object graph, introducing unpredictable latency spikes. For AI agents requiring deterministic, sub-millisecond response times, this architecture is fundamentally unsuited.

WOW Moment: Key Findings

By abandoning object references in favor of a Forward Star (or Compressed Sparse Row) representation, graph traversal latency drops by orders of magnitude while eliminating heap allocations during query execution. The following comparison highlights the performance delta between a standard OOP approach and a zero-allocation array-based engine, based on benchmarks of a graph with 150,000 nodes and 233,333 edges.

Approach	Heap Allocations (Traversal)	CPU Cache Efficiency	25-Hop Path Latency	4-Hop Neighborhood Scan	Disk Persistence (27.46 MB)
OOP Graph Model	~383k objects	Poor (Random Access)	>50 ms (est.)	>15 ms (est.)	>500 ms
Forward Star Arrays	0	High (Sequential)	5.3 ms	0.44 ms	30 ms

Why this matters: The Forward Star representation transforms graph traversal from a pointer-chasing problem into a memory bandwidth problem. By storing relationships in contiguous primitive arrays, the CPU hardware prefetcher can load data efficiently. The result is that a 25-hop shortest path search completes in 5.3 milliseconds, and a full 4-hop neighborhood expansion takes 0.44 milliseconds. Persistence is equally accelerated; raw memory arrays can be serialized to disk in 30 milliseconds and revived in 41 milliseconds, enabling rapid state checkpointing for long-running agent sessions.

Core Solution

The architecture relies on the Forward Star data structure. Instead of objects, the graph is represented by four parallel integer arrays. This design ensures that all data for a specific node's adjacency list is stored contiguously, maximizing cache utilization.

1. Data Structure Design

The en

gine maintains four arrays:

nodeOffsets: Maps each node ID to the starting index of its edges in the edge arrays.
destinations: Stores the target node ID for each edge.
edgeTypes: Stores the relationship type (encoded as an integer) for each edge.
nextEdges: Implements a linked list within the arrays to chain edges belonging to the same source node.

public sealed class GraphEngine
{
    private int[] _nodeOffsets;
    private int[] _destinations;
    private int[] _edgeTypes;
    private int[] _nextEdges;
    
    private int _edgeCount;
    private readonly int _maxNodes;
    private readonly int _maxEdges;

    public GraphEngine(int maxNodes, int maxEdges)
    {
        _maxNodes = maxNodes;
        _maxEdges = maxEdges;
        
        // Pre-allocate arrays to ensure zero-allocation during growth
        _nodeOffsets = new int[maxNodes];
        _destinations = new int[maxEdges];
        _edgeTypes = new int[maxEdges];
        _nextEdges = new int[maxEdges];
        
        // Initialize offsets to -1 (indicating no edges)
        for (int i = 0; i < maxNodes; i++)
        {
            _nodeOffsets[i] = -1;
        }
        
        _edgeCount = 0;
    }
}

2. Zero-Allocation Edge Insertion

Adding an edge involves updating the nextEdges pointer and writing to the parallel arrays. No objects are created. The nodeOffsets array is updated only when the first edge is added for a node.

public void AddEdge(int sourceNode, int targetNode, int relationType)
{
    if (_edgeCount >= _maxEdges)
        throw new InvalidOperationException("Edge capacity exceeded.");
    if (sourceNode >= _maxNodes || targetNode >= _maxNodes)
        throw new IndexOutOfRangeException("Node ID out of bounds.");

    int edgeIndex = _edgeCount++;
    
    // Insert at the head of the linked list for this node
    _destinations[edgeIndex] = targetNode;
    _edgeTypes[edgeIndex] = relationType;
    _nextEdges[edgeIndex] = _nodeOffsets[sourceNode];
    _nodeOffsets[sourceNode] = edgeIndex;
}

3. Zero-Allocation Traversal (BFS)

Traversal algorithms must avoid allocating queues or visited sets. This is achieved using Span<T> and stack allocation or pre-allocated buffers. The following Breadth-First Search (BFS) implementation uses a Span<int> for the queue, ensuring the traversal itself generates zero garbage.

public ReadOnlySpan<int> FindShortestPath(int startNode, int endNode, Span<int> pathBuffer)
{
    if (startNode == endNode) return Span<int>.Empty;

    // Pre-allocated visited array (reuse across calls in production)
    // For this example, we assume a separate visited tracker exists or is passed.
    // In a production bundle, use ArrayPool<int> for the visited array.
    
    Span<int> queue = stackalloc int[_maxNodes]; 
    int head = 0;
    int tail = 0;
    
    queue[tail++] = startNode;
    
    // Parent tracking for path reconstruction
    // In production, use a shared array to avoid allocation
    Span<int> parent = stackalloc int[_maxNodes];
    parent.Fill(-1);

    while (head < tail)
    {
        int current = queue[head++];
        
        if (current == endNode)
        {
            return ReconstructPath(parent, startNode, endNode, pathBuffer);
        }

        // Traverse adjacency list using Forward Star
        int edgeIdx = _nodeOffsets[current];
        while (edgeIdx != -1)
        {
            int neighbor = _destinations[edgeIdx];
            if (parent[neighbor] == -1 && neighbor != startNode)
            {
                parent[neighbor] = current;
                queue[tail++] = neighbor;
            }
            edgeIdx = _nextEdges[edgeIdx];
        }
    }

    return ReadOnlySpan<int>.Empty; // Path not found
}

4. Raw Memory Persistence

Serialization overhead is eliminated by treating the arrays as raw byte blocks. Using MemoryMarshal, the engine writes the exact memory layout of the arrays directly to disk. This bypasses JSON/XML parsers and reflection entirely.

public void SaveToDisk(string filePath)
{
    using var fs = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None);
    
    // Write metadata
    fs.Write(BitConverter.GetBytes(_maxNodes));
    fs.Write(BitConverter.GetBytes(_maxEdges));
    fs.Write(BitConverter.GetBytes(_edgeCount));
    
    // Write raw array bytes
    WriteRawArray(fs, _nodeOffsets);
    WriteRawArray(fs, _destinations);
    WriteRawArray(fs, _edgeTypes);
    WriteRawArray(fs, _nextEdges);
}

private void WriteRawArray(FileStream fs, int[] array)
{
    ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(array.AsSpan());
    fs.Write(bytes);
}

Architecture Rationale:

Forward Star over Adjacency Matrix: Adjacency matrices consume $O(N^2)$ memory, which is prohibitive for sparse graphs. Forward Star uses $O(V + E)$ space.
Parallel Arrays over Struct Arrays: While struct[] can be cache-friendly, parallel int[] arrays allow for more granular control over memory layout and easier raw serialization via MemoryMarshal.
Pre-allocation: By sizing arrays to maximum expected capacity, the engine avoids resizing operations that would trigger allocations and data copying during runtime.

Pitfall Guide

Sparse Node ID Mapping
- Explanation: If node IDs are UUIDs or large integers, using them directly as array indices wastes massive amounts of memory.
- Fix: Maintain a Dictionary<string, int> or a custom hash map to map sparse external IDs to dense internal indices ($0$ to $N-1$). The graph engine should only operate on dense indices.
Using List<int> for Edge Storage
- Explanation: List<T> introduces indirection and may trigger internal array resizing, causing allocations and GC pressure.
- Fix: Use fixed-size arrays or ArrayPool<int> for dynamic scenarios. Never use List<T> in the hot path of graph traversal.
Allocating BFS Queues
- Explanation: Instantiating Queue<int> or List<int> inside a traversal method generates garbage proportional to the graph size.
- Fix: Use Span<T> with stackalloc for small graphs, or reuse a pre-allocated array via ArrayPool<int> for larger traversals.
String-Based Edge Types
- Explanation: Storing relation types as strings in the edge arrays destroys cache locality and increases memory footprint.
- Fix: Enumerate relation types and store them as int or byte. Maintain a static lookup table for string-to-int mapping at the API boundary.
Ignoring Cache Line Boundaries
- Explanation: Accessing multiple arrays simultaneously can cause cache thrashing if the arrays are not aligned or if the traversal pattern jumps between them inefficiently.
- Fix: Ensure arrays are allocated contiguously where possible. Structure traversal loops to access _destinations and _nextEdges sequentially. Profile with hardware counters to verify cache hit rates.
JSON/XML Persistence
- Explanation: Text-based serialization adds CPU overhead and increases file size, slowing down load/save cycles.
- Fix: Use binary serialization via MemoryMarshal.AsBytes for raw arrays. This reduces persistence time from hundreds of milliseconds to tens of milliseconds.
Thread-Safety Assumptions
- Explanation: The Forward Star structure is not inherently thread-safe for concurrent writes. Multiple threads adding edges can corrupt the linked list pointers.
- Fix: Implement partition-based concurrency (e.g., lock per node range) or use lock-free append-only structures if multi-threaded writes are required. For read-heavy agent workloads, snapshot isolation is often sufficient.

Production Bundle

Action Checklist

Map Sparse IDs: Implement a dense ID mapper before initializing the graph engine.
Pre-allocate Capacity: Size arrays based on expected maximum nodes and edges to avoid runtime growth.
Use Span for Traversal: Replace all collection-based queues with Span<T> or pooled arrays in traversal algorithms.
Encode Edge Types: Convert all string relation types to integer enums at the ingestion layer.
Binary Persistence: Implement MemoryMarshal-based save/load routines for sub-50ms checkpointing.
Benchmark GC: Run traversal loops under a profiler to verify zero allocations in the hot path.
Validate Round-Trip: Ensure data integrity after save/load cycles by comparing graph hashes.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
AI Agent Context (< 10M edges)	Forward Star C# Engine	Sub-millisecond latency, zero-allocation, embeddable.	Low (No external infra).
Rapid Prototyping (< 100k edges)	OOP Graph / In-Memory DB	Development speed, ease of use.	Low.
Distributed Graph (> 100M edges)	Neo4j / JanusGraph	Horizontal scaling, ACID transactions, cluster management.	High (Infra/DevOps).
Relational Queries + Graph	SQL with Recursive CTEs	Leverages existing SQL expertise, good for shallow graphs.	Medium.

Configuration Template

Use this builder pattern to initialize the graph engine with production-ready settings.

public class GraphBuilder
{
    private int _maxNodes = 1_000_000;
    private int _maxEdges = 5_000_000;
    private string _persistencePath = "graph.bin";

    public GraphBuilder WithCapacity(int nodes, int edges)
    {
        _maxNodes = nodes;
        _maxEdges = edges;
        return this;
    }

    public GraphBuilder WithPersistence(string path)
    {
        _persistencePath = path;
        return this;
    }

    public GraphEngine Build()
    {
        var engine = new GraphEngine(_maxNodes, _maxEdges);
        
        if (File.Exists(_persistencePath))
        {
            engine.LoadFromDisk(_persistencePath);
        }
        
        return engine;
    }
}

Quick Start Guide

Initialize: Create a GraphEngine with estimated capacity.

var graph = new GraphEngine(maxNodes: 200_000, maxEdges: 500_000);

Ingest Data: Add nodes and edges using integer IDs and types.

graph.AddEdge(1, 2, relationType: 10); // 1 -> 2 of type 10
graph.AddEdge(2, 3, relationType: 10);

Traverse: Execute a shortest path search using a pre-allocated buffer.

Span<int> pathBuffer = stackalloc int[100];
var path = graph.FindShortestPath(1, 3, pathBuffer);

Persist: Save the graph state to disk in milliseconds.
```
graph.SaveToDisk("agent_memory.bin");
```

This architecture enables AI agents to maintain rich, relational memory with performance characteristics that rival native CPU operations, eliminating the need for heavy external dependencies while ensuring deterministic latency.

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

7-day free trial · Cancel anytime · 30-day money-back