Treasure Hunt Engine: When Veltrix Defaults Buried 800k Documents in a Hot Partition
Geo-Hash Routing and Shard Balancing in High-Throughput Indexing Pipelines
Current Situation Analysis
Spatial indexing workloads introduce a fundamental mismatch between search engine defaults and production reality. Most indexing platforms, including Veltrix, ship with allocation strategies optimized for uniform key distribution. These defaults assume document identifiers are randomly scattered across the keyspace, allowing hash-based routing to naturally balance load across available shards. In practice, geo-spatial datasets violate this assumption. Geographic coordinates cluster around population centers, transit hubs, and commercial districts. When you map these coordinates to geo-hash prefixes, the resulting keyspace exhibits extreme skew.
This skew is frequently overlooked during initial architecture planning. Engineering teams typically validate indexing pipelines against synthetic datasets with uniform distribution, or they rely on development cluster configurations that mask hot partitions. The problem compounds when nightly backfill jobs run alongside live query traffic. A single dominant geo-hash prefix can route the majority of incoming documents to one primary shard. That shard absorbs disproportionate write pressure, triggers aggressive garbage collection, and exhausts heap memory. Meanwhile, other shards remain underutilized, creating a classic hot-partition scenario that degrades both ingestion throughput and query latency.
The operational impact is measurable and severe. In a production environment processing 60 million geo-treasure records nightly, default allocation routed 79% of documents to a single shard. That shard accumulated 800,000 documents within a 12 GB segment. The resulting heap pressure pushed maximum memory usage to 27 GB, with p99 garbage collection pauses reaching 1.8 seconds. The UI layer, which relies on 95% single-geo-hash lookups, experienced p95 latency degradation to 4.6 seconds during backfill windows. The root cause was not insufficient hardware or misconfigured query patterns; it was an unexamined assumption about shard distribution mechanics.
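Skew of this kind can usually be spotted before any document is indexed by counting how often each geo-hash prefix occurs. The sketch below is a minimal illustration, not part of the original pipeline: `dominantPrefixShare` is a hypothetical helper and the sample prefixes are made up.

```typescript
// Hypothetical helper: counts prefix frequencies and reports the
// share of documents held by the most common prefix.
function dominantPrefixShare(prefixes: string[]): { prefix: string; share: number } {
  const counts = new Map<string, number>();
  for (const p of prefixes) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  let top = '';
  let topCount = 0;
  counts.forEach((count, prefix) => {
    if (count > topCount) {
      top = prefix;
      topCount = count;
    }
  });
  // Guard against an empty input so the share is never NaN.
  const share = prefixes.length === 0 ? 0 : topCount / prefixes.length;
  return { prefix: top, share };
}

// Made-up sample: 8 of 10 documents share one prefix.
const samplePrefixes = ['u4pr', 'u4pr', 'u4pr', 'u4pr', 'u4pr', 'u4pr', 'u4pr', 'u4pr', 'gcpv', '9q8y'];
const audit = dominantPrefixShare(samplePrefixes);
console.log(audit.prefix, audit.share); // u4pr 0.8
```

A dominant share far above `1 / shardCount` is a hint that prefix-based routing alone will concentrate writes.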
WOW Moment: Key Findings
The turning point came when we replaced Veltrix's default hash-based routing with a prefix-aware salting strategy and explicitly controlled shard topology. The metrics shifted dramatically, transforming an unstable backfill pipeline into a predictable, SLO-compliant process.
| Approach | Nightly Duration | Heap Max | GC p99 Pause | Indexing Throughput | UI p95 Latency |
|---|---|---|---|---|---|
| Default Allocation | 42 min | 27 GB | 1.8 s | 40 k docs/s | 4.6 s |
| Optimized Routing | 11 min ± 90 s | 8.2 GB | 180 ms | 140 k docs/s | 120 ms |
This finding matters because it decouples ingestion performance from data distribution skew. By forcing even distribution across a fixed shard count, we eliminated the write amplification caused by hot primaries. The reduction in heap pressure directly correlates with shorter GC pauses, which stabilizes both indexing and query paths. More importantly, the optimized approach scales predictably: the same topology handled 90 million documents without architectural changes, maintaining p95 insert latency at 120 ms. This enables engineering teams to treat geo-spatial indexing as a deterministic pipeline rather than a reactive firefighting exercise.
Core Solution
The solution rests on three pillars: explicit shard topology planning, deterministic routing key generation, and continuous distribution monitoring. Each component addresses a specific failure mode in the default configuration.
Step 1: Shard Topology Determination
We fixed the shard count at 12 for a 6-node cluster. This choice balances three constraints:
- Node Distribution: 12 shards allow 2 primary shards per node, leaving headroom for replica placement and future node additions.
- Disk I/O Limits: AWS t3.2xlarge instances provision 800 MB/s baseline disk throughput. With 12 shards, write pressure stays under 400 MB/s per volume, avoiding EBS burst credit exhaustion during peak backfill windows.
- Query Parallelism: 12 shards provide sufficient parallelism for geo-hash range queries without introducing excessive coordination overhead.
Replicas were set to 1. While additional replicas improve read availability, they multiply write amplification. Since the backfill pipeline tolerates up to 5 seconds of read staleness, a single replica minimizes cross-node replication traffic while preserving fault tolerance.
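The arithmetic behind this topology can be written down as a small planning function. This is only a sketch: `planTopology` is a hypothetical name, and the per-primary figure assumes perfectly even routing, which is the goal of the later steps rather than a given.

```typescript
interface TopologyPlan {
  primariesPerNode: number;
  totalShardCopies: number;
  copiesPerNode: number;
  docsPerPrimary: number;
}

// Hypothetical planner: derives per-node and per-shard figures from
// the cluster size, shard count, replica count and nightly volume.
function planTopology(nodes: number, primaries: number, replicas: number, nightlyDocs: number): TopologyPlan {
  const totalShardCopies = primaries * (1 + replicas);
  return {
    primariesPerNode: primaries / nodes,
    totalShardCopies,
    copiesPerNode: totalShardCopies / nodes,
    // Assumes documents are spread evenly across primaries.
    docsPerPrimary: nightlyDocs / primaries,
  };
}

// Figures from the text: 6 nodes, 12 primaries, 1 replica, 60 million records.
const plan = planTopology(6, 12, 1, 60000000);
console.log(plan);
// 2 primaries per node, 24 shard copies, 4 copies per node, 5,000,000 docs per primary
```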
Step 2: Deterministic Routing Key Generation
Veltrix routes documents based on the routing parameter. The default behavior hashes the document ID, which fails when IDs share common prefixes. We replaced this with a custom routing function that combines the geo-hash prefix with a pseudo-random salt, then applies a modulo operation against the shard count.
import { createHash } from 'crypto';
export class SpatialRoutingKey {
  private readonly shardCount: number;
  private readonly saltRange: number;
  constructor(shardCount: number = 12, saltRange: number = 144) {
    this.shardCount = shardCount;
    this.saltRange = saltRange;
  }
  /**
   * Generates a routing key that spreads documents sharing a geo-hash
   * prefix across shards. Note: the salt is random, so a reader cannot
   * recompute the key from the prefix alone.
   */
  generate(coordinatePrefix: string): string {
    const salt = this._generateSalt();
    const rawKey = `${coordinatePrefix}:${salt}`;
    const hash = createHash('sha256').update(rawKey).digest('hex');
    // Use the first 8 hex characters (32 bits) of the digest as an integer.
    const numericHash = parseInt(hash.slice(0, 8), 16);
    const shardIndex = numericHash % this.shardCount;
    return `geo_route_${shardIndex}`;
  }
  // Picks a random salt in [0, saltRange) and renders it as 3 hex digits.
  private _generateSalt(): string {
    const saltValue = Math.floor(Math.random() * this.saltRange);
    return saltValue.toString(16).padStart(3, '0');
  }
}
The salt range was expanded from a single byte taken modulo 12 (0-11) to a 16-bit value taken modulo 144 (0-143) after an initial deployment revealed that 1.4 million documents sharing the same prefix still overloaded a single shard. The expanded range reduced maximum shard size to 180,000 documents, eliminating the outlier condition.
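One way to see why the wider salt range helps is to enumerate every salt value for a single prefix and count how many salted keys land on each shard. The sketch below reuses the hash-then-modulo step from `SpatialRoutingKey.generate`; `saltSpread` is a hypothetical helper and `u4pr` is a made-up prefix. With 12 salt values there are only 12 distinct keys for 12 shards, so some shards will almost certainly receive none; with 144 values each shard expects 12 keys.

```typescript
import { createHash } from 'crypto';

// Hypothetical helper: for one prefix, counts how many of the possible
// salted keys map to each shard index.
function saltSpread(prefix: string, saltRange: number, shardCount: number): number[] {
  const perShard = new Array<number>(shardCount).fill(0);
  for (let salt = 0; salt < saltRange; salt++) {
    const rawKey = `${prefix}:${salt.toString(16).padStart(3, '0')}`;
    const hash = createHash('sha256').update(rawKey).digest('hex');
    const shardIndex = parseInt(hash.slice(0, 8), 16) % shardCount;
    perShard[shardIndex] += 1;
  }
  return perShard;
}

const narrowSpread = saltSpread('u4pr', 12, 12);
const wideSpread = saltSpread('u4pr', 144, 12);
console.log('salt range 12 :', narrowSpread.join(' '));
console.log('salt range 144:', wideSpread.join(' '));
```

Because each salt is chosen uniformly at random, a shard's share of a prefix is proportional to the number of salt values that map to it, so the printed counts approximate the write distribution for that prefix.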
Step 3: Bulk Ingestion Pipeline
The backfill job targets Veltrix's bulk endpoint with explicit routing. Instead of relying on automatic distribution, each batch carries a routing parameter that forces documents to their assigned shard.
import { Client } from '@veltrix/client';
export class GeoIndexPipeline {
  private readonly client: Client;
  private readonly router: SpatialRoutingKey;
  private readonly indexName: string;
  constructor(client: Client, indexName: string) {
    this.client = client;
    this.indexName = indexName;
    this.router = new SpatialRoutingKey(12, 144);
  }
  async ingestBatch(documents: Array<{ id: string; geoPrefix: string; payload: Record<string, unknown> }>): Promise<void> {
    const actions: string[] = [];
    for (const doc of documents) {
      const routingKey = this.router.generate(doc.geoPrefix);
      // Action line: tells the bulk endpoint where to index the document.
      actions.push(JSON.stringify({
        index: {
          _index: this.indexName,
          _id: doc.id,
          routing: routingKey
        }
      }));
      // Source line: the document body itself.
      actions.push(JSON.stringify(doc.payload));
    }
    const body = actions.join('\n') + '\n';
    await this.client.bulk({
      index: this.indexName,
      body,
      refresh: false,
      timeout: '30s'
    });
  }
}
Key architectural choices in this implementation:
- refresh: false defers segment merging until the backfill completes, reducing I/O contention.
- Explicit routing parameter bypasses Veltrix's default hash allocator.
- Batch construction uses newline-delimited JSON to match Veltrix's bulk protocol requirements.
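To make the newline-delimited layout concrete, the sketch below builds the same kind of body as `ingestBatch` but as a pure function that can be inspected without a cluster. `buildBulkBody`, the index name `geo-treasure` and the sample documents are illustrative assumptions, and the routing keys are fixed here rather than salted.

```typescript
interface BulkDoc {
  id: string;
  routing: string;
  payload: Record<string, unknown>;
}

// Hypothetical helper: emits one action line and one source line per
// document, each terminated by a newline.
function buildBulkBody(indexName: string, docs: BulkDoc[]): string {
  const lines: string[] = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName, _id: doc.id, routing: doc.routing } }));
    lines.push(JSON.stringify(doc.payload));
  }
  // An empty batch produces an empty body rather than a lone newline.
  return lines.length === 0 ? '' : lines.join('\n') + '\n';
}

const bulkBody = buildBulkBody('geo-treasure', [
  { id: 't-1', routing: 'geo_route_3', payload: { treasure_key: 'alpha' } },
  { id: 't-2', routing: 'geo_route_7', payload: { treasure_key: 'beta' } },
]);
console.log(bulkBody);
```

Two documents yield four lines: action, source, action, source.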
Step 4: Distribution Monitoring Loop
Shard balance degrades over time as data distribution shifts. We implemented a monitoring loop that queries _cat/shards every 30 seconds during ingestion. If any shard exceeds 20% above the mean document count, the pipeline pauses and triggers an alert.
export class ShardDistributionMonitor {
  private readonly client: Client;
  private readonly indexName: string;
  private readonly thresholdPercent: number;
  constructor(client: Client, indexName: string, thresholdPercent: number = 20) {
    this.client = client;
    this.indexName = indexName;
    this.thresholdPercent = thresholdPercent;
  }
  async checkDistribution(): Promise<{ balanced: boolean; maxDeviation: number }> {
    const response = await this.client.cat.shards({
      index: this.indexName,
      format: 'json'
    });
    const shardSizes: number[] = response.map((shard: any) => Number(shard.docsCount));
    const total = shardSizes.reduce((a, b) => a + b, 0);
    // An empty index (or no shards) has nothing to balance; avoid dividing by zero.
    if (total === 0) {
      return { balanced: true, maxDeviation: 0 };
    }
    const mean = total / shardSizes.length;
    const maxShard = Math.max(...shardSizes);
    // Percentage by which the largest shard exceeds the mean document count.
    const deviation = ((maxShard - mean) / mean) * 100;
    return {
      balanced: deviation <= this.thresholdPercent,
      maxDeviation: deviation
    };
  }
}
This loop integrates with the backfill orchestrator to implement backpressure. When imbalance exceeds the threshold, the pipeline throttles ingestion to the affected prefix until the cluster rebalances or manual intervention occurs.
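A backpressure loop of this kind might look like the sketch below. It is a simplification under stated assumptions: it pauses the whole pipeline rather than a single prefix, it checks balance before every batch instead of every 30 seconds, and `ingestWithBackpressure` and its options are hypothetical names. The `check` callback has the same return shape as `checkDistribution`.

```typescript
type DistributionCheck = () => Promise<{ balanced: boolean; maxDeviation: number }>;

// Hypothetical orchestrator: waits while shards are imbalanced, then
// ingests the next batch. Gives up after maxRetries consecutive waits.
async function ingestWithBackpressure<T>(
  batches: T[][],
  ingest: (batch: T[]) => Promise<void>,
  check: DistributionCheck,
  opts: { maxRetries: number; waitMs: number },
): Promise<{ ingested: number; paused: number }> {
  let ingested = 0;
  let paused = 0;
  for (const batch of batches) {
    let attempts = 0;
    while (!(await check()).balanced) {
      paused += 1;
      attempts += 1;
      if (attempts > opts.maxRetries) {
        throw new Error('shard imbalance persisted; manual intervention required');
      }
      // Back off before checking the distribution again.
      await new Promise<void>((resolve) => setTimeout(resolve, opts.waitMs));
    }
    await ingest(batch);
    ingested += 1;
  }
  return { ingested, paused };
}
```

In use, `ingest` would wrap `GeoIndexPipeline.ingestBatch` and `check` would wrap `ShardDistributionMonitor.checkDistribution`; the thrown error stands in for the alert described above.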
Pitfall Guide
1. Trusting Default Shard Allocation
Explanation: Veltrix's default allocator assumes uniform key distribution. Geo-hash prefixes violate this assumption, causing hash collisions that concentrate writes on a single primary shard.
Fix: Replace default routing with a deterministic, prefix-aware salting strategy. Validate distribution against production-like data before deployment.
2. Over-Provisioning Replicas to Mask Hotspots
Explanation: Adding replicas appears to solve read latency issues, but replicas multiply write amplification. The primary shard still absorbs all indexing operations, and replication traffic consumes network and disk I/O.
Fix: Keep replicas at 1 during heavy ingestion. Accept controlled read staleness (up to 5 seconds) to preserve write throughput. Scale replicas only after backfill windows close.
3. Ignoring EBS Burst Credit Limits
Explanation: AWS t3.2xlarge instances provision 800 MB/s baseline throughput but rely on burst credits for sustained peaks. Unbounded shard counts or oversized segments exhaust credits, throttling I/O and stalling backfills.
Fix: Calculate per-shard I/O pressure. With 12 shards, maintain write throughput under 400 MB/s per volume. Monitor BurstBalance metrics and adjust shard count if credits deplete faster than they regenerate.
4. Single-Byte Salting Range
Explanation: A modulo 12 salt range (0-11) provides insufficient entropy for highly skewed prefixes. Dense urban clusters can still route millions of documents to the same shard.
Fix: Expand salting to a 16-bit value taken modulo 144. This yields 12x as many distinct salt values, capping maximum shard size at ~180k documents even for dominant prefixes.
5. Missing Shard-Level Metrics
Explanation: Cluster-level metrics (total heap, average CPU) mask hot partitions. Without per-shard document counts and segment sizes, imbalance remains invisible until GC pauses or query timeouts occur.
Fix: Instrument ShardSizeBytes and ShardDocCount metrics from day one. Poll _cat/shards during ingestion and alert on deviation thresholds.
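A per-shard check can go one step further than a single maximum deviation and name the shards that are over the limit. The sketch below assumes the per-shard statistics have already been collected; `ShardStat`, `findHotShards` and the sample numbers are illustrative only.

```typescript
interface ShardStat {
  shard: number;
  docCount: number;
  sizeBytes: number;
}

// Hypothetical helper: returns the shards whose document count exceeds
// the mean by more than thresholdPercent.
function findHotShards(stats: ShardStat[], thresholdPercent: number): ShardStat[] {
  if (stats.length === 0) return [];
  const mean = stats.reduce((sum, s) => sum + s.docCount, 0) / stats.length;
  if (mean === 0) return [];
  const limit = mean * (1 + thresholdPercent / 100);
  return stats.filter((s) => s.docCount > limit);
}

// Made-up sample: mean is 120,000 docs, so the 20% limit is 144,000.
const sampleStats: ShardStat[] = [
  { shard: 0, docCount: 100000, sizeBytes: 1500000000 },
  { shard: 1, docCount: 100000, sizeBytes: 1500000000 },
  { shard: 2, docCount: 100000, sizeBytes: 1500000000 },
  { shard: 3, docCount: 180000, sizeBytes: 2700000000 },
];
const hotShards = findHotShards(sampleStats, 20);
console.log(hotShards.map((s) => s.shard)); // [ 3 ]
```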
6. Skipping Chaos Testing
Explanation: Synthetic uniform datasets pass staging validation but fail in production. Hot partitions only manifest under real distribution patterns.
Fix: Run a chaos day before first backfill. Deliberately inject 200k+ documents into a single prefix on a staging cluster. Observe GC behavior, heap pressure, and routing mechanics. Adjust topology before production deployment.
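For the chaos run, a deliberately skewed dataset can be generated rather than sampled. The sketch below produces documents in the shape that `ingestBatch` accepts; `buildSkewedDataset`, the prefixes and the payload are hypothetical, and the counts are only an example of the 200k+ injection described above.

```typescript
interface SyntheticDoc {
  id: string;
  geoPrefix: string;
  payload: Record<string, unknown>;
}

// Hypothetical generator: puts hotCount documents under one prefix and
// perOtherCount documents under each of the remaining prefixes.
function buildSkewedDataset(
  hotPrefix: string,
  hotCount: number,
  otherPrefixes: string[],
  perOtherCount: number,
): SyntheticDoc[] {
  const docs: SyntheticDoc[] = [];
  const add = (geoPrefix: string): void => {
    docs.push({ id: `chaos-${docs.length}`, geoPrefix, payload: { synthetic: true } });
  };
  for (let i = 0; i < hotCount; i++) add(hotPrefix);
  for (const prefix of otherPrefixes) {
    for (let i = 0; i < perOtherCount; i++) add(prefix);
  }
  return docs;
}

const chaosDataset = buildSkewedDataset('u4pr', 200000, ['gcpv', '9q8y'], 1000);
console.log(chaosDataset.length); // 202000
```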
Production Bundle
Action Checklist
- Audit geo-hash distribution: Run frequency analysis on production prefixes to identify dominant clusters before configuring shards.
- Fix shard count to 12: Align with node count and EBS throughput limits. Avoid dynamic shard scaling during ingestion.
- Implement prefix-aware routing: Replace default hash allocation with salting strategy using 16-bit modulo range.
- Configure bulk ingestion with explicit routing: Pass the routing parameter in every bulk request. Disable auto-refresh during backfill.
- Deploy distribution monitoring: Poll _cat/shards every 30s. Implement 20% deviation threshold with automatic backpressure.
- Set replicas to 1: Minimize write amplification. Accept 5s read staleness during ingestion windows.
- Instrument per-shard metrics: Track ShardSizeBytes, ShardDocCount, and GC_Pause_p99 from staging onward.
- Execute chaos validation: Inject skewed data on staging cluster. Verify GC stability and routing behavior before production backfill.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| < 10M docs, uniform distribution | Default Veltrix allocation | Low skew, defaults perform adequately | Baseline infrastructure cost |
| 10M-60M docs, geo-spatial clustering | 12 shards, explicit routing, 1 replica | Prevents hot partitions, maintains I/O limits | +15% compute for monitoring, -40% GC overhead |
| > 90M docs, extreme prefix skew | 12 shards, 16-bit salting, backpressure loop | Handles 1.4M+ doc prefixes without shard collapse | +10% storage for routing metadata, stable SLOs |
| Read-heavy post-backfill | Scale replicas to 2-3 | Improves query parallelism after ingestion completes | +20-30% infrastructure cost, improved UI latency |
Configuration Template
{
  "settings": {
    "index": {
      "number_of_shards": 12,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "routing": {
        "allocation": {
          "awareness": "enabled",
          "balance_threshold": 1.2
        }
      },
      "merge": {
        "scheduler": {
          "max_thread_count": 2
        }
      }
    },
    "analysis": {
      "analyzer": {
        "geo_prefix": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "trim"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "treasure_key": {
        "type": "keyword",
        "doc_values": true
      },
      "geo_hash_prefix": {
        "type": "keyword",
        "index": false
      },
      "spatial_coordinates": {
        "type": "geo_point"
      },
      "metadata": {
        "type": "object",
        "dynamic": true
      }
    }
  }
}
Quick Start Guide
- Initialize Index Topology: Apply the configuration template above to create a 12-shard index with 1 replica. Verify shard distribution using GET /_cat/shards?v.
- Deploy Routing Client: Integrate the SpatialRoutingKey class into your ingestion service. Ensure every document carries a geoPrefix field matching your production geo-hash scheme.
- Configure Bulk Pipeline: Replace default indexing calls with the GeoIndexPipeline implementation. Pass explicit routing parameters and disable refresh during backfill windows.
- Activate Monitoring Loop: Deploy the ShardDistributionMonitor alongside your orchestrator. Configure alerts for >20% shard deviation and integrate backpressure logic.
- Validate with Staged Ingestion: Run a 50k-document test batch using production-like geo-hash distribution. Confirm heap stays under 8 GB, GC p99 remains below 200 ms, and shard counts stay within threshold before scaling to full nightly volume.
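The staged-ingestion criteria can be encoded as a simple gate so the decision to scale up is not made by eye. This is a sketch only: `StagingMetrics` and `stagingGate` are hypothetical names, and the metrics are assumed to be collected elsewhere.

```typescript
interface StagingMetrics {
  heapMaxGb: number;
  gcP99Ms: number;
  maxShardDeviationPercent: number;
}

// Hypothetical gate: returns the list of failed criteria; an empty
// list means the staged batch met every threshold named in the guide.
function stagingGate(m: StagingMetrics): string[] {
  const failures: string[] = [];
  if (m.heapMaxGb >= 8) failures.push(`heap max ${m.heapMaxGb} GB is not under 8 GB`);
  if (m.gcP99Ms >= 200) failures.push(`GC p99 ${m.gcP99Ms} ms is not below 200 ms`);
  if (m.maxShardDeviationPercent > 20) {
    failures.push(`shard deviation ${m.maxShardDeviationPercent}% exceeds 20%`);
  }
  return failures;
}

const passing = stagingGate({ heapMaxGb: 7.5, gcP99Ms: 180, maxShardDeviationPercent: 12 });
const failing = stagingGate({ heapMaxGb: 27, gcP99Ms: 1800, maxShardDeviationPercent: 79 });
console.log(passing.length, failing.length); // 0 3
```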
