0 client-side.
2. Time-slicing for scale: Break large date ranges into discrete windows to stay under the 1,000-result cap.
3. Endpoint routing by use case: Use /search for topic discovery, /search_by_date for chronological feeds.
Implementation (TypeScript)
The following module demonstrates a typed, production-grade client. It abstracts parameter encoding, implements automatic time-slicing, and normalizes the Algolia response envelope.
interface HNQueryConfig {
  query: string;
  tags: string;
  minPoints: number;
  minComments: number;
  windowDays: number;
  maxPerPage: number;
}

interface HNSearchResponse {
  hits: Array<{
    objectID: string;
    title: string | null; // null on comment hits
    url: string | null;   // null on text posts such as Ask HN
    author: string;
    points: number;
    num_comments: number;
    created_at: string;   // ISO 8601 string
    created_at_i: number; // Unix timestamp in seconds
  }>;
  nbHits: number;
  page: number;
  nbPages: number;
  hitsPerPage: number;
}
class HNIndexClient {
  private readonly baseUrl: string = 'https://hn.algolia.com/api/v1';
  private readonly defaultLimit: number = 50;

  async fetchChronologicalFeed(config: HNQueryConfig): Promise<HNSearchResponse['hits']> {
    const allResults: HNSearchResponse['hits'] = [];
    const now = Math.floor(Date.now() / 1000);
    const windowSize = 7 * 24 * 3600; // 7-day slices to stay under pagination cap
    const rangeStart = now - config.windowDays * 24 * 3600;
    let currentEnd = now + 1; // exclusive upper bound, so items created this second are included
    while (currentEnd > rangeStart) {
      // Clamp the oldest slice so it never reaches past the requested range.
      const currentStart = Math.max(currentEnd - windowSize, rangeStart);
      const batch = await this.executeBatch({
        ...config,
        timeRange: { start: currentStart, end: currentEnd }
      });
      allResults.push(...batch);
      currentEnd = currentStart;
    }
    return allResults.sort((a, b) => b.created_at_i - a.created_at_i);
  }

  private async executeBatch(config: HNQueryConfig & { timeRange: { start: number; end: number } }): Promise<HNSearchResponse['hits']> {
    const hits: HNSearchResponse['hits'] = [];
    const hitsPerPage = Math.min(config.maxPerPage, this.defaultLimit);
    let page = 0;
    let nbPages = 1;
    // Walk every page of the window; requesting only page 0 would silently drop later hits.
    while (page < nbPages) {
      const params = new URLSearchParams({
        query: config.query,
        tags: config.tags,
        // Half-open window [start, end): adjacent slices never return the same item twice.
        numericFilters: `points>${config.minPoints},num_comments>${config.minComments},created_at_i>=${config.timeRange.start},created_at_i<${config.timeRange.end}`,
        hitsPerPage: String(hitsPerPage),
        page: String(page)
      });
      const endpoint = `${this.baseUrl}/search_by_date?${params.toString()}`;
      const response = await fetch(endpoint);
      if (!response.ok) {
        throw new Error(`HN Index request failed: ${response.status} ${response.statusText}`);
      }
      const data: HNSearchResponse = await response.json();
      hits.push(...data.hits);
      nbPages = data.nbPages;
      page += 1;
    }
    return hits;
  }
}
Architecture Decisions & Rationale
- Time-Slicing Strategy: The 1,000-result pagination ceiling is enforced at the index level. By chunking queries into 7-day windows, each batch is far more likely to stay below the threshold while maintaining chronological integrity; a window that still approaches the cap should be narrowed further. This avoids the silent data loss that occurs when deep pagination is attempted.
- Endpoint Selection: /search_by_date is explicitly chosen for chronological feeds. The /search endpoint applies Algolia’s relevance scoring, which reorders results based on text match weight and engagement metrics. Using /search for time-series data therefore yields non-chronological ordering.
- Server-Side Numeric Filters: Conditions for points, num_comments, and created_at_i are concatenated with commas (AND logic) and passed directly to numericFilters. This shifts computational load to Algolia’s index, reducing payload size and network overhead.
- Type Safety & Envelope Mapping: The Algolia response wraps results in a structured envelope (nbHits, page, nbPages). Mapping this to a strict TypeScript interface catches property-access mistakes at compile time and enables IDE autocompletion for downstream processing. The interface is not checked at runtime, so a malformed payload still needs explicit validation.
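Because TypeScript interfaces are erased at compile time, the envelope can also be checked at runtime before it is trusted. The following is a minimal sketch of such a check; `SearchEnvelope` and `isSearchEnvelope` are hypothetical names introduced for illustration, and the guard only inspects the envelope fields named above, not the individual hits.

```typescript
// Hypothetical envelope shape: only the fields checked by the guard below.
interface SearchEnvelope {
  hits: unknown[];
  nbHits: number;
  page: number;
  nbPages: number;
}

// Type guard: returns true only when the envelope fields have the expected types.
function isSearchEnvelope(value: unknown): value is SearchEnvelope {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    Array.isArray(v['hits']) &&
    typeof v['nbHits'] === 'number' &&
    typeof v['page'] === 'number' &&
    typeof v['nbPages'] === 'number'
  );
}
```

A caller could run the parsed JSON through this guard and throw a descriptive error when it returns false, rather than letting an undefined property surface later in the pipeline.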
Pitfall Guide
1. Ignoring the 1,000-Result Pagination Ceiling
Explanation: Algolia’s standard index caps retrieval at ~1,000 results per query. Attempting to iterate page beyond this limit returns empty arrays without error codes.
Fix: Implement time-slicing or domain/tag partitioning. Break broad queries into discrete created_at_i windows and aggregate results client-side.
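The windowing step in this fix can be sketched as a pure helper, kept separate from any network code so it is easy to test. `TimeWindow` and `buildTimeWindows` are hypothetical names, and the half-open `[start, end)` convention is an assumption chosen so that adjacent windows never share a timestamp.

```typescript
// Hypothetical window type: start is inclusive, end is exclusive (Unix seconds).
interface TimeWindow {
  start: number;
  end: number;
}

// Split [rangeStart, rangeEnd) into windows of at most windowSeconds, newest first.
function buildTimeWindows(rangeStart: number, rangeEnd: number, windowSeconds: number): TimeWindow[] {
  if (windowSeconds <= 0) {
    throw new Error('windowSeconds must be positive');
  }
  const windows: TimeWindow[] = [];
  let end = rangeEnd;
  while (end > rangeStart) {
    // Clamp the oldest window so it does not reach past rangeStart.
    const start = Math.max(end - windowSeconds, rangeStart);
    windows.push({ start, end });
    end = start;
  }
  return windows;
}
```

Each window then becomes one pair of created_at_i bounds in numericFilters. If a single window still returns close to 1,000 hits, calling the helper again with a smaller `windowSeconds` for that sub-range is one way to narrow it.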
2. Mixing Relevance and Recency Endpoints
Explanation: /search ranks by text relevance and engagement weight. /search_by_date ranks strictly by timestamp. Using the wrong endpoint corrupts chronological dashboards or topic discovery workflows.
Fix: Route /search for keyword/topic exploration. Route /search_by_date for feeds, monitors, and time-bound exports.
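One way to make this routing explicit is to select the endpoint from a named use case instead of hard-coding paths at each call site. This is a small sketch; `HNUseCase`, `ENDPOINT_BY_USE_CASE`, and `buildSearchUrl` are hypothetical names, while the two paths are the ones described above.

```typescript
// Hypothetical use-case labels; the paths are the two endpoints discussed above.
type HNUseCase = 'topic-discovery' | 'chronological-feed';

const HN_BASE_URL = 'https://hn.algolia.com/api/v1';

const ENDPOINT_BY_USE_CASE: Record<HNUseCase, string> = {
  'topic-discovery': '/search',
  'chronological-feed': '/search_by_date'
};

// Build a full request URL for the given use case and query parameters.
function buildSearchUrl(useCase: HNUseCase, params: Record<string, string>): string {
  const queryString = new URLSearchParams(params).toString();
  return `${HN_BASE_URL}${ENDPOINT_BY_USE_CASE[useCase]}?${queryString}`;
}
```

Because the mapping is a `Record` keyed by the union type, adding a new use case without assigning it an endpoint fails at compile time.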
3. Client-Side Filtering After Fetch
Explanation: Fetching 1,000 rows and filtering for points > 200 client-side wastes bandwidth, hits rate limits faster, and increases latency.
Fix: Push all numeric constraints to numericFilters. The API evaluates them server-side before returning the payload.
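As an illustration of pushing constraints to the server, the filter string can be assembled from structured constraints rather than by hand. `NumericConstraint` and `buildNumericFilters` are hypothetical names; the field names and the comma-as-AND rule are the ones described in this article.

```typescript
// Hypothetical constraint shape covering the numeric fields used in this article.
interface NumericConstraint {
  field: 'points' | 'num_comments' | 'created_at_i';
  op: '<' | '<=' | '=' | '>' | '>=';
  value: number;
}

// Join constraints with commas, which the API treats as AND.
function buildNumericFilters(constraints: NumericConstraint[]): string {
  return constraints
    .map((c) => {
      if (!Number.isFinite(c.value)) {
        throw new Error(`Non-finite value for ${c.field}`);
      }
      return `${c.field}${c.op}${c.value}`;
    })
    .join(',');
}
```

For example, `buildNumericFilters([{ field: 'points', op: '>', value: 200 }])` yields `points>200`, which replaces the client-side filter from the explanation above.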
4. Misconstructing Tag Logic
Explanation: Tags use commas for AND and parentheses for OR. Writing tags=story,comment requires an item to be both a story and a comment at the same time, so it matches nothing. (tags=story,show_hn, by contrast, is a valid intersection, because Show HN posts also carry the story tag.)
Fix: Use tags=(story,poll) for union logic. Use tags=story,author_pg for intersection. Validate tag combinations against the documented enum (story, comment, ask_hn, show_hn, poll, author_<username>).
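The comma and parenthesis rules are easy to get wrong when strings are concatenated by hand, so a pair of tiny helpers can make the intent visible. `allOf` and `anyOf` are hypothetical names for this sketch; they only format strings and do not validate tag names.

```typescript
// AND: every tag must be present on the item.
function allOf(...tags: string[]): string {
  return tags.join(',');
}

// OR: at least one of the tags must be present; parentheses mark the group.
function anyOf(...tags: string[]): string {
  return `(${tags.join(',')})`;
}

// Items by author "pg" that are either a story or a poll.
const pgStoriesOrPolls = allOf('author_pg', anyOf('story', 'poll'));
```

Here `pgStoriesOrPolls` evaluates to `author_pg,(story,poll)`, which can be passed as the tags parameter.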
5. Ignoring the Unofficial Rate Budget
Explanation: The ~10,000 requests/hour/IP guideline is not a published SLA. Exceeding it triggers silent throttling or temporary IP blocks, especially during backfills.
Fix: Implement exponential backoff, request queuing, and response caching. Add jitter to scheduled jobs to avoid thundering herd patterns.
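A minimal sketch of exponential backoff with jitter follows. `backoffDelayMs` and `withRetry` are hypothetical helpers; the defaults (3 attempts, 1,000 ms base delay, 500 ms jitter) are assumptions chosen to mirror the retry block in the configuration template, and the random source is injectable so the delay calculation can be tested deterministically.

```typescript
// Delay for a given zero-based attempt: base * 2^attempt plus random jitter.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random
): number {
  const exponential = baseDelayMs * Math.pow(2, attempt);
  const jitter = Math.floor(random() * jitterMs);
  return exponential + jitter;
}

// Run task up to maxAttempts times, sleeping between failed attempts.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
  jitterMs: number = 500
): Promise<T> {
  let lastError: unknown = new Error('withRetry: maxAttempts must be at least 1');
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseDelayMs, jitterMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

This sketch retries on any thrown error. A production version would likely retry only on throttling or transient network failures and give up immediately on client errors.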
6. Relying on Algolia for Real-Time State
Explanation: The Algolia index updates on a crawl cycle, not instantly. It is optimized for search, not live consistency.
Fix: Use Algolia for discovery and filtering. Cross-reference objectID values with the Firebase API (https://hacker-news.firebaseio.com/v0/item/{id}.json) when real-time comment counts or live state are required.
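The two-step flow can be sketched as a URL builder, a fetch, and a pure merge step. `firebaseItemUrl`, `fetchLiveItem`, and `mergeLiveState` are hypothetical names; the sketch assumes the Firebase item exposes `score` and `descendants` (the total comment count on stories) and that a missing item comes back as JSON `null`.

```typescript
// Minimal shapes for this sketch; real payloads carry more fields.
interface AlgoliaHitCounts {
  objectID: string;
  points: number;
  num_comments: number;
}

interface FirebaseItem {
  id: number;
  score?: number;       // live points
  descendants?: number; // live total comment count (stories)
}

function firebaseItemUrl(objectID: string): string {
  return `https://hacker-news.firebaseio.com/v0/item/${encodeURIComponent(objectID)}.json`;
}

// Fetch the live item; resolves to null when the item does not exist.
async function fetchLiveItem(objectID: string): Promise<FirebaseItem | null> {
  const response = await fetch(firebaseItemUrl(objectID));
  if (!response.ok) {
    throw new Error(`Firebase request failed: ${response.status}`);
  }
  return (await response.json()) as FirebaseItem | null;
}

// Prefer live counts when present; otherwise keep the indexed values.
function mergeLiveState<T extends AlgoliaHitCounts>(hit: T, live: FirebaseItem | null): T {
  if (live === null) {
    return hit;
  }
  return {
    ...hit,
    points: live.score ?? hit.points,
    num_comments: live.descendants ?? hit.num_comments
  };
}
```

Keeping the merge pure means the Algolia result stays usable as a fallback whenever the Firebase lookup fails or returns nothing.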
7. Mishandling created_at_i vs created_at
Explanation: created_at is an ISO string. created_at_i is a Unix timestamp. Filtering on the string field causes syntax errors or silent failures in numericFilters.
Fix: Always use created_at_i for range queries. Convert JavaScript Date objects to Unix timestamps using Math.floor(date.getTime() / 1000) before injection.
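The conversion in this fix can be wrapped so that invalid dates fail loudly instead of producing a `NaN` filter. `toUnixSeconds` and `createdBetween` are hypothetical helpers for illustration.

```typescript
// Convert a Date to whole Unix seconds, rejecting invalid dates.
function toUnixSeconds(date: Date): number {
  const ms = date.getTime();
  if (Number.isNaN(ms)) {
    throw new Error('Invalid Date passed to toUnixSeconds');
  }
  return Math.floor(ms / 1000);
}

// Build an inclusive created_at_i range filter from two Date objects.
function createdBetween(from: Date, to: Date): string {
  return `created_at_i>=${toUnixSeconds(from)},created_at_i<=${toUnixSeconds(to)}`;
}
```

The resulting string can be appended to other numericFilters conditions with a comma.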
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Tracking recent Show HN posts (>100 pts) | /search_by_date + tags=show_hn + numericFilters | Chronological ordering required; server-side filtering reduces payload | Low (single batch per window) |
| Backfilling 6 months of data | Time-sliced /search_by_date + Redis cache | Pagination cap forces windowing; caching prevents redundant requests | Medium (compute + storage for cache) |
| Monitoring a specific author | tags=author_<username> + /search | Relevance ranking surfaces high-impact posts; narrow tag scope stays under limits | Low |
| Real-time comment tracking | Firebase /item/{id} + Algolia discovery | Firebase provides live state; Algolia handles initial ID discovery | Low-Medium (Firebase has no search, requires two-step flow) |
| Building a trend dashboard | /search + numericFilters + client aggregation | Relevance scoring highlights emerging topics; aggregation normalizes time gaps | Medium (requires scheduling + cache) |
Configuration Template
// hn-config.ts
export const HN_CONFIG = {
baseUrl: 'https://hn.algolia.com/api/v1',
endpoints: {
relevance: '/search',
recency: '/search_by_date',
itemDetail: '/items'
},
limits: {
maxHitsPerPage: 1000,
recommendedHitsPerPage: 50,
paginationCeiling: 1000,
rateBudgetHourly: 10000
},
tags: {
story: 'story',
comment: 'comment',
askHn: 'ask_hn',
showHn: 'show_hn',
poll: 'poll',
authorPrefix: 'author_'
},
retry: {
maxAttempts: 3,
baseDelayMs: 1000,
jitterMs: 500
},
cache: {
ttlSeconds: 3600,
keyPrefix: 'hn_cache_'
}
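} as const; // "as const" keeps literal types, so HNTag and HNEndpoint below are unions rather than plain string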
export type HNTag = typeof HN_CONFIG.tags[keyof typeof HN_CONFIG.tags];
export type HNEndpoint = typeof HN_CONFIG.endpoints[keyof typeof HN_CONFIG.endpoints];
Quick Start Guide
- Initialize the client: Import the configuration and instantiate the HNIndexClient. Set your query parameters (query, tags, minPoints, windowDays).
- Execute a time-sliced fetch: Call fetchChronologicalFeed(). The client automatically chunks the date range, applies server-side filters, and returns a sorted array of hits.
- Handle pagination & caching: Store completed time windows in your preferred cache layer. On subsequent runs, skip cached windows and only fetch new intervals.
- Validate results: Check nbHits against expectations. If nbHits approaches the pagination ceiling, reduce the time window size or tighten numericFilters.
- Cross-reference live data (optional): For items requiring real-time comment counts, map objectID values to https://hacker-news.firebaseio.com/v0/item/{id}.json and merge the payloads.
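The caching step above can be sketched as a pure selection function that decides which windows still need a request. `CachedWindow`, `windowKey`, and `pendingWindows` are hypothetical names; the `hn_cache_` prefix is taken from the configuration template, and the rule that a window whose end lies in the future is always refetched is an assumption of this sketch, since such a window can still receive new items.

```typescript
// Hypothetical window shape: Unix seconds, start inclusive, end exclusive.
interface CachedWindow {
  start: number;
  end: number;
}

// Cache key for a window, e.g. "hn_cache_0_604800".
function windowKey(w: CachedWindow, keyPrefix: string = 'hn_cache_'): string {
  return `${keyPrefix}${w.start}_${w.end}`;
}

// Return the windows that must be fetched: uncached ones, plus any window still open.
function pendingWindows(
  windows: CachedWindow[],
  cachedKeys: ReadonlySet<string>,
  nowSeconds: number
): CachedWindow[] {
  return windows.filter((w) => w.end > nowSeconds || !cachedKeys.has(windowKey(w)));
}
```

The set of cached keys could come from any store (an in-memory map, Redis, a file listing); only windows returned by `pendingWindows` need to reach the API on a given run.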