The Hacker News Search API: Free, No-Key, and Surprisingly Powerful

By Codcompass Team·2026-06-01·7 min read

Programmatic Access to Hacker News: Architecting Queries Against the Algolia Index

Current Situation Analysis

Building automated workflows around Hacker News content—whether for competitive intelligence, trend tracking, or dataset collection—requires a reliable programmatic interface. The platform’s official Firebase API (hacker-news.firebaseio.com/v0/) only supports direct ID lookups and static list endpoints (topstories, newstories). It lacks query capabilities, forcing developers to either scrape HTML (fragile, rate-limited, and maintenance-heavy) or build custom indexing pipelines.

The gap is filled by a publicly accessible endpoint powered by Algolia, the same search infrastructure Hacker News uses for its on-site search. The base URL https://hn.algolia.com/api/v1/ exposes full-text search, numeric filtering, and tag-based scoping without authentication or API keys. Despite its utility, the endpoint is frequently misunderstood. Teams treat it as a standard REST collection API, overlooking its search-engine architecture. This leads to silent failures when pagination ceilings are hit, rate budgets are exhausted, or relevance ranking breaks chronological expectations.

Two hard constraints shape production architecture:

Pagination Ceiling: Algolia’s standard index limits retrieval to approximately 1,000 results per query. Deep pagination beyond this threshold returns empty or truncated datasets.
Rate Guidance: While no published SLA exists, community and infrastructure telemetry consistently point to a courtesy budget of ~10,000 requests per hour per IP. Exceeding this triggers silent throttling or temporary blocks.

These constraints are not bugs; they are architectural boundaries. Successful implementations treat the endpoint as a search index, not a database, and design around time-slicing, server-side filtering, and intelligent caching.
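To make the rate guidance concrete, the sketch below converts an hourly budget into a minimum spacing between requests. The helper names `minIntervalMs` and `nextSendTime` are hypothetical, and the 10,000 figure is the courtesy budget quoted above rather than an enforced contract.

```typescript
// Hypothetical helpers for pacing requests; names are assumptions for illustration.
const MS_PER_HOUR = 3600000;

// Minimum gap between requests that keeps a client under an hourly budget.
function minIntervalMs(budgetPerHour: number): number {
  if (budgetPerHour <= 0) {
    throw new RangeError('budgetPerHour must be positive');
  }
  return Math.ceil(MS_PER_HOUR / budgetPerHour);
}

// Earliest timestamp (ms) at which the next request may be sent,
// given when the previous one went out.
function nextSendTime(lastSentMs: number, nowMs: number, budgetPerHour: number): number {
  return Math.max(nowMs, lastSentMs + minIntervalMs(budgetPerHour));
}
```

At 10,000 requests per hour this works out to one request every 360 ms per IP, which is a useful upper bound when sizing a backfill.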

WOW Moment: Key Findings

The most critical architectural decision when working with Hacker News data is selecting the right access layer. The table below compares the three primary approaches developers encounter in production:

Approach	Search & Filtering	Max Retrieval Depth	Rate Constraints	Implementation Complexity
Firebase Official	None (ID-only lookups)	Unlimited (per ID)	No documented limit (no public SLA)	Low
Algolia Search	Full-text + numeric + tags	~1,000 per query	~10k req/hr (courtesy)	Medium
Custom Scraping	HTML parsing required	Unlimited	High risk of blocking	High

Why this matters: The Algolia endpoint is the only viable path for programmatic filtering, but its 1,000-result ceiling forces time-slicing strategies for large backfills. Teams that attempt to paginate past the limit or filter client-side will experience data loss and degraded performance. Recognizing that /search optimizes for relevance while /search_by_date optimizes for recency prevents structural mismatches in downstream pipelines.
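The routing decision can be captured in a small helper. This is a minimal sketch: `RankingMode` and `buildSearchUrl` are hypothetical names, and only the base URL and the two endpoint paths come from the article.

```typescript
// Hypothetical routing helper; only the base URL and paths come from the article.
type RankingMode = 'relevance' | 'recency';

const BASE_URL = 'https://hn.algolia.com/api/v1';

// Pick /search for relevance-ranked results and /search_by_date for newest-first results.
function buildSearchUrl(mode: RankingMode, query: string, tags?: string): string {
  const path = mode === 'relevance' ? '/search' : '/search_by_date';
  const params = new URLSearchParams({ query });
  if (tags) {
    params.set('tags', tags);
  }
  return `${BASE_URL}${path}?${params.toString()}`;
}
```

Making the mode an explicit argument forces each caller to state whether it wants relevance or recency, instead of inheriting a default that may not match the pipeline.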

Core Solution

Architecture Overview

A production-ready implementation separates query construction, execution, and result normalization. The architecture follows three principles:

1. Server-side filtering first: Push all numeric and tag constraints to the API. Never fetch 1,000 rows to discard 900 client-side.
2. Time-slicing for scale: Break large date ranges into discrete windows to stay under the 1,000-result cap.
3. Endpoint routing by use case: Use /search for topic discovery, /search_by_date for chronological feeds.

Implementation (TypeScript)

The following module demonstrates a typed, production-grade client. It abstracts parameter encoding, implements automatic time-slicing, and normalizes the Algolia response envelope.

interface HNQueryConfig {
  query: string;
  tags: string;
  minPoints: number;
  minComments: number;
  windowDays: number;
  maxPerPage: number;
}

interface HNSearchResponse {
  hits: Array<{
    objectID: string;
    title: string;
    url?: string | null; // may be missing or null for text-only posts such as Ask HN
    author: string;
    points: number;
    num_comments: number;
    created_at: string;
    created_at_i: number;
  }>;
  nbHits: number;
  page: number;
  nbPages: number;
  hitsPerPage: number;
}

class HNIndexClient {
  private readonly baseUrl: string = 'https://hn.algolia.com/api/v1';
  private readonly defaultLimit: number = 50;

  async fetchChronologicalFeed(config: HNQueryConfig): Promise<HNSearchResponse['hits']> {
    const allResults: HNSearchResponse['hits'] = [];
    const now = Math.floor(Date.now() / 1000);
    const windowSize = 7 * 24 * 3600; // 7-day slices to stay under pagination cap
    const oldest = now - config.windowDays * 24 * 3600; // windowDays is the total lookback
    let currentEnd = now;

    while (currentEnd > oldest) {
      // Clamp the final slice so it never reaches past the requested lookback.
      const currentStart = Math.max(currentEnd - windowSize, oldest);
      const batch = await this.executeBatch({
        ...config,
        timeRange: { start: currentStart, end: currentEnd }
      });
      allResults.push(...batch);
      currentEnd = currentStart;
    }

    return allResults.sort((a, b) => b.created_at_i - a.created_at_i);
  }

  private async executeBatch(config: HNQueryConfig & { timeRange: { start: number; end: number } }): Promise<HNSearchResponse['hits']> {
    const hits: HNSearchResponse['hits'] = [];
    let page = 0;
    let totalPages = 1;

    // Walk every page of the slice; requesting only page 0 would silently drop hits.
    while (page < totalPages) {
      const params = new URLSearchParams({
        query: config.query,
        tags: config.tags,
        // Half-open range [start, end) so adjacent slices never return the same item twice.
        numericFilters: `points>${config.minPoints},num_comments>${config.minComments},created_at_i>=${config.timeRange.start},created_at_i<${config.timeRange.end}`,
        hitsPerPage: String(Math.min(config.maxPerPage, this.defaultLimit)),
        page: String(page)
      });

      const endpoint = `${this.baseUrl}/search_by_date?${params.toString()}`;
      const response = await fetch(endpoint);

      if (!response.ok) {
        throw new Error(`HN Index request failed: ${response.status} ${response.statusText}`);
      }

      const data = (await response.json()) as HNSearchResponse;
      hits.push(...data.hits);
      totalPages = data.nbPages;
      page += 1;
    }

    return hits;
  }
}

Architecture Decisions & Rationale

Time-Slicing Strategy: The 1,000-result pagination ceiling is enforced at the index level. Chunking queries into 7-day windows keeps each batch below the threshold for typical filtered queries while maintaining chronological integrity. Broad or unfiltered queries can still exceed 1,000 hits in a week, so compare nbHits against the ceiling and shrink the window when it gets close. This avoids the silent data loss that occurs when deep pagination is attempted.
Endpoint Selection: search_by_date is explicitly chosen for chronological feeds. The /search endpoint applies Algolia’s relevance scoring, which reorders results based on text match weight and engagement metrics. Using /search for time-series data returns results out of chronological order.
Server-Side Numeric Filters: Conditions for points, num_comments, and created_at_i are concatenated with commas (AND logic) and passed directly to numericFilters. This shifts computational load to Algolia’s index, reducing payload size and network overhead.
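Type Safety & Envelope Mapping: The Algolia response wraps results in a structured envelope (nbHits, page, nbPages). Mapping this to a strict TypeScript interface catches property typos at compile time and enables IDE autocompletion for downstream processing. Because interfaces are erased at runtime, the type annotation alone does not validate the payload; an unexpected response shape still needs a runtime check.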
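Compile-time types can be paired with a small runtime guard on the envelope. The sketch below is one possible shape: `SearchEnvelope` and `isSearchEnvelope` are hypothetical names, and the guard checks only the envelope fields listed in the article, not the contents of each hit.

```typescript
// Hypothetical runtime guard; validates the envelope fields only, not each hit.
interface SearchEnvelope {
  hits: unknown[];
  nbHits: number;
  page: number;
  nbPages: number;
  hitsPerPage: number;
}

function isSearchEnvelope(value: unknown): value is SearchEnvelope {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    Array.isArray(v['hits']) &&
    typeof v['nbHits'] === 'number' &&
    typeof v['page'] === 'number' &&
    typeof v['nbPages'] === 'number' &&
    typeof v['hitsPerPage'] === 'number'
  );
}
```

Running the guard on the parsed JSON before touching `hits` turns a malformed or changed response into an explicit error rather than an `undefined` access deeper in the pipeline.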

Pitfall Guide

1. Assuming Unlimited Pagination

Explanation: Algolia’s standard index caps retrieval at ~1,000 results per query. Attempting to iterate page beyond this limit returns empty arrays without error codes. Fix: Implement time-slicing or domain/tag partitioning. Break broad queries into discrete created_at_i windows and aggregate results client-side.
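A minimal sketch of the windowing step, assuming Unix-second boundaries and half-open intervals so that neighbouring windows do not overlap. `TimeWindow` and `sliceWindows` are hypothetical names.

```typescript
// Hypothetical helper: splits [startSec, endSec) into newest-first windows.
interface TimeWindow {
  start: number; // inclusive, Unix seconds
  end: number;   // exclusive, Unix seconds
}

function sliceWindows(startSec: number, endSec: number, windowSec: number): TimeWindow[] {
  if (windowSec <= 0) {
    throw new RangeError('windowSec must be positive');
  }
  const windows: TimeWindow[] = [];
  for (let end = endSec; end > startSec; end -= windowSec) {
    // Clamp the oldest window so it never reaches before startSec.
    windows.push({ start: Math.max(end - windowSec, startSec), end });
  }
  return windows;
}
```

Each window then becomes one `created_at_i` range in `numericFilters`, and the per-window results are concatenated client-side.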

2. Mixing Relevance and Recency Endpoints

Explanation: /search ranks by text relevance and engagement weight. /search_by_date ranks strictly by timestamp. Using the wrong endpoint corrupts chronological dashboards or topic discovery workflows. Fix: Route /search for keyword/topic exploration. Route /search_by_date for feeds, monitors, and time-bound exports.

3. Client-Side Filtering After Fetch

Explanation: Fetching 1,000 rows and filtering for points > 200 client-side wastes bandwidth, hits rate limits faster, and increases latency. Fix: Push all numeric constraints to numericFilters. The API evaluates them server-side before returning the payload.
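One way to keep filters server-side is to build the `numericFilters` string from structured conditions. This is a sketch: `NumericCondition` and `buildNumericFilters` are hypothetical names, and the field list is limited to the three numeric fields the article uses.

```typescript
// Hypothetical builder for the numericFilters parameter (comma = AND).
type NumericOp = '<' | '<=' | '=' | '>' | '>=';

interface NumericCondition {
  field: 'points' | 'num_comments' | 'created_at_i';
  op: NumericOp;
  value: number;
}

function buildNumericFilters(conditions: NumericCondition[]): string {
  return conditions
    .map((c) => {
      if (!Number.isFinite(c.value)) {
        throw new RangeError(`Non-finite value for ${c.field}`);
      }
      return `${c.field}${c.op}${c.value}`;
    })
    .join(',');
}
```

For example, a "points above 200 with at least 10 comments" filter becomes `points>200,num_comments>=10`, and the rows that fail it never leave the index.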

4. Misconstructing Tag Logic

Explanation: Tags use comma for AND and parentheses for OR. Writing tags=story,comment requires an item to be both a story and a comment at once, which matches nothing. Fix: Use tags=(story,comment) for union logic. Use tags=story,author_pg for intersection. Validate tag combinations against the documented enum (story, comment, ask_hn, show_hn, poll, author_<username>).
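The AND/OR rules are easy to get wrong by hand, so a few tiny helpers can make the intent explicit. The function names `allOf`, `anyOf`, and `authorTag` are hypothetical; the output syntax follows the comma and parenthesis rules described above.

```typescript
// Hypothetical tag-expression helpers; output follows the comma/parenthesis rules above.

// Intersection: every tag must match.
function allOf(...tags: string[]): string {
  return tags.join(',');
}

// Union: at least one tag must match.
function anyOf(...tags: string[]): string {
  return `(${tags.join(',')})`;
}

// Author scoping uses the author_<username> form.
function authorTag(username: string): string {
  return `author_${username}`;
}
```

Composing them reads close to the intent: `allOf(authorTag('pg'), anyOf('story', 'poll'))` yields `author_pg,(story,poll)`, meaning stories or polls by that author.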

5. Ignoring the Unofficial Rate Budget

Explanation: The ~10,000 requests/hour/IP guideline is not a published SLA. Exceeding it triggers silent throttling or temporary IP blocks, especially during backfills. Fix: Implement exponential backoff, request queuing, and response caching. Add jitter to scheduled jobs to avoid thundering herd patterns.
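A minimal sketch of exponential backoff with jitter, assuming any thrown error is retryable. `backoffDelayMs` and `withRetry` are hypothetical names; a real client would likely retry only on throttling or transient network errors.

```typescript
// Hypothetical retry helpers; treats every thrown error as retryable for simplicity.

// Delay doubles with each attempt (0-based) and gains up to jitterMs of random spread.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random
): number {
  return baseDelayMs * Math.pow(2, attempt) + Math.floor(random() * jitterMs);
}

async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
  jitterMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseDelayMs, jitterMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

The `random` parameter is injectable so the delay schedule can be checked deterministically in tests.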

6. Relying on Algolia for Real-Time State

Explanation: The Algolia index updates on a crawl cycle, not instantly. It is optimized for search, not live consistency. Fix: Use Algolia for discovery and filtering. Cross-reference objectID values with the Firebase API (https://hacker-news.firebaseio.com/v0/item/{id}.json) when real-time comment counts or live state are required.
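The cross-reference step might look like the sketch below. `AlgoliaHit`, `FirebaseItem`, `firebaseItemUrl`, and `mergeLiveState` are hypothetical names; `score` and `descendants` are the Firebase item fields for points and total comment count, and the merge assumes the Firebase value should win whenever it is present.

```typescript
// Hypothetical merge of indexed data with live Firebase state.
interface AlgoliaHit {
  objectID: string;
  points: number;
  num_comments: number;
}

// Subset of the Firebase item payload used here.
interface FirebaseItem {
  id: number;
  score?: number;       // live points
  descendants?: number; // live total comment count
}

function firebaseItemUrl(objectID: string): string {
  return `https://hacker-news.firebaseio.com/v0/item/${encodeURIComponent(objectID)}.json`;
}

// Prefer live values when present; fall back to the indexed ones otherwise.
function mergeLiveState<T extends AlgoliaHit>(hit: T, live: FirebaseItem | null): T {
  if (live === null) {
    return hit;
  }
  return {
    ...hit,
    points: live.score ?? hit.points,
    num_comments: live.descendants ?? hit.num_comments
  };
}
```

Keeping the merge as a pure function separates the network call from the reconciliation logic, so the latter can be tested without hitting either API.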

7. Mishandling `created_at_i` vs `created_at`

Explanation: created_at is an ISO string. created_at_i is a Unix timestamp. Filtering on the string field causes syntax errors or silent failures in numericFilters. Fix: Always use created_at_i for range queries. Convert JavaScript Date objects to Unix timestamps using Math.floor(date.getTime() / 1000) before injection.
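As a small illustration of the conversion, the sketch below turns two `Date` objects into a `created_at_i` range. `toUnixSeconds` and `createdBetween` are hypothetical names, and the half-open upper bound is an assumption chosen to avoid overlap between adjacent ranges.

```typescript
// Hypothetical date helpers for created_at_i range filters.
function toUnixSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

// Builds a half-open range: from <= created_at_i < to.
function createdBetween(from: Date, to: Date): string {
  return `created_at_i>=${toUnixSeconds(from)},created_at_i<${toUnixSeconds(to)}`;
}
```

Passing the result straight into `numericFilters` avoids ever sending an ISO string to a numeric comparison.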

Production Bundle

Action Checklist

Define query scope: Determine whether relevance (/search) or recency (/search_by_date) matches your use case.
Implement time-slicing: Partition large date ranges into 7-day or 14-day windows to stay under the 1,000-result cap.
Push filters server-side: Move all points, num_comments, and date constraints into numericFilters.
Validate tag syntax: Use parentheses for OR logic and commas for AND logic. Test combinations against the supported enum.
Add rate limiting & backoff: Implement a request queue with exponential backoff and jitter to respect the ~10k req/hr courtesy budget.
Cache responses: Store completed time windows in Redis or local storage to avoid redundant fetches during retries or restarts.
Cross-reference live state: Use Firebase /item/{id} endpoints only when real-time comment counts or post updates are required.
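The caching item in the checklist can be sketched with an in-memory `Map` standing in for Redis or local storage. `CachedWindow`, `windowCacheKey`, and `fetchWindowCached` are hypothetical names, and the key layout is an assumption.

```typescript
// Hypothetical window cache; a Map stands in for Redis or local storage.
interface CachedWindow {
  start: number;
  end: number;
}

// One key per (query, tags, window) so a restart can skip finished windows.
function windowCacheKey(prefix: string, query: string, tags: string, w: CachedWindow): string {
  return `${prefix}${encodeURIComponent(query)}:${encodeURIComponent(tags)}:${w.start}-${w.end}`;
}

// Return the cached value when present; otherwise load it once and store it.
async function fetchWindowCached<T>(
  cache: Map<string, T>,
  key: string,
  load: () => Promise<T>
): Promise<T> {
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }
  const fresh = await load();
  cache.set(key, fresh);
  return fresh;
}
```

Only windows that lie fully in the past are safe to cache this way; the most recent window is still receiving new items and should be refetched.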

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Tracking recent Show HN posts (>100 pts)	`/search_by_date` + `tags=show_hn` + `numericFilters`	Chronological ordering required; server-side filtering reduces payload	Low (single batch per window)
Backfilling 6 months of data	Time-sliced `/search_by_date` + Redis cache	Pagination cap forces windowing; caching prevents redundant requests	Medium (compute + storage for cache)
Monitoring a specific author	`tags=author_<username>` + `/search`	Relevance ranking surfaces high-impact posts; narrow tag scope stays under limits	Low
Real-time comment tracking	Firebase `/item/{id}` + Algolia discovery	Firebase provides live state; Algolia handles initial ID discovery	Low-Medium (Firebase has no search, requires two-step flow)
Building a trend dashboard	`/search` + `numericFilters` + client aggregation	Relevance scoring highlights emerging topics; aggregation normalizes time gaps	Medium (requires scheduling + cache)

Configuration Template

// hn-config.ts
export const HN_CONFIG = {
  baseUrl: 'https://hn.algolia.com/api/v1',
  endpoints: {
    relevance: '/search',
    recency: '/search_by_date',
    itemDetail: '/items'
  },
  limits: {
    maxHitsPerPage: 1000,
    recommendedHitsPerPage: 50,
    paginationCeiling: 1000,
    rateBudgetHourly: 10000
  },
  tags: {
    story: 'story',
    comment: 'comment',
    askHn: 'ask_hn',
    showHn: 'show_hn',
    poll: 'poll',
    authorPrefix: 'author_'
  },
  retry: {
    maxAttempts: 3,
    baseDelayMs: 1000,
    jitterMs: 500
  },
  cache: {
    ttlSeconds: 3600,
    keyPrefix: 'hn_cache_'
  }
} as const; // keeps literal types, so HNTag and HNEndpoint resolve to unions instead of string

export type HNTag = typeof HN_CONFIG.tags[keyof typeof HN_CONFIG.tags];
export type HNEndpoint = typeof HN_CONFIG.endpoints[keyof typeof HN_CONFIG.endpoints];

Quick Start Guide

Initialize the client: Import the configuration and instantiate the HNIndexClient. Set your query parameters (query, tags, minPoints, minComments, windowDays, maxPerPage); HNQueryConfig requires all six.
Execute a time-sliced fetch: Call fetchChronologicalFeed(). The client automatically chunks the date range, applies server-side filters, and returns a sorted array of hits.
Handle pagination & caching: Store completed time windows in your preferred cache layer. On subsequent runs, skip cached windows and only fetch new intervals.
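Validate results: Compare nbHits from each window's response against expectations. fetchChronologicalFeed returns only the hits array, so log or surface nbHits inside executeBatch. If nbHits approaches the pagination ceiling, reduce the time window size or tighten numericFilters.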
Cross-reference live data (optional): For items requiring real-time comment counts, map objectID values to https://hacker-news.firebaseio.com/v0/item/{id}.json and merge the payloads.
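The validation step can be automated by splitting any window whose reported nbHits reaches the ceiling. This is a sketch under the assumption that nbHits reflects the total match count for the window; `Slice` and `splitIfSaturated` are hypothetical names.

```typescript
// Hypothetical adaptive split: halve a window whose hit count reaches the ceiling.
interface Slice {
  start: number; // inclusive, Unix seconds
  end: number;   // exclusive, Unix seconds
}

const PAGINATION_CEILING = 1000;

function splitIfSaturated(
  slice: Slice,
  nbHits: number,
  ceiling: number = PAGINATION_CEILING
): Slice[] {
  const span = slice.end - slice.start;
  // Keep the slice when it fits under the ceiling or cannot be divided further.
  if (nbHits < ceiling || span < 2) {
    return [slice];
  }
  const mid = slice.start + Math.floor(span / 2);
  return [
    { start: slice.start, end: mid },
    { start: mid, end: slice.end }
  ];
}
```

Requeueing the two halves and repeating the check lets busy periods shrink to small windows while quiet periods stay at the default size.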
