Stop letting LLMs hallucinate dates — a tool for AI agents

By Codcompass Team · 2026-05-29 · 7 min read

Deterministic Date Resolution for AI Agents: Moving Beyond LLM Guesswork

Current Situation Analysis

Building AI agents that handle scheduling, booking flows, or temporal reminders requires precise calendar arithmetic. Yet, when developers hand date interpretation directly to large language models, the results are consistently unreliable. Transformers predict tokens based on statistical patterns, not calendar logic. They frequently hallucinate weekday-to-date mappings, miscalculate relative offsets ("next Friday", "end of month"), and fail on fencepost boundaries when parsing ranges.

This problem is routinely overlooked because teams conflate natural language understanding with temporal reasoning. Prompt engineering directives like "be careful with dates" or "output ISO format" do not address the architectural mismatch. Probabilistic models lack deterministic computation layers, meaning they will occasionally invent dates that don't exist or misalign relative phrasing across locales. Empirical evaluations of LLM temporal reasoning consistently show error rates exceeding 30% on relative expressions, with failure modes clustering around ambiguous references, cross-locale inflection, and range boundary miscalculation.

The industry standard fix is not better prompting. It is architectural separation: extract date interpretation from the generative model and route it to a deterministic parsing engine. By treating temporal resolution as a tool invocation rather than a completion task, agents gain verifiable accuracy, consistent output schemas, and explicit handling of linguistic ambiguity.
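To make "deterministic computation layer" concrete: once an expression has been recognized, resolving it is ordinary calendar arithmetic on an explicit reference date. The following is a minimal sketch, not the whenis API. The function names are hypothetical, all dates are treated as UTC calendar dates, and "next Friday" is read as "the first Friday strictly after the reference date", which is only one of its common readings.

```typescript
// Hypothetical helpers for illustration; these are not part of any @whenis package.
// All arithmetic reads and writes UTC calendar fields, so results never depend on
// the host machine's timezone or on the current system time.

/** Parses "YYYY-MM-DD" as midnight UTC on that calendar date. */
function fromIsoDate(iso: string): Date {
  const [y, m, d] = iso.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

/** Formats a Date as "YYYY-MM-DD" using its UTC fields. */
function toIsoDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

/**
 * First `weekday` (0 = Sunday ... 6 = Saturday) strictly after `ref`.
 * This encodes only one reading of "next Friday"; the "Friday of next week"
 * reading has to be produced as a separate candidate, not guessed.
 */
function upcomingWeekday(ref: Date, weekday: number): Date {
  const delta = (weekday - ref.getUTCDay() + 7) % 7 || 7; // same weekday -> one week ahead
  return new Date(Date.UTC(ref.getUTCFullYear(), ref.getUTCMonth(), ref.getUTCDate() + delta));
}

/** Last calendar day of the month containing `ref` (day 0 of the following month). */
function endOfMonth(ref: Date): Date {
  return new Date(Date.UTC(ref.getUTCFullYear(), ref.getUTCMonth() + 1, 0));
}

const ref = fromIsoDate("2026-05-29"); // a Friday
console.log(toIsoDate(upcomingWeekday(ref, 5)));               // 2026-06-05
console.log(toIsoDate(upcomingWeekday(ref, 1)));               // 2026-06-01
console.log(toIsoDate(endOfMonth(fromIsoDate("2026-02-10")))); // 2026-02-28
```

Every function takes the reference date as an argument, so the same input always yields the same output. That property, not cleverness, is what makes the result testable.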

WOW Moment: Key Findings

Shifting date resolution from the LLM to a dedicated deterministic engine fundamentally changes how agents handle temporal ambiguity. The table below contrasts native LLM parsing against a structured tool-based approach using the whenis parsing architecture.

Approach	Date Accuracy	Ambiguity Handling	Localization Depth	Determinism
LLM-Native Parsing	~65-75%	Silent guessing or hallucination	Surface-level (English-heavy)	Non-deterministic
Deterministic Tool (`whenis`)	>98%	Multi-candidate output with confidence scoring	Locale-as-data (full inflection/case support)	Fully deterministic

This finding matters because it decouples linguistic understanding from calendar computation. Instead of forcing the model to guess which "Friday" a user means, the tool returns all plausible candidates, each with a confidence score. The agent then re-ranks the options using conversation history, user preferences, or business rules. This pattern eliminates silent failures, reduces the hallucination surface area, and provides a testable boundary between generative reasoning and deterministic calculation.
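For the "which Friday?" case, returning all plausible candidates might look like the following. This is a hypothetical illustration, not whenis output. It assumes a Monday-start week, and the 0.6/0.4 confidence values are placeholders rather than calibrated scores.

```typescript
// Hypothetical candidate shape, loosely modeled on the TemporalCandidate interface in this article.
interface Candidate {
  isoDate: string;
  confidence: number;
  reason: string;
}

function addDaysUtc(d: Date, days: number): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + days));
}

/**
 * Returns every plausible reading of "next <weekday>" instead of silently picking one.
 * weekday: 0 = Sunday ... 6 = Saturday. Weeks are assumed to start on Monday.
 */
function nextWeekdayCandidates(ref: Date, weekday: number): Candidate[] {
  // Reading 1: the first such weekday strictly after the reference date.
  const upcoming = addDaysUtc(ref, (weekday - ref.getUTCDay() + 7) % 7 || 7);

  // Reading 2: that weekday within the following Monday-to-Sunday week.
  const daysSinceMonday = (ref.getUTCDay() + 6) % 7;
  const followingMonday = addDaysUtc(ref, 7 - daysSinceMonday);
  const inFollowingWeek = addDaysUtc(followingMonday, (weekday + 6) % 7);

  const candidates: Candidate[] = [
    { isoDate: upcoming.toISOString().slice(0, 10), confidence: 0.6, reason: "first_after_reference" },
  ];
  // When both readings land on the same day there is no ambiguity to report.
  if (inFollowingWeek.getTime() !== upcoming.getTime()) {
    candidates.push({ isoDate: inFollowingWeek.toISOString().slice(0, 10), confidence: 0.4, reason: "following_week" });
  }
  return candidates;
}
```

On Wednesday 2026-05-27, "next Friday" yields two candidates, 2026-05-29 and 2026-06-05. On Friday 2026-05-29 both readings resolve to 2026-06-05, so only one candidate comes back. Either way, the agent sees the ambiguity explicitly instead of inheriting a silent guess.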

Core Solution

Implementing deterministic date resolution requires three architectural decisions: parser initialization strategy, tool interface design, and candidate handling logic. The following implementation demonstrates a production-ready pattern using TypeScript.

Step 1: Parser Initialization and Configuration

The parsing engine follows a four-layer pipeline: preprocessing → tokenization/tagging → iterative rule engine → resolver. The rule engine runs until a fixpoint is reached: it repeatedly matches tokens or previously emitted intermediate representation nodes until no new matches occur. This design allows rules to compose without the limitations of a single pass.
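The fixpoint loop can be illustrated with a toy rewriter. This is not whenis internals. The node kinds, rule names, and English-only month table are hypothetical, and a real engine matches far richer patterns. What the sketch does show is the property described above: because the engine keeps making passes until a pass changes nothing, a rule that consumes nodes produced by other rules still fires, regardless of the order the rules are listed in.

```typescript
// Toy fixpoint rule engine (hypothetical node kinds and rules, not the whenis implementation).
type DateNode = { kind: "date"; month: number; day: number };
type IrNode =
  | { kind: "word"; text: string }
  | { kind: "number"; value: number }
  | { kind: "month"; month: number }
  | DateNode
  | { kind: "range"; start: DateNode; end: DateNode };

interface Rule {
  name: string;
  width: number;                          // how many consecutive nodes the rule inspects
  apply(window: IrNode[]): IrNode | null; // replacement node, or null when the rule does not match
}

const MONTHS = new Map<string, number>([["may", 5], ["june", 6], ["july", 7]]);

const rules: Rule[] = [
  {
    name: "month_name",
    width: 1,
    apply: ([n]) => {
      if (n.kind !== "word") return null;
      const month = MONTHS.get(n.text);
      return month === undefined ? null : { kind: "month", month };
    },
  },
  {
    name: "month_day", // consumes IR nodes emitted by month_name, not raw tokens
    width: 2,
    apply: ([a, b]) =>
      a.kind === "month" && b.kind === "number" ? { kind: "date", month: a.month, day: b.value } : null,
  },
  {
    name: "date_range", // consumes IR nodes emitted by month_day
    width: 3,
    apply: ([a, sep, b]) =>
      a.kind === "date" && sep.kind === "word" && sep.text === "to" && b.kind === "date"
        ? { kind: "range", start: a, end: b }
        : null,
  },
];

function tokenize(text: string): IrNode[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((t): IrNode => (/^\d+$/.test(t) ? { kind: "number", value: Number(t) } : { kind: "word", text: t }));
}

/** Applies rules pass after pass until a full pass rewrites nothing (the fixpoint). */
function runToFixpoint(input: IrNode[], ruleSet: Rule[], maxPasses = 50): IrNode[] {
  let nodes = input;
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;
    for (const rule of ruleSet) {
      for (let i = 0; i + rule.width <= nodes.length; i++) {
        const replacement = rule.apply(nodes.slice(i, i + rule.width));
        if (replacement !== null) {
          nodes = [...nodes.slice(0, i), replacement, ...nodes.slice(i + rule.width)];
          changed = true;
        }
      }
    }
    if (!changed) return nodes;
  }
  throw new Error("rule engine did not reach a fixpoint");
}
```

With the rules listed bottom-up, "from June 5 to June 10" collapses to a `word` node plus one `range` node in a single productive pass. With the list reversed it takes three productive passes, but it converges to the same result, which is the compositional behavior a single-pass matcher cannot guarantee.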

import { createTemporalEngine } from '@whenis/core';
import { enLocale } from '@whenis/locale-en';
import { bookingPlugin } from '@whenis/booking';

interface EngineConfig {
  referenceDate: Date;
  preferFuture: boolean;
  timezoneOffset: number;
}

export class DateResolutionService {
  private engine: ReturnType<typeof createTemporalEngine>;

  constructor(config: EngineConfig) {
    this.engine = createTemporalEngine({
      locales: [enLocale],
      plugins: [bookingPlugin],
      options: {
        preferFuture: config.preferFuture,
        timezoneOffset: config.timezoneOffset,
        strictMode: true
      }
    });
  }
}

Rationale:

strictMode: true disables fallback heuristics that could introduce ambiguity.
timezoneOffset is normalized at initialization to prevent runtime drift.
Plugins are injected explicitly, keeping core logic lean and domain patterns isolated.

Step 2: Tool Interface Definition

Agents should invoke date resolution through a typed interface that returns structured candidates. The engine outputs a ParseResult containing an array of matches, where each match holds multiple candidates with confidence scores, ISO date strings, optional metadata, and human-readable reasoning.

interface TemporalCandidate {
  type: 'date' | 'range' | 'fuzzy';
  isoDate?: string;
  rangeStart?: string;
  rangeEnd?: string;
  nights?: number;
  confidence: number;
  reason: string;
  metadata: Record<string, unknown>;
}

interface ResolutionOutput {
  query: string;
  candidates: TemporalCandidate[];
  primary: TemporalCandidate;
}

export class DateResolutionService {
  // ... previous code ...

  public resolve(expression: string, context: { reference: Date }): ResolutionOutput {
    const rawResult = this.engine.parse(expression, { reference: context.reference });
    
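    const candidates = rawResult.matches.flatMap(match => match.candidates);

    const sorted = candidates.sort((a, b) => b.confidence - a.confidence);

    // Annotating the fallback keeps `type: 'fuzzy'` as a literal type; left unannotated,
    // it widens to string and no longer satisfies TemporalCandidate in the return value.
    const fallback: TemporalCandidate = {
      type: 'fuzzy',
      confidence: 0,
      reason: 'no_match',
      metadata: { fallback: true }
    };
    const primary = sorted[0] ?? fallback;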

    return {
      query: expression,
      candidates: sorted,
      primary
    };
  }
}

Rationale:

Multi-candidate sorting happens outside the engine, allowing agents to apply business logic (e.g., user timezone, booking constraints) before final selection.
The fallback candidate guarantees that primary is always defined, so downstream code does not crash when an expression falls outside supported patterns.
nights and metadata fields are preserved for domain-specific plugins without polluting the core schema.

Step 3: Agent Integration Pattern

The tool should be registered in the agent's function-calling registry. The agent sends the user's temporal expression to the tool, receives structured candidates, and uses conversation context to select the optimal match.

// Build the service once at startup and share it across requests (see Pitfall 6).
// EngineConfig requires a referenceDate, but parsing is anchored by the per-call
// reference passed to resolve(), never by this startup value.
const dateService = new DateResolutionService({
  referenceDate: new Date(),
  preferFuture: true,
  timezoneOffset: 0
});

const temporalTool = {
  name: 'resolve_temporal_expression',
  description: 'Deterministically parse relative or absolute date expressions into structured candidates.',
  parameters: {
    type: 'object',
    properties: {
      expression: { type: 'string', description: 'User input containing temporal reference' },
      referenceDate: { type: 'string', format: 'date', description: 'Anchor date for relative calculations' }
    },
    required: ['expression', 'referenceDate']
  },
  handler: async (args: { expression: string; referenceDate: string }) => {
    // The anchor comes from the caller on every request, never from server time.
    return dateService.resolve(args.expression, { reference: new Date(args.referenceDate) });
  }
};

Rationale:

The service is created once and shared; requests stay isolated because the reference date is passed on every call rather than stored on the instance.
preferFuture: true aligns with common scheduling semantics but can be toggled per domain.
The agent receives structured data, not freeform text, enabling downstream validation and calendar API integration.
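The JSON schema in the tool definition tells the model what to send, but a `format: 'date'` annotation may not be enforced by the runtime, and a model can still emit a reference date that does not exist. A small guard between the model and the parser can reject such arguments before they reach resolve(). The sketch below uses hypothetical function names and assumes the arguments arrive as a JSON string.

```typescript
// Hypothetical validation step for model-supplied tool arguments.
interface ResolveArgs {
  expression: string;
  referenceDate: string; // "YYYY-MM-DD"
}

/** True only for real calendar dates; rejects e.g. 2026-02-30, which Date.UTC would roll over. */
function isRealIsoDate(s: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const [y, mo, d] = [Number(m[1]), Number(m[2]), Number(m[3])];
  const dt = new Date(Date.UTC(y, mo - 1, d));
  // If any field changed, the input described a non-existent day.
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
}

function parseResolveArgs(rawJson: string): ResolveArgs {
  const data: unknown = JSON.parse(rawJson);
  if (typeof data !== "object" || data === null) {
    throw new Error("arguments must be a JSON object");
  }
  const { expression, referenceDate } = data as Record<string, unknown>;
  if (typeof expression !== "string" || expression.trim() === "") {
    throw new Error("expression must be a non-empty string");
  }
  if (typeof referenceDate !== "string" || !isRealIsoDate(referenceDate)) {
    throw new Error("referenceDate must be a valid YYYY-MM-DD date");
  }
  return { expression, referenceDate };
}
```

Rejecting the call with a clear error gives the agent a chance to retry with corrected arguments, rather than passing a rolled-over or Invalid Date into relative-date arithmetic.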

Pitfall Guide

1. Ignoring Reference Date Context

Explanation: Relative expressions like "next Friday" or "end of month" are meaningless without a deterministic anchor. An LLM that does not have the date in its context can only guess what "today" is, and code that falls back to server time drifts across environments and time zones. Fix: Always pass an explicit reference date to the parser. Derive it from user session context, not new Date().
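A minimal sketch of deriving the reference date from session context, assuming a hypothetical session object that carries the request instant and the user's UTC offset. The point is that the calendar date depends on the user's offset, not on the server's clock.

```typescript
// Hypothetical session shape. utcOffsetMinutes is minutes EAST of UTC (+180 for UTC+03:00);
// note that Date.prototype.getTimezoneOffset() uses the opposite sign (-180 for UTC+03:00).
interface SessionContext {
  nowUtcMs: number;         // the request's instant, e.g. Date.now() captured once on arrival
  utcOffsetMinutes: number; // the user's offset, reported by the client or stored in their profile
}

/** The user's local calendar date ("YYYY-MM-DD"), used as the parser's reference date. */
function sessionReferenceDate(session: SessionContext): string {
  // Shift the instant by the user's offset, then read the calendar date via UTC fields.
  const shifted = new Date(session.nowUtcMs + session.utcOffsetMinutes * 60000);
  return shifted.toISOString().slice(0, 10);
}

// 22:30 UTC on 2026-05-28 is already 2026-05-29 for a user in UTC+03:00,
// so "tomorrow" means a different day than a UTC server clock would suggest.
const instant = Date.UTC(2026, 4, 28, 22, 30);
console.log(sessionReferenceDate({ nowUtcMs: instant, utcOffsetMinutes: 180 })); // 2026-05-29
console.log(sessionReferenceDate({ nowUtcMs: instant, utcOffsetMinutes: 0 }));   // 2026-05-28
```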

2. Assuming Single-Candidate Output

Explanation: The engine intentionally returns multiple candidates for ambiguous inputs. Treating the first result as absolute truth discards valuable confidence metadata. Fix: Implement a re-ranking layer that weighs confidence scores against conversation history, user preferences, or domain constraints.
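A re-ranking layer can be as small as the sketch below. The signals (dates already mentioned in the conversation, preferred weekdays) and their weights are hypothetical and untuned. The sketch also flags when the top two options are too close to call, so the agent can ask a clarifying question instead of guessing.

```typescript
// Hypothetical re-ranking layer. Signals and weights are illustrative, not tuned values.
interface Candidate {
  isoDate: string;
  confidence: number; // the parser's pattern-match confidence, left untouched
  reason: string;
}

interface RankingContext {
  mentionedDates: string[];    // ISO dates already discussed in the conversation
  preferredWeekdays: number[]; // user preference, 0 = Sunday ... 6 = Saturday
}

interface RankingResult {
  ranked: Candidate[];
  ambiguous: boolean; // true when the agent should ask the user instead of choosing
}

function rerank(candidates: Candidate[], ctx: RankingContext, margin = 0.15): RankingResult {
  const scored = candidates
    .map((candidate) => {
      const weekday = new Date(`${candidate.isoDate}T00:00:00Z`).getUTCDay();
      let score = candidate.confidence;
      if (ctx.mentionedDates.includes(candidate.isoDate)) score += 0.3;
      if (ctx.preferredWeekdays.includes(weekday)) score += 0.1;
      return { candidate, score };
    })
    .sort((a, b) => b.score - a.score);

  // A narrow gap between the top two means context did not settle the question.
  const ambiguous = scored.length > 1 && scored[0].score - scored[1].score < margin;
  return { ranked: scored.map((s) => s.candidate), ambiguous };
}
```

Keeping the parser's confidence unchanged and computing a separate score preserves the audit trail: you can always see what the parser said and what the context added.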

3. Overlooking Locale Inflection Rules

Explanation: Languages with grammatical cases (Ukrainian, Polish, Russian) require full inflection tables for months, weekdays, and temporal pointers. Surface-level token matching fails on declensions. Fix: Use locale-as-data modules that ship complete case matrices. Validate locale coverage before deploying to multilingual agents.
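Here is what "locale-as-data" means for Ukrainian month names. A matcher keyed only on the nominative "червень" (June) would miss the genitive in "5 червня" (5 June) and the locative in "у червні" (in June). The table format and function names below are hypothetical, not the @whenis/locale-uk format, and only three cases are shown; a production locale would cover every case its rules need.

```typescript
// Illustrative locale-as-data table, not the @whenis/locale-uk format.
// Columns: nominative ("червень"), genitive ("5 червня"), locative ("у червні").
const UK_MONTH_FORMS: ReadonlyArray<readonly [string, string, string]> = [
  ["січень", "січня", "січні"],
  ["лютий", "лютого", "лютому"],
  ["березень", "березня", "березні"],
  ["квітень", "квітня", "квітні"],
  ["травень", "травня", "травні"],
  ["червень", "червня", "червні"],
  ["липень", "липня", "липні"],
  ["серпень", "серпня", "серпні"],
  ["вересень", "вересня", "вересні"],
  ["жовтень", "жовтня", "жовтні"],
  ["листопад", "листопада", "листопаді"],
  ["грудень", "грудня", "грудні"],
];

// Flatten the case matrix into a single lookup (form -> month number 1..12), built once.
const UK_MONTH_LOOKUP = new Map<string, number>();
UK_MONTH_FORMS.forEach((forms, index) => {
  for (const form of forms) UK_MONTH_LOOKUP.set(form, index + 1);
});

/** Parses "<day> <month in any listed case>", e.g. "5 червня" -> { month: 6, day: 5 }. */
function parseUkDayMonth(text: string): { month: number; day: number } | null {
  const match = /^(\d{1,2})\s+([^\s\d]+)$/.exec(text.trim().toLowerCase());
  if (!match) return null;
  const month = UK_MONTH_LOOKUP.get(match[2]);
  return month === undefined ? null : { month, day: Number(match[1]) };
}
```

Because the forms live in data rather than in code, checking locale coverage amounts to checking the table, which is exactly what the Validate step in this pitfall asks for.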

4. Treating Confidence Scores as Absolute Truth

Explanation: Confidence values reflect pattern match strength, not calendar validity. A high-confidence match can still violate business rules (e.g., past dates for future-only bookings). Fix: Apply post-parse validation. Filter candidates against domain constraints before exposing them to the agent.
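A minimal sketch of the post-parse filter, assuming hypothetical booking constraints (the user's local "today" and a set of blocked dates). Note that a 0.95-confidence candidate in the past is dropped just as readily as a low-confidence one: confidence measures the parse, not bookability.

```typescript
// Hypothetical post-parse filter: business rules, not the parser, decide what is bookable.
interface Candidate {
  isoDate: string;
  confidence: number;
  reason: string;
}

interface BookingConstraints {
  today: string;             // the user's local date, "YYYY-MM-DD"
  blockedDates: Set<string>; // holidays, sold-out days, maintenance windows
}

function applyDomainConstraints(candidates: Candidate[], rules: BookingConstraints): Candidate[] {
  return candidates.filter(
    (c) =>
      c.isoDate >= rules.today &&        // zero-padded ISO dates compare correctly as plain strings
      !rules.blockedDates.has(c.isoDate) // confidence is deliberately not consulted here
  );
}
```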

5. Missing Fencepost Boundaries in Ranges

Explanation: Expressions like "from June 5 to June 10" often trigger off-by-one errors in range calculations, especially when converting to nights or business days. Fix: Rely on the plugin's nights and rangeEnd fields, which apply inclusive/exclusive boundary logic. Never manually compute offsets.
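To make the off-by-one concrete, and to have expected values to test a plugin's nights and rangeEnd output against, here are the two boundary conventions side by side. These helpers are hypothetical and are not the booking plugin's code.

```typescript
// Illustrative fencepost helpers (hypothetical, not the booking plugin's code).
// Dates are "YYYY-MM-DD"; consecutive UTC midnights are exactly 86,400,000 ms apart,
// so day differences come out as exact integers with no DST effects.
const MS_PER_DAY = 86_400_000;

function dayNumber(iso: string): number {
  const [y, m, d] = iso.split("-").map(Number);
  return Date.UTC(y, m - 1, d) / MS_PER_DAY;
}

/** Hotel-style nights: the check-out day is EXCLUSIVE. June 5 -> June 10 is 5 nights. */
function nightsBetween(checkIn: string, checkOut: string): number {
  return dayNumber(checkOut) - dayNumber(checkIn);
}

/** Calendar days with BOTH endpoints counted. June 5 -> June 10 is 6 days. */
function inclusiveDays(start: string, end: string): number {
  return nightsBetween(start, end) + 1;
}

/** Check-out date for "N nights starting <checkIn>": checkIn + N days. */
function checkOutAfterNights(checkIn: string, nights: number): string {
  const [y, m, d] = checkIn.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d + nights)).toISOString().slice(0, 10);
}
```

For "book 3 nights starting next Friday" with next Friday resolved to 2026-06-05, the exclusive convention gives a check-out of 2026-06-08. An inclusive rangeEnd for the same stay would be 2026-06-07; mixing the two conventions is exactly where off-by-one bugs come from.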

6. Instantiating Parsers Per-Request

Explanation: The rule engine compiles locale data and plugin patterns during initialization. Creating instances on every request introduces unnecessary latency and memory pressure. Fix: Pool parser instances or initialize them at startup. Share the engine across requests while keeping reference dates isolated.
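A minimal sketch of "initialize once, share across requests" using a lazily built module-level instance. The engine here is a hypothetical stand-in (a few compiled regular expressions), not createTemporalEngine; what matters is the lifecycle: compile once, and pass the per-request reference date as an argument.

```typescript
// Hypothetical stand-in for an expensive-to-build parser; compiling patterns is the costly step.
interface SharedEngine {
  parse(text: string, reference: Date): { reference: string; matches: string[] };
}

let compileCount = 0; // instrumentation so the sharing is observable

function compileEngine(patterns: string[]): SharedEngine {
  compileCount++;
  // No /g flag, so RegExp.prototype.test keeps no lastIndex state between calls.
  const compiled = patterns.map((p) => new RegExp(`\\b${p}\\b`, "i"));
  return {
    // The reference date is a per-call argument, so the shared instance holds no request state.
    parse: (text, reference) => ({
      reference: reference.toISOString().slice(0, 10),
      matches: patterns.filter((_, i) => compiled[i].test(text)),
    }),
  };
}

let sharedEngine: SharedEngine | undefined;

/** Compiles on first use, then returns the same instance to every request. */
function getEngine(): SharedEngine {
  if (!sharedEngine) {
    sharedEngine = compileEngine(["tomorrow", "next friday", "end of month"]);
  }
  return sharedEngine;
}
```

The stateless-regex detail is worth checking in any real engine you share: a cached object that mutates internal state per call would reintroduce the cross-request leakage that per-request instantiation was meant to avoid.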

7. Neglecting Timezone Normalization

Explanation: ISO 8601 strings without an offset, such as 2026-06-05 or 2026-06-05T10:00, carry no timezone context. Agents operating across regions will misalign scheduling if dates are interpreted in UTC while users expect local time. Fix: Normalize inputs to a known timezone offset, set at initialization or passed per call. Store and transmit timezone metadata alongside ISO dates.
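One way to transmit timezone metadata alongside ISO dates is to emit full ISO 8601 timestamps with an explicit offset, so consumers never have to guess which clock a time refers to. A sketch with hypothetical function names, assuming offsets are expressed in minutes east of UTC:

```typescript
/** Formats an offset in minutes east of UTC as "+HH:MM" or "-HH:MM". */
function formatOffset(minutes: number): string {
  const sign = minutes >= 0 ? "+" : "-";
  const abs = Math.abs(minutes);
  const hh = String(Math.floor(abs / 60)).padStart(2, "0");
  const mm = String(abs % 60).padStart(2, "0");
  return `${sign}${hh}:${mm}`;
}

/** Renders a UTC instant as local wall-clock time plus an explicit offset (ISO 8601). */
function toOffsetIso(instantMs: number, offsetMinutes: number): string {
  // Shift to local wall-clock time, drop the "Z" and milliseconds, then append the real offset.
  const local = new Date(instantMs + offsetMinutes * 60000).toISOString().slice(0, 19);
  return `${local}${formatOffset(offsetMinutes)}`;
}

console.log(toOffsetIso(Date.UTC(2026, 5, 5, 7, 0), 180)); // 2026-06-05T10:00:00+03:00
```

The output reads naturally for the user (10:00 local) and still identifies a single instant: Date.parse on it returns the original 07:00 UTC timestamp.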

Production Bundle

Action Checklist

Initialize parser at startup: Pool engine instances to avoid per-request compilation overhead.
Define explicit reference dates: Never rely on implicit system time for relative expressions.
Implement candidate re-ranking: Use confidence scores as input, not final output.
Validate against domain constraints: Filter past dates, holidays, or blocked windows post-parse.
Normalize timezone context: Attach offset metadata to all ISO date outputs.
Test locale inflection coverage: Verify case matrices for target languages before production rollout.
Log parse failures: Track fuzzy-type outputs to identify missing rules or plugin gaps.
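For the last checklist item, a sketch of tracking fuzzy fallbacks so they can drive rule development. The hook and its names are hypothetical, and in production the counts would go to a metrics or logging backend rather than an in-memory Map.

```typescript
// Hypothetical monitoring hook: counts expressions whose primary candidate fell back to "fuzzy".
interface LoggedCandidate {
  type: "date" | "range" | "fuzzy";
  confidence: number;
  reason: string;
}

interface LoggedResolution {
  query: string;
  primary: LoggedCandidate;
}

const fuzzyCounts = new Map<string, number>();

function recordResolution(result: LoggedResolution): void {
  if (result.primary.type !== "fuzzy") return;
  const key = result.query.trim().toLowerCase(); // normalize so trivial variants group together
  fuzzyCounts.set(key, (fuzzyCounts.get(key) ?? 0) + 1);
}

/** The most frequent unresolved expressions: the first candidates for new rules or plugins. */
function topRuleGaps(limit: number): Array<[string, number]> {
  return Array.from(fuzzyCounts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```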

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Simple reminder ("remind me tomorrow")	Deterministic tool	Low ambiguity, high accuracy requirement	Minimal compute overhead
Complex booking ("book 3 nights starting next Friday")	Deterministic tool + booking plugin	Range boundaries, night calculations, holiday awareness	Moderate setup, high ROI
Multilingual scheduling (UA/PL/EN)	Deterministic tool + locale modules	Inflection handling, case matrices, pointer resolution	Higher initial config, eliminates localization bugs
Open-ended temporal chat ("when should I visit?")	LLM + tool fallback	LLM handles intent, tool resolves concrete dates	Balanced compute, maintains flexibility

Configuration Template

import { createTemporalEngine } from '@whenis/core';
import { enLocale } from '@whenis/locale-en';
import { ukLocale } from '@whenis/locale-uk';
import { bookingPlugin } from '@whenis/booking';

export const temporalEngine = createTemporalEngine({
  locales: [enLocale, ukLocale],
  plugins: [bookingPlugin],
  options: {
    preferFuture: true,
    strictMode: true,
    timezoneOffset: 0,
    maxCandidates: 5,
    fallbackToFuzzy: true
  }
});

export interface TemporalToolParams {
  expression: string;
  referenceDate: string;
  timezoneOffset?: number;
}

export async function executeTemporalResolution(params: TemporalToolParams) {
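  const ref = new Date(params.referenceDate);
  // Fail fast on an unparseable anchor; an Invalid Date would silently break relative math.
  if (Number.isNaN(ref.getTime())) {
    throw new Error(`Invalid referenceDate: ${params.referenceDate}`);
  }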
  const result = temporalEngine.parse(params.expression, {
    reference: ref,
    timezoneOffset: params.timezoneOffset ?? 0
  });

  const candidates = result.matches.flatMap(m => m.candidates);
  const sorted = candidates.sort((a, b) => b.confidence - a.confidence);

  return {
    query: params.expression,
    candidates: sorted,
    primary: sorted[0] ?? { type: 'fuzzy', confidence: 0, reason: 'no_match', metadata: {} },
    metadata: { engine: 'whenis', version: '0.1', strict: true }
  };
}

Quick Start Guide

Install core dependencies: npm i @whenis/core @whenis/locale-en @whenis/booking
Initialize the engine: Create a singleton parser instance with your target locales and plugins.
Define the tool interface: Wrap the parser in a typed function that accepts expression and referenceDate.
Register with your agent: Add the tool to your function-calling registry and map agent outputs to structured candidates.
Validate and deploy: Test relative expressions, range boundaries, and locale inflections. Monitor fuzzy fallbacks to identify rule gaps.
