ta residency policies.
- SQL as the Universal Interface: Every connected source exposes a schema namespace (e.g.,
github, calendar). Tables map to API endpoints. Columns map to flattened response fields. This standardizes cross-source queries into a single declarative language.
- Asynchronous CLI Bridge: Instead of blocking the event loop, the bridge uses
spawn with promise resolution. This allows concurrent query execution and prevents agent hangs during slow API responses.
- Context Trimming by Design: Queries use
WHERE, GROUP BY, and LIMIT clauses to minimize payload size. The LLM receives only aggregated or filtered rows, not raw API dumps.
Implementation
The following TypeScript implementation demonstrates a production-ready pattern. It replaces synchronous blocking calls with an async bridge, introduces schema discovery, and isolates LLM interaction.
1. Query Bridge (coral-bridge.ts)
import { spawn } from 'child_process';
import { promisify } from 'util';
const execAsync = promisify(require('child_process').exec);
export class QueryBridge {
private readonly timeoutMs = 25000;
async execute<T>(sql: string): Promise<T[]> {
return new Promise((resolve, reject) => {
const proc = spawn('coral', ['sql', '--format', 'json', sql.trim()], {
stdio: ['pipe', 'pipe', 'pipe'],
env: { ...process.env, FORCE_COLOR: '0' }
});
let stdout = '';
let stderr = '';
proc.stdout.on('data', (chunk: Buffer) => stdout += chunk.toString());
proc.stderr.on('data', (chunk: Buffer) => stderr += chunk.toString());
const timer = setTimeout(() => {
proc.kill('SIGTERM');
reject(new Error(`Query timeout after ${this.timeoutMs}ms`));
}, this.timeoutMs);
proc.on('close', (code) => {
clearTimeout(timer);
if (code !== 0) {
reject(new Error(`Coral exited with code ${code}: ${stderr}`));
return;
}
try {
resolve(JSON.parse(stdout) as T[]);
} catch (e) {
reject(new Error(`Failed to parse JSON output: ${stdout}`));
}
});
});
}
async discoverTables(schema: string): Promise<string[]> {
const rows = await this.execute<{ table_name: string }>(
`SELECT table_name FROM coral.tables WHERE schema_name = '${schema}'`
);
return rows.map(r => r.table_name);
}
async inspectColumns(schema: string, table: string): Promise<string[]> {
const rows = await this.execute<{ column_name: string }>(
`SELECT column_name FROM coral.columns WHERE schema_name = '${schema}' AND table_name = '${table}'`
);
return rows.map(r => r.column_name);
}
}
2. Data Orchestrator (data-orchestrator.ts)
import { QueryBridge } from './coral-bridge';
export interface CommitMetric {
date: string;
repository: string;
count: number;
}
export interface PullRequest {
title: string;
status: string;
created_at: string;
}
export class DataOrchestrator {
constructor(private bridge: QueryBridge) {}
async fetchEngineeringMetrics(owner: string, windowDays: number) {
const commits = await this.bridge.execute<CommitMetric>(`
SELECT
CAST(commit__author__date AS DATE) AS date,
repo AS repository,
COUNT(*) AS count
FROM github.commits
WHERE owner = '${owner}'
AND commit__author__date >= NOW() - INTERVAL '${windowDays} days'
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC
LIMIT 50
`);
const prs = await this.bridge.execute<PullRequest>(`
SELECT title, state, created_at
FROM github.pulls
WHERE owner = '${owner}'
AND created_at >= NOW() - INTERVAL '${windowDays} days'
ORDER BY created_at DESC
LIMIT 20
`);
return { commits, prs };
}
}
3. Insight Engine (insight-engine.ts)
import { GoogleGenAI } from '@google/genai';
export class InsightEngine {
private client: GoogleGenAI;
constructor(apiKey: string) {
this.client = new GoogleGenAI({ apiKey });
}
async generateReport(query: string, dataset: object): Promise<string> {
const response = await this.client.models.generateContent({
model: 'gemini-2.5-flash',
config: {
systemInstruction: `You are an engineering analytics assistant.
Analyze the provided dataset. Reference specific dates, repositories, and metrics.
Do not invent data. If the dataset lacks context, state what is missing.`,
},
contents: `
User Query: "${query}"
Dataset (last 7 days):
${JSON.stringify(dataset, null, 2)}
Provide a concise, data-grounded analysis.
`,
});
return response.text ?? 'No response generated.';
}
}
4. Agent Entry Point (agent-runner.ts)
import dotenv from 'dotenv';
import { QueryBridge } from './coral-bridge';
import { DataOrchestrator } from './data-orchestrator';
import { InsightEngine } from './insight-engine';
dotenv.config();
async function main() {
const args = process.argv.slice(2);
if (args.length === 0) {
console.error('Usage: npx tsx agent-runner.ts "<your question>"');
process.exit(1);
}
const userQuery = args.join(' ');
const owner = process.env.GITHUB_USERNAME;
const apiKey = process.env.GEMINI_API_KEY;
if (!owner || !apiKey) {
console.error('Missing required environment variables.');
process.exit(1);
}
const bridge = new QueryBridge();
const orchestrator = new DataOrchestrator(bridge);
const engine = new InsightEngine(apiKey);
try {
console.log('Fetching engineering metrics...');
const metrics = await orchestrator.fetchEngineeringMetrics(owner, 7);
console.log('Generating analysis...');
const report = await engine.generateReport(userQuery, metrics);
console.log('\n--- REPORT ---\n', report);
} catch (error) {
console.error('Agent execution failed:', error);
process.exit(1);
}
}
main();
Why this structure works: The bridge isolates I/O, the orchestrator enforces query boundaries, and the engine handles reasoning. Each layer has a single responsibility. The async bridge prevents event loop blocking. Schema discovery methods enable dynamic query construction without hardcoding column names. Context is explicitly trimmed before reaching the LLM.
Pitfall Guide
Production agents fail when developers treat the data layer as an afterthought. The following patterns consistently cause degradation in accuracy, latency, or security.
| Pitfall | Explanation | Fix |
|---|
| Ignoring JSON Flattening Conventions | APIs return nested objects. The runtime flattens them using __ as a delimiter (e.g., commit.author.date β commit__author__date). Hardcoding assumed column names breaks when APIs update. | Always query coral.columns before writing production SQL. Use schema discovery to validate field names dynamically. |
| Blocking the Event Loop | Using synchronous child process execution halts the Node.js event loop. If the runtime takes 10 seconds to paginate a large repository, the entire agent thread freezes. | Replace execFileSync with spawn + Promise resolution. Implement timeout boundaries and graceful cancellation. |
| Over-Fetching for Context | Pulling full API responses and injecting them into prompts wastes tokens and introduces noise. LLMs perform worse with verbose, unstructured payloads. | Use WHERE, GROUP BY, LIMIT, and aggregate functions in SQL. Fetch only the rows and columns required for the specific query. |
| Assuming Server-Side JOINs | Cross-source joins execute locally. The runtime fetches both datasets, then merges them in memory. Large tables can cause OOM errors or severe latency. | Filter aggressively on both sides before joining. Use date ranges, owner constraints, and status filters. Avoid unbounded JOIN operations. |
| Credential Leakage to LLM | Passing API tokens, PATs, or raw response headers into the prompt context exposes secrets to inference providers. | Never include authentication metadata in prompts. The runtime handles auth locally. Strip headers, tokens, and internal IDs before serialization. |
| Neglecting Rate Limit Awareness | While the runtime handles retries, poorly structured SQL can trigger excessive API calls. A query without pagination awareness may hit provider limits. | Structure queries to minimize call volume. Use LIMIT and date windows. Monitor runtime logs for retry spikes and adjust query granularity accordingly. |
| Prompt Drift from Raw JSON | Injecting unformatted JSON arrays causes the LLM to parse structure instead of reasoning over values. This increases hallucination rates. | Serialize data with JSON.stringify(data, null, 2) and wrap it in explicit markdown sections. Provide a system instruction that defines the expected schema. |
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Single-source reporting (e.g., GitHub commits only) | Direct SQL query via runtime | Minimal overhead, deterministic output, no join complexity | Low (baseline token usage) |
| Multi-source correlation (e.g., commits + calendar events) | Cross-source JOIN with strict date filters | Eliminates manual state management, reduces LLM round-trips | Medium (higher initial fetch, lower reasoning cost) |
| Real-time dashboard / high-frequency polling | Runtime + materialized view or cache | Prevents repeated API calls, reduces rate limit exposure | Low (cache hits reduce API calls by 80%+) |
| Secure/air-gapped environment | Local-first runtime with offline schema cache | Credentials never leave host, no external inference dependency | High (requires local compute, but zero data egress) |
| Rapid prototyping / notebook analysis | CLI + inline SQL + JSON export | Fast iteration, no boilerplate, direct data inspection | Low (development time optimized) |
Configuration Template
Use this template to initialize sources, verify connectivity, and prepare the runtime for agent integration. Replace placeholders with your environment values.
#!/usr/bin/env bash
# init-coral-env.sh
set -euo pipefail
echo "π Discovering available sources..."
coral source discover
echo "π Connecting GitHub source..."
# Expects GITHUB_PAT environment variable
coral source add --interactive github <<EOF
${GITHUB_PAT}
EOF
echo "π Connecting Calendar source..."
coral source add --interactive google_calendar <<EOF
${GOOGLE_CLIENT_ID}
${GOOGLE_CLIENT_SECRET}
EOF
echo "π Verifying schema availability..."
coral sql "SELECT schema_name, table_name FROM coral.tables ORDER BY schema_name;"
echo "β
Runtime ready. Use 'coral mcp-stdio' to expose to agents."
Environment variables required:
GITHUB_PAT=ghp_xxxxxxxxxxxxxxxxxxxx
GOOGLE_CLIENT_ID=xxxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-xxxxxxxxxxxx
GEMINI_API_KEY=AIzaSy-xxxxxxxxxxxx
GITHUB_USERNAME=your-username
Quick Start Guide
- Install the runtime: Run
brew install withcoral/tap/coral (macOS) or curl -fsSL https://withcoral.com/install.sh | sh (Linux). Verify with coral --version.
- Connect your first source: Execute
coral source add --interactive github and paste a Personal Access Token with repo and read:user scopes.
- Validate the schema: Run
coral sql "SELECT table_name FROM coral.tables WHERE schema_name = 'github'" to confirm tables like commits and pulls are available.
- Test a query: Execute
coral sql "SELECT COUNT(*) FROM github.commits WHERE owner = 'your-username' AND commit__author__date >= NOW() - INTERVAL '7 days'" to verify data retrieval.
- Wire to your agent: Replace the CLI bridge in the reference implementation with your async handler, inject your
GEMINI_API_KEY, and run npx tsx agent-runner.ts "summarize my recent activity".
The runtime handles authentication, pagination, and schema translation. Your agent receives clean rows, writes focused prompts, and operates within predictable token budgets. This architecture scales from local prototyping to production-grade engineering assistants without rewriting data access logic.