sts on four pillars: context injection, embedding retrieval, safety enforcement, and execution routing.
1. Context Injection & Domain Alignment
LLMs lack innate knowledge of your business logic. Without explicit guidance, they will guess column meanings, misinterpret status enums, or generate queries against deprecated tables. The solution is a structured context file that maps business terminology to schema artifacts.
Instead of hardcoding prompts, store domain rules in a Markdown file that the engine parses at initialization. This file defines status mappings, calculated metrics, and exclusion rules.
# config/initializers/nl_query_engine.rb
NlQueryEngine.configure do |config|
config.context_source = Rails.root.join("config", "domain_rules.md")
config.schema_cache_ttl = 12.hours
config.enable_voice_input = true
end
The engine reads this file, strips non-schema noise, and injects it into the system prompt. This approach prevents prompt injection risks while keeping business logic version-controlled alongside the application.
2. Embedding Pipeline & Retrieval
Traditional RAG systems rely on external vector databases. For Rails applications, this introduces infrastructure overhead and synchronization complexity. A more pragmatic approach stores embeddings directly in the existing database using a JSON column.
The pipeline indexes schema metadata, table descriptions, and historical query patterns. When a user submits a prompt, the system retrieves the most relevant schema fragments using cosine similarity against the stored embeddings.
class SchemaIndexer
def self.rebuild_index
ActiveRecord::Base.descendants.each do |model|
next unless model.table_exists?
metadata = {
table: model.table_name,
columns: model.column_names,
associations: model.reflect_on_all_associations.map(&:name),
context: extract_domain_context(model)
}
embedding = LlmAdapter.generate_embedding(metadata.to_json)
QueryIndex.upsert(
{ model_name: model.name, vector: embedding, metadata: metadata },
unique_by: :model_name
)
end
end
def self.retrieve_relevant_schema(prompt)
query_embedding = LlmAdapter.generate_embedding(prompt)
QueryIndex.order("vector <=> ?", query_embedding).limit(5).pluck(:metadata)
end
end
Storing vectors in a JSON column eliminates external dependencies. PostgreSQL's pgvector extension or SQLite's FTS5 can handle similarity searches efficiently for typical Rails schema sizes.
3. Safety Enforcement Layer
Natural language generation is inherently non-deterministic. The safety layer must guarantee read-only execution regardless of LLM output. This requires multiple validation stages:
- Keyword Blocklist: Reject prompts containing mutation keywords before they reach the LLM.
- SQL Parser Validation: Parse generated SQL to verify it contains only
SELECT statements.
- Transactional Rollback: Wrap execution in a transaction that always rolls back, preventing accidental writes.
- Read Replica Routing: Direct queries to a read-only database replica when available.
class QueryExecutor
MUTATION_KEYWORDS = %w[DELETE UPDATE INSERT DROP ALTER TRUNCATE CREATE].freeze
def self.execute_safe(prompt, user_context)
raise SecurityError, "Mutation keywords detected" if prompt.match?(/#{MUTATION_KEYWORDS.join('|')}/i)
generated_sql = LlmAdapter.generate_sql(prompt, user_context)
validate_read_only(generated_sql)
ActiveRecord::Base.transaction do
result = ActiveRecord::Base.connection.execute(generated_sql)
raise ActiveRecord::Rollback
end
result
end
def self.validate_read_only(sql)
parsed = SqlParser.parse(sql)
raise SecurityError, "Non-SELECT statement detected" unless parsed.statements.all? { |s| s.type == :select }
end
end
This multi-layer approach ensures that even if the LLM generates malicious or malformed SQL, the execution environment neutralizes the threat before it touches the database.
4. Execution Routing & UI Integration
The interface should live at a dedicated route, isolated from public-facing paths. It must support iterative refinement: users can edit generated SQL, pin specific tables using @table_name syntax, and export results as CSV. Auto-generated charts trigger when the result set contains categorical or time-series data.
Integration with existing BI tools like Blazer becomes seamless by exposing a "Open in Blazer" button that passes the validated SQL directly to the external dashboard. This preserves institutional knowledge while accelerating ad-hoc analysis.
Pitfall Guide
1. Unbounded Embedding Growth
Explanation: Indexing every column and association without pruning causes the JSON vector column to bloat, degrading similarity search performance.
Fix: Implement a schema filter that excludes internal tables (ar_internal_metadata, schema_migrations), audit logs, and high-cardinality columns. Set a maximum embedding count per model.
Explanation: Users may craft prompts that attempt to override system instructions or extract sensitive schema data.
Fix: Sanitize user input before LLM submission. Use strict system prompts that explicitly forbid instruction overriding. Validate all LLM output against a whitelist of allowed operations.
3. N+1 Query Generation
Explanation: LLMs sometimes generate inefficient queries that trigger multiple round trips instead of using JOIN or includes.
Fix: Add a query optimization step that rewrites generated SQL to use explicit joins. Cache frequent query patterns and suggest optimized alternatives in the UI.
4. Ignoring Read Replica Routing
Explanation: Running analytical queries against the primary database increases write latency and risks connection pool exhaustion.
Fix: Configure ActiveRecord::Base.connected_to(role: :reading) for all LLM-generated queries. Verify replica lag before execution and fallback to primary only if lag exceeds threshold.
5. Over-Reliance on LLM Schema Inference
Explanation: LLMs may hallucinate column names or relationships that don't exist, especially in legacy schemas with inconsistent naming.
Fix: Always ground generation in retrieved embeddings. Reject queries that reference non-existent tables/columns before execution. Maintain a strict schema registry.
6. Missing Audit Trails
Explanation: Natural language queries leave no trace of who asked what, making compliance and debugging difficult.
Fix: Log every prompt, generated SQL, execution time, and result count to an audit table. Include user ID, IP, and session metadata. Rotate logs according to retention policies.
7. Context File Staleness
Explanation: Business rules change faster than embedding updates. Stale context files cause the LLM to generate queries against deprecated logic.
Fix: Trigger automatic re-indexing on schema migrations. Add a CI check that validates context files against current ActiveRecord models. Notify maintainers when drift exceeds 10%.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Internal team ad-hoc queries | LLM-RAG Rails Engine | Fast iteration, self-service, low dev overhead | Low (API costs scale with usage) |
| Executive dashboards | Traditional BI Tool (Metabase/Blazer) | Stable, pre-validated, optimized for rendering | Medium (licensing + maintenance) |
| Compliance-heavy reporting | Manual SQL + Code Review | Audit trail, deterministic output, strict governance | High (engineering time) |
| Legacy schema with poor docs | LLM-RAG + Context File | Bridges semantic gap, auto-discovers relationships | Low-Medium (embedding storage + API) |
| High-frequency analytical loads | Materialized Views + BI Tool | Predictable performance, no LLM latency | Medium (storage + refresh jobs) |
Configuration Template
# config/initializers/nl_query_engine.rb
NlQueryEngine.configure do |config|
# LLM Provider Configuration
config.llm_provider = :gemini
config.llm_model = "gemini-2.0-flash"
config.fallback_model = "openai/gpt-4o-mini"
config.api_key = ENV.fetch("LLM_API_KEY")
# Safety & Execution
config.enforce_read_only = true
config.always_rollback = true
config.blocklist = %w[DELETE UPDATE INSERT DROP ALTER TRUNCATE CREATE GRANT REVOKE]
config.route_to_replica = true
config.replica_lag_threshold = 5.seconds
# Context & Embeddings
config.context_file = Rails.root.join("config", "domain_rules.md")
config.embedding_storage = :json_column
config.max_embeddings_per_model = 50
config.schema_cache_ttl = 12.hours
# UI & Integration
config.mount_at = "/data-queries"
config.enable_csv_export = true
config.enable_auto_charts = true
config.blazer_integration = true
config.audit_logging = true
end
Quick Start Guide
- Add the gem and run generators: Include the engine in your
Gemfile, execute the install generator, run migrations, and trigger the initial schema indexing.
- Configure your LLM provider: Set your API key, select a model, and define fallback routing for cost control. Enable replica routing if your architecture supports it.
- Define domain context: Create a Markdown file mapping business terminology to tables, columns, and status values. Exclude internal or sensitive schemas.
- Mount and secure: The engine mounts at your configured path. Verify that mutation keywords are blocked, transactions always rollback, and audit logs capture all activity.
- Test and iterate: Submit natural language queries, validate generated SQL, export results, and refine context files based on misinterpretations. Schedule periodic re-indexing to maintain accuracy.