from physical schema changes. They allow you to pre-join tables, apply business logic, and embed masking policies without modifying source data. If a table schema evolves, you update the view definition once; the AI continues querying the same logical interface.
-- Create isolated execution context
CREATE OR REPLACE ROLE ai_query_executor;
-- Bind to dedicated compute
GRANT USAGE ON WAREHOUSE ai_compute_wh TO ROLE ai_query_executor;
GRANT USAGE ON DATABASE analytics_prod TO ROLE ai_query_executor;
GRANT USAGE ON SCHEMA analytics_prod.governed TO ROLE ai_query_executor;
-- Restrict to read-only view access
GRANT SELECT ON ALL VIEWS IN SCHEMA analytics_prod.governed TO ROLE ai_query_executor;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA analytics_prod.governed TO ROLE ai_query_executor;
Phase 2: Enforce Column-Level Redaction
Snowflake masking policies evaluate the executing role at query time. They return raw values for authorized human analysts and transformed values for the AI role. This enforcement happens inside the SQL engine, making it impossible to bypass via prompt manipulation or direct SQL injection.
Architecture decision: Use deterministic hashing for columns that require grouping or joining. Non-deterministic masking breaks aggregations. Always preserve the original data type to maintain compatibility with GROUP BY, JOIN, and window functions.
-- Define deterministic redaction policy
CREATE OR REPLACE MASKING POLICY redact_sensitive_columns
AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ANALYST_HUMAN', 'DATA_ENGINEER') THEN val
ELSE SHA2(val)
END;
-- Apply to governed view columns
ALTER VIEW analytics_prod.governed.customer_summary
MODIFY COLUMN email_address SET MASKING POLICY redact_sensitive_columns;
ALTER VIEW analytics_prod.governed.customer_summary
MODIFY COLUMN phone_number SET MASKING POLICY redact_sensitive_columns;
Phase 3: Restrict Dataset Scope
Row access policies filter result sets before they reach the client. They are essential for multi-tenant environments, time-bound analytics, or compliance-driven data segmentation. Like masking policies, they execute at the engine layer and cannot be overridden by the AI.
Production insight: Combine row policies with session variables for dynamic scoping. Pass a tenant_id or fiscal_period through the MCP connection context, and reference it in the policy definition. This allows a single AI role to serve multiple isolated workloads safely.
-- Create row-level filter
CREATE OR REPLACE ROW ACCESS POLICY tenant_isolation
AS (tenant_id STRING) RETURNS BOOLEAN ->
CURRENT_ROLE() IN ('ANALYST_HUMAN')
OR tenant_id = CURRENT_SESSION_CONTEXT('app_tenant_scope');
-- Bind to view
ALTER VIEW analytics_prod.governed.transaction_log
ADD ROW ACCESS POLICY tenant_isolation ON (tenant_id);
Phase 4: Implement Observability & Limits
Every AI-generated query must be traceable. Snowflake's QUERY_HISTORY captures SQL text, execution role, timestamps, and row counts. Pipe this data into your SIEM or logging pipeline to maintain a 30-day audit trail.
Critical addition: Inject session metadata into every query via SQL comments. This enables precise filtering in QUERY_HISTORY without parsing application logs. Pair this with a warehouse-level query timeout to prevent cost runaway from recursive or poorly optimized AI prompts.
// MCP Server wrapper with query tagging & timeout enforcement
import { McpServer } from '@modelcontextprotocol/sdk';
const server = new McpServer({ name: 'secure-snowflake-gateway', version: '1.0.0' });
server.tool(
'execute_analytics_query',
'Run read-only SQL against governed Snowflake views',
{ sql: { type: 'string', description: 'SELECT statement' } },
async ({ sql }) => {
const sessionId = crypto.randomUUID();
const taggedQuery = `/* mcp_session=${sessionId}, agent=claude_analyst */ ${sql}`;
// Enforce 60s timeout at execution layer
const result = await snowflakeClient.execute({
sqlText: taggedQuery,
bindValues: [],
timeout: 60000,
role: 'ai_query_executor'
});
if (result.numRows > 10000) {
await alertMonitoringSystem({
type: 'HIGH_ROW_COUNT',
session: sessionId,
rows: result.numRows
});
}
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
}
);
Phase 5: Introduce Semantic Routing
The most subtle data leakage occurs when the AI selects the wrong table. Legacy exports, raw staging tables, and unredacted backups often coexist with governed datasets. Without metadata awareness, the AI defaults to the first matching name.
A schema-aware catalog acts as a guardrail. Before executing SQL, the agent queries the catalog for dataset ownership, freshness, PII tags, and recommended views. The catalog only surfaces governed interfaces. This adds one network round-trip but eliminates logical misrouting and ensures compliance metadata travels with the query.
Pitfall Guide
1. Reusing Analytics or ETL Roles
Explanation: Sharing roles between humans, pipelines, and AI agents breaks the principle of least privilege. If the analytics role has INSERT or CREATE privileges, the AI inherits them. Role reuse also contaminates audit trails, making it impossible to distinguish human queries from AI execution.
Fix: Create a dedicated ai_query_executor role. Grant only USAGE on compute and SELECT on views. Never share credentials or role bindings across execution contexts.
2. Applying Non-Deterministic Masks to Join Keys
Explanation: Randomized masking changes values on every query execution. If a masked column is used in a JOIN or GROUP BY, the engine cannot match rows, resulting in empty results or incorrect aggregations.
Fix: Use deterministic hashing (SHA2, HASH) for columns involved in joins or aggregations. Preserve the original data type. Document which columns use deterministic vs. non-deterministic masking in your data dictionary.
3. Forgetting to Tag AI Queries
Explanation: Without session metadata, QUERY_HISTORY returns a flat list of SQL statements. Identifying which queries originated from the AI requires parsing application logs or guessing based on timing. This breaks audit compliance and incident response.
Fix: Inject a standardized comment header (/* mcp_session=..., agent=... */) at the MCP server layer. Configure your SIEM to parse these tags and route them to a dedicated AI audit dashboard.
4. Relying Solely on Prompt Engineering
Explanation: Prompt instructions like "never query PII columns" are advisory, not enforceable. Model updates, context window truncation, or adversarial prompting can bypass textual guardrails. The database engine does not read prompts; it executes SQL based on role privileges.
Fix: Treat prompts as UX, not security. Enforce boundaries at the storage layer using masking policies, row filters, and view-only grants. Validate controls by testing with adversarial SQL generation.
5. Ignoring Warehouse Query Timeouts
Explanation: Snowflake warehouses run queries until completion or cancellation. An AI agent stuck in a recursive loop or executing a poorly optimized cross-join can consume compute credits for hours before manual intervention.
Fix: Set a warehouse-level query timeout (start at 60 seconds). Monitor QUERY_HISTORY for timeout events. Implement exponential backoff in the MCP server for retry logic, and alert on repeated timeout patterns.
6. Granting DDL or DML Privileges
Explanation: AI agents are stateless and exploratory. They do not understand transactional boundaries or schema migration impacts. Granting CREATE, INSERT, UPDATE, or DELETE privileges introduces accidental data mutation, table drops, or constraint violations.
Fix: Enforce read-only access. If the AI requires write capabilities for logging or audit trails, create a separate append-only table with restricted INSERT privileges and route writes through a controlled stored procedure.
7. Skipping Catalog Validation
Explanation: Direct table selection bypasses governance metadata. The AI cannot distinguish between orders_raw (unredacted) and orders_governed (masked). This leads to compliance violations and inconsistent reporting.
Fix: Integrate a metadata catalog as a mandatory pre-query step. Require the agent to resolve table names through the catalog before generating SQL. Cache catalog responses to minimize latency impact.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Internal analytics AI | View-only SELECT + Column Masking | Balances flexibility with PII protection; minimal latency | Low (view maintenance) |
| Multi-tenant SaaS AI | Row Access Policies + Tenant Scoping | Enforces data isolation at engine level; prevents cross-tenant leakage | Medium (policy management) |
| External customer-facing AI | Catalog Guardrail + Strict View Layer | Eliminates wrong-table selection; ensures governed interfaces only | High (catalog integration + round-trip) |
| Compliance-heavy (HIPAA/GDPR) | Full stack: Masking + Row Policies + Audit + Catalog | Meets regulatory audit requirements; provides end-to-end traceability | Medium-High (setup time + monitoring) |
Configuration Template
-- 1. Execution Context
CREATE OR REPLACE ROLE ai_data_gateway;
GRANT USAGE ON WAREHOUSE ai_analytics_wh TO ROLE ai_data_gateway;
GRANT USAGE ON DATABASE prod_warehouse TO ROLE ai_data_gateway;
GRANT USAGE ON SCHEMA prod_warehouse.governed TO ROLE ai_data_gateway;
GRANT SELECT ON ALL VIEWS IN SCHEMA prod_warehouse.governed TO ROLE ai_data_gateway;
-- 2. Column Masking (Deterministic for joins)
CREATE OR REPLACE MASKING POLICY hash_pii_fields
AS (input_val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('BI_ANALYST', 'DATA_STEWARD') THEN input_val
ELSE SHA2(input_val)
END;
-- 3. Row Access (Tenant/Time Bound)
CREATE OR REPLACE ROW ACCESS POLICY restrict_to_recent_data
AS (event_date DATE) RETURNS BOOLEAN ->
CURRENT_ROLE() IN ('BI_ANALYST')
OR event_date >= DATEADD(day, -90, CURRENT_DATE());
-- 4. Warehouse Limits
ALTER WAREHOUSE ai_analytics_wh SET
QUERY_TIMEOUT_IN_SECONDS = 60,
MAX_CONCURRENCY_LEVEL = 5,
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 30;
-- 5. Apply Policies to Governed View
CREATE OR REPLACE VIEW prod_warehouse.governed.customer_events AS
SELECT
customer_id,
event_type,
event_date,
email_address,
region_code
FROM prod_warehouse.raw.customer_events
WHERE is_active = TRUE;
ALTER VIEW prod_warehouse.governed.customer_events
MODIFY COLUMN email_address SET MASKING POLICY hash_pii_fields,
ADD ROW ACCESS POLICY restrict_to_recent_data ON (event_date);
Quick Start Guide
- Provision isolated compute: Create a dedicated X-Small warehouse for AI queries. Set
QUERY_TIMEOUT_IN_SECONDS = 60 and attach a resource monitor to cap monthly credits.
- Create governed views: Extract required datasets into a
governed schema. Apply business logic, joins, and filters at the view layer. Never grant direct table access.
- Bind masking & row policies: Identify PII columns and apply deterministic masking policies. Attach row access policies for tenant or date scoping. Test with
CURRENT_ROLE() impersonation.
- Deploy MCP wrapper: Implement query tagging in your MCP server. Route all execution through the
ai_data_gateway role. Configure QUERY_HISTORY export to your logging pipeline and set up row-count alerts.