A dating algorithm that physically cannot read photos (and why I wrote it that way)
Enforcing Algorithmic Blindness: Type-Level Constraints for Bias-Resistant Ranking Systems
Current Situation Analysis
Modern matching systems, whether for dating, hiring, lending, or content recommendation, frequently claim to prioritize substantive signals over superficial ones. Yet the engineering reality rarely matches the marketing promise. Most teams implement "blindness" to certain data modalities (images, protected attributes, or legacy metadata) through runtime filters, feature flags, or policy documentation. These approaches share a critical vulnerability: they rely on human discipline rather than structural guarantees.
The industry pain point is constraint decay. A feature flag introduced to disable visual data in a ranking pipeline survives until the next quarterly OKR cycle, when an engineer proposes an A/B test to "reintroduce secondary signals." The flag flips. The metric moves. The original design principle evaporates. Documentation suffers the same fate; a README stating "do not pass media bytes to the matcher" is functionally equivalent to a suggestion. In production environments with high turnover, rapid iteration, and distributed teams, unenforced constraints become technical debt that compounds silently.
This problem is overlooked because engineers default to flexibility. Runtime checks and flags feel safer during development. They allow quick pivots without schema migrations or type refactors. However, flexibility without boundaries introduces latent bias. When a ranking model ingests unconstrained data, it inevitably learns correlations that developers did not intend to encode. Text embeddings can proxy for demographic attributes. Metadata fields can leak socioeconomic signals. The mathematical convenience of "feed everything into the vectorizer" directly conflicts with the architectural requirement of "guarantee exclusion."
The alternative is to shift constraint enforcement from the runtime layer to the type system. By designing domain types that physically lack fields for forbidden data, the compiler becomes the gatekeeper. This approach transforms a policy decision into a build artifact. It eliminates flag sprawl, reduces audit complexity, and creates a verifiable contract that survives team turnover. The cost is upfront architectural discipline, but the payoff is a ranking engine that cannot be accidentally or intentionally bypassed without breaking the build.
WOW Moment: Key Findings
The structural advantage of type-level enforcement becomes immediately visible when comparing it against traditional constraint mechanisms. The following table isolates the operational characteristics that determine long-term system integrity.
| Enforcement Strategy | Compile-Time Guarantee | Refactor Risk | Audit Complexity | Maintenance Overhead |
|---|---|---|---|---|
| Feature Flags | None | High (flag creep, stale toggles) | Medium (runtime log analysis) | High (cleanup, monitoring) |
| Documentation/Policy | None | High (human error, onboarding gaps) | High (manual code review) | Medium (drift over time) |
| Runtime Guards | Partial | Medium (bypass via casting/any) | Medium (test coverage dependency) | Medium (edge case handling) |
| Type-Level Constraints | Strong (holds unless bypassed with a cast or any) | Low (compiler blocks violations) | Low (signature inspection) | Low (zero runtime cost) |
This finding matters because it reframes compliance as an engineering artifact rather than an operational process. When the constraint lives in the type signature, verification requires no runtime instrumentation, no log aggregation, and no manual audits. A developer can inspect the entry-point function, confirm the absence of forbidden fields, and trust the compiler to reject any deviation. This enables public verifiability, reduces cognitive load during refactors, and eliminates the "flag fatigue" that plagues long-running product teams.
Core Solution
Building a structurally constrained ranking engine requires three coordinated decisions: type design, service boundary isolation, and embedding pipeline construction. Each decision reinforces the constraint at a different layer of the stack.
Step 1: Define the Constrained Domain Type
The foundation is a TypeScript interface that explicitly excludes visual data. The type should only contain fields that are mathematically and ethically appropriate for ranking.
// src/domain/signals.ts
export interface TextArtifact {
  content: string;
  language: string;
  wordCount: number;
}

export interface AudioTranscript {
  text: string;
  durationSec: number;
  speakerId: string;
}

export interface UserSignal {
  artifacts: TextArtifact[];
  transcript: AudioTranscript;
  intent: 'networking' | 'collaboration' | 'mentorship';
  metadata: {
    region: string;
    language: string;
    cohort: string;
  };
}
Notice the deliberate absence of any image, photo, avatar, or media fields. This is not an oversight; it is the constraint. Code that accepts a UserSignal has no typed way to read visual data: referencing a photo field on it is a compile error.
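One caveat is worth making concrete. TypeScript is structurally typed, and its excess-property check applies only to object literals, so a value that already carries a photo field can still be assigned to UserSignal through a variable. The sketch below shows that case and one possible mitigation. The trimmed-down type, the `rawProfile` record, and the `toUserSignal` helper are all hypothetical and not part of the original implementation.

```typescript
// Trimmed-down copy of the article's type, kept local so the sketch is self-contained.
interface UserSignal {
  artifacts: { content: string; language: string; wordCount: number }[];
  transcript: { text: string; durationSec: number; speakerId: string };
  intent: 'networking' | 'collaboration' | 'mentorship';
  metadata: { region: string; language: string; cohort: string };
}

// Hypothetical raw record as it might arrive from storage, photo URL included.
const rawProfile = {
  artifacts: [{ content: 'I restore old synthesizers.', language: 'en', wordCount: 4 }],
  transcript: { text: 'hello there', durationSec: 3, speakerId: 'u1' },
  intent: 'networking' as const,
  metadata: { region: 'eu', language: 'en', cohort: 'c1' },
  photoUrl: 'https://example.invalid/p.jpg',
};

// This compiles: the extra property is only rejected on object literals.
const leaky: UserSignal = rawProfile;

// Hypothetical projection helper: rebuilds the value field by field so that
// nothing outside the declared shape survives at runtime.
function toUserSignal(input: UserSignal): UserSignal {
  return {
    artifacts: input.artifacts.map(a => ({
      content: a.content,
      language: a.language,
      wordCount: a.wordCount,
    })),
    transcript: {
      text: input.transcript.text,
      durationSec: input.transcript.durationSec,
      speakerId: input.transcript.speakerId,
    },
    intent: input.intent,
    metadata: {
      region: input.metadata.region,
      language: input.metadata.language,
      cohort: input.metadata.cohort,
    },
  };
}

const clean: UserSignal = toUserSignal(rawProfile);
```

Typed code still cannot read `photoUrl` from `leaky`, but the bytes travel with the object into logs, caches, and serializers. Projecting at the service boundary is one way to make the runtime value match the type.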
Step 2: Architect Service Boundaries
Type constraints only work if the data pipeline respects them. The media storage layer must be isolated from the ranking service. This requires separate read paths, distinct access control policies, and explicit data flow boundaries.
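// src/infrastructure/media-gateway.ts
// SecureBlobStore is not shown in this article; the import path is an assumption.
import type { SecureBlobStore } from './secure-blob-store';

export class MediaAccessController {
  constructor(private readonly vault: SecureBlobStore) {}

  // Returns the target's media only after the consent check passes; otherwise null.
  async retrieveVisualData(
    requesterId: string,
    targetId: string,
    consentToken: string
  ): Promise<Buffer | null> {
    const hasMutualConsent = await this.vault.verifyConsent(
      requesterId,
      targetId,
      consentToken
    );
    if (!hasMutualConsent) return null;
    return this.vault.fetch(targetId);
  }
}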
The ranking service never imports MediaAccessController. It operates exclusively on UserSignal. Visual data retrieval happens downstream, behind a consent gate, and never feeds back into the affinity calculation. The separation has to be enforced rather than assumed: the ranking service's credentials and network path must not reach the media store, so an attempt to bridge the two services fails at the infrastructure level instead of depending on code review.
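The "never imports" rule can also be checked mechanically. Here is a minimal sketch using ESLint's core `no-restricted-imports` rule with its `patterns` option. The file globs assume the directory layout used in this article's snippets, and the constant name `rankingImportGuard` is hypothetical.

```typescript
// Fragment intended to be spread into the array exported by an ESLint flat config.
// The globs are assumptions about the repository layout.
const rankingImportGuard = {
  // Apply only to the ranking code.
  files: ['src/matching/**/*.ts'],
  rules: {
    'no-restricted-imports': [
      'error',
      {
        patterns: [
          {
            // Any relative or absolute path that resolves to the media gateway module.
            group: ['**/infrastructure/media-gateway', '**/infrastructure/media-gateway.*'],
            message: 'The ranking service must not import the media gateway.',
          },
        ],
      },
    ],
  },
};
```

A lint rule guards only the import graph inside one repository. It complements, and does not replace, separate credentials and network policies.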
Step 3: Construct the Embedding Pipeline
Text and audio transcripts are concatenated into a single document per user. This document is vectorized into a fixed-dimensional space. The original implementation uses a 1536-dimensional embedding model, which provides sufficient capacity for semantic capture without excessive computational overhead.
// src/matching/embedding-pipeline.ts
import type { UserSignal } from '../domain/signals';
import { createEmbeddingClient } from './vector-client';

const vectorClient = createEmbeddingClient({
  dimensions: 1536,
  model: 'text-embedding-v3',
  batchSize: 50
});

export async function generateSignalVector(signal: UserSignal): Promise<number[]> {
  // One document per user: every text artifact followed by the audio transcript.
  const combinedDocument = [
    ...signal.artifacts.map(a => a.content),
    signal.transcript.text
  ].join(' | ');
  return vectorClient.encode(combinedDocument);
}
The pipeline deliberately excludes any image preprocessing, OCR, or visual feature extraction, so no dimension of the vector is computed from pixels. This is not a filtering step; it is an input restriction. It does not guarantee that the vector is uncorrelated with appearance or other attributes: text can still carry proxy signals, as the Pitfall Guide discusses.
Step 4: Implement Deterministic Reranking
Cosine similarity provides the baseline affinity score. Two lightweight, interpretable rerankers adjust the score based on ideological alignment and shared interest overlap. The coefficients are transparent and easily auditable.
// src/matching/ranking-engine.ts
import { cosineSimilarity } from './math-utils';
import { jaccardSimilarity } from './set-utils';

export function computeAffinity(
  viewerVector: number[],
  candidateVector: number[],
  viewerIdeology: string,
  candidateIdeology: string,
  viewerInterests: Set<string>,
  candidateInterests: Set<string>
): number {
  // Baseline affinity from the embedding space.
  const baseScore = cosineSimilarity(viewerVector, candidateVector);
  // Ideology arrives as a numeric string; parseFloat yields NaN on bad input.
  const rawDelta = Math.abs(
    parseFloat(viewerIdeology) - parseFloat(candidateIdeology)
  );
  // Assumption: an unparseable value skips the penalty instead of turning the score into NaN.
  const ideologyDelta = isFinite(rawDelta) ? rawDelta : 0;
  const interestOverlap = jaccardSimilarity(viewerInterests, candidateInterests);
  const adjustedScore = baseScore
    - (0.12 * ideologyDelta)
    + (0.08 * interestOverlap);
  // Clamp the final score to [0, 1].
  return Math.max(0, Math.min(1, adjustedScore));
}
The coefficients (0.12 and 0.08) are intentionally small. They act as tiebreakers, not primary drivers. This preserves the semantic foundation of the embedding while allowing deterministic adjustments for alignment. The function is pure, stateless, and fully auditable.
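The two imported helpers are not shown in the article. The following is a minimal sketch of what they might look like, plus the score arithmetic on its own so the effect of the coefficients can be checked by hand. The function bodies and the `adjust` helper are assumptions for illustration, not the original code.

```typescript
// Hypothetical stand-in for the helper imported from './math-utils'.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vector length mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; report no similarity rather than NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical stand-in for the helper imported from './set-utils'.
function jaccardSimilarity<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach(item => {
    if (b.has(item)) shared++;
  });
  // |A ∩ B| / |A ∪ B|
  return shared / (a.size + b.size - shared);
}

// Same arithmetic as computeAffinity, with the ideology delta passed in directly.
function adjust(base: number, ideologyDelta: number, overlap: number): number {
  return Math.max(0, Math.min(1, base - 0.12 * ideologyDelta + 0.08 * overlap));
}
```

With a base score of 0.8, an ideology delta of 1, and an overlap of 0.5, `adjust` gives roughly 0.72. The article does not state the ideology scale. On a scale where two users can differ by 6 points, the penalty alone is 0.72, which is far more than a tiebreaker, so the "small coefficient" claim holds only if the delta is normalized to a narrow range.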
Architecture Rationale
- Type-level enforcement over runtime checks: A runtime guard can be deleted, skipped, or flagged off without breaking the build. A type with no visual fields turns the same violation into a compile error, as long as casts and any are kept out by lint rules (see Pitfall 2).
- Service isolation over shared databases: Co-located tables encourage accidental joins. Separate read paths with distinct IAM policies enforce data flow boundaries at the infrastructure level.
- Linear reranking over neural rerankers: Complex reranking models obscure decision logic. Linear adjustments with documented coefficients maintain transparency and simplify fairness audits.
- Fixed-dimension embeddings over adaptive models: A static vector space ensures consistent behavior across deployments. Adaptive models introduce drift that can inadvertently encode forbidden signals.
Pitfall Guide
1. Latent Dimension Leakage
Explanation: Even when visual data is excluded, text embeddings can learn proxies for protected attributes through correlated vocabulary, regional slang, or socioeconomic markers.
Fix: Conduct embedding audits using projection techniques (t-SNE, UMAP) to detect clustering by forbidden attributes. Apply debiasing algorithms or restrict input vocabulary to neutralize proxy signals.
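Before reaching for t-SNE or UMAP, a cruder numeric probe can give a first signal. The sketch below swaps in a nearest-centroid check, which is a simpler technique than the projections named above. It takes embeddings together with an audit-only group label (never a model input) and reports how often a vector lies closest to its own group's centroid. The `Labeled` type and function names are hypothetical.

```typescript
// Audit-only sample: the group label is held out of the ranking pipeline.
type Labeled = { vector: number[]; group: 'a' | 'b' };

// Component-wise mean of a list of vectors (empty list yields an empty vector).
function centroid(vectors: number[][]): number[] {
  const dim = vectors[0]?.length ?? 0;
  const sum: number[] = [];
  for (let i = 0; i < dim; i++) sum.push(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sum[i] = (sum[i] ?? 0) + (v[i] ?? 0);
  }
  return sum.map(s => s / vectors.length);
}

function squaredDistance(a: number[], b: number[]): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    const d = (a[i] ?? 0) - (b[i] ?? 0);
    total += d * d;
  }
  return total;
}

// Fraction of samples whose nearest group centroid matches their audit label.
// Ties go to group 'a'. Scored in-sample, so the result is optimistic.
function nearestCentroidAccuracy(samples: Labeled[]): number {
  if (samples.length === 0) return 0;
  const a = centroid(samples.filter(s => s.group === 'a').map(s => s.vector));
  const b = centroid(samples.filter(s => s.group === 'b').map(s => s.vector));
  let correct = 0;
  for (const s of samples) {
    const predicted = squaredDistance(s.vector, a) <= squaredDistance(s.vector, b) ? 'a' : 'b';
    if (predicted === s.group) correct++;
  }
  return correct / samples.length;
}
```

For two equally sized groups, a result near 0.5 suggests the embedding carries little linear signal about the attribute, while a result near 1.0 suggests a strong proxy. A held-out split would give a less optimistic estimate.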
2. Type System Bypass via Casting
Explanation: Developers may use as any or dynamic property access to force visual data into the ranking pipeline, defeating the type constraint.
Fix: Enable strict TypeScript configuration, enforce @typescript-eslint/no-explicit-any and @typescript-eslint/no-unsafe-assignment rules, and require peer review for any type assertion.
3. Service Boundary Bleed
Explanation: Shared database connections or ORM configurations can allow the ranking service to accidentally query media tables through relationship traversal.
Fix: Use dedicated read replicas for the ranking service. Implement network segmentation, restrict IAM roles to specific table prefixes, and disable ORM relationship traversal for cross-service entities.
4. Over-Engineering Rerankers
Explanation: Introducing complex ML reranking models obscures decision logic, increases latency, and makes fairness audits nearly impossible.
Fix: Keep reranking linear and interpretable. Document all coefficients. Reserve complex models for downstream recommendation layers, not core affinity calculation.
5. Metadata Correlation Drift
Explanation: Demographic or geographic fields can act as proxies for protected attributes. Over time, these fields may introduce bias even if initially deemed safe.
Fix: Run periodic correlation audits between metadata fields and ranking outcomes. Drop or hash high-correlation fields. Implement differential privacy techniques for sensitive metadata.
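A correlation audit can start as small as a Pearson coefficient between a metadata indicator and the ranking outcome. The sketch below assumes the metadata field has been encoded as 0/1 (for example, membership in one cohort) and paired with the affinity score each candidate received. The function name and encoding are illustrative choices, not part of the original system.

```typescript
// Pearson correlation between two equal-length numeric samples.
function pearson(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error('need two equal-length samples with at least 2 points');
  }
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = (x[i] ?? 0) - meanX;
    const dy = (y[i] ?? 0) - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // A constant sample has undefined correlation; report 0 so the audit does not crash.
  if (varX === 0 || varY === 0) return 0;
  return cov / Math.sqrt(varX * varY);
}

// Hypothetical audit input: 1 = candidate belongs to the cohort under review.
const inCohort = [1, 1, 0, 0, 1, 0];
const affinityScores = [0.81, 0.77, 0.42, 0.39, 0.85, 0.5];
const cohortCorrelation = pearson(inCohort, affinityScores);
```

What counts as "high correlation" is a policy choice the article leaves open. Pearson also only detects linear relationships, so a low value does not rule out a proxy effect.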
6. Flag Creep Disguised as Configuration
Explanation: Teams may introduce configuration objects that conditionally enable visual data, effectively recreating feature flags under a different name.
Fix: Treat configuration as part of the type contract. Use discriminated unions to enforce mutually exclusive modes. Reject any configuration that introduces optional visual fields.
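One way the discriminated-union fix might look is sketched below. The mode names, the `maxArtifacts` and `maxTranscriptSec` fields, and the `includeVisual` key are hypothetical; the point is the shape, not the specific options.

```typescript
// Each mode is a separate variant. `includeVisual?: never` means the key can
// never be given a real value in any variant, even via a pre-built variable.
type MatcherConfig =
  | { mode: 'text-only'; maxArtifacts: number; includeVisual?: never }
  | {
      mode: 'text-and-audio';
      maxArtifacts: number;
      maxTranscriptSec: number;
      includeVisual?: never;
    };

// Lists which UserSignal inputs a given configuration is allowed to consume.
function describeInputs(config: MatcherConfig): string[] {
  switch (config.mode) {
    case 'text-only':
      return ['artifacts'];
    case 'text-and-audio':
      return ['artifacts', 'transcript'];
    default: {
      // Exhaustiveness check: a newly added mode fails to compile until handled here.
      const unreachable: never = config;
      return unreachable;
    }
  }
}

const textOnly: MatcherConfig = { mode: 'text-only', maxArtifacts: 5 };
const inputs = describeInputs(textOnly);
```

Adding a visual mode now requires editing the union itself, which shows up in review as a change to the type contract rather than a flipped boolean.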
Production Bundle
Action Checklist
- Define constrained domain types with explicit field exclusion
- Isolate media storage behind a separate service with strict IAM policies
- Implement embedding pipeline that concatenates text/audio only
- Deploy linear reranking logic with documented coefficients
- Configure TypeScript strict mode and linting rules to prevent type bypass
- Set up embedding audit pipeline to detect latent proxy signals
- Establish network segmentation between ranking and media services
- Document constraint rationale in architecture decision records (ADRs)
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-compliance industry (finance, healthcare) | Type-level constraints + service isolation | Guarantees auditability, eliminates runtime bypass risk | High upfront, low long-term |
| Rapid prototyping / MVP | Feature flags + runtime guards | Faster iteration, easier A/B testing | Low upfront, high technical debt |
| Public-facing transparency requirement | Type-level constraints + open signature verification | Enables third-party verification, builds trust | Medium upfront, negligible maintenance |
| Multi-modal ranking (text + image + audio) | Separate pipelines with explicit fusion layer | Prevents accidental leakage, maintains modularity control | High complexity, requires careful orchestration |
Configuration Template
// tsconfig.strict.json
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "forceConsistentCasingInFileNames": true
  },
  "include": ["src/**/*.ts"],
  "exclude": ["node_modules", "dist"]
}
// src/infrastructure/service-boundaries.ts
export const SERVICE_ROUTES = {
  RANKING: {
    allowedTables: ['user_signals', 'embedding_cache', 'affinity_logs'],
    blockedTables: ['media_assets', 'user_avatars', 'visual_features'],
    networkPolicy: 'isolated-read-replica'
  },
  MEDIA: {
    allowedTables: ['media_assets', 'consent_records', 'access_logs'],
    blockedTables: ['user_signals', 'embedding_cache'],
    networkPolicy: 'private-subnet'
  }
} as const;
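The routing table above is only data until something consults it. As a sketch of how it might be used, the hypothetical helpers below apply a default-deny check (only tables on the allow list pass) and a sanity check that no table is listed as both allowed and blocked. The table is copied locally so the example is self-contained.

```typescript
// Local copy of the article's routing table.
const SERVICE_ROUTES = {
  RANKING: {
    allowedTables: ['user_signals', 'embedding_cache', 'affinity_logs'],
    blockedTables: ['media_assets', 'user_avatars', 'visual_features'],
    networkPolicy: 'isolated-read-replica',
  },
  MEDIA: {
    allowedTables: ['media_assets', 'consent_records', 'access_logs'],
    blockedTables: ['user_signals', 'embedding_cache'],
    networkPolicy: 'private-subnet',
  },
} as const;

type ServiceName = keyof typeof SERVICE_ROUTES;

// Default deny: a table passes only if it is explicitly on the allow list.
function isTableAllowed(service: ServiceName, table: string): boolean {
  const allowed: readonly string[] = SERVICE_ROUTES[service].allowedTables;
  return allowed.indexOf(table) !== -1;
}

// Returns "SERVICE:table" entries that appear in both lists for the same service.
function findContradictions(): string[] {
  const problems: string[] = [];
  const services: ServiceName[] = ['RANKING', 'MEDIA'];
  for (const service of services) {
    const allowed: readonly string[] = SERVICE_ROUTES[service].allowedTables;
    const blocked: readonly string[] = SERVICE_ROUTES[service].blockedTables;
    for (const table of allowed) {
      if (blocked.indexOf(table) !== -1) problems.push(service + ':' + table);
    }
  }
  return problems;
}
```

Under default deny, `blockedTables` becomes documentation: a newly created media table is rejected for the ranking service even if nobody remembers to block it. An application-level check like this is a second line of defense; the IAM and network policies remain the actual enforcement.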
Quick Start Guide
- Initialize the constrained type: Create a UserSignal interface that explicitly omits visual fields. Add it to your domain layer.
- Set up the embedding client: Configure a 1536-dim vectorizer. Ensure the pipeline only accepts concatenated text/audio documents.
- Deploy the ranking function: Implement computeAffinity with cosine similarity and linear rerankers. Add unit tests for edge cases.
- Enforce service boundaries: Configure IAM policies to block the ranking service from accessing media tables. Verify with integration tests.
- Validate the constraint: Run tsc --noEmit with strict mode. Confirm that any attempt to pass visual data into the ranking pipeline fails at compile time.
