
The Bot Left a Fingerprint: Detecting and Attributing LLM-Generated Passwords

By Codcompass Team · 77 min read

Statistical Fingerprinting of AI-Generated Secrets: Detection, Attribution, and Risk Mitigation

Current Situation Analysis

The integration of Large Language Models (LLMs) into development workflows has introduced a subtle but critical vulnerability: the generation of cryptographic secrets by probabilistic models. Developers and AI agents frequently request passwords, API keys, and connection strings from LLMs, operating under the assumption that the output is sufficiently random for security purposes. This assumption is fundamentally flawed.

LLMs are engineered to maximize the probability of the next token based on training data, which is the antithesis of cryptographic randomness. Secure password generation requires high entropy and uniform distribution; LLMs produce outputs biased toward common patterns, substrings, and structural repetitions found in their training corpora. This creates a class of "pseudo-random" secrets that appear complex to human inspection but are statistically predictable.
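
On the generation side the remedy is simple: sample from a CSPRNG over a uniform alphabet instead of prompting a model. A minimal sketch using Python's standard secrets module (the alphabet and the 24-character length are illustrative choices, not recommendations from the study):

```python
import secrets
import string

# 94 printable symbols; uniform, independent CSPRNG sampling yields
# log2(94) ~ 6.55 bits per character -- the property LLM sampling lacks.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 24) -> str:
    """Return a password with roughly length * 6.55 bits of entropy."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())  # no positional bias, no signature substrings
```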

Recent analysis by Irregular researchers, validated by GitGuardian's monitoring of 34 million passwords from GitHub repositories between November 2025 and March 2026, confirms the prevalence of this issue. The study identified approximately 28,000 passwords exhibiting strong statistical signatures of LLM generation, committed at a velocity of roughly 1,500 per week. These secrets are not confined to experimental code; they appear in production configuration files, .env files, and infrastructure-as-code templates.

The risk extends beyond predictability. When a developer asks an LLM to generate a password, the secret traverses the network to the provider's API, potentially entering logs or training data. Furthermore, AI coding agents have been observed autonomously generating and hardcoding these predictable secrets into Terraform configurations and source files, creating a supply chain risk where the secret is weak before it even reaches the target system.
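
This also suggests a cheap defensive control: scan repositories for hardcoded credentials carrying the statistical signatures reported below. A rough sketch; the signature substrings come from the findings table in the next section, while the file globs and the assignment regex are illustrative assumptions (short signatures like "L2" will also match benign strings, so treat hits as leads, not verdicts):

```python
import re
from pathlib import Path

# High-frequency substrings the analysis attributes to LLM generation.
SIGNATURE_SUBSTRINGS = ("7!", "8d", "x7#pL9", "L2")

# Loose pattern for secret-looking assignments in config and IaC files.
ASSIGNMENT_RE = re.compile(
    r'(?i)(password|secret|token|api_key)\s*[=:]\s*["\']([^"\']{8,})["\']'
)

def scan_file(path: Path) -> list[tuple[str, str]]:
    """Return (value, signature) pairs for candidate LLM-generated secrets."""
    text = path.read_text(errors="ignore")
    return [(value, sig)
            for _name, value in ASSIGNMENT_RE.findall(text)
            for sig in SIGNATURE_SUBSTRINGS
            if sig in value]

for pattern in ("*.tf", "*.env", "*.yml"):
    for path in Path(".").rglob(pattern):
        for value, sig in scan_file(path):
            print(f"{path}: secret matches known LLM signature {sig!r}")
```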

WOW Moment: Key Findings

The most critical insight from the analysis is that uniqueness does not equate to security. Some models generate 100% unique passwords across samples, yet those passwords share massive substring overlaps that drastically reduce effective entropy. Conversely, models with low uniqueness rates exhibit even more severe pattern locking.
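
That gap between uniqueness and effective entropy is directly measurable. A minimal sketch using an invented four-password corpus (not data from the study) to show how 100% unique strings can still share most of their structure:

```python
from collections import Counter

def ngrams(s: str, n: int = 3) -> list[str]:
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def shared_ngram_rate(passwords: list[str], n: int = 3) -> float:
    """Fraction of n-gram occurrences duplicated across the corpus.
    High values mean low effective entropy even if every string is unique."""
    counts = Counter(g for pw in passwords for g in ngrams(pw, n))
    total = sum(counts.values())
    return sum(c for c in counts.values() if c > 1) / total if total else 0.0

# Four distinct strings (100% uniqueness) that are nearly identical.
sample = ["K7!mPx2vQ9rT", "K7!mPx2vQ9rU", "K7!nPx2vQ9rT", "K7!mPy2vQ9rT"]
assert len(set(sample)) == len(sample)
print(f"shared 3-gram rate: {shared_ngram_rate(sample):.0%}")  # prints 82%
```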

The following table highlights the statistical fingerprints observed across major model families based on a corpus of 8,000 generated passwords (200 samples per model across 40 models):

| Model Family | Uniqueness Rate | Signature Substring | Occurrence Frequency | Position Bias |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 35% | N/A | N/A | 100% lowercase at index 0 |
| Llama 3.3 70B | ~55% | 8d | 100% | 99–100% uppercase at index 0 |
| GPT-5 Family | 100% | 7! | 52% | 92% uppercase at index 0 |
| Mistral Medium 3.1 | ~55% | x7#pL9 | 65% | Variable |
| Cross-Provider | N/A | L2 | 27% avg | N/A |

Why this matters:

  • The Uniqueness Trap: The GPT-5 family generates unique passwords, yet the bigram 7! appears in over half of all outputs. An attacker aware of this bias can reduce the search space by orders of magnitude.
  • Universal Fingerprints: The L2 bigram appears in passwords from 10 out of 11 providers with an average probability of 27%. This suggests a shared training bias or prompt-response pattern that transcends individual model architectures.
  • Structural Locking: Models exhibit rigid positional biases. Claude Opus always starts with a lowercase letter, while Llama models almost always start with uppercase. This violates the uniform distribution required for secure credentials; the sketch after this list shows how these biases are measured.
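
All three fingerprints in the table are cheap to reproduce. Below is a minimal sketch of the measurements (uniqueness rate, bigram prevalence, first-character case bias); the function name and the choice of top-5 bigrams are our illustrative assumptions, not the study's methodology:

```python
from collections import Counter

def fingerprint(passwords: list[str], top_k: int = 5) -> dict:
    """Uniqueness rate, most prevalent bigrams, and first-character case bias."""
    n = len(passwords)
    # Count every bigram occurrence across the corpus.
    bigram_counts = Counter(pw[i:i + 2] for pw in passwords
                            for i in range(len(pw) - 1))
    # Case of the first character: the positional bias described above.
    first_case = Counter(
        "upper" if pw[0].isupper() else "lower" if pw[0].islower() else "other"
        for pw in passwords if pw
    )
    return {
        "uniqueness_rate": len(set(passwords)) / n,
        # Fraction of passwords containing each top bigram, comparable to
        # the 52% prevalence of "7!" reported for the GPT-5 family.
        "bigram_prevalence": {
            bg: sum(bg in pw for pw in passwords) / n
            for bg, _ in bigram_counts.most_common(top_k)
        },
        "first_char_case": {case: c / n for case, c in first_case.items()},
    }
```

For scale: in a uniform 94-symbol, 24-character password, any specific bigram appears with probability of roughly 23/94² ≈ 0.26%, so prevalence figures like 52% or 100% are unambiguous fingerprints rather than sampling noise.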

In the wild, Anthropic-generated passwords were the most fr
