Agent Skills Has No Integrity Layer. We Built One.

By Codcompass Team · 2026-05-09 · 5 min read

Current Situation Analysis

The Agent Skills specification defines six frontmatter fields for SKILL.md: name, description, license, compatibility, metadata, and allowed-tools. None of these fields are cryptographic. There is no content hash, no digital signature, and no mechanism to verify whether the bytes received by an agent match the bytes originally published by the author.

This architectural gap stems from a deliberate prioritization of interoperability over integrity. The format successfully achieved cross-runtime compatibility across 35+ agent environments (Claude Code, Cursor, Codex CLI, Gemini CLI, GitHub Copilot, etc.). However, deferring the integrity layer introduces critical failure modes:

Self-Declared Identity Failure: The metadata field is a free-form key-value map. metadata.author can be set to any arbitrary string (e.g., metadata.author: anthropic) by any publisher, making it useless under adversarial conditions.
Registry Tampering Vulnerability: Without a canonical content hash, a registry operator or a man-in-the-middle attacker can modify a skill between publication and installation. The consuming agent has no visibility into post-publication mutations.
Compressed Supply-Chain Attack Timeline: Package ecosystems have historically faced supply-chain attacks within years of launch (npm was about 8 years old at the time of the event-stream incident; PyPI's timeline was shorter). Agent Skills has been live for only six months across three major registries (ClawHub: 3.2K, Skills.sh: 89K, askill.sh: 275K), which leaves a high-risk window for exploitation before integrity controls are adopted.

WOW Moment: Key Findings

Experimental validation of the Skill Provenance Attestation (SPA) layer demonstrates immediate tamper detection and cryptographic identity binding with minimal runtime overhead. The following comparison highlights the security posture shift from native SKILL.md to SPA-enhanced workflows:

Approach	Tamper Detection	Identity Verification	Supply Chain Risk	Verification Overhead
Native SKILL.md (v0.2)	None (0%)	Self-declared (Untrusted)	High (Blind Trust)	~0ms
SPA-Enhanced	100% (Deterministic SHA-256)	Ed25519/JWKS (Cryptographic)	Mitigated (Zero-Trust)	~15-30ms

Key Findings:

Instant Mutation Detection: Appending a single byte to README.md triggers an immediate digest mismatch, even though the Ed25519 signature still verifies. The signature covers the token and its skill_digest claim, not the files on disk, so the digest comparison is a separate check that signature validation alone cannot replace.
Backward Compatibility: Tools lacking SPA awareness gracefully ignore the attestation layer (metadata extension or SKILL.sig sidecar) without breaking execution.
Deterministic Reproducibility: Sorting relative paths lexicographically and separating fields with a null byte is designed to produce the same digest for the same content on every OS and runtime.

Core Solution

Skill Provenance Attestation (SPA) is designed as an additive, non-forking integrity layer. It operates either embedded within the existing metadata field or as a sidecar file (SKILL.sig). The architecture decouples provenance verification from runtime execution, allowing consumer policies to enforce trust boundaries without modifying the core spec.

Technical Implementation

1. Skill Digest Algorithm

A deterministic SHA-256 hash is computed over the entire skill directory. The algorithm enforces strict ordering and exclusion rules to guarantee reproducibility:

Files are sorted lexicographically by relative path.
Digest input format per file: relpath + null byte + sha256(file_content) + newline
Final output: "sha256-" + base64url(sha256(digest_input))
Exclusions: SKILL.sig and top-level dotfiles are excluded. All other files, including scripts/ (executable code), are covered.
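
The rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the reference implementation: the names SkillFile, computeSkillDigest, and readSkillDir are hypothetical, the per-file hash is assumed to be hex-encoded (as in the demo output), and paths are assumed to be normalized to "/" separators. The article does not say how top-level dot-directories are handled, so the sketch leaves them in.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

export interface SkillFile {
  relPath: string; // relative to the skill root, always "/"-separated
  content: Uint8Array;
}

// Exclusion rule from the text: SKILL.sig and top-level dotfiles.
// Nested dotfiles and dot-directories are NOT excluded in this sketch.
function isExcluded(relPath: string): boolean {
  const topLevel = !relPath.includes("/");
  return topLevel && (relPath === "SKILL.sig" || relPath.startsWith("."));
}

export function computeSkillDigest(files: SkillFile[]): string {
  const included = files
    .filter((f) => !isExcluded(f.relPath))
    // Plain string comparison orders by UTF-16 code unit, independent of locale.
    .sort((a, b) => (a.relPath < b.relPath ? -1 : a.relPath > b.relPath ? 1 : 0));

  const outer = createHash("sha256");
  for (const file of included) {
    // Assumption: the per-file hash is hex-encoded.
    const fileHash = createHash("sha256").update(file.content).digest("hex");
    // relpath + null byte + sha256(file_content) + newline
    outer.update(`${file.relPath}\0${fileHash}\n`, "utf8");
  }
  return "sha256-" + outer.digest("base64url");
}

// Thin filesystem wrapper: walks the directory and builds "/"-separated paths.
export function readSkillDir(root: string, rel = ""): SkillFile[] {
  const out: SkillFile[] = [];
  for (const entry of readdirSync(join(root, rel), { withFileTypes: true })) {
    const relPath = rel === "" ? entry.name : `${rel}/${entry.name}`;
    if (entry.isDirectory()) out.push(...readSkillDir(root, relPath));
    else if (entry.isFile()) out.push({ relPath, content: readFileSync(join(root, relPath)) });
  }
  return out;
}
```

Calling computeSkillDigest(readSkillDir("agentlair-email-skill")) would yield a string of the same shape as the skill_digest shown in the demo. Whether it matches byte for byte depends on the encoding assumptions noted above. Keeping the hashing separate from the directory walk makes the ordering and exclusion rules testable without touching the filesystem.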

2. SPA Token Structure

The attestation is packaged as a JWT signed with the publisher's Ed25519 key and verifiable via JWKS. Critical design decisions include:

typ header set to spa+jwt to prevent cross-use with session or authentication tokens.
Claims include skill_digest, skill_name, skill_version, publisher identity (handle, display name, verified domain), and revocation_url.
Leverages existing JWKS infrastructure (AgentLair) for unified key management.
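
To make the token shape concrete, here is a hedged sketch of issuing an SPA with Node's built-in crypto module. The claim names skill_digest, skill_name, skill_version, and revocation_url come from the list above. Everything else is an assumption made for illustration: the iss and iat claims, the publisher sub-field names, the kid header, and the issueSpa helper. "EdDSA" is the standard JOSE algorithm identifier for Ed25519 signatures.

```typescript
import { generateKeyPairSync, sign, KeyObject } from "node:crypto";

export interface SpaClaims {
  iss: string; // assumed: tells a verifier where to look up the JWKS
  iat: number; // assumed: issue time, seconds since the epoch
  skill_digest: string;
  skill_name: string;
  skill_version: string;
  // Hypothetical field names for the publisher identity described above.
  publisher: { handle: string; display_name: string; verified_domain: string };
  revocation_url: string;
}

const encodeJson = (value: object): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

export function issueSpa(claims: SpaClaims, privateKey: KeyObject, kid: string): string {
  // typ: spa+jwt keeps the token from being accepted as a session or auth JWT.
  const header = { alg: "EdDSA", typ: "spa+jwt", kid };
  const signingInput = `${encodeJson(header)}.${encodeJson(claims)}`;
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, Buffer.from(signingInput, "ascii"), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Demo with a throwaway key pair and placeholder values.
const { privateKey } = generateKeyPairSync("ed25519");
const demoToken = issueSpa(
  {
    iss: "https://issuer.example",
    iat: Math.floor(Date.now() / 1000),
    skill_digest: "sha256-NDOawr5cQVVfoE4cvxxhUxAjI9fGh3YXNKboNAQu4QA",
    skill_name: "example-skill",
    skill_version: "1.0.0",
    publisher: { handle: "example", display_name: "Example Publisher", verified_domain: "publisher.example" },
    revocation_url: "https://issuer.example/revocations/demo",
  },
  privateKey,
  "demo-key-1",
);
console.log(demoToken);
```

Because skill_digest sits inside the signed payload, the signature binds the publisher identity to one exact directory state without signing the files themselves.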

3. Verification Pipeline (6 Steps)

Compute local digest using the deterministic algorithm.
Locate SPA payload (sidecar file or frontmatter).
Decode JWT header and validate typ: spa+jwt.
Fetch JWKS for the token issuer.
Verify Ed25519 signature against the payload.
Compare computed digest with skill_digest claim. Mismatch triggers immediate rejection.
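
The steps above can be sketched as a single offline function. This is an illustration under stated assumptions rather than the reference verifier: verifySpa, SpaResult, and mintDemoToken are hypothetical names, key lookup by a kid header is assumed, and the extra alg check is an added precaution the article does not mention. Computing the local digest and fetching the JWKS (steps 1 and 4) are left to the caller so the function needs no network or filesystem access.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";

type Jwk = { kty: string; crv?: string; x?: string; kid?: string };
type Json = Record<string, unknown>;
export type SpaResult = { ok: true; claims: Json } | { ok: false; reason: string };

function decodePart(part: string): Json {
  const value: unknown = JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
  if (typeof value !== "object" || value === null) throw new Error("not a JSON object");
  return value as Json;
}

// Covers steps 3, 5 and 6. The caller supplies the locally computed digest (step 1),
// the token text (step 2) and the already-fetched JWKS (step 4).
export function verifySpa(token: string, computedDigest: string, jwks: { keys: Jwk[] }): SpaResult {
  const [h, p, s, ...rest] = token.split(".");
  if (!h || !p || !s || rest.length > 0) return { ok: false, reason: "malformed token" };
  try {
    const header = decodePart(h);
    // Step 3: reject anything that is not an SPA token (alg check is an added precaution).
    if (header.typ !== "spa+jwt" || header.alg !== "EdDSA") {
      return { ok: false, reason: "unexpected typ or alg" };
    }
    const jwk = jwks.keys.find((k) => k.kid === header.kid && k.kty === "OKP" && k.crv === "Ed25519");
    if (!jwk) return { ok: false, reason: "no matching key" };
    // Step 5: verify the Ed25519 signature over "header.payload".
    const publicKey = createPublicKey({ key: { kty: jwk.kty, crv: jwk.crv, x: jwk.x }, format: "jwk" });
    const signed = Buffer.from(`${h}.${p}`, "ascii");
    if (!verify(null, signed, publicKey, Buffer.from(s, "base64url"))) {
      return { ok: false, reason: "signature invalid" };
    }
    // Step 6: the signed claim must equal the digest computed from local files.
    const claims = decodePart(p);
    if (claims.skill_digest !== computedDigest) return { ok: false, reason: "digest mismatch" };
    return { ok: true, claims };
  } catch {
    return { ok: false, reason: "malformed token" };
  }
}

// Demo with a throwaway key pair; a real verifier would fetch the JWKS from the issuer.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const demoJwks = { keys: [{ ...(publicKey.export({ format: "jwk" }) as Jwk), kid: "demo-key-1" }] };

function mintDemoToken(header: object, claims: object): string {
  const enc = (v: object) => Buffer.from(JSON.stringify(v), "utf8").toString("base64url");
  const input = `${enc(header)}.${enc(claims)}`;
  return `${input}.${sign(null, Buffer.from(input, "ascii"), privateKey).toString("base64url")}`;
}

const spaHeader = { alg: "EdDSA", typ: "spa+jwt", kid: "demo-key-1" };
const demoToken = mintDemoToken(spaHeader, { skill_digest: "sha256-signed-digest" });
console.log(verifySpa(demoToken, "sha256-signed-digest", demoJwks));
console.log(verifySpa(demoToken, "sha256-other-digest", demoJwks));
```

The second call models a skill whose files changed after signing: the signature is still valid for the token, and only the final comparison rejects it. Verifying the signature before reading skill_digest means the claim is never trusted until the key check has passed.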

Demo Output

We signed AgentLair's own email skill using the reference implementation. This is the actual output:

$ bun demo/compute-digest.ts agentlair-email-skill/

Files included (sorted by relpath):
  README.md → sha256:f3e27686cac980974de885c0077f31d588d48b263cf1c75715cc5f6c348d698e
  SKILL.md  → sha256:95c3b33cde228b13b698e400d276b2d849f872fd8c66ce3894ac42a7115ea4a0

skill_digest: sha256-NDOawr5cQVVfoE4cvxxhUxAjI9fGh3YXNKboNAQu4QA

Verification passes end-to-end:

✓ TEST VERIFIED by Pico (test demo) (amdal.dev) via https://agentlair.dev.

Then we appended one byte to README.md and ran the verifier again. The Ed25519 signature still verified. The key and signing input did not change. The digest check caught it:

✗ digest     MISMATCH
           expected: sha256-NDOawr5cQVVfoE4cvxxhUxAjI9fGh3YXNKboNAQu4QA
           computed: sha256-NoaYktLqpnTV76pL9eksd6Is7yZCs-hUbSrchIPYiQY

The verifier exits with code 1 and logs: "This skill was modified after the publisher signed it. Treat as unverified."

A skill with metadata.author: anthropic and no SKILL.sig surfaces as: "Unverified skill. No provenance attestation found. metadata.author is self-declared only." The consumer policy decides whether to block. The signal is now visible where today it is not.
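
One way a consumer might turn these three signals into an install decision is sketched below. The type and function names (Provenance, UnverifiedPolicy, decideInstall) are hypothetical and not part of SPA; only the two quoted messages come from the verifier output described above.

```typescript
export type Provenance =
  | { status: "verified"; publisher: string }
  | { status: "tampered"; expected: string; computed: string }
  | { status: "unverified"; declaredAuthor?: string };

// Hypothetical consumer setting: what to do when no attestation is present.
export type UnverifiedPolicy = "block" | "warn";

export interface Decision {
  install: boolean;
  message: string;
}

export function decideInstall(p: Provenance, unverified: UnverifiedPolicy): Decision {
  switch (p.status) {
    case "verified":
      // Provenance only: safety checks such as sandboxing still apply afterwards.
      return { install: true, message: `Verified publisher: ${p.publisher}` };
    case "tampered":
      // A digest mismatch is rejected regardless of the policy setting.
      return {
        install: false,
        message: "This skill was modified after the publisher signed it. Treat as unverified.",
      };
    case "unverified":
      // metadata.author is never used as evidence; the policy alone decides.
      return {
        install: unverified === "warn",
        message:
          "Unverified skill. No provenance attestation found. metadata.author is self-declared only.",
      };
  }
}

console.log(decideInstall({ status: "unverified", declaredAuthor: "anthropic" }, "block"));
```

The declaredAuthor value is carried only so it can be displayed. It never influences the decision, which is the point of treating metadata.author as self-declared.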

Scope & Boundaries

SPA explicitly does not solve two categories of risk:

Malicious-but-Verified Skills: Cryptographic verification proves origin and immutability, not safety. A signed skill can still be intentionally harmful. Safety enforcement remains a consumer policy/sandboxing responsibility.
Key Compromise: Like all PKI systems, stolen signing keys allow valid SPA issuance until revocation. Every SPA includes a revocation_url; consumers must check revocation status on install. Compromise detection operates out-of-band.
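
A revocation check on install could look roughly like the sketch below. The article does not describe what the revocation_url endpoint returns, so the wire format here (HTTP 200 with a JSON body of the form { "revoked": boolean }) is purely hypothetical, as are the names interpretRevocation and mayInstall. Failing closed is one possible policy choice, not something SPA is stated to mandate.

```typescript
export type RevocationStatus = "active" | "revoked" | "unknown";

export interface HttpResult {
  status: number;
  body: unknown;
}

// HYPOTHETICAL wire format: HTTP 200 with a JSON body { "revoked": boolean }.
// Anything else is reported as "unknown" rather than guessed at.
export function interpretRevocation(res: HttpResult): RevocationStatus {
  if (res.status !== 200 || typeof res.body !== "object" || res.body === null) return "unknown";
  const revoked = (res.body as Record<string, unknown>).revoked;
  if (revoked === true) return "revoked";
  if (revoked === false) return "active";
  return "unknown";
}

// Fail closed: only an explicit "active" answer lets the install proceed.
// The HTTP call is injected so the decision logic can be exercised without a network.
export async function mayInstall(
  revocationUrl: string,
  get: (url: string) => Promise<HttpResult>,
): Promise<boolean> {
  try {
    return interpretRevocation(await get(revocationUrl)) === "active";
  } catch {
    return false; // a network failure is treated like an unknown status
  }
}

console.log(interpretRevocation({ status: 200, body: { revoked: false } }));
console.log(interpretRevocation({ status: 503, body: null }));
```

In a real loader, the get parameter would wrap whatever HTTP client the runtime provides. A consumer that prefers availability over strictness could treat "unknown" as a warning instead of a block.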

Pitfall Guide

Relying on Self-Declared Metadata: metadata.author and similar fields are trivially spoofable. Always enforce cryptographic attestation (SKILL.sig or verified metadata claims) before trusting publisher identity.
Ignoring Revocation Checks: SPA tokens carry a revocation_url. Failing to query this endpoint on install leaves agents exposed to tokens signed with compromised or retired keys. Implement automated revocation validation in your registry or runtime loader.
Cross-Use of JWT Types: SPA tokens use typ: spa+jwt. If your system also handles session or auth JWTs, strictly validate the typ header during decoding to prevent token confusion attacks.
Incomplete Digest Coverage: Excluding executable directories (e.g., scripts/, bin/) from the digest calculation creates blind spots for post-signing payload injection. The algorithm must cover all content and code files, excluding only SKILL.sig and top-level dotfiles.
Confusing Provenance with Safety: A valid signature only answers "did this content come from this account?" and "has it been modified?". It does not validate behavioral safety. Always pair SPA verification with runtime sandboxing, capability restrictions, and consumer policy engines.
Assuming Immutable Signing Keys: PKI limitations apply. Publishers must monitor key usage, rotate credentials proactively, and publish revocation notices immediately upon suspected compromise. Consumers should treat long-lived, unrevoked tokens from inactive publishers with elevated scrutiny.

Deliverables

📘 SPA Architecture Blueprint: Complete reference design including deterministic digest algorithm, JWT claim schema, JWKS integration flow, and 6-step verification state machine.
✅ Pre-Install Verification Checklist: Step-by-step validation protocol for registry operators and agent runtimes (digest computation → JWT decode → JWKS fetch → signature verify → revocation check → policy enforcement).
⚙️ Configuration & Integration Templates: Ready-to-use SKILL.sig sidecar structure, TypeScript digest computation snippet (30 lines), full verifier implementation (150 lines), and CLI usage examples for local validation.
📦 Reference Implementation & Spec: Full v0.2 specification, worked examples with real hash/JWT payloads, and issuance/revocation endpoint blueprints available in the agent-infra repository.