rting stdio, Streamable HTTP, and legacy SSE transports.
Implementation Strategy
- Define Semantic Test Inputs: Auto-generated inputs based on schema minimums are insufficient for meaningful validation. Use sidecar configuration files to define realistic inputs and expected outcomes for critical tools.
- Execute Dry-Runs: Run the probe with tool execution enabled to verify that tools can be called successfully without side effects.
- Classify Diagnostic Output: Distinguish between harmless startup warnings and fatal initialization errors by configuring stderr classification rules.
- Integrate with CI: Generate structured outputs for CI platforms, including step summaries and annotations, to provide immediate feedback to developers.
Configuration and Execution
Create a sidecar configuration file to define test scenarios. This file maps tool names to specific inputs and validation expectations.
probe-suite.json
{
  "tools": {
    "search_knowledge_base": {
      "input": {
        "query": "authentication error handling",
        "limit": 5,
        "include_snippets": true
      },
      "expect": {
        "not_error_code": [401, 403, 500],
        "min_results": 1
      }
    },
    "update_record": {
      "input": {
        "record_id": "test_record_99",
        "payload": { "status": "verified" }
      },
      "expect": {
        "not_error_code": [400, 404, 409]
      }
    }
  }
}
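To make the intent of the expect block concrete, here is a minimal Python sketch of how such expectations could be evaluated against a tool result. It is not mcp-probe's implementation: the result shape (an optional error_code field and a results list) and the exact semantics of not_error_code and min_results are assumptions made for illustration.

```python
from typing import Any


def check_expectations(result: dict[str, Any], expect: dict[str, Any]) -> list[str]:
    """Return human-readable failures; an empty list means the result passed."""
    failures = []

    # Assumed semantics: fail if the tool reported one of the listed codes.
    forbidden = expect.get("not_error_code", [])
    code = result.get("error_code")  # hypothetical field name
    if code is not None and code in forbidden:
        failures.append(f"error code {code} is in the forbidden list {forbidden}")

    # Assumed semantics: the tool must return at least this many items.
    min_results = expect.get("min_results")
    if min_results is not None:
        count = len(result.get("results", []))  # hypothetical field name
        if count < min_results:
            failures.append(f"expected at least {min_results} result(s), got {count}")

    return failures


expect = {"not_error_code": [401, 403, 500], "min_results": 1}
ok = check_expectations({"results": [{"title": "Handling 401 responses"}]}, expect)
bad = check_expectations({"error_code": 403, "results": []}, expect)
print(ok)   # no failures
print(bad)  # one failure per violated expectation
```

Returning a list of failures rather than a boolean keeps every violated expectation visible in a single CI run.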
Run the probe against the target server. The command below executes the handshake, enumerates tools, and performs dry-runs using the sidecar configuration.
npx @k08200/mcp-probe@latest @scope/knowledge-mcp \
  --probe-tools \
  --tools-file probe-suite.json \
  --stderr-allow "deprecated" \
  --stderr-fatal "missing required api key"
For servers exposed over HTTP, pass authentication headers directly:
npx @k08200/mcp-probe@latest https://api.example.com/mcp \
  --header "Authorization: Bearer ${CI_MCP_TOKEN}" \
  --probe-tools \
  --tools-file probe-suite.json
Architecture Decisions
- Sidecar Inputs vs. Auto-Generation: Sidecar files are mandatory for production validation. Auto-generated inputs often use empty strings or default values that bypass business logic checks. Sidecars allow teams to inject context-aware data, ensuring that tools are tested against realistic scenarios.
- Stderr Classification: MCP servers often emit non-fatal warnings during initialization. Without classification, CI pipelines may fail on benign output. Explicit rules (--stderr-allow, --stderr-fatal) ensure that only genuine errors block deployment.
- Batch Configuration: Teams managing multiple MCP servers benefit from a unified validation gate. A batch configuration file allows a single command to validate an entire fleet, reducing CI complexity and ensuring consistent standards across services.
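The stderr classification decision can be illustrated with a small Python sketch. The matching rule used here (case-insensitive substring match, with fatal patterns taking precedence) is an assumption for illustration; the probe's actual matching semantics may differ.

```python
def classify_stderr(line: str, allow: list[str], fatal: list[str]) -> str:
    """Classify one stderr line as 'fatal', 'allowed', or 'unclassified'."""
    lowered = line.lower()
    # Fatal patterns are checked first so a line matching both lists still blocks.
    if any(pattern.lower() in lowered for pattern in fatal):
        return "fatal"
    if any(pattern.lower() in lowered for pattern in allow):
        return "allowed"
    return "unclassified"


allow = ["deprecated"]
fatal = ["missing required api key"]

stderr_lines = [
    "(node) DeprecationWarning: punycode is deprecated",
    "Error: Missing required API key for upstream service",
    "listening on stdio",
]
labels = [classify_stderr(line, allow, fatal) for line in stderr_lines]
for line, label in zip(stderr_lines, labels):
    print(f"{label:12} {line}")
```

Keeping an explicit "unclassified" bucket makes new, unknown messages visible so the rules can be reviewed instead of silently passing or failing.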
Pitfall Guide
Production deployments of MCP servers reveal common validation pitfalls. Addressing these issues requires disciplined configuration and awareness of runtime behavior.
- Relying on Schema-Minimum Inputs
  - Explanation: Using auto-generated inputs that satisfy schema requirements but lack semantic meaning. This can result in tools returning empty results or bypassing validation logic, masking functional defects.
  - Fix: Always use sidecar files with realistic inputs that exercise the tool's business logic. Define expectations that verify meaningful output.
- Ignoring Stderr Noise
  - Explanation: Treating all stderr output as fatal errors. Many servers log deprecation warnings or debug information that does not affect functionality. This leads to false positives in CI.
  - Fix: Configure --stderr-allow patterns for known benign messages and --stderr-fatal for critical errors. Review stderr logs periodically to update classification rules.
- Hardcoding Secrets in Probes
  - Explanation: Embedding API keys or tokens directly in sidecar files or command lines. This exposes credentials in version control and CI logs.
  - Fix: Use environment variables or CI secret management systems. Reference secrets via ${ENV_VAR} syntax in configurations and ensure they are injected securely at runtime.
- Assuming Stdio-Only Transport
  - Explanation: Validating only stdio-based servers while neglecting HTTP or SSE transports. This leaves remote or containerized servers untested.
  - Fix: Ensure probe configurations cover all transport types used in production. Test HTTP servers with appropriate headers and SSE servers with connection validation.
- Lack of Fleet Management
  - Explanation: Running individual probes for each server in a large ecosystem. This creates fragmented CI pipelines and inconsistent validation standards.
  - Fix: Use batch configuration (--config) to validate multiple servers in a single run. This centralizes validation logic and simplifies CI orchestration.
- Timeout Misconfiguration
  - Explanation: Setting aggressive timeouts that cause false failures on slow networks or under load. Conversely, overly permissive timeouts can mask performance regressions.
  - Fix: Tune timeout values based on baseline latency metrics. Use the probe's latency reporting to establish thresholds and detect performance degradation over time.
- Schema Drift Blindness
  - Explanation: Tools changing their input/output schemas without updating probe configurations. This can lead to probes passing despite schema incompatibilities.
  - Fix: Version probe configurations alongside server code. Implement schema validation checks in the probe to detect structural changes and alert teams to required updates.
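One way to catch the schema drift described in the last pitfall is to store a fingerprint of each tool's input schema next to the probe configuration and compare it on every run. The sketch below is a standalone illustration, not a built-in probe feature; the function names are hypothetical, and it assumes the current schemas are available as plain dictionaries (for example, the inputSchema objects from a tools/list response).

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Hash a JSON schema in canonical form so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(recorded: dict[str, str], current_tools: dict[str, dict]) -> dict[str, str]:
    """Map each drifted tool name to 'added', 'removed', or 'changed'."""
    drift = {}
    for name, schema in current_tools.items():
        if name not in recorded:
            drift[name] = "added"
        elif recorded[name] != schema_fingerprint(schema):
            drift[name] = "changed"
    for name in recorded:
        if name not in current_tools:
            drift[name] = "removed"
    return drift


# Fingerprints captured when the probe configuration was last reviewed.
old_tools = {
    "search_knowledge_base": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["query"],
    },
    "update_record": {
        "type": "object",
        "properties": {"record_id": {"type": "string"}, "payload": {"type": "object"}},
        "required": ["record_id", "payload"],
    },
}
recorded = {name: schema_fingerprint(schema) for name, schema in old_tools.items()}

# The server now requires "limit" on search_knowledge_base.
current_tools = {
    "search_knowledge_base": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["query", "limit"],
    },
    "update_record": old_tools["update_record"],
}
drift = detect_drift(recorded, current_tools)
print(drift)
```

A non-empty drift report is a signal to review the sidecar inputs for the affected tools before trusting a passing probe.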
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single Server Validation | CLI with sidecar file | Simple setup, direct feedback for individual services. | Low |
| Multi-Server Fleet | Batch configuration (--config) | Centralized management, consistent standards, reduced CI overhead. | Medium |
| Remote HTTP Server | URL target with --header | Validates authentication and network accessibility. | Low |
| High-Security Environment | Sidecar with env var secrets | Ensures credentials are not exposed in logs or repos. | Low |
| Performance-Sensitive Deployment | Probe with latency monitoring | Detects regressions and ensures SLA compliance. | Low |
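For the performance-sensitive row, and for the timeout pitfall discussed earlier, a threshold can be derived from recorded latencies instead of being guessed. The following Python sketch assumes latency samples in milliseconds have already been collected from previous probe runs; the helper names, the nearest-rank percentile method, the headroom factor, and the tolerance are all illustrative choices rather than probe defaults.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


def suggest_timeout_ms(samples: list[float], pct: int = 95, headroom: float = 2.0) -> int:
    """Suggest a timeout: the chosen percentile multiplied by a headroom factor."""
    return math.ceil(nearest_rank_percentile(samples, pct) * headroom)


def is_regression(current: list[float], baseline: list[float],
                  pct: int = 95, tolerance: float = 0.25) -> bool:
    """Flag a regression when the current percentile exceeds baseline by > tolerance."""
    limit = nearest_rank_percentile(baseline, pct) * (1 + tolerance)
    return nearest_rank_percentile(current, pct) > limit


baseline = [120, 135, 128, 410, 140, 132, 125, 138, 131, 129]
timeout_ms = suggest_timeout_ms(baseline)
print(timeout_ms)
print(is_regression([2 * x for x in baseline], baseline))
```

With only a handful of samples a single outlier dominates the upper percentiles, so a longer baseline window gives a more stable threshold.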
Configuration Template
Use this template to set up batch validation for a fleet of MCP servers. This configuration supports mixed transports and centralized secret management.
mcp-fleet.config.json
{
  "servers": [
    {
      "name": "database-connector",
      "command": "npx @scope/db-mcp-server",
      "probe_tools": true,
      "tools_file": "db-probe.json",
      "stderr_fatal": ["connection refused"]
    },
    {
      "name": "api-gateway",
      "url": "https://api.internal.com/mcp",
      "headers": {
        "Authorization": "Bearer ${GATEWAY_TOKEN}"
      },
      "probe_tools": true,
      "tools_file": "api-probe.json",
      "timeout_ms": 5000
    }
  ],
  "github_summary": true,
  "badge_output": "fleet-status.json"
}
Execute the fleet validation:
npx @k08200/mcp-probe@latest --config mcp-fleet.config.json --github-summary
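Because the template references secrets through ${...} placeholders, a pre-flight check in CI can fail fast when a variable is not set. The Python sketch below is a hypothetical helper, independent of the probe itself: it scans a parsed configuration for placeholders and reports those missing from the environment. How the probe expands placeholders internally is not assumed here.

```python
import json
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_placeholders(node) -> set[str]:
    """Recursively collect placeholder names from every string in a JSON value."""
    names: set[str] = set()
    if isinstance(node, str):
        names.update(PLACEHOLDER.findall(node))
    elif isinstance(node, dict):
        for value in node.values():
            names |= find_placeholders(value)
    elif isinstance(node, list):
        for item in node:
            names |= find_placeholders(item)
    return names


def missing_variables(config, env) -> list[str]:
    """Return referenced variable names that are unset or empty in env."""
    return sorted(name for name in find_placeholders(config) if not env.get(name))


config = json.loads("""
{
  "servers": [
    {
      "name": "api-gateway",
      "url": "https://api.internal.com/mcp",
      "headers": { "Authorization": "Bearer ${GATEWAY_TOKEN}" }
    }
  ]
}
""")

missing = missing_variables(config, {})
print(missing)
```

In a pipeline, pass os.environ as env and exit with a non-zero status when the returned list is not empty.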
Quick Start Guide
- Define Test Scenarios: Create a JSON sidecar file mapping tool names to realistic inputs and expected outcomes.
- Run Initial Probe: Execute npx @k08200/mcp-probe@latest <server> --probe-tools --tools-file <config> to validate tool execution.
- Classify Output: Add --stderr-allow and --stderr-fatal flags to filter diagnostic messages based on server behavior.
- Integrate CI: Add the probe command to your CI pipeline with --github-summary to generate actionable feedback in pull requests.
- Monitor Results: Review step summaries and annotations to identify failures, then iterate on probe configurations to improve coverage.
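The steps above can be wrapped in a small CI helper that assembles the probe command from settings. This Python sketch only uses the flags shown in this guide; the function name and its parameters are hypothetical, and it builds the argument list without running it so the result can be inspected or logged first.

```python
import shlex
from typing import Optional


def build_probe_command(
    target: str,
    tools_file: str,
    stderr_allow: Optional[str] = None,
    stderr_fatal: Optional[str] = None,
    github_summary: bool = False,
) -> list[str]:
    """Assemble the mcp-probe invocation as an argument list (no shell involved)."""
    cmd = [
        "npx", "@k08200/mcp-probe@latest", target,
        "--probe-tools",
        "--tools-file", tools_file,
    ]
    if stderr_allow is not None:
        cmd += ["--stderr-allow", stderr_allow]
    if stderr_fatal is not None:
        cmd += ["--stderr-fatal", stderr_fatal]
    if github_summary:
        cmd.append("--github-summary")
    return cmd


cmd = build_probe_command(
    "@scope/knowledge-mcp",
    "probe-suite.json",
    stderr_allow="deprecated",
    stderr_fatal="missing required api key",
    github_summary=True,
)
# shlex.join renders a copy-pasteable form for CI logs.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it and fail the CI step on a non-zero exit code, assuming the probe signals failure that way.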