From Scrapers to MCP Server: Serving Korean Entertainment Data to AI Agents
Architecting Production-Ready MCP Servers: Unified Data Strategies for Agentic AI
Current Situation Analysis
The rapid adoption of Model Context Protocol (MCP) has created a disconnect between data availability and agent utility. While vast amounts of structured data exist, AI agents struggle to leverage it effectively when exposed through traditional API patterns. The industry pain point is not a lack of data; it is the fragmentation of information and the misalignment between raw data schemas and agentic reasoning workflows.
Exposing database columns directly as MCP tool parameters is a common anti-pattern. This approach forces the LLM to perform complex joins and data synthesis, increasing token consumption, latency, and hallucination risk. Agents require opinionated tools that encapsulate business logic and return structured, context-rich responses.
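The contrast can be sketched with two hypothetical tool shapes. Everything below — the in-memory tables, the scores, and the `compare_reception` helper — is illustrative, not part of any real schema:

```python
# Hypothetical contrast: "raw" tools that mirror database rows versus
# one opinionated tool that encapsulates the join. All data is stubbed.

RATINGS = {"drama-001": {"naver": 8.9, "rotten_tomatoes": 61.0}}
TITLES = {"drama-001": {"title": "Example Drama", "year": 2023}}

# Anti-pattern: the agent must call several narrow tools and join results itself.
def get_rating_row(title_id: str, source: str) -> float:
    return RATINGS[title_id][source]

def get_title_row(title_id: str) -> dict:
    return TITLES[title_id]

# Opinionated tool: one call returns an enriched, ready-to-reason-over object.
def compare_reception(title_id: str) -> dict:
    title = TITLES[title_id]
    ratings = RATINGS[title_id]
    return {
        "title": title["title"],
        "year": title["year"],
        "korean_audience": ratings["naver"],
        "western_critics": ratings["rotten_tomatoes"],
        # Pre-computed signal the agent would otherwise have to derive itself.
        "reception_gap": round(ratings["naver"] * 10 - ratings["rotten_tomatoes"], 1),
    }
```

With the opinionated shape, the agent spends one tool call and zero synthesis tokens to learn that this title's Korean audience score outruns its Western critic score.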
This challenge is exemplified by niche data domains where information is scattered across proprietary platforms. Consider the Korean entertainment ecosystem: data regarding a single title is dispersed across MyDramaList, Naver, TMDB, Rotten Tomatoes, and regional streaming services. Building a unified view requires ingesting nearly 10,000 movies, 3,500 TV shows, per-episode Nielsen Korea ratings scraped from SVG elements, award histories, and streaming availability across four regions. Without a unified MCP layer, an agent cannot answer cross-domain queries like "Which drama has the highest Nielsen viewership growth but low Western critic scores?" because no single API contains both datasets.
The solution lies in architecting MCP servers that act as semantic bridges, transforming fragmented raw data into high-value, monetizable tools optimized for agentic consumption.
WOW Moment: Key Findings
The value of a production MCP server is not measured by the number of tools, but by the uniqueness of the cross-source queries it enables. When data is unified and tools are opinionated, agents can perform reasoning tasks that were previously impossible.
The following comparison illustrates the operational difference between exposing raw data and deploying unified MCP tools:
| Strategy | Data Coverage | Agent Reasoning Steps | Monetization Readiness | Query Capability |
|---|---|---|---|---|
| Raw API/DB Exposure | Fragmented across sources | High (Agent must join/synthesize) | None (No auth/usage tracking) | Limited to single-source lookups |
| Unified Opinionated MCP | Cross-source normalized | Low (Agent calls specific tool) | High (OAuth 2.1 + DCR) | Complex cross-domain analysis |
Why this matters:
Unified tools unlock "moat" queries. For example, a compare_ratings tool that returns Naver's verified Korean ticket buyer score alongside Rotten Tomatoes' Western critic score allows an agent to identify cultural reception gaps. Similarly, a get_episode_trajectory tool that aggregates Nielsen viewership percentages enables trend analysis that no individual scraper provides. These capabilities transform the MCP server from a simple data proxy into a strategic asset for developers and end-users.
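Assuming the per-episode Nielsen percentages have already been scraped and normalized at ingestion time, the aggregation behind a tool like `get_episode_trajectory` might look like the following sketch (the function name and return keys are illustrative):

```python
# Sketch of the aggregation a hypothetical get_episode_trajectory tool
# could return. Assumes Nielsen ratings arrive as plain percentages;
# the SVG parsing happens upstream during ingestion.

def episode_trajectory(nielsen_pct: list[float]) -> dict:
    """Summarize a drama's viewership arc from per-episode percentages."""
    if not nielsen_pct:
        return {"episodes": 0}
    growth = nielsen_pct[-1] - nielsen_pct[0]
    return {
        "episodes": len(nielsen_pct),
        "premiere": nielsen_pct[0],
        "finale": nielsen_pct[-1],
        "peak": max(nielsen_pct),
        "growth_pct_points": round(growth, 2),
    }

# Example: a drama whose nationwide rating grew from 3.8% to 10.2%.
summary = episode_trajectory([3.8, 4.1, 5.0, 6.7, 8.9, 10.2])
```

Returning the pre-computed `growth_pct_points` lets the agent answer "highest viewership growth" queries directly, instead of fetching every episode and doing the arithmetic in-context.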
Core Solution
Building a production-grade MCP server requires a disciplined approach to tool design, authentication, and deployment. The following implementation uses FastMCP, Supabase, Descope, and Railway to demonstrate a complete architecture.
1. Tool Taxonomy and Design
Tools should be categorized by the agent's intent. A robust server organizes tools into three buckets:
- Discovery Tools: Answer "What should I explore?" Examples include `search_by_community_tag` or `list_trending_titles`. These tools should leverage unique taxonomies, such as MyDramaList's community tags ("Enemies to Lovers", "Time Travel"), which do not exist in structured form elsewhere.
- Detail Tools: Answer "Tell me everything about this entity." Examples include `fetch_media_details` or `get_episode_ratings`. The `get_episode_ratings` tool is critical for domains with granular metrics; it should return normalized data (e.g., Nielsen percentages) rather than raw HTML or SVG references.
- Utility Tools: Answer cross-cutting questions. Examples include `find_streaming_availability` or `compare_audience_scores`. These tools perform joins across internal tables and external sources, returning a single, enriched response.
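As a rough sketch, the taxonomy can be kept explicit in code — here as a hypothetical registry mapping each intent bucket to the example tool names above:

```python
# Hypothetical registry: the three-bucket taxonomy, grouping this
# article's example tool names by the agent intent they serve.
from typing import Optional

TOOL_TAXONOMY = {
    "discovery": ["search_by_community_tag", "list_trending_titles"],
    "detail": ["fetch_media_details", "get_episode_ratings"],
    "utility": ["find_streaming_availability", "compare_audience_scores"],
}

def bucket_for(tool_name: str) -> Optional[str]:
    """Reverse lookup: which intent bucket does a tool belong to?"""
    for bucket, tools in TOOL_TAXONOMY.items():
        if tool_name in tools:
            return bucket
    return None
```

Keeping the registry explicit makes it easy to audit coverage: every new tool must land in exactly one bucket, which discourages sprawling, unclassifiable tools.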
2. Implementation with FastMCP and Pydantic
FastMCP simplifies server creation by allowing tool definitions via decorators. However, production systems must enforce schema validation using Pydantic models. This ensures consistent return types and improves the LLM's understanding of the tool's output.
```python
import os
from typing import List, Optional

from fastmcp import FastMCP
from fastmcp.server.auth.providers.descope import DescopeProvider
from pydantic import BaseModel, Field

# Define strict return schemas for agent reliability
class RatingSnapshot(BaseModel):
    naver_audience: float = Field(description="Score from Korean verified ticket buyers (0-10)")
    mdl_fan_score: float = Field(description="Score from international K-drama fans (0-10)")
    rt_tomatometer: Optional[float] = Field(description="Western critic score (0-100)", default=None)

class MediaDetail(BaseModel):
    title_english: str
    title_korean: str
    release_year: int
    ratings: RatingSnapshot
    streaming_regions: List[str]

# Initialize MCP server with authentication
auth_provider = DescopeProvider(
    config_url=os.environ["DESCOPE_CONFIG_URL"],
    base_url=os.environ["SERVER_URL"],
)

mcp = FastMCP(
    name="GlobalMediaInsights",
    instructions="""
    You are an expert media analyst. Use the provided tools to retrieve
    unified data across Korean and Western entertainment platforms.
    Always cite the source of ratings when comparing audiences.
    """,
    auth=auth_provider,
)

@mcp.tool
def compare_audience_scores(title_id: str) -> RatingSnapshot:
    """
    Compare localized vs global audience reception for a specific title.
    Useful for identifying cultural reception gaps.
    """
    # In production, this calls a query layer that joins Supabase tables
    # and fetches external ratings. Returns a Pydantic model.
    return _query_engine.get_rating_comparison(title_id)

@mcp.tool
def search_titles_by_tag(tag: str, limit: int = 20) -> List[MediaDetail]:
    """
    Browse titles by community-generated tags.
    Common tags: 'Revenge', 'Found Family', 'CEO Male Lead'.
    """
    return _query_engine.search_by_tag(tag, limit)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8000))
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )
```
Architecture Decisions:
- Pydantic Models: Using `BaseModel` for return types prevents schema drift and gives the LLM precise type hints, reducing parsing errors.
- Streamable HTTP: This transport is required for remote MCP servers accessed by clients like Claude Desktop or web-based agents. It maintains persistent connections and handles tool discovery efficiently.
- Direct Query Layer: Tools should call the database query layer directly, bypassing intermediate REST APIs. This reduces latency and simplifies the stack.
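As a sketch of what that direct query layer might do, the join/enrichment step below runs against stubbed rows; the commented calls assume the supabase-py client and hypothetical `titles` and `external_ratings` tables:

```python
# Sketch of the query layer's join step, shown runnable against stubbed
# rows. Table and column names are hypothetical.

def build_rating_snapshot(title_row: dict, external_rows: list) -> dict:
    """Join one internal title row with its external rating rows."""
    by_source = {r["source"]: r["score"] for r in external_rows}
    return {
        "title_english": title_row["title_english"],
        "naver_audience": by_source.get("naver"),
        "mdl_fan_score": by_source.get("mydramalist"),
        "rt_tomatometer": by_source.get("rotten_tomatoes"),  # may be absent
    }

# In production the rows would come from something like:
#   client.table("titles").select("*").eq("id", title_id).single().execute()
#   client.table("external_ratings").select("*").eq("title_id", title_id).execute()
title_row = {"title_english": "Example Drama"}
external_rows = [
    {"source": "naver", "score": 8.9},
    {"source": "mydramalist", "score": 8.4},
]
snapshot = build_rating_snapshot(title_row, external_rows)
```

Because the join happens server-side, the tool returns one enriched object and the missing `rotten_tomatoes` score surfaces as an explicit `None` rather than a failed second tool call.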
3. Authentication with Descope
Monetizable MCP servers require robust authentication. Descope is recommended due to its native FastMCP integration and support for Dynamic Client Registration (DCR). DCR allows MCP clients to register automatically without manual configuration, streamlining the onboarding process for developers.
The `DescopeProvider` handles OAuth 2.1 flows, token validation, and user context injection. Ensure the `SERVER_URL` environment variable includes the `https://` scheme; Pydantic's URL validation rejects scheme-less values at startup.
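A minimal pre-deployment check for this failure mode might look like the following. The variable names match this article's template; the helper itself is an illustrative sketch, not part of any SDK:

```python
# Fail fast if a URL-typed environment variable is missing its
# https:// scheme, instead of hitting a Pydantic error at startup.
from urllib.parse import urlparse

URL_ENV_VARS = ["SERVER_URL", "DESCOPE_CONFIG_URL"]

def validate_url_env(env: dict) -> list:
    """Return human-readable problems; an empty list means the config is OK."""
    problems = []
    for name in URL_ENV_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parsed = urlparse(value)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name}={value!r} must be an absolute https:// URL")
    return problems

# Wire into deployment, e.g.:
#   problems = validate_url_env(dict(os.environ))
#   if problems: raise SystemExit("; ".join(problems))
```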
4. Deployment Strategy
MCP servers must respond quickly to tool calls. Cold starts can cause timeouts in agentic workflows, degrading the user experience. Railway is preferred over free-tier alternatives because it offers always-on containers at a low cost ($5/month), eliminating cold start latency.
The deployment configuration is minimal:
```toml
# railway.toml
[build]
builder = "nixpacks"

[deploy]
startCommand = "python server.py"
restartPolicyType = "ALWAYS"
```
Railway injects the PORT environment variable dynamically. The server must read this variable at runtime rather than hardcoding a port, ensuring compatibility with the platform's networking layer.
Pitfall Guide
Production deployments reveal subtle issues that can derail an MCP project. The following pitfalls are derived from real-world implementation errors.
| Pitfall Name | Explanation | Fix |
|---|---|---|
| Incremental Edit Corruption | Automated string replacements or patch-based edits can corrupt import blocks or syntax, leading to confusing errors like `SyntaxError: '(' was never closed`. | Regenerate configuration and server files from scratch during updates. Avoid incremental patches in CI/CD pipelines. |
| URL Scheme Omission | Environment variables for URLs (e.g., `SERVER_URL`) must include the scheme (`https://`). Omitting it causes Pydantic validation failures during startup. | Validate all URL environment variables in a pre-deployment script. Always enforce scheme inclusion. |
| Cold Start Latency | Free hosting tiers often spin down containers after inactivity. A 30-60 second restart delay causes MCP clients to time out during tool discovery or execution. | Use always-on hosting for MCP servers. Budget $5-10/month to ensure responsiveness. |
| Tool Granularity Mismatch | Exposing too many parameters or raw database fields forces the LLM to do the heavy lifting, increasing token usage and error rates. | Design opinionated tools that encapsulate logic. Return enriched objects, not raw rows. |
| Token Scoping Misconfiguration | Descope's audience (`aud`) claim is optional but critical for security. Leaving it unset allows tokens to be reused across multiple servers. | Configure the MCP Server URL in Descope to enforce strict token scoping in production. |
| Port Binding Assumptions | Hardcoding ports (e.g., 8000) fails on platforms that assign dynamic ports via environment variables. | Read `os.environ.get("PORT", default)` and bind dynamically. |
| Docstring Neglect | FastMCP uses function docstrings as tool descriptions for the LLM. Vague or missing docstrings reduce tool discoverability and usage accuracy. | Write detailed docstrings with parameter descriptions, return value explanations, and usage examples. |
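To make the Docstring Neglect fix concrete, here is a hypothetical before/after pair. FastMCP forwards the docstring to the LLM as the tool description, so it should read like agent-facing documentation:

```python
# The "Docstring Neglect" fix in practice. Tool names, fields, and
# return shapes below are illustrative.

# Vague -- the agent cannot tell when or how to use this:
def get_data(id: str) -> dict:
    """Gets data."""
    ...

# Descriptive -- states intent, inputs, outputs, and a usage cue:
def get_episode_ratings(title_id: str) -> list:
    """
    Return per-episode Nielsen Korea viewership for a drama.

    Args:
        title_id: Internal title identifier (e.g. "drama-001").

    Returns:
        One dict per episode with keys: episode (int),
        air_date (ISO string), nielsen_pct (float, nationwide %).

    Use this to analyze viewership trends across a drama's run.
    """
    ...
```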
Production Bundle
Action Checklist
- Define Tool Taxonomy: Categorize tools into Discovery, Detail, and Utility based on agent workflows.
- Implement Pydantic Schemas: Create strict return models for all tools to ensure type safety and LLM clarity.
- Configure Descope Auth: Set up `DescopeProvider` with DCR and verify `SERVER_URL` includes `https://`.
- Enable Always-On Hosting: Deploy to a platform like Railway with a restart policy that prevents cold starts.
- Validate Token Scoping: Configure the audience claim in Descope for production security.
- Test End-to-End: Verify tool calls via Claude Code or a similar MCP client to ensure reasoning works as expected.
- List in Directories: Submit the server to Smithery, Glama, and mcp.so for discoverability.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-Traffic Production | Railway Always-On + Descope Pro | Eliminates cold starts; supports high concurrent auth requests. | ~$15-20/month |
| Internal/Testing | Localhost + Descope Free | Fast iteration; free tier sufficient for low volume. | $0 |
| Monetization Required | Descope + Usage Tracking | OAuth enables per-user billing and access control. | Descope fees + infra |
| Niche Data Domain | Custom Scrapers + Supabase | Unified data creates unique value; scrapers fill API gaps. | Scraper maintenance |
Configuration Template
Use this template to bootstrap a production MCP server:
```python
# server.py
import os

from fastmcp import FastMCP
from fastmcp.server.auth.providers.descope import DescopeProvider

# Load environment variables
DESCOPE_CONFIG_URL = os.environ["DESCOPE_CONFIG_URL"]
SERVER_URL = os.environ["SERVER_URL"]  # Must include https://

# Initialize Auth
auth = DescopeProvider(
    config_url=DESCOPE_CONFIG_URL,
    base_url=SERVER_URL,
)

# Initialize Server
mcp = FastMCP(
    name="ProductionMCPServer",
    instructions="Unified data access for agentic workflows.",
    auth=auth,
)

# Define tools here...

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8000))
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )
```

```toml
# railway.toml
[build]
builder = "nixpacks"

[deploy]
startCommand = "python server.py"
restartPolicyType = "ALWAYS"
```
Quick Start Guide
- Install Dependencies: Run `pip install fastmcp descope pydantic`.
- Set Up Descope: Create a project in Descope, enable DCR, and copy the Config URL.
- Configure Environment: Set `DESCOPE_CONFIG_URL` and `SERVER_URL` (with `https://`) in your deployment platform.
- Deploy: Push to Railway or your chosen host. Verify the container starts without validation errors.
- Test: Connect via `claude mcp add --transport http <name> <url>` and run a sample query to verify tool discovery and execution.
