From Scrapers to MCP Server: Serving Korean Entertainment Data to AI Agents
Architecting Production-Ready MCP Servers: Unified Data Strategies for Agentic AI
Current Situation Analysis
The rapid adoption of Model Context Protocol (MCP) has created a disconnect between data availability and agent utility. While vast amounts of structured data exist, AI agents struggle to leverage it effectively when exposed through traditional API patterns. The industry pain point is not a lack of data; it is the fragmentation of information and the misalignment between raw data schemas and agentic reasoning workflows.
Exposing database columns directly as MCP tool parameters is a common anti-pattern. This approach forces the LLM to perform complex joins and data synthesis, increasing token consumption, latency, and hallucination risk. Agents require opinionated tools that encapsulate business logic and return structured, context-rich responses.
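The contrast can be sketched with two hypothetical tool shapes. Everything below — the in-memory tables, the scores, and the `compare_reception` helper — is illustrative, not part of any real schema:

```python
# Hypothetical contrast: "raw" tools that mirror database rows versus
# one opinionated tool that encapsulates the join. All data is stubbed.

RATINGS = {"drama-001": {"naver": 8.9, "rotten_tomatoes": 61.0}}
TITLES = {"drama-001": {"title": "Example Drama", "year": 2023}}

# Anti-pattern: the agent must call several narrow tools and join results itself.
def get_rating_row(title_id: str, source: str) -> float:
    return RATINGS[title_id][source]

def get_title_row(title_id: str) -> dict:
    return TITLES[title_id]

# Opinionated tool: one call returns an enriched, ready-to-reason-over object.
def compare_reception(title_id: str) -> dict:
    title = TITLES[title_id]
    ratings = RATINGS[title_id]
    return {
        "title": title["title"],
        "year": title["year"],
        "korean_audience": ratings["naver"],
        "western_critics": ratings["rotten_tomatoes"],
        # Pre-computed signal the agent would otherwise have to derive itself.
        "reception_gap": round(ratings["naver"] * 10 - ratings["rotten_tomatoes"], 1),
    }
```

With the opinionated shape, the agent spends one tool call and zero synthesis tokens to learn that this title's Korean audience score outruns its Western critic score.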
This challenge is exemplified by niche data domains where information is scattered across proprietary platforms. Consider the Korean entertainment ecosystem: data regarding a single title is dispersed across MyDramaList, Naver, TMDB, Rotten Tomatoes, and regional streaming services. Building a unified view requires ingesting nearly 10,000 movies, 3,500 TV shows, per-episode Nielsen Korea ratings scraped from SVG elements, award histories, and streaming availability across four regions. Without a unified MCP layer, an agent cannot answer cross-domain queries like "Which drama has the highest Nielsen viewership growth but low Western critic scores?" because no single API contains both datasets.
The solution lies in architecting MCP servers that act as semantic bridges, transforming fragmented raw data into high-value, monetizable tools optimized for agentic consumption.
WOW Moment: Key Findings
The value of a production MCP server is not measured by the number of tools, but by the uniqueness of the cross-source queries it enables. When data is unified and tools are opinionated, agents can perform reasoning tasks that were previously impossible.
The following comparison illustrates the operational difference between exposing raw data and deploying unified MCP tools:
| Strategy | Data Coverage | Agent Reasoning Steps | Monetization Readiness | Query Capability |
|---|---|---|---|---|
| Raw API/DB Exposure | Fragmented across sources | High (Agent must join/synthesize) | None (No auth/usage tracking) | Limited to single-source lookups |
| Unified Opinionated MCP | Cross-source normalized | Low (Agent calls specific tool) | High (OAuth 2.1 + DCR) | Complex cross-domain analysis |
Why this matters:
Unified tools unlock "moat" queries. For example, a compare_ratings tool that returns Naver's verified Korean ticket buyer score alongside Rotten Tomatoes' Western critic score allows an agent to identify cultural reception gaps. Similarly, a get_episode_trajectory tool that aggregates Nielsen viewership percentages enables trend analysis that no individual scraper provides. These capabilities transform the MCP server from a simple data proxy into a strategic asset for developers and end-users.
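Assuming the per-episode Nielsen percentages have already been scraped and normalized at ingestion time, the aggregation behind a tool like `get_episode_trajectory` might look like the following sketch (the function name and return keys are illustrative):

```python
# Sketch of the aggregation a hypothetical get_episode_trajectory tool
# could return. Assumes Nielsen ratings arrive as plain percentages;
# the SVG parsing happens upstream during ingestion.

def episode_trajectory(nielsen_pct: list[float]) -> dict:
    """Summarize a drama's viewership arc from per-episode percentages."""
    if not nielsen_pct:
        return {"episodes": 0}
    growth = nielsen_pct[-1] - nielsen_pct[0]
    return {
        "episodes": len(nielsen_pct),
        "premiere": nielsen_pct[0],
        "finale": nielsen_pct[-1],
        "peak": max(nielsen_pct),
        "growth_pct_points": round(growth, 2),
    }

# Example: a drama whose nationwide rating grew from 3.8% to 10.2%.
summary = episode_trajectory([3.8, 4.1, 5.0, 6.7, 8.9, 10.2])
```

Returning the pre-computed `growth_pct_points` lets the agent answer "highest viewership growth" queries directly, instead of fetching every episode and doing the arithmetic in-context.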
Core Solution
Building a production-grade MCP server requires a disciplined approach to tool design, authentication, and deployment. The following implementation uses FastMCP, Supabase, Descope, and Railway to demonstrate a complete architecture.
1. Tool Taxonomy and Design
Tools should be categorized by the agent's intent. A robust server organizes tools into three buckets:
- Discovery Tools: Answer "What should I explore?" Examples include `search_by_community_tag` or `list_trending_titles`. These tools should leverage unique taxonomies, such as MyDramaList's community tags ("Enemies to Lovers", "Time Travel"), which do not exist in structured form elsewhere.
- Detail Tools: Answer "Tell me everything about this entity." Examples include `fetch_media_details` or `get_episode_ratings`. The `get_episode_ratings` tool is critical for domains with granular metrics; it should return normalized data (e.g., Nielsen percentages) rather than raw HTML or SVG references.
- Utility Tools: Answer cross-cutting questions. Examples include `find_streaming_availability` or `compare_audience_scores`. These tools perform joins across internal tables and external sources, returning a single, enriched response.
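As a rough sketch, the taxonomy can be kept explicit in code — here as a hypothetical registry mapping each intent bucket to the example tool names above:

```python
# Hypothetical registry: the three-bucket taxonomy, grouping this
# article's example tool names by the agent intent they serve.
from typing import Optional

TOOL_TAXONOMY = {
    "discovery": ["search_by_community_tag", "list_trending_titles"],
    "detail": ["fetch_media_details", "get_episode_ratings"],
    "utility": ["find_streaming_availability", "compare_audience_scores"],
}

def bucket_for(tool_name: str) -> Optional[str]:
    """Reverse lookup: which intent bucket does a tool belong to?"""
    for bucket, tools in TOOL_TAXONOMY.items():
        if tool_name in tools:
            return bucket
    return None
```

Keeping the registry explicit makes it easy to audit coverage: every new tool must land in exactly one bucket, which discourages sprawling, unclassifiable tools.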
2. Implementation with FastMCP and Pydantic
FastMCP simplifies server creation by allowing tool definitions via decorators. However, production systems must enforce schema validation using Pydantic models. This ensures consistent return types and improves the LLM's understanding of the tool's output.
```python
import os
from typing import List, Optional

from fastmcp import FastMCP
from fastmcp.server.auth.providers.descope import DescopeProvider
from pydantic import BaseModel, Field

# Define strict return schemas for agent reliability
class RatingSnapshot(BaseModel):
    naver_audience: float = Field(description="Score from Korean verified ticket buyers (0-10)")
    mdl_fan_score: float = Field(description="Score from international K-drama fans (0-10)")
    rt_tomatometer: Optional[float] = Field(description="Western critic score (0-100)", default=None)

class MediaDetail(BaseModel):
    title_english: str
    title_korean: str
    release_year: int
    ratings: RatingSnapshot
    streaming_regions: List[str]

# Initialize MCP server with authentication
auth_provider = DescopeProvider(
    config_url=os.environ["DESCOPE_CONFIG_URL"],
    base_url=os.environ["SERVER_URL"],
)

mcp = FastMCP(
    name="GlobalMediaInsights",
    instructions="""
    You are an expert media analyst. Use the provided tools to retrieve
    unified data across Korean and Western entertainment platforms.
    Always cite the source of ratings when comparing audiences.
    """,
    auth=auth_provider,
)

@mcp.tool
def compare_audience_scores(title_id: str) -> RatingSnapshot:
    """
    Compare localized vs global audience reception for a specific title.
    Useful for identifying cultural reception gaps.
    """
    # In production, this calls a query layer that joins Supabase tables
    # and fetches external ratings. Returns a Pydantic model.
    return _query_engine.get_rating_comparison(title_id)

@mcp.tool
def search_titles_by_tag(tag: str, limit: int = 20) -> List[MediaDetail]:
    """
    Browse titles by community-generated tags.
    Common tags: 'Revenge', 'Found Family', 'CEO Male Lead'.
    """
    return _query_engine.search_by_tag(tag, limit)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8000))
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )
```
Architecture Decisions:
- Pydantic Models: Using `BaseModel` for return types prevents schema drift and gives the LLM precise type hints, reducing parsing errors.
- Streamable HTTP: This transport is required for remote MCP servers accessed by clients like Claude Desktop or web-based agents. It maintains persistent connections and handles tool discovery efficiently.
- Direct Query Layer: Tools should call the database query layer directly, bypassing intermediate REST APIs. This reduces latency and simplifies the stack.
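As a sketch of what that direct query layer might do, the join/enrichment step below runs against stubbed rows; the commented calls assume the supabase-py client and hypothetical `titles` and `external_ratings` tables:

```python
# Sketch of the query layer's join step, shown runnable against stubbed
# rows. Table and column names are hypothetical.

def build_rating_snapshot(title_row: dict, external_rows: list) -> dict:
    """Join one internal title row with its external rating rows."""
    by_source = {r["source"]: r["score"] for r in external_rows}
    return {
        "title_english": title_row["title_english"],
        "naver_audience": by_source.get("naver"),
        "mdl_fan_score": by_source.get("mydramalist"),
        "rt_tomatometer": by_source.get("rotten_tomatoes"),  # may be absent
    }

# In production the rows would come from something like:
#   client.table("titles").select("*").eq("id", title_id).single().execute()
#   client.table("external_ratings").select("*").eq("title_id", title_id).execute()
title_row = {"title_english": "Example Drama"}
external_rows = [
    {"source": "naver", "score": 8.9},
    {"source": "mydramalist", "score": 8.4},
]
snapshot = build_rating_snapshot(title_row, external_rows)
```

Because the join happens server-side, the tool returns one enriched object and the missing `rotten_tomatoes` score surfaces as an explicit `None` rather than a failed second tool call.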
3. Authentication with Descope
Monetizable MCP servers require robust authentication. Descope is recommended due to its native FastMCP integration and support for Dynamic Client Registration (DCR). DCR allows MCP clients to register automatically without manual configuration, streamlining the onboarding process for developers.
The `DescopeProvider` handles OAuth 2.1 flows, token validation, and user context injection. Ensure the `SERVER_URL` environment variable includes the `https://` scheme; Pydantic's URL validation rejects scheme-less values at startup.
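A minimal pre-deployment check for this failure mode might look like the following. The variable names match this article's template; the helper itself is an illustrative sketch, not part of any SDK:

```python
# Fail fast if a URL-typed environment variable is missing its
# https:// scheme, instead of hitting a Pydantic error at startup.
from urllib.parse import urlparse

URL_ENV_VARS = ["SERVER_URL", "DESCOPE_CONFIG_URL"]

def validate_url_env(env: dict) -> list:
    """Return human-readable problems; an empty list means the config is OK."""
    problems = []
    for name in URL_ENV_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parsed = urlparse(value)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name}={value!r} must be an absolute https:// URL")
    return problems

# Wire into deployment, e.g.:
#   problems = validate_url_env(dict(os.environ))
#   if problems: raise SystemExit("; ".join(problems))
```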
4. Deployment Strategy
MCP servers must respond quickly to tool calls. Cold starts can cause timeouts in agentic workflows, degrading the user experience. Railway is preferred over free-tier alternatives because it offers always-on containers at a low cost ($5/month), eliminating cold start latency.
The deployment configuration is minimal:
```toml
# railway.toml
[build]
builder = "nixpacks"

[deploy]
startCommand = "python server.py"
restartPolicyType = "ALWAYS"
```
Railway injects the PORT environment variable dynamically. The server must read this variable at runtime rather than hardcoding a port, ensuring compatibility with the platform's networking layer.
Pitfall Guide
Production deployments reveal subtle issues that can derail an MCP project. The following pitfalls are derived from real-world implementation errors.
| Pitfall Name | Explanation | Fix |
|---|---|---|
| Incremental Edit Corruption | Automated string replacements or patch-based edits can corrupt import blocks or syntax, leading to confusing errors like `SyntaxError: '(' was never closed`. | Regenerate configuration and server files from scratch during updates. Avoid incremental patches in CI/CD pipelines. |
| URL Scheme Omission | Environment variables for URLs (e.g., `SERVER_URL`) must include the scheme (`https://`). Omitting it causes Pydantic validation failures during startup. | Validate all URL environment variables in a pre-deployment script. Always enforce scheme inclusion. |
| Cold Start Latency | Free hosting tiers often spin down containers after inactivity. A 30-60 second restart delay causes MCP clients to time out during tool discovery or execution. | Use always-on hosting for MCP servers. Budget $5-10/month to ensure responsiveness. |
| Tool Granularity Mismatch | Exposing too many parameters or raw database fields forces the LLM to do the heavy lifting, increasing token usage and error rates. | Design opinionated tools that encapsulate logic. Return enriched objects, not raw rows. |
| Token Scoping Misconfiguration | Descope's audience (`aud`) claim is optional but critical for security. Leaving it unset allows tokens to be reused across multiple servers. | Configure the MCP Server URL in Descope to enforce strict token scoping in production. |
| Port Binding Assumptions | Hardcoding ports (e.g., 8000) fails on platforms that assign dynamic ports via environment variables. | Read `os.environ.get("PORT", default)` and bind dynamically. |
| Docstring Neglect | FastMCP uses function docstrings as tool descriptions for the LLM. Vague or missing docstrings reduce tool discoverability and usage accuracy. | Write detailed docstrings with parameter descriptions, return value explanations, and usage examples. |
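To make the Docstring Neglect fix concrete, here is a hypothetical before/after pair. FastMCP forwards the docstring to the LLM as the tool description, so it should read like agent-facing documentation:

```python
# The "Docstring Neglect" fix in practice. Tool names, fields, and
# return shapes below are illustrative.

# Vague -- the agent cannot tell when or how to use this:
def get_data(id: str) -> dict:
    """Gets data."""
    ...

# Descriptive -- states intent, inputs, outputs, and a usage cue:
def get_episode_ratings(title_id: str) -> list:
    """
    Return per-episode Nielsen Korea viewership for a drama.

    Args:
        title_id: Internal title identifier (e.g. "drama-001").

    Returns:
        One dict per episode with keys: episode (int),
        air_date (ISO string), nielsen_pct (float, nationwide %).

    Use this to analyze viewership trends across a drama's run.
    """
    ...
```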
Production Bundle
Action Checklist
- Define Tool Taxonomy: Categorize tools into Discovery, Detail, and Utility based on agent workflows.
- Implement Pydantic Schemas: Create strict return models for all tools to ensure type safety and LLM clarity.
- Configure Descope Auth: Set up `DescopeProvider` with DCR and verify `SERVER_URL` includes `https://`.
- Enable Always-On Hosting: Deploy to a platform like Railway with a restart policy that prevents cold starts.
- Validate Token Scoping: Configure the audience claim in Descope for production security.
- Test End-to-End: Verify tool calls via Claude Code or a similar MCP client to ensure reasoning works as expected.
- List in Directories: Submit the server to Smithery, Glama, and mcp.so for discoverability.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-Traffic Production | Railway Always-On + Descope Pro | Eliminates cold starts; supports high concurrent auth requests. | ~$15-20/month |
| Internal/Testing | Localhost + Descope Free | Fast iteration; free tier sufficient for low volume. | $0 |
| Monetization Required | Descope + Usage Tracking | OAuth enables per-user billing and access control. | Descope fees + infra |
| Niche Data Domain | Custom Scrapers + Supabase | Unified data creates unique value; scrapers fill API gaps. | Scraper maintenance |
Configuration Template
Use this template to bootstrap a production MCP server:
```python
# server.py
import os

from fastmcp import FastMCP
from fastmcp.server.auth.providers.descope import DescopeProvider

# Load environment variables
DESCOPE_CONFIG_URL = os.environ["DESCOPE_CONFIG_URL"]
SERVER_URL = os.environ["SERVER_URL"]  # Must include https://

# Initialize Auth
auth = DescopeProvider(
    config_url=DESCOPE_CONFIG_URL,
    base_url=SERVER_URL,
)

# Initialize Server
mcp = FastMCP(
    name="ProductionMCPServer",
    instructions="Unified data access for agentic workflows.",
    auth=auth,
)

# Define tools here...

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8000))
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )
```

```toml
# railway.toml
[build]
builder = "nixpacks"

[deploy]
startCommand = "python server.py"
restartPolicyType = "ALWAYS"
```
Quick Start Guide
- Install Dependencies: Run `pip install fastmcp descope pydantic`.
- Set Up Descope: Create a project in Descope, enable DCR, and copy the Config URL.
- Configure Environment: Set `DESCOPE_CONFIG_URL` and `SERVER_URL` (with `https://`) in your deployment platform.
- Deploy: Push to Railway or your chosen host. Verify the container starts without validation errors.
- Test: Connect via `claude mcp add --transport http <name> <url>` and run a sample query to verify tool discovery and execution.
