/api/admin/crawler

By Codcompass Team·2026-05-24·5 min read

Overview

The /api/admin/crawler endpoint serves as the primary programmatic interface for manually triggering and monitoring web crawler jobs within the platform. It is designed for developers, DevOps engineers, and system administrators who need immediate control over content indexing workflows without relying on automated schedulers or cron jobs. The endpoint operates in two distinct modes: a POST request to enqueue and dispatch a new crawler task, and a GET request to retrieve the detailed status and metadata of an existing job.

Because this endpoint is publicly accessible (no authentication headers required), it is typically deployed in environments where network-level access controls, reverse proxy rules, or internal routing restrict external exposure. Use this endpoint when you need to:

Manually initiate a crawl after deploying new content or updating site architecture.
Integrate crawler triggers into external CI/CD pipelines or monitoring dashboards.
Debug or validate job execution by polling the CrawlerJob database records directly.

Endpoint Reference

Base Path: /api/admin/crawler
HTTP Methods: POST, GET
Authentication: Public (No authentication required)
Content-Type: application/json
Runtime: Next.js API Route (App Router compatible)
Dependencies: @supabase/supabase-js, @/lib/crawler-dispatch

Request Format

The endpoint handles two distinct request patterns depending on the HTTP method used.

POST Request

The POST method initiates a new crawler job. It requires neither a request body nor query parameters. The endpoint generates a unique job identifier, enqueues the job, and triggers the dispatch mechanism (e.g., GitHub Actions or an internal queue). It waits only for the HTTP acknowledgment from the target system (default timeout ≤12 seconds) and does not block until the crawler finishes processing.

Headers: Standard HTTP headers. No Authorization header is required.
Body: None.

Query Parameters: None.
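
The POST flow described above can be sketched as a small client helper. This is a minimal sketch under stated assumptions, not the platform's own client: `triggerCrawl`, `baseUrl`, `fetchImpl`, and the 15-second client-side timeout are hypothetical names and values chosen for illustration. The timeout is simply set above the ≤12-second dispatch acknowledgment window mentioned above.

```javascript
// Hypothetical helper (not part of the platform): sends the bodiless POST
// and returns the HTTP status together with the parsed JSON body.
// `fetchImpl` is injectable so the helper can be exercised without a network.
async function triggerCrawl(baseUrl, fetchImpl = fetch, timeoutMs = 15000) {
  const url = new URL("/api/admin/crawler", baseUrl).toString();

  // Abort the request if no response arrives within timeoutMs (assumed value).
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const res = await fetchImpl(url, {
      method: "POST",
      signal: controller.signal,
    });
    return { httpStatus: res.status, body: await res.json() };
  } finally {
    clearTimeout(timer); // always release the timer, even on failure
  }
}

// Usage (assumes an async context and a reachable deployment):
//   const { httpStatus, body } = await triggerCrawl("https://your-domain.com");
```

Returning the HTTP status alongside the body lets the caller distinguish outcomes that share a JSON shape but differ in status code.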

GET Request

The GET method retrieves the status and metadata of a specific crawler job. Supply the jobId query parameter to locate the record in the CrawlerJob table; if it is omitted, the endpoint returns a help message instead of a job record.

Headers: Standard HTTP headers.
Body: None.
Query Parameters:
Parameter	Type	Required	Description
`jobId`	string	Optional	The unique identifier (UUID) of the crawler job.

Example GET Request URL:

GET /api/admin/crawler?jobId=550e8400-e29b-41d4-a716-446655440000
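
To avoid hand-assembling the query string, the status URL can be built with the standard `URL` API. The helper name `buildStatusUrl` and the `baseUrl` argument are assumptions for illustration only.

```javascript
// Hypothetical helper: builds the status URL with a correctly encoded jobId.
function buildStatusUrl(baseUrl, jobId) {
  const url = new URL("/api/admin/crawler", baseUrl);
  url.searchParams.set("jobId", jobId); // parameter name must be exactly "jobId"
  return url.toString();
}

// Usage with the placeholder domain and sample UUID from this page:
const statusUrl = buildStatusUrl(
  "https://your-domain.com",
  "550e8400-e29b-41d4-a716-446655440000"
);
console.log(statusUrl);
```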

Response Format

Responses are returned as JSON objects. The structure varies based on the HTTP method and execution outcome.

POST Response

Success (200 OK): Returned when the job is successfully enqueued and dispatched.

{
  "success": true,
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "dispatched",
  "dispatchTarget": "github_actions",
  "githubAccepted": true,
  "actionsUrl": "https://github.com/org/repo/actions/runs/12345678",
  "errorMessage": null,
  "message": "Crawler job successfully dispatched."
}

Dispatch Failure (502 Bad Gateway): Returned when the job is created but the downstream dispatch mechanism fails to acknowledge the trigger.

{
  "success": false,
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "dispatch_failed",
  "dispatchTarget": "github_actions",
  "githubAccepted": false,
  "actionsUrl": null,
  "errorMessage": "Failed to trigger GitHub Actions workflow.",
  "message": "Job enqueued but dispatch failed."
}

Server Error (500 Internal Server Error): Returned if required Supabase environment variables are missing or an unexpected exception occurs during execution.
```
{
  "error": "Supabase is not configured for admin crawler API."
}
```
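
The three POST outcomes above differ in both status code and body shape, so a caller may want to normalize them before acting. The following is a sketch only: `interpretDispatch` and the `outcome` labels are hypothetical, while the field names are taken from the sample responses above.

```javascript
// Hypothetical normalizer for POST responses, based on the samples above.
function interpretDispatch(httpStatus, body) {
  if (httpStatus === 200 && body.success === true) {
    return {
      outcome: "dispatched",
      jobId: body.jobId,
      actionsUrl: body.actionsUrl ?? null,
    };
  }
  if (httpStatus === 502) {
    // The job record exists, but the downstream trigger was not acknowledged.
    return {
      outcome: "dispatch_failed",
      jobId: body.jobId,
      reason: body.errorMessage ?? body.message,
    };
  }
  // 500 responses carry only an `error` string and no jobId.
  return { outcome: "server_error", reason: body.error ?? "Unknown error" };
}
```

Note that a 502 still returns a jobId, so the failed job can be looked up later with GET.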

GET Response

Success (200 OK): Returns the full CrawlerJob database record.

{
  "success": true,
  "job": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "trigger_source": "admin",
    "created_at": "2024-01-15T10:30:00Z",
    "updated_at": "2024-01-15T10:35:00Z",
    "metadata": { "pages_crawled": 142, "errors": 0 }
  }
}

Not Found (404 Not Found): Returned when the jobId does not match any records in the database.
```
{
  "error": "Job not found"
}
```
Database Error (500 Internal Server Error): Returned on Supabase query failures or connection issues.
```
{
  "error": "Database query failed: relation \"CrawlerJob\" does not exist"
}
```

Help Message (200 OK): Returned when GET is called without a jobId parameter.

{
  "message": "Use POST to enqueue a crawler run (GitHub Actions / queue). Use GET ?jobId=<uuid> to fetch job status."
}
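
Because a 200 response to GET can contain either a job record or the help message, checking the status code alone is not enough. A minimal sketch of a response classifier follows; `interpretStatus` and the `kind` labels are hypothetical names, and the field names come from the sample responses above.

```javascript
// Hypothetical classifier for GET responses, based on the samples above.
function interpretStatus(httpStatus, body) {
  if (httpStatus === 200 && body.job) {
    return { kind: "job", status: body.job.status, job: body.job };
  }
  if (httpStatus === 200) {
    // No `job` field: the request was made without a jobId parameter.
    return { kind: "help", message: body.message };
  }
  if (httpStatus === 404) {
    return { kind: "not_found" };
  }
  return { kind: "error", reason: body.error ?? "Unknown error" };
}
```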

Usage Example

The following examples demonstrate how to trigger a new crawler job and subsequently check its status using curl and native JavaScript fetch.

Trigger a new crawler job (POST):

curl -X POST https://your-domain.com/api/admin/crawler \
  -H "Content-Type: application/json"

Check job status (GET):

curl -X GET "https://your-domain.com/api/admin/crawler?jobId=550e8400-e29b-41d4-a716-446655440000" \
  -H "Content-Type: application/json"

Programmatic Example (Node.js/Fetch):

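// Base URL of the deployment (placeholder; fetch in Node.js requires absolute URLs)
const BASE_URL = 'https://your-domain.com';

// 1. Dispatch crawler
const dispatchRes = await fetch(`${BASE_URL}/api/admin/crawler`, { method: 'POST' });
const dispatchData = await dispatchRes.json();

if (!dispatchData.success) {
  // 502 responses carry errorMessage; 500 responses carry error
  throw new Error(dispatchData.errorMessage || dispatchData.error || 'Dispatch failed');
}

const jobId = dispatchData.jobId;

// 2. Poll status until completion, giving up after a bounded number of attempts
const MAX_ATTEMPTS = 60; // arbitrary cap: 60 polls x 5s = 5 minutes
let jobStatus = 'pending';
for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
  const statusRes = await fetch(`${BASE_URL}/api/admin/crawler?jobId=${encodeURIComponent(jobId)}`);
  const statusData = await statusRes.json();
  if (!statusRes.ok) {
    // 404 and 500 responses contain no job field
    throw new Error(statusData.error || `Status check failed (${statusRes.status})`);
  }
  jobStatus = statusData.job.status;
  console.log(`Current status: ${jobStatus}`);
  if (jobStatus === 'completed' || jobStatus === 'failed') break;
  await new Promise(res => setTimeout(res, 5000)); // Wait 5s before next poll
}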

Common Pitfalls

Misinterpreting POST Response Timing: The POST endpoint only handles job enqueuing and dispatching. It does not wait for the crawler to finish processing. The response confirms that the job was accepted by the execution engine, but the actual crawling happens asynchronously. To track completion, poll GET ?jobId= with the returned job identifier, or rely on a webhook handler if your deployment provides one.
Missing Environment Configuration: The endpoint relies on NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY. If these are not defined in the runtime environment, the POST request will immediately return a 500 error. Ensure service-role keys are used for backend operations, as client-side keys lack the necessary permissions to write to the CrawlerJob table.
Case-Sensitive Query Parameters: The jobId query parameter must be spelled exactly as jobId (camelCase). Using jobid or JobId will result in the endpoint returning the help message instead of querying the database. Additionally, the jobId must match the exact UUID format returned by the initial POST request; partial matches or typos will trigger a 404 response.
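
Two of the pitfalls above can be caught before any request is made. The sketch below is illustrative only: `missingEnv`, `isUuid`, and the regular expression are assumptions, not part of the endpoint's implementation. The environment variable names are the ones listed above.

```javascript
// Environment variables the endpoint relies on, as listed above.
const REQUIRED_ENV = ["NEXT_PUBLIC_SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"];

// Hypothetical preflight check: returns the names of unset variables.
function missingEnv(env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Hypothetical format check for the canonical 8-4-4-4-12 hex UUID layout.
// It validates shape only; it cannot tell whether the job actually exists.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value) {
  return typeof value === "string" && UUID_RE.test(value);
}

// Usage: report configuration gaps at startup.
const missing = missingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Rejecting a malformed jobId on the client avoids a round trip that would only end in a 404.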

While /api/admin/crawler handles manual dispatch and direct job lookup, it is part of a broader crawler management ecosystem. Developers typically pair this endpoint with:

Job Queue Management Routes: Endpoints that list recent jobs, filter by status, or manage retry logic for failed crawls. These routes often consume the same CrawlerJob table and provide bulk operations.
Webhook/Callback Handlers: Routes that receive asynchronous updates from the execution engine (e.g., GitHub Actions or internal workers) to update the CrawlerJob table in real-time, enabling event-driven status tracking instead of polling.
Configuration & Scope Endpoints: APIs that define crawl targets, rate limits, and exclusion rules, which are referenced when the dispatcher initializes a new job.

For production deployments, ensure that access to /api/admin/crawler is restricted via network ACLs, reverse proxy rules, or internal routing, as the endpoint does not enforce application-level authentication.
