dy**: None.
GET Request
The GET method retrieves the status and metadata of a specific crawler job. It uses the jobId query parameter to locate the record in the CrawlerJob table. If jobId is omitted, the endpoint returns a help message instead of a job record.
- Headers: Standard HTTP headers.
- Body: None.
- Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| jobId | string | Optional | The unique identifier (UUID) of the crawler job. |
Example GET Request URL:
GET /api/admin/crawler?jobId=550e8400-e29b-41d4-a716-446655440000
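The request URL can also be assembled in code rather than by string concatenation. The following is a minimal sketch; the helper name buildStatusUrl and the BASE_URL placeholder are assumptions for illustration, not part of the API.

```javascript
// Hypothetical helper: builds the status URL for a given job.
// BASE_URL is a placeholder, not a real deployment.
const BASE_URL = 'https://your-domain.com';

function buildStatusUrl(jobId, baseUrl = BASE_URL) {
  const url = new URL('/api/admin/crawler', baseUrl);
  // The parameter name is case-sensitive: it must be exactly "jobId".
  url.searchParams.set('jobId', jobId);
  return url.toString();
}

console.log(buildStatusUrl('550e8400-e29b-41d4-a716-446655440000'));
// → https://your-domain.com/api/admin/crawler?jobId=550e8400-e29b-41d4-a716-446655440000
```

Using URL and searchParams keeps the value correctly encoded even if an unexpected character ends up in the identifier.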
Responses are returned as JSON objects. The structure varies based on the HTTP method and execution outcome.
POST Response
- Success (200 OK): Returned when the job is successfully enqueued and dispatched.
{
"success": true,
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "dispatched",
"dispatchTarget": "github_actions",
"githubAccepted": true,
"actionsUrl": "https://github.com/org/repo/actions/runs/12345678",
"errorMessage": null,
"message": "Crawler job successfully dispatched."
}
- Dispatch Failure (502 Bad Gateway): Returned when the job is created but the downstream dispatch mechanism fails to acknowledge the trigger.
{
"success": false,
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "dispatch_failed",
"dispatchTarget": "github_actions",
"githubAccepted": false,
"actionsUrl": null,
"errorMessage": "Failed to trigger GitHub Actions workflow.",
"message": "Job enqueued but dispatch failed."
}
- Server Error (500 Internal Server Error): Returned if required Supabase environment variables are missing or an unexpected exception occurs during execution.
{
"error": "Supabase is not configured for admin crawler API."
}
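A client usually needs to tell these three POST outcomes apart. The sketch below maps an HTTP status and parsed JSON body to a small result object; the function name and the result shape ({ ok, jobId, reason }) are illustrative assumptions, while the field names it reads come from the response bodies above.

```javascript
// Maps a POST response (HTTP status + parsed JSON body) to a small result object.
// The result shape is an assumption made for this example.
function interpretDispatchResponse(status, body) {
  if (status === 200 && body.success) {
    return { ok: true, jobId: body.jobId, actionsUrl: body.actionsUrl };
  }
  if (status === 502) {
    // The job was created, so jobId is still usable for a later status lookup.
    return { ok: false, jobId: body.jobId, reason: body.errorMessage || body.message };
  }
  // 500 responses carry only an "error" field.
  return { ok: false, jobId: null, reason: body.error || `Unexpected status ${status}` };
}

const failed = interpretDispatchResponse(502, {
  success: false,
  jobId: '550e8400-e29b-41d4-a716-446655440000',
  errorMessage: 'Failed to trigger GitHub Actions workflow.',
});
console.log(failed.ok, failed.reason);
// → false Failed to trigger GitHub Actions workflow.
```

Keeping the jobId on a 502 lets the caller inspect the dispatch_failed record later instead of losing track of it.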
GET Response
- Success (200 OK): Returns the full CrawlerJob database record.
{
"success": true,
"job": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"trigger_source": "admin",
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T10:35:00Z",
"metadata": { "pages_crawled": 142, "errors": 0 }
}
}
- Not Found (404 Not Found): Returned when the jobId does not match any records in the database.
{
"error": "Job not found"
}
- Database Error (500 Internal Server Error): Returned on Supabase query failures or connection issues.
{
"error": "Database query failed: relation \"CrawlerJob\" does not exist"
}
- Help Message (200 OK): Returned when GET is called without a jobId parameter.
{
"message": "Use POST to enqueue a crawler run (GitHub Actions / queue). Use GET ?jobId=<uuid> to fetch job status."
}
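Because both the job record and the help message arrive with a 200 status, a client cannot rely on the status code alone. The following sketch classifies a GET response by status and body; the function name and the "kind" labels are assumptions for illustration.

```javascript
// Classifies a GET response (HTTP status + parsed JSON body).
// The "kind" labels are made up for this example.
function interpretStatusResponse(status, body) {
  if (status === 200 && body.job) return { kind: 'job', job: body.job };
  // A 200 without a "job" object is the help message (jobId was not supplied).
  if (status === 200 && body.message) return { kind: 'help', message: body.message };
  if (status === 404) return { kind: 'not_found', error: body.error };
  return { kind: 'error', error: body.error || `Unexpected status ${status}` };
}

console.log(interpretStatusResponse(404, { error: 'Job not found' }).kind);
// → not_found
```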
Usage Example
The following examples demonstrate how to trigger a new crawler job and subsequently check its status using curl and native JavaScript fetch.
Trigger a new crawler job (POST):
curl -X POST https://your-domain.com/api/admin/crawler \
-H "Content-Type: application/json"
Check job status (GET):
curl -X GET "https://your-domain.com/api/admin/crawler?jobId=550e8400-e29b-41d4-a716-446655440000" \
-H "Content-Type: application/json"
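Programmatic Example (Node.js/Fetch):
// Relative URLs resolve in the browser; in Node.js, pass an absolute URL instead.
// 1. Dispatch crawler
const dispatchRes = await fetch('/api/admin/crawler', { method: 'POST' });
const dispatchData = await dispatchRes.json();
if (!dispatchData.success) {
  // 502 bodies carry "errorMessage"; 500 bodies carry "error"
  throw new Error(dispatchData.errorMessage || dispatchData.error || 'Dispatch failed');
}
const jobId = dispatchData.jobId;
// 2. Poll status until completion
let jobStatus = 'pending';
while (jobStatus !== 'completed' && jobStatus !== 'failed') {
  await new Promise(res => setTimeout(res, 5000)); // Wait 5s before each poll
  const statusRes = await fetch(`/api/admin/crawler?jobId=${encodeURIComponent(jobId)}`);
  const statusData = await statusRes.json();
  if (!statusRes.ok || !statusData.job) {
    // 404 and 500 responses contain "error" and no "job" object
    throw new Error(statusData.error || `Status check failed (${statusRes.status})`);
  }
  jobStatus = statusData.job.status;
  console.log(`Current status: ${jobStatus}`);
}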
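The polling loop above keeps running for as long as the job stays in a non-terminal status. One way to bound it is to cap the number of attempts. The sketch below is a variation under stated assumptions: pollJob, fetchJob, maxAttempts, and delayMs are hypothetical names, and the terminal statuses are the two used in the loop above.

```javascript
// Polls until the job reaches a terminal status or the attempt budget runs out.
// fetchJob is an injected async function (jobId) => job record, so the loop
// can be exercised without a network. The default limits are assumptions.
async function pollJob(fetchJob, jobId, { maxAttempts = 60, delayMs = 5000 } = {}) {
  const terminal = new Set(['completed', 'failed']);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (terminal.has(job.status)) return job;
    if (attempt < maxAttempts) {
      // Wait between polls, but not after the final attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}

// Example wiring against the endpoint (the relative URL assumes a browser context).
async function fetchJobFromApi(jobId) {
  const res = await fetch(`/api/admin/crawler?jobId=${encodeURIComponent(jobId)}`);
  const data = await res.json();
  if (!res.ok || !data.job) throw new Error(data.error || `HTTP ${res.status}`);
  return data.job;
}
```

A call such as pollJob(fetchJobFromApi, jobId) then either resolves with the final job record or rejects once the attempt budget is exhausted.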
Common Pitfalls
- Misinterpreting POST Response Timing: The POST endpoint only enqueues and dispatches the job. It does not wait for the crawler to finish. The response confirms that the execution engine accepted the job; the crawl itself runs asynchronously. To track completion, poll the GET ?jobId= endpoint.
- Missing Environment Configuration: The endpoint relies on NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY. If either is not defined in the runtime environment, the POST request immediately returns a 500 error. Use the service-role key for backend operations, as client-side keys lack the permissions needed to write to the CrawlerJob table.
- Case-Sensitive Query Parameters: The jobId query parameter must be spelled exactly as jobId (camelCase). Using jobid or JobId causes the endpoint to return the help message instead of querying the database. The value must also match the exact UUID returned by the initial POST request; a partial or mistyped value triggers a 404 response.
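Two of these pitfalls can be checked mechanically before making a request. The sketch below is illustrative only: missingEnv and readJobId are hypothetical helpers, and it assumes the route reads the parameter with a case-sensitive lookup such as URLSearchParams.get, which is consistent with the behavior described above but not confirmed by it.

```javascript
const REQUIRED_ENV = ['NEXT_PUBLIC_SUPABASE_URL', 'SUPABASE_SERVICE_ROLE_KEY'];

// Returns the names of required variables that are unset or empty.
function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// URLSearchParams lookups are case-sensitive, so only "jobId" matches.
function readJobId(queryString) {
  return new URLSearchParams(queryString).get('jobId');
}

console.log(missingEnv({ NEXT_PUBLIC_SUPABASE_URL: 'https://example.supabase.co' }));
// → [ 'SUPABASE_SERVICE_ROLE_KEY' ]
console.log(readJobId('?jobid=abc')); // → null
console.log(readJobId('?jobId=abc')); // → abc
```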
While /api/admin/crawler handles manual dispatch and direct job lookup, it is part of a broader crawler management ecosystem. Developers typically pair this endpoint with:
- Job Queue Management Routes: Endpoints that list recent jobs, filter by status, or manage retry logic for failed crawls. These routes often consume the same CrawlerJob table and provide bulk operations.
- Webhook/Callback Handlers: Routes that receive asynchronous updates from the execution engine (e.g., GitHub Actions or internal workers) to update the CrawlerJob table in real-time, enabling event-driven status tracking instead of polling.
- Configuration & Scope Endpoints: APIs that define crawl targets, rate limits, and exclusion rules, which are referenced when the dispatcher initializes a new job.
For production deployments, ensure that access to /api/admin/crawler is restricted via network ACLs, reverse proxy rules, or internal routing, as the endpoint does not enforce application-level authentication.