| custom | devto",
"label": "string (max 120 chars)",
"tags": ["string (max 80 chars each)"],
"feedUrl": "string (max 2048 chars)",
"siteUrl": "string (max 2048 chars)",
"articlesPerTag": 5,
"contentTrack": "kb | blog | ai",
"priority": "P0 | P1 | P2",
"expectedCategory": "string (max 160 chars)",
"crawlStrategy": "string (max 500 chars)",
"extractTemplate": "github_trending | tech_blog | generic_rss | roadmap_nodes | none"
}
]
}
}
```
### Field Validation Rules
| Field | Type | Constraints & Defaults |
|-------|------|------------------------|
| `schedule` | `object` | Merged with existing config. Invalid keys are ignored. |
| `advanced` | `object` | Merged with existing config. Invalid keys are ignored. |
| `sources` | `array` | If omitted or not an array, falls back to current sources. Must contain at least 1 item. |
| `id` | `string` | Max 64 characters. Auto-generated if missing. |
| `enabled` | `boolean` | Coerced via `Boolean()`. |
| `type` | `string` | Allowed: `rss`, `custom`, `devto`. Defaults to `devto` if omitted or invalid. |
| `label` | `string` | Max 120 characters. Defaults to `未命名` ("Unnamed") if empty. |
| `tags` | `string[]` | Each tag truncated to 80 characters. Empty strings filtered out. |
| `feedUrl` / `siteUrl` | `string` | Max 2048 characters. Optional. |
| `articlesPerTag` | `number` | Clamped to `1–30`. Defaults to `5` if missing or invalid. |
| `contentTrack` | `string` | Allowed: `kb`, `blog`, `ai`. Otherwise `undefined`. |
| `priority` | `string` | Allowed: `P0`, `P1`, `P2`. Otherwise `undefined`. |
| `expectedCategory` | `string` | Max 160 characters. Optional. |
| `crawlStrategy` | `string` | Max 500 characters. Optional. |
| `extractTemplate` | `string` | Allowed: `github_trending`, `tech_blog`, `generic_rss`, `roadmap_nodes`, `none`. Otherwise `undefined`. |
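The rules in the table can be mirrored client-side to preview how a source entry is likely to be normalized. The following is a minimal sketch, not the server's implementation: `normalizeSource`, `pickEnum`, `truncate`, and `clampArticlesPerTag` are hypothetical names, the ID fallback format is an assumption, and only a subset of fields is covered.

```typescript
// Hypothetical client-side mirror of the documented rules. The server's real
// implementation is not shown in this document and may differ in detail.
type SourceType = "rss" | "custom" | "devto";
type ContentTrack = "kb" | "blog" | "ai";
type Priority = "P0" | "P1" | "P2";

interface NormalizedSource {
  id: string;
  enabled: boolean;
  type: SourceType;
  label: string;
  tags: string[];
  articlesPerTag: number;
  contentTrack: ContentTrack | undefined;
  priority: Priority | undefined;
}

// Returns the value if it is one of the allowed strings, otherwise undefined.
function pickEnum<T extends string>(value: unknown, allowed: readonly T[]): T | undefined {
  if (typeof value !== "string") return undefined;
  return allowed.indexOf(value as T) !== -1 ? (value as T) : undefined;
}

// Non-strings become ""; strings are cut to at most `max` characters.
function truncate(value: unknown, max: number): string {
  return typeof value === "string" ? value.slice(0, max) : "";
}

// Clamp to 1..30; fall back to 5 for missing or non-numeric input.
function clampArticlesPerTag(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return 5;
  return Math.min(30, Math.max(1, value));
}

function normalizeSource(raw: Record<string, unknown>): NormalizedSource {
  const rawTags: unknown[] = Array.isArray(raw["tags"]) ? raw["tags"] : [];
  return {
    // Assumption: the real auto-generated ID format is not documented.
    id: truncate(raw["id"], 64) || `src-${Date.now()}`,
    enabled: Boolean(raw["enabled"]),
    type: pickEnum<SourceType>(raw["type"], ["rss", "custom", "devto"]) ?? "devto",
    label: truncate(raw["label"], 120) || "未命名",
    tags: rawTags.map((t: unknown) => truncate(t, 80)).filter((t) => t.length > 0),
    articlesPerTag: clampArticlesPerTag(raw["articlesPerTag"]),
    contentTrack: pickEnum<ContentTrack>(raw["contentTrack"], ["kb", "blog", "ai"]),
    priority: pickEnum<Priority>(raw["priority"], ["P0", "P1", "P2"]),
  };
}
```

Comparing the output of such a helper with the payload you intended to send is one way to spot values that would be truncated or replaced by defaults.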
## Response Format
All responses return a JSON object with an `ok` boolean flag. Successful responses include the updated or current configuration.
### Success Response (`200 OK`)
```json
{
  "ok": true,
  "config": {
    "version": 1,
    "schedule": { /* normalized schedule */ },
    "advanced": { /* normalized advanced settings */ },
    "sources": [ /* validated source array */ ],
    "lastRun": "2024-01-15T08:30:00.000Z"
  }
}
```
### Error Responses
| Status | Condition | Example Payload |
|--------|-----------|-----------------|
| `400` | Invalid JSON, missing `config` object, empty `sources` array, or validation failure | `{"ok": false, "error": "Missing config"}` |
| `401` | PUT request lacks valid authorization | `{"ok": false, "error": "Unauthorized"}` |
| `404` | Local crawler UI feature is disabled in platform settings | `{"ok": false, "error": "Not found"}` |
| `500` | Unexpected server or file I/O failure | `{"ok": false, "error": "Internal server error"}` |
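A client can branch on the `ok` flag and the HTTP status together. The sketch below assumes the response shapes shown in this section; `ConfigResponse`, `explainFailure`, and `unwrapConfig` are hypothetical names, and the hint strings are illustrative rather than messages returned by the platform.

```typescript
// Response shapes as documented above; `config` is left loosely typed.
interface ConfigSuccess {
  ok: true;
  config: Record<string, unknown>;
}
interface ConfigFailure {
  ok: false;
  error: string;
}
type ConfigResponse = ConfigSuccess | ConfigFailure;

// Map a documented status code to a troubleshooting hint (wording is illustrative).
function explainFailure(httpStatus: number, body: ConfigFailure): string {
  switch (httpStatus) {
    case 400:
      return `Rejected payload: ${body.error}`;
    case 401:
      return "PUT needs a valid Authorization header";
    case 404:
      return "Local crawler UI feature is disabled";
    case 500:
      return `Server-side failure: ${body.error}`;
    default:
      return `Unexpected status ${httpStatus}: ${body.error}`;
  }
}

// Return the config on success; throw with a readable hint otherwise.
function unwrapConfig(httpStatus: number, body: ConfigResponse): Record<string, unknown> {
  if (body.ok === true) {
    return body.config;
  }
  throw new Error(explainFailure(httpStatus, body));
}
```

Checking for `404` before anything else is usually worthwhile, since a disabled feature affects GET and PUT alike.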
## Usage Example
### Retrieve Current Configuration
```bash
curl -X GET "https://your-instance.com/api/local-only/config" \
  -H "Accept: application/json"
```
### Update Configuration
```bash
curl -X PUT "https://your-instance.com/api/local-only/config" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "config": {
      "schedule": { "interval": "daily", "time": "02:00" },
      "advanced": { "concurrency": 4, "timeout": 30000 },
      "sources": [
        {
          "id": "tech-rss-01",
          "enabled": true,
          "type": "rss",
          "label": "Engineering Blog",
          "tags": ["frontend", "performance"],
          "feedUrl": "https://example.com/feed.xml",
          "siteUrl": "https://example.com",
          "articlesPerTag": 10,
          "contentTrack": "blog",
          "priority": "P1",
          "extractTemplate": "tech_blog"
        }
      ]
    }
  }'
```
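The same PUT request can be assembled programmatically. This is a minimal sketch under the assumption that the endpoint path and headers match the curl example; `buildConfigUpdate` and `PutRequest` are hypothetical names, and the actual network call is left as a comment so the block has no side effects.

```typescript
interface PutRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Build the URL and fetch options for a config update.
// The payload is wrapped in a top-level "config" key, as in the curl example.
function buildConfigUpdate(
  baseUrl: string,
  token: string,
  config: Record<string, unknown>
): PutRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/local-only/config`,
    init: {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ config }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildConfigUpdate("https://your-instance.com", token, {
//     schedule: { interval: "daily", time: "02:00" },
//   });
//   const res = await fetch(url, init);
```

Keeping request construction separate from the `fetch` call makes the payload easy to inspect or log before it is sent.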
## Common Pitfalls
- Environment Guard Returns 404: The endpoint is wrapped in `assertLocalCrawlerUiEnabled()`. If the platform's local crawler UI is disabled in environment variables or feature flags, both GET and PUT return `404 Not Found`. Verify that the feature is enabled before troubleshooting authentication or payload issues.
- Partial vs Full Replacement Behavior: The `schedule` and `advanced` objects are merged with existing values, so you can send partial updates. The `sources` array, however, is fully replaced when provided. If you omit `sources` in a PUT request, the platform falls back to the current array. Sending an empty `sources` array triggers a `400` error with the message `至少保留一个数据源` ("Keep at least one data source").
- Silent Truncation & Type Coercion: The endpoint does not reject oversized strings or invalid enums; it truncates or coerces them. For example, a `type` value of `"web"` becomes `"devto"`, and a `label` of 200 characters is silently cut to 120. Always validate payloads client-side or inspect the `200 OK` response to confirm how the platform normalized your input.
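The merge-versus-replace behavior described above can be illustrated with a small model. This is a sketch of the documented semantics only, not the platform's code: `applyUpdate` and `CrawlerConfig` are hypothetical names, and the filtering of invalid keys in `schedule` and `advanced` is deliberately omitted.

```typescript
interface CrawlerConfig {
  schedule: Record<string, unknown>;
  advanced: Record<string, unknown>;
  sources: Array<Record<string, unknown>>;
}

// Model of the documented update semantics:
// - schedule/advanced: shallow-merged into the current values
// - sources: replaced wholesale when an array is provided, kept otherwise
// - an empty sources array is rejected (the endpoint answers 400)
function applyUpdate(current: CrawlerConfig, incoming: Partial<CrawlerConfig>): CrawlerConfig {
  const sources = Array.isArray(incoming.sources) ? incoming.sources : current.sources;
  if (sources.length === 0) {
    throw new Error("至少保留一个数据源"); // "Keep at least one data source"
  }
  return {
    schedule: { ...current.schedule, ...(incoming.schedule ?? {}) },
    advanced: { ...current.advanced, ...(incoming.advanced ?? {}) },
    sources,
  };
}
```

Under this model, sending `{ "schedule": { "time": "03:00" } }` keeps the existing `interval` and the existing sources, while sending a one-element `sources` array discards every source that was previously configured.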
This configuration endpoint is part of the local crawler management suite and is typically used together with:
- Crawler Execution Endpoints: Trigger manual crawls or schedule jobs that immediately respect the updated `schedule`, `sources`, and `extractTemplate` values.
- Status & Logging Endpoints: Monitor crawl progress, retrieve execution history, and debug extraction failures after configuration changes.
- Feature Flag Management Routes: Enable or disable the local crawler UI itself, which controls whether this endpoint remains accessible.
After a successful PUT request, the new configuration is persisted to the platform's local storage and becomes active for all subsequent crawler runs without requiring a server restart.