# Crawl a Site
Crawl docs or marketing pages into RogerIQ knowledge.
RogerIQ can crawl public websites and turn the pages it indexes into knowledge articles.
## Start a Crawl
json{ "url": "https://example.com/docs", "maxPages": 100, "maxDepth": 3}
## Defaults and Limits
| Option | Default | Limit |
|---|---|---|
| maxPages | 100 | 500 |
| maxDepth | 3 | 5 |
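Whether the API rejects or silently clamps out-of-range values is not documented here, so it can help to clamp options client-side before sending (a sketch using the defaults and limits above):

```typescript
// Clamp crawl options to the documented defaults and limits.
interface CrawlOptions {
  url: string;
  maxPages?: number;
  maxDepth?: number;
}

function withLimits(opts: CrawlOptions): Required<CrawlOptions> {
  return {
    url: opts.url,
    maxPages: Math.min(opts.maxPages ?? 100, 500), // default 100, limit 500
    maxDepth: Math.min(opts.maxDepth ?? 3, 5),     // default 3, limit 5
  };
}
```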
## Crawl Output
Each indexed page can become a knowledge article with crawl metadata such as the following (a possible shape is sketched after this list):
- render method
- content type
- crawl duration
- content length
- crawled timestamp
- source URL
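As a rough illustration, that metadata might be shaped like this type (field names are assumptions, not RogerIQ's documented schema):

```typescript
// Assumed field names -- illustrative only, not the documented schema.
interface CrawlMetadata {
  renderMethod: "static" | "browser"; // how the page was rendered
  contentType: string;                // e.g. "text/html"
  crawlDurationMs: number;            // time spent fetching and processing
  contentLength: number;              // size of the extracted content
  crawledAt: string;                  // ISO 8601 timestamp of the crawl
  sourceUrl: string;                  // the page the article came from
}
```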
## Recrawling
Trigger a recrawl when the source page has changed and its RogerIQ article should be refreshed.
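A minimal recrawl sketch, assuming a hypothetical `POST /v1/crawls/{id}/recrawl` endpoint and the same auth as above:

```typescript
// Hypothetical recrawl endpoint -- adjust to your RogerIQ setup.
async function recrawl(crawlId: string): Promise<void> {
  const res = await fetch(
    `https://api.rogeriq.example/v1/crawls/${crawlId}/recrawl`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${process.env.ROGERIQ_API_KEY}` }, // assumed auth
    },
  );
  if (!res.ok) throw new Error(`Recrawl failed: ${res.status}`);
}
```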
## When Crawls Fail
Common causes:
- page blocks bots
- page requires authentication
- content is rendered only by client-side behavior the crawler does not support
- page is too large
- link graph exceeds max depth
- rate limits or transient network errors (often retryable; see the sketch after this list)
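Only the last cause is usually transient; the others need a change to the page or to the crawl settings. A generic retry-with-backoff sketch for transient failures (not a documented RogerIQ feature):

```typescript
// Retry a flaky async call with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let delayMs = 1000;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
}

// Usage with the hypothetical startCrawl helper above:
// const crawlId = await withRetry(() => startCrawl("https://example.com/docs"));
```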
For product docs you control, HolyDocs sync is usually cleaner than generic crawling because it sends structured page content and stable external IDs.
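For contrast, a sync payload might carry structured fields like these (entirely illustrative; not the HolyDocs schema):

```typescript
// Illustrative sync payload -- field names are assumptions.
const syncPage = {
  externalId: "docs/getting-started", // stable ID: updates replace the same article
  title: "Getting Started",
  body: "<structured page content>",
  url: "https://docs.example.com/getting-started",
};
```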