Crawl a Site

RogerIQ can crawl public websites and create knowledge articles from the pages it indexes.

Start a Crawl

```json
{ "url": "https://example.com/docs", "maxPages": 100, "maxDepth": 3 }
```

Defaults and Limits

| Option   | Default | Limit |
| -------- | ------- | ----- |
| maxPages | 100     | 500   |
| maxDepth | 3       | 5     |
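
One way to apply these defaults and limits on the client side is sketched below. `normalize_crawl_options` is an illustrative helper, not part of RogerIQ's API, and clamping (rather than rejecting) out-of-range values is an assumption:

```python
# Documented defaults and hard limits for crawl options.
DEFAULTS = {"maxPages": 100, "maxDepth": 3}
LIMITS = {"maxPages": 500, "maxDepth": 5}

def normalize_crawl_options(options):
    """Fill in missing options with defaults and clamp values to the limits.

    Clamping is an assumption for this sketch; the service may instead
    reject out-of-range values.
    """
    normalized = {}
    for key, default in DEFAULTS.items():
        value = options.get(key, default)
        normalized[key] = min(value, LIMITS[key])
    return normalized
```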

Crawl Output

Each indexed page can become a knowledge article with crawl metadata such as:

  • render method
  • content type
  • crawl duration
  • content length
  • crawled timestamp
  • source URL
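
A metadata record covering the fields above might look like the sketch below. The field names and values are illustrative assumptions, not RogerIQ's documented schema:

```python
# Illustrative crawl metadata for one indexed page (field names assumed).
article_metadata = {
    "sourceUrl": "https://example.com/docs/getting-started",
    "renderMethod": "static",          # how the page was rendered for crawling
    "contentType": "text/html",
    "contentLength": 48210,            # bytes of extracted content
    "crawlDurationMs": 1350,
    "crawledAt": "2024-01-01T00:00:00Z",
}
```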

Recrawling

Use recrawl when the source page has changed and the corresponding RogerIQ article should be refreshed.
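
One simple way to detect that a source page has changed is to compare a stored content hash; this is a sketch, and RogerIQ's actual change detection is not specified in this document:

```python
import hashlib

def needs_recrawl(stored_hash, current_page_bytes):
    """Return True when the page content no longer matches the stored hash.

    `stored_hash` would be saved when the page was first crawled; comparing
    hashes is one illustrative change check among several possible.
    """
    return hashlib.sha256(current_page_bytes).hexdigest() != stored_hash
```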

When Crawls Fail

Common causes:

  • page blocks bots
  • page requires authentication
  • content renders only after client-side behavior the crawler does not support (for example, interaction-driven JavaScript)
  • page is too large
  • link graph exceeds max depth
  • rate limits or transient network errors

For product docs you control, HolyDocs sync is usually cleaner than generic crawling because it sends structured page content and stable external IDs.
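
A structured sync record with a stable external ID might look like the sketch below. The field names are assumptions, not HolyDocs' documented schema; the point is that a stable `externalId` lets repeated syncs update the same article rather than create duplicates:

```python
def build_sync_payload(external_id, title, body_markdown, source_url):
    """Sketch of a structured sync record (field names assumed).

    Reusing the same external_id on every sync updates the existing
    article instead of creating a new one.
    """
    return {
        "externalId": external_id,
        "title": title,
        "body": body_markdown,
        "sourceUrl": source_url,
    }
```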
