Fetch web pages and extract readable content for AI use. Use when reading, summarizing, or crawling a specific URL or small set of URLs. Prefer low-friction URL-to-Markdown services first, then fall back to browser-based retrieval, search snippets, or cached/indexed copies when sites are protected by Cloudflare or similar bot checks.
Fetch readable web content with a reliability-first fallback chain.
Do not promise direct access to every site. Some sites use Cloudflare, login walls, bot detection, or legal restrictions. In those cases, switch to the next fallback instead of insisting the first method should work.
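The "switch to the next fallback" rule can be sketched as a small loop over fetch methods. This is only an illustration: `first_success` and the `(name, fetch)` pairs are hypothetical names, not part of the bundled script, and each `fetch` is assumed to be any callable that returns text or `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a fetcher takes a URL and returns text, or None on failure.
Fetcher = Callable[[str], Optional[str]]


def first_success(
    url: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, fetch) pair in order; return the first non-empty result."""
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception:
            # A failing method is not retried here; move on to the next one.
            content = None
        if content:
            return name, content
    # Every method failed or returned nothing.
    return None, None
```

The caller gets back the name of the method that worked, which makes it easy to report how the content was obtained.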
Try lightweight conversion services first:
r.jina.ai
https://r.jina.ai/http://example.com
markdown.new
https://markdown.new/https://example.com
defuddle
https://defuddle.md/https://example.com
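The three URL patterns above all prepend a service prefix to the target URL, so they can be generated in one place. A minimal sketch, assuming the prefixes shown above; `reader_urls` is a hypothetical helper name and is not taken from the bundled script.

```python
from typing import List, Tuple

# Service prefixes copied from the examples above, in the order to try them.
READER_PREFIXES = (
    ("r.jina.ai", "https://r.jina.ai/"),
    ("markdown.new", "https://markdown.new/"),
    ("defuddle", "https://defuddle.md/"),
)


def reader_urls(target: str) -> List[Tuple[str, str]]:
    """Return (service name, reader URL) pairs for the target URL."""
    return [(name, prefix + target) for name, prefix in READER_PREFIXES]
```

For example, `reader_urls("https://example.com")` yields `https://markdown.new/https://example.com` as its second entry.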
For deterministic retries, use the bundled script:
python {baseDir}/scripts/fetch_url.py "https://example.com/article"
The script returns JSON with:
Use these when the user wants article text, page summaries, or structured extraction from normal public pages.
Treat the fetch as failed or unreliable if you see signs like:
Just a moment...
Performing security verification
Enable JavaScript and cookies
When this happens, stop treating the result as the page content.
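The check for these challenge-page phrases can be sketched as a case-insensitive substring test. `BLOCK_MARKERS` and `looks_blocked` are hypothetical names used for illustration; the marker list contains only the phrases given above and is not exhaustive.

```python
# Phrases taken from the list above; lowercased for case-insensitive matching.
BLOCK_MARKERS = (
    "just a moment...",
    "performing security verification",
    "enable javascript and cookies",
)


def looks_blocked(text: str) -> bool:
    """Return True if the fetched text contains a known challenge-page phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

A positive result means the fetch should be treated as failed and the next fallback tried. A genuine article that quotes one of these phrases would also match, so treat the check as a heuristic.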
For sites blocked behind Cloudflare or requiring real browser execution:
Use browser fallback for:
If direct fetch and browser fetch are not available or still fail:
This is often enough for metadata tasks like:
If a site is inconsistent, return a mixed result instead of stalling:
r.jina.ai
markdown.new
defuddle
When extracting structured data, prefer columns like:
(direct, browser, search, secondary)
(high, medium, low)