A Playwright-based web scraping OpenClaw Skill with anti-bot protection, successfully tested on complex sites such as Discuss.com.hk. Choose the best approach based on the target website's anti-bot level.
| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
cd playwright-scraper-skill
npm install
npx playwright install chromium
Use OpenClaw's built-in web_fetch tool:
# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com
Use Playwright Simple:
node scripts/playwright-simple.js "https://example.com"
Example output:
{
"url": "https://example.com",
"title": "Example Domain",
"content": "...",
"elapsedSeconds": "3.45"
}
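To make the output shape above concrete, here is a minimal sketch of how a script could produce it. It is not the actual contents of scripts/playwright-simple.js: the function names `scrapePage` and `main`, the 3000 ms default wait, and the use of the page's body text as `content` are all assumptions made for illustration.

```javascript
// Sketch only: the real scripts/playwright-simple.js may work differently.
// `page` is expected to be a Playwright Page (or an object with the same methods).
async function scrapePage(page, url, waitTimeMs = 3000) {
  const startedAt = Date.now();

  // Load the document, then give client-side JavaScript time to render.
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  await page.waitForTimeout(waitTimeMs);

  const title = await page.title();
  const content = await page.innerText('body'); // visible text (assumed choice)

  return {
    url,
    title,
    content,
    // Seconds as a two-decimal string, matching the "3.45" style shown above.
    elapsedSeconds: ((Date.now() - startedAt) / 1000).toFixed(2),
  };
}

// Hypothetical entry point; requires the playwright package to be installed.
async function main(url) {
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const result = await scrapePage(page, url);
    console.log(JSON.stringify(result, null, 2));
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Keeping `scrapePage` separate from browser start-up means it can be exercised with a stub page object, without launching Chromium.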
Use Playwright Stealth:
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
Features:
- navigator.webdriver = false

Use deep-scraper (install separately):
# Install deep-scraper skill
npx clawhub install deep-scraper
# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
- scripts/playwright-simple.js
- scripts/playwright-stealth.js ⭐

If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool, which is fastest.
If you need to wait for JavaScript rendering, use playwright-simple.js.
If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.
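The three rules above can be written as a small decision helper. This is only an illustrative sketch: `chooseMethod` and its input fields (`status`, `cloudflareChallenge`, `needsJsRendering`) are hypothetical names and are not part of the skill.

```javascript
// Hypothetical helper mirroring the three rules above; not shipped with the skill.
function chooseMethod({ status = 200, cloudflareChallenge = false, needsJsRendering = false } = {}) {
  // A 403 response or a Cloudflare challenge calls for the stealth script.
  if (status === 403 || cloudflareChallenge) {
    return 'scripts/playwright-stealth.js';
  }
  // Content that only appears after JavaScript runs needs a real browser.
  if (needsJsRendering) {
    return 'scripts/playwright-simple.js';
  }
  // Static pages: the built-in tool is the fastest option.
  return 'web_fetch';
}

console.log(chooseMethod({ needsJsRendering: true }));
```

The checks run from the most restrictive case to the least, so a site that is both dynamic and blocked is routed to the stealth script.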
All scripts support environment variables:
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL
# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL
# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
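As a rough sketch of how a script might read the variables listed above, the helper below turns them into a configuration object. The function name `readConfig` and every default value (for example the 5000 ms wait) are assumptions; the actual scripts may use different defaults or parsing.

```javascript
// Sketch: map the documented environment variables to a config object.
// All defaults here are assumed, not taken from the real scripts.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || null,        // null: use the script's default
    waitTime: Number.isNaN(waitTime) ? 5000 : waitTime, // milliseconds
    headless: env.HEADLESS !== 'false',                 // headless unless explicitly disabled
    saveHtml: env.SAVE_HTML === 'true',                 // opt-in
    userAgent: env.USER_AGENT || null,                  // null: keep the browser default
  };
}

console.log(readConfig({ WAIT_TIME: '10000', HEADLESS: 'false' }));
```

Passing `env` as a parameter rather than reading `process.env` directly keeps the parsing easy to check in isolation.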
| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |
Lessons learned from our testing:
- navigator.webdriver: Essential
- addInitScript (Playwright): Inject before page load

Solution: Use playwright-stealth.js
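The two points about navigator.webdriver and addInitScript can be sketched as follows. This is an assumed illustration of the technique, not the code in scripts/playwright-stealth.js; the names `HIDE_WEBDRIVER_SCRIPT` and `newStealthPage` are hypothetical, and a real stealth script likely applies further measures.

```javascript
// Source text that Playwright evaluates in each page before any site script runs.
const HIDE_WEBDRIVER_SCRIPT = `
  Object.defineProperty(navigator, 'webdriver', { get: () => false });
`;

// `browser` is expected to be a Playwright Browser; `userAgent` is optional.
async function newStealthPage(browser, userAgent) {
  const context = await browser.newContext(userAgent ? { userAgent } : {});
  // addInitScript applies to every new document opened in this context,
  // so the override is in place before the page's own code can check the flag.
  await context.addInitScript(HIDE_WEBDRIVER_SCRIPT);
  return context.newPage();
}
```

Registering the script on the context rather than on a single page means popups and new tabs opened from that context receive the same override.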
Solution:
- headless: false (headful mode sometimes has a higher success rate)

Solution:
- waitForTimeout
- waitUntil: 'networkidle' or 'domcontentloaded'

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)
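One possible way to combine the waitUntil and waitForTimeout suggestions above is to try 'networkidle' first and fall back to 'domcontentloaded' if navigation times out. The fallback order, the helper name `loadWithFallback`, and the default timings are assumptions for illustration rather than the behavior of the bundled scripts.

```javascript
// Sketch: navigate with a strict wait condition, then relax it on timeout.
// `page` is expected to be a Playwright Page (or a compatible stub).
async function loadWithFallback(page, url, { waitTimeMs = 5000, timeoutMs = 30000 } = {}) {
  let strategy = 'networkidle';
  try {
    await page.goto(url, { waitUntil: strategy, timeout: timeoutMs });
  } catch (err) {
    // Only retry on timeouts; other failures (DNS, TLS, ...) are re-thrown.
    if (err.name !== 'TimeoutError') throw err;
    // Pages with constant background traffic may never become idle.
    strategy = 'domcontentloaded';
    await page.goto(url, { waitUntil: strategy, timeout: timeoutMs });
  }
  // Extra fixed delay so late-rendered content has a chance to appear.
  await page.waitForTimeout(waitTimeMs);
  return strategy; // which wait condition finally succeeded
}
```

The returned strategy can be logged to see how often a given site needs the fallback.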
browser tool