Web Scraping

Web scraping: extract structured data from websites using Python (httpx + BeautifulSoup/selectolax), and handle pagination, rate limiting, and anti-bot patterns. Use when extracting data from web pages.

Author: jarbitechture · Stars: 0 · Updated: 2026-04-12
Category: Debugging

Purpose

Extract structured data from websites. Handle pagination, rate limiting, JavaScript rendering, and anti-bot measures.

When to Use

User says: "scrape", "extract from website", "download data from", "crawl"
Context: data collection from web sources

Stack

HTTP: httpx (async support; HTTP/2 via the optional `httpx[http2]` extra)
Parsing: BeautifulSoup4 or selectolax (faster)
JS rendering: Playwright (when content requires JavaScript)
Rate limiting: respect robots.txt, 1-2 second delays between requests
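The 1-2 second delay guideline can be enforced with a small helper that every request passes through. This is a minimal standard-library sketch; the class name `PoliteDelay` is hypothetical, and the clock and sleep functions are injectable only so it can be tested without real waiting.

```python
import random
import time
from typing import Callable, Optional


class PoliteDelay:
    """Enforce a randomized pause between consecutive requests (hypothetical helper).

    Defaults follow the 1-2 second guideline. Clock and sleep are injectable so
    the limiter can be exercised in tests without actually waiting.
    """

    def __init__(
        self,
        min_delay: float = 1.0,
        max_delay: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if min_delay < 0 or max_delay < min_delay:
            raise ValueError("require 0 <= min_delay <= max_delay")
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous request was released

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            # Draw a fresh gap each time so requests are not perfectly periodic.
            target = random.uniform(self.min_delay, self.max_delay)
            elapsed = self._clock() - self._last
            if elapsed < target:
                slept = target - elapsed
                self._sleep(slept)
        self._last = self._clock()
        return slept
```

In a crawl loop you would call `delay.wait()` immediately before each `client.get(url)`; the first call returns at once and later calls sleep only for whatever part of the gap has not already elapsed.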

Patterns

CSS selectors: soup.select("div.class > a[href]")
Pagination: follow next links or increment page params
Tables: pandas.read_html(url) for simple HTML tables (it returns a list of DataFrames, one per <table>)
JSON APIs: check the browser's Network tab; many sites load their data from JSON endpoints behind the HTML
Retry with backoff: tenacity library or manual exponential backoff
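The "follow next links" pagination pattern can be sketched with only the standard library. This assumes the site marks its next-page link with `rel="next"`; if it uses a class instead, swap the parser for a BeautifulSoup `select_one(...)` call. The names `NextLinkParser` and `follow_next_links` are hypothetical, and `fetch` stands in for whatever function returns a page's HTML.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional, Set, Tuple
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> or <link> whose rel includes "next"."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.next_href is not None or tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()  # rel may hold several tokens
        if "next" in rels and attr.get("href"):
            self.next_href = attr["href"]


def follow_next_links(
    start_url: str,
    fetch: Callable[[str], str],
    max_pages: int = 50,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) per page, following rel="next" links until none remain."""
    url: Optional[str] = start_url
    seen: Set[str] = set()  # guards against pages that link back to themselves
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        parser.close()
        # Resolve relative hrefs such as "/p/2" against the current page URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
```

The `max_pages` cap and the `seen` set are defensive choices: a broken or looping "next" chain stops the crawl instead of running forever.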
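For the retry pattern, here is a minimal sketch of the manual exponential-backoff option (tenacity is the library alternative). The function name and the defaults (five attempts, 1 s base delay doubling per retry, 30 s cap, up to 10% jitter) are illustrative assumptions, not values from the skill.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    The wait before retry n (1-based) is min(max_delay, base_delay * 2**(n-1)),
    stretched by up to 10% random jitter so parallel workers do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")
```

With httpx you might wrap `lambda: client.get(url)` and pass `retry_on=(httpx.TransportError,)`. Note that httpx does not raise on HTTP 429 or 503 by default, so status-based retries need an explicit check inside `func`.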

Constraints

Always check robots.txt first
Rate limit: minimum 1 second between requests
Set a realistic User-Agent header
Cache responses locally to avoid re-fetching during development
Never scrape behind authentication without explicit permission
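The robots.txt and User-Agent constraints can be checked with the standard library's `urllib.robotparser`. This sketch parses robots.txt text you have already downloaded and honours a declared `Crawl-delay` while never going below the 1 second minimum. The `USER_AGENT` string and the helper names are hypothetical; substitute whatever realistic User-Agent you settle on.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent; replace with the value your scraper actually sends.
USER_AGENT = "ExampleScraper/0.1 (+https://example.com/contact)"


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """True if robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser, user_agent: str = USER_AGENT, floor: float = 1.0) -> float:
    """Seconds between requests: the declared Crawl-delay, but never below floor."""
    declared: Optional[float] = parser.crawl_delay(user_agent)
    return floor if declared is None else max(floor, float(declared))
```

To send the same identity on every request, one option is `httpx.Client(headers={"User-Agent": USER_AGENT})`. Alternatively, `RobotFileParser` can download the file itself via `set_url(...)` followed by `read()`.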
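For the development cache, a small file-backed wrapper around any fetch function is enough. This is a sketch under assumptions: the cache directory name, the SHA-256 URL key, and the `cached_fetch` name are all hypothetical choices, and the cache never expires, which suits development rather than production crawling.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(
    url: str,
    fetch: Callable[[str], str],
    cache_dir: Path = Path(".scrape_cache"),
) -> str:
    """Return the body for url, reading from a local file cache when present."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so query strings and slashes become a safe, fixed-length filename.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    body = fetch(url)
    path.write_text(body, encoding="utf-8")
    return body
```

With httpx, `fetch` could be `lambda u: client.get(u).text`; deleting the cache directory forces a fresh download.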

Related Skills

Session Logs

Search and analyze your own session logs (older/parent conversations) using jq.

OpenClaw Test Heap Leaks

Investigate `pnpm test` memory growth, Vitest worker OOMs, and suspicious RSS increases in OpenClaw using the `scripts/test-parallel.mjs` heap snapshot tooling. Use when Codex needs to reproduce test-lane memory growth, collect repeated `.heapsnapshot` files, compare snapshots from the same worker PID, triage likely transformed-module retention versus likely runtime leaks, and fix or reduce the impact by patching cleanup logic or isolating hotspot tests.

Node Connect

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps. Use when QR/setup code/manual connect fails, local Wi-Fi works but VPS/tailnet does not, or errors mention pairing required, unauthorized, bootstrap token invalid or expired, gateway.bind, gateway.remote.url, Tailscale, or plugins.entries.device-pair.config.publicUrl.

Openclaw Qa Testing

Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.

Openclaw Secret Scanning Maintainer

Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.

Flags

Use when you need to check feature flag states, compare channels, or debug why a feature behaves differently across release channels.