Name: Satisficing Web Fetcher
Author: egbertie



from fetcher import HTTPFetcher

fetcher = HTTPFetcher()
result = fetcher.fetch("https://example.com")
print(result.text)

from fetcher import StealthyFetcher

fetcher = StealthyFetcher(headless=True)
result = fetcher.fetch("https://protected-site.com")
print(result.css("h1::text").get())

from fetcher import AdaptiveParser

parser = AdaptiveParser(result.html)
# First fetch: save the element fingerprints
products = parser.css(".product", auto_save=True)

# After the page structure changes, the elements can still be located
products = parser.css(".product", adaptive=True)
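
The source does not say how `AdaptiveParser` fingerprints or re-locates elements. As a rough illustration of the general idea only, the sketch below stores a set of tokens for an element (tag, classes, id, first few words of text) and later picks the candidate with the highest Jaccard similarity. The names `fingerprint`, `similarity`, and `relocate`, the token scheme, and the 0.5 threshold are all assumptions made for this example, not part of the skill's API.

```python
from typing import Iterable, Optional, Tuple

# An element is modelled here as (tag, attributes, text); this shape is hypothetical.
Element = Tuple[str, dict, str]


def fingerprint(tag: str, attrs: dict, text: str) -> frozenset:
    """Reduce an element to a set of comparable tokens."""
    tokens = {f"tag:{tag}"}
    tokens.update(f"class:{c}" for c in attrs.get("class", "").split())
    if attrs.get("id"):
        tokens.add(f"id:{attrs['id']}")
    # Only the first few words, so long text bodies do not dominate the score.
    tokens.update(f"word:{w.lower()}" for w in text.split()[:5])
    return frozenset(tokens)


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def relocate(saved: frozenset, candidates: Iterable[Element],
             threshold: float = 0.5) -> Optional[Element]:
    """Return the candidate most similar to the saved fingerprint, if close enough."""
    best, best_score = None, 0.0
    for element in candidates:
        score = similarity(saved, fingerprint(*element))
        if score > best_score:
            best, best_score = element, score
    return best if best_score >= threshold else None


saved = fingerprint("div", {"class": "product card"}, "Blue Widget $9")
page_after_redesign = [
    ("nav", {"class": "menu"}, "Home"),
    ("div", {"class": "item card"}, "Blue Widget $9"),  # class renamed from "product"
]
match = relocate(saved, page_after_redesign)
```

Here the renamed element still shares its tag, one class, and its text with the saved fingerprint, so it is selected even though the `.product` selector no longer matches.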

# Basic fetch
python3 cli.py fetch "https://example.com"

# Stealthy mode
python3 cli.py fetch "https://protected-site.com" --mode stealthy

# Extract with a CSS selector
python3 cli.py fetch "https://example.com" --css "h1::text"

# View the audit log
python3 cli.py audit

ALLOWED_DOMAINS = [
    "example.com",
    "api.github.com",
    "*.wikipedia.org",
]
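
How the skill matches a URL against `ALLOWED_DOMAINS` is not shown in the source. A minimal sketch, assuming entries are either exact hostnames or shell-style wildcards such as `*.wikipedia.org`, could use the standard library's `fnmatch`. The helper name `is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

ALLOWED_DOMAINS = [
    "example.com",
    "api.github.com",
    "*.wikipedia.org",
]


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's hostname matches an allowlist entry.

    Assumption: patterns are matched against the whole hostname, so
    "*.wikipedia.org" covers subdomains but not the bare "wikipedia.org".
    """
    host = (urlparse(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern) for pattern in allowed)
```

Matching the parsed hostname rather than the raw URL string avoids accepting look-alikes such as `https://example.com.evil.net/`.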

rate_limit: 1  # maximum requests per second
max_content_size: 10485760  # 10 MB
enable_pii_filter: true
audit_retention_days: 30
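
The source lists these settings without showing how they are enforced. The sketch below is one plausible reading of `rate_limit` and `max_content_size`: a minimum-interval limiter and a simple size check. The class `RateLimiter`, the helper `within_size_limit`, and the injectable `clock`/`sleep` parameters are assumptions for illustration.

```python
import time

MAX_CONTENT_SIZE = 10 * 1024 * 1024  # 10485760 bytes, as in the config above


class RateLimiter:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


def within_size_limit(body: bytes, limit: int = MAX_CONTENT_SIZE) -> bool:
    """Return True if the response body does not exceed the configured limit."""
    return len(body) <= limit
```

A caller would invoke `limiter.wait()` before each request and discard any response for which `within_size_limit` is false.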

satisficing-web-fetcher/
├── fetcher.py              # main module
│   ├── HTTPFetcher         # basic fetching
│   ├── StealthyFetcher     # anti-bot bypass
│   └── AdaptiveParser      # adaptive parsing
├── sandbox/                # sandbox isolation
│   ├── browser_controller.py
│   ├── memory_limiter.py
│   └── timeout_guard.py
├── security/               # security controls
│   ├── audit_logger.py
│   ├── domain_whitelist.py
│   └── content_filter.py
├── cli.py                  # command-line tool
└── examples/               # usage examples

Feature	web_fetch	smart-web-fetch	satisficing-web-fetcher
Basic HTTP	✅	✅	✅
Content cleaning	❌	✅	✅
JS rendering	❌	❌	✅
Cloudflare bypass	❌	❌	✅
Adaptive parsing	❌	❌	✅
Sandbox isolation	❌	❌	✅
Audit logging	❌	❌	✅
External cost	None	None	None

# Base dependencies
pip install requests playwright

# Install the browser
playwright install chromium

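Level	Mode	Use case	Resource usage
Level 1	Basic HTTP fetch	Static pages, APIs	Low
Level 2	Stealthy fetch	Anti-bot-protected pages, Cloudflare	Medium
Level 3	Dynamic browser rendering	Heavily JS-dependent pages	High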
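
The table does not say how a level is chosen. One reading consistent with the "satisficing" name is to try the cheapest level first and stop at the first result that is good enough. The sketch below illustrates that escalation with plain callables standing in for the three fetchers. The function `fetch_with_escalation` and the default `good_enough` check are assumptions, not the skill's documented behaviour.

```python
from typing import Callable, Optional, Sequence, Tuple

Fetch = Callable[[str], Optional[str]]  # takes a URL, returns HTML or None


def fetch_with_escalation(
    url: str,
    levels: Sequence[Tuple[str, Fetch]],
    good_enough: Callable[[str], bool] = lambda html: bool(html.strip()),
) -> Optional[Tuple[str, str]]:
    """Try each level in order; return (level name, html) for the first acceptable result."""
    for name, fetch in levels:
        try:
            html = fetch(url)
        except Exception:
            # A failed level is treated as "not good enough": move to the next one.
            continue
        if html is not None and good_enough(html):
            return name, html
    return None  # every level failed or returned unusable content


# Stand-in fetchers for the example; real ones would wrap the classes from fetcher.py.
def http_level(url: str) -> str:
    return ""  # e.g. a JS-only page that yields an empty body over plain HTTP


def stealthy_level(url: str) -> str:
    return "<h1>ok</h1>"


calls = []


def browser_level(url: str) -> str:
    calls.append(url)  # records whether the most expensive level was reached
    return "<h1>rendered</h1>"


result = fetch_with_escalation(
    "https://example.com",
    [("http", http_level), ("stealthy", stealthy_level), ("browser", browser_level)],
)
```

Because the loop returns as soon as a level succeeds, the browser level is never invoked in this example, which keeps resource usage at the lowest level that works.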

Satisficing Web Fetcher

Core design philosophy

Features

1. Three-level fetching strategy

2. Adaptive parsing

3. Security sandbox

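4. Content security

Quick start

Basic fetching

Anti-bot bypass

Adaptive parsing

CLI usage

Configuration

Domain allowlist

Security policy

Security red lines

Architecture

Comparison with existing tools

Installing dependencies

Evaluation report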
