Modern web scraping with structured data extraction. Fetch web pages, extract content using CSS selectors, parse structured data (JSON-LD, Open Graph, meta tags), and handle pagination.
Modern web scraping with intelligent content extraction.
node /path/to/skills/web-scraper/scripts/fetch.js https://example.com
node /path/to/skills/web-scraper/scripts/extract.js https://example.com --selector "h1,h2,p"
node /path/to/skills/web-scraper/scripts/metadata.js https://example.com
Fetch web page content with smart extraction.
Usage:
node fetch.js <url> [OPTIONS]
Options:
--output <fmt> - Output format: text, html, markdown (default: text)
--timeout <ms> - Request timeout (default: 30000)
--user-agent <ua> - Custom User-Agent string
--headers <json> - Custom headers as JSON
--follow - Follow redirects (default: true)
Extract specific elements using CSS selectors.
Usage:
node extract.js <url> --selector <css> [OPTIONS]
Options:
--selector <css> - CSS selector (required)
--attr <name> - Extract attribute instead of text
--multiple - Return all matches (default: first only)
--json - Output as JSON array
Extract structured metadata from pages.
Usage:
node metadata.js <url> [OPTIONS]
Extracts: page title, meta description, canonical URL, Open Graph tags, Twitter Card tags, and JSON-LD.
Extract and analyze links from a page.
Usage:
node links.js <url> [OPTIONS]
Options:
--internal - Only internal links
--external - Only external links
--filter <pattern> - Filter by URL pattern
--format <fmt> - Output: json, csv, list
Parse and process XML sitemaps.
Usage:
node sitemap.js <url> [OPTIONS]
Options:
--discover - Auto-discover sitemap from robots.txt
--filter <pattern> - Filter URLs by pattern
--limit <n> - Limit number of URLs
node fetch.js https://blog.example.com/article --output markdown
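
The documented `fetch.js` options map fairly directly onto the `fetch` API that ships with Node 18+. The following is only a sketch of how that mapping might look, not the script's actual source: `buildRequestOptions` and `fetchPage` are hypothetical names, and treating `--follow` off as `redirect: "manual"` is an assumption.

```javascript
// Hypothetical helper (not part of the skill): translate fetch.js-style
// options into fields for the standard fetch() init object.
// Defaults mirror the documented ones: 30000 ms timeout, follow redirects.
function buildRequestOptions({ timeout = 30000, userAgent, headers = "{}", follow = true } = {}) {
  // --headers is documented as a JSON string, so parse it first.
  const merged = { ...JSON.parse(headers) };
  if (userAgent) merged["User-Agent"] = userAgent;
  return {
    timeoutMs: timeout,
    // Assumption: disabling --follow means redirects are not followed automatically.
    init: { headers: merged, redirect: follow ? "follow" : "manual" },
  };
}

// Hypothetical wrapper using Node 18+'s global fetch and AbortSignal.timeout.
async function fetchPage(url, options = {}) {
  const { timeoutMs, init } = buildRequestOptions(options);
  const response = await fetch(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}
```

Keeping the option translation in a pure function makes it easy to check without touching the network; only `fetchPage` performs a request.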
node extract.js https://shop.example.com --selector "a.product-link" --attr href --multiple
node metadata.js https://example.com
Output:
{
"title": "Example Page",
"description": "Page description",
"openGraph": {
"title": "Example OG Title",
"image": "https://example.com/image.jpg",
"type": "website"
}
}
node links.js https://example.com --external --format csv
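
One plausible way to implement the `--internal` / `--external` split is to resolve each `href` against the page URL and compare hostnames. The sketch below assumes exactly that rule (subdomains count as external) and a single-column CSV layout; `classifyLinks` and `toCsv` are hypothetical names, and `links.js` may behave differently.

```javascript
// Hypothetical helper: split hrefs into internal/external relative to the page URL.
// Assumption: "internal" means same hostname as the page.
function classifyLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const result = { internal: [], external: [] };
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, base); // resolves relative hrefs against the page
    } catch {
      continue; // skip malformed hrefs
    }
    if (!/^https?:$/.test(resolved.protocol)) continue; // ignore mailto:, javascript:, etc.
    const bucket = resolved.hostname === base.hostname ? "internal" : "external";
    result[bucket].push(resolved.href);
  }
  return result;
}

// Minimal CSV rendering (assumed layout): a "url" header, then one quoted URL per row.
function toCsv(urls) {
  const quote = (s) => `"${s.replace(/"/g, '""')}"`;
  return ["url", ...urls.map(quote)].join("\n");
}
```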
node sitemap.js https://example.com/sitemap.xml --filter "/blog/"
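
The `--discover`, `--filter`, and `--limit` options can be illustrated with two small pure functions. This is a rough sketch under stated assumptions: `discoverSitemaps` and `sitemapUrls` are hypothetical names, the filter is treated as a plain substring match (the real pattern syntax is not documented here), and `<loc>` values are pulled out with a regular expression rather than a full XML parser, so XML entities are not decoded.

```javascript
// Collect sitemap URLs declared in a robots.txt body ("Sitemap: <url>" lines).
// The directive name is matched case-insensitively.
function discoverSitemaps(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map((m) => m[1]);
}

// Pull <loc> values from sitemap XML, then apply --filter / --limit style options.
// Assumption: filter is a substring match, and limit is applied after filtering.
function sitemapUrls(xml, { filter, limit } = {}) {
  let urls = [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
  if (filter) urls = urls.filter((u) => u.includes(filter));
  return limit ? urls.slice(0, limit) : urls;
}
```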
# Page Title
Main content extracted and converted to markdown...
## Section Heading
Paragraph text with [links](https://example.com).
{
"url": "https://example.com",
"selector": "h2",
"matches": [
{ "text": "First Heading", "html": "<h2>First Heading</h2>" },
{ "text": "Second Heading", "html": "<h2>Second Heading</h2>" }
],
"count": 2
}
{
"url": "https://example.com",
"title": "Page Title",
"description": "Meta description",
"canonical": "https://example.com/page",
"openGraph": {
"title": "OG Title",
"description": "OG Description",
"image": "https://example.com/og-image.jpg",
"type": "article"
},
"twitterCard": {
"card": "summary_large_image",
"site": "@example"
},
"jsonLd": [
{ "@type": "Article", "headline": "Article Title" }
]
}
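
To show where the `openGraph` and `jsonLd` fields in the output above could come from, here is a minimal sketch that reads them from raw HTML. It is not the `metadata.js` implementation: `extractJsonLd` and `extractOpenGraph` are hypothetical names, and the regular expressions assume simple, well-formed markup. A real scraper would more likely use an HTML parser.

```javascript
// Extract JSON-LD: parse the body of every <script type="application/ld+json">.
function extractJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const [, body] of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(body));
    } catch {
      // Invalid JSON-LD does occur on real pages; skip it rather than fail.
    }
  }
  return blocks;
}

// Extract Open Graph tags of the form <meta property="og:NAME" content="VALUE">.
// Assumption: property precedes content and both values are double-quoted.
function extractOpenGraph(html) {
  const re = /<meta[^>]*property="og:([^"]+)"[^>]*content="([^"]*)"/gi;
  const og = {};
  for (const [, name, value] of html.matchAll(re)) og[name] = value;
  return og;
}
```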
cloudflare-browser skill