Scrape gig platforms and communities for leads matching agency services
Scrapes freelance and community platforms for gig listings and posts matching the agency's services. Covers 17 platforms: Freelancer, PeoplePerHour, Guru, Fiverr, Upwork, Reddit, LinkedIn, Behance, Dribbble, Shopify Community, IndieHackers, Funding News, Twitter/X, Instagram Brands, Product Hunt, Google Maps, and Shopify Store Discovery. Extracted from Hawk's watcher architecture but expressed as Claude-readable instructions, not TypeScript.
Prerequisites:
- agency.config.json at repo root with services, icp, and scoring sections
- crm-writer skill for logging results

Read agency.config.json from the project root.
Extract:
- services[].keywords -- all service keyword arrays
- icp.primary_keywords, icp.secondary_keywords
- icp.negative_keywords -- to filter out self-promoters and irrelevant results

Accept parameters:
- platforms -- list of platforms to search, or "all" (default: "all")
- max_results -- max leads to return (default: 20)
- time_window -- how far back to search (default: "past week")

Available platforms: Freelancer, PeoplePerHour, Guru, Fiverr, Upwork, Reddit, LinkedIn, Behance, Dribbble, Shopify Community, IndieHackers, Funding News, Twitter/X, Instagram Brands, Product Hunt, Google Maps, Shopify Store Discovery.
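A minimal sketch of the config-loading step. The field names follow the structure described above; the exact shape of agency.config.json beyond those fields is an assumption:

```python
import json

def load_search_terms(config_path="agency.config.json"):
    """Read agency.config.json and pull out the keyword lists used for querying and filtering."""
    with open(config_path) as f:
        cfg = json.load(f)

    # Flatten services[].keywords into one list of query keywords.
    service_keywords = [kw for svc in cfg.get("services", []) for kw in svc.get("keywords", [])]
    icp = cfg.get("icp", {})
    return {
        "service_keywords": service_keywords,
        "primary": icp.get("primary_keywords", []),
        "secondary": icp.get("secondary_keywords", []),
        "negative": icp.get("negative_keywords", []),
    }
```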
For each selected platform, use WebSearch with site-specific queries. Generate 2-3 queries per platform by rotating service keywords. Execute searches with rate limiting: max 3 concurrent, 2-second pause between batches of 3.
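The batching policy above (max 3 concurrent, 2-second pause between batches) can be sketched as follows; run_search is a hypothetical stand-in for the actual WebSearch call:

```python
import time

def execute_in_batches(queries, run_search, batch_size=3, pause_seconds=2):
    """Run queries in batches of `batch_size`, pausing between batches to respect rate limits."""
    results = []
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        results.extend(run_search(q) for q in batch)
        if i + batch_size < len(queries):  # no pause after the final batch
            time.sleep(pause_seconds)
    return results
```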
Query templates by platform:
- Freelancer: site:freelancer.com/projects "{keyword}" for each service keyword. Require /projects/ in the URL path; filter out results missing this pattern.
- PeoplePerHour: site:peopleperhour.com "{keyword}" "project"
- Guru: site:guru.com/jobs "{keyword}"
- Fiverr: site:fiverr.com/categories "{keyword}", or search for buyer request forums
- Upwork: site:upwork.com/freelance-jobs "{keyword}"
- Reddit: site:reddit.com/r/shopify OR site:reddit.com/r/ecommerce "{keyword}" "{intent_keyword}"
- LinkedIn: site:linkedin.com/posts "{keyword}" "{intent_keyword}" and site:linkedin.com/jobs "{keyword}"
- Behance: site:behance.net/joblist "{keyword}"
- Dribbble: site:dribbble.com/jobs "{keyword}"
- Shopify Community: site:community.shopify.com "{keyword}" "{intent_keyword}"
- IndieHackers: site:indiehackers.com "{keyword}" "{intent_keyword}"
- Funding News: "{industry} startup" "raises" OR "funding" OR "seed round" this week; watch for amounts like $X million, Rs X crore, etc.
- Twitter/X: site:twitter.com OR site:x.com "{keyword}" "shopify" OR "ecommerce" OR "d2c", plus site:x.com "{industry}" "just launched" OR "coming soon" OR "new store"
- Instagram Brands: site:instagram.com "{keyword}" "shop" OR "store" OR "link in bio"
- Product Hunt: site:producthunt.com "{keyword}" OR "ecommerce" OR "shopify"
- Google Maps: site:google.com/maps "{industry}" "{market}", plus site:justdial.com "{keyword}" (India) and site:yelp.com "{keyword}" (US/UK)
- Shopify Store Discovery: site:myshopify.com "{industry_keyword}", plus site:builtwith.com "shopify" "{industry}"

Normalize all results from every platform into a single standard format:
{
"title": "Project or post title",
"description": "First 500 chars of description or body text",
"platform": "Freelancer",
"url": "https://...",
"budget": "$500-$1000",
"posted_date": "2024-01-15",
"market": "US",
"contact": "username or name if available",
"company": "Company name if identifiable from context",
"skills_required": ["shopify", "liquid", "theme-development"],
"contact_surfaces": {
"has_email": false,
"has_linkedin": false,
"has_instagram": false,
"has_phone": false,
"has_website": true,
"channel_count": 1,
"discovery_notes": "Website from store URL."
},
"raw_data": {}
}
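Two of the trickier fields, market and budget, can be filled with heuristics like the sketch below. The currency-to-market mapping and the regex are illustrative assumptions, following the normalization rules stated next:

```python
import re

# Assumed mapping from currency cues to markets; extend as needed.
CURRENCY_MARKETS = {"$": "US", "£": "UK", "€": "EU", "₹": "India", "Rs": "India"}

def infer_market(text):
    """Guess the market from currency symbols or currency mentions in the text."""
    for symbol, market in CURRENCY_MARKETS.items():
        if symbol in text:
            return market
    return ""

def normalize_budget(text):
    """Pull the first $X, $X-$Y, or Rs X figure out of free text; empty string if none."""
    match = re.search(r"(\$\d[\d,]*(?:\s*-\s*\$\d[\d,]*)?|Rs\.?\s*\d[\d,]*)", text)
    # Collapse "$500 - $1000" into the canonical "$500-$1000" form.
    return re.sub(r"\s*-\s*", "-", match.group(1)) if match else ""
```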
Rules:
- market should be inferred from currency symbols, country mentions, or TLD patterns in the content.
- budget should be normalized to a consistent format: "$X" or "$X-$Y", or "Rs X" for Indian currency. Set to empty string if not found.
- contact should capture the username, author name, or poster identity if visible.
- skills_required should be extracted from explicit skill tags or inferred from description keywords.

Apply filters in this order:
1. Negative keyword filter: Check title + description against icp.negative_keywords. Reject any result matching a negative keyword (these are typically self-promoters, spam, or irrelevant categories).
2. Relevance check: The result must match at least one primary_keyword OR one secondary_keyword from the config. Results matching zero keywords are discarded.
3. Recency filter: Skip results with a posted_date older than the time_window parameter. If posted_date is empty/unknown, keep the result but mark it with "date_unknown": true.
4. Dedup by URL: Remove exact URL duplicates. If the same company or person appears from different URLs (same underlying opportunity), keep only the most recent or most detailed one.
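The filter order above can be composed in one pass. This sketch assumes each lead is a dict in the normalized format; the recency filter is omitted since date parsing depends on the platform:

```python
def filter_leads(leads, primary, secondary, negative):
    """Apply negative-keyword, relevance, and URL-dedup filters in order."""
    seen_urls, kept = set(), []
    for lead in leads:
        text = (lead.get("title", "") + " " + lead.get("description", "")).lower()
        if any(kw.lower() in text for kw in negative):
            continue  # self-promoters, spam, irrelevant categories
        if not any(kw.lower() in text for kw in primary + secondary):
            continue  # matches zero ICP keywords -> discard
        if lead.get("url") in seen_urls:
            continue  # exact URL duplicate
        seen_urls.add(lead.get("url"))
        kept.append(lead)
    return kept
```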
Return the final array of normalized leads, sorted by relevance.
Present results grouped by platform with counts:
Found {N} leads across {M} platforms:
Freelancer ({count}):
1. "Project title" - $budget - URL
2. ...
Reddit ({count}):
1. "Post title" - r/subreddit - URL
2. ...
LinkedIn ({count}):
...
[Other platforms with results]
Total: {N} leads
Filtered out: {X} (negative keywords: {a}, irrelevant: {b}, stale: {c}, duplicates: {d})
Trim to max_results after sorting. If more results exist, note the overflow count.
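Grouping, trimming, and the overflow note can be sketched as follows (field names taken from the normalized schema above):

```python
from collections import defaultdict

def present(leads, max_results=20):
    """Group leads by platform, trim to max_results, and report any overflow."""
    trimmed, overflow = leads[:max_results], max(0, len(leads) - max_results)
    by_platform = defaultdict(list)
    for lead in trimmed:
        by_platform[lead["platform"]].append(lead)

    lines = [f"Found {len(trimmed)} leads across {len(by_platform)} platforms:"]
    for platform, items in by_platform.items():
        lines.append(f"\n{platform} ({len(items)}):")
        for i, lead in enumerate(items, 1):
            lines.append(f'{i}. "{lead["title"]}" - {lead.get("budget") or "budget n/a"} - {lead["url"]}')
    if overflow:
        lines.append(f"\n({overflow} more leads beyond max_results)")
    return "\n".join(lines)
```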
Trigger phrases:
User: Scrape Freelancer and Reddit for Shopify leads
Assistant: [reads config, generates site-specific queries for Freelancer and Reddit, executes with rate limiting, normalizes to standard format, filters and deduplicates, presents grouped results]
User: Run platform scraper on all platforms, last 3 days
Assistant: [same flow across all 17 platforms, time_window set to 3 days]