Automatically discover new grad / junior SWE job postings from multiple sources, filter and rank them, then feed the top matches into mass-apply. Runs on cron (twice daily) or manually. Triggers on: 'find jobs', 'search for jobs', 'discover new jobs', 'any new jobs?', 'run job search'.
Read this file and all source guides in sources/ fresh every time. Do not rely on memory or previous runs.
Scrape multiple job sources → filter → deduplicate → rank → hand off top 5 to /mass-apply.
All browser-based scraping uses $AB_CONNECT (auto-connect to real Chrome).
Read each source guide in sources/ before scraping:
| Source | Guide | Method | Anti-bot |
|---|---|---|---|
| SimplifyJobs GitHub | sources/simplify.md | GitHub API (no browser) | None |
| LinkedIn | sources/linkedin.md | agent-browser | High — auto-connect required |
| Indeed | sources/indeed.md | agent-browser | Moderate |
| Glassdoor | sources/glassdoor.md | agent-browser | High — auto-connect required |
| Wellfound | sources/wellfound.md | agent-browser | Low |
For each source, read the source guide and follow its scraping instructions. Collect a list of jobs with:
- company — company name
- role — job title
- location — job location
- url — direct application URL (strip tracking params)
- age — days since posted (0 = today)
- source — which source it came from

If a source fails (blocked, down, CAPTCHA), log the failure and continue with other sources.
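Stripping tracking params before storing the URL keeps deduplication reliable. A minimal sketch, assuming a small blocklist of common tracking keys (the exact set each source appends is not specified here):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Assumed set of tracking params; adjust per source as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "src", "trk"}

def clean_url(url: str) -> str:
    """Drop known tracking query params so the stored URL is the direct application URL."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))
```

Run each scraped URL through this before the dedup check so the same posting seen via two referral links collapses to one row.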
Performance rule: Extract all jobs from a search results page in ONE eval call per query. Never click individual job cards — that's too slow. The source guides have the exact eval scripts.
jobs.db — skip jobs discovered within the last 14 days (by URL or company+role match). Jobs older than 14 days are allowed to be re-discovered.

Remove jobs where:
Score each job. Priority order:
Take top 10 by total score. If fewer than 10 pass filters, take what's available. (Target: 20 applications per day across 2 runs = 10 per run.)
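The selection step above reduces to a sort on total_score with a cap. A sketch, assuming jobs are dicts keyed by the column names from the discovered_jobs schema:

```python
def top_matches(jobs: list[dict], limit: int = 10) -> list[dict]:
    """Rank filtered jobs by total_score (descending) and take up to `limit`.

    If fewer than `limit` jobs pass the filters, the slice simply
    returns whatever is available, matching the rule above.
    """
    ranked = sorted(jobs, key=lambda j: j["total_score"], reverse=True)
    return ranked[:limit]
```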
If jobs found:
"Found {N} new matching jobs. Starting mass-apply..."
→ /mass-apply with the job URLs
If no jobs found:
"No new matching jobs found. Will check again at next scheduled run."
Database: ~/work/personal/jobs/.claude/jobs.db
Initialize if missing:
python3 ~/work/personal/jobs/.claude/scripts/init_jobs_db.py --base-path ~/work/personal/jobs
discovered_jobs — every job found from any source:
- company, company_normalized, role, location, url (unique)
- source, discovered_at, age_days
- skills_score, company_tier, total_score
- description_snippet, status (new/applied/skipped/rejected)

scrape_runs — log each cron run:

- started_at, completed_at, sources_checked, jobs_found, jobs_filtered, jobs_applied

scrape_state — per-source state (e.g., SimplifyJobs last SHA):

- source, key, value

Only skip jobs discovered within the last 14 days. If a job was seen 14+ days ago, allow re-discovery — the position may still be open and worth re-applying.
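If init_jobs_db.py is unavailable, the three tables can be created directly. This DDL is a reconstruction from the column lists above, not the script's actual schema; the (source, key) primary key is an assumption needed for the INSERT OR REPLACE in the state update to behave as an upsert, and applied_at is included because the status-update SQL below writes it:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS discovered_jobs (
    company TEXT, company_normalized TEXT, role TEXT, location TEXT,
    url TEXT UNIQUE, source TEXT,
    discovered_at TEXT DEFAULT (datetime('now')), age_days INTEGER,
    skills_score REAL, company_tier TEXT, total_score REAL,
    description_snippet TEXT, status TEXT DEFAULT 'new', applied_at TEXT
);
CREATE TABLE IF NOT EXISTS scrape_runs (
    started_at TEXT DEFAULT (datetime('now')), completed_at TEXT,
    sources_checked INTEGER, jobs_found INTEGER,
    jobs_filtered INTEGER, jobs_applied INTEGER
);
CREATE TABLE IF NOT EXISTS scrape_state (
    source TEXT, key TEXT, value TEXT, updated_at TEXT,
    PRIMARY KEY (source, key)
);
"""

def init_db(path: str) -> sqlite3.Connection:
    """Create jobs.db tables if missing and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```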
SELECT 1 FROM discovered_jobs
WHERE (url = ? OR (company_normalized = ? AND role = ?))
AND discovered_at > datetime('now', '-14 days')
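Wrapped in Python, the 14-day check above might look like this (a sketch; the function name and signature are assumptions, the SQL is the query above):

```python
import sqlite3

def seen_recently(conn: sqlite3.Connection, url: str,
                  company_normalized: str, role: str) -> bool:
    """True if this job (by URL, or company+role) was discovered in the last 14 days."""
    row = conn.execute(
        "SELECT 1 FROM discovered_jobs "
        "WHERE (url = ? OR (company_normalized = ? AND role = ?)) "
        "AND discovered_at > datetime('now', '-14 days') LIMIT 1",
        (url, company_normalized, role),
    ).fetchone()
    return row is not None
```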
-- Log the run
INSERT INTO scrape_runs (sources_checked, jobs_found, jobs_filtered, jobs_applied)
VALUES (?, ?, ?, ?);
-- Insert new jobs (ignore duplicates)
INSERT OR IGNORE INTO discovered_jobs (company, company_normalized, role, location, url, source, age_days, skills_score, company_tier, total_score, description_snippet)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
-- Update status for applied jobs
UPDATE discovered_jobs SET status = 'applied', applied_at = datetime('now') WHERE url = ?;
-- Update source state
INSERT OR REPLACE INTO scrape_state (source, key, value, updated_at)
VALUES ('simplify', 'last_sha', ?, datetime('now'));