Analyze and rank PDF resumes of candidates for a software engineering position. Use this skill when the user uploads one or more PDF resumes (or provides URLs to download them from) and a job description and wants candidates scored, ranked, and compared. Triggers include: "rank resumes", "score candidates", "analyze resumes", "screen applicants", "evaluate candidates", "review resumes for [role]", "download and rank resumes", or any request to compare multiple PDF resumes against a job posting. Also use when the user asks to extract GitHub profiles from resumes and check open-source contributions, or when the user wants award-based scoring. Do NOT use for non-recruitment document analysis or for parsing a single resume without scoring.
Analyze PDF resumes against a software engineering job description. Download resumes from URLs if needed, extract and parse each PDF, exclude overqualified candidates, score the rest on three criteria, rank them, select the top 10, produce a detailed report, and self-review for accuracy.
This skill bundles two scripts in scripts/. Every script uses ONLY the
Python standard library plus pdfplumber/pypdf (pre-installed).
| Script | Purpose | When to Run |
|---|---|---|
| scripts/async_resume_downloader.py | Async-download resume PDFs from public URLs | Step 1 — only when the user provides URLs instead of uploading files |
| scripts/resume_scorer.py | Extract text from PDFs, parse candidate data, exclude overqualified candidates, compute NLP similarity, write structured JSON | Step 3 — always |
Claude itself performs Steps 4–8 (GitHub fetching, scoring, ranking, report
generation, and self-review) because they require web_fetch, judgment calls,
and iterative verification that only Claude can do.
User Input (URLs and/or uploaded PDFs + Job Description)
│
▼
Step 1 ── Download PDFs from URLs (async_resume_downloader.py)
│ Skip if all resumes are already uploaded as files.
▼
Step 2 ── Collect all PDFs into one directory
│ Merge downloaded PDFs with any directly uploaded PDFs.
▼
Step 3 ── Extract & parse all resumes (resume_scorer.py)
│ Outputs structured JSON with candidate data + exclusion list.
▼
Step 4 ── Fetch GitHub profiles (Claude via web_fetch)
│ For each candidate with a GitHub username in the JSON.
▼
Step 5 ── Score each candidate on 3 criteria (Claude)
│ Experience Match (50%) + GitHub (30%) + Awards (20%)
▼
Step 6 ── Rank, select top 10, generate report (Claude)
│
▼
Step 7 ── Self-review & re-run if errors found (Claude)
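The Step 5 weighting can be sketched as follows. This is an illustration of how the three criterion scores combine, assuming each is on a 0–100 scale; the criterion scores themselves are judgment calls Claude makes, not script output:

```python
def weighted_score(experience: float, github: float, awards: float) -> float:
    """Combine the three criterion scores (each 0-100) with the Step 5 weights:
    Experience Match 50%, GitHub 30%, Awards 20%."""
    return 0.5 * experience + 0.3 * github + 0.2 * awards
```

A candidate scoring 80 on experience match, 60 on GitHub activity, and 50 on awards would receive an overall score of 68.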
When: The user provides one or more public URLs to PDF resumes. Skip if: All resumes are uploaded directly as PDF files.
Run the async downloader:
python3 scripts/async_resume_downloader.py \
-o /home/claude/resumes \
"https://example.com/alice.pdf" \
"https://example.com/bob.pdf"
Or use it as a library within a Python script:
import asyncio
import sys

sys.path.insert(0, "scripts")
from async_resume_downloader import download_resumes

results = asyncio.run(download_resumes(
    urls=["https://example.com/alice.pdf", "https://example.com/bob.pdf"],
    output_dir="/home/claude/resumes",
))

failed = [r for r in results if not r["success"]]
if failed:
    print(f"WARNING: {len(failed)} download(s) failed")
    for f in failed:
        print(f" - {f['url']}: {f['error']}")
Features: concurrent downloads (5 parallel), 3 retries with exponential backoff, %PDF magic-byte validation, 50 MB size cap, safe filename generation.
Output: PDF files saved in the specified directory.
Error handling: If a URL fails after all retries, the script logs the error and continues with the remaining URLs. Report any failed downloads to the user and exclude those candidates from scoring.
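The validation and retry behavior listed above can be sketched roughly as follows. This is a minimal illustration of the %PDF magic-byte check, the 50 MB cap, and exponential backoff, not the script's actual implementation; the function names here are hypothetical:

```python
MAX_SIZE = 50 * 1024 * 1024  # 50 MB cap from the feature list above

def is_valid_pdf(data: bytes) -> bool:
    """Accept only payloads that start with the %PDF magic bytes and fit the size cap."""
    return data.startswith(b"%PDF") and len(data) <= MAX_SIZE

def backoff_delays(retries: int = 3, base: float = 1.0) -> list[float]:
    """Delays between the 3 retry attempts, doubling each time: 1s, 2s, 4s."""
    return [base * 2 ** attempt for attempt in range(retries)]
```

The magic-byte check matters because a failing URL often returns an HTML error page with a 200 status; checking the first bytes catches that before the file reaches the parser.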
Merge downloaded PDFs (from Step 1) with any user-uploaded PDFs into a single working directory:
mkdir -p /home/claude/resumes
# Copy uploaded PDFs (if any)
cp /mnt/user-data/uploads/*.pdf /home/claude/resumes/ 2>/dev/null
# Downloaded PDFs are already in /home/claude/resumes/ from Step 1
Also prepare the job description. If uploaded as a PDF or text file, copy it:
cp /mnt/user-data/uploads/job_description.* /home/claude/jd.txt
If the user typed it in chat, write it to a file:
cat > /home/claude/jd.txt << 'EOF'
<paste job description text here>
EOF
Run the extraction and parsing script on the collected resumes:
python3 scripts/resume_scorer.py \
/home/claude/resumes \
/home/claude/jd.txt \
/home/claude/candidates.json
What this script does automatically:
- Extracts text from every PDF in the input directory (using pdfplumber/pypdf).
- Parses candidate data: name, GitHub username, languages, frameworks, years of experience, awards, and education.
- Excludes overqualified candidates whose years of experience exceed the job description's maximum.
- Computes semantic similarity (if sentence-transformers is installed) between each resume and the JD.

To enable NLP similarity (optional, recommended):
pip install sentence-transformers --break-system-packages
If install succeeds, the script automatically computes cosine similarity using
the all-MiniLM-L6-v2 model. If unavailable, it falls back to keyword-only
matching and notes this in the output.
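The keyword-only fallback can be approximated as a set-overlap score between the resume and JD vocabularies. This is a rough sketch under that assumption, not the script's exact logic:

```python
import re

def keyword_similarity(resume_text: str, jd_text: str) -> float:
    """Fallback score: Jaccard overlap of lowercase token sets (0.0 to 1.0)."""
    def tokens(s: str) -> set[str]:
        # Keep letters, digits, and common tech symbols so "C++" and "C#" survive.
        return set(re.findall(r"[a-z0-9+#.]+", s.lower()))
    resume_words, jd_words = tokens(resume_text), tokens(jd_text)
    if not resume_words or not jd_words:
        return 0.0
    return len(resume_words & jd_words) / len(resume_words | jd_words)
```

Unlike the embedding-based cosine similarity, this scores only exact vocabulary overlap, which is why the output flags which method was used.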
Output JSON structure:
{
"job_description": {
"required_languages": ["Python", "Go"],
"required_frameworks": ["Kubernetes", "Docker"],
"min_years_experience": 3,
"max_years_experience": 5,
"education": ["B.S. in Computer Science"],
"raw_text": "..."
},
"candidates": [
{
"filename": "alice_resume.pdf",
"name": "Alice Chen",
"github_username": "alicechen",
"languages": ["Python", "Go", "JavaScript"],
"frameworks": ["Kubernetes", "Docker", "React"],
"years_of_experience": 4,
"awards": ["Dean's List 2019", "ICPC Regional Finalist 2020"],
"education": ["B.S. in Computer Science"],
"semantic_similarity": 0.72,
"raw_text_preview": "..."
}
],
"excluded_overqualified": [
{
"filename": "bob_resume.pdf",
"name": "Bob Smith",
"years_of_experience": 12,
"max_allowed": 5,
"reason": "Detected 12 years experience exceeds maximum of 5 years"
}
],
"total_resumes_scanned": 25,
"total_after_exclusion": 22,
"total_excluded": 3,
"nlp_available": true
}
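Before Step 4, the JSON can be loaded and sanity-checked, then filtered to the candidates who actually have a GitHub username. A minimal sketch, assuming the field names shown in the structure above:

```python
import json

def load_candidates(path: str) -> list[dict]:
    """Load the scorer output, verify its counts are self-consistent,
    and return only the candidates who proceed to GitHub fetching."""
    with open(path) as f:
        data = json.load(f)
    assert data["total_after_exclusion"] == len(data["candidates"])
    assert data["total_excluded"] == len(data["excluded_overqualified"])
    # Only candidates with a GitHub username need a web_fetch in Step 4.
    return [c for c in data["candidates"] if c.get("github_username")]
```

Candidates without a GitHub username are not dropped from scoring; they simply receive no GitHub signal in Step 5.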
For each candidate in the JSON who has a github_username, use web_fetch to
retrieve their GitHub profile: