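Intelligent resume parsing system. It extracts structured information from resumes in PDF, Word, and image formats, analyzes how well a resume matches a job posting, and generates optimization suggestions. It runs entirely locally and requires no external API. Use cases: (1) parse an uploaded resume file to extract its core information, (2) compute a resume's match score against a job description (JD), (3) generate resume optimization suggestions, (4) export structured resume data.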
python scripts/extract_pdf.py <input-pdf-path>
Returns the plain-text content.
python scripts/extract_docx.py <input-docx-path>
Returns the plain-text content.
python scripts/extract_image.py <input-image-path>
Returns the text recognized by OCR.
python scripts/parse_resume.py <extracted-text-file>
Returns structured JSON data.
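As a rough illustration of how fields such as `email` and `phone` could be pulled out of the extracted text, here is a minimal regex-based sketch. The helper `extract_contact` and both patterns are hypothetical; the actual `scripts/parse_resume.py` may work quite differently. The phone pattern assumes mainland-China mobile numbers.

```python
import re

# Hypothetical patterns; not taken from scripts/parse_resume.py.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: 11-digit mainland-China mobile number, optional +86 prefix.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?86[- ]?)?(1[3-9]\d{9})(?!\d)")


def extract_contact(text: str) -> dict:
    """Return the first email and phone number found, or empty strings."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else "",
        "phone": phone.group(1) if phone else "",
    }
```

Missing values come back as empty strings, which matches the empty-string defaults used for `basic_info` fields in the output structure.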
python scripts/match_jd.py <resume-json-path> <jd-text-path>
Returns the match analysis result.
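The commands above can be chained from Python. The sketch below picks an extraction script by file extension and runs it. The script paths come from the commands above. The extension list, the helper names, and the assumption that each script writes its result to stdout are illustrative and should be checked against the scripts themselves.

```python
import subprocess
import sys
from pathlib import Path

# Script paths are from the commands above; the extension list is an assumption.
EXTRACTORS = {
    ".pdf": "scripts/extract_pdf.py",
    ".docx": "scripts/extract_docx.py",
    ".png": "scripts/extract_image.py",
    ".jpg": "scripts/extract_image.py",
    ".jpeg": "scripts/extract_image.py",
}


def pick_extractor(resume_path: str) -> str:
    """Map a resume file to its extraction script by extension."""
    suffix = Path(resume_path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported resume format: {suffix or '(none)'}")
    return EXTRACTORS[suffix]


def build_command(resume_path: str) -> list:
    """Build the argv list for the matching extraction script."""
    return [sys.executable, pick_extractor(resume_path), resume_path]


def extract_text(resume_path: str) -> str:
    # Assumes the script prints the extracted text to stdout.
    result = subprocess.run(
        build_command(resume_path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

The text returned by `extract_text` would then be saved to a file and passed to `scripts/parse_resume.py`.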
{
  "basic_info": {
    "name": "",
    "phone": "",
    "email": "",
    "age": null,
    "gender": "",
    "location": "",
    "work_years": null
  },
  "education": [
    {
      "school": "",
      "major": "",
      "degree": "",
      "start_date": "",
      "end_date": "",
      "gpa": "",
      "courses": []
    }
  ],
  "work_experience": [
    {
      "company": "",
      "position": "",
      "start_date": "",
      "end_date": "",
      "description": "",
      "achievements": [],
      "technologies": []
    }
  ],
  "projects": [
    {
      "name": "",
      "role": "",
      "start_date": "",
      "end_date": "",
      "description": "",
      "technologies": [],
      "achievements": []
    }
  ],
  "skills": {
    "technical": [],
    "soft": [],
    "languages": []
  },
  "certificates": [],
  "awards": [],
  "self_assessment": ""
}
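Before using a parsed resume downstream, it can help to check that the top-level shape matches the structure above. The following is a minimal sketch. The key names and types are taken from that structure, while the helper `check_resume` is hypothetical and not part of the scripts.

```python
# Top-level keys and expected types, taken from the structure above.
EXPECTED_TOP_LEVEL = {
    "basic_info": dict,
    "education": list,
    "work_experience": list,
    "projects": list,
    "skills": dict,
    "certificates": list,
    "awards": list,
    "self_assessment": str,
}


def check_resume(data: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for key, expected in EXPECTED_TOP_LEVEL.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(
                f"{key}: expected {expected.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return problems
```

This only checks the top level; nested entries such as each item in `education` would need their own checks.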
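{
  "overall_score": 0-100,
  "dimensions": [
    {
      "name": "Core skill match",
      "score": 0-100,
      "weight": 0.4,
      "matched": ["list of matched core skills"],
      "missing": ["list of missing core skills"],
      "analysis": "detailed analysis"
    },
    {
      "name": "Job responsibility match",
      "score": 0-100,
      "weight": 0.3,
      "matched": ["matched responsibility experience"],
      "gap": "description of the responsibility gap",
      "analysis": "detailed analysis"
    },
    {
      "name": "Experience/seniority match",
      "score": 0-100,
      "weight": 0.15,
      "matched": ["matched experience points"],
      "gap": "description of the experience gap",
      "analysis": "detailed analysis"
    },
    {
      "name": "Education/background match",
      "score": 0-100,
      "weight": 0.15,
      "matched": "description of the match result",
      "gap": "description of the background gap (if any)",
      "analysis": "detailed analysis"
    }
  ],
  "overall_analysis": "summary of the overall match, stating explicitly whether the resume matches",
  "strengths": ["list of resume strengths"],
  "weaknesses": ["list of resume weaknesses; must state the core gaps explicitly"],
  "suggestions": ["list of concrete optimization suggestions"]
}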
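The four dimension weights above (0.4, 0.3, 0.15, 0.15) sum to 1.0, which suggests that `overall_score` could be a weighted sum of the dimension scores. The source does not state this, so the sketch below is an assumption about how `scripts/match_jd.py` might combine them. The function name is hypothetical.

```python
def weighted_overall_score(dimensions: list) -> float:
    """Combine per-dimension scores (0-100) using their weights.

    Assumption: overall_score is the weighted sum of dimension scores.
    """
    total_weight = sum(d["weight"] for d in dimensions)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    return round(sum(d["score"] * d["weight"] for d in dimensions), 2)


# Example with made-up scores and the weights from the structure above.
example = [
    {"score": 80, "weight": 0.4},
    {"score": 70, "weight": 0.3},
    {"score": 60, "weight": 0.15},
    {"score": 90, "weight": 0.15},
]
print(weighted_overall_score(example))
```

With these made-up scores the contributions are 32, 21, 9, and 13.5, for a total of 75.5.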
Install the dependencies before first use:
pip install PyPDF2 python-docx pytesseract pillow python-multipart
Note: the OCR feature requires the Tesseract engine to be installed.
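Since `pytesseract` is only a wrapper around the Tesseract binary, a quick pre-flight check can confirm that the engine is on the `PATH` before running `scripts/extract_image.py`. This is a minimal sketch using only the standard library. The helper name is hypothetical, and it assumes the binary is called `tesseract`.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("Tesseract not found on PATH; image resumes cannot be OCR-processed.")
```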