Analyze `.eml` headers to classify phishing vs legitimate messages and generate a schema-compliant JSON report. Use when investigating a mailbox corpus for phishing indicators under terminal-bench style output constraints.

Applies when you are given email files (`.eml`) and asked to split them into phishing vs legitimate.

Enumerate the corpus first.
- Run `ls -1 /emails` (or the target directory) and capture the exact file list.

Do a fast triage pass on key headers for every file.
- `From`, `Reply-To`, `Return-Path`
- `Authentication-Results`, `Received-SPF`, `DKIM-Signature`
- `Received`, `Date`, `Message-ID`, `Subject`

Escalate to full header inspection for ambiguous or high-risk messages.
- Print the full header block (`sed -n '1,/^$/p' file.eml`).

Classify with multi-signal logic (not single-header logic).
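
The triage and multi-signal steps can be sketched roughly as follows. This is a minimal illustration, not a definitive rule set: the helper names (`domain_of`, `triage_signals`, `classify`), the specific signals, and the threshold of two agreeing signals are all assumptions chosen for the example, and the sample message is made up.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(header_value):
    """Return the lowercased domain of an address header, or '' if absent."""
    _, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def triage_signals(raw_message):
    """Collect independent phishing signals from the key headers.

    The signals below are illustrative assumptions, not a complete list.
    """
    msg = message_from_string(raw_message)
    from_dom = domain_of(msg.get("From"))
    reply_dom = domain_of(msg.get("Reply-To"))
    return_dom = domain_of(msg.get("Return-Path"))
    auth = " ".join(msg.get_all("Authentication-Results") or []).lower()

    signals = []
    if reply_dom and reply_dom != from_dom:
        signals.append("reply-to domain mismatch")
    if return_dom and return_dom != from_dom:
        signals.append("return-path domain mismatch")
    for mech in ("spf", "dkim", "dmarc"):
        if f"{mech}=fail" in auth:
            signals.append(f"{mech} fail")
    return signals


def classify(raw_message, threshold=2):
    """Label a message 'phishing' only when several signals agree."""
    signals = triage_signals(raw_message)
    label = "phishing" if len(signals) >= threshold else "legitimate"
    return label, signals


# Made-up sample message, for illustration only.
SAMPLE = (
    "From: Support <support@bank.example>\n"
    "Reply-To: help@bank-verify.example\n"
    "Return-Path: <bounce@mailer.evil.example>\n"
    "Authentication-Results: mx.example; spf=fail; dkim=none\n"
    "Subject: Verify your account\n"
    "\n"
    "body\n"
)

label, signals = classify(SAMPLE)
print(label, signals)
```

Requiring at least two signals keeps a single noisy header, such as a mailing-list `Return-Path`, from flipping the label on its own. The threshold would need tuning against the actual corpus.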

Write output JSON exactly in required schema.
- `phishing_emails` (array of filenames)
- `legitimate_emails` (array of filenames)
- `suspicious_count` (integer)
- An incorrect `suspicious_count` can fail tests even if the reasoning is correct.

Run verification aligned to typical grader assertions:
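- File existence
  - `/solution/phishing_analysis.json` exists.
- JSON validity
  - `python3 -m json.tool /solution/phishing_analysis.json`.
- Required fields and types
  - `phishing_emails`, `legitimate_emails` (arrays of filenames)
  - `suspicious_count` (integer)
- Coverage and uniqueness
  - Each filename appears in exactly one list, and together the lists match the `/emails` file set.
- Count consistency
  - `suspicious_count == len(phishing_emails)`.
- Classification confidence check
  - Re-check uncertain files with `sed -n '1,/^$/p'` for the full header block.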
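
The schema and verification checks above could be sketched in Python along these lines. The function names (`build_report`, `verify_report`, `write_report`) are hypothetical helpers introduced for illustration; only the three field names and the count rule come from the text above.

```python
import json
from pathlib import Path


def build_report(phishing, legitimate):
    """Build the report dict; suspicious_count is derived, never typed by hand."""
    phishing = sorted(set(phishing))
    legitimate = sorted(set(legitimate))
    return {
        "phishing_emails": phishing,
        "legitimate_emails": legitimate,
        "suspicious_count": len(phishing),
    }


def verify_report(report, corpus_files):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key in ("phishing_emails", "legitimate_emails"):
        if not isinstance(report.get(key), list):
            problems.append(f"{key} must be a list")
    count = report.get("suspicious_count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(count, int) or isinstance(count, bool):
        problems.append("suspicious_count must be an integer")
    if problems:
        return problems

    phishing = report["phishing_emails"]
    combined = phishing + report["legitimate_emails"]
    if len(combined) != len(set(combined)):
        problems.append("a filename appears more than once")
    if set(combined) != set(corpus_files):
        problems.append("classified files do not match the corpus")
    if count != len(phishing):
        problems.append("suspicious_count != len(phishing_emails)")
    return problems


def write_report(report, path):
    """Write the report as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2) + "\n")
```

A typical call, assuming the paths named above, would pass `sorted(p.name for p in Path("/emails").iterdir())` as `corpus_files`, run `verify_report`, and call `write_report(report, "/solution/phishing_analysis.json")` only once the problem list is empty.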