Actions: Confirm role, level, must-haves, and the decision timeline. Identify which signals exist vs need to be created (work sample, trial, references). Record constraints (PII, internal-only, fairness).
Checks: The decision and decision date are explicit (who decides, by when, using which signals).
2) Define the bar + criteria (don’t improvise later)
Inputs: role context; existing rubric/values (if any).
Actions: Choose 4–8 criteria; define what “strong / acceptable / weak” looks like with observable anchors. Add explicit red flags. Decide whether to prioritize raw ability + drive vs “years of experience” for this role.
Outputs: Evaluation brief + draft scorecard.
Checks: Every criterion is measurable via evidence; no criterion is “vibe” or “culture fit” without definition.
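The criteria-with-anchors structure above can be sketched as data. A minimal Python sketch, where the criterion names, anchor wording, and red flags are illustrative assumptions, not a prescribed set:

```python
# Hypothetical scorecard: each criterion defines observable anchors for
# "strong / acceptable / weak" plus explicit red flags. All text below
# is placeholder content for illustration.
SCORECARD = {
    "ownership": {
        "strong": "Drove an ambiguous project end-to-end; named specific trade-offs.",
        "acceptable": "Delivered assigned scope reliably with some guidance.",
        "weak": "Could not cite a concrete example of owning an outcome.",
    },
    "communication": {
        "strong": "Explained a complex decision crisply to a non-expert.",
        "acceptable": "Clear answers, but needed prompting for structure.",
        "weak": "Vague or evasive on specifics.",
    },
}

RED_FLAGS = [
    "Takes sole credit for clearly team-level outcomes",
    "Cannot describe a failure or what changed afterward",
]

def is_measurable(criterion: dict) -> bool:
    """The 'no vibe criteria' check: all three anchors must be defined."""
    return all(criterion.get(level) for level in ("strong", "acceptable", "weak"))
```

Encoding the anchors up front is what makes the Checks line enforceable: a criterion that cannot fill all three levels is, by definition, a vibe.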
3) Build the signal plan + evidence log
Inputs: existing notes; planned stages.
Actions: Decide what each signal is responsible for (interviews = behavioral evidence; work sample = in-context execution; references = longitudinal performance). Create a single signal log so you can compare apples-to-apples.
Outputs: Signal plan + signal log table (empty or partially filled).
Checks: No single signal dominates by default; reference checks and work samples have defined weight when used.
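One way to make the signal log comparable and the "no single signal dominates" check mechanical is to fix weights before any evidence arrives. A minimal sketch, assuming a 1-4 score scale and weights that are illustrative, not recommended values:

```python
# Hypothetical signal plan: weights are set BEFORE evaluation begins,
# and each log row carries verbatim evidence so rows compare apples-to-apples.
SIGNAL_WEIGHTS = {"interviews": 0.4, "work_sample": 0.4, "references": 0.2}

signal_log = [
    {"signal": "interviews", "criterion": "ownership",
     "score": 3, "evidence": "Led the migration solo; quantified the impact."},
    {"signal": "work_sample", "criterion": "ownership",
     "score": 2, "evidence": "Scoped the task well but missed one edge case."},
]

def check_no_dominant_signal(weights: dict, cap: float = 0.5) -> bool:
    """No single signal may carry more than `cap` of the decision,
    and the weights must sum to 1."""
    return max(weights.values()) <= cap and abs(sum(weights.values()) - 1.0) < 1e-9
```

The cap value is an assumption; the point is that the check runs against the plan, not against a reviewer's memory of it.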
4) Design (or evaluate) the work sample / take-home / paid trial
Inputs: role outputs; constraints; candidate seniority.
Actions: Create a job-relevant task with clear deliverables and scoring rubric. If the task is >2–3 hours or resembles real work, prefer a paid trial and clarify IP/confidentiality boundaries.
Outputs: Work sample/trial brief + scoring rubric.
Checks: Task predicts real performance, is fair across backgrounds, and has objective scoring anchors.
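A scoring rubric with objective anchors can be sketched the same way: each dimension maps a point value to an observable outcome, so two reviewers score a submission identically. The dimensions and anchor text below are hypothetical examples:

```python
# Hypothetical work-sample rubric: points map to observable outcomes,
# not impressions. Dimension names and anchors are illustrative.
RUBRIC = {
    "correctness": {
        3: "All acceptance cases pass",
        2: "Core cases pass; edge cases missed",
        1: "Core cases fail",
    },
    "clarity": {
        3: "A reviewer can follow the work unaided",
        2: "Needs author explanation in places",
        1: "Hard to follow even with explanation",
    },
}

def score_submission(scores: dict) -> float:
    """Average the dimension scores; every rubric dimension must be scored."""
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    return sum(scores.values()) / len(scores)
```

Raising on unscored dimensions guards against the partial-scorecard failure mode: a submission is either fully scored or not scored at all.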
5) Run reference checks (highest-signal when done well)
Actions: Prioritize references who worked with the candidate for extended periods and in similar contexts. Ask for specific examples, deltas over time, strengths/limits, and “how would you staff them?” Capture verbatim evidence and calibrate for bias.
Outputs: Reference notes + reference summary.
Checks: Summary contains concrete examples and clear hire/no-hire signal, not generic praise.
6) Synthesize signals + write the decision memo
Inputs: scorecard, signal log, work sample results, reference summary.
Actions: Write a decision memo that cites evidence, calls out disagreements and uncertainty, and proposes mitigations (onboarding plan, coaching, 30/60/90 checkpoints) if hiring.
Outputs: Decision memo with a hire/no-hire recommendation.
Checks: Recommendation matches the weighted evidence; red flags are explicitly addressed.
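The "recommendation matches the weighted evidence" check can itself be sketched: combine per-signal scores using the weights fixed in the signal plan, and let unresolved red flags block a hire regardless of the weighted total. The scale, weights, and bar below are assumptions for illustration:

```python
# Hypothetical synthesis: scores on a 1-4 scale, weights from the signal
# plan (fixed before evaluation), and a hire bar chosen up front.
def recommend(signal_scores: dict, weights: dict,
              red_flags: list, bar: float = 3.0) -> str:
    """Weighted recommendation; unresolved red flags override the score."""
    if red_flags:
        return "no-hire pending resolution: " + "; ".join(red_flags)
    weighted = sum(weights[s] * score for s, score in signal_scores.items())
    return "hire" if weighted >= bar else "no-hire"
```

For example, scores of 3.5 / 3.0 / 3.0 under 0.4 / 0.4 / 0.2 weights give a weighted total of 3.2, which clears a bar of 3.0 only if no red flag is outstanding.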
7) Quality gate + calibration + finalize pack
Inputs: full draft pack.
Actions: Run references/CHECKLISTS.md and score with references/RUBRIC.md. Add Risks / Open questions / Next steps. If uncertain, propose the smallest additional signal to resolve (targeted reference, scoped trial, specific follow-up interview).
Outputs: Final Candidate Evaluation Decision Pack.
Checks: Evidence is sufficient for the decision; limitations and fairness risks are explicit.
Always include: Risks, Open questions, Next steps.
Examples
Example 1 (final decision): “Here are interview notes for a Senior PM candidate. Create a scorecard, summarize signals, and write a hiring decision memo. Include risks and suggested mitigations.”
Expected: scorecard with anchors + evidence, signal log, decision memo with explicit risks.
Example 2 (work sample + references): “We’re hiring a Founding Engineer. Design a 2-day paid trial task and rubric, plus a reference check script. Then show how we should combine those signals into a hire/no-hire decision.”
Expected: trial brief + rubric, reference kit, and a synthesis framework.
Boundary example (insufficient signal): “Tell me if this person is good. I only have their resume.”
Response: require criteria + at least one high-signal input (structured interview notes, work sample plan/results, or references); propose a minimal evaluation plan and list assumptions/unknowns.
Boundary example (redirect to interviews): “Design a structured interview loop with behavioral questions for this PM role.”
Response: redirect to conducting-interviews — this skill evaluates candidates after signals are collected; it does not design interview questions or scripts.
Boundary example (redirect to offers): “We've decided to hire this candidate. Help me structure the offer and negotiate.”
Response: redirect to negotiating-offers — this skill produces the hire/no-hire recommendation, not the offer strategy.
Anti-patterns (common failure modes)
“Gut feel” disguised as process — Having a scorecard but filling it in retrospectively to justify a decision already made. Evidence must be captured before the overall recommendation is written.
Recency bias in signal weighting — Over-weighting the most recent signal (e.g., a strong reference) while discounting earlier mixed interview signals. Use explicit weights defined before evaluation.
Work sample as unpaid labor — Designing a take-home that takes 8+ hours, uses real company data without compensation, or has unclear IP ownership. Keep tasks under 3 hours or pay for longer trials.
Reference theater — Accepting only candidate-provided references and treating generic praise (“great to work with”) as signal. Prioritize back-channel references and probe for specific examples + growth areas.
Consensus-seeking over evidence — Running debriefs where the loudest voice wins or where the group converges on a comfortable middle. Require independent scoring before group discussion.