Skill Profile

Security Grader

Name: Security Grader
Author: cxm95

Grade security scan output against a task's objectives. Triggered automatically by the CAO plugin after task completion. Use when you need to evaluate the quality, completeness, and accuracy of a security assessment, vulnerability scan, or code audit. Also triggers on "grade this", "evaluate output", "score this scan", "CAO_SCORE".

cxm95 · 0 stars · April 15, 2026

Occupation
Category: Security

Skill Content

Evaluate the quality of a security task's output and produce a normalized score.

When This Skill Runs

This skill is invoked automatically by the CAO bridge plugin after an agent completes a task that has grader_skill: security-grader in its task.yaml. You may also invoke it manually to re-grade past output.

Input

You will receive:

Task ID — identifies which task was performed
Agent Output — the full output text from the completed task
Task Description (optional) — from task.yaml, describes what the task expects

If the task description is not provided inline, fetch it:

cao_get_task(task_id) → read the "description" field from task_yaml
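
The fallback above can be sketched in Python. This is only an illustration: `resolve_description` and its `fetch_task` parameter are hypothetical names, `fetch_task` stands in for the `cao_get_task` tool, and the response shape (a mapping with a `task_yaml` entry that holds `description`) is an assumption based on the line above, not a documented contract.

```python
def resolve_description(task_id, inline_description=None, fetch_task=None):
    """Prefer the inline task description; otherwise look it up by task ID.

    fetch_task is a hypothetical callable standing in for cao_get_task.
    """
    if inline_description:
        return inline_description
    if fetch_task is None:
        return None
    task = fetch_task(task_id) or {}
    # Assumed response shape: {"task_yaml": {"description": "..."}}
    return (task.get("task_yaml") or {}).get("description")
```

If neither source yields a description, the sketch returns `None`, which leaves the grader to judge the output against whatever objective it can infer from the agent output itself.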

Grading Process


Step 2: Evaluate Output Quality

Dimension	Weight	Criteria
Completeness	0.30	Did the agent address all aspects of the task?
Accuracy	0.30	Are the findings correct? No false positives?
Actionability	0.20	Are findings specific enough to act on (file, line, fix)?
Depth	0.20	Did the agent go beyond surface-level? Root cause analysis?

Step 4: Calculate Final Score

raw_score = completeness * 0.30 + accuracy * 0.30 + actionability * 0.20 + depth * 0.20
final_score = max(0.0, min(1.0, raw_score - penalties))
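
As a minimal sketch, the two formulas above translate directly into Python. The weights are copied from the security dimension table. The function name, the dictionary layout, and the treatment of `penalties` as a single non-negative total are assumptions made for illustration.

```python
# Weights copied from the security dimension table; the rest is illustrative.
SECURITY_WEIGHTS = {
    "completeness": 0.30,
    "accuracy": 0.30,
    "actionability": 0.20,
    "depth": 0.20,
}


def final_score(dimensions, penalties=0.0, weights=SECURITY_WEIGHTS):
    """Return the weighted score minus penalties, clamped to [0.0, 1.0].

    dimensions maps each dimension name to a score between 0.0 and 1.0.
    penalties is assumed to be the total deduction, given as a positive number.
    """
    missing = set(weights) - set(dimensions)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    raw_score = sum(dimensions[name] * weight for name, weight in weights.items())
    return max(0.0, min(1.0, raw_score - penalties))


if __name__ == "__main__":
    scores = {"completeness": 0.75, "accuracy": 1.0, "actionability": 0.9, "depth": 0.5}
    print(final_score(scores, penalties=0.20))
```

The clamp matters in both directions: heavy penalties cannot push the score below 0.0, and rounding noise in the weighted sum cannot push it above 1.0.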

Step 5: Output

CAO_SCORE=<final_score as float, e.g. 0.72>

Rationale: Found 3 of 4 known SQL injection points (completeness=0.75).
All reported findings are valid (accuracy=1.0). Missed the stored XSS
in /admin/template.html (critical miss, -0.20). Findings include file
paths and line numbers (actionability=0.9). No root cause analysis
provided (depth=0.5). Raw=0.805, penalties=0.20, final=0.605.
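
The output format above can be produced and read back with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, rounding to two decimals is a choice suggested only by the `0.72` example, and the regular expression is a guess at how a consumer might extract the score, not the CAO plugin's actual parser.

```python
import re

# Assumed pattern: a line that starts with CAO_SCORE= followed by a decimal number.
_SCORE_LINE = re.compile(r"^CAO_SCORE=([0-9]*\.?[0-9]+)\s*$", re.MULTILINE)


def format_report(score, rationale):
    """Build the score line followed by a rationale paragraph."""
    clamped = max(0.0, min(1.0, score))
    # Two decimal places is an assumption based on the "0.72" example.
    return f"CAO_SCORE={clamped:.2f}\n\nRationale: {rationale}"


def parse_score(report):
    """Return the score as a float, or None if no score line is present."""
    match = _SCORE_LINE.search(report)
    return float(match.group(1)) if match else None
```

Keeping the score on its own line, with nothing else after the number, is what makes a strict pattern like this workable.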

Adaptation for Non-Security Tasks

Dimension	Weight	Criteria
Correctness	0.35	Does the output meet the requirements?
Completeness	0.25	Are all requested items addressed?
Quality	0.25	Code quality, doc clarity, best practices?
Efficiency	0.15	Reasonable approach? No unnecessary complexity?
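
One way to keep the two rubrics side by side is a pair of weight tables plus a small selector, sketched below. The weights are copied from the two dimension tables. The `is_security_task` flag and the function name are hypothetical, since the source does not say how a task is classified.

```python
import math

# Weights copied from the two dimension tables.
SECURITY_WEIGHTS = {
    "completeness": 0.30,
    "accuracy": 0.30,
    "actionability": 0.20,
    "depth": 0.20,
}
GENERAL_WEIGHTS = {
    "correctness": 0.35,
    "completeness": 0.25,
    "quality": 0.25,
    "efficiency": 0.15,
}


def select_weights(is_security_task):
    """Pick the rubric for the task type and check that its weights sum to 1.0."""
    weights = SECURITY_WEIGHTS if is_security_task else GENERAL_WEIGHTS
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("rubric weights must sum to 1.0")
    return weights
```

Because both tables sum to 1.0, a perfect score on every dimension yields 1.0 under either rubric, so scores stay comparable across task types.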

Step 1: Understand the Task Objective

Step 2: Evaluate Output Quality

Step 3: Check for Disqualifiers

Step 4: Calculate Final Score

Step 5: Output

Adaptation for Non-Security Tasks

Important Notes
