Validate Behavior Change

Name: Validate Behavior Change
Author: reubenjohn

Scientifically validate Desk changes that affect Reeve's behavior. Use BEFORE committing non-trivial changes to CLAUDE.md, skills, Goals/, or Responsibilities/. Runs isolated test pulses with positive/negative cases to verify the change produces desired behavior under realistic conditions.

reubenjohn · 0 stars · 17.02.2026

Categories: Lab Tools

Skill Content

Scientific validation for any Desk change that affects Reeve's behavior.

Core Question

"I want Reeve to behave differently. How do I know it actually will?"

When to Use

DO use when:

Adding/modifying behavior in CLAUDE.md
Creating new skills with behavioral impact
Changing Goals/ that affect Reeve's priorities
Modifying Responsibilities/ that change routine actions
Any change where you think "I hope this works"

DON'T use for:

Trivial typo fixes
Documentation-only changes
Changes with no behavioral impact

Workflow

Phase 1: Clear the Runway

list_upcoming_pulses(limit=10)

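Phase 2: Capture Baseline

git stash  # Only needed if there is uncommitted work
BASELINE=$(git log -1 --format="%H")  # Full hash of the current HEAD commit
echo "$BASELINE" > /tmp/behavior-validation-baseline.txt  # Persist it for the final reset
echo "Baseline: $BASELINE"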

Phase 3: Define Expected Behavior

**Change**: [What am I modifying?]
**Desired Behavior**: [What should happen?]
**Trigger Conditions**: [When should it activate?]
**Boundary Conditions**: [When should it NOT activate?]
**Observable Evidence**: [How do I verify it worked?]
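One way to make sure none of these five fields is skipped is to hold them in a small record and check for blanks before going further. The sketch below is a hypothetical illustration: `BehaviorSpec` and `missing_fields` are names introduced here and are not part of the skill or of Reeve's tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class BehaviorSpec:
    """Hypothetical record mirroring the five template fields above."""
    change: str = ""
    desired_behavior: str = ""
    trigger_conditions: str = ""
    boundary_conditions: str = ""
    observable_evidence: str = ""


def missing_fields(spec: BehaviorSpec) -> list[str]:
    """Return the names of template fields that are still blank."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]
```

If `missing_fields(spec)` returns a non-empty list, the expected behavior is not yet fully defined and test design would be premature.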

Phase 5: Execute & Analyze (Fail-Fast)

# Make changes, then commit so the test pulses see them
git add -A && git commit -m "VALIDATION: [description]"

schedule_pulse(scheduled_at="now", prompt=test_1, tags=["validation", "pos_1", "run_1"])
schedule_pulse(scheduled_at="in 3 minutes", prompt=test_1, tags=["validation", "pos_1", "run_2"])
schedule_pulse(scheduled_at="in 6 minutes", prompt=test_2, tags=["validation", "neg_1", "run_1"])
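The three calls above follow a pattern: every pulse is tagged with `"validation"`, a test ID, and a run number, and successive pulses are spaced three minutes apart. A minimal sketch of generating that plan follows. `build_schedule` is a hypothetical helper; it only builds the keyword arguments and does not call `schedule_pulse` itself. The default spacing and the number of runs per case are assumptions inferred from the calls above.

```python
def build_schedule(cases, runs_per_case=2, spacing_minutes=3):
    """Build keyword arguments for successive schedule_pulse calls.

    cases: list of (test_id, prompt) pairs, in the order they should run.
    Returns a list of dicts with scheduled_at, prompt, and tags.
    """
    plan = []
    slot = 0  # Position of the pulse in the overall sequence
    for test_id, prompt in cases:
        for run in range(1, runs_per_case + 1):
            # The first pulse fires immediately; later ones are staggered.
            when = "now" if slot == 0 else f"in {slot * spacing_minutes} minutes"
            plan.append({
                "scheduled_at": when,
                "prompt": prompt,
                "tags": ["validation", test_id, f"run_{run}"],
            })
            slot += 1
    return plan
```

Each dict in the returned list can then be passed as `schedule_pulse(**entry)`. Staggering the pulses keeps the runs from overlapping, so each session can be analyzed on its own.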

Task(
    subagent_type="Explore",
    model="haiku",
    prompt="""Analyze validation test {test_id}, run {run}.
    Expected: {evidence}

    Check file changes:
    - Tasks/Open.md, Knowledge/Diary/, Responsibilities/

    Use session-analyzer to check:
    - Session JSONL for tool_use calls (Write, Edit)
    - Feedback signals in user messages

    Report: PASS or FAIL with evidence."""
)
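Once the analysis reports come back, they can be scanned in order so that the first failure stops the run. The sketch below reads "fail-fast" as stopping at the first failing report, which is an interpretation rather than something the skill spells out. `first_failure` is a hypothetical helper, and it assumes each report begins with the word PASS or FAIL, as the prompt above requests.

```python
def first_failure(reports):
    """Return the tag of the first failing report, or None if all pass.

    reports: iterable of (tag, report_text) pairs in execution order.
    Assumes each report text begins with the verdict word PASS or FAIL.
    Later reports are not inspected once a FAIL is found.
    """
    for tag, text in reports:
        words = text.split()
        verdict = words[0].strip(":.,").upper() if words else ""
        if verdict == "FAIL":
            return tag
        if verdict != "PASS":
            raise ValueError(f"{tag}: report does not start with PASS or FAIL")
    return None
```

A non-`None` result means the change has not been validated; there is little point in waiting for the remaining runs before revising it.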

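Phase 6: Clean Reset

BASELINE=$(cat /tmp/behavior-validation-baseline.txt)
git reset --hard "$BASELINE"  # Discard the validation commit(s)
git status  # Confirm the working tree is clean
rm /tmp/behavior-validation-baseline.txt
# If work was stashed before capturing the baseline, restore it with: git stash pop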

Test Case Templates

Hard Positive Example

test_id: "pos_implied_action"

Related Skills

| Skill | When to Use |
| --- | --- |
| `/context-engineering` | Before designing changes - understand where information should live |
| `/session-analyzer` | After tests complete - analyze session metrics, tool usage, feedback signals |

Phase 4: Design Test Cases

| Test Type | Design Goal |
| --- | --- |
| Hard Positive | Should trigger despite: no explicit keywords, casual tone, mixed content |
| Hard Negative | Should NOT trigger despite: action words, urgency language, false patterns |
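The two test types are judged in opposite directions: a hard positive passes only if the behavior triggered, and a hard negative passes only if it did not. The sketch below makes that explicit. Both functions are hypothetical, and the `pos_` / `neg_` prefix convention is an assumption inferred from the test IDs used in this skill (`pos_1`, `neg_1`, `pos_implied_action`).

```python
def expects_trigger(test_id: str) -> bool:
    """Infer the expected outcome from the test ID prefix (assumed convention)."""
    if test_id.startswith("pos_"):
        return True
    if test_id.startswith("neg_"):
        return False
    raise ValueError(f"test_id must start with 'pos_' or 'neg_': {test_id!r}")


def verdict(test_id: str, triggered: bool) -> str:
    """Return PASS when the observed behavior matches the expectation."""
    return "PASS" if triggered == expects_trigger(test_id) else "FAIL"
```

For example, `verdict("neg_1", True)` is a failure: the behavior fired on a case designed to look tempting but fall outside the trigger conditions.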

