Guidance for creating, running, fixing, and promoting behavioral evaluations. Use when verifying agent decision logic, debugging failures or prompt steering, or adding workspace regression tests.
Behavioral evaluations (evals) are tests that validate the agent's decision-making (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions.
> [!NOTE]
> **Single source of truth:** for core concepts, policies, running tests, and general best practices, always refer to `evals/README.md`.
- Choose the harness: `appEvalTest` (AppRig) or `evalTest` (TestRig). See creating.md.
- Start new tests as `USUALLY_PASSES`; promote stable ones to `ALWAYS_PASSES` (locks in the regression).
- Seed the workspace with the necessary files using the `files` object to simulate a realistic scenario (e.g., a NodeJS project with a `package.json`).
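The points above can be sketched as a minimal test definition. This is an illustrative stand-in, not the real harness API: the `EvalCase` shape, the local `evalTest` function, and the test name/prompt are all assumptions made for the example; the actual `evalTest` signature lives in creating.md.

```typescript
// Sketch only — the real evalTest/files API may differ; see creating.md.
type EvalStability = "USUALLY_PASSES" | "ALWAYS_PASSES";

interface EvalCase {
  name: string;
  stability: EvalStability;
  files: Record<string, string>; // workspace seed: path -> file contents
  prompt: string;
}

// Minimal stand-in for the harness's evalTest registration.
const registered: EvalCase[] = [];
function evalTest(c: EvalCase): void {
  registered.push(c);
}

evalTest({
  name: "prefers package.json scripts over ad-hoc commands", // hypothetical scenario
  stability: "USUALLY_PASSES", // promote to ALWAYS_PASSES once stable
  files: {
    // Seed a realistic NodeJS workspace.
    "package.json": JSON.stringify({ scripts: { test: "vitest run" } }),
  },
  prompt: "Run the project's test suite.",
});
```

The `files` object is the important part: without a plausible workspace, the agent's tool choices are not representative of real sessions.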
Audit agent decisions using `rig.setBreakpoint()` (AppRig only), or by verifying the call order (indices) in `rig.readToolLogs()`.
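Index verification can be sketched as follows. The log-entry shape and the tool names (`read_file`, `edit_file`, `run_tests`) are assumptions for illustration; the real return type of `rig.readToolLogs()` may differ.

```typescript
// Assumed log shape — the actual rig.readToolLogs() type may differ.
interface ToolLog {
  index: number;
  tool: string;
}

// Stand-in for the result of rig.readToolLogs().
const logs: ToolLog[] = [
  { index: 0, tool: "read_file" },
  { index: 1, tool: "edit_file" },
  { index: 2, tool: "run_tests" },
];

// First call index of a tool; throws if the agent never used it.
function firstIndex(tool: string): number {
  const hit = logs.find((l) => l.tool === tool);
  if (!hit) throw new Error(`tool ${tool} was never called`);
  return hit.index;
}

// Behavioral assertion: the agent read before editing, and edited before testing.
if (
  !(firstIndex("read_file") < firstIndex("edit_file") &&
    firstIndex("edit_file") < firstIndex("run_tests"))
) {
  throw new Error("unexpected tool order");
}
```

Asserting on relative order rather than exact indices keeps the test robust to harmless extra tool calls.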
Run individual tests locally with Vitest, and confirm they are stable before relying on CI workflows.
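A local stability check might look like the following; the file path is a hypothetical example, and your workspace may wrap Vitest behind its own script.

```shell
# Run one eval file in isolation (path is illustrative).
npx vitest run evals/tool-choice.eval.test.ts

# Re-run a few times before promoting USUALLY_PASSES -> ALWAYS_PASSES;
# stop at the first failure.
for i in 1 2 3 4 5; do
  npx vitest run evals/tool-choice.eval.test.ts || break
done
```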
Detailed procedural guides: