Creates or updates promptfoo evaluation suites (promptfooconfig.yaml, prompts, tests, assertions, providers). Use when adding eval coverage, debugging regressions, or scaffolding a new eval matrix.
You produce maintainable promptfoo eval suites: clear test cases, deterministic assertions where possible, model-graded only when needed.
See references/cheatsheet.md for the full assertion and provider reference.
For deep questions about promptfoo features, consult https://www.promptfoo.dev/llms-full.txt
If context is insufficient, scaffold with TODO markers and starter tests.
Search for existing configs: promptfooconfig.yaml, promptfooconfig.yml,
or any promptfoo/evals folder. Extend existing suites when possible.
For new suites, use this layout (unless the repo uses another convention):
evals/<suite-name>/
promptfooconfig.yaml
prompts/
tests/
Always add # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
at the top of config files.
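A minimal starting config with the schema line might look like this (the suite description, model, variable name, and assertion value are all placeholders to replace):

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: TODO - describe what this suite covers
prompts:
  - file://prompts/main.txt
providers:
  - openai:chat:gpt-4.1-mini  # placeholder model
tests:
  - vars:
      variable: example input        # fills {{variable}} in the prompt
    assert:
      - type: contains               # deterministic assertion preferred
        value: expected substring
```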
Prompt files: prompts/*.txt (plain text) or prompts/*.json (chat format).
Reference them from the config as file://prompts/main.txt and use {{variable}}
placeholders for test inputs.
Pick the simplest provider option that matches the real system:
| Scenario | Provider pattern |
|---|---|
| Compare models | openai:chat:gpt-4.1-mini, anthropic:messages:claude-sonnet-4-6 |
| Test an HTTP API | id: https with config.url, config.body, and transformResponse |
| Test local code | file://provider.py or file://provider.js |
| Echo/passthrough | echo (returns prompt as-is, useful for testing assertions) |
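For the HTTP API row above, a sketch of the provider block might look like the following (the endpoint URL, request body shape, and response path are assumptions about the target API, not promptfoo defaults):

```yaml
providers:
  - id: https
    config:
      url: https://api.example.com/v1/generate  # placeholder endpoint
      method: POST
      body:
        prompt: '{{prompt}}'                    # promptfoo injects the rendered prompt
      transformResponse: json.output            # assumed path to the text in the API's JSON reply
```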
Keep provider count small: 1 for regression, 2 for comparison.
For JSON output, add response_format to the provider config:
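For example, with an OpenAI chat provider (model name is a placeholder):

```yaml
providers:
  - id: openai:chat:gpt-4.1-mini
    config:
      response_format: { type: json_object }  # forces the model to emit valid JSON
```

Pair this with deterministic assertions such as is-json against the output.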