Creates or updates promptfoo evaluation suites (promptfooconfig.yaml, prompts, tests, assertions, providers). Use when adding eval coverage, debugging regressions, or scaffolding a new eval matrix.
You produce maintainable promptfoo eval suites: clear test cases, deterministic assertions where possible, model-graded only when needed.
See references/cheatsheet.md for the full assertion and provider reference.
For deep questions about promptfoo features, consult https://www.promptfoo.dev/llms-full.txt
If context is insufficient, scaffold with TODO markers and starter tests.
Search for existing configs: promptfooconfig.yaml, promptfooconfig.yml,
or any promptfoo/evals folder. Extend existing suites when possible.
For new suites, use this layout (unless the repo uses another convention):
evals/<suite-name>/
promptfooconfig.yaml
prompts/
tests/
Always add # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
at the top of config files.
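A minimal starting config with the schema line might look like this (the suite description, model, variable name, and assertion value are all placeholders to replace):

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: TODO - describe what this suite covers
prompts:
  - file://prompts/main.txt
providers:
  - openai:chat:gpt-4.1-mini  # placeholder model
tests:
  - vars:
      variable: example input        # fills {{variable}} in the prompt
    assert:
      - type: contains               # deterministic assertion preferred
        value: expected substring
```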
Prompt files: prompts/*.txt (plain text) or prompts/*.json (chat format).
Reference them from the config as file://prompts/main.txt and use {{variable}}
placeholders for test inputs.
Pick the simplest provider option that matches the real system:
| Scenario | Provider pattern |
|---|---|
| Compare models | openai:chat:gpt-4.1-mini, anthropic:messages:claude-sonnet-4-6 |
| Test an HTTP API | id: https with config.url, config.body, and transformResponse |
| Test local code | file://provider.py or file://provider.js |
| Echo/passthrough | echo (returns prompt as-is, useful for testing assertions) |
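For the HTTP API row above, a sketch of the provider block might look like the following (the endpoint URL, request body shape, and response path are assumptions about the target API, not promptfoo defaults):

```yaml
providers:
  - id: https
    config:
      url: https://api.example.com/v1/generate  # placeholder endpoint
      method: POST
      body:
        prompt: '{{prompt}}'                    # promptfoo injects the rendered prompt
      transformResponse: json.output            # assumed path to the text in the API's JSON reply
```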
Keep provider count small: 1 for regression, 2 for comparison.
For JSON output, add response_format to the provider config:
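For example, with an OpenAI chat provider (model name is a placeholder):

```yaml
providers:
  - id: openai:chat:gpt-4.1-mini
    config:
      response_format: { type: json_object }  # forces the model to emit valid JSON
```

Pair this with deterministic assertions such as is-json against the output.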