Write LLM evaluation spec files with datasets, tasks, and evaluators using the @kbn/evals Playwright fixture. Use when authoring new eval specs, adding datasets or evaluators, or debugging evaluation test failures.
Eval specs use the evaluate Playwright fixture (not test). A spec file follows this structure:
```typescript
import { evaluate, tags, selectEvaluators, type Example, type TaskOutput } from '@kbn/evals';

evaluate.describe('Suite name', { tag: tags.serverless.observability.complete }, () => {
  evaluate.beforeAll(async ({ fetch, log }) => {
    // one-time setup: install docs, create agents, load archives
  });

  evaluate.afterAll(async ({ fetch, log }) => {
    // teardown: uninstall docs, delete agents, unload archives
  });

  evaluate('test name', async ({ executorClient, connector }) => {
    await executorClient.runExperiment(
      { dataset, task },
      evaluators
    );
  });
});
```
When a suite has a custom src/evaluate.ts, import from there instead of @kbn/evals:
```typescript
import { evaluate } from '../src/evaluate';
```
Every evaluate.describe must have a tag. Common choices:
| Tag | When to use |
|---|---|
| tags.serverless.observability.complete | Observability domain evals |
| tags.serverless.security.complete | Security domain evals |
| tags.serverless.search | Search domain evals |
| tags.stateful.classic | Stateful-only evals |
Import tags from @kbn/scout or @kbn/evals (re-exported).
A dataset is an array of examples with typed input, output (expected), and optional metadata:
```typescript
type MyExample = Example<
  { question: string },
  { expectedAnswer: string },
  { tags?: string[] }
>;

const dataset = {
  name: 'my-dataset',
  description: 'What this dataset tests',
  examples: [
    {
      input: { question: 'What is 2+2?' },
      output: { expectedAnswer: '4' },
      metadata: { tags: ['math'] },
    },
  ],
};
```
Keep datasets focused. For local iteration, use --grep to run a subset:
```shell
node scripts/evals start --grep "my test name"
```
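Evaluators passed to runExperiment score each task output against the example's expected output. A minimal self-contained sketch of an exact-match evaluator for the dataset above (the shape shown — a name plus an evaluate function returning a score and explanation — is an illustrative assumption, not the confirmed @kbn/evals evaluator interface):

```typescript
// Hypothetical evaluator shape: field names (name, evaluate, score,
// explanation) are assumptions for illustration, not the confirmed
// @kbn/evals interface.
interface EvaluationResult {
  score: number; // 1 = pass, 0 = fail
  explanation?: string;
}

interface QAOutput {
  answer: string;
}

interface QAExpected {
  expectedAnswer: string;
}

const exactAnswerEvaluator = {
  name: 'exact-answer',
  evaluate({ output, expected }: { output: QAOutput; expected: QAExpected }): EvaluationResult {
    // Trim whitespace so incidental formatting doesn't fail the example.
    const pass = output.answer.trim() === expected.expectedAnswer.trim();
    return {
      score: pass ? 1 : 0,
      explanation: pass
        ? 'answer matches expected'
        : `expected "${expected.expectedAnswer}", got "${output.answer}"`,
    };
  },
};
```

For fuzzy expectations, the same shape can wrap an LLM-judge call instead of a string comparison, returning a graded score rather than 0/1.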
The task function receives an example and returns the output to evaluate: