Name: Perform Agent Builder Eval
Author: elastic


yarn es snapshot --license trial --secure-files gcs.client.default.credentials_file=<GCS_CREDENTIALS_PATH>

MAX_RETRIES=30
COUNT=0
until curl -s -u elastic:changeme http://localhost:9200/_cluster/health | grep -q '"status"'; do
  COUNT=$((COUNT+1))
  if [ "$COUNT" -ge "$MAX_RETRIES" ]; then
    echo "ERROR: Elasticsearch did not become available after $MAX_RETRIES attempts"
    exit 1
  fi
  sleep 5
done
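
The polling loop above can be wrapped in a small reusable function so the same retry logic can be pointed at other services in the stack. This is a minimal sketch, not part of the original skill: `wait_for` and `es_is_up` are hypothetical helper names, and the retry count and sleep interval simply mirror the values used above.

```shell
# wait_for MAX_RETRIES SLEEP_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_RETRIES attempts have failed.
# Note: POSIX sh has no local variables, so these names are global.
wait_for() {
  max_retries=$1
  sleep_seconds=$2
  shift 2
  count=0
  until "$@" >/dev/null 2>&1; do
    count=$((count + 1))
    if [ "$count" -ge "$max_retries" ]; then
      echo "ERROR: '$*' did not succeed after $max_retries attempts" >&2
      return 1
    fi
    sleep "$sleep_seconds"
  done
}

# Same check as the loop above: any response containing "status" counts as up.
es_is_up() {
  curl -s -u elastic:changeme http://localhost:9200/_cluster/health | grep -q '"status"'
}

# Usage (not executed here): wait_for 30 5 es_is_up
```

Returning a non-zero status instead of calling `exit` lets the caller decide whether a timeout should abort the whole setup.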

curl -s -u elastic:changeme -X PUT "http://localhost:9200/_snapshot/agent-builder-datasets" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "gcs",
    "settings": {
      "bucket": "agent-builder-datasets",
      "base_path": "knowledge_base/snapshot_dt=2026-01-10"
    }
  }'
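
Because `curl -s` stays silent on API-level errors, it can help to check the response body before moving on. The sketch below assumes the repository registration returns a JSON body containing `"acknowledged": true` on success; `is_acknowledged` is a hypothetical helper name, not something the skill defines.

```shell
# Reads an Elasticsearch JSON response on stdin and succeeds only if it
# contains "acknowledged": true (whitespace around the colon is tolerated).
is_acknowledged() {
  grep -q '"acknowledged"[[:space:]]*:[[:space:]]*true'
}

# Usage (not executed here): pipe the PUT request shown above into the check.
#   curl -s -u elastic:changeme -X PUT "http://localhost:9200/_snapshot/agent-builder-datasets" ... \
#     | is_acknowledged || echo "Repository registration failed" >&2
```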

curl -s -u elastic:changeme "http://localhost:9200/_snapshot/agent-builder-datasets/_all"
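
To pick a value for `<snapshot_name>` in the restore step, the snapshot names can be pulled out of the listing response. This is a rough sketch that assumes each entry in the response carries a `"snapshot"` field holding its name; `snapshot_names` is a hypothetical helper, and a JSON-aware tool would be more robust than `grep`/`sed` if one is available.

```shell
# Reads the snapshot listing JSON on stdin and prints one snapshot name per line.
# Matches only the exact key "snapshot", so the enclosing "snapshots" array key
# is not picked up.
snapshot_names() {
  grep -o '"snapshot"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (not executed here):
#   curl -s -u elastic:changeme "http://localhost:9200/_snapshot/agent-builder-datasets/_all" | snapshot_names
```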

curl -s -u elastic:changeme -X POST "http://localhost:9200/_snapshot/agent-builder-datasets/<snapshot_name>/_restore" \
  -H "Content-Type: application/json" \
  -d '{
    "indices": "*",
    "include_global_state": false
  }'

curl -s -u elastic:changeme -X POST "http://localhost:9200/<comma_separated_index_names>/_close"
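
The `<comma_separated_index_names>` placeholder can be built from a list of names, one per line. A minimal sketch follows; `join_indices` is a hypothetical helper, and the index names in the usage comment are made up for illustration.

```shell
# Joins index names given one per line on stdin into a comma-separated list.
join_indices() {
  paste -sd, -
}

# Usage (not executed here; index names are placeholders):
#   INDICES=$(printf '%s\n' index-one index-two | join_indices)
#   curl -s -u elastic:changeme -X POST "http://localhost:9200/${INDICES}/_close"
```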

yarn start --no-base-path

ELASTICSEARCH_HOST=http://localhost:9200 ELASTICSEARCH_USERNAME=elastic ELASTICSEARCH_PASSWORD=changeme node scripts/edot_collector.js

Stack is ready!

Elasticsearch: running (snapshot with GCS credentials)

Kibana: running (no base path)

Phoenix: confirmed running

EDOT: running

Important: Make sure Cloud Connected Mode (CCM) is enabled in Kibana before running the evaluation. Go to Stack Management > Cloud Connected Mode in the Kibana UI and enable it if it is not already active.

Run the following command in a separate terminal to start the evaluation:
TRACING_ES_URL=http://elastic:changeme@localhost:9200 \
SELECTED_EVALUATORS="<value>" \
RAG_EVAL_K=<value> \
KBN_EVALS_EXECUTOR=phoenix \
EVALUATION_CONNECTOR_ID=<value> \
DATASET_NAME="<value>" \
EVALUATION_REPETITIONS=<value> \
KBN_EVALS_SKIP_CONNECTOR_SETUP=true \
node scripts/playwright test \
  --config x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts \
  evals/external/external_dataset.spec.ts \
  --project <value>
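
Since every `<value>` placeholder in the command above has to be filled in by hand, a quick pre-flight check can catch an empty one before Playwright starts. This sketch assumes the variables are exported in the shell first rather than passed inline; `require_vars` is a hypothetical helper, and treating all of the listed variables as required is an assumption.

```shell
# Prints the names of any listed variables that are unset or empty and
# returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not executed here):
#   require_vars SELECTED_EVALUATORS RAG_EVAL_K EVALUATION_CONNECTOR_ID \
#     DATASET_NAME EVALUATION_REPETITIONS || exit 1
```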

# Kill Elasticsearch
pkill -f 'elasticsearch' || true

# Kill Kibana (node process started by yarn start)
pkill -f 'scripts/kibana --dev' || true

# Kill EDOT collector
pkill -f 'edot_collector' || true

Perform Agent Builder Eval

Perform Agent Builder Evaluation

Action: `init`

Step 1: Prompt for GCS Credentials

Step 2: Launch Elasticsearch

Step 3: Register GCS Snapshot Repository

Step 4: Restore a Snapshot

Step 5: Launch Kibana

Step 6: Confirm Phoenix Running

Step 7: Launch EDOT

Step 8: Collect Eval Parameters and Output Run Command

8a: Discover available connectors

8b: Select evaluation connector (judge)

8c: Select project (model to evaluate)

8d: Select dataset

8e: Output the run command

Action: `stop`

Step 1: Kill Processes

Step 2: Confirm

Important Notes
