Write Playwright E2E tests for the Phoenix AI observability platform. Use when creating, updating, or debugging Playwright tests, or when the user asks about testing UI features, writing E2E tests, or automating browser interactions for Phoenix.
Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.
Test files are located in app/tests.
Timeouts are configured in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";
test.describe("Feature Name", () => {
test.beforeEach(async ({ page }) => {
await page.goto(`/login`);
await page.getByLabel("Email").fill("admin@localhost");
await page.getByLabel("Password").fill("admin123");
await page.getByRole("button", { name: "Log In", exact: true }).click();
await page.waitForURL("**/projects");
});
test("can do something", async ({ page }) => {
// Test implementation
});
});
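The login steps in beforeEach tend to repeat across spec files, so one option is to factor them into a helper. The sketch below is an assumption, not an existing Phoenix utility: login, Credentials, and LoginPageLike are hypothetical names, and the small structural interfaces stand in for Playwright's Page so the snippet compiles on its own.

```typescript
// Minimal structural types covering only the calls used by the helper.
// Hypothetical stand-ins for Playwright's Page and Locator.
interface FillableLike {
  fill(value: string): Promise<void>;
}
interface ClickableLike {
  click(): Promise<void>;
}
interface LoginPageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(label: string): FillableLike;
  getByRole(
    role: "button",
    options: { name: string; exact?: boolean },
  ): ClickableLike;
  waitForURL(pattern: string): Promise<void>;
}

interface Credentials {
  email: string;
  password: string;
}

// Admin credentials as listed in this document.
const ADMIN: Credentials = { email: "admin@localhost", password: "admin123" };

// Performs the same steps as the beforeEach hook in the template above.
async function login(
  page: LoginPageLike,
  creds: Credentials = ADMIN,
): Promise<void> {
  await page.goto("/login");
  await page.getByLabel("Email").fill(creds.email);
  await page.getByLabel("Password").fill(creds.password);
  await page.getByRole("button", { name: "Log In", exact: true }).click();
  // Assumes a successful login lands on the projects page, as in the template.
  await page.waitForURL("**/projects");
}
```

In a spec file, the body of beforeEach would then reduce to a single await login(page) call, assuming the real Page object satisfies the interface above.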
| User | Email | Password | Role |
|---|---|---|---|
| Admin | admin@localhost | admin123 | admin |
| Member | [email protected] | member123 | member |
| Viewer | [email protected] | viewer123 | viewer |
Role selectors (most robust):
page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "my-item" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });
Label selectors:
page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");
Text selectors:
page.getByText("No evaluators added");
page.getByPlaceholder("Search...");
Test IDs (when available):
page.getByTestId("modal");
CSS locators (last resort):
page.locator('button:has-text("Save")');
// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();
// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
.getByRole("menuitem", { name: "Use LLM evaluator template" })
.hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();
// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();
// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();
// Find row by cell content
const row = page.getByRole("row").filter({
has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();
await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
"aria-selected",
"true",
);
// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
.locator("..")
.locator("..")
.getByRole("textbox");
await systemTextbox.fill("content");
Use test.describe.serial when tests depend on each other:
test.describe.serial("Workflow", () => {
const itemName = `item-${randomUUID()}`;
test("step 1: create item", async ({ page }) => {
// Creates itemName
});
test("step 2: edit item", async ({ page }) => {
// Uses itemName from previous test
});
test("step 3: verify edits", async ({ page }) => {
// Verifies itemName was edited
});
});
// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();
// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");
// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");
// Input values
await expect(input).toHaveValue("expected value");
// URL
await page.waitForURL("**/datasets/**/examples");
// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");
// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");
// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";
// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
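The ID extraction above can be wrapped in a small function so each test does not repeat the regex. This is a sketch only: extractDatasetId is a hypothetical name, and returning null instead of an empty string on a miss is a design choice made here so that a failed match cannot silently produce a URL like /playground?datasetId=.

```typescript
// Returns the path segment that follows "/datasets/", or null if absent.
// The segment stops at the next "/", "?", or "#".
function extractDatasetId(url: string): string | null {
  const match = url.match(/\/datasets\/([^/?#]+)/);
  return match ? match[1] : null;
}
```

A test could then fail fast with a clear message, for example by asserting that extractDatasetId(page.url()) is not null before navigating to the playground.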
Before running Playwright tests, build the app so E2E runs against the latest frontend changes:
pnpm run build
# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium
# Run with UI mode
pnpm exec playwright test --ui
# Run specific test by name
pnpm exec playwright test -g "can create"
# Debug mode
pnpm exec playwright test --debug
When the HTML reporter is enabled, Playwright can serve the report after tests finish (by default only when a test fails) and wait for Ctrl+C, which can cause command timeouts. Use these options to avoid this:
# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list
# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot
# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium
Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.
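For scripts that launch Playwright programmatically, the recommendation above could be encoded in a small argument builder that always sets a non-interactive reporter. This is a sketch under assumptions: buildPlaywrightArgs and RunOptions are hypothetical names, and only the flags already shown in this document are used.

```typescript
interface RunOptions {
  testFile?: string; // e.g. "tests/example.spec.ts"
  project?: string; // e.g. "chromium"
  grep?: string; // test name filter, passed to -g
  reporter?: "list" | "dot"; // defaults to "list"
}

// Builds the argument list for `pnpm`, always including a reporter
// that exits cleanly instead of serving the HTML report.
function buildPlaywrightArgs(options: RunOptions = {}): string[] {
  const args = ["exec", "playwright", "test"];
  if (options.testFile) args.push(options.testFile);
  if (options.project) args.push(`--project=${options.project}`);
  if (options.grep) args.push("-g", options.grep);
  args.push(`--reporter=${options.reporter ?? "list"}`);
  return args;
}
```

The resulting array can be passed to a process launcher such as Node's child_process.spawnSync with "pnpm" as the command; setting CI=1 in the child environment is an optional extra safeguard.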
| Page | URL Pattern | Key Elements |
|---|---|---|
| Datasets | /datasets | Table, "New Dataset" button |
| Dataset Detail | /datasets/{id}/examples | Tabs (Experiments, Examples, Evaluators, Versions) |
| Dataset Evaluators | /datasets/{id}/evaluators | "Add evaluator" button, evaluators table |
| Playground | /playground | Prompts section, Experiment section |
| Playground + Dataset | /playground?datasetId={id} | Dataset selector, Evaluators button |
| Prompts | /prompts | "New Prompt" button, prompts table |
| Settings | /settings/general | "Add User" button, users table |
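The URL patterns in the table could be kept in one place as route builders, so a path change only needs to be fixed once. The routes object below is a hypothetical helper sketched from the table, not something that exists in the Phoenix test suite.

```typescript
// Route builders mirroring the URL patterns in the table above.
const routes = {
  datasets: () => "/datasets",
  datasetExamples: (id: string) =>
    `/datasets/${encodeURIComponent(id)}/examples`,
  datasetEvaluators: (id: string) =>
    `/datasets/${encodeURIComponent(id)}/evaluators`,
  // With a dataset ID, adds the datasetId query parameter.
  playground: (datasetId?: string) =>
    datasetId
      ? `/playground?datasetId=${encodeURIComponent(datasetId)}`
      : "/playground",
  prompts: () => "/prompts",
  settings: () => "/settings/general",
} as const;
```

A test would then call, for example, page.goto(routes.datasetEvaluators(datasetId)) instead of assembling the path inline.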
When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.
# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"
# Get interactive snapshot with element refs
agent-browser snapshot -i
# Click using refs from snapshot
agent-browser click @e5
# Fill form fields
agent-browser fill @e2 "test value"
# Get element text
agent-browser get text @e1
# Open the page
agent-browser open "http://localhost:6006/datasets"
# Take an interactive snapshot
agent-browser snapshot -i
# Find the target element in the snapshot output (e.g. @e1 [button] "New Dataset")
# Click it using its ref
agent-browser click @e1
# Snapshot again to inspect the resulting state
agent-browser snapshot -i
| agent-browser output | Playwright selector |
|---|---|
| @e1 [button] "Save" | page.getByRole("button", { name: "Save" }) |
| @e2 [link] "Datasets" | page.getByRole("link", { name: "Datasets" }) |
| @e3 [textbox] "Name" | page.getByRole("textbox", { name: "Name" }) |
| @e4 [menuitem] "Edit" | page.getByRole("menuitem", { name: "Edit" }) |
| @e5 [tab] "Evaluators 0" | page.getByRole("tab", { name: /Evaluators/i }) |
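The mapping in the table is mechanical enough to sketch as a function. The version below assumes snapshot lines have exactly the shape shown in the table (@ref [role] "name"); toPlaywrightSelector is a hypothetical helper, and the rule of switching to a case-insensitive regex when the name ends in a count (as in "Evaluators 0") is inferred from the last table row.

```typescript
// Converts one snapshot line, e.g. `@e1 [button] "Save"`, into the
// source text of a Playwright role selector. Returns null if the
// line does not have the assumed shape.
function toPlaywrightSelector(snapshotLine: string): string | null {
  const match = snapshotLine.trim().match(/^@\w+ \[(\w+)\] "(.*)"$/);
  if (!match) return null;
  const [, role, name] = match;

  // Names with a trailing count ("Evaluators 0") change as data changes,
  // so match only the label part with a case-insensitive regex.
  const counted = name.match(/^(.*\S)\s+\d+$/);
  if (counted) {
    const escaped = counted[1].replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
    return `page.getByRole(${JSON.stringify(role)}, { name: /${escaped}/i })`;
  }
  return `page.getByRole(${JSON.stringify(role)}, { name: ${JSON.stringify(name)} })`;
}
```

The output is a string meant to be pasted into a test, not a live locator, so it is worth checking each generated selector against the running page.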
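- Feature tests: {feature-name}.spec.ts
- Access control tests: {role}-access.spec.ts
- Rate limit tests: {feature}.rate-limit.spec.ts (runs last)
- Use .first(), .last(), or .nth(n) when a locator matches multiple elements
- Use { name: /pattern/i } for flexible, case-insensitive name matching
- Prefer waitForURL over waitForTimeout
Don't assume parallelism is the problem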
waitForTimeout is almost always wrong
page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests.
// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();
// ✅ GOOD - waits for actual state
await element.waitFor({ state: "visible" });
await element.click();
Test the actual failure before fixing
Phoenix test infrastructure is solid
Tests use randomUUID() for data isolation - this works well
When tests are flaky:
Run with parallelism multiple times to catch intermittent failures:
for i in 1 2 3 4 5; do
pnpm exec playwright test --project=chromium --reporter=dot
done
Look for waitForTimeout usage - replace with proper waits:
grep -r "waitForTimeout" app/tests/
Check for race conditions in element interactions:
- Avoid page.waitForLoadState("networkidle")
- Use waitForURL after navigation actions
Verify selectors are stable:
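Run with tracing enabled to see what happened (on-first-retry records a trace only when a failed test is retried, so retries must be enabled; use --trace retain-on-failure to keep traces for failed tests without retries):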
pnpm exec playwright test --trace on-first-retry
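The waitForTimeout search from the checklist above can also be run on a file's contents in TypeScript, for instance inside a lint-style script. This is a minimal sketch: findFragileWaits and FragileWait are hypothetical names, and the pattern list covers only the two calls this document flags as flaky.

```typescript
interface FragileWait {
  line: number; // 1-based line number in the source text
  pattern: string; // which fragile pattern matched
  text: string; // the trimmed source line
}

// Substrings this document identifies as common sources of flakiness.
const FRAGILE_PATTERNS = ["waitForTimeout", "networkidle"];

// Scans source text line by line and reports every fragile wait found.
function findFragileWaits(source: string): FragileWait[] {
  const findings: FragileWait[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const pattern of FRAGILE_PATTERNS) {
      if (text.includes(pattern)) {
        findings.push({ line: index + 1, pattern, text: text.trim() });
      }
    }
  });
  return findings;
}
```

This is plain substring matching, so it will also flag occurrences inside comments; that is usually acceptable for a quick audit.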
| Flaky Pattern | Root Cause | Fix |
|---|---|---|
| Submenu item not found | Using getByText() instead of getByRole() | Use getByRole("menuitem", { name: /pattern/i }) for submenu items |
| Menu click fails | Menu not fully rendered | await menu.waitFor({ state: "visible" }) before click |
| Dialog assertion fails | Dialog animation not complete | Assert specific completion signal (hidden dialog + next-state element) |
| Navigation timeout | Page still loading | Remove waitForLoadState("networkidle") - it's flaky in CI |
| Element not found | Dynamic content loading | Wait for element visibility, not arbitrary timeout |
| Stale element | Re-render between locate and click | Store locator, not element handle |
Use proper waits:
// Wait for element state: "visible", "hidden", "attached", or "detached"
await element.waitFor({ state: "visible" });
// Wait for a load state: "load" or "domcontentloaded" (avoid "networkidle", it's flaky in CI)
await page.waitForLoadState("domcontentloaded");
// Wait for URL change
await page.waitForURL("**/expected-path");
Use unique test data:
const uniqueName = `test-${randomUUID()}`;
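When several kinds of entities are created in one suite, a prefix makes leftover test data easier to identify. The helper below is a small sketch; uniqueName as a function is a hypothetical name, while randomUUID comes from Node's crypto module as in the test template.

```typescript
import { randomUUID } from "crypto";

// Returns a collision-resistant name such as "dataset-<uuid>".
function uniqueName(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}
```

For example, uniqueName("dataset") and uniqueName("prompt") keep names unique across parallel workers while still showing which test created them.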
Prefer role selectors - they're less brittle:
page.getByRole("button", { name: "Save" }) // ✅ Good
page.locator('button.save-btn') // ❌ Brittle
Don't fight animations - wait for them:
await expect(dialog).not.toBeVisible();
Verify URL changes after navigation:
await page.waitForURL("**/datasets");