Phoenix Playwright Tests

Name: Phoenix Playwright Tests
Author: Arize-ai

Write Playwright E2E tests for the Phoenix AI observability platform. Use when creating, updating, or debugging Playwright tests, or when the user asks about testing UI features, writing E2E tests, or automating browser interactions for Phoenix.

Arize-ai · 9,333 stars · Feb 17, 2026

Categories: Testing

Phoenix Playwright Test Writing

Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.

Timeout Policy

Do not pass timeout args in test code under app/tests.
Tune timing centrally in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
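As a rough illustration of where those four settings live, here is a sketch of `app/playwright.config.ts`. Every number is a placeholder, and the `testDir`, `baseURL`, and `webServer.command` values are assumptions (the port is taken from the dev-server note in the agent-browser section); compare against the repository's real config before changing anything.

```typescript
import { defineConfig } from "@playwright/test";

// Sketch only: all values are placeholders, not Phoenix's real settings.
export default defineConfig({
  testDir: "./tests",
  // Per-test timeout in milliseconds (globalTimeout, if set, caps the whole run instead).
  timeout: 60_000,
  expect: {
    // Default timeout for web-first assertions such as toBeVisible().
    timeout: 10_000,
  },
  use: {
    // Assumed base URL so that page.goto("/login") resolves against the local server.
    baseURL: "http://localhost:6006",
    // Upper bound for navigations such as page.goto().
    navigationTimeout: 30_000,
  },
  webServer: {
    // Hypothetical command: use whatever starts Phoenix in your checkout.
    command: "pnpm run dev",
    url: "http://localhost:6006",
    reuseExistingServer: !process.env.CI,
    // How long to wait for the server to become ready before failing the run.
    timeout: 120_000,
  },
});
```

Spec files then inherit these limits, which is what lets them omit per-call `timeout` options.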

Quick Start

```
import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});
```
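When several spec files need the same login, the Quick Start steps can be pulled into one shared helper. The sketch below is one possible shape, not Phoenix's actual code: `login`, `ADMIN`, `Credentials`, and the structural `LoginPage` type are hypothetical names. Typing the parameter structurally keeps the helper free of imports; a real Playwright `Page` should satisfy it, and so can a small fake in a unit test.

```typescript
// Minimal structural types: a Playwright Page (or a test double) fits these.
export interface Fillable {
  fill(value: string): Promise<void>;
}
export interface Clickable {
  click(): Promise<void>;
}
export interface LoginPage {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): Fillable;
  getByRole(role: "button", options: { name: string; exact: boolean }): Clickable;
  waitForURL(url: string): Promise<void>;
}

export interface Credentials {
  email: string;
  password: string;
}

// Hypothetical default, taken from the admin row of the test credentials table.
export const ADMIN: Credentials = { email: "admin@localhost", password: "admin123" };

// Same steps as the Quick Start beforeEach, factored out for reuse.
export async function login(page: LoginPage, user: Credentials = ADMIN): Promise<void> {
  await page.goto("/login");
  await page.getByLabel("Email").fill(user.email);
  await page.getByLabel("Password").fill(user.password);
  await page.getByRole("button", { name: "Log In", exact: true }).click();
  await page.waitForURL("**/projects");
}
```

With it, the `beforeEach` above shrinks to `await login(page);`.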

Selector Patterns (Priority Order)

Role selectors (most robust):

```
page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "my-item" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });
```

Label selectors:

```
page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");
```

Text and placeholder selectors:

```
page.getByText("No evaluators added");
page.getByPlaceholder("Search...");
```

Test IDs (when available):
```
page.getByTestId("modal");
```

CSS locators (last resort):

```
page.locator('button:has-text("Save")');
```

Common UI Patterns

Dropdown Menus

```
// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();
```

Nested Menus (Submenus)

```
// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();

// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();
```

Dialogs/Modals

```
// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();
```

Tables with Row Actions

```
// Find row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();
```

Tabs

```
await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);
```

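Form Inputs in Sections

```
// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
// locator("..") steps up to the parent element, so this depends on DOM
// structure; prefer a role- or label-based container when one exists
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");
```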

Serial Tests (Shared State)

```
test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from previous test
  });

  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});
```

In serial mode, if one test fails, the remaining tests in the group are skipped.

Assertions

```
// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();

// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");

// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");

// Input values
await expect(input).toHaveValue("expected value");

// URL
await page.waitForURL("**/datasets/**/examples");
```

Navigation Patterns

```
// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");

// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");

// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";

// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
```
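The inline regex above falls back to an empty string when the URL has no dataset segment, so a later `goto` quietly opens `/playground?datasetId=` and the test fails far from the real cause. A hypothetical helper that parses the URL and throws instead makes that failure immediate (the function name is illustrative, not an existing Phoenix utility):

```typescript
// Hypothetical helper: pull the dataset ID out of an absolute Phoenix URL such as
// http://localhost:6006/datasets/<id>/examples, failing loudly if it is absent.
export function datasetIdFromUrl(url: string): string {
  // Parse first so query strings and fragments never leak into the ID.
  const { pathname } = new URL(url);
  const match = pathname.match(/\/datasets\/([^/]+)/);
  if (!match) {
    throw new Error(`No dataset ID found in URL: ${url}`);
  }
  return decodeURIComponent(match[1]);
}
```

Call it as `const datasetId = datasetIdFromUrl(page.url());` once `waitForURL` has resolved; `page.url()` returns an absolute URL, which `new URL()` requires.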

Running Tests

Run these from the app/ directory; test file paths are relative to it.

```
# Build the app
pnpm run build

# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium

# Run with UI mode
pnpm exec playwright test --ui

# Run specific test by name
pnpm exec playwright test -g "can create"

# Debug mode
pnpm exec playwright test --debug
```

Avoiding Interactive Report Server

```
# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list

# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot

# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium
```
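The same outcome can be configured once instead of on every command line. Below is a sketch of a fragment to merge into the existing config, assuming you still want an HTML report locally but never want it to auto-open (which is what starts the report server); the per-environment reporter choice is an assumption, not Phoenix's current setting.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // CI: compact dot output only.
  // Local: list output plus an HTML report written to disk but never auto-opened.
  reporter: process.env.CI
    ? "dot"
    : [["list"], ["html", { open: "never" }]],
});
```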

Phoenix-Specific Pages

| Page | URL Pattern | Key Elements |
| --- | --- | --- |
| Datasets | `/datasets` | Table, "New Dataset" button |
| Dataset Detail | `/datasets/{id}/examples` | Tabs (Experiments, Examples, Evaluators, Versions) |
| Dataset Evaluators | `/datasets/{id}/evaluators` | "Add evaluator" button, evaluators table |
| Playground | `/playground` | Prompts section, Experiment section |
| Playground + Dataset | `/playground?datasetId={id}` | Dataset selector, Evaluators button |
| Prompts | `/prompts` | "New Prompt" button, prompts table |
| Settings | `/settings/general` | "Add User" button, users table |
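If the same paths appear in many tests, the table can be mirrored in a small path map so a route change is edited in one place. A sketch with hypothetical names; it covers only the routes listed above, and it percent-encodes IDs defensively, which may be unnecessary for Phoenix's ID format.

```typescript
// Hypothetical path builders mirroring the Phoenix page table.
export const phoenixPaths = {
  datasets: (): string => "/datasets",
  datasetExamples: (id: string): string =>
    `/datasets/${encodeURIComponent(id)}/examples`,
  datasetEvaluators: (id: string): string =>
    `/datasets/${encodeURIComponent(id)}/evaluators`,
  // With an ID, adds ?datasetId=... using form-style query encoding.
  playground: (datasetId?: string): string =>
    datasetId === undefined
      ? "/playground"
      : `/playground?${new URLSearchParams({ datasetId }).toString()}`,
  prompts: (): string => "/prompts",
  settingsGeneral: (): string => "/settings/general",
};
```

For example: `await page.goto(phoenixPaths.datasetEvaluators(datasetId));`.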

UI Exploration with agent-browser

Quick Reference for Phoenix

```
# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"

# Get interactive snapshot with element refs
agent-browser snapshot -i

# Click using refs from snapshot
agent-browser click @e5

# Fill form fields
agent-browser fill @e2 "test value"

# Get element text
agent-browser get text @e1
```

Translating to Playwright

| agent-browser output | Playwright selector |
| --- | --- |
| `@e1 [button] "Save"` | `page.getByRole("button", { name: "Save" })` |
| `@e2 [link] "Datasets"` | `page.getByRole("link", { name: "Datasets" })` |
| `@e3 [textbox] "Name"` | `page.getByRole("textbox", { name: "Name" })` |
| `@e4 [menuitem] "Edit"` | `page.getByRole("menuitem", { name: "Edit" })` |
| `@e5 [tab] "Evaluators 0"` | `page.getByRole("tab", { name: /Evaluators/i })` |
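The translation in the table is mechanical: the bracketed role becomes the first argument to `getByRole` and the quoted text becomes `name`. A hypothetical helper that applies that rule to one snapshot line, in the format shown in the table, is sketched below. It assumes a trailing number (as in "Evaluators 0") is a changing count badge and therefore emits a case-insensitive regex for the label, matching the last row of the table.

```typescript
// Escape regex metacharacters (and "/") so a label is safe inside a regex literal.
const escapeRegExp = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");

// Hypothetical translator from a line like  @e1 [button] "Save"
// to Playwright locator source code.
export function toPlaywrightLocator(snapshotLine: string): string {
  const match = snapshotLine.trim().match(/^@e\d+\s+\[([\w-]+)\]\s+"(.*)"$/);
  if (!match) {
    throw new Error(`Unrecognized snapshot line: ${snapshotLine}`);
  }
  const [, role, name] = match;
  // Assumption: a trailing number is a count badge, so match only the label text.
  const counted = name.match(/^(.*\S)\s+\d+$/);
  const nameArg = counted ? `/${escapeRegExp(counted[1])}/i` : JSON.stringify(name);
  return `page.getByRole(${JSON.stringify(role)}, { name: ${nameArg} })`;
}
```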

Debugging Flaky Tests

Critical Lessons Learned

Don't assume parallelism is the problem
- Phoenix tests run with 7 parallel workers without issues
- The app handles concurrent logins, database operations, and session management properly
- If tests fail with parallelism, it's usually a test timing issue, not infrastructure
- Playwright's browser context isolation is robust: each test gets its own browser context, so cookies and sessions are isolated between tests and workers

waitForTimeout is almost always wrong

- page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
- Arbitrary timeouts race against rendering and network speed

Always replace with state-based waits:

```
// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();

// ✅ GOOD - waits for actual state
// (click() also auto-waits for the element to be actionable)
await element.waitFor({ state: "visible" });
await element.click();
```

Test the actual failure before fixing
- Run tests with parallelism enabled to see what actually fails
- Check error messages - they often point to the real issue
- Don't optimize prematurely (e.g., caching auth state) if it's not the problem

Phoenix test infrastructure is solid
- In-memory SQLite works fine with parallel tests
- No need for per-worker databases
- No need for auth state caching
- Tests use randomUUID() for data isolation - this works well

Debugging Workflow

Run with parallelism multiple times to catch intermittent failures:

```
for i in 1 2 3 4 5; do
  pnpm exec playwright test --project=chromium --reporter=dot
done
```
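When comparing the repeated runs, the signal to look for is a test whose outcome changes between runs; a test that fails every time is an ordinary failure, not flakiness. A sketch of that classification, assuming each run has already been reduced to a map from test title to outcome (the input shape and function name are hypothetical, and collecting the results is left to you):

```typescript
export type Outcome = "passed" | "failed";
// One entry per test title for a single run.
export type RunResults = Record<string, Outcome>;

// A test is intermittent if it both passed and failed across the runs.
export function findIntermittent(runs: RunResults[]): string[] {
  const outcomesByTitle = new Map<string, Set<Outcome>>();
  for (const run of runs) {
    for (const [title, outcome] of Object.entries(run)) {
      const outcomes = outcomesByTitle.get(title) ?? new Set<Outcome>();
      outcomes.add(outcome);
      outcomesByTitle.set(title, outcomes);
    }
  }
  return Array.from(outcomesByTitle.entries())
    .filter(([, outcomes]) => outcomes.size > 1)
    .map(([title]) => title)
    .sort();
}
```

Tests that always fail still need fixing, but they point at a real bug or a broken selector rather than a timing race.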

Look for waitForTimeout usage - replace with proper waits:
```
grep -r "waitForTimeout" app/tests/
```
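The grep above finds `waitForTimeout`, but the Timeout Policy also rules out per-call `timeout` options. A hypothetical TypeScript check covering both is sketched below. It is regex-based and line-oriented, so treat its output as hints: it can flag `timeout:` inside comments or strings, and it will not catch `test.setTimeout()` or a `{ timeout }` shorthand property.

```typescript
// Hypothetical lint-style check for a spec file's source text.
export interface Finding {
  line: number; // 1-based line number
  rule: "waitForTimeout" | "timeout-option";
}

export function findTimingViolations(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (/\.waitForTimeout\s*\(/.test(text)) {
      findings.push({ line: index + 1, rule: "waitForTimeout" });
    } else if (/\btimeout\s*:/.test(text)) {
      // Case-sensitive, so config keys like navigationTimeout are not matched.
      findings.push({ line: index + 1, rule: "timeout-option" });
    }
  });
  return findings;
}
```

To apply it, read each spec file under `app/tests/` (for example with `readFileSync` from `node:fs`) and print any findings.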

Check for race conditions in element interactions:
- Wait for element visibility before interacting
- Wait for a specific element or URL; avoid page.waitForLoadState("networkidle"), which is flaky in CI
- Use waitForURL after navigation actions

Verify selectors are stable:
- Avoid CSS selectors that depend on DOM structure
- Use role/label selectors that match ARIA attributes
- Confirm selectors still match after UI changes

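Run with trace on failure to see what happened. The on-first-retry mode only records a trace when a test is retried, so pair it with at least one retry:

```
pnpm exec playwright test --trace on-first-retry --retries=1
```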

Common Flaky Patterns and Fixes

| Flaky Pattern | Root Cause | Fix |
| --- | --- | --- |
| Submenu item not found | Using `getByText()` instead of `getByRole()` | Use `getByRole("menuitem", { name: /pattern/i })` for submenu items |
| Menu click fails | Menu not fully rendered | `await menu.waitFor({ state: "visible" })` before click |
| Dialog assertion fails | Dialog animation not complete | Assert a specific completion signal (hidden dialog plus the element that appears in the next state) |
| Navigation timeout | Page still loading | Remove `waitForLoadState("networkidle")` (flaky in CI); wait for the target URL or a key element instead |
| Element not found | Dynamic content loading | Wait for element visibility, not an arbitrary timeout |
| Stale element | Re-render between locate and click | Store the locator, not an element handle |

Test Stability Best Practices

Use proper waits:

```
// Wait for element state: "visible", "hidden", "attached", or "detached"
await element.waitFor({ state: "visible" });

// Wait for a load state: "load" or "domcontentloaded"
// (avoid "networkidle", which is flaky in CI)
await page.waitForLoadState("domcontentloaded");

// Wait for URL change
await page.waitForURL("**/expected-path");
```

Use unique test data:

```
const uniqueName = `test-${randomUUID()}`;
```

Prefer role selectors - they're less brittle:

```
page.getByRole("button", { name: "Save" }); // ✅ Good
page.locator("button.save-btn"); // ❌ Brittle
```

Don't fight animations - wait for them:

```
await expect(dialog).not.toBeVisible();
```

Verify URL changes after navigation:
```
await page.waitForURL("**/datasets");
```

Test Credentials

| User | Email | Password | Role |
| --- | --- | --- | --- |
| Admin | admin@localhost | admin123 | admin |
| Member | [email protected] | member123 | member |
| Viewer | [email protected] | viewer123 | viewer |
