Diagnose and fix flaky Playwright e2e tests in `e2e_playwright/`. Use this skill when tests fail intermittently, show timeout errors, have snapshot mismatches, or exhibit browser-specific failures.
Timeout errors (e.g., `TimeoutError: wait_until timed out`) are a typical symptom.

Run the script to identify the most flaky tests from recent CI runs:

```shell
uv run scripts/fetch_flaky_tests.py
```
Options:
- `--days N`: Look back N days (default: 4)
- `--top N`: Return top N flaky tests (default: 10)
- `--min-reruns N`: Minimum total reruns to include (default: 2)
- `--json`: Output as JSON for programmatic use

The script downloads `playwright_test_stats` artifacts from successful `playwright.yml` runs and aggregates tests that required reruns.
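The aggregation the script performs can be sketched roughly as follows. This is a simplified illustration: the per-run stats shape (`{test_id: rerun_count}` dicts) and the function name are assumptions, not the real artifact schema.

```python
from collections import Counter


def top_flaky(
    per_run_stats: list[dict[str, int]],
    top_n: int = 10,
    min_reruns: int = 2,
) -> list[tuple[str, int]]:
    """Sum rerun counts per test across CI runs and return the worst offenders.

    `per_run_stats` is one {test_id: rerun_count} dict per CI run (assumed shape).
    """
    totals: Counter[str] = Counter()
    for run in per_run_stats:
        totals.update(run)
    # Drop tests below the rerun threshold, then keep the top N by total reruns.
    ranked = [(test, n) for test, n in totals.most_common() if n >= min_reruns]
    return ranked[:top_n]


runs = [
    {"test_date_input.py::test_calendar": 2, "test_toast.py::test_show": 1},
    {"test_date_input.py::test_calendar": 3},
]
print(top_flaky(runs))  # -> [('test_date_input.py::test_calendar', 5)]
```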
Skip tests already marked with `@pytest.mark.flaky`; these are known flaky tests being tracked separately.
```shell
# Check if a test file has the flaky marker
grep -l "pytest.mark.flaky" e2e_playwright/<test_file>.py
```
IMPORTANT: Only attempt to fix tests that fail locally. If you cannot reproduce the flakiness after 25 runs, do NOT attempt a fix; the test may be flaky due to CI environment factors that cannot be addressed locally.
Run the test up to 25 times with the affected browser(s). The loop breaks on first failure and captures full output:
```shell
for i in {1..25}; do
  result=$(make run-e2e-test e2e_playwright/test_file.py::test_name -- --browser firefox 2>&1)
  if echo "$result" | grep -q "FAILED"; then
    echo "=== FAILURE ON RUN $i ==="
    echo "$result"
    break
  fi
  echo "Run $i: PASSED"
done
```
If all 25 runs pass, skip this test and move to the next one.
After a failure, examine:
- `e2e_playwright/test-results/` - traces, screenshots, videos
- `e2e_playwright/test-results/snapshot-updates/` - actual vs expected snapshots

For persistent snapshot flakiness: if a test keeps failing due to snapshot mismatches, compare the actual vs expected images in `e2e_playwright/test-results/snapshot-updates/`. This helps identify whether the flakiness is due to timing (content not loaded), animation state, or browser rendering differences.
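One way to make that triage concrete is to count differing pixels and classify the failure by the diff ratio. This is a sketch over raw pixel lists; real snapshots would need an image library such as Pillow to decode, and the threshold is an assumed value:

```python
def classify_snapshot_diff(
    expected: list[int],
    actual: list[int],
    subpixel_threshold: float = 0.001,  # assumed cutoff, tune per suite
) -> str:
    """Classify a snapshot mismatch by the fraction of differing pixels.

    A handful of differing pixels suggests subpixel/antialiasing noise;
    a large fraction suggests missing content or an animation mid-frame.
    """
    if len(expected) != len(actual):
        return "size-mismatch"  # element rendered at a different size
    diffs = sum(1 for e, a in zip(expected, actual) if e != a)
    ratio = diffs / max(len(expected), 1)
    if ratio == 0:
        return "identical"
    if ratio <= subpixel_threshold:
        return "subpixel-noise"
    return "content-diff"
```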
Symptom: Screenshots taken before element fully renders, animations not complete.
Fix: Add explicit waits before interactions or screenshots:
```python
# Before
element.click()
assert_snapshot(element, name="snapshot")

# After
element.click()
expect(element).to_be_visible()  # Wait for visibility
assert_snapshot(element, name="snapshot")
```
For popups/modals/calendars that animate:
```python
calendar = page.locator('[data-baseweb="calendar"]').first
expect(calendar).to_be_visible()  # Wait for animation to complete
assert_snapshot(calendar, name="calendar-snapshot")
```
Symptom: Assertion expects exact count but gets more (e.g., assert 44 == 41).
Fix: Use >= instead of == when browsers may retry failed operations:
```python
# Before
assert error_count == expected_count

# After: browsers may retry failed image loads
assert error_count >= expected_count
```
Symptom: TimeoutError on slower browsers.
Fix: Increase timeout for operations that can be slow:
```python
# Before
wait_until(app, lambda: check_condition(), timeout=10000)

# After
wait_until(app, lambda: check_condition(), timeout=20000)
```
Symptom: Snapshot mismatch for ... (X pixels difference).
Causes: typically timing (content not fully loaded), animation state, or browser rendering differences.
Fix: Ensure element is stable before screenshot:
```python
element = page.locator(".my-element")
expect(element).to_be_visible()
# For elements with animations, wait for a specific CSS state:
expect(element).to_have_css("opacity", "1")
assert_snapshot(element, name="snapshot")
```
| Browser | Common Issues |
|---|---|
| Firefox | Slower console logging, may retry failed requests, subpixel rendering differences |
| Webkit | May have timing differences with layout |
| Chromium | Generally most reliable, use as baseline |
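The table suggests different browsers tolerate different timeouts. One hypothetical way to centralize that in a suite is a per-browser multiplier; the values below are illustrative assumptions, not measured numbers:

```python
# Illustrative per-browser timeout multipliers (assumed values, not measured).
BROWSER_TIMEOUT_MULTIPLIER = {
    "chromium": 1.0,  # baseline, generally most reliable
    "firefox": 2.0,   # slower logging, request retries
    "webkit": 1.5,    # layout timing differences
}


def scaled_timeout(base_ms: int, browser_name: str) -> int:
    """Scale a base timeout for the browser under test."""
    return int(base_ms * BROWSER_TIMEOUT_MULTIPLIER.get(browser_name, 1.0))
```

With these assumed values, a call such as `wait_until(app, cond, timeout=scaled_timeout(10000, "firefox"))` would yield the same 10000 to 20000 bump shown in the timeout fix above, without hard-coding it per test.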
Symptom: Firefox screenshots flake with 1-pixel differences due to subpixel rendering variations.
Fix: Add a one-liner markdown element above the element being tested. This shifts the subpixel position to a more stable value:
```python
# In the test app (.py file)
st.markdown("---")  # Stabilizes subpixel rendering for elements below
st.date_input("Pick a date")
```
This is a workaround for Firefox's subpixel rendering behavior and can reduce snapshot flakiness when other timing fixes don't help.
If you've exhausted timing fixes and the flakiness persists only on a specific browser due to known browser limitations (not test bugs), skip_browser may be appropriate as a last resort:
```python
# Only use after confirming this is a browser-level limitation, not a fixable timing issue
@pytest.mark.skip_browser("webkit", reason="Webkit has known layout timing issues with this element")
def test_problematic_on_webkit(app: Page):
    ...
```
Important: Using skip_browser requires justification. Prefer fixing the underlying timing issue first. See "Rules" section for guidance on when skipping is acceptable.
After applying a fix, verify with multiple runs:

```shell
# Run 10+ times to ensure stability
for i in {1..10}; do
  make run-e2e-test e2e_playwright/test_file.py::test_name -- --browser firefox 2>&1 | grep -E "(PASSED|FAILED)"
done
```
Target: 10/10 passes before considering fix complete.
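The grep output from the verification loop can be tallied mechanically. A sketch assuming one PASSED/FAILED token per run's output (the helper name is hypothetical):

```python
def is_stable(run_outputs: list[str], required_passes: int = 10) -> bool:
    """True if every run passed and the run count meets the target."""
    passes = sum(
        1 for out in run_outputs if "FAILED" not in out and "PASSED" in out
    )
    return passes == len(run_outputs) and passes >= required_passes
```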
From `e2e_playwright.conftest`:
- `wait_for_app_run(page)` - Wait for Streamlit script execution
- `wait_for_app_loaded(page)` - Wait for initial app load
- `wait_until(page, fn, timeout)` - Poll until a condition is true

From `e2e_playwright.shared.app_utils`:
- `expect_no_skeletons(element)` - Wait for loading skeletons to disappear
- `reset_focus(page)` - Click outside to trigger blur events
- `reset_hovering(locator)` - Move mouse away from an element

Workflow:
1. Fetch flaky tests: `uv run scripts/fetch_flaky_tests.py --top 10`
2. Filter out marked tests: skip tests with `@pytest.mark.flaky`
3. For each remaining test: reproduce locally, diagnose the failure, apply a fix, and verify with repeated runs
4. Run checks: `make check` before committing
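The `wait_until` helper listed above follows a standard polling pattern. A rough standalone sketch for reference; the interval, error type, and function name here are assumptions, not the real conftest implementation:

```python
import time
from typing import Callable


def wait_until_sketch(
    condition: Callable[[], bool],
    timeout_ms: int = 10000,
    interval_ms: int = 50,
) -> None:
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval_ms / 1000)
    raise TimeoutError("wait_until timed out")
```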
`skip_browser` is acceptable only when the failure is a confirmed browser-level limitation rather than a fixable timing issue, and the marker includes a `reason` explaining why.