Diagnose and fix flaky Playwright e2e tests in `e2e_playwright/`. Use this skill when tests fail intermittently, show timeout errors, have snapshot mismatches, or exhibit browser-specific failures.
Timeout errors (e.g., `TimeoutError: wait_until timed out`) are a typical symptom.

Run the script to identify the most flaky tests from recent CI runs:

```shell
uv run scripts/fetch_flaky_tests.py
```
Options:
- `--days N`: Look back N days (default: 4)
- `--top N`: Return top N flaky tests (default: 10)
- `--min-reruns N`: Minimum total reruns to include (default: 2)
- `--json`: Output as JSON for programmatic use

The script downloads `playwright_test_stats` artifacts from successful `playwright.yml` runs and aggregates tests that required reruns.
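The aggregation the script performs can be sketched roughly as follows. This is a simplified illustration: the per-run stats shape (`{test_id: rerun_count}` dicts) and the function name are assumptions, not the real artifact schema.

```python
from collections import Counter


def top_flaky(
    per_run_stats: list[dict[str, int]],
    top_n: int = 10,
    min_reruns: int = 2,
) -> list[tuple[str, int]]:
    """Sum rerun counts per test across CI runs and return the worst offenders.

    `per_run_stats` is one {test_id: rerun_count} dict per CI run (assumed shape).
    """
    totals: Counter[str] = Counter()
    for run in per_run_stats:
        totals.update(run)
    # Drop tests below the rerun threshold, then keep the top N by total reruns.
    ranked = [(test, n) for test, n in totals.most_common() if n >= min_reruns]
    return ranked[:top_n]


runs = [
    {"test_date_input.py::test_calendar": 2, "test_toast.py::test_show": 1},
    {"test_date_input.py::test_calendar": 3},
]
print(top_flaky(runs))  # -> [('test_date_input.py::test_calendar', 5)]
```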
Skip tests already marked with `@pytest.mark.flaky`; these are known flaky tests being tracked separately.
```shell
# Check if a test file has the flaky marker
grep -l "pytest.mark.flaky" e2e_playwright/<test_file>.py
```
IMPORTANT: Only attempt to fix tests that fail locally. If you cannot reproduce the flakiness after 25 runs, do NOT attempt a fix; the test may be flaky due to CI environment factors that cannot be addressed locally.
Run the test up to 25 times with the affected browser(s). The loop breaks on first failure and captures full output:
```shell
for i in {1..25}; do
  result=$(make run-e2e-test e2e_playwright/test_file.py::test_name -- --browser firefox 2>&1)
  if echo "$result" | grep -q "FAILED"; then
    echo "=== FAILURE ON RUN $i ==="
    echo "$result"
    break
  fi
  echo "Run $i: PASSED"
done
```
If all 25 runs pass, skip this test and move to the next one.
After a failure, examine:
- `e2e_playwright/test-results/` - traces, screenshots, videos
- `e2e_playwright/test-results/snapshot-updates/` - actual vs expected snapshots

For persistent snapshot flakiness: if a test keeps failing due to snapshot mismatches, compare the actual vs expected images in `e2e_playwright/test-results/snapshot-updates/`. This helps identify whether the flakiness is due to timing (content not loaded), animation state, or browser rendering differences.
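One way to make that triage concrete is to count differing pixels and classify the failure by the diff ratio. This is a sketch over raw pixel lists; real snapshots would need an image library such as Pillow to decode, and the threshold is an assumed value:

```python
def classify_snapshot_diff(
    expected: list[int],
    actual: list[int],
    subpixel_threshold: float = 0.001,  # assumed cutoff, tune per suite
) -> str:
    """Classify a snapshot mismatch by the fraction of differing pixels.

    A handful of differing pixels suggests subpixel/antialiasing noise;
    a large fraction suggests missing content or an animation mid-frame.
    """
    if len(expected) != len(actual):
        return "size-mismatch"  # element rendered at a different size
    diffs = sum(1 for e, a in zip(expected, actual) if e != a)
    ratio = diffs / max(len(expected), 1)
    if ratio == 0:
        return "identical"
    if ratio <= subpixel_threshold:
        return "subpixel-noise"
    return "content-diff"
```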
Symptom: Screenshots taken before element fully renders, animations not complete.
Fix: Add explicit waits before interactions or screenshots:
```python
# Before
element.click()
assert_snapshot(element, name="snapshot")

# After
element.click()
expect(element).to_be_visible()  # Wait for visibility
assert_snapshot(element, name="snapshot")
```
For popups/modals/calendars that animate:
```python
calendar = page.locator('[data-baseweb="calendar"]').first
expect(calendar).to_be_visible()  # Wait for animation to complete
assert_snapshot(calendar, name="calendar-snapshot")
```
Symptom: Assertion expects exact count but gets more (e.g., assert 44 == 41).
Fix: Use >= instead of == when browsers may retry failed operations:
```python
# Before
assert error_count == expected_count

# After: browsers may retry failed image loads
assert error_count >= expected_count
```
Symptom: TimeoutError on slower browsers.
Fix: Increase timeout for operations that can be slow:
```python
# Before
wait_until(app, lambda: check_condition(), timeout=10000)

# After
wait_until(app, lambda: check_condition(), timeout=20000)
```
Symptom: Snapshot mismatch for ... (X pixels difference).
Causes: typically timing (content not fully loaded), animation state, or browser rendering differences.
Fix: Ensure element is stable before screenshot:
```python
element = page.locator(".my-element")
expect(element).to_be_visible()
# For elements with animations, wait for a specific CSS state:
expect(element).to_have_css("opacity", "1")
assert_snapshot(element, name="snapshot")
```
| Browser | Common Issues |
|---|---|
| Firefox | Slower console logging, may retry failed requests, subpixel rendering differences |
| Webkit | May have timing differences with layout |
| Chromium | Generally most reliable, use as baseline |
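The table suggests different browsers tolerate different timeouts. One hypothetical way to centralize that in a suite is a per-browser multiplier; the values below are illustrative assumptions, not measured numbers:

```python
# Illustrative per-browser timeout multipliers (assumed values, not measured).
BROWSER_TIMEOUT_MULTIPLIER = {
    "chromium": 1.0,  # baseline, generally most reliable
    "firefox": 2.0,   # slower logging, request retries
    "webkit": 1.5,    # layout timing differences
}


def scaled_timeout(base_ms: int, browser_name: str) -> int:
    """Scale a base timeout for the browser under test."""
    return int(base_ms * BROWSER_TIMEOUT_MULTIPLIER.get(browser_name, 1.0))
```

With these assumed values, a call such as `wait_until(app, cond, timeout=scaled_timeout(10000, "firefox"))` would yield the same 10000 to 20000 bump shown in the timeout fix above, without hard-coding it per test.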
Symptom: Firefox screenshots flake with 1-pixel differences due to subpixel rendering variations.
Fix: Add a one-liner markdown element above the element being tested. This shifts the subpixel position to a more stable value:
```python
# In the test app (.py file)
st.markdown("---")  # Stabilizes subpixel rendering for elements below
st.date_input("Pick a date")
```
This is a workaround for Firefox's subpixel rendering behavior and can reduce snapshot flakiness when other timing fixes don't help.
If you've exhausted timing fixes and the flakiness persists only on a specific browser due to known browser limitations (not test bugs), skip_browser may be appropriate as a last resort:
```python
# Only use after confirming this is a browser-level limitation, not a fixable timing issue
@pytest.mark.skip_browser("webkit", reason="Webkit has known layout timing issues with this element")
def test_problematic_on_webkit(app: Page):
    ...
```
Important: Using skip_browser requires justification. Prefer fixing the underlying timing issue first. See "Rules" section for guidance on when skipping is acceptable.
After applying a fix, verify with multiple runs:

```shell
# Run 10+ times to ensure stability
for i in {1..10}; do
  make run-e2e-test e2e_playwright/test_file.py::test_name -- --browser firefox 2>&1 | grep -E "(PASSED|FAILED)"
done
```
Target: 10/10 passes before considering fix complete.
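The grep output from the verification loop can be tallied mechanically. A sketch assuming one PASSED/FAILED token per run's output (the helper name is hypothetical):

```python
def is_stable(run_outputs: list[str], required_passes: int = 10) -> bool:
    """True if every run passed and the run count meets the target."""
    passes = sum(
        1 for out in run_outputs if "FAILED" not in out and "PASSED" in out
    )
    return passes == len(run_outputs) and passes >= required_passes
```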
From `e2e_playwright.conftest`:
- `wait_for_app_run(page)` - Wait for Streamlit script execution
- `wait_for_app_loaded(page)` - Wait for initial app load
- `wait_until(page, fn, timeout)` - Poll until a condition is true

From `e2e_playwright.shared.app_utils`:
- `expect_no_skeletons(element)` - Wait for loading skeletons to disappear
- `reset_focus(page)` - Click outside to trigger blur events
- `reset_hovering(locator)` - Move mouse away from an element

Workflow:
1. Fetch flaky tests: `uv run scripts/fetch_flaky_tests.py --top 10`
2. Filter out marked tests: skip tests with `@pytest.mark.flaky`
3. For each remaining test: reproduce locally, diagnose the failure, apply a fix, and verify with repeated runs
4. Run checks: `make check` before committing
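The `wait_until` helper listed above follows a standard polling pattern. A rough standalone sketch for reference; the interval, error type, and function name here are assumptions, not the real conftest implementation:

```python
import time
from typing import Callable


def wait_until_sketch(
    condition: Callable[[], bool],
    timeout_ms: int = 10000,
    interval_ms: int = 50,
) -> None:
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval_ms / 1000)
    raise TimeoutError("wait_until timed out")
```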
`skip_browser` is acceptable only when the failure is a confirmed browser-level limitation rather than a fixable timing issue, and the marker includes a `reason` explaining why.