Use when asked to create, update, or analyse a performance/profiling/benchmark Jupyter notebook for experimental results. Covers tooling (uv + nb.py helper), standard cell layout, reproducibility (commit + patch), test stability, machine info, log loading patterns, and updating existing notebooks.
uv is the recommended tool for running notebooks — it requires no global Jupyter installation,
pins exact dependency versions per invocation, and keeps each invocation's environment isolated.
There are no conflicts with system Python or existing virtualenvs.
Install on macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install on Windows (PowerShell):
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Inside the IntelliJ monorepo, use the pinned wrapper instead of a system uv.
It downloads and caches the right version automatically:
community/tools/uv.cmd run ...
${CLAUDE_SKILL_DIR}/scripts/nb.py is a small CLI that covers the
most common notebook operations without requiring a running Jupyter server:
# List all cells (index, type, first line)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py list-cells notebook.ipynb
# Print source of cell N (0-based)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py get-cell notebook.ipynb 3
# Replace cell N with content from a file (use '-' to read from stdin)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py set-cell notebook.ipynb 3 new_cell.py
# Insert a new cell at position N (shifts later cells down); reads from stdin or a file
python ${CLAUDE_SKILL_DIR}/scripts/nb.py insert-cell notebook.ipynb 6 markdown < description.md
python ${CLAUDE_SKILL_DIR}/scripts/nb.py insert-cell notebook.ipynb 11 code snippet.py
# Delete cell N
python ${CLAUDE_SKILL_DIR}/scripts/nb.py delete-cell notebook.ipynb 5
# Collapse all code cells except the config cell (cell 1) so the notebook
# opens with only the editable config visible; use expand-cells to reverse
python ${CLAUDE_SKILL_DIR}/scripts/nb.py collapse-cells notebook.ipynb
python ${CLAUDE_SKILL_DIR}/scripts/nb.py expand-cells notebook.ipynb
# Keep a non-standard cell visible by tagging it nb:visible, or with --keep:
python ${CLAUDE_SKILL_DIR}/scripts/nb.py collapse-cells notebook.ipynb --keep 1 4
# Apply the patch(es) embedded in the last cell to a git repo
# --repo is required when the notebook and repo are on different paths
python ${CLAUDE_SKILL_DIR}/scripts/nb.py apply-patch notebook.ipynb --repo D:/src/jetbrains/idea/main
# Apply the patch(es) embedded in a specific cell
python ${CLAUDE_SKILL_DIR}/scripts/nb.py apply-patch notebook.ipynb 12 --repo D:/src/jetbrains/idea/main
# Bake the current git diff back into the diff fence of the last cell
# (the inverse of apply-patch — use after iterating on the patch)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py update-patch notebook.ipynb --repo D:/src/jetbrains/idea/main
python ${CLAUDE_SKILL_DIR}/scripts/nb.py update-patch notebook.ipynb --repo D:/src/jetbrains/idea/main --staged
# Execute all cells in place (overwrites outputs)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py execute notebook.ipynb
# Export to HTML (writes alongside the notebook; use -o for a custom directory).
# Cells collapsed via collapse-cells have their source omitted in the HTML output.
python ${CLAUDE_SKILL_DIR}/scripts/nb.py export-html notebook.ipynb
python ${CLAUDE_SKILL_DIR}/scripts/nb.py export-html notebook.ipynb -o out/
# Start JupyterLab for interactive editing (opens in browser)
python ${CLAUDE_SKILL_DIR}/scripts/nb.py serve notebook.ipynb
The tool invokes community/tools/uv.cmd internally, so no separate uv setup is needed.
PowerShell and cmd.exe handle escaping poorly for arguments containing backslashes, spaces, or special characters (e.g. JVM options with UNC paths). Store the value in an environment variable first to sidestep the problem.
Bash:
export WSL_PROJECT='\\wsl.localhost\Ubuntu-24.04\home\yourname\idea'
./tests.cmd --jvm-option "-Dijent.wsl.project.root=$WSL_PROJECT"
PowerShell:
$env:WSL_PROJECT = '\\wsl.localhost\Ubuntu-24.04\home\yourname\idea'
./tests.cmd --jvm-option "-Dijent.wsl.project.root=$env:WSL_PROJECT"
The same technique applies for any argument that would be hard to inline-escape: paths, output
directories, or flags that differ per engineer. The Kotlin side can read the value via
System.getProperty("...") or System.getenv("...").
By convention, notebooks should open with all boilerplate code collapsed so the
reader only sees the config cell they need to edit. Use collapse-cells after
finishing the notebook structure:
python ${CLAUDE_SKILL_DIR}/scripts/nb.py collapse-cells notebook.ipynb
This hides all code cells except cell 1 (the standard config cell). To mark a
different cell as permanently visible, add the tag nb:visible to its metadata:
# In a one-off update script:
nb["cells"][4]["metadata"].setdefault("tags", []).append("nb:visible")
When any code cell carries nb:visible, the tag takes precedence over --keep.
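A one-off tagging script needs only the stdlib, since a notebook is plain JSON. The sketch below is illustrative — the cell index and file name are placeholders, not part of the skill's tooling:

```python
import json
from pathlib import Path

def tag_cell_visible(nb_path: Path, index: int) -> None:
    """Add the nb:visible tag to cell `index` so collapse-cells keeps it expanded."""
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    tags = nb["cells"][index]["metadata"].setdefault("tags", [])
    if "nb:visible" not in tags:  # idempotent: safe to run the script twice
        tags.append("nb:visible")
    nb_path.write_text(json.dumps(nb, indent=1), encoding="utf-8")

# Example: tag_cell_visible(Path("notebook.ipynb"), 4)
```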
| # | Type | Purpose |
|---|---|---|
| 0 | markdown | Title; what is measured; exact shell command(s) to collect data |
| 1 | code | Config — only user-editable variables (e.g. repo_root = Path(r'...')) |
| 2 | code | Machine info (host, OS, CPU, RAM) |
| 3 | code | Imports + data loading |
| 4..N | code | Analysis — parse, stats, plots, tables |
| last | markdown | Full patch(es) needed to reproduce the data, as fenced diff blocks |
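Cell 1 (the config cell) might then look like the sketch below. Every name except `repo_root` (which the layout table mentions) is illustrative; the point is to keep this cell down to the handful of values a user actually edits:

```python
from pathlib import Path

# --- Config: edit these values, then Run All ---
repo_root = Path(r"D:/src/jetbrains/idea/main")  # checkout the data was collected from
log_dir = repo_root / "out" / "perf-logs"        # where the test wrote its logs
runs = 5                                         # number of measured iterations
```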
Every notebook must be self-contained enough for a different engineer on a different machine to re-run it and get the same results.
Required elements:
A commit hash reachable from a stable branch: origin/master or a release branch (three-digit major version, optionally followed by a minor version after a dot, e.g. origin/261, origin/261.12345).
Feature branches are deleted after merging, so a hash reachable only from a feature branch becomes inaccessible to anyone trying to reproduce the results.
The patch relative to that commit's merge base with the stable branch. Bash:
git diff $(git merge-base <stable branch> <hash>) <hash>
PowerShell:
git diff (git merge-base <stable branch> <hash>) <hash>
If an instrumentation patch was applied separately (not merged), include it too.
Before writing the notebook, verify the test workload is deterministic and noise-free. Ask the user if uncertain. Red flags:
| Risk | Mitigation |
|---|---|
| Index caches reused across runs | preserveSystemDir = false (IDE Starter default) |
| Pre-built / shared indexes downloaded | .setSharedIndexesDownload(false) |
| Background telemetry uploads | .disableReportingStatisticsToProduction(), .disableFusSendingOnIdeClose() |
| Test runs on wrong filesystem | Verify the actual runtime path (e.g. WSL or Docker path vs Windows host path). For WSL/Docker tests, check that resolvedProjectHome points inside the target environment. A common pitfall: hook APIs that return a new context but the caller discards the return value, silently keeping the old path. |
| Non-deterministic algorithms | Verify fixed seeds, disabled sampling, etc. |
| Competing background processes | Identify and disable anything that shares the measured resource. |
If a test result depends on any random or session-specific factor, ask the user before proceeding.
The fewer steps a user needs to collect data, the better. The ideal: apply one patch, run one command, open the notebook.
Aim for a single test invocation. Put all measurements in one test class. If measurements
share setup (e.g. opening and indexing a project), run them in sequence inside one test method,
or as @Order-annotated methods that always run together as a group.
Use a single patch. Combine instrumentation and test changes into one diff. Two patches double the preparation steps for anyone trying to reproduce the results.
Anti-pattern: a notebook that requires running an indexing test and a rename test as separate
invocations (different --module flags, different log files to locate) forces a two-step recipe.
Merge the measurements into one test class so a single command generates all data.
See reference/machine-info.md for a stdlib-only implementation that works on Windows, macOS, and Linux.
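The idea is roughly the sketch below — stdlib only, no psutil. The reference file is the authoritative version; this merely shows the shape:

```python
import os
import platform

# Minimal cross-platform machine summary for the machine-info cell.
# platform.processor() is empty on some Linux systems, so fall back to machine().
info = {
    "host": platform.node(),
    "os": f"{platform.system()} {platform.release()}",
    "cpu": platform.processor() or platform.machine(),
    "logical_cores": os.cpu_count(),
    "python": platform.python_version(),
}
for key, value in info.items():
    print(f"{key:>14}: {value}")
```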
See reference/log-files.md for patterns covering IDE Starter tests and regular (non-Starter) tests.
Some notebooks measure multiple configurations, and a user may
have run only one of them. In those cases the data-loading cell should set a boolean flag and
every subsequent analysis cell that depends on that data should check it before running.
This lets execute (or "Run All") complete without errors even when only partial data exists.
# In the data-loading cell — set flags for each optional configuration:
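A sketch of the whole pattern — file names, flag names, and the analysis itself are all illustrative:

```python
from pathlib import Path

log_dir = Path("logs")  # normally comes from the config cell

# Data-loading cell: one boolean flag per optional configuration.
wsl_log = log_dir / "wsl-run.log"
local_log = log_dir / "local-run.log"
have_wsl_data = wsl_log.exists()
have_local_data = local_log.exists()

# Later analysis cell: guard on the flag so Run All succeeds on partial data.
if have_wsl_data:
    wsl_lines = wsl_log.read_text(encoding="utf-8").splitlines()
    print(f"WSL run: {len(wsl_lines)} log lines")
else:
    print("WSL data not collected — skipping WSL analysis.")
```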