Run the LLM extractor benchmark and compare results. Use when the user wants to benchmark, test, or compare LLM extraction performance on manpages.
Run the LLM extractor benchmark tool and compare results against previous runs.
/llm-bench [--model <model>] [--batch <size>] [-d <description>] [--baseline <path>] [files...]
Report paths look like openai/gpt-5-mini.50.list; use such a path with --baseline. When --baseline is omitted, the comparison uses the most recent previous report.

1. Run the benchmark in the background (run_in_background: true). The batch API can take 10–30 minutes to complete; do NOT poll the output. Wait for the background task completion notification before proceeding.

   source /home/idank/dev/vibe/explainshell/.venv/bin/activate && python /home/idank/dev/vibe/explainshell/tools/llm_bench.py run --model <model> --batch <size> -d '<description>' [files...]

2. Compare the results. If the user passed --baseline, use --baseline <path>; otherwise omit it to compare against the previous report.

   source /home/idank/dev/vibe/explainshell/.venv/bin/activate && python /home/idank/dev/vibe/explainshell/tools/llm_bench.py compare [--baseline <path>]
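The comparison falls back to the most recent previous report when --baseline is omitted. A minimal sketch of resolving "most recent" by hand, using a throwaway directory (the directory names and layout here are invented for illustration; they are not the tool's actual reports layout):

```shell
# Sketch: pick the most recently modified run directory as the baseline.
# Directory names are stand-ins, not real benchmark output.
reports=$(mktemp -d)
mkdir -p "$reports/old-run" "$reports/new-run"
touch -t 202001010000 "$reports/old-run"    # backdate the older run
latest=$(ls -td "$reports"/*/ | head -n 1)  # newest mtime sorts first
echo "latest: $latest"
```

Sorting by modification time (`ls -td`) is one plausible definition of "most recent"; the tool may instead track run order in its own metadata.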
After comparing, consider how any changes made in the current session could affect the LLM extraction pipeline, and whether the results make sense in light of those changes.
Raw LLM responses are stored in the run directory alongside the report. For per-file debugging, inspect the response files directly (e.g. cat <run-dir>/find.chunk-0.response.txt).
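As a sketch of the per-file debugging step, assuming a run directory containing *.response.txt files (created here as a stand-in, since a real run directory is produced by the benchmark itself):

```shell
# Stand-in run directory; a real one comes from llm_bench.py run.
run_dir=$(mktemp -d)
printf 'example raw LLM response' > "$run_dir/find.chunk-0.response.txt"

# List all raw response files for the run
ls "$run_dir"/*.response.txt

# Inspect one file's raw response
cat "$run_dir/find.chunk-0.response.txt"
```

The find.chunk-0 naming follows the example in the text above; other manpages and chunks would produce similarly named files.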