Run the LLM extractor benchmark tool and compare results against previous runs. Use this command when the user wants to benchmark, test, or compare LLM extraction performance on manpages.
Usage: /llm-bench [--model <model>] [--batch <size>] [-d <description>] [--baseline <path>] [files...]
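For example, a typical invocation might look like this (the model name, batch size, and description are illustrative, not prescribed defaults):

```
/llm-bench --model openai/gpt-5-mini --batch 50 -d "after prompt tweak"
```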
Run the benchmark in the background:

```
source /home/idank/dev/vibe/explainshell/.venv/bin/activate && python /home/idank/dev/vibe/explainshell/tools/llm_bench.py run --model <model> --batch <size> -d '<description>' [files...]
```

files is an optional list of manpage paths; consult openai/gpt-5-mini.50.list to find paths.
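As a concrete sketch of the run step with the placeholders filled in (the model, batch size, and description are assumptions for illustration):

```
source /home/idank/dev/vibe/explainshell/.venv/bin/activate && \
  python /home/idank/dev/vibe/explainshell/tools/llm_bench.py run \
    --model openai/gpt-5-mini --batch 50 -d 'after prompt tweak'
```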
Then compare the results:

```
source /home/idank/dev/vibe/explainshell/.venv/bin/activate && python /home/idank/dev/vibe/explainshell/tools/llm_bench.py compare [--baseline <path>]
```

If the user passed --baseline, forward it as --baseline <path>; otherwise omit it to compare against the most recent previous report.
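Both comparison modes, sketched (run after activating the virtualenv as above; the baseline path placeholder is left as-is since report paths vary per run):

```
# Compare the latest run against the most recent previous report
python /home/idank/dev/vibe/explainshell/tools/llm_bench.py compare

# Compare against a specific earlier report instead
python /home/idank/dev/vibe/explainshell/tools/llm_bench.py compare --baseline <path>
```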
Next, consider how changes made in the current session (if any) could affect the LLM extraction pipeline, and whether the results make sense.
Raw LLM responses are stored in the run directory alongside the report. For per-file debugging, inspect the response files directly (e.g. `cat <run-dir>/find.chunk-0.response.txt`).
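A minimal debugging sketch, assuming the <run-dir>/<command>.chunk-<n>.response.txt naming shown above:

```
# List all raw responses captured for a run, then inspect one
ls <run-dir>/*.response.txt
cat <run-dir>/find.chunk-0.response.txt
```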