Debug PyTorch 2 compiler stack failures including Dynamo graph breaks, Inductor codegen errors, AOTAutograd crashes, and accuracy mismatches. Use when encountering torch.compile errors, BackendCompilerFailed exceptions, recompilation issues, Triton kernel failures, FX graph problems, or when the user mentions debugging PT2, Dynamo, Inductor, or compiled model issues.
Debug test failures and runtime errors in the PyTorch 2 compiler stack (Dynamo, Inductor, AOTAutograd, FX graphs).
## Workflow

1. **Verify the environment.** Confirm the expected conda environment is active by checking `$CONDA_DEFAULT_ENV`. Then run `python -c "import torch; print(torch.__version__)"` to confirm torch is importable and report the version. If the environment is not active or torch cannot be imported, stop and ask the user to activate the correct environment before proceeding.
2. **Write a failing test first.** Put it in the test file that matches the area of the bug (e.g. `test/dynamo/test_repros.py`, `test/inductor/test_torchinductor.py`, `test/export/test_export.py`). Avoid `test/dynamo/test_misc.py` — it is already oversized; find a more specific test file. Use `torch.testing._internal.common_utils.TestCase` and `run_tests`. The test must fail before the fix and pass after. Having the test first keeps you grounded — you know exactly what "fixed" looks like before you start exploring the codebase.
3. **Confirm the test fails on main.** Use `EnterWorktree` to create a worktree checked out at main. Copy the new test file into the worktree and run the test there to confirm it fails on main. If the test passes on main, stop — the test may not be capturing the right bug, or the bug may already be fixed. Exit the worktree with `ExitWorktree` (action: remove) and return to the working branch before continuing.
4. **Diagnose.** Narrow down the failure with the `TORCH_LOGS` settings and `TORCH_COMPILE_DEBUG=1` tooling described below.
5. **Fix and verify.** Run the new test and related existing tests (e.g. if fixing how `is_exporting` works, also run the existing `test_is_exporting` export test). Use `pytest -k` to quickly run related tests by name. The task is not complete until all pass.
6. **Review.** Use the `/pr-review` skill to review your own changes before presenting them. Fix any issues it flags.

Use Grep, Glob, and Read directly for code exploration. Do not spawn meta_codesearch agents — they are slow and expensive. The Architectural Knowledge and Key Source Files sections below should give you enough context to know where to look. A targeted Grep for a function name is always faster.
## Compilation Modes

Before reading implementation code, determine the compilation mode. The modes share code but diverge in important ways:

- **`torch.compile`** — Dynamo + Inductor. `tx.export=False`, no `_compiling_state_context()`.
- **`torch.export` (strict)** — `tx.export=True`, `_compiling_state_context()` active.
- **`torch.export` (non-strict, the default)** — uses Dynamo via `fullgraph_capture`, but `tx.export` may differ from strict. `_compiling_state_context()` active. Check `torch._export.config.use_new_tracer_experimental` — it changes which code path is used.

Many PT2 bugs come from confusing trace-time and runtime state:

- At trace time, Dynamo constant-folds tracing-state queries (e.g. `is_exporting()` → `ConstantVariable(True)`).
- At runtime, the state lives in flags such as `torch.compiler._is_exporting_flag`.

When debugging, add temporary `print()` statements directly in the source file rather than monkey-patching from outside — dispatch chains make monkey-patching unreliable.
## Diagnostic Tools

Pick the right diagnostic tool based on the error category:

- `TORCH_LOGS="+dynamo,graph_breaks,recompiles" python your_script.py` — verbose Dynamo logs plus graph-break and recompilation reports
- `TORCH_COMPILE_DEBUG=1 python your_script.py` — creates `torch_compile_debug/` with FX graphs, Inductor IR, and generated code
- `TORCH_LOGS="output_code" python your_script.py` — prints the generated backend code
- `TORCH_TRACE=/path/to/trace python your_script.py` then `tlparse /path/to/trace` — structured report of the whole compilation
- `TORCHINDUCTOR_COMPILE_THREADS=1 python your_script.py` — single-threaded compilation, for readable stack traces and `pdb`

## Error Classification

Classify the failure using the error message and traceback:
| Error Pattern | Category | Jump To |
|---|---|---|
| `Unsupported: ...` or graph break in logs | Graph break | Graph Breaks |
| `BackendCompilerFailed` | Inductor/backend crash | Backend Failures |
| `RecompileError` or `cache_size_limit` | Recompilation | Recompilation |
| Accuracy mismatch / wrong numerical output | Accuracy | Accuracy |
| `InternalTorchDynamoError` | Dynamo bug | Internal Errors |
| Segfault or CUDA IMA | Runtime crash | Runtime Crashes |
| Triton assertion / index out of bounds | Triton kernel bug | Triton Failures |
## Graph Breaks

Graph breaks split the compiled graph into smaller subgraphs, often causing performance regressions or unexpected behavior.
Diagnosis:
```shell
TORCH_LOGS="graph_breaks" python your_script.py
```
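Data-dependent control flow is a classic trigger. With `fullgraph=True` the break surfaces as a hard error you can read directly instead of a silent graph split (a minimal sketch):

```python
import torch


def f(x):
    if x.sum() > 0:  # bool() on a traced tensor is data-dependent
        return x + 1
    return x - 1


# Without fullgraph, Dynamo silently splits the graph at the branch.
# With fullgraph=True, the graph break is raised as an error instead:
compiled = torch.compile(f, fullgraph=True, backend="eager")
try:
    compiled(torch.randn(4))
except Exception as e:
    print("graph break surfaced as:", type(e).__name__)
```

Reading the raised error message is usually faster than grepping the `graph_breaks` log for the same information.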
Key files:
- `torch/_dynamo/exc.py` — `Unsupported` exception class
- `torch/_dynamo/variables/` — where most graph break decisions happen

Common causes:

- Data-dependent control flow (branching on tensor values)
- Unsupported Python builtins or C-extension calls Dynamo cannot trace
- Side-effecting calls such as logging or I/O inside the compiled region

Fix approach:

- Find the exact break site in the `graph_breaks` log
- Decide whether Dynamo should support the construct, or whether the break can be avoided with `torch._dynamo.allow_in_graph` or restructuring user code

## Backend Failures

`BackendCompilerFailed` means Inductor (or another backend) crashed during compilation.
Diagnosis:
```shell
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py
```
This generates `minifier_launcher.py`, which isolates the minimal failing graph.
Key files:
- `torch/_dynamo/repro/after_aot.py` — repro/minifier for post-AOT failures
- `torch/_inductor/` — the backend itself

Fix approach:

- Minify first, then work from the minimal repro
- Inspect the debug artifacts (`TORCH_COMPILE_DEBUG=1`) to understand what ops are involved
- Determine whether it is a lowering bug (`torch/_inductor/lowering.py`), scheduling issue, or codegen issue

## Recompilation

Excessive recompilation happens when guards are too specific, causing cache misses.
Diagnosis:
```shell
TORCH_LOGS="recompiles,recompiles_verbose,guards" python your_script.py
```
Key config:
- `torch._dynamo.config.recompile_limit` (default: 8)
- `torch._dynamo.config.fail_on_recompile_limit_hit` — set to `True` to get a hard error

Common causes:

- Varying input shapes specializing a new graph per shape
- Guards on Python values (ints, closures, globals) that change between calls

Fix approach:

- Read the `recompiles_verbose` output to see which guard failed
- Mark the varying dimension with `torch._dynamo.mark_dynamic()` or fix the source of guard instability

## Accuracy

The compiled model produces different numerical results than eager mode.
Diagnosis:
```shell
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```
This compares compiled vs. eager with an fp64 reference and dumps a repro if accuracy fails.
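A hand-rolled version of the same comparison is handy in a unit test. The helper name below is a placeholder, not the `debug_utils` API; `backend="aot_eager"` exercises Dynamo + AOTAutograd without Inductor codegen, so it runs without a C++/Triton toolchain:

```python
import torch


def compiled_matches_eager(fn, *args, rtol=1e-3, atol=1e-3):
    """Hypothetical helper: compare compiled output against eager."""
    expected = fn(*args)
    # aot_eager isolates Dynamo + AOTAutograd from backend codegen.
    actual = torch.compile(fn, backend="aot_eager")(*args)
    torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)
    return True


compiled_matches_eager(lambda x: torch.nn.functional.gelu(x) * 2, torch.randn(32))
```

If this passes but the full `inductor` backend diverges, the mismatch was introduced at the codegen layer.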
Key utilities:
- `torch/_dynamo/debug_utils.py` — `same_two_models()`, `backend_accuracy_fails()`, `cast_to_fp64()`
- `torch._dynamo.config.repro_tolerance` (default: 1e-3)

Fix approach:

- Minify with `TORCHDYNAMO_REPRO_LEVEL=4` to isolate the smallest graph that diverges
- Compare against the fp64 reference to distinguish a real bug from acceptable reduced-precision drift
- Check whether fusion or reordering of reductions explains the difference before assuming a codegen bug

## Internal Errors

`InternalTorchDynamoError` indicates a bug in Dynamo itself.
Diagnosis:
```shell
TORCHDYNAMO_VERBOSE=1 python your_script.py
# or equivalently:
TORCH_LOGS="+dynamo" python your_script.py
```
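Swapping in a simpler backend is a quick way to tell whether a failure belongs to Dynamo or to the backend (a sketch; the function is a stand-in):

```python
import torch


def f(x):
    return torch.nn.functional.relu(x).mean()


x = torch.randn(8)

# backend="eager": Dynamo traces the frame, then runs plain eager ops.
# If the bug reproduces here, it lives in Dynamo, not the backend.
out_eager_backend = torch.compile(f, backend="eager")(x)

# backend="aot_eager" adds AOTAutograd, isolating that layer next.
out_aot_eager = torch.compile(f, backend="aot_eager")(x)

assert torch.allclose(out_eager_backend, out_aot_eager)
```

Walking up the stack one backend at a time (`eager` → `aot_eager` → `inductor`) pins the failing layer before you start reading source.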
Key files:
- `torch/_dynamo/symbolic_convert.py` — bytecode interpreter
- `torch/_dynamo/variables/` — variable tracking system
- `torch/_dynamo/guards.py` — guard generation

Fix approach:

- Get the full internal traceback with `TORCHDYNAMO_VERBOSE=1`
- Reproduce with `TORCHINDUCTOR_COMPILE_THREADS=1` and `pdb` if needed

## Runtime Crashes

Segfaults and CUDA illegal memory access errors during execution of compiled code.
Diagnosis (make crash deterministic):
```shell
PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1 python your_script.py
```
For CUDA IMA, add NaN checks:
```shell
TORCHINDUCTOR_NAN_ASSERTS=1 python your_script.py
```
For Inductor-level sync debugging:
```python
torch._inductor.config.triton.debug_sync_kernel = True  # sync after every kernel
torch._inductor.config.triton.debug_sync_graph = True   # sync before/after graph
```
Fix approach:
- Make the crash deterministic with `PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1`
- Read the generated kernels via `TORCH_LOGS="output_code"`
- Use `TORCHINDUCTOR_NAN_ASSERTS=1` to find the first kernel producing bad values

## Triton Failures

Triton assertion failures or index-out-of-bounds in generated kernels.
Diagnosis:
```shell
TORCH_LOGS="output_code,schedule" python your_script.py
```
Key files:
- `torch/_inductor/codegen/triton.py` — Triton codegen
- `torch/_inductor/scheduler.py` — kernel fusion decisions

Fix approach:

- Read the failing kernel in the `output_code` logs
- Use the debug artifacts (`TORCH_COMPILE_DEBUG=1`) to trace back to the FX op

## Key Source Files

| File | Purpose |
|---|---|
| `torch/_dynamo/exc.py` | Exception hierarchy and error formatting |
| `torch/_dynamo/debug_utils.py` | Minifier support, accuracy checking, input serialization |
| `torch/_dynamo/repro/after_dynamo.py` | Repro/minifier for Dynamo-stage failures |
| `torch/_dynamo/repro/after_aot.py` | Repro/minifier for post-AOTAutograd failures |
| `torch/_dynamo/repro/aoti.py` | Repro/minifier for AOTI failures |
| `torch/_dynamo/config.py` | Dynamo config (repro levels, recompile limits) |
| `torch/_dynamo/variables/torch.py` | Torch function handling, tracing state functions |
| `torch/_dynamo/variables/higher_order_ops.py` | HOP tracing (cond, map, etc.) |
| `torch/_dynamo/symbolic_convert.py` | Bytecode interpreter, InstructionTranslator |
| `torch/_dynamo/convert_frame.py` | Frame compilation, fullgraph_capture entry point |
| `torch/_dynamo/functional_export.py` | New export tracer (_dynamo_graph_capture_for_export) |
| `torch/_dynamo/eval_frame.py` | torch._dynamo.export, optimize_assert |
| `torch/_export/_trace.py` | Export pipeline (_export, _strict_export, _non_strict_export, _export_to_aten_ir) |
| `torch/_export/utils.py` | _compiling_state_context() |
| `torch/compiler/__init__.py` | is_compiling(), is_exporting(), runtime flags |
| `torch/_higher_order_ops/cond.py` | torch.cond implementation and proxy tracing |
| `torch/_higher_order_ops/utils.py` | reenter_make_fx for HOP branch tracing |
| `torch/_inductor/config.py` | Inductor config (debug flags, trace settings) |
| `torch/_inductor/debug.py` | DebugContext, graph visualization, IR logging |
| `torch/_logging/_registrations.py` | All registered log aliases and artifacts |

## Minifier Workflow

The minifier reduces a failing graph to the smallest reproduction:

```shell
# Step 1: Generate the minifier launcher
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py

# Step 2: Run the minifier
python minifier_launcher.py minify

# Step 3: Run the minimized repro
python minifier_launcher.py run
```

For accuracy issues, use level 4:

```shell
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```