Reflect on a failed submission, diagnose the cause, and produce a revised plan.md.
Reflect on a failed submission for SW Expert Academy problem $0 and update the plan. Do NOT write or run any code.
| Argument | Position | Required | Description |
|---|---|---|---|
| $0 | first | Yes | The problem ID (e.g., 12345 or VH1234). |
| $1 | second | Yes | The trial number that failed (e.g., 1). |
| $2 | third | Yes | The judge verdict: WA, TLE, MLE, or RE. |
| $3 | fourth | No | Output trial number for the revised plan. Defaults to $1 + 1 if omitted. Use the same trial number as $1 when revising within a trial (e.g., after a stress-test failure). |
Read all of the following:
- ./problem_bank/$0/problem.md: the original problem statement.
- All previous plans (plan_1.md … plan_$1.md) and solutions (solution_1.py … solution_$1.py).
- When there are many prior trials, read only plan_1.md (the original approach) plus the last 3 plans and solutions (plan_{$1-2}.md … plan_$1.md and solution_{$1-2}.py … solution_$1.py). This keeps context manageable while retaining the original approach and recent history.

Reading the history avoids repeating past mistakes and reveals patterns across attempts.
Skip this step entirely when $1 < 3 (i.e., this is the first or second trial reflection). Only activate from trial 3 onward.
Perform the following four checks using the full history gathered in Step 1:
Look at the verdict sequence across all trials and classify the pattern:
| Pattern | Definition |
|---|---|
| Plateaued | Same verdict repeating with roughly the same number of test cases passed. |
| Oscillating | Alternating between two verdicts (e.g., MLE → TLE → MLE or TLE → MLE → TLE). |
| Regressing | Fewer test cases passed than earlier trials, or new verdict type that is worse. |
| Progressing | Strictly more test cases passed, or verdict improving (e.g., RE → WA → WA with more passes). |
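The classification above can be sketched as a small helper. The function name, the `(verdict, tests_passed)` tuple shape, and the order in which ties are broken are illustrative assumptions, not part of the command:

```python
# Illustrative sketch of the pattern classification in the table above.
# history: list of (verdict, tests_passed) tuples, oldest trial first.
def classify(history):
    if len(history) < 2:
        return "Progressing"  # one trial: nothing to compare against yet
    verdicts = [v for v, _ in history]
    passed = [p for _, p in history]
    # Progressing: strictly more test cases passed on every consecutive trial.
    if all(b > a for a, b in zip(passed, passed[1:])):
        return "Progressing"
    # Regressing: fewer test cases passed than some earlier trial.
    if passed[-1] < max(passed[:-1]):
        return "Regressing"
    # Oscillating: last three verdicts alternate between two values (A, B, A).
    if len(verdicts) >= 3 and verdicts[-1] == verdicts[-3] != verdicts[-2]:
        return "Oscillating"
    # Plateaued: same verdict repeating with roughly the same passes.
    if verdicts[-1] == verdicts[-2]:
        return "Plateaued"
    # Verdict changed without losing passes: treat as improving.
    return "Progressing"
```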
Has the fundamental algorithm stayed the same across trials? Micro-optimizations, refactoring, constant-factor tweaks, and bug fixes within the same approach all count as the "same algorithm". A different algorithm means a different time-complexity class or a fundamentally different problem decomposition (e.g., DP → greedy, brute-force → divide-and-conquer, BFS → binary search).
Estimate actual resource usage for the current algorithm against the problem's constraints:
If the estimate exceeds 50% of either the time or memory limit, flag the algorithm as infeasible.
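A back-of-envelope version of this estimate might look like the following. The throughput and per-element memory figures are assumptions roughly calibrated to CPython, not taken from any problem; adjust them to the judge's observed behavior:

```python
# Back-of-envelope feasibility check (illustrative constants, not from the problem).
# Assumes CPython executes ~10^7 simple operations per second and that one
# Python int/list slot costs ~28 bytes.
OPS_PER_SEC = 10**7
BYTES_PER_ELEM = 28

def infeasible(op_count, elem_count, time_limit_s, mem_limit_mb):
    est_time_s = op_count / OPS_PER_SEC
    est_mem_mb = elem_count * BYTES_PER_ELEM / 2**20
    # Flag the algorithm if either estimate exceeds 50% of its limit.
    return est_time_s > 0.5 * time_limit_s or est_mem_mb > 0.5 * mem_limit_mb
```

For example, an O(N²) loop with N = 10⁴ is ~10⁸ operations, well past half of a 2-second limit under these assumptions.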
Declare STUCK if ANY of the following hold:

- The pattern is Plateaued or Oscillating.
- The same algorithm has been reused across the recent trials despite repeated failures.
- The resource estimate flagged the current algorithm as infeasible.
If the pattern is Progressing, do NOT declare STUCK — the current approach is improving and should be continued.
Record the verdict (STUCK or NOT STUCK) and carry it forward to Step 2 and Step 3.
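Assuming the STUCK conditions are exactly the outcomes of the checks above (a judgment call; the exact combination is an assumption), the decision can be sketched as:

```python
# Hypothetical combination of the Step 1.5 checks into a single verdict.
def is_stuck(pattern, same_algorithm, flagged_infeasible):
    if pattern == "Progressing":
        return False  # improving: never declare STUCK
    return (pattern in ("Plateaued", "Oscillating")
            or same_algorithm
            or flagged_infeasible)
```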
The verdict is $2. Analyze the failed solution against the problem statement. Focus your diagnosis based on the verdict:
| Verdict | Focus areas |
|---|---|
| WA (Wrong Answer) | Logic errors, off-by-one, misread problem constraints, wrong output format, missed edge cases. |
| TLE (Time Limit Exceeded) | Algorithm complexity too high, redundant computation, slow I/O, unnecessary data structure overhead. |
| MLE (Memory Limit Exceeded) | Excessive data structure sizes, recursion depth, unnecessary copies. |
| RE (Runtime Error) | Index out of bounds, division by zero, recursion limit, integer overflow, wrong input parsing. |
Clearly state: the root cause of the failure, the evidence from the failed solution that supports this diagnosis, and what must change in the revised plan.
Only applies when Step 1.5 declared STUCK. Skip this subsection otherwise.
When STUCK is declared, the standard diagnosis above is insufficient. Additionally:

- Re-read the problem statement for a misread constraint or requirement that could have invalidated every attempt so far.
- Question the core algorithmic approach itself, not just its implementation, and identify a fundamentally different algorithm (per the definition in Step 1.5) for the revised plan to build on.
Write an updated plan that fixes the diagnosed issue. The revised plan must include:

- A summary of the diagnosis and how the new approach addresses it.
- The algorithm, with its estimated time and memory cost checked against the problem's constraints.
- The edge cases to handle and how the plan covers them.
[NOTE] DO NOT USE import sys. Instead use input() for reading input and print() for output.
[NOTE] DO NOT USE import io. Do NOT USE import os.
[NOTE] DO NOT import external packages like numpy, pandas, etc.
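A minimal solution skeleton that respects these notes might look like the following. The `#t answer` output line is SW Expert Academy's usual convention, and `solve` is a placeholder for the planned algorithm; the per-case parsing is hypothetical and must follow the actual problem statement:

```python
# Minimal SWEA-style skeleton obeying the notes above: no sys/io/os imports,
# only input() for reading and print() for writing.

def solve(nums):
    # Placeholder: replace with the algorithm from the revised plan.
    return sum(nums)

def main():
    T = int(input())                          # number of test cases
    for t in range(1, T + 1):
        nums = list(map(int, input().split()))  # hypothetical per-case input
        print(f"#{t} {solve(nums)}")            # SWEA "#t answer" format

# When submitting, end the file with a bare call:
# main()
```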
Determine the output trial number:

- If $3 is provided, use {output_trial} = $3.
- Otherwise, {output_trial} = $1 + 1.

Save the revised plan to ./problem_bank/$0/plan_{output_trial}.md. If {output_trial} equals $1 (same trial, revising in place), overwrite the existing plan file.
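The naming rule can be sketched as a small helper; the function and parameter names are illustrative only, mirroring the command's $1 (failed trial) and optional $3 (explicit output trial):

```python
# Illustrative helper for the plan-file naming rule above.
def plan_path(problem_id, failed_trial, output_trial=None):
    trial = output_trial if output_trial is not None else failed_trial + 1
    return f"./problem_bank/{problem_id}/plan_{trial}.md"
```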
After saving, the user should run /writing-solution $0 {output_trial} to implement the revised plan.