Structured reasoning framework for complex investigations. Use when solving multi-step problems, decoding ciphers, following clues, or any task requiring systematic hypothesis testing.
For each sub-challenge, cycle through:
Before testing anything, write your hypothesis to ## Reasoning in challenge.md:
- ❓ prefix under the relevant section

Follow this checklist BEFORE committing to any interpretation of a riddle, clue, or ambiguous instruction.
Freeze and tokenize. Split the clue into exact words. Note capitalization, punctuation, quotes, and line breaks. Treat formatting as signal, not decoration.
Read literally first. Assume each token may refer to itself (word-as-word), a position (first/second/third), or an explicit operation.
Enumerate meanings. For every ambiguous word or phrase, list at least 3 plausible meanings (e.g., "counts" could mean frequency, rank, position, length). Write them all down. Do NOT pick one yet.
Build competing hypotheses. For each interpretation, add sibling entries under the same heading in ## Reasoning. Define what each token means, what operation it implies, and what observable prediction it makes.
The Rule of Three. When asked to decode, decrypt, or interpret text, you MUST test at least 3 different tokenization/interpretation strategies before concluding which is correct. For example:
- `\w+` (word characters, includes digits/underscores)
- `[a-zA-Z]+` (strict letters)
- `\S+` (non-whitespace chunking)
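As a rough illustration of why the three strategies can disagree, the sketch below runs each pattern over the same clue with Python's `re.findall`. The clue string and the strategy names are invented for this example and are not part of the framework.

```python
import re

# The three tokenization strategies listed above, keyed by illustrative names.
STRATEGIES = {
    "word_chars": r"\w+",
    "letters_only": r"[a-zA-Z]+",
    "non_whitespace": r"\S+",
}

def tokenize_all(text):
    """Return the token list produced by every strategy for the same text."""
    return {name: re.findall(pattern, text) for name, pattern in STRATEGIES.items()}

# Hypothetical clue, chosen so the strategies produce different token counts.
clue = "Meet_me at 9pm, don't be late!"
results = tokenize_all(clue)
for name, tokens in results.items():
    print(f"{name:15} {len(tokens)} tokens: {tokens}")
```

On this clue the three patterns yield 7, 8, and 6 tokens respectively, so any "Nth word" instruction would point at different words depending on the strategy chosen.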
Refusal to test all three is a failure of investigation.

**Disprove before you prove.** For each hypothesis, design the cheapest test that can FAIL decisively. Run falsification tests first. A hypothesis survives only if it resists contradiction.
Require full coverage. A hypothesis is acceptable only if it explains ALL parts of the clue — including odd capitalization, formatting, and structure — with no hand-waving. If you have to ignore part of the clue, the interpretation is probably wrong.
Pivot after 3 failures. If the current interpretation yields no strong progress after 3 targeted tests, mark it ❌ in challenge.md and switch to the next branch. Do not try 10 variations of a failing approach.
Beware the "first look" trap. Puzzles are designed to mislead. The most obvious interpretation is often a decoy. When stuck, return to structural/literal readings: exact word positions, ranks, acrostics, self-reference.
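One way to keep the checklist honest is to track competing hypotheses in a small ledger that enforces the pivot rule mechanically. This is a minimal sketch, not part of the framework itself: the `Hypothesis` class, its fields, and `next_open` are hypothetical names, and the example claims are invented.

```python
from dataclasses import dataclass, field

MAX_FAILURES = 3  # mirrors "Pivot after 3 failures"

@dataclass
class Hypothesis:
    claim: str        # what the tokens are assumed to mean
    prediction: str   # the observable outcome if the claim is right
    status: str = "❓"
    failures: int = 0
    notes: list = field(default_factory=list)

    def record_no_progress(self, reason):
        """Log a targeted test that gave no strong progress; pivot at the limit."""
        self.failures += 1
        self.notes.append(reason)
        if self.failures >= MAX_FAILURES:
            self.status = "❌"

def next_open(hypotheses):
    """Return the first hypothesis still marked ❓, or None if none remain."""
    for hypothesis in hypotheses:
        if hypothesis.status == "❓":
            return hypothesis
    return None

# Invented sibling interpretations of the word "counts".
hyps = [
    Hypothesis("'counts' means word frequency", "top words spell a message"),
    Hypothesis("'counts' means word length", "lengths map to letters A-Z"),
]

current = next_open(hyps)
for reason in ["top-5 words: noise", "top-10 words: noise", "stopwords removed: noise"]:
    current.record_no_progress(reason)

print(current.status)        # first branch is closed after three tests
print(next_open(hyps).claim) # the sibling branch becomes the active one
```

Keeping the counter in data rather than in memory makes it harder to drift into a fourth or fifth variation of a failing approach.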
All reasoning lives in ## Reasoning in challenge.md. Use nested markdown lists with status prefixes:
- ❓ = untested hypothesis
- ✅ = confirmed with evidence (include the evidence inline)
- ⚠️ = partial/close but not exact (keep open, note what's close)
- ❌ = disproven (note why; DO NOT retry this approach)

- `## Reasoning`: if you already marked it ❌, stop immediately.
- `## Reasoning`, find the oldest ❓ entry, and work on that instead.
- ❌, not just trying more variations of the failing step.

Execute that tool call. Never reason about data: look it up. Never enumerate possibilities mentally: write code.
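The status prefixes and the "oldest ❓ entry" rule can be checked with code rather than by eye. The sketch below parses a nested markdown list and assumes that entries are appended in chronological order, so the first ❓ in the file is the oldest. The sample section and the function names are made up for illustration.

```python
from collections import Counter

# Status markers as code points, so a trailing emoji variation selector does not matter.
STATUS = {
    "\u2753": "untested",   # ❓
    "\u2705": "confirmed",  # ✅
    "\u26a0": "partial",    # ⚠️
    "\u274c": "disproven",  # ❌
}

def parse_entries(markdown):
    """Return (status_name, text) for every list line that starts with a status marker."""
    entries = []
    for line in markdown.splitlines():
        text = line.lstrip(" -*\t")
        for mark, name in STATUS.items():
            if text.startswith(mark):
                entries.append((name, text[len(mark):].lstrip("\ufe0f ").strip()))
                break
    return entries

def oldest_open(markdown):
    """Return the text of the first untested entry, or None if nothing is open."""
    for name, text in parse_entries(markdown):
        if name == "untested":
            return text
    return None

# Invented example of a Reasoning section.
SAMPLE = """## Reasoning
- ❌ Clue is a Caesar cipher (all 25 shifts gave gibberish)
- ⚠️ Clue is an acrostic (first letters spell 'KEX', one letter off)
  - ❓ Acrostic uses last letters instead of first
- ❓ Numbers in the clue are word positions
"""

counts = Counter(name for name, _ in parse_entries(SAMPLE))
print(dict(counts))
print(oldest_open(SAMPLE))
```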
Before applying ANY transformation to a full dataset:
- print or datatable queries, not full table scans
- ⚠️ in challenge.md, suspect parsing/tokenization/timezone issue. Add a "Fix Parsing" hypothesis. Do NOT mark ❌.

Gibberish gate: If decoded text produces random characters, unrelated words, or non-English output, your decoding method is WRONG. Stop immediately. Do not try variations of the same method; try a fundamentally different approach (different tokenization, different field, different indexing).
This applies to: decoding, joining, parsing, any data transformation.
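A sample-first check with a gibberish gate might look like the sketch below. The gate here is a deliberately crude heuristic (the share of decoded tokens found in a tiny word list), and everything in it is an assumption for illustration: the vocabulary, the 0.5 threshold, the Caesar-shift transform, and the function names. A real check would need a far larger word list.

```python
import re

# Hypothetical mini-vocabulary; far too small for real use.
COMMON_WORDS = {"the", "a", "of", "to", "and", "in", "is", "it", "on", "at", "me", "meet", "noon"}

def looks_like_english(text, min_ratio=0.5):
    """Crude gate: at least min_ratio of the letter-only tokens must be known words."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in COMMON_WORDS)
    return hits / len(tokens) >= min_ratio

def caesar_shift(text, shift):
    """Shift lowercase letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch.islower():
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        else:
            out.append(ch)
    return "".join(out)

def passes_gate(rows, transform, sample_size=3):
    """Apply the transform to a small sample only and check every result."""
    sample = rows[:sample_size]
    return all(looks_like_english(transform(row)) for row in sample)

# Invented data: plaintexts encrypted with a shift of 3.
plaintexts = ["meet me at noon", "the key is in the box", "it is on the table"]
ciphertexts = [caesar_shift(p, 3) for p in plaintexts]

right = passes_gate(ciphertexts, lambda row: caesar_shift(row, -3))
wrong = passes_gate(ciphertexts, lambda row: caesar_shift(row, -1))
print(right, wrong)
```

Only when the sample passes the gate is it worth running the transformation over the full dataset; a failed gate is the signal to switch method rather than tweak the shift.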
Read the result. What did you actually get? Quote it.
After EVERY test, update ## Reasoning in challenge.md:
- ❓ to ✅, add evidence inline, then save to memory and move on
- ❌ with reason, then re-read ## Reasoning and pick the next ❓ entry
- ✅ or ❌, no limbo

If you've tried 3+ different approaches without making progress:
task tool to invoke gpt-expert with your formulated question. Always use a modern model: gpt-5.4 or gemini-3-pro-preview. A fresh perspective from a different model often breaks through.