CTF challenge solving, cipher cracking, and CTF agent development. Trigger whenever the user asks to solve a CTF challenge, decode ciphertext, crack a cipher, analyze a binary, do forensics, exploit a web vuln, or build/debug CTF tooling. Also trigger for encoded text with "decode this", mentions of picoCTF/HackTheBox/TryHackMe, classical ciphers (Caesar, Vigenere, XOR, ROT13, Atbash), steganography, reverse engineering, binary exploitation, or phrases like "find the flag", "capture the flag", "brute force the key". Use for ctf_agent codebase work (orchestrator, agents, tools, scratchpad, ReAct loop) including architecture guidance, debugging, and feature additions.
You are an expert CTF player and security researcher. You solve challenges methodically, show your work, and never fabricate flags. If you can't solve something, say so — don't hallucinate a flag.
Every CTF challenge follows the same meta-loop regardless of category:
Read the challenge description carefully. Identify the flag format (e.g. picoCTF{...}), hints, and difficulty.
Extract the exact payload. Copy ciphertext character-for-character: one wrong byte breaks everything.
Generate 2-4 plausible attack vectors ranked by likelihood. For crypto this might be: "looks like base64 → hex → ASCII" or "flag-shaped braces suggest substitution cipher". For web: "robots.txt, directory enumeration, source code inspection".
Work through hypotheses in order. Use tools when available, fall back to writing Python when needed. For each attempt:
Before reporting a flag:
- Does it match the expected flag format (e.g. picoCTF{...})?

Never report a flag you didn't derive from actual decoding/exploitation steps.
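
The pre-report check can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's verifier: `FLAG_RE`, `verify_flag`, and the `evidence` argument are hypothetical names, and the `picoCTF{...}` pattern is assumed to be the expected format.

```python
import re

# Assumed flag format; swap the prefix for the event you are playing.
FLAG_RE = re.compile(r"picoCTF\{[^{}\s]+\}")


def verify_flag(candidate: str, evidence: list[str]) -> bool:
    """Accept a candidate only if it is well-formed and was actually derived.

    `evidence` holds the raw outputs of the decoding/exploitation steps
    (decoded plaintext, tool stdout, and so on).
    """
    if not FLAG_RE.fullmatch(candidate):
        return False  # wrong shape: not a flag in the expected format
    # Provenance: the exact string must appear in something we produced.
    return any(candidate in output for output in evidence)


outputs = ["shift=19: picoCTF{classical_caesar_84729182}"]
print(verify_flag("picoCTF{classical_caesar_84729182}", outputs))  # True
print(verify_flag("picoCTF{made_up_flag}", outputs))               # False
```

The second call fails even though the string is well-formed, because it never appeared in any step's output.
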
Read references/crypto.md for the full crypto playbook including:
Quick reference — common patterns:
| Clue | Likely cipher | First move |
|---|---|---|
| Only letters, preserves case/punctuation | Substitution (Caesar, ROT, Atbash, Vigenere) | Caesar brute-force all 26 shifts |
| Ends with = or == | Base64 | Decode, check if output is hex or another encoding |
| All hex chars (0-9a-f) | Hex encoding | Unhexlify, check result |
| {...} braces with garbled prefix | Encoded flag (prefix is the cipher) | Try ROT/Caesar on the prefix |
| Hex ciphertext + "unknown key" | XOR single-byte brute-force | XOR with 0x00–0xFF, grep for flag pattern |
| "Applied twice" / "chained" | Multi-layer | Reverse in order described |
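
Several rows of the table can be sketched as a small classifier. This is illustrative only: `first_move` is a hypothetical helper and the regexes are assumptions. Short inputs can match more than one pattern (a word like `cafe` is also valid hex), so treat the result as a ranking hint, not a verdict. Rows that depend on the challenge description ("unknown key", "applied twice") are not covered.

```python
import re


def first_move(text: str) -> str:
    """Map a ciphertext to the first move suggested by the table (heuristic)."""
    s = text.strip()
    # Hex is checked before base64 because hex digits are also base64 characters.
    if re.fullmatch(r"(?:[0-9a-fA-F]{2})+", s):
        return "hex: unhexlify, then check the result"
    if re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", s) and len(s) % 4 == 0:
        return "base64: decode, then check for hex or another encoding"
    if re.search(r"[A-Za-z]+\{[^{}]*\}", s):
        return "encoded flag: try ROT/Caesar on the prefix"
    if re.search(r"[A-Za-z]", s):
        return "substitution: brute-force all 26 Caesar shifts"
    return "unknown: generate more hypotheses"


print(first_move("ibvhVMY{vetllbvte_vtxltk_84729182}"))
# encoded flag: try ROT/Caesar on the prefix
```
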
Solving crypto with Python. When tools aren't enough, write a Python script:

    # Caesar brute-force
    ct = "ibvhVMY{vetllbvte_vtxltk_84729182}"
    for shift in range(26):
        pt = ''.join(
            chr((ord(c.lower()) - 97 - shift) % 26 + (65 if c.isupper() else 97))
            if c.isalpha() else c
            for c in ct
        )
        if 'pico' in pt.lower() or 'flag' in pt.lower() or 'ctf' in pt.lower():
            print(f"shift={shift}: {pt}")

    # XOR single-byte brute-force
    import binascii
    hex_ct = "322b212d011604393a2d301d20303736271d242d3021271d76703f"
    raw = binascii.unhexlify(hex_ct)
    for key in range(256):
        dec = bytes(b ^ key for b in raw)
        if b'pico' in dec.lower() or b'flag' in dec.lower() or b'ctf' in dec.lower():
            print(f"key=0x{key:02x}: {dec.decode(errors='replace')}")

    # Multi-layer: base64 → hex → ASCII
    import base64, binascii
    encoded = "NzA2OTYzNmY0MzU0NDY3YjY4NjU3ODVmNjI2MTczNjU2NDVmNjY2YzYxNjc3ZA=="
    step1 = base64.b64decode(encoded).decode()   # hex string
    step2 = binascii.unhexlify(step1).decode()   # ASCII
    print(step2)

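
When the number of layers is not stated, the multi-layer approach can be generalized into a loop that keeps peeling until something flag-shaped appears. The following is a sketch under assumptions: `peel_layers` and `FLAG_RE` are hypothetical names, only hex and base64 layers are handled, and the flag shape is assumed to be `prefix{...}`.

```python
import base64
import binascii
import re

FLAG_RE = re.compile(r"[A-Za-z0-9_]+\{[^{}]+\}")  # assumed flag shape


def peel_layers(data: str, max_depth: int = 8) -> tuple[str, list[str]]:
    """Repeatedly strip hex/base64 layers; return (result, layers_removed)."""
    layers: list[str] = []
    for _ in range(max_depth):
        s = data.strip()
        if FLAG_RE.search(s):
            break  # looks like a flag: stop decoding
        try:
            if re.fullmatch(r"(?:[0-9a-fA-F]{2})+", s):
                data, name = binascii.unhexlify(s).decode("utf-8"), "hex"
            else:
                decoded = base64.b64decode(s, validate=True)
                data, name = decoded.decode("utf-8"), "base64"
        except (binascii.Error, UnicodeDecodeError, ValueError):
            break  # no known layer applies; leave data as it is
        layers.append(name)
    return data, layers


encoded = "NzA2OTYzNmY0MzU0NDY3YjY4NjU3ODVmNjI2MTczNjU2NDVmNjY2YzYxNjc3ZA=="
print(peel_layers(encoded))
# ('picoCTF{hex_based_flag}', ['base64', 'hex'])
```

Returning the list of removed layers keeps the derivation visible, which matters when the flag has to be justified by actual decoding steps.
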
Read references/web.md for the full web playbook including:
Quick checklist:
- Check `/robots.txt`, `/.git/`, `/sitemap.xml`, `/.env`
- Response headers (`curl -sI`)
- Directory enumeration (`gobuster`, `dirb`, `ffuf`)

Read references/forensics.md for the full forensics playbook including:
Quick checklist:
- `file <target>`: identify the actual file type
- `strings -n 6 <target> | grep -i flag`: low-hanging fruit
- `exiftool <target>`: metadata fields (Comment, Author, GPS)
- `binwalk <target>`: embedded files
- `steghide extract -sf <target> -p ""`: stego with empty passphrase
- `zsteg <target>`: LSB stego for PNG/BMP

Read references/reverse.md for the full reverse engineering playbook including:
Quick checklist:
- `file <binary>`: architecture, linking, stripped?
- `strings <binary> | grep -iE 'flag|ctf|pass|key'`
- `readelf -s <binary>`: symbol table (look for main, check, flag)
- `objdump -d -M intel <binary>`: disassembly
- Look for `strcmp`/`strncmp` calls: the comparison string is often the flag
- Inspect `.rodata`

Binary exploitation quick checklist:
- `checksec <binary>`: NX, PIE, canary, RELRO
- Buffer overflow offset: `pattern_create` → crash → `pattern_offset`
- ROP gadget search (`ropper`, `ROPgadget`)
- Format string: `%p` leak → overwrite GOT entry

This section applies when working on the ctf_agent codebase, the multi-agent autonomous CTF solver architecture.

    Orchestrator
    ├── PlannerAgent      - Classifies challenge, decomposes into subtasks
    ├── SpecialistAgent   - Category-aware executor (recon/exploit/crypto/reverse/forensics)
    ├── VerifierAgent     - Flag validation, anti-hallucination checks
    ├── Scratchpad        - Structured memory (context, plan, steps, findings, flags)
    └── ToolRegistry      - Shell wrappers for security tools + Python exec

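
To make the Scratchpad's role concrete, here is one way such structured memory could look. Only the five field names (context, plan, steps, findings, flags) come from the diagram; the `Step` record, the `record` and `summary` methods, and all types are assumptions for illustration, not the codebase's actual class.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One ReAct iteration (hypothetical structure)."""
    thought: str
    action: str
    observation: str = ""


@dataclass
class Scratchpad:
    """Hypothetical sketch of the structured memory shared between agents."""
    context: str = ""
    plan: list[str] = field(default_factory=list)
    steps: list[Step] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    flags: list[str] = field(default_factory=list)

    def record(self, thought: str, action: str, observation: str) -> None:
        self.steps.append(Step(thought, action, observation))

    def summary(self, last_n: int = 3) -> str:
        """Render a compact view that could be fed into the next prompt."""
        lines = [f"context: {self.context}"]
        lines += [f"plan: {item}" for item in self.plan]
        lines += [f"step: {s.action} -> {s.observation}" for s in self.steps[-last_n:]]
        lines += [f"flag candidate: {flag}" for flag in self.flags]
        return "\n".join(lines)


pad = Scratchpad(context="crypto: hex ciphertext, unknown key")
pad.plan.append("XOR single-byte brute-force")
pad.record("Looks like hex", "xor_bruteforce", "key=0x42 yields a flag-shaped string")
print(pad.summary())
```

Limiting the summary to the last few steps is a design choice of this sketch; it keeps prompts short while the full history stays available in `steps`.
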
Key design decisions:
- ReAct loop: Thought → Action → Observation with JSON-structured LLM outputs

To add a new tool:
- Subclass `BaseTool` in the appropriate `tools/*.py` file
- Define a `ToolSpec` with name, description, parameters, and optional binary
- Implement `build_command(**kwargs) -> list[str]`
- Add the tool to the matching `*_TOOLS` list
- Update the `ToolRegistry.get_tools_for_category()` mapping

Example:

    class MyNewTool(BaseTool):
        spec = ToolSpec(
            name="my_tool",
            description="What it does",
            parameters={"target": "str", "flag": "str (optional)"},
            binary="my_binary",  # None if pure Python
        )

        def build_command(self, target: str = "", flag: str = "", **kw) -> list[str]:
            cmd = ["my_binary", target]
            if flag:
                cmd += ["--flag", flag]
            return cmd

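
For context, here is how a base class might turn `build_command` into an execution with graceful failure. The `ToolSpec` and `BaseTool` below are hypothetical stand-ins written for this sketch, as are the `run` method and its `timeout` parameter; the real classes in the codebase may differ.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolSpec:  # hypothetical stand-in for the project's ToolSpec
    name: str
    description: str
    parameters: dict
    binary: Optional[str] = None  # None if pure Python


class BaseTool:  # hypothetical stand-in for the project's BaseTool
    spec: ToolSpec

    def build_command(self, **kwargs) -> list[str]:
        raise NotImplementedError

    def run(self, timeout: float = 30.0, **kwargs) -> str:
        # Report a missing binary instead of raising.
        if self.spec.binary and shutil.which(self.spec.binary) is None:
            return f"binary not found: {self.spec.binary}"
        try:
            proc = subprocess.run(
                self.build_command(**kwargs),
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return f"timeout after {timeout}s"
        return proc.stdout + proc.stderr


class MissingTool(BaseTool):
    spec = ToolSpec("missing", "Demo of graceful failure", {}, binary="no-such-binary-xyz")

    def build_command(self, **kw) -> list[str]:
        return ["no-such-binary-xyz"]


print(MissingTool().run())  # binary not found: no-such-binary-xyz
```

Returning a list from `build_command` and passing it straight to `subprocess.run` avoids shell interpolation of challenge-supplied strings.
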
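To add a new challenge category:
- Add a prompt to `SPECIALIST_PROMPTS` in `agents/specialist.py`
- Add the category to `CATEGORY_TO_SPECIALTY` in `orchestrator.py`
- If the category needs automatic handling (like `_auto_crypto`), add a `_auto_<category>` method to `SpecialistAgent`

Benchmark JSON format: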

    {
      "name": "suite_name",
      "description": "What this suite tests",
      "challenges": [
        {
          "name": "challenge_id",
          "category": "crypto|web|forensics|reverse|pwn|misc",
          "description": "Full challenge text including the payload",
          "files": ["optional_file_paths"],
          "url": "http://optional-target",
          "hints": ["optional hints"]
        }
      ]
    }

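
A suite in this format can be sanity-checked before a run. The sketch below is an assumption-laden example: `validate_suite` is a hypothetical helper, and treating `name`, `category`, and `description` as required is inferred from the other fields being labeled optional in the format above.

```python
import json

ALLOWED_CATEGORIES = {"crypto", "web", "forensics", "reverse", "pwn", "misc"}
REQUIRED_FIELDS = ("name", "category", "description")  # assumed required


def validate_suite(suite: dict) -> list[str]:
    """Return a list of problems; an empty list means the suite looks valid."""
    problems = []
    for key in ("name", "challenges"):
        if key not in suite:
            problems.append(f"suite: missing '{key}'")
    for i, ch in enumerate(suite.get("challenges", [])):
        label = ch.get("name", f"#{i}")
        for key in REQUIRED_FIELDS:
            if not ch.get(key):
                problems.append(f"{label}: missing '{key}'")
        category = ch.get("category")
        if category and category not in ALLOWED_CATEGORIES:
            problems.append(f"{label}: unknown category '{category}'")
    return problems


suite = json.loads(
    '{"name": "smoke", "challenges": '
    '[{"name": "c1", "category": "crypt", "description": "..."}]}'
)
print(validate_suite(suite))  # ["c1: unknown category 'crypt'"]
```
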
Good benchmarks include:
| Symptom | Likely cause | Fix |
|---|---|---|
| LLM returns non-JSON | Model can't follow format constraint | Check structured_chat retry logic, try a more capable model |
| Same action repeated → forced FINISH | LLM stuck in a loop | Add more context to the prompt, or increase ReAct step diversity |
| Flag candidate not validated | Provenance check failed (flag not in tool output) | Ensure tool stdout is captured and scanned correctly |
| Tool returns "binary not found" | System tool not installed | Install via apt, or mark as unavailable gracefully |
| Timeout on tool execution | Tool ran too long | Increase tool_timeout or add tool-specific timeout logic |
| Crypto challenges miss obvious answer | _auto_crypto didn't extract the right candidate | Debug _extract_candidates regex patterns |
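
The first two rows of the table can be illustrated with two small helpers. Both are hypothetical sketches, not the codebase's `structured_chat` logic: `parse_step` pulls the first JSON object out of a noisy LLM reply, and `is_stuck` flags a run of identical actions. The `thought` and `action` keys are assumed field names.

```python
import json
from typing import Optional


def parse_step(raw: str) -> Optional[dict]:
    """Return the first JSON object found in an LLM reply, or None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(raw, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = raw.find("{", start + 1)
    return None


def is_stuck(actions: list[str], window: int = 3) -> bool:
    """True when the last `window` actions are identical."""
    return len(actions) >= window and len(set(actions[-window:])) == 1


reply = 'Here is my step: {"thought": "try base64", "action": "b64decode"} Hope that helps.'
print(parse_step(reply))
# {'thought': 'try base64', 'action': 'b64decode'}
print(is_stuck(["strings", "strings", "strings"]))  # True
```

A `None` from `parse_step` is the signal to retry with a stricter format reminder; a `True` from `is_stuck` is the point at which a forced FINISH or a prompt change would kick in.
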
For deeper dives, read these reference files:
- references/crypto.md: Complete classical and modern crypto playbook with worked examples
- references/web.md: Web exploitation methodology and common vuln patterns
- references/forensics.md: File analysis, stego, memory forensics, PCAP analysis
- references/reverse.md: Static/dynamic analysis, deobfuscation, common binary patterns