Clean up and restructure messy notes/plans: semantic sort, AI rewrite, 100% verified gap detection, optional split into spec + references. Triggers: "cleanup", "/cleanup", "почисти", "реорганизуй", "clean up", "plan-rewrite", "/plan-rewrite", "rewrite plan", "sort plan"
Clean up, reorganize, and optionally split a plan file with 100% verified gap detection.
/cleanup <file path> [file2] [file3]
Multiple files: concatenated with <!-- from: filename --> markers, sorted together.
$ARGUMENTS
Arguments: <file path> [file2] [file3]
cp <fileN> <fileN>.bak
<!-- from: file1.md -->
<contents of file1>
<!-- from: file2.md -->
<contents of file2>
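The multi-file layout above can be sketched in a few lines of Python. This is only an illustration of the marker format, not the skill's actual implementation; the function name `concat_with_markers` is hypothetical, and UTF-8 input is assumed.

```python
from pathlib import Path


def concat_with_markers(paths):
    """Join several files into one text, prefixing each with a source marker."""
    parts = []
    for path in map(Path, paths):
        # Marker format taken from the example above: <!-- from: filename -->
        parts.append(f"<!-- from: {path.name} -->")
        parts.append(path.read_text(encoding="utf-8").rstrip("\n"))
    return "\n".join(parts) + "\n"
```

The markers let later phases trace each line back to its source file after the combined text has been sorted.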
cp <file> <file>.bak
`##` headers.
`###` subsection in the target section. Respect code block boundaries (...).
`##` headers for organization, but NEVER delete or rename existing lines. Header cleanup happens in Phase 3.
repr() of lines for \xa0 and similar characters, then use Python for byte-for-byte copying from the original.
Run the verification script:
python3 scripts/verify-sort.py <file>.bak <file>
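As a rough idea of what a superset check like this could do, here is a minimal sketch. It assumes exact, byte-sensitive line comparison with duplicates counted; the real `verify-sort.py` may differ, and `missing_after_sort` is a hypothetical name.

```python
from collections import Counter


def missing_after_sort(original_text, sorted_text):
    """Return original lines (with multiplicity) that are absent from the sorted text."""
    # Lines are compared exactly, so "a\xa0b" and "a b" count as different lines.
    original = Counter(l for l in original_text.splitlines() if l.strip())
    result = Counter(l for l in sorted_text.splitlines() if l.strip())
    # Counter subtraction keeps only lines the sorted file has too few of;
    # extra lines in the sorted file (new headers) are ignored.
    return list((original - result).elements())
```

An empty result means every original line survived the sort. Anything returned was dropped or altered, for example a non-breaking space silently replaced by a regular space.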
Rewrite the sorted file into a clean, well-structured document. Create <basename>.rewritten.<ext>:
`##` section summaries, create `###` subsections where logical.
Chat log handling:
Constraints:
`<details>` / `<summary>` HTML blocks for critical content. Gap detection agents and scripts search raw text and may miss content inside HTML tags. If you use `<details>`, duplicate key facts outside the collapsible block (e.g., in a summary line above it).
python3 scripts/verify-rewrite.py <sorted> <rewritten>
Extracts and compares URLs (only). Report missing URLs.
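A URL presence check of this kind might look like the sketch below. The regular expression and the normalization (trimming trailing punctuation and a trailing slash) are assumptions for illustration; the normalization actually used by `verify-rewrite.py` is not described here.

```python
import re

# Stops at whitespace, angle brackets, parentheses and square brackets,
# so Markdown links like [text](https://example.com) are handled.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Collect URLs from text, lightly normalized for comparison."""
    urls = set()
    for match in URL_RE.findall(text):
        urls.add(match.rstrip(".,;:!?\"'").rstrip("/"))
    return urls


def missing_urls(source_text, target_text):
    """URLs present in the source but absent from the target."""
    return sorted(extract_urls(source_text) - extract_urls(target_text))
```

One known limitation of this sketch: URLs that legitimately contain parentheses are truncated at the first parenthesis.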
Small file optimization: If sorted file has <50 non-empty content lines → skip 4b entirely, rely on 4c fuzzy matching + agent verification. This saves significant time on small files.
GATE CHECK (for files ≥50 lines): 4b and 4c are DIFFERENT checks. BOTH are mandatory, in order.
- 4b = per-section semantic comparison → MISSING / PARTIAL / REVERSED
- 4c = safety net script fuzzy-match → only TRUE_MISSING
Do NOT skip 4b. Do NOT substitute 4b with 4c.
Step 1 — Pre-filter via rg (reduces agent load by ~60-80%):
`##` section in the sorted file, collect all non-empty content lines.
https://t.me/foo/123 → foo/123; from https://github.com/user/repo → repo).
rg each keyword in the rewritten file.
Example:
Автоматизация отработки фандингов https://t.me/automaker_main/43
фандинг, automaker
automaker in rewritten → found at line 85 → COVERED, skip.
Step 2: Spawn agents for sections with unfound lines:
Assign 1-2 sections (by headers) per agent (only sections with remaining unfound lines after Step 1). No limit on agent count.
subagent_type: Explore
run_in_background: true
Agent prompt (use this for each agent, replacing SECTIONS and FILE_PATHS):
You are a gap detector. Compare SORTED file vs REWRITTEN file for these sections: [SECTIONS].
Files:
- Sorted: [SORTED_PATH]
- Rewritten: [REWRITTEN_PATH]
CRITICAL: Use rg tool to search for key phrases from each sorted line. Do NOT rely on manual reading for large files. For each line, grep 3-5 unique words in the rewritten file.
IMPORTANT: The rewritten file may contain HTML blocks (<details>, <summary>, <table>).
Content inside these tags IS valid — search INSIDE them with rg. A line found inside
<details>...</details> counts as present.
For EACH non-empty line in your assigned sections of the SORTED file:
1. Search for a semantic equivalent in the REWRITTEN file (search the ENTIRE file, not just the same section)
2. If found with same meaning → SKIP
3. If found but details lost → PARTIAL (quote both lines + what was lost)
4. If meaning changed/reversed → REVERSED (quote both lines)
5. If NOT found anywhere → MISSING (quote the sorted line)
RULES:
- Grammar/formatting changes are NOT gaps. "setup nginx" → "Set up Nginx" is fine.
- PARTIAL only when a SPECIFIC IDEA, DETAIL, or CONTEXT is lost — not formatting.
- These are NOT gaps: bullet→checkbox, case changes, typo fixes, punctuation, link text changes.
- If the core idea and all details are preserved, it's a SKIP regardless of formatting.
- You MUST quote exact text from both files. If you cannot quote the rewritten equivalent — it IS missing.
- Output format per finding:
SECTION: <header>
TYPE: MISSING|PARTIAL|REVERSED
SORTED_LINE: "<exact quote>"
REWRITTEN_LINE: "<exact quote or NOT_FOUND>"
LOST_DETAIL: "<what was lost>" (PARTIAL only)
4b is complete when: All section agents have returned results. Now proceed to 4c.
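For reference, the Step 1 pre-filter of 4b can be approximated in Python. This sketch uses a plain substring search instead of `rg`. It assumes a line counts as covered when any one of its keywords is found, and it takes keywords to be URL tails plus words of at least `min_len` characters. All function names and the `min_len` parameter are hypothetical.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def url_keyword(url):
    """https://t.me/foo/123 -> foo/123; https://github.com/user/repo -> repo."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if parsed.netloc.endswith("github.com"):
        return path.split("/")[-1]
    return path


def keywords_for(line, min_len=6):
    """URL tails plus longer words from the rest of the line."""
    keys = [url_keyword(u) for u in URL_RE.findall(line)]
    text = URL_RE.sub(" ", line)
    keys += [w for w in re.findall(r"\w+", text) if len(w) >= min_len]
    return [k for k in keys if k]


def unfound_lines(section_lines, rewritten_text, min_len=6):
    """Lines with no keyword hit in the rewritten text; these go to the agents."""
    haystack = rewritten_text.lower()
    unfound = []
    for line in section_lines:
        if not line.strip():
            continue
        keys = keywords_for(line, min_len)
        if not any(k.lower() in haystack for k in keys):
            unfound.append(line)
    return unfound
```

A line that yields no keywords at all is treated as unfound, so short lines are always passed on to an agent rather than silently skipped.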
Step 1: Run the coverage script:
python3 scripts/verify-coverage.py <sorted> <rewritten> <gaps>
The script checks every sorted line against rewritten (fuzzy match) and gaps. Lines not found → written to <basename>.uncovered.tmp.
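The fuzzy matching can be pictured with the standard library's `difflib`. This is a sketch under assumptions, not the contents of `verify-coverage.py`: the 0.8 threshold, the whitespace and case normalization, and the name `uncovered_lines` are all illustrative.

```python
from difflib import SequenceMatcher


def normalize(line):
    """Lowercase and collapse whitespace so formatting changes do not matter."""
    return " ".join(line.lower().split())


def uncovered_lines(sorted_text, rewritten_text, gaps_text="", threshold=0.8):
    """Sorted lines that match nothing in the rewritten file or the gaps file."""
    combined = rewritten_text + "\n" + gaps_text
    targets = [normalize(l) for l in combined.splitlines() if l.strip()]
    uncovered = []
    for line in sorted_text.splitlines():
        needle = normalize(line)
        if not needle:
            continue
        covered = any(
            needle in t or SequenceMatcher(None, needle, t).ratio() >= threshold
            for t in targets
        )
        if not covered:
            uncovered.append(line)
    return uncovered
```

Comparing every sorted line against every target line is quadratic, which is acceptable for a sketch but would need indexing or pre-filtering on very large files.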
Step 2: If uncovered candidates exist, split into batches of 100 lines. Spawn one agent per batch in parallel. No limit on agent count — use as many as needed. NEVER skip this step.
subagent_type: Explore
.uncovered.tmp, plus the rewritten file
Agent prompt:
You are a coverage verifier. You have a list of lines that a fuzzy-matching script
could not find in the rewritten file. Many of these are FALSE POSITIVES — the content
IS in the rewritten file but was rephrased, reformatted, or had typos fixed.
IMPORTANT: The rewritten file may use <details><summary>...</summary>...</details> blocks.
Content inside these blocks IS present — search the raw file text, not rendered output.
Many false positives come from content moved into <details> blocks.
CHAT SUMMARIZATION RULE: The rewritten file intentionally summarizes raw chat logs
(timestamped messages like "☀️, [date]") into structured "Key takeaways" sections.
If a chat message's substantive facts (numbers, prices, names, conclusions) appear
in summarized form — it is COVERED, not MISSING. Specifically:
- Timestamps, emoji markers, informal greetings → always FALSE POSITIVE
- Conversational fragments ("Ну хз", "Ага", "Потом конечная") → FALSE POSITIVE
- Back-and-forth debate condensed to conclusion → COVERED
- Specific numbers/facts preserved in summary → COVERED
Only report TRUE_MISSING if the substantive IDEA has no equivalent anywhere in the file.
Files:
- Uncovered candidates: [UNCOVERED_TMP_PATH]
- Rewritten: [REWRITTEN_PATH]
For EACH line in the uncovered file:
1. Search the ENTIRE rewritten file for content with the same meaning
2. If found (even rephrased, reformatted, typo-fixed, summarized) → FALSE POSITIVE, skip it
3. If truly NOT found anywhere → TRUE MISSING, report it
Output ONLY the TRUE MISSING lines, one per line, with format:
TRUE_MISSING: "<exact line from uncovered file>"
If all lines are false positives, output: "ALL COVERED — no true gaps found."
Step 3: Only TRUE_MISSING lines returned by the agents are added to the gaps file as [UNCOVERED]. Delete the .uncovered.tmp file.
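Steps 2 and 3 amount to splitting the candidates into batches and then collecting the agents' `TRUE_MISSING` lines. A minimal sketch, assuming agent replies arrive as plain strings in the output format given in the prompt above; the helper names are hypothetical.

```python
import re

TRUE_MISSING_RE = re.compile(r'^TRUE_MISSING:\s*"(.*)"\s*$')


def batches(lines, size=100):
    """Split the uncovered candidates into batches of at most `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def collect_true_missing(agent_outputs):
    """Gather the quoted lines from TRUE_MISSING entries, without duplicates."""
    found = []
    for output in agent_outputs:
        for line in output.splitlines():
            match = TRUE_MISSING_RE.match(line.strip())
            if match and match.group(1) not in found:
                found.append(match.group(1))
    return found
```

Replies that contain no `TRUE_MISSING:` entries, such as an "ALL COVERED" message, contribute nothing to the result.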
<basename>.gaps.md.
.uncovered.tmp.
Gaps file format:
# Gaps: <filename>
<!-- Delete lines you don't need. Keep lines to apply to rewritten. -->
<!-- Summary: N MISSING, M PARTIAL, K REVERSED, L UNCOVERED -->
## <Section Name>
- [MISSING] `<exact sorted line>`
- [PARTIAL] `<sorted line>` → rewritten: `<rewritten line>` | Lost: <detail>
- [REVERSED] `<sorted line>` → rewritten: `<rewritten line>`
- [UNCOVERED] `<sorted line>`
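To make the format concrete, the following sketch renders a gaps file from a list of findings. The dictionary keys (`section`, `type`, `sorted`, `rewritten`, `lost`) and the function name `render_gaps` are assumptions made for this example only.

```python
from collections import Counter

TYPES = ("MISSING", "PARTIAL", "REVERSED", "UNCOVERED")


def render_gaps(filename, findings):
    """Render findings (a list of dicts) into the gaps file format shown above."""
    counts = Counter(item["type"] for item in findings)
    summary = ", ".join(f"{counts[t]} {t}" for t in TYPES)
    out = [
        f"# Gaps: {filename}",
        "<!-- Delete lines you don't need. Keep lines to apply to rewritten. -->",
        f"<!-- Summary: {summary} -->",
    ]
    # Group by section, keeping the order in which sections first appear.
    sections = {}
    for item in findings:
        sections.setdefault(item["section"], []).append(item)
    for section, items in sections.items():
        out.append(f"## {section}")
        for item in items:
            entry = f"- [{item['type']}] `{item['sorted']}`"
            if item["type"] in ("PARTIAL", "REVERSED"):
                entry += f" → rewritten: `{item['rewritten']}`"
            if item["type"] == "PARTIAL":
                entry += f" | Lost: {item['lost']}"
            out.append(entry)
    return "\n".join(out) + "\n"
```

An empty `findings` list corresponds to the zero-gaps case, where no gaps file needs to be kept.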
IF gaps_count == 0 (no MISSING, PARTIAL, REVERSED, or UNCOVERED items):
→ Delete the empty gaps file.
→ Output: "No gaps found. Skipping to final verification."
→ Jump to Phase 8 (Final Verification).
ELSE → continue to Phase 6.
Output a report:
=== CLEANUP COMPLETE (Phases 1-5) ===
Files:
Backup: <path>.bak (<N> lines)
Sorted: <path> (<N> lines, verified)
Rewritten: <path>.rewritten (<N> lines)
Gaps: <path>.gaps.md (<N> items: X missing, Y partial, Z reversed, W uncovered)
Next: edit .gaps.md, delete what you don't need. Write me when you're ready to continue.
STOP. Do not continue until the user indicates they are ready.
When the user indicates they are ready:
[MISSING] / [UNCOVERED] → insert the original line into the appropriate section in rewritten.
[PARTIAL] → augment the rewritten line with the lost detail.
[REVERSED] → fix the meaning in rewritten.
Verify the final rewritten file against the original backup:
# All URLs from original present in final?
python3 scripts/verify-rewrite.py <file>.bak <basename>.rewritten.<ext>
# Every original line covered in final?
python3 scripts/verify-coverage.py <file>.bak <basename>.rewritten.<ext> /dev/null
`##` headers (which was verified in Phase 2),
then it is acceptable to run a single agent on the ENTIRE uncovered list (instead of batches of 100),
with the instruction "expect mostly false positives, report only truly unique content".
Use this prompt:
You are a final coverage verifier. Lines from the ORIGINAL BACKUP were not fuzzy-matched
in the FINAL rewritten file. Many are FALSE POSITIVES (rephrased, reformatted, reorganized).
IMPORTANT: The rewritten file may contain <details>, <summary> and other HTML elements.
Content inside these tags IS present — search inside them.
CHAT SUMMARIZATION RULE: The rewritten file intentionally summarizes raw chat logs
(timestamped messages like "☀️, [date]") into structured "Key takeaways" sections.
If a chat message's substantive facts (numbers, prices, names, conclusions) appear
in summarized form — it is COVERED, not MISSING. Specifically:
- Timestamps, emoji markers, informal greetings → always FALSE POSITIVE
- Conversational fragments → FALSE POSITIVE
- Back-and-forth debate condensed to conclusion → COVERED
- Specific numbers/facts preserved in summary → COVERED
Only report TRUE_MISSING if the substantive IDEA has no equivalent anywhere.
Files:
- Uncovered candidates: [UNCOVERED_TMP_PATH]
- Final rewritten: [REWRITTEN_PATH]
For EACH line:
1. Search ENTIRE rewritten file for same meaning
2. Found (even rephrased, inside <details>, summarized) → FALSE POSITIVE, skip
3. Truly not found → TRUE MISSING
Output: TRUE_MISSING: "<exact line>" or "ALL COVERED — no true gaps found."
mv <file>.rewritten <file> (the original is replaced; .bak stays as the backup).
Output a final report:
=== CLEANUP REPORT ===
Metrics:
Original: <N> lines
Rewritten: <N> lines (<ratio>% compression)
Gaps found: <N> (X applied, Y dismissed by user)
URLs: <N> original, <M> preserved, <K> missing (user-approved)
Original replaced: yes/no
Backup: <path>.bak
Issues encountered:
- <any verification failures, skipped steps, agent errors, or anomalies>
Fixes (brief, only if issues found):
- <concrete action to resolve each issue>
Recommendations:
- <suggestions for the file or future rewrites>
After Phase 9 Report, offer to split the cleaned file into structured spec + reference files.
"File is focused on a single topic. Recommend: /compact then /clarify <file>"
/plan):
# Split Plan: <filename>
## Output files
### spec-<topic-A>.md (~N lines)
Sections: <list of ## headers going here>
Content: <brief description>
### references-<topic-A>.md (~M lines)
Sections: <list of ## headers going here>
Content: links, research, external refs related to topic A
### spec-<topic-B>.md (~K lines)
...
## Cross-references
- spec-<A>.md → references-<A>.md
- spec-<B>.md → references-<B>.md
<basename>/ (sibling to input file).
spec-<topic-slug>.md: main content (tasks, goals, requirements, decisions).
references-<topic-slug>.md: links, research notes, external refs, raw data.
> References: [references-<topic>.md](references-<topic>.md)
python3 scripts/verify-split.py <original-clean-file> <output-dir>
The script concatenates all .md files in output dir and checks that every line from the original is present (fuzzy match). New lines (navigation headers, cross-references) are OK.
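The split check described above could be sketched as follows. The 0.9 threshold, the whitespace normalization and the name `missing_after_split` are assumptions; the actual `verify-split.py` may match lines differently.

```python
from difflib import SequenceMatcher
from pathlib import Path


def missing_after_split(original_text, output_dir, threshold=0.9):
    """Original lines that appear in none of the .md files in output_dir."""
    combined = []
    for path in sorted(Path(output_dir).glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                combined.append(" ".join(line.split()))
    exact = set(combined)
    missing = []
    for line in original_text.splitlines():
        needle = " ".join(line.split())
        if not needle or needle in exact:
            continue  # blank line, or found verbatim in some output file
        if not any(SequenceMatcher(None, needle, c).ratio() >= threshold for c in combined):
            missing.append(line)
    return missing
```

Because only the original's lines are looked up, extra lines in the output files (navigation headers, cross-references) never cause a failure.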
=== SPLIT COMPLETE ===
Files:
<list of created files with line counts>
Recommend: /clear then /clarify <spec-file.md>
Use /clear (not /compact) — clarify works best with a fresh context window, especially when split produced multiple specs that will each get their own clarify cycle.
The skill must handle files of any size. Scale resources proportionally:
git add <input-files> && git commit -m "pre-cleanup: <filename>" (snapshot before any changes)
cleanup: rewrite <filename>
cleanup: split <filename> into <N> files
The cleanup output is suitable as input for /clarify. The output file MUST:
`##` section headers for all major topics
[MISSING], [PARTIAL], [REVERSED], [UNCOVERED])
spec-*.md file is an independent clarify input
All scripts are in scripts/ (plugin root) or ~/.codex/skills/cleanup/scripts/ (legacy):
| Script | Purpose | Args |
|---|---|---|
| `verify-sort.py` | Superset check: all original lines preserved, new lines OK | `<backup> <sorted>` |
| `verify-rewrite.py` | URL presence check (with normalization) | `<source> <target>` |
| `verify-coverage.py` | Safety net: every line accounted for | `<sorted> <rewritten> <gaps>` |
| `verify-split.py` | Split verification: all lines present across output files | `<original> <output-dir>` |