Rigorous mathematical explainer for a single concept, step, or equation. Use whenever the user asks to "explain mathematically", "show the derivation", "rewrite that with math", or whenever a previous explanation felt vague about a formula. Also invoke when paper-reader, pipeline-walk, or contrib-extract needs to introduce an equation — this skill is the canonical gate every formula must pass through. Trigger aggressively whenever a user pushes back on a prior explanation with "be more mathematical", "derive it", "where does this come from", or similar.
Most formula explanations skip steps. The reader is forced to guess why a specialized equation was written down, what the symbols mean, which term drives the behavior, and when the formula breaks. This skill removes that guessing. Every formula that passes through this skill is introduced from its canonical parent, defined completely inline, dissected term by term, bounded by explicit assumptions, and — if it is an approximation — restored to its exact form so the reader sees what was dropped.
Every variable, symbol, and equation — inline or display, including everything inside where … is … blocks and Diff/Insight lines — is written in LaTeX: $x$ inline, $$…$$ for display. Backticks are for file paths and code identifiers, never for math. Backticked math does not render and breaks downstream tools.
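The backtick rule above can be checked mechanically. The following is a minimal sketch, not part of this skill: the function name `find_backticked_math` and the list of "math-looking" hints are hypothetical choices made for illustration, and the heuristic will miss some cases and flag others wrongly.

```python
import re

# Heuristic (an assumption for this sketch): Greek letters, common math
# operators, or a handful of LaTeX commands inside backticks suggest math.
MATH_HINTS = re.compile(
    r"[α-ωΑ-Ω∇∑∏∫≤≥≪≫≈←→∂√∞]"
    r"|\\(?:theta|eta|nabla|alpha|beta|lambda|epsilon|varepsilon|frac|sum)\b"
)
# A single-line span delimited by backticks.
BACKTICK_SPAN = re.compile(r"`([^`\n]+)`")


def find_backticked_math(text: str) -> list[str]:
    """Return backticked spans that look like math and should be LaTeX."""
    return [
        m.group(1)
        for m in BACKTICK_SPAN.finditer(text)
        if MATH_HINTS.search(m.group(1))
    ]
```

File paths and plain identifiers in backticks pass untouched; only spans that match one of the hints are reported, so a clean run is a weak signal rather than a guarantee.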
Before you write an equation, load this checklist into your head. After you write it, verify all five items are satisfied.
Every equation is followed by a where ... is ..., ... is ... block defining every symbol that appears in it. Re-define symbols even if they were defined earlier in the document; the reader must never flip back.

If the formula is an approximation, state when it holds and what was dropped (e.g., "valid for $\varepsilon \ll 1$; the dropped $O(\varepsilon^2)$ term biases the result upward"). If the formula is exact, say "exact — no approximation".

Never drop a specialized equation from the sky. The reader's trust comes from seeing the specialized form emerge from something they already know.
Pattern:
If you ever find yourself writing "the paper states equation X is ..." without doing steps 1–3, stop and restart.
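As an illustration of a specialized form emerging from a parent the reader already knows, here is a sketch for the mean-squared-error loss. The Gaussian-noise assumption and the symbols $f_\theta$ and $\sigma$ are introduced for this example only and are not taken from any particular paper. The parent is the negative log-likelihood

$$ \mathcal{L}(\theta) = -\sum_{i=1}^{N} \log p(y_i \mid x_i; \theta) $$

where $\theta$ is the parameter vector, $N$ is the number of training pairs, $(x_i, y_i)$ is the $i$-th input-target pair, and $p(y_i \mid x_i; \theta)$ is the model's conditional density of the target given the input.

Specializing to $p(y_i \mid x_i; \theta) = \mathcal{N}(y_i;\, f_\theta(x_i),\, \sigma^2)$ with scalar targets gives

$$ \mathcal{L}(\theta) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl(y_i - f_\theta(x_i)\bigr)^2 + \frac{N}{2} \log\bigl(2\pi\sigma^2\bigr) $$

where $\theta$ is the parameter vector, $N$ is the number of training pairs, $y_i$ is the $i$-th scalar target, $f_\theta(x_i)$ is the model's prediction for input $x_i$, and $\sigma^2 > 0$ is the noise variance. With $\sigma^2$ held fixed, the second term does not depend on $\theta$, so minimizing $\mathcal{L}(\theta)$ is the same as minimizing the sum of squared errors. This step is exact, no approximation; the only assumption is the Gaussian noise model.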
Non-universal concepts need their own derivation chain. If the paper uses a tool or formalism that is itself specialized (e.g., Sparse Autoencoder, normalizing flow, Riemannian gradient, score matching), do not assume the reader knows it. Before you can use that tool as a building block, you must first derive or define it from something the reader does know. For example: if the paper uses a Top-K Sparse Autoencoder, the derivation chain is: standard autoencoder (reader knows this) → sparse autoencoder (add sparsity penalty, explain why) → Top-K SAE (replace penalty with hard Top-K gate, explain why). Only after this chain is complete can you use "Top-K SAE" as a known building block in subsequent formulas. Skipping intermediate links — e.g., jumping straight to "Top-K SAE" — is a concept-prerequisite gap (see zero-jump-check).
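The Top-K chain above might be written out as follows. This is a sketch in one common formulation; individual papers differ in details such as bias placement, so the symbols here are assumptions for illustration. Start from the standard autoencoder:

$$ z = \mathrm{ReLU}(W_e x + b_e), \qquad \hat{x} = W_d z + b_d, \qquad L_{\mathrm{AE}} = \lVert x - \hat{x} \rVert_2^2 $$

where $x \in \mathbb{R}^d$ is the input, $W_e \in \mathbb{R}^{m \times d}$ and $b_e \in \mathbb{R}^m$ are the encoder weights and bias, $z \in \mathbb{R}^m$ is the code, $W_d \in \mathbb{R}^{d \times m}$ and $b_d \in \mathbb{R}^d$ are the decoder weights and bias, and $\hat{x} \in \mathbb{R}^d$ is the reconstruction.

Adding a sparsity penalty gives the sparse autoencoder:

$$ L_{\mathrm{SAE}} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert z \rVert_1 $$

where $x$ is the input, $\hat{x}$ is the reconstruction, $z$ is the code, and $\lambda \ge 0$ weights the $\ell_1$ penalty. The penalty pushes most entries of $z$ toward zero, so that each input is explained by a few active entries.

Replacing the penalty with a hard gate gives the Top-K form:

$$ z = \mathrm{TopK}_k(W_e x + b_e), \qquad L_{\mathrm{TopK}} = \lVert x - \hat{x} \rVert_2^2 $$

where $x$ is the input, $W_e$ and $b_e$ are the encoder weights and bias, $\mathrm{TopK}_k$ keeps the $k$ largest entries of its argument and sets the rest to zero, $z$ is the resulting code with at most $k$ nonzero entries, and $\hat{x} = W_d z + b_d$ is the reconstruction. The gate fixes the number of active entries directly, so there is no $\lambda$ to tune and no $\ell_1$ term shrinking the surviving activations.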
Paper's original equations must be cited. When the paper numbers an equation (e.g., Eq. 3, Eq. 7), the output must reference that number so the reader can cross-check against the original paper. Write "the paper's Eq. (3)" or similar after deriving the form. If the paper does not number its equations, reference by section and position (e.g., "the loss in Section 3.1, second display equation"). The reader must always be able to locate the original.
Do NOT front-load a preliminaries section full of equations. Introduce each formula at the exact narrative moment it is needed. If you need gradient descent, write it at the step that does the gradient descent — not in a "background" block at the top.
Every equation is followed by its own where block. Repetition across the document is expected and desired. The reader must never scroll up to recover a definition.
Good:
The update rule is
$$ \theta \leftarrow \theta - \eta \, \nabla L(\theta) $$
where $\theta$ is the current parameter vector, $\eta > 0$ is the learning rate, and $\nabla L(\theta)$ is the gradient of the loss evaluated at $\theta$.
Bad (symbol $\eta$ introduced ten pages ago, no re-definition; or symbols written in backticks, which do not render):
The update rule is
`θ ← θ − η ∇L(θ)`.
Every index, subscript, weight, and constant is an invitation. Answer each invitation. "What does $M$ stand for? Why is $w_i$ designed this way? Why is the sum over $j$ and not $i$?" If you leave one of these unanswered, the reader has to invent the answer themselves, and they will invent wrong.
After writing the explanation, re-read it and test each of the five checklist items explicitly. If any item fails, expand the explanation on the spot — don't leave it for the reader to notice.
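For the approximation case, a passing explanation might look like the following sketch, which uses the textbook expansion of the logarithm as an assumed example:

$$ \log(1 + \varepsilon) \approx \varepsilon $$

where $\varepsilon$ is a real number with $|\varepsilon| \ll 1$. The exact form is

$$ \log(1 + \varepsilon) = \varepsilon - \frac{\varepsilon^2}{2} + O(\varepsilon^3) $$

where $\varepsilon$ is the same small real number and $O(\varepsilon^3)$ collects all terms of order $\varepsilon^3$ and higher. The approximation is valid for $|\varepsilon| \ll 1$; the dropped $-\varepsilon^2/2$ term is negative, so the approximation overestimates the exact value. For $\varepsilon = 0.1$ the approximation gives $0.1$ while the exact value is about $0.0953$.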
If paper-reader, contrib-extract, or pipeline-walk loads this skill mid-run, apply the checklist to whichever formula is currently being introduced in the chunk. The caller is responsible for deciding which formula; your job is to make that single formula bulletproof.