# How to write and tune AI system prompts for Maya's 3-tier LLM backend
Maya's responses come from one of three backends, detected at runtime by `src/lib/ai/detect.ts`:

- Gemini Nano (on-device)
- WebLLM (in-browser)
- Anthropic API (server)
The same prompt logic must work across all three. Write for the weakest model first (Nano), then add richness for stronger backends.
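The detection order can be sketched as follows. This is a hypothetical illustration of the cascade, not the actual contents of `detect.ts`; the type names and capability checks are assumptions.

```typescript
// Hypothetical sketch of the backend cascade (real logic: src/lib/ai/detect.ts).
type Backend = "gemini-nano" | "webllm" | "anthropic-api";

interface Capabilities {
  hasNano: boolean;   // an on-device model is available
  hasWebGPU: boolean; // WebLLM requires WebGPU
}

function detectBackend(caps: Capabilities): Backend {
  if (caps.hasNano) return "gemini-nano"; // cheapest: fully on-device
  if (caps.hasWebGPU) return "webllm";    // in-browser fallback
  return "anthropic-api";                 // server-side last resort
}
```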
## The `||COMPLETE||` token

This is the single most important thing in the prompt system. When Maya's response contains `||COMPLETE||`, the game engine treats the submission as correct and awards XP. If this token is missing, the submission is treated as wrong.
- CORRECT response: "oh thank god. the output matches. we're through. ||COMPLETE||"
- WRONG response: "that looks right! well done!" (no token, so the game thinks it failed)
Every code evaluation instruction must include the ||COMPLETE|| rule. If you're editing prompts and you accidentally remove it, the game breaks.
## The local engine (`src/lib/ai/engine.ts`)

Before a message reaches any LLM, it goes through the local engine — a deterministic response system keyed by step ID. Each step has its own bank of intro messages, FAQ patterns (keyword-matched), and code evaluation patterns.
```ts
// Engine banks are keyed by step ID
const banks: Record<string, StepBank> = {
  "chapter-01:scaffold": ch01ScaffoldBank,
  "chapter-01:transmit": ch01TransmitBank,
  // ...
};
```
The engine handles ~80% of interactions without LLM calls. It falls through to the LLM only when no pattern matches.
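The fall-through contract can be sketched like this: a `null` return means "no pattern matched, call the LLM". The `StepBank` shape and bank contents below are illustrative assumptions, not the real definitions in `engine.ts`.

```typescript
// Illustrative StepBank shape and keyword-matched lookup.
interface FaqPattern { keywords: string[]; reply: string }
interface StepBank { intro: string; faq: FaqPattern[] }

const banks: Record<string, StepBank> = {
  "chapter-01:scaffold": {
    intro: "ok. let's get a file on disk first.",
    faq: [
      { keywords: ["package", "main"], reply: "every go file starts with a package clause." },
    ],
  },
};

function engineReply(stepId: string, message: string): string | null {
  const bank = banks[stepId];
  if (!bank) return null;
  const lower = message.toLowerCase();
  const hit = bank.faq.find((p) => p.keywords.some((k) => lower.includes(k)));
  return hit ? hit.reply : null; // null => fall through to the LLM
}
```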
When adding a new challenge step, you must also add a corresponding `StepBank` in `engine.ts` with:
- `intro` — first message when the step loads
- `faq` — keyword-matched responses for common questions (ordered by specificity — more specific patterns first)
- `code` — code evaluation patterns that check for expected constructs and return appropriate responses

## System prompts (`src/lib/ai/prompts.ts`)

System prompts are built dynamically per-challenge and per-step. The builder lives in `src/lib/ai/prompts.ts`.
The prompt must explain how to evaluate `[CODE]` blocks, and it must include the `||COMPLETE||` token rule.

Keep prompts lean. Every token adds latency, especially on-device.
| Backend | Max system prompt | Max response |
|---|---|---|
| Gemini Nano | ~500 tokens | ~100 tokens |
| WebLLM | ~800 tokens | ~150 tokens |
| Anthropic API | ~1200 tokens | ~180 tokens |
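A budget check against the table above might look like the sketch below. The map values come from the table; the tokens-per-character heuristic and the helper names are assumptions.

```typescript
// Per-backend system prompt budgets, from the table above.
type Backend = "gemini-nano" | "webllm" | "anthropic-api";

const PROMPT_BUDGET: Record<Backend, number> = {
  "gemini-nano": 500,
  "webllm": 800,
  "anthropic-api": 1200,
};

// Rough heuristic: ~4 characters per token (assumption, not exact).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsBudget(prompt: string, backend: Backend): boolean {
  return estimateTokens(prompt) <= PROMPT_BUDGET[backend];
}
```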
The prompt changes based on game state. These sections are conditional — only include the active ones: `inRush`, `powerCut`, and `energyState` (the flags the builder receives).
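Appending the conditional sections might look like this. The flag names come from the `buildMayaSystemPrompt` signature in this document; the section texts are invented placeholders.

```typescript
// Sketch: only active game-state sections are appended to the prompt.
// Section wording here is illustrative, not the real prompt text.
interface GameState { inRush: boolean; powerCut: boolean; energyState: string }

function conditionalSections(state: GameState): string[] {
  const parts: string[] = [];
  if (state.inRush) parts.push("RUSH: keep replies to one short sentence.");
  if (state.powerCut) parts.push("POWER CUT: the lights are out; Maya is tense.");
  if (state.energyState === "low") parts.push("ENERGY LOW: Maya is exhausted and terse.");
  return parts; // joined into the system prompt only when active
}
```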
Code submissions arrive prefixed with `[CODE]\n`. The prompt must tell Maya to evaluate these differently from chat messages.

## Wrong-answer feedback

When players submit wrong code and keep failing, they drop off. Every wrong-answer response must follow these rules:
Always show the expected output format in error messages. Don't say "not right, check your loop." Say "not quite — each line should be just the number: 1, 2, 3... up to 10. no labels yet." The player must understand what "correct" looks like.
Diagnose the specific mistake. Use outputPatterns to detect common wrong outputs and give targeted feedback:
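A couple of `outputPatterns` entries might look like the sketch below. The entry shape and replies are illustrative assumptions; only the idea of matching common wrong outputs (such as a player printing `fmt.Println(i, "DENY")` too early) comes from this document.

```typescript
// Illustrative outputPattern entries (the real shape lives in engine.ts).
interface OutputPattern { test: (out: string) => boolean; reply: string }

const outputPatterns: OutputPattern[] = [
  {
    // player printed labels alongside numbers, e.g. via fmt.Println(i, "DENY")
    test: (out) => /\d+\s+(ALLOW|DENY)/.test(out),
    reply: "hold on. this step is just the numbers: 1, 2, 3... up to 10. no labels yet.",
  },
  {
    test: (out) => out.trim().split("\n").length !== 10,
    reply: "not quite. i need exactly 10 lines, the numbers 1 through 10.",
  },
];

function diagnose(out: string): string | null {
  const hit = outputPatterns.find((p) => p.test(out));
  return hit ? hit.reply : null; // null => fall back to genericWrong
}
```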
`genericWrong` is the last resort, not the norm. Write enough `outputPatterns` and `codePatterns` to catch common mistakes. If a player is seeing `genericWrong` repeatedly, you haven't written enough patterns. Aim for 3-5 output patterns per step.
`genericWrong` messages must still show the format. Even fallback messages should include the expected output shape: "the output should be exactly 10 lines: the numbers 1 through 10. nothing else on each line."
Order output patterns from most specific to least specific. The engine returns the first match, so "labels without numbers" should come before "has some labels" which should come before the catch-all "you're printing but not classifying."
Anticipate premature solutions. When multi-step challenges build on each other (loop → classify → rewrite), add an output pattern that detects when the player jumps ahead (e.g. prints labels in the loop step) and explicitly tells them to hold off.
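First-match ordering, including a jump-ahead detector, can be sketched like this. The patterns and replies are illustrative, not real entries from the codebase.

```typescript
// Most specific first: the engine returns the first pattern that matches.
const ordered: Array<[RegExp, string]> = [
  // labels with no numbers at all
  [/^(ALLOW|DENY)(\n(ALLOW|DENY))*$/, "labels but no numbers: keep the number on each line too."],
  // jumped ahead: classifying during the loop-only step
  [/ALLOW|DENY/, "you're classifying already. hold off, this step is just the loop."],
  // catch-all for numeric output in the wrong shape
  [/\d/, "you're printing, but not in the expected format yet."],
];

function firstMatch(out: string): string | null {
  for (const [re, reply] of ordered) {
    if (re.test(out.trim())) return reply;
  }
  return null;
}
```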
## Memory jolts (`src/lib/game/zen.ts`)

After each successful submission, the zen system (`src/lib/game/zen.ts`) analyzes the code and delivers a "memory jolt" — Maya recovering her CS knowledge. This is a separate system from the LLM prompts; it fires deterministically based on code heuristics.
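One deterministic heuristic might look like the sketch below. The rule, the `Jolt` shape, and the jolt text are illustrative assumptions; the real rules live in `src/lib/game/zen.ts`.

```typescript
// Sketch: a code heuristic that fires a jolt when a counted Go loop appears.
interface Jolt { topic: string; text: string }

function analyzeForJolt(code: string): Jolt | null {
  // matches a Go-style counted loop, e.g. "for i := 1; i <= 10; i++"
  if (/for\s+\w+\s*:=\s*\d+;.*;.*\+\+/.test(code)) {
    return {
      topic: "loops",
      text: "...wait. something just... clicked. counted loops. i knew this.",
    };
  }
  return null; // no heuristic matched: no jolt this time
}
```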
Narrative voice for jolts: "...wait. something just... clicked." or "the fog is lifting."

When writing jolt text, and suggestion text for when the code didn't follow the rule, keep this same fragmented-recall voice. A suggestion reads like: "something's trying to come back..."

## API reference

- `buildMayaSystemPrompt({ challenge, step, inRush, powerCut, energyState })` — builds the system prompt. Takes both `Challenge` and `ChallengeStep` so it can reference the step-specific brief and title.
- `callMaya(history, challenge, step, userMessage, isCode, inRush, powerCut, energyState)` — main entry point. Routes to Gemini Nano or the server API.
- `callMayaEngine(stepId, userMessage, isCode, turnIndex, inRush)` — local engine. Returns a response or `null` (fall through to the LLM).

The prototype's prompt builder is in `inspo/inspo.jsx` lines 188-211 (the `sysPrompt` function). It's simple but effective — don't over-engineer beyond what it does unless there's a clear need.
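The engine-first call flow implied by `callMayaEngine` returning `null` can be sketched as follows. This is a simplified synchronous stand-in; the real `callMaya` is richer (history, challenge, step, game-state flags) and the parameter shape here is an assumption.

```typescript
// Simplified stand-in for the engine-first routing in callMaya.
function respond(
  stepId: string,
  userMessage: string,
  callEngine: (id: string, msg: string) => string | null,
  callLLM: (msg: string) => string,
): string {
  const local = callEngine(stepId, userMessage); // deterministic, no LLM cost
  if (local !== null) return local;              // ~80% of interactions end here
  return callLLM(userMessage);                   // LLM only on fall-through
}
```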