This skill should be used when the user asks to "optimize context", "reduce token costs", "improve context efficiency", "implement KV-cache optimization", "partition context", or mentions context limits, observation masking, context budgeting, or extending effective context capacity.
Context optimization extends the effective capacity of limited context windows through strategic compression, masking, caching, and partitioning. The goal is not to magically increase context windows but to make better use of available capacity. Done well, optimization can double or triple usable context capacity without requiring larger models or longer contexts.
Activate this skill when:
Building production systems at scale
Context optimization extends effective capacity through four primary strategies: compaction (summarizing context near limits), observation masking (replacing verbose outputs with references), KV-cache optimization (reusing cached computations), and context partitioning (splitting work across isolated contexts).
The key insight is that context quality matters more than quantity. Optimization preserves signal while reducing noise.
Compaction is the practice of summarizing context contents when approaching limits, then reinitializing a new context window with the summary.
Compress in priority order: tool outputs first (replace with summaries), then old turns (summarize early conversation), then retrieved docs (summarize if recent versions exist). Never compress the system prompt.
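The priority order above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: `Segment`, `count_tokens`, `summarize`, and `compact` are hypothetical names, the token count is a crude whitespace estimate, and `summarize` is a placeholder for what would normally be a model call.

```python
from dataclasses import dataclass

# Compression order taken from the text: tool outputs, then old turns,
# then retrieved docs. "system_prompt" is deliberately absent.
COMPRESSION_ORDER = ("tool_output", "old_turn", "retrieved_doc")


@dataclass
class Segment:
    kind: str  # e.g. "system_prompt", "tool_output", "old_turn", "retrieved_doc"
    text: str


def count_tokens(text: str) -> int:
    """Crude whitespace estimate (assumption); a real system would use the model tokenizer."""
    return len(text.split())


def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder summarizer (assumption): keeps the first few words."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [summarized]"


def compact(segments: list[Segment], budget: int) -> list[Segment]:
    """Summarize segments in priority order until the total fits the budget."""
    result = list(segments)
    for kind in COMPRESSION_ORDER:
        for i, seg in enumerate(result):
            if sum(count_tokens(s.text) for s in result) <= budget:
                return result  # stop as soon as the context fits
            if seg.kind == kind:
                result[i] = Segment(seg.kind, summarize(seg.text))
    return result
```

Because the loop stops as soon as the budget is met, lower-priority segments such as old turns stay verbatim whenever summarizing tool outputs alone is enough.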
Tool outputs can comprise 80%+ of token usage in agent trajectories. Observation masking replaces verbose tool outputs with compact references.
Never mask: observations critical to the current task, observations from the most recent turn, or observations used in active reasoning.
Consider masking: observations from 3+ turns ago, verbose outputs whose key points can be extracted, and observations whose purpose has been served.
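One way to apply these masking rules is sketched below. The names `Observation`, `mask_observations`, and the `critical` flag are hypothetical, and the 200-character threshold for "verbose" is an arbitrary assumption. How a real agent decides that an observation is still critical is outside the scope of this sketch.

```python
from dataclasses import dataclass

MASK_AFTER_TURNS = 3  # "3+ turns ago" from the guidance above


@dataclass
class Observation:
    turn: int
    content: str
    critical: bool = False  # still needed for the current task or active reasoning


def mask_observations(observations, current_turn, store, min_chars=200):
    """Return observation texts with old, verbose ones replaced by compact references."""
    masked = []
    for idx, obs in enumerate(observations):
        age = current_turn - obs.turn
        # Keep anything critical, recent (including the latest turn), or already short.
        keep = obs.critical or age < MASK_AFTER_TURNS or len(obs.content) < min_chars
        if keep:
            masked.append(obs.content)
        else:
            ref = f"obs-{idx}"
            store[ref] = obs.content  # full text stays retrievable by reference
            masked.append(
                f"[masked observation {ref}: {len(obs.content)} chars, turn {obs.turn}]"
            )
    return masked
```

Keeping the full text in `store` means a masked observation can be restored if a later step turns out to need it.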
Prefix caching reuses KV blocks across requests with identical prefixes using hash-based block matching.
Optimize by placing stable elements first (system prompt, tool definitions), then frequently reused elements, then unique elements last.
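The toy model below illustrates why ordering matters for prefix caching. It is a sketch under stated assumptions, not the behavior of any particular inference engine: `block_hashes`, `cached_blocks`, `build_prompt`, and the block size of 4 are all hypothetical, and tokens are plain strings. Each block is hashed together with the hash of everything before it, so a block only matches when the entire prefix up to that point is identical.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block; illustrative value only


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block chained with the hash of the preceding prefix."""
    hashes = []
    prev = ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256((prev + "|" + " ".join(block)).encode()).hexdigest()
        hashes.append(digest)
        prev = digest
    return hashes


def cached_blocks(request_tokens, cache):
    """Count leading blocks already in the cache, then record this request's blocks."""
    hashes = block_hashes(request_tokens)
    hits = 0
    for h in hashes:
        if h not in cache:
            break  # the first miss ends the reusable prefix
        hits += 1
    cache.update(hashes)
    return hits


def build_prompt(system_prompt, tool_definitions, history, user_message):
    """Stable elements first, frequently reused next, unique elements last."""
    return system_prompt + tool_definitions + history + user_message
```

In this model, two requests that share the same system prompt and tool definitions reuse every block covering that shared prefix, while a request that puts unique content first misses on the very first block and reuses nothing.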
The most aggressive form of context optimization is partitioning work across sub-agents with isolated contexts.
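A minimal sketch of the idea, assuming a coordinator that hands each task to a fresh sub-agent: `SubAgent` and `coordinate` are hypothetical names, and `run` fakes the model call by appending placeholder steps. The point it illustrates is that verbose working state stays in each sub-agent's private context while the coordinator only receives a short result.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    context: list = field(default_factory=list)  # private to this agent

    def run(self, task: str) -> str:
        # Stand-in for a model call (assumption): verbose working state
        # accumulates here and only a condensed result is returned.
        self.context.append(f"task: {task}")
        self.context.extend(f"intermediate step {i} for {task}" for i in range(3))
        return f"{self.name} finished: {task}"


def coordinate(tasks):
    """Run each task in a fresh sub-agent and keep only the condensed results."""
    coordinator_context = []
    agents = []
    for i, task in enumerate(tasks):
        agent = SubAgent(name=f"agent-{i}")
        coordinator_context.append(agent.run(task))
        agents.append(agent)
    return coordinator_context, agents
```

Here the coordinator's context grows by one line per task regardless of how much intermediate output each sub-agent produced.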
Implement graceful degradation for edge cases
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0