Write structured research paper summaries in a background → problem → method → results format. Use when the user asks to summarize a paper, add a paper entry to a reading list/awesome list, or explain a research method. Triggers on phrases like "summarize this paper", "add this paper to <file>", "write up <paper name>", or when given an arxiv/conference URL with instructions to integrate it into notes.
Write compact, technically substantive paper entries that a reader can skim in ~30 seconds and still walk away with (a) what existed before, (b) what was broken, (c) how the method fixes it, and (d) whether it worked.
Produce four blocks in this order. Each block has a fixed role — do not reorder or merge them. Use a markdown bullet hierarchy (paper title at top level, subsections as nested bullets).
- <Paper Title> [[Venue'YY/MM](url)]
- Background: <existing techniques and their limitations>
- Key problem & insight: <what's broken and the core insight that fixes it>
- Proposed method — <name> with N components:
1. **<Component 1 name>**: <one-line idea>. <high-level mechanism in 1-3 lines>
2. **<Component 2 name>**: ...
3. **<Component 3 name>**: ...
- Results: <headline numbers, vs. which baselines>
Use $...$ for short formulas. Only use $$...$$ if the formula is longer than a line and central to the method.
To fetch the paper, read the HTML version (arxiv.org/html/<id>) or download it via curl and use Read with pages:
- CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving [[FAST'26](https://www.usenix.org/system/files/fast26-liu-yang.pdf)]
- Background:
- PDC (Position-Dependent Caching): KV tied to absolute positions, reuse only on exact prefix matches
- PIC (Position-Independent Caching): strips position encoding, reusable anywhere but loses attention fidelity
- RoPE: high positional sensitivity — any shift invalidates cached keys
- CoPE: content-gated position encoding, less sensitive to shifts
- Key problem & insight: agent prompts have reusable segments that maintain consistent *relative* ordering despite absolute shifts (RPDC pattern). PDC/PIC don't exploit this; CoPE can, if positions are locked to a learned template rather than recomputed from live context
- Proposed method — CacheSlide with three components:
1. **CCPE (Chunked Contextual Position Encoding)**: pretrain a template $e^*$ of the most frequent CoPE encoding per task; at inference, reuse chunks get positions from $e^*[i]$ (pinned), recompute chunks get live CoPE
2. **WCA (Weighted Correction Attention)**: token-level gate on top of CCPE — rank tokens in a reuse chunk by $d_i = \|K^{\text{new}}_i - K^{\text{cache}}_i\|$, top-k (~5-17%) get blended $K_i \leftarrow \alpha K^{\text{new}}_i + (1-\alpha) K^{\text{cache}}_i$, rest use cache as-is, applied every $\tau$ layers
3. **SLIDE (KV cache manager)**: make WCA's I/O pattern SSD-friendly — relocate updated tokens to fresh pages (sequential writes), spill clean pages first, reclaim scratch pages during decode
- Results: 3.1-4.3× latency reduction, 3.5-5.8× throughput improvement over state-of-the-art baselines
This example shows all four blocks at the target length — use it as a template, not a script.
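When a method bullet centers on a small algorithm, as the WCA bullet above does, a short sketch can accompany the entry. Here is a minimal NumPy sketch of that token-level gate — rank tokens in a reuse chunk by key drift, blend only the top fraction. The function name, the defaults `top_frac=0.1` and `alpha=0.5`, and the single-chunk scope are illustrative assumptions, not from the paper; the real mechanism is also applied only every $\tau$ layers.

```python
import numpy as np

def wca_blend(k_new, k_cache, top_frac=0.1, alpha=0.5):
    """Blend cached and recomputed keys for the most-drifted tokens.

    k_new, k_cache: (num_tokens, head_dim) key matrices for one reuse
    chunk. Names and defaults are illustrative, not from the paper.
    """
    # Per-token drift d_i = ||K_new_i - K_cache_i||
    d = np.linalg.norm(k_new - k_cache, axis=-1)
    k = max(1, int(round(top_frac * d.shape[0])))
    top = np.argsort(d)[-k:]  # indices of the k most-drifted tokens
    k_out = k_cache.copy()
    # Top-k tokens get a blended key; all other tokens reuse the cache as-is
    k_out[top] = alpha * k_new[top] + (1 - alpha) * k_cache[top]
    return k_out
```

Keep such sketches out of the entry itself; the four blocks stay prose-only.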