Optimizes the performance of existing Liger Kernel Triton kernels. Profiles kernels, diagnoses bottlenecks (memory-bound vs compute-bound), generates multiple optimization variants with benchmarking, and applies the best variant while maintaining correctness. Supports GPU architecture-specific optimization (Ampere, Hopper, Blackwell). Use when a user asks to optimize, speed up, tune, profile, or reduce memory of an existing Liger kernel.
Optimizes existing Liger Kernel Triton kernels through a 3-stage pipeline: Profile, Optimize, Finalize. Supports interactive mode (human checkpoints between stages) and autonomous mode (runs end-to-end). NVIDIA GPUs only.
Extract from the user's request:
| Field | Description | Default |
|---|---|---|
| target_kernel | Which kernel to optimize (e.g., "rms_norm", "cross_entropy") | Required |
| optimization_goal | speed / memory / balanced | balanced |
| scope | Specific pass (forward/backward), input regime, or general | general |
| target_gpu | Ampere / Hopper / Blackwell / auto-detect | auto-detect |
| autonomy | interactive / autonomous | interactive |
| max_variants | Max optimization variants to try | 8 |
| target_metric | Optional concrete target (e.g., "forward under 0.3ms at hidden_size=4096") | none |
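The target_gpu auto-detect can be sketched as a mapping from the CUDA compute capability reported by PyTorch to an architecture name. This is an illustrative helper, not part of the skill's actual code; the SM numbers for consumer cards vary, so the mapping below only hedges around the major versions.

```python
def arch_from_capability(capability):
    """Map a (major, minor) CUDA compute capability to an architecture name.

    Illustrative mapping: sm_8x -> Ampere, sm_90 -> Hopper,
    sm_100 and newer -> Blackwell.
    """
    major, _ = capability
    if major == 8:
        return "Ampere"
    if major == 9:
        return "Hopper"
    if major >= 10:
        return "Blackwell"
    raise ValueError(f"Unsupported compute capability: {capability}")

def detect_target_gpu():
    # NVIDIA GPUs only, per the skill description.
    import torch
    return arch_from_capability(torch.cuda.get_device_capability())
```

If the user names an architecture explicitly, that value takes precedence over detection.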
Before starting the pipeline, validate:
- The kernel source exists: `src/liger_kernel/ops/{kernel}.py`
- The benchmark script exists: `benchmark/scripts/benchmark_{kernel}.py`
- The test file exists: `test/transformers/test_{kernel}.py`
- Dev dependencies are installed (`pip install -e ".[dev]"`)

If any validation fails, report clearly and stop.
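The file-existence part of this validation can be sketched as a small helper; the function name and `repo_root` parameter are illustrative, not part of the repository:

```python
from pathlib import Path

# Required files, with {kernel} substituted per request.
REQUIRED_FILES = [
    "src/liger_kernel/ops/{kernel}.py",
    "benchmark/scripts/benchmark_{kernel}.py",
    "test/transformers/test_{kernel}.py",
]

def validate_inputs(kernel, repo_root="."):
    """Return the list of required files that are missing for this kernel."""
    root = Path(repo_root)
    return [
        template.format(kernel=kernel)
        for template in REQUIRED_FILES
        if not (root / template.format(kernel=kernel)).exists()
    ]
```

A non-empty return value means the pipeline should report the missing paths and stop.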
Spawn a Profiler agent (read `profiler.md`).
The agent:
- Sets up the workspace directory `optimization/{kernel}/`
- Profiles the kernel (with NVIDIA Nsight Compute, if `ncu` is available)
- Writes the optimization profile → `optimization/{kernel}/profile.md`

**Human checkpoint (interactive mode):** Present the optimization profile with bottleneck diagnosis and proposed strategy order. Confirm before proceeding.
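The memory-bound vs compute-bound diagnosis can be sketched with a simple roofline test: compare the kernel's arithmetic intensity against the machine balance point. The function and the peak numbers in the test are illustrative, not measured values from this repository.

```python
def diagnose_bound(flops, bytes_moved, peak_tflops, peak_bw_gbs):
    """Classify a kernel as memory- or compute-bound via a roofline test.

    Arithmetic intensity (FLOPs per byte) below the machine balance point
    means the kernel cannot saturate the ALUs and is memory-bound.
    """
    intensity = flops / bytes_moved                     # FLOP/byte
    ridge = (peak_tflops * 1e12) / (peak_bw_gbs * 1e9)  # machine balance, FLOP/byte
    return "memory-bound" if intensity < ridge else "compute-bound"
```

Elementwise and normalization kernels typically land well below the ridge point, which is why memory-oriented strategies (vectorized loads, fewer passes over the data) usually lead for them.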
Spawn an Optimizer agent (read `optimizer.md`).
The agent runs an autonomous optimization loop:
a. Write the variant kernel → `optimization/{kernel}/{kernel}_vN.py`
b. Write the variant lab notebook → `optimization/{kernel}/{kernel}_vN_notes.md`
c. Run a quick smoke test (single shape, float32, forward+backward) → discard on failure
d. Run the full existing benchmark script → `optimization/{kernel}/benchmarks/vN_results.csv`
e. Check guardrails (no catastrophic regressions)
f. Update the variant notes with actual results

**Human checkpoint (interactive mode):** Present the comparison table across all variants. The user approves the winner (or the skill picks the best variant if autonomous).
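The optimization loop above can be sketched as follows. The `smoke_test`, `run_benchmark`, and `passes_guardrails` callables are placeholders standing in for the real scripts the skill invokes; "lower is better" timing in milliseconds is assumed.

```python
def optimization_loop(variants, smoke_test, run_benchmark, passes_guardrails,
                      max_variants=8):
    """Try up to max_variants, keep survivors, return the fastest one (or None)."""
    results = {}
    for name, variant in list(variants.items())[:max_variants]:
        if not smoke_test(variant):          # step c: discard on smoke failure
            continue
        timing_ms = run_benchmark(variant)   # step d: full benchmark
        if not passes_guardrails(timing_ms): # step e: reject regressions
            continue
        results[name] = timing_ms            # step f: record actual results
    return min(results, key=results.get) if results else None
```

In autonomous mode the return value is applied directly; in interactive mode it is only the proposed winner presented at the checkpoint.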
Spawn a Finalizer agent (read `finalizer.md`).
The agent:
- Applies the winning variant to `src/liger_kernel/ops/{kernel}.py`
- Runs `python -m pytest test/transformers/test_{kernel}.py -xvs` (hard gate)
- Runs `make checkstyle` (auto-fix with `ruff check . --fix && ruff format .`)
- Regenerates comparison plots with `benchmarks_visualizer.py`
- Writes the final report → `optimization/{kernel}/report.md`

**Human checkpoint (interactive mode):** Present the final report with before/after numbers, comparison plots, and test results.
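The hard gates above reduce to "run a command, proceed only on exit code 0", which can be sketched with a minimal helper (the function name is illustrative):

```python
import subprocess
import sys

def run_gate(cmd, cwd=None):
    """Run one hard gate command; return True only on exit code 0."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0

# Hypothetical usage for the pytest gate (kernel name filled in by the skill):
# run_gate([sys.executable, "-m", "pytest",
#           "test/transformers/test_rms_norm.py", "-xvs"])
```

If the pytest gate returns False, the winner must not be applied; the checkstyle gate instead triggers the ruff auto-fix and one retry.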
These apply to EVERY variant, regardless of mode:
| Guardrail | Threshold | Action |
|---|---|---|
| Non-target metric regression | >5% worse | Reject variant |
| Cross-pass regression | >10% worse on one pass to marginally improve the other | Reject variant |
| Smoke test failure | Any correctness failure | Discard variant immediately |
| Full test suite failure | Any | Do NOT apply winner, report failure, stop |
| Checkstyle failure | Any | Auto-fix with ruff, retry once |
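The per-metric regression guardrails can be sketched as a check over baseline-vs-variant measurements. The metric names and the dict-based interface are illustrative; lower is assumed better for every metric, and the cross-pass 10% rule is omitted here since it needs pass-pairing context.

```python
def check_guardrails(baseline, variant, target_metric):
    """Apply the >5% non-target regression guardrail.

    `baseline` and `variant` map metric names (e.g., "forward_ms",
    "backward_ms", "peak_mem_mb") to measured values where lower is better.
    Returns (ok, reasons); a non-empty reasons list means "reject variant".
    """
    reasons = []
    for metric, base in baseline.items():
        ratio = variant[metric] / base
        if metric != target_metric and ratio > 1.05:
            reasons.append(f"{metric} regressed {100 * (ratio - 1):.1f}%")
    return (not reasons, reasons)
```

Because these checks run on every variant in every mode, a variant that wins its target metric can still be rejected for collateral damage elsewhere.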