Skill: Add or Update a Pallas Kernel

This is a specialization of .agents/skills/agent-research/SKILL.md.

Use agent-research for the generic research lifecycle (branching, issue/logbook cadence, snapshot/tag discipline, and reporting). This skill adds kernel-specific standards for:

numerics and gradient safety,
backend/fallback API design,
TPU/GPU performance diagnosis,
block-size autotuning and tuned-table outputs.

How to apply this skill

Load and follow .agents/skills/agent-research/SKILL.md first.
Apply the additional kernel rules in this document.
Keep shared process details in agent-research; keep this file focused on kernel-specific constraints.

Kernel Deliverables

For a kernel K, produce:

A readable vanilla JAX reference implementation with the target public API.
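
As an illustration of what such a reference might look like, here is a minimal sketch. It assumes, purely for the sake of example, that the kernel K is an RMSNorm; the name `rmsnorm_reference`, its signature, and the `loss_fn` helper are hypothetical and are not prescribed by this skill. The point is that the reference stays plain `jax.numpy`, accumulates in float32, and can be differentiated directly so that a later Pallas implementation has values and gradients to be compared against.

```python
import jax
import jax.numpy as jnp


def rmsnorm_reference(x: jax.Array, weight: jax.Array, *, eps: float = 1e-6) -> jax.Array:
    """Vanilla JAX reference for a hypothetical RMSNorm kernel.

    x has shape [..., hidden]; weight has shape [hidden].
    The math is done in float32 and the result is cast back to x.dtype.
    """
    x32 = x.astype(jnp.float32)
    mean_sq = jnp.mean(jnp.square(x32), axis=-1, keepdims=True)
    y32 = x32 * jax.lax.rsqrt(mean_sq + eps) * weight.astype(jnp.float32)
    return y32.astype(x.dtype)


def loss_fn(x: jax.Array, weight: jax.Array) -> jax.Array:
    # Hypothetical scalar loss, used only so the reference can be differentiated.
    return jnp.sum(rmsnorm_reference(x, weight))


# Tiny inputs, chosen only to exercise the value and gradient paths.
x = jnp.ones((2, 4), dtype=jnp.float32)
weight = jnp.full((4,), 2.0, dtype=jnp.float32)

value, (dx, dw) = jax.value_and_grad(loss_fn, argnums=(0, 1))(x, weight)
```

A Pallas implementation of the same hypothetical kernel would be expected to expose the same public signature, so the `value`, `dx`, and `dw` computed here can serve as the comparison baseline.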

Recommended Module Layout

API and Safety Rules

Batching convention

Block size config

Fallback semantics

Input normalization rule

Correctness Workflow

1) Start from a reference

2) Write value + grad harness

3) Promote long-lived checks to pytest

Cost Estimate Requirement

Performance and Profiling Workflow

Autotuning Workflow

Fallback autotuning requirement

Compiler and Runtime Hints (TPU Pallas)

Matmul precision

Scoped VMEM policy

Compiler diagnostics and dumps

Dump-driven diagnosis

Definition of Done

Starter References

Tokamax Notes (Optional)

Further Reading
