Structured workflow for SIPU operator development in torch_sipu. Use when implementing, modifying, fixing, or refactoring operators, for infrastructure changes that affect operator behavior, or for addressing MR review feedback from colleagues.
You are working in the torch_sipu repository — a PyTorch device backend extension for SIPU hardware. When the user asks you to add, modify, or refactor an operator or its underlying infrastructure, you MUST follow the steps below in order. Do not skip steps.
Before doing anything, classify the user's request into one of two scenarios and state it explicitly:
Scenario 1: User wants to implement, refactor, or modify an operator from scratch (or continue in-progress work).
Signals: "implement X", "refactor X", "add dtype support for X", "fix bug in X kernel", no MR mentioned.
Action: Start at Step 0 and follow every step in order.
Scenario 2: User has already submitted an MR and received review comments from a colleague that need to be addressed.
Signals: "reviewer said", "MR rejected", "MR comment", "CR feedback", "review feedback", "colleague said", paste of review comments.
Action: Skip Step 0 entirely. Jump directly to Step 8. (Branch already exists.)
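The two-way classification above can be sketched as a simple keyword check. This is illustrative only — the function name and the exact matching rule are assumptions, not part of the skill:

```python
# Illustrative sketch: classify a request as Scenario 1 (fresh/ongoing
# operator work) or Scenario 2 (MR review feedback). The signal list
# mirrors the Scenario 2 signals above; the function is hypothetical.
REVIEW_SIGNALS = (
    "reviewer said", "mr rejected", "mr comment", "cr feedback",
    "review feedback", "colleague said",
)

def classify_request(text: str) -> int:
    """Return 2 if the request looks like MR review feedback, else 1."""
    lowered = text.lower()
    if any(signal in lowered for signal in REVIEW_SIGNALS):
        return 2
    return 1

assert classify_request("implement hardtanh for bf16") == 1
assert classify_request("Reviewer said the YAML entry is wrong") == 2
```

A real classifier would also weigh context such as a pasted review thread, but a keyword pass is enough to decide whether to skip Step 0.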
Prerequisite: You MUST already be on a feature branch (enforced by Entry Branch Guard above).
Before starting development, check if a JIRA ticket exists. If not, ask the user whether to create one.
Reference: Read `.claude/skills/operator-dev/refs/jira-mr-automation.md` §1 for the JIRA ticket creation script.
| Change Type | Scope | CI Label |
|---|---|---|
| New Triton op | Python kernel + registration + tests | triton |
| New C++ op | .su/.cpp + YAML + tests | sikernel |
| New op (both backends) | Triton + C++ + tests | sikernel or triton |
| Refactor .cpp → .su | Replace .cpp with .su, update YAML | sikernel |
| Bug fix (kernel) | Modify existing kernel + add regression test | sikernel or triton |
| Bug fix (registration) | Fix YAML or dispatcher registration | aten |
| Infrastructure change | Headers (.suh), utilities, build | sikernel |
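The change-type table above can be captured as a lookup for scripting CI labels. The key names below are assumptions made for illustration; only the label values come from the table:

```python
# Illustrative mapping of change type -> CI label, mirroring the table
# above. The snake_case keys are hypothetical identifiers.
CI_LABELS = {
    "new_triton_op": "triton",
    "new_cpp_op": "sikernel",
    "refactor_cpp_to_su": "sikernel",
    "bugfix_kernel_cpp": "sikernel",
    "bugfix_kernel_triton": "triton",
    "bugfix_registration": "aten",   # YAML / dispatcher registration fixes
    "infrastructure": "sikernel",    # headers (.suh), utilities, build
}

def ci_label(change_type: str) -> str:
    """Return the CI label for a given change type."""
    return CI_LABELS[change_type]

assert ci_label("bugfix_registration") == "aten"
```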
Before writing any code:
First check whether the op is CompositeImplicitAutograd — if it auto-decomposes, do NOT register it unless you have a performance reason for a fused kernel:

```
python -c "import torch; print(torch._C._dispatch_dump('aten::<op>'))"
```
Then choose the registration mechanism:

- DispatchStub (`REGISTER_PRIVATEUSE1_DISPATCH`) — for TensorIterator-based ops
- Structured (`TORCH_SIPU_IMPL_FUNC`) — for complex ops needing custom shape setup

Search for existing implementations:

```
find torch_sipu/csrc/aten/native/sipu/ -name "*<Op>*"
find torch_sipu/backends/sipu_triton_kernels/ops/ -name "*<op>*"
grep -r "<op>" torch_sipu/csrc/aten/native/native_functions.yaml
grep -r "<op>" torch_sipu/backends/sipu_triton_kernels/__init__.py
```
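Deciding whether an op auto-decomposes amounts to scanning the dispatch dump for a CompositeImplicitAutograd entry. The helper below is a hypothetical sketch; the exact dump format is an assumption (real dumps print one `<DispatchKey>: <kernel>` entry per line):

```python
# Hypothetical helper: given the text printed by
# torch._C._dispatch_dump('aten::<op>'), decide whether the op already
# auto-decomposes via CompositeImplicitAutograd.
def auto_decomposes(dispatch_dump: str) -> bool:
    """True if a CompositeImplicitAutograd kernel is registered."""
    return any(
        line.strip().startswith("CompositeImplicitAutograd")
        for line in dispatch_dump.splitlines()
    )

sample = "name: aten::silu\nCompositeImplicitAutograd: registered at ..."
assert auto_decomposes(sample)
assert not auto_decomposes("name: aten::mm\nCPU: registered at ...")
```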
Reference: Read `.claude/skills/operator-dev/refs/dispatch-guide.md` for the complete dispatch mechanism decision tree, stub header reference, YAML entry patterns, and operator category quick reference.
Before making any edits, output a table of the files to modify and the files NOT to modify. Rule: if a file is not in this table, do not touch it.
Before writing any code, use the hardware optimization decision engine to determine the implementation strategy.
Reference: Read `.claude/skills/operator-dev/refs/hardware-optimization-guide.md` for the complete decision engine, hardware architecture summary, implementation pattern library, and similar-op reference matching.
| Category | Examples | Typical Path |
|---|---|---|
| E1: Unary element-wise | neg, sigmoid, silu, rsqrt | PATH-A: TensorIterator + Tile→RVV→Scalar |
| E2: Binary element-wise | add, mul, sub, div | PATH-A: with *_with_scalars* variants |
| C: Comparison | eq, ne, gt, ge | PATH-A: with CompareVec |
| R1: Simple reduction | sum, prod, any, all | PATH-A-REDUCE: Reduce.suh |
| R2: Compound reduction | softmax, layernorm, rmsnorm | PATH-B: parallel_for + VectorizedM1 |
| M: Matrix | mm, bmm, attention | PATH-C or Triton |
| S: Structural | cat, topk, sort | PATH-B or PATH-C |
| X: Custom SIPU | mm_t2t, flash_attention | PATH-C: sikernel library |
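The category table above reduces to a lookup from category code to typical execution path. A minimal sketch (the dict name is an assumption; the values are the table's "Typical Path" column, first option only):

```python
# Illustrative category -> typical execution path lookup, mirroring the
# table above. Where the table lists alternatives ("PATH-B or PATH-C"),
# only the first option is recorded here.
EXECUTION_PATHS = {
    "E1": "PATH-A",         # unary element-wise: TensorIterator cascade
    "E2": "PATH-A",         # binary element-wise, *_with_scalars* variants
    "C":  "PATH-A",         # comparison ops, CompareVec
    "R1": "PATH-A-REDUCE",  # simple reductions via Reduce.suh
    "R2": "PATH-B",         # compound reductions: parallel_for + VectorizedM1
    "M":  "PATH-C",         # matrix ops (or Triton)
    "S":  "PATH-B",         # structural ops (or PATH-C)
    "X":  "PATH-C",         # custom SIPU ops from the sikernel library
}

assert EXECUTION_PATHS["R1"] == "PATH-A-REDUCE"
```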
Output the selected strategy before proceeding, e.g.:

```
Op category: E1 (unary element-wise)
Execution path: PATH-A — TensorIterator + Tile→RVV→Scalar cascade
Precision: Standard M1→M2 widening for bf16/fp16
Reference impl: silu in UnaryOpsKernel.su (most similar)
```
Vec library reference: When implementing the vectorized path, consult `.claude/skills/operator-dev/refs/vec-library-guide.md` for the complete API reference.
Every new source file MUST start with the Apache v2.0 license header. Use the file's creation year and the correct comment syntax. Full template is in CLAUDE.md.
Reference: Read `.claude/skills/operator-dev/refs/triton-template.md` for the complete Triton template, decorator stack, preprocessing configs, and registration guide.
Key: Create `ops/<op>.py` → export in `ops/__init__.py` → register in `__init__.py` with `_sipu_lib_aten.impl()`.
Reference: Read `.claude/skills/operator-dev/refs/cpp-template.md` for Option A (DispatchStub `.su`), Option B (Structured kernel `.su`), and Option C (host-only `.cpp`).
Key points:

- Option A: DispatchStub — `REGISTER_PRIVATEUSE1_DISPATCH`
- Option B: Structured — `TORCH_SIPU_IMPL_FUNC` + `parallel_for` + `VectorizedM1`
- Option C: host-only `.cpp`
- Inference Backend: Do NOT implement backward. Register `AutogradPrivateUse1` ONLY for metadata ops (`to.dtype`, `type_as`) or in-place ops conflicting with autograd (`_index_put_impl_`).
Refactoring `.cpp` to `.su`: Read `.claude/skills/operator-dev/refs/cpp-to-su-migration.md` for the migration workflow and SoftMax example.
Triton-only changes (Python files) do NOT require a rebuild.
```
conda activate pytorch
source setup_sipu_sdk_env.sh
make install-dev      # Extension only
# OR
make install-all-dev  # Full rebuild (if third_party/sikernel changed)
```
Common failures: `CMAKE_SIPU_COMPILER` not found → forgot the SDK env script. `clang++` not found → forgot conda. Stale build → `rm -rf build/`.
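The failure-to-remedy mapping above can be scripted as a triage helper. The function and table below are hypothetical conveniences, not part of the build system:

```python
# Hypothetical triage helper mirroring the common build failures above:
# scan a build log for known error markers and suggest the usual fix.
FAILURE_HINTS = {
    "CMAKE_SIPU_COMPILER": "forgot `source setup_sipu_sdk_env.sh`",
    "clang++": "forgot `conda activate pytorch`",
}

def triage(build_log: str) -> str:
    for marker, hint in FAILURE_HINTS.items():
        if marker in build_log:
            return hint
    return "possibly a stale build — try `rm -rf build/` and rebuild"

assert "setup_sipu_sdk_env" in triage("CMAKE_SIPU_COMPILER not found")
```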
Reference: Read `.claude/skills/operator-dev/refs/performance-guide.md` for the performance checklist and anti-patterns. Also consult `.claude/skills/operator-dev/refs/hardware-optimization-guide.md` §5 for the deterministic validation checklist.
Key checks:

- Is `VectorizedM1` used in the hot loop, with a scalar tail?

State the performance level after review.
Reference: Read `.claude/skills/operator-dev/refs/test-template.md` for the complete test template and tolerance guidelines.
- Use `torch.testing.assert_close` — never `torch.equal` — for floating-point comparisons.
- Cover both `torch.float32` and `torch.bfloat16`.

```
CUDA_VISIBLE_DEVICES= PYTORCH_TESTING_DEVICE_ONLY_FOR=sipu pytest test/test_<op>.py -v
TRITON_KERNEL_VERIFY=1 python examples/run_<op>.py  # Triton only
```
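The per-dtype tolerance rule behind `assert_close` can be sketched in pure Python. The numbers below match `torch.testing.assert_close` defaults to my knowledge, but treat them as assumptions and take the authoritative values from the test-template reference:

```python
# Illustrative per-dtype tolerances and a pure-Python stand-in for the
# |actual - expected| <= atol + rtol * |expected| rule used by
# torch.testing.assert_close. Numbers are assumed defaults.
TOLERANCES = {
    "float32":  (1.3e-6, 1e-5),  # (rtol, atol)
    "bfloat16": (1.6e-2, 1e-5),
}

def is_close(actual: float, expected: float, dtype: str) -> bool:
    rtol, atol = TOLERANCES[dtype]
    return abs(actual - expected) <= atol + rtol * abs(expected)

assert is_close(1.0, 1.0 + 1e-7, "float32")
assert not is_close(1.0, 1.1, "float32")
assert is_close(1.0, 1.01, "bfloat16")  # bf16 tolerates coarser error
```

This is why the same kernel output can pass in bf16 yet fail in fp32: the tolerance envelope shrinks by four orders of magnitude.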
- List every file changed with a one-line description.
- Provide exact test commands (single file + full suite).
- Call out: dtype gaps, shape limitations, tolerance concerns, regression risk.
Follow the dev-workflow skill (Steps 3–6) for lint, commit, squash, push, and MR creation. Key points:
- `make lint` → `lintrunner -a --all-files` to auto-fix.
- Commit message format: `<type>(<scope>): jira#S1SW-XXXX <description>`. Ask the user for the Jira number if not provided.
- Run `/pr-review <commit_hash>` to catch issues early.
- MR automation: `.claude/skills/operator-dev/refs/jira-mr-automation.md` §2.

Reference: Read `.claude/skills/operator-dev/refs/mr-feedback-guide.md` for the complete review feedback handling workflow (comment classification, fix ordering, commit format, reviewer reply, CI re-trigger).
Quick summary: Classify comments → fix in order (bugs → tests → performance → style → design) → fixup commit → reply to reviewer → push → comment test CI.
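The fix-ordering rule in the summary above (bugs → tests → performance → style → design) can be sketched as a sort over classified comments. The category names and the pair representation are assumptions for illustration:

```python
# Sketch of the review-feedback fix ordering: classified comments are
# sorted so correctness issues are addressed before cosmetic ones.
# Category strings and the (category, text) pair shape are hypothetical.
FIX_ORDER = ["bug", "test", "performance", "style", "design"]

def order_fixes(comments: list[tuple[str, str]]) -> list[str]:
    """Return comment texts sorted into the fix order above."""
    ranked = sorted(comments, key=lambda c: FIX_ORDER.index(c[0]))
    return [text for _, text in ranked]

comments = [
    ("style", "rename kernel lambda"),
    ("bug", "bf16 path overflows in M1"),
    ("test", "add regression test for empty tensor"),
]
assert order_fixes(comments)[0] == "bf16 path overflows in M1"
```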
```
# Triton backend ops (one file per op)
torch_sipu/backends/sipu_triton_kernels/ops/*.py
torch_sipu/backends/sipu_triton_kernels/ops/__init__.py   # exports
torch_sipu/backends/sipu_triton_kernels/__init__.py       # dispatcher registration

# AI backend ops
torch_sipu/backends/AI/ops/*.py
torch_sipu/backends/AI/__init__.py                        # dispatcher registration

# C++ kernel ops — joint compilation (.su) and host-only (.cpp)
torch_sipu/csrc/aten/native/sipu/*.su                     # joint compilation kernels (modern)
torch_sipu/csrc/aten/native/sipu/*.cpp                    # host-only C++ kernels
torch_sipu/csrc/aten/native/native_functions.yaml         # C++ dispatch registration
torch_sipu/csrc/aten/native/ext_native_functions.yaml     # extension ops (custom ops not in ATen)

# C++ infrastructure headers (.suh) — shared by many ops
torch_sipu/csrc/aten/native/sipu/Loops.suh                # scalar element-wise loops (sipu_kernel)
torch_sipu/csrc/aten/native/sipu/VecLoops.suh             # vectorized loops (sipu_kernel_vec)
torch_sipu/csrc/aten/native/sipu/TileLoops.suh            # tiled loops (sipu_kernel_tile)
torch_sipu/csrc/aten/native/sipu/Reduce.suh               # reduction utilities (vectorized_reduction)
torch_sipu/csrc/aten/native/sipu/Parallel.suh             # parallel execution (parallel_for, invoke_parallel)
torch_sipu/csrc/aten/native/sipu/Vec.suh                  # vector type utilities
torch_sipu/csrc/aten/native/sipu/Tile.suh                 # tile type utilities

# Triton op utilities
torch_sipu/backends/sipu_triton_kernels/ops/utils.py      # cpu_fallback, precheck_supported_dtypes, request_fallback
torch_sipu/backends/sipu_triton_kernels/ops/verify_decorator.py         # @sipu_verify
torch_sipu/backends/sipu_triton_kernels/ops/preprocessing_framework.py  # @triton_preprocess, *_OP_CONFIG

# Test utilities
torch_sipu/testing/_internal/triton_utils.py              # skipIfUseSipuTritonKernels, onlySipuTritonKernels
torch_sipu/testing/_internal/common_utils.py

# Tests
test/test_*.py

# Examples
examples/run_*.py
```