This skill guides you through the complete lifecycle of implementing a neural network operator in ESP-DL: from C++ module code, through quantization support in esp-ppq, to Docker-based validation that ensures inference results align between the quantization tool and the on-device runtime.

Workflow Continuity — Read This First

This skill describes a multi-phase pipeline: research → implement → test → optimize → document. The most critical transition is from code modification (Phases 2–5) to testing (Phase 6).

After completing ANY code change — whether it's a new module, a base layer fix, an esp-ppq tweak, or a test config update — immediately proceed to Phase 6 (Docker Build & Test) without stopping to ask the user. The user expects the full implement-then-test cycle to happen as one continuous flow. Pausing after code changes to ask "should I run tests now?" breaks the workflow and forces unnecessary back-and-forth.

The only reasons to pause before testing are:

- You need information the user hasn't provided (e.g., target chip, Docker image location)
- A build/compilation error requires the user's input to resolve
- The user explicitly asked you to stop at a certain phase

| Category | Examples | Module Pattern | Base Pattern |
|---|---|---|---|
| Elementwise binary | Add, Sub, Mul, Div, Mod, Pow | `dl_module_add.hpp` | `dl_base_add.hpp/cpp` (elemwiseArgsType) |
| Elementwise unary | Relu, Sigmoid, Exp, Neg, Sqrt | `dl_module_relu.hpp` | `dl_base_relu.hpp/cpp` (ArgsType) |
| Convolution-like | Conv, ConvTranspose, DepthwiseConv | `dl_module_conv.hpp` | `dl_base_conv2d.hpp/cpp` |
| Pooling | AveragePool, MaxPool, GlobalAveragePool | `dl_module_average_pool.hpp` | `dl_base_avg_pool2d.hpp/cpp` |
| Reduce | ReduceSum, ReduceMean, ReduceMax | `dl_module_reduce_sum.hpp` | `dl_base_reduce.hpp/cpp` |
| Shape manipulation | Reshape, Transpose, Flatten, Slice | `dl_module_reshape.hpp` | Typically no base layer needed |
| Sequence/RNN | GRU, LSTM | `dl_module_gru.hpp` | Complex multi-step base |
| Activation (LUT) | HardSwish, HardSigmoid, Tanh | `dl_module_lut.hpp` | LUT-based implementation |

| Aspect | int8 / int16 (quantized) | float32 |
|---|---|---|
| Arithmetic | `tool::truncate<int32_t>(result)` (clamps to the type range) | Direct arithmetic, no truncation |
| Scale/Rescale | Uses `args->mul_shift`, `input_scale`, `output_rescale` | Ignores these fields (exponent=0, scale=1.0) |
| SIMD dispatch | ISA-specific implementations (TIE728, ESP32-P4) | C reference only; no SIMD needed |
| Template specialization | Generic template handles quantization math | Explicit `template<>` specialization for `float` |
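
The difference between the two columns can be sketched in a few lines of Python. This is an illustration only, not the ESP-DL C++ code: it assumes the common power-of-two convention `real = q * 2**exponent`, and the names `clamp`, `quantized_add`, and `float_add` are hypothetical. The real base layer works through `args->mul_shift`, `input_scale`, and `output_rescale`, whose exact semantics are not reproduced here.

```python
def clamp(value: int, bits: int) -> int:
    """Saturate an integer to the signed range of the given bit width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def quantized_add(a: int, a_exp: int, b: int, b_exp: int,
                  out_exp: int, bits: int = 8) -> int:
    """Add two quantized values and requantize to the output exponent.

    Assumed convention (not taken from ESP-DL): real = q * 2**exponent.
    Python's round() rounds halves to even; on-device rounding may differ.
    """
    real = a * 2.0 ** a_exp + b * 2.0 ** b_exp
    q = round(real / 2.0 ** out_exp)
    return clamp(q, bits)  # quantized path: result is clamped to the type range


def float_add(a: float, b: float) -> float:
    """Float path: direct arithmetic, no scale, no clamping."""
    return a + b
```

For example, `quantized_add(100, -4, 100, -4, -4)` saturates at the int8 maximum because the unclamped sum (200) does not fit, whereas `float_add` simply returns the sum.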

| Op Set in `espdl_typedef.py` | Layout Pattern | When to Use |
|---|---|---|
| `CONV_LAYOUT_OP_SET` | `ResetConvLayoutPattern` | Conv, Pool, DepthToSpace: ops with a spatial layout |
| `PASSIVE_LAYOUT_OP_SET` | `BypassPassiveLayoutPattern` | Activations (Relu, Sigmoid...) and math ops (Exp, Log...): the layout passes through |
| `ADD_LIKE_OP_SET` | `BypassAddLikePattern` | Binary elementwise (Add, Sub, Mul, Div, Mod, Pow...): handles shape broadcasting between two inputs |
| `AXIS_TRANSFORM_OP_SET` | `AxisTransformPattern` | Softmax, Split, Reduce ops: transforms axis attributes |
| `OTHER_OP_SET` | `RestoreOriginLayoutPattern` | Reshape, Transpose, Gather, GRU...: restores the original layout |

| Check | File | Action |
|---|---|---|
| In `quant_operation_types`? | `EspdlQuantizer.py` | Add if missing |
| In a layout op set? | `espdl_typedef.py` | Always verify; add to the correct op set |
| Special quant config? | `EspdlQuantizer.py` | Add rules in `create_espdl_quant_config()` if needed |
| Custom OpSocket? | `IR/base/opdef.py` | Add if inputs have heterogeneous platform needs |
| Export patterns? | `export_patterns.py` | Add if LUT/fusion/weight-layout needed |

| Operator Category | Add to Op Set | Why |
|---|---|---|
| Elementwise binary (Add-like) | `ADD_LIKE_OP_SET` | `BypassAddLikePattern` handles input shape broadcasting |
| Elementwise unary (activation) | `ACTIVATION_OP_SET` | `BypassPassiveLayoutPattern` passes through layout |
| Elementwise unary (math) | `MATH_OP_SET` | Also covered by `PASSIVE_LAYOUT_OP_SET` |
| Convolution-like | `CONV_LAYOUT_OP_SET` | `ResetConvLayoutPattern` transforms spatial layout |
| Reduce / Softmax-like | `REDUCE_OP_SET` or `SOFTMAX_LIKE_OP_SET` | `AxisTransformPattern` adjusts axis attributes |
| Shape manipulation | `OTHER_OP_SET` | `RestoreOriginLayoutPattern` restores the original layout |
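
The mapping above can be kept as a small lookup while planning an operator. The sketch below is a standalone helper, not part of esp-ppq: the category keys and the `layout_plan` function are hypothetical names chosen for illustration, while the op set and pattern names are the ones listed in the table. It does not import or modify `espdl_typedef.py`.

```python
# Hypothetical category keys mapped to the op set names from the table above.
CATEGORY_TO_OP_SET = {
    "elementwise_binary": "ADD_LIKE_OP_SET",
    "elementwise_unary_activation": "ACTIVATION_OP_SET",
    "elementwise_unary_math": "MATH_OP_SET",
    "convolution_like": "CONV_LAYOUT_OP_SET",
    "reduce": "REDUCE_OP_SET",
    "softmax_like": "SOFTMAX_LIKE_OP_SET",
    "shape_manipulation": "OTHER_OP_SET",
}

# Layout pattern that ends up handling each op set, per the table above.
OP_SET_TO_PATTERN = {
    "ADD_LIKE_OP_SET": "BypassAddLikePattern",
    "ACTIVATION_OP_SET": "BypassPassiveLayoutPattern",
    "MATH_OP_SET": "BypassPassiveLayoutPattern",
    "CONV_LAYOUT_OP_SET": "ResetConvLayoutPattern",
    "REDUCE_OP_SET": "AxisTransformPattern",
    "SOFTMAX_LIKE_OP_SET": "AxisTransformPattern",
    "OTHER_OP_SET": "RestoreOriginLayoutPattern",
}


def layout_plan(category: str) -> tuple[str, str]:
    """Return (op set to extend, layout pattern that will handle the op)."""
    try:
        op_set = CATEGORY_TO_OP_SET[category]
    except KeyError:
        raise ValueError(f"unknown operator category: {category!r}") from None
    return op_set, OP_SET_TO_PATTERN[op_set]
```

Calling `layout_plan("elementwise_binary")` returns the op set to edit and the pattern to expect; an unknown category raises `ValueError` so a misclassified operator is caught before any esp-ppq file is touched.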

| Type | Model suffix | Tolerance | Common failure causes |
|---|---|---|---|
| int8 | `*_s8.espdl` | Strict (2e-5) | Quantization config mismatch, rounding, exponent calculation |
| int16 | `*_s16.espdl` | ±1 allowed | Similar to int8, but wider range means fewer edge cases |
| float32 | `*_f32.espdl` | 2e-5 | Usually data layout (NCHW vs NHWC), or missing float specialization |
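
When reading a failing test, it can help to reproduce the comparison offline. The following is a minimal sketch, not the ESP-DL test harness: `dtype_from_model_name` and `outputs_align` are hypothetical helpers, and the tolerances are assumed to be absolute differences (with the int16 ±1 taken in quantized integer units), which the table does not state explicitly.

```python
# Tolerances from the table above, interpreted as absolute differences (assumption).
TOLERANCE = {"int8": 2e-5, "int16": 1.0, "float32": 2e-5}

# Model file suffixes from the table above.
SUFFIX_TO_DTYPE = {
    "_s8.espdl": "int8",
    "_s16.espdl": "int16",
    "_f32.espdl": "float32",
}


def dtype_from_model_name(name: str) -> str:
    """Infer the data type of a test model from its file name suffix."""
    for suffix, dtype in SUFFIX_TO_DTYPE.items():
        if name.endswith(suffix):
            return dtype
    raise ValueError(f"unrecognized model suffix: {name!r}")


def outputs_align(expected, actual, dtype: str) -> bool:
    """Check that every element differs by at most the tolerance for dtype."""
    if len(expected) != len(actual):
        return False
    tol = TOLERANCE[dtype]
    return all(abs(e - a) <= tol for e, a in zip(expected, actual))
```

For instance, `outputs_align([100, 200], [101, 199], "int16")` passes under the ±1 rule, while the same one-unit difference fails for int8.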

ESP-DL Operator Development Skill

Workflow Continuity — Read This First

Project Layout

Phase 1: Research & Classify the Operator

1.1 Read the ONNX Specification

1.2 Classify the Operator

1.3 Determine Scope

Phase 2: Implement esp-dl Module Layer

2.1 Module Class Structure

2.2 Deserialization

2.3 Register in Creator

Phase 3: Implement esp-dl Base Layer (C Reference)

3.1 Architecture

3.2 ISA Dispatch Pattern

3.3 Float32 Implementation Differences

Phase 4: Determine esp-ppq Modifications

4.1 Check #1: Quantization Registration

4.2 Check #2: Layout Pattern Op Set (ALWAYS required)

4.3 Does it need special quantization rules?

4.4 Does it need a custom OpSocket?

4.5 Does it need additional export pattern changes?

Summary: What to Check for Every New Operator

Quick Category → Op Set Mapping

Phase 5: Configure Test Cases

5.1 Add Test Model Builder

5.2 Add Test Configuration

Phase 6: Docker Build & Test

6.1 Docker Run Template

6.2 Step 1: Generate Test Cases

6.3 Step 2: Build Test Application

6.4 Step 3: Generate Pytest Script

6.5 Step 4: Flash & Run Tests on Hardware

6.6 Quick One-Liner (All Steps)

6.7 Interpreting Test Results

Phase 7: SIMD Optimization (Optional)

7.1 When to Add SIMD

7.2 SIMD Architecture Overview

7.3 SIMD Implementation Steps

7.4 Important SIMD Conventions

Phase 8: Alignment Verification

Phase 9: Update Operator Support State (REQUIRED)

Quick Reference: Complete Checklist

esp-dl files to create/modify:

esp-ppq files to verify/modify (Phase 4 checks — ALWAYS do both):

SIMD files (optional optimization):

Validation:

Documentation (REQUIRED — do not skip):
